
Ollama - REST API Approach#

Objective#

In this notebook, we use Ollama through its REST API.

Initial Setup#

If you have not completed the initial Conda environment setup and JupyterLab access steps, please refer to the Ollama on Ibex Guide - Approach-2: Notebook Workflows (Jupyter Based).

Starting the Ollama Server#

Start the Ollama REST API server using the bash script below; the next cell writes the script to ollama-server-start.sh and runs it.

The script contains the following:

  • A user-editable section where the Ollama models scratch directory is defined.

  • A step that saves the allocated port to a temporary ollama_port.txt file, so the notebook can later read the port assigned to the Ollama server.

  • A cleanup section that stops the Singularity instance when the script is terminated.

User Modification Section#

  • This section of the script is reserved for user-specific setup: it sets the directory into which Ollama models are pulled locally.

  • In the script, you will find a clearly marked block:

    # ------------------------------------
    # START OF USER MODIFICATION SECTION
    # ------------------------------------
    

Note: Do not modify other parts of the script unless you know what you are doing, as they are required for correct execution.

import os, subprocess

script_content = """
#!/bin/bash

# Pre-start cleanup: ensure no stale instances or files
pre_cleanup() {
    echo "Running pre-start cleanup..."

    # 1. Stop any running Singularity instance with the same name
    if singularity instance list | grep -q "$SINGULARITY_INSTANCE_NAME"; then
        echo "Stopping existing Singularity instance: $SINGULARITY_INSTANCE_NAME"
        singularity instance stop "$SINGULARITY_INSTANCE_NAME"
    fi

    # 2. Remove old temporary or state files
    if [ -n "$OLLAMA_PORT_TXT_FILE" ] && [ -f "$OLLAMA_PORT_TXT_FILE" ]; then
        echo "Removing old port file: $OLLAMA_PORT_TXT_FILE"
        rm -f "$OLLAMA_PORT_TXT_FILE"
    fi

    if [ -n "$OLLAMA_LOG_FILE" ] && [ -f "$OLLAMA_LOG_FILE" ]; then
        echo "Removing old log file: $OLLAMA_LOG_FILE"
        rm -f "$OLLAMA_LOG_FILE"
    fi

    echo "Cleanup complete โ€” ready to start new instance."
}

# Cleanup when the script exits
cleanup() {
    echo "🧹 Cleaning up before exit..."
    # Remove the temporary port file
    rm -f $OLLAMA_PORT_TXT_FILE
    # Stop the Singularity instance
    singularity instance stop $SINGULARITY_INSTANCE_NAME
}
trap cleanup SIGINT  # Catch Ctrl+C (SIGINT) and run cleanup

# --------------------------------
# START OF USER MODIFICATION SECTION
# --------------------------------
# Target directory under /ibex/user/$USER to store your Ollama models
export OLLAMA_MODELS_SCRATCH=/ibex/user/$USER/ollama_models_scratch
# --------------------------------
# END OF USER MODIFICATION SECTION
# --------------------------------

mkdir -p $OLLAMA_MODELS_SCRATCH

# 1. Define the instance name and the temporary files used by the script
SINGULARITY_INSTANCE_NAME='ollama'
SINGULARITY_SIF_FILE="${SINGULARITY_INSTANCE_NAME}.sif"
OLLAMA_PORT_TXT_FILE='ollama_port.txt'
OLLAMA_LOG_FILE=$PWD/ollama_server.log

# 2. Load the Singularity module (needed before any singularity command, including pre_cleanup)
module load singularity

# 3. Stop any stale instance and remove old temporary files
pre_cleanup

# 4. Pull the Ollama Docker image (skipped if the SIF file already exists)
if [ ! -f "$SINGULARITY_SIF_FILE" ]; then
    singularity pull --name $SINGULARITY_SIF_FILE docker://ollama/ollama
fi

# 5. Pick a free port for OLLAMA_HOST (the default is 127.0.0.1:11434)
export PORT=$(python -c 'import socket; s=socket.socket(); s.bind(("", 0)); print(s.getsockname()[1]); s.close()')

# 6. Save the assigned port; the notebook reads it from this file later
echo "$PORT" > $OLLAMA_PORT_TXT_FILE

echo "OLLAMA PORT: $PORT  -- Stored in $OLLAMA_PORT_TXT_FILE"

# 7. Define the Ollama host inside the container
export SINGULARITYENV_OLLAMA_HOST=127.0.0.1:$PORT

# 8. Point Ollama at the models directory defined above
export SINGULARITYENV_OLLAMA_MODELS=$OLLAMA_MODELS_SCRATCH

# 9. Start the Singularity instance
singularity instance start --nv -B "/ibex/user:/ibex/user" $SINGULARITY_SIF_FILE $SINGULARITY_INSTANCE_NAME

# 10. Run the Ollama REST API server in the background
nohup singularity exec instance://$SINGULARITY_INSTANCE_NAME bash -c "ollama serve" > $OLLAMA_LOG_FILE 2>&1 &
echo "Ollama server started. Logs at: $OLLAMA_LOG_FILE"
"""

# Write script file
script_path = "ollama-server-start.sh"
with open(script_path, "w") as f:
    f.write(script_content)
os.chmod(script_path, 0o755)

# Run script
subprocess.run(["bash", script_path])
Running pre-start cleanup...
Cleanup complete - ready to start new instance.
Loading module for Singularity
Singularity 3.9.7 modules now loaded
OLLAMA PORT: 53639  -- Stored in ollama_port.txt
ollama-server-start.sh: line 9: singularity: command not found
FATAL:   Image file already exists: "ollama.sif" - will not overwrite
Ollama server started. Logs at: /ibex/user/solimaay/scripts/jupyter/631115-ollama-sif/ibex-nb/ollama_server.log
INFO:    instance started successfully
CompletedProcess(args=['bash', 'ollama-server-start.sh'], returncode=0)
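
Before moving on, it is worth confirming that the server actually came up. The sketch below simply prints the tail of the log file written by the script; it assumes the default ollama_server.log in the current working directory, so adjust the path if you changed it.

# Optional check: show the last lines of the Ollama server log
import os

log_path = os.path.join(os.getcwd(), "ollama_server.log")   # path assumed from the script above
if os.path.exists(log_path):
    with open(log_path) as f:
        print("".join(f.readlines()[-10:]))                 # last 10 lines are usually enough
else:
    print(f"Log file not found yet: {log_path}")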

Using REST API Requests#

Work through the Python cells below; they contain code for:

  • Initialization setup and connectivity test.

  • Listing local models.

  • Pulling models.

  • Running a sample query.

  • Interactive chat with the models.

  • Stopping the Ollama server.

1. Initialization#

  1. Define the base URL for the Ollama server.

  2. Test the Ollama server connectivity.

# 1.1- Define the base URL for the Ollama server
with open("ollama_port.txt") as f:
    PORT = f.read().strip()

BASE_URL = f"http://127.0.0.1:{PORT}"
print(BASE_URL)
http://127.0.0.1:50677
# 1.2- Testing the Ollama server connectivity
import requests

try:
    r = requests.get(BASE_URL)
    print("Ollama is running!", r.status_code)
except requests.ConnectionError as e:
    print("Ollama is NOT reachable:", e)
Ollama is running! 200
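
The server can take a few seconds to start accepting requests after the script launches it, so a one-shot check may fail on the first try. A minimal retry loop, assuming the BASE_URL defined above:

# Poll the server root endpoint until it responds or the retries run out
import time
import requests

def wait_for_ollama(base_url: str = BASE_URL, retries: int = 10, delay: float = 2.0) -> bool:
    for attempt in range(1, retries + 1):
        try:
            r = requests.get(base_url, timeout=5)
            print(f"Attempt {attempt}: Ollama is running ({r.status_code})")
            return True
        except requests.ConnectionError:
            print(f"Attempt {attempt}: not reachable yet, retrying in {delay}s...")
            time.sleep(delay)
    return False

wait_for_ollama()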

2. Get a List of Local Models#

  • Get a list of locally available Ollama models.

  • Locally available models are stored under the path /ibex/user/$USER/ollama_models_scratch.

  • To change the location of pulled models, modify the variable OLLAMA_MODELS_SCRATCH in the script ollama-server-start.sh.

# Get a list of downloaded models
def get_local_models(base_url: str = BASE_URL):
    """
    Return a list of locally available Ollama models.

    Args:
        base_url (str, optional): Base URL of the Ollama server. Defaults to BASE_URL.

    Returns:
        list: A list of model names as strings.

    Raises:
        RuntimeError: If the Ollama server does not return the model list.
    """
    r = requests.get(f"{base_url}/api/tags")
    if r.ok:
        models = r.json().get("models", [])
        return [m["name"] for m in models]
    else:
        raise RuntimeError(f"Failed to list models: {r.text}")

get_local_models()
['gemma3:270m', 'phi3:3.8b', 'qwen3:0.6b']
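
The /api/tags response carries more than the model names. As a small variant of the helper above, the sketch below also prints each model's size; it assumes the size field (in bytes) returned by recent Ollama versions.

# List local models together with their approximate size on disk
def list_models_with_size(base_url: str = BASE_URL):
    r = requests.get(f"{base_url}/api/tags")
    r.raise_for_status()
    for m in r.json().get("models", []):
        size_gb = m.get("size", 0) / 1e9          # "size" is reported in bytes
        print(f"{m['name']:<20} {size_gb:6.2f} GB")

list_models_with_size()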

3. Pull The Model#

  • Pull a model from the Ollama server and stream the download progress.

  • Please refer to Ollama Library to check available models.

# Pull the required model
import requests

def pull_model(model: str, base_url: str = BASE_URL) -> list:
    """
    Pull a model from the Ollama server and stream the download progress.

    Args:
        model (str): Name of the model to pull.
        base_url (str, optional): Base URL of the Ollama server. Defaults to BASE_URL.

    Returns:
        list: A list of strings representing the streamed output lines.

    Raises:
        requests.HTTPError: If the server response indicates failure.
    """
    url = f"{base_url}/api/pull"
    response = requests.post(url, json={"name": model}, stream=True)

    if response.status_code != 200:
        raise requests.HTTPError(f"Failed to pull model '{model}': {response.text}")

    output_lines = []
    for line in response.iter_lines():
        if line:
            decoded = line.decode("utf-8")
            print(decoded)
            output_lines.append(decoded)

    return output_lines
# Usage
model = "phi3:3.8b"
output_logs = pull_model(model=model)
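
A small convenience wrapper can combine the two helpers above so a model is pulled only when it is missing locally. The ensure_model name below is not part of the Ollama API; it is just a sketch built on get_local_models and pull_model.

# Pull a model only if it is not already available locally
def ensure_model(model: str, base_url: str = BASE_URL) -> None:
    if model in get_local_models(base_url):
        print(f"Model '{model}' is already available locally.")
    else:
        print(f"Model '{model}' not found locally, pulling...")
        pull_model(model=model, base_url=base_url)

ensure_model("phi3:3.8b")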

4. Running a Sample Query#

  • Send a single chat prompt to a specified Ollama model and stream the response.

import requests
import json
from typing import List

def chat_once(model: str, prompt: str, base_url: str = BASE_URL) -> List[str]:
    """
    Send a single chat prompt to a specified Ollama model and stream the response.

    Args:
        model (str): Name of the Ollama model to use.
        prompt (str): User input to send to the model.
        base_url (str, optional): Base URL of the Ollama server. Defaults to BASE_URL.

    Returns:
        List[str]: List of streamed output chunks from the model.

    Raises:
        requests.HTTPError: If the server response status is not 200.
    """
    url = f"{base_url}/api/chat"
    response = requests.post(
        url,
        json={"model": model, "messages": [{"role": "user", "content": prompt}]},
        stream=True
    )

    if response.status_code != 200:
        raise requests.HTTPError(f"Failed to chat with model '{model}': {response.text}")

    output_lines = []
    for line in response.iter_lines():
        if line:
            data = json.loads(line.decode('utf-8'))
            if "message" in data:
                content = data["message"]["content"]
                print(content, end="", flush=True)  # Stream to console
                output_lines.append(content)

    print()  # Newline after full response
    return output_lines
# Usage
model="qwen3:0.6b"
prompt= "How old are you"
output_logs = chat_once(model=model, prompt=prompt)
I don't have a physical age, but I can help you with a wide range of tasks, whether you need assistance with writing, math, or anything else. How can I assist you today?
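
If token-by-token streaming is not needed, the same /api/chat endpoint also accepts "stream": false and returns the whole reply as a single JSON object. A minimal non-streaming sketch:

# Send one prompt and receive the complete reply in a single response
def chat_once_blocking(model: str, prompt: str, base_url: str = BASE_URL) -> str:
    r = requests.post(
        f"{base_url}/api/chat",
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "stream": False,                      # ask the server for one consolidated reply
        },
    )
    r.raise_for_status()
    return r.json()["message"]["content"]

print(chat_once_blocking("qwen3:0.6b", "Say hello in one short sentence."))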

5. Interactive Chat with Ollama Models#

  • This function enables a live, interactive conversation with a local Ollama LLM model.

  • Users can type messages in the terminal, and the model streams its responses in real time.

  • Features:

    • Maintains conversation history between user and model.

    • Supports multiple local models (must be pulled beforehand).

    • Type 'exit' to end the session.

    • Returns the full conversation history for further processing or logging.

import requests
import json
from typing import List, Dict

def ollama_chat(
    model: str,
    base_url: str = BASE_URL,
    system_prompt: str = 'You are a helpful assistant. You answer with only one short sentence.'
) -> List[Dict[str, str]]:
    """
    Start an interactive chat session with a local Ollama model via HTTP streaming.

    This function streams responses from the model in real time, maintains conversation
    history, and allows the user to exit by typing 'exit'. A system prompt can guide
    the assistant's behavior.

    Args:
        model (str): Name of the local Ollama model to use.
        base_url (str, optional): Base URL of the Ollama server. Defaults to BASE_URL.
        system_prompt (str, optional): Instruction for the assistant. Defaults to a short-answer style.

    Returns:
        List[Dict[str, str]]: Full conversation history as a list of messages with roles ('user' or 'assistant').

    Raises:
        ValueError: If the requested model is not in the local models list.
        requests.HTTPError: If the chat request fails.
    """
    # Validate model existence
    if model not in get_local_models():
        raise ValueError(f"Requested model '{model}' is not in the local list. Pull the model first!")

    # Initialize message history
    history: List[Dict[str, str]] = []

    print("๐Ÿค– Chat started โ€” type 'exit' to quit.\n")
    
    while True:
        user_input = input("๐Ÿ‘ค You: ").strip()
        if user_input.lower() == 'exit':
            print("๐Ÿ‘‹ Goodbye!")
            break
    
        # Compose full message payload with system + history
        request_messages = [{'role': 'system', 'content': system_prompt}] + history + [{'role': 'user', 'content': user_input}]
    
        # Start request
        try:
            response = requests.post(
                f"{base_url}/api/chat",
                json={"model": model, "messages": request_messages},
                stream=True
            )

            if response.status_code != 200:
                raise requests.HTTPError(f"Chat request failed: {response.text}")

            assistant_reply = ""
            print("๐Ÿค– Ollama:", end=" ", flush=True)
    
            for line in response.iter_lines():
                if line:
                    data = json.loads(line.decode("utf-8"))
                    if "message" in data and "content" in data["message"]:
                        chunk = data["message"]["content"]
                        assistant_reply += chunk
                        print(chunk, end='', flush=True)
    
            print("\n")
    
            # Add interaction to message history
            history.append({'role': 'user', 'content': user_input})
            history.append({'role': 'assistant', 'content': assistant_reply})
    
        except Exception as e:
            print("\nโš ๏ธ Error:", e)

    return history
# Usage
model = "qwen3:0.6b"
history = ollama_chat(model='qwen3:0.6b')
🤖 Chat started - type 'exit' to quit.
🤖 Ollama: The weather is cloudy with a light breeze.
👋 Goodbye!
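
Since ollama_chat returns the full conversation history, it can be kept for further processing or logging. A small sketch that writes it to a JSON file (the file name chat_history.json is arbitrary):

# Persist the conversation returned by ollama_chat for later inspection
import json

with open("chat_history.json", "w") as f:
    json.dump(history, f, indent=2)
print(f"Saved {len(history)} messages to chat_history.json")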

Stop the Ollama Server#

Make sure to stop the Ollama server by stopping the Singularity instance.

import subprocess
import os

def stop_singularity_instance(instance_name="ollama", log_file=None, port_file=None):
    """
    Gracefully stop a running Singularity instance by name, 
    and optionally remove associated log or port files.
    """
    print(f"Checking for Singularity instance: {instance_name}")

    # 1. Check if instance is running
    try:

        result = subprocess.run(
            'bash -lc "module load singularity 2>/dev/null || true; singularity instance list"',
            shell=True,
            capture_output=True,
            text=True
        )

        if instance_name not in result.stdout:
            print(f"No running instance named '{instance_name}' found.")
        else:
            print(f"Instance '{instance_name}' is running. Attempting to stop it...")
            stop_result = subprocess.run(
                f'bash -lc "module load singularity 2>/dev/null || true; singularity instance stop {instance_name}"',
                shell=True,
                capture_output=True,
                text=True
            )
            if stop_result.returncode == 0:
                print(f"Singularity instance '{instance_name}' stopped successfully.")
            else:
                print(f"Warning: Failed to stop instance '{instance_name}'.")
                print(stop_result.stderr)

    except FileNotFoundError:
        print("Singularity command not found. Ensure it's installed and in PATH.")
        return

    # 2. Optional cleanup for files
    if port_file and os.path.exists(port_file):
        os.remove(port_file)
        print(f"Removed port file: {port_file}")

    if log_file and os.path.exists(log_file):
        os.remove(log_file)
        print(f"Removed log file: {log_file}")

    print("Cleanup complete.")
stop_singularity_instance(
    instance_name="ollama",
    log_file=os.path.join(os.getcwd(), "ollama_server.log"),
    port_file=os.path.join(os.getcwd(), "ollama_port.txt")
)
Checking for Singularity instance: ollama
Instance 'ollama' is running. Attempting to stop it...
Singularity instance 'ollama' stopped successfully.
Cleanup complete.
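
As a final check, the base URL should no longer respond once the instance is stopped. A quick sketch that reuses BASE_URL from the initialization step:

# Verify the server is down: the request should now fail with a connection error
import requests

try:
    requests.get(BASE_URL, timeout=5)
    print("Warning: the Ollama server still responds.")
except requests.ConnectionError:
    print("Confirmed: the Ollama server is no longer reachable.")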