
Ollama - REST API Approach#

Objective#

In this notebook, we use Ollama through its REST API.

Initial Setup#

If you have not completed the initial Conda environment setup and JupyterLab access steps, please refer to the Ollama on Ibex Guide - Approach-2: Notebook Workflows (Jupyter Based).

Starting the Ollama Server#

Start the Ollama REST API server using the bash script below; the next cell writes the script to ollama-server-start.sh and runs it.

The script contains the following:

  • A user-editable section where the Ollama models scratch directory is defined.

  • A step that saves the allocated port to a temporary ollama_port.txt file, so the notebook can later read the port assigned to the Ollama server.

  • A cleanup section that stops the Singularity instance when the script is terminated.

User Modification Section#

  • This section of the script is reserved for user-specific setup: it sets the directory into which Ollama models are pulled locally.

  • In the script, you will find a clearly marked block:

    # ------------------------------------
    # START OF USER MODIFICATION SECTION
    # ------------------------------------
    

Note: Do not modify other parts of the script unless you know what you are doing, as they are required for correct execution.

import os, subprocess

script_content = """
#!/bin/bash

# Pre-start cleanup: ensure no stale instances or files
pre_cleanup() {
    echo "Running pre-start cleanup..."

    # 1. Stop any running Singularity instance with the same name
    if singularity instance list | grep -q "$SINGULARITY_INSTANCE_NAME"; then
        echo "Stopping existing Singularity instance: $SINGULARITY_INSTANCE_NAME"
        singularity instance stop "$SINGULARITY_INSTANCE_NAME"
    fi

    # 2. Remove old temporary or state files
    if [ -n "$OLLAMA_PORT_TXT_FILE" ] && [ -f "$OLLAMA_PORT_TXT_FILE" ]; then
        echo "Removing old port file: $OLLAMA_PORT_TXT_FILE"
        rm -f "$OLLAMA_PORT_TXT_FILE"
    fi

    if [ -n "$OLLAMA_LOG_FILE" ] && [ -f "$OLLAMA_LOG_FILE" ]; then
        echo "Removing old log file: $OLLAMA_LOG_FILE"
        rm -f "$OLLAMA_LOG_FILE"
    fi

    echo "Cleanup complete โ€” ready to start new instance."
}

# Cleanup when the script exits
cleanup() {
    echo "🧹 Cleaning up before exit..."
    # Remove the temporary port file
    rm -f $OLLAMA_PORT_TXT_FILE
    # Stop the Singularity instance
    singularity instance stop $SINGULARITY_INSTANCE_NAME
}
trap cleanup SIGINT  # Catch Ctrl+C (SIGINT) and run cleanup

# --------------------------------
# START OF USER MODIFICATION SECTION
# --------------------------------
# Target directory under /ibex/user/$USER to store your Ollama models
export OLLAMA_MODELS_SCRATCH=/ibex/user/$USER/ollama_models_scratch
# --------------------------------
# END OF USER MODIFICATION SECTION
# --------------------------------

mkdir -p $OLLAMA_MODELS_SCRATCH

# 1. Define the instance name and the temporary files used by the script
SINGULARITY_INSTANCE_NAME='ollama'
SINGULARITY_SIF_FILE="${SINGULARITY_INSTANCE_NAME}.sif"
OLLAMA_PORT_TXT_FILE='ollama_port.txt'
OLLAMA_LOG_FILE=$PWD/ollama_server.log

# 2. Load the Singularity module (needed before any singularity command, including pre_cleanup)
module load singularity

# 3. Stop any stale instance and remove old temporary files
pre_cleanup

# 4. Pull the Ollama Docker image (skipped if the SIF file already exists)
if [ ! -f "$SINGULARITY_SIF_FILE" ]; then
    singularity pull --name $SINGULARITY_SIF_FILE docker://ollama/ollama
fi

# 5. Pick a free port for OLLAMA_HOST (the default is 127.0.0.1:11434)
export PORT=$(python -c 'import socket; s=socket.socket(); s.bind(("", 0)); print(s.getsockname()[1]); s.close()')

# 6. Save the assigned port; the notebook reads it from this file later
echo "$PORT" > $OLLAMA_PORT_TXT_FILE

echo "OLLAMA PORT: $PORT  -- Stored in $OLLAMA_PORT_TXT_FILE"

# 7. Define the Ollama host inside the container
export SINGULARITYENV_OLLAMA_HOST=127.0.0.1:$PORT

# 8. Point Ollama at the models directory defined above
export SINGULARITYENV_OLLAMA_MODELS=$OLLAMA_MODELS_SCRATCH

# 9. Start the Singularity instance
singularity instance start --nv -B "/ibex/user:/ibex/user" $SINGULARITY_SIF_FILE $SINGULARITY_INSTANCE_NAME

# 10. Run the Ollama REST API server in the background
nohup singularity exec instance://$SINGULARITY_INSTANCE_NAME bash -c "ollama serve" > $OLLAMA_LOG_FILE 2>&1 &
echo "Ollama server started. Logs at: $OLLAMA_LOG_FILE"
"""

# Write script file
script_path = "ollama-server-start.sh"
with open(script_path, "w") as f:
    f.write(script_content)
os.chmod(script_path, 0o755)

# Run script
subprocess.run(["bash", script_path])
Running pre-start cleanup...
Cleanup complete - ready to start new instance.
Loading module for Singularity
Singularity 3.9.7 modules now loaded
OLLAMA PORT: 53639  -- Stored in ollama_port.txt
ollama-server-start.sh: line 9: singularity: command not found
FATAL:   Image file already exists: "ollama.sif" - will not overwrite
Ollama server started. Logs at: /ibex/user/solimaay/scripts/jupyter/631115-ollama-sif/ibex-nb/ollama_server.log
INFO:    instance started successfully
CompletedProcess(args=['bash', 'ollama-server-start.sh'], returncode=0)
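
Before moving on, it is worth confirming that the server actually came up. The sketch below simply prints the tail of the log file written by the script; it assumes the default ollama_server.log in the current working directory, so adjust the path if you changed it.

# Optional check: show the last lines of the Ollama server log
import os

log_path = os.path.join(os.getcwd(), "ollama_server.log")   # path assumed from the script above
if os.path.exists(log_path):
    with open(log_path) as f:
        print("".join(f.readlines()[-10:]))                 # last 10 lines are usually enough
else:
    print(f"Log file not found yet: {log_path}")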

Using REST API Requests#

Work through the Python cells below; they contain code for:

  • Initialization setup and connectivity test.

  • Listing local models.

  • Pulling models.

  • Running a sample query.

  • Interactive chat with the models.

  • Stopping the Ollama server.

1. Initialization#

  1. Define the base URL for the Ollama server.

  2. Test the Ollama server connectivity.

# 1.1- Define the base URL for the Ollama server
with open("ollama_port.txt") as f:
    PORT = f.read().strip()

BASE_URL = f"http://127.0.0.1:{PORT}"
print(BASE_URL)
http://127.0.0.1:50677
# 1.2- Testing the Ollama server connectivity
import requests

try:
    r = requests.get(BASE_URL)
    print("Ollama is running!", r.status_code)
except requests.ConnectionError as e:
    print("Ollama is NOT reachable:", e)
Ollama is running! 200
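
The server can take a few seconds to start accepting requests after the script launches it, so a one-shot check may fail on the first try. A minimal retry loop, assuming the BASE_URL defined above:

# Poll the server root endpoint until it responds or the retries run out
import time
import requests

def wait_for_ollama(base_url: str = BASE_URL, retries: int = 10, delay: float = 2.0) -> bool:
    for attempt in range(1, retries + 1):
        try:
            r = requests.get(base_url, timeout=5)
            print(f"Attempt {attempt}: Ollama is running ({r.status_code})")
            return True
        except requests.ConnectionError:
            print(f"Attempt {attempt}: not reachable yet, retrying in {delay}s...")
            time.sleep(delay)
    return False

wait_for_ollama()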

2. Get a List of Local Models#

  • Get a list of locally available Ollama models.

  • Locally available models are stored under the path /ibex/user/$USER/ollama_models_scratch.

  • To change the location of pulled models, modify the variable OLLAMA_MODELS_SCRATCH in the script ollama-server-start.sh.

# Get a list of downloaded models
def get_local_models(base_url: str = BASE_URL):
    """
    Return a list of locally available Ollama models.

    Args:
        base_url (str, optional): Base URL of the Ollama server. Defaults to BASE_URL.

    Returns:
        list: A list of model names as strings.

    Raises:
        RuntimeError: If the Ollama server does not return the model list.
    """
    r = requests.get(f"{base_url}/api/tags")
    if r.ok:
        models = r.json().get("models", [])
        return [m["name"] for m in models]
    else:
        raise RuntimeError(f"Failed to list models: {r.text}")

get_local_models()
['gemma3:270m', 'phi3:3.8b', 'qwen3:0.6b']
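
The /api/tags response carries more than the model names. As a small variant of the helper above, the sketch below also prints each model's size; it assumes the size field (in bytes) returned by recent Ollama versions.

# List local models together with their approximate size on disk
def list_models_with_size(base_url: str = BASE_URL):
    r = requests.get(f"{base_url}/api/tags")
    r.raise_for_status()
    for m in r.json().get("models", []):
        size_gb = m.get("size", 0) / 1e9          # "size" is reported in bytes
        print(f"{m['name']:<20} {size_gb:6.2f} GB")

list_models_with_size()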

3. Pull The Model#

  • Pull a model from the Ollama server and stream the download progress.

  • Please refer to Ollama Library to check available models.

# Pull the required model
import requests

def pull_model(model: str, base_url: str = BASE_URL) -> list:
    """
    Pull a model from the Ollama server and stream the download progress.

    Args:
        model (str): Name of the model to pull.
        base_url (str, optional): Base URL of the Ollama server. Defaults to BASE_URL.

    Returns:
        list: A list of strings representing the streamed output lines.

    Raises:
        requests.HTTPError: If the server response indicates failure.
    """
    url = f"{base_url}/api/pull"
    response = requests.post(url, json={"name": model}, stream=True)

    if response.status_code != 200:
        raise requests.HTTPError(f"Failed to pull model '{model}': {response.text}")

    output_lines = []
    for line in response.iter_lines():
        if line:
            decoded = line.decode("utf-8")
            print(decoded)
            output_lines.append(decoded)

    return output_lines
# Usage
model = "phi3:3.8b"
output_logs = pull_model(model=model)
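
A small convenience wrapper can combine the two helpers above so a model is pulled only when it is missing locally. The ensure_model name below is not part of the Ollama API; it is just a sketch built on get_local_models and pull_model.

# Pull a model only if it is not already available locally
def ensure_model(model: str, base_url: str = BASE_URL) -> None:
    if model in get_local_models(base_url):
        print(f"Model '{model}' is already available locally.")
    else:
        print(f"Model '{model}' not found locally, pulling...")
        pull_model(model=model, base_url=base_url)

ensure_model("phi3:3.8b")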

4. Running a Sample Query#

  • Send a single chat prompt to a specified Ollama model and stream the response.

import requests
import json
from typing import List

def chat_once(model: str, prompt: str, base_url: str = BASE_URL) -> List[str]:
    """
    Send a single chat prompt to a specified Ollama model and stream the response.

    Args:
        model (str): Name of the Ollama model to use.
        prompt (str): User input to send to the model.
        base_url (str, optional): Base URL of the Ollama server. Defaults to BASE_URL.

    Returns:
        List[str]: List of streamed output chunks from the model.

    Raises:
        requests.HTTPError: If the server response status is not 200.
    """
    url = f"{base_url}/api/chat"
    response = requests.post(
        url,
        json={"model": model, "messages": [{"role": "user", "content": prompt}]},
        stream=True
    )

    if response.status_code != 200:
        raise requests.HTTPError(f"Failed to chat with model '{model}': {response.text}")

    output_lines = []
    for line in response.iter_lines():
        if line:
            data = json.loads(line.decode('utf-8'))
            if "message" in data:
                content = data["message"]["content"]
                print(content, end="", flush=True)  # Stream to console
                output_lines.append(content)

    print()  # Newline after full response
    return output_lines
# Usage
model="qwen3:0.6b"
prompt= "How old are you"
output_logs = chat_once(model=model, prompt=prompt)
I don't have a physical age, but I can help you with a wide range of tasks, whether you need assistance with writing, math, or anything else. How can I assist you today?
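
If token-by-token streaming is not needed, the same /api/chat endpoint also accepts "stream": false and returns the whole reply as a single JSON object. A minimal non-streaming sketch:

# Send one prompt and receive the complete reply in a single response
def chat_once_blocking(model: str, prompt: str, base_url: str = BASE_URL) -> str:
    r = requests.post(
        f"{base_url}/api/chat",
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "stream": False,                      # ask the server for one consolidated reply
        },
    )
    r.raise_for_status()
    return r.json()["message"]["content"]

print(chat_once_blocking("qwen3:0.6b", "Say hello in one short sentence."))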

5. Interactive Chat with Ollama Models#

  • This function enables a live, interactive conversation with a local Ollama LLM model.

  • Users can type messages in the terminal, and the model streams its responses in real time.

  • Features:

    • Maintains conversation history between user and model.

    • Supports multiple local models (must be pulled beforehand).

    • Type 'exit' to end the session.

    • Returns the full conversation history for further processing or logging.

import requests
import json
from typing import List, Dict

def ollama_chat(
    model: str,
    base_url: str = BASE_URL,
    system_prompt: str = 'You are a helpful assistant. You answer with only one short sentence.'
) -> List[Dict[str, str]]:
    """
    Start an interactive chat session with a local Ollama model via HTTP streaming.

    This function streams responses from the model in real time, maintains conversation
    history, and allows the user to exit by typing 'exit'. A system prompt can guide
    the assistant's behavior.

    Args:
        model (str): Name of the local Ollama model to use.
        base_url (str, optional): Base URL of the Ollama server. Defaults to BASE_URL.
        system_prompt (str, optional): Instruction for the assistant. Defaults to a short-answer style.

    Returns:
        List[Dict[str, str]]: Full conversation history as a list of messages with roles ('user' or 'assistant').

    Raises:
        ValueError: If the requested model is not in the local models list.
        requests.HTTPError: If the chat request fails.
    """
    # Validate model existence
    if model not in get_local_models():
        raise ValueError(f"Requested model '{model}' is not in the local list. Pull the model first!")

    # Initialize message history
    history: List[Dict[str, str]] = []

    print("๐Ÿค– Chat started โ€” type 'exit' to quit.\n")
    
    while True:
        user_input = input("๐Ÿ‘ค You: ").strip()
        if user_input.lower() == 'exit':
            print("๐Ÿ‘‹ Goodbye!")
            break
    
        # Compose full message payload with system + history
        request_messages = [{'role': 'system', 'content': system_prompt}] + history + [{'role': 'user', 'content': user_input}]
    
        # Start request
        try:
            response = requests.post(
                f"{base_url}/api/chat",
                json={"model": model, "messages": request_messages},
                stream=True
            )

            if response.status_code != 200:
                raise requests.HTTPError(f"Chat request failed: {response.text}")

            assistant_reply = ""
            print("๐Ÿค– Ollama:", end=" ", flush=True)
    
            for line in response.iter_lines():
                if line:
                    data = json.loads(line.decode("utf-8"))
                    if "message" in data and "content" in data["message"]:
                        chunk = data["message"]["content"]
                        assistant_reply += chunk
                        print(chunk, end='', flush=True)
    
            print("\n")
    
            # Add interaction to message history
            history.append({'role': 'user', 'content': user_input})
            history.append({'role': 'assistant', 'content': assistant_reply})
    
        except Exception as e:
            print("\nโš ๏ธ Error:", e)

    return history
# Usage
model = "qwen3:0.6b"
history = ollama_chat(model='qwen3:0.6b')
🤖 Chat started - type 'exit' to quit.
🤖 Ollama: The weather is cloudy with a light breeze.
👋 Goodbye!
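
Since ollama_chat returns the full conversation history, it can be kept for further processing or logging. A small sketch that writes it to a JSON file (the file name chat_history.json is arbitrary):

# Persist the conversation returned by ollama_chat for later inspection
import json

with open("chat_history.json", "w") as f:
    json.dump(history, f, indent=2)
print(f"Saved {len(history)} messages to chat_history.json")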

Stop the Ollama Server#

Make sure to stop the Ollama server by stopping the Singularity instance.

import subprocess
import os

def stop_singularity_instance(instance_name="ollama", log_file=None, port_file=None):
    """
    Gracefully stop a running Singularity instance by name, 
    and optionally remove associated log or port files.
    """
    print(f"Checking for Singularity instance: {instance_name}")

    # 1. Check if instance is running
    try:

        result = subprocess.run(
            'bash -lc "module load singularity 2>/dev/null || true; singularity instance list"',
            shell=True,
            capture_output=True,
            text=True
        )

        if instance_name not in result.stdout:
            print(f"No running instance named '{instance_name}' found.")
        else:
            print(f"Instance '{instance_name}' is running. Attempting to stop it...")
            stop_result = subprocess.run(
                f'bash -lc "module load singularity 2>/dev/null || true; singularity instance stop {instance_name}"',
                shell=True,
                capture_output=True,
                text=True
            )
            if stop_result.returncode == 0:
                print(f"Singularity instance '{instance_name}' stopped successfully.")
            else:
                print(f"Warning: Failed to stop instance '{instance_name}'.")
                print(stop_result.stderr)

    except FileNotFoundError:
        print("Singularity command not found. Ensure it's installed and in PATH.")
        return

    # 2. Optional cleanup for files
    if port_file and os.path.exists(port_file):
        os.remove(port_file)
        print(f"Removed port file: {port_file}")

    if log_file and os.path.exists(log_file):
        os.remove(log_file)
        print(f"Removed log file: {log_file}")

    print("Cleanup complete.")
stop_singularity_instance(
    instance_name="ollama",
    log_file=os.path.join(os.getcwd(), "ollama_server.log"),
    port_file=os.path.join(os.getcwd(), "ollama_port.txt")
)
Checking for Singularity instance: ollama
Instance 'ollama' is running. Attempting to stop it...
Singularity instance 'ollama' stopped successfully.
Cleanup complete.
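
As a final check, the base URL should no longer respond once the instance is stopped. A quick sketch that reuses BASE_URL from the initialization step:

# Verify the server is down: the request should now fail with a connection error
import requests

try:
    requests.get(BASE_URL, timeout=5)
    print("Warning: the Ollama server still responds.")
except requests.ConnectionError:
    print("Confirmed: the Ollama server is no longer reachable.")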