OLLAMA - Python Package Approach#

This page was generated from ollama-interactive-inference/ollama-sif-py-ibex.ipynb. You can view or download notebook. Or view it on nbviewer

Objective#

In this notebook, we are going to use Ollama using python package approach

Initial Setup#

If you have not completed the initial Conda environment setup and JupyterLab access steps, please refer to OLLama on Ibex Guide - Approach-2: Notebook Workflows (Jupyter Bases).

Starting the Ollama Server#

Start the OLLAMA REST API server using the following bash script in a terminal:

The script has the following:

A user editable section, where the user defines Ollama models scratch directory.
The allocated port is saved in a temporary ollama_port.txt file, in order to be used in the Python notebook to read the assigned port to Ollama server.
Cleanup section in order to stop the singularity instance when the script is terminated.

User Modification Section#

This section of the script is reserved for user-specific setup to set the directory where the Ollama models are pulled locally.

In the script, you will find a clearly marked block:

# ------------------------------------
# START OF USER MODIFICATION SECTION
# ------------------------------------

Note: Do not modify other parts of the script unless you are sure, as they are required for correct execution.

import os, subprocess

script_content = """
#!/bin/bash

# Pre-start cleanup: ensure no stale instances or files
pre_cleanup() {
    echo "Running pre-start cleanup..."

    # 1. Stop any running Singularity instance with the same name
    if singularity instance list | grep -q "$SINGULARITY_INSTANCE_NAME"; then
        echo "Stopping existing Singularity instance: $SINGULARITY_INSTANCE_NAME"
        singularity instance stop "$SINGULARITY_INSTANCE_NAME"
    fi

    # 2. Remove old temporary or state files
    if [ -n "$OLLAMA_PORT_TXT_FILE" ] && [ -f "$OLLAMA_PORT_TXT_FILE" ]; then
        echo "Removing old port file: $OLLAMA_PORT_TXT_FILE"
        rm -f "$OLLAMA_PORT_TXT_FILE"
    fi

    if [ -n "$OLLAMA_LOG_FILE" ] && [ -f "$OLLAMA_LOG_FILE" ]; then
        echo "Removing old log file: $OLLAMA_LOG_FILE"
        rm -f "$OLLAMA_LOG_FILE"
    fi

    echo "Cleanup complete — ready to start new instance."
}

# Cleanup process while exiting the server
cleanup() {
    echo "🧹   Cleaning up before exit..."
    # Put your exit commands here, e.g.:
    rm -f $OLLAMA_PORT_TXT_FILE
    # Remove the Singularity instance
    singularity instance stop $SINGULARITY_INSTANCE_NAME
}
trap cleanup SIGINT  # Catch Ctrl+C (SIGINT) and run cleanup
pre_cleanup

# --------------------------------
# START OF USER MODIFICATION SECTION
# --------------------------------
# Make target directory on /ibex/user/$USER/ollama_models_scratch to store your Ollama models
export OLLAMA_MODELS_SCRATCH=/ibex/user/$USER/ollama_models_scratch
# --------------------------------
# END OF USER Editable Section
# --------------------------------

mkdir -p $OLLAMA_MODELS_SCRATCH

SINGULARITY_INSTANCE_NAME='ollama'
SINGULARITY_SIF_FILE="${SINGULARITY_INSTANCE_NAME}.sif"
OLLAMA_PORT_TXT_FILE='ollama_port.txt'
LOG_FILE=$PWD/ollama_server.log

# 2. Load Singularity module
module load singularity

# 3. Pull OLLAMA docker image
singularity pull --name $SINGULARITY_SIF_FILE docker://ollama/ollama

# 4. Change the default port for OLLAMA_HOST: (default 127.0.0.1:11434)
export PORT=$(python -c 'import socket; s=socket.socket(); s.bind(("", 0)); print(s.getsockname()[1]); s.close()')

# 5. Copy the assigned port, it will be required in the second part during working on the notebook.
echo "$PORT" > $OLLAMA_PORT_TXT_FILE

echo "OLLAMA PORT: $PORT  -- Stored in $OLLAMA_PORT_TXT_FILE"

# 6. Define the OLLAMA Host
export SINGULARITYENV_OLLAMA_HOST=127.0.0.1:$PORT

# 7. Change the default model directory stored: 
export SINGULARITYENV_OLLAMA_MODELS=$OLLAMA_MODELS_SCRATCH

# 8. Create an Instance:
singularity instance start --nv -B "/ibex/user:/ibex/user" $SINGULARITY_SIF_FILE $SINGULARITY_INSTANCE_NAME

# 7. Run the OLLAMA REST API server on the background
nohup singularity exec instance://$SINGULARITY_INSTANCE_NAME bash -c "ollama serve" > $LOG_FILE 2>&1 &
echo "Ollama server started. Logs at: $LOG_FILE"
"""

# Write script file
script_path = "ollama-server-start.sh"
with open(script_path, "w") as f:
    f.write(script_content)
os.chmod(script_path, 0o755)

# Run script
subprocess.run(["bash", script_path])

Running pre-start cleanup...
Cleanup complete — ready to start new instance.
Loading module for Singularity
Singularity 3.9.7 modules now loaded
OLLAMA PORT: 50677  -- Stored in ollama_port.txt

ollama-server-start.sh: line 9: singularity: command not found
FATAL:   Image file already exists: "ollama.sif" - will not overwrite

Ollama server started. Logs at: /ibex/user/solimaay/scripts/jupyter/631115-ollama-sif/ibex-nb/ollama_server.log

INFO:    instance started successfully

CompletedProcess(args=['bash', 'ollama-server-start.sh'], returncode=0)

Using REST API Requests#

Follow the following Python notebook below, it contains the codes for:

Initialization Setup.
List local models.
Pull models.
Testing connection to the Ollama server.
Chat with the models.

1. Initialization#

Define the base URL for the remote Ollama Server.
Create a connection Object to talk to the Ollama server

# 1.1- Define the base URL for the remote Ollama Server.
with open("ollama_port.txt") as f :
    PORT = f.read().strip()
BASE_URL=f"http://127.0.0.1:{PORT}"
print(BASE_URL)

http://127.0.0.1:50677

# 2. Create a connection Object to talk to the Ollama server
from ollama import Client

# Create a client instance
client = Client(
  host=BASE_URL,
)

2. Get a List of Local Models#

Get a list of locally available Ollama models.
Locally available models are located under path: /ibex/user/$USER/ollama_models_scratch
To change the location for pulled models, modify the variable OLLAMA_MODELS_SCRATCH in the scriptstart_ollama_server.sh

def get_local_models():
    """
    Returns a list of locally available Ollama Models.

    Returns:
        list: A list of model names as strings
    """
    models = [model['model'] for model in client.list()['models']]
    return models

get_local_models()

['gemma3:270m', 'phi3:3.8b', 'qwen3:0.6b']

3. Pull The Model#

To pull a specific model, use pull method.
Please refer to Ollama Library to check available models.

# Pull the required models
client.pull("gemma3:270m")

ProgressResponse(status='success', completed=None, total=None, digest=None)

4. Running a sample query#

4.1- Non-Streaming Request#

Sends the full message to the model and waits for the complete response.
The function returns only after the model finishes generating.
Simple and easy to use
Slower perceived latency — nothing is shown until the answer is complete.

# Set the target LLM model
model = 'gemma3:270m'

response = client.chat(model=model, messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])
response['message']['content']

"The sky is blue because of a phenomenon called Rayleigh scattering. \n\nHere's a breakdown of why:\n\n*   **Sunlight:** Sunlight is made up of all the colors of the rainbow. When sunlight enters the Earth's atmosphere, it bumps into tiny air molecules. These molecules are much smaller than the atoms in the air, so they scatter the light in all directions.\n\n*   **Rayleigh Scattering:** Because the light has to travel through a much longer path than the atmosphere, the scattered light is much less intense than the ambient light. This is called Rayleigh scattering.\n\n*   **Blue Light:** Blue light has a shorter wavelength than other colors. Therefore, it's scattered more strongly by the air molecules in the atmosphere.\n\n*   **Why Blue?** The blue color of the sky is due to the scattering of blue light by the air molecules. This makes the sky appear blue to our eyes.\n\n"

4.2- Streaming Request (Synchronous)#

Requests the model to stream its output as it’s generated.
Each ‘chunk’ contains a partial piece of the message.
Ideal for real-time display or CLI tools
Still blocks your main thread while waiting for new chunks.

# Set the target LLM model
model = 'gemma3:270m'

stream = client.chat(
    model=model,
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
    stream=True,
)

for chunk in stream:
  print(chunk['message']['content'], end='', flush=True)

The sky is blue because of a phenomenon called **Rayleigh Scattering**. Here's the breakdown:

*   **Light's Journey:** Light travels in a wave-like motion.
*   **Entering the Atmosphere:** When light hits the Earth's atmosphere, it interacts with air molecules.
*   **Scattering:** The molecules in the atmosphere scatter the light, causing it to be scattered in all directions.
*   **Blue Light:** The most energetic part of the light is blue light.
*   **Rayleigh Scattering:** This scattering is what allows the blue light to be scattered away.
*   **Why Blue?** Because blue light is scattered more than other colors, making the sky appear blue.

4.3- Asynchronous Streaming Chat#

Same as streaming mode, but runs inside an async event loop.
Allows other async tasks to run concurrently while receiving outputs.
Best for Jupyter notebooks, web servers, or multitasking apps
Requires ‘await’ and async context to run properly.

import asyncio
from ollama import AsyncClient

async def chat_with_model(model: str, prompt: str, base_url: str = BASE_URL):
    """
    Stream a chat response from a local Ollama model for a given prompt.

    Args:
        model (str): Name of the local Ollama model to use.
        prompt (str): The user input to send to the model.
        base_url (str, optional): The host URL for the Ollama server. Defaults to BASE_URL.

    Raises:
        ValueError: If the requested model is not in the local models list.
    """
    # Validate model existence
    if model not in get_local_models():
        raise ValueError(f"Requested model '{model}' is not in the local list. Pull the model first!")
        
    message = {'role': 'user', 'content': prompt}

    client = AsyncClient(host=base_url)
    async for part in await client.chat(model=model, messages=[message], stream=True):
        print(part['message']['content'], end='', flush=True)


# Usage
model = 'gemma3:270m'
await chat_with_model(model=model, prompt="Why the sky is blue?")

The sky is blue due to a phenomenon called Rayleigh scattering. Here's the breakdown:

*   **Rayleigh Scattering:** This is the scattering of electromagnetic radiation (light) by particles of a much smaller wavelength. When light interacts with air molecules, it absorbs some of the energy and re-emits it as visible light.

*   **Blue Light:** Blue and violet light have a shorter wavelength than other colors. Therefore, they are scattered much more strongly by the air molecules than other colors like red or orange.

*   **Why We See Blue:** Because blue light is scattered more than other colors, we perceive the sky as blue.

5- Interactive Chat with Ollama Models#

This function enables a live, interactive conversation with a local Ollama LLM model.
Users can type messages in the terminal, and the model streams its responses in real time.
Features:
- Maintains conversation history between user and model.
- Supports multiple local models (must be pulled beforehand).
- Type ‘exit’ or ‘quit’ to end the session.
- Returns the full conversation history for further processing or logging.

import asyncio
from ollama import AsyncClient

# Stores full conversation history
messages = []

async def interactive_chat(model: str, base_url: str = BASE_URL, history: list = None):
    """
    Start an interactive chat session with a local Ollama model.

    This function streams responses from the model in real time,
    maintaining conversation history. Users can type 'exit' or 'quit'
    to end the session.

    Args:
        model (str): Name of the local Ollama model to use.
        base_url (str, optional): Host URL for the Ollama server. Defaults to BASE_URL.
        history (list, optional): Pre-existing conversation history. Defaults to a new list.

    Returns:
        list: Full conversation history as a list of message dictionaries.

    Raises:
        ValueError: If the requested model is not in the local models list.
    """
    # Validate model existence
    if model not in get_local_models():
        raise ValueError(f"Requested model '{model}' is not in the local list. Pull the model first!")

    if history is None:
        history = []
        
    client = AsyncClient(host=base_url)
    print("🤖 Chat started — type 'exit' to quit.\n")

    while True:
        user_input = input("👤 You: ")
        if user_input.lower().strip() in {"exit", "quit"}:
            print("👋 Goodbye!")
            break

        # Add user input to history
        history.append({"role": "user", "content": user_input})

        print("🤖 Ollama:", end=" ", flush=True)
        assistant_reply = ""

        async for chunk in await client.chat(
            model=model,
            messages=history,
            stream=True
        ):
            if chunk.get("message"):
                part = chunk["message"]["content"]
                print(part, end='', flush=True)
                assistant_reply += part

        print("\n")  # Newline after full reply

        # Add assistant reply to history
        history.append({"role": "assistant", "content": assistant_reply})
    
    return history

# Usage
model = 'gemma3:270m'
history = await interactive_chat(model=model)

🤖 Chat started — type 'exit' to quit.

🤖 Ollama: HPC stands for **High-Performance Computing**. It's a field of computer science that focuses on designing and optimizing computing systems that can handle massive amounts of data and complex calculations efficiently. 

Here's a breakdown of what HPC is about:

*   **Data Processing:** HPC systems are designed to process and analyze large datasets, making them suitable for scientific, engineering, and business applications.
*   **Complex Calculations:** HPC algorithms are used to perform computationally intensive tasks such as:
    *   **Machine Learning:** Training and deploying machine learning models on large datasets.
    *   **Scientific Simulations:** Simulating complex physical phenomena, such as climate modeling, fluid dynamics, and astrophysics.
    *   **Data Mining:** Extracting valuable insights from large datasets.
    *   **Cryptography:** Developing and implementing secure cryptographic algorithms.
*   **Scalability:** HPC systems can be scaled up to handle increasing workloads and data volumes.
*   **Resource Optimization:** HPC systems can be optimized to minimize energy consumption, reduce hardware costs, and improve the performance of existing hardware.
*   **Real-Time Computing:** HPC systems are often designed to operate in real-time, enabling applications that require immediate responses to changing conditions.

In essence, HPC aims to make computing more powerful, efficient, and accessible by enabling the development and deployment of complex and demanding applications.

👋 Goodbye!

Stop the Ollama Server#

Make sure to stop the Ollama server by terminating the Singularity container.

import subprocess
import os

def stop_singularity_instance(instance_name="ollama", log_file=None, port_file=None):
    """
    Gracefully stop a running Singularity instance by name, 
    and optionally remove associated log or port files.
    """
    print(f"Checking for Singularity instance: {instance_name}")

    # 1. Check if instance is running
    try:

        result = subprocess.run(
            'bash -lc "module load singularity 2>/dev/null || true; singularity instance list"',
            shell=True,
            capture_output=True,
            text=True
        )

        if instance_name not in result.stdout:
            print(f"No running instance named '{instance_name}' found.")
        else:
            print(f"Instance '{instance_name}' is running. Attempting to stop it...")
            stop_result = subprocess.run(
                f'bash -lc "module load singularity 2>/dev/null || true; singularity instance stop {instance_name}"',
                shell=True,
                capture_output=True,
                text=True
            )
            if stop_result.returncode == 0:
                print(f"Singularity instance '{instance_name}' stopped successfully.")
            else:
                print(f"Warning: Failed to stop instance '{instance_name}'.")
                print(stop_result.stderr)

    except FileNotFoundError:
        print("Singularity command not found. Ensure it's installed and in PATH.")
        return

    # 2. Optional cleanup for files
    if port_file and os.path.exists(port_file):
        os.remove(port_file)
        print(f"Removed port file: {port_file}")

    if log_file and os.path.exists(log_file):
        os.remove(log_file)
        print(f"Removed log file: {log_file}")

    print("Cleanup complete.")

stop_singularity_instance(
    instance_name="ollama",
    log_file=os.path.expandvars("$PWD/ollama_server.log"),
    port_file=os.path.expandvars("$PWD/ollama_port.txt")
)

Checking for Singularity instance: ollama
Instance 'ollama' is running. Attempting to stop it...
Singularity instance 'ollama' stopped successfully.
Cleanup complete.