OLLAMA - REST API Approach#
This page was generated from ollama-interactive-inference/ollama-sif-api-ibex.ipynb. You can view or download the notebook, or view it on nbviewer.
Objective#
In this notebook, we use Ollama through its REST API.
Initial Setup#
If you have not completed the initial Conda environment setup and JupyterLab access steps, please refer to the Ollama on Ibex Guide - Approach-2: Notebook Workflows (Jupyter Based).
Starting the Ollama Server#
Start the Ollama REST API server using the bash script below. The notebook cell writes the script to ollama-server-start.sh and launches it; you can also run the script directly in a terminal.
The script includes:
A user-editable section, where you define the scratch directory for Ollama models.
A step that saves the allocated port to a temporary ollama_port.txt file, so the Python notebook can read the port assigned to the Ollama server.
A cleanup section that stops the Singularity instance when the script is terminated.
User Modification Section#
This part of the script is reserved for user-specific setup: it sets the directory where Ollama models are pulled and stored locally.
In the script, you will find a clearly marked block:
# --------------------------------
# START OF USER MODIFICATION SECTION
# --------------------------------
Note: Do not modify other parts of the script unless you know what you are doing, as they are required for correct execution.
import os, subprocess
script_content = """
#!/bin/bash
# Pre-start cleanup: ensure no stale instances or files
pre_cleanup() {
    echo "Running pre-start cleanup..."
    # 1. Stop any running Singularity instance with the same name
    if singularity instance list | grep -q "$SINGULARITY_INSTANCE_NAME"; then
        echo "Stopping existing Singularity instance: $SINGULARITY_INSTANCE_NAME"
        singularity instance stop "$SINGULARITY_INSTANCE_NAME"
    fi
    # 2. Remove old temporary or state files
    if [ -n "$OLLAMA_PORT_TXT_FILE" ] && [ -f "$OLLAMA_PORT_TXT_FILE" ]; then
        echo "Removing old port file: $OLLAMA_PORT_TXT_FILE"
        rm -f "$OLLAMA_PORT_TXT_FILE"
    fi
    if [ -n "$OLLAMA_LOG_FILE" ] && [ -f "$OLLAMA_LOG_FILE" ]; then
        echo "Removing old log file: $OLLAMA_LOG_FILE"
        rm -f "$OLLAMA_LOG_FILE"
    fi
    echo "Cleanup complete - ready to start new instance."
}
# Cleanup process while exiting the server
cleanup() {
    echo "Cleaning up before exit..."
    # Remove the temporary port file
    rm -f "$OLLAMA_PORT_TXT_FILE"
    # Stop the Singularity instance
    singularity instance stop "$SINGULARITY_INSTANCE_NAME"
}
trap cleanup SIGINT  # Catch Ctrl+C (SIGINT) and run cleanup
# --------------------------------
# START OF USER MODIFICATION SECTION
# --------------------------------
# Make target directory on /ibex/user/$USER/ollama_models_scratch to store your Ollama models
export OLLAMA_MODELS_SCRATCH=/ibex/user/$USER/ollama_models_scratch
# --------------------------------
# END OF USER MODIFICATION SECTION
# --------------------------------
# 1. Create the models directory and define the instance name and file paths
mkdir -p $OLLAMA_MODELS_SCRATCH
SINGULARITY_INSTANCE_NAME='ollama'
SINGULARITY_SIF_FILE="${SINGULARITY_INSTANCE_NAME}.sif"
OLLAMA_PORT_TXT_FILE='ollama_port.txt'
OLLAMA_LOG_FILE=$PWD/ollama_server.log
# 2. Load Singularity module, then clean up any leftovers from a previous run
module load singularity
pre_cleanup
# 3. Pull the Ollama docker image (skipped if the SIF file already exists)
if [ ! -f "$SINGULARITY_SIF_FILE" ]; then
    singularity pull --name $SINGULARITY_SIF_FILE docker://ollama/ollama
fi
# 4. Change the default port for OLLAMA_HOST: (default 127.0.0.1:11434)
export PORT=$(python -c 'import socket; s=socket.socket(); s.bind(("", 0)); print(s.getsockname()[1]); s.close()')
# 5. Save the assigned port; it will be needed later in the notebook.
echo "$PORT" > $OLLAMA_PORT_TXT_FILE
echo "OLLAMA PORT: $PORT -- Stored in $OLLAMA_PORT_TXT_FILE"
# 6. Define the OLLAMA Host
export SINGULARITYENV_OLLAMA_HOST=127.0.0.1:$PORT
# 7. Change the default directory where models are stored:
export SINGULARITYENV_OLLAMA_MODELS=$OLLAMA_MODELS_SCRATCH
# 8. Create an Instance:
singularity instance start --nv -B "/ibex/user:/ibex/user" $SINGULARITY_SIF_FILE $SINGULARITY_INSTANCE_NAME
# 9. Run the Ollama REST API server in the background
nohup singularity exec instance://$SINGULARITY_INSTANCE_NAME bash -c "ollama serve" > $OLLAMA_LOG_FILE 2>&1 &
echo "Ollama server started. Logs at: $OLLAMA_LOG_FILE"
"""
# Write script file
script_path = "ollama-server-start.sh"
with open(script_path, "w") as f:
    f.write(script_content)
os.chmod(script_path, 0o755)
# Run script
subprocess.run(["bash", script_path])
Running pre-start cleanup...
Cleanup complete - ready to start new instance.
Loading module for Singularity
Singularity 3.9.7 modules now loaded
OLLAMA PORT: 53639 -- Stored in ollama_port.txt
ollama-server-start.sh: line 9: singularity: command not found
FATAL: Image file already exists: "ollama.sif" - will not overwrite
Ollama server started. Logs at: /ibex/user/solimaay/scripts/jupyter/631115-ollama-sif/ibex-nb/ollama_server.log
INFO: instance started successfully
CompletedProcess(args=['bash', 'ollama-server-start.sh'], returncode=0)
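To confirm that the server came up before moving on, you can optionally inspect the files the script created. A minimal sketch; the file names ollama_port.txt and ollama_server.log match those defined in the script above:
from pathlib import Path

# Show the assigned port and the last part of the server log
print("Assigned port:", Path("ollama_port.txt").read_text().strip())
print(Path("ollama_server.log").read_text()[-500:])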
Using REST API Requests#
Work through the Python cells below; they contain code for:
Initialization setup.
Listing local models.
Pulling models.
Testing the connection to the Ollama server.
Chatting with the models.
1. Initialization#
Define the base URL for the remote Ollama server.
Test the Ollama server connectivity.
# 1.1- Define the base URL for the remote Ollama Server
with open("ollama_port.txt") as f :
PORT = f.read().strip()
BASE_URL=f"http://127.0.0.1:{PORT}"
print(BASE_URL)
http://127.0.0.1:50677
# 1.2- Testing the Ollama server connectivity
import requests
try:
    r = requests.get(BASE_URL)
    print("Ollama is running!", r.status_code)
except requests.ConnectionError as e:
    print("Ollama is NOT reachable:", e)
Ollama is running! 200
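Beyond a plain reachability check, you can also ask the server which Ollama version is running inside the container. A minimal sketch, assuming the standard GET /api/version endpoint of the Ollama REST API:
# Query the Ollama server version (GET /api/version)
r = requests.get(f"{BASE_URL}/api/version")
if r.ok:
    print("Ollama version:", r.json().get("version"))
else:
    print("Version check failed:", r.status_code, r.text)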
2. Get a List of Local Models#
Get a list of locally available Ollama models.
Locally available models are stored under /ibex/user/$USER/ollama_models_scratch.
To change the location for pulled models, modify the OLLAMA_MODELS_SCRATCH variable in the ollama-server-start.sh script.
# Get a list of downloaded models
def get_local_models(base_url: str = BASE_URL):
    """
    Return a list of locally available Ollama models.

    Returns:
        list: A list of model names as strings.

    Raises:
        RuntimeError: If the request to the Ollama server fails.
    """
    r = requests.get(f"{base_url}/api/tags")
    if r.ok:
        models = r.json().get("models", [])
        return [m["name"] for m in models]
    else:
        raise RuntimeError(f"Failed to list models: {r.text}")
get_local_models()
['gemma3:270m', 'phi3:3.8b', 'qwen3:0.6b']
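The /api/tags response carries more than the model names; each entry also reports the model's size on disk, which is handy for keeping an eye on your scratch quota. A minimal sketch (the name and size fields follow the Ollama /api/tags response format):
# List local models together with their approximate size on disk
r = requests.get(f"{BASE_URL}/api/tags")
r.raise_for_status()
for m in r.json().get("models", []):
    size_gb = m.get("size", 0) / 1e9
    print(f"{m['name']:<20} {size_gb:6.2f} GB")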
3. Pull a Model#
Pull a model from the Ollama server and stream the download progress.
Please refer to the Ollama Library to check the available models.
# Pull the required model
import requests
def pull_model(model: str, base_url: str = BASE_URL) -> list:
    """
    Pull a model from the Ollama server and stream the download progress.

    Args:
        model (str): Name of the model to pull.
        base_url (str, optional): Base URL of the Ollama server. Defaults to BASE_URL.

    Returns:
        list: A list of strings representing the streamed output lines.

    Raises:
        requests.HTTPError: If the server response indicates failure.
    """
    url = f"{base_url}/api/pull"
    response = requests.post(url, json={"name": model}, stream=True)
    if response.status_code != 200:
        raise requests.HTTPError(f"Failed to pull model '{model}': {response.text}")

    output_lines = []
    for line in response.iter_lines():
        if line:
            decoded = line.decode("utf-8")
            print(decoded)
            output_lines.append(decoded)
    return output_lines
# Usage
model = "phi3:3.8b"
output_logs = pull_model(model=model)
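Pulled models can consume a lot of space in the scratch directory. If a model is no longer needed, it can be removed through the REST API as well. A minimal sketch, assuming the standard DELETE /api/delete endpoint of the Ollama REST API (the model name in the commented example is arbitrary):
def delete_model(model: str, base_url: str = BASE_URL) -> None:
    """Delete a locally stored model via the Ollama REST API."""
    r = requests.delete(f"{base_url}/api/delete", json={"name": model})
    if r.status_code != 200:
        raise requests.HTTPError(f"Failed to delete model '{model}': {r.text}")
    print(f"Deleted model: {model}")

# Example usage (adjust the model name as needed):
# delete_model("gemma3:270m")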
4. Running a Sample Query#
Send a single chat prompt to a specified Ollama model and stream the response.
import requests
import json
from typing import List
def chat_once(model: str, prompt: str, base_url: str = BASE_URL) -> List[str]:
    """
    Send a single chat prompt to a specified Ollama model and stream the response.

    Args:
        model (str): Name of the Ollama model to use.
        prompt (str): User input to send to the model.
        base_url (str, optional): Base URL of the Ollama server. Defaults to BASE_URL.

    Returns:
        List[str]: List of streamed output chunks from the model.

    Raises:
        requests.HTTPError: If the server response status is not 200.
    """
    url = f"{base_url}/api/chat"
    response = requests.post(
        url,
        json={"model": model, "messages": [{"role": "user", "content": prompt}]},
        stream=True
    )
    if response.status_code != 200:
        raise requests.HTTPError(f"Failed to chat with model '{model}': {response.text}")

    output_lines = []
    for line in response.iter_lines():
        if line:
            data = json.loads(line.decode("utf-8"))
            if "message" in data:
                content = data["message"]["content"]
                print(content, end="", flush=True)  # Stream to console
                output_lines.append(content)
    print()  # Newline after full response
    return output_lines
# Usage
model="qwen3:0.6b"
prompt= "How old are you"
output_logs = chat_once(model=model, prompt=prompt)
I don't have a physical age, but I can help you with a wide range of tasks, whether you need assistance with writing, math, or anything else. How can I assist you today?
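chat_once streams the reply chunk by chunk. If you prefer to receive the whole answer in a single JSON object, the chat endpoint also supports non-streaming requests via the stream flag. A minimal sketch:
def chat_once_blocking(model: str, prompt: str, base_url: str = BASE_URL) -> str:
    """Send one chat prompt and return the full reply as a single string (no streaming)."""
    response = requests.post(
        f"{base_url}/api/chat",
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "stream": False,
        },
    )
    response.raise_for_status()
    return response.json()["message"]["content"]

# Usage
print(chat_once_blocking("qwen3:0.6b", "Say hello in one short sentence."))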
5. Interactive Chat with Ollama Models#
This function enables a live, interactive conversation with a local Ollama model.
You type messages at the input prompt, and the model streams its responses in real time.
Features:
Maintains conversation history between user and model.
Supports multiple local models (must be pulled beforehand).
Type 'exit' to end the session.
Returns the full conversation history for further processing or logging (a small example of saving it to JSON follows the sample session below).
import requests
import json
from typing import List, Dict
def ollama_chat(
    model: str,
    base_url: str = BASE_URL,
    system_prompt: str = 'You are a helpful assistant. You answer with only a short sentence.'
) -> List[Dict[str, str]]:
    """
    Start an interactive chat session with a local Ollama model via HTTP streaming.

    This function streams responses from the model in real time, maintains conversation
    history, and allows the user to exit by typing 'exit'. A system prompt can guide
    the assistant's behavior.

    Args:
        model (str): Name of the local Ollama model to use.
        base_url (str, optional): Base URL of the Ollama server. Defaults to BASE_URL.
        system_prompt (str, optional): Instruction for the assistant. Defaults to a short-answer style.

    Returns:
        List[Dict[str, str]]: Full conversation history as a list of messages with roles ('user' or 'assistant').

    Raises:
        ValueError: If the requested model is not in the local models list.
        requests.HTTPError: If the chat request fails.
    """
    # Validate model existence
    if model not in get_local_models():
        raise ValueError(f"Requested model '{model}' is not in the local list. Pull the model first!")

    # Initialize message history
    history: List[Dict[str, str]] = []
    print("Chat started - type 'exit' to quit.\n")

    while True:
        user_input = input("You: ").strip()
        if user_input.lower() == 'exit':
            print("Goodbye!")
            break

        # Compose full message payload: system prompt + history + new user turn
        request_messages = [{'role': 'system', 'content': system_prompt}] + history + [{'role': 'user', 'content': user_input}]

        # Start request
        try:
            response = requests.post(
                f"{base_url}/api/chat",
                json={"model": model, "messages": request_messages},
                stream=True
            )
            if response.status_code != 200:
                raise requests.HTTPError(f"Chat request failed: {response.text}")

            assistant_reply = ""
            print("Ollama:", end=" ", flush=True)
            for line in response.iter_lines():
                if line:
                    data = json.loads(line.decode("utf-8"))
                    if "message" in data and "content" in data["message"]:
                        chunk = data["message"]["content"]
                        assistant_reply += chunk
                        print(chunk, end='', flush=True)
            print("\n")

            # Add this interaction to the message history
            history.append({'role': 'user', 'content': user_input})
            history.append({'role': 'assistant', 'content': assistant_reply})
        except Exception as e:
            print("\nError:", e)

    return history
# Usage
model = "qwen3:0.6b"
history = ollama_chat(model='qwen3:0.6b')
Chat started - type 'exit' to quit.
Ollama: The weather is cloudy with a light breeze.
Goodbye!
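Because ollama_chat returns the full conversation history, you can save it for later inspection or logging. A small sketch that writes it to a JSON file (the file name chat_history.json is just an example):
import json

# Persist the conversation history returned by ollama_chat
with open("chat_history.json", "w") as f:
    json.dump(history, f, indent=2)
print(f"Saved {len(history)} messages to chat_history.json")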
Stop the Ollama Server#
Make sure to stop the Ollama server by stopping the running Singularity instance when you are done.
import subprocess
import os
def stop_singularity_instance(instance_name="ollama", log_file=None, port_file=None):
    """
    Gracefully stop a running Singularity instance by name,
    and optionally remove associated log or port files.
    """
    print(f"Checking for Singularity instance: {instance_name}")

    # 1. Check if the instance is running and stop it
    try:
        result = subprocess.run(
            'bash -lc "module load singularity 2>/dev/null || true; singularity instance list"',
            shell=True,
            capture_output=True,
            text=True
        )
        if instance_name not in result.stdout:
            print(f"No running instance named '{instance_name}' found.")
        else:
            print(f"Instance '{instance_name}' is running. Attempting to stop it...")
            stop_result = subprocess.run(
                f'bash -lc "module load singularity 2>/dev/null || true; singularity instance stop {instance_name}"',
                shell=True,
                capture_output=True,
                text=True
            )
            if stop_result.returncode == 0:
                print(f"Singularity instance '{instance_name}' stopped successfully.")
            else:
                print(f"Warning: Failed to stop instance '{instance_name}'.")
                print(stop_result.stderr)
    except FileNotFoundError:
        print("Singularity command not found. Ensure it's installed and in PATH.")
        return

    # 2. Optional cleanup of associated files
    if port_file and os.path.exists(port_file):
        os.remove(port_file)
        print(f"Removed port file: {port_file}")
    if log_file and os.path.exists(log_file):
        os.remove(log_file)
        print(f"Removed log file: {log_file}")

    print("Cleanup complete.")
stop_singularity_instance(
    instance_name="ollama",
    log_file=os.path.expandvars("$PWD/ollama_server.log"),
    port_file=os.path.expandvars("$PWD/ollama_port.txt")
)
Checking for Singularity instance: ollama
Instance 'ollama' is running. Attempting to stop it...
Singularity instance 'ollama' stopped successfully.
Cleanup complete.