Overview
For Large Language Model (LLM) research, Ollama is installed on both the Great Lakes and Lighthouse computing clusters. Ollama is a module for Generative AI research that provides tools to download, run, and serve LLMs locally, allowing researchers to apply state-of-the-art models to natural language processing and other generative AI work while taking advantage of the clusters' computational resources.
Loading the module & starting the server
To load this module, use the following command:
module load ollama
To start the Ollama server in background mode, use the following commands:
export OLPORT=`ruby -e 'require "socket"; puts Addrinfo.tcp("", 0).bind {|s| s.local_address.ip_port }'`
echo $OLPORT
export OLLAMA_HOST=127.0.0.1:$OLPORT
export OLLAMA_BASE_URL="http://localhost:$OLPORT"
export OLLAMA_MODELS=/path/to/your/models/directory
ollama serve >& ollama.log &
Command Explanation:
- module load ollama: Loads the latest Ollama module version into your environment, making the Ollama commands available.
- export OLPORT=...: Uses Ruby to find an available network port on your system and assigns it to the OLPORT variable. This ensures Ollama will run on an open port without conflicts.
- echo $OLPORT: Displays the selected port number so you can verify and use it to connect to the Ollama server later.
- export OLLAMA_HOST=127.0.0.1:$OLPORT: Configures Ollama to listen on localhost (127.0.0.1) at the selected port number.
- export OLLAMA_BASE_URL="http://localhost:$OLPORT": Sets the base URL that clients will use to connect to the Ollama API.
- export OLLAMA_MODELS=/path/to/your/models/directory: Specifies a custom directory where Ollama stores and looks for downloaded models, so you can control where large model files are saved (see the example after this list). The current default is ~/.ollama in your home directory.
- ollama serve >& ollama.log &: Starts the Ollama server in the background and redirects both its standard output and error messages to a file named ollama.log for later review.
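Home-directory quotas on HPC systems are often too small to hold more than a few multi-gigabyte models, so it can be worth pointing OLLAMA_MODELS at larger project storage. A minimal sketch, assuming your allocation has scratch or Turbo space; the path below is only a placeholder:
# Placeholder path: substitute your own scratch or Turbo location
export OLLAMA_MODELS=/scratch/example_project_root/example_project/uniqname/ollama-models
mkdir -p "$OLLAMA_MODELS"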
Expected output from a successful run
$ ollama serve
2024/12/18 09:49:52 routes.go:1195: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/adnanzai/.ollama OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2024-12-18T09:49:52.802-05:00 level=INFO source=images.go:753 msg="total blobs: 0"
time=2024-12-18T09:49:52.802-05:00 level=INFO source=images.go:760 msg="total unused blobs removed: 0"
time=2024-12-18T09:49:52.805-05:00 level=INFO source=routes.go:1246 msg="Listening on 127.0.0.1:11434 (version 0.5.1)"
time=2024-12-18T09:49:52.827-05:00 level=INFO source=common.go:135 msg="extracting embedded files" dir=/tmp/ollama2225239369/runners
time=2024-12-18T09:49:52.979-05:00 level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners="[cpu_avx2 cuda_v11 cuda_v12 rocm cpu cpu_avx]"
time=2024-12-18T09:49:52.979-05:00 level=INFO source=gpu.go:221 msg="looking for compatible GPUs"
time=2024-12-18T09:49:53.130-05:00 level=INFO source=gpu.go:386 msg="no compatible GPUs were discovered"
time=2024-12-18T09:49:53.130-05:00 level=INFO source=types.go:123 msg="inference compute" id=0 library=cpu variant=avx2 compute="" driver=0.0 name="" total="187.0 GiB" available="121.8 GiB"
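Once the server is up, you can confirm it is reachable on the port you selected and download a model to work with. A minimal sketch; llama3.2 is only an example model name, substitute whichever model you need:
# Confirm the server answers on the chosen port
curl $OLLAMA_BASE_URL/api/version
# Download a model (stored under $OLLAMA_MODELS) and list the models available locally
ollama pull llama3.2
ollama list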
Overall Information
The Slurm job session you are in should pick up and respect the OLLAMA_HOST and OLLAMA_BASE_URL environment variables defined above, which point the client commands at the right server instance. If running in batch, you may wish to feed the prompt(s) you want to the ollama client. There are several ways to do this, including passing the prompt on the ollama run command line or skipping the client entirely and interacting with the running ollama instance on that port using curl (both approaches are sketched below), so you can tune this step toward a useful batch workflow depending on your needs.
The ollama server background session will terminate automatically when you sign out of the interactive session or when your batch job ends. This procedure keeps separate instances from colliding with each other in simultaneous interactive or batch jobs, even if other HPC account users run this or similar programs on the same nodes up to their batch resource limits.
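As an illustration of those two approaches, here is a minimal sketch; the model name llama3.2 and the prompt text are only placeholders:
# One-shot prompt through the ollama client (prints the response and exits)
ollama run llama3.2 "Explain what a transformer model is in two sentences."
# Equivalent request sent directly to the server's REST API with curl
curl $OLLAMA_BASE_URL/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Explain what a transformer model is in two sentences.",
  "stream": false
}'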