Getting Started with Ollama: Run AI Models Locally
Learn how to install Ollama, run open source language models on your own machine, and build AI-powered applications — all without sending data to the cloud.
If you've been following the AI space, you've probably noticed something interesting happening: the models are getting really, really good — and they're getting easier to run on your own hardware. A year ago, running a large language model locally meant wrestling with Python dependencies, CUDA drivers, and arcane configuration files. Today, it takes about two minutes.
That's largely thanks to Ollama, a tool that does for LLMs what Docker did for containers. It wraps model download, configuration, and inference into a single, clean CLI. You pull a model, you run it, and it just works. No cloud account needed, no API key, no per-token billing.
Why would you want to run AI locally? A few reasons. First, privacy — your prompts and data never leave your machine, which matters if you're working with client data, medical records, or anything you wouldn't paste into a web form. Second, cost — once you've downloaded a model, every query is free. No surprise bills at the end of the month. Third, control — you pick the model, you set the parameters, and you can fine-tune it for your exact use case. And finally, it works offline. On a plane, in a coffee shop with bad wifi, on a classified network — doesn't matter.
In this tutorial, we'll get you from zero to running Llama 3.2 locally, hitting it with API calls, and even creating custom models. Let's get into it.
Software
You'll need the following software installed on your machine:
Terminal
Any terminal application (Terminal.app, iTerm2, Windows Terminal, etc.)
curl (optional)
For testing the API endpoints. Comes pre-installed on most systems.
Step by step
Follow these steps to get up and running with Ollama:
Install Ollama
Head to ollama.com and download the installer for your operating system. On macOS and Windows, run the installer. On Linux, use the install script:
curl -fsSL https://ollama.com/install.sh | sh

Tip: On Linux, this installs Ollama as a systemd service that starts automatically.
Pull Your First Model
Ollama uses a Docker-like pull system. Let's start with Llama 3.2, a great general-purpose model:
# Pull Llama 3.2 (3B parameters, ~2GB download)
ollama pull llama3.2
# Or try a smaller model for lower-spec machines
ollama pull gemma2:2b

Tip: Models are stored in ~/.ollama/models. The 3B parameter version works well even on machines with 8GB RAM.
Chat with the Model
The simplest way to interact is through the built-in chat interface:
ollama run llama3.2

Tip: Type /bye to exit the chat. Use /set parameter to adjust temperature, context length, and other settings on the fly.
Use the REST API
Ollama exposes a local REST API on port 11434, making it easy to integrate into any application:
# Generate a completion
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Explain quantum computing in simple terms",
  "stream": false
}'

# Chat format (multi-turn conversations)
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {"role": "user", "content": "What is machine learning?"}
  ],
  "stream": false
}'

Use with Python or JavaScript
Ollama provides official libraries for easy integration. The API is also compatible with the OpenAI SDK format, so if you've built anything with the OpenAI API before, switching to a local model is literally a one-line change:
# Python - using the ollama library
import ollama
response = ollama.chat(
    model="llama3.2",
    messages=[{"role": "user", "content": "Write a haiku about programming"}]
)
print(response["message"]["content"])
# Or use the OpenAI-compatible endpoint
from openai import OpenAI
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

Create a Custom Model
One of the best things about running models locally is that you can create custom variants. Want a model that always responds as a Python expert? Or one that writes in a specific tone? You do that with a Modelfile — think of it like a Dockerfile, but for LLMs:
# Save as Modelfile
FROM llama3.2
SYSTEM You are a helpful coding assistant specializing in Python. Always include code examples and explain your reasoning step by step.
PARAMETER temperature 0.7
PARAMETER num_ctx 4096

Tip: Build it with: ollama create my-coding-assistant -f Modelfile
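If you find yourself making several assistant variants, you can generate Modelfiles programmatically instead of writing each by hand. A minimal sketch — the `write_modelfile` helper and its layout are my own, not part of Ollama:

```python
from pathlib import Path

def write_modelfile(path, base, system, **params):
    """Write a Modelfile that layers a system prompt and
    parameters on top of an existing base model."""
    lines = [f"FROM {base}", f"SYSTEM {system}"]
    for key, value in params.items():
        lines.append(f"PARAMETER {key} {value}")
    Path(path).write_text("\n".join(lines) + "\n")

# Recreate the Modelfile from this step
write_modelfile(
    "Modelfile",
    base="llama3.2",
    system="You are a helpful coding assistant specializing in Python.",
    temperature=0.7,
    num_ctx=4096,
)
```

Build the result the same way: ollama create my-coding-assistant -f Modelfile.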
Testing
Let's verify everything is working correctly:
1. Run ollama list to see your downloaded models
2. Run ollama run llama3.2 and ask it a question
3. Test the API with curl http://localhost:11434/api/tags
4. Check Ollama is running with ollama ps

Success: If you see model responses and the API returns a list of models, you're all set!
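The manual checks above can also be wrapped in a small script. Here's a sketch using only the Python standard library; it asks the /api/tags endpoint from step 3 whether the server is answering:

```python
import json
import urllib.request
import urllib.error

def ollama_models(host="http://localhost:11434", timeout=2):
    """Return the list of installed model names if the Ollama
    server responds, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(f"{host}/api/tags", timeout=timeout) as resp:
            data = json.load(resp)
        return [m["name"] for m in data.get("models", [])]
    except (urllib.error.URLError, OSError):
        return None

models = ollama_models()
if models is None:
    print("Ollama is not running — start it and try again")
else:
    print(f"Ollama is up with {len(models)} model(s): {models}")
```

This is handy as a startup check in any app that depends on a local model: fail fast with a clear message instead of a timeout deep inside a request.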
FAQ
How much RAM do I need?
For 3B parameter models, 8GB RAM is sufficient. For 7-8B models, 16GB is recommended. For 70B+ models, you will need 64GB+ RAM or a GPU with enough VRAM.
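You can sanity-check these numbers yourself with a rough rule of thumb: memory for the weights is parameter count times bits per weight, plus overhead for the context cache and runtime buffers. The figures below are approximations under the assumption of 4-bit quantization (the default for most Ollama model tags) and a flat 1 GB of overhead, not exact measurements:

```python
def approx_model_ram_gb(params_billion, bits_per_weight=4, overhead_gb=1.0):
    """Very rough memory estimate: quantized weights plus a flat
    allowance for the KV cache and runtime buffers."""
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb

for size in (3, 8, 70):
    print(f"{size}B model: ~{approx_model_ram_gb(size):.1f} GB")
```

Note that a longer context window (num_ctx) grows the KV cache well beyond the flat allowance used here, so leave headroom.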
Can I use my GPU?
Yes! Ollama automatically detects and uses NVIDIA GPUs (CUDA), AMD GPUs (ROCm), and Apple Silicon GPUs (Metal). No extra configuration needed.
Is it really private?
Yes. Ollama runs entirely on your machine. No data is sent anywhere. The models are downloaded once and run locally.
Can I run multiple models at once?
Yes. Ollama can load multiple models simultaneously, limited only by your available memory.
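Concurrency is tunable through environment variables on the server. In recent Ollama versions, OLLAMA_MAX_LOADED_MODELS caps how many models stay resident in memory and OLLAMA_NUM_PARALLEL sets how many requests each model serves in parallel — check the documentation for your version, as these knobs have changed over time:

```shell
# Keep up to two models resident, four parallel requests each,
# then restart the server so the settings take effect.
export OLLAMA_MAX_LOADED_MODELS=2
export OLLAMA_NUM_PARALLEL=4
ollama serve
```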
Wrapping up
You now have a fully functional local AI setup. From here, the possibilities are wide open — build a chatbot, plug it into a RAG pipeline, use it as a code reviewer, or just have a conversation with it while you're on a flight with no wifi. The point is, you own the whole stack now. No rate limits, no usage caps, no terms of service changes to worry about. Just you and a model running on your hardware.
If you want to go deeper, try pulling different models — Mistral is excellent for its size, CodeLlama is great for programming tasks, and the larger Llama 3 70B model is genuinely impressive if you have the hardware for it. The open source AI ecosystem is moving fast, and Ollama makes it easy to keep up.
Ideas to try next
- Build a local AI chatbot with a web interface
- Create a document Q&A system using RAG
- Build a code review assistant with a custom Modelfile
- Set up a private AI writing assistant