OpenNeurons
AI Tools · Beginner · 20 minutes

Getting Started with Ollama: Run AI Models Locally

Learn how to install Ollama, run open source language models on your own machine, and build AI-powered applications — all without sending data to the cloud.

If you've been following the AI space, you've probably noticed something interesting happening: the models are getting really, really good — and they're getting easier to run on your own hardware. A year ago, running a large language model locally meant wrestling with Python dependencies, CUDA drivers, and arcane configuration files. Today, it takes about two minutes.

That's largely thanks to Ollama, a tool that does for LLMs what Docker did for containers. It wraps model download, configuration, and inference into a single, clean CLI. You pull a model, you run it, and it just works. No cloud account needed, no API key, no per-token billing.

Why would you want to run AI locally? A few reasons. First, privacy — your prompts and data never leave your machine, which matters if you're working with client data, medical records, or anything you wouldn't paste into a web form. Second, cost — once you've downloaded a model, every query is free. No surprise bills at the end of the month. Third, control — you pick the model, you set the parameters, and you can fine-tune it for your exact use case. And finally, it works offline. On a plane, in a coffee shop with bad wifi, on a classified network — doesn't matter.

In this tutorial, we'll get you from zero to running Llama 3.2 locally, hitting it with API calls, and even creating custom models. Let's get into it.

Software

You'll need the following software installed on your machine:

Ollama

The runtime for local LLMs. Available for macOS, Linux, and Windows.

Download

Terminal

Any terminal application (Terminal.app, iTerm2, Windows Terminal, etc.)

curl (optional)

For testing the API endpoints. Comes pre-installed on most systems.

Step by step

Follow these steps to get up and running with Ollama:

1. Install Ollama

Head to ollama.com and download the installer for your operating system. On macOS and Windows, run the installer. On Linux, use the install script:

curl -fsSL https://ollama.com/install.sh | sh

Tip: On Linux, this installs Ollama as a systemd service that starts automatically.

2. Pull Your First Model

Ollama uses a Docker-like pull system. Let's start with Llama 3.2, a great general-purpose model:

# Pull Llama 3.2 (3B parameters, ~2GB download)
ollama pull llama3.2

# Or try a smaller model for lower-spec machines
ollama pull gemma2:2b

Tip: Models are stored in ~/.ollama/models. The 3B parameter version works well even on machines with 8GB RAM.

3. Chat with the Model

The simplest way to interact is through the built-in chat interface:

ollama run llama3.2

Tip: Type /bye to exit the chat. Use /set parameter to adjust temperature, context length, and other settings on the fly.

4. Use the REST API

Ollama exposes a local REST API on port 11434, making it easy to integrate into any application:

# Generate a completion
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Explain quantum computing in simple terms",
  "stream": false
}'

# Chat format (multi-turn conversations)
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {"role": "user", "content": "What is machine learning?"}
  ],
  "stream": false
}'
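Note that without "stream": false, the API streams its answer as newline-delimited JSON: each line is a chunk carrying a fragment of the response, and the last chunk is marked "done". Here's a minimal sketch of reassembling such a stream — the canned sample below mimics the field names I've seen from /api/generate, so treat the exact payload shape as an assumption rather than a spec:

```python
import json

# A streamed /api/generate response arrives as newline-delimited JSON
# (NDJSON): each line is one chunk with a "response" fragment, and the
# final chunk carries "done": true. This sample mimics that shape.
sample_stream = "\n".join([
    '{"model": "llama3.2", "response": "Quantum ", "done": false}',
    '{"model": "llama3.2", "response": "computing ", "done": false}',
    '{"model": "llama3.2", "response": "is...", "done": true}',
])

def assemble(ndjson_text):
    """Concatenate the "response" fragments from an NDJSON stream."""
    parts = []
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

print(assemble(sample_stream))  # → Quantum computing is...
```

In a real application you'd apply the same loop to the HTTP response body line by line as it arrives, printing each fragment immediately for a typewriter effect.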

5. Use with Python or JavaScript

Ollama provides official Python and JavaScript libraries for easy integration. The API is also compatible with the OpenAI SDK format, so if you've built anything with the OpenAI API before, switching to a local model is mostly a matter of pointing the client at a different base URL:

# Python - using the ollama library
import ollama

response = ollama.chat(
    model="llama3.2",
    messages=[{"role": "user", "content": "Write a haiku about programming"}]
)
print(response["message"]["content"])

# Or use the OpenAI-compatible endpoint
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

6. Create a Custom Model

One of the best things about running models locally is that you can create custom variants. Want a model that always responds as a Python expert? Or one that writes in a specific tone? You do that with a Modelfile — think of it like a Dockerfile, but for LLMs:

# Save as Modelfile
FROM llama3.2

SYSTEM You are a helpful coding assistant specializing in Python. Always include code examples and explain your reasoning step by step.

PARAMETER temperature 0.7
PARAMETER num_ctx 4096

Tip: Build it with ollama create my-coding-assistant -f Modelfile, then chat with it using ollama run my-coding-assistant.
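If you want to spin up several persona variants, it can be handy to template Modelfiles in code. This is plain text generation, nothing Ollama-specific — the function name and defaults here are my own illustration:

```python
def make_modelfile(base, system_prompt, temperature=0.7, num_ctx=4096):
    """Render a Modelfile string for a custom Ollama model variant."""
    return (
        f"FROM {base}\n\n"
        f"SYSTEM {system_prompt}\n\n"
        f"PARAMETER temperature {temperature}\n"
        f"PARAMETER num_ctx {num_ctx}\n"
    )

modelfile = make_modelfile(
    "llama3.2",
    "You are a helpful coding assistant specializing in Python.",
)
print(modelfile)
```

Write the result to a file named Modelfile and build it with ollama create as in the tip above.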

Testing

Let's verify everything is working correctly:

  1. Run ollama list to see your downloaded models
  2. Run ollama run llama3.2 and ask it a question
  3. Test the API with curl http://localhost:11434/api/tags
  4. Check that Ollama is running with ollama ps

Success: If you see model responses and the API returns a list of models, you're all set!
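The /api/tags endpoint from step 3 returns JSON with a "models" array. A quick sketch of pulling the model names out of that response — the sample payload is trimmed and its field names are what I've observed from Ollama, so verify against your own output:

```python
import json

# Example of the JSON shape /api/tags returns (trimmed; the real payload
# also includes model sizes, digests, and modification timestamps).
sample = '{"models": [{"name": "llama3.2:latest"}, {"name": "gemma2:2b"}]}'

def model_names(tags_json):
    """List the model names from an /api/tags response body."""
    return [m["name"] for m in json.loads(tags_json)["models"]]

print(model_names(sample))  # → ['llama3.2:latest', 'gemma2:2b']
```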

FAQ

How much RAM do I need?

For 3B parameter models, 8GB RAM is sufficient. For 7-8B models, 16GB is recommended. For 70B+ models, you will need 64GB+ RAM or a GPU with enough VRAM.
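As a back-of-envelope check on those numbers (my rule of thumb, not an official Ollama figure): a 4-bit quantized model needs roughly params × 0.5 bytes for its weights, plus a couple of GB for the KV cache and runtime overhead.

```python
def approx_ram_gb(params_billions, bits_per_weight=4, overhead_gb=2.0):
    """Back-of-envelope RAM estimate for a quantized model:
    weights (params_billions * bits / 8 GB) plus a fixed allowance
    for context cache and runtime overhead."""
    weights_gb = params_billions * bits_per_weight / 8
    return weights_gb + overhead_gb

for size in (3, 8, 70):
    print(f"{size}B parameters -> ~{approx_ram_gb(size):.1f} GB")
# 3B  -> ~3.5 GB   (fits comfortably in 8GB RAM)
# 8B  -> ~6.0 GB   (fine with 16GB)
# 70B -> ~37.0 GB  (hence the 64GB+ recommendation)
```

These estimates leave headroom for the OS and other applications, which is why the practical recommendations above are higher than the raw weight sizes.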

Can I use my GPU?

Yes! Ollama automatically detects and uses NVIDIA GPUs (CUDA), AMD GPUs (ROCm), and Apple Silicon GPUs (Metal). No extra configuration needed.

Is it really private?

Yes. Ollama runs entirely on your machine. No data is sent anywhere. The models are downloaded once and run locally.

Can I run multiple models at once?

Yes. Ollama can load multiple models simultaneously, limited only by your available memory.

Wrapping up

You now have a fully functional local AI setup. From here, the possibilities are wide open — build a chatbot, plug it into a RAG pipeline, use it as a code reviewer, or just have a conversation with it while you're on a flight with no wifi. The point is, you own the whole stack now. No rate limits, no usage caps, no terms of service changes to worry about. Just you and a model running on your hardware.

If you want to go deeper, try pulling different models — Mistral is excellent for its size, CodeLlama is great for programming tasks, and the larger Llama 3 70B model is genuinely impressive if you have the hardware for it. The open source AI ecosystem is moving fast, and Ollama makes it easy to keep up.

Ideas to try next

  • Build a local AI chatbot with a web interface
  • Create a document Q&A system using RAG
  • Build a code review assistant with a custom Modelfile
  • Set up a private AI writing assistant

Tags: ollama, local AI, LLM, tutorial, beginner, privacy