Introduction
Local LLMs make it possible to run large language models entirely on your own Linux system, without sending prompts or data to third-party cloud services. All processing stays local, respecting your privacy.
This How to run Local LLMs on Linux with Ollama and Open WebUI guide shows how to install Ollama as a local model runtime, deploy Open WebUI as a chat interface using Podman and Quadlet, and connect everything into a working local AI stack. We will also install a real LLM and verify that both CLI and web-based usage work.
While the terminology around LLMs, runtimes, and frontends can sound complex, the actual setup is straightforward. It is also a great hands-on way to learn how modern AI systems are structured, from model execution to APIs and client applications.
By the end of this guide, you will have a reproducible Linux-based setup where models run locally, Open WebUI provides a ChatGPT-like interface, and tools can connect to Ollama through its API.
Local LLM Architecture Overview
Model
LLMs (Large Language Models)
Large Language Models are neural networks trained on large text corpora to predict the next token in a sequence. This simple mechanism enables emergent capabilities such as reasoning, summarization, code generation, translation, and structured output.
Runtime + API
Ollama
Ollama is the local model runtime and management layer.
It downloads, stores, loads, and executes LLMs on your machine and exposes them through a local HTTP API. This makes models accessible both from the command line and from external applications.
Conceptually, Ollama plays a role similar to Docker or Podman, but for language models instead of containers.
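For example, once Ollama is installed and a model has been pulled (both covered later in this guide), any HTTP client can ask it to generate text. A minimal sketch against the native API, using the model we install later:
curl http://localhost:11434/api/generate -d '{
  "model": "qwen3-coder:30b",
  "prompt": "Explain what a systemd unit is in one sentence.",
  "stream": false
}'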
Client
Open WebUI
Open WebUI is a web-based user interface for interacting with LLMs.
It connects to the Ollama API and provides a chat-style interface similar to ChatGPT, including conversation history, model selection, and basic configuration.
Open WebUI does not run models itself. It is purely a frontend that talks to Ollama.
Claude Code, Codex, OpenClaw
These tools act as AI-powered assistants for coding and sysadmin workflows.
They connect to an LLM backend (in this setup: Ollama) and use it as a reasoning engine to:
- Generate and modify code
- Explain errors and logs
- Assist with shell commands and automation
- Perform refactoring and documentation tasks
They are clients of the Ollama API, not model runtimes.
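Many of these clients expect an OpenAI-compatible backend, and Ollama exposes one under /v1. A request against that endpoint looks roughly like this (sketch; the model name assumes the one installed later in this guide):
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-coder:30b",
    "messages": [{"role": "user", "content": "Explain this error: permission denied (os error 13)"}]
  }'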
Mental Model
- LLM: the model that does the reasoning
- Ollama: runs the model and exposes an API
- Open WebUI: chat frontend
- Claude Code / Codex / OpenClaw: specialized clients for development and operations
This separation is important: you can change models without changing tools, and change tools without changing models.
Flow Diagram
┌───────────────────┐
│ LLMs │
│ (Models on disk) │
└─────────▲─────────┘
│
▼
┌─────────┴─────────┐
│ Ollama │
│ Runtime & API │
└─────────▲─────────┘
│
┌─────────▼─────────────────────┐
│ Clients / Interfaces │
│ Open WebUI │
│ Claude Code, Codex, OpenClaw │
└───────────────────────────────┘
Install Ollama
Ollama is available in dnf and apt, but those packages often lag behind upstream.
Since LLM runtimes evolve quickly, it is recommended to install the latest version directly from upstream.
Run the installer:
curl -fsSL https://ollama.com/install.sh | sh
This installs the Ollama binary and the systemd service.
Because Ollama is installed outside your system package manager, updates will not be handled by dnf or apt. To update Ollama in the future, simply run the same install script again.
Verify installation:
ollama --version
Make Ollama listen on all interfaces
By default, Ollama listens on 127.0.0.1.
If Open WebUI runs inside a container, it must be able to reach Ollama over the network; a loopback-only listener is not reachable from inside the container.
The simplest approach is to bind Ollama to all interfaces.
Create a systemd override:
sudo systemctl edit ollama.service
Add:
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"This tells Ollama to listen on all interfaces on port 11434.
Change Model Storage Location
Models are large and accumulate quickly.
By default, Ollama stores models in: /usr/share/ollama/.ollama/models/
If you have a dedicated storage volume, you can change this location. This is optional but recommended.
In the same systemd override file, add:
Environment="OLLAMA_MODELS=/workloads/ollama/models"Example override file:
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_MODELS=/workloads/ollama/models"Save and exit.
If you already downloaded models, move them manually:
sudo mv /usr/share/ollama/.ollama/models/* /workloads/ollama/models/
Reload and Enable Ollama
After modifying the service:
sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama
Verify that it is running:
sudo systemctl status ollama
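You can also confirm that the API answers on the new address; the /api/version endpoint returns the installed version as JSON:
curl http://localhost:11434/api/version
If you bound Ollama to 0.0.0.0 as shown above, the same call should also work against the machine's LAN IP.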
Install Open WebUI
We will install Open WebUI inside a rootless Podman container managed by Quadlet (systemd-native container units).
If you prefer other deployment methods (Docker, Kubernetes, etc.), see the official quick-start documentation: https://docs.openwebui.com/getting-started/quick-start/
Create Quadlet Directory
Quadlet user units live here:
mkdir -p ~/.config/containers/systemd/
Create Open WebUI Container Unit
Create the file:
nano ~/.config/containers/systemd/open-webui.container
With the following content:
[Unit]
Description=Open WebUI Container
After=network-online.target
[Container]
Image=ghcr.io/open-webui/open-webui:main
ContainerName=open-webui
PublishPort=3000:8080
Volume=open-webui:/app/backend/data
# Podman 5+ uses pasta (user-mode networking) by default.
# Add a host entry so the container can reach services on the host
# (for example Ollama running on the host):
AddHost=host.containers.internal:host-gateway
[Service]
Restart=always
[Install]
WantedBy=default.target
Explanation:
- Publishes Open WebUI on host port 3000 (container port 8080)
- Persists application data in a named volume
- Podman’s user-mode networking does not expose the host automatically.
- The AddHost line creates a hostname inside the container that resolves to the host system (required to reach Ollama running on the host)
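Open WebUI looks for its backend via the OLLAMA_BASE_URL environment variable. If the UI does not detect Ollama on its own, you can point it at the host explicitly by adding one more line to the [Container] section (this relies on the AddHost entry above):
Environment=OLLAMA_BASE_URL=http://host.containers.internal:11434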
Reload systemd and Start Open WebUI
Reload and start:
systemctl --user daemon-reload
systemctl --user start open-webui
The [Install] section of the .container file (WantedBy=default.target) already takes care of automatic startup: Quadlet-generated units cannot be enabled with systemctl --user enable, so no separate enable step is needed.
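Rootless user services normally only run while you have an active session. If Open WebUI should come up at boot without anyone logging in, enable lingering for your user:
loginctl enable-linger $USER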
Verify status:
systemctl --user status open-webui
Access Open WebUI
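The first start can take a moment while Podman pulls the image. Once the service reports active, a quick check from the terminal (the exact response does not matter, only that the port answers):
curl -I http://localhost:3000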
Open your browser, navigate to: http://localhost:3000 and create an admin account.
You should now see the Open WebUI interface.
At this point no models are installed yet, so there is nothing to chat with.
In the next step, we will download and run our first LLM.
Install LLMs
Ollama supports hundreds of LLMs, each with different strengths and hardware requirements.
There is no single “best” model. When running models locally, the main limiting factor is your hardware, especially GPU VRAM.
In my current setup I am running an NVIDIA GeForce RTX 3080 Ti with 12 GB VRAM, which allows me to run models up to around the 30B class.
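If you are unsure how much VRAM your GPU has, nvidia-smi can report it on NVIDIA cards:
nvidia-smi --query-gpu=name,memory.total --format=csv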
After testing many models, I have had the best overall results with qwen3-coder:30b from Alibaba.
It is fast and performs well for coding-related tasks.
You can browse all available models at: https://ollama.com/search
Pull Model
In this guide we will install and use qwen3-coder:30b.
Install model:
ollama pull qwen3-coder:30b
Verify that the model is installed:
ollama list
Test the model from CLI:
ollama run qwen3-coder:30b
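This opens an interactive prompt; type /bye to leave it. ollama run also accepts a prompt directly, which is handy for quick one-shot tests:
ollama run qwen3-coder:30b "Write a Bash one-liner that prints the five largest files in the current directory."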
Setting up Open WebUI
Navigate back to: http://localhost:3000
After installing qwen3-coder:30b and refreshing the page, the model should now appear in the model picker in the top-left corner.
Select the model and set it as default.

Arena Models
The name and description of this feature can be confusing at first.
Arena Models is essentially A/B split testing. By default, Arena Models appears as a selectable option in the model picker. Choosing it makes Open WebUI select two random installed models and compare their responses, allowing you to choose the better result.
If you do not want this behavior, disable it by going to: Username > Admin Panel > Settings > Evaluations, uncheck “Arena Models”, and hit “Save”.

Web Search
By default, models cannot access the web.
To enable web search, go to Username > Admin Panel > Settings > Web Search and check “Web Search”.
Next, select a Web Search Engine.
Most search engines require an API key. You can also run your own search proxy, but in this guide we will use DDGS (DuckDuckGo Search) since it does not require an API key.
Set:
- Web Search Engine: DDGS
- DDGS Backend: DuckDuckGo
Click “Save”.

At this point you should be able to chat with qwen3-coder:30b, and it should be able to perform web searches.
Try asking:
How is the weather today?
Memory
Open WebUI only maintains context within each individual chat.
If you want the system to remember persistent information (preferences, instructions, etc.), enable memory by going to: User > Settings > Personalization and checking “Memory”.
Click Manage > “Add Memory”, enter the information you want persisted, then click “Add” and “Save”.

Optional Settings
Open WebUI allows extensive customization of the interface by going to: User > Settings > Interface
Examples:
- UI scale
- Disable Follow-Up Auto-Generation
- Disable Insert Suggestion Prompt to Input
- Disable Insert Follow-Up Prompt to Input
Tune these according to preference.

Management
Useful commands for managing the Open WebUI service.
Check status:
systemctl --user status open-webui
View logs:
journalctl --user -u open-webui -f
Start service:
systemctl --user start open-webui
Stop service:
systemctl --user stop open-webui
Restart service:
systemctl --user restart open-webui
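The Ollama side runs as a system service and has its own CLI for model management. The equivalents there are, for example:
sudo systemctl status ollama
sudo journalctl -u ollama -f
ollama list   # installed models
ollama ps     # models currently loaded in memory
ollama rm <model>   # delete a model from disk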
Conclusion
You now have a fully functional local LLM environment running on your own hardware. Ollama manages and serves models, Open WebUI provides a chat interface, and models are stored and executed locally without relying on external cloud providers.
This How to run Local LLMs on Linux with Ollama and Open WebUI setup gives you:
- Full control over models and data
- Privacy: prompts and responses never leave your machine
- No subscription costs
- Reproducible, systemd-managed services
- A foundation for adding developer tools such as Claude Code, Codex, or OpenClaw later
With this base in place, you can experiment with different models, tune performance for your hardware, and expand your local AI stack as your needs grow.
If you wish to go further, you can integrate your local LLM directly into your coding editor (e.g. VS Code). This will be covered in a separate guide.