
Run Ollama Models Locally and Make Them Accessible via a Public API


Introduction

Running Large Language Models (LLMs) and other open-source models locally offers significant advantages for developers, and this is where Ollama shines. Ollama simplifies downloading, setting up, and running these powerful models on your local machine, giving you greater control, enhanced privacy, and lower costs than cloud-based solutions.

While running models locally offers immense benefits, integrating them with cloud-based projects or sharing them for broader access can be a challenge. This is precisely where Clarifai Local Runners come in. Local Runners enable you to expose your locally running Ollama models via a public API endpoint, allowing seamless integration with any project, anywhere, effectively bridging the gap between your local environment and the cloud.

In this post, we’ll walk through how to run open-source models using Ollama and expose them with a public API using Clarifai Local Runners. This makes your local models accessible globally while still running entirely on your machine.

Local Runners Explained

Local Runners let you run models on your own machine, whether it’s your laptop, workstation, or on-prem server, while exposing them through a secure, public API endpoint. You don’t need to upload the model to the cloud. The model stays local but behaves like it’s hosted on Clarifai.

Once initialized, the Local Runner opens a secure tunnel to Clarifai’s control plane. Any requests to your model’s Clarifai API endpoint are routed to your machine, processed locally, and returned to the caller. From the outside, it functions like any other hosted model. Internally, everything runs on your hardware.

Local Runners are especially useful for:

  • Fast local development: Build, test, and iterate on models in your own environment without deployment delays. Inspect traffic, test outputs, and debug in real time.
  • Using your own hardware: Take advantage of local GPUs or custom hardware setups. Let your machine handle inference while Clarifai manages routing and API access.
  • Private and offline data: Run models that rely on local files, internal databases, or private APIs. Keep everything on-prem while still exposing a usable endpoint.

Local Runners give you the flexibility of local execution along with the reach of a managed API, all without giving up control over your data or environment.

Expose Local Ollama Models via Public API

This section will walk you through the steps to get your Ollama model running locally and accessible via a Clarifai public endpoint.

Prerequisites

Before we begin, ensure you have:

  • Ollama installed and running on your local machine.
  • Python and pip installed.
  • A Clarifai account with a Personal Access Token (PAT).

Step 1: Install Clarifai and Login

First, install the Clarifai Python SDK:
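
```bash
pip install clarifai
```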

Next, log in to Clarifai to configure your context. This links your local environment to your Clarifai account, allowing you to manage and expose your models.
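
```bash
clarifai login
```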

Follow the prompts to enter your User ID and Personal Access Token (PAT). If you need help obtaining these, refer to the documentation here.

Step 2: Set Up Your Local Ollama Model for Clarifai

Next, you’ll prepare your local Ollama model so it can be accessed by Clarifai’s Local Runners. This step sets up the necessary files and configuration to expose your model through a public API endpoint using Clarifai’s platform.

Use the following command to initialize the setup:
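
```bash
clarifai model init --toolkit ollama
```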

This generates three key files within your project directory:

  • model.py

  • config.yaml

  • requirements.txt

These define how Clarifai will communicate with your locally running Ollama model.

You can also customize the command with the following options:

  • --model-name: Name of the Ollama model you want to serve. This pulls from the Ollama model library (defaults to llama3:8b).

  • --port: The port where your Ollama model is running (defaults to 23333).

  • --context-length: Sets the model’s context length (defaults to 8192).

For example, to use the gemma:2b model with a 16K context length on port 8008, run:
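
```bash
clarifai model init --toolkit ollama --model-name gemma:2b --port 8008 --context-length 16384
```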

After this step, your local model is ready to be exposed using Clarifai Local Runners.

Step 3: Start the Clarifai Local Runner

Once your local Ollama model is configured, the next step is to run Clarifai’s Local Runner. This exposes your local model to the internet through a secure Clarifai endpoint.

Navigate into the model directory and run:
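
```bash
cd ollama-model-upload
clarifai model local-runner
```

(The local-runner subcommand shown here assumes a recent version of the Clarifai CLI; run clarifai model --help to confirm the exact name for your version.)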

Once the runner starts, you will receive a public Clarifai URL. This URL is your gateway to accessing your locally running Ollama model from anywhere. Requests made to this Clarifai endpoint will be securely routed to your local machine, allowing your Ollama model to process them.

Running Inference on Your Exposed Model

With your Ollama model running locally and exposed via Clarifai Local Runner, you can now send inference requests to it from anywhere using the Clarifai SDK or an OpenAI-compatible endpoint.

Inference Using the OpenAI-Compatible Method

Set your Clarifai PAT as an environment variable:
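
```bash
export CLARIFAI_PAT="YOUR_PAT_HERE"
```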

Then, you can use the OpenAI client to send requests:
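
The sketch below assumes Clarifai's OpenAI-compatible base URL (https://api.clarifai.com/v2/ext/openai/v1) and references the model by its full Clarifai URL; replace the user ID, app ID, and model ID placeholders with your own values.

```python
import os
from openai import OpenAI

# Point the standard OpenAI client at Clarifai's OpenAI-compatible endpoint,
# authenticating with the PAT exported above.
client = OpenAI(
    base_url="https://api.clarifai.com/v2/ext/openai/v1",
    api_key=os.environ["CLARIFAI_PAT"],
)

# The model is addressed by its Clarifai URL (user ID / app ID / model ID).
response = client.chat.completions.create(
    model="https://clarifai.com/YOUR_USER_ID/YOUR_APP_ID/models/YOUR_MODEL_ID",
    messages=[
        {"role": "user", "content": "Explain what a local runner does in one sentence."}
    ],
)

print(response.choices[0].message.content)
```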

For multimodal inference, you can include image data:
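
This assumes the Ollama model you are serving is vision-capable and that requests follow the standard OpenAI image_url message format; the client object is the one created above.

```python
# Reuse the client from the previous example.
response = client.chat.completions.create(
    model="https://clarifai.com/YOUR_USER_ID/YOUR_APP_ID/models/YOUR_MODEL_ID",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://samples.clarifai.com/metro-north.jpg"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```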

Inference with Clarifai SDK

You can also use the Clarifai Python SDK for inference. The model URL can be obtained from your Clarifai account.
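
A minimal sketch with the Clarifai Python SDK is shown below; the predict call and its prompt parameter are assumed to match the method generated in 1/model.py, so adjust the call if you have customized that file.

```python
import os
from clarifai.client.model import Model

# The model URL comes from your Clarifai account (user ID, app ID, model ID).
model = Model(
    url="https://clarifai.com/YOUR_USER_ID/YOUR_APP_ID/models/YOUR_MODEL_ID",
    pat=os.environ["CLARIFAI_PAT"],
)

# The request is routed through Clarifai to the local runner on your machine.
response = model.predict(prompt="What is the capital of France?")
print(response)
```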

Customizing Ollama Model Configuration

The clarifai model init --toolkit ollama command generates a model file structure:

```
ollama-model-upload/
├── 1/
│   └── model.py
├── config.yaml
└── requirements.txt
```

You can customize the generated files to control how your model works:

  • 1/model.py – Customize to tailor your model’s behavior, implement custom logic, or optimize performance.

  • config.yaml – Define settings such as compute requirements, which is especially useful when deploying to dedicated compute with Compute Orchestration (see the sketch after this list).

  • requirements.txt – List any required Python packages for your model.
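
For illustration, here is a rough sketch of what a config.yaml might contain; the field names follow Clarifai's standard model configuration schema, and the values are placeholders rather than recommendations.

```yaml
model:
  id: "ollama-model"           # placeholder model ID
  user_id: "YOUR_USER_ID"
  app_id: "YOUR_APP_ID"
  model_type_id: "text-to-text"

build_info:
  python_version: "3.11"

inference_compute_info:
  cpu_limit: "2"
  cpu_memory: "4Gi"
  num_accelerators: 0          # increase when deploying to GPU nodes via Compute Orchestration
```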

This setup gives you full control over how your Ollama model is exposed and used via API. Refer to the documentation here.

Conclusion

Running open-source models locally with Ollama gives you full control over privacy, latency, and customization. With Clarifai Local Runners, you can expose these models via a public API without relying on centralized infrastructure. This setup makes it easy to plug local models into larger workflows or agentic systems, while keeping compute and data fully in your control. If you want to scale beyond your machine, check out Compute Orchestration to deploy models on dedicated GPU nodes.



