Introducing Local Runners — Ngrok for AI Models

This blog post focuses on new features and improvements. For a comprehensive list, including bug fixes, please see the release notes.

Introducing Local Runners: Run Models on Your Own Hardware

Building AI models often starts locally. You experiment with architecture, fine-tune on small datasets, and validate ideas using your own machine. But the moment you want to test that model inside a real-world pipeline, things become complicated.

You usually have two options:

  1. Upload the model to a remote cloud environment, even for early-stage testing

  2. Build and expose your own API server, handling authentication, security, and infrastructure just to test locally

Neither path is ideal, especially if you’re:

  • Working on personal or resource-limited projects

  • Developing models that need access to local files, OS-level tools, or restricted data

  • Managing edge or on-prem environments where cloud isn’t viable

Local Runners solve this problem.

They allow you to develop, test, and run models on your own machine while still connecting to Clarifai’s platform. You don’t need to upload your model to the cloud. You simply run it where it is — your laptop, workstation, or server — and Clarifai takes care of routing, authentication, and integration.

Once registered, the Local Runner opens a secure connection to Clarifai’s control plane. Any requests to your model’s Clarifai API endpoint are securely routed to your local runner, processed, and returned. From a user perspective, it works like any other model hosted on Clarifai, but behind the scenes it’s running entirely on your machine.
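For example, a prediction request to the model’s public Clarifai endpoint looks the same whether the model runs in the cloud or on your laptop. Below is a rough sketch using Python’s requests library; the IDs, token, and payload are placeholders, and the exact input format depends on your model type, so check the Clarifai API docs for specifics.

import requests

# Placeholders: substitute your own IDs and Personal Access Token (PAT).
USER_ID, APP_ID, MODEL_ID, PAT = "your-user-id", "your-app-id", "your-model-id", "your-pat"

url = f"https://api.clarifai.com/v2/users/{USER_ID}/apps/{APP_ID}/models/{MODEL_ID}/outputs"
payload = {"inputs": [{"data": {"text": {"raw": "Hello from a Local Runner"}}}]}

# Clarifai's control plane routes this request to the runner on your machine.
resp = requests.post(url, json=payload, headers={"Authorization": f"Key {PAT}"})
print(resp.json()["outputs"][0]["data"])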

Here’s what you can do with Local Runners:

  • Streamlined model development
    Develop and debug models without deployment overhead. Watch real-time traffic, inspect inputs, and test outputs interactively.

  • Leverage your own compute
    If you have a powerful GPU or custom setup, use it to serve models. Your machine does the heavy lifting, while Clarifai handles the rest of the stack.

  • Private data and system-level access
    Serve models that interact with local files, private APIs, or internal databases. With support for the MCP (Model Context Protocol), you can expose local capabilities securely to agents, without making your infrastructure public.

Getting Started

Before starting a Local Runner, make sure you’ve done the following:

  1. Built or downloaded a model – You can use your own model or pick a compatible one from a repository like Hugging Face. If you’re building your own, check out the documentation on how to structure it using the Clarifai-compatible project format (a minimal sketch of that layout appears at the end of this section).

  2. Installed the Clarifai CLI – run

    pip install --upgrade clarifai

  3. Generated a Personal Access Token (PAT) – from your Clarifai account’s settings page under “Security.”

  4. Created a context – this stores your local environment variables (like user ID, app ID, model ID, etc.) so the runner knows how to connect to Clarifai.

You can set up the context easily by logging in through the CLI, which will walk you through entering all the required values:

clarifai login
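
If you’re building your own model, the Clarifai-compatible project is essentially a small directory with a config file, a requirements file, and a Python model class. The layout below is an approximation; the documentation linked above is authoritative.

my-model/
├── config.yaml        # model ID and compute requirements
├── requirements.txt   # Python dependencies for your model
└── 1/
    └── model.py       # the model implementation

A minimal 1/model.py might look roughly like this; the class, decorator, and import path are assumptions drawn from the current Python SDK and may differ between versions:

from clarifai.runners.models.model_class import ModelClass

class MyModel(ModelClass):
    def load_model(self):
        # Called once when the runner starts; load weights or clients here.
        self.prefix = "echo: "

    @ModelClass.method
    def predict(self, prompt: str) -> str:
        # Each decorated method becomes a callable endpoint on the runner.
        return self.prefix + prompt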

Starting the Runner

Once everything is set up, you can start your Local Runner from the directory containing your model (or provide a path):

clarifai model local-runner [OPTIONS] [MODEL_PATH]

  • MODEL_PATH is the path to your model directory. If you leave it blank, it defaults to the current directory.

  • This command will launch a local server that mimics a production Clarifai deployment, letting you test and debug your model live.

If the runner doesn’t find an existing context or config, it’ll prompt you to generate one with default values. This will create:

  • A dedicated local compute cluster and nodepool.

  • An app and model entry in your Clarifai account.

  • A deployment and runner ID that ties your local instance to the Clarifai platform.

Once launched, it also auto-generates a client code snippet to help you test the model.
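
The generated snippet is specific to your model, but it typically reduces to a call like the sketch below using the Clarifai Python SDK. The model URL, token, and input are placeholders, and the exact predict call depends on your model’s interface, so prefer the snippet printed by the CLI.

from clarifai.client.model import Model

# Placeholder model URL and PAT; the CLI prints the exact values for your runner.
model = Model(
    url="https://clarifai.com/your-user-id/your-app-id/models/your-model-id",
    pat="your-personal-access-token",
)

# The request goes to Clarifai's API, which forwards it to the local runner.
response = model.predict_by_bytes(b"Hello from my laptop", input_type="text")
print(response.outputs[0].data)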

Local Runners give you the flexibility to build and test models exactly where your data and compute live, while still integrating with Clarifai’s API, workflows, and platform features. Check out the full example and setup guide in the documentation here.

You can try Local Runners for free. There’s also a $1/month Developer Plan for the first year, which gives you the ability to connect up to 5 Local Runners to the cloud API with unlimited runner hours.

Compute UI

  • We’ve introduced a new Compute Overview dashboard that gives you a clear, unified view of all your compute resources. From a single screen, you can now manage Clusters, Nodepools, Deployments, and the newly added Runners.
  • This update also includes two major additions: Connect a Local Runner, which lets you run models directly on your own hardware with complete privacy, and Connect your own cloud, allowing you to integrate external infrastructure like AWS, GCP, or Oracle for dynamic, cost-efficient scaling. It’s now easier than ever to control where and how your models run.
  • We’ve also redesigned the cluster creation experience to make provisioning compute even more intuitive. Instead of selecting each parameter step by step, you now get a unified, filterable view of all available configurations across providers like AWS, GCP, Azure, Vultr, and Oracle. You can filter by region, instance type, and hardware specs, then select exactly what you need with full visibility into GPU, memory, CPU, and pricing. Once selected, you can spin up a cluster instantly with a single click.

Published New Models

We published the Gemma-3n-E2B and Gemma-3n-E4B models. Both the E2B and E4B variants are optimized for text-only generation and suited to different compute needs.

Gemma 3n is designed for real-world, low-latency use on devices like phones, tablets, and laptops. These models leverage Per-Layer Embedding (PLE) caching, the MatFormer architecture, and conditional parameter loading.

You can run them directly in the Clarifai Playground or access them via our OpenAI-compatible API.
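
As a sketch of the OpenAI-compatible route, the standard openai Python client can be pointed at Clarifai. The base URL and model identifier below are illustrative assumptions, so confirm both against the Clarifai documentation.

from openai import OpenAI

# Assumed base URL for Clarifai's OpenAI-compatible endpoint; verify in the docs.
client = OpenAI(
    base_url="https://api.clarifai.com/v2/ext/openai/v1",
    api_key="your-clarifai-pat",
)

completion = client.chat.completions.create(
    # Assumed identifier format: the model's Clarifai URL.
    model="https://clarifai.com/gcp/generate/models/gemma-3n-E4B",
    messages=[{"role": "user", "content": "Explain Per-Layer Embedding caching in one sentence."}],
)
print(completion.choices[0].message.content)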

Token-Based Billing

We’ve started rolling out token-based billing for select models on our Community platform. This change aligns with industry standards and more accurately reflects the cost of inference, especially for large language models.

Token-based pricing will apply only to models running on Clarifai’s default Shared compute in the Community. Models deployed on Dedicated compute will continue to be billed based on compute time, with no change. Legacy vision models will still follow per-request billing for now.
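
To make the billing model concrete, here is a tiny worked example; the per-token rates are purely hypothetical placeholders, not Clarifai’s actual prices.

# Hypothetical rates for illustration only; NOT actual Clarifai pricing.
INPUT_RATE_PER_M = 0.10   # dollars per 1M input tokens
OUTPUT_RATE_PER_M = 0.40  # dollars per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    # Cost of one request under simple per-token pricing.
    return (input_tokens / 1_000_000) * INPUT_RATE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_RATE_PER_M

# A 1,200-token prompt that produces an 800-token reply:
print(f"${request_cost(1_200, 800):.6f}")  # -> $0.000440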

Playground

  • The Playground page is now publicly accessible — no login required. However, certain features remain available only to logged-in users.
  • Added model descriptions and predefined prompt examples to the Playground, making it easier for users to understand model capabilities and get started quickly.
  • Added Pythonic support in the Playground for consuming the new model specification.
  • Improved the Playground user experience with enhanced inference parameter controls, restored model version selectors, and clearer error feedback.

Additional Changes

  • Python SDK: Added per-output token tracking, async endpoints, improved batch support, code validation, and build optimizations.
    Check all SDK updates here.

  • Platform Updates: Improved billing accuracy, added dynamic code snippets, UI tweaks to Community Home and Control Center, and better privacy defaults.
    Find all platform changes here.

  • Clarifai Organizations: Made invites clearer, improved token visibility, and added persistent invite prompts for better onboarding.
    See full org improvements here.

Ready to start building?

With Local Runners, you can now serve models, MCP servers, or agents directly from your own hardware without uploading model weights or managing infrastructure. It’s the fastest way to test, iterate, and securely run models from your laptop, workstation, or on-prem server. You can read the documentation or watch the demo video to get started.


