How I Reduced My OpenClaw AI Agent API Costs by ~85%

Zea Gatdula, March 11, 2026

A Step-by-Step Guide (Windows + WSL + Ubuntu Setup)

While experimenting with AI agents, I quickly discovered something:

AI agents are powerful — but they can burn through API credits very fast.

If you run an AI agent framework without optimization, costs can increase dramatically due to token usage, large context windows, and inefficient prompts.

So I decided to run a small experiment.

Goal:

Build a functional AI agent system while minimizing token usage and API costs.

The result was a system that runs significantly cheaper than a default configuration; in my testing, costs dropped by roughly 85%.

This article explains how the system was built and optimized step by step.

System Architecture

My AI agent setup uses a hybrid environment.

Host Operating System

Windows 11

Linux Environment

The system runs Linux through Windows Subsystem for Linux (WSL), which allows a full Linux environment inside Windows.

Linux Distribution

Ubuntu installed through WSL.

Agent Framework

OpenClaw

Model Routing

Requests are routed through OpenRouter, which provides access to multiple AI models through one API.

Primary Model

The system uses MiniMax M2.5.

MiniMax M2.5 is a relatively low-cost model that performs well for many agent workflows, especially when optimizing for lower token costs.


Step 1 — Installing Windows Subsystem for Linux

The first step is enabling WSL.

Open PowerShell as Administrator and run:

wsl --install

Restart your computer after installation.

If Ubuntu is not automatically installed, run:

wsl --install -d Ubuntu

You can learn more in Microsoft’s official WSL documentation:

https://learn.microsoft.com/en-us/windows/wsl/

Step 2 — Installing Ubuntu

Once WSL is enabled, open the Ubuntu terminal.

Update system packages:

sudo apt update

sudo apt upgrade -y

Install basic development tools:

sudo apt install git curl jq nodejs npm -y

Ubuntu installation instructions are also available here:

https://ubuntu.com/tutorials/install-ubuntu-on-wsl2

Step 3 — Installing OpenClaw


Next, install the OpenClaw framework.

Clone the repository:

git clone https://github.com/openclaw/openclaw

Enter the directory:

cd openclaw

Install dependencies:

npm install

Depending on the repository version, some setups may also require commands such as:

npm run build

npm start

or

npm run dev

Check the repository documentation for the exact startup command.

Step 4 — Creating the Agent Workspace


OpenClaw agents use a workspace directory where files, memory, and outputs are stored.

Create the workspace:

mkdir -p ~/.openclaw/workspace

Your directory structure should look like this:

~/.openclaw

  ├── workspace

  └── config.json

This workspace is used to store:

  • generated outputs
  • agent memory
  • task files
  • configuration data

Step 5 — Configure a Lower-Cost Model


One of the biggest cost optimizations is selecting an efficient model.

Instead of higher-cost models, I configured the system to use:

MiniMax M2.5

Example configuration:

{
  "model": {
    "primary": "openrouter/minimax/minimax-m2.5"
  }
}

As noted earlier, MiniMax M2.5 offers a good balance of capability and cost for many automation and agent tasks, which is why it anchors this setup.
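To make the routing concrete, here is a minimal Python sketch of sending one chat completion through OpenRouter with this model. This is an illustration, not OpenClaw's internal code; it assumes an `OPENROUTER_API_KEY` environment variable, and the exact model slug should be checked against OpenRouter's model list.

```python
import json
import os
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_payload(prompt: str) -> dict:
    """Build the chat-completion payload; the model slug mirrors the config above."""
    return {
        "model": "minimax/minimax-m2.5",
        "messages": [{"role": "user", "content": prompt}],
    }

def ask(prompt: str) -> str:
    """Send one request to OpenRouter and return the reply text."""
    req = urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because every model on OpenRouter sits behind this one endpoint, switching models later is a one-line payload change.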

Step 6 — Remove Expensive Fallback Models

Some agent frameworks allow automatic model switching when a request fails or exceeds limits.

For example:

  • GPT models
  • Claude
  • Gemini

If automatic routing is enabled, the system may silently switch to a higher-cost model.

To control costs, I configured OpenClaw to use MiniMax M2.5 as the primary model and avoided automatic routing through OpenRouter.

This ensures the agent continues using the lower-cost model instead of escalating to expensive ones.

Step 7 — Enable Context Pruning

Agent conversations naturally grow over time.

Longer prompts lead to higher token usage.

To control this, I enabled context pruning, which removes older conversation context.

Example configuration (implementation may vary depending on framework):

"contextPruning": {
  "mode": "cache-ttl",
  "ttl": "1h"
}

The goal of context pruning is to:

  • remove stale context
  • reduce prompt size
  • improve response speed

Step 8 — Enable Memory Compaction

Agents may accumulate large amounts of internal memory over time.

Without limits, prompts can grow unnecessarily large.

Example configuration:

"compaction": {
  "mode": "safeguard",
  "memoryFlush": {
    "enabled": true,
    "softThresholdTokens": 4000
  }
}

When memory grows beyond the defined threshold, the system compresses or clears older memory to maintain efficiency.
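A minimal sketch of this threshold behavior, assuming a rough characters-per-token heuristic (this simplified version drops older entries outright, where a real system would summarize them with the model first):

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: roughly 4 characters per token for English text.
    return len(text) // 4

def compact_memory(entries, soft_threshold_tokens=4000, keep_recent=5):
    """If total memory exceeds the soft threshold, keep only the most
    recent entries (sketch; a real compactor would summarize the rest)."""
    total = sum(estimate_tokens(e) for e in entries)
    if total <= soft_threshold_tokens:
        return entries
    return entries[-keep_recent:]
```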

Step 9 — Enable Context Caching

Caching is one of the most powerful cost optimizations.

Instead of sending identical prompts repeatedly, the system reuses previously processed context.

In my testing with repeated workflows, caching achieved cache hit rates of up to approximately 96%.

When prompts are reused, fewer new tokens are processed, which significantly reduces API cost.
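The simplest form of this idea is an exact-match cache on the client side (provider-side context caching works on shared prompt prefixes instead, but the cost logic is the same). A hedged sketch:

```python
import hashlib

class PromptCache:
    """Reuse responses for identical prompts instead of re-sending them (sketch)."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_call(self, prompt, call_model):
        # Hash the prompt so the cache key stays small and uniform.
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = call_model(prompt)
        self._store[key] = result
        return result
```

In a repeated workflow, every hit is a request that costs nothing, which is where the high hit rates translate directly into savings.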

Step 10 — Reduce Reasoning Level

Some models support different reasoning intensities.

Higher reasoning levels typically consume more tokens.

In my setup, I configured the agent to use a lower reasoning level, which helps reduce token usage and speeds up responses.

Step 11 — Implement a Budget Guardrail System


To prevent unexpected API spending, I implemented a custom budget guardrail system.

This is not a built-in feature in many agent frameworks but can be implemented manually.

Example file:

~/.openclaw/workspace/budget.json

Example configuration:

{
  "monthly_budget": 5.31,
  "daily_target": 0.20,
  "max_task_cost": 0.05,
  "spent_today": 0,
  "carry_over": 0
}

This system allows the agent to:

  • limit monthly spending
  • control daily usage
  • prevent expensive tasks

It acts as a simple financial safety layer for AI automation systems.
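Since this guardrail is custom rather than built in, here is one way it could be enforced in Python, checking a task's estimated cost against the limits from `budget.json` before running it. The function names are illustrative, not part of any framework API.

```python
def can_run_task(budget: dict, estimated_cost: float) -> bool:
    """Allow a task only if it fits both the per-task and daily limits (sketch)."""
    if estimated_cost > budget["max_task_cost"]:
        return False
    if budget["spent_today"] + estimated_cost > budget["daily_target"]:
        return False
    return True

def record_spend(budget: dict, cost: float) -> dict:
    """Return an updated budget after a task completes."""
    updated = dict(budget)
    updated["spent_today"] = round(updated["spent_today"] + cost, 6)
    return updated
```

The agent calls the check before every task; anything over the cap is rejected or deferred to the next day instead of silently billed.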

Step 12 — Agent Specialization

Instead of running one large agent, I divided responsibilities across multiple specialized agents.

Example architecture:

Zeus → Manager agent

Jack → Technical / SEO agent

Benefits of this structure:

  • smaller prompts
  • more focused context
  • improved performance
  • lower token usage

Step 13 — Telegram Integration


The system communicates through Telegram using the Telegram Bot API.

Recommended architecture:

Telegram

  ↓

Zeus (Manager Agent)

  ↓

Delegates tasks

  ↓

Jack

Centralizing requests through a manager agent prevents unnecessary API calls from multiple agents responding independently.
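The manager's delegation step can be sketched as a routing function. In the real system Zeus would classify requests with the model itself; simple keyword rules stand in here so the example is self-contained, and the agent names follow the architecture above.

```python
def route_request(text: str) -> str:
    """Zeus-style routing: pick the specialist agent for an incoming message (sketch).

    Keyword matching is a stand-in for model-based classification.
    """
    technical_keywords = ("seo", "debug", "deploy", "code", "website")
    lowered = text.lower()
    if any(keyword in lowered for keyword in technical_keywords):
        return "jack"   # technical / SEO work goes to the specialist
    return "zeus"       # everything else is handled by the manager
```

Only the chosen agent receives the message, so one Telegram request produces one API call instead of one per agent.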

Example Agent Prompts

Zeus — Manager Agent

You are Zeus, the system manager agent.

Responsibilities:

- receive requests from Telegram

- analyze the request

- delegate tasks to the correct agent

Agents available:

Jack

Role: technical SEO and development

Rules:

- keep responses concise

- minimize token usage

Jack — Technical Agent

You are Jack, the technical agent.

Responsibilities:

- website automation

- technical SEO

- debugging

- development tasks

Final Result

After applying these optimizations, the system includes:

  • low-cost model selection
  • removal of fallback models
  • context pruning
  • memory compaction
  • caching
  • lower reasoning level
  • custom budget guardrails
  • agent specialization

Cost Reduction Results

In my experiment, combining these optimizations reduced API costs by approximately 85% compared to a default configuration.

Actual results will vary depending on:

  • model selection
  • workflow complexity
  • prompt size
  • caching effectiveness

Related Article

This experiment builds on my previous guide explaining the full AI agent architecture.

If you’d like to see the complete setup, you can read:

https://zeagatdula.com/blog/ai-agent-system-using-openclaw

Final Thoughts

AI agents are incredibly powerful tools for automation.

However, without proper optimization, they can become expensive to operate.

By combining:

  • efficient models
  • prompt optimization
  • caching
  • specialized agents
  • budget controls

you can build AI automation systems that run at a fraction of the cost.