A Step-by-Step Guide (Windows + WSL + Ubuntu Setup)
While experimenting with AI agents, I discovered something quickly:
AI agents are powerful — but they can burn through API credits very fast.
If you run an AI agent framework without optimization, costs can increase dramatically due to token usage, large context windows, and inefficient prompts.
So I decided to run a small experiment.
Goal:
Build a functional AI agent system while minimizing token usage and API costs.
The result: a system that, in my testing, cost roughly 85% less to run than a default configuration.
This article explains how the system was built and optimized step by step.
System Architecture
My AI agent setup uses a hybrid environment.
Host Operating System
Windows 11
Linux Environment
The system runs Linux through Windows Subsystem for Linux (WSL), which allows a full Linux environment inside Windows.
Linux Distribution
Ubuntu installed through WSL.
Agent Framework
OpenClaw
Model Routing
Requests are routed through OpenRouter, which provides access to multiple AI models through one API.
Primary Model
The system uses MiniMax M2.5.
MiniMax M2.5 is a relatively low-cost model that performs well for many agent workflows, especially when optimizing for lower token costs.
Table of Contents
- Step 1 — Installing Windows Subsystem for Linux
- Step 2 — Installing Ubuntu
- Step 3 — Installing OpenClaw
- Step 4 — Creating the Agent Workspace
- Step 5 — Configure a Lower-Cost Model
- Step 6 — Remove Expensive Fallback Models
- Step 7 — Enable Context Pruning
- Step 8 — Enable Memory Compaction
- Step 9 — Enable Context Caching
- Step 10 — Reduce Reasoning Level
- Step 11 — Implement a Budget Guardrail System
- Step 12 — Agent Specialization
- Step 13 — Telegram Integration
- Final Result
- Cost Reduction Results
- Related Article
- Final Thoughts
Step 1 — Installing Windows Subsystem for Linux
The first step is enabling WSL.
Open PowerShell as Administrator and run:
wsl --install
Restart your computer after installation.
If Ubuntu is not automatically installed, run:
wsl --install -d Ubuntu
You can learn more in Microsoft’s official WSL documentation:
https://learn.microsoft.com/en-us/windows/wsl/
Step 2 — Installing Ubuntu
Once WSL is enabled, open the Ubuntu terminal.
Update system packages:
sudo apt update
sudo apt upgrade -y
Install basic development tools:
sudo apt install git curl jq nodejs npm -y
Ubuntu installation instructions are also available here:
https://ubuntu.com/tutorials/install-ubuntu-on-wsl2
Step 3 — Installing OpenClaw
Next, install the OpenClaw framework.
Clone the repository:
git clone https://github.com/openclaw/openclaw
Enter the directory:
cd openclaw
Install dependencies:
npm install
Depending on the repository version, some setups may also require commands such as:
npm run build
npm start
or
npm run dev
Check the repository documentation for the exact startup command.
Step 4 — Creating the Agent Workspace
OpenClaw agents use a workspace directory where files, memory, and outputs are stored.
Create the workspace:
mkdir -p ~/.openclaw/workspace
Your directory structure should look like this:
~/.openclaw
├── workspace
└── config.json
This workspace is used to store:
- generated outputs
- agent memory
- task files
- configuration data
Step 5 — Configure a Lower-Cost Model
One of the biggest cost optimizations is selecting an efficient model.
Instead of higher-cost models, I configured the system to use:
MiniMax M2.5
Example configuration:
{
  "model": {
    "primary": "openrouter/minimax/minimax-m2.5"
  }
}
MiniMax M2.5 is generally considered a lower-cost model that performs well for many automation and agent tasks, particularly when cost efficiency is important.
Step 6 — Remove Expensive Fallback Models
Some agent frameworks automatically switch models when a request fails or exceeds limits, falling back to higher-cost options such as:
- GPT models
- Claude
- Gemini
If automatic routing is enabled, the system may silently switch to a higher-cost model.
To control costs, I configured OpenClaw to use MiniMax M2.5 as the primary model and avoided automatic routing through OpenRouter.
This ensures the agent continues using the lower-cost model instead of escalating to expensive ones.
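A configuration along these lines keeps routing pinned to a single model. (The "fallbacks" key is illustrative; check your framework's documentation for the exact field names.)

```json
{
  "model": {
    "primary": "openrouter/minimax/minimax-m2.5",
    "fallbacks": []
  }
}
```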
Step 7 — Enable Context Pruning
Agent conversations naturally grow over time.
Longer prompts lead to higher token usage.
To control this, I enabled context pruning, which removes older conversation context.
Example configuration (implementation may vary depending on framework):
"contextPruning": {
  "mode": "cache-ttl",
  "ttl": "1h"
}
The goal of context pruning is to:
- remove stale context
- reduce prompt size
- improve response speed
Step 8 — Enable Memory Compaction
Agents may accumulate large amounts of internal memory over time.
Without limits, prompts can grow unnecessarily large.
Example configuration:
"compaction": {
  "mode": "safeguard",
  "memoryFlush": {
    "enabled": true,
    "softThresholdTokens": 4000
  }
}
When memory grows beyond the defined threshold, the system compresses or clears older memory to maintain efficiency.
Step 9 — Enable Context Caching
Caching is one of the most powerful cost optimizations.
Instead of sending identical prompts repeatedly, the system reuses previously processed context.
In my testing with repeated workflows, caching reached a hit rate of roughly 96%.
When prompts are reused, fewer new tokens are processed, which significantly reduces API cost.
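OpenClaw and OpenRouter handle prompt caching internally, so you do not implement this yourself; the sketch below only illustrates the idea with a hand-rolled cache, and all names in it are my own.

```python
import hashlib

class PromptCache:
    """Toy prompt cache: identical prompts reuse the stored response."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_call(self, prompt, call_model):
        # Hash the prompt so identical text maps to the same cache key.
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = call_model(prompt)  # only pay for tokens on a miss
        self._store[key] = result
        return result

cache = PromptCache()
fake_model = lambda p: f"response to: {p}"
for _ in range(25):
    cache.get_or_call("summarize today's tasks", fake_model)
print(cache.hits, cache.misses)  # 24 hits, 1 miss: a 96% hit rate
```

A repeated workflow only pays for the first call; every identical prompt afterward is free, which is exactly why caching dominates the savings.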
Step 10 — Reduce Reasoning Level
Some models support different reasoning intensities.
Higher reasoning levels typically consume more tokens.
In my setup, I configured the agent to use a lower reasoning level, which helps reduce token usage and speeds up responses.
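As a sketch, the setting might look like this. (The "reasoning" key and its value are assumptions; the actual field depends on the framework and model, so consult their documentation.)

```json
{
  "model": {
    "primary": "openrouter/minimax/minimax-m2.5",
    "reasoning": "low"
  }
}
```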
Step 11 — Implement a Budget Guardrail System
To prevent unexpected API spending, I implemented a custom budget guardrail system.
This is not a built-in feature in many agent frameworks but can be implemented manually.
Example file:
~/.openclaw/workspace/budget.json
Example configuration:
{
  "monthly_budget": 5.31,
  "daily_target": 0.20,
  "max_task_cost": 0.05,
  "spent_today": 0,
  "carry_over": 0
}
This system allows the agent to:
- limit monthly spending
- control daily usage
- prevent expensive tasks
It acts as a simple financial safety layer for AI automation systems.
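A minimal version of the guardrail check might look like this. The field names mirror budget.json above; the enforcement logic itself is my own sketch, not an OpenClaw feature.

```python
import json

def can_run_task(budget, estimated_cost):
    """Return True if a task fits both the per-task and daily limits."""
    if estimated_cost > budget["max_task_cost"]:
        return False
    # Unused budget carried over from previous days extends today's room.
    daily_room = budget["daily_target"] + budget["carry_over"] - budget["spent_today"]
    return estimated_cost <= daily_room

# In practice you would json.load() ~/.openclaw/workspace/budget.json;
# the values are inlined here so the sketch is self-contained.
budget = json.loads('''{
  "monthly_budget": 5.31,
  "daily_target": 0.20,
  "max_task_cost": 0.05,
  "spent_today": 0.18,
  "carry_over": 0
}''')

print(can_run_task(budget, 0.01))  # True: fits both limits
print(can_run_task(budget, 0.05))  # False: only ~$0.02 of daily room left
```

The agent would call a check like this before each task and either proceed, defer, or ask for approval.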
Step 12 — Agent Specialization
Instead of running one large agent, I divided responsibilities across multiple specialized agents.
Example architecture:
Zeus → Manager agent
Jack → Technical / SEO agent
Benefits of this structure:
- smaller prompts
- more focused context
- improved performance
- lower token usage
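The division of labor can be sketched as a small router: the manager inspects a request and hands it to the specialist, so each agent only ever sees its own short prompt. (The keyword matching below is illustrative, not OpenClaw's actual delegation mechanism.)

```python
# Each agent gets a short, focused system prompt instead of one giant one.
AGENT_PROMPTS = {
    "zeus": "You are Zeus, the system manager agent. Keep responses concise.",
    "jack": "You are Jack, the technical agent for SEO and development.",
}

def route(request):
    """Pick the specialist for a request; default to the manager."""
    technical_keywords = ("seo", "debug", "deploy", "code", "website")
    text = request.lower()
    if any(word in text for word in technical_keywords):
        return "jack"
    return "zeus"

print(route("Fix the website sitemap for SEO"))  # jack
print(route("What is on today's schedule?"))     # zeus
```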
Step 13 — Telegram Integration
The system communicates through Telegram using the Telegram Bot API.
Recommended architecture:
Telegram
↓
Zeus (Manager Agent)
↓
Delegates tasks
↓
Jack
Centralizing requests through a manager agent prevents unnecessary API calls from multiple agents responding independently.
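Replies go out through a single Bot API call made by the manager. The sendMessage endpoint shape below comes from Telegram's Bot API documentation; the helper function itself is just an illustrative sketch.

```python
def build_send_message(token, chat_id, text):
    """Build the URL and payload for Telegram's sendMessage method."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    payload = {"chat_id": chat_id, "text": text}
    return url, payload

# In the real system you would POST this with an HTTP client;
# the token here is a placeholder, not a valid bot token.
url, payload = build_send_message("123:ABC", 42, "Task delegated to Jack")
print(url)
print(payload["chat_id"], payload["text"])
```

Because only Zeus holds the bot token and sends messages, the specialist agents never talk to Telegram directly, which keeps the message flow (and the API call count) in one place.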
Example Agent Prompts
Zeus — Manager Agent
You are Zeus, the system manager agent.
Responsibilities:
- receive requests from Telegram
- analyze the request
- delegate tasks to the correct agent
Agents available:
Jack
Role: technical SEO and development
Rules:
- keep responses concise
- minimize token usage
Jack — Technical Agent
You are Jack, the technical agent.
Responsibilities:
- website automation
- technical SEO
- debugging
- development tasks
Final Result
After applying these optimizations, the system includes:
- low-cost model selection
- removal of fallback models
- context pruning
- memory compaction
- caching
- lower reasoning level
- custom budget guardrails
- agent specialization
Cost Reduction Results
In my experiment, combining these optimizations reduced API costs by approximately 85% compared to a default configuration.
Actual results will vary depending on:
- model selection
- workflow complexity
- prompt size
- caching effectiveness
Related Article
This experiment builds on my previous guide explaining the full AI agent architecture.
If you’d like to see the complete setup, you can read:
https://zeagatdula.com/blog/ai-agent-system-using-openclaw
Final Thoughts
AI agents are incredibly powerful tools for automation.
However, without proper optimization, they can become expensive to operate.
By combining:
- efficient models
- prompt optimization
- caching
- specialized agents
- budget controls
you can build AI automation systems that run at a fraction of the cost.