Core Workflow · Python 3.11+

tollama

Benchmark-backed hourly demand forecasting core.
Preprocess irregular series, benchmark models, and route operational forecast workloads through one API.

Time-series forecasting is still held back by incompatible runtimes, ad hoc preprocessing, and weak production evidence. tollama Core replaces that with one opinionated workflow for operations teams: preprocess, forecast, benchmark, and route based on benchmark evidence. It ships with a checked-in hourly-demand demo path and a thin artifact bundle.

Why another tool?

Every TSFM ships its own install, its own API, and its own dependency tree. Building on top of them means fighting fragmentation at every layer.

01 / FRAGMENTATION
Every model ships its own install.
Chronos needs torch, TimesFM needs JAX, Moirai needs einops. Installing multiple models in one environment leads to version hell.
02 / INCONSISTENT APIs
No common interface across models.
Different function signatures, parameter names, and output formats per family. Comparing models means writing adapters for each one.
03 / DEPENDENCY CONFLICTS
Models can't coexist.
torch 2.x and JAX fight over GPU memory allocation. Different CUDA requirements per family make unified environments impossible without isolation.
04 / NO EVIDENCE LOOP
Teams still guess which model to trust.
Without repeatable benchmarks and routing artifacts, production model choice becomes a manual judgment call instead of an evidence-backed workflow.
tollama — bash
# Start the forecasting daemon
$ tollama serve
✓ Daemon running at http://127.0.0.1:11435

# Pull and validate the Core path
$ tollama quickstart
Pulling model, running forecast demo, printing next Core steps...

# Run the hourly-demand concrete solution path
$ USE_CHECKED_IN_INPUT=1 MODELS=mock bash examples/core_concrete_solution_demo.sh
result.json · routing.json · leaderboard.csv · summary.md
Quickstart

Install the Core path, start the daemon, and run the hourly-demand concrete solution in under five minutes.

1) Install
python -m pip install "tollama[eval,preprocess]"

# from source (dev):
python -m pip install -e ".[dev]"
2) Start the daemon
# terminal 1
tollama serve

# check health + diagnostics
curl http://localhost:11435/api/version
tollama doctor
tollama info --json
3) Run the concrete-solution demo
# terminal 2
tollama quickstart
USE_CHECKED_IN_INPUT=1 MODELS=mock bash examples/core_concrete_solution_demo.sh
tollama routing show artifacts/core-solution/benchmark/result.json --json

Human-friendly progress is enabled automatically on interactive terminals. The upstream repo now includes a checked-in hourly benchmark input, a concrete-solution walkthrough (docs/concrete-solution.md), and an expected artifact bundle (examples/core_solution_expected_output).

// Core Workflow
4 Core actions.
One evidence loop.

The place to start is not platform breadth but one operational loop: preprocess irregular series, forecast, benchmark, and route future requests.

Preprocess
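import numpy as np
from tollama.preprocess import PreprocessConfig, run_pipeline

# Synthetic series: 48 evenly spaced points with three missing observations.
x = np.arange(48, dtype=float)
y = np.sin(x * 0.15) * 10
y[[7, 19, 33]] = np.nan

# Run the pipeline with a 12-step lookback and a 4-step horizon.
result = run_pipeline(x, y, config=PreprocessConfig(lookback=12, horizon=4))
print(result.X.shape, result.y.shape)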

Spline-based preprocessing is the Core differentiator for irregular series with gaps, smoothing needs, and leakage-safe window generation.
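
To make gap handling and window generation concrete, here is a minimal pure-Python sketch. It is not tollama's implementation: it swaps the spline for plain linear interpolation to stay dependency-free, and fill_gaps and make_windows are hypothetical helper names. "Leakage-safe" is taken here to mean that each window's targets start strictly after its inputs end.

```python
import math


def fill_gaps(y):
    """Fill NaN gaps by linear interpolation between the nearest known points.

    Linear interpolation stands in for the spline fit (an assumption made to
    keep this sketch dependency-free). Leading or trailing gaps copy the
    nearest known value.
    """
    known = [i for i, v in enumerate(y) if not math.isnan(v)]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(y)
    for i, v in enumerate(y):
        if not math.isnan(v):
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = y[right]
        elif right is None:
            filled[i] = y[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = y[left] + weight * (y[right] - y[left])
    return filled


def make_windows(y, lookback, horizon):
    """Return (inputs, targets); targets always start after the inputs end."""
    X, Y = [], []
    for start in range(len(y) - lookback - horizon + 1):
        split = start + lookback
        X.append(y[start:split])
        Y.append(y[split:split + horizon])
    return X, Y


# Same synthetic series as the example above, with three missing points.
series = [math.sin(i * 0.15) * 10 for i in range(48)]
for gap in (7, 19, 33):
    series[gap] = float("nan")

clean = fill_gaps(series)
X, Y = make_windows(clean, lookback=12, horizon=4)
print(len(X), len(X[0]), len(Y[0]))
```

With 48 points, a lookback of 12, and a horizon of 4, the sketch produces 33 windows. Any scaling step would need to be fitted on the training portion only to stay leakage-safe in the same sense.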

Forecast
from tollama import Tollama

sdk = Tollama()
forecast = sdk.forecast(
    model="chronos2",
    series={"target": [10, 11, 12, 13, 14], "freq": "D"},
    horizon=3,
)
print(forecast.to_df())

Core keeps the forecast contract stable across TSFMs and neural baselines, even when runtime dependencies differ per family.

Benchmark
tollama benchmark examples/core_solution_hourly_input.json \
  --models chronos2,granite-ttm-r2,timesfm-2.5-200m,moirai-2.0-R-small \
  --horizon 24 \
  --folds 2 \
  --output artifacts/core-solution/benchmark

Core benchmark output is intentionally thin: result.json, routing.json, leaderboard.csv, and the operator-facing summary.md.
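
A quick way to confirm a benchmark run produced the full bundle is to check for the four files named above. The sketch below is only an illustration: missing_artifacts is a hypothetical helper, not part of tollama, and it checks file presence only, not file contents.

```python
from pathlib import Path

# File names come from the text above; the helper itself is hypothetical.
EXPECTED_ARTIFACTS = ("result.json", "routing.json", "leaderboard.csv", "summary.md")


def missing_artifacts(output_dir):
    """Return the expected artifact names that are absent from output_dir."""
    root = Path(output_dir)
    return [name for name in EXPECTED_ARTIFACTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_artifacts("artifacts/core-solution/benchmark")
    print("bundle complete" if not missing else f"missing: {', '.join(missing)}")
```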

Route
tollama routing apply artifacts/core-solution/benchmark/result.json
tollama routing show artifacts/core-solution/benchmark/result.json --json

python -c "from tollama import Tollama; \
sdk = Tollama(); \
resp = sdk.auto_forecast(series={'target':[10,11,12,13,14],'freq':'D'}, horizon=3, mode='high_accuracy'); \
print(resp.selection.chosen_model)"

Routing turns benchmark evidence into reusable defaults like default, fast_path, and high_accuracy, while preserving lane rationale for later Trust attachment.
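
One way to picture how benchmark evidence could become lanes is the sketch below. Everything in it is an assumption for illustration: the row fields (error, latency_ms), the selection rules, the error_tolerance parameter, and the numbers are made up and do not reflect tollama's actual result.json schema or routing logic.

```python
# Hypothetical leaderboard rows: field names and numbers are illustrative only,
# not tollama's real schema or real benchmark results.
leaderboard = [
    {"model": "model_a", "error": 0.82, "latency_ms": 900.0},
    {"model": "model_b", "error": 0.95, "latency_ms": 120.0},
    {"model": "model_c", "error": 0.88, "latency_ms": 300.0},
]


def build_lanes(rows, error_tolerance=0.10):
    """Map benchmark rows to routing lanes, keeping a rationale per lane."""
    if not rows:
        raise ValueError("leaderboard is empty")
    best = min(rows, key=lambda r: r["error"])
    fastest = min(rows, key=lambda r: r["latency_ms"])
    # default: the fastest model whose error is within tolerance of the best.
    limit = best["error"] * (1.0 + error_tolerance)
    near_best = [r for r in rows if r["error"] <= limit]
    default = min(near_best, key=lambda r: r["latency_ms"])
    return {
        "high_accuracy": {"model": best["model"], "why": "lowest error"},
        "fast_path": {"model": fastest["model"], "why": "lowest latency"},
        "default": {
            "model": default["model"],
            "why": f"fastest within {error_tolerance:.0%} of best error",
        },
    }


lanes = build_lanes(leaderboard)
for lane, choice in lanes.items():
    print(f"{lane}: {choice['model']} ({choice['why']})")
```

Storing the "why" next to each choice is what keeps the lane rationale available after the benchmark run itself is gone.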

// Model Registry
Forecasting backends.
One Core contract.

The upstream registry spans TSFMs and neural baselines that share the same Core CLI, SDK, and HTTP forecast contract while family runtimes stay isolated under ~/.tollama/runtimes/.

PASSING
chronos2
Amazon · Chronos-2
past-numeric past-categorical known-future
PASSING
granite-ttm-r2
IBM · Granite TTM
past-numeric known-future-numeric
PASSING
timesfm-2.5-200m
Google · TimesFM 2.5
past-numeric xreg-support xreg-mode
PASSING
moirai-2.0-R-small
Salesforce · Uni2TS
past-numeric known-future-numeric
PASSING
sundial-base-128m
THUML · Sundial
target-only freq-agnostic
PASSING
toto-open-base-1.0
Datadog · Toto
past-numeric open-base
PASSING
lag-llama
TSFM Community · Lag-Llama
target-only zero-shot
BASELINE
patchtst
IBM Granite · PatchTST
neural-baseline target-only
BASELINE
tide
Unit8 / Darts · TiDE
neural-baseline past-numeric known-future
BASELINE
n-hits
Nixtla / NeuralForecast · N-HiTS
neural-baseline target-only auto-fit
BASELINE
n-beatsx
Nixtla / NeuralForecast · N-BEATSx
neural-baseline target-only auto-fit
Model Family | Past Numeric | Past Categorical | Known-Future Numeric | Known-Future Categorical
Chronos-2
Granite TTM
TimesFM 2.5
Uni2TS / Moirai
Sundial
Toto Open Base
Lag-Llama
PatchTST
TiDE
N-HiTS / N-BEATSx
// Additional Integrations
Available after Core.
5 integration pathways.

These integrations remain available, but they are intentionally secondary to the Core workflow. Start with preprocess, forecast, benchmark, and route first.

15 MCP Tools
13 LangChain Tools
5+ Agent Pathways
42 REST Endpoints
🔌
MCP Server (15 tools)

Register tollama-mcp as an MCP server. AI assistants discover and call 15 forecasting tools — forecast, auto-forecast, pipeline, what-if, model management, and data ingest.

🦜
LangChain (13 tools)

First-party LangChain toolkit with 13 tools wrapping the full API surface. Compose forecast chains, embed in ReAct agents, or use with LangGraph workflows.

🤖
CrewAI / AutoGen / smolagents

Framework adapters now ship directly in the package: CrewAI tools, AutoGen tool specs plus function maps, and smolagents-compatible tool wrappers.

🧩
OpenClaw Skills

Skill package at skills/tollama-forecast/ with health, models, forecast, pull, rm, and info wrappers. E2E validated with contract-first error handling.

🤝
A2A Protocol (JSON-RPC)

Authenticated discovery plus task lifecycle support via POST /a2a and /.well-known/agent-card.json, including message/stream, tasks/get, tasks/query, and tasks/cancel.
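
Since the A2A endpoint speaks JSON-RPC, a request body for POST /a2a would follow the JSON-RPC 2.0 envelope. The sketch below only builds that envelope. The params shape and the task ID are placeholders, and authentication headers are omitted; none of this is taken from tollama's documented schema.

```python
import itertools
import json

_request_ids = itertools.count(1)


def jsonrpc_request(method, params):
    """Build a JSON-RPC 2.0 request envelope with a fresh numeric id."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": method,
        "params": params,
    }


# "task-123" is a placeholder; the params shape is an assumption.
body = json.dumps(jsonrpc_request("tasks/get", {"id": "task-123"}))
print(body)
```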

A daemon that
manages everything.

The tollamad daemon supervises worker-per-family runtimes, keeps the public contract stable, and auto-bootstraps an isolated venv per backend when needed.

  • tollamad: Public daemon for forecast, benchmark, routing, and lifecycle requests
  • runtimes/: Per-family isolated venvs with auto-bootstrap under ~/.tollama/runtimes/
  • JSON-lines: stdio RPC between daemon and runners
  • SQLite: Per-key usage metering at /api/usage
  • SSE: Progressive forecasts and daemon events via /api/events
  • API keys: Optional auth for daemon, docs, dashboard, and A2A
[Architecture diagram: the tollama daemon (tollamad; FastAPI, 42 endpoints, SSE, Dashboard) connects over the JSON-lines stdio protocol to per-family venv runners: chronos2, timesfm, moirai, lag-llama, and baselines.]
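
The JSON-lines stdio idea is simple enough to sketch: each request is one JSON object on one line of the runner's stdin, and each response is one JSON object on one line of its stdout. The message fields below (id, method, params, result, error) and the toy handler are assumptions for illustration, not tollama's actual wire format.

```python
import io
import json


def handle(request):
    """Toy handler: a naive forecast that repeats the last observed value."""
    target = request["params"]["target"]
    horizon = request["params"]["horizon"]
    return {"id": request["id"], "result": {"mean": [target[-1]] * horizon}}


def serve(stdin, stdout):
    """Read one JSON request per line and write one JSON response per line."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue
        try:
            response = handle(json.loads(line))
        except (KeyError, IndexError, TypeError, ValueError) as exc:
            # Malformed input becomes an error response instead of a crash.
            response = {"id": None, "error": str(exc)}
        stdout.write(json.dumps(response) + "\n")
        stdout.flush()


# In a real runner these streams would be sys.stdin and sys.stdout.
fake_in = io.StringIO(
    '{"id": 1, "method": "forecast", "params": {"target": [10, 11, 12], "horizon": 2}}\n'
)
fake_out = io.StringIO()
serve(fake_in, fake_out)
print(fake_out.getvalue(), end="")
```

Line-delimited JSON over stdio needs no ports or sockets, which suits a daemon that supervises runners living in separate venvs.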
// Advanced Features
Beyond basic forecasting.
🧠
Structured Intelligence

Combined analysis, recommendation, and forecast in a single call via /api/report, with optional narrative blocks.

📡
Progressive Forecasting

Real-time forecast refinement and daemon event feeds over SSE via /api/forecast/progressive and /api/events.

🧪
Analyze + Generate

Model-free descriptive analysis at /api/analyze and synthetic series generation at /api/generate.

📥
Upload + Ingest

Forecast directly from CSV or Parquet using data_url, /api/forecast/upload, or /api/ingest/upload.

🔗
Workflow + Compare

Chain benchmarks, comparisons, and end-to-end plans through /api/compare and /api/pipeline.

🔮
What-if + Counterfactual + Trees

Explore alternative futures with /api/what-if, /api/counterfactual, and /api/scenario-tree.

⚖️
Auto-Forecast + Ensemble

Zero-config model selection via /api/auto-forecast, with ensemble mean and median strategies available today.

📄
TSModelfiles + Config

Create named forecast profiles with tollama modelfile and manage pull or routing defaults with tollama config.

🔒
Auth + Metrics + Diagnostics

Optional API-key auth, docs protection, usage metering at /api/usage, Prometheus at /metrics, and full diagnostics at /api/info.
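
The SSE feeds mentioned above (/api/forecast/progressive and /api/events) use the standard Server-Sent Events text format, so a client mostly needs a small line parser. The sketch below handles the event and data fields only. The sample stream is made up, and its event names and payloads are placeholders rather than tollama's actual event schema.

```python
def parse_sse(lines):
    """Yield (event, data) tuples from an iterable of SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if it has data.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


# Made-up stream: event names and payloads are placeholders.
sample = [
    ": keep-alive\n",
    "event: progress\n",
    'data: {"step": 1}\n',
    "\n",
    'data: {"done": true}\n',
    "\n",
]
for event, data in parse_sse(sample):
    print(event, data)
```

Against a live daemon, the same parser could be fed decoded lines from a streaming HTTP response.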

42 endpoints.
One daemon.

The current endpoint inventory spans system diagnostics, model lifecycle, upload plus ingest, stable v1 routes, structured analysis, scenario workflows, TSModelfiles, observability, and A2A discovery.

json · /api/forecast POST
{
  "model": "timesfm-2.5-200m",
  "horizon": 2,
  "series": [{
    "id": "store_001",
    "freq": "D",
    "target": [120, 135, 142],
    "actuals": [141, 145],
    "past_covariates": {
      "promo": [0, 1, 0]
    },
    "future_covariates": {
      "promo": [1, 0]
    }
  }],
  "parameters": {
    "covariates_mode": "best_effort",
    "metrics": {
      "names": ["mape", "mase", "mae", "rmse", "smape"],
      "mase_seasonality": 1
    }
  }
}
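
The metrics requested in this payload have standard textbook definitions, sketched below for reference. How tollama computes or scales them (for example percent versus fraction for mape and smape) is not stated here, so treat this as an assumption. The forecast values are placeholders, not real model output.

```python
import math


def forecast_metrics(history, actuals, forecast, mase_seasonality=1):
    """Textbook point-forecast metrics; MAPE and sMAPE are in percent."""
    if len(actuals) != len(forecast):
        raise ValueError("actuals and forecast must have the same length")
    n = len(actuals)
    abs_err = [abs(a - f) for a, f in zip(actuals, forecast)]
    mae = sum(abs_err) / n
    rmse = math.sqrt(sum(e * e for e in abs_err) / n)
    mape = 100.0 * sum(e / abs(a) for e, a in zip(abs_err, actuals)) / n
    smape = 100.0 * sum(
        2.0 * e / (abs(a) + abs(f)) for e, a, f in zip(abs_err, actuals, forecast)
    ) / n
    # MASE scales MAE by the in-sample error of a seasonal-naive forecast.
    m = mase_seasonality
    naive_err = [abs(history[i] - history[i - m]) for i in range(m, len(history))]
    mase = mae / (sum(naive_err) / len(naive_err))
    return {"mae": mae, "rmse": rmse, "mape": mape, "smape": smape, "mase": mase}


history = [120, 135, 142]   # "target" from the request above
actuals = [141, 145]        # "actuals" from the request above
forecast = [140.0, 147.0]   # placeholder values, not real model output
print(forecast_metrics(history, actuals, forecast))
```

MAPE is undefined when an actual value is zero, and MASE when the history is constant, so a production implementation would need to guard both cases.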
System
/api/version
/api/info
/v1/health
Forecasting
/api/validate
/api/forecast
/api/forecast/progressive
/v1/forecast
/api/auto-forecast
Structured Analysis
/api/analyze
/api/generate
/api/compare
/api/report
/api/pipeline
Model Management
/api/tags
/api/show
/api/pull
/api/delete
/api/ps
/v1/models
Upload + Profiles
/api/ingest/upload
/api/forecast/upload
/api/modelfiles
/api/modelfiles/{name}
Observability + Agents
/api/usage
/api/events
/metrics
/.well-known/agent-card.json
/a2a
// Roadmap
What's next.

The upstream roadmap now tracks explicitly what has shipped and what remains for v1 hardening.

COMPLETED
Auto Model Comparison
Basic auto model comparison and selection are shipped, including ensemble mean and median strategies.
COMPLETED
Auto Data Preprocessing
Basic preprocessing is now implemented with validation, spline interpolation, smoothing, train-fit scaling, and sliding-window generation in tollama.preprocess.
IN PROGRESS
Runtime Hardening
Current TODOs center on VRAM reclaim policy, idle strategy, crash recovery behavior, and stronger runner lifecycle controls.
PLANNED
Local + Cloud Execution
The project remains local-first for v1; future packaging work is focused on better Docker and cloud deployment guidance rather than distributed training.
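
The ensemble mean and median strategies listed under Auto Model Comparison amount to combining per-model forecasts one horizon step at a time. The sketch below shows that idea only. The ensemble function, the model names, and the numbers are placeholders, not tollama's implementation or real results.

```python
from statistics import mean, median


def ensemble(forecasts, strategy="mean"):
    """Combine per-model forecasts step by step with mean or median."""
    if not forecasts:
        raise ValueError("need at least one forecast")
    horizons = {len(values) for values in forecasts.values()}
    if len(horizons) != 1:
        raise ValueError("all forecasts must share the same horizon")
    combine = {"mean": mean, "median": median}[strategy]
    return [combine(step) for step in zip(*forecasts.values())]


# Placeholder model names and numbers, for illustration only.
forecasts = {
    "model_a": [10.0, 12.0, 14.0],
    "model_b": [11.0, 12.0, 20.0],
    "model_c": [12.0, 15.0, 17.0],
}
print(ensemble(forecasts, "mean"))
print(ensemble(forecasts, "median"))
```

The median is less sensitive to a single model's outlier step than the mean, which is the usual reason to offer both.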
// Get Started

Forecast, compare,
and serve simply.

Runtime management, analysis, ingest, dashboards, and agent integration all ship in the same platform.
