tollama Core | Tollama AI

// The Problem

Why another tool?

Every TSFM ships its own install, its own API, and its own dependency tree. Building on top of them means fighting fragmentation at every layer.

01 / FRAGMENTATION

Every model ships its own install.

Chronos needs torch, TimesFM needs JAX, Moirai needs einops. Installing multiple models in one environment leads to version hell.

02 / INCONSISTENT APIs

No common interface across models.

Different function signatures, parameter names, and output formats per family. Comparing models means writing adapters for each one.

03 / DEPENDENCY CONFLICTS

Models can't coexist.

torch 2.x and JAX fight over GPU memory allocation. Different CUDA requirements per family make unified environments impossible without isolation.

04 / NO EVIDENCE LOOP

Teams still guess which model to trust.

Without repeatable benchmarks and routing artifacts, production model choice becomes a manual judgment call instead of an evidence-backed workflow.

Quickstart

Install Core, start the daemon, and run the hourly-demand concrete-solution demo in under five minutes.

1) Install

python -m pip install "tollama[eval,preprocess]"

# from source (dev):
python -m pip install -e ".[dev]"

2) Start the daemon

# terminal 1
tollama serve

# check health + diagnostics
curl http://localhost:11435/api/version
tollama doctor
tollama info --json
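
The same health check can be scripted. The snippet below is a minimal sketch using only the Python standard library; it assumes the daemon listens on http://localhost:11435 as in the curl call above. The shape of the /api/version response is not documented here, so the body is returned as parsed JSON when possible and as raw text otherwise. The helper names are illustrative and not part of tollama.

```python
import json
import urllib.error
import urllib.request


def version_url(base_url: str) -> str:
    """Build the version endpoint URL from a daemon base URL."""
    return base_url.rstrip("/") + "/api/version"


def fetch_version(base_url: str = "http://localhost:11435", timeout: float = 2.0):
    """Return the daemon's version payload, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(version_url(base_url), timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: treat as "not healthy".
        return None
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        # Assumption: the endpoint returns JSON; fall back to raw text if not.
        return body


info = fetch_version()
print("daemon not reachable" if info is None else info)
```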

3) Run the concrete-solution demo

# terminal 2
tollama quickstart
USE_CHECKED_IN_INPUT=1 MODELS=mock bash examples/core_concrete_solution_demo.sh
tollama routing show artifacts/core-solution/benchmark/result.json --json

Human-friendly progress is enabled automatically on interactive terminals. The upstream repo now includes a checked-in hourly benchmark input, a concrete-solution walkthrough (docs/concrete-solution.md), and an expected artifact bundle (examples/core_solution_expected_output).

// Core Workflow

4 Core actions.
One evidence loop.

The first-touch story is not platform breadth. It is one operational loop: preprocess irregular series, forecast, benchmark, and route future requests.

Preprocess

import numpy as np
from tollama.preprocess import PreprocessConfig, run_pipeline

x = np.arange(48, dtype=float)
y = np.sin(x * 0.15) * 10
y[[7, 19, 33]] = np.nan

result = run_pipeline(x, y, config=PreprocessConfig(lookback=12, horizon=4))
print(result.X.shape, result.y.shape)

Spline-based preprocessing is the Core differentiator for irregular series with gaps, smoothing needs, and leakage-safe window generation.
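
To make "gap filling" and "leakage-safe windows" concrete, here is a dependency-free sketch. It is not tollama's implementation: it swaps in plain linear interpolation instead of splines, and both function names are hypothetical. Each window's targets come strictly after its inputs, so no future value leaks into the lookback.

```python
import math


def fill_gaps_linear(values):
    """Fill NaN gaps by linear interpolation between observed neighbours."""
    known = [i for i, v in enumerate(values) if not math.isnan(v)]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(values)
    for i, v in enumerate(values):
        if not math.isnan(v):
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:            # gap at the start: carry first value back
            filled[i] = values[right]
        elif right is None:         # gap at the end: carry last value forward
            filled[i] = values[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = values[left] * (1 - w) + values[right] * w
    return filled


def sliding_windows(values, lookback, horizon):
    """Split a series into (inputs, targets) pairs; targets follow inputs."""
    X, y = [], []
    for start in range(len(values) - lookback - horizon + 1):
        split = start + lookback
        X.append(values[start:split])
        y.append(values[split:split + horizon])
    return X, y


# Same toy series as above: 48 points with three missing values.
series = [math.sin(i * 0.15) * 10 for i in range(48)]
for gap in (7, 19, 33):
    series[gap] = float("nan")

X, y = sliding_windows(fill_gaps_linear(series), lookback=12, horizon=4)
print(len(X), len(X[0]), len(y[0]))  # 33 12 4
```

The shapes printed here belong to this sketch only; run_pipeline may window, smooth, or scale differently.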

Forecast

from tollama import Tollama

sdk = Tollama()
forecast = sdk.forecast(
    model="chronos2",
    series={"target": [10, 11, 12, 13, 14], "freq": "D"},
    horizon=3,
)
print(forecast.to_df())

Core keeps the forecast contract stable across TSFMs and neural baselines, even when runtime dependencies differ per family.

Benchmark

tollama benchmark examples/core_solution_hourly_input.json \
  --models chronos2,granite-ttm-r2,timesfm-2.5-200m,moirai-2.0-R-small \
  --horizon 24 \
  --folds 2 \
  --output artifacts/core-solution/benchmark

Core benchmark output is intentionally thin: result.json, routing.json, leaderboard.csv, and the operator-facing summary.md.
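
Before handing a benchmark directory to routing, a quick completeness check can help. This sketch only looks for the four file names listed above; it makes no assumption about their contents, and the helper name is illustrative.

```python
from pathlib import Path

# File names taken from the benchmark output description above.
EXPECTED_ARTIFACTS = ("result.json", "routing.json", "leaderboard.csv", "summary.md")


def missing_artifacts(output_dir):
    """Return the expected artifact names that are absent from output_dir."""
    root = Path(output_dir)
    return [name for name in EXPECTED_ARTIFACTS if not (root / name).is_file()]


missing = missing_artifacts("artifacts/core-solution/benchmark")
print("bundle complete" if not missing else f"missing: {', '.join(missing)}")
```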

Route

tollama routing apply artifacts/core-solution/benchmark/result.json
tollama routing show artifacts/core-solution/benchmark/result.json --json

python -c "from tollama import Tollama; \
sdk = Tollama(); \
resp = sdk.auto_forecast(series={'target':[10,11,12,13,14],'freq':'D'}, horizon=3, mode='high_accuracy'); \
print(resp.selection.chosen_model)"

Routing turns benchmark evidence into reusable defaults like default, fast_path, and high_accuracy, while preserving lane rationale for later Trust attachment.
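
One way to picture lanes is a small selection rule over a leaderboard. The sketch below is purely illustrative: the row fields, the latency budget, the selection rules, and the model names are all assumptions, and the numbers are made up. tollama's actual routing logic and routing.json schema may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LeaderboardRow:
    model: str
    error: float        # lower is better (hypothetical score)
    latency_ms: float   # lower is better (hypothetical timing)


def build_lanes(rows, latency_budget_ms):
    """Pick one model per lane and keep a short rationale for each choice."""
    if not rows:
        raise ValueError("leaderboard is empty")
    most_accurate = min(rows, key=lambda r: r.error)
    fastest = min(rows, key=lambda r: r.latency_ms)
    within_budget = [r for r in rows if r.latency_ms <= latency_budget_ms]
    # Default lane: best error within the budget, else the most accurate model.
    default = min(within_budget, key=lambda r: r.error) if within_budget else most_accurate
    return {
        "default": {"model": default.model,
                    "rationale": f"lowest error within {latency_budget_ms} ms"},
        "fast_path": {"model": fastest.model, "rationale": "lowest latency"},
        "high_accuracy": {"model": most_accurate.model, "rationale": "lowest error"},
    }


# Placeholder rows, not real benchmark results.
rows = [
    LeaderboardRow("model_a", error=0.82, latency_ms=40.0),
    LeaderboardRow("model_b", error=0.74, latency_ms=310.0),
    LeaderboardRow("model_c", error=0.91, latency_ms=12.0),
]
lanes = build_lanes(rows, latency_budget_ms=100.0)
for lane, choice in lanes.items():
    print(lane, choice["model"], "-", choice["rationale"])
```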

// Model Registry

Forecasting backends.
One Core contract.

The upstream registry spans TSFMs and neural baselines that share the same Core CLI, SDK, and HTTP forecast contract while family runtimes stay isolated under ~/.tollama/runtimes/.

PASSING

chronos2

Amazon · Chronos-2

past-numeric past-categorical known-future

PASSING

granite-ttm-r2

IBM · Granite TTM

past-numeric known-future-numeric

PASSING

timesfm-2.5-200m

Google · TimesFM 2.5

past-numeric xreg-support xreg-mode

PASSING

moirai-2.0-R-small

Salesforce · Uni2TS

past-numeric known-future-numeric

PASSING

sundial-base-128m

THUML · Sundial

target-only freq-agnostic

PASSING

toto-open-base-1.0

Datadog · Toto

past-numeric open-base

PASSING

lag-llama

TSFM Community · Lag-Llama

target-only zero-shot

BASELINE

patchtst

IBM Granite · PatchTST

neural-baseline target-only

BASELINE

tide

Unit8 / Darts · TiDE

neural-baseline past-numeric known-future

BASELINE

n-hits

Nixtla / NeuralForecast · N-HiTS

neural-baseline target-only auto-fit

BASELINE

n-beatsx

Nixtla / NeuralForecast · N-BEATSx

neural-baseline target-only auto-fit

Model Family	Past Numeric	Past Categorical	Known-Future Numeric	Known-Future Categorical
Chronos-2	✓	✓	✓	✓
Granite TTM	✓	—	✓	—
TimesFM 2.5	✓	—	✓	—
Uni2TS / Moirai	✓	—	✓	—
Sundial	—	—	—	—
Toto Open Base	✓	—	—	—
Lag-Llama	—	—	—	—
PatchTST	—	—	—	—
TiDE	✓	—	✓	—
N-HiTS / N-BEATSx	—	—	—	—
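
The support matrix above can be turned into a small lookup for pre-filtering model families by the covariates a dataset carries. This is a sketch that transcribes the table by hand; the constant and function names are illustrative and not part of the tollama SDK.

```python
PAST_NUM = "past_numeric"
PAST_CAT = "past_categorical"
FUT_NUM = "known_future_numeric"
FUT_CAT = "known_future_categorical"

# Transcribed from the covariate support table above.
SUPPORT = {
    "Chronos-2": {PAST_NUM, PAST_CAT, FUT_NUM, FUT_CAT},
    "Granite TTM": {PAST_NUM, FUT_NUM},
    "TimesFM 2.5": {PAST_NUM, FUT_NUM},
    "Uni2TS / Moirai": {PAST_NUM, FUT_NUM},
    "Sundial": set(),
    "Toto Open Base": {PAST_NUM},
    "Lag-Llama": set(),
    "PatchTST": set(),
    "TiDE": {PAST_NUM, FUT_NUM},
    "N-HiTS / N-BEATSx": set(),
}


def families_supporting(required):
    """List families whose supported covariate kinds cover `required`."""
    needed = set(required)
    return [family for family, kinds in SUPPORT.items() if needed <= kinds]


print(families_supporting([PAST_NUM, FUT_NUM]))
print(families_supporting([FUT_CAT]))
```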

// Additional Integrations

Available after Core.
5 integration pathways.

These integrations remain available, but they are intentionally secondary to the Core workflow. Start with preprocess, forecast, benchmark, and route.

15 MCP Tools

13 LangChain Tools

5+ Agent Pathways

42 REST Endpoints

🔌

MCP Server (15 tools)

Register tollama-mcp as an MCP server. AI assistants discover and call 15 forecasting tools — forecast, auto-forecast, pipeline, what-if, model management, and data ingest.

🦜

LangChain (13 tools)

First-party LangChain toolkit with 13 tools wrapping the full API surface. Compose forecast chains, embed in ReAct agents, or use with LangGraph workflows.

🤖

CrewAI / AutoGen / smolagents

Framework adapters now ship directly in the package: CrewAI tools, AutoGen tool specs plus function maps, and smolagents-compatible tool wrappers.

🧩

OpenClaw Skills

Skill package at skills/tollama-forecast/ with health, models, forecast, pull, rm, and info wrappers. E2E validated with contract-first error handling.

🤝

A2A Protocol (JSON-RPC)

Authenticated discovery plus task lifecycle support via POST /a2a and /.well-known/agent-card.json, including message/stream, tasks/get, tasks/query, and tasks/cancel.
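
Since the A2A surface is JSON-RPC, each call to POST /a2a is wrapped in a JSON-RPC 2.0 envelope. The sketch below builds one for tasks/get. The params shape (a task "id") and the task identifier are assumptions; check the agent card at /.well-known/agent-card.json for what the daemon actually expects.

```python
import itertools
import json

_request_ids = itertools.count(1)


def jsonrpc_request(method, params):
    """Wrap a method call in a JSON-RPC 2.0 request envelope."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": method,
        "params": params,
    }


# "task-123" is a placeholder task identifier.
payload = jsonrpc_request("tasks/get", {"id": "task-123"})
print(json.dumps(payload))
```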

// Architecture

A daemon that
manages everything.

The tollamad daemon supervises worker-per-family runtimes, keeps the public contract stable, and automatically bootstraps an isolated venv per backend when needed.

tollamad: Public daemon for forecast, benchmark, routing, and lifecycle requests
runtimes/: Per-family isolated venvs with auto-bootstrap under ~/.tollama/runtimes/
JSON-lines: stdio RPC between daemon and runners
SQLite: Per-key usage metering at /api/usage
SSE: Progressive forecasts and daemon events via /api/events
API keys: Optional auth for daemon, docs, dashboard, and A2A
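
JSON-lines framing means one JSON document per line, which makes it easy to stream over a runner's stdin and stdout. The sketch below shows the framing only, with an in-memory buffer standing in for the pipe. The message fields ("id", "method", "params") are hypothetical and not tollama's actual wire schema.

```python
import io
import json


def write_message(stream, message):
    """Serialize one message as a single line and flush it."""
    # json.dumps escapes embedded newlines, so each message stays on one line.
    stream.write(json.dumps(message, separators=(",", ":")) + "\n")
    stream.flush()


def read_messages(stream):
    """Yield one decoded message per non-empty line."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


pipe = io.StringIO()  # stands in for a runner's stdin/stdout pipe
write_message(pipe, {"id": 1, "method": "forecast", "params": {"horizon": 3}})
write_message(pipe, {"id": 2, "method": "ping"})
pipe.seek(0)

messages = list(read_messages(pipe))
print([m["id"] for m in messages])  # [1, 2]
```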

[Architecture diagram: the tollama daemon (tollamad; FastAPI, 42 endpoints, SSE, Dashboard) connects over the JSON-lines stdio protocol to one isolated venv per family: chronos2, timesfm, moirai, lag-llama, plus baselines.]

// Advanced Features

Beyond basic forecasting.

🧠

Structured Intelligence

Combined analysis, recommendation, and forecast in a single call via /api/report, with optional narrative blocks.

📡

Progressive Forecasting

Real-time forecast refinement and daemon event feeds over SSE via /api/forecast/progressive and /api/events.
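
Consuming either stream comes down to parsing the standard text/event-stream format. Below is a minimal sketch of such a parser, fed with a hand-written sample. The event name and data payloads are hypothetical; the events tollama actually emits are not documented here.

```python
def parse_sse(lines):
    """Yield (event, data) pairs from an iterable of text/event-stream lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line ends the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)


# Hand-written sample stream; event name and payloads are placeholders.
sample = [
    "event: forecast.partial\n",
    'data: {"step": 1}\n',
    "\n",
    ": keep-alive\n",
    'data: {"step": 2}\n',
    "\n",
]
events = list(parse_sse(sample))
for name, payload in events:
    print(name, payload)
```

In practice the lines would come from a streaming HTTP response rather than a list.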

🧪

Analyze + Generate

Model-free descriptive analysis at /api/analyze and synthetic series generation at /api/generate.

📥

Upload + Ingest

Forecast directly from CSV or Parquet using data_url, /api/forecast/upload, or /api/ingest/upload.

🔗

Workflow + Compare

Chain benchmarks, comparisons, and end-to-end plans through /api/compare and /api/pipeline.

🔮

What-if + Counterfactual + Trees

Explore alternative futures with /api/what-if, /api/counterfactual, and /api/scenario-tree.

⚖️

Auto-Forecast + Ensemble

Zero-config model selection via /api/auto-forecast, with ensemble mean and median strategies available today.
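
Mean and median ensembling both combine per-model forecasts step by step. The sketch below shows the idea on placeholder numbers; the function name and input layout are illustrative, not the /api/auto-forecast response format.

```python
from statistics import mean, median


def ensemble(forecasts, strategy="mean"):
    """Combine per-model forecasts point-wise with the chosen strategy."""
    combine = {"mean": mean, "median": median}[strategy]
    horizons = {len(values) for values in forecasts.values()}
    if len(horizons) != 1:
        raise ValueError("all forecasts must share one horizon")
    # zip(*...) groups the values that each model predicts for the same step.
    return [combine(step) for step in zip(*forecasts.values())]


# Placeholder forecasts, not real model output.
forecasts = {
    "model_a": [10.0, 12.0, 14.0],
    "model_b": [11.0, 12.0, 20.0],
    "model_c": [12.0, 15.0, 17.0],
}
print(ensemble(forecasts, "mean"))    # [11.0, 13.0, 17.0]
print(ensemble(forecasts, "median"))  # [11.0, 12.0, 17.0]
```

The median is less sensitive to a single outlying model, which is visible at the second step.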

📄

TSModelfiles + Config

Create named forecast profiles with tollama modelfile and manage pull or routing defaults with tollama config.

🔒

Auth + Metrics + Diagnostics

Optional API-key auth, docs protection, usage metering at /api/usage, Prometheus at /metrics, and full diagnostics at /api/info.

// REST API

42 endpoints.
One daemon.

The current endpoint inventory spans system diagnostics, model lifecycle, upload plus ingest, stable v1 routes, structured analysis, scenario workflows, TSModelfiles, observability, and A2A discovery.

POST /api/forecast (json)

{
  "model": "timesfm-2.5-200m",
  "horizon": 2,
  "series": [{
    "id": "store_001",
    "freq": "D",
    "target": [120, 135, 142],
    "actuals": [141, 145],
    "past_covariates": {
      "promo": [0, 1, 0]
    },
    "future_covariates": {
      "promo": [1, 0]
    }
  }],
  "parameters": {
    "covariates_mode": "best_effort",
    "metrics": {
      "names": ["mape", "mase", "mae", "rmse", "smape"],
      "mase_seasonality": 1
    }
  }
}
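
In the example request, past covariates match the target length, while future covariates and actuals match the horizon. The sketch below checks that pattern client-side and prepares the POST without sending it. Whether the daemon enforces these lengths is an assumption, as is the base URL; the helper names are illustrative.

```python
import json
import urllib.request


def check_lengths(payload):
    """Return a list of length mismatches found in a forecast payload."""
    horizon = payload["horizon"]
    problems = []
    for series in payload["series"]:
        sid = series.get("id", "<unnamed>")
        n = len(series["target"])
        for name, values in series.get("past_covariates", {}).items():
            if len(values) != n:
                problems.append(f"{sid}: past covariate {name!r} has {len(values)} values, expected {n}")
        for name, values in series.get("future_covariates", {}).items():
            if len(values) != horizon:
                problems.append(f"{sid}: future covariate {name!r} has {len(values)} values, expected {horizon}")
        actuals = series.get("actuals")
        if actuals is not None and len(actuals) != horizon:
            problems.append(f"{sid}: actuals has {len(actuals)} values, expected {horizon}")
    return problems


def build_request(payload, base_url="http://localhost:11435"):
    """Prepare (but do not send) a JSON POST to /api/forecast."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/forecast",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = {
    "model": "timesfm-2.5-200m",
    "horizon": 2,
    "series": [{
        "id": "store_001",
        "freq": "D",
        "target": [120, 135, 142],
        "actuals": [141, 145],
        "past_covariates": {"promo": [0, 1, 0]},
        "future_covariates": {"promo": [1, 0]},
    }],
}
print(check_lengths(payload) or "payload looks consistent")
request = build_request(payload)
print(request.get_method(), request.full_url)
```

Passing the prepared request to urllib.request.urlopen would send it once the daemon is running.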

// Roadmap

What's next.

The upstream roadmap is now implementation-aware and explicitly tracks what is shipped versus what remains for v1 hardening.

COMPLETED

Auto Model Comparison

Basic auto model comparison and selection are shipped, including ensemble mean and median strategies.

COMPLETED

Auto Data Preprocessing

Basic preprocessing is now implemented with validation, spline interpolation, smoothing, train-fit scaling, and sliding-window generation in tollama.preprocess.

IN PROGRESS

Runtime Hardening

Current TODOs center on VRAM reclaim policy, idle strategy, crash recovery behavior, and stronger runner lifecycle controls.

PLANNED

Local + Cloud Execution

The project remains local-first for v1; future packaging work is focused on better Docker and cloud deployment guidance rather than distributed training.

tollama

Forecast, compare, and serve simply.