Thesis

The operating system for machine learning.

Thesis helps you improve models systematically.

Download for macOS

From hypothesis to measurable improvement.

Turn ideas into experiments, run them end-to-end, and ship the best-performing version — with clear, tracked deltas.

Learn about Agent Mode

Hypothesis → Delta

Baseline. Change. Measure.

fraud.py

Baseline (naive split)

Running…

tx = load_table("card_transactions.parquet")
train, val = split(tx, method="random", test_size=0.2)

X_train, y_train = featurize(train, label="is_fraud")
X_val, y_val = featurize(val, label="is_fraud")

clf = fit(
  model="xgb_classifier",
  X=X_train,
  y=y_train
)

p = clf.predict_proba(X_val)[:, 1]
auc = pr_auc(y_val, p)
print(auc)
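The baseline above splits transactions at random, so rows from later dates can end up in training while earlier ones are used for validation. One change worth testing is a time-based split. The following is a minimal sketch in plain Python, not Thesis's own implementation: the `time_split` helper, the `ts` timestamp field, and the toy rows are all hypothetical.

```python
from typing import Any


def time_split(rows: list[dict[str, Any]], time_key: str = "ts", test_size: float = 0.2):
    """Hold out the most recent `test_size` fraction of rows for validation."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be strictly between 0 and 1")
    ordered = sorted(rows, key=lambda r: r[time_key])
    cut = len(ordered) - round(len(ordered) * test_size)
    return ordered[:cut], ordered[cut:]


# Toy transactions in shuffled order; `ts` is a hypothetical timestamp column.
rows = [{"ts": t, "is_fraud": int(t % 7 == 0)} for t in [5, 1, 9, 3, 7, 2, 8, 4, 10, 6]]
train, val = time_split(rows, test_size=0.2)

# Every validation row is at least as recent as every training row.
assert max(r["ts"] for r in train) <= min(r["ts"] for r in val)
```

Scoring the same model on both splits gives the tracked delta: the difference in PR-AUC between the naive baseline and the time-split run.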

Three modes for every workflow

Ask mode for safe exploration without changes. Plan mode for research and structured planning. Agent mode for full autonomous execution. You control how much independence to give the AI.

Learn about Modes

Your prompt:

“Build a sentiment classifier for product reviews”

Ask

Read-only

For sentiment analysis, you'll want a transformer-based model like BERT or DistilBERT. Fine-tune on your labeled dataset, then evaluate with precision/recall metrics...

Plan

Planning

I'll build a sentiment classifier using DistilBERT...

Plan

1. Load and preprocess the reviews dataset
2. Fine-tune DistilBERT for classification
3. Evaluate on held-out test set
4. Export model for inference

Agent

Execution

I'll build a sentiment classifier using DistilBERT...

Plan

1. Load and preprocess the reviews dataset
2. Fine-tune DistilBERT for classification
3. Evaluate on held-out test set
4. Export model for inference

Changes made

├── data/preprocessing.py (new)
├── models/sentiment_classifier.py (new)
└── train.py (+45 lines)

Choose your intelligence

On-premise

Run Thesis entirely on your own machine or cluster using Ollama. Your data never leaves your environment.

Learn about on-prem

Runtime: Local
Model: llama3.1:8b
Data: local only
Network: 0 external calls

Ideal for regulated or private data.
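For a sense of what "local only" looks like in practice, here is a hedged sketch of a request to a locally running Ollama server, using Ollama's `/api/generate` endpoint on its default port. How Thesis itself talks to Ollama is not described here, so the `build_generate_request` and `generate` helpers and the example prompt are hypothetical.

```python
import json
import urllib.request

OLLAMA_HOST = "http://localhost:11434"  # Ollama's default local address


def build_generate_request(prompt: str, model: str = "llama3.1:8b",
                           host: str = OLLAMA_HOST) -> urllib.request.Request:
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3.1:8b") -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_generate_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]


# Building the request touches no network; only generate() sends it.
req = build_generate_request("Suggest features for a fraud model.")
```

Calling `generate(...)` requires Ollama to be running locally with the model already pulled. The request targets `localhost`, so the prompt stays on the machine.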

Frontier models

Access cutting-edge models from OpenAI, including reasoning models optimized for complex ML workflows.

Explore models

GPT-4o (Suggested)
o1-preview (Reasoning)
o3-mini (Reasoning)

Switch models per task or per run.

Integrated web search

Let the agent pull in documentation, papers, and live information while it works — without breaking context.

Learn about search

PyTorch DataLoader best practices

Searching documentation…

CLI-first workflow

Run Thesis programmatically from the command line. Perfect for automation, CI/CD, and large-scale experiments.

View CLI docs

$ thesis exec \
    "train a fraud model with time-split eval" \
    --model o1-preview \
    --mode agent
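In automation or CI, the same command can be assembled and launched from a script. The sketch below uses only the `exec` subcommand and the `--model` and `--mode` flags shown above. The `thesis_exec_argv` and `run_thesis` helpers are hypothetical, and treating a non-zero exit code as failure is an assumption about the CLI, not documented behavior.

```python
import shlex
import subprocess


def thesis_exec_argv(prompt: str, model: str = "o1-preview", mode: str = "agent") -> list[str]:
    """Build the argv for `thesis exec` using only the flags shown above."""
    return ["thesis", "exec", prompt, "--model", model, "--mode", mode]


def run_thesis(prompt: str, **kwargs: str) -> int:
    """Run the command and return its exit code (requires `thesis` on PATH)."""
    return subprocess.run(thesis_exec_argv(prompt, **kwargs), check=False).returncode


argv = thesis_exec_argv("train a fraud model with time-split eval")
print(shlex.join(argv))  # shell-quoted form, handy for CI logs
```

Passing the command as a list avoids shell quoting problems when the prompt contains spaces or quotes.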

Try Thesis now.

Download for macOS