AutoGRAMS Supervisors: Simulation & Finetuning Guide

Overview

AutoGRAMS provides tools to simulate conversations between agents and use the results to train or finetune models. This guide walks through:

  • Setting up simulations between a model chatbot and a user chatbot
  • Using supervisor functions to capture richer outputs
  • Saving and transforming simulation data
  • Running finetuning with OpenAI or Hugging Face models

Why Simulate Conversations?

Simulation helps generate data for two common scenarios:

  1. LLM Distillation — simulate high-quality conversations using GPT-4 or a complex agent, then finetune a smaller model (e.g. GPT-3.5, Mistral, Qwen) to behave similarly.
  2. Agent Distillation — convert a complex, multi-step AutoGRAM (with planning, thoughts, or tool use) into a simpler agent using direct primitives like reply_instruction() or silent_thought().

Simulating with run_simulation.py

Run simulated conversations using:

python run_simulation.py \
  --autogram_file my_chatbot.py \
  --userbot_file user_agent.py \
  --num_turns 2 \
  --num_examples 10 \
  --save_dir simulation_data

Key Arguments

| Flag | Purpose |
|---|---|
| --autogram_file | File with the chatbot logic (must define a chatbot() function) |
| --userbot_file | File defining the user agent logic |
| --num_turns | Number of turns per conversation |
| --num_examples | Number of conversations to simulate |
| --simulation_list_file | Optional: predefined list of scenarios |
| --save_dir | Where to save .pkl memory outputs |

What is a Userbot?

A userbot can be:

  • A simple prompted agent that replies as the user:
@autograms_function()
def chatbot(system_prompt):
    set_system_prompt(system_prompt)
    while True:
        reply_instruction("Reply as the user following the system prompt")
  • A fully autonomous AutoGRAM with memory and branching logic.

Supervisor Functions

Supervisor functions let you override primitive calls (e.g., silent_thought) with more sophisticated multi-step reasoning during simulation.

When supervisor_mode=True, you can wrap calls like this:

silent_thought("Summarize the article", supervisor=reasoned_summary, article=article)

Example Supervisor Function

from autograms.supervisors import SupervisorReturn

def reasoned_summary(instruction, article):
    findings = silent_thought(f"{article}\n\nWhat are the key findings?")
    relevance = silent_thought(f"{article}\nFindings: {findings}\nHow are the findings relevant?")
    result = silent_thought(f"{article}\nFindings: {findings}\nRelevance: {relevance}\nWrite the final summary.")
    return SupervisorReturn(result)

This logic only runs during simulation. During deployment, silent_thought() is used directly.
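The chaining pattern can be sketched without the AutoGRAMS runtime. In this library-free sketch, call_llm is a hypothetical stand-in for silent_thought() (it just tags the prompt it received), so only the multi-step structure is being illustrated:

```python
# Library-free sketch of the supervisor pattern: each step feeds the
# previous step's text into the next prompt. `call_llm` is a placeholder
# for silent_thought(); swap in a real model call in practice.
def call_llm(prompt: str) -> str:
    # Fake "model" that echoes the last line of the prompt it was given.
    return f"<answer to: {prompt.splitlines()[-1]}>"

def reasoned_summary(article: str) -> str:
    findings = call_llm(f"{article}\nWhat are the key findings?")
    relevance = call_llm(f"{article}\nFindings: {findings}\nHow are they relevant?")
    return call_llm(f"{article}\nFindings: {findings}\nRelevance: {relevance}\nWrite a final summary.")

summary = reasoned_summary("Example article text.")
```

The design point is that intermediate results (findings, relevance) are plain strings threaded into later prompts, which is exactly what makes the final output usable as a single training target for the primitive being supervised.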


DPO Training with SupervisorReturn

You can also use supervisor functions to output preferred and rejected completions:

def make_formal(instruction):
    formal = silent_thought(f"{instruction}. Make it formal.")
    informal = silent_thought(f"{instruction}. Make it casual.")
    return SupervisorReturn(formal, rejected_output=informal)

reply_instruction("Respond to the user.", supervisor=make_formal)

Custom Simulation Scenarios

You can define a JSON list of simulation cases:

[
  {"num_turns": 2, "chatbot_kwargs": {"tag": "A"}, "user_kwargs": {}},
  {"num_turns": 3, "chatbot_kwargs": {"tag": "B"}, "user_kwargs": {}}
]

Then use:

--simulation_list_file my_cases.json
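The scenario file can also be generated programmatically. This sketch writes the same two cases shown above using only the standard library:

```python
import json

# Each case controls one simulated conversation: how many turns to run,
# and which keyword arguments to pass to the chatbot and userbot.
cases = [
    {"num_turns": 2, "chatbot_kwargs": {"tag": "A"}, "user_kwargs": {}},
    {"num_turns": 3, "chatbot_kwargs": {"tag": "B"}, "user_kwargs": {}},
]

with open("my_cases.json", "w") as f:
    json.dump(cases, f, indent=2)
```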

Finetuning with run_finetuning.py

Once .pkl files are generated by the simulation, run:

python run_finetuning.py \
  --save_dir simulation_data \
  --model Qwen/Qwen2.5-32B-Instruct \
  --model_type huggingface \
  --finetuning_type dpo

Finetuning Options

| Option | Values | Purpose |
|---|---|---|
| --model_type | huggingface, openai | Backend to train on |
| --finetuning_type | normal, dpo | Standard supervised vs. preference (DPO) training |
| --prepare_data_only | flag | Export JSONL but skip training |

Output Files

| File | Contents |
|---|---|
| train.jsonl | Instruction → output pairs |
| train_dpo.jsonl | DPO format: prompt, chosen, rejected |

These can be loaded into OpenAI’s finetuning CLI, Hugging Face’s Trainer, or any custom pipeline.
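As a sketch of what a custom pipeline would consume, the snippet below writes one DPO record and reads it back. The prompt/chosen/rejected keys come from the table above; the record's contents are invented for illustration:

```python
import json

# One illustrative DPO record (contents invented for this example).
record = {
    "prompt": "Respond to the user.",
    "chosen": "Certainly. How may I assist you today?",
    "rejected": "sure, what's up",
}
with open("train_dpo.jsonl", "w") as f:
    f.write(json.dumps(record) + "\n")

# Read it back the way a custom training pipeline would: one JSON
# object per line.
with open("train_dpo.jsonl") as f:
    rows = [json.loads(line) for line in f]

# Every row carries exactly the three DPO fields.
assert set(rows[0]) == {"prompt", "chosen", "rejected"}
```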


Summary: Components

| Component | Role |
|---|---|
| run_simulation.py | Simulates agent-user conversations, optionally with supervisors |
| Supervisor functions | Add logic for complex reasoning or preference selection |
| run_finetuning.py | Extracts conversation data and calls finetuning APIs |

AutoGRAMS lets you simulate rich interaction traces — then distill them into powerful, streamlined agents.