AutoGRAMS Supervisors: Simulation & Finetuning Guide
Overview
AutoGRAMS provides tools to simulate conversations between agents and use the results to train or finetune models. This guide walks through:
- Setting up simulations between a model chatbot and a user chatbot
- Using supervisor functions to capture richer outputs
- Saving and transforming simulation data
- Running finetuning with OpenAI or Hugging Face models
Why Simulate Conversations?
Simulation helps generate data for two common scenarios:
- LLM Distillation — simulate high-quality conversations using GPT-4 or a complex agent, then finetune a smaller model (e.g. GPT-3.5, Mistral, Qwen) to behave similarly.
- Agent Distillation — convert a complex, multi-step AutoGRAM (with planning, thoughts, or tool use) into a simpler agent that uses direct primitives like `reply_instruction()` or `silent_thought()`.
Simulating with run_simulation.py
Run simulated conversations using:
```shell
python run_simulation.py \
  --autogram_file my_chatbot.py \
  --userbot_file user_agent.py \
  --num_turns 2 \
  --num_examples 10 \
  --save_dir simulation_data
```
Key Arguments
| Flag | Purpose |
|---|---|
| `--autogram_file` | File with chatbot logic (must define a `chatbot()` function) |
| `--userbot_file` | File defining the user agent logic |
| `--num_turns` | Number of turns per conversation |
| `--num_examples` | Number of conversations to simulate |
| `--simulation_list_file` | Optional: predefined list of scenarios |
| `--save_dir` | Where to save `.pkl` memory outputs |
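Each simulated conversation is saved as a pickled memory object under `--save_dir`. The internal structure of those objects is AutoGRAMS-specific, but the deserialization step is plain `pickle`; a minimal sketch for collecting them (the helper name is illustrative, not part of the library):

```python
import glob
import pickle

def load_memory_files(save_dir):
    """Deserialize every .pkl memory file saved under save_dir.

    The contents of each object depend on the AutoGRAMS memory format;
    this helper only handles finding and unpickling the files.
    """
    memories = []
    for path in sorted(glob.glob(f"{save_dir}/*.pkl")):
        with open(path, "rb") as f:
            memories.append(pickle.load(f))
    return memories
```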
What is a Userbot?
A userbot can be:
- A simple agent that replies as the user according to a fixed system prompt:

  ```python
  @autograms_function()
  def chatbot(system_prompt):
      set_system_prompt(system_prompt)
      while True:
          reply_instruction("Reply as the user following the system prompt")
  ```
- A fully autonomous AutoGRAM with memory and branching logic.
Supervisor Functions
Supervisor functions let you override primitive calls (e.g., `silent_thought()`) with more sophisticated multi-step reasoning during simulation. When `supervisor_mode=True`, you can wrap calls like this:

```python
silent_thought("Summarize the article", supervisor=reasoned_summary, article=article)
```
Example Supervisor Function
```python
from autograms.supervisors import SupervisorReturn

def reasoned_summary(instruction, article):
    findings = silent_thought(f"{article}\n\nWhat are the key findings?")
    rel = silent_thought(f"{article}\nFindings: {findings}\nHow are they relevant?")
    result = silent_thought(f"{article}\nFinal Summary: {findings}\nRelevance: {rel}")
    return SupervisorReturn(result)
```
This logic only runs during simulation. During deployment, `silent_thought()` is used directly.
DPO Training with SupervisorReturn
You can also use supervisor functions to output preferred and rejected completions:
```python
def make_formal(instruction):
    formal = silent_thought(f"{instruction}. Make it formal.")
    informal = silent_thought(f"{instruction}. Make it casual.")
    return SupervisorReturn(formal, rejected_output=informal)

reply_instruction("Respond to the user.", supervisor=make_formal)
```
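Conceptually, each such supervisor call yields one preference pair. In plain Python (field names are an assumption based on the common DPO convention, not taken from the library), the mapping to a preference-training record looks like:

```python
# Illustrative only: how a supervisor's preferred and rejected outputs
# become one DPO training record. The "prompt"/"chosen"/"rejected" field
# names follow the common DPO convention, not a confirmed AutoGRAMS format.
def to_dpo_record(prompt, preferred, rejected):
    return {"prompt": prompt, "chosen": preferred, "rejected": rejected}

record = to_dpo_record(
    "Respond to the user.",
    "Good afternoon. How may I assist you today?",  # formal -> chosen
    "hey, what's up?",                              # casual -> rejected
)
```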
Custom Simulation Scenarios
You can define a JSON list of simulation cases:
```json
[
  {"num_turns": 2, "chatbot_kwargs": {"tag": "A"}, "user_kwargs": {}},
  {"num_turns": 3, "chatbot_kwargs": {"tag": "B"}, "user_kwargs": {}}
]
```
Then pass the file with `--simulation_list_file my_cases.json`.
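Writing the scenario list by hand gets tedious for larger sweeps; a small script can generate the file instead (the `tag` field is just the illustrative kwarg from the example above):

```python
import json

# Build a scenario list that sweeps over chatbot tags and turn counts.
# The kwargs dictionaries are forwarded to the chatbot/userbot functions.
scenarios = [
    {"num_turns": turns, "chatbot_kwargs": {"tag": tag}, "user_kwargs": {}}
    for tag, turns in [("A", 2), ("B", 3)]
]

with open("my_cases.json", "w") as f:
    json.dump(scenarios, f, indent=2)
```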
Finetuning with run_finetuning.py
Once `.pkl` files are generated by the simulation, run:

```shell
python run_finetuning.py \
  --save_dir simulation_data \
  --model Qwen/Qwen2.5-32B-Instruct \
  --model_type huggingface \
  --finetuning_type dpo
```
Finetuning Options
| Option | Values | Purpose |
|---|---|---|
| `--model_type` | `huggingface`, `openai` | Backend to train on |
| `--finetuning_type` | `normal`, `dpo` | Choose standard vs. preference training |
| `--prepare_data_only` | flag | Export JSONL but skip training |
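These options combine. For instance, to export the JSONL training files without launching a training job (the model name here is illustrative), you might run:

```shell
python run_finetuning.py \
  --save_dir simulation_data \
  --model gpt-3.5-turbo \
  --model_type openai \
  --finetuning_type normal \
  --prepare_data_only
```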
Output Files
| File | Contents |
|---|---|
| `train.jsonl` | Instruction → output pairs |
| `train_dpo.jsonl` | DPO format: prompt, chosen, rejected |

These can be loaded into OpenAI's finetuning CLI, Hugging Face's `Trainer`, or any custom pipeline.
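As a sanity check before training, you can load the exported DPO file with a few lines of standard-library Python (the field names follow the DPO convention listed above; confirm them against the actual export):

```python
import json

def read_dpo_jsonl(path):
    """Parse a DPO-format JSONL file into (prompt, chosen, rejected) triples."""
    triples = []
    with open(path) as f:
        for line in f:
            rec = json.loads(line)
            triples.append((rec["prompt"], rec["chosen"], rec["rejected"]))
    return triples
```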
Summary: Components
| Component | Role |
|---|---|
| `run_simulation.py` | Simulates agent-user conversations, optionally with supervisors |
| Supervisor functions | Add logic for complex reasoning or preference selection |
| `run_finetuning.py` | Extracts conversation data and calls finetuning APIs |
AutoGRAMS lets you simulate rich interaction traces — then distill them into powerful, streamlined agents.