AutoGRAMS Supervisors: Simulation & Finetuning Guide

Overview

AutoGRAMS provides tools to simulate conversations between agents and use the results to train or finetune models. This guide walks through:

  • Setting up simulations between a model chatbot and a user chatbot
  • Using supervisor functions to capture richer outputs
  • Saving and transforming simulation data
  • Running finetuning with OpenAI or Hugging Face models

Why Simulate Conversations?

Simulation helps generate data for two common scenarios:

  1. LLM Distillation — simulate high-quality conversations using GPT-4 or a complex agent, then finetune a smaller model (e.g. GPT-3.5, Mistral, Qwen) to behave similarly.
  2. Agent Distillation — convert a complex, multi-step AutoGRAM (with planning, thoughts, or tool use) into a simpler agent using direct primitives like reply_instruction() or silent_thought().

Simulating with run_simulation.py

Run simulated conversations using:

python run_simulation.py \
  --autogram_file my_chatbot.py \
  --userbot_file user_agent.py \
  --num_turns 2 \
  --num_examples 10 \
  --save_dir simulation_data

Key Arguments

| Flag | Purpose |
|---|---|
| --autogram_file | File with the chatbot logic (must define a chatbot() function) |
| --userbot_file | File defining the user agent logic |
| --num_turns | Number of turns per conversation |
| --num_examples | Number of conversations to simulate |
| --simulation_list_file | Optional: predefined list of scenarios |
| --save_dir | Where to save .pkl memory outputs |

What is a Userbot?

A userbot can be:

  • A simple prompted agent that replies as the user:
@autograms_function()
def chatbot(system_prompt):
    set_system_prompt(system_prompt)
    while True:
        reply_instruction("Reply as the user following the system prompt")
  • A fully autonomous AutoGRAM with memory and branching logic.

Supervisor Functions

Supervisor functions let you override primitive calls (e.g., silent_thought) with more sophisticated multi-step reasoning during simulation.

When supervisor_mode=True, you can wrap calls like this:

silent_thought("Summarize the article", supervisor=reasoned_summary, article=article)

Example Supervisor Function

from autograms.supervisors import SupervisorReturn

def reasoned_summary(instruction, article):
    findings = silent_thought(f"{article}\n\nWhat are the key findings?")
    relevance = silent_thought(f"{article}\nFindings: {findings}\nHow are the findings relevant?")
    result = silent_thought(f"{article}\nFindings: {findings}\nRelevance: {relevance}\nWrite the final summary.")
    return SupervisorReturn(result)

This logic only runs during simulation. During deployment, silent_thought() is used directly.
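The chaining pattern can be sketched without the AutoGRAMS runtime. In this library-free sketch, call_llm is a hypothetical stand-in for silent_thought() (it just tags the prompt it received), so only the multi-step structure is being illustrated:

```python
# Library-free sketch of the supervisor pattern: each step feeds the
# previous step's text into the next prompt. `call_llm` is a placeholder
# for silent_thought(); swap in a real model call in practice.
def call_llm(prompt: str) -> str:
    # Fake "model" that echoes the last line of the prompt it was given.
    return f"<answer to: {prompt.splitlines()[-1]}>"

def reasoned_summary(article: str) -> str:
    findings = call_llm(f"{article}\nWhat are the key findings?")
    relevance = call_llm(f"{article}\nFindings: {findings}\nHow are they relevant?")
    return call_llm(f"{article}\nFindings: {findings}\nRelevance: {relevance}\nWrite a final summary.")

summary = reasoned_summary("Example article text.")
```

The design point is that intermediate results (findings, relevance) are plain strings threaded into later prompts, which is exactly what makes the final output usable as a single training target for the primitive being supervised.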


DPO Training with SupervisorReturn

You can also use supervisor functions to output preferred and rejected completions:

def make_formal(instruction):
    formal = silent_thought(f"{instruction}. Make it formal.")
    informal = silent_thought(f"{instruction}. Make it casual.")
    return SupervisorReturn(formal, rejected_output=informal)

reply_instruction("Respond to the user.", supervisor=make_formal)

Custom Simulation Scenarios

You can define a JSON list of simulation cases:

[
  {"num_turns": 2, "chatbot_kwargs": {"tag": "A"}, "user_kwargs": {}},
  {"num_turns": 3, "chatbot_kwargs": {"tag": "B"}, "user_kwargs": {}}
]

Then use:

--simulation_list_file my_cases.json
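The scenario file can also be generated programmatically. This sketch writes the same two cases shown above using only the standard library:

```python
import json

# Each case controls one simulated conversation: how many turns to run,
# and which keyword arguments to pass to the chatbot and userbot.
cases = [
    {"num_turns": 2, "chatbot_kwargs": {"tag": "A"}, "user_kwargs": {}},
    {"num_turns": 3, "chatbot_kwargs": {"tag": "B"}, "user_kwargs": {}},
]

with open("my_cases.json", "w") as f:
    json.dump(cases, f, indent=2)
```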

Finetuning with run_finetuning.py

Once .pkl files are generated by the simulation, run:

python run_finetuning.py \
  --save_dir simulation_data \
  --model Qwen/Qwen2.5-32B-Instruct \
  --model_type huggingface \
  --finetuning_type dpo

Finetuning Options

| Option | Values | Purpose |
|---|---|---|
| --model_type | huggingface, openai | Backend to train on |
| --finetuning_type | normal, dpo | Standard supervised vs. preference (DPO) training |
| --prepare_data_only | flag | Export JSONL but skip training |

Output Files

| File | Contents |
|---|---|
| train.jsonl | Instruction → output pairs |
| train_dpo.jsonl | DPO format: prompt, chosen, rejected |

These can be loaded into OpenAI’s finetuning CLI, Hugging Face’s Trainer, or any custom pipeline.
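As a sketch of what a custom pipeline would consume, the snippet below writes one DPO record and reads it back. The prompt/chosen/rejected keys come from the table above; the record's contents are invented for illustration:

```python
import json

# One illustrative DPO record (contents invented for this example).
record = {
    "prompt": "Respond to the user.",
    "chosen": "Certainly. How may I assist you today?",
    "rejected": "sure, what's up",
}
with open("train_dpo.jsonl", "w") as f:
    f.write(json.dumps(record) + "\n")

# Read it back the way a custom training pipeline would: one JSON
# object per line.
with open("train_dpo.jsonl") as f:
    rows = [json.loads(line) for line in f]

# Every row carries exactly the three DPO fields.
assert set(rows[0]) == {"prompt", "chosen", "rejected"}
```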


Summary: Components

| Component | Role |
|---|---|
| run_simulation.py | Simulates agent-user conversations, optionally with supervisors |
| Supervisor functions | Add logic for complex reasoning or preference selection |
| run_finetuning.py | Extracts conversation data and calls finetuning APIs |

AutoGRAMS lets you simulate rich interaction traces — then distill them into powerful, streamlined agents.