Sakana AI’s RLT: Computers Learn How to Teach
Introduction
If you had to teach a math problem to a friend, how would you go about it? Would you just give them the answer? Or would you walk through the solution step by step?
The new research from Sakana AI introduced today starts from exactly that idea. Computers, just like people, can learn how to teach!
The Problem with Existing AI: The Lone Learner
Limitations of the Old Approach
Until now, smart AI systems studied like this:
- Try to solve problems on their own
- Get rewarded when correct, and try again when wrong
- Repeat to build problem-solving ability
What is wrong with this method?
- Too expensive and time-consuming: only large computers can participate
- Good at only one domain: if you only solve math problems, you struggle with everything else
- Hard to teach others: knows the answer but cannot explain it well
It is like a math genius who cannot explain anything to their friends!
The New Approach: RLT, the Teaching AI
What Is RLT?
RLT (Reinforcement Learning Teachers) is AI that learns how to teach.
What makes it different from existing AI:
- Given the problem and the answer in advance
- Practices explaining the solution process
- Is scored on whether the student understood
Comparing with a Real Teacher
| Good Teacher | RLT AI |
|---|---|
| Does not just recite theorems | Does not re-solve the problem from scratch |
| Explains so the student can understand | Explains so another AI can understand |
| Improves teaching based on student reactions | Measures performance by the student AI’s comprehension |
Remarkable Results from RLT
The Birth of a Small Giant
The results from Sakana AI’s experiments are truly surprising:
- A small RLT with 7 billion components (7B parameters)
- Outperforms the massive DeepSeek R1 with 671 billion components (671B parameters) as a teacher!
Results by the Numbers
Teaching a student AI of the same size:
- RLT teacher: 26.3 points
- DeepSeek R1 teacher: 18.9 points
Teaching a larger student AI:
- RLT teacher: 37.6 points
- DeepSeek R1 teacher: 34.4 points
The small teacher even outperformed itself teaching a student four times its own size!
Why RLT Is Special
1. Efficiency: Fast and Cheap
Old method:
- Months of training on large computers
- Large amounts of electricity and money required
RLT method:
- Training complete in a single day
- Possible even on small computers
2. Clear Explanations
DeepSeek R1’s explanations:
- Describes how to use a calculator
- Includes jokes or unrelated remarks
- Complex and difficult expressions
RLT’s explanations:
- Explains only the core points precisely
- Adds explanation for any missing steps
- Clear and direct language
3. Complementary, Not a Replacement
RLT does not completely replace the old approach; they work better together!
- Build foundational skills with RLT
- Finish with traditional reinforcement learning
- Achieve higher results!
Understanding Through Everyday Examples
Comparing to a Math Tutor
Existing AI (problem-solving genius):
Student: "How do I solve this problem?"
AI: "The answer is 42. I am not sure how I solved it either."
RLT AI (teaching expert):
Student: "How do I solve this problem?"
RLT: "Great! Let us look at this part first.
Step 1: Multiply both sides by 2
Step 2: Subtract 3
Step 3: There is your answer!
Does that make sense?"
Future Outlook
1. Cheaper AI Education
- Building smart AI on small computers becomes possible
- Individuals and small companies can participate in AI development
2. AI That Teaches Itself
In the future, AI may play teacher and student at the same time!
- Learning by explaining to itself
- Continuously improving AI
3. Expansion into Diverse Fields
- Beyond math into science, language, and the arts
- Teaching AI capable of appearing in every field
Summary
RLT has presented a new paradigm for AI development:
- Method matters more than size
- The importance of teaching ability
- Balancing efficiency and effectiveness
Just as a small but excellent teacher can successfully teach much larger students, a small AI trained in the right way can also achieve great results.
When you teach someone, remember not to just give them the answer: explain things step by step so they can understand. That is what true teaching means, and AI is now learning that lesson too!
References
- Paper: Reinforcement Learning Teachers of Test Time Scaling
- Code: GitHub - SakanaAI/RLT
- Original post: Sakana AI - RLT