Lemma AI: Automated Research & Code Generation for AI
Explore Lemma AI, a multi-agent system that automates scientific research, from literature review and experiment design to code generation and paper writing. Learn how it reduces LLM hallucinations and democratizes advanced AI research.
Introduction

Lemma AI is a multi-agent research system that automates the entire scientific research process, from ideation and experimentation to writing professional-grade papers. It democratizes access to in-depth research, allowing users to describe their research goals and have an AI system autonomously execute the necessary steps.
Configuration Checklist
| Element | Version / Link |
|---|---|
| Language / Runtime | Python (implied) |
| Main libraries | PyTorch, Torchvision, Matplotlib, Seaborn, Scikit-learn (implied) |
| Required APIs | Kolors API (for AI image generation), Hugging Face (for real photos/models) |
| Keys / credentials needed | API keys for Kolors and Hugging Face (implied) |
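The checklist implies that API keys are required but does not say how they are supplied. A common pattern is to read them from environment variables and fail early if one is missing. The sketch below assumes the variable names `KOLORS_API_KEY` and `HF_TOKEN`; these names and the `load_credentials` helper are illustrative and are not taken from Lemma AI's documentation.

```python
import os
from typing import Mapping, Optional

# Assumed variable names, chosen for illustration only.
REQUIRED_KEYS = ("KOLORS_API_KEY", "HF_TOKEN")


def load_credentials(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return the required API keys, raising if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_KEYS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_KEYS}
```

Passing a plain dictionary instead of relying on `os.environ` makes the helper easy to check without touching real secrets.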
Step-by-Step Guide
Step 1 — Define Your Research Question
Clearly articulate your research objective or problem. This prompt guides Lemma AI's multi-agent system in exploring the topic, reviewing literature, and formulating a research plan. State the scope, the deliverables, and any constraints explicitly (in the example below: 20 questions, two prompt variants, no code) so that the agents stay on the relevant aspects and produce targeted outputs.
Example Prompt for LLM Hallucination Research:
Investigate whether asking an LLM to explicitly state uncertainty ("I am not sure") reduces hallucinations. Design a simple experiment with 20 questions, compare a normal prompt versus an uncertainty-aware prompt, analyze expected results, and provide practical recommendations. Do not write code or discuss model training.
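The comparison this prompt asks for comes down to grading each answer under both prompt styles and comparing the resulting rates. As a rough sketch of that scoring step, the code below tallies graded answers per condition. The three grading labels, the `summarize` helper, and the sample grades are all made up for illustration; they are not Lemma AI output or experimental results.

```python
from collections import Counter

# Hypothetical grading labels assigned to each answer by a human reviewer.
LABELS = ("correct", "hallucinated", "abstained")


def summarize(grades):
    """Return the share of each label in a list of graded answers."""
    if not grades:
        raise ValueError("grades must not be empty")
    counts = Counter(grades)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return {label: counts[label] / len(grades) for label in LABELS}


# Invented grades for 20 questions per condition, for illustration only.
normal = ["correct"] * 12 + ["hallucinated"] * 8
uncertainty_aware = ["correct"] * 11 + ["hallucinated"] * 3 + ["abstained"] * 6

for name, grades in (("normal", normal), ("uncertainty-aware", uncertainty_aware)):
    rates = summarize(grades)
    print(name, {label: round(rate, 2) for label, rate in rates.items()})
```

Tracking abstentions separately matters: a prompt that lowers hallucinations only by declining to answer may also lower the share of correct answers.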
Step 2 — Choose a Research Mode
Lemma AI offers four modes, each suited to a different depth of research. Turnaround ranges from minutes (Explore) to multiple days (FARS), so choose the lightest mode that produces the output you need.
- Explore: Delivers a concise report with selected key references in 1-3 minutes. This mode is suitable for quick overviews and initial literature scans.
- Survey: Generates a long-form academic survey with broad citation coverage in hours. This is ideal for comprehensive literature reviews on a specific topic.
- Code: Implements methods and executes experiments automatically. This mode is for users who want to put a research idea into practice, generating and running actual code.
- FARS (Fully Automated Research System): A comprehensive system that can generate research proposals, experimental studies, or full research papers. This mode is for end-to-end automated research projects.
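As a quick reference, the four modes can be written down as a small lookup table with a rule of thumb for choosing between them. This sketch is purely illustrative: the `ResearchMode` class and the `suggest_mode` helper are hypothetical and not part of any Lemma AI API. Only the mode names and output descriptions come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResearchMode:
    name: str
    output: str


MODES = {
    "explore": ResearchMode("Explore", "concise report with selected key references"),
    "survey": ResearchMode("Survey", "long-form academic survey with broad citation coverage"),
    "code": ResearchMode("Code", "implemented methods and executed experiments"),
    "fars": ResearchMode("FARS", "research proposal, experimental study, or full paper"),
}


def suggest_mode(needs_full_paper: bool, needs_experiments: bool,
                 needs_broad_citations: bool) -> ResearchMode:
    """Pick the lightest mode that covers the stated needs (a heuristic only)."""
    if needs_full_paper:
        return MODES["fars"]
    if needs_experiments:
        return MODES["code"]
    if needs_broad_citations:
        return MODES["survey"]
    return MODES["explore"]


print(suggest_mode(False, True, False).name)
```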
Step 3 — Code Implementation and Experimentation

For tasks requiring practical implementation, the 'Code' mode allows Lemma AI to generate and execute code, then visualize the results, all within a virtual environment. This is particularly useful for machine learning experiments.
Example Prompt for Image Classifier:
Build an image classifier that can distinguish between AI-generated images and real photos, train it on a small dataset, and visualize what features it's picking up on.
Upon receiving the prompt, Lemma AI performs the following autonomous steps:
- Environment Setup: Configures a conda environment and installs necessary dependencies.
- Dataset Generation: Downloads real photos (e.g., from Hugging Face) and generates AI images (e.g., via Kolors API) to create a balanced dataset.
- Code Implementation: Writes Python scripts for dataset handling, model training, and visualization.

      # dataset.py - Example snippet for data loading
      import os
      from pathlib import Path

      import torch
      from torch.utils.data import Dataset, DataLoader
      from sklearn.model_selection import train_test_split

      IMAGENET_MEAN = [0.485, 0.456, 0.406]
      IMAGENET_STD = [0.229, 0.224, 0.225]
      IMG_SIZE = 128

      def get_dataloaders(data_dir="workspace/data", val_l=1, test_l=1):
          # ... (code to load and split dataset into train, validation, test loaders)
          train_ds = AIVsRealDataset(train_p, train_l)
          val_ds = AIVsRealDataset(val_p, val_l)
          test_ds = AIVsRealDataset(test_p, test_l)
          train_loader = DataLoader(train_ds, batch_size=32, num_workers=4, shuffle=True)
          val_loader = DataLoader(val_ds, batch_size=32, num_workers=4, shuffle=False)
          test_loader = DataLoader(test_ds, batch_size=32, num_workers=4, shuffle=False)
          print(f"Train: {len(train_ds)}, Val: {len(val_ds)}, Test: {len(test_ds)}")
          return train_loader, val_loader, test_loader, CLASS_NAMES

      # generate_dataset.py - Example snippet for image generation
      # ... (code to download real images and generate AI images via API)
      # Example of API call for AI image generation
      # response = requests.post("https://api.kolors.ai/generate", json=payload)
      # ...

      # train.py - Example snippet for model training
      import torch.nn as nn
      import torch.optim as optim
      # ... (model definition, training loop, evaluation)

      # visualize.py - Example snippet for visualization
      import matplotlib.pyplot as plt
      import seaborn as sns
      # ... (code to generate confusion matrix, Grad-CAM, prediction grids)

- Execution and Visualization: Runs the generated code, trains the model (e.g., MobileNetV3-Small-0.5 fine-tuned on 300 images), and produces visualizations such as confusion matrices, Grad-CAM insights, and prediction grids. The entire process runs in a virtual workspace, eliminating the need for local setup.
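The data-loading snippet elides the step that divides the images into training, validation, and test sets. One way to do this while keeping the real and AI-generated classes balanced in every subset is a stratified split. The sketch below uses only the standard library and assumes an 80/10/10 ratio and placeholder file names, neither of which is stated in the source. The generated code imports scikit-learn's `train_test_split`, which can do the same job through its `stratify` argument.

```python
import random
from collections import defaultdict


def stratified_split(paths, labels, val_frac=0.1, test_frac=0.1, seed=0):
    """Split (path, label) pairs into train/val/test, preserving class ratios."""
    by_class = defaultdict(list)
    for path, label in zip(paths, labels):
        by_class[label].append(path)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    splits = {"train": [], "val": [], "test": []}
    for label, items in sorted(by_class.items()):
        rng.shuffle(items)
        n_val = round(len(items) * val_frac)
        n_test = round(len(items) * test_frac)
        splits["val"] += [(p, label) for p in items[:n_val]]
        splits["test"] += [(p, label) for p in items[n_val:n_val + n_test]]
        splits["train"] += [(p, label) for p in items[n_val + n_test:]]
    return splits


# Placeholder file names standing in for 150 real and 150 AI-generated images.
paths = [f"real_{i}.jpg" for i in range(150)] + [f"ai_{i}.jpg" for i in range(150)]
labels = [0] * 150 + [1] * 150

splits = stratified_split(paths, labels)
print({name: len(items) for name, items in splits.items()})
```

Splitting per class, rather than shuffling the whole list once, guarantees that each subset contains the same proportion of real and AI-generated images.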
Results Summary for Image Classifier:
- Model: MobileNetV3-Small-0.5, fine-tuned on 300 images (150 real, 150 AI-generated via Kolors)
- Best Val Accuracy: 96.7% (epoch 8)
- Test Accuracy: 100%
- Macro F1: 1.00
- Training time: ~45 seconds (CPU)
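For reference, both accuracy and macro F1 are derived from the confusion matrix. The sketch below shows the calculation using only the standard library. The two 2x2 matrices are hypothetical examples for a 30-image split; the summary does not state the split sizes or the actual matrices.

```python
def accuracy(cm):
    """cm[i][j] is the number of samples of true class i predicted as class j."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def macro_f1(cm):
    """Unweighted mean of the per-class F1 scores."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(cm[k]) - tp                       # true k, predicted otherwise
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n


# Hypothetical matrices (rows: true real, true AI; columns: predicted real, predicted AI).
perfect = [[15, 0], [0, 15]]
one_miss = [[15, 0], [1, 14]]

print(accuracy(perfect), macro_f1(perfect))
print(round(accuracy(one_miss), 3), round(macro_f1(one_miss), 3))
```

On a split this small, a single misclassification moves accuracy by more than three points: 29 of 30 correct is about 96.7%, which would match the reported validation figure if the validation set held 30 images.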
Step 4 — Fully Automated Research System (FARS)

FARS is Lemma AI's most advanced mode, capable of conducting multi-day, complex research projects autonomously. It follows a structured research pipeline:
- Ideation: Based on the initial prompt, FARS generates a research proposal. This proposal includes strategic context, user request alignment, constraint compliance, a