Tech With Tim
Tags: AI research, automated research, machine learning

Lemma AI: Automated Research & Code Generation for AI

Explore Lemma AI, a multi-agent system that automates scientific research, from literature review and experiment design to code generation and paper writing. Learn how it reduces LLM hallucinations and democratizes advanced AI research.


Introduction

Lemma AI is a multi-agent research system that automates the entire scientific research process, from ideation and experimentation to writing professional-grade papers. It democratizes access to in-depth research, allowing users to describe their research goals and have an AI system autonomously execute the necessary steps.

Configuration Checklist

Element                   | Version / Link
Language / Runtime        | Python (implied)
Main libraries            | PyTorch, Torchvision, Matplotlib, Seaborn, Scikit-learn (implied)
Required APIs             | Kolors API (for AI image generation), Hugging Face (for real photos/models)
Keys / credentials needed | API keys for Kolors and Hugging Face (implied)
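
To reproduce the generated scripts outside Lemma AI's workspace, it can help to check the items in this list first. The following is a minimal sketch, not part of Lemma AI: it assumes the libraries import as torch, torchvision, matplotlib, seaborn, and sklearn, and that the credentials live in environment variables named KOLORS_API_KEY and HF_TOKEN. Both variable names are assumptions, since the source does not say how keys are supplied.

```python
import importlib.util
import os

# Import names for the libraries listed in the checklist.
REQUIRED_MODULES = ["torch", "torchvision", "matplotlib", "seaborn", "sklearn"]
# Hypothetical environment variable names; adjust to however your keys are stored.
REQUIRED_ENV_VARS = ["KOLORS_API_KEY", "HF_TOKEN"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the top-level modules that cannot be found on the Python path."""
    return [m for m in modules if importlib.util.find_spec(m) is None]


def missing_env_vars(names=REQUIRED_ENV_VARS, environ=None):
    """Return the variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [n for n in names if not env.get(n)]


if __name__ == "__main__":
    print("Missing modules:", missing_modules() or "none")
    print("Missing credentials:", missing_env_vars() or "none")
```

Passing a plain dictionary as `environ` makes the credential check easy to try without touching the real environment.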

Step-by-Step Guide

Step 1 — Define Your Research Question

Clearly articulate your research objective or problem. This prompt guides Lemma AI's multi-agent system in exploring the topic, reviewing literature, and formulating a research plan. A precise prompt ensures the AI focuses on relevant aspects and generates targeted outputs.

Example Prompt for LLM Hallucination Research:

Investigate whether asking an LLM to explicitly state uncertainty ("I am not sure") reduces hallucinations. Design a simple experiment with 20 questions, compare a normal prompt versus an uncertainty-aware prompt, analyze expected results, and provide practical recommendations. Do not write code or discuss model training.
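
The comparison this prompt asks for comes down to tallying, for each prompt condition, how many of the 20 answers were correct, abstained, or hallucinated. Below is a minimal scoring sketch, assuming each answer has already been hand-labeled with one of those three outcomes. The label names and the example tallies are illustrative and are not results produced by Lemma AI.

```python
from collections import Counter

OUTCOMES = ("correct", "abstained", "hallucinated")


def summarize(labels):
    """Return the share of each outcome in a list of per-question labels."""
    if not labels:
        raise ValueError("labels must not be empty")
    unknown = set(labels) - set(OUTCOMES)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    counts = Counter(labels)
    return {o: counts[o] / len(labels) for o in OUTCOMES}


def compare(baseline, uncertainty_aware):
    """Difference in outcome rates (uncertainty-aware minus baseline)."""
    b, u = summarize(baseline), summarize(uncertainty_aware)
    return {o: u[o] - b[o] for o in OUTCOMES}


if __name__ == "__main__":
    # Made-up labels for 20 questions per condition, for illustration only.
    baseline = ["correct"] * 13 + ["hallucinated"] * 7
    aware = ["correct"] * 12 + ["abstained"] * 5 + ["hallucinated"] * 3
    for outcome, delta in compare(baseline, aware).items():
        print(f"{outcome:>12}: {delta:+.2f}")
```

Tracking abstentions separately matters here: an uncertainty-aware prompt can lower the hallucination rate simply by answering less, so the change in the correct rate should be read alongside it.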

Step 2 — Choose a Research Mode

Lemma AI offers four modes, each suited to a different depth of research. Run time grows with scope, from minutes for Explore to multi-day projects for FARS, so pick the lightest mode that answers your question.

  • Explore: Delivers a concise report with selected key references in 1-3 minutes. This mode is suitable for quick overviews and initial literature scans.
  • Survey: Generates a long-form academic survey with broad citation coverage; a run takes hours. This is ideal for comprehensive literature reviews on a specific topic.
  • Code: Implements methods and executes experiments automatically. This mode is for users who want to put a research idea into practice, generating and running actual code.
  • FARS (Fully Automated Research System): A comprehensive system that can generate research proposals, experimental studies, or full research papers. This mode is for end-to-end automated research projects.

Step 3 — Code Implementation and Experimentation

For tasks requiring practical implementation, the 'Code' mode lets Lemma AI generate and execute code in a virtual environment and visualize the results. This is particularly useful for machine learning experiments.

Example Prompt for Image Classifier:

Build an image classifier that can distinguish between AI-generated images and real photos, train it on a small dataset, and visualize what features it's picking up on.

Upon receiving the prompt, Lemma AI performs the following autonomous steps:

  1. Environment Setup: Configures a conda environment and installs necessary dependencies.
  2. Dataset Generation: Downloads real photos (e.g., from Hugging Face) and generates AI images (e.g., via Kolors API) to create a balanced dataset.
  3. Code Implementation: Writes Python scripts for dataset handling, model training, and visualization.
    # dataset.py - example snippet for data loading
    import os
    from pathlib import Path
    import torch
    from torch.utils.data import Dataset, DataLoader
    from sklearn.model_selection import train_test_split

    # Standard ImageNet channel statistics, used to normalize inputs.
    IMAGENET_MEAN = [0.485, 0.456, 0.406]
    IMAGENET_STD = [0.229, 0.224, 0.225]
    IMG_SIZE = 128  # target image size in pixels

    def get_dataloaders(data_dir="workspace/data"):
        # ... (code to load and split dataset into train, validation, test loaders)
        # The elided code is expected to define AIVsRealDataset (a Dataset
        # subclass), CLASS_NAMES, and the path/label lists for each split:
        # train_p/train_l, val_p/val_l, test_p/test_l.
        train_ds = AIVsRealDataset(train_p, train_l)
        val_ds = AIVsRealDataset(val_p, val_l)
        test_ds = AIVsRealDataset(test_p, test_l)

        # Shuffle only the training data; keep evaluation order fixed.
        train_loader = DataLoader(train_ds, batch_size=32, num_workers=4, shuffle=True)
        val_loader = DataLoader(val_ds, batch_size=32, num_workers=4, shuffle=False)
        test_loader = DataLoader(test_ds, batch_size=32, num_workers=4, shuffle=False)

        print(f"Train: {len(train_ds)}, Val: {len(val_ds)}, Test: {len(test_ds)}")
        return train_loader, val_loader, test_loader, CLASS_NAMES

    # generate_dataset.py - example snippet for image generation
    # ... (code to download real images and generate AI images via API)
    # Example of API call for AI image generation
    # response = requests.post("https://api.kolors.ai/generate", json=payload)
    # ...

    # train.py - example snippet for model training
    import torch.nn as nn
    import torch.optim as optim
    # ... (model definition, training loop, evaluation)

    # visualize.py - example snippet for visualization
    import matplotlib.pyplot as plt
    import seaborn as sns
    # ... (code to generate confusion matrix, Grad-CAM, prediction grids)

  4. Execution and Visualization: Runs the generated code, trains the model (e.g., MobileNetV3-Small-0.5 fine-tuned on 300 images), and produces visualizations such as confusion matrices, Grad-CAM insights, and prediction grids. The entire process runs in a virtual workspace, eliminating the need for local setup.
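
The dataset steps above depend on keeping the two classes balanced in every split. Below is a minimal sketch of a stratified train/validation/test split using only the standard library. The generated dataset.py imports scikit-learn's train_test_split for this job, so the helper is a stand-in rather than Lemma AI's code, and the 80/10/10 fractions, file names, and seed are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(items, labels, val_frac=0.1, test_frac=0.1, seed=42):
    """Split (item, label) pairs so each class keeps the same proportions.

    Returns three lists of (item, label) tuples: train, val, test.
    """
    by_class = defaultdict(list)
    for item, label in zip(items, labels):
        by_class[label].append(item)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, val, test = [], [], []
    for label in sorted(by_class):
        group = by_class[label]
        rng.shuffle(group)
        n_val = round(len(group) * val_frac)
        n_test = round(len(group) * test_frac)
        val += [(x, label) for x in group[:n_val]]
        test += [(x, label) for x in group[n_val:n_val + n_test]]
        train += [(x, label) for x in group[n_val + n_test:]]
    return train, val, test


if __name__ == "__main__":
    # Hypothetical file names standing in for 150 real and 150 AI images.
    paths = [f"real_{i}.jpg" for i in range(150)] + [f"ai_{i}.jpg" for i in range(150)]
    labels = [0] * 150 + [1] * 150
    train, val, test = stratified_split(paths, labels)
    print(len(train), len(val), len(test))
```

Splitting each class separately guarantees that a small validation or test set cannot end up dominated by one class, which matters when the whole dataset is only a few hundred images.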

Results Summary for Image Classifier:

  • Model: MobileNetV3-Small-0.5, fine-tuned on 300 images (150 real, 150 AI-generated via Kolors)
  • Best Val Accuracy: 96.7% (epoch 8)
  • Test Accuracy: 100%
  • Macro F1: 1.00
  • Training time: ~45 seconds (CPU)
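
With so few images, each metric moves in large steps, which is worth keeping in mind when reading these numbers. Below is a minimal sketch that computes accuracy and macro F1 from a 2x2 confusion matrix. The source does not state the split sizes, so the 30-image validation and test sets (15 per class) and the individual counts are assumptions, chosen only because 29/30 rounds to the reported 96.7%.

```python
def accuracy(cm):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def macro_f1(cm):
    """Unweighted mean of per-class F1; cm[i][j] = true class i, predicted j."""
    scores = []
    for k in range(len(cm)):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(len(cm))) - tp
        fn = sum(cm[k]) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical counts: rows are true [real, ai], columns are predicted.
    val_cm = [[15, 0], [1, 14]]   # one AI image misread as real
    test_cm = [[15, 0], [0, 15]]  # no errors
    print(f"val accuracy  {accuracy(val_cm):.3f}, macro F1 {macro_f1(val_cm):.3f}")
    print(f"test accuracy {accuracy(test_cm):.3f}, macro F1 {macro_f1(test_cm):.3f}")
```

Under these assumed sizes, one misclassified test image would lower test accuracy from 100% to about 96.7%, so the gap between the two reported accuracies would amount to a single image.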

Step 4 — Fully Automated Research System (FARS)

FARS is Lemma AI's most advanced mode, capable of conducting multi-day, complex research projects autonomously. It follows a structured research pipeline:

  1. Ideation: Based on the initial prompt, FARS generates a research proposal. This proposal includes strategic context, user request alignment, constraint compliance, a