Getting Started

A quick tour of Codegen in a Jupyter notebook.

Installation

Install codegen from PyPI with uv:

uv tool install codegen

This makes the codegen command available globally in your terminal, while keeping its dependencies isolated.

Quick Start with Jupyter

The codegen notebook command creates a virtual environment and opens a Jupyter notebook for quick prototyping. This is often the fastest way to get up and running.

Prefer working in your IDE? See IDE Usage

# Navigate to your repository
cd path/to/git/repository

# Initialize codegen and launch Jupyter
codegen notebook

This will:

1. Create a .codegen/ directory with:
   - .venv/ - A dedicated virtual environment for this project
   - jupyter/ - Jupyter notebooks for exploring your code
   - config.toml - Project configuration
2. Launch Jupyter Lab with a pre-configured notebook

The notebook comes pre-configured to load your codebase, so you can start exploring right away!

Initializing a Codebase

Instantiating a Codebase automatically parses the codebase and makes it available for manipulation.

from codegen import Codebase

# Parse a codebase
codebase = Codebase("./")

This infers the programming language of the codebase and parses all of its files.

The initial parse may take a few minutes for large codebases. This pre-computation enables constant-time operations afterward. Learn more here.

Exploring Your Codebase

Let’s explore the codebase we just initialized. Here are some common patterns for code navigation in Codegen:

Iterate over all Functions with Codebase.functions
View class inheritance with Class.superclasses
View function call-sites with Function.call_sites
View function usages with Function.usages

# Print overall stats
print("🔍 Codebase Analysis")
print("=" * 50)
print(f"📚 Total Classes: {len(codebase.classes)}")
print(f"⚡ Total Functions: {len(codebase.functions)}")
print(f"🔄 Total Imports: {len(codebase.imports)}")

# Find class with most inheritance
if codebase.classes:
    deepest_class = max(codebase.classes, key=lambda x: len(x.superclasses))
    print(f"\n🌳 Class with most inheritance: {deepest_class.name}")
    print(f"   📊 Chain Depth: {len(deepest_class.superclasses)}")
    print(f"   ⛓️ Chain: {' -> '.join(s.name for s in deepest_class.superclasses)}")

# Find first 5 recursive functions
recursive = [f for f in codebase.functions
            if any(call.name == f.name for call in f.function_calls)][:5]
if recursive:
    print(f"\n🔄 Recursive functions:")
    for func in recursive:
        print(f"  - {func.name}")
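
The recursion check above only finds functions that call themselves directly. The following is a minimal sketch of extending the idea to indirect recursion (a calls b, b calls a). It assumes you have already built a plain dictionary that maps each function name to the names it calls, for example from f.function_calls as in the snippet above. The helper find_recursive and the sample graph are hypothetical and not part of Codegen.

```python
def find_recursive(call_graph: dict[str, set[str]]) -> set[str]:
    """Return names of functions that can reach themselves through calls."""
    recursive = set()
    for start in call_graph:
        # Depth-first search over everything reachable from `start`.
        stack = list(call_graph.get(start, ()))
        seen = set()
        while stack:
            name = stack.pop()
            if name == start:
                recursive.add(start)
                break
            if name in seen:
                continue
            seen.add(name)
            stack.extend(call_graph.get(name, ()))
    return recursive


# Hypothetical call graph: name -> names of the functions it calls.
sample_graph = {
    "walk": {"walk", "visit"},  # direct recursion
    "ping": {"pong"},           # indirect recursion via pong
    "pong": {"ping"},
    "visit": {"log"},
    "log": set(),
}
recursive_names = find_recursive(sample_graph)
print(sorted(recursive_names))
```

Because this sketch matches on names only, two unrelated functions that share a name in different files would be treated as one node.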

Analyzing Tests

Let’s specifically drill into large test files, which can be cumbersome to manage.

from collections import Counter

# Filter to all test functions and classes
test_functions = [x for x in codebase.functions if x.name.startswith('test_')]
test_classes = [x for x in codebase.classes if x.name.startswith('Test')]

print("🧪 Test Analysis")
print("=" * 50)
print(f"📝 Total Test Functions: {len(test_functions)}")
print(f"🔬 Total Test Classes: {len(test_classes)}")
print(f"📊 Tests per File: {len(test_functions) / max(len(codebase.files), 1):.1f}")

# Find files with the most tests
print("\n📚 Top Test Files by Class Count")
print("-" * 50)
file_test_counts = Counter([x.file for x in test_classes])
for file, num_tests in file_test_counts.most_common(5):
    print(f"🔍 {num_tests} test classes: {file.filepath}")
    print(f"   📏 File Length: {len(file.source.splitlines())} lines")
    print(f"   💡 Functions: {len(file.functions)}")

Splitting Up Large Test Files

Let’s split up the largest test files into separate modules for better organization:

print("\n📦 Splitting Test Files")
print("=" * 50)

# Process top 5 largest test files
for file, num_tests in file_test_counts.most_common(5):
    # Create a new directory based on the file name
    base_name = file.filepath.removesuffix('.py')
    print(f"\n🔄 Processing: {file.filepath}")
    print(f"   📊 {num_tests} test classes to split")

    # Move each test class to its own file
    for test_class in file.classes:
        if test_class.name.startswith('Test'):
            # Create descriptive filename from test class name
            new_file = f"{base_name}/{test_class.name.lower()}.py"
            print(f"   📝 Moving {test_class.name} -> {new_file}")

            # Codegen handles all the complexity:
            # - Creates directories if needed
            # - Updates all imports automatically
            # - Maintains test dependencies
            # - Preserves decorators and docstrings
            test_class.move_to_file(new_file)

# Commit changes to disk
codebase.commit()

In order to commit changes to your filesystem, you must call codebase.commit(). Learn more about commit() and reset().
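
Because move_to_file changes the codebase, it can help to compute and review the target paths before moving anything. The sketch below is a pure helper that mirrors the naming scheme used above (the file path without .py, plus the lowercased class name) and flags two classes that would land in the same file. The name plan_split and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def plan_split(filepath: str, class_names: list[str]) -> dict[str, str]:
    """Map each Test* class name to the file it would be moved to."""
    base = PurePosixPath(filepath).with_suffix("")  # tests/test_api.py -> tests/test_api
    plan = {}
    for name in class_names:
        if not name.startswith("Test"):
            continue  # leave helpers and fixtures where they are
        target = str(base / f"{name.lower()}.py")
        if target in plan.values():
            raise ValueError(f"{name} collides with another class at {target}")
        plan[name] = target
    return plan


plan = plan_split("tests/test_api.py", ["TestUsers", "TestOrders", "Helper"])
for class_name, target in plan.items():
    print(f"{class_name} -> {target}")
```

Once the plan looks right, the same loop can call move_to_file with each target and finish with codebase.commit().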

Finding Specific Content

Once you have a general sense of your codebase, you can filter down to exactly what you’re looking for. Codegen’s graph structure makes it straightforward and performant to find and traverse specific code elements:

# Grab specific content by name
my_resource = codebase.get_symbol('TestResource')

# Find classes that inherit from a specific base
resource_classes = [
    cls for cls in codebase.classes
    if cls.is_subclass_of('Resource')
]

# Find functions with specific decorators
test_functions = [
    f for f in codebase.functions
    if any('pytest' in d.source for d in f.decorators)
]

# Find files matching certain patterns
test_files = [
    f for f in codebase.files
    if f.name.startswith('test_')
]
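
The startswith('test_') filter above misses files that follow the *_test.py convention, which pytest also collects by default. A small sketch of matching both patterns against a path string such as the file.filepath value printed earlier; the helper is_test_path is hypothetical, and it assumes the path includes the .py extension.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

TEST_PATTERNS = ("test_*.py", "*_test.py")  # pytest's default file patterns


def is_test_path(filepath: str) -> bool:
    """True when the file's basename matches a common test-file pattern."""
    basename = PurePosixPath(filepath).name
    return any(fnmatchcase(basename, pattern) for pattern in TEST_PATTERNS)


paths = ["tests/test_models.py", "pkg/models_test.py", "pkg/testing.py"]
print([p for p in paths if is_test_path(p)])
```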

Safe Code Transformations

Codegen guarantees that code transformations maintain correctness. It automatically handles updating imports, references, and dependencies. Here are some common transformations:

# Move all Enum classes to a dedicated file
for cls in codebase.classes:
    if cls.is_subclass_of('Enum'):
        # Codegen automatically:
        # - Updates all imports that reference this class
        # - Maintains the class's dependencies
        # - Preserves comments and decorators
        # - Generally performs this in a sane manner
        cls.move_to_file('enums.py')

# Rename a function and all its usages
old_function = codebase.get_function('process_data')
old_function.rename('process_resource')  # Updates all references automatically

# Change a function's signature
handler = codebase.get_function('event_handler')
handler.get_parameter('e').rename('event')  # Automatically updates all call-sites
handler.add_parameter('timeout: int = 30')  # Handles formatting and edge cases
handler.add_return_type('Response | None')

# Perform surgery on call-sites
# (Collection must be imported from Codegen before this check will run)
for fcall in handler.call_sites:
    arg = fcall.get_arg_by_parameter_name('env')
    # f(..., env={ data: x }) => f(..., env={ data: x or None })
    # Skip call-sites that do not pass `env` at all
    if arg is not None and isinstance(arg.value, Collection):
        data_key = arg.value.get('data')
        data_key.value.edit(f'{data_key.value} or None')

When moving symbols, Codegen will automatically update all imports and references. See Moving Symbols to learn more.

Leveraging Graph Relations

Codegen’s graph structure makes it easy to analyze relationships between code elements across files:

# Find dead code
for func in codebase.functions:
    if len(func.usages) == 0:
        print(f'🗑️ Dead code: {func.name}')
        func.remove()

# Analyze import relationships
file = codebase.get_file('api/endpoints.py')
print("\nFiles that import endpoints.py:")
for import_stmt in file.inbound_imports:
    print(f"  {import_stmt.file.path}")

print("\nFiles that endpoints.py imports:")
for import_stmt in file.imports:
    if import_stmt.resolved_symbol:
        print(f"  {import_stmt.resolved_symbol.file.path}")

# Explore class hierarchies
base_class = codebase.get_class('BaseModel')
if base_class:
    print(f"\nClasses that inherit from {base_class.name}:")
    for subclass in base_class.subclasses:
        print(f"  {subclass.name}")
        # We can go deeper in the inheritance tree
        for sub_subclass in subclass.subclasses:
            print(f"    └─ {sub_subclass.name}")
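
The nested loops above stop at two levels of inheritance. The following sketch walks a hierarchy to any depth using recursion. It relies only on the .name and .subclasses attributes used above and assumes .subclasses yields direct subclasses; FakeClass is a stand-in for a real class object, and render_tree is a hypothetical helper.

```python
from dataclasses import dataclass, field


@dataclass
class FakeClass:
    """Stand-in exposing the .name and .subclasses attributes used above."""
    name: str
    subclasses: list["FakeClass"] = field(default_factory=list)


def render_tree(cls, depth: int = 0, seen=None) -> list[str]:
    """Return one indented line per class in the inheritance tree."""
    seen = set() if seen is None else seen
    if id(cls) in seen:  # guard against revisiting shared subclasses
        return []
    seen.add(id(cls))
    lines = [f"{'  ' * depth}{cls.name}"]
    for sub in cls.subclasses:
        lines.extend(render_tree(sub, depth + 1, seen))
    return lines


base = FakeClass("BaseModel", [
    FakeClass("User", [FakeClass("Admin")]),
    FakeClass("Order"),
])
tree_lines = render_tree(base)
print("\n".join(tree_lines))
```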

Learn more about dependencies and references or imports and exports.
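
A function with zero usages is not always dead: test functions and entry points are often invoked by a runner rather than by other code. The following sketch collects removal candidates into a list first, so you can review them before calling remove() and avoid changing the codebase while iterating over it. It uses only the .name and .usages attributes from the dead-code loop above; the helper name and the keep_prefixes default are hypothetical choices.

```python
from types import SimpleNamespace


def removal_candidates(functions, keep_prefixes=("test_", "main")):
    """Return unused functions, skipping names that are likely entry points."""
    return [
        f for f in functions
        if len(f.usages) == 0 and not f.name.startswith(keep_prefixes)
    ]


# Stand-ins with the .name and .usages attributes used above.
fake_functions = [
    SimpleNamespace(name="parse", usages=["cli.py"]),
    SimpleNamespace(name="old_helper", usages=[]),
    SimpleNamespace(name="test_parse", usages=[]),
    SimpleNamespace(name="main", usages=[]),
]
candidates = removal_candidates(fake_functions)
print([f.name for f in candidates])
```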

What’s Next?

View Tutorials

Follow step-by-step tutorials for common code transformation tasks like modernizing React codebases or migrating APIs.

Learn Core Concepts

Understand key concepts like working with files, functions, imports, and the call graph to effectively manipulate code.

Integrate with AI Tools

Learn how to use Codegen with Cursor, Devin, Windsurf, and more.

API Reference

Explore the complete API documentation for all Codegen classes and methods.
