A quick tour of Codegen in a Jupyter notebook.
Installation
Install codegen from PyPI via uv:
This makes the codegen command available globally in your terminal, while
keeping its dependencies isolated.
Quick Start with Jupyter
The codegen notebook command creates a virtual environment and opens a Jupyter notebook for quick prototyping. This is often the fastest way to get up and running.
# Navigate to your repository
cd path/to/git/repository
# Initialize codegen and launch Jupyter
codegen notebook
This will:
- Create a .codegen/ directory with:
  - .venv/ - A dedicated virtual environment for this project
  - jupyter/ - Jupyter notebooks for exploring your code
  - config.toml - Project configuration
- Launch Jupyter Lab with a pre-configured notebook
The notebook comes pre-configured to load your codebase, so you can start
exploring right away!
Initializing a Codebase
Instantiating a Codebase will automatically parse a codebase and make it available for manipulation.
from codegen import Codebase
# Parse a codebase
codebase = Codebase("./")
This will automatically infer the programming language of the codebase and
parse all files in the codebase.
The initial parse may take a few minutes for large codebases. This
pre-computation enables constant-time operations afterward. Learn more
here.
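To get a feel for why an up-front parse makes later lookups cheap, here is a minimal sketch that uses only Python's standard ast module. It is not how Codegen works internally; build_call_index and SOURCE are hypothetical names made up for this illustration. The idea is simply that one pass over the code builds an index, and each later query is a dictionary lookup.

```python
import ast
from collections import defaultdict

# Tiny stand-in "codebase" (hypothetical, for illustration only)
SOURCE = '''
def helper():
    return 1

def main():
    return helper() + helper()
'''


def build_call_index(source: str) -> dict[str, list[int]]:
    """Map each directly called function name to the lines where it is called."""
    index: dict[str, list[int]] = defaultdict(list)
    # One full walk over the syntax tree: this is the "expensive" step
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            index[node.func.id].append(node.lineno)
    return dict(index)


call_index = build_call_index(SOURCE)

# After the single pass, each query is a plain dictionary lookup
print(call_index.get("helper", []))
print(call_index.get("main", []))
```

The trade-off is the same one described above: you pay once at parse time so that repeated questions about the code stay fast.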
Exploring Your Codebase
Let’s explore the codebase we just initialized.
Here are some common patterns for code navigation in Codegen:
# Print overall stats
print("🔍 Codebase Analysis")
print("=" * 50)
print(f"📚 Total Classes: {len(codebase.classes)}")
print(f"⚡ Total Functions: {len(codebase.functions)}")
print(f"🔄 Total Imports: {len(codebase.imports)}")
# Find class with most inheritance
if codebase.classes:
    deepest_class = max(codebase.classes, key=lambda x: len(x.superclasses))
    print(f"\n🌳 Class with most inheritance: {deepest_class.name}")
    print(f" 📊 Chain Depth: {len(deepest_class.superclasses)}")
    print(f" ⛓️ Chain: {' -> '.join(s.name for s in deepest_class.superclasses)}")
# Find first 5 recursive functions
recursive = [f for f in codebase.functions
             if any(call.name == f.name for call in f.function_calls)][:5]
if recursive:
    print("\n🔄 Recursive functions:")
    for func in recursive:
        print(f" - {func.name}")
Analyzing Tests
Let’s specifically drill into large test files, which can be cumbersome to manage.
from collections import Counter
# Filter to all test functions and classes
test_functions = [x for x in codebase.functions if x.name.startswith('test_')]
test_classes = [x for x in codebase.classes if x.name.startswith('Test')]
print("🧪 Test Analysis")
print("=" * 50)
print(f"📝 Total Test Functions: {len(test_functions)}")
print(f"🔬 Total Test Classes: {len(test_classes)}")
print(f"📊 Tests per File: {len(test_functions) / len(codebase.files):.1f}")
# Find files with the most tests
print("\n📚 Top Test Files by Class Count")
print("-" * 50)
file_test_counts = Counter([x.file for x in test_classes])
for file, num_tests in file_test_counts.most_common(5):
    print(f"🔍 {num_tests} test classes: {file.filepath}")
    # file.source is the raw text, so count lines rather than characters
    print(f" 📏 File Length: {len(file.source.splitlines())} lines")
    print(f" 💡 Functions: {len(file.functions)}")
Splitting Up Large Test Files
Let’s split up the largest test files into separate modules for better organization:
print("\n📦 Splitting Test Files")
print("=" * 50)
# Process the 5 files with the most test classes
for file, num_tests in file_test_counts.most_common(5):
    # Create a new directory name based on the file path (drop the trailing .py)
    base_name = file.filepath.removesuffix('.py')
    print(f"\n🔄 Processing: {file.filepath}")
    print(f" 📊 {num_tests} test classes to split")
    # Move each test class to its own file
    for test_class in file.classes:
        if test_class.name.startswith('Test'):
            # Create descriptive filename from test class name
            new_file = f"{base_name}/{test_class.name.lower()}.py"
            print(f" 📝 Moving {test_class.name} -> {new_file}")
            # Codegen handles all the complexity:
            # - Creates directories if needed
            # - Updates all imports automatically
            # - Maintains test dependencies
            # - Preserves decorators and docstrings
            test_class.move_to_file(new_file)
# Commit changes to disk
codebase.commit()
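Before moving anything, it can help to preview where each class would land. The sketch below computes the same target paths as the loop above using only the standard library, so it can be run without touching the codebase. plan_split and the example names are hypothetical helpers for illustration, not part of Codegen.

```python
from pathlib import PurePosixPath


def plan_split(filepath: str, class_names: list[str]) -> dict[str, str]:
    """Return {class name: target file} for every class whose name starts with 'Test'."""
    # with_suffix("") drops only the final extension, e.g. tests/test_api.py -> tests/test_api
    base = PurePosixPath(filepath).with_suffix("")
    return {
        name: str(base / f"{name.lower()}.py")
        for name in class_names
        if name.startswith("Test")
    }


# Hypothetical input: one test file with two test classes and one helper class
plan = plan_split("tests/test_api.py", ["TestUsers", "TestOrders", "Helper"])
for name, target in plan.items():
    print(f"{name} -> {target}")
```

Reviewing the plan first makes it easier to spot name collisions, such as two classes that lowercase to the same filename, before any files are rewritten.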
Finding Specific Content
Once you have a general sense of your codebase, you can filter down to exactly what you’re looking for. Codegen’s graph structure makes it straightforward and performant to find and traverse specific code elements:
# Grab specific content by name
my_resource = codebase.get_symbol('TestResource')
# Find classes that inherit from a specific base
resource_classes = [
    cls for cls in codebase.classes
    if cls.is_subclass_of('Resource')
]
# Find functions with specific decorators
test_functions = [
    f for f in codebase.functions
    if any('pytest' in d.source for d in f.decorators)
]
# Find files matching certain patterns
test_files = [
    f for f in codebase.files
    if f.name.startswith('test_')
]
Codegen guarantees that code transformations maintain correctness. It automatically handles updating imports, references, and dependencies. Here are some common transformations:
# Move all Enum classes to a dedicated file
for cls in codebase.classes:
    if cls.is_subclass_of('Enum'):
        # Codegen automatically:
        # - Updates all imports that reference this class
        # - Maintains the class's dependencies
        # - Preserves comments and decorators
        # - Generally performs this in a sane manner
        cls.move_to_file('enums.py')
# Rename a function and all its usages
old_function = codebase.get_function('process_data')
old_function.rename('process_resource') # Updates all references automatically
# Change a function's signature
handler = codebase.get_function('event_handler')
handler.get_parameter('e').rename('event') # Automatically updates all call-sites
handler.add_parameter('timeout: int = 30') # Handles formatting and edge cases
handler.add_return_type('Response | None')
# Perform surgery on call-sites
# (Collection is assumed to be imported from the Codegen SDK)
for fcall in handler.call_sites:
    arg = fcall.get_arg_by_parameter_name('env')
    # f(..., env={ data: x }) => f(..., env={ data: x or None })
    # Skip call-sites that do not pass env at all
    if arg is not None and isinstance(arg.value, Collection):
        data_key = arg.value.get('data')
        data_key.value.edit(f'{data_key.value} or None')
When moving symbols, Codegen will automatically update all imports and
references. See Moving Symbols to
learn more.
Leveraging Graph Relations
Codegen’s graph structure makes it easy to analyze relationships between code elements across files:
# Find dead code
for func in codebase.functions:
    if len(func.usages) == 0:
        print(f'🗑️ Dead code: {func.name}')
        func.remove()
# Analyze import relationships
file = codebase.get_file('api/endpoints.py')
print("\nFiles that import endpoints.py:")
for import_stmt in file.inbound_imports:
    print(f" {import_stmt.file.path}")
print("\nFiles that endpoints.py imports:")
for import_stmt in file.imports:
    if import_stmt.resolved_symbol:
        print(f" {import_stmt.resolved_symbol.file.path}")
# Explore class hierarchies
base_class = codebase.get_class('BaseModel')
if base_class:
    print(f"\nClasses that inherit from {base_class.name}:")
    for subclass in base_class.subclasses:
        print(f" {subclass.name}")
        # We can go deeper in the inheritance tree
        for sub_subclass in subclass.subclasses:
            print(f" └─ {sub_subclass.name}")
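The nested loops above stop at two levels. To walk a hierarchy of any depth, a small recursive helper is one option. The sketch below runs on a stand-in Node class so it works without Codegen; Node and hierarchy_lines are hypothetical names. It assumes that .subclasses yields direct subclasses only. If it already returned every descendant, this recursion would list deeper classes more than once.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a class object exposing .name and .subclasses."""
    name: str
    subclasses: list["Node"] = field(default_factory=list)


def hierarchy_lines(cls, depth: int = 0) -> list[str]:
    """Return one indented line per descendant of cls, in depth-first order."""
    lines = []
    for sub in cls.subclasses:
        lines.append("  " * depth + "└─ " + sub.name)
        # Recurse so grandchildren and deeper levels are included
        lines.extend(hierarchy_lines(sub, depth + 1))
    return lines


# Hypothetical hierarchy: BaseModel -> User -> Admin, and BaseModel -> Order
base = Node("BaseModel", [Node("User", [Node("Admin")]), Node("Order")])
tree_lines = hierarchy_lines(base)
print("\n".join(tree_lines))
```

Because the helper only reads .name and .subclasses, the same function should work on any object that exposes those two attributes.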
What’s Next?