The primary entrypoint to programs leveraging Codegen is the Codebase class.
Local Codebases
Construct a Codebase by passing in a path to a local git repository or any subfolder within it. The path must be within a git repository (i.e., the path itself or one of its parent directories must contain a .git folder).
from codegen import Codebase
from codegen.sdk.enums import ProgrammingLanguage
# Parse from a git repository root
codebase = Codebase("path/to/repository")
# Parse from a subfolder within a git repository
codebase = Codebase("path/to/repository/src/subfolder")
# Parse from current directory (must be within a git repo)
codebase = Codebase("./")
# Specify programming language (instead of inferring from file extensions)
codebase = Codebase("./", programming_language=ProgrammingLanguage.TYPESCRIPT)
By default, Codegen infers the programming language of the codebase from its file extensions and
parses all files in it. To override the inferred language, pass the programming_language parameter
with a value from the ProgrammingLanguage enum.
The initial parse may take a few minutes for large codebases. This
pre-computation enables constant-time operations afterward. Learn more
here.
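The git-repository requirement described above can be checked before paying for the initial parse. The following is a minimal sketch, not part of Codegen: `find_git_root` is a hypothetical helper that walks up the directory tree looking for a `.git` entry, which is how the requirement is stated here.

```python
from pathlib import Path
from typing import Optional


def find_git_root(path: str) -> Optional[Path]:
    """Return the nearest directory at or above `path` that contains .git.

    Hypothetical helper for illustration; returns None when no ancestor
    contains a .git entry.
    """
    start = Path(path).resolve()
    # Check the path itself first, then each parent up to the filesystem root.
    for candidate in (start, *start.parents):
        # .exists() also accepts a .git *file*, as used by worktrees and submodules.
        if (candidate / ".git").exists():
            return candidate
    return None
```

A caller could run `find_git_root("./")` and raise a clear error when it returns `None`, rather than passing an invalid path to `Codebase`.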
Remote Repositories
To fetch and parse a repository directly from GitHub, use the from_repo function.
import codegen
from codegen.sdk.enums import ProgrammingLanguage
# Fetch and parse a repository (defaults to /tmp/codegen/{repo_name})
codebase = codegen.from_repo('fastapi/fastapi')
# Customize temp directory, clone depth, specific commit, or programming language
codebase = codegen.from_repo(
    'fastapi/fastapi',
    tmp_dir='/custom/temp/dir',  # Optional: custom temp directory
    commit='786a8ada7ed0c7f9d8b04d49f24596865e4b7901',  # Optional: specific commit
    shallow=False,  # Optional: full clone instead of shallow
    programming_language=ProgrammingLanguage.PYTHON,  # Optional: override language detection
)
Remote repositories are cloned to the /tmp/codegen/{repo_name} directory by
default. The clone is shallow by default for better performance.
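To find the clone on disk afterward, the default location can be derived from the repository name. This sketch is illustrative only: `default_clone_dir` is a hypothetical helper, and it assumes that `{repo_name}` means the part after the slash in `owner/name` and that a custom `tmp_dir` is used as the parent directory in the same way. Neither assumption is confirmed here, so verify against your installed version.

```python
from pathlib import PurePosixPath


def default_clone_dir(repo_full_name: str, tmp_dir: str = "/tmp/codegen") -> str:
    """Guess where a repo given as 'owner/name' would be cloned under tmp_dir.

    Hypothetical helper; assumes {repo_name} is the segment after the slash.
    """
    owner, sep, name = repo_full_name.partition("/")
    # Reject anything that is not exactly 'owner/name'.
    if not sep or not owner or not name or "/" in name:
        raise ValueError(f"expected 'owner/name', got {repo_full_name!r}")
    return str(PurePosixPath(tmp_dir) / name)
```

Under these assumptions, `default_clone_dir('fastapi/fastapi')` yields `/tmp/codegen/fastapi`.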
Configuration Options
You can customize the behavior of your Codebase instance by passing a CodebaseConfig object. This allows you to configure secrets (like API keys) and toggle specific features:
from codegen import Codebase
from codegen.sdk.codebase.config import CodebaseConfig, GSFeatureFlags, Secrets
codebase = Codebase(
    "path/to/repository",
    config=CodebaseConfig(
        secrets=Secrets(
            openai_key="your-openai-key",  # For AI-powered features
        ),
        feature_flags=GSFeatureFlags(
            sync_enabled=True,  # Enable graph synchronization
            # ... add other feature flags as needed
        ),
    ),
)
The CodebaseConfig allows you to configure:
- secrets: API keys and other sensitive information needed by the codebase
- feature_flags: Toggle specific features like language engines, dependency management, and graph synchronization
For a complete list of available feature flags and configuration options, see the source code on GitHub.
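Since secrets are passed in as plain strings, it is usually safer to read them from the environment than to hardcode them in a script. The sketch below is one way to do that. The `read_openai_key` and `build_config` helpers and the `OPENAI_API_KEY` variable name are assumptions for illustration; only `CodebaseConfig(secrets=Secrets(openai_key=...))` comes from the example above.

```python
import os
from typing import Mapping, Optional


def read_openai_key(env: Optional[Mapping[str, str]] = None,
                    var: str = "OPENAI_API_KEY") -> str:
    """Return the key stored in `var`, or raise if it is missing or blank.

    Hypothetical helper; the variable name is a convention, not a Codegen requirement.
    """
    env = os.environ if env is None else env
    key = env.get(var, "").strip()
    if not key:
        raise RuntimeError(f"{var} is not set")
    return key


def build_config(env: Optional[Mapping[str, str]] = None):
    """Build a CodebaseConfig whose OpenAI key comes from the environment."""
    # Imported lazily so this module still loads where codegen is not installed.
    from codegen.sdk.codebase.config import CodebaseConfig, Secrets

    return CodebaseConfig(secrets=Secrets(openai_key=read_openai_key(env)))
```

The result can then be passed as `Codebase("path/to/repository", config=build_config())`.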
Advanced Initialization
For more complex scenarios, Codegen supports an advanced initialization mode using ProjectConfig. This allows for fine-grained control over:
- Repository configuration
- Base path and subdirectory filtering
- Multiple project configurations
Here’s an example:
from codegen import Codebase
from codegen.git.repo_operator.local_repo_operator import LocalRepoOperator
from codegen.git.schemas.repo_config import BaseRepoConfig
from codegen.sdk.codebase.config import ProjectConfig
from codegen.sdk.enums import ProgrammingLanguage
codebase = Codebase(
    projects=[
        ProjectConfig(
            repo_operator=LocalRepoOperator(
                repo_path="/tmp/codegen-sdk",
                repo_config=BaseRepoConfig(),
                bot_commit=True,
            ),
            programming_language=ProgrammingLanguage.TYPESCRIPT,
            base_path="src/codegen/sdk/typescript",
            subdirectories=["src/codegen/sdk/typescript"],
        )
    ]
)
For more details on advanced configuration options, see the source code on GitHub.
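To make the idea of subdirectory filtering concrete, here is a small standalone sketch. It is not Codegen's implementation: `filter_by_subdirectories` is a hypothetical function, and it assumes that filtering means keeping only files located under one of the listed subdirectories, with paths relative to the repository root.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def filter_by_subdirectories(paths: Iterable[str],
                             subdirectories: Iterable[str]) -> List[str]:
    """Keep only the paths that live under one of `subdirectories`.

    Hypothetical illustration of prefix-based filtering on repo-relative paths.
    """
    roots = [PurePosixPath(s) for s in subdirectories]
    kept = []
    for p in paths:
        path = PurePosixPath(p)
        # Compare whole path components, so "typescript_extra" does not
        # match the "typescript" subdirectory.
        if any(root == path or root in path.parents for root in roots):
            kept.append(p)
    return kept
```

Comparing path components rather than raw string prefixes avoids accidentally matching sibling directories whose names share a prefix.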
Supported Languages
Codegen currently supports: