Closed
66 commits
fe26ff8
fix: slack event type for rich text (#727)
jayhack Mar 4, 2025
9c453a6
update changelog
github-actions[bot] Mar 4, 2025
d648646
fix: another slack schema error (#728)
jayhack Mar 4, 2025
09b2a08
update changelog
github-actions[bot] Mar 4, 2025
007c499
docs: devin blog post (#730)
jayhack Mar 4, 2025
a4994ee
docs: updated API reference
codegen-team Mar 4, 2025
adf4b3f
docs: fixes broken links (#731)
jayhack Mar 4, 2025
5c3ccf8
docs: updated API reference
codegen-team Mar 4, 2025
839f11d
chore: speeding up benchmark with concurrent requests (#720)
jemeza-codegen Mar 4, 2025
ee58fb6
chore(ci): use 3.13 codecov-cli (#735)
christinewangcw Mar 4, 2025
5493f34
Speed up directory tree generation (#736)
EdwardJXLi Mar 4, 2025
0058abe
fix: removes fancy tools from default agent implementation (#738)
jayhack Mar 4, 2025
c5b3ba4
feat: runner tooling (#733)
kopekC Mar 4, 2025
c06d1c6
Fix tests??? Somehow? (#740)
EdwardJXLi Mar 4, 2025
36625f0
update changelog
github-actions[bot] Mar 4, 2025
ae8dd56
Gitignore pyrightconfig.json (#743)
EdwardJXLi Mar 4, 2025
ed49306
fix(deps): update dependency openai to v1.65.3 (#744)
renovate[bot] Mar 4, 2025
4d4b079
update changelog
github-actions[bot] Mar 4, 2025
e23dac4
Dynamic scaling of asyncio request (#721)
jemeza-codegen Mar 4, 2025
fc31867
chore: allows evaluator to run on existing predictions (#734)
jemeza-codegen Mar 4, 2025
cf45221
feat: Better search tool (#739)
kopekC Mar 4, 2025
b61ba9d
[CG-10935] fix: issues with unpacking (#724)
Mar 4, 2025
a8c1002
docs: updated API reference
codegen-team Mar 4, 2025
2fc0604
update changelog
github-actions[bot] Mar 4, 2025
f64f455
[CG-10916] fix: WildcardImport error when trying to parse astropy (#754)
Mar 5, 2025
09b797a
Github Checks (import) (#750)
vishalshenoy Mar 5, 2025
16a114b
fix tests (#763)
kopekC Mar 5, 2025
0af487f
chore: better linking between a swebench run and its langsmith trace …
jemeza-codegen Mar 5, 2025
cb2329e
Fix invalid file bug with `iter_files` (#764)
EdwardJXLi Mar 5, 2025
7903a9e
GitHub Webhook Fix (#765)
vishalshenoy Mar 6, 2025
a2c6a53
fix(deps): update dependency openai to v1.65.4 (#766)
renovate[bot] Mar 6, 2025
5515372
update changelog
github-actions[bot] Mar 6, 2025
26b5ca4
chore: model is now a parameter for the run eval command (#767)
jemeza-codegen Mar 6, 2025
c44d98e
feat: adds SearchGithubIssuesTool (#747)
jayhack Mar 6, 2025
0dc9974
update changelog
github-actions[bot] Mar 6, 2025
074e761
feat: add view PR checks tool (#768)
christinewangcw Mar 6, 2025
3eb3324
update changelog
github-actions[bot] Mar 6, 2025
30c016e
chore: CG-10986 checkout pr tool (#769)
christinewangcw Mar 6, 2025
0127d2b
Model name is a tag for langsmith traces (#770)
jemeza-codegen Mar 6, 2025
6616219
fix: fix Codebase.from_repo (#771)
caroljung-cg Mar 6, 2025
42b0537
docs: updated API reference
codegen-team Mar 6, 2025
aa00b72
update changelog
github-actions[bot] Mar 6, 2025
997bcb5
fix: CG-10985 view PR returning 404 (#772)
christinewangcw Mar 6, 2025
66f33f5
update changelog
github-actions[bot] Mar 6, 2025
5e8f95f
Add docs for Advanced Settings (#774)
EdwardJXLi Mar 6, 2025
c6f2f5a
docs: updated API reference
codegen-team Mar 6, 2025
91d767e
chore: Integrate xAI as provider (#773)
kopekC Mar 7, 2025
d435cc5
fix: remote git repo doesn't need token if it's public repo (#775)
christinewangcw Mar 7, 2025
c61256a
update changelog
github-actions[bot] Mar 7, 2025
3360eae
Delete Chat Agent (#777)
vishalshenoy Mar 7, 2025
5472cad
fix: fix search tool import (#778)
caroljung-cg Mar 7, 2025
1d1a1c7
update changelog
github-actions[bot] Mar 7, 2025
b763d1f
CG-10997: Added subsets of swe-bench lite (#779)
jemeza-codegen Mar 8, 2025
0d600cf
chore: reverts to clasic regex + ripgrep search (#776)
kopekC Mar 8, 2025
e8783f6
chore: improvements to command line arguments (#781)
jemeza-codegen Mar 8, 2025
8a5818e
fix(deps): update dependency openai to v1.65.5 (#782)
renovate[bot] Mar 9, 2025
d53181c
update changelog
github-actions[bot] Mar 9, 2025
d9cb6d9
chore(deps): lock file maintenance (#784)
renovate[bot] Mar 10, 2025
2f7d002
fix: pull request context - body can be null (#787)
christinewangcw Mar 10, 2025
7de5af7
update changelog
github-actions[bot] Mar 10, 2025
6629cd1
CG-10967: Create Custom Langgraph Nodes + Retry Policy (#788)
tawsifkamal Mar 10, 2025
6da1173
500 Error Anthropic Retry Policy (#789)
tawsifkamal Mar 10, 2025
a652487
chore: minor tag and metadata refactor (#780)
jemeza-codegen Mar 10, 2025
ef1dba8
From repo with full history (#790)
vishalshenoy Mar 10, 2025
3553788
docs: updated API reference
codegen-team Mar 10, 2025
8490b30
Architecture Docs V1 (#792)
EdwardJXLi Mar 10, 2025
2 changes: 1 addition & 1 deletion .github/actions/setup-environment/action.yml
Expand Up @@ -26,5 +26,5 @@ runs:
- name: Install codecov
shell: bash
run: |
-uv tool install codecov-cli@10.0.1 --python 3.10
+uv tool install codecov-cli@10.0.1
uv tool update-shell
1 change: 1 addition & 0 deletions .gitignore
Expand Up @@ -30,6 +30,7 @@ supabase/config.toml
**/scripts/*.md
**/scripts/Personal/*
**/infrastructure/aws_infra/.terraform/*
pyrightconfig.json
Edwards Scratchpad.ipynb

# Allowing .env files to exist in repository, but not allowing updates
Expand Down
16 changes: 1 addition & 15 deletions .pre-commit-config.yaml
Expand Up @@ -80,27 +80,13 @@ repos:
rev: 39.169.3
hooks:
- id: renovate-config-validator

- repo: https://github.com/astral-sh/uv-pre-commit
rev: "0.5.31"
hooks:
- id: uv-sync
args: ["--frozen", "--all-packages", "--all-extras"]

- repo: "local"
hooks:
# Disabled as part of LFS removal.
# - id: disallowed-words-check
# name: Check for disallowed words
# entry: scripts/disallowed-words-check.sh
# language: script
# files: '' # Check all files
- id: generate-runner-imports
name: Generate Runner Imports
entry: bash -c "uv run --frozen python -m codegen.gscli.cli generate runner-imports src/codegen/shared/compilation/function_imports.py"
language: system
pass_filenames: false
always_run: true

- repo: https://github.com/hukkin/mdformat
rev: 0.7.22 # Use the ref you want to point at
hooks:
Expand Down
2 changes: 1 addition & 1 deletion architecture/2. parsing/B. AST Construction.md
Expand Up @@ -74,4 +74,4 @@ Statements have another layer of complexity. They are essentially pattern based

## Next Step

-After the AST is constructed, the system moves on to [Import Resolution](../3.%20imports-exports/A.%20Imports.md) to analyze module dependencies and resolve symbols across files.
+After the AST is constructed, the system moves on to [Directory Parsing](./C.%20Directory%20Parsing.md) to build a hierarchical representation of the codebase's directory structure.
50 changes: 50 additions & 0 deletions architecture/2. parsing/C. Directory Parsing.md
@@ -0,0 +1,50 @@
# Directory Parsing

The Directory Parsing system is responsible for creating and maintaining a hierarchical representation of the codebase's directory structure in memory. Directories do not hold references to the files themselves; instead, each directory holds the names of its files and performs a dynamic lookup when needed.

In addition to providing a more cohesive API for listing directory files, the Directory API is also used for [TSConfig](../3.%20imports-exports/C.%20TSConfig.md)-based [Import Resolution](../3.%20imports-exports/A.%20Imports.md).

## Core Components

The directory tree is constructed during the initial `build_graph` step in `codebase_context.py` and is recreated from scratch on every re-sync. The sections below describe this in more detail.

## Directory Tree Construction

The directory tree is built through the following process:

1. The `build_directory_tree` method in `CodebaseContext` is called during graph initialization or when the codebase structure changes.
1. The method iterates through all files in the repository, creating directory objects for each directory path encountered.
1. For each file, it adds the file to its parent directory using the `_add_file` method.
1. Directories are created recursively as needed using the `get_directory` method with `create_on_missing=True`.
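
The construction steps above can be sketched roughly as follows. This is a minimal illustration and not the SDK's actual code: the `Directory` and `DirectoryTree` classes here are hypothetical stand-ins, and only the names `build_directory_tree`, `_add_file`, `get_directory`, and `create_on_missing` come from the description above. The sketch assumes repository-relative POSIX paths.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Dict, List, Optional


@dataclass
class Directory:
    """Hypothetical stand-in: stores file names, not file objects."""

    path: PurePosixPath
    parent: Optional["Directory"] = None
    file_names: List[str] = field(default_factory=list)
    subdirectories: Dict[str, "Directory"] = field(default_factory=dict)

    def _add_file(self, name: str) -> None:
        self.file_names.append(name)


class DirectoryTree:
    """Hypothetical container that owns the root and a path index."""

    def __init__(self) -> None:
        self.root = Directory(PurePosixPath("."))
        self._by_path: Dict[PurePosixPath, Directory] = {self.root.path: self.root}

    def get_directory(
        self, path: PurePosixPath, create_on_missing: bool = False
    ) -> Optional[Directory]:
        if path in self._by_path:
            return self._by_path[path]
        if not create_on_missing:
            return None
        # Recursively make sure the parent exists before creating this directory.
        parent = self.get_directory(path.parent, create_on_missing=True)
        directory = Directory(path, parent=parent)
        parent.subdirectories[path.name] = directory
        self._by_path[path] = directory
        return directory


def build_directory_tree(file_paths: List[str]) -> DirectoryTree:
    """Rebuild the whole tree from a flat list of relative file paths."""
    tree = DirectoryTree()
    for raw_path in file_paths:
        file_path = PurePosixPath(raw_path)
        directory = tree.get_directory(file_path.parent, create_on_missing=True)
        directory._add_file(file_path.name)
    return tree
```

Because the tree stores only names, rebuilding it from scratch on each re-sync stays cheap compared with re-parsing file contents.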

## Directory Representation

The `Directory` class provides a rich interface for working with directories:

- **Hierarchy Navigation**: Access parent directories and subdirectories
- **File Access**: Retrieve files by name or extension
- **Symbol Access**: Find symbols (classes, functions, etc.) within files in the directory
- **Directory Operations**: Rename, remove, or update directories

Each `Directory` instance maintains:

- A reference to its parent directory
- Lists of files and subdirectories
- Methods to recursively traverse the directory tree

## File Representation

Files are represented by the `File` class and its subclasses:

- `File`: Base class for all files, supporting basic operations like reading and writing content
- `SourceFile`: Specialized class for source code files that can be parsed into an AST

Files maintain references to:

- Their parent directory
- Their content (loaded dynamically to preserve the source of truth)
- For source files, the parsed AST and symbols

## Next Step

After the directory structure is parsed, the system can perform [Import Resolution](../3.%20imports-exports/A.%20Imports.md) to analyze module dependencies and resolve symbols across files.
57 changes: 55 additions & 2 deletions architecture/3. imports-exports/A. Imports.md
@@ -1,7 +1,60 @@
# Import Resolution

-TODO
+Import resolution follows AST construction in the code analysis pipeline. It identifies dependencies between modules and builds a graph of relationships across the codebase.

> NOTE: This is an actively evolving part of Codegen SDK, so some details here may be incomplete, outdated, or incorrect.

## Purpose

The import resolution system serves these purposes:

1. **Dependency Tracking**: Maps relationships between files by resolving import statements.
1. **Symbol Resolution**: Connects imported symbols to their definitions.
1. **Module Graph Construction**: Builds a directed graph of module dependencies.
1. **(WIP) Cross-Language Support**: Provides implementations for different programming languages.

## Core Components

### ImportResolution Class

The `ImportResolution` class represents the outcome of resolving an import statement. It contains:

- The source file containing the imported symbol
- The specific symbol being imported (if applicable)
- Whether the import references an entire file/module

### Import Base Class

The `Import` class is the foundation for language-specific import implementations. It:

- Stores metadata about the import (module path, symbol name, alias)
- Provides the abstract `resolve_import()` method
- Adds symbol resolution edges to the codebase graph

### Language-Specific Implementations

#### Python Import Resolution

The `PyImport` class extends the base `Import` class with Python-specific logic:

- Handles relative imports
- Supports module imports, named imports, and wildcard imports
- Resolves imports using configurable resolution paths and `sys.path`
- Handles special cases like `__init__.py` files

#### TypeScript Import Resolution

The `TSImport` class implements TypeScript-specific resolution:

- Supports named imports, default imports, and namespace imports
- Handles type imports and dynamic imports
- Resolves imports using TSConfig path mappings
- Supports file extension resolution

## Implementation

After files and directories are parsed, the system loops through all import nodes and calls `add_symbol_resolution_edge` on each. This invokes the language-specific `resolve_import` method, which converts the import statement into an `ImportResolution` object (or `None` if the import cannot be resolved). The import and its `ImportResolution` are then used to add a symbol resolution edge to the graph, which later steps use to resolve symbols.
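
As a rough sketch of that loop, consider the following. The class shapes, field names, and the toy "dotted module to `.py` path" resolver are assumptions made for illustration; only the names `Import`, `ImportResolution`, and `resolve_import` come from this document, and the real signatures may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class ImportResolution:
    """Hypothetical shape: where an import points once resolved."""

    from_file: str                 # file that defines the imported symbol
    symbol: Optional[str] = None   # None when the whole module is imported
    imports_file: bool = False


@dataclass(frozen=True)
class Import:
    """Hypothetical import node with a toy, Python-like resolver."""

    source_file: str
    module: str
    symbol: Optional[str] = None

    def resolve_import(self, files: Dict[str, Set[str]]) -> Optional[ImportResolution]:
        # Toy rule (assumption): "pkg.mod" maps to "pkg/mod.py".
        candidate = self.module.replace(".", "/") + ".py"
        if candidate not in files:
            return None  # e.g. a third-party package outside the codebase
        if self.symbol is None:
            return ImportResolution(candidate, imports_file=True)
        if self.symbol in files[candidate]:
            return ImportResolution(candidate, symbol=self.symbol)
        return None


def add_symbol_resolution_edges(
    imports: List[Import], files: Dict[str, Set[str]]
) -> List[Tuple[Import, ImportResolution]]:
    """Collect one (import, resolution) edge per import that resolves."""
    edges = []
    for imp in imports:
        resolution = imp.resolve_import(files)
        if resolution is not None:
            edges.append((imp, resolution))
    return edges
```

Unresolvable imports simply produce no edge, which mirrors the "or `None`" case described above.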

## Next Step

-After import resolution, the system analyzes [Export Analysis](./B.%20Exports.md) and handles [TSConfig Support](./C.%20TSConfig.md) for TypeScript projects. This is followed by comprehensive [Type Analysis](../4.%20type-analysis/A.%20Type%20Analysis.md).
+After import resolution, the system performs [Export Analysis](./B.%20Exports.md) and handles [TSConfig Support](./C.%20TSConfig.md) for TypeScript projects. This is followed by [Type Analysis](../4.%20type-analysis/A.%20Type%20Analysis.md).
70 changes: 69 additions & 1 deletion architecture/3. imports-exports/B. Exports.md
@@ -1,6 +1,74 @@
# Export Analysis

-TODO
+Some languages contain additional metadata on "exported" symbols, specifying which symbols are made available to other modules. Export analysis follows import resolution in the code analysis pipeline. It identifies and processes exported symbols from modules, enabling the system to track what each module makes available to others.

## Core Components

### Export Base Class

The `Export` class serves as the foundation for language-specific export implementations. It:

- Stores metadata about the export (symbol name, is default, etc.)
- Tracks the relationship between the export and its declared symbol
- Adds export edges to the codebase graph

### TypeScript Export Implementation

The `TSExport` class implements TypeScript-specific export handling:

- Supports various export styles (named exports, default exports, re-exports)
- Handles export declarations with and without values
- Processes wildcard exports (`export * from 'module'`)
- Manages export statements with multiple exports

#### Export Types and Symbol Resolution

The TypeScript implementation handles several types of exports:

1. **Declaration Exports**

- Function declarations (including generators)
- Class declarations
- Interface declarations
- Type alias declarations
- Enum declarations
- Namespace declarations
- Variable/constant declarations

1. **Value Exports**

- Object literals with property exports
- Arrow functions and function expressions
- Classes and class expressions
- Assignment expressions
- Primitive values and expressions

1. **Special Export Forms**

- Wildcard exports (`export * from 'module'`)
- Named re-exports (`export { name as alias } from 'module'`)
- Default exports with various value types

#### Symbol Tracking and Dependencies

The export system:

- Maintains relationships between exported symbols and their declarations
- Validates export names match their declared symbols
- Tracks dependencies through the codebase graph
- Handles complex scenarios like:
  - Shorthand property exports in objects
  - Nested function and class declarations
  - Re-exports from other modules

#### Integration with Type System

Exports are tightly integrated with the type system:

- Exported type declarations are properly tracked
- Symbol resolution considers both value and type exports
- Re-exports preserve type information
- Export edges in the codebase graph maintain type relationships

## Next Step

Expand Down
76 changes: 75 additions & 1 deletion architecture/3. imports-exports/C. TSConfig.md
@@ -1,6 +1,80 @@
# TSConfig Support

-TODO
+TSConfig support is a critical component for TypeScript projects in the import resolution system. It processes TypeScript configuration files (tsconfig.json) to correctly resolve module paths and dependencies.

## Purpose

The TSConfig support system serves these purposes:

1. **Path Mapping**: Resolves custom module path aliases defined in the tsconfig.json file.
1. **Base URL Resolution**: Handles non-relative module imports using the baseUrl configuration.
1. **Project References**: Manages dependencies between TypeScript projects using the references field.
1. **Directory Structure**: Respects rootDir and outDir settings for maintaining proper directory structures.

## Core Components

### TSConfig Class

The `TSConfig` class represents a parsed TypeScript configuration file. It:

- Parses and stores the configuration settings from tsconfig.json
- Handles inheritance through the "extends" field
- Provides methods for translating between import paths and absolute file paths
- Caches computed values for performance optimization

## Configuration Processing

### Configuration Inheritance

TSConfig files can extend other configuration files through the "extends" field:

1. Base configurations are loaded and parsed first
1. Child configurations inherit and can override settings from their parent
1. Path mappings, base URLs, and other settings are merged appropriately
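
A minimal sketch of this inheritance, assuming configurations are already loaded into a dictionary keyed by a hypothetical config name. The `resolve_config` helper is illustrative and not part of the SDK. It follows the TypeScript compiler's behavior as commonly documented (child `compilerOptions` override the parent's key by key, and `references` are not inherited); how the SDK's `TSConfig` class merges settings may differ.

```python
from typing import Any, Dict


def resolve_config(name: str, configs: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Flatten an "extends" chain into one effective configuration.

    Assumes the chain has no cycles; a real loader would guard against that.
    """
    config = configs[name]
    parent_name = config.get("extends")
    # Base configurations are resolved first, then overridden by the child.
    parent = resolve_config(parent_name, configs) if parent_name else {}

    # Top-level keys: the child wins; the parent's "references" are dropped.
    merged = {key: value for key, value in parent.items() if key != "references"}
    merged.update({key: value for key, value in config.items() if key != "extends"})

    # compilerOptions: merged key by key, so a child "paths" replaces the parent's.
    merged["compilerOptions"] = {
        **parent.get("compilerOptions", {}),
        **config.get("compilerOptions", {}),
    }
    return merged
```

With this shape, a child that only sets `paths` still sees the parent's `baseUrl`, which is what path mapping resolution relies on.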

### Path Mapping Resolution

The system processes the "paths" field in tsconfig.json to create a mapping between import aliases and file paths:

1. Path patterns are normalized (removing wildcards, trailing slashes)
1. Relative paths are converted to absolute paths
1. Mappings are stored for efficient lookup during import resolution

### Project References

The "references" field defines dependencies between TypeScript projects:

1. Referenced projects are identified and loaded
1. Their configurations are analyzed to determine import paths
1. Import resolution can cross project boundaries using these references

## Import Resolution Process

### Path Translation

When resolving an import path in TypeScript:

1. Check if the path matches any path alias in the tsconfig.json
1. If a match is found, translate the path according to the mapping
1. Apply baseUrl resolution for non-relative imports
1. Handle project references for cross-project imports
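
The first three steps can be sketched as below. The `translate_import_path` function is a hypothetical helper, not the SDK's API, and it simplifies on purpose: it takes the first matching pattern and the first target, whereas the TypeScript compiler prefers the longest matching prefix and can try several targets. Project references are left out.

```python
from pathlib import PurePosixPath
from typing import Dict, List


def translate_import_path(
    import_path: str, paths: Dict[str, List[str]], base_url: str
) -> str:
    """Translate a non-relative import into a path relative to the project root."""
    # Relative imports are resolved against the importing file, not the tsconfig.
    if import_path.startswith("."):
        return import_path

    for pattern, targets in paths.items():
        if pattern.endswith("*"):
            prefix = pattern[:-1]
            if import_path.startswith(prefix):
                # Substitute whatever the wildcard matched into the target.
                remainder = import_path[len(prefix):]
                target = targets[0].replace("*", remainder, 1)
                return str(PurePosixPath(base_url) / target)
        elif import_path == pattern:
            return str(PurePosixPath(base_url) / targets[0])

    # No alias matched: fall back to plain baseUrl resolution.
    return str(PurePosixPath(base_url) / import_path)
```

For example, with `paths = {"@app/*": ["src/app/*"]}` and `base_url = "."`, the import `@app/utils/date` translates to `src/app/utils/date`; file extension resolution would then happen in a later step.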

### Optimization Techniques

The system employs several optimizations:

1. Caching computed values to avoid redundant processing
1. Early path checking for common patterns (e.g., paths starting with "@" or "~")
1. Hierarchical resolution that respects the configuration inheritance chain

## Integration with Import Resolution

The TSConfig support integrates with the broader import resolution system:

1. Each TypeScript file is associated with its nearest tsconfig.json
1. Import statements are processed using the file's associated configuration
1. Path mappings are applied during the module resolution process
1. Project references are considered when resolving imports across project boundaries

## Next Step

Expand Down
7 changes: 0 additions & 7 deletions architecture/5. performing-edits/A. Edit Operations.md

This file was deleted.

54 changes: 54 additions & 0 deletions architecture/5. performing-edits/A. Transactions.md
@@ -0,0 +1,54 @@
# Transactions

Transactions represent atomic changes to files in the codebase. Each transaction defines a specific modification that can be queued, validated, and executed.

## Transaction Types

The transaction system is built around a base `Transaction` class with specialized subclasses:

### Content Transactions

- **RemoveTransaction**: Removes content between specified byte positions
- **InsertTransaction**: Inserts new content at a specified byte position
- **EditTransaction**: Replaces content between specified byte positions

### File Transactions

- **FileAddTransaction**: Creates a new file
- **FileRenameTransaction**: Renames an existing file
- **FileRemoveTransaction**: Deletes a file

## Transaction Priority

Transactions are executed in a specific order defined by the `TransactionPriority` enum:

1. **Remove** (highest priority)
1. **Edit**
1. **Insert**
1. **FileAdd**
1. **FileRename**
1. **FileRemove**

This ordering ensures that content is removed before editing or inserting, and that all content operations happen before file operations.
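
One way to picture this ordering is an integer enum used as a sort key. The numeric values and the simplified `Transaction` class below are assumptions for illustration; only the enum name and the relative order of its members come from the list above.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class TransactionPriority(IntEnum):
    # Lower value runs first. The concrete numbers are assumed, not the SDK's.
    Remove = 0
    Edit = 1
    Insert = 2
    FileAdd = 10
    FileRename = 11
    FileRemove = 12


@dataclass(frozen=True)
class Transaction:
    """Hypothetical minimal transaction: a priority plus a label."""

    priority: TransactionPriority
    description: str


def execution_order(queue: List[Transaction]) -> List[Transaction]:
    """Sort by priority; sorted() is stable, so ties keep their queue order."""
    return sorted(queue, key=lambda transaction: transaction.priority)
```

Sorting a mixed queue this way places every content transaction ahead of every file transaction, regardless of the order in which they were queued.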

## Key Concepts

### Byte-Level Operations

All content transactions operate at the byte level rather than on lines or characters. This provides precise control over modifications and allows transactions to work with any file type, regardless of encoding or line ending conventions.
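
The difference between byte offsets and character offsets shows up as soon as a file contains non-ASCII text. The sketch below uses a hypothetical `apply_edits` helper that takes `(start, end, replacement)` byte ranges. Applying them from the end of the file backwards is a choice made for this example so that earlier offsets stay valid; it assumes the ranges do not overlap and is not necessarily how the SDK applies transactions.

```python
from typing import List, Tuple

# (start_byte, end_byte, replacement); an insert has start_byte == end_byte.
ByteEdit = Tuple[int, int, bytes]


def apply_edits(content: bytes, edits: List[ByteEdit]) -> bytes:
    """Apply non-overlapping byte-range edits, last range first."""
    for start, end, replacement in sorted(edits, key=lambda edit: edit[0], reverse=True):
        content = content[:start] + replacement + content[end:]
    return content


# "ï" takes two bytes in UTF-8, so byte and character offsets diverge after it.
source = "naïve = 1\n".encode("utf-8")
char_index = source.decode("utf-8").index("1")  # character offset of "1"
byte_index = source.index(b"1")                 # byte offset of "1", one larger

# Rename the identifier (bytes 0..6) and replace the literal (bytes 9..10).
updated = apply_edits(source, [(0, 6, b"simple"), (byte_index, byte_index + 1, b"2")])
```

Using the character offset here would overwrite the space before the literal instead of the literal itself.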

### Content Generation

Transactions support both static content (direct strings) and dynamic content (generated at execution time). This flexibility allows for complex transformations where the new content depends on the state of the codebase at execution time.

Most content transactions use static content, but dynamic content is supported for rare cases where the new content depends on the state of other transactions. One common example is handling whitespace during add and remove transactions.
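
The distinction can be sketched by letting a transaction hold either a string or a zero-argument callable that is evaluated only at execution time. The simplified `InsertTransaction` below, including its constructor and `resolve_content` method, is a hypothetical illustration and not the SDK's class.

```python
from typing import Callable, Union

# Static content is a plain string; dynamic content is computed on demand.
Content = Union[str, Callable[[], str]]


class InsertTransaction:
    """Hypothetical simplified insert at a byte offset."""

    def __init__(self, byte_offset: int, content: Content) -> None:
        self.byte_offset = byte_offset
        self.content = content

    def resolve_content(self) -> str:
        # Dynamic content is evaluated here, so it sees the state at execution time.
        return self.content() if callable(self.content) else self.content

    def execute(self, source: bytes) -> bytes:
        new_bytes = self.resolve_content().encode("utf-8")
        return source[: self.byte_offset] + new_bytes + source[self.byte_offset :]
```

A whitespace-sensitive insert could, for instance, pass a callable that checks whether a neighboring removal left a blank line behind and only then decides how many newlines to emit.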

### File Operations

File transactions are used to create, rename, and delete files.

> NOTE: Most file transactions, such as `FileAddTransaction`, are no-ops within the Transaction Manager: the change is applied immediately when the `create_file` API is called rather than when the transaction executes. This makes created files immediately available for editing and use. File operations are still added to the Transaction Manager to help optimize graph re-parsing and diff generation by keeping track of which files exist and which no longer exist.

## Next Step

Transactions are managed by the [Transaction Manager](./B.%20Transaction%20Manager.md), which ensures consistency and atomicity.