# Mapping Your Organizational Codebase for Evaluation

Before you can evaluate a GraphRAG system on your code, you need to understand *your codebase* as a system. If you’re working with 20–30 repositories, each owned by a different team and evolving at a different speed, “the codebase” is really an ecosystem: services, frontends, connectors, schemas, security layers, and infrastructure, all woven together.

This post walks through how to map that ecosystem so you can design meaningful evaluation datasets for your cross-repository assistant.

---

## Why Mapping Comes Before Evaluation

Many evaluation failures trace back to the same root cause: the evaluation dataset doesn’t reflect how your system is *actually* used. If you randomly sample files or pick a few issues from one repository, you’ll get:

- Single-repo, single-file questions.
- Little to no cross-service reasoning.
- Very little stress on retrieval.

But in reality, your developers ask things like:

- “If I change this user field, what breaks?”
- “Where is authentication actually enforced?”
- “Why did we change this integration last quarter?”

Answering those questions requires cross-repository retrieval, version awareness, and graph traversal. To build an evaluation set that reflects this, you first need a **map**: a dependency graph of your organizational codebase, plus the queries developers actually care about.

---

## Step 1: Build a Repository Dependency Graph

Start by extracting how your repositories depend on each other. At a minimum, you want to capture:

- **Imports** between modules and shared libraries.
- **API calls** between services (HTTP, RPC, GraphQL, gRPC, messaging).
- **Schema references** (tables, columns, migrations).
- **Configuration references** (shared config keys, feature flags, secrets).

Conceptually, you’re building a graph where:

- Nodes = repositories, services, schemas, key modules.
- Edges = “calls”, “imports”, “references”, “configured by”.
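
To make the node and edge types concrete, here is a minimal sketch using only the standard library. It assumes each edge is stored as a `(source, target, relation)` tuple read as “source depends on target”. The repository names are hypothetical. Reversing the edges turns “what breaks if I change X?” into a breadth-first search, and counting distinct direct dependents gives a rough way to spot hubs.

```python
from collections import defaultdict, deque

# Hypothetical edges: (source, target, relation), read as "source depends on target".
EDGES = [
    ("web-frontend", "user-service", "calls"),
    ("billing-service", "user-service", "calls"),
    ("billing-service", "payments-db", "references"),
    ("user-service", "auth-middleware", "imports"),
    ("user-service", "users-schema", "references"),
    ("user-service", "infra-config", "configured by"),
]


def build_reverse_index(edges):
    """Map each target node to the (source, relation) pairs that depend on it."""
    dependents = defaultdict(list)
    for source, target, relation in edges:
        dependents[target].append((source, relation))
    return dependents


def transitive_dependents(dependents, node):
    """Breadth-first search over reversed edges: everything that depends on `node`."""
    seen, queue = set(), deque([node])
    while queue:
        current = queue.popleft()
        for source, _relation in dependents.get(current, []):
            if source not in seen:
                seen.add(source)
                queue.append(source)
    return seen


dependents = build_reverse_index(EDGES)

# "If I change the users schema, what breaks?"
print(sorted(transitive_dependents(dependents, "users-schema")))
# → ['billing-service', 'user-service', 'web-frontend']

# Rough hub detection: nodes with the most distinct direct dependents.
hubs = sorted(dependents, key=lambda n: len({s for s, _ in dependents[n]}), reverse=True)
print(hubs[0])  # → user-service
```

If you store the graph in NetworkX instead, `nx.ancestors(G, node)` on a `DiGraph` with the same edge direction returns the same set as `transitive_dependents`.
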

Here’s a simplified sketch for Python repos. It extracts import statements and leaves the other extractors as placeholders:

```python
from pathlib import Path
import ast


def extract_cross_repo_dependencies(repo_root: Path) -> dict:
    """
    Extract imports, API calls, and configuration references that
    cross repository boundaries.
    """
    dependencies = {"imports": set(), "api_calls": set(), "sql_tables": set(), "config_keys": set()}
    for py_file in repo_root.rglob("*.py"):
        try:
            tree = ast.parse(py_file.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError, ValueError):
            continue  # Skip files that don't parse as Python 3
        for node in ast.walk(tree):
            # Extract import statements (absolute imports, top-level package only)
            if isinstance(node, ast.Import):
                dependencies["imports"].update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                dependencies["imports"].add(node.module.split(".")[0])
            # Still to do: identify API client instantiations,
            # find SQL query strings, and locate config key accesses.
    return dependencies
```

In a full implementation, you would:

- Parse import statements and map them to other internal packages/repos.
- Look for HTTP/RPC clients (e.g., `requests`, `httpx`, gRPC stubs) and extract target service names/URLs.
- Identify SQL strings and extract referenced tables/columns.
- Track access to config modules (e.g., `settings["PAYMENTS_API_URL"]`).

Normalize these into a graph format (e.g., JSON, Neo4j, NetworkX) so you can query:

- “Which repos depend on `user-service`?”
- “Which services hit the `payments` database?”
- “What calls flow through `auth-middleware`?”

---

## Step 2: Classify Repositories by Role

Once you can see the edges, group repos by the role they play. For a typical 25-repository setup, you might see something like:

| Repository Type  | Count | Cross-Repo Dependency Pattern                                         |
|------------------|-------|-----------------------------------------------------------------------|
| Backend Services | ~8    | Import connectors, call other backends, reference SQL schemas         |
| Frontend Apps    | ~4    | Call backend APIs, share component libraries                          |
| Connectors/SDKs  | ~5    | Imported by backends and frontends                                    |
| SQL/Migrations   | ~3    | Referenced by backends, define data contracts                         |
| Security/Auth    | ~2    | Wrapped around or injected into all service calls                     |
| Infrastructure   | ~3    | Configure deployment, routing, and environment for all of the above   |

This classification helps you:

- Understand which repos are **sources of truth** (e.g., schemas, auth).
- See where **contract boundaries** live (APIs, types, connectors).
- Identify **central hubs** that many others depend on.

These roles will later drive your **query types** and **evaluation focus areas**.

---

## Step 3: Mine Real Developer Questions

Next, you need to understand what developers actually ask when they’re stuck. Useful sources:

- **Slack / Teams channels**
  - `#help-backend`, `#help-frontend`, `#incidents`, `#oncall`
  - Look for threads where someone asks a question and multiple services/repos get discussed.
- **Code review comments**
  - “Where else is this called?”
  - “Does this break X integration?”
  - “Is this consistent with the schema in Y?”
- **Onboarding docs and FAQs**
  - “How does authentication work end-to-end?”
  - “How does the billing pipeline operate?”
- **Incident post-mortems**
  - “What caused this outage?”
  - “Which services were impacted and why?”

For each question you collect, note:

- Which **repositories** were referenced in the answer.
- Whether the resolution required:
  - Following a dependency chain.
  - Understanding a shar
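
One way to keep these notes consistent is to store each mined question as a small record and check it against the dependency graph from Step 1. This is a sketch under stated assumptions. `MinedQuestion`, `DEPENDS_ON`, `reaches`, `spans_dependency_chain`, and the repository names are all hypothetical. The check “some pair of referenced repos is linked by a dependency path” is only a rough proxy for “the resolution required following a dependency chain”.

```python
from dataclasses import dataclass, field
from itertools import combinations

# Hypothetical Step 1 output: repo -> repos it directly depends on.
DEPENDS_ON = {
    "web-frontend": {"user-service"},
    "billing-service": {"user-service", "payments-db"},
    "user-service": {"auth-middleware", "users-schema"},
}


@dataclass
class MinedQuestion:
    text: str
    source: str  # e.g. "slack", "code-review", "onboarding", "post-mortem"
    repos: set[str] = field(default_factory=set)  # repos referenced in the answer

    @property
    def is_cross_repo(self) -> bool:
        return len(self.repos) > 1


def reaches(start: str, goal: str, graph: dict[str, set[str]]) -> bool:
    """Depth-first search: can `goal` be reached from `start` via depends-on edges?"""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return False


def spans_dependency_chain(q: MinedQuestion, graph: dict[str, set[str]]) -> bool:
    """Heuristic: True if any two referenced repos are linked by a dependency path."""
    return any(
        reaches(a, b, graph) or reaches(b, a, graph)
        for a, b in combinations(sorted(q.repos), 2)
    )


q = MinedQuestion(
    text="If I change this user field, what breaks?",
    source="slack",
    repos={"users-schema", "billing-service"},
)
print(q.is_cross_repo, spans_dependency_chain(q, DEPENDS_ON))  # → True True
```

A question that references several repos but fails the path check may point to an edge your map is still missing. That makes it worth a second look before it goes into the evaluation set.
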

