Knowledge Graph

The knowledge graph is Potpie's core data structure — a structured, navigable, and semantically searchable representation of your entire codebase.

How the graph is built

The knowledge graph is constructed through three stages: parsing, inference, and indexing.

Parsing

When a repository is added, Potpie performs structural analysis:

File analysis — files without meaningful source code (images, binaries, notebooks) are excluded. Files with recognized source extensions proceed to parsing.
AST querying — each file is parsed into an abstract syntax tree and queried for symbols. Every function, class, method, or interface is recorded with its name and exact location. Each symbol is classified as either a definition or a reference.
Graph construction — definitions become nodes, references become edges. This creates a navigable structural map capturing both what exists and how components depend on each other.
Storage — the repository is marked as ready only when all nodes and relationships are persisted. Subsequent requests for the same commit skip parsing entirely.
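The parsing steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Potpie's parser: it uses Python's standard `ast` module as a stand-in for the real AST queries, handles a single Python source string, and counts only direct calls by name as references. The `build_graph` function and the `SAMPLE` input are hypothetical.

```python
import ast


def build_graph(source: str):
    """Return (nodes, edges) for one Python source string."""
    tree = ast.parse(source)
    nodes = {}     # symbol name -> line number of its definition
    edges = set()  # (caller, callee) pairs

    # Pass 1: definitions become nodes.
    for item in ast.walk(tree):
        if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            nodes[item.name] = item.lineno

    # Pass 2: references (here, direct calls by name) become edges.
    for item in ast.walk(tree):
        if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for inner in ast.walk(item):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    if inner.func.id in nodes:
                        edges.add((item.name, inner.func.id))
    return nodes, edges


# Hypothetical input: two functions, one calling the other.
SAMPLE = '''
def load():
    return 1

def handler():
    return load() + 1
'''

nodes, edges = build_graph(SAMPLE)
print(nodes)
print(edges)
```

Running the definition pass before the reference pass means a call to a function defined later in the file still resolves to a node.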

Inference

After the structural graph is built, Potpie runs inference on each node to generate semantic understanding:

Source resolution — each node's raw source code is fully resolved so it contains complete, self-contained code rather than dangling references.
Cache check — nodes with no code change since the previous run are retrieved from cache. Only new or modified nodes proceed to LLM processing.
LLM processing — each uncached node is sent to an LLM which produces:
- A short description of what the code does
- Tags classifying its role (authentication, database access, UI rendering, state management, etc.)
Embedding generation — semantic embeddings are generated from each description and written back to the graph.
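The cache check can be illustrated with a content hash. This sketch assumes the cache is keyed by a SHA-256 digest of each node's resolved source, which is one plausible design and not necessarily how Potpie keys its cache. The names `content_key`, `infer_all`, and `fake_describe` are hypothetical, and `fake_describe` stands in for the LLM call.

```python
import hashlib


def content_key(source: str) -> str:
    """Derive a cache key from the node's source text (assumed keying scheme)."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def infer_all(nodes, cache, describe):
    """Run `describe` only for nodes whose source is not already cached.

    nodes:    dict of node id -> resolved source code
    cache:    dict of content key -> previously produced result
    describe: callable standing in for the LLM
    """
    results = {}
    llm_calls = 0
    for node_id, source in nodes.items():
        key = content_key(source)
        if key not in cache:
            cache[key] = describe(source)
            llm_calls += 1
        results[node_id] = cache[key]
    return results, llm_calls


def fake_describe(source):
    # Placeholder for the LLM: returns a description and tags.
    return {"description": f"{len(source)} chars of code", "tags": []}


cache = {}
first, calls_first = infer_all(
    {"a": "def a(): pass", "b": "def b(): pass"}, cache, fake_describe
)
# Second run: only node "b" changed, so only "b" is reprocessed.
second, calls_second = infer_all(
    {"a": "def a(): pass", "b": "def b(): return 1"}, cache, fake_describe
)
print(calls_first, calls_second)
```

Keying on content rather than on file path or node id means a node that moves without changing still hits the cache.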

Info

Large nodes are split into chunks, processed separately, and merged into a single result before indexing.
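One way to picture the split-and-merge step is shown below. The chunking rule (a fixed number of lines) and the merge rule (concatenate descriptions, union tags) are assumptions made for illustration; the source does not specify how Potpie chunks or merges. All function names are hypothetical, and `fake_describe` again stands in for the LLM.

```python
def split_into_chunks(source: str, max_lines: int):
    """Split source into consecutive chunks of at most `max_lines` lines."""
    lines = source.splitlines()
    return ["\n".join(lines[i:i + max_lines]) for i in range(0, len(lines), max_lines)]


def merge_results(partials):
    """Combine per-chunk results into one result for the whole node."""
    description = " ".join(p["description"] for p in partials)
    tags = sorted({tag for p in partials for tag in p["tags"]})
    return {"description": description, "tags": tags}


def fake_describe(chunk):
    # Placeholder for the LLM call on a single chunk.
    tags = ["db"] if "query" in chunk else ["util"]
    return {"description": f"Covers {len(chunk.splitlines())} lines.", "tags": tags}


source = "\n".join(f"line{i}" for i in range(5))
chunks = split_into_chunks(source, max_lines=2)
merged = merge_results([fake_describe(chunk) for chunk in chunks])
print(len(chunks), merged["tags"])
```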

Indexing

The result is a fully annotated graph where every node is searchable by name, by role, and by meaning.

How agents query the graph

Once the knowledge graph is built, agents can query it through two complementary methods:

Vector search

Converts a natural language question into an embedding and finds nodes whose descriptions are most semantically similar. When the agent is working within a specific part of the codebase, search is scoped to connected nodes rather than the entire graph.
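A minimal sketch of scoped vector search follows. It assumes cosine similarity as the ranking metric and uses tiny hand-written vectors in place of real embeddings. The `vector_search` function, its `scope` and `top_k` parameters, and the node names are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_search(query_vec, embeddings, scope=None, top_k=2):
    """Rank nodes by similarity to the query, optionally limited to `scope`."""
    if scope is None:
        candidates = embeddings
    else:
        candidates = {n: v for n, v in embeddings.items() if n in scope}
    ranked = sorted(
        candidates, key=lambda n: cosine(query_vec, candidates[n]), reverse=True
    )
    return ranked[:top_k]


# Toy embeddings of node descriptions (illustrative values only).
EMBEDDINGS = {
    "login": [0.9, 0.1, 0.0],
    "hash_password": [0.8, 0.2, 0.1],
    "render_chart": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]

whole_graph = vector_search(query, EMBEDDINGS)
scoped = vector_search(query, EMBEDDINGS, scope={"hash_password", "render_chart"}, top_k=1)
print(whole_graph, scoped)
```

Passing a `scope` set of connected node names restricts the candidates before ranking, which mirrors searching a neighborhood rather than the entire graph.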

Structural traversal

Starts from a known node and follows all relationships outward, returning the complete subgraph of everything connected to it as a nested tree.
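The nested-tree result can be sketched with a recursive walk. This example assumes the graph is available as a plain adjacency mapping and that each node appears only once in the tree, at the first place it is reached. The `subgraph_tree` function and the output shape are hypothetical.

```python
def subgraph_tree(graph, start, _seen=None):
    """Return everything reachable from `start` as a nested dict.

    graph: dict of node -> list of directly connected nodes
    A shared `_seen` set prevents cycles and duplicate subtrees.
    """
    seen = _seen if _seen is not None else set()
    seen.add(start)
    children = []
    for neighbor in graph.get(start, []):
        if neighbor not in seen:
            children.append(subgraph_tree(graph, neighbor, seen))
    return {"node": start, "children": children}


# Toy graph: "db" is connected to both "app" and "auth".
GRAPH = {"app": ["auth", "db"], "auth": ["db"], "db": []}

tree = subgraph_tree(GRAPH, "app")
print(tree)
```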

Agents understand code flows through two specialized capabilities:

Entry point detection

Identifies every function that nothing else in the codebase calls — API handlers, CLI commands, event listeners, scheduled jobs. These are the natural starting points of execution and give the agent a map of where behavior originates.
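Expressed over a call graph, entry point detection amounts to finding functions with no incoming call edge. The sketch below assumes the call graph is given as a list of (caller, callee) pairs. The `find_entry_points` function and the sample names are hypothetical.

```python
def find_entry_points(functions, call_edges):
    """Return functions that never appear as a callee, sorted by name."""
    called = {callee for _caller, callee in call_edges}
    return sorted(f for f in functions if f not in called)


FUNCTIONS = {"handle_request", "cli_main", "validate", "save"}
CALL_EDGES = [
    ("handle_request", "validate"),
    ("validate", "save"),
    ("cli_main", "save"),
]

entry_points = find_entry_points(FUNCTIONS, CALL_EDGES)
print(entry_points)
```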

Neighbor traversal

Starts from any function and follows its call graph outward, collecting every function reachable through any number of hops. This gives the agent the full downstream impact surface of any given function.
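Collecting every function reachable through any number of hops is a standard graph reachability problem. The sketch below uses breadth-first search over an adjacency mapping, which is one common choice and not necessarily what Potpie uses. The `reachable_functions` function and the sample call graph are hypothetical.

```python
from collections import deque


def reachable_functions(call_graph, start):
    """Return every function reachable from `start`, in breadth-first order."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        current = queue.popleft()
        for callee in call_graph.get(current, []):
            if callee not in seen:
                seen.add(callee)
                order.append(callee)
                queue.append(callee)
    return order


CALL_GRAPH = {
    "handle_request": ["validate", "save"],
    "validate": ["log"],
    "save": ["log", "commit"],
    "log": [],
}

downstream = reachable_functions(CALL_GRAPH, "handle_request")
print(downstream)
```

The `seen` set ensures that shared callees such as `log` are collected once and that recursive call cycles terminate.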

Architecture

Potpie's context graph operates in two modes:

Local: the daemon runs on your machine and graph state stays local.
Managed: an opt-in cloud backend for team features and shared context.

flowchart TB
  cli["potpie CLI"]

  subgraph local["Local profile"]
    direction TB
    daemon["local daemon"]
    services["service modules"]
    store[(local stores)]
    daemon --> services --> store
  end

  subgraph managed["Managed backend"]
    direction TB
    api["managed backend API"]
    mservices["service modules"]
    hstore[(hosted stores)]
    api --> mservices --> hstore
  end

  cli --> daemon
  cli -.->|login / managed pot| api

Both profiles use the same CLI surface; the --local and --managed flags disambiguate when needed.