Graph Engine
Graph Engine
This note explains how the application builds and reasons about the internal graph of notes (the “wiki graph”). It covers the link-extraction logic, virtual nodes (temporal hierarchy), indexing behavior in the worker thread, interactions with the main indexer, and the utility algorithms used for validation, backlinks, components and statistics.
Key Files
- Implementation: src/main/graphBuilder.ts
- Main indexer / file I/O: src/main/indexer.ts
- Worker indexer: src/main/worker/indexer.ts
- File watcher: src/main/watcher.ts
- Wikilink parsing helpers: src/shared/wikilinks.ts
Data model
WikiGraph: The canonical in-memory representation is a map from filename to an array of outgoing link targets (type Record<string, string[]>). Each key is a filename (usually ending in.md, but virtual nodes also use.md-like IDs such as2026.mdor2026-01.md). SeebuildWikiGraphingraphBuilder.ts.
How links are extracted
- Source of truth: the link extraction logic lives in src/shared/wikilinks.ts. The worker and main indexer both call the same helpers so behavior is consistent between threaded and main-thread indexing.
- Extraction steps (simplified):
- Find all
[[...]]occurrences ignoring transclusions prefixed with!. - Strip aliases and headings (i.e.
file|aliasandfile#headingbecomefile). - Normalize
./relative markers by removing them. - If a link has a file extension, treat known media extensions as attachments and skip them (they are not added to the graph). If the extension is absent or ambiguous (e.g.
example.com) the code appends.mdso the graph uses page-like targets. - For nested paths (e.g.
People/John.md) the extractor emits explicit parent targets as additional links (e.g.People.md). This guarantees that hierarchical navigation is represented in the graph.
- Find all
See extractWikilinks for exact behavior, including the media extension blacklist.
Implicit path links
getImplicitPathLinks(filename)(insrc/shared/wikilinks.ts) returns an array of parent-level implicit links for any nested path. For exampleProjects/ProjA/notes.mdproducesProjects.mdandProjects/ProjA.md. This function is used both when parsing explicit wikilinks and when building the outgoing-links for a file itself (so a file at a nested path implicitly links to its parents).
Virtual temporal nodes (daily notes -> months -> years)
- The system recognizes daily notes by filename format YYYY-MM-DD.md via
isDailyNoteandextractDateHierarchyinsrc/main/graphBuilder.ts(and duplicated logic in the worker). generateVirtualTemporalNodes(files: string[])produces a small subgraph containing year nodes (e.g.2026.md) which point to month nodes (e.g.2026-01.md) which in turn point at the daily notes (e.g.2026-01-13.md). These are virtual nodes — they are created only in the graph and do not require backing files on disk. The virtual graph is merged into the main graph during indexing.
Worker indexing flow
- The worker entrypoint is src/main/worker/indexer.ts. The worker receives an array of
{ filename, content }objects viaparentPortand processes them in parallel. - For each file the worker:
- Calls
extractWikilinks(content)to get explicit wikilinks. - Calls
getImplicitPathLinks(filename)to add implicit parent links for the file’s own path. - Deduplicates links (preserves first-seen insertion order) by creating a Set and writing the result to
graph[filename]. - Extracts tasks using a regex-based
extractTaskshelper and emits a flat task array. - Indexes the file into MiniSearch (if available) with fields
title,content, andtags.
- Calls
- After all files are processed the worker calls
generateVirtualTemporalNodes(...)and merges those nodes into the graph. Finally the worker posts agraph-completemessage with{ graph, tasks }back to the main thread.
Notes:
- MiniSearch is dynamically imported in the worker to support multiple module contexts; search initialization occurs in
initSearch()and documents are added viasearchEngine.add(...). - The worker intentionally ignores read errors for robustness and logs counts on completion.
Main-thread indexing and I/O
- The main indexer lives in src/main/indexer.ts. It is responsible for reading files from disk, handling vault encryption (via
isEncryptionEnabled,getActiveMasterKey, anddecryptBuffer), and delegating heavy work to the worker thread vianew Worker(...). - File reading is done via
readMarkdownFile(filePath, vaultPath). If the vault is encrypted and a master key is available the file buffer is decrypted; on decryption failure the code falls back to treating the buffer as plaintext. getFilesRecursively(dir)is used to collect.mdfiles in the vault for full indexing.- The indexer also manages a separate prediction model for text completion — it computes token / n-gram / casing statistics per file and merges them globally. That behavior is separate from graph construction but coexists in the same module.
Graph construction API (graphBuilder.ts)
buildWikiGraph(fileContents: Record<string,string>, files?: string[])is a reusable helper that builds a graph from an in-memory map of filename→content. It:- Calls
extractWikilinks(content)for each file. - Calls
getImplicitPathLinks(filename)for the file’s own path. - Adds virtual temporal nodes when a
fileslist is provided by callinggenerateVirtualTemporalNodes(files). - Returns the raw
WikiGraph(a mapping of filename → outgoing-link list).
- Calls
validateGraph(graph, existingFiles)filters outgoing links to only those targets that exist (based on a provided set). This is used to remove broken links from views that expect only reachable targets.
Graph utility algorithms
getBacklinks(graph, targetFile)— linear scan over graph keys to find files that includetargetFilein their outgoing list. Returns a sorted array.getConnectedComponent(graph, targetFile)— BFS-like traversal that follows forward edges and also looks up backwards links (viagetBacklinks) to find the entire connected component containingtargetFile.findIsolatedFiles(graph)— returns files with no outgoing links and no backlinks.detectCycles(graph)— DFS that tracks a recursion stack to find back-edges, collects cycles (deduplicated by stringifying arrays). Implemented ingraphBuilder.ts.getGraphStats(graph)— computes total files, total links, average links per file, isolated-count, cycle-count and top-most-linked files by computing backlink counts.
Complexity notes:
- Most utilities scan the graph (O(N + E)) where N is node count and E is edge count.
getBacklinksis O(N + E) per invocation; the code uses it in places likegetConnectedComponentwhich makes the component traversal effectively O(N + E * f) where f depends on how many times backlinks are recomputed. For typical vault sizes this is acceptable; if scaling becomes a problem a precomputed reverse index could be added.
Task extraction
- The worker’s
extractTasksuses a regex to find checklist items (- [ ],- [/],- [x]) and maps these totodo|doing|done. It attempts to parse due dates and completion timestamps from several syntaxes (@due(YYYY-MM-DD),📅 YYYY-MM-DD,DEADLINE: <YYYY-MM-DD>, and✓ YYYY-MM-DD HH:MM:SS). Extracted tasks are included in thegraph-completepayload.
Watcher integration
- src/main/watcher.ts uses
chokidarto observe the vault. It debounces rapid FS events and ignores internal saves (viamarkInternalSave()and a small grace window) to avoid re-indexing changes the app itself wrote. - On changes the watcher sends
vault:file-changed,vault:file-added, andvault:file-deletedIPC messages to the renderer and calls registered callbacks (the main indexer registers callbacks to perform targeted reindexing for the affected file).
Normalization and edge cases
- Attachments and media links are ignored for graph edges (the attachment extension blacklist is in
extractWikilinks). - Transclusions prefixed with
![[...]]are skipped — they are treated as embedded content rather than navigational links. - Links that look like domain names (contain a dot) are treated as page names and
.mdis appended unless the extension is a known media extension. - Virtual nodes are only added where the file list is provided;
buildWikiGraphmerges them without overwriting existing keys (so a real file named2026.mdwill take precedence over a virtual year node).
Search indexing
- The worker dynamically loads MiniSearch and builds an in-memory index using the fields
title,content, andtags. Search requests are handled by the worker and return results with short content snippets.
Example: what a small graph looks like
Suppose the vault contains:
2026-01-13.mdwith content linking to[[ProjectX]]and[[People/John.md]]ProjectX.mdwith no outgoing links
The worker will produce a graph roughly like:
2026-01-13.md→ [ProjectX.md,People/John.md,People.md]ProjectX.md→ []People/John.md→ [People.md]2026-01.md→ [2026-01-13.md, …] (virtual month node)2026.md→ [2026-01.md, …] (virtual year node)
After validateGraph(graph, existingFiles) where existingFiles is the set of real filenames on disk, the validated graph will contain only links present in existingFiles (virtual nodes remain because they are created with the file list input).
Where to change behavior
- To adjust how wikilinks are parsed (e.g. change attachment handling or alias rules): edit src/shared/wikilinks.ts.
- To change virtual node format (for example different naming for year/month): edit
generateVirtualTemporalNodesin src/main/graphBuilder.ts (and the corresponding worker copy at src/main/worker/indexer.ts). - To alter task extraction rules: edit
extractTasksin src/main/worker/indexer.ts.
Implementation notes & rationale
- Parsing and indexing are split: the worker handles CPU-bound parsing and indexing (MiniSearch), while the main process handles I/O, decryption and coalescing updates into the worker. This keeps the main UI thread responsive.
- The system intentionally emits implicit parent links for nested paths so hierarchical navigation works without requiring duplicate explicit links in content.
- Virtual temporal nodes are a convenience layer; because they are generated from the file list they can be re-generated when the set of files changes and don’t require on-disk artifacts.