Graphify - mjkonieczny's coniferous forest

> A tool that builds a [[Knowledge graph]] from a folder of mixed inputs (code, docs, audio, transcripts, images) via a three-pass pipeline. [^1] ^definition

## Structure

Structurally, the graph is a [[Confidence-weighted labelled property graph]]:

![[Confidence-weighted labelled property graph#^definition]]

Every edge carries the confidence tier assigned during extraction (`EXTRACTED` / `INFERRED` / `AMBIGUOUS`), and edges are typed with relation verb phrases such as:

- `calls`
- `imports`
- `rationale_for`
- `semantically_similar_to`

Groupings of three or more nodes that cannot be captured by pairwise edges are stored separately as [[Hyperedge|hyperedges]] in `G.graph["hyperedges"]`.

## Incremental updates for documents

A new or edited document triggers re-extraction of only that file. A SHA256 content cache skips every file whose content has not changed. A Claude subagent reads the changed file as part of a small batch and emits new nodes and typed edges, and the result is merged into the existing graph.

The subagent does not see the existing graph during extraction, and there is no embedding-based lookup over the corpus. A newly added doc is therefore linked to existing nodes only by:

- Explicit name references that match existing node labels
- Other files in the same extraction batch, which the subagent reads together

It will *not* discover semantic similarity to docs added in earlier runs unless they end up in the same batch on a future full rebuild. Restoring cross-corpus semantic similarity requires a full rebuild (`graphify .` without `--update`), which pays the LLM extraction cost for the whole corpus again.[^2]

[^1]: safishamsi/graphify - README and how-it-works. https://github.com/safishamsi/graphify. Accessed 2026-05-25.
[^2]: safishamsi/graphify - README, `docs/how-it-works.md` (SHA256 cache section), and `CHANGELOG.md` (v0.6.3 fixes for `--update` manifest persistence and code-rebuild node preservation). https://github.com/safishamsi/graphify. Accessed 2026-05-25.
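The incremental-update path for documents can be sketched as a toy model in Python. This is a minimal sketch under stated assumptions, not graphify's code. `Graph`, `extract_batch`, `update`, the `references` relation, the `[[Label]]` reference syntax, the word-overlap similarity rule, the batch size of 2, and the policy of dropping edges whose target label is unknown are all hypothetical. `extract_batch` stands in for the Claude subagent and, like it, sees only the files in its own batch. As a result, similarity edges can only form inside a batch, while explicit name references can still reach nodes from earlier runs.

```python
import hashlib
import re
from dataclasses import dataclass, field

# Hypothetical toy model of an incremental update; not graphify's API.
# Assumed reference syntax: [[Label]] names another node explicitly.
REF = re.compile(r"\[\[([^\]]+)\]\]")


def sha256(text: str) -> str:
    """Content hash used as the cache key for a file."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class Graph:
    nodes: set = field(default_factory=set)
    # Each edge: (source, relation, target, confidence tier)
    edges: set = field(default_factory=set)
    # path -> SHA256 of the content that was last extracted
    cache: dict = field(default_factory=dict)


def extract_batch(batch: dict) -> tuple:
    """Stand-in for the LLM subagent: it sees ONLY the files in this batch."""
    nodes = set(batch)
    edges = set()
    for path, text in batch.items():
        for target in REF.findall(text):
            # Hypothetical relation name for an explicit name reference.
            edges.add((path, "references", target, "EXTRACTED"))
    # Crude stand-in for semantic similarity: word overlap (Jaccard >= 0.5),
    # computed only between files that share this batch.
    words = {p: set(t.lower().split()) for p, t in batch.items()}
    paths = sorted(batch)
    for i, a in enumerate(paths):
        for b in paths[i + 1:]:
            union = words[a] | words[b]
            if union and len(words[a] & words[b]) / len(union) >= 0.5:
                edges.add((a, "semantically_similar_to", b, "INFERRED"))
    return nodes, edges


def update(graph: Graph, files: dict, batch_size: int = 2) -> list:
    """Re-extract only files whose SHA256 changed, then merge into the graph."""
    changed = [p for p in sorted(files) if graph.cache.get(p) != sha256(files[p])]
    for start in range(0, len(changed), batch_size):
        batch = {p: files[p] for p in changed[start:start + batch_size]}
        nodes, edges = extract_batch(batch)
        # Drop stale edges previously emitted from the re-extracted files.
        graph.edges = {e for e in graph.edges if e[0] not in batch}
        graph.nodes |= nodes
        # Assumed merge policy: keep an edge only if its target label exists.
        graph.edges |= {e for e in edges if e[2] in graph.nodes}
        for p in batch:
            graph.cache[p] = sha256(files[p])
    return changed
```

For example, calling `update` on two similar files `alpha` and `beta` puts them in one batch, so they get both a `references` edge and a `semantically_similar_to` edge. Calling it again with unchanged files re-extracts nothing, because every hash matches the cache. A later file `gamma` with the same words plus `[[alpha]]` is linked to `alpha` only by its explicit reference: the earlier files are not in its batch, so no similarity edge appears.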