Incremental Indexing
Overview
sqry index is incremental by default. It reads a persistent hash index from .sqry-cache/, hashes every source file in the workspace, and reparses only the files whose hash changed since the last run. Unchanged files reuse the previously computed AST nodes and edges. The result: the second invocation of sqry index . on a clean tree completes in tens of milliseconds, even on large workspaces.
The hash index lives at .sqry-cache/file_hashes.bin. It is a tiny binary file (a few hundred bytes for most projects) and is safe to commit or to delete — sqry rebuilds it on the next index run.
How it works
                              ┌───────────────────┐
sqry index . ──▶ hash files ──┤   hash matches?   │── yes ──▶ reuse cached nodes/edges
                              └────────┬──────────┘
                                       │ no
                                       ▼
                                 reparse file
                                       │
                                       ▼
                             commit nodes + edges
                      update .sqry-cache/file_hashes.bin
On every run sqry:
- Loads .sqry-cache/file_hashes.bin if it exists.
- Hashes every workspace file in parallel.
- Reparses only files whose hash changed (or files that don’t yet have a cached hash).
- Commits the new nodes and edges into the existing graph snapshot.
- Writes the updated hash table back to .sqry-cache/file_hashes.bin.
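The steps above come down to comparing each file's current hash against a stored table. The following is a rough sketch of that idea only. It assumes a plain-text manifest in sha256sum format rather than sqry's binary file_hashes.bin layout, which this page does not describe, and the helper names changed_files and write_manifest are hypothetical.

```shell
# Illustrative only: not sqry's implementation or on-disk format.
# Assumes GNU sha256sum and paths without whitespace.

# changed_files MANIFEST FILE...
# Prints every FILE whose SHA-256 differs from the hash recorded in
# MANIFEST, including files that have no recorded hash yet.
changed_files() {
  local manifest="$1"; shift
  local f sum old
  for f in "$@"; do
    sum=$(sha256sum "$f" | cut -d' ' -f1)
    # Look up the previously stored hash; empty if the manifest or entry is missing.
    old=$(awk -v p="$f" '$2 == p { print $1 }' "$manifest" 2>/dev/null)
    [ "$sum" = "$old" ] || printf '%s\n' "$f"
  done
}

# write_manifest MANIFEST FILE...
# Records the current hash of every FILE, replacing the old table.
write_manifest() {
  local manifest="$1"; shift
  sha256sum "$@" > "$manifest"
}
```

On a first run every file is reported because no manifest exists; after write_manifest, only files whose contents changed are reported.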
The on-disk graph snapshot at .sqry/graph/snapshot.sqry is updated atomically. Concurrent reads (CLI/LSP/MCP queries) see a consistent view at all times.
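This page does not say how the atomic update is implemented. A common way to get that property on POSIX filesystems is to write to a temporary file in the same directory and rename it over the target, so readers see either the old file or the new one. The sketch below shows that general pattern with a hypothetical helper, atomic_write; it is not taken from sqry.

```shell
# Illustrative pattern only; sqry's actual mechanism is not documented here.

# atomic_write TARGET
# Reads stdin into a temp file next to TARGET, then renames it into place.
# rename(2) within one filesystem replaces the target in a single step.
atomic_write() {
  local target="$1" tmp
  tmp=$(mktemp "${target}.tmp.XXXXXX") || return 1
  if cat > "$tmp" && mv -f "$tmp" "$target"; then
    return 0
  fi
  # On failure (e.g. disk full), discard the partial temp file; TARGET is untouched.
  rm -f "$tmp"
  return 1
}
```

Keeping the temporary file in the target's directory matters: a rename across filesystems degrades to copy-and-delete and is no longer atomic.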
Forcing a full rebuild
Two ways to bypass the incremental path:
sqry index --force . # Same hash table, but reparse every file
sqry index --no-incremental . # Skip the hash table entirely (debug / forensic mode)
| Flag | Effect |
|---|---|
| --force (-f) | Reparse every file but still update the hash table. Use after a major sqry upgrade or when the snapshot version bumps. |
| --no-incremental | Disable the hash index entirely; sqry parses every file and does not write .sqry-cache/file_hashes.bin. Useful for debugging metadata-only evaluation paths. |
| --add-to-gitignore | Auto-append .sqry-index/ to .gitignore so cached state never lands in commits. |
Custom cache directory
By default, the hash index lives at <workspace>/.sqry-cache/. Override the location with --cache-dir:
sqry index . --cache-dir /tmp/sqry-cache
This is most useful in:
- Read-only or sandboxed source trees — point the cache at a writable scratch directory while keeping the project read-only.
- Ephemeral CI runners — write the cache to a host-mounted volume so it survives container teardown, then mount it back in on the next CI job for free incrementality.
- Multi-checkout workflows — share one cache across two worktrees of the same repo to avoid double-indexing.
The --cache-dir path is created if it does not exist. Relative paths are resolved against the current working directory.
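For the CI case, a wrapper script can choose the cache location before calling sqry. The sketch below is one possible shape, not a sqry feature: CI_CACHE_ROOT is a hypothetical variable that your CI setup would point at a host-mounted volume, and pick_cache_dir is a hypothetical helper.

```shell
# Illustrative wrapper logic; CI_CACHE_ROOT is a made-up variable, not read by sqry.

# pick_cache_dir WORKSPACE
# Prints the directory to pass to --cache-dir: a subdirectory of
# CI_CACHE_ROOT when that is a writable directory, else the default
# location inside the workspace.
pick_cache_dir() {
  local workspace="$1"
  if [ -n "${CI_CACHE_ROOT:-}" ] && [ -d "$CI_CACHE_ROOT" ] && [ -w "$CI_CACHE_ROOT" ]; then
    printf '%s/sqry-cache\n' "$CI_CACHE_ROOT"
  else
    printf '%s/.sqry-cache\n' "$workspace"
  fi
}

# Usage (requires sqry on PATH):
#   sqry index . --cache-dir "$(pick_cache_dir "$PWD")"
```

Falling back to the workspace default keeps local runs working when the volume is not mounted.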
Metrics export
sqry index --status prints metadata about the existing index — age, symbol count, languages, validation health. Combine it with --metrics-format for machine-readable output:
# JSON (default)
sqry index --status --json
# Prometheus / OpenMetrics text
sqry index --status --json --metrics-format prometheus
The Prometheus output is OpenMetrics-compatible and exports the following metrics:
| Metric | Type | Description |
|---|---|---|
| sqry_index_age_seconds | gauge | Seconds since the snapshot was last written |
| sqry_index_node_count | gauge | Total nodes in the snapshot |
| sqry_index_edge_count | gauge | Total edges in the snapshot |
| sqry_index_file_count | gauge | Total files indexed |
| sqry_index_validation_total | counter | Files inspected by the validation pass |
| sqry_index_validation_missing | counter | Files that disappeared between index time and now |
| sqry_index_validation_modified | counter | Files modified since index time |
Pipe the output directly into a Prometheus push gateway, or scrape it from a CI job:
sqry index --status --json --metrics-format prometheus \
| curl --data-binary @- "http://pushgateway:9091/metrics/job/sqry/instance/$(hostname)"
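In a CI job it can also be handy to read a single value out of the Prometheus text, for example to fail a build when the index is too old. The sketch below assumes the output follows the standard Prometheus text exposition format with one "name value" sample per line and no whitespace inside label sets. This page does not show the exact output, so treat the parsing as an assumption; metric_value is a hypothetical helper.

```shell
# Illustrative parser for Prometheus text format; not part of sqry.

# metric_value NAME
# Reads Prometheus text on stdin and prints the value of the first
# sample named NAME. A label set such as {lang="rust"} is ignored.
metric_value() {
  awk -v name="$1" '
    /^#/ { next }   # skip HELP and TYPE comment lines
    {
      i = index($1, "{")
      key = (i > 0) ? substr($1, 1, i - 1) : $1
      if (key == name) { print $2; exit }
    }
  '
}

# Usage (requires sqry on PATH):
#   age=$(sqry index --status --json --metrics-format prometheus \
#         | metric_value sqry_index_age_seconds)
```

If the metric is absent, the function prints nothing, so callers should check for an empty result before comparing numbers.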
Validation modes
Independently of the cache, --validate <mode> controls how strict sqry is about source-file drift detected during a query:
| Mode | Behaviour |
|---|---|
| warn (default) | Log a warning on drift, return results from the snapshot. |
| fail | Exit with code 2 if more than 20% of indexed files are missing on disk. |
| off | Skip validation entirely (fastest). |
sqry search "test" --validate fail # CI-friendly strict mode
sqry search "test" --validate off # Hot-path performance mode
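The fail threshold is strict: "more than 20%" means a workspace with exactly 20% of its indexed files missing still passes. The sketch below spells out that arithmetic with integers. It is a reading of the table above, not sqry's code; how sqry counts files or handles an empty index is not stated here, and drift_exit_code is a hypothetical helper.

```shell
# Illustrative arithmetic only; mirrors the documented rule for --validate fail.

# drift_exit_code MISSING TOTAL
# Prints 2 when more than 20% of TOTAL indexed files are missing, else 0.
# An empty index (TOTAL = 0) is treated as passing, which is an assumption.
drift_exit_code() {
  local missing="$1" total="$2"
  # Compare missing/total > 20/100 without floating point.
  if [ "$total" -gt 0 ] && [ $(( missing * 100 )) -gt $(( total * 20 )) ]; then
    echo 2
  else
    echo 0
  fi
}
```

For example, 20 missing out of 100 yields 0, while 21 missing out of 100 yields 2.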
Inside the daemon
When sqry runs as a daemon (see Daemon (sqryd)), the file-system watcher debounces events over debounce_ms (default 2000 ms) and triggers sqryd’s incremental reindex path automatically. You don’t need to call sqry index by hand — saving a file in your editor is enough. The hash-index machinery is the same in both paths.
Troubleshooting
- “Snapshot version mismatch”: A major sqry upgrade bumped the snapshot format. Run sqry index --force . once to rewrite the snapshot in the new format. The hash cache survives across version bumps; the snapshot does not.
- Stale results after editing files outside the editor: If you edit files via a tool sqry’s daemon watcher doesn’t see (e.g. git checkout of a different branch), run sqry index . to refresh, or call sqry daemon rebuild <path> to refresh a daemon-loaded workspace.
- Hash index corrupt: run rm -rf .sqry-cache && sqry index . to delete the cache and rebuild it from scratch.
- Cache dir on a slow filesystem (NFS, shared SMB): set --cache-dir /tmp/sqry-cache to keep the hash index on local disk.
- Disk full during an index run: sqry writes the snapshot atomically, so a partial write is rolled back. Free space and rerun.
Related
- Daemon (sqryd) — keep the graph warm in memory; integrates with the same hash-index machinery.
- Configuration — environment variables that influence cache and indexing throughput.
- Performance — measured benchmarks for cold and warm index runs.