sqry v4.6.3 introduces natural language queries via sqry ask, switches the intent classifier to all-MiniLM-L6-v2, and validates performance at Linux kernel scale.
sqry ask translates plain English into safe, validated sqry commands:
```bash
sqry ask "find authentication functions in rust"
# → sqry query "name~=/auth/ AND kind:function" --language rust
sqry ask "who calls the login function"
# → sqry graph direct-callers "login"
sqry ask "trace from main to database"
# → sqry graph trace-path "main" "database"
```
The translation pipeline has six stages: preprocess (Unicode normalization, homoglyph detection) → extract entities → classify intent → assemble command → validate safety → cache. Every generated command is checked against a whitelist and is rejected if it contains shell metacharacters, path traversal, or write operations.
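To make the safety stage concrete, here is a minimal Python sketch of the kind of check described above. It is not sqry's actual validator: the `is_safe` function, the `ALLOWED_SUBCOMMANDS` set, and the exact metacharacter list are assumptions made for illustration, with the allowed subcommands taken only from the examples shown earlier.

```python
import re
import shlex

# Hypothetical allowlist: the read-only subcommands seen in the examples above.
ALLOWED_SUBCOMMANDS = {"query", "graph"}
# Characters a shell would interpret specially (assumed list, not sqry's).
SHELL_METACHARACTERS = re.compile(r"[;&|`$<>\\\n]")


def is_safe(command: str) -> bool:
    """Return True only if the generated command passes every check."""
    if SHELL_METACHARACTERS.search(command):
        return False
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes: refuse rather than guess.
        return False
    if len(tokens) < 2 or tokens[0] != "sqry":
        return False
    # Anything outside the allowlist (including write operations) is refused.
    if tokens[1] not in ALLOWED_SUBCOMMANDS:
        return False
    # Reject path traversal in any argument.
    return not any(".." in token for token in tokens[2:])
```

An allowlist is the conservative choice here: a new subcommand stays blocked until someone explicitly marks it as safe.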
A 4-tier confidence system controls execution:
| Tier | Confidence | Behavior |
|---|---|---|
| Execute | >= 85% | Shows the command, asks to run it |
| Confirm | 65–84% | Confirmation prompt with the command |
| Disambiguate | < 65% | Multiple options to choose from |
| Reject | n/a | Input failed validation; shows suggestions |
Use --auto-execute to skip confirmation for high-confidence translations, or --dry-run to see the generated command without running it.
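The tier table can be read as a simple threshold function. The sketch below is illustrative only: `Tier` and `select_tier` are hypothetical names, and it assumes confidence is a fraction in [0, 1] and that the Confirm band covers everything from 65% up to, but not including, 85%.

```python
from enum import Enum


class Tier(Enum):
    EXECUTE = "execute"
    CONFIRM = "confirm"
    DISAMBIGUATE = "disambiguate"
    REJECT = "reject"


def select_tier(confidence: float, passed_validation: bool) -> Tier:
    """Map a classifier confidence in [0, 1] to a tier from the table."""
    # Reject does not depend on confidence: it means validation failed.
    if not passed_validation:
        return Tier.REJECT
    if confidence >= 0.85:
        return Tier.EXECUTE
    if confidence >= 0.65:
        return Tier.CONFIRM
    return Tier.DISAMBIGUATE
```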
Natural language queries are also available through the sqry_ask MCP tool (for Claude, Codex, Gemini) and the sqry/ask LSP endpoint.
The intent classifier has been switched from DistilBERT to all-MiniLM-L6-v2 (22M parameters). Key metrics:
| Metric | Value |
|---|---|
| Base model | sentence-transformers/all-MiniLM-L6-v2 |
| ONNX INT8 size | 57 MB |
| Accuracy | 99.75% |
| P50 latency | 2.1 ms |
| P90 latency | 3.0 ms |
| Calibrated ECE | 0.0006 |
The classifier is feature-gated. Without ONNX Runtime, sqry falls back to a rule-based classifier that achieves >=70% accuracy with zero external dependencies.
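For readers unfamiliar with the "Calibrated ECE" row in the table above: expected calibration error measures how closely a model's stated confidence matches its observed accuracy, which matters here because the confidence tiers depend on it. The sketch below uses the standard equal-width binning definition. The bin count and the function name are assumptions; the source does not say how sqry computes the figure.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE = sum over bins of (bin size / N) * |accuracy - mean confidence|."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Equal-width bins over [0, 1]; a confidence of 1.0 goes in the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / total * abs(accuracy - mean_conf)
    return ece
```

A value near zero means that predictions made at, say, 90% confidence are correct about 90% of the time.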
sqry v4.6.x has been validated against the Linux kernel source tree:
| Metric | Value |
|---|---|
| Codebase | ~28M LOC, 63,074 C files |
| Index time | 1m48s (24-core machine) |
| Nodes indexed | 11,205,544 |
| Edges resolved | 18,292,255 |
| Snapshot size | 1.8 GB |
| Caller query latency | ~85 ms (100 results) |
Tested scenarios include syscall-to-disk call path tracing, cross-subsystem cycle detection, blast-radius analysis for kfree and copy_from_user, and dead code detection in drivers/staging/.
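As a rough illustration of what a blast-radius query computes, the sketch below walks a reverse call graph breadth-first to collect every direct and transitive caller of a function. It is a toy model under stated assumptions, not sqry's implementation: the `blast_radius` function and the caller names in the usage example are hypothetical.

```python
from collections import deque


def blast_radius(callers: dict[str, list[str]], target: str) -> set[str]:
    """Return every function that can reach `target` through calls.

    `callers` maps a function name to the functions that call it directly.
    """
    seen: set[str] = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, []):
            # Skip visited nodes so cycles in the call graph terminate.
            if caller not in seen and caller != target:
                seen.add(caller)
                queue.append(caller)
    return seen


# Usage with made-up caller names:
graph = {
    "kfree": ["release_buf"],
    "release_buf": ["driver_close", "driver_reset"],
    "driver_reset": ["driver_close"],
}
affected = blast_radius(graph, "kfree")
```

Each node is visited at most once, so the traversal cost grows with the number of callers reached rather than with the size of the whole graph.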