Status & Roadmap

Production readiness gates, current evaluation results, and the sequenced roadmap for the Hyperliquid module.

Production readiness

Tier 0: Search Advisory — READY

All 13 gates pass (evaluated 2026-02-22, 24-query benchmark):

Gate	Threshold	Observed	Status
ANN-inline execution success	>= 99%	100.0%	Pass
ANN-inline SQL alignment	>= 95%	100.0%	Pass
Retrieval top-1 hit rate	>= 85%	91.1%	Pass
Retrieval NDCG@K	>= 90%	95.3%	Pass
ANN recall@256	>= 45%	47.7%	Pass
Retrieval risk level	<= medium	medium	Pass
Router eval@2	>= 90%	91.5%	Pass
Invalid action rate	<= 3%	0.0%	Pass
Penalty rate	<= 5%	3.1%	Pass
BQ integrity (baseline)	pass	pass	Pass
BQ integrity (ANN-inline)	pass	pass	Pass
Execution success rate	>= 95%	100.0%	Pass
Data provenance	implemented	yes	Pass
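
The numeric rows of the gate table can be checked mechanically. The following is a minimal sketch, not the module's actual evaluation code: the `Gate` class and `tier_ready` helper are hypothetical names, and only the nine percentage gates are encoded (the categorical gates such as risk level, BQ integrity, and data provenance are left out).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    """One numeric readiness gate (hypothetical structure, values in percent)."""
    name: str
    op: str           # ">=" for floors, "<=" for ceilings
    threshold: float
    observed: float

    def passes(self) -> bool:
        if self.op == ">=":
            return self.observed >= self.threshold
        if self.op == "<=":
            return self.observed <= self.threshold
        raise ValueError(f"unsupported operator: {self.op}")


# Thresholds and observed values copied from the table above.
GATES = [
    Gate("ANN-inline execution success", ">=", 99.0, 100.0),
    Gate("ANN-inline SQL alignment", ">=", 95.0, 100.0),
    Gate("Retrieval top-1 hit rate", ">=", 85.0, 91.1),
    Gate("Retrieval NDCG@K", ">=", 90.0, 95.3),
    Gate("ANN recall@256", ">=", 45.0, 47.7),
    Gate("Router eval@2", ">=", 90.0, 91.5),
    Gate("Invalid action rate", "<=", 3.0, 0.0),
    Gate("Penalty rate", "<=", 5.0, 3.1),
    Gate("Execution success rate", ">=", 95.0, 100.0),
]


def tier_ready(gates: list[Gate]) -> bool:
    """A tier is ready only if every gate passes."""
    return all(g.passes() for g in gates)


if __name__ == "__main__":
    for g in GATES:
        print(f"{g.name}: {'Pass' if g.passes() else 'Fail'}")
    print("READY" if tier_ready(GATES) else "NOT READY")
```

Requiring every gate to pass means a single regression, for example ANN recall@256 dropping below 45%, is enough to flip the tier out of READY.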

Capability: Semantic trader search, cohort discovery, explanation payloads.

Tier 1: Copytrade Assist — READY

Adds allocation plan proposals and preview-only execution (no live commit). All Tier 0 gates apply plus 7 control-plane tests (all pass):

Preview accepts valid plan
Preview rejects disallowed coin
Preview rejects weight violations
Commit requires preview OK
Kill switch blocks commit
Idempotency replay
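
The behaviours named in the list above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the real control plane: `ControlPlane`, `ALLOWED_COINS`, `MAX_WEIGHT`, and the result dictionaries are all hypothetical, and the replay-before-kill-switch ordering is one possible design choice.

```python
from dataclasses import dataclass, field

ALLOWED_COINS = {"BTC", "ETH"}  # hypothetical allow-list
MAX_WEIGHT = 0.5                # hypothetical per-coin weight cap


@dataclass
class ControlPlane:
    kill_switch: bool = False
    _previews: dict = field(default_factory=dict)  # plan_id -> preview ok?
    _commits: dict = field(default_factory=dict)   # idempotency key -> result

    def preview(self, plan_id: str, weights: dict) -> dict:
        """Validate an allocation plan without committing anything."""
        errors = []
        for coin, weight in weights.items():
            if coin not in ALLOWED_COINS:
                errors.append(f"disallowed coin: {coin}")
            if not 0 < weight <= MAX_WEIGHT:
                errors.append(f"weight out of range for {coin}: {weight}")
        if sum(weights.values()) > 1.0 + 1e-9:
            errors.append("weights sum above 1.0")
        ok = not errors
        self._previews[plan_id] = ok
        return {"ok": ok, "errors": errors}

    def commit(self, plan_id: str, idempotency_key: str) -> dict:
        """Commit a previewed plan; replays return the stored result."""
        if idempotency_key in self._commits:
            return self._commits[idempotency_key]
        if self.kill_switch:
            return {"status": "blocked", "reason": "kill switch"}
        if not self._previews.get(plan_id, False):
            return {"status": "rejected", "reason": "preview required"}
        result = {"status": "committed", "plan_id": plan_id}
        self._commits[idempotency_key] = result
        return result


if __name__ == "__main__":
    cp = ControlPlane()
    print(cp.preview("p1", {"BTC": 0.4, "ETH": 0.3}))
    print(cp.commit("p1", "key-1"))
    print(cp.commit("p1", "key-1"))  # replay returns the same stored result
```

Only successful commits are stored under the idempotency key in this sketch, so a blocked or rejected attempt can be retried with the same key once the condition clears.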

Tier 2: Copytrade Execute — PROVISIONAL

Adds live commit with risk policy enforcement. The simulation harness passes, but no live exchange integration has been tested. The /copytrade/execute/commit endpoint should remain restricted until:

Authenticated order path dry-runs against Hyperliquid endpoint
Error/retry semantics under venue failures validated
Reconciliation against fill + position state tested
Policy violation handling + rollback tested

What works end-to-end

Intent Family	Template	Status	Evidence
risk_regime	risk_regime_stress_scoring_daily	Working	Sandbox t002: success (4/5)
whale_ranking	whale_ranking_by_position_daily	Working	Sandbox t011: partial (needs chaining)
screening	composite_trader_screening_daily	Working	Sandbox t007: success (4/5)
anomaly	anomaly_zscore_detection_daily	Working	Sandbox t009: success (4/5)
counterfactual	counterfactual_stop_policy_bucket_proxy	Working	Sandbox t006: success (4/5)
similarity	similarity_profile_scoring_daily	Partial	Sandbox t003: partial (address validation)
copy_lag	copy_lag_pairwise_corr_daily	Broken	Sandbox t004, t012, t015: blocked
help	(internal)	Working	Sandbox t018: success

Known blockers

Single-address copytrade (critical)

The copy_lag_pairwise_corr_daily SQL self-join returns 0 rows when the shortlist contains only 1 address, so followers of a single whale cannot be discovered automatically.

Workaround: Provide both --leader and --follower addresses explicitly.
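
The failure mode is generic to pairwise self-joins and can be reproduced in isolation. The sketch below uses an in-memory SQLite table; the `shortlist` table, the join condition, and the placeholder addresses are assumptions for illustration, not the actual copy_lag_pairwise_corr_daily query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shortlist (address TEXT PRIMARY KEY)")

# Hypothetical pairing query: every ordered (leader, follower) pair of
# distinct addresses in the shortlist.
PAIR_SQL = """
SELECT a.address AS leader, b.address AS follower
FROM shortlist AS a
JOIN shortlist AS b ON a.address <> b.address
"""


def count_pairs(addresses: list[str]) -> int:
    """Load the shortlist and count the pairs the self-join produces."""
    conn.execute("DELETE FROM shortlist")
    conn.executemany(
        "INSERT INTO shortlist VALUES (?)", [(a,) for a in addresses]
    )
    return len(conn.execute(PAIR_SQL).fetchall())


if __name__ == "__main__":
    print(count_pairs(["0xaaa"]))           # single address: no pairs
    print(count_pairs(["0xaaa", "0xbbb"]))  # two addresses: both orderings
```

With one address there is no second row to pair against, so the join is empty regardless of the data behind it. Supplying both addresses, as in the workaround, gives the join something to match.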

Multi-coin query execution

When mentions_multi_coin=True, the pipeline returns coin=None, template=None instead of iterating per coin.

Workaround: Run separate single-coin queries and compare manually.
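
The manual workaround amounts to a loop over coins. A minimal sketch follows, assuming a caller-supplied `run_single` function that executes one single-coin query; both `run_per_coin` and `run_single` are hypothetical names, not part of the pipeline.

```python
from typing import Any, Callable


def run_per_coin(
    coins: list[str], run_single: Callable[[str], Any]
) -> dict[str, Any]:
    """Run one single-coin query per coin and collect results by coin."""
    results: dict[str, Any] = {}
    for coin in coins:
        # Each call is an independent single-coin query; comparing the
        # per-coin results is still left to the caller.
        results[coin] = run_single(coin)
    return results


if __name__ == "__main__":
    # Stand-in runner for demonstration only.
    fake_runner = lambda coin: {"coin": coin, "rows": len(coin)}
    print(run_per_coin(["BTC", "ETH"], fake_runner))
```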

Intent classification (regex_v1)

The classifier is a regex-based cascade. It works for 6 of the 8 intent families but misroutes on:

Negation ("NOT high risk" → should be screening, not risk_regime)
Multi-intent ("find whales then check for copy-trading")
Ambiguous risk/screening boundary
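
The first two failure modes follow directly from first-match-wins regex routing. The sketch below is a toy cascade, not regex_v1 itself: the patterns and rule order in `RULES` are invented for illustration, and only the intent family names come from this page.

```python
import re

# Hypothetical rules, tried in order; the first matching pattern wins.
RULES = [
    ("risk_regime", re.compile(r"\brisk\b", re.I)),
    ("whale_ranking", re.compile(r"\bwhales?\b", re.I)),
    ("copy_lag", re.compile(r"\bcopy[- ]?trad", re.I)),
    ("screening", re.compile(r"\b(screen|filter|find)\b", re.I)),
]


def classify(query: str) -> str:
    """Return the intent of the first rule whose pattern matches."""
    for intent, pattern in RULES:
        if pattern.search(query):
            return intent
    return "help"  # fallback when nothing matches


if __name__ == "__main__":
    # Negation: the keyword "risk" matches even though it is negated.
    print(classify("traders that are NOT high risk"))
    # Multi-intent: only the first matching intent is returned.
    print(classify("find whales then check for copy-trading"))
```

In this toy version the negated query is routed to risk_regime because the pattern sees only the keyword, and the multi-intent query yields whale_ranking alone because the cascade stops at the first match.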

Status: Ownership transferred to RL team via REQ-006.

Competitive landscape

Surveyed 12 Hyperliquid-related tools (March 2026):

Capability	Any Competitor?	HLQ
Natural language trader query	None	Yes
Behavioral embeddings / similarity search	None	Yes
Semantic trader profiling API	None	Yes
Copy-trading detection	Copin, Hyperdash, Hyperbot	Yes
Whale tracking	CoinGlass, HyperTracker	Yes
Anomaly detection	None	Yes
Counterfactual analysis	None	Yes
Per-result provenance	None	Yes

Among the 12 tools surveyed, none offers natural language trader query or behavioral embeddings for Hyperliquid, so HLQ is currently uncontested in those two capabilities.

Roadmap

Near-term (weeks)

Freeze the harness contract — stabilize replay-pack schema, action names, result handles, working-set semantics, and terminal outputs.
Frontier-operated harness baseline — run the Hyperliquid harness with Codex OAuth as the primary planner/operator and establish reference traces.
Fix critical live-path gaps — single-address copytrade and multi-coin execution remain important if the legacy one-shot path stays exposed.

Medium-term (months)

Targeted small models — add bounded specialists for action hints, keep/drop/prune, abstain calibration, and verifier/reranker checks.
Monitoring & alerts — build alerting and change detection on top of the same harness and provenance contracts.
Remote surfaces on one contract — keep CLI, MCP, and web surfaces aligned around the same replayable harness semantics.

Long-term (quarters)

Broader learned search policy — only after the harness contract is stable and measurable.
Richer bridge views — expand the bridge DSL carefully without letting the environment quietly do the reasoning.
Cross-domain investigation — broaden beyond Hyperliquid only if provenance, replayability, and bounded state remain intact.

Document	Location
Agentic Search Proposal	`~/work/AGENTIC_SEARCH_PROPOSAL.md`
MVP Approach Comparison	`~/work/MVP_APPROACH_COMPARISON.md`
Behavioral Encoder Notes	`~/work/BEHAVIORAL_ENCODER_BREAKTHROUGH_NOTES.md`
Policy Delta Roadmap	`~/work/hlq/POLICY_DELTA_ROADMAP.md`
RL Requests	`~/work/hlq/RL_REQUESTS.md`
Dev Notebook	`~/work/hlq/NOTEBOOK.md`
