
Status & Roadmap

Production readiness gates, current evaluation results, and the sequenced roadmap for the Hyperliquid module.

Production readiness

Tier 0: Search Advisory — READY

All 13 gates pass (evaluated 2026-02-22, 24-query benchmark):

| Gate | Threshold | Observed | Status |
| --- | --- | --- | --- |
| ANN-inline execution success | >= 99% | 100.0% | Pass |
| ANN-inline SQL alignment | >= 95% | 100.0% | Pass |
| Retrieval top-1 hit rate | >= 85% | 91.1% | Pass |
| Retrieval NDCG@K | >= 90% | 95.3% | Pass |
| ANN recall@256 | >= 45% | 47.7% | Pass |
| Retrieval risk level | <= medium | medium | Pass |
| Router eval@2 | >= 90% | 91.5% | Pass |
| Invalid action rate | <= 3% | 0.0% | Pass |
| Penalty rate | <= 5% | 3.1% | Pass |
| BQ integrity (baseline) | pass | pass | Pass |
| BQ integrity (ANN-inline) | pass | pass | Pass |
| Execution success rate | >= 95% | 100.0% | Pass |
| Data provenance | implemented | yes | Pass |
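
The numeric gates above reduce to a comparison of an observed value against a threshold. The following is a minimal sketch of that check, not the module's actual evaluator: the `Gate` class, `gate_passes`, and `tier_ready` are hypothetical names introduced here, and only a few of the 13 gates are reproduced.

```python
from dataclasses import dataclass
import operator

# Hypothetical representation of one numeric gate (values in percent).
@dataclass(frozen=True)
class Gate:
    name: str
    op: str           # ">=" for floors, "<=" for ceilings
    threshold: float
    observed: float

_OPS = {">=": operator.ge, "<=": operator.le}

def gate_passes(gate: Gate) -> bool:
    """Return True when the observed value satisfies the gate's threshold."""
    return _OPS[gate.op](gate.observed, gate.threshold)

def tier_ready(gates) -> bool:
    """A tier is ready only if every one of its gates passes."""
    return all(gate_passes(g) for g in gates)

# A subset of the Tier 0 table, copied from the rows above.
GATES = [
    Gate("Retrieval top-1 hit rate", ">=", 85.0, 91.1),
    Gate("ANN recall@256", ">=", 45.0, 47.7),
    Gate("Invalid action rate", "<=", 3.0, 0.0),
    Gate("Penalty rate", "<=", 5.0, 3.1),
]

if __name__ == "__main__":
    for g in GATES:
        print(f"{g.name}: {'Pass' if gate_passes(g) else 'Fail'}")
```

ANN recall@256 clears its floor by the smallest margin of the gates shown (47.7% against 45%), so it is the first one a regression would trip in a check like this.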

Capability: Semantic trader search, cohort discovery, explanation payloads.

Tier 1: Copytrade Assist — READY

Adds allocation plan proposals and preview-only execution (no live commit). All Tier 0 gates apply plus 7 control-plane tests (all pass):

  • Preview accepts valid plan
  • Preview rejects disallowed coin
  • Preview rejects weight violations
  • Commit requires preview OK
  • Kill switch blocks commit
  • Idempotency replay
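
The behaviors named in the list above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the module's control plane: the class name, method signatures, reason strings, and the `max_weight` limit are all hypothetical, and `commit` only records a simulated result without sending any order.

```python
class CopytradeControlPlane:
    """Hypothetical in-memory model of the preview/commit checks."""

    def __init__(self, allowed_coins, max_weight=0.5):
        self.allowed_coins = set(allowed_coins)
        self.max_weight = max_weight      # assumed per-coin weight cap
        self.kill_switch = False
        self._previewed = set()           # plan ids whose preview returned OK
        self._commits = {}                # idempotency key -> stored result

    def preview(self, plan_id, weights):
        """Validate an allocation plan given as {coin: weight}."""
        if any(coin not in self.allowed_coins for coin in weights):
            return {"ok": False, "reason": "disallowed_coin"}
        per_coin_bad = any(w < 0 or w > self.max_weight for w in weights.values())
        if per_coin_bad or sum(weights.values()) > 1.0 + 1e-9:
            return {"ok": False, "reason": "weight_violation"}
        self._previewed.add(plan_id)
        return {"ok": True, "reason": None}

    def commit(self, plan_id, idempotency_key):
        """Simulated commit: no order is placed, only a result is recorded."""
        if self.kill_switch:
            return {"ok": False, "reason": "kill_switch"}
        if idempotency_key in self._commits:
            # Replay: return the stored result instead of committing again.
            return self._commits[idempotency_key]
        if plan_id not in self._previewed:
            return {"ok": False, "reason": "preview_required"}
        result = {"ok": True, "reason": None, "plan_id": plan_id}
        self._commits[idempotency_key] = result
        return result
```

In this sketch the kill switch is checked before the idempotency cache, so a replayed key is also refused while the switch is on. Whether the real control plane orders the checks this way is not stated in the source.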

Tier 2: Copytrade Execute — PROVISIONAL

Adds live commit with risk policy enforcement. The simulation harness passes, but no live exchange integration has been tested. The /copytrade/execute/commit endpoint should remain restricted until all of the following are done:

  1. The authenticated order path has been dry-run against the Hyperliquid endpoint
  2. Error and retry semantics under venue failures have been validated
  3. Reconciliation against fill and position state has been tested
  4. Policy violation handling and rollback have been tested

What works end-to-end

| Intent Family | Template | Status | Evidence |
| --- | --- | --- | --- |
| risk_regime | risk_regime_stress_scoring_daily | Working | Sandbox t002: success (4/5) |
| whale_ranking | whale_ranking_by_position_daily | Working | Sandbox t011: partial (needs chaining) |
| screening | composite_trader_screening_daily | Working | Sandbox t007: success (4/5) |
| anomaly | anomaly_zscore_detection_daily | Working | Sandbox t009: success (4/5) |
| counterfactual | counterfactual_stop_policy_bucket_proxy | Working | Sandbox t006: success (4/5) |
| similarity | similarity_profile_scoring_daily | Partial | Sandbox t003: partial (address validation) |
| copy_lag | copy_lag_pairwise_corr_daily | Broken | Sandbox t004, t012, t015: blocked |
| help | (internal) | Working | Sandbox t018: success |

Known blockers

Single-address copytrade (critical)

The copy_lag_pairwise_corr_daily SQL self-join returns 0 rows when the shortlist contains only one address, so the followers of a single whale cannot be discovered automatically.

Workaround: Provide both --leader and --follower addresses explicitly.
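
The failure mode is generic to pairwise self-joins and can be reproduced with a toy query. The sketch below uses Python's built-in `sqlite3` module. The table name `shortlist_daily`, its columns, and the join condition are assumptions made for illustration and are not the real copy_lag_pairwise_corr_daily SQL.

```python
import sqlite3

# Hypothetical pairwise self-join: every (leader, follower) pair of distinct
# addresses that share at least one day of data.
PAIR_SQL = """
SELECT a.address AS leader, b.address AS follower, COUNT(*) AS shared_days
FROM shortlist_daily AS a
JOIN shortlist_daily AS b
  ON a.day = b.day AND a.address <> b.address
GROUP BY a.address, b.address
"""

def pair_rows(rows):
    """Load rows into a fresh in-memory table and run the pairwise join."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE shortlist_daily (address TEXT, day TEXT, net_flow REAL)"
        )
        conn.executemany("INSERT INTO shortlist_daily VALUES (?, ?, ?)", rows)
        return conn.execute(PAIR_SQL).fetchall()
    finally:
        conn.close()

# Shortlist with a single address: no distinct pair exists.
one_address = [("0xwhale", "2026-02-20", 1.0), ("0xwhale", "2026-02-21", -2.0)]

# Shortlist with leader and follower supplied explicitly, as in the workaround.
two_addresses = one_address + [
    ("0xfollower", "2026-02-20", 0.5),
    ("0xfollower", "2026-02-21", -1.0),
]

if __name__ == "__main__":
    print(len(pair_rows(one_address)))    # no pairs to join
    print(len(pair_rows(two_addresses)))  # both orderings of the pair
```

With only one address, the `a.address <> b.address` condition can never hold, so the join is empty regardless of how much history that address has. Discovering followers automatically would require joining the single address against a wider candidate population rather than against the shortlist itself.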

Multi-coin query execution

When mentions_multi_coin=True, the pipeline returns coin=None and template=None instead of iterating over each coin.

Workaround: Run separate single-coin queries and compare manually.
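
The manual workaround amounts to a loop over coins around a single-coin run. A minimal sketch follows, assuming a callable `run_single_coin(query, coin)` that wraps whatever single-coin entry point is in use. Both that callable and the `fake_single_coin_run` stand-in are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable

def run_per_coin(
    run_single_coin: Callable[[str, str], Any],
    query: str,
    coins: Iterable[str],
) -> Dict[str, Any]:
    """Run the same query once per coin and collect results keyed by coin."""
    results: Dict[str, Any] = {}
    for coin in coins:
        results[coin] = run_single_coin(query, coin)
    return results

def fake_single_coin_run(query: str, coin: str) -> dict:
    # Stand-in for a real single-coin pipeline invocation (assumption).
    return {"coin": coin, "template": "whale_ranking_by_position_daily", "query": query}

if __name__ == "__main__":
    out = run_per_coin(fake_single_coin_run, "largest whales", ["BTC", "ETH"])
    for coin, result in out.items():
        print(coin, result["template"])
```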

Intent classification (regex_v1)

The classifier is a regex-based cascade. It routes 6 of 8 intent families correctly but misroutes on:

  • Negation ("NOT high risk" → should be screening, not risk_regime)
  • Multi-intent ("find whales then check for copy-trading")
  • Ambiguous risk/screening boundary
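
Two of the misroutes listed above follow directly from how a first-match regex cascade works. The sketch below is illustrative only: the patterns and their order are invented for this example and are not the actual regex_v1 rules.

```python
import re

# Hypothetical cascade: the first pattern that matches decides the intent.
CASCADE = [
    ("risk_regime", re.compile(r"\b(risk|stress|regime)\b", re.IGNORECASE)),
    ("whale_ranking", re.compile(r"\bwhales?\b", re.IGNORECASE)),
    ("copy_lag", re.compile(r"\bcopy[- ]?trad", re.IGNORECASE)),
    ("screening", re.compile(r"\b(screen|filter)\b", re.IGNORECASE)),
]

def classify(query: str) -> str:
    """Return the first matching intent family, falling back to help."""
    for intent, pattern in CASCADE:
        if pattern.search(query):
            return intent
    return "help"

if __name__ == "__main__":
    # Negation: the keyword "risk" matches even though the query excludes it.
    print(classify("traders that are NOT high risk"))
    # Multi-intent: only the first matching family is returned.
    print(classify("find whales then check for copy-trading"))
```

Keyword matching ignores the "NOT", so the negated query lands in risk_regime rather than screening. Because a cascade returns a single label, the copy-trading half of the second query is dropped.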

Status: Ownership transferred to RL team via REQ-006.

Competitive landscape

Surveyed 12 Hyperliquid-related tools (March 2026):

| Capability | Any Competitor? | HLQ |
| --- | --- | --- |
| Natural language trader query | None | Yes |
| Behavioral embeddings / similarity search | None | Yes |
| Semantic trader profiling API | None | Yes |
| Copy-trading detection | Copin, Hyperdash, Hyperbot | Yes |
| Whale tracking | CoinGlass, HyperTracker | Yes |
| Anomaly detection | None | Yes |
| Counterfactual analysis | None | Yes |
| Per-result provenance | None | Yes |

Among the 12 tools surveyed, none offers natural language querying or behavioral embeddings for Hyperliquid, so HLQ's position in those two areas is currently uncontested.

Roadmap

Near-term (weeks)

  • Freeze the harness contract — stabilize replay-pack schema, action names, result handles, working-set semantics, and terminal outputs.
  • Frontier-operated harness baseline — run the Hyperliquid harness with Codex OAuth as the primary planner/operator and establish reference traces.
  • Fix critical live-path gaps — single-address copytrade and multi-coin execution remain important if the legacy one-shot path stays exposed.

Medium-term (months)

  • Targeted small models — add bounded specialists for action hints, keep/drop/prune, abstain calibration, and verifier/reranker checks.
  • Monitoring & alerts — build alerting and change detection on top of the same harness and provenance contracts.
  • Remote surfaces on one contract — keep CLI, MCP, and web surfaces aligned around the same replayable harness semantics.

Long-term (quarters)

  • Broader learned search policy — only after the harness contract is stable and measurable.
  • Richer bridge views — expand the bridge DSL carefully without letting the environment quietly do the reasoning.
  • Cross-domain investigation — broaden beyond Hyperliquid only if provenance, replayability, and bounded state remain intact.

| Document | Location |
| --- | --- |
| Agentic Search Proposal | ~/work/AGENTIC_SEARCH_PROPOSAL.md |
| MVP Approach Comparison | ~/work/MVP_APPROACH_COMPARISON.md |
| Behavioral Encoder Notes | ~/work/BEHAVIORAL_ENCODER_BREAKTHROUGH_NOTES.md |
| Policy Delta Roadmap | ~/work/hlq/POLICY_DELTA_ROADMAP.md |
| RL Requests | ~/work/hlq/RL_REQUESTS.md |
| Dev Notebook | ~/work/hlq/NOTEBOOK.md |