
Architecture

Architecture of the Hyperliquid module — deterministic harness, frontier planner, auxiliary models, replay-grounded outputs.

Current target shape

HLQ is shifting from a one-shot trader search pipeline toward a replayable Hyperliquid search harness.

The new architecture has three layers:

  • a deterministic pi-based harness that defines the environment and action loop
  • a frontier model accessed through Codex OAuth that acts as the heavyweight planner/operator
  • small domain-specific auxiliary models that handle narrow repeated tasks inside the same harness

The harness remains the source of truth for evidence, working-set transitions, provenance, and replay.

High-level flow

User brief
  ↓
Frontier planner/operator (Codex OAuth)
  ↓ chooses bounded actions
Deterministic harness runtime
  ↓
Environment views / result handles / working-set edits
  ↓
Finalize or abstain with retained evidence
  ↓
Trajectory log + provenance

This replaces the old mental model of "query → route once → render once → answer once" as the primary story. The one-shot path can still exist as a narrow execution mode, but the harness is now the main architectural center.

Core layers

1. Deterministic harness

The harness is the execution substrate. It defines:

  • bounded reads over replayable environment views
  • explicit keep/drop/prune working-set semantics
  • stable result handles
  • bounded terminal actions, including finalize and abstain
  • trajectory logs that reconstruct the full episode

This layer is intentionally deterministic and inspectable. For fixed code and fixed replay-pack input, the environment and logs should be reproducible.
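
A minimal sketch of that contract in Python, using hypothetical names (ResultHandle, Action, Episode) rather than the actual hlq types:

```python
# Illustrative sketch of the harness contract; these names are
# hypothetical, not the real src/hlq/search types.
from dataclasses import dataclass, field
from typing import Any

@dataclass(frozen=True)
class ResultHandle:
    """Stable identifier for an environment artifact."""
    handle_id: str
    parent_id: str | None = None

@dataclass(frozen=True)
class Action:
    """Bounded action chosen by the planner."""
    name: str                                   # e.g. "fetch", "keep", "finalize"
    args: dict[str, Any] = field(default_factory=dict)

@dataclass
class Episode:
    """Working set plus step-by-step log, enough to replay the episode."""
    working_set: dict[str, ResultHandle] = field(default_factory=dict)
    trajectory: list[tuple[Action, Any]] = field(default_factory=list)

    def apply(self, action: Action, observation: Any) -> None:
        self.trajectory.append((action, observation))   # every step is logged
        if action.name == "keep" and isinstance(observation, ResultHandle):
            self.working_set[observation.handle_id] = observation
        elif action.name in ("drop", "prune"):
            self.working_set.pop(action.args["handle_id"], None)
```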

2. Frontier planner/operator

A heavyweight frontier model sits above the harness.

Its job is to:

  • interpret the task
  • choose the next harness action
  • decompose the task into shallow branches where needed
  • decide when retained evidence is sufficient
  • package the final retained evidence into a compact answer

The frontier model is the flexible planner. It is not the authority on environment facts.
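
A sketch of the resulting control loop, with planner.propose and env.step as assumed interfaces; the model only proposes actions, while the harness produces every observation:

```python
def run_episode(planner, env, max_steps: int = 32):
    """Drive one bounded episode: the planner proposes, the harness decides."""
    observation = env.reset()
    trajectory = []
    for _ in range(max_steps):
        action = planner.propose(observation)     # frontier-model call
        observation = env.step(action)            # deterministic harness call
        trajectory.append((action, observation))  # replayable record
        if action.name in ("finalize", "abstain"):
            break
    return trajectory
```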

3. Auxiliary domain models

Small specialized models are attached where they clearly improve economics or consistency.

Early candidate roles:

  • action policy hints for standard search states
  • keep/drop/prune policy over the active working set
  • abstain calibration near weak-signal boundaries
  • verifier or reranker passes on candidate retained evidence

These are bounded coprocessors, not separate autonomous agents.
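
As an illustration, a keep/drop hint could look like the sketch below, where KeepDropPolicy and apply_hints are hypothetical names; the specialist scores one narrow decision and the planner keeps final authority:

```python
from typing import Iterable, Protocol

class KeepDropPolicy(Protocol):
    def score(self, handle_id: str, query: str) -> float:
        """Keep-probability for one candidate artifact in the working set."""
        ...

def apply_hints(
    policy: KeepDropPolicy,
    handle_ids: Iterable[str],
    query: str,
    threshold: float = 0.5,
) -> list[str]:
    # Hints only narrow the choice; the planner still makes the final call.
    return [h for h in handle_ids if policy.score(h, query) >= threshold]
```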

Why not start with a parallel search model

HLQ is not starting with a separate independent search planner.

That would create another policy surface to train, evaluate, and debug before the harness contract is stable. The preferred order is:

  1. freeze the deterministic harness
  2. run it with a heavyweight frontier planner/operator
  3. add small specialist models for repeated narrow decisions
  4. only later consider a more independent learned search policy if the stable harness shows it is warranted

Contract surfaces

The main contract surfaces are:

  • replay-pack schema
  • action names and argument shapes
  • result-handle semantics
  • working-set transitions
  • trajectory logs
  • terminal outputs
  • provenance fields on retained evidence

These surfaces let the planning layer improve without making the environment opaque.
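
For illustration, a single trajectory-log step and a terminal record might take shapes like the ones below; the field names are assumptions, since the real schemas are defined by the replay-pack and harness_types contracts:

```python
# Hypothetical record shapes; the authoritative schemas live in the
# replay-pack and trajectory-log contracts.
step_record = {
    "step": 3,
    "action": {"name": "fetch", "args": {"query": "funding-rate anomalies"}},
    "result_handles": ["h-0012", "h-0013"],
    "working_set_delta": {"kept": ["h-0012"], "dropped": ["h-0007"]},
}

terminal_record = {
    "outcome": "finalize",            # or "abstain" with a bounded stop reason
    "retained_evidence": ["h-0012"],
    "provenance": {"replay_pack": "pack-2024-q3", "steps": [1, 2, 3]},
}
```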

Module layout

src/hlq/
├── backend.py              # Backend protocol + LocalBackend + RemoteBackend
├── bridge/
│   ├── loader.py           # Manifest loading, checksum validation
│   ├── action_space.py     # RouteAction loader, intent-based routing
│   ├── ann.py              # Retrieval/artifact loaders where applicable
│   ├── config.py           # Runtime config
│   ├── sql.py              # Template loader + renderer where applicable
│   └── models.py           # Model manifest + fallback utilities
├── search/
│   ├── harness_types.py    # Typed models for actions, observations, terminals
│   ├── harness_state.py    # Working-set and episode state
│   ├── replay_env.py       # Deterministic replay-pack environment
│   ├── graph_replay_env.py # Relational replay path where needed
│   ├── result_handles.py   # Stable handle contracts
│   ├── graph_query_dsl.py  # Bounded bridge DSL surface
│   ├── trajectory.py       # Multi-step trajectory logging
│   └── pipeline.py         # Narrow one-shot/live path where still useful
└── api/
    ├── cli.py              # CLI entry points
    ├── mcp.py              # MCP server
    └── types.py            # API models

Action loop

The planner does not receive one giant monolithic prompt. Instead it interacts with bounded harness actions.

Typical loop:

  1. inspect environment or schema
  2. fetch a small result set or expand neighbors
  3. keep selected evidence in the working set
  4. drop or prune stale evidence
  5. branch into one shallow subquery if needed
  6. finalize or abstain with a bounded stop reason

This keeps environment facts separate from policy reasoning.
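
Reusing the hypothetical Action type from the harness sketch above, one such episode might emit a sequence like this:

```python
# One illustrative episode as the bounded actions the planner emits in
# order; action names and arguments are assumptions, not the real surface.
steps = [
    Action("inspect", {"view": "schema"}),
    Action("fetch", {"query": "open-interest spikes", "limit": 20}),
    Action("keep", {"handle_id": "h-0005"}),
    Action("prune", {"handle_id": "h-0002"}),
    Action("branch", {"subquery": "liquidations near h-0005"}),
    Action("finalize", {"stop_reason": "sufficient_evidence"}),
]
```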

Replay boundary

The harness replay path operates only on deterministic data:

  • reads episode records from replay packs
  • serves bounded environment views by stable artifact or handle ID
  • records working-set transitions step by step
  • emits a structured terminal record plus retained evidence

The live path can remain separate for direct execution against current infrastructure, but replayability is the primary design center for search-policy work.
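
A minimal sketch of that boundary, assuming a pack layout of one JSON file per view; the real replay_env.py contract may differ:

```python
import json
from pathlib import Path

class ReplayEnv:
    """Serves bounded views from a frozen pack; reruns are identical."""

    def __init__(self, pack_dir: Path):
        # Assumed pack layout: one JSON file per view/handle ID.
        self._views = {
            p.stem: json.loads(p.read_text()) for p in pack_dir.glob("*.json")
        }

    def view(self, handle_id: str) -> dict:
        # No live I/O: an unknown handle is an error, never a network call.
        return self._views[handle_id]
```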

Provenance and retained evidence

Every retained artifact should carry lineage back to:

  • the replay pack or source snapshot
  • the query/action that produced the handle
  • the parent handle where relevant
  • the step where the artifact entered the working set

This is what makes the planner auditable.
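
Expressed as a record, that lineage might look like the following hypothetical dataclass, mirroring the fields above:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Provenance:
    """Illustrative lineage record attached to each retained artifact."""
    replay_pack: str           # source replay pack or snapshot ID
    producing_action: str      # query/action that produced the handle
    parent_handle: str | None  # parent handle where relevant
    kept_at_step: int          # step the artifact entered the working set
```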

Economics

The intended economics are:

  • use the frontier model for hard planning and final synthesis
  • use small domain models for bounded repeated decisions
  • keep the deterministic harness as the stable execution substrate

That gives HLQ a clearer path to lower-latency routine decisions without giving up broad reasoning capability.
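
A sketch of that dispatch, with illustrative decision kinds:

```python
# Routine bounded decisions go to cheap specialists; everything else goes
# to the frontier model. The decision kinds are assumptions.
ROUTINE = {"action_hint", "keep_drop_prune", "abstain_calibration", "rerank"}

def choose_model(decision_kind: str) -> str:
    return "specialist" if decision_kind in ROUTINE else "frontier"
```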

Phased implementation

Phase 0: freeze the harness

  • stabilize replay-pack format
  • stabilize action names and terminal schema
  • stabilize trajectory logging and provenance fields

Phase 1: frontier-operated harness

  • run the harness with Codex OAuth as the primary planner/operator
  • establish baseline traces and evals

Phase 2: add targeted specialists

  • action hints
  • keep/drop/prune
  • abstain calibration
  • verifier/reranker

Phase 3: broader search-policy work

  • only after the harness contract is stable and measurable
  • evaluate any more-independent learned policy against the same replay packs and bridge contracts