
Architecture

Architecture of the Hyperliquid module — deterministic harness, frontier planner, auxiliary models, replay-grounded outputs.

Current target shape

HLQ is shifting from a one-shot trader search pipeline toward a replayable Hyperliquid search harness.

The new architecture has three layers:

  • a deterministic pi-based harness that defines the environment and action loop
  • a frontier model accessed through Codex OAuth that acts as the heavyweight planner/operator
  • small domain-specific auxiliary models that handle narrow repeated tasks inside the same harness

The harness remains the source of truth for evidence, working-set transitions, provenance, and replay.

High-level flow

User brief
  ↓
Frontier planner/operator (Codex OAuth)
  ↓ chooses bounded actions
Deterministic harness runtime
  ↓
Environment views / result handles / working-set edits
  ↓
Finalize or abstain with retained evidence
  ↓
Trajectory log + provenance

This replaces the old mental model of "query → route once → render once → answer once" as the primary story. The one-shot path can still exist as a narrow execution mode, but the harness is now the main architectural center.

Core layers

1. Deterministic harness

The harness is the execution substrate. It defines:

  • bounded reads over replayable environment views
  • explicit keep/drop/prune working-set semantics
  • stable result handles
  • bounded terminal actions, including finalize and abstain
  • trajectory logs that reconstruct the full episode

This layer is intentionally deterministic and inspectable. For fixed code and fixed replay-pack input, the environment and logs should be reproducible.
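
A minimal sketch of that contract in Python, using hypothetical names (ResultHandle, Action, Episode) rather than the actual hlq types:

```python
# Illustrative sketch of the harness contract; these names are
# hypothetical, not the real src/hlq/search types.
from dataclasses import dataclass, field
from typing import Any

@dataclass(frozen=True)
class ResultHandle:
    """Stable identifier for an environment artifact."""
    handle_id: str
    parent_id: str | None = None

@dataclass(frozen=True)
class Action:
    """Bounded action chosen by the planner."""
    name: str                                   # e.g. "fetch", "keep", "finalize"
    args: dict[str, Any] = field(default_factory=dict)

@dataclass
class Episode:
    """Working set plus step-by-step log, enough to replay the episode."""
    working_set: dict[str, ResultHandle] = field(default_factory=dict)
    trajectory: list[tuple[Action, Any]] = field(default_factory=list)

    def apply(self, action: Action, observation: Any) -> None:
        self.trajectory.append((action, observation))   # every step is logged
        if action.name == "keep" and isinstance(observation, ResultHandle):
            self.working_set[observation.handle_id] = observation
        elif action.name in ("drop", "prune"):
            self.working_set.pop(action.args["handle_id"], None)
```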

2. Frontier planner/operator

A heavyweight frontier model sits above the harness.

Its job is to:

  • interpret the task
  • choose the next harness action
  • decompose the task into shallow branches where needed
  • decide when retained evidence is sufficient
  • package the final retained evidence into a compact answer

The frontier model is the flexible planner. It is not the authority on environment facts.
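
A sketch of the resulting control loop, with planner.propose and env.step as assumed interfaces; the model only proposes actions, while the harness produces every observation:

```python
def run_episode(planner, env, max_steps: int = 32):
    """Drive one bounded episode: the planner proposes, the harness decides."""
    observation = env.reset()
    trajectory = []
    for _ in range(max_steps):
        action = planner.propose(observation)     # frontier-model call
        observation = env.step(action)            # deterministic harness call
        trajectory.append((action, observation))  # replayable record
        if action.name in ("finalize", "abstain"):
            break
    return trajectory
```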

3. Auxiliary domain models

Small specialized models are attached where they clearly improve economics or consistency.

Early candidate roles:

  • action policy hints for standard search states
  • keep/drop/prune policy over the active working set
  • abstain calibration near weak-signal boundaries
  • verifier or reranker passes on candidate retained evidence

These are bounded coprocessors, not separate autonomous agents.
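
As an illustration, a keep/drop hint could look like the sketch below, where KeepDropPolicy and apply_hints are hypothetical names; the specialist scores one narrow decision and the planner keeps final authority:

```python
from typing import Iterable, Protocol

class KeepDropPolicy(Protocol):
    def score(self, handle_id: str, query: str) -> float:
        """Keep-probability for one candidate artifact in the working set."""
        ...

def apply_hints(
    policy: KeepDropPolicy,
    handle_ids: Iterable[str],
    query: str,
    threshold: float = 0.5,
) -> list[str]:
    # Hints only narrow the choice; the planner still makes the final call.
    return [h for h in handle_ids if policy.score(h, query) >= threshold]
```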

Why not start with a parallel search model

HLQ is not starting with a separate independent search planner.

That would create another policy surface to train, evaluate, and debug before the harness contract is stable. The preferred order is:

  1. freeze the deterministic harness
  2. run it with a heavyweight frontier planner/operator
  3. add small specialist models for repeated narrow decisions
  4. only later consider a more independent learned search policy if the stable harness shows it is warranted

Contract surfaces

The main contract surfaces are:

  • replay-pack schema
  • action names and argument shapes
  • result-handle semantics
  • working-set transitions
  • trajectory logs
  • terminal outputs
  • provenance fields on retained evidence

These surfaces let the planning layer improve without making the environment opaque.
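
For illustration, a single trajectory-log step and a terminal record might take shapes like the ones below; the field names are assumptions, since the real schemas are defined by the replay-pack and harness_types contracts:

```python
# Hypothetical record shapes; the authoritative schemas live in the
# replay-pack and trajectory-log contracts.
step_record = {
    "step": 3,
    "action": {"name": "fetch", "args": {"query": "funding-rate anomalies"}},
    "result_handles": ["h-0012", "h-0013"],
    "working_set_delta": {"kept": ["h-0012"], "dropped": ["h-0007"]},
}

terminal_record = {
    "outcome": "finalize",            # or "abstain" with a bounded stop reason
    "retained_evidence": ["h-0012"],
    "provenance": {"replay_pack": "pack-2024-q3", "steps": [1, 2, 3]},
}
```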

Module layout

src/hlq/
├── backend.py              # Backend protocol + LocalBackend + RemoteBackend
├── bridge/
│   ├── loader.py           # Manifest loading, checksum validation
│   ├── action_space.py     # RouteAction loader, intent-based routing
│   ├── ann.py              # Retrieval/artifact loaders where applicable
│   ├── config.py           # Runtime config
│   ├── sql.py              # Template loader + renderer where applicable
│   └── models.py           # Model manifest + fallback utilities
├── search/
│   ├── harness_types.py    # Typed models for actions, observations, terminals
│   ├── harness_state.py    # Working-set and episode state
│   ├── replay_env.py       # Deterministic replay-pack environment
│   ├── graph_replay_env.py # Relational replay path where needed
│   ├── result_handles.py   # Stable handle contracts
│   ├── graph_query_dsl.py  # Bounded bridge DSL surface
│   ├── trajectory.py       # Multi-step trajectory logging
│   └── pipeline.py         # Narrow one-shot/live path where still useful
└── api/
    ├── cli.py              # CLI entry points
    ├── mcp.py              # MCP server
    └── types.py            # API models

Action loop

The planner does not receive one giant monolithic prompt. Instead it interacts with bounded harness actions.

Typical loop:

  1. inspect environment or schema
  2. fetch a small result set or expand neighbors
  3. keep selected evidence in the working set
  4. drop or prune stale evidence
  5. branch into one shallow subquery if needed
  6. finalize or abstain with a bounded stop reason

This keeps environment facts separate from policy reasoning.
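
Reusing the hypothetical Action type from the harness sketch above, one such episode might emit a sequence like this:

```python
# One illustrative episode as the bounded actions the planner emits in
# order; action names and arguments are assumptions, not the real surface.
steps = [
    Action("inspect", {"view": "schema"}),
    Action("fetch", {"query": "open-interest spikes", "limit": 20}),
    Action("keep", {"handle_id": "h-0005"}),
    Action("prune", {"handle_id": "h-0002"}),
    Action("branch", {"subquery": "liquidations near h-0005"}),
    Action("finalize", {"stop_reason": "sufficient_evidence"}),
]
```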

Replay boundary

The harness replay path operates only on deterministic data:

  • reads episode records from replay packs
  • serves bounded environment views by stable artifact or handle ID
  • records working-set transitions step by step
  • emits a structured terminal record plus retained evidence

The live path can remain separate for direct execution against current infrastructure, but replayability is the primary design center for search-policy work.
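
A minimal sketch of that boundary, assuming a pack layout of one JSON file per view; the real replay_env.py contract may differ:

```python
import json
from pathlib import Path

class ReplayEnv:
    """Serves bounded views from a frozen pack; reruns are identical."""

    def __init__(self, pack_dir: Path):
        # Assumed pack layout: one JSON file per view/handle ID.
        self._views = {
            p.stem: json.loads(p.read_text()) for p in pack_dir.glob("*.json")
        }

    def view(self, handle_id: str) -> dict:
        # No live I/O: an unknown handle is an error, never a network call.
        return self._views[handle_id]
```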

Provenance and retained evidence

Every retained artifact should carry lineage back to:

  • the replay pack or source snapshot
  • the query/action that produced the handle
  • the parent handle where relevant
  • the step where the artifact entered the working set

This is what makes the planner auditable.
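
Expressed as a record, that lineage might look like the following hypothetical dataclass, mirroring the fields above:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Provenance:
    """Illustrative lineage record attached to each retained artifact."""
    replay_pack: str           # source replay pack or snapshot ID
    producing_action: str      # query/action that produced the handle
    parent_handle: str | None  # parent handle where relevant
    kept_at_step: int          # step the artifact entered the working set
```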

Economics

The intended economics are:

  • use the frontier model for hard planning and final synthesis
  • use small domain models for bounded repeated decisions
  • keep the deterministic harness as the stable execution substrate

That gives HLQ a clearer path to lower-latency routine decisions without giving up broad reasoning capability.
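
A sketch of that dispatch, with illustrative decision kinds:

```python
# Routine bounded decisions go to cheap specialists; everything else goes
# to the frontier model. The decision kinds are assumptions.
ROUTINE = {"action_hint", "keep_drop_prune", "abstain_calibration", "rerank"}

def choose_model(decision_kind: str) -> str:
    return "specialist" if decision_kind in ROUTINE else "frontier"
```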

Phased implementation

Phase 0: freeze the harness

  • stabilize replay-pack format
  • stabilize action names and terminal schema
  • stabilize trajectory logging and provenance fields

Phase 1: frontier-operated harness

  • run the harness with Codex OAuth as the primary planner/operator
  • establish baseline traces and evals

Phase 2: add targeted specialists

  • action hints
  • keep/drop/prune
  • abstain calibration
  • verifier/reranker

Phase 3: broader search-policy work

  • only after the harness contract is stable and measurable
  • evaluate any more-independent learned policy against the same replay packs and bridge contracts