Search Harness

Phase-0 contract for the Hyperliquid search harness — bounded reads, working-set edits, terminal decisions, replayable runs.

Status

This document freezes the Phase 0 contract for the Hyperliquid search harness v1.

It now reflects the preferred operating shape:

a pi-based deterministic harness is the main runtime contract
a frontier model accessed through Codex OAuth acts as the heavyweight planner/operator
small domain-specific auxiliary models may support bounded repeated tasks inside the harness
the harness, not hidden prompt state, remains the source of truth

Purpose

The v1 harness turns the current one-shot search flow into a bounded search environment where a policy can:

inspect deterministic evidence in multiple steps
keep or drop evidence from an explicit working set
branch into shallow subqueries
stop with either a bounded decision or an abstention

The environment is external to the prompt. The planner inspects slices of that environment through explicit actions rather than receiving one large preassembled bundle.

In the preferred operating mode, the frontier planner handles the broad reasoning while the harness enforces bounded reads, explicit state transitions, and replayable terminal outputs.

How This Differs From The Current Pipeline

Current one-shot search:

accepts one natural-language query
routes once
renders one SQL template
executes once against live data
returns one ranked result payload plus provenance

Search harness v1:

accepts one bounded monitoring or research query tied to a replay episode
runs a stepwise action loop over deterministic replay-pack views
maintains an explicit active working set separate from the full environment
records keep/drop/prune decisions in the trajectory
terminates with a structured decision object rather than a single one-shot result

In short, the current pipeline is a direct query executor. The harness is a replayable search environment that a heavyweight planner operates and that small specialist policies can later support for bounded subproblems.

Scope

The v1 harness is intentionally narrow:

Hyperliquid monitoring and research queries only
deterministic replay-pack inputs only
shallow branching only
bounded action set only
no broad web search
no arbitrary-depth recursion
no requirement to touch live BigQuery in the replay path

Core Concepts

Environment state

The environment is the full replayable episode state available to the runtime but not automatically injected into the active prompt.

Required episode-level fields:

episode_id
query
anchor_market
window_id
step_budget
token_budget_class
environment_views
ground_truth_reference

Artifact registry

Each retrievable view is represented as an artifact with a stable ID inside the episode.

Required artifact fields:

artifact_id
artifact_type
view_name
anchor_market
window_id
payload
source_refs

artifact_id stability matters for deterministic replay. For a fixed replay pack and fixed loader version, the same artifact must resolve to the same ID on every run.
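A minimal sketch of an episode-level registry keyed by artifact_id. The dataclass fields mirror the required artifact fields above. The ArtifactRegistry class and its collision check are illustrative assumptions, not part of the frozen contract.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Artifact:
    """One retrievable view; field names follow the required artifact fields."""
    artifact_id: str
    artifact_type: str
    view_name: str
    anchor_market: str
    window_id: str
    payload: dict[str, Any] = field(default_factory=dict)
    source_refs: tuple[str, ...] = ()


class ArtifactRegistry:
    """Hypothetical per-episode registry (assumption, not a contract type)."""

    def __init__(self) -> None:
        self._by_id: dict[str, Artifact] = {}

    def register(self, artifact: Artifact) -> None:
        # Re-registering an identical artifact is a no-op; a different
        # artifact under the same ID would break deterministic replay.
        existing = self._by_id.get(artifact.artifact_id)
        if existing is not None and existing != artifact:
            raise ValueError(f"artifact_id collision: {artifact.artifact_id}")
        self._by_id[artifact.artifact_id] = artifact

    def get(self, artifact_id: str) -> Artifact:
        return self._by_id[artifact_id]
```

Rejecting a second, different artifact under an existing ID is one way to surface instability early instead of silently overwriting evidence.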

Working set

The working set is the active evidence context selected by the policy.

Required working-set fields:

active_artifact_ids
active_artifact_summaries
retrieval_history
branch_history
step_budget_remaining
context_pressure_class

Working-set semantics are explicit:

reading an artifact does not automatically keep it
keep_artifact adds an artifact to the active working set
drop_artifact removes it from the active working set only
dropped artifacts remain available in the episode registry for later revisit
prune_working_set is a bulk drop operation and must remain trajectory-visible
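The semantics above can be sketched as a small state holder. The class name WorkingSet, the record_read helper, and the shape of the trajectory entries are assumptions made for illustration. Only the action names keep_artifact, drop_artifact, and prune_working_set come from this contract.

```python
class WorkingSet:
    """Illustrative working set; every mutation is logged to the trajectory."""

    def __init__(self, registry_ids):
        self.registry_ids = set(registry_ids)   # everything in the episode
        self.active_artifact_ids = []           # what the policy has kept
        self.retrieval_history = []
        self.trajectory = []

    def _require_known(self, artifact_id):
        if artifact_id not in self.registry_ids:
            raise KeyError(f"unknown artifact: {artifact_id}")

    def record_read(self, artifact_id):
        # Reading is logged but does not keep the artifact.
        self._require_known(artifact_id)
        self.retrieval_history.append(artifact_id)

    def keep_artifact(self, artifact_id):
        self._require_known(artifact_id)
        if artifact_id not in self.active_artifact_ids:
            self.active_artifact_ids.append(artifact_id)
        self.trajectory.append({"action": "keep_artifact", "artifact_id": artifact_id})

    def drop_artifact(self, artifact_id):
        # Removes from the active set only; the registry is untouched.
        if artifact_id in self.active_artifact_ids:
            self.active_artifact_ids.remove(artifact_id)
        self.trajectory.append({"action": "drop_artifact", "artifact_id": artifact_id})

    def prune_working_set(self, artifact_ids, reason):
        ids = list(artifact_ids)
        self.active_artifact_ids = [a for a in self.active_artifact_ids if a not in ids]
        # One trajectory entry for the whole bulk drop keeps it visible.
        self.trajectory.append(
            {"action": "prune_working_set", "artifact_ids": ids, "reason": reason}
        )
```

Because drops never touch registry_ids, a dropped artifact can be kept again later in the same episode.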

Frozen v1 Action Set

Action names below are the Phase 0 contract and should not be renamed without a deliberate contract update.

Read actions

Action	Arguments	Returns
`read_market_state`	`anchor_market`, `window_id`	Raw market snapshot artifact for one market-window pair
`read_derived_metrics`	`anchor_market`, `window_id`	Normalized or derived metrics artifact for the same scope
`read_persistence`	`anchor_market`, `window_id`	Persistence or follow-through artifact for the same scope
`compare_markets`	`anchor_market`, `peer_markets`, `window_id`	Comparison artifact for ranking or prioritization
`retrieve_similar_prior_episodes`	`anchor_market`, `pattern_id_or_query`	Ranked shortlist artifact of prior episodes

Context-management actions

Action	Arguments	Contract
`keep_artifact`	`artifact_id`	Adds the artifact to the active working set
`drop_artifact`	`artifact_id`	Removes the artifact from the active working set
`prune_working_set`	`artifact_ids`, `reason`	Removes multiple artifacts in one explicit step

Control actions

Action	Arguments	Contract
`branch_subquery`	`subquery_type`, `arguments`	Materializes a bounded child retrieval turn and returns artifacts into the same episode registry
`finalize`	`decision_class`, `retained_artifact_ids`, `stop_reason`, `open_risks`	Completes the episode with a bounded decision
`abstain`	`retained_artifact_ids`, `stop_reason`, `open_risks`	Completes the episode with an explicit no-decision outcome

decision_class is frozen to:

finalize_signal
finalize_low_signal

abstain remains a separate terminal action rather than a decision_class value.
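One way to enforce the frozen action set is a lookup from action name to argument names, as sketched below. The tables above list the arguments but do not say whether each is mandatory or whether extra arguments are tolerated. This sketch assumes every listed argument is required and nothing else is accepted. validate_action is a hypothetical helper name.

```python
# Action names and argument names copied from the frozen v1 tables.
ACTION_ARGUMENTS = {
    "read_market_state": ("anchor_market", "window_id"),
    "read_derived_metrics": ("anchor_market", "window_id"),
    "read_persistence": ("anchor_market", "window_id"),
    "compare_markets": ("anchor_market", "peer_markets", "window_id"),
    "retrieve_similar_prior_episodes": ("anchor_market", "pattern_id_or_query"),
    "keep_artifact": ("artifact_id",),
    "drop_artifact": ("artifact_id",),
    "prune_working_set": ("artifact_ids", "reason"),
    "branch_subquery": ("subquery_type", "arguments"),
    "finalize": ("decision_class", "retained_artifact_ids", "stop_reason", "open_risks"),
    "abstain": ("retained_artifact_ids", "stop_reason", "open_risks"),
}
DECISION_CLASSES = ("finalize_signal", "finalize_low_signal")


def validate_action(name, arguments):
    """Raise ValueError if the call does not match the frozen action set."""
    if name not in ACTION_ARGUMENTS:
        raise ValueError(f"unknown action: {name}")
    expected = set(ACTION_ARGUMENTS[name])
    missing = sorted(expected - set(arguments))
    extra = sorted(set(arguments) - expected)
    # Assumption: listed arguments are all required, and no extras are allowed.
    if missing or extra:
        raise ValueError(f"{name}: missing={missing} unexpected={extra}")
    if name == "finalize" and arguments["decision_class"] not in DECISION_CLASSES:
        raise ValueError("decision_class must be finalize_signal or finalize_low_signal")
```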

Branching Contract

v1 allows shallow explicit decomposition only.

branch_subquery must:

name the branch via subquery_type
carry structured arguments
point back to the parent step in the trajectory
register any produced artifacts in the same episode registry

Examples of subquery_type values:

derived_metrics
persistence_check
peer_comparison
prior_episode_lookup

Arbitrary recursive model self-calls are out of scope for v1.
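The branching requirements can be sketched as a record that carries its parent step and a depth guard. The contract says "shallow" without giving a number, so MAX_BRANCH_DEPTH = 1 below is purely an assumption. BranchRecord and open_branch are likewise hypothetical names.

```python
from dataclasses import dataclass

MAX_BRANCH_DEPTH = 1  # assumption: the contract only says "shallow"


@dataclass(frozen=True)
class BranchRecord:
    subquery_type: str          # names the branch
    arguments: dict             # structured arguments
    parent_step: int            # index of the parent step in the trajectory
    depth: int


def open_branch(subquery_type, arguments, parent_step, parent_depth=0):
    """Create a branch record, refusing unnamed or too-deep branches."""
    if not subquery_type:
        raise ValueError("branch_subquery must name the branch via subquery_type")
    if not isinstance(arguments, dict):
        raise TypeError("branch arguments must be structured (a dict here)")
    depth = parent_depth + 1
    if depth > MAX_BRANCH_DEPTH:
        raise ValueError(f"branch depth {depth} exceeds limit {MAX_BRANCH_DEPTH}")
    return BranchRecord(subquery_type, arguments, parent_step, depth)
```

Artifacts produced by the child turn would then be registered in the same episode registry as the parent's, per the last requirement above.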

Terminal Output Contract

Every harness run must end in exactly one terminal record.

Required terminal fields:

episode_id
terminal_action
decision_class
retained_artifact_ids
retained_evidence
open_risks
stop_reason

Terminal rules:

terminal_action=finalize requires decision_class in finalize_signal | finalize_low_signal
terminal_action=abstain requires decision_class=null
retained_evidence is a materialized view of the final retained artifacts
stop_reason must be bounded and schema-valid, not a long free-form essay
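The terminal rules lend themselves to a small validator. The sketch below checks field presence and the terminal_action/decision_class pairing. The 280-character cap on stop_reason is an illustrative assumption, since the contract requires a bound but does not state one. validate_terminal is a hypothetical helper name.

```python
TERMINAL_FIELDS = (
    "episode_id", "terminal_action", "decision_class", "retained_artifact_ids",
    "retained_evidence", "open_risks", "stop_reason",
)
DECISION_CLASSES = ("finalize_signal", "finalize_low_signal")
MAX_STOP_REASON_CHARS = 280  # assumption: the contract only says "bounded"


def validate_terminal(record):
    """Return a list of rule violations; an empty list means the record passes."""
    errors = [f"missing field: {name}" for name in TERMINAL_FIELDS if name not in record]
    if errors:
        return errors

    action = record["terminal_action"]
    decision = record["decision_class"]
    if action == "finalize":
        if decision not in DECISION_CLASSES:
            errors.append("finalize requires a frozen decision_class")
    elif action == "abstain":
        if decision is not None:
            errors.append("abstain requires decision_class=null")
    else:
        errors.append(f"unknown terminal_action: {action}")

    stop_reason = record["stop_reason"]
    if (
        not isinstance(stop_reason, str)
        or not stop_reason
        or len(stop_reason) > MAX_STOP_REASON_CHARS
    ):
        errors.append("stop_reason must be a short non-empty string")
    return errors
```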

Replay-Pack Contract

The replay path is directory-based and immutable once produced.

Required pack layout:

manifest.json
episodes.jsonl
README.md

`manifest.json`

Required manifest fields:

schema_version
pack_id
generated_at_utc
generator_script
generator_repo_relpath
generator_git_sha
source_dataset_refs
episode_count

`episodes.jsonl`

Each line is one episode record with this minimum shape:

{
  "episode_id": "hl_btc_2026-03-28_window_24h_001",
  "query": "Is BTC on Hyperliquid showing meaningful activity worth briefing right now?",
  "anchor_market": "BTC",
  "window_id": "2026-03-28T00:00:00Z/2026-03-29T00:00:00Z",
  "step_budget": 12,
  "token_budget_class": "medium",
  "environment_views": {
    "market_state": {
      "artifact_id": "artifact.market_state.btc.24h",
      "artifact_type": "market_state",
      "view_name": "market_state",
      "anchor_market": "BTC",
      "window_id": "2026-03-28T00:00:00Z/2026-03-29T00:00:00Z",
      "payload": {},
      "source_refs": []
    },
    "derived_metrics": {
      "artifact_id": "artifact.derived_metrics.btc.24h",
      "artifact_type": "derived_metrics",
      "view_name": "derived_metrics",
      "anchor_market": "BTC",
      "window_id": "2026-03-28T00:00:00Z/2026-03-29T00:00:00Z",
      "payload": {},
      "source_refs": []
    },
    "persistence": {
      "artifact_id": "artifact.persistence.btc.24h",
      "artifact_type": "persistence",
      "view_name": "persistence",
      "anchor_market": "BTC",
      "window_id": "2026-03-28T00:00:00Z/2026-03-29T00:00:00Z",
      "payload": {},
      "source_refs": []
    },
    "peer_comparisons": [],
    "prior_episode_shortlists": []
  },
  "ground_truth_reference": {
    "decision_class": "finalize_signal",
    "retained_artifact_ids": [
      "artifact.market_state.btc.24h",
      "artifact.derived_metrics.btc.24h"
    ]
  }
}

Notes:

peer_comparisons and prior_episode_shortlists are optional arrays that may be empty
field ordering should be stable wherever practical
episode ordering should be stable for fixed inputs
packs must be regenerated into a new timestamped directory instead of edited in place
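A minimal loader sketch for the pack layout described above. It checks that the three files exist and that the required manifest and episode fields are present. The cross-check between episode_count and the number of lines in episodes.jsonl is an assumption about what that field means, and load_pack is a hypothetical function name.

```python
import json
from pathlib import Path

PACK_FILES = ("manifest.json", "episodes.jsonl", "README.md")
MANIFEST_FIELDS = (
    "schema_version", "pack_id", "generated_at_utc", "generator_script",
    "generator_repo_relpath", "generator_git_sha", "source_dataset_refs",
    "episode_count",
)
EPISODE_FIELDS = (
    "episode_id", "query", "anchor_market", "window_id", "step_budget",
    "token_budget_class", "environment_views", "ground_truth_reference",
)


def load_pack(pack_dir):
    """Load a replay pack read-only and return (manifest, episodes)."""
    root = Path(pack_dir)
    for name in PACK_FILES:
        if not (root / name).is_file():
            raise FileNotFoundError(f"missing pack file: {name}")

    manifest = json.loads((root / "manifest.json").read_text(encoding="utf-8"))
    missing = [f for f in MANIFEST_FIELDS if f not in manifest]
    if missing:
        raise ValueError(f"manifest missing fields: {missing}")

    episodes = []
    with (root / "episodes.jsonl").open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # assumption: blank lines are tolerated
            episode = json.loads(line)
            missing = [f for f in EPISODE_FIELDS if f not in episode]
            if missing:
                raise ValueError(f"line {line_number} missing fields: {missing}")
            episodes.append(episode)  # file order is preserved

    # Assumption: episode_count equals the number of episode records.
    if manifest["episode_count"] != len(episodes):
        raise ValueError("episode_count does not match episodes.jsonl")
    return manifest, episodes
```

The loader never writes to the pack directory, which fits the rule that packs are regenerated rather than edited in place.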

Deterministic Replay Notes

Deterministic replay is part of the contract, not an implementation detail.

For a fixed code revision and a fixed replay-pack input, the runtime should aim for stable:

artifact ID resolution
episode ordering
environment-view ordering
step ordering
terminal output shape

If any part cannot be fully deterministic, the nondeterministic component must be called out in the replay-pack manifest or runtime metadata.
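One way to check replay stability is to hash a canonical serialization of the trajectory and compare digests across runs. The sketch below is an illustrative approach, not something the contract prescribes. trajectory_digest is a hypothetical helper, and the step dictionaries are assumed to be JSON-serializable.

```python
import hashlib
import json


def trajectory_digest(steps):
    """SHA-256 over a canonical JSON encoding of the ordered step list."""
    canonical = json.dumps(
        list(steps),
        sort_keys=True,          # key order inside a step does not matter
        separators=(",", ":"),   # no whitespace variation
        ensure_ascii=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two runs over the same pack and code revision would be expected to produce the same digest. Step order is part of the hash, so a reordered trajectory yields a different value.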
