Search Harness

Phase 0 contract for the Hyperliquid search harness: bounded reads, working-set edits, terminal decisions, and replayable runs.

Status

This document freezes the Phase 0 contract for the Hyperliquid search harness v1.

It now reflects the preferred operating shape:

  • a pi-based deterministic harness is the main runtime contract
  • a frontier model accessed through Codex OAuth acts as the heavyweight planner/operator
  • small domain-specific auxiliary models may support bounded repeated tasks inside the harness
  • the harness, not hidden prompt state, remains the source of truth

Purpose

The v1 harness turns the current one-shot search flow into a bounded search environment where a policy can:

  • inspect deterministic evidence in multiple steps
  • keep or drop evidence from an explicit working set
  • branch into shallow subqueries
  • stop with either a bounded decision or an abstention

The environment is external to the prompt. The planner inspects slices of that environment through explicit actions rather than receiving one large preassembled bundle.

In the preferred operating mode, the frontier planner handles the broad reasoning while the harness enforces bounded reads, explicit state transitions, and replayable terminal outputs.

How This Differs From The Current Pipeline

Current one-shot search:

  • accepts one natural-language query
  • routes once
  • renders one SQL template
  • executes once against live data
  • returns one ranked result payload plus provenance

Search harness v1:

  • accepts one bounded monitoring or research query tied to a replay episode
  • runs a stepwise action loop over deterministic replay-pack views
  • maintains an explicit active working set separate from the full environment
  • records keep/drop/prune decisions in the trajectory
  • terminates with a structured decision object rather than a single one-shot result

In short, the current pipeline is a direct query executor. The harness is a replayable search environment that is operated by a heavyweight planner and can later be supported by small specialist policies for bounded subproblems.
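The stepwise action loop described above can be sketched as follows. This is a minimal illustration, not the harness implementation: `run_episode`, `scripted_policy`, and the shape of the trajectory entries are hypothetical names, and forcing an `abstain` when the step budget runs out is an assumption, since this contract does not state what happens on budget exhaustion.

```python
TERMINAL_ACTIONS = {"finalize", "abstain"}


def run_episode(policy, step_budget):
    """Drive a policy until it emits a terminal action or the budget runs out."""
    trajectory = []
    for step in range(step_budget):
        # The policy sees the trajectory so far and returns the next action.
        action = policy(trajectory)
        trajectory.append({
            "step": step,
            "name": action["name"],
            "arguments": action.get("arguments", {}),
        })
        if action["name"] in TERMINAL_ACTIONS:
            return trajectory
    # Assumption (not in the contract): an exhausted budget forces an abstention
    # so that the run still ends in exactly one terminal record.
    trajectory.append({
        "step": step_budget,
        "name": "abstain",
        "arguments": {
            "retained_artifact_ids": [],
            "stop_reason": "step_budget_exhausted",
            "open_risks": [],
        },
    })
    return trajectory


def scripted_policy(trajectory):
    """Hypothetical stand-in for the planner: replays a fixed script."""
    script = [
        {"name": "read_market_state",
         "arguments": {"anchor_market": "BTC", "window_id": "w1"}},
        {"name": "keep_artifact",
         "arguments": {"artifact_id": "artifact.market_state.btc.24h"}},
        {"name": "finalize",
         "arguments": {"decision_class": "finalize_signal",
                       "retained_artifact_ids": ["artifact.market_state.btc.24h"],
                       "stop_reason": "evidence_sufficient",
                       "open_risks": []}},
    ]
    return script[len(trajectory)]
```

Calling `run_episode(scripted_policy, step_budget=12)` yields a three-step trajectory ending in `finalize`; with `step_budget=2` the same policy is cut off and the sketch appends an `abstain` record instead.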

Scope

The v1 harness is intentionally narrow:

  • Hyperliquid monitoring and research queries only
  • deterministic replay-pack inputs only
  • shallow branching only
  • bounded action set only
  • no broad web search
  • no arbitrary-depth recursion
  • no requirement to touch live BigQuery in the replay path

Core Concepts

Environment state

The environment is the full replayable episode state available to the runtime but not automatically injected into the active prompt.

Required episode-level fields:

  • episode_id
  • query
  • anchor_market
  • window_id
  • step_budget
  • token_budget_class
  • environment_views
  • ground_truth_reference

Artifact registry

Each retrievable view is represented as an artifact with a stable ID inside the episode.

Required artifact fields:

  • artifact_id
  • artifact_type
  • view_name
  • anchor_market
  • window_id
  • payload
  • source_refs

artifact_id stability matters for deterministic replay. For a fixed replay pack and fixed loader version, the same artifact must resolve to the same ID on every run.
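One way to get stable IDs is to derive them only from fields that are fixed by the replay pack. The sketch below follows the pattern visible in the sample episode record (`artifact.market_state.btc.24h`); the function name `make_artifact_id` and the short `window_label` component are assumptions, because this document does not specify how IDs are constructed.

```python
import re


def make_artifact_id(view_name: str, anchor_market: str, window_label: str) -> str:
    """Build an artifact ID from inputs that do not change between runs.

    Hypothetical helper: the dotted layout mirrors the sample IDs in this
    document, but the real derivation rule is not specified here.
    """
    parts = [view_name, anchor_market.lower(), window_label.lower()]
    for part in parts:
        # Reject characters (such as ".") that would make the dotted ID ambiguous.
        if not re.fullmatch(r"[a-z0-9_]+", part):
            raise ValueError(f"unsupported ID component: {part!r}")
    return "artifact." + ".".join(parts)


print(make_artifact_id("market_state", "BTC", "24h"))
# prints: artifact.market_state.btc.24h
```

The point of the design is what it leaves out: no random UUIDs, no wall-clock timestamps, and no reliance on Python's built-in `hash()` for strings, which is salted per process by default and so differs between runs.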

Working set

The working set is the active evidence context selected by the policy.

Required working-set fields:

  • active_artifact_ids
  • active_artifact_summaries
  • retrieval_history
  • branch_history
  • step_budget_remaining
  • context_pressure_class

Working-set semantics are explicit:

  • reading an artifact does not automatically keep it
  • keep_artifact copies an artifact into active working memory
  • drop_artifact removes it from active working memory only
  • dropped artifacts remain available in the episode registry for later revisit
  • prune_working_set is a bulk drop operation and must remain trajectory-visible
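The semantics above can be illustrated with a small in-memory model. This is a sketch under assumptions: the `WorkingSet` class, the `read_artifact` helper (standing in for the read actions), and the trajectory entry format are hypothetical, and fields such as `context_pressure_class` are left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class WorkingSet:
    registry: dict  # artifact_id -> artifact; the full episode registry
    active_artifact_ids: list = field(default_factory=list)
    trajectory: list = field(default_factory=list)

    def read_artifact(self, artifact_id):
        # Reading returns the artifact but does not add it to the working set.
        self.trajectory.append({"action": "read", "artifact_id": artifact_id})
        return self.registry[artifact_id]

    def keep_artifact(self, artifact_id):
        if artifact_id not in self.registry:
            raise KeyError(artifact_id)
        if artifact_id not in self.active_artifact_ids:
            self.active_artifact_ids.append(artifact_id)
        self.trajectory.append({"action": "keep_artifact", "artifact_id": artifact_id})

    def drop_artifact(self, artifact_id):
        # Only the working set changes; the registry entry stays available.
        if artifact_id in self.active_artifact_ids:
            self.active_artifact_ids.remove(artifact_id)
        self.trajectory.append({"action": "drop_artifact", "artifact_id": artifact_id})

    def prune_working_set(self, artifact_ids, reason):
        # Bulk drop recorded as a single trajectory-visible step.
        ids = list(artifact_ids)
        self.active_artifact_ids = [a for a in self.active_artifact_ids if a not in ids]
        self.trajectory.append(
            {"action": "prune_working_set", "artifact_ids": ids, "reason": reason}
        )
```

Keeping the registry and the active list as separate structures is what allows a dropped artifact to be read and kept again later in the same episode.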

Frozen v1 Action Set

Action names below are the Phase 0 contract and should not be renamed without a deliberate contract update.

Read actions

| Action | Arguments | Returns |
| --- | --- | --- |
| read_market_state | anchor_market, window_id | Raw market snapshot artifact for one market-window pair |
| read_derived_metrics | anchor_market, window_id | Normalized or derived metrics artifact for the same scope |
| read_persistence | anchor_market, window_id | Persistence or follow-through artifact for the same scope |
| compare_markets | anchor_market, peer_markets, window_id | Comparison artifact for ranking or prioritization |
| retrieve_similar_prior_episodes | anchor_market, pattern_id_or_query | Ranked shortlist artifact of prior episodes |

Context-management actions

| Action | Arguments | Contract |
| --- | --- | --- |
| keep_artifact | artifact_id | Adds the artifact to the active working set |
| drop_artifact | artifact_id | Removes the artifact from the active working set |
| prune_working_set | artifact_ids, reason | Removes multiple artifacts in one explicit step |

Control actions

| Action | Arguments | Contract |
| --- | --- | --- |
| branch_subquery | subquery_type, arguments | Materializes a bounded child retrieval turn and returns artifacts into the same episode registry |
| finalize | decision_class, retained_artifact_ids, stop_reason, open_risks | Completes the episode with a bounded decision |
| abstain | retained_artifact_ids, stop_reason, open_risks | Completes the episode with an explicit no-decision outcome |

decision_class is frozen to:

  • finalize_signal
  • finalize_low_signal

abstain remains a separate terminal action rather than a decision_class value.
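A frozen action set lends itself to a lookup table that rejects anything outside the contract. The sketch below encodes the action names and argument names from the tables above; `validate_action` is a hypothetical helper, and treating every listed argument as required with no extras allowed is an assumption, since the tables do not say which arguments are optional.

```python
ACTION_ARGUMENTS = {
    # Read actions
    "read_market_state": ("anchor_market", "window_id"),
    "read_derived_metrics": ("anchor_market", "window_id"),
    "read_persistence": ("anchor_market", "window_id"),
    "compare_markets": ("anchor_market", "peer_markets", "window_id"),
    "retrieve_similar_prior_episodes": ("anchor_market", "pattern_id_or_query"),
    # Context-management actions
    "keep_artifact": ("artifact_id",),
    "drop_artifact": ("artifact_id",),
    "prune_working_set": ("artifact_ids", "reason"),
    # Control actions
    "branch_subquery": ("subquery_type", "arguments"),
    "finalize": ("decision_class", "retained_artifact_ids", "stop_reason", "open_risks"),
    "abstain": ("retained_artifact_ids", "stop_reason", "open_risks"),
}

DECISION_CLASSES = {"finalize_signal", "finalize_low_signal"}


def validate_action(name: str, arguments: dict) -> None:
    """Raise ValueError if the call falls outside the frozen action set."""
    if name not in ACTION_ARGUMENTS:
        raise ValueError(f"unknown action: {name}")
    expected, got = set(ACTION_ARGUMENTS[name]), set(arguments)
    # Assumption: all listed arguments are required and no others are accepted.
    if expected != got:
        raise ValueError(
            f"{name}: missing {sorted(expected - got)}, unexpected {sorted(got - expected)}"
        )
    if name == "finalize" and arguments["decision_class"] not in DECISION_CLASSES:
        raise ValueError(f"finalize: invalid decision_class {arguments['decision_class']!r}")
```

Note that `abstain` carries no `decision_class` argument at all, which matches its status as a separate terminal action.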

Branching Contract

v1 allows shallow explicit decomposition only.

branch_subquery must:

  • name the branch via subquery_type
  • carry structured arguments
  • point back to the parent step in the trajectory
  • register any produced artifacts in the same episode registry

Examples of subquery_type values:

  • derived_metrics
  • persistence_check
  • peer_comparison
  • prior_episode_lookup

Arbitrary recursive model self-calls are out of scope for v1.
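The branching requirements can be sketched as a record plus a guard. The contract says branching is shallow but does not give a number, so `MAX_BRANCH_DEPTH = 1` is an assumption, as are the names `BranchRecord`, `open_branch`, `parent_step`, and `depth`.

```python
from dataclasses import dataclass

# Assumption: "shallow" is modeled as one level of branching below the root.
MAX_BRANCH_DEPTH = 1


@dataclass(frozen=True)
class BranchRecord:
    subquery_type: str  # names the branch, e.g. "persistence_check"
    arguments: dict     # structured arguments for the child retrieval turn
    parent_step: int    # trajectory step that opened the branch
    depth: int          # 1 for a branch opened from the root episode


def open_branch(subquery_type, arguments, parent_step, parent_depth=0):
    """Create a branch record, refusing unnamed, unstructured, or deep branches."""
    if not subquery_type:
        raise ValueError("subquery_type is required")
    if not isinstance(arguments, dict):
        raise TypeError("arguments must be a structured mapping")
    depth = parent_depth + 1
    if depth > MAX_BRANCH_DEPTH:
        raise ValueError("branch depth exceeds the v1 limit")
    return BranchRecord(subquery_type, arguments, parent_step, depth)
```

Artifacts produced by the child turn would then be registered in the same episode registry as the parent's, so that later `keep_artifact` and `drop_artifact` calls can refer to them by ID.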

Terminal Output Contract

Every harness run must end in exactly one terminal record.

Required terminal fields:

  • episode_id
  • terminal_action
  • decision_class
  • retained_artifact_ids
  • retained_evidence
  • open_risks
  • stop_reason

Terminal rules:

  • terminal_action=finalize requires decision_class to be either finalize_signal or finalize_low_signal
  • terminal_action=abstain requires decision_class=null
  • retained_evidence is a materialized view of the final retained artifacts
  • stop_reason must be bounded and schema-valid, not a long free-form essay
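The terminal rules translate directly into a validator. In this sketch, `validate_terminal` is a hypothetical helper and the 200-character cap on `stop_reason` is an assumption: the contract requires a bounded value but does not define the bound.

```python
REQUIRED_TERMINAL_FIELDS = (
    "episode_id", "terminal_action", "decision_class", "retained_artifact_ids",
    "retained_evidence", "open_risks", "stop_reason",
)
DECISION_CLASSES = {"finalize_signal", "finalize_low_signal"}
MAX_STOP_REASON_CHARS = 200  # assumption: the real bound is not specified


def validate_terminal(record: dict) -> list:
    """Return a list of rule violations; an empty list means the record is valid."""
    errors = [f"missing field: {name}"
              for name in REQUIRED_TERMINAL_FIELDS if name not in record]

    action = record.get("terminal_action")
    decision = record.get("decision_class")
    if action == "finalize":
        if decision not in DECISION_CLASSES:
            errors.append("finalize requires a frozen decision_class value")
    elif action == "abstain":
        if decision is not None:
            errors.append("abstain requires decision_class=null")
    else:
        errors.append(f"unknown terminal_action: {action!r}")

    stop_reason = record.get("stop_reason")
    if not isinstance(stop_reason, str) or len(stop_reason) > MAX_STOP_REASON_CHARS:
        errors.append("stop_reason must be a bounded string")
    return errors
```

Returning a list rather than raising on the first problem makes it easier to report every violation in a replayed run at once.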

Replay-Pack Contract

The replay path is directory-based and immutable once produced.

Required pack layout:

  • manifest.json
  • episodes.jsonl
  • README.md

manifest.json

Required manifest fields:

  • schema_version
  • pack_id
  • generated_at_utc
  • generator_script
  • generator_repo_relpath
  • generator_git_sha
  • source_dataset_refs
  • episode_count
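A loader can check the manifest before touching any episodes. The sketch below is illustrative: `check_manifest` and `load_manifest` are hypothetical names, and requiring `episode_count` to be a non-negative integer is an assumption, since this document lists the fields but not their types.

```python
import json
from pathlib import Path

REQUIRED_MANIFEST_FIELDS = (
    "schema_version", "pack_id", "generated_at_utc", "generator_script",
    "generator_repo_relpath", "generator_git_sha", "source_dataset_refs",
    "episode_count",
)


def check_manifest(manifest: dict) -> list:
    """Return a list of problems; an empty list means the manifest passes."""
    problems = [f"missing field: {name}"
                for name in REQUIRED_MANIFEST_FIELDS if name not in manifest]
    if "episode_count" in manifest:
        count = manifest["episode_count"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(count, bool) or not isinstance(count, int) or count < 0:
            problems.append("episode_count must be a non-negative integer")
    return problems


def load_manifest(pack_dir) -> dict:
    """Read manifest.json from a pack directory and fail fast on problems."""
    text = (Path(pack_dir) / "manifest.json").read_text(encoding="utf-8")
    manifest = json.loads(text)
    problems = check_manifest(manifest)
    if problems:
        raise ValueError("; ".join(problems))
    return manifest
```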

episodes.jsonl

Each line is one episode record. The minimum shape is shown below, pretty-printed for readability; in the file itself each record occupies a single line:

{
  "episode_id": "hl_btc_2026-03-28_window_24h_001",
  "query": "Is BTC on Hyperliquid showing meaningful activity worth briefing right now?",
  "anchor_market": "BTC",
  "window_id": "2026-03-28T00:00:00Z/2026-03-29T00:00:00Z",
  "step_budget": 12,
  "token_budget_class": "medium",
  "environment_views": {
    "market_state": {
      "artifact_id": "artifact.market_state.btc.24h",
      "artifact_type": "market_state",
      "view_name": "market_state",
      "anchor_market": "BTC",
      "window_id": "2026-03-28T00:00:00Z/2026-03-29T00:00:00Z",
      "payload": {},
      "source_refs": []
    },
    "derived_metrics": {
      "artifact_id": "artifact.derived_metrics.btc.24h",
      "artifact_type": "derived_metrics",
      "view_name": "derived_metrics",
      "anchor_market": "BTC",
      "window_id": "2026-03-28T00:00:00Z/2026-03-29T00:00:00Z",
      "payload": {},
      "source_refs": []
    },
    "persistence": {
      "artifact_id": "artifact.persistence.btc.24h",
      "artifact_type": "persistence",
      "view_name": "persistence",
      "anchor_market": "BTC",
      "window_id": "2026-03-28T00:00:00Z/2026-03-29T00:00:00Z",
      "payload": {},
      "source_refs": []
    },
    "peer_comparisons": [],
    "prior_episode_shortlists": []
  },
  "ground_truth_reference": {
    "decision_class": "finalize_signal",
    "retained_artifact_ids": [
      "artifact.market_state.btc.24h",
      "artifact.derived_metrics.btc.24h"
    ]
  }
}

Notes:

  • peer_comparisons and prior_episode_shortlists are optional arrays that may be empty
  • field ordering should be stable where practical
  • episode ordering should be stable for fixed inputs
  • packs must be regenerated into a new timestamped directory instead of edited in place
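Reading `episodes.jsonl` can be sketched as a line-by-line parse that enforces the required episode-level fields and preserves file order. `parse_episodes` is a hypothetical helper, and rejecting duplicate `episode_id` values is an assumption; the contract does not state that IDs must be unique within a pack.

```python
import json

REQUIRED_EPISODE_FIELDS = (
    "episode_id", "query", "anchor_market", "window_id", "step_budget",
    "token_budget_class", "environment_views", "ground_truth_reference",
)


def parse_episodes(lines):
    """Parse JSONL lines into episode records, keeping the order in the file."""
    episodes, seen_ids = [], set()
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate a trailing blank line
        record = json.loads(line)
        missing = [name for name in REQUIRED_EPISODE_FIELDS if name not in record]
        if missing:
            raise ValueError(f"line {line_no}: missing fields {missing}")
        # Assumption: episode IDs are unique within one pack.
        if record["episode_id"] in seen_ids:
            raise ValueError(f"line {line_no}: duplicate episode_id")
        seen_ids.add(record["episode_id"])
        episodes.append(record)
    return episodes
```

With a real pack this would be called as `parse_episodes(open(path, encoding="utf-8"))`. Because the function never sorts or deduplicates silently, the episode ordering a consumer sees is exactly the ordering the generator wrote.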

Deterministic Replay Notes

Deterministic replay is part of the contract, not an implementation detail.

For a fixed code revision and a fixed replay-pack input, the runtime should aim for stable:

  • artifact ID resolution
  • episode ordering
  • environment-view ordering
  • step ordering
  • terminal output shape

If any part cannot be fully deterministic, the nondeterministic component must be called out in the replay-pack manifest or runtime metadata.
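One way to check these stability properties in practice is to fingerprint the output of two runs and compare. The sketch below hashes a canonical JSON serialization; `fingerprint` is a hypothetical helper and this comparison technique is a suggestion, not something the contract prescribes.

```python
import hashlib
import json


def fingerprint(obj) -> str:
    """Hash a JSON-serializable object in a way that ignores dict key order."""
    canonical = json.dumps(
        obj,
        sort_keys=True,          # dict key order must not affect the hash
        separators=(",", ":"),   # no whitespace variation
        ensure_ascii=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


run_a = [{"step": 0, "name": "read_market_state"}, {"step": 1, "name": "finalize"}]
run_b = [{"name": "read_market_state", "step": 0}, {"name": "finalize", "step": 1}]

# Same steps in the same order: fingerprints match even though key order differs.
assert fingerprint(run_a) == fingerprint(run_b)
# Reordered steps: fingerprints differ, because list order is preserved.
assert fingerprint(run_a) != fingerprint(list(reversed(run_a)))
```

Sorting keys but not list elements is deliberate here: key order inside a record is incidental, while step ordering and episode ordering are exactly what the contract asks to keep stable.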