Language Server CLI Empowers Language Agents with Process Rewards

Yifan Zhang et al.

Abstract

Large language models hallucinate APIs and mislocalize edits; language servers compute verified, IDE‑grade facts. Lanser‑CLI is a CLI‑first orchestration layer that pins and mediates a Language Server Protocol (LSP) server for coding agents and CI, exposing deterministic, replayable workflows. Beyond structural information (definitions, references, types, diagnostics), the same facts yield an actionable process reward: machine‑checked, step‑wise signals that align an agent’s planning loop with program reality.

Contributions include: (i) a robust addressing scheme via a Selector DSL (symbolic, AST‑path, content‑anchored) with a principled relocation algorithm; (ii) deterministic Analysis Bundles that normalize server responses and capture environment/capability metadata under stable content hashes; (iii) a safety envelope for mutating operations (preview, workspace jail, Git‑aware transactional apply); and (iv) a process‑reward functional from LSP facts (diagnostic deltas, disambiguation confidence, safe‑apply checks) that is computable online and replayable offline.


Motivation: Bridging the Agent–Server Gap

Raw LSP usage leaves four gaps for autonomous agents:

Determinism & Replay: stable ordering, content hashing, and version pinning (no “works on my machine”).
Robust Addressing: selectors that survive edits and resist positional drift; explicit indexing semantics.
Safety for Mutations: preview‑first, workspace jail, and transactional apply.
Process Supervision: intermediate, verifiable feedback correlated with task success.

Lanser‑CLI turns interactive language‑server sessions into verifiable, replayable artifacts. The guiding principle is protocol grounding: prefer machine‑checked facts over model speculation.

Architecture Overview

Figure: The Lanser‑CLI orchestrator mediates JSON‑RPC between the agent, a pinned LSP server (e.g., Pyright), and the workspace, and emits deterministic bundles.

Environment Capture

Each bundle records: {serverVersion, positionEncoding, pythonExe, pythonVersion, venvPath, configDigest, platform}.

Contracts & Invariants

Deterministic list order: results are sorted by (uri, sL, sC, eL, eC), i.e. URI, then start line, start column, end line, end column, with stable tie‑breakers.
bundleId is a SHA‑256 over a JCS‑canonicalized subset (timestamps excluded).
Replay under frozen snapshot: identical inputs → identical bundles.

{
  "version": "1.3",
  "bundleId": "sha256:…",
  "request": {"cmd": "definition", "selector": {...}},
  "facts": {"definitions": [...], "hover": {...}},
  "environment": {"server":{"name":"pyright","version":"1.1.406"},
    "positionEncoding":"utf-16","python":{"version":"3.11.6"}},
  "meta": {"sorting_keys": ["uri","range[0]","range[1]","range[2]","range[3]"]}
}
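The ordering and hashing contracts can be illustrated with a minimal Python sketch. It is not the Lanser‑CLI implementation: the helper names, the `[sL, sC, eL, eC]` range layout, and the `timestamp` field name are assumptions made for illustration, and `json.dumps` with sorted keys only approximates JCS (RFC 8785), which also fixes number formatting and key ordering rules.

```python
import hashlib
import json


def sort_locations(locations):
    """Order locations by (uri, sL, sC, eL, eC).

    Assumes each location is {"uri": str, "range": [sL, sC, eL, eC]}.
    """
    return sorted(locations, key=lambda loc: (loc["uri"], *loc["range"]))


def bundle_id(bundle, excluded=("bundleId", "timestamp")):
    """Hash a canonical JSON form of the bundle, skipping volatile fields.

    The excluded field names are hypothetical; the source only states that
    timestamps are left out of the hashed subset.
    """
    subset = {k: v for k, v in bundle.items() if k not in excluded}
    # Sorted keys + compact separators: an approximation of JCS, not a
    # compliant implementation.
    canonical = json.dumps(
        subset, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because keys are sorted and the timestamp is dropped before hashing, two bundles that differ only in key order or capture time receive the same identifier in this sketch.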

Selectors & Repositioning

Agents address intent rather than bytes. The PositionSpec union supports cursor, range, symbolic, AST‑path, and content‑anchor forms with an optional docVersion.

// Structured PositionSpec (subset)
{ kind:"cursor", uri, line, col, indexing:"utf-16|utf-8|codepoint" }
{ kind:"range", uri, start:[l,c], end:[l,c] }
{ kind:"symbol", qualname:"pkg.mod:Class.method", role:"def|sig|body|doc", overload:0 }
{ kind:"ast", path:[["module","pkg.mod"],["class","C"],["def","m"]] }
{ kind:"anchor", uri, snippet:"def load_data(", ctx:24, hash:"sha1:…"}

Canonical strings for the CLI:

# Cursor / range
src/app.py@L42:C7
src/app.py@R(42,7->44,1)

# Symbolic
py://pkg.mod#Class.method:body
py://pkg.mod#function_name:sig

# AST path (subset)
ast://[module=pkg.mod]/[class=Class]/[def=method]/name[1]

# Content anchor (snippet + context N chars)
anchor://src/app.py#"def load_data("?ctx=24
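As a rough illustration of how the cursor and range strings above map onto the structured PositionSpec, the following Python sketch parses the two positional forms. The regular expressions and the `parse_position` helper are assumptions inferred from the examples, not the grammar shipped by Lanser‑CLI; the symbolic, AST‑path, and anchor forms are deliberately left out.

```python
import re

# Patterns inferred from the examples "src/app.py@L42:C7" and
# "src/app.py@R(42,7->44,1)"; the real grammar may be stricter or looser.
CURSOR_RE = re.compile(r"^(?P<path>.+)@L(?P<line>\d+):C(?P<col>\d+)$")
RANGE_RE = re.compile(
    r"^(?P<path>.+)@R\((?P<sl>\d+),(?P<sc>\d+)->(?P<el>\d+),(?P<ec>\d+)\)$"
)


def parse_position(text):
    """Parse a cursor or range string into a PositionSpec-like dict."""
    m = CURSOR_RE.match(text)
    if m:
        return {
            "kind": "cursor",
            "uri": m["path"],  # assumption: the path is used as the uri as-is
            "line": int(m["line"]),
            "col": int(m["col"]),
        }
    m = RANGE_RE.match(text)
    if m:
        return {
            "kind": "range",
            "uri": m["path"],
            "start": [int(m["sl"]), int(m["sc"])],
            "end": [int(m["el"]), int(m["ec"])],
        }
    raise ValueError(f"not a cursor or range selector: {text!r}")
```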

Indexing Semantics

Lanser‑CLI negotiates positionEncoding with the server (the LSP default is UTF‑16) and allows CLI I/O in UTF‑8 or codepoint units. When the CLI and server encodings differ, both coordinate sets are surfaced in verbose traces; bundles always retain server coordinates.
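The difference between codepoint and UTF‑16 columns only shows up for characters outside the Basic Multilingual Plane, which occupy two UTF‑16 code units. A small Python sketch of the conversion follows; the function names are illustrative and not part of the CLI.

```python
def codepoint_col_to_utf16(line_text, col):
    """Convert a codepoint column to a UTF-16 code-unit column."""
    # Each UTF-16 code unit is 2 bytes; astral characters encode as 4 bytes.
    return len(line_text[:col].encode("utf-16-le")) // 2


def utf16_col_to_codepoint(line_text, col16):
    """Convert a UTF-16 code-unit column back to a codepoint column.

    A column that falls inside a surrogate pair is rounded up to the next
    character boundary.
    """
    units = 0
    for index, ch in enumerate(line_text):
        if units >= col16:
            return index
        units += 2 if ord(ch) > 0xFFFF else 1
    return len(line_text)
```

For a line such as `x = "😀" + y`, the `+` sits at codepoint column 8 but at UTF‑16 column 9, which is exactly the kind of drift the dual coordinates in verbose traces are meant to expose.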

Relocate Algorithm (deterministic)

Algorithm: Relocate(selector s, workspace W, optional snapshot v)
1. If v present and W has exact map(s, v): return [(map(s, v), 1.0)].
2. If s.kind ∈ {symbol, ast}: resolve via module graph + parser → candidates A.
3. If s.kind = anchor: fuzzy match k-grams within context → candidates H.
4. Score candidates by: 0.5·ast + 0.2·module + 0.2·tokenJaccard + 0.1·proximity.
5. Sort desc by (score, uri, range). If empty → E/NOT_FOUND.
6. If ties or low top-score: attach disambiguation evidence. Return top-k.
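Steps 4 and 5 can be sketched in Python as follows. The weights come from the algorithm above; everything else is an assumption for illustration: the candidate layout, the whitespace tokenizer behind `token_jaccard`, the use of `LookupError` for E/NOT_FOUND, and the reading of the sort as score descending with uri and range ascending as tie‑breakers.

```python
WEIGHTS = {"ast": 0.5, "module": 0.2, "tokenJaccard": 0.2, "proximity": 0.1}


def token_jaccard(a, b):
    """Jaccard similarity over whitespace-separated tokens (a simplification)."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def score(features):
    """Weighted sum of per-candidate features, each assumed to lie in [0, 1]."""
    return sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())


def rank(candidates, k=3):
    """Return the top-k (candidate, score) pairs in deterministic order.

    Each candidate is assumed to look like
    {"uri": str, "range": [sL, sC, eL, eC], "features": {...}}.
    """
    if not candidates:
        raise LookupError("E/NOT_FOUND")
    scored = [(c, score(c["features"])) for c in candidates]
    # Highest score first; uri and range break ties so output is stable.
    scored.sort(key=lambda pair: (-pair[1], pair[0]["uri"], pair[0]["range"]))
    return scored[:k]
```

Sorting on a total key rather than relying on input order is what keeps the result reproducible when two candidates score equally.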

Process Rewards from Structural Signals

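We expose a shaped reward computable online and replayable from bundles. Let $D_t$ be the number of relevant diagnostics at step $t$, $S_t\in\{0,1\}$ indicate that safety checks pass, and $\alpha_t\in[0,1]$ be the top disambiguation confidence. The subscripted $\alpha_t$ is a per‑step measurement and is distinct from the unsubscripted weight $\alpha$ below.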

$$ r_t = \alpha\,(D_{t-1}-D_t) + \beta\,S_t - \gamma\,(1-\alpha_t), \quad \alpha,\beta,\gamma \ge 0. $$

Example weights $(\alpha,\beta,\gamma)=(0.5,0.4,0.1)$.

// Example 1: diag↓, safe, confident
D_{t-1}=5, D_t=2, S_t=1, α_t=0.94
r_t = 0.5*(3) + 0.4*(1) - 0.1*(0.06) ≈ 1.894

// Example 2: no improvement, unsafe, ambiguous
D_{t-1}=7, D_t=7, S_t=0, α_t=0.62
r_t = 0 + 0 - 0.1*(0.38) = -0.038

Under a frozen snapshot and fixed toolchain, $r_t$ is monotone in the diagnostic reduction $D_{t-1}-D_t$ as long as ambiguity does not increase and safety is not violated; replaying the recorded bundles yields the same $r_t$.
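The reward formula translates directly into a few lines of Python. This is a sketch under the example weights above; the function and parameter names are illustrative rather than taken from the Lanser‑CLI codebase.

```python
def process_reward(d_prev, d_curr, safe, confidence,
                   alpha=0.5, beta=0.4, gamma=0.1):
    """Compute r_t = alpha*(D_{t-1} - D_t) + beta*S_t - gamma*(1 - alpha_t).

    d_prev, d_curr: diagnostic counts at steps t-1 and t.
    safe:           whether the safety checks passed (S_t).
    confidence:     top disambiguation confidence alpha_t in [0, 1].
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    s_t = 1 if safe else 0
    return alpha * (d_prev - d_curr) + beta * s_t - gamma * (1.0 - confidence)


example_1 = process_reward(5, 2, safe=True, confidence=0.94)
example_2 = process_reward(7, 7, safe=False, confidence=0.62)
```

Up to floating‑point rounding, `example_1` and `example_2` reproduce the two worked examples (about 1.894 and -0.038). Since the function depends only on values recorded at each step, evaluating it again on stored inputs gives the same result.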

Editing & Safety Envelope

Preview‑by‑default: dry‑run produces diffs; apply requires --apply.
Workspace jail: edits confined to project root with allow/deny path filters.
Git guardrails: clean worktree required (override with --allow-dirty).
Atomic apply: temp files + fsync + rename(2); optional git apply --3way.
Indexing & encoding checks: dual coordinates surfaced in verbose mode.

Algorithm: PreviewThenApply (Guarded Rename)
1. Assert clean git or --allow-dirty.
2. prepareRename(s) → must pass; else error.
3. rename(s, newName) → WorkspaceEdit preview; emit unified diff.
4. If --apply: apply atomically (jail + filters). On conflict → E/APPLY_CONFLICT.
5. Notify server (didChange); emit success bundle.
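The jail check and the temp file + fsync + rename pattern from step 4 can be sketched in Python as below. This is a minimal illustration, not the tool's apply path: the helper names and the temp‑file prefix are hypothetical, allow/deny filters and the Git checks are omitted, and `os.replace` stands in for rename(2).

```python
import os
import tempfile
from pathlib import Path


def inside_jail(root, target):
    """Return True if target resolves to a path at or below root."""
    root = Path(root).resolve()
    target = Path(target).resolve()
    return target == root or root in target.parents


def atomic_write(root, target, data: bytes):
    """Write data to target atomically, refusing paths outside root."""
    if not inside_jail(root, target):
        raise PermissionError(f"{target} escapes workspace root {root}")
    target = Path(target)
    # Create the temp file next to the target so the final rename stays
    # on one filesystem and is atomic.
    fd, tmp = tempfile.mkstemp(dir=target.parent, prefix=".lanser-", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())  # force contents to disk before renaming
        os.replace(tmp, target)
    except BaseException:
        # Leave no stray temp file behind if anything failed before the rename.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

Readers of the target file see either the old contents or the new contents, never a partially written file, which is the property the transactional apply relies on.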

Interface: CLI, Tracing, and Bundles

Navigation (read‑only)

# Definition / References / Hover / Symbols / Diagnostics
lanser def py://pkg.mod#Class.method:sig --json
lanser refs anchor://src/app.py#"def load_data("?ctx=24 --json
lanser hover src/app.py@L42:C7 --json
lanser symbols --json
lanser diag --json

Safe Mutations

lanser prepare-rename py://pkg.mod#load_data:def --json
lanser rename py://pkg.mod#load_data:def read_data --dry-run
lanser rename py://pkg.mod#load_data:def read_data --apply

Batch & Replay

# JSONL batch execution
lanser batch --in requests.jsonl --out bundles.jsonl --trace-file trace.jsonl

# Deterministic replay of a recorded trace
lanser trace replay --trace-file trace.jsonl --verify
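Outside the CLI, the replay contract (identical inputs yield identical bundles) can be spot‑checked by comparing bundle identifiers from two runs. The Python sketch below assumes each JSONL line is a bundle object carrying a `bundleId` field, as in the envelope examples; the helper name is hypothetical and this is not what `--verify` does internally.

```python
import json


def diff_bundle_ids(recorded_lines, replayed_lines):
    """Return (index, recorded_id, replayed_id) for every mismatching bundle.

    Both arguments are iterables of JSONL lines; blank lines are ignored.
    """
    recorded = [json.loads(l)["bundleId"] for l in recorded_lines if l.strip()]
    replayed = [json.loads(l)["bundleId"] for l in replayed_lines if l.strip()]
    if len(recorded) != len(replayed):
        raise ValueError(
            f"bundle count differs: {len(recorded)} recorded, {len(replayed)} replayed"
        )
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(recorded, replayed))
        if a != b
    ]
```

An empty result means every replayed bundle hashed to the same identifier as its recorded counterpart.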

Schema Contracts

# Export JSON Schemas for Selector DSL and bundle envelopes
lanser schema selector --out selector.schema.json
lanser schema bundle   --out bundle.schema.json

Bundle Envelope (with process reward)

{
  "version": "1.2",
  "bundleId": "sha256:…",
  "status": "ok",
  "request": {"cmd":"definition","selector": {...}},
  "resolution": {"original":"py://pkg.mod#load_data:def",
                 "resolved": {...}, "disambiguation":[...]},
  "facts": {"definitions":[...], "hover": {...}, "provenance":"lsp"},
  "edits": {"workspaceEdit": null, "diff": null},
  "processReward": {
    "version":"pr-v1","r":0.872,
    "components":{"diag_delta":1,"safety":1,"ambiguity_penalty":0.28},
    "weights":{"alpha":0.5,"beta":0.4,"gamma":0.1},
    "explanation":"Reward computed from deterministic bundle contents."
  },
  "environment": {"server":{"name":"pyright","version":"1.1.406"},
                  "positionEncoding":"utf-16","python":{"version":"3.12.0"}},
  "capabilities": {"partialResult": false, "cancellable": true},
  "meta": {"exit_code":0,
           "sorting_keys":["uri","range[0]","range[1]","range[2]","range[3]"]}
}

Citation

If you find this work useful, please cite:

@article{zhang2025lanser,
   title   = {Language Server CLI Empowers Language Agents with Process Rewards},
   author  = {Zhang, Yifan and Contributors, Lanser},
   journal = {arXiv preprint arXiv:2510.22907},
   year    = {2025}
}

A CLI‑first, agent‑native orchestrator that pins an LSP server, normalizes its facts into deterministic, replayable bundles, and exposes a process reward for planner‑act loops.

Yifan Zhang · Lanser Contributors

Princeton University • October 24, 2025