Question-Asking Compression

How It Works

Inspired by 20 Questions, QA compression turns model knowledge transfer into an interactive game. The small model identifies its own uncertainty, asks targeted questions, and uses the answers to self-correct. Only the binary answers travel over the wire.

Step 1: Initial Attempt (Small Model)
The small model attempts to solve the problem on its own, producing an initial answer that may contain errors or gaps.

Step 2: Generate Questions (Small Model)
The small model examines its own solution and formulates targeted yes/no questions about it: "Is my approach to step 3 correct?", "Did I apply the tax rate to the right subtotal?"

Step 3: Binary Answers (Large Model)
The large model answers each question with a single yes or no. Each response costs exactly 1 bit to transmit, so 10 questions cost 10 bits in total.
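To make the bit accounting concrete, here is a minimal sketch of how a list of yes/no answers could be packed into a compact payload. The helper names `pack_answers` and `unpack_answers` and the sample answers are illustrative assumptions, not part of the demo itself.

```python
def pack_answers(answers):
    """Pack booleans (True = yes) into an int, first answer in the highest bit."""
    value = 0
    for a in answers:
        value = (value << 1) | int(a)
    return value


def unpack_answers(value, n):
    """Recover the n original booleans from the packed integer."""
    return [bool((value >> (n - 1 - i)) & 1) for i in range(n)]


# Hypothetical answers to 10 questions (yes, no, yes, yes, ...).
answers = [True, False, True, True, False, False, True, False, True, True]
payload = pack_answers(answers)

# 10 bits do not fill a whole number of bytes, so a byte-oriented
# transport would round up to 2 bytes here.
wire = payload.to_bytes(2, "big")
```

Unpacking with `unpack_answers(payload, 10)` returns the original list, so the receiver only needs to know how many questions were asked.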

Step 4: Revised Answer (Small Model)
The small model incorporates all 10 answers and produces a revised solution. Because the questions are a deterministic function of the initial response (assuming the small model decodes deterministically), only the 10-bit answer string needs to be transmitted.
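The four steps can be sketched as a single function that takes the two models as plain callables. Everything here is a hedged illustration: `qa_round` and the `toy_*` stand-ins are hypothetical names, and the toy "models" are simple Python functions rather than real API calls.

```python
def qa_round(problem, attempt, ask, answer, revise, n_questions=10):
    """Run one round of the four-step loop and return (revised, bits)."""
    draft = attempt(problem)                           # Step 1: small model
    questions = ask(problem, draft)[:n_questions]      # Step 2: small model
    bits = [answer(problem, q) for q in questions]     # Step 3: large model
    revised = revise(problem, draft, questions, bits)  # Step 4: small model
    return revised, bits


# Toy stand-ins (assumptions): every step has two candidate values, and
# the small model always guesses the first one.
PROBLEM = {"candidates": [(4, 5), (10, 12), (7, 9)]}
REFERENCE = [4, 12, 7]  # what the toy "large model" considers correct


def toy_attempt(problem):
    return [cands[0] for cands in problem["candidates"]]


def toy_ask(problem, draft):
    # Each question is (step, value), read as "Is step `step` equal to `value`?"
    return list(enumerate(draft))


def toy_answer(problem, question):
    step, value = question
    return REFERENCE[step] == value


def toy_revise(problem, draft, questions, bits):
    revised = list(draft)
    for (step, _), ok in zip(questions, bits):
        if not ok:  # a "no" makes the small model switch to its other candidate
            revised[step] = problem["candidates"][step][1]
    return revised


revised, bits = qa_round(PROBLEM, toy_attempt, toy_ask, toy_answer, toy_revise)
```

Passing the models in as callables keeps the loop independent of any particular provider. In a real setup each callable would wrap a chat-completion request.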

Key insight: Provided the SLM is deterministic (same weights, same prompt, same decoding settings), the receiver can reconstruct the full question-answer transcript by running an identical SLM locally. The communication cost is therefore exactly N bits for N questions, independent of response length. The LLM does not even need to receive the questions, since it can simulate the SLM to generate them itself.
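A minimal sketch of that reconstruction follows, assuming both sides can run the same deterministic question generator. The function `slm_questions` is a hypothetical stand-in for the SLM, and `oracle` stands in for the large model's yes/no judgment.

```python
def slm_questions(problem, n):
    """Stand-in for a deterministic SLM: same input always yields the same questions."""
    return [f"Is part {i + 1} of my answer to {problem!r} correct?" for i in range(n)]


def sender(problem, oracle, n):
    """LLM side: simulate the SLM to get the questions, then emit only the bits."""
    questions = slm_questions(problem, n)
    return "".join("1" if oracle(q) else "0" for q in questions)


def receiver(problem, bitstring):
    """Receiver side: regenerate the questions locally and pair them with the bits."""
    questions = slm_questions(problem, len(bitstring))
    return list(zip(questions, (b == "1" for b in bitstring)))


# Toy oracle (assumption): only the second part of the answer is wrong.
wire = sender("2+2", lambda q: "part 2" not in q, 3)
transcript = receiver("2+2", wire)
```

Only `wire` crosses the channel. If the two copies of the SLM diverge, for example through sampling with a nonzero temperature, the receiver pairs the bits with the wrong questions, which is why determinism is the critical assumption.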

OpenRouter Key:

Your key is stored only in your browser's local storage. We do not store or log your API key or queries on the server; they are passed through to OpenRouter and discarded.

Model Configuration

Small Model

Question Asker

Large Model

Question asker uses the same model as the small model

Opus Reference Answer

anthropic/claude-opus-4.5


Haiku Initial Answer

anthropic/claude-haiku-4.5


Haiku Asks 10 Questions

10 bits

#	Question (Haiku)	Answer (Opus)

Haiku Revised Answer

incorporating 10 bits
