Human Review App

Judge the next step, not the chatbot.

You are reading 12 short public examples. Picture a narrow intake sorter sitting in front of a known support or task queue, not a phone assistant or smart-home agent. For each example, ignore whether the AI sounds smart and decide only whether the recommended next action seems reasonable and safe.

Open locally: run python3 dashboard/serve.py, then visit /dashboard/review/

What You Are Deciding

Keep the judgment narrow and practical. You do not need to know the project jargon to answer these.

What this system is for

It is a first-pass sorter for short incoming requests. It is not trying to fully answer or solve everything itself, and it is not meant to behave like your phone assistant.

What “normal” means here

“Normal” just means the request looks like the kind of support or task request this sorter is meant to pass along for ordinary handling.

What it can do

It can do only three things: pass the request along normally, slow it down for a more careful check, or keep it out because it does not belong here.

What to assume about the regular handler

Assume the regular handler is competent at the ordinary work of that queue. They can ask normal follow-up questions in their own lane. The sorter is only deciding whether the request clearly belongs with them now.

Usually in scope

Short support requests, narrow task requests, and other requests that clearly belong with a known support or intake handler.

Usually not in scope

Broader assistant behavior, smart-home commands, device controls, and open-ended chat that do not belong in this narrow sorter.

Why some examples sound odd

These are public stand-ins from benchmark datasets. Some are phrased less naturally than real product tickets, so we also show a plain-English reading when that helps.

1. Read the request

Just read the short request as if a person sent it in.

2. Read the suggested next step

We show you what the system would do next and why.

3. Tell us if that seems right

Your job is simply to say whether the suggested action seems reasonable, cautious enough, and easy to understand.

Quick rule 1

Ask yourself: does this clearly belong with the regular handler for this kind of request?

Quick rule 2

If yes, remember that normal follow-up questions by that handler are okay. A pass-along case does not need to be fully solved already.

Quick rule 3

Slow it down only when the sorter itself should hesitate before handoff because the route-critical context seems too weak, confusing, or risky.

Pass along: Use this when the request clearly belongs with the right kind of support or task handler. It does not mean every detail is already present. It means the regular handler can own the next step and ask their normal follow-up questions.
Slow it down: Use this when the request might belong here, but the sorter itself should hesitate before handing it off because something important is missing, unclear, contradictory, or permission-sensitive. Next, it goes to human triage or review before normal handling.
Keep it out: Use this when the request does not belong in this narrow sorter. Next, it stays visible as out of scope and gets rerouted elsewhere instead of being forced into the wrong queue.
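
For readers who find code easier to follow, the three actions and the quick rules above can be sketched as a small decision function. This is only an illustration under simplifying assumptions, not the sorter's actual implementation: the names Action, suggest_action, in_scope, and route_context_ok are hypothetical, and the real system may weigh these signals differently.

```python
from enum import Enum


class Action(Enum):
    """The only three things the sorter can do (labels taken from this page)."""
    PASS_ALONG = "pass along"
    SLOW_DOWN = "slow it down"
    KEEP_OUT = "keep it out"


def suggest_action(in_scope: bool, route_context_ok: bool) -> Action:
    """Hypothetical sketch of the three-way decision.

    in_scope: the request plausibly belongs with this sorter's support
        or task queue (assumed input, not a real field).
    route_context_ok: nothing route-critical is missing, unclear,
        contradictory, or permission-sensitive (assumed input).
    """
    if not in_scope:
        # Does not belong in this narrow sorter: keep it visible and reroute.
        return Action.KEEP_OUT
    if not route_context_ok:
        # Might belong here, but the sorter should hesitate before handoff.
        return Action.SLOW_DOWN
    # Clearly belongs with the regular handler, who can ask follow-up questions.
    return Action.PASS_ALONG
```

Note that route_context_ok is about the routing decision only. Details the regular handler would normally ask for in their own lane are not a reason to slow a request down.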
Plain English: “Metadata” just means extra routing information attached to a request. “Trust it enough” means: would this handling make the workflow seem sane enough to discuss with another person before asking a company to share private or internal examples?
Review assumption: judge each case using only what is shown on the screen. If you think the person making the request has not given enough information yet, that is a valid reason to think the system should slow down, ask follow-up questions, or avoid acting too confidently. Do not assume hidden context, but do assume the regular handler can ask normal follow-up questions inside their own area.
Frame check: if your instinct is “my phone assistant should just do this,” that is usually the wrong frame for this page. Judge whether this narrow sorter should pass the request into its intended queue, slow it down, or keep it out entirely.
Important: some public examples are intentionally tricky stand-ins. A few may look superficially similar to “normal” requests even when the packet is using them as caution or out-of-scope examples. If you think the recommendation is wrong, say so. Disagreement is useful feedback here.
Do not over-focus on product realism: you are not being asked whether your own bank, app, or assistant literally supports every feature in the example. Just judge whether the next-step decision seems right for a first-pass sorter.
Shortest version: Each case is just four things: the request, what the system wants to do, why it wants to do that, and your judgment about whether that seems right.

Common reviewer questions

Do I assume hidden context?

No. Judge what is shown on the screen. If the case says some background or supporting information is missing, broken, or unclear, treat that as part of the scenario.

Can a “pass along” case still need follow-up questions?

Yes. “Pass along” means the regular handler can own the next step. It does not mean they already have every detail they will eventually need.

What if a case feels mislabeled, unrealistic, or too tricky?

Say so. That is good feedback. You are not being graded on matching the recommendation.

Why do some “keep out” cases still look a bit like tasks?

Because some public stand-ins come from broader assistant datasets. A few are intentionally tricky so the review can test whether the sorter keeps broader traffic out instead of forcing it into a narrow lane.

Reviewer

Different people can use the same page. Each saved review becomes its own file.

Draft auto-saves in this browser while you work.
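
As a rough illustration of "each saved review becomes its own file," the sketch below writes one JSON file per saved review. It is an assumption-laden example rather than a description of what dashboard/serve.py actually does: the function name save_review, the file-name pattern, and the JSON fields are all hypothetical.

```python
import json
import re
from datetime import datetime, timezone
from pathlib import Path


def save_review(output_dir: Path, reviewer: str, answers: dict) -> Path:
    """Write one review to its own JSON file and return the path.

    The layout here (review_<name>_<timestamp>.json) is a hypothetical
    choice for illustration, not the app's documented format.
    """
    # Reduce the reviewer name to a filesystem-safe slug.
    slug = re.sub(r"[^a-z0-9]+", "-", reviewer.lower()).strip("-") or "anonymous"
    # A UTC timestamp with microseconds keeps repeated saves distinct.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    output_dir.mkdir(parents=True, exist_ok=True)
    path = output_dir / f"review_{slug}_{stamp}.json"
    payload = {"reviewer": reviewer, "saved_at": stamp, "answers": answers}
    # Mode "x" refuses to overwrite an existing file.
    with path.open("x", encoding="utf-8") as handle:
        json.dump(payload, handle, indent=2)
    return path
```

Calling save_review(Path("output"), "Reviewer A", {"case-1": "reasonable"}) would create a new file under output/ each time, so two people using the same page never write to the same file.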

Progress


Case


1. The request

Assume this request text is all the system knows, unless the page explicitly says some supporting context is missing or unclear.

2. The system would do this next

This is the recommendation you are judging.

Remember: “pass along” does not mean “already fully solvable.” It means “this clearly belongs with the regular handler, who can take it from here.”

3. Why the system is treating this example that way

4. What happens after that

5. Your judgment

Forget the internal labels unless you want to open them. Just judge whether this seems like the right next move. You are not expected to agree with the recommendation if it seems wrong to you.

Quick reminder: “pass along” means the regular handler can own the next step inside its own queue. This page is not grading a phone assistant or smart-home agent.

Would this be a reasonable next step?

Think: does this seem careful, understandable, and not overconfident? A request can still be a pass-along case even if the regular handler will need routine follow-up questions. Slow it down when you think the sorter itself should hesitate before handing it off.

Does this example make the workflow seem sane enough to keep evaluating before asking for proprietary data?

This is a light trust question. It does not mean the system is finished or ready for real use. It just asks whether this example makes the workflow seem sane rather than sloppy. “Maybe” can also mean “too cautious,” “still unclear,” or “I need more context.”

Optional technical/source details

Source details

Action details