Where does data come from?

AgentLoop is deliberately agnostic about which turns get logged for review — that's a product decision you make. The SDK's log_turn() accepts an arbitrary signals dict; whatever reaches it ends up in the review queue. Here are the patterns most teams combine.
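
Every pattern below reduces to the same call. As a minimal illustration (the signal keys here are arbitrary examples, not a fixed schema, and `loop` is your configured AgentLoop client, as in the snippets that follow):

```python
# Any keys you put in `signals` surface alongside the turn in the review queue.
loop.log_turn(
    question="How do I reset my password?",
    agent_response="Go to Settings > Security and choose 'Reset password'.",
    signals={"source": "example"},
)
```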

1. Explicit user feedback

When: highest signal-to-noise. Always worth capturing.
Trade-off: catches only 1–5% of bad answers — most users don't click.
```python
# Only log if the user took the time to say the answer was bad.
loop.log_turn_if(
    user_clicked_thumbs_down,
    question=q,
    agent_response=a,
    user_id=user.id,
    signals={"thumbs_down": True},
)
```

2. Low model confidence

When: the agent flags its own uncertainty — free data, no user action needed.
Trade-off: confident-wrong answers are the worst kind, and this misses them.
```python
# If you can score confidence (logprobs, self-rating, etc.)
response = call_llm(q)
if response.confidence < 0.6:
    loop.log_turn(
        question=q,
        agent_response=response.text,
        signals={"confidence": response.confidence},
    )
```

3. Domain heuristics in your own code

When: best precision. You know your domain — AgentLoop doesn't.
Trade-off: requires real thought about what failure looks like.
```python
# Examples — pick the ones that match your domain:

# User asked twice in a row about the same thing
if is_rephrase_of_previous(q, history):
    loop.log_turn(question=q, agent_response=a,
                  signals={"rephrase": True})

# Agent punted ("contact support")
if "contact support" in a.lower() or "i don't know" in a.lower():
    loop.log_turn(question=q, agent_response=a,
                  signals={"agent_punted": True})

# Agent made a factual claim worth auditing (price, quote, promise)
if mentions_price_or_quote(a):
    loop.log_turn(question=q, agent_response=a,
                  signals={"factual_claim": True})
```

4. Downstream outcome signals

When: the highest-quality signal — did the agent's answer actually work?
Trade-off: requires application telemetry to be piped back. Usually a later-stage addition.
```python
# Support agent said "try restarting"; user came back with same issue.
if user_returned_with_same_issue(session_id, within_hours=2):
    loop.log_turn(
        question=original_question,
        agent_response=original_answer,
        session_id=session_id,
        signals={"recurrence": True, "gap_hours": 1.5},
    )

# Coding agent suggested a change that got reverted fast.
if suggestion_reverted(suggestion_id, within_minutes=10):
    loop.log_turn(
        question=q,
        agent_response=a,
        signals={"reverted_quickly": True},
    )
```

5. Random sampling (or: log everything)

When: early-stage agents, regulated domains, or when you just want full visibility.
Trade-off: the queue floods, reviewers fall behind, and the TTL starts dropping unreviewed turns.
```python
import random

# Sample 5% of all turns unconditionally.
if random.random() < 0.05:
    loop.log_turn(question=q, agent_response=a,
                  signals={"sample": True})

# Or log every single one — only safe for low-volume agents
# (< a few hundred turns/day). Unreviewed turns TTL out at 30 days.
loop.log_turn(question=q, agent_response=a)
```

A mature AgentLoop integration usually combines 2–3 of these.
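
For illustration only, here is a sketch of what combining patterns 1, 3, and 5 might look like in one place: explicit thumbs-down feedback, a punt heuristic, and a 5% random sample. The helper name `maybe_log_turn`, its arguments, and the signal keys are ours, not part of the SDK:

```python
import random

SAMPLE_RATE = 0.05  # unconditional sample on top of the targeted signals

def maybe_log_turn(loop, q, a, *, thumbs_down=False):
    """Collect whichever signals fired; log the turn once if any did."""
    signals = {}
    if thumbs_down:
        signals["thumbs_down"] = True        # pattern 1: explicit feedback
    if "contact support" in a.lower() or "i don't know" in a.lower():
        signals["agent_punted"] = True       # pattern 3: domain heuristic
    if random.random() < SAMPLE_RATE:
        signals["sample"] = True             # pattern 5: random sampling
    if signals:
        loop.log_turn(question=q, agent_response=a, signals=signals)
```

Logging once with the union of signals keeps duplicate entries out of the queue when several patterns fire on the same turn.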

Feedback widget

Every response can include a feedback_url. Embed it as a link or iframe in your UI — when the user clicks it, they see a minimal form. The submission arrives as an annotation in the dashboard, no login required (the URL is HMAC-signed with the annotation context).
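
A minimal sketch of the embedding, assuming log_turn() returns the logged turn and exposes its feedback_url; check the actual return shape in your SDK version. The surrounding HTML is ordinary templating:

```python
# Assumption: log_turn() returns the logged turn with a `feedback_url` attribute.
turn = loop.log_turn(question=q, agent_response=a, signals={"sample": True})

# Embed the signed URL as a plain link; an <iframe> pointing at it works the same way.
html = (
    f"<p>{a}</p>"
    f'<a href="{turn.feedback_url}">Was this answer wrong? Tell us.</a>'
)
```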

Tip

The feedback URL is signed identically by the Python and JavaScript SDKs. A URL signed in one language validates in the other — feedback flows can cross language boundaries safely.