Where does data come from?

AgentLoop is deliberately agnostic about which turns get logged for review — that's a product decision you make. The SDK's log_turn() accepts an arbitrary signals dict; whatever reaches it ends up in the review queue. Here are the patterns most teams combine.
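
Every pattern below reduces to the same call. As a minimal illustration (the signal keys here are arbitrary examples, not a fixed schema, and `loop` is your configured AgentLoop client, as in the snippets that follow):

```python
# Any keys you put in `signals` surface alongside the turn in the review queue.
loop.log_turn(
    question="How do I reset my password?",
    agent_response="Go to Settings > Security and choose 'Reset password'.",
    signals={"source": "example"},
)
```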

1. Explicit user feedback

When: highest signal-to-noise. Always worth capturing.
Trade-off: catches only 1–5% of bad answers — most users don't click.
```python
# Only log if the user took the time to say the answer was bad.
loop.log_turn_if(
    user_clicked_thumbs_down,
    question=q,
    agent_response=a,
    user_id=user.id,
    signals={"thumbs_down": True},
)
```

2. Low model confidence

When: the agent flags its own uncertainty — free data, no user action needed.
Trade-off: confident-wrong answers are the worst kind, and this misses them.
```python
# If you can score confidence (logprobs, self-rating, etc.)
response = call_llm(q)
if response.confidence < 0.6:
    loop.log_turn(
        question=q,
        agent_response=response.text,
        signals={"confidence": response.confidence},
    )
```

3. Domain heuristics in your own code

When: best precision. You know your domain — AgentLoop doesn't.
Trade-off: requires real thought about what failure looks like.
```python
# Examples — pick the ones that match your domain:

# User asked twice in a row about the same thing
if is_rephrase_of_previous(q, history):
    loop.log_turn(question=q, agent_response=a,
                  signals={"rephrase": True})

# Agent punted ("contact support")
if "contact support" in a.lower() or "i don't know" in a.lower():
    loop.log_turn(question=q, agent_response=a,
                  signals={"agent_punted": True})

# Agent made a factual claim worth auditing (price, quote, promise)
if mentions_price_or_quote(a):
    loop.log_turn(question=q, agent_response=a,
                  signals={"factual_claim": True})
```

4. Downstream outcome signals

When: the highest-quality signal — did the agent's answer actually work?
Trade-off: requires application telemetry to be piped back. Usually a later-stage addition.
```python
# Support agent said "try restarting"; user came back with same issue.
if user_returned_with_same_issue(session_id, within_hours=2):
    loop.log_turn(
        question=original_question,
        agent_response=original_answer,
        session_id=session_id,
        signals={"recurrence": True, "gap_hours": 1.5},
    )

# Coding agent suggested a change that got reverted fast.
if suggestion_reverted(suggestion_id, within_minutes=10):
    loop.log_turn(
        question=q,
        agent_response=a,
        signals={"reverted_quickly": True},
    )
```

5. Random sampling (or: log everything)

When: early-stage agents, regulated domains, or when you just want full visibility.
Trade-off: the queue floods, reviewers fall behind, and the TTL starts dropping unreviewed turns.
```python
import random

# Sample 5% of all turns unconditionally.
if random.random() < 0.05:
    loop.log_turn(question=q, agent_response=a,
                  signals={"sample": True})

# Or log every single one — only safe for low-volume agents
# (< a few hundred turns/day). Unreviewed turns TTL out at 30 days.
loop.log_turn(question=q, agent_response=a)
```

A mature AgentLoop integration usually combines 2–3 of these.
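
For illustration only, here is a sketch of what combining patterns 1, 3, and 5 might look like in one place: explicit thumbs-down feedback, a punt heuristic, and a 5% random sample. The helper name `maybe_log_turn`, its arguments, and the signal keys are ours, not part of the SDK:

```python
import random

SAMPLE_RATE = 0.05  # unconditional sample on top of the targeted signals

def maybe_log_turn(loop, q, a, *, thumbs_down=False):
    """Collect whichever signals fired; log the turn once if any did."""
    signals = {}
    if thumbs_down:
        signals["thumbs_down"] = True        # pattern 1: explicit feedback
    if "contact support" in a.lower() or "i don't know" in a.lower():
        signals["agent_punted"] = True       # pattern 3: domain heuristic
    if random.random() < SAMPLE_RATE:
        signals["sample"] = True             # pattern 5: random sampling
    if signals:
        loop.log_turn(question=q, agent_response=a, signals=signals)
```

Logging once with the union of signals keeps duplicate entries out of the queue when several patterns fire on the same turn.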

Feedback widget

Every response can include a feedback_url. Embed it as a link or iframe in your UI — when the user clicks it, they see a minimal form. The submission arrives as an annotation in the dashboard, no login required (the URL is HMAC-signed with the annotation context).
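
A minimal sketch of the embedding, assuming log_turn() returns the logged turn and exposes its feedback_url; check the actual return shape in your SDK version. The surrounding HTML is ordinary templating:

```python
# Assumption: log_turn() returns the logged turn with a `feedback_url` attribute.
turn = loop.log_turn(question=q, agent_response=a, signals={"sample": True})

# Embed the signed URL as a plain link; an <iframe> pointing at it works the same way.
html = (
    f"<p>{a}</p>"
    f'<a href="{turn.feedback_url}">Was this answer wrong? Tell us.</a>'
)
```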

Tip

The feedback URL is signed identically by the Python and JavaScript SDKs. A URL signed in one language validates in the other — feedback flows can cross language boundaries safely.