Truth over validation.
Why we built Blocksyc.
AI assistants can be surprisingly responsive to how you phrase a question. Ask one whether a restaurant will be “busy” on Saturday and you might get one assessment. Ask whether it'll be “too packed” and you can get a different one, even though it's the same restaurant on the same day. When this happens, the model is responding to the loaded word in your prompt, not the question underneath. It isn't lying. It's reading what you want to hear and giving it back to you.
This is sycophancy: when a model leans toward telling you what you want to hear. It's a known issue, studied openly by Anthropic, OpenAI, and other labs. In April 2025, a major model update was even rolled back because users said it had started to feel manipulative.
Why it matters
Sycophancy isn't just annoying. It can quietly skew the work AI is actually good at. A model that mirrors your framing is less likely to flag the assumption you're missing when you're trying to make a decision. Asking “is my idea any good?” tends to get warmer feedback than asking “what could go wrong with my idea?”, even though the underlying question is the same. Loaded prompts can produce slanted research summaries. Asking a model “what do you think of this code?” often yields softer feedback than asking it to “find the issues.”
The cost usually isn't a wrong answer. It's a confidently delivered answer that's been shaped by what you implied you wanted, which is harder to catch than a flat error. Over time, that can shape which questions we trust AI to answer at all.
What Blocksyc does
Blocksyc routes your prompts through a system layer designed to reduce sycophancy in the underlying models. The layer asks the model to strip emotional framing before answering, lead with conclusions instead of burying them, push back when a premise is wrong, and skip the unsolicited validation (no “great question!”).
A separate evaluator runs alongside every response and rates how sycophantic the answer would have been without the filter. A shield badge surfaces the result on each reply. Click it to see the original response and a detailed score breakdown, so you can decide whether the difference matters for your use.
Limitations
No system can guarantee a perfectly honest model. Blocksyc's layer is designed to reduce sycophancy, not to eliminate it. The evaluator gives you a comparison and a rating, not a verdict. We refine the system layer and the rubric as we learn what they catch and what they miss. The goal is to surface a failure mode that's normally invisible and let you decide when it matters.
Sources
- Anthropic, Towards Understanding Sycophancy in Language Models (2023).
- OpenAI, Sycophancy in GPT-4o: what happened and what we're doing about it (April 2025).
