Whose slop is it anyway?
A mini series on the stakeholders of workslop and their incentives
Disclaimer: The opinions stated here are my own, not necessarily those of my employer.
You ask a colleague for a detailed analysis of a topic. To your surprise, they send you a document within the hour. The subject line reads "All done. Please take a look and share your feedback." It's three pages long, neatly formatted with section headers, data insights, and a clear recommendation. As you read through the document, you realize that each section makes less sense than the last. The data was pulled from the web instead of internal sources. Adjectives like "robust", "scalable", and "synergistic" are thrown around completely out of context. You stare at the final recommendation, which lacks the very nuance you went to your colleague for in the first place. How would this make you feel about working with them again? Apparently, this situation is less hypothetical than you might imagine. (Although it is completely hypothetical in my case!)
A recent study found that almost 40% of surveyed employees had received "workslop" from their peers, which in turn reduced productivity and eroded trust. The study defined "workslop" as "AI generated work content that masquerades as good work, but lacks the substance to meaningfully advance a given task." I love this definition, except for its specificity around AI-generated content. The workplace has had meaningless content masquerading as good work since time immemorial. And while GenAI is highly effective at creating workslop, it is not the root cause. To really explain the origin of workslop, we need to look at the incentives of three different groups: (1) the employees who use GenAI, (2) the organizations that mandate AI usage, and (3) the people who build GenAI products. In this essay, we dig into how product builders (3) became part of the problem and how they can start to clean up their mess.
Building at the current level of quality
AGI might arrive in 2027 (the popular timeline) or in 2035 (the I, Robot timeline) or much later, but it isn't here yet. Product builders need to make their peace with this fact and aim to build useful products at the current level of quality. A multi-step agent that performs each step at 85% accuracy might sound like a great idea. But if a user journey takes four sequential steps to complete a task, you have built something that works barely half the time (0.85^4 ≈ 52%)! As Karpathy recently mentioned in a podcast, building software is a "march of the nines", where sweat and blood go into moving reliability from 90% to 99% to 99.9% and so on. Despite what the hot takes on LinkedIn and Twitter might suggest, agents have barely reached the first nine. And the march will not happen without honest introspection about the current level of quality.
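To make the compounding concrete, here is a minimal sketch of the arithmetic above; the function name is my own, for illustration only:

```python
# Illustrative only: how per-step accuracy compounds across a
# multi-step agent workflow (assumes steps fail independently).

def end_to_end_success(step_accuracy: float, num_steps: int) -> float:
    """Probability that every sequential step succeeds."""
    return step_accuracy ** num_steps

# A four-step agent at 85% per-step accuracy:
print(end_to_end_success(0.85, 4))   # ≈ 0.522, barely half the time

# The "march of the nines": per-step accuracy needed for a
# four-step journey to hit 99% end-to-end reliability.
print(0.99 ** (1 / 4))               # ≈ 0.9975 per step
```

The second number is the uncomfortable part: even modest end-to-end reliability targets demand per-step accuracy far beyond what most agents deliver today.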
Co-creation before automation
Honest introspection points to the first flaw we are now seeing in the agent paradigm.
Everyone loves throwing around analogies to explain how good agents are, from "junior engineer" to "PhD-level intelligence". But even a high school intern knows when to ask questions before proceeding; most agents don't. In the tradeoff between seeking confirmation and seamless automation, most agents err on the wrong side. "Forgiveness over permission" is a great philosophy for accelerating the process of building products, but it is counterproductive when baked into your product's user journeys. No one doing serious work wants a large chunk of text filled with garbage, followed by a sycophantic "Oh yes, you're right!" in reply to every mistake they point out.
Agentic products need to encourage co-creation with humans in their workflows instead of promising magical automation. They should ask humans for input when information is insufficient, and share drafts or early plans to gather feedback before producing large outputs. They need to ask the right questions before proceeding with important tasks. A recent study by OpenAI found that hallucinations are largely explained by models being rewarded for always giving an answer instead of admitting uncertainty or asking clarifying questions. Agents built on this misaligned incentive go on to churn out unusable slop.
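The behavior described above can be sketched as a confidence-gated step: before producing output, the agent checks whether it actually has what it needs, and asks instead of guessing. This is a hypothetical illustration; the names, types, and thresholds are mine, not any real product's API.

```python
# Hypothetical sketch: an agent step that surfaces missing inputs
# instead of guessing. All names here are illustrative.

from dataclasses import dataclass

@dataclass
class StepResult:
    needs_clarification: bool
    content: str  # either a draft, or a question for the user

def agent_step(task: str, known_facts: set,
               required_facts: set) -> StepResult:
    missing = required_facts - known_facts
    if missing:
        # Co-creation: expose the gap rather than fill it with slop.
        question = (f"Before I proceed with '{task}', could you clarify: "
                    + ", ".join(sorted(missing)) + "?")
        return StepResult(needs_clarification=True, content=question)
    # Only produce output once the inputs are actually there.
    return StepResult(needs_clarification=False,
                      content=f"Draft for '{task}' based on provided inputs")
```

The point is not the trivial set arithmetic, but the contract: the step's return type makes "I need to ask" a first-class outcome rather than something the model is incentivized to paper over.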
Better interfaces for verification and revision
Coding agents don't just keep appending to their previous output. They offer a changelog with every revision and checkpoints to revert your code to, which are standard version control capabilities. It's surprising that almost every other agent that creates artifacts like docs, sheets, or other long-form output still doesn't give users easy ways to manage that output.
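The missing capability is not exotic. Here is a minimal sketch of checkpointing and changelogs for a long-form artifact, mirroring what coding agents already offer; the class and method names are hypothetical, not any real product's API.

```python
# Illustrative sketch: checkpoint/revert plus a diff-based changelog
# for a text artifact, using only the standard library.

import difflib

class Artifact:
    def __init__(self, text: str = ""):
        self._history = [text]  # checkpoint 0 is the initial text

    @property
    def text(self) -> str:
        return self._history[-1]

    def revise(self, new_text: str) -> str:
        """Record a new checkpoint and return a unified-diff changelog."""
        diff = "\n".join(difflib.unified_diff(
            self.text.splitlines(), new_text.splitlines(),
            fromfile=f"checkpoint {len(self._history) - 1}",
            tofile=f"checkpoint {len(self._history)}", lineterm=""))
        self._history.append(new_text)
        return diff

    def revert(self, checkpoint: int) -> None:
        """Restore the artifact to an earlier checkpoint."""
        self._history = self._history[: checkpoint + 1]
```

A user-facing version would of course need richer diffs and persistence, but the core mechanic fits in thirty lines, which makes its absence from most artifact-producing agents all the more striking.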
We are already at the absolute limits of the chat user interface; instead of squeezing it further to accommodate more capabilities, product builders need to rethink the canvas from the user's perspective. I'm not sure exactly what the interface should look like, but the agent needs to walk me through its work, seek feedback, and make precise changes. The whole activity should feel less like an annoying back-and-forth and more like a productive jam session.