As of 2024, AI-based systems are already redefining the fabric of our society... specifically, our careers. Crowd work and data annotation, once confined to “niche” status, have now permeated the mainstream. It’s not a far-fetched idea that one day we will all be data workers of some kind.
Recently, we held a Q&A session with Dr. Mark Whiting, one of our company advisors, for his thoughts on this evolving crowd work paradigm.
Dr. Whiting is a Senior Computational Social Scientist at the CSSLab at the University of Pennsylvania, and his expertise lies in building systems to study how people behave and coordinate at scale. With numerous research papers under his belt, he is a distinguished figure in this field and a source of knowledge for many.
Together, we delved deep into an important conversation around challenges in the current crowd work ecosystem, worker incentive design, and the future of crowd work. Enjoy!
There is a growing sentiment on the internet that paid crowdsourcing platforms like MTurk are suffering from low-quality work and unfair rejections. Why would you say this is happening?
There have been changes in quality for sure, but I don't necessarily think the starting point was great to begin with.
The situation of crowd work has always been complicated. Part of why it has been successful is that it isn't regulated like traditional labor. As a result, it can be cheaper than traditional labor, but a consequence of being cheaper is that you can't manage certain aspects of it, and you have to find ways to ensure quality that are external to the platform.
Different platforms have tackled that issue in different ways. MTurk has things like the Masters qualification, which is one way they've tried to deal with low-quality work. Some crowd workers have built communities and browser plugins that help them understand which work is worth doing, which requesters are providing what kinds of work, and so on.
Additionally, on the requester side, people have implemented different systems to ensure that workers are doing what they need to do and that they're paid in a cost-effective way.
These diverging motives and interests have created friction in the ecosystem, causing the quality of work to deteriorate in some scenarios. Crowd work is an industry rife with complexity around the socio-dynamics of this kind of labor, and that is what has caused challenges in quality and in the overall experience for both requesters and workers.
What key challenges would you say are arising as a by-product of the current crowd work system?
There are two major challenges that come to mind:
One is that crowd workers are using more advanced scripts and techniques to get through their work quickly, compromising on quality in the process.
Scripts that help crowd workers get through certain tasks have been around for many years, sometimes in ways that are genuinely good. If workers have to answer the same kinds of survey questions frequently, a script that helps them answer those questions more consistently could be valuable for them and for requesters.
But sometimes these hacks are used in not-so-great ways: if workers are just using some content-generation tool, the responses may no longer be high quality.
Secondly, the type of labor has changed. It used to be more diverse. Now more and more of the traditional jobs are being done by automated systems, like large machine learning systems, and more and more of the labor on these platforms is just producing training data. There are many of these annotation platforms now.
This shift leans on the fact that we need to produce a massive amount of training data to maintain what we might call the modern machine learning complex. It leads to a situation where the work is less interesting and less fun.
How would partnering with a platform that champions a communicative and worker-centric approach benefit requesters?
One benefit in particular is that data workers understand the ins and outs of the market in a way that the market owner does not.
Part of the valuable secrets you keep is knowing how to get workers to do a certain task. And I think workers have great insight into that, because they're the ones doing the task and they have micro-feedback like “I wish you moved this button a little bit, then I wouldn't have to move my mouse as far” or “I wish you made a keyboard shortcut for this.”
Platform owners don't have visibility into that. It's complex experiential learning, right? It's not something you can just see from the outside; it's something you really need to talk to workers to understand. In that regard, there's a massive benefit to embracing community at multiple levels and making communication easy.
Beyond just embracing community, I think having a spirit of changing in the face of updated knowledge about what's working is really important. So is building a flow of trust that conveys to everyone involved, workers and requesters alike, that you're working hard to embrace those changes and constantly evolve with time.
You have mentioned how “crowdsourcing platforms have prioritized requesters, leading to a fragmented worker pool.” What would you say is a viable solution to this problem in terms of worker incentive systems?
I think being smart about the design of incentive systems is crucial because incentive systems are essentially the bread and butter of social fabric, right? What we feel comfortable doing in the world is heavily dependent on what we're incentivized to do.
One of the properties of the design of incentive systems is that they usually get in people's way. And as an incentive system emerges, some people will be dissatisfied by the constraints that it imposes. And I think part of having that healthy communication with your community is having a point of discourse that is fluid enough that when you introduce an incentive constraint that might frustrate some people, they can appreciate why it needs to be there.
Often incentives are somewhat disconnected from the experience on the ground. So a platform must think about how to make those incentives connect with workers better.
Do you think there is any merit to workers creating their own communities vs. addressing all their concerns with the platform directly? What do you think about integrating more of a community-centric approach to data work platforms?
In the crowd work environment, questions, and especially complaints, from workers often go to an email inbox that never gets read.
To your more specific question about whether integrating some community features can resolve that: I definitely think it's a step in the right direction. But I think there are two counterpoints.
One, creating and running communities is hard work. It's a full-time, labor-intensive activity in which you deal with complex political interplay between different groups within a community, and that quickly gets very hard. That's one of the reasons MTurk doesn't do it, or at least doesn't do it to the level people might wish they did: it's extremely hard to build and maintain communities.
Secondly, I think there is legitimate value in letting people talk to each other without being watched. So yes, having a very supportive platform is great. But there is also a place for off-platform communication, which I would actively encourage.
How else would you say that platforms can promote high quality work besides incentives?
Besides incentives, there is another component we haven't talked about yet, which is identity.
Creating an environment where work feels purposeful and fostering a community that really cares about doing excellent work is key.
I think using your internal reputation as an incentive component is a great idea: letting other people see your performance in the form of a rating, or even as a form of social reputation. Those are two separate things, right? One would be anonymous, where you get anonymously rated by your peers. The other is public, where your peers see your work and can see if you're not good at it.
Can you elaborate on how implementing an internal reputation system or leaderboard would incentivize workers?
I don't want to rest on one particular idea, but something like a leaderboard that shows who's performing well is one way to help people compete to be “seen”, but also to provide recognition. Like, you know, this person is a top worker, and this is an example of their task and their work output.
And you can do that very explicitly, right? You could write internal blog posts about who the best workers are and what makes them the best. Or you can do it implicitly where the top one percent of work is shared publicly no matter what. That way, people can see what the best work looks like and they can model it.
You can think about this like a systematic design that happens kind of without too much intervention, or you can think about it in terms of goal setting and culture building.
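To make the systematic version concrete, here is a minimal Python sketch of how a platform might compute an explicit leaderboard and an implicit top-slice showcase. It is purely our illustration, not a mechanism Dr. Whiting prescribed: the `Submission` record, the 0-to-1 quality score, and the one-percent threshold are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Submission:
    worker_id: str
    task_id: str
    quality: float  # hypothetical reviewer-assigned score in [0, 1]

def leaderboard(submissions, top_n=10):
    """Explicit recognition: rank workers by their mean quality score."""
    totals = {}
    for s in submissions:
        total, count = totals.get(s.worker_id, (0.0, 0))
        totals[s.worker_id] = (total + s.quality, count + 1)
    means = {w: total / count for w, (total, count) in totals.items()}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

def showcase(submissions, top_fraction=0.01):
    """Implicit recognition: surface the top slice of work, whoever made it."""
    ranked = sorted(submissions, key=lambda s: s.quality, reverse=True)
    cutoff = max(1, int(len(ranked) * top_fraction))
    return ranked[:cutoff]
```

The design difference mirrors the point above: `leaderboard` names people, while `showcase` only names work, so workers can model excellence without anyone being singled out.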
Would it be better to segment the leaderboards and communities by specialization, or just have one general labeler community?
I definitely think that there's an extent to which some of the things that make people excellent are general. And then there's an extent to which some of the things that make work excellent are very specific to the particular types of work being done. So you can have both a general worker community and sub-communities, although remember this will require more effort to maintain.
As long as you have more than one worker doing tasks for a requester or a particular job, you can immediately start creating shared quality signals. You can also have workers rate other workers to help the requester understand quality.
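As a rough illustration of what a shared quality signal could look like, here is a short Python sketch. It assumes redundant labeling, where several workers annotate the same task, and a simple majority vote; the function names and the voting scheme are our assumptions, not something specified in the interview.

```python
from collections import Counter

def consensus_signal(annotations):
    """annotations: list of (worker_id, label) pairs for one redundantly
    labeled task. Returns the majority label and the share of workers who
    agreed with it, a simple signal a requester can read at a glance."""
    counts = Counter(label for _, label in annotations)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(annotations)

def worker_vs_consensus(annotations, consensus_label):
    """Per-worker signal: did each worker match the consensus?"""
    return {worker: label == consensus_label for worker, label in annotations}
```

For example, `consensus_signal([("w1", "cat"), ("w2", "cat"), ("w3", "dog")])` returns `"cat"` with agreement 2/3, and the per-worker view flags `w3` for a closer look rather than an automatic penalty.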
How can companies develop a robust Quality Assurance (QA) process to assure requesters that they meet their standards? Also, can you briefly share your opinion on AI-assisted QA?
Broadly speaking, leveraging AI is great, although people tend to be a little worried that the AI will make bad choices and that they're being controlled by a system they have no insight into.
In general, providing feedback and support is almost always stronger than providing a direct rating shift or something like that, unless it's very subtle. The other thing that's good to think about in QA is peer review, which is an excellent technique, because workers can catch each other's weaknesses best. They can spot the cheating; they can spot other things going on better than an outside party can.
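One way a platform might operationalize anonymous peer review, sketched in Python under our own assumptions (a fixed number of reviewers per submission, and never assigning authors to review themselves):

```python
import random

def assign_peer_reviews(worker_ids, reviewers_per_submission=2, seed=None):
    """Assign each worker's latest submission to randomly chosen peers
    for anonymous review, excluding the submission's own author."""
    rng = random.Random(seed)
    assignments = {}
    for author in worker_ids:
        pool = [w for w in worker_ids if w != author]
        k = min(reviewers_per_submission, len(pool))
        assignments[author] = rng.sample(pool, k)
    return assignments
```

In keeping with the point above about feedback beating direct rating shifts, the reviews this produces would be routed back to workers as written comments rather than automatic score adjustments.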
Also, I think people are much more comfortable knowing that a person is going to review them, and that that review will be a more meaningful signal than an AI's. Even if the accuracy is the same, there's this notion that AIs are impenetrable, whereas a human is a bit more relatable.
How do you envision the future of crowd work? What types of companies are poised to succeed in this evolving space?
Well, one of the short-term challenges is that supporting worker interests is going to cost more, and many companies may be unwilling to accept that. As a total ecosystem, a system that cares about people is going to cost more than a system that doesn't. I believe the winners in this space will be the companies that recognize this gap. Based on my research, a worker-centric model is likely to lead to higher-quality work, a better society, and better livelihoods for everyone, workers and requesters alike.
Given my relationship to Pareto.AI, it's probably clear that I think you guys are approaching crowd work in a conscious and meaningful way. I think Pareto has the potential to unlock something about labor markets because of this focus; there's a huge amount of value left on the table in modern labor markets because of the lack of focus on the participants in the ecosystem.