We get a lot of requests for advice from other startup founders trying to use Mechanical Turk. Not all of these folks are great prospects for Houdini, which means we frequently end up giving out a lot of general advice about how to get things going on Mechanical Turk directly. I have attempted to pin down some of our most common advice here.
You’ll get the most mileage out of these tips if you have already spent a little bit of time using Mechanical Turk. Because Mechanical Turk can be used for so many different types of tasks – online research, data classification, transcription, translation, content moderation, etc. – there are a number of best practices that are very application-specific. But for this post I have tried to stick to the most general advice.
1. Make sure your job is appropriate for Mechanical Turk.
This may not seem like much of a tip, but we frequently find that the folks who approach us for help are trying to do something that Mechanical Turk was never meant to do. This generally comes from a poor understanding of the types of jobs that are suitable for Mechanical Turk. In a nutshell, Mechanical Turk is designed to let you perform manual but repetitive processes at scale. You will have the best outcomes when dealing with high volumes of quick, simple tasks.
Conversely, low volumes of one-off tasks (e.g. “Find 5 hotels within 3 blocks of Union Square with rooms available March 3rd”) are generally not good candidates for outsourcing to mTurk. Tasks that are complex or very open-ended should also be avoided, unless they can be broken down into simpler, more discrete tasks.
Good use cases for Mechanical Turk:
- Manually collecting data from a large volume of images or PDFs.
- Scraping specific pieces of information (e.g. prices, contact information) from a long list of websites.
- Classifying images, videos, tweets, etc. into discrete categories – e.g. indoors vs. outdoors, positive vs. negative.
- Researching information about a list of names or places (e.g. finding all of the LinkedIn pages or Twitter accounts for a list of individuals)
Tasks to avoid:
- Tasks that require special skills, knowledge or judgment – e.g. “Summarize the major points of U.S. foreign policy”
- Tasks that are interdependent, such as compiling lists – e.g. “Compile a list of 50 U.S. companies with women CEOs.” Compiling lists of things sequentially is problematic because it requires each worker to have knowledge of all of the prior responses.
- Tasks that don’t provide a clear process for completing the work – In general, you should be able to provide a fairly specific description of how the task should be completed in addition to the desired outcome. “Find Bill Gates’ home phone number” is a poor candidate for outsourcing to Mechanical Turk – if you can’t figure out how to complete the task, the average worker probably won’t either.
2. Run a small sample test before submitting a large volume of tasks.
Would you launch a high-volume website or application without testing it first? The same rules apply here. Frequently, the first iteration of your task setup will contain errors in the instructions or input fields that may not be immediately apparent. Taking the time to work out the bugs on a small test run will spare you the agony of having to throw out hundreds or thousands of results due to a small error in your task design. Running a test batch also gives you a chance to tweak your instructions and test out the pricing for your task (more on that later).
Once your test run goes live, search Mechanical Turk for your requester name (normally your first and last name) until you find your job listed. Review the task interface as it appears on mTurk and try completing a few HITs on your own to make sure that you can submit the tasks without any problems. Once you have had workers complete your test run, review the answers and check for any errors. If you find errors that appear to be common across multiple responses, consider providing additional details in your instructions. It’s also a good practice to include a comment field in your form, which gives workers an opportunity to leave feedback about anything that may be difficult or confusing.
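If you drive Mechanical Turk through its API, the same test-batch discipline applies. Below is a minimal sketch using boto3 (Amazon’s current Python SDK); the helper function, task details, and thresholds are hypothetical, and only the `create_hit` parameter names come from the boto3 MTurk API. The sandbox endpoint lets you try everything without spending real money.

```python
# Hypothetical helper (not from any SDK): build keyword arguments for a
# small test batch to pass to mturk_client.create_hit().

def build_test_hit_params(title, description, reward_usd):
    return {
        "Title": title,
        "Description": description,
        "Reward": f"{reward_usd:.2f}",       # Reward is a string in USD
        "MaxAssignments": 1,                 # one worker per HIT for a first pass
        "LifetimeInSeconds": 24 * 3600,      # keep the test batch live for a day
        "AssignmentDurationInSeconds": 600,  # 10 minutes to finish an assignment
        "Keywords": "test, data collection",
    }

# Example usage (requires AWS credentials; the sandbox endpoint is free):
# import boto3
# client = boto3.client(
#     "mturk",
#     region_name="us-east-1",
#     endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
# )
# for _ in range(20):  # a small test run, not thousands of HITs
#     client.create_hit(Question=question_xml, **build_test_hit_params(
#         "Tag one image", "Choose the best category for the image", 0.05))

params = build_test_hit_params("Tag one image", "Choose a category", 0.05)
```

Once the sandbox version behaves the way you expect, pointing the client at the production endpoint is the only change needed.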
3. Prepare worker instructions carefully.
One of the best investments of time and energy that you can make is ensuring that your worker instructions are clear and detailed. Vague or ambiguous instructions are a recipe for disaster. Treat your instructions like you would treat copy on your website or landing page – make sure they are clear, concise and optimized to maximize conversions (a “conversion” in this case being an accurate response). Give your instructions to a friend or co-worker who is unfamiliar with your task and ask them to point out anything that is unclear or ambiguous before submitting it to Mechanical Turk. Consider the first iteration of your instructions a rough draft and use your test run(s) to iterate and improve.
Unless the process for performing the tasks is completely self-evident, provide step by step instructions of any actions the worker should perform and list any important guidelines or criteria upon which answers will be judged. Try to use plain language (e.g. use words like “opinion” rather than “sentiment”). Provide examples of good/bad answers within the instructions – include pictures or diagrams where appropriate. Also anticipate potential edge cases and explain how they should be handled. Consider providing an external link pointing to more detailed instructions or further examples.
4. Don’t set HIT prices too low. Or too high.
One of the ways that new requesters frequently get into trouble is by setting unreasonably low prices for tasks. If you view Mechanical Turk as simply an opportunity to take advantage of the most desperately low-cost labor available, then you are doing it wrong. Resist the urge to price every task at one or two cents. Take a look at http://turkernation.com/showthread.php?3247-How-much-money-is-everybody-currently-averaging-an-hour to get a sense of the range of typical hourly rates. You can compute a rough hourly wage by estimating the time it takes to complete a given task. For example, if it takes 30 seconds to complete one HIT properly, then a price of $0.10 per HIT works out to a rate of $12/hour. Be realistic about the amount of time it takes to complete a single HIT – try completing a few while timing yourself to see how long it takes. Also expect to pay a higher hourly rate for writing tasks or anything that requires a higher degree of skill.
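The arithmetic above is simple enough to wrap in a couple of helpers (hypothetical names, not from any MTurk SDK) – one to see the hourly wage a given HIT price implies, and one to work backwards from a target rate:

```python
def implied_hourly_rate(price_per_hit, seconds_per_hit):
    """Dollars per hour a worker earns at a given HIT price and pace."""
    return price_per_hit * (3600 / seconds_per_hit)

def price_for_target_rate(hourly_rate, seconds_per_hit):
    """HIT price (in dollars) needed to pay a target hourly rate."""
    return hourly_rate * seconds_per_hit / 3600

# The example from the text: $0.10 per HIT at 30 seconds each
rate = implied_hourly_rate(0.10, 30)      # about $12/hour
price = price_for_target_rate(12.0, 30)   # about $0.10/HIT
```

Run your own timed attempts through `implied_hourly_rate` before settling on a price; a 30-second estimate that is really 90 seconds cuts the effective wage to a third.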
You can also try searching Mechanical Turk for tasks that are similar to what you are looking to do. If you need to have a bunch of images tagged, look for other image tagging tasks; if you need to collect contact names from a list of websites, look for other data collection tasks. If you can find tasks that are pretty similar, take their price and add 20-25%. Since Mechanical Turk is a marketplace, new requesters often need to pay a premium over existing requesters who are more familiar to workers. Workers also prefer tasks with steady volume, so if you are only submitting a small number of tasks at a time, you should expect to pay a higher rate than a requester with thousands of tasks available at any given time. On the flip side, don’t expect higher HIT prices to guarantee optimal work quality. Unless the task was under-priced initially, raising the HIT price is unlikely to improve the quality of the responses.
Once you have settled on a price for your test run, a simple way to gauge whether your task is priced appropriately is to keep an eye on how quickly your tasks get completed. If your tasks take a long time to complete, then you should consider raising your task price. However if your tasks are turned around quickly then raising your HIT price any further is not likely to provide any additional benefits.
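That gauge can be reduced to a crude heuristic (a sketch of the reasoning above, not an official recipe – the 2× threshold is an arbitrary choice): project how long the batch will take at its current pace and compare against your target.

```python
def pricing_suggestion(completed, posted, hours_elapsed, target_hours):
    """Crude signal: raise the price only if the batch is on pace to take
    far longer than hoped; otherwise leave it alone."""
    if posted == 0 or hours_elapsed <= 0:
        return "wait"
    if completed == 0:
        return "raise price"                    # nothing is moving at all
    projected_hours = hours_elapsed * posted / completed
    if projected_hours > 2 * target_hours:
        return "raise price"                    # far behind the target pace
    return "keep price"                         # fast enough; more money buys little

# 50 of 1,000 HITs done after 2 hours, hoping to finish in 10 hours:
slow = pricing_suggestion(completed=50, posted=1000, hours_elapsed=2, target_hours=10)
# 800 of 1,000 done after 2 hours:
fast = pricing_suggestion(completed=800, posted=1000, hours_elapsed=2, target_hours=10)
```

In the slow case the batch is projected to take 40 hours, so a raise is suggested; in the fast case it is comfortably ahead of schedule.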
5. Be prepared to deal with spam/errors.
One of the unfortunate realities of using Mechanical Turk – or any system that relies on human effort – is that people are not perfect. Just like in real life, you will undoubtedly find a large variation in the quality of individual workers. Various studies of worker performance on Mechanical Turk have shown average error rates ranging anywhere between 59% and 79% [1][2]. You may also have to deal with workers who intentionally provide bogus data or who complete tasks with a less than completely honest effort (“spammers”). Unfortunately, Mechanical Turk lacks a robust worker rating system, which leaves requesters with few ways of distinguishing the good workers from the bad.
A lot of what we spend our time on at Houdini is providing requesters with better ways to filter out poor workers and improve data quality. We track worker performance across a range of different task categories and apply machine learning to help us detect and filter out “spam” workers. You won’t be able to easily replicate the full range of techniques that we use inside Houdini (if you think you can, please email us at firstname.lastname@example.org), but there are a few tips you should know for dealing with workers:
a. Use multiple judgments where possible – One of the most basic quality assurance techniques we can use is to ask multiple workers to complete the same task and compare each of their results. The premise is simple: by asking two or more workers to complete the same task, we can increase our confidence that a given answer is correct. This technique is useful for questions that have discrete answers – image moderation, sentiment analysis and most basic classification tasks fit into this category. However writing tasks, transcription, research, translation or other open-ended tasks are not good candidates for this approach.
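The multiple-judgments idea can be sketched in a few lines (an illustrative aggregation, not a Houdini recipe; the agreement threshold is an arbitrary choice): collect each worker’s answer to the same discrete question, take the majority, and only accept it when agreement is high enough.

```python
from collections import Counter

def majority_answer(answers, min_agreement=2/3):
    """Return (answer, agreed) for a list of discrete worker responses.

    `agreed` is False when no answer reaches the agreement threshold,
    signalling that the item should be re-run or reviewed by hand."""
    if not answers:
        return None, False
    top, count = Counter(answers).most_common(1)[0]
    return top, (count / len(answers)) >= min_agreement

# Three workers classified the same tweet; two of three agree:
label, ok = majority_answer(["positive", "positive", "negative"])
```

Items where `agreed` comes back False are exactly the ones worth a second batch of judgments or a manual look.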
b. Take advantage of worker qualifications – Mechanical Turk also allows you to set certain criteria that workers must meet in order to work on your HITs. You can require workers to have previously completed a minimum number of HITs and to have achieved a minimum approval rate before being allowed to work on your HITs. Although this can be useful on the margins, keep in mind that this is a fairly blunt instrument – a worker who has completed several hundred HITs doing tasks like image moderation or tagging won’t necessarily be any good at writing, online research or transcription. However, if your tasks are at all dependent on English language skills (e.g. writing, transcription, sentiment analysis), you can frequently capture a noticeable improvement in average quality by limiting the worker location to the United States.
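Via the API, those built-in qualifications are expressed as the `QualificationRequirements` list that boto3’s `create_hit` accepts. The sketch below uses Amazon’s documented system qualification type IDs (approved-HIT count, approval percentage, locale); the helper name and the specific thresholds are illustrative assumptions, not recommendations.

```python
# Hypothetical helper: assemble QualificationRequirements for create_hit().
# The QualificationTypeId values are Amazon's built-in system IDs.

def basic_qualifications(min_hits=500, min_approval_pct=95, us_only=True):
    quals = [
        {   # Worker_NumberHITsApproved
            "QualificationTypeId": "00000000000000000040",
            "Comparator": "GreaterThanOrEqualTo",
            "IntegerValues": [min_hits],
        },
        {   # Worker_PercentAssignmentsApproved
            "QualificationTypeId": "000000000000000000L0",
            "Comparator": "GreaterThanOrEqualTo",
            "IntegerValues": [min_approval_pct],
        },
    ]
    if us_only:
        quals.append({  # Worker_Locale
            "QualificationTypeId": "00000000000000000071",
            "Comparator": "EqualTo",
            "LocaleValues": [{"Country": "US"}],
        })
    return quals

reqs = basic_qualifications()
# client.create_hit(QualificationRequirements=reqs, ...)
```

Remember the blunt-instrument caveat above: these filters screen out the worst offenders but say nothing about skill at your particular task.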
c. Be very careful about rejecting work – Although Mechanical Turk allows requesters to reject any individual work assignment, this power should be used very sparingly. The best workers tend to guard their approval reputation zealously and will avoid requesters known to frequently reject assignments. Try to reserve rejections only for true spammers.
Incorporating all of these steps won’t guarantee flawless work, but they should help you maximize your chances of getting decent results from Mechanical Turk. What other tips do you have for getting quality results out of Mechanical Turk?