Good for AI

Artificial Intelligence is the biggest threat to mankind, right? Even if robots aren’t taking over the planet by force, the yarn goes, computers will surely push us all into unemployment in the next decade or so. Right?

Let’s meet someone who can give us a slightly different perspective.

This is Joel, standing in front of his house, a few kilometers outside Gulu, Uganda, where he lives with his 14 brothers and sisters.

Joel works for Zillow, the leading online real estate marketplace in the US, with $1.1 billion in revenue in 2017. His neighbors work for other big-name US companies such as Google, Microsoft, Walmart, Marriott, Salesforce, Yahoo, TripAdvisor, Box, Getty Images, eBay, and Deloitte. Not kidding.

And this is what Joel sees on his screen when he gets into the office.

Joel works for Samasource, one of a growing number of companies that provide AI organizations with the thing they need most: data. Training data, to be precise. They need it because, despite all the buzz, self-driving cars still need someone to teach them how to drive. In this example, a self-driving car receives a video stream, but someone still has to tell it which lines in the video stills represent a light pole and which represent road markings. An algorithm needs enough of these labelled training images before it can make that distinction on its own for new images and video coming in. You want the car to hit the road rather than the light pole, after all.

So that’s where the businesses providing training data and data enrichment come in.

Samasource is one of many organizations and platforms that let AI companies crowdsource the curation and enrichment of training data. The idea has been around for a while, and one of the best-known pioneers is Amazon with its Mechanical Turk service, where companies post simple microtasks (annotating pictures, sentiment analysis, content moderation, etc.) and anyone with a stable internet connection can make some money by completing them. Companies pay per task, and Amazon takes a cut for the platform before paying the workers by the task.

This works reasonably well thanks to Amazon's scale, but the model leaves room for improvement. One big issue with Mechanical Turk is that it casts a very wide net and provides barely any training for the workers it pairs with tasks, so quality is not guaranteed. Amazon does offer options in which each task is presented to at least two workers and the result is only accepted if they give the same answer, but that is still not a managed, specialized workforce.
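To make the "at least two workers must agree" rule concrete, here is a minimal sketch in Python, assuming each task's answers arrive as a list of label strings from different workers. The function `consensus_label` and its `min_workers` parameter are hypothetical illustrations, not Mechanical Turk's actual API.

```python
from typing import Optional, Sequence


def consensus_label(answers: Sequence[str], min_workers: int = 2) -> Optional[str]:
    """Return the agreed label, or None if the task needs another look.

    Hypothetical helper: a result counts as valid only when at least
    `min_workers` workers answered and all of them gave the same answer.
    """
    if len(answers) < min_workers:
        return None  # not enough independent answers yet
    first = answers[0]
    if all(answer == first for answer in answers):
        return first  # unanimous: accept the label
    return None  # disagreement: discard or route to another worker


# Example: labels for line segments in a video still
tasks = {
    "segment-1": ["light pole", "light pole"],
    "segment-2": ["road marking", "light pole"],
    "segment-3": ["road marking"],
}
for task_id, answers in tasks.items():
    print(task_id, consensus_label(answers))
```

Requiring unanimity is the strictest version of this check; a majority vote across three or more workers is a common, more forgiving variant. Either way, the rule only catches disagreement. It cannot tell whether two untrained workers agreed on the wrong answer, which is why training matters so much.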

This is the gap that companies like Appen, Gengo and Mighty AI try to fill by providing upfront training and strict quality guarantees for specific task sets. The picture below illustrates a use case from birds.ai, a company that uses drones to inspect renewable energy facilities. As their business grew, they needed to scale up their (then) manual labelling, but didn't have nearly enough data to replace it with a reliable predictive model.

For the labelling to be accurate and actionable, the workers reviewing the images need to be subject-matter experts on the types of damage, which requires a substantial amount of training. These companies also typically pay an hourly wage rather than a per-task rate, so there's less pressure on quantity and more room for quality.

A second important ingredient in the business model of organizations like Samasource and CloudFactory is that they pay a lot more attention to their workforce. They call it impact sourcing rather than crowdsourcing, and they go to great lengths to make a difference in the communities they work with. They organize basic and digital literacy training, address the childcare challenges typical of the groups they recruit from, and provide career coaching for workers who want to grow beyond annotation work. Rather than just paying cash, they approach the work from a life-improvement and community-welfare perspective. Samasource has sales offices in San Francisco, New York and Paris, but the work happens in Uganda, Kenya and India. CloudFactory has workers in Kenya and Nepal.

Clearly, AI isn't just taking away jobs. Labelling images may not sound like a super exciting job, but given the right context and support, these companies are proving that it can really elevate communities. An increasing number of companies advertise posh "Data for Good" and "AI for Good" initiatives as part of their corporate social responsibility programs, but I believe the admirable business model of companies like Samasource and CloudFactory ironically turns this around, to "Good for AI." Because, really, not having any training data to begin with would be truly "Bad for AI."

Companies like the ones mentioned here are creating jobs around the world, and we are too. If you’d like to join the ranks of the AI revolution, check out our Careers page at www.InterSystems.com/Careers.


Benjamin DeBoe

Benjamin is a product manager in the Data Platforms group at InterSystems, looking after the areas of scalability and analytics. He joined InterSystems in 2010 as part of the iKnow acquisition and has worked with various database technologies, mostly in the areas of data warehousing, natural language processing and anything analytics.
