18. Content moderation as a machine learning competition
Every platform with user-generated content needs at least some automated content moderation.
Problem
Any platform that allows users to post text, images, videos, or audio will quickly be overrun by spam, inappropriate content, harassment, and even illegal content.
Human content moderation is time-consuming, and even small websites struggle to keep up with the volume of uploads. Machine learning models are often used to pre-filter content that is obviously inappropriate or obviously fine, leaving humans to review the harder cases.
Solution
Run content moderation as an ongoing machine learning competition: public datasets are provided for training, and hidden datasets are used to score submitted models.
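Scoring on a hidden set could look like the minimal sketch below, assuming a classification setup where each example has a known label; the banned-word model, the labels, and the examples are purely illustrative:

```python
# Hypothetical sketch: competitors train on a public set, but leaderboard
# positions come from a hidden labelled set they never see.

def score_model(model, hidden_set):
    """Fraction of hidden examples the model labels correctly."""
    correct = sum(1 for text, label in hidden_set if model(text) == label)
    return correct / len(hidden_set)

# Toy model: flags anything containing a banned word (illustrative only).
banned = {"spam", "scam"}
toy_model = lambda text: "flag" if any(w in text.split() for w in banned) else "ok"

hidden_set = [
    ("buy this spam now", "flag"),
    ("hello world", "ok"),
    ("great scam offer", "flag"),
    ("nice photo", "ok"),
]
print(score_model(toy_model, hidden_set))  # 1.0
```

Because competitors never see the hidden labels, they cannot overfit the leaderboard the way they could with a purely public benchmark.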
Combining many models into an ensemble often outperforms even the best individual model in the mix, much as with humans: if you want to estimate how many marbles are in a jar, ask a hundred people and average their guesses.
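A minimal sketch of that averaging, assuming each submitted model maps an item to a risk score between 0 and 1; the three toy models and their biases are illustrative, not real submissions:

```python
# Hypothetical sketch: averaging the risk scores of several independent
# models with different biases cancels out their individual errors.

def ensemble_score(item, models):
    """Average the scores that each submitted model assigns to an item."""
    scores = [model(item) for model in models]
    return sum(scores) / len(scores)

# Three toy "models" with different biases (illustrative only).
strict = lambda item: min(1.0, item["raw_risk"] * 1.3)   # over-flags
lenient = lambda item: item["raw_risk"] * 0.7            # under-flags
neutral = lambda item: item["raw_risk"]                  # unbiased

item = {"raw_risk": 0.6}
print(ensemble_score(item, [strict, lenient, neutral]))  # 0.6
```

Here the strict and lenient models' errors cancel, and the ensemble recovers the unbiased score.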
When models are used in production as part of this ensemble, their authors get a share of the profits.
Business Model
You provide automated content moderation as a service: users upload their content, you score it on various scales (e.g. from kid-friendly through NSFW to NSFL), and users pay a small fee per item scored.
A part of this fee is paid out to the creators of the best models.
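One simple way to split the fee pool is to pay each author in proportion to their model's score on the hidden evaluation set. A sketch under that assumption; the 50% payout share, the author names, and the scores are all illustrative:

```python
# Hypothetical sketch of splitting collected fees: each model author is
# paid in proportion to their model's hidden-set score. The payout_share
# of 0.5 and all numbers below are illustrative assumptions.

def payouts(fee_pool, model_scores, payout_share=0.5):
    """Distribute a share of collected fees proportionally to model quality."""
    pool = fee_pool * payout_share
    total = sum(model_scores.values())
    return {author: pool * score / total
            for author, score in model_scores.items()}

scores = {"alice": 0.92, "bob": 0.88, "carol": 0.80}
print(payouts(1000.0, scores))
```

Proportional payouts keep the incentive aligned: improving your model's hidden-set score directly increases your share of every fee collected.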
Inspiration
Numerai built such a platform, but for predicting the stock market: it runs an ongoing competition in which the best models are combined into a super-model, and competitors are compensated if they do well. This idea simply extends that approach into a different domain.