Toxicity and Hate Speech

Download the data (raw, processed)

Toxicity and Hate Speech Detection is an NLP task concerned with detecting toxic content and hate speech on online platforms. Hate speech is "usually thought to include communications of animosity or disparagement of an individual or a group on account of a group characteristic such as race, colour, national origin, sex, disability, religion, or sexual orientation". Training NLP systems to detect such content is important for catching unfettered toxicity and hate speech online and for making the internet a safer space for everyone.
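To make the task concrete, below is a minimal sketch of scoring a comment for toxicity with the Perspective API (one of the systems examined further down). The API key is a placeholder and the helper function is our own illustration, not part of the NLPositionality codebase.

# Minimal sketch: score a comment for toxicity with the Perspective API.
# Requires an API key with the Comment Analyzer API enabled; the key below is a placeholder.
import requests

API_KEY = "YOUR_API_KEY"
URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"

def toxicity_score(text: str) -> float:
    """Return the TOXICITY summary score in [0, 1] (higher = more toxic)."""
    payload = {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {"TOXICITY": {}},
    }
    response = requests.post(URL, params={"key": API_KEY}, json=payload, timeout=30)
    response.raise_for_status()
    return response.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

print(toxicity_score("You are a wonderful person."))  # expect a score near 0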

Toxicity, however, is a very culture- and region-specific phenomenon. What is considered (or perceived as) toxic by one group may not be toxic at all to another, especially to people who do not belong to the targeted group. For example, while race-based toxicity is prevalent in the US, India has caste-based toxicity and China has religion-based toxicity.

In NLPositionality, we measure the demographic alignment of existing systems and datasets that model toxicity. Specifically, we look at the GPT-4, Perspective API, Rewire API, and ToxicBERT models, as well as the Dynahate dataset. Click the buttons above to explore the alignments.
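As a rough illustration of what "alignment" means here, the sketch below correlates each demographic group's aggregated LabintheWild labels with a model's predictions on the same items using Pearson's r. The column names, the mean aggregation, and the overall schema are assumptions made for this example; the paper describes the actual metric and data format.

# Hedged sketch: alignment as the Pearson correlation between a demographic
# group's aggregated toxicity labels and a model's predictions per item.
# Column names (text_id, label, model_score) and the grouping column are assumptions.
import pandas as pd
from scipy.stats import pearsonr

def alignment_by_group(annotations: pd.DataFrame, model_scores: pd.DataFrame,
                       group_col: str = "country") -> pd.DataFrame:
    """Correlate each group's mean label per item with the model's score for that item."""
    rows = []
    for group, sub in annotations.groupby(group_col):
        # Aggregate the group's annotations per item (mean here; majority vote is another option).
        group_labels = sub.groupby("text_id")["label"].mean()
        merged = (group_labels.to_frame("group_label")
                  .join(model_scores.set_index("text_id")["model_score"], how="inner")
                  .dropna())
        if len(merged) > 2:
            r, p = pearsonr(merged["group_label"], merged["model_score"])
            rows.append({"group": group, "pearson_r": r, "p_value": p, "n": len(merged)})
    return pd.DataFrame(rows).sort_values("pearson_r", ascending=False)

Groups with higher correlations are those whose judgments a given model or dataset tends to agree with most, which is the kind of comparison the buttons above visualize.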

Number of LabintheWild (LITW) annotations per day for Toxicity and Hate Speech