Contribution to Resources

Euphemistic Abuse

An English dataset for euphemistic abuse (e.g. You inspire me to fall asleep) created via crowdsourcing.

Implicitly Abusive Remarks about Identity Groups

A German and an English dataset for distinguishing implicitly abusive remarks about identity groups (e.g. Muslims contaminate our planet) from otherwise negative remarks (e.g. Muslims despise terrorism). The repository containing these datasets also includes a new lexicon for detecting perpetrators and a fine-grained verb lexicon for establishing the sentiment of the agent towards the patient.

Implicitly Abusive Comparisons Dataset

Dataset of 1000 comparisons comprising 500 abusive comparisons (e.g. You run like a headless chicken) and 500 non-abusive comparisons (e.g. Your face is pale as a sheet). The comparisons have been created via crowdsourcing.

Multilingual Lexicon of Abusive Words Induced with Emojis

Three large lexicons of abusive words (English, German and Portuguese). The lexicons have been induced with the help of tweets in which predictive emojis (e.g. middle finger) occur.

Derogatory Compounds Dataset

Dataset of 3,500 German compounds divided into derogatory compounds (e.g. booze hound) and non-derogatory compounds (e.g. fox hound).

GermEval 2019 (Task 2)

Follow-up shared task on abusive language detection for German. The shared task comes with an extended dataset from GermEval 2018.

GermEval 2018

First shared task on abusive language detection for German. The shared task comes with a large dataset (5000 training and 3500 test instances) comprising tweets from Twitter.

Lexicon of Abusive Words

A large lexicon of English abusive words bootstrapped from a small set of manually labeled negative expressions (annotated as either abusive or not abusive). Currently, classifiers based on such lexicons produce by far the best results in cross-domain classification.

German Verbal Polarity Shifters

A large list of German verbal polarity shifters. This resource has been bootstrapped from a small base lexicon of German verbal shifters and a large set of English verbal shifters using a crosslingual approach.

Sense-level Lexicon of Verbal Shifters

A complete sense-level lexicon of English verbal polarity shifters and their shifting scope. Our lexicon covers all verbs of WordNet v3.1 that are single-word or particle verbs. Polarity shifter and scope labels are given for each lemma-synset pair (i.e. each word sense of a lemma).

Annotated Corpus for Disambiguating Verbal Shifters

A gold standard of 2000 labeled sentences where each sentence contains a mention of an ambiguous shifter (e.g. spoil). The sentence label indicates whether the usage of the ambiguous word conveys shifting (as in spoil the chances of success) or not (as in spoil a child) in that particular sentence.

A Large Word List of English Verbal Shifters

A list of about 1000 verbal shifters. Shifters, such as abandon, are similar to negations (e.g. not) in that they move the polarity of a phrase towards its inverse, as in abandon all hope. This resource has been bootstrapped from a small base lexicon in which a random sample of 2000 verbs from WordNet has been manually annotated.
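As a rough illustration of how a shifter moves polarity towards its inverse, the following Python sketch flips the polarity of a polar word whenever a shifter or negation word precedes it. The tiny word lists, the function name `phrase_polarity` and the simple left-to-right scope rule are assumptions made for this example only; the resource itself is a word list and does not prescribe any such procedure.

```python
# Hypothetical toy lexicons for illustration; the actual resource lists
# about 1000 verbal shifters and ships no polarity lexicon of this form.
SHIFTERS = {"abandon", "spoil"}
NEGATIONS = {"not"}
POLARITY = {"hope": 1, "success": 1, "fear": -1}


def phrase_polarity(tokens):
    """Return 1, -1 or 0 as the overall polarity of a short phrase.

    Assumption: every shifter or negation word inverts the polarity of
    all polar words that follow it in the phrase.
    """
    total = 0
    flips = 0
    for token in tokens:
        word = token.lower()
        if word in SHIFTERS or word in NEGATIONS:
            flips += 1
        elif word in POLARITY:
            value = POLARITY[word]
            # An odd number of preceding shifters inverts the polarity.
            total += -value if flips % 2 else value
    # Reduce the sum to its sign.
    return (total > 0) - (total < 0)
```

Under these assumptions, `phrase_polarity("abandon all hope".split())` evaluates to -1, whereas `phrase_polarity(["hope"])` evaluates to 1.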

Negation Modeling for German Sentiment Analysis

A data set focusing on the scope of German negation and a rule-based tool that automatically detects the scope of a wide range of different negation words. The tool also supports sentence-level polarity classification, with negation modeling incorporated in the classifier.

Morphologically Complex Words

A data set comprising about 9000 complex polar expressions (e.g. compounds) along with their polarity labels. This resource also includes very rare complex expressions (taken from Wortwarte.de) along with their polarity labels and morphological analyses.

LINK to resource

LINK to paper

German Opinion Role Extractor

This software is designed for the extraction of subjective expressions, sentiment sources and sentiment targets from German text. It has been developed according to the specification of the STEPS Shared Task (see below). The tool comes with pre-processing scripts (i.e. part-of-speech tagging, named entity recognition and syntactic parsing).

LINK to resource

STEPS Shared Task 2016

Second iteration of the Shared Task on Source, Subjective Expression and Target Extraction from Political Speeches. The annotation guidelines were heavily refined and new annotated data were produced.

LINK to website

LINK to data

LINK to proceedings

Opinion Compound Dataset

Resource comprising German compounds (e.g. Expertenmeinung or Kinderlärm) that have been annotated with regard to opinion roles. The release comprises two datasets: one of 2000 opinion compounds in which the modifier is annotated as either conveying some opinion role or none, and one of 1000 opinion compounds in which the modifier is annotated as either conveying an opinion holder or an opinion target.

LINK to resource

LINK to publication

Verb View Lexicon

Resource that classifies all opinion verbs from the English Subjectivity Lexicon and from the German Zurich Sentiment Lexicon according to their sentiment views. Each verb is assigned to one of three view categories. The categories are inspired by the different argument positions an opinion holder can assume. The categories are: agent view, where the opinion holder is realized as the agent of the opinion verb (e.g. love, hate, think); patient view, where the opinion holder is realized as the patient of the opinion verb (e.g. please, disappoint, surprise); and speaker view, where the opinion holder is the implicit speaker of the utterance (e.g. succeed, cheat, lie).

LINK to resource

LINK to publication
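The three view categories of the Verb View Lexicon can be pictured as a lookup that tells an extraction system which argument to treat as the opinion holder. The Python sketch below is only an illustration under stated assumptions: the dictionary layout, the names `View`, `VERB_VIEWS` and `opinion_holder`, and the `"<speaker>"` placeholder are hypothetical and do not reflect the file format of the actual resource. The example verbs are the ones named in the description.

```python
from enum import Enum


class View(Enum):
    AGENT = "agent"      # opinion holder is the agent of the verb
    PATIENT = "patient"  # opinion holder is the patient of the verb
    SPEAKER = "speaker"  # opinion holder is the implicit speaker


# Hypothetical in-memory layout; verbs taken from the description above.
VERB_VIEWS = {
    "love": View.AGENT, "hate": View.AGENT, "think": View.AGENT,
    "please": View.PATIENT, "disappoint": View.PATIENT,
    "surprise": View.PATIENT,
    "succeed": View.SPEAKER, "cheat": View.SPEAKER, "lie": View.SPEAKER,
}


def opinion_holder(verb, agent, patient, speaker="<speaker>"):
    """Pick the opinion holder for a verb, or None if the verb is unknown."""
    view = VERB_VIEWS.get(verb)
    if view is View.AGENT:
        return agent
    if view is View.PATIENT:
        return patient
    if view is View.SPEAKER:
        return speaker
    return None
```

For instance, `opinion_holder("love", "Mary", "Peter")` selects the agent `"Mary"`, while `opinion_holder("disappoint", "Mary", "Peter")` selects the patient `"Peter"`.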

MLSA: A Multi-Layered Reference Corpus for German Sentiment Analysis

This corpus consists of 270 sentences manually annotated for objectivity and subjectivity (Layer 1), word and phrase polarity (Layer 2) and expressions of private states (Layer 3).

LINK to resource

LINK to publication