Data and code for the study of bullying
This page hosts our data sets and code releases for scientific research on bullying.
Bullying Traces Data Set
- Version 3.0: bullyingV3.0.zip (size 534950, released in June 2015). 7321 tweets with tweet ID, bullying, author role, teasing, type, form, and emotion labels.
- (Archived version) bullyingV2.0.zip (size 217680, released in September 2014). 1762 tweets with tweet ID, bullying, author role, and teasing labels.
- (Archived version) bullyingV1.zip (size 19141, released in April 2012). Same tweets as in V2.0 but without tweet IDs.
Code to Recognize Bullying Traces
- bullyingtraceV2.zip (size 425383, released in April 2016)
  - Java source and ready-to-use jar files for performing the following classification tasks on input tweets:
    - classify the input text as a bullying trace or not; and if it is:
      - classify the tweet author's role in the bullying event
      - classify the tweet as teasing or not
      - classify the type, form, and sentiment of the tweet.
  - The classifiers in this version are linear-kernel SVMs trained on Bullying Traces Data V3.0 (see above). For usage details, please refer to the README file inside the zip file.
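The two-stage flow described above (a bullying-trace decision first, with the follow-up classifiers applied only to positive tweets) can be sketched roughly as follows. This is a minimal illustration, not the code shipped in bullyingtraceV2.zip: the class name `CascadeSketch`, the bag-of-words scoring, the toy weights, and the "yes"/"no" label strings are all assumptions made for the example. Only a teasing stage is shown; the role, type, form, and sentiment stages would presumably plug into the same map of follow-up classifiers.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

public class CascadeSketch {

    /**
     * Toy linear decision function: predicts positive when w . x + b > 0,
     * where x holds bag-of-words counts. The weights are hypothetical and
     * are NOT the trained models from the release.
     */
    static Predicate<String> linearClassifier(Map<String, Double> weights, double bias) {
        return text -> {
            double score = bias;
            for (String token : text.toLowerCase().split("\\W+")) {
                score += weights.getOrDefault(token, 0.0);
            }
            return score > 0.0;
        };
    }

    /** Runs the gate first; follow-up stages run only when the gate says "yes". */
    static Map<String, String> classify(String tweet,
                                        Predicate<String> isTrace,
                                        Map<String, Function<String, String>> followUps) {
        Map<String, String> labels = new LinkedHashMap<>();
        boolean trace = isTrace.test(tweet);
        labels.put("bullying", trace ? "yes" : "no");
        if (trace) {
            for (Map.Entry<String, Function<String, String>> stage : followUps.entrySet()) {
                labels.put(stage.getKey(), stage.getValue().apply(tweet));
            }
        }
        return labels;
    }

    /** Hypothetical gate classifier with made-up weights. */
    static Predicate<String> demoGate() {
        return linearClassifier(Map.of("bullied", 1.5, "bully", 1.5), -1.0);
    }

    /** Hypothetical follow-up stages; only a toy teasing classifier is included. */
    static Map<String, Function<String, String>> demoFollowUps() {
        Predicate<String> teasing = linearClassifier(Map.of("lol", 1.0, "haha", 1.0), -0.5);
        Map<String, Function<String, String>> stages = new LinkedHashMap<>();
        stages.put("teasing", text -> teasing.test(text) ? "yes" : "no");
        return stages;
    }

    public static void main(String[] args) {
        System.out.println(classify("he bullied me lol", demoGate(), demoFollowUps()));
        System.out.println(classify("nice weather today", demoGate(), demoFollowUps()));
    }
}
```

Keeping the follow-up stages in an ordered map means a negative gate decision skips them entirely, which mirrors the "and if yes" condition in the task list. For the actual input format and command-line usage of the released jars, the README inside the zip file remains the reference.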
- (Archived version) bullyingtraceV1.zip (size 156181, released in June 2012)
  - Java source code and ready-to-use jar files to classify text input as bullying traces or not.
  - This classification task was introduced as "Task A" in the paper: Jun-Ming Xu, Kwang-Sung Jun, Xiaojin Zhu, and Amy Bellmore. Learning from bullying traces in social media. Proceedings of NAACL HLT 2012.
Use Agreement
Access to the data and code files on this page is conditional on your acceptance of this use agreement.