Data and code for the study of bullying

This page hosts our data sets and code releases for scientific research on bullying.

Version 3.0: bullyingV3.0.zip (size 534950, released in June 2015). 7321 tweets with tweet ID, bullying, author role, teasing, type, form, and emotion labels.
- This version was described in: Junming Sui. Understanding and Fighting Bullying with Machine Learning. PhD thesis, Department of Computer Sciences, University of Wisconsin-Madison. 2015.
(Archived version) bullyingV2.0.zip (size 217680, released in September 2014). 1762 tweets with tweet ID, bullying, author role, and teasing labels.
(Archived version) bullyingV1.zip (size 19141, released in April 2012). Same tweets as in V2.0 but without tweet IDs.
- Versions 1.0 and 2.0 of this data set were introduced in the paper: Jun-Ming Xu, Kwang-Sung Jun, Xiaojin Zhu, and Amy Bellmore. Learning from bullying traces in social media. Proceedings of NAACL HLT 2012.

bullyingtraceV2.zip (size 425383, released in April 2016)
- Java source code and ready-to-use jar files that perform the following classification tasks on input tweets:
  - classify the input text as a bullying trace or not; and, if it is a bullying trace:
    - classify the tweet author's role in the bullying event
    - classify the tweet as teasing or not
    - classify the type, form, and sentiment of the tweet.
- The classifiers in this version are linear-kernel SVMs trained on Bullying Traces Data V3.0 (see above). For usage details, please refer to the README file inside the zip file.
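The cascade described above, in which the bullying-trace detector gates the remaining classifiers, can be sketched in Java as follows. This is a minimal, hypothetical illustration and not the API of the released jar: the class and method names, the unigram feature map, the one-vs-rest treatment of author role, and the hand-picked toy weights are all assumptions. The only property it relies on is that a trained linear-kernel SVM predicts with the sign of its decision value w.x + b.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: class names, features, and weights are illustrative
// assumptions, not the API of the released bullyingtrace jar.
public class BullyingCascade {

    /** A linear classifier over sparse features; a linear SVM predicts with sign(w.x + b). */
    static final class LinearClassifier {
        private final Map<String, Double> weights;
        private final double bias;

        LinearClassifier(Map<String, Double> weights, double bias) {
            this.weights = weights;
            this.bias = bias;
        }

        /** Decision value w.x + b; features missing from w contribute nothing. */
        double score(Map<String, Integer> x) {
            double sum = bias;
            for (Map.Entry<String, Integer> e : x.entrySet()) {
                sum += weights.getOrDefault(e.getKey(), 0.0) * e.getValue();
            }
            return sum;
        }
    }

    /** Cascade output; role and teasing stay null when the tweet is not a trace. */
    static final class Result {
        final boolean isTrace;
        final String role;
        final Boolean teasing;

        Result(boolean isTrace, String role, Boolean teasing) {
            this.isTrace = isTrace;
            this.role = role;
            this.teasing = teasing;
        }
    }

    /** Assumed feature map: lower-cased unigram counts. */
    static Map<String, Integer> features(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Assumed one-vs-rest scheme for multi-class tasks: highest decision value wins. */
    static String oneVsRest(Map<String, LinearClassifier> perClass, Map<String, Integer> x) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, LinearClassifier> e : perClass.entrySet()) {
            double s = e.getValue().score(x);
            if (s > bestScore) {
                bestScore = s;
                best = e.getKey();
            }
        }
        return best;
    }

    /** Stage 1 (trace or not) gates stage 2 (author role, teasing). */
    static Result classify(String tweet, LinearClassifier trace,
                           Map<String, LinearClassifier> roles, LinearClassifier teasing) {
        Map<String, Integer> x = features(tweet);
        if (trace.score(x) <= 0) {
            return new Result(false, null, null); // not a trace: stage 2 is skipped
        }
        return new Result(true, oneVsRest(roles, x), teasing.score(x) > 0);
    }

    public static void main(String[] args) {
        // Hand-picked toy weights for illustration only; nothing here is trained.
        LinearClassifier trace = new LinearClassifier(Map.of("bullied", 2.0, "bully", 2.0), -1.0);
        Map<String, LinearClassifier> roles = Map.of(
                "victim", new LinearClassifier(Map.of("i", 2.0, "me", 2.0), 0.0),
                "reporter", new LinearClassifier(Map.of("he", 1.0, "she", 1.0), 0.0));
        LinearClassifier teasing = new LinearClassifier(Map.of("lol", 2.0), -1.0);

        Result r = classify("He bullied me at school", trace, roles, teasing);
        System.out.println(r.isTrace + " " + r.role + " " + r.teasing); // prints: true victim false
    }
}
```

Running `main` prints `true victim false`. In this sketch, the type, form, and sentiment tasks could slot into the same second stage as further binary or one-vs-rest classifiers; how the released jar actually organizes them is described in its README.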
(Archived version) bullyingtraceV1.zip (size 156181, released in June 2012)
- Java source code and ready-to-use jar files to classify input text as a bullying trace or not.
- This classification task was introduced as "Task A" in the paper: Jun-Ming Xu, Kwang-Sung Jun, Xiaojin Zhu, and Amy Bellmore. Learning from bullying traces in social media. Proceedings of NAACL HLT 2012.

Access to the data and code files on this page is conditional upon your acceptance of this agreement.