Record Details

The LGBTQ+ Minority Stress on Social Media (MiSSoM) Dataset: A Labeled Dataset for Natural Language Processing and Machine Learning

Harvard Dataverse (Africa Rice Center, Bioversity International, CCAFS, CIAT, IFPRI, IRRI and WorldFish)

 
Field Value
 
Title The LGBTQ+ Minority Stress on Social Media (MiSSoM) Dataset: A Labeled Dataset for Natural Language Processing and Machine Learning
 
Identifier https://doi.org/10.7910/DVN/GPRSXH
 
Creator Cascalheira, Cory
 
Publisher Harvard Dataverse
 
Description Minority stress is the leading theoretical construct for understanding LGBTQ+ health disparities. As such, there is an urgent need to develop innovative policies and technologies to reduce minority stress. To spur technological innovation, we created the largest labeled datasets on minority stress using natural language from subreddits related to sexual and gender minority people. A team of mental health clinicians, LGBTQ+ health experts, and computer scientists developed two datasets: (1) the publicly available LGBTQ+ Minority Stress on Social Media (MiSSoM) dataset and (2) the advanced, request-only version of the dataset, LGBTQ+ MiSSoM+. Both datasets have seven labels related to minority stress: an overall composite label and six sublabels. LGBTQ+ MiSSoM (N = 27,709) includes both human- and machine-annotated labels and comes preprocessed with features (e.g., topic models, psycholinguistic attributes, sentiment, clinical keywords, word embeddings, n-grams, lexicons). LGBTQ+ MiSSoM+ includes all the characteristics of the open-access dataset, plus the original Reddit text and sentence-level labeling for a subset of posts (N = 5,772). Benchmark supervised machine learning analyses showed that features of the LGBTQ+ MiSSoM datasets predict overall minority stress well (F1 = 0.869). Benchmark performance on the six sublabels was excellent: prejudiced events (F1 = 0.942), expected rejection (F1 = 0.964), internalized stigma (F1 = 0.952), identity concealment (F1 = 0.971), gender dysphoria (F1 = 0.947), and minority coping (F1 = 0.917).
 
Subject Computer and Information Science
Medicine, Health and Life Sciences
Social Sciences
minority stress
gender minority
LGBTQ+
transgender
lesbian
gay
bisexual
queer
sexual minority
Reddit
discrimination
prejudice
stigma
stress
gender dysphoria
mental health
 
Language English
 
Date 2023-09-01
 
Contributor Cascalheira, Cory
Chapagain, Santosh
Flinn, Ryan
Klooster, Dannie
Laprade, Danica
Zhao, Yuxuan
Lund, Emily
Gonzalez, Alejandra
Corro, Kelsey
Wheatley, Rikki
Gutierrez, Ana
Villanueva, Oziel
Saha, Koustuv
De Choudhury, Munmun
Scheer, Jillian
Hamdi, Shah
 
Type natural language processing features
 
Source Reddit.com posts and comments from the following subreddits: r/gay, r/trans, r/ainbow, r/actuallesbians, r/genderqueer, r/bisexual, and r/questioning.