The LGBTQ+ Minority Stress on Social Media (MiSSoM) Dataset: A Labeled Dataset for Natural Language Processing and Machine Learning
Harvard Dataverse (Africa Rice Center, Bioversity International, CCAFS, CIAT, IFPRI, IRRI and WorldFish)
Field | Value | |
Title |
The LGBTQ+ Minority Stress on Social Media (MiSSoM) Dataset: A Labeled Dataset for Natural Language Processing and Machine Learning
|
|
Identifier |
https://doi.org/10.7910/DVN/GPRSXH
|
|
Creator |
Cascalheira, Cory
|
|
Publisher |
Harvard Dataverse
|
|
Description |
Minority stress is the leading theoretical construct for understanding LGBTQ+ health disparities. As such, there is an urgent need to develop innovative policies and technologies to reduce minority stress. To spur technological innovation, we created the largest labeled datasets on minority stress using natural language from subreddits related to sexual and gender minority people. A team of mental health clinicians, LGBTQ+ health experts, and computer scientists developed two datasets: (1) the publicly available LGBTQ+ Minority Stress on Social Media (MiSSoM) dataset and (2) the advanced, request-only version of the dataset, LGBTQ+ MiSSoM+. Both datasets have seven labels related to minority stress: an overall composite label and six sublabels. LGBTQ+ MiSSoM (N = 27,709) includes both human- and machine-annotated labels and comes preprocessed with features (e.g., topic models, psycholinguistic attributes, sentiment, clinical keywords, word embeddings, n-grams, lexicons). LGBTQ+ MiSSoM+ includes all the characteristics of the open-access dataset, plus the original Reddit text and sentence-level labeling for a subset of posts (N = 5,772). Benchmark supervised machine learning analyses revealed that features of the LGBTQ+ MiSSoM datasets can predict overall minority stress well (F1 = 0.869). Benchmark performance in predicting the other labels, namely prejudiced events (F1 = 0.942), expected rejection (F1 = 0.964), internalized stigma (F1 = 0.952), identity concealment (F1 = 0.971), gender dysphoria (F1 = 0.947), and minority coping (F1 = 0.917), was excellent.
|
|
Subject |
Computer and Information Science
Medicine, Health and Life Sciences; Social Sciences; minority stress; gender minority; LGBTQ+; transgender; lesbian; gay; bisexual; queer; sexual minority; discrimination; prejudice; stigma; stress; gender dysphoria; mental health |
|
Language |
English
|
|
Date |
2023-09-01
|
|
Contributor |
Cascalheira, Cory
Chapagain, Santosh; Flinn, Ryan; Klooster, Dannie; Laprade, Danica; Zhao, Yuxuan; Lund, Emily; Gonzalez, Alejandra; Corro, Kelsey; Wheatley, Rikki; Gutierrez, Ana; Villanueva, Oziel; Saha, Koustuv; De Choudhury, Munmun; Scheer, Jillian; Hamdi, Shah |
|
Type |
natural language processing features
|
|
Source |
Reddit.com posts and comments from the following subreddits: r/gay, r/trans, r/ainbow, r/actuallesbians, r/genderqueer, r/bisexual, and r/questioning.
|
|