Metadata for 20 newspapers from six Eurasian post-socialist countries
Harvard Dataverse (Africa Rice Center, Bioversity International, CCAFS, CIAT, IFPRI, IRRI and WorldFish)
Field | Value | |
Title |
Metadata for 20 newspapers from six Eurasian post-socialist countries
|
|
Identifier |
https://doi.org/10.7910/DVN/1D3LXE
|
|
Creator |
Under blind review
|
|
Publisher |
Harvard Dataverse
|
|
Description |
This is the metadata for a corpus of 1.3 million newspaper articles from six Eurasian post-socialist countries, recording mentions of twelve international organizations (IOs), for the period 2018-2020. The file format is Rdata; it can be loaded into R with the load() function.
Variables (columns):
sent - sentiment: the number of positive-inclination words minus the number of negative-inclination words, divided by the number of words in the article.
rsent - relative sentiment: the sentiment of the article minus the average sentiment of all articles published in the same newspaper.
un, wto, who, oecd, scun, nato, imf, ebrd, adb, aiib, wb, cc - one variable for each of the twelve IOs; equals zero if the IO is not mentioned in the article and 1+log(N) if the IO is mentioned N times in the article.
dip_ru, dip_kz, dip_bel, dip_ukr, dip_pl, dip_hu - one variable for each of the six countries; equals zero if no influential domestic politician is mentioned in the article and 1+log(N) if influential domestic politicians are mentioned N times in the article.
Thirteen topic dummies - a topic dummy equals one if the article was classified as discussing this topic, zero otherwise. The LDA topic modelling was conducted for k=30 topics. Topics were grouped by analytical relevance and semantic similarity into thirteen categories: politics and legislation (POL); economy, finance, various sectors of the economy (ECO); military, war, protests, crime, security threats (MIL); international affairs, specific issues concerning foreign countries (INT); technology (TECH); family issues, culture, sport, education (FAM); regional issues and housing (REG); health issues and the Covid-19 pandemic (HEA); media (MED); accidents (ACC); religion (REL); the Soviet Union (USSR); and articles for which no topic could be determined (MISC).
Twenty newspaper dummies “n_newspaper_COUNTRY” - a newspaper dummy equals one if the article was published in this newspaper, zero otherwise. The twenty media outlets are: iz.ru, kommersant.ru, novayagazeta.ru, vedomosti.ru, informburo.kz, nur.kz, tengrinews.kz, zakon.kz, bdg.by, belgazeta.by, sb.by, kp.ua, segodnya.ua, vesti.ua, gazeta.pl, rp.pl, wpolityce.pl, index.hu, origo.hu, alfahir.hu.
Six country dummies “c_COUNTRY”.
Twelve month dummies and three year dummies indicate the month and year in which the newspaper article was published. |
|
Subject |
Computer and Information Science
Social Sciences
post-soviet countries, International Organizations, influential domestic politicians |
|
Contributor |
Under blind review
|
|
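The variable definitions in the Description field can be illustrated with a minimal Python sketch. It is not the authors' code: the function names are hypothetical, the word counts are assumed to come from some sentiment lexicon that the record does not name, and log is assumed to be the natural logarithm, which the record does not state.

```python
import math
from collections import defaultdict


def mention_score(n_mentions):
    """Return 0 if there are no mentions, otherwise 1 + log(N).

    Assumption: natural logarithm; the record only says "log(N)".
    """
    if n_mentions < 0:
        raise ValueError("number of mentions cannot be negative")
    return 0.0 if n_mentions == 0 else 1.0 + math.log(n_mentions)


def sentiment(n_positive, n_negative, n_words):
    """(positive words - negative words) / total words in the article."""
    if n_words <= 0:
        raise ValueError("an article must contain at least one word")
    return (n_positive - n_negative) / n_words


def relative_sentiment(articles):
    """Subtract each newspaper's mean sentiment from its articles.

    `articles` is a list of (newspaper, sent) pairs; the result is a
    list of rsent values in the same order.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for newspaper, sent in articles:
        totals[newspaper] += sent
        counts[newspaper] += 1
    return [sent - totals[paper] / counts[paper] for paper, sent in articles]
```

Under these assumptions, an IO mentioned once scores exactly 1, so a single mention stays distinguishable from no mention at all, while each further mention adds less than the one before.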