Record Details

Metadata for 20 newspapers from six Eurasian post-socialist countries

Harvard Dataverse (Africa Rice Center, Bioversity International, CCAFS, CIAT, IFPRI, IRRI and WorldFish)

 
Field Value
 
Title Metadata for 20 newspapers from six Eurasian post-socialist countries
 
Identifier https://doi.org/10.7910/DVN/1D3LXE
 
Creator Under blind review
 
Publisher Harvard Dataverse
 
Description This is the metadata for a corpus of 1.3 million newspaper articles from six Eurasian post-socialist countries and twelve international organizations (IOs), covering the period 2018-2020. The file is in RData format and can be loaded into R with the load() function.



Description of the variables (columns):


sent – sentiment: the number of positive-inclination words minus the number of negative-inclination words, divided by the number of words in the article


rsent – relative sentiment: the sentiment of the article minus the average sentiment of all articles published in the same newspaper
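The two sentiment definitions above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the handling of empty articles, and the input layout (parallel lists of scores and newspaper labels) are assumptions made for illustration, and the sketch reads the "sent" definition as (positive minus negative) divided by total words.

```python
from statistics import mean


def sentiment(n_positive, n_negative, n_words):
    """Net share of positive words: (positive - negative) / total words."""
    if n_words == 0:
        # Assumption: an empty article gets a neutral score.
        return 0.0
    return (n_positive - n_negative) / n_words


def relative_sentiment(sent_by_article, newspaper_by_article):
    """Subtract each newspaper's mean sentiment from its own articles."""
    grouped = {}
    for score, paper in zip(sent_by_article, newspaper_by_article):
        grouped.setdefault(paper, []).append(score)
    paper_mean = {paper: mean(scores) for paper, scores in grouped.items()}
    return [
        score - paper_mean[paper]
        for score, paper in zip(sent_by_article, newspaper_by_article)
    ]
```

Under this reading, an article with 6 positive and 2 negative words out of 100 would get sent = 0.04, and rsent values within any one newspaper average to zero by construction.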


un, wto, who, oecd, scun, nato, imf, ebrd, adb, aiib, wb, cc - one variable for each of the twelve IOs; it equals zero if the IO is not mentioned in the article and 1+log(N) if the IO is mentioned N times in the article.


dip_ru, dip_kz, dip_bel, dip_ukr, dip_pl, dip_hu - one variable for each of the six countries; it equals zero if no influential domestic politician is mentioned in the article and 1+log(N) if influential domestic politicians are mentioned N times in the article.
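The 0 / 1+log(N) coding used for the IO and politician variables can be sketched as follows. This is an illustration only: the function names are hypothetical, and the natural logarithm is an assumption, since the record does not state the base of the log.

```python
import math


def mention_weight(n_mentions):
    """Return 0 for no mentions, otherwise 1 + log(N)."""
    if n_mentions < 0:
        raise ValueError("mention count cannot be negative")
    if n_mentions == 0:
        return 0.0
    # Assumption: natural logarithm; the record does not specify the base.
    return 1.0 + math.log(n_mentions)


def mention_count(weight):
    """Invert the coding to recover N (same natural-log assumption)."""
    if weight == 0:
        return 0
    return round(math.exp(weight - 1.0))
```

With this coding a single mention maps to exactly 1, so the variable separates "not mentioned" (0) from "mentioned once" (1) while dampening the effect of many repeated mentions.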


Thirteen topic dummies: each topic dummy equals one if an article was classified as discussing that topic, and zero otherwise. The LDA topic modelling was conducted for k=30 topics. Topics were grouped by their analytical relevance and semantic similarity into thirteen categories: politics and legislation (POL); economy, finance, various sectors of the economy (ECO); military, war, protests, crime, security threats (MIL); international affairs, specific issues concerning foreign countries (INT); technology (TECH); family issues, culture, sport, education (FAM); regional issues and housing (REG); health issues and the Covid-19 pandemic (HEA); media (MED); accidents (ACC); religion (REL); the Soviet Union (USSR); and articles for which no topic could be determined (MISC).


Twenty newspaper dummies “n_newspaper_COUNTRY”: each newspaper dummy equals one if an article was published in that newspaper, and zero otherwise. The twenty media outlets are: iz.ru, kommersant.ru, novayagazeta.ru, vedomosti.ru, informburo.kz, nur.kz, tengrinews.kz, zakon.kz, bdg.by, belgazeta.by, sb.by, kp.ua, segodnya.ua, vesti.ua, gazeta.pl, rp.pl, wpolityce.pl, index.hu, origo.hu, alfahir.hu.


Six country dummies “c_COUNTRY”.


Twelve month dummies and three year dummies indicate the month and year in which the newspaper article was published.

 
Subject Computer and Information Science
Social Sciences
post-soviet countries, International Organizations, influential domestic politicians
 
Contributor Under blind review