Record Details

Replication Data for: A natural language measure of ideology in the Brazilian Senate

Harvard Dataverse (Africa Rice Center, Bioversity International, CCAFS, CIAT, IFPRI, IRRI and WorldFish)

Field Value
 
Title Replication Data for: A natural language measure of ideology in the Brazilian Senate
 
Identifier https://doi.org/10.7910/DVN/JSZI4V
 
Creator Felipe Carneiro
Bernardo Mueller
Daniel O. Cajueiro
 
Publisher Harvard Dataverse
 
Description We first obtained the documents from Dados Abertos via web scraping. We then organized the data by legislature, so that each legislature has its own corpus. Each legislature has a specific set of politicians, each of whom may have one or more speeches. We treat each speech as a separate document; that is, we do not concatenate all speeches from a given politician into a single document. At the end of the pre-processing phase, we therefore have five sets of speeches (i.e., five corpora), each comprising a specific set of politicians with one or more speeches.
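The corpus construction described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the `Speech` record, its fields, and the `build_corpora` function are hypothetical names chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Speech:
    # Hypothetical record layout; the actual scraped fields may differ.
    legislature: int
    senator: str
    party: str
    text: str


def build_corpora(speeches):
    """Group speeches into one corpus per legislature.

    Each speech stays a separate document: speeches by the same
    senator are appended individually, never concatenated.
    """
    corpora = defaultdict(list)
    for speech in speeches:
        corpora[speech.legislature].append(speech)
    return dict(corpora)
```

Keeping each speech as its own list entry, rather than joining a senator's speeches into one string, mirrors the choice described above of treating every speech as a separate document.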
Another important step is labelling the documents. As described in The Model section of the main text, we use the ideological estimates of the main Brazilian parties from Power and Zucco Jr. (2009). They use a multidimensional scaling technique based on survey data and roll-call vote data. Their classification assigns each party to a specific ideological position, placing it on the left, in the center, or on the right. Thus, given a politician's party, we can assign each of their speeches to the corresponding ideological class.
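The labelling step can be sketched in the same way: each speech inherits the ideological class of its speaker's party. The `PARTY_IDEOLOGY` mapping below uses hypothetical placeholder parties, not the actual Power and Zucco Jr. (2009) estimates, and skipping speeches whose party is missing from the mapping is an assumption of this sketch rather than something the description specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Speech:
    # Hypothetical record layout for illustration only.
    senator: str
    party: str
    text: str


# Hypothetical placeholder mapping; the real party positions come from
# Power and Zucco Jr. (2009), not from this sketch.
PARTY_IDEOLOGY = {
    "Party A": "left",
    "Party B": "center",
    "Party C": "right",
}


def label_speeches(speeches, party_ideology):
    """Pair each speech with the ideological class of its speaker's party.

    Assumption of this sketch: speeches from parties absent from the
    mapping are skipped rather than labelled.
    """
    labelled = []
    for speech in speeches:
        label = party_ideology.get(speech.party)
        if label is not None:
            labelled.append((speech, label))
    return labelled
```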
 
Subject Social Sciences
 
Contributor Revista Brasileira de Ciência Política