Record Details


Field	Value

Title	Replication Data for: A natural language measure of ideology in the Brazilian Senate

Identifier	https://doi.org/10.7910/DVN/JSZI4V

Creator	Felipe Carneiro; Bernardo Mueller; Daniel O. Cajueiro

Publisher	Harvard Dataverse

Description	We first obtained the documents from Dados Abertos using data scraping. We then built our data set by legislature, so that each legislature has its own corpus. Each legislature has a specific set of politicians, each of whom has one or more speeches. We treat each speech as a single document; that is, we do not concatenate all speeches by a given politician into one document. At the end of the pre-processing phase we therefore have five sets of speeches (i.e., five corpora), each comprising a specific set of politicians with one or more speeches. Another important step is labelling the documents. As described in The Model (main text), we use the ideological estimates of the main Brazilian parties by Power and Zucco Jr. (2009), who apply a multidimensional scaling technique to survey data and roll-call vote data. Their classification assigns each party to an ideological position: left, center, or right. Given a politician’s party, we can therefore assign their speeches to the corresponding ideological class.

Subject	Social Sciences

Contributor	Ciência Política, Revista Brasileira de
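
The corpus construction and labelling steps in the Description can be sketched roughly as follows. This is a minimal illustration, not the authors' replication code: the Speech record, the build_corpora function, the placeholder party names, and the PARTY_CLASS mapping are all hypothetical, and the mapping does not reproduce the Power and Zucco Jr. (2009) estimates. Skipping speeches from parties without a class is likewise an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical party-to-class mapping with placeholder party names.
# These are NOT the published Power and Zucco Jr. (2009) estimates.
PARTY_CLASS = {"PARTY_A": "left", "PARTY_B": "center", "PARTY_C": "right"}


@dataclass(frozen=True)
class Speech:
    legislature: int  # hypothetical legislature identifier
    senator: str
    party: str
    text: str


def build_corpora(speeches):
    """Group speeches by legislature, one labelled document per speech."""
    corpora = defaultdict(list)
    for speech in speeches:
        label = PARTY_CLASS.get(speech.party)
        if label is None:
            # Assumption: speeches from unclassified parties are dropped.
            continue
        # Each speech stays a separate document; speeches by the same
        # politician are never concatenated.
        corpora[speech.legislature].append((speech.senator, speech.text, label))
    return dict(corpora)


speeches = [
    Speech(1, "Senator X", "PARTY_A", "first speech"),
    Speech(1, "Senator X", "PARTY_A", "second speech"),
    Speech(2, "Senator Y", "PARTY_C", "another speech"),
    Speech(2, "Senator Z", "PARTY_D", "speech from an unclassified party"),
]
corpora = build_corpora(speeches)
```

Here corpora maps each legislature to its own list of (senator, text, label) documents, so a classifier could be trained on each legislature separately.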
