The Distant Reading Compendium: A virtual edited volume

On Poetic Topic Modeling: Extracting Themes and Motifs From a Corpus of Spanish Poetry

Reference

Navarro-Colorado, Borja. “On Poetic Topic Modeling: Extracting Themes and Motifs From a Corpus of Spanish Poetry”, Frontiers in Digital Humanities, 5 (2018), DOI: 10.3389/fdigh.2018.00015

Abstract

This paper analyzes the application of LDA topic modeling to a corpus of poetry. First, it explains how the most coherent LDA-topics have been established by running several tests and automatically evaluating the coherence of the resulting LDA-topics. Results show, on one hand, that when dealing with a corpus of poetry, lemmatization is not advisable because several poetic features are lost in the process; and, on the other hand, that a standard LDA algorithm is better than a specific version of LDA for short texts (LF-LDA). The resulting LDA-topics have then been manually analyzed in order to define the relation between word topics and poems. The analysis shows that there are mainly two kinds of semantic relations: an LDA-topic could represent the subject or theme of the poem, but it could also represent a poetic motif. All these analyses have been undertaken on a large corpus of Golden Age Spanish sonnets. Finally, the paper shows the most relevant themes and motifs in this corpus such as “love”, “religion”, “heroics”, “moral” or “mockery” on one hand, and “rhyme”, “marine”, “music” or “painting” on the other hand.

Keywords

Distant reading, Corpus, Poetry, LDA, Topic modeling, Spanish

BibTeX

@article{navarro-colorado_poetic_2018,
	title = {On {Poetic} {Topic} {Modeling}: {Extracting} {Themes} and {Motifs} {From} a {Corpus} of {Spanish} {Poetry}},
	volume = {5},
	issn = {2297-2668},
	shorttitle = {On {Poetic} {Topic} {Modeling}},
	url = {https://www.frontiersin.org/articles/10.3389/fdigh.2018.00015/full},
	doi = {10.3389/fdigh.2018.00015},
	abstract = {This paper analyzes the application of LDA topic modeling to a corpus of poetry. First, it explains how the most coherent LDA-topics have been established by running several tests and automatically evaluating the coherence of the resulting LDA-topics. Results show, on one hand, that when dealing with a corpus of poetry, lemmatization is not advisable because several poetic features are lost in the process; and, on the other hand, that a standard LDA algorithm is better than a specific version of LDA for short texts (LF-LDA). The resulting LDA-topics have then been manually analyzed in order to define the relation between word topics and poems. The analysis shows that there are mainly two kinds of semantic relations: an LDA-topic could represent the subject or theme of the poem, but it could also represent a poetic motif. All these analyses have been undertaken on a large corpus of Golden Age Spanish sonnets. Finally, the paper shows the most relevant themes and motifs in this corpus such as "love", "religion", "heroics", "moral" or "mockery" on one hand, and "rhyme", "marine", "music" or "painting" on the other hand.},
	language = {English},
	urldate = {2019-11-19},
	journal = {Frontiers in Digital Humanities},
	author = {Navarro-Colorado, Borja},
	year = {2018},
	keywords = {Distant reading, Golden Age, LDA, Poetry, Topic modeling, Sonnet, Spanish},
}