https://openmethods.dariah.eu/2018/11/19/not-all-character-n-grams-are-created-equal-a-study-in-authorship-attribution-acl-anthology/
OpenMethods introduction to: Not All Character N-grams Are Created Equal: A Study in Authorship Attribution - ACL Anthology
2018-11-19 10:46:14
Introduction: Studying character n-grams is today a classic choice in authorship attribution. While there has been some discussion about the optimal length of these n-grams, we still have few clues about which specific types of n-grams are most helpful for efficiently identifying the author of a text. This paper partly fills that gap by showing that most of the information gained from studying character n-grams comes from affixes and punctuation.
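To make the idea of "typed" character n-grams concrete, here is a minimal sketch that labels each character 3-gram of a text by where it falls (touching a word's start or end, containing punctuation, crossing a space, and so on) and then builds a frequency profile from only the affix and punctuation n-grams. The category names and the functions `ngram_category`, `category_counts` and `typed_profile` are hypothetical simplifications chosen for illustration. They only approximate the kind of n-gram typology the paper studies and are not its implementation.

```python
import string
from collections import Counter

# Hypothetical, simplified categories for illustration only;
# this is NOT the exact typology defined in the paper.
PUNCT = set(string.punctuation)


def ngram_category(text: str, start: int, n: int) -> str:
    """Label the character n-gram text[start:start + n] by its position."""
    gram = text[start:start + n]
    if any(c in PUNCT for c in gram):
        return "punct"        # contains at least one punctuation mark
    if any(c.isspace() for c in gram):
        return "spans-space"  # crosses a word boundary
    # Word-internal n-gram: does it touch the start and/or end of its word?
    before = text[start - 1] if start > 0 else " "
    after = text[start + n] if start + n < len(text) else " "
    at_start = not before.isalnum()
    at_end = not after.isalnum()
    if at_start and at_end:
        return "whole-word"
    if at_start:
        return "prefix"
    if at_end:
        return "suffix"
    return "mid-word"


def category_counts(text: str, n: int = 3) -> Counter:
    """Count how many n-grams of `text` fall into each category."""
    return Counter(ngram_category(text, i, n) for i in range(len(text) - n + 1))


def typed_profile(text: str, n: int = 3,
                  keep=("prefix", "suffix", "punct")) -> Counter:
    """Frequency profile restricted to the chosen n-gram categories."""
    return Counter(
        text[i:i + n]
        for i in range(len(text) - n + 1)
        if ngram_category(text, i, n) in keep
    )


print(category_counts("the cats, ran."))
print(typed_profile("the cats, ran."))
```

Restricting the profile to affix and punctuation n-grams, as `typed_profile` does by default, loosely mirrors the claim that these types carry most of the useful signal; in an actual attribution setup such profiles would be fed to a classifier trained on texts of known authorship.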
Florian CAFIERO
https://aclanthology.coli.uni-saarland.de/papers/N15-1010/n15-1010
Blog post
Content Analysis
English
Information Retrieval
Literature
Stylistic Analysis
Text
Authorship debates
Computational fields of study
Computational linguistics
Corpus linguistics
digital humanities
Language modeling
Language varieties and styles
n-grams
Natural language processing
ngrams
Probabilistic models
Quantitative linguistics
Speech recognition
Introduction by Volunteer Editor: Florian Cafiero.
Source: Not All Character N-grams Are Created Equal: A Study in Authorship Attribution – ACL Anthology
Original date of publication: 06.2015.