Not All Character N-grams Are Created Equal: A Study in Authorship Attribution – ACL Anthology

OpenMethods post: https://openmethods.dariah.eu/2018/11/19/not-all-character-n-grams-are-created-equal-a-study-in-authorship-attribution-acl-anthology/
Posted: 2018-11-19 10:46:14
Posted by: Florian CAFIERO
Paper: https://aclanthology.coli.uni-saarland.de/papers/N15-1010/n15-1010
Type: Blog post
Keywords: Content Analysis, English, Information Retrieval, Literature, Stylistic Analysis, Text, Authorship debates, Computational fields of study, Computational linguistics, Corpus linguistics, digital humanities, Language modeling, Language varieties and styles, n-grams, Natural language processing, ngrams, Probabilistic models, Quantitative linguistics, Speech recognition

Introduction by Volunteer Editor (Florian Cafiero): Character n-grams are today a classic choice of feature in authorship attribution. Although the optimal length of these n-grams has been discussed, we still have few clues about which specific types of n-grams are most helpful for efficiently identifying the author of a text. This paper partly fills that gap by showing that most of the information gained from character n-grams comes from affixes and punctuation.
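To make the idea of "types" of character n-grams concrete, here is a minimal Python sketch that extracts character trigrams and tags each one with a coarse label. The four labels (prefix, suffix, punct, other), the function names and the sample sentence are assumptions made for illustration only. They are a simplification and should not be read as the category scheme or the implementation used in the paper.

```python
from collections import Counter
import string


def label_ngram(text, i, n):
    """Return a coarse, illustrative label for the n-gram starting at index i.

    Assumed simplification: only ASCII punctuation is recognised, and
    anything that is not a prefix, suffix or punctuation n-gram is 'other'.
    """
    gram = text[i:i + n]
    if any(ch in string.punctuation for ch in gram):
        return "punct"
    if not gram.isalnum():
        return "other"  # contains whitespace, so it spans a word boundary
    starts_word = i == 0 or not text[i - 1].isalnum()
    ends_word = i + n == len(text) or not text[i + n].isalnum()
    if starts_word and not ends_word:
        return "prefix"  # first n characters of a longer word
    if ends_word and not starts_word:
        return "suffix"  # last n characters of a longer word
    return "other"  # whole word or word-internal n-gram


def typed_char_ngrams(text, n=3):
    """Count (label, n-gram) pairs over every position in the text."""
    return Counter(
        (label_ngram(text, i, n), text[i:i + n])
        for i in range(len(text) - n + 1)
    )


features = typed_char_ngrams("The actor, acting.")
for (label, gram), count in sorted(features.items()):
    print(f"{label:7s} {gram!r:7s} {count}")
```

Counting typed n-grams separately means that the trigram "act" at the start of a word and the same trigram inside a word become different features, so a classifier can be trained on affix and punctuation features alone and compared with one trained on all n-grams.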

 

Source: Not All Character N-grams Are Created Equal: A Study in Authorship Attribution – ACL Anthology

Original date of publication: 06.2015.