Short Samples in Authorship Attribution

Introduction by OpenMethods Editor (Maciej Maryl): This article discusses the question of minimum sample size in stylometry, setting it as low as 2,000 words in some cases.

The study aimed to reconsider the minimum sample size for reliable authorship attribution. The results of the experiments suggest that a sufficient amount of textual data may be as little as 2,000 words in many cases. However, the authorial fingerprint is sometimes so vague that substantially longer samples are needed to make attribution feasible. An important question, therefore, is which of these two categories an unknown (disputed) text belongs to.


