Stylometry, the study of linguistic style, is a method used for authorship recognition. It has helped in numerous historical breakthroughs by attributing documents of unknown authorship. The same technique can be used to identify an anonymous blogger or forum poster, but certain conditions must be met for stylometry to succeed, such as a small pool of suspects and a few hundred paragraphs of available text that an algorithm can compare and analyze.
A state-sponsored agency could use its computers to scan similar forums and try to link a high-value target to his real identity from writing style alone. It is well known that spy agencies can already scan Facebook, where people use their real names, for keywords. However, given the sheer number of Facebook users, a stylometry attack would not be feasible there unless the search is narrowed to forums with just a few dozen users. The evidence gathered is still not proof beyond reasonable doubt, but it can be used as an extra intelligence tool pending confirmation.
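To illustrate why a small pool of suspects matters, here is a minimal Python sketch of one simple form of authorship comparison: each text is reduced to the relative frequency of a few common function words, and the anonymous text is ranked against each suspect's known writing by cosine similarity. The word list, the function names (`profile`, `cosine`, `rank_suspects`) and the choice of cosine similarity are assumptions made for illustration, not the method used by any particular agency or tool. Real systems rely on far more features and far more text.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny feature set: common function words.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in",
                  "that", "is", "it", "but", "which", "however"]


def profile(text):
    """Return the relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_suspects(anonymous_text, suspects):
    """Rank suspects (name -> known writing sample) by similarity, best match first."""
    target = profile(anonymous_text)
    scores = {name: cosine(target, profile(sample))
              for name, sample in suspects.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With only a few dozen candidates, the top of such a ranking may be a useful lead. With millions of candidates, many unrelated writers will score almost identically, which is why the attack does not scale.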
Manual adversarial stylometry techniques to circumvent authorship recognition:
- Obfuscation: An author can deliberately camouflage his writing style, including punctuation, use a thesaurus to avoid repetition, or briefly quote someone else’s words.
- Imitation: An author imitates someone else’s writing style so that the analysis either points towards that person or throws the algorithm off the trail with no conclusive result.
- Translation: Machine translation software can pass the text through one or more other languages and then back to the original.
A Drexel University research team has also released an open source tool called Jstylo-Anonymouth, which bundles together an authorship recognition analysis tool and an authorship recognition evasion tool. The software is written in Java and will work on any operating system. When you use Anonymouth to circumvent authorship recognition, you are shown an analysis of text complexity, unique word count, sentence count, average sentence length, letter space and reading ease score. You are then told whether each feature is optimal for anonymity or needs changing. This automated software is well suited to preparing long documents for release.
Note: The software is an alpha release and still in development.
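To make the surface features listed above more concrete, here is a minimal Python sketch that computes a few of them for a piece of text. It is not Anonymouth's code (which is written in Java), and the function names are hypothetical. The reading ease score is assumed here to be the Flesch Reading Ease formula, and the syllable counter is a crude vowel-group heuristic, so the numbers should be treated as rough indicators only.

```python
import re


def count_syllables(word):
    """Rough heuristic: one syllable per group of consecutive vowels, minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def style_features(text):
    """Compute a handful of simple stylometric features for a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    n_sent = len(sentences) or 1   # guard against empty input
    n_words = len(words) or 1
    syllables = sum(count_syllables(w) for w in words)
    return {
        "word_count": len(words),
        "unique_words": len({w.lower() for w in words}),
        "sentence_count": len(sentences),
        "avg_sentence_length": len(words) / n_sent,
        # Flesch Reading Ease (assumed here; higher means easier to read).
        "flesch_reading_ease": 206.835
        - 1.015 * (n_words / n_sent)
        - 84.6 * (syllables / n_words),
    }
```

Running `style_features` on a draft before and after editing gives a quick sense of whether changes such as shorter sentences or different vocabulary have actually shifted these measurements.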