On the evening of October 16, 1984, the body of four-year-old Grégory Villemin was pulled out of the Vologne river in Eastern France. The little boy had disappeared from the front garden of his home in Lépanges-sur-Vologne earlier that afternoon. His mother had searched desperately all over the small village, but nobody had seen him.
It quickly became clear that his death wasn’t a tragic accident. The boy’s hands and feet had been tied with string, and the family had received several threatening letters and voicemails before he disappeared. The following day, another letter was sent to the boy’s father, Jean-Marie Villemin. “I hope you will die of grief, boss,” it read in messy, joined-up handwriting. “Your money will not bring your son back. This is my revenge, you bastard.”
It was the beginning of what would become France’s best-known unsolved murder case. The case has been reopened several times, and multiple suspects have been arrested. Grégory’s mother, Christine, was charged with the crime and briefly jailed but later acquitted. Jean-Marie also served prison time after he shot dead his cousin Bernard Laroche, who had emerged as a prime suspect. The investigating judge, Jean-Michel Lambert, who was assigned the case at age 32 and made critical mistakes early in the investigation, killed himself in 2017.
More than three decades after Grégory’s murder, police brought in a team of Swiss linguists from a company called OrphAnalytics to examine the letters and their use of vocabulary, spelling and sentence structure. Their report, submitted in 2020 and partly leaked to the press, pointed to Grégory’s great-aunt, Jacqueline Jacob. The results echoed earlier handwriting and linguistic analysis that had led to the arrest of Jacob and her husband in 2017. (The couple was freed later that year over procedural issues.)
While the new evidence has not yet been presented in court, some believe it could help to solve the case that has haunted an entire generation. It has also shone a spotlight on the little-known field of forensic linguistics. In France, the use of stylometry — the study of variations in literary styles — has largely been confined to academic circles. The Grégory case is the first time it has been applied in a major criminal investigation.
The use of forensic linguistics in the case was initially treated with skepticism. The Jacobs’ lawyers dismissed the previous stylometric analysis as “completely ridiculous” and a “pseudo expertise.” The general prosecutor at the Court of Appeal of Dijon, Philippe Astruc, declined to comment for this piece on the techniques currently being used in the investigation, but in a recent interview with radio station RTL he cautioned: “To imagine that it will suddenly be settled with a single report is an illusion.”
“The press didn’t understand it, and the lawyers are saying it can’t work,” Claude-Alain Roten, CEO of OrphAnalytics, told me over the phone from his office in Vevey, a Swiss town on Lac Léman. But he assured me his results are reliable. “We came to similar conclusions to the conclusions they had already reached by other means,” he said, adding that OrphAnalytics last year completed another report commissioned by the general prosecutor of Dijon, who oversees the Villemin investigation, analyzing an additional anonymous letter. “It gives us a very precise idea of who the person who wrote the letter is.”
✺
According to forensic linguists, we all use language in a uniquely identifiable way that can be as incriminating as a fingerprint. The word “forensic” may suggest a scientist in a protective suit inspecting a crime scene for drops of blood. But a forensic linguist has more in common with Sherlock Holmes in “A Scandal in Bohemia.” “The man who wrote the note is a German. Do you note the peculiar construction of the sentence?” the detective asks in the 1891 short story. “A Frenchman or a Russian could not have written that. It is the German who is so uncourteous to his verbs.”
The term “forensic linguistics” was likely coined in the 1960s by Jan Svartvik, a Swedish linguist who re-examined the controversial case of Timothy John Evans, a Welshman who was wrongfully accused of murdering his wife and daughter, then convicted and hanged in 1950. Svartvik found that it was unlikely that Evans, who was illiterate, had written the most damning parts of his confession, which had been transcribed by police and likely tampered with. The real murderer was Evans’s downstairs neighbor, who turned out to be a serial killer.
Today, the field is perhaps still best known for its role in solving the “Unabomber” case in the United States. Between 1978 and 1995, a mysterious figure sent letter bombs to academics, businessmen and random civilians, killing three people and injuring at least 24. The lone bomber was careful not to leave any fingerprints or DNA traces, evading the authorities for 17 years and triggering one of the longest and most expensive criminal investigations in U.S. history. But in 1995, he made a crucial mistake. He told the police he would pause his attacks on the condition that a newspaper publish his 35,000-word anti-technology manifesto.
When the document appeared in the Washington Post, the New York Times and Penthouse magazine, several people, including the perpetrator’s brother, reached out to say they recognized the writing style. Meanwhile, FBI linguist James Fitzgerald and sociolinguist Roger Shuy, who had been studying the bomber’s letters, had identified patterns in his language that helped narrow the list of suspects: spellings such as “wilfully” for “willfully” and “clew” for “clue” pointed to someone from the Chicago area, for example. Eventually, the linguistic evidence was strong enough to issue a search warrant for the home of a reclusive mathematician named Theodore Kaczynski, raised in Chicago but living in rural Montana, where investigators found copies of the manifesto and homemade bombs.
While U.S. authorities hunted down the Unabomber, the field of forensic linguistics was developing in other countries. The University of Birmingham hosted the first British Seminar on Forensic Linguistics in 1992, bringing together academics from Australia, Brazil, Holland, Ukraine, Greece and Germany. Barcelona’s Pompeu Fabra University has had a forensic linguistics laboratory since 1993. But it wasn’t until the next decade that the field became more structured, with the creation of university research teams, master’s degrees and government-funded police laboratories and agencies.
“It’s still emerging in places outside where it initially started, but it is growing gradually as people are getting trained,” said Nicci MacLeod, a senior lecturer at the Aston Institute of Forensic Linguistics in Birmingham, England, which was established in 2019.
Increasingly, linguists with a background in computer sciences have also taken on cases of “authorship attribution” — identifying the author of a given text and, in some cases, shedding light on long-standing literary mysteries. In 2015, computational social scientist Ryan Boyd and social psychologist James Pennebaker, for example, analyzed Shakespeare’s plays and compared them with Double Falsehood, a play with disputed authorship. Using software that analyzed, among other things, the use of short “function” words like the, by, of, a, thee and ye, they identified Shakespeare as the author of the play and suggested he may have collaborated with the writer John Fletcher for its fourth and fifth acts.
The computational linguists Florian Cafiero and Jean-Baptiste Camps have done similar work in France. For over a century, scholars had argued that Molière could not have written some of his best work due to his lack of education, suggesting instead that his plays had been ghostwritten by the poet Pierre Corneille. The academics were able to disprove this theory by looking at language, rhyme, grammar and word forms. This established a “clear-cut separation” between plays penned by Molière and works by Corneille.
“You thought that Molière wrote his plays? Well… yes, he did,” Cafiero said with a chuckle, when we met at his offices at the École Nationale des Chartes in Paris, a university specializing in historical sciences, to discuss his work. “It was the total opposite of a scoop.”
These literary cases, often closely followed by the media, have helped to bring the field into the mainstream in France and elsewhere. After their work on Molière, Cafiero and Camps were tapped (alongside OrphAnalytics) to work on a New York Times investigation into the instigators of QAnon, the far-right American conspiracy theory that started spreading online in 2017. Using machine learning to analyze the social media posts of various suspects, both teams separately identified Paul Furber, a South African software engineer, and Ron Watkins, an American conspiracy theorist, as the likely authors behind the anonymous Q. Their results, which were published in 2022, gave a plausible explanation to an online mystery that had wreaked havoc for years.