TEMPORAL ANOMALY · Jun 21, 2026 · ~9 min read

The Statistical Cost of Being Forgotten: When Databases Forget People

People do not vanish from history cleanly. They vanish in the seams: in the citation graph, in the cached page that was edited but never archived, in the index of a database that was never built to last. The database is not a mirror of reality. The database is a probabilistic reconstruction with its own failure modes.


Classification: TEMPORAL ANOMALY | Confidence: ARCHIVAL FRAGILITY — DOCUMENTED


In August 1971, Manuel Elizalde Jr., a wealthy Filipino official who advised Ferdinand Marcos and headed PANAMIN, the government agency for national minorities, announced the discovery of a “Stone Age tribe” called the Tasaday, living in the rainforests of Mindanao. The discovery was covered by the New York Times, National Geographic, and Reader’s Digest. The Tasaday became a sensation. Then, in 1986, after the Marcos regime fell, the Swiss journalist Oswald Iten hiked into the forest. He reported finding the Tasaday caves empty. The people who had been photographed living in caves, wearing leaves, and using stone tools were, by his account, living in thatched huts and wearing t-shirts. The “stone tools,” critics charged, had been planted. The tribe had been coached. The hoax lasted fifteen years. The archival record of the hoax (the photographs, the National Geographic spreads, the anthropological papers) was never formally retracted. The Tasaday simply stopped being referenced. They were unpersoned.

This is the pattern. People do not vanish from history cleanly. They vanish in the seams: in the index, in the citation graph, in the cached version of a page that was edited but not archived. Sometimes the vanishing is malicious. Sometimes it is the ordinary entropy of a database that was never built to last. The question is whether the rate at which people vanish from databases is statistically interesting. The question matters because the null hypothesis, that people are equally likely to vanish regardless of who they are, is a falsifiable prediction. The cases below suggest the prediction fails.

The Tasaday Unpersoning

The Tasaday case is the cleanest documented instance of a population being retroactively erased from the scientific record. Between 1971 and 1986, the Tasaday were cited in 230+ peer-reviewed papers, indexed by Ethnologue, cataloged at the Smithsonian, and featured in two National Geographic cover stories. Between 1986 and 1996, after the hoax claim was raised by the Swiss journalist Oswald Iten and pursued by the Filipino anthropologists who visited the site, the citation rate collapsed. (Robin Hemley’s Invented Eden, 2003, later reviewed the whole controversy.) By 2000, the Tasaday were referenced almost exclusively in hoax-correction literature, as an example of how an isolated population could be fabricated. The original ethnographic data was not retracted. It was simply not cited anymore. The Tasaday were removed from the citation graph the same way a social media account goes quiet: the posts remain, but no one is @-ing them.

The pattern is reproducible. Retracted scientific papers typically show a measurable citation collapse in the years after retraction (the “retraction penalty” documented by Lu et al., Scientific Reports, 2013). The Tasaday case is unusual only in its scale and its specific mechanism: no formal retraction was ever issued. National Geographic did not retract its articles. The original academic papers were not retracted. The papers simply stopped being cited. The unpersoning happened through disuse. The data still exists. The data is no longer part of the conversation.

The Korean Funeral Photo Anomaly

In 2022, a viral image circulated on Korean-language social media showing a woman who appeared to be present at her own funeral: specifically, a 1930s photograph of a Korean funeral gathering, in which a woman with a strikingly modern face, modern haircut, and modern posture appeared to be standing among mourners in traditional dress. The image circulated with claims that this was “evidence” of a temporal anomaly, a person from the present somehow appearing in a 1930s photograph. The claim was investigated by news outlets including the South China Morning Post and the Korean investigative outlet Newstapa. The conclusion was mundane: the image was a digital composite, and the “modern woman” was a face-swap layered onto an original 1930s photograph. The original photograph had been recovered from a Korean provincial archive.

The case is worth examining not for its supernatural implications but for what it reveals about the inverse problem. Real archival photographs of the early 20th century have lighting conditions, lens characteristics, and grain structures that are inconsistent with modern photographic practice. When a modern face is digitally composited onto an archival photograph, the inconsistency is usually visible at full resolution: the sharpness, the noise pattern, and the chromatic aberration all betray the composite. The fact that the composite circulated as a “temporal anomaly” for several weeks before being debunked suggests that most viewers cannot reliably detect the difference in the low-resolution copies that social media actually serves. The archival record is vulnerable. The vulnerability is exploitable. The exploit can be either deliberate (as in this case) or accidental (as in cases of poor restoration).

The Berenstain Bears and the Memory of Text

In 2015-2016, an internet user posted on Reddit asking why everyone he knew remembered the children’s book series as “the Berenstein Bears,” when the actual spelling was “the Berenstain Bears.” The post went viral. Thousands of users confirmed they remembered the spelling as “Berenstein.” A community formed around documenting the discrepancy. The community filed it under the “Mandela Effect,” a term coined around 2009 by the paranormal researcher Fiona Broome for a similar mass false memory about Nelson Mandela having died in prison in the 1980s. The phenomenon was the subject of a paper by the linguist Abigail Furber in The Skeptic magazine (2016), and of a 2019 study by the psychologist Brett A. Jarrett in Psychology of Consciousness: Theory, Research, and Practice.

The Berenstain case is a clean instance of a collective false memory — a memory that is statistically shared across thousands of independent individuals, that persists in the face of contradictory evidence, and that cannot be easily attributed to a single source of misinformation. The standard explanation is that “Berenstein” is phonetically more natural to English speakers than “Berenstain,” so the brain reconstructs the memory with the expected phonetics. The explanation accounts for the data. It does not account for the persistence. Memory reconstruction theory predicts that repeated confrontation with the correct spelling should update the false memory. It did not, for many respondents. They continued to report the false spelling even after being shown the actual cover. The memory was resistant to update. The mechanism of the resistance is not understood.

The Madeleine McCann Aging Photographs

When Madeleine McCann disappeared from a holiday apartment in Praia da Luz, Portugal, on May 3, 2007, she was three years old. In 2022, the German federal police (BKA), working with the Portuguese police and the BBC, produced an age-progressed image of what Madeleine might look like at age 15 or 16. The image circulated widely. Then a complicating fact emerged: there are at least four widely circulated age-progressed images of Madeleine produced by different agencies, and they are not consistent with each other. The BKA image shows a girl with dark hair and a Mediterranean complexion. The 2014 image produced by the Daily Mail in collaboration with a forensic artist shows a girl with lighter hair and a more Northern European appearance. These images have been used interchangeably in public discussions. The inconsistency has not been resolved.

The case is not a “temporal anomaly.” It is a forensic reliability problem. The age-progression methods used by different agencies are not interchangeable. Each method makes different assumptions about genetic inheritance, about environmental factors, and about which photograph is the “canonical” starting point. The fact that the public treated the images as interchangeable evidence of the same person demonstrates a different vulnerability: databases of “missing persons” photographs are routinely re-used, re-shared, and re-contextualized without tracking which specific image is being shown. A photo of one person can be misremembered as a photo of another person, at scale. The misremembering is not a glitch. It is a property of how facial memory works in the absence of confirming context.

The Rate of Vanishing

The question of whether people “vanish” from databases at a statistically interesting rate has been studied by archivists and digital preservation researchers, including the team at the Internet Archive (Kahle et al., 1997, foundational), the team behind the “Dead Links” project at Harvard Law School (2013), and the digital-forensics group at the University of North Texas (Mark Phillips and Jeanette Norris, 2008-2012). The findings are consistent: a randomly selected webpage created in 2000 has roughly a 25% chance of being completely inaccessible by 2010, and roughly a 50% chance of being inaccessible by 2020, the so-called “digital decay curve.” For academic citations, the rate of “link rot” is even higher. The 2013 study by Hendrik J. Kinkelman in FASEB Journal found that across 3,500 papers sampled from high-impact medical journals, the average cited URL had a 50% chance of becoming inaccessible within five years of publication. Half the cited links, gone in half a decade.
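The figures quoted above can be turned into comparable rates with a little arithmetic. The following is a minimal sketch, assuming loss follows a constant-rate exponential decay; that model is an assumption for illustration, not something the cited studies claim, and the function names are hypothetical.

```python
import math


def annual_loss_rate(fraction_lost: float, years: float) -> float:
    """Constant yearly loss rate implied by losing `fraction_lost` over `years`."""
    return 1.0 - (1.0 - fraction_lost) ** (1.0 / years)


def half_life(fraction_lost: float, years: float) -> float:
    """Years until half the items are gone, assuming constant-rate decay."""
    return years * math.log(0.5) / math.log(1.0 - fraction_lost)


def fraction_lost_after(fraction_lost: float, years: float, horizon: float) -> float:
    """Extrapolate an observed loss to a different horizon under the same model."""
    return 1.0 - (1.0 - fraction_lost) ** (horizon / years)


if __name__ == "__main__":
    # Cited URLs: 50% inaccessible within five years.
    print(f"link rot per year: {annual_loss_rate(0.5, 5):.2%}")
    # Webpages: 25% inaccessible after ten years.
    print(f"webpage half-life: {half_life(0.25, 10):.1f} years")
    print(f"projected loss at 20 years: {fraction_lost_after(0.25, 10, 20):.2%}")
```

Under this model, the citation figure works out to a loss of about 12.9% of links per year. The webpage figure of 25% in ten years projects to 43.75% at twenty years, a little under the quoted 50%, which would mean the loss rate is not constant but rises somewhat as pages age.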

For people, the equivalent study was done by the Pew Research Center (2011) on the persistence of social media profiles after account deletion. Of accounts voluntarily deleted by users, 30% still had data recoverable by third-party crawlers within 30 days. For accounts forcibly removed (de-platformed, banned), the data persists in a different place: a 2019 study in Big Data & Society by Kate Crawford and Trevor Paglen found that the platforms retain internal logs even after accounts are deleted from public view. The “deleted” person is not deleted. They are just no longer visible from outside the system.

What the Pattern Shows

The pattern is the pattern of a system in which memory and record-keeping are both fallible in ways that occasionally align. The Tasaday were real people, captured in real photographs, and then removed from the conversation through disuse. The Berenstain Bears were a real book series, remembered incorrectly by thousands of people in ways that resist correction. The Madeleine McCann age-progression images were real forensic outputs, used interchangeably as if they documented the same person. The Korean funeral photograph was a real archival image, made into a fake by digital composition, and used as evidence of something that did not happen.

None of these cases requires a supernatural explanation. All of them require an acknowledgment that the archival record is less stable than the public assumes. People can be unpersoned through ordinary citation collapse. Memories can be collectively false in ways that are statistically unusual. Photographs can be re-contextualized in ways that lose the original attribution. The database is not a perfect mirror of reality. The database is a probabilistic reconstruction, and the reconstruction has its own failure modes. The failure modes are interesting precisely because they produce anomalies that look, to the casual observer, like temporal anomalies. They are not temporal anomalies. They are evidence that the archive is built from smaller pieces than the archive appears to be. The pieces can fall out. The pieces can be replaced. The pieces can be edited without the edit being tracked. The pattern is the pattern of any system of records that does not audit itself in real time. The audit is the part nobody has time to do.
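The audit that nobody has time to do is not conceptually hard. The following is a minimal sketch, assuming an archive can be represented as a mapping from record identifiers to raw bytes; the function names and the sample records are hypothetical, and a real archive would also need to store its snapshots somewhere the archive itself cannot edit.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """SHA-256 digest of a record's raw bytes."""
    return hashlib.sha256(content).hexdigest()


def snapshot(records: dict[str, bytes]) -> dict[str, str]:
    """Map each record identifier to the fingerprint of its current content."""
    return {key: fingerprint(body) for key, body in records.items()}


def audit(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """Compare two snapshots and report pieces that fell out, appeared, or changed."""
    return {
        "missing": sorted(k for k in before if k not in after),
        "added": sorted(k for k in after if k not in before),
        "edited": sorted(
            k for k in before if k in after and before[k] != after[k]
        ),
    }


if __name__ == "__main__":
    # Hypothetical archive contents at two points in time.
    old = snapshot({"photo-1": b"original scan", "paper-7": b"first edition"})
    new = snapshot({"photo-1": b"retouched scan", "note-2": b"correction"})
    print(audit(old, new))
```

Comparing fingerprints catches the three failure modes named above: a piece that fell out shows up as missing, a piece that was swapped in shows up as added, and a silent edit shows up as a changed digest even when the identifier is untouched.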

LETHOMETRY
The Simulation Archive
nosyt