How AI Is Unlocking Herculaneum’s Unread Scrolls

The Cruel Bargain That Lasted Two Millennia

When Mount Vesuvius erupted in 79 AD, it buried the Roman town of Herculaneum under a superheated surge of volcanic gas and ash. The temperature was high enough to carbonize everything organic — wood, food, human tissue, and papyrus. For the scrolls stored in what archaeologists now call the Villa of the Papyri, this created one of history’s most perverse preservation paradoxes. The eruption that should have destroyed them instead froze them in time. But it did so by turning them into objects that crumble at the touch.

From the moment the scrolls were unearthed in the eighteenth century, every scholar who wanted to read a Herculaneum papyrus faced the same impossible trade-off: open the scroll and destroy it, or leave it sealed and learn nothing. Early attempts to physically unroll the carbonized papyri, some using a machine invented by Antonio Piaggio, recovered fragments of text but sacrificed much of what they touched. The process was irreversible. Every word gained came at the cost of structural damage that could never be undone.

The result is a silence that has distorted the entire historical record of ancient philosophy and literature. Hundreds of intact scrolls remain unread. The villa’s library appears to have concentrated heavily on Epicurean philosophy, particularly the works of Philodemus of Gadara, but scholars cannot confirm what else the collection holds because the texts are still sealed inside their charred cylinders. Lost works by Aristotle, Sappho, or early Roman historians could sit within that collection. No one knows.

This was never a problem that better excavation techniques or more careful handling could solve. The physical fragility of carbonized papyrus is not a challenge of patience or precision — it is a constraint imposed by chemistry and physics. No scalpel or brush, no matter how skilled the hand guiding it, can separate layers of ancient plant fiber that have fused into carbon without causing damage. The ancient Greek and Roman texts locked inside these scrolls were not waiting for a more careful archaeologist. They were waiting for a method that never required touching them at all.

How the Team Actually Did It: Virtual Unwrapping Explained

The technique behind reading PHerc. 1667, known as virtual unwrapping, never requires opening the scroll. Researchers first scan the rolled papyrus using high-resolution X-ray computed tomography, building a precise three-dimensional digital map of every compressed layer inside. Software then computationally peels those layers apart, producing a flat, readable surface from an object that hasn’t been opened in roughly two millennia. The physical scroll stays sealed throughout, so the reading itself involves no unrolling and no irreversible decisions.
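The geometric idea behind "peeling layers apart" can be illustrated with a toy model. The sketch below is not the team's code: it assumes an idealized two-dimensional cross-section in which the sheet is wound as a perfect Archimedean spiral, and every name in it (`scan`, `unwrap`, `sheet_value`, the constants) is hypothetical. Real scrolls are warped and crushed, and real pipelines trace surfaces through a three-dimensional volume, so this only shows the core move: walk along the rolled sheet at equal steps of length and read off the scan value at each point.

```python
import math

PITCH = 1.0                    # radial gap between windings (arbitrary units, assumed)
B = PITCH / (2 * math.pi)      # spiral equation: r = B * theta
AIR, PAPYRUS, INK = 0.0, 0.5, 1.0

def arc_length(theta):
    """Closed-form length of the spiral r = B * theta from 0 to theta."""
    return 0.5 * B * (theta * math.sqrt(1 + theta * theta) + math.asinh(theta))

def sheet_value(s):
    """Hypothetical 'writing' on the flat sheet: ink on alternating unit-length bands."""
    return INK if int(s) % 2 == 0 else PAPYRUS

def scan(x, y):
    """Synthetic CT slice: what the rolled-up sheet looks like at point (x, y)."""
    r = math.hypot(x, y)
    angle = math.atan2(y, x) % (2 * math.pi)
    k = round((r - B * angle) / PITCH)       # index of the nearest winding
    if k < 0:
        return AIR
    theta = angle + 2 * math.pi * k
    if abs(r - B * theta) > 0.1 * PITCH:     # point lies between windings
        return AIR
    return sheet_value(arc_length(theta))

def unwrap(scan_fn, total_length, step=0.25):
    """Sample the scan along the spiral at equal arc-length steps."""
    strip, theta, s_target = [], 0.0, step / 2
    while s_target < total_length:
        lo, hi = theta, theta + 1.0
        while arc_length(hi) < s_target:     # bracket the target length
            hi += 1.0
        for _ in range(50):                  # bisection on theta
            mid = (lo + hi) / 2
            if arc_length(mid) < s_target:
                lo = mid
            else:
                hi = mid
        theta = hi
        r = B * theta
        strip.append(scan_fn(r * math.cos(theta), r * math.sin(theta)))
        s_target += step
    return strip

strip = unwrap(scan, total_length=20.0)
print("".join("#" if v == INK else "." for v in strip))
```

The printed strip shows the alternating bands laid out flat again, even though `unwrap` only ever queried the rolled-up `scan`. The hard part in practice is that the spiral is not known in advance and has to be traced from the data.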

The harder problem is detecting the ink once the layers are separated. Ancient Herculaneum scribes wrote with carbon-based ink, and the eruption of Vesuvius carbonized the papyrus itself. The result: ink and writing surface are nearly identical in density, making them almost indistinguishable in standard X-ray imaging. The team trained machine learning models specifically to find the faint textural and surface differences that betray where ink sits on papyrus — differences invisible to conventional scanning analysis but learnable from enough training data.
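To make the "same density, different texture" point concrete, here is a deliberately tiny sketch. It assumes, purely for illustration, that inked and blank patches share the same average scan value but that inked patches are rougher; the data is synthetic and the classifier is a one-feature threshold, not the machine learning models the team trained. All function names are hypothetical.

```python
import random
import statistics

random.seed(0)  # reproducible synthetic data

def make_patch(has_ink, n=64):
    """Hypothetical patch of scan values: identical mean density, rougher where ink sits."""
    spread = 0.06 if has_ink else 0.02
    return [random.gauss(0.5, spread) for _ in range(n)]

def make_dataset(count):
    """Alternate blank and inked patches, each paired with its label."""
    labels = [i % 2 == 1 for i in range(count)]
    return [(make_patch(lbl), lbl) for lbl in labels]

def fit_threshold(data, feature):
    """Learn a one-feature rule: cut at the midpoint between the two class averages."""
    ink = [feature(p) for p, lbl in data if lbl]
    blank = [feature(p) for p, lbl in data if not lbl]
    mu_ink, mu_blank = statistics.fmean(ink), statistics.fmean(blank)
    cut = (mu_ink + mu_blank) / 2
    ink_is_high = mu_ink > mu_blank
    return lambda patch: (feature(patch) > cut) == ink_is_high

def accuracy(predict, data):
    return sum(predict(p) == lbl for p, lbl in data) / len(data)

train, test = make_dataset(400), make_dataset(200)
by_density = fit_threshold(train, statistics.fmean)    # average brightness only
by_texture = fit_threshold(train, statistics.pstdev)   # local roughness

density_acc = accuracy(by_density, test)
texture_acc = accuracy(by_texture, test)
print("density-only accuracy:", density_acc)
print("texture accuracy:     ", texture_acc)
```

On this toy data the density rule has almost nothing to work with, while the texture rule separates the classes cleanly. Real ink signals are far subtler than a single roughness number, which is why the problem called for learned models and labeled training data.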

That four-stage pipeline — CT scan, layer segmentation, AI ink detection, text rendering — has now been completed on an entire rolled scroll from start to finish. Previous work demonstrated pieces of the process on fragments. This is the first time every stage has run end-to-end on a complete, intact Herculaneum papyrus. That distinction matters enormously, and most coverage of the announcement has glossed over it. Fragments offer controlled conditions; a full scroll introduces deformations, damaged sections, and inconsistencies across its entire length. Solving it completely is a different category of achievement.

The team released everything publicly: the preprint, the raw CT data at scrollprize.org, and the full codebase on GitHub. Any research group with the computational resources can now replicate the Herculaneum scroll reading pipeline, test it against other carbonized papyri, or improve the ink detection models directly. The bottleneck for unlocking the remaining unopened scrolls from the Villa of the Papyri is no longer physical access or archaeological technique. It is compute power, model refinement, and the researchers willing to run the pipeline — which is exactly why the open release changes the timeline for what comes next.

What Was Actually Inside: The Missing Context on the Content

Most coverage of the Herculaneum papyrus breakthrough fixates on the scanning technology and the machine learning pipeline. That’s understandable — the feat is genuinely extraordinary. But it buries what classical scholars care about most: what PHerc. 1667 actually says.

The scroll contains a Greek philosophical text, which fits a library that skews heavily Epicurean. The Villa of the Papyri is widely believed to have belonged to Lucius Calpurnius Piso Caesoninus, Julius Caesar’s father-in-law and a known patron of the Epicurean philosopher Philodemus of Gadara. A significant portion of the scrolls already identified belong to Philodemus himself, covering ethics, rhetoric, music, and the nature of the gods. Every newly readable scroll from this collection has the potential to expand or correct the record on Epicurean philosophy as it circulated in the Roman world during the late Republic.

That matters because Epicurean texts survived antiquity mostly in fragments and secondhand accounts. Lucretius’s De Rerum Natura is the great exception: a complete Epicurean work that made it through the medieval manuscript tradition. The Herculaneum papyri represent a direct, unmediated archive from the philosophical tradition itself, buried before the centuries of copying errors, selective preservation, and deliberate destruction that erased so much ancient writing.

Beyond Philodemus, scholars have long held out hope that the villa’s library contained works by authors whose writings are largely or entirely lost: Ennius, Varius Rufus, Sappho in fuller form, or Greek prose writers known only by title. The collection is estimated to hold up to 800 scrolls still unread. Each one is a sealed question. The digital unwrapping of PHerc. 1667 demonstrated that those questions can now be answered without physically opening the papyrus.

The content of PHerc. 1667 is not a curious footnote to a tech story. It is the point. The hundreds of carbonized rolls still waiting are potential recoveries of ancient thought presumed gone for two millennia.

The Vesuvius Challenge: How Open Prizes Accelerated the Science

The Vesuvius Challenge didn’t emerge from a university consortium or a government grant. It was a prize competition — structured deliberately to pull machine learning engineers, computer vision researchers, and independent developers into a problem that classical scholarship had stalled on for two centuries.

The organizers published CT scan data openly at scrollprize.org and hosted all code on GitHub. That single decision transformed the Herculaneum papyri from a walled academic problem into an open engineering target. Developers anywhere could download the raw volumetric scan data, run their own models, and submit results. The barrier dropped from “get institutional access” to “have a laptop and an idea.”

Prize milestones drove the pace. The competition awarded money for incremental progress — first letters read, then columns, then larger portions of text — which meant contributors didn’t need to solve the entire virtual unwrapping problem before getting feedback or recognition. That milestone structure compressed years of potential delay into months of parallel experimentation across a global contributor base.

The result was a complete reading of PHerc. 1667, the scroll the community tracked as Scroll 4, from end to end and without physical contact. It is the first Herculaneum scroll ever fully deciphered by digital means. The preprint, the data, and the unwrapping code are all publicly available.

Whether coverage of the Herculaneum discovery dwells on the scanning technology or on ancient philosophy and Roman history, the structural story is receiving almost no attention: open data plus open code plus prize incentives equals accelerated breakthrough. That model is directly transferable. Damaged manuscripts, sealed archive collections, degraded inscriptions, and unread cuneiform tablets all share the same bottleneck the Herculaneum scrolls had: no scalable method for reading them. The Vesuvius Challenge demonstrated that crowdsourcing AI talent through transparent prize competitions can, at least in this case, crack that kind of problem faster than traditional grant-funded research pipelines had managed.

The ancient text recovery field now has a working template. Whether institutions holding other locked archives choose to use it is a policy and funding question, not a technical one.

What Comes Next: AI Is Now the Bottleneck, Not the Scrolls

The successful reading of PHerc. 1667 shifted the problem entirely. Hundreds of Herculaneum scrolls remain sealed in collections in Naples, Oxford, and Paris, and the physical barrier to reading them no longer exists. The CT scanning method works. Virtual unwrapping works. What remains is an engineering and machine learning challenge, and that is a fundamentally more tractable problem than the one that defeated more than two centuries of attempts to physically unroll carbonized papyrus.

The immediate technical hurdles are specific and solvable. Ink-detection models need refinement to handle scrolls where the carbon-based ink offers even less contrast against the darkened papyrus surface. Segmentation pipelines — the algorithms that reconstruct the geometry of each rolled layer from scan data — must scale to handle scrolls more severely compressed or deformed than Scroll 4. Some papyri in the Herculaneum library suffered heavier damage from the Vesuvius eruption and present irregular, collapsed structures that current models struggle to process accurately. These are known failure modes, not unknown mysteries, which means researchers can target them directly.
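The compression failure mode is easy to see in a one-dimensional toy. The sketch below assumes a hypothetical intensity profile sampled along a single ray from the scroll's center outward, with papyrus layers appearing as bright runs and the gaps between them as dark ones. It is an illustration of why crushed regions are hard, not a description of the segmentation algorithms actually used, which work on the full three-dimensional volume.

```python
def find_layers(profile, threshold=0.5):
    """Return (start, end) index pairs for each run of samples brighter than threshold."""
    layers, start = [], None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i                       # entering a bright run
        elif value <= threshold and start is not None:
            layers.append((start, i))       # leaving a bright run
            start = None
    if start is not None:                   # profile ended inside a run
        layers.append((start, len(profile)))
    return layers

def make_profile(n_layers, gap):
    """Hypothetical radial profile: each layer is 2 bright samples followed by `gap` dark ones."""
    profile = []
    for _ in range(n_layers):
        profile += [0.9, 0.9] + [0.1] * gap
    return profile

well_separated = find_layers(make_profile(5, gap=3))
crushed = find_layers(make_profile(5, gap=0))
print(len(well_separated))  # 5
print(len(crushed))         # 1
```

With gaps between windings, the five layers are counted correctly. With the gaps squeezed to nothing, the same five layers merge into one bright block, and a simple threshold can no longer tell where one sheet ends and the next begins.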

The open-source nature of the Vesuvius Challenge accelerates this work. Scroll data is publicly available, and the codebase is on GitHub, meaning independent AI researchers and computer vision specialists can contribute improvements without waiting for institutional access or archaeological permits. The competitive prize structure that produced the original breakthrough remains in place to incentivize continued model development.

The broader implication reaches well past Herculaneum. Virtual unwrapping using X-ray tomography applies, in principle, to any sealed or fragile ancient document: damaged medieval manuscripts, sealed letters recovered from archaeological sites, carbonized materials from other volcanic events. The pipeline developed for the Herculaneum papyri offers a general approach to recovering text from objects that cannot be safely opened. Classical scholars have long maintained lists of lost works (plays by Sophocles, treatises by Aristotle, Epicurean texts known only by title) with the understanding that recovery was impossible. That assumption no longer holds. The constraint now is how fast the AI models improve, not how many scrolls survive.

Why This Moment Is Different From Previous ‘Breakthrough’ Claims

Readers who follow archaeology news have seen the “ancient text decoded” headline before. Fragments recovered, a few legible lines extracted, cautious excitement from classicists — then silence. The Herculaneum scrolls alone have generated that cycle repeatedly over the past two decades, as researchers coaxed partial results from damaged papyri using multispectral imaging and early CT scanning techniques. Each announcement was real, but none completed the loop.

PHerc. 1667 is different in a way that matters technically, not just rhetorically. The Vesuvius Challenge team didn’t recover a passage — they virtually unwrapped and read an entire Herculaneum papyrus, end to end, without physically opening it. That distinction separates a proof of concept from a working pipeline. A pipeline can be run again. On the next scroll, and the one after that.

The second distinction is how the result was released. The full dataset sits openly at scrollprize.org/data. The code is on GitHub. The methodology is documented in a published preprint. Any research group with the computational resources can download the data, inspect every processing step, and attempt to reproduce or improve on what was done. Previous announcements in this space — including some that generated significant press coverage — relied on proprietary workflows or restricted access, which made independent verification impossible and adoption slow. Open publication changes the incentive structure entirely: improvements now compound publicly rather than sitting inside a single institution.

For anyone calibrating how seriously to take this against past Herculaneum scroll stories, the honest measure is this: virtual unwrapping of carbonized papyrus scrolls has moved from a technique that occasionally surfaces fragments to a technique that reads complete ancient manuscripts. The Epicurean philosophical text recovered from PHerc. 1667 is the content result. The replicable, open, end-to-end process is the infrastructural result — and the second one is what makes the remaining scrolls a queue rather than a mystery.
