AI Graphics

Student Uses AI To Decipher Ancient Greco-Roman Scroll, Wins $40K Prize (tomshardware.com) 47

Press2ToContinue writes: "An undergraduate student used an Nvidia GeForce GTX 1070 and AI to decipher a word in one of the Herculaneum scrolls to win a $40,000 prize (via Nvidia)," reports Tom's Hardware. "Herculaneum was covered in ash by the eruption of Mount Vesuvius, and the over 1,800 Herculaneum scrolls are one of the site's most famous artifacts."

The scrolls have been notoriously hard to decipher because they cannot be physically unrolled: they're basically sticks of charcoal. Instead they must be virtually unwrapped, using a 3D scan of each scroll in its rolled-up state. So, the task is to find the tiny bits of ink, assemble them into letters, and try to decipher what they say. Machine learning is now becoming the key that picks the lock. A student deciphered one of the words using a GTX 1070, which doesn't even have any tensor cores. Imagine what he could do with an RTX 4090!

  • Isn't this sort of thing what old-school image processing excels at? I.e., extracting faint images from noise. Obviously ML models are flavour of the moment but I wonder if it couldn't have been done using a boatload of maths instead.

    • by Samantha Wright ( 1324923 ) on Tuesday November 14, 2023 @05:59AM (#64004367) Homepage Journal

      First: you are erroneously hallucinating that there is a difference between "ML" and "a boatload of maths." Machine learning is a branch of statistics. It is the biggest boatload of maths.

      Second: the raw data looks like this [scrollprize.org]: researchers are looking for a "crackle" pattern indicative of where ink should be. Nothing is visible to the naked eye because the black carbon ink is chemically indistinguishable from the carbonized papyrus beneath.

      The picture in the article is misleadingly taken out of context: it shows the output of Farritor's model with coloured annotations manually added by linguists. No machines were involved in assigning identities to the letters, since there was so little information that needed parsing. Maybe someday an OCR algorithm will be developed for preprocessed Herculaneum scrolls, but there's no shortage of expert human labour, as most Classics departments haven't had a new manuscript to edit in a very long time.

      Farritor's model was trained on manually-labelled data; he identified the "crackle" patterns in smaller sections of the available images, as Casey had before him, and enlarged the training set until it was adequate to start finding new signals on its own. This is by far the least human-effort-intensive approach to the problem, and it still took many months to surface because of the extreme barrier to entry in terms of expertise.

      If there were another way, it would have been done by now.
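A rough way to picture the "enlarge the training set until the model finds new signals" loop described above is self-training. The sketch below is a toy stand-in, not Farritor's pipeline: it swaps the human labeller for an automatic confidence threshold, uses a nearest-centroid classifier instead of a neural network, and runs on synthetic 2D points rather than scan patches. All names (`grow_training_set`, `min_margin`, the blob data) are hypothetical.

```python
import numpy as np


def fit_centroids(X, y):
    """Mean feature vector per class (0 = no ink, 1 = ink)."""
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])


def predict_with_margin(centroids, X):
    """Nearest-centroid label plus a crude confidence: the gap between the two distances."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1), np.abs(dists[:, 0] - dists[:, 1])


def grow_training_set(X_lab, y_lab, X_unlab, min_margin=1.0, rounds=5):
    """Repeatedly refit, then move confidently classified samples into the labelled set."""
    for _ in range(rounds):
        if len(X_unlab) == 0:
            break
        centroids = fit_centroids(X_lab, y_lab)
        labels, margin = predict_with_margin(centroids, X_unlab)
        confident = margin >= min_margin
        if not confident.any():
            break
        X_lab = np.concatenate([X_lab, X_unlab[confident]])
        y_lab = np.concatenate([y_lab, labels[confident]])
        X_unlab = X_unlab[~confident]
    return fit_centroids(X_lab, y_lab), y_lab


# Synthetic stand-in data: two well-separated blobs playing "no ink" and "ink".
rng = np.random.default_rng(0)
no_ink = rng.normal(0.0, 0.5, size=(100, 2))
ink = rng.normal(4.0, 0.5, size=(100, 2))

# Only three hand-labelled seeds per class; everything else starts unlabelled.
X_seed = np.concatenate([no_ink[:3], ink[:3]])
y_seed = np.array([0, 0, 0, 1, 1, 1])
X_pool = np.concatenate([no_ink[3:], ink[3:]])

centroids, y_final = grow_training_set(X_seed, y_seed, X_pool)
labels, _ = predict_with_margin(centroids, np.array([[0.0, 0.0], [4.0, 4.0]]))
print(len(y_seed), "->", len(y_final), labels)
```

In the real project a person checked the model's output before anything new was added to the training data, which is the step the threshold here glosses over.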

      • Re: (Score:2, Informative)

        by Viol8 ( 599362 )

        "Machine learning is a branch of statistics. It is the biggest boatload of maths."

        Yes, I realise that, but the stats maths in ML is non-specific. I was referring to very specific image processing maths. Plus you could accurately say that computers themselves are just maths realised in hardware form, but it would completely miss the point.

        " Nothing is visible to the naked eye"

        When has that ever been an issue in the last 100+ years of IR, UV, xray and who knows what other imaging types?

        "If there were another w

        • by Samantha Wright ( 1324923 ) on Tuesday November 14, 2023 @06:46AM (#64004419) Homepage Journal

          Giant repositories of unreadable texts do not exist, as people tended to throw them away. A few exceptional archaeological discoveries have been spared; Dr. Seales, who perfected the scroll unrolling process, looked at a couple of them on the way to analysing the Herculaneum papyri. He has given exhaustingly long lectures on the topic [youtube.com] in the past. The En-Gedi scroll used iron-based ink, which made the actual analysis trivial once the unrolling was complete.

          The data being used in this case are X-ray spectrographs captured using a synchrotron. (A particle accelerator similar to the LHC.) This is the same technology used to perform crystallography on proteins, but tuned for a large object rather than millions or billions of copies of a single small molecule. This is powerful enough to reconstruct the scrolls on an atom-by-atom level, although it is not quite high enough resolution.

          I can't emphasise enough that after 2000 years, there is no chemical difference between the burnt papyrus of the page and the burnt pine resin of the ink. All that exists in the physical object are patterns of how the stylus deformed the papyrus during the writing process, and how the ink caused the fibrous structure to change as it dried. This is why many letters are damaged or left no trace at all.

          Anyway. Image processing as a field is no longer a major topic that has large research grants behind it, and the experts who know the techniques are ageing. The field as a whole was basically killed dead in 2012, after the AlexNet [wikipedia.org] model (a neural network) grossly outperformed the state of the art on the ImageNet challenge. I wouldn't go so far as saying all the researchers got sacked, but they definitely had to make a hard turn into new areas of research to keep their jobs.

          This was an example of "the bitter lesson [incompleteideas.net]": There is no point in hand-crafting a large algorithm using expert knowledge when you can train an AI model to do the job 95% as well with 1% of the development time. Since expert knowledge of the data doesn't exist (and would take many researchers decades of work to divine, researchers who no longer even exist), there isn't a practical alternative.

          • by jd ( 1658 )

            The issue, surely, is not whether it is 95% as good with 1% of the effort, but whether it can do better than hand-turned with less than 100% the effort or whether 95-99% as good is as good as it gets.

            Yes, 95% as good is fine for something like the Archimedes Palimpsest or a damaged cuneiform tablet. In fact, it's more than fine for the tablets, because we've got so many and very few are digitised in a useful way. After ISIS destroyed large numbers of tablets, we've seen the need to do bulk recording and tra

      • by gosso920 ( 6330142 ) on Tuesday November 14, 2023 @10:23AM (#64004811)
        lorem ipsum ovaltine
    • by cowdung ( 702933 )

      So a classic way of solving this is to systematically throw convolutions or cross-correlations at the data until you see something visible. Basically to search the space of convolution/cross-correlation functions until you find one that fixes the data.

      Well, guess what? That's exactly what a convolutional neural network does.
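To make the cross-correlation idea above concrete, here is a minimal NumPy sketch: slide one hand-picked kernel over a noisy image and look for the strongest response. Everything in it is an assumption for illustration (the cross-shaped pattern, the noise level, the helper names). A convolutional network differs in that it learns many such kernels by gradient descent instead of having one chosen by hand, and this is not the model used on the scrolls.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def cross_correlate2d(image, kernel):
    """Valid-mode 2D cross-correlation: slide the kernel over the image without flipping it."""
    windows = sliding_window_view(image, kernel.shape)  # shape (H-kh+1, W-kw+1, kh, kw)
    return np.einsum("ijkl,kl->ij", windows, kernel)


def make_cross(size=5):
    """Hypothetical target pattern: a plus sign of ones on a zero background."""
    pattern = np.zeros((size, size))
    pattern[size // 2, :] = 1.0
    pattern[:, size // 2] = 1.0
    return pattern


def find_pattern(image, pattern):
    """Return the top-left corner of the best match and the full response map."""
    kernel = pattern - pattern.mean()  # zero-mean so flat regions give no response
    response = cross_correlate2d(image, kernel)
    peak = np.unravel_index(np.argmax(response), response.shape)
    return tuple(int(i) for i in peak), response


# Synthetic test image: Gaussian noise with the pattern added at row 10, column 17.
rng = np.random.default_rng(0)
image = rng.normal(0.0, 0.1, size=(32, 32))
pattern = make_cross()
image[10:15, 17:22] += pattern

peak, response = find_pattern(image, pattern)
print(peak)
```

Searching "the space of convolution functions" by hand would mean repeating this with many candidate kernels; training a network automates that search.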

  • by carnivore302 ( 708545 ) on Tuesday November 14, 2023 @05:22AM (#64004331) Journal

    Imagine what he could do with a beowulf cluster of RTX 4090s!

  • How do they know the translation is the correct translation? Just take the AI's word?
    • Doesn't the article mean that rather than translate the word, the AI was used to recognise what word it was? The translation that has been applied seems to be of earlier origin, probably from other, more legible sources.
      • Re:How do they know? (Score:4, Informative)

        by a5y ( 938871 ) on Tuesday November 14, 2023 @06:11AM (#64004377)

        The Tom's Hardware coverage links an Nvidia blog - https://blogs.nvidia.com/blog/... [nvidia.com] - containing:

        His achievement in identifying 10 letters within a small patch of scroll earned him a $40,000 prize.

        Yeah, it's reading the characters, not understanding the meaning of the word. That definition was evidently known in 2016, because "purple dye" is on the earliest draft of the Wiktionary Ancient Greek page for the word. Ancient Greek isn't a lost language; wasn't knowledge of it the reason the Rosetta Stone could be used to understand hieroglyphics?

        The Tom's coverage, to be generous, played a game of telephone with the original context in a way that *some might think* that **an AI** went and **understood** the meaning of something. But the software was a machine learning algorithm and it didn't understand meaning any more than an app scanning handwriting and turning it into plaintext did.

        This buried the accomplishment and the resourcefulness of the winners (there are multiple related prizes, some still not won).

        • Re:How do they know? (Score:4, Informative)

          by Samantha Wright ( 1324923 ) on Tuesday November 14, 2023 @06:48AM (#64004423) Homepage Journal

          It's not reading the characters; those boxes were drawn in by classicists. Rather it's generating the black and white image underneath; see my comments here [slashdot.org] and here [slashdot.org].

        • by HBI ( 10338492 )

          The Rosetta stone contains Ancient Greek, and Ancient Egyptian written in demotic script and hieroglyphics.

          The demotic script resembles an ancestor form of modern Coptic, so knowledge of that was used to decipher it alongside the Greek, and the hieroglyphics were then assumed to be a translation of same.

          Ironic that our ability to read these languages essentially derives from the equivalent of a modern government notice done in multiple languages.

        • by Sique ( 173459 )
          More so, Ancient Greek has been passed down through the millennia by scribes copying many pieces of general interest, like Homer's Iliad and Odyssey, Sophocles' plays, Classical Greek philosophy, the translation of the Old Testament from Hebrew into Greek (the so-called Septuagint), and the New Testament, which was written in Ancient Greek to begin with. Ancient Greek was never forgotten. There were always people through the ages who could read and understand Ancient Greek.

          As a caveat: Between Homer an

          • by HiThere ( 15173 )

            Did Greek really change as much as English did? I sort of doubt it. Not being able to read either, I can't be absolutely sure, but Greece was never conquered by a foreign invader speaking a very different language. And languages don't change at the same rate even when undisturbed by externalities.

            • by Sique ( 173459 )
              As there was no "Greece" until 1822, this is somehow true.

              At first, today's Greece was settled in several waves by different Greek speaking tribes, like the Achaians, the Dorians, the Ionians or the Aiolians - each with a different dialect of Ancient Greek. Homer's Greek for instance was the Dorian dialect.

              Then the Greek cities (poleis) were under constant threat of conquest while at the same time expanding all the time by founding colonies. There were many Greek cities for instance conquered by the Pers

              • by HiThere ( 15173 )

                Yes, but after the Dorian invasion, I believe that all the conquests (in Greece) until the Romans were by those speaking approximately the same language. (Like German and Austrian, not really the same, but not that different. E.g. I believe that they could all understand Homer without translation.) It's not like the Persians had been successful. (I think that's comparable with French and Anglo-Saxon, though possibly not Norman French.)

            • Chaucer postdates the Norman Conquest so the differences between his English and present-day English have nothing to do with the conquest. Homer's Greek is, to begin with, a different dialect from the Attic dialect with which the most familiar Classical Greek materials are written. There were some substantial changes too, but they are not enormous. The Greek of the New Testament is what is known as Koine, a somewhat simplified version of Classical Attic that had become the lingua franca of much of the Easte
  • porphyras (Score:4, Interesting)

    by excelsior_gr ( 969383 ) on Tuesday November 14, 2023 @05:31AM (#64004345)
    In case you were wondering, the word was "porphyras", which the article says means purple. The Greek Wiktionary displays #b80049. For what it's worth, modern Greek people would probably describe porphyra as a shade of red (deep red, royal red) rather than purple, but I digress. The article also shows the original text, in which I would have a hard time reading "porphyras". The "s" at the end especially, and the second half of the word in general, are pretty much a guess, I would say. The AI markings next to each letter also seem to indicate that, although they also make it harder for us to actually see the text.
    • Time for LLMs to do the rest of the work and fill in the blanks with synthesized text, since we're all into probability and statistics now. Corpus not big enough? Augment it with synthesised text, to make a bigger corpus.
    • Re:porphyras (Score:5, Informative)

      by Samantha Wright ( 1324923 ) on Tuesday November 14, 2023 @06:26AM (#64004389) Homepage Journal

      Here are a few things that the article couldn't hope to be able to tell you (and that are somewhat absent from scrollprize.org also):

      - "Porphyry" exists in English as the typical word for the red rocks called porphyrite (using old-school greekscii, I think it would be written porqurith) in Greek and Latin, so it's not entirely unknown. The colour is also familiar to us as "Tyrian purple," although many armchair classicists have no idea that our modern concept of purple originated after several breakthroughs in dyes and pigments in the 19th century. This is the colour beloved by Roman emperors and aristocrats that required immense numbers of sea snails to produce. (It was eventually replaced with cheaper red dyes.)

      - The actual source image looks more like this [scrollprize.org]. Luke Farritor produced the black and white base image. The coloured boxes were added by linguists, not Farritor's model.

      - The Herculaneum scrolls come from the private "working library" of an Epicurean philosopher, Philodemus. Most of them are different drafts of his writing. He was a Greek who had emigrated to Rome, and lived in the 1st century BC. The library seems to have been left untouched for decades before the eruption.

      He appears to have adopted what we barbaroi call early Roman cursive [wikipedia.org], which has many differences from how Greek was written by professional scribes elsewhere in the Empire. In particular, "A" often just looked like a lambda, and "R" had many strange shapes, usually looking somewhat like a "C" or "T" with the bottom extending far below the baseline, and has more in common with the letter "r" than the letter "R".

      Once you understand these peculiarities, it looks like the word is written with a mixture of Greek and Latin letters, which is par for the course when dealing with old texts that weren't written by professional scribes for a wealthy client.

  • Doesn't decypher mean learning the meaning of something that wasn't known before (or rediscovering what was once known and then forgotten)? The decoding of something that is in a code? I know not all codes are secret but if it's not secret or mysterious... that's reading.

    I went to check and see if this is a word that has never been understood before (which seems massively unlikely to me, ancient greek on scrolls in a roman ruin... ancient greek not a forgotten language). Scans tech from 2019 was predated by

    • by Entrope ( 68843 )

      First, the word is spelled "decipher".

      Second, it means "To read or interpret (ambiguous, obscure, or illegible matter)." A secondary meaning is "To convert from a code or cipher to plaintext; decode." The first meaning is entirely appropriate here because of how damaged the scroll is.

      • First, the word is spelled "decipher".

        It is lovely that you can write in American, but you have apparently not heard of English. They are closely related languages with some distinct differences in spelling.

    • In this case, the issue isn't the definition of the word; it is picking it out from clusters of ink fragments in a 3D scan of an ancient rolled-up document.

      My concern would be more that the 'AI' is suffering from digital pareidolia. The only way to be sure is to have a human look at the output and compare it to the original... and even then you'd want a lot of surrounding text deciphered to ensure the suggested word is appropriate in context.

  • It could decipher doctor's handwritten prescriptions, even upside-down!

  • . Imagine what he could do with a RTX 4090!

    No, imagine what he could do with a beowulf cluster of 4090s!

  • And why does this feel like a free advertisement for Nvidia?
  • The word? (Score:4, Funny)

    by RoccamOccam ( 953524 ) on Tuesday November 14, 2023 @08:27AM (#64004527)
    And the word was "Ovaltine".
  • by kalpol ( 714519 ) on Tuesday November 14, 2023 @09:19AM (#64004639)
    A Latin professor taught a class I was in, and he showed me one day his work on the Herculaneum scrolls. At the time they were still trying to unroll them, so they had a lot of fragments, and were using things like X-ray spectroscopy and electron microscopes to identify the bits with ink then reassemble them, like the world's worst jigsaw puzzle. There were all kinds of complicated attempts to unroll the scrolls, ranging from looms with strings and glue to just picking them slowly apart, but then the 3D scanning came along to show some amazing results.
