Scientists are using machine learning to unlock the mysteries of long-dead languages
A cuneiform tablet on show in Jerusalem's Bible Lands Museum. Image: REUTERS/Baz Ratner
As home to many of the most important ancient civilizations on Earth, the lands around the Mediterranean hold enormous historical significance.
As the Romans, ancient Greeks and Egyptians built and expanded their empires across the region, they laid some of the great scientific and cultural foundations on which modern civilization is built.
Predating both the Romans and the ancient Greeks, the civilizations of ancient Mesopotamia arguably made just as many important contributions to later societies, culture and science.
And now, thanks to machine learning, researchers are speeding up the translation of the texts these lesser-known cultures left behind.
“The influence that Mesopotamia has on our own culture is something that people don’t know much about,” says Émilie Pagé-Perron, coordinator of the MTAAC project (Machine Translation and Automated Analysis of Cuneiform Languages). With research funding from the Digging into Data Challenge, the project is using 21st-century technology to explore Mesopotamian cuneiform texts from the 21st century BC.
Cracking cuneiform
Mesopotamia was located in what is now Iraq, Kuwait and parts of Turkey, Syria and Iran. In the fourth and third millennia BC it was home to a number of overlapping civilizations, which conceived important scientific concepts and technologies, including astrology, the 60-minute hour and metalworking.
One of ancient Mesopotamia's most influential civilizations, the Sumerians, gave the world one of the earliest writing systems. The distinctive cuneiform (wedge-shaped) script was adapted from a series of earlier pictograms and was pressed into soft clay tablets using a reed stylus.
Although cuneiform passed to other Mesopotamian cultures, which refined and altered it to suit their own languages and dialects, knowledge of how to read and write the various cuneiform scripts was gradually lost.
In the 19th century, scholars deciphered the writing system, and in 1872 the Assyriologist George Smith translated the most famous example of cuneiform, the Epic of Gilgamesh, a 4,000-year-old poem widely believed to be the earliest surviving great work of literature.
Unfortunately, translating cuneiform tablets is still a time-consuming process, and very few modern scholars are able to do it. Part of the difficulty is that Sumerian is what is known as a "language isolate": it has no genealogical relationship to any other known language, so no living relative can serve as a guide.
But modern technology has given researchers new hope of unravelling the script imprinted on the roughly 300,000 cuneiform tablets discovered to date, of which only around 10% have been translated so far.
Back from the dead
The cuneiform tablets translated so far using conventional methods have provided an insight into everyday life in ancient Mesopotamia. The records include legal and scientific documents, financial accounts and beer recipes, as well as creative works such as the Epic of Gilgamesh.
The MTAAC project aims to scan and translate 67,000 cuneiform administrative texts using machine learning and neural machine translation technologies.
The highly standardized documents earmarked for the project offer an ideal opportunity to train machine learning algorithms on cuneiform script and to shed light on its intricacies, many of which still elude scholars.
This should then lay the foundations for translating more complex cuneiform texts in the future and, ideally, strengthen methodologies that could be applied to other ancient languages.
Back to Babylon
There is already a drive to bring back spoken Babylonian, and it would not be the first time a language has been brought back from the dead. Before its revival in the late 19th century, Hebrew had not been a spoken mother tongue for well over 1,000 years and was limited to use in religious contexts.
While algorithms cannot yet decode meaning perfectly, they are making rapid progress in translating text and even real-time speech. Their potential to help decipher long-lost languages could tell us much about the great civilizations of the past.
License and Republishing
World Economic Forum articles may be republished in accordance with the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License, and in accordance with our Terms of Use.
The views expressed in this article are those of the author alone and not the World Economic Forum.