Machine Learning and Destroying the Library of Alexandria

In my last post, you may have noticed I mentioned “evil” four times, but also mentioned “good” four times. Well, you’re in luck. After all that talk about evil and ethics, I want to share with you some good that’s been happening in the world.

But who can talk about goodness, without mentioning the dark circumstances “the machines” don’t want you to know about?

For those who aren’t familiar, the Library of Alexandria was a place of wonder, a holder of so much knowledge, documentation, and so much more. But what happened? It was DESTROYED.

In preparation for this topic, and because I wanted to mention some very specific library destructions over the years, I found this great source on Wikipedia so you can see just how much of our history has been lost.

Some notable events were:

  • ALL the artifacts, libraries, and more destroyed by ISIS
  • The 200+ years’ worth of artifacts, documents, and antiquities destroyed in the National Museum of Brazil fire
  • The very recent fire at Notre Dame, where the fires are hardly even out while this topic smolders within me
  • The comet that breaks apart and destroys a sleepy Japanese town every 1,200 years (OK, so this one’s from an anime movie, but natural disasters are disasters all the same.)

But how can machine learning help with this? Because I’m sure you all think “the machines” will cause the next level of catastrophe and destruction, right?

I’d like to introduce you to someone I’m honored to know, whose work has inspired growth and change, and can be used not only to preserve the past but also to enlighten the future.

This inspiration is Tkasasagi, who has been setting the ML world on fire with natural language processing and evolutionary changes to the translation of Ancient, Edo era, and cursive Hiragana.

To give you a sense of the significance of this, there’s a quote from last June, “If all Japanese literature and history researchers in the whole country help transcribing all pre-modern books in Japan, it will only take us 2000 years per persons to finish it.”

Let’s put that into perspective. There are countless tomes of knowledge, learning, and education that document the history and growth of Japan’s culture and nation, and Japan is an island nation in a region with some of the most active volcanoes and most frequent earthquakes in the world. It’s only a matter of time before more of this information is lost to natural disasters and the winds of time. But what can be done about this? How can it be preserved? That’s exactly the exciting piece that I’m so happy to share with you.

Here in the first epoch of this transcription project, machine learning does an OK job… but is it a complete job? Not in the least. Fast forward a few weeks, though, and the results are staggering (even if still nowhere near complete).

Images: https://twitter.com/tkasasagi/status/1036094001101692928 

Now, some of you may feel (justifiably so) that this is impressive growth in such a short amount of time, and I would agree. Not to mention that the model is working with greater than 99% accuracy at this point, which is remarkable in its own right.

Image: https://twitter.com/tkasasagi/status/1115862769612599296

But the story doesn’t end there—it continues literally day by day. (Feel free to follow Tkasasagi and learn about these adventures in real time.)

Every day, every little advancement in technologies like this, through natural language processing (NLP), computer vision (CV), and convolutional neural networks (CNNs), continues to grow the industry as a whole. You and I, as consumers of this technology, will eventually find our everyday activities easier, and one day these tools will just be seen as commonplace. For example, how many of you are using, or have used, the image translation function of Google Translate to display text in another language, or used WeChat’s conversion of Chinese into English or vice versa?

We are light-years beyond where we were just a few years ago, and efforts like these continue to make things better every day.

How was that for using our machines for good and not the darkest of evils? I’m excited—aren’t you?


Christopher Kusek is Xiologix’s CTO and manages the engineering organization. He and his team evaluate clients’ business and technical requirements and architect solutions that meet the clients’ tactical and strategic goals. Prior to Xiologix, Christopher spent three years at EMC, where he was the Global Lead for Cloud and Virtualization. In this position, he focused on the company’s relationship and integration with VMware, VCE, Storage, and Cloud Services. Christopher was also responsible for Product Management, Marketing, and Sales. Before that, he worked at NetApp as a Technology Evangelist and Principal Consultant responsible for pre-sales and post-sales hybrid consulting engagements surrounding virtualization, storage, Microsoft, and security solutions. Most recently, Christopher returned from spending two years responsible for theater-wide infrastructure operations for the war effort in Afghanistan. He is connected to current and future trends: a strategist who comprehends complex business and technology problems and uses his organizational and leadership skills to solve them. An industry-recognized expert, Christopher is an EMC Elect and VMware vExpert, as well as an accomplished speaker and author with five books published. With over 20 years of industry experience focused on creating innovative business solutions, Christopher’s expertise is integral to Xiologix’s ability to lead its clients to the right solution for their business.