How sequence-to-sequence learning changed the translation landscape
We all know and have used Google Translate. To translate a phrase from a book. To understand the lyrics of a song from a movie. Or to finally grasp the meaning of carpe diem. And you have probably noticed something: it keeps improving every year. What changed?
The answer is deep learning.
Deep learning has rewritten our approach to machine translation. It has enabled researchers who know almost nothing about language translation to put together relatively simple machine learning solutions that beat the best systems on the market, all thanks to sequence-to-sequence learning. The very same approach powers AI chatbots and image captioning.
It’s very powerful for problem-solving. Let’s unpack this.
How did we get here?
Here’s how it started. The simplest approach is to replace every word in a sentence with its dictionary equivalent in the target language. Say, “candy” becomes “caramelo” in Spanish. But this method ignores grammar and context, which is a problem because “caramelo” can also mean caramel. So if you translate the phrase “Amo el caramelo” word for word, it’s a coin toss whether it means “I love candy” or “I love caramel.”
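To make that concrete, here is a minimal sketch in Python. The three-word dictionary is made up for this example (no real system ships it); the point is how little word-for-word replacement actually does, and why the ambiguity never even registers:

```python
# Word-for-word replacement with a toy Spanish-to-English dictionary.
# Every word maps to exactly one translation, so the candy/caramel
# ambiguity is decided once, in the dictionary, and never revisited.
ES_TO_EN = {
    "amo": "I love",
    "el": "the",
    "caramelo": "candy",   # could just as well be "caramel" -- the table can't tell
}

def word_for_word(sentence: str) -> str:
    """Replace each word independently, ignoring grammar and context."""
    return " ".join(ES_TO_EN.get(w, w) for w in sentence.lower().split())

print(word_for_word("Amo el caramelo"))
# -> "I love the candy", even when the speaker meant caramel
```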
So we developed language-specific rules: treat common two-word phrases as a single unit, swap the order of nouns and adjectives depending on the language, and so on. It worked better, but the rules became unmanageable on lengthier real-world documents.
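As an illustration of what one such rule might look like, here is a hedged sketch with hand-supplied part-of-speech tags (a toy example, not an actual rule engine): it swaps noun-adjective pairs, since Spanish usually places the adjective after the noun while English places it before.

```python
# One hand-written reordering rule: "gato negro" (noun, adjective) in Spanish
# becomes "black cat" (adjective, noun) in English. The part-of-speech tags
# are supplied by hand here; a real system would need thousands of such rules.

def swap_noun_adjective(tagged_words):
    """Swap adjacent (NOUN, ADJ) pairs so the adjective comes first."""
    out, i = [], 0
    while i < len(tagged_words):
        if (i + 1 < len(tagged_words)
                and tagged_words[i][1] == "NOUN"
                and tagged_words[i + 1][1] == "ADJ"):
            out += [tagged_words[i + 1], tagged_words[i]]
            i += 2
        else:
            out.append(tagged_words[i])
            i += 1
    return out

print(swap_noun_adjective([("gato", "NOUN"), ("negro", "ADJ")]))
# -> [('negro', 'ADJ'), ('gato', 'NOUN')]
```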
Afterward, new translation approaches were developed, based on probability and statistics instead of grammar rules. Statistics-based translation systems require ample training data: they compare the same text written in at least two languages and use those parallel versions as a reference to estimate how likely other phrases are to be expressed the same way. There is plenty of such material to learn from, for example the proceedings of the European Parliament (translated into 21 languages). But we are limited to what has already been translated, a problem when it comes to minority languages or specialized genres.
A probability-based model doesn’t try to generate one exact translation. Instead, it generates thousands of possible translations and then ranks them by how likely each is to be correct. That is enough to resolve the candy-caramelo situation we began with.
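The core of that idea fits in a few lines. In the sketch below the probability numbers are invented purely for illustration; a real statistical system would estimate them from millions of sentence pairs:

```python
# Rank candidate translations by how probable each one is.
# The scores here are made up for this example; statistical MT
# estimates them from large parallel corpora.
candidates = {
    "I love candy":       0.62,
    "I love caramel":     0.30,
    "I love the caramel": 0.08,
}

for text, p in sorted(candidates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{p:.2f}  {text}")

print("chosen:", max(candidates, key=candidates.get))  # -> I love candy
```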
But it still required huge collections of source documents and their translations. We needed something that could keep absorbing input and learn on its own. Deep learning models can learn to translate between two languages without hand-written rules. How? Recurrent Neural Networks and encodings. Together, these two ideas build a self-learning translation system.
Neural Networks vs Recurrent Neural Networks
In a plain neural network, each input is processed on its own: one input in, one output out, with no memory of what came before. Recurrent Neural Networks (RNNs), by contrast, work very well with sequential data. They can “remember” previous inputs and produce an output that depends on everything they have seen so far.
This is important because language models try to predict the next word, given a sequence of words. And sequence-to-sequence machine translation (which RNNs made practical) became a game changer because it can do just that. RNNs learn patterns in data, and human language is one big, complex pattern.
This is where encodings come in. A neural network can take very complicated data (a picture of a face, or a whole sentence) and turn it into something simple: a fixed list of numbers, say 128 of them. The job of comparing two faces, or representing the meaning of a sentence, then becomes a question of comparing numbers. That is essentially how it works. The algorithm doesn’t even need to know what it’s doing.
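Here is a rough sketch of both ideas at once, in Python with NumPy. The word vectors and weights are random and untrained, and the encoding is 8 numbers rather than 128 to keep it readable, so this shows the mechanics of the recurrence and of the fixed-length encoding, not a working translator:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes: each "word" is a 4-number vector, the hidden state holds 8 numbers.
# In a real model these would be learned embeddings and trained weights.
input_size, hidden_size = 4, 8
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden (the "memory")

def rnn_encode(sequence):
    """Run one recurrent cell over the sequence; the final hidden state
    is a fixed-length encoding of everything the cell has seen."""
    h = np.zeros(hidden_size)
    for x in sequence:                    # one step per word
        h = np.tanh(W_xh @ x + W_hh @ h)  # new state depends on the input AND the previous state
    return h

sentence = [rng.normal(size=input_size) for _ in range(5)]  # 5 toy "word" vectors
encoding = rnn_encode(sentence)
print(encoding.shape)  # -> (8,): same size no matter how long the sentence is
```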
So, are RNNs sequence-to-sequence algorithms?
Sequence-to-sequence (often abbreviated to seq2seq) models are a class of Recurrent Neural Network (RNN) architectures. They solve complex language problems such as machine translation, question answering, chatbots, and text summarization. They are Google Translate’s cheat sheet.
As mentioned above, this kind of model is useful for sequence-based problems in general. It wasn’t designed specifically to solve translation, but translation turned out to be exactly the sort of problem it handles well.
One RNN encodes a sequence of symbols into a fixed-length vector, and the other decodes that representation into another sequence of symbols. This encoder-decoder model, the most popular seq2seq architecture, is trained to maximize the probability that the target sequence is correct given the source sequence.
Empirically, the performance of a statistical machine translation system was found to improve when it used scores computed by such an RNN encoder-decoder.
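To show the shape of the encoder-decoder idea described above, here is a minimal sketch with random, untrained weights and a four-word toy vocabulary. This is not Google Translate’s architecture, and its output is gibberish until trained; the point is only the data flow: source sequence in, one fixed-length vector, target sequence out.

```python
import numpy as np

rng = np.random.default_rng(1)

vocab = ["<eos>", "I", "love", "candy"]   # toy target vocabulary
emb, hid = 4, 8                           # toy sizes; real models are far larger

# Untrained random weights: the sketch shows the data flow, not a working translator.
W_enc_x = rng.normal(scale=0.1, size=(hid, emb))
W_enc_h = rng.normal(scale=0.1, size=(hid, hid))
W_dec_h = rng.normal(scale=0.1, size=(hid, hid))
W_out   = rng.normal(scale=0.1, size=(len(vocab), hid))

def encode(source_vectors):
    """Encoder RNN: compress the whole source sequence into one fixed-length vector."""
    h = np.zeros(hid)
    for x in source_vectors:
        h = np.tanh(W_enc_x @ x + W_enc_h @ h)
    return h

def decode(context, max_len=5):
    """Decoder RNN: unroll from the context vector, emitting one symbol per step."""
    h, output = context, []
    for _ in range(max_len):
        h = np.tanh(W_dec_h @ h)
        word = vocab[int(np.argmax(W_out @ h))]  # greedily pick the most likely symbol
        if word == "<eos>":
            break
        output.append(word)
    return output

source = [rng.normal(size=emb) for _ in range(3)]  # 3 toy source "word" vectors
print(decode(encode(source)))
```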
Drawbacks of Encoder-Decoder Models
There are two primary drawbacks, both related to sequence length:
First, much like humans, this architecture has limited memory: everything about the source sentence has to be squeezed into that one fixed-length vector. Second, the deeper a neural network is, the harder it is to train.
But explaining just why and how that happens is beyond our scope here. What we can do is provide you with the best language experts for everything machine translation still falls short of. You can trust our team at Stillman Translations.
If you would like to learn more about this topic, see “Massively Multilingual: AI, language and how data powers communication.”