Most of the world's information is made available in English, yet linguistic diversity and globalization require that it also be available in local languages where English is not spoken or written. Today technology has made it possible for individuals worldwide to access large volumes of information at the click of a button. However, very often the information sought is not in a language that the individual is familiar with. Natural Language Processing is an area of research and application that explores how computers can be used to understand and manipulate natural language text or speech to do useful things.
Overview of Machine Translation
Machine Translation (MT) is the task of automatically converting one natural language into another, preserving the meaning of the input text, and producing fluent text in the output language.
The ideal aim of a machine translation system is to produce the best possible translation without human assistance. Every machine translation system requires translation programs, supported by automated dictionaries and grammars. Translation quality can be improved by pre-editing the input and by controlling the vocabulary. The output usually still needs post-editing to reach publishable quality, especially for health-related information.
A machine translation (MT) system is a program that uses natural language processing technology to automatically translate text from one language to another. Google Translate is one of the most widely used MT systems.
Machine translation systems that translate between only two particular languages are called bilingual systems, and those that translate between any given pair of languages from a larger set are called multilingual systems. Either kind may be uni-directional or bi-directional. Bi-directional multilingual systems are preferred, since they can translate from any given language to any other given language and vice versa.
Figure 1 Machine Translation Pyramid
Machine Translation Process
Machine translation is the process of translating source language text into the target language. The following diagram shows the phases involved.
Figure 2 Typical Machine Translation (MT) Process
Input Source Text: This is the first phase of the machine translation process and the first module in any MT system. Input sentences can be classified by how difficult they are to translate. Sentences that express relations, expectations, assumptions, and conditions are very difficult for an MT system to understand.
De-formatting and Reformatting: These steps make the machine translation process easier and improve its quality. The source language text may contain figures, flowcharts, etc. that do not require translation, and de-formatting sets this material aside. Once the text has been translated and post-edited, reformatting restores the non-translated portions so that the target text also contains them.
Pre-editing and Post-editing: The level of pre-editing and post-editing depends on the efficiency of the particular MT system. For some systems, long sentences may need to be segmented into shorter ones. Fixing punctuation marks and blocking material that does not require translation are also done during pre-editing. Post-editing is done to make sure that the quality of the translation is acceptable. It is unavoidable for the translation of crucial information, such as health information, and should continue until MT systems reach human-like translation quality.
Analysis, Transfer and Generation: Morphological analysis determines the word form, such as inflections, tense, number, part of speech, etc. Syntactic analysis determines the grammatical role of each word, such as subject or object. Semantic and contextual analysis determines a proper interpretation of a sentence from the results produced by the syntactic analysis. Syntactic and semantic analysis are often executed simultaneously and produce a syntactic tree structure and a semantic network, respectively.
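The de-formatting and reformatting steps can be sketched roughly as follows. This is a minimal illustration, not how any particular MT system works: the `[[...]]` marker for non-translatable material, the `__KEEPn__` placeholder format, and the function names `deformat` and `reformat` are all assumptions made for the example.

```python
import re

# Assumed convention: material that must not be translated is wrapped in [[...]].
PROTECTED = re.compile(r"\[\[.*?\]\]")

def deformat(text):
    """Replace protected spans with numbered placeholders.

    Returns the masked text and the list of spans that were set aside.
    """
    spans = []

    def mask(match):
        spans.append(match.group(0))
        return f"__KEEP{len(spans) - 1}__"

    return PROTECTED.sub(mask, text), spans

def reformat(translated, spans):
    """Put each protected span back in place of its placeholder."""
    for i, span in enumerate(spans):
        translated = translated.replace(f"__KEEP{i}__", span)
    return translated

source = "See [[Figure 3]] for the dosage chart."
masked, spans = deformat(source)
# The masked text would be sent to the translator; here it is passed through unchanged.
restored = reformat(masked, spans)
```

The sketch assumes the translation step leaves the placeholders untouched, which a real system would have to guarantee.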
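As a rough illustration of pre-editing, the sketch below tidies punctuation spacing and segments over-long sentences. The function name `pre_edit`, the `max_words` threshold, and the choice to split only at semicolons are assumptions for the example. Real pre-editing is usually done by a human editor or by far more careful tooling.

```python
import re

def pre_edit(text, max_words=10):
    """Normalize whitespace and punctuation, then segment long sentences.

    Sentences longer than max_words are split at semicolons (an assumed,
    deliberately simple heuristic). Returns a list of segments.
    """
    text = re.sub(r"\s+", " ", text).strip()        # collapse runs of whitespace
    text = re.sub(r"\s+([,.;:!?])", r"\1", text)    # no space before punctuation
    segments = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if len(sentence.split()) > max_words:
            segments.extend(p.strip() for p in sentence.split(";") if p.strip())
        else:
            segments.append(sentence)
    return segments

raw = ("Take one tablet daily ;  do not exceed the stated dose ; "
       "consult a doctor if symptoms persist . Keep dry .")
segments = pre_edit(raw)
```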
Morphological Analysis and Generation: Computational morphology deals with the recognition, analysis and generation of words. Some of the morphological processes are inflection, derivation, affixation and combining forms. Inflection is the most regular and productive morphological process across languages. Inflection alters the form of the word in number, gender, mood, tense, aspect, person, and case.
Syntactic Analysis and Generation: As words are the foundation of speech and language processing, syntax can be considered the skeleton. Syntactic analysis is concerned with how words are grouped into classes called parts-of-speech, how they group with their neighbors into phrases, and the way in which words depend on other words in a sentence.
Grammar Formalism: A grammar formalism is a framework for explaining the basic structure of a language. Researchers have proposed the following grammar formalisms:
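A toy morphological analyzer for a few English inflections might look like the sketch below. The stem lexicon, the suffix table, and the function name `analyze` are all hypothetical and chosen only for illustration. Practical analyzers typically rely on much larger lexicons and finite-state techniques.

```python
# Hypothetical toy lexicon: stem -> part of speech.
STEMS = {"walk": "VERB", "translate": "VERB", "book": "NOUN", "language": "NOUN"}

# Assumed suffix table: suffix -> features it signals for each part of speech.
SUFFIXES = [
    ("ing", {"VERB": {"aspect": "progressive"}}),
    ("ed", {"VERB": {"tense": "past"}}),
    ("s", {
        "NOUN": {"number": "plural"},
        "VERB": {"person": "third", "number": "singular", "tense": "present"},
    }),
]

def analyze(word):
    """Return stem, part of speech and inflectional features, or None if unknown."""
    word = word.lower()
    if word in STEMS:
        return {"stem": word, "pos": STEMS[word], "features": {}}
    for suffix, features_by_pos in SUFFIXES:
        if word.endswith(suffix):
            base = word[: -len(suffix)]
            # Also try restoring a dropped final "e" (translat + ed -> translate).
            for stem in (base, base + "e"):
                pos = STEMS.get(stem)
                if pos in features_by_pos:
                    return {"stem": stem, "pos": pos, "features": features_by_pos[pos]}
    return None

result = analyze("translated")
```

Generation is the reverse direction: given a stem and a feature set, produce the inflected form.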
- Phrase Structure Grammar (PSG)
- Dependency Grammar
- Case Grammar
- Systemic Grammar
- Montague Grammar
Parsing and Tagging: Tagging is the identification of the linguistic properties of individual words, and parsing is the assessment of the functions of the words in relation to each other.
Semantic and Contextual Analysis and Generation: Semantic analysis composes meaning representations and assigns them to the linguistic inputs. The semantic analyzer uses the lexicon and the grammar to create context-independent meanings. Its sources of knowledge are the meanings of words, the meanings associated with grammatical structures, knowledge about the discourse context, and commonsense knowledge.
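The difference between the two steps can be shown with a small sketch. It assumes a hypothetical hand-written lexicon and a single-verb subject-verb-object clause, and the function names `tag` and `parse` are illustrative. Real taggers and parsers are usually statistical or grammar-driven and handle far more structure.

```python
# Hypothetical lexicon mapping words to part-of-speech tags.
LEXICON = {
    "the": "DET", "a": "DET",
    "doctor": "NOUN", "patient": "NOUN", "medicine": "NOUN",
    "examines": "VERB", "prescribes": "VERB",
}

def tag(tokens):
    """Tagging: label each word with its part of speech (UNK if not in the lexicon)."""
    return [(token, LEXICON.get(token.lower(), "UNK")) for token in tokens]

def parse(tagged):
    """Parsing (greatly simplified): assign subject and object roles around the verb.

    Assumes one verb and subject-verb-object word order.
    """
    verb_index = next((i for i, (_, pos) in enumerate(tagged) if pos == "VERB"), None)
    if verb_index is None:
        return None
    before = [w for w, pos in tagged[:verb_index] if pos == "NOUN"]
    after = [w for w, pos in tagged[verb_index + 1:] if pos == "NOUN"]
    return {
        "subject": before[-1] if before else None,
        "verb": tagged[verb_index][0],
        "object": after[-1] if after else None,
    }

tagged = tag("The doctor examines the patient".split())
roles = parse(tagged)
```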
Types of Machine Translation
There are three main types of machine translation, distinguished by how they work, along with a set of emerging next-generation approaches:
Rule-Based Machine Translation systems use large collections of rules, manually developed over time by human experts, that map structures from the source language to the target language. The human factor in rule-based systems helps deliver fairly good automated translations with predictable results. However, because of the significant manual labor involved, rule-based systems can be costly and time-consuming to implement and maintain, and as rules are added and updated they can introduce ambiguity and degrade translation quality over time.
Statistical Machine Translation systems use computer algorithms to produce the translation that looks best statistically from millions of permutations. Statistical models consist of words and phrases learned automatically from bilingual parallel sentences, creating a bilingual “database” of translations. The attractiveness of statistical systems comes from the level of automation their machine learning capabilities allow in building new systems, which leads to rapid turnaround times, and from the low cost of the processing power required to construct and operate the models. However, the major downside of this type of engine is the “data-dilution effect” caused by the scarcity of suitable data for ‘training’ these data-driven systems.
Hybrid Machine Translation systems combine the two approaches. To address quality and time-to-market limitations, many Rule-Based Machine Translation developers are augmenting their core technology with Statistical Machine Translation technology to create ‘Hybrid Machine Translation’ solutions. Hybrids provide some quality improvements; however, they keep the costs of Rule-Based systems high by adding the complexity of managing side-by-side systems.
Next Generation Approaches are new “augmented” Machine Translation solutions that aim to upgrade the capabilities, and overcome the limitations, of Statistical Machine Translation. By introducing sophisticated data pre-processing (Language Transformation), Language Optimization Technologies and terminology management solutions, these new Statistical MT solutions are reported to achieve the same quality improvements as Hybrid MT while dispensing with the need for legacy rule-based technology.
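To make the rule-based idea concrete, here is a deliberately tiny English-to-Spanish sketch with analysis, transfer and generation steps. The lexicon, the single adjective-noun reordering rule, and the function name `translate` are assumptions for illustration. Real rule-based systems contain thousands of rules and also handle agreement, inflection and ambiguity, all of which this sketch ignores.

```python
# Hypothetical bilingual lexicon: English word -> (Spanish word, part of speech).
LEXICON = {
    "the": ("el", "DET"),
    "man": ("hombre", "NOUN"),
    "book": ("libro", "NOUN"),
    "old": ("viejo", "ADJ"),
    "reads": ("lee", "VERB"),
}

def translate(sentence):
    """Translate a sentence whose words are all in LEXICON (KeyError otherwise)."""
    # Analysis: tokenize and tag each source word.
    tagged = [(word, LEXICON[word][1]) for word in sentence.lower().split()]

    # Transfer: English ADJ NOUN becomes NOUN ADJ in Spanish.
    i = 0
    while i < len(tagged) - 1:
        if tagged[i][1] == "ADJ" and tagged[i + 1][1] == "NOUN":
            tagged[i], tagged[i + 1] = tagged[i + 1], tagged[i]
            i += 2
        else:
            i += 1

    # Generation: look up the target word for each source word.
    return " ".join(LEXICON[word][0] for word, _ in tagged)

output = translate("The man reads the old book")
```

Every behavior here is traceable to a hand-written rule or lexicon entry, which is what makes the output predictable and also what makes such systems laborious to extend.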
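One classic way to learn word translations from parallel sentences is IBM Model 1, trained with expectation-maximization. The sketch below is a simplified version (no NULL word, a three-sentence German-English toy corpus). The names `train_ibm1` and `best_translation` are illustrative, and the source does not say which statistical model any given system uses.

```python
from collections import defaultdict

def train_ibm1(pairs, iterations=10):
    """Estimate t[(target_word, source_word)] = P(target_word | source_word) with EM."""
    target_vocab = {w for _, target in pairs for w in target}
    # Start from a uniform distribution over target words.
    t = defaultdict(lambda: 1.0 / len(target_vocab))
    for _ in range(iterations):
        count = defaultdict(float)   # expected co-translation counts
        total = defaultdict(float)   # expected counts per source word
        for source, target in pairs:
            for w_t in target:
                norm = sum(t[(w_t, w_s)] for w_s in source)
                for w_s in source:
                    fraction = t[(w_t, w_s)] / norm
                    count[(w_t, w_s)] += fraction
                    total[w_s] += fraction
        # Re-estimate probabilities from the expected counts.
        for (w_t, w_s), c in count.items():
            t[(w_t, w_s)] = c / total[w_s]
    return t

def best_translation(t, source_word):
    """Return the target word with the highest probability for source_word."""
    candidates = {w_t: p for (w_t, w_s), p in t.items() if w_s == source_word}
    return max(candidates, key=candidates.get)

corpus = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
model = train_ibm1(corpus)
```

With so little data the estimates are fragile, which illustrates why scarcity of suitable training data is the main weakness of data-driven systems.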