CAT tools – ATA SLD

Achieving High-Quality Translation: The Final Step

June 24, 2024

High-quality translation requires in-depth domain knowledge, meticulous attention, and a series of well-defined steps. In this article, I’ll focus on the last stage of a standard translation project: automated Quality Assurance (QA).

Automated QA catches errors that are hard to spot and easy to miss. Common issues include source and target inconsistencies, capitalization errors, incorrect spacing around punctuation marks, numerical mistakes, missing or extra tags, incorrect quotation marks, terminology mistakes, and measurement unit discrepancies.
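To make these categories more concrete, here is a minimal Python sketch of three such checks applied to one source/target segment pair. It is only an illustration: the function name `check_segment` and the rules themselves are hypothetical, and real QA tools apply far more (and language-dependent) rules. For example, a space before certain punctuation marks is correct in French.

```python
import re

# Matches integers and decimals such as "3", "1.5" or "1,5".
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")


def check_segment(source: str, target: str) -> list[str]:
    """Return a list of issue labels found in a source/target pair (hypothetical helper)."""
    issues = []

    # Numerical check: the same numbers should appear on both sides.
    # Decimal commas are normalized to points, which is a crude assumption.
    src_nums = sorted(n.replace(",", ".") for n in NUMBER_RE.findall(source))
    tgt_nums = sorted(n.replace(",", ".") for n in NUMBER_RE.findall(target))
    if src_nums != tgt_nums:
        issues.append("number mismatch")

    # Spacing checks on the target text only.
    if "  " in target:
        issues.append("double space")
    if re.search(r"\s[,.;:!?]", target):
        issues.append("space before punctuation")

    return issues


print(check_segment("Press 3 times.", "Нажмите 2 раза."))
print(check_segment("Weight: 1.5 kg", "Вес:  1,5 кг ."))
```

The first call reports a number mismatch; the second reports a double space and a space before punctuation, while the decimal comma is accepted as equivalent to the decimal point.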

Below are the steps I use when doing QA.

Built-in CAT tool QA:

Most computer-assisted translation (CAT) tools (e.g., Trados Studio or memoQ) offer built-in QA functionality. Enable the relevant options in the CAT tool menus and run QA.

Export to Word and use Word’s proofing features:

If your CAT tool allows it, export the translation to Microsoft Word. Press F7 to run the spelling and grammar check, which may catch mistakes that the CAT tool missed. Ensure that Word’s proofing options (grammar, repeated words, uppercase words, etc.) are activated.

Standalone QA tools:

Use standalone QA tools for comprehensive checks. Xbench, Verifika, and QA Distiller are some of the oldest and most popular ones.

QA Distiller is completely free, Xbench has a free limited-functionality version, and Verifika (my personal favorite) has a fully functional free web version, provides language-dependent checks, and covers numerous mistake categories. Take the time to configure Verifika with the options you need—it’s worth the effort.

Double-check and run QA again:

Correct any mistakes the tool found and run another round of QA to catch any overlooked errors or newly introduced mistakes.

Use multiple QA tools:

If you’re doing a test translation or working on a particularly important project, consider running QA using multiple tools. It’s better to spend time reviewing false positives than to miss an embarrassing error.

Impress the client:

Go the extra mile by exporting a final QA report containing only false positives. Demonstrating your commitment to quality will leave a positive impression.

Let no mistakes slip through into your translations!

This is the third and final post in a series on translation quality. The first post can be found here, and the second here.

Mikhail Yashchuk is an industry veteran. In 2002, he received his university degree in English, and six years later he founded a boutique agency where he gained experience in linguist recruitment, project management, translation, editing, and quality assurance. He has recently been admitted as a sworn translator to the Belarusian Notary Chamber.
In 2018, Mikhail joined the American Translators Association and is now working as an English-to-Russian translator, actively sharing knowledge with younger colleagues. He is the moderator for the SLD LinkedIn group. He may be contacted at mikhail@lexicon.biz.

Upcoming November ATA webinars

November 7, 2021

The ATA is offering another round of webinars over the next few weeks that should prove to be fun and informative!

Introduction to Mobile App Localization

There’s an app for everything these days, and there’s now an ATA webinar on mobile app localization, too! 🙂
Presenter Dorota Pawlak will give us an introduction to mobile app localization and the role translators play in it. Drawing on years of experience in this field, the speaker will explain what skills, tools, and qualities are needed to localize mobile apps, which issues come up most often in mobile app localization projects, and how to solve them.

Join us on November 9 or sign up to get the recording and 1 ATA CEP
https://www.atanet.org/event/introduction-to-mobile-app-localization/

Registration closes: November 9, 10:00 am EST

memoQ for Intermediate and Advanced Users

Join this webinar to take your knowledge of memoQ to the next level: in this session, you will learn useful tips and tricks that can make your work as a translator a lot easier.
You will also learn how to search your preferred websites directly from the translation grid and how to connect to a machine translation provider to be able to use MT in your work. The trainer will also show you how to automate your processes using templates in memoQ and how to fine-tune the import of your documents with the help of powerful import filters.
Register at https://www.atanet.org/event/memoq-for-intermediate-and-advanced-users/
Remember that ATA members can save 35% on new licenses for memoQ translator pro.

Join us on November 12 at 12 pm EST (recording will be available) / 1 ATA CEP
Registration closes: November 12, 10:00 am EST

Intermediate Tips and Tricks for Trados Studio

This hands-on webinar will explore useful features that will take you a step closer to becoming a power user of the most powerful and popular CAT tool on the market.
This webinar was organized in collaboration with RWS.

You will learn how to:

Identify and modify file type options
Work with a translation memory’s language resources
Use apps to extend Trados Studio’s functionality
Use machine translation for pre-translation and interactive translation
Set up verification options

Remember that ATA members can save 35% on Trados Studio 2021 Freelance and Trados Studio 2021 Freelance Plus.

Join us on November 17 at 12 pm EST (recording will be available) / 1 ATA CEP

Registration closes: November 17, 10:00 am EST

A [Better] CAT Breed for the Slavic Soul

July 12, 2017

A review by Jennifer Guernsey

Aha! I said to myself upon spying this presentation among the 2013 ATA Conference’s offerings. At last, I will find out which elusive CAT tool actually does a good job with Slavic languages! I had tried several tools, but hadn’t yet run across one that was able to accommodate the peculiarities of my language, Russian, particularly when it came to all of the inflected forms.

Alas, it took no more than two slides for me to be sorely disappointed – not in Konstantin Lakshin’s presentation, but in the sad news that there is, in fact, no such thing as a good CAT tool for Slavic languages. Or, at least, there isn’t yet.

Despite my initial dismay at the news, I fortunately stayed to hear the entire presentation. It can be briefly summarized as follows: A combination of technical, linguistic, and particularly market forces has conspired to make CAT tools what they are today: decidedly Slavic-unfriendly. The good news is that many of the pieces needed to improve them already exist, and it’s up to us to put pressure on developers and companies to make use of those pieces.

The reason it took the better part of an hour to provide this information is that the presentation included a lot of very interesting history, examples, and details. It really was quite educational, at least for me.

Kostya started by outlining the history of computer use in translation, and the development of CATs in particular. He began with a discussion of a 1966 government-funded report by the Automatic Language Processing Advisory Committee on the use of computer technology in translation. The gist of this report as it applies to our CAT tool discussion is that machine translation doesn’t work well, but that something vaguely resembling what we now consider a CAT tool, with a similar workflow, might be useful. This pseudo-CAT workflow used the punch card operator – i.e., a human being – as a morphology analyzer. This is interesting, because one of our principal complaints about today’s CAT tools is that they do not have morphology analysis capability. The report also compared use of this early form of CAT with a standard translation process, and found that while it might save some time, its primary advantage was that it “relieve[d] the translator of the unproductive and tiresome search for the correct technical terms.” The report emphasized that compiling the proper termbase was really the key to an effective translation tool.

In the decade or so following the report, the emphasis in computer-assisted translation was thus on building termbanks. In other words, the focus was on words and phrases – small subsegments, if you will – and these termbanks were generally compiled for specific large organizations operating in specific contexts and were not readily transferable to other entities.

The philosophy that drives current CAT tools – the “recycling” of previously translated texts – emerged fully only in 1979, though large corporations had begun exploring this starting in the late 1960s. This philosophy was in great part a result of the requirements and technologies in place at the time. In the 1960s, for instance, the world was a less integrated place, and there was limited control over the input side – the source text content, editing, and so on. The example Kostya provided was scientific texts coming out of the USSR that were being translated. Fast-forward to the 1980s and 1990s: large corporations have end-to-end control of processes and utilize translation (and translation technology) for their own documents. In this latter context, being able to retrieve and reuse entire sentences made a lot of sense. Note also that in the prevailing markets in which the early CAT tools developed, the primary languages were not highly inflected.

In the late 1980s and early 1990s, the first commercially available CAT tools appeared: IBM Translation Manager II, XL8, Eurolang, and two still-familiar tools, Trados and Star Transit. Trados, in particular, started life as a language services provider trying to get an IBM contract.

The mid- to late 1990s saw the emergence of tools being created ostensibly for translators: Déjà Vu, memoQ, and Wordfast. However, rather than being fundamentally different from their larger predecessors, these often turned out to be essentially smaller, less functional versions of Trados. This era also witnessed the development of smaller commercial players, such as WordFisher (a set of Word macros) and in-house tools such as Lionbridge, Foreign Desk, and Rainbow (specifically for software localization), as well as OmegaT, the first open-source CAT tool.

That brings us to the present day, the 2000s, when there are too many CAT tools to list, and there have been many mergers and acquisitions among them. However, NONE of the existing tools can be considered very useful for Slavic or other highly inflected languages. In addition to the reasons noted above, there were other issues that contributed to this situation as the software was being developed. First, there were no obvious ways to incorporate Cyrillic into early software. Second, there were additional market forces, such as software piracy, the cross-border digital divide, and the lack of major clients, that provided little incentive to software developers to make CAT tools that would be particularly useful in Slavic-language markets.

Today, we have a much wider playing field in terms of the market for translation. Translation work is “messier” now, and involves things like corporate rebranding and renaming, a variety of dialects and non-native speech, outsourcing, rewrites for search engine optimization, and bidirectional editing in which both source and target documents are being modified. In this environment, the old “termbase plus recycled text” CAT model is not sufficient.

From this historical background, Kostya next proceeded to illustrate just what the difficulties are that Slavic languages present for today’s CAT tools. These can be boiled down to their relatively free word order, their rich morphology, and their highly inflected nature. The CAT tool’s “fuzzy match” capabilities are insufficient for Slavic languages.

Kostya then provided a number of illustrative examples. Consider the following pairs of segments:

To open the font menu, press CTRL+1.

Press CTRL+1 to open the font menu.

Analyzing and characterizing behaviors

Analysing and characterising behaviours

He ran these and other examples through about a half-dozen CAT tools using a 50% match cutoff, and found that the first example was considered only a 60-80% match, and the second was 0% (in other words, below the 50% threshold). The CAT tools on the market generally do not recognize partial segments in a different order, nor can they tell that “analyzing” and “analysing” are essentially the same word. In other words, they lack language-specific subsegment handling, and morphology-aware matching, searching, and term management. They are also missing form agreement awareness (e.g., noun/adjective case agreement). This diminishes their utility for those translating out of Slavic languages, to be sure, but it also complicates matters for those translating into Slavic languages, as word endings in retrieved fuzzy matches must constantly be checked and corrected.
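The effect is easy to reproduce with a toy matcher. The Python sketch below is not the algorithm of any actual CAT tool (those are proprietary); it simply assumes a matcher that compares whole words. The helper names `words`, `ordered_score`, and `unordered_score` are made up for this illustration.

```python
import re
from difflib import SequenceMatcher


def words(segment: str) -> list[str]:
    # Lowercase and keep letters, digits, and "+" so that "CTRL+1" stays one token.
    return re.findall(r"[\w+]+", segment.lower())


def ordered_score(a: str, b: str) -> float:
    # Word-level similarity that respects word order.
    return SequenceMatcher(None, words(a), words(b)).ratio()


def unordered_score(a: str, b: str) -> float:
    # Jaccard overlap of the two word sets; word order is ignored.
    wa, wb = set(words(a)), set(words(b))
    return len(wa & wb) / len(wa | wb) if wa | wb else 1.0


pairs = [
    ("To open the font menu, press CTRL+1.",
     "Press CTRL+1 to open the font menu."),
    ("Analyzing and characterizing behaviors",
     "Analysing and characterising behaviours"),
]

for a, b in pairs:
    print(f"ordered {ordered_score(a, b):.0%}, unordered {unordered_score(a, b):.0%}: {a}")
```

With this toy matcher, the first pair scores about 71% when word order matters and 100% when it does not, while the second pair scores only 25% in the ordered comparison because "and" is the only word that is identical on both sides. A matcher that treats every spelling or inflection variant as a different word will behave the same way on Russian case endings.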

The obvious question that Kostya next asked is, can this situation be fixed? In theory, yes. Kostya believes that many software tools already in use by search engines, machine translation, and the like could be integrated into CAT tools. These include Levenshtein distance analyzers that can handle differences within words; computational linguistics tools such as taggers, parsers, chunkers, tokenizers, stemmers, and lemmatizers, which analyze such things as syntax and word construction; morphology modules; and even Hunspell, the engine already in use by numerous CAT tools for spellchecking but not for analyzing matches.
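As a rough idea of what the first of these tools contributes, here is a minimal Python sketch of Levenshtein distance used to treat near-identical words as matches. The helper names and the 0.8 threshold are assumptions chosen for illustration, and edit distance alone is no substitute for a real stemmer or morphology module: it knows nothing about grammar and will happily pair unrelated words that happen to look alike.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_similarity(a: str, b: str) -> float:
    # 1.0 for identical words, lower as more edits are needed.
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))


def soft_match_ratio(source_words, candidate_words, threshold=0.8):
    # Share of source words that have a near-identical word in the candidate.
    # The threshold is an arbitrary value picked for this example.
    if not source_words:
        return 0.0
    hits = sum(
        1 for w in source_words
        if any(word_similarity(w, c) >= threshold for c in candidate_words)
    )
    return hits / len(source_words)


us = "analyzing and characterizing behaviors".split()
uk = "analysing and characterising behaviours".split()
print(levenshtein("analyzing", "analysing"))
print(soft_match_ratio(us, uk))
```

Each US/UK word pair here differs by a single edit, so every word clears the threshold and the segment counts as a full match instead of falling below the cutoff.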

Developers continue to cite obstacles to integrating these tools: it’s complicated, they are too language-specific, we don’t know how to set up the interface, there are licensing issues, we have limited resources. While all of these are legitimate factors, Kostya believes that they do not present insurmountable obstacles. He is hopeful that developers will start seeing these tools as data abstraction tools that enable the software to break down the data into something that is no longer language-specific.

So what can we do about this lack of suitable CAT tools? Kostya’s recommendation is principally that we talk to software developers and vendors and explain what we want. We need to create our own market pressure to move things along. In addition, we need to educate developers and vendors about the existing tools that are available; for instance, we might point them to non-English search engines that utilize morphology analyzers.

Alas, there is neither a good CAT tool for the Slavic soul nor a quick fix to this situation. But after listening to Kostya’s presentation, I have a much better understanding of how this situation developed and how we might take action to prompt vendors and developers to move in a new direction.