Language Quality Assurance: the business, the science, the practice and the tool
The language industry calls for a methodology to measure the quality of translations. What is the science behind language quality? Logrus International has developed a practical implementation that can be used to measure and eventually improve translation output.
Language quality in business
Most language professionals are in this business because they love language. However, the language industry considers quality aspects primarily from a business perspective. For businesses that consume or render language services, language quality is much more than an art. Language quality directly affects business considerations.
The main criterion when choosing a translation provider is not price, time or quality alone. Usually the decisive factor is the balance of quality against cost and time. And while cost and time are relatively easy to quantify, quality is not so easy to measure. People tend to use qualitative statements such as “we provide the best quality”, but how do you measure language quality? Without a measure, businesses have difficulty defining a roadmap for their business objectives. Language service providers, for their part, need an accurate tool to measure the quality of their services.
So what do customers want? Actually customers want a lot of things: They want more content to be translated for less money; they want faster turnarounds and lower vendor costs; they want shorter, simpler and more objective service job placement cycles and processes; they don’t want to debate language quality, they want access to clearly defined, measurable and transparent language services. They don’t want to spend time and money on internal language services; they want a roadmap to continuous automation and technologies (MT included). Technology is seen as part of the future translation architecture.
To achieve these goals, the language industry needs to develop tools and metrics that measure linguistic quality in a consistent, stable and reliable fashion. With varying quality metrics the translation business process is vulnerable, volatile, expensive, complex and immeasurable. The lack of defined and reliable quality standards degrades the labor value of language services, opens the door to incompetence and unacceptably low quality, causes noise, makes it impossible to distinguish between service levels and makes many processes more difficult to plan, build and control.
There are a number of proprietary metrics for measuring quality; however, it is important to understand whether they can be viewed from a single perspective to assess the entire translation systematically. Let’s dig into the science of language quality to understand what can be done.
The science of language quality
Before we start the analysis, let’s consider some common and general facts about language quality and what makes measuring it so difficult:
- The number of correct sentences and translations is unlimited. Human understanding and impression is a process, not a one-time act. Translation has many routes. And in many cases it’s not that one route is better than the other; they may be equally efficient albeit in a different fashion. That’s why we must compare the target result of linguistic work not with the one and only reference point, but with the realm of correct, valid and good reference points.
- Language quality is intrinsically subjective. Quality in its entirety is the sum of features and characteristics of a product or service that bear on its ability to satisfy stated or implied needs. Thus, quality is the result of human impression and satisfaction by definition. That’s why it’s impossible to automate the measuring of language quality. Human judgment brings in the basic and intrinsic feature of language quality – the measure of human satisfaction and human perception.
- Language quality is multidimensional. We intuitively understand that a linguistic work can be nicely worded but still contain grammatical or terminological errors and inconsistencies.
Starting from these axioms, the next step is to divide the compound measurement function into elementary parts of quality and find a way to define the function itself. Let’s start from the individual elements of quality and try to find out what the compound function may look like.
First of all, it is easy to see that the dimensions of linguistic quality can be separated into two groups: objective and subjective. Objective characteristics are easy to enumerate and define: they are spelling, omissions, terminology, grammar, and National Language Standards (NLS). Subjective criteria are more difficult to subdivide into categories; however, the most systematic option seems to be adequacy, readability and consistency (see Fig. 1).

- Fig.1: The Dimensions of Language Quality
Creating quality metrics
All these dimensions affect our perception of language quality. However, while it is relatively easy to define objective criteria, it is far more difficult to establish subjective criteria since they are rather vague and depend on an expert opinion.
Consider this: humans are exceptionally good at making quick judgments based on a first impression, even with a lack of information. Malcolm Gladwell explains this in his excellent book “Blink”: we don’t need a lot of time to decide whether something is worth our attention or not. We look at a person or object and decide instantly whether it is worth spending further time on. This is how our species forms an impression, and therefore this is how the metrics of language quality should work, too. So we must consider the following facts and problems:
- Human perception plays a huge role in the total evaluation. Overall, subjective impression accounts for more than 80% of the entire metrics.
- Before considering the details, humans quickly decide whether the object or subject in front of them is worth their attention, and discard whatever is not worth their time.
This tells us that there is a cut-off quality level below which the material is rejected. It doesn’t matter if it’s the number of typos or a very poor style – if the material is below this specific grade on any of the dimensions, it must be graded as zero and not even taken into consideration.
Since subjective dimensions are rated by humans, the dispersion of the grades is very significant. Any metric that gave 80% of its weight to expert ratings would therefore be unstable and not really suitable for business or technical purposes. We must therefore define a function that gives subjective dimensions significant influence while weighting the grade so that the metric stays stable even if expert opinions vary significantly.
This is the procedure:
- Set cut-off marks for every quality dimension.
- For those inputs that pass the cut-off, subjective grades must carry only a small weight in the overall grade, in order to decrease instability and add more objectivity to the grade and the entire process of language quality measurement.
The system is illustrated below:

- Fig.2: The contribution of language quality elements
After the translation has passed the Acceptable Quality filter, the key is to assign less weight to subjective dimensions than to those that can be measured objectively. This means: We cannot and should not cease to grade fluency or style, but their weight in the final score must be minor, otherwise the whole metric will become unstable. This can be illustrated as follows:

- Fig.3: The mechanics of grading
The metrics therefore would integrate all eight measurements with a certain weight. A mandatory requirement would be to be above a certain threshold level. Subjective parameters contribute less to the overall metrics than objective ones in order to achieve stability of the measurement and make it applicable for production processes, including rating machine translation output.
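The grading mechanics described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual Logrus model: the dimension names come from the text, but the 0-10 rating scale, the individual weights (80% objective, 20% subjective in total) and the cut-off marks are assumptions made purely for the example.

```python
# Dimension names follow the article; the 0-10 scale, the weights and the
# cut-off marks below are hypothetical values chosen only for illustration.
OBJECTIVE = ("spelling", "omissions", "terminology", "grammar", "nls")
SUBJECTIVE = ("adequacy", "readability", "consistency")

# Assumed weights: objective dimensions share 0.8, subjective ones share 0.2.
WEIGHTS = {d: 0.16 for d in OBJECTIVE}
WEIGHTS.update({d: 0.2 / 3 for d in SUBJECTIVE})

# Assumed cut-off mark: the same minimum rating for every dimension.
CUTOFFS = {d: 5.0 for d in OBJECTIVE + SUBJECTIVE}


def overall_grade(ratings, weights=WEIGHTS, cutoffs=CUTOFFS):
    """Return the overall grade for a dict of per-dimension ratings."""
    missing = [d for d in weights if d not in ratings]
    if missing:
        raise ValueError(f"missing ratings: {missing}")

    # Acceptable Quality filter: one dimension below its cut-off
    # rejects the whole piece, which is then graded as zero.
    if any(ratings[d] < cutoffs[d] for d in weights):
        return 0.0

    # Weighted average of all eight dimensions.
    total_weight = sum(weights.values())
    return sum(weights[d] * ratings[d] for d in weights) / total_weight
```

With these assumed weights, lowering all three subjective ratings by two points changes the overall grade by only 0.4 on the ten-point scale, which is the kind of stability the model aims for.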
While developing the metrics that meet the criteria of usability, stability and accuracy, we must also consider another important issue: How much should we check for our grade to be reliable and representative? Do we need to check everything? Lacking the space in this article to dive deep into the statistical theory, let’s say that we take a simplified model of calculating the confidence level and confidence interval based on the grading of every word in the material.
Let’s see what sample size is sufficient to achieve the required confidence level and confidence interval for varying population sizes (the entire project volume). Applying basic statistical formulas to our subject matter, we obtain the following graph:

- Fig.4: The size of the sample to LQA for various project volumes and parameters
It can be seen from the graph that if we want to achieve 95% confidence level with 1% error margin, we must take a certain minimum volume of samples to check, as illustrated in the table below:
| Volume | Sample Size / Check |
|---|---|
| < 10 K words | 100% |
| 20 -> 200 K words | 85% -> 45% (+/- 25%) |
| > 300 K words | 100-150 K words |

Table 1: Minimum sample size to check depending on the total project volume.
The rules of optimal sampling therefore are:
- A certain minimum sample volume is necessary for the grade to be reliable and stable, with an appropriate confidence level and error margin, and big enough to draw a conclusion about adequacy, fluency, etc.
- Samples must be selected randomly.
- The recommended granularity for reviewers is one-page (250-word) pieces.
- All priority / exposure pieces must be checked in full.
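The sampling rules above can be sketched as follows. This is an illustrative outline only: the function names, the `sample_fraction` parameter (to be taken from a table such as Table 1) and the way priority pieces are identified by index are all assumptions, not part of the method described in the article.

```python
import math
import random

PAGE_WORDS = 250  # one-page piece, as recommended in the rules above


def split_into_pieces(words, size=PAGE_WORDS):
    """Split a list of words into consecutive pieces of at most `size` words."""
    return [words[i:i + size] for i in range(0, len(words), size)]


def choose_sample(n_pieces, sample_fraction, priority=(), seed=None):
    """Return the sorted indices of the pieces to review.

    Priority pieces are always included in full; the rest of the sample
    is drawn at random until the requested fraction is reached.
    """
    if not 0 < sample_fraction <= 1:
        raise ValueError("sample_fraction must be in (0, 1]")
    priority = set(priority)
    if any(not 0 <= i < n_pieces for i in priority):
        raise ValueError("priority index out of range")

    target = math.ceil(n_pieces * sample_fraction)
    remaining = [i for i in range(n_pieces) if i not in priority]
    extra = min(max(0, target - len(priority)), len(remaining))

    rng = random.Random(seed)  # a fixed seed makes the selection repeatable
    return sorted(priority | set(rng.sample(remaining, extra)))
```

Passing a seed is a design choice for auditability: the same selection can be reproduced later if a vendor questions which pieces were checked.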
These rules are to be followed for your LQA to be usable and valuable. Now, let's put theory into practice.
Manual implementations
This model can be implemented manually with the help of an Excel spreadsheet, where the mechanics of the model are implemented in spreadsheet formulas and the reviewer completes the form. Microsoft Excel is an excellent tool for such a form: it is simple to compose (no programming required) and able to show the fields that are mandatory for the reviewer to complete, while the formulas implementing the grading model stay hidden and password-protected behind the scenes. Here’s an example of such a form, with ratings on every dimension and the overall grade calculated and displayed automatically:

- Fig.5: The typical language quality evaluation form
Such a form is completed by specially trained linguists who are proficient in quality measurement methodology, are familiar with the model, have significant experience in the language industry (typically more than 3 years of translation experience), are familiar with client-specific requirements and guidelines, and know how to complete LQA forms correctly. Reviewers are not only trained in the general LQA process, but also receive client-specific training.
Of course, the LQA process also requires educating the suppliers, otherwise several problems may hinder the process. In particular, vendors tend to dispute everything, even if there is no chance of changing the final LQA results. This may lead to lengthy, inconclusive discussions unless the vendors accept LQA as an objective measurement instrument that may help to improve their processes.
At the same time the vendor must be given a chance to rebut the LQA results if they are questionable. But this should be limited to one cycle, with vendor comments addressing each dimension separately and remaining entirely factual. Implemented in this way, all parties are educated properly and it doesn’t take too long to explain the metrics and the system to all stakeholders. Thus the process runs smoothly and can be fine-tuned continuously.
Implementing LQA on demand
The science of LQA is not easy, but the practical implementations are even more difficult to build. It is not enough to develop and test a model: since LQA can only be implemented in its entirety by humans, it takes tremendous effort to build a network of qualified linguists, especially if we want to build a multilingual QA service, which is the only LQA service worth building.
The LQA service must be…
- multilingual
- on demand, ready and easy to purchase
- very well managed, since every client has its own requirements
- automated and online, otherwise it is neither transparent nor really usable
- semi-automated, since there are many routine, repetitive tasks, such as handoffs for many languages, and clients would not be prepared to pay for manual project management labor
Building an LQA service involves more than programming an internet portal, although much depends of course on how well it is designed and implemented. A major challenge is to keep a constant volume of work for the linguists, because people work best when they are permanently involved in particular operations. Since building a network of linguists is rather time-consuming, you can only build a cost-efficient LQA service if you have a network of linguists who are involved in ongoing operations every day.
Logrus LQA service
We have considered all of the above and were able to design, build, implement and run such a service. It is based on advanced methodology, system design, organization, project management, a network of language specialists and technology that includes a powerful client-server application with an internet front-end where clients can book LQA projects transparently. The architecture of the LQA service on demand is illustrated below:

- Fig.6: The architecture of LQA service on demand
The features of the Logrus LQA service on demand are:
- .NET Web Portal with automated workflow features
- SQL Server for rates, contacts, and project data
- Middleware for LQA-specific formulas and calculations
- LQA methodology is the key
- PM service solution is the vehicle

- Fig.7: The client front-end of Logrus LQA booking system
The steps for setting up an LQA service on demand for a particular client are:
- The system is fine-tuned for a particular client.
- The client becomes familiar with the model.
- LQA jobs are then placed via the portal – as required (OnDemand), transparently in terms of costing and timeframe of implementation.
The setup steps for engaging an LQA service as an on-demand process are simple and quick to take, without the need to do research, development or internal service setup and production.
Results and advantages of on demand implementation
On demand implementations are usable and available. Such a system is able to meet all the requirements listed above, while it is also practical, constantly available, reliable, stable, consistent, easy-to-use, and cost-efficient.
The advantages of such an on-demand implementation are:
- Easy and simple to use
- Immediate costing of small pieces is within the responsibility of any PM: No budgetary approvals required
- Quick
- Universal and objective
- LQA becomes a practical, instrumental and essential PM tool: It allows you to quickly make PM level decisions regarding unknown languages
The key features and advantages of an on-demand LQA service are:
- Multilingual: QA service covers 57+ languages. Logrus is able to quickly and independently develop solutions for new languages.
- Universal: All types of content including Web content, documentation, user assistance, templates, wizards, CD booklets, Readme files, software, setup, audio files, scripts, etc.
- Objective: Impersonal linguistic QA is provided by companies rather than freelance reviewers. The result is measurable and quantifiable on an objective predefined scale, not a personal opinion.
- Scalable: The service can handle small QA tasks with fast turnaround times on short notice and is highly scalable at peak hours.
- Customizable: the QA model can be modified to fit a customer’s practices and requirements.
- Accountable: a competitive and transparently priced solution (based on a formula with transparent parameters that are clear to LQA customers).
- Ready for Machine Translation: the service can easily be modified to meet the less stringent quality criteria used in an MT-based business process. QA Portal Service can measure anything, from human to MT output, with no usability limits.
To conclude, let's consider the important business advantages of an LQA service and the implications of adopting it as a business tool.
LQA as a business process reengineering tool
Let’s see how important LQA is as a business tool. Consider the following advantages:
- Granular PM time saving on the client side is an immediate productivity increase and benefit. It takes about 120 hours to set up a manual LQA process from scratch, and about 80 hours to manually carry out an average QA project and to process and interpret the results.
- Project optimization savings for the entire project flow in terms of time, number of activities, etc., can be a huge saving.
- It is economical and practical to centralize LQA and remove redundant and inconsistent LQA efforts.
- As outsourced, standardized LQA services develop, they will become more economical than in-house LQA services (centralized outsourced LQA project placement takes only 4 hours to read the portal instructions and place an LQA job online).
- LQA service on demand can be used as an important business intelligence and process reengineering tool, if realized as workflow steps in several different scenarios.
The last statement is key; it is important to understand that Linguistic Quality Assurance is not “just” a quality assurance tool. It is a powerful business tool used by our clients to implement the following initiatives:
- Product quality control
- Vendor evaluation
- Cost reduction mechanism
Product quality control is one of the most obvious implementations of an LQA process. Clearly, LQA is a must in production.

- Fig.8: The conveyor belt production needs quality control
Important considerations to be taken into account in this scenario are:
- The past performance of a translation vendor is not a guarantee of future results.
- No certification can guarantee the quality of every deliverable.
- Build ready-made quality control into your translation processes (using a qualified, proven service on demand that is ready to be rendered).
- Vendor ratings can also be assessed, collected and analyzed based on constant spot-check product quality control.
- Pay attention to the number and size of samples to be checked on the conveyor belt.
Another important business application is vendor performance evaluation. LQA can be used to spot-check the quality of your vendors without costly internal efforts. Historical data is the basis for a vendor’s Performance Evaluation Score (e.g. if a vendor receives a FAIL, its score is decreased, weighted by the total volume it has translated since the most recent FAIL).
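The article gives no formula for the Performance Evaluation Score, so the sketch below shows only one possible reading of the volume-weighted penalty: the more a vendor has translated since its most recent FAIL, the smaller the deduction for a new one. The function name, the 0-100 score range, `max_penalty` and `reference_volume` are all hypothetical.

```python
def apply_lqa_result(score, passed, words_since_last_fail,
                     max_penalty=10.0, reference_volume=100_000):
    """Update a vendor score after one LQA check (illustrative only).

    Assumption: a FAIL costs up to `max_penalty` points, scaled down by
    the volume translated since the previous FAIL. The weighting formula
    is a guess for illustration, not the one used by Logrus.
    """
    if words_since_last_fail < 0:
        raise ValueError("words_since_last_fail must not be negative")
    if passed:
        return score  # a PASS leaves the score unchanged in this sketch

    # Weight is 1.0 right after a previous FAIL and falls towards 0
    # as the volume translated since then grows.
    weight = reference_volume / (reference_volume + words_since_last_fail)
    return max(0.0, score - max_penalty * weight)
```

Under these assumed parameters, a vendor with a score of 80 drops to 70 if it fails again immediately, but only to 75 if it has delivered 100,000 words since its last FAIL.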
Finally, clients can use the LQA service to seek vendors with the same quality and service level, but lower price. The sample workflow to implement vendor replacement with the LQA Service on demand is illustrated in the figure below.

- Fig.9: The sample workflow of constant Vendor Management effort to find more economical vendors



