Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

99379 ai #318

Merged
merged 5 commits into from
Aug 16, 2024
Merged

99379 ai #318

merged 5 commits into from
Aug 16, 2024

Conversation

jowilco
Copy link
Collaborator

@jowilco jowilco commented Aug 15, 2024

No description provided.

@acrolinxatmsft1
Copy link
Collaborator

Acrolinx Scorecards

A minimum total score of 80 is required.

Select the total score link to review all feedback on clarity, consistency, tone, brand, terms, spelling, grammar, readability, and inclusive language. You should fix all spelling errors regardless of your total score. Fixing spelling errors helps maintain customer trust in overall content quality.

Article Total score
(Required: 80)
Words + phrases
(Brand, terms)
Correctness
(Spelling, grammar)
Clarity
(Readability)
globalization/localization/ai/ai-and-llms-for-translation.md 90 100 100 69
globalization/localization/ai/ai-and-localization.md 90 100 89 62
globalization/localization/ai/localizing-ai-based-features.md 88 100 100 58
globalization/toc.yml 95 100 84 95

More information about Acrolinx

Copy link

Learn Build status updates of commit d09624d:

✅ Validation status: passed

File Status Preview URL Details
globalization/localization/ai/ai-and-llms-for-translation.md ✅Succeeded View
globalization/localization/ai/ai-and-localization.md ✅Succeeded View
globalization/localization/ai/localizing-ai-based-features.md ✅Succeeded View
globalization/toc.yml ✅Succeeded View

For more details, please refer to the build report.

For any questions, please:

@jowilco jowilco requested a review from alfredoalmeida August 15, 2024 17:20

# Using artificial intelligence and large language models for translation

With recent advances in large language models (LLMs), many localizers are considering whether to use AI instead of existing machine translation (MT) systems or even as a replacement for human translators (HT). The latest LLMs are performing well, getting close to HT-level quality, especially for “high-resource” languages. However, LLM-based solutions:
Copy link
Collaborator

@brunolewin-msft brunolewin-msft Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, but some style comments/suggestions:

  • many localizers --> remove localizers. They might be the last ones to want this and other functions tend to be the ones doing the considering.
  • replace human translators (HT) --> human translation (HT) - this is a touchy subject, so better focus on changing a process, not replacing people (whose role might evolve rather than just be eliminated)
  • high-resource” languages --> jargon that not all will understand, either use plain language or just use "some languages" if you want to keep concise.


Advances in large language models (LLMs) are enabling new paradigms for natural language processing tasks, which include translation. LLMs have the potential to outperform NMT, while enabling [natural language processing features](localizing-ai-based-features.md) in multilingual applications.

### Large language models and globalization
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sub-section has good information but a lot of it feels out of scope under "Artificial intelligence and translation technology". It feels like the point that's implicitly made in this sub-section as well as in Using LLMs for localization tasks other than translation is that LLM are able to directly provide output in multiple languages and that might remove the need for translation - it could be easier on the reader to explain that explicitly in its own section and move the relevant pieces under that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section introduces terms and concepts; however, I moved it outside of the AI and translation technology section

Copy link
Collaborator

@joeyobr joeyobr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

John, I'll send you a Word doc with my edits.


## Using LLMs for localization tasks other than translation

Due to their wide applicability for language processing tasks, consider using LLMs for other tasks in your localization workflow. For example,
Copy link
Collaborator

@brunolewin-msft brunolewin-msft Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[fixed typos] If we encourage readers to use LLM for these tasks, it might be good to remind to have a reminder that it's not trivial how to do this well and responsibly. A more neutral option for this paragraph could to be to say that below are areas that the industry is exploring rather than just suggest that the reader tries them.


## Generating output in non-source languages

There are two general approaches when creating output from LLMs in languages other than the original language:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that it is a binary decision. My recommendation is to globalize the source prompt as much as possible and localize prompt for language-specific cases where the deviation is needed. We have evidence that there's no significant different between asking LLMs to translate with a source or a localized prompt.

- ai-seo-date:08/15/2024
---

# Using artificial intelligence and large language models for translation
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See how you could integrate this write up:

Adopting AI in translation is a forward-thinking move that aligns with the latest advancements in technology. It’s essential to transition to this new model thoughtfully and incrementally, ensuring that it meets established benchmarks for each language before full implementation.

When evaluating the case for shifting to AI-based translation, it’s crucial to consider various factors such as risk management, ensuring high-quality outputs, the total cost of ownership, and the system’s performance.

The transition to AI should be a step-by-step process, tailored to the specifics of each product, content type, market, language, and customer expectations. This approach allows for a balanced and justified move towards AI, especially in cases where the ROI may be minimal.

In terms of risk, AI-based translation carries a new set of challenges that require thorough human evaluation. Ensuring responsible AI usage is paramount, particularly for sensitive applications, to maintain the integrity of the brand and manage potential reputational risks. Special attention should be paid to new or updated terminology, and frequent spot-check validation of the LLM updates, as newer versions of the models may introduce degradation for some languages.

Quality control is variable across different languages. While AI-based translation has exceeded or matched the quality of traditional methods in some languages, it still poses significant challenges in others. The focus of the quality reviews should include two factos: linguistic quality, and adequacy. So, not only the text is appropriately written following the required linguistic quality required by your products, but it should be an adequate translation for the source. The latter is specially important since, as opposed to MT, LLMs can introduce hallucinations.

Cost-wise, some of the latest AI models are slightly cost-effective than their predecessors. However, the total cost of ownership, which includes both the operational and personnel costs, must be taken into account.


Either approach might be most appropriate for your prompt, use case, and source/target language pair. Testing the output then becomes critical to ensuring the best chance of supporting your customers in their language.

## LLM prompt engineering and testing the output
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be important to breakdown prompts into role, system instructions, examples, and contextual information or RAG (Retrieval-Augmented Generation). A short section on RAG should be included, to explain the importance of providing the models not only with the source string, but also with the correct terminology and contextual comments. BTW, it might be worth considering a writeup on using AI for obtaining more accurate comments for translators as part of the source text handling, and perhaps referencing it here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jowilco - Use i18n prompt rules from Khipu


Your source language prompt might include *content moderation*, for example, specifying terms or words that shouldn't be used in the response. Content moderation might also attempt to use the prompt to make responses suitable for a specific age group or to meet national content requirements. You might need to adjust the prompt so that content moderation is still appropriate for the target language or locale.

### Cross-border data flow and AI laws
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: consider merging the regulatory considerations currently under in "Region availability" into Cross-border data flow and AI laws so all the regulatory bits are together.


If you intend to start with a source language prompt, then translate it for each target language, you shouldn't assume that just translating the prompt will be sufficient. Prompt translation is akin to translating marketing materials, in other words, [transcreation](../transcreation.md). While the translated prompt might be a good starting point, you'll need to repeat the prompt engineering process for each target language.

## Considerations for effective LLM prompts
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The title (effective LLM prompts) does not really match the content of this section - consider breaking out the various regulatory/deployment points unrelated to prompt engineering into their own "level 2" section - see also my next comment,


AI-based products and features have become more prevalent since the 2020 release of the Generative Pre-trained Transformer 3 (GPT-3) large language model (LLM). These features are usually designed to support the source-language market. While other language support might be easy to enable, you shouldn't assume that features will work without more extended customization.

It's essential to ensure that the LLM's outputs align with business goals and user expectations. Consider an LLM generating marketing emails. Without evaluation, these emails might come across as too formal, too casual, or too generic, depending on the target language. By assessing a sample of outputs for each target language, you can optimize their impact and relevance, making sure they effectively meet the business's objectives and resonate with the target audience.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be good to also mention evaluating basic linguistic quality in the languages being considered.

@acrolinxatmsft1
Copy link
Collaborator

Acrolinx Scorecards

A minimum total score of 80 is required.

Select the total score link to review all feedback on clarity, consistency, tone, brand, terms, spelling, grammar, readability, and inclusive language. You should fix all spelling errors regardless of your total score. Fixing spelling errors helps maintain customer trust in overall content quality.

Article Total score
(Required: 80)
Words + phrases
(Brand, terms)
Correctness
(Spelling, grammar)
Clarity
(Readability)
globalization/localization/ai/ai-and-llms-for-translation.md 85 95 90 66
globalization/localization/ai/ai-and-localization.md 90 100 89 62
globalization/localization/ai/localizing-ai-based-features.md 91 100 100 69
globalization/toc.yml 95 100 84 95

More information about Acrolinx

Copy link

Learn Build status updates of commit 959c0a7:

⚠️ Validation status: warnings

File Status Preview URL Details
globalization/localization/ai/ai-and-llms-for-translation.md ⚠️Warning View Details
globalization/localization/ai/ai-and-localization.md ✅Succeeded View
globalization/localization/ai/localizing-ai-based-features.md ✅Succeeded View
globalization/toc.yml ✅Succeeded View

globalization/localization/ai/ai-and-llms-for-translation.md

  • Line 63, Column 121: [Warning: bookmark-not-found - See documentation] Cannot find bookmark '#responsible-ai' in 'localization/ai/ai-and-localization.md'.
  • Line 99, Column 87: [Warning: bookmark-not-found - See documentation] Cannot find bookmark '#responsible-ai' in 'localization/ai/ai-and-localization.md'.

For more details, please refer to the build report.

Note: Your PR may contain errors or warnings or suggestions unrelated to the files you changed. This happens when external dependencies like GitHub alias, Microsoft alias, cross repo links are updated. Please use these instructions to resolve them.

For any questions, please:

@acrolinxatmsft1
Copy link
Collaborator

Acrolinx Scorecards

A minimum total score of 80 is required.

Select the total score link to review all feedback on clarity, consistency, tone, brand, terms, spelling, grammar, readability, and inclusive language. You should fix all spelling errors regardless of your total score. Fixing spelling errors helps maintain customer trust in overall content quality.

Article Total score
(Required: 80)
Words + phrases
(Brand, terms)
Correctness
(Spelling, grammar)
Clarity
(Readability)
globalization/localization/ai/ai-and-llms-for-translation.md 85 95 90 66
globalization/localization/ai/ai-and-localization.md 90 100 89 62
globalization/localization/ai/localizing-ai-based-features.md 91 100 100 69
globalization/toc.yml 95 100 84 95

More information about Acrolinx

Copy link

Learn Build status updates of commit 09e5d6d:

✅ Validation status: passed

File Status Preview URL Details
globalization/localization/ai/ai-and-llms-for-translation.md ✅Succeeded View
globalization/localization/ai/ai-and-localization.md ✅Succeeded View
globalization/localization/ai/localizing-ai-based-features.md ✅Succeeded View
globalization/toc.yml ✅Succeeded View

For more details, please refer to the build report.

For any questions, please:


Quality control is variable across different languages. While AI-based translation has exceeded or matched the quality of traditional methods in some languages, it still poses significant challenges in others. The focus of the quality reviews should include two factors: linguistic quality, and adequacy. Ensure that the text is appropriately written following the required linguistic quality required by your products and is an adequate translation for the source. The latter is specially important since, as opposed to MT, LLMs can introduce *fabrications* or *hallucinations*. Fabrications are words or phrases that aren't present in the source text but are generated by the model. The fabricated text might be factually correct, but it can also be incorrect or misleading, even when the text seems plausible.

Cost-wise, some of the latest AI models are slightly cost-effective than their predecessors. However, the total cost of ownership, which includes both the operational and personnel costs, must be taken into account.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slightly more cost-effective?

- risk management
- ensuring high-quality outputs
- the total cost of ownership
- the system’s performance
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add impact on people/processes to that list?

agustd
agustd previously approved these changes Aug 16, 2024
@acrolinxatmsft1
Copy link
Collaborator

Acrolinx Scorecards

A minimum total score of 80 is required.

Select the total score link to review all feedback on clarity, consistency, tone, brand, terms, spelling, grammar, readability, and inclusive language. You should fix all spelling errors regardless of your total score. Fixing spelling errors helps maintain customer trust in overall content quality.

Article Total score
(Required: 80)
Words + phrases
(Brand, terms)
Correctness
(Spelling, grammar)
Clarity
(Readability)
globalization/localization/ai/ai-and-llms-for-translation.md 86 95 95 66
globalization/localization/ai/ai-and-localization.md 90 100 89 62
globalization/localization/ai/localizing-ai-based-features.md 91 100 100 69
globalization/toc.yml 95 100 84 95

More information about Acrolinx

Copy link

Learn Build status updates of commit e15f95d:

✅ Validation status: passed

File Status Preview URL Details
globalization/localization/ai/ai-and-llms-for-translation.md ✅Succeeded View
globalization/localization/ai/ai-and-localization.md ✅Succeeded View
globalization/localization/ai/localizing-ai-based-features.md ✅Succeeded View
globalization/toc.yml ✅Succeeded View

For more details, please refer to the build report.

For any questions, please:

@jowilco jowilco merged commit 48b0f88 into main Aug 16, 2024
4 checks passed
@jowilco jowilco deleted the 99379_AI branch August 16, 2024 19:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants