'Translation of HTML content in Google Sheets
I am using the following Google Sheets formula to translate some fields containing HTML tags:
=GOOGLETRANSLATE(A2, "en", "de")
However, the translation results in a messed up HTML and extra spaces between tags opening or closing, especially if there are many nested tags.
For example:
<div> <p>paragraph text</p> </div>
will result in:
<div> <p> Absatztext </ P> </ Div>
Sometimes, the translator changes the tags opening and closing and put extra spaces between some attributes also the closing tags letters are in uppercase.
Issues like:
<p> Absatztext <P />
<a href = " # "> Link </ A>
Sometimes, text are added before the tag closing
<h2 Was> ist Pilates? </h2>
it should be:
<h2> Was ist Pilates? </h2>
Demo here: https://docs.google.com/spreadsheets/d/11MOZjTknFGdwuAp6g3VUa0o5OQaW44hxN2uEvqnL3jw/edit?usp=sharing
How can I fix those problems?
Solution 1:[1]
try simple fix like:
=LOWER(SUBSTITUTE(GOOGLETRANSLATE(A1, "en", "de"), "/ ", "/"))
UPDATE:
=SUBSTITUTE(A1, TRIM(REGEXREPLACE(A1, "</?\S+[^<>]*>", )),
GOOGLETRANSLATE(TRIM(REGEXREPLACE(A1, "</?\S+[^<>]*>", )), "EN", "DE"))
Solution 2:[2]
If you don't mind doing it in a single formula and just want to solve the problem, you could try splitting it and only translating what's not an HTML tag.
Put all HTML tags and closing tags in a separate sheet, so you can check for them. I'll put mine in 'tags'!A1:B128.
Considering you have your original text in A1, you can split it up by < and >:
=SPLIT(A1,"<>")
then on a line below (or elsewhere, for me it'll be A2) you can check if the first word in each cell is found among the tags with:
NOT(COUNTIF(tags!$A$1:$B$128,INDEX(SPLIT(A2," "),1,1)))
translate it if it's true with
GOOGLETRANSLATE(A2, "en", "de")
or add the brackets back with
"<"&A2&">"
so the whole formula will look like
=IF(NOT(COUNTIF(tags!$A$1:$B$128,INDEX(SPLIT(A2," "),1,1))),GOOGLETRANSLATE(A2, "en", "de"), "<"&A2&">")
then on a line below, just join the whole row back to a single cell with
=JOIN("",A3:L3)
You can hide rows 2 and 3 for convenience, or even put them on a separate sheet along with the tags. You can also add a condition not to add < and > if it's empty, so you can join up the whole row without looking at how long it is.
If you'd like to do this in a single formula, you'd have to write a script for it, as some formulas act strangely with arrayformula and sometimes are barely usable.
Solution 3:[3]
I think the most most convenient solution nowerdays is to use a custom JS-Function (Extensions >> AppScripts):
var spanish = LanguageApp.translate('This is a <strong>test</strong>',
'en', 'es', {contentType: 'html'});
// The code will generate "Esta es una <strong>prueba</strong>".
LanguageApp.translate
(apidoc) accepts as fourth option a contentType, which can be text
or html
.
For huge tables be aware that there are daily limits (quotas)!
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Community |
Solution 2 | zoltankundi |
Solution 3 | fraank |