How to translate only text in formatted HTML code using Google Apps Script?

I have been trying to translate text from HTML code. Here is an example:

var s = '<span>X stopped the</span><icon></icon><subject>breakout session</subject>'

When I try =GOOGLETRANSLATE(s,"en","fi") in Google Sheets, it also alters the tags and turns them into plain text. Only the text X stopped the breakout session should be translated, but that is not what happens.

Then I tried this function:

function TransLang(string){

   return LanguageApp.translate(string,'en', 'fi', {contentType: 'text'});
}

This function worked well for some time, but then I got this error:

Service invoked too many times in one day.

So I am stuck. Is there any way to translate the plain text in HTML code without translating or mangling the HTML tags? Is there a regex that skips the tags so that only the remaining text is translated?

I hope I have stated my problem clearly. Please let me know if you have any suggestions. Thank you.



Solution 1:[1]

So, after a lot of digging, I have been able to find what I was looking for.

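function Translator(S) {
  var sourceLang = "en";
  var targetLang = "fi";
  var url =
    'https://translate.googleapis.com/translate_a/single?client=gtx&sl=' +
    sourceLang +
    '&tl=' +
    targetLang +
    '&dt=t&q=' +
    // encodeURIComponent also escapes &, = and #, which encodeURI leaves as-is
    encodeURIComponent(S);

  // Fetch the JSON response and return the first translated segment
  var result = JSON.parse(UrlFetchApp.fetch(url).getContentText());
  return result[0][0][0];
}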

This simple function calls a Google Translate endpoint and extracts the result from the response. The best thing is that you do not have to worry about the tags: Google does not translate them, so only the plain text is translated. The one limitation of this solution is that API calls are limited, so you cannot make more than 5000 calls/day.
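The function above returns result[0][0][0], which is only the first entry of result[0]. If the endpoint splits longer input into several segments, the rest would be dropped. Here is a minimal sketch of joining them, assuming result[0] holds one array per segment with the translated text at index 0. That shape is inferred from the indexing above, not from any documentation, so check it against a real response. The name joinSegments and the sample data are hypothetical.

```javascript
// Join every translated segment instead of returning only the first one.
// ASSUMPTION: result[0] is an array of [translatedText, originalText, ...]
// entries, one per segment of the input.
function joinSegments(result) {
  return result[0]
    .map(function (segment) { return segment[0]; })
    .join('');
}

// Hypothetical response, shaped like the assumption above (not real API output).
var sampleResponse = [[['Hei. ', 'Hello. '], ['Kiitos.', 'Thanks.']], null, 'en'];
var joined = joinSegments(sampleResponse); // 'Hei. Kiitos.'
```

In Translator, this would replace the last line with return joinSegments(result);.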

Solution 2:[2]

Is the text you want always inside a single <span>? Or could there be more than one span or other element types?

This works for extracting the inner text from a single <span>:

function getSpanText() {
  let s = '<span>X stopped the</span><icon></icon><subject>breakout session</subject>';
  // Lookbehind/lookahead match the text between <span> and </span> without the tags
  var text = s.match(/(?<=<span>).+(?=<\/span>)/)[0];
  Logger.log(text);
  return text;
}
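If there can be more than one element, the same idea extends to every text run. Here is a rough sketch, assuming simple markup with no > inside attribute values and no script or comment blocks. translateTextNodes is a hypothetical helper, and translateFn is a hypothetical callback that in Apps Script could wrap LanguageApp.translate.

```javascript
// Translate only the text between tags and leave the tags untouched.
// translateFn: (text) -> translated text (hypothetical callback).
function translateTextNodes(html, translateFn) {
  // split() with a capturing group keeps the tags in the output array,
  // so even indexes hold text and odd indexes hold tags.
  return html
    .split(/(<[^>]*>)/)
    .map(function (part, index) {
      var isTag = index % 2 === 1;
      // Skip tags and whitespace-only runs so they cost no translation call.
      if (isTag || part.trim() === '') return part;
      return translateFn(part);
    })
    .join('');
}

// Stand-in translator for a quick check outside Apps Script.
var shout = function (text) { return text.toUpperCase(); };
var demoHtml = '<span>X stopped the</span><icon></icon><subject>breakout session</subject>';
var demoOut = translateTextNodes(demoHtml, shout);
// demoOut: '<span>X STOPPED THE</span><icon></icon><subject>BREAKOUT SESSION</subject>'
```

Each text run is translated on its own, so the translator never sees the full sentence, and every run costs one call against the daily quota.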

Solution 3:[3]

Why not use LanguageApp.translate in a custom function (Extensions > Apps Script)?

var spanish = LanguageApp.translate('This is a <strong>test</strong>',
                                  'en', 'es', {contentType: 'html'});

// The code will generate "Esta es una <strong>prueba</strong>".

LanguageApp.translate (apidoc) accepts an optional fourth argument, an options object whose contentType can be text or html. For large tables, be aware that there are daily limits (quotas)!
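One way to stretch the daily quota is to avoid translating the same string twice. Here is a minimal sketch, assuming a cache object with get(key) and put(key, value), which is the shape CacheService.getScriptCache() provides in Apps Script. makeCachedTranslator is a hypothetical helper, not part of any Google API.

```javascript
// Wrap a translate function so repeated inputs are served from a cache.
// translateFn: (text) -> translated text (hypothetical callback).
// cache: any object with get(key) -> string or null, and put(key, value).
function makeCachedTranslator(translateFn, cache) {
  return function (text) {
    var key = 'tr:' + text; // assumption: the text is short enough to use as a key
    var hit = cache.get(key);
    if (hit !== null && hit !== undefined) return hit;
    var translated = translateFn(text);
    cache.put(key, translated);
    return translated;
  };
}

// In Apps Script this might be wired up as follows (untested sketch):
// var translate = makeCachedTranslator(function (t) {
//   return LanguageApp.translate(t, 'en', 'fi', {contentType: 'html'});
// }, CacheService.getScriptCache());
```

Cache entries expire and have size limits, so long strings may need a shorter key such as a digest of the text.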

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Roomi
Solution 2 GreenFlux
Solution 3 fraank