How to translate and update Azure Cognitive Search index documents for different language analyzer fields?

I am configuring an Azure Cognitive Search index that will be queried from websites in different languages. I have created language-specific fields and assigned the appropriate language analyzers when creating the index. For example:

{
    "id": "",
    "Description": "some_value",
    "Description_es": null, 
    "Description_fr": null,
    "Region": [ "some_value", "some_value" ],
    "SpecificationData": [
        {
            "name": "some_key1",
            "value": "some_value1",
            "name_es": null,
            "value_es": null,
            "name_fr": null,
            "value_fr": null
        },
        {
            "name": "some_key2",
            "value": "some_value2",
            "name_pt": null,
            "value_pt": null,
            "name_fr": null,
            "value_fr": null
        }
    ]
}

The fields Description, SpecificationData.name and SpecificationData.value are in English and come from Cosmos DB. The fields Description_es, SpecificationData.name_es and SpecificationData.value_es will be queried from the Spanish website and should hold the Spanish translations, and similarly for the French fields. But since Cosmos DB has the fields only in English, the language-specific fields such as Description_es, SpecificationData.name_es and SpecificationData.value_es are null by default. I have tried using a skillset and linking the index to "Azure Cognitive Translate Service", but the skillset translates only one field at a time. Is there any way to translate multiple fields and save each translation in its own field?

Edit: Adding Index, Skillset and Indexer code that I have tried:

Index (snippet):

{
    "name": "SpecificationData",
    "type": "Collection(Edm.ComplexType)",
    "analyzer": null,
    "synonymMaps": [],
    "fields": [
        {
            "name": "name",
            "type": "Edm.String",
            "facetable": true,
            "filterable": true,
            "key": false,
            "retrievable": true,
            "searchable": true,
            "sortable": false,
            "analyzer": "standard.lucene",
            "indexAnalyzer": null,
            "searchAnalyzer": null,
            "synonymMaps": [],
            "fields": []
        },
        {
            "name": "value",
            "type": "Edm.String",
            "facetable": true,
            "filterable": true,
            "key": false,
            "retrievable": true,
            "searchable": true,
            "sortable": false,
            "analyzer": "standard.lucene",
            "indexAnalyzer": null,
            "searchAnalyzer": null,
            "synonymMaps": [],
            "fields": []
        },
        {
            "name": "name_fr",
            "type": "Edm.String",
            "facetable": true,
            "filterable": true,
            "key": false,
            "retrievable": true,
            "searchable": true,
            "sortable": false,
            "analyzer": "fr.lucene",
            "indexAnalyzer": null,
            "searchAnalyzer": null,
            "synonymMaps": [],
            "fields": []
        },
        {
            "name": "value_fr",
            "type": "Edm.String",
            "facetable": true,
            "filterable": true,
            "key": false,
            "retrievable": true,
            "searchable": true,
            "sortable": false,
            "analyzer": "fr.lucene",
            "indexAnalyzer": null,
            "searchAnalyzer": null,
            "synonymMaps": [],
            "fields": []
        }
    ]
}

Skillset:

{
    "@odata.type": "#Microsoft.Skills.Text.TranslationSkill",
    "name": "psd_name_fr",
    "description": null,
    "context": "/document/SpecificationData",
    "defaultFromLanguageCode": null,
    "defaultToLanguageCode": "fr",
    "suggestedFrom": "en",
    "inputs": [
        {
            "name": "text",
            "source": "/*/name"
        }
    ],
    "outputs": [
        {
            "name": "translatedText",
            "targetName": "name_fr"
        }
    ]
}

Indexer:

"outputFieldMappings": [
    {
        "sourceFieldName": "/document/SpecificationData/*/name/name_fr",
        "targetFieldName": "/name_fr"
    }
]

With this mapping I get the error message "Output field mapping specifies target field 'name_fr' that doesn't exist in the index". I have tried the full path /document/SpecificationData/name_fr as the target, but it still gives the same error. It seems to look for the specified field in the root structure and fails when the field is nested inside an array of objects.


Solution 1:[1]

If you wanted one big merged translation field for each language, you could first use a text merge skill to merge all the fields you want to translate. That probably wouldn't fit your exact scenario, though, since you said you still want separate fields as the output. To keep them separate, I think you'll have to translate them one by one, with one translation skill per field and language. There's no problem with having more than one translation skill in a skillset, so that should work fine; it just may be a little tedious to set up.
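As a rough sketch of "one translation skill per field and language", the skillset below translates the top-level Description field into Spanish and French. The skillset name and the skill names are made up for illustration, and a real skillset would typically also need a Cognitive Services resource attached, which is left out here.

```json
{
  "name": "translate-descriptions",
  "description": "Illustrative only: one TranslationSkill per field and target language",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Text.TranslationSkill",
      "name": "description_es",
      "context": "/document",
      "defaultToLanguageCode": "es",
      "suggestedFrom": "en",
      "inputs": [
        { "name": "text", "source": "/document/Description" }
      ],
      "outputs": [
        { "name": "translatedText", "targetName": "Description_es" }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.TranslationSkill",
      "name": "description_fr",
      "context": "/document",
      "defaultToLanguageCode": "fr",
      "suggestedFrom": "en",
      "inputs": [
        { "name": "text", "source": "/document/Description" }
      ],
      "outputs": [
        { "name": "translatedText", "targetName": "Description_fr" }
      ]
    }
  ]
}
```

Each skill writes its output under its own context, so these two would produce /document/Description_es and /document/Description_fr in the enriched document.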

UPDATE 5/18/22

OK, so since you're not defining a complex SpecificationData index field, but instead top-level "name_fr" and so on, then yes, output field mappings are fine. Output field mappings map a path in the enriched document to an index field, by name. So targetFieldName should be "name_fr" with no leading slash. sourceFieldName should point to the output of your translation skill, name_fr under the context path, which is /document/SpecificationData, so the full path to your skill's output is /document/SpecificationData/name_fr.
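To make the path syntax concrete, the mapping described above would look roughly like this. It assumes a top-level name_fr field exists in the index and only illustrates the source and target forms; whether the skill output fits a string field is a separate question.

```json
{
  "outputFieldMappings": [
    {
      "sourceFieldName": "/document/SpecificationData/name_fr",
      "targetFieldName": "name_fr"
    }
  ]
}
```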

But then there's another issue: because of the * in the input path (/*/name), the output of the skill is really an array of values. That probably won't work, as the index field is a string and not an array.

It seems like your intent is to get a translation for each name of each SpecificationData entry. For that, your context should probably do the enumeration (/document/SpecificationData/*) and have the input path be /document/SpecificationData/*/name. This way, one name_fr will be under each item in the SpecificationData array.
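A minimal sketch of the translation skill with the enumeration moved into the context, as suggested above. It reuses the skill name and language settings from the question; only the context and the input source are changed.

```json
{
  "@odata.type": "#Microsoft.Skills.Text.TranslationSkill",
  "name": "psd_name_fr",
  "context": "/document/SpecificationData/*",
  "defaultToLanguageCode": "fr",
  "suggestedFrom": "en",
  "inputs": [
    { "name": "text", "source": "/document/SpecificationData/*/name" }
  ],
  "outputs": [
    { "name": "translatedText", "targetName": "name_fr" }
  ]
}
```

With this context the skill should run once per array item, so each translated name ends up at /document/SpecificationData/*/name_fr.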

Then you'll need to make those multiple values into a single string for the index, if you keep the index defined that way. The simplest way to do this is by using a text merger skill, probably something like this:

{
  "@odata.type": "#Microsoft.Skills.Text.MergeSkill",
  "context": "/document",
  "inputs": [
    {
      "name": "itemsToInsert", 
      "source": "/document/SpecificationData/*/name_fr"
    }
  ],
  "outputs": [
    {
      "name": "mergedText", 
      "targetName" : "name_fr"
    }
  ]
}

And then, since the output of this new skill will be /document/name_fr, containing the space-separated concatenation of all French-translated names, you don't need the output field mapping at all: the value will get automatically mapped to your index.
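For the merged value to have somewhere to land, the index needs a top-level string field with the same name. A minimal sketch of such a field definition follows, assuming the fr.lucene analyzer from the question; the other attributes are illustrative.

```json
{
  "name": "name_fr",
  "type": "Edm.String",
  "searchable": true,
  "retrievable": true,
  "analyzer": "fr.lucene"
}
```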

Finally, to better understand and debug skillsets, you should take a look at debug sessions.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1