Google Cloud Translate: num_valid_languages_in_csv = 1

I am attempting to add a glossary to Google Cloud Translate, but I receive the following error:

Traceback (most recent call last):
  File "Python_SetGlossary.py", line 36, in <module>
    result = operation.result(timeout=90)
  File "C:\Programming Installs\Python\lib\site-packages\google\api_core\future\polling.py", line 127, in result
    raise self._exception
google.api_core.exceptions.GoogleAPICallError: None Failed to parse content of input file. Error: Not enough valid languages in CSV file. Must have terms for at least two different languages. num_valid_languages_in_csv = 1

The CSV file (below) was created using the example provided by Google for equivalent terms sets.

en,fr,pos
Canadian Meteorological Service of Environment Canada,Service météorologique d'Environnement Canada,noun
Jacques Cartier Strait,détroit de Jacques-Cartier,noun
the St. Lawrence Global Observatory,l'Observatoire global du Saint-Laurent,noun
St. Lawrence Global Observatory,Observatoire global du Saint-Laurent,noun

This was uploaded to Google Cloud Storage. I then attempted to create an online glossary by making it available to the Cloud Translation API, again using the code provided by Google for equivalent term sets.

from google.cloud import translate_v3 as translate

# def sample_create_glossary(project_id, input_uri, glossary_id):

"""Create Glossary"""
client = translate.TranslationServiceClient()

# TODO(developer): Uncomment and set the following variables
project_id = 'testtranslate'
glossary_id = 'glossary-en-fr-bidirectional'
input_uri = 'gs://bidirectional-en-fr/bidirectional-glossary.csv'
location = 'us-central1'  # The location of the glossary

name = client.glossary_path(
    project_id,
    location,
    glossary_id)
language_codes_set = translate.types.Glossary.LanguageCodesSet(
    language_codes=['en', 'fr'])

gcs_source = translate.types.GcsSource(
    input_uri=input_uri)

input_config = translate.types.GlossaryInputConfig(
    gcs_source=gcs_source)

glossary = translate.types.Glossary(
    name=name,
    language_codes_set=language_codes_set,
    input_config=input_config)

parent = client.location_path(project_id, location)

operation = client.create_glossary(parent=parent, glossary=glossary)

result = operation.result(timeout=90)
print('Created: {}'.format(result.name))
print('Input Uri: {}'.format(result.input_config.gcs_source.input_uri))

Can anybody help me figure out what is going on / what I'm doing wrong? (Or what Google is doing wrong. Some of their documentation is definitely suspect. But also I am not particularly experienced with Python and could easily be missing something.)



Solution 1:[1]

For some reason, it required the first column in the CSV to be blank.

,en,fr,pos
,Canadian Meteorological Service of Environment Canada,Service météorologique d'Environnement Canada,noun
,Jacques Cartier Strait,détroit de Jacques-Cartier,noun
,the St. Lawrence Global Observatory,l'Observatoire global du Saint-Laurent,noun
,St. Lawrence Global Observatory,Observatoire global du Saint-Laurent,noun

No idea why, but it now works.
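For anyone who wants to apply the same workaround to an existing file, here is a minimal sketch that prepends an empty column to every row. The function name `add_blank_first_column` is made up for illustration, and the sketch only reshapes the CSV text; it does not explain why the service accepted the file afterwards.

```python
import csv
import io


def add_blank_first_column(csv_text: str) -> str:
    """Return csv_text with an empty column prepended to every row."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        if not row:
            continue  # skip completely blank lines
        writer.writerow([""] + row)
    return out.getvalue()


if __name__ == "__main__":
    original = (
        "en,fr,pos\n"
        "Jacques Cartier Strait,détroit de Jacques-Cartier,noun\n"
    )
    # Each output line now starts with a comma, as in the CSV above.
    print(add_blank_first_column(original), end="")
```

The result would still have to be saved and uploaded to the Cloud Storage bucket again before retrying the glossary creation.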

Solution 2:[2]

There are two types of glossary in Google Cloud Translate Advanced.

The first is the unidirectional glossary. This is simply a list of source-language and target-language term pairs in TSV, CSV or TMX format. A column header is not required.

Example data in CSV format:

account,cuenta
directions,indicaciones

The second is the one you are currently using, which Google calls "equivalent term sets". This type is only available in CSV format. Use it if you want to create a glossary with more than two languages. A header row is required for this type of glossary.

Example data in CSV format:

first language,Second language,pos,description
account,cuenta,noun,A user's account. Do not use as verb.

Or, when there are three languages:

first language,Second language,third language,pos,description
word in first language,word in second language, word in third language,noun,some information

As you can see, there are two extra columns in this type of glossary: "pos" and "description". So at the minimum (when there is only one pair of languages), there should be four columns if you are using this type of glossary.

Also, in your case, you clearly need a unidirectional glossary rather than equivalent term sets.

In your code above, use language_pair instead of language_codes_set. Google's documentation has a sample REST request for this, although it lacks Python sample code.
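As a rough illustration of what such a request looks like, the sketch below builds the JSON body for a glossary that uses `languagePair`. The helper name `unidirectional_glossary_body` is hypothetical; the project, glossary and bucket values are taken from the question and are only placeholders. The sketch only builds the body and does not send it.

```python
import json


def unidirectional_glossary_body(project_id, location, glossary_id,
                                 input_uri, source_lang, target_lang):
    """Build the JSON body of a glossary-creation request that uses languagePair."""
    name = f"projects/{project_id}/locations/{location}/glossaries/{glossary_id}"
    return {
        "name": name,
        # languagePair takes the place of languageCodesSet.
        "languagePair": {
            "sourceLanguageCode": source_lang,
            "targetLanguageCode": target_lang,
        },
        "inputConfig": {"gcsSource": {"inputUri": input_uri}},
    }


if __name__ == "__main__":
    body = unidirectional_glossary_body(
        "testtranslate", "us-central1", "glossary-en-fr",
        "gs://bidirectional-en-fr/bidirectional-glossary.csv", "en", "fr",
    )
    print(json.dumps(body, indent=2))
```

With this type of glossary, the input file would hold only the source and target columns, as in the unidirectional example above.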

Solution 3:[3]

Adding info to Donovan's answer, it seems the correct way to create a unidirectional glossary in Python is:

  1. Define a LanguageCodePair:
language_codes_pair = translate.Glossary.LanguageCodePair()
  2. Set the source and target language attributes:
language_codes_pair.source_language_code = 'source-language-code'
language_codes_pair.target_language_code = 'target-language-code'
  3. Create the glossary:
glossary = translate.Glossary(name=name, language_pair=language_codes_pair, input_config=input_config)

This worked for me
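
Putting the three steps together with the setup from the question, a complete version might look like the sketch below. It assumes a release of google-cloud-translate in which the types are exposed directly on `translate_v3` (as in `translate.Glossary` above, rather than `translate.types.Glossary`), and it builds the parent path by hand instead of calling `client.location_path`. The function names and the default timeout are illustrative choices, not part of the library.

```python
from typing import Any


def location_parent(project_id: str, location: str) -> str:
    """Resource name of the location that will own the glossary."""
    return f"projects/{project_id}/locations/{location}"


def create_unidirectional_glossary(project_id: str, location: str,
                                   glossary_id: str, input_uri: str,
                                   source_lang: str, target_lang: str,
                                   timeout: float = 180.0) -> Any:
    """Create a glossary for one source/target pair from a file in Cloud Storage."""
    # Imported here so location_parent() works without the library installed.
    from google.cloud import translate_v3 as translate

    client = translate.TranslationServiceClient()

    # Steps 1 and 2: define the pair and set both language codes.
    language_pair = translate.Glossary.LanguageCodePair(
        source_language_code=source_lang,
        target_language_code=target_lang,
    )
    input_config = translate.GlossaryInputConfig(
        gcs_source=translate.GcsSource(input_uri=input_uri)
    )

    # Step 3: create the glossary with language_pair, not language_codes_set.
    glossary = translate.Glossary(
        name=client.glossary_path(project_id, location, glossary_id),
        language_pair=language_pair,
        input_config=input_config,
    )
    operation = client.create_glossary(
        parent=location_parent(project_id, location), glossary=glossary
    )
    return operation.result(timeout=timeout)


# Example call (needs credentials and a two-column file in the bucket):
# result = create_unidirectional_glossary(
#     "testtranslate", "us-central1", "glossary-en-fr",
#     "gs://bidirectional-en-fr/bidirectional-glossary.csv", "en", "fr")
# print("Created: {}".format(result.name))
```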

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 TeamAwareness
Solution 2 Donovan P
Solution 3 Massimo Gennaro