Data Transfers to BigQuery error "Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1"
Hello, I'm currently trying to establish daily data transfers from Google Cloud Storage to BigQuery tables. These tables are just meant to store raw data (JSON files), and I unnest them later with scheduled queries. I have no issue when I create the table manually, but I get the following error when I launch a test transfer (even if I previously deleted the data in the table):
Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1
Here is an example of the raw data table/JSON schema:
{
"fields": [{
"name": "data",
"type": "RECORD",
"mode": "REPEATED",
"schema": {
"fields": [{
"name": "relationships",
"type": "RECORD",
"mode": "NULLABLE",
"schema": {
"fields": [{
"name": "createdBy",
"type": "RECORD",
"mode": "NULLABLE",
"schema": {
"fields": [{
"name": "data",
"type": "RECORD",
"mode": "NULLABLE",
"schema": {
"fields": [{
"name": "type",
"type": "STRING",
"mode": "NULLABLE"
}, {
"name": "id",
"type": "INTEGER",
"mode": "NULLABLE"
}]
}
}]
}
}, {
"name": "validationWorkflow",
"type": "RECORD",
"mode": "NULLABLE",
"schema": {
"fields": [{
"name": "data",
"type": "RECORD",
"mode": "REPEATED",
"schema": {
"fields": [{
"name": "type",
"type": "STRING",
"mode": "NULLABLE"
}, {
"name": "id",
"type": "INTEGER",
"mode": "NULLABLE"
}]
}
}]
}
}, {
"name": "projects",
"type": "RECORD",
"mode": "NULLABLE",
"schema": {
"fields": [{
"name": "data",
"type": "RECORD",
"mode": "REPEATED",
"schema": {
"fields": [{
"name": "type",
"type": "STRING",
"mode": "NULLABLE"
}, {
"name": "id",
"type": "INTEGER",
"mode": "NULLABLE"
}]
}
}]
}
}, {
"name": "expensesReport",
"type": "RECORD",
"mode": "NULLABLE",
"schema": {
"fields": [{
"name": "data",
"type": "RECORD",
"mode": "NULLABLE",
"schema": {
"fields": [{
"name": "type",
"type": "STRING",
"mode": "NULLABLE"
}, {
"name": "id",
"type": "INTEGER",
"mode": "NULLABLE"
}]
}
}]
}
}, {
"name": "agency",
"type": "RECORD",
"mode": "NULLABLE",
"schema": {
"fields": [{
"name": "data",
"type": "RECORD",
"mode": "NULLABLE",
"schema": {
"fields": [{
"name": "type",
"type": "STRING",
"mode": "NULLABLE"
}, {
"name": "id",
"type": "INTEGER",
"mode": "NULLABLE"
}]
}
}]
}
}, {
"name": "files",
"type": "RECORD",
"mode": "NULLABLE",
"schema": {
"fields": [{
"name": "data",
"type": "STRING",
"mode": "REPEATED"
}]
}
}, {
"name": "resource",
"type": "RECORD",
"mode": "NULLABLE",
"schema": {
"fields": [{
"name": "data",
"type": "RECORD",
"mode": "NULLABLE",
"schema": {
"fields": [{
"name": "type",
"type": "STRING",
"mode": "NULLABLE"
}, {
"name": "id",
"type": "INTEGER",
"mode": "NULLABLE"
}]
}
}]
}
}, {
"name": "validations",
"type": "RECORD",
"mode": "NULLABLE",
"schema": {
"fields": [{
"name": "data",
"type": "RECORD",
"mode": "REPEATED",
"schema": {
"fields": [{
"name": "type",
"type": "STRING",
"mode": "NULLABLE"
}, {
"name": "id",
"type": "INTEGER",
"mode": "NULLABLE"
}]
}
}]
}
}, {
"name": "orders",
"type": "RECORD",
"mode": "NULLABLE",
"schema": {
"fields": [{
"name": "data",
"type": "RECORD",
"mode": "REPEATED",
"schema": {
"fields": [{
"name": "type",
"type": "STRING",
"mode": "NULLABLE"
}, {
"name": "id",
"type": "INTEGER",
"mode": "NULLABLE"
}]
}
}]
}
}]
}
}, {
"name": "attributes",
"type": "RECORD",
"mode": "NULLABLE",
"schema": {
"fields": [{
"name": "plannedTimes",
"type": "STRING",
"mode": "REPEATED"
}, {
"name": "state",
"type": "STRING",
"mode": "NULLABLE"
}, {
"name": "exceptionalTimes",
"type": "STRING",
"mode": "REPEATED"
}, {
"name": "closed",
"type": "BOOLEAN",
"mode": "NULLABLE"
}, {
"name": "informationComments",
"type": "STRING",
"mode": "NULLABLE"
}, {
"name": "regularTimes",
"type": "RECORD",
"mode": "REPEATED",
"schema": {
"fields": [{
"name": "batch",
"type": "RECORD",
"mode": "NULLABLE",
"schema": {
"fields": [{
"name": "title",
"type": "STRING",
"mode": "NULLABLE"
}, {
"name": "id",
"type": "INTEGER",
"mode": "NULLABLE"
}]
}
}, {
"name": "workUnitType",
"type": "RECORD",
"mode": "NULLABLE",
"schema": {
"fields": [{
"name": "name",
"type": "STRING",
"mode": "NULLABLE"
}, {
"name": "activityType",
"type": "STRING",
"mode": "NULLABLE"
}, {
"name": "reference",
"type": "INTEGER",
"mode": "NULLABLE"
}]
}
}, {
"name": "project",
"type": "RECORD",
"mode": "NULLABLE",
"schema": {
"fields": [{
"name": "reference",
"type": "STRING",
"mode": "NULLABLE"
}, {
"name": "id",
"type": "INTEGER",
"mode": "NULLABLE"
}]
}
}, {
"name": "row",
"type": "INTEGER",
"mode": "NULLABLE"
}, {
"name": "delivery",
"type": "RECORD",
"mode": "NULLABLE",
"schema": {
"fields": [{
"name": "endDate",
"type": "DATE",
"mode": "NULLABLE"
}, {
"name": "startDate",
"type": "DATE",
"mode": "NULLABLE"
}, {
"name": "title",
"type": "STRING",
"mode": "NULLABLE"
}, {
"name": "id",
"type": "INTEGER",
"mode": "NULLABLE"
}]
}
}, {
"name": "startDate",
"type": "DATE",
"mode": "NULLABLE"
}, {
"name": "duration",
"type": "FLOAT",
"mode": "NULLABLE"
}, {
"name": "id",
"type": "INTEGER",
"mode": "NULLABLE"
}]
}
}, {
"name": "updateDate",
"type": "TIMESTAMP",
"mode": "NULLABLE"
}, {
"name": "creationDate",
"type": "TIMESTAMP",
"mode": "NULLABLE"
}, {
"name": "absencesTimes",
"type": "RECORD",
"mode": "REPEATED",
"schema": {
"fields": [{
"name": "workUnitType",
"type": "RECORD",
"mode": "NULLABLE",
"schema": {
"fields": [{
"name": "name",
"type": "STRING",
"mode": "NULLABLE"
}, {
"name": "activityType",
"type": "STRING",
"mode": "NULLABLE"
}, {
"name": "reference",
"type": "INTEGER",
"mode": "NULLABLE"
}]
}
}, {
"name": "duration",
"type": "FLOAT",
"mode": "NULLABLE"
}, {
"name": "startDate",
"type": "DATE",
"mode": "NULLABLE"
}]
}
}, {
"name": "term",
"type": "STRING",
"mode": "NULLABLE"
}]
}
}, {
"name": "type",
"type": "STRING",
"mode": "NULLABLE"
}, {
"name": "id",
"type": "INTEGER",
"mode": "NULLABLE"
}]
}
}, {
"name": "date",
"type": "DATE",
"mode": "NULLABLE"
}]
}
I know that BQ works better with newline-delimited JSON, but this table must contain the raw data even if it's just one row in the end. The weirdest thing is that the transfer works for some files with similar schemas.
What should I do to make these transfers work?
Thank you for your help
Solution 1:[1]
According to the GCP documentation, this is a known limitation for JSON loads:
"JSON data must be newline delimited. Each JSON object must be on a separate line in the file."
Solution 2:[2]
Here is the solution:
Some of the fields were auto-detected as "TIMESTAMP" when uploading manually. However, the Data Transfer didn't recognize these fields as "TIMESTAMP" during the daily updates of these tables. To solve this, I edited the schema and declared these fields as "STRING".
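For example, assuming creationDate and updateDate were among the auto-detected fields, the edit in the schema above would look like this:
Before (auto-detected on manual upload):
{ "name": "creationDate", "type": "TIMESTAMP", "mode": "NULLABLE" }
After (declared explicitly so the transfer loads the value as-is):
{ "name": "creationDate", "type": "STRING", "mode": "NULLABLE" }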
Thank you for your help ;)
Solution 3:[3]
I also faced the same issue while working with Cloud Dataflow (Apache Beam) to migrate .csv data from Cloud Storage to Google BigQuery.
The error occurred because the .csv rows were not being converted correctly into JSON objects. I resolved the issue by adding a few lines of code to do the .csv-to-JSON conversion.
CSV file:
Gender,Math,Physics,Chemistry
male,57,50,53
male,63,66,59
male,65,56,54
CSV to JSON conversion:
Each CSV row has to end up as a dictionary keyed by the header columns before it can be written to Google BigQuery, one object per row:
{"Gender": "male", "Math": "57", "Physics": "50", "Chemistry": "53"}
{"Gender": "male", "Math": "63", "Physics": "66", "Chemistry": "59"}
{"Gender": "male", "Math": "65", "Physics": "56", "Chemistry": "54"}
Python code to convert each .csv row into such a dictionary:
count = "start"
lst =[]
final = ""
class csv2json(beam.DoFn):
def process(self,csvFile):
global count
global header
global lst,final
min_dict={}
lst=[]
print("**************")
print(csvFile)
print(type(csvFile))
print("**************")
#Consider first row as a header
if "start" == count:
for item in range(len(csvFile)):
min_dict[csvFile[item]]=""
count = "end"
header = csvFile
#Consider rest are the value for headers.
else:
for item in range(len(csvFile)):
min_dict[header[item]]=csvFile[item]
#converting dict into list of dict
lst.append(min_dict)
final = lst
print("**************")
print(final)
print(type(final))
print("**************")
return final
Hope my solution works for you as well.
For your reference, have a look here: https://beam.apache.org/documentation/io/built-in/google-bigquery/#writing-to-a-table
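For context, here is a minimal sketch of how such a DoFn might be wired into a pipeline. The bucket path, project, dataset, and table name below are placeholders, not values from the original post:
import csv
import apache_beam as beam


def parse_csv_line(line):
    # Split one line of text into a list of field values, respecting CSV quoting.
    return next(csv.reader([line]))


with beam.Pipeline() as pipeline:
    (
        pipeline
        | "Read CSV" >> beam.io.ReadFromText("gs://my-bucket/scores.csv")  # placeholder path
        | "Parse rows" >> beam.Map(parse_csv_line)
        | "Rows to dicts" >> beam.ParDo(csv2json())
        | "Write to BQ" >> beam.io.WriteToBigQuery(
            "my-project:my_dataset.scores",  # placeholder table
            schema="Gender:STRING,Math:STRING,Physics:STRING,Chemistry:STRING",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )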
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source
---|---
Solution 1 | DamPlz
Solution 2 | malevil667
Solution 3 |