'ERROR: extra data after last expected column on PostgreSQL while the number of columns is the same

I am new to PostgreSQL and I need to import a set of csv files, but some of them weren't imported successfully. I got the same error with these files: ERROR: extra data after last expected column. I have investigated this error report and learned that these errors occur might because the number of columns of the table is not equal to that in the file. But I don't think I am in this situation.

For example, I create this table:

CREATE TABLE cast_info (
  id integer NOT NULL PRIMARY KEY,
  person_id integer NOT NULL,
  movie_id integer NOT NULL,
  person_role_id integer,
  note character varying,
  nr_order integer,
  role_id integer NOT NULL
);

And then I want to copy the csv file:

COPY cast_info FROM '/private/tmp/cast_info.csv' WITH CSV HEADER;

Then I got the error:

**ERROR:  extra data after last expected column
CONTEXT:  COPY cast_info, line 8801: "612,207,2222077,1,"(segments \"Homies\" - \"Tilt A Whirl\" - \"We don't die\" - \"Halls of Illusions..."**

The complete row in this csv file is as follows:

612,207,2222077,1,"(segments \"Homies\" - \"Tilt A Whirl\" - \"We don't die\" - \"Halls of Illusions\" - \"Chicken Huntin\" - \"Another love song\" - \"How many times?\" - \"Bowling balls\" - \"The people\" - \"Piggy pie\" - \"Hokus pokus\" - \"Let\"s go all the way\" - \"Real underground baby\")/Full Clip (segments \"Duk da fuk down\" - \"Real underground baby\")/Guy Gorfey (segment \"Raw deal\")/Sugar Bear (segment \"Real underground baby\")",2,1

You can see that there's exactly 7 columns as the table has.

The strange thing is, I found that the error lines of all these files contain the characters backslash and quotation mark (\"). Also, these rows are not the only row that contains \" in the files. I wonder why this error doesn't appear in other rows. Because of that, I am not sure if this is the problem.

After modifying these rows (e.g. replace the \" or delete the content while remaining the commas), there are new errors: ERROR: invalid input syntax for line 2 of every file. And the errors occur because the data in the last column of these rows have been added three semicolons(;;;) for no reason. But when I open these csv files, I can't see the three semicolons in those rows.

For example, after deleting the content in the fifth column of this row:

612,207,2222077,1,,2,1

I got the error:

**ERROR:  invalid input syntax for type integer: "1;;;"
CONTEXT:  COPY cast_info, line 2, column role_id: "1;;;"**

While the line 2 doesn't contain three semicolons, as follows:

2,2,2163857,1,,25,1

In principle, I hope the problem can be solved without any modification to the data itself. Thank you for your patience and help!



Solution 1:[1]

The CSV format protects quotation marks by doubling them, not by backslashing them. You could use the text format instead, except that that doesn't support HEADER, and also it would then not remove the outer quote marks. You could instead tweak the files on the fly with a program:

COPY cast_info FROM PROGRAM 'sed s/\\\\/\"/g /private/tmp/cast_info.csv' WITH CSV;

This works with the one example you gave, but might not work for all cases.

ERROR: invalid input syntax for line 2 of every file. And the errors occur because the data in the last column of these rows have been added three semicolons(;;;) for no reason. But when I open these csv files, I can't see the three semicolons in those rows

How are you editing and viewing these files? Sounds like you are using something that isn't very good at preserving formatting, like Excel.

Solution 2:[2]

Try actually naming the columns you want processed in the copy statement:

copy cast_info (id, person_id, movie_id, person_role_id, note, nr_order, role_id) from ... 

Solution 3:[3]

According to a friend's suggestion, I need to specify the backslashes as escape characters:

copy <table_name> from '<csv_file_path>' csv escape '\';

and then the problem is solved.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 jjanes
Solution 2 Belayer
Solution 3 tabbacio