'SELECT FROM OPENROWSET( BULK...) changing special characters
I've been facing an issue for a few hours, and I can't seem to get my head around this one.
So I have a SQL Server database 2008R2, Collation SQL_Latin1_General_CP1_CI_AS. Inside there is a table, with a field named incoming_name. The collation of this field is also SQL_Latin1_General_CP1_CI_AS, and it is a NVARCHAR(255).
I have a .csv file with around 123000 rows. It's a basic csv, no double quotes around text, but no comma inside the fields, so when I run a manual import into my database it works fine. The incoming_name field contains all kind of text, but never longer than 255 characters. And in a few lines there are french accents (like 'Ch*â*teau d'Agassac').
Now I try to use the code
select
test_file.[INCOMING_NAME] COLLATE SQL_Latin1_General_CP1_CI_AS
as [INCOMING_NAME]
, test_file.[PRODUCT_CODE] AS [PRODUCT_CODE]
FROM
OPENROWSET(
BULK 'INSERT PATH OF THE .CSV HERE',
FORMATFILE = 'INSERT PATH OF THE FORMAT FILE HERE',
FIRSTROW = 2
) AS test_file
With the format file
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="4" xsi:type="CharTerm" TERMINATOR=',' MAX_LENGTH="255" COLLATION="SQL_LATIN1_GENERAL_CP1_CI_AS" />
<FIELD ID="29" xsi:type="CharTerm" TERMINATOR='\r\n' />
</RECORD>
<ROW>
<COLUMN SOURCE="4" NAME="INCOMING_NAME" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="29" NAME="PRODUCT_CODE" xsi:type="SQLNVARCHAR"/>
</ROW>
The import works fine, and I get all my data, with the right values in the right fields, except for the accents...
For example when I add where test_file.incoming_name like '%agassac%'
at the end of my query, I get a result like 'Château d'Agassac' instead of the original data 'Château d'Agassac' in my database.
What I don't understand is that I feel like at every step of the process, I did pick an accent sensitive collation, with a unicode datatype (NVARCHAR), so I really don't understand why the import doesn't pick the accents.
Thanks for reading this long question,
John.
EDIT: Ok, it looks like the .csv file I want to import is encoded with utf-8, and SQL Server 2008 doesn't want to support utf-8 import. Now I have no idea what to do. Any idea welcome...
Solution 1:[1]
I think adding widenative as DATAFILETYPE should resolve the issue. Please refer to this link for further details: http://msdn.microsoft.com/en-us/library/ms189941.aspx
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Sonam |