'What is the encoding to read and write files with special characters such as en dash, left quotes, etc?

I'm reading csv files that contain special characters such as the long en dash –, left double quotes “, and right double quotes ” and I can't figure out the proper way to read and write these correctly. I thought it was UTF8 or Unicode but it reads and writes them as a square or ? with a diamond. Opening the files in notepad++ to confirm. Maybe another specific encoding is needed? Here is the code I've been using so far, tried a few variations of this with different encoding.:

string[] lines = File.ReadAllLines(filePathTxt.Text, Encoding.UTF8);
...
Stream s = new FileStream(filePath, FileMode.Append);
StreamWriter sw = new StreamWriter(s, Encoding.UTF8, 1000, true);

Input of:

Surveys – Public

Documents:,“A”

comes out as

Surveys � Public

Documents:,�A�

Also shows problems in debugger as soon as it's read into the string array.

Edit: I've tried Unicode also. I'm using NotePad++, Win 10. The problem is definitely in the Read step, because if I add the following line to manually write a line of data, like so:

 sw.WriteLine("Surveys – Public");

That line writes the dash fine, so it's on the initial read of the file from the source csv where the characters get messed up. I've tried reading with a few different encodings, and NotePad++ just shows the csv as being ANSI.



Solution 1:[1]

Your code works OK if the file you'rte reading is already encoded in UTF-8, but it won't work if it's using a different encoding. I would recommend loading the file into a text editor like Notepad++ that tells you what the encoding of the file is (in the status bar). If it's not encoded in UTF-8 to start with, reading and writing as UTF-8 won't work.

If you want to try reading the file in the system's default encoding, you could use Encoding.Default instead of UTF8. Then you should write to a new file when writing the file because you can't really write multiple encodings to the same file. The default encoding is likely to be the correct encoding if UTF-8 isn't.

string filePath = @"C:\users\yourname\desktop\TestFile.txt";
string[] lines = File.ReadAllLines(filePath, Encoding.Default);

string outFile = @"C:\users\yourname\desktop\outfile.txt";
Stream s = new FileStream(outFile, FileMode.Append);
StreamWriter sw = new StreamWriter(s, Encoding.UTF8, 1000, true);
foreach (var line in lines)
   sw.WriteLine(line);
sw.Close();

Alternatively if you have to append to the same file, use the same encoding that you did for reading the file, or rewrite the whole file. If the original file looks OK in notepad, the system's default encoding is likely to be the correct encoding. If you want to leave the file in the system's current encoding, use Encoding.Default. If you want to change the encoding of the whole file to UTF-8, I think you'd have to rewrite the whole file instead of appending.

If Notepad++ shows this in the status bar, then you can't read the file as UTF-8 enter image description here

You can only use UTF-8 if Notepad++ shows something like this in the status bar:

enter image description here

You can use the "Encoding" menu's "Convert to UTF-8" command in Notepad++ to make the file compatible with your application.

Warning: Don't confuse the "Encode in UTF-8" command with the "Convert to UTF-8" command. If the file looks correct, you want to use "Convert to UTF-8". If you use "Encode in UTF-8" that will re-interpret the existing data as a new encoding instead of changing the content to use a new encoding.

Edit: Change Encoding.GetEncoding(0) to Encoding.Default.

Solution 2:[2]

Instead of:

StreamWriter sw = new StreamWriter(s, Encoding.UTF8, 1000, true);

use this:

StreamWriter sw = new StreamWriter(s, Encoding.Unicode, 1000, true);

I just tried it and it shows up correctly in NotePad++

Here's the sample I ran that I used for testing it:

        using (StreamWriter swClifor = new StreamWriter("test.txt", true, Encoding.Unicode))
        {
            string cString = "en dash –, left double quotes “, and right double quotes ”";
            swClifor.WriteLine(cString);
        }

Solution 3:[3]

Oh man, I figured it out now, this seems silly, but here is what it was.

NotePad++ status bar showed it as ANSI. Initially I tried without providing the encoding on read or write.

When looking through the options of the encoding to use I saw there was an option in Encoding for Default:

string[] lines = File.ReadAllLines(filePathTxt.Text, Encoding.Default);

BUT I never actually tried this one because surely not specifying which encoding to use must be the default already right? Well I guess not because actually giving it this on the read and write made it work correctly now.

Solution 4:[4]

Try do this....

I have to write non english(Bangla) Language, but not support this language.

Working: Encoding.Unicode and Encoding.UTF32 is supporting foreign characters using StreamWriter in C#

use this:

 StreamWriter wr = new StreamWriter(@"D:\TestBook.xls", true, Encoding.Unicode);  

or

 StreamWriter wr = new StreamWriter(@"D:\TestBook.xls", true, Encoding.UTF32); 

C# code:

 private void ExportExcel()
    {
        try
        {
            DataTable dt = new DataTable();

            //Add Datacolumn
            DataColumn workCol = dt.Columns.Add("Name", typeof(String));
            dt.Columns.Add("BanglaLanguage", typeof(String));
            dt.Columns.Add("City", typeof(String));

            //Add in the datarow
            DataRow newRow = dt.NewRow();

            newRow["Name"] = "MahalingaM";
            newRow["BanglaLanguage"] = "?????????? ?????";//Bangla Language
            newRow["City"] = "Coimbatore";
            dt.Rows.Add(newRow);

            //open file
            //non english not support(ex: Bangla Language)
            // StreamWriter wr = new StreamWriter(@"D:\TestBook.xls");

            //non english support(ex: Bangla Language)
            StreamWriter wr = new StreamWriter(@"D:\TestBook.xls", true, Encoding.Unicode); // Encoding.Unicode or Encoding.UTF32

            for (int i = 0; i < dt.Columns.Count; i++)
            {
                wr.Write(dt.Columns[i].ToString().ToUpper() + "\t");
            }

            wr.WriteLine();

            //write rows to excel file
            for (int i = 0; i < (dt.Rows.Count); i++)
            {
                for (int j = 0; j < dt.Columns.Count; j++)
                {
                    if (dt.Rows[i][j] != null)
                    {
                        wr.Write(Convert.ToString(dt.Rows[i][j]) + "\t");
                    }
                    else
                    {
                        wr.Write("\t");
                    }
                }
                //go to next line
                wr.WriteLine();
            }
            //close file
            wr.Close();

            //xls to xlsx convertion
            Microsoft.Office.Interop.Excel.Application app = new Microsoft.Office.Interop.Excel.Application();
            Microsoft.Office.Interop.Excel.Workbook wb = app.Workbooks.Open(@"D:\TestBook.xls", Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing);
            wb.SaveAs(@"D:\TestBook.xlsx", Microsoft.Office.Interop.Excel.XlFileFormat.xlOpenXMLWorkbook, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Microsoft.Office.Interop.Excel.XlSaveAsAccessMode.xlExclusive, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing);
            wb.Close();
            app.Quit();

        }
        catch (Exception ex)
        {
            throw ex;
        }
    }

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Phil N DeBlanc
Solution 3 Brent Kilboy
Solution 4 MahalingaM SubramaniyaM