'How to remove BOM from an encoded base64 UTF string?
I have a file encoded in base64 using openssl base64 -in en -out en1
in a command line in MacOS and I am reading this file using the following code:
string fileContent = File.ReadAllText(Path.Combine(AppContext.BaseDirectory, MConst.BASE_DIR, "en1"));
var b1 = Convert.FromBase64String(fileContent);
var str1 = System.Text.Encoding.UTF8.GetString(b1);
The string I am getting has a ?
before the actual file content. I am not sure what's causing this, any help will be appreciated.
Example Input:
import pandas
import json
Encoded file example:
77u/DQppbXBvcnQgY29ubmVjdG9yX2FwaQ0KaW1wb3J0IGpzb24NCg0K
Output based on the C# code:
?import pandas
import json
Solution 1:[1]
Normally, when you read UTF (with BOM) from a text file, the decoding is handled for you behind the scene. For example, both of the following lines will read UTF text correctly regardless of whether or not the text file has a BOM:
File.ReadAllText(path, Encoding.UTF8);
File.ReadAllText(path); // UTF8 is the default.
The problem is that you're dealing with UTF text that has been encoded to a Base64 string. So, ReadAllText()
can no longer handle the BOM for you. You can either do it yourself by (checking and) removing the first 3 bytes from the byte array or delegate that job to a StreamReader, which is exactly what ReadAllText()
does:
var bytes = Convert.FromBase64String(fileContent);
string finalString = null;
using (var ms = new MemoryStream(bytes))
using (var reader = new StreamReader(ms)) // Or:
// using (var reader = new StreamReader(ms, Encoding.UTF8))
{
finalString = reader.ReadToEnd();
}
// Proceed to using finalString.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | 41686d6564 stands w. Palestine |