Reading InputStream as UTF-8
I'm trying to read from a text/plain file over the internet, line by line. The code I have right now is:
URL url = new URL("http://kuehldesign.net/test.txt");
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
LinkedList<String> lines = new LinkedList<>();
String readLine;
while ((readLine = in.readLine()) != null) {
    lines.add(readLine);
}
for (String line : lines) {
    out.println("> " + line);
}
The file, test.txt, contains ¡Hélló!, which I am using in order to test the encoding.
When I review the OutputStream (out), I see it as > Â¡HÃ©llÃ³!. I don't believe this is a problem with the OutputStream, since I can do out.println("é"); without problems.
Any ideas for reading from the InputStream as UTF-8? Thanks!
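A minimal sketch of what is likely happening, assuming the platform default charset is a single-byte one such as ISO-8859-1 or windows-1252: new InputStreamReader(InputStream) without a charset argument uses that default, so each two-byte UTF-8 sequence is decoded as two separate characters. ISO-8859-1 is forced explicitly here so the result does not depend on the machine; the class and method names are hypothetical. Note that since JDK 18 the default charset is UTF-8 (JEP 400), so the question's code may behave differently on newer JDKs.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encodes the text as UTF-8, then decodes those bytes as ISO-8859-1,
    // mimicking a reader that falls back to a single-byte default charset.
    static String decodeWithWrongCharset(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Unicode escapes for "¡Hélló!" avoid depending on the source-file encoding.
        String original = "\u00A1H\u00E9ll\u00F3!";
        System.out.println(decodeWithWrongCharset(original));
    }
}
```

decodeWithWrongCharset("¡Hélló!") returns "Â¡HÃ©llÃ³!", the typical pattern of UTF-8 bytes read as Latin-1.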
Solution 1:[1]
Solved my own problem. This line:
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
needs to be:
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));
or, since Java 7:
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), StandardCharsets.UTF_8));
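A self-contained version of the fix above, as a minimal sketch: it passes StandardCharsets.UTF_8 to the InputStreamReader and uses try-with-resources so the stream gets closed. To keep it runnable offline, a ByteArrayInputStream holding UTF-8 bytes stands in for url.openStream(); the class and method names (Utf8LineReader, readLines) are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Utf8LineReader {
    // Reads all lines from the stream, decoding the bytes as UTF-8
    // regardless of the platform default charset.
    static List<String> readLines(InputStream source) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader and the underlying stream.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(source, StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for url.openStream(): the UTF-8 bytes of "¡Hélló!" plus a second line.
        InputStream source = new ByteArrayInputStream(
                "\u00A1H\u00E9ll\u00F3!\nsecond line".getBytes(StandardCharsets.UTF_8));
        for (String line : readLines(source)) {
            System.out.println("> " + line);
        }
    }
}
```

With a real URL, pass url.openStream() to readLines instead of the byte array stream.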
Solution 2:[2]
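import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

// `filename` is the path of the file to read.
StringBuilder file = new StringBuilder();
String UTF8 = "UTF-8"; // "utf8" is also accepted as an alias
int BUFFER_SIZE = 8192;
try (BufferedReader br = new BufferedReader(
        new InputStreamReader(new FileInputStream(filename), UTF8), BUFFER_SIZE)) {
    String str;
    while ((str = br.readLine()) != null) {
        // readLine() strips the line terminator, so add it back.
        file.append(str).append('\n');
    }
} catch (IOException e) {
    // Don't swallow the error silently.
    e.printStackTrace();
}
Try this. :-)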
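For a file on the local file system, the java.nio.file.Files API (Java 7+) is a shorter route that also takes an explicit charset; this is an alternative to the answer's approach, not part of it. A minimal sketch with hypothetical names (ReadFileUtf8, readUtf8Lines, readUtf8Text), where a temporary file stands in for filename:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class ReadFileUtf8 {
    // Reads every line of the file, decoding it as UTF-8.
    static List<String> readUtf8Lines(Path path) throws IOException {
        return Files.readAllLines(path, StandardCharsets.UTF_8);
    }

    // readAllLines strips line terminators, so join the lines back with '\n'.
    static String readUtf8Text(Path path) throws IOException {
        return String.join("\n", readUtf8Lines(path));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical input: a temp file containing UTF-8 text.
        Path path = Files.createTempFile("utf8-demo", ".txt");
        Files.write(path, "\u00A1H\u00E9ll\u00F3!\nsecond line".getBytes(StandardCharsets.UTF_8));
        System.out.println(readUtf8Text(path));
        Files.delete(path);
    }
}
```

Unlike an InputStreamReader built from a Charset, Files.readAllLines reports invalid input: it throws MalformedInputException if the file is not valid UTF-8 instead of silently replacing the bad bytes.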
Solution 3:[3]
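I ran into the same problem: every special character showed up as ??. To solve it, I used the ISO-8859-1 encoding:
try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream("txtPath"), "ISO-8859-1"))) {
    String line;
    while ((line = br.readLine()) != null) {
        // process each line here
    }
}
This works when the file is actually encoded in ISO-8859-1 (Latin-1); for a UTF-8 file like the one in the question, it produces garbled characters instead. I hope this can help anyone who sees this post.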
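ISO-8859-1 maps every byte to exactly one character, so it never produces replacement characters; whether the result is correct depends on how the file was really encoded. A minimal sketch comparing decodings of the same text, where byte arrays stand in for files and the class and method names (Latin1Demo, firstLine) are hypothetical:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Latin1Demo {
    // Reads the first line of `bytes`, decoding with the given charset.
    static String firstLine(byte[] bytes, Charset charset) throws IOException {
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), charset))) {
            return br.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        String text = "\u00A1H\u00E9ll\u00F3!"; // "¡Hélló!"
        byte[] latin1File = text.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8File = text.getBytes(StandardCharsets.UTF_8);

        // Correct: the bytes really are ISO-8859-1.
        System.out.println(firstLine(latin1File, StandardCharsets.ISO_8859_1).equals(text)); // true
        // Wrong: UTF-8 bytes decoded as ISO-8859-1 give mojibake.
        System.out.println(firstLine(utf8File, StandardCharsets.ISO_8859_1).equals(text)); // false
        // Latin-1 bytes decoded as UTF-8 become U+FFFD replacement characters,
        // which a console may display as "?".
        System.out.println(firstLine(latin1File, StandardCharsets.UTF_8).equals(text)); // false
    }
}
```

The last case is one plausible source of the "??" symptom: the file was Latin-1 but was being read as UTF-8.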
Solution 4:[4]
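If you use the constructor InputStreamReader(InputStream in, Charset cs), bad characters are silently replaced (with U+FFFD). To change this behaviour, use a CharsetDecoder:
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public static Reader newReader(InputStream is) {
    // Report malformed or unmappable input instead of replacing it.
    return new InputStreamReader(is,
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT));
}
Then catch java.nio.charset.CharacterCodingException (for example, MalformedInputException) when reading from the returned Reader.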
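A minimal sketch of the catching side, assuming a reader built the same way as newReader above; the class name StrictUtf8Demo and the helper firstLineOrNull are hypothetical, and byte arrays stand in for real input streams. The invalid input is "Hé!" encoded as ISO-8859-1, which is not valid UTF-8.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUtf8Demo {
    // Same construction as newReader: a UTF-8 reader that throws on bad input.
    static Reader newReader(InputStream is) {
        return new InputStreamReader(is,
                StandardCharsets.UTF_8.newDecoder()
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT));
    }

    // Returns the first line, or null if the bytes are not valid UTF-8.
    static String firstLineOrNull(byte[] bytes) throws IOException {
        try (BufferedReader br = new BufferedReader(newReader(new ByteArrayInputStream(bytes)))) {
            return br.readLine();
        } catch (CharacterCodingException e) {
            // MalformedInputException, a subclass, is thrown for invalid byte sequences.
            return null;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] valid = "\u00A1H\u00E9ll\u00F3!".getBytes(StandardCharsets.UTF_8);
        byte[] invalid = {0x48, (byte) 0xE9, 0x21}; // "Hé!" in ISO-8859-1
        System.out.println(firstLineOrNull(valid));
        System.out.println(firstLineOrNull(invalid)); // null
    }
}
```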
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | tobijdc |
Solution 2 | Ahmed Ashour |
Solution 3 | |
Solution 4 | grigouille |