Download File from Direct Download URL
I'm trying to download the following file through a link that redirects to a direct download: http://www.lavozdegalicia.es/sitemap_sections.xml.gz
I've done my own research, but all the results I find are about HTTP URL redirections [3xx] rather than direct download redirections (maybe I'm searching with the wrong terms).
I've tried the following pieces of code (cite: https://programmerclick.com/article/7719159084/ ):
// Using Java IO
private static void downloadFileFromUrlWithJavaIO(String fileName, String fileUrl) {
    BufferedInputStream inputStream = null;
    FileOutputStream outputStream = null;
    try {
        URL url = new URL(fileUrl);
        inputStream = new BufferedInputStream(url.openStream());
        outputStream = new FileOutputStream(fileName);
        byte[] data = new byte[1024];
        int count;
        // Copy the stream to the file in 1 KB chunks until end of stream
        while ((count = inputStream.read(data, 0, 1024)) != -1) {
            outputStream.write(data, 0, count);
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        try {
            if (inputStream != null) {
                inputStream.close();
            }
            if (outputStream != null) {
                outputStream.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
// Using Apache Commons IO
private static void downloadFileFromUrlWithCommonsIO(String fileName, String fileUrl) {
    try {
        FileUtils.copyURLToFile(new URL(fileUrl), new File(fileName));
    } catch (IOException e) {
        e.printStackTrace();
    }
}
// Using NIO
private static void downloadFileFromURLUsingNIO(String fileName, String fileUrl) {
    try {
        URL url = new URL(fileUrl);
        // try-with-resources closes the channel and the file even if the transfer fails
        try (ReadableByteChannel readableByteChannel = Channels.newChannel(url.openStream());
             FileOutputStream fileOutputStream = new FileOutputStream(fileName)) {
            fileOutputStream.getChannel().transferFrom(readableByteChannel, 0, Long.MAX_VALUE);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
But with any of the three options I get an empty file. I suspect the problem is related to the file being a .xml.gz, because when I debug it the inputStream doesn't seem to have any content.
I've run out of options. Does anyone have an idea of how to handle this case, or know the correct terms I should use to research it?
Solution 1:[1]
I found a solution. There's probably a more elegant way of achieving the same result, but this worked fine for me:
// Download the file and decompress it
// Needs: java.net.URL, java.net.HttpURLConnection, java.util.zip.GZIPInputStream, java.io.FileOutputStream
filecount = 0;
URL compressedSitemap = new URL(urlString);
HttpURLConnection con = (HttpURLConnection) compressedSitemap.openConnection();
con.setRequestMethod("GET");
// The URL answers with a 301/302, so follow the Location header by hand
int status = con.getResponseCode();
if (status == HttpURLConnection.HTTP_MOVED_TEMP || status == HttpURLConnection.HTTP_MOVED_PERM) {
    String location = con.getHeaderField("Location");
    URL newUrl = new URL(location);
    con = (HttpURLConnection) newUrl.openConnection();
}
String file = "/home/user/Documentos/Decompression/decompressed" + filecount + ".xml";
// try-with-resources closes both streams even if an exception is thrown
try (GZIPInputStream gzipInputStream = new GZIPInputStream(con.getInputStream());
     FileOutputStream fos = new FileOutputStream(file)) {
    byte[] buffer = new byte[1024];
    int len;
    // read() returns -1 at end of stream
    while ((len = gzipInputStream.read(buffer)) != -1) {
        fos.write(buffer, 0, len);
    }
}
filecount++;
Two things to note:
- When I issued an HTTP GET against the redirecting URL, the response code was 301 or 302 (depending on the example I used). The if check handles this: it reads the Location header and opens a new connection to the actual file.
- Once pointed at the file, I used the GZIPInputStream class (java.util.zip) to read the decompressed content straight from the connection's stream and write it to an XML file. That saved me from doing it in three steps (decompress, read, copy).
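The two points above can be combined into a small reusable helper. This is only a sketch under some assumptions: the class name GzipDownloadSketch, its method names, and the limit of 5 redirects are all made up for illustration, and it treats 303, 307 and 308 as redirects in addition to the 301/302 I actually saw. It also follows a chain of redirects instead of a single hop and resolves a relative Location header against the current URL.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

// Hypothetical helper class; the names are illustrative, not from any library.
class GzipDownloadSketch {

    // Assumed upper bound so that a redirect loop cannot run forever.
    private static final int MAX_REDIRECTS = 5;

    // True for the HTTP status codes that normally carry a Location header.
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    // Opens the address and follows redirects by hand, one hop at a time.
    static HttpURLConnection openFollowingRedirects(String address) throws IOException {
        URL url = new URL(address);
        for (int hop = 0; hop <= MAX_REDIRECTS; hop++) {
            HttpURLConnection con = (HttpURLConnection) url.openConnection();
            con.setInstanceFollowRedirects(false); // we handle every hop ourselves
            int status = con.getResponseCode();
            if (!isRedirect(status)) {
                return con;
            }
            String location = con.getHeaderField("Location");
            con.disconnect();
            if (location == null) {
                throw new IOException("Redirect (" + status + ") without a Location header");
            }
            url = new URL(url, location); // also resolves a relative Location
        }
        throw new IOException("More than " + MAX_REDIRECTS + " redirects");
    }

    // Decompresses a gzip stream into the given output stream.
    static void gunzip(InputStream compressed, OutputStream out) throws IOException {
        try (GZIPInputStream gzip = new GZIPInputStream(compressed)) {
            byte[] buffer = new byte[8192];
            int len;
            while ((len = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, len);
            }
        }
    }

    // Downloads a .gz file (following redirects) and writes the decompressed bytes to target.
    static void downloadAndDecompress(String address, Path target) throws IOException {
        HttpURLConnection con = openFollowingRedirects(address);
        try (InputStream in = con.getInputStream();
             OutputStream out = Files.newOutputStream(target)) {
            gunzip(in, out);
        } finally {
            con.disconnect();
        }
    }
}
```

With this in place, the snippet above would reduce to something like GzipDownloadSketch.downloadAndDecompress(urlString, Paths.get(file)), where urlString and file are the variables from my original code.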
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Santiago Luca |