Golang: Unzipping files in Go garbles special characters in file names when the archive was created on Windows
I'm trying to unzip files in Go (Golang) using the archive/zip package. The problem is that when the zip file was created on Windows, all special characters in the file names come out garbled.
Windows probably uses the Windows-1252 character encoding. I just can't figure out how to unzip these files.
I've already tried golang.org/x/text/encoding/charmap and golang.org/x/text/transform, but no luck.
I guess the zip package should offer a way to change the charmap.
Another problem: sometimes the app will unzip files zipped on Windows and sometimes files zipped on a different OS, so the app will need to identify the character encoding.
This is the code (thanks to: https://golangcode.com/unzip-files-in-go/):
package main

import (
	"archive/zip"
	"fmt"
	"io"
	"log"
	"os"
	"path/filepath"
	"strings"
)

func main() {
	files, err := Unzip("Edificações e Instalações Operacionais - 08.03 a 12.03.2021.zip", "output-folder")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("Unzipped:\n" + strings.Join(files, "\n"))
}

// Unzip will decompress a zip archive, moving all files and folders
// within the zip file (parameter 1) to an output directory (parameter 2).
func Unzip(src string, dest string) ([]string, error) {
	var filenames []string

	r, err := zip.OpenReader(src)
	if err != nil {
		return filenames, err
	}
	defer r.Close()

	for _, f := range r.File {
		// Store filename/path for returning and using later on
		fpath := filepath.Join(dest, f.Name)

		// Reject entries that would be written outside dest
		if !strings.HasPrefix(fpath, filepath.Clean(dest)+string(os.PathSeparator)) {
			return filenames, fmt.Errorf("%s: illegal file path", fpath)
		}
		filenames = append(filenames, fpath)

		if f.FileInfo().IsDir() {
			// Make Folder
			if err := os.MkdirAll(fpath, os.ModePerm); err != nil {
				return filenames, err
			}
			continue
		}

		// Make File
		if err = os.MkdirAll(filepath.Dir(fpath), os.ModePerm); err != nil {
			return filenames, err
		}
		outFile, err := os.OpenFile(fpath, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, f.Mode())
		if err != nil {
			return filenames, err
		}
		rc, err := f.Open()
		if err != nil {
			// Do not leak the output file when the entry cannot be opened
			outFile.Close()
			return filenames, err
		}
		_, err = io.Copy(outFile, rc)

		// Close the file without defer to close before next iteration of loop
		outFile.Close()
		rc.Close()
		if err != nil {
			return filenames, err
		}
	}
	return filenames, nil
}
Solution 1:[1]
If we just print the first compressed entry:
package main

import "archive/zip"

func main() {
	s := "Edificações_e_Instalações_Operacionais_08_03_a_12_03_2021.zip"
	f, e := zip.OpenReader(s)
	if e != nil {
		panic(e)
	}
	defer f.Close()
	// Print the raw name of the first entry, exactly as stored in the archive
	println(f.File[0].Name)
}
We get this result:
Edifica??es e Instala??es Operacionais - 08.03 a 12.03.2021/
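To see which bytes sit behind those question marks, it can help to print the name in quoted form. The sketch below is an illustration only: `describeName` is a hypothetical helper, and the sample string assumes the entry stores ç and õ as the single bytes 0x87 and 0xE4, which are their values in code page 850.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describeName reports whether name is valid UTF-8 and returns a quoted form
// in which every undecodable byte appears as a \x escape.
// (Hypothetical helper, not part of archive/zip.)
func describeName(name string) (bool, string) {
	return utf8.ValidString(name), fmt.Sprintf("%q", name)
}

func main() {
	// Assumed sample: "Edificações" with ç and õ stored as the single
	// bytes 0x87 and 0xE4 instead of their UTF-8 sequences.
	valid, quoted := describeName("Edifica\x87\xe4es")
	fmt.Println(valid, quoted)
}
```

A name that fails the UTF-8 check and shows single high bytes where accented letters should be is a strong hint that it was written in a legacy single-byte code page.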
According to this page:
In Brazil, however, the most widespread codepage —and that which DOS in Brazilian portuguese used by default— was code page 850.
https://wikipedia.org/wiki/Code_page_860
So we can modify the code to deal with this:
package main

import (
	"archive/zip"

	"golang.org/x/text/encoding/charmap"
)

func main() {
	z := "Edificações_e_Instalações_Operacionais_08_03_a_12_03_2021.zip"
	f, e := zip.OpenReader(z)
	if e != nil {
		panic(e)
	}
	defer f.Close()
	// Decode the stored name from code page 850 to UTF-8
	s, e := charmap.CodePage850.NewDecoder().String(f.File[0].Name)
	if e != nil {
		panic(e)
	}
	println(s)
}
We get the correct result:
Edificações e Instalações Operacionais - 08.03 a 12.03.2021/
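The question also mentions archives created on other systems, where the names may already be UTF-8. A possible way to tell the two cases apart, sketched here with only the standard library, is the `NonUTF8` field that `archive/zip` sets on each `FileHeader` when reading. `buildSample` and `legacyNames` are hypothetical helpers, and the in-memory archive stands in for a real file.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"log"
)

// buildSample writes a small in-memory archive with one UTF-8 name and one
// name stored as raw code page 850 bytes (0x87 is "ç", 0xE4 is "õ").
// (Hypothetical helper used only to make the sketch self-contained.)
func buildSample() (*zip.Reader, error) {
	var buf bytes.Buffer
	w := zip.NewWriter(&buf)
	if _, err := w.Create("Edificações.txt"); err != nil {
		return nil, err
	}
	legacy := &zip.FileHeader{Name: "Edifica\x87\xe4es.txt", NonUTF8: true}
	if _, err := w.CreateHeader(legacy); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return zip.NewReader(bytes.NewReader(buf.Bytes()), int64(buf.Len()))
}

// legacyNames returns the entry names that archive/zip reports as not UTF-8.
// These are the ones that would still need a decoder such as CodePage850.
func legacyNames(r *zip.Reader) []string {
	var names []string
	for _, f := range r.File {
		if f.NonUTF8 {
			names = append(names, f.Name)
		}
	}
	return names
}

func main() {
	r, err := buildSample()
	if err != nil {
		log.Fatal(err)
	}
	for _, name := range legacyNames(r) {
		fmt.Printf("needs decoding: %q\n", name)
	}
}
```

Entries returned by `legacyNames` could then be passed through the `CodePage850` decoder shown above, while the rest are used as they are. The flag only says that a name is not UTF-8; it does not say which code page was used, so code page 850 stays an assumption about where the archive came from.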
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |