Golang: Unzipping files in Go garbles special characters in file names when the archive was created on Windows

I'm trying to unzip files in Go (Golang) using the archive/zip package. The problem is that when the zip file was created on Windows, all special characters in the file names get garbled. Windows probably uses the Windows-1252 character encoding. I just can't figure out how to unzip these files. I've already tried golang.org/x/text/encoding/charmap and golang.org/x/text/transform, but had no luck. I'd guess the zip package offers some way to change the character map.

Another problem: the app will sometimes unzip files zipped on Windows and sometimes files zipped on other operating systems, so it needs to detect the character encoding.

This is the code (thanks to: https://golangcode.com/unzip-files-in-go/):

package main

import (
    "archive/zip"
    "fmt"
    "io"
    "log"
    "os"
    "path/filepath"
    "strings"
)

func main() {

    files, err := Unzip("Edificações e Instalações Operacionais - 08.03 a 12.03.2021.zip", "output-folder")
    if err != nil {
        log.Fatal(err)
    }

    fmt.Println("Unzipped:\n" + strings.Join(files, "\n"))
}

// Unzip will decompress a zip archive, moving all files and folders
// within the zip file (parameter 1) to an output directory (parameter 2).
func Unzip(src string, dest string) ([]string, error) {

    var filenames []string

    r, err := zip.OpenReader(src)
    if err != nil {
        return filenames, err
    }
    defer r.Close()

    for _, f := range r.File {

        // Store filename/path for returning and using later on
        fpath := filepath.Join(dest, f.Name)

        // Check for ZipSlip: reject entries whose path escapes dest
        if !strings.HasPrefix(fpath, filepath.Clean(dest)+string(os.PathSeparator)) {
            return filenames, fmt.Errorf("%s: illegal file path", fpath)
        }

        filenames = append(filenames, fpath)

        if f.FileInfo().IsDir() {
            // Make Folder
            if err := os.MkdirAll(fpath, os.ModePerm); err != nil {
                return filenames, err
            }
            continue
        }

        // Make File
        if err = os.MkdirAll(filepath.Dir(fpath), os.ModePerm); err != nil {
            return filenames, err
        }

        outFile, err := os.OpenFile(fpath, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, f.Mode())
        if err != nil {
            return filenames, err
        }

        rc, err := f.Open()
        if err != nil {
            // Don't leak the output file handle on error
            outFile.Close()
            return filenames, err
        }

        _, err = io.Copy(outFile, rc)

        // Close the file without defer to close before next iteration of loop
        outFile.Close()
        rc.Close()

        if err != nil {
            return filenames, err
        }
    }
    return filenames, nil
}

This is the output: [screenshot not preserved]

Solution 1:[1]

If we print just the name of the first entry in the archive:

package main
import "archive/zip"

func main() {
   s := "Edificações_e_Instalações_Operacionais_08_03_a_12_03_2021.zip"
   f, e := zip.OpenReader(s)
   if e != nil {
      panic(e)
   }
   defer f.Close()
   println(f.File[0].Name)
}

We get this result:

Edifica??es e Instala??es Operacionais - 08.03 a 12.03.2021/

According to this page:

In Brazil, however, the most widespread codepage —and that which DOS in Brazilian portuguese used by default— was code page 850.

https://wikipedia.org/wiki/Code_page_860

So we can decode the name with the code page 850 decoder from golang.org/x/text/encoding/charmap:

package main

import (
   "archive/zip"
   "golang.org/x/text/encoding/charmap"
)

func main() {
   z := "Edificações_e_Instalações_Operacionais_08_03_a_12_03_2021.zip"
   f, e := zip.OpenReader(z)
   if e != nil {
      panic(e)
   }
   defer f.Close()
   s, e := charmap.CodePage850.NewDecoder().String(f.File[0].Name)
   if e != nil {
      panic(e)
   }
   println(s)
}

We get the correct result:

Edificações e Instalações Operacionais - 08.03 a 12.03.2021/

https://pkg.go.dev/golang.org/x/text/encoding/charmap

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1