Extracting a .7z File into a Pandas DataFrame

I am using a Jupyter notebook (Google Colab) to try to extract data from a .7z file into a pandas DataFrame, using Linux commands. The data is from http://untroubled.org/spam/ . I only want the data from the 2020-01.7z file. So far I have:

!wget http://untroubled.org/spam/2020-01.7z
!7z x 2020-01.7z
import pandas as pd
import py7zr     
archive = py7zr.SevenZipFile('2020-01.7z', mode='r')
archive.extractall(path="/tmp")
with open ('2020-01.7z', 'r') as myfile:
  myfile.read()

mydf = pd.DataFrame(myfile)
 


UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbc in position 2: invalid start byte

I'm not really sure what the "/tmp" means. I know there is a way to do this; I just don't have enough understanding yet of these commands and how to use them. Any help is appreciated.



Solution 1:[1]

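Just try extracting with e instead of x. The e command puts every file straight into the current directory and ignores the folder structure stored in the archive:

!7z e 2020-01.7z

It works for me!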
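As for the error in the question: byte 0xbc at position 2 is part of the 7z file signature, so the UnicodeDecodeError comes from open('2020-01.7z', 'r') reading the compressed archive itself as text. The "/tmp" argument is simply the directory that extractall writes the unpacked files into. Below is a minimal sketch of getting from there to a DataFrame, assuming you want one row per extracted file. The names read_messages and load_archive and the /tmp/spam directory are made up for illustration, and the lenient decoding is my own choice, not something the data source prescribes.

```python
from pathlib import Path


def read_messages(root):
    """Collect every file under root as a record: relative path plus text."""
    records = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # Spam messages can arrive in mixed encodings, so decode leniently
        # instead of letting one bad byte raise UnicodeDecodeError.
        text = path.read_bytes().decode("utf-8", errors="replace")
        records.append(
            {"filename": path.relative_to(root).as_posix(), "text": text}
        )
    return records


def load_archive(archive_path, out_dir):
    """Extract a .7z archive with py7zr, then load the files into a DataFrame."""
    # Imported here so read_messages stays usable without these packages.
    import pandas as pd
    import py7zr

    with py7zr.SevenZipFile(archive_path, mode="r") as archive:
        archive.extractall(path=out_dir)
    return pd.DataFrame(read_messages(out_dir), columns=["filename", "text"])
```

In the notebook you would then call something like mydf = load_archive('2020-01.7z', '/tmp/spam'). If you have already unpacked the archive with !7z, you can skip py7zr and pass the extraction directory to read_messages directly.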

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Shihab Masri