zipfile write doesn't find files in gcloud
I'm trying to zip a few files from Google Cloud Storage.
Python's zipfile doesn't find the files in gcloud, only in the project.
How can I make my code find the files in gcloud?
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, 'w') as zip_file:
    for revenue in revenues:
        # queryset with many files, so for each file, add it to the zip
        t = tempfile.NamedTemporaryFile()
        t.write(revenue.revenue.name)
        if revenue.revenue.name:
            t.seek(0)
            with default_storage.open(revenue.revenue.name, "r") as file_data:
                zip_file.write(file_data.name, compress_type=zipfile.ZIP_DEFLATED)
                # the code doesn't get past this part
        t.close()
response = HttpResponse(content_type='application/x-zip-compressed')
response['Content-Disposition'] = 'attachment; filename=my_zip.zip'
response.write(zip_buffer.getvalue())
return response
In this part, I write the file that I opened from gcloud, but it stops inside this zipfile function:
def write(self, filename, arcname=None, compress_type=None):
    """Put the bytes from filename into the archive under the name
    arcname."""
    if not self.fp:
        raise RuntimeError(
            "Attempt to write to ZIP archive that was already closed")
    st = os.stat(filename)
    # when I try to find the file, os.stat searches in the project, not in gcloud
the "os.stat(filename)" search for a file in project, how can I do for find in the gcloud?
Solution 1:[1]
I will post my findings as an answer, since I would like to comment on a few things.
What I have understood:
- You have the Python library zipfile that is used to work with ZIP files.
- You are looking for files locally and adding them one by one into the ZIP file.
- You would like to do this as well for files located in a Google Cloud Storage bucket, but it is failing to find the files.
If I have misunderstood the use-case scenario, please elaborate further in a comment.
However, if this is exactly what you are trying to do, then it is not supported. In the Stack Overflow question Compress files saved in Google cloud storage, it is stated that compressing files that are already in Google Cloud Storage is not possible. The solution in that question is to subscribe to newly created files, download them locally, compress them, and overwrite them in GCS. As you can see, you can list the files, or iterate through the files stored in GCS, but you first need to download them to be able to process them.
Workaround
Therefore, in your use-case scenario, I would recommend the following workaround using the Python client API (a sketch follows this list):
- Use the Listing objects Python API to get all the objects from GCS.
- Then use the Downloading objects Python API to download the objects locally.
- As soon as the objects are in a local directory, use the zipfile Python library to ZIP them together, as you are already doing.
- Once the objects are ZIPed, if you no longer need the downloaded copies, you can delete them with os.remove("downloaded_file.txt").
- In case you need the compressed ZIP file in the Google Cloud Storage bucket, use the Uploading objects Python API to upload the ZIP file to the GCS bucket.
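For illustration, here is a minimal sketch of those steps using the google-cloud-storage client library. The bucket name, file layout, and ZIP name are assumptions to be replaced with your own; the client calls (Client.list_blobs, Blob.download_to_filename, Blob.upload_from_filename) are the standard ones from that library:

import os
import zipfile

from google.cloud import storage  # pip install google-cloud-storage

BUCKET_NAME = "my-bucket"  # assumption: replace with your bucket name
ZIP_NAME = "my_zip.zip"    # assumption: name of the resulting archive

client = storage.Client()
bucket = client.bucket(BUCKET_NAME)

# 1. List the objects in the bucket.
blobs = list(client.list_blobs(BUCKET_NAME))

# 2. Download each object to a local file.
local_files = []
for blob in blobs:
    if blob.name.endswith("/"):
        continue  # skip folder placeholder objects
    local_path = os.path.basename(blob.name)
    blob.download_to_filename(local_path)
    local_files.append(local_path)

# 3. ZIP the downloaded files together with zipfile.
with zipfile.ZipFile(ZIP_NAME, "w") as zip_file:
    for local_path in local_files:
        zip_file.write(local_path, compress_type=zipfile.ZIP_DEFLATED)

# 4. Delete the local copies once they are in the archive.
for local_path in local_files:
    os.remove(local_path)

# 5. Optionally, upload the ZIP back to the GCS bucket.
bucket.blob(ZIP_NAME).upload_from_filename(ZIP_NAME)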
As I have mentioned above, processing files (e.g. adding them to a ZIP file) directly in a Google Cloud Storage bucket is not supported. You first need to download them locally in order to do so. I hope this workaround is helpful to you.
UPDATE
As I have mentioned above, zipping files while they are in a GCS bucket is not supported. Therefore I have prepared a working example in Python of how to use the workaround.
NOTE: As I am not an expert at OS operations with Python and I am not familiar with the zipfile library, there is probably a better and more efficient way of achieving this. However, the code, which can be found in this GitHub link, performs the following procedure:
- Under the # Public variables: section, change BUCKET_NAME to your corresponding bucket name and execute the Python script in Google Cloud Shell.
- Now my bucket structure is as follows:

gs://my-bucket/test.txt
gs://my-bucket/test1.txt
gs://my-bucket/test2.txt
gs://my-bucket/directory/test4.txt
When you execute the script, the app does the following:
- It will get the path from where the script is executed, e.g. /home/username/myapp.
- It will create a temporary directory within that directory, e.g. /home/username/myapp/temp.
- It will iterate through all the files located in the bucket that you have specified and download them locally into that temp directory.

NOTE: If a file in the bucket is under a directory, the script will simply download the file instead of recreating that sub-directory. You can modify the code later to make it work as you desire.

- The newly downloaded files will look like this:

/home/username/myapp/temp/test.txt
/home/username/myapp/temp/test1.txt
/home/username/myapp/temp/test2.txt
/home/username/myapp/temp/test4.txt

- After that, the code will zip all those files into a new zipedFile.zip, located in the same directory as the main.py script that you executed.
- When this step is done, the script will delete the directory /home/username/myapp/temp/ with all of its contents.

As I have mentioned above, after executing the script locally you should be able to see main.py and a zipedFile.zip file containing all the zipped files from the GCS bucket; a sketch approximating this procedure is shown below. Now you can take the idea of the implementation and modify it according to your project's needs.
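The GitHub script itself is not reproduced here, but based on the procedure described above it roughly amounts to the following sketch. BUCKET_NAME and the flattening of sub-directories are assumptions matching the description, not the exact code from the link:

import os
import shutil
import zipfile

from google.cloud import storage

# Public variables:
BUCKET_NAME = "my-bucket"  # assumption: change to your bucket name

# 1. Get the path of the directory the script is executed from.
app_dir = os.path.dirname(os.path.abspath(__file__))

# 2. Create a temporary directory next to the script.
temp_dir = os.path.join(app_dir, "temp")
os.makedirs(temp_dir, exist_ok=True)

# 3. Download every object in the bucket into the temp directory,
#    flattening any sub-directories to plain file names.
client = storage.Client()
for blob in client.list_blobs(BUCKET_NAME):
    if blob.name.endswith("/"):
        continue  # skip folder placeholder objects
    blob.download_to_filename(os.path.join(temp_dir, os.path.basename(blob.name)))

# 4. ZIP the downloaded files into zipedFile.zip next to the script.
with zipfile.ZipFile(os.path.join(app_dir, "zipedFile.zip"), "w") as zip_file:
    for name in os.listdir(temp_dir):
        zip_file.write(os.path.join(temp_dir, name), arcname=name,
                       compress_type=zipfile.ZIP_DEFLATED)

# 5. Delete the temp directory with all of its contents.
shutil.rmtree(temp_dir)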
Solution 2:[2]
The final code:
import datetime
import io
import os
import shutil
import zipfile

from django.http import HttpResponse

zip_buffer = io.BytesIO()
base_path = '/home/everton/compressedfiles/'
fiscal_compentecy_month = datetime.date(int(year), int(month), 1)
revenues = CompanyRevenue.objects.filter(company__pk=company_id, fiscal_compentecy_month=fiscal_compentecy_month)
if revenues.count() > 0:
    path = base_path + str(revenues.first().company.user.pk) + "/"
    zip_name = "{}-{}-{}-{}".format(revenues.first().company.external_id, revenues.first().company.external_name, month, year)
    # Download each revenue file from storage into a local directory
    for revenue in revenues:
        filename = revenue.revenue.name.split('revenues/')[1]
        if not os.path.exists(path):
            os.makedirs(path)
        with open(path + filename, 'wb+') as file:
            file.write(revenue.revenue.read())
    # Zip the downloaded files into an in-memory buffer
    with zipfile.ZipFile(zip_buffer, 'w') as zip_file:
        for file in os.listdir(path):
            zip_file.write(path + file, compress_type=zipfile.ZIP_DEFLATED)
    response = HttpResponse(content_type='application/x-zip-compressed')
    response['Content-Disposition'] = 'attachment; filename={}.zip'.format(zip_name)
    response.write(zip_buffer.getvalue())
    # Remove the local copies once the ZIP has been built
    shutil.rmtree(path)
    return response
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source
---|---
Solution 1 | 
Solution 2 | Everton Alauk