AWS S3: filter by tags, search by tags

We have a bucket on AWS S3 that we access through the new AWS SDK API. We uploaded lots of files and folders and tagged them.

How can we filter on a key-value tag pair, or on the key alone? I'd like to find all the objects with key = "temp", or with key = "temp" and value = "lol".

Thanks!



Solution 1:[1]

I also hoped that AWS would eventually support searching files by tags, because that would open up possibilities such as a photo storage with the names, descriptions, and locations stored in tags, so I wouldn't need a separate database.

But AWS apparently does not support this, and probably never will. Quoting from their storage services white paper:

Amazon S3 doesn’t suit all storage situations. [...] some storage needs for which you should consider other AWS storage options [...]

Amazon S3 doesn’t offer query capabilities to retrieve specific objects. When you use Amazon S3 you need to know the exact bucket name and key for the files you want to retrieve from the service. Amazon S3 can’t be used as a database or search engine by itself.

Instead, you can pair Amazon S3 with Amazon DynamoDB, Amazon CloudSearch, or Amazon Relational Database Service (Amazon RDS) to index and query metadata about Amazon S3 buckets and objects.

AWS suggests using DynamoDB, RDS or CloudSearch instead.
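
As a rough illustration of the "index the metadata elsewhere" idea in the quote, the sketch below keeps a tag index in a small relational table and queries it by key, or by key and value. It uses Python's built-in sqlite3 purely as a local stand-in for RDS or DynamoDB; the table name object_tags, the helper names, and the sample bucket and object keys are all hypothetical, and nothing here talks to S3 itself.

```python
import sqlite3

def create_index(conn):
    # One row per (object, tag key); hypothetical schema for illustration.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS object_tags ("
        " bucket TEXT NOT NULL, object_key TEXT NOT NULL,"
        " tag_key TEXT NOT NULL, tag_value TEXT NOT NULL,"
        " PRIMARY KEY (bucket, object_key, tag_key))"
    )

def record_tags(conn, bucket, object_key, tags):
    # Replace the stored tags of one object; call this after uploading or retagging it.
    conn.execute(
        "DELETE FROM object_tags WHERE bucket = ? AND object_key = ?",
        (bucket, object_key),
    )
    conn.executemany(
        "INSERT INTO object_tags VALUES (?, ?, ?, ?)",
        [(bucket, object_key, k, v) for k, v in tags.items()],
    )
    conn.commit()

def find_keys(conn, bucket, tag_key, tag_value=None):
    # Filter by tag key only, or by key and value when tag_value is given.
    sql = "SELECT object_key FROM object_tags WHERE bucket = ? AND tag_key = ?"
    params = [bucket, tag_key]
    if tag_value is not None:
        sql += " AND tag_value = ?"
        params.append(tag_value)
    sql += " ORDER BY object_key"
    return [row[0] for row in conn.execute(sql, params)]

conn = sqlite3.connect(":memory:")
create_index(conn)
record_tags(conn, "my-bucket", "a.jpg", {"temp": "lol"})
record_tags(conn, "my-bucket", "b.jpg", {"temp": "other"})
record_tags(conn, "my-bucket", "c.jpg", {"keep": "yes"})
print(find_keys(conn, "my-bucket", "temp"))         # ['a.jpg', 'b.jpg']
print(find_keys(conn, "my-bucket", "temp", "lol"))  # ['a.jpg']
```

The index only stays accurate if every upload, retag, and delete also updates the table, which is the price of keeping the search outside S3.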

Solution 2:[2]

There seems to be one way to achieve what you're looking for, although it's not ideal, or particularly user-friendly.

The AWS S3 tagging documentation says that you can grant accounts permissions for objects with a given tag. If you created a new account with the right permissions then you could probably get the filtered list.

Not particularly useful on an ongoing basis, though.

Solution 3:[3]

AFAIK, Resource Groups don't support tags at the S3 object level, only at the bucket level.

Source: https://aws.amazon.com/blogs/aws/new-aws-resource-tagging-api/ (scroll down the page to the table).

Solution 4:[4]

There's no way to filter/search by tags. But you can implement this yourself using S3.

You can create a special prefix in a bucket, e.g. /tags/. Then, for each actual object that you add and want to tag (e.g. Department=67), you add a new object under /tags/, e.g. /tags/XXXXXXXXX_YYYYYYYYY_ZZZZZZZZZ, where

XXXXXXXXX = hash('Department')
YYYYYYYYY = hash('67')
ZZZZZZZZZ = actualObjectKey

Then, when you want to get all objects that have a particular tag key assigned (e.g. Department), call the ListObjectsV2 S3 API with the prefix /tags/XXXXXXXXX_. If you want the objects that have a particular tag value (e.g. Department=67), call ListObjectsV2 with the prefix /tags/XXXXXXXXX_YYYYYYYYY_.

https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html

It's not that fast but still does the job.

The obvious downside is that you have to maintain these index objects yourself, including removing them when a tag or the actual object goes away. You can automate all of the above with S3 event triggers and a Lambda function.
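
The scheme above could be sketched roughly as follows. This is only one possible reading of it: the answer says just "hash", so the truncated SHA-256 is my choice, the prefix is written as tags/ without a leading slash, and the function names are hypothetical. The s3 argument is assumed to be a boto3 S3 client that the caller creates, e.g. with boto3.client("s3").

```python
import hashlib

TAG_PREFIX = "tags/"  # assumption: no leading slash in the key
HASH_LEN = 16         # assumption: 16 hex characters of SHA-256

def _h(text):
    # Hex digests never contain "_", so "_" is a safe separator.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:HASH_LEN]

def search_prefix(tag_key, tag_value=None):
    # tags/XXXX_ for a key-only search, tags/XXXX_YYYY_ for key and value.
    prefix = f"{TAG_PREFIX}{_h(tag_key)}_"
    if tag_value is not None:
        prefix += f"{_h(tag_value)}_"
    return prefix

def index_key(tag_key, tag_value, object_key):
    # tags/XXXX_YYYY_ZZZZ, where ZZZZ is the actual object key.
    return search_prefix(tag_key, tag_value) + object_key

def add_tag_index(s3, bucket, tag_key, tag_value, object_key):
    # Write an empty marker object; only its key carries information.
    s3.put_object(Bucket=bucket, Key=index_key(tag_key, tag_value, object_key), Body=b"")

def find_objects(s3, bucket, tag_key, tag_value=None):
    # List marker objects under the prefix and recover the actual object keys.
    strip = len(TAG_PREFIX) + 2 * (HASH_LEN + 1)
    paginator = s3.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=search_prefix(tag_key, tag_value)):
        for item in page.get("Contents", []):
            keys.append(item["Key"][strip:])
    return keys
```

With a real client, find_objects(s3, "my-bucket", "Department") would list everything tagged with that key and find_objects(s3, "my-bucket", "Department", "67") only the objects with that value; "my-bucket" is a placeholder name.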

Solution 5:[5]

This is now possible using the AWS Resource Tagging API and S3 Select (SQL). See this post: https://aws.amazon.com/blogs/architecture/how-to-efficiently-extract-and-query-tagged-resources-using-the-aws-resource-tagging-api-and-s3-select-sql/.

However, for the S3 service the Resource Tagging API supports tags only on buckets, not on objects (see the AWS blog post "New – AWS Resource Tagging API").
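
For reference, a bucket-level lookup through the Resource Groups Tagging API might look like the sketch below. The function name is hypothetical, and the tagging argument is assumed to be a client the caller creates with boto3.client("resourcegroupstaggingapi"). In line with the limitation just mentioned, it returns bucket ARNs, not objects.

```python
def find_tagged_buckets(tagging, tag_key, tag_value=None):
    # Build one tag filter: key only, or key plus a single accepted value.
    tag_filter = {"Key": tag_key}
    if tag_value is not None:
        tag_filter["Values"] = [tag_value]

    arns = []
    paginator = tagging.get_paginator("get_resources")
    pages = paginator.paginate(
        TagFilters=[tag_filter],
        ResourceTypeFilters=["s3"],  # restrict the results to S3 resources
    )
    for page in pages:
        for mapping in page.get("ResourceTagMappingList", []):
            arns.append(mapping["ResourceARN"])
    return arns
```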

Solution 6:[6]

You should be able to query tags and values that you added using resource-groups/query resource:

https://${region}.console.aws.amazon.com/resource-groups/resources

Solution 7:[7]

There are many ways to get a list of S3 buckets filtered by tag. Note that this works on bucket-level tags, not on object tags. This is what I used in my code:

import boto3
from botocore.exceptions import ClientError

def get_tag_value(tags, key):
    # Return the value of the tag named `key`, or "" if no such tag exists.
    for tag in tags:
        if tag["Key"] == key:
            return tag["Value"]
    return ""

def filter_s3_by_tag_value(tag_key, tag_value):
    # Return the names of all buckets whose tag `tag_key` has the value `tag_value`.
    s3 = boto3.client('s3')
    response = s3.list_buckets()
    s3_list = []
    for bucket in response["Buckets"]:
        try:
            response_tags = s3.get_bucket_tagging(Bucket=bucket["Name"])
            if get_tag_value(response_tags["TagSet"], tag_key) == tag_value:
                s3_list.append(bucket["Name"])
        except ClientError as e:
            # Raised, for example, with code NoSuchTagSet when a bucket has no tags.
            print(e.response["Error"]["Code"])

    return s3_list

def filter_s3_by_tag_key(tag_key):
    # Return the names of all buckets that have a tag named `tag_key`.
    s3 = boto3.client('s3')
    response = s3.list_buckets()
    s3_list = []
    for bucket in response["Buckets"]:
        try:
            response_tags = s3.get_bucket_tagging(Bucket=bucket["Name"])
            # Check the key itself, so that tags with an empty value still match.
            if any(tag["Key"] == tag_key for tag in response_tags["TagSet"]):
                s3_list.append(bucket["Name"])
        except ClientError as e:
            print(e.response["Error"]["Code"])

    return s3_list

# Example values taken from the question; replace them with your own.
tag_key = "temp"
tag_value = "lol"

print(filter_s3_by_tag_value(tag_key, tag_value))

print(filter_s3_by_tag_key(tag_key))
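
The code above looks at bucket-level tags. For the object-level tags asked about in the question, a brute-force variant might look like the following sketch: it lists the objects in one bucket and fetches the tag set of each. The function name and the prefix parameter are my own additions, and s3 is assumed to be a client created with boto3.client("s3"). It issues one GetObjectTagging request per object, so it is slow and potentially costly on large buckets.

```python
def find_objects_by_tag(s3, bucket, tag_key, tag_value=None, prefix=""):
    # Return the keys of objects in `bucket` that carry tag `tag_key`
    # (and, if given, the value `tag_value`).
    matches = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for item in page.get("Contents", []):
            # One extra request per object to read its tags.
            tag_set = s3.get_object_tagging(Bucket=bucket, Key=item["Key"])["TagSet"]
            for tag in tag_set:
                if tag["Key"] == tag_key and (tag_value is None or tag["Value"] == tag_value):
                    matches.append(item["Key"])
                    break
    return matches
```

With the values from the question, find_objects_by_tag(s3, "my-bucket", "temp") would match on the key alone and find_objects_by_tag(s3, "my-bucket", "temp", "lol") on key and value; "my-bucket" is a placeholder.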

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Community
Solution 2 Stuart Gilbert
Solution 3 user3691228
Solution 4 Nikolay Dimitrov
Solution 5 riccardo.cardin
Solution 6 user1767316
Solution 7 Sunil Shakya