How do I delete all except the latest 5 recently updated/new files from AWS S3?

I can fetch the five most recently updated files from AWS S3 with the command below:

aws s3 ls s3://somebucket/ --recursive | sort | tail -n 5 | awk '{print $4}'

Now I need to delete every file in the bucket except the 5 files returned by that command.

Say the command returns 1.txt, 2.txt, 3.txt, 4.txt and 5.txt. I need to delete everything else in the bucket and keep only those five files.

Solution 1:[1]

Use the aws s3 rm command with multiple --exclude options; the patterns are evaluated relative to the bucket/prefix passed to rm (I assume the 5 files to keep do not share a common pattern):

aws s3 rm s3://somebucket/ --recursive --exclude "1.txt" --exclude "2.txt" --exclude "3.txt" --exclude "4.txt" --exclude "5.txt"

CAUTION: Run the command with the --dryrun option first and verify that the files listed for deletion do not include the 5 files you want to keep before actually removing anything.
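If the 5 keys are not known in advance, they can be collected with the listing command from the question and turned into --exclude flags. The following is only a minimal bash sketch, assuming the bucket is named somebucket and the object keys contain no whitespace; keep --dryrun until the output looks right:

bucket="somebucket"   # assumption: replace with your bucket name

# Keys of the 5 most recently modified objects (sort orders the lines by the date/time columns).
keep=$(aws s3 ls "s3://${bucket}/" --recursive | sort | tail -n 5 | awk '{print $4}')

# Build one --exclude flag per key to keep.
excludes=()
while read -r key; do
    excludes+=(--exclude "$key")
done <<< "$keep"

# --dryrun only prints what would be deleted; drop it once the list looks correct.
aws s3 rm "s3://${bucket}/" --recursive --dryrun "${excludes[@]}"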

Solution 2:[2]

Use a negative number with head to get all but the last n lines:

aws s3 ls s3://somebucket/ --recursive | sort | head -n -5 | while read -r line ; do
    echo "Removing ${line}"
    aws s3 rm s3://somebucket/${line}
done
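Note that the negative line count for head is a GNU coreutils feature; BSD/macOS head does not accept head -n -5. A rough awk equivalent over the same listing (just a sketch, using the same somebucket assumption) is:

aws s3 ls s3://somebucket/ --recursive | sort | awk '{lines[NR]=$0} END {for (i = 1; i <= NR - 5; i++) print lines[i]}'

This buffers the listing and prints every line except the last 5.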

Solution 3:[3]

Short story: based on @bcattle's answer, this works with AWS CLI 2:

aws s3 ls s3://[BUCKET_NAME] --recursive | awk 'NF>1{print $4}' | grep . | sort | head -n -5 | while read -r line ; do
    echo "Removing ${line}"
    aws s3 rm s3://[BUCKET_NAME]/${line}
done

Long story: under AWS CLI 2, aws s3 ls returns not only the file path but also the last-modified date, time and size. That extra output is not what the script expects, because only the file path should be concatenated with the bucket URI, so awk and grep are used to extract just the key.
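For reference, each line of a recursive listing looks roughly like this (the date, size and key are invented for illustration):

2022-05-10 09:13:45       2048 somefolder/1.txt

awk sees $1 as the date, $2 as the time, $3 as the size and $4 as the key, so NF>1{print $4} keeps only the key (keys containing spaces would need a different extraction, e.g. printing everything from the fourth field onward).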

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution sources:
[1] Solution 1: helloV
[2] Solution 2: bcattle
[3] Solution 3: emilie zawadzki