How to use Boto3 pagination

BACKGROUND:

The AWS operation to list IAM users returns at most 100 users per call by default.

After reading the docs (links below), I ran the following code and got back the complete data set by setting "MaxItems" to 1000.

import boto3

client = boto3.client('iam')
paginator = client.get_paginator('list_users')
response_iterator = paginator.paginate(
    PaginationConfig={
        'MaxItems': 1000,
        'PageSize': 123})
for page in response_iterator:
    u = page['Users']
    for user in u:
        print(user['UserName'])

http://boto3.readthedocs.io/en/latest/guide/paginators.html
https://boto3.readthedocs.io/en/latest/reference/services/iam.html#IAM.Paginator.ListUsers

QUESTION:

If the "MaxItems" was set to 10, for example, what would be the best method to loop through the results?

I tested with the following, but it only loops for 2 iterations before 'IsTruncated' == False, and then fails with "KeyError: 'Marker'". I am not sure why this is happening, because I know there are over 200 results.

marker = None

while True:
    paginator = client.get_paginator('list_users')
    response_iterator = paginator.paginate( 
        PaginationConfig={
            'MaxItems': 10,
            'StartingToken': marker})
    #print(response_iterator)
    for page in response_iterator:
        u = page['Users']
        for user in u:
            print(user['UserName'])
            print(page['IsTruncated'])
            marker = page['Marker']
            print(marker)
        else:
            break


Solution 1:[1]

(Answer rewrite) **NOTE**: the paginator contains a bug, or at least its behavior does not tally with the documentation (or vice versa). With MaxItems set, the pages do not return a Marker or NextToken when the total number of items exceeds MaxItems. It is PageSize that controls whether the Marker/NextToken indicator is returned.

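import boto3

iam = boto3.client("iam")
paginator = iam.get_paginator('list_users')
marker = None
while True:
    response_iterator = paginator.paginate(
        PaginationConfig={
            'PageSize': 10,
            'StartingToken': marker})
    last_page = {}
    for page in response_iterator:
        print("Next Page : {} ".format(page['IsTruncated']))
        u = page['Users']
        for user in u:
            print(user['UserName'])
        last_page = page
    # The page iterator itself is not subscriptable, so read Marker from the
    # last page. With only PageSize set, the iterator has already followed
    # every Marker, so the last page normally has none and the loop ends here.
    marker = last_page.get('Marker')
    if marker is None:
        break
    print(marker)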

It is not your mistake that your code doesn't work. MaxItems in the paginator seems to act as a "threshold" indicator instead. Ironically, MaxItems in the original boto3 IAM list_users call still works as documented.

If you check the boto3 IAM list_users call, you will notice that you must either omit Marker or give it a value. Apparently, the paginator is NOT a plain wrapper around every boto3 list_* method.

import sys
import boto3
iam = boto3.client("iam")
marker = None
while True:
    if marker:
        response_iterator = iam.list_users(
            MaxItems=10,
            Marker=marker
        )
    else:
        response_iterator = iam.list_users(
            MaxItems=10
        )
    print("Next Page : {} ".format(response_iterator['IsTruncated']))
    for user in response_iterator['Users']:
        print(user['UserName'])

    try:
        marker = response_iterator['Marker']
        print(marker)
    except KeyError:
        sys.exit()
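The same manual Marker loop can be written without duplicating the call, by building the keyword arguments conditionally. This is only a sketch: the helper name `iter_user_pages` is hypothetical, and it assumes the client behaves like the IAM `list_users` call used above (accepting `MaxItems` and `Marker`, returning `Users`, `IsTruncated` and `Marker`).

```python
def iter_user_pages(iam_client, max_items=10):
    """Yield one list_users response at a time, following Marker manually.

    iter_user_pages is a hypothetical helper name, not part of boto3.
    """
    # Marker is left out of the first call entirely instead of being set to None.
    kwargs = {"MaxItems": max_items}
    while True:
        response = iam_client.list_users(**kwargs)
        yield response
        # Stop once the service reports that nothing is left to fetch.
        if not response.get("IsTruncated"):
            break
        kwargs["Marker"] = response["Marker"]
```

With a real client this could be used as `for page in iter_user_pages(boto3.client("iam")): ...`, printing `user['UserName']` for each entry in `page['Users']`.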

You can follow the issue I filed on the boto3 GitHub. According to a project member, you can call build_full_result() after paginate(), which gives the desired behavior.
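As a rough illustration of the build_full_result approach, the sketch below fetches users in batches of MaxItems and feeds the returned NextToken back in as StartingToken. The helper name `iter_user_name_batches` is hypothetical, and setting PageSize equal to MaxItems is my own assumption, made to avoid fetching more than one batch per call.

```python
def iter_user_name_batches(iam_client, batch_size=10):
    """Yield lists of user names, at most batch_size names per list.

    iter_user_name_batches is a hypothetical helper name, not part of boto3.
    """
    paginator = iam_client.get_paginator("list_users")
    token = None
    while True:
        # build_full_result() aggregates the pages up to MaxItems and, when
        # results were cut off, adds a NextToken to resume from.
        result = paginator.paginate(
            PaginationConfig={
                "MaxItems": batch_size,
                "PageSize": batch_size,  # assumption: one API call per batch
                "StartingToken": token,
            }
        ).build_full_result()
        yield [user["UserName"] for user in result.get("Users", [])]
        token = result.get("NextToken")
        if token is None:
            break
```

With a real client this might be called as `for batch in iter_user_name_batches(boto3.client("iam")): print(batch)`.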

Solution 2:[2]

I will post my solution here and hopefully help other people do their job faster instead of fiddling around with the amazingly written boto3 api calls.

My use case was to list all the Security Hub ControlIds using the SecurityHub.Client.describe_standards_controls function.


controlsResponse = sh_client.describe_standards_controls(
    StandardsSubscriptionArn=enabledStandardSubscriptionArn)

controls = controlsResponse.get('Controls')

# This is the token for the 101st item in the list.
nextToken = controlsResponse.get('NextToken')

# Call describe_standards_controls with the token set at item 101 to get the next 100 results
controlsResponse1 = sh_client.describe_standards_controls(
    StandardsSubscriptionArn=enabledStandardSubscriptionArn,
    NextToken=nextToken)

controls1 = controlsResponse1.get('Controls')

# And make the two lists into one
controls.extend(controls1)

Now you have a list of all the SH standards controls for the specified standard subscription (e.g., the AWS foundational standard), provided the results fit in these two calls.

For example if I want to get all the ControlIds I can just iterate the 'controls' list and do

controlId=control.get("ControlId")

The same works for the other fields in the response, as described in the describe_standards_controls documentation.
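If the number of controls is not known in advance, the two hard-coded calls can be generalized into a loop that keeps following NextToken until none is returned. This is a sketch under the same assumptions as the code above (a Security Hub client and a subscription ARN); the function name `collect_all_controls` is hypothetical.

```python
def collect_all_controls(sh_client, subscription_arn):
    """Return every control for one standards subscription.

    collect_all_controls is a hypothetical helper name, not part of boto3.
    """
    controls = []
    kwargs = {"StandardsSubscriptionArn": subscription_arn}
    while True:
        response = sh_client.describe_standards_controls(**kwargs)
        controls.extend(response.get("Controls", []))
        next_token = response.get("NextToken")
        # No NextToken means the last page has been reached.
        if not next_token:
            break
        kwargs["NextToken"] = next_token
    return controls
```

The ControlIds can then be read with something like `[control.get("ControlId") for control in collect_all_controls(sh_client, enabledStandardSubscriptionArn)]`.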

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Or Arbel
Solution 2 Stefanos Asl.