'Major rearrangement of pandas DataFrame containing nested lists and dictionaries ( CFBD (College Football Database))

The College Football Database (cfbd) contains all team ranks for each week of every college football season going back to 1937.I am trying to set up data from the cfbd in a way that will allow me to examine how a particular team's AP Top 25 ranking changes week to week in multiple seasons. Ideally, I would like to organize this data like so:

Team Season Week1 Rank Week2 Rank 16 more weeks worth of ranks ...
Alabama 1937 4 3 etc etc
Alabama 1938 3 6 etc
etc
Wyoming 2017 24 nr
Wyoming 2020 nr 25

So the output table will have a row for every school that has ever been ranked for each year that it has been ranked.

After running pip install cfbd I set up the API:

import pandas as pd
import cfbd
configuration = cfbd.Configuration()

# The cfbd API requires users to sign up for a free 
# private key in order to access the data. 
# https://collegefootballdata.com/key
configuration.api_key['Authorization'] = '[MY_SECRET_KEY]'
configuration.api_key_prefix['Authorization'] = 'Bearer'

api_rankings = cfbd.RankingsApi(cfbd.ApiClient(configuration))
#Initialize blank dataframe
allrankings_df = pd.DataFrame()

#year is required in the API call, so I have to collect each year separately
for yr in range(1936,2022):
    
    rankings = api_rankings.get_rankings(year = yr)  #get the data
    rankings_df = pd.DataFrame.from_records([p.to_dict() for p in rankings])
    allrankings_df2 = allrankings_df2.append(rankings_df, ignore_index=True)

This gives me a dataframe structured like so: |season|season_type|week|List of poll objects| |------|-----------|----|-------------------| |1936|regular|1|object|

Those poll objects have a Poll Name ("AP Top 25", "Coaches Poll") and ranking data. Like this, except with four or five different Polls.

{'poll': 'Coaches Poll',
  'ranks': [{'rank': 1,
    'school': 'Alabama',
    'conference': 'SEC',
    'firstPlaceVotes': 44,
    'points': 1601},
   {'rank': 2,
    'school': 'Clemson',
    'conference': 'ACC',
    'firstPlaceVotes': 14,
    'points': 1536},

The API describes all of this like so:

[
  {
    "season": 0,
    "seasonType": "string",
    "week": 0,
    "polls": [
      {
        "poll": "string",
        "ranks": [
          {
            "rank": 0,
            "school": "string",
            "conference": "string",
            "firstPlaceVotes": 0,
            "points": 0
          }
        ]
      }
    ]
  }
]

Phew. I can sort of picture how to do this by iterating through every year & week & polling object and building a new table piece by piece. But I also have read many times that I shouldn't do that - that I should be vectorizing. I know at this point I should share what I have tried so far, but to be honest, I am nowhere close. Can anyone point me in the right direction? I am willing to bang my head on this, but I can't even tell how I should be banging. Do I need some dataframe methods like melt or ravel? Or should I be trying to set this up with Boolean dataframe referencing?

references: https://api.collegefootballdata.com/api/docs/?url=/api-docs.json#/rankings/getRankings https://pypi.org/project/cfbd/



Solution 1:[1]

import pandas as pd
import cfbd
configuration = cfbd.Configuration()

# The cfbd API requires users to sign up for a free 
# private key in order to access the data. 
# https://collegefootballdata.com/key
configuration.api_key['Authorization'] = 'IXRIjbjaO/WQVVLk/Yj2tA1yB2K1K2ZprNcCmueKdHKMHbyst4fYD6lVbp1xeHXB'
configuration.api_key_prefix['Authorization'] = 'Bearer'

api_rankings = cfbd.RankingsApi(cfbd.ApiClient(configuration))
#Initialize blank dataframe
allrankings_df = pd.DataFrame()

#year is required in the API call, so I have to collect each year separately
for yr in range(1936,2022):
    print(yr)
    rankings = api_rankings.get_rankings(year = yr)  #get the data
    rankings = [p.to_dict() for p in rankings]
    
    # Flattens the json
    rankings_df = pd.json_normalize(rankings, 
                                    record_path=['polls', 'ranks'],
                                    meta = ['season','week'])
    
    rankings_df = rankings_df.drop_duplicates(subset=['school','week'], keep='first')

    
    # Pivot so Week rankings are the columns
    rankings_df = rankings_df.pivot(
        index = ['school','season'],
        columns = 'week',
        values = 'rank').add_prefix('Week').add_suffix(' Rank').reset_index(drop=False)
    
    rankings_df = rankings_df.rename(columns={'school':'Team',
                                              'season':'Season'})
    
    allrankings_df = pd.concat([allrankings_df, rankings_df], axis=0)
   
allrankings_df = allrankings_df.sort_values(['Team', 'Season']).reset_index(drop=True)

Output:

First 20 rows:

print(allrankings_df.head(20).to_string())
week               Team  Season  Week1 Rank  Week2 Rank  Week3 Rank  Week4 Rank  Week5 Rank  Week6 Rank  Week7 Rank  Week8 Rank  Week9 Rank  Week10 Rank  Week11 Rank  Week12 Rank  Week13 Rank  Week14 Rank  Week15 Rank  Week16 Rank  Week17 Rank
0     Abilene Christian    2012        10.0         9.0        20.0        19.0         NaN         NaN         NaN         NaN         NaN          NaN          NaN          NaN          NaN          NaN          NaN          NaN          NaN
1             Air Force    1958         NaN         NaN         NaN         NaN         NaN        14.0        13.0        10.0        10.0          9.0          8.0          NaN          NaN          NaN          NaN          NaN          NaN
2             Air Force    1959        15.0         NaN        18.0        18.0        17.0         NaN         NaN        18.0         NaN          NaN          NaN          NaN          NaN          NaN          NaN          NaN          NaN
3             Air Force    1969         NaN         NaN         NaN         NaN         NaN        20.0        19.0        19.0        20.0          NaN          NaN          NaN          NaN          NaN          NaN          NaN          NaN
4             Air Force    1970         NaN         NaN        20.0        10.0         8.0         7.0         7.0         7.0         9.0         13.0         10.0         12.0         11.0         11.0          NaN          NaN          NaN
5             Air Force    1971         NaN         NaN         NaN         NaN         NaN         NaN        20.0        18.0         NaN          NaN          NaN          NaN          NaN          NaN          NaN          NaN          NaN
6             Air Force    1972         NaN         NaN         NaN         NaN        19.0        16.0        16.0         NaN        19.0          NaN          NaN          NaN          NaN          NaN          NaN          NaN          NaN
7             Air Force    1983         NaN         NaN         NaN         NaN         NaN         NaN         NaN         NaN         NaN          NaN          NaN          NaN         18.0         17.0         16.0          NaN          NaN
8             Air Force    1985         NaN         NaN         NaN         NaN        19.0        17.0        13.0        10.0         8.0          7.0          5.0          4.0         13.0         11.0         10.0         10.0          NaN
9             Air Force    1989         NaN         NaN         NaN         NaN        24.0        20.0        17.0        19.0         NaN          NaN          NaN          NaN          NaN          NaN          NaN          NaN          NaN
10            Air Force    1995         NaN         NaN         NaN        21.0         NaN         NaN         NaN         NaN         NaN          NaN          NaN          NaN          NaN          NaN          NaN          NaN          NaN
11            Air Force    1997         NaN         NaN         NaN         NaN         NaN         NaN        23.0        19.0        18.0          NaN          NaN          NaN          NaN          NaN         24.0         23.0         23.0
12            Air Force    1998         NaN         NaN         NaN        23.0         NaN         NaN         NaN         NaN         NaN         25.0         23.0         20.0         18.0         17.0         16.0          NaN          NaN
13            Air Force    1999         NaN         NaN         NaN         NaN        24.0         NaN         NaN         NaN         NaN          NaN          NaN          NaN          NaN          NaN          NaN          NaN          NaN
14            Air Force    2002         NaN         NaN         NaN         NaN         NaN         NaN        25.0        19.0        15.0         19.0          NaN          NaN          NaN          NaN          NaN          NaN          NaN
15            Air Force    2003         NaN         NaN         NaN         NaN         NaN         NaN        25.0         NaN         NaN          NaN          NaN          NaN          NaN          NaN          NaN          NaN          NaN
16            Air Force    2010         NaN         NaN         NaN         NaN         NaN        25.0        23.0         NaN         NaN          NaN          NaN          NaN          NaN          NaN          NaN          NaN          NaN
17            Air Force    2019         NaN         NaN         NaN         NaN         NaN         NaN         NaN         NaN         NaN          NaN          NaN          NaN          NaN          NaN         25.0         24.0          NaN
18              Alabama    1936         NaN         NaN        14.0         4.0         8.0         3.0         NaN         NaN         NaN          NaN          NaN          NaN          NaN          NaN          NaN          NaN          NaN
19              Alabama    1937         2.0         3.0         2.0         3.0         3.0         4.0         NaN         NaN         NaN          NaN          NaN          NaN          NaN          NaN          NaN          NaN          NaN
....

[3954 rows x 19 columns]

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 chitown88