StopIteration Error while using scholarly.pprint function
I am trying to extract the Google Scholar public profiles of certain professors.
I have a list of professors' names, and I am using the scholarly
package to scrape their public profile information. However, I am stuck with an error: I can only retrieve information for the first name in professor_list,
and not for the subsequent ones.
for name in professor_list:
    search_query = scholarly.search_author(name)
    scholarly.pprint(next(search_query))
Output:
{'affiliation': 'Deakin University',
'citedby': 2528,
'email_domain': '@deakin.edu.au',
'filled': False,
'interests': ['Lynn Batten'],
'name': 'Lynn Batten',
'scholar_id': 'Tmg0T9sAAAAJ',
'source': 'SEARCH_AUTHOR_SNIPPETS',
'url_picture': 'https://scholar.google.com/citations?view_op=medium_photo&user=Tmg0T9sAAAAJ'}
---------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
<ipython-input-242-5b96571c0972> in <module>
1 for name in professor_list:
2 search_query = scholarly.search_author(name)
----> 3 scholarly.pprint(next(search_query))
StopIteration:
Solution 1:[1]
Although scholarly.pprint(next(search_query)) should work, next() raises StopIteration when the search yields no results. You can pass a default value of None to next() so that nothing is raised in that case, e.g. next(search_query, None):
from scholarly import scholarly
professor_list = ["Marty Banks, Berkeley",
                  "Adam Lobel, Blizzard",
                  "Daniel Blizzard, Blizzard",
                  "Shuo Chen, Blizzard",
                  "Ian Livingston, Blizzard",
                  "Minli Xu, Blizzard"]
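for professor_name in professor_list:
    search_query = scholarly.search_author(name=professor_name)
    # next() returns None instead of raising StopIteration when nothing is found
    author = next(search_query, None)
    # Skip names with no match; pprint is not assumed to accept None
    if author is not None:
        scholarly.pprint(author)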
More information about StopIteration by Martijn Pieters.
Full output:
{'affiliation': 'Professor of Vision Science, UC Berkeley',
'citedby': 22559,
'email_domain': '@berkeley.edu',
'filled': False,
'interests': ['vision science', 'psychology', 'human factors', 'neuroscience'],
'name': 'Martin Banks',
'scholar_id': 'Smr99uEAAAAJ',
'source': 'SEARCH_AUTHOR_SNIPPETS',
'url_picture': 'https://scholar.google.com/citations?view_op=medium_photo&user=Smr99uEAAAAJ'}
{'affiliation': 'Blizzard Entertainment',
'citedby': 3050,
'email_domain': '@AdamLobel.com',
'filled': False,
'interests': ['Gaming', 'Emotion regulation'],
'name': 'Adam Lobel',
'scholar_id': '_xwYD2sAAAAJ',
'source': 'SEARCH_AUTHOR_SNIPPETS',
'url_picture': 'https://scholar.google.com/citations?view_op=medium_photo&user=_xwYD2sAAAAJ'}
{'affiliation': '',
'citedby': 873,
'email_domain': '',
'filled': False,
'interests': ['Daniel Blizzard'],
'name': 'Daniel Blizzard',
'scholar_id': 'dk4LWEgAAAAJ',
'source': 'SEARCH_AUTHOR_SNIPPETS',
'url_picture': 'https://scholar.google.com/citations?view_op=medium_photo&user=dk4LWEgAAAAJ'}
{'affiliation': 'Senior Data Scientist, Blizzard Entertainment',
'citedby': 656,
'email_domain': '@cs.cornell.edu',
'filled': False,
'interests': ['Machine Learning', 'Data Mining', 'Artificial Intelligence'],
'name': 'Shuo Chen',
'scholar_id': 'OBf4YnkAAAAJ',
'source': 'SEARCH_AUTHOR_SNIPPETS',
'url_picture': 'https://scholar.google.com/citations?view_op=medium_photo&user=OBf4YnkAAAAJ'}
{'affiliation': 'Blizzard Entertainment',
'citedby': 620,
'email_domain': '@usask.ca',
'filled': False,
'interests': ['Human-computer interaction',
'User Experience',
'Player Experience',
'User Research',
'Games'],
'name': 'Ian Livingston',
'scholar_id': 'xBHVqNIAAAAJ',
'source': 'SEARCH_AUTHOR_SNIPPETS',
'url_picture': 'https://scholar.google.com/citations?view_op=medium_photo&user=xBHVqNIAAAAJ'}
{'affiliation': 'Blizzard Entertainment',
'citedby': 502,
'email_domain': '@blizzard.com',
'filled': False,
'interests': ['Game', 'Machine Learning', 'Data Science', 'Bioinformatics'],
'name': 'Minli Xu',
'scholar_id': 'QST5iogAAAAJ',
'source': 'SEARCH_AUTHOR_SNIPPETS',
'url_picture': 'https://scholar.google.com/citations?view_op=medium_photo&user=QST5iogAAAAJ'}
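The effect of the default argument can be checked without contacting Google Scholar. The following is a minimal sketch, assuming a hypothetical stand-in generator fake_search_author (it is not part of scholarly) that yields nothing for unknown names:

```python
def fake_search_author(name):
    """Hypothetical stand-in for a search generator; not scholarly's API."""
    known = {
        "Lynn Batten": {"name": "Lynn Batten", "affiliation": "Deakin University"},
    }
    if name in known:
        yield known[name]


def first_or_none(results):
    """Return the first item of an iterator, or None if it is already exhausted."""
    return next(results, None)


found = first_or_none(fake_search_author("Lynn Batten"))
missing = first_or_none(fake_search_author("No Such Person"))

# Without a default, calling next() on an empty generator raises StopIteration.
try:
    next(fake_search_author("No Such Person"))
    raised = False
except StopIteration:
    raised = True

print(found)
print(missing)
print(raised)
```

With a real search, the same None check would tell you which names in the list had no matching profile.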
Alternatively, you can iterate over the scholarly.search_author() results with a for loop, which simply runs zero times when nothing is found:
from scholarly import scholarly
import json

professor_list = ["Marty Banks, Berkeley",
                  "Adam Lobel, Blizzard",
                  "Daniel Blizzard, Blizzard",
                  "Shuo Chen, Blizzard",
                  "Ian Livingston, Blizzard",
                  "Minli Xu, Blizzard"]

professor_results = []

for professor_name in professor_list:
    for professor_result in scholarly.search_author(name=professor_name):
        professor_results.append({
            "name": professor_result.get("name"),
            "affiliations": professor_result.get("affiliation"),
            "email_domain": professor_result.get("email_domain"),
            "interests": professor_result.get("interests"),
            "citedby": professor_result.get("citedby")
        })

print(json.dumps(professor_results, indent=2, ensure_ascii=False))
Full output:
[
{
"name": "Martin Banks",
"affiliations": "Professor of Vision Science, UC Berkeley",
"email_domain": "@berkeley.edu",
"interests": [
"vision science",
"psychology",
"human factors",
"neuroscience"
],
"citedby": 22559
},
{
"name": "Adam Lobel",
"affiliations": "Blizzard Entertainment",
"email_domain": "@AdamLobel.com",
"interests": [
"Gaming",
"Emotion regulation"
],
"citedby": 3050
},
{
"name": "Daniel Blizzard",
"affiliations": "",
"email_domain": "",
"interests": [
"Daniel Blizzard"
],
"citedby": 873
},
{
"name": "Shuo Chen",
"affiliations": "Senior Data Scientist, Blizzard Entertainment",
"email_domain": "@cs.cornell.edu",
"interests": [
"Machine Learning",
"Data Mining",
"Artificial Intelligence"
],
"citedby": 656
},
{
"name": "Ian Livingston",
"affiliations": "Blizzard Entertainment",
"email_domain": "@usask.ca",
"interests": [
"Human-computer interaction",
"User Experience",
"Player Experience",
"User Research",
"Games"
],
"citedby": 620
},
{
"name": "Minli Xu",
"affiliations": "Blizzard Entertainment",
"email_domain": "@blizzard.com",
"interests": [
"Game",
"Machine Learning",
"Data Science",
"Bioinformatics"
],
"citedby": 502
}
]
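The nested loop collects every profile a search yields. If only the first match per name is wanted, the inner loop can be capped with itertools.islice. This is a sketch under the assumption that the search behaves like an ordinary Python generator; fake_search_author and collect are hypothetical names used for illustration, not scholarly's API:

```python
from itertools import islice


def fake_search_author(name):
    """Hypothetical stand-in for a search that yields zero or more profiles."""
    data = {
        "A": [{"name": "A1"}, {"name": "A2"}],
        "B": [],
    }
    yield from data.get(name, [])


def collect(names, limit=1):
    """Gather at most `limit` results per name; names with no results are skipped."""
    rows = []
    for name in names:
        # A for loop over an empty generator runs zero times,
        # so no StopIteration escapes to the caller.
        for result in islice(fake_search_author(name), limit):
            rows.append({"query": name, "name": result.get("name")})
    return rows


rows = collect(["A", "B", "C"])
print(rows)
```

Recording the query next to each result, as the "query" key does here, makes it easier to see afterwards which names returned nothing.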
Another alternative is to use the Google Scholar Profiles API from SerpApi. It is a paid API with a free plan that handles scaling and bypasses blocks from search engines via dedicated proxies and CAPTCHA-solving services. Check out the playground.
Example code to integrate:
from serpapi import GoogleScholarSearch
import json

professor_list = ["Marty Banks, Berkeley",
                  "Adam Lobel, Blizzard",
                  "Daniel Blizzard, Blizzard",
                  "Shuo Chen, Blizzard",
                  "Ian Livingston, Blizzard",
                  "Minli Xu, Blizzard"]

for professor_name in professor_list:
    params = {
        "api_key": "Your SerpApi API key",
        "engine": "google_scholar_profiles",
        "hl": "en",
        "mauthors": professor_name
    }

    search = GoogleScholarSearch(params)
    results = search.get_dict()

    for result in results["profiles"]:
        print(json.dumps(result, indent=2))
Full output:
{
"name": "Martin Banks",
"link": "https://scholar.google.com/citations?hl=en&user=Smr99uEAAAAJ",
"serpapi_link": "https://serpapi.com/search.json?author_id=Smr99uEAAAAJ&engine=google_scholar_author&hl=en",
"author_id": "Smr99uEAAAAJ",
"affiliations": "Professor of Vision Science, UC Berkeley",
"email": "Verified email at berkeley.edu",
"cited_by": 22559,
"interests": [
{
"title": "vision science",
"serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Avision_science",
"link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:vision_science"
},
{
"title": "psychology",
"serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Apsychology",
"link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:psychology"
},
{
"title": "human factors",
"serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Ahuman_factors",
"link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:human_factors"
},
{
"title": "neuroscience",
"serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Aneuroscience",
"link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:neuroscience"
}
],
"thumbnail": "https://scholar.google.com/citations/images/avatar_scholar_56.png"
}
{
"name": "Adam Lobel",
"link": "https://scholar.google.com/citations?hl=en&user=_xwYD2sAAAAJ",
"serpapi_link": "https://serpapi.com/search.json?author_id=_xwYD2sAAAAJ&engine=google_scholar_author&hl=en",
"author_id": "_xwYD2sAAAAJ",
"affiliations": "Blizzard Entertainment",
"email": "Verified email at AdamLobel.com",
"cited_by": 3050,
"interests": [
{
"title": "Gaming",
"serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Agaming",
"link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:gaming"
},
{
"title": "Emotion regulation",
"serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Aemotion_regulation",
"link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:emotion_regulation"
}
],
"thumbnail": "https://scholar.googleusercontent.com/citations?view_op=small_photo&user=_xwYD2sAAAAJ&citpid=3"
}
{
"name": "Daniel Blizzard",
"link": "https://scholar.google.com/citations?hl=en&user=dk4LWEgAAAAJ",
"serpapi_link": "https://serpapi.com/search.json?author_id=dk4LWEgAAAAJ&engine=google_scholar_author&hl=en",
"author_id": "dk4LWEgAAAAJ",
"affiliations": "",
"cited_by": 873,
"thumbnail": "https://scholar.google.com/citations/images/avatar_scholar_56.png"
}
{
"name": "Shuo Chen",
"link": "https://scholar.google.com/citations?hl=en&user=OBf4YnkAAAAJ",
"serpapi_link": "https://serpapi.com/search.json?author_id=OBf4YnkAAAAJ&engine=google_scholar_author&hl=en",
"author_id": "OBf4YnkAAAAJ",
"affiliations": "Senior Data Scientist, Blizzard Entertainment",
"email": "Verified email at cs.cornell.edu",
"cited_by": 656,
"interests": [
{
"title": "Machine Learning",
"serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Amachine_learning",
"link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:machine_learning"
},
{
"title": "Data Mining",
"serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Adata_mining",
"link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:data_mining"
},
{
"title": "Artificial Intelligence",
"serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Aartificial_intelligence",
"link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:artificial_intelligence"
}
],
"thumbnail": "https://scholar.googleusercontent.com/citations?view_op=small_photo&user=OBf4YnkAAAAJ&citpid=1"
}
{
"name": "Ian Livingston",
"link": "https://scholar.google.com/citations?hl=en&user=xBHVqNIAAAAJ",
"serpapi_link": "https://serpapi.com/search.json?author_id=xBHVqNIAAAAJ&engine=google_scholar_author&hl=en",
"author_id": "xBHVqNIAAAAJ",
"affiliations": "Blizzard Entertainment",
"email": "Verified email at usask.ca",
"cited_by": 620,
"interests": [
{
"title": "Human-computer interaction",
"serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Ahuman_computer_interaction",
"link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:human_computer_interaction"
},
{
"title": "User Experience",
"serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Auser_experience",
"link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:user_experience"
},
{
"title": "Player Experience",
"serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Aplayer_experience",
"link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:player_experience"
},
{
"title": "User Research",
"serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Auser_research",
"link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:user_research"
},
{
"title": "Games",
"serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Agames",
"link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:games"
}
],
"thumbnail": "https://scholar.google.com/citations/images/avatar_scholar_56.png"
}
{
"name": "Minli Xu",
"link": "https://scholar.google.com/citations?hl=en&user=QST5iogAAAAJ",
"serpapi_link": "https://serpapi.com/search.json?author_id=QST5iogAAAAJ&engine=google_scholar_author&hl=en",
"author_id": "QST5iogAAAAJ",
"affiliations": "Blizzard Entertainment",
"email": "Verified email at blizzard.com",
"cited_by": 502,
"interests": [
{
"title": "Game",
"serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Agame",
"link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:game"
},
{
"title": "Machine Learning",
"serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Amachine_learning",
"link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:machine_learning"
},
{
"title": "Data Science",
"serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Adata_science",
"link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:data_science"
},
{
"title": "Bioinformatics",
"serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Abioinformatics",
"link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:bioinformatics"
}
],
"thumbnail": "https://scholar.googleusercontent.com/citations?view_op=small_photo&user=QST5iogAAAAJ&citpid=14"
}
Disclaimer: I work for SerpApi.
Solution 2:[2]
When one uses the following code:
search_query = scholarly.search_pubs('A Bayesian Analysis of the Style Goods Inventory Problem')
scholarly.pprint(next(search_query))
Is there a way of saving the output of the above code as a dataframe?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Dmitriy Zub |
Solution 2 | Addy |