Searching in Google with Python
I want to search for text in Google using a Python script and return the name, description, and URL of each result. I'm currently using this code:
from google import search
ip = raw_input("What would you like to search for? ")
for url in search(ip, stop=20):
    print(url)
This returns only the URLs. How can I return the name and description for each URL?
Solution 1:[1]
Not exactly what I was looking for, but I found myself a nice solution for now (I might edit this if I'm able to improve it). I combined the Google search I was already doing (which returns only URLs) with the Beautiful Soup package for parsing the HTML pages:
from googlesearch import search
import urllib.request
from bs4 import BeautifulSoup

def google_scrape(url):
    # Fetch the page and return its <title> text
    thepage = urllib.request.urlopen(url)
    soup = BeautifulSoup(thepage, "html.parser")
    return soup.title.text

i = 1
query = 'search this'
for url in search(query, stop=10):
    title = google_scrape(url)
    print(str(i) + ". " + title)
    print(url)
    print(" ")
    i += 1
This gives me a list of page titles along with their links.
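If you also need the description, the same approach can be extended to read the page's meta description tag (assuming the page defines one); a rough variant of google_scrape:

def google_scrape_with_description(url):
    # Hypothetical variant: returns (title, meta description) for a page,
    # falling back to empty strings when a tag is missing.
    thepage = urllib.request.urlopen(url)
    soup = BeautifulSoup(thepage, "html.parser")
    title = soup.title.text if soup.title else ""
    meta = soup.find("meta", attrs={"name": "description"})
    description = meta["content"] if meta and meta.has_attr("content") else ""
    return title, description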
And another great solution:
from googlesearch import search
import requests

# 'ip' is the search query string, as in the question
for url in search(ip, stop=10):
    r = requests.get(url)
    # everything_between is a user-supplied helper that extracts the text
    # between two markers (a sketch is given below)
    title = everything_between(r.text, '<title>', '</title>')
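everything_between isn't a standard function; assuming it simply returns the text between two markers, a minimal version could look like this:

def everything_between(text, begin, end):
    # Return the substring between the first `begin` marker and the
    # following `end` marker, or "" if either marker is missing.
    start = text.find(begin)
    if start == -1:
        return ""
    start += len(begin)
    stop = text.find(end, start)
    if stop == -1:
        return ""
    return text[start:stop]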
Solution 2:[2]
I assume you are using this library by Mario Vilas, because of the stop=20 argument that appears in his code. That library is only able to return URLs, which makes it quite limited; as such, what you want to do is not possible with the library you are currently using.
I would suggest you instead use abenassi/Google-Search-API. Then you can simply do:
from google import google

num_page = 3
search_results = google.search("This is my query", num_page)
for result in search_results:
    print(result.description)
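If you also want the name and URL of each result, the result objects from that package expose those as attributes as well (name and link, judging by the project's README); something like:

for result in search_results:
    print(result.name)         # page title
    print(result.link)         # URL
    print(result.description)  # snippet/description
    print()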
Solution 3:[3]
Most of the options I tried either didn't work for me or gave errors such as "search module not found", despite the packages being installed and imported. Selenium WebDriver did work, and it works well with Firefox, Chrome, or PhantomJS, but I still found it a bit slow in terms of execution time, since it drives a browser first and only then returns the search results.
So I tried the Google API instead, and it is impressively quick and returns results accurately.
Before I share the code, here are a few quick steps to follow:
- Register for the Google API to get a Google API key (free tier).
- Then search for Google Custom Search, set up your free account, and get a custom search engine ID.
- Add the google-api-python-client package to your Python project (for example, by running pip install google-api-python-client).
That's it. All you have to do now is run this code:
from googleapiclient.discovery import build

my_api_key = "YOUR API KEY HERE"
my_cse_id = "YOUR CUSTOM SEARCH ENGINE ID HERE"

def google_search(search_term, api_key, cse_id, **kwargs):
    # Build the Custom Search service and run the query
    service = build("customsearch", "v1", developerKey=api_key)
    res = service.cse().list(q=search_term, cx=cse_id, **kwargs).execute()
    return res['items']

results = google_search("YOUR SEARCH QUERY HERE", my_api_key, my_cse_id, num=10)
for result in results:
    print(result["link"])
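Each returned item is a dictionary, and the Custom Search JSON API also includes title and snippet fields, which map onto the name and description the question asks for; the loop can therefore be extended like this:

for result in results:
    print(result["title"])    # name
    print(result["snippet"])  # description
    print(result["link"])     # URL
    print()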
Solution 4:[4]
You can also use a third-party service like SerpApi, which is a Google search results API. It takes care of renting proxies and parsing the HTML results, and the JSON output is particularly rich.
It's easy to integrate with Python:
from serpapi import GoogleSearch

params = {
    "q": "Coffee",
    "location": "Austin, Texas, United States",
    "hl": "en",
    "gl": "us",
    "google_domain": "google.com",
    "api_key": "demo",
}

query = GoogleSearch(params)
dictionary_results = query.get_dict()
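To pull out the name, description, and URL of each result, you can read the organic_results list from that dictionary (field names as documented by SerpApi):

for result in dictionary_results.get("organic_results", []):
    print(result.get("title"))    # name
    print(result.get("snippet"))  # description
    print(result.get("link"))     # URL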
GitHub: https://github.com/serpapi/google-search-results-python
Solution 5:[5]
Usually, you cannot use the Google search function by importing the google package in Python 3, but you can use it in Python 2.
Even if you use requests.get(url + query), the scraping won't work, because Google prevents scraping by redirecting the request to a captcha page.
Possible ways:
- You can write the code in Python 2.
- If you want to write it in Python 3, make two files and retrieve the search results from a Python 2 script (a rough sketch follows this list).
- If that is difficult, the easiest way is to use Google Colab or a Jupyter Notebook with a Python 3 runtime; you won't get any error.
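A rough sketch of the two-file idea, assuming a hypothetical Python 2 script named search_py2.py that prints one URL per line and is called from Python 3 via subprocess:

import subprocess

# Call the hypothetical Python 2 script and collect its output (one URL per line)
output = subprocess.check_output(["python2", "search_py2.py", "my search query"])
for url in output.decode("utf-8").splitlines():
    print(url)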
Solution 6:[6]
You can use the Google Search Origin package, which integrates most of the parameters available on Google (including dorks and filters). It is based on the official Google documentation. Moreover, using it creates an object, so it is easily modifiable. For more information, look at the project here: https://pypi.org/project/google-search-origin/
Here is an example of how to use it:
import google_search_origin

if __name__ == '__main__':
    # Initialisation of the class
    google_search = google_search_origin.GoogleSearchOrigin(search='sun')

    # Request from the url assembled
    google_search.request_url()

    # Display the links parsed from the result
    print(google_search.get_all_links())

    # Modify the search parameter
    google_search.parameter_search('dog')

    # Assemble the url
    google_search.assemble_url()

    # Request from the url assembled
    google_search.request_url()

    # Display the raw text of the result
    print(google_search.get_response_text())
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Mooncrater |
| Solution 2 | |
| Solution 3 | Piyush Rumao |
| Solution 4 | Illia Zub |
| Solution 5 | DisappointedByUnaccountableMod |
| Solution 6 | |