python - How to scrape more than 100 Google pages in one pass


I am using the requests library in Python to get data from Google results. https://www.google.com.pk/#q=pizza&num=10 returns the first 10 results, as specified by num=10. https://www.google.com.pk/#q=pizza&num=100 returns 100 results.

But if I write a number greater than 100, say https://www.google.com.pk/#q=pizza&num=200, Google still returns only the first 100 results.

How can I get more than 100 in one pass?

Code:

    import requests

    url = 'http://www.google.com/search'
    my_headers = { 'user-agent' : 'mozilla/11.0' }
    payload = { 'q' : 'pizza', 'start' : '0', 'num' : 200 }
    r = requests.get( url, params = payload, headers = my_headers )

In "r" I am getting the URLs of Google's first 100 results, not 200.

You can use a more programmatic API to get Google results rather than trying to screen-scrape the human search interface. Note that there's no error checking here, nor any assertion that this complies with Google's T&Cs, so I suggest you look into the details of using this URL:

    import requests

    def search(query, pages=4, rsz=8):
        url = 'https://ajax.googleapis.com/ajax/services/search/web'
        params = {
            'v': 1.0,     # version
            'q': query,   # query string
            'rsz': rsz,   # result set size - max 8
        }

        # request one page per start offset and yield each result in turn
        for s in range(0, pages*rsz+1, rsz):
            params['start'] = s
            r = requests.get(url, params=params)
            for result in r.json()['responseData']['results']:
                yield result

E.g. getting 200 results for 'google':

    >>> list(search('google', pages=24, rsz=8))
    [{'GsearchResultClass': 'GwebSearch',
      'cacheUrl': 'http://www.google.com/search?q=cache:y14fcuqogl4j:www.google.com',
      'content': 'Search the world&#39;s information, including webpages, images, videos and more. \n<b>Google</b> has many special features to help you find exactly what you&#39;re looking\xa0...',
      'title': '<b>Google</b>',
      'titleNoFormatting': 'Google',
      'unescapedUrl': 'https://www.google.com/',
      'url': 'https://www.google.com/',
      'visibleUrl': 'www.google.com'},
      ...
    ]
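The page arithmetic behind that call can be checked without touching the network. Here is a minimal sketch, assuming a hypothetical helper `page_starts` (my own name, not part of the code above) that lists the `start` offsets needed to cover a given number of results:

```python
import math

def page_starts(total, page_size=8):
    """Return the 0-based start offsets needed to cover `total` results."""
    n_requests = math.ceil(total / page_size)
    return [i * page_size for i in range(n_requests)]

offsets = page_starts(200, 8)
# 200 results at 8 per request need 25 requests, starting at 0, 8, ..., 192
print(len(offsets), offsets[0], offsets[-1])
```

Note that `search()` iterates over `range(0, pages*rsz+1, rsz)`, which makes `pages + 1` requests, so `pages=24` produces these same 25 offsets.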

To use Google's Custom Search API you need to sign up as a developer. You get 100 free queries a day (I'm not sure whether that counts API calls, or whether paginating the same query counts as 1 query).

Then you can use requests to make the query:

    import requests

    url = 'https://www.googleapis.com/customsearch/v1'
    params = {
        'key': '<key>',
        'cx': '<cse reference>',
        'q': '<search>',
        'num': 10,
        'start': 1
    }

    resp = requests.get(url, params=params)
    results = resp.json()['items']

With start you can do similar pagination to the above.
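As a rough sketch of that pagination, the generator below walks `start` forward in steps of up to 10. The names `fetch_page` and `paged_search` are hypothetical, and the `fetch` argument exists only so the paging logic can be tried with a stand-in instead of a live request. As far as I know, the JSON API also refuses to go past roughly 100 results for a single query, so this does not lift that ceiling.

```python
API_URL = 'https://www.googleapis.com/customsearch/v1'

def fetch_page(params):
    """Perform one Custom Search request and return the decoded JSON."""
    import requests  # imported lazily so the paging logic runs without it
    resp = requests.get(API_URL, params=params, timeout=10)
    resp.raise_for_status()
    return resp.json()

def paged_search(key, cx, query, total=30, page_size=10, fetch=fetch_page):
    """Yield up to `total` result items, requesting one page at a time."""
    start = 1            # Custom Search uses 1-based start indices
    remaining = total
    while remaining > 0:
        params = {
            'key': key,
            'cx': cx,
            'q': query,
            'num': min(page_size, remaining),
            'start': start,
        }
        items = fetch(params).get('items', [])
        if not items:    # no more results for this query
            break
        for item in items:
            yield item
        start += len(items)
        remaining -= len(items)
```

Usage would look like `list(paged_search('<key>', '<cse reference>', 'pizza', total=30))`, which issues three requests with `start` set to 1, 11 and 21.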

There are lots of other parameters available; you can look at the REST documentation for CSE: https://developers.google.com/custom-search/json-api/v1/reference/cse/list#request

Google also has a client API library, installed with pip install google-api-python-client, which you can use:

    from googleapiclient import discovery

    service = discovery.build('customsearch', 'v1', developerKey='<key>')
    params = {
        'q': '<query>',
        'cx': '<cse reference>',
        'num': 10,
        'start': 1
    }
    query = service.cse().list(**params)
    results = query.execute()['items']
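Both Custom Search snippets index `['items']` directly, which raises a `KeyError` if the response carries no `items` key; as far as I know that is what comes back when a query matches nothing. A small sketch of a more tolerant reader, using a hypothetical helper `extract_links` and a made-up sample response:

```python
def extract_links(response):
    """Return (title, link) pairs from one decoded Custom Search response."""
    return [(item.get('title'), item.get('link'))
            for item in response.get('items', [])]

# Made-up response for illustration only
sample = {'items': [{'title': 'Pizza', 'link': 'https://example.com/pizza'}]}
print(extract_links(sample))
print(extract_links({}))   # a response without 'items' gives an empty list
```

The same helper works on the dict returned by `resp.json()` and on the one returned by `query.execute()`.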
