Scraping JavaScript text from a web page using Python
I am trying to pull the latitude and longitude of various restaurants from the TripAdvisor website. I am looking through the HTML of a restaurant in Hong Kong.
The restaurant I am attempting to scrape from
In the HTML I found this:
(image: HTML code showing the latitude and longitude)
I want to scrape the latitude and longitude from here, but I can't seem to get them out when I attempt to print them. Below is my code; any suggestions would be helpful.
#import libraries
import requests
from bs4 import BeautifulSoup
import csv

#loop to move to the next pages. entries are in increments of 30 per page
for i in range(0, 1, 30):  #need to change this here when you want more than 30
    offset = str(i)
    #url format offsets the restaurants in increments of 30 after the oa
    url1 = 'https://www.tripadvisor.com/restaurants-g294217-oa' + offset + '-hong_kong.html#eatery_list_contents'
    r1 = requests.get(url1)
    data1 = r1.text
    soup1 = BeautifulSoup(data1, "html.parser")
    for link in soup1.findAll('a', {'class': 'property_title'}):
        #print('https://www.tripadvisor.com/restaurant_review-g294217-' + link.get('href'))
        restaurant_url = 'https://www.tripadvisor.com/restaurant_review-g294217-' + link.get('href')
        #print(restaurant_url)
        r2 = requests.get(restaurant_url)
        data2 = r2.text
        soup2 = BeautifulSoup(data2, "html.parser")
        #attempt to find the script containing 'lat'; this is the part that prints nothing
        for script in soup2.findAll('script', {'type', 'text/javascript', 'lat'}):
            print(script.string)
To scrape JavaScript-powered pages, you need to use Selenium.
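If the coordinates already appear in the raw HTML (the question says they were found in the page source), a regular expression over the script text may be enough before reaching for a browser. The following is only a minimal sketch under stated assumptions: the sample markup, the `lat`/`lng` key names, and the helper name `extract_lat_lng` are all hypothetical, and the real TripAdvisor page may be laid out differently.

```python
import re

# Hypothetical sample of what the page source might contain; the real
# markup and key names on the site may differ.
SAMPLE_HTML = """
<html><body>
<script type="text/javascript">
var mapData = { lat: 22.28552, lng: 114.15769, zoom: 15 };
</script>
</body></html>
"""

# Matches forms such as  lat: 22.28   "lat": "22.28"   lng = -114.1
COORD_RE = re.compile(r'''["']?\b(lat|lng)["']?\s*[:=]\s*["']?(-?\d+(?:\.\d+)?)''')


def extract_lat_lng(html):
    """Return (lat, lng) as floats, or None if either value is missing."""
    found = {}
    for key, value in COORD_RE.findall(html):
        # Keep the first occurrence of each key.
        found.setdefault(key, float(value))
    if 'lat' in found and 'lng' in found:
        return found['lat'], found['lng']
    return None


print(extract_lat_lng(SAMPLE_HTML))
```

The same function could be given `r2.text` from the question's code, or, if the values turn out to be filled in only after JavaScript runs, the `page_source` string of a Selenium WebDriver after it has loaded the page.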