Python - Scrape a webpage and save it as CSV with BS4 and Pandas
I normally use R when I scrape a webpage and save the result in a CSV file. It is easy in R, but my R code doesn't work on the Raspberry Pi because of incompatibilities between Raspbian and one of the packages, so I decided to try it in Python.
What I want to do is simple: scrape the title, link, and image link from a local news webpage and save them to a CSV file.
Everything is OK when I try the code in a Jupyter notebook (on Windows): the CSV file has a nice data frame with 12 rows. But when I run the same code on the Raspberry Pi, the CSV file contains only 1 row.
This is my first Python code, apart from many "hello world"s. I know it is not perfect, but I am stuck on why it doesn't run properly on the Raspberry Pi.
Thanks for your help.
    # coding: utf-8
    from bs4 import BeautifulSoup
    import urllib

    # urllib.urlopen exists only in Python 2
    r = urllib.urlopen('http://krakow.tvp.pl/554275/aktualnosci').read()
    soup = BeautifulSoup(r, 'html.parser')

    aktualnosci = soup.find_all("div", class_={"recommended", "item hidden", "image border-radius-5", "meta cf", "title"})
    tytuly = soup.find_all("li", class_="border-radius-5")
    prefix = "http://krakow.tvp.pl"

    link_aktualnosci = []
    link_grafika_aktualnosci = []
    link_tytul_aktualnosci = []
    #course = []
    temp = []
    courses_list = []

    for item in aktualnosci:
        temp1 = item.a['href']  # get the article link
        link_aktualnosci.append(temp1.encode('utf-8'))
        temp2 = item.img.get('src')  # get the image link
        link_grafika_aktualnosci.append(temp2.encode('utf-8'))
        temp3 = item.find('span', class_="title").text.strip().encode('utf-8')  # get the title text
        link_tytul_aktualnosci.append(temp3)
        temp = [temp1, temp2, temp3]
        courses_list.append(temp)

    import pandas as pd
    df = pd.DataFrame(courses_list)
    df.to_csv('aktualnosci.csv')
I can't test this right now; I don't have Python installed. But... does this work for you?
    import pandas as pd

    df = pd.read_html('http://krakow.tvp.pl/554275/aktualnosci', header=0, index_col=0)[2]
    print(df)
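In case the two machines have different versions of the third-party packages installed, one way to take them out of the picture is to do the same job with only the Python 3 standard library. This is a rough sketch, not tested against the live page: the sample markup, the `NewsParser` class, and the `rows_to_csv` helper are all made up for illustration, and it assumes the links on the page are relative and need the `http://krakow.tvp.pl` prefix. It swaps BS4 and Pandas for `html.parser` and `csv`, so treat it as a cross-check rather than a fix.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample markup; the real page's structure may differ.
SAMPLE_HTML = """
<ul>
  <li class="border-radius-5">
    <a href="/1/first"><img src="/img/1.jpg"><span class="title"> First story </span></a>
  </li>
  <li class="border-radius-5">
    <a href="/2/second"><img src="/img/2.jpg"><span class="title">Second story</span></a>
  </li>
</ul>
"""

PREFIX = "http://krakow.tvp.pl"  # assumption: hrefs and image paths are relative


class NewsParser(HTMLParser):
    """Collect one [link, image, title] row per <a> that holds a title span."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._current = None    # row being built while inside an <a>
        self._in_title = False  # True while inside <span class="title">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self._current = {"link": PREFIX + attrs["href"], "image": "", "title": ""}
        elif self._current is not None:
            if tag == "img":
                self._current["image"] = PREFIX + (attrs.get("src") or "")
            elif tag == "span" and "title" in (attrs.get("class") or "").split():
                self._in_title = True

    def handle_data(self, data):
        if self._in_title and self._current is not None:
            self._current["title"] += data

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_title = False
        elif tag == "a" and self._current is not None:
            row, self._current = self._current, None
            title = row["title"].strip()
            if title:  # skip links that carry no title text
                self.rows.append([row["link"], row["image"], title])


def rows_to_csv(rows):
    """Return the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["link", "image", "title"])
    writer.writerows(rows)
    return buf.getvalue()


parser = NewsParser()
parser.feed(SAMPLE_HTML)
rows = parser.rows
csv_text = rows_to_csv(rows)
print(len(rows), "rows")
```

To try it on the real page, you would feed the downloaded HTML (decoded to `str`) to `parser.feed(...)` instead of `SAMPLE_HTML` and write `csv_text` to a file. If the row count then matches on both machines, the difference is more likely in the installed packages than in the page itself.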