Python - Scrape a webpage and save it as CSV with BS4 and Pandas


I normally use R when I scrape a webpage and save the result in a CSV file. It's easy in R, but my R code doesn't work on the Raspberry Pi because of incompatibilities between Raspbian and one of the packages, so I decided to try doing the work in Python.

What I want to do is easy: scrape the title, link, and image link from a local news webpage and save them to a CSV file.

Everything is OK when I try the code in a Jupyter notebook (on Windows): the CSV file has a nice data frame with 12 rows. But when I try the same code on the Raspberry Pi, the CSV file contains only 1 row.
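When the same script gives different results on two machines, one thing worth ruling out is a difference in the interpreter or library versions. The sketch below is only a diagnostic aid, not a fix: `describe_environment` is a hypothetical helper name, and differing versions are just one possible cause of the missing rows. Run it on both the Windows machine and the Raspberry Pi and compare the output.

```python
import importlib
import platform
import sys


def describe_environment(modules=("bs4", "pandas")):
    """Collect interpreter and library versions for a side-by-side comparison."""
    info = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "default_encoding": sys.getdefaultencoding(),
    }
    for name in modules:
        try:
            module = importlib.import_module(name)
            # Not every package exposes __version__, so fall back to "unknown".
            info[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            info[name] = "not installed"
    return info


for key, value in describe_environment().items():
    print(key, "=", value)
```

If the Python major version or the bs4 version differs between the two machines, that is a reasonable place to start looking.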

This is my first Python code, apart from many "hello world" programs, so I know it's not perfect, but I'm stuck on why it doesn't run properly on the Raspberry Pi.

Thanks for any help.

    # coding: utf-8
    # Python 2 code: urllib.urlopen does not exist in Python 3.
    from bs4 import BeautifulSoup
    import urllib

    r = urllib.urlopen('http://krakow.tvp.pl/554275/aktualnosci').read()
    soup = BeautifulSoup(r, 'html.parser')
    aktualnosci = soup.find_all("div", class_={"recommended", "item hidden", "image border-radius-5", "meta cf", "title"})
    tytuly = soup.find_all("li", class_="border-radius-5")

    prefix = "http://krakow.tvp.pl"
    link_aktualnosci = []
    link_grafika_aktualnosci = []
    link_tytul_aktualnosci = []
    #course = []
    temp = []
    courses_list = []

    for item in aktualnosci:
        temp1 = item.a['href']  # get the article link
        link_aktualnosci.append(temp1.encode('utf-8'))

        temp2 = item.img.get('src')  # get the image link
        link_grafika_aktualnosci.append(temp2.encode('utf-8'))

        temp3 = item.find('span', class_="title").text.strip().encode('utf-8')  # get the title text
        link_tytul_aktualnosci.append(temp3)

        temp = [temp1, temp2, temp3]
        courses_list.append(temp)

    import pandas as pd

    df = pd.DataFrame(courses_list)

    df.to_csv('aktualnosci.csv')
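For comparison, here is a minimal Python 3 sketch of the same idea that skips any item missing a link, image, or title instead of failing on it. Everything in it is an assumption for illustration: `SAMPLE_HTML` is made-up markup standing in for the news page (the real page structure may differ), `extract_rows` and `rows_to_csv` are hypothetical helper names, and the standard-library `csv` module is swapped in for pandas so the sketch only depends on bs4. It does not claim to explain why the Raspberry Pi run produced a single row.

```python
import csv
import io
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup standing in for the news page; the real page may differ.
SAMPLE_HTML = """
<ul>
  <li class="border-radius-5">
    <div class="item">
      <a href="/111/first-story"><img src="/img/1.jpg"></a>
      <span class="title"> First story </span>
    </div>
  </li>
  <li class="border-radius-5">
    <div class="item">
      <a href="/222/second-story"><img src="/img/2.jpg"></a>
      <span class="title">Second story</span>
    </div>
  </li>
  <li class="border-radius-5">
    <div class="item"><span class="title">No link here</span></div>
  </li>
</ul>
"""

PREFIX = "http://krakow.tvp.pl"


def extract_rows(html, prefix=PREFIX):
    """Return [link, image, title] rows, skipping incomplete items."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.find_all("div", class_="item"):
        link = item.find("a", href=True)
        image = item.find("img", src=True)
        title = item.find("span", class_="title")
        if link is None or image is None or title is None:
            continue  # skip items missing any of the three fields
        rows.append([
            urljoin(prefix, link["href"]),  # make relative links absolute
            urljoin(prefix, image["src"]),
            title.get_text(strip=True),
        ])
    return rows


def rows_to_csv(rows):
    """Serialise rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["link", "image", "title"])
    writer.writerows(rows)
    return buffer.getvalue()


rows = extract_rows(SAMPLE_HTML)
csv_text = rows_to_csv(rows)
print(len(rows), "rows")
print(csv_text)
```

Printing the number of rows before writing the file makes it easier to tell whether rows are lost during scraping or during the CSV step.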

I can't test this right now; I don't have Python installed. But... does this work for you?

    import pandas as pd

    df = pd.read_html('http://krakow.tvp.pl/554275/aktualnosci',
                      header=0,
                      index_col=0)[2]
    print(df)
