csv - R: Read multiple files from zip without extracting
For a given dataset, I have around 5 to 20 zip files, each containing potentially hundreds of CSVs. I would like to be able to use fread to read in the CSVs without extracting them from the zip files. I am able to download the zip files, extract them, and process the CSVs, but this takes a large amount of disk space and RAM.
Here is some example data (just grabbed from another question):
    write.csv(data.frame(x = 1:2, y = 1:2), tf1 <- tempfile(fileext = ".csv"))
    write.csv(data.frame(x = 2:3, y = 2:3), tf2 <- tempfile(fileext = ".csv"))
    write.csv(data.frame(x = 3:4, y = 3:4), tf3 <- tempfile(fileext = ".csv"))
    zip(zipfile <- tempfile(fileext = ".zip"), files = c(tf1, tf2))
    zip(zipfile <- tempfile(fileext = ".zip"), files = c(tf1, tf3))
    zip(zipfile <- tempfile(fileext = ".zip"), files = c(tf2, tf3))
My existing method:
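    library(data.table)
    # Extract every zip archive in the working directory to disk
    for (i in dir(pattern = "\\.zip$")) unzip(i)
    # Read all extracted CSVs (pattern is a regular expression, not a glob)
    lapply(list.files(pattern = "\\.csv$"), fread)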
This is what I am trying to do:
    library(rio)
    lapply(list.files(pattern = "\\.zip$"), import, fread = TRUE)
Which gives this output:
    [[1]]
      V1 x y
    1  1 2 2
    2  2 3 3

    [[2]]
      V1 x y
    1  1 1 1
    2  2 2 2

    [[3]]
      V1 x y
    1  1 1 1
    2  2 2 2

    Warning messages:
    1: In parse_zip(file) :
      Zip archive contains multiple files. Attempting first file.
    2: In parse_zip(file) :
      Zip archive contains multiple files. Attempting first file.
    3: In parse_zip(file) :
      Zip archive contains multiple files. Attempting first file.
It appears that only the first CSV is read in each zip file. I have searched quite a bit but haven't yet found a solution to this.
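    library(stringr)

    # First, obtain the contents of the archive:
    list_of_txts <- unzip("your.zip", list = TRUE)[, 1]
    # Keep only the CSV entries (the pattern is a regular expression)
    list_of_txts <- list_of_txts[str_detect(list_of_txts, "\\.csv$")]

    # Then loop over the entries without unzipping to disk.
    # fread() does not read from connections, so read.csv() is used on the unz() connection.
    final_data <- vector("list", length(list_of_txts))
    for (i in seq_along(list_of_txts)) {
      conn <- unz("your.zip", list_of_txts[i])
      final_data[[i]] <- read.csv(conn)
    }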
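The loop above handles a single archive named "your.zip". Since the question involves several zip files, the same idea could be wrapped in a function and applied to each archive. The following is only a sketch in base R: `read_zip_csvs` is a hypothetical helper name, not something from rio or data.table, and it assumes every entry ending in `.csv` is a regular comma-separated file with a header row.

```r
# Hypothetical helper: read every CSV inside one zip archive
# through unz() connections, without extracting anything to disk.
read_zip_csvs <- function(zip_path) {
  # List the archive members and keep only the CSV entries
  members <- unzip(zip_path, list = TRUE)$Name
  members <- members[grepl("\\.csv$", members, ignore.case = TRUE)]

  # read.csv() opens and closes each unz() connection itself
  out <- lapply(members, function(m) read.csv(unz(zip_path, m)))
  names(out) <- members
  out
}

# Apply the helper to every zip archive in the working directory.
# The result is a list (one element per archive) of lists of data frames.
zip_paths <- list.files(pattern = "\\.zip$")
all_data <- lapply(zip_paths, read_zip_csvs)
names(all_data) <- zip_paths
```

Each archive is still read one member at a time, so memory use depends on how many of the resulting data frames are kept around. If fread's speed matters, one possible variation is to read the lines from the connection and pass them through fread's `text` argument, at the cost of holding the raw text in memory as well.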