csv - R: Read multiple files from zip without extracting


For a given dataset, I have around 5 to 20 zip files, each containing potentially hundreds of CSVs. I would like to be able to use fread to read in the CSVs without extracting them from the zip files. I am able to download the zip files, extract them, and process the CSVs, but this takes up a large amount of disk space and RAM.

Here is some example data (just grabbed from another question):

    write.csv(data.frame(x = 1:2, y = 1:2), tf1 <- tempfile(fileext = ".csv"))
    write.csv(data.frame(x = 2:3, y = 2:3), tf2 <- tempfile(fileext = ".csv"))
    write.csv(data.frame(x = 3:4, y = 3:4), tf3 <- tempfile(fileext = ".csv"))
    zip(zipfile <- tempfile(fileext = ".zip"), files = c(tf1, tf2))
    zip(zipfile <- tempfile(fileext = ".zip"), files = c(tf1, tf3))
    zip(zipfile <- tempfile(fileext = ".zip"), files = c(tf2, tf3))

My existing method:

    for (i in dir(pattern = "\\.zip$"))
        unzip(i)
    lapply(list.files(pattern = "*.csv"), fread)

This is what I am trying to do:

    library(rio)
    lapply(list.files(pattern = "*.zip"), import, fread = TRUE)

Which gives this output:

    [[1]]
      V1 x y
    1  1 2 2
    2  2 3 3

    [[2]]
      V1 x y
    1  1 1 1
    2  2 2 2

    [[3]]
      V1 x y
    1  1 1 1
    2  2 2 2

    Warning messages:
    1: In parse_zip(file) :
      Zip archive contains multiple files. Attempting first file.
    2: In parse_zip(file) :
      Zip archive contains multiple files. Attempting first file.
    3: In parse_zip(file) :
      Zip archive contains multiple files. Attempting first file.

It appears that only the first CSV is read in each zip file. I have searched quite a bit, but haven't yet found a solution to this.

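    # First obtain the contents of the archive:
    library(stringr)
    list_of_txts <- unzip("your.zip", list = TRUE)[, 1]
    # Keep only the CSV entries (adjust the pattern for other file types)
    list_of_txts <- list_of_txts[str_detect(list_of_txts, "\\.csv$")]

    # Then loop over them without unzipping:
    final_data <- vector("list", length(list_of_txts))
    for (i in seq_along(list_of_txts)) {
      conn <- unz("your.zip", list_of_txts[i])
      # read.csv() accepts a connection; fread() may reject one, depending on the data.table version
      final_data[[i]] <- read.csv(conn)
    }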
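Since the question involves several zip files rather than one, the same idea can be wrapped in a function and applied to each archive. The following is only a minimal sketch in base R: the helper name `read_zip_csvs` is hypothetical, it assumes the zip files sit in the working directory, and it uses `read.csv()` on a `unz()` connection rather than `fread()`.

```r
# Hypothetical helper (not from the original answer): read every CSV inside
# one zip archive without extracting it to disk.
read_zip_csvs <- function(zip_path) {
  # List the archive members and keep only the CSV entries
  members <- unzip(zip_path, list = TRUE)$Name
  members <- members[grepl("\\.csv$", members, ignore.case = TRUE)]

  # Open each member through a unz() connection and parse it with read.csv()
  tables <- lapply(members, function(member) read.csv(unz(zip_path, member)))
  names(tables) <- members
  tables
}

# Apply the helper to every zip file in the working directory (an assumption)
zip_files <- list.files(pattern = "\\.zip$")
all_tables <- lapply(zip_files, read_zip_csvs)
names(all_tables) <- zip_files
```

The result is a list with one element per zip file, each holding a named list of data frames. If fread itself is required and a command-line `unzip` is installed, streaming a member with `unzip -p` into `fread(cmd = ...)` might be another option, though that depends on the system setup.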
