How to open a file stored in HDFS in PySpark using with open
How do I open a file that is stored in HDFS? My input file is in HDFS. If I open the file as shown below, it cannot be opened and a file-not-found error is raised.
from pyspark import SparkConf, SparkContext

conf = SparkConf()
sc = SparkContext(conf=conf)

def getMovieName():
    movieNames = {}
    with open("/user/sachinkerala6174/inData/movieStat") as f:
        for line in f:
            fields = line.split("|")
            mID = fields[0]
            mName = fields[1]
            movieNames[int(mID)] = mName
    return movieNames

nameDict = sc.broadcast(getMovieName())
My assumption was to use

with open(sc.textFile("/user/sachinkerala6174/inData/movieStat")) as f:

but it didn't work.
To read the text file into an RDD:

rdd_name = sc.textFile("/user/sachinkerala6174/inData/movieStat")

You can then call collect() to get the contents into pure Python (not recommended; only use it on small data), or manipulate the data with Spark's RDD methods (the recommended way).
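For example, the lookup dictionary from the question can be built entirely with RDD methods and then broadcast. A minimal sketch, assuming the same pipe-delimited layout and HDFS path as in the question:

from pyspark import SparkConf, SparkContext

conf = SparkConf()
sc = SparkContext(conf=conf)

# open() only sees the local filesystem; textFile() reads from HDFS
lines = sc.textFile("/user/sachinkerala6174/inData/movieStat")

# Parse "movieID|movieName|..." records into (id, name) pairs, then
# pull the (assumed small) mapping back to the driver as a dict
movieNames = (lines
              .map(lambda line: line.split("|"))
              .map(lambda fields: (int(fields[0]), fields[1]))
              .collectAsMap())

nameDict = sc.broadcast(movieNames)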
For more info: http://spark.apache.org/docs/2.0.1/api/python/pyspark.html
textFile(name, minPartitions=None, use_unicode=True)

Read a text file from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI, and return it as an RDD of Strings.

If use_unicode is False, the strings will be kept as str (encoding as utf-8), which is faster and smaller than unicode. (Added in Spark 1.2)
>>> path = os.path.join(tempdir, "sample-text.txt")
>>> with open(path, "w") as testFile:
...    _ = testFile.write("Hello world!")
>>> textFile = sc.textFile(path)
>>> textFile.collect()
[u'Hello world!']
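Tying this back to the question, the same call works directly on the HDFS path. A small sketch, assuming the file actually exists at that location:

movieFile = sc.textFile("/user/sachinkerala6174/inData/movieStat")
print(movieFile.take(3))  # inspect a few records on the driver without collecting everything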