python - Spark/Hadoop can't find file on AWS EMR
I'm trying to read in a text file on Amazon EMR using the Python Spark libraries. The file is in the home directory (/home/hadoop/wet0), but Spark can't seem to find it.
The line in question:
lines = spark.read.text(sys.argv[1]).rdd.map(lambda r: r[0])
The error:
pyspark.sql.utils.AnalysisException: u'Path does not exist: hdfs://ip-172-31-19-121.us-west-2.compute.internal:8020/user/hadoop/wet0;'
Does the file have to be in a specific directory? I can't find information about this anywhere on the AWS website.
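If it's on the local filesystem, the URL should be file:///home/hadoop/wet0 (the file:// scheme followed by the absolute path, hence three slashes). If it's in HDFS, the path has to be a valid HDFS path. Use the hadoop fs command to take a look,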
e.g.: hadoop fs -ls /home/hadoop
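A minimal sketch of the "pass an explicit file:// URL" advice, assuming the file sits on the local disk of the node running the driver. The helper `as_local_uri` and the app name are hypothetical; the Spark calls are the same `spark.read.text(...)` chain as in the question. Keep in mind that with a file:// path the executors read the file too, so on a multi-node cluster it generally has to exist at the same path on every node (or be copied into HDFS instead).

```python
import sys
from pathlib import PurePosixPath


def as_local_uri(path: str) -> str:
    """Turn an absolute local path into a file:// URI (hypothetical helper).

    Scheme-less paths are resolved against the cluster's default filesystem
    (HDFS on EMR), so the explicit file:// prefix is what points Spark at local disk.
    """
    p = PurePosixPath(path)
    if not p.is_absolute():
        raise ValueError(f"expected an absolute path, got {path!r}")
    return p.as_uri()  # /home/hadoop/wet0 -> file:///home/hadoop/wet0


def main(path: str) -> None:
    # Imported lazily so the helper above is usable without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("read-wet0").getOrCreate()
    lines = spark.read.text(as_local_uri(path)).rdd.map(lambda r: r[0])
    print(lines.take(5))
    spark.stop()


# Only runs when a path is given, e.g. spark-submit read_wet0.py /home/hadoop/wet0
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Rejecting relative paths is a deliberate choice here: a bare name like wet0 is exactly what ends up being looked up in HDFS.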
One thing to look at: you say the file is in "/home/hadoop", but the path in the error is "/user/hadoop". Make sure you aren't using ~ on the command line, since bash expands it before Spark ever sees it. It's best to use the full path, /home/hadoop.
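Why the error mentions /user/hadoop at all: a path with no scheme is qualified against the default filesystem (fs.defaultFS, which is HDFS on EMR), and a relative path is additionally resolved against the HDFS working directory, which defaults to /user/<username>. The function below is a deliberately simplified, hypothetical model of that rule, not Hadoop's actual code; with the host from the error message it reproduces the reported path.

```python
from urllib.parse import urlparse


def qualify(path: str, default_fs: str, user: str) -> str:
    """Simplified model of how a path without a scheme gets qualified.

    - paths that already carry a scheme (file://, hdfs://, s3://) are kept;
    - absolute paths are placed on the default filesystem;
    - relative paths are resolved against /user/<user> on that filesystem.
    """
    if urlparse(path).scheme:
        return path
    base = default_fs.rstrip("/")
    if path.startswith("/"):
        return base + path
    return f"{base}/user/{user}/{path}"


DEFAULT_FS = "hdfs://ip-172-31-19-121.us-west-2.compute.internal:8020"

print(qualify("wet0", DEFAULT_FS, "hadoop"))
# -> hdfs://ip-172-31-19-121.us-west-2.compute.internal:8020/user/hadoop/wet0
print(qualify("/home/hadoop/wet0", DEFAULT_FS, "hadoop"))
# -> hdfs://ip-172-31-19-121.us-west-2.compute.internal:8020/home/hadoop/wet0
print(qualify("file:///home/hadoop/wet0", DEFAULT_FS, "hadoop"))
# -> file:///home/hadoop/wet0
```

The second case is worth noting: even a correctly expanded /home/hadoop/wet0 still points into HDFS under this model; only the explicit file:// URL reaches the local disk.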