python - Spark/Hadoop can't find file on AWS EMR


I'm trying to read in a text file on Amazon EMR using the Python Spark libraries. The file is in the home directory (/home/hadoop/wet0), but Spark can't seem to find it.

The line in question:

lines = spark.read.text(sys.argv[1]).rdd.map(lambda r: r[0]) 

The error:

pyspark.sql.utils.AnalysisException: u'Path does not exist: hdfs://ip-172-31-19-121.us-west-2.compute.internal:8020/user/hadoop/wet0;'

Does the file have to be in a specific directory? I can't find any information about this anywhere on the AWS website.

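If the file is in the local filesystem, the URL should be file:///home/hadoop/wet0 (note the three slashes: an empty host followed by the absolute path). If it is in HDFS, it should be a valid HDFS path. Use the hadoop fs command to take a look,

e.g.: hadoop fs -ls /home/hadoop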

One thing to look at: you say the file is in "/home/hadoop", but the path in the error is "/user/hadoop". Make sure you aren't using ~ on the command line, because bash performs the expansion before Spark sees it. It is best to use the full path, /home/hadoop.
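To make the filesystem explicit before the path reaches spark.read.text, something like the following could be used. This is only a sketch: the helper name local_file_uri is made up for illustration and is not part of Spark or Hadoop, and it assumes a POSIX-style path. It relies on the behaviour visible in the error above, where a path without a scheme was resolved against HDFS under /user/hadoop.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def local_file_uri(path):
    """Return an explicit file:// URI for an absolute local path.

    Hypothetical helper, not a Spark or Hadoop API.
    Paths that already carry a scheme (hdfs://, s3://, file://) are
    returned unchanged. Relative paths are rejected, since a path with
    no scheme was resolved against HDFS in the error shown above.
    """
    if urlparse(path).scheme:
        # Already qualified, e.g. hdfs:///user/hadoop/wet0
        return path
    p = PurePosixPath(path)
    if not p.is_absolute():
        raise ValueError("expected an absolute path, got %r" % path)
    # PurePosixPath.as_uri() yields file:///... for absolute paths
    return p.as_uri()


# Intended use with the line from the question (needs a SparkSession):
# lines = spark.read.text(local_file_uri(sys.argv[1])).rdd.map(lambda r: r[0])
```

With a file:// URI, the file probably has to exist at that path on every node that reads it, so on a multi-node cluster copying it into HDFS first (hadoop fs -put) may be the simpler route.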

