How can I see the underlying Hadoop file system from Spark?


I have started Spark like this:

spark-shell --master local[10] 

I'm trying to see the files on the underlying Hadoop installation.

I want to do something like this:

hdfs ls 

How can I do it?

If I understand your question correctly, you want to execute HDFS commands from the shell. In my opinion, running a Spark job will not help you here.

You need to start your HDFS instance first. Below are the commands from the documentation. Once HDFS is started, you can run the shell commands (see the listing example after the steps below).

To start a Hadoop cluster you will need to start both the HDFS and YARN cluster.

The first time you bring up HDFS, it must be formatted. Format a new distributed filesystem as hdfs:

[hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -format

Start the HDFS NameNode with the following command on the designated node as hdfs:

[hdfs]$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start namenode

Start a HDFS DataNode with the following command on each designated node as hdfs:

[hdfs]$ $HADOOP_PREFIX/sbin/hadoop-daemons.sh --config $HADOOP_CONF_DIR --script hdfs start datanode

If etc/hadoop/slaves and SSH trusted access are configured (see Single Node Setup), all of the HDFS processes can be started with a utility script. As hdfs:

[hdfs]$ $HADOOP_PREFIX/sbin/start-dfs.sh

Start YARN with the following command, run on the designated ResourceManager as yarn:

[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager

Run a script to start a NodeManager on each designated host as yarn:

[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemons.sh --config $HADOOP_CONF_DIR start nodemanager

Start a standalone WebAppProxy server. Run on the WebAppProxy server as yarn. If multiple servers are used with load balancing, it should be run on each of them:

[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start proxyserver

If etc/hadoop/slaves and SSH trusted access are configured (see Single Node Setup), all of the YARN processes can be started with a utility script. As yarn:

[yarn]$ $HADOOP_PREFIX/sbin/start-yarn.sh

Start the MapReduce JobHistory Server with the following command, run on the designated server as mapred:

[mapred]$ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR start historyserver
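
Once the daemons are up, the usual way to list files is the hdfs dfs subcommand rather than hdfs ls. For example (the second path below is just a placeholder, replace it with your own directory):

hdfs dfs -ls /
hdfs dfs -ls /user/<your-user>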

The second option is the programmatic way. You can use the FileSystem class from Hadoop (it is a Java implementation) and perform the HDFS operations.

Below is the link to the Javadoc.

https://hadoop.apache.org/docs/r2.7.3/api/org/apache/hadoop/fs/FileSystem.html
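
For example, from spark-shell you can reach this API through the Hadoop configuration that Spark already carries. A minimal sketch, assuming the default spark-shell session (sc is the SparkContext) and using the root directory / purely as an example path:

import org.apache.hadoop.fs.{FileSystem, Path}

// Reuse the Hadoop configuration of the running Spark context
val fs = FileSystem.get(sc.hadoopConfiguration)

// Print every entry directly under the root directory
fs.listStatus(new Path("/")).foreach(status => println(status.getPath))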

