Hadoop dfsadmin -report command is not working in mapr - hdfs

I need the DFS report for a MapR cluster, but when I run the following command I get an error:
hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
report: FileSystem maprfs:/// is not an HDFS file system
Usage: java DFSAdmin [-report] [-live] [-dead] [-decommissioning]
Is there any way to do this in MapR?
I tried this link as well, but it didn't provide the needed information.

Try the commands below:
maprcli node list
maprcli dashboard info
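If the output of these commands has to be consumed from a script, one option is to ask maprcli for JSON and parse it. The following is only a minimal sketch: it assumes maprcli is on the PATH, that it accepts the -json flag, and that the reply carries a "status" field and its rows under a "data" key. The helper names and the sample payload are hypothetical, so check them against the real output on your cluster.

```python
import json
import subprocess


def maprcli_command(*args):
    """Build a maprcli command line that requests JSON output (assumed -json flag)."""
    return ["maprcli", *args, "-json"]


def parse_rows(raw_json):
    """Return the list of rows from a maprcli JSON reply (assumed layout)."""
    reply = json.loads(raw_json)
    if reply.get("status") != "OK":
        raise RuntimeError("maprcli reported status %r" % reply.get("status"))
    return reply.get("data", [])


def run_maprcli(*args):
    """Run maprcli on a cluster node and return the parsed rows."""
    result = subprocess.run(
        maprcli_command(*args), check=True, capture_output=True, text=True
    )
    return parse_rows(result.stdout)


# Hypothetical usage on a node where maprcli is installed:
#   for node in run_maprcli("node", "list"):
#       print(node)
```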

Related

Getting an error while copying file from aws EC2 instance to hadoop cluster

I am not able to run any hadoop command. Even -ls fails with the same error, and I cannot create a directory using
hadoop fs -mkdir users.
You can create a directory in HDFS using the command:
$ hdfs dfs -mkdir "path"
Then use the command below to copy a local file to HDFS:
$ hdfs dfs -put /root/Hadoop/sample.txt /"your_hdfs_dir_path"
Alternatively, you can use the following command:
$ hdfs dfs -copyFromLocal /root/Hadoop/sample.txt /"your_hdfs_dir_path"
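The two steps above (create the directory, then copy the file) can also be driven from a script. This is a minimal sketch, assuming the hdfs command is on the PATH and a Hadoop version whose -mkdir accepts -p; the function names and the /user/demo target are hypothetical.

```python
import subprocess


def mkdir_command(hdfs_dir):
    """Command that creates hdfs_dir, including missing parents (-p)."""
    return ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir]


def put_command(local_path, hdfs_dir):
    """Command that copies a local file into hdfs_dir."""
    return ["hdfs", "dfs", "-put", local_path, hdfs_dir]


def upload(local_path, hdfs_dir, runner=subprocess.run):
    """Create the target directory, then copy the file into it."""
    for command in (mkdir_command(hdfs_dir), put_command(local_path, hdfs_dir)):
        # check=True makes subprocess.run raise CalledProcessError on failure.
        runner(command, check=True)


# Hypothetical usage on a node with a working Hadoop client:
#   upload("/root/Hadoop/sample.txt", "/user/demo")
```

The runner parameter exists only so the command sequence can be inspected without a cluster; by default the commands are executed for real.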

What is the correct way of installing a JDBC driver on EMR for Sqoop to use?

I am running Sqoop 1.4.7 on AWS EMR 5.21.1 and am trying to import data from a database. I have successfully been able to do this manually where I create an EMR instance with Sqoop installed via the EMR Console.
Here are the preliminary steps that I performed in order to run sqoop on EMR
Download the JDBC Driver
Move the JDBC driver to the /usr/lib/sqoop/lib directory
I was able to run a sqoop import successfully when I was SSH'd into an EMR cluster, using these commands:
wget -O mssql-jdbc.jar https://repo1.maven.org/maven2/com/microsoft/sqlserver/mssql-jdbc/8.4.0.jre8/mssql-jdbc-8.4.0.jre8.jar
sudo mv mssql-jdbc.jar /usr/lib/sqoop/lib/
When I try to run these commands from an EMR bootstrap script, however, I get the error:
usr/lib/sqoop/lib/ No such file or directory
After some investigation, I realized this is because "Bootstrap actions execute before core services, such as Hadoop or Spark, are installed", as found here.
So the /usr/lib/sqoop/lib directory doesn't exist when my bootstrap steps run.
Here are some solutions that work, but they feel like workarounds:
Create the /usr/lib/sqoop/lib directory in my bootstrap script and then place the jar in it
Add the jar to this directory as an EMR step. (It turns out this is the correct approach; see the accepted answer below.)
What is the correct way of installing this JDBC driver on EMR?
The 2nd option is the correct way to do it. The documentation explains running bash scripts as an EMR step.
You can also use command-runner.jar as the step's jar, with the following arguments:
bash -c "wget -O mssql-jdbc.jar https://repo1.maven.org/maven2/com/microsoft/sqlserver/mssql-jdbc/8.4.0.jre8/mssql-jdbc-8.4.0.jre8.jar;sudo mv mssql-jdbc.jar /usr/lib/sqoop/lib/"
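For reference, a step like this can also be submitted programmatically. The sketch below builds the step definition in the shape boto3's add_job_flow_steps expects; it assumes boto3 is installed and AWS credentials are configured. The function names and the step name are hypothetical, and the cluster ID has to be supplied by you.

```python
JDBC_URL = (
    "https://repo1.maven.org/maven2/com/microsoft/sqlserver/"
    "mssql-jdbc/8.4.0.jre8/mssql-jdbc-8.4.0.jre8.jar"
)


def jdbc_install_step(url=JDBC_URL, target_dir="/usr/lib/sqoop/lib/"):
    """Build an EMR step that downloads the driver and moves it into target_dir."""
    script = "wget -O mssql-jdbc.jar {url};sudo mv mssql-jdbc.jar {target}".format(
        url=url, target=target_dir
    )
    return {
        "Name": "Install JDBC driver for Sqoop",  # hypothetical step name
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["bash", "-c", script],
        },
    }


def submit(cluster_id, step):
    """Submit the step to a running cluster (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3 installed

    emr = boto3.client("emr")
    return emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
```

Because steps run after the applications are installed, /usr/lib/sqoop/lib should already exist by the time this step executes.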

Spark cluster on AWS EMR can't find spark-env.sh

I am experimenting with Apache Spark on AWS EMR and trying to use this to make the cluster use python3.
I run the following as the last command in a bootstrap script:
sudo sed -i -e '$a\export PYSPARK_PYTHON=/usr/bin/python3' /etc/spark/conf/spark-env.sh
When I use it, the cluster crashes during bootstrap with the following error:
sed: can't read /etc/spark/conf/spark-env.sh: No such file or directory
How should I set it to use python3 properly?
This is not a duplicate of the linked question: my issue is that the cluster cannot find the spark-env.sh file while bootstrapping, whereas the other question addresses the system not finding python3.
In the end I did not use that script. Instead, I used the EMR configuration file that is available at the cluster creation stage, and it gave me the proper configuration via spark-submit (in the AWS GUI). If you need the setting to be available to pyspark scripts in a more programmatic way, you can use os.environ to set the pyspark Python version inside the Python script.
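The answer does not show the configuration it used, so the following is only a sketch of both approaches. It assumes the spark-env classification with a nested export block, which is how EMR configurations normally export environment variables, and that /usr/bin/python3 exists on the nodes; the function names are hypothetical.

```python
import json
import os

PYTHON3 = "/usr/bin/python3"


def spark_env_configuration(python_path=PYTHON3):
    """EMR configuration entry that exports PYSPARK_PYTHON in spark-env."""
    return [
        {
            "Classification": "spark-env",
            "Configurations": [
                {
                    "Classification": "export",
                    "Properties": {"PYSPARK_PYTHON": python_path},
                }
            ],
        }
    ]


def use_python3_for_pyspark(python_path=PYTHON3):
    """Set PYSPARK_PYTHON for this process; call before creating a SparkContext."""
    os.environ["PYSPARK_PYTHON"] = python_path


# JSON that could be supplied as the cluster configuration at creation time.
print(json.dumps(spark_env_configuration(), indent=2))
```

The configuration route applies cluster-wide and avoids touching spark-env.sh during bootstrap, when the file does not exist yet; the os.environ route only affects the script that sets it.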

Clear data from HDFS on AWS EMR in Hadoop 1.0.3

For various reasons I'm running some jobs on EMR with AMI 2.4.11/Hadoop 1.0.3. I'm trying to run a cleanup of HDFS after my jobs by adding an additional EMR step. Using boto:
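from boto.emr.step import JarStep
# emr_conn is an existing boto EMR connection; cluster_id is the target job flow ID.
step = JarStep(
    'HDFS cleanup',
    'command-runner.jar',
    action_on_failure='CONTINUE',
    step_args=['hadoop', 'dfs', '-rmr', '-skipTrash', 'hdfs:/tmp'])
emr_conn.add_jobflow_steps(cluster_id, [step])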
However, it regularly fails with nothing in stderr in the EMR console.
What confuses me is that if I ssh into the master node and run the command:
hadoop dfs -rmr -skipTrash hdfs:/tmp
it succeeds with exit code 0 and a message that it successfully deleted everything. All the normal hadoop commands seem to work as documented. Does anyone know if there's an obvious reason for this? An issue with the Amazon distribution? Undocumented behavior in certain commands?
Note:
I have other jobs running on Hadoop 2, where the documented command:
hdfs dfs -rm -r -skipTrash hdfs:/tmp
works as one would expect both as a step and as a command.
My solution, generally, was to upgrade everything to Hadoop 2, in which case this works:
JarStep(
    '%s: HDFS cleanup' % self.job_name,
    'command-runner.jar',
    action_on_failure='CONTINUE',
    step_args=['hdfs', 'dfs', '-rm', '-r', '-skipTrash', path]
)
With Hadoop 1, this was the best I could get, and it worked pretty well:
JarStep(
    '%s: HDFS cleanup' % self.job_name,
    'command-runner.jar',
    action_on_failure='CONTINUE',
    step_args=['hadoop', 'fs', '-rmr', '-skipTrash',
               'hdfs:/tmp/mrjob']
)

Django && hadoop

I want to access HDFS from the web with Django, so I am using hadoopy. I wrote just
def list(request):
    return HttpResponse(hadoopy.ls("."))
in views.py, but something goes wrong. The error is: "IOError at /list/ Ran[hadoop fs -ls .]: /bin/sh: 1: hadoop: not found". I think the shell can't resolve the "hadoop" command, but I don't know what to do.
The hadoopy library you're attempting to use is simply a wrapper over the existing Apache Hadoop command scripts (the hadoop, hdfs, mapred, etc. commands). It therefore requires them to be installed and available on the PATH environment variable of your OS or application, because it runs a hadoop fs -ls <path> shell command when you call hadoopy.ls(…).
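One way to act on this from the Django side is to check whether the hadoop executable is visible to the web process and, if not, extend PATH before hadoopy is called. This is a minimal sketch: the helper names are hypothetical, and the Hadoop bin directory must be replaced with wherever Hadoop is actually installed on your machine.

```python
import os
import shutil


def ensure_on_path(bin_dir, env=os.environ):
    """Prepend bin_dir to PATH in env unless it is already listed."""
    entries = env.get("PATH", "").split(os.pathsep)
    if bin_dir not in entries:
        env["PATH"] = os.pathsep.join([bin_dir] + [e for e in entries if e])
    return env["PATH"]


def hadoop_available():
    """Return True if a 'hadoop' executable can be found on the current PATH."""
    return shutil.which("hadoop") is not None


# Hypothetical usage, e.g. near the top of settings.py or wsgi.py:
#   ensure_on_path("/usr/local/hadoop/bin")  # assumed install location
#   if not hadoop_available():
#       raise RuntimeError("hadoop command not found; check the install path")
```

Child processes inherit the environment of the process that starts them, so changing os.environ in the Django process affects the shell that hadoopy spawns. Setting PATH in the web server's own environment configuration achieves the same result without code.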