Getting "no CLDB entries" error while running hadoop fs -ls - mapreduce

While running hadoop fs -ls I am getting an error; please see the screenshot below.

When running "hadoop fs -ls", the list of CLDB servers will be read from the /opt/mapr/conf/mapr-clusters.conf file. That file should be populated based on the CLDB addresses provided when running the /opt/mapr/server/configure.sh command. Either, the contents of mapr-clusters.conf do not specify the correct CLDB hosts (or any servers), or the CLDB service is not running on the specified hosts. I'd suggest ensuring the MapR services are running on the nodes where they've been installed and reviewing the installation/configuration documentation.


"puppet agent --test" on client machine aren't getting manifest from the Puppet master server

Issue
So I have two AWS instances: a Puppet master and a Puppet client. When I run sudo puppet agent --test on my client, the tasks defined in my master's manifest aren't applied to the client instance.
Where I am right now
puppetmaster is installed on the master instance
puppet is installed on the client instance
Master just finished signing my client's certificate. No errors were displayed
Master has a /etc/puppet/manifests/site.pp
Client's puppet.conf file has a server=dns_of_master line
My Puppet version is 5.4.0. I'm using the default manifest configuration.
Here's the guide that I'm following: https://www.digitalocean.com/community/tutorials/getting-started-with-puppet-code-manifests-and-modules. The only changes are the site.pp content and that I'm using AWS.
If it helps, here's my AWS instances' AMI: ami-06d51e91cea0dac8d
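For reference, the agent-to-master pointing mentioned above can be confirmed on the client with something like this (paths follow the older /etc/puppet layout used in the guide):

# Show the server line the agent is using
grep server /etc/puppet/puppet.conf

# Or ask Puppet what it has actually resolved
sudo puppet config print server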
Details
Here's the content of my master's /etc/puppet/manifests/site.pp:
node default {
  package { 'nginx':
    ensure => installed,
  }

  service { 'nginx':
    ensure  => running,
    require => Package['nginx'],
  }

  file { '/tmp/hello_world':
    ensure  => present,
    content => 'Hello, World!',
  }
}
The file has 777 permissions.
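As a quick sanity check that the manifest itself parses cleanly, it can be validated on the master with Puppet's built-in validator:

# Validate the manifest syntax on the master
sudo puppet parser validate /etc/puppet/manifests/site.pp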
Here's the output when I run sudo puppet agent --test (this is after I ran sudo puppet agent --enable):
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Retrieving locales
Info: Caching catalog for my_client_dns
Info: Applying configuration version '1578968015'
Notice: Applied catalog in 0.02 seconds
I have looked at other StackOverflow posts about this issue. I can tell that my catalog is not getting applied because of the lack of resource status messages and how quickly the run finishes. Unfortunately, the solutions didn't apply to my case:
My site.pp is named correctly and in the correct file path /etc/puppet/manifests
I didn't touch my master's puppet.conf file
I tried restarting the server with sudo systemctl, but nothing changed
So I have fixed the issue: the guide I was following targets an older version of Ubuntu (16.04, rather than the 18.04 I'm using), which requires a different AMI than the one I used to create the instances.
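If you run into the same thing, a quick way to spot the mismatch (assuming Ubuntu AMIs as here) is to compare the OS release with the Puppet packages actually installed:

# OS release on each instance
lsb_release -rs

# Puppet versions on the agent and the master
puppet --version
apt-cache policy puppetmaster   # the package may be named puppetmaster or puppet-master depending on the release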

HDFS directory on a MapR cluster

I need to save my Spark Streaming checkpoint files in an HDFS directory. I can access a remote cluster which has MapR installed on it.
But I am not sure which path MapR uses for an HDFS directory.
Is it /opt/mapr/..?
When you are connected to your MapR cluster you can run the following command:
hadoop fs -ls /
This will list the files/folders, just like in any other HDFS cluster, so you will not see anything special here.
So if your Spark job is running on the MapR cluster, you just have to point to the folder you want, for example:
yourRdd.saveAsTextFile("/apps/output");
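For a Spark Streaming checkpoint the idea is the same: pick a MapR-FS path and hand it to the job. A minimal sketch, with /apps/spark/checkpoints used purely as an example path:

# Create and verify a MapR-FS directory to use for checkpoints
hadoop fs -mkdir -p /apps/spark/checkpoints
hadoop fs -ls /apps/spark

That same path string is what you would pass to your streaming context's checkpoint directory setting.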
You can do exactly the same from your development environment, but you have to install and configure the MapR-Client.
Note that you can also access the MapR File System (MapR-FS) using NFS, which should be running on your cluster; by default the mount point is /mapr.
So you can see the content of your FS using:
cd /mapr/your-cluster-name/apps/output
/opt/mapr is the folder that contains the installed MapR product, not the file system itself.
So if you look at it from a pure Spark point of view, nothing changes: you just save/read data from a folder, and if you are running on MapR this will be done in MapR-FS.

Hadoop dfsadmin -report command is not working in MapR

I need to get the DFS report of the MapR cluster, but when I execute the following command I get an error:
hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
report: FileSystem maprfs:/// is not an HDFS file system
Usage: java DFSAdmin [-report] [-live] [-dead] [-decommissioning]
Is there any way to do this in MapR?
I tried this link as well, but it doesn't provide the needed information.
Try the commands below:
maprcli node list
maprcli dashboard info
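For example, much of what dfsadmin -report used to show (node health, services, space usage) can be pulled like this; the column list is just an illustration and may differ by MapR version:

# Per-node hostnames and running services
maprcli node list -columns hostname,svc

# Cluster-wide utilization summary, as JSON
maprcli dashboard info -json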

Error starting Spark in EMR 4.0

I created an EMR 4.0 instance in AWS with all available applications, including Spark. I did it manually, through the AWS Console. I started the cluster and SSHed to the master node when it was up. There I ran pyspark. I am getting the following error when pyspark tries to create the SparkContext:
2015-09-03 19:36:04,195 ERROR Thread-3 spark.SparkContext
(Logging.scala:logError(96)) - -ec2-user, access=WRITE,
inode="/user":hdfs:hadoop:drwxr-xr-x at
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:271)
I haven't added any custom applications, nor bootstrapping and expected everything to work without errors. Not sure what's going on. Any suggestions will be greatly appreciated.
Log in as the user "hadoop" (http://docs.aws.amazon.com/ElasticMapReduce/latest/ManagementGuide/emr-connect-master-node-ssh.html). It has all the proper environment and related settings needed to work as expected. The error you are receiving is because you logged in as "ec2-user".
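In other words, connect with something like the following (the key file and master DNS name are placeholders):

# SSH to the master node as the hadoop user instead of ec2-user
ssh -i /path/to/your-key.pem hadoop@your-master-public-dns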
I've been working with Spark on EMR this week, and found a few weird things relating to user permissions and relative paths.
It seems that running Spark from a directory which you don't 'own' as a user is problematic. In some situations Spark (or some of the underlying Java pieces) wants to create files or folders, and it treats pwd (the current directory) as the best place to do that.
Try going to the home directory
cd ~
then running pyspark.

Not getting IDQ log when running a deployed mapping using the infacmd command

We are running a shell script that runs a deployed IDQ mapping. I looked through the Unix directories to see if it created a mapping log file, but I cannot find it anywhere.
I checked various directories under the <infa_home> folder but could not locate the log file.
If you have come across the same situation, please do let me know.
Which log are you looking for? Mapping / workflow / DIS / web service / Catalina?
All IDQ logs are stored under $HOME/infa_shared/log/disLogs/*
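Assuming that path exists on your node, the newest logs can be listed with something like:

# Show the most recently written DIS log files (path as given above)
ls -lt $HOME/infa_shared/log/disLogs | head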