Not able to access HDFS - hdfs

I installed cloudera vm and started trying some basic stuff. First I just wanted to ls the hdfs directoires. so I issued the below command.
[cloudera#quickstart ~]$ hadoop fs -ls /
ls: Failed on local exception: java.net.SocketException: Network is unreachable; Host Details : local host is: "quickstart.cloudera/10.0.2.15"; destination host is: "quickstart.cloudera":8020;
though ps -fu hdfs says both namenode and data node is running. I checked the status using the service command.
[cloudera#quickstart ~]$ sudo service hadoop-hdfs-namenode status
Hadoop namenode is not running [FAILED]
Thinking all the problems will be resolved if I restart all the services, I executed the below command.
[cloudera#quickstart conf]$ sudo /home/cloudera/cloudera-manager --express --force
[QuickStart] Shutting down CDH services via init scripts...
[QuickStart] Disabling CDH services on boot...
[QuickStart] Starting Cloudera Manager daemons...
[QuickStart] Waiting for Cloudera Manager API...
[QuickStart] Configuring deployment...
Submitted jobs: 92
[QuickStart] Deploying client configuration...
Submitted jobs: 93
[QuickStart] Starting Cloudera Management Service...
Submitted jobs: 101
[QuickStart] Enabling Cloudera Manager daemons on boot...
Now I thought all services will be up so again checked the status of namenode service. Again it came failed.
[cloudera#quickstart ~]$ sudo service hadoop-hdfs-namenode status
Hadoop namenode is not running [FAILED]
Now I decided to manually stop and start the namenode service. Again not much use.
[cloudera#quickstart ~]$ sudo service hadoop-hdfs-namenode stop
no namenode to stop
Stopped Hadoop namenode: [ OK ]
[cloudera#quickstart ~]$ sudo service hadoop-hdfs-namenode status
Hadoop namenode is not running [FAILED]
[cloudera#quickstart ~]$ sudo service hadoop-hdfs-namenode start
starting namenode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-namenode-quickstart.cloudera.out
Failed to start Hadoop namenode. Return value: 1 [FAILED]
I checked the file /var/log/hadoop-hdfs/hadoop-hdfs-namenode-quickstart.cloudera.out . It just said below
log4j:ERROR Could not find value for key log4j.appender.RFA
log4j:ERROR Could not instantiate appender named "RFA".
I also checked /var/log/hadoop-hdfs/hadoop-cmf-hdfs-NAMENODE-quickstart.cloudera.log.out . Found below when I searched for error. Can anyone please suggest me what is the best way to get the services back on track. Unfortunately I am not able to access cloudera manager from browser. Anything that I can do from command line?
2016-02-24 21:02:48,105 WARN com.cloudera.cmf.event.publish.EventStorePublisherWithRetry: Failed to publish event: SimpleEvent{attributes={ROLE_TYPE=[NAMENODE], CATEGORY=[LOG_MESSAGE], ROLE=[hdfs-NAMENODE], SEVERITY=[IMPORTANT], SERVICE=[hdfs], HOST_IDS=[quickstart.cloudera], SERVICE_TYPE=[HDFS], LOG_LEVEL=[WARN], HOSTS=[quickstart.cloudera], EVENTCODE=[EV_LOG_EVENT]}, content=Only one image storage directory (dfs.namenode.name.dir) configured. Beware of data loss due to lack of redundant storage directories!, timestamp=1456295437905} - 1 of 17 failure(s) in last 79302s
java.io.IOException: Error connecting to quickstart.cloudera/10.0.2.15:7184
at com.cloudera.cmf.event.shaded.org.apache.avro.ipc.NettyTransceiver.getChannel(NettyTransceiver.java:249)
at com.cloudera.cmf.event.shaded.org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:198)
at com.cloudera.cmf.event.shaded.org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:133)
at com.cloudera.cmf.event.publish.AvroEventStorePublishProxy.checkSpecificRequestor(AvroEventStorePublishProxy.java:122)
at com.cloudera.cmf.event.publish.AvroEventStorePublishProxy.publishEvent(AvroEventStorePublishProxy.java:196)
at com.cloudera.cmf.event.publish.EventStorePublisherWithRetry$PublishEventTask.run(EventStorePublisherWithRetry.java:242)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketException: Network is unreachable

You can try this:
check witch process is using the port 7184 of namenode (i.e netstat linux command)
and kill that and then restart
Or
change you namenode port from conf and restart hadoop

Related

AWS Replication agent problem when launching

I am trying to launch aws replication agent in a CENTOS 8.3 and always returns me an error during the process of replication agent installation ( python3 aws-replication-installer-init.py ......)
The output of the process shows me:
The installation of the AWS Replication Agent has started.
Identifying volumes for replication.
Identified volume for replication: /dev/sdb of size 7 GiB
Identified volume for replication: /dev/sda of size 11 GiB
All volumes for replication were successfully identified.
Downloading the AWS Replication Agent onto the source server... Finished.
Installing the AWS Replication Agent onto the source server...
Error: Failed Installing the AWS Replication Agent
Installation failed.
If i check the aws_replication_agent_installer.log i can see that appears messages like:
make -C /lib/modules/4.18.0-348.2.1.el8_5.x86_64/build M=/tmp/tmp8mdbz3st/AgentDriver modules
.....................
retcode: 0
Build essentials returned with code None
--- Building software
running: 'which zypper'
retcode: 256
running: 'make'
retcode: 0
running: 'chmod 0770 ./aws-replication-driver-commander'
retcode: 0
running: '/sbin/rmmod aws-replication-driver'
retcode: 256
running: '/sbin/insmod ./aws-replication-driver.ko'
retcode: 256
running: '/sbin/rmmod aws-replication-driver'
retcode: 256
Cannot insert module. Try 0.
running: '/sbin/rmmod aws-replication-driver'
retcode: 256
running: '/sbin/insmod ./aws-replication-driver.ko'
retcode: 256
running: '/sbin/rmmod aws-replication-driver'
retcode: 256
Cannot insert module. Try 1.
............
Cannot insert module. Try 9.
Installation returned with code 2
Installation failed due to unspecified error:
stderr: sh: /var/lib/aws-replication-agent/stopAgent.sh: No such file or directory
which: no zypper in (/sbin:/bin:/usr/sbin:/usr/bin:/sbin:/usr/sbin)
which: no apt-get in (/sbin:/bin:/usr/sbin:/usr/bin:/sbin:/usr/sbin)
which: no zypper in (/sbin:/bin:/usr/sbin:/usr/bin:/sbin:/usr/sbin)
rmmod: ERROR: Module aws_replication_driver is not currently loaded
insmod: ERROR: could not insert module ./aws-replication-driver.ko: Required key not available
rmmod: ERROR: Module aws_replication_driver is not currently loaded
Any issue of the error?
Launching with the command:
mokutil --disable-validation
will allow to change kernel modules (next boot will confirm it introducing password that must be entered afet command mokutil)

reusing the salt states in the AMI image

I have several salt states(base and pillars) already written and present in Amazon s3. I want to re-use the salt states instead of writing the salt state again. I want to create an AMI image using packer and apply the salt-states that I have downloaded from s3 to the Packer Builder EC2 instance. Even if the salt-minion is installed on the CentOS -7 machine, I have installed salt-master service as well and started both salt-minion and salt-master by following commands.
cat > /etc/salt/minion.d/minion_id.conf <<'EOT' id: ${host} # id salt-minion id EOT
Generate the name of the master to connect to
cat > /etc/salt/minion.d/master_name.conf <<'EOT' master: localhost EOT
systemctl enable salt-minion
systemctl start salt-minion
systemctl enable salt-master
systemctl start salt-master
When running the below command it doesn't list any minions:
salt-key -L Accepted Keys: Denied Keys: Unaccepted Keys: Rejected Keys:
So the salt 'localhost-*' state.sls state.high_state
fails with errors:
"No minions matched the target. No command was sent, no jid was assigned.
ERROR: No return received"
This is because no minionid is created from salt-key.
Anybody has any idea why the salt-key is not being shown with salt-minion and how can i resolve this issue by running the existing salt-state successfully downloaded from s3 will work in AMI image?
Regards
Pradeep
What could be happening is that your minions can't find (resolve/DNS) the master salt.
What you could do is add the IP of your master to your minions /etc/salt/minion something like this:
master: 10.0.0.1
Replace 10.0.0.3 with the IP of your master
Later restart your minion and check the master again for requests.

Something wrong on deploy chaincode for hyperledger v1.0

I have tried to use docker toolbox to setup Hyperledger V1.0 in my local machines.
I according to this document:
http://hyperledger-fabric.readthedocs.io/en/latest/asset_setup.html
But when I tried to deploy chaincode.
$node deploy.js
I got an error message:
info: Returning a new winston logger with default configurations
info: [Chain.js]: Constructed Chain instance: name - fabric-client1, securityEnabled: true, TCert download batch size: 10, network mode: true
info: [Peer.js]: Peer.const - url: grpc://localhost:8051 options grpc.ssl_target_name_override=tlsca, grpc.default_authority=tlsca
info: [Peer.js]: Peer.const - url: grpc://localhost:8055 options grpc.ssl_target_name_override=tlsca, grpc.default_authority=tlsca
info: [Peer.js]: Peer.const - url: grpc://localhost:8056 options grpc.ssl_target_name_override=tlsca, grpc.default_authority=tlsca
info: [Client.js]: Failed to load user "admin" from local key value store
info: [FabricCAClientImpl.js]: Successfully constructed Fabric COP service client: endpoint - {"protocol":"http","hostname":"localhost","port":8054}
info: [crypto_ecdsa_aes]: This class requires a KeyValueStore to save keys, no store was passed in, using the default store C:\Users\daniel\.hfc-key-store
[2017-04-15 22:14:29.268] [ERROR] Helper - Error: Calling enrollment endpoint failed with error [Error: connect ECONNREFUSED 127.0.0.1:8054]
at ClientRequest.<anonymous> (C:\Users\daniel\node_modules\fabric-ca-client\lib\FabricCAClientImpl.js:304:12)
at emitOne (events.js:96:13)
at ClientRequest.emit (events.js:188:7)
at Socket.socketErrorListener (_http_client.js:310:9)
at emitOne (events.js:96:13)
at Socket.emit (events.js:188:7)
at emitErrorNT (net.js:1278:8)
at _combinedTickCallback (internal/process/next_tick.js:74:11)
at process._tickCallback (internal/process/next_tick.js:98:9)
[2017-04-15 22:14:29.273] [ERROR] DEPLOY - Error: Failed to obtain an enrolled user
at ca_client.enroll.then.then.then.catch (C:\Users\daniel\helper.js:59:12)
at process._tickCallback (internal/process/next_tick.js:103:7)
events.js:160
throw er; // Unhandled 'error' event
^
Error: Connect Failed
at ClientDuplexStream._emitStatusIfDone (C:\Users\daniel\node_modules\grpc\src\node\src\client.js:201:19)
at ClientDuplexStream._readsDone (C:\Users\daniel\node_modules\grpc\src\node\src\client.js:169:8)
at readCallback (C:\Users\daniel\node_modules\grpc\src\node\src\client.js:229:12)
Is this an question about unable to connect to ca? Or other causes?
Edit:
Environment:
OS: Windows 10 Professional Edition
Docker Toolbox: 17.04.0-ce
Go: 1.7.5
Node.js: 6.10.0
My steps:
1.Open Docker Quickstart Terminal and key commands.
$curl -L https://raw.githubusercontent.com/hyperledger/fabric/master/examples/sfhackfest/sfhackfest.tar.gz -o sfhackfest.tar.gz 2> /dev/null; tar -xvf sfhackfest.tar.gz
$docker-compose -f docker-compose-gettingstarted.yml build
$docker-compose -f docker-compose-gettingstarted.yml up -d
$docker ps
It has been confirmed that six containers have been activated
2.Download examples and install modules.
$curl -OOOOOO https://raw.githubusercontent.com/hyperledger/fabric-sdk-node/v1.0-alpha/examples/balance-transfer/{config.json,deploy.js,helper.js,invoke.js,query.js,package.json}
//This link didn't work, so I downloaded the required files from GitHub of fabric-sdk-node
$npm install --global windows-build-tools
$npm install
3.Try to deploy chaincode.
$node deploy.js
There were several problems, not the least of which that documentation was outdated and was for a preview release of Hyperledger Fabric. The docs are actually in the process of being removed as we need to update our examples / samples.
You mentioned Docker Toolbox - so are you trying to run all of this on Windows or Mac?
UPDATE:
So one of the issue with Docker Toolbox or Docker for Windows is that you cannot use localhost / 127.0.0.1 as the address when trying to communicate from apps on the host (even in the QuickStart Terminal) to the endpoints of the Docker containers. When the QuickStart Terminal first launches Docker, you'll see that it will output the IP address of the endpoint you should use when communicating with exposed ports.
I was having the same issue while following the latest "Writing Your First Application" tutorial (http://hyperledger-fabric.readthedocs.io/en/latest/write_first_app.html). I had installed all the pre-requisites and the fabric-samples and started the local network.
When I got to the step of enrolling the Admin user, $ node enrollAdmin.js, I was getting the same error message as above, Error: connect ECONNREFUSED, followed by the localhost domain.
As the first answer suggests, the root cause is that I'm running Docker Toolbox. I'm developing on an older Mac, OSX v10.9.5, so I couldn't use Docker for Mac.
To fix the issue, I replaced 'localhost' in the enrollAdmin.js code with the IP from Docker Toolbox.
Here are the steps I took:
Started Docker with Applications > Docker Quickstart Terminal
Copied the IP from this sentence: docker is configured to use the default machine with IP...
Opened the copy of enrollAdmin.js from fabric-samples/fabcar directory
Found this code:
// be sure to change the http to https when the CA is running TLS enabled
fabric_ca_client = new Fabric_CA_Client('http://localhost:7054', tlsOptions , 'ca.example.com', crypto_suite); // <-- This is the line to change
Replaced 'localhost' with the Docker IP, leaving the port :7054 as is.
Saved
Re-ran the command, $ node enrollAdmin.js
The script connected to the CA and successfully completed the Admin enrollment.
On to the next step!

informatica live data map(LDM) installation

We are facing the issue while starting the informatica cluster service.
When starting Informatica cluster services, some scripts installing Ambari server on infabde, bdemaster and bdeslave.
The script is trying to install ambari on infabde again and again in loop, So the cluster service failed to start by saying that Ambari already installed in infabde. Its not trying to install to other two nodes.
Error Log:
2017-01-12 17:10:30,763 [localhost-startStop-1] INFO com.infa.products.ihs.service.ambari.ScriptLauncher- Waiting for Script's streams to end.
2017-01-12 17:10:41,210 [localhost-startStop-1] ERROR com.infa.products.ihs.beans.application.ClusterListener- [InfaHadoopServiceException_00047] The launch of Ambari server on host [infabde.lucidtechsol.com] failed because the host already has an installed Ambari server. You can add the host to another cluster.
com.infa.products.ihs.service.exception.InfaHadoopServiceException: [InfaHadoopServiceException_00047] The launch of Ambari server on host [infabde.hostname.com] failed because the host already has an installed Ambari server. You can add the host to another cluster.
Run the reset script
./ResetScript.sh true user#server.com user#server.com
./ResetScript.sh false user#server.com user#client.com
and enable IHS then.
ResetScript.sh can be found in services/Infahadoopserveice/binaries

Confd error: ERROR 501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]

While debugging I realised that confd doesn't pick up the keys and my journal looks like this:
Sep 18 18:31:50 ip-10-171-54-76.ec2.internal docker[24891]: [nginx] waiting for confd to refresh nginx.conf
Sep 18 18:31:56 ip-10-171-54-76.ec2.internal docker[24891]: 2014-09-18T18:31:56Z 9122c7a54edc confd[9572]: ERROR 501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]
I use nsenter to log in to the running container to run some experiments for debugging purposes. I ran this command
confd -onetime -node 172.17.42.1:4001 -config-file /etc/confd/conf.d/nginx.toml
Then received this error as above
confd[12894]: ERROR 501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]
I am totally clueless at this point. I am using EC2 with the stable version of CoreOS and I am sure that etcd is running on the host. Also, I can ping the host from inside the container successfully.
Any ideas on what's wrong?
Assistance will be much appreciated.
This error indicates that your etcd cluster isn't operating correctly, so confd has nothing to watch. It has probably lost quorum. The logs (journalctl -u etcd) should indicate what happened.