I configured Accumulo 1.7.0 with Hadoop 2.6.0 (HDFS) and ZooKeeper 3.4.6, and everything works fine, but I want to know how to restore an instance.
Thanks !!!
UPDATE
The problem is that I want to recover the instance after restarting the PC or stopping all the processes. I am including the logs for better understanding:
hduser@master:/opt/accumulo-1.7.0-bin/bin$ ./start-all.sh
Starting monitor on localhost
WARN : Max open files on localhost is 1024, recommend 32768
Starting tablet servers .... done
Starting tablet server on localhost
WARN : Max open files on localhost is 1024, recommend 32768
2016-02-23 11:46:46,089 [fs.VolumeManagerImpl] WARN : dfs.datanode.synconclose set to false in hdfs-site.xml: data loss is possible on hard system reset or power loss
2016-02-23 11:46:46,092 [server.Accumulo] INFO : Attempting to talk to zookeeper
2016-02-23 11:46:46,242 [server.Accumulo] INFO : Waiting for accumulo to be initialized
2016-02-23 11:46:47,243 [server.Accumulo] INFO : Waiting for accumulo to be initialized
2016-02-23 11:46:48,246 [server.Accumulo] INFO : Waiting for accumulo to be initialized
and
hduser@master:/opt/accumulo-1.7.0-bin/bin$ ./accumulo init
2016-02-22 16:10:46,410 [fs.VolumeManagerImpl] WARN : dfs.datanode.synconclose set to false in hdfs-site.xml: data loss is possible on hard system reset or power loss
2016-02-22 16:10:46,411 [init.Initialize] INFO : Hadoop Filesystem is hdfs://master:9000
2016-02-22 16:10:46,412 [init.Initialize] INFO : Accumulo data dirs are [hdfs://master:9000/accumulo]
2016-02-22 16:10:46,412 [init.Initialize] INFO : Zookeeper server is localhost:2181
2016-02-22 16:10:46,412 [init.Initialize] INFO : Checking if Zookeeper is available. If this hangs, then you need to make sure zookeeper is running
2016-02-22 16:10:46,606 [init.Initialize] ERROR: FATAL It appears the directories [hdfs://master:9000/accumulo] were previously initialized.
2016-02-22 16:10:46,606 [init.Initialize] ERROR: FATAL: Change the property instance.volumes to use different filesystems,
2016-02-22 16:10:46,606 [init.Initialize] ERROR: FATAL: or change the property instance.dfs.dir to use a different directory.
2016-02-22 16:10:46,606 [init.Initialize] ERROR: FATAL: The current value of instance.dfs.uri is ||
2016-02-22 16:10:46,606 [init.Initialize] ERROR: FATAL: The current value of instance.dfs.dir is |/accumulo|
2016-02-22 16:10:46,606 [init.Initialize] ERROR: FATAL: The current value of instance.volumes is |hdfs://master:9000/accumulo|
It should be as simple as going into the bin directory and running the appropriately labeled script.
cd accumulo-1.7.0/bin
./stop-all.sh
Then to start again:
./start-all.sh
Thanks for the help. The problem was that I needed to set the directory where ZooKeeper stores its snapshots, because by default they are stored in "/tmp". I modified the zoo.cfg file and set a new directory for "dataDir".
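For reference, a minimal sketch of that zoo.cfg change (the path /var/lib/zookeeper is just an example; any persistent directory writable by the ZooKeeper user will do):

# conf/zoo.cfg
# dataDir=/tmp/zookeeper     <- default location, wiped on reboot
dataDir=/var/lib/zookeeper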
Related
I am getting a subprocess error when launching an EC2 cluster instance.
The terminal hangs on
Waiting for cluster to enter 'ssh-ready' state
when running
./spark-ec2 --key-pair=ru_spark --identity-file=ru_spark.pem --region=us-east-1 --zone=us-east-1a launch mycluster
Console:
Warning: Permanently added 'ec2-52-87-225-32.compute-1.amazonaws.com,52.87.225.32' (RSA) to the list of known hosts.
Connection to ec2-52-87-225-32.compute-1.amazonaws.com closed.
Warning: Permanently added 'ec2-52-87-225-32.compute-1.amazonaws.com,52.87.225.32' (RSA) to the list of known hosts.
Transferring cluster's SSH key to slaves...
ec2-34-207-153-79.compute-1.amazonaws.com
Warning: Permanently added 'ec2-34-207-153-79.compute-1.amazonaws.com,34.207.153.79' (RSA) to the list of known hosts.
Cloning spark-ec2 scripts from https://github.com/amplab/spark-ec2/tree/branch-1.6 on master...
Warning: Permanently added 'ec2-52-87-225-32.compute-1.amazonaws.com,52.87.225.32' (RSA) to the list of known hosts.
Cloning into 'spark-ec2'...
error: Peer reports incompatible or unsupported protocol version. while accessing https://github.com/amplab/spark-ec2/info/refs?service=git-upload-pack
fatal: HTTP request failed
Connection to ec2-52-87-225-32.compute-1.amazonaws.com closed.
Error executing remote command, retrying after 30 seconds: Command '['ssh', '-o', 'StrictHostKeyChecking=no', '-o', 'UserKnownHostsFile=/dev/null', '-i', 'ru_spark.pem', '-t', '-t', u'root@ec2-52-87-225-32.compute-1.amazonaws.com', 'rm -rf spark-ec2 && git clone https://github.com/amplab/spark-ec2 -b branch-1.6 spark-ec2']' returned non-zero exit status 128
I updated curl with SSL support and changed the file permissions on ru_spark.pem to 400 and then 600, but neither helped solve the issue.
I installed the Cloudera VM and started trying some basic things. First I just wanted to ls the HDFS directories, so I issued the command below.
[cloudera@quickstart ~]$ hadoop fs -ls /
ls: Failed on local exception: java.net.SocketException: Network is unreachable; Host Details : local host is: "quickstart.cloudera/10.0.2.15"; destination host is: "quickstart.cloudera":8020;
This happens even though ps -fu hdfs says both the namenode and the datanode are running. I checked the status using the service command:
[cloudera@quickstart ~]$ sudo service hadoop-hdfs-namenode status
Hadoop namenode is not running [FAILED]
Thinking all the problems would be resolved if I restarted all the services, I executed the command below.
[cloudera@quickstart conf]$ sudo /home/cloudera/cloudera-manager --express --force
[QuickStart] Shutting down CDH services via init scripts...
[QuickStart] Disabling CDH services on boot...
[QuickStart] Starting Cloudera Manager daemons...
[QuickStart] Waiting for Cloudera Manager API...
[QuickStart] Configuring deployment...
Submitted jobs: 92
[QuickStart] Deploying client configuration...
Submitted jobs: 93
[QuickStart] Starting Cloudera Management Service...
Submitted jobs: 101
[QuickStart] Enabling Cloudera Manager daemons on boot...
Now I thought all the services would be up, so I checked the status of the namenode service again. Again it came back as failed.
[cloudera@quickstart ~]$ sudo service hadoop-hdfs-namenode status
Hadoop namenode is not running [FAILED]
Next I decided to manually stop and start the namenode service. Again, not much use.
[cloudera@quickstart ~]$ sudo service hadoop-hdfs-namenode stop
no namenode to stop
Stopped Hadoop namenode: [ OK ]
[cloudera@quickstart ~]$ sudo service hadoop-hdfs-namenode status
Hadoop namenode is not running [FAILED]
[cloudera@quickstart ~]$ sudo service hadoop-hdfs-namenode start
starting namenode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-namenode-quickstart.cloudera.out
Failed to start Hadoop namenode. Return value: 1 [FAILED]
I checked the file /var/log/hadoop-hdfs/hadoop-hdfs-namenode-quickstart.cloudera.out. It just said the following:
log4j:ERROR Could not find value for key log4j.appender.RFA
log4j:ERROR Could not instantiate appender named "RFA".
I also checked /var/log/hadoop-hdfs/hadoop-cmf-hdfs-NAMENODE-quickstart.cloudera.log.out and found the following when I searched for errors. Can anyone suggest the best way to get the services back on track? Unfortunately I am not able to access Cloudera Manager from the browser. Is there anything I can do from the command line?
2016-02-24 21:02:48,105 WARN com.cloudera.cmf.event.publish.EventStorePublisherWithRetry: Failed to publish event: SimpleEvent{attributes={ROLE_TYPE=[NAMENODE], CATEGORY=[LOG_MESSAGE], ROLE=[hdfs-NAMENODE], SEVERITY=[IMPORTANT], SERVICE=[hdfs], HOST_IDS=[quickstart.cloudera], SERVICE_TYPE=[HDFS], LOG_LEVEL=[WARN], HOSTS=[quickstart.cloudera], EVENTCODE=[EV_LOG_EVENT]}, content=Only one image storage directory (dfs.namenode.name.dir) configured. Beware of data loss due to lack of redundant storage directories!, timestamp=1456295437905} - 1 of 17 failure(s) in last 79302s
java.io.IOException: Error connecting to quickstart.cloudera/10.0.2.15:7184
at com.cloudera.cmf.event.shaded.org.apache.avro.ipc.NettyTransceiver.getChannel(NettyTransceiver.java:249)
at com.cloudera.cmf.event.shaded.org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:198)
at com.cloudera.cmf.event.shaded.org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:133)
at com.cloudera.cmf.event.publish.AvroEventStorePublishProxy.checkSpecificRequestor(AvroEventStorePublishProxy.java:122)
at com.cloudera.cmf.event.publish.AvroEventStorePublishProxy.publishEvent(AvroEventStorePublishProxy.java:196)
at com.cloudera.cmf.event.publish.EventStorePublisherWithRetry$PublishEventTask.run(EventStorePublisherWithRetry.java:242)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketException: Network is unreachable
You can try this:
Check which process is using port 7184 on the namenode host (e.g. with the netstat Linux command), kill it, and then restart (a short sketch follows below).
Or:
Change the namenode port in the configuration and restart Hadoop.
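A minimal sketch of that check, assuming the net-tools package is available (the <pid> in the kill command is a placeholder for the PID shown in the netstat output):

sudo netstat -tlnp | grep 7184              # show the PID/name of the process listening on 7184
sudo kill <pid>                             # stop that process
sudo service hadoop-hdfs-namenode start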
I've set up OpsCenter on one of the Cassandra cluster nodes. After installation, when setting up my cluster, I tried installing the DataStax agent on all the cluster nodes via the UI, but it failed. So I had to install the agents manually.
After manually installing the agents, the node on which OpsCenter is installed is able to connect, but not the other nodes. It still says "2 agents failed to connect". What could be the issue?
PS: My Cassandra cluster is set up on AWS on Ubuntu.
My agent.log file looks like this:
ERROR [os-metrics-9] 2015-07-27 07:04:43,390 Long os-stats collector failed: Cannot run program "iostat": error=2, No such file or directory
ERROR [os-metrics-7] 2015-07-27 07:04:43,391 Long os-stats collector failed: Cannot run program "iostat": error=2, No such file or directory
ERROR [os-metrics-8] 2015-07-27 07:04:53,391 Long os-stats collector failed: Cannot run program "iostat": error=2, No such file or directory
ERROR [os-metrics-3] 2015-07-27 07:04:53,392 Long os-stats collector failed: Cannot run program "iostat": error=2, No such file or directory
ERROR [StompConnection receiver] 2015-07-27 07:05:02,946 failed connecting to **.**.**.**:61620:java.net.ConnectException: Connection timed out
You have to set stomp_interface in address.yaml, like:
stomp_interface: <ip-address>
After restarting the agent, it should connect.
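For example, a minimal sketch on each agent node (the file path assumes a package install of the DataStax agent; the IP is a placeholder for the opscenterd machine's reachable address):

# /var/lib/datastax-agent/conf/address.yaml
stomp_interface: 10.0.0.5
Then restart the agent:
sudo service datastax-agent restart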
Since your agent was able to connect from the same box where OpsCenter is installed, it sounds like one of the following:
You might not have configured your firewall properly. Please try disabling the firewall on all your boxes.
You may have multiple interfaces and the C* installation picked up an undesired one. Run the ifconfig or ip command on all of your instances and check against the C* yaml (a quick sketch follows below).
About the iostat failure message: you have not installed the sysstat package. It seems you did not install the dependencies as part of the DSE install.
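A quick way to do that comparison, assuming a package install (the cassandra.yaml path may differ, e.g. /etc/dse/cassandra/cassandra.yaml on DSE):

ip addr show                                                        # list every interface and its address
grep -E 'listen_address|rpc_address|broadcast_address' /etc/cassandra/cassandra.yaml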
The agent uses iostat to collect some information from the disks. If it can't find it, you will get that error, but it just means some OS metrics will be missing (likely a lot of the disk and CPU metrics).
These are some useful settings in the conf/address.yaml file that you should keep in mind when starting the agent manually:
###A name for the node to use as a label throughout OpsCenter.
alias:
###Reachable IP address of the opscenterd machine. The connection made will be on stomp_port. Internal IP in this case
stomp_interface:
###Port for the agent's HTTP service (default: 61621).
#api_port: 61621
###The stomp_port used by opscenterd. == Must match with the 'incoming_port' in opscenter.conf
stomp_port: 61620
###The IP used to identify the node.
local_interface: 100.73.158.44
###The IP that the agent HTTP server listens on.
agent_rpc_interface:
###Host used to connect to local JMX server.
jmx_host: 100.73.158.44
###Whether or not to use SSL communication between the agent and opscenterd.
use_ssl: 1
To solve the "Cannot run program 'iostat'" error, do this:
sudo apt-get install sysstat
While debugging I realised that confd doesn't pick up the keys and my journal looks like this:
Sep 18 18:31:50 ip-10-171-54-76.ec2.internal docker[24891]: [nginx] waiting for confd to refresh nginx.conf
Sep 18 18:31:56 ip-10-171-54-76.ec2.internal docker[24891]: 2014-09-18T18:31:56Z 9122c7a54edc confd[9572]: ERROR 501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]
I used nsenter to log in to the running container and run some experiments for debugging purposes. I ran this command:
confd -onetime -node 172.17.42.1:4001 -config-file /etc/confd/conf.d/nginx.toml
Then I received the same error as above:
confd[12894]: ERROR 501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]
I am totally clueless at this point. I am using EC2 with the stable version of CoreOS and I am sure that etcd is running on the host. Also, I can ping the host from inside the container successfully.
Any ideas on what's wrong?
Assistance will be much appreciated.
This error indicates that your etcd cluster isn't operating correctly, so confd has nothing to watch. It has probably lost quorum. The logs (journalctl -u etcd) should indicate what happened.
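A quick way to confirm this from the CoreOS host, assuming the etcd v2 HTTP API on port 4001 as in the confd command above:

journalctl -u etcd --no-pager | tail -n 50       # look for leader-election / quorum errors
curl -s http://127.0.0.1:4001/v2/stats/self      # reports this member's state (leader or follower)
etcdctl set /ping ok                             # a write, which fails when the cluster has lost quorum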
I am using Spark 0.9.0 in standalone mode.
When I try a streaming application in standalone mode, I get a connection refused exception.
I added the hostname in /etc/hosts and also tried with the IP alone. In both cases the worker registered with the master without any issues.
Is there a way to solve this issue?
14/02/28 07:15:01 INFO Master: akka.tcp://driverClient@127.0.0.1:55891 got disassociated, removing it.
14/02/28 07:15:04 INFO Master: Registering app Twitter Streaming
14/02/28 07:15:04 INFO Master: Registered app Twitter Streaming with ID app-20140228071504-0000
14/02/28 07:34:42 INFO Master: akka.tcp://spark@127.0.0.1:33688 got disassociated, removing it.
14/02/28 07:34:42 INFO LocalActorRef: Message [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from Actor[akka://sparkMaster/deadLetters] to Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%4010.165.35.96%3A38903-6#-1146558090] was not delivered. [2] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
14/02/28 07:34:42 ERROR EndpointWriter: AssociationError [akka.tcp://sparkMaster@10.165.35.96:8910] -> [akka.tcp://spark@127.0.0.1:33688]: Error [Association failed with [akka.tcp://spark@127.0.0.1:33688]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://spark@127.0.0.1:33688]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: /127.0.0.1:33688
I had a similar issue when running Spark in cluster mode. My problem was that the master was started with the hostname 'fluentd:7077' and not the FQDN. I edited
/sbin/start-master.sh
so that the --ip flag reflects how my remote nodes connect.
/usr/lib/jvm/jdk1.7.0_51/bin/java -cp :/home/vagrant/spark-0.9.0-incubating-bin-hadoop2/conf:/home/vagrant/spark-0.9.0-incubating-bin-hadoop2/assembly/target/scala-2.10/spark-assembly_2.10-0.9.0-incubating-hadoop2.2.0.jar -Dspark.akka.logLifecycleEvents=true -Djava.library.path= -Xms512m -Xmx512m org.apache.spark.deploy.master.Master --ip fluentd.alex.dev --port 7077 --webui-port 8080
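If you would rather not edit the startup script directly, a less invasive sketch for Spark 0.9 is to set the master address in conf/spark-env.sh, which start-master.sh reads (the hostname below is the example FQDN from above):

# conf/spark-env.sh
export SPARK_MASTER_IP=fluentd.alex.dev
export SPARK_MASTER_PORT=7077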
Hope this helps.