Datastax Agent (Cassandra) Opscenter setup issue - amazon-web-services

I've set up OpsCenter on one of the nodes of my Cassandra cluster. After installation, while setting up my cluster, I tried installing the DataStax agent on all the cluster nodes via the UI, but it failed, so I had to install the agents manually.
After manually installing the agents, the agent on the node where OpsCenter is installed is able to connect, but the agents on the other nodes are not. It still says, "2 agents failed to connect". What could be the issue?
PS: My Cassandra cluster is set up on AWS, on Ubuntu.
My agent.log file looks like this:
ERROR [os-metrics-9] 2015-07-27 07:04:43,390 Long os-stats collector failed: Cannot run program "iostat": error=2, No such file or directory
ERROR [os-metrics-7] 2015-07-27 07:04:43,391 Long os-stats collector failed: Cannot run program "iostat": error=2, No such file or directory
ERROR [os-metrics-8] 2015-07-27 07:04:53,391 Long os-stats collector failed: Cannot run program "iostat": error=2, No such file or directory
ERROR [os-metrics-3] 2015-07-27 07:04:53,392 Long os-stats collector failed: Cannot run program "iostat": error=2, No such file or directory
ERROR [StompConnection receiver] 2015-07-27 07:05:02,946 failed connecting to **.**.**.**:61620:java.net.ConnectException: Connection timed out

You have to set stomp_interface in address.yaml to the reachable IP address of the opscenterd machine, like this:
stomp_interface: <ip-address>
After restarting the agent, it should connect.

Since the agent on the box where OpsCenter is installed was able to connect, it sounds like one of the following:
You may not have configured your firewall properly. Try temporarily disabling the firewall on all your boxes to see whether the agents connect.
You may have multiple network interfaces, and the C* installation picked up an undesired one. Run ifconfig or the ip command on each of your instances and compare the addresses with the C* yaml (cassandra.yaml).
About the iostat failure message: you have not installed the sysstat package. It seems you did not install the dependencies as part of the DSE install.
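The firewall possibility above can be checked from each agent node without touching the agent itself. The following is a minimal sketch, assuming only the Python standard library; the helper name can_connect is made up for illustration, and 61620 is the stomp_port shown in the log.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable hosts.
        return False
```

Calling can_connect("<opscenter-ip>", 61620) on a node whose agent fails to connect should return True once the port is reachable. If it returns False while the same call works on the OpsCenter box, a firewall rule (on AWS, possibly the security group) is a likely suspect.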

The agent uses iostat to collect some information from the disks. If it can't find iostat you will get that error, but it only means that some OS metrics will be missing (likely a lot of the disk and CPU metrics).

These are some useful settings in the conf/address.yaml file to keep in mind when setting up the agent manually:
###A name for the node to use as a label throughout OpsCenter.
alias:
###Reachable IP address of the opscenterd machine. The connection made will be on stomp_port. Internal IP in this case
stomp_interface:
###Port for the agent's HTTP service (default: 61621).
#api_port: 61621
###The stomp_port used by opscenterd. == Must match with the 'incoming_port' in opscenter.conf
stomp_port: 61620
###The IP used to identify the node.
local_interface: 100.73.158.44
###The IP that the agent HTTP server listens on.
agent_rpc_interface:
###Host used to connect to local JMX server.
jmx_host: 100.73.158.44
###Whether or not to use SSL communication between the agent and opscenterd.
use_ssl: 1

To solve the "Cannot run program 'iostat'" error, do this:
sudo apt-get install sysstat
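To confirm on each node that iostat is now visible, a small check like the one below can help. This is a sketch using only the standard library; missing_tools is a hypothetical helper name, and it inspects the PATH of whoever runs the script, which may differ from the PATH the agent process sees.

```python
import shutil


def missing_tools(names):
    """Return the names from the given list that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]
```

An empty list from missing_tools(["iostat"]) suggests the sysstat install worked; restart the agent afterwards so it picks the tool up.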

Related

Cloud Foundry cli i/o timeout

I was able to successfully deploy BOSH and CF on GCP. I was able to install the cf cli on my worker machine and was able to cf login to the api endpoint without any issues. Now I am attempting to deploy a python and a node.js hello-world style application (cf push) but I am running into the following error:
Python:
**ERROR** Could not install python: Get https://buildpacks.cloudfoundry.org/dependencies/python/python-3.5.4-linux-x64-5c7aa3b0.tgz: dial tcp: lookup buildpacks.cloudfoundry.org on 169.254.0.2:53: read udp 10.255.61.196:36513->169.254.0.2:53: i/o timeout
Failed to compile droplet: Failed to run all supply scripts: exit status 14
NodeJS
-----> Nodejs Buildpack version 1.6.28
-----> Installing binaries
engines.node (package.json): unspecified
engines.npm (package.json): unspecified (use default)
**WARNING** Node version not specified in package.json. See: http://docs.cloudfoundry.org/buildpacks/node/node-tips.html
-----> Installing node 6.14.3
Download [https://buildpacks.cloudfoundry.org/dependencies/node/node-6.14.3-linux-x64-ae2a82a5.tgz]
**ERROR** Unable to install node: Get https://buildpacks.cloudfoundry.org/dependencies/node/node-6.14.3-linux-x64-ae2a82a5.tgz: dial tcp: lookup buildpacks.cloudfoundry.org on 169.254.0.2:53: read udp 10.255.61.206:34802->169.254.0.2:53: i/o timeout
Failed to compile droplet: Failed to run all supply scripts: exit status 14
I am able to download and ping the build pack urls manually on the worker machine, jumpbox, and the bosh vms so I believe DNS is working properly on each of those machine types.
As part of the default deployment, I believe a socks5 tunnel is created to allow communication from my worker machine to the jumpbox so this is where I believe the issue lies. https://docs.cloudfoundry.org/cf-cli/http-proxy.html
Running bbl print-env shows export BOSH_ALL_PROXY=ssh+socks5://jumpbox@35.192.140.0:22?private-key=/tmp/bosh-jumpbox725514160/bosh_jumpbox_private.key ; however, when I export https_proxy=socks5://jumpbox@35.192.140.0:22?private-key=/tmp/bosh-jumpbox389236516/bosh_jumpbox_private.key and do a cf push, I receive the following error:
Request error: Get https://api.cloudfoundry.costub.com/v2/info: proxy: SOCKS5 proxy at 35.192.140.0:22 has unexpected version 83
TIP: If you are behind a firewall and require an HTTP proxy, verify the https_proxy environment variable is correctly set. Else, check your network connection.
FAILED
Am I on the right track? Is my https_proxy variable formatted correctly? I also tried https_proxy=socks5://jumpbox@35.192.140.0:22 with the same result.
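One possible reading of the "unexpected version 83" message, offered as an assumption rather than a confirmed diagnosis: port 22 on the jumpbox is an SSH server, and an SSH server opens the conversation with a banner that starts with "SSH-". A SOCKS5 client expects the first byte of the reply to be the protocol version, 5. If it reads the first byte of an SSH banner instead, it sees 83, the ASCII code for "S". The sketch below only illustrates that arithmetic; the banner string is made up.

```python
SOCKS5_VERSION = 5


def first_reply_byte(reply):
    """Return the first byte of a server reply as an integer."""
    return reply[0]


# Illustrative SSH banner (hypothetical server string).
ssh_banner = b"SSH-2.0-example\r\n"
print(first_reply_byte(ssh_banner) == SOCKS5_VERSION)  # False: the byte is ord("S")
```

If that reading is right, a plain socks5:// URL pointed at port 22 would not work, because nothing on that port speaks SOCKS5.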

H2O + HDFS (Cloudera)

We have a Cloudera cluster up and running with an H2O instance, although it appears to be running off h2o.jar, which, as I understand it (please correct me if I'm wrong), is the standalone H2O. I can connect, but it will not load any files from our HDFS. (All of this I can see via 'ps' on the edge node.)
So I started an instance with h2odriver.jar
java -jar /path/to/h2odriver.jar -nodes 2 -mapperXmx 5g -output /my/hdfs/dir
I get several output/callback addresses:
[Possible callback IP address: 10.96.243.46:33728]
[Possible callback IP address: 127.0.0.1]
Using mapper->driver callback IP address and port: 10.96.243.46:33728
So I fire up python and try and connect (same thing happens if I use 10.96.243.46):
>>>h2o.connection(ip='127.0.0.1', port='33728')
and get
Connecting to H2O server at http://127.0.0.1:33728..... failed.
H2OConnectionError: Could not establish link to the H2O cloud http://127.0.0.1:33728 after 5 retries
...
Failed to establish a new connection: [Errno 111] Connection refused',))
The thing is, on the screen running the H2O jar/java job I can see:
MapperToDriverMessage: Read invalid type (G) from socket, ignoring...
MapperToDriverMessage: read: Unknown Type
I cannot figure out how to launch H2O in cluster mode and have it access our HDFS, or even how to connect to it. I can connect to the h2o.jar version, but that sees no HDFS (it can see the filesystem of the edge node). What is the proper way to launch H2O so that it can see the attached HDFS? (We are running Cloudera 5.7 in an enterprise environment, Python is 3.6, H2O is 3.10.0.6, and I know we have a ton of firewalls/security; I believe we are set up through LDAP.)
You are correct that h2o.jar is meant to be the standalone version of H2O which is not meant for connecting to HDFS.
Using the appropriate h2odriver.jar for your particular hadoop distribution is the way to go.
The correct beginner instructions can be found here:
go to http://www.h2o.ai/download/
choose H2O "Latest Stable Release"
choose tab "Install on Hadoop"
It says to run the following command:
hadoop jar h2odriver.jar -nodes 1 -mapperXmx 6g -output hdfsOutputDirName
[ Note this is "hadoop jar", not "java -jar" as written in the question. ]
You should see output like this:
Determining driver host interface for mapper->driver callback...
[Possible callback IP address: 172.16.2.181]
[Possible callback IP address: 127.0.0.1]
...
Waiting for H2O cluster to come up...
H2O node 172.16.2.188:54321 requested flatfile
Sending flatfiles to nodes...
[Sending flatfile to node 172.16.2.188:54321]
H2O node 172.16.2.188:54321 reports H2O cluster size 1
H2O cluster (1 nodes) is up
(Note: Use the -disown option to exit the driver after cluster formation)
Open H2O Flow in your web browser: http://172.16.2.188:54321
(Press Ctrl-C to kill the cluster)
Blocking until the H2O cluster shuts down...
Then point your web browser to the place where it says to "Open H2O Flow in your web browser".
(The other addresses in the output are diagnostics, and not for end users.)
In this case, the python connection command would be:
h2o.connect(ip = '172.16.2.188', port = 54321)
I recommend going to Flow in a web browser, starting to import a file by typing in "hdfs://", and seeing whether autocompletion works. If it does, your HDFS connection is working.
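Since the driver prints several addresses and only the "Open H2O Flow" one is meant for clients, it may help to pull that address out of the driver output programmatically. The following is a sketch with the standard library only; flow_endpoint is a hypothetical helper, and the pattern assumes the output format shown above.

```python
import re

# Matches the line the driver prints once the cluster is up.
FLOW_LINE = re.compile(
    r"Open H2O Flow in your web browser:\s*http://([^:/\s]+):(\d+)"
)


def flow_endpoint(driver_output):
    """Return (ip, port) from the 'Open H2O Flow' line, or None if absent."""
    match = FLOW_LINE.search(driver_output)
    if match is None:
        return None
    return match.group(1), int(match.group(2))
```

The resulting ip and port are the values to pass to h2o.connect, as in the command above. The "Possible callback IP address" lines are deliberately ignored.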

Informatica Live Data Map (LDM) installation

We are facing an issue while starting the Informatica cluster service.
When the Informatica cluster service starts, some scripts install Ambari server on infabde, bdemaster and bdeslave.
The script tries to install Ambari on infabde again and again in a loop, so the cluster service fails to start, saying that Ambari is already installed on infabde. It does not try to install on the other two nodes.
Error Log:
2017-01-12 17:10:30,763 [localhost-startStop-1] INFO com.infa.products.ihs.service.ambari.ScriptLauncher- Waiting for Script's streams to end.
2017-01-12 17:10:41,210 [localhost-startStop-1] ERROR com.infa.products.ihs.beans.application.ClusterListener- [InfaHadoopServiceException_00047] The launch of Ambari server on host [infabde.lucidtechsol.com] failed because the host already has an installed Ambari server. You can add the host to another cluster.
com.infa.products.ihs.service.exception.InfaHadoopServiceException: [InfaHadoopServiceException_00047] The launch of Ambari server on host [infabde.hostname.com] failed because the host already has an installed Ambari server. You can add the host to another cluster.
Run the reset script:
./ResetScript.sh true user@server.com user@server.com
./ResetScript.sh false user@server.com user@client.com
and then enable IHS.
ResetScript.sh can be found in services/Infahadoopserveice/binaries

Installing and Viewing Neo4j on Existing AWS EC2 Instance

I'm trying to install the Enterprise edition of Neo4j on an existing EC2 (Amazon Linux) instance. So far I've:
run wget "link to enterprise"
untarred the file
renamed and moved the folder to NEO4J_HOME
Then I went into neo4j.properties to make the following changes:
# Enable shell server so that remote clients can connect via Neo4j shell.
remote_shell_enabled=true
# The network interface IP the shell will listen on (use 0.0.0.0 for all interfaces)
remote_shell_host=127.0.0.1
# The port the shell will listen on, default is 1337
remote_shell_port=1337
EDITED Christophe Willemsen pointed out that for my original error, I had forgotten to restart the server at that point but I was still unable to access the web server while it was running. So to make it more clear, I've edited the remaining post:
I went to neo4j-server.properties and uncommented:
org.neo4j.server.webserver.address=0.0.0.0
And started the server:
NEO4J_HOME/bin/neo4j start
WARNING: Max 1024 open files allowed, minimum of 40 000 recommended. See the Neo4j manual.
Using additional JVM arguments: -server -XX:+DisableExplicitGC -Dorg.neo4j.server.properties=conf/neo4j-server.properties -Djava.util.logging.config.file=conf/logging.properties -Dlog4j.configuration=file:conf/log4j.properties -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:-OmitStackTraceInFastThrow
Starting Neo4j Server...WARNING: not changing user
process [28557]... waiting for server to be ready..... OK.
http://localhost:7474/ is ready.
checking the status:
NEO4J_HOME/bin/neo4j status
Neo4j Server is running at pid 28557
I can run the shell, but when I go to localhost:7474 I still cannot connect.
Any help would be appreciated. The only tutorials I've found assumed I was starting from scratch with a new instance. If someone could provide instructions for installing, or fix my configuration, that would be great.
Thanks!
You have to edit neo4j-server.properties and uncomment the line with:
org.neo4j.server.webserver.address=0.0.0.0
So that the db listens on an external interface not just localhost, and you have to open the port (7474) in your firewall rules.
Make sure to secure access to the db though:
http://neo4j.com/docs/stable/security-server.html
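To verify that the setting is really active (uncommented) in the file the server reads, a quick check like the following can be run against the contents of neo4j-server.properties. It is a sketch using plain string handling; webserver_address is a hypothetical helper name, and it only handles simple key=value lines.

```python
KEY = "org.neo4j.server.webserver.address"


def webserver_address(properties_text):
    """Return the active value of the webserver address key, or None."""
    value = None
    for line in properties_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and commented-out settings.
        if not stripped or stripped.startswith("#"):
            continue
        name, sep, rest = stripped.partition("=")
        if sep and name.strip() == KEY:
            value = rest.strip()  # the last occurrence wins
    return value
```

A result of None means the line is missing or still commented out; "0.0.0.0" means the web server should listen on all interfaces after a restart.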

Problems getting RabbitMQ and Django-Celery Running: Target Machine actively refused connection

I am trying to get Django-Celery running on my Django App. I cannot get the worker server to run. When I try I get the message: No Connection could be made because the target machine actively refused it
Here is what I have done so far. First, I installed the django celery package: http://pypi.python.org/pypi/django-celery
I can load it into python without problems. I also installed the RabbitMQ server per the windows install instructions: http://www.rabbitmq.com/install.html#windows
Starting the Python tutorials on the RabbitMQ site, I saw the need to install pika: http://pypi.python.org/pypi/pika. It imports without any problems.
From there I start the RabbitMQ server by running this at the command line: rabbitmq-service start
I get the message back that Service RabbitMQ started
Here is where I start to have problems.
I attempted the first steps in django-celery: http://packages.python.org/django-celery/getting-started/first-steps-with-django.html and the "hello world" example on the rabbitMQ site: http://www.rabbitmq.com/tutorials/tutorial-one-python.html
In both cases I get the message: No Connection could be made because the target machine actively refused it
My first thought was that this sounded like a firewall problem. So I went into the windows 7 firewall and added inbound and outbound rules to open the local and remote ports 5672 and 5673 to TCP protocol, but I still get the same error message.
When I run rabbitmqctl status I get the message:
Error: unable to connect to node 'rabbit@hostname': nodedown
diagnostics:
- nodes and their ports on hostname: [{rabbitmqctl18856, 505031}]
Does that mean that it is trying to operate on those ports? What about the default 5672?
Any suggestions?
UPDATE: This was actually a problem resulting from several failed RabbitMQ installs conflicting with the latest installation. If you have to remove RabbitMQ, use the 'rabbitmq-service remove' command and not SC DELETE, which caused a lot of problems for me; I had to go in and clean up my Windows registry.
The nodedown error reported by rabbitmqctl suggests that the server isn't running on that machine.
Try going through the steps in RabbitMQ's troubleshooting guide. In particular, pay close attention to the logs. Has the server crashed for some reason? Could you post the logs somewhere?
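An "actively refused" error usually means the connection attempt reached the machine but nothing was listening on that port, which is different from a firewall silently dropping packets. The sketch below, assuming only the Python standard library and a hypothetical helper name probe, separates the two cases for the default AMQP port 5672 mentioned in the question.

```python
import socket


def probe(host, port, timeout=3.0):
    """Classify a TCP connection attempt as 'open', 'refused' or 'unreachable'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but no process is listening on this port.
        return "refused"
    except OSError:
        # Timeouts, routing failures, name resolution errors and similar.
        return "unreachable"
```

If probe("localhost", 5672) reports "refused" right after rabbitmq-service start, that is consistent with the broker not actually running, and the logs are the next place to look.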