Informatica Live Data Map (LDM) installation

We are facing an issue while starting the Informatica cluster service.
When starting the Informatica cluster service, a script installs the Ambari server on infabde, bdemaster, and bdeslave.
The script tries to install Ambari on infabde again and again in a loop, so the cluster service fails to start, reporting that Ambari is already installed on infabde. It never tries to install on the other two nodes.
Error Log:
2017-01-12 17:10:30,763 [localhost-startStop-1] INFO com.infa.products.ihs.service.ambari.ScriptLauncher- Waiting for Script's streams to end.
2017-01-12 17:10:41,210 [localhost-startStop-1] ERROR com.infa.products.ihs.beans.application.ClusterListener- [InfaHadoopServiceException_00047] The launch of Ambari server on host [infabde.lucidtechsol.com] failed because the host already has an installed Ambari server. You can add the host to another cluster.
com.infa.products.ihs.service.exception.InfaHadoopServiceException: [InfaHadoopServiceException_00047] The launch of Ambari server on host [infabde.hostname.com] failed because the host already has an installed Ambari server. You can add the host to another cluster.

Run the reset script:
./ResetScript.sh true user@server.com user@server.com
./ResetScript.sh false user@server.com user@client.com
and then enable IHS.
ResetScript.sh can be found in services/InfaHadoopService/binaries.
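For context, a minimal sketch of the sequence, assuming a default Informatica install layout ($INFA_HOME, the argument values, and the re-enable step are assumptions based on the examples above):
cd $INFA_HOME/services/InfaHadoopService/binaries    # script location given above
./ResetScript.sh true user@server.com user@server.com    # arguments as in the examples above
# then re-enable IHS (e.g., from the Informatica Administrator console)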

Related

Cloud Foundry cli i/o timeout

I was able to successfully deploy BOSH and CF on GCP. I was able to install the cf cli on my worker machine and was able to cf login to the api endpoint without any issues. Now I am attempting to deploy a python and a node.js hello-world style application (cf push) but I am running into the following error:
Python:
**ERROR** Could not install python: Get https://buildpacks.cloudfoundry.org/dependencies/python/python-3.5.4-linux-x64-5c7aa3b0.tgz: dial tcp: lookup buildpacks.cloudfoundry.org on 169.254.0.2:53: read udp 10.255.61.196:36513->169.254.0.2:53: i/o timeout
Failed to compile droplet: Failed to run all supply scripts: exit status 14
NodeJS:
-----> Nodejs Buildpack version 1.6.28
-----> Installing binaries
engines.node (package.json): unspecified
engines.npm (package.json): unspecified (use default)
**WARNING** Node version not specified in package.json. See: http://docs.cloudfoundry.org/buildpacks/node/node-tips.html
-----> Installing node 6.14.3
Download [https://buildpacks.cloudfoundry.org/dependencies/node/node-6.14.3-linux-x64-ae2a82a5.tgz]
**ERROR** Unable to install node: Get https://buildpacks.cloudfoundry.org/dependencies/node/node-6.14.3-linux-x64-ae2a82a5.tgz: dial tcp: lookup buildpacks.cloudfoundry.org on 169.254.0.2:53: read udp 10.255.61.206:34802->169.254.0.2:53: i/o timeout
Failed to compile droplet: Failed to run all supply scripts: exit status 14
I am able to download and ping the buildpack URLs manually on the worker machine, the jumpbox, and the BOSH VMs, so I believe DNS is working properly on each of those machines.
As part of the default deployment, I believe a SOCKS5 tunnel is created to allow communication from my worker machine to the jumpbox, so this is where I believe the issue lies: https://docs.cloudfoundry.org/cf-cli/http-proxy.html
Running bbl print-env shows export BOSH_ALL_PROXY=ssh+socks5://jumpbox@35.192.140.0:22?private-key=/tmp/bosh-jumpbox725514160/bosh_jumpbox_private.key. However, when I export https_proxy=socks5://jumpbox@35.192.140.0:22?private-key=/tmp/bosh-jumpbox389236516/bosh_jumpbox_private.key and do a cf push, I receive the following error:
Request error: Get https://api.cloudfoundry.costub.com/v2/info: proxy: SOCKS5 proxy at 35.192.140.0:22 has unexpected version 83
TIP: If you are behind a firewall and require an HTTP proxy, verify the https_proxy environment variable is correctly set. Else, check your network connection.
FAILED
Am I on the right track? Is my https_proxy variable formatted correctly? I also tried https_proxy=socks5://jumpbox@35.192.140.0:22 with the same result.
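One observation, offered as an assumption: "unexpected version 83" is ASCII for "S", the first byte of an SSH banner, which suggests the CLI is speaking SOCKS5 directly to the SSH port rather than through a tunnel. The ssh+socks5 scheme and the private-key query parameter are understood by BOSH tooling via BOSH_ALL_PROXY, not by the cf CLI's https_proxy. A possible workaround sketch: open the SOCKS5 tunnel yourself with ssh -D and point https_proxy at its local end (the local port 1080 is a placeholder; the key path is the one printed by bbl print-env):
ssh -N -D 1080 -i /tmp/bosh-jumpbox725514160/bosh_jumpbox_private.key jumpbox@35.192.140.0 &    # local SOCKS5 listener on 1080
export https_proxy=socks5://localhost:1080
cf push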

Not able to access HDFS

I installed the Cloudera VM and started trying some basic things. First I just wanted to ls the HDFS directories, so I issued the command below.
[cloudera@quickstart ~]$ hadoop fs -ls /
ls: Failed on local exception: java.net.SocketException: Network is unreachable; Host Details : local host is: "quickstart.cloudera/10.0.2.15"; destination host is: "quickstart.cloudera":8020;
though ps -fu hdfs says both the namenode and the datanode are running. I checked the status using the service command.
[cloudera@quickstart ~]$ sudo service hadoop-hdfs-namenode status
Hadoop namenode is not running [FAILED]
Thinking all the problems would be resolved if I restarted all the services, I executed the command below.
[cloudera@quickstart conf]$ sudo /home/cloudera/cloudera-manager --express --force
[QuickStart] Shutting down CDH services via init scripts...
[QuickStart] Disabling CDH services on boot...
[QuickStart] Starting Cloudera Manager daemons...
[QuickStart] Waiting for Cloudera Manager API...
[QuickStart] Configuring deployment...
Submitted jobs: 92
[QuickStart] Deploying client configuration...
Submitted jobs: 93
[QuickStart] Starting Cloudera Management Service...
Submitted jobs: 101
[QuickStart] Enabling Cloudera Manager daemons on boot...
Now I thought all services would be up, so I checked the status of the namenode service again. Again it came back as failed.
[cloudera@quickstart ~]$ sudo service hadoop-hdfs-namenode status
Hadoop namenode is not running [FAILED]
Now I decided to manually stop and start the namenode service. Again, not much use.
[cloudera@quickstart ~]$ sudo service hadoop-hdfs-namenode stop
no namenode to stop
Stopped Hadoop namenode: [ OK ]
[cloudera@quickstart ~]$ sudo service hadoop-hdfs-namenode status
Hadoop namenode is not running [FAILED]
[cloudera@quickstart ~]$ sudo service hadoop-hdfs-namenode start
starting namenode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-namenode-quickstart.cloudera.out
Failed to start Hadoop namenode. Return value: 1 [FAILED]
I checked the file /var/log/hadoop-hdfs/hadoop-hdfs-namenode-quickstart.cloudera.out. It just said the following:
log4j:ERROR Could not find value for key log4j.appender.RFA
log4j:ERROR Could not instantiate appender named "RFA".
I also checked /var/log/hadoop-hdfs/hadoop-cmf-hdfs-NAMENODE-quickstart.cloudera.log.out and found the following when I searched for errors. Can anyone suggest the best way to get the services back on track? Unfortunately I am not able to access Cloudera Manager from the browser. Is there anything I can do from the command line?
2016-02-24 21:02:48,105 WARN com.cloudera.cmf.event.publish.EventStorePublisherWithRetry: Failed to publish event: SimpleEvent{attributes={ROLE_TYPE=[NAMENODE], CATEGORY=[LOG_MESSAGE], ROLE=[hdfs-NAMENODE], SEVERITY=[IMPORTANT], SERVICE=[hdfs], HOST_IDS=[quickstart.cloudera], SERVICE_TYPE=[HDFS], LOG_LEVEL=[WARN], HOSTS=[quickstart.cloudera], EVENTCODE=[EV_LOG_EVENT]}, content=Only one image storage directory (dfs.namenode.name.dir) configured. Beware of data loss due to lack of redundant storage directories!, timestamp=1456295437905} - 1 of 17 failure(s) in last 79302s
java.io.IOException: Error connecting to quickstart.cloudera/10.0.2.15:7184
at com.cloudera.cmf.event.shaded.org.apache.avro.ipc.NettyTransceiver.getChannel(NettyTransceiver.java:249)
at com.cloudera.cmf.event.shaded.org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:198)
at com.cloudera.cmf.event.shaded.org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:133)
at com.cloudera.cmf.event.publish.AvroEventStorePublishProxy.checkSpecificRequestor(AvroEventStorePublishProxy.java:122)
at com.cloudera.cmf.event.publish.AvroEventStorePublishProxy.publishEvent(AvroEventStorePublishProxy.java:196)
at com.cloudera.cmf.event.publish.EventStorePublisherWithRetry$PublishEventTask.run(EventStorePublisherWithRetry.java:242)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketException: Network is unreachable
You can try this:
Check which process is using the namenode's port 7184 (e.g., with the Linux netstat command), kill it, and then restart.
Or
Change your namenode port in the configuration and restart Hadoop.
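A minimal sketch of the first suggestion (<PID> is a placeholder for whatever process id netstat reports):
sudo netstat -tlnp | grep 7184    # the PID/program appears in the last column
sudo kill <PID>                   # replace <PID> with the process id found above
sudo service hadoop-hdfs-namenode restart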

Datastax Agent (Cassandra) Opscenter setup issue

I've set up OpsCenter on one of the Cassandra cluster nodes. After installation, when setting up my cluster, I tried installing the DataStax agent on all the cluster nodes via the UI, but it failed, so I had to install the agents manually.
After manually installing the agents, the node on which OpsCenter is installed is able to connect, but the other nodes are not. It still says "2 agents failed to connect". What could be the issue?
PS: My Cassandra cluster is set up on AWS on Ubuntu.
My agent.log file looks like this:
ERROR [os-metrics-9] 2015-07-27 07:04:43,390 Long os-stats collector failed: Cannot run program "iostat": error=2, No such file or directory
ERROR [os-metrics-7] 2015-07-27 07:04:43,391 Long os-stats collector failed: Cannot run program "iostat": error=2, No such file or directory
ERROR [os-metrics-8] 2015-07-27 07:04:53,391 Long os-stats collector failed: Cannot run program "iostat": error=2, No such file or directory
ERROR [os-metrics-3] 2015-07-27 07:04:53,392 Long os-stats collector failed: Cannot run program "iostat": error=2, No such file or directory
ERROR [StompConnection receiver] 2015-07-27 07:05:02,946 failed connecting to **.**.**.**:61620:java.net.ConnectException: Connection timed out
You have to set the stomp_interface in address.yaml, like:
stomp_interface: <ip-address>
After an agent restart it should connect.
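A minimal sketch, assuming the packaged agent's default config path on Ubuntu and a placeholder IP for the OpsCenter machine's internal address:
echo "stomp_interface: 10.0.0.5" | sudo tee -a /var/lib/datastax-agent/conf/address.yaml    # 10.0.0.5 is a placeholder; edit the file directly if the key already exists
sudo service datastax-agent restart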
As your agent was able to connect from the same box where OpsCenter is installed, it sounds like:
You might not have configured your firewall properly. Please try disabling the firewall on all your boxes (see the port sketch after this list).
You may have multiple interfaces and the C* installation picked up an undesired interface, so run the ifconfig or ip command on all of your instances and check against the C* yaml.
About the iostat failure message: you have not installed the sysstat package. It seems you did not install the dependencies as part of the DSE install.
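Following up on the firewall point, a minimal sketch using ufw (assuming Ubuntu, as the poster mentions; on AWS the same ports must also be open in the security group):
sudo ufw allow 61620/tcp    # stomp_port: agents connect in to opscenterd
sudo ufw allow 61621/tcp    # api_port: opscenterd connects out to each agent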
The agent uses iostat to collect some information from disks. If it can't find it you will get that error, but it just means some OS metrics will be missing (likely a lot of disk and CPU metrics).
These are some useful configurations to keep in mind when starting the agent manually, set in the conf/address.yaml file:
###A name for the node to use as a label throughout OpsCenter.
alias:
###Reachable IP address of the opscenterd machine. The connection made will be on stomp_port. Internal IP in this case
stomp_interface:
###Port for the agent's HTTP service (default: 61621).
#api_port: 61621
###The stomp_port used by opscenterd. == Must match with the 'incoming_port' in opscenter.conf
stomp_port: 61620
###The IP used to identify the node.
local_interface: 100.73.158.44
###The IP that the agent HTTP server listens on.
agent_rpc_interface:
###Host used to connect to local JMX server.
jmx_host: 100.73.158.44
###Whether or not to use SSL communication between the agent and opscenterd.
use_ssl: 1
To solve the "Cannot run program 'iostat'" error, do this:
sudo apt-get install sysstat

Installing and Viewing Neo4j on Existing AWS EC2 Instance

I'm trying to install the enterprise edition of Neo4j on an existing EC2 (Amazon Linux) instance. So far I've:
wget "link to enterprise"
untarred the file
renamed and moved the folder to NEO4J_HOME
then went into the config file neo4j.properties to make the following changes:
# Enable shell server so that remote clients can connect via Neo4j shell.
remote_shell_enabled=true
# The network interface IP the shell will listen on (use 0.0.0.0 for all interfaces)
remote_shell_host=127.0.0.1
# The port the shell will listen on, default is 1337
remote_shell_port=1337
EDITED: Christophe Willemsen pointed out that for my original error I had forgotten to restart the server at that point, but I was still unable to access the web server while it was running. So, to make it clearer, I've edited the rest of the post:
I went to neo4j-server.properties and uncommented:
org.neo4j.server.webserver.address=0.0.0.0
And started the server:
NEO4J_HOME/bin/neo4j start
WARNING: Max 1024 open files allowed, minimum of 40 000 recommended. See the Neo4j manual.
Using additional JVM arguments: -server -XX:+DisableExplicitGC -Dorg.neo4j.server.properties=conf/neo4j-server.properties -Djava.util.logging.config.file=conf/logging.properties -Dlog4j.configuration=file:conf/log4j.properties -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:-OmitStackTraceInFastThrow
Starting Neo4j Server...WARNING: not changing user
process [28557]... waiting for server to be ready..... OK.
http://localhost:7474/ is ready.
checking the status:
NEO4J_HOME/bin/neo4j status
Neo4j Server is running at pid 28557
I can run the shell, but when I go to localhost:7474 I still cannot connect.
Any help would be appreciated. The only tutorials or help I've found assume I'm starting from scratch with a new instance. If someone could provide instructions for installing, or fix my configuration, that would be great.
Thanks!
You have to edit neo4j-server.properties and uncomment the line:
org.neo4j.server.webserver.address=0.0.0.0
so that the db listens on an external interface, not just localhost, and you have to open the port (7474) in your firewall rules.
Make sure to secure access to the db, though:
http://neo4j.com/docs/stable/security-server.html
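On EC2 the firewall rule usually means the instance's security group; a hypothetical AWS CLI sketch (the group ID and source CIDR are placeholders; restrict the range rather than opening 7474 to the world):
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 7474 \
  --cidr 203.0.113.0/24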

Problems getting RabbitMQ and Django-Celery Running: Target Machine actively refused connection

I am trying to get Django-Celery running in my Django app. I cannot get the worker server to run. When I try, I get the message: No connection could be made because the target machine actively refused it.
Here is what I have done so far. First, I installed the django celery package: http://pypi.python.org/pypi/django-celery
I can load it into Python without problems. I also installed the RabbitMQ server per the Windows install instructions: http://www.rabbitmq.com/install.html#windows
Starting the tutorials in Python on the RabbitMQ site, I saw the need to install pika: http://pypi.python.org/pypi/pika. It imports without any problems.
From there I started the RabbitMQ server by running this at the command line: rabbitmq-service start
I got the message back that Service RabbitMQ started.
Here is where I start to have problems.
I attempted the first steps in django-celery: http://packages.python.org/django-celery/getting-started/first-steps-with-django.html and the "hello world" example on the rabbitMQ site: http://www.rabbitmq.com/tutorials/tutorial-one-python.html
In both cases I get the message: No Connection could be made because the target machine actively refused it
My first thought was that this sounded like a firewall problem, so I went into the Windows 7 firewall and added inbound and outbound rules to open local and remote ports 5672 and 5673 for TCP, but I still get the same error message.
When I run rabbitmqctl status I get the message:
Error: unable to connect to node 'rabbit@hostname': nodedown
diagnostics:
- nodes and their ports on hostname: [{rabbitmqctl18856, 505031}]
Does that mean it is trying to operate on those ports? What about the default 5672?
Any suggestions?
UPDATE: This was actually a problem resulting from several failed RabbitMQ installs conflicting with the latest installation. If you have to remove RabbitMQ, use the rabbitmq-service remove command and not SC DELETE, which caused a lot of problems for me; I had to go in and clean up my Windows registry.
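A minimal sketch of the clean reinstall the update describes, run from an elevated command prompt with RabbitMQ's sbin directory on the PATH (the exact sequence is an assumption based on the update above):
rabbitmq-service stop
rabbitmq-service remove
rabbitmq-service install
rabbitmq-service start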
The nodedown error indicated by rabbitmqctl suggests that the server isn't running on that machine.
Try going through the steps in RabbitMQ's troubleshooting guide. In particular, pay close attention to the logs. Has the server crashed for some reason? Could you post the logs somewhere?
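For a Windows service install, the logs usually live under the RabbitMQ base directory of the user who installed the service; a hypothetical sketch (the file name depends on your node name):
type "%APPDATA%\RabbitMQ\log\rabbit@hostname.log"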