Connectivity between NiFi and the HDP Hortonworks sandbox running on VirtualBox - hdfs

I'm running a virtual machine in VirtualBox that contains the HDP 3.0.1 Sandbox. In Docker I'm running a NiFi image that maps to localhost:8443 (I remapped port 8080 to 8443 with Docker).
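For reference, the container was started along these lines (the image name and tag are assumptions on my part; NiFi's HTTP UI listens on 8080 inside the container):
# Run NiFi and publish its internal HTTP port 8080 as 8443 on the host
docker run -d --name nifi -p 8443:8080 apache/nifi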
I can't put files into HDFS; the connection is refused. When I try to load a file into HDP's HDFS I always receive the following error:
[screenshot: connection refused error]
Hortonworks runs on this port:
[screenshot: Hortonworks port list]
Using a bridged network in VirtualBox, I see the following (I've overridden hdp-sandbox.hortonworks.com in hdfs-site.xml and core-site.xml and then copied them to the NiFi working folder):
[screenshot: connection error]
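Roughly, the override looked like this (the bridged IP and the container's conf path are placeholders for my setup):
# Map the sandbox hostname to the VM's bridged IP inside the NiFi container
# (may require running as root inside the container)
docker exec -u root nifi bash -c 'echo "192.168.1.50 hdp-sandbox.hortonworks.com" >> /etc/hosts'
# Copy the edited Hadoop client configs into NiFi's working folder
docker cp core-site.xml nifi:/opt/nifi/nifi-current/conf/
docker cp hdfs-site.xml nifi:/opt/nifi/nifi-current/conf/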
How can I solve this error?
Thank you!

Related

H2O + HDFS (Cloudera)

We have a Cloudera cluster up and running with an H2O instance, although it appears to be running off h2o.jar, which (as I understand it; please correct me if I'm wrong) is the standalone H2O. I can connect, but it will not load any files from our HDFS. (All of this I can see via 'ps' on the edge node.)
So I started an instance with h2odriver.jar
java -jar /path/to/h2odriver.jar -nodes 2 -mapperXmx 5g -output /my/hdfs/dir
I get several output/callback addresses:
[Possible callback IP address: 10.96.243.46:33728]
[Possible callback IP address: 127.0.0.1]
Using mapper->driver callback IP address and port: 10.96.243.46:33728
So I fire up Python and try to connect (the same thing happens if I use 10.96.243.46):
>>> h2o.connect(ip='127.0.0.1', port='33728')
and get
Connecting to H2O server at http://127.0.0.1:33728..... failed.
H2OConnectionError: Could not establish link to the H2O cloud http://127.0.0.1:33728 after 5 retries
...
Failed to establish a new connection: [Errno 111] Connection refused
The thing is, on the screen with the H2O jar/java job I can see:
MapperToDriverMessage: Read invalid type (G) from socket, ignoring...
MapperToDriverMessage: read: Unknown Type
I cannot figure out how to launch H2O in cluster mode so that it can access our HDFS, or even connect at all. I can connect to the h2o.jar version, but that sees no HDFS (it can only see the filesystem of the edge node). What is the proper way to launch H2O so that it can see the attached HDFS? (We are running Cloudera 5.7 in an enterprise environment; Python is 3.6 and H2O is 3.10.0.6. I know we have a ton of firewalls/security; I believe we are set up through LDAP.)
You are correct that h2o.jar is the standalone version of H2O, which is not meant for connecting to HDFS.
Using the appropriate h2odriver.jar for your particular Hadoop distribution is the way to go.
The correct beginner instructions can be found here:
go to http://www.h2o.ai/download/
choose H2O "Latest Stable Release"
choose tab "Install on Hadoop"
It says to run the following command:
hadoop jar h2odriver.jar -nodes 1 -mapperXmx 6g -output hdfsOutputDirName
[ Note this is "hadoop jar", not "java -jar" as written in the question. ]
You should see output like this:
Determining driver host interface for mapper->driver callback...
[Possible callback IP address: 172.16.2.181]
[Possible callback IP address: 127.0.0.1]
...
Waiting for H2O cluster to come up...
H2O node 172.16.2.188:54321 requested flatfile
Sending flatfiles to nodes...
[Sending flatfile to node 172.16.2.188:54321]
H2O node 172.16.2.188:54321 reports H2O cluster size 1
H2O cluster (1 nodes) is up
(Note: Use the -disown option to exit the driver after cluster formation)
Open H2O Flow in your web browser: http://172.16.2.188:54321
(Press Ctrl-C to kill the cluster)
Blocking until the H2O cluster shuts down...
Then point your web browser to the place where it says to "Open H2O Flow in your web browser".
(The other addresses in the output are diagnostics, and not for end users.)
In this case, the python connection command would be:
h2o.connect(ip = '172.16.2.188', port = 54321)
I recommend going to Flow in a web browser, starting a file import by typing in "hdfs://", and seeing if autocompletion works. If it does, your HDFS connection is working.
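If autocompletion fails, it can also be worth confirming from the same node that HDFS itself is reachable, independent of H2O; a quick check might look like:
# List the HDFS root to confirm basic connectivity
hadoop fs -ls /
# Confirm the -output directory from the launch command is visible
hadoop fs -ls /my/hdfs/dir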

Informatica Live Data Map (LDM) installation

We are facing an issue while starting the Informatica cluster service.
When the cluster service starts, scripts install the Ambari server on infabde, bdemaster, and bdeslave.
The script tries to install Ambari on infabde again and again in a loop, so the cluster service fails to start, saying that Ambari is already installed on infabde. It never tries to install on the other two nodes.
Error Log:
2017-01-12 17:10:30,763 [localhost-startStop-1] INFO com.infa.products.ihs.service.ambari.ScriptLauncher- Waiting for Script's streams to end.
2017-01-12 17:10:41,210 [localhost-startStop-1] ERROR com.infa.products.ihs.beans.application.ClusterListener- [InfaHadoopServiceException_00047] The launch of Ambari server on host [infabde.lucidtechsol.com] failed because the host already has an installed Ambari server. You can add the host to another cluster.
com.infa.products.ihs.service.exception.InfaHadoopServiceException: [InfaHadoopServiceException_00047] The launch of Ambari server on host [infabde.hostname.com] failed because the host already has an installed Ambari server. You can add the host to another cluster.
Run the reset script:
./ResetScript.sh true user@server.com user@server.com
./ResetScript.sh false user@server.com user@client.com
and then enable IHS.
ResetScript.sh can be found in services/Infahadoopservice/binaries.
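Before re-enabling IHS, it may also be worth confirming the Ambari state on each node; a rough check over the hosts from the question could look like this:
# Check whether ambari-server responds on each host
for h in infabde bdemaster bdeslave; do
  ssh "$h" 'ambari-server status 2>/dev/null || echo "ambari-server not installed"'
done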

Installing and Viewing Neo4j on Existing AWS EC2 Instance

I'm trying to install the Enterprise edition of Neo4j on an existing EC2 (Amazon Linux) instance. So far I've:
wget "link to enterprise"
untarred the file
renamed and moved the folder to NEO4J_HOME
then edited neo4j.properties to make the following changes:
# Enable shell server so that remote clients can connect via Neo4j shell.
remote_shell_enabled=true
# The network interface IP the shell will listen on (use 0.0.0.0 for all interfaces)
remote_shell_host=127.0.0.1
# The port the shell will listen on, default is 1337
remote_shell_port=1337
EDITED: Christophe Willemsen pointed out that for my original error I had forgotten to restart the server at that point, but I was still unable to access the web server while it was running. So, to make it clearer, I've edited the rest of the post:
I went to neo4j-server.properties and uncommented:
org.neo4j.server.webserver.address=0.0.0.0
And start the server
NEO4J_HOME/bin/neo4j start
WARNING: Max 1024 open files allowed, minimum of 40 000 recommended. See the Neo4j manual.
Using additional JVM arguments: -server -XX:+DisableExplicitGC -Dorg.neo4j.server.properties=conf/neo4j-server.properties -Djava.util.logging.config.file=conf/logging.properties -Dlog4j.configuration=file:conf/log4j.properties -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:-OmitStackTraceInFastThrow
Starting Neo4j Server...WARNING: not changing user
process [28557]... waiting for server to be ready..... OK.
http://localhost:7474/ is ready.
checking the status:
NEO4J_HOME/bin/neo4j status
Neo4j Server is running at pid 28557
I can run the shell, but when I go to localhost:7474 I still cannot connect.
Any help would be appreciated. The only tutorials I've found assume starting from scratch with a new instance. If someone could provide instructions for installing, or fix my configuration, that would be great.
Thanks!
You have to edit neo4j-server.properties and uncomment the line:
org.neo4j.server.webserver.address=0.0.0.0
so that the db listens on an external interface, not just localhost. You also have to open port 7474 in your firewall rules.
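A sketch of what verifying and opening the port can look like on an EC2 instance (the security group ID and source CIDR are placeholders):
# Confirm Neo4j now listens on all interfaces, not just 127.0.0.1
sudo netstat -tlnp | grep 7474
# Allow inbound 7474 in the instance's security group via the AWS CLI
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 7474 --cidr 203.0.113.0/24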
Make sure to secure access to the db though:
http://neo4j.com/docs/stable/security-server.html

How to start the PostgreSQL server on Ubuntu 12.04

I have just installed PostgreSQL 9.1 on an Ubuntu 12.04 server (hosted by Amazon AWS). When I try to launch the psql command, the following error message shows up:
psql: could not connect to server: No such file or directory Is the
server running locally and accepting connections on Unix domain
socket "/var/run/postgresql/.s.PGSQL.5432"?
After searching the web, I found that I have to start the server before using it. Following this initdb link, I still cannot use the PostgreSQL database. Is there any further work (like configuration) I should do to start the server?
I tried to start the service: service postgresql start
Another error message shows:
No PostgreSQL clusters exist; see "man pg_createcluster"
I received this message running a new installation of Postgres 9.3 on Ubuntu 11.04. The full message was:
$ sudo /etc/init.d/postgresql start
Error: Cannot stat /var/run/postgresql
* No PostgreSQL clusters exist; see "man pg_createcluster"
It turned out that the /var/run/postgresql directory did not exist, and it is in that directory that it was attempting to create a file with the process ID. I created the directory as root, made the "postgres" user the owner, and was able to start the server.
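Concretely, the fix was along these lines:
# Recreate the missing runtime directory and hand it to the postgres user
sudo mkdir -p /var/run/postgresql
sudo chown postgres:postgres /var/run/postgresql
sudo /etc/init.d/postgresql start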
Further explanation found here:
http://www.postgresql.org/message-id/21044.1326496507@sss.pgh.pa.us
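If the directory exists but no cluster was ever initialized, explicitly creating one (as the error message suggests) may also help; the version below assumes the 9.1 packages from the question:
# Create and start a default cluster named "main"
sudo pg_createcluster 9.1 main --start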

Unable to point VMC to my local cloud: HTTP exception: Errno::ECONNREFUSED:No connection could be made because the target machine actively refused it

WHAT AM I TRYING TO DO
Trying to set up VCAP on an Ubuntu Server VM on my machine by following the steps at https://github.com/cloudfoundry/vcap/
WHAT IS THE ISSUE
Things seemed to be working fine, but at step 5 (https://github.com/cloudfoundry/vcap/#step-5-validate-that-you-can-connect-and-tests-pass) I got an exception while trying to execute the following command: vmc target api.vcap.me
The exception that I see on my console is:
Host is not available or is not valid: 'http://api.vcap.me'
Would you like see the response? [yN]: y
HTTP exception: Errno::ECONNREFUSED:No connection could be made because the target machine actively refused it. - connect(2)
ANY OTHER RELEVANT INFO
For some earlier experiments I was using MicroCloud (provided as a download by Cloud Foundry). I am having issues pointing my VMC to this MicroCloud as well.
On the Micro Cloud console I see the following message:
To access your Micro Cloud Foundry instance, use:
vmc target http://api.agoel.cloudfoundry.me
When I run this vmc command from the Ruby command prompt set up on my Windows 7 machine, I get the following error:
Host is not available or is not valid: 'http://api.agoel.cloudfoundry.me'
Would you like see the response? [yN]: y
HTTP exception: Errno::ETIMEDOUT:A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. - connect(2)
WHAT DOES VMC INFO DISPLAY
I ran the vmc info command at the command prompt. It displayed the following:
VMware's Cloud Application Platform
For support visit http://support.cloudfoundry.com
Target: http://api.cloudfoundry.com (v0.999)
Client: v0.3.18
User: ankitgoel1987@gmail.com
Usage: Memory (1.1G of 2.0G total)
Services (2 of 16 total)
Apps (2 of 20 total)
MY SETUP DETAILS
Windows7 running on 4GB RAM
MicroCloud from Cloud Foundry already installed (this was done as part of some other exercise; my recent experiment requires me to set up an Ubuntu server with VCAP on it, so this MicroCloud should not really matter)
vmc 0.3.18 (installed on my Windows7 machine)
ruby 1.9.2p290 (2011-07-09) [i386-mingw32]
Add the following entry to your hosts file:
IP_of_ubuntu_server vcap.me api.vcap.me
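On the Windows machine running vmc, that entry goes in C:\Windows\System32\drivers\etc\hosts; the IP below is a placeholder for the Ubuntu server's address:
192.168.56.101 vcap.me api.vcap.me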
If you want to avoid editing your hosts file every time you deploy a new app, then depending on what virtualisation platform you are using, you may be able to forward all traffic on port 80 from your own computer on to the VM.
*.vcap.me is set to resolve to 127.0.0.1, so this is an ideal solution. To do this, set the network settings to NAT rather than Bridged (maybe you have done this already) and then set port 80 to forward to the IP of the guest OS. In VMware Fusion, for example, this is as simple as editing a settings file.
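For example, with VMware Fusion the forwarding rule lives in the vmnet8 NAT configuration; a sketch, with the guest IP as a placeholder:
# /Library/Preferences/VMware Fusion/vmnet8/nat.conf
[incomingtcp]
# forward host port 80 to the guest VM
80 = 192.168.78.128:80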