I am executing the Sqoop command below to copy a table from an AWS RDS instance over to HDFS.
#!/bin/bash
sqoop import \
--connect jdbc:mysql://awsrds.cpclxrkdvwmz.us-east-1.rds.amazonaws.com/financials_data \
--username someuser \
--password somepwd \
--table member_score \
--m 1 \
--target-dir /capstone/member_score
I can connect to this server using MySQL Workbench, but Sqoop fails to pull the data.
The stack trace is shown below:
[ec2-user@ip-10-0-0-238 capstone]$ ./DataIngestion.txt
Warning: /opt/cloudera/parcels/CDH-5.15.1-1.cdh5.15.1.p0.4/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
20/01/03 03:56:45 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.15.1
20/01/03 03:56:45 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
20/01/03 03:56:45 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
20/01/03 03:56:45 INFO tool.CodeGenTool: Beginning code generation
20/01/03 03:58:52 ERROR manager.SqlManager: Error executing statement: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at
The stack trace indicates a connection error, yet I could connect using MySQL Workbench.
Since Sqoop was reporting a connection error, I tried to ping the server.
[ec2-user@ip-10-0-0-238 capstone]$ ping awsrds.cpclxrkdvwmz.us-east-1.rds.amazonaws.com
PING ec2-3-211-175-82.compute-1.amazonaws.com (3.211.175.82) 56(84) bytes of data.
^C
--- ec2-3-211-175-82.compute-1.amazonaws.com ping statistics ---
2935 packets transmitted, 0 received, 100% packet loss, time 2934021ms
Seeing that the server was not reachable, the next step was to check the Security Group settings on AWS.
The outbound rule should allow All Traffic.
I had previously restricted the outbound rule to a specific IP. Since AWS allocates a new IP each time, pinning the rule to a static IP does not work.
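For reference, a sketch of how such an egress rule could be opened with the AWS CLI (the security group ID below is a placeholder, not one from this setup):
# Allow all outbound traffic from the instance's security group
# (sg-0123456789abcdef0 is a placeholder group ID).
aws ec2 authorize-security-group-egress \
  --group-id sg-0123456789abcdef0 \
  --ip-permissions 'IpProtocol=-1,IpRanges=[{CidrIp=0.0.0.0/0}]'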
I've been having issues reaching containers from within CodeBuild. I have an exposed GraphQL service with a downstream auth service and a PostgreSQL database, all started through Docker Compose. Running and testing them works fine locally; however, I cannot find the right combination of host names in CodeBuild.
It looks like my test is able to run if I hit the GraphQL endpoint at 0.0.0.0:8000, but once my GraphQL container attempts to reach the downstream service I get a connection refused. I've tried reaching the auth service from inside the GraphQL service at auth:8001, at 0.0.0.0:8001, with port 8001 exposed, and by setting up a bridged network. I always get a connection refused error.
I've attached part of my CodeBuild logs.
Any ideas what I might be missing?
Container 2018/08/28 05:37:17 Running command docker ps
CONTAINER ID   IMAGE                    COMMAND                  CREATED         STATUS                  PORTS                    NAMES
6c4ab1fdc980   docker-compose_graphql   "app"                    1 second ago    Up Less than a second   0.0.0.0:8000->8000/tcp   docker-compose_graphql_1
5c665f5f812d   docker-compose_auth      "/bin/sh -c app"         2 seconds ago   Up Less than a second   0.0.0.0:8001->8001/tcp   docker-compose_auth_1
b28148784c04   postgres:10.4            "docker-entrypoint..."   2 seconds ago   Up 1 second             0.0.0.0:5432->5432/tcp   docker-compose_psql_1
Container 2018/08/28 05:37:17 Running command go test ; cd ../..
Register panic: [{"message":"rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 0.0.0.0:8001: connect: connection refused\"","path":
From the "host" machine my exposed GraphQL service could only be reached using the IP address 0.0.0.0. The internal networking was set up correctly and each service could be reached at <NAME>:<PORT> as expected, however, upon error the IP address would be shown (172.27.0.1) instead of the host name.
My problem was that all internal connections were not yet ready, leading to the "connection refused" error. The command sleep 5 after docker-compose up gave my services time to fully initialize before testing.
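A minimal sketch of that ordering in the CodeBuild build commands (the test invocation mirrors the "go test" line in the logs and is a placeholder for whatever the project actually runs):
# Start the stack detached, then wait briefly so every container finishes
# initializing before the tests run.
docker-compose up -d
sleep 5            # crude readiness wait; polling a health endpoint would be more robust
go test ./...      # placeholder for the project's real test command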
I'm experimenting with CF in my local bosh-lite setup.
The apps that I deploy into it work well. I am now trying to follow the steps here
https://github.com/cf-platform-eng/cf-community-workshop/blob/master/demos/service-broker-lab.adoc
to try out the custom service broker setup.
The https://github.com/mstine/haash-broker application starts and runs fine:
$ cf apps
name           requested state   instances   memory   disk   urls
haash-broker   started           1/1         768M     1G     haash-broker.vbox.mojito, haash-broker.192.168.50.6.xip.io
I can access it from my host machine browser well:
http://haash-broker.192.168.50.6.xip.io/v2/catalog
But when I execute the
cf create-service-broker haash-broker warreng natedogg http://haash-broker.192.168.50.6.xip.io
I get
$ cf create-service-broker haash-broker warreng natedogg http://haash-broker.192.168.50.6.xip.io
Creating service broker haash-broker as admin...
FAILED
Server error, status code: 502, error code: 10001, message: The service broker could not be reached: http://haash-broker.192.168.50.6.xip.io/v2/catalog
When I log into the CC VM:
$ bosh -e vbox -f cf ssh api/eb4cec99-bab1-4513-a980-fb92775ac2d8
I can ping the hostname:
api/eb4cec99-bab1-4513-a980-fb92775ac2d8:~$ sudo ping haash-broker.192.168.50.6.xip.io
PING haash-broker.192.168.50.6.xip.io (192.168.50.6) 56(84) bytes of data.
64 bytes from 192.168.50.6: icmp_seq=1 ttl=64 time=0.080 ms
But the wget connection gets refused:
api/eb4cec99-bab1-4513-a980-fb92775ac2d8:~$ wget http://warreng:natedogg@haash-broker.192.168.50.6.xip.io/v2/catalog
--2018-04-06 04:19:05-- http://warreng:*password*@haash-broker.192.168.50.6.xip.io/v2/catalog
Resolving haash-broker.192.168.50.6.xip.io (haash-broker.192.168.50.6.xip.io)... 192.168.50.6
Connecting to haash-broker.192.168.50.6.xip.io (haash-broker.192.168.50.6.xip.io)|192.168.50.6|:80... failed: Connection refused.
The firewall permits everything on that VM (sudo iptables -L).
The hostname is resolved properly, the ping works, and port 80 is open on the target IP, since I can reach it from my host browser.
How can it be that wget doesn't work in this situation?
This also seems to be the reason my cf create-service-broker call fails.
UPDATE
I've managed to execute the cf create-service-broker command with the URL of an nginx reverse proxy running outside of my bosh-lite environment. The proxy forwards to the same initial URL http://haash-broker.192.168.50.6.xip.io, and the command succeeds this way.
But the subsequent
cf create-service-broker haash-broker warreng natedogg http://haash-broker.192.168.50.1.xip.io:9999
cf enable-service-access haash
cf create-service HaaSh basic my-hash
(where haash-broker.192.168.50.1.xip.io:9999 is my nginx proxy) fails with
Server error, status code: 502, error code: 10001, message: The service broker rejected the request to http://haash-broker.192.168.50.1.xip.io:9999/v2/service_instances/4ef19154-d238-4cb3-8003-803fba53af3f?accepts_incomplete=true. Status Code: 400 Bad Request, Body: {"timestamp":1523008856993,"error":"Bad Request","status":400,"message":""}
I can see in both the nginx and broker app logs that the request reaches the broker and that it answers with 400.
I'm now debugging why.
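One way to see the broker's 400 directly could be to replay the provision call by hand; a sketch assuming the standard Open Service Broker API request shape (the instance GUID comes from the error above; the other GUIDs are placeholders):
# Replay the cloud controller's provision request through the nginx proxy
# to inspect the broker's 400 response body directly.
curl -v -u warreng:natedogg \
  -X PUT 'http://haash-broker.192.168.50.1.xip.io:9999/v2/service_instances/4ef19154-d238-4cb3-8003-803fba53af3f?accepts_incomplete=true' \
  -H 'Content-Type: application/json' \
  -H 'X-Broker-API-Version: 2.12' \
  -d '{"service_id":"<service-guid>","plan_id":"<plan-guid>","organization_guid":"<org-guid>","space_guid":"<space-guid>"}'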
Can you post the result of the --server-response option used with wget? Also, what happens when you try to curl the broker?
The broker requires credentials, but it would be interesting to see whether it responds with 401 or 500 to the first request that wget makes without credentials.
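For example, something along these lines (credentials and host taken from the question above):
# Show the raw HTTP response headers that wget receives.
wget --server-response 'http://warreng:natedogg@haash-broker.192.168.50.6.xip.io/v2/catalog'
# The same check with curl, passing the credentials via basic auth.
curl -v -u warreng:natedogg 'http://haash-broker.192.168.50.6.xip.io/v2/catalog'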
We have a Cloudera cluster up and running with an H2O instance, although it appears to be running off h2o.jar (which, as I understand it, is the standalone H2O; please correct me if that's wrong). I can connect, but it will not load any files from our HDFS. (All of this I can see via 'ps' on the edge node.)
So I started an instance with h2odriver.jar
java -jar /path/to/h2odriver.jar -nodes 2 -mapperXmx 5g -output /my/hdfs/dir
I get several output/callback addresses:
[Possible callback IP address: 10.96.243.46:33728]
[Possible callback IP address: 127.0.0.1]
Using mapper->driver callback IP address and port: 10.96.243.46:33728
So I fire up Python and try to connect (the same thing happens if I use 10.96.243.46):
>>>h2o.connection(ip='127.0.0.1', port='33728')
and get
'Connecting to H2O server at http://127.0.0.1:33728..... failed.
H2OConnectionError: Could not establish link to the H2O cloud http://127.0.0.1:33728 after 5 retries
...
Failed to establish a new connection:[Errno 111] Connection refused',))`
The thing is, on the screen running the H2O jar/java job I can see:
MapperToDriverMessage: Read invalid type (G) from socket, ignoring...
MapperToDriverMessage: read: Unknown Type
I cannot figure out how to launch H2O in cluster mode and have it access our HDFS, or even connect to it. I can connect to the h2o.jar version, but that sees no HDFS (it can only see the filesystem of the edge node). What is the proper way to launch H2O so that it can see the attached HDFS? (We are running Cloudera 5.7 in an enterprise environment, Python is 3.6, H2O is 3.10.0.6, and I know we have a ton of firewalls/security; I believe we are set up through LDAP.)
You are correct that h2o.jar is the standalone version of H2O, which is not meant for connecting to HDFS.
Using the appropriate h2odriver.jar for your particular Hadoop distribution is the way to go.
The correct beginner instructions can be found here:
go to http://www.h2o.ai/download/
choose H2O "Latest Stable Release"
choose tab "Install on Hadoop"
It says to run the following command:
hadoop jar h2odriver.jar -nodes 1 -mapperXmx 6g -output hdfsOutputDirName
[ Note this is "hadoop jar", not "java -jar" as written in the question. ]
You should see output like this:
Determining driver host interface for mapper->driver callback...
[Possible callback IP address: 172.16.2.181]
[Possible callback IP address: 127.0.0.1]
...
Waiting for H2O cluster to come up...
H2O node 172.16.2.188:54321 requested flatfile
Sending flatfiles to nodes...
[Sending flatfile to node 172.16.2.188:54321]
H2O node 172.16.2.188:54321 reports H2O cluster size 1
H2O cluster (1 nodes) is up
(Note: Use the -disown option to exit the driver after cluster formation)
Open H2O Flow in your web browser: http://172.16.2.188:54321
(Press Ctrl-C to kill the cluster)
Blocking until the H2O cluster shuts down...
Then point your web browser to the place where it says to "Open H2O Flow in your web browser".
(The other addresses in the output are diagnostics, and not for end users.)
In this case, the python connection command would be:
h2o.connect(ip = '172.16.2.188', port = 54321)
I recommend going to Flow in a web browser, starting to import a file by typing in "hdfs://", and seeing whether autocompletion works. If it does, your HDFS connection is working.
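As a rough command-line cross-check (the path below is a placeholder), a path that lists cleanly from the edge node should also be importable once the driver-launched cluster is up, since the YARN-launched H2O nodes pick up the same cluster configuration:
# List a sample HDFS directory from the edge node (placeholder path).
hadoop fs -ls hdfs:///user/someuser/data/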
I installed the Cloudera VM and started trying some basic things. First I just wanted to ls the HDFS directories, so I issued the command below.
[cloudera@quickstart ~]$ hadoop fs -ls /
ls: Failed on local exception: java.net.SocketException: Network is unreachable; Host Details : local host is: "quickstart.cloudera/10.0.2.15"; destination host is: "quickstart.cloudera":8020;
This happened even though ps -fu hdfs said both the namenode and the datanode were running. I checked the status using the service command.
[cloudera@quickstart ~]$ sudo service hadoop-hdfs-namenode status
Hadoop namenode is not running [FAILED]
Thinking all the problems would be resolved if I restarted all the services, I executed the command below.
[cloudera@quickstart conf]$ sudo /home/cloudera/cloudera-manager --express --force
[QuickStart] Shutting down CDH services via init scripts...
[QuickStart] Disabling CDH services on boot...
[QuickStart] Starting Cloudera Manager daemons...
[QuickStart] Waiting for Cloudera Manager API...
[QuickStart] Configuring deployment...
Submitted jobs: 92
[QuickStart] Deploying client configuration...
Submitted jobs: 93
[QuickStart] Starting Cloudera Management Service...
Submitted jobs: 101
[QuickStart] Enabling Cloudera Manager daemons on boot...
I assumed all services would now be up, so I checked the status of the namenode service again. It still came back as failed.
[cloudera@quickstart ~]$ sudo service hadoop-hdfs-namenode status
Hadoop namenode is not running [FAILED]
Next I decided to manually stop and start the namenode service. Again, not much use.
[cloudera@quickstart ~]$ sudo service hadoop-hdfs-namenode stop
no namenode to stop
Stopped Hadoop namenode: [ OK ]
[cloudera@quickstart ~]$ sudo service hadoop-hdfs-namenode status
Hadoop namenode is not running [FAILED]
[cloudera@quickstart ~]$ sudo service hadoop-hdfs-namenode start
starting namenode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-namenode-quickstart.cloudera.out
Failed to start Hadoop namenode. Return value: 1 [FAILED]
I checked the file /var/log/hadoop-hdfs/hadoop-hdfs-namenode-quickstart.cloudera.out. It only said the following:
log4j:ERROR Could not find value for key log4j.appender.RFA
log4j:ERROR Could not instantiate appender named "RFA".
I also checked /var/log/hadoop-hdfs/hadoop-cmf-hdfs-NAMENODE-quickstart.cloudera.log.out and found the following when I searched for errors. Can anyone please suggest the best way to get the services back on track? Unfortunately I am not able to access Cloudera Manager from a browser. Is there anything I can do from the command line?
2016-02-24 21:02:48,105 WARN com.cloudera.cmf.event.publish.EventStorePublisherWithRetry: Failed to publish event: SimpleEvent{attributes={ROLE_TYPE=[NAMENODE], CATEGORY=[LOG_MESSAGE], ROLE=[hdfs-NAMENODE], SEVERITY=[IMPORTANT], SERVICE=[hdfs], HOST_IDS=[quickstart.cloudera], SERVICE_TYPE=[HDFS], LOG_LEVEL=[WARN], HOSTS=[quickstart.cloudera], EVENTCODE=[EV_LOG_EVENT]}, content=Only one image storage directory (dfs.namenode.name.dir) configured. Beware of data loss due to lack of redundant storage directories!, timestamp=1456295437905} - 1 of 17 failure(s) in last 79302s
java.io.IOException: Error connecting to quickstart.cloudera/10.0.2.15:7184
at com.cloudera.cmf.event.shaded.org.apache.avro.ipc.NettyTransceiver.getChannel(NettyTransceiver.java:249)
at com.cloudera.cmf.event.shaded.org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:198)
at com.cloudera.cmf.event.shaded.org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:133)
at com.cloudera.cmf.event.publish.AvroEventStorePublishProxy.checkSpecificRequestor(AvroEventStorePublishProxy.java:122)
at com.cloudera.cmf.event.publish.AvroEventStorePublishProxy.publishEvent(AvroEventStorePublishProxy.java:196)
at com.cloudera.cmf.event.publish.EventStorePublisherWithRetry$PublishEventTask.run(EventStorePublisherWithRetry.java:242)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketException: Network is unreachable
You can try this:
Check which process is using the namenode's port 7184 (e.g. with the netstat Linux command), kill it, and then restart.
Or
Change your namenode port in the configuration and restart Hadoop.
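A rough sketch of the first option, assuming the quickstart init scripts shown above (the PID is a placeholder to be read from the netstat output):
# Find whatever is bound to port 7184, stop it, then restart the namenode service.
sudo netstat -tlnp | grep 7184
sudo kill <pid-from-netstat-output>     # placeholder: use the PID reported above
sudo service hadoop-hdfs-namenode restart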
While debugging I realised that confd doesn't pick up the keys and my journal looks like this:
Sep 18 18:31:50 ip-10-171-54-76.ec2.internal docker[24891]: [nginx] waiting for confd to refresh nginx.conf
Sep 18 18:31:56 ip-10-171-54-76.ec2.internal docker[24891]: 2014-09-18T18:31:56Z 9122c7a54edc confd[9572]: ERROR 501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]
I used nsenter to log in to the running container and run some experiments for debugging purposes. I ran this command:
confd -onetime -node 172.17.42.1:4001 -config-file /etc/confd/conf.d/nginx.toml
and received the same error as above:
confd[12894]: ERROR 501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]
I am totally clueless at this point. I am using EC2 with the stable version of CoreOS, and I am sure that etcd is running on the host. Also, I can ping the host from inside the container successfully.
Any ideas on what's wrong?
Any assistance will be much appreciated.
This error indicates that your etcd cluster isn't operating correctly, so confd has nothing to watch. It has probably lost quorum. The logs (journalctl -u etcd) should indicate what happened.
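A few checks that could narrow this down; a sketch assuming the etcd endpoint (172.17.42.1:4001) quoted in the question:
# On the CoreOS host: see why etcd is unhealthy (as suggested above).
journalctl -u etcd --no-pager | tail -n 50
# From inside the container: confirm etcd answers on the bridge address at all.
curl -s http://172.17.42.1:4001/version
# Once etcd is healthy again, the key space confd watches should be readable.
curl -s http://172.17.42.1:4001/v2/keys/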