Python Requests Post request fails when connecting to a Kerberized Hadoop cluster with Livy - python-2.7

I'm trying to connect to a Kerberized Hadoop cluster via Livy to execute Spark code. The requests call I'm making is below.
import json, requests
from requests_kerberos import HTTPKerberosAuth, REQUIRED
kerberos_auth = HTTPKerberosAuth(mutual_authentication=REQUIRED, force_preemptive=True)
r = requests.post(host + '/sessions', data=json.dumps(data), headers=headers, auth=kerberos_auth)
This call fails with the following error:
GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos credentails)
Any help here would be appreciated.

When Hadoop service daemons run in secure mode, Kerberos tickets are decrypted with a keytab, and the service uses that keytab to establish the credentials of the user coming into the cluster. Without a keytab containing the right service principal in place, you will get this error message. Please refer to the Hadoop in Secure Mode documentation for further details on setting up the keytab.
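On the client side, requests-kerberos also needs a valid Kerberos ticket in the local credential cache before the call in the question can succeed. A minimal sketch of obtaining one from a keytab and retrying the Livy call; the principal, keytab path, Livy endpoint, and session payload here are placeholder assumptions, not values from the question:

import json
import subprocess
import requests
from requests_kerberos import HTTPKerberosAuth, REQUIRED

# Placeholder principal/keytab -- substitute your own. kinit puts a TGT into
# the local ticket cache, which requests-kerberos then uses via GSSAPI.
subprocess.check_call(["kinit", "-kt", "/etc/security/keytabs/user.keytab", "user@EXAMPLE.COM"])

host = "http://livy-host.example.com:8998"   # assumed Livy endpoint (8998 is Livy's default port)
headers = {"Content-Type": "application/json"}
data = {"kind": "pyspark"}                   # minimal session request

kerberos_auth = HTTPKerberosAuth(mutual_authentication=REQUIRED, force_preemptive=True)
r = requests.post(host + "/sessions", data=json.dumps(data), headers=headers, auth=kerberos_auth)
print(r.status_code)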

Related

"Kafka Timed out waiting for a node assignment." on MSK

Specs:
The serverless Amazon MSK that's in preview.
t2.xlarge EC2 instance with Amazon Linux 2
Installed Kafka from https://dlcdn.apache.org/kafka/3.0.0/kafka_2.13-3.0.0.tgz
openjdk version "11.0.13" 2021-10-19 LTS
OpenJDK Runtime Environment 18.9 (build 11.0.13+8-LTS)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.13+8-LTS, mixed mode, sharing)
Gradle 7.3.3
https://github.com/aws/aws-msk-iam-auth, built successfully.
I also tried adding IAM authentication information, as recommended by the Amazon MSK Library for AWS Identity and Access Management. It says to add the following in config/client.properties:
# Sets up TLS for encryption and SASL for authN.
security.protocol = SASL_SSL
# Identifies the SASL mechanism to use.
sasl.mechanism = AWS_MSK_IAM
# Binds SASL client implementation.
# sasl.jaas.config = software.amazon.msk.auth.iam.IAMLoginModule required;
# Encapsulates constructing a SigV4 signature based on extracted credentials.
# The SASL client bound by "sasl.jaas.config" invokes this class.
sasl.client.callback.handler.class = software.amazon.msk.auth.iam.IAMClientCallbackHandler
# Binds SASL client implementation. Uses the specified profile name to look for credentials.
sasl.jaas.config = software.amazon.msk.auth.iam.IAMLoginModule required awsProfileName="kafka-client";
And kafka-client is the IAM role attached to the EC2 instance as an instance profile.
Networking: I used VPC Reachability Analyzer to confirm that the security groups are configured correctly and the EC2 instance I'm using as a Producer can reach the serverless MSK cluster.
What I'm trying to do: create a topic.
How I'm trying: bin/kafka-topics.sh --create --partitions 1 --replication-factor 1 --topic quickstart-events --bootstrap-server boot-zclcyva3.c2.kafka-serverless.us-east-2.amazonaws.com:9098
Result:
Error while executing topic command : Timed out waiting for a node assignment. Call: createTopics
[2022-01-17 01:46:59,753] ERROR org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: createTopics
(kafka.admin.TopicCommand$)
I also tried with the plaintext port, 9092. (9098 is the IAM-authentication port in MSK, and serverless MSK uses IAM authentication by default.)
All the other posts I found on SO about this node assignment error didn't include MSK. I tried suggestions like uncommenting the listener setting in server.properties, but that didn't change anything.
Installing kcat for troubleshooting didn't work for me, since there's no out-of-the-box installation for the yum package manager, which Amazon Linux 2 uses, and since these instructions failed for me at "checking for libcurl (by compile)... failed (fail)".
The Question: Any other tips on solving this "node assignment" error?
The documentation has been updated recently; I was able to follow it end to end without any issue (the IAM policy it shows is now correct).
https://docs.aws.amazon.com/msk/latest/developerguide/serverless-getting-started.html
The properties file you created is not picked up automatically; your command needs to include --command-config client.properties. The contents of that file are documented on the MSK IAM page linked above (a sketch of the full command follows the extract below).
Extract...
ssl.truststore.location=<PATH_TO_TRUST_STORE_FILE>
security.protocol=SASL_SSL
sasl.mechanism=AWS_MSK_IAM
sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
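For illustration only, a sketch of the topic-creation call with the properties file passed explicitly, written here as a Python subprocess call; it assumes client.properties sits in the current directory and reuses the bootstrap endpoint from the question:

import subprocess

# Without --command-config, kafka-topics.sh connects without the IAM/SASL
# settings and eventually times out waiting for a node assignment.
subprocess.check_call([
    "bin/kafka-topics.sh", "--create",
    "--topic", "quickstart-events",
    "--partitions", "1",
    "--replication-factor", "1",
    "--bootstrap-server", "boot-zclcyva3.c2.kafka-serverless.us-east-2.amazonaws.com:9098",
    "--command-config", "client.properties",
])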
Alternatively, if the plaintext port didn't work, then you have other networking issues.
Beyond these steps, I suggest reaching out to MSK support and asking them to update the "Create a Topic" page to no longer use ZooKeeper, keeping in mind that Kafka 3.0 is not (yet) supported.

Tinkerpop Gremlin Console: java.lang.NoSuchMethodError: org.apache.tinkerpop.gremlin.driver.RequestOptions$Builder.userAgent

As in my last post, "403 Forbidden error for Gremlin to AWS Neptune", I can successfully connect to my Neptune cluster DB via the TinkerPop Gremlin Console v3.4.3 installed on my EC2 instance, since v3.4.1, as suggested at https://docs.aws.amazon.com/neptune/latest/userguide/iam-auth-connecting-gremlin-console.html, didn't work for me.
\,,,/
(o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.tinkergraph
gremlin> :remote connect tinkerpop.server conf/neptune-remote.yaml
==>Configured <my neptune>.cluster-cm<cluster id>.ap-southeast-2.neptune.amazonaws.com/<private ip>:8182
gremlin> :remote console
==>All scripts will now be sent to Gremlin Server - [<my neptune>.cluster-cm<cluster id>.ap-southeast-2.neptune.amazonaws.com/<private ip>:8182] - type ':remote console' to return to local mode
However, I'm getting a NoSuchMethodError for every Gremlin command (g.*) I run on the console.
e.g.:
g.V()
gremlin> g.V()
org.apache.tinkerpop.gremlin.driver.RequestOptions$Builder.userAgent(Ljava/lang/String;)Lorg/apache/tinkerpop/gremlin/driver/RequestOptions$Builder;
Type ':help' or ':h' for help.
Display stack trace? [yN]Y
java.lang.NoSuchMethodError: org.apache.tinkerpop.gremlin.driver.RequestOptions$Builder.userAgent(Ljava/lang/String;)Lorg/apache/tinkerpop/gremlin/driver/RequestOptions$Builder;
at org.apache.tinkerpop.gremlin.console.jsr223.DriverRemoteAcceptor.send(DriverRemoteAcceptor.java:214)
at org.apache.tinkerpop.gremlin.console.jsr223.DriverRemoteAcceptor.submit(DriverRemoteAcceptor.java:168)
at org.apache.tinkerpop.gremlin.console.GremlinGroovysh.execute(GremlinGroovysh.groovy:110)
...
g.addV('person').property('name', 'justin')
gremlin> g.addV('person').property('name', 'justin')
org.apache.tinkerpop.gremlin.driver.RequestOptions$Builder.userAgent(Ljava/lang/String;)Lorg/apache/tinkerpop/gremlin/driver/RequestOptions$Builder;
Type ':help' or ':h' for help.
Display stack trace? [yN]Y
java.lang.NoSuchMethodError: org.apache.tinkerpop.gremlin.driver.RequestOptions$Builder.userAgent(Ljava/lang/String;)Lorg/apache/tinkerpop/gremlin/driver/RequestOptions$Builder;
at org.apache.tinkerpop.gremlin.console.jsr223.DriverRemoteAcceptor.send(DriverRemoteAcceptor.java:214)
at org.apache.tinkerpop.gremlin.console.jsr223.DriverRemoteAcceptor.submit(DriverRemoteAcceptor.java:168)
at org.apache.tinkerpop.gremlin.console.GremlinGroovysh.execute(GremlinGroovysh.groovy:110)
....
I have also tried the latest Apache TinkerPop Gremlin Console, 3.4.6, and got the same error...
Thanks
I think the step you're missing is taking the temporary credentials provided by your EC2 instance's assigned IAM role and pushing those into the Default Credential Provider chain in order for them to be seen by the SigV4Channelizer used by the Gremlin Console. A high level overview of that process can be seen here: https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/setup-credentials.html
A more prescriptive way of handling this for Neptune can be found here: https://docs.aws.amazon.com/neptune/latest/userguide/iam-auth-temporary-credentials.html See the section titled, "Setting Up Amazon EC2 for Neptune IAM Authentication".
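As a rough sketch of that idea, you can surface the instance-role credentials as the environment variables the default credential provider chain reads, assuming boto3 is installed on the instance (note that instance-profile credentials rotate, so this needs to be refreshed periodically):

import boto3

# Resolve credentials the same way the AWS SDK default chain does; on EC2
# this falls through to the attached instance profile.
creds = boto3.Session().get_credentials().get_frozen_credentials()

# Paste these into the shell that launches the Gremlin Console.
print("export AWS_ACCESS_KEY_ID=" + creds.access_key)
print("export AWS_SECRET_ACCESS_KEY=" + creds.secret_key)
print("export AWS_SESSION_TOKEN=" + creds.token)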
I just tried Gremlin Console 3.4.1 and it's working as expected... I think it was a version-incompatibility issue; I had been using Gremlin Console 3.4.6.

Greengrass_HelloWorld lambda doesn't publish to Amazon IoT console

I have been following the documentation step by step, and I didn't face any errors. I configured, deployed, and made a subscription to the hello/world topic just as the documentation detailed. However, when I arrived at the testing step here: https://docs.aws.amazon.com/greengrass/latest/developerguide/lambda-check.html
No messages were showing up on the IoT console (subscription view, hello/world)! I am using the Greengrass core daemon, which runs on my Ubuntu machine; it is active and listens on port 8000. I don't think there is anything wrong with my local device, because the group was deployed successfully and because I can see the communication going both ways in Wireshark.
I have these logs on my machine: /home/##/Desktop/greengrass/ggc/var/log/system/runtime.log:
[2019-09-28T06:57:42.492-07:00][INFO]-===========================================
[2019-09-28T06:57:42.492-07:00][INFO]-Greengrass Version: 1.9.3-RC3
[2019-09-28T06:57:42.492-07:00][INFO]-Greengrass Root: /home/##/Desktop/greengrass
[2019-09-28T06:57:42.492-07:00][INFO]-Greengrass Write Directory: /home/##/Desktop/greengrass/ggc
[2019-09-28T06:57:42.492-07:00][INFO]-Group File Directory: /home/##/Desktop/greengrass/ggc/deployment/group
[2019-09-28T06:57:42.492-07:00][INFO]-Default Lambda UID: 122
[2019-09-28T06:57:42.492-07:00][INFO]-Default Lambda GID: 127
[2019-09-28T06:57:42.492-07:00][INFO]-===========================================
[2019-09-28T06:57:42.492-07:00][INFO]-The current core is using the AWS IoT certificates with fingerprint. {"fingerprint": "90##4d"}
[2019-09-28T06:57:42.492-07:00][INFO]-Will persist worker process info. {"dir": "/home/##/Desktop/greengrass/ggc/ggc/core/var/worker/processes"}
[2019-09-28T06:57:42.493-07:00][INFO]-Will persist worker process info. {"dir": "/home/##/Desktop/greengrass/ggc/ggc/core/var/worker/processes"}
[2019-09-28T06:57:42.494-07:00][INFO]-No proxy URL found.
[2019-09-28T06:57:42.495-07:00][INFO]-Started Deployment Agent to listen for updates.
[2019-09-28T06:57:42.495-07:00][INFO]-Connecting with MQTT. {"endpoint": "a6##ws-ats.iot.us-east-2.amazonaws.com:8883", "clientId": "simulators_gg_Core"}
[2019-09-28T06:57:42.497-07:00][INFO]-The current core is using the AWS IoT certificates with fingerprint. {"fingerprint": "90##4d"}
[2019-09-28T06:57:42.685-07:00][INFO]-MQTT connection successful. {"attemptId": "GVko", "clientId": "simulators_gg_Core"}
[2019-09-28T06:57:42.685-07:00][INFO]-MQTT connection established. {"endpoint": "a6##ws-ats.iot.us-east-2.amazonaws.com:8883", "clientId": "simulators_gg_Core"}
[2019-09-28T06:57:42.685-07:00][INFO]-MQTT connection connected. Start subscribing. {"clientId": "simulators_gg_Core"}
[2019-09-28T06:57:42.685-07:00][INFO]-Deployment agent connected to cloud.
[2019-09-28T06:57:42.685-07:00][INFO]-Start subscribing. {"numOfTopics": 2, "clientId": "simulators_gg_Core"}
[2019-09-28T06:57:42.685-07:00][INFO]-Trying to subscribe to topic $aws/things/simulators_gg_Core-gda/shadow/update/delta
[2019-09-28T06:57:42.727-07:00][INFO]-Trying to subscribe to topic $aws/things/simulators_gg_Core-gda/shadow/get/accepted
[2019-09-28T06:57:42.814-07:00][INFO]-All topics subscribed. {"clientId": "simulators_gg_Core"}
[2019-09-28T06:58:57.888-07:00][INFO]-Daemon received signal: terminated.
[2019-09-28T06:58:57.888-07:00][INFO]-Shutting down daemon.
[2019-09-28T06:58:57.888-07:00][INFO]-Stopping all workers.
[2019-09-28T06:58:57.888-07:00][INFO]-Lifecycle manager is stopped.
[2019-09-28T06:58:57.888-07:00][INFO]-IPC server stopped.
/home/##/Desktop/greengrass/ggc/var/log/system/localwatch/localwatch.log:
[2019-09-28T06:57:42.491-07:00][DEBUG]-will keep the log files for the following lambdas {"readingPath": "/home/##/Desktop/greengrass/ggc/var/log/user", "lambdas": "map[]"}
[2019-09-28T06:57:42.492-07:00][WARN]-failed to list the user log directory {"path": "/home/##/Desktop/greengrass/ggc/var/log/user"}
Thanks in advance.
I had a similar issue on another platform (Jetson Nano). I could not get a response after going through the AWS instructions for setting up a simple Lambda with IoT Greengrass. In my search for answers I discovered that AWS has a qualification test suite for any device you connect.
It goes through an automated process of deploying and testing a Lambda function (as well as other functionality), reports results for each step, and the docs provide troubleshooting info for failures.
By going through those tests I was able to narrow down the issues with my setup, installation, and configuration. The testing docs give pointers for troubleshooting the test results. Here is a link to the test: https://docs.aws.amazon.com/greengrass/latest/developerguide/device-tester-for-greengrass-ug.html
If you follow the 'Next Topic' links, it will take you through the complete test. Let me warn you that it's extensive and will take some time, but for me it gave a lot of detailed insight that a hello world does not.
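It can also be worth confirming that the function itself publishes to the topic the console test client is subscribed to. For reference, a minimal sketch along the lines of the documented Greengrass_HelloWorld function, using the Python greengrasssdk from the Greengrass Core SDK (the topic and message text are the tutorial's; everything else here is illustrative):

import greengrasssdk
from threading import Timer

# Client for publishing to AWS IoT through the Greengrass core.
client = greengrasssdk.client('iot-data')

def greengrass_hello_world_run():
    # Publish to the same topic the IoT console test client is subscribed to.
    client.publish(topic='hello/world', payload='Hello world! Sent from Greengrass Core.')
    # Re-publish every 5 seconds so messages keep appearing in the console.
    Timer(5, greengrass_hello_world_run).start()

greengrass_hello_world_run()

def function_handler(event, context):
    # Long-lived lambda: the handler is unused; the work happens in the timer loop.
    return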

Spark is inventing its own AWS secretKey

I'm trying to read an S3 bucket from Spark, and up until today Spark has always complained that the request returns 403:
hadoopConf = spark_context._jsc.hadoopConfiguration()
hadoopConf.set("fs.s3a.access.key", "ACCESSKEY")
hadoopConf.set("fs.s3a.secret.key", "SECRETKEY")
hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
logs = spark_context.textFile("s3a://mybucket/logs/*")
Spark was saying .... Invalid Access key [ACCESSKEY]
However, with the same ACCESSKEY and SECRETKEY this was working with the aws-cli:
aws s3 ls mybucket/logs/
and in Python with boto3 this was working:
import boto3
resource = boto3.resource("s3", region_name="us-east-1")
resource.Object("mybucket", "logs/text.py") \
    .put(Body=open("text.py", "rb"), ContentType="text/x-py")
so my credentials ARE valid, and the problem is definitely something with Spark...
Today I decided to turn on DEBUG logging for all of Spark and, to my surprise... Spark is NOT using the [SECRETKEY] I provided but instead... adds a random one???
17/03/08 10:40:04 DEBUG request: Sending Request: HEAD https://mybucket.s3.amazonaws.com / Headers: (Authorization: AWS ACCESSKEY:[RANDON-SECRET-KEY], User-Agent: aws-sdk-java/1.7.4 Mac_OS_X/10.11.6 Java_HotSpot(TM)_64-Bit_Server_VM/25.65-b01/1.8.0_65, Date: Wed, 08 Mar 2017 10:40:04 GMT, Content-Type: application/x-www-form-urlencoded; charset=utf-8, )
This is why it still returns 403! Spark is not using the key I provide with fs.s3a.secret.key but instead invents a random one??
For the record, I'm running this locally on my machine (OSX) with this command:
spark-submit --packages com.amazonaws:aws-java-sdk-pom:1.11.98,org.apache.hadoop:hadoop-aws:2.7.3 test.py
Could some one enlighten me on this?
(Updated, as my original answer was downvoted and clearly considered unacceptable.)
The AWS auth protocol doesn't send your secret over the wire. It signs the message. That's why what you see isn't what you passed in.
For further information, read up on how AWS request signing works.
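To make that concrete: the value after the colon in the Authorization header in the debug log is a signature derived from your secret key, not the key itself. A rough sketch of the older S3 signing scheme that aws-sdk-java 1.7.4 uses for that header form (the string-to-sign below is simplified; the real SDK canonicalizes headers and the resource path):

import base64
import hashlib
import hmac

secret_key = "SECRETKEY"  # stays local; it is never sent over the wire

# Simplified string-to-sign for a HEAD request on the bucket
# (method, content headers, date, canonicalized resource).
string_to_sign = "HEAD\n\n\nWed, 08 Mar 2017 10:40:04 GMT\n/mybucket/"

signature = base64.b64encode(
    hmac.new(secret_key.encode(), string_to_sign.encode(), hashlib.sha1).digest()
).decode()

# This is the "random-looking" value that shows up in the debug log.
print("Authorization: AWS ACCESSKEY:" + signature)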
I ran into a similar issue. Requests that were using valid AWS credentials returned a 403 Forbidden, but only on certain machines. Eventually I found out that the system time on those particular machines was 10 minutes behind. Synchronizing the system clock solved the problem.
Hope this helps!
This "random" key is very intriguing. Maybe the AWS SDK is picking the credentials up from the OS environment.
In Hadoop 2.8, the default AWS provider chain contains the following providers, in order: BasicAWSCredentialsProvider, EnvironmentVariableCredentialsProvider, SharedInstanceProfileCredentialsProvider.
Order, of course, matters! The AWSCredentialProviderChain takes the keys from the first provider that supplies them:
if (credentials.getAWSAccessKeyId() != null &&
    credentials.getAWSSecretKey() != null) {
    log.debug("Loading credentials from " + provider.toString());
    lastUsedProvider = provider;
    return credentials;
}
See the code in "GrepCode for AWSCredentialProviderChain".
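If you want to rule the other providers out, one option (assuming hadoop-aws 2.8+, where fs.s3a.aws.credentials.provider is supported) is to pin S3A to the simple provider so that only the keys set in the Hadoop configuration are used; spark_context here is the same context as in the question:

# Sketch: restrict S3A to the access/secret key pair from the configuration,
# skipping the environment-variable and instance-profile providers in the chain.
hadoopConf = spark_context._jsc.hadoopConfiguration()
hadoopConf.set("fs.s3a.aws.credentials.provider",
               "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider")
hadoopConf.set("fs.s3a.access.key", "ACCESSKEY")
hadoopConf.set("fs.s3a.secret.key", "SECRETKEY")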
I faced a similar problem using profile credentials. The SDK was ignoring the credentials inside ~/.aws/credentials (as good practice, I encourage you not to store credentials inside the program in any way).
My solution...
Set the credentials provider to use ProfileCredentialsProvider
sc._jsc.hadoopConfiguration().set("fs.s3a.endpoint", "s3.eu-central-1.amazonaws.com") # yes, I am using central eu server.
sc._jsc.hadoopConfiguration().set('fs.s3a.aws.credentials.provider', 'com.amazonaws.auth.profile.ProfileCredentialsProvider')
Folks, go for the IAM configuration based on roles ... that will open up S3 access policies, which should be added to the EMR default role.

Presto server on AWS - Cannot connect to discovery server

I'm trying to run a Presto coordinator server with the discovery server embedded, on an AWS CDH4 cluster.
config.properties:
coordinator=true
datasources=jmx
http-server.http.port=8000
presto-metastore.db.type=h2
presto-metastore.db.filename=var/db/MetaStore
task.max-memory=1GB
discovery-server.enabled=true
discovery.uri=http://ip-10-0-0-11:8000
When the server starts, it can't register itself with discovery (relevant logs):
2013-11-08T19:38:38.193+0000 WARN main Bootstrap Warning: Configuration property 'discovery.uri' is deprecated and should not be used
2013-11-08T19:38:38.968+0000 INFO main Bootstrap discovery-server.enabled false true
2013-11-08T19:38:38.975+0000 INFO main Bootstrap discovery.uri null http://ip-10-0-0-11:8000 Discovery service base URI
2013-11-08T19:38:40.916+0000 ERROR Discovery-0 io.airlift.discovery.client.CachingServiceSelector Cannot connect to discovery server for refresh (collector/general): Lookup of collector failed for http://ip-10-0-0-11:8000/v1/service/collector/general
2013-11-08T19:38:42.556+0000 ERROR Discovery-1 io.airlift.discovery.client.CachingServiceSelector Cannot connect to discovery server for refresh (presto/general): Lookup of presto failed for http://ip-10-0-0-11:8000/v1/service/presto/general
2013-11-08T19:38:43.854+0000 INFO main org.eclipse.jetty.server.AbstractConnector Started SelectChannelConnector#0.0.0.0:8000
I also tried running a standalone discovery server, with the same effect. It looks like the listener is started after the registration attempt is made.
I was wondering if someone would notice this in the logs :) It's actually not a problem. The error appears because the discovery client starts before the discovery server is ready. You'll see "succeeded for refresh" shortly after in the logs which shows that it's working. We will fix the log message eventually but it's purely a cosmetic issue.