Spark is inventing its own AWS secretKey

I'm trying to read an S3 bucket from Spark, and up until today Spark has always complained that the request returns 403:
hadoopConf = spark_context._jsc.hadoopConfiguration()
hadoopConf.set("fs.s3a.access.key", "ACCESSKEY")
hadoopConf.set("fs.s3a.secret.key", "SECRETKEY")
hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
logs = spark_context.textFile("s3a://mybucket/logs/*")
Spark was saying: Invalid Access key [ACCESSKEY]
However, with the same ACCESSKEY and SECRETKEY, this was working with the aws-cli:
aws s3 ls mybucket/logs/
and in Python with boto3 this was working:
resource = boto3.resource("s3", region_name="us-east-1")
resource.Object("mybucket", "logs/text.py") \
    .put(Body=open("text.py", "rb"), ContentType="text/x-py")
So my credentials ARE valid, and the problem is definitely something with Spark.
Today I decided to turn on DEBUG logging for all of Spark and, to my surprise... Spark is NOT using the [SECRETKEY] I have provided but instead... uses a random one???
17/03/08 10:40:04 DEBUG request: Sending Request: HEAD https://mybucket.s3.amazonaws.com / Headers: (Authorization: AWS ACCESSKEY:[RANDON-SECRET-KEY], User-Agent: aws-sdk-java/1.7.4 Mac_OS_X/10.11.6 Java_HotSpot(TM)_64-Bit_Server_VM/25.65-b01/1.8.0_65, Date: Wed, 08 Mar 2017 10:40:04 GMT, Content-Type: application/x-www-form-urlencoded; charset=utf-8, )
This is why it still returns 403! Spark is not using the key I provide with fs.s3a.secret.key but instead invents a random one??
For the record, I'm running this locally on my machine (OS X) with this command:
spark-submit --packages com.amazonaws:aws-java-sdk-pom:1.11.98,org.apache.hadoop:hadoop-aws:2.7.3 test.py
Could someone enlighten me on this?

(Updated, as my original answer was downvoted and clearly considered unacceptable.)
The AWS auth protocol doesn't send your secret over the wire. It signs the message. That's why what you see isn't what you passed in.
For further information, read up on how AWS request signing works.
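To make the signing idea concrete, here is a minimal sketch of how a (legacy) Signature Version 2 Authorization header is built; the string-to-sign below is a simplified stand-in, not the exact one the SDK constructs:

import base64
import hashlib
import hmac

def sign_v2(secret_key, string_to_sign):
    # The secret key never leaves the machine; only this HMAC digest is sent.
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")

# Simplified stand-in for the canonical request string
string_to_sign = "HEAD\n\n\nWed, 08 Mar 2017 10:40:04 GMT\n/mybucket/"
print("Authorization: AWS ACCESSKEY:" + sign_v2("SECRETKEY", string_to_sign))

The value after the colon is what shows up in the DEBUG log, which is why it looks like a "random" secret key.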

I ran into a similar issue. Requests that were using valid AWS credentials returned a 403 Forbidden, but only on certain machines. Eventually I found out that the system time on those particular machines was 10 minutes behind. Synchronizing the system clock solved the problem.
Hope this helps!
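If you suspect clock skew, a quick way to measure it is to compare your local UTC time against the Date header that any AWS endpoint returns. A rough sketch (the endpoint choice is arbitrary); AWS typically rejects signed requests once the skew exceeds about 15 minutes:

import datetime
import email.utils
import urllib.error
import urllib.request

try:
    headers = urllib.request.urlopen("https://s3.amazonaws.com/").headers
except urllib.error.HTTPError as e:
    headers = e.headers  # even an error response carries a Date header

server_time = email.utils.parsedate_to_datetime(headers["Date"])
local_time = datetime.datetime.now(datetime.timezone.utc)
print("clock skew: %.0f seconds" % abs((local_time - server_time).total_seconds()))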

This random passkey is very intriguing. Maybe the AWS SDK is getting the credentials from the OS environment.
In Hadoop 2.8, the default AWS credential provider chain contains the following providers:
BasicAWSCredentialsProvider
EnvironmentVariableCredentialsProvider
SharedInstanceProfileCredentialsProvider
Order, of course, matters! The AWSCredentialProviderChain returns the keys from the first provider that supplies them.
if (credentials.getAWSAccessKeyId() != null &&
        credentials.getAWSSecretKey() != null) {
    log.debug("Loading credentials from " + provider.toString());
    lastUsedProvider = provider;
    return credentials;
}
See the code in "GrepCode for AWSCredentialProviderChain".
I faced a similar problem using profile credentials. The SDK was ignoring the credentials inside ~/.aws/credentials (as good practice, I encourage you not to store credentials inside the program in any way).
My solution...
Set the credentials provider to use ProfileCredentialsProvider
sc._jsc.hadoopConfiguration().set("fs.s3a.endpoint", "s3.eu-central-1.amazonaws.com") # yes, I am using central eu server.
sc._jsc.hadoopConfiguration().set('fs.s3a.aws.credentials.provider', 'com.amazonaws.auth.profile.ProfileCredentialsProvider')
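For this to work, the profile file that ProfileCredentialsProvider reads has to exist and contain the keys. A quick sanity check, assuming the credentials live in the default profile of ~/.aws/credentials:

import configparser
import os

cfg = configparser.ConfigParser()
cfg.read(os.path.expanduser("~/.aws/credentials"))
# Should print the access key id stored under the [default] profile
print(cfg["default"].get("aws_access_key_id"))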

Folks, go for role-based IAM configuration... the S3 access policies you need can then be attached to the default EMR role.
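A minimal sketch of what that looks like from the Spark side, assuming the cluster's EC2 instance profile role already has an S3 policy attached and a hadoop-aws version that supports fs.s3a.aws.credentials.provider (no keys in code at all):

# Let s3a obtain credentials from the instance metadata service instead of hard-coded keys
hadoopConf = spark_context._jsc.hadoopConfiguration()
hadoopConf.set("fs.s3a.aws.credentials.provider",
               "com.amazonaws.auth.InstanceProfileCredentialsProvider")
logs = spark_context.textFile("s3a://mybucket/logs/*")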

Related

s3_out: unable to sign request without credentials set

I am trying to use "instance_profile_credentials" on an EC2 instance as credentials. However, I get:
2021-09-16 14:16:50 +0000 [error]: #0 unexpected error error_class=RuntimeError error="can't call S3 API. Please check your credentials or s3_region configuration. error = #<Aws::Errors::MissingCredentialsError: unable to sign request without credentials set>"
I'm pretty sure my s3_region is correct, and I can use the CLI "aws s3 cp" to copy objects at the command line, so I'm not sure what's going wrong.
I wonder if that's because I am behind an HTTP proxy. However, I have already set the "proxy_uri" parameter. Not sure what else I can do to check what's going wrong?
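One thing worth checking (assuming this is the fluentd S3 output plugin, which these option names suggest): instance_profile_credentials fetches temporary keys from the EC2 instance metadata service, and that local call must not be routed through the HTTP proxy. A small diagnostic sketch you can run on the instance:

import urllib.request

# 169.254.169.254 is the EC2 instance metadata endpoint; add it to no_proxy so it is
# never sent through the HTTP proxy. If the instance enforces IMDSv2, a session
# token header is additionally required.
url = "http://169.254.169.254/latest/meta-data/iam/security-credentials/"
with urllib.request.urlopen(url, timeout=2) as resp:
    print(resp.read().decode())  # should print the name of the attached IAM role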

Can I set AWS credentials on a Spring Boot/Cloud @SqsListener? (Java)

Double newbie here, to both SQS and Spring Cloud. I've created an SQS queue (using the console). The company wiki I'm working from says to then generate temporary credentials, which come out looking like this:
aws_access_key_id = <secret>
aws_secret_access_key = <secret>
region = us-west-2
aws_session_token = <secret and VERY LONG, like 240 characters>
NOTE: more on that "aws_session_token" later.
So, once I have done that, I can send a message from the CLI, like this.
`aws --endpoint-url https://sqs.us-west-2.amazonaws.com/99999999999999/<queue name>.fifo sqs send-message --queue-url https://sqs.us-west-2.amazonaws.com/99999999999999/<queue name>.fifo --message-body "cli test msg 2" --message-group-id "azgroup"`
So far so good. But now, I want to implement an SqsListener to listen continuously. So, I checked out the code here https://github.com/sixthpoint/spring-boot-sqs-fifo-tutorial, which is a minimal Spring Cloud SQS application, and set all the configs as shown in the readme. My listener, right now, looks simply like this:
@SqsListener(value = SQSURL)
public void process(String json) throws IOException {
    System.out.println("here");
    System.out.println(json);
}
But, when I try to start the application up, I get this error:
com.amazonaws.services.sqs.model.AmazonSQSException: The security token included in the request is invalid. (Service: AmazonSQS; Status Code: 403; Error Code: InvalidClientTokenId; Request ID:....)
I think what's going on is that at startup, the listener is trying to contact my queue, and is being rejected because it's not sending that aws_session_token. (The company wiki, again, says this: "You will see aws_session_token. This is something you have not had before. It is required for your key to work!")
So, is there a way to explicitly set my AWS parameters, either in the Java code where the @SqsListener is defined, or somewhere in configs, such that the aws_session_token gets passed? It doesn't seem possible to pass an AwsCredentials object. (edit) And it doesn't seem that that would help me anyway, since AwsCredentials doesn't contain that field.
Or . . . is there some other way of solving this?
Answering, or at least partially answering, my own question: it turns out that the aws_session_token is required when, and only when, using temporary AWS credentials, which as I noted is what I've been given to work with. It has to be added to any CLI operations, but there is no way to set it on the AwsCredentials object in Java code. So that's not going to help me. It may just not be possible to connect from Java code when using temporary credentials. If I'm wrong and there is a way, please let me know.
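For comparison only (this is Python/boto3 rather than the Spring setup above): temporary credentials are passed as a third element next to the key pair, and any request signed without the session token is rejected. A sketch with placeholder values:

import boto3

# All three values come from the temporary-credentials block in the question
sqs = boto3.client(
    "sqs",
    region_name="us-west-2",
    aws_access_key_id="<access key>",
    aws_secret_access_key="<secret key>",
    aws_session_token="<the very long session token>",
)
print(sqs.list_queues())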

Running "amplify init" in the terminal gives an error

I am getting this while doing amplify init. The main goal is to implement authentication through AWS Cognito, which uses aws-amplify.
? Do you want to use an AWS profile? Yes
? Please choose the profile you want to use default
init failed
Error: read ECONNRESET
at TLSWrap.onStreamRead (internal/stream_base_commons.js:205:27) {
message: 'read ECONNRESET',
errno: 'ECONNRESET',
code: 'NetworkingError',
syscall: 'read',
region: 'us-east-1',
hostname: 'amplify.us-east-1.amazonaws.com',
retryable: true,
time: 2020-04-16T12:09:59.975Z
You may try the following strategies to eliminate the problem you are facing:
This looks more like a network problem, judging from the logs in your terminal; if you have a jittery connection, I would recommend that you try the same thing on a stable internet connection.
I would recommend doing an amplify delete in case there is some misconfiguration from the last time you ran amplify init, but the chances of this are low.
Check your AWS environment variables or configuration file; maybe the credentials of your AWS account are missing. Try running aws configure and reset the values of your key, secret, and region (a quick check for the environment variables is sketched below).
I hope the above suggestions help you somehow.
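A minimal sketch for the third suggestion, checking whether the credential-related environment variables are set at all (the names are the standard AWS ones):

import os

for name in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY",
             "AWS_SESSION_TOKEN", "AWS_DEFAULT_REGION"):
    print(name, "=", os.environ.get(name, "<not set>"))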

Does AWS CPP S3 SDK support "Transfer acceleration"

I enabled "Transfer acceleration" on my bucket. But I dont see any improvement in speed of Upload in my C++ application. I have waited for more than 20 minutes that is mentioned in AWS Documentation.
Does the SDK support "Transfer acceleration" by default or is there a run time flag or compiler flag? I did not spot anything in the SDK code.
thanks
Currently, there isn't a configuration option that simply turns on transfer acceleration. You can, however, use the endpoint override in the client configuration to set the accelerated endpoint.
What I did to enable (working) transfer acceleration:
Set "Transfer Acceleration" to enabled in the bucket configuration in the AWS console.
Add the s3:PutAccelerateConfiguration permission to the IAM user that I use inside my C++ application.
Add the following code to the S3 transfer configuration (bucket_ is your bucket name; the final URL must match the one shown in the AWS console under "Transfer Acceleration"):
Aws::Client::ClientConfiguration config;
/* other configuration options */
config.endpointOverride = bucket_ + ".s3-accelerate.amazonaws.com";
Ask the bucket for acceleration before the transfer (see the AWS docs):
auto s3Client = Aws::MakeShared<Aws::S3::S3Client>("Uploader",
    Aws::Auth::AWSCredentials(id_, key_), config);
Aws::S3::Model::PutBucketAccelerateConfigurationRequest bucket_accel;
bucket_accel.SetAccelerateConfiguration(
    Aws::S3::Model::AccelerateConfiguration().WithStatus(
        Aws::S3::Model::BucketAccelerateStatus::Enabled));
bucket_accel.SetBucket(bucket_);
s3Client->PutBucketAccelerateConfiguration(bucket_accel);
You can check in the detailed logs of the AWS SDK that your code is using the accelerated endpoint, and you can also check that before the transfer starts there is a call to /?accelerate.
What worked for me:
Enabling S3 Transfer Acceleration within AWS console
When configuring the client, only utilize the accelerated endpoint service:
clientConfig->endpointOverride = "s3-accelerate.amazonaws.com";
@gabry - your solution was extremely close. I think the reason it wasn't working for me was perhaps due to SDK changes since it was originally posted, as the change is relatively small. Or maybe it's because I am constructing put object templates for requests used with the transfer manager.
Looking through the logs (Debug level), I saw that the SDK automatically concatenates the bucket used in transferManager::UploadFile() with the overridden endpoint. I was getting unresolved host errors because the requested host looked like:
[DEBUG] host: myBucket.myBucket.s3-accelerate.amazonaws.com
This way I could still keep the same S3_BUCKET macro name while only selectively calling this when instantiating a new configuration for upload.
e.g.
<<
...
// Reuse the put/multipart templates so uploads keep the desired storage class
auto putTemplate = new Aws::S3::Model::PutObjectRequest();
putTemplate->SetStorageClass(STORAGE_CLASS);
transferConfig->putObjectTemplate = *putTemplate;
auto multiTemplate = new Aws::S3::Model::CreateMultipartUploadRequest();
multiTemplate->SetStorageClass(STORAGE_CLASS);
transferConfig->createMultipartUploadTemplate = *multiTemplate;
// The bucket name is passed here, not in endpointOverride, so the host resolves correctly
transferMgr = Aws::Transfer::TransferManager::Create(*transferConfig);
auto transferHandle = transferMgr->UploadFile(localFile, S3_BUCKET, s3File);
...
>>

Python Requests Post request fails when connecting to a Kerberized Hadoop cluster with Livy

I'm trying to connect to a Kerberized Hadoop cluster via Livy to execute Spark code. The requests call I'm making is as below:
import json, requests
from requests_kerberos import HTTPKerberosAuth, REQUIRED
kerberos_auth = HTTPKerberosAuth(mutual_authentication=REQUIRED, force_preemptive=True)
r = requests.post(host + '/sessions', data=json.dumps(data), headers=headers, auth=kerberos_auth)
This call fails with the following error:
GSSException: No valid credentials provided (Mechanism level: Failed
to find any Kerberos credentails)
Any help here would be appreciated.
When running Hadoop service daemons in secure mode, Kerberos tickets are decrypted with a keytab, and the service uses the keytab to determine the credentials of the user coming into the cluster. Without a keytab in place with the right service principal inside of it, you will get this error message. Please refer to "Hadoop in Secure Mode" for further details on setting up the keytab.
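On the client side, this GSSException usually also means there is no valid ticket in the local credential cache when requests-kerberos tries to authenticate. A minimal sketch of acquiring one from a keytab before making the Livy call (the principal and keytab path are placeholders):

import subprocess

# Obtain a Kerberos ticket into the default credential cache; requests-kerberos
# (via GSSAPI) will pick it up from there. Principal and keytab are hypothetical.
subprocess.run(
    ["kinit", "-kt", "/etc/security/keytabs/myuser.keytab", "myuser@EXAMPLE.COM"],
    check=True,
)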