Random SSL handshake failure when pulling file from Amazon S3 bucket - amazon-web-services

I have a specific fetch request in my Node app which just pulls a JSON file from my S3 bucket and stores the content within the state of my app.
The fetch request works 99% of the time, but roughly every 4 or 5 days I get a notification saying the app has crashed, and when I investigate the reason is always this SSL handshake failure.
I am trying to figure out why this is happening, as well as a fix to prevent it in future.
The fetch request looks like the following and is called every time someone new visits the site. Once the request has been made and the JSON is in the app's state, the request is no longer called.
function grabPreParsedContentFromS3 (prefix, callback) {
  fetch(`https://s3-ap-southeast-2.amazonaws.com/my-bucket/${prefix}.json`)
    .then(res => res.json())
    .then(res => callback(res[prefix]))
    .catch(e => console.log('Error fetching data from s3: ', e))
}
When this error happens, the .catch handler fires and logs the following error message:
Error fetching data from s3: {
FetchError: request to https://s3-ap-southeast-2.amazonaws.com/my-bucket/services.json failed,
reason: write EPROTO 139797093521280:error:1409E0E5:SSL routines:ssl3_write_bytes:ssl handshake failure:../deps/openssl/openssl/ssl/s3_pkt.c:659:
...
}
Has anyone encountered this kind of issue before, or has any idea why this might be happening? Currently I am wondering if there is a limit to the number of S3 requests I can make at one time which is causing it to fail, but the site isn't popular enough to be generating a huge number of fetches either.
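Not from the original post, but one way to keep a one-off handshake failure from crashing the app is to wrap the fetch in a small retry helper. This is only a sketch, assuming the same grabPreParsedContentFromS3 signature and fetch call shown above:

// Hypothetical retry helper (not from the original post): retries the S3 fetch
// a few times with an increasing delay so a one-off TLS handshake failure
// doesn't surface as a crash.
function fetchJsonWithRetry (url, retries = 3, delayMs = 1000) {
  return fetch(url)
    .then(res => res.json())
    .catch(err => {
      if (retries <= 0) throw err; // out of attempts, let the caller handle it
      return new Promise(resolve => setTimeout(resolve, delayMs))
        .then(() => fetchJsonWithRetry(url, retries - 1, delayMs * 2));
    });
}

function grabPreParsedContentFromS3 (prefix, callback) {
  fetchJsonWithRetry(`https://s3-ap-southeast-2.amazonaws.com/my-bucket/${prefix}.json`)
    .then(json => callback(json[prefix]))
    .catch(e => console.log('Error fetching data from s3: ', e))
}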

Related

Errors connecting to AWS Keyspaces using a lambda layer

Intermittently getting the following error when connecting to AWS Keyspaces from a Lambda layer:
All host(s) tried for query failed. First host tried, 3.248.244.53:9142: Host considered as DOWN. See innerErrors.
I am trying to query a table in a keyspace using a Node.js Lambda function as follows:
import cassandra from 'cassandra-driver';
import fs from 'fs';

export default class AmazonKeyspace {
  tpmsClient = null;

  constructor () {
    let auth = new cassandra.auth.PlainTextAuthProvider('cass-user-at-xxxxxxxxxx', 'zzzzzzzzz');
    let sslOptions1 = {
      ca: [ fs.readFileSync('/opt/utils/AmazonRootCA1.pem', 'utf-8') ],
      host: 'cassandra.eu-west-1.amazonaws.com',
      rejectUnauthorized: true
    };
    this.tpmsClient = new cassandra.Client({
      contactPoints: ['cassandra.eu-west-1.amazonaws.com'],
      localDataCenter: 'eu-west-1',
      authProvider: auth,
      sslOptions: sslOptions1,
      keyspace: 'tpms',
      protocolOptions: { port: 9142 }
    });
  }

  getOrganisation = async (orgKey) => {
    const SQL = 'select * FROM organisation where organisation_id=?;';
    return new Promise((resolve, reject) => {
      this.tpmsClient.execute(SQL, [orgKey], { prepare: true }, (err, result) => {
        if (!err?.message) resolve(result.rows);
        else reject(err.message);
      });
    });
  };
}
I am basically following the recommended AWS documentation:
https://docs.aws.amazon.com/keyspaces/latest/devguide/using_nodejs_driver.html
It seems that around 10-20% of the time the Lambda function (Cassandra driver) cannot connect to the endpoint.
I am pretty familiar with Cassandra (I already manage a 6-node cluster) and don't have any issues with that.
Could this be a timeout, or do I need more contact points?
I followed the recommended guides and checked the AWS console for errors, but none are shown.
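For context, the handler itself is not shown in the question. A sketch of how such a class is typically wired into a Lambda (the import path and event shape below are assumptions) is to construct it once outside the handler, so warm invocations reuse the driver's existing connections instead of opening new TLS sessions:

// Hypothetical handler shape (not in the original post): the client is created
// once per container, outside the handler, so warm invocations reuse the
// existing driver connections.
import AmazonKeyspace from './AmazonKeyspace.mjs'; // assumed module path

const keyspace = new AmazonKeyspace();

export const handler = async (event) => {
  // orgKey is assumed to arrive on the invocation event
  const rows = await keyspace.getOrganisation(event.orgKey);
  return rows;
};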
UPDATE:
I am occasionally (about 1 in 50 calls, when I invoke the function in parallel with 5 concurrent calls) getting the error below:
"All host(s) tried for query failed. First host tried,
3.248.244.5:9142: DriverError: Socket was closed at Connection.clearAndInvokePending
(/opt/node_modules/cassandra-driver/lib/connection.js:265:15) at
Connection.close
(/opt/node_modules/cassandra-driver/lib/connection.js:618:8) at
TLSSocket.
(/opt/node_modules/cassandra-driver/lib/connection.js:93:10) at
TLSSocket.emit (node:events:525:35)\n at node:net:313:12\n at
TCP.done (node:_tls_wrap:587:7) { info: 'Cassandra Driver Error',
isSocketError: true, coordinator: '3.248.244.5:9142'}
This exception may be caused by throttling on the Keyspaces side, resulting in the DriverError that you are seeing sporadically.
I would suggest taking a look over this repo, which should help you put measures in place to either prevent this issue or at least reveal the true cause of the exception.
For some of the errors you see in the logs, you will need to investigate Amazon CloudWatch metrics to see whether you have throttling or system errors. I've built this AWS CloudFormation template to deploy a CloudWatch dashboard with all the appropriate metrics, which will provide better observability for your application.
A system error indicates an event that must be resolved by AWS and is often part of normal operations; activities such as timeouts, server faults, or scaling activity can result in server errors. A user error indicates an event that can usually be resolved by the user, such as an invalid query or exceeding a capacity quota. Amazon Keyspaces passes a system error back as a Cassandra ServerError. In most cases this is a transient error, in which case you can retry your request until it succeeds. With the Cassandra driver's default retry policy, customers can also see NoHostAvailableException or AllNodesFailedException, or messages like yours ("All host(s) tried for query failed"). This is a client-side exception that is thrown once all hosts in the load balancing policy's query plan have attempted the request.
Take a look at this retry policy for Node.js, which should help resolve your "All hosts failed" exception or pass back the original exception.
The retry policies in the Cassandra drivers are pretty crude and cannot do more sophisticated things like circuit breaker patterns. You may eventually want to use a "fail fast" retry policy for the driver and handle the exceptions in your application code.
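To illustrate the last point, here is a rough sketch (an assumed application-level wrapper, not the linked retry policy) of retrying the getOrganisation call from the question with exponential backoff when the driver surfaces a transient error:

// Rough sketch of application-level retries (assumption, not the linked retry
// policy): retry a few times with exponential backoff, then rethrow.
async function getOrganisationWithRetry (keyspace, orgKey, attempts = 3) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await keyspace.getOrganisation(orgKey);
    } catch (err) {
      lastError = err;
      // back off 100 ms, 200 ms, 400 ms, ... before the next attempt
      await new Promise(resolve => setTimeout(resolve, 100 * 2 ** i));
    }
  }
  throw lastError;
}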

Kinesis put records not returned in response from get records request

I have a Scala app using aws-java-sdk-kinesis to issue a series of putRecord requests to a local Kinesis stream.
The response returned after each putRecord request indicates the records are being put into the stream successfully.
The Scala code making the PutRecordRequest:
def putRecord(kinesisClient: AmazonKinesis, value: Array[Byte], streamName: String): Try[PutRecordResult] = Try {
  val putRecordRequest = new PutRecordRequest()
  putRecordRequest.setStreamName(streamName)
  putRecordRequest.setData(ByteBuffer.wrap(value))
  putRecordRequest.setPartitionKey("integrationKey")
  kinesisClient.putRecord(putRecordRequest)
}
To confirm this I have a small Python app that consumes from the stream (initialStreamPosition: LATEST) and prints the records it finds by iterating through the shard iterators. Unexpectedly, however, it returns an empty set of records for each shard iterator it obtains.
Trying this with the AWS CLI, I do get records returned for the same shard iterator. I am confused; how can that be?
Running the Python consumer (with LATEST) returns:
Shard-iterators: ['AAAAAAAAAAH9AUYVAkOcqkYNhtibrC9l68FcAQKbWfBMyNGko1ypHvXlPEuQe97Ixb67xu4CKzTFFGoLVoo8KMy+Zpd+gpr9Mn4wS+PoX0VxTItLZXxalmEfufOqnFbz2PV5h+Wg5V41tST0c4X0LYRpoPmEnnKwwtqwnD0/VW3h0/zxs7Jq+YJmDvh7XYLf91H/FscDzFGiFk6aNAVjyp+FNB3WHY0d']
Records: []
If I do the "same" with the AWS CLI, however, I get:
> aws kinesis get-records --shard-iterator AAAAAAAAAAH9AUYVAkOcqkYNhtibrC9l68FcAQKbWfBMyNGko1ypHvXlPEuQe97Ixb67xu4CKzTFFGoLVoo8KMy+Zpd+gpr9Mn4wS+PoX0VxTItLZXxalmEfufOqnFbz2PV5h+Wg5V41tST0c4X0LYRpoPmEnnKwwtqwnD0/VW3h0/zxs7Jq+YJmDvh7XYLf91H/FscDzFGiFk6aNAVjyp+FNB3WHY0d --endpoint-url http://localhost:4567
Returns:
{"Records":[{"SequenceNumber":"49625122979782922897342908653629584879579547704307482626","ApproximateArrivalTimestamp":1640263797.328,"Data":{"type":"Buffer","data":[123,34,116,105,109,101,115,116,97,109,112,34,58,49,54,52,48,50,54,51,55,57,55,44,34,100,116,109,34,58,49,54,52,48,50,54,51,55,57,55,44,34,101,34,58,34,101,34,44,34,116,114,97,99,107,101,114,95,118,101,114,115,105,111,110,34,58,34,118,101,114,115,105,111,110,34,44,34,117,114,108,34,58,34,104,116,116,112,115,58,47,47,116,101,115,116,46,99,111,109,34,44,34,104,99,99,34,58,102,97,108,115,101,44,34,115,99,34,58,49,44,34,99,111,110,116,101,120,116,34,58,123,34,101,116,34,58,34,101,116,34,44,34,100,101,118,34,58,34,100,101,118,34,44,34,100,119,101,108,108,34,58,49,44,34,111,105,100,34,58,49,44,34,119,105,100,34,58,49,44,34,115,116,97,116,101,34,58,123,34,108,99,34,58,123,34,99,111,100,101,34,58,34,115,111,109,101,45,99,111,100,101,34,44,34,105,100,34,58,34,115,111,109,101,45,105,100,34,125,125,125,44,34,121,117,105,100,34,58,34,102,53,101,52,57,53,98,102,45,100,98,102,100,45,52,102,53,102,45,56,99,56,98,45,53,97,56,98,50,56,57,98,52,48,49,97,34,125]},"PartitionKey":"integrationKey"},{"SequenceNumber":"49625122979782922897342908653630793805399163707871723522","ApproximateArrivalTimestamp":1640263817.338,"Data":{"type":"Buffer","data":[123,34,116,105,109,101,115,116,97,109,112,34,58,49,54,52,48,50,54,51,56,49,55,44,34,100,116,109,34,58,49,54,52,48,50,54,51,56,49,55,44,34,101,34,58,34,101,34,44,34,116,114,97,99,107,101,114,95,118,101,114,115,105,111,110,34,58,34,118,101,114,115,105,111,110,34,44,34,117,114,108,34,58,34,104,116,116,112,115,58,47,47,116,101,115,116,46,99,111,109,34,44,34,104,99,99,34,58,102,97,108,115,101,44,34,115,99,34,58,49,44,34,99,111,110,116,101,120,116,34,58,123,34,101,116,34,58,34,101,116,34,44,34,100,101,118,34,58,34,100,101,118,34,44,34,100,119,101,108,108,34,58,49,44,34,111,105,100,34,58,49,44,34,119,105,100,34,58,49,44,34,115,116,97,116,101,34,58,123,34,108,99,34,58,123,34,99,111,100,101,34,58,34,115,111,109,101,45,99,111,100,101,34,44,34,105,100,34,58,34,115,111,109,101,45,105,100,34,125,125,125,44,34,121,117,105,100,34,58,34,102,53,101,52,57,53,98,102,45,100,98,102,100,45,52,102,53,102,45,56,99,56,98,45,53,97,56,98,50,56,57,98,52,48,49,97,34,125]},"PartitionKey":"integrationKey"},{"SequenceNumber":"49625122979782922897342908653632002731218779711435964418","ApproximateArrivalTimestamp":1640263837.347,"Data":{"type":"Buffer","data":[123,34,116,105,109,101,115,116,97,109,112,34,58,49,54,52,48,50,54,51,56,51,55,44,34,100,116,109,34,58,49,54,52,48,50,54,51,56,51,55,44,34,101,34,58,34,101,34,44,34,116,114,97,99,107,101,114,95,118,101,114,115,105,111,110,34,58,34,118,101,114,115,105,111,110,34,44,34,117,114,108,34,58,34,104,116,116,112,115,58,47,47,116,101,115,116,46,99,111,109,34,44,34,104,99,99,34,58,102,97,108,115,101,44,34,115,99,34,58,49,44,34,99,111,110,116,101,120,116,34,58,123,34,101,116,34,58,34,101,116,34,44,34,100,101,118,34,58,34,100,101,118,34,44,34,100,119,101,108,108,34,58,49,44,34,111,105,100,34,58,49,44,34,119,105,100,34,58,49,44,34,115,116,97,116,101,34,58,123,34,108,99,34,58,123,34,99,111,100,101,34,58,34,115,111,109,101,45,99,111,100,101,34,44,34,105,100,34,58,34,115,111,109,101,45,1pre05,100,34,125,125,125,44,34,121,117,105,100,34,58,34,102,53,101,52,57,53,98,102,45,100,98,102,100,45,52,102,53,102,45,56,99,56,98,45,53,97,56,98,50,56,57,98,52,48,49,97,34,125]},"PartitionKey":"integrationKey"}],"NextShardIterator":"AAAAAAAAAAE+9W/bI4CsDfzvJGN3elplafFFBw81/cVB0RjojS39hpSglW0ptfsxrO6dCWKEJWu1f9BxY7O
ZJS9uUYyLn+dvozRNzKGofpHxmGD+/1WT0MVYMv8tkp8sdLdDNuVaq9iF6aBKma+e+iD079WfXzW92j9OF4DqIOCWFIBWG2sl8wn98figG4x74p4JuZ6Q5AgkE41GT2Ii2J6SkqBI1wzM","MillisBehindLatest":0}
I have used this same Python consumer in many other settings to introspect other Kinesis streams we have, and it works as expected. But for some reason here it's not working.
Does anyone have a clue what might be going on here?
So I was finally able to identify the issue, and perhaps it will be useful for someone else with a similar problem.
In my setup I am using a local Kinesis stream (kinesalite), which doesn't support CBOR. You have to disable CBOR explicitly; otherwise I was seeing the following error when trying to deserialize the received record:
Unable to unmarshall response (We expected a VALUE token but got: START_OBJECT). Response Code: 200, Response Text: OK
In my case, setting the environment variable AWS_CBOR_DISABLE=1 did the trick.

Amazon retries upload with large files

I'm having a problem: I'm using aws-sdk in the browser to upload videos from my web app to Amazon S3. It works fine with smaller files (less than 100 MB), but with large files (500 MB, for example) Amazon retries the upload.
For example, when the upload is at 28% it goes back to 1%. I don't know why; I put an event listener on the upload but it doesn't give any error, it just goes back to 1%.
This is literally my code:
const params = {
  Bucket: process.env.VUE_APP_AWS_BUCKET_NAME, // It comes from my dotenv file
  Key: videoPath, // It comes from an external var
  ACL: 'public-read',
  ContentType: this.type, // It comes from my class where I have the video type
  Body: VideoObject // <- It comes from the input file
};

s3.putObject(params, function (err) {
  if (err)
    console.log(err);
}).on('httpUploadProgress', progress => {
  console.log(progress);
  console.log(progress.loaded + " - " + progress.total);
  this.progress = parseInt((progress.loaded * 100) / progress.total);
});
I really would like to give more info, but that's all I have. I don't know why S3 retries the upload without any error (I also don't know how to catch S3 errors...).
My internet connection is fine; I'm using my business connection and it works well, and this issue happens on all my computers.
For large objects you may try the upload function, which supports progress tracking and multipart uploading to upload parts in parallel. Also, I didn't see a content length set in your example; upload accepts a stream without a content length necessarily being defined.
An example: https://aws.amazon.com/blogs/developer/announcing-the-amazon-s3-managed-uploader-in-the-aws-sdk-for-javascript/
You need to use a multipart upload for large files.
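A rough sketch of what the managed uploader might look like for the code in the question (the partSize and queueSize values are illustrative, not recommendations); upload() splits a large body into parts and retries failed parts individually rather than restarting the whole object:

const params = {
  Bucket: process.env.VUE_APP_AWS_BUCKET_NAME,
  Key: videoPath,
  ACL: 'public-read',
  ContentType: this.type,
  Body: VideoObject
};

// s3.upload returns a ManagedUpload that performs a multipart upload for
// large bodies; the part size and queue size below are illustrative only.
const upload = s3.upload(params, { partSize: 10 * 1024 * 1024, queueSize: 4 });

upload.on('httpUploadProgress', progress => {
  this.progress = parseInt((progress.loaded * 100) / progress.total);
});

upload.send((err, data) => {
  if (err) console.log(err);        // failed after the SDK's per-part retries
  else console.log(data.Location);  // URL of the uploaded object
});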

Multipart file upload failing sometimes in a Docker container on AWS

I have a hapi server running in an AWS Docker container, and it exposes a file upload API. The API runs smoothly on my local machine, but when deployed to AWS it sometimes fails with the error "Incomplete multipart payload". The error does not always occur, only some of the time.
The images I am uploading are small (less than 100 kB), and the failure is not because of a slow network, as I have tested it on multiple networks.
After debugging the hapi modules that do payload parsing, I have found that the Pez module, which parses the payload, is throwing this error. I also noticed that when this error happens Pez's onClose event is called and none of the parse events occur, hence it returns the "Incomplete multipart payload" error. The Pez state is "preamble" when this happens; in the successful parse case, the state is "epilogue".
My hapi route config is
config: {
  payload: {
    maxBytes: 20971520,
    output: 'data',
    parse: true,
    allow: 'multipart/form-data'
  }
}
Can somebody suggest why the parsing fails at times, or why the onClose event is called before parsing happens?

CloudFront Lambda@Edge bug? Change response status code in viewer response

The question is: why is it not possible to change the HTTP status before returning the response in a viewer response event in Lambda@Edge?
I have a Lambda function that must check every single response and change its status code based on a JSON file that I have in the Lambda function. I deployed this function on the viewer response event because I want it to execute before every single response is returned. The AWS documentation ( https://aws.amazon.com/blogs/networking-and-content-delivery/lambdaedge-design-best-practices/ ) says that if you want to execute a function for all requests it should be placed on the viewer events.
So I've created a simple function that basically clones the response and changes its HTTP status code before returning it. I wrote this code as a test:
exports.handler = async (event) => {
  const request = event.Records[0].cf.request;
  console.log(request);
  console.log(`Original request`);
  const response = event.Records[0].cf.response;
  console.log(`Original response`);
  console.log(response);
  // clone the response just to change the status code
  let cloneResponseReturn = JSON.parse(JSON.stringify(response));
  cloneResponseReturn.status = 404;
  cloneResponseReturn.statusDescription = 'Not Found';
  console.log('Log Clone Response Return');
  console.log(cloneResponseReturn);
  return cloneResponseReturn;
};
When I check the log in CloudWatch, it shows that the response has the HTTP 404 code, but for some reason CloudFront still returns the response with a 200 status code. (I've cleared the browser cache and tested it in other tools such as Postman, but in all of them CloudFront returns HTTP 200.)
CloudWatch Log and Response print:
If I change this function to execute on the origin response it works, but I don't want it to execute ONLY on cache misses (as AWS tells us that origin events are executed only in that case). Since origin events run only on cache misses, to apply these status changes I would have to add a cache-busting header to make sure the origin events are always executed.
This Lambda@Edge behaviour is really weird. Does anyone have any idea how I can solve this? I have already tried clearing the cache of the browsers and tools I am using to test the requests, and also invalidating the cache of my distribution, but it is still not working.
I posted the question on the AWS Forums a week ago, but it is still unanswered: https://forums.aws.amazon.com/message.jspa?messageID=885516#885516
Thanks in advance.
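Not an answer from the thread, just a debugging sketch: the Lambda@Edge event carries the trigger type in event.Records[0].cf.config, so logging it (together with the distribution ID and the incoming status) can confirm that the function shown in CloudWatch really is the viewer-response association on the distribution being tested. Note also that the status field in the cf response object is a string:

// Debugging sketch (assumes the same handler shape as above): log which
// trigger is executing and what status CloudFront handed the function.
exports.handler = async (event) => {
  const { config, response } = event.Records[0].cf;
  console.log('eventType:', config.eventType);           // e.g. 'viewer-response'
  console.log('distributionId:', config.distributionId); // which distribution invoked us
  console.log('incoming status:', response.status);

  response.status = '404'; // status is a string in the cf response object
  response.statusDescription = 'Not Found';
  return response;
};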