Lambda function is (sometimes) timing out attempting to access s3 - amazon-web-services

I am trying to create a Lambda function inside my VPC that requires S3 access. Most of the time it goes off without a hitch; however, sometimes it just hangs on s3.getObject until the function times out, and no error is reported when this happens. I have set up a VPC endpoint and added it to the route table for the (private) subnet, made sure that access to the endpoint is not blocked by either the security group or the NACL, and the IAM permissions all seem to be in order (though if that were the issue, one would expect an error message).
I've boiled my code down to a simple get/put for the purposes of debugging this issue, but here it is in case I am missing the incredibly obvious. I've spent hours googling this and tried everything suggested or that I can think of, and I am basically out of ideas at this point, so I cannot emphasize enough how much I appreciate any help that can be given.
Update: I have run my code from an EC2 instance inside the same VPC/subnet/security group as the Lambda, and it does not seem to have the same problem, so the issue seems to be with the Lambda configuration rather than the network configuration.
// Setup and handler wrapper are assumed; the original snippet started at the try block.
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

exports.handler = async (event) => {
  try {
    const getParams = {
      Bucket: 'MY_BUCKET',
      Key: '/path/to/file'
    };
    console.log('************** about to get', getParams);
    const getObject = await s3.getObject(getParams).promise();
    console.log('************** gotObject', getObject);
    const uploadParams = {
      Bucket: 'MY_BUCKET',
      Key: '/new/path/to/file',
      Body: getObject.Body
    };
    console.log('************** about to put', uploadParams);
    const putObject = await s3.putObject(uploadParams).promise();
    console.log('*************** object was put', putObject);
  } catch (err) {
    console.log('***************** error', err);
  }
};
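Since the hang produces no error, one debugging aid is to race the S3 call against a timer so that a stalled request surfaces as a logged rejection before the Lambda itself times out. The helper below is a minimal sketch; withTimeout and its parameters are hypothetical names, not part of the AWS SDK, and it does not cancel the underlying request.

```javascript
// Hypothetical helper: rejects if `promise` has not settled within `ms` milliseconds.
// It only stops waiting; the underlying request is not aborted.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Inside the handler this could wrap the existing call, for example withTimeout(s3.getObject(getParams).promise(), 5000, 's3.getObject'), so the catch block logs which step stalled.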

Related

AccessDeniedException executing GetParametersByPathCommand when NextToken is used (using @aws-sdk/client-ssm)

I'm getting
AccessDeniedException: User: arn:aws:iam::[ACCOUNTID]:user/esc-user-name is not authorized to perform: ssm:GetParametersByPath on resource: arn:aws:ssm:eu-central-1:[ACCOUNTID]:* because no identity-based policy allows the ssm:GetParametersByPath action
exception while reading the second page, while the first page of variables is fetched successfully.
While deploying a Next.js application via ECS, I need to get the relevant env variables during instance startup. The ECS-related role is permitted to read params on a particular path, like
/my-nextjs-app/stg/
I have no problems getting the expected result for the first page of params with
(code is not exactly the same for simplicity)
const initialCommand = new GetParametersByPathCommand({
  Path: "/my-nextjs-app/stg/",
  WithDecryption: true,
  Recursive: true,
});
const response = await ssmClient.send(initialCommand)
As soon as I receive a NextToken in the response, I try to use it to fetch the next page, roughly like this:
const nextCommand = new GetParametersByPathCommand({
  Path: "/my-nextjs-app/stg/",
  WithDecryption: true,
  Recursive: true,
  NextToken: response.NextToken,
});
await ssmClient.send(nextCommand)
And I get the permission denied error mentioned above.
It feels like when NextToken is defined in the command, SSMClient just ignores the Path param and tries to use the token as the source of all required data (I guess the path is somehow encoded into it, along with the pagination state).
Giving permission to the whole arn:aws:ssm:eu-central-1:[ACCOUNTID]:* is not an option for security reasons, and feels "dirty" anyway. My assumption is that if SSMClient was able to fetch the first page successfully, it should be able to proceed with the next ones as well with no additional permissions.
Meanwhile, using boto3, everything works with the same user/role.
Is it worth a bug report to @aws-sdk/client-ssm, or is there anything I've missed?
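For reference, the pagination pattern described above can be written as a loop that resends the same Path, Recursive and WithDecryption values on every page and only adds NextToken. This is a minimal sketch, not a fix for the permission error: fetchAllParameters and sendPage are hypothetical names, and sendPage is assumed to wrap something like (input) => ssmClient.send(new GetParametersByPathCommand(input)).

```javascript
// Hypothetical helper: collects every page of a NextToken-paginated call.
// `sendPage(input)` is assumed to return { Parameters, NextToken } for one page.
async function fetchAllParameters(sendPage, baseInput) {
  const parameters = [];
  let nextToken;
  do {
    // Keep the original request fields identical on every page; only add the token.
    const input = nextToken
      ? { ...baseInput, NextToken: nextToken }
      : { ...baseInput };
    const page = await sendPage(input);
    parameters.push(...(page.Parameters ?? []));
    nextToken = page.NextToken;
  } while (nextToken);
  return parameters;
}
```

Injecting sendPage keeps the loop independent of the SDK, which also makes it easy to log the exact input sent for the failing second page.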

Errors connecting to AWS Keyspaces using a lambda layer

Intermittently getting the following error when connecting to an AWS keyspace using a lambda layer
All host(s) tried for query failed. First host tried, 3.248.244.53:9142: Host considered as DOWN. See innerErrors.
I am trying to query a table in a keyspace using a nodejs lambda function as follows:
import cassandra from 'cassandra-driver';
import fs from 'fs';
export default class AmazonKeyspace {
  tpmsClient = null;
  constructor () {
    let auth = new cassandra.auth.PlainTextAuthProvider('cass-user-at-xxxxxxxxxx', 'zzzzzzzzz');
    let sslOptions1 = {
      ca: [ fs.readFileSync('/opt/utils/AmazonRootCA1.pem', 'utf-8')],
      host: 'cassandra.eu-west-1.amazonaws.com',
      rejectUnauthorized: true
    };
    this.tpmsClient = new cassandra.Client({
      contactPoints: ['cassandra.eu-west-1.amazonaws.com'],
      localDataCenter: 'eu-west-1',
      authProvider: auth,
      sslOptions: sslOptions1,
      keyspace: 'tpms',
      protocolOptions: { port: 9142 }
    });
  }
  getOrganisation = async (orgKey) => {
    const SQL = 'select * FROM organisation where organisation_id=?;';
    return new Promise((resolve, reject) => {
      this.tpmsClient.execute(SQL, [orgKey], {prepare: true}, (err, result) => {
        if (!err?.message) resolve(result.rows);
        else reject(err.message);
      });
    });
  };
}
I am basically following this recommended AWS documentation.
https://docs.aws.amazon.com/keyspaces/latest/devguide/using_nodejs_driver.html
It seems that around 10-20% of the time the lambda function (cassandra driver) cannot connect to the endpoint.
I am pretty familiar with Cassandra (I already use a 6 node cluster that I manage) and don't have any issues with that.
Could this be a timeout or do I need more contact points?
Followed the recommended guides. Checked from the AWS console for any errors but none shown.
UPDATE:
I am occasionally (about 1 in 50 when I call the function in parallel with 5 concurrent calls) getting the error below:
"All host(s) tried for query failed. First host tried, 3.248.244.5:9142: DriverError: Socket was closed
    at Connection.clearAndInvokePending (/opt/node_modules/cassandra-driver/lib/connection.js:265:15)
    at Connection.close (/opt/node_modules/cassandra-driver/lib/connection.js:618:8)
    at TLSSocket.<anonymous> (/opt/node_modules/cassandra-driver/lib/connection.js:93:10)
    at TLSSocket.emit (node:events:525:35)
    at node:net:313:12
    at TCP.done (node:_tls_wrap:587:7) { info: 'Cassandra Driver Error', isSocketError: true, coordinator: '3.248.244.5:9142' }"
This exception may be caused by throttling on the Keyspaces side, resulting in the driver error that you are seeing sporadically.
I would suggest taking a look at this repo, which should help you put measures in place to either prevent this issue or at least reveal the true cause of the exception.
For some of the errors you see in the logs, you will need to investigate Amazon CloudWatch metrics to see whether you have throttling or system errors. I've built this AWS CloudFormation template to deploy a CloudWatch dashboard with all the appropriate metrics. This will provide better observability for your application.
A system error indicates an event that must be resolved by AWS and is often part of normal operations. Activities such as timeouts, server faults, or scaling activity could result in server errors. A user error indicates an event that can often be resolved by the user, such as an invalid query or exceeding a capacity quota. Amazon Keyspaces passes the system error back as a Cassandra ServerError. In most cases this is a transient error, in which case you can retry your request until it succeeds. Using the Cassandra driver's default retry policy, customers can also experience NoHostAvailableException or AllNodesFailedException, or messages like yours: "All host(s) tried for query failed". This is a client-side exception that is thrown once all hosts in the load balancing policy's query plan have attempted the request.
Take a look at this retry policy for Node.js, which should help resolve your "All hosts failed" exception or pass back the original exception.
The retry policies in the Cassandra drivers are pretty crude and cannot do more sophisticated things like circuit breaker patterns. You may eventually want to use a "fail fast" retry policy for the driver and handle the exceptions in your application code.
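Handling the exceptions in application code could look something like the sketch below, which retries a failing async call a few times with exponential backoff and then rethrows the last error. withRetry and its options are hypothetical names, not part of cassandra-driver, and the attempt count and delays are placeholders to tune against your own throttling behaviour.

```javascript
// Hypothetical helper: retries `operation` with exponential backoff.
// After the final failed attempt, the last error is rethrown unchanged.
async function withRetry(operation, { attempts = 3, baseDelayMs = 100 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (attempt === attempts) break;
      // Delay doubles each time: baseDelayMs, 2x, 4x, ...
      const delay = baseDelayMs * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```

With the class from the question, a call might be wrapped as withRetry(() => keyspace.getOrganisation(orgKey)), where keyspace is an AmazonKeyspace instance. A real policy would probably retry only on errors known to be transient.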

DNS Lookup Error when uploading to localhost (local S3 server)

The scality/s3server image is running in a Docker container. I am connecting to it from Node.js using the @aws-sdk/client-s3 API.
The S3Client setup looks like this:
const s3Client = new S3Client({
  region: undefined, // See comment below
  endpoint: 'http://127.0.0.1:8000',
  credentials: {
    accessKeyId: 'accessKey1',
    secretAccessKey: 'verySecretKey1',
  },
})
Region undefined: this answer to a similar question suggests leaving the region out, but accessing the region with await s3Client.config.region() still returns eu-central-1, which was the value I passed to the constructor in a previous version. Although I changed it to undefined, it still picks up the old configuration. Could that be connected to the issue?
I was able to create a bucket (test) successfully, and it is listed when running a ListBucketsCommand (await s3Client.send(new ListBucketsCommand({}))).
However, as mentioned in the title, uploading content or streams to the bucket with
const bucketParams = {
  Bucket: 'test',
  Key: 'test.txt',
  Body: 'Test Content',
}
await s3Client.send(new PutObjectCommand(bucketParams))
does not work. Instead, I get a DNS resolution error (which seems odd, since I manually typed the IP address, not localhost).
Anyway, here the error message:
Error: getaddrinfo EAI_AGAIN test.127.0.0.1
at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:72:26) {
errno: -3001,
code: 'EAI_AGAIN',
syscall: 'getaddrinfo',
hostname: 'test.127.0.0.1',
'$metadata': { attempts: 1, totalRetryDelay: 0 }
}
Do you have any idea
1. why the region is still configured, and/or
2. why the DNS lookup happens and then fails, but only when uploading, not when retrieving metadata about the buckets or creating them?
For the second question, I found a workaround:
Instead of specifying the IP address directly, using endpoint: 'http://localhost:8000' (that is, the hostname instead of the IP address) fixes the DNS lookup exception. However, there is no obvious reason why this should happen.
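One plausible explanation, which the post itself does not confirm, is that the SDK uses virtual-hosted-style addressing for object operations and prepends the bucket name to the endpoint host, which would produce exactly the test.127.0.0.1 hostname seen in the error. The sketch below only illustrates the two addressing styles; buildObjectUrl is a hypothetical helper, not an SDK function. The S3Client constructor does accept a forcePathStyle: true option that selects the path-style form.

```javascript
// Hypothetical illustration of the two S3 addressing styles.
// Virtual-hosted style puts the bucket in the hostname; path style puts it in the path.
function buildObjectUrl(endpoint, bucket, key, forcePathStyle) {
  const { protocol, host } = new URL(endpoint);
  return forcePathStyle
    ? `${protocol}//${host}/${bucket}/${key}`
    : `${protocol}//${bucket}.${host}/${key}`;
}

// Virtual-hosted style: the bucket becomes part of a name that must be resolved.
console.log(buildObjectUrl('http://127.0.0.1:8000', 'test', 'test.txt', false));
// prints "http://test.127.0.0.1:8000/test.txt"

// Path style: the host stays a plain IP address.
console.log(buildObjectUrl('http://127.0.0.1:8000', 'test', 'test.txt', true));
// prints "http://127.0.0.1:8000/test/test.txt"
```

If this is what is happening, adding forcePathStyle: true to the S3Client configuration in the question should let the IP-address endpoint work for uploads as well.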

Cloud Function calling another Cloud Function gets "Access is forbidden"

I'm trying to call a Cloud Function from another one and for that, I'm following this documentation.
I've created two functions. This is the code for the function that calls the other one:
const {get} = require('axios');
// TODO(developer): set these values
const REGION = 'us-central1';
const PROJECT_ID = 'my-project-######';
const RECEIVING_FUNCTION = 'hello-world';
// Constants for setting up metadata server request
// See https://cloud.google.com/compute/docs/instances/verifying-instance-identity#request_signature
const functionURL = `https://${REGION}-${PROJECT_ID}.cloudfunctions.net/${RECEIVING_FUNCTION}`;
const metadataServerURL =
  'http://metadata/computeMetadata/v1/instance/service-accounts/default/identity?audience=';
const tokenUrl = metadataServerURL + functionURL;
exports.proxy = async (req, res) => {
  // Fetch the token
  const tokenResponse = await get(tokenUrl, {
    headers: {
      'Metadata-Flavor': 'Google',
    },
  });
  const token = tokenResponse.data;
  console.log(`Token: ${token}`);
  // Provide the token in the request to the receiving function
  try {
    console.log(`Calling: ${functionURL}`);
    const functionResponse = await get(functionURL, {
      headers: {Authorization: `bearer ${token}`},
    });
    res.status(200).send(functionResponse.data);
  } catch (err) {
    console.error(JSON.stringify(err));
    res.status(500).send('An error occurred! See logs for more details.');
  }
};
It's almost identical to the one proposed in the documentation. I just added a couple of logs, and I stringify the error before logging it. Following the instructions on that page, I've also granted the my-project-######@appspot.gserviceaccount.com service account the roles/cloudfunctions.invoker role on my hello-world function:
$ gcloud functions add-iam-policy-binding hello-world \
> --member='serviceAccount:my-project-######@appspot.gserviceaccount.com' \
> --role='roles/cloudfunctions.invoker'
bindings:
- members:
  - allUsers
  - serviceAccount:my-project--######@appspot.gserviceaccount.com
  role: roles/cloudfunctions.invoker
etag: ############
version: 1
But still, when I call the code above, I get 403 Access is forbidden. I'm sure this is returned by the hello-world function since I can see the logs from the code. I can see the token and I can see the correct URL for the hello-world function in the logs. Also, I can call the hello-world function directly from GCP console. Both of the functions are Trigger type: HTTP and only hello-world function is Ingress settings: Allow internal traffic only. The other one, Ingress settings: Allow all traffic.
Can someone please help me understand what's wrong?
If your hello-world function is in "Allow internal traffic only" mode, this means:
Only requests from VPC networks in the same project or VPC Service Controls perimeter are allowed. All other requests are rejected.
To reach the function, you have to call it through your VPC. To do this:
1. Create a serverless VPC connector in the same region as your function (take care: serverless VPC connectors are not available in all regions).
2. Attach it to your second function (the calling one).
3. Route all the traffic through the serverless VPC connector (I'm not sure that it works if you route only internal traffic).

Api Gateway: AWS Subdomain for Lambda Integration

I'm attempting to integrate my lambda function, which must run async because it takes too long, with API gateway. I believe I must, instead of choosing the "Lambda" integration type, choose "AWS Service" and specify Lambda. (e.g. this and this seem to imply that.)
However, I get the message "AWS ARN for integration must contain path or action" when I attempt to set the AWS Subdomain to the ARN of my Lambda function. If I set the subdomain to just the name of my Lambda function, when attempting to deploy I get "AWS ARN for integration contains invalid path".
What is the proper AWS Subdomain for this type of integration?
Note that I could also take the advice of this post and set up a Kinesis stream, but that seems excessive for my simple use case. If that's the proper way to resolve my problem, happy to try that.
Edit: Included screen shot
Edit: Please see comment below for an incomplete resolution.
It's pretty annoying to set up, but here are two ways:
1. Set up a regular Lambda integration and then add the InvocationType header described here: http://docs.aws.amazon.com/lambda/latest/dg/API_Invoke.html. The value should be 'Event'.
   This is annoying because the console won't let you add headers when you have a Lambda function as the integration type. You'll have to use the SDK or the CLI, or use Swagger, where you can add the header easily.
2. Set the whole thing up as an AWS integration in the console (this is what you're doing in the question), just so you can set the InvocationType header in the console:
   - Leave the subdomain blank.
   - Select "Use path override" and set it to /2015-03-31/functions/<FunctionARN>/invocations, where <FunctionARN> is the full ARN of your Lambda function.
   - Set the HTTP method to POST.
   - Add a static header X-Amz-Invocation-Type with value 'Event'.
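To make the path override concrete, here is a small sketch that assembles the values described above from a function ARN. asyncLambdaIntegration is a hypothetical helper and the returned object is only an illustration for reference, not an API Gateway API call.

```javascript
// Hypothetical helper: derives the AWS-integration settings for an async Lambda call.
// The path format and header name come from the steps above.
function asyncLambdaIntegration(functionArn) {
  return {
    httpMethod: 'POST',
    pathOverride: `/2015-03-31/functions/${functionArn}/invocations`,
    // 'Event' tells Lambda to queue the invocation and return immediately.
    headers: { 'X-Amz-Invocation-Type': 'Event' },
  };
}
```

Calling it with the full ARN of your function gives the exact string to paste into the "Use path override" field.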
The other option, which I did, was to still use the Lambda configuration and use two lambdas. The first (code below) runs in under a second and returns immediately. But, what it really does is fire off a second lambda (your primary one) that can be long running (up to the 15 minute limit) as an Event. I found this more straightforward.
// The snippet assumes the AWS SDK v2 Lambda client; the original omitted this setup.
const AWS = require('aws-sdk');
const Lambda = new AWS.Lambda();

/**
 * Note: Step Functions, which are called out in many answers online, do NOT actually work in this case. The reason
 * being that if you use Sequential or even Parallel steps they both require everything to complete before a response
 * is sent. That means that this one will execute quickly but Step Functions will still wait on the other one to
 * complete, thus defeating the purpose.
 *
 * @param {Object} event The Event from Lambda
 */
exports.handler = async (event) => {
  const params = {
    FunctionName: "<YOUR FUNCTION NAME OR ARN>",
    InvocationType: "Event", // <--- This is KEY as it tells Lambda to start execution but immediately return / not wait.
    Payload: JSON.stringify(event)
  };
  // We have to wait for it to at least be submitted. Otherwise Lambda runs too fast and will return before
  // the Lambda can be submitted to the backend queue for execution.
  await new Promise((resolve, reject) => {
    Lambda.invoke(params, (err, data) => {
      if (err) {
        reject(err); // reject takes a single argument; err.stack is already on the error
      } else {
        resolve(data);
      }
    });
  });
  // Return 200 once the event has been submitted (a rejected invoke above throws instead)
  return {
    statusCode: 200,
    body: "Event Handled"
  };
};