AWS Lambda access to Redshift, S3 and Secrets Manager - amazon-web-services

I am new to AWS and trying to wrap my head around how I can build a data pipeline using Lambda, S3, Redshift and Secrets Manager. I have searched the web, read a number of documents/tutorials, yet I am still a bit confused as to how I can configure this properly.
In my stack, Lambda will be the core of the tooling: it will need to call out to external APIs, read and write data in S3, access Secrets Manager, and connect to Redshift for data loading and querying.
My question: what are my options for configuring this setup so that Lambda can access all of the necessary tools/services?
For context, I have been able to poke around and get most things working, but access to Redshift is what has slowed me down. If I put the lambda into the same VPC as Redshift (default), I lose access to everything else, so I am not certain as to how to proceed.

A Lambda running in a VPC does not ever get a public IP address. This causes issues when it tries to access things outside the VPC, such as S3 and Secrets Manager.
There are two ways to fix this access issue:
Move the Lambda function to private VPC subnets with a route to a NAT Gateway.
Add VPC endpoints to your VPC for the AWS services you need.
If your Lambda function only needs to access other AWS services, and not the Internet, you should add an S3 gateway endpoint and a Secrets Manager interface endpoint to your VPC. Calls to external APIs on the Internet would still need the NAT Gateway option.
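As a rough sketch of the endpoint option, the two endpoints could be created with boto3 along these lines. The helper names and every ID passed to them are hypothetical; `create_vpc_endpoint` and its parameters are the standard EC2 API. It assumes the Lambda's subnets use the listed route tables and that the security group on the interface endpoint allows inbound HTTPS (443) from the Lambda.

```python
from typing import Any, Dict, List


def s3_gateway_endpoint_params(
    vpc_id: str, region: str, route_table_ids: List[str]
) -> Dict[str, Any]:
    """Build kwargs for ec2.create_vpc_endpoint for an S3 gateway endpoint."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        # Gateway endpoints work by adding routes to these route tables.
        "RouteTableIds": route_table_ids,
    }


def secrets_manager_endpoint_params(
    vpc_id: str, region: str, subnet_ids: List[str], security_group_ids: List[str]
) -> Dict[str, Any]:
    """Build kwargs for ec2.create_vpc_endpoint for a Secrets Manager interface endpoint."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.secretsmanager",
        # Interface endpoints place network interfaces in these subnets.
        "SubnetIds": subnet_ids,
        "SecurityGroupIds": security_group_ids,
        # Lets the default Secrets Manager hostname resolve to the endpoint.
        "PrivateDnsEnabled": True,
    }


def create_endpoints(region: str, *param_sets: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Create each endpoint; needs boto3 and AWS credentials at call time."""
    import boto3  # imported lazily so the builders above work without boto3

    ec2 = boto3.client("ec2", region_name=region)
    return [ec2.create_vpc_endpoint(**params) for params in param_sets]
```

The gateway endpoint is attached to route tables, while the interface endpoint lives in subnets behind a security group, which is why the two builders take different arguments.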

Related

AWS Lambda can't retrieve file from an Amazon S3 Bucket in same network

I have an AWS project that contains an S3 bucket, an RDS database and Lambda functions.
I want Lambda to have access to both the S3 bucket and the RDS database.
The Lambda functions connect to the RDS database correctly, but they time out when trying to retrieve an object from the S3 bucket:
Event needs-retry.s3.GetObject: calling handler <bound method S3RegionRedirectorv2.redirect_from_error of <botocore.utils.S3RegionRedirectorv2 object at 0x7f473a4ae910>>
...
(some more error lines)
...
botocore.exceptions.ConnectTimeoutError: Connect timeout on endpoint URL: "https://{bucket name}.s3.eu-west-3.amazonaws.com/{file name}.tar.gz"
So I understand that the reason would be that Lambda doesn't have internet access, and therefore my options are:
VPC endpoint (privatelink): https://aws.amazon.com/privatelink
NAT gateway for Lambda
But both go over the cloud (in the same region), which doesn't make any sense as they are both in the same project.
It's a redundant cost for such a detail, and there must be a better solution, right?
Maybe it helps to think of the S3 bucket "in the same project" as having permission to use an object system that resides in a different network outside your own. Your Lambda is in a VPC, but S3 objects are not in your VPC. You access them either through public endpoints (over the internet) or privately through an S3 gateway endpoint or a VPC interface endpoint. Neither of the private options uses the public internet.
As long as you stay in the same region, an S3 gateway endpoint does not cost you anything, but if you need to cross regions, you will need to use a VPC interface endpoint. The differences are documented here: https://docs.aws.amazon.com/AmazonS3/latest/userguide/privatelink-interface-endpoints.html
If you are trying to avoid costs, an S3 gateway endpoint might work for you; however, you will need to update the route tables that are associated with the gateway. The process is documented here: https://docs.aws.amazon.com/vpc/latest/privatelink/vpc-endpoints-s3.html
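To check the route-table part, one possible sketch is to scan the output of `ec2.describe_route_tables()` for routes that point a prefix list at a gateway endpoint. The function name is hypothetical; the `RouteTables`, `Routes`, `DestinationPrefixListId` and `GatewayId` keys are the ones that response uses. Note that this only detects a gateway endpoint route, it does not confirm the prefix list belongs to S3 rather than DynamoDB.

```python
from typing import Any, Dict, List


def route_tables_with_gateway_endpoint(response: Dict[str, Any]) -> List[str]:
    """Return IDs of route tables that contain a gateway endpoint route.

    `response` is expected to look like the dict returned by
    boto3's ec2.describe_route_tables().
    """
    found = []
    for table in response.get("RouteTables", []):
        for route in table.get("Routes", []):
            # Gateway endpoint routes target a "vpce-..." gateway and use a
            # managed prefix list ("pl-...") as the destination.
            has_prefix_list = bool(route.get("DestinationPrefixListId"))
            via_endpoint = str(route.get("GatewayId", "")).startswith("vpce-")
            if has_prefix_list and via_endpoint:
                found.append(table["RouteTableId"])
                break
    return found
```

If the route table used by the Lambda's subnets is missing from the result, the endpoint exists but the function's traffic is not being routed through it.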

Create s3 endpoint to a different region than current one

I have an account in region ca-central. I want to make calls to a public S3 bucket located in us-east.
As far as possible, these calls have to be made over HTTPS (I am actually using apt-get), but if that is not possible I can try using CLI calls to download my data.
I cannot go out to the public network due to firewall limitations; I need to stay inside the AWS network.
Can I do this through an S3 endpoint? The only endpoints I can create are tied to my current region (ca-central). Or is the public network the only way?
Yes, regardless of which region the bucket lives in, you will be able to reach it via AWS PrivateLink with either a VPC interface or gateway endpoint deployed correctly in the region you're working in.

Export data from OpenSearch in private VPC and import it to local running container - aws opensearch

I'm using AWS OpenSearch in a private VPC.
I have about 10000 entries under one index.
For local development I'm running a local OpenSearch container, and I'd like to export all the entries from the OpenSearch service into my local container.
I can get all the entries from the OpenSearch API, but the format of the response is different from the format required for a _bulk operation.
Can someone please tell me how I should do it?
Anna,
There are different strategies you can take to accomplish this, considering the fact that your domain is running in a private VPC.
Option 1: Exporting and Importing Snapshots
From the security standpoint, this is the recommended option, as you are moving entire indices out of the service without exposing the data. Please follow the AWS official documentation about how to create custom index snapshots. Once you complete the steps, you will have an index snapshot stored on an Amazon S3 bucket. After this, you can securely download the index snapshot to your local machine, then follow the instructions on the official OpenSearch documentation about how to restore the index snapshots.
Option 2: Using VPC Endpoints
Another way to export the data from your OpenSearch domain is to access it through an alternate endpoint using the VPC Endpoints feature of AWS OpenSearch. It allows you to expose additional endpoints running on public or private subnets within the same VPC, a different VPC, or different AWS accounts. In this case, you are essentially creating a way to access the OpenSearch REST APIs from outside the private VPC, so you need to take care over who else will be able to do so. Please follow the best practices for securing endpoints if you choose this option.
Option 3: Using the ElasticDump Open Source Utility
The ElasticDump utility allows you to retrieve data from Elasticsearch/OpenSearch clusters in a format of your preference, and then import that data into another cluster. It is a very flexible way to move data around, but it requires the utility to access the REST API endpoints of the cluster. Run this utility on a bastion server that has ingress access to your OpenSearch domain in the private VPC. Keep in mind, though, that AWS doesn't provide any support for this utility, and you must use it at your own risk.
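If you end up pulling the documents through the REST API yourself, the format mismatch from the question can be bridged with a small conversion step. This is a minimal sketch, assuming the standard `_search` response shape (`hits.hits[]` with `_index`, `_id` and `_source`); the function name and the `target_index` parameter are made up for illustration.

```python
import json
from typing import Any, Dict, Optional


def hits_to_bulk_ndjson(
    search_response: Dict[str, Any], target_index: Optional[str] = None
) -> str:
    """Convert a _search response into a newline-delimited _bulk request body.

    Each hit becomes two lines: an "index" action line followed by the
    document source. `target_index` optionally overrides the index name.
    """
    lines = []
    for hit in search_response["hits"]["hits"]:
        action = {
            "index": {
                "_index": target_index or hit["_index"],
                "_id": hit["_id"],
            }
        }
        lines.append(json.dumps(action))
        lines.append(json.dumps(hit["_source"]))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""
```

The resulting string can be sent to the local container's `_bulk` endpoint with the `application/x-ndjson` content type. A single search only returns one page of hits (10 by default), so for around 10000 entries you would page through the index with the scroll API or `search_after` and convert each page.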
I hope that helps with your question. Let us know if you need any more help on this. 🙂

Request to S3 from Lambda without leaving AWS Cloud

I have a Lambda function accessing an S3 bucket using aws-sdk.
There is a high number of operations (requests) to the S3 bucket, which is considerably increasing the cost of using Lambda.
I was hoping that the requests would use the s3:// protocol, but they are going over the internet.
I understand that one solution could be:
Attach the Lambda to a VPC
Create a VPC endpoint to S3
Update the route tables of the VPC
Is there a simpler way to do so?
An alternative could be creating an API Gateway, and creating lambda proxy method integration following the AWS Guide or Tutorial.
You can then configure your apigateway to act as your external facing integration over the internet and your lambda / s3 stays within AWS.
The traffic won't go over the internet and incur additional data transfer cost as long as the non-VPC lambda function is executing in the same region as the S3 bucket. So VPC is not needed in this case.
https://aws.amazon.com/s3/pricing/
You pay for all bandwidth into and out of Amazon S3, except for the following:
• Data transferred in from the internet.
• Data transferred out to an Amazon Elastic Compute Cloud (Amazon EC2) instance, when the instance is in the same AWS Region as the S3 bucket.
• Data transferred out to Amazon CloudFront (CloudFront).
For this purpose you can think of Lambda like EC2: the data transfer is free, but be careful, you still need to pay for the API requests.
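The "same region" condition can be checked from inside the function. A small sketch, with hypothetical function names: S3's `get_bucket_location` reports the bucket region as `LocationConstraint`, which is `None` for us-east-1 and the legacy value `EU` for eu-west-1, so the value needs normalizing before it is compared with the Lambda's region.

```python
from typing import Optional


def bucket_region_from_location(location_constraint: Optional[str]) -> str:
    """Normalize the LocationConstraint value returned by get_bucket_location."""
    if not location_constraint:
        # Buckets in us-east-1 report no location constraint.
        return "us-east-1"
    if location_constraint == "EU":
        # Legacy alias used by some older buckets in eu-west-1.
        return "eu-west-1"
    return location_constraint


def same_region(location_constraint: Optional[str], lambda_region: str) -> bool:
    """True when the bucket and the Lambda function are in the same region."""
    return bucket_region_from_location(location_constraint) == lambda_region
```

Inside a Lambda the region is available in the `AWS_REGION` environment variable, so a call could look like `same_region(s3.get_bucket_location(Bucket=name)["LocationConstraint"], os.environ["AWS_REGION"])`, where `s3` is a boto3 S3 client and `name` is your bucket.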

AWS Lambda can't reach resources created from MobileHub

I am having an issue accessing resources created in MobileHub from Lambda, and it does not make sense to me at all. I have two questions (maybe they are the same question):
Why can't Lambda access resources created by MobileHub when it has fullAccess permissions to those specific resources? If I create those resources separately I can access them, but not the ones created from MobileHub.
Is there a way to grant access to these resources or am I missing something?
Update
The issue was the VPC. Basically, when I enabled the VPC on the Lambdas to reach RDS, which has no public access, I couldn't reach any other resources; when I disabled it, RDS was unreachable. The question is how to combine a VPC with role policies.
You can find the resources associated with your project using the left-side navigation in the Mobile Hub console and select "Resources." If you want to enable your AWS Lambda functions to be able to make use of any AWS resources, then you'll need to add an appropriate IAM Policy to the Lambda Execute IAM Role. You can find this role in your project on the "Resources" page under "AWS Identity and Access Management Roles." It is the role that has "lambdaexecutionrole" in the name. Select this role then attach whatever policies you like in the IAM (Identity and Access Management) console.
For more information on how to attach policies to roles, see:
http://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_manage_modify.html
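For illustration, a policy granting the function read/write access to a single bucket could be built and attached to the execution role roughly like this. The bucket, role and policy names are placeholders and the helper names are hypothetical; `put_role_policy` is the standard IAM call for adding an inline policy to a role.

```python
import json
from typing import Any, Dict


def s3_read_write_policy(bucket_name: str) -> Dict[str, Any]:
    """Build an IAM policy document allowing object reads and writes on one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                # Object-level actions apply to keys inside the bucket.
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


def attach_inline_policy(role_name: str, policy_name: str, document: Dict[str, Any]) -> None:
    """Attach the document to a role as an inline policy (needs boto3 and credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    iam = boto3.client("iam")
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(document),
    )
```

A policy like this only covers permissions. It does not change network reachability, so a function inside a VPC can still time out even with the right policy attached.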
And, if you have further problems, you can get help from the AWS community in the forums, here:
https://forums.aws.amazon.com/forum.jspa?forumID=88
**Update - WRT VPC Question**
This question should really go to an expert on the AWS Lambda team. You can reach them in the AWS Forums (link above). However, I'll take a shot at answering (AWS Lambda experts feel free to chime in if I'm wrong here). When you set the VPC on the Lambda function, I expect that any network traffic coming from your Lambda function will have the same routing and domain name resolution behavior as anything else in your VPC. So, if your VPC has firewall rules which prevent traffic from the VPC to, for example, DynamoDB, then you won't be able to reach it. If that is the case, then you would need to update those rules in your VPC's security group(s) to open up out-going traffic. Here's a blurb from a relevant document.
From https://aws.amazon.com/vpc/details/:
*AWS resources such as Elastic Load Balancing, Amazon ElastiCache, Amazon RDS, and Amazon Redshift are provisioned with IP addresses within your VPC. Other AWS resources such as Amazon S3 and Amazon DynamoDB are accessible via your VPC’s Internet Gateway, NAT gateways, VPC Endpoints, or Virtual Private Gateway.*
This doc seems to explain how to configure the gateway approach:
http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-endpoints.html
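To test the security-group part of that guess, one possible sketch is to inspect the group's outbound rules as returned by `ec2.describe_security_groups()`. The function name is hypothetical; `IpPermissionsEgress`, `IpProtocol`, `FromPort` and `ToPort` are the keys that response uses. For simplicity it only looks at protocol and port, not at which destinations each rule allows.

```python
from typing import Any, Dict


def allows_https_egress(security_group: Dict[str, Any]) -> bool:
    """Return True if any outbound rule permits TCP port 443.

    `security_group` is one entry from the "SecurityGroups" list returned by
    boto3's ec2.describe_security_groups(). Destinations are not checked.
    """
    for rule in security_group.get("IpPermissionsEgress", []):
        protocol = rule.get("IpProtocol")
        if protocol == "-1":
            # "-1" means all protocols and all ports.
            return True
        if protocol == "tcp":
            from_port = rule.get("FromPort", 0)
            to_port = rule.get("ToPort", 65535)
            if from_port <= 443 <= to_port:
                return True
    return False
```

If this returns True for the Lambda's security group and calls to S3 or DynamoDB still time out, the cause is more likely routing (no NAT gateway or VPC endpoint) than the security group.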