I have a Python application deployed on EKS (Elastic Kubernetes Service). This application saves large files inside an S3 bucket using the AWS SDK for Python (boto3). Both the EKS cluster and the S3 bucket are in the same region.
My question is, how is communication between the two services (EKS and S3) handled by default?
Do both services communicate directly and internally through the Amazon network, or do they communicate externally via the Internet?
If they communicate via the internet, is there a step by step guide on how to establish a direct internal connection between both services?
how is communication between the two services (EKS and S3) handled by default?
By default the network topology of your EKS offers route to the public AWS S3 endpoints.
Do both services communicate directly and internally through the Amazon network, or do they communicate externally via the Internet?
Your cluster needs to have network access to the said public AWS S3 endpoints. Example, worker nodes running in public subnet or the use of NAT gateway in private subnet.
...is there a step by step guide on how to establish a direct internal connection between both services?
You create VPC endpoints for S3 in the VPC that your EKS runs to ensure network communication with S3 stay within AWS network. VPC endpoints for S3 support both interface and gateway type. Try this article to learn about the basic of S3 endpoints, you can use the same method to create endpoints in the VPC where your EKS runs. Request to S3 from your pods will then use the endpoint to reach out to S3 within AWS network.
You can add S3 access to your EKS node IAM role, this link shows you how to add ECR registry access to EKS node IAM role, but it is the same for S3.
The other way is to make environment variables available in your container, see this link, though I would recommend the first way.
Related
I am trying to understand the concept of how VPC endpoints work and I am not sure that I understand the AWS documentation. For example, I have a private S3 bucket and I have an EKS cluster. So if my bucket is private I believe that traffic from the EKS cluster to S3 does not go through the internet, but only through the AWS network. But in a case my s3 bucket was public, then probably I will need to set up the VPC endpoint, so traffic will not leave the AWS. The same logic I would expect with ECR, if it is private you load images to your EKS through AWS network.
So what is the exact case when you need to use VPC endpoint within your AWS account (not from on-prem or another VPC)?
VPC endpoints are typically used with public AWS services (such as S3, DynamoDB, ECR, etc.) when the client applications are hosted inside your VPC and you do not want to route traffic via public Internet, which would otherwise result in a number of hops to reach the AWS service.
Imagine a situation when you have an app running on an EC2 instance, which is deployed to a private subnet of your VPC (i.e. a Pod in your EKS cluster). This app reads/writes data from/to AWS S3. If you do not use a VPC endpoint, your traffic will first reach your NAT gateway, then your VPC's Internet gateway out to the public Internet. Eventually, it will hit AWS S3. The response will travel back via the same route.
Same thing with ECR (i.e. a new instance of your Kubernetes Pod started by the kubelet). It's better (i.e. quicker) to pick the shortest route to download a Docker image from ECR rather than traverse a number of switches/routers. With a VPC endpoint your traffic will first hit the VPC endpoint (without leaving your private subnet) and then reach e.g. ECR directly (traffic does not leave the Amazon network).
As correctly mentioned by #jarmod, one should differentiate between routing (Layer 3 in the OSI model) and authentication/authorization (Layer 7). For example, you can use a VPC endpoint to reach AWS S3, but not be authorized (or even unauthenticated) to e.g. read a file from an S3 bucket.
Hope this clarifies the idea behind using VPC endpoints.
I want to execute AWS CLI commands of RDS not via the internet, but via a VPC network for mainly creating manual snapshots of RDS.
However, VPC endpoints support only RDS Data API according to the following document:
VPC endpoints - Amazon Virtual Private Cloud
Why? I need to execute a command within closed network for security rules.
Just to reiterate you can still connect to your RDS database through the normal private network using whichever library you choose to perform any DDL, DML, DCL and TCL commands. Although in your case you want to create a snapshot which is via the service endpoint.
VPC endpoints are to connect to the service APIs that power AWS (think the interactions you perform in the console, SDK or CLI), at the moment this means for RDS to create, modify or delete resources you need to use the API over the public internet (using HTTPS for encrypted traffic).
VPC endpoints are added over time, just because a specific API is not there now does not mean it will never be there. There is an integration that has to be carried out by the team of that AWS service to allow VPC endpoints to work.
Google failed me again or may be I wasnt too clear in my question.
Is there an easy way or rather how do we determine what services are VPC bound and what services are non-vpc ?
For example - EC2, RDS require a VPC setup
Lambda, S3 are publicly available services and doesn't need a VPC setup.
The basic services that require an Amazon VPC are all related to Amazon EC2 instances, such as:
Amazon RDS
Amazon EMR
Amazon Redshift
Amazon Elasticsearch
AWS Elastic Beanstalk
etc
These resources run "on top" of Amazon EC2 and therefore connect to a VPC.
There are also other services that use a VPC, but you would only use them if you are using some of the above services, such as:
Elastic Load Balancer
NAT Gateway
So, if you wish to run "completely non-vpc", then avoid services that are "deployed". It means you would use AWS Lambda for compute, probably DynamoDB for database, Amazon S3 for object storage, etc. This is otherwise referred to as going "serverless".
When to use aws direct connect and aws storage gateway. My question is these services seems to be similer, so what are use cases to use these two services.
AWS Direct Connect is a network connection between AWS and on on-premises network. The physical connection is an optical fiber link organised through a Telco, while Direct Connect provisions the physical port where the fiber connects in an AWS transit center.
AWS Storage Gateway is a storage service that provisions a virtual tape drive, virtual S3 drive or virtual disk that is stored in AWS. It typically runs across a Direct Connect connection.
AWS direct connect connect the in premisses resources with any services, while AWS storage gateway used to connects to S3 services including AWS S3 Glacier only.
This is one of the difference.
"AWS Direct Connect is a network service that provides an alternative to using the Internet to connect customer's on-premise sites to AWS" (AWS Docs).
"AWS Storage Gateway is a hybrid cloud storage service that gives you on-premises access to virtually unlimited cloud storage" (AWS Docs).
Direct Connect creates a private network connection btwn AWS and on-prem resources while Storage Gateway enables you to store and retrieve Amazon S3 objects through standard file storage protocol.
Storage Gateway - As the name suggests, this service is used to connect on-premises infra with STORAGE services (specifically S3, FSx, EBS)
Direct Connect - This service is used to connect on-premises infra with any AWS resources (in any region)
Is there a way to use AWS CLI to call different services such as SQS, EC2, SNS from EC2 linux instance?
The EC2 instance from where the AWS CLI command are invoked does not have access to internet. It is in private subnet. It is not using internet gateway or NAT.
Thanks,
Not possible. The CLI has to access the API endpoints for all the services you mentioned. For that the CLI needs internet access. Only service it can access without internet is the internal metadata server.
AWS Regions and Endpoints
VPC endpoints create a private connection between your VPC and an AWS service. However, currently the only supported service is S3 and none of the services listed in your question.
Currently, we support endpoints for connections with Amazon S3 only.
We'll add support for other AWS services later. Endpoints are
supported within the same region only.