Install CloudWatch agent on EC2 in a private subnet - amazon-web-services

I am trying to install the CloudWatch agent on an EC2 instance in a private subnet (no internet access). All the documentation online seems to fetch the RPM over the internet (either through S3 download links or AWS Systems Manager). What I am trying to figure out is how to get the RPM without internet access. I have a VPC gateway endpoint set up for S3 which is able to get objects from my own bucket; however, as far as I understand, it doesn't work with the download links.
Documentation I am trying to follow:
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/download-cloudwatch-agent-commandline.html
If it matters, I am using Terraform to deploy my infra.
Is there a solution for this?
UPDATE:
Shell script for EC2 instance launch
#!/bin/bash
cd /home/ec2-user
# Pull the package from my own bucket (works through the S3 gateway endpoint)
aws s3 cp s3://${bucket_name}/${zip_file} ${zip_file} --region ${region}
# Pull the agent RPM straight from the AWS-owned regional bucket
wget https://s3.us-east-1.amazonaws.com/amazoncloudwatch-agent-us-east-1/amazon_linux/amd64/latest/amazon-cloudwatch-agent.rpm
VPC route table:
Destination | Target | Status | Propagated
101.0.0.0/16 | local | active | No
pl-xxxxx (com.amazonaws.us-east-1.s3, 54.231.0.0/17, 52.216.0.0/15, 3.5.16.0/21, 3.5.0.0/20) | vpce-xxxxxxx | active | No

To access the internet from a private subnet, you generally need:
a NAT gateway or NAT instance in a public subnet, and
modified route tables for the private subnet(s) that point internet traffic (0.0.0.0/0) to the NAT device, as sketched below.
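A minimal AWS CLI sketch of that setup (all IDs here are placeholders, not values from the question):
# Create the NAT gateway in a public subnet, using a pre-allocated Elastic IP
aws ec2 create-nat-gateway --subnet-id subnet-PUBLIC --allocation-id eipalloc-EXAMPLE
# Point the private subnet's default route at the new NAT gateway
aws ec2 create-route --route-table-id rtb-PRIVATE --destination-cidr-block 0.0.0.0/0 --nat-gateway-id nat-EXAMPLE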
The alternative is to store the CloudWatch agent in S3 and download it from there via the S3 gateway endpoint. If this does not work, you have to verify your VPC endpoint settings and route tables.
You can also prepare a golden AMI in a public subnet with the agent and any other software that requires internet access to install. Then you deploy your instances in the private subnet from that AMI.
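If you go the golden-AMI route, the flow with the AWS CLI looks roughly like this (instance and image IDs are placeholders):
# Install the agent on a temporary instance in a public subnet, then snapshot it
aws ec2 create-image --instance-id i-EXAMPLE --name cloudwatch-agent-golden-ami
# Launch private-subnet instances from the resulting AMI
aws ec2 run-instances --image-id ami-EXAMPLE --subnet-id subnet-PRIVATE --instance-type t3.micro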

The S3 download links are already provided in the documentation for all regions. Since you've already set up the S3 gateway VPC endpoint, and that endpoint covers traffic to every bucket in the region (including the AWS-owned agent bucket), using the region-specific S3 download link for your region will work like a charm. You don't need NAT or anything else.
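For example, on Amazon Linux in us-east-1 (the region from the question), the documented region-specific link resolves to S3 IP ranges covered by the gateway endpoint's prefix list:
# Download through the S3 gateway endpoint, then install the RPM
wget https://s3.us-east-1.amazonaws.com/amazoncloudwatch-agent-us-east-1/amazon_linux/amd64/latest/amazon-cloudwatch-agent.rpm
sudo rpm -U ./amazon-cloudwatch-agent.rpm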

Related

AWS VPC endpoints - how do they work?

I am trying to understand how VPC endpoints work, and I am not sure I understand the AWS documentation. For example, I have a private S3 bucket and an EKS cluster. If my bucket is private, I believe traffic from the EKS cluster to S3 does not go over the internet, but only through the AWS network. But if my S3 bucket were public, then I would probably need to set up the VPC endpoint so the traffic does not leave AWS. I would expect the same logic with ECR: if it is private, you load images into EKS through the AWS network.
So what is the exact case in which you need a VPC endpoint within your own AWS account (not from on-prem or another VPC)?
VPC endpoints are typically used with public AWS services (such as S3, DynamoDB, ECR, etc.) when the client applications are hosted inside your VPC and you do not want to route traffic via the public Internet, which would otherwise take a number of hops to reach the AWS service.
Imagine a situation where you have an app running on an EC2 instance deployed to a private subnet of your VPC (e.g. a Pod in your EKS cluster). This app reads/writes data from/to AWS S3. If you do not use a VPC endpoint, your traffic first reaches your NAT gateway, then your VPC's Internet gateway, and goes out to the public Internet before it eventually hits AWS S3. The response travels back via the same route.
Same thing with ECR (e.g. a new instance of your Kubernetes Pod started by the kubelet): it's better (i.e. quicker) to pick the shortest route to download a Docker image from ECR than to traverse a number of switches/routers. With a VPC endpoint, your traffic first hits the VPC endpoint (without leaving your private subnet) and then reaches e.g. ECR directly; the traffic never leaves the Amazon network.
As correctly mentioned by @jarmod, one should differentiate between routing (Layer 3 in the OSI model) and authentication/authorization (Layer 7). For example, you can use a VPC endpoint to reach AWS S3, yet not be authorized (or even authenticated) to, e.g., read a file from an S3 bucket.
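As a quick illustration of that split (the bucket name is hypothetical):
# Layer 3: the request reaches S3 through the gateway endpoint either way...
aws s3 cp s3://some-private-bucket/data.csv .
# ...but at Layer 7 it still fails with "Access Denied" unless the caller has s3:GetObject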
Hope this clarifies the idea behind using VPC endpoints.

Security group settings for using SageMaker notebooks in a private subnet

I am new to SageMaker, and am hoping to use SageMaker in a VPC with a private subnet, so data accessed from S3 is not exposed to the public internet.
I have created a VPC with a private subnet (no internet or NAT gateway) and have attached an S3 gateway VPC endpoint. With this, can I apply the subnet's default security group settings to the SageMaker notebook instances, or are some additional configurations required?
Also, I'm hoping to keep internet access for the SageMaker notebook instance so I can still download Python packages (I just want to ensure that data read from S3 through the private subnet is all okay with its default security group).
Thank you
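For what it's worth, a minimal sketch of such a notebook instance with the AWS CLI, assuming placeholder names and IDs; DirectInternetAccess controls the public path, while S3 traffic from the subnet rides the gateway endpoint:
aws sagemaker create-notebook-instance \
  --notebook-instance-name my-private-notebook \
  --instance-type ml.t3.medium \
  --role-arn arn:aws:iam::123456789012:role/SageMakerExecutionRole \
  --subnet-id subnet-PRIVATE \
  --security-group-ids sg-DEFAULT \
  --direct-internet-access Enabled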

Install the AWS CloudWatch agent from an S3 VPC endpoint

To keep our resources on AWS secure, we are trying to block internet access for our EC2 instances unless we explicitly need it. We have one EC2 instance (Ubuntu) running that we want to install the AWS CloudWatch agent on. The default way to do this is to use wget to download the installation files from an S3 address (as seen in the linked article).
We now want to replace the public internet access our EC2 instance has with VPC endpoints. I created one interface endpoint for global S3 access and one for S3 access in our region. Ideally, the EC2 instance would now connect through our endpoint to the S3 bucket to download the resources from the AWS address.
How can I now access the files from my EC2 instance using wget? The article lists one URL for global S3 access and another URL for regional S3 access, but I cannot get a connection using either. Here are a few examples of URLs I tried:
wget https://accesspoint.s3-global.amazonaws.com/amazoncloudwatch-agent/ubuntu/amd64/latest/amazon-cloudwatch-agent.deb
wget https://s3.vpce-123456.s3.eu-central-1.vpce.amazonaws.com/amazoncloudwatch-agent-eu-central-1/ubuntu/amd64/latest/amazon-cloudwatch-agent.deb
wget https://amazoncloudwatch-agent-eu-central-1.vpce-123456.s3.eu-central-1.vpce.amazonaws.com/ubuntu/amd64/latest/amazon-cloudwatch-agent.deb
Note that accesspoint.s3-global.amazonaws.com is the private DNS entry created (automatically) by the global S3 service endpoint, and *.vpce-123456.s3.eu-central-1.vpce.amazonaws.com is an example of one of the DNS entries created by the regional S3 service endpoint.
Make sure that you have updated the route table of your subnet. Add the rule that routes the traffic to a gateway endpoint (since we are talking about S3).
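With a gateway endpoint in place (a route table entry pointing the S3 prefix list at the vpce), the plain regional URLs from the agent documentation should work without any endpoint-specific hostname, e.g. for eu-central-1:
# Standard regional S3 URL; the prefix-list route sends this through the gateway endpoint
wget https://s3.eu-central-1.amazonaws.com/amazoncloudwatch-agent-eu-central-1/ubuntu/amd64/latest/amazon-cloudwatch-agent.deb
sudo dpkg -i -E ./amazon-cloudwatch-agent.deb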

Route table for Docker Hub and VPC endpoints for privately hosted instances: AWS

I have a Docker image which is just a Java application. The Java application reads data from DynamoDB and S3 buckets and outputs something (it's a test app). I have hosted the Docker image on a public Docker Hub repo.
In AWS, I have created a private subnet which hosts an EC2 instance via AWS ECS. To keep security high, I am using VPC endpoints for the containers' DynamoDB and S3 operations.
And I have used a NAT gateway to allow EC2 to pull Docker images from Docker Hub.
Problem:
When I remove the VPC endpoints, the application is still able to read DynamoDB and S3 via NAT, which means the traffic is going through the public network.
Thoughts:
I cannot whitelist the IP addresses of Docker Hub as they can change.
Since AWS ECS handles all the docker pull tasks, I do not have control to customize them.
I do not want to use the AWS container registry; I prefer Docker Hub.
The DynamoDB/S3 private addresses are not known.
Question:
How do I make sure that traffic for Docker Hub is only allowed via the NAT gateway?
How do I make sure that DynamoDB and S3 are accessed via the endpoints only?
Thanks for your help
If you want to restrict outbound traffic over your NAT gateway (by DNS hostname) to Docker Hub only, you will need a third-party solution that can allow or deny outbound traffic before it traverses the internet.
You would install this appliance in a separate subnet which has NAT gateway access. Then, in your existing subnet(s) for ECS, you would update the route table to send the 0.0.0.0/0 route to this appliance (by specifying its ENI). If you check the AWS Marketplace, there may already be a solution in place to fulfil the domain filtering.
Alternatively, you could automate a tool that scrapes the whitelisted IP addresses for Docker Hub and then adds them as allow rules to a NACL. This NACL would only be applied to the subnets in which the NAT gateway resides.
Regarding your second question: from the VPC's point of view, adding the prefix lists of the S3 and DynamoDB endpoints to the route table will forward any requests that hit these API endpoints through the private route.
At this time, DynamoDB does not have the ability to prevent publicly routed interaction; however, S3 does. By adding a condition on the VPC endpoint to the bucket policy, you can deny any access that does not come through the listed VPC endpoint. Be careful not to block your own access from the console, however, by denying only the specific verbs that you don't want allowed.
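A sketch of such a bucket policy, applied with the AWS CLI (bucket name and endpoint ID are placeholders); it denies object reads and writes that do not arrive through the endpoint, while leaving other verbs alone:
cat > policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "DenyOutsideVpce",
    "Effect": "Deny",
    "Principal": "*",
    "Action": ["s3:GetObject", "s3:PutObject"],
    "Resource": "arn:aws:s3:::my-example-bucket/*",
    "Condition": {"StringNotEquals": {"aws:sourceVpce": "vpce-123456"}}
  }]
}
EOF
aws s3api put-bucket-policy --bucket my-example-bucket --policy file://policy.json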

How can AWS Glue access an IP-whitelisted resource

If I have a service that requires IP whitelisting, how can I connect AWS Glue to it? I read that I can apparently put AWS Glue in a private VPC and configure a NAT gateway; I could then allow my NAT IP to connect to the service. However, I cannot find any way to configure my Glue job to run inside a subnet/VPC. How do I do this?
The job will run in a VPC automatically if you attach a database connection to a resource which is inside the VPC. For example, I have a job that reads data from S3 and writes into an Aurora database in a private VPC using a Glue connection (configured as JDBC).
That job automatically has access to all the resources inside the VPC, as explained here. If the VPC has NAT enabled for external access, then your job can also take advantage of that.
Note that if you use a connection that requires a VPC and you also use S3, you will need to enable an S3 endpoint in that VPC as well.
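As a sketch of the kind of connection described above (all values are placeholders), creating a JDBC Glue connection with the CLI looks roughly like this:
aws glue create-connection --connection-input '{
  "Name": "aurora-private",
  "ConnectionType": "JDBC",
  "ConnectionProperties": {
    "JDBC_CONNECTION_URL": "jdbc:mysql://aurora-cluster.cluster-EXAMPLE.eu-west-1.rds.amazonaws.com:3306/mydb",
    "USERNAME": "glue_user",
    "PASSWORD": "example-password"
  },
  "PhysicalConnectionRequirements": {
    "SubnetId": "subnet-PRIVATE",
    "SecurityGroupIdList": ["sg-EXAMPLE"],
    "AvailabilityZone": "eu-west-1a"
  }
}'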
The answer to your question is here: https://stackoverflow.com/a/64414639. Note that Glue is a 'managed' service, so it does not publish a list of IP addresses that can be whitelisted. As a workaround, you can use an EC2 instance to run your custom Python or PySpark script and whitelist the IP address of that particular EC2 instance.