AWS EMR on VPC with EC2 Instance - amazon-web-services

I have been reading about AWS EMR on a VPC, but the material seems to focus on design considerations for the AWS EMR service itself to reach the cluster for API calls.
What I am trying to do is host, in a VPC, an ALB and an EC2 instance running an application as a service that accesses the EMR cluster.
VPC -> Internet Gateway -> Load Balancer -> EC2 (Application endpoints) -> EMR Cluster
I don't want the cluster to be accessible from outside except through the public IP of the Internet Gateway, and that public IP should only reach the EC2 instance hosting the application, which in turn calls the EMR cluster on the same VPC.
Is this a recommended approach?
The design looks something like below.
Some challenges I am tackling: how can EMR access S3 if it is inside a VPC, can an application running on EC2 access the EMR cluster, and would the EMR cluster be publicly available?
Any guidance links or recommendations would be welcome.
EDIT:
Or, if I create EMR on a VPC, do I need to wrap it inside another VPC, something like below?

The simplest design is:
Put everything in a public subnet in a VPC
Use Security Groups to control access to the EMR cluster
If you are security-paranoid, then you could:
Put publicly-accessible resources (e.g. EC2) in a public subnet
Put EMR in a private subnet
Use a NAT Gateway or VPC Endpoints to allow EMR to communicate with S3 (which sits outside the VPC)
The first option is simpler, and Security Groups act as firewalls that can fully protect the EMR cluster. You would create three security groups:
ELB-SG: Permit inbound access from the Internet on your desired ports. Associate the security group with your Load Balancer.
EC2-SG: Permit inbound access from ELB-SG (from the Security Group itself). Associate the security group with your EC2 instances.
EMR-SG: Permit inbound access from EC2-SG (from the Security Group itself). Associate EMR-SG with the EMR cluster.
This permits only the Load Balancer to communicate with the EC2 instances, and only the EC2 instances to communicate with the EMR cluster. The EMR cluster can still connect directly to the Internet to access Amazon S3, because the default security group rules permit all outbound access.
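The chained security-group rules above can be sketched as plain request parameters for boto3's `authorize_security_group_ingress`. This is only a sketch: the group IDs (`sg-elb`, `sg-ec2`, `sg-emr`) and the ports are placeholders, not values from the question.

```python
# Sketch: chained security groups so each tier only accepts traffic
# from the tier in front of it. Group IDs and ports are placeholders.

def ingress_from_sg(group_id, source_sg, port):
    """Build authorize_security_group_ingress params allowing `source_sg` in."""
    return {
        "GroupId": group_id,
        "IpPermissions": [{
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "UserIdGroupPairs": [{"GroupId": source_sg}],
        }],
    }

def ingress_from_internet(group_id, port):
    """Build params allowing the Internet (0.0.0.0/0) in on one port."""
    return {
        "GroupId": group_id,
        "IpPermissions": [{
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
        }],
    }

# ELB-SG: open to the Internet on 443.
elb_rule = ingress_from_internet("sg-elb", 443)
# EC2-SG: only reachable from ELB-SG.
ec2_rule = ingress_from_sg("sg-ec2", "sg-elb", 8080)
# EMR-SG: only reachable from EC2-SG.
emr_rule = ingress_from_sg("sg-emr", "sg-ec2", 8998)

# Each dict would be passed to ec2_client.authorize_security_group_ingress(**rule).
```

Referencing the upstream security group (`UserIdGroupPairs`) rather than an IP range is what makes the chain hold even when instance IPs change.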

Related

Connection from Lambda to RDS in a different account

I have an RDS in one AWS Account - say Acct-1.
The RDS is public (I know it's not a good idea and there are other solutions for that).
I have a lambda in another AWS Account - say Acct-2 which runs in a VPC.
I have setup VPC peering between the 2 accounts, the route table entries are in place as well as the security groups IN/OUT bound policies in place.
In Acct-2 I can verify that I can connect to the RDS instance in Acct-1 using a mysql client from an EC2 instance. The EC2 instance is in the same subnet as the Lambda and they both have the same security group.
But the Lambda gets a timeout connection. The Lambda has the typical Lambda execution role that Allows logs, and network interfaces.
Thoughts on what could be missing? Does the RDS need to grant specific access to the Lambda service even if it's running in a VPC?
Clarification: There is no route to the RDS instance from the internet. Clearly, the ec2 host is able to resolve the Private IP for the RDS instance from the DNS name and connect.
Lambda is unable to resolve the private IP for the RDS instance.
I'm trying to keep the traffic within AWS so as to not pay egress costs.
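Since the question distinguishes a DNS failure from a routing failure, a minimal diagnostic handler can separate the two. The endpoint name below is a placeholder, and the suggestion to check the peering connection's "allow DNS resolution from remote VPC" option is an assumption worth verifying, not a confirmed diagnosis.

```python
import socket

def check_endpoint(host, port, timeout=3):
    """Distinguish a DNS failure from a TCP connection failure for host:port."""
    try:
        ip = socket.gethostbyname(host)
    except socket.gaierror:
        return "dns_failed"
    try:
        # DNS worked; now see whether the route/security groups allow TCP.
        with socket.create_connection((ip, port), timeout=timeout):
            return f"connected to {ip}"
    except OSError:
        return f"resolved to {ip} but connect failed"

def lambda_handler(event, context):
    # Placeholder endpoint -- replace with the peered RDS DNS name.
    return check_endpoint("mydb.xxxx.us-east-1.rds.amazonaws.com", 3306)
```

If this reports `dns_failed` from inside the Lambda's VPC, one setting worth checking is DNS resolution support on the VPC peering connection, which controls whether the remote instance's DNS name resolves to its private IP across the peering link.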

How can I access aws resources in VPC from AWS glue?

I have a glue job which is hitting an API hosted over an EC2 instance.
The problem is that the EC2 instance resides within a VPC that blocks all public access.
I tried creating an endpoint interface in my VPC but still can't access the REST API.
The host is always unreachable, but when I try to access the API from within the VPC it works fine.
The security group associated with the EC2 instance is used while creating the VPC Endpoint.
Any help is appreciated.
If you go to the AWS Glue console, under Connections, create a connection. A "dummy" connection is simply one pointing at a non-existent database or resource, for example: jdbc:mysql://some-fake-endpoint-here:3306/mydb. After this you choose the correct VPC, subnet and security group. A test of this connection will not work in this context, but that doesn't matter: its purpose is to introduce your VPC, subnet and security group information to the job. You can verify actual connectivity separately using a python-shell job, or by launching an EC2 instance in the same VPC and subnet and running something like nc -vz endpoint port.
This connection metadata facilitates the launching of elastic network interfaces in your account, which allow the Glue DPUs to communicate with your resource at runtime. More on how connections work in Glue is discussed here.
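The dummy connection described above can be sketched as parameters for boto3's `create_connection`. Every name, ID and credential below is a placeholder; only the VPC, subnet and security group fields carry real information.

```python
# Sketch: a "dummy" Glue connection whose only real purpose is to carry
# VPC, subnet and security group information. All IDs are placeholders.

def dummy_glue_connection(name, subnet_id, sg_ids, az):
    """Build create_connection params for a network-carrying dummy connection."""
    return {
        "ConnectionInput": {
            "Name": name,
            "ConnectionType": "JDBC",
            "ConnectionProperties": {
                # Non-existent endpoint -- the test connection is expected to fail.
                "JDBC_CONNECTION_URL": "jdbc:mysql://some-fake-endpoint-here:3306/mydb",
                "USERNAME": "dummy",
                "PASSWORD": "dummy",
            },
            # This section is what actually matters: Glue uses it to place
            # elastic network interfaces inside your VPC at job runtime.
            "PhysicalConnectionRequirements": {
                "SubnetId": subnet_id,
                "SecurityGroupIdList": sg_ids,
                "AvailabilityZone": az,
            },
        }
    }

params = dummy_glue_connection("vpc-carrier", "subnet-0abc", ["sg-0abc"], "us-east-1a")
# glue_client.create_connection(**params)
```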

Cross Account DMS Replication for RDS instances behind Bastion machines

I have production stacks inside a Production account and development stacks inside a Development account. The stacks are identical and are setup as follows:
Each stack has its own VPC.
Within the VPC are two public subnets spanning two AZs and two private subnets spanning two AZs.
The private Subnets contain the RDS instance.
The public Subnets contain a Bastion EC2 instance which can access the RDS instance.
To access the RDS instance, I either have to SSH into the Bastion machine and access it from there, or I create an SSH tunnel via the Bastion to access it through a Database client application such as PGAdmin.
Current DMS setup:
I would like to be able to use DMS (Database Migration Service) to replicate an RDS instance from Production into Development. So far I have tried the following, but cannot get it to work:
Create a VPC peering connection between Development VPC and Production VPC
Create a replication instance in the private subnet of the Development VPC
Update the private subnet route tables in the development VPC to route traffic to the CIDR of the production VPC through the VPC peering connection
Ensure the Security group for the replication instance can access both RDS instances.
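The peering and routing steps above can be sketched as boto3-style request parameters. All IDs, CIDRs and route table names below are placeholders, not values from the question.

```python
# Sketch: route private-subnet traffic for the production CIDR through
# the VPC peering connection. All IDs and CIDRs are placeholders.

PEERING_ID = "pcx-0abc"       # peering between Dev and Prod VPCs
PROD_CIDR = "10.1.0.0/16"     # production VPC CIDR block

def peering_route(route_table_id):
    """Build create_route params sending the prod CIDR over the peering link."""
    return {
        "RouteTableId": route_table_id,
        "DestinationCidrBlock": PROD_CIDR,
        "VpcPeeringConnectionId": PEERING_ID,
    }

# One route per private route table in the development VPC.
routes = [peering_route(rt) for rt in ("rtb-priv-a", "rtb-priv-b")]
# for r in routes: ec2_client.create_route(**r)
```

Note that the production side needs the mirror-image route (development CIDR, same peering ID) for return traffic.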
Main Problem:
When creating the source endpoint in DMS, the wizard only shows RDS instances from the same account and region, and only allows RDS instances to be configured using server names and ports; however, the RDS instances in my stacks can only be accessed via the bastion machines using tunnelling. Therefore the test endpoint connection always fails.
Any ideas of how to achieve this cross account replication?
Any good step by step blogs that detail how to do this? I have found a few but they don't seem to have RDS instances sitting behind bastion machines and so they all assume the endpoint configuration wizard can be populated using server names and ports.
Many thanks.
Securing the RDS instances via the bastion host is sound security practice, of course, for developer/operational access.
For the DMS migration, however, you should expect to open the security groups of both the target and source RDS database instances so that the replication instance has access to both.
From Network Security for AWS Database Migration Service:
The replication instance must have access to the source and target endpoints. The security group for the replication instance must have network ACLs or rules that allow egress from the instance out on the database port to the database endpoints.
Database endpoints must include network ACLs and security group rules that allow incoming access from the replication instance. You can achieve this using the replication instance's security group, the private IP address, the public IP address, or the NAT gateway’s public address, depending on your configuration.
See
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Security.Network.html
For network addressing and to open the RDS private subnet, you'll need a NAT on both source and target. They can be added easily, and then terminated after the migration.
You can now use Network Address Translation (NAT) Gateway, a highly available AWS managed service that makes it easy to connect to the Internet from instances within a private subnet in an AWS Virtual Private Cloud (VPC).
See
https://aws.amazon.com/about-aws/whats-new/2015/12/introducing-amazon-vpc-nat-gateway-a-managed-nat-service/
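The temporary NAT Gateway the answer suggests can likewise be sketched as boto3-style parameters, including the teardown step once the migration finishes. All IDs here are placeholders.

```python
# Sketch: a temporary NAT Gateway plus a default route for a private
# subnet, to be torn down after the migration. All IDs are placeholders.

def nat_gateway_params(public_subnet_id, allocation_id):
    """Build create_nat_gateway params (requires an Elastic IP allocation)."""
    return {"SubnetId": public_subnet_id, "AllocationId": allocation_id}

def default_route_via_nat(route_table_id, nat_gateway_id):
    """Build create_route params sending all outbound traffic via the NAT."""
    return {
        "RouteTableId": route_table_id,
        "DestinationCidrBlock": "0.0.0.0/0",
        "NatGatewayId": nat_gateway_id,
    }

nat = nat_gateway_params("subnet-public-a", "eipalloc-0abc")
route = default_route_via_nat("rtb-priv-a", "nat-0abc")
# ec2_client.create_nat_gateway(**nat); ec2_client.create_route(**route)
# After the migration completes:
# ec2_client.delete_nat_gateway(NatGatewayId="nat-0abc")
```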

Unsecured traffic between two ec2 instances within the same vpc

Is it safe to transfer unsecured HTTP messages between two EC2 instances within the same VPC in AWS?
Or is it necessary to use SSH tunneling etc.?
It's safe in the sense that only your instances exist in the VPC, so the traffic between your two instances in your VPC cannot be sniffed by a third party.
Amazon Virtual Private Cloud (Amazon VPC) lets you provision a logically isolated section of the Amazon Web Services (AWS) Cloud where you can launch AWS resources in a virtual network that you define. You have complete control over your virtual networking environment, including selection of your own IP address range, creation of subnets, and configuration of route tables and network gateways.
Source: Amazon VPC

Connecting Kubernetes minions to classic (non-VPC) AWS resources

I'm looking to spin up a Kubernetes cluster on AWS that will access resources (e.g. RDS, ElastiCache) that are not on a VPC.
I was able to set up access to RDS by enabling ClassicLink on the kubernetes-vpc VPC, but this required commenting out the creation of one of Kubernetes' route tables (which conflicted with ClassicLink's route tables), which breaks some of Kubernetes networking. ElastiCache is more difficult, as it looks like its access is only grantable via classic EC2 security groups, which can't be associated with a VPC EC2 instance, AFAICT.
Is there a way to do this? I'd prefer not to use a NAT instance to provide access to ElastiCache.