Limit port 8080 access to SNS - amazon-web-services

I am using SNS to connect to a Tomcat server on port 8080. The server runs on AWS/EC2. It is not a public server; I use it only to execute my code (triggered by the notification delivered to it).
How can I set up the "inbound rules" on my EC2 box so that only SNS can reach it? When I block the port, messages are not delivered. If I restrict it to the EC2 internal or external IP, the same happens. Apparently notifications are delivered from "somewhere" that is not documented, and/or not from a fixed IP [range]?
[I know how to secure the Tomcat server itself, but it would be nice if random port-scans can't even get to the server. I do see a number of (so far unsuccessful) access attempts in the Tomcat log.]


AWS keep site to site VPN connection alive

We have a site to site VPN connection from our AWS cloud to the customer's on site network. Our web application login requires the authentication from the customer's active directory and hence the need for VPN connection.
When our application has not been used for a while, the VPN tunnel goes down, so a user who then tries to log in cannot authenticate. It takes some time for the tunnel to come back up, after which everything works properly.
I had a call with the customer's IT people and it seems they have set up a keep alive bit (DPD settings) on their end but still the tunnel keeps going down. AWS support isn't much of a help either.
I googled around and discovered that one way to keep the tunnel alive is by "sending a ping to the target from the device sourced from the outside interface. A possible destination for the ping is an instance within the VPC".
AWS documentation also suggests "to create a host that sends ICMP requests to an instance in your VPC every 5 seconds."
I already have an EC2 instance in a private subnet of my VPC (with only a private IP).
My question is, do I need to create another ec2 instance in my VPC private subnet and ping the first one from the other every 5 seconds?
Would I need to write a shell script for this?
I am basically confused about from where to ping, whom to ping and how to ping.
Ping any remote AWS instance from your on-premises site, thereby generating traffic over the VPN. Just schedule it in Windows Task Scheduler and use the basic command-line ping.
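As a rough illustration of the scheduled ping suggested above, the following Python sketch sends one ICMP echo request every few seconds from an on-premises host. The function names, the placeholder address 10.0.0.5, and the 5-second default are assumptions made for illustration; the script only shells out to the system ping command, so a Task Scheduler or cron entry running plain ping would serve the same purpose.

```python
import platform
import subprocess
import time


def build_ping_command(host, system=None):
    """Return a command that sends a single ICMP echo request to host."""
    system = system or platform.system()
    # Windows ping uses -n for the packet count; Linux and macOS use -c.
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, "1", host]


def keep_alive(host, interval_seconds=5.0, iterations=None,
               run=subprocess.run, sleep=time.sleep):
    """Ping host every interval_seconds; iterations=None loops forever.

    Returns the number of pings that exited with status 0.
    """
    command = build_ping_command(host)
    successes = 0
    sent = 0
    while iterations is None or sent < iterations:
        result = run(command, stdout=subprocess.DEVNULL,
                     stderr=subprocess.DEVNULL)
        if result.returncode == 0:
            successes += 1
        sent += 1
        # Do not sleep after the final ping of a bounded run.
        if iterations is None or sent < iterations:
            sleep(interval_seconds)
    return successes
```

Calling keep_alive("10.0.0.5") from the on-premises side, where 10.0.0.5 stands in for the private IP of an instance in the VPC, would keep traffic flowing over the tunnel until the process is stopped.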

AWS Security Group connection tracking failing for responses with a body in ASP.NET Core app running in ECS + Fargate

In my application:
ASP.NET Core 3.1 with Kestrel
Running in AWS ECS + Fargate
Services run in a public subnet in the VPC
Tasks listen only on port 80
Public Network Load Balancer with SSL termination
I want to set the Security Group to allow inbound connections from anywhere (0.0.0.0/0) to port 80, and disallow any outbound connection from inside the task (except, of course, to respond to the allowed requests).
As Security Groups are stateful, the connection tracking should allow the egress of the response to the requests.
In my case, this connection tracking only works for responses without a body (just headers). When the response has a body (in my case, a file larger than 1 MB), the requests fail. If I allow outbound TCP connections from port 80, they also fail. But if I allow outbound TCP connections for the full range of ports (0-65535), it works fine.
I guess this is because when ASP.NET Core + Kestrel writes the response body it initiates a new connection which is not recognized by the Security Group connection tracking.
Is there any way I can allow only responses to requests, and no other type of outbound connection initiated by the application?
So we're talking about something like that?
Client 11.11.11.11 ----> AWS NLB/ELB public 22.22.22.22 ----> AWS ECS network router or whatever (kubernetes) --------> ECS server instance running a server application 10.3.3.3:8080 (kubernetes pod)
Do you configure the security group on the AWS NLB or on the AWS ECS? (I guess both?)
Security groups should allow incoming traffic if you allow 0.0.0.0/0 port 80.
They are indeed stateful. They will allow the connection to proceed both ways after it is established (meaning the application can send a response).
However firewall state is not kept for more than 60 seconds typically (not sure what technology AWS is using), so the connection can be "lost" if the server takes more than 1 minute to reply. Does the HTTP server take a while to generate the response? If it's a websocket or TCP server instead, does it spend whole minutes at times without sending or receiving any traffic?
The way I see it, we've got two stateful firewalls: the first with the NLB, the second with ECS.
ECS is comparable to Kubernetes, so it must be doing a lot of iptables magic to distribute traffic and track connections. (For reference, regular Kubernetes relies heavily on iptables, and iptables has a number of very important settings such as connection durations and timeouts.)
The good news is that if it breaks when you open only inbound 0.0.0.0:80 but works when you open inbound 0.0.0.0:80 plus outbound 0.0.0.0:*, this is definitely an issue caused by the firewall dropping the connection, most likely because it loses state (or it is not stateful in the first place, but I'm pretty sure security groups are stateful).
The drop could happen on either of the two firewalls. I've never had an issue with a single bare NLB/ELB, so my guess is the problem is in the ECS or the interaction of the two together.
Unfortunately, we can't debug that, and we have very little information about how this works internally. Your only option will be to work with AWS support to investigate.
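To make the "losing state" hypothesis above concrete, here is a toy Python model of a stateful firewall that forgets a flow after an idle timeout. It is only a sketch: the class and method names are invented for illustration, the 60-second default is the guess made in the answer rather than a documented AWS value, and nothing here reflects how security groups are actually implemented.

```python
class ConnectionTracker:
    """Toy model of stateful filtering with an idle timeout (not AWS's implementation)."""

    def __init__(self, idle_timeout=60.0):
        # 60 seconds mirrors the guess in the answer, not a documented value.
        self.idle_timeout = idle_timeout
        self._last_seen = {}

    def inbound(self, flow, now):
        """Record an allowed inbound packet, creating or refreshing state."""
        self._last_seen[flow] = now

    def outbound_allowed(self, flow, now):
        """Allow a reply only while the tracked flow has not gone idle."""
        last = self._last_seen.get(flow)
        if last is None or now - last > self.idle_timeout:
            # Unknown or expired flow: drop the state and reject the packet.
            self._last_seen.pop(flow, None)
            return False
        self._last_seen[flow] = now
        return True


# A flow is identified by (client IP, client port, server IP, server port).
flow = ("11.11.11.11", 50000, "10.3.3.3", 80)
tracker = ConnectionTracker()
tracker.inbound(flow, now=0.0)
quick_reply = tracker.outbound_allowed(flow, now=30.0)   # within the timeout
late_reply = tracker.outbound_allowed(flow, now=120.0)   # 90 s idle, state lost
```

In this model a reply is rejected only when the flow has been silent for longer than the timeout, which is why the answer asks whether the server ever takes more than a minute to respond.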

KAFKA consumer setup behind a proxy on prem. Producer is in AWS

We have a Kafka setup in AWS and publish messages from a producer there. We want to consume those messages with a Kafka consumer on on-prem boxes that reach the internet through a proxy. Is there any setting in the Kafka consumer where we can supply the proxy details?
Note: We are able to connect and get a message from a local box (Kafka consumer) which has direct access to the internet (without proxy).
You need to set advertised.listeners to the external IP so that clients can connect to it correctly. Otherwise they'll try to connect to the internal IP (since advertised.listeners defaults to listeners unless explicitly set).
Ref: https://kafka.apache.org/documentation/#brokerconfigs

Diagnosing Kafka Connection Problems

I have tried to build as much diagnostics into my Kafka connection setup as possible, but it still leads to mystery problems. In particular, the first thing I do is use the Kafka Admin Client to get the clusterId, because if this operation fails, nothing else is likely to succeed.
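import scala.util.{Failure, Success, Try}
import org.apache.kafka.clients.admin.DescribeClusterResult

// futureTimeout is defined elsewhere in the asker's code (it exposes .length and .unit).
def getKafkaClusterId(describeClusterResult: DescribeClusterResult): Try[String] = {
  try {
    // Wait up to half the configured timeout for the cluster ID future to complete.
    val clusterId = describeClusterResult.clusterId().get(futureTimeout.length / 2, futureTimeout.unit)
    Success(clusterId)
  } catch {
    case cause: Exception =>
      Failure(cause)
  }
}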
In testing this usually works, and everything is fine. It generally only fails when the endpoint is not reachable somehow. It fails because the future times out, so I have no other diagnostics to go by. To test these problems, I usually telnet to the endpoint, for example
$ telnet blah 9094
Trying blah...
Connected to blah.
Escape character is '^]'.
Connection closed by foreign host.
Generally if I can telnet to a Kafka broker, I can connect to Kafka from my server. So my questions are:
What does it mean if I can reach the Kafka brokers via telnet, but I cannot connect via the Kafka Admin Client?
What other diagnostic techniques are there to troubleshoot Kafka broker connection problems?
In this particular case, I am running Kafka on AWS, via a Docker Swarm, and trying to figure out why my server cannot connect successfully. I can see in the broker logs when I try to telnet in, so I know the brokers are reachable. But when my server tries to connect to any of 3 brokers, the logs are completely silent.
This is a good article that explains the steps that happen when you first connect to a Kafka broker:
https://community.hortonworks.com/articles/72429/how-kafka-producer-work-internally.html
If you can telnet to the bootstrap server then it is listening for client connections and requests.
However, clients don't know which brokers are the leaders for each partition of a topic, so the first request they send to a bootstrap server is always a metadata request for the full topic metadata. The client uses the metadata response to learn where to open new connections: to each of the Kafka brokers hosting the active leader of a partition of the topic you are trying to produce to.
That is where your misconfigured broker comes into play. When advertised.listeners is misconfigured, the response to the first metadata request redirects the client to unreachable IP addresses or hostnames. It's that second connection that is timing out, not the first one on the port you are telnetting into.
Another way to think of it is that you have to configure a Kafka server to work properly as both a bootstrap server and a regular pub/sub message broker, since it provides both services to clients. Yours are configured correctly as a pub/sub server but incorrectly as a bootstrap server because the internal and external IP addresses are different in AWS (also in Docker containers or behind a NAT or a proxy).
It might seem counterintuitive in small clusters, where your bootstrap servers are often the same brokers that the client eventually connects to, but it is actually a very helpful architectural design that allows Kafka to scale and to fail over seamlessly without needing a static list of 20 or more brokers in your bootstrap server list, or extra load balancers and health checks to know which broker to redirect client requests to.
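One way to test the explanation above is to take the host and port pairs that the brokers advertise (for example, the nodes reported by the Admin Client's describeCluster call) and try a plain TCP connection to each one from the client machine. The Python sketch below does only that; the function name and the idea of passing the endpoints in by hand are assumptions for illustration, and it checks TCP reachability only, not whether Kafka answers correctly.

```python
import socket


def check_endpoints(endpoints, timeout_seconds=3.0):
    """Try a TCP connection to each (host, port) pair.

    Returns a dict mapping each pair to True (connected) or False.
    """
    results = {}
    for host, port in endpoints:
        try:
            # create_connection resolves the host name and connects, or raises OSError.
            with socket.create_connection((host, port), timeout=timeout_seconds):
                results[(host, port)] = True
        except OSError:
            # Covers refused connections, timeouts and DNS resolution failures.
            results[(host, port)] = False
    return results
```

If the bootstrap address is reachable but an advertised address comes back False, the client is being redirected to somewhere it cannot reach, which matches the silent timeout described in the question.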
If you do not configure listeners and advertised.listeners correctly, Kafka is effectively unreachable for clients. Even though telnet can connect to the ports you've configured, the Kafka client library fails silently.
I consider this a defect in the Kafka design which leads to unnecessary confusion.
Sharing Anand Immannavar's answer from another question:
Along with ADVERTISED_HOST_NAME, you need to add ADVERTISED_LISTENERS to the container environment.
ADVERTISED_LISTENERS - Broker will register this value in zookeeper and when the external world wants to connect to your Kafka Cluster they can connect over the network which you provide in ADVERTISED_LISTENERS property.
example:
environment:
- ADVERTISED_HOST_NAME=<Host IP>
- ADVERTISED_LISTENERS=PLAINTEXT://<Host IP>:9092
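Before starting the container, it can help to sanity-check the value you are about to put in ADVERTISED_LISTENERS. The Python sketch below parses a comma-separated list in the NAME://host:port form shown above and flags entries whose host is a literal private IP address. The function names are invented for illustration, and a private address is not necessarily wrong (clients inside the same network can use it); it is merely worth a second look when clients connect from outside.

```python
import ipaddress


def parse_listeners(value):
    """Split 'NAME://host:port,NAME://host:port' into (name, host, port) tuples."""
    result = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        name, sep, address = entry.partition("://")
        host, colon, port = address.rpartition(":")
        if not sep or not colon or not port.isdigit():
            raise ValueError(f"malformed listener: {entry!r}")
        result.append((name, host, int(port)))
    return result


def private_listeners(value):
    """Return the entries whose host is a literal private IP address."""
    flagged = []
    for name, host, port in parse_listeners(value):
        try:
            if ipaddress.ip_address(host).is_private:
                flagged.append((name, host, port))
        except ValueError:
            # Host names (and empty hosts) are not IP literals; skip them.
            continue
    return flagged
```

For example, private_listeners("PLAINTEXT://10.0.0.5:9092") returns the single entry for 10.0.0.5, which external clients would not be able to reach.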

Vertx Clustered EventBus not sending messages

Diagram of Setup
I've set up TCP discovery using Hazelcast, where parts of the cluster exist inside and outside the AWS cloud.
Inside AWS I can send and receive messages no problem but not externally.
Looking at the members all 3 servers are in the list but no messages are sent to server 3 on my local machine.
For testing the AWS machines have their firewalls disabled, so the only thing I can think of is a firewall issue on my local network.
I tried making a new instance of Vertx on all servers setting the EventBus port to 80 but that stopped all messages.
Servers 1 and 2 are not reporting any failed-to-send issues, but I'm not sure what the problem is.
Does anybody have any ideas as to why server 3 cannot send or receive messages despite being in the cluster?