Does an AWS Internet Gateway ID always start with "igw-", or may it be different? - amazon-web-services

Does an AWS Internet Gateway ID always start with "igw-", or may it be different?
Similarly, I want to know about the NAT Gateway ID and VPC Endpoint ID.

Resource IDs have a specific naming format (Internet Gateways use the igw- prefix, NAT Gateways use nat-, and VPC Endpoints use vpce-), and that format generally stays the same throughout the lifecycle of the product itself.
There are occasions where the format might change; one example is when EC2 instance IDs had their suffix lengthened (from 8 to 17 characters) to increase the number of available IDs. These kinds of changes are communicated months in advance, with the option to opt in early.
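If you want to confirm the format in your own account, listing a few resources and looking at the IDs is enough. A minimal sketch with boto3, assuming default credentials and region are configured:

    # Sketch: list resource IDs to observe their prefixes (igw-, nat-, vpce-).
    # Assumes boto3 is installed and default AWS credentials/region are configured.
    import boto3

    ec2 = boto3.client("ec2")

    for igw in ec2.describe_internet_gateways()["InternetGateways"]:
        print("Internet Gateway:", igw["InternetGatewayId"])   # e.g. igw-0abc...

    for nat in ec2.describe_nat_gateways()["NatGateways"]:
        print("NAT Gateway:", nat["NatGatewayId"])              # e.g. nat-0abc...

    for vpce in ec2.describe_vpc_endpoints()["VpcEndpoints"]:
        print("VPC Endpoint:", vpce["VpcEndpointId"])           # e.g. vpce-0abc...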

Related

Setting up API Gateway with Route 53 and DynamoDB global tables

I have an application that uses DynamoDB as its persistence layer and API Gateway as its interface to the internet. To make it globally accessible with the least amount of latency for the consumers of the API, I thought about enabling DynamoDB global tables for various regions, deploying my API to the same regions, and having Route 53 route traffic with a geolocation routing policy to the nearest API endpoint.
My questions are:
Is that the right way to do it? Am I missing something? Are there better ways?
What are the cost implications? As far as I understand, all services (Route 53, DynamoDB, API Gateway) are billed based on consumption, so deploying to all regions should not add costs.
Thank you
You are perhaps missing a Lambda to interact with DynamoDB. I'm not sure about your use case -- and it is not unheard of to expose DynamoDB directly -- but the most obvious pattern would be API Gateway -> Lambda -> DynamoDB. But, as I say, your particular use case will drive that -- I would be keen to learn more, if you want to share.
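For reference, a minimal sketch of the Lambda side of that pattern, assuming a proxy integration and a hypothetical TABLE_NAME environment variable and "pk" key; with a global table replicated to the deployment region, reads stay local:

    # Sketch: Lambda handler behind API Gateway (proxy integration) reading DynamoDB.
    # TABLE_NAME and the "pk" key schema are assumptions for illustration.
    import json
    import os
    import boto3

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table(os.environ["TABLE_NAME"])

    def handler(event, context):
        # With a global table replicated to this region, this read is local.
        item_id = event["pathParameters"]["id"]
        response = table.get_item(Key={"pk": item_id})
        item = response.get("Item")
        if item is None:
            return {"statusCode": 404, "body": json.dumps({"message": "not found"})}
        return {"statusCode": 200, "body": json.dumps(item, default=str)}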
There are no particular pricing call-outs at this level of detail, as long as you are sure you want to run DynamoDB Global Tables. You may consider provisioned capacity for DynamoDB if you have stable consumption, but note that provisioned Global Tables are charged by the hour.
There are probably like a hundred more questions I would ask about your solution architecture, but this is perhaps not the right forum. Hope this much helps.

How to access DocumentDB from a Lambda@Edge function?

I am trying to set up a Lambda@Edge function triggered by a CloudFront event.
This function needs to access the database and replace the URL's metadata before the response is distributed to users.
Issues I am facing:
My DocumentDB cluster is placed in a VPC private subnet and can't be accessed from outside the VPC.
My Lambda@Edge function can't connect to my VPC since they are in different regions.
The method I had in mind is to create an API on my web server (public subnet) for my Lambda function to call, but this doesn't seem like a very efficient method.
I'd appreciate any advice or an alternative approach for the implementation.
Thanks in Advance
Lambda@Edge has a few limitations you can read about here.
Among them is:
You can’t configure your Lambda function to access resources inside your VPC.
That means the VPC being in another region is not your problem; you simply can't place a Lambda@Edge function in any VPC.
The only solution I can think of is making your DocumentDB available publicly on the internet, which doesn't seem like a great idea. You might be able to create a security group that only allows access from the CloudFront IP ranges, although I couldn't find out if Lambda@Edge actually uses the same ranges :/
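If you do experiment with that security-group idea, the published CloudFront ranges can be pulled from ip-ranges.amazonaws.com. A rough sketch follows; the security group ID and port are placeholders, the number of ranges may exceed the rules-per-group limit, and whether Lambda@Edge egress actually falls within these ranges is the same open question as above:

    # Sketch: allow the published CloudFront IP ranges into a security group.
    # GROUP_ID and the DocumentDB port are placeholders; error handling omitted.
    import json
    import urllib.request
    import boto3

    GROUP_ID = "sg-0123456789abcdef0"  # hypothetical security group ID
    PORT = 27017                       # default DocumentDB port

    with urllib.request.urlopen("https://ip-ranges.amazonaws.com/ip-ranges.json") as resp:
        ranges = json.load(resp)

    cloudfront_cidrs = sorted(
        {p["ip_prefix"] for p in ranges["prefixes"] if p["service"] == "CLOUDFRONT"}
    )

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(
        GroupId=GROUP_ID,
        IpPermissions=[{
            "IpProtocol": "tcp",
            "FromPort": PORT,
            "ToPort": PORT,
            "IpRanges": [{"CidrIp": cidr} for cidr in cloudfront_cidrs],
        }],
    )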
Generally I'd avoid putting too much business logic in Lambda@Edge functions - keep in mind they run on every request (or at the very least every request to the origin) and increase the latency for those requests. Network requests in particular are expensive in terms of time, even more so if you communicate across continents to your primary region with the database.
If the information you need to update the URL metadata is fairly static, I'd try to serialize it and distribute it in the Lambda deployment package - reading from local storage is considerably cheaper and faster.
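A minimal sketch of that approach, assuming a hypothetical metadata.json bundled next to the handler and a viewer-request trigger:

    # Sketch: Lambda@Edge viewer-request handler that rewrites the URI from a
    # metadata.json file bundled in the deployment package (filename is an assumption).
    import json
    import os

    # Loaded once per container at cold start, then reused across invocations.
    with open(os.path.join(os.path.dirname(__file__), "metadata.json")) as f:
        METADATA = json.load(f)

    def handler(event, context):
        request = event["Records"][0]["cf"]["request"]
        mapped = METADATA.get(request["uri"])
        if mapped:
            request["uri"] = mapped  # rewrite the URI from the bundled lookup table
        return request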

How can I add IP-based rate limits with longer intervals on API Gateway?

I have an API Gateway endpoint that I would like to limit access to. For anonymous users, I would like to set both daily and monthly limits (based on IP address).
AWS WAF has the ability to set rate limits, but the interval for them is a fixed 5 minutes, which is not useful in this situation.
API Gateway has the ability to add usage plans with longer-term rate quotas that would suit my needs, but unfortunately they seem to be based on API keys, and I don't see a way to do it by IP.
Is there a way to accomplish what I'm trying to do using AWS Services?
Is it maybe possible to use a usage plan and automatically generate an API key for each user who wants to access the API? Or is there some other solution?
Without more context on your specific use-case, or the architecture of your system, it is difficult to give a “best practice” answer.
Like most things tech, there are a few ways you could accomplish this. One way would be to use a combination of CloudWatch API logging, Lambda, DynamoDB (with Streams) and WAF.
At a high level (and regardless of this specific need) I'd protect my API using WAF and the AWS security automations quickstart, found here, and associate it with my API Gateway as guided in the docs here. Once my WAF is set up and associated with my API Gateway, I'd enable CloudWatch API logging for API Gateway, as discussed here. Now that I have things set up, I'd create two Lambdas.
The first will parse the CloudWatch API logs and write the data I'm interested in (IP address and request time) to a DynamoDB table. To avoid unnecessary storage costs, I'd set the TTL on the record I'm writing to my DynamoDB table to be twice whatever my analysis's time window is... i.e. if I'm looking to limit it to 1000 requests per month, I'd set the TTL on my DynamoDB record to be 2 months. From there, my CloudWatch API log group will have a subscription filter that sends log data to this Lambda, as described here.
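A sketch of that first Lambda, assuming the subscription filter delivers the usual gzipped CloudWatch Logs payload and a hypothetical table with a TTL attribute named expires_at; the table name, key names, and log message format are assumptions:

    # Sketch: parse CloudWatch Logs subscription events and write (ip, timestamp)
    # records to DynamoDB with a TTL roughly twice the analysis window.
    import base64
    import gzip
    import json
    import time
    import boto3

    table = boto3.resource("dynamodb").Table("api-request-log")  # hypothetical table
    TTL_SECONDS = 2 * 30 * 24 * 3600  # ~2 months for a 1-month analysis window

    def handler(event, context):
        payload = gzip.decompress(base64.b64decode(event["awslogs"]["data"]))
        log_data = json.loads(payload)
        now = int(time.time())
        for log_event in log_data["logEvents"]:
            # Assumes the access-log format puts the caller IP first; adjust to yours.
            ip = log_event["message"].split()[0]
            table.put_item(Item={
                "ip": ip,
                "request_time": log_event["timestamp"],  # milliseconds since epoch
                "expires_at": now + TTL_SECONDS,          # TTL attribute
            })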
My second Lambda is going to be doing the actual analysis and handling what happens when my metric is exceeded. This Lambda is going to be triggered by the write event to my DynamoDB table, as described here. I can have this Lambda run whatever analysis I want, but I’m going to assume that I want to limit access to 1000 requests per month for a given IP. When the new DynamoDB item triggers my Lambda, the Lambda is going to query the DynamoDB table for all records that were created in the preceding month from that moment, and that contain the IP address. If the number of records returned is less than or equal to 1000, it is going to do nothing. If it exceeds 1000 then the Lambda is going to update the WAF WebACL, and specifically UpdateIPSet to reject traffic for that IP, and that’s it. Pretty simple.
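Sticking with the same assumptions (table keyed by ip with request_time as the sort key), a sketch of that second Lambda using the classic regional WAF UpdateIPSet call mentioned above; the IPSet ID, key names, and limit are placeholders, and pagination of the count query is ignored for brevity:

    # Sketch: triggered by the DynamoDB stream, count the past month's requests
    # for the new record's IP and block it via WAF (classic, regional) if over the limit.
    import time
    import boto3
    from boto3.dynamodb.conditions import Key

    table = boto3.resource("dynamodb").Table("api-request-log")  # hypothetical table
    waf = boto3.client("waf-regional")

    IP_SET_ID = "example-ip-set-id"  # placeholder for the WebACL's IPSet
    LIMIT = 1000
    MONTH_MS = 30 * 24 * 3600 * 1000

    def handler(event, context):
        for record in event["Records"]:
            if record["eventName"] != "INSERT":
                continue
            ip = record["dynamodb"]["NewImage"]["ip"]["S"]
            since = int(time.time() * 1000) - MONTH_MS
            result = table.query(
                KeyConditionExpression=Key("ip").eq(ip) & Key("request_time").gte(since),
                Select="COUNT",
            )
            if result["Count"] > LIMIT:
                token = waf.get_change_token()["ChangeToken"]
                waf.update_ip_set(
                    IPSetId=IP_SET_ID,
                    ChangeToken=token,
                    Updates=[{
                        "Action": "INSERT",
                        "IPSetDescriptor": {"Type": "IPV4", "Value": f"{ip}/32"},
                    }],
                )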
With the above process I have near real-time monitoring of requests to my API Gateway, in a very efficient, cost-effective, scalable manner that can be deployed entirely serverless.
This is just one way to handle this; there are definitely other ways you could accomplish it, say with Kinesis and Elasticsearch, or by analyzing CloudTrail events instead of logs, or by using a third-party solution that integrates with AWS, or something else.

Routing traffic to specific AWS regions using wildcard subdomain

I'm building a Laravel application that offers an authoring tool to customers. Each customer will get their own subdomain, e.g.:
customer-a.my-tool.com
customer-b.my-tool.com
My tool is hosted on Amazon in multiple regions for performance, but mostly for privacy-law reasons (GDPR++). Each customer has their data in only one region: Australian customers in Australia, European customers in Europe, etc. So each customer's users must be directed to the correct region; if a European user ends up being served by the US region, their data won't be there.
We can solve this manually using DNS by simply pointing each subdomain to the correct IP, but we don't want to do this for two reasons: (1) updating the DNS might take up to 60 seconds, and we don't want the customer to wait; (2) the sites we've researched seem to use wildcard domains - for instance Slack and atlassian.net - and we know that atlassian.net also has multiple regions.
So the question is:
How can we use a wildcard domain and still route the traffic to the regions where the content is located?
Note:
We don't want the content in all regions, but we can have, for instance, a DynamoDB table available in all regions that maps subdomains to regions.
We don't want to tie an organization to a region, i.e. a domain structure like customer-a.region.my-tool.com is an option we've considered but discarded.
We, of course, don't want to pay for transferring the data twice, and having apps in all regions access the databases in the regions the data belongs to is not an option, since it would be slow.
How can we use a wildcard domain and still route the traffic to the regions where the content is located?
It is, in essence, not possible to do everything you are trying to do, given all of the constraints you are imposing: automatically, instantaneously, consistently, and with zero overhead, zero cost, and zero complexity.
But that isn't to say it's entirely impossible.
You have asserted that other vendors are using a "wildcard domain," but that concept does not necessarily entail what I suspect you believe it does. A wildcard in DNS, like *.example.com, is not something you can prove to the exclusion of other possibilities, because wildcard records are overridden by more specific records.
For a tangible example that you can observe, yourself... *.s3.amazonaws.com has a DNS wildcard. If you query some-random-non-existent-bucket.s3.amazonaws.com, you will find that it's a valid DNS record, and it routes to S3 in us-east-1. If you then create a bucket by that name in another region, and query the DNS a few minutes later, you'll find that it has begun returning a record that points to the S3 endpoint in the region where you created the bucket. Yes, it was and is a wildcard record, but now there's a more specific record that overrides the wildcard. The override will persist for at least as long as the bucket exists.
Architecturally, other vendors that segregate their data by regions (rather than replicating it, which is another possibility, but not applicable to your scenario) must necessarily be doing something along one of these lines:
creating specific DNS records and accepting the delay until the DNS is ready, or
implementing what I'll call a "hybrid" environment that behaves one way initially and a different way eventually; this environment uses specific DNS records to override a wildcard and has the ability to temporarily deliver, via a reverse proxy, a misrouted request to the correct cluster, allowing instantaneous correct behavior until the DNS propagates, or
operating an ongoing "two-tier" environment, using a wildcard without more specific records to override it, with an outer tier that is distributed globally, accepts any request, and uses internal routing records to deliver the request to an inner tier -- the correct regional cluster.
The first option really doesn't seem unreasonable. Waiting a short time for your own subdomain to be created seems reasonably common. But, there are other options.
The second option, the hybrid environment, would simply require that the location your wildcard points to by default be able to do some kind of database lookup to determine where the request should go, and proxy the request there. Yes, you would pay for inter-region transport if you implement this yourself in EC2, but only until the DNS update takes effect. Inter-region bandwidth between any two AWS regions costs substantially less than data transfer to the Internet -- far less than "double" the cost.
This might be accomplished in any number of ways that are relatively straightforward.
You must, almost by definition, have a master database of the site configuration, somewhere, and this system could be queried by a complicated service that provides the proxying -- HAProxy and Nginx both support proxying and both support Lua integrations that could be used to do a lookup of routing information, which could be cached and used as long as needed to handle the temporarily "misrouted" requests. (HAProxy also has static-but-updatable map tables and dynamic "stick" tables that can be manipulated at runtime by specially-crafted requests; Nginx may offer similar things.)
But EC2 isn't the only way to handle this.
Lambda@Edge allows a CloudFront distribution to select a back-end based on logic -- such as a query to a DynamoDB table or a call to another Lambda function that can query a relational database. Your "wildcard" CloudFront distribution could implement such a lookup, caching results in memory (container reuse allows very simple in-memory caching using simply an object in a global variable). Once the DNS record propagates, the requests would go directly from the browser to the appropriate back-end. CloudFront is marketed as a CDN, but it is in fact a globally-distributed reverse proxy with an optional response caching capability. This capability may not be obvious at first.
In fact, CloudFront and Lambda@Edge could be used for such a scenario as yours in either the "hybrid" environment or the "two-tier" environment. The outer tier is CloudFront -- which automatically routes requests to the edge on the AWS network that is nearest the viewer, at which point a routing decision can be made at the edge to determine the correct cluster of your inner tier to handle the request. You don't pay for anything twice here, since bandwidth from EC2 to CloudFront costs nothing. This will not impact site performance other than the time necessary for that initial database lookup, and once your active containers have that cached, the responsiveness of the site will not be impaired. CloudFront, in general, improves responsiveness of sites even when most of the content is dynamic, because it optimizes both the network path and protocol exchanges between the viewer and your back-end, with optimized TCP stacks and connection reuse (particularly helpful at reducing the multiple round trips required by TLS handshakes).
In fact, CloudFront seems to offer an opportunity to have it both ways -- an initially hybrid capability that automatically morphs into a two-tier infrastructure -- because CloudFront distributions also have a wildcard functionality with overrides: a distribution with *.example.com handles all requests unless a distribution with a more specific domain name is provisioned -- at which point the other distribution will start handling the traffic. CloudFront takes a few minutes before the new distribution overrides the wildcard, but when the switchover happens, it's clean. A few minutes after the new distribution is configured, you make a parallel DNS change to the newly assigned hostname for the new distribution, but CloudFront is designed in such a way that you do not have to tightly coordinate this change -- all endpoints will handle all domains because CloudFront doesn't use the endpoint to make the routing decision, it uses SNI and the HTTP Host header.
This seems almost like a no-brainer. A default, wildcard CloudFront distribution is pointed to by a default, wildcard DNS record, and uses Lambda@Edge to identify which of your clusters handles a given subdomain using a database lookup, followed by the deployment -- automated, of course -- of a distribution for each of your customers, which already knows how to forward the request to the correct cluster, so no further database queries are needed after the subdomain is fully live. You'll need to ask AWS Support to increase your account's limit for the number of CloudFront distributions from the default of 200, but that should not be a problem.
There are multiple ways to accomplish that database lookup. As mentioned before, the Lambda@Edge function can invoke a second Lambda function inside the VPC to query the database for routing instructions, or you could push the domain location config to a DynamoDB global table, which would replicate your domain routing instructions to multiple DynamoDB regions (currently Virginia, Ohio, Oregon, Ireland, and Frankfurt), and DynamoDB can be queried directly from a Lambda@Edge function.
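To make that concrete, here is a rough origin-request sketch for the wildcard distribution, assuming a hypothetical subdomain-routing global table keyed by subdomain with an origin_domain attribute; the module-level cache provides the in-memory reuse described above:

    # Sketch: Lambda@Edge origin-request handler that routes each subdomain to its
    # regional origin via a DynamoDB lookup, cached in the container between invocations.
    # Table name, key names, and origin domain format are assumptions.
    import boto3

    # Query a region that hosts a replica of the global table.
    table = boto3.resource("dynamodb", region_name="us-east-1").Table("subdomain-routing")
    _cache = {}  # survives for the lifetime of the container

    def handler(event, context):
        request = event["Records"][0]["cf"]["request"]
        host = request["headers"]["host"][0]["value"]       # e.g. customer-a.my-tool.com

        origin_domain = _cache.get(host)
        if origin_domain is None:
            item = table.get_item(Key={"subdomain": host}).get("Item", {})
            origin_domain = item.get("origin_domain", "default-origin.my-tool.com")
            _cache[host] = origin_domain

        # Point the request at the correct regional cluster.
        request["origin"] = {
            "custom": {
                "domainName": origin_domain,
                "port": 443,
                "protocol": "https",
                "path": "",
                "sslProtocols": ["TLSv1.2"],
                "readTimeout": 30,
                "keepaliveTimeout": 5,
                "customHeaders": {},
            }
        }
        request["headers"]["host"] = [{"key": "Host", "value": origin_domain}]
        return request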

Using AWS Auto Scaling as a dispatcher

I have been given the following business logic:
A customer makes a request for services through a third-party gateway GUI to an EC2 instance.
The request is processed for some time (around 15 hours).
The data is then retrieved.
Currently this is implemented by statically giving each user an EC2 instance to handle their requests. (This instance actually creates some sub-instances to process the data in parallel.)
What should happen is that for each request, an EC2 instance be fired off automatically.
In the long term, I was thinking that this should be done using SWF (given the use of sub-processes); however, I wondered whether, as a quick and dirty solution, using Auto Scaling with the correct settings is worth pursuing.
Any thoughts?
you can "trick" autoscalling to spin up instances based on metrics:
http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/policy_creating.html
So on each request, increment a metric. Decrement the metric when the process completes. Drive the Auto Scaling group on the metric (see the sketch below).
Use Step Adjustments to control the number of instances: http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/as-scale-based-on-demand.html#as-scaling-types
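For the metric side, a minimal sketch using a custom CloudWatch metric; the namespace and metric name are made up, and the Auto Scaling policy with step adjustments would then track this metric:

    # Sketch: publish the current number of in-flight requests as a custom metric
    # that an Auto Scaling policy (with step adjustments) can track.
    import boto3

    cloudwatch = boto3.client("cloudwatch")

    def publish_backlog(pending_requests: int) -> None:
        cloudwatch.put_metric_data(
            Namespace="MyApp/Dispatcher",           # hypothetical namespace
            MetricData=[{
                "MetricName": "PendingRequests",    # hypothetical metric name
                "Value": float(pending_requests),
                "Unit": "Count",
            }],
        )

    # Call publish_backlog(count + 1) when a request arrives and
    # publish_backlog(count - 1) when its 15-hour processing job completes;
    # the scaling policy then adds or removes instances based on this metric.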
Interesting challenges: binding customers to specific EC2 instances. Do you have a hard requirement of giving each customer their own instance? It sounds like Auto Scaling is actually better suited to the parallel processing of the actual data, not to request routing. You may get away with having a fixed number of machines for this and/or scaling based on traffic, not the number of customers.