Does Amazon allow raising the SES mail *receiving* limits? - amazon-web-services

I run an email forwarding service (https://kopi.cloud).
I'm investigating the feasibility of building a feature to allow users to "bring their own domain".
It seems like this should work fine with SES, except there are limits on total number of rules and total number of recipients (see https://docs.aws.amazon.com/ses/latest/DeveloperGuide/limits.html)
With the currently limits on rules and recipients, I could pack the subscriber domains into the receipt rules and Kopi could support up to about 10,000 separate domains.
10K domains will be plenty for a while, I don't expect that many people will actually want to bring their own domain (I reckon most people who'd want this would just go ahead and do their own forwarding), so I'm going to go ahead prototyping the feature.
But I need to check if these limits are "soft limits", like the sending limits that can be raised on request; or "hard limits", where no increase is possible.
I'm still going to prototype the feature and if it were to be wildly successful, I guess I could jury-rig something together with multiple accounts or some other shennanigans.
So my question: "Is it possible to get the SES receiving rule limits raised?"

Answer from AWS support, as of June 2019 - "these limits are a hard limit and cannot be increased at the moment".
There is an outstanding request to raise them though.
https://forums.aws.amazon.com/thread.jspa?threadID=303902

Related

Optimization and loadbalancing of microservice based backend

I have a client which has a pretty popular ticket selling service, to the point that the microservice based backend is struggling to keep up, I need to come up with a solution to optimize and loadbalance the system. The infrastructure works through a series of interconnected microservices.
When a user enter the sales channels (mobile or web app), the request is directed to an AWS API Gateway which is in charge of orchestrating the communication towards the microservice in charge of obtaining the requested resources.
These resources are provided from a third party API
This third party has physical servers in each venue in charge of synchronizing the information between the POS systems and the digital sales channels.
We have a REDIS instance in charge of caching these requests that we make to the third party API, we cache each endpoint with a TTL relative to the frequency of updating the information.
Here is some background info:
We get traffic mostly from 2 major countries
On a normal day, about 100 thousands users will use the service, with an 70%/30% traffic relation in between the two countries
On important days, each country has different opening hours (Country A starts sales at 10 am UTC, but country B starts at 5 pm UTC), on these days the traffic increases some n times
We have a main MiddleWare through which all requests made by clients are processed.
We have a REDIS cache database that stores GETs with different TTLs for each endpoint.
We have a MiddleWare that decides to make the request to the cache or to the third party's API, as the case may be.
And these are the complaints I have gotten that need to be deal with:
When a country receives a high amount of requests, the country with the least traffic gets negatively affected, the clients do not respond, or respond partially because the computation layer's limit was exceeded and so the users have a bad experience
Every time the above happens, the computation layer must be manually increased from the infrastructure.
Each request has different response times, stadiums respond in +/- 40 seconds and movie theaters in 3 seconds. These requests enter a queue and are answered in order of arrival.
The error handling is not clear. The errors are mixed up and you can't tell from which country the errors are coming from and how many errors there are
The responses from the third party API are not cached correctly in the cache layer since errors are stored for the time of the TTL
I was thinking of a couple of thinks that I could suggest:
Adding in instrumentation of the requests by using AWS X-Ray
Adding in a separate table for errors in the redis cache layer (old data has to be better than no data for the end user)
Adding in AWS elastic load balancing for the main middleware
But I'm not sure how realistic would be to implement these 3 things, I'm also not sure if they would even solve the problem, I personally don't really have experience with optimizing this type of backed. I would appreciate any suggestions, recommendations, links, documentation, etc. I'm really desperate for a solution to this problem
few thoughts:
When a country receives a high amount of requests, the country with the least traffic gets negatively affected, the clients do not respond, or respond partially because the computation layer's limit was exceeded and so the users have a bad experience
A common approach in aws is to regionalize stack - assuming you are using cdk/cloud formation creating regionalized stack should be a straightforward task.
But it is a question if this will solve the problem. Your system suffers from availability issues, regionalization will isolate this problem down to regions. So we should be able to do better (see below)
Every time the above happens, the computation layer must be manually increased from the infrastructure.
AWS has an option to automatically scale up and down based on traffic patterns. This is a neat feature, given you set limits to make sure you are not overcharged.
Each request has different response times, stadiums respond in +/- 40 seconds and movie theaters in 3 seconds. These requests enter a queue and are answered in order of arrival.
It seems that the large variance is because you have to contact the servers at venues. I recommend to decouple that activity. Basically calls to venues should be done async; there are several ways you could do that - queues and customer push/pull are the approaches (please, comment if more details are needed. but this is quite standard problem - lots of data in the internet)
The error handling is not clear. The errors are mixed up and you can't tell from which country the errors are coming from and how many errors there are
That's a code fix, when you do send data to cloudwatch (do you?). You could put country as a context to all request, via filter or something. And when error is logged that context is logged as well. You probably need venue id even more than country, as you can conclude country from venue id.
The responses from the third party API are not cached correctly in the cache layer since errors are stored for the time of the TTL
Don't store errors + add a circuit breaker pattern.

Amazon SES: To send mail to large number of users

I want to send mail to large number of users, I did some research about it and found that we can send mail to maximum 50 recipients at one API call.
But I have more than 500 users, and need to send mail to all of them.
I have tried with AWS lambda + SES and mail sending is working, but all recipients are showing in to mail:
["#","#","#",...]
How could I hide other recipients?
Without knowing more information, all I can do is provide an answer that represents general architectural guidance.
I would architect your Lambdas in such a way that minimizes the amount of "work" that each Lambda does. In this case, you should have a Lambda that handles your business logic necessary to construct a message, and another Lambda that handles sending email, either in batches or as one-off messages, in addition to handling rate limiting, batch size, etc.
From your "business logic" Lambda, call the "email sending" Lambda as many times as you need. The platform will handle provisioning the necessary execution environments.
Serverless really allows you to have functions that follow the Unix Philosophy - do one thing and do it well.

Unusual request activity log found in django server

Following is the screenshot of the server activity log.I can see that many requests are automatically raised in the server.How can I avoid this.?
It looks like someone is fuzzing your website and scanning to find any common file names or extensions that commonly have security vulnerabilities. One way to limit this behaviour is to implement rate limiting whereby you might limit the number of requests a user makes that result in HTTP 404 Not Found during some time period before giving them a temporary ban. Note: this solution doesn't stop this from happening but it does buy you time and may deter the attacker or researcher

Delayed SES Stats Updation

I am noticing AWS SES stats are not being updated in real-time. After sending email, it takes time for sent count to increase on SES Dashboard. Sometimes it takes few minutes and sometimes it takes long.
Has anyone also experienced this? Any thoughts?
On the assumption that the console is simply making a call to a standard API action (rather than using some kind a console-only backend service that is not documented or user-accessible -- such things are not unheard-of, but are pretty rare in AWS, so it's a reasonably safe assumption), it looks like this is not really designed to be real-time. The stats are reported in 15 minute windows.
From the SES API reference:
GetSendStatistics
Returns the user's sending statistics. The result is a list of data points, representing the last two weeks of sending activity.
Each data point in the list contains statistics for a 15-minute interval.
— http://docs.aws.amazon.com/ses/latest/APIReference/API_GetSendStatistics.html
AWS/SES dashboard stats are for pure hint performace but not to rely on them. In such case, if you want to have real time notifications of sent emails you will need to create SNS notifications. Keep in mind that Spam-Complaint notifications can take up to a couple of days as this is based on information provided by the ISP to Amazon. And complaints within the Gmail evil-system will NEVER get to you.

Amazon EC2 EBS i/o costs

I am hosting my application on amazon ec2 , on one of their micro linux instances.
It costs (apart from other costs) $0.11 per 1 million I/O requests . I was wondering how much I/O requests does it take when I have say 1000 users using it for say 1 hours per day for 1 month ?
I guess my main concern is : if a hacker keeps hitting my login page (simple html) , will it increase the I/O request count ? I guess yes, as every time the server needs to do something to server that page.
There are a lot of factors that will impact your IO requests, as #datasage says, try it and see how it behaves under your scenario. Micro Linux instances are incredible cheap to begin with, but if you are really concerned, setup a billing alert that will notify you when your usage passes a pre-determined threshold - if it suddenly spikes up, you can take some action to shut it down if that is what you want.
https://portal.aws.amazon.com/gp/aws/developer/account?ie=UTF8&action=billing-alerts
Take a look at CloudWatch, and (for free) set up a VolumeWriteOps and VolumeReadOps alarm to work with Amazon Simple Notification Service (SNS) to send you a text message and eMail notice right away if things get too busy, before the bill gets high! (A billing alert will let you know too late - after it has reached the threshold.)
In general though, from my experience, you will not have the problem you outline. Scan the EC2 Discussion Forum at forums.aws.amazon.com where you would find evidence of this kind of problem if were prevalent; it does not seem to be happening.
#Dilpa yes you are right. If some brute force attack will occur to your website eg: somebody hitting to your loginn page then it will increase the server I/O if you enable loging for your webserver. Webserver will keep log to it's log files of every event and that will increase your I/O. Just verify your webserver log for such kind of attack and you can prevent them.