How to get AWS SES quota usage per templated email? - amazon-web-services

As I understand from the documentation, SES emails are billed per quantity and per message size. How can I determine the size of each templated email, which may or may not have an attachment?
I want to have that information to bill our clients. Or should I be using another service for that purpose?

As you say, that is how AWS bills it (just keep in mind that there is a difference depending on whether you send from EC2 instances or from somewhere else).
About determining the size of a templated email: there are two approaches I would suggest here, one that is easier and gives semi-correct results, and another that is harder but gives exact results.
Easier & semi-correct
Every template adds some overhead on top of the plain email (the same message sent without the template), so the rendered message is essentially never going to be smaller than the template itself (unless you do something unusual such as leaving most of the fields unused, which is not the typical case). So you can store all the information about the template in your database (sadly there is no other choice) and have the size of each template calculated once. From there you can estimate the size of every outgoing email from the attachments, the extra overhead of the email headers, and the content you substitute in. The answer is semi-correct: it will always be at least the size of the mail actually sent, plus a bit of overhead, but that shouldn't matter a lot.
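For illustration, a minimal sketch of that estimate in Python; the function name, the 2 KB header allowance and the base64 factor are my own rough assumptions, not anything SES provides:

HEADER_OVERHEAD_BYTES = 2 * 1024      # rough allowance for MIME/SMTP headers (assumption)
BASE64_OVERHEAD = 4 / 3               # attachments grow ~33% when base64-encoded

def estimate_email_size(template_size_bytes, substituted_text, attachment_sizes):
    """Rough size estimate, in bytes, for one outgoing templated email."""
    size = template_size_bytes + HEADER_OVERHEAD_BYTES
    size += len(substituted_text.encode("utf-8"))             # per-recipient template data
    size += sum(int(s * BASE64_OVERHEAD) for s in attachment_sizes)
    return size

# e.g. a 14 KB template, a short personalised body, one ~200 KB attachment
print(estimate_email_size(14_336, "Hello Jane, your invoice is attached.", [204_800]))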
Harder and correct
You move away from the normal way of sending emails and build custom MIME messages instead, recreating a similar template service inside your own application. You can then perform the template conversion yourself and measure the exact size of the email message before sending it. The one thing you cannot account for this way is the content added when you have "open links" tracking set up, which you should also consider in your application. Overall this gives you much more accurate results.
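A rough sketch of what that could look like with Python's email module and boto3; the region is a placeholder, and render_template() is a stand-in for whatever templating you implement yourself (e.g. Jinja2):

import boto3
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

ses = boto3.client("ses", region_name="eu-west-1")   # region is a placeholder

def render_template(template_html, **variables):
    # stand-in for your own templating step (e.g. Jinja2)
    return template_html.format(**variables)

def build_measure_and_send(sender, recipient, subject, template_html, variables, attachments=()):
    msg = MIMEMultipart()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.attach(MIMEText(render_template(template_html, **variables), "html"))

    for filename, data in attachments:                # (filename, bytes) pairs
        part = MIMEApplication(data)
        part.add_header("Content-Disposition", "attachment", filename=filename)
        msg.attach(part)

    raw = msg.as_bytes()
    size_bytes = len(raw)                             # the exact size you are billed for
    ses.send_raw_email(RawMessage={"Data": raw})      # SES takes From/To from the headers
    return size_bytes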
What I would do:
For each of your clients, just count how many emails they are sending. Take the average size of the attachments per mail (the content itself is usually really small) and split the bill based on those averages. You avoid the overhead of implementing new functionality in the application, and overall everyone should be satisfied with the result: all of your bills get paid, and the customers pay a reasonable amount for their emails.
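If it helps, a toy sketch of splitting the monthly bill in proportion to the estimated bytes each client sent (the numbers and client names are made up):

def split_bill(total_bill, bytes_per_client):
    total_bytes = sum(bytes_per_client.values())
    return {client: total_bill * used / total_bytes
            for client, used in bytes_per_client.items()}

print(split_bill(120.0, {"acme": 40_000_000, "globex": 80_000_000}))
# {'acme': 40.0, 'globex': 80.0}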

Related

My chatbot (Lex V2) exceeded intent & utterance quotas and I need more intents. [AWS][LEXV2]

I am creating a large chatbot, and the problem is that I have already exceeded the limits that Amazon has. I need approximately 1800 intents (it is a large project), and the "hard limits" cannot be increased (I have already spoken with an Amazon agent). I wanted to know if anyone has experienced this problem and how they solved it (without changing to Dialogflow/Watson tools).
I was thinking of creating a "Chatbot Orchestrator" and splitting the chatbot into several parts (experiences) and invoking the corresponding bot and intent.
Any ideas?
A possible solution is to use Kendra to search for the intent: basically, I need to activate the fallback intent and use Kendra in a Lambda function to search for the answer.
There is an example in this document.
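Something along these lines for the fallback Lambda, assuming boto3; the index id is a placeholder and the Lex V2 event/response shapes are trimmed down to the fields used here:

import boto3

kendra = boto3.client("kendra")
KENDRA_INDEX_ID = "your-kendra-index-id"   # placeholder

def lambda_handler(event, context):
    user_text = event.get("inputTranscript", "")

    result = kendra.query(IndexId=KENDRA_INDEX_ID, QueryText=user_text)
    items = result.get("ResultItems", [])
    answer = items[0]["DocumentExcerpt"]["Text"] if items else "Sorry, I could not find an answer."

    # close the conversation and return the Kendra excerpt as the bot's reply
    return {
        "sessionState": {
            "dialogAction": {"type": "Close"},
            "intent": {"name": event["sessionState"]["intent"]["name"], "state": "Fulfilled"},
        },
        "messages": [{"contentType": "PlainText", "content": answer}],
    }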
Kendra, as mentioned already, is an alternative.
However, I would suggest you do a deep dive through your intents to see how many pertain to the same context and can be combined and effectively managed through the use of slots and Lambdas to get the right behaviour.
Another approach would be to use separate bots if you have clean divisions between the intents. Note that your costs here could increase quite substantially as you'd need to invoke all the bots, evaluate the confidence scores and then decide which response to return to the client.
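A sketch of what such an orchestrator might look like with boto3's lexv2-runtime client; the bot ids, aliases and locale are placeholders, and note that every bot gets invoked for every utterance, which is where the extra cost comes from:

import boto3

lex = boto3.client("lexv2-runtime")

BOTS = [  # placeholder ids/aliases for the split-out bots
    {"botId": "BOT1ID", "botAliasId": "ALIAS1", "localeId": "en_US"},
    {"botId": "BOT2ID", "botAliasId": "ALIAS2", "localeId": "en_US"},
]

def route_utterance(session_id, text):
    best = None
    for bot in BOTS:
        resp = lex.recognize_text(sessionId=session_id, text=text, **bot)
        for interp in resp.get("interpretations", []):
            score = interp.get("nluConfidence", {}).get("score", 0.0)
            if best is None or score > best[0]:
                best = (score, bot, resp)   # keep the highest-confidence interpretation
    return best  # (confidence, winning bot, full response) or None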

Passing more than 8000 Post parameters throws error

I am working on a module which requires submitting a form with an insane number of parameters (8k-10k). I am not sure whether this is a good idea or not, but that's the way it is. I have changed the setting in the neo-runtime.xml file as mentioned in this link, as below:
<var name='postParametersLimit'><number>10000.0</number></var>
and restarted the server, but to no avail: CF still throws a 500 error, with no useful information. I am working on CF 9.0.2 and we are using IIS 7.5. Is there anything else I need to do?
"We gave our client a dynamic form where he can add his own form fields and now we have this problem. There was a mismatch between clients expectations and our thinking of the way client wants it."
Unfortunately, you're going to have to tell the client they can't have it how they want it. That post processing limit is there for security reasons and if you raise it too high, then you're re-opening your server to a denial of service attack using a hash algorithm collision.
We have tens of thousands of forms in our workflow system and work with banking and government clients. Once this update was applied (in development first), we had to raise the default to a certain value and stick with it. We made sure to note this limitation to the entire business team and add it to our coding standards document to ensure that all new development was done in accordance to the standard. After reworking a handful of existing forms to account for the limitation, we were able to push the security update to production without a problem.
Just tell them that there is a security restriction on the number of fields in a single form and they cannot cross that line. If you need to gather that much data, they'll have to break it up into multiple forms.
You can use a cfgrid instead of a long form with a huge amount of data for taking input from the user.
cfgrid allows you to load only a limited amount of data from the database at a time.
Using it, you can avoid posting and loading huge amounts of data at once.
And if you are not a great supporter of cfgrid or the cfajax features, you can still use pagination or something similar, which will let you load a limited amount of data into your form and in turn post less data. The latter will require you to build the logic yourself, though.
Start with the CF server limits first. This blog post should give you a pointer to where limits can be adjusted:
http://www.cutterscrossing.com/index.cfm/2012/3/27/ColdFusion-Security-Hotfix-and-Big-Forms

How to get the 1 millionth click of a website

I have often heard this question from different sources but never got a good idea of the technologies used to achieve it. Can anyone shed some light? The question is: you have a website with a high volume of user traffic per day. Your website is deployed in a distributed manner, with multiple web servers and load balancers responding to incoming requests from lots of locations. How do you detect the 1,000,000th user access and show them a special page saying "Congrats, you are our 1,000,000th visitor!", assuming you have a distributed backend?
You could do it with jQuery, for example:
$("#linkOfInterest").click(function() { //code for updating a variable/record that contains the current number of clicks });
CSS:
a#linkOfInterest {
  /* style goes here */
}
Somewhere in the HTML:
<a id="linkOfInterest" href="somepage.htm"></a>
You are going to have to trade off performance or accuracy. The simplest way to do this would be to have a memcached instance keep track of your visitor count, or some other datastore with an atomic increment operation. Since there is only a single source of truth, only one visitor will get the message. At minimum, this will delay the loading of your page by the roundtrip to the store.
If you can't afford the delay, then you will have to trade off accuracy. A distributed data store will not be able to atomically increment the field any faster than a single instance. Every web server can read and write to a local node, but a node in another datacenter may also reach the 1 million count before the writes are reconciled, in which case two or more people may get the 1 millionth visitor message.
It is possible to do so after the fact. Eventually, the data store will reconcile the increments, and your application can decide on a strict ordering. However, if you have already decided that a single atomic request takes too long, then this logic will take place too late to render your page.
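As a minimal sketch of the single-counter approach, assuming Redis (any store with an atomic increment works the same way; the host is a placeholder):

import redis

r = redis.Redis(host="counter.internal", port=6379)   # placeholder host

def is_millionth_visitor():
    # INCR is atomic, so exactly one request in the whole fleet sees 1_000_000
    return r.incr("visitor_count") == 1_000_000

The caller simply renders the special page when this returns True and the normal page otherwise.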

Determine unique visitors to site

I'm creating a Django website with Apache2 as the server. I need a way to determine the number of unique visitors to my website (specifically to every page) in a foolproof way. Unfortunately, users will have a strong incentive to try to "game" the tracking system, so I'm trying to make it as foolproof as possible.
Is there any way of doing this?
Currently I'm trying to use IP & Cookies to determine unique visitors, but this system can be easily fooled with a headless browser.
Unless it's necessary that the data be integrated into your Django database, I'd strongly recommend "outsourcing" your traffic to another provider. I'm very happy with Google Analytics.
Failing that, there's really little you can do to keep someone from gaming the system. You could limit based on IP address but then of course you run into the problem that often many unique visitors share IPs (say, via a university, organization, or work site). Cookies are very easy to clear out, so if you go that route then it's very easy to game.
One thing that's harder to get rid of is files stored in the appcache, so one possible solution that would work on modern browsers is to store a file in the appcache. You'd count the first time it was loaded in as the unique visit, and after that since it's cached they don't get counted again.
Of course, since you presumably need this to be backwards compatible, it leaves you open to exactly the sorts of tools that are most likely to be used for gaming the system, such as curl.
You can certainly block non-browserlike user agents, which makes it slightly more difficult if some gamers don't know about spoofing browser agent strings (which most will quickly learn).
Really, the best solution might be -- what is the outcome from a visit to a page? If it is, for example, selling a product, then don't award people who have the most page views; award the people whose hits generate the most sales. Or whatever time-consuming action someone might take at the page.
Possible solution:
If you're willing to ignore people with JavaScript disabled, you could count only people who access the page and then stay on it for a given window of time (say, one minute). After that period, make an Ajax request back to the server. Gaming it by changing cookies and loading multiple tabs at once wouldn't work, because the visitor would need to keep the same cookie for the page to register that they had stayed on it long enough.
On the server side you store a dictionary, say stay_until, in request.session, with a key for each unique page, and after a minute or so the page fires an Ajax call back to the server. If stay_until[page_id] is less than or equal to the current time, the visit counts as genuine; otherwise it doesn't. This means it would take someone at least 20 minutes to generate 20 "unique" visits, and as long as you make the payoff worth less than the time consumed, that is a strong disincentive. I actually think this might work; I honestly can't see an easy way to game it.
I'd even make it more explicit: on the bottom of the page in a noscript tag, put "Your access was not counted. Turn on JavaScript to be counted" with a page that lays out the tracking process.
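A rough Django sketch of the approach above; the view names, the URL wiring (left out) and the 60-second window are all illustrative:

import time
from django.http import JsonResponse
from django.shortcuts import render

DWELL_SECONDS = 60   # illustrative dwell window

def page_view(request, page_id):
    stay_until = request.session.setdefault("stay_until", {})
    stay_until[str(page_id)] = time.time() + DWELL_SECONDS
    request.session.modified = True
    # page.html would include the JS timer that later calls confirm_view via Ajax
    return render(request, "page.html", {"page_id": page_id})

def confirm_view(request, page_id):
    stay_until = request.session.get("stay_until", {})
    counted = stay_until.get(str(page_id), float("inf")) <= time.time()
    if counted:
        pass  # record the unique visit here, e.g. increment a per-page counter row
    return JsonResponse({"counted": counted})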
As HTTP requests are stateless and you have no control over the user's behaviour on the client side, there is no bulletproof way.
The only way you're going to be able to track "unique" visitors in a fool-proof way is to make it contingent on some controlled factor such as a login. Anything else can and will fail to be completely accurate.

Cleaning up missed geocoding (or general advice on data cleaning)

I've got a rather large database of location addresses (500k+) from around the world, though many of the addresses are duplicates or near duplicates.
Whenever a new address is entered, I check to see if it is in the database already, and if so, I take the already existing lat/long and apply it to the new entry.
The reason I don't link to a separate table is that the addresses are not used as a group to search on, and there are often enough differences between the addresses that I want to keep them distinct.
If I have a complete match on the address, I apply that lat/long. If not, I go to city level and apply that, if I can't get a match there, I have a separate process to run.
Now that you have the extensive background, the problem: occasionally I end up with a lat/long that is far outside the normal acceptable range of error. Strangely, it is usually just one or two of these lat/longs that fall outside the range, while the rest of the data exists in the database under the correct city name.
How would you recommend cleaning up the data? I've got the geonames database, so theoretically I have the correct data. What I'm struggling with is the routine you would run to get this done.
If someone could point me in the direction of some (low-level) data scrubbing guidance, that would be great.
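For concreteness, here is the kind of routine I'm imagining (the row layout and the 50 km cutoff are just placeholders): group rows by city, take the median lat/long, and flag anything too far from it for re-geocoding against geonames.

import math
from collections import defaultdict
from statistics import median

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 6371 * 2 * math.asin(math.sqrt(a))

def find_outliers(rows, cutoff_km=50):
    """rows: iterable of (row_id, city, lat, lng). Returns ids of suspect rows."""
    by_city = defaultdict(list)
    for row in rows:
        by_city[row[1]].append(row)

    outliers = []
    for city, city_rows in by_city.items():
        med_lat = median(r[2] for r in city_rows)
        med_lng = median(r[3] for r in city_rows)
        for row_id, _, lat, lng in city_rows:
            if haversine_km(lat, lng, med_lat, med_lng) > cutoff_km:
                outliers.append(row_id)   # re-geocode these against geonames
    return outliers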
This is an old question, but true principles never die, right?
I work in the address verification industry for a company called SmartyStreets. When you have a large list of addresses that needs to be "cleaned up", polished to official standards, and then relied on for any aspect of your operations, you'd best look into CASS-Certified software (US only; other countries vary widely, and many don't offer such a service officially).
The USPS licenses CASS-Certified vendors to "scrub" or "clean up" (meaning: standardize and verify) address data. I would suggest that you look into a service such as SmartyStreets' LiveAddress to verify addresses or process a list all at once. There are other options, but I think this is the most flexible and affordable for you. You can scrub your initial list then use the API to validate new addresses as you receive them.
Update: I see you're using JSON for various things (I love JSON, by the way, it's so easy to use). There aren't many providers of the services you need which offer it, but SmartyStreets does. Further, you'll be able to educate yourself on the topic of address validation by reading some of the resources/articles on that site.