Design Tracking of Emails sent from SMTP server - web-services

I am trying to design a service to send emails to users. This service is pretty similar to Amazon SES.
One of the requirements is to keep track of all the emails that this system will send. I am confused about how to design this solution so that each sent email stays associated with the parent user who sent it (known at the time of sending).
If I start dumping all the email-related data in a relational DB, it will grow exponentially over time and create a lot of problems. Similarly, if I store these things in Cassandra, it will grow at a good speed and create problems.
Need for storing this information:
1) In the future, to know whether an email was sent to a particular user and when.
2) If the feedback loop generates a complaint email, to map it back to a particular email id (which will be present in the complaint email) and to the parent user who sent it (which will be stored at the time the email was sent).
Can someone give me pointers on how to store this data, or build some kind of cache, in a way that achieves this?

It's unlikely to grow "exponentially." Seems like it will grow linearly. Regardless, if you need the ability to look up who sent what to whom, then you have no choice but to store it.
What you need to do is estimate how many emails you send per day, and how much data you need to save with each of those emails. Do the math and determine how much data you expect to be generating each day. Then at least you can figure out how large your database will get over time.
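To make that estimate concrete, here is a back-of-the-envelope sketch in Python; the daily volume, per-email size, and index overhead are made-up placeholder numbers, not figures from the question.

```python
# Rough storage-growth estimate; all numbers below are assumed placeholders.
emails_per_day = 500_000      # assumed send volume
bytes_per_email = 2_000       # assumed metadata + short body
index_overhead = 1.5          # assumed multiplier for email id / sender / recipient indexes

daily_bytes = emails_per_day * bytes_per_email * index_overhead
yearly_gb = daily_bytes * 365 / 1024 ** 3
print(f"~{daily_bytes / 1024 ** 2:.0f} MB/day, ~{yearly_gb:.0f} GB/year")
```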
You'll also need to consider how you want to index the data. Seems like you'll want to index by email id, at least. You might also want to index by sender, and also possibly by recipient. Those indexes will create additional per-email data storage requirements. How much is something you'll have to determine through analysis.
How much actual disk space this will occupy per email is hard to determine. If the messages are short, you could probably get more than a million emails per gigabyte in a relational database. You could potentially do much better than that if you compress the message data, or apply other techniques that take advantage of similarities in the messages. For example, if you send the exact same message to a thousand recipients, you can store a single copy of the message and just store a reference to that message in the individual email records.
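One way to sketch that single-copy idea, here with SQLite and hypothetical table and column names: key the message body by a content hash and let each per-recipient record reference it.

```python
import hashlib
import sqlite3

# Minimal sketch of "store the message body once"; schema names are illustrative only.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE message_bodies (body_hash TEXT PRIMARY KEY, body TEXT)")
db.execute("""CREATE TABLE emails (
    email_id TEXT PRIMARY KEY,
    sender TEXT,
    recipient TEXT,
    sent_at TEXT,
    body_hash TEXT REFERENCES message_bodies(body_hash))""")
db.execute("CREATE INDEX idx_emails_sender ON emails(sender)")

def record_email(email_id, sender, recipient, sent_at, body):
    body_hash = hashlib.sha256(body.encode()).hexdigest()
    # The body is inserted only once per unique content; duplicates hit the existing row.
    db.execute("INSERT OR IGNORE INTO message_bodies VALUES (?, ?)", (body_hash, body))
    db.execute("INSERT INTO emails VALUES (?, ?, ?, ?, ?)",
               (email_id, sender, recipient, sent_at, body_hash))

# The same newsletter sent to two recipients stores the body a single time.
record_email("m-1", "news@example.com", "a@example.com", "2015-01-01", "Hello!")
record_email("m-2", "news@example.com", "b@example.com", "2015-01-01", "Hello!")
```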
You might also want to consider how long you need to store each message. Do you need to store everything forever, or can you periodically remove all messages that are older than a year (or some other relatively long amount of time)?

Related

DynamoDB - Reducing number of queries

After my users log in the app makes too many requests to DynamoDB and I am thinking about different ways to reduce the number of calls.
The app allows user to trigger certain alerts that get sent to other users. For instance: "Shipment received, come to the deck", "Shipment completed", etc.
These are the calls made:
Get company's software license expiration date.
Get the computer's location in the building (i.e. "Office A").
Get the kinds of alerts that can be triggered (i.e. "Shipment received, come to the deck", "Shipment completed", etc).
Get information about the user (i.e. company teams the user belongs to, and the admin level the user has, which can be 0, 1, 2, or 3).
Potential solutions I have thought about:
Put the company's license expiration date as an attribute of each computer (This would reduce the number of queries by 1). However, if I need to update the company's license expiration date, then I need to update it for EVERY SINGLE computer I have in the system, which sounds impractical to me since I may have 200, 300 or perhaps even more computers in the database.
Add the company's license expiration date as an attribute of the alerts (this would also reduce the number of queries by 1), which seems more reasonable because there are only about 15 different kinds of alerts, so if I need to change the license expiration date later on, it is not too bad.
Cache information on the user's device; however, I can't seem to find a good strategy to keep the locally stored information as up to date as possible.
I still think these 3 options do not sound too good, so I am hoping someone can point me in the right direction. Is there a good way to reduce the number of calls? I am retrieving information about 4 different entities (license, computer, alert, user), should I leave those 4 calls after users log in?
Here are a few things that can be done for each component.
Get information about the user
Keep it in a session store and, whenever the details change, update the store. Session stores are usually implemented using a cache like Redis.
Computer location
Keep it in a distributed cache like Redis and lazily initialise it. Whenever a new write happens to a computer location (rare, IMO), remove the entry from Redis using DynamoDB Streams and AWS Lambda; a sketch of the read path is shown at the end of this answer.
Kinds of alerts
Same as Computer location
License expiration date
If possible, don't let the license expiry date change in place (issue a new license for these cases, so that traceability is maintained) and cache the licence expiry forever. Otherwise, treat it the same as the computer location.
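A minimal cache-aside sketch for the computer-location lookup mentioned above, assuming a hypothetical Computers table (with ComputerId / Location attributes) in DynamoDB and a local Redis; the alert kinds could be cached the same way.

```python
import boto3
import redis

# Cache-aside sketch; the table name, key names, and Redis location are assumptions.
cache = redis.Redis(host="localhost", port=6379)
table = boto3.resource("dynamodb").Table("Computers")   # hypothetical table name

def get_computer_location(computer_id: str) -> str:
    cached = cache.get(f"location:{computer_id}")
    if cached is not None:
        return cached.decode()
    # Lazy initialisation: only hit DynamoDB on a cache miss.
    item = table.get_item(Key={"ComputerId": computer_id}).get("Item", {})
    location = item.get("Location", "")
    cache.set(f"location:{computer_id}", location)
    return location

# On the (rare) write path, a DynamoDB Streams-triggered Lambda could simply do:
#   cache.delete(f"location:{computer_id}")
# so that the next read repopulates the entry.
```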

Storing Chat Log on AWS DynamoDB?

I am thinking of building a chat app with AWS DynamoDB. The app will support 1:1 and group chats.
I want to create one table for each of the two chat types (1:1 and group), with a record for each sent chat text line. Is DynamoDB suitable for this kind of job?
I am also thinking of merging both tables. But is this a good idea if there are, let's assume, 100k or 1000k users?
I think you may run into problems with the read capacity on your table. The write capacity should be ok, as there are not so many messages coming in per second (e.g. 10 or so), but you'll need to constantly read from it for all users, so that'll be expensive.
If you want to use DynamoDB just as storage and distribute the chat messages over the network like any normal chat, then it may make sense, depending on your use cases. Assuming you have a hash key of UserId and a range key of Timestamp, you could query all messages from a specific user during a specific period. If, however, you want to search within the chat text (a much more useful feature, probably), then DynamoDB won't work per se. It's not like SQL, where you could do a LIKE '%abc%' query (which isn't a good idea in SQL either).
You're probably better off using S3 as the data storage and ElasticSearch as the search instrument. If you require the aforementioned use case "get all messages from user X in timespan S" (as a simple example), you could additionally use DynamoDB to store metadata such as UserId, Timestamp, PositionInFile, or something like that.
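For that "all messages from user X in timespan S" case, a sketch with boto3; the ChatMessages table name, the UserId hash key / Timestamp range key schema, and the ISO-8601 timestamps are assumptions.

```python
import boto3
from boto3.dynamodb.conditions import Key

# Query all messages from one user within a time window; schema is assumed as described above.
table = boto3.resource("dynamodb").Table("ChatMessages")

def messages_from_user(user_id: str, start: str, end: str):
    response = table.query(
        KeyConditionExpression=Key("UserId").eq(user_id)
        & Key("Timestamp").between(start, end)
    )
    return response["Items"]

# Example: everything user-42 wrote during January 2015.
items = messages_from_user("user-42", "2015-01-01T00:00:00", "2015-01-31T23:59:59")
```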

Best Practices to update multiple records with a single server request

I have a User model which hasMany phones. The UI for the user allows adding/deleting/updating phones on a single form.
When the user submits the form, all changes to the phone list are sent to the server in a single request.
I have extended the App.UserSerializer with custom serializeHasMany to include all the phone details in the single request.
The real problem is to sync the store state after the request is complete.
Basically I need to solve these two problems:
Remove deleted records from the store. I could not find any method that just removes a record from a store.
Update new records with the ids generated by the server. (Or just remove the new records from the store and the hasMany array, since the response creates duplicates for the added records.)
Are there any best practices or workarounds for this kind of scenario?
Thank you.
I think the best practice for now is just sticking to regular REST. In your case this will mean a few extra requests (really though, how many phones can a user have?), but it will spare you a lot of effort in handling things manually.
Ember may support bulk updates in the future (https://github.com/emberjs/data/blob/master/TRANSITION.md, "We plan to support batch saving with a single HTTP request through a dedicated API in the future.")
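To illustrate what "regular REST" means here, a rough sketch (in Python for brevity, even though the app itself is Ember) of the per-phone requests the adapter would end up issuing; the /users/{id}/phones endpoints and payload shape are hypothetical.

```python
import requests

# One request per changed phone record instead of a single bulk update.
BASE = "https://api.example.com"   # placeholder API root

def sync_phones(user_id, added, updated, deleted):
    for phone in added:
        requests.post(f"{BASE}/users/{user_id}/phones", json=phone)
    for phone in updated:
        requests.put(f"{BASE}/users/{user_id}/phones/{phone['id']}", json=phone)
    for phone_id in deleted:
        requests.delete(f"{BASE}/users/{user_id}/phones/{phone_id}")
```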

Amazon DynamoDB and Provisioned Throughput

I am new to DynamoDB and I'm having trouble getting my head around the Provisioned Throughput.
From what I've read it seems you can use this to set the limit of reads and writes at one time. Have I got that wrong?
Basically, what I want to do is store emails that are sent through my software. I currently store them in a MySQL database, but the amount of data is very large, which is why I am looking at DynamoDB. I do not need to access this data very often, but when it's needed, I need to be able to access it.
Last month 142,925 emails were sent and each "row" (or email) in the MySQL table I store them in is around 2.5KB.
Sometimes 1 email is sent, other times there might be 3,000 at one time. There's no way of knowing when or how many will be sent at any given time.
Do you have any suggestions on what my Throughputs should be?
And if I did go over, am I correct in understanding that Amazon throttles the requests and processes them over time? Or does it just throw an error and that's the end of it?
Thanks so much for your help.
I'm using DynamoDB with the Java SDK. When you have an access burst, Amazon first tries to keep up, even allowing a bit above the provisioned throughput; after that it starts throttling and also throws exceptions. In our code we use this error to break the requests into smaller batches and sometimes force a sleep to cool it down a bit (see the sketch below).
When dealing with your situation, it really depends on the type of crunching you need to do "from time to time". How much time do you need to get all the data from the table? Do you really need to get all of it? And ~100k a month doesn't sound like too much for MySQL in my mind; it all depends on the querying power you need.
Also note that in DynamoDB writes are more expensive than reads so maybe that alone signals that it is not the best fit for your write-intensive problem.
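A sketch of that "catch the error, batch and back off" pattern with boto3 (the answer above used the Java SDK); the Emails table name and item shape are assumptions.

```python
import time
import boto3
from botocore.exceptions import ClientError

# Retry a DynamoDB write with exponential backoff when the table is throttled.
table = boto3.resource("dynamodb").Table("Emails")   # hypothetical table name

def put_with_backoff(item, max_retries=5):
    for attempt in range(max_retries):
        try:
            table.put_item(Item=item)
            return
        except ClientError as err:
            if err.response["Error"]["Code"] != "ProvisionedThroughputExceededException":
                raise
            time.sleep(2 ** attempt)  # back off to let the table "cool down"
    raise RuntimeError("still throttled after retries")
```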
DynamoDB is very expensive for this; I would suggest not storing emails in DynamoDB, as each read and write costs a good amount. Basically, 1 read capacity unit means 4 KB of data read per second and 1 write capacity unit means 1 KB of data written per second. As you mentioned, each of your emails is about 2.5 KB, so if you don't have a proper key for looking an email up, the table will be scanned completely, and that will cost a very good amount because you will consume a lot of read units.
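Applying those capacity-unit definitions to the numbers in the question gives a rough feel for the provisioning math (the gap between the average rate and the bursts is the real issue):

```python
import math

# Back-of-the-envelope capacity math using the figures from the question.
item_kb = 2.5                     # approximate size of one stored email
writes_per_month = 142_925
burst_writes = 3_000              # worst-case burst mentioned in the question

wcu_per_item = math.ceil(item_kb / 1.0)   # 1 WCU = 1 KB written per second -> 3 WCU per email
rcu_per_item = math.ceil(item_kb / 4.0)   # 1 RCU = 4 KB read per second (strongly consistent) -> 1 RCU

avg_writes_per_sec = writes_per_month / (30 * 24 * 3600)
print(f"{wcu_per_item} WCU and {rcu_per_item} RCU per email")
print(f"average write rate ~{avg_writes_per_sec:.2f}/s, "
      f"but a burst of {burst_writes} emails needs batching, queueing, or a lot of headroom")
```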

Best Practice for Cookies

There are two approaches I've been thinking about for storing data in cookies. One way is to use one cookie for all the data and store it as a JSON string.
The other approach is to use a different cookie for each piece of data.
The negative I see with the first approach is that it'll take up more space in the headers because of the extra JSON characters in the cookie. Also, I'll have to parse and stringify the JSON, which will take a little processing time. The positive is that only one cookie is being used. Are there other positives I am missing?
The negative I see with the second approach is that there will be more cookie key-value pairs used.
There are about 15-20 cookies that I will be storing. The expires date will be the same for each cookie.
From what I understand the max number of cookies per domain is around 4000. We are not close to that number yet.
Are there any issues I am overlooking? Which approach would be best?
Edit - These cookies are managed by JavaScript.
If you hand out any data for storage to your users (which is what cookies do), you should encrypt the data, or at the very very least sign it.
This is needed to protect the data from tampering.
At this point, size considerations are way off (due to padding), and so is the performance overhead of parsing the JSON (encryption will cause significantly greater overhead).
Conclusion: store your data as JSON, (optionally) encrypt it, sign it, encode it as base64, and store it in a single cookie. Keep in mind that there is a maximum size for cookies (it's 4K).
Reference: among numerous other frameworks and applications, this is what Rails does.
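A minimal sketch of the sign-then-base64 step using only Python's standard library (encryption omitted for brevity); the secret and payload are placeholders, and a real application would keep the secret out of the source.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"server-side-secret"   # placeholder; load from configuration in practice

def encode_cookie(data: dict) -> str:
    # JSON -> base64 -> HMAC signature appended after a dot.
    payload = base64.urlsafe_b64encode(json.dumps(data).encode())
    mac = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return f"{payload.decode()}.{mac}"

def decode_cookie(value: str) -> dict:
    payload, mac = value.rsplit(".", 1)
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(mac, expected):
        raise ValueError("cookie was tampered with")
    return json.loads(base64.urlsafe_b64decode(payload))

cookie = encode_cookie({"theme": "dark", "lang": "en"})
print(decode_cookie(cookie))
```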
A best-practice for cookies is to minimize their use. For instance, limit your cookie usage to just remembering the session id, and then store your data on the server side.
In the EU, cookies are subject to legal regulations, and using cookies for almost anything but session ids requires explicit client consent.
Good morning.
I think I understand you. Some time ago I used cookies storing JSON data, encrypted, but only for intranet or administration accounts. For shop users I used the same practice; however, for storing products on the shop site I don't use encryption.
Important: sometimes I had problems with JSON decoding after decrypting the data. Depending on your use, you can adopt a system storing data separated by ; and :, encrypted, like:
encrypt_function($key, "product:K10072;qtd:1|product:1042;qtd:1|product:3790;qtd:1") to store products; and
encrypt_function($key, "cad_products:1;mdf_products:2;cad_collabs:0") to store security grants.
Any system can be hacked. You need to build an application with constant verification of user data and log analysis. That system, yes, needs to be fast.