After my users log in, the app makes too many requests to DynamoDB, and I am thinking about ways to reduce the number of calls.
The app allows users to trigger certain alerts that get sent to other users. For instance: "Shipment received, come to the deck", "Shipment completed", etc.
These are the calls made:
Get company's software license expiration date.
Get the computer's location in the building (i.e. "Office A").
Get the kinds of alerts that can be triggered (i.e. "Shipment received, come to the deck", "Shipment completed", etc).
Get information about the user (i.e. the company teams the user belongs to, and the admin level the user has, which can be 0, 1, 2, or 3).
Potential solutions I have thought about:
Put the company's license expiration date as an attribute of each computer (This would reduce the number of queries by 1). However, if I need to update the company's license expiration date, then I need to update it for EVERY SINGLE computer I have in the system, which sounds impractical to me since I may have 200, 300 or perhaps even more computers in the database.
Add the company's license expiration date as an attribute of the alerts (This would reduce the number of queries by 1); which seems more reasonable because there are only about 15 different kinds of alerts, so if I need to change the license expiration date later on, it is not too bad.
Cache information on the user's device; however, I can't find a good strategy to keep the locally stored information up to date.
I still think these 3 options do not sound too good, so I am hoping someone can point me in the right direction. Is there a good way to reduce the number of calls? I am retrieving information about 4 different entities (license, computer, alert, user); should I just keep those 4 calls after users log in?
Here are a few things that can be done for each component.
Get information about the user
Keep it in a session store and update the store whenever the details change. Session stores are usually implemented using a cache like Redis.
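A minimal sketch of that session store, assuming redis-py; the key scheme, TTL, and field names are all made up for illustration:

```python
import json
import redis

r = redis.Redis(decode_responses=True)  # connection details omitted

SESSION_TTL = 8 * 60 * 60  # assumed TTL; rewrite the entry on any change

def store_user_session(user_id, teams, admin_level):
    # Written at login and rewritten whenever the user's details change.
    payload = json.dumps({"teams": teams, "admin_level": admin_level})
    r.setex(f"session:{user_id}", SESSION_TTL, payload)

def get_user_session(user_id):
    raw = r.get(f"session:{user_id}")
    return json.loads(raw) if raw else None  # None -> fall back to DynamoDB
```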
Computer location
Keep it in a distributed cache like Redis and initialise it lazily. Whenever a new write happens to a computer's location (rare, IMO), remove the entry from Redis using DynamoDB Streams and AWS Lambda.
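A sketch of that lazy cache-aside read plus the stream-driven invalidation; the table, key, and attribute names are assumptions, not the asker's actual schema:

```python
import boto3
import redis

table = boto3.resource("dynamodb").Table("Computers")  # hypothetical name
cache = redis.Redis(decode_responses=True)

def get_computer_location(computer_id):
    # Cache-aside: serve from Redis when possible, otherwise read
    # DynamoDB once and populate the cache lazily.
    key = f"computer:location:{computer_id}"
    location = cache.get(key)
    if location is None:
        item = table.get_item(Key={"computer_id": computer_id}).get("Item")
        if item is None:
            return None
        location = item["location"]
        cache.set(key, location)
    return location

def invalidate_on_write(event, context):
    # Lambda handler attached to the table's DynamoDB stream: on any
    # write, drop the cached entry so the next read repopulates it.
    for record in event["Records"]:
        if record["eventName"] in ("MODIFY", "REMOVE"):
            computer_id = record["dynamodb"]["Keys"]["computer_id"]["S"]
            cache.delete(f"computer:location:{computer_id}")
```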
Kinds of alerts
Same as Computer location
License expiration date
If possible, don't allow the license expiry date to change (issue a new license in such cases, so that traceability is maintained) and cache the expiry date forever. Otherwise, handle it the same as the computer location.
I'm working on an Ad-tech system which serves millions of users.
Basically, users (non-anonymous users) can see different ads created by the marketing team.
Our marketing team wants to be able to set frequency caps on those ads (among the other targeting rules they already have).
For example:
"We should not show this ad for a user if he already seen/click this ad more than X times in the last Y days"
Also ads can be grouped to campaigns, so rules like that are also possibile:
"We should not show this for a user if he viewed more than X times ads in this campaign in the last Y days".
Also our marketing might wanna know how many people viewed/click a specific add in the last Y days.
We have roughly 200K RPM and our responses should be very fast.
The smallest unit of time for our queries is one day and it will not change.
A few questions and thoughts:
Is DynamoDB a good fit?
I thought about creating a table for each event type (Click/View/Close..)
What is the best way to configure the primary key?
I thought about setting the partition key as the user id and the sort key as a combination of the ad id and the current day {dd/mm/yyyy}
I thought about using the "ADD" operation to increase the counter when a user clicks/views/... an ad on a specific date (a rough sketch of this follows these questions). Is that an expensive operation? Do I have an alternative?
What is the best way to also be able to query per ad and per campaign (for example: "all user views for all ads in a campaign" or "get all ad views in the last 40 days")?
What other considerations should I take in mind?
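Here is a rough sketch of the counter idea (all table and attribute names are made up; I used yyyy-mm-dd instead of dd/mm/yyyy so that day strings sort chronologically):

```python
from datetime import date
import boto3

table = boto3.resource("dynamodb").Table("AdViews")  # hypothetical table

def record_view(user_id, ad_id):
    # Partition key: user id. Sort key: ad id + day. yyyy-mm-dd sorts
    # chronologically as a string, unlike dd/mm/yyyy.
    day = date.today().isoformat()
    table.update_item(
        Key={"user_id": user_id, "ad_day": f"{ad_id}#{day}"},
        UpdateExpression="ADD view_count :one",
        ExpressionAttributeValues={":one": 1},
    )

# Per-ad or per-campaign rollups would presumably need a GSI (e.g. with
# ad_id or campaign_id as its partition key), since this base table is
# keyed by user.
```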
Thanks a lot
Consider a sample chat application where users purchase monthly/annual subscriptions (like Amazon Prime, etc.).
As soon as the subscription expires, the user should not be able to send messages in the app.
A user can end their subscription before the original subscription end date.
One solution in my mind (frontend): cache the end date in the app and, before every "send message" operation, compare the end date with the current date.
But the problem is: if the user ends the subscription early, they will still be able to send messages.
How can I push the updated subscription end date to the cache?
Another solution (backend): I have a table in the backend storing subscription details like subscription_id, user_id, and subscription_enddate. So before any "send message" operation, query the subscription table, compare the dates, and then continue or cancel further operations.
Q1. Should I go with the backend solution, or can you share some improvements to the frontend method or any best practices for this scenario?
Q2. Also, is storing subscription details in a separate table a best practice, or is there a better design?
PS: The sample chat app is based on AWS Amplify DataStore.
Let me try to break down the answer and give my opinion. I would also like to mention that solutions to such problems are determined by scale and various tradeoffs.
Q1-
If sending messages can have adverse effects, you should never rely on the frontend solution alone, as it is easy to bypass. You can use a mixture of both to keep the load on the backend low.
Adding a frontend cache for the subscription ensures you can filter most of the messages on the frontend, as long as the cache is not tampered with.
Adding a service before the queue that validates whether the user's subscription has expired adds one more layer of security: if the subscription is valid, it pushes the message to the queue; otherwise it throws an error. This way a bad actor cannot misuse the system either.
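A minimal sketch of that validation layer, assuming the subscription table from the question, SQS as the queue, and end dates stored as ISO-8601 UTC strings (all names are illustrative):

```python
from datetime import datetime, timezone
import boto3

subscriptions = boto3.resource("dynamodb").Table("Subscriptions")
queue = boto3.resource("sqs").get_queue_by_name(QueueName="chat-messages")

def send_message(user_id, body):
    # Validate against the backend record, never the client's cached date.
    # Assumes subscription_enddate is an ISO-8601 UTC string.
    item = subscriptions.get_item(Key={"user_id": user_id}).get("Item")
    now = datetime.now(timezone.utc).isoformat()
    if item is None or item["subscription_enddate"] < now:
        raise PermissionError("subscription expired or not found")
    queue.send_message(MessageBody=body)
```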
Q2-
Depending on the use cases and load, you can have a separate table or a separate microservice for the subscription itself.
When to have a separate microservice?
When the subscription data is required by multiple applications in your system and needs to scale independently of the others, a separate microservice can be beneficial.
When to have a separate table?
In other cases, where you feel adding a service would be overkill, you can keep the data in a separate table/DB, which gives you the flexibility to change the subscription model and even extract it into its own service later.
I am trying to design a service to send emails to users, quite similar to Amazon SES.
One of the requirements is to keep track of all the emails this system sends. I am confused about how to design the solution so that I can associate each sent email with the parent user who sent it (known at the time of sending).
If I start dumping all the email-related data in a relational DB, it will grow exponentially over time and create a lot of problems. Similarly, if I store it in Cassandra, it will grow quickly and create problems.
Need for storing this information:
1) In the future, I need to know whether an email was sent to a particular user, and when.
2) If the feedback loop produces a complaint mail, I will need to map it back to a particular email id (which will be present in the complaint email) and to the parent user who sent it (which will be stored at the time the email was sent).
Can someone give me pointers on how to store this data, or build some cache, in a way that achieves this?
It's unlikely to grow "exponentially." Seems like it will grow linearly. Regardless, if you need the ability to look up who sent what to whom, then you have no choice but to store it.
What you need to do is estimate how many emails you send per day, and how much data you need to save with each of those emails. Do the math and determine how much data you expect to be generating each day. Then at least you can figure out how large your database will get over time.
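As a worked example with made-up volumes (plug in your own estimates):

```python
# Back-of-the-envelope estimate; every number here is an assumption.
emails_per_day = 1_000_000
bytes_per_record = 2_000      # metadata + a short body, assumed average
index_overhead = 0.3          # rough allowance for indexes (more below)

daily = emails_per_day * bytes_per_record * (1 + index_overhead)
print(f"{daily / 1e9:.1f} GB/day, {daily * 365 / 1e12:.2f} TB/year")
# -> 2.6 GB/day, 0.95 TB/year
```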
You'll also need to consider how you want to index the data. Seems like you'll want to index by email id, at least. You might also want to index by sender, and also possibly by recipient. Those indexes will create additional per-email data storage requirements. How much is something you'll have to determine through analysis.
How much actual disk space this will occupy per email is hard to determine. If the messages are short, you could probably get more than a million emails per gigabyte in a relational database. You could potentially do much better than that if you compress the message data, or apply other techniques that take advantage of similarities in the messages. For example, if you send the exact same message to a thousand recipients, you can store a single copy of the message and just store a reference to that message in the individual email records.
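One way to sketch that shared-body idea, with a hypothetical schema (in practice these would be two tables rather than in-memory dicts):

```python
import hashlib

message_bodies = {}   # body_hash -> body text; one row per unique message
email_records = []    # one small record per recipient

def record_email(sender, recipient, body):
    # Store the body once, keyed by its content hash; each per-recipient
    # record carries only a reference to it.
    body_hash = hashlib.sha256(body.encode()).hexdigest()
    message_bodies.setdefault(body_hash, body)
    email_records.append(
        {"sender": sender, "recipient": recipient, "body_ref": body_hash}
    )
```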
You might also want to consider how long you need to store each message. Do you need to store everything forever, or can you periodically remove all messages that are older than a year (or some other relatively long amount of time)?
I am thinking of building a chat app with AWS DynamoDB. The app will support 1:1 and group chats.
I want to create one table for each of the chats, with a record for each sent chat line. Is DynamoDB suitable for this kind of job?
I am also thinking of merging both tables. But is this a good idea, if there are – let's assume – 100k or 1000k users?
I think you may run into problems with the read capacity on your table. The write capacity should be ok, as there are not so many messages coming in per second (e.g. 10 or so), but you'll need to constantly read from it for all users, so that'll be expensive.
If you want to use DynamoDB just as storage and distribute the chat messages over the network like any normal chat, then it may make sense, depending on your use cases. Assuming you have a hash key UserId and a range key Timestamp, you could query all messages from a specific user during a specific period. If, however, you want to search within the chat text (probably a much more useful feature), then DynamoDB won't work per se. It's not like SQL, where you could do a LIKE '%abc%' query (which isn't a good idea in SQL either).
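Assuming that UserId/Timestamp schema, the period query might look something like this (the table name is made up):

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("ChatMessages")  # hypothetical

def messages_from_user(user_id, start_ts, end_ts):
    # Hash key UserId plus a range condition on the Timestamp sort key
    # answers "all messages from user X in timespan S" in one query.
    resp = table.query(
        KeyConditionExpression=Key("UserId").eq(user_id)
        & Key("Timestamp").between(start_ts, end_ts)
    )
    return resp["Items"]
```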
You're probably better off using S3 as data storage and Elasticsearch as the search instrument. If you require the aforementioned use case "get all messages from user X in timespan S" (as a simple example), you could additionally use DynamoDB to store metadata such as UserId, Timestamp, PositionInFile, or something like that.
I've started a Django project that will include an analytics app. I want that app to use either CouchDB or MongoDB for storing data.
The initial idea was (since the client is already using Google Analytics) to grab data from GA once a day/week/month and store it locally as values in the database. This would ultimately build a database of entries - one entry per user per month - with summed values like:
{"date": "11.2011", "clicks": 21, "pageviews": 40, "n": n}
For premium users there could be one entry per user per week, or even per day.
The question would be:
grab analytics from GA and store summed entries for clicks, visits, etc.
or
store clicks and other raw values locally, and once a month compute the sums for display?
Lukasz, unless Google Analytics has really relaxed their privacy levels, you're not going to be able to access user-level records (but check out the answer here: Django saving the whole request for statistics, whats available?)
Right, this is an old question, but I've just finished the project, so I'll write up what I did.
Since I didn't need concurrency and wanted a faster approach, I found that MongoDB was the better fit.
The final document schema I used is:
{'date': '11.2009', 'pageviews': 40, 'clicks': 13, 'otherdata': 'that i can use as filters'}
The scope of my local analytics is monthly, so I create one entry in MongoDB per user per month and update it each day, storing only summaries and averages.
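The daily update boils down to a single upsert per user, something along these lines (collection and field names are illustrative, matching the document above):

```python
from pymongo import MongoClient

stats = MongoClient()["analytics"]["monthly_stats"]  # names are made up

def add_daily_totals(user_id, month, clicks, pageviews):
    # One document per user per month; $inc folds each day's totals in,
    # and upsert=True creates the document on the first day of the month.
    stats.update_one(
        {"user_id": user_id, "date": month},  # month like "11.2009"
        {"$inc": {"clicks": clicks, "pageviews": pageviews}},
        upsert=True,
    )
```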
What else? Re: Jamie's answer - the system is using GA events, so I've got access to all the data I need.
Hope someone finds it interesting.
Cheers, and thanks for the ideas!