I am trying to do something that would be relatively simple for a relational database but I don't know how to do it for a nonrelational one.
I am trying to make a simple task web app on AWS where people can post their tasks.
I have a table called tasks which uses the userid from the auth token provisioned by AWS Cognito. I am wondering how I can return the user information. I do not want to rely on Cognito by simply calling it every time a user sends a request. So, my thought would be to create another table to store all of the user information. That, however, is not a very nonrelational way of doing things since JOINS are so bad.
So, I was wondering if I should do any of the following
a) Using RDS instead
b) Not use Cognito and set up my own Auth system
c) Just doing the JOIN with a table containing all of the user info
d) Doing the request to Cognito each time
Although I personally like the idea of cognito, at this time it has some major drawbacks...
You can not backup / restore a user pool without loosing their password, also you have to implement your own backup/restore.
A way around is to save the user password in a cognito custom attribute.
I expected by using api gateway/lambda authorizer to have all the user data in the lambda context but its not there. Or am indoing something wrong with api gateway template mapping 😬
Good thing api gateway/lambda authorizer, can be cached by up to an hour, wont call the authorizer function again which seems like a top feature.
Does not work well with cloudformation, with every attribute update it recreates the user pool without restoring the users, thus loosing the users.
I used it only in one implementation and ended up duplicating the users in DynamoDB as well.
I'm avoiding it ever since. I wish they solve these issues as it looks like a service to be included with every project saving lot of time.
Reading your post I asked myself the same questions and not sure the answer either 😄
Pricing seems fair.
The default 5 requests/second to get user info seems strange as it woukd be consumed by one page load doing multiple ajax api requests .
For this in DynamoDB, there is no need for another table. If the access patterns dictate you store the information in another object, then so be it, but more than likely it should be in the same table. Sounds like you need two different item types in the same table.
For the task PK of userid and SK of task::your-task-id. This would allow you to get all of a user's tasks easily or even a specific task very easily if you knew the task ID. You might even have an attribute that is a timestamp and then have a GSI that is the userID as the PK and the timestamp as the SK. then you could use the begins_with operator on the SK and "paginate through all of the user's tasks that are in the month of 2019-04".
For the user information, have the userID be the PK and the SK be user_info and attributes be the user's information.
The one challenge for this is if you were to go to extremes and one single user is doing thousands of ops per second. e.g. "All tweets by very popular celebrity". If you have such a use case there are ways around that as well, e.g. write sharding. These are just examples for you to play with. Without knowing all your access patterns, I cannot model everything you might want to do. I highly recommend you go watch this presentation from reInvent 2018.
Related
Consider the example use case as below.
You need to invite a Company as your connection. The sub actions that needs to happen in this situation is.
A Company need to be created by adding an entry to the Company table.
A User account needs to be created for the staff member to login by creating an entry in the User table.
A Staff object is created to ensure that the User has access to the Company by creating an entry in the Staff table.
The invited company is related to the invitee company, so a relation similar to friendship is created to connect the two companies by creating an entry in the Connection table.
An Invitation object is created to store the information as to who invited who onto the system, with other information like invitation time, invite message etc. For this, and entry is created in the Invitation table.
An email needs to be sent to the user to accept invitation and join by setting password.
As you can see, entries are to be made in 5 Tables.
Is it a good practice to do all this in a single API call?
If not, what are the other option.
How do I maintain data integrity if it is to be split into multiple APIs?
If the actions need to be atomic, then it's definitely best to do this in a single API call. Otherwise, you run the risk of someone not completing all the tasks required and leaving the resources in a potentially conflicting state.
That said, you're not updating a single resource, so this isn't a good fit for a single RESTful resource creation call (e.g., POST /companyInvitations) -- as all these other things being created and stitched together might lead to quite a bit of confusion.
If the action you're doing is "inviting a Company", then one option is to use Google's "custom method" syntax (POST /resources/1234:action) as defined in AIP-136. In this case, you might do POST /companies/1234:invite which says "I want to invite Company #1234 to be my connection".
Under the hood, this might atomically upsert (create if resources don't already exist) all the right things that you've listed out.
Something to consider when approaching an API call where multiple things happen when called, is how long those downstream actions take. Leaving the api call blocked isn't the best idea in the world while things are processing in the background.
You could consider (depending on your usecase) taking in the api request, immediately responding with a 200 status, and dropping the request onto an internal queue for processing. When your background service picks up the request it can update whatever needs to be updated and manage the transactions appropriately etc. This also caters for horizontal scaling scenarios where lots of "worker" services can be deployed to process the requests.
As part of this you could consider adding another "status" endpoint where requests can be made to find out how things are going. To avoid lots of polling status requests you could also take in callback details as part of the original api call which then gets called when the background processing is complete. Or you could do both!
I'm using aws congito for user storage. Now I need to get user names and avatars for a list of user posts. But as far as a know, the listUsers api does not accept a list of subs as a filter condition. So how can I achieve this?
I have some other ideas, like syncing user information to a dynamodb by lambda trigger, or storing user name and avatar in posts database (but it's hard to update user info).
Is there a better way to get user information for a list of posts?
Until filtering by list of subs is possible, I would use multiple https://docs.aws.amazon.com/cognito-user-identity-pools/latest/APIReference/API_ListUsers.html calls with only one sub in parallel (fork join approach) and store the data in a cache (Redis would do nicely). This way, it will fetch data from Cognito only once for a user, making it very efficient after the first try.
I am very new to AWS. As the first step I am creating an eCommerce application on my personal interest to give the demo of this application to my colleagues.
I am implementing 'Order' part. For this, I am thinking of moving the data from one table to other. I.e Once the user add the product to cart , it will saved in Cart table in dynamo-db and in cart screen when the user clicks on 'Order'button/Link, the same data as it is in cart table should be moved to Order table and the cart should be empty So, the order can be confirmed.
How could I implement it? Not sure the method I am thinking is right if there any other method to accomplish Order functionality.
The answer to this is really going to depend on your architecture and stack - and even within that you have lots of options.
In a serverless way, i.e. from a static html page with no server-side backend, you could create a lambda function in the supported language of your choice and with the proper IAM role, to move the data from one table to the other - your html page could call it via an API call, and I would suggest you use AWS API Gateway to expose an api endpoint that then calls the lambda function.
If on (one of the other many) other-hands, say you were using ASP.net or PHP on the server side, you could use the AWSSDK to talk to the dynamodb directly and accomplish the same thing.
Besides these two options there are many, many alternatives and variations - and with all of the options you are also going to need to deal with authentication/security to make sure no one can make calls to your database/service that they aren't permitted to - perhaps not important for your demo application, but will definitely be an issue if/when you go live.
I'm retreiving event data using Real Time Google Analytics API, so as to trigger responses each time conditions are met - while the user navigates.
This is my actual query on Google Analytics Real Time API (which works perfectly!)
return service.data().realtime().get(
ids='ga:' + profile_id,
metrics='rt:totalEvents',
dimensions='rt:eventAction,rt:eventLabel,rt:eventCategory',
max_results='25').execute()
I'd like to show results grouped by each particular session or user. So as to trigger a message to this particular user if some conditions are met.
Is that possible? And if so, how do apply this criteria to this query?
"Trigger a message to a particular user" would imply that you either have personally identifiable data stored in GA, which would violate Googles TOS, or that you map an anonymous ID (clientid or UserID or similar) to a key stored in an external database (which might be legally murky, depending on your legislation). Since I don't want to throw away the answer I have written before reading your question to the end :-) I am going to assume the latter.
So, is that possible? No, not really. By default GA does not identify neither an identifier for the user (client id or user id) nor for the session (a session identifier is present only in the BigQuery export schema).
The realtime API has a very limited set of dimensions (mostly I think because data aggregation does not happen in realtime), so you can't even use custom dimensions. Your only chance would be to overwrite one of the standard fields, i.e. campaign information.
Of course this destroys the original data in the field. So you should use an extra view for the API query, send a custom dimension with the user identifier along, and then use an advanced filter to copy the custom dimension value to a standard field (while you original data is safe in your other data views). This is a bit hackish, though.
Also the realtime API only displays the current hit per user, so you cannot group by user in the query in any case - you'd need to download and store the data to an external database and do your aggregation there.
I am thinking of building a chat app with AWS DynamoDB. The app will support 1:1 and group chats.
I want to create one table for each one of the chats, where there is a record for each sent chat text line. Is DynamoDB suitable for this kind of job?
I am also thinking of merging both tables. But is this a good idea, if there are – let's assume – 100k or 1000k users?
I think you may run into problems with the read capacity on your table. The write capacity should be ok, as there are not so many messages coming in per second (e.g. 10 or so), but you'll need to constantly read from it for all users, so that'll be expensive.
If you want to use DynamoDB just as storage and distribute the chat messages like in any normal chat over the network, then it may make sense, depending on your use cases. You could, assuming you have a hash key UserId and Timestamp, query all messages from a specific user during a specific period as a result. If you want, however, search within the chat text (a much more useful feature, probably), then DynamoDB won't work per se. It's not like SQL, where you could do a LIKE '%abc%' query (which isn't a good idea in SQL either).
Probably you're better off using S3 as data storage and ElasticSearch as search instrument. If you require the aforementioned use case "get all messages from user X in timespan S" (as a simple example) you could additionally use DynamoDB to store metadata, such as UserId, Timestamp, PositionInFile or something like that.