What's the best practice to implement "read receipts" on group chats in AWS AppSync and Amplify? - concurrency

I'm building an Angular 11 web app using AppSync for the backend.
I've mentioned group chat, but what I actually have is an announcement feature: one person creates announcements for a specific audience (individual members or groups of members). Whenever a receiving user opens an announcement, the app has to mark it as read for that user in their UI, and also let the sender know that it has been opened by that particular member.
I have an idea for implementing this:
1. Each announcement has a "seenBy" attribute that aggregates the user IDs of everyone who opens it.
2. Each member also has an attribute on their user object named "announcementsRead", which is an array of the IDs of the announcements they have opened.
3. In the UI, when I gather the list of announcements for the user, the ones whose ID is not in the member's own announcementsRead array are marked as unread.
4. When the user clicks on an announcement and it opens, I make two updates: (a) on the announcement object, I push the member's user ID onto the "seenBy" attribute and write it to the DB; (b) on the member's user object, I add the announcement's ID to the "announcementsRead" attribute and write that to the DB.
This is just something that I came up with.
Please let me know if there are any pitfalls to this approach, or if there is a simpler way to achieve this functionality.
I have a few concerns as well:
Say two users open an announcement at the same time, and both clients try to update it with a seenBy that now contains their own user ID. What happens when the two requests hit the backend concurrently? The first user fetches the object, the second fetches it immediately after, and by the time the second user writes the updated attribute back to the DB, the first user's write has already landed. The second user's write then overwrites the first one's change, and that user ID is silently lost. I'm not sure of the internal mechanisms of the Amplify DataStore, but I can imagine this classic read-modify-write race happening. Is this possible? If so, how do we ensure that it is prevented?
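For illustration: if the announcements ultimately live in a DynamoDB table (which is what AppSync typically fronts), one way to sidestep the race is to make the write itself atomic instead of doing a read-modify-write. A rough sketch with boto3 (Python purely for illustration; the table name and key schema here are my own assumptions):

import boto3

dynamodb = boto3.client("dynamodb")

def mark_seen(announcement_id, user_id):
    # ADD on a string-set attribute is applied atomically on the server:
    # concurrent callers each append their own ID and no write is lost.
    dynamodb.update_item(
        TableName="Announcements",  # hypothetical table name
        Key={"id": {"S": announcement_id}},
        UpdateExpression="ADD seenBy :uid",
        ExpressionAttributeValues={":uid": {"SS": [user_id]}},
    )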
Is it really necessary for me to maintain the "announcementsRead" attribute on the user? I could generate that list in the UI every time I fetch the announcements, by checking whether the current user's ID exists in each announcement's "seenBy". That would eliminate the redundancy in the DB, and it would also avoid accumulating IDs of extremely old announcements that may have been deleted. But I'm wondering whether having this attribute on the member actually helps in an indispensable way.
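The client-side derivation I have in mind would be something like this (a sketch in Python purely for illustration; the app itself is Angular):

# Sketch: derive the unread list from each announcement's "seenBy"
# instead of maintaining "announcementsRead" on the user object.
def unread_announcements(announcements, user_id):
    return [a for a in announcements if user_id not in a.get("seenBy", [])]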
Hope my questions are clear.

Related

What is the best practice to write an API for an action that affects multiple tables?

Consider the example use case below.
You need to invite a Company as your connection. The sub-actions that need to happen in this situation are:
A Company need to be created by adding an entry to the Company table.
A User account needs to be created for the staff member to login by creating an entry in the User table.
A Staff object is created to ensure that the User has access to the Company by creating an entry in the Staff table.
The invited company is related to the inviting company, so a relation similar to friendship is created to connect the two companies by creating an entry in the Connection table.
An Invitation object is created to store the information as to who invited whom onto the system, along with other information like the invitation time, invite message, etc. For this, an entry is created in the Invitation table.
An email needs to be sent to the user to accept invitation and join by setting password.
As you can see, entries are to be made in 5 Tables.
Is it a good practice to do all this in a single API call?
If not, what are the other options?
How do I maintain data integrity if it is to be split into multiple APIs?
If the actions need to be atomic, then it's definitely best to do this in a single API call. Otherwise, you run the risk of someone not completing all the tasks required and leaving the resources in a potentially conflicting state.
That said, you're not updating a single resource, so this isn't a good fit for a single RESTful resource creation call (e.g., POST /companyInvitations) -- as all these other things being created and stitched together might lead to quite a bit of confusion.
If the action you're doing is "inviting a Company", then one option is to use Google's "custom method" syntax (POST /resources/1234:action) as defined in AIP-136. In this case, you might do POST /companies/1234:invite which says "I want to invite Company #1234 to be my connection".
Under the hood, this might atomically upsert (create if resources don't already exist) all the right things that you've listed out.
Something to consider when designing an API call that triggers multiple downstream actions is how long those actions take. Leaving the API call blocked while things are processing in the background isn't the best idea in the world.
You could consider (depending on your use case) accepting the API request, responding immediately with a 202 Accepted (or plain 200) status, and dropping the request onto an internal queue for processing. When your background service picks up the request, it can update whatever needs to be updated and manage the transactions appropriately. This also caters for horizontal-scaling scenarios where lots of "worker" services can be deployed to process the requests.
As part of this you could consider adding another "status" endpoint where requests can be made to find out how things are going. To avoid lots of polling status requests, you could also take in callback details as part of the original API call, to be called when the background processing is complete. Or you could do both!
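A rough sketch of this accept-then-queue pattern, assuming Flask and an in-process queue for brevity (a real deployment would use a durable broker such as SQS or RabbitMQ so queued work survives restarts; every route and name here is illustrative, not a prescribed API):

import queue
import threading
import uuid

from flask import Flask, jsonify, request

app = Flask(__name__)
jobs = {}               # job_id -> status, read by the status endpoint
work = queue.Queue()

@app.post("/companies/<company_id>:invite")
def invite(company_id):
    job_id = str(uuid.uuid4())
    jobs[job_id] = "pending"
    work.put((job_id, company_id, request.get_json()))
    # Respond immediately; the five table writes happen in the worker.
    return jsonify(job_id=job_id), 202

@app.get("/invitations/<job_id>/status")
def status(job_id):
    return jsonify(status=jobs.get(job_id, "unknown"))

def worker():
    while True:
        job_id, company_id, payload = work.get()
        # Create the Company, User, Staff, Connection and Invitation
        # entries in one database transaction here, then send the email.
        jobs[job_id] = "done"

threading.Thread(target=worker, daemon=True).start()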

Handling multiple users concurrently populating a PostgreSQL database

I'm currently trying to build a web app that would allow many users to query an external API (for various reasons, I cannot retrieve all the data served by this API at regular intervals to populate my PostgreSQL database). I've read several things about ACID and MVCC, but I'm still not sure there won't be any problems if several users are populating/reading my PostgreSQL database at the very same time. So I'm asking for advice here (I'm very new to this field)!
Let's say my users query the external API to retrieve articles. They make their search via a form, the back end receives it, queries the API, populates the database, then queries the database to return some data to the front end.
Would it be okay to simply create a unique table to store the articles returned by the API when users are querying it ?
Shall I rather store the articles returned by the API and associate each of them to the user that requested it (the Article model will contain a foreign key mapping to a User model)?
Or shall I give each user a table (data isolation would be good but that sounds very inefficient)?
Thanks for your help!
Would it be okay to simply create a unique table to store the articles returned by the API when users are querying it ?
Yes. If the articles have unique keys (doi?) you could use INSERT...ON CONFLICT DO NOTHING to handle the (presumably very rare) case that an article is requested by two people nearly simultaneously.
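A minimal sketch of that with psycopg2, assuming a single articles table with a unique key on doi (all names and the connection string here are illustrative):

import psycopg2

conn = psycopg2.connect("dbname=mydb")

def store_article(doi, title, body):
    # If two users trigger an insert of the same article almost
    # simultaneously, the second INSERT quietly does nothing
    # instead of raising a duplicate-key error.
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO articles (doi, title, body) VALUES (%s, %s, %s) "
            "ON CONFLICT (doi) DO NOTHING",
            (doi, title, body),
        )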
Shall I rather store the articles returned by the API and associate each of them to the user that requested it (the Article model will contain a foreign key mapping to a User model)?
Do you want to? Is there a reason to? Do you care who requested each article? It sounds like you anticipate storing only the first person to request each article, not every request?
Or shall I give each user a table (data isolation would be good but that sounds very inefficient)?
Right, you would be hitting the API a lot more often (assuming some large fraction of articles are requested more than once) and storing a lot of duplicates. It might not even solve the problem, if one person hits "submit" twice in a row, or has multiple tabs open, or writes a bot to hit your service in parallel.

Real Time Google Analytics API - Identify user session

I'm retrieving event data using the Real Time Google Analytics API, so as to trigger responses each time conditions are met while the user navigates.
This is my actual query against the Google Analytics Real Time API (and it works perfectly!):
return service.data().realtime().get(
    ids='ga:' + profile_id,
    metrics='rt:totalEvents',
    dimensions='rt:eventAction,rt:eventLabel,rt:eventCategory',
    max_results='25').execute()
I'd like to show results grouped by each particular session or user, so that I can trigger a message to that particular user if some conditions are met.
Is that possible? And if so, how do I apply this criterion to the query?
"Trigger a message to a particular user" would imply that you either have personally identifiable data stored in GA, which would violate Googles TOS, or that you map an anonymous ID (clientid or UserID or similar) to a key stored in an external database (which might be legally murky, depending on your legislation). Since I don't want to throw away the answer I have written before reading your question to the end :-) I am going to assume the latter.
So, is that possible? No, not really. By default GA does not identify neither an identifier for the user (client id or user id) nor for the session (a session identifier is present only in the BigQuery export schema).
The Real Time API has a very limited set of dimensions (mostly, I think, because data aggregation does not happen in real time), so you can't even use custom dimensions. Your only chance would be to overwrite one of the standard fields, e.g. campaign information.
Of course this destroys the original data in that field, so you should use an extra view for the API query: send a custom dimension with the user identifier along, then use an advanced filter to copy the custom dimension's value to a standard field (while your original data stays safe in your other data views). This is a bit hackish, though.
Also, the Real Time API only displays the current hit per user, so you cannot group by user in the query in any case - you'd need to download the data, store it in an external database, and do your aggregation there.

Access list data as a group

We have a company program designed to help us get control over data. It has a feature to group all the applications of one Client: if I want to take a look at them, I click on the Client and I see a list of all the applications made for him.
I was wondering if Microsoft Access can do the same? If yes where should I start looking?
I did some searching on the internet and found no solution.
That is built in, and it is called a Subdatasheet. If you have relationships properly set between Clients and Orders, for instance, then when you open the Clients table you will see a small "+" allowing you to view the Orders of the current client. You may have to set the Subdatasheet Name property of the Clients table to "Orders" in this case.
If you want to work with forms, you can build a continuous form for Clients, then one for Orders, and then insert the Orders subform into the footer of the Clients form. Access might tell you that you can't do this; just ignore it, it works.
In Access that would simply be a continuous form with a filter, typically opened from a list of clients and filtered to the applications of the selected client.
Unless I'm misunderstanding the question.

Marking users as new when created via a backend's authenticate in Django

I have an authentication backend based off a legacy database. When someone logs in using that database and there isn't a corresponding User record, I create one. What I'm wondering is if there is some way to alert the Django system to this fact, so that for example I can redirect the brand-new user to a different page.
The only thing I can think of is adding a flag to the user's profile record, called something like is_new, which is tested once and then set to False as soon as they're redirected.
Basically, I'm wondering if someone else has figured this out so I don't have to reinvent the wheel.
I found the easiest way to accomplish this is to do exactly as you've said. I had a similar requirement on one of my projects: we needed to show a "Don't forget to update your profile" message to any new member until they had visited their profile for the first time. The client wanted it quickly, so we added a 'visited_profile' field to the user's profile and defaulted it to False.
We settled on this because it was super fast to implement, didn't require tinkering with the registration process, worked with existing users, and didn't add extra queries on every page load (since the user and user profile are retrieved on every page already). It took us all of 10 minutes to add the field, run the South migration, and put an if tag into the template. Roughly, it looked like the sketch below.
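A rough reconstruction of that setup (the visited_profile name is from this answer; the rest is illustrative and written against a current Django so it runs):

from django.contrib.auth.models import User
from django.db import models

class Profile(models.Model):
    user = models.OneToOneField(User, on_delete=models.CASCADE)
    visited_profile = models.BooleanField(default=False)

def profile_view(request):
    # The template's if tag shows the reminder only while
    # visited_profile is still False.
    profile = request.user.profile
    if not profile.visited_profile:
        profile.visited_profile = True  # first visit: stop the reminder
        profile.save(update_fields=["visited_profile"])
    ...  # render the profile page as usual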
There are two methods that I know of to determine whether an object has been created:
1) When using get_or_create, a tuple is returned of the form (obj, created), where created is a boolean indicating, obviously enough, whether the object was created or not.
2) The post_save signal passes a created parameter, also a boolean, likewise indicating whether the object was created or not.
At the simplest level, you can use either of these two hooks to set a session var, which you can then check and redirect accordingly.
If you can get by with it, you could also redirect directly, either after calling get_or_create or in the post_save signal. Both hooks are sketched below.
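A minimal sketch of both hooks (where the "new user" flag ends up - session, profile field, etc. - is up to you and only hinted at here):

from django.contrib.auth.models import User
from django.db.models.signals import post_save
from django.dispatch import receiver

# 1) get_or_create, e.g. inside the authentication backend:
def get_user_for_legacy_login(username):
    user, created = User.objects.get_or_create(username=username)
    if created:
        # Brand-new user: record that fact somewhere the login view
        # can check, so it can redirect to a different page.
        pass
    return user

# 2) the post_save signal, fired whenever a User row is saved:
@receiver(post_save, sender=User)
def mark_new_user(sender, instance, created, **kwargs):
    if created:
        # Same idea: flag the user as new here.
        pass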
You can use a file-based cache to store the users that aren't yet saved to the database. When the user logs in for the second time, you can look in the cache, find the user object, and save it to the database for good.
Here's some info on django caching: http://docs.djangoproject.com/en/dev/topics/cache/?from=olddocs
PS: don't use Memcached for this, because it will lose everything it holds if the machine crashes or shuts down.