How to keep track of visitors/auth users/ip address activities

We need to start keeping track of user activity across the website, so that a user who behaves inappropriately can be flagged. First of all, how can I detect that a single IP address is accessing the website from different browsers? Can this be done with cookies? Can it be done by tracking the user agents seen for each IP address? I am thinking of something like storing the data in a file or database table, analyzing it every 2-3 days, and deciding which accounts to delete.
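The user-agent idea in the question can be sketched roughly as follows. This is only an illustration: the in-memory `agents_by_ip` store and both function names are hypothetical, and a real site would persist the data in a file or database table as the question suggests.

```python
from collections import defaultdict

# Hypothetical in-memory store; a real site would use a database table.
agents_by_ip = defaultdict(set)

def record_request(ip, user_agent):
    """Remember which User-Agent strings have been seen for an IP address."""
    agents_by_ip[ip].add(user_agent)

def ips_with_multiple_agents(min_agents=2):
    """Return IPs seen with at least `min_agents` distinct User-Agent strings."""
    return sorted(
        ip for ip, agents in agents_by_ip.items() if len(agents) >= min_agents
    )

# Example traffic (addresses are from the documentation-only ranges).
record_request("203.0.113.7", "Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0")
record_request("203.0.113.7", "Mozilla/5.0 (Windows NT 10.0) Chrome/120.0")
record_request("198.51.100.4", "Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0")

flagged = ips_with_multiple_agents()
```

Several people behind one router can share an IP address, so a result like this is a signal for review rather than proof that one person is using several browsers.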

Related

What is the best practice to write an API for an action that affects multiple tables?

Consider the example use case as below.
You need to invite a Company as your connection. The sub-actions that need to happen in this situation are:
A Company needs to be created by adding an entry to the Company table.
A User account needs to be created for the staff member to login by creating an entry in the User table.
A Staff object is created to ensure that the User has access to the Company by creating an entry in the Staff table.
The invited company is related to the inviting company, so a relation similar to friendship is created to connect the two companies by creating an entry in the Connection table.
An Invitation object is created to store the information as to who invited whom onto the system, along with other information such as invitation time, invite message, etc. For this, an entry is created in the Invitation table.
An email needs to be sent to the user so they can accept the invitation and join by setting a password.
As you can see, entries are to be made in 5 tables.
Is it good practice to do all this in a single API call?
If not, what are the other options?
How do I maintain data integrity if it is to be split into multiple APIs?

If the actions need to be atomic, then it's definitely best to do this in a single API call. Otherwise, you run the risk of someone not completing all the tasks required and leaving the resources in a potentially conflicting state.
That said, you're not updating a single resource, so this isn't a good fit for a single RESTful resource creation call (e.g., POST /companyInvitations) -- as all these other things being created and stitched together might lead to quite a bit of confusion.
If the action you're doing is "inviting a Company", then one option is to use Google's "custom method" syntax (POST /resources/1234:action) as defined in AIP-136. In this case, you might do POST /companies/1234:invite which says "I want to invite Company #1234 to be my connection".
Under the hood, this might atomically upsert (create if resources don't already exist) all the right things that you've listed out.
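As a rough illustration of what "atomically" could mean behind such an endpoint, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names, and the `invite_company` function itself, are assumptions made up for this example and are not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema, simplified to the columns the sketch needs.
conn.executescript("""
    CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE user_account (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE);
    CREATE TABLE staff (user_id INTEGER NOT NULL, company_id INTEGER NOT NULL);
    CREATE TABLE connection (from_company_id INTEGER NOT NULL, to_company_id INTEGER NOT NULL);
    CREATE TABLE invitation (from_company_id INTEGER NOT NULL, to_company_id INTEGER NOT NULL, message TEXT);
    INSERT INTO company (name) VALUES ('Inviter Ltd');
""")

def invite_company(conn, inviter_company_id, company_name, staff_email, message):
    """Create all invitation-related rows in one transaction (all or nothing)."""
    with conn:  # commits on success, rolls back if any statement raises
        cur = conn.cursor()
        cur.execute("INSERT INTO company (name) VALUES (?)", (company_name,))
        company_id = cur.lastrowid
        cur.execute("INSERT INTO user_account (email) VALUES (?)", (staff_email,))
        user_id = cur.lastrowid
        cur.execute(
            "INSERT INTO staff (user_id, company_id) VALUES (?, ?)",
            (user_id, company_id),
        )
        cur.execute(
            "INSERT INTO connection (from_company_id, to_company_id) VALUES (?, ?)",
            (inviter_company_id, company_id),
        )
        cur.execute(
            "INSERT INTO invitation (from_company_id, to_company_id, message) VALUES (?, ?, ?)",
            (inviter_company_id, company_id, message),
        )
    return company_id

new_company_id = invite_company(conn, 1, "Acme", "staff@example.com", "Join us")

# A second invite that reuses the email violates the UNIQUE constraint,
# so the company row inserted just before it is rolled back as well.
try:
    invite_company(conn, 1, "Other Co", "staff@example.com", "Hello again")
except sqlite3.IntegrityError:
    pass

company_count = conn.execute("SELECT COUNT(*) FROM company").fetchone()[0]
invitation_count = conn.execute("SELECT COUNT(*) FROM invitation").fetchone()[0]
```

Sending the email is left out of the sketch; it would normally happen only after the transaction has committed.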

Something to consider when a single API call triggers multiple actions is how long those downstream actions take. Keeping the API call blocked while things are processing in the background isn't the best idea.
Depending on your use case, you could accept the API request, immediately respond with a success status (202 Accepted is the conventional choice for work that has been queued but not yet done), and drop the request onto an internal queue for processing. When your background service picks up the request, it can update whatever needs to be updated and manage the transactions appropriately. This also caters for horizontal scaling scenarios where lots of "worker" services can be deployed to process the requests.
As part of this, you could consider adding a "status" endpoint that clients can call to find out how things are going. To avoid lots of polling status requests, you could also take in callback details as part of the original API call, to be called when the background processing is complete. Or you could do both!
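The accept, queue, and poll flow described above can be sketched with the standard library alone. Everything here is a stand-in: `submit` plays the role of the API handler, `get_status` the role of the status endpoint, and the in-process queue and dictionary would be a real message queue and a persistent store in practice.

```python
import queue
import threading
import uuid

jobs = queue.Queue()
status = {}  # job_id -> "queued" | "done"; hypothetical in-memory status store

def submit(payload):
    """Accept a request: record it, enqueue it, and return a job id at once."""
    job_id = str(uuid.uuid4())
    status[job_id] = "queued"
    jobs.put((job_id, payload))
    return job_id

def get_status(job_id):
    """What a status endpoint would return for a given job id."""
    return status.get(job_id, "unknown")

def worker():
    """Background worker: take jobs off the queue and process them."""
    while True:
        job_id, payload = jobs.get()
        # ... the slow multi-table work would happen here ...
        status[job_id] = "done"
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

job_id = submit({"company": "Acme"})
jobs.join()  # only for this demo: wait until the worker has finished
final_status = get_status(job_id)
```

More worker threads (or separate worker processes reading from a shared queue) can be added without changing `submit`, which is the horizontal scaling point made above.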

Is IP address retrieval authorized in terms of users' privacy?

I'm new to developing large-scale web services, and I'd like to collect visitors' IP addresses to produce some statistics about their country/state of origin.
Is it allowed to collect IP addresses from clients for internal use?
As this is a kind of personal information, I wonder whether retrieving it is legal.

It's not possible for you not to know the client IP (because your site couldn't work without it), but you don't have to keep it. From a GDPR perspective, data is only "personal data" if it can be linked to an individual (even indirectly), so for example you could take the client IP, do some kind of GeoIP lookup on it (preferably local), and then increment a country counter. Then you can simply discard the IP, and the aggregate data you retain has no way of being connected back to an individual, so it's not personal data.
A very simple approach would be a table like this:
Country    Count
France     2
Germany    4
USA        10
So you would just bump the count for the country each time. This gives you the data you're after, but without any privacy impact for your users, and no GDPR exposure.
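A minimal sketch of that counter, assuming a local lookup: the `PREFIX_TO_COUNTRY` table below is a made-up stand-in for a real GeoIP database (the prefixes are documentation-only ranges and the country assignments are purely illustrative), and the function names are hypothetical.

```python
import ipaddress
from collections import Counter

# Hypothetical stand-in for a local GeoIP database.
PREFIX_TO_COUNTRY = {
    ipaddress.ip_network("192.0.2.0/24"): "France",
    ipaddress.ip_network("198.51.100.0/24"): "Germany",
    ipaddress.ip_network("203.0.113.0/24"): "USA",
}

country_counts = Counter()

def lookup_country(ip):
    """Map an IP address to a country using the local table."""
    addr = ipaddress.ip_address(ip)
    for network, country in PREFIX_TO_COUNTRY.items():
        if addr in network:
            return country
    return "Unknown"

def record_visit(ip):
    """Bump the counter for the visitor's country; the IP itself is not stored."""
    country_counts[lookup_country(ip)] += 1

for ip in ["192.0.2.10", "198.51.100.7", "198.51.100.8", "203.0.113.1"]:
    record_visit(ip)
```

Only `country_counts` is retained after each call, which matches the aggregate-only approach described above.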

What's the best practice to implement "read receipts" on group chats in AWS AppSync and Amplify?

I'm building an Angular 11 web app using AppSync for the backend.
I've mentioned group chat, but what I actually have is an announcement feature: a person creates announcements for a specific audience (individual members or groups of members). Whenever a receiving user opens an announcement, it has to be marked as read for that user in their UI, and the sender has to be told that it has been opened by that particular member.
I have an idea for implementing this:
Each announcement has a "seenBy" attribute which collects the user IDs of those who open it.
Each member also has an attribute in their user object named "announcementsRead", which is an array of IDs of the announcements they have opened.
In the UI, when I'm gathering the list of announcements for the user, the ones whose ID doesn't appear in the member's own announcementsRead array will be marked as unread.
When they click on an announcement and it is opened, I make 2 updates: a) on the announcement object, I push the member's user ID to the "seenBy" attribute and write it to the DB; b) on the member's user object, I add the announcement's ID to the "announcementsRead" attribute and write it to the DB.
This is just something that I came up with.
Please let me know if there are any pitfalls to this approach. Or if there are simpler ways to achieve this functionality.
I have a few concerns as well:
Let's say that two users open an announcement at the same time, and each client tries to update the announcement with a seenBy that includes its own user's ID. What happens when the two requests from the two clients run concurrently? It's possible that the first user fetches the object and the second user fetches it immediately afterwards, and that by the time the second user has updated the attribute and sent it back to the DB, the first user has already written their updated data. In that case the second user's write will overwrite the first user's change. I am not sure of the internal mechanisms of the Amplify DataStore, but I can imagine this happening. Is this possible? If so, how do we prevent it?
Is it really necessary for me to maintain the "announcementsRead" attribute in the user? I mean I can imagine generating that list in the UI every time I get the list of announcements by checking if the current user's ID exists in the announcement's "seenBy" and maintaining that list in the UI, that way we can eliminate redundancy of info in the DB and also it would make sense to not accumulate extremely old announcement IDs that may have been deleted. But I'm wondering if having this on the member actually helps in an indispensable way.
Hope my questions are clear.
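The lost-update scenario in the first concern can be reproduced with a small simulation. This is plain Python with a dictionary standing in for the stored announcement; it does not use Amplify or AppSync, and the function names are hypothetical.

```python
# A dictionary stands in for the stored announcement record.
announcement = {"seenBy": []}

def read_modify_write(snapshot, user_id):
    """Client-side update: overwrite seenBy from a previously fetched copy."""
    announcement["seenBy"] = snapshot + [user_id]

# Both clients fetch before either one writes.
snapshot_a = list(announcement["seenBy"])
snapshot_b = list(announcement["seenBy"])
read_modify_write(snapshot_a, "user-a")
read_modify_write(snapshot_b, "user-b")  # overwrites user-a's change
lost_update_result = list(announcement["seenBy"])

# Alternative: the store applies "add this ID to the set" itself, so no
# client ever writes back a stale copy of the whole attribute.
announcement_with_set = {"seenBy": set()}

def add_viewer(user_id):
    """Server-side style update: add one element instead of replacing the list."""
    announcement_with_set["seenBy"].add(user_id)

add_viewer("user-a")
add_viewer("user-b")
set_add_result = sorted(announcement_with_set["seenBy"])
```

In DynamoDB itself, an update expression using ADD on a set attribute behaves like the second variant. How that would be wired through AppSync or the Amplify DataStore is an open point for this setup.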

DynamoDB - Reducing number of queries

After my users log in, the app makes too many requests to DynamoDB, and I am thinking about different ways to reduce the number of calls.
The app allows users to trigger certain alerts that get sent to other users. For instance: "Shipment received, come to the deck", "Shipment completed", etc.
These are the calls made:
Get company's software license expiration date.
Get the computer's location in the building (e.g. "Office A").
Get the kinds of alerts that can be triggered (e.g. "Shipment received, come to the deck", "Shipment completed", etc.).
Get information about the user (e.g. the company teams the user belongs to, and the admin level the user has, which can be 0, 1, 2, or 3).
Potential solutions I have thought about:
Put the company's license expiration date as an attribute of each computer (This would reduce the number of queries by 1). However, if I need to update the company's license expiration date, then I need to update it for EVERY SINGLE computer I have in the system, which sounds impractical to me since I may have 200, 300 or perhaps even more computers in the database.
Add the company's license expiration date as an attribute of the alerts (This would reduce the number of queries by 1); which seems more reasonable because there are only about 15 different kinds of alerts, so if I need to change the license expiration date later on, it is not too bad.
Cache information on the user's device; however, I can't seem to find a good strategy to keep the information stored locally as updated as possible.
I still think these 3 options do not sound too good, so I am hoping someone can point me in the right direction. Is there a good way to reduce the number of calls? I am retrieving information about 4 different entities (license, computer, alert, user), should I leave those 4 calls after users log in?

Here are a few things that can be done for each component.
Get information about the user
Keep it in a session store, and update the store whenever the details change. Session stores are usually implemented using a cache such as Redis.
Computer location
Keep it in a distributed cache such as Redis and initialise it lazily. Whenever a write happens to a computer's location (rare, in my opinion), remove the entry from Redis using DynamoDB Streams and AWS Lambda.
Kinds of alerts
Same as computer location.
License expiration date
If possible, don't allow the license expiry date to change (issue a new license in those cases, so that traceability is maintained) and cache the license expiry forever. Otherwise, treat it the same as computer location.
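The lazy-initialise-then-invalidate pattern suggested for computer location might look roughly like this. Plain dictionaries stand in for Redis and for the DynamoDB table, and all names are hypothetical. In the setup described above, `invalidate` would be called from a Lambda function triggered by the table's stream.

```python
cache = {}                             # stands in for Redis
DATABASE = {"computer-1": "Office A"}  # stands in for the DynamoDB table
load_count = 0                         # counts simulated DynamoDB reads

def load_location_from_db(computer_id):
    """Simulated DynamoDB read."""
    global load_count
    load_count += 1
    return DATABASE[computer_id]

def get_location(computer_id):
    """Lazy initialisation: hit the database only on a cache miss."""
    if computer_id not in cache:
        cache[computer_id] = load_location_from_db(computer_id)
    return cache[computer_id]

def invalidate(computer_id):
    """What the stream-triggered handler would do after a write."""
    cache.pop(computer_id, None)

first = get_location("computer-1")    # miss: reads the database
second = get_location("computer-1")   # hit: served from the cache
loads_before_update = load_count

DATABASE["computer-1"] = "Office B"   # a (rare) write to the table
invalidate("computer-1")
third = get_location("computer-1")    # miss again: picks up the new value
```

The same shape would apply to the kinds of alerts, with a different key.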

Persistent data with finer granularity than a session in django

Suppose that a staff member using a web site can exchange tickets for a customer. It is convenient to store data about the multi-view exchange in the session. But more than one exchange might be going on at the same time.
One way to keep track of the separate data in the session is to create a sub-session key and use that to access the session data. This key would need to be part of the view as a hidden input or it would need to be in the URL. This all gets pretty messy and the hidden variable method isn't great since redirects might occur during the exchange.
Is there a clean way to do this?

Use a database table that tracks information for a particular exchange and read/write from it when opening/submitting your wizard pages. Sessions are much more volatile by nature.
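A minimal sketch of such a table, using sqlite3 and JSON so that it runs on its own. In a Django project this would more naturally be a model; the table layout and function names here are assumptions for illustration. The exchange ID would still need to travel with each request, for example in the URL.

```python
import json
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE exchange (id TEXT PRIMARY KEY, data TEXT NOT NULL)")

def start_exchange():
    """Create a row for a new exchange and return its ID."""
    exchange_id = str(uuid.uuid4())
    with conn:
        conn.execute(
            "INSERT INTO exchange (id, data) VALUES (?, ?)", (exchange_id, "{}")
        )
    return exchange_id

def load_exchange(exchange_id):
    """Read the data collected so far for one exchange."""
    row = conn.execute(
        "SELECT data FROM exchange WHERE id = ?", (exchange_id,)
    ).fetchone()
    return json.loads(row[0])

def save_step(exchange_id, **fields):
    """Merge the fields submitted by one wizard page into the stored data."""
    data = load_exchange(exchange_id)
    data.update(fields)
    with conn:
        conn.execute(
            "UPDATE exchange SET data = ? WHERE id = ?",
            (json.dumps(data), exchange_id),
        )

# Two exchanges in progress at the same time, kept apart by their IDs.
first_id = start_exchange()
second_id = start_exchange()
save_step(first_id, customer="Alice", old_ticket="A12")
save_step(second_id, customer="Bob")
save_step(first_id, new_ticket="B07")

first_data = load_exchange(first_id)
second_data = load_exchange(second_id)
```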
