Real Time Google Analytics API - Identify user session - python-2.7

I'm retrieving event data using the Real Time Google Analytics API, so as to trigger responses each time certain conditions are met while the user navigates.
This is my current query against the Google Analytics Real Time API (which works perfectly!):
return service.data().realtime().get(
    ids='ga:' + profile_id,
    metrics='rt:totalEvents',
    dimensions='rt:eventAction,rt:eventLabel,rt:eventCategory',
    max_results='25').execute()
I'd like to show the results grouped by each particular session or user, so that I can trigger a message to that particular user if certain conditions are met.
Is that possible? And if so, how do I apply this criterion to the query?

"Trigger a message to a particular user" would imply that you either have personally identifiable data stored in GA, which would violate Googles TOS, or that you map an anonymous ID (clientid or UserID or similar) to a key stored in an external database (which might be legally murky, depending on your legislation). Since I don't want to throw away the answer I have written before reading your question to the end :-) I am going to assume the latter.
So, is that possible? No, not really. By default GA exposes neither an identifier for the user (client ID or user ID) nor one for the session (a session identifier is present only in the BigQuery export schema).
The realtime API has a very limited set of dimensions (mostly, I think, because data aggregation does not happen in real time), so you can't even use custom dimensions. Your only chance would be to overwrite one of the standard fields, e.g. the campaign information.
Of course this destroys the original data in that field. So you should use an extra view for the API query, send a custom dimension with the user identifier along, and then use an advanced filter to copy the custom dimension value into a standard field (while your original data stays safe in your other data views). This is a bit hackish, though.
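As a rough illustration of what the query might then look like (assuming the filter copies the user identifier into the campaign field; the function wrapper and names are just for readability, not part of your code):

# Sketch only: assumes an advanced view filter copies a user-identifier
# custom dimension into the campaign field, so rt:campaign carries the user ID.
def get_realtime_events_by_user(service, profile_id):
    return service.data().realtime().get(
        ids='ga:' + profile_id,
        metrics='rt:totalEvents',
        dimensions='rt:campaign,rt:eventAction,rt:eventLabel,rt:eventCategory',
        max_results='25').execute()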
Also, the realtime API only displays the current hit per user, so you cannot group by user in the query in any case - you'd need to download the data, store it in an external database and do your aggregation there.
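A minimal sketch of that external-aggregation idea, assuming the modified query above and a local SQLite store (the table layout is made up):

# Rough sketch: persist each poll of the realtime API locally so per-user
# aggregation can happen outside of GA.
import sqlite3
import time

conn = sqlite3.connect('realtime_events.db')
conn.execute("""CREATE TABLE IF NOT EXISTS events
                (campaign TEXT, action TEXT, label TEXT,
                 category TEXT, total_events TEXT, fetched_at REAL)""")

def store_snapshot(response):
    # 'response' is the dict returned by the realtime query's execute() call.
    for row in response.get('rows', []):
        conn.execute("INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)",
                     list(row) + [time.time()])
    conn.commit()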

Related

What's the best practice to implement "read receipts" on group chats in AWS AppSync and Amplify?

I'm building an Angular 11 web app using AppSync for the backend.
I've mentioned group chat, but basically I have an announcement feature in my app where a person creates announcements for a specific audience (individual members or groups of members). Whenever a receiving user opens an announcement, it has to be marked as read for that user in their UI, and the sender also has to be notified that it has been opened by that particular member.
I have an idea for implementing this:
Each announcement has a "seenBy" attribute which aggregates the user IDs of the users who open it.
Each member also has an attribute in their user object named "announcementsRead", which is an array of IDs of the announcements they have opened.
In the UI, when I gather the list of announcements for the user, the ones whose IDs are not in the member's own announcementsRead array will be marked as unread.
When they click on an announcement and it is opened, I make two updates: a) on the announcement object, I push the member's user ID onto the "seenBy" attribute and save it to the DB; b) on the member's user object, I add the announcement's ID to the "announcementsRead" attribute and save it to the DB.
This is just something that I came up with.
Please let me know if there are any pitfalls to this approach. Or if there are simpler ways to achieve this functionality.
I have a few concerns as well:
Let's say two users open an announcement at the same time, and both clients try to update the announcement with an updated seenBy containing their own user ID. What happens when the two requests from the two clients run concurrently? It's possible that the first user fetches the object, the second user fetches it immediately afterwards, and by the time the second user has updated the attribute and written it back to the DB, the first user has already written their updated data. In that case the second user's write to the DB will overwrite the first user's change. I am not sure of the internal mechanisms of the Amplify DataStore, but I can imagine this happening. Is this possible? If so, how do we ensure that it is prevented?
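For reference, this is the kind of atomic update I'm hoping to end up with (a boto3 sketch with made-up table and attribute names; presumably the same ADD expression could live in an AppSync resolver instead):

import boto3

table = boto3.resource('dynamodb').Table('Announcement')  # made-up table name

def mark_seen(announcement_id, user_id):
    # ADD on a string set is applied atomically on the server side, so two
    # concurrent callers each add their own ID and neither overwrites the other.
    table.update_item(
        Key={'id': announcement_id},
        UpdateExpression='ADD seenBy :u',
        ExpressionAttributeValues={':u': {user_id}},  # Python set -> DynamoDB string set
    )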
Is it really necessary for me to maintain the "announcementsRead" attribute on the user? I could generate that list in the UI every time I get the list of announcements, by checking whether the current user's ID exists in each announcement's "seenBy", and maintain the list in the UI only. That way we eliminate redundant info in the DB and also avoid accumulating IDs of extremely old announcements that may have been deleted. But I'm wondering if having this on the member actually helps in an indispensable way.
Hope my questions are clear.

Handling multiple users concurrently populating a PostgreSQL database

I'm currently trying to build a web app that would allow many users to query an external API (for various reasons, I cannot retrieve all the data served by this API at regular intervals to populate my PostgreSQL database). I've read several things about ACID and MVCC, but I'm still not sure there won't be any problems if several users are populating/reading my PostgreSQL database at the very same time. So I'm asking for advice (I'm very new to this field)!
Let's say my users query the external API to retrieve articles. They make their search via a form, the back end gets it, queries the API, populates the database, then queries the database to return some data to the front end.
Would it be okay to simply create a single table to store the articles returned by the API when users query it?
Shall I rather store the articles returned by the API and associate each of them with the user that requested it (the Article model would contain a foreign key to a User model)?
Or shall I give each user a table (data isolation would be good but that sounds very inefficient)?
Thanks for your help!
Would it be okay to simply create a single table to store the articles returned by the API when users query it?
Yes. If the articles have unique keys (DOI?) you could use INSERT ... ON CONFLICT DO NOTHING to handle the (presumably very rare) case where an article is requested by two people nearly simultaneously.
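A rough psycopg2 sketch of what that could look like (column names are placeholders, and it assumes a UNIQUE constraint on doi):

import psycopg2

def store_article(conn, article):
    # 'article' is a dict returned by the external API; 'doi' and 'title' are
    # placeholders for whatever unique key and fields the API actually provides.
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            INSERT INTO articles (doi, title, fetched_at)
            VALUES (%s, %s, now())
            ON CONFLICT (doi) DO NOTHING
            """,
            (article["doi"], article["title"]),
        )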
Shall I rather store the articles returned by the API and associate each of them with the user that requested it (the Article model would contain a foreign key to a User model)?
Do you want to? Is there a reason to? Do you care who requested each article? It sounds like you're anticipating storing only the first person to request each article, and not every request?
Or shall I give each user a table (data isolation would be good but that sounds very inefficient)?
Right, you would be hitting the API a lot more often (assuming some large fraction of articles are requested more than once) and storing a lot of duplicates. It might not even solve the problem, if one person hits "submit" twice in a row, or has multiple tabs open, or writes a bot to hit your service in parallel.

How to structure AWS DynamoDB Table with Cognito

I am trying to do something that would be relatively simple for a relational database but I don't know how to do it for a nonrelational one.
I am trying to make a simple task web app on AWS where people can post their tasks.
I have a table called tasks which uses the userid from the auth token provisioned by AWS Cognito. I am wondering how I can return the user information. I do not want to rely on Cognito by simply calling it every time a user sends a request, so my thought would be to create another table to store all of the user information. That, however, is not a very nonrelational way of doing things, since JOINs are so bad.
So, I was wondering if I should do any of the following
a) Using RDS instead
b) Not use Cognito and set up my own Auth system
c) Just doing the JOIN with a table containing all of the user info
d) Doing the request to Cognito each time
Although I personally like the idea of Cognito, at this time it has some major drawbacks...
You cannot back up / restore a user pool without losing the users' passwords; you also have to implement your own backup/restore.
A way around that is to save the user's password in a Cognito custom attribute.
I expected that by using an API Gateway Lambda authorizer I would have all the user data in the Lambda context, but it's not there. Or am I doing something wrong with the API Gateway template mapping? 😬
On the plus side, the API Gateway Lambda authorizer result can be cached for up to an hour, so it won't call the authorizer function again; that seems like a top feature.
It does not work well with CloudFormation: with every attribute update it recreates the user pool without restoring the users, thus losing them.
I used it in only one implementation and ended up duplicating the users in DynamoDB as well.
I've been avoiding it ever since. I wish they would solve these issues, as it looks like a service that could be included with every project, saving a lot of time.
Reading your post I asked myself the same questions, and I'm not sure of the answers either 😄
Pricing seems fair.
The default 5 requests/second to get user info seems strange, as that could be consumed by a single page load doing multiple AJAX API requests.
For this in DynamoDB, there is no need for another table. If the access patterns dictate you store the information in another object, then so be it, but more than likely it should be in the same table. Sounds like you need two different item types in the same table.
For the task items, use a PK of userId and an SK of task::your-task-id. This would allow you to get all of a user's tasks easily, or even a specific task very easily if you knew the task ID. You might even have an attribute that is a timestamp and then have a GSI with the userId as the PK and the timestamp as the SK. Then you could use the begins_with operator on the SK and "paginate through all of the user's tasks that are in the month of 2019-04".
For the user information, have the userId be the PK, the SK be user_info, and the attributes be the user's information.
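A rough boto3 sketch of that layout and the two lookups (the table name and the sort-key attribute name 'sk' are just examples):

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource('dynamodb').Table('AppTable')  # example table name

# Task item:      userId = <userId>, sk = "task::<taskId>", plus a timestamp attribute
# User-info item: userId = <userId>, sk = "user_info", plus the user's attributes

def get_user_tasks(user_id):
    # All of one user's tasks in a single query.
    return table.query(
        KeyConditionExpression=Key('userId').eq(user_id) &
                               Key('sk').begins_with('task::')
    )['Items']

def get_user_info(user_id):
    return table.get_item(Key={'userId': user_id, 'sk': 'user_info'}).get('Item')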
The one challenge with this is if you were to go to extremes and one single user is doing thousands of ops per second, e.g. "all tweets by a very popular celebrity". If you have such a use case there are ways around that as well, e.g. write sharding. These are just examples for you to play with. Without knowing all your access patterns, I cannot model everything you might want to do. I highly recommend you go watch this presentation from re:Invent 2018.

DynamoDB table/index schema design for querying multi-valued attributes

I'm building a DynamoDB app that will eventually serve a large number (millions) of users. Currently the app's item schema is simple:
{
  userId: "08074c7e0c0a4453b3c723685021d0b6", // partition key
  email: "foo@foo.com",
  ... other attributes ...
}
When a new user signs up, or if a user wants to find another user by email address, we'll need to look up users by email instead of by userId. With the current schema that's easy: just use a global secondary index with email as the Partition Key.
But we want to enable multiple email addresses per user, and the DynamoDB Query operation doesn't support a List-typed KeyConditionExpression. So I'm weighing several options to avoid an expensive Scan operation every time a user signs up or wants to find another user by email address.
Below is what I'm planning to change to enable additional emails per user. Is this a good approach? Is there a better option?
Add a sort key column (e.g. itemTypeAndIndex) to allow multiple items per userId.
{
  userId: "08074c7e0c0a4453b3c723685021d0b6", // partition key
  itemTypeAndIndex: "main", // sort key
  email: "foo@foo.com",
  ... other attributes ...
}
If the user adds a second, third, etc. email, then add a new item for each email, like this:
{
  userId: "08074c7e0c0a4453b3c723685021d0b6", // partition key
  itemTypeAndIndex: "Email-2", // sort key
  email: "bar@bar.com"
  // no more attributes
}
The same global secondary index (with email as the Partition Key) can still be used to find both primary and non-primary email addresses.
If a user wants to change their primary email address, we'd swap the email values in the "primary" and "non-primary" items. (Now that DynamoDB supports transactions, doing this will be safer than before!)
If we need to delete a user, we'd have to delete all the items for that userId. If we need to merge two users then we'd have to merge all items for that userId.
The same approach (new items with the same userId but different sort keys) could be used for other one-user-has-many-values data that needs to be Query-able.
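To make the lookup concrete, here's a sketch of the kind of query I have in mind against that GSI (the index name and boto3 usage are just illustrative):

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource('dynamodb').Table('Users')  # placeholder table name

def find_user_ids_by_email(email):
    resp = table.query(
        IndexName='email-index',  # the GSI with email as its partition key
        KeyConditionExpression=Key('email').eq(email),
    )
    # Each matching item carries the userId, whether the email came from the
    # "main" item or from one of the "Email-2", "Email-3", ... items.
    return [item['userId'] for item in resp['Items']]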
Is this a good way to do it? Is there a better way?
Justin, for searching on attributes I would strongly advise against using DynamoDB. I am not saying you can't achieve this; however, I see a few problems that will eventually come your way if you go this route.
Using a sort key for email IDs will result in duplicate records for the same user, i.e. if a user has registered 5 emails, that implies 5 records in your table with the same attributes except for the email-id attribute.
What if a new use case comes along in the future where you also want to search for a user based on some other attribute (for example a cell phone number, assuming a user may have more than one cell phone number)?
DynamoDB has a hard limit on the number of secondary indexes you can create for a table, i.e. 5.
Thus, as the search criteria grow, this solution will easily become a bottleneck for your system. As a result, your system may not scale well.
To the best of my knowledge, I can suggest a few options that you may choose from, based on your requirements/budget, to address this problem using a combination of databases.
Option 1. DynamoDB as a primary store and AWS Elasticsearch as secondary storage [Preferred]
Store the user records in a DynamoDB table (let's call it UserTable) as and when a user registers.
Enable DynamoDB Streams on the UserTable table.
Build an AWS Lambda function that reads from the table's stream and persists the records in AWS Elasticsearch.
Now in your application, use DynamoDB for fetching user records by ID. For all other search criteria (like searching on emailId, phone number, zip code, location, etc.) fetch the records from AWS Elasticsearch. AWS Elasticsearch by default indexes all the attributes of your record, so you can search on any field with millisecond latency.
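A bare-bones sketch of the stream-processing Lambda from step 3 (the endpoint, index name, and field handling are assumptions; IAM request signing is omitted for brevity, and the stream is assumed to include new images):

from elasticsearch import Elasticsearch, RequestsHttpConnection

# Placeholder endpoint; a real AWS Elasticsearch domain also needs signed
# requests (e.g. via requests-aws4auth), omitted here to keep the sketch short.
es = Elasticsearch(['https://my-es-domain.example.com'],
                   connection_class=RequestsHttpConnection)

def handler(event, context):
    for record in event['Records']:
        if record['eventName'] in ('INSERT', 'MODIFY'):
            image = record['dynamodb']['NewImage']
            # Flatten DynamoDB-typed attributes ({'S': 'foo'}, {'N': '1'}, ...) into a plain doc.
            doc = {name: list(value.values())[0] for name, value in image.items()}
            es.index(index='users', id=image['userId']['S'], body=doc)
        elif record['eventName'] == 'REMOVE':
            keys = record['dynamodb']['Keys']
            es.delete(index='users', id=keys['userId']['S'])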
Option 2. Use AWS Aurora [Less preferred solution]
If your application has a relational use-case where data are related, you may consider this option. Just to call out, Aurora is a SQL database.
Since this is a relational storage, you can opt for organizing the records in multiple tables and join them based on the primary key of those tables.
I would suggest the 1st option because:
DynamoDB will provide durable, highly available, low-latency primary storage for your application.
AWS Elasticsearch will act as secondary storage, which is also durable, scalable and low-latency.
With AWS Elasticsearch, you can run any search query on your data. You can also do analytics on it. A Kibana UI is provided out of the box, which you can use to plot the analytical data on a dashboard (how user growth is trending, how many users belong to a specific location, user distribution based on city/state/country, etc.).
With DynamoDB Streams and AWS Lambda, you will be syncing these two databases in near real time [within a few milliseconds].
Your application will be scalable and the search feature can further be enhanced to do filtering on multi-level attributes. [One such example: search all users who belong to a given city]
Having said that, now I will leave this up to you to decide. 😊

Storing Chat Log on AWS DynamoDB?

I am thinking of building a chat app with AWS DynamoDB. The app will support 1:1 and group chats.
I want to create one table for each one of the chats, where there is a record for each sent chat text line. Is DynamoDB suitable for this kind of job?
I am also thinking of merging both tables. But is this a good idea, if there are – let's assume – 100k or 1000k users?
I think you may run into problems with the read capacity on your table. The write capacity should be ok, as there are not so many messages coming in per second (e.g. 10 or so), but you'll need to constantly read from it for all users, so that'll be expensive.
If you want to use DynamoDB just as storage and distribute the chat messages like in any normal chat over the network, then it may make sense, depending on your use cases. Assuming you have a hash key UserId and a range key Timestamp, you could query all messages from a specific user during a specific period. If, however, you want to search within the chat text (a much more useful feature, probably), then DynamoDB won't work per se. It's not like SQL, where you could do a LIKE '%abc%' query (which isn't a good idea in SQL either).
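For example, that "messages from user X in a given period" query might look roughly like this in boto3 (the table name is made up):

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource('dynamodb').Table('ChatMessages')  # made-up table name

def messages_from_user(user_id, start_ts, end_ts):
    # All messages from one user between start_ts and end_ts (inclusive).
    return table.query(
        KeyConditionExpression=Key('UserId').eq(user_id) &
                               Key('Timestamp').between(start_ts, end_ts),
    )['Items']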
You're probably better off using S3 as the data storage and Elasticsearch as the search instrument. If you need the aforementioned use case "get all messages from user X in timespan S" (as a simple example), you could additionally use DynamoDB to store metadata, such as UserId, Timestamp, PositionInFile or something like that.