Google Tag Manager clickstream to Amazon - amazon-web-services

So the questions has more to do with what services should i be using to have the efficient performance.
Context and goal:
So what i trying to do exactly is use tag manager custom HTML so after each Universal Analytics tag (event or pageview) send to my own EC2 server a HTTP request with a similar payload to what is send to Google Analytics.
What i think, planned and researched so far:
At this moment i have two big options,
Use Kinesis AWS which seems like a great idea but the problem is that it only drops the information in one redshift table and i would like to have at least 4 o 5 so i can differentiate pageviews from events etc ... My solution to this would be to divide from the server side each request to a separated stream.
The other option is to use Spark + Kafka. (Here is a detail explanation)
I know at some point this means im making a parallel Google Analytics with everything that implies. I still need to decide what information (im refering to which parameters as for example the source and medium) i should send, how to format it correctly, and how to process it correctly.
Questions and debate points:
Which options is more efficient and easiest to set up?
Send this information directly from the server of the page/app or send it from the user side making it do requests as i explained before.
Does anyone did something like this in the past? Any personal recommendations?

You'd definitely benefit from Google Analytics custom task feature instead of custom HTML. More on this from Simo Ahava. Also, Google Big Query is quite a popular destination for streaming hit data since it allows many 'on the fly computations such as sessionalization and there are many ready-to-use cases for BQ.

Related

POST Request to REST API with Apache Beam

I have the use case that we're pulling messages from PubSub, and then the idea is to POST those messages to the REST API of PowerBI. We want to create a Live Report using the PushDatasets feature.
The main idea should be something like this:
PubSub -> Apache Beam -> POST REST API -> PowerBI Dashboard
I haven't found any implementation about POST Request inside an Apache Beam job (the runner is not a problem right now), just a GET request inside a DoFn. I don't even know if this is possible.
Has someone experienced doing something like this? or maybe another framework/tool that may be more helpful?
Thanks.
Sending POST requests to an external API is certainly possible, but requires some care. It could be as simple as making the POST inside the body of a DoFn, but be aware that this could lead to duplicates since messages within your pipeline belong to a batch and the Beam model allows entire batches to be reprocessed in case of worker failures, exceptions, etc.
There is some advice in the beam docs on grouping elements for efficient external service calls.
Choosing the best course of action here largely depends on the details of the API you're calling. Does it take message IDs that can be used for deduplication on the PowerBI side? Can the API accept batches of messages? Is there rate limiting?

AWS hosted application need autosuggest & wild card search feature

I 'm not sure if this is the correct platform to ask architecture related question, actually I have a webapplication developed in nodejs & typescript hosted in AWS, and the backend is mongodb and my requirement is to include a search box with wild card & auto suggest search functionality so when I start typing on the text box, it will autosuggest just like we do in google search, so how would I achieve this, querying everytime to mongodb will be kind of slow and if 100's of user start doing that, then my application might start dangling so need your suggestion.
Not tried as this more of architecture help required
Not tried as this more of architecture help required
It's not a very detailed answer but may point you in a direction.
I just built something similar using AWS Lambda, ElasticSearch and API Gateway.
ElasticSearch is great for text searches but needs to be populated with indexed data.
If your dataset is changing, you will have to remember about updating ElasticSearch.
API Gateway routes requests from HTTP to Lambda, of which there are two:
one for analysing data in my data warehouse and producing indices for ElasticSearch, the other for doing the actual search and returning results.

Can I create an algorithm using Amazon MWS API?

I am working with my team to prep a project for a potential client. We've researched Amazon MWS API, and we're trying to develop an algorithm using the data scraped from this API.
Just want to make sure we understand the research correctly:
Is it possible to scrape data from Amazon.com like the plugins RevSeller or HowMany do? Then can we add that data to a database for use in an algorithm to determine whether or not an Amazon reseller should invest in reselling a product?
Thanks!
I am doing a similar project. I don't know the specifics of RevSeller or HowMany, but another very popular plugin is Amzpecty. If you use a tool like Fiddler, you can see the HTTP traffic and figure out what it does. They basically scrape out the ASIN and offer listing ID's on the current page you are looking at and one-by-one call the Amazon Product Advertising API, which is not the same thing as MWS. Out of that data returned, they produce a nice overlay that tells you all kinds of important stuff.
Instead of a browser plugin, I'm just writing an app that makes HTTP calls based on a list of ASIN's to the PA API and then I can run the results through my own algorithms. Hope that gives you a starting point.

What is the best tool to use for real-time web statistics?

I operate a number of content websites that have several million user sessions and need a reliable way to monitor some real-time metrics on particular pieces of content (key metrics being: pageviews/unique pageviews over time, unique users, referrers).
The use case here is for the stats to be visible to authors/staff on the site, as well as to act as source data for real-time content popularity algorithms.
We already use Google Analytics, but this does not update quickly enough (4-24 hours depending on traffic volume). Google Analytics does offer a real-time reporting API, but this is currently in closed beta (I have requested access several times, but no joy yet).
New Relic appears to offer a few analytics products, but they are quite expensive ($149/500k pageviews - we have several times this).
Other answers I found on StackOverflow suggest building your own, but this was 3-5 years ago. Any ideas?
Heard some good things about Woopra and they offer 1.2m page views for the same price as Relic.
https://www.woopra.com/pricing/
If that's too expensive then it's live loading your logs and using an elastic search service to read them to get he data you want but you will need access to your logs whilst they are being written to.
A service like Loggly might suit you which would enable you to "live tail" your logs (view whilst being written) but again there is a cost to that.
Failing that you could do something yourself or get someone on freelancer to knock something up for you enabling logs to be read and displayed in a format you recognise.
https://www.portent.com/blog/analytics/how-to-read-a-web-site-log-file.htm
If the metrics that you need to track are just limited to the ones that you have listed (Page Views, Unique Users, Referrers) you may think of collecting the logs of your web servers and using a log analyzer.
There are several free tools available on the Internet to get real-time statistics out of those logs.
Take a look at www.elastic.co, for example.
Hope this helps!
Google Analytics offers real time data viewing now, if that's what you want?
https://support.google.com/analytics/answer/1638635?hl=en
I believe their API is now released as we are now looking at incorporating this!
If you have access to web server logs then you can actually set up Elastic Search as a search engine and along with log parser as Logstash and Kibana as Front end tool for analyzing the data.
For more information: please go through the elastic search link.
Elasticsearch weblink

Record live streaming video with WebRTC and stream with AWS

I'm trying to develop a website that basically lets a user visit a page, and lets say click a button, and use their built in camera to live stream videos with audio to others that visit another url.
I need some clarity on what I need to develop, what I can get from 3rd party to save time. AWS looks to cover all the encoding and delivery http://aws.amazon.com/cloudfront/streaming/, but I'm confused on the process on which I should record and delivery the content to S3. Just to much information overload.
In all my research I looks like I should build a WebRTC, which I have done, then transport that data with javascript from the clients browser to my server, and thus to AWS. Is this the best format, or should I been using a 3rd party thats putting more time into that element?
I have seen the Kurento project, as well as this RecordRTC project.
Like I said, I'm finding there is just to much information overload on the topic.
So what are my options for:
In browser recording with WebRTC. Anything else I should do or just force users to roll up to a supporting browser?
WebRTC means I have to do Javascript for the delivery, is node a better option for the server to take delivery of this streaming data?
Anything else I need to know before I pass it off to S3 for delivery to the cloud front?
As you can see the core of my question comes within the recording and transporting the data to the web server so I can delivery it for streaming.
I am looking for the same thing.
In 2020, it seems it should be possible with RecordRTC and then uploading blobs / multiform data directly to S3.