Can I create an algorithm using Amazon MWS API? - amazon-web-services

I am working with my team to prep a project for a potential client. We've researched Amazon MWS API, and we're trying to develop an algorithm using the data scraped from this API.
Just want to make sure we understand the research correctly:
Is it possible to scrape data from Amazon.com like the plugins RevSeller or HowMany do? Then can we add that data to a database for use in an algorithm to determine whether or not an Amazon reseller should invest in reselling a product?
Thanks!

I am doing a similar project. I don't know the specifics of RevSeller or HowMany, but another very popular plugin is Amzpecty. If you use a tool like Fiddler, you can see the HTTP traffic and figure out what it does. They basically scrape out the ASIN and offer listing ID's on the current page you are looking at and one-by-one call the Amazon Product Advertising API, which is not the same thing as MWS. Out of that data returned, they produce a nice overlay that tells you all kinds of important stuff.
Instead of a browser plugin, I'm just writing an app that makes HTTP calls based on a list of ASIN's to the PA API and then I can run the results through my own algorithms. Hope that gives you a starting point.

Related

How to share large data sets to third party data consumers/services?

Lets assume I have a client who has plethora of data related to railways(Signals, tracks, train timings, hazard. offers etc). There are various internal department in railways which wants that data. Just like various weather websites get data from weather department and show that data on their website. Similar is my requirement that I want to share the data securely with other department and services. I want to look at best method to share the data to other services as quickly as possible when the data is available.
Possible Solution
API based: Create API for each department and share them data via API. This has its own pros and cons. This is something which came to my mind but we would have to create lot go API's I was looking if Azure and AWS has any other service which can do the same.
Azure based solution: I am looking for help in this if Azure and services it provides can help. Service bus, Event Grid, Event Hub etc can these be of any use?.
AWS based solution: Is there any service in AWS which can help here?. I dont have much exposure to AWS.
Any other solution ?
I have a fair idea that this could be built using API's but I am looking if I can get this done using cloud platforms like Azure and AWS> This will help in better integration of the product and can scale.

AWS hosted application need autosuggest & wild card search feature

I 'm not sure if this is the correct platform to ask architecture related question, actually I have a webapplication developed in nodejs & typescript hosted in AWS, and the backend is mongodb and my requirement is to include a search box with wild card & auto suggest search functionality so when I start typing on the text box, it will autosuggest just like we do in google search, so how would I achieve this, querying everytime to mongodb will be kind of slow and if 100's of user start doing that, then my application might start dangling so need your suggestion.
Not tried as this more of architecture help required
Not tried as this more of architecture help required
It's not a very detailed answer but may point you in a direction.
I just built something similar using AWS Lambda, ElasticSearch and API Gateway.
ElasticSearch is great for text searches but needs to be populated with indexed data.
If your dataset is changing, you will have to remember about updating ElasticSearch.
API Gateway routes requests from HTTP to Lambda, of which there are two:
one for analysing data in my data warehouse and producing indices for ElasticSearch, the other for doing the actual search and returning results.

Google Tag Manager clickstream to Amazon

So the questions has more to do with what services should i be using to have the efficient performance.
Context and goal:
So what i trying to do exactly is use tag manager custom HTML so after each Universal Analytics tag (event or pageview) send to my own EC2 server a HTTP request with a similar payload to what is send to Google Analytics.
What i think, planned and researched so far:
At this moment i have two big options,
Use Kinesis AWS which seems like a great idea but the problem is that it only drops the information in one redshift table and i would like to have at least 4 o 5 so i can differentiate pageviews from events etc ... My solution to this would be to divide from the server side each request to a separated stream.
The other option is to use Spark + Kafka. (Here is a detail explanation)
I know at some point this means im making a parallel Google Analytics with everything that implies. I still need to decide what information (im refering to which parameters as for example the source and medium) i should send, how to format it correctly, and how to process it correctly.
Questions and debate points:
Which options is more efficient and easiest to set up?
Send this information directly from the server of the page/app or send it from the user side making it do requests as i explained before.
Does anyone did something like this in the past? Any personal recommendations?
You'd definitely benefit from Google Analytics custom task feature instead of custom HTML. More on this from Simo Ahava. Also, Google Big Query is quite a popular destination for streaming hit data since it allows many 'on the fly computations such as sessionalization and there are many ready-to-use cases for BQ.

What is the best tool to use for real-time web statistics?

I operate a number of content websites that have several million user sessions and need a reliable way to monitor some real-time metrics on particular pieces of content (key metrics being: pageviews/unique pageviews over time, unique users, referrers).
The use case here is for the stats to be visible to authors/staff on the site, as well as to act as source data for real-time content popularity algorithms.
We already use Google Analytics, but this does not update quickly enough (4-24 hours depending on traffic volume). Google Analytics does offer a real-time reporting API, but this is currently in closed beta (I have requested access several times, but no joy yet).
New Relic appears to offer a few analytics products, but they are quite expensive ($149/500k pageviews - we have several times this).
Other answers I found on StackOverflow suggest building your own, but this was 3-5 years ago. Any ideas?
Heard some good things about Woopra and they offer 1.2m page views for the same price as Relic.
https://www.woopra.com/pricing/
If that's too expensive then it's live loading your logs and using an elastic search service to read them to get he data you want but you will need access to your logs whilst they are being written to.
A service like Loggly might suit you which would enable you to "live tail" your logs (view whilst being written) but again there is a cost to that.
Failing that you could do something yourself or get someone on freelancer to knock something up for you enabling logs to be read and displayed in a format you recognise.
https://www.portent.com/blog/analytics/how-to-read-a-web-site-log-file.htm
If the metrics that you need to track are just limited to the ones that you have listed (Page Views, Unique Users, Referrers) you may think of collecting the logs of your web servers and using a log analyzer.
There are several free tools available on the Internet to get real-time statistics out of those logs.
Take a look at www.elastic.co, for example.
Hope this helps!
Google Analytics offers real time data viewing now, if that's what you want?
https://support.google.com/analytics/answer/1638635?hl=en
I believe their API is now released as we are now looking at incorporating this!
If you have access to web server logs then you can actually set up Elastic Search as a search engine and along with log parser as Logstash and Kibana as Front end tool for analyzing the data.
For more information: please go through the elastic search link.
Elasticsearch weblink

Accessing World of Warcraft data from the web

I'm aware of the WoW add-on programming community, but what I can find no documentation on is any API for accessing WoW's databases from the web. I see third-party sites like WoWHeroes.com and Wowhead use game data (item and character databases,) so I know it's possible. But, I can't figure out where to begin. Is there a web service I can use or are they doing some sort of under-the-hood work that requires running the WoW client in their server environment?
Sites like Wowhead and WoWHearoes use client run addons from players which collect data. The data is then posted to their website. There is no way to access WoW's database. Your best bet is to hit the armory and extract the XML returned from your searches. The armory is just an xml transform on xml data returned.
Blizzard has recently (8/15/2011) published draft documentation for their RESTful APIs at the following location:
http://blizzard.github.com/api-wow-docs/
The APIs cover information about characters, items, auctions, guilds, PVP, etc.
Requests to the API are currently throttled to 3,000 per day for anonymous usage, but there is a process for registering applications that have a legitimate need for more access.
Update (January 2019): The new Blizzard Battle.net Developer Portal is here:
https://develop.battle.net/
Request throttling limits and authentication rules have changed.
Characters can be mined from the armory, the pages are xml.
Items are mined from the local installation game files, that's how wowhead does it at least.
It's actually really easy to get item data from the wow armory!
For example:
http://www.wowarmory.com/item-info.xml?i=33135
View the source of the page (not via Google Chrome, which displays transformed XML via XSLT) and you'll see the XML data!
You can use search listing pages to retrieve all blue gems, for example, then use an XML parser to retrieve the data
They are parsing the Armory information from www.wowarmory.com. There is no official Blizzard API for accessing it, but there is an open source PHP solution available (http://phparmory.sourceforge.net/)
Maybe a little late to the party, but for future reference check out the WoW API Documentation at http://blizzard.github.com/api-wow-docs/
Scraping HTML and XML is now pretty much obsolete and also discouraged by Blizzard.
The documentation:
http://blizzard.github.com/api-wow-docs/
enjoy
Sites like those actually get the data from the Armory. If you pull up any item, guild, character, etc. and do 'View Source' on the page you will see the XML data coming back. Here is a quick C# example of how to get the data.
This third-party site collection data from players. I think this collection based on addons for WoW or each player submit information manualy.
Next option is wraping wow site and parsing information from websites (HTML).
this is probably the wrong site for your question, but you're thinking of the wowarmory xml stuff. there is no official wow api. people just do httprequests and get the xml to do number crunching stuffs. try googling around. there are some libs out there in different languages that are already written for you. i know there are implementations in php/ruby. i was working on one in .net a while back until i got distracted. here's an article which kinda sums this all up.
http://www.wow.com/2008/02/11/mashing-up-wow-data-when-we-can-get-it-in-outside-applications/
Wowhead and other sites generally rely on data gathered by users with a wow add-in.
Wowhead also has a way for other sites to reference that data in hover pop-ups, so their content gets reused on a number of sites.
Powered by Wowhead
For actual ingame data collection:
cosmos.exe is what thottbot for example uses. It probably uses some form windows hack (dllinjection or something) or sniffs packets to determine what items have dropped and etc. (intercepts traffic from the wow server to your client and decodes it). It saves this data on the users computer and then uploads it to a webserver for storage. I don't know if any development libraries were created for this sort of thing.