How to share large data sets to third party data consumers/services?


Let's assume I have a client with a plethora of railway data (signals, tracks, train timings, hazards, offers, etc.). Various internal railway departments want that data, much as weather websites get data from the weather department and display it. Similarly, I want to share the data securely with other departments and services, and I'm looking for the best way to deliver it to them as soon as it becomes available.
Possible solutions
API based: Create an API for each department and share the data with it through that API. This has its own pros and cons. It was the first thing that came to mind, but we would have to create a lot of APIs, so I'm checking whether Azure and AWS have any other service that can do the same.
Azure based solution: I'd like help here on whether Azure and the services it provides can help. Could Service Bus, Event Grid, Event Hubs, etc. be of any use?
AWS based solution: Is there any service in AWS that can help here? I don't have much exposure to AWS.
Any other solution?
I have a fair idea that this could be built with APIs, but I'd like to know whether I can do it using cloud platforms like Azure and AWS. That would help with integrating the product and would let it scale.
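The "one API per department" problem is usually addressed with publish/subscribe fan-out, which is roughly the model behind managed topic services such as AWS SNS or Azure Service Bus topics and Event Grid: the data owner publishes each record once, and every subscribed consumer receives a copy. The sketch below models only that pattern in plain Python; it is not any cloud SDK, and the category names ("signals", "timings") are hypothetical. The managed services add what this omits: durable delivery, retries, access control and transport across networks.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class Topic:
    """In-process stand-in for a managed topic (illustration only).

    Each department registers a handler for the categories it cares about;
    the producer publishes once and never needs to know who is listening.
    """

    def __init__(self) -> None:
        self._subs: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, category: str, handler: Handler) -> None:
        self._subs[category].append(handler)

    def publish(self, category: str, record: dict) -> int:
        """Deliver `record` to every subscriber of `category`; return how many got it."""
        handlers = self._subs.get(category, [])  # .get does not create empty entries
        for handler in handlers:
            handler(record)
        return len(handlers)
```

Adding a new consumer is then one `subscribe` call rather than a new API on the producer side.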

Related

Google Tag Manager clickstream to Amazon

So the question is more about which services I should use to get efficient performance.
Context and goal:
What I'm trying to do is use a Tag Manager Custom HTML tag so that, after each Universal Analytics tag (event or pageview), an HTTP request with a payload similar to the one sent to Google Analytics is also sent to my own EC2 server.
What I think, planned and researched so far:
At the moment I have two main options:
Use AWS Kinesis, which seems like a great idea, but the problem is that it only drops the information into one Redshift table, and I would like to have at least 4 or 5 so I can separate pageviews from events, etc. My solution to this would be to split the requests on the server side, sending each type to a separate stream.
The other option is to use Spark + Kafka. (Here is a detailed explanation.)
I know that at some point this means I'm building a parallel Google Analytics, with everything that implies. I still need to decide what information to send (I mean which parameters, for example source and medium), how to format it correctly, and how to process it correctly.
Questions and debate points:
Which option is more efficient and easier to set up?
Should I send this information directly from the server of the page/app, or from the user side by making the browser send requests as I explained above?
Has anyone done something like this in the past? Any personal recommendations?
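The server-side split described in the Kinesis option can be sketched as a small router keyed on the hit type. Measurement Protocol hits carry the hit type in the `t` parameter (`pageview`, `event`, `timing`, ...), so this sketch assumes the custom request mirrors that query-string format; the stream names are hypothetical. In a real setup the chosen name would then be passed as `StreamName` to a Kinesis `put_record` call, which is not shown here.

```python
from typing import Dict, Tuple
from urllib.parse import parse_qs

# Hypothetical stream names, one per destination table.
STREAMS: Dict[str, str] = {
    "pageview": "hits-pageviews",
    "event": "hits-events",
    "transaction": "hits-transactions",
    "timing": "hits-timings",
}
FALLBACK_STREAM = "hits-other"  # anything unrecognised lands here


def route_hit(query_string: str) -> Tuple[str, Dict[str, str]]:
    """Return (stream name, flattened params) for one GA-style hit."""
    # parse_qs yields lists of values; keep the first value for each key.
    params = {k: v[0] for k, v in parse_qs(query_string).items()}
    hit_type = params.get("t", "")
    return STREAMS.get(hit_type, FALLBACK_STREAM), params
```

For example, `route_hit("v=1&t=event&ec=video&ea=play")` selects `hits-events`, and the URL-encoded `dp=%2Fhome` of a pageview is decoded to `/home`.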

You'd definitely benefit from the Google Analytics customTask feature instead of Custom HTML. More on this from Simo Ahava. Also, Google BigQuery is quite a popular destination for streaming hit data, since it allows many 'on the fly' computations such as sessionization, and there are many ready-to-use cases for BQ.
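Sessionization, mentioned above, means grouping one user's raw hits into visits. A common rule, and Universal Analytics' default, is that a session ends after 30 minutes of inactivity. In BigQuery this is typically written in SQL with window functions; the Python sketch below only illustrates the rule itself, for a single user, and ignores GA's other session-break rules (such as midnight or a campaign change).

```python
from datetime import datetime, timedelta
from typing import List

SESSION_TIMEOUT = timedelta(minutes=30)  # GA's default inactivity timeout


def sessionize(timestamps: List[datetime],
               timeout: timedelta = SESSION_TIMEOUT) -> List[List[datetime]]:
    """Split one user's hit timestamps into sessions.

    A new session starts whenever the gap since the previous hit is
    strictly greater than `timeout`.
    """
    sessions: List[List[datetime]] = []
    for ts in sorted(timestamps):
        if not sessions or ts - sessions[-1][-1] > timeout:
            sessions.append([ts])       # gap too long: open a new session
        else:
            sessions[-1].append(ts)     # still within the current session
    return sessions
```

Hits at 12:00, 12:10, 12:50 and 13:15 give two sessions: the 40-minute gap before 12:50 starts a new one, while the 25-minute gap before 13:15 does not.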

Implementing a simple Restful service to store and retrieve data using AWS API Gateway/Lambda

I'm new to AWS, so apologies in advance if this question is missing some important considerations, or has incorrect assumptions.
But basically I want to implement a service on AWS to store and retrieve data from multiple clients, which may be Android apps, Windows applications, websites etc. The way I've considered doing this is via a RESTful service with an API Gateway front end, a Lambda back end and maybe an S3 bucket to hold the data.
The basic requirements are:
(1) Clients can publish data to the server, where it is stored, perhaps with some kind of key/value structure.
(2) Clients can retrieve said data by key.
(3) If possible, clients should be able to subscribe to events from the service, so that they are notified if the value of a piece of data changes. This would avoid the need to poll the service, which would presumably start racking up unnecessary charges if the data doesn't change often.
Any pointers on how to get started with this welcome!

Creating a RESTful API on top of Lambda and API Gateway is one of the main use cases for this architecture. You can think of Lambda functions as controllers with methods and API Gateway as a router that forwards requests to functions based on the URL pattern. There are many frameworks and approaches that can help out here if you don't want to write from scratch:
Lambdasync
https://medium.com/@fredrikanderzon/create-a-rest-api-on-aws-lambda-using-lambdasync-e46c68f8043f
Serverless
https://serverless.com/framework/docs/providers/aws/events/apigateway/
Swagger
https://cloudonaut.io/create-a-serverless-restful-api-with-api-gateway-swagger-lambda-and-dynamodb/
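To make the "Lambda as controller, API Gateway as router" idea concrete, here is a minimal handler sketch assuming a Lambda proxy integration (REST API format) behind a hypothetical `/items/{key}` resource with GET and PUT methods. The proxy event supplies `httpMethod`, `pathParameters` and `body`, and the function must return `statusCode`, `headers` and a string `body`. The module-level `_STORE` dict is only a stand-in for a real datastore such as DynamoDB: it is lost when the container is recycled and is not shared between concurrent Lambda instances.

```python
import json

# Stand-in storage for illustration only; use a real datastore in practice.
_STORE = {}


def _response(status, body):
    """Build the response shape API Gateway's proxy integration expects."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }


def handler(event, context):
    """Route GET/PUT on /items/{key} (hypothetical resource path)."""
    key = (event.get("pathParameters") or {}).get("key")
    if key is None:
        return _response(400, {"error": "missing key"})

    method = event.get("httpMethod")
    if method == "PUT":
        _STORE[key] = json.loads(event.get("body") or "null")
        return _response(200, {"key": key, "stored": True})
    if method == "GET":
        if key not in _STORE:
            return _response(404, {"error": "not found"})
        return _response(200, {"key": key, "value": _STORE[key]})
    return _response(405, {"error": "method not allowed"})
```

The frameworks listed above generate this routing and the API Gateway configuration for you; the sketch just shows what ends up running inside the function.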
As far as event subscriptions go (requirement #3) you can model this in many datastores, certainly in a relational/SQL database, with a table like this:
Subscription (key_of_interest, user_id, events_of_interest)
I'm leaving out data types for you to figure out, but hopefully you get the idea. After each data modification on a particular key, check whether that key appears in the subscription table, then wire up a notification to the users who indicated interest. The details of this of course depend on your particular requirements. A caution though: this approach will increase the cost of data modifications because of the additional overhead needed to process subscriptions.
EDIT: One other thing I forgot. S3 is better suited for unstructured data (think 'files'). For relational databases, check out RDS. For a simple NoSQL database you might use DynamoDB, or host your own NoSQL database of choice on an EC2 instance.
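A minimal sketch of that subscription check, using Python's built-in `sqlite3` as a stand-in for RDS or another SQL database. It simplifies `events_of_interest` to one event name per row, and `print_notifier` is a hypothetical placeholder for real delivery (SNS, email, a WebSocket push, ...). The extra SELECT on every write is exactly the overhead the caution above refers to.

```python
import sqlite3
from typing import Callable

Notifier = Callable[[str, str, str], None]


def open_db() -> sqlite3.Connection:
    """Create a data table plus the Subscription table described above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE data (key TEXT PRIMARY KEY, value TEXT);
        CREATE TABLE subscription (
            key_of_interest    TEXT NOT NULL,
            user_id            TEXT NOT NULL,
            events_of_interest TEXT NOT NULL  -- one event name per row, e.g. 'update'
        );
        """
    )
    return conn


def print_notifier(user_id: str, key: str, event: str) -> None:
    # Hypothetical delivery mechanism; replace with a real push channel.
    print(f"notify {user_id}: {key} {event}")


def put_value(conn: sqlite3.Connection, key: str, value: str,
              notifier: Notifier = print_notifier) -> int:
    """Store a value, then notify every user subscribed to updates of `key`.

    Returns the number of notifications sent.
    """
    with conn:  # commit the write before anyone is notified
        conn.execute("INSERT OR REPLACE INTO data (key, value) VALUES (?, ?)",
                     (key, value))
    rows = conn.execute(
        "SELECT user_id FROM subscription "
        "WHERE key_of_interest = ? AND events_of_interest = ?",
        (key, "update"),
    ).fetchall()
    for (user_id,) in rows:
        notifier(user_id, key, "update")
    return len(rows)
```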

Can I create an algorithm using Amazon MWS API?

I am working with my team to prep a project for a potential client. We've researched the Amazon MWS API, and we're trying to develop an algorithm using data scraped from this API.
Just want to make sure we understand the research correctly:
Is it possible to scrape data from Amazon.com like the plugins RevSeller or HowMany do? Then can we add that data to a database for use in an algorithm to determine whether or not an Amazon reseller should invest in reselling a product?
Thanks!

I am doing a similar project. I don't know the specifics of RevSeller or HowMany, but another very popular plugin is Amzpecty. If you use a tool like Fiddler, you can see the HTTP traffic and figure out what it does. Basically, it scrapes the ASINs and offer listing IDs from the page you are looking at and calls the Amazon Product Advertising API for them one by one, which is not the same thing as MWS. From the data returned, it produces a nice overlay that tells you all kinds of important stuff.
Instead of a browser plugin, I'm just writing an app that takes a list of ASINs, makes HTTP calls to the PA API for them, and then runs the results through my own algorithms. Hope that gives you a starting point.
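The "list of ASINs in, own algorithm out" workflow can be sketched in two pieces. First, item lookups in the PA API accept a small number of ASINs per request (10 is assumed here; check the current PA API limits), so the list is batched. Second, a purely hypothetical decision rule stands in for "my own algorithms": the fee rate and margin threshold are illustrative numbers, not Amazon's actual fees, and the signed PA API request itself is not shown.

```python
from typing import Iterable, Iterator, List


def batches(asins: Iterable[str], size: int = 10) -> Iterator[List[str]]:
    """Yield ASINs in groups of `size` (assumed per-request lookup limit)."""
    batch: List[str] = []
    for asin in asins:
        batch.append(asin)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # last, possibly shorter, batch
        yield batch


def worth_reselling(sale_price: float, cost: float,
                    fee_rate: float = 0.15, min_margin: float = 5.0) -> bool:
    """Hypothetical rule: keep a product if net margin after fees clears a threshold."""
    net = sale_price * (1 - fee_rate) - cost
    return net >= min_margin
```

With these placeholder numbers, an item selling for 30.00 that costs 15.00 passes (net of about 10.50), while one selling for 20.00 at the same cost does not.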

How to collect the mobile app data using AWS service(s) or other solutions?

I would like to build an app and collect some events from the app, and then show some event statistics like frequency, duration etc.
I've just looked into the AWS Cognito web service, but it only stores a set of key-value pairs of limited total size.
I could, of course, build my own REST web service on top of a database and store all my events there. But I wonder whether there are AWS web services I can leverage to build such a solution. (If anyone is familiar with Azure, it would be nice to see a possible solution there too!)
Any ideas, suggestions?

Haven't used any packaged web service for this; however, I do use REST methods for statistics in my apps and find that it works well: low overhead, and easy to add, change and collect.
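With the REST approach, the endpoint only needs to store raw events; the statistics the question asks for (frequency, duration) can be computed from them afterwards. A minimal sketch, assuming each stored event is a hypothetical `(name, start_seconds, end_seconds)` tuple:

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, float, float]  # (event name, start, end) in seconds: assumed shape


def event_stats(events: Iterable[Event]) -> Dict[str, Dict[str, float]]:
    """Per event name: how often it occurred, and average/total duration."""
    durations: Dict[str, List[float]] = defaultdict(list)
    for name, start, end in events:
        durations[name].append(end - start)
    return {
        name: {"count": len(d), "avg_duration": mean(d), "total_duration": sum(d)}
        for name, d in durations.items()
    }
```

For example, two `play` events lasting 10 s and 20 s give a count of 2, an average of 15 s and a total of 30 s.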

I would suggest having a look at the AWS Mobile Analytics service (http://aws.amazon.com/mobileanalytics/).
Have a look at the Getting Started page: http://aws.amazon.com/mobileanalytics/getting-started/
Seb

Amazon AWS / Rakuten API - Inventory Management

I'm sure this question may seem a bit lacking, but I literally do not know where to begin. I want to develop a solution that will let me manage ALL of my Amazon and Rakuten/Buy.com inventory from my own website.
My main concern is keeping the inventory in sync, so the process would be as follows:
1. Fetch Amazon orders sold today
   a. Subtract the respective quantities
2. Fetch Rakuten orders sold today
   a. Subtract the respective quantities
3. Update the internal DB of products
   a. Send out updated feeds to Amazon and Rakuten.
Again, I apologize if this question seems a bit lacking, but I'm having trouble understanding exactly how to implement this; any tips would be appreciated.

For the Amazon part look at https://developer.amazonservices.com/
Rakuten, I think you will be able to do what you want with it via the FTP access, I'm still researching this. If I find more I'll respond with a better answer.

To process orders, you'll need to be registered with Rakuten in order to get an authorisation token. For the API docs etc., try sending an email to support@rakuten.co.uk.
Incidentally, to send out updated feeds, you'll need to use the inventory API to update stock quantities (given that you'll be selling the same items on Amazon, etc.).
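The sync steps listed in the question reduce to a small core once the orders have been fetched. The sketch below assumes orders from both channels have already been retrieved (via MWS for Amazon and Rakuten's API or FTP access, not shown) and normalised into a hypothetical `Order(sku, quantity)` shape; the feed records are a neutral format that would still need converting to each channel's own feed or inventory API format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Order:
    sku: str       # your own SKU, mapped from each channel's identifiers
    quantity: int  # units sold


def sync_inventory(stock: Dict[str, int],
                   amazon_orders: List[Order],
                   rakuten_orders: List[Order]) -> Dict[str, int]:
    """Steps 1-3: subtract today's sales from both channels; return new stock."""
    updated = dict(stock)  # leave the caller's snapshot untouched
    for order in amazon_orders + rakuten_orders:
        # Clamp at zero so a feed never advertises negative stock.
        updated[order.sku] = max(0, updated.get(order.sku, 0) - order.quantity)
    return updated  # persist this to the internal DB


def build_feed(stock: Dict[str, int]) -> List[Dict[str, object]]:
    """Step 3a: one record per SKU, to be sent to both channels."""
    return [{"sku": sku, "quantity": qty} for sku, qty in sorted(stock.items())]
```

Clamping at zero keeps the feeds valid but hides oversells; logging any order that drives a quantity below zero would make those visible.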
