Receive Facebook real-time updates into Kafka

I'm pretty new to Kafka and I'm wondering if I'm heading in the right direction.
I want to receive Facebook real-time updates (Facebook subscriptions) into Kafka.
To get the data back from Facebook, you need to provide a callback URL at which you will receive the data.
So I figured the best way to receive the data is to implement dropwizard-kafka-http and push the data into Kafka.
Is this the best way to receive the data, or would you recommend a better way?

You're definitely on the right track.
Note that FB's push notifications will not get you the actual content; you still need to request it. From their docs:
Note that real-time updates only indicate that a particular field has changed, they do not include the value of those fields. They should be used only to indicate when a new Graph API request to that field needs to be made.
So if you wanted a quick way to get the actual content into Kafka, you're out of luck. You're going to have some minimal back-and-forth.
Are you sure you want to use Java? How about a lightweight spray-can endpoint in Scala? After all, Kafka is a Scala project :)
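If you do stay on the JVM, the receiving side can be quite small. Below is a minimal sketch, assuming a local broker and a hypothetical topic name "facebook-updates", using the JDK's built-in HttpServer together with the standard Kafka Java producer; dropwizard-kafka-http gives you roughly this plus production niceties. A real Facebook subscription endpoint also has to answer the initial GET verification challenge, which is omitted here.

import com.sun.net.httpserver.HttpServer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class FacebookWebhookReceiver {
    public static void main(String[] args) throws Exception {
        // Assumed broker address and topic name -- adjust to your setup.
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);

        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/fb-updates", exchange -> {
            // Facebook POSTs a JSON body describing which fields changed.
            byte[] body = exchange.getRequestBody().readAllBytes();
            producer.send(new ProducerRecord<>("facebook-updates",
                    new String(body, StandardCharsets.UTF_8)));
            exchange.sendResponseHeaders(200, -1); // empty 200 OK
            exchange.close();
        });
        server.start();
    }
}

From there, a separate consumer can read the change notifications off the topic and make the follow-up Graph API requests for the actual field values, which is the minimal back-and-forth described above.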

Related

POST Request to REST API with Apache Beam

I have a use case where we're pulling messages from PubSub, and the idea is to POST those messages to the REST API of PowerBI. We want to create a Live Report using the PushDatasets feature.
The main idea should be something like this:
PubSub -> Apache Beam -> POST REST API -> PowerBI Dashboard
I haven't found any implementation of a POST request inside an Apache Beam job (the runner is not a problem right now), just a GET request inside a DoFn. I don't even know if this is possible.
Has anyone done something like this? Or is there maybe another framework/tool that would be more helpful?
Thanks.
Sending POST requests to an external API is certainly possible, but requires some care. It could be as simple as making the POST inside the body of a DoFn, but be aware that this could lead to duplicates since messages within your pipeline belong to a batch and the Beam model allows entire batches to be reprocessed in case of worker failures, exceptions, etc.
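As a rough illustration, such a DoFn in the Java SDK might look like the sketch below; the endpoint URL is a placeholder, and the JDK 11 HttpClient stands in for whichever HTTP client you prefer:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import org.apache.beam.sdk.transforms.DoFn;

// Sketch of a DoFn that POSTs each element to an external REST API.
public class PostToApiFn extends DoFn<String, Void> {
    // HttpClient is not serializable, so create it per worker in @Setup.
    private transient HttpClient client;

    @Setup
    public void setup() {
        client = HttpClient.newHttpClient();
    }

    @ProcessElement
    public void processElement(@Element String json) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://api.powerbi.com/your-push-dataset-url")) // placeholder
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        // Throwing here makes the runner retry the whole bundle --
        // exactly where the duplicates mentioned above can come from.
        if (response.statusCode() >= 400) {
            throw new RuntimeException("POST failed with status " + response.statusCode());
        }
    }
}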
There is some advice in the Beam docs on grouping elements for efficient external service calls.
Choosing the best course of action here largely depends on the details of the API you're calling. Does it take message IDs that can be used for deduplication on the PowerBI side? Can the API accept batches of messages? Is there rate limiting?

Send POST request from ember to specified URL

I am new to full-stack development and am having an issue with a web application I'm working on for my employer. I was tasked with creating a fairly simple application that we can scale over time. For now, all it needs to do is take data from one of our databases and pass it to a front-end application. Using this front-end app, our workers should be able to double-check the information passed in and make sure it has been properly translated to a new format. After it is translated, I want to send an HTTP POST request to our new system's back-end and have it add the new data via the REST API. Essentially it's a practice application to get me more acquainted with full-stack development while building an effective tool to transfer mass data from one system to another. I can't seem to figure out how to set up something in ember.js to send that POST request to somewhere other than my back-end, though.
I believe that after re-reading the page on the ember.js site, I found my answer. Sorry for posting a question that was already answered elsewhere; it was asked after a while of googling without finding what sounded right. Just needed a day to let my brain reset, I guess. In any case, if somebody else stumbles upon this post and (like me) wasn't comprehending at the time, the answer lies here: https://guides.emberjs.com/v1.10.0/models/connecting-to-an-http-server/
You simply create a new adapter with the host you want to send requests to. Then you can create a new model (and, in my case, a serializer) so the data is formatted in a way the host will understand. This allows for data migration between sources, as I desired. Sorry again for the unnecessary post, but hopefully it helps someone else in the future.

Google Tag Manager clickstream to Amazon

So the question has more to do with which services I should be using to get efficient performance.
Context and goal:
What I'm trying to do, exactly, is use a Tag Manager custom HTML tag so that after each Universal Analytics tag (event or pageview) fires, it sends to my own EC2 server an HTTP request with a payload similar to what is sent to Google Analytics.
What I have thought, planned, and researched so far:
At this moment I have two big options:
Use AWS Kinesis, which seems like a great idea, but the problem is that it only drops the information into one Redshift table, and I would like to have at least 4 or 5 so I can differentiate pageviews from events, etc. My solution to this would be to split each request on the server side into a separate stream (a sketch of this follows below).
The other option is to use Spark + Kafka. (Here is a detailed explanation.)
I know at some point this means I'm building a parallel Google Analytics, with everything that implies. I still need to decide what information I should send (I'm referring to which parameters, for example source and medium), how to format it correctly, and how to process it correctly.
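To make the server-side split concrete, here is a hedged sketch in Java with the AWS SDK v2. The stream naming scheme ("hits-pageview", "hits-event") and the HitRouter class are illustrative assumptions, not a prescription:

import software.amazon.awssdk.core.SdkBytes;
import software.amazon.awssdk.services.kinesis.KinesisClient;
import software.amazon.awssdk.services.kinesis.model.PutRecordRequest;

// Sketch: route each tracking hit to a per-type Kinesis stream so that
// pageviews, events, etc. can land in separate Redshift tables.
public class HitRouter {
    private final KinesisClient kinesis = KinesisClient.create();

    public void route(String hitType, String clientId, String payload) {
        // Assumed naming convention, e.g. "hits-pageview", "hits-event".
        String streamName = "hits-" + hitType;
        kinesis.putRecord(PutRecordRequest.builder()
                .streamName(streamName)
                .partitionKey(clientId) // distributes records across shards
                .data(SdkBytes.fromUtf8String(payload))
                .build());
    }
}

Each stream can then feed its own Firehose delivery into a dedicated Redshift table.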
Questions and debate points:
Which option is more efficient and easiest to set up?
Should I send this information directly from the server of the page/app, or from the user's side, making the browser do the requests as I explained before?
Has anyone done something like this in the past? Any personal recommendations?
You'd definitely benefit from the Google Analytics customTask feature instead of custom HTML. More on this from Simo Ahava. Also, Google BigQuery is quite a popular destination for streaming hit data, since it allows many on-the-fly computations such as sessionization, and there are many ready-to-use cases for BQ.

Record live streaming video with WebRTC and stream with AWS

I'm trying to develop a website that basically lets a user visit a page and, let's say, click a button and use their built-in camera to live stream video with audio to others that visit another URL.
I need some clarity on what I need to develop and what I can get from a 3rd party to save time. AWS looks to cover all the encoding and delivery (http://aws.amazon.com/cloudfront/streaming/), but I'm confused about the process by which I should record and deliver the content to S3. It's just too much information overload.
From all my research, it looks like I should build a WebRTC app, which I have done, then transport that data with JavaScript from the client's browser to my server, and from there to AWS. Is this the best approach, or should I be using a 3rd party that's putting more time into that element?
I have seen the Kurento project, as well as this RecordRTC project.
Like I said, I'm finding there is just too much information overload on the topic.
So what are my options for:
In-browser recording with WebRTC. Is there anything else I should do, or should I just force users to upgrade to a supporting browser?
WebRTC means I have to use JavaScript for the delivery; is Node a better option for the server to take delivery of this streaming data?
Is there anything else I need to know before I pass it off to S3 for delivery through CloudFront?
As you can see, the core of my question is about recording and transporting the data to the web server so I can deliver it for streaming.
I am looking for the same thing.
In 2020, it seems it should be possible with RecordRTC and then uploading the blobs / multipart form data directly to S3.

How to create an API and then dynamically retrieve data from and add new data to it?

To start off, I am extremely sorry if my question is not clear, but I have very little knowledge about web services in general, and the vast amount of varying available information has driven me crazy over the past few weeks. So please do bear with me.
Summary: I want to create a live score update app for Android. (I haven't added android as a tag because I do know how to retrieve data from, say, Twitter's JSON API.) However, like the Twitter JSON API, I want to be able to add (POST, maybe?) data to the Apache 7.0 service that I have running. I then want the app to be able to retrieve this data that I have posted.
I had asked a more generic question earlier and was told that I should look up some APIs. I did that, but I have still not been able to make a breakthrough.
So my questions are:
Is setting up an API on my local web service the correct way to do this?
If so, how can I set up an API that will return JSON objects to the Android app? Also, I would need to be able to constantly update this API with new data.
Additionally, would I also need to set up a database for all this?
Any links to well explained matter would be appreciated too.
Note: I would like to carry this out using a RESTful Web Service through Jersey and use JSON Objects during retrieval.
Again, I am sorry about my terrible knowledge of web services in general, despite trying my best to research a lot. The best I could do was get my RESTful web service to respond to a GET with some pre-defined text that I had set in Eclipse.
Thanks.
If I understand you correctly, what you're trying to do is something like this:
There will be a match, or multiple matches, of some sort. Whenever a team/player scores, someone (i.e. you) will use the app to update the score. People who previously subscribed to the match will be notified and see the updated score.
Even though I'm not familiar with backends based on Java, the implementation should be fairly similar to other programming languages.
First of all, a few words about REST in general. REST is generally needed when you need to share information between multiple devices and/or users. This seems to be the case here. To implement REST you are going to need an API of some sort. On the web, APIs are implemented by web servers answering certain predefined HTTP requests.
Thus, setting up an API on a web server is the correct way to go.
Next, a few words on databases. A database is generally needed if you want to store information persistently. This might or might not be what you are planning to do. If there are just going to be a few matches at the same time and you don't care about persistence of the data, you can use Java to store a collection of match objects in memory. I'm just saying it is possible, not that it is a good idea: once your server crashes or you run out of memory for whatever reason, the data is going to be lost. (Of course, within the actual implementation you will want to cache data for current matches in some way, and keeping objects in memory is a way to do so.)
I'd recommend using a database.
Within the database, you can then store and access information about the matches like the score, which users subscribed, who played, etc.
JSON is just a way to represent the data/objects that will be shared between the server and the client. You can use JSON to encode request and response data/bodies.
The user has to be informed about the updated score. There are two basic ways to do so: push or pull. With pull, the client will check for updated scores after fixed intervals or actions. With push, the server will notify the client about changed scores, which will cause it to update the information. Since you are planning on doing a live application and using Java anyway, push seems to be the better way to go.
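Since Jersey was mentioned: one hedged option for the push side is Server-Sent Events, which JAX-RS 2.1 supports natively. The sketch below uses assumed paths and class names, and for simplicity keeps a single broadcaster (a real app would likely keep one per match):

import javax.inject.Singleton;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.Context;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.sse.Sse;
import javax.ws.rs.sse.SseBroadcaster;
import javax.ws.rs.sse.SseEventSink;

// Sketch: clients subscribe here and receive score updates as Server-Sent Events.
@Singleton
@Path("/api/matches/{matchId}/events")
public class ScoreEventsResource {
    private final Sse sse;
    private final SseBroadcaster broadcaster;

    public ScoreEventsResource(@Context Sse sse) {
        this.sse = sse;
        this.broadcaster = sse.newBroadcaster();
    }

    @GET
    @Produces(MediaType.SERVER_SENT_EVENTS)
    public void subscribe(@Context SseEventSink eventSink,
                          @PathParam("matchId") String matchId) {
        broadcaster.register(eventSink);
    }

    // Call this from the score-update endpoint whenever a score changes.
    public void publishScore(String scoreJson) {
        broadcaster.broadcast(sse.newEvent(scoreJson));
    }
}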
Last but not least, let's have a look at a possible implementation using:
Webserver (API endpoints + database)
Administrator (keeps score updated)
User (receives updates)
We assume that the server will respond to HTTP requests (POST # /api/my-endpoint) with JSON objects.
Possible flow
1)
First the administrator creates a match
REQUEST
POST # /api/matches
body: team1=someteam&team2=someotherteam
The server will now create a match object and store it in the database. The response will contain information about the object and whether the action was successful.
2)
The user asks for a list of matches
REQUEST
GET # /api/matches/current
The response will be a JSON object containing a list of current matches.
RESPONSE
{
  matches: [
    {id: 1, teams: ...}, ...
  ]
}
3)
(If push)
A user subscribes to a match
REQUEST
GET # /api/SOME_MATCH_ID/observe
The user will now be added as an observer for the match. Again, the response contains information about whether the action was successful or not.
4)
The administrator updates a score
REQUEST
PUT # /api/SOME_MATCH_ID
body: team1scored...
The score now gets updated on the server (in memory/database), and the user will be notified about the updated score.
5)
The user gets the updated score
REQUEST
GET # /api/SOME_MATCH_ID
RESPONSE
... (Updated score in some way)
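For completeness, here is a hedged sketch of what steps 1 and 2 could look like as a Jersey (JAX-RS) resource, since that is the stack mentioned in the question. The Match class, its field names, and the in-memory list are assumptions; with a database you would persist and query matches instead, and JSON serialization assumes a provider such as jersey-media-json-jackson is registered:

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import javax.ws.rs.FormParam;
import javax.ws.rs.GET;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

@Path("/api/matches")
public class MatchesResource {

    // Hypothetical match representation, serialized to JSON by Jersey.
    public static class Match {
        public int id;
        public String team1;
        public String team2;
        public int team1Score;
        public int team2Score;
    }

    // In-memory store for illustration only; a database replaces this.
    private static final List<Match> MATCHES = new CopyOnWriteArrayList<>();

    // 1) Administrator creates a match: POST # /api/matches
    @POST
    @Produces(MediaType.APPLICATION_JSON)
    public Match create(@FormParam("team1") String team1,
                        @FormParam("team2") String team2) {
        Match match = new Match();
        match.id = MATCHES.size() + 1;
        match.team1 = team1;
        match.team2 = team2;
        MATCHES.add(match);
        return match;
    }

    // 2) User lists current matches: GET # /api/matches/current
    @GET
    @Path("/current")
    @Produces(MediaType.APPLICATION_JSON)
    public List<Match> current() {
        return MATCHES;
    }
}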