Where is the best place to consume a Kafka topic on Google Cloud Platform? - google-cloud-platform

We have a microservices architecture built on Google Cloud.
Currently, the microservices all run on Cloud Run and talk to each other over REST (sync) or Pub/Sub (async).
It is an event-driven pattern: when a service publishes that something happened (like "user_created") to the appropriate Pub/Sub topic, many services receive that event through a push subscription on their HTTP endpoint.
Now we are moving to Kafka for its message ordering and replay features.
Unfortunately, Kafka consumers are pull-based, so we need to change the way services receive events.
Since Cloud Run is a serverless solution that scales to zero, we cannot make it listen to a Kafka topic: the service could shut down during the night because no requests arrive.
Some services can safely be updated by a scheduled cron job: every hour, for example, we send a GET request to the service, which downloads all new Kafka messages and updates itself accordingly.
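The hourly catch-up request can be sketched as an offset-based loop. This is a minimal in-memory model, not a Kafka client: `log` stands in for one partition of a topic, `committed` is the offset the service stored after its previous run (hypothetical; with a real client it would be the consumer group's committed offset), and `apply` is a hypothetical callback that updates the service's state.

```python
from typing import Callable, List, Sequence


def catch_up(log: Sequence[str], committed: int, apply: Callable[[str], None]) -> int:
    """Apply every message from `committed` to the current end of the log.

    Follows the Kafka convention that a committed offset points at the
    next record to read. Returns the offset to store for the next run.
    """
    end = len(log)  # end of the partition at the moment the request arrives
    for offset in range(committed, end):
        apply(log[offset])  # strictly in offset order, preserving ordering
    return end


if __name__ == "__main__":
    partition = ["user_created:1", "user_created:2", "user_deleted:1"]
    seen: List[str] = []
    next_offset = catch_up(partition, 0, seen.append)
    print(next_offset, seen)
```

Because Kafka retains messages after they are read, calling `catch_up` again with an earlier offset (for example 0) reprocesses history, which is the replay feature mentioned above.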
But many other services need near-real-time updates to do their job.
So which Google Cloud Platform product is best suited to consume Kafka topics in this architecture?
Thanks!

Related

Google Cloud Pub/Sub Lite client on a serverless service

First of all, I wanted to tag this post with google-cloud-pubsub-lite, but the tag doesn't exist yet, my apologies.
I'm getting started with Pub/Sub Lite. I think it could be used as a "cheap" way to get an event store in a GCP project.
We usually create GAE standard services, so we pay for what we use and at the same time get great scalability.
Reading the samples about how to subscribe to Pub/Sub Lite, I see that there's no option to supply an endpoint to receive new messages. The client connects to a subscription and waits for new messages to be streamed through the connection.
I'm wondering about a few things:
Can we receive messages from a Pub/Sub Lite topic in a Cloud Function or in an endpoint of a GAE standard service?
How can we scale to several clients for a topic subscription?
Thanks
A Pub/Sub Lite subscription supports only pull mode. So you need to create one or several clients, attach them to the subscription, and pull the messages.
In serverless mode, a push subscription is more suitable for scalability and integration. With a pull subscription, you need to perform micro-batches:
Create a Cloud Scheduler job
Use * * * * * (every minute) as the frequency
Have it call the serverless product you want (Cloud Run, Cloud Functions, App Engine)
On the serverless product, when you receive a request, open a connection to the Pub/Sub Lite subscription and start pulling messages.
If the pulling takes more than 1 minute, Cloud Scheduler will send a new request:
Cloud Functions will create a new instance automatically and start pulling.
By default, Cloud Run handles up to 80 concurrent requests per instance. I recommend setting the concurrency parameter to 1 to get exactly the same behavior as Cloud Functions.
You can't play with the concurrency on App Engine
Set the timeout to the maximum.
If there are no new messages (for example, for 500 ms), exit gracefully.
If the service timeout is close (for example, 15 s before it), stop pulling and exit gracefully.
This way, you can have several clients on the same subscription (scaling by one more client per minute and per scheduler job, if the previous run is still active).
This workaround keeps the serverless model: if there are no messages, pulling stops after 500 ms, and otherwise it stops once no new messages arrive. You scale up with your traffic.
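The per-request pulling loop from the steps above can be sketched in a client-agnostic way. `pull` is a hypothetical stand-in for whatever Pub/Sub Lite (or Kafka) client call returns the next message; it is assumed to raise `queue.Empty` when nothing arrives within the given timeout. The 500 ms idle timeout and 15 s safety margin are the example values from the answer.

```python
import queue
import time
from typing import Callable, List


def drain(
    pull: Callable[[float], object],
    handle: Callable[[object], None],
    deadline: float,
    idle_timeout: float = 0.5,
    safety_margin: float = 15.0,
) -> int:
    """Pull and handle messages until idle or close to the request deadline.

    `deadline` is an absolute time.monotonic() value at which the platform
    would end the request. Returns the number of messages handled.
    """
    handled = 0
    while True:
        remaining = deadline - safety_margin - time.monotonic()
        if remaining <= 0:
            break  # service timeout is close: stop and exit gracefully
        try:
            msg = pull(min(idle_timeout, remaining))
        except queue.Empty:
            break  # nothing new within idle_timeout: exit gracefully
        handle(msg)
        handled += 1
    return handled


if __name__ == "__main__":
    q: "queue.Queue[str]" = queue.Queue()
    for i in range(3):
        q.put(f"msg-{i}")
    out: List[str] = []
    # Note: Queue.get's first positional argument is `block`, so pass timeout by keyword.
    n = drain(lambda t: q.get(timeout=t), out.append, deadline=time.monotonic() + 60)
    print(n, out)
```

With one such request started per minute by Cloud Scheduler, overlapping runs are what provide the extra clients when a backlog builds up.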
However, I don't understand your concept of a cheap event store.
Pub/Sub Lite is not a pay-as-you-go model but a flat model: you reserve capacity and pay for it 24/7, even if it is not used.
Pub/Sub Lite is zonal, which is risky for high availability.
You can keep the events until the partition is full. But wouldn't it be cheaper to store the events elsewhere? BigQuery? Firestore? Cloud SQL?

Specifics of using a push subscription as a load balancer

I am trying to send IoT commands using a push subscription, for two reasons. First, my devices are often on unstable connections, so going through Pub/Sub gives me retries, and I don't have to wait for the QoS 1 timeout at the time I send the message (I still need that timeout because I log it for later use). Second, the push subscription can act as a load balancer. To my understanding, if multiple consumers listen to the same push subscription, each will receive a subset of the messages, effectively balancing my workload. I observed this balancing behavior on pull subscriptions, and I want to know:
Do push subscriptions behave the same way?
Is it a reliable way to balance a workload?
Am I guaranteed that these commands will be executed at most once if there are, let's say, 15 instances listening to that subscription?
Here's a diagram of what I'm trying to achieve:
The idea here is that I only interact with IoT Core when instances receive a subset of the devices to handle (when the push subscription triggers). Also note that I don't need perfect one-instance-per-device balancing. I just need the workload to be split in a roughly equal manner.
EDIT: The question wasn't clear so I rewrote it.
I think you are a bit confused about the concepts behind Pub/Sub. In general, you publish messages to a topic for one or multiple subscribers. I like to compare Pub/Sub with a magazine published by a big publishing company. People who like the magazine can get a copy by means of a subscription. When a new edition of the magazine comes out, a copy is sent to every subscriber, with exactly the same content for all of them.
For Pub/Sub you can create multiple push subscriptions for a topic, up to the maximum of 10,000 subscriptions per topic (also per project). You can read more about those quotas in the documentation. Those push subscriptions can contain different endpoints, in your case, representing your IoT devices. Referring back to the publishing company example, those push endpoints can be seen as the addresses of the subscribers.
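The two behaviors discussed in this thread (each subscription receives its own full copy, while consumers sharing one subscription split its messages) can be modeled in a few lines. The `Topic` class below is a hypothetical toy, not a Pub/Sub client; the round-robin split is an assumption made for readability, and real Pub/Sub does not guarantee an even split across consumers. Pub/Sub delivery is also at-least-once by default, so a command can occasionally be delivered more than once.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List


class Topic:
    """Toy model of Pub/Sub-style delivery semantics (not a real client).

    Every subscription gets its own copy of each published message (fan-out).
    Consumers attached to the same subscription split that subscription's
    messages between them; round-robin is used here purely for illustration.
    """

    def __init__(self) -> None:
        self._consumers: Dict[str, List[str]] = {}
        self._next: DefaultDict[str, int] = defaultdict(int)
        self.received: DefaultDict[str, List[str]] = defaultdict(list)

    def attach(self, subscription: str, consumer: str) -> None:
        self._consumers.setdefault(subscription, []).append(consumer)

    def publish(self, message: str) -> None:
        for subscription, consumers in self._consumers.items():
            # One copy per subscription, handed to exactly one of its consumers.
            index = self._next[subscription] % len(consumers)
            self._next[subscription] += 1
            self.received[consumers[index]].append(message)
```

Attaching two consumers to a "billing" subscription and one to an "audit" subscription, then publishing four messages, gives the audit consumer all four while the two billing consumers get two each: the magazine analogy applies between subscriptions, and the load-balancing behavior applies within one.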
Here is an example IoT Core architecture, which focuses on processing data from your devices into a store. The other way around could also work: send a message (including the device/registry ID) from your front end to a Cloud Function wrapped in API Gateway. This Cloud Function then publishes the message to a topic, which delivers it to a Cloud Function that sends the message to the device using the MQTT protocol. I worked out both flows for you; they are loosely coupled, so if anything goes wrong with your device or processing, the data is not lost.
Device to storage:
Device
IoT Core
Pub/Sub
Cloud Function / Dataflow
Storage (BigQuery etc.)
Front-end to device:
Front-end (click a button)
API Gateway / Cloud Endpoints
Cloud Function (send command to pub/sub)
Pub/Sub
Cloud Function (send command to device with MQTT)
Device (execute the command)
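The last hop of the front-end-to-device flow (Pub/Sub pushing to a Cloud Function that talks MQTT) could look roughly like this. It is a sketch under assumptions: the request body follows Pub/Sub's push format (a JSON envelope whose `message.data` is base64-encoded), the payload fields `registryId`, `deviceId` and `command` are hypothetical names chosen by the publishing function, and `send_command` is a hypothetical stand-in for the MQTT call.

```python
import base64
import json
from typing import Callable


def handle_push(body: bytes, send_command: Callable[[str, str, dict], None]) -> int:
    """Decode one Pub/Sub push request, forward the command, return an HTTP status."""
    try:
        envelope = json.loads(body)
        payload = json.loads(base64.b64decode(envelope["message"]["data"]))
        registry_id = payload["registryId"]  # hypothetical field names set
        device_id = payload["deviceId"]      # by the publishing Cloud Function
        command = payload["command"]
    except (KeyError, ValueError, TypeError):
        # Malformed message: acknowledge it so Pub/Sub stops redelivering it.
        return 204
    try:
        send_command(registry_id, device_id, command)
    except ConnectionError:
        # A non-2xx response makes Pub/Sub redeliver the message later.
        return 503
    return 204
```

Returning an error status while the device is unreachable is what keeps the command in Pub/Sub for a retry instead of losing it; acknowledging malformed messages is a design choice here (a dead-letter topic is an alternative).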

How to deliver an AWS SNS message to all instances of a particular microservice

I've got a rather rare requirement: deliver an SNS topic message to all microservice instances.
Basically, it's a notification that related data has changed, and all microservice instances should reload their internal state from the data source.
We are using Terraform to create our infrastructure, with Kong API gateway.
Microservice instances can be created on the fly as system load increases, so subscriptions to the topic cannot be created at the Terraform stage.
The microservice is a standard Spring Boot app.
My first approach is:
the microservice exposes an HTTP endpoint that can be subscribed to the SNS topic
on startup, the microservice subscribes itself (the above endpoint) to the required SNS topic, and unsubscribes on shutdown.
My problem is determining the individual microservice instance URLs to use in the subscription process.
An alternative approach would be to use SQS: create an SQS queue per microservice instance (subscribed to the SNS topic).
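For the first approach, each instance's endpoint has to handle the two kinds of requests SNS sends to an HTTP(S) subscriber: a `SubscriptionConfirmation` (the subscription stays pending until its `SubscribeURL` is fetched) and the actual `Notification`. A minimal dispatcher is sketched here in Python rather than Spring Boot for brevity; `confirm` and `reload` are hypothetical callbacks (an HTTP GET of the URL and the reload logic), and a production endpoint should also verify the SNS message signature.

```python
import json
from typing import Callable


def handle_sns(body: str, confirm: Callable[[str], None], reload: Callable[[str], None]) -> str:
    """Dispatch one SNS HTTP(S) delivery based on its "Type" field."""
    msg = json.loads(body)
    kind = msg.get("Type")
    if kind == "SubscriptionConfirmation":
        confirm(msg["SubscribeURL"])  # SNS waits for a GET on this URL
    elif kind == "Notification":
        reload(msg["Message"])  # e.g. "data changed": reload internal state
    return kind or "Unknown"
```

The instance would also need to know its own reachable URL when calling Subscribe, which is exactly the difficulty raised in the question.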
Maybe I'm doing it wrong at a conceptual level?
Maybe a different architectural approach is required?
It might be easier for the microservices to check an object in Amazon S3 to "pull" the configuration updates (or at least call HeadObject to check if the configuration has changed) rather than trying to "push" the configuration update to all servers.
Or, use AWS Systems Manager Parameter Store and have the servers cache the values for a period (e.g. 5 minutes) so they aren't constantly checking the configuration.
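The pull-based alternative (check a cheap version marker at most once per cache period and reload only when it changed) can be sketched as a small cache. `head` and `fetch` are hypothetical callbacks: with S3, `head` could return the object's ETag from a HeadObject call and `fetch` could download the object. The 300-second default mirrors the 5-minute example above, and `clock` is injectable only to make the behavior easy to test.

```python
import time
from typing import Callable, Hashable, Optional


class ConfigCache:
    """Cache a config value and re-check its version at most every `ttl` seconds."""

    def __init__(self, head: Callable[[], Hashable], fetch: Callable[[], object],
                 ttl: float = 300.0, clock: Callable[[], float] = time.monotonic) -> None:
        self._head = head
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._tag: object = object()  # sentinel that never equals a real tag
        self._value: Optional[object] = None
        self._checked_at = float("-inf")  # forces a check on the first get()

    def get(self) -> Optional[object]:
        now = self._clock()
        if now - self._checked_at >= self._ttl:
            self._checked_at = now
            tag = self._head()  # cheap version check, e.g. an ETag
            if tag != self._tag:  # download only when the version changed
                self._value = self._fetch()
                self._tag = tag
        return self._value
```

Each instance calls `get()` whenever it needs the configuration, so no instance has to be individually reachable and new instances pick up changes without any subscription step.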
This is quite old now, but here is my solution:
create an SNS topic, subscribe an SQS queue to it, republish the SQS messages to a Redis pub/sub channel, and have every instance subscribe to that channel.
Now all your instances will get the event.

Question on data transfer from local Kafka to Kafka on Google Cloud

I have been evaluating a requirement to move test data from a local Kafka setup to Kafka in Google Cloud, which means integrating both. I plan to use the following approach to create a pipeline. Please let me know whether I am headed in the right direction.
I will set up Kafka locally on my desktop and create a topic called TopicA. I then plan to produce messages to TopicA. The consumer of TopicA would pick up these messages sequentially, connect to the GCP Kafka topic called TopicB, and start posting the messages to TopicB. Finally, the consumer of TopicB would consume these messages, completing the pipeline and validating the messages.
To add: I don't know much about GCP yet and need to explore it. Any ideas or pointers to code bases, docs, or SO posts would be very welcome. Thanks in advance.
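The TopicA to TopicB bridge consumer described above boils down to a consume, produce, commit loop. The sketch models it without any Kafka client: `source` stands in for a TopicA partition, and `produce` and `commit` are hypothetical callbacks (in a real setup, a producer send that waits for the destination broker's acknowledgement and a consumer offset commit on the local cluster).

```python
from typing import Callable, Sequence


def relay(
    source: Sequence[bytes],
    start: int,
    produce: Callable[[bytes], None],
    commit: Callable[[int], None],
) -> int:
    """Copy messages from source[start:] to the destination, in order.

    `produce` must block until the destination acknowledges the write and
    raise on failure. The source offset is committed only afterwards, so a
    crash can cause duplicates in TopicB but never gaps.
    """
    offset = start
    for message in source[start:]:
        produce(message)
        offset += 1
        commit(offset)  # committed offset = next message to read
    return offset
```

Committing after each acknowledged write gives at-least-once delivery, so the final TopicB consumer should tolerate duplicates when validating. For more than a test, Kafka's MirrorMaker 2, which replicates topics between clusters, may be worth evaluating instead of a hand-written bridge.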

AWS IoT: How to use an application service on EC2?

I'd like to use AWS IoT to manage a grid of devices. Data from each device must be sent to a queue service (RabbitMQ) hosted on an EC2 instance, which is the starting point for a real-time control application. I read how to make a rule to write data to other services: Here
However, there isn't an example for EC2. Using the AWS IoT service, how can I connect to a service on EC2?
Edit:
I have a real-time application developed with Storm that consumes data from RabbitMQ and puts the result of the computation in another RabbitMQ queue. RabbitMQ and Storm run on EC2. I have devices producing data and connected to AWS IoT. Data produced by the devices must be redirected to the queue on EC2 that is the starting point of my application.
I'm sorry if I was not clear.
AWS IoT supports pushing data directly to other AWS services. As you have probably figured out by now, publishing to third-party APIs isn't directly supported.
Of the choices AWS offers, Lambda, SQS, SNS and Kinesis would probably work best for you.
With Lambda, you could directly forward the incoming message using one of RabbitMQ's APIs.
With SQS, you would put the message into an AWS queue first and then poll this queue, transferring the messages to RabbitMQ.
Kinesis would allow more sophisticated processing, but is probably too complex.
I suggest you write a Lambda function in the programming language of your choice, using one of the numerous RabbitMQ client libraries.
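A Python Lambda along these lines might look like the sketch below. Assumptions to flag: the IoT rule invokes the function with the selected message as `event`, the pika client library is bundled with the function, the queue already exists on the broker, and `RABBIT_HOST` and `QUEUE_NAME` are hypothetical values. The function also needs network access to the EC2 instance (for a private IP, that means running it in the same VPC). The publisher is injectable so the forwarding logic can be exercised without a broker.

```python
import json
from typing import Any, Callable, Dict

RABBIT_HOST = "10.0.0.10"   # hypothetical private IP of the EC2 instance
QUEUE_NAME = "device-data"  # hypothetical queue the Storm topology reads from


def publish_with_pika(body: bytes) -> None:
    """Publish one message to RabbitMQ through the default exchange."""
    import pika  # imported lazily so the handler can be tested without pika

    connection = pika.BlockingConnection(pika.ConnectionParameters(host=RABBIT_HOST))
    try:
        channel = connection.channel()
        channel.basic_publish(exchange="", routing_key=QUEUE_NAME, body=body)
    finally:
        connection.close()


def lambda_handler(event: Dict[str, Any], context: Any,
                   publish: Callable[[bytes], None] = publish_with_pika) -> Dict[str, Any]:
    """Forward the message selected by the AWS IoT rule to RabbitMQ."""
    body = json.dumps(event).encode("utf-8")
    publish(body)
    return {"forwarded_bytes": len(body)}
```

Opening a connection per invocation keeps the sketch simple; under heavy device traffic, reusing a connection across warm invocations would reduce overhead.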