I have a requirement to publish messages from a Google Pub/Sub topic to a Kafka cluster running on my on-prem infrastructure. I stumbled on this link:
https://docs.confluent.io/current/connect/kafka-connect-gcp-pubsub/index.html
This should work. I wanted to know if you've used any alternative solution to achieve this?
If you need to integrate Pub/Sub and Kafka, I suggest that you create a script for this purpose. In Python, for example, there are client libraries for both Pub/Sub and Kafka.
Based on that, you could create a script more or less like the one below and run it on some compute resource such as a Compute Engine instance or your on-premises server:
from google.cloud import pubsub_v1
from kafka import KafkaProducer

# Change the bootstrap server to your real Kafka address
producer = KafkaProducer(bootstrap_servers='localhost:1234')

def callback(message):
    print(message.data)
    # Forward the Pub/Sub payload to your Kafka topic
    producer.send('<your-topic>', message.data)
    message.ack()

subscription_name = "projects/<your-project>/subscriptions/<your-subscription>"
subscriber = pubsub_v1.SubscriberClient()
future = subscriber.subscribe(subscription_name, callback)
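Note that subscriber.subscribe() returns immediately and the Kafka producer buffers records, so the script needs to keep the main thread alive and flush the producer on shutdown. A minimal sketch, continuing with the future and producer variables from the script above:

# Keep the main thread alive so the streaming pull keeps delivering messages;
# on Ctrl+C, cancel the pull and flush any Kafka records still in the buffer.
try:
    future.result()
except KeyboardInterrupt:
    future.cancel()
    producer.flush()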
I am running Celery+Kombu 4.4.6 on AWS SQS and want to revoke and terminate tasks.
Reading through the documentation and SO posts, the transport needs to allow broadcast messages. SQS does not do broadcast messages, and Celery+Kombu needs to use SimpleDB for those. That option was turned off by default a long way back, in version 1.x. To enable it, supports_fanout = True needs to be added to the transport options.
But adding just that option is not working for me, and I can't figure out what I am missing. Possible options are:
SimpleDB - it is not clear to me how I would even enable SimpleDB. I do see documentation in AWS, but I do not see it as a separate service.
Any additional config to be added?
Looking briefly at the SQS transport code, it seems like SimpleDB is the only option for this. Is that correct?
Any other option to enable task revocation on SQS?
In my app.celery I have:
app = Celery('app',
             broker='sqs://<access key>:<secret key>@',
             backend='cache+memcached://<host>:11211/',
)
And in my app.settings I have:
CELERY_BROKER_URL = 'sqs://<access key>:<secret key>@'
CELERY_BROKER_TRANSPORT_OPTIONS = {
    'region': '<region>',
    'supports_fanout': True,
}
CELERY_DEFAULT_QUEUE = 'app'
CELERY_DEFAULT_EXCHANGE = 'app'
CELERY_DEFAULT_ROUTING_KEY = 'app'
My final solution was to use Amazon MQ with a RabbitMQ instance. Amazon SimpleDB seems to be gone, which makes any support for it in Celery+Kombu obsolete and broken.
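In case it helps someone, this is roughly what the settings look like against an Amazon MQ (RabbitMQ) broker; the endpoint, credentials and memcached host below are placeholders, not values from my setup:

# app/settings.py -- sketch for an Amazon MQ (RabbitMQ) broker.
# RabbitMQ supports fanout/broadcast natively, so revoke and terminate
# work without any 'supports_fanout' transport option.
CELERY_BROKER_URL = 'amqps://<user>:<password>@b-xxxxxxxx.mq.<region>.amazonaws.com:5671//'
CELERY_RESULT_BACKEND = 'cache+memcached://<host>:11211/'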
What I want to achieve is that once I receive a message via Twilio, I want to schedule a reply to it after exactly 5 minutes. I am using Google Cloud Functions to generate the replies, but I'm not sure how to schedule them. I have gone through Cloud Tasks, Pub/Sub and Cloud Scheduler, but I'm still confused as to how to achieve this. I am using Python.
What I am thinking of is the following workflow: Twilio -> a Cloud Function receives the message and creates a task for 5 minutes later -> another Cloud Function is invoked after 5 minutes. I am stuck on how to schedule it after 5 minutes.
In AWS you would use SQS in combination with delay queues, which makes this very convenient.
Google Cloud Pub/Sub, being the equivalent of AWS SQS, doesn't support any sort of delay, so you would need to use Google Cloud Tasks.
When creating a task you can specify a schedule time which identifies the time at which the task should be executed:
scheduleTime string (Timestamp format)
The time when the task is scheduled to be attempted or retried.
Quick example code copy & pasted from the Google documentation leaving out non-relevant bits and pieces:
from google.cloud import tasks_v2
from google.protobuf import timestamp_pb2
import datetime
[...]
client = tasks_v2.CloudTasksClient()
parent = client.queue_path(project, location, queue)
in_seconds = 5*60 # After 5 minutes...
d = datetime.datetime.utcnow() + datetime.timedelta(seconds=in_seconds)
timestamp = timestamp_pb2.Timestamp()
timestamp.FromDatetime(d)
task = {
    "http_request": {
        "http_method": tasks_v2.HttpMethod.POST,
        "url": url,
    },
    # schedule_time is a field of the task itself, not of http_request
    "schedule_time": timestamp,
}
# Need to add payload, headers and task name as necessary here...
[...]
response = client.create_task(request={"parent": parent, "task": task})
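The task's url would point at the second Cloud Function, the one that actually sends the Twilio reply once Cloud Tasks fires the POST after 5 minutes. That part is not in the Google sample; here is a minimal sketch using the Twilio Python client, where the environment variable names and the payload shape are assumptions:

import json
import os

from twilio.rest import Client

# Twilio credentials are assumed to come from environment variables
twilio_client = Client(os.environ["TWILIO_ACCOUNT_SID"], os.environ["TWILIO_AUTH_TOKEN"])

def send_delayed_reply(request):
    """HTTP-triggered Cloud Function invoked by Cloud Tasks after the delay."""
    payload = json.loads(request.get_data(as_text=True))  # body set when the task was created
    twilio_client.messages.create(
        to=payload["to"],       # the number that texted you
        from_=payload["from"],  # your Twilio number
        body="Here is your reply, 5 minutes later.",
    )
    return "ok", 200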
How can I bulk move messages from one topic to another in GCP Pub/Sub?
I am aware of the Dataflow templates that provide this; unfortunately, however, restrictions do not allow me to use the Dataflow API.
Any suggestions on ad-hoc movement of messages between topics (besides copying and pasting them one by one)?
Specifically, the use case is for moving messages in a deadletter topic back into the original topic for reprocessing.
You can't use snapshots, because snapshots can only be applied to subscriptions of the same topic (to avoid message ID overlapping).
The easiest way is to write a function that pulls from your subscription. Here is how I would do it:
Create a topic (named, for example, "transfer-topic") with a push subscription. Set the acknowledgement deadline to 10 minutes.
Create an HTTP Cloud Function triggered by the Pub/Sub push subscription (or a Cloud Run service). When you deploy it, set the timeout to 9 minutes for the Cloud Function and to 10 minutes for Cloud Run. The processing is the following:
Read a chunk of messages (for example 1000) from the dead-letter pull subscription
Publish the messages (in bulk mode) to the initial topic
Acknowledge the messages on the dead-letter subscription
Repeat until the pull subscription is empty
Return a 200 response code
The global process:
Publish a message to transfer-topic
The message triggers the function/Cloud Run service with an HTTP push
The process pulls the messages and republishes them to the initial topic
If the timeout is reached, the function crashes and Pub/Sub retries the HTTP request (with exponential backoff).
If all the messages are processed, an HTTP 200 response code is returned and the process stops (and the message in the transfer-topic subscription is acked).
This process lets you move a very large number of messages without worrying about the timeout.
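A minimal sketch of that pull/republish/ack loop, assuming google-cloud-pubsub 2.x and placeholder project, subscription and topic names:

from google.cloud import pubsub_v1

# Placeholders -- adjust for your project, dead-letter subscription and original topic
DEADLETTER_SUB = "projects/<your-project>/subscriptions/<deadletter-subscription>"
ORIGINAL_TOPIC = "projects/<your-project>/topics/<original-topic>"

subscriber = pubsub_v1.SubscriberClient()
publisher = pubsub_v1.PublisherClient()

def transfer(request):
    """HTTP entry point hit by the push subscription on transfer-topic."""
    while True:
        response = subscriber.pull(
            request={"subscription": DEADLETTER_SUB, "max_messages": 1000}
        )
        if not response.received_messages:
            break  # the dead-letter subscription is empty
        # Republish the payloads (and their attributes) to the original topic
        futures = [
            publisher.publish(ORIGINAL_TOPIC, msg.message.data, **dict(msg.message.attributes))
            for msg in response.received_messages
        ]
        for f in futures:
            f.result()  # make sure everything is published before acking
        subscriber.acknowledge(
            request={
                "subscription": DEADLETTER_SUB,
                "ack_ids": [msg.ack_id for msg in response.received_messages],
            }
        )
    return "done", 200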
I suggest that you use a Python script for that.
You can use the Pub/Sub client library to read the messages and publish them to another topic, like below:
from google.cloud import pubsub
from google.cloud.pubsub import types

# Defining parameters
PROJECT = "<your_project_id>"
SUBSCRIPTION = "<your_current_subscription_name>"
NEW_TOPIC = "projects/<your_project_id>/topics/<your_new_topic_name>"

# Creating clients for publishing and subscribing. Adjust the max_messages for your purpose
subscriber = pubsub.SubscriberClient()
publisher = pubsub.PublisherClient(
    batch_settings=types.BatchSettings(max_messages=500),
)

# Get your messages. Adjust the max_messages for your purpose
subscription_path = subscriber.subscription_path(PROJECT, SUBSCRIPTION)
response = subscriber.pull(subscription_path, max_messages=500)

# Publish your messages to the new topic
for msg in response.received_messages:
    publisher.publish(NEW_TOPIC, msg.message.data)

# Ack the old subscription if necessary
ack_ids = [msg.ack_id for msg in response.received_messages]
subscriber.acknowledge(subscription_path, ack_ids)
Before running this code you will need to install the Pub/Sub client library in your Python environment. You can do that by running pip install google-cloud-pubsub.
An approach to executing your code is to use Cloud Functions. If you decide to use it, pay attention to two points:
The maximum time that your function can take to run is 9 minutes. If this timeout is exceeded, your function will terminate without finishing the job.
In Cloud Functions, you can just put google-cloud-pubsub on a new line of your requirements.txt file instead of running a pip command.
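For example, the function's requirements.txt would contain just the client library; if you keep the call style used above (positional pull/acknowledge arguments), pin a 1.x release, since the 2.x library changed those signatures:

google-cloud-pubsub<2.0.0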
I am trying to connect lots of IoT devices to an Event Hub and save their messages to Blob Storage (and also to an SQL database). I want to do this with Python (and I am not sure if this is a recommended practice). The documentation about Python was confusing. I tried a few examples, but they create entries in Blob Storage whose contents seem to be irrelevant.
Things like this:
Objavro.codecnullavro.schema\EC{"type":"record","name":"EventData","namespace":"Microsoft.ServiceBus.Messaging","fields":[{"name":"SequenceNumber","type":"long"}...
which is not what I send. How can I solve this?
You could use the azure-eventhub Python SDK, which is available on PyPI, to send messages to Event Hubs.
And there is a send sample showing how to send messages:
import os

from azure.eventhub import EventHubProducerClient, EventData

# Connection string and event hub name, e.g. taken from environment variables
CONNECTION_STR = os.environ["EVENT_HUB_CONN_STR"]
EVENTHUB_NAME = os.environ["EVENT_HUB_NAME"]

producer = EventHubProducerClient.from_connection_string(
    conn_str=CONNECTION_STR,
    eventhub_name=EVENTHUB_NAME,
)

with producer:
    event_data_batch = producer.create_batch()
    event_data_batch.add(EventData('Single message'))
    producer.send_batch(event_data_batch)
I'm interested in the part where you say the documentation about Python was confusing, and that the examples create entries in Blob Storage that seem to be irrelevant.
Could you share your code with me? I'm wondering what the input/output for Event Hub and Storage Blob is, and how the data processing flow looks.
By the way, for Azure Storage Blob Python SDK usage, you can check the repo and the blob samples for more information.
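In case it is useful, here is a minimal sketch of writing your own payload to Blob Storage with the v12 azure-storage-blob SDK; the environment variable, container and blob names are placeholders:

import os

from azure.storage.blob import BlobServiceClient

# Connection string of the storage account (placeholder environment variable)
service = BlobServiceClient.from_connection_string(os.environ["AZURE_STORAGE_CONNECTION_STRING"])

# Upload a small JSON document to <container>/readings/device-1.json
blob_client = service.get_blob_client(container="<container>", blob="readings/device-1.json")
blob_client.upload_blob(b'{"deviceId": "device-1", "temperature": 21.5}', overwrite=True)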
This is the connection setup for publishing new messages to Event Hubs using kafka-python. If you were using Kafka and want to switch over, you just have to change these connection settings.
import json
import ssl
from kafka import KafkaProducer

KAFKA_HOST = "{your_eventhub}.servicebus.windows.net:9093"
KAFKA_ENDPOINT = "Endpoint=sb://{your_eventhub}.servicebus.windows.net/;SharedAccessKeyName=RootSendAccessKey;SharedAccessKey={youraccesskey}"

# Event Hubs requires TLS 1.2, so disable the older protocol versions
context = ssl.create_default_context()
context.options |= ssl.OP_NO_TLSv1
context.options |= ssl.OP_NO_TLSv1_1

producer = KafkaProducer(bootstrap_servers=KAFKA_HOST, connections_max_idle_ms=5400000,
                         security_protocol='SASL_SSL', value_serializer=lambda v: json.dumps(v).encode('utf-8'),
                         sasl_mechanism='PLAIN', sasl_plain_username='$ConnectionString',
                         sasl_plain_password=KAFKA_ENDPOINT, api_version=(0, 10), retries=5, ssl_context=context)
You can find the values for KAFKA_HOST and KAFKA_ENDPOINT in the Azure portal.
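With the Kafka endpoint, the Kafka topic name is simply the name of the event hub inside the namespace, so sending works like regular kafka-python usage (the event hub name and payload below are only examples):

# '<your_eventhub_name>' is the event hub (Kafka topic) inside the namespace
producer.send('<your_eventhub_name>', {"deviceId": "device-1", "temperature": 21.5})
producer.flush()  # make sure the buffered record is actually sent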
I am using Azure Service Bus to send and receive messages using the AMQP protocol. I have installed the proton-c libraries on my Debian Linux machine. I tried the program below to send a message to a queue. My requirement is to use topics instead of queues. Could anyone give me a sample program for using topics in Azure?
import sys, optparse
from proton import *
messenger = Messenger()
message = Message()
message.address = "amqps://owner:<<key>>@namespace.servicebus.windows.net/queuename"
message.body = "sending message to the queue"
messenger.put(message)
messenger.send()
If I put the topic name instead of queuename in the URL above, the program runs forever. Could someone help me? I am new to Python programming.
I found the solution to this problem myself. I guess very few people are working with Azure, so I didn't get any answers.
Here is the solution:
When you create a topic in Azure Service Bus, the "Enable Partitioning" checkbox is selected by default. The AMQP protocol doesn't support partitioned topics/queues, which is why I was stuck with the issue above. Once I deleted the topic and recreated it without "Enable Partitioning" selected, it worked fine. :)
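For reference, once the topic is recreated without partitioning, the send looks exactly like the queue example, just with the topic name in the address path (the credentials and names below are placeholders):

from proton import Messenger, Message

messenger = Messenger()
message = Message()
# Topic created WITHOUT "Enable Partitioning"; the path is the topic name
message.address = "amqps://owner:<<key>>@namespace.servicebus.windows.net/topicname"
message.body = "sending message to the topic"
messenger.put(message)
messenger.send()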