kafka-python: Closing the kafka producer with 0 vs inf secs timeout - python-2.7

I am trying to produce messages to a Kafka topic using kafka-python 2.0.1 on Python 2.7 (I can't use Python 3 due to some workplace-related limitations).
I created a class as below in a separate package, compiled the package, and installed it in a virtual environment:
import json
from kafka import KafkaProducer


class KafkaSender(object):
    def __init__(self):
        self.producer = self.get_kafka_producer()

    def get_kafka_producer(self):
        return KafkaProducer(
            bootstrap_servers=['localhost:9092'],
            value_serializer=lambda x: json.dumps(x),
            request_timeout_ms=2000,
        )

    def send(self, data):
        self.producer.send("topicname", value=data)
My driver code is something like this:
from mypackage import KafkaSender
# driver code
data = {"a":"b"}
kafka_sender = KafkaSender()
kafka_sender.send(data)
Scenario 1:
I run this code; it runs just fine with no errors, but the message is not pushed to the topic. I confirmed this because neither the offset nor the lag on the topic increases, and nothing is logged at the consumer end.
Scenario 2:
Commented/removed the initialization of the Kafka producer from the __init__ method.
I changed the sending line from
self.producer.send("topicname", value=data) to self.get_kafka_producer().send("topicname", value=data), i.e. the Kafka producer is created not in advance (during class initialization) but right before sending the message to the topic. When I ran the code this way, it worked perfectly: the message got published to the topic.
My intention with scenario 1 is to create a Kafka producer once and reuse it, rather than creating a new producer every time I want to send a message. Otherwise I might end up creating millions of producer objects if I need to send millions of messages.
Can you please help me understand why the Kafka producer is behaving this way?
NOTE: If I write the Kafka code and the driver code in the same file, it works fine. It fails only when I put the Kafka code in a separate package, compile it, and import it into my other project.
LOGS: https://www.diffchecker.com/dTtm3u2a
Update 1: 9th May 2020, 17:20:
Removed the INFO logs from the question description. I enabled DEBUG level logging; here is the difference between the debug logs of the first scenario and the second scenario:
https://www.diffchecker.com/dTtm3u2a
Update 2: 9th May 2020, 21:28:
Upon further debugging and looking at the kafka-python source code, I was able to deduce that in scenario 1 the Kafka sender was force-closed, while in scenario 2 it was closed gracefully.
def initiate_close(self):
    """Start closing the sender (won't complete until all data is sent)."""
    self._running = False
    self._accumulator.close()
    self.wakeup()

def force_close(self):
    """Closes the sender without sending out any pending messages."""
    self._force_close = True
    self.initiate_close()
Which of the two is used depends on whether the producer's close() method is called with timeout 0 (forced close of the sender) or without a timeout (in which case the timeout defaults to float('inf') and the sender is closed gracefully).
The producer's close() method is called from its __del__ method, which runs at garbage-collection time, while close(0) is called from a handler registered with atexit, which runs when the interpreter terminates.
The question is: why is the interpreter terminating (and force-closing the sender) in scenario 1?
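For reference, a minimal sketch of an explicit shutdown on the driver side, which should deliver any buffered messages regardless of which close path runs later; kafka-python's KafkaProducer provides both flush() and close() for this, and the calls below are only an illustrative sketch:

# driver code with an explicit shutdown -- illustrative sketch only
from mypackage import KafkaSender

data = {"a": "b"}
kafka_sender = KafkaSender()
kafka_sender.send(data)

# flush() blocks until all buffered records are delivered;
# close() flushes and then shuts the sender down gracefully
kafka_sender.producer.flush()
kafka_sender.producer.close()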

Related

MismatchingMessageCorrelationException : Cannot correlate message ‘onEventReceiver’: No process definition or execution matches the parameters

We are facing a MismatchingMessageCorrelationException for the receive task in some cases (less than 5%).
The callback that notifies the receive task is done by:
protected void respondToCallWorker(
        @NonNull final String correlationId,
        final CallWorkerResultKeys result,
        @Nullable final Map<String, Object> variables
) {
    try {
        runtimeService.createMessageCorrelation("callWorkerConsumer")
            .processInstanceId(correlationId)
            .setVariables(variables)
            .setVariable("callStatus", result.toString())
            .correlateWithResult();
    } catch(Exception e) {
        e.printStackTrace();
    }
}
When I check the logs, I find that the executed query is this one:
select distinct RES.* from ACT_RU_EXECUTION RES
inner join ACT_RE_PROCDEF P on RES.PROC_DEF_ID_ = P.ID_
WHERE RES.PROC_INST_ID_ = 'b2362197-3bea-11eb-a150-9e4bf0efd6d0' and RES.SUSPENSION_STATE_ = '1'
and exists (select ID_ from ACT_RU_EVENT_SUBSCR EVT
where EVT.EXECUTION_ID_ = RES.ID_ and EVT.EVENT_TYPE_ = 'message'
and EVT.EVENT_NAME_ = 'callWorkerConsumer' )
Sometimes, when I look for the process instance in the database, I find it waiting at the receive task:
SELECT DISTINCT * FROM ACT_RU_EXECUTION RES
WHERE id_ = 'b2362197-3bea-11eb-a150-9e4bf0efd6d0'
However, when I check the event subscription, it has not yet been created in the database:
select ID_ from ACT_RU_EVENT_SUBSCR EVT
where EVT.EXECUTION_ID_ = 'b2362197-3bea-11eb-a150-9e4bf0efd6d0'
and EVT.EVENT_TYPE_ = 'message'
and EVT.EVENT_NAME_ = 'callWorkerConsumer'
I think the solution is to save the "receive task" subscription before the response from respondToCallWorker arrives, but sadly I can't figure it out.
I tried "async before" on the callWorker and "Message consumer" tasks, but it did not work.
I also tried camunda.bpm.database.jdbc-batch-processing=false and got the same results.
I also tried parallel branches, but I get an OptimisticLockingException and a MismatchingMessageCorrelationException.
Maybe I am doing it wrong.
Thanks for your help.
This is an interesting problem. As you already found out, the error happens when you try to correlate the result from the "worker" before the main process has committed its transaction, so there is no message subscription registered at the time you correlate.
This problem in process orchestration is described and analyzed in this blog post, which is definitely worth reading.
Taken from that post, here is a design that should solve the issue:
You make message send and receive parallel and put an async before the send task.
By doing so, the async continuation job for the send event and the message subscription are written in the same transaction, so when the async message send executes, you already have the subscription waiting.
Although this should work and solve the issue at the BPMN model level, it might be worth considering options that do not require remodeling the process.
First, instead of calling the worker directly from your delegate, you could (assuming you are on Spring Boot) publish a "CallWorkerCommand" (a simple POJO) and use a TransactionalEventListener on a Spring bean to execute the actual call. By doing so, you first finish the BPMN transaction, which registers the message subscription, and afterwards Spring will execute your worker call.
Second, you could use a retry mechanism like resilience4j around your correlate-message call, so in the rare cases where the result comes back too quickly, you fail and retry a second later.
Another solution I could think of, since you seem to be using an "external worker" pattern here, is to use an external-task service task directly, so the send/receive synchronization is handled by Camunda's external task API.
So many options to choose from. I would probably prefer the external task, followed by the TransactionalEventListener, but that is a matter of personal preference.

How to schedule SMS using Twilio w/o using Celery?

I want to schedule an SMS using Twilio in Python. After reading some articles I came to know about Celery.
But I opted not to use Celery and went with Python's threading module instead. The threading module works perfectly with a dummy function, but when calling
client.api.account.messages.create(
    to="+91xxxxxxxxx3",
    from_=settings.TWILIO_CALLER_ID,
    body=message)
it sends the SMS immediately.
Here is my code
from datetime import datetime as DT  # assumed import; the snippet below uses DT for datetime
from threading import Timer

from django.conf import settings
from twilio.rest import Client

account_sid = settings.TWILIO_ACCOUNT_SID
auth_token = settings.TWILIO_AUTH_TOKEN
message = to_do  # to_do comes from elsewhere in the project
client = Client(account_sid, auth_token)

run_at = user_given_time()  # this function extracts the user-given time from the database; it works perfectly fine

# find current DateTime
now = DT.now()
now = DT.strptime(str(now), '%Y-%m-%d %H:%M:%S.%f')
now = now.replace(microsecond=0)

delay = (run_at - now).total_seconds()

Timer(delay, client.api.account.messages.create(
    to="+91xxxxxxxxx3",
    from_=settings.TWILIO_CALLER_ID,
    body=to_do)).start()
So the problem is that Twilio sends the SMS immediately, but I want it to be sent after the given delay.
You are calling the function before starting your Timer, and then passing your Timer thread the return value. You need to pass Timer the function client.api.account.messages.create and its kwargs as separate arguments, so the thread can call the function itself when the time comes:
Timer(delay, client.api.account.messages.create,
      kwargs={'to': "+91xxxxxxxxx3",
              'from_': settings.TWILIO_CALLER_ID,
              'body': to_do}).start()
See the documentation for Timer and notice that it takes args and kwargs parameters to pass to the provided function.
A Timer shouldn't be used within a web service, for these reasons:
Web requests are served using multiple threads.
You would have to wait for the timer thread to execute, which makes the request blocking; otherwise the timer thread may never run.
Hence I would recommend using a queue for such things.
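To illustrate the queue-based approach, here is a minimal sketch using Celery's built-in countdown scheduling; the task name, module layout, and broker URL are assumptions for illustration only:

# tasks.py -- illustrative only; names and broker URL are assumptions
from celery import Celery
from django.conf import settings
from twilio.rest import Client

app = Celery('sms', broker='redis://localhost:6379/0')

@app.task
def send_sms(to, body):
    client = Client(settings.TWILIO_ACCOUNT_SID, settings.TWILIO_AUTH_TOKEN)
    client.api.account.messages.create(
        to=to,
        from_=settings.TWILIO_CALLER_ID,
        body=body)

# caller: enqueue the SMS instead of sending it inline
send_sms.apply_async(kwargs={'to': '+91xxxxxxxxx3', 'body': 'your message text'},
                     countdown=300)  # seconds until send, computed as in the question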

Running asynchronous code synchronously in separate thread

I'm using Django Channels to support websockets and am using their concept of a group to broadcast messages to multiple consumers in the same group. In order to send messages outside of a consumer, you need to call asynchronous methods in otherwise synchronous code. Unfortunately, this is presenting problems when testing.
I began by using loop.run_until_complete:
loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.ensure_future(
    channel_layer.group_send(group_name, {'text': json.dumps(message),
                                          'type': 'receive_group_json'}),
    loop=loop))
Then the stacktrace read that the thread did not have an event loop: RuntimeError: There is no current event loop in thread 'Thread-1'. To solve this, I added:
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
loop.run_until_complete(asyncio.ensure_future(
    channel_layer.group_send(group_name, {'text': json.dumps(message),
                                          'type': 'receive_group_json'}),
    loop=loop))
And now the stacktrace reads RuntimeError: Event loop is closed, although if I add print statements, loop.is_closed() prints False.
For context, I'm using Django 2.0, Channels 2, and a redis backend.
Update: I tried running this in a Python interpreter (outside of py.test, to remove moving variables). When I ran the second code block, I did not get an Event loop is closed error (that may be due to something on pytest's end, whether it is timeouts, etc.). But I did not receive the group message in my client. I did, however, see a print statement:
({<Task finished coro=<RedisChannelLayer.group_send() done, defined at /Users/my/path/to/venv/lib/python3.6/site-packages/channels_redis/core.py:306> result=None>}, set())
Update 2: After flushing redis, I added a fixture in py.test to flush it for every function, as well as a session-scoped event loop. This time it yielded yet another print from RedisChannelLayer:
({<Task finished coro=<RedisChannelLayer.group_send() done, defined at /Users/my/path/to/venv/lib/python3.6/site-packages/channels_redis/core.py:306> exception=RuntimeError('Task <Task pending coro=<RedisChannelLayer.group_send() running at /Users/my/path/to/venv/lib/python3.6/site-packages/channels_redis/core.py:316>> got Future <Future pending> attached to a different loop',)>}, set())
If channel_layer expects to reside in its own event loop in another thread, you will need to get hold of that event loop object. Once you have it, you can submit coroutines to it and synchronize with your thread, like this:
def wait_for_coro(coro, loop):
    # submit the coroutine to the event loop running in the other thread
    # and block until it completes
    future = asyncio.run_coroutine_threadsafe(coro, loop)
    return future.result()

wait_for_coro(channel_layer.group_send(group_name, ...), channel_loop)
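For completeness, one way such a dedicated loop (the channel_loop above) might be created and kept running in a background thread; the thread setup below is an illustrative assumption, not something Channels provides out of the box:

import asyncio
import threading

# create a dedicated event loop and run it forever in a background thread
channel_loop = asyncio.new_event_loop()

def _run_loop(loop):
    asyncio.set_event_loop(loop)
    loop.run_forever()

threading.Thread(target=_run_loop, args=(channel_loop,), daemon=True).start()

# synchronous code can now submit coroutines to channel_loop
# via asyncio.run_coroutine_threadsafe, as wait_for_coro does above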
By default, only the main thread gets an event loop; calling get_event_loop in other threads will fail.
If you need an event loop in another thread, such as a thread handling an HTTP or WebSockets request, you need to make it yourself with new_event_loop. After that you can use set_event_loop, and future get_event_loop calls will work. I do this:
# get or create an event loop for the current thread
def get_thread_event_loop():
    try:
        loop = asyncio.get_event_loop()  # gets the previously set event loop, if possible
    except RuntimeError:
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
    return loop
More here.

Celery Storing unrecoverable task failures for later resubmission

I'm using the djkombu transport for my local development, but I will probably be using amqp (rabbit) in production.
I'd like to be able to iterate over failures of a particular type and resubmit them. This would be in the case of something failing on a server or some edge-case bug triggered by a new variation in the data.
So I could be resubmitting jobs up to 12 hours later after some bug is fixed or a third party site is back up.
My question is: Is there a way to access old failed jobs via the result backend and simply resubmit them with the same params etc?
You can probably access old jobs using:
CELERY_RESULT_BACKEND = "database"
and in your code:
from djcelery.models import TaskMeta
task = TaskMeta.objects.filter(task_id='af3185c9-4174-4bca-0101-860ce6621234')[0]
but I'm not sure you can find the arguments that the task was started with... Maybe something with TaskState...
I've never used it this way. But you might want to consider the task.retry feature?
An example from celery docs:
@task()
def task(*args):
    try:
        some_work()
    except SomeException, exc:
        # Retry in 24 hours.
        raise task.retry(*args, countdown=60 * 60 * 24, exc=exc)
From IRC
<asksol> dpn`: task args and kwargs are not stored with the result
<asksol> dpn`: but you can create your own model and store it there
(for example using the task_sent signal)
<asksol> we don't store anything when the task is sent, only send a
message. but it's very easy to do yourself
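Following that suggestion, a minimal sketch of recording task arguments yourself with the task_sent signal; the SentTask model and its fields are hypothetical and only for illustration:

# signals.py -- illustrative only; SentTask is a hypothetical model
import json

from celery.signals import task_sent
from myapp.models import SentTask  # assumed model with task_id/name/args/kwargs fields

@task_sent.connect
def record_task(sender=None, task_id=None, task=None, args=None, kwargs=None, **extra):
    # store enough information to resubmit the task later with the same params
    SentTask.objects.create(
        task_id=task_id,
        name=task,
        args=json.dumps(args or []),
        kwargs=json.dumps(kwargs or {}),
    )

A failed task could then be looked up by its task_id and resubmitted with the stored args and kwargs.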
This was what I was expecting, but hoped to avoid.
At least I have an answer now :)

How to display remote email message?

I have been using this code to display IMAP4 messages:
void DisplayMessageL( const TMsvId &aId )
    {
    // 1. construct the client MTM
    TMsvEntry indexEntry;
    TMsvId serviceId;
    User::LeaveIfError( iMsvSession->GetEntry(aId, serviceId, indexEntry) );
    CBaseMtm* mtm = iClientReg->NewMtmL(indexEntry.iMtm);
    CleanupStack::PushL(mtm);
    // 2. construct the user interface MTM
    CBaseMtmUi* uiMtm = iUiReg->NewMtmUiL(*mtm);
    CleanupStack::PushL(uiMtm);
    // 3. display the message
    uiMtm->BaseMtm().SwitchCurrentEntryL(indexEntry.Id());
    CMsvOperationWait* waiter = CMsvOperationWait::NewLC();
    waiter->Start(); // we use a synchronous waiter
    CMsvOperation* op = uiMtm->OpenL(waiter->iStatus);
    CleanupStack::PushL(op);
    CActiveScheduler::Start();
    // 4. cleanup of the items pushed above
    CleanupStack::PopAndDestroy(4); // op, waiter, uiMtm, mtm
    }
However, when the user attempts to download a remote message (i.e. one of the emails not previously retrieved from the mail server) and then cancels the request, my code remains blocked and never receives notification that the action was cancelled.
My questions are:
what is the workaround for the above, so the application does not get stuck?
can anyone provide a working example of an asynchronous call for opening remote messages that does not panic and crash the application?
Asynchronous calls for POP3, SMTP and local IMAP4 messages work perfectly, but remote IMAP4 messages cause this issue.
I am testing these examples on S60 5th edition.
Thank you all in advance.
First of all, I would try removing CMsvOperationWait and deal with the open request asynchronously, i.e. have an active object waiting for the CMsvOperation to complete.
CMsvOperationWait is nothing more than a convenience to make an asynchronous operation appear synchronous, and my suspicion is that this is the culprit: in the download-then-show-message case, there are two asynchronous operations chained.