twisted self.transport.write working inside loop - python-2.7

I have the following code for the client which sends some data to server after every 8 seconds and following is my code
class EchoClient(LineReceiver):
def connectionMade(self):
makeByteList()
self.transport.write(binascii.unhexlify("7777"))
while 1:
print "hello"
lep = random.randint(0,4)
print lep
print binascii.unhexlify(sendHexBytes(lep))
try:
self.transport.write("Hello")
self.transport.write(binascii.unhexlify(sendHexBytes(lep)))
except Exception, ex1:
print "Failed to send"
time.sleep(8)
def lineReceived(self, line):
pass
def dataReceived(self, data):
print "receive:", data
Every statement inside while loop execute except self.transport.write. The server doesn't receive any data. Also self.transport.write outside while loop doesn't execute. In both cases no exception is raised, but if I remove while loop the statement outside loop executes correctly. Why is this happening? Please correct me where I am making mistake?

All methods in twisted are asynchronous. All of the the methods such as connectionMade and lineReceived are happening on the same thread. The Twisted reactor runs a loop (called an event loop) and it calls methods such as connectionMade and lineReceived when these events happen.
You have an infinite loop in connectionMade. Once Python gets into that loop, it can never get out. Twisted calls connectionMade when connection is established, and your code stays there forever. Twisted has no opportunity to actually write the data to the transport, or receive data, it is stuck in connectionMade!
When you write Twisted code, the important point that you must understand is that you may not block on the Twisted thread. For example, let's say I want to send a "Hello" 4 seconds after a client connects. I might write this:
class EchoClient(LineReceiver):
def connectionMade(self):
time.sleep(4)
self.transport.write("Hello")
but this would be wrong. What happens if 2 clients connect at the same time? The first client will go into connectionMade, and my program will hang for 4 seconds until the "Hello" is sent.
The Twisted way to do this would be like this:
class EchoClient(LineReceiver):
def connectionMade(self):
reactor.callLater(4, self.sendHello)
def sendHello(self):
self.transport.write("Hello")
Now, when Twisted enters connectionMade, it calls reactor.callLater to schedule an event 4 seconds in the future. Then it exits connectionMade and continues doing all the other stuff it needs to do. Until you grasp the concept of async programming you can't continue in Twisted. I suggest you read through the Twisted docs here.
Finally, an unrelated note: If you have a LineReceiver, you should not implement your own dataReceived, it will make lineReceived not called. LineReceiver is a protocol which implements its own dataReceived which buffers and breaks up data into lines and calls lineReceived methods.

Related

kafka-python: Closing the kafka producer with 0 vs inf secs timeout

I am trying to produce the messages to a Kafka topic using kafka-python 2.0.1 using python 2.7 (can't use Python 3 due to some workplace-related limitations)
I created a class as below in a separate and compiled the package and installed in virtual environment:
import json
from kafka import KafkaProducer
class KafkaSender(object):
def __init__(self):
self.producer = self.get_kafka_producer()
def get_kafka_producer(self):
return KafkaProducer(
bootstrap_servers=['locahost:9092'],
value_serializer=lambda x: json.dumps(x),
request_timeout_ms=2000,
)
def send(self, data):
self.producer.send("topicname", value=data)
My driver code is something like this:
from mypackage import KafkaSender
# driver code
data = {"a":"b"}
kafka_sender = KafkaSender()
kafka_sender.send(data)
Scenario 1:
I run this code, it runs just fine, no errors, but the message is not pushed to the topic. I have confirmed this as offset or lag is not increased in the topic. Also, nothing is getting logged at the consumer end.
Scenario 2:
Commented/removed the initialization of Kafka producer from __init__ method.
I changed the sending line from
self.producer.send("topicname", value=data) to self.get_kafka_producer().send("topicname", value=data) i.e. creating kafka producer not in advance (during class initialization) but right before sending the message to topic. And when I ran the code, it worked perfectly. The message got published to the topic.
My intention using scenario 1 is to create a Kafka producer once and use it multiple times and not to create Kafka producer every time I want to send the messages. This way I might end up creating millions of Kafka producer objects if I need to send millions of messages.
Can you please help me understand why is Kafka producer behaving this way.
NOTE: If I write the Kafka Code and Driver code in same file it works fine. It's not working only when I write the Kafka code in separate package, compile it and import it in my another project.
LOGS: https://www.diffchecker.com/dTtm3u2a
Update 1: 9th May 2020, 17:20:
Removed INFO logs from the question description. I enabled the DEBUG level and here is the difference between the debug logs between first scenario and the second scenario
https://www.diffchecker.com/dTtm3u2a
Update 2: 9th May 2020, 21:28:
Upon further debugging and looking at python-kafka source code, I was able to deduce that in scenario 1, kafka sender was forced closed while in scenario 2, kafka sender was being closed gracefully.
def initiate_close(self):
"""Start closing the sender (won't complete until all data is sent)."""
self._running = False
self._accumulator.close()
self.wakeup()
def force_close(self):
"""Closes the sender without sending out any pending messages."""
self._force_close = True
self.initiate_close()
And this depends on whether kafka producer's close() method is called with timeout 0 (forced close of sender) or without timeout (in this case timeout takes value float('inf') and graceful close of sender is called.)
Kafka producer's close() method is called from __del__ method which is called at the time of garbage collection. close(0) method is being called from method which is registered with atexit which is called when interpreter terminates.
Question is why in scenario 1 interpreter is terminating?

Running asynchronous code synchronously in separate thread

I'm using Django Channels to support websockets and am using their concept of a group to broadcast messages to multiple consumers in the same group. In order to send messages outside of a consumer, you need to call asynchronuous methods in otherwise synchronous code. Unfortunately, this is presenting problems when testing.
I began by using loop.run_until_complete:
loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.ensure_future(channel_layer.group_send(group_name, {'text': json.dumps(message),
'type': 'receive_group_json'}),
loop=loop))
Then the stacktrace read that the thread did not have an event loop: RuntimeError: There is no current event loop in thread 'Thread-1'.. To solve this, I added:
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
loop.run_until_complete(asyncio.ensure_future(channel_layer.group_send(group_name, {'text': json.dumps(message),
'type': 'receive_group_json'}),
loop=loop))
And now the stacktrace is reading the RuntimeError: Event loop is closed, although if I add print statements loop.is_closed() prints False.
For context, I'm using Django 2.0, Channels 2, and a redis backend.
Update: I tried running this in a Python interpreter (outside of py.test to remove moving variables). When I ran the second code block, I did not get an Event loop is closed error (that may be due to something on Pytest's end whether its timeouts, etc). But, I did not receive the group message in my client. I did, however, see a print statement:
({<Task finished coro=<RedisChannelLayer.group_send() done, defined at /Users/my/path/to/venv/lib/python3.6/site-packages/channels_redis/core.py:306> result=None>}, set())
Update 2: After flushing redis, I added a fixture in py.test to flush it for every function as well as a session-scoped event loop. This time yielding yet another print from RedisChannelLayer:
({<Task finished coro=<RedisChannelLayer.group_send() done, defined at /Users/my/path/to/venv/lib/python3.6/site-packages/channels_redis/core.py:306> exception=RuntimeError('Task <Task pending coro=<RedisChannelLayer.group_send() running at /Users/my/path/to/venv/lib/python3.6/site-packages/channels_redis/core.py:316>> got Future <Future pending> attached to a different loop',)>}, set())
If channel_layer expects to reside in its own event loop in another thread, you will need to get a hold of that event loop object. Once you have it, you can submit coroutines to it and synchronize with your thread, like this:
def wait_for_coro(coro, loop):
# submit coroutine to the event loop in the other thread
# and wait for it to complete
future = asyncio.run_coroutine_threadsafe(coro, loop)
return future.wait()
wait_for_coro(channel_layer.group_send(group_name, ...), channel_loop)
By default, only the main thread gets an event loop and calling get_event_loop in other threads will fail.
If you need an event loop in another thread -- such as a thread handling an HTTP or WebSockets request -- you need to make it yourself with new_event_loop. After that you can use set_event_loop and future get_event_loop calls will work. I do this:
# get or create an event loop for the current thread
def get_thread_event_loop():
try:
loop = asyncio.get_event_loop() # gets previously set event loop, if possible
except RuntimeError:
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
return loop
More here.

boost async rest client

I currently working on a async rest client using boost::asio::io_service.
I am trying to make the client as a some kind of service for a bigger program.
The idea is that the client will execute async http requests to a rest API, independently from the thread running the main program. So inside in the client will be another thread waiting for a request to send.
To pass the requests to the client I am using a io_service and io_service::work initialized with the io_service. I almost reused the example given on this tutorial - logger_service.hpp.
My problem is that when in the example they post a work to the service, the called handler is a simple function. In my case as I am making async calls like this
(I have done the necessary to run all the instancies of the following objects and some more in a way to be able to establish the network connection):
boost::asio::io_service io_service_;
boost::asio::io_service::work work_(io_service_); //to prevent the io_service::run() to return when there is no more work to do
boost::asio::ssl::stream<boost::asio::ip::tcp::socket> socket_(io_service_);
In the main program I am doing the following calls:
client.Connect();
...
client.Send();
client.Send();
...
Some client's pseudo code:
void MyClass::Send()
{
...
io_service_.post(boost::bind(&MyClass::AsyncSend, this);
...
}
void MyClass::AsyncSend()
{
...
boost::io_service::asio::async_write(socket, streamOutBuffer, boost::bind(&MyClass::handle_send, this));
...
}
void MyClass::handle_send()
{
boost::io_service::asio::async_read(socket, streamInBuffer, boost::bind(&MyClass::handle_read, this));
}
void MyClass::handle_read()
{
// ....treatment for the received data...
if(allDataIsReceived)
FireAnEvent(ReceivedData);
else
boost::io_service::asio::async_read(socket, streamInBuffer, boost::bind(&MyClass::handle_read, this));
}
As it is described in the documentation the 'post' method requests the io_service to invoke the given handler and return immediately. My question is, will be the nested handlers, for example the ::handle_send in the AsyncSend, called just after (when the http response is ready) when post() is used? Or the handlers will be called in another order different from the one defined by the order of post() calls ?
I am asking this question because when I call only once client->Send() the client seems to "work fine". But when I make 2 consecutive calls, as in the example above, the client cannot finish the first call and than goes to execute the second one and after some chaotic executions at the end the 2 operations fail.
Is there any way to do what I'm describing execute the whole async chain before the execution of another one.
I hope, I am clear enough with my description :)
hello Blacktempel,
Thank you for the given comment and the idea but however I am working on a project which demands using asynchronous calls.
In fact, as I am newbie with Boost my question and the example I gave weren't right in the part of the 'handle_read' function. I add now a few lines in the example in a way to be more clear in what situation I am (was).
In fact in many examples, may be all of them, who are treating the theme how to create an async client are very basic... All they just show how to chain the different handlers and the data treatment when the 'handle_read' is called is always something like "print some data on the screen" inside of this same read handler. Which, I think, is completely wrong when compared to real world problems!
No one will just print data and finish the execution of her program...! Usually once the data is received there is another treatment that has to start, for example FireAnEvent(). Influenced by the bad examples, I have done this 'FireAnEvent' inside the read handler, which, obviously is completely wrong! It is bad to do that because making the things like that, the "handle_read" might never exit or exit too late. If this handler does not finish, the io_service loop will not finish too. And if your further treatment demands once again to your async client to do something, this will start/restart (I am not sure about the details) the io_service loop. In my case I was doing several calls to the async client in this way. At the end I saw how the io_service was always started but never ended. Even after the whole treatment was ended, I never saw the io_service to stop.
So finally I let my async client to fill some global variable with the received data inside the handle_read and not to call directly another function like FireAnEvent. And I moved the call of this function (FireAnEvent) just after the io_service.run(). And it worked because after the end of the run() method I know that the loop is completely finished!
I hope my answer will help people :)

Stop a Twisted Fiber Mid-Execution

There are many ways to create a Python Twisted fiber. For example, one could call reactor.callWhenRunning(helloWorld). helloWorld() will execute and the fiber will stop executing when helloWorld() returns.
What if half way through executing helloWorld() I wanted to stop the fiber's execution without impacting the rest of the fibers? How would I do that?
If the execution is inside helloWorld() itself, then I could simply return from the method. But, what if the program is 10 nested calls deep? How would I stop the fiber's execution from continuing? I suppose I could make all 10 methods return immediately but that would be very difficult to code for a large program with 1000s of methods.
I could raise an exception. This would work unless some method in the call stack (besides the reactor) catches the exception.
I could do the following. However, this will add a lot of pending Deferreds to pile up in the Twisted reactor.
while True:
d = defer.Deferred()
d.delay = reactor.callLater(sys.maxint, d.callback, None)
yield d
Are there any other solutions?
Note: A Python 2.6 solution would be ideal.
The solution is to simply call cancel() on the Deferred before yielding. The code does not continue execution after the yield.
d = defer.Deferred()
d.delay = reactor.callLater(sleepTime, d.callback, None)
d.cancel()
yield d
returnValue(None)

Stopping a function in Python using a timeout

I'm writing a healthcheck endpoint for my web service.
The end point calls a series of functions which return True if the component is working correctly:
The system is considered to be working if all the components are working:
def is_health():
healthy = all(r for r in (database(), cache(), worker(), storage()))
return healthy
When things aren't working, the functions may take a long time to return. For example if the database is bogged down with slow queries, database() could take more than 30 seconds to return.
The healthcheck endpoint runs in the context of a Django view, running inside a uWSGI container. If the request / response cycle takes longer than 30 seconds, the request is harakiri-ed!
This is a huge bummer, because I lose all contextual information that I could have logged about which component took a long time.
What I'd really like, is for the component functions to run within a timeout or a deadline:
with timeout(seconds=30):
database_result = database()
cache_result = cache()
worker_result = worker()
storage_result = storage()
In my imagination, as the deadline / harakiri timeout approaches, I can abort the remaining health checks and just report the work I've completely.
What's the right way to handle this sort of thing?
I've looked at threading.Thread and Queue.Queue - the idea being that I create a work and result queue, and then use a thread to consume the work queue while placing the results in result queue. Then I could use the thread's Thread.join function to stop processing the rest of the components.
The one challenge there is that I'm not sure how to hard exit the thread - I wouldn't want it hanging around forever if it didn't complete it's run.
Here is the code I've got so far. Am I on the right track?
import Queue
import threading
import time
class WorkThread(threading.Thread):
def __init__(self, work_queue, result_queue):
super(WorkThread, self).__init__()
self.work_queue = work_queue
self.result_queue = result_queue
self._timeout = threading.Event()
def timeout(self):
self._timeout.set()
def timed_out(self):
return self._timeout.is_set()
def run(self):
while not self.timed_out():
try:
work_fn, work_arg = self.work_queue.get()
retval = work_fn(work_arg)
self.result_queue.put(retval)
except (Queue.Empty):
break
def work(retval, timeout=1):
time.sleep(timeout)
return retval
def main():
# Two work items that will take at least two seconds to complete.
work_queue = Queue.Queue()
work_queue.put_nowait([work, 1])
work_queue.put_nowait([work, 2])
result_queue = Queue.Queue()
# Run the `WorkThread`. It should complete one item from the work queue
# before it times out.
t = WorkThread(work_queue=work_queue, result_queue=result_queue)
t.start()
t.join(timeout=1.1)
t.timeout()
results = []
while True:
try:
result = result_queue.get_nowait()
results.append(result)
except (Queue.Empty):
break
print results
if __name__ == "__main__":
main()
Update
It seems like in Python you've got a few options for timeouts of this nature:
Use SIGALARMS which work great if you have full control of the signals used by the process but probably are a mistake when you're running in a container like uWSGI.
Threads, which give you limited timeout control. Depending on your container environment (like uWSGI) you might need to set options to enable them.
Subprocesses, which give you full timeout control, but you need to be conscious of how they might change how your service consumes resources.
Use existing network timeouts. For example, if part of your healthcheck is to use Celery workers, you could rely on AsyncResult's timeout parameter to bound execution.
Do nothing! Log at regular intervals. Analyze later.
I'm exploring the benefits of these different options more.
Update #2
I put together a GitHub repo with quite a bit more information on the topic:
https://github.com/johnboxall/pytimeout
I'll type it up into a answer one day but the TLDR is here:
https://github.com/johnboxall/pytimeout#recommendations