Purpose for bonobo-etl Tee node

A tee operation is expected to take an input and produce two copies of it as output.
I noticed that bonobo-etl features Tee nodes, but it's not clear how they are intended to be used.
Can they be used to fork the running graph into two directions?
Or are they intended for a Load-type persistence action, used without interrupting the data flow through that particular node?

The bonobo.Tee(f: Callable) operation just applies a function and passes the stream input, unmodified, to the stream output.
Although the name obviously comes from the unix tool tee (as you pointed out), it is not a perfect match: in the bonobo version, one output is the stream output and the other is just the callable you provide. This callable may or may not send data to a stream (and sending data to a stream is kind of hackish for now).
As an example, if you use Tee(print), the stream will be passed both to the output and to print.
As another, more realistic example, you should be able to do the following:
import bonobo
import queue

output_queue = queue.Queue()

def get_graph():
    graph = bonobo.Graph()
    graph >> range(100) >> bonobo.Tee(output_queue.put) >> print
    return graph

if __name__ == "__main__":
    with bonobo.parse_args() as options:
        bonobo.run(get_graph())

    while True:
        try:
            print("out:", output_queue.get_nowait())
        except queue.Empty:
            break
Hope that helps.

Related

Reusing FFMPEG AVFilterGraph

In my code I apply the same filtering to multiple input files. In the first version of the code I created an AVFilterGraph for every input, but I think this might be excessive.
However, when I try to reuse the same graph, I run into an error when sending a frame to the abuffer filter. During the previous iteration over the input files, I passed EOF to it for flushing, and the av_buffersrc_add_frame function has a check for this:
BufferSourceContext *s = ctx->priv;
...
if (s->eof)
    return AVERROR(EINVAL);
which causes the call to fail on the second iteration.
Unfortunately, I couldn't find any function that can reset the buffer filter or anything similar.
I would like to know whether avfilter allows reusing a filter graph once it has been created, or whether there is a fundamental misconception in my understanding of ffmpeg's logic in passing input after EOF.
Thank you!

count all events in stream with babeltrace API

I have an LTTng trace, which I am parsing using the babeltrace API. I was wondering if I could count all events in the trace (or stream) without iterating over them. Which functions from the public API can I use to do that?
The very nature of CTF makes it impossible to count the event records of a given packet in constant time. The packet's context could include an event record count field somehow, but it's not specified, so generic tools would not use it.
Thus the only way to count events is to iterate the event records, unfortunately. The easiest way is to count the number of lines that the text format of the babeltrace(1) tool prints:
babeltrace /path/to/ctf/trace/directory | wc --lines
This works as long as there's one line per printed event record, which is the case unless an event record contains a string field which has a newline (currently not escaped in the text output).
You may also wish to consider discarded event records. They are not printed to the standard output by babeltrace(1), but the tool prints a message including the count to the standard error when they are detected.
There's no way with the current babeltrace(1) tool to only print the event records which belong to the packets of a given data stream. If you need this, what I suggest is that you remove all the data stream files except the one for which you need an event record count, and run the command above again.
Also consider the Babeltrace Python bindings, for example (not tested):
import babeltrace

def count_ctf_event_records(path):
    trace_collection = babeltrace.TraceCollection()
    trace_collection.add_trace(path, 'ctf')
    return sum(1 for event in trace_collection.events)

if __name__ == '__main__':
    import sys
    print(count_ctf_event_records(sys.argv[1]))
Saved as count.py, you can try this:
python3 count.py /path/to/ctf/trace/directory
Counting the event records of a specific data stream with the Python bindings is left as an exercise for the reader.
Having said this, I don't know if the Python bindings approach is faster than the babeltrace(1) one.

Reliably write and read to serial using python

I am communicating with a Fona 808 module from a Raspberry Pi and I can issue AT commands, yay!
Now I want to make a python program where I can reliably issue AT commands using shortcut commands like "b" for getting the battery level and so on.
This is what I have so far:
import serial

con = serial.Serial('/dev/ttyAMA0', timeout=0.2, baudrate=115200)

def sendAtCommand(command):
    if command == 'b':
        con.write("at+cbc\n".encode())
    reply = ''
    while con.inWaiting():
        reply = reply + con.read(1)
    return reply

while True:
    x = raw_input("At command: ")
    if x.strip() == 'q':
        break
    reply = sendAtCommand(x)
    print(reply)

con.close()
In the sendAtCommand I will have a bunch of if statements that send different at commands depending on the input it receives.
This is somewhat working but is very unreliable. Sometimes I get the full message, other times I get nothing, and then a double message the next time, and so on.
I would like to create one method that issues a command to the Fona module and then reads the full response and returns it.
Any suggestions?
Your loop quits if the 'modem' has not yet responded to your AT command. You should keep reading the serial input until you get a linefeed, or until a certain amount of time has passed, e.g. 1 second or so.
Okay. It turns out this is pretty trivial.
Since AT commands always return OK after a successful query, it is simply a matter of reading lines until one of them contains 'OK\r\n'.
Like so:
def readUntilOK():
    reply = ''
    while True:
        x = con.readline()
        reply += x
        if x == 'OK\r\n':
            return reply
This has no timeout and does not check for anything other than an OK response, which makes it very limiting. Adding error handling is up to the reader; something like if x == 'ERROR\r\n' would be a good start.
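For instance, here is a minimal sketch of what that could look like, assuming the same con serial object and Python 2 string semantics as in the code above (the function name and the 1-second default timeout are just illustrative):

import time

def readUntilFinal(timeout=1.0):
    # Keep reading lines until the modem replies OK or ERROR,
    # or until the timeout expires.
    reply = ''
    deadline = time.time() + timeout
    while time.time() < deadline:
        x = con.readline()
        reply += x
        if x == 'OK\r\n' or x == 'ERROR\r\n':
            break
    return reply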
Cheers!

c++ software passing arguments method

I have a problem related to passing arguments to a compiled C++ executable. The program emulates the behaviour of a particular inference engine: the setup of the engine is loaded at runtime from an XML file, and then I want to call it from the command line with different input values.
The characteristics of the input are:
Every time that I call the program, the input structure is different, because the system itself is different.
The input is a set of {name, value} pairs, one for each part of the system.
I have to separate the configuration XML from the input.
I call the program from a PHP or Node.js server, since it returns a result that I expose to the outside through an API.
Input values are obtained from an HTTP POST request.
By now I have tried these solutions:
Pass them on the command line, e.g. "./mysoftware input1 value1 input2 value2 ...". A little uncomfortable, since I have up to 200 inputs.
Create a file with all the name,value pairs and then call the program, which parses the file and deletes it at the end. This is a performance bottleneck for my API, because on every call I have to create and delete a file.
Does anyone know a better way to approach this problem?
3. Pass the values to the program via the standard input stream and read them from std::cin inside your C++ program.
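For example, here is a minimal sketch of the caller side, shown in Python purely for illustration (standing in for the PHP or Node.js server; the ./mysoftware invocation, the config.xml argument and the "name value" line format are assumptions, not part of the original program):

import subprocess

# Hypothetical name/value pairs coming from the HTTP POST request.
inputs = {"input1": "value1", "input2": "value2"}

# One "name value" pair per line, sent over the child's standard input.
payload = "\n".join("{} {}".format(name, value) for name, value in inputs.items())

result = subprocess.run(
    ["./mysoftware", "config.xml"],  # assumed invocation; the XML config stays on the command line
    input=payload,
    capture_output=True,
    text=True,
)
print(result.stdout)

The C++ side would then read the pairs from std::cin line by line, with no temporary file involved.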

ZMQ IOLoop instance write/read workflow

I am seeing weird system behavior when using PyZMQ's IOLoop instance:
import time

import zmq
from zmq.eventloop import ioloop, zmqstream

def main():
    context = zmq.Context()
    s = context.socket(zmq.REP)
    s.bind('tcp://*:12345')
    stream = zmqstream.ZMQStream(s)
    stream.on_recv(on_message)

    io_loop = ioloop.IOLoop.instance()
    io_loop.add_handler(some_file.fileno(), on_file_data_ready_read_and_then_write, io_loop.READ)
    io_loop.add_timeout(time.time() + 10, another_handler)
    io_loop.start()

def on_file_data_ready_read_and_then_write(fd, events):
    # Read content of the file and then write back
    some_file.read()
    print "Read content"
    some_file.write("blah")
    print "Wrote content"

def on_message(msg):
    # Do something...
    pass

if __name__ == '__main__':
    main()
Basically the event loop listens on zmq port 12345 for JSON requests, and reads content from a file when available (when it does, it manipulates the content and writes it back; the file is a special /proc/ entry from a kernel module that was built for that).
Everything works well BUT, for some reason, when looking at the strace output I see the following:
...
1. read(\23424) <--- Content read from file
2. write("read content")
3. write("Wrote content")
4. POLLING
5. write(\324324) # <---- THIS is the content that was sent using some_file.write()
...
So it seems the write to the file was not done in the order of the Python script: the write system call for that file was issued AFTER the polling, even though it should have happened between lines 2 and 3.
Any ideas?
Looks like you're running into a buffering problem. If some_file is a file-like object, you can try explicitly calling .flush() on it; the same goes for the ZMQ socket, which can hold messages for efficiency reasons as well.
As it stands, the file's contents are being flushed when the some_file reference is garbage collected.
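For example, here is a minimal sketch of the handler with an explicit flush added (assuming, as in the original snippet, that some_file is an ordinary Python file object opened elsewhere):

def on_file_data_ready_read_and_then_write(fd, events):
    # Read the current content of the file.
    some_file.read()
    print "Read content"
    # Write back and flush immediately so the write() syscall happens here,
    # not when the file object is later garbage collected.
    some_file.write("blah")
    some_file.flush()
    print "Wrote content"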
Additionally, you can use the context manager logic that newer versions of Python provide with open():
with open("my_file", "r+") as some_file:
    some_file.write("blah")
As soon as execution leaves this context, some_file will automatically be flushed and closed.