How to feed bokeh's streaming interface from a C++ application - c++

I want to use bokeh to display a time series and provide the data with updates via source.stream(someDict). The data is, however, generated by a C++ application (the server) that may run on the same machine or on another machine on the network. I was looking into transmitting the updated data (only the newly added rows of the time series) via ZMQ to the Python program (the client).
The transmission of the message itself seems easy enough to implement, but:
The dictionary is column-based. Would it not be more efficient to append rows, i.e. one row per point in time, and send those?
If there is no good way to do the first, what kind of object should I send? Do I need to marshal the information, or is it sufficient to build a long string like {col1:[a,b,c,...], col2:[...],...} and send that to the client? I expect to send no more than a few hundred rows of 10 floats each per second.
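For illustration, here is a rough sketch of what I have in mind on the C++ side. It is only a sketch under assumptions of my own: it uses the plain libzmq C API, a PUSH socket on an arbitrary port, and hand-rolled JSON with placeholder column names "time" and "value"; none of this comes from bokeh itself.

    // Sketch only: libzmq C API, PUSH socket, hand-built JSON payload.
    // Port number and column names ("time", "value") are placeholders.
    #include <zmq.h>
    #include <sstream>
    #include <string>
    #include <vector>

    // Serialize one batch of new rows into the column-based form that
    // source.stream() expects on the Python side: {"time":[...],"value":[...]}
    std::string make_payload(const std::vector<double>& t,
                             const std::vector<double>& v)
    {
        std::ostringstream out;
        out << "{\"time\":[";
        for (std::size_t i = 0; i < t.size(); ++i)
            out << (i ? "," : "") << t[i];
        out << "],\"value\":[";
        for (std::size_t i = 0; i < v.size(); ++i)
            out << (i ? "," : "") << v[i];
        out << "]}";
        return out.str();
    }

    int main()
    {
        void* ctx  = zmq_ctx_new();
        void* sock = zmq_socket(ctx, ZMQ_PUSH);  // paired with a PULL socket in Python
        zmq_bind(sock, "tcp://*:5556");          // port is arbitrary

        const std::string payload =
            make_payload({0.0, 0.1, 0.2}, {1.0, 2.0, 3.0});
        zmq_send(sock, payload.data(), payload.size(), 0);

        zmq_close(sock);
        zmq_ctx_term(ctx);
        return 0;
    }

On the Python side the idea would be to json.loads() the received string into a dict and pass that dict to source.stream().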
Thanks for all helpful answers.

Related

How to send/receive XML data with sockets in Qt using string?

I have a Qt TCP Server and Client program which can interact with each other. The Server can send some function-generated data to the socket using QTextStream, and the Client reads the data from the socket using a simple readAll() and displays it in a QTextEdit.
Now my data on the Server side is huge (around 7000+ samples) and I need the data to appear on the Client side instantaneously. I have learned that using XML will help in my case, so I made a Qt XML Server that generates the whole XML data into a .xml file. I read the .xml file on the Client side and can display its contents; I used the DOM method for parsing. But the data is only displayed once all 7000+ samples have been generated on the Server side.
I need clarifications on these questions:
How do I write each element on the XML Server side into a String and send it through the socket? I learned that tagName() can help me, but I have not been able to figure out how.
Is there any way other than the String method to get a single element generated on the Server side to appear on the Client side?
PS: I am a newbie, forgive my ignorance. Thank you.
Most DOM XML parsers require a complete, well-formed XML document before they'll do anything with it. That's precisely what you see: your data is processed only after all of the samples have been received.
You need to use an incremental parser that doesn't care about the XML document not being complete yet.
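As a rough sketch of what that could look like in Qt, assuming QXmlStreamReader and an element name "sample" that I've made up (use whatever your server actually emits), not anything taken from your code:

    // Sketch only: Qt's QXmlStreamReader parsing data as it arrives.
    #include <QXmlStreamReader>
    #include <QByteArray>
    #include <QString>
    #include <QDebug>

    class SampleParser {
    public:
        // Call this from the socket's readyRead() handler with socket->readAll().
        void feed(const QByteArray& chunk) {
            m_reader.addData(chunk);
            while (!m_reader.atEnd()) {
                switch (m_reader.readNext()) {
                case QXmlStreamReader::StartElement:
                    m_text.clear();
                    break;
                case QXmlStreamReader::Characters:
                    m_text += m_reader.text();
                    break;
                case QXmlStreamReader::EndElement:
                    if (m_reader.name() == QLatin1String("sample"))
                        qDebug() << "sample:" << m_text;  // element is complete, use it
                    break;
                default:
                    break;
                }
            }
            // PrematureEndOfDocumentError simply means "wait for more data".
            if (m_reader.hasError()
                    && m_reader.error() != QXmlStreamReader::PrematureEndOfDocumentError)
                qWarning() << "XML error:" << m_reader.errorString();
        }
    private:
        QXmlStreamReader m_reader;
        QString m_text;
    };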
On the other hand: if you're not requiring XML for interoperability with 3rd party systems, you're probably wasting a lot of resources by using it. I don't know where you've "learned" that XML will "help in your case". To me it's not learning, it's just following the crowd without understanding what's going on. Is your requirement to use XML or to move the data around? Moving data around has been a well understood problem for decades. Computers "speak" binary. No need to work around it, you know. If all you need is to move around some numbers, use QDataStream and be done with it. It'll be two orders of magnitude faster than the fastest XML parsers, you'll transmit an order of magnitude less data, and everyone will live happily ever after*.
*living happily ever after not guaranteed, individual results may vary.
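To make the QDataStream suggestion concrete, here is a minimal sketch, assuming a connected QTcpSocket on both ends; the Sample struct and its two fields are placeholders for whatever your function actually generates:

    // Sketch only: shipping raw numbers with QDataStream instead of XML.
    #include <QTcpSocket>
    #include <QDataStream>

    struct Sample { quint32 index; double value; };

    // Server side: serialize one sample straight onto the socket.
    void sendSample(QTcpSocket* socket, const Sample& s)
    {
        QDataStream out(socket);
        out.setVersion(QDataStream::Qt_5_0);   // pin the format on both ends
        out << s.index << s.value;
    }

    // Client side: drain complete samples inside the readyRead() handler.
    void readSamples(QTcpSocket* socket)
    {
        QDataStream in(socket);
        in.setVersion(QDataStream::Qt_5_0);
        const qint64 recordSize = sizeof(quint32) + sizeof(double);  // 4 + 8 bytes on the wire
        while (socket->bytesAvailable() >= recordSize) {
            Sample s;
            in >> s.index >> s.value;
            // ...append s to the QTextEdit / plot here...
        }
    }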

prioritizing torrent download sequences using libtorrent

Suppose I have 2+ clients (developed by me) ALL using libtorrent ( http://www.rasterbar.com/products/libtorrent/manual.html#queuing )
Can I effectively prioritize the download of a file so that clients fetch the file's pieces/chunks (whatever the torrent terminology is here) from the beginning of the file towards its end, rather than in an essentially random order?
(Of course I'm allowing some "multiplexing"/"intertwining" of pieces for reasons of availability and performance, but the goal here is to download as linearly and quickly from the start of the file towards the end as possible.)
The goal I'm thinking about here is obviously previewing the file quickly. How to do this most effectively using libtorrent / possibly other C++ torrent library?
(I'm not really interested in torrent implementations in languages that don't compile to machine code, like Java or Python - I need native code for reasons of performance and security, so C, C++ or possibly D would all fit the bill.)
You can certainly prioritize pieces and files with torrent_handle::prioritize_pieces() and torrent_handle::prioritize_files(). See the documentation.
This won't be enough to download in order, though. To do that, you can enable sequential download with torrent_handle::set_sequential_download(). This will issue new piece requests in order. Keep in mind that the time a request takes to be satisfied varies a lot depending on which peer you talk to, so making the requests in order does not necessarily mean receiving the pieces in order.
There is another mechanism that attempts to do that. torrent_handle::set_piece_deadline() sets a target completion time for a piece. Such pieces are considered time-critical; they are ordered by deadline and the fastest peers are used to request blocks from them, so libtorrent attempts to download them in deadline order.
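A minimal sketch of combining the two, assuming a 1.x-style libtorrent API and a torrent_handle h for a torrent that has already been added (the size of the "preview window" and the deadlines are arbitrary values of mine):

    // Sketch only: prefer linear download for quick previewing.
    #include <libtorrent/session.hpp>
    #include <libtorrent/torrent_handle.hpp>

    void prefer_linear_download(libtorrent::torrent_handle& h)
    {
        // Issue new piece requests in file order.
        h.set_sequential_download(true);

        // Additionally mark the first pieces as time-critical so the fastest
        // peers are used for them and they tend to *arrive* in order too.
        int const pieces_to_rush = 20;               // arbitrary preview window
        for (int i = 0; i < pieces_to_rush; ++i)
            h.set_piece_deadline(i, 1000 * (i + 1)); // deadline in milliseconds from now
    }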
Now, I also get the impression that you want two separate clients (presumably running on different machines) to coordinate which pieces they download. Is that right? It's not entirely clear what you're asking about, but there's no simple way of asking libtorrent to do that.
You could write a plugin for libtorrent that implements a new extension message for these clients to chat and coordinate, which could de-select certain pieces the other client is downloading by setting their priority to 0.

Replace strings in large file

I have a server-client application where clients are able to edit data in a file stored on the server side. The problem is that the file is too large to load into memory (8 GB+). There could be around 50 string replacements per second invoked by the connected clients, so copying the whole file and replacing the specified string with the new one is out of the question.
I was thinking about saving all changes in a cache on the server side and performing all the replacements after reaching a certain amount of data. At that point I would perform the update by copying the file in small chunks and replacing the specified parts.
This is the only idea I came up with but I was wondering if there might be another way or what problems I could encounter with this method.
When you have more than 8 GB of data that is edited by many users simultaneously, you are far beyond what can be handled with a flat file.
You seriously need to move this data to a database. Regarding your comment that "the file content is no fit for a database": sorry, but I don't believe you. Especially regarding your remark that "many people can edit it" - that's one more reason to use a database. On a filesystem, only one user at a time can have write access to a file. But a database allows concurrent write access for multiple users.
We could help you to come up with a database schema, when you open a new question telling us how your data is structured exactly and what your use-cases are.
You could use some form of indexing on your data (kept in a separate file) to allow quick access to the relevant parts of this gigantic file; we've done this successfully with large files (~200-400 GB). But as Phillipp mentioned, you should move that data to a database, especially for the concurrent read/write access. Some frameworks (like OSG) already come with a database back-end for 3D terrain data, so you can peek at how they do it.
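As a minimal sketch of the separate-index idea, under assumptions of my own (fixed-length records, so an in-place overwrite never shifts the rest of the file, and an in-memory index mapping a record id to its byte offset; the class name and record size are placeholders):

    // Sketch only: overwrite fixed-length records in place via an offset index.
    #include <cstdint>
    #include <fstream>
    #include <string>
    #include <unordered_map>

    constexpr std::size_t kRecordSize = 128;  // placeholder fixed record length

    class IndexedFile {
    public:
        explicit IndexedFile(const std::string& path)
            : file_(path, std::ios::in | std::ios::out | std::ios::binary) {}

        void add_to_index(std::uint64_t id, std::uint64_t offset) {
            index_[id] = offset;              // in practice, persist this to the index file
        }

        // Overwrite one record in place; pads/truncates to the fixed length.
        bool replace(std::uint64_t id, std::string value) {
            auto it = index_.find(id);
            if (it == index_.end()) return false;
            value.resize(kRecordSize, ' ');
            file_.seekp(static_cast<std::streamoff>(it->second));
            file_.write(value.data(), static_cast<std::streamsize>(kRecordSize));
            file_.flush();
            return file_.good();
        }

    private:
        std::fstream file_;
        std::unordered_map<std::uint64_t, std::uint64_t> index_;
    };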

M3U playlists...why redundant files?

I'm taking a look at some random Icecast playlists (available here: http://dir.xiph.org/index.php) and I'm wondering why many seem to contain a list of the same mp3 file.
e.g.:
GET http://dir.xiph.org/listen/193289/listen.m3u
http://87.230.101.49:80/top100station.mp3
http://87.230.103.107:80/top100station.mp3
http://87.230.101.16:80/top100station.mp3
http://87.230.101.78:80/top100station.mp3
http://87.230.101.11:80/top100station.mp3
http://87.230.103.85:80/top100station.mp3
http://80.237.158.87:80/top100station.mp3
http://87.230.101.30:80/top100station.mp3
http://80.237.158.88:80/top100station.mp3
http://87.230.103.9:80/top100station.mp3
http://87.230.103.58:80/top100station.mp3
http://87.230.101.12:80/top100station.mp3
http://87.230.101.24:80/top100station.mp3
http://87.230.103.60:80/top100station.mp3
http://87.230.103.8:80/top100station.mp3
http://87.230.101.25:80/top100station.mp3
http://87.230.101.56:80/top100station.mp3
http://87.230.101.50:80/top100station.mp3
For what it's worth, Icecast streams are intended for playing those Shoutcast-type live streams (think live radio over the internet), so it makes sense that there wouldn't be a list of different tracks... but what are those repeats? Different bitrates or just mirrors?
I'm asking all of this because I'm looking to stream one of those mp3s inside my mobile app, so I'm wondering whether there is any need to somehow figure out which URL to use...
Internet radio streams are typically mirrored across many servers. This balances the bandwidth load, and reduces points of failure.
In addition, it is common for servers to fill up as they get popular. Most players will move on to the next entry in the playlist when a track fails, so this provides automatic failover when a client cannot connect to a specific server, or when it gets disconnected.

How to measure the amount of data transmitted by my MPI program?

I'm experimenting my distributed clustering algorithm (implemented with MPI) on 24 computers that I set up as a cluster using BCCD (Bootable Cluster CD) that can be downloaded at http://bccd.net/.
I've written a batch program to run my experiment, which consists of running my algorithm several times while varying the number of nodes and the size of the input data.
I want to know the amount of data used in the MPI communications for each run of my algorithm, so I can see how it changes when the previously mentioned parameters vary. And I want to do all this automatically with the batch program.
Someone told me to use tcpdump, but I found some difficulties in this approach.
First, I don't know how to call tcpdump from my batch program (which is written in C++ and uses system() for making calls) before each run of my algorithm, since tcpdump requires another terminal to run in parallel with my application. And I can't run tcpdump on another computer, since the network uses a switch, so I need to run it on the master node.
Second, I watched the traffic with tcpdump while my experiment was going on and I couldn't figure out which port MPI was using. It seems to use many ports. I wanted to know that in order to filter the packets.
Third, I tried capturing whole packets and saving them to a file using tcpdump, and within a few seconds the file was 3.5 MB. But my whole experiment takes 2 days, so the final log file would be huge with this approach.
The ideal approach would be to capture just the size field in the header of each packet and sum these up to obtain the total amount of data transmitted. That way the log file would be much smaller than if I captured whole packets. But I don't know how to do it.
Another restriction is that I don't have access to the computers' disks, so I only have the RAM and my 4 GB USB flash drive. I can't keep huge log files.
I have already thought about using an MPI tracing or profiling tool such as those mentioned at http://www.open-mpi.org/faq/?category=perftools. I have only tested Sun Performance Analyzer so far. The problem is that I guess it will be difficult, maybe even impossible, to install those tools on BCCD. In addition, such a tool will make my experiment take longer to finish, since it adds overhead. But if someone is familiar with BCCD and thinks one of these tools is a good choice, please let me know.
I hope someone has a solution.
Approaches like tcpdump won't work anyway if there are multi-core nodes that use shared memory to communicate.
Using something like MPE is almost certainly the way to go. Those tools add very little overhead, and some overhead is always going to be necessary if you want to count messages. You can use mpitrace to write out every MPI call, and parse the resulting text file yourself. By the way, note that MPE is explicitly discussed on the bccd website. MPICH2 comes with MPE built in, but it can be compiled for any implementation. I've only found a very modest overhead for MPE.
IPM is another nice tool that counts messages and sizes; you should be able either to parse the XML output, or to use the postprocessing tools and just manually integrate the graphs (say, either bytes_rx/bytes_tx by rank, or the message buffer size/count graph). The overhead for IPM is even less than for MPE, and it mostly comes after the program has finished running, to do the file I/O.
If you were really worried about the overhead of either of these approaches, you could always write your own MPI wrappers using the profiling interface: wrap MPI_Send, MPI_Recv, etc., count the number of bytes sent and received per process, and output only that total at the end.
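A minimal sketch of such a wrapper, assuming an MPI-3 style const-correct signature; only MPI_Send is shown, and real code would wrap MPI_Isend, MPI_Recv and so on as well:

    // Sketch only: profiling-interface (PMPI) wrapper that counts bytes sent.
    #include <mpi.h>
    #include <cstdio>

    static long long g_bytes_sent = 0;

    // The linker resolves the application's MPI_Send to this wrapper,
    // which forwards to the real implementation via PMPI_Send.
    int MPI_Send(const void* buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm)
    {
        int type_size = 0;
        PMPI_Type_size(datatype, &type_size);
        g_bytes_sent += static_cast<long long>(count) * type_size;
        return PMPI_Send(buf, count, datatype, dest, tag, comm);
    }

    // Print each rank's total just before MPI shuts down.
    int MPI_Finalize()
    {
        int rank = 0;
        PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
        std::printf("rank %d sent %lld bytes\n", rank, g_bytes_sent);
        return PMPI_Finalize();
    }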