Kaldi - how to share a language model among multiple decoders? - c++

I'm using Kaldi to decode lots of audio samples every day. My plan is to have multiple decoders running in parallel, all decoding against the same language model. For this it would be nice if I could share one language model, loaded into memory once, among multiple decoders. The model I have right now is 1 GB on disk and uses around 3 GB in memory, so it would be great to save that memory by reusing it.
Has anyone ever thought about such a thing? Is it doable?
I have not found anything about it in the Kaldi documentation.
I was thinking of using the boost::interprocess library to manage the fst::VectorFst object returned by fst::ReadFstKaldi(), as this is the biggest object. But this looks like a big undertaking: it is a complex custom object, and I'm not sure boost::interprocess can handle those. I don't want to go down the road of customizing the Kaldi objects to make them work with boost memory sharing.
Any other ideas about this approach?

You do not need multiple processes; you can simply share the fst object across threads. It is constant, so there is no need to protect it. Create a decoder in every worker, each holding a pointer to the shared fst; the decoders are separate per thread. You can use boost::asio's io_service to dispatch incoming requests.
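A minimal sketch of that layout, using the fst::ReadFstKaldi() loader from the question; the header paths, the example file name, and the per-thread decoder (here only a comment) are assumptions, not verified Kaldi API:

```cpp
// Load the FST once, hand a const reference to N worker threads.
#include "fst/fstlib.h"              // OpenFst: fst::VectorFst, fst::StdArc
#include "fstext/kaldi-fst-io.h"     // Kaldi: fst::ReadFstKaldi (assumed path)

#include <functional>
#include <memory>
#include <thread>
#include <vector>

void DecodeWorker(const fst::VectorFst<fst::StdArc>& shared_fst, int id) {
  // Each thread builds its OWN decoder around the shared read-only FST,
  // e.g. kaldi::LatticeFasterDecoder decoder(shared_fst, config);
  // The FST is never written after loading, so no locking is needed.
  // ... pull utterances from a work queue and decode them here ...
}

int main() {
  // Loaded once: ~3 GB resident no matter how many threads decode.
  std::unique_ptr<fst::VectorFst<fst::StdArc>> hclg(
      fst::ReadFstKaldi("HCLG.fst"));   // example path

  std::vector<std::thread> workers;
  for (int i = 0; i < 8; ++i)
    workers.emplace_back(DecodeWorker, std::cref(*hclg), i);
  for (auto& t : workers) t.join();
}
```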

Related

C++ Multithreading objects from library with static variables

I created several "manager" objects from a library, each with different parameters. Every cycle, a manager is fed a data set, runs its calculations, and writes the result into a data structure. I have to run all managers on the same data set as fast as possible, so I created a thread pool to distribute the data to all managers so they can run concurrently. Each manager has access to its own result data structure, so I thought this would be thread safe.
However, I later found out that several classes in this library, which the managers use, have static member variables, which (I believe) cause segmentation faults - the segmentation faults originate in the library, not my code (checked).
My question is: is it possible to get around this? It probably sounds stupid, but is it possible to force each manager to use its own copy of the library, thus circumventing the static issue? I am processing ~20-50k data sets per second, so I cannot afford much overhead. Using forks would be very painful and in my case could create unwanted overhead.
Thanks for any advice!
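To make the hazard concrete, here is a minimal sketch of why a static member breaks this otherwise-safe layout; Manager and its static scratch member are hypothetical stand-ins for the library classes:

```cpp
#include <thread>
#include <vector>

// Hypothetical stand-in for a library class with hidden static state.
struct Manager {
  static std::vector<double> scratch;   // shared by ALL instances
  double run(const std::vector<double>& data) {
    scratch = data;                     // threads race on 'scratch'
    double sum = 0;
    for (double v : scratch) sum += v;  // may read the other thread's data,
    return sum;                         // or memory freed by a reallocation
  }
};
std::vector<double> Manager::scratch;

int main() {
  Manager a, b;                         // separate objects, same static
  std::vector<double> d1(1000, 1.0), d2(10, 2.0);
  std::thread t1([&] { a.run(d1); });   // concurrent writes to 'scratch':
  std::thread t2([&] { b.run(d2); });   // undefined behavior, can segfault
  t1.join(); t2.join();
}
```

Separate Manager instances do not help, because a static lives once per process; that is why the usual escape hatches are one process per manager, or loading a private copy of the shared library per instance (e.g. dlmopen with LM_ID_NEWLM on Linux).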

MPI large data processing

My MPI application reads a series of images to build a 3-D data set. The data is very large (about 4 GB), and I don't want it replicated to every worker, but I don't know how to achieve this. Shared memory may be one solution, but how do I use shared memory with MPI? I have searched a lot about this and found nothing good. Could someone give me suggestions or examples for large-data processing with MPI? (BTW, I am using the Open MPI implementation.)
Thank you very much for your great help.
What you are looking for are the one-sided communications added in MPI-2; they are available in Open MPI. For an introduction, have a look at http://www.linux-mag.com/id/1793/ .
The principle is that you create a window (an area of memory exposed to other processes), and then you can get or put data through that window. MPI should optimize this to use RMA where available. There are also mechanisms such as fences to ensure synchronization across processes.
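A minimal sketch of that pattern, where only rank 0 holds the volume and the other ranks fetch slices on demand; the sizes and counts are placeholders:

```cpp
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  const MPI_Aint kVoxels = 1 << 20;        // placeholder size, not 4 GB
  std::vector<float> volume;
  if (rank == 0) volume.resize(kVoxels);   // only rank 0 holds the data

  // Expose rank 0's buffer as an RMA window; other ranks expose nothing.
  MPI_Win win;
  MPI_Win_create(rank == 0 ? volume.data() : nullptr,
                 rank == 0 ? kVoxels * sizeof(float) : 0,
                 sizeof(float), MPI_INFO_NULL, MPI_COMM_WORLD, &win);

  MPI_Win_fence(0, win);                   // open an access epoch (collective)
  std::vector<float> slice(1024);
  if (rank != 0)                           // pull a slice from rank 0
    MPI_Get(slice.data(), 1024, MPI_FLOAT, /*target_rank=*/0,
            /*target_disp=*/0, 1024, MPI_FLOAT, win);
  MPI_Win_fence(0, win);                   // close the epoch; data is usable

  MPI_Win_free(&win);
  MPI_Finalize();
}
```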

Transferring data between threads in C++ and Fortran

I need to move large amounts of data (~10^6 floats) between multiple C++ threads and a Fortran thread. At the moment we use Windows shared memory to move very small pieces of data, mainly for communication, and then save a file in a proprietary format to move the bulk of the data. I've been asked to look at moving the bulk data via shared memory too, but looking at the shared-memory techniques in Windows (seemingly a raw character buffer) this looks like a mess. Another possibility is boost's interprocess communication, but I'm not sure how to use that from Fortran, or whether it's a good idea. Another idea was to use a database like sqlite.
I'm just wondering if anyone has any experience with this or would like to comment, as it is a little over my head at the moment.
Thanks very much
Jim
Use pipes. If you can inherit handles between processes, you can use anonymous pipes; if not, you have to use named pipes. Also, threads share the same address space, so you're probably thinking of processes when you say threads.
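For the named-pipe route on Windows, here is a minimal writer sketch in C++; the pipe name and payload are placeholders, and the Fortran side would open \\.\pipe\bulk_data via CreateFile or an equivalent binding:

```cpp
#include <windows.h>
#include <vector>

int main() {
  // Placeholder name; the reader opens the same path.
  HANDLE pipe = CreateNamedPipeA(
      "\\\\.\\pipe\\bulk_data",
      PIPE_ACCESS_OUTBOUND,            // this end only writes
      PIPE_TYPE_BYTE | PIPE_WAIT,      // raw byte stream, blocking calls
      1,                               // single instance
      1 << 20, 1 << 20,                // out/in buffer size hints
      0, nullptr);
  if (pipe == INVALID_HANDLE_VALUE) return 1;

  if (ConnectNamedPipe(pipe, nullptr)) {        // wait for the reader
    std::vector<float> data(1000000, 1.0f);     // ~10^6 floats, as in the question
    DWORD written = 0;
    WriteFile(pipe, data.data(),
              static_cast<DWORD>(data.size() * sizeof(float)),
              &written, nullptr);
  }
  CloseHandle(pipe);
}
```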

Store huge amount of data in memory

I am looking for a way to store several GB of data in memory. The data is loaded into a tree structure. I want to be able to access this data through my main function, but I'm not interested in reloading the data into the tree every time I run the program. What is the best way to do this? Should I create a separate program for loading the data and then call it from the main function, or are there better alternatives?
thanks
Mads
I'd say the best alternative would be using a database - which would then be your "separate program for loading the data".
If you are using a POSIX-compliant system, take a look at mmap.
Windows uses different functions to memory-map a file (CreateFileMapping / MapViewOfFile).
You could probably solve this using shared memory: have one long-lived process build the tree and expose its address, and then other processes that start up can attach to that same memory for querying. Note that in that case you will need to make sure the tree is safe to read by multiple simultaneous processes. If the reads really are pure reads, that should be easy enough.
You should look into a technique called a Memory mapped file.
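Both of these suggestions boil down to the same POSIX calls. A minimal sketch, with the object name and size as placeholders: shm_open() creates a named region that a long-lived loader and later-started readers can both map, and replacing shm_open() with a plain open() of a data file gives the memory-mapped-file variant.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstring>

int main() {
  const size_t kSize = 1 << 30;  // placeholder: 1 GB region

  // Loader side: create and size a named shared-memory object.
  int fd = shm_open("/tree_data", O_CREAT | O_RDWR, 0600);
  if (fd < 0 || ftruncate(fd, kSize) != 0) return 1;

  // Map it; a reader process would shm_open("/tree_data", O_RDONLY, 0)
  // and mmap with PROT_READ to see the same bytes without reloading.
  void* base = mmap(nullptr, kSize, PROT_READ | PROT_WRITE,
                    MAP_SHARED, fd, 0);
  if (base == MAP_FAILED) return 1;

  std::memcpy(base, "tree goes here", 15);  // build the structure in place

  munmap(base, kSize);
  close(fd);
  // shm_unlink("/tree_data") when the data should finally go away.
}
```

One caveat either way: a pointer-based tree cannot be dropped into the region as-is, because each process may map it at a different address; the links need to be stored as offsets from the region base.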
I think the best solution is to set up a cache server and put the data there.
Look into Ehcache:
Ehcache is an open source, standards-based cache used to boost performance, offload the database and simplify scalability. Ehcache is robust, proven and full-featured, and this has made it the most widely-used Java-based cache.
It's written in Java, but should support any language you choose:
The Cache Server has two APIs: RESTful resource oriented, and SOAP. Both support clients in any programming language.
You must be running a 64-bit system to use more than 4 GB of memory. If you build the tree and store it in a global variable, you can access the tree and its data from any function in the program. I suggest you try an alternative method that consumes less memory, though; if you post what type of program and what type of tree you are using, I can perhaps help you find one.
Since you don't want to keep reloading the data, file storage and databases are out of the question, but several gigabytes of memory still seem like a hefty price.
Also note that on Windows systems you can access the memory of another process using ReadProcessMemory(); all you need is a pointer to the location of the memory.
You may alternatively implement the data loader as an executable program and the main program as a DLL that is loaded and unloaded on demand. That way you can keep the data in memory and still be able to modify the processing code without reloading all the data or doing cross-process memory sharing.
Also, if you can operate on the raw data from disk without preprocessing it (e.g. without putting it into a tree or manipulating pointers into its internals), you may want to memory-map the data and avoid loading unused portions of it.

How to implement a shared buffer?

I've got one program which creates three worker programs. The preferable method of communication in my situation would be a memory buffer which all four programs may access.
Is there a way to pass a pointer, reference, or any kind of handle to the child processes?
Update
The three child programs transform vertex data, while the main program primarily deals with the UI, system messages, errors, etc.
I'm hoping there is some way to leverage OpenCL so that the four programs can share a context. If this is not possible, it would be nice to at least have access to the array of vertices across all programs.
I suppose our target platform is Windows right now, but we'd like to keep it as cross-platform as possible. If there is no way to implement this using OpenCL, we'll probably fall back to wrapping this piece of code for a handful of different platforms.
Your question is platform-dependent, therefore:
for Windows: Named Shared Memory
for Linux: mmap or POSIX shared memory
general case: boost::interprocess (see the sketch below)
If you explain a bit what kind of data is shared and the other constraints/goals of the system, it would be easier to answer your question.
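For the general (cross-platform) case, a minimal boost::interprocess sketch; the segment name and vertex count are placeholders, and each of the four programs would run the same mapping code:

```cpp
#include <boost/interprocess/shared_memory_object.hpp>
#include <boost/interprocess/mapped_region.hpp>

namespace bip = boost::interprocess;

int main() {
  // Create (or open) a named region sized for the vertex array.
  bip::shared_memory_object shm(bip::open_or_create, "vertex_buffer",
                                bip::read_write);
  shm.truncate(sizeof(float) * 3 * 100000);   // xyz for 100k vertices

  // Every process (parent and the three workers) maps the same name.
  bip::mapped_region region(shm, bip::read_write);
  float* vertices = static_cast<float*>(region.get_address());
  vertices[0] = 1.0f;                         // visible to all mappers

  // bip::shared_memory_object::remove("vertex_buffer") when done.
}
```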
I wonder why you think a shared buffer would be good. Is it because you want to put a pointer to the data to be worked on into the buffer? Then you need shared memory if you want it to work across processes.
What about a client-server approach where you send data to the clients on request?
More information about your problem would help in giving a better answer.
You should use Named Shared Memory and inter-process synchronization.
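A minimal sketch of the Windows side; the mapping name and size are placeholders, and a real version would pair this with a named mutex or event for the synchronization part:

```cpp
#include <windows.h>

int main() {
  const DWORD kSize = 1 << 20;   // placeholder: 1 MB buffer

  // Create (or open, if it already exists) a named, pagefile-backed mapping.
  HANDLE mapping = CreateFileMappingA(
      INVALID_HANDLE_VALUE,      // not backed by a file on disk
      nullptr, PAGE_READWRITE,
      0, kSize,                  // high/low 32 bits of the size
      "Local\\shared_buffer");   // all four programs use this name
  if (!mapping) return 1;

  // Map the shared bytes into this process's address space.
  void* view = MapViewOfFile(mapping, FILE_MAP_ALL_ACCESS, 0, 0, kSize);
  if (!view) return 1;

  static_cast<char*>(view)[0] = 42;  // visible to every process that maps it

  UnmapViewOfFile(view);
  CloseHandle(mapping);
}
```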
This is somewhat broader than the original question on shared memory buffers, but depending on your design, data volume, and performance requirements, you could look into in-memory databases such as Redis, or into distributed caches, especially if you find yourself in a 'publish-subscribe' situation.