Monitor buffers in GNU Radio - c++

I have a question regarding buffering between blocks in GNU Radio. I know that each block in GNU Radio (including custom blocks) has buffers to store items that are going to be sent or that have been received. In my project, there is a certain sequence I have to maintain to synchronize events between blocks. I am using GNU Radio on the Xilinx ZC706 FPGA platform with the FMCOMMS5.
In GNU Radio Companion I created a custom block that controls a GPIO output port on the board. In addition, I have an independent source block that is feeding information into the FMCOMMS GNU Radio block. The sequence I am trying to maintain is this: first I send data to the FMCOMMS block, second I make sure that the data got consumed by the FMCOMMS block (essentially by checking the buffer), and finally I control the GPIO output.
From my observations, the source block buffer doesn’t seem to send the items until it’s full. This will cause a major issue in my project because this means that the GPIO data will be sent before or in parallel with sending the items to the other GNU blocks. That’s because I’m setting the GPIO value through direct access to its address in the ‘work’ function of my custom block.
I tried to use pc_output_buffers_full() in the ‘work’ function of my custom source in order to monitor the buffer, but I’m always getting 0.00. I’m not sure if it’s supposed to be used in custom blocks or if the ‘buffer’ in this case is something different from where the output items are stored. Here's a small code snippet which shows the problem:
char level_count = 0, level_val = 1;
vector<float> buff (1, 0.0000);
for(int i=0; i< noutput_items; i++)
{
    if(level_count < 20 && i< noutput_items)
    {
        out[i] = gr_complex((float)level_val,0);
        level_count++;
    }
    else if(i<noutput_items)
    {
        level_count = 0;
        level_val ^=1;
        out[i] = gr_complex((float)level_val,0);
    }
    buff = pc_output_buffers_full();
    for (int n = 0; n < buff.size(); n++)
        cout << fixed << setw(5) << setprecision(2) << setfill('0') << buff[n] << " ";
    cout << "\n";
}
Is there a way to monitor the buffer so that I can determine when the first part of my data bits has been sent? Or is there a way to make sure that each single output item is being sent like a continuous stream to the next block(s)?
GNU Radio Companion version: 3.7.8
OS: Linaro 14.04 image running on the FPGA

Or is there a way to make sure that each single output item is being sent like a continuous stream to the next block(s)?
Nope, that's not how GNU Radio works (at all!):
A while back I wrote an article that explains how GNU Radio deals with buffers, and what these actually are. While the in-memory architecture of GNU Radio buffers might be of lesser interest to you, let me quickly summarize the dynamics of it:
The buffers that (general_)work functions are called with behave, for all practical purposes, like linearly addressable ring buffers. You get a random number of samples at once (which can be restricted to a minimum number, or to multiples of a number), and everything that you do not consume will be handed to you again the next time work is called.
These buffers hence keep track of how much you've consumed, and thus, how much free space is in a buffer.
The input buffer a block sees is actually the output buffer of the "upstream" block in the flow graph.
GNU Radio's computation is backpressure-controlled: Any block's work method will immediately be called in an endless loop given that:
There's enough input for the block to do work,
There's enough output buffer space to write to.
Therefore, as soon as one block finishes its work call, the upstream block is informed that there's new free output space, which typically leads to it running.
That leads to high parallelism, since even adjacent blocks can run simultaneously without conflicting.
This architecture favors large chunks of input items, especially for blocks that take a relatively long time to compute: while the block is still working, its input buffer is already being filled with chunks of samples; when it's finished, chances are it's immediately called again with all the available input buffer already filled with new samples.
This architecture is asynchronous: even if two blocks are "parallel" in your flow graph, there's no defined temporal relation between the numbers of items they produce.
I'm not even convinced that switching GPIOs at times that depend on the speed of computation in this data flow graph model, whose timing is completely non-deterministic, is a good idea to begin with. Maybe you'd rather want to calculate "timestamps" at which GPIOs should be switched, and send (timestamp, gpio state) command tuples to some entity in your FPGA that keeps absolute time? On the scale of radio propagation and high-rate signal processing, CPU timing is really inaccurate, and you should use the fact that you have an FPGA to actually implement deterministic timing, and use the software running on the CPU (i.e. GNU Radio) to determine when that should happen.
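To make the (timestamp, gpio state) idea a bit more concrete, here is a minimal sketch of how such command tuples could be computed on the CPU side. Everything in it is an assumption for illustration: the GpioCommand struct, the function names, the notion of an FPGA tick counter with an integer number of ticks per sample, and the idea that sample 0 leaves the converter at start_tick. How the commands actually reach the FPGA is left out entirely.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical command format: "set the GPIO to `state` at absolute time `tick`".
struct GpioCommand {
    std::uint64_t tick;  // FPGA clock ticks since an agreed-upon epoch (assumption)
    bool state;          // level the GPIO should take at that tick
};

// Map a sample index to an FPGA tick, assuming sample 0 is emitted at
// `start_tick` and the FPGA clock runs at an integer multiple
// (`ticks_per_sample`) of the sample rate.
std::uint64_t sample_to_tick(std::uint64_t sample_index,
                             std::uint64_t start_tick,
                             std::uint64_t ticks_per_sample)
{
    return start_tick + sample_index * ticks_per_sample;
}

// Build one command per level change for a signal that changes level every
// `samples_per_level` samples, covering `total_samples` samples in total.
std::vector<GpioCommand> schedule_toggles(std::uint64_t total_samples,
                                          std::uint64_t samples_per_level,
                                          std::uint64_t start_tick,
                                          std::uint64_t ticks_per_sample)
{
    assert(samples_per_level > 0);
    std::vector<GpioCommand> commands;
    bool state = true;  // the first level is "high"
    for (std::uint64_t n = 0; n < total_samples; n += samples_per_level) {
        commands.push_back({sample_to_tick(n, start_tick, ticks_per_sample), state});
        state = !state;
    }
    return commands;
}
```

The point of the design is that the CPU only decides when something should happen, in units of samples, while the entity that keeps absolute time performs the switching.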
Is there a way to monitor the buffer so that I can determine when my first part of data bits have been sent?
Other than that, a method to asynchronously tell another block that, yes, you've processed N samples, would be either to have a single block that just observes the outputs of both blocks that you want to synchronize and consumes an identical number of samples from both inputs, or to implement something using message passing. Again, my suspicion is that this is not a solution to your actual problem.

Related

Dealing with complex send recv message within a for loop

I am trying to parallelise a biological model in C++ with boost::mpi. It is my first attempt, and I am entirely new to the boost library (I have started from the Boost C++ Libraries book by Schaling). The model consists of grid cells and cohorts of individuals living within each grid cell. The classes are nested, such that a vector of Cohorts* belongs to a GridCell. The model runs for 1000 years, and at each time step, there is dispersal such that the cohorts of individuals move randomly between grid cells. I want to parallelise the content of the for loop, but not the loop itself as each time step depends on the state of the previous time.
I use world.send() and world.recv() to send the necessary information from one rank to another. Because sometimes there is nothing to send between ranks, I use mpi::status and world.iprobe() to make sure the code does not hang waiting for a message that was never sent (I followed this tutorial).
The first part of my code seems to work fine, but I am having trouble making sure all the sent messages have been received before moving on to the next step in the for loop. In fact, I noticed that some ranks move on to the following time step before the other ranks have had the time to send their messages (or at least that is what it looks like from the output).
I am not posting the code because it consists of several classes and it’s quite long. If interested, the code is on github. Here is the rough pseudocode; I hope it will be enough to understand the problem.
int main()
{
    // initialise the GridCells and Cohorts living in them
    // depending on the number of cores requested split the
    // grid cells that are processed by each core evenly, and
    // store the relevant grid cells in a vector of GridCell*
    // start to loop through each time step
    for (int k = 0; k < (burnIn+simTime); k++)
    {
        // calculate the survival and reproduction probabilities
        // for each Cohort and the dispersal probability
        // the dispersing Cohorts are sorted based on the rank of
        // the destination and stored in multiple vector<Cohort*>
        // I send the vector<Cohort*> with
        world.send(…)
        // the receiving rank gets the vector of Cohorts with:
        mpi::status statuses[world.size()];
        for(int st = 0; st < world.size(); st++)
        {
            ....
            if( world.iprobe(st, tagrec) )
                statuses[st] = world.recv(st, tagrec, toreceive[st]);
            // world.iprobe ensures that the code doesn't hang when there
            // are no dispersers
        }
        // do some extra calculations here
        // wait that all processes are received, and then the time step ends.
        // This is the bit where I am stuck.
        // I've seen examples with wait_all for the non-blocking isend/irecv,
        // but I don't think it is applicable in my case.
        // The problem is that I noticed that some ranks proceed to the next
        // time step before all the other ranks have sent their messages.
    }
}
I compile with
mpic++ -I/$HOME/boost_1_61_0/boost/mpi -std=c++11 -Llibdir -lboost_mpi -lboost_serialization -lboost_locale -o out
and execute with mpirun -np 5 out, but I would like to be able to execute with a higher number of cores on an HPC cluster later on (the model will be run at the global scale, and the number of cells might depend on the grid cell size chosen by the user).
The compilers installed are g++ (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0, Open MPI: 2.1.1
The fact that you have nothing to send is an important piece of information in your scenario. You cannot deduce that fact from the absence of a message alone. The absence of a message only means that nothing has been sent yet.
Simply sending a zero-sized vector and skipping the probing is the easiest way out.
Otherwise you would probably have to change your approach radically or implement a very complex speculative execution / rollback mechanism.
Also note that the linked tutorial uses probe in a very different fashion.
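To illustrate why always sending a (possibly empty) vector removes the need for probing, here is a toy single-process model of that protocol. It deliberately contains no MPI calls: the Mailboxes type, the function names and the use of integer IDs in place of cohorts are all made up for this sketch. In the real program the same idea would presumably map to an unconditional world.send to every other rank and an unconditional world.recv from every other rank in each time step.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Toy stand-in for a message: the IDs of the cohorts that disperse.
using Message = std::vector<int>;

// mailbox[dst][src] holds the messages sent from rank `src` to rank `dst`.
using Mailboxes = std::vector<std::vector<std::queue<Message>>>;

// Every rank sends exactly one message to every other rank, even when it has
// nothing to send: an empty vector is the explicit "no dispersers" signal.
// outgoing[src][dst] is what rank `src` wants to hand to rank `dst`.
void send_all(Mailboxes& mailbox, const std::vector<std::vector<Message>>& outgoing)
{
    const std::size_t ranks = mailbox.size();
    for (std::size_t src = 0; src < ranks; ++src) {
        for (std::size_t dst = 0; dst < ranks; ++dst) {
            if (src != dst) {
                mailbox[dst][src].push(outgoing[src][dst]);
            }
        }
    }
}

// Because one message per peer is guaranteed, rank `me` knows exactly how
// many messages to wait for. In this toy model the blocking receive is
// replaced by an assertion that the message is already there.
// Returns the total number of cohorts received in this time step.
std::size_t receive_all(Mailboxes& mailbox, std::size_t me)
{
    std::size_t cohorts = 0;
    for (std::size_t src = 0; src < mailbox.size(); ++src) {
        if (src == me) {
            continue;
        }
        assert(!mailbox[me][src].empty());
        cohorts += mailbox[me][src].front().size();
        mailbox[me][src].pop();
    }
    return cohorts;
}
```

Once every rank has received one message from each peer, it knows the time step's communication is complete, which is exactly the information iprobe could not provide. One caveat when translating this to MPI: with blocking sends, large messages may stall until the matching receive is posted, so the non-blocking isend together with wait_all may be the safer pairing.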

AT command response parser

I am working on my own implementation to read AT command responses from a modem using a microcontroller and C/C++.
But (there is always a but) I have two "threads" in my program. In the first one I compare the possible reply from the modem using strcmp, which I believe is terribly slow.
Comparing function:
if (strcmp(reply, m_buffer) == 0)
{
    memset(buffer, 0, buffer_size);
    buffer_size = 0;
    memset(m_buffer, 0, m_buffer_size);
    m_buffer_size = 0;
    return 0;
}
else
    return 1;
This one works fine for me with AT commands like AT or AT+CPIN?, where the last response from the modem is "OK" with nothing in between, but it does not work with commands like AT+CREG?, where the modem responds with:
+REG: n,n
OK
I am expecting "+REG: n,n", but I believe strncpy is very slow and my buffer data gets replaced by "OK".
The second "thread" enables a UART RX interrupt and replaces my buffer data every time it receives new data.
Interrupt handler:
m_buffer_size = buffer_size;
strncpy(m_buffer, buffer, buffer_size + m_buffer_size);
Do you know of anything faster than strcmp, or of a way to improve the reading of AT command responses?
This has the scent of an XY Problem.
If you have seen the buffer contents being overwritten, you might want to look into a thread-safe queue to deliver messages from the RX thread to the parsing thread. That way, even if a second message arrives while you're processing the first, you won't run into "buffer overwrite" problems.
Move the data out of the receive buffer and place it in another buffer. Two buffers is rarely enough, so create a pool of buffers. In the past I have used linked lists of pre-allocated buffers to keep fragmentation down, but depending on the memory management and caching smarts in your microcontroller, and the language you elect to use, something along the lines of std::deque may be a better choice.
So:
Make a list of free buffers.
The UART handling thread loop looks something like:
    Get a buffer from the free list
    Read into the buffer until full or timeout
    Pass the buffer to the parser
    The parser puts the buffer in its own receive list
    The parser sends a signal to wake up its thread
Repeat until terminated. If the free list is emptied, your program is probably still too slow to keep up. Perhaps adding more buffers will allow the program to get through a busy period, but if the data flow is relatively constant and the free list empties out... Well, you have a problem.
The parser loop also repeats until terminated and looks like:
    If the receive list is not empty,
        Get a buffer from the receive list
        Process the buffer
        Return the buffer to the free list
    Otherwise
        Sleep
Remember to protect the lists from concurrent access by the different threads. C11 and C++11 have a number of useful tools to assist you here.
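As a rough sketch of the two lists described above, here is one way they could look with the C++11 tools. The names BufferList, BufferPool, free_list and receive_list are made up for this example, and it assumes your toolchain actually provides std::mutex and std::condition_variable, which is not a given on a bare-metal microcontroller. Inside a real interrupt handler you would not block on a mutex, so treat this as a model of the thread-to-thread hand-off only.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

using Buffer = std::vector<char>;

// A list of buffers shared between threads, guarded by a mutex.
class BufferList {
public:
    // Add a buffer to the list and wake one thread waiting in take().
    void put(Buffer buffer)
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            buffers_.push_back(std::move(buffer));
        }
        ready_.notify_one();
    }

    // Sleep until a buffer is available, then hand it out.
    Buffer take()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !buffers_.empty(); });
        Buffer buffer = std::move(buffers_.front());
        buffers_.pop_front();
        return buffer;
    }

    std::size_t size() const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return buffers_.size();
    }

private:
    mutable std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<Buffer> buffers_;
};

// Hypothetical pool: the RX thread takes from free_list and puts into
// receive_list; the parser thread does the opposite.
struct BufferPool {
    BufferList free_list;
    BufferList receive_list;

    BufferPool(std::size_t count, std::size_t capacity)
    {
        assert(count > 0);
        for (std::size_t i = 0; i < count; ++i) {
            Buffer buffer;
            buffer.reserve(capacity);  // allocate up front, before the busy period
            free_list.put(std::move(buffer));
        }
    }
};
```

The parser's "otherwise sleep" branch is covered by take(): it blocks on the condition variable until the RX side has put a filled buffer into receive_list.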

GNURadio issues with timing

I am having trouble getting a custom block to operate at high frequency.
The block I would like to use is going to take in data from an external radio.
I am using an Ettus USRP block to stream data in from this radio, and I can display this on the QT Scope. I can set this block's sample rate to 15 MHz, and with the scope this seems to work ok.
Problem:
I have tried making a simple block with the gnuradio gr_modtool which takes in 2 floats as input and has 0 outputs. The block has private members "timer", a time_t, and "counter", an int. In the "work" function, my code simply does this at the moment:
const float *in_i = (const float *) input_items[0];
const float *in_q = (const float *) input_items[1];
if (count == 0){
    if (*in_i > 0.5){
        timer = clock();
        count = 30000;
    }
}else{
    count --;
    if(count == 0){
        timer = clock()-timer;
        printf("Count took %d clicks, or %f seconds\n",timer,(float)timer/CLOCKS_PER_SEC);
    }
}
// Tell runtime system how many output items we produced.
return 0;
However, when I run this code, it takes longer than the expected time.
For 30000 cycles, it takes 0.872970 to complete, instead of the desired 0.002 seconds. Since the standard gnuradio block generated with gr_modtool is a sync block, and the input stream to the block is coming from the 15 MHz USRP, I would have expected this block to run at that same frequency. This is not currently the case.
Eventually my goal is to be able to store data streaming in over a period of time, and write it to file with certain formatting. (A block already exists to do this, but there is some sort of bug that is preventing that block and the USRP block from working at the same time, so I am attempting to write my own.) However, unless I can keep up with the sample rate of 15 MHz, I will lose data. Since this block is fairly simple, I would have hoped it would be able to run quickly enough to keep up. The input stream block is able to pull data from the radio and output at 15 MHz, so I know my computer is capable of it.
How can I make this custom block operate more quickly, and keep up with the 15 MHz frequency? (Or, how can I make this sync block operate at the input stream frequency, since it currently does not?)
Your block is not consuming any samples. I presume you're writing a sync_block (work function, not general_work), so your number of produced items is identical to the number of consumed items. But as your source code says:
// Tell runtime system how many output items we produced.
return 0;
In other words, your block tells GNU Radio that it didn't use any of the input GNU Radio offered, and produced no output. That means GNU Radio can't do anything. You must return the number of items you've produced, and for sync blocks, that's the number of items you consumed, even if you're a sink with zero output streams!
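A minimal sketch of what the fixed work function could look like follows. To keep it compilable without GNU Radio, it uses stand-in typedefs for gr_vector_const_void_star and gr_vector_void_star, and the countdown_sink struct with its length and finished members is made up for this example. It also differs from your code in two ways that are my assumptions about what you want: it walks over all noutput_items samples instead of looking only at the first one, and it counts samples instead of calling clock().

```cpp
#include <cassert>
#include <vector>

// Stand-ins for GNU Radio's gr_vector_const_void_star / gr_vector_void_star,
// so that this sketch compiles on its own.
using const_void_star_vector = std::vector<const void*>;
using void_star_vector = std::vector<void*>;

// Hypothetical sink state: a countdown measured in samples.
struct countdown_sink {
    int length;         // samples to count after a trigger (30000 in the question)
    int count = 0;      // samples left in the current countdown
    long finished = 0;  // number of completed countdowns

    explicit countdown_sink(int countdown_length) : length(countdown_length)
    {
        assert(countdown_length > 0);
    }

    // Same shape as a sync block's work(): inspect every offered item and
    // report all of them as processed.
    int work(int noutput_items,
             const_void_star_vector& input_items,
             void_star_vector& /*output_items*/)
    {
        assert(!input_items.empty());
        const float* in_i = static_cast<const float*>(input_items[0]);

        for (int i = 0; i < noutput_items; i++) {
            if (count == 0) {
                if (in_i[i] > 0.5f) {
                    count = length;  // trigger seen: start counting samples
                }
            } else if (--count == 0) {
                finished++;          // `length` samples have passed
            }
        }

        // For a sync block this is also the number of items consumed,
        // even though a sink has no output streams.
        return noutput_items;
    }
};
```

Counting samples gives you the elapsed signal time directly: 30000 samples at 15 MHz are 0.002 seconds, regardless of how the scheduler chunks the calls to work.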

Correct use of memcpy

I have some problems with a project I'm doing. Basically I'm just using memcpy the wrong way. I know the theory of pointers/arrays/references and should know how to do this; nevertheless, I've spent two days now without any progress. I'll try to give a short code overview, and maybe someone sees a fault! I would be very thankful.
The setup: I'm using an ATSAM3x microcontroller together with a uC for signal acquisition. I receive the data over SPI.
I have an interrupt receiving the data whenever the uC has data available. The data is then stored in a buffer (int32_t buffer[1024 or 2048]). There is a counter that counts from 0 to the buffer size-1 and determines the place where the data point is stored. Currently I receive a test signal that is internally generated by the uC.
//ch1: receive 24 bit data in 8 bit chunks -> store in an int32_t
ch1=ch1|(SPI.transfer(PIN_CS, 0x00, SPI_CONTINUE)<<24)>>8;
ch1=ch1|(SPI.transfer(PIN_CS, 0x00, SPI_CONTINUE)<<16)>>8;
ch1=ch1|(SPI.transfer(PIN_CS, 0x00, SPI_CONTINUE)<<8)>>8;
if(Not Important){
    _ch1Buffer[_ch1SampleCount] = ch1;
    _ch1SampleCount++;
    if(_ch1SampleCount>SAMPLE_BUFFER_SIZE-1) _ch1SampleCount=0;
}
This ISR is active all the time. Since I need raw data for signal processing and the buffer is changed by the ISR whenever a new data point is available, I want to copy parts of the buffer into a temporary "storage".
To do so, I have another, global counter which is incremented within the ISR. In the main loop, whenever the counter reaches a certain size, I call a method to get some of the buffer data (about 30 samples).
The method acquires the current position in the buffer:
'int ch1Pos = _ch1SampleCount;'
and then, depending on that position, I try to use memcpy to get my samples. Depending on the position in the buffer, there has to be a "wrap-around" to get the full set of samples:
if(ch1Pos>=(RAW_BLOCK_SIZE-1)){
    memcpy(&ch1[0],&_ch1Buffer[ch1Pos-(RAW_BLOCK_SIZE-1)] , RAW_BLOCK_SIZE*sizeof(int32_t));
}else{
    memcpy(&ch1[RAW_BLOCK_SIZE-1 - ch1Pos],&_ch1Buffer[0],(ch1Pos)*sizeof(int32_t));
    memcpy(&ch1[0],&_ch1Buffer[SAMPLE_BUFFER_SIZE-1-(RAW_BLOCK_SIZE- ch1Pos)],(RAW_BLOCK_SIZE-ch1Pos)*sizeof(int32_t));
}
_ch1Buffer is the buffer containing the raw data
SAMPLE_BUFFER_SIZE is the size of that buffer
ch1 is the array which is supposed to hold the set of samples
RAW_BLOCK_SIZE is the size of that array
ch1Pos is the position of the last data point written to the buffer from the ISR at the time where this method is called
Technically I'm aware of the requirements, but apparently that's not enough ;-).
I know that the data received by the SPI interface is "correct". The problem is that this is not the case for the extracted samples. There are a lot of spikes in the data which indicate that I've been reading something I wasn't supposed to read. I've changed the memcpy commands so often that I've completely lost the overview. The code sample above is one version of many, and while you're reading this I'm sure I've changed everything again.
I would appreciate every hint!
Thanks & Greetings!
EDIT
I've written down everything (again) on a sheet of paper and tested some constellations. This is the updated Code for the memcpy part:
    if(ch1Pos>=(RAW_BLOCK_SIZE-1)){
        memcpy(&ch1[0],&_ch1Buffer[ch1Pos-(RAW_BLOCK_SIZE-1)] , RAW_BLOCK_SIZE*sizeof(int32_t));
    }else{
        memcpy(&ch1[RAW_BLOCK_SIZE-1-ch1Pos],&_ch1Buffer[0],(ch1Pos+1)*sizeof(int32_t));
        memcpy(&ch1[0],&_ch1Buffer[SAMPLE_BUFFER_SIZE-(RAW_BLOCK_SIZE-1-ch1Pos)],(RAW_BLOCK_SIZE-1-ch1Pos)*sizeof(int32_t));
    }
}
This already made it a lot better. From all the changes, everything kinda got messed up. Now there is just one Error there. There is a periodical spike. I'll try to get more information, but I think it is a wrong access while wrapping around.
I've changed the if(_ch1SampleCount>SAMPLE_BUFFER_SIZE-1) _ch1SampleCount=0; to if(_ch1SampleCount>=SAMPLE_BUFFER_SIZE) _ch1SampleCount=0;.
EDIT II
To answer the questions of @David Schwartz:
SPI.transfer returns a single byte
The buffer is initialised once at startup: memset(_ch1Buffer,0,sizeof(int32_t)*SAMPLE_BUFFER_SIZE);
EDIT III
Sorry for the frequent updates, the comment section is getting too big.
I managed to get rid of a bunch of zero values at the beginning of the stream by decreasing ch1Pos: 'int ch1Pos = _ch1SampleCount;' Now there is just one periodic "spike" (wrong value). It must be something with the split memcpy command. I'll continue looking. If anyone has an idea ... :-)
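For reference, the wrap-around copy described above can be written as a small standalone function, which makes it easier to test on a PC with known data. This is only a sketch: copy_latest and its parameter names are made up here, and it assumes that `last` really is the index of the newest sample that has been written.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy the `block` most recent samples of a ring buffer with `size` elements
// into `dst`, oldest first. `last` is the index of the newest sample.
void copy_latest(std::int32_t* dst, const std::int32_t* ring,
                 std::size_t size, std::size_t last, std::size_t block)
{
    assert(block <= size && last < size);
    if (last + 1 >= block) {
        // No wrap-around: the block is one contiguous run ending at `last`.
        std::memcpy(dst, ring + (last + 1 - block), block * sizeof(std::int32_t));
    } else {
        const std::size_t head = last + 1;      // newest samples, at ring[0..last]
        const std::size_t tail = block - head;  // older samples, at the end of the ring
        std::memcpy(dst, ring + (size - tail), tail * sizeof(std::int32_t));
        std::memcpy(dst + tail, ring, head * sizeof(std::int32_t));
    }
}
```

Two things may be worth checking against this. First, the ISR shown above stores the sample and then increments _ch1SampleCount, so the counter points at the next write position, and the newest sample would sit one position before it (modulo the buffer size). Second, the ISR can still write into the buffer while the copy is running, so the copy may need to be protected, for example by briefly disabling the interrupt.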

Concatenate data in an array in C++

I'm working on software for processing audio in real time in C++ with Qt. I need to keep the resource requirements to a minimum.
With a temporary buffer of 40 ms and a device sampling frequency of Fs = 8000 Hz, a function called Data Processing () is entered every 320 samples.
The idea is to have a global buffer that stores the last 10 s recorded, i.e. 80000 samples.
In each iteration, this buffer drops its first 320 samples and appends 320 new samples at the end. That way the buffer is kept up to date and the user can observe a real-time graphical representation of the recorded signal.
At first I thought of using QVector (the Qt equivalent of std::vector) for this, which reduces the process to a few lines of code:
int NUM_POINTS=320;
DatosTemporales.erase(DatosTemporales.begin(),DatosTemporales.begin()+NUM_POINTS);
DatosTemporales+= (DatosNuevos); // DatosNuevos has a size of NUM_POINTS
In each iteration this creates a vector of 80000 samples in addition to freeing some positions, so it requires some processing time. An alternative I considered was to use a double* and iterate in a loop:
for(int i=0;i<80000;i++){
    if(i<80000-NUM_POINTS){
        aux=DatosTemporales[i];
        DatosTemporales[i+NUM_POINTS]=aux;
    }else{
        DatosTemporales[i]=DatosNuevos[i-NUN_POINTS];
    }
}
This fails. I think the best way is to use dynamic memory and implement this process with pointers. Could anyone give me some idea of how to implement it?
It sounds like what you are looking for is a circular buffer.
https://www.google.com/search?q=qcircularbuffer
https://qt.gitorious.org/qt/qtbase/merge_requests/60
And it looks like you only need the header file and you should be good to go.
A similar tool that is already part of Qt is documented here:
http://doc.qt.io/qt-5/qcontiguouscache.html#details
The advantage of using a container like the ones presented here is that it doesn't need to reallocate or shift memory on every update; it just needs to move the head and tail pointers.
Hope that helps.
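If you would rather not pull in an extra header, the idea is small enough to sketch by hand. The following is a minimal illustration using std::vector, not the API of QCircularBuffer or QContiguousCache; the class name SampleRing and its members are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fixed-capacity ring buffer that always holds the most recent samples.
class SampleRing {
public:
    explicit SampleRing(std::size_t capacity) : data_(capacity, 0.0), head_(0)
    {
        assert(capacity > 0);
    }

    // Overwrite the oldest samples with `count` new ones; nothing is shifted
    // and nothing is reallocated.
    void append(const double* samples, std::size_t count)
    {
        for (std::size_t i = 0; i < count; ++i) {
            data_[head_] = samples[i];
            head_ = (head_ + 1) % data_.size();
        }
    }

    // Sample `i` in chronological order: 0 is the oldest, size() - 1 the newest.
    double at(std::size_t i) const
    {
        assert(i < data_.size());
        return data_[(head_ + i) % data_.size()];
    }

    std::size_t size() const { return data_.size(); }

private:
    std::vector<double> data_;
    std::size_t head_;  // index of the oldest sample, which is also the next write position
};
```

In your case that would be a SampleRing with a capacity of 80000, with append called on the 320 new samples each time the processing function runs, and the plot reading the samples through at() in chronological order.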