Embedding cache aligned meta data inside mbuf - dpdk

I'm developing my own DPDK application and I want received packets to pass through several threads in series. Each thread has its own duty of inspecting packets and generating some metadata for each packet. The easiest and most efficient way to transfer packets between threads appears to be rte rings. However, I need to transfer the metadata generated by each thread to the next thread as well. I have tried doing this using an array of structures for the metadata and passing a pointer to the next thread. However, this method proved to be inefficient since I got a lot of cache misses.
As a solution I came up with the idea of putting the metadata generated by each thread into the mbufs themselves. It seems to be doable with the "Dynamic fields" of mbufs. However, documentation of this feature seems to be very limited. For my application I want to use a metadata field inside the dynamic field, something like this:
typedef struct {
uint32_t packet_id;
uint64_t time_stamp;
uint8_t ip_v;
uint32_t length;
.........
.........
} my_metadata_field;
What I don't understand is how much space I can use for a dynamic field. The only thing the DPDK documentation mentions about this is:
"10.6.1. Dynamic fields and flags
The size of the mbuf is constrained and limited; while the amount of
metadata to save for each packet is quite unlimited. The most basic
networking information already find their place in the existing mbuf
fields and flags.
If new features need to be added, the new fields and flags should fit
in the “dynamic space”, by registering some room in the mbuf
structure:
dynamic field -
named area in the mbuf structure, with a given size (at least 1 byte) and alignment constraint."
which doesn't make much sense to me. How much memory do I have for this field? If it's almost unlimited, what are the performance tradeoffs if I use a large metadata field?
I use dpdk 20.08
Edit:
After some digging I have abandoned the idea of using a dynamic field for metadata, because of the lack of documentation and because it doesn't appear to be able to hold more than 64 bits.
I am looking for an easy way to embed my metadata inside cache aligned mbufs (preferably using a struct like above) so I can use rte rings to share them between threads. I'm looking for any documentation or reference project for me to begin with.

There are a few ways to carry metadata along with an mbuf:
1. In rte_mempool_create, pass the custom metadata size as private_data_size instead of 0. Note that this private area is allocated once per pool, after the mempool structure, not once per object.
2. In rte_pktmbuf_pool_create, pass the custom metadata size as priv_size instead of 0.
3. If the metadata is smaller than 128 bytes, use a typecast to access the memory area right after rte_mbuf.
4. If the DPDK application does not use external buffers, store a pointer to the metadata in the rte_mbuf shinfo or next field.
Solution 1: rte_mempool_create("FIPS_SESS_PRIV_MEMPOOL", 16, sess_sz, 0, sizeof(my_metadata_field), NULL, NULL, NULL, NULL, rte_socket_id(), 0);
Solution 2: rte_pktmbuf_pool_create("MBUF_POOL", NUM_MBUFS * nb_ports, MBUF_CACHE_SIZE, sizeof(my_metadata_field), RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
Solution 3:
struct rte_mbuf *bufs[BURST_SIZE];
const uint16_t nb_rx = rte_eth_rx_burst(port, 0, bufs, BURST_SIZE);
if (unlikely(nb_rx == 0))
continue;
for (int index = 0; index < nb_rx; index++)
{
assert(sizeof(my_metadata_field) <= RTE_CACHE_LINE_SIZE);
my_metadata_field *ptr = (my_metadata_field *)(bufs[index] + 1);
...
...
...
}
Solution 4:
privdata_ptr = rte_mempool_create("METADATA_POOL", 16 * 1024, sizeof(my_metadata_field), 0, 0,
NULL, NULL, NULL, NULL, rte_socket_id(), 0);
struct rte_mbuf *bufs[BURST_SIZE];
const uint16_t nb_rx = rte_eth_rx_burst(port, 0, bufs, BURST_SIZE);
if (unlikely(nb_rx == 0))
continue;
for (int index = 0; index < nb_rx; index++)
{
void *msg = NULL;
if (0 == rte_mempool_get(privdata_ptr, &msg))
{
assert(msg != NULL);
bufs[index]->shinfo = msg;
continue;
}
/* free the mbuf as we are not able to retrieve the private data */
rte_pktmbuf_free(bufs[index]);
}
/* before transmit or pkt free ensure to release object back to mempool via rte_mempool_put */
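Solutions 2 and 3 both rely on the metadata living directly behind the mbuf header, so the layout can be sketched without DPDK at all. In the block below, FakeMbuf, round_up, metadata_of and alloc_fake_mbuf are hypothetical stand-ins and not DPDK API; the only facts borrowed from DPDK are that struct rte_mbuf spans two 64-byte cache lines and that priv_size has to be a multiple of RTE_MBUF_PRIV_ALIGN (8 bytes in the releases I am aware of, so treat that number as an assumption). Because the header is 128 bytes and starts on a cache line, the private area starts on a cache line too.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Stand-in for struct rte_mbuf: two 64-byte cache lines, cache aligned.
struct alignas(64) FakeMbuf {
    unsigned char header[128];
};

// The struct from the question, without the elided fields.
struct my_metadata_field {
    uint32_t packet_id;
    uint64_t time_stamp;
    uint8_t  ip_v;
    uint32_t length;
};

// Round size up to a multiple of align (align must be a power of two).
constexpr std::size_t round_up(std::size_t size, std::size_t align) {
    return (size + align - 1) & ~(align - 1);
}

// Same arithmetic as (my_metadata_field *)(mbuf + 1) on a real mbuf.
inline my_metadata_field* metadata_of(FakeMbuf* m) {
    return reinterpret_cast<my_metadata_field*>(m + 1);
}

// Allocate one header plus its private area, starting on a cache line.
inline FakeMbuf* alloc_fake_mbuf(std::size_t priv_size) {
    std::size_t total = round_up(sizeof(FakeMbuf) + priv_size, 64);
    return static_cast<FakeMbuf*>(std::aligned_alloc(64, total));
}
```

With a real pool the value to pass as priv_size would be round_up(sizeof(my_metadata_field), 8), and each thread would reach the metadata with the same cast after dequeuing the mbuf from the ring.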


FatFS - Cannot format drive, FR_MKFS_ABORTED

I am implementing a file system on SPI flash memory using a w25qxx chip and an STM32F4xx on STM32CubeIDE. I have successfully created the basic i/o for the w25 over SPI, being able to write and read sectors at a time.
In my user_diskio.c I have implemented all of the needed i/o methods and have verified that they are properly linked and being called.
In my main.cpp I format the drive using f_mkfs(), then get the free space, and finally open and close a file. However, f_mkfs() keeps returning FR_MKFS_ABORTED. (FF_MAX_SS is set to 16384.)
fresult = FR_NO_FILESYSTEM;
if (fresult == FR_NO_FILESYSTEM)
{
BYTE work[FF_MAX_SS]; // Formats the drive if it has yet to be formatted
fresult = f_mkfs("0:", FM_ANY, 0, work, sizeof work);
}
f_getfree("", &fre_clust, &pfs);
total = (uint32_t)((pfs->n_fatent - 2) * pfs->csize * 0.5);
free_space = (uint32_t)(fre_clust * pfs->csize * 0.5);
fresult = f_open(&fil, "file67.txt", FA_OPEN_ALWAYS | FA_READ | FA_WRITE);
f_puts("This data is from the FILE1.txt. And it was written using ...f_puts... ", &fil);
fresult = f_close(&fil);
fresult = f_open(&fil, "file67.txt", FA_READ);
f_gets(buffer, f_size(&fil), &fil);
f_close(&fil);
Upon investigating my ff.c, it seems that the code is halting on line 5617:
if (fmt == FS_FAT12 && n_clst > MAX_FAT12) return FR_MKFS_ABORTED; /* Too many clusters for FAT12 */
n_clst is calculated a few lines up before some conditional logic, on line 5594:
n_clst = (sz_vol - sz_rsv - sz_fat * n_fats - sz_dir) / pau;
Here is what the debugger reads the variables going in as:
This results in n_clst being set to 4294935040, as it is unsigned, though the actual result of doing the calculations would be -32256 if the variable was signed. As you can imagine, this does not seem to be an accurate calculation.
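The wraparound itself is easy to reproduce in isolation. The sketch below uses two hypothetical helper names, wrapped_difference and as_signed, and the operands 512 and 32768 in the usage note are only illustrative: they happen to be the sector count and the GET_BLOCK_SIZE value of this setup, but whether they are the actual operands inside ff.c at that point is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned 32-bit subtraction wraps modulo 2^32 instead of going negative.
constexpr uint32_t wrapped_difference(uint32_t available, uint32_t required) {
    return available - required;
}

// Interpret a wrapped 32-bit value as the signed number it stands for.
constexpr int64_t as_signed(uint32_t value) {
    return value >= 0x80000000u
               ? static_cast<int64_t>(value) - 4294967296LL
               : static_cast<int64_t>(value);
}
```

For example, wrapped_difference(512, 32768) gives 4294935040, and as_signed(4294935040) gives -32256, the two numbers seen in the debugger.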
The device I am using has 16 Mbit (2 MB) of storage organized in 512 sectors of 4 KB each. The minimum erasable block size is 32 KB. If you need more info on the flash chip I am using, page 5 of this pdf outlines all of the specs.
This is what my USER_ioctl() looks like:
DRESULT USER_ioctl (
BYTE pdrv, /* Physical drive number (0..) */
BYTE cmd, /* Control code */
void *buff /* Buffer to send/receive control data */
)
{
/* USER CODE BEGIN IOCTL */
UINT* result = (UINT*)buff;
HAL_GPIO_WritePin(GPIOE, GPIO_PIN_11, GPIO_PIN_SET);
switch (cmd) {
case GET_SECTOR_COUNT:
result[0] = 512; // Sector and block sizes of
return RES_OK;
case GET_SECTOR_SIZE:
result[0] = 4096;
return RES_OK;
case GET_BLOCK_SIZE:
result[0] = 32768;
return RES_OK;
}
return RES_ERROR;
/* USER CODE END IOCTL */
}
I have tried monkeying around with the parameters to f_mkfs(), swapping FM_ANY out for FM_FAT, FM_FAT32, and FM_EXFAT (along with enabling exFAT in my ffconf.h). I have also tried several values for au rather than the default. For more detailed documentation on the f_mkfs() variant I am using, check here; there are a few variations of this function floating around out there.
Here:
fresult = f_mkfs("0:", FM_ANY, 0, work, sizeof work);
In current FatFs releases the second argument is not valid: it should be a pointer to a MKFS_PARM structure, or NULL for default options, and the separate au argument is gone, as described at http://elm-chan.org/fsw/ff/doc/mkfs.html.
You should have something like:
MKFS_PARM fmt_opt = {FM_ANY, 0, 0, 0, 0};
fresult = f_mkfs("0:", &fmt_opt, work, sizeof work);
except that it is unlikely that the default options are appropriate for your media (SPI flash): the filesystem cannot obtain formatting parameters from the media as it would for an SD card, for example. You have to provide the necessary formatting information.
Given your erase block size I would guess:
MKFS_PARM fmt_opt = {FM_ANY, 0, 32768, 0, 0};
but to be clear, I have never used the ELM FatFs (which STM32Cube incorporates) with SPI flash, so there may be additional issues. I also do not use STM32CubeMX; it is possible, I suppose, that its version has a different interface, but I would recommend using the latest code from ELM rather than ST's possibly fossilised version.
Another consideration is that FatFs is not particularly suitable for your media due to wear-levelling issues. ELM FatFs also has no journalling or check/repair function, so it is not power-fail safe. That is particularly important for non-removable media that you cannot easily back up or repair.
You might consider a file system specifically designed for SPI NOR flash such as SPIFFS, or the power-fail safe LittleFS. Here is an example of LittleFS in STM32: https://uimeter.com/2018-04-12-Try-LittleFS-on-STM32-and-SPI-Flash/
OK, I think the real problem was that the GET_BLOCK_SIZE ioctl was returning the block size in bytes instead of the number of sectors in the block, which is usually 1 for SPI flash.
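To make the unit conversion concrete, here is a minimal sketch of the three geometry queries with GET_BLOCK_SIZE answered in sectors rather than bytes, as the FatFs disk_ioctl documentation asks for. The Ctl enum and geometry_ioctl are hypothetical stand-ins for the real control codes and USER_ioctl(), used only so the sketch builds without the FatFs headers; the geometry constants are the ones stated in the question.

```cpp
#include <cassert>
#include <cstdint>

// Flash geometry as stated in the question.
constexpr uint32_t kSectorBytes     = 4096;   // one FatFs sector
constexpr uint32_t kSectorCount     = 512;    // 2 MB / 4 KB
constexpr uint32_t kEraseBlockBytes = 32768;  // erase unit given in the question

// Stand-ins for GET_SECTOR_COUNT, GET_SECTOR_SIZE and GET_BLOCK_SIZE.
enum class Ctl { SectorCount, SectorSize, BlockSize };

// Returns true and fills *out for a known command, false otherwise.
inline bool geometry_ioctl(Ctl cmd, uint32_t* out) {
    switch (cmd) {
    case Ctl::SectorCount:
        *out = kSectorCount;
        return true;
    case Ctl::SectorSize:
        *out = kSectorBytes;
        return true;
    case Ctl::BlockSize:
        // Erase block size expressed in sectors, not in bytes.
        *out = kEraseBlockBytes / kSectorBytes;
        return true;
    }
    return false;
}
```

With the figures from the question this yields 8 sectors per block; if the driver erases in single 4 KB sectors instead, the same division gives the 1 mentioned above.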

How to retrieve callchain from ring buffer in perf_event_open for PERF_RECORD_SWITCH?

In the ring buffer, can we retrieve the callchain only for PERF_RECORD_SAMPLE or it can be done for other record types as well?
The man page of perf_event_open only explicitly states that the callchain is available for PERF_RECORD_SAMPLE. I am particularly interested in getting the callchain for PERF_RECORD_SWITCH, to get the stack trace for when my program is context switching in and out. I've tried a method of reading the callchain from the buffer, but judging by the addresses returned, it looks incorrect.
size_t index = mapping->data_tail; // mapping is the pointer to the ring buffer
uintptr_t data_base = reinterpret_cast<uintptr_t>(mapping) + PageSize;
size_t start_index = index % DataSize;
size_t end_index = start_index + sizeof(struct perf_event_header);
memcpy(buf, reinterpret_cast<void*>(data_base + start_index), sizeof(struct perf_event_header));
struct perf_event_header* header = reinterpret_cast<struct perf_event_header*>(buf);
uintptr_t p = reinterpret_cast<uintptr_t>(header) + sizeof(struct perf_event_header);
// Only sampling PERF_SAMPLE_CALLCHAIN
uint64_t* chain = reinterpret_cast<uint64_t*>(p);
uint64_t size = *chain; // Should be callchain size
chain++;
for (uint64_t i = 0; i < size; i++) {
cout << chain[i] << endl; // prints the addresses in the callchain stack
}
The 2 main issues with the output I am getting using this snippet are that :
1. All PERF_RECORD_SWITCH have the same callchain. Which should be extremely unlikely.
2. The output is not consistent across multiple runs. The callchain size keeps varying from 0 (mostly) to 4, 6, 16 and sometimes a very big (undefined) number.
The callchain is only available for PERF_RECORD_SAMPLE events.
When reading the different record types, you should follow the struct definitions from perf_event_open rather than just trying to access individual fields by pointers, i.e.,
struct perf_record_switch {
struct perf_event_header header;
struct sample_id sample_id;
};
And then cast the whole event: reinterpret_cast<struct perf_record_switch*>(header)
Specifically, the layout of the sample type is highly dependent on the configuration and may include multiple dynamically sized arrays, which can prevent using a static struct.
Technically, you can collect callchains for switching events by using the sched:sched_switch tracepoint as a sampling event. This results in PERF_RECORD_SAMPLE events. However, you might not always see useful information there, mostly just in-kernel scheduling details.
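As a rough sketch of reading records by type, the walker below copies each whole record out of the data area (handling wrap-around), skips ahead by header.size, and only looks for a callchain in sample records. EventHeader mirrors the layout of struct perf_event_header and the two constants mirror PERF_RECORD_SAMPLE and PERF_RECORD_SWITCH; WalkResult, copy_wrapped and walk_records are hypothetical names. It assumes sample_type is exactly PERF_SAMPLE_CALLCHAIN, so a sample payload is a u64 count followed by that many addresses, and it leaves out the memory barriers and the write-back of data_tail that a real reader needs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Stand-in with the same layout as struct perf_event_header.
struct EventHeader {
    uint32_t type;
    uint16_t misc;
    uint16_t size;  // whole record, header included
};
constexpr uint32_t kRecordSample = 9;   // value of PERF_RECORD_SAMPLE
constexpr uint32_t kRecordSwitch = 14;  // value of PERF_RECORD_SWITCH

struct WalkResult {
    unsigned switches = 0;                          // switch records seen
    std::vector<std::vector<uint64_t>> callchains;  // one entry per sample
};

// Copy len bytes from logical offset pos, wrapping at the end of the data area.
static void copy_wrapped(const uint8_t* data, std::size_t data_size,
                         uint64_t pos, void* out, std::size_t len) {
    std::size_t start = static_cast<std::size_t>(pos % data_size);
    std::size_t first = std::min(len, data_size - start);
    std::memcpy(out, data + start, first);
    if (len > first)
        std::memcpy(static_cast<uint8_t*>(out) + first, data, len - first);
}

WalkResult walk_records(const uint8_t* data, std::size_t data_size,
                        uint64_t tail, uint64_t head) {
    WalkResult result;
    std::vector<uint8_t> record;
    while (tail + sizeof(EventHeader) <= head) {
        EventHeader header;
        copy_wrapped(data, data_size, tail, &header, sizeof header);
        if (header.size < sizeof header || tail + header.size > head)
            break;  // malformed or incomplete record, stop here
        record.resize(header.size);
        copy_wrapped(data, data_size, tail, record.data(), header.size);
        const uint8_t* payload = record.data() + sizeof header;
        std::size_t payload_len = header.size - sizeof header;

        if (header.type == kRecordSwitch) {
            ++result.switches;  // this record type carries no callchain
        } else if (header.type == kRecordSample && payload_len >= sizeof(uint64_t)) {
            uint64_t nr;  // number of addresses that follow
            std::memcpy(&nr, payload, sizeof nr);
            if (nr <= (payload_len - sizeof nr) / sizeof(uint64_t)) {
                std::vector<uint64_t> ips(static_cast<std::size_t>(nr));
                if (nr > 0)
                    std::memcpy(ips.data(), payload + sizeof nr,
                                ips.size() * sizeof(uint64_t));
                result.callchains.push_back(std::move(ips));
            }
        }
        tail += header.size;  // advance by the size the kernel reported
    }
    return result;
}
```

Copying the full record before touching the payload is the part the snippet in the question skips: it copies only the header into buf and then reads past it.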

Deleting and re instantiating an arrayed struct globally from within a function without knowing the size of the array at compile

Full code: Pastebin
Full code with your comments enabled (Google Drive): SerialGrapherV0.9
Code-in-progress is near the bottom.
YouTube example of graphing code running: Grapher
Background: My goal is to write a library to allow a caller Arduino to drive a callee Arduino via serial and print to a master-defined graph or graphs on an SSD1306 I2C display(no SPI version to test with). The graphing code is finished. Currently I can have 4 graphs that can update synchronously or asynchronously, there is no blanking and only writes the portions that need updating.
Both arduinos currently run the same sketch and determine their role via a pullup_input tied to ground, however in later versions the sketch will compile using if statements with a #defined boolean to greatly save on program space for the caller arduino.
So far:
The actual graphing is working and the graph updates whenever a graphAdd(graphNumber, newVal); is called.
The xpos, ypos, xlength, and ylength, of each graph can be defined on the caller side as such:
#define masterGraphNum 4 //Set this to the number of Graphs you want.
graphStruct graph[masterGraphNum] = { //Each set of brackets is an instance of a graph, one for each specified in graphNum, set the array number on the receiver to the max number of graphs you can use.
//Graph 1 //Usage: {LeftX, TopY, width, height}
{0, 0, 31, 32},
//Graph 2
{32, 0, 31, 32},
//Graph 3
{64, 0, 31, 32},
//Graph 4
{96, 0, 31, 32},
};
Currently I am trying to use delete[] (graph); followed by graphStruct *graph = new graphStruct[incomingGraphNum]; where incomingGraphNum is an int sent by the caller and received by the callee. This seems to work at first, but after a short time of graphing (~15 seconds) the Arduino crashes and restarts.
FLOW:
Callee awaits connection indefinitely
Caller sends ready byte
Callee acks
Caller sends number of graphs wanted
NOT WORKING: Re-initializing graph
Graph adds data via called function
NYI: Sending graph number and new value over serial
My problem is now instantiating a globally accessible array of structs from within a function as I don't want to have to pre-code the number of graphs into the callee, as well as assign the size of the buffer array within the struct.
For the functions to work graph[] needs to be declared globally. I would like to globally declare graph[number of graphs] within a function during the callee setup, as I want to make this into a plug-and play diagnostic tool for my future projects.
Next Steps:
Setting up packets to send the graph data over. Not too hard, essentially sending two ints like (graph#, graphData)
Add graph "titling" (like "ACC" or "Light intensity")
Implemented:
Graphing system
Simple serial "call - response" and acknowledgement system. (I just discovered the Stream functions included with the Arduino IDE and am currently rewriting a few sections to use Serial.parseInt() instead of a modified serialEvent().)
Basic Error Handling
Loops/Second counter
A couple ideas that may help.
The display is 128 pixels, so you need a buffer no larger than that. I suggest you make that a single, global buffer (instead of each struct having its own buffer). This will never need to be re-sized with new/delete, no matter how many graphs you have.
uint8_t global_graph_buffer[128];
Notice that I have changed it from int to byte. The height of the display is only 30 or 40 pixels (?) so there's no need to store any number bigger than that. Just scale the value down when it comes in on the port.
global_graph_buffer[x] = map(incoming_data, 0, max_input, 0, height_of_graph);
See the Arduino map() function.
Next, do you really need the y_pos and height of the graphs? Do you plan on having more than 1 row of graphs? If not, get rid of those struct members. Also, you can get rid of the x_pos and width fields as well. These can be calculated based on the indexes.
struct graphStruct {
uint8_t gx; // Removed
uint8_t gy; // Removed
uint8_t gw; // Removed
uint8_t gh; // Removed
int gVal; // What is this for?
//int graphBuffer[graphBufferSize]; This is gone
uint8_t start_index; // First index in the global array
uint8_t end_index; // Last index
bool isReady; // What is this for?
};
To calc x_pos and width:
x_pos = start_index
width = end_index - start_index
To handle incoming data, shift just the part of the buffer for the given graph and add the value:
int incoming_data = some_value_from_serial;
// Shift
for (byte i = graph[graphNumber].start_index; i < graph[graphNumber].end_index; i++) {
global_graph_buffer[i] = global_graph_buffer[i+1];
}
// Store
global_graph_buffer[graph[graphNumber].end_index] = map(incoming_data, 0, graphMax, 0, height_of_graph);
Lastly, you need to consider: how many graphs can you realistically display at one time? Set a max, and create only that many structs at the start. If you use a global buffer, as I suggest, you can re-use a struct multiple times (without having to use new/delete). Just change the start_index and end_index fields.
Not sure if any of this helps, but maybe you can get some ideas from it.
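Here is a minimal, host-compilable sketch of the shared-buffer idea, assuming each graph owns an inclusive range of columns and the newest value goes into the right-most one. GraphSlice, rescale and graph_add are hypothetical names; rescale only stands in for Arduino's map() so the sketch builds off-target.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint8_t kDisplayWidth = 128;             // SSD1306 width in pixels
uint8_t global_graph_buffer[kDisplayWidth] = {};   // one value per column

struct GraphSlice {
    uint8_t start_index;  // first column owned by this graph
    uint8_t end_index;    // last column owned by this graph (inclusive)
    uint8_t height;       // plot height in pixels
};

// Stand-in for Arduino's map(): linear rescale using integer math.
long rescale(long x, long in_min, long in_max, long out_min, long out_max) {
    return (x - in_min) * (out_max - out_min) / (in_max - in_min) + out_min;
}

// Drop the oldest column, slide the rest left, append the new value on the right.
void graph_add(const GraphSlice& g, int value, int max_input) {
    for (uint8_t i = g.start_index; i < g.end_index; ++i) {
        global_graph_buffer[i] = global_graph_buffer[i + 1];
    }
    global_graph_buffer[g.end_index] =
        static_cast<uint8_t>(rescale(value, 0, max_input, 0, g.height));
}
```

Changing the number of graphs then only means rewriting start_index and end_index in a fixed array of slices, so nothing has to be allocated or freed at run time.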

Multiple constant buffer - register - dx12

I am learning DX12 with this tutorial:
https://www.braynzarsoft.net/viewtutorial/q16390-directx-12-constant-buffers-root-descriptor-tables#c0
I tried to modify this step to get 2 constant buffers (so registers b0 and b1, if I understood correctly).
To do that, I first declared 2 parameters in my root signature:
// create root signature
// create a descriptor range (descriptor table) and fill it out
// this is a range of descriptors inside a descriptor heap
D3D12_DESCRIPTOR_RANGE descriptorTableRanges[1]; // only one range right now
descriptorTableRanges[0].RangeType = D3D12_DESCRIPTOR_RANGE_TYPE_CBV; // this is a range of constant buffer views (descriptors)
descriptorTableRanges[0].NumDescriptors = 2; // we only have one constant buffer, so the range is only 1
descriptorTableRanges[0].BaseShaderRegister = 0; // start index of the shader registers in the range
descriptorTableRanges[0].RegisterSpace = 0; // space 0. can usually be zero
descriptorTableRanges[0].OffsetInDescriptorsFromTableStart = D3D12_DESCRIPTOR_RANGE_OFFSET_APPEND; // this appends the range to the end of the root signature descriptor tables
// create a descriptor table
D3D12_ROOT_DESCRIPTOR_TABLE descriptorTable;
descriptorTable.NumDescriptorRanges = 0;// _countof(descriptorTableRanges); // we only have one range
descriptorTable.pDescriptorRanges = &descriptorTableRanges[0]; // the pointer to the beginning of our ranges array
D3D12_ROOT_DESCRIPTOR_TABLE descriptorTable2;
descriptorTable2.NumDescriptorRanges = 1;// _countof(descriptorTableRanges); // we only have one range
descriptorTable2.pDescriptorRanges = &descriptorTableRanges[0]; // the pointer to the beginning of our ranges array
// create a root parameter and fill it out
D3D12_ROOT_PARAMETER rootParameters[2]; // only one parameter right now
rootParameters[0].ParameterType = D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE; // this is a descriptor table
rootParameters[0].DescriptorTable = descriptorTable; // this is our descriptor table for this root parameter
rootParameters[0].ShaderVisibility = D3D12_SHADER_VISIBILITY_VERTEX; // our pixel shader will be the only shader accessing this parameter for now
rootParameters[1].ParameterType = D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE; // this is a descriptor table
rootParameters[1].DescriptorTable = descriptorTable2; // this is our descriptor table for this root parameter
rootParameters[1].ShaderVisibility = D3D12_SHADER_VISIBILITY_VERTEX; // our pixel shader will be the only shader accessing this parameter for now
But now I fail to link the constant buffers to variables. I tried to modify this part of the code:
// Create a constant buffer descriptor heap for each frame
// this is the descriptor heap that will store our constant buffer descriptor
for (int i = 0; i < frameBufferCount; ++i)
{
D3D12_DESCRIPTOR_HEAP_DESC heapDesc = {};
heapDesc.NumDescriptors = 1;
heapDesc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE;
heapDesc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV;
hr = device->CreateDescriptorHeap(&heapDesc, IID_PPV_ARGS(&mainDescriptorHeap[i]));
if (FAILED(hr))
{
Running = false;
}
}
// create the constant buffer resource heap
// We will update the constant buffer one or more times per frame, so we will use only an upload heap
// unlike previously we used an upload heap to upload the vertex and index data, and then copied over
// to a default heap. If you plan to use a resource for more than a couple frames, it is usually more
// efficient to copy to a default heap where it stays on the gpu. In this case, our constant buffer
// will be modified and uploaded at least once per frame, so we only use an upload heap
// create a resource heap, descriptor heap, and pointer to cbv for each frame
for (int i = 0; i < frameBufferCount; ++i)
{
hr = device->CreateCommittedResource(
&CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_UPLOAD), // this heap will be used to upload the constant buffer data
D3D12_HEAP_FLAG_NONE, // no flags
&CD3DX12_RESOURCE_DESC::Buffer(1024 * 64), // size of the resource heap. Must be a multiple of 64KB for single-textures and constant buffers
D3D12_RESOURCE_STATE_GENERIC_READ, // will be data that is read from so we keep it in the generic read state
nullptr, // we do not have use an optimized clear value for constant buffers
IID_PPV_ARGS(&constantBufferUploadHeap[i]));
constantBufferUploadHeap[i]->SetName(L"Constant Buffer Upload Resource Heap");
D3D12_CONSTANT_BUFFER_VIEW_DESC cbvDesc = {};
cbvDesc.BufferLocation = constantBufferUploadHeap[i]->GetGPUVirtualAddress();
cbvDesc.SizeInBytes = (sizeof(ConstantBuffer) + 255) & ~255; // CB size is required to be 256-byte aligned.
device->CreateConstantBufferView(&cbvDesc, mainDescriptorHeap[i]->GetCPUDescriptorHandleForHeapStart());
ZeroMemory(&cbColorMultiplierData, sizeof(cbColorMultiplierData));
CD3DX12_RANGE readRange(0, 0); // We do not intend to read from this resource on the CPU. (End is less than or equal to begin)
hr = constantBufferUploadHeap[i]->Map(0, &readRange, reinterpret_cast<void**>(&cbColorMultiplierGPUAddress[i]));
memcpy(cbColorMultiplierGPUAddress[i], &cbColorMultiplierData, sizeof(cbColorMultiplierData));
}
Thanks
Your root signature is incorrect: you are trying to set a descriptor table with no range.
There are 3 ways to register a constant buffer in a root signature: root constants, a root constant buffer view, and descriptor tables. The first two bind one constant buffer per root parameter, while the third lets you set multiple constant buffers in a single table.
In your case, a single root parameter of type descriptor table, with a single range referring to an array of 2, is enough to let you bind 2 constant buffers.
I recommend reading how root signatures are declared in HLSL to better understand the concept and how it translates to the C++ declaration.
As for the runtime side of manipulating constant buffers, you have to be very careful again: there is no lifetime management in D3D12 or in the driver like there was in D3D11, so you cannot update constant buffer memory in place without making sure the GPU is done using the previous content. The usual solution is to allocate your per-frame constant buffers from a ring buffer and to use a fence to keep you from overwriting too soon.
I highly recommend sticking with D3D11. D3D12 is not a replacement for it; it was made to overcome performance issues that are only found in extremely complex renderers, and to be used by people who already have expert knowledge of the GPU and of D3D11. If your application is not at the level of complexity of a GTA V (just an example), you are only shooting yourself in the foot by switching to D3D12.
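The ring-buffer-plus-fence idea can be sketched with CPU-side bookkeeping only, without any D3D12 types. FrameRing, try_acquire and submit are hypothetical names; completed_fence stands for what ID3D12Fence::GetCompletedValue() would return, fence_value for the value signalled after the frame's command lists were submitted, and fence values are assumed to start at 1 so that 0 can mean "never used".

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// One constant-buffer slot per frame in flight; tracks when each slot is reusable.
template <std::size_t N>
class FrameRing {
public:
    // Returns the slot that may be written now, or -1 if the GPU may still
    // be reading the data that was last placed in it.
    int try_acquire(uint64_t completed_fence) const {
        if (last_use_[next_] > completed_fence) return -1;
        return static_cast<int>(next_);
    }

    // Record the fence value that marks the end of the GPU work reading slot.
    void submit(int slot, uint64_t fence_value) {
        last_use_[static_cast<std::size_t>(slot)] = fence_value;
        next_ = (static_cast<std::size_t>(slot) + 1) % N;
    }

private:
    std::array<uint64_t, N> last_use_{};  // 0 means the slot was never used
    std::size_t next_ = 0;
};
```

When try_acquire returns -1 the caller would wait on the fence (or do other work) before writing; with one slot per back buffer that wait should be rare.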
Your real problem is this: you defined 2 CBV descriptors in one range, and then defined 2 descriptor tables with this range. So you declared 4 CBVs instead of 2, and when you created the descriptor heap you set heapDesc.NumDescriptors to 1 instead of 4.

MongoDB C driver efficiency

I'm trying to write a program whose job is to go into shared memory, retrieve a piece of information (a struct 56 bytes in size), parse that struct lightly, and write it to a database.
The catch is that it needs to do this tens of thousands of times per second. I'm running this on a dedicated Ubuntu 14.04 server with dual Xeon X5677's and 32GB RAM. Also, Mongo is running PerconaFT as its storage engine. I am making an uneducated guess here, but say the worst-case load would be 100,000 writes per second.
Shared memory is populated by another program that reads information from a real-time data stream, so I can't necessarily reproduce scenarios.
First... is Mongo the right choice for this task?
Next, this is the code that I've got right now. It starts with creating a list of collections (the list of items I want to record data points on is fixed) and then retrieving data from shared memory until it catches a signal.
int main()
{
//these deal with navigating shared memory
uint latestNotice=0, latestTurn=0, latestPQ=0, latestPQturn=0;
symbol_data *notice = nullptr;
bool done = false;
//this is our 56 byte struct
pq item;
uint64_t today_at_midnight; //since epoch, in milliseconds
{
time_t seconds = time(NULL);
today_at_midnight = seconds/(60*60*24);
today_at_midnight *= (60*60*24*1000);
}
//connect to shared memory
infob=info_block_init();
uint32_t used_symbols = infob->used_symbols;
getPosition(latestNotice, latestTurn);
//fire up mongo
mongoc_client_t *client = nullptr;
mongoc_collection_t *collections[used_symbols];
mongoc_collection_t *collection = nullptr;
bson_error_t error;
bson_t *doc = nullptr;
mongoc_init();
client = mongoc_client_new("mongodb://localhost:27017/");
for(uint32_t symbol = 0; symbol < used_symbols; symbol++)
{
collections[symbol] = mongoc_client_get_collection(client, "scribe",
(infob->sd+symbol)->getSymbol());
}
//this will be used later to sleep one millisecond
struct timespec ts;
ts.tv_sec=0;
ts.tv_nsec=1000000;
while(continue_running) //becomes false if a signal is caught
{
//check that new info is available in shared memory
//sleep 1ms if it isn't
while(!getNextNotice(&notice,latestNotice,latestTurn)) nanosleep(&ts, NULL);
//get the new info
done=notice->getNextItem(item, latestPQ, latestPQturn);
if(done) continue;
//just some simple array math to make sure we're on the right collection
collection = collections[notice - infob->sd];
//switch on the item type and parse it accordingly
switch(item.tp)
{
case pq::pq_event:
doc = BCON_NEW(
//decided to use this instead of std::chrono
"ts", BCON_DATE_TIME(today_at_midnight + item.ts),
//item.pr is a uint64_t, and the guidance I've read on mongo
//advises using strings for those values
"pr", BCON_UTF8(std::to_string(item.pr).c_str()),
"sz", BCON_INT32(item.sz),
"vn", BCON_UTF8(venue_labels[item.vn]),
"tp", BCON_UTF8("e")
);
if(!mongoc_collection_insert(collection, MONGOC_INSERT_NONE, doc, NULL, &error))
{
LOG(1,"Mongo Error: "<<error.message<<endl);
}
break;
//obviously, several other cases go here, but they all look the
//same, using BCON macros for their data.
default:
LOG(1,"got unknown type = "<<item.tp<<endl);
break;
}
}
//clean up once we break from the while()
if(doc != nullptr) bson_destroy(doc);
for(uint32_t symbol = 0; symbol < used_symbols; symbol++)
{
collection = collections[symbol];
mongoc_collection_destroy(collection);
}
if(client != nullptr) mongoc_client_destroy(client);
mongoc_cleanup();
return 0;
}
My second question is: is this the fastest way to do this? The retrieval from shared memory isn't perfect, but this program is falling way behind its supply of data, far more than I can accept. So I'm looking for obvious mistakes in efficiency or technique when speed is the goal.
Thanks in advance. =)