I am trying to parse a neural network configuration file similar to the lines below. The actual file will have many more lines, but in a similar format.
Resnet50 {
Layer CONV1 {
Type: CONV
Stride { X: 2, Y: 2 }
Dimensions { K: 64, C: 3, R: 7, S: 7, Y:224, X:224 }
}
Layer CONV2_1_1 {
Type: CONV
Stride { X: 1, Y: 1 }
Dimensions { K: 64, C: 64, R: 1, S: 1, Y: 56, X: 56 }
}
I use this Boost argument parsing code:
void to_cout(const std::vector<std::string> &v)
{
std::copy(v.begin(), v.end(), std::ostream_iterator<std::string>{std::cout, "\n"});
}
int main(int argc, char* argv[]) {
namespace po = boost::program_options;
po::options_description conf("Config file options");
conf.add_options()("confg_file", po::value<std::string>(&file_name), "HW configuration file");
po::options_description all_options;
all_options.add(conf);
po::variables_map vm;
po::store(po::parse_command_line(argc, argv, all_options), vm);
po::notify(vm);
return 0;
}
This seems like a regular parsing routine, but the configuration file wasn't parsed correctly: printing the contents of vm via to_cout produced no output. How does parse_command_line get into the hierarchy of the example configuration file?
That's not what Program Options is about. You can use it to read ini files, but not with the code shown. You are literally invoking parse_command_line (not parse_config_file).
The code you show allows you to parse the name of a config file from the command line. This is also why the value is std::string file_name.
Maybe we're missing (quite a lot of) code, because there's also nothing invoking to_cout in your code, never mind that it wouldn't work with vm because the argument type doesn't directly match. I know you can loop over matched names in the variables map, and this is likely what you did, but that's all not very relevant.
Even if you did call parse_config_file, it would not know how to read that file format, as the documented format is an INI-file flavour.
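For reference, parse_config_file expects an INI-style file: key=value lines, optionally grouped under [section] headers and registered as "section.key" options. Here is a minimal sketch with a hypothetical file and option names (not your format):
#include <boost/program_options.hpp>
#include <fstream>
#include <iostream>

int main() {
    namespace po = boost::program_options;

    // Hypothetical INI-style file "stride.ini":
    //   [stride]
    //   x=2
    //   y=2
    int x = 0, y = 0;
    po::options_description conf("Config file options");
    conf.add_options()
        ("stride.x", po::value<int>(&x), "stride in x")
        ("stride.y", po::value<int>(&y), "stride in y");

    po::variables_map vm;
    std::ifstream ifs("stride.ini");
    po::store(po::parse_config_file(ifs, conf), vm);
    po::notify(vm);

    std::cout << "stride " << x << "x" << y << "\n";
}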
The Good News
The good news is that your config file does have a format that closely resembles the INFO files supported by Boost Property Tree, which gives me the first opportunity in 10 years¹ to actually suggest using that library: it seems to be more or less precisely what you are after:
Live On Coliru
#include <boost/property_tree/info_parser.hpp>
#include <iostream>
#include <sstream>
extern std::string config;
int main() {
boost::property_tree::ptree pt;
std::istringstream iss(config);
read_info(iss, pt);
write_info(std::cout, pt);
}
std::string config = R"(
Resnet50 {
Layer CONV1 {
Type: CONV
Stride { X: 2, Y: 2 }
Dimensions { K: 64, C: 3, R: 7, S: 7, Y:224, X:224 }
}
Layer CONV2_1_1 {
Type: CONV
Stride { X: 1, Y: 1 }
Dimensions { K: 64, C: 64, R: 1, S: 1, Y: 56, X: 56 }
}
}
)";
Prints
Resnet50
{
Layer CONV1
{
Type: CONV
Stride
{
X: 2,
Y: 2
}
Dimensions
{
K: 64,
C: 3,
R: 7,
S: 7,
Y:224, X:224
}
}
Layer CONV2_1_1
{
Type: CONV
Stride
{
X: 1,
Y: 1
}
Dimensions
{
K: 64,
C: 64,
R: 1,
S: 1,
Y: 56,
X: 56
}
}
}
Tying It Together
You may tie it together with a CLI argument for the filename:
Live On Coliru
#include <boost/property_tree/info_parser.hpp>
#include <boost/program_options.hpp>
#include <fstream>
#include <iostream>
using boost::property_tree::ptree;
int main(int argc, char* argv[]) {
std::string file_name;
{
namespace po = boost::program_options;
po::options_description cliopts("options");
cliopts.add_options() //
("config_file", po::value<std::string>(&file_name),
"HW configuration file");
po::variables_map vm;
po::store(po::parse_command_line(argc, argv, cliopts), vm);
if (!vm.contains("config_file")) {
std::cerr << cliopts << '\n';
return 255;
}
po::notify(vm); // sets file_name
}
boost::property_tree::ptree pt;
{
std::ifstream ifs(file_name);
read_info(ifs, pt);
} // closes file
for (auto const& [key, sub] : pt.get_child("Resnet50")) {
std::cout << key << " " << sub.get_value("") << "\n";
}
}
Then, running ./test.exe --config_file config.cfg may print e.g.
Layer CONV1
Layer CONV2_1_1
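If you need individual values rather than just the child keys, you can drill further into the tree with get_child/get. A minimal sketch building on the pt parsed above; note that, because of the bespoke format, the INFO parser keeps the trailing ':' as part of each key (and a trailing ',' can remain part of a value), so some trimming may be needed before numeric conversion:
ptree const& conv1 = pt.get_child("Resnet50.Layer CONV1");
std::cout << conv1.get<std::string>("Type:") << "\n";     // prints "CONV"
std::cout << conv1.get<std::string>("Stride.X:") << "\n"; // prints "2," (note the comma)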
¹ 10 years (and more) of admonishing people not to abuse Property Tree as an XML, INI, or JSON parser, because it is none of these things. It's ... a property tree library.
Related
I have a 2D vector which has 6 columns and 500 rows. I want to combine the rows by comparing a single column value (PDG_ID), i.e. if the PDG_ID column value is the same for several rows, I will take the mean of the other five columns and store these rows as one row.
Any idea how to do that in C++?
2D vector with six columns
You need to understand the requirements and then select a fitting design.
In your case, you want to group several rows having the same ID and calculate the mean values of the data entries.
So, there is a relation between one ID and one or many data entries. Or, in C++ terms, an ID has one or many associated entries.
In C++, we have so-called associative containers, like std::map or std::unordered_map. Here, we can store a key (the ID) together with many associated data entries.
If we put all the data of one row into one struct, we could write something like:
struct PDG {
int ID{};
int status{};
double Px{};
double Py{};
double Pz{};
double E{};
};
And, if we want to store IDs with associated many PDGs, we can define a map like this:
std::map<int, std::vector<PDG>> groupedPDGs{};
In this map, we can store IDs and associated data consisting of one or many PDGs.
Then we can add some very small/simple helper functions, for example for I/O functionality or calculating the mean values. With that, we break a big, more complicated problem down into simpler parts.
The overall implementation could then look like the below:
#include <iostream>
#include <fstream>
#include <vector>
#include <string>
#include <iterator>
#include <map>
#include <iomanip>
// Simple data struct with all necessary values and IO functions
struct PDG {
// Data part
int ID{};
int status{};
double Px{};
double Py{};
double Pz{};
double E{};
// Input of one row
friend std::istream& operator >> (std::istream& is, PDG& pdg) {
char c{};
return is >> pdg.ID >> c >> pdg.status >> c >> pdg.Px >> c >> pdg.Py >> c >> pdg.Pz >> c >> pdg.E;
}
// Output of one row
friend std::ostream& operator << (std::ostream& os, const PDG& pdg) {
return os << "ID: " << std::setw(5) << pdg.ID << "\tStatus: " << pdg.status << "\tPx: " << std::setw(9) << pdg.Px
<< "\tPy: " << std::setw(9) << pdg.Py << "\tPz: " << std::setw(9) << pdg.Pz << "\tE: " << std::setw(9) << pdg.E;
}
};
// Alias/Abbreviation for a vector of PDGs
using PDGS = std::vector<PDG>;
// Calculate a mean value for vector of PDG data
PDG calculateMean(const PDGS& pdgs) {
// Here we store the result. Initialize with values from the first row and zeroes
PDG result{ pdgs.front().ID, pdgs.front().status, 0.0, 0.0, 0.0, 0.0};
// Add up data fields according to type
for (size_t i{}; i < pdgs.size(); ++i) {
result.Px += pdgs[i].Px;
result.Py += pdgs[i].Py;
result.Pz += pdgs[i].Pz;
result.E += pdgs[i].E;
}
// Get mean value
result.Px /= pdgs.size();
result.Py /= pdgs.size();
result.Pz /= pdgs.size();
result.E /= pdgs.size();
// And return result to calling function
return result;
}
int main() {
// Open the source file containing the data, and check, if the file could be opened
if (std::ifstream ifs{ "pdg.txt" }; ifs) {
// Read header line and throw away
std::string header{}; std::getline(ifs, header);
// Here we will store the PDGs grouped by their ID
std::map<int, PDGS> groupedPDGs{};
// Read all source lines
PDG pdg{};
while (ifs >> pdg)
// Store read values grouped by their ID
groupedPDGs[pdg.ID].push_back(pdg);
// Result with mean values
PDGS result{};
// Calculate mean values and store in additional vector
for (const auto& [id, pdgs] : groupedPDGs)
result.push_back(std::move(calculateMean(pdgs)));
// Debug: Show output to user
for (const PDG& p : result)
std::cout << p << '\n';
}
else
std::cerr << "\nError: Could not open source datafile\n\n";
}
With an input file like:
PDG ID, Status, Px, Py, Pz, E
22, 1, 0.00658, 0.0131, -0.00395, 0.0152
13, 1, -43.2, -44.7, -49.6, 79.6
14, 1, 3.5, 21.4, 0.499, 21.7
16, 1, 41.1, -18, 27.8, 52.8
211, 1, 0.483, -0.312, 1.52, 1.63
211, 1, -0.247, -1.75, 45.2, 45.2
321, 1, 0.717, 0.982, 52.6, 52.6
321, 1, 0.112, 0.423, 33.2, 33.2
211, 1, 0.191, -0.68, -178, 178
2212, 1, 1.08, -0.428, -1.78E+03, 1.78E+03
2212, 1, 7.61, 4.28, 76.3, 76.8
211, 1, 0.176, 0.247, 8.9, 8.9
211, 1, 0.456, -0.73, 0.342, 0.937
2112, 1, 0.633, -0.904, 0.423, 1.51
2112, 1, 1, -0.645, 0.366, 1.56
211, 1, -0.0722, 0.147, -0.153, 0.264
211, 1, 0.339, 0.402, 0.304, 0.623
211, 1, 3.64, 2.58, -2.84, 5.29
211, 1, 0.307, 0.208, -5.69, 5.71
2212, 1, 0.118, 0.359, -3.29, 3.45
we get the below output:
ID: 13 Status: 1 Px: -43.2 Py: -44.7 Pz: -49.6 E: 79.6
ID: 14 Status: 1 Px: 3.5 Py: 21.4 Pz: 0.499 E: 21.7
ID: 16 Status: 1 Px: 41.1 Py: -18 Pz: 27.8 E: 52.8
ID: 22 Status: 1 Px: 0.00658 Py: 0.0131 Pz: -0.00395 E: 0.0152
ID: 211 Status: 1 Px: 0.585867 Py: 0.0124444 Pz: -14.4908 E: 27.3949
ID: 321 Status: 1 Px: 0.4145 Py: 0.7025 Pz: 42.9 E: 42.9
ID: 2112 Status: 1 Px: 0.8165 Py: -0.7745 Pz: 0.3945 E: 1.535
ID: 2212 Status: 1 Px: 2.936 Py: 1.40367 Pz: -568.997 E: 620.083
Following this link provided by @sehe in this post Boost_option to parse a configuration file, I need to parse configuration files that may have comments.
https://www.boost.org/doc/libs/1_76_0/doc/html/property_tree/parsers.html#property_tree.parsers.info_parser
But since there are comments (leading #), should a Spirit grammar be used in addition to read_info() to take out the comments as well? I am referring to info_grammar_spirit.cpp in the property_tree/examples folder.
You would do good to avoid depending on implementation details, so instead I'd suggest pre-processing your config file just to strip the comments.
A simple replace of "//" with "; " may be enough.
Building on the previous answer:
std::string tmp;
{
std::ifstream ifs(file_name.c_str());
tmp.assign(std::istreambuf_iterator<char>(ifs), {});
} // closes file
boost::algorithm::replace_all(tmp, "//", ";"); // needs <boost/algorithm/string/replace.hpp>
std::istringstream preprocessed(tmp);
read_info(preprocessed, pt);
Now if you change the input to include comments:
Resnet50 {
Layer CONV1 {
Type: CONV // this is a comment
Stride { X: 2, Y: 2 } ; this too
Dimensions { K: 64, C: 3, R: 7, S: 7, Y:224, X:224 }
}
// don't forget the CONV2_1_1 layer
Layer CONV2_1_1 {
Type: CONV
Stride { X: 1, Y: 1 }
Dimensions { K: 64, C: 64, R: 1, S: 1, Y: 56, X: 56 }
}
}
It still parses as expected, if we also extend the debug output to verify:
ptree const& resnet50 = pt.get_child("Resnet50");
for (auto& entry : resnet50) {
std::cout << entry.first << " " << entry.second.get_value("") << "\n";
std::cout << " --- Echoing the complete subtree:\n";
write_info(std::cout, entry.second);
}
Prints
Layer CONV1
--- Echoing the complete subtree:
Type: CONV
Stride
{
X: 2,
Y: 2
}
Dimensions
{
K: 64,
C: 3,
R: 7,
S: 7,
Y:224, X:224
}
Layer CONV2_1_1
--- Echoing the complete subtree:
Type: CONV
Stride
{
X: 1,
Y: 1
}
Dimensions
{
K: 64,
C: 64,
R: 1,
S: 1,
Y: 56,
X: 56
}
See it Live On Coliru
Yes, But...?
What if '//' occurs in a string literal? Won't it also get replaced? Yes: for example, a value like "http://example.com" would be mangled into "http:;example.com".
This is not a library-quality solution. You should not expect one, because you didn't have to put in any effort to parse your bespoke configuration file format.
You are the only party who can judge whether the shortcomings of this approach are a problem for you.
However, short of just copying and modifying Boost's parser or implementing your own from scratch, there's not a lot one can do.
For The Masochists
If you don't want to reimplement the entire parser, but still want the "smarts" to skip string literals, here's a pre_process function that does all that. This time, it truly employs Boost Spirit:
#include <boost/spirit/home/x3.hpp>
#include <stdexcept>
#include <string>
std::string pre_process(std::string const& input) {
std::string result;
using namespace boost::spirit::x3;
auto static string_literal
= raw[ '"' >> *('\\'>> char_ | ~char_('"')) >> '"' ];
auto static comment
= char_(';') >> *~char_("\r\n")
| "//" >> attr(';') >> *~char_("\r\n")
| omit["/*" >> *(char_ - "*/") >> "*/"];
auto static other
= +(~char_(";\"") - "//" - "/*");
auto static content
= *(string_literal | comment | other) >> eoi;
if (!parse(begin(input), end(input), content, result)) {
throw std::invalid_argument("pre_process");
}
return result;
}
As you can see, it recognizes string literals (with escapes) and treats "//" and ';' style line-wise comments as equivalent. To "show off", I threw in /* block comments */, which cannot be represented in proper INFO syntax, so we just omit[] them.
Now let's test with a funky example (extended from the "Complicated example demonstrating all INFO features" from the documentation):
#include <boost/property_tree/info_parser.hpp>
#include <iostream>
#include <sstream>
using boost::property_tree::ptree;
int main() {
boost::property_tree::ptree pt;
std::istringstream iss(
pre_process(R"~~( ; A comment
key1 value1 // Another comment
key2 "value with /* no problem */ special // characters in it {};#\n\t\"\0"
{
subkey "value split "\
"over three"\
"lines"
{
a_key_without_value ""
"a key with special characters in it {};#\n\t\"\0" ""
"" value /* Empty key with a value */
"" /*also empty value: */ "" ; Empty key with empty value!
}
})~~"));
read_info(iss, pt);
std::cout << " --- Echoing the parsed tree:\n";
write_info(std::cout, pt);
}
Prints (Live On Coliru)
--- Echoing the parsed tree:
key1 value1
key2 "value with /* no problem */ special // characters in it {};#\n \"\0"
{
subkey "value split over threelines"
{
a_key_without_value ""
"a key with special characters in it {};#\n \"\0" ""
"" value
"" ""
}
}
I have successfully created a scalar valued attribute whose value is a variable length array of const char*. I do not understand how to read this attribute however!
This is code I used to create the attribute:
void create_attribute_with_vector_of_strings_as_value()
{
using namespace H5;
// Create some test strings.
std::vector<std::string> strings;
for (int iii = 0; iii < 10; iii++)
{
strings.push_back("this is " + boost::lexical_cast<std::string>(iii));
}
// Part 1: grab pointers to the chars
std::vector<const char*> chars;
for (auto si = strings.begin(); si != strings.end(); ++si)
{
std::string &s = (*si);
chars.push_back(s.c_str());
}
BOOST_TEST_MESSAGE("Size of char* array is: " << chars.size());
// Part 2: create the variable length type
hvl_t hdf_buffer;
hdf_buffer.p = chars.data();
hdf_buffer.len = chars.size();
// Part 3: create the type
auto s_type = H5::StrType(H5::PredType::C_S1, H5T_VARIABLE);
auto svec_type = H5::VarLenType(&s_type);
try
{
// Open an existing file and dataset.
H5File file(m_file_name.c_str(), H5F_ACC_RDWR);
// Part 4: write the output to a scalar attribute
DataSet dataset = file.openDataSet(m_dataset_name.c_str());
std::string filter_names = "multi_filters";
Attribute attribute = dataset.createAttribute( filter_names.c_str(), svec_type, H5S_SCALAR);
attribute.write(svec_type, &hdf_buffer);
file.close();
}
Here is the dataset with attribute as seen from h5dump:
HDF5 "d:\tmp\hdf5_tutorial\h5tutr_dset.h5" {
GROUP "/" {
DATASET "dset" {
DATATYPE H5T_STD_I32BE
DATASPACE SIMPLE { ( 4, 6 ) / ( 4, 6 ) }
DATA {
(0,0): 1, 7, 13, 19, 25, 31,
(1,0): 2, 8, 14, 20, 26, 32,
(2,0): 3, 9, 15, 21, 27, 33,
(3,0): 4, 10, 16, 22, 28, 34
}
ATTRIBUTE "multi_filters" {
DATATYPE H5T_VLEN { H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_ASCII;
CTYPE H5T_C_S1;
}}
DATASPACE SCALAR
DATA {
(0): ("this is 0", "this is 1", "this is 2", "this is 3", "this is 4", "this is 5", "this is 6", "this is 7", "this is 8", "this is 9")
}
}
}
}
}
I do not understand how to read this data. The code I've experimented with so far is below. It compiles, but I've hardwired the array size to the known length, and the variable-length C strings come back empty. Does anyone have any suggestions as to where I'm going wrong? In particular, how do I query the length of the array of const char*, and how do I read the actual const char* strings contained in the array?
void read_attribute_with_vector_of_strings_as_value()
{
using namespace H5;
std::vector<std::string> strings;
try
{
// Open an existing file and dataset readonly
H5File file(m_file_name.c_str(), H5F_ACC_RDONLY);
// Part 4: Open the dataset
DataSet dataset = file.openDataSet(m_dataset_name.c_str());
// Attribute name
std::string filter_names = "multi_filters";
Attribute attribute = dataset.openAttribute(filter_names.c_str());
size_t sz = attribute.getInMemDataSize();
size_t sz_1 = attribute.getStorageSize();
auto t1 = attribute.getDataType();
VarLenType t2 = attribute.getVarLenType();
H5T_class_t type_class = attribute.getTypeClass();
if (type_class == H5T_STRING)
BOOST_TEST_MESSAGE("H5T_STRING");
int length = 10;
std::vector<char*> tmp_vec(length);
auto s_type = H5::StrType(H5::PredType::C_S1, H5T_VARIABLE);
auto svec_type = H5::VarLenType(&s_type);
hvl_t hdf_buffer;
hdf_buffer.p = tmp_vec.data();
hdf_buffer.len = length;
attribute.read(svec_type, &hdf_buffer);
//attribute.read(s_type, &hdf_buffer);
//attribute.read(tmp_vec.data(), s_type);
for(size_t x = 0; x < tmp_vec.size(); ++x)
{
fprintf(stdout, "GOT STRING [%s]\n", tmp_vec[x] );
strings[x] = tmp_vec[x];
}
file.close();
}
If you are not required to use specific technologies to implement what you have in mind, you may consider HDFql (http://www.hdfql.com), which is a high-level language to manage HDF5 files easily (think SQL). That way, you are relieved of all the low-level details of manipulating HDF5 files that you describe. Using HDFql in C++, reading an array of variable-length char is done like this:
// include HDFql C++ header file (make sure it can be found by the C++ compiler)
#include <iostream>
#include "HDFql.hpp"
int main(int argc, char *argv[])
{
// create an HDF file named "example.h5" and use (i.e. open) it
HDFql::execute("CREATE AND USE FILE example.h5");
// create an attribute named "multi_filters" of type varchar of one dimension (size 5)
HDFql::execute("CREATE ATTRIBUTE multi_filters AS VARCHAR(5)");
// insert (i.e. write) values "Red", "Green", "Blue", "Orange" and "Yellow" into attribute "multi_filters"
HDFql::execute("INSERT INTO multi_filters VALUES(Red, Green, Blue, Orange, Yellow)");
// select (i.e. read) attribute "multi_filters" into HDFql default cursor
HDFql::execute("SELECT FROM multi_filters");
// display content of HDFql default cursor
while(HDFql::cursorNext() == HDFql::Success)
{
std::cout << "Color " << HDFql::cursorGetChar() << " has a size of " << HDFql::cursorGetSize() << std::endl;
}
return 0;
}
This is my code
std::ifstream infile("/home/alexander/MyCompany/MyGame/Resources/res/puzzles(copia).json");
std::string line;
std::ofstream ofs("/home/alexander/MyCompany/MyGame/Resources/res/temporal.json", std::ofstream::out);
Document d;
while (std::getline(infile, line))
{
d.Parse(line.c_str());
if (d.HasParseError()) CCLOG("GetParseError %s\n", d.GetParseError());
if (d.IsObject() && d.HasMember("Lados"))
{
rapidjson::Value& a = d["Lados"]; // Using a reference for consecutive access is handy and faster.
rapidjson::Document::AllocatorType& allocator = d.GetAllocator();
assert(a.IsArray()); // explotar si no es un arreglo
a.PushBack(4, allocator).PushBack(8, allocator).PushBack(15, allocator).PushBack(16, allocator).PushBack(23, allocator).PushBack(42, allocator);
}
// Convertir JSON a string e Insertar en archivo
StringBuffer strbuf;
Writer<StringBuffer> writer(strbuf);
d.Accept(writer);
ofs << strbuf.GetString() << "\n";
}
ofs.close();
infile.close();
// ACTUALIZA ARCHIVO PRINCIPAL & LIMPIA TEMPORAL
std::ifstream src("/home/alexander/MyCompany/MyGame/Resources/res/puzzles(copia).json");
std::ofstream dst("/home/alexander/MyCompany/MyGame/Resources/res/temporal.json");
dst << src.rdbuf();
src.close();
dst.close();
if (remove("/home/alexander/MyCompany/MyGame/Resources/res/temporal.json") != 0) CCLOG("Error deleting file");
CCLOG("save");
As you can see, I'm creating a new file called temporal into which I put the modified content, then copy it back to the original file.
The problem is that when I do that, the original file does not change: the temporal file is created and properly cleared, but the original file is not modified, and I don't know why.
I am using cocos2d-x, C++ and rapidjson.
I'm not sure if I need to give my program permission to modify files or something like that.
The JSON file has several lines in this format:
{ "N" : 5, "Rotacion" : 42, "Igual" : 20, "Inverso" : 0, "RotacionE" : 47, "Espejo" : 22, "Puntuacion" : 0, "_id" : "563b7b4756ab632f47fe6d7f" , "Lados" : [], "Camino" : [ 6, 5, 4, 21, 22, 7, 2, 3, 20, 23, 8, 1, 18, 19, 24, 9, 0, 17, 16, 15, 10, 11, 12, 13, 14 ], "__v" : 0 }
As far as I can see, your code is OK; what you have to do is reverse the direction of your .json files. Right now you have it like this:
std::ifstream src("/home/alexander/MyCompany/MyGame/Resources/res/puzzles(copia).json");
std::ofstream dst("/home/alexander/MyCompany/MyGame/Resources/res/temporal.json");
but you have to put it like this:
std::ifstream src("/home/alexander/MyCompany/MyGame/Resources/res/temporal.json");
std::ofstream dst("/home/alexander/MyCompany/MyGame/Resources/res/puzzles(copia).json");
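Putting it together, the copy-back block then reads from the temporal file (which holds the modified JSON) and writes over the original; a minimal sketch using the same paths as in the question:
// Copy the modified content back over the original, then remove the temporary file
std::ifstream src("/home/alexander/MyCompany/MyGame/Resources/res/temporal.json");
std::ofstream dst("/home/alexander/MyCompany/MyGame/Resources/res/puzzles(copia).json");
dst << src.rdbuf();
src.close();
dst.close();
if (remove("/home/alexander/MyCompany/MyGame/Resources/res/temporal.json") != 0) CCLOG("Error deleting file");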
Consider the following code:
struct VersionData
{
VersionData();
VersionData(VersionData&& rhs);
int m_versionId;
int m_weight;
int m_pId;
bool m_hdi;
};
struct VersionId{};
typedef boost::multi_index_container<
VersionData,
bmi::indexed_by<
bmi::ordered_non_unique<
bmi::tag<VersionId>,
bmi::member<VersionData, int, &VersionData::m_versionId>
>
>
> VersionDataContainer;
struct VersionsData
{
VersionsData();
VersionsData(VersionsData&& rhs);
int m_subdeliveryGroupId;
int m_retargetingId;
VersionDataContainer m_versionData;
};
struct mvKey{};
typedef boost::multi_index_container<
VersionsData,
bmi::indexed_by<
bmi::ordered_unique<
bmi::tag<mvKey>,
bmi::composite_key<
VersionsData,
bmi::member<VersionsData,int, &VersionsData::m_subdeliveryGroupId>,
bmi::member<VersionsData,int, &VersionsData::m_retargetingId>
>
>
>
> mvDataContainer;
mvDataContainer m_data;
The intention is to lookup using the composite key in mvDataContainer but in some cases I need to lookup in VersionData across all VersionsData. Something like m_data.get<mvKey>.equal_range(make_tuple(ignore, ignore, ignore)).get<VersionId>.equal_range(123456);
First question, is it achievable?
Second, is this the right approach to use nested multi_index_containers? any performance impacts/gains?
In addition to the other answer suggesting a single container for the whole table, here's an idea based on a Boost Intrusive multiset:
See it Live On Coliru
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/member.hpp>
#include <boost/multi_index/composite_key.hpp>
// for intrusive multiset
#include <boost/intrusive/set.hpp>
#include <boost/range/iterator_range.hpp>
#include <cassert>
#include <iostream>
namespace bmi = boost::multi_index;
namespace bi = boost::intrusive;
struct VersionData : bi::set_base_hook<bi::link_mode<bi::auto_unlink> > {
VersionData(int versionId, int weight=0, int pId=0, bool hdi=false) :
m_versionId(versionId), m_weight(weight), m_pId(pId), m_hdi(hdi) { }
int m_versionId;
int m_weight;
int m_pId;
bool m_hdi;
friend std::ostream& operator<<(std::ostream& os, VersionData const& vd) {
return os << "{ " << vd.m_versionId << " " << vd.m_weight << " " << vd.m_pId << " " << vd.m_hdi << " }";
}
struct ById {
bool operator()(VersionData const& a, VersionData const& b) const { return a.m_versionId < b.m_versionId; }
};
};
typedef bi::multiset<VersionData, bi::constant_time_size<false>, bi::compare<VersionData::ById> > VersionIndex;
typedef boost::multi_index_container<
VersionData,
bmi::indexed_by<
bmi::ordered_non_unique<
bmi::tag<struct VersionId>,
bmi::member<VersionData, int, &VersionData::m_versionId>
>
>
> VersionDataContainer;
struct VersionsData
{
int m_subdeliveryGroupId;
int m_retargetingId;
VersionDataContainer m_versionData;
};
typedef boost::multi_index_container<
VersionsData,
bmi::indexed_by<
bmi::ordered_unique<
bmi::tag<struct mvKey>,
bmi::composite_key<
VersionsData,
bmi::member<VersionsData,int, &VersionsData::m_subdeliveryGroupId>,
bmi::member<VersionsData,int, &VersionsData::m_retargetingId>
>
>
>
> mvDataContainer;
void insert(
mvDataContainer& into, VersionIndex& global_version_index,
int subdeliveryGroupId, int retargetingId,
int versionId, int weight, int pId, bool hdi)
{
auto& mainIdx = into.get<mvKey>();
auto insertion = mainIdx.insert(VersionsData { subdeliveryGroupId, retargetingId, VersionDataContainer {} });
mainIdx.modify(insertion.first, [&](VersionsData& record) {
auto insertion = record.m_versionData.insert(VersionData { versionId, weight, pId, hdi });
global_version_index.insert(const_cast<VersionData&>(*insertion.first));
});
}
int main() {
VersionIndex global_version_index;
mvDataContainer table;
insert(table, global_version_index, 21, 10, 1, 100, 123, false);
insert(table, global_version_index, 9, 27, 2, 90, 123, false);
insert(table, global_version_index, 12, 25, 3, 110, 123, true);
insert(table, global_version_index, 10, 33, /*version 8:*/ 8, 80, 123, false);
insert(table, global_version_index, 4, 38, 5, 101, 124, false);
insert(table, global_version_index, 33, 7, 6, 91, 124, false);
insert(table, global_version_index, 34, 27, 7, 111, 124, true);
insert(table, global_version_index, 9, 11, /*version 8:*/ 8, 81, 124, false);
insert(table, global_version_index, 0, 12, 9, 99, 125, false);
insert(table, global_version_index, 35, 39, /*version 8:*/ 8, 89, 125, false);
insert(table, global_version_index, 15, 15, 11, 109, 125, true);
insert(table, global_version_index, 25, 32, /*version 8:*/ 8, 79, 125, false);
// debug table output
assert(table.size()==12);
// so now you can do:
std::cout << "---\nQuerying for version id 8:\n";
for (auto& record : boost::make_iterator_range(global_version_index.equal_range(8)))
std::cout << record << "\n";
table.erase(table.find(boost::make_tuple(10,33))); // auto unlinks from global_version_index
std::cout << "---\nQuerying for version id 8:\n";
for (auto& record : boost::make_iterator_range(global_version_index.equal_range(8)))
std::cout << record << "\n";
}
Prints:
---
Querying for version id 8:
{ 8 80 123 0 }
{ 8 81 124 0 }
{ 8 89 125 0 }
{ 8 79 125 0 }
---
Querying for version id 8:
{ 8 81 124 0 }
{ 8 89 125 0 }
{ 8 79 125 0 }
So indeed, instead of using nested containers like that (live on Coliru), you could/should implement it as a single table (after all, this is a table with several indices):
Live On Coliru
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/member.hpp>
#include <boost/multi_index/composite_key.hpp>
#include <cassert>
#include <iostream> // std::ostream is used in VersionRecord's stream operator
namespace bmi = boost::multi_index;
struct VersionRecord {
int m_subdeliveryGroupId;
int m_retargetingId;
int m_versionId;
int m_weight;
int m_pId;
bool m_hdi;
friend std::ostream& operator<<(std::ostream& os, VersionRecord const& record) {
return os << "{ " << record.m_subdeliveryGroupId << " " << record.m_retargetingId << " "
<< record.m_versionId << " " << record.m_weight << " " << record.m_pId << " " << record.m_hdi << " }";
}
};
typedef boost::multi_index_container<
VersionRecord,
bmi::indexed_by<
bmi::ordered_unique<
bmi::tag<struct mvKey>,
bmi::composite_key<
VersionRecord,
bmi::member<VersionRecord,int, &VersionRecord::m_subdeliveryGroupId>,
bmi::member<VersionRecord,int, &VersionRecord::m_retargetingId>
>
>,
bmi::ordered_non_unique<
bmi::tag<struct VersionId>,
bmi::member<VersionRecord, int, &VersionRecord::m_versionId>
>
>
> VersionTable;
#include <iostream>
#include <boost/range/iterator_range.hpp>
int main() {
auto table = VersionTable {
{ 21, 10, 1, 100, 123, false },
{ 9, 27, 2, 90, 123, false },
{ 12, 25, 3, 110, 123, true },
{ 10, 33, /*version 8:*/ 8, 80, 123, false },
{ 4, 38, 5, 101, 124, false },
{ 33, 7, 6, 91, 124, false },
{ 34, 27, 7, 111, 124, true },
{ 9, 11, /*version 8:*/ 8, 81, 124, false },
{ 0, 12, 9, 99, 125, false },
{ 35, 39, /*version 8:*/ 8, 89, 125, false },
{ 15, 15, 11, 109, 125, true },
{ 25, 32, /*version 8:*/ 8, 79, 125, false },
};
// debug table output
assert(table.size()==12);
for (auto& record : table) std::cout << record << "\n";
// so now you can do:
auto& idx = table.get<VersionId>();
std::cout << "---\nQuerying for version id 8:\n";
for (auto& record : boost::make_iterator_range(idx.equal_range(8)))
std::cout << record << "\n";
}
Which prints:
{ 0 12 9 99 125 0 }
{ 4 38 5 101 124 0 }
{ 9 11 8 81 124 0 }
{ 9 27 2 90 123 0 }
{ 10 33 8 80 123 0 }
{ 12 25 3 110 123 1 }
{ 15 15 11 109 125 1 }
{ 21 10 1 100 123 0 }
{ 25 32 8 79 125 0 }
{ 33 7 6 91 124 0 }
{ 34 27 7 111 124 1 }
{ 35 39 8 89 125 0 }
---
Querying for version id 8:
{ 25 32 8 79 125 0 }
{ 35 39 8 89 125 0 }
{ 10 33 8 80 123 0 }
{ 9 11 8 81 124 0 }
Alternatively, you can bolt an intrusive container on top of the VersionsData records. This, however, complicates the design: you either have to use auto_unlink node hooks (sacrificing thread-safety control) or you have to make sure the containers are kept in sync at all times.
It is not the exact answer to what I originally asked, but since the performance issue was mentioned, and in light of the discussion with @sehe, this is what I found.
1) Use a flat structure; you can avoid wasting memory on repeated keys by using boost::flyweight (see the sketch below).
2) Use MIC instead of tailored containers. MIC might be slightly slower (depending on the test scenario) when searching on simple indexes, but once you use composite keys (and implement similar behavior for your tailored data structure), it is anywhere from slightly to significantly faster than the tailored DS.
My previous statement that the tailored one is faster was wrong: I was using MIC from Boost 1.52, and it looks like there was a bug when using composite keys with strings (5 orders of magnitude slower than a composite key without strings). After switching to 1.57, everything started to work as expected.
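For point 1, here is a minimal sketch (with a hypothetical record and field names, not taken from the original code) of how boost::flyweight lets rows that repeat the same key share one underlying string:
#include <boost/flyweight.hpp>
#include <iostream>
#include <string>
#include <vector>

struct FlatRecord {
    boost::flyweight<std::string> groupName; // shared among all rows with the same name
    int versionId;
    int weight;
};

int main() {
    std::vector<FlatRecord> table;
    for (int i = 0; i < 1000; ++i)
        table.push_back({ boost::flyweight<std::string>("group A"), i, i * 10 });
    // All 1000 rows reference a single interned "group A" string.
    std::cout << table.size() << " rows, key: " << table.front().groupName.get() << "\n";
}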
Tests on Coliru
Have a nice indexing, guys! :)