I have a generic table class implemented in C++ that uses a shared_ptr< ptr_vector< vector<T> > > as its backing, where T is an arbitrary typename; the ptr_vector contains pointers to the vectors corresponding to the columns in the table. I decided to wrap the ptr_vector in a shared_ptr since the tables may contain many millions of rows, and the vectors containing data for each column in a ptr_vector for the same reason. (Please tell me if this can be improved.)
Implementing column-wise operations on this table is trivial, since I have access to the native iterator supplied by the vector. However, I also need the table to support row-wise operations: relatively mundane operations such as adding and removing rows should be supported, as well as the ability to use the STL algorithms with the table. Now, I have run across some design issues that I need some help to address:
It seems that implementing a custom iterator to conduct row-wise operations is necessary to accomplish what is described above. Would boost::iterator_adaptor be the right way to go about doing this?
When the user adds rows to the table, I do not wish to impose a specific data structure upon the user -- how would I go about doing this? I am thinking of accepting iterators as parameters to the add_row() method.
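A minimal sketch of the iterator-based add_row() idea, assuming a plain std::vector of column vectors instead of the shared_ptr/ptr_vector backing. The class name ColumnTable and its members other than add_row() are hypothetical and only illustrate the interface:

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical column-major table: columns_[c][r] is the cell at row r, column c.
template <typename T>
class ColumnTable {
public:
    explicit ColumnTable(std::size_t column_count) : columns_(column_count) {}

    // Append one row from any iterator range, so the caller may keep the row
    // in a vector, list, C array, and so on. The range must hold exactly one
    // value per column (checked with assert, which is disabled under NDEBUG).
    template <typename InputIt>
    void add_row(InputIt first, InputIt last) {
        std::vector<T> row(first, last);  // buffer first, then validate the width
        assert(row.size() == columns_.size());
        for (std::size_t c = 0; c < columns_.size(); ++c) {
            columns_[c].push_back(row[c]);
        }
    }

    // Remove row r from every column; each erase is linear in the row count.
    void remove_row(std::size_t r) {
        assert(r < row_count());
        for (auto& column : columns_) {
            column.erase(column.begin() + static_cast<std::ptrdiff_t>(r));
        }
    }

    std::size_t row_count() const {
        return columns_.empty() ? 0 : columns_.front().size();
    }
    std::size_t column_count() const { return columns_.size(); }
    const T& at(std::size_t r, std::size_t c) const { return columns_.at(c).at(r); }

private:
    std::vector<std::vector<T>> columns_;
};
```

Because add_row() is a member template, a call such as `table.add_row(v.begin(), v.end())` works for any container whose elements convert to T.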
If you think that I should be implementing this table structure differently, I am open to any suggestions that you may have for me. It was originally designed with the intent to store strings read from tab-delimited files containing hundreds of thousands of row entries.
Thank you very much for your help!
The Boost library has a container called multi_array, which provides an n-dimensional dynamic array that can be accessed and manipulated along each dimension. This seems very similar to what you are trying to build; perhaps you could use it instead?
Let me preface this with the statement that most of my background has been with functional programming languages so I'm fairly novice with C++.
Anyhow, the problem I'm working on is that I'm parsing a csv file with multiple variable types. A sample line from the data looks as such:
"2011-04-14 16:00:00, X, 1314.52, P, 812.1, 812"
"2011-04-14 16:01:00, X, 1316.32, P, 813.2, 813.1"
"2011-04-14 16:02:00, X, 1315.23, C, 811.2, 811.1"
So what I've done is defined a struct which stores each line. Then each of these are stored in a std::vector< mystruct >. Now say I want to subset this vector by column 4 into two vectors where every element with P in it is in one and C in the other.
Now the example I gave is fairly simplified, but the actual problem involves subsetting multiple times.
My initial naive implementation was to iterate through the entire vector, creating individual subsets as new vectors, and then subset those newly created vectors. Something a bit more memory-efficient might be to create an index, which would then be shrunk down.
Now my question is: is there a more efficient way (in terms of speed/memory usage) to do this, either within this std::vector< mystruct > framework or with some better data structure for this type of thing?
Thanks!
EDIT:
Basically the output I'd like is first two lines and last line separately. Another thing worth noting, is that typically the dataset is not ordered like the example, so the Cs and Ps are not grouped together.
I've used std::partition for this. It's not part of boost though.
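A minimal sketch of the std::partition approach. The struct Record, its field names, and the helper split_by_kind are assumptions made for illustration, not the asker's actual types:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one CSV line; only the fields needed here.
struct Record {
    std::string timestamp;
    char kind;     // 'P' or 'C' (column 4 in the sample data)
    double price;
};

// Reorders `rows` in place so that every 'P' record precedes every other
// record, and returns the index of the first non-'P' record.
// No new vector is allocated; elements are only moved within `rows`.
std::size_t split_by_kind(std::vector<Record>& rows) {
    auto middle = std::partition(
        rows.begin(), rows.end(),
        [](const Record& r) { return r.kind == 'P'; });
    return static_cast<std::size_t>(middle - rows.begin());
}
```

std::partition does not promise to keep the original order inside each group; std::stable_partition has the same interface and does, at some extra cost. Subsetting again is a matter of partitioning one of the two sub-ranges with a different predicate.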
If you want a data structure that allows you to move elements between different instances cheaply, the data structure you are looking for is std::list<> and its splice() family of functions.
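As a rough illustration of the splice() idea, the sketch below moves matching nodes from one list to another without copying any element. The struct Entry and the function extract_kind are hypothetical names:

```cpp
#include <iterator>
#include <list>
#include <string>

// Hypothetical record; only the fields needed for the example.
struct Entry {
    std::string timestamp;
    char kind;  // 'P' or 'C'
};

// Moves every entry whose kind equals `kind` out of `source` and returns
// them in a new list. splice() relinks list nodes, so no Entry is copied
// and the relative order of the moved entries is preserved.
std::list<Entry> extract_kind(std::list<Entry>& source, char kind) {
    std::list<Entry> extracted;
    for (auto it = source.begin(); it != source.end();) {
        auto next = std::next(it);  // remember the successor before relinking
        if (it->kind == kind) {
            extracted.splice(extracted.end(), source, it);
        }
        it = next;
    }
    return extracted;
}
```

The trade-off is that std::list gives up contiguous storage and random access, so iteration is typically slower than over a std::vector.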
I understand you have not per se trouble doing this but you seem to be concerned about memory usage and performance.
Depending on the size of your struct and the number of entries in the csv file, it may be advisable to use a smart pointer if you don't need to modify the partitioned data, so that the mystruct objects are not copied:
typedef std::vector<boost::shared_ptr<mystruct> > table_t;
table_t cvs_data;
If you use std::partition (as another poster suggested), you need to define a predicate that takes the indirection of the shared_ptr into account.
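A sketch of such a predicate, using std::shared_ptr in place of boost::shared_ptr (the two behave alike for this purpose). The struct Quote and both helper functions are hypothetical:

```cpp
#include <algorithm>
#include <iterator>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for mystruct.
struct Quote {
    std::string timestamp;
    char kind;  // 'P' or 'C'
    double price;
};

using Table = std::vector<std::shared_ptr<Quote>>;

// Builds a subset that shares the Quote objects with `rows`:
// only the pointers are copied, never the structs themselves.
Table select_kind(const Table& rows, char kind) {
    Table subset;
    std::copy_if(rows.begin(), rows.end(), std::back_inserter(subset),
                 [kind](const std::shared_ptr<Quote>& p) { return p->kind == kind; });
    return subset;
}

// In-place alternative: the predicate dereferences the pointer before testing.
Table::iterator partition_by_kind(Table& rows, char kind) {
    return std::partition(
        rows.begin(), rows.end(),
        [kind](const std::shared_ptr<Quote>& p) { return p->kind == kind; });
}
```

Both lambdas assume that no pointer in the table is null.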
The problem is simple:
We have a class that has members a, b, c, d...
We want to be able to quickly search a list of these objects (the key being the value of one member) and update an object with a new value, given the current value of a or b or c ...
I thought about having a bunch of
std::map<decltype(MyClass::a) /* or b, c, d */, std::shared_ptr<MyClass>>.
1) Is that a good idea?
2) Is boost multi index superior to this handcrafted solution in every way?
PS SQL is out of the question for simplicity/perf reasons.
Boost MultiIndex may have a distinct disadvantage that it will attempt to keep all indices up to date after each mutation of the collection.
This may be a large performance penalty if you have a data load phase with many separate writes.
The usage patterns of Boost Multi Index may not fit with the coding style (and taste...) of the project (members). This should be a minor disadvantage, but I thought I'd mention it
As ildjarn mentioned, Boost MI doesn't support move semantics as of yet
Otherwise, I'd consider Boost MultiIndex superior in most occasions, since you'd be unlikely to reach the amount of testing it received.
You may want to consider containing all of your maps in a single class, arbitrarily deciding on one of the containers as the one that stores the "real" objects, and then using a std::map whose mapped type is a raw pointer to an element of the first std::map.
This would be a little more difficult if you ever need to make copies of those maps, however.
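A minimal sketch of that layout, assuming a hypothetical Item class with a unique id and a unique name. One map owns the objects and a second map indexes them by raw pointer; this relies on std::map never relocating its elements on insertion or on erasure of other elements:

```cpp
#include <map>
#include <string>

// Hypothetical class with two searchable members.
struct Item {
    int id;
    std::string name;
};

class ItemIndex {
public:
    ItemIndex() = default;
    // Copying would leave by_name_ pointing into the other object's by_id_,
    // which is the copy problem mentioned above, so copies are disabled.
    ItemIndex(const ItemIndex&) = delete;
    ItemIndex& operator=(const ItemIndex&) = delete;

    // Returns false if the id or the name is already present.
    bool insert(const Item& item) {
        if (by_id_.count(item.id) || by_name_.count(item.name)) return false;
        auto it = by_id_.emplace(item.id, item).first;
        by_name_.emplace(item.name, &it->second);
        return true;
    }

    const Item* find_by_id(int id) const {
        auto it = by_id_.find(id);
        return it == by_id_.end() ? nullptr : &it->second;
    }

    const Item* find_by_name(const std::string& name) const {
        auto it = by_name_.find(name);
        return it == by_name_.end() ? nullptr : it->second;
    }

    // Changing an indexed member means re-keying the secondary map by hand.
    bool rename(int id, const std::string& new_name) {
        auto it = by_id_.find(id);
        if (it == by_id_.end() || by_name_.count(new_name)) return false;
        by_name_.erase(it->second.name);
        it->second.name = new_name;
        by_name_.emplace(new_name, &it->second);
        return true;
    }

private:
    std::map<int, Item> by_id_;             // owns the "real" objects
    std::map<std::string, Item*> by_name_;  // secondary index into by_id_
};
```

The manual bookkeeping in rename() is the kind of work Boost MultiIndex does automatically when an element is modified through its interface.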
I am working on a piece of code that need to do something very similar to the combine function in ArcGIS in C/C++. See: http://webhelp.esri.com/arcgisdesktop/9.3/index.cfm?TopicName=Combining%20multiple%20rasters
The C++ code will read in multiple very large raster data files (2GB+) in chunks, find unique combinations and output to a single map. For example, if there were 3 maps and <1,3,5> existed, respectively, in the first cell of the three maps, then I want all subsequent instances of <1,3,5> to have the same key in the final output map.
What STL containers should I be using to store the maps? Reading in the files in chunks certainly adds to the complexity of the project. The algorithm needs to be very fast, so I cannot use vectors, which have O(n) complexity for searching. Currently, I'm thinking of using an unordered_map of unordered_multimaps, but I am not sure if this is correct or if I am going to get the performance I need.
Any thoughts?
Yes, std::map or std::unordered_map is the correct choice. unordered_map is usually faster if you don't need ordering. If you need something even faster, you can replace it with another map implementation.
Use something compact for the key, something like std::tuple or std::array.
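A sketch of that idea for three input rasters, assuming integer cell values. The names Combo, ComboHash and CombineTable are hypothetical, and the hash combiner is one common choice rather than anything the standard library prescribes (std::hash has no specialization for std::array):

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <unordered_map>

// One cell value from each of the three input rasters.
using Combo = std::array<int, 3>;

// std::unordered_map needs a user-supplied hash for std::array keys.
struct ComboHash {
    std::size_t operator()(const Combo& c) const {
        std::size_t seed = 0;
        for (int v : c) {
            // Mix each value into the running hash.
            seed ^= std::hash<int>{}(v) + 0x9e3779b9u + (seed << 6) + (seed >> 2);
        }
        return seed;
    }
};

// Assigns each distinct combination the next unused id, starting at 1.
// The table persists across chunks, so a combination first seen in an
// earlier chunk keeps its id in later ones.
class CombineTable {
public:
    int id_for(const Combo& c) {
        // emplace() leaves the map unchanged if the key already exists.
        auto result = ids_.emplace(c, static_cast<int>(ids_.size()) + 1);
        return result.first->second;
    }

    std::size_t unique_count() const { return ids_.size(); }

private:
    std::unordered_map<Combo, int, ComboHash> ids_;
};
```

Calling id_for() once per cell and writing the result to the output raster gives average constant-time lookup per cell. With std::map the same code works without ComboHash, because std::array already provides operator<.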
I want to design/find a C++ data structure/container which supports two columns of data and CRUD operations on that data. I reviewed the STL containers, but none of them supports my requirement (correct me if I am wrong).
My exact requirement is as follows
Datastructure with Two Columns.
Supports the following Functions
Search for a Specific item.
Search for a List of items matching a criteria
Both columns should support the above-mentioned search operations, i.e., I should be able to search for data in either column.
Update a specific item
Delete a specific item
Add new item
I prefer search operation to be faster than add/delete operation.
In addition, I will be sharing this data between threads, so it needs to support a mutex (I can also implement a mutex lock on this data separately).
Does any of the existing STL containers meet my requirements, or is there another library or data structure that fits them best?
Note: I can't use a database or SQLite to store my data.
Thank you
Regards,
Dinesh
If one of the columns is unique, then you can probably use a map. Otherwise, define a class with two member variables representing the columns and store it in a vector. There are algorithms that will help you search the container.
Search for a Specific item.
If you need a one-way mapping (i.e. fast search over values in one column), you should use the map or multimap container classes. There is, however, no bidirectional map in the standard library, so you should build your own as a pair of (multi)maps or use another library, such as boost::bimap.
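A hand-built version might look like the sketch below: two multimaps kept in sync, one per column. The class name TwoColumnTable and the column types (std::string and int) are assumptions for illustration, and thread safety is left out:

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical two-column table searchable on either column.
class TwoColumnTable {
public:
    void add(const std::string& left, int right) {
        by_left_.emplace(left, right);
        by_right_.emplace(right, left);
    }

    // All right-column values paired with `left`.
    std::vector<int> find_by_left(const std::string& left) const {
        std::vector<int> out;
        auto range = by_left_.equal_range(left);
        for (auto it = range.first; it != range.second; ++it) out.push_back(it->second);
        return out;
    }

    // All left-column values paired with `right`.
    std::vector<std::string> find_by_right(int right) const {
        std::vector<std::string> out;
        auto range = by_right_.equal_range(right);
        for (auto it = range.first; it != range.second; ++it) out.push_back(it->second);
        return out;
    }

    // Removes one (left, right) pair from both indexes; false if absent.
    bool remove(const std::string& left, int right) {
        auto lr = by_left_.equal_range(left);
        for (auto it = lr.first; it != lr.second; ++it) {
            if (it->second != right) continue;
            by_left_.erase(it);
            auto rr = by_right_.equal_range(right);
            for (auto jt = rr.first; jt != rr.second; ++jt) {
                if (jt->second == left) { by_right_.erase(jt); break; }
            }
            return true;
        }
        return false;
    }

private:
    std::multimap<std::string, int> by_left_;
    std::multimap<int, std::string> by_right_;
};
```

An update can be expressed as remove() followed by add(). Every mutation touches both maps, which matches the stated preference for fast searches over fast inserts and deletes.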
Your best bet is Boost.Bimap, because it makes it easy to search based on either column. If you decide that you need more columns, then Boost.MultiIndex might be better.
Can you give me an idea of the best way to store data coming from a database dynamically? I know the number of columns beforehand, so I want to create a data structure dynamically that will hold all the data, and I need to reorganize the data to show output. In short: when we run the query "select * from table", results come back. How should I store the results dynamically (using structures, maps, lists, ...)? Thanks in advance.
In a nutshell, the data structures you use to store the data really depend on your data usage patterns. That is:
Is your need for the data simply to output it? If so, why store the data at all?
If not, will you be performing searches on the data?
Is ordering important?
Will you be performing computations with the data?
How much data will you need to hold?
etc...
An array of strings (StringList in Delphi, not sure what you have in C++), one per row, where each row is a comma-separated string. This can be easily dumped and read into Excel as a .csv file, imported into lots of databases.
Or, an XML document may be best. Or something else. "it depends...."
There are quite a number of options for you, preferably from the STL. Store each row in a class object, or in a string if you don't want to create objects because the rows are huge and you don't need to access all the rows returned.
1) Use a vector - Use smart pointers (shared_ptr) to create objects of the class and push them into a vector. Because of the copying involved in a vector, I would use a shared_ptr. Sort it later.
2) Use a map/set - Creating and inserting elements may be costly if you are looking for fast inserts. Lookup may be faster.
3) Hash map - Average insertion and lookup times are typically better than with a map/set.
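As one way to combine the vector and hash-map options above when the column count is only known at run time, the sketch below stores each row as a vector of strings and uses a hash map to resolve column names to positions. The class ResultSet and its members are hypothetical, and no real database API is involved:

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical holder for the rows returned by a query.
class ResultSet {
public:
    explicit ResultSet(std::vector<std::string> column_names)
        : names_(std::move(column_names)) {
        for (std::size_t i = 0; i < names_.size(); ++i) index_[names_[i]] = i;
    }

    // Rejects rows whose width does not match the column count.
    bool add_row(std::vector<std::string> row) {
        if (row.size() != names_.size()) return false;
        rows_.push_back(std::move(row));  // moved, so the cells are not copied
        return true;
    }

    // Throws std::out_of_range for an unknown column or row index.
    const std::string& cell(std::size_t row, const std::string& column) const {
        return rows_.at(row).at(index_.at(column));
    }

    std::size_t row_count() const { return rows_.size(); }
    std::size_t column_count() const { return names_.size(); }

private:
    std::vector<std::string> names_;                      // column order
    std::unordered_map<std::string, std::size_t> index_;  // name -> position
    std::vector<std::vector<std::string>> rows_;          // one vector per row
};
```

Keeping every cell as a string avoids committing to column types up front. If the data is later searched or sorted on a particular column, a typed struct per row or an extra index may serve better.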