library for matrices in c++ - c++

I have a lot of elements in a matrix and when I access them manually it takes a pretty long time to eliminate all the bugs arising from wrong indexing... Is there a suitable library that can keep track of e.g the neighbors,the numbering, if an element is in the outer edge or not and so on.
e.g.
VA=
11 12 13 14
21 22 23 24
31 32 33 34
41 42 43 44
Now what I would like to do is write a function that says something like
for every Neighbor to element at index 12(which would be 41)
do something
I would like this to only recognize the elements at index 8 (31) and 13 (42).
Right now I'm using vectors (vector<vector<int>>V;)but the code gets pretty difficult and clumsy both to write and read since I have these annoying if statements in every single function.
example:
for (int i=0;i<MatrixSIZE;i++)
if ((i+1)%rowSize!=0){//check that it's not in the outer edge.
//Do something
}
What approach would you suggest?
Can boost::MultiArray help me here in some way? Are there any other similar?
UPDATE::
So i'm looking more for a template that can easily access the elements than a template that can do matrix arithmetichs.

Try LAPACK, a linear algebra package.

There is this: http://osl.iu.edu/research/mtl/
or this: http://www.robertnz.net/nm_intro.htm
If you Google it a bit, there's quite a few matrix libraries out there for C++.

This might inspire you:
Matrix classes in c++

Is it used in a larger program ? If not, it would be more adapted to use R to deal with matrices.
If it's in a larger program, you can use a lib such as MTL.

Related

std::map in class : trade off between execution speed and memory usage

My question concerns the trade off between execution speed and memory usage when designing a class that will be instantiated thousands or millions of times and used differently in different contexts.
So I have a class that contains a bunch of numerical properties (stored in int and double). A simple example would be
class MyObject
{
public:
double property1;
double property2;
...
double property14
int property15;
int property16;
...
int property25;
MyObject();
~MyObject();
};
This class is used by different programs that instantiate
std::vector<MyObject> SetOfMyObjects;
that may contain as much as a few millions of elements. The thing is that depending on the context, some or many properties may remain unused (we do not need to compute them in this given context), implying that the memory for millions of useless int and double is allocated. As I said, the usefulness and uselessness of the properties depend on the context, and I would like to avoid writing a different class for each specific contexts.
So I was thinking about using std::maps to assign memory only for the properties I use. For example
class MyObject
{
public:
std::map<std::string, double> properties_double;
std::map<std::string, int> properties_int;
MyObject();
~MyObject();
};
such that if "property1" has to be computed, it would stored as
MyObject myobject;
myobject.properties_double["property1"] = the_value;
Obviously, I would define proper "set" and "get" methods.
I understand that accessing elements in a std::map goes as the logarithm of its size, but since the number of properties is quite small (about 25), I suppose that this should not slow down the execution of the code too much.
Am I overthinking this too much? Do you think using std::map is a good idea? Any suggestion from more seasoned programmers would be appreciated.
I don't think this is your best option, for 25 elements, you will not benefit that much from using a map in terms of lookup performance. Also, it depends on what kinds of properties are you going to have, if it is a fixed set of properties as in your example, then string lookup would be a waste of memory and CPU cycles, you could go for an enum of all properties or just an integer and use a sequential container for the properties each element has. For such a small number of possible properties, lookup time will be lower than a map because of cache friendliness and integer comparisons, and memory usage will be lower too. For such a small set of properties this solution is marginally better.
Then there is the problem that an int is usually twice as small as a double. And they are different types. So it is not directly possible to store both in a single container, but you could have enough space for a double in each element, and either use a union or just read/write an int from/to the address of the double if the property "index" is larger than 14.
So you can have something as simple as:
struct Property {
int type;
union {
int d_int;
double d_double;
};
};
class MyObject {
std::vector<Property> properties;
};
And for type 1 - 14 you read the d_double field, for type 15 - 25 the d_int field.
BENCHMARKS!!!
Out of curiosity I did some testing, creating 250k objects, each with 5 int and 5 double properties, using a vector, a map and a hash for the properties, and measured memory usage and time taken to set and get the properties, ran each test 3 times in a row to see impact on caching, calculate checksum for getters to verify consistency, and here are the results:
vector | iteration | memory usage MB | time msec | checksum
setting 0 32 54
setting 1 32 13
setting 2 32 13
getting 0 32 77 3750000
getting 1 32 77 3750000
getting 2 32 77 3750000
map | iteration | memory usage MB | time msec | checksum
setting 0 132 872
setting 1 132 800
setting 2 132 800
getting 0 132 800 3750000
getting 1 132 799 3750000
getting 2 132 799 3750000
hash | iteration | memory usage MB | time msec | checksum
setting 0 155 797
setting 1 155 702
setting 2 155 702
getting 0 155 705 3750000
getting 1 155 705 3750000
getting 2 155 706 3750000
As expected, the vector solution is by far the fastest and most efficient, although it is most influenced by cold cache, even running cold it is way faster than a map or hash implementation.
On a cold run, the vector implementation is 16.15 times faster than map and 14.75 times faster than hash. On a warm run it is even faster - 61 times faster and 54 times faster respectively.
As for memory usage, the vector solution is far more efficient as well, using over 4 times less memory than the map solution and almost 5 times less than the hash solution.
As I said, it is marginally better.
To clarify, the "cold run" is not only the first run but also the one inserting the actual values in the properties, so it is fairly illustrative of the insert operations overhead. None of the containers used preallocation so they used their default policies of expanding. As for the memory usage, it is possible it doesn't accurately reflect actual memory usage 100% accurately, since I use the entire working set for the executable, and there is usually some preallocation taking place on OS level as well, it will most likely be more conservative as the working set increases. Last but not least, the map and hash solutions are implemented using a string lookup as the OP originally intended, which is why they are so inefficient. Using integers as keys in the map and hash produces far more competitive results:
vector | iteration | memory usage MB | time msec | checksum
setting 0 32 55
setting 1 32 13
setting 2 32 13
getting 0 32 77 3750000
getting 1 32 77 3750000
getting 2 32 77 3750000
map | iteration | memory usage MB | time msec | checksum
setting 0 47 95
setting 1 47 11
setting 2 47 11
getting 0 47 12 3750000
getting 1 47 12 3750000
getting 2 47 12 3750000
hash | iteration | memory usage MB | time msec | checksum
setting 0 68 98
setting 1 68 19
setting 2 68 19
getting 0 68 21 3750000
getting 1 68 21 3750000
getting 2 68 21 3750000
Memory usage is much lower for hash and map, while still higher than vector, but in terms of performance the tables are turned, while the vector solution sill wins at inserts, at reading and writing the map solution takes the trophy. So there's the trade-off.
As for how much memory is saved compared to having all the properties as object members, by just a rough calculation, it would take about 80 MB of RAM to have 250k such objects in a sequential container. So you save like 50 MB for the vector solution and almost nothing for the hash solution. And it goes without saying - direct member access would be much faster.
TL;DR: it's not worth it.
From carpenters we get: measure twice, cut once. Apply it.
Your 25 int and double will occupy on a x86_64 processor:
14 double: 112 bytes (14 * 8)
11 int: 44 bytes (11 * 4)
for a total of 156 bytes.
A std::pair<std::string, double> will, on most implementation, consume:
24 bytes for the string
8 bytes for the double
and a node in the std::map<std::string, double> will add at least 3 pointers (1 parent, 2 children) and a red-black flag for another 24 bytes.
That's at least 56 bytes per property.
Even with a 0-overhead allocator, any time you store 3 elements or more in this map you use more than 156 bytes...
A compressed (type, property) pair will occupy:
8 bytes for the property (double is the worst case)
8 bytes for the type (you can choose a smaller type, but alignment kicks in)
for a total of 16 bytes per pair. Much better than map.
Stored in a vector, this will mean:
24 bytes of overhead for the vector
16 bytes per property
Even with a 0-overhead allocator, any time you store 9 elements or more in this vector you use more than 156 bytes.
You know the solution: split that object.
You're looking up objects by name that you know will be there. So look them up by name.
I understand that accessing elements in a std::map goes as the logarithm of its size, but since the number of properties is quite small (about 25), I suppose that this should not slow down the execution of the code too much.
You will slow down your program by more than one order of magnitude. Lookup of a map may be O(logN) but it's O(LogN) * C. C will be huge compared to direct access of properties (thousands of times slower).
implying that the memory for millions of useless int and double is allocated
A std::string is at least 24 bytes on all the implementations I can think of - assuming you keen the names of properties short (google 'short string optimisation' for details).
Unless 60% of your properties are unpopulated there is no saving using a map keyed by string at all.
With so many objects and small map object in each you may hit another problem - memory fragmentation. It could be usable to have std::vector with std::pair<key,value> in it instead and do lookup (I think binary search should be sufficient, but it depends on your situation, it could be cheaper to do linear lookup but not to sort the vector). For property key I would use enum instead of string, unless later is dictated by interface (which you did not show).
Just an idea (not compiled/tested):
struct property_type
{
enum { kind_int, kind_double } k;
union { int i; double d; };
};
enum prop : unsigned char { height, widht, };
typedef std::map< std::pair< int/*data index*/, prop/*property index*/ >, property_type > map_type;
class data_type
{
map_type m;
public:
double& get_double( int i, prop p )
{
// invariants...
return m[ std::pair<int,prop>(i,p) ].d;
}
};
Millions of ints and doubles is still only hundreds of megabytes of data. On a modern computer that may not be a huge issue.
The map route looks like it will be a waste of time but there is an alternative you could use that saves memory while retaining decent performance characteristics: store the details in a separate vector and store an index into this vector (or -1 for unassigned) in your main data type. Unfortunately, your description doesn't really indicate how the property usage actually looks but I'm going to guess you can sub-divide into properties that are always, or usually, set together and some that are needed for every node. Let's say you subdivide into four sets: A, B, C and D. The As are needed for every node whereas B, C and D are rarely set but all elements are typically modified together, then modify the struct you're storing like so:
struct myData {
int A1;
double A2;
int B_lookup = -1;
int C_lookup = -1;
int D_lookup = -1;
};
struct myData_B {
int B1;
double B2;
//etc.
};
// and for C and D
and then store 4 vectors in your main class. When a property in the Bs in accessed you add a new myData_B to the vector of Bs (actually a deque might be a better choice, retaining fast access but without the same memory fragmentation issues) and set the B_lookup value in the original myData to the index of the new myData_B. And the same for Cs and Ds.
Whether this is worth doing depends on how few of the properties you actually access and how you access them to together but you should be able to modify the idea to your tastes.

How to nicely print Buffer in OCaml?

I have a Buffer.
Question 1
How can I print out all byte inside one by one?
Question 2
How can I control the format of the printing?
For example, if I have a buffer like 33 33 33 33 33 33 14 40 (every byte is in HEX format), how can I print it as \x33\x33\x33\x33\x33\x33\x14\x40?
To apply an imperative function f to every byte in a buffer b, you can use String.iter f (Buffer.contents b).
To print a value with a desired format, you can use Printf.printf.
To get the integer value of a byte in a string you can use Char.code.
As a side comment, many of your recent questions could be answered extremely quickly by reading through the OCaml standard library documentation. I think this would be a good thing for you to do. There's not a lot of deep intellectual content, it's just something you should know about as an OCaml programmer.

n-dimensional interpolation c++ algorithm

How can I implement n-dimensional interpolation in C++? In ideal case I would like to have it generic on actual kernel so that I can switch between e.g., linear and polynomial interpolation (perhaps as a start: linear interpolation). This article ( http://pimiddy.wordpress.com/2011/01/20/n-dimensional-interpolation/ ) discusses this stuff but I have two problems:
1) I could not understand how to implement the "interpolate" method shown in the article in C++
2) More importantly I want to use it in a scenario where you have "multiple independent variables (X)" and "1 dependent variable (Y)" and somehow interpolate on both (?)
For example, if n=3 (i.e. 3-dimensional) and I have the following data:
#X1 X2 X3 Y
10 10 10 3.45
10 10 20 4.52
10 20 15 5.75
20 10 15 5.13
....
How could I know value of Y (dependent variable) for a particular combination of X (independent variables): 17 17 17
I know there exists other ways such as decision trees and SVM but here I am here interested in interpolation.
You can take a look at a set of interpolation alrogithms (including C++ implementation) at alglib.
Also it should be noted that neural networks (backpropagation nets, for example) are treated as good interpolators.
If your question is about the specific article, it's out of my knowledge.

Keeping track of boolean data

I need to keep track of n samples. The information I am keeping track of is of boolean type, i.e. something is true or false. As soon as I am on sample n+1, i basically want to ignore the oldest sample and record information about the newest one.
So say I keep track of samples, I may have something like
OLDEST 0 0 1 1 0 NEWEST
If the next sample is 1, this will become
OLDEST 0 1 1 0 1 NEWEST
if the next one is 0, this will become...
OLDEST 1 1 0 1 0 NEWEST
So what is the best way to implement this in terms of simplicity and memory?
Some ideas I had:
Vector of bool (this would require shifting elements so seems expensive)
Storing it as bits...and using bit shifting (memorywise --cheap? but is there a limit on the number of samples?)
Linked lists? (might be an overkill for the task)
Thanks for the ideas and suggestions :)
You want a set of bits. Maybe you can look into a std::bitset
http://www.sgi.com/tech/stl/bitset.html
Very straightfoward to use, optimal memory consumption and probably the best performance
The only limitation is that you need to know at compile-time the value of n. If you want to set it on runtime, have a look at boost http://www.boost.org/doc/libs/1_36_0/libs/dynamic_bitset/dynamic_bitset.html
Sounds like a perfect use of a ring buffer. Unfortunately there isn't one in the standard library, but you could use boost.
Alternately roll your own using a fixed-length std::list and splice the head node to the tail when you need to overwrite an old element.
It really depends on how many samples you want to keep.
vector<bool> could be a valid option; I would expect an
erase() on the first element to be reasonably efficient.
Otherwise, there's deque<bool>. If you know how many elements
you want to keep at compile time, bitset<N> is probably better
than either.
In any case, you'll have to wrap the standard container in some
additional logic; none have the actual logic you need (that of
a ring buffer).
If you only need 8 bits... then use a char and do logical shifts "<<, >>" and do a mask to look at the one you need.
16 Bits - short
32 Bits - int
64 Bits - long
etc...
Example:
Oldest 00110010 Newest -> Oldest 1001100101 Newest
Done by:
char c = 0x32; // 50 decimal or 00110010 in binary
c<<1; // Logical shift left once.
c++; // Add one, sense LSB is the newest.
//Now look at the 3rd newest bit
print("The 3rd newest bit is: %d\n", (c & 0x4));
Simple and EXTREMELY cheap on resources. Will be VERY VERY high performance.
From your question, it's not clear what you intend to do with the samples. If all you care about is storing the N most recent samples, you could try the following. I'll do it for "chars" and let you figure out how to optimize for "bool" should you need that.
char buffer[N];
int samples = 0;
void record_sample( char value )
{
buffer[samples%N] = value;
samples = samples + 1;
}
Once you've stored N samples (once you've called record_sample N times) you can read the oldest and newest samples like so:
char oldest_sample()
{
return buffer[samples%N];
}
char newest_sample()
{
return buffer[(samples+N-1)%N];
}
Things get a little trickier if you intend to read the oldest sample before you've already stored N samples - but not that much trickier. For that, you want a "ring buffer" which you can find in boost and on wikipedia.

Diff function on two arrays, in c++/mfc/stl?

Diff function on two arrays (or how to turn Old into New)
Example
One[]={2,3,4,5,6,7}
Two[]={1,2,3,5,5,5,9}
Example Result
Diff: insert 1 into One[0], One[]={1,2,3,4,5,6,7}
Diff: delete 4 from One[3], One[]={1,2,3,5,6,7}
Diff: modify 6 into 5 in One[4], One[]={1,2,3,5,5,7}
Diff: modify 7 into 5 in One[5], One[]={1,2,3,5,5,5}
Diff: append 9 into One[6], One[]={1,2,3,5,5,5,9}
Need code in c++/mfc/stl/c, Thanks.
What you need is a string matching algorithm, usually implemented using dynamic programming (see here).
I'd highly suggest using a library that performs the diff instead of implementing it yourself.
Though it's normally done with letters instead of integers, the usual algorithm for computing the Levenstein distance should work just as well here as where it's usually applied.
I'm diff library developer with C++.
http://code.google.com/p/dtl-cpp/
Using My diff library, it is possible to calculate the difference between two sequences.
Please see examples/intdiff.cpp about how to use.