What data structure is preferred instead of manipulating multiple vectors - c++

I have implemented a class that makes computations on images. The processing is done on a subset of the given images (let's say 100 out of 1000) at a time, and each image takes a different number of iterations to finish. The processing uses GPUs and therefore it is not possible to use all the images at once. When the processing of an image is finished, that image is removed and another one is added. So I am using three different vectors, image_outcome, image_index and image_operation, to keep information about the images:
The image_outcome is a std::vector<float>; each of its elements is a value used as a criterion to decide when an image is finished.
The image_index is a std::vector<int> that holds the index of the image in the original dataset.
The image_operation is a std::vector<MyEnumValue> that holds the operation used to update image_outcome. It is of an enum type and its value is one of many possible operations.
There are also two functions, one to remove the finished images and one to add as many images as were removed (if there are still enough in the input).
The remove_images() function takes all three vectors and the image matrix and removes the elements using std::vector::erase().
The add_images() function again takes the three vectors and the image matrix, adds new images, and appends the relevant information to the vectors.
Because I call erase() on each vector with the same index (and add in a similar way), I was thinking to:
Use a private struct that holds the three vectors (nested struct).
Use a private class that is implemented using the three vectors (nested class).
Use a different data structure other than std::vector.
A high-level example of the code can be found below:
class ComputationClass {
public:
    // the constructor initializes the member variables
    ComputationClass();

    void computation_algorithm(std::vector<cv::Mat> images);

private:
    // member variables which define the algorithm's parameters
    // add_images() and remove_images() take more than these
    // arguments, but only the relevant ones are shown here
    void add_images(std::vector<float>&, std::vector<int>&, std::vector<MyEnumValue>&);
    void remove_images(std::vector<float>&, std::vector<int>&, std::vector<MyEnumValue>&);
};
void ComputationClass::computation_algorithm(std::vector<cv::Mat> images) {
    std::vector<float> image_output;
    std::vector<int> image_index;
    std::vector<MyEnumValue> image_operation;

    add_images(image_output, image_index, image_operation);
    while (there_are_still_images_to_process) {
        // make computations by updating the image_output vector
        // check which images finished computing
        remove_images(image_output, image_index, image_operation);
        add_images(image_output, image_index, image_operation);
    }
}

I think, instead of a struct with 3 vectors, a single vector of user-defined objects would work better.
class MyImage {
public:
    Image OImage;            // the actual image
    float fOutcome;
    int dIndex;
    MyEnumValue eOperation;

    bool getIsDone() const {
        return fOutcome > 0; // random condition
    }
};

std::vector<MyImage> images;
You can add to the vector, or erase from it with a condition, taking care to use the iterator returned by erase():
for (auto it = VMyVector.begin(); it != VMyVector.end(); ) {
    if (it->getIsDone())
        it = VMyVector.erase(it);   // erase() returns the next valid iterator
    else
        ++it;
}
In my opinion, maintaining 3 parallel vectors makes it easy to make mistakes and hard to modify the code.
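As a side note, a minimal sketch of the same cleanup with the erase-remove idiom (assuming the MyImage class above), which drops all finished images in a single pass instead of erasing them one at a time:

#include <algorithm>

// std::remove_if shifts the unfinished images to the front and returns an
// iterator to the new logical end; erase() then trims the leftover tail.
images.erase(std::remove_if(images.begin(), images.end(),
                            [](const MyImage& img) { return img.getIsDone(); }),
             images.end());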

Related

Graph with std::vectors?

I thought that a cool way of using vectors could be to have one vector hold two separate int variables for the x/y-coordinates of a graph.
example:
std::vector<int, int> name;
// First int being the x-coordinate on a graph
// Second int being the y-coordinate on a graph
(I also understand that I could just make every even/odd location hold x or y, or use two separate vectors for the x/y-coordinates, but I would like to see if this could work.)
However, after making this vector type, I came across the issue of selecting which int within the vector will be written to or read from. Could anyone tell me how to best select and std::cout both x/y ints appropriately?
P.S. - My main goal in using vectors this way is to make a very basic graph output in the Visual Studio terminal, while being able to change individual x/y-coordinates by 'selecting' and changing them if needed. These coordinates will be output to the terminal via for/while loops.
Also, would anyone like to list different ways to best make x/y-coordinates with different containers?
Your question is rather broad; in other words, it is asking for a bit too much. I will just try to give you some pointers from which you can work your way to what you like.
A) equidistant x
If your x values are equidistant, i.e. 0, 0.5, 1, 1.5, then there is no need to store them. Simply use a
std::vector<int> y;
if the number of values is not known at compile time, or otherwise a
std::array<int, N> y;
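A minimal sketch of the equidistant case, with the start value and step made up for illustration; x is recovered from the index, so only y is stored:

#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> y = {0, 1, 4, 9};   // y values only
    const double x0 = 0.0, step = 0.5;   // assumed grid parameters
    for (std::size_t i = 0; i < y.size(); ++i) {
        std::cout << x0 + step * i << ":" << y[i] << '\n';
    }
}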
B) arbitrary x
There are several options, depending on what you actually want to do. For simply storing (x,y)-pairs and printing them on screen, they all work equally well.
map
std::map<int, int> map_x_to_y = { {1, 1}, {2, 4}, {3, 9} };
// print on screen
for (const auto& xy : map_x_to_y) {
    std::cout << xy.first << ":" << xy.second << '\n';
}
a vector of pairs
std::vector<std::pair<int,int>> vector_x_and_y = { {1, 1}, {2, 4}, {3, 9} };
Printing on screen is the same as with the map. The advantage of the map is that it keeps its elements ordered by key, which is not the case for the vector.
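For completeness, the same kind of range-based loop prints the vector of pairs:

for (const auto& xy : vector_x_and_y) {
    std::cout << xy.first << ":" << xy.second << '\n';
}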
C) not using any container
For lightweight calculations you can consider not storing the (x,y) pairs at all, but simply using a function:
int fun(int x) { return x * x; }
TL;DR / more focused
A vector stores one type. You cannot have a std::vector<int, int>. If you look at the documentation of std::vector you will find that the second template parameter is an allocator (something you probably don't have to care about for some time). If you want to store two values as one element in a vector, you either have to use a std::vector<std::pair<double,double>> or a different container.
PS
I used std::pair in the examples above. However, I consider it good practice to name things whenever I can, and to leave std::pair for cases where I simply cannot find names better than first and second. In this spirit you can replace std::pair in the above examples with a
struct data_point {
    int x;
    int y;
};
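A minimal sketch using the named struct (reusing data_point from just above, with the same data as the earlier examples):

std::vector<data_point> points = { {1, 1}, {2, 4}, {3, 9} };
for (const auto& p : points) {
    // p.x and p.y read much better than p.first and p.second
    std::cout << p.x << ":" << p.y << '\n';
}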

How to efficiently pass a subvector of std::vector<cv::Point3f> as a parameter (I don't own the function)

I am trying to use an OpenCV function that accepts a std::vector<cv::Point3f> among other parameters. In my program, I have a std::vector<cv::Point3f> worldPoints and another std::vector<int> mask, both of larger dimension than what I want to send.
What I want to do is pass to the OpenCV function only the entries that have a corresponding non-zero mask value, as efficiently as possible.
std::vector<cv::Point3f> worldPointsSubset;
for (int i = 0; i < mask.size(); i++) {
    if (mask[i] != 0) {
        worldPointsSubset.push_back(worldPoints[i]);
    }
}
// Then use worldPointsSubset in the function
Is there any other way around this, possibly involving no copying of data?
EDIT 1: The function I am referring to is solvePnPRansac().
The function that you call requires a vector of Point3f, so if the only thing you have is a masked vector, then you have to copy the data first. There is no way around this if the function doesn't accept a vector together with its mask.
To see if this copy is an issue, first measure the drop in performance and check whether the copy is a bottleneck. If it is, the first thing to do is count the number of points you need and reserve that capacity in worldPointsSubset.
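A minimal sketch of that suggestion; maskedSubset is a hypothetical helper name, and worldPoints and mask are the vectors from the question:

#include <algorithm>
#include <vector>
#include <opencv2/core.hpp>

std::vector<cv::Point3f> maskedSubset(const std::vector<cv::Point3f>& worldPoints,
                                      const std::vector<int>& mask) {
    std::vector<cv::Point3f> subset;
    // count the surviving points first so the subset allocates only once
    subset.reserve(std::count_if(mask.begin(), mask.end(),
                                 [](int m) { return m != 0; }));
    for (std::size_t i = 0; i < mask.size(); ++i) {
        if (mask[i] != 0) {
            subset.push_back(worldPoints[i]);
        }
    }
    return subset;
}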
There is no way to reinterpret the data of a std::vector<int> as a std::vector<cv::Point3f> without a copy, because even though the values look related, the element size and layout of the data are different.
But you can change the type of data you work with up front, keep a std::vector<cv::Point3f> directly instead of rebuilding it from the mask, work with cv::Point3f throughout, and pass it to solvePnPRansac() when needed.

An object containing a vector referring to another vector's content

My problem is hard to explain, so I'll take the scenario itself as an example:
I have a templated Matrix class, which uses a std::vector as storage.
What I'm looking for is a "row" or "block" method, capable of returning another Matrix with a smaller size, but referring to its parent.
With this piece of code:
Matrix<float> mat(2, 2);
// Filling the matrix
Matrix<float> row = mat.row(0); // returns a 1x2 matrix (row vector)
row[1] = 10; // Here I modify the row, which reflects the modifications in mat
std::cout << mat(0, 1); // prints 10
I have been thinking about multiple solutions but all of them have some non-negligible downsides.
Do you have any ideas about how to achieve this?
Thanks!
EDIT 1:
I forgot to mention: the behavior should be recursive, e.g. getting a block of another block, etc.
Even when implemented correctly, I'd argue that this behavior is counter-intuitive.
Make a separate MatrixRef class that acts as a reference to a (subset of a) Matrix. This should also make the implementation fairly straightforward.
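A minimal sketch of that idea, assuming the Matrix<T> from the question exposes an element accessor operator()(row, col); all names here are illustrative:

#include <cstddef>

template <typename T>
class MatrixRef {
public:
    MatrixRef(Matrix<T>& parent,
              std::size_t rowOff, std::size_t colOff,
              std::size_t rows, std::size_t cols)
        : parent_(&parent), rowOff_(rowOff), colOff_(colOff),
          rows_(rows), cols_(cols) {}

    // reads and writes go straight through to the parent's storage
    T& operator()(std::size_t r, std::size_t c) {
        return (*parent_)(rowOff_ + r, colOff_ + c);
    }

    // a block of a block is just another view with adjusted offsets,
    // which gives the recursive behavior asked for in the edit
    MatrixRef block(std::size_t r, std::size_t c,
                    std::size_t rows, std::size_t cols) {
        return MatrixRef(*parent_, rowOff_ + r, colOff_ + c, rows, cols);
    }

private:
    Matrix<T>* parent_;
    std::size_t rowOff_, colOff_, rows_, cols_;
};

With this, mat.row(0) could return MatrixRef<float>(mat, 0, 0, 1, 2), and writing through the view modifies mat directly.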

Creating "Item" definitions

I am trying to think of a good way to define base items for a small game I am creating in C++ using DirectX. Right now the structure looks something like:
struct itemdef {
    string name;
    ID3DXMesh* mesh;
    vector<IDirect3DTexture9*> textures;
    vector<short> abilities;
};
The problem I am having is that I essentially want to make an array of these base definitions without imposing an upper limit on the number of textures or abilities.
Essentially, imagine it as having another array of itemdef:
vector<itemdef> itemDefs;
and then wanting to add items to this array, either hardcoded or from a file:
itemDefs.push_back(NewItem("Wall", Assets.GetMesh(Mesh_Wall), ???, ???));
Basically, I have no idea how to pass multiple items as a single argument in that call. The second problem is needing to build two such lists within one set of arguments.
So my question is: what should replace the "???" fields in the statement above? Or, failing that, what better method should I use to store these basic definitions?
(For clarity: pointers for textures are obtained in virtually the same way as the mesh above, and the abilities are just short ints. It should be noted, though, that for these definitions both meshes and textures could be replaced with their enumerations rather than pointers.)
I would build up the vectors first as local variables and then pass them to the item constructor:
vector<IDirect3DTexture9*> tex;
vector<short> abil;
// fill them with the data you need
itemDefs.push_back(NewItem("Wall", Assets.GetMesh(Mesh_Wall), tex, abil));

itemdef NewItem(const string& name, ID3DXMesh* mesh,
                const vector<IDirect3DTexture9*>& textures,
                const vector<short>& abilities)
{
    itemdef retval;
    retval.name = name;
    retval.mesh = mesh;
    retval.textures = textures;
    retval.abilities = abilities;
    return retval;
}
Alternatively, you could write a function which builds and returns the vectors.
PS: Why do you use a global NewItem function instead of a constructor?
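As a follow-up to the PS: with C++11 aggregate initialization the helper can be skipped entirely, since itemdef has no constructors. A hedged sketch (the texture pointer and ability ids below are made-up placeholders):

itemDefs.push_back(itemdef{
    "Wall",
    Assets.GetMesh(Mesh_Wall),
    { wallTexture },   // hypothetical IDirect3DTexture9* obtained elsewhere
    { 1, 2 }           // hypothetical ability ids
});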

Simulation design - flow of data, coupling

I am writing a simulation and need some hints on the design. The basic idea is that data for the given stochastic processes is generated and later consumed by various calculations. For example, for one iteration:
Process 1 -> generates data for source 1: x1
Process 2 -> generates data for source 2: x2
and so on
Later I want to apply some transformations, for example on the output of source 2, which results in x2a, x2b, x2c. So in the end I end up with the following vector: [x1, x2a, x2b, x2c].
I have a problem, though: for N-multivariate stochastic processes (representing, for example, multiple correlated phenomena) I have to generate the N-dimensional sample at once:
Process 1 -> generates data for sources 1...N: x1...xN
I am thinking about a simple architecture that would allow me to structure the simulation code and provide flexibility without hindering performance.
I was thinking of something along these lines (pseudocode):
class random_process
{
// concrete processes would generate and store last data
virtual data_ptr operator()() const = 0;
};
class source_proxy
{
container_type<process> processes;
container_type<data_ptr> data; // pointers to the process data storage
data operator[](size_type number) const { return *(data[number]);}
void next() const {/* update the processes */}
};
Somehow I am not convinced about this design. For example, if I'd like to work with vectors of samples instead of a single iteration, then the above design would have to change (I could, for example, have the processes fill submatrices of a proxy matrix passed to them with data, but again I am not sure if this is a good idea; if yes, it would also fit the single-iteration case nicely). Any comments, suggestions and criticism are welcome.
EDIT:
A short summary of the text above to highlight the key points and clarify the situation:
random_processes contain the logic to generate some data. For example, one can draw samples from a multivariate Gaussian with the given means and correlation matrix, using for example a Cholesky decomposition; as a result I'll get a set of samples [x1 x2 ... xN].
I can have multiple random_processes, with different dimensionality and parameters.
I want to do some transformations on individual elements generated by the random_processes.
Here is the dataflow diagram
random_processes                         output

     x1 ------------------------------>  x1
                               |------>   x2a
p1   x2 --------- transform --|------>   x2b
                               |------>   x2c
     x3 ------------------------------>  x3

p2   y1 --------- transform --|------>   y1a
                               |------>   y1b
The output is being used to do some calculations.
When I read this, "the answer" doesn't materialize in my mind, but instead a question:
(This problem is part of a class of problems that various tool vendors in the market have created configurable solutions for.)
Do you "have to" write this or can you invest in tried and proven technology to make your life easier?
In my job at Microsoft I work with high performance computing vendors - several of which have math libraries. Folks at these companies would come much closer to understanding the question than I do. :)
Cheers,
Greg Oliver [MSFT]
I'll take a stab at this; perhaps I'm missing something, but it sounds like we have a list of processes 1...N that don't take any arguments and return a data_ptr. So why not store them in a vector (or an array, if the number is known at compile time) and then structure them in whatever way makes sense? You can get really far with the STL and the built-in containers (std::vector), function objects (std::tr1::function) and algorithms (std::transform). You didn't say much about the higher-level structure, so I'm assuming a really naive one here, but clearly you would build the data flow appropriately. It gets even easier if you have a compiler with support for C++0x lambdas, because you can nest the transformations more easily.
// compiled in the SO textbox...
#include <algorithm>
#include <functional>   // std::tr1::function (GCC: <tr1/functional>)
#include <vector>

typedef int data_ptr;

class Generator {
public:
    data_ptr operator()() {
        // randomly generate input
        return 42 * 4;
    }
};

class StochasticTransformation {
public:
    data_ptr operator()(data_ptr in) {
        // apply a randomly seeded function
        return in * 4;
    }
};

int main() {
    // array of processes; wrap this in a class if you like, but it sounds
    // like there is a distinction between generators that create data
    // and transformations
    std::vector<std::tr1::function<data_ptr(void)> > generators;

    // fill up the process vector with functors...
    generators.push_back(Generator());

    // transformations look like this (right?)
    std::vector<std::tr1::function<data_ptr(data_ptr)> > transformations;

    // so let's add one
    transformations.push_back(StochasticTransformation());

    // and we have an array of results...
    std::vector<data_ptr> results;

    // and we need some inputs
    const int NUMBER = 10;
    for (int i = 0; i < NUMBER; ++i)
        results.push_back(generators[0]());

    // and now start transforming them using transform...
    // pick a random one or do them all...
    std::transform(results.begin(), results.end(),
                   results.begin(), transformations[0]);
}
I think that the second option (the one mentioned in the last paragraph) makes more sense. In the one you presented you are playing with pointers and indirect access to the random process data. The other one would store all the data (either a vector or a matrix) in one place, the source_proxy object. The random process objects are then called with a submatrix to populate as a parameter, and they do not store any data themselves. The proxy manages everything, from providing the source data (for any distinct source) to requesting new data from the generators.
So, changing your snippet a bit, we could end up with something like this:
class random_process
{
    // concrete processes generate data into the submatrix they are given,
    // but do not store it themselves
    virtual void operator()(submatrix&) = 0;
};

class source_proxy
{
    container_type<random_process> processes;
    matrix data;
    data operator[](size_type source_number) const { /* return a column of data */ }
    void next() { /* get new data from the random processes */ }
};
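For concreteness, a minimal sketch of how next() might drive the processes under this design; the matrix is just a vector of columns here, and all names are illustrative rather than part of the original pseudocode:

#include <cstddef>
#include <memory>
#include <vector>

// One column of the proxy's storage; a process writes its sample here.
using column = std::vector<double>;

struct random_process {
    virtual ~random_process() = default;
    // fill the given columns with one fresh (possibly multivariate) sample
    virtual void operator()(std::vector<column*>& block) = 0;
};

class source_proxy {
public:
    const column& operator[](std::size_t source) const { return data_[source]; }

    void next() {
        std::size_t first = 0;
        for (std::size_t k = 0; k < processes_.size(); ++k) {
            // hand process k the columns it owns; an N-dimensional process
            // gets N adjacent columns to fill in a single call
            std::vector<column*> block;
            for (std::size_t i = 0; i < dims_[k]; ++i)
                block.push_back(&data_[first + i]);
            (*processes_[k])(block);
            first += dims_[k];
        }
    }

private:
    std::vector<std::unique_ptr<random_process>> processes_;
    std::vector<std::size_t> dims_;  // dimensionality of each process
    std::vector<column> data_;       // one column per source
};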
But I agree with the other comment (Greg's) that it is a difficult problem, and depending on the final application it may require some heavy thinking. It's easy to run into a dead end and end up rewriting lots of code...