Returning a vector of tuples in C++

I am trying to create and return a vector of two element arrays (which I will refer to as tuples), however I am running into issues.
std::vector<int *> distr;
int tuple[2];
distr.push_back(tuple);
//modify tuple's contents
distr.push_back(tuple);
In this case distr then has two copies of the modified tuple rather than the two distinct tuples I desired.
So I figured it had to do with memory so I tried this approach instead
distr.push_back(new int [num1, num2]);
But it doesn't save the tuples correctly as trying to access their values returns weird false values.
This is clearly due to a misunderstanding of how memory is allocated. I can understand why the first example fails in that fashion but I do not understand the issue with the second example.

When you use
distr.push_back(new int [num1, num2]);
You are not creating a two-element array filled with num1 and num2. That would be done like the following:
new int[2] {num1, num2}
I would advise against using this method though. If all of your tuples will be the same size, I would make a struct to represent that data type (in the special case of two elements, you can even use std::pair).

Use a pair instead of a pointer:
std::vector<std::pair<int, int> > distr;
// Do some code
distr.emplace_back(num1, num2);
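As a minimal sketch of how this could look when the vector is built and returned from a function (the function name make_distr and the sample values are assumptions for illustration, not from the question):

```cpp
#include <utility>
#include <vector>

// Hypothetical helper: builds and returns a vector of two-int "tuples".
std::vector<std::pair<int, int>> make_distr() {
    std::vector<std::pair<int, int>> distr;
    distr.emplace_back(1, 10);  // constructs the pair inside the vector
    distr.emplace_back(2, 20);  // a second, independent element
    distr[0].second = 11;       // changing one element leaves the other alone
    return distr;               // returned by value; the vector owns its pairs
}
```

Because each pair is stored by value, there is nothing to delete and no pointer that can outlive its data.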

First, you should understand that "classic" C and C++ arrays are just buffers of memory. In your sample, tuple is an array of 2 integers, and it decays to a pointer to its first element when you pass it to push_back. So each push_back adds the same pointer. The array itself is not copied into the std::vector, so you end up with a vector containing two pointers to the SAME area of memory. If tuple is a local variable, those pointers also dangle once the function returns. To achieve the desired behavior, you can use higher-level C++ data types, such as std::tuple or std::array.
Regarding your second code snippet, it is just a syntax misunderstanding: the expression new <type>[<count>] creates a memory buffer (similar to your tuple, but on the HEAP) of <count> values of type <type>. So, if you want a buffer of 2 ints, you should write new int[2]. When you write a, b inside the brackets, it is evaluated as the comma operator, so <count> will be num2 in your sample, and the ints in that buffer are left uninitialized, which is why reading them gives weird values.
P.S. Be aware that to work correctly with heap memory you should study C++ memory management in much more depth.
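A small sketch of the std::array variant, mirroring the question's code but storing values instead of pointers (the function name make_tuples and the numbers are made up for illustration):

```cpp
#include <array>
#include <vector>

// Hypothetical helper: same steps as in the question, with value semantics.
std::vector<std::array<int, 2>> make_tuples() {
    std::vector<std::array<int, 2>> distr;
    std::array<int, 2> tuple{1, 2};
    distr.push_back(tuple);  // copies both ints into the vector
    tuple[0] = 3;            // modifies only the local array
    tuple[1] = 4;
    distr.push_back(tuple);  // a second, independent copy
    return distr;
}
```

Here the first element keeps the values 1 and 2 after the local array is modified, which is the behavior the question was after.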

Related

Building a dataframe in C++

I am trying to build a DataFrame in C++. I'm facing some problems, such as dealing with variable data type.
I am thinking of a DataFrame inspired by the pandas DataFrame (from Python). So my design idea is:
1. Build an object 'Series' which is a vector of a fixed data type.
2. Build an object 'DataFrame' which will store a list of Series (this list can be variable).
The item 1. is just a regular vector. So, for instance, the user would call
Series.fill({1,2,3,4}) and it would store the vector {1,2,3,4} in some attribute of Series, say Series.data.
Problem 1. How I would make a class that understands {1,2,3,4} as a vector of 4 integers. Is it possible?
The next problem is:
About 2., I can see the DataFrame as a matrix of n columns and m rows, but the columns can have different data types.
I tried to design this as a vector of n pointers, where each pointer would point to a vector of dimension m with different data types.
I tried to do something like
vector<void*> columns(10)
and fill it with something like
columns[0] = (int*) malloc(8*sizeof(int))
But this does not work, if I try to fill the vector, like
(*columns[0])[0] = 5;
I get an error
::value_type {aka void*}’ is not a pointer-to-object type
(int *) (*a[0])[0] = 5;
How can I do it properly? I still have other questions like, how would I append an undetermined number of Series into a DataFrame, but for now, just building a matrix with columns with different data types is a great start.
I know that I must keep track of the types of pointers inside my void vector but I can create a parallel list with all data types and make this an attribute of my class DataFrame.
Building a heterogeneous container (which a dataframe is supposed to be) in C++ is more complex than you think, because C++ is statically typed. It means you have to know all the types at compile time.
Your approach uses a vector of pointers (there are a few variations of this approach, which I am not going into). This approach tends to be inefficient, because the pointers point all over memory and ruin your cache locality. I do not recommend even attempting to implement such a dataframe because there is really no point to it.
Look at this implementation of DataFrame in C++: https://github.com/hosseinmoein/DataFrame. You might be able to just use it as is. Or get insight from it how to implement a true heterogeneous DataFrame. It uses a collection of static vectors in a hash table to implement a true heterogeneous container. It also uses contiguous memory space, so it avoids the pointer effect.
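For a rough idea of what "knowing all the types at compile time" can look like, here is a minimal sketch using std::variant (C++17). It is not how the linked library is implemented; the names Column, DataFrame, add_column and column are hypothetical, and the set of supported element types is an assumption:

```cpp
#include <cstddef>
#include <initializer_list>
#include <string>
#include <variant>
#include <vector>

// One column: a vector whose element type comes from a fixed, known set.
using Column = std::variant<std::vector<int>,
                            std::vector<double>,
                            std::vector<std::string>>;

// Hypothetical minimal DataFrame: a growable list of typed columns.
class DataFrame {
public:
    // Accepts a braced list such as {1, 2, 3, 4}; T is deduced from it.
    template <typename T>
    void add_column(std::initializer_list<T> values) {
        columns_.emplace_back(std::vector<T>(values));
    }

    // Typed access; throws std::bad_variant_access if T is the wrong type.
    template <typename T>
    std::vector<T>& column(std::size_t index) {
        return std::get<std::vector<T>>(columns_.at(index));
    }

    std::size_t column_count() const { return columns_.size(); }

private:
    std::vector<Column> columns_;
};
```

With this sketch, df.add_column({1, 2, 3, 4}) stores a column of four ints and df.column<int>(0)[0] = 5 writes to it without any cast. Each column keeps its data contiguous, and the variant remembers the element type, so no parallel list of types is needed.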
TL;DR Version
Discard what you are doing.
Use vector<vector<int>> columns;. When you need a column, use columns[index].data() to get a pointer to the backing array from the indexed inner vector and pass that int * to whatever required the void *. The int * will be implicitly converted.
Explanation
Quoting cppreference
void - type with an empty set of values. It is an incomplete type that cannot be completed (consequently, objects of type void are disallowed). There are no arrays of void, nor references to void. However, pointers to void and functions returning type void (procedures in other languages) are permitted.
Since void is incomplete, you can't have a void. void* needs to be cast back to the actual data type, int*, before it can be used for anything other than passing the anonymously typed pointer around. All receivers of the void * have to know what it really is to do anything with it other than pass it on.
Functions that require void * parameters will take any pointer you give them without any further effort on your part, so there is almost no need to make void * variables in C++. Almost all cases where you would need a void * are filled in with polymorphism or templates. The last time I used a void * in C++ was back when I wrote C++ as C with classes bolted on.
The Error
Given
vector<void*> columns(10);
where each element will contain an array of ints, let's work through
(*columns[0])[0] = 5;
step by step to see what types we have and make sure the types at each step are consistent.
columns[0]
Gets the first element in the vector, a void*. So far so good.
*columns[0]
dereferences the void* at columns[0]. As covered in the preamble, this cannot be done. You cannot dereference a void* because that would give you a value of type void. This produces the reported "::value_type {aka void*}’ is not a pointer-to-object type" error message.
We could
*reinterpret_cast<int*>(columns[0])
to turn it into a pointer to int, something we can dereference and matches the initial type, and receive an int, specifically the first int in the array.
(*reinterpret_cast<int*>(columns[0]))[0]
will fail because you can't index an int. That would be like writing 42[0]. This means the dereference is unnecessary.
The end result needs to look like
reinterpret_cast<int*>(columns[0])[0]
But don't do this. It is unnecessary and grossly over-complicated.
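A minimal sketch of the typed alternative from the TL;DR, assuming a hypothetical C-style function first_value that only accepts an untyped pointer (both function names are made up for illustration):

```cpp
#include <vector>

// Hypothetical consumer that only takes an untyped pointer.
int first_value(const void* data) {
    // The receiver has to know the real element type to use the pointer.
    return static_cast<const int*>(data)[0];
}

// Typed storage: each inner vector is one column of ints.
int demo_typed_columns() {
    std::vector<std::vector<int>> columns(10, std::vector<int>(8));
    columns[0][0] = 5;  // plain indexing, no cast needed
    // data() yields an int*, which converts to const void* implicitly.
    return first_value(columns[0].data());
}
```

The only cast left is inside the function that insisted on a void pointer in the first place; the storage itself stays fully typed.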

array of pointers and pointer to an array in c++

I have a class in whose protected section I need to declare an array of unknown size (the size is given to the constructor as a parameter). I looked around and found that the best possible solution is to declare a pointer to int:
int* some_array_;
and then simply use the "new" operator in the constructor:
some_array_ = new int[size];
This worked. My question is: can I declare an array in a class without defining the size? If yes, how do I do it? If not, why does it work for pointers and not for a normal array?
EDIT: I know vectors would solve the problem, but I can't use them in my homework.
You have to think about how this works from the compiler's perspective. A pointer uses a fixed amount of space (typically 4 or 8 bytes, depending on the platform) and you request more space with the new operator. But how much space would an array of unknown size use? The compiler has to know the size of every member to lay out the class, and it has no way of knowing how much to allocate for an array whose size is only given at run time, so it is not allowed.
You could always use a vector. To do this, add this line of code: #include <vector> at the top of your code, and then define the vector as follows:
vector<int> vectorName;
Keep in mind that vectors are not arrays: they manage their own memory and can grow. In a loop, you can retrieve an element with vectorName[index], just like with an array, or with vectorName.at(index), which additionally checks the index and throws std::out_of_range if it is out of bounds.
Let's say that you have an integer array of size 2, so its valid indices are 0 and 1.
Arrays are contiguous blocks of memory, so if you declare one and then want to add one or more elements to the end of it, the very next position (in this case index 2, the 3rd integer) has a high chance of already being allocated to something else, and in that case you just can't do it. A solution is to create a new array (in this case of 3 elements), copy the initial array into the new one, and put the new integer in the last position. Obviously this has a high cost, so we don't want to do it by hand for every insertion.
The standard solution to this problem is std::vector in C++ and ArrayList in Java.
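Since the question rules out std::vector, here is a minimal sketch of a class that owns a run-time-sized array through a pointer. The class name IntBuffer and the member size_ are assumptions for illustration; some_array_ follows the question:

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>

// Hypothetical class owning a heap array whose size is chosen at run time.
class IntBuffer {
public:
    explicit IntBuffer(std::size_t size)
        : size_(size), some_array_(new int[size]()) {}  // () zero-initializes

    // Deep copy, so two objects never share (and double-delete) one array.
    IntBuffer(const IntBuffer& other)
        : size_(other.size_), some_array_(new int[other.size_]) {
        std::copy(other.some_array_, other.some_array_ + size_, some_array_);
    }

    // Copy-and-swap: 'other' is already a copy; take over its array.
    IntBuffer& operator=(IntBuffer other) {
        std::swap(size_, other.size_);
        std::swap(some_array_, other.some_array_);
        return *this;
    }

    ~IntBuffer() { delete[] some_array_; }  // new[] must be paired with delete[]

    int& operator[](std::size_t i) { return some_array_[i]; }
    std::size_t size() const { return size_; }

protected:
    std::size_t size_;
    int* some_array_;
};
```

The pointer member has a fixed size, so the class layout is known at compile time, while the array it points to is sized when the constructor runs. The copy constructor, assignment operator and destructor are what std::vector would otherwise provide for free.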

Difference between <type*[n]> and <type(*)[n]> in C++

I wanted to create a queue to store two dimensional arrays of chars and I thought that declaring it in the following way would work:
queue<char*[7]> states;
However, it turned out that the right way was:
queue<char(*)[7]> states;
I can't really understand what the round brackets change. I guess it has something to do with precedence, but nothing more specific.
char*[7] is an array of seven pointers to char, char(*)[7] is a pointer to an array of seven chars. Often it's used to allocate dynamically contiguous multidimensional arrays (see here).
The C++ FAQ about arrays may give you some insight about these subtleties.
An easy way to remember the meaning of char*[7] is that that's the form of the second argument to main.
I.e. it means an array of pointers.
Then char(*)[7] is easiest to analyze by introducing a name, like char(*p)[7]. Since C declarations were designed to mimic use of the declared things, this means that you can dereference p, and index the result, then yielding a char. I.e. p is a pointer to an array of char.
char*[7] is an array of pointers to char.
char(*)[7] is a pointer referencing an array of char.
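The difference can be checked at compile time. The following is a small sketch; the variable names board, names and row and the helper set_and_get are made up for illustration:

```cpp
#include <type_traits>

char board[3][7];         // three rows of seven chars each

char* names[7];           // array of seven pointers to char
char (*row)[7] = board;   // pointer to an array of seven chars

static_assert(std::is_array<decltype(names)>::value, "names is an array");
static_assert(std::is_pointer<decltype(row)>::value, "row is a pointer");
static_assert(sizeof(*row) == 7, "row points at a whole 7-char array");

// Hypothetical helper: writes through the pointer-to-array.
char set_and_get() {
    row[1][2] = 'x';      // the same element as board[1][2]
    return board[1][2];
}
```

Arrays cannot be copied or assigned as whole objects, which is presumably why an array type does not work as a queue element while a pointer to an array does.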

Is it okay to use constructors to initialize a 2D Vector as a one-liner in C++?

Is it okay to initialize a 2D vector like this (here all values in a 5x4 2D vector are initialized to 3)?
std::vector<std::vector<int> > foo(5, std::vector<int>(4, 3));
This seems to behave okay, but everywhere I look on the web people seem to recommend initializing such a vector with for loops and push_back(). I was initially afraid that all rows here would point to the same vector, but that doesn't seem to be the case. Am I missing something?
This is perfectly valid - You'll get a 2D vector ([5, 4] elements) with every element initialized to 3.
For most other cases (where you e.g. want different values in different elements) you cannot use any one-liner - and therefore need loops.
Well, the code is valid and it indeed does what you want it to do (assuming I understood your intent correctly).
However, doing it that way can be inefficient (at least in the current version of the language/library). The above initialization creates a temporary vector and then initializes the individual sub-vectors by copying it. For this reason, in many cases it is preferable to construct the top-level vector by itself
std::vector<std::vector<int> > foo(5);
and then iterate over it and build its individual sub-vectors in-place by doing something like
foo[i].resize(4, 3);
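Put together, the loop version might look like the sketch below. The function name make_grid and its parameters are assumptions for illustration:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: builds a rows x cols grid filled with 'value'.
std::vector<std::vector<int>> make_grid(std::size_t rows, std::size_t cols,
                                        int value) {
    std::vector<std::vector<int>> foo(rows);  // 'rows' empty inner vectors
    for (auto& row : foo) {
        row.resize(cols, value);              // grow each row in place
    }
    return foo;
}
```

Either way, every row is its own vector, so writing to one row never affects another.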

Passing multidimensional array back through access members

I have a class "foo" that has a multidimensional array, and I need to provide a copy of the array through a getArray member. Is there a nice way of doing this when the array is dynamically created? I cannot pass the array back as const, because the array is always being deleted, recreated etc. I thought about creating a new dynamic array to pass back, but is this acceptable, given that the calling code would need to know to delete it?
Return an object, not a naked array. The object can have a copy constructor, destructor etc. which will do the copying, deletion etc. for the user.
class Matrix {
// handle creation and access to your multidim array
// including copying, deletion etc.
};
class A { // your class
    Matrix m; // the class's matrix
public:
    Matrix getArray() const {
        return m; // returns a copy of the matrix
    }
};
The easy answer to your question is no, this is not a good design, as it should be the creating class that handles the deletion/release of the array.
The main point is why do you keep deleting/recreating this multi dimensional array? Can you not create one instance, and then just modify when need be?
Personally I would return the array as it is, and iterate over it and do any calculations/functions on it during the loop therefore saving resources by not creating/deleting the array.
Neil's probably the best answer. The second best will be not to use an array. In C++, when you talk about dynamic array, it means vector.
There are two possibilities:
nested vectors: std::vector<std::vector<int> >(10, std::vector<int>(20))
simple vector: std::vector<int>(200)
Both will have 200 items. The first is clearly multi-dimensional, while the second leaves you the task of computing offsets.
The second asks for more work but is more efficient memory-wise, since a single big chunk is allocated instead of one small chunk pointing to ten medium ones.
But as Neil said, a proper class of your own to define the exact set of operations is better :)
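A minimal sketch combining both suggestions: a small class of your own that wraps a single flat vector and computes the offsets itself. The names Matrix, Foo, at and matrix, and the 10 x 20 dimensions, are assumptions for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical wrapper: one contiguous block, indexed as (row, column).
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols) {}

    int& at(std::size_t r, std::size_t c) {
        assert(r < rows_ && c < cols_);  // debug-only bounds check
        return data_[r * cols_ + c];     // row-major offset
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_, cols_;
    std::vector<int> data_;  // the vector handles copying and deletion
};

class Foo {
public:
    Foo() : m_(10, 20) {}
    Matrix getArray() const { return m_; }  // hands out an independent copy
    Matrix& matrix() { return m_; }         // internal access for this sketch

private:
    Matrix m_;
};
```

The caller receives a copy that it never has to delete, and later changes inside Foo do not affect a copy that was already handed out.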