I previously read about a structure or class in C++11 that lets you store two values as one. What is it called?
For example, I want to store 1 and 2 together. I know it's easy to implement, but if one already exists, why implement my own? :)
The traditional way of storing a number of values without naming them all is to use an array:
int arr[2] = {1, 2};
C++ being what it is, you have lots of other ways to do that. You could achieve something similar with std::array, std::vector (or other STL containers), std::tuple, std::pair, new int[2], struct {int x, y;} elms;, or do something crazy like storing two 32-bit values in a single 64-bit integer. These suit different use cases, depending on things like whether the values you're trying to store have the same type, whether you know at compile time how many of them you want to store, whether the size is fixed, whether you have a few of them or many, whether you're trying to interface with C APIs, and so on. I suggest you have a look at our C++ Book Guide and List.
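As a rough illustration of three of the options listed above, here is a minimal C++11 sketch. The helper names make_two, make_three, and make_array are hypothetical and exist only for this example.

```cpp
#include <array>
#include <tuple>
#include <utility>

// Two values (possibly of different types), accessed as .first and .second.
inline std::pair<int, int> make_two() {
    return std::make_pair(1, 2);
}

// Any fixed number of values of possibly different types,
// accessed with std::get<index>(t).
inline std::tuple<int, double, char> make_three() {
    return std::make_tuple(1, 2.5, 'c');
}

// A fixed number of values of the same type, size known at compile time.
inline std::array<int, 2> make_array() {
    return {{1, 2}};
}
```

For exactly two values, std::pair is probably the class the question is after; std::tuple generalizes it to more than two.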
I most often work with python and therefore I'm used to being able to put a bool and integer in a single list. I realize that C++ has a different paradigm, however, I imagine that there is a workaround to this issue. Ideally I want a vector that could contain data that looks like {1, 7, true, 8, false, true, 9}. So this vector would have to be defined with syntax like (vector int bool intBoolsVec), however, I realize that isn't proper syntax.
I see that some people suggest using std::variant, which was introduced in C++17. Is this the best solution? It seems like this would be a common problem if C++ doesn't easily allow you to work with heterogeneous containers, even if those containers are constrained to a couple of defined types, like a vector that only takes ints and bools.
What is the easiest way to create a vector that contains both integers and booleans in C++? If someone could also provide me more insight on why C++ doesn't have an easy/obvious way to do this, that might help me better understand C++ as well.
My approach would probably be to create my own class that does exactly what I want it to do. This might be your easiest solution, besides using std::any. You could also combine the two by creating a custom array of std::any that only allows integers and booleans at certain entries for your example. This would be similar, but not equal, to an array of std::variant. Also, in C++ you can store two values of different types in a std::pair, if that fits your use case.
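For comparison, the std::variant route mentioned in the question could look roughly like the following C++17 sketch. The names IntOrBool, Summary, and summarize are made up for this example; they are not part of any library.

```cpp
#include <cstddef>
#include <variant>
#include <vector>

// Each element holds either an int or a bool (requires C++17).
using IntOrBool = std::variant<int, bool>;

// Hypothetical result type: sum of the ints, number of `true` values.
struct Summary {
    int int_sum = 0;
    std::size_t true_count = 0;
};

// Walks a mixed vector and inspects which alternative each element holds.
inline Summary summarize(const std::vector<IntOrBool>& values) {
    Summary s;
    for (const auto& v : values) {
        if (const int* i = std::get_if<int>(&v)) {
            s.int_sum += *i;
        } else if (const bool* b = std::get_if<bool>(&v)) {
            if (*b) ++s.true_count;
        }
    }
    return s;
}
```

A vector such as std::vector<IntOrBool>{1, 7, true, 8, false, true, 9} can then be passed to summarize. Unlike std::any, the set of allowed types is fixed at compile time, so storing anything other than an int or a bool is a compile error.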
The description of the object I have
I have several N-dimensional containers in my code, representing tensors, whose types are defined as
std::vector<std::vector<std::vector<...<double>...>>>
These types of data structures occur in several different sizes and dimensions, and they only contain scalar numbers. The number of dimensions is known for every vector and can be accessed as e.g. tensor::dimension. Since they represent tensors, they're never "irregular": at the bottom level, vectors always contain the same number of elements, like this:
// THIS IS WHAT THEY ALWAYS LOOK LIKE
T tensor = {{1,2,3,4}, {1,2,3,4}, {1,2,3,4}};
// THIS IS WHAT NEVER HAPPENS
T tensor = {{1,2,3}, {1,2,3,4}, {1,2}};
What I want to do with this object
I want to save each of these multidimensional vectors (tensors basically) into different files, which I can then easily load/read eg. in Python into a numpy.array - for further analysis and visualization. How can I achieve this to save any of these N-dimensional std::vectors in modern C++ without explicitly defining a basic write-to-txt function with N nested loops for each vector with different dimensions?
(Note: Solutions/advice that require/mention only standard libraries are preferred, but I'm happy to hear any other answers too!)
The only way to iterate over something in C++ is a loop, in some shape, manner, or form. So no matter what, you're going to have loops. There are no workarounds or alternatives, but that doesn't mean you have to write all these loops yourself, one at a time. This is why we have templates in C++. What you are looking for is a recursive template that peels away one dimension at a time until it reaches the last one, which is implemented for real, basically letting your compiler write every loop for you. Mission accomplished. Start with a simplistic example of writing out a plain vector:
#include <iostream>
#include <vector>

void write_vec(const std::vector<double> &v)
{
    for (const auto &value : v)
        std::cout << value << std::endl;
}
The actual details of how you want to save each value, and to which files, are irrelevant here; you can adjust the above code to make it work in whichever way you see fit. The point is that you want to make it work for an arbitrary number of dimensions. Simply add a template with the same name, then let overload resolution do all the work for you:
template<typename T>
void write_vec(const std::vector<std::vector<T>> &v)
{
    // Peel off one dimension and recurse into each sub-vector.
    for (const auto &value : v)
        write_vec(value);
}
Now a call to write_vec(anything), where anything is any N-"deep" vector that bottoms out in a std::vector<double>, will walk its way downhill on its own and write out every double.
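Putting the two overloads together, a variant that writes to any std::ostream (a file stream, for instance) might look like the sketch below. The names write_tensor and tensor_to_string, and the choice of one space-separated line per innermost vector, are assumptions made for this example rather than anything the question prescribes.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Base case: a flat vector of doubles becomes one space-separated line.
inline void write_tensor(std::ostream& out, const std::vector<double>& v) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (i != 0) out << ' ';
        out << v[i];
    }
    out << '\n';
}

// Recursive case: peel off one dimension and recurse into each sub-vector.
template <typename T>
void write_tensor(std::ostream& out, const std::vector<std::vector<T>>& v) {
    for (const auto& inner : v)
        write_tensor(out, inner);
}

// Convenience wrapper that collects the output in a string.
template <typename Tensor>
std::string tensor_to_string(const Tensor& t) {
    std::ostringstream out;
    write_tensor(out, t);
    return out.str();
}
```

Passing a std::ofstream as out writes the same text to a file. Note that this layout flattens all leading dimensions into rows, so the original shape has to be restored on the Python side, for example by reshaping the loaded array.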
I'm new to C++ and I think a good way for me to jump in is to build some basic models that I've built in other languages. I want to start with just Linear Regression solved using first order methods. So here's how I want things to be organized (in pseudocode).
class LinearRegression
LinearRegression:
tol = <a supplied tolerance or defaulted to 1e-5>
max_ite = <a supplied max iter or default to 1k>
fit(X, y):
// model learns weights specific to this data set
_gradient(X, y):
// compute the gradient
score(X,y):
// model uses weights learned from fit to compute accuracy of
// y_predicted to actual y
My question is when I use fit, score and gradient methods I don't actually need to pass around the arrays (X and y) or even store them anywhere so I want to use a reference or a pointer to those structures. My problem is that if the method accepts a pointer to a 2D array I need to supply the second dimension size ahead of time or use templating. If I use templating I now have something like this for every method that accepts a 2D array
template<std::size_t rows, std::size_t cols>
void fit(double (&X)[rows][cols], double &y){...}
It seems there is likely a better way. I want my regression class to work with any size of input. How is this done in industry? I know that in some situations the array is just flattened into row-major or column-major format and only a pointer to the first element is passed, but I don't have enough experience to know what people use in C++.
You raised quite a few points in your question, so here are some points addressing them:
1. Contemporary C++ discourages working directly with heap-allocated data that you need to allocate or deallocate manually. You can use, e.g., std::vector<double> to represent vectors, and std::vector<std::vector<double>> to represent matrices. Even better would be to use a matrix class, preferably one that is already in mainstream use.
2. Once you use such a class, you can easily get the dimensions at runtime. With std::vector, for example, you can use the size() method. Other classes have other methods. Check the documentation for the one you choose.
3. You probably really don't want to use templates for the dimensions.
a. If you do so, you will need to recompile each time you get an input with different dimensions. Your code will be duplicated (by the compiler) once for each distinct set of dimensions you use. Lots of bad stuff, with little gain (in this case). There's no real drawback to getting the dimensions at runtime from the class.
b. Templates (in your setting) are a good fit for the element type of the matrix (e.g., is it a matrix of doubles or floats), or possibly the number of dimensions (e.g., for specifying tensors).
4. Your regressor doesn't need to store the matrix and/or vector. Pass them by const reference. Your interface looks like that of sklearn. If you like, check the source code there. Calling fit just causes the object to store the learned parameters, i.e. the coefficient vector β. It doesn't copy or store the input matrix and/or vector.
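The points above can be sketched as follows. This is a minimal illustration, not how any particular library does it: it assumes a std::vector<std::vector<double>> matrix, plain batch gradient descent on the mean squared error, no intercept term, a hypothetical learning_rate parameter that the question does not mention, and a score that returns the mean squared error rather than an accuracy.

```cpp
#include <cstddef>
#include <vector>

using Vector = std::vector<double>;
using Matrix = std::vector<Vector>;  // row-major: X[i] is one sample

class LinearRegression {
public:
    explicit LinearRegression(double tol = 1e-5, std::size_t max_iter = 1000,
                              double learning_rate = 0.01)  // assumed parameter
        : tol_(tol), max_iter_(max_iter), learning_rate_(learning_rate) {}

    // Learns the weights; X and y are only read, never copied or stored.
    void fit(const Matrix& X, const Vector& y) {
        weights_.assign(X.empty() ? 0 : X[0].size(), 0.0);
        for (std::size_t it = 0; it < max_iter_; ++it) {
            const Vector g = gradient(X, y);
            double norm_sq = 0.0;
            for (std::size_t j = 0; j < weights_.size(); ++j) {
                weights_[j] -= learning_rate_ * g[j];
                norm_sq += g[j] * g[j];
            }
            if (norm_sq < tol_ * tol_) break;  // gradient norm below tol
        }
    }

    // Prediction for a single sample using the learned weights.
    double predict(const Vector& row) const {
        double sum = 0.0;
        for (std::size_t j = 0; j < weights_.size(); ++j)
            sum += weights_[j] * row[j];
        return sum;
    }

    // Mean squared error of the predictions against y.
    double score(const Matrix& X, const Vector& y) const {
        if (X.empty()) return 0.0;
        double sum = 0.0;
        for (std::size_t i = 0; i < X.size(); ++i) {
            const double r = predict(X[i]) - y[i];
            sum += r * r;
        }
        return sum / X.size();
    }

    const Vector& weights() const { return weights_; }

private:
    // Gradient of the mean squared error with respect to the weights.
    Vector gradient(const Matrix& X, const Vector& y) const {
        Vector g(weights_.size(), 0.0);
        for (std::size_t i = 0; i < X.size(); ++i) {
            const double r = predict(X[i]) - y[i];
            for (std::size_t j = 0; j < g.size(); ++j)
                g[j] += 2.0 * r * X[i][j] / X.size();
        }
        return g;
    }

    double tol_;
    std::size_t max_iter_;
    double learning_rate_;
    Vector weights_;
};
```

The dimensions come from X.size() and X[0].size() at runtime, so one compiled class handles inputs of any size, and the only state kept after fit is the weight vector.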
Let me preface this with the statement that most of my background has been with functional programming languages so I'm fairly novice with C++.
Anyhow, the problem I'm working on is that I'm parsing a csv file with multiple variable types. A sample line from the data looks as such:
"2011-04-14 16:00:00, X, 1314.52, P, 812.1, 812"
"2011-04-14 16:01:00, X, 1316.32, P, 813.2, 813.1"
"2011-04-14 16:02:00, X, 1315.23, C, 811.2, 811.1"
So what I've done is defined a struct which stores each line. Then each of these are stored in a std::vector< mystruct >. Now say I want to subset this vector by column 4 into two vectors where every element with P in it is in one and C in the other.
Now the example I gave is fairly simplified, but the actual problem involves subsetting multiple times.
My initial naive implementation was to iterate through the entire vector, creating individual subsets as new vectors, then subsetting those newly created vectors. Maybe something a bit more memory efficient would be to create an index, which would then be shrunk down.
Now my question is: is there a more efficient way (in terms of speed/memory usage) to do this, either within this std::vector< mystruct > framework or with some better data structure for this type of thing?
Thanks!
EDIT:
Basically the output I'd like is first two lines and last line separately. Another thing worth noting, is that typically the dataset is not ordered like the example, so the Cs and Ps are not grouped together.
I've used std::partition for this. It's not part of boost though.
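A minimal sketch of that approach, assuming a simplified row type: the Row struct, its field names, and partition_by_kind are hypothetical stand-ins for the question's mystruct.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified version of one parsed CSV line.
struct Row {
    std::string timestamp;
    char kind;     // 'P' or 'C' (column 4 in the question)
    double price;
};

// Reorders `rows` in place so that every row of the given kind comes first.
// Returns the index of the first row that is NOT of that kind.
inline std::size_t partition_by_kind(std::vector<Row>& rows, char kind) {
    auto middle = std::partition(
        rows.begin(), rows.end(),
        [kind](const Row& r) { return r.kind == kind; });
    return static_cast<std::size_t>(middle - rows.begin());
}
```

After the call, the two subsets are the ranges [0, n) and [n, rows.size()), with no second vector allocated. std::partition does not preserve the relative order of rows; std::stable_partition does, if the timestamps need to stay sorted within each group.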
If you want a data structure that allows you to move elements between different instances cheaply, the data structure you are looking for is std::list<> and its splice() family of functions.
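To illustrate, here is a small sketch that moves matching elements from one list to another with splice(). The Entry struct and the move_matching helper are hypothetical names for this example.

```cpp
#include <iterator>
#include <list>

// Hypothetical, simplified row type.
struct Entry {
    char kind;     // 'P' or 'C'
    double price;
};

// Moves every entry of the given kind from `source` to the end of `target`.
// No Entry is copied; the list nodes are simply relinked.
inline void move_matching(std::list<Entry>& source, std::list<Entry>& target,
                          char kind) {
    for (auto it = source.begin(); it != source.end();) {
        auto next = std::next(it);  // remember the successor before relinking
        if (it->kind == kind)
            target.splice(target.end(), source, it);
        it = next;
    }
}
```

Each single-element splice is a constant-time operation, but iterating a std::list is generally less cache-friendly than iterating a std::vector, so whether this wins overall depends on the data.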
I understand that you don't have trouble doing this per se, but you seem to be concerned about memory usage and performance.
Depending on the size of your struct and the number of entries in the CSV file, it may be advisable to store smart pointers instead, provided you don't need to modify the partitioned data, so that the mystruct objects are not copied:
typedef std::vector<boost::shared_ptr<mystruct> > table_t;
table_t cvs_data;
If you use std::partition (as another poster suggested), you need to define a predicate that takes the indirection of the shared_ptr into account.
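Such a predicate could look like the sketch below. It uses std::shared_ptr (C++11) in place of boost::shared_ptr, and the mystruct fields as well as the helpers subset_by_kind and partition_by_kind are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <memory>
#include <vector>

// Hypothetical fields; the real mystruct holds one parsed CSV line.
struct mystruct {
    char kind;     // 'P' or 'C'
    double price;
};

using table_t = std::vector<std::shared_ptr<mystruct>>;

// Builds a subset that shares the rows with `table` instead of copying them.
inline table_t subset_by_kind(const table_t& table, char kind) {
    table_t result;
    std::copy_if(table.begin(), table.end(), std::back_inserter(result),
                 [kind](const std::shared_ptr<mystruct>& row) {
                     return row && row->kind == kind;  // look through the pointer
                 });
    return result;
}

// In-place alternative: the predicate dereferences the shared_ptr.
inline std::size_t partition_by_kind(table_t& table, char kind) {
    auto middle = std::partition(
        table.begin(), table.end(),
        [kind](const std::shared_ptr<mystruct>& row) {
            return row && row->kind == kind;
        });
    return static_cast<std::size_t>(middle - table.begin());
}
```

Only pointers are copied or swapped here, so repeated subsetting costs one pointer plus a reference-count update per row rather than a full copy of the struct.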
I am quite new to C++ but am familiar with a few other languages.
I would like to use a data type similar to a Java ArrayList, an Objective-C NSMutableArray or a Python list, but in C++. The characteristics I am looking for are the ability to initialize the array without a capacity (so that items can be added gradually) and the ability to store multiple data types in one array.
To give you details of what I want it for is to read data from different tables of a mysql db, without knowing the number of fields in the table, and being able to move this data around. My ideal data type would allow me to store something like this:
idealArray = [Bool error,[string array],[string array]];
where the string arrays may have different sizes, from 1 to 20 in size (relatively small).
I don't know if this is possible in C++. Any help is appreciated, as are links to good resources.
Thanks
The standard dynamically sized array in C++ is std::vector<>. Heterogeneous containers don't exist unless you introduce indirection; for this you can use either boost::variant or boost::any, depending on your needs.
You may use structure or class to store (named) multiple data types together, such as:
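#include <string>
#include <vector>

struct Record  // struct, so the members are publicly accessible
{
    bool _error;
    std::vector<std::string> _v1;
    std::vector<std::string> _v2;
};
std::vector<Record> vec;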
or std::tuple to store (unnamed) multiple data types, e.g.
std::vector<std::tuple<bool, std::vector<std::string>, std::vector<std::string>>> vec;
I would suggest using a std::vector from the STL. However, note that C++ does not have a container that can contain multiple data types. There are multiple ways to simulate this behaviour though:
Derive all the "types" that you want to store from a certain "Base_type". Then store the items in the vector as std::vector<Base_type*>, but then you would need to know which item is where and the type to (dynamic)cast to, if the "types" are totally different.
Use something like std::vector<boost::any> from the boost library (but noticing that you are new to C++, that might be overkill).
In fact, the question you need to ask is: why do you want to store unrelated "types" in an "array" in the first place? And if they are related, then how? The answer will guide you in designing a decent "Base_type" for the "types".
And finally, in short, C++ does not have heterogeneous array-like structures that can contain unrelated data types.
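A minimal sketch of the "Base_type" approach described above. The class names Value, BoolValue, and StringListValue are hypothetical, and std::unique_ptr is used instead of a raw Base_type* so that the elements are released automatically.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Common base: the shared interface is what makes the types "related".
struct Value {
    virtual ~Value() = default;
    virtual std::string describe() const = 0;
};

struct BoolValue : Value {
    bool value;
    explicit BoolValue(bool v) : value(v) {}
    std::string describe() const override { return value ? "true" : "false"; }
};

struct StringListValue : Value {
    std::vector<std::string> items;
    explicit StringListValue(std::vector<std::string> v) : items(std::move(v)) {}
    // Joins the items with commas.
    std::string describe() const override {
        std::string out;
        for (std::size_t i = 0; i < items.size(); ++i) {
            if (i != 0) out += ",";
            out += items[i];
        }
        return out;
    }
};

// One "row" can now mix booleans and string lists.
using MixedRow = std::vector<std::unique_ptr<Value>>;
```

Code that only needs describe() never has to know the concrete type; dynamic_cast is needed only where type-specific members such as items are accessed.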
You could try to use an std::vector<boost::any> (documentation here).
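Since C++17 the standard library has std::any, which plays a similar role to boost::any. The sketch below uses the standard version as an assumption (it is not what the answer above names), and count_of is a hypothetical helper.

```cpp
#include <any>
#include <cstddef>
#include <string>
#include <vector>

// Counts how many elements hold exactly the type T.
template <typename T>
std::size_t count_of(const std::vector<std::any>& values) {
    std::size_t n = 0;
    for (const auto& v : values) {
        // The pointer form of any_cast returns nullptr on a type mismatch
        // instead of throwing std::bad_any_cast.
        if (std::any_cast<T>(&v) != nullptr) ++n;
    }
    return n;
}
```

A row like the one in the question could then be built as std::vector<std::any>{false, std::vector<std::string>{"a"}, std::vector<std::string>{"b", "c"}}. The caller still has to know, or test for, the stored type of each element before reading it.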