Creating dynamic data-type in C++

I'm creating an interpreter of a particular language in C++. After creating a parser, implementing scope resolution etc., the only problem I have is implementing dynamically typed variables.
Following some general advice scattered around, I created VarData and VarType structs:
#include <string>
#include <typeinfo>
union VarData{
int IntData;
char CharData;
double DoubleData;
};
enum class VarType {
Int,
Char,
Double
};
and a variable struct (it is obviously incomplete):
struct Variable {
VarData data;
VarType type;
template<typename T>
void operator =(T val) {
std::string name = typeid(T).name();
if (name == "char") {
data.CharData = val;
type = VarType::Char;
}
else if (name == "int") {
data.IntData = val;
type = VarType::Int;
}
else if (name == "double") {
data.DoubleData = val;
type = VarType::Double;
}
}
};
And this actually kind of works; for instance, in this sample code all assigned values are correctly stored:
int main() {
Variable a;
a = '5'; // a.type is now VarType::Char
a = 57; // a.type is now VarType::Int
a = 8.032; // a.type is now VarType::Double
}
The problem I have is that if I want to use the Variable struct, I need operator overloads for all common operators (+, -, /, * etc.), each of which needs to cover all possible pairs of types Variable can take. For instance,
Variable operator + (Variable& v1, Variable& v2) {
if (v1.type == VarType::Char && v2.type == VarType::Char)
//return Variable of type int
else if (v1.type == VarType::Double && v2.type == VarType::Int)
// return Variable of type double
else if (...)
}
Is there any other method of doing that, one not involving millions of nested if statements?
Sorry if my question is not exactly clear, I will be happy to provide additional explanation.

One way to handle all the different possible type combinations could be to use double dispatch to execute the operation based on the types involved.
Double dispatch would simplify determining which variant of the operation to execute. It means far fewer ifs: either a clever combination of overloads and overrides, or a dispatch table, takes over the mechanical work of finding the suitable operation. However, it would not really tame the combinatorial explosion: you still need one entry per pair of types.
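A minimal sketch of the dispatch-table form of double dispatch, assuming the VarType/VarData/Variable types from the question (redeclared here without the templated assignment so the block stands alone). The helper names makeInt, makeDouble, asInt, asDouble, addIntegral and addFloating are hypothetical. Each cell of the 3×3 table holds the function for one pair of operand types, so operator+ becomes a single lookup, but the table still has one cell per pair:

```cpp
#include <cassert>
#include <cstddef>

enum class VarType { Int, Char, Double };
union VarData { int IntData; char CharData; double DoubleData; };
struct Variable { VarData data; VarType type; };

// Hypothetical helpers that build a Variable of a given type.
Variable makeInt(int v)       { Variable r; r.type = VarType::Int;    r.data.IntData = v;    return r; }
Variable makeDouble(double v) { Variable r; r.type = VarType::Double; r.data.DoubleData = v; return r; }

// Read a Variable's active member as double or as int.
double asDouble(const Variable& v) {
    switch (v.type) {
        case VarType::Int:  return v.data.IntData;
        case VarType::Char: return v.data.CharData;
        default:            return v.data.DoubleData;
    }
}
int asInt(const Variable& v) {  // only called for Int or Char operands
    return v.type == VarType::Char ? v.data.CharData : v.data.IntData;
}

using AddFn = Variable (*)(const Variable&, const Variable&);

Variable addIntegral(const Variable& a, const Variable& b) { return makeInt(asInt(a) + asInt(b)); }
Variable addFloating(const Variable& a, const Variable& b) { return makeDouble(asDouble(a) + asDouble(b)); }

// Row = left operand type, column = right operand type,
// both in enumerator order Int, Char, Double.
const AddFn addTable[3][3] = {
    /* Int    */ { addIntegral, addIntegral, addFloating },
    /* Char   */ { addIntegral, addIntegral, addFloating },
    /* Double */ { addFloating, addFloating, addFloating },
};

Variable operator+(const Variable& a, const Variable& b) {
    return addTable[static_cast<std::size_t>(a.type)][static_cast<std::size_t>(b.type)](a, b);
}
```

Char + Char yields an Int here, matching the comment in the question; changing a rule means editing one table cell rather than an if-chain.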
A more effective way would be to apply systematic type promotion rules. For example, if you want to combine an integer and a float in one operation, you convert everything to float before performing the operation.
If you use the interpreter pattern, you could have a template method pattern to manage the type promotion before invoking the suitable operator overload for two values of the same type.
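A sketch of the promotion idea, under the assumption that the types can be ranked Char < Int < Double (encoded in the enumerator order below). applyPromoted plays the role of the fixed template-method skeleton: it promotes both operands to the common type, then hands two values of the same C++ type to a generic lambda. valueAs, fromValue and applyPromoted are hypothetical names, and this is one possible arrangement rather than the only one:

```cpp
#include <algorithm>
#include <cassert>

enum class VarType { Char = 0, Int = 1, Double = 2 };  // order = promotion rank
union VarData { char CharData; int IntData; double DoubleData; };
struct Variable { VarData data; VarType type; };

// Convert the active member to the C++ type T.
template <typename T>
T valueAs(const Variable& v) {
    switch (v.type) {
        case VarType::Char: return static_cast<T>(v.data.CharData);
        case VarType::Int:  return static_cast<T>(v.data.IntData);
        default:            return static_cast<T>(v.data.DoubleData);
    }
}

Variable fromValue(int x)    { Variable r; r.type = VarType::Int;    r.data.IntData = x;    return r; }
Variable fromValue(double x) { Variable r; r.type = VarType::Double; r.data.DoubleData = x; return r; }

// Fixed skeleton: promote to the common type, then apply op to two
// same-typed values. Char is promoted to Int, as in C++ arithmetic.
template <typename Op>
Variable applyPromoted(const Variable& a, const Variable& b, Op op) {
    VarType common = std::max(a.type, b.type);
    if (common == VarType::Double)
        return fromValue(op(valueAs<double>(a), valueAs<double>(b)));
    return fromValue(op(valueAs<int>(a), valueAs<int>(b)));
}

Variable operator+(const Variable& a, const Variable& b) {
    return applyPromoted(a, b, [](auto x, auto y) { return x + y; });
}
Variable operator*(const Variable& a, const Variable& b) {
    return applyPromoted(a, b, [](auto x, auto y) { return x * y; });
}
```

Each new operator is now one line; the number of type combinations no longer multiplies the code.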
Unrelated but important: the string returned by typeid(T).name() is implementation-defined. "int", "double", etc. are what one implementation returns, but other compilers may return other strings (GCC and Clang, for example, return mangled names such as "i" and "d"). This makes your code non-portable.
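One portable alternative, sketched here assuming C++17: choose the branch at compile time with if constexpr and std::is_same_v, which compares the types themselves instead of implementation-defined strings. Only the question's assignment operator is shown; returning Variable& and the static_assert that rejects unsupported types are additions:

```cpp
#include <cassert>
#include <type_traits>

union VarData { int IntData; char CharData; double DoubleData; };
enum class VarType { Int, Char, Double };

struct Variable {
    VarData data;
    VarType type;

    template <typename T>
    Variable& operator=(T val) {
        if constexpr (std::is_same_v<T, char>) {
            data.CharData = val;
            type = VarType::Char;
        } else if constexpr (std::is_same_v<T, int>) {
            data.IntData = val;
            type = VarType::Int;
        } else if constexpr (std::is_same_v<T, double>) {
            data.DoubleData = val;
            type = VarType::Double;
        } else {
            // Dependent condition: only fires if this branch is instantiated.
            static_assert(!std::is_same_v<T, T>, "unsupported type");
        }
        return *this;
    }
};
```

Only the matching branch is compiled for each T, so no cross-type conversions sneak in through the unused branches.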


C++ and dynamically typed languages

Today I talked to a friend about the differences between statically and dynamically typed languages (more info about the difference in this SO question). After that, I wondered what kind of trick can be used in C++ to emulate such dynamic behavior.
In C++, as in other statically typed languages, the variable type is specified at compile time. For example, let's say I have to read a large amount of numbers from a file, most of which are quite small, small enough to fit in an unsigned short. Here is the tricky part: a small number of these values are much bigger, big enough to need an unsigned long long.
Since I'm going to do calculations with all of them, I want them stored in the same container, in consecutive memory positions, in the same order as I read them from the input file. The naive approach would be to store them in a vector of unsigned long long, but this typically means using up to 4 times the space actually needed (unsigned short is 2 bytes, unsigned long long 8 bytes).
In dynamically typed languages, the type of a variable is interpreted at runtime and coerced to a type where it fits. How can I achieve something similar in C++?
My first idea is to use pointers: depending on its size, I store each number with the appropriate type. This has the obvious drawback of also having to store the pointer, but since I assume I'm going to store them on the heap anyway, I don't think it matters.
I'm totally sure that many of you can give me way better solutions than this ...
#include <iostream>
#include <vector>
#include <limits>
#include <sstream>
#include <fstream>
int main() {
std::ifstream f ("input_file");
if (f.is_open()) {
std::vector<void*> v;
unsigned long long int num;
while(f >> num) {
if (num > std::numeric_limits<unsigned short>::max()) {
v.push_back(new unsigned long long int(num));
}
else {
v.push_back(new unsigned short(num));
}
}
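for (auto i: v) {
// Note: deleting through a void* is undefined behaviour; the original
// type would have to be known to delete each element correctly.
delete i;
}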
f.close();
}
}
Edit 1:
The question is not about saving memory. I know that in dynamically typed languages the space needed to store the numbers in the example is going to be much larger than in C++; the question is about emulating a dynamically typed language with some C++ mechanism.
Options include...
Discriminated union
The code specifies a set of distinct, supported types T0, T1, T2, T3..., and - conceptually - creates a management type such as:
struct X
{
enum { F0, F1, F2, F3... } type_;
union { T0 t0_; T1 t1_; T2 t2_; T3 t3_; ... };
};
Because there are limitations on the types that can be placed into unions, and bypassing them with placement-new requires care to ensure adequate alignment and correct destructor invocation, a generalised implementation becomes more complicated; it's normally better to use boost::variant<>. Note that the type_ field requires some space, the union will be at least as large as the largest of sizeof t0_, sizeof t1_, ..., and padding may be required.
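Since C++17 the standard library provides std::variant, which fills the same role as boost::variant<>. A sketch for the numbers in the question, storing each value as either unsigned short or unsigned long long; store and widen are hypothetical helper names. Each element is as large as the largest alternative plus the discriminator, so this illustrates the mechanism rather than saving memory:

```cpp
#include <cassert>
#include <limits>
#include <variant>
#include <vector>

using Number = std::variant<unsigned short, unsigned long long>;

// Pick the narrowest alternative that can hold the value.
Number store(unsigned long long num) {
    if (num > std::numeric_limits<unsigned short>::max())
        return num;                                // unsigned long long alternative
    return static_cast<unsigned short>(num);       // unsigned short alternative
}

// Visit whichever alternative is active and convert it to one common type.
unsigned long long widen(const Number& n) {
    return std::visit([](auto v) { return static_cast<unsigned long long>(v); }, n);
}
```

The variant handles construction, destruction and alignment of the alternatives, which are exactly the details the hand-written union has to get right.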
std::type_info
It's also possible to have a templated constructor and assignment operator that call typeid and record the std::type_info, allowing future operations like "recover-the-value-if-it's-of-a-specific-type". The easiest way to pick up this behaviour is to use boost::any.
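The standard counterpart of boost::any is std::any (C++17). It records the stored type, and the pointer form of std::any_cast returns the value's address only if you ask for exactly that type, otherwise nullptr. A sketch of the "recover the value if it's of a specific type" idea; describe is a hypothetical helper:

```cpp
#include <any>
#include <cassert>
#include <string>
#include <typeinfo>
#include <vector>

// Describe the value if it holds one of the known types.
std::string describe(const std::any& a) {
    if (const auto* s = std::any_cast<unsigned short>(&a))
        return "short:" + std::to_string(*s);
    if (const auto* l = std::any_cast<unsigned long long>(&a))
        return "long long:" + std::to_string(*l);
    return "unknown";
}
```

Unlike the string names from type_info::name(), the comparison inside any_cast is on the type itself, so it is portable.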
Run-time polymorphism
You can create a base type with virtual destructor and whatever functions you need (e.g. virtual void output(std::ostream&)), then derive a class for each of short and long long. Store pointers to the base class.
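A minimal sketch of that layout, assuming C++14: an abstract base Number with a virtual destructor and a virtual output function, a class template Stored that derives one concrete class per stored type, and a vector of std::unique_ptr to the base. Number, Stored and makeNumber are hypothetical names. Each element costs a heap allocation, the pointer, and a vtable pointer, so this buys flexibility, not memory:

```cpp
#include <cassert>
#include <limits>
#include <memory>
#include <ostream>
#include <sstream>
#include <vector>

struct Number {
    virtual ~Number() = default;
    virtual void output(std::ostream& os) const = 0;
};

// One derived class per stored type.
template <typename T>
struct Stored : Number {
    T value;
    explicit Stored(T v) : value(v) {}
    void output(std::ostream& os) const override { os << value; }
};

std::unique_ptr<Number> makeNumber(unsigned long long num) {
    if (num > std::numeric_limits<unsigned short>::max())
        return std::make_unique<Stored<unsigned long long>>(num);
    return std::make_unique<Stored<unsigned short>>(static_cast<unsigned short>(num));
}
```

Because destruction goes through the virtual destructor, the vector cleans up every element correctly, which the void* version in the question cannot do.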
Custom solutions
In your particular scenario, you've only got a few large numbers: you could do something like reserve one of the short values to be a sentinel indicating that the actual value at this position can be recreated by bitwise shifting and ORing of the following 4 values. For example...
10 299 32767 0 0 192 3929 38
...could encode:
10
299
// 32767 is a sentinel indicating next 4 values encode long long
(0 << 48) + (0 << 32) + (192 << 16) + 3929
38
The concept here is similar to UTF-8 encoding for international character sets. This will be very space efficient, but it suits forward iteration, not random access indexing a la [123].
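A sketch of that encoding, assuming the sentinel 32767 from the example, 16-bit chunks stored as std::uint16_t, and well-formed input when decoding; encode and decode are hypothetical names. One detail the description leaves open: a value equal to the sentinel must itself be written in long form or it would be misread, so this sketch escapes every value >= 32767:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint16_t kSentinel = 32767;  // assumed sentinel value

// Append one value: short form if below the sentinel, otherwise
// the sentinel followed by four 16-bit chunks, most significant first.
void encode(std::vector<std::uint16_t>& out, std::uint64_t value) {
    if (value < kSentinel) {
        out.push_back(static_cast<std::uint16_t>(value));
        return;
    }
    out.push_back(kSentinel);
    for (int shift = 48; shift >= 0; shift -= 16)
        out.push_back(static_cast<std::uint16_t>((value >> shift) & 0xFFFF));
}

// Forward iteration only: the position of element n depends on the ones before it.
std::vector<std::uint64_t> decode(const std::vector<std::uint16_t>& in) {
    std::vector<std::uint64_t> values;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] != kSentinel) {
            values.push_back(in[i]);
            continue;
        }
        assert(i + 4 < in.size());  // well-formed input assumed
        std::uint64_t v = 0;
        for (std::size_t k = 1; k <= 4; ++k)
            v = (v << 16) | in[i + k];
        values.push_back(v);
        i += 4;
    }
    return values;
}
```

Encoding 10, 299, 12586841 and 38 reproduces the example sequence above, since 12586841 = (192 << 16) + 3929.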
You could create a class for storing dynamic values:
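#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <string>
#include <utility>
#include <vector>
enum class dyn_type {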
none_type,
integer_type,
fp_type,
string_type,
boolean_type,
array_type,
// ...
};
class dyn {
dyn_type type_ = dyn_type::none_type;
// Unrestricted union:
union {
std::int64_t integer_value_;
double fp_value_;
std::string string_value_;
bool boolean_value_;
std::vector<dyn> array_value_;
};
public:
// Constructors
dyn()
{
type_ = dyn_type::none_type;
}
dyn(std::nullptr_t) : dyn() {}
dyn(bool value)
{
type_ = dyn_type::boolean_type;
boolean_value_ = value;
}
dyn(std::int32_t value)
{
type_ = dyn_type::integer_type;
integer_value_ = value;
}
dyn(std::int64_t value)
{
type_ = dyn_type::integer_type;
integer_value_ = value;
}
dyn(double value)
{
type_ = dyn_type::fp_type;
fp_value_ = value;
}
dyn(const char* value)
{
type_ = dyn_type::string_type;
new (&string_value_) std::string(value);
}
dyn(std::string const& value)
{
type_ = dyn_type::string_type;
new (&string_value_) std::string(value);
}
dyn(std::string&& value)
{
type_ = dyn_type::string_type;
new (&string_value_) std::string(std::move(value));
}
// ....
// Clear
void clear()
{
switch(type_) {
case dyn_type::string_type:
string_value_.std::string::~string();
break;
//...
}
type_ = dyn_type::none_type;
}
~dyn()
{
this->clear();
}
// Copy:
dyn(dyn const&);
dyn& operator=(dyn const&);
// Move:
dyn(dyn&&);
dyn& operator=(dyn&&);
// Assign:
dyn& operator=(std::nullptr_t);
dyn& operator=(std::int64_t);
dyn& operator=(double);
dyn& operator=(bool);
// Operators:
dyn operator+(dyn const&) const;
dyn& operator+=(dyn const&);
// ...
// Query
dyn_type type() const { return type_; }
std::string& string_value()
{
assert(type_ == dyn_type::string_type);
return string_value_;
}
// ....
// Conversion
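explicit operator bool() const
{
switch(type_) {
case dyn_type::none_type:
return false; // none is falsy, as in most dynamic languages
case dyn_type::integer_type:
return integer_value_ != 0;
case dyn_type::fp_type:
return fp_value_ != 0.0;
case dyn_type::string_type:
return !string_value_.empty();
case dyn_type::boolean_type:
return boolean_value_;
case dyn_type::array_type:
return !array_value_.empty();
// ...
}
return false; // not reached for the types listed above
}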
// ...
};
Used with:
std::vector<dyn> xs;
xs.push_back(3);
xs.push_back(2.0);
xs.push_back("foo");
xs.push_back(false);
An easy way to get dynamic language behavior in C++ is to use a dynamic language engine, e.g. for Javascript.
Or, for example, the Boost library provides an interface to Python.
Possibly that will deal with a collection of numbers in a more efficient way than you could do yourself, but still it's extremely inefficient compared to just using an appropriate single common type in C++.
The normal way of dynamic typing in C++ is a boost::variant or a boost::any.
But in many cases you don't want to do that. C++ is a great statically typed language and it's just not your best use case to try to force it to be dynamically typed (especially not to save memory use). Use an actual dynamically typed language instead as it is very likely better optimized (and easier to read) for that use case.

Convert void* to a dynamic type

I have a double variable i, that is converted to a void pointer named pointer:
double i = 3;
void *pointer = &i;
To convert the void pointer back to double, I used:
double p = *((double*) pointer);
I would like to convert the void pointer to the type I will send as a char*:
char* myType= typeid(i).name();//get the name of type of "i"
int p = *((myType*) pointer); // how to implement?
Is it possible?
instead of
char* myType= typeid(i).name();//get the name of type of "i"
int p = *((myType*) pointer); // how to implement?
use
typedef decltype(i) myType;
myType p = *((myType*) pointer);
or better:
typedef decltype(i) myType;
auto p = *static_cast<myType*>(pointer); // static_cast is the idiomatic cast from void*
This requires C++11 or later. On older compilers, decltype can be emulated with Boost (BOOST_TYPEOF from Boost.Typeof).
Edit. This is probably different from what you wanted to do, which I suppose is something like this:
void myFunction(void* unknownParam) {
typedef (realTypeOf unknownParam) RealType; // <-- this is not real c++
RealType &a = *reinterpret_cast<RealType*>(unknownParam)
//do stuff using 'a'
}
This is not possible in C++, but there is a reason: it doesn't make much sense.
And the reason is that for myFunction to be valid, the //do stuff using 'a' part would have to be valid for whatever type RealType ends up being. As such, it cannot rely on any feature the RealType type has: it cannot use any of its methods, it cannot use any operator, it cannot even know whether it is a class or not. Basically, you cannot do anything more with it than you could already do with a void*, so giving the type a name doesn't really help you much.
A language feature that is similar to what you want (but not quite it) is type reflection, which is not present in C++ but can be found in languages such as Java, Objective-C or C#. Basically, you ask the object itself whether it has a certain method and, if so, call it. An example in Objective-C:
-(void)myFunction:(id)unknownParam {
if ([unknownParam respondsToSelector:@selector(myMethod)])
[unknownParam performSelector:@selector(myMethod)];
}
C/C++ does not interchange data types the way, for example, JavaScript variables do.
An int value and a double value (floating point) have different binary formats.
You cannot recover the original data type with typeid once the pointer has been cast to void*. Also note that typeid(...).name() produces different output on different OSes and compilers.
double dValue = 77.7;
void* pValue = &dValue;
// Prints the name of void*, e.g. "Pv" with GCC and Clang (the output depends on compiler and OS)
std::cout << typeid(pValue).name() << std::endl;
To cast from void* based on a type-name string, you can write rules like the following, or you can try C++ template functions in specific cases.
int iOutValue = 0;
double dOutValue = 0;
const char* name = "double"; // string literals are const in C++; strcmp needs <cstring>
if(!strcmp(name, "int"))
{
iOutValue = *((int*)pValue);
}
else if(!strcmp(name, "double"))
{
dOutValue = *((double*)pValue);
}
If instead of passing around void* you used some kind of variant type, you would be able to convert it back.
If you use a string to indicate the type you will need some kind of map from string to actual type. Although you can go from type to string, there is no conversion back.
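A sketch of such a map, assuming the caller only needs the pointee converted to double: each entry maps a type-name string to a small function that casts the void* back to that exact type. readers and readAsDouble are hypothetical names, and the strings are chosen by the program rather than taken from typeid(...).name(), so they are portable:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

// Type name chosen by the program -> reader for that exact type.
const std::map<std::string, std::function<double(const void*)>> readers = {
    {"int",    [](const void* p) { return static_cast<double>(*static_cast<const int*>(p)); }},
    {"double", [](const void* p) { return *static_cast<const double*>(p); }},
    {"short",  [](const void* p) { return static_cast<double>(*static_cast<const short*>(p)); }},
};

double readAsDouble(const std::string& typeName, const void* p) {
    auto it = readers.find(typeName);
    if (it == readers.end())
        throw std::invalid_argument("unknown type name: " + typeName);
    return it->second(p);
}
```

The map only moves the if-chain into data; the caller still has to pass the correct name for the pointer, since nothing checks it.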
Assuming you know your underlying data is always numeric in some way, there are ways to have special discrete variants that only contain numbers. The simplest would be to store a union of a 64-bit int and a double, and some flag indicating which one you have, and then a method to convert to any numeric type, asDouble(), asLong() etc.
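A sketch of that design, following the answer's description; NumericValue, holdsDouble, asDouble and asLong are hypothetical names. It holds a union of a 64-bit integer and a double plus a flag for the active member, and the conversion methods read whichever member is active. The int constructor is an addition so that a plain literal like 42 is not ambiguous between the int64_t and double constructors:

```cpp
#include <cassert>
#include <cstdint>

class NumericValue {
public:
    explicit NumericValue(std::int64_t v) : isDouble_(false) { value_.i = v; }
    explicit NumericValue(int v) : NumericValue(static_cast<std::int64_t>(v)) {}
    explicit NumericValue(double v) : isDouble_(true) { value_.d = v; }

    bool holdsDouble() const { return isDouble_; }
    double asDouble() const { return isDouble_ ? value_.d : static_cast<double>(value_.i); }
    // Converting a double truncates toward zero; out-of-range values are undefined.
    std::int64_t asLong() const { return isDouble_ ? static_cast<std::int64_t>(value_.d) : value_.i; }

private:
    union { std::int64_t i; double d; } value_;
    bool isDouble_;  // which union member is active
};
```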

Is there any way to declare the name of the variable then the type

I let the user choose the data type he wants to use, for example long or double.
I would like to declare first the name of the variable then set its type. Is it possible in C++?
If I understood your question correctly, you want to do this:
declare variable;
// in the program:
variable = (int) anotherVariable;
Short answer:
No
Long answer:
a void * does exactly this, it needs to be explicitly converted to a different type before dereferencing. But this is not possible on variables that are not void *s.
void *variable = NULL;
int someIntVariable = 100;
int *someIntPointer = NULL;
variable = &someIntVariable;
someIntPointer = (int *)variable;
// ... but this seems unnecessary.
Have a look at boost::variant, or, if you need only PODs, union. However keep in mind that this complicates many things.
enum VariantType {
USER_INT, USER_DOUBLE
};
union Variant {
int i;
double d;
};
VariantType getUserDataType(); // hypothetical: asks the user, defined elsewhere
int main() {
VariantType type;
Variant data;
type = getUserDataType();
switch(type) {
case USER_INT:
data.i = 42;
break;
case USER_DOUBLE:
data.d = 42.0;
break;
default:
break;
}
}
...or use a ready-made Variant implementation.
Look into using VARIANT (if you're on Windows) or something similar on other platforms. The point of VARIANT is that it's a union that is capable of storing all kinds of data types but only 1 particular type at a given time. This way you can define a new generic variable type (VARIANT) ahead of time and then adapt its internal type at run-time, depending on user choice.
Using something like VARIANT comes at a price, though, since every operation that you do on it will have to check if the operation is correct for the current underlying type. VARIANT also uses more memory since the union has its own overhead (see the definition for details).
You may want to wrap variant operations in a class to simplify its usage. The nice thing about VARIANT as opposed to void* is that it gives you a lot more type safety and the code becomes a lot more readable.
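Standard C++17 offers a portable alternative to the Windows VARIANT in std::variant. A sketch of the wrapper-class idea with it, assuming the user picks the type by name ("long" or "double", as in the question); UserValue and its member functions are hypothetical. The accessors check the active type, throwing std::bad_variant_access on a mismatch, which is the type safety the answer contrasts with void*:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <variant>

class UserValue {
public:
    // The stored type is chosen at run time from the user's choice.
    explicit UserValue(const std::string& typeName) {
        if (typeName == "long")
            value_ = 0L;
        else if (typeName == "double")
            value_ = 0.0;
        else
            throw std::invalid_argument("unsupported type: " + typeName);
    }

    // Assign a value while keeping the type the user chose.
    void set(double x) {
        if (std::holds_alternative<long>(value_))
            value_ = static_cast<long>(x);
        else
            value_ = x;
    }

    bool isLong() const { return std::holds_alternative<long>(value_); }
    long asLong() const { return std::get<long>(value_); }        // throws on mismatch
    double asDouble() const { return std::get<double>(value_); }  // throws on mismatch

private:
    std::variant<long, double> value_;
};
```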
Edit: as another answer pointed out, boost::variant is meant for exactly this purpose.

Why can't I increment a variable of an enumerated type?

I have an enumerated type StackID, and I am using the enumeration to refer to an index of a particular vector; it makes my code easier to read.
However, I now have the need to create a variable called nextAvail of type StackID. (it actually refers to a particular stackID ). I tried to increment it but in C++, the following is illegal:
nextAvail++;
Which sort of makes sense to me ... because there's no bounds checking.
I'm probably overlooking something obvious, but what's a good substitute?
I also want to link to this question.
Overloading operator++:
// Beware, brain-compiled code ahead!
StackID& operator++(StackID& stackID)
{
#if MY_ENUMS_ARE_CONTIGUOUS && I_DO_NOT_WORRY_ABOUT_OVERFLOW
return stackID = static_cast<StackID>( static_cast<int>(stackID) + 1 );
#else
switch(stackID) {
case value1 : return stackID = value2;
case value2 : return stackID = value3;
...
case valueN : return stackID = value1;
}
assert(false);
return stackID; // some compilers might warn otherwise
#endif
}
StackID operator++(StackID& stackID, int)
{
StackID tmp(stackID);
++stackID;
return tmp;
}
Because enumerations do not have to be contiguous. E.g. take this example:
enum Colors {
cRed, // = 0
cBlue, // = 1
cGreen = 3
};
What should happen in this scenario?
Colors color = cBlue;
Colors other = ++color;
Should other be cGreen, or should it be 2? In the latter case it is no longer a valid enumeration member. What about this?
Colors color = cGreen;
Colors other = ++color;
Should other be cRed (wrap around) or 4?
As you can see, being able to increment enumeration values introduces a whole lot of questions and complicates the simple mechanism that they intend to be.
If all you care about is the integer value being incremented, then simply cast to int and increment that.
Casting back and forth to/from int is of course the obvious solution, then you make clear that you understand that the addition is happening "outside" the enum:
nextAvail = static_cast<StackID>(static_cast<int>(nextAvail) + 1);
Why not store nextAvail as an int instead if you're going to do arithmetic operations on it?
Another option would be to wrap the enum in your own type and overload operator ++ for it (which also could wrap around or something for instance).
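A sketch of that option, assuming the enumerators are contiguous from zero and followed by a count enumerator; the enumerators StackA, StackB, StackC, StackCount and the wrapper name StackIndex are hypothetical. Following the usual convention, prefix ++ returns the updated object and postfix ++ returns the old value, and both wrap around after the last stack:

```cpp
#include <cassert>

enum StackID { StackA, StackB, StackC, StackCount };  // hypothetical enumerators

// Wrapper around the enum; ++ wraps around after the last stack.
class StackIndex {
public:
    explicit StackIndex(StackID id = StackA) : id_(id) {}

    StackIndex& operator++() {    // prefix: advance, return *this
        id_ = static_cast<StackID>((id_ + 1) % StackCount);
        return *this;
    }
    StackIndex operator++(int) {  // postfix: advance, return the old value
        StackIndex old = *this;
        ++*this;
        return old;
    }
    StackID get() const { return id_; }

private:
    StackID id_;
};
```

The wrap-around policy lives in one place, so the enum itself stays a plain set of names.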
An enumeration is semantically supposed to represent a set of distinct, related values.
So you could have
enum Colour {RED, GREEN, BLUE};
But that should be equivalent to:
enum Colour {GREEN, BLUE, RED};
The problem is that if you increment an enum then those representations are not the same. GREEN++ in the first case is not the same as GREEN++ in the second.
Making your program dependent on the declaration order of the enum is a recipe for disaster: maintainers may assume that the order of the enum doesn't matter, introducing many silent bugs.
Very Simple:
nextAvail = (StackID)(nextAvail + 1);
Unscoped enums convert implicitly to an integral type, so you can cast them back and forth. Is this what you're trying to do?
int ndx = (int) StackID::SomeValue;
...
++ndx;
This is going to make someone very confused down the line, of course.
It occurs to me that you're using an enum where you should be using const, or even #define. enum is most appropriate when you have arbitrary values (where the exact value is not meaningful).
I've overloaded the ++/-- operator in this way:
enum STATE {STATE_1, STATE_2, STATE_3, STATE_4, STATE_5, STATE_6};
const int STATE_COUNT = 6;
// Overload the STATE++ operator (postfix; STATE_6 wraps around to STATE_1).
// Note: unlike the built-in postfix operator, this returns the updated value.
inline STATE& operator++(STATE& state, int) {
const int i = static_cast<int>(state) + 1;
state = static_cast<STATE>(i % STATE_COUNT);
return state;
}
// Overload the STATE-- operator (postfix; STATE_1 wraps around to STATE_6)
inline STATE& operator--(STATE& state, int) {
const int i = static_cast<int>(state) - 1;
state = static_cast<STATE>(i < 0 ? STATE_COUNT - 1 : i);
return state;
}
With respect to operator++, §5.2.6/1 states: "The type of the operand shall be an arithmetic type or a pointer to a complete object type."
StackID does not fit the bill here. It is of enumeration type.
One option is like this
§5.7/1 - "For addition, either both operands shall have arithmetic or enumeration type, or one operand shall be a pointer to a completely defined object type and the other shall have integral or enumeration type."
enum Possibility {Yes, No, Maybe};
Possibility& operator++(Possibility& r){
r = Possibility(r + 1); // convert r to integer, add 1, convert back to the enum
return r;
}
int main(){
Possibility p = Yes;
Possibility p1 = ++p;
}
I'm quite happy with this C plus C++ solution for a for loop incrementing an enum.
for (Dwg_Version_Type v = R_INVALID; v <= R_AFTER; v++)
=>
int vi;
for (Dwg_Version_Type v = R_INVALID;
v <= R_AFTER;
vi = (int)v, vi++, v = (Dwg_Version_Type)vi)
The other solutions here are not backward-compatible with C, and are quite large.

Declaring a data type dynamically in C++

I want to be able to do the following:
I have an array of strings that contain data types:
string DataTypeValues[20] = {"char", "unsigned char", "short", "int"};
Then later, I would like to create a variable of one of the data types at runtime. I won't know at compile time what the correct data type should be.
So for example, if at runtime I determined a variable x needed to be of type int:
DataTypeValues[3] x = 100;
Obviously this won't work, so how could I do something like this?
The simple answer is that you can't - types need to be known at compile time in C++. You can do something like it using things like boost::any or unions, but it won't be pretty.
You would have to use unions to achieve something like that, but handling unions is a very difficult matter, so you should choose a container class that wraps the union logic behind an interface, such as Boost.Variant or Qt's QVariant.
You can't. This kind of run-time metaprogramming is not supported in C++.
Everyone saying you can't do this in C++ is missing one obvious solution. This is where you could use a base class, you need to define the commonly used interface there, and then all the derived classes are whatever types you need. Put it in a smart pointer appropriate for a container and there you go. You may have to use dynamic type inference if you can't put enough of the interface in the base class, which is always frowned upon because it's ugly, but it's there for a reason. And dynamically allocating your types probably isn't the most efficient thing, but as always, it depends on what you're using it for.
I think you are really looking for a dynamically-typed language. Embed an interpreter if you must stick with C++!
Or you could implement something akin to the component model using interfaces to work with wrapped data. Start with the cosmic base class - IObject, then implement interfaces for IInteger, IDouble, IString, etc. The objects themselves would then get created by a factory.
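A sketch of that component-model idea with hypothetical names throughout: IObject is the common interface, IntObject and ShortObject are two concrete implementations, and a factory registry keyed by the type-name strings from the question creates the right one at run time. Only "int" and "short" are registered here; other names would be added the same way:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

struct IObject {
    virtual ~IObject() = default;
    virtual std::string typeName() const = 0;
    virtual void setFromLong(long long v) = 0;
    virtual long long toLong() const = 0;
};

struct IntObject : IObject {
    int value = 0;
    std::string typeName() const override { return "int"; }
    void setFromLong(long long v) override { value = static_cast<int>(v); }
    long long toLong() const override { return value; }
};

struct ShortObject : IObject {
    short value = 0;
    std::string typeName() const override { return "short"; }
    void setFromLong(long long v) override { value = static_cast<short>(v); }
    long long toLong() const override { return value; }
};

// Factory: type name known only at run time -> new object of that type.
std::unique_ptr<IObject> create(const std::string& name) {
    static const std::map<std::string, std::function<std::unique_ptr<IObject>()>> registry = {
        {"int",   []() -> std::unique_ptr<IObject> { return std::make_unique<IntObject>(); }},
        {"short", []() -> std::unique_ptr<IObject> { return std::make_unique<ShortObject>(); }},
    };
    auto it = registry.find(name);
    if (it == registry.end())
        return nullptr;  // unregistered type name
    return it->second();
}
```

With this, the question's DataTypeValues[3] can be passed to create() to get an object that really stores an int, though all access still goes through the IObject interface.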
Or you could just use void buffers with a factory... That's the age-old way of avoiding static typing in C/C++ (without the use of inheritance-based polymorphism). Then sprinkle in generous amounts of reinterpret_cast.
The closest you can get is with templates (note that the index must still be a compile-time constant):
template<int i> struct Data { };
template<> struct Data<0> { typedef char type; };
template<> struct Data<1> { typedef unsigned char type; };
template<> struct Data<2> { typedef short type; };
template<> struct Data<3> { typedef int type; };
Data<3>::type x;
If you need something a lot more complex, Boost has a C++-Python bridge.
Use a union and make your own dynamic class, along these lines:
#include <type_traits>
union all{
char c;
unsigned char uc;
short s;
int i;
};
class dynamic{
public:
char Type;
all x;
template <class T>
dynamic(T y){
// Pick the union member from the static type of y (C++17 if constexpr).
if constexpr (std::is_same_v<T, char>) {
Type = 1; x.c = y;
} else if constexpr (std::is_same_v<T, unsigned char>) {
Type = 2; x.uc = y;
} else if constexpr (std::is_same_v<T, short>) {
Type = 3; x.s = y;
} else {
Type = 4; x.i = static_cast<int>(y);
}
}
// All four member types fit in int, so get() returns int
// (a single auto return type cannot differ between branches).
int get() const {
switch(Type) {
case 1: return x.c;
case 2: return x.uc;
case 3: return x.s;
default: return x.i;
}
}
//also write the operator functions you would like to use
};
However, you should avoid the dynamic type as much as you can, because it is memory-inefficient
(in this example, each dynamic object needs 5 bytes of data, and typically occupies 8 bytes once alignment padding is added).
It will also slow down your code a bit. If, in your example, you want a dynamic type for number variables only to reduce memory usage, you should forget about dynamic and just use int as the type, since an int can hold every char, unsigned char and short value.
But if you need a dynamic type for really different things (for example an int and a string or a custom object), then it is one of your options.
The only thing you can do is manually loop through the types and compare each individual one. There's also the potential to use a factory object here - but that would involve the heap.
Visual Basic's 'Variant' data type is what you are talking about. It can hold anything, primary data types, arrays, objects etc.
"The Collection class in OLE Automation can store items of different data types. Since the data type of these items cannot be known at compile time, the methods to add items to and retrieve items from a collection use variants. If in Visual Basic the For Each construct is used, the iterator variable must be of object type, or a variant." -- from http://en.wikipedia.org/wiki/Variant_type
The above page gives some insights on how variants are used and it shows how OLE is used in C++ for dealing with variants.
In your simple example, there would be little benefit in not simply using the widest type in the list as a generic container and casting to the smaller types when necessary (or even relying on implicit casts).
You could get elaborate with unions, classes, polymorphism, RTTI, Boost variants etc, but merely for a list of different width integers it is hardly worth the effort.
It seems to me you have a perceived problem for which you have invented an impractical solution for which you are now asking for help. You'd probably be far better off describing your original problem rather than making your solution the problem!
Also, don't forget about all the functions that must operate on this mysterious data type. Most functions, such as addition, are designed to work with only one type; they are overloaded to handle additional types.
How do you know at run-time what the variable type is?
The only way that comes to mind now is the old C style, where a pointer to void was used like:
void *unknown;
Later on, you can assign any object to it, like below:
unknown = (void *)new int(4);
If you know the type at run time, then you can run a specific function on such a variable, like below:
if(type==0) { // int
printf("%d\n", * ((int*)unknown) );
} else {
// other type
}
This technique (casting void*) is used, for example, with malloc and similar functions.
I'm not saying it is good practice now that C++ is much more developed.
I still agree with those saying it is not the best solution for your problem, but maybe after some redesign you may find it helpful.
You may also find the auto type specifier, available since C++11, interesting.
http://en.cppreference.com/w/cpp/language/auto
I guess this reply is a few years late, but for people who come across this thread, a possible solution is to use variable templates (C++14). For example:
#include <iostream>
using namespace std;
template<typename T>
T var;
template<typename T>
T arr[10];
int main() {
int temp = 0;
var<int> = 2;
cout << var<int> << ' '; // this outputs 2
var<char> = 'a';
cout << var<int> << ' '; // var<int> is a separate variable, so this still outputs 2
cout << var<char> << ' '; // this outputs a
for(int i = 0; i < 10; i++) {
switch(i % 2) {
case 0:
arr<int>[i] = ++temp;
break;
case 1:
arr<char>[i] = 'a' + ++temp;
break;
}
}
cout << endl;
for(int i = 0; i < 10; i++) {
switch(i % 2) {
case 0:
cout << arr<int>[i] << ' ';
break;
case 1:
cout << arr<char>[i] << ' ';
break;
}
}
return 0;
}
The only problem with this is that you need to know the type of what is currently stored in the variable (e.g. by keeping, in an integer array, an "id" you assign to each type). If you do not know, or have no way to determine, what is inside a specific variable or array location, I do not suggest using this.
I tried to post it here, but had a formatting error, so I decided to post a link.
Anyway, on platforms where both pointers and long long are 8 bytes, a long long is large enough to hold an address; std::uintptr_t from <cstdint> is the portable integer type for this.
https://www.flatech.com.au/learning-material/programming/c/object-pointers-to-any-type