This might seem like a weird question, but how would I create a C++ function that tells whether a given C++ function that takes as a parameter a variable of type X and returns a variable of type X, is injective in the space of machine representation of those variables, i.e. never returns the same variable for two different variables passed to it?
(For those of you who weren't Math majors, maybe check out this page if you're still confused about the definition of injective: http://en.wikipedia.org/wiki/Injective_function)
For instance, the function
double square(double x) { return x*x};
is not injective since square(2.0) = square(-2.0),
but the function
double cube(double x) { return x*x*x};
is, obviously.
The goal is to create a function
template <typename T>
bool is_injective(T(*foo)(T))
{
/* Create a set std::set<T> retVals;
For each element x of type T:
if x is in retVals, return false;
if x is not in retVals, add it to retVals;
Return true if we made it through the above loop.
*/
}
I think I can implement that procedure except that I'm not sure how to iterate through every element of type T. How do I accomplish that?
Also, what problems might arise in trying to create such a function?
You need to test every possible bit pattern of length sizeof(T).
There was a widely circulated blog post about this topic recently: There are Only Four Billion Floats - So Test Them All!
In that post, the author was able to test all 32-bit floats in 90 seconds. Turns out that would take a few centuries for 64-bit values.
So this is only possible with small input types.
Multiple inputs, structs, or anything with pointers are going to get impossible fast.
BTW, even with 32-bit values you will probably exhaust system memory trying to store all the output values in a std::set, because std::set uses a lot of extra memory for pointers. Instead, you should use a bitmap that's big enough to hold all 2^sizeof(T) output values. The specialized std::vector<bool> should work. That will take 2^sizeof(T) / 8 bytes of memory.
Maybe what you need is std::numeric_limits. To store the results, you may use an unordered_map (from std if you're using C++11, or from boost if you're not).
You can check the limits of the data types, maybe something like this might work (it's a dumb solution, but it may get you started):
template <typename T>
bool is_injective(T(*foo)(T))
{
std::unordered_map<T, T> hash_table;
T min = std::numeric_limits<T>::min();
T max = std::numeric_limits<T>::max();
for(T it = min; i < max; ++i)
{
auto result = hash_table.emplace(it, foo(it));
if(result.second == false)
{
return false;
}
}
return true;
}
Of course, you may want to restrict a few of the possible data types. Otherwise, if you check for floats, doubles or long integers, it'll get very intensive.
but the function
double cube(double x) { return x*x*x};
is, obviously.
It is obviously not. There are 2^53 more double values representable in [0..0.5) than in [0..0.125).
As far as I know, you cannot iterate all possible values of a type in C++.
But, even if you could, that approach would get you nowhere. If your type is a 64 bit integer, you might have to iterate through 2^64 values and keep track of the result for all of them, which is not possible.
Like other people said, there is no solution for a generic type X.
Related
I read a lot about unordered_map not being very fast but I wonder what's the best alternative to do this:
I need to buffer calculation results for a function of an integer argument. I don't know ahead of time what range or interval will be requested. Storing in a vector with maximal resolution would cost way too much memory.
So I'm using
unordered_map<unsigned long, pair<T, long>>
Where the key is the argument of the function to be computed, the first of the pair the result of the computation of type T, and the second of the pair a version information for that computation.
Only if the unordered_map does not contain the element or it contains it but the version is outdated, the computation is carried out and then added to the unordered_map. The lookup function looks something like this:
template<typename T> class BufferClass{
long MyVersion;
unordered_map<unsigned long, pair<T,long>> Buffer;
public:
BufferClass(): MyVersion{1} {};
T* GetIfValid(unsigned long index)
{
if (!Buffer.count(index)) return nullptr;
pair <T,long> &x{Buffer.at(index)};
if (x.second!=MyVersion) return nullptr;
return &x.first;
}
/* ...Functions to set elements...*/
}
As you can see, I combined element validity check and retrieval in one function, so that I only need one lookup for both.
The profiler shows most of the computation time is used up in the hash function __constrain_hash related to unordered_map.
What would be the fastest way to store and retrieve values like that? The list of stored indices is expected to be non-continuous (there will be a lot of "holes") and first and last index are also mostly unknown.
T will generally be a "small" data type (like double or complex).
Thanks!
Martin
In your code, there could be two hash lookup in one query, one invoked in count() and the other invoked in at(). It is redundant, use unordered_map::find instead, see here.
Sample code:
const auto iter = Buffer.find(index);
if(iter != Buffer.end()) //Found something, so the return value is not end()
{
return &(iter->first);
}
else return nullptr;
In my opinion, unordered_map is slow but not that slow, for 99.9% usage is fast enough. You may want to check whether you call this function (unnecessarily) too many times. Using other fast implementation is not free, it could bloat your code base, harm your application's compatibility with different host systems or so on. If you think std::unordered_map is unreasonably slow, it is almost always because you got somewhere wrong in your work. (either your estimation or your code implementation)
BTW, another thing to mention: You said T is a small data type right? then return its value instead of pointer to it, it is faster and safer.
One thing that strikes me as odd about your implementation is the following two lines:
if (!Buffer.count(index)) return nullptr;
pair <T,long> &x{Buffer.at(index)};
This code is checking if the key exists, then throws away the result and searches for the same key again with bounds checking to boot. I think you'll find searching once with std::unordered_map<unsigned long, std::pair<T, long>>::find and reusing the result to be preferable:
auto it = Buffer.find(index);
if (it == Buffer.end()) return nullptr;
auto& x = *it;
I'm working on an algorithm and I need to initialize the vector of ints:
std::vector<int> subs(10)
of fixed length with values:
{-inf, +inf, +inf …. }
This is where I read that it is possible to use MAX_INT, but it's not quiete correct because the elements of my vector are supposed to be greater than any possible int value.
I liked overrloading comparison operator method from this answer, but how do you initialize the vector with infinitytype class objects if there are supposed to be an int?
Or maybe you know any better solution?
Thank you.
The solution depends on the assumptions your algorithm (or the implementation of your algorithm) has:
You could increase the element size beyond int (e.g. if your sizeof(int) is 4, use int64_t), and initialize to (int64_t) 1 + std::numeric_limits<int>:max() (and similarly for the negative values). But perhaps your algorithm assumes that you can't "exceed infinity" by adding on multiplying by positive numbers?
You could use an std::variant like other answers suggest, selecting between an int and infinity; but perhaps your algorithm assumes your elements behave like numbers?
You could use a ratio-based "number" class, ensuring it will not get non-integral values except infinity.
You could have your algorithm special-case the maximum and minimum integers
You could use floats or doubles which support -/+ infinity, and restrict them to integrality.
etc.
So, again, it really just depends and there's no one-size-fits-all solution.
AS already said in the comments, you can't have an infinity value stored in int: all values of this type are well-defined and finite.
If you are ok with a vector of something working as an infinite for ints, then consider using a type like this:
struct infinite
{ };
bool operator < (int, infinite)
{
return true;
}
You can use a variant (for example, boost::variant) which supports double dispatching, which stores either an int or an infinitytype (which should store the sign of the infinity, for example in a bool), then implement the comparison operators through a visitor.
But I think it would be simpler if you simply used a double instead of int, and whenever you take out a value that is not infinity, convert it to int. If performance is not that great of an issue, then it will work fine (probably still faster than a variant). If you need great performance, then just use MAX_INT and be done with it.
You are already aware of the idea of an "infinite" type, but that implementation could only contain infinite values. There's another related idea:
struct extended_int {
enum {NEGINF, FINITE, POSINF} type;
int finiteValue; // Only meaningful when type==FINITE
bool operator<(extended_int rhs) {
if (this->type==POSINF) return false;
if (rhs.type==NEGINF) return false;
if (this->type==FINITE && rhs.type==POSINF) return false;
if (this->type==NEGINF && rhs.type==FINITE) return false;
assert(this->type==FINITE && rhs.type==FINITE);
return this->finiteValue < rhs.finiteValue)
}
// Implicitly converting ctor
constexpr extended_int(int value) : type(FINITE), finiteValue(value) { }
// And the two infinities
static constexpr extended_int posinf;
static constexpr extended_int neginf;
}
You now have extended_int(5) < extended_int(6) but also extended_int(5) < extended_int::posinf
I wanted to know if there was a way to find an int in a double type map container. For instance in the following example
std::map<double,double> mt;
mt[2.33] =3.45;
if(mt.find(2)!=mt.end()) //How to do a search for an int instead of a map
{
//Found
}
I wanted to know if there was a way to tell the map to search for an int instead of a double. Since the map would search for a double by default.
One way you can do this is to use lower_bound/upper_bound member functions to get a range of values around your integer, and then check this range manually.
Other way is to use a map with custom comparator that compares keys as integers (see std::map referernce), so you preserve initial key values and can search for integers. But you can't search for doubles then.
Anyways, the task is a bit strange, you probably should reconsider your data structures choice for your problem.
The following should work:
it = mt.lower_bound(2);
However, you need to check the item afterwards;
it->first<3;
must yield true for correct result.
if you are interested only in the integral part (or anything else, as you can use a lambda for that), you might use
auto result = find_if(begin(mt), end(mt),
[&](pair<double, double> p){return (int)(p.first) == 2)}
)
if (result != mt.end())
{
// do your stuff
}
The use case for such a kind of approach still remains unclear...
In this question regarding magic numbers in arrays, paxdiablo says that any number other than -1, 0 and 1 are magic numbers. I know that this is only a guideline but I was wondering if when specifying dimensions of a multi dimensional array if I needed to define constants for which dimension was x, y and z. For example, is this okay:
typedef boost::multi_array<TileID, 3> TileArray3D;
TileArray3D::size_type GoRPG::MapData::GetWidth() {
return mData.shape()[0];
}
or is this preferred:
TileArray3D::size_type GoRPG::MapData::GetWidth() {
return mData.shape()[x_dimension];
}
Thanks in advance, ell.
Edit: The reason the lengths are stored in an array is because I am using Boost::multi_array and that is how they are stored. Apologies for the broken code! I can fix that later - at the moment I just would like to know about the magic numbers, although it is still useful, thank you!
No, magic numbers are non-obvious ones. -1, 0, and 1 are generally obvious, but not always.
In your case, neither of your snippets is obvious to me. Why is the width of your map data stored in an array?
Without seeing more code, I would prefer this:
auto GoRPG::MapData::GetWidth() {
return mData.shape()[width_index];
}
I think you should have a constant defined for this case and not use a number. 0 is not always "magic", in instances where accessing the first element is the intuitive thing to do. Example: checking the first byte of a string, there's no reason to define a c_first_byte constant.)
But in your code, it would be even better to add a method to shape or mData named x() that returns the value that you want. There's no need to expose the internal representation of mData or shape to its users.
The code does not compile though. Since, you haven't used decltype to deduce the return type. Do all the shapes have the same width ?
If yes, then should be OK. If not, then better approach could be to provide a default argument.
auto MapData::GetWidth(const size_t index = 0) -> decltype( mData.shape()[0]) {
return mData.shape()[index];
}
I'm trying to get my head around tuples (thanks #litb), and the common suggestion for their use is for functions returning > 1 value.
This is something that I'd normally use a struct for , and I can't understand the advantages to tuples in this case - it seems an error-prone approach for the terminally lazy.
Borrowing an example, I'd use this
struct divide_result {
int quotient;
int remainder;
};
Using a tuple, you'd have
typedef boost::tuple<int, int> divide_result;
But without reading the code of the function you're calling (or the comments, if you're dumb enough to trust them) you have no idea which int is quotient and vice-versa. It seems rather like...
struct divide_result {
int results[2]; // 0 is quotient, 1 is remainder, I think
};
...which wouldn't fill me with confidence.
So, what are the advantages of tuples over structs that compensate for the ambiguity?
tuples
I think i agree with you that the issue with what position corresponds to what variable can introduce confusion. But i think there are two sides. One is the call-side and the other is the callee-side:
int remainder;
int quotient;
tie(quotient, remainder) = div(10, 3);
I think it's crystal clear what we got, but it can become confusing if you have to return more values at once. Once the caller's programmer has looked up the documentation of div, he will know what position is what, and can write effective code. As a rule of thumb, i would say not to return more than 4 values at once. For anything beyond, prefer a struct.
output parameters
Output parameters can be used too, of course:
int remainder;
int quotient;
div(10, 3, "ient, &remainder);
Now i think that illustrates how tuples are better than output parameters. We have mixed the input of div with its output, while not gaining any advantage. Worse, we leave the reader of that code in doubt on what could be the actual return value of div be. There are wonderful examples when output parameters are useful. In my opinion, you should use them only when you've got no other way, because the return value is already taken and can't be changed to either a tuple or struct. operator>> is a good example on where you use output parameters, because the return value is already reserved for the stream, so you can chain operator>> calls. If you've not to do with operators, and the context is not crystal clear, i recommend you to use pointers, to signal at the call side that the object is actually used as an output parameter, in addition to comments where appropriate.
returning a struct
The third option is to use a struct:
div_result d = div(10, 3);
I think that definitely wins the award for clearness. But note you have still to access the result within that struct, and the result is not "laid bare" on the table, as it was the case for the output parameters and the tuple used with tie.
I think a major point these days is to make everything as generic as possible. So, say you have got a function that can print out tuples. You can just do
cout << div(10, 3);
And have your result displayed. I think that tuples, on the other side, clearly win for their versatile nature. Doing that with div_result, you need to overload operator<<, or need to output each member separately.
Another option is to use a Boost Fusion map (code untested):
struct quotient;
struct remainder;
using boost::fusion::map;
using boost::fusion::pair;
typedef map<
pair< quotient, int >,
pair< remainder, int >
> div_result;
You can access the results relatively intuitively:
using boost::fusion::at_key;
res = div(x, y);
int q = at_key<quotient>(res);
int r = at_key<remainder>(res);
There are other advantages too, such as the ability to iterate over the fields of the map, etc etc. See the doco for more information.
With tuples, you can use tie, which is sometimes quite useful: std::tr1::tie (quotient, remainder) = do_division ();. This is not so easy with structs. Second, when using template code, it's sometimes easier to rely on pairs than to add yet another typedef for the struct type.
And if the types are different, then a pair/tuple is really no worse than a struct. Think for example pair<int, bool> readFromFile(), where the int is the number of bytes read and bool is whether the eof has been hit. Adding a struct in this case seems like overkill for me, especially as there is no ambiguity here.
Tuples are very useful in languages such as ML or Haskell.
In C++, their syntax makes them less elegant, but can be useful in the following situations:
you have a function that must return more than one argument, but the result is "local" to the caller and the callee; you don't want to define a structure just for this
you can use the tie function to do a very limited form of pattern matching "a la ML", which is more elegant than using a structure for the same purpose.
they come with predefined < operators, which can be a time saver.
I tend to use tuples in conjunction with typedefs to at least partially alleviate the 'nameless tuple' problem. For instance if I had a grid structure then:
//row is element 0 column is element 1
typedef boost::tuple<int,int> grid_index;
Then I use the named type as :
grid_index find(const grid& g, int value);
This is a somewhat contrived example but I think most of the time it hits a happy medium between readability, explicitness, and ease of use.
Or in your example:
//quotient is element 0 remainder is element 1
typedef boost:tuple<int,int> div_result;
div_result div(int dividend,int divisor);
One feature of tuples that you don't have with structs is in their initialization. Consider something like the following:
struct A
{
int a;
int b;
};
Unless you write a make_tuple equivalent or constructor then to use this structure as an input parameter you first have to create a temporary object:
void foo (A const & a)
{
// ...
}
void bar ()
{
A dummy = { 1, 2 };
foo (dummy);
}
Not too bad, however, take the case where maintenance adds a new member to our struct for whatever reason:
struct A
{
int a;
int b;
int c;
};
The rules of aggregate initialization actually mean that our code will continue to compile without change. We therefore have to search for all usages of this struct and updating them, without any help from the compiler.
Contrast this with a tuple:
typedef boost::tuple<int, int, int> Tuple;
enum {
A
, B
, C
};
void foo (Tuple const & p) {
}
void bar ()
{
foo (boost::make_tuple (1, 2)); // Compile error
}
The compiler cannot initailize "Tuple" with the result of make_tuple, and so generates the error that allows you to specify the correct values for the third parameter.
Finally, the other advantage of tuples is that they allow you to write code which iterates over each value. This is simply not possible using a struct.
void incrementValues (boost::tuples::null_type) {}
template <typename Tuple_>
void incrementValues (Tuple_ & tuple) {
// ...
++tuple.get_head ();
incrementValues (tuple.get_tail ());
}
Prevents your code being littered with many struct definitions. It's easier for the person writing the code, and for other using it when you just document what each element in the tuple is, rather than writing your own struct/making people look up the struct definition.
Tuples will be easier to write - no need to create a new struct for every function that returns something. Documentation about what goes where will go to the function documentation, which will be needed anyway. To use the function one will need to read the function documentation in any case and the tuple will be explained there.
I agree with you 100% Roddy.
To return multiple values from a method, you have several options other than tuples, which one is best depends on your case:
Creating a new struct. This is good when the multiple values you're returning are related, and it's appropriate to create a new abstraction. For example, I think "divide_result" is a good general abstraction, and passing this entity around makes your code much clearer than just passing a nameless tuple around. You could then create methods that operate on the this new type, convert it to other numeric types, etc.
Using "Out" parameters. Pass several parameters by reference, and return multiple values by assigning to the each out parameter. This is appropriate when your method returns several unrelated pieces of information. Creating a new struct in this case would be overkill, and with Out parameters you emphasize this point, plus each item gets the name it deserves.
Tuples are Evil.