The documentation for std::holds_alternative and std::get_if does not state the complexity requirements of these operations. Are the two functions always O(1), or are they linear in the number of types in the std::variant? Put another way: is there any performance difference between these operations on a variant with many alternative types versus very few?
The most reasonable implementation of a variant is an index integer denoting the currently held type (also accessible through the index() method) plus suitably aligned storage for the held value.
Since the type argument of holds_alternative and get_if is a template parameter, the type-to-index computation can be done at compile time. The runtime check is therefore performed in constant time: just a simple integer comparison.
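A rough sketch of that idea, assuming a typical implementation (index_of and holds_alternative_sketch are illustrative names, not actual library source):

#include <cstddef>
#include <string>
#include <type_traits>
#include <variant>

// Compile-time computation of the position of T among Types...
template <class T, class... Types>
constexpr std::size_t index_of()
{
    bool matches[] = { std::is_same_v<T, Types>... };
    for (std::size_t i = 0; i < sizeof...(Types); ++i)
        if (matches[i])
            return i;
    return sizeof...(Types);  // T not among Types (the real trait rejects this case)
}

template <class T, class... Types>
constexpr bool holds_alternative_sketch(const std::variant<Types...>& v) noexcept
{
    // The type-to-index mapping is a compile-time constant, so the runtime
    // work is a single integer comparison against the stored discriminator.
    constexpr std::size_t target = index_of<T, Types...>();
    return v.index() == target;
}

int main()
{
    std::variant<int, double, std::string> v = 3.14;
    return holds_alternative_sketch<double>(v) ? 0 : 1;  // O(1) regardless of the number of alternatives
}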
Some variant methods describe time complexity. For example visit:
For n ≤ 1 [n = number of visited variants], the invocation of the callable object is implemented in constant time, i.e., for n = 1, it does not depend on the number of alternative types of Variants0. For n > 1, the invocation of the callable object has no complexity requirements.
This backs up my assessment.
I have a very simple problem I can't solve.
I'm using Numba and CUDA.
I have a list T=[1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0] and I want a tuple with the elements of the list, like this:
C=(1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0).
In Python I would code C=tuple(T), but I can't with Numba.
I tried these solutions but none of them work because you can't modify the type of a variable inside a loop with Numba.
Important: my list has a length that is a multiple of 3, and I use this fact in my algorithms.
Code
First algorithm (recursive). It works when given L=[(1.0,),(2.0,),(3.0,),(4.0,),(5.0,),...,(9.0,)]:
from numba import njit

@njit
def list2tuple(L):
    n = len(L)
    if n == 1:
        return L[0]
    else:
        C = []
        for i in range(0, n - 2, 3):
            c = L[i] + L[i+1] + L[i+2]
            C.append(c)
        return list2tuple(C)
Problem: it enters an infinite loop and I must stop the kernel. It works in plain Python.
Second algorithm. It works when given T=[1.0,2.0,3.0,...,9.0]:
from numba import njit

@njit
def list2tuple2(T):
    L = []
    for i in range(len(T)):
        a = (T[i],)
        L.append(a)
    for k in range(len(T)//3 - 1):
        n = len(L)
        C = []
        for j in range(0, n - 2, 3):
            c = L[j] + L[j+1] + L[j+2]
            C.append(c)
        L = C
    return L[0]
Problem: when C=[(1.0,2.0,3.0),(4.0,5.0,6.0),(7.0,8.0,9.0)], you can't assign L=C, because L=[(1.0,),(2.0,),(3.0,),...,(9.0,)] has type List(UniTuple(float64 x 1)), which cannot be unified with List(UniTuple(float64 x 3)).
I can't find a solution for this problem.
Tuples and lists are very different in Numba: a list is an unbounded sequence of items of the same type, while a tuple is a fixed-length sequence of items of possibly different types. As a result, the type of a list of 2 elements can be defined as List[ItemType], while a tuple of 2 items is Tuple[ItemType1, ItemType2] (where ItemType1 and ItemType2 may be the same). A list of 3 items still has the same type (List[ItemType]); a tuple of 3 elements, however, is a completely different type: Tuple[ItemType1, ItemType2, ItemType3]. Numba defines a UniTuple type as a convenient way to denote an N-ary tuple in which every item has the same type, but this is only a convenience. Internally, Numba (and more specifically its JIT back end, built on llvmlite/LLVM) needs to iterate over all the types and generate specific functions for each tuple type.
As a result, creating a recursive function that works on growing tuples is not possible: Numba cannot generate every possible tuple type ahead of time and compile one function per tuple type, because the maximum length of the tuple is only known at runtime.
More generally, you cannot build an N-ary tuple where N is a runtime variable inside a Numba function. You can, however, generate and compile a function for one specific N. If N is very small (e.g. < 15), this is fine. Otherwise, it is a really bad idea to write/generate such a function: for the JIT, it is equivalent to generating a function with N independent parameters, and when N is big the compilation time can quickly become huge and actually be much slower than the computation itself (many compiler algorithms have a high theoretical complexity; register allocation, for example, is NP-complete, although heuristics are relatively fast in most practical cases). Not to mention that the memory required to generate such a function can also be huge.
I have the following problem. I have some code that uses Eigen3. Eigen3 uses int or long int for indices. At some points in the code I have to store values from the Eigen arrays in a std::vector.
Here is some example:
std::vector<double> myStdVector;
Eigen::VectorXd myEigen;
....
for(size_t i=0; i<myStdVector.size(); i++)
{
    myStdVector[i] = myEigen(i);
}
Here I get the compiler warning:
warning: implicit conversion loses integer precision: 'const size_t'
(aka 'const unsigned long') to 'int'
So of course I could add a static_cast<int>(i) in all the functions where such a scenario occurs, but I wonder if there is a better way to deal with this. I guess it happens when mixing many other libraries, too.
In this specific case, I would suggest using the smaller container's index type; this would be Eigen's index type, as determined by your Eigen::VectorXd. Ideally, it would be used as Eigen::Index, for forwards-compatibility.
It might also be worth looking into how Eigen defines its index type. In particular, you are allowed to redefine it if necessary by #defining the symbol EIGEN_DEFAULT_DENSE_INDEX_TYPE; it defaults to std::ptrdiff_t.
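A minimal sketch of the loop written that way (assuming Eigen 3.3+, where Eigen::Index is available, and that both containers are meant to hold the same number of elements):

#include <cstddef>
#include <vector>
#include <Eigen/Dense>

int main()
{
    Eigen::VectorXd myEigen = Eigen::VectorXd::LinSpaced(5, 0.0, 4.0);
    std::vector<double> myStdVector(static_cast<std::size_t>(myEigen.size()));

    // Iterate with Eigen's own index type; the only conversion left is the
    // explicit one that std::vector's operator[] needs.
    for (Eigen::Index i = 0; i < myEigen.size(); ++i)
    {
        myStdVector[static_cast<std::size_t>(i)] = myEigen(i);
    }
}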
[Note, however, that in my own code, I generally prefer to use the larger index (in this case, size_t), but do range checks as if using the smaller of the index types if applicable (in this case, Eigen::Index). This is just a personal preference, however, and not necessarily what I consider to be the best option.]
Generally, when trying to choose the best index type, I would suggest that you look at their available ranges. First, if one or more of the potential types is signed, and one or more of those signed types allows negative values*, you'll want to eliminate any unsigned types, especially ones that are larger than the largest signed type. Then look at your use case, eliminate any types that aren't viable for your intended purpose, and choose the best fit out of the remaining candidates.
In your case specifically, you want to store values from an Eigen3 container in an STL container, where the Eigen3 container is indexed with ptrdiff_t and (as mentioned in your comment) to your knowledge only uses non-negative index values. In this case, either is a viable option; the range of non-negative index values provided by ptrdiff_t fits nicely inside size_t's range, and the loop condition will be determined by your VectorXd (and thus is also guaranteed to fit inside the Eigen3 container's index type). Thus, both potential types are viable choices. As the additional range provided by size_t is currently unnecessary, I would consider the index type provided by your Eigen setup to be slightly better suited to the task at hand.
*: While it's typically safe to assume that index values will always be positive due to how indexing works, I can see a few cases where allowing negatives would be beneficial. These are typically rare, though.
Note that I assumed the loop condition i<myStdVector.size() in your example code was a typo, due to not lining up with the initial description or the operation performed inside the loop body. If I was incorrect, then this decision becomes more complex.
I would like to use -1 to indicate a size that has not yet been computed:
std::vector<std::size_t> sizes(nResults, -1);
and I was wondering why isn't there a more expressive way:
std::vector<std::size_t> sizes(nResults, std::vector<std::size_t>::npos);
It basically comes down to a fairly simple fact: std::string includes searching capability, and that leads to a requirement for telling the caller that a search failed. std::string::npos fulfills that requirement.
std::vector doesn't have any searching capability of its own, so it has no need for telling a caller that a search has failed. Therefore, it has no need for an equivalent of std::string::npos.
The standard algorithms do include searching, so they do need a way to tell the caller that a search has failed. They work with iterators rather than directly with collections, so they use a special iterator (one that should never be dereferenced) for this purpose. As it happens, std::vector::end() returns an iterator suitable for the purpose, so that's what is used, but this is more or less incidental: the scheme works without (for example) any direct involvement by std::vector at all.
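For illustration, a minimal example of how a failed search is reported when std::vector is involved:

#include <algorithm>
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v{1, 2, 3};

    // The algorithm reports "not found" by returning the end iterator; that
    // iterator plays the role std::string::npos plays for std::string.
    auto it = std::find(v.begin(), v.end(), 42);
    if (it == v.end())
        std::cout << "not found\n";
}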
From cppreference:
std::size_t is the unsigned integer type of the result of the sizeof operator as well as the sizeof... operator and the alignof operator (since C++11)...
...std::size_t can store the maximum size of a theoretically possible object of any type...
size_t is unsigned and cannot represent -1. In reality, if you were to attempt to set your sizes to -1, you would actually be setting them to the maximum value representable by a size_t.
Therefore you should not use size_t to represent both the set of possible sizes and an additional value indicating that no size has been computed, because a value outside the set of possible sizes cannot be represented by a size_t.
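A minimal illustration of that wrap-around:

#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

int main()
{
    // -1 cannot be stored in a size_t; it wraps around to the largest
    // representable value, which is itself a (theoretically) valid size.
    std::vector<std::size_t> sizes(3, -1);
    std::cout << std::boolalpha << (sizes[0] == SIZE_MAX) << '\n';  // prints: true
}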
You should use a different type which is capable of expressing all of the possible values you wish to represent. Here is one possibility:
struct possibly_computed_size_type
{
    size_t size;
    bool is_computed;
};
Of course, you'll probably want a more expressive solution than this, but the point is that at least possibly_computed_size_type is capable of storing all of the possible values we wish to express.
One possibility is to use an optional type. An optional type can represent the range of values of a type, and an additional value meaning 'the object has no value'. The boost library provides such a type.
The standard library also provides an optional type as an experimental feature. Here is an example I created using this type:
http://ideone.com/4J0yfe
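For reference, here is a similar sketch using C++17's std::optional (the linked example predates C++17 and may differ in detail):

#include <cstddef>
#include <iostream>
#include <optional>
#include <vector>

int main()
{
    const std::size_t nResults = 3;

    // std::nullopt expresses "not yet computed" directly, with no sentinel value.
    std::vector<std::optional<std::size_t>> sizes(nResults, std::nullopt);

    sizes[1] = 42;  // the size of result 1 has now been computed

    for (const auto& s : sizes)
    {
        if (s)
            std::cout << *s << '\n';
        else
            std::cout << "not yet computed\n";
    }
}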
In the locale object there is a collate facet.
The collate facet has a hash method that returns a long.
http://www.cplusplus.com/reference/std/locale/collate/hash/
Two questions:
Does anybody know what hashing method is used?
I need a 32-bit value.
If my long is longer than 32 bits, does anybody know of techniques for folding the hash into a shorter value? I can see that, if done incorrectly, folding could generate lots of collisions (and though I can cope with collisions, since I need to take them into account anyway, I would prefer they were minimized).
Note:
I can't use C++0x features
Boost may be OK.
No, nobody really knows -- it can vary from one implementation to another. The primary requirements are (N3092, §20.8.15):
For all object types Key for which there exists a specialization hash<Key>, the instantiation hash<Key> shall:
satisfy the Hash requirements (20.2.4), with Key as the function call argument type, the DefaultConstructible requirements (33), the CopyAssignable requirements (37),
be swappable (20.2.2) for lvalues,
provide two nested types result_type and argument_type which shall be synonyms for size_t and Key, respectively,
satisfy the requirement that if k1 == k2 is true, h(k1) == h(k2) is also true, where h is an object of type hash and k1 and k2 are objects of type Key.
and (N3092, §20.2.4):
A type H meets the Hash requirements if:
it is a function object type (20.8),
it satisfies the requirements of CopyConstructible and Destructible (20.2.1),
the expressions shown in the following table are valid and have the indicated semantics, and
it satisfies all other requirements in this subclause.
§20.8.15 covers the requirements on the result of hashing, §20.2.4 on the hash itself. As you can see, however, both are pretty general. The table that's mentioned basically covers three more requirements:
A hash function must be "pure" (i.e., the result depends only on the input, not any context, history, etc.)
The function must not modify the argument that's passed to it, and
It must not throw any exceptions.
Exact algorithms definitely are not specified though -- and despite the length, most of the requirements above are really just stating requirements that (at least to me) seem pretty obvious. In short, the implementation is free to implement hashing nearly any way it wants to.
If the implementation uses a reasonable hash function, there should be no bits in the hash value that have any special correlation with the input. So if the hash function gives you 64 "random" bits, but you only want 32 of them, you can just take the first/last/... 32 bits of the value as you please. Which ones you take doesn't matter since every bit is as random as the next one (that's what makes a good hash function).
So the simplest and yet completely reasonable way to get a 32-bit hash value would be:
int32_t value = hash(...);
(Of course this collapses groups of 4 billion values down to one, which looks like a lot, but that can't be avoided if there are four billion times as many source values as target values.)
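As a concrete sketch of the folding the question asks about (collate_hash32 is an illustrative name; std::uint32_t and std::uint64_t come from <cstdint>, and Boost's <boost/cstdint.hpp> provides the same names if the C++11 header is off limits):

#include <cstdint>
#include <locale>
#include <string>

// Fold the collate facet's long hash down to 32 bits. With a decent hash,
// plain truncation is already fine; XOR-ing the high half into the low half
// is a common alternative that uses every input bit.
std::uint32_t collate_hash32(const std::string& s, const std::locale& loc)
{
    const std::collate<char>& coll = std::use_facet<std::collate<char> >(loc);
    long h = coll.hash(s.data(), s.data() + s.size());

    std::uint64_t wide = static_cast<std::uint64_t>(h);
    return static_cast<std::uint32_t>(wide ^ (wide >> 32));
}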
I know that individual map queries take at most log(N) time. However, I have seen a lot of examples that use strings as map keys. What is the performance cost of using a std::string as the key of a map instead of an int, for example?
std::map<std::string, aClass*> someMap; vs std::map<int, aClass*> someMap;
Thanks!
Analyzing an algorithm's asymptotic performance means working out which operations must be performed and what cost each of them adds to the equation. For that, you first need to know which operations are performed and then evaluate their costs.
Searching for a key in a balanced binary tree (which is what maps happen to be) requires O(log N) complex operations. Each of those operations involves comparing the key for a match and following the appropriate child pointer if the key did not match. This means the overall cost is proportional to log N times the cost of those two operations. Following a pointer is a constant-time operation, O(1), while the cost of comparing keys depends on the key type. For an integer key, comparisons are fast, O(1). Comparing two strings is another story: it takes time proportional to the length of the strings involved, O(L) (where I have intentionally used L as the string-length parameter instead of the more common N).
When you sum all the costs up, using integers as keys gives a total cost of O(log N) * (O(1) + O(1)), which is equivalent to O(log N) (the O(1) terms get absorbed into the constant that the O notation silently hides).
If you use strings as keys, the total cost is O(log N) * (O(L) + O(1)), where the constant-time operation is dominated by the more costly linear operation O(L); this simplifies to O(L * log N). That is, the cost of locating an element in a map keyed by strings is proportional to the logarithm of the number of elements stored in the map times the average length of the strings used as keys.
Note that the big-O notation is most appropriate to use as an analysis tool to determine how the algorithm will behave when the size of the problem grows, but it hides many facts underneath that are important for raw performance.
As the simplest example, if you change the key from a generic string to an array of 1000 characters you can hide that cost within the constant dropped out of the notation. Comparing arrays of 1000 chars is a constant operation that just happens to take quite a bit of time. With the asymptotic notation that would just be a O( log N ) operation, as with integers.
The same happens with many other hidden costs, such as the cost of creating the elements, which is usually treated as a constant-time operation simply because it does not depend on the parameters of your problem. The cost of locating a block of memory for each allocation does not depend on your data set but on memory fragmentation, which is outside the scope of the algorithm analysis; the cost of acquiring the lock inside malloc, which guarantees that no two threads are handed the same block of memory, depends on the contention on that lock, which in turn depends on the number of processors, processes and how many memory requests they perform, again outside the scope of the analysis. When reading costs in big-O notation you must be conscious of what it really means.
In addition to the time complexity from comparing strings already mentioned, a string key will also cause an additional memory allocation each time an item is added to the container. In certain cases, e.g. highly parallel systems, a global allocator mutex can be a source of performance problems.
In general, you should choose the alternative that makes the most sense in your situation, and only optimize based on actual performance testing. It's notoriously hard to judge what will be a bottleneck.
The cost difference will be linked to the difference in cost between comparing two ints versus comparing two strings.
When comparing two strings, you have to dereference a pointer to get to the first chars, and compare them. If they are identical, you have to compare the second chars, and so on. If your strings have a long common prefix, this can slow down the process a bit. It is very unlikely to be as fast as comparing ints, though.
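A sketch of what that character-by-character comparison boils down to (compare_chars is an illustrative stand-in for what strcmp or std::string::compare does):

// Walk both character sequences until a mismatch (or the terminator) is
// found; a long common prefix means more iterations before returning.
int compare_chars(const char* a, const char* b)
{
    while (*a != '\0' && *a == *b)
    {
        ++a;
        ++b;
    }
    return static_cast<unsigned char>(*a) - static_cast<unsigned char>(*b);
}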
The cost is of course that ints can be compared in true O(1) time, whereas strings are compared in O(n) time (n being the length of the maximal shared prefix). Also, storing strings consumes more space than storing integers.
Other than these apparent differences, there's no major performance cost.
First of all, I doubt that in a real application, whether you have string keys or int keys makes any noticeable difference. Profiling your application will tell you if it matters.
If it does matter, you could change your key to be something like this (untested):
class Key {
public:
    unsigned hash;
    std::string s;

    int cmp(const Key& other) const {
        if (hash != other.hash)                 // different hashes: order by hash value
            return hash < other.hash ? -1 : 1;
        return s.compare(other.s);              // same hash: fall back to a full string comparison
    }
};
Now you're doing an int comparison on the hashes of two strings. If the hashes are different, the strings are certainly different. If the hashes are the same, you still have to compare the strings because of the Pigeonhole Principle.
A simple example, just accessing values in two maps with an equal number of keys (one with int keys, the other with string representations of the same int values), takes 8 times longer with strings.
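A rough micro-benchmark sketch along those lines (names and the exact ratio are illustrative; results vary by machine and standard library):

#include <chrono>
#include <iostream>
#include <map>
#include <string>
#include <vector>

int main()
{
    const int N = 100000;
    std::map<int, int> byInt;
    std::map<std::string, int> byString;
    std::vector<std::string> keys;

    for (int i = 0; i < N; ++i)
    {
        keys.push_back(std::to_string(i));
        byInt[i] = i;
        byString[keys.back()] = i;
    }

    // Time a callable and return the elapsed seconds.
    auto time = [](auto&& f) {
        auto start = std::chrono::steady_clock::now();
        f();
        std::chrono::duration<double> d = std::chrono::steady_clock::now() - start;
        return d.count();
    };

    long long sum = 0;
    double tInt = time([&] { for (int i = 0; i < N; ++i) sum += byInt.find(i)->second; });
    double tStr = time([&] { for (int i = 0; i < N; ++i) sum += byString.find(keys[i])->second; });

    std::cout << "int keys:    " << tInt << " s\n"
              << "string keys: " << tStr << " s\n"
              << "(checksum: " << sum << ")\n";
}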