I need a function that takes a string as input and generates a hash code from it. C++ has std::hash for this, but it returns a hash code of type size_t (unsigned long long on my platform). I need a hash function that gives me a hash code of type signed long long.
I have also tried using the modulus operator, but that gives me negative values, and those are not reliable. Please advise which hash function I can use in C++ to get a hash code of type signed long long.
I need a hash function which can give me the hash code of type signed long long.
You could just set the most significant bit to 0. Unless your architecture has an unusual internal representation of integral types, this produces a non-negative number when converted to a signed type of the same size.
template <class S>
constexpr size_t clamp_to_positive(size_t value)
{
return value & (std::numeric_limits<size_t>::max() >>
(std::numeric_limits<size_t>::digits - std::numeric_limits<S>::digits)
);
}
You can then call it as
auto my_hash = clamp_to_positive<long long>(std::hash<std::string>{}(source_string));
As noted by Ben Voigt, though, the easiest way is to just right-shift the unsigned value by one.
auto my_hash = static_cast<long long>(std::hash<std::string>{}(source_string) >> 1);
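Putting the shift approach together, a minimal sketch might look like the following. The name non_negative_hash is made up for illustration, and the sketch assumes that long long has at least as many value bits as the shifted size_t, which holds on mainstream platforms but is not guaranteed by the standard.

```cpp
#include <cstddef>
#include <functional>
#include <limits>
#include <string>

// Hypothetical helper: hashes a string and returns a non-negative long long.
// Assumption: long long can hold every value of (size_t >> 1).
inline long long non_negative_hash(const std::string& s)
{
    static_assert(std::numeric_limits<long long>::digits >=
                      std::numeric_limits<std::size_t>::digits - 1,
                  "long long is too narrow to hold the shifted hash");
    // Shifting right by one discards the lowest bit and clears the top bit,
    // so the value fits in the non-negative range of the signed type.
    return static_cast<long long>(std::hash<std::string>{}(s) >> 1);
}
```

Calling non_negative_hash(source_string) then yields a value that is never negative, at the cost of losing one bit of the original hash.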
Another way to tackle this problem is to force the modulo operation to always return a positive value.
// Evaluates abs(x) as an unsigned type, avoiding corner case overflow.
template <class T>
constexpr auto unsigned_abs(T x)
{
static_assert(std::is_integral_v<T>);
if constexpr ( std::is_unsigned_v<T> ) {
return x;
}
return x < 0
? ~static_cast<std::make_unsigned_t<T>>(x) + 1
: static_cast<std::make_unsigned_t<T>>(x);
}
// Evaluates abs(x) % abs(y) avoiding overflows. The result has the same type
// as y and always satisfies 0 <= result < abs(y). It has UB when y == 0.
template <class X, class Y>
auto absolute_remainder(X x, Y y)
{
static_assert( std::is_integral_v<X> && std::is_integral_v<Y> );
return static_cast<Y>(unsigned_abs(x) % unsigned_abs(y));
}
Whether you really need one of these, or rather a change in the design of your program, is left for you to figure out.
I believe that when you add two unsigned int values together, the returned value's data type will be an unsigned int.
But the addition of two unsigned int values may return a value that is larger than an unsigned int.
So why does unsigned int + unsigned int return an unsigned int and not some other larger data type?
This would have truly evil consequences:
Would you really want 1 + 1 to be a long type? And (1 + 1) + (1 + 1) would become a long long type? It would wreak havoc with the type system.
It's also possible, for example, that short, int, long, and long long are all the same size, and similarly for the unsigned versions.
So the implicit type conversion rules as they stand are probably the best solution.
You could always force the issue with something like
0UL + "unsigned int" + "unsigned int"
Let's imagine that we have a language where adding two integers results in a bigger type. So, adding two 32 bit numbers results in a 64 bit number. What would happen in the following expression?
auto x = a + b + c + d + e + f + g;
a + b is 64 bits. a + b + c is 128 bits. a + b + c + d is 256 bits... This becomes unmanageable very fast. Most processors don't support operations with so wide operands.
The type of a variable determines not only the range of values it can hold but also, loosely speaking, how operations on it are carried out. If you add two unsigned values you get an unsigned result. If you want a different type as the result (e.g. long unsigned), you can cast:
unsigned x = 42;
unsigned y = 42;
long unsigned z = static_cast<long unsigned>(x) + static_cast<long unsigned>(y);
Actually the real reason is: It is defined like that. In particular unsigned overflow is well defined in C++ to wrap around and using a wider type for the result of unsigned operators would break that behaviour.
As a contrived example, consider this loop:
for (unsigned i = i0; i != 0; ++i) {}
Note the condition! Let's assume i0 > 0; then the condition can only become false when incrementing the maximum value of unsigned results in 0. This code is obfuscated and should probably make you raise an eyebrow or two in a code review, though it is perfectly legal. Making the result type adjust depending on the value of the result, or choosing the result type such that overflow cannot happen, would break this behaviour.
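To make the wrap-around concrete, here is a small sketch that assumes C++17. The helper steps_until_wrap is a made-up name that simply counts how often the loop above runs; starting close to the maximum keeps the count small.

```cpp
#include <limits>
#include <type_traits>

// The sum of two unsigned ints is itself an unsigned int.
static_assert(std::is_same_v<decltype(1u + 1u), unsigned int>);

// Incrementing the maximum value wraps around to 0 (well defined for unsigned).
static_assert(std::numeric_limits<unsigned>::max() + 1u == 0u);

// Hypothetical helper: counts the iterations of the loop until i wraps to 0.
constexpr unsigned steps_until_wrap(unsigned i0)
{
    unsigned steps = 0;
    for (unsigned i = i0; i != 0; ++i) {
        ++steps;
    }
    return steps;
}
```

Starting two below the maximum, the loop visits max - 2, max - 1 and max before i wraps to 0, so steps_until_wrap returns 3 for that input.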
Because adding two variables of the same type yields a result of that same type
(there are exceptions, but not in your case).
Example:
int + int = int. An int plus another int cannot be a float, because the result does not have the properties of a float.
I hope this answers your question. Bye!
Here is the code:
test.cpp
unsigned short x;
bool y;
if ((x==1)&& y)
{
...
}
else
{
...
}
I got a lint message:
Note 912 Implicit binary conversion from int
to unsigned int [MISRA Rule 48]
Why? And how can I avoid this?
You are comparing x, which is an unsigned short, with 1, which is an int by default. Hence the implicit binary conversion in the message.
Give your compiler a hint that you actually want to compare x with another unsigned value:
if ((x==1U) && y)
Try this:
if ( ( static_cast<unsigned int>(1) == x ) && y)
because 1 is treated as an int. Either declare x as
unsigned int x
or use a cast.
It is not clear which version of MISRA you are using. You should be using MISRA-C++ when writing C++ code, everything else will be a violation of the MISRA rules. Compliance against MISRA-C++ can obviously not be checked with a MISRA-C checker.
Anyway, assuming you have a system with 32 bit integers, this should solve the problem regardless of the MISRA version:
if ( ( static_cast<uint32_t>(x) == 1u ) && y) // compliant
The important part to understand is how implicit promotions work and how to avoid them:
Casting the 1 literal to unsigned short will not solve anything. Such a cast is completely superfluous as the operand will get immediately integer promoted back to int anyway.
if ( ( x == static_cast<unsigned short>(1) ) && y) // not compliant
unsigned short ushort=1u; if ( ( x == ushort ) && y) // not compliant
Casting the 1 literal to unsigned int, or merely changing it to 1u (same thing), will make the program behave as expected, but it will not solve the MISRA warning, because you still have an implicit type promotion of the x operand, which is the MISRA violation.
if ( ( x == 1u ) && y) // not compliant
if ( ( static_cast<unsigned int>(1) == x ) && y) // not compliant
Study integer promotion and the usual arithmetic conversions.
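The promotions described above can be checked at compile time. The following is only a sketch, assuming C++17 and an int that is wider than unsigned short; common_arith_t and is_one_and are made-up names. Addition is used as a probe because == always yields bool and would hide the converted operand type.

```cpp
#include <type_traits>
#include <utility>

// Assumption: int is wider than unsigned short, so unsigned short promotes to int.
static_assert(sizeof(int) > sizeof(unsigned short), "sketch assumes a wider int");

// Hypothetical alias: the type both operands of a binary arithmetic operator
// are converted to (integer promotion, then usual arithmetic conversions).
template <class A, class B>
using common_arith_t = decltype(std::declval<A>() + std::declval<B>());

// x == 1: x is promoted to int.
static_assert(std::is_same_v<common_arith_t<unsigned short, int>, int>);
// x == static_cast<unsigned short>(1): both operands are promoted to int anyway.
static_assert(std::is_same_v<common_arith_t<unsigned short, unsigned short>, int>);
// x == 1u: x is promoted to int, then implicitly converted to unsigned int.
static_assert(std::is_same_v<common_arith_t<unsigned short, unsigned int>, unsigned int>);
// static_cast<unsigned int>(x) == 1u: no implicit conversion is left.
static_assert(std::is_same_v<common_arith_t<unsigned int, unsigned int>, unsigned int>);

// Hypothetical wrapper around the explicitly converted comparison.
inline bool is_one_and(unsigned short x, bool y)
{
    return (static_cast<unsigned int>(x) == 1u) && y;
}
```

Only the last form has both operands in the same unsigned type before the comparison, which is why the explicit cast of x is the variant without an implicit conversion.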
In C++, is it okay to compare an int to a char because of implicit type conversion? Or am I misunderstanding the concept?
For example, can I do
int x = 68;
char y;
std::cin >> y;
//Assuming that the user inputs 'Z';
if(x < y)
{
cout << "Your input is larger than x";
}
Or do we need to first convert it to an int?
so
if(x < static_cast<int>(y))
{
cout << "Your input is larger than x";
}
The problem with both versions is that you cannot be sure which value results for characters outside the basic range (those that are negative if char is a signed type and large if it is unsigned). This is implementation-defined, because the implementation decides whether char behaves like signed char or unsigned char.
The only way to fix this problem is to cast to the appropriate signed/unsigned char type first:
if(x < (signed char)y)
or
if(x < (unsigned char)y)
Omitting this cast will result in implementation defined behavior.
Personally, I generally prefer use of uint8_t and int8_t when using chars as numbers, precisely because of this issue.
This still assumes that the value of the (un)signed char is within the range of possible int values on your platform. This may not be the case if sizeof(char) == sizeof(int) == 1 (possible only if a char has at least 16 bits!), and you are comparing signed and unsigned values.
To avoid this problem, ensure that you use either
signed x = ...;
if(x < (signed char)y)
or
unsigned x = ...;
if(x < (unsigned char)y)
Your compiler will hopefully warn about a mixed signed/unsigned comparison if you fail to do so.
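A short sketch of why the choice of cast matters, assuming 8-bit chars, an ASCII-compatible character set and a two's complement platform. The helper names are made up for illustration.

```cpp
#include <climits>

static_assert(CHAR_BIT == 8, "sketch assumes 8-bit chars");

// Hypothetical helper: interprets y as a value in 0..255 before comparing.
constexpr bool less_than_as_unsigned(int x, char y)
{
    return x < static_cast<int>(static_cast<unsigned char>(y));
}

// Hypothetical helper: interprets y as a value in -128..127 before comparing.
constexpr bool less_than_as_signed(int x, char y)
{
    return x < static_cast<int>(static_cast<signed char>(y));
}
```

For a plain ASCII character such as 'Z' both helpers agree. For a byte like '\xE9' they do not: read as unsigned it is 233, so 68 < 233 holds, while read as signed it is -23 on the platforms assumed here, so 68 < -23 does not.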
Your code will compile and work, for some definition of work.
Still you might get unexpected results, because y is a char, which means its signedness is implementation defined. That combined with unknown size of int will lead to much joy.
Also, please write the char literals you want, don't look at the ASCII table yourself. Any reader (you in 5 minutes) will be thankful.
Last point: Avoid gratuitous casts; they don't make anything better and may hide problems your compiler would normally warn about.
Yes you can compare an int to some char, like you can compare an int to some short, but it might be considered bad style. I would code
if (x < (int)y)
or like you did
if (x < static_cast<int>(y))
which I find a bit too verbose for that case....
BTW, if you intend to use bytes as numbers rather than as characters, consider also the int8_t type (etc...) from <cstdint>.
Don't forget that on some systems char is signed by default and on others it is unsigned (you can be explicit by using unsigned char or signed char).
The code you suggest will compile, but I strongly recommend the static_cast version. Using static_cast helps the reader understand that you are comparing the character as an integer.
In case of integer overflows what is the result of (unsigned int) * (int) ? unsigned or int? What type does the array index operator (operator[]) take for char*: int, unsigned int or something else?
I was auditing the following function, and suddenly this question arose. The function has a vulnerability at line 17.
// Create a character array and initialize it with init[]
// repeatedly. The size of this character array is specified by
// w*h.
char *function4(unsigned int w, unsigned int h, char *init)
{
char *buf;
int i;
if (w*h > 4096)
return (NULL);
buf = (char *)malloc(4096+1);
if (!buf)
return (NULL);
for (i=0; i<h; i++)
memcpy(&buf[i*w], init, w); // line 17
buf[4096] = '\0';
return buf;
}
Suppose both w and h are very large unsigned integers. The multiplication at line 9 may then wrap around and pass the validation.
Now the problem is at line 17, where int i is multiplied by unsigned int w: if the result is int, it is possible that the product is negative, resulting in accessing a position that is before buf. If the result is unsigned int, the product will always be positive, resulting in accessing a position that is after buf.
It's hard to write code to verify this: int is too large. Does anyone have ideas on this?
Is there any documentation that specifies the type of the product? I have searched for it, but so far haven't found anything.
I suppose that as far as the vulnerability is concerned, whether (unsigned int) * (int) produces unsigned int or int doesn't matter, because in the compiled object file, they are just bytes. The following code works the same no matter the type of the product:
unsigned int x = 10;
int y = -10;
printf("%d\n", x * y); // print x * y in signed integer
printf("%u\n", x * y); // print x * y in unsigned integer
Therefore, it does not matter what type the multiplication returns. What matters is whether the consuming function takes int or unsigned.
The question here is not how bad the function is, or how to improve the function to make it better. The function undoubtedly has a vulnerability. The question is about the exact behavior of the function, based on the prescribed behavior from the standards.
Do the w*h calculation in long long and check whether the result is bigger than UINT_MAX.
EDIT: alternatively, if the multiplication overflowed, then (w*h)/h != w (is this always the case? It should be, right?)
To answer your question: the type of an expression multiplying an int and an unsigned int will be an unsigned int in C/C++.
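This can be confirmed with a compile-time check. The sketch below assumes C++17; the helper name product is made up, and the values 10 and -10 are taken from the question's printf example.

```cpp
#include <type_traits>
#include <utility>

// The type of (unsigned int) * (int) is unsigned int.
static_assert(std::is_same_v<
                  decltype(std::declval<unsigned int>() * std::declval<int>()),
                  unsigned int>);

// Hypothetical helper: i is converted to unsigned int before the multiplication,
// so a negative i becomes a large positive value and the product wraps around.
constexpr unsigned int product(unsigned int w, int i)
{
    return w * i;
}
```

With w == 10 and i == -10 the result is the unsigned value that -100 converts to, i.e. UINT_MAX - 99, not a negative number.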
To answer your implied question, one decent way to deal with possible overflow in integer arithmetic is to use the "IntSafe" set of routines from Microsoft:
http://blogs.msdn.com/michael_howard/archive/2006/02/02/523392.aspx
It's available in the SDK and contains inline implementations so you can study what they're doing if you're on another platform.
Ensure that w * h doesn't overflow by limiting w and h.
The type of w*i is unsigned in your case. If I read the standard correctly, the rule is that the operands are converted to the larger type (with its signedness), or unsigned type corresponding to the signed type (which is unsigned int in your case).
However, even if it's unsigned, it doesn't prevent the wraparound (writing to memory before buf), because it might be the case (on i386 platform, it is), that p[-1] is the same as p[-1u]. Anyway, in your case, both buf[-1] and buf[big unsigned number] would be undefined behavior, so the signed/unsigned question is not that important.
Note that signed/unsigned matters in other contexts - eg. (int)(x*y/2) gives different results depending on the types of x and y, even in the absence of undefined behaviour.
I would solve your problem by checking for overflow on line 9; since 4096 is a pretty small constant and 4096*4096 doesn't overflow on most architectures (you need to check), I'd do
if (w>4096 || h>4096 || w*h > 4096)
return (NULL);
This leaves out the case when w or h are 0, you might want to check for it if needed.
In general, you could check for overflow like this:
if(w*h > 4096 || (w*h)/w!=h || (w*h)%w!=0)
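A variant of this check that tests before multiplying can be sketched as follows. The helper names are made up, and the 4096 limit is taken from the function in the question. Dividing the maximum by w instead of dividing the product also avoids a division by zero when w is 0.

```cpp
#include <limits>

// Hypothetical helper: true if w * h would wrap around in unsigned int.
constexpr bool mul_would_wrap(unsigned int w, unsigned int h)
{
    return w != 0 && h > std::numeric_limits<unsigned int>::max() / w;
}

// Hypothetical replacement for the size validation in the audited function:
// the product is only evaluated once it is known not to wrap.
constexpr bool size_ok(unsigned int w, unsigned int h)
{
    return !mul_would_wrap(w, h) && w * h <= 4096;
}
```

For example, size_ok(64, 64) accepts the request, while a pair such as 0x10000 and 0x10000, whose product wraps to 0 with a 32-bit unsigned int, is rejected.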
In C/C++ the p[n] notation is really a shortcut for writing *(p+n), and this pointer arithmetic takes the sign into account. So p[-1] is valid and refers to the value immediately before *p.
So the sign really matters here. The result of an arithmetic operator with integer operands follows a set of rules defined by the standard, known as the integer promotions and the usual arithmetic conversions.
Check out this page: INT02-C. Understand integer conversion rules
2 changes make it safer:
if (w >= 4096 || h >= 4096 || w*h > 4096) return NULL;
...
unsigned i;
Note also that it is no less of a bad idea to write to or read from memory past the end of the buffer. So the question is not whether i*w may become negative, but whether 0 <= i*h + w <= 4096 holds.
So it's not the type that matters, but the result of h*i.
For example, it doesn't make a difference whether this is (unsigned)0x80000000 or (int)0x80000000, the program will seg-fault anyway.
For C, refer to "Usual arithmetic conversions" (C99: Section 6.3.1.8, ANSI C K&R A6.5) for details on how the operands of the mathematical operators are treated.
In your example the following rules apply:
C99:
Otherwise, if the type of the operand
with signed integer type can represent
all of the values of the type of the
operand with unsigned integer type,
then the operand with unsigned integer
type is converted to the type of the
operand with signed integer type.
Otherwise, both operands are converted
to the unsigned integer type
corresponding to the type of the
operand with signed integer type.
ANSI C:
Otherwise, if either operand is unsigned int, the other is converted to unsigned int.
Why not just declare i as unsigned int? Then the problem goes away.
In any case, i*w is guaranteed to be <= 4096, as the code tests for this, so it's never going to overflow.
memcpy(&buf[i*w > -1 ? i*w < 4097? i*w : 0 : 0], init, w);
(I don't think computing i*w three times degrades the performance.)
w*h could overflow if w and/or h are sufficiently large and the following validation could pass.
9. if (w*h > 4096)
10. return (NULL);
In mixed int / unsigned int operations, the int is converted to unsigned int, in which case a negative value of 'i' would become a large positive value. In that case
&buf[i*w]
would access an out-of-bounds location.
Unsigned arithmetic is done as modular (or wrap-around), so the product of two large unsigned ints can easily be less than 4096. The multiplication of int and unsigned int will result in an unsigned int (see section 4.5 of the C++ standard).
Therefore, given large w and a suitable value of h, you can indeed get into trouble.
Making sure integer arithmetic doesn't overflow is difficult. One easy way is to convert to floating-point, do a floating-point multiplication, and see if the result is at all reasonable. As qwerty suggested, long long would be usable, if available on your implementation. (It's a common extension in C90 and C++, does exist in C99, and will be in C++0x.)
There are 3 paragraphs in the current C1X draft on calculating (UNSIGNED TYPE1) X (SIGNED TYPE2) in 6.3.1.8 Usual arithmetic conversions, N1494,
WG 14: C - Project status and milestones
Otherwise, if the operand that has unsigned integer type has rank greater or
equal to the rank of the type of the other operand, then the operand with
signed integer type is converted to the type of the operand with unsigned
integer type.
Otherwise, if the type of the operand with signed integer type can represent
all of the values of the type of the operand with unsigned integer type, then
the operand with unsigned integer type is converted to the type of the
operand with signed integer type.
Otherwise, both operands are converted to the unsigned integer type
corresponding to the type of the operand with signed integer type.
So if a is unsigned int and b is int, parsing of (a * b) should generate code (a * (unsigned int)b). Will overflow if b < 0 or a * b > UINT_MAX.
If a is unsigned int and b is long of greater size, (a * b) should generate ((long)a * (long)b). Will overflow if a * b > LONG_MAX or a * b < LONG_MIN.
If a is unsigned int and b is long of the same size, (a * b) should generate ((unsigned long)a * (unsigned long)b). Will overflow if b < 0 or a * b > ULONG_MAX.
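These three cases can be checked in C++ (whose rules match the quoted C text here) with a small compile-time sketch. It assumes C++17; product_t, long_holds_all_unsigned and expected_long_case_t are made-up names.

```cpp
#include <limits>
#include <type_traits>
#include <utility>

// Hypothetical alias: the type of a * b for operand types A and B.
template <class A, class B>
using product_t = decltype(std::declval<A>() * std::declval<B>());

// First paragraph: same rank, so the signed operand is converted to unsigned int.
static_assert(std::is_same_v<product_t<unsigned int, int>, unsigned int>);

// Does long represent every unsigned int value on this platform?
constexpr bool long_holds_all_unsigned =
    std::numeric_limits<long>::digits >= std::numeric_limits<unsigned int>::digits;

// Second paragraph (long is wider): the result is long.
// Third paragraph (long has the same size): the result is unsigned long.
using expected_long_case_t =
    std::conditional_t<long_holds_all_unsigned, long, unsigned long>;
static_assert(std::is_same_v<product_t<unsigned int, long>, expected_long_case_t>);
```

Which branch of the conditional is taken depends on the platform: with a 64-bit long the product is a long, with a 32-bit long it is an unsigned long.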
On your second question about the type expected by the "indexer": the answer appears to be "integer type", which allows any (signed) integer index.
6.5.2.1 Array subscripting
Constraints
1 One of the expressions shall have type ‘‘pointer to complete object type’’, the other
expression shall have integer type, and the result has type ‘‘type’’.
Semantics
2 A postfix expression followed by an expression in square brackets [] is a subscripted
designation of an element of an array object. The definition of the subscript operator []
is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that
apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the
initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th
element of E1 (counting from zero).
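The equivalence of E1[E2] and (*((E1)+(E2))) also covers negative indices, as long as the resulting element lies inside the array. A minimal sketch, assuming C++17, with a made-up helper name:

```cpp
#include <cstddef>

// Hypothetical helper: reads an element relative to a pointer into the
// middle of a local array. Valid offsets for this array are -2..2.
constexpr int element_at(std::ptrdiff_t offset)
{
    int data[5] = {10, 20, 30, 40, 50};
    const int* p = &data[2];   // points at 30
    return p[offset];          // identical to *(p + offset)
}

static_assert(element_at(-1) == 20);  // p[-1] is the element before *p
static_assert(element_at(0) == 30);
static_assert(element_at(2) == 50);
```

Any offset outside -2..2 would leave the array and is undefined behaviour, which is exactly what the audited function risks.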
It is up to the compiler to perform static analysis and warn the developer about the possibility of a buffer overrun when the pointer expression is an array variable and the index may be negative. The same goes for warning about possible array size overruns even when the index is positive or unsigned.
To actually answer your question, without specifying the hardware you're running on, you don't know, and in code intended to be portable, you shouldn't depend on any particular behavior.