Related
I don't know any other way to return the size of a vector other than the .size() command, and it works very well, but, it return a variable of type long long unsigned int, and this in very cases are very good, but I'm sure my program will never have a vector so big that it need all that size of return, short int is more than enough.
I know, for today's computers those few enused bytes are irrelevant, but I don't like to leave these "loose ends" even if they are small, and whem I was programming, I came across some details that bothered me.
Look at these examples:
for(short int X = 0 ; X < Vector.size() ; X++){
}
compiling this, I receive this warning:
warning: comparison of integer expressions of different signedness: 'short int' and 'std::vector<unsigned char>::size_type' {aka 'long long unsigned int'} [-Wsign-compare]|
this is because the .size() return value type is different from the short int I'm compiling, "X" is a short int, and Vector.size() return a long long unsigned int, was expected, so if I do this:
for(size_t X = 0 ; X < Vector.size() ; X++){
}
the problem is gone, but by doing this, I'm creating a long long unsigned int in variable size_t and I'm returning another variable long long unsigned int, so, my computer allocale two variables long long unsigned int, so, what I do for returning a simple short int? I don't need anything more than this, long long unsigned int is overkill, so I did this:
for(short int X = 0 ; X < short(Vector.size()) ; X++){
}
but... how is this working? short int X = 0 is allocating a short int, nothing new, but what about short (Vector.size()), is the computer allocating a long unsigned int and converting it to a short int? or is the compiler "changing" the return of the .size() function by making it naturally return a short int and, in this case, not allocating a long long unsined int? because I know the compilers are responsible for optimizing the code too, is there any "problem" or "detail" when using this method? since I rarely see anyone using this, what exactly is this short() doing in memory allocation? where can i read more about it?
(thanks to everyone who responded)
Forget for a moment that this involves a for loop; that's important for the underlying code, but it's a distraction from what's going on with the conversion.
short X = Vector.size();
That line calls Vector.size(), which returns a value of type std::size_t. std::size_t is an unsigned type, large enough to hold the size of any object. So it could be unsigned long, or it could be unsigned long long. In any event, it's definitely not short. So the compiler has to convert that value to short, and that's what it does.
Most compilers these days don't trust you to understand what this actually does, so they warn you. (Yes, I'm rather opinionated about compilers that nag; that doesn't change the analysis here). So if you want to see that warning (i.e., you don't turn it off), you'll see it. If you want to write code that doesn't generate that warning, then you have to change the code to say "yes, I know, and I really mean it". You do that with a cast:
short X = short(Vector.size());
The cast tells the compiler to call Vector.size() and convert the resulting value to short. The code then assigns the result of that conversion to X. So, more briefly, in this case it tells the compiler that you want it to do exactly what it would have done without the cast. The difference is that because you wrote a cast, the compiler won't warn you that you might not know what you're doing.
Some folks prefer to write that cast is with a static_cast:
short X = static_cast<short>(Vector.size());
That does the same thing: it tells the compiler to do the conversion to short and, again, the compiler won't warn you that you did it.
In the original for loop, a different conversion occurs:
X < Vector.size()
That bit of code calls Vector.size(), which still returns an unsigned type. In order to compare that value with X, the two sides of the < have to have the same type, and the rules for this kind of expression require that X gets promoted to std::size_t, i.e., that the value of X gets treated as an unsigned type. That's okay as long as the value isn't negative. If it's negative, the conversion to the unsigned type is okay, but it will produce results that probably aren't what was intended. Since we know that X is not negative here, the code works perfectly well.
But we're still in the territory of compiler nags: since X is signed, the compiler warns you that promoting it to an unsigned type might do something that you don't expect. Again, you know that that won't happen, but the compiler doesn't trust you. So you have to insist that you know what you're doing, and again, you do that with a cast:
X < short(Vector.size())
Just like before, that cast converts the result of calling Vector.size() to short. Now both sides of the < are the same type, so the < operation doesn't require a conversion from a signed to an unsigned type, so the compiler has nothing to complain about. There is still a conversion, because the rules say that values of type short get promoted to int in this expression, but don't worry about that for now.
Another possibility is to use an unsigned type for that loop index:
for (unsigned short X = 0; X < Vector.size(); ++X)
But the compiler might still insist on warning you that not all values of type std::size_t can fit in an unsigned short. So, again, you might need a cast. Or change the type of the index to match what the compiler think you need:
for (std::size_t X = 0; X < Vector.size(); ++X_
If I were to go this route, I would use unsigned int and if the compiler insisted on telling me that I don't know what I'm doing I'd yell at the compiler (which usually isn't helpful) and then I'd turn off that warning. There's really no point in using short here, because the loop index will always be converted to int (or unsigned int) wherever it's used. It will probably be in a register, so there is no space actually saved by storing it as a short.
Even better, as recommended in other answers, is to use a range-base for loop, which avoids managing that index:
for (auto& value: Vector) ...
In all cases, X has a storage duration of automatic, and the result of Vector.size() does not outlive the full expression where it is created.
I don't need anything more than this, long long unsigned int is overkill
Typically, automatic duration variables are "allocated" either on the stack, or as registers. In either case, there is no performance benefit to decreasing the allocation size, and there can be a performance penalty in narrowing and then widening values.
In the very common case where you are using X solely to index into Vector, you should strongly consider using a different kind of for:
for (auto & value : Vector) {
// replace Vector[X] with value in your loop body
}
I want to write a function
int char_to_int(char c);
that converts given char to int by zero extending the value. So if the char has N bits and int has M bits, M >= N, then the M-N most significant bits of the int value should be zero and the N least significant bits of the int value should match the bits of the char value.
This seems like a simple task, but I'm not sure how to write it relying only on standard behavior. No UB, no implementation-defined behavior. Without relying on char being 8 bit, int being 32 bit, char being unsigned and any other common assumptions I make that are not guaranteed by standard.
The reason I want to know this, is that I have done this conversion several times in the past, but recently I became aware about the limited guarantees C++ gives about it's data types. So now I'm curious what is the correct, standard compliant approach.
I don't suppose
return (int) c;
is good enough, is it?
There's no hurt in being extra clear:
return int((unsigned char)c);
That way you tell the compiler exactly what you want: the int that contains the char value, read as unsigned. So char 255 will become int 255.
Given the following piece of (pseudo-C++) code:
float x=100, a=0.1;
unsigned int height = 63, width = 63;
unsigned int hw=31;
for (int row=0; row < height; ++row)
{
for (int col=0; col < width; ++col)
{
float foo = x + col - hw + a * (col - hw);
cout << foo << " ";
}
cout << endl;
}
The values of foo are screwed up for half of the array, in places where (col - hw) is negative. I figured because col is int and comes first, that this part of the expression is converted to int and becomes negative. Unfortunately, apparently it doesn't, I get an overflow of an unsigned value and I've no idea why.
How should I resolve this problem? Use casts for the whole or part of the expression? What type of casts (C-style or static_cast<...>)? Is there any overhead to using casts (I need this to work fast!)?
EDIT: I changed all my unsigned ints to regular ones, but I'm still wondering why I got that overflow in this situation.
Unsigned integers implement unsigned arithmetic. Unsigned arithmetic is modulo arithmetics. All values are adjusted modulo 2^N, where N is the number of bits in the value representation of unsigned type.
In simple words, unsigned arithmetic always produces non-negative values. Every time the expression should result in negative value, the value actually "wraps around" 2^N and becomes positive.
When you mix a signed and an unsigned integer in a [sub-]expression, the unsigned arithmetic "wins", i.e. the calculations are performed in unsigned domain. For example, when you do col - hw, it is interpreted as (unsigned) col - hw. This means that for col == 0 and hs == 31 you will not get -31 as the result. Instead you wil get UINT_MAX - 31 + 1, which is normally a huge positive value.
Having said that, I have to note that in my opinion it is always a good idea to use unsigned types to represent inherently non-negative values. In fact, in practice most (or at least half) of integer variables in C/C++ the program should have unsigned types. Your attempt to use unsigned types in your example is well justified (if understand the intent correctly). Moreover, I'd use unsigned for col and row as well. However, you have to keep in mind the way unsigned arithmetic works (as described above) and write your expressions accordingly. Most of the time, an expression can be rewritten so that it doesn't cross the bounds of unsigned range, i.e. most of the time there's no need to explicitly cast anything to signed type. Otherwise, if you do need to work with negative values eventually, a well-placed cast to signed type should solve the problem.
How about making height, width, and hw signed ints? What are you really gaining by making them unsigned? Mixing signed and unsigned integers is always asking for trouble. At first glance, at least, it doesn't look like you gain anything by using unsigned values here. So, you might as well make them all signed and save yourself the trouble.
You have the conversion rules backwards — when you mix signed and unsigned versions of the same type, the signed operand is converted to unsigned.
If you want this to be fast you should static_cast all unsigned values before you start looping and use their int versions rather than unsigned int. You can still require inputs being unsigned and then just cast them on the way to your algorithm to retain the required domain for your function.
Casts won't happen automatically- uncasted arithmetic still has it's uses. The usual example is int / int = int, even if data is lost by not converting to float. I'd use signed int unless it's impossible to do so because of INT_MAX being too small.
42 as unsigned int is well defined as "42U".
unsigned int foo = 42U; // yeah!
How can I write "23" so that it is clear it is an unsigned short int?
unsigned short bar = 23; // booh! not clear!
EDIT so that the meaning of the question is more clear:
template <class T>
void doSomething(T) {
std::cout << "unknown type" << std::endl;
}
template<>
void doSomething(unsigned int) {
std::cout << "unsigned int" << std::endl;
}
template<>
void doSomething(unsigned short) {
std::cout << "unsigned short" << std::endl;
}
int main(int argc, char* argv[])
{
doSomething(42U);
doSomething((unsigned short)23); // no other option than a cast?
return EXIT_SUCCESS;
}
You can't. Numeric literals cannot have short or unsigned short type.
Of course in order to assign to bar, the value of the literal is implicitly converted to unsigned short. In your first sample code, you could make that conversion explicit with a cast, but I think it's pretty obvious already what conversion will take place. Casting is potentially worse, since with some compilers it will quell any warnings that would be issued if the literal value is outside the range of an unsigned short. Then again, if you want to use such a value for a good reason, then quelling the warnings is good.
In the example in your edit, where it happens to be a template function rather than an overloaded function, you do have an alternative to a cast: do_something<unsigned short>(23). With an overloaded function, you could still avoid a cast with:
void (*f)(unsigned short) = &do_something;
f(23);
... but I don't advise it. If nothing else, this only works if the unsigned short version actually exists, whereas a call with the cast performs the usual overload resolution to find the most compatible version available.
unsigned short bar = (unsigned short) 23;
or in new speak....
unsigned short bar = static_cast<unsigned short>(23);
at least in Visual Studio (at least 2013 and newer) you can write
23ui16
for get an constant of type unsigned short.
see definitions of INT8_MIN, INT8_MAX, INT16_MIN, INT16_MAX, etc. macros in stdint.h
I don't know at the moment whether this is part of the standard C/C++
There are no modifiers for unsigned short. Integers, which has int type by default, usually implicitly converted to target type with no problems. But if you really want to explicitly indicate type, you could write the following:
unsigned short bar = static_cast<unsigned short>(23);
As I can see the only reason is to use such indication for proper deducing template type:
func( static_cast<unsigned short>(23) );
But for such case more clear would be call like the following:
func<unsigned short>( 23 );
There are multiple answers here, none of which are terribly satisfying. So here is a compilation answer with some added info to help explain things a little more thoroughly.
First, avoid shorts as suggested, but if you find yourself needing them such as when working with indexed mesh data and simply switching to shorts for your index size cuts your index data size in half...then read on...
1 While it is technically true that there is no way to express an unsigned short literal in c or C++ you can easily side step this limitation by simply marking your literal as unsigned with a 'u'.
unsigned short myushort = 16u;
This works because it tells the compiler that 16 is unsigned int, then the compiler goes looking for a way to convert it to unsigned short, finds one, most compilers will then check for overflow, and do the conversion with no complaints. The "narrowing conversion" error/warning when the "u" is left out is the compiler complaining that the code is throwing away the sign. Such that if the literal is negative such as -1 then the result is undefined. Usually this means you will get a very large unsigned value that will then be truncated to fit the short.
2 There are multiple suggestions on how to side step this limitation, most seasoned programmers will sum these up with a "don't do that".
unsigned short myshort = (unsigned short)16;
unsigned short myothershort = static_cast<unsigned short>(16);
While both of these work they are undesirable for 2 major reasons. First they are wordy, programmers get lazy and typing all that just for a literal is easy to skip which leads to basic errors that could have been avoided with a better solution. Second they are not free, static_cast in particular generates a little assembly code to do the conversion, and while an optimizer may(or may not) figure out that it can do the conversion its better to just write good quality code from the start.
unsigned short myshort = 16ui16;
This solution is undesirable because it limits who can read your code and understand it, it also means you are starting down the slippery slope of compiler specific code which can lead to your code suddenly not working because of the whims of some compiler writer, or some company that randomly decides to "make a right hand turn", or goes away and leaves in you in the lurch.
unsigned short bar = L'\x17';
This is so unreadable that nobody has upvoted it. And unreadable should be avoided for many good reasons.
unsigned short bar = 0xf;
This to is unreadable. While being able to read understand and convert hex is something serious programmers really need to learn it is very unreadable quick what number is this: 0xbad; Now convert it to binary...now octal.
3 Lastly if you find all the above solutions undesirable I offer up yet another solution that is available via a user defined operator.
constexpr unsigned short operator ""_ushort(unsigned long long x)
{
return (unsigned short)x;
}
and to use it
unsigned short x = 16_ushort;
Admittedly this too isn't perfect. First it takes an unsigned long long and whacks it all the way down to an unsigned short suppressing potential compiler warnings along the way, and it uses the c style cast. But it is constexpr which gurantees it is free in an optimized program, yet can be stepped into during debug. It is also short and sweet so programmers are more likely to use it and it is expressive so it is easy to read and understand. Unfortunately it requires a recent compiler as what can legally be done with user defined operators has changed over the various version of C++.
So pick your trade off but be careful as you may regret them later. Happy Programming.
Unfortunately, the only method defined for this is
One or two characters in single quotes
('), preceded by the letter L
According to http://cpp.comsci.us/etymology/literals.html
Which means you would have to represent your number as an ASCII escape sequence:
unsigned short bar = L'\x17';
Unfortunately, they can't. But if people just look two words behind the number, they should clearly see it is a short... It's not THAT ambiguous. But it would be nice.
If you express the quantity as a 4-digit hex number, the unsigned shortness might be clearer.
unsigned short bar = 0x0017;
You probably shouldn't use short, unless you have a whole lot of them. It's intended to use less storage than an int, but that int will have the "natural size" for the architecture. Logically it follows that a short probably doesn't. Similar to bitfields, this means that shorts can be considered a space/time tradeoff. That's usually only worth it if it buys you a whole lot of space. You're unlikely to have very many literals in your application, though, so there was no need foreseen to have short literals. The usecases simply didn't overlap.
In C++11 and beyond, if you really want an unsigned short literal conversion then it can be done with a user defined literal:
using uint16 = unsigned short;
using uint64 = unsigned long long;
constexpr uint16 operator""_u16(uint64 to_short) {
// use your favorite value validation
assert(to_short < USHRT_MAX); // USHRT_MAX from limits.h
return static_cast<uint16>(to_short);
}
int main(void) {
uint16 val = 26_u16;
}
I have two 16-bit shorts (s1 and s2), and I'm trying to combine them into a single 32-bit integer (i1). According to the spec I'm dealing with, s1 is the most significant word, and s2 is the least significant word, and the combined word appears to be signed. (i.e. the top bit of s1 is the sign.)
What is the cleanest way to combine s1 and s2?
I figured something like
const utils::int32 i1 = ((s1<<16) | (s2));
would do, and it seems to work, but I'm worried about left-shifting a short by 16.
Also, I'm interested in the idea of using a union to do the job, any thoughts on whether this is a good or bad idea?
What you are doing is only meaningful if the shorts and the int are all unsigned. If either of the shorts is signed and has a negative value, the idea of combining them into a single int is meaningless, unless you have been provided with a domain-specific specification to cover such an eventuality.
What you've got looks nearly correct, but will probably fail if the second part is negative; the implicit conversion to int will probably sign-extend and fill the upper 16 bits with ones. A cast to unsigned short would probably prevent that from happening, but the best way to be sure is to mask off the bits.
const utils::int32 combineddata = ((data.first<<16) | ((data.second) & 0xffff));
I know this is an old post but the quality of the present posted answers is depressing...
These are the issues to consider:
Implicit integer promotion of shorts (or other small integer types) will result in an operand of type int which is signed. This will happen regardless of the signedness of the small integer type. Integer promotion happens in the shift operations and in the bitwise OR.
In case of the shift operators, the resulting type is that of the promoted left operand. In case of bitwise OR, the resulting type is obtained from "the usual arithmetic conversions".
Left-shifting a negative number results in undefined behavior. Right-shifting a negative number results in implementation-defined behavior (logical vs arithmetic shift). Therefore signed numbers should not be used together with bit shifts in 99% of all use-cases.
Unions, arrays and similar are poor solutions since they make the code endianess-dependent. In addition, type punning through unions is also not well-defined behavior in C++ (unlike C). Pointer-based solutions are bad since they will end up violating "the strict aliasing rule".
A proper solution will therefore:
Use operands with types that are guaranteed to be unsigned and will not be implicitly promoted.
Use bit-shifts, since these are endianess-independent.
Not use some non-portable hogwash solution with unions or pointers. There is absolutely nothing gained from such solutions except non-portability. Such solutions are however likely to invoke one or several cases of undefined behavior.
It will look like this:
int32_t i32 = (int32_t)( (uint32_t)s1<<16 | (uint32_t)s2 );
Any other solution is highly questionable and at best non-portable.
Since nobody posted it, this is what the union would look like. But the comments about endian-ness definitely apply.
Big-endian:
typedef union {
struct {
uint16_t high;
uint16_t low;
} pieces;
uint32_t all;
} splitint_t;
Little-endian:
typedef union {
struct {
uint16_t low;
uint16_t high;
} pieces;
uint32_t all;
} splitint_t;
Try projecting data.second explicite to short type, like:
const utils::int32 combineddata = ((data.first<<16) | ((short)data.second));
edit: I am C# dev, probably the casting in your code language looks different, but idea could be the same.
You want to cast data.first to an int32 before you do the shift, otherwise the shift will overflow the storage before it gets a chance to be automatically promoted when it is assigned to combined data.
Try:
const utils::int32 combineddata = (static_cast<utils::int32>(data.first) << 16) | data.second;
This is of course assuming that data.first and data.second are types that are guaranteed to be precisely 16 bits long, otherwise you have bigger problems.
I really don't understand your statement "if data.second gets too big, the | won't take account of the fact that they're both shorts."
Edit: And Neil is absolutely right about signedness.
Using a union to do the job looks like a good choice, but is a portability issue due to endian differences of processors. It's doable, but you need to be prepared to modify your union based on the target architecture. Bit shifting is portable, but please write a function/method to do it for you. Inline if you like.
As to the signedness of the shorts, for this type of operation, it's the meaning that matters not the data type. In other words, if s1 and s2 are meant to be interpreted as two halves of a 32 bit word, having bit 15 set only matters if you do something that would cause s2 to be sign extended. See Mark Ransoms answer, which might be better as
inline utils::int32 CombineWord16toLong32(utils::int16 s1, utils::int16 s2)
{
return ((s1 <<16) | (s2 & 0xffff));
}