Why does memset take an int as the second argument instead of a char, whereas wmemset takes a wchar_t instead of something like long or long long?
memset predates (by quite a bit) the addition of function prototypes to C. Without a prototype, you can't pass a char to a function -- when/if you try, it'll be promoted to int when you pass it, and what the function receives is an int.
It's also worth noting that in C, (but not in C++) a character literal like 'a' does not have type char -- it has type int, so what you pass will usually start out as an int anyway. Essentially the only way for it to start as a char and get promoted is if you pass a char variable.
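A quick way to see this for yourself (a minimal sketch; the exact value of sizeof(int) is platform-dependent):

#include <stdio.h>

int main(void)
{
    /* Compiled as C, 'a' has type int, so this prints sizeof(int) (commonly 4). */
    /* Compiled as C++, 'a' has type char, so the same line prints 1. */
    printf("%zu\n", sizeof('a'));
    return 0;
}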
In theory, memset could probably be modified so it receives a char instead of an int, but there's unlikely to be any benefit, and a pretty decent possibility of breaking some old code or other. With an unknown but potentially fairly high cost, and almost no chance of any real benefit, I'd say the chances of it being changed to receive a char fall right on the line between "slim" and "none".
Edit (responding to the comments): The CHAR_BIT least significant bits of the int are used as the value to write to the target.
Probably for the same reason that the functions in <ctype.h> take ints and not chars.
On most platforms, a char is too small to be pushed on the stack by itself, so one usually pushes the type closest to the machine's word size, i.e. int.
As the link in @Gui13's comment points out, doing that also increases performance.
See fred's answer; it's for performance reasons.
On my side, I tried this code:
#include <stdio.h>
#include <string.h>
int main (int argc, const char * argv[])
{
    char c = 0x00;
    printf("Before: c = 0x%02x\n", c);
    memset(&c, 0xABCDEF54, 1);
    printf("After: c = 0x%02x\n", c);
    return 0;
}
And it gives me this on a 64-bit Mac:
Before: c = 0x00
After: c = 0x54
So as you can see, only the low-order byte of the value gets written. This is not really an endianness issue: memset converts the value it is given to unsigned char, so only its CHAR_BIT least significant bits are ever used.
I want to write a function
int char_to_int(char c);
that converts given char to int by zero extending the value. So if the char has N bits and int has M bits, M >= N, then the M-N most significant bits of the int value should be zero and the N least significant bits of the int value should match the bits of the char value.
This seems like a simple task, but I'm not sure how to write it relying only on standard behavior. No UB, no implementation-defined behavior. Without relying on char being 8 bits, int being 32 bits, char being unsigned, or any other common assumption that is not guaranteed by the standard.
The reason I want to know this is that I have done this conversion several times in the past, but recently I became aware of the limited guarantees C++ gives about its data types. So now I'm curious what the correct, standard-compliant approach is.
I don't suppose
return (int) c;
is good enough, is it?
There's no harm in being extra clear:
return int((unsigned char)c);
That way you tell the compiler exactly what you want: the int that contains the char value, read as unsigned. So char 255 will become int 255.
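Wrapped up as the function from the question, that might look like this (a minimal sketch):

int char_to_int(char c)
{
    /* Reading the byte as unsigned char gives a value in [0, UCHAR_MAX],
       which always fits in an int, so the final conversion to int is
       value-preserving: the low bits match the char's bits and the
       remaining high bits are zero. */
    return (int)(unsigned char)c;
}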
Is there an easy way to convert between char and unsigned char if you don't know the default setting of the machine your code is running on?
(On most architectures, char is signed by default and thus has a range from -128 to +127. On some other architectures, such as ARM, char is unsigned by default and has a range from 0 to 255)
I am looking for a method to select the correct signedness or to convert between the two transparently, preferably one that doesn't involve too many steps since I would need to do this for all elements in an array.
Using a pre-processor definition would allow the setting of this at the start of my code.
As would specifying an explicit form of char, such as signed char or unsigned char, since only plain char varies between platforms.
The reason is that there are library functions I would like to use (such as strtol) that take char * as an argument but not unsigned char *.
I am looking for some advice, or perhaps some pointers in the right direction, as to a practical, efficient way to do this and make the code portable, as I intend to run the code on a few machines with different default settings for char.
I don't see any actual issue here.
It's not a matter of the architecture making char signed or unsigned by default; it's rather a matter of the compiler, and the default can usually be switched between the two with a compiler option.
Also, there's no need to convert between the types. Both have the same representation in memory, on the same number of bits (usually 8). It's only a matter of how your program and the libraries it uses interpret the bits. If you're going to call strtol, then your data is a character array and you ought to use plain char.
If you ever use char to store not a character (A, b, f ...) but an actual value (-1, 0, 42 ...), then the range matters. In such cases you have to use signed char or unsigned char. However, in that case there's little use for the library functions that want a char *.
For the libraries that do actually want a char * pointing at an arbitrary binary blob, there's no issue. Create your binary buffer with the type you prefer, signed, unsigned, or undecided, and pass it, possibly with a cast. It will run perfectly.
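For instance, with strtol the declared signedness of the buffer does not matter; a minimal sketch (the buffer contents are just illustrative):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* strtol wants const char *; it does not care whether plain char is
       signed or unsigned on this platform. */
    char buffer[] = "1234";
    long value = strtol(buffer, NULL, 10);
    printf("%ld\n", value);

    /* If the data happens to live in an unsigned char array, a cast at
       the call site is enough: the bytes in memory are identical. */
    unsigned char raw[] = { '5', '6', '7', '\0' };
    long value2 = strtol((const char *)raw, NULL, 10);
    printf("%ld\n", value2);
    return 0;
}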
C++ has three distinct char types; however, only plain char is allowed to vary between compilers/architectures, since the other two are explicitly signed and unsigned, while plain char is implicit and so may default to either.
To make your code portable, the most straightforward thing to do is to use signed char or unsigned char explicitly where you require them; however, for readability you may prefer to redefine char as the type you need, or even make your own definition of a char (for demonstration purposes I will use RLChar).
1st version - un-define char and redefine
#ifdef __arm__
#undef char
#define char signed char
#endif
2nd version - define your own custom char type to use in your code
#ifndef RLChar
#define RLChar signed char
#endif
(personally I would tend to do the second)
You can also create a macro to clamp values into the unsigned char range when converting between the two:
#define CLAMP_VALUE_TO_255(v) ((v) > 255 ? 255 : ((v) < 0 ? 0 : (v)))
then you can use:
unsigned char clampedChar = CLAMP_VALUE_TO_255((unsigned char)pixel)
or use casts such as these (the way to go if all the compilers you will use support them):
signed char myChar = -100;
unsigned char mySecondChar;
mySecondChar = static_cast<unsigned char>(myChar); // uses a static cast
mySecondChar = reinterpret_cast<unsigned char&>(myChar); // uses a reinterpretation cast
so for your array scenario you could do
unsigned char* RLArray;
RLArray = reinterpret_cast<unsigned char*>(originalSignedCharArray);
Let me know if you need more info as this is just what I can remember off the top of my head, especially if you need C equivalents or more details. :)
I am reading chapter 2 of Advanced Linux Programming:
http://www.advancedlinuxprogramming.com/alp-folder/alp-ch02-writing-good-gnu-linux-software.pdf
In the section 2.1.3 Using getopt_long, there is an example program that goes a bit like this:
int main (int argc, char* argv[]) {
    int next_option;
    // ...
    do {
        next_option = getopt_long (argc, argv, short_options, long_options, NULL);
        switch (next_option) {
        case 'h': /* -h or --help */
            // ...
        }
        // ...
The bit that caught my attention is that next_option is declared as an int. The function getopt_long() apparently returns an int representing the short command line argument which is used in the following switch statement. How come that integer can be compared to a character in the switch statement?
Is there an implicit conversion from a char (a single character?) to an int? How is the code above valid? (see full code in linked pdf)
Neither C nor C++ has a type that stores "characters" as values with dedicated character-specific properties. In that sense, there's no "character" type in either C or C++.
In both C and C++, char is an integral type. It contains numbers. It is just the smallest (in terms of range) integral type. Conversion between char and int exists, just as it exists between int and long or int and short. char has no special status among the other integral types (aside from the fact that the char type is distinct from the signed char type).
A literal of the form 'h' in C++ has type char, but like any other integral type it is comparable to an int. That's why you can use it in a case label the way it is used in your original example.
In other words, your original code is as "strange" as
switch (next_option) {
case 1L: ...
// ...
}
would be. In this case the switch argument is an int, but the case label is a long. The code is valid. Do you find it surprising? Probably not. Your example with 'h' is not much different.
You are mistaken -- getopt_long(3) returns an int.
Several C library functions return an int where you might expect a char. Returning an int when a char would seem to make more sense is simply an old C cultural decision; plus, in a few cases, it's necessary so that a function can return sentinel values like EOF.
As the other answerer says, you're asking the wrong question here. But to answer the question you did ask:
No implicit conversion from char* to int is available. On x86 machines, both int and char* are 32 bits wide, so it's "safe" to cast explicitly:
int x = (int) &someChar;
But this is strongly discouraged!
On x64 machines, this will not work! int remains 32 bits long, but all pointers are now 64 bits long... so you'll lose data in the process!
According to the man page, getopt_long returns an int. And yes, there is an implicit conversion from char to int; a char is just a one-byte integer value.
So in this case, the conversion is not happening when assigning to next_option, but in the case label, where a character constant is compared to an int. Of course, this is assuming you compile this as C++. In C++ a character constant has type char, but in C it has type int, so if you compile this code as C then there's no type conversion at all.
(And in your question you mention char*, but you probably meant char; there are no pointers being used here.)
Think of a char as an 8-bit int. You can perform integer operations on chars, and you can even declare them unsigned. You wouldn't be surprised that you can compare a short and a long. Why should comparing a char and an int be any different?
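For example (a minimal sketch; the value 104 assumes an ASCII execution character set, where 'h' is 104):

#include <stdio.h>

int main(void)
{
    char c = 'h';
    int n = c + 1;   /* c is promoted to int before the addition */

    if (c == 104)    /* comparing a char to an int is perfectly legal */
        printf("c == 104, n = %d\n", n);
    return 0;
}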
Let us consider this snippet:
int s;
scanf("%c",&s);
Here I have used int, and not char, for the variable s. Now, to use s safely as a character, I have to convert it back to char, because when scanf reads a character it only overwrites one byte of the variable it is assigning to, not all four bytes that an int has.
For the conversion I could use s = (char)s; on the next line, but is it possible to achieve the same thing by subtracting something from s?
What you've done is technically undefined behaviour. The %c format calls for a char*; you've passed it an int*, which will (roughly speaking) be reinterpreted. Even assuming that the pointer value is still good after reinterpreting, storing an arbitrary character to the first byte of an int and then reading it back as an int is undefined behaviour. Even if it were defined, reading an int when 3 of its bytes are uninitialized is undefined behaviour.
In practice it probably does something sensible on your machine, and you just get garbage in the top 3 bytes (assuming little-endian).
Writing s = (char)s converts the value from int to char and then back to int again. This is implementation-defined behaviour: converting an out-of-range value to a signed type. On different implementations it might clean up the top 3 bytes, it might return some other result, or it might raise a signal.
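For example, on a typical two's-complement implementation (a minimal sketch; the garbage value is made up, and the out-of-range result is implementation-defined when char is signed, as noted above):

#include <stdio.h>

int main(void)
{
    int s = 0x1F041;   /* pretend scanf left garbage in the upper bytes */
    s = (char)s;       /* implementation-defined if char is signed and the value is out of range */
    printf("%d\n", s); /* commonly prints 65, i.e. only the low byte survives */
    return 0;
}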
The proper way to use scanf is:
char c;
scanf("%c", &c);
And then either int s = c; or int s = (unsigned char)c;, according to whether you want negative-valued characters to result in a negative integer, or a positive integer (up to 255, assuming 8-bit char).
I can't think of any good reason for using scanf improperly. There are good reasons for not using scanf at all, though:
int s = getchar();
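A minimal sketch of that approach (echoing input until end-of-file):

#include <stdio.h>

int main(void)
{
    int s;  /* int, not char, so the EOF sentinel can be told apart
               from every real character value */

    while ((s = getchar()) != EOF)
        putchar(s);
    return 0;
}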
Are you trying to convert a digit to its decimal value? If so, then
char c = '8';
int n = c - '0';
n should be 8 at this point.
That's probably not a good idea; GCC gives me a warning for that code:
main.c:10: warning: format ‘%c’ expects type ‘char *’, but
argument 2 has type ‘int *’
In this case you're ok since you're passing a pointer to more space than you need (for most systems), but what if you did it the other way around? Could be crash city. If you really want to do something like what you have there, just do the typecast or mask it - the mask will be endian-dependent.
As written this won't work reliably. The argument &s to scanf is a pointer to int, and scanf is expecting a pointer to char. The two data types (int and char) have different sizes (at least on most architectures), so the data may get put in the wrong spot in memory, and the other part of s may not get properly cleared.
The answers suggesting manipulation of the result after using a pointer to int rely on unspecified behavior (i.e. that scanf will put the character value it has in the least significant byte of the int you're pointing to), and are not safe.
No, but you could use the following:
s = s & 0xFF;
That will blank out all of the data except the first byte. But in general all these ideas (and the ones above) are bad ideas, since not all systems store the lowest part of the integer in memory first. So if you ever have to port this code to a big-endian system, you'll be screwed.
True, you may never have to port the code, but why write unportable code to begin with?
See this for more info:
http://en.wikipedia.org/wiki/Endianness
I have been working on a legacy C++ application and am definitely outside of my comfort-zone (a good thing). I was wondering if anyone out there would be so kind as to give me a few pointers (pun intended).
I need to cast 2 bytes in an unsigned char array to an unsigned short. The bytes are consecutive.
For an example of what I am trying to do:
I receive a string from a socket and place it in an unsigned char array. I can ignore the first byte, and then the next 2 bytes should be converted to an unsigned short. This will be on Windows only, so there are no big/little-endian issues (that I am aware of).
Here is what I have now (not working obviously):
//packetBuffer is an unsigned char array containing the string "123456789" for testing
//I need to convert bytes 2 and 3 into the short, 2 being the most significant byte
//so I would expect to get 515 (2*256 + 3); instead, all the code I have tried gives me
//either errors or 2 (only converting one byte)
unsigned short myShort;
myShort = static_cast<unsigned short>(packetBuffer[1]);
Well, you are widening the char into a short value. What you want is to interpret two bytes as a short. static_cast cannot cast from unsigned char* to unsigned short*. You have to cast to void*, then to unsigned short*:
unsigned short *p = static_cast<unsigned short*>(static_cast<void*>(&packetBuffer[1]));
Now, you can dereference p and get the short value. But the problem with this approach is that you cast from unsigned char*, to void* and then to some different type. The Standard doesn't guarantee the address remains the same (and in addition, dereferencing that pointer would be undefined behavior). A better approach is to use bit-shifting, which will always work:
unsigned short p = (packetBuffer[1] << 8) | packetBuffer[2];
This is probably well below what you care about, but keep in mind that you could easily get an unaligned access doing this. x86 is forgiving: the unaligned access is handled for you, and your app won't know any different (though it's significantly slower than an aligned access). If, however, this code will run on a non-x86 platform (you don't mention the target platform, so I'm assuming x86 desktop Windows), then doing this will cause a processor data abort and you'll have to manually copy the data to an aligned address before trying to cast it.
In short, if you're going to be doing this access a lot, you might look at adjusting the code so as not to have unaligned reads, and you'll see a performance benefit.
unsigned short myShort = *(unsigned short *)&packetBuffer[1];
The bit shift above needs a caveat:
unsigned short p = (packetBuffer[1] << 8) | packetBuffer[2];
If packetBuffer is an array of bytes (8 bits wide), this works only because packetBuffer[1] is promoted to int before the shift; shifted as a genuine 8-bit value, the left shift by 8 would discard it, leaving you with only packetBuffer[2].
Despite that, this is still preferable to pointers. To avoid the above problem I waste a few lines of code, but with anything other than quite literally zero optimization it results in the same machine code:
unsigned short p;
p = packetBuffer[1]; p <<= 8; p |= packetBuffer[2];
Or to save some clock cycles and not shift the bits off the end:
unsigned short p;
p = (((unsigned short)packetBuffer[1])<<8) | packetBuffer[2];
You have to be careful with pointers: the optimizer will bite you, as will memory alignment and a long list of other problems. Yes, done right it is faster; done wrong, the bug can linger for a long time and strike when least desired.
Say you were lazy and wanted to do some 16-bit math on an 8-bit array (little-endian):
unsigned short *s;
unsigned char b[10];

s = (unsigned short *)&b[0];
if (b[0] & 7)
{
    *s = *s + 8;
    *s &= ~7;
}
do_something_with(b);
*s = *s + 8;
do_something_with(b);
*s = *s + 8;
do_something_with(b);
There is no guarantee that a perfectly bug-free compiler will create the code you expect. The byte array b sent to the do_something_with() function may never get modified by the *s operations. Nothing in the code above says that it should. If you don't optimize your code, you may never see this problem (until someone does optimize, or changes compilers or compiler versions). If you use a debugger you may never see this problem (until it is too late).
The compiler doesn't see the connection between s and b; they are two completely separate items. The optimizer may choose not to write *s back to memory, because it sees that *s is used in a number of operations, so it can keep the value in a register and only save it to memory at the end (if ever).
There are three basic ways to fix the pointer problem above:
Declare s as volatile.
Use a union.
Use a function or functions whenever changing types.
You should not cast an unsigned char pointer into an unsigned short pointer (or, for that matter, cast from a pointer to a smaller data type to a pointer to a larger one), because the code then assumes the address is aligned correctly. A better approach is to shift the bytes into a real unsigned short object, or to memcpy into an unsigned short.
No doubt, you can adjust the compiler settings to get around this limitation, but this is a very subtle thing that will break in the future if the code gets passed around and reused.
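A memcpy-based version might look like this (a minimal sketch; read_u16 and write_u16 are just illustrative names, it assumes the usual two-byte unsigned short, and the result is in the machine's native byte order):

#include <string.h>

/* Copy the bytes into a real unsigned short instead of aliasing the
   buffer with an unsigned short pointer; this sidesteps the alignment
   and optimizer problems described above. */
static unsigned short read_u16(const unsigned char *p)
{
    unsigned short v;
    memcpy(&v, p, sizeof v);
    return v;
}

static void write_u16(unsigned char *p, unsigned short v)
{
    memcpy(p, &v, sizeof v);
}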
Maybe this is a very late solution, but I just want to share it with you. When you want to convert primitives or other types, you can use a union. See below:
union CharToStruct {
    char charArray[2];
    unsigned short value;
};

short toShort(char* value){
    CharToStruct cs;
    cs.charArray[0] = value[1]; // the most significant byte of the short is not the first byte of the char array (assumes little-endian)
    cs.charArray[1] = value[0];
    return cs.value;
}
When you create an array with the hex values below and call the toShort function, you will get a short with the value 3 (on a little-endian machine).
char array[2];
array[0] = 0x00;
array[1] = 0x03;
short i = toShort(array);
cout << i << endl; // or printf("%hd", i);
static_cast has a different syntax, plus you need to work with pointers; what you want to do (ignoring the alignment and aliasing caveats raised above) is a reinterpret_cast:
unsigned short *myShort = reinterpret_cast<unsigned short*>(&packetBuffer[1]);
Did nobody see the input was a string!
/* If it is a string as explicitly stated in the question.
*/
int byte1 = packetBuffer[1] - '0'; // convert 1st byte from char to number.
int byte2 = packetBuffer[2] - '0';
unsigned short result = (byte1 * 256) + byte2;
/* Alternatively, if it is an array of bytes.
*/
int byte1 = packetBuffer[1];
int byte2 = packetBuffer[2];
unsigned short result = (byte1 * 256) + byte2;
This also avoids the alignment problems that most of the other solutions may have on certain platforms. Note: a short is at least two bytes. Many systems will give you a memory error if you try to dereference a short pointer that is not 2-byte aligned (or whatever sizeof(short) is on your system)!
char packetBuffer[] = {1, 2, 3};
unsigned short myShort = * reinterpret_cast<unsigned short*>(&packetBuffer[1]);
I (have had to) do this all the time. Big-endian is an obvious problem. What will really get you is incorrect data when the machine dislikes misaligned reads (and writes)!
You may want to write a test case and an assert to see if it reads properly. That way, when run on a big-endian machine, or more importantly on a machine that dislikes misaligned reads, an assert error will occur instead of a weird, hard-to-trace 'bug' ;)
On Windows you can use:
unsigned short i = MAKEWORD(lowbyte,hibyte);
I realize this is an old thread, and I can't say that I tried every suggestion made here. I'm just making myself comfortable with MFC, and I was looking for a way to convert a uint to two bytes, and back again at the other end of a socket.
There are a lot of bit-shifting examples you can find on the net, but none of them seemed to actually work. A lot of the examples seem overly complicated; I mean, we're just talking about grabbing 2 bytes out of a uint, sending them over the wire, and plugging them back into a uint at the other end, right?
This is the solution I finally came up with:
class ByteConverter
{
public:
    static void uIntToBytes(unsigned int theUint, char* bytes)
    {
        unsigned int tInt = theUint;
        void *uintConverter = &tInt;
        char *theBytes = (char*)uintConverter;
        bytes[0] = theBytes[0];
        bytes[1] = theBytes[1];
    }

    static unsigned int bytesToUint(char *bytes)
    {
        unsigned theUint = 0;
        void *uintConverter = &theUint;
        char *thebytes = (char*)uintConverter;
        thebytes[0] = bytes[0];
        thebytes[1] = bytes[1];
        return theUint;
    }
};
Used like this:
unsigned int theUint;
char bytes[2];
CString msg;
ByteConverter::uIntToBytes(65000,bytes);
theUint = ByteConverter::bytesToUint(bytes);
msg.Format(_T("theUint = %d"), theUint);
AfxMessageBox(msg, MB_ICONINFORMATION | MB_OK);
Hope this helps someone out.