I'm currently working on a program in Eclipse that converts to and from base64. However, I've just noticed that char values seem to have 7 bits instead of the usual 8. For example, the character 'o' is shown in binary as 1101111 instead of 01101111, which effectively prevents me from completing my project, as I need a total of 24 bits for the conversion to work. Is there any way to either append a 0 to the beginning of the value (I tried bit-shifting in both directions, but neither worked) or to prevent the issue altogether?
The code for the (incomplete/nonfunctional) offending method is as follows; let me know if more is required:
std::string Encoder::encode(char* src, unsigned char* dest)
{
    char ch0 = src[0];
    char ch1 = src[1];
    char ch2 = src[2];
    char sixBit1 = ch0 >> 1;
    dest[0] = ch2;
    dest[1] = ch1;
    dest[2] = ch0;
    dest[3] = '-';
}
char in C and C++ is typically a signed 8-bit type (strictly speaking, whether plain char is signed or unsigned is implementation-defined, but it is signed on most platforms). So it is expected that you have only 7 usable value bits, because one bit is used for sign storage.
Try to use unsigned char instead.
Either unsigned char or uint8_t from <stdint.h> should work. For maximum portability, uint_least8_t is guaranteed to exist.
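To see how the pieces then fit, here is a minimal sketch of the 3-bytes-to-four-6-bit-groups step, assuming unsigned input; encodeBlock and table are illustrative names, not the asker's Encoder class:

#include <cstdint>

// The standard base64 alphabet.
static const char table[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

void encodeBlock(const unsigned char* src, char* dest)
{
    // Assemble a 24-bit group; unsigned types avoid sign-extension surprises.
    std::uint32_t group = (std::uint32_t(src[0]) << 16)
                        | (std::uint32_t(src[1]) << 8)
                        |  std::uint32_t(src[2]);
    // Peel off four 6-bit values, most significant first.
    dest[0] = table[(group >> 18) & 0x3F];
    dest[1] = table[(group >> 12) & 0x3F];
    dest[2] = table[(group >> 6)  & 0x3F];
    dest[3] = table[ group        & 0x3F];
}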
I need to write 16-bit integers to a file. fstream only writes characters, so I need to convert the integers to char - the actual value, not the character representing it (i.e. 0 should become 0x00, not 0x30). I tried the following:
char * chararray = (char*)(&the_int);
However, this creates a backwards array of two characters. The individual characters are not flipped, but their order is. Thus I created this function:
char * inttochar(uint16_t input)
{
    int input_size = sizeof(input);
    char * chararray = (char*)(&input);
    char * output;
    output[0] = '\0';
    for (int i = 0; i < input_size; i++)
    {
        output[i] = chararray[input_size - (i + 1)];
    }
    return output;
}
This seems slow. Surely there is a more efficient, less hacky way to convert it?
It's a bit hard to understand what you're asking here (perhaps it's just me, although I gather the commentators thought so too).
You write
fstream only writes characters
That's true, but doesn't necessarily mean you need to create a character array explicitly.
E.g., if you have an fstream object f (opened in binary mode), you can use the write method:
uint16_t s;
...
f.write(reinterpret_cast<const char *>(&s), sizeof(uint16_t));
As others have noted, when you serialize numbers, it often pays to use a commonly-accepted ordering. Hence, use htons (refer to the documentation for your OS's library):
uint16_t s;
...
const uint16_t ns = htons(s);
f.write(reinterpret_cast<const char *>(&ns), sizeof(uint16_t));
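Putting it together, a complete round trip might look like this - a sketch assuming a POSIX system, where htons/ntohs come from <arpa/inet.h> (on Windows they live in <winsock2.h>); the file name value.bin is arbitrary:

#include <arpa/inet.h>   // htons, ntohs (POSIX)
#include <cstdint>
#include <fstream>

int main()
{
    std::uint16_t s = 0x1234;

    // Write in network (big-endian) order.
    std::ofstream out("value.bin", std::ios::binary);
    const std::uint16_t ns = htons(s);
    out.write(reinterpret_cast<const char *>(&ns), sizeof ns);
    out.close();

    // Read back and convert to host order.
    std::ifstream in("value.bin", std::ios::binary);
    std::uint16_t nr = 0;
    in.read(reinterpret_cast<char *>(&nr), sizeof nr);
    const std::uint16_t r = ntohs(nr);
    // r == s on any host, regardless of its endianness.
}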
I'm trying to display an integer on an LCD display. The way the LCD works is that you send it an 8-bit ASCII character and it displays the character.
The code I have so far is:
unsigned char text[17] = "ABCDEFGHIJKLMNOP";
int32_t n = 123456;
lcd.printInteger(text, n);
//-----------------------------------------
void LCD::printInteger(unsigned char headLine[17], int32_t number)
{
    //......
    int8_t str[17];
    itoa(number, (char*)str, 10);
    for (int i = 0; i < 16; i++)
    {
        if (str[i] == 0x0)
            break;
        this->sendCharacter(str[i]);
        _delay_ms(2);
    }
}

void LCD::sendCharacter(uint8_t character)
{
    //....
    *this->cOutputPort = character;
    //...
}
So if I try to display 123456 on the LCD, it actually displays -7616, which obviously is not the correct integer.
I know that there is probably a problem because I convert the characters to signed int8_t and then output them as unsigned uint8_t. But I have to output them in unsigned format. I don't know how I can convert the int32_t input integer to an ASCII uint8_t string.
On your architecture, int is 16-bit (int16_t), not int32_t. Thus, itoa treats 123456 as -7616, because:
123456 = 0x0001_E240
-7616 = 0xFFFF_E240
They are the same if you truncate them down to 16 bits - and that's what your code is doing. Instead of using itoa, you have the following options:
calculate the ASCII representation yourself (see the sketch at the end of this answer);
use ltoa(long value, char * buffer, int radix), if available; or
leverage s[n]printf, if available.
For the last option you can use the following, "mostly" portable code:
void LCD::printInteger(unsigned char headLine[17], int32_t number) {
    ...
    char str[17];
    if (sizeof(int) == sizeof(int32_t))
        snprintf(str, sizeof(str), "%d", (int)number);
    else if (sizeof(long int) == sizeof(int32_t))
        snprintf(str, sizeof(str), "%ld", (long)number);
    else if (sizeof(long long int) == sizeof(int32_t))
        snprintf(str, sizeof(str), "%lld", (long long)number);
    ...
}
If, and only if, your platform doesn't have snprintf, you can use sprintf and remove the 2nd argument (sizeof(str)). Your go-to function should always be the n variant, as it gives you one less bullet to shoot your foot with :)
Since you're compiling with a C++ compiler that is, I assume, at least half-decent, the above should do "the right thing" in a portable way without emitting any unnecessary code: the conditions passed to if are compile-time constant expressions, and even fairly old compilers fold the dead branches away.
Nitpick: Don't use int8_t where a char would do. itoa, s[n]printf, etc. expect char buffers, not int8_t buffers.
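For the first option - calculating the ASCII representation yourself - a minimal sketch follows; int32ToString is a hypothetical helper, not part of the asker's LCD class:

#include <stdint.h>

// Convert an int32_t to a decimal string; buf must hold at least 12 bytes
// ("-2147483648" plus the terminator).
void int32ToString(int32_t number, char* buf)
{
    char* p = buf;
    uint32_t n;
    if (number < 0) {
        *p++ = '-';
        n = uint32_t(-(number + 1)) + 1; // negate without overflowing on INT32_MIN
    } else {
        n = uint32_t(number);
    }
    char tmp[10];                        // 10 digits suffice for 32-bit values
    int len = 0;
    do {                                 // emit digits least significant first
        tmp[len++] = char('0' + n % 10);
        n /= 10;
    } while (n != 0);
    while (len > 0)                      // reverse into the output buffer
        *p++ = tmp[--len];
    *p = '\0';
}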
I have a binary file from which I load the whole text into an unsigned char[], and a variable const uint32_t LITTLE_ENDIAN_ID = 0x49696949;
I need to compare the first four characters of the loaded char[] with the given uint32_t.
Is that possible somehow?
If buff is your unsigned char[] buffer, you can do:
memcmp((unsigned char*)&LITTLE_ENDIAN_ID, buff, 4) == 0
memcmp is defined in string.h
Yes, it's absolutely possible, but your question is underspecified. What you want to do is take the first 4 characters of your character array and convert them into a uint32_t; the obvious question is which character corresponds to which byte of the 32-bit int. This is essentially asking whether the bytes are stored in little-endian or big-endian order. Though now that I see your LITTLE_ENDIAN_ID, I realize that it doesn't matter - it's (oddly) the same forwards and backwards.
Anyhow, what you want is either:
unsigned char text[] = ...
uint32_t x = (uint32_t(text[0]) << 24) | (uint32_t(text[1]) << 16) | (uint32_t(text[2]) << 8) | uint32_t(text[3]);
if (x == LITTLE_ENDIAN_ID)
// do something
Or the same thing, but with
uint32_t x = (uint32_t(text[3]) << 24) | (uint32_t(text[2]) << 16) | (uint32_t(text[1]) << 8) | uint32_t(text[0]);
(Note the parentheses and casts: << binds less tightly than +, and shifting a plain char left by 24 can overflow into the sign bit of int.)
Alternatively we could do something a little more unusual like
union converter {
    uint32_t int_value;
    unsigned char characters[4];
};
unsigned char text[] = ...
converter x;
for (int i = 0; i < 4; i++)
    x.characters[i] = text[i];
if (x.int_value == LITTLE_ENDIAN_ID)
    // do something
(Strictly speaking, reading a union member other than the one last written is undefined behavior in C++, though it is allowed in C and supported by the major compilers.)
This is probably closer to what you want if you are actually looking to test the endianness of the current system.
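A related, well-defined alternative in C++ is to memcpy the first four bytes into a uint32_t and compare that - a sketch (matchesId is a hypothetical name); like the memcmp answer, this compares in the machine's native byte order:

#include <cstdint>
#include <cstring>

const std::uint32_t LITTLE_ENDIAN_ID = 0x49696949;

bool matchesId(const unsigned char* buff)
{
    std::uint32_t x;
    std::memcpy(&x, buff, sizeof x); // defined behavior, unlike the union trick in C++
    return x == LITTLE_ENDIAN_ID;
}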
I am creating a C++ program to communicate with a gripper over a serial port.
I have to send a buffer of type unsigned char[8], but of these 8 bytes, 4 are entered from the keyboard and 2 are the CRC, calculated at the time.
So, how can I concatenate several pieces into a single 8-byte unsigned char buffer?
For example:
unsigned char buffer[8];
unsigned char DLEN[1] = {0x05};
unsigned char CMD[1] = {0x01};
unsigned char data[4] = {0x00, 0x01, 0x20, 0x41};
unsigned char CRC[2] = {0xFF, 0x41};
How can I get this buffer: {0x05,0x01,0x00,0x01,0x20,0x41,0xFF,0x41}, i.e. the concatenation of DLEN, CMD, data and CRC?
This:
buffer[0] = DLEN[0];
buffer[1] = CMD[0];
buffer[2] = data[0];
buffer[3] = data[1];
buffer[4] = data[2];
buffer[5] = data[3];
buffer[6] = CRC[0];
buffer[7] = CRC[1];
An alternative solution is this:
Start off with an unsigned char array of 8 characters.
When you need to pass it off to other methods that insert data into it, pass a pointer into the buffer, like this: updateCRC(&buffer[6]), with the method signature taking an unsigned char pointer. Assuming each callee respects the size of its piece, the result is the best of both worlds: the parts are handled as if they were separate strings, and there is no need to merge them into a single array afterwards.
You could use bit shifting, the << and >> operators, to get fields into the right places - but only if you were packing them into a single wide integer, not a char array (something like value |= uint64_t(DLEN[0]) << 56;). Just make sure the integer is cleared to all 0s first; see the sketch below. For an unsigned char buffer like this one, the byte-wise copies in the other answers are the way to go.
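For illustration only, a sketch of that integer-packing variant (this builds a uint64_t, not the unsigned char[8] the question actually needs):

#include <cstdint>

std::uint64_t value = 0;                     // cleared to all 0s first
value |= std::uint64_t(DLEN[0]) << 56;
value |= std::uint64_t(CMD[0])  << 48;
value |= std::uint64_t(data[0]) << 40;
value |= std::uint64_t(data[1]) << 32;
value |= std::uint64_t(data[2]) << 24;
value |= std::uint64_t(data[3]) << 16;
value |= std::uint64_t(CRC[0])  << 8;
value |= std::uint64_t(CRC[1]);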
My version of hmjd's answer (std::copy needs <algorithm>, and std::begin/std::end need <iterator>):
buffer[0] = DLEN[0];
buffer[1] = CMD[0];
std::copy(std::begin(data), std::end(data), buffer + sizeof DLEN + sizeof CMD);
std::copy(std::begin(CRC),  std::end(CRC),  buffer + sizeof DLEN + sizeof CMD + sizeof data);
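The same assembly can also be written with std::memcpy and a running offset, which scales better if the field sizes change - a sketch using the arrays from the question:

#include <cstring>

unsigned char buffer[8];
std::size_t off = 0;
std::memcpy(buffer + off, DLEN, sizeof DLEN); off += sizeof DLEN;
std::memcpy(buffer + off, CMD,  sizeof CMD);  off += sizeof CMD;
std::memcpy(buffer + off, data, sizeof data); off += sizeof data;
std::memcpy(buffer + off, CRC,  sizeof CRC);  off += sizeof CRC;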
I was wondering, is it safe to do this?
wchar_t wide = /* something */;
assert(wide >= 0 && wide < 256 &&);
char myChar = static_cast<char>(wide);
That is, if I'm pretty sure the wide char will fall within the ASCII range.
Why not just use the library routine wcstombs?
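A sketch of what that could look like - wcstombs converts using the current C locale, so set one first; the buffer size here is arbitrary:

#include <clocale>
#include <cstdlib>

int main()
{
    std::setlocale(LC_ALL, "");                    // use the environment's locale
    const wchar_t wide[] = L"Hello";
    char narrow[32];
    std::size_t n = std::wcstombs(narrow, wide, sizeof narrow - 1);
    if (n == static_cast<std::size_t>(-1)) {
        // a wide character had no representation in this locale
    } else {
        narrow[n] = '\0';                          // not null-terminated on truncation
    }
}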
assert is for ensuring that something is true in a debug mode, without it having any effect in a release build. Better to use an if statement and have an alternate plan for characters that are outside the range, unless the only way to get characters outside the range is through a program bug.
Also, depending on your character encoding, you might find a difference between the Unicode characters 0x80 through 0xff and their char version.
You are looking for wctomb(): it's in the ANSI standard, so you can count on it. It works even when the wchar_t uses a code above 255. You almost certainly do not want to use it.
wchar_t is an integral type, so your compiler won't complain if you actually do:
char x = (char)wc;
but because it's an integral type, there's absolutely no reason to do this. If you accidentally read Herbert Schildt's C: The Complete Reference, or any C book based on it, then you're completely and grossly misinformed. Characters should be of type int or better. That means you should be writing this:
int x = getchar();
and not this:
char x = getchar(); /* <- WRONG! */
As far as integral types go, char is worthless. You shouldn't make functions that take parameters of type char, and you should not create temporary variables of type char, and the same advice goes for wchar_t as well.
char* may be a convenient type for a character string, but it is a novice mistake to think of it as an "array of characters" or a "pointer to an array of characters" - despite what the cdecl tool says. Treating it as an actual array of characters, with nonsense like this:
for (int i = 0; s[i]; ++i) {
    wchar_t wc = s[i];
    char c = doit(wc);
    out[i] = c;
}
is absurdly wrong. It will not do what you want; it will break in subtle and serious ways, behave differently on different platforms, and you will most certainly confuse the hell out of your users. If you see this, you are trying to reimplement wcstombs() - which is part of ANSI C already - but it's still wrong.
You're really looking for iconv(), which converts a character string from one encoding (even if it's packed into a wchar_t array) into a character string of another encoding.
Now go read this, to learn what's wrong with iconv.
An easy way is:
wstring your_wchar_in_ws(<your wchar>);
string your_wchar_in_str(your_wchar_in_ws.begin(), your_wchar_in_ws.end());
const char* your_wchar_in_char = your_wchar_in_str.c_str();
I've been using this method for years :)
A short function I wrote a while back to pack a wchar_t array into a char array. Characters outside the ASCII range (0-127) are replaced by '?' characters, and it handles surrogate pairs correctly.
size_t to_narrow(const wchar_t * src, char * dest, size_t dest_len){
    size_t i = 0;   // index into src
    size_t j = 0;   // index into dest
    while (src[i] != L'\0' && j + 1 < dest_len){
        wchar_t code = src[i];
        if (code < 128)
            dest[j++] = char(code);
        else{
            dest[j++] = '?';
            if (code >= 0xD800 && code <= 0xDBFF)
                // lead surrogate: skip the next code unit, which is the trail
                i++;
        }
        i++;
    }
    dest[j] = '\0';
    return j;   // number of characters written, excluding the terminator
}
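For example (buffer size chosen arbitrarily):

const wchar_t* wide = L"héllo";
char narrow[16];
to_narrow(wide, narrow, sizeof narrow);
// narrow now holds "h?llo": 'é' is outside 0-127, so it becomes '?'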
Technically, 'char' could have the same range as either 'signed char' or 'unsigned char'. For the unsigned case, your range is correct; theoretically, for the signed case, your condition is wrong - values 128 through 255 won't fit. In practice, very few compilers will object - and the result will be the same.
Nitpick: the last && in the assert is a syntax error.
Whether the assertion is appropriate depends on whether you can afford to crash when the code gets to the customer, and what you could or should do if the assertion condition is violated but the assertion is not compiled into the code. For debug work, it seems fine, but you might want an active test after it for run-time checking too.
Here's another way of doing it; remember to call free() on the result.
char* wchar_to_char(const wchar_t* pwchar)
{
    // get the number of characters in the string, including the terminator
    size_t charCount = 0;
    while (pwchar[charCount] != L'\0')
    {
        charCount++;
    }
    charCount++; // room for the '\0'

    // allocate one byte per character instead of sizeof(wchar_t) bytes each
    char* filePathC = (char*)malloc(charCount);
    for (size_t i = 0; i < charCount; i++)
    {
        // convert to char (truncates to the low byte; the final L'\0' becomes '\0')
        filePathC[i] = (char)pwchar[i];
    }
    return filePathC;
}
One could also convert wchar_t --> wstring --> string --> char:
wchar_t wide = /* something */;
wstring wstrValue(1, wide);   // a wstring holding the single wide character
string strValue;
strValue.assign(wstrValue.begin(), wstrValue.end()); // convert wstring to string
char char_value = strValue[0];
In general, no. int(wchar_t(255)) == int(unsigned char(255)) of course, but that just means they have the same int value. They may not represent the same characters.
You would see such a discrepancy on the majority of Windows PCs, even. For instance, in Windows code page 1250, char(0xFF) is the same character as wchar_t(0x02D9) (dot above), not wchar_t(0x00FF) (small y with diaeresis).
Note that it does not even hold for the ASCII range, as C++ doesn't even require ASCII. On IBM systems in particular (which use EBCDIC), you may see that 'A' != 65.