I've had some trouble with binary-to-printable-hex conversions. I've reached a way of writing the code that works on my system, but I need to know whether it is portable to all systems (OS & hardware).
So this is my function (trying to construct a UUID from a piece of binary text):
int extractInfo( unsigned char * text )
{
    char h[3];
    int i, ret;

    this->str.append( "urn:uuid:" );
    for( i = 56; i < 72; i++ )
    {
        ret = snprintf( h, 3, "%02x", text[i] );
        if( ret != 2 )
            return 1;
        this->str.append( h );
        if( i == 59 || i == 61 || i == 63 || i == 65 )
            this->str.append( "-" );
    }
    return 0;
}
I understood that, because of sign extension, my values are not printed correctly if I use char instead of unsigned char (C++ read binary file and convert to hex). Accepted and modified accordingly.
But I've encountered more variants of doing this: Conversion from binary file to hex in C, and I am really lost. In unwind's piece of code:
sprintf(hex, "%02x", (unsigned int) buffer[0] & 0xff);
I did not understand why, although the array is unsigned char (as defined in the code originally posted by the asker), a cast to unsigned int is needed, and also a bitwise AND on the byte to be converted...
So, since I did not understand the sign-extension thing very well, can you tell me at least whether the piece of code I wrote will work on all systems?
Since printf is not typesafe, it expects an argument of a specific (promoted) type for each formatting specifier. That's why you have to cast your character argument to unsigned int when you use a formatting specifier that expects an int-sized type: the "%x" specifier requires an unsigned int. The & 0xff additionally masks off any high bits that sign extension may have set, in case the byte originated in a plain (possibly signed) char.
I'm trying to display an integer on an LCD display. The way the LCD works is that you send it an 8-bit ASCII character and it displays the character.
The code I have so far is:
unsigned char text[17] = "ABCDEFGHIJKLMNOP";
int32_t n = 123456;
lcd.printInteger(text, n);
//-----------------------------------------
void LCD::printInteger(unsigned char headLine[17], int32_t number)
{
    //......
    int8_t str[17];
    itoa(number, (char*)str, 10);

    for(int i = 0; i < 16; i++)
    {
        if(str[i] == 0x0)
            break;
        this->sendCharacter(str[i]);
        _delay_ms(2);
    }
}

void LCD::sendCharacter(uint8_t character)
{
    //....
    *this->cOutputPort = character;
    //...
}
So if I try to display 123456 on the LCD, it actually displays -7616, which obviously is not the correct integer.
I know there is probably a problem because I convert the characters to signed int8_t and then output them as unsigned uint8_t. But I have to output them in unsigned format. I don't know how to convert the int32_t input integer to an ASCII uint8_t string.
On your architecture, int is an int16_t, not int32_t. Thus, itoa treats 123456 as -7616, because:
123456 = 0x0001_E240
-7616 = 0xFFFF_E240
They are the same if you truncate them down to 16 bits - that's what your code is doing. Instead of using itoa, you have the following options:
calculate the ASCII representation yourself (a sketch follows at the end of this answer);
use ltoa(long value, char * buffer, int radix), if available, or
leverage s[n]printf if available.
For the last option you can use the following, "mostly" portable code:
void LCD::printInteger(unsigned char headLine[17], int32_t number) {
    ...
    char str[17];
    if (sizeof(int) == sizeof(int32_t))
        snprintf(str, sizeof(str), "%d", number);
    else if (sizeof(long int) == sizeof(int32_t))
        snprintf(str, sizeof(str), "%ld", number);
    else if (sizeof(long long int) == sizeof(int32_t))
        snprintf(str, sizeof(str), "%lld", number);
    ...
}
If, and only if, your platform doesn't have snprintf, you can use sprintf and remove the 2nd argument (sizeof(str)). Your go-to function should always be the n variant, as it gives you one less bullet to shoot your foot with :)
Since you're compiling with a C++ compiler that is, I assume, at least half-decent, the above should do "the right thing" in a portable way, without emitting all the unnecessary code. The test conditions passed to if are compile-time constant expressions. Even some fairly old C compilers could deal with such properly.
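For the first option, here is a minimal hand-rolled sketch; the function name and the 12-byte buffer requirement are my own assumptions, not part of the question's code:
#include <cstdint>

// Build the decimal ASCII representation of a 32-bit value by hand.
// buf must have room for at least 12 bytes (sign + 10 digits + '\0').
void int32_to_ascii(int32_t value, char *buf)
{
    char tmp[12];
    int len = 0;
    // Work on an unsigned copy so that INT32_MIN doesn't overflow on negation.
    uint32_t u = (value < 0) ? 0u - static_cast<uint32_t>(value)
                             : static_cast<uint32_t>(value);
    do {
        tmp[len++] = static_cast<char>('0' + u % 10); // least significant digit first
        u /= 10;
    } while (u != 0);

    int pos = 0;
    if (value < 0)
        buf[pos++] = '-';
    while (len > 0)           // digits were produced in reverse order
        buf[pos++] = tmp[--len];
    buf[pos] = '\0';
}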
Nitpick: Don't use int8_t where a char would do. itoa, s[n]printf, etc. expect char buffers, not int8_t buffers.
I have a binary file from which I load the whole text into an unsigned char[], and a variable const uint32_t LITTLE_ENDIAN_ID = 0x49696949;
I need to compare the first four characters of the loaded char[] with the given uint32_t.
Is that possible somehow?
If buff is your unsigned char[] buffer, you can do:
memcmp(&LITTLE_ENDIAN_ID, buff, 4) == 0
memcmp is defined in string.h
Yes, it's absolutely possible, but your question is underspecified. What you want to do is take the first 4 characters of your character array and convert them into a uint32_t. The obvious question: which character corresponds to which byte of the 32-bit int? That is essentially asking whether the bytes are stored in little-endian or big-endian order. Though now that I see your LITTLE_ENDIAN_ID, I realize that it doesn't matter - it's (oddly) the same forwards and backwards.
Anyhow, what you want is either:
unsigned char text[] = ...
uint32_t x = ((uint32_t)text[0] << 24) | ((uint32_t)text[1] << 16) | ((uint32_t)text[2] << 8) | (uint32_t)text[3];
if (x == LITTLE_ENDIAN_ID)
    // do something
Or the same thing, but with
uint32_t x = ((uint32_t)text[3] << 24) | ((uint32_t)text[2] << 16) | ((uint32_t)text[1] << 8) | (uint32_t)text[0];
Alternatively we could do something a little more unusual like
union converter {
    uint32_t int_value;
    unsigned char characters[4];
};

unsigned char text[] = ...

converter x;
for (int i = 0; i < 4; i++)
    x.characters[i] = text[i];

if (x.int_value == LITTLE_ENDIAN_ID)
    // do something
This is probably closer to what you want if you are actually looking to test the endianness of the current system.
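For completeness, here is a memcpy-based sketch (the function name is mine) that does the same job as the union trick while sidestepping any aliasing questions; like the union, it compares in host byte order:
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret the first four bytes as a uint32_t.
uint32_t first_four_bytes(const unsigned char *text)
{
    uint32_t x;
    std::memcpy(&x, text, sizeof x); // well defined for trivially copyable types
    return x;
}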
I have some data coming in from a sensor. The data is in the range of a signed int, 16 bits or so. I need to send the data out via Bluetooth.
Problem:
The data is -1564, let's say. The Bluetooth transmits -, 1, 5, 6, then 4. This is inefficient. I can process the data on the PC later; I just need the frequency to go up.
My Idea/ Solution:
Have it convert to binary, then to ASCII for output. I can convert the ASCII later in processing. I have the binary part (found on StackOverflow) here:
inline void printbincharpad(char c)
{
    for (int i = 7; i >= 0; --i)
    {
        putchar( (c & (1 << i)) ? '1' : '0' );
    }
}
This outputs binary very well. But having the Bluetooth transmit, say, 24 spits out 1, 1, 0, 0, then 0 - in fact, slower than just sending 2, then 4.
Say I have 65062 coming out of the sensor: as decimal text that's 5 bytes to transmit. As binary text it's 1111111000100110, 16 bytes. As raw ASCII it's þ& (yes, the character set here is small, I know, but it's unique), just 2 bytes! In hex it's FE26, 4 bytes. A savings of 3 vs. decimal, 14 vs. binary, and 2 vs. hex. OK, obviously, I want ASCII sent out here.
My Question:
So, how do I convert to ASCII if given a binary input?
That ASCII is what I want to send.
Hedging:
Yes, I code in MATLAB more than C++. This is for a microcontroller. The baud rate is 115200. No, I don't know how the above code works, and I don't know where putchar's documentation is found. If you know of a library that I need to run this, please tell me, as I do not know.
Thank you for any and all help or advice, I do appreciate it.
EDIT: In response to some of the comments: it's two 16-bit registers I am reading from, so data loss is impossible.
putchar writes to the standard output, which is usually the console.
You may take a look at the other output functions in the cstdio (or stdio.h) library.
Anyways, using putchar(), here's one way to achieve what you're asking for:
void print_bytes (int n)
{
    char *p = (char *) &n;
    for (size_t i = 0; i < sizeof (n); ++i) {
        putchar (p[i]);
    }
}
If you know for certain that you only want 16 bits from the integer, you can simplify it like this:
void print_bytes (int n)
{
    char b = n & 0xff;        // low byte
    char a = (n >> 8) & 0xff; // high byte
    putchar (a);              // high byte is sent first
    putchar (b);
}
Looks like when you say ASCII, you mean Base 256. You can search for solutions to converting from Base 10 to Base 256.
Here is a C program that converts a string containing 65062 (5 characters) to a string of 2 characters:
#include <stdio.h>
#include <stdlib.h>

int main()
{
    char* inputString = "65062";
    int input;
    char* tmpString;
    char* outString;
    int Counter;

    input = atoi(inputString);
    outString = malloc(sizeof(input) + 1);
    tmpString = (char*)&input; /* reinterpret the int's bytes as chars */
    for (Counter = 0; Counter < sizeof(input); Counter++) {
        outString[Counter] = tmpString[Counter];
    }
    outString[sizeof(input)] = '\0';
    printf("outString = %s\n", outString);
    free(outString);
    return 0;
}
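On the receiving end you'd reverse the copy. Here is a sketch (the function name is mine), assuming all sizeof(int) bytes actually arrive and both machines share endianness and int size; note that the %s print above stops at the first zero byte, so for real transmission you'd send the raw bytes with a known length rather than as a C string:
#include <cstring>

// Hypothetical decoder for the program above: rebuild the int
// from its raw bytes.
int decode_bytes(const char *bytes)
{
    int value;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}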
I receive values using winsock from another computer on the network. It is a TCP socket, with the first 4 bytes of the message carrying its size. The rest of the message is formatted by the server using protobuf (protocol buffers from Google).
The problem, I think, is that the values sent by the server seem to be hex values sent as char (i.e. only 10 received for 0x10). To receive the values, I do this:
bytesreceived = recv(sock, buffer, msg_size, 0);
for (int i = 0; i < bytesreceived; i++)
{
    data_s << hex << buffer[i];
}
where data_s is a stringstream. Then I can use the ParseFromIstream(&data_s) method from protobuf and recover the information I want.
The problem is that this is VERY, VERY slow (I have another implementation using QSock that I can't use for my project, but which is much faster, so there is no problem on the server side).
I tried many things that I took from here and everywhere on the internet (using arrays of bytes, strings), but nothing works.
Do I have any other options ?
Thank you for your time and comments ;)
Not sure if this will be of any use, but I've used a similar protocol before (the first 4 bytes hold an int with the length, the rest is encoded using protobuf). To decode it I did something like this (probably not the most efficient solution, due to appending to strings):
// Once I've got the first 4 bytes, cast it to an int:
int msgLen = ntohl(*reinterpret_cast<const int*>(buffer));

// Check I've got enough bytes for the message; if I have,
// just parse the buffer directly
MyProtobufObj obj;
if( bytesreceived >= msgLen+4 )
{
    obj.ParseFromArray(buffer+4, msgLen);
}
else
{
    // just keep appending buffer to an STL string until I have
    // msgLen+4 bytes and then do
    // obj.ParseFromString(myStlString)
}
I wouldn't use the stream operators. They're for formatted data and that's not what you want.
You can keep the values received in a std::vector<char> (a vector of bytes); that is essentially just a dynamic array. If you want to continue using a string stream, you can use the stringstream::write function, which takes a buffer and a length. You already have the buffer and the number of bytes received from your call to recv.
If you want to use the vector method, you can use std::copy to make it easier.
#include <algorithm>
#include <iterator>
#include <vector>
char buf[256];
std::vector<char> bytes;

int n = recv(sock, buf, 256, 0); // recv reports errors with a negative return, which a size_t would hide
if (n > 0)
    std::copy(buf, buf + n, std::back_inserter(bytes));
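And if you'd rather keep the stringstream, here is a sketch of the write route mentioned above (the helper name is mine; the buffer and count come from your existing recv call):
#include <sstream>

// Unlike operator<<, write() copies raw bytes with an explicit length,
// so no formatting (and no hex manipulator) is involved.
void append_raw(std::stringstream &data_s, const char *buffer, int bytesreceived)
{
    if (bytesreceived > 0)
        data_s.write(buffer, bytesreceived);
}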
Your question is kind of ambiguous. Let's follow your example: you receive 10 as characters and you want to retrieve this as a hex number.
Assuming recv will give you this character string, you can do this.
First of all, make it null terminated (note that buffer is the string; bytesreceived is just the count):
buffer[bytesreceived] = '\0';
Then you can very easily read the value from this buffer using the standard *scanf family for strings:
int hexValue;
sscanf(buffer, "%x", &hexValue);
There you go!
Edit: If you receive the number in reverse order (so 01 for 10), probably your best shot is to convert it manually:
int hexValue = 0;
int positionValue = 1;
for (int i = 0; i < msg_size; ++i)
{
    int digit = 0;
    if (buffer[i] >= '0' && buffer[i] <= '9')
        digit = buffer[i] - '0';
    else if (buffer[i] >= 'a' && buffer[i] <= 'f')
        digit = buffer[i] - 'a' + 10;
    else if (buffer[i] >= 'A' && buffer[i] <= 'F')
        digit = buffer[i] - 'A' + 10;
    else // Some kind of error!
        return error;
    hexValue += digit * positionValue;
    positionValue *= 16;
}
This is just a clear example though. In reality you would do it with bit shifting rather than multiplying, as sketched below.
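For the usual most-significant-digit-first order, the shift form looks like this; a sketch with my own function name (the reverse-order case above would keep its positionValue logic):
// Parse a hex string of len characters; returns -1 on a bad digit.
int parse_hex(const char *s, int len)
{
    int value = 0;
    for (int i = 0; i < len; ++i)
    {
        char c = s[i];
        int digit;
        if (c >= '0' && c <= '9')
            digit = c - '0';
        else if (c >= 'a' && c <= 'f')
            digit = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F')
            digit = c - 'A' + 10;
        else
            return -1; // some kind of error
        value = (value << 4) | digit; // each hex digit carries 4 bits
    }
    return value;
}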
What data type is buffer?
The whole thing looks like a great big no-op, since operator<<(stringstream&, char) ignores the base specifier. The hex specifier only affects formatting of non-character integral types. For certain you don't want to be handing textual data to protobuf.
Just hand the buffer pointer to protobuf, you're done.
OK, a shot in the dark: let's say your ingress stream is "71F4E81DA...", and you want to turn this into the byte stream { 0x71, 0xF4, 0xE8, ... }. Then we can assemble each byte from two characters as follows, schematically:
char * p = getCurrentPointer();

while (chars_left() >= 2)
{
    unsigned char b;
    b = get_byte_value(*p++) << 4; // high nibble: each hex digit carries 4 bits
    b += get_byte_value(*p++);     // low nibble
    output_stream.insert(b);
}
Here we use a little helper function:
unsigned char get_byte_value(char c)
{
    if ('0' <= c && c <= '9') return c - '0';
    if ('A' <= c && c <= 'F') return 10 + c - 'A';
    if ('a' <= c && c <= 'f') return 10 + c - 'a';
    return 0; // error
}
I was wondering, is it safe to do this?
wchar_t wide = /* something */;
assert(wide >= 0 && wide < 256 &&);
char myChar = static_cast<char>(wide);
If I am pretty sure the wide char will fall within ASCII range.
Why not just use a library routine, wcstombs()?
assert is for ensuring that something is true in a debug mode, without it having any effect in a release build. Better to use an if statement and have an alternate plan for characters that are outside the range, unless the only way to get characters outside the range is through a program bug.
Also, depending on your character encoding, you might find a difference between the Unicode characters 0x80 through 0xff and their char version.
You are looking for wctomb(): it's in the ANSI standard, so you can count on it. It works even when the wchar_t uses a code above 255. You almost certainly do not want to use it.
wchar_t is an integral type, so your compiler won't complain if you actually do:
char x = (char)wc;
but because it's an integral type, there's absolutely no reason to do this. If you accidentally read Herbert Schildt's C: The Complete Reference, or any C book based on it, then you're completely and grossly misinformed. Characters should be of type int or better. That means you should be writing this:
int x = getchar();
and not this:
char x = getchar(); /* <- WRONG! */
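The classic reason is EOF: getchar() must be able to return every byte value plus one out-of-band marker, and only an int can hold both distinctly. A minimal sketch:
#include <cstdio>

int main()
{
    int c; // int, not char: EOF would not survive the narrowing
    while ((c = std::getchar()) != EOF)
        std::putchar(c);
}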
As far as integral types go, char is worthless. You shouldn't make functions that take parameters of type char, and you should not create temporary variables of type char, and the same advice goes for wchar_t as well.
char* may be a convenient type for a character string, but it is a novice mistake to think of it as an "array of characters" or a "pointer to an array of characters" - despite what the cdecl tool says. Treating it as an actual array of characters, with nonsense like this:
for(int i = 0; s[i]; ++i) {
    wchar_t wc = s[i];
    char c = doit(wc);
    out[i] = c;
}
is absurdly wrong. It will not do what you want; it will break in subtle and serious ways, behave differently on different platforms, and you will most certainly confuse the hell out of your users. If you see this, you are trying to reimplement wcstombs(), which is part of ANSI C already, but it's still wrong.
You're really looking for iconv(), which converts a character string from one encoding (even if it's packed into a wchar_t array), into a character string of another encoding.
Now go read this, to learn what's wrong with iconv.
An easy way is :
wstring your_wchar_in_ws(<your wchar>);
string your_wchar_in_str(your_wchar_in_ws.begin(), your_wchar_in_ws.end());
const char* your_wchar_in_char = your_wchar_in_str.c_str(); // c_str() returns a const pointer
I've been using this method for years :)
A short function I wrote a while back to pack a wchar_t array into a char array. Characters that aren't in the ASCII range (0-127) are replaced by '?' characters, and it handles surrogate pairs correctly.
size_t to_narrow(const wchar_t * src, char * dest, size_t dest_len){
    size_t i = 0; // index into src
    size_t j = 0; // index into dest
    wchar_t code;
    while (src[i] != '\0' && j < (dest_len - 1)){
        code = src[i];
        if (code < 128)
            dest[j] = char(code);
        else{
            dest[j] = '?';
            if (code >= 0xD800 && code <= 0xDBFF)
                // lead surrogate, skip the next code unit, which is the trail
                i++;
        }
        i++;
        j++;
    }
    dest[j] = '\0';
    return j; // number of characters written, not counting the terminator
}
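A quick usage sketch (the buffer size and sample string are my own):
#include <cstdio>

int main()
{
    char buf[64];
    // '\u00E9' (é) is outside 0-127, so it is replaced by '?'
    size_t n = to_narrow(L"caf\u00E9", buf, sizeof buf);
    std::printf("%s (%zu chars)\n", buf, n); // prints: caf? (4 chars)
    return 0;
}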
Technically, 'char' could have the same range as either 'signed char' or 'unsigned char'. For the unsigned characters, your range is correct; theoretically, for signed characters, your condition is wrong. In practice, very few compilers will object - and the result will be the same.
Nitpick: the last && in the assert is a syntax error.
Whether the assertion is appropriate depends on whether you can afford to crash when the code gets to the customer, and what you could or should do if the assertion condition is violated but the assertion is not compiled into the code. For debug work, it seems fine, but you might want an active test after it for run-time checking too.
Here's another way of doing it, remember to use free() on the result.
char* wchar_to_char(const wchar_t* pwchar)
{
    // get the number of characters in the string.
    int currentCharIndex = 0;
    wchar_t currentChar = pwchar[currentCharIndex];
    while (currentChar != L'\0')
    {
        currentCharIndex++;
        currentChar = pwchar[currentCharIndex];
    }
    const int charCount = currentCharIndex + 1; // including the terminator

    // allocate a new block of memory sized in char (1 byte) instead of wide char (2 or 4 bytes)
    char* filePathC = (char*)malloc(sizeof(char) * charCount);

    for (int i = 0; i < charCount; i++)
    {
        // truncate each wide character to a char (1 byte); the terminating
        // '\0' is copied too, since charCount includes it
        filePathC[i] = (char)pwchar[i];
    }
    return filePathC;
}
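A usage sketch (the sample string is mine); the caller owns the returned buffer:
#include <cstdio>
#include <cstdlib>

int main()
{
    char *s = wchar_to_char(L"hello"); // function from the answer above
    std::printf("%s\n", s);            // prints: hello
    std::free(s);                      // remember: the caller must free
    return 0;
}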
One could also convert wchar_t --> wstring --> string --> char:
wchar_t wide = /* something */;

wstring wstrValue(1, wide); // build a one-character wstring; writing to [0] of an empty wstring would be undefined

string strValue;
strValue.assign(wstrValue.begin(), wstrValue.end()); // convert wstring to string

char char_value = strValue[0];
In general, no. int(wchar_t(255)) == int(char(255)) of course, but that just means they have the same int value. They may not represent the same characters.
You would see such a discrepancy in the majority of Windows PCs, even. For instance, on Windows Code page 1250, char(0xFF) is the same character as wchar_t(0x02D9) (dot above), not wchar_t(0x00FF) (small y with diaeresis).
Note that it does not even hold for the ASCII range, as C++ doesn't even require ASCII. On IBM systems in particular you may see that 'A' != 65 (they use EBCDIC, where 'A' is 193).