Assign Unicode values to char - c++

My code includes a loop that checks every character of a std::string and assigns it to a char variable, and it fails an assertion when -1 >= c >= 255.
It is a method from a JSON parser class that is not mine:
static std::string UnescapeJSONString(const std::string& str)
{
    std::string s = "";
    for (int i = 0; i < str.length(); i++)
    {
        char c = str[i]; // << FAILS HERE ON THE 'É' CHARACTER
        if ((c == '\\') && (i + 1 < str.length()))
        {
            int skip_ahead = 1;
            unsigned int hex;
            std::string hex_str;
            switch (str[i+1])
            {
                case '"' : s.push_back('\"'); break;
                case '\\': s.push_back('\\'); break;
                case '/' : s.push_back('/'); break;
                case 't' : s.push_back('\t'); break;
                case 'n' : s.push_back('\n'); break;
                case 'r' : s.push_back('\r'); break;
                case 'b' : s.push_back('\b'); break;
                case 'f' : s.push_back('\f'); break;
                case 'u' : skip_ahead = 5;
                           hex_str = str.substr(i + 4, 2);
                           hex = (unsigned int)std::strtoul(hex_str.c_str(), nullptr, 16);
                           s.push_back((char)hex);
                           break;
                default: break;
            }
            i += skip_ahead;
        }
        else
            s.push_back(c);
    }
    return Trim(s);
}
How can I assign a Unicode value to a char? In this case the value is É, and the code is not ready to receive such characters.
This is included in a DLL, and it gives me this error:

The std::string doesn't use Unicode; it stores a sequence of plain char bytes. This is evident because there is a method c_str that lets you get a char array from the std::string.
Answering your question, your test is wrong:
-1 >= c && c >= 255
It should be:
-1 <= c && c <= 255
But you can't get c anywhere near 255, since char is signed on your platform and its range is -128 to 127.
If you want to get 255 out of a char, it needs to be an unsigned char.
That will not let you reach -1, though.
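To see why 'É' trips the range check, here is a minimal sketch (assuming a Latin-1/Windows-1252 encoded string and a platform where plain char is signed): the byte 0xC9 read through a signed char comes out negative, and going through unsigned char puts it back into [0, 255]:
#include <iostream>
#include <string>

int main()
{
    std::string str = "\xC9"; // 'É' in Latin-1: the single byte 0xC9
    char c = str[0];          // plain char is signed here: c == -55
    unsigned char u = static_cast<unsigned char>(c); // u == 201, inside [0, 255]
    std::cout << (int)c << ' ' << (int)u << '\n';    // prints: -55 201
}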
Read about char*'s here:
http://www.cplusplus.com/doc/tutorial/variables/
see char arrays here:
http://www.cplusplus.com/doc/tutorial/ntcs/
see std::string here:
http://www.cplusplus.com/reference/string/string/

How to convert a String to a char * in Arduino?

I'm writing a function to convert an integer into a hexadecimal char * on Arduino, but I came across the problem of not being able to convert a String to a char *. Maybe if there were a way to allocate memory dynamically for a char * I would not need the String class.
char *ToCharHEX(int x)
{
    String s;
    int y = 0;
    int z = 1;
    do
    {
        if (x > 16)
        {
            y = (x - (x % 16)) / 16;
            z = (x - (x % 16));
            x = x - (x - (x % 16));
        }
        else
        {
            y = x;
        }
        switch (y)
        {
            case 0:  s += "0"; continue;
            case 1:  s += "1"; continue;
            case 2:  s += "2"; continue;
            case 3:  s += "3"; continue;
            case 4:  s += "4"; continue;
            case 5:  s += "5"; continue;
            case 6:  s += "6"; continue;
            case 7:  s += "7"; continue;
            case 8:  s += "8"; continue;
            case 9:  s += "9"; continue;
            case 10: s += "A"; continue;
            case 11: s += "B"; continue;
            case 12: s += "C"; continue;
            case 13: s += "D"; continue;
            case 14: s += "E"; continue;
            case 15: s += "F"; continue;
        }
    } while (x > 16 || y * 16 == z);
    char *c;
    s.toCharArray(c, s.length());
    Serial.print(c);
    return c;
}
The toCharArray() function is not converting the String to a char array, and Serial.print(c) prints nothing. I do not know what I can do.
Updated: Your Question re: String -> char* conversion:
String.toCharArray(char* buffer, int length) wants a character array buffer and the size of the buffer.
Specifically - your problems here are that:
char* c is a pointer that is never initialized.
length is supposed to be the size of the buffer. The string knows how long it is.
So, a better way to run this would be:
char c[20];
s.toCharArray(c, sizeof(c));
Alternatively, you could initialize c with malloc, but then you'd have to free it later. Using the stack for things like this saves you time and keeps things simple.
Reference: https://www.arduino.cc/en/Reference/StringToCharArray
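Putting the two fixes together, a minimal sketch (printHex is a hypothetical name; it also leans on the String(value, HEX) constructor mentioned below instead of the hand-rolled digit loop):
void printHex(int x)
{
    String s(x, HEX); // e.g. 45 -> "2d"
    char buf[12];     // stack buffer: up to 8 hex digits plus terminator, with room to spare
    s.toCharArray(buf, sizeof(buf));
    Serial.println(buf);
}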
The intent in your code:
This is basically a duplicate question of: https://stackoverflow.com/a/5703349/1068537
See Nathan's linked answer:
// using an int and a base (hexadecimal):
stringOne = String(45, HEX);
// prints "2d", which is the hexadecimal version of decimal 45:
Serial.println(stringOne);
Unless this code is needed for academic purposes, you should use the mechanisms provided by the standard libraries, and not reinvent the wheel.
String(int, HEX) returns the hex value of the integer you're looking to convert
Serial.print accepts String as an argument
// Note: takes a reference; with a by-value parameter the returned pointer
// would dangle as soon as the function returns.
char* string2char(String& command){
    if(command.length()!=0){
        // Aliases command's internal buffer: valid only while `command`
        // is alive and unmodified.
        char *p = const_cast<char*>(command.c_str());
        return p;
    }
    return nullptr; // the original fell off the end here (undefined behavior)
}

How to form an ASCII(Hex) number using 2 Chars?

I have char byte[0] = '1' (0x31) and byte[1] = 'C' (0x43).
I am using one more buffer, char hex_buff[], and I want to end up with hex_buff[0] = 0x1C (i.e. the combination of byte[0] and byte[1]).
I was using the code below, but I realized it is only valid for the hex digits 0-9:
char s_nibble1 = (byte[0] << 4) & 0xf0;
char s_nibble2 = byte[1] & 0x0f;
hex_buff[0] = s_nibble1 | s_nibble2; // here I want 0x1C, but I get 0x13
What keeps you from using strtol()?
char bytes[] = "1C";
char buff[1];
buff[0] = strtol(bytes, NULL, 16); /* Sets buff[0] to 0x1c aka 28. */
To add to this, as per chux's comment: strtol() only operates on zero-terminated character arrays, which is not necessarily the case in the OP's question.
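If the two digit characters are not zero-terminated, a small sketch of a workaround (reusing the OP's byte and hex_buff arrays) is to copy them into a tiny terminated buffer first:
char tmp[3] = { byte[0], byte[1], '\0' }; /* zero-terminated copy of the two digits */
hex_buff[0] = (char)strtol(tmp, NULL, 16); /* strtol (from <stdlib.h>) now sees "1C" -> 0x1C */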
A possible way to do it, without dependencies with other character manipulation functions:
char hex2byte(char *hs)
{
    char b = 0;
    char nibbles[2];
    int i;
    for (i = 0; i < 2; i++) {
        if ((hs[i] >= '0') && (hs[i] <= '9'))
            nibbles[i] = hs[i] - '0';
        else if ((hs[i] >= 'A') && (hs[i] <= 'F'))
            nibbles[i] = (hs[i] - 'A') + 10;
        else if ((hs[i] >= 'a') && (hs[i] <= 'f'))
            nibbles[i] = (hs[i] - 'a') + 10;
        else
            return 0;
    }
    b = (nibbles[0] << 4) | nibbles[1];
    return b;
}
For example: hex2byte("a1") returns the byte 0xa1.
In your case, you should call the function as: hex_buff[0] = hex2byte(byte).
You are trying to get the nibble by masking out bits of the character code, rather than subtracting the actual value. This is not going to work, because the range is disconnected: there is a gap between ['0'..'9'] and ['A'..'F'] in the encoding, so masking fails.
You can fix this by adding a small helper function, and using it twice in your code:
int hexDigit(char c) {
    c = toupper(c); // Allow mixed-case letters (needs <ctype.h>)
    switch(c) {
        case '0': case '1': case '2': case '3': case '4':
        case '5': case '6': case '7': case '8': case '9':
            return c - '0';
        case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
            return c - 'A' + 10;
        default:
            break; // Report an error (a label needs a statement after it)
    }
    return -1;
}
Now you can code your conversion like this:
int val = (hexDigit(byte[0]) << 4) | hexDigit(byte[1]);
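For completeness, a minimal usage sketch, assuming the hexDigit() helper above is in scope:
#include <stdio.h>

int main(void)
{
    char byte[2] = { '1', 'C' };
    int val = (hexDigit(byte[0]) << 4) | hexDigit(byte[1]);
    printf("0x%02X\n", val); /* prints 0x1C */
    return 0;
}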
It looks like you are trying to convert ASCII hex into internal representation.
There are many ways to do this, but the one I use most often for each nibble is:
int nibval(unsigned short x)
{
    if (('0' <= x) && ('9' >= x))
    {
        return x - '0';
    }
    if (('a' <= x) && ('f' >= x))
    {
        return x - ('a' - 10);
    }
    if (('A' <= x) && ('F' >= x))
    {
        return x - ('A' - 10);
    }
    // Invalid input
    return -1;
}
This uses an unsigned short parameter so that it will work for single-byte characters as well as wchar_t characters.
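A small pairing sketch in the same spirit (byte_from_hex is a hypothetical name; it assumes the nibval() above):
int byte_from_hex(char hi, char lo)
{
    int h = nibval(hi);
    int l = nibval(lo);
    if (h < 0 || l < 0)
        return -1;       /* propagate invalid input */
    return (h << 4) | l; /* e.g. '1','C' -> 0x1C */
}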

How to convert an std::string to an unsigned char[] array *properly*. I think I did it wrong, someone point me in the right direction?

I'm currently reverse engineering a network protocol, and I wrote a small decryption tool.
I used to define the bytes of the packet into an unsigned character array, as so:
unsigned char buff[] = "\x00\xFF\x0A" etc.
In order to not recompile the program multiple times per packet I made a small GUI tool where it would get the bytes in \xFF notation from a string. I did this the following way:
int length = int(stencString.length());
unsigned char *buff = new unsigned char[length+1];
memcpy(buff, stencString.c_str(), length+1);
When I call my function it gives me a proper decryption when I hardcode the bytes using the prior method, but it gives me garbage followed by the rest of my string when I memcpy from the string to the array. The creepy part? They both have the same print output!
Here's how I'm using it:
http://pastie.org/private/kndfbaqgvmjiuwlounss9g
Here's kdxalgo.h (c) Luigi Auriemma:
http://pastie.org/private/7dzemmwyyqtngiamlxy8tw
Can someone point me in the right direction?
Thanks!
See what happens when you use the following for the hardcoded version of buff.
unsigned char buff[] =
"\\xd3\\x8c\\x38\\x6b\\x82\\x4c\\xe1\\x1e"
"\\x6b\\x7a\\xff\\x4c\\x9d\\x73\\xbe\\xab"
"\\x38\\xc7\\xc5\\xb8\\x71\\x8f\\xd5\\xbb"
"\\xfa\\xb9\\xf3\\x7a\\x43\\xdd\\x12\\x41"
"\\x4b\\x01\\xa2\\x59\\x74\\x60\\x1e\\xe0"
"\\x6d\\x68\\x26\\xfa\\x0a\\x63\\xa3\\x88";
I have a suspicion that it will produce the same output as when you enter the following: \xd3\x8c\x38\x6b\x82\x4c\xe1\x1e\x6b\x7a\xff\x4c\x9d\x73\xbe\xab\x38\xc7\xc5\xb8\x71\x8f\xd5\xbb\xfa\xb9\xf3\x7a\x43\xdd\x12\x41\x4b\x01\xa2\x59\x74\x60\x1e\xe0\x6d\x68\x26\xfa\x0a\x63\xa3\x88.
The compiler automatically takes "\xd3" and converts it into the expected underlying binary representation. You need to have a method of converting the characters backslash, x, d, 3 into the same binary representation.
If you are certain that you will receive properly formatted input, then the answer isn't too hard:
unsigned char c2h(char ch)
{
    switch (ch)
    {
        case '0': return 0;
        case '1': return 1;
        case '2': return 2;
        case '3': return 3;
        case '4': return 4;
        case '5': return 5;
        case '6': return 6;
        case '7': return 7;
        case '8': return 8;
        case '9': return 9;
        case 'a': return 10;
        case 'b': return 11;
        case 'c': return 12;
        case 'd': return 13;
        case 'e': return 14;
        case 'f': return 15;
    }
    return 0; // not reached for well-formed input; avoids falling off the end
}
std::string handle_hex(const std::string& str)
{
    std::string result;
    for (size_t index = 0; index < str.length(); index += 4) // each escape is 4 chars: '\', 'x', two digits
    {
        // str[index + 0] is '\\' and str[index + 1] is 'x'
        unsigned char ch = c2h(str[index + 2]) * 16 + c2h(str[index + 3]);
        result.push_back((char)ch); // was append(): std::string::append has no single-char overload
    }
    return result;
}
Again, this assumes perfect formatting, so there is no error handling. I know that I'll lose some points for this answer because it's not the best way of doing this, but I want to make the algorithm as easy to understand as possible.
The problem, as Jeffery points out, is that the compiler processes the \xd3 and generates a character with that value, but when you read \xd3 into a string you are actually reading four characters: \, x, d and 3.
You will need to read the string, and then parse it into valid contents. For a simple approach, you can change the format so that the input is a space separated sequence of characters encoded as 0xd3 (as this is really simple to parse):
std::string buffer;
std::string input( "0xd3 0x8c 0x38" ); // this would be read
std::istringstream in( input );
in >> std::hex;
std::copy( std::istream_iterator<int>( in ),
           std::istream_iterator<int>(),
           std::back_inserter( buffer ) );
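For reference, a self-contained version of that snippet with the headers it needs (a sketch; error handling omitted):
#include <algorithm>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>

int main()
{
    std::string buffer;
    std::istringstream in( "0xd3 0x8c 0x38" ); // stands in for the real input
    in >> std::hex;                            // read the tokens as hex ints
    std::copy( std::istream_iterator<int>( in ),
               std::istream_iterator<int>(),
               std::back_inserter( buffer ) ); // each int narrows to a char
    std::cout << buffer.size() << " bytes\n";  // prints: 3 bytes
}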
Of course, there is no need to change the format, you can process it. For that you will only need to read one character at a time. When you encounter a \ then read the next character, if it is x then read the next two characters (say ch1 and ch2) and transform them into an integer value:
int value_of_hex( char ch ) {
    if (ch >= '0' && ch <= '9')
        return ch - '0';
    if (tolower(ch) >= 'a' && tolower(ch) <= 'f')
        return 10 + tolower(ch) - 'a'; // was toupper(ch) - 'a', which is off by 32
    // error
    throw std::runtime_error( "Invalid input" );
}
value = value_of_hex( ch1 )*16 + value_of_hex( ch2 );
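Tying it together, a hedged sketch of the character-at-a-time parser just described (unescape_hex is a hypothetical name, and it assumes the value_of_hex() above):
std::string unescape_hex( const std::string& str )
{
    std::string out;
    for (size_t i = 0; i < str.size(); ++i) {
        if (str[i] == '\\' && i + 3 < str.size() && str[i + 1] == 'x') {
            // "\xd3" in the text becomes the single byte 0xd3
            out.push_back( (char)(value_of_hex( str[i + 2] ) * 16
                                  + value_of_hex( str[i + 3] )) );
            i += 3; // consume 'x' and the two hex digits
        } else {
            out.push_back( str[i] ); // ordinary character: copy it through
        }
    }
    return out;
}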

String (const char*, size_t) to int?

What's the fastest way to convert a string represented by (const char*, size_t) to an int?
The string is not null-terminated.
Both these ways involve a string copy (and more) which I'd like to avoid.
And yes, this function is called a few million times a second. :p
int to_int0(const char* c, size_t sz)
{
    return atoi(std::string(c, sz).c_str());
}

int to_int1(const char* c, size_t sz)
{
    return boost::lexical_cast<int>(std::string(c, sz));
}
Given a counted string like this, you may be able to gain a little speed by doing the conversion yourself. Depending on how robust the code needs to be, this may be fairly difficult though. For the moment, let's assume the easiest case -- that we're sure the string is valid, containing only digits, (no negative numbers for now) and the number it represents is always within the range of an int. For that case:
int to_int2(char const *c, size_t sz) {
    int retval = 0;
    for (size_t i = 0; i < sz; i++) {
        retval *= 10;
        retval += c[i] - '0';
    }
    return retval;
}
From there, you can get about as complex as you want -- handling leading/trailing whitespace, '-' (but doing so correctly for the maximally negative number in 2's complement isn't always trivial [edit: see Nawaz's answer for one solution to this]), digit grouping, etc.
Another slow version, for uint32:
void str2uint_aux(unsigned& number, unsigned& overflowCtrl, const char*& ch)
{
    unsigned digit = *ch - '0';
    ++ch;
    number = number * 10 + digit;
    unsigned overflow = (digit + (256 - 10)) >> 8;
    // if digit < 10 then overflow == 0
    overflowCtrl += overflow;
}

unsigned str2uint(const char* s, size_t n)
{
    unsigned number = 0;
    unsigned overflowCtrl = 0;
    // for VC++10 the Duff's device is faster than loop
    switch (n)
    {
        default:
            throw std::invalid_argument(__FUNCTION__ " : `n' too big");
        case 10: str2uint_aux(number, overflowCtrl, s);
        case 9:  str2uint_aux(number, overflowCtrl, s);
        case 8:  str2uint_aux(number, overflowCtrl, s);
        case 7:  str2uint_aux(number, overflowCtrl, s);
        case 6:  str2uint_aux(number, overflowCtrl, s);
        case 5:  str2uint_aux(number, overflowCtrl, s);
        case 4:  str2uint_aux(number, overflowCtrl, s);
        case 3:  str2uint_aux(number, overflowCtrl, s);
        case 2:  str2uint_aux(number, overflowCtrl, s);
        case 1:  str2uint_aux(number, overflowCtrl, s);
    }
    // here we can check that all chars were digits
    if (overflowCtrl != 0)
        throw std::invalid_argument(__FUNCTION__ " : `s' is not a number");
    return number;
}
Why is it slow? Because it processes chars one by one. If we had a guarantee that we could access the bytes up to s+16, we could use vectorization for *ch - '0' and digit + 246.
Like in this code:
uint32_t digitsPack = *(uint32_t*)s - '0000';
overflowCtrl |= digitsPack | (digitsPack + 0x06060606); // if one byte is not in range [0;10), its high nibble will be non-zero
number = number * 10 + ((digitsPack >> 24) & 0xFF); // parentheses added: & binds more loosely than +
number = number * 10 + ((digitsPack >> 16) & 0xFF);
number = number * 10 + ((digitsPack >> 8) & 0xFF);
number = number * 10 + (digitsPack & 0xFF);
s += 4;
Small update for range checking: the first snippet has a redundant shift (or mov) on every iteration, so it should be:
unsigned digit = *s - '0';
overflowCtrl |= (digit + 256 - 10);
...
if (overflowCtrl >> 8 != 0) throw ...
Fastest:
int to_int(char const *s, size_t count)
{
    int result = 0;
    size_t i = 0;
    if (s[0] == '+' || s[0] == '-')
        ++i;
    while (i < count)
    {
        if (s[i] >= '0' && s[i] <= '9')
        {
            // see Jerry's comments for explanation why I do this
            int value = (s[0] == '-') ? ('0' - s[i]) : (s[i] - '0');
            result = result * 10 + value;
        }
        else
            throw std::invalid_argument("invalid input string");
        i++;
    }
    return result;
}
Since in the above code the comparison (s[0] == '-') is done on every iteration, we can avoid it by accumulating result as a negative number in the loop, then returning result if s[0] is indeed '-', otherwise returning -result (which makes it positive, as it should be):
int to_int(char const *s, size_t count)
{
    size_t i = 0;
    if (s[0] == '+' || s[0] == '-')
        ++i;
    int result = 0;
    while (i < count)
    {
        if (s[i] >= '0' && s[i] <= '9')
        {
            result = result * 10 - (s[i] - '0'); // assume a negative number
        }
        else
            throw std::invalid_argument("invalid input string");
        i++;
    }
    return s[0] == '-' ? result : -result; // -result is positive!
}
That is an improvement!
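A quick usage sketch for this version (assuming the to_int() above; note the inputs are deliberately not null-terminated):
#include <cassert>

int main()
{
    const char buf[] = { '-', '1', '2', '3' }; // 4 chars, no terminator
    assert(to_int(buf, 4) == -123);
    assert(to_int("42", 2) == 42); // the length bounds the read, not a '\0'
}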
In C++11 you could, however, use any function from the std::stoi family (though these take a std::string, so they do not avoid the copy). There is also the std::to_string family.
llvm::StringRef s(c, sz);
int n;
s.getAsInteger(10, n); // note: returns true on error, false on success
return n;
http://llvm.org/docs/doxygen/html/classllvm_1_1StringRef.html
You'll have to either write a custom routine or use a 3rd-party library if you're dead set on avoiding a string copy.
You probably don't want to write atoi from scratch (it is still possible to introduce a bug there), so I'd advise grabbing an existing atoi from public-domain or BSD-licensed code and modifying it. For example, you can get an existing atoi from the FreeBSD cvs tree.
If you run the function that often, I bet you parse the same number many times. My suggestion is to BCD-encode the string into a static char buffer (you know it won't be very long, since atoi can only handle ±2G) when there are fewer than X digits (X=8 for 32-bit lookup, X=16 for 64-bit lookup), then place a cache in a hash map.
When you're done with the first version, you can probably find nice optimizations, such as skipping the BCD encoding entirely and just using X characters of the string (when its length is <= X) as the key for the hash-table lookup. If the string is longer, you fall back to atoi.
Edit: ... or fallback instead of atoi to Jerry Coffin's solution, which is as fast as they come.
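A rough sketch of that caching idea (to_int_cached and the keying scheme are illustrative, not a tested design; it keys a hash map on the raw characters and falls back to a full parse on a miss):
#include <string>
#include <unordered_map>

int to_int_cached(const char* c, size_t sz)
{
    static std::unordered_map<std::string, int> cache;
    std::string key(c, sz);    // short keys only, per the suggestion above
    auto it = cache.find(key);
    if (it != cache.end())
        return it->second;     // hit: skip the parse entirely
    int v = to_int(c, sz);     // miss: fall back to a parser such as Nawaz's above
    cache[key] = v;
    return v;
}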

testing a string to see if a number is present and assigning that value to a variable while skipping all the non-numeric values?

Given a string, say "a 19 b c d 20", how do I test whether there is a number at a particular position in the string? (Not just the character '1' but the whole number '19' or '20'.)
char s[80];
strcpy(s, "a 19 b c d 20");
int i = 0;
int num = 0;
int digit = 0;
for (i = 0; i < strlen(s); i++) {
    if ((s[i] <= '9') && (s[i] >= '0')) { // how do I test for the whole integer value, not just a digit?
        // if number then convert to integer
        digit = s[i] - 48;
        num = num * 10 + digit;
    }
    if (s[i] == ' ') {
        break; // is this correct here? do nothing
    }
    if (s[i] == 'a') {
        // copy into a temp char
    }
}
These are C solutions:
Are you just trying to parse the numbers out of the string? Then you can just walk the string using strtol().
long num = 0;
char *endptr = NULL;
while (*s) {
    num = strtol(s, &endptr, 10);
    if (endptr == s) { // Not a number here, move on.
        s++;
        continue;
    }
    // Found a number and it is in num. Move to next location.
    s = endptr;
    // Do something with num.
}
If you have a specific location and number to check for you can still do something similar.
For example: Is '19' at position 10?
int pos = 10;
int value = 19;

if (pos >= strlen(s))
    return false;
if (value == strtol(s + pos, &endptr, 10) && endptr != s + pos)
    return true;
return false;
Are you trying to parse out the numbers without using any library routines?
Note: I haven't tested this...
int num = 0;
int sign = 1;
while (*s) {
    // This could be done with an if, too.
    switch (*s) {
        case '-':
            sign = -1;
            // fall through
        case '+':
            s++;
            if (*s < '0' || *s > '9') {
                sign = 1;
                break;
            }
            // fall through
        case '0': case '1': case '2': case '3': case '4':
        case '5': case '6': case '7': case '8': case '9':
            // Parse number, start with zero.
            num = 0;
            do {
                num = (num * 10) + (*s - '0');
                s++;
            } while (*s >= '0' && *s <= '9');
            num *= sign;
            // Restore sign, just in case
            sign = 1;
            // Do something with num.
            break;
        default:
            // Not a number
            s++;
    }
}
It seems like you want to parse the string and extract all the numbers from it; if so, here's a more "C++" way to do it:
string s = "a 19 b c d 20"; // your char array will work fine here too
istringstream buffer(s);
string token;
int num;

while (!buffer.eof())
{
    buffer >> num;           // Try to read a number
    if (!buffer.fail()) {    // if it doesn't work, failbit is set
        cout << num << endl; // It's a number, do what you want here
    } else {
        buffer.clear();      // wasn't a number, clear the failbit
        buffer >> token;     // pull out the non-numeric token
    }
}
This should print out the following:
19
20
The stream extraction operator pulls out space-delimited tokens automatically, so you're saved from having to do any messy character-level operations or manual integer conversion. You'll need to #include <sstream> for the stringstream class.
You can use atoi().
After your if you need to shift to a while to collect subsequent digits until you hit a non-digit.
BUT, more importantly, have you clearly defined your requirements? Will you allow whitespace between the digits? What if there are two numbers, like abc123def456gh?
It's not very clear what you are looking for. Assuming you want to extract all the digits from a string and then form a whole number from the found digits, you can try the following:
int i;
unsigned long num = 0; // to hold the whole number
int digit;             // needs <ctype.h> for isdigit below
for (i = 0; s[i]; i++) { // was `i < s[i]`, a typo
    // see if the ith char is a digit; if yes, extract consecutive digits
    while (isdigit(s[i])) {
        num = num * 10 + (s[i] - '0');
        i++;
    }
    if (!s[i]) break; // stop before the for's i++ steps past the terminator
}
It is assumed that all the digits in your string, when concatenated to form the whole number, will not overflow the long data type.
There's no way to test directly for a whole number. Writing a lexer, as you've done, is one way to go. Another would be to use the C standard library's strtoul function (or a similar function, depending on whether the string can contain floating-point numbers, etc.).
Your code needs to allow for whitespace, and you can use the C library's isdigit to test whether the current character is a digit:
vector<int> parse(string const& s) {
    vector<int> vi;
    for (size_t i = 0; i < s.length();) {
        while (::isspace((unsigned char)s[ i ])) i++; // was missing the closing ')'
        if (::isdigit((unsigned char)s[ i ])) {
            int num = 0; // was `s[ i ] - '0'`, which double-counted the first digit
            while (::isdigit((unsigned char)s[ i ])) {
                num = num * 10 + (s[ i ] - '0');
                ++i;
            }
            vi.push_back(num);
        }
        ....
Another approach would be to use boost::lexical_cast:
vector<string> tokenize(string const& input) {
    vector<string> tokens;
    size_t off = 0, start = 0;
    while ((off = input.find(' ', start)) != string::npos) {
        tokens.push_back(input.substr(start, off - start));
        start = off + 1;
    }
    tokens.push_back(input.substr(start)); // don't drop the final token
    return tokens;
}
vector<int> getint(vector<string> tokens) {
    vector<int> vi;
    for (vector<string>::iterator b = tokens.begin(), e = tokens.end(); b != e; ++b) {
        try
        {
            vi.push_back(lexical_cast<int>(*b)); // was pushing a short into `tokens`
        }
        catch (bad_lexical_cast&) {}
    }
    return vi;
}
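A short usage sketch for the two helpers above (a hedged example, assuming using-declarations for std and boost::lexical_cast are in scope):
#include <iostream>

int main()
{
    vector<int> nums = getint(tokenize("a 19 b c d 20"));
    for (size_t i = 0; i < nums.size(); ++i)
        std::cout << nums[i] << '\n'; // prints 19 then 20
}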