I have spent some time looking for a way to convert a UTF-8 string to a hexadecimal string, and back.
I found some examples and possible solutions, but they all work well only without special characters.
I have the following:
string in="áéíóúñü"
The result should be:
"c3a1c3a9c3adc3b3c3bac3b1c3bc"
I tried the following posts, among others:
C++ convert string to hexadecimal and vice versa
How to convert a string in hexadecimal string?
http://www.cplusplus.com/forum/beginner/161703/
I will try to explain better, but I do not speak English well. Sorry.
I have to send some data over a socket. For that I have to convert names to hexadecimal using UTF-8, but some names contain special characters, for example á, é, í...
When converting ordinary letters I get a string of length 2 per letter.
a-> "61"
e-> "65"
But special characters are encoded (in UTF-8) with length 4:
á-> "c3a1" this is the correct conversion
é-> "c3a9" this is the correct conversion
I have tried every conversion method I could find, including the one suggested below. But every time I convert a special character, I get a 2-digit result, which is not correct.
á-> "e1" this isn't correct
é-> "e9" this isn't correct
Loop over each "character" in the std::string object and output its two-digit hexadecimal equivalent as an int.
For looping, I recommend you look into range-based for loops.
To set the number of digits to print, read about setting the stream field width and fill character.
To print a number as hexadecimal, read about the base I/O manipulators.
To convert to an int read about static_cast.
Oh, and I recommend using an unsigned char for the single "characters".
Simple solution based on the above:
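#include <iomanip>
#include <sstream>
#include <string>

// Returns each byte of `in` as two lowercase hexadecimal digits.
std::string stoh(std::string const& in)
{
    std::ostringstream os;
    os << std::hex << std::setfill('0');  // both settings persist for the stream
    for (unsigned char c : in)
    {
        // setw applies only to the next output, so set it for every byte
        os << std::setw(2) << static_cast<int>(c);
    }
    return os.str();
}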
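The question also asks for the reverse direction. The following is a minimal sketch of both directions without streams; the names to_hex, hex_value and from_hex are hypothetical and not part of the answer above. Note that both functions work on the bytes actually stored in the std::string. If the string holds UTF-8, á is the two bytes c3 a1; if it holds Latin-1 (ISO-8859-1), á is the single byte e1, which is probably where the "e1" result in the question comes from.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: encode every byte of `in` as two lowercase hex digits.
std::string to_hex(const std::string& in)
{
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(in.size() * 2);
    for (unsigned char c : in) {
        out.push_back(digits[c >> 4]);    // high nibble
        out.push_back(digits[c & 0x0F]);  // low nibble
    }
    return out;
}

// Hypothetical helper: numeric value of one hex digit; throws on anything else.
int hex_value(char ch)
{
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    throw std::invalid_argument("not a hex digit");
}

// Hypothetical helper: decode pairs of hex digits back into raw bytes.
std::string from_hex(const std::string& hex)
{
    if (hex.size() % 2 != 0)
        throw std::invalid_argument("odd number of hex digits");
    std::string out;
    out.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        const int byte = hex_value(hex[i]) * 16 + hex_value(hex[i + 1]);
        out.push_back(static_cast<char>(byte));
    }
    return out;
}
```

Writing the input with explicit byte escapes, e.g. to_hex("\xc3\xa1"), avoids any dependence on how the source file itself is encoded.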
Suppose I have a string that contains a numeric character I need, but it is not terminated by '\0'; it has garbage characters instead. Actually, the string has garbage characters after the number. So how to deal with the garbage character while storing that numerical character in another string or variable?
So how to deal with the garbage character while storing that numerical character in another string or variable?
Only copy a substring. Example:
std::string example = "garbage1garbage";
char numerical = example[7];
We got the numerical character excluding the garbage entirely.
If the text to be converted is in a std::string, then you can extract a number from the front as follows:
#include <sstream>
...
std::string input = "128734garbage";
std::istringstream iss{input};
int num;
if (iss >> num)
...use_num...
else
std::cerr << "wasn't able to parse an int from input\n";
Just change int to double, uint64_t, ... - whatever suits your data.
If you have only a pointer to the text and know it's not null-terminated, just getting the text into a std::string is problematic. You could instead use a function that converts text to a number, but stops at the first invalid character. std::stol et al, and the other unsigned and floating point variants linked from the same reference page, are good candidates for that.
From your "another string or variable" - the above addresses storing into a numeric variable. You can then create a new std::string from the number using std::to_string, or a std::ostringstream, if that's what you want to do. This will standardise the output format though, so input like say "1E4" (parsed as a double) might end up looking like say 10000.000000. Alternatively, with the stol-type functions you can use the pointer-to-the-end-of-the-number to work out the length of the numeric part, and use std::string::substr() to extract the leading number as a new std::string object.
You should also be aware that the distinction between number and garbage is not always what you might expect. For example "0XBEFHJQ" might be split by some of the above functions as 0xBEF hex and HJQ garbage.
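The substr() approach described above could be sketched as follows, assuming the text is already in a std::string. The helper name leading_number is hypothetical. std::stol reports through its second argument how many characters it consumed, and that count is used as the substring length.

```cpp
#include <cstddef>
#include <exception>
#include <string>

// Hypothetical helper: returns the leading integer text of `input`,
// or an empty string if `input` does not start with a number.
// Note: std::stol skips leading whitespace, and that whitespace is
// included in the count it reports.
std::string leading_number(const std::string& input, int base = 10)
{
    std::size_t used = 0;
    try {
        std::stol(input, &used, base);  // the value is discarded; only `used` matters here
    } catch (const std::exception&) {
        return std::string();           // no digits at the front, or value out of range
    }
    return input.substr(0, used);
}
```

With base 10, leading_number("128734garbage") yields "128734". With base 0 the prefix decides the base, so "0XBEFHJQ" is split after "0XBEF", which illustrates the hex caveat mentioned above.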
I'm wanting to save the content of a file to a struct. I've tried to use seekg and read to write to it but it isn't working.
My file is something like:
johnmayer24ericclapton32
I want to store the name, the last name and the age in a struct like that
typedef struct test_struct{
string name;
string last_name;
int age;
} test_struct;
Here is my code
int main(){
test_struct ts;
ifstream data_base;
data_base.open("test_file.txt");
data_base.seekg(0, ios_base::beg);
data_base.read(ts, sizeof(test_struct));
data_base.close();
return 0;
}
It doesn't compile because it won't let me pass ts to the read function. Is there another way, or any way, of doing it?
Serialization/Deserialization of strings is tricky.
As binary data the convention is to output the length of the string first, then the string data.
https://isocpp.org/wiki/faq/serialization#serialize-binary-format
String data is tricky because you have to unambiguously know when the string’s body stops. You can’t unambiguously terminate all strings with a '\0' if some string might contain that character; recall that std::string can store '\0'. The easiest solution is to write the integer length just before the string data. Make sure the integer length is written in “network format” to avoid sizeof and endian problems (see the solutions in earlier bullets).
That way when reading the data back in you know the length of the string to expect and can preallocate the size of the string then just read that much data from the stream.
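A minimal sketch of that length-prefix convention is shown below. The function names write_string and read_string are hypothetical, and the 4-byte big-endian ("network format") length is an assumption chosen for illustration; a real format would also need to fix the width of every other field.

```cpp
#include <cstdint>
#include <ios>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: write a 4-byte big-endian length, then the raw bytes.
void write_string(std::ostream& os, const std::string& s)
{
    const std::uint32_t n = static_cast<std::uint32_t>(s.size());
    const char len[4] = {
        static_cast<char>((n >> 24) & 0xFF),
        static_cast<char>((n >> 16) & 0xFF),
        static_cast<char>((n >> 8) & 0xFF),
        static_cast<char>(n & 0xFF)};
    os.write(len, 4);
    os.write(s.data(), static_cast<std::streamsize>(s.size()));
}

// Hypothetical helper: read one length-prefixed string; returns false on
// end of stream or truncated data. A real reader should also cap `n`
// before allocating if the input is untrusted.
bool read_string(std::istream& is, std::string& out)
{
    unsigned char len[4];
    if (!is.read(reinterpret_cast<char*>(len), 4)) return false;
    const std::uint32_t n = (std::uint32_t(len[0]) << 24) |
                            (std::uint32_t(len[1]) << 16) |
                            (std::uint32_t(len[2]) << 8) |
                            std::uint32_t(len[3]);
    std::string tmp(n, '\0');  // preallocate exactly the announced size
    if (n != 0 && !is.read(&tmp[0], static_cast<std::streamsize>(n))) return false;
    out.swap(tmp);
    return true;
}
```

Because the length is stored explicitly, strings containing '\0' survive the round trip. When using files instead of a string stream, open them with std::ios::binary.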
If your data is a non-binary (text) format it's a little trickier:
https://isocpp.org/wiki/faq/serialization#serialize-text-format
String data is tricky because you have to unambiguously know when the string’s body stops. You can’t unambiguously terminate all strings with a '\n' or '"' or even '\0' if some string might contain those characters. You might want to use C++ source-code escape-sequences, e.g., writing '\' followed by 'n' when you see a newline, etc. After this transformation, you can either make strings go until end-of-line (meaning they are delimited by '\n') or you can delimit them with '"'.
If you use C++-like escape-sequences for your string data, be sure to always use the same number of hex digits after '\x' and '\u'. I typically use 2 and 4 digits respectively. Reason: if you write a smaller number of hex digits, e.g., if you simply use stream << "\x" << hex << unsigned(theChar), you’ll get errors when the next character in the string happens to be a hex digit. E.g., if the string contains '\xF' followed by 'A', you should write "\x0FA", not "\xFA".
If you don’t use some sort of escape sequence for characters like '\n', be careful that the operating system doesn’t mess up your string data. In particular, if you open a std::fstream without std::ios::binary, some operating systems translate end-of-line characters.
Another approach for string data is to prefix the string’s data with an integer length, e.g., to write "now is the time" as 15:now is the time. Note that this can make it hard for people to read/write the file, since the value just after that might not have a visible separator, but you still might find it useful.
Text-based serialization/deserialization convention varies but one field per line is an accepted practice.
You'll have to develop a specific algorithm, since there is no separator character between the "fields".
static const std::string input_text = "johnmayer24ericclapton32";
static const std::string alphabet = "abcdefghijklmnopqrstuvwxyz";
static const std::string decimal_digit = "0123456789";
std::string::size_type position = 0;
std::string artist_name;
// Index of the first character that is not a letter, i.e. where the name ends.
position = input_text.find_first_not_of(alphabet);
if (position != std::string::npos)
{
    // substr takes (start, length), so the length of the name is exactly `position`.
    artist_name = input_text.substr(0, position);
}
else
{
    std::cerr << "Artist name not found.";
    return EXIT_FAILURE;
}
Similarly, you can extract the number, then use std::stoi to convert the numeric string to an int.
Edit 1: Splitting the name
Since there is no separator character between the first and last name, you may want to have a list of possible first names and use that to find out where the first name ends and the surname starts.
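Putting the name and number steps together, a sketch of a full parser for input like "johnmayer24ericclapton32" might look like this. The Record type and parse_records function are hypothetical, and the sketch assumes every record is a run of letters followed by a run of digits. It does not attempt to split first and last names, since that needs extra information as noted above.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record type: first and last name are still joined.
struct Record {
    std::string name;
    int age;
};

// Hypothetical parser: reads alternating runs of letters and digits.
// Parsing stops at the first malformed record. std::stoi may throw
// std::out_of_range if a digit run is too long for an int.
std::vector<Record> parse_records(const std::string& text)
{
    std::vector<Record> records;
    std::size_t i = 0;
    while (i < text.size()) {
        std::size_t start = i;
        while (i < text.size() && std::isalpha(static_cast<unsigned char>(text[i]))) ++i;
        const std::string name = text.substr(start, i - start);

        start = i;
        while (i < text.size() && std::isdigit(static_cast<unsigned char>(text[i]))) ++i;
        if (name.empty() || i == start) break;  // missing name or missing age

        records.push_back({name, std::stoi(text.substr(start, i - start))});
    }
    return records;
}
```

For the sample input this yields two records, ("johnmayer", 24) and ("ericclapton", 32).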
This is the code I wrote to convert integer to string.
#include <iostream>
using namespace std;
int main()
{
string s;
int b=5;
s.push_back((char)b);
cout<<s<<endl;
}
I expected the output to be 5 but it is giving me blank space.
I know there is another way of doing it using stringstream but I want to know what is wrong in this method?
Character codes for digits are not equal to the integers the characters represent on typical systems.
It is guaranteed that character codes for decimal digits are consecutive (N3337 2.3 Character sets, Paragraph 3), so you can add '0' to convert a one-digit number to a character.
#include <iostream>
#include <string>
using namespace std;
int main()
{
string s;
int b=5;
s.push_back((char)(b + '0'));
cout<<s<<endl;
}
You are interpreting the integer 5 as a character. In ASCII encoding, 5 is the Enquiry control character, as you can look up in an ASCII table.
The character 5 on the other hand is represented by the decimal number 53.
As others said, you can't convert an integer to a string the way you are doing it.
IMHO, the best way to do it is using the C++11 method std::to_string.
Your example would translate to:
#include <iostream>
#include <string>
using namespace std;
int main()
{
string s;
int b=5;
s = to_string(b);
cout<<s<<endl;
}
The problem in your code is that you are converting the integer 5 to ASCII (=> ENQ ASCII code, which is not "printable").
To convert it to ASCII properly, you have to add the ASCII code of '0' (48), so:
char ascii = b + '0';
However, to convert an integer to std::string use:
std::stringstream ss; //from <sstream>
ss << 5;
std::string s = ss.str ();
I always use this helper function in my projects:
template <typename T>
std::string toString (T arg)
{
std::stringstream ss;
ss << arg;
return ss.str ();
}
Also, you can use stringstream; std::to_string doesn't work for me on GCC.
If we were writing C++ from scratch in 2016, maybe we would make this work. However, as it chose to be (mostly) backward compatible with a fairly low-level language like C, 'char' is in fact just a number that string/printing algorithms interpret as a character, but most of the language doesn't treat it specially, including the cast. So by doing (char) you're only converting a (typically 32-bit) signed number (int) to an (8-bit) number (char).
Then you interpret it as a character when you print it, since printing functions do treat it specially. But the character that gets printed is not '5'. The correspondence is conventional and completely arbitrary; the first numbers were reserved for special codes which are probably obsolete by now. As Hoffman pointed out, the value 5 is the code for Enquiry (whatever it means), while to print '5' the character has to contain the value 53. To print a proper space you'd need to enter 32. It has no meaning other than someone decided this was as good as anything, sometime decades ago, and the convention stuck.
If you need to know for other characters and values, what you need is an "ASCII table". Just google it, you'll find plenty.
You'll notice that numbers and letters of the same case are next to each other in the order you expect, so there is some logic to it at least. Beware, however, it's often not intuitive anyway: uppercase letters are before lowercase ones for instance, so 'A' < 'a'.
I guess you're starting to see why it's better to rely on dedicated system functions for strings!
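To show what such a dedicated function has to do, here is a sketch that extends the "add '0'" trick from the answers above to multi-digit and negative values. The name int_to_string is hypothetical; in real code std::to_string or a stream is the better choice.

```cpp
#include <string>

// Hypothetical helper: convert an int to its decimal text, digit by digit.
std::string int_to_string(int value)
{
    long long v = value;  // widened so that negating the smallest int cannot overflow
    const bool negative = v < 0;
    if (negative) v = -v;

    std::string reversed;
    do {
        // Decimal digit codes are consecutive, so '0' + d is the character for digit d.
        reversed.push_back(static_cast<char>('0' + v % 10));
        v /= 10;
    } while (v != 0);
    if (negative) reversed.push_back('-');

    // Digits were produced least significant first, so reverse them.
    return std::string(reversed.rbegin(), reversed.rend());
}
```

The loop produces the last digit first, which is why the result is reversed at the end.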
I want to convert a double value into a std::string. Currently I'm writing
return std::to_string(double_value);
But this only returns 7 digits because internally to_string() just uses std::vsnprintf with a %f format specifier (see also here).
I could now just call std::vsnprintf manually with %.15f as format specifier but this leads to trailing zeros.
My (in my eyes very obvious) goal now is to have an approach like this:
string o1 = to_string(3.14);
string o2 = to_string(3.1415926536);
assert(o1 == "3.14");
assert(o2 == "3.1415926536");
Here is a nice elaboration on trimming trailing zeros from the %.20 output but this answer is about 8 years old.
Maybe things have changed? Can I convert a double with double precision without trailing zeros in C++ today?
Solution:
Based on 2man's answer you can write a generic function like this:
template<typename T>
inline std::string tostr(T value) {
std::ostringstream s;
s.precision(std::numeric_limits<T>::digits10);
s << value;
return s.str();
}
which behaves as desired for numeric types. Note that I took digits10 rather than max_digits10 to favor a nice decimal representation rather than more digits and a trailing ..0000001
Also, IMHO it's worth adding that [v][s][n]printf() together with the format string "%.15g" (rather than 'f') will also trim trailing zeros (it won't work with more digits because they cannot be represented in a 64-bit double, which would lead to things like a trailing '1', e.g. 3.12 -> "3.1200000000000001").
Still strange:
Maybe someone can tell me why std::to_string(double), which was introduced with C++11, hard-codes vsnprintf(..., "%f", ...) rather than doing something like vsnprintf("%.15g"), which would result in a more precise representation without affecting C code?
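The "%.15g" variant mentioned above could be sketched like this. The name double_to_string and the buffer size of 32 are assumptions for illustration; 32 bytes is comfortably more than "%.15g" can produce for a double.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: format a double with up to 15 significant digits.
// The %g conversion drops trailing zeros on its own.
std::string double_to_string(double value)
{
    char buf[32];
    const int n = std::snprintf(buf, sizeof buf, "%.15g", value);
    if (n < 0 || static_cast<std::size_t>(n) >= sizeof buf)
        return std::string();  // formatting error or truncated output
    return std::string(buf, static_cast<std::size_t>(n));
}
```

Note that %g switches to exponent notation for very large or very small magnitudes, and that the decimal separator follows the current C locale.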
You can use a string stream (<sstream>) with stream manipulators, see the example below:
std::stringstream ss1;
std::stringstream ss2;
ss1.precision(15);
ss1 << 3.14;
std::cout << ss1.str()<<' '<<("3.14" == ss1.str())<<std::endl;
ss2.precision(15);
ss2 << 3.1415926536;
std::cout << ss2.str()<<' '<<("3.1415926536" == ss2.str())<<std::endl;
Or you can use Boost.Format (boost::format from <boost/format.hpp>):
std::cout << boost::format("%.2f") % 3.14 << std::endl;
std::cout << boost::format("%.10f") % 3.1415926536 << std::endl;
I have a string containing hexadecimal values (two characters representing a byte). I would like to use std::stringstream to make the conversion as painless as possible, so I came up with the following code:
std::string a_hex_number = "e3";
{
unsigned char x;
std::stringstream ss;
ss << std::hex << a_hex_number;
ss >> x;
std::cout << x << std::endl;
}
To my biggest surprise this prints out "e" ... Of course I don't give up so easily, and I modify the code to be:
{
unsigned short y;
std::stringstream ss;
ss << std::hex << a_hex_number;
ss >> y;
std::cout << y << std::endl;
}
This, as expected, prints out 227 ...
I looked at http://www.cplusplus.com/reference/istream/istream/operator%3E%3E/ and http://www.cplusplus.com/reference/ios/hex/ but I just could not find a reference which tells me more about why this behaviour occurs ... (yes, I feel that it is right because when extracting a character it should take one character, but I am a little bit confused that std::hex is ignored for characters). Is there a mention of this situation somewhere?
(http://ideone.com/YHt7Fz)
Edit: I am specifically interested in whether this behaviour is mentioned in any of the STL standards.
If I understand correctly, you're trying to convert a string in
hex to an unsigned char. So for starters, since this is
"input", you should be using std::istringstream:
std::istringstream ss( a_hex_number );
ss >> std::hex >> variable;
Beyond that, you want the input to parse the string as an
integral value. Streams do not consider character types as
numeric values; they read a single character into them (after
skipping leading white space). To get a numeric value, you
should input to an int, and then convert that to unsigned
char. Characters don't have a base, so std::hex is
irrelevant for them. (The same thing holds for strings, for
example, and even for floating point.)
With regards to the page you cite: the page doesn't mention
inputting into a character type (strangely enough, because it
does talk about all other types, including some very special
cases). The documentation for the std::hex manipulator is
also weak: in the running text, it only says that "extracted
values are also expected to be in hexadecimal base", which isn't
really correct; in the table, however, it clearly talks about
"integral values". In the standard, this is documented in
§27.7.2.2.3. (The >> operators for character types are not
member functions, but free functions, so are defined in
a different section.) What we are missing, however, is a good
document which synthesizes these sort of things: whether the
>> operator is a member or a free function doesn't really
affect the user much; you want to see all of the >> available,
with their semantics, in one place.
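The advice above (read into an integer type with std::hex, then convert to unsigned char) could be sketched as follows. The function name parse_hex_byte is hypothetical, as is the choice to reject values above 0xFF instead of truncating them.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: parse a hex string such as "e3" into one byte.
// Returns false if no hex number could be read or the value exceeds 0xFF.
// Trailing characters after the number are ignored in this sketch.
bool parse_hex_byte(const std::string& text, unsigned char& out)
{
    std::istringstream ss(text);
    unsigned int value = 0;
    // Extract into an integer type so that std::hex is honoured;
    // extracting into a char type would read a single character instead.
    if (!(ss >> std::hex >> value) || value > 0xFF)
        return false;
    out = static_cast<unsigned char>(value);
    return true;
}
```

When printing the result, cast it back to an int (for example static_cast<int>(byte)), otherwise the stream prints it as a character again.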
Let's put it simply: the variable type is 'stronger' than 'hex'. That's why 'hex' is ignored for a 'char' variable.
Longer story:
'Hex' modifies internal state of stringstream object telling it how to treat subsequent operations on integers. However, this does not apply to chars.
When you print out a character (i.e. unsigned char), it's printed as a character, not as a number.