Replace character in std::string with hex value - c++

I am trying to replace a number/hex value in a std::string using std::replace but when I try and do fileBuf.replace(0x10, 1, "0x44"); it just expands the string with an ASCII "0x44" instead of replacing the 1 character at position 0x10 with the value 0x44. Is there a proper way to do this? Thanks

You need to use the \x escape sequence to represent hexadecimal characters. Moreover, since you're replacing just one character, you can use a character literal rather than a string literal, via the overload of replace that takes a repeat count and a character:
fileBuf.replace(0x10, 1, 1, '\x44');
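As a rough sketch of the two usual ways to do this, assuming the buffer is a std::string that is long enough (the helper names patch_byte and patch_byte_with_replace are made up for illustration):

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: overwrite the single character at `pos` with `value`.
// at() throws std::out_of_range if pos is not a valid index.
void patch_byte(std::string& buf, std::size_t pos, unsigned char value)
{
    buf.at(pos) = static_cast<char>(value);
}

// Same effect through std::string::replace: replace 1 character at `pos`
// with 1 copy of `value` (the count/character overload).
void patch_byte_with_replace(std::string& buf, std::size_t pos, unsigned char value)
{
    buf.replace(pos, 1, 1, static_cast<char>(value));
}
```

Calling patch_byte(fileBuf, 0x10, 0x44) leaves the length of the string unchanged and stores the byte 0x44 (ASCII 'D') at offset 0x10.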

Related

Convert string to raw string

char str[] = "C:\Windows\system32";
auto raw_string = convert_to_raw(str);
std::cout << raw_string;
Desired output:
C:\Windows\system32
Is it possible? I am not a big fan of cluttering my path strings with extra backslash. Nor do I like an explicit R"()" notation.
Any other work-around of reading a backslash in a string literally?
That's not possible, \ has special meaning inside a non-raw string literal, and raw string literals exist precisely to give you a chance to avoid having to escape stuff. Give up, what you need is R"(...)".
Indeed, when you write something like
char const * str{"a\nb"};
you can verify yourself that strlen(str) is 3, not 4, which means that once you compile that line, in the binary/object file there's only one single character, the newline character, corresponding to \n; there's no \ nor n anywhere in it, so there's no way you can retrieve them.
As a matter of personal taste, I find raw string literals great! You can even put a real Enter in there. Often just for the price of 3 characters - R, (, and ) - in addition to those you would write anyway. And without them, you would have to write more characters to escape anything that needs escaping.
Look at
std::string s{R"(Hello
world!
This
is
Me!)"};
That's 29 keystrokes from R to the last " inclusive, and you can see at a glance that it's 5 lines.
The equivalent non-raw string
std::string s{"Hello\nworld!\nThis\nis\nMe!"};
is 30 keystrokes from the first " to the last " inclusive, and you have to parse it carefully to count the lines.
A pretty short string, and you already see the advantage.
To answer the question, as asked, no it is not possible.
As an example of the impossibility, assume we have a path specified by the string literal "C:\a\b" and stored in an array named str.
Now, str is actually represented in memory (in your program when running) as a statically allocated array of five characters with values {'C', ':', '\007', '\010', '\000'}, where '\ooo' denotes an OCTAL escape (so '\010' is a char numerically equal to 8 in decimal).
The problem is that there is more than one way to produce that array of five characters using a string literal.
char str[] = "C:\a\b";
char str1[] = "C:\007\010";
char str2[] = "C:\a\010";
char str3[] = "C:\007\b";
char str4[] = "C:\x07\x08"; // \xmn uses hex coding
In the above, str1, str2, str3, and str4 are all initialised using equivalent arrays of five char.
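A small sketch that checks this claim byte by byte (the function name all_spellings_identical is made up for illustration):

```cpp
#include <cstring>

// Five spellings of the same five-byte array {'C', ':', 7, 8, 0}.
const char str[]  = "C:\a\b";
const char str1[] = "C:\007\010";
const char str2[] = "C:\a\010";
const char str3[] = "C:\007\b";
const char str4[] = "C:\x07\x08";

static_assert(sizeof(str) == 5, "four characters plus the terminating null");

// Returns true when every spelling produced byte-identical arrays.
bool all_spellings_identical()
{
    return sizeof(str1) == 5 && sizeof(str2) == 5 &&
           sizeof(str3) == 5 && sizeof(str4) == 5 &&
           std::memcmp(str, str1, 5) == 0 &&
           std::memcmp(str, str2, 5) == 0 &&
           std::memcmp(str, str3, 5) == 0 &&
           std::memcmp(str, str4, 5) == 0;
}
```

Since the compiled arrays are indistinguishable, nothing at run time can tell which spelling appeared in the source.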
That means convert_to_raw("C:\a\b") could quite legitimately assume it is passed ANY of the strings above AND
std::cout << convert_to_raw("C:\a\b") << '\n';
could quite legitimately produce output of
C:\007\010
(or any one of a number of other strings).
The practical problem with this, if you are working with Windows paths, is that c:\a\b, C:\007\010, C:\a\010, C:\007\b, and C:\x07\x08 are all valid filenames under Windows - that (unless they are hard links or junctions) name DIFFERENT files.
In the end, if you want to have string literals in your code representing filenames or paths, then use \\ or a raw string literal when you need a single backslash. Alternatively, write your paths as string literals in your code using all forward slashes (e.g. "C:/a/b") since Windows API functions generally accept those too.
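For illustration, the three spellings side by side; the variable and function names here are made up, and the path is the one from the question:

```cpp
#include <string>

// Three ways of writing a Windows path as a string literal.
const std::string escaped = "C:\\Windows\\system32";   // doubled backslashes
const std::string raw     = R"(C:\Windows\system32)";  // raw string literal
const std::string forward = "C:/Windows/system32";     // forward slashes

// The escaped and raw spellings produce exactly the same characters.
bool escaped_equals_raw()
{
    return escaped == raw;
}
```

The first two are the same string; the third differs only in the separator character.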

How to deal with garbage characters in a string?

Suppose I have a string that contains a necessary numeric character but it is not terminated by '\0'; it has garbage characters instead. Actually, the string has garbage characters after the number. So how to deal with the garbage character while storing that numerical character in another string or variable?
Only copy a substring. Example:
std::string example = "garbage1garbage";
char numerical = example[7];
We got the numerical character excluding the garbage entirely.
If the text to be converted is in a std::string, then you can extract a number from the front as follows:
#include <iostream>
#include <sstream>
#include <string>
...
std::string input = "128734garbage";
std::istringstream iss{input};
int num;
if (iss >> num)
    ...use_num...
else
    std::cerr << "wasn't able to parse an int from input\n";
Just change int to double, uint64_t, ... - whatever suits your data.
If you have only a pointer to the text and know it's not null-terminated, just getting the text into a std::string is problematic. You could instead use a function that converts text to a number but stops at the first invalid character. std::strtol and its relatives (std::strtoul, std::strtod and so on) are good candidates for that.
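Where the length of the buffer (or an upper bound on it) is known, C++17's std::from_chars is another option. It is not mentioned in the answer above, but it takes an explicit end pointer and so never reads past the range it is given. A minimal sketch, assuming a C++17 compiler; the helper name parse_leading_int is hypothetical:

```cpp
#include <charconv>
#include <cstddef>
#include <system_error>

// Hypothetical helper: parse a leading decimal int from a buffer of known
// length that is NOT null-terminated. Returns true and stores the value in
// `out` on success; returns false if no number could be parsed.
bool parse_leading_int(const char* data, std::size_t length, int& out)
{
    const auto result = std::from_chars(data, data + length, out);
    return result.ec == std::errc{};
}
```

Parsing stops at the first character that cannot be part of the number, so trailing garbage is simply ignored.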
From your "another string or variable" - the above addresses storing into a numeric variable. You can then create a new std::string from the number using std::to_string, or a std::ostringstream, if that's what you want to do. This will standardise the output format though, so input like say "1E4" parsed as a double might end up looking like say 10000.000000. Alternatively, with the strtol-type functions you can use the end pointer they report to work out the length of the numeric part, and use std::string::substr() to extract the leading number as a new std::string object.
You should also be aware that the distinction between number and garbage is not always what you might expect. For example "0XBEFHJQ" might be split by some of the above functions as 0xBEF hex and HJQ garbage.
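A sketch of the end-pointer approach, which also shows the hex surprise; the helper name leading_number is made up, and the input is assumed to live in a std::string so that c_str() is null-terminated:

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: return the leading numeric part of `input` as a new
// string, using strtol's end pointer to find where the number stops.
// Returns an empty string when `input` does not start with a number.
std::string leading_number(const std::string& input, int base = 10)
{
    const char* begin = input.c_str();
    char* end = nullptr;
    static_cast<void>(std::strtol(begin, &end, base)); // only `end` is needed
    return input.substr(0, static_cast<std::size_t>(end - begin));
}
```

With base 10, leading_number("128734garbage") yields "128734"; with base 16, leading_number("0XBEFHJQ", 16) yields "0XBEF", because B, E and F are valid hex digits.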

c++ adding "\u" to string

Learning c++, trying to find a way to display UTF-16 characters by adding the 4 digits after the "\u". But, for example, if I try to directly add 0000:
string temp = "\u" + "0000";
I get the error: incorrectly formed universal character name. So is there a way to get these two to form one Unicode character? Also I realize that the end four numbers range from 0-F but for now I just want to focus on the 0-9 characters.
How can I add "\u" to a different string?
Edit: I was looking for the C++ equivalent of the JavaScript function:
String.fromCharCode()
You can't say "\u" + "0000", because the parsing of escape sequences happens early in the process, before the actual compilation begins. By the time the strings would be tacked together, escape sequences are already parsed and won't be again. And since \u is not a valid escape sequence on its own, you get an error about it.
You can't separate a string literal like that. The special sequence inside the quotes is a directive to the compiler to insert the relevant Unicode character at compile time so if you break it into two pieces it is no longer recognized as a directive.
To programmatically generate a UTF-16 character based on its Unicode codepoint number you could use the Standard Library Unicode conversion functions. Unfortunately there is no direct conversion between UTF-32 (Unicode codepoints) and UTF-16 so you have to go through UTF-8 as an intermediate value:
#include <codecvt>   // std::codecvt_utf8, std::codecvt_utf8_utf16
#include <cwchar>    // std::mbstate_t
#include <stdexcept> // std::runtime_error
#include <string>    // std::u16string

// UTF-16 may contain either one or two char16_t characters so
// we return a string to potentially contain both.
std::u16string codepoint_to_utf16(char32_t cp)
{
    // convert UTF-32 (standard unicode codepoint) to UTF-8 intermediate value
    char utf8[4];
    char* end_of_utf8;
    {
        char32_t const* from = &cp;
        std::mbstate_t mbs{}; // zero-initialised conversion state
        std::codecvt_utf8<char32_t> ccv;
        if(ccv.out(mbs, from, from + 1, from, utf8, utf8 + 4, end_of_utf8))
            throw std::runtime_error("bad conversion");
    }
    // Now convert the UTF-8 intermediate value to UTF-16
    char16_t utf16[2];
    char16_t* end_of_utf16;
    {
        char const* from = nullptr;
        std::mbstate_t mbs{}; // zero-initialised conversion state
        std::codecvt_utf8_utf16<char16_t> ccv;
        if(ccv.in(mbs, utf8, end_of_utf8, from, utf16, utf16 + 2, end_of_utf16))
            throw std::runtime_error("bad conversion");
    }
    return {utf16, end_of_utf16};
}

int main()
{
    std::u16string s; // can hold UTF-16
    // iterate through some Greek codepoint values
    for(char32_t u = 0x03b1; u < 0x03c9; ++u)
    {
        // append the converted UTF-16 characters to our string
        s += codepoint_to_utf16(u);
    }
    // do whatever you want with s here...
}
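The <codecvt> facets used above are deprecated since C++17, so as an alternative sketch the UTF-16 encoding can be done by hand with the standard surrogate-pair arithmetic. This is not the method from the answer above, and the function name encode_utf16 is made up for illustration:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-16.
// Code points below 0x10000 become one char16_t; the rest become a
// high/low surrogate pair.
std::u16string encode_utf16(char32_t cp)
{
    // Reject values that are not Unicode scalar values.
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        throw std::invalid_argument("not a Unicode scalar value");

    if (cp < 0x10000)
        return std::u16string(1, static_cast<char16_t>(cp));

    cp -= 0x10000; // 20 bits remain, split 10/10 across the pair
    std::u16string out;
    out += static_cast<char16_t>(0xD800 + (cp >> 10));   // high surrogate
    out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF)); // low surrogate
    return out;
}
```

For example, encode_utf16(0x03B1) returns a single code unit, while a code point outside the Basic Multilingual Plane such as 0x1F600 returns two.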
What you're trying to do is not possible. C++ translation is split into multiple phases. Per [lex.phases], escape sequences are processed (in phase 5) before adjacent string literals are concatenated (phase 6).
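The same ordering can be observed with a hex escape, which unlike \u is allowed to be short and therefore compiles when split. A small sketch (the array names are made up for illustration):

```cpp
// Each literal's escapes are resolved before the pieces are joined, so
// "\x4" "1" is the two characters '\x04' and '1', not the single
// character '\x41' ('A').
const char pieces[] = "\x4" "1";
const char whole[]  = "\x41";

static_assert(sizeof(pieces) == 3, "two characters plus the terminating null");
static_assert(sizeof(whole) == 2, "one character plus the terminating null");
```

If concatenation happened first, both arrays would have the same size.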

Why can't character constants/literals be empty?

In C and C++ the rules are the same. In C,
[§6.4.4.4]/2 An integer character constant is a sequence of one or
more multibyte characters enclosed in single-quotes, as in 'x'.
In C++,
[§2.14.3]/1 A character literal is one or more characters enclosed
in single quotes, as in 'x', optionally preceded by one of the
letters u, U, or L, as in u'y', U'z', or L'x',
respectively.
The key phrase is "one or more". In contrast, a string literal can be empty, "", presumably because it consists of the null terminating character. In C, this leads to awkward initialization of a char. Either you leave it uninitialized, or use a useless value like 0 or '\0'.
char garbage;
char useless = 0;
char useless2 = '\0';
In C++, you have to use a string literal instead of a character literal if you want it to be empty.
(somecondition ? ' ' : '') // error
(somecondition ? " " : "") // necessary
What is the reason it is this way? I'm assuming C++'s reason is inherited from C.
The reason is that a character literal is defined as a character. There may be extensions that allow it to be more than one character, but it needs to be at least one character or it just doesn't make any sense. It would be the same as trying to do:
int i = ;
If you don't specify a value, what do you put there?
This is because an empty string still contains the null character '\0' at the end, so there is still a value to bind to the variable name, whereas an empty character literal has no value.
A string is a sequence of characters terminated by a null character ('\0').
So an empty string will always have a null character in it at the end.
But in the case of a character literal there is no value;
it needs at least one character.
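A short sketch of both points: the empty string literal still occupies one char, and it is the thing to reach for when "nothing" has to be expressed in a conditional. The helper name join is made up for illustration:

```cpp
#include <string>

// An empty string literal is an array holding just the terminating '\0'.
static_assert(sizeof("") == 1, "only the null terminator");

// Hypothetical helper: build "a b" or "ab" depending on `spaced`.
// '' would not compile here; "" expresses "no separator".
std::string join(const std::string& a, const std::string& b, bool spaced)
{
    return a + (spaced ? " " : "") + b;
}
```

Both branches of the conditional decay to const char*, so the expression has a single well-defined type.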

What does '\0' mean?

I can't understand what the '\0' in the two different place mean in the following code:
string x = "hhhdef\n";
cout << x << endl;
x[3]='\0';
cout << x << endl;
cout<<"hhh\0defef\n"<<endl;
Result:
hhhdef
hhhef
hhh
Can anyone give me some pointers?
C++ std::strings are "counted" strings - i.e., their length is stored as an integer, and they can contain any character. When you replace the fourth character (index 3) with a \0 nothing special happens - it's printed as if it was any other character (in particular, your console simply ignores it).
In the last line, instead, you are printing a C string, whose end is determined by the first \0 that is found. In such a case, cout goes on printing characters until it finds a \0, which, in your case, is after the third h.
C++ has two string types:
The built-in C-style null-terminated strings, which are really just byte arrays, and the C++ standard library std::string class, which stores its length and does not rely on a null terminator to mark its end.
Printing a null-terminated string prints everything up until the first null character. Printing a std::string prints the whole string, regardless of null characters in its middle.
\0 is the null character (NUL); you can find it in your ASCII table, where it has the value 0.
It is used to mark the end of C-style strings.
However, C++ class std::string stores its size as an integer, and thus does not rely on it.
You're representing strings in two different ways here, which is why the behaviour differs.
The second one is easier to explain; it's a C-style raw char array. In a C-style string, '\0' denotes the null terminator; it's used to mark the end of the string. So any functions that process/display strings will stop as soon as they hit it (which is why your last string is truncated).
The first example is creating a fully-formed C++ std::string object. These don't assign any special meaning to '\0' (their length is stored, so a '\0' inside the string does not end it).
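The difference can be measured directly. A minimal sketch using the string from the question (the function names are made up for illustration):

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Build the string from the question and overwrite index 3 with '\0'.
std::string make_example()
{
    std::string x = "hhhdef\n";
    x[3] = '\0';
    return x;
}

// Length as std::string sees it: the stored count.
std::size_t counted_length(const std::string& s)
{
    return s.size();
}

// Length as a C-style routine sees it: scanning stops at the first '\0'.
std::size_t c_length(const std::string& s)
{
    return std::strlen(s.c_str());
}
```

For the modified string, counted_length reports 7 while c_length reports 3, which matches the two kinds of output in the question.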
The \0 is treated as the null character. It is used to mark the end of a string in C.
In C, a string is an array of characters with \0 at the end, usually accessed through a pointer. So the following are valid representations of strings in C.
char *c = "Hello"; // it is actually Hello\0
char c[] = {'Y','o','\0'};
One application of '\0' is determining the end of a string, e.g. when finding the length of a string.
The \0 is the null terminator used in C to mark the end of a string. In simple words, it is a character whose value is zero, and it tells any code processing the string where the string ends.
Let me give you an example:
printf("Hello World"); /* stored as Hello World\0 */
Here we can clearly see that \0 acts as the terminator, though printing the string shown in the comment would give the same output.