How to write to std::ostream without formatting? - c++

For example, I want to write std::string my_str( "foobar\n" ); literally into std::ostream& with the backslash-n intact (no formatting).
Perhaps I need to convert my_str with a formatting function that converts backslash to double-backslash first? Is there a standard library function for that?
Or maybe there is a directive I can pass to std::ostream&?

The easiest way to do this for larger strings is with raw string literals:
std::string my_str(R"(foobar\n)");
If you want parentheses in it, use a delimiter:
R"delim(foobar\n)delim"
I don't know of anything that will let you keep the escape sequences in the string but output them in escaped form; however, std::transform with an std::ostream_iterator<std::string> destination and a function that handles the cases you want will do it, as in this example:
#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>

std::string filter(char c) {
    if (c == '\n') return "\\n";
    if (c == '\t') return "\\t";
    // etc.
    return std::string(1, c); // single-character string
}

int main() {
    std::string str{"abc\nr\tt"};
    std::cout << "Without transform: " << str << '\n';
    std::cout << "With transform: ";
    std::transform(std::begin(str), std::end(str),
                   std::ostream_iterator<std::string>(std::cout), filter);
}

Since "foobar\n" is a constant literal, just write it as "foobar\\n", so that "\\" becomes "\", letting the n live in peace.
What you call "formatting" is not formatting.
The stream does not format the string. The substitution of "\n" with char(10) is made by the compiler when producing the actual value from the literal. (In text mode an ostream will at most translate char(10) into { char(13), char(10) } if the underlying platform requires it, but that's another story.)

This is not related to std::ostream or std::string. "foobar\n" is a literal where \n already means end of line.
You have two options:
Escape the backslash itself: std::string str("foobar\\n");
Use a C++11 raw string literal: std::string str(R"(foobar\n)");

The "backslash + n" is not formatted by the stream; for example, the length of std::string("\n") is 1 (not 2). Likewise, '\n' is a single character. Writing backslash + n is just a shortcut for representing non-printable characters.
As another example, '\x0a' == '\n' (because 0a is the hexadecimal code for the line-feed character). And std::string("\x0a").size() == 1 too.
If (on Linux¹) you open a std::ofstream and write '\x0a' to that stream, you will thus end up with a file containing a single byte, whose hexadecimal value is 0a.
As such, it is not your stream that is transforming what you wrote, it is the compiler. Depending on your usecase, you may either want to:
change the string as written in the code: use "foobar\\n" (note this increases the length by 1)
perform a transformation while streaming to print the hexadecimal or escape code of non-printable characters
¹ On Windows, the '\n' character is translated to "\r\n" (carriage return + line feed) in text mode.

Related

Why does printing the characters “” (147, 148 in extended ASCII) not work as expected in C++?

I do not understand what's going on here. This was compiled with GCC 10.2.0. Printing the whole string gives different output than printing each character individually.
#include <iostream>

int main() {
    char str[] = "“”";
    std::cout << str << std::endl;
    std::cout << str[0] << str[1] << std::endl;
}
Output
“”
��
Why are not the two outputted lines the same? I would expect the same line twice. Printing out alphanumeric characters does output the same line twice.
Bear in mind that, on almost all systems, the maximum value a (signed) char can hold is 127. So, more likely than not, your two 'special' characters are actually being encoded as multi-byte combinations.
In such a case, passing the string pointer to std::cout will keep feeding data from that buffer until a zero (nul-terminator) byte is encountered. Further, it appears that, on your system, the std::cout stream can properly interpret multi-byte character sequences, so it shows the expected characters.
However, when you pass the individual char elements, as str[0] and str[1], there is no possibility of parsing those arguments as components of multi-byte characters: each is interpreted 'as-is', and those values do not correspond to valid, printable characters, so the 'weird' � symbol is shown, instead.
"“”" contains more bytes than you think; it is usually encoded as UTF-8. To see that, you can print the size of the array:
std::cout << sizeof str << '\n';
This prints 7 in my testing. UTF-8 is a multi-byte encoding, meaning a single character can be encoded in several bytes. You are printing individual bytes of a UTF-8 encoded string, and those bytes are not printable on their own. That's why you get � when you try to print them.

Finding and comparing a Unicode charater in C++

I am writing a Lexical analyzer that parses a given string in C++. I have a string
line = R"(if n = 4 # comment
return 34;
if n≤3 retur N1
FI)";
All I need to do is output all words, numbers and tokens in a vector.
My program works with regular tokens, words and numbers; but I cannot figure out how to parse Unicode characters. The only Unicode characters my program needs to save in a vector are ≤ and ≠.
So far, my code basically takes the string line by line, reads the first word, number or token, chops it off, and recursively continues to eat tokens until the string is empty. I am unable to compare line[0] with ≠ (of course), and I am also not clear on how much of the string I need to chop off to get rid of the Unicode char. In the case of "!=" I simply remove line[0] and line[1].
If your input file is UTF-8, just treat your Unicode characters ≤, ≠, etc. as strings. You can use the same logic to recognize "≤" as you would for "<=". The length in bytes of such a character is then given by strlen("≤").
All Unicode encodings are variable-length except UTF-32. Therefore the next character isn't necessarily a single char, and you must read it as a string. Since you're using a char* or std::string, the encoding is likely UTF-8, and the next character can be returned as a std::string.
The encoding of UTF-8 is very simple and you can read about it everywhere. In short, the first byte of a sequence indicates how long that sequence is, and you can get the next character like this:
#include <stdexcept>
#include <string>

std::string getNextChar(const std::string& str, size_t index)
{
    const unsigned char c = str[index];     // avoid sign issues with plain char
    if ((c & 0x80) == 0x00)                 // 1-byte sequence
        return std::string(1, str[index]);
    else if ((c & 0xE0) == 0xC0)            // 2-byte sequence
        return std::string(&str[index], 2);
    else if ((c & 0xF0) == 0xE0)            // 3-byte sequence
        return std::string(&str[index], 3);
    else if ((c & 0xF8) == 0xF0)            // 4-byte sequence
        return std::string(&str[index], 4);
    throw std::runtime_error("Invalid codepoint!");
}
It's a very simple decoder and doesn't handle invalid codepoints or broken data streams. If you need better handling, you'll have to use a proper UTF-8 library.

Why does the size of this std::string change, when characters are changed?

I have an issue in which the size of the string is affected by the presence of a '\0' character. I searched all over SO and still could not find the answer.
Here is the snippet.
#include <iostream>
#include <string>

int main()
{
    std::string a = "123123\0shai\0";
    std::cout << a.length();
}
http://ideone.com/W6Bhfl
The output in this case is
6
Where as the same program with a different string having numerals instead of characters
#include <iostream>
#include <string>

int main()
{
    std::string a = "123123\0123\0";
    std::cout << a.length();
}
http://ideone.com/mtfS50
gives an output of
8
What exactly is happening under the hood? How does presence of a '\0' character change the behavior?
The sequence \012 when used in a string (or character) literal is an octal escape sequence. It's the octal number 12 which corresponds to the ASCII linefeed ('\n') character.
That means your second string is actually equal to "123123\n3\0" (plus the actual string literal terminator).
This would have been very clear if you had tried to print the contents of the string.
Octal sequences are one to three digits long, and the compiler will use as many digits as possible.
If you check the coloring at ideone you will see that \012 has a different color. That is because this is a single character written in octal.

Including decimal equivalent of a char in a character array

How do I create a character array using decimal/hexadecimal representation of characters instead of actual characters.
Reason I ask is because I am writing C code and I need to create a string that includes characters that are not used in English language. That string would then be parsed and displayed to an LCD Screen.
For example, '\0' decodes to 0, and '\n' to 10. Are there any more of these special characters that I can sacrifice to display custom characters? I could send "Temperature is 10\d C" and a degree sign would be printed instead of '\d'. Something like this would be great.
Assuming your display has a character code for a degree sign (with a custom display, I wouldn't necessarily expect it to live at the usual place in the extended IBM character set, or that the display supports Unicode), you can use the escapes \nnn or \xhh, where nnn is up to three octal (base 8) digits and hh is up to two hex digits. Unfortunately, there is no decimal escape available; Dennis Ritchie and/or Brian Kernighan were probably more used to octal, which was quite common when C was first developed.
E.g.
const char *str = "ABC\101\102\103";
std::cout << str << std::endl;
This should print ABCABC (assuming ASCII encoding).
You can directly write
char myValues[] = {1,10,33,...};
Use \u00b0 to make a degree sign (I simply looked up the unicode code for it)
This requires unicode support in the terminal.
Simple, use std::ostringstream and casting of the characters:
std::string s = "hello world";
std::ostringstream os;
for (auto const& c : s)
os << static_cast<unsigned>(c) << ' ';
std::cout << "\"" << s << "\" in ASCII is \"" << os.str() << "\"\n";
Prints:
"hello world" in ASCII is "104 101 108 108 111 32 119 111 114 108 100 "
A little more research and I found the answer to my own question.
Characters following a '\' are called escape sequences.
You can put the octal equivalent of an ASCII code in your string by using escape sequences from '\000' to '\377'.
Same goes for hex: '\x00' to '\xFF'.
I am printing my custom characters using '\xC1' through '\xC8', as I only had 8 custom characters.
Everything is done in a single line of code: lcd_putc("Degree \xC1");

Extract Whitespace in addition to text from input file, C++?

I have to write a function that will read input from a file. The file is set up: one character, space, word, space, throughout the file, like so:
A space 1 space 2 space... etc
I need to extract the whitespace following the one character and NOT the whitespace following the word.
How can I go about doing this? Should I just make it so the function writes the whitespace itself instead of extracting it?
Also, I am importing this info into a 2-d char array. Will I run into problems trying to write integers to a char array?
Something like this maybe?
#include <fstream>
#include <iostream>
#include <string>

int main() {
    char myChar;
    char theWS;
    std::string word;
    std::ifstream in("example.txt");
    while (in >> myChar >> std::noskipws >> theWS >> word >> std::skipws) {
        std::cout << myChar << theWS << word << '\n';
    }
}
You should've been exposed to the idea of a tokenizer by now. This is the structure you need.
You will be fine writing integers into character arrays. Since C and C++ represent ASCII characters as small numbers anyway, handling them is easy. Some examples of the numeric values that correspond to specific chars: '0' => 48, '1' => 49, ..., 'A' => 65, 'B' => 66, etc.
Take a look at http://www.asciitable.com/ for the full set of ASCII characters and their corresponding values.
This also lets you perform arithmetic on characters, such as 'A' + 1 => 'B',
as well as convert between numbers and characters: (char) 65 => 'A'.