cout << hex << setfill('0');
cout << setw(2) << 10;
output: 0a
I have no lead on UTF-8. From my understanding, unless it is a non-ASCII character, there is no difference.
What is the C++ equivalent to find the hex of a UTF-8 encoded character not included in ASCII?
Correct me if I am wrong (very new to C++): if I use this expression and output, let's say, 10 with a width of 2, will I get an output of 0a?
The expression itself does not create any output of any sort.
How do I tweak it so it can take a UTF-8 character? Right now I can only deal with ASCII.
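One way to see the hex of a non-ASCII character is to treat the UTF-8 string as raw bytes and print each byte. A minimal sketch (the literal and the byte values assume the source file is saved as UTF-8):

#include <iomanip>
#include <iostream>

int main() {
    const char utf8[] = "é";  // two bytes in UTF-8: 0xC3 0xA9

    std::cout << std::hex << std::setfill('0');
    for (unsigned char byte : utf8) {
        if (byte == 0) break;  // stop at the nul terminator
        std::cout << std::setw(2) << static_cast<unsigned>(byte) << ' ';
    }
    std::cout << '\n';  // prints: c3 a9
}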
I do not understand what's going on here. This was compiled with GCC 10.2.0. Printing out the whole string gives a different result than printing out each character.
#include <iostream>

int main() {
    char str[] = "“”";
    std::cout << str << std::endl;
    std::cout << str[0] << str[1] << std::endl;
}
Output
“”
��
Why are the two output lines not the same? I would expect the same line twice. Printing alphanumeric characters does output the same line twice.
Bear in mind that, on almost all systems, the maximum value a (signed) char can hold is 127. So, more likely than not, your two 'special' characters are actually being encoded as multi-byte combinations.
In such a case, passing the string pointer to std::cout will keep feeding data from that buffer until a zero (nul-terminator) byte is encountered. Further, it appears that, on your system, the terminal receiving std::cout's output can properly interpret multi-byte character sequences, so it shows the expected characters.
However, when you pass the individual char elements, as str[0] and str[1], there is no possibility of parsing those arguments as components of multi-byte characters: each is interpreted 'as-is', and those values do not correspond to valid, printable characters, so the 'weird' � symbol is shown instead.
"“”" contains more bytes than you think. It's usually encoded as UTF-8. To see that, you can print the size of the array:
std::cout << sizeof str << '\n';
Prints 7 in my testing. UTF-8 is a multi-byte encoding. That means a single character may be encoded in multiple bytes. Now, you're printing individual bytes of a UTF-8 encoded string, which are not printable by themselves. That's why you get � when you try to print them.
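To make the multi-byte encoding visible, here is a small sketch (assuming the source file is saved as UTF-8) that prints the size of the array and dumps each byte in hex:

#include <iomanip>
#include <iostream>

int main() {
    char str[] = "“”";  // each curly quote is 3 bytes in UTF-8

    std::cout << sizeof str << '\n';  // 7: six bytes plus the nul terminator

    std::cout << std::hex << std::setfill('0');
    for (unsigned char byte : str) {
        if (byte == 0) break;
        std::cout << std::setw(2) << static_cast<unsigned>(byte) << ' ';
    }
    std::cout << '\n';  // prints: e2 80 9c e2 80 9d
}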
I'm trying to learn more about encoding. I knew that CR is 0x0d in hex and LF is 0x0a, but CRLF is not 0x0d 0x0a and not 0x0d0a; I tried std::cout << std::hex << (int)'\r\n' in C++ and the result was 0x0d.
So, is CRLF == CR? And are these hex values the same on all operating systems?
Edit
The following is the result when tried on a Windows 10 machine using MSVC (v16.2.0-pre.2.0):
const char crlf = '\r\n';
std::cout << std::hex << (int)crlf << std::endl;
std::cout << std::hex << (int)'\r\n' << std::endl;
std::cout << std::hex << (int)'\n\r' << std::endl;
output
d
a0d
d0a
If you write '\r\n' your compiler should warn you, since that's a multi-character literal, which is implementation-specific and not usually used for that reason. In this case it looks like the compiler discarded the other character.
Yes, CR is 0xd and LF is 0xa in the ASCII standard. The C standard doesn't require ASCII by itself as far as I know, so theoretically they could be something else. That's why we write \n instead of 0xa (also for clarity). But practically every system in use now uses ASCII as the basis of its character set and may extend it if needed.
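A small sketch to see this in action (the exact value printed for the multi-character literal is implementation-defined and will vary by compiler):

#include <iostream>

int main() {
    // '\r' and '\n' are ordinary character literals: 0x0d and 0x0a
    // in ASCII-based character sets.
    std::cout << std::hex
              << static_cast<int>('\r') << '\n'   // d
              << static_cast<int>('\n') << '\n';  // a

    // '\r\n' is a multi-character literal of type int; its value is
    // implementation-defined, so this may print d0a, a0d, or something
    // else entirely depending on the compiler.
    std::cout << static_cast<int>('\r\n') << '\n';
}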
Conclusion
CR is a char and equals 0x0d in hex
LF is a char and equals 0x0a in hex
CRLF is a character sequence, equal to CR and LF separately, so it's 0x0d 0x0a in hex (as mentioned by @tkausl)
The explanation of the result I got is that const char crlf = '\r\n'; keeps only a single byte of the multi-character literal (0x0d here, when compiled by MSVC).
When I looked at the assembly output, I found this comment: ; 544 : // doing \r\n -> \n translation
Thanks for all of the helpful comments.
How do I create a character array using the decimal/hexadecimal representation of characters instead of the actual characters?
The reason I ask is that I am writing C code, and I need to create a string that includes characters that are not used in the English language. That string would then be parsed and displayed on an LCD screen.
For example, '\0' decodes to 0 and '\n' to 10. Are there any more of these special characters that I can sacrifice to display custom characters? I could send "Temperature is 10\d C" and a degree sign would be printed instead of '\d'. Something like this would be great.
Assuming you have a character code that is a degree sign on your display (with a custom display, I wouldn't necessarily expect it to 'live' at the common place in the extended IBM ASCII character set, or that the display supports Unicode character encoding), you can use the encodings \nnn or \xhh, where nnn is up to three digits in octal (base 8) and hh is up to two digits of hex code. Unfortunately, there is no decimal encoding available; Dennis Ritchie and/or Brian Kernighan were probably more used to octal, as it was quite common at the time when C was first developed.
E.g.
const char *str = "ABC\101\102\103";
cout << str << endl;
should print ABCABC (assuming ASCII encoding)
You can directly write
char myValues[] = {1,10,33,...};
Use \u00b0 to make a degree sign (I simply looked up the Unicode code point for it).
This requires unicode support in the terminal.
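For instance, a minimal sketch (assuming a UTF-8 execution character set and a UTF-8 capable terminal):

#include <iostream>

int main() {
    // "\u00b0" becomes the UTF-8 bytes 0xC2 0xB0 in the string,
    // so this prints a degree sign on a UTF-8 terminal.
    std::cout << "Temperature is 10\u00b0 C\n";

    // The same string spelled with explicit hex escapes:
    std::cout << "Temperature is 10\xC2\xB0 C\n";
}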
Simple: use std::ostringstream and cast the characters:
#include <iostream>
#include <sstream>
#include <string>

std::string s = "hello world";
std::ostringstream os;
for (auto const& c : s)
    os << static_cast<unsigned>(c) << ' ';
std::cout << "\"" << s << "\" in ASCII is \"" << os.str() << "\"\n";
Prints:
"hello world" in ASCII is "104 101 108 108 111 32 119 111 114 108 100 "
A little more research and I found the answer to my own question.
A character preceded by a '\' forms an escape sequence.
You can put the octal equivalent of an ASCII code in your string by using escape sequences from '\000' to '\377'.
The same goes for hex: '\x00' to '\xFF'.
I am printing my custom characters by using '\xC1' to '\xC8', as I only had 8 custom characters.
Everything is done in a single line of code: lcd_putc("Degree \xC1");
I just want to write a simple text file:
ofstream test;
test.clear();
test.open("test.txt",ios::out);
float var = 132.26;
BYTE var2[2];
var2[0] = 45;
var2[1] = 55;
test << var << (BYTE)var2[0] << (BYTE)var2[1];
test.close();
But in the output file I get:
132.26-7
I don't get what the problem is...
I think the problem might be that the BYTE type is a typedef for char. If this is the case, then whenever you try to write a BYTE to a stream, it will print the ASCII character corresponding to that byte rather than the numeric value of the byte. Notice that the characters - and 7 correspond to ASCII values 45 and 55, for example.
To fix this, you'll want to do two things:
1. Typecast the BYTEs you're writing to some integral type like int or short before writing them to the file. This forces the stream to write a numeric value rather than a character.
2. Output some amount of whitespace in between the data you write. Right now everything is bleeding together because there are no spaces, which makes things harder to read.
Hope this helps!
BYTE is nothing but an alias for unsigned char. By default, when you output a char to a stream, it is printed as its ASCII character. In the ASCII table, character 45 is '-' and character 55 is '7'.
Try this instead:
test << var << (int)var2[0] << (int)var2[1];
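Putting it together, a minimal sketch (assuming BYTE is a typedef for unsigned char, as in the Windows headers):

#include <fstream>

typedef unsigned char BYTE;  // assumption: matches the Windows typedef

int main() {
    std::ofstream test("test.txt");
    float var = 132.26f;
    BYTE var2[2] = {45, 55};

    // Casting to int makes the stream format the numeric values,
    // and the spaces keep the fields from running together.
    test << var << ' ' << (int)var2[0] << ' ' << (int)var2[1] << '\n';
    // test.txt now contains: 132.26 45 55
}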
inpfile >> ch;
if (ch < 16) outfile << "0x0" << std::hex << setprecision(2) << (int)ch << " ";
What does std::hex << setprecision(2) mean?
iostreams can be manipulated to achieve the desired formatting. This is done by what at first sight looks like outputting predefined values to them, as shown in the line of code in question.
std::hex causes subsequent integer values to be displayed in base 16.
setprecision sets the precision for the display of subsequent floating-point values.
For further info on manipulators, start here.
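A quick sketch showing what each manipulator does:

#include <iomanip>
#include <iostream>

int main() {
    std::cout << std::hex << 255 << '\n';                  // ff
    std::cout << std::setprecision(3) << 3.14159 << '\n';  // 3.14
}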
This line is the same as:
char ch;
inpfile >> ch;
if (ch < 16)
{
    outfile << "0x0"                 // Prints "0x0" (0x is the standard prefix for hex numbers)
    /*outfile*/ << std::hex          // Tells the stream to print the next number in hex format
    /*outfile*/ << setprecision(2)   // Does nothing here; presumably the intent was a minimum width of 2 characters
    /*outfile*/ << (int)ch           // Converts your char to an integer (so the stream will print it in hex)
    /*outfile*/ << " ";              // Adds a space for good measure.
}
Rather than setprecision(2), what was probably intended was setw(2) << setfill('0').
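A sketch of that corrected line in a complete program:

#include <iomanip>
#include <iostream>

int main() {
    char ch = 0x0b;

    // setw(2) with setfill('0') pads the hex value to two digits;
    // setprecision has no effect on integers.
    std::cout << "0x" << std::hex << std::setw(2) << std::setfill('0')
              << static_cast<int>(ch) << '\n';  // prints: 0x0b
}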
What does std::hex << setprecision(2) mean?
std::hex and std::setprecision are both so-called manipulators. Applied to a stream (which is done by outputting them), they manipulate the stream, usually to change its formatting. In particular, std::hex manipulates the stream so that integer values are written in hexadecimal, and std::setprecision(x) manipulates it to output floating-point numbers with a precision of x digits.
(A rather popular manipulator which you might already know about is std::endl.)
As you can see, there are manipulators that take arguments and those that take none. Also, most manipulators are sticky, which means their manipulation of the stream lasts until it is explicitly changed (std::setw is the classic exception: the field width resets after the next formatted output). Here is an extensive discussion of this topic.
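A short sketch of that stickiness, including the std::setw exception:

#include <iomanip>
#include <iostream>

int main() {
    // std::hex is sticky: it affects every later integer insertion.
    std::cout << std::hex << 255 << ' ' << 255 << '\n';  // ff ff

    // std::setw applies only to the very next insertion;
    // std::setfill, by contrast, is sticky.
    std::cout << std::setw(4) << std::setfill('0') << 255 << ' ' << 255 << '\n';  // 00ff ff
}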
std::hex sets the output base to hexadecimal.
setprecision has no effect on this line, since it applies to floating-point output only.