I tried:
#define EURO char(128)
cout << EURO; // only worked on my Windows desktop, not Linux
Or is there a similar character to the euro sign that I could display?
According to https://www.compart.com/en/unicode/U+20AC, the following should work if your Linux session is configured to use UTF-8:
std::cout << "\xe2\x82\xac" << std::endl;
Note that it has to be a string literal, not a single char, as the UTF-8 encoding of the euro sign is 3 bytes long.
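As a quick sanity check, here is a minimal sketch (assuming a UTF-8 terminal; with a UTF-8 execution character set, the literal "\u20ac" would produce the same three bytes):

#include <iostream>

int main() {
    const char euro[] = "\xe2\x82\xac";          // UTF-8 encoding of U+20AC
    std::cout << euro << '\n';                   // prints € on a UTF-8 terminal
    std::cout << sizeof euro - 1 << " bytes\n";  // 3 bytes, excluding the terminating '\0'
}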
Related
I do not understand what's going on here. This is compiled with the GCC 10.2.0 compiler. Printing out the whole string gives a different result than printing out each character.
#include <iostream>

int main() {
    char str[] = "“”";
    std::cout << str << std::endl;
    std::cout << str[0] << str[1] << std::endl;
}
Output
“”
��
Why aren't the two output lines the same? I would expect the same line twice. Printing alphanumeric characters does output the same line twice.
Bear in mind that, on almost all systems, the maximum value a (signed) char can hold is 127. So, more likely than not, your two 'special' characters are actually being encoded as multi-byte combinations.
In such a case, passing the string pointer to std::cout will keep feeding data from that buffer until a zero (nul-terminator) byte is encountered. Further, it appears that, on your system, the std::cout stream can properly interpret multi-byte character sequences, so it shows the expected characters.
However, when you pass the individual char elements, as str[0] and str[1], there is no possibility of parsing those arguments as components of multi-byte characters: each is interpreted 'as-is', and those values do not correspond to valid, printable characters, so the 'weird' � symbol is shown, instead.
"“”" contains more bytes than you think. It's usually encoded as utf8. To see that, you can print the size of the array:
std::cout << sizeof str << '\n';
This prints 7 in my testing. UTF-8 is a multi-byte encoding, which means a single character may be encoded in several bytes. You're printing individual bytes of a UTF-8 encoded string, and those bytes are not printable on their own. That's why you get � when you try to print them.
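To see both effects at once, here is a small sketch (assuming the source file and the terminal both use UTF-8, so each of the two quote characters occupies three bytes):

#include <cstddef>
#include <iomanip>
#include <iostream>

int main() {
    char str[] = "“”";  // 6 bytes of UTF-8 plus the terminating '\0'

    // Dump each byte as two hex digits: expect e2 80 9c e2 80 9d.
    for (std::size_t i = 0; i < sizeof str - 1; ++i)
        std::cout << std::hex << std::setw(2) << std::setfill('0')
                  << static_cast<unsigned>(static_cast<unsigned char>(str[i])) << ' ';
    std::cout << '\n';

    // Writing a complete 3-byte sequence reproduces a character,
    // while a lone byte out of such a sequence does not.
    std::cout.write(str, 3) << '\n';      // “
    std::cout.write(str + 3, 3) << '\n';  // ”
}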
I'm trying to learn more about encodings. I knew that CR is 0x0d in hex and LF is 0x0a, but CRLF seems to be neither 0x0d 0x0a nor 0x0d0a: I tried std::cout << std::hex << (int)'\r\n' in C++ and the result was 0x0d.
So, is CRLF == CR? And are these hex values the same on all operating systems?
Edit
The following is the result when tried on a Windows 10 machine using MSVC (v16.2.0-pre.2.0):
const char crlf = '\r\n';
std::cout << std::hex << (int)crlf << std::endl;
std::cout << std::hex << (int)'\r\n' << std::endl;
std::cout << std::hex << (int)'\n\r' << std::endl;
Output
d
a0d
d0a
If you write '\r\n', your compiler should warn you, since that's a multi-character literal, which is implementation-specific and rarely used for that very reason. In this case it looks like the compiler discarded the extra character.
Yes, CR is 0xd and LF is 0xa in the ASCII standard. As far as I know, the C standard doesn't require ASCII, so in theory they could be something else. That's why we write \n instead of 0xa (and also for clarity). But practically every system in use today uses ASCII as the basis of its character set and may extend it if needed.
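To make this concrete, here is a minimal sketch (assuming an ASCII-based execution character set): CR and LF are two separate chars, and CRLF is just those two bytes one after the other, written as the string literal "\r\n" rather than the multi-character literal '\r\n'.

#include <iostream>

int main() {
    // Two separate character literals: prints d and a on ASCII-based systems.
    std::cout << std::hex << static_cast<int>('\r') << '\n';
    std::cout << std::hex << static_cast<int>('\n') << '\n';

    // CRLF as a string literal is simply the byte 0x0d followed by 0x0a.
    const char crlf[] = "\r\n";
    std::cout << std::hex << static_cast<int>(crlf[0]) << ' '
              << static_cast<int>(crlf[1]) << '\n';  // prints: d a
}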
Conclusion
CR is a char and equals 0x0d in hex.
LF is a char and equals 0x0a in hex.
CRLF is a character sequence, i.e. CR followed by LF, so it's 0x0d 0x0a in hex (as mentioned by @tkausl).
The explanation of the result I got is that const char crlf = '\r\n'; is compiled down to '\n' by MSVC.
When I looked at the assembly output, I found this comment: ; 544 : // doing \r\n -> \n translation
Thanks for all of the helpful comments.
I have to write code in C++ that identifies and counts English and non-English characters in a string.
The user writes an input and the program must count the user's letters and report when it finds non-English letters.
My problem is that I get a question mark instead of the non-English letter!
At the beginning of the code I wrote:
...
#include <clocale>
int main() {
    std::setlocale(LC_ALL, "sv_SE.UTF-8");
    ...
(the locale is Swedish)
If I try to print out Swedish letters before the counting loops (as a test), it does work, so I guess the clocale is working fine.
But when I launch the counting loop below,
for (unsigned char c : rad) {
    if (c < 128) {
        if (isalpha(c) != 0)
            bokstaver++;
    }
    if (c >= 134 && c <= 165) {
        cout << "Your text contains a " << c << '\n';
        bokstaver++;
    }
}
my non-English letter is taken into account but not printed out with cout.
I used unsigned char since non-English letters are between ASCII 134 and 165, so I really don't know what to do.
test with the word blå:
non-English letters are between ASCII 134 and 165
No, they aren't. Non-English characters don't sit between any ASCII values in UTF-8. Non-ASCII characters consist of two or more code units, and none of those individual code units is itself an ASCII character. å, for example, consists of 0xC3 followed by 0xA5.
The C and C++ library functions which only accept a single char (such as std::isalpha) are not useful when using UTF-8 because that single char can only represent a single code unit.
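If the goal is to count letters rather than bytes, one option is to count only the bytes that start a code point, since UTF-8 continuation bytes always have the bit pattern 10xxxxxx. A minimal sketch, assuming UTF-8 input (the Swedish variable names here are made up to echo the question):

#include <cstddef>
#include <iostream>
#include <string>

int main() {
    std::string rad = "blå";  // sample input

    std::size_t tecken = 0;      // code points seen
    std::size_t icke_ascii = 0;  // code points outside ASCII
    for (unsigned char c : rad) {
        if ((c & 0xC0) == 0x80)  // continuation byte: part of the previous character
            continue;
        ++tecken;
        if (c >= 0x80)           // lead byte of a multi-byte (non-ASCII) character
            ++icke_ascii;
    }
    std::cout << tecken << " characters, " << icke_ascii << " non-ASCII\n";  // 3 and 1 for "blå"
}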
cout << hex << setfill('0');
cout << 12 << setw(2);
output : 0a??????
I have no lead on UTF-8. From my understanding, unless it is a non-ASCII character there is no difference.
What is the C++ equivalent to find the hex of a UTF-8 encoded character that is not included in ASCII?
Correct me if I am wrong (I'm very new to C++): if I use this expression, does it mean that if I take an output, let's say 12, and set the width to 2, I will get an output of 0a?
The expression itself does not create any output of any sort.
How do I tweak it so it can take a UTF-8 character? Right now I can only deal with ASCII.
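As a sketch of one way to get the hex of a character outside ASCII, you can iterate over the bytes of its UTF-8 encoding (this assumes the source file is saved as UTF-8; note that std::setw only applies to the very next insertion, so it has to go inside the loop, while std::hex and std::setfill are sticky):

#include <iomanip>
#include <iostream>
#include <string>

int main() {
    std::string s = "€";  // three bytes in UTF-8
    for (unsigned char c : s)
        std::cout << std::hex << std::setw(2) << std::setfill('0')
                  << static_cast<unsigned>(c) << ' ';
    std::cout << '\n';    // expected output: e2 82 ac
}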
How do I create a character array using decimal/hexadecimal representations of characters instead of actual characters?
The reason I ask is that I am writing C code and I need to create a string that includes characters not used in the English language. That string would then be parsed and displayed on an LCD screen.
For example, '\0' decodes to 0, and '\n' to 10. Are there any more of these special characters that I can sacrifice to display custom characters? I could send "Temperature is 10\d C" and a degree sign would be printed instead of '\d'. Something like this would be great.
Assuming you have a character code that is a degree sign on your display (with a custom display, I wouldn't necessarily expect it to "live" at the common place in the extended IBM ASCII character set, or that the display supports Unicode character encoding) then you can use the encoding \nnn or \xhh, where nnn is up to three digits in octal (base 8) or hh is up to two digits of hex code. Unfortunately, there is no decimal encoding available - Dennis Ritchie and/or Brian Kernighan were probably more used to using octal, as that was quite common at the time when C was first developed.
E.g.
char *str = "ABC\101\102\103";
cout << str << endl;
should print ABCABC (assuming ASCII encoding)
You can directly write
char myValues[] = {1,10,33,...};
Use \u00b0 to make a degree sign (I simply looked up the Unicode code point for it).
This requires Unicode support in the terminal.
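A tiny sketch, assuming the compiler's execution character set is UTF-8 (so \u00b0 expands to the two bytes 0xC2 0xB0) and the terminal can display it:

#include <iostream>

int main() {
    // \u00b0 is the Unicode code point of the degree sign.
    std::cout << "Temperature is 10\u00b0 C\n";
}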
Simple: use std::ostringstream and cast the characters:
std::string s = "hello world";
std::ostringstream os;
for (auto const& c : s)
    os << static_cast<unsigned>(c) << ' ';
std::cout << "\"" << s << "\" in ASCII is \"" << os.str() << "\"\n";
Prints:
"hello world" in ASCII is "104 101 108 108 111 32 119 111 114 108 100 "
A little more research and I found the answer to my own question.
A '\' followed by certain characters forms an escape sequence.
You can put the octal equivalent of an ASCII code in your string by using an escape sequence from '\000' to '\377'.
The same goes for hex, '\x00' to '\xFF'.
I am printing my custom characters using '\xC1' to '\xC8', as I only had 8 custom characters.
Everything is done in a single line of code: lcd_putc("Degree \xC1");