Printing unicode literals in C


I am making an OpenVG application for Raspberry Pi that displays some text, and I need support for foreign characters (Polish in this case). I plan to write a function, in some higher-level language, that maps Unicode characters to C literals, but for now I have a problem printing those literals in C.
Given the code below:
//both output the "ó" character, as expected
char A[] = "\xF3";
wchar_t B[] = L"\xF3";
//"ś" is expected as output but instead I get character with code 0x5B - "["
char A[] = "\x15B";
wchar_t B[] = L"\x15B";
Most Polish characters have 3-digit hexadecimal codes. When I attempt to print "ś" (0x15B), it prints the character "[" (0x5B) instead. It turns out I cannot print any Unicode character whose code has more than 2 hex digits.
Is the data type the cause? I have considered using char16_t and char32_t, but their header files are nowhere to be found on the system.

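Write the character as its UTF-8 byte sequence instead (the trailing '\0' lets the array be printed as a string):
char A[] = {'\xc5', '\x9b', '\0'};
The bytes C5 9B are the UTF-8 encoding of "ś" (U+015B).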

Related


When should I use single quotes and double quotes in C or C++ programming?

In C and in C++ single quotes identify a single character, while double quotes create a string literal. 'a' is a single a character literal, while "a" is a string literal containing an 'a' and a null terminator (that is a 2 char array).
In C++ the type of a character literal is char, but note that in C, the type of a character literal is int, that is sizeof 'a' is 4 in an architecture where ints are 32bit (and CHAR_BIT is 8), while sizeof(char) is 1 everywhere.

Some compilers also implement an extension, that allows multi-character constants. The C99 standard says:
6.4.4.4p10: "The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined."
This could look like this, for instance:
const uint32_t png_ihdr = 'IHDR';
The resulting constant (in GCC, which implements this) has the value you get by taking each character and shifting it up, so that 'I' ends up in the most significant bits of the 32-bit value. Obviously, you shouldn't rely on this if you are writing platform independent code.

Single quotes are characters (char), double quotes are null-terminated strings (char *).
char c = 'x';
char *s = "Hello World";

'x' is an integer, representing the numerical value of the letter x in the machine's character set.
"x" is an array of characters, two characters long, consisting of 'x' followed by '\0'.

I was experimenting with things like int cc = 'cc';. It happens to be basically a byte-wise copy into an integer: the two c's of 'cc' are copied into the lower two bytes of the integer cc. As a piece of trivia,
printf("%d %d", 'c', 'cc'); would give:
99 25443
because 25443 = 99 + 256*99.
So 'cc' is a multi-character constant, not a string, and its value is implementation-defined.

Single quotes are for a single character. Double quotes are for a string (array of characters). You can use single quotes to build up a string one character at a time, if you like.
char myChar = 'A';
char myString[] = "Hello Mum";
char myOtherString[] = { 'H','e','l','l','o','\0' };

single quote is for character;
double quote is for string.

In C, single-quotes such as 'a' indicate character constants whereas "a" is an array of characters, always terminated with the \0 character

Double quotes are for string literals, e.g.:
char str[] = "Hello world";
Single quotes are for single character literals, e.g.:
char c = 'x';
EDIT: As David stated in another answer, in C the type of a character literal is int.

A single quote is used for character, while double quotes are used for strings.
For example...
printf("%c \n",'a');
printf("%s","Hello World");
Output
a
Hello World
If you swap them, using double quotes for the character and single quotes for the string, the behavior is undefined:
printf("%c \n","a");
printf("%s",'Hello World');
The first call passes a pointer where %c expects an int, so you may get a garbage character such as:
�
The second call passes a multi-character constant (an int) where %s expects a pointer to a string. It may print nothing or crash, and any output that follows may be affected as well. Compilers typically warn about both calls.
Note: PHP, by contrast, accepts both single and double quotes for strings.

Use single quote with single char as:
char ch = 'a';
here 'a' is a char constant and is equal to the ASCII value of char a.
Use double quote with strings as:
char str[] = "foo";
here "foo" is a string literal.
It's okay to use "a", but it's not okay to use 'foo'.

Single quotes are denoting a char, double denote a string.
In Java, it is also the same.

While I'm sure this doesn't answer what the original asker asked, in case you end up here looking for single quote in literal integers like I have...
C++14 added the ability to add single quotes (') in the middle of number literals to add some visual grouping to the numbers.
constexpr int oneBillion = 1'000'000'000;
constexpr int binary = 0b1010'0101;
constexpr int hex = 0x12'34'5678;
constexpr double pi = 3.1415926535'8979323846'2643383279'5028841971'6939937510;

In C and C++, single quotes denote a character ('a'), whereas double quotes denote a string ("Hello"). A character holds exactly one letter, digit, or other symbol; a string can hold any number of them.
Also remember that there is a difference between '1' and 1.
If you type
cout<<'1'<<endl<<1;
the two lines of output look the same, but not in this case:
cout<<int('1')<<endl<<int(1);
This time the first line is 49, because converting a character to an int yields its character code, and the ASCII code for '1' is 49.
Similarly, if you do:
string s="Hi";
s+=49; // This appends "1" to the string
s+="1"; // This also appends "1" to the string

Different ways to declare a char or a string:
char char_simple = 'a'; // 1 byte: -128 to 127 or 0 to 255, depending on the implementation
signed char char_signed = 'a'; // 1 byte: -128 to 127
unsigned char char_u = 'a'; // 1 byte: 0 to 255
// Double quotes are for strings.
char string_simple[] = "myString"; // 9 bytes, including the terminating '\0'
char string_simple_2[] = {'m', 'S', 't', 'r', 'i', 'n', 'g'}; // 7 bytes, no terminator, so not a C string
char string_fixed_size[8] = "myString"; // valid in C only: exactly 8 bytes, the terminator is dropped
const char *string_pointer = "myString"; // pointer to the string literal
char string_pointer_2 = *"myString"; // the first character, 'm'
// sizeof yields a size_t, so it is printed with %zu
printf("char = %zu\n", sizeof(char_simple));
printf("char_signed = %zu\n", sizeof(char_signed));
printf("char_u = %zu\n", sizeof(char_u));
printf("string_simple[] = %zu\n", sizeof(string_simple));
printf("string_simple_2[] = %zu\n", sizeof(string_simple_2));
printf("string_fixed_size[8] = %zu\n", sizeof(string_fixed_size));
printf("string_pointer = %zu\n", sizeof(string_pointer)); // size of the pointer, not of the string
printf("string_pointer_2 = %zu\n", sizeof(string_pointer_2));

Why printing out the characters “” (147, 148 ascii) does not work as expected on c++?

I do not understand what's going on here. This is compiled with GCC 10.2.0 compiler. Printing out the whole string is different than printing out each character.
#include <iostream>
int main(){
char str[] = "“”";
std::cout << str << std::endl;
std::cout << str[0] << str[1] << std::endl;
}
Output
“”
��
Why aren't the two output lines the same? I would expect the same line twice. Printing alphanumeric characters does output the same line twice.

Bear in mind that, on almost all systems, the maximum value a (signed) char can hold is 127. So, more likely than not, your two 'special' characters are actually being encoded as multi-byte combinations.
In such a case, passing the string pointer to std::cout will keep feeding data from that buffer until a zero (nul-terminator) byte is encountered. Further, it appears that, on your system, the std::cout stream can properly interpret multi-byte character sequences, so it shows the expected characters.
However, when you pass the individual char elements, as str[0] and str[1], there is no possibility of parsing those arguments as components of multi-byte characters: each is interpreted 'as-is', and those values do not correspond to valid, printable characters, so the 'weird' � symbol is shown, instead.

"“”" contains more bytes than you think. It's usually encoded as utf8. To see that, you can print the size of the array:
std::cout << sizeof str << '\n';
This prints 7 in my testing: three bytes for each quotation mark plus the terminating null. UTF-8 is a multi-byte encoding, which means a character outside ASCII is encoded as several bytes. You are printing individual bytes of a UTF-8 encoded string, which are not printable on their own. That's why you get � when you try to print them.

Base64 encoded String too big, trailing characters truncated in c++

I have an image which I have to convert to base64. After the conversion, below is its value:
"data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAIBAQEBAQIBAQECAgICAgQDAgICAgUEBAMEBgUGBgYFBgYGBwkIBgcJBwYGCAsICQoKCgoKBggLDAsKDAkKCgr/2wBD...
and so on...
This is quite a big value. I need to put it in a char array like below:
char sPostData[21070] = "{ \"image\" : \"<base64 encoded value>\" , \"name\": \"dev\"}";
but it throws this error:
Error C2026 string too big, trailing characters truncated
How can I resolve it?

The Microsoft compiler imposes a limit of 16380 single-byte characters for a string literal. The documentation says
Prior to adjacent strings being concatenated, a string cannot be longer than 16380 single-byte characters.
Break the string into adjacent chunks, which the compiler concatenates, something like
char data[] = "a whole bunch of characters"
"a whole bunch more characters"
" and even more characters";

According to the documentation for that error, there is a limit of 16380 bytes in a character array (characters for narrow strings, fewer for Unicode).
Character string pointers (const char *) have a different limit, 65535 bytes.

Including decimal equivalent of a char in a character array

How do I create a character array using the decimal or hexadecimal representation of characters instead of the actual characters?
The reason I ask is that I am writing C code and I need to create a string that includes characters that are not used in the English language. That string would then be parsed and displayed on an LCD screen.
For example, '\0' decodes to 0 and '\n' to 10. Are there any more of these special characters that I can sacrifice to display custom characters? I could send "Temperature is 10\d C" and have a degree sign printed instead of '\d'. Something like this would be great.

Assuming you have a character code that is a degree sign on your display (with a custom display, I wouldn't necessarily expect it to "live" at the usual place in the extended IBM ASCII character set, or the display to support Unicode), you can use the escape \nnn or \xhh, where nnn is up to three octal (base 8) digits and hh is a run of hex digits. Note that a hex escape is not limited to two digits: it continues for as long as hex digits follow, so end the string literal right after the escape if the next character is a hex digit. Unfortunately, there is no decimal escape; Dennis Ritchie and/or Brian Kernighan were probably more used to octal, as that was quite common at the time C was first developed.
E.g.
char *str = "ABC\101\102\103";
cout << str << endl;
should print ABCABC (assuming ASCII encoding)

You can directly write
char myValues[] = {1,10,33,...};

Use \u00b0 to make a degree sign (I simply looked up the unicode code for it)
This requires unicode support in the terminal.

Simple, use std::ostringstream and casting of the characters:
std::string s = "hello world";
std::ostringstream os;
for (auto const& c : s)
os << static_cast<unsigned>(c) << ' ';
std::cout << "\"" << s << "\" in ASCII is \"" << os.str() << "\"\n";
Prints:
"hello world" in ASCII is "104 101 108 108 111 32 119 111 114 108 100 "

A little more research and I found the answer to my own question.
Characters preceded by a '\' are called escape sequences.
You can put the octal code of a character in your string by using escape sequences from '\000' to '\377' (the largest value that fits in an 8-bit char).
The same goes for hex, '\x00' to '\xFF'.
I am printing my custom characters by using '\xC1' to '\xC8', as I only had 8 custom characters.
Everything is done in a single line of code: lcd_putc("Degree \xC1")

how I can set binary data in a char[]

I have constant binary data that I need to insert into a buffer,
for example
char buf[] = "1232\0x1";
But how can I do it when the binary data comes first, like below?
char buf[] = "\0x11232";
The compiler sees it as one big hex number,
but my purpose is
char buf[] = {0x1,'1','2','3','2'};

You can use compile-time string concatenation:
char buf[] = "\x01" "1232";
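Without the split it does not work, because a hex escape consumes every hex digit that follows it:
char buf[] = "\x011232"; // parsed as the single, out-of-range escape \x011232
An octal escape, which stops after at most three digits, can be followed by digits directly:
char buf[] = "\0011232";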

You can create a single string literal by composing it of adjacent strings - the compiler will concatenate them:
char buf[] = "\x1" "1232";
is equivalent to:
char buf[] = {0x1,'1','2','3','2', 0}; // note the terminating null, which may or may not be important to you

Escape sequences are written as follows:
\xhh = character given in hexadecimal notation; the escape takes as many hex digits as follow it
\xhhhh = Unicode character in hexadecimal notation if this escape sequence is used in a wide-character constant or a Unicode string literal.
So in your case "\x0112345" would be read as one oversized escape; end the literal after the escape instead, as in "\x01" "1232".
