How to print wstring on Linux/OS X? - c++

How can I print a string like this: €áa¢cée£ on the console/screen? I tried this:
#include <iostream>
#include <string>
using namespace std;
wstring wStr = L"€áa¢cée£";
int main (void)
{
wcout << wStr << " : " << wStr.length() << endl;
return 0;
}
which is not working. Even more confusingly, if I remove € from the string, the output is ?a?c?e? : 7, but with € in the string, nothing is printed after the € character.
If I write the same code in python:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
wStr = u"€áa¢cée£"
print u"%s" % wStr
it prints out the string correctly on the very same console. What am I missing in c++ (well, I'm just a noob)? Cheers!!
Update 1: based on n.m.'s suggestion
#include <iostream>
#include <string>
using namespace std;
string wStr = "€áa¢cée£";
char *pStr = 0;
int main (void)
{
cout << wStr << " : " << wStr.length() << endl;
pStr = &wStr[0];
for (unsigned int i = 0; i < wStr.length(); i++) {
cout << "char "<< i+1 << " # " << *pStr << " => " << pStr << endl;
pStr++;
}
return 0;
}
First of all, it reports 14 as the length of the string: €áa¢cée£ : 14. Is that because it's counting 2 bytes per character?
And all I get is this:
char 1 # ? => €áa¢cée£
char 2 # ? => ??áa¢cée£
char 3 # ? => ?áa¢cée£
char 4 # ? => áa¢cée£
char 5 # ? => ?a¢cée£
char 6 # a => a¢cée£
char 7 # ? => ¢cée£
char 8 # ? => ?cée£
char 9 # c => cée£
char 10 # ? => ée£
char 11 # ? => ?e£
char 12 # e => e£
char 13 # ? => £
char 14 # ? => ?
as the last cout output. So, actual problem still remains, I believe. Cheers!!
Update 2: based on n.m.'s second suggestion
#include <iostream>
#include <string>
using namespace std;
wchar_t wStr[] = L"€áa¢cée£";
int iStr = sizeof(wStr) / sizeof(wStr[0]); // length of the string
wchar_t *pStr = 0;
int main (void)
{
setlocale (LC_ALL,"");
wcout << wStr << " : " << iStr << endl;
pStr = &wStr[0];
for (int i = 0; i < iStr; i++) {
wcout << *pStr << " => " << static_cast<void*>(pStr) << " => " << pStr << endl;
pStr++;
}
return 0;
}
And this is what I get as my result:
€áa¢cée£ : 9
€ => 0x1000010e8 => €áa¢cée£
á => 0x1000010ec => áa¢cée£
a => 0x1000010f0 => a¢cée£
¢ => 0x1000010f4 => ¢cée£
c => 0x1000010f8 => cée£
é => 0x1000010fc => ée£
e => 0x100001100 => e£
£ => 0x100001104 => £
=> 0x100001108 =>
Why is it reported as 9 rather than 8? Or is this what I should expect? Cheers!!

Drop the L before the string literal. Use std::string, not std::wstring.
UPD: There's a better (correct) solution. Keep wchar_t, wstring and the L, and call setlocale(LC_ALL,"") at the beginning of your program.
You should call setlocale(LC_ALL,"") at the beginning of your program anyway. This instructs your program to work with your environment's locale instead of the default "C" locale. Your environment has a UTF-8 locale, so everything should work.
Without calling setlocale(LC_ALL,""), the program works with UTF-8 sequences without "realizing" that they are UTF-8. If a correct UTF-8 sequence is printed on the terminal, it will be interpreted as UTF-8 and everything will look fine. That's what happens if you use string and char: gcc uses UTF-8 as a default encoding for strings, and the ostream happily prints them without applying any conversion. It thinks it has a sequence of ASCII characters.
But when you use wchar_t, everything breaks: gcc uses UTF-32, the correct re-encoding is not applied (because the locale is "C") and the output is garbage.
When you call setlocale(LC_ALL,"") the program knows it should recode UTF-32 to UTF-8, and everything is fine and dandy again.
This all assumes that we only ever want to work with UTF-8. Using arbitrary locales and encodings is beyond the scope of this answer.

Related

How do I know the positional value of a character in a string with respect to that of the English alphabets?

For example if I have something like :
word="Bag"
I need output as:
B-2
a-1
g-7
Here is a more portable solution:
There are only 26 letters to consider, thus your question is easily answerable by simply providing a lookup table. There is no need to call functions such as tolower, or to assume that letters are contiguous in the collating sequence (as EBCDIC does not follow this pattern):
#include <iostream>
#include <unordered_map>
#include <string>
int main()
{
// Create the lookup table -- this could have been done in many ways,
// such as a static table of characters to values, but I chose a simple
// literal string and build the table at runtime.
const char *alpha = "aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ";
std::unordered_map<char, int> cMap;
for (int i = 0; i < 52; ++i)
cMap[alpha[i]] = i / 2; // Note there are two characters per character value,
// the lower and the upper case version of the character
// Test
std::string test = "Bag";
for ( auto& c : test)
std::cout << c << '-' << cMap[c]+1 << ' ';
std::cout << '\n';
test = "Buy";
for ( auto& c : test)
std::cout << c << '-' << cMap[c]+1 << ' ';
std::cout << '\n';
test = "Elephant";
for ( auto& c : test)
std::cout << c << '-' << cMap[c]+1 << ' ';
}
Output:
B-2 a-1 g-7
B-2 u-21 y-25
E-5 l-12 e-5 p-16 h-8 a-1 n-14 t-20
for (char ch: word) {
cout << ch << "-" << (tolower(ch) - 'a' + 1) << " ";
}
I hope the below code snippet helps :
char c = 'g'; // your character here
int position = 1 + tolower(static_cast<unsigned char>(c)) - 'a'; // convert to lowercase, then find its position relative to 'a' (needs <cctype>)

How to display character string literals with hex properly with std::cout in C++?

I want to use octal and hex to print character string literals with std::cout in C++.
I want to print "bee".
#include <iostream>
int main() {
std::cout << "b\145e" << std::endl;//1
std::cout << "b\x65e" << std::endl;//2
return 0;
}
//1 works fine, but //2 fails to compile with the error "hex escape sequence out of range".
Now I want to print "be3".
#include <iostream>
int main() {
std::cout << "b\1453" << std::endl;//1
std::cout << "b\x653" << std::endl;//2
return 0;
}
Again, //1 works fine, but //2 fails with "hex escape sequence out of range".
Now can I come to the conclusion that hex is not a good way to display character string characters?
I get the feeling I am wrong but don't know why.
Can someone explain whether hex can be used and how?
There's actually an example of this exact same situation on cppreference's documentation on string literals.
If a valid hex digit follows a hex escape in a string literal, it would fail to compile as an invalid escape sequence. String concatenation can be used as a workaround:
They provide the example below:
// const char* p = "\xfff"; // error: hex escape sequence out of range
const char* p = "\xff""f"; // OK : the literal is const char[3] holding {'\xff','f','\0'}
Applying what they explain to your problem, we can print the string literal be3 in two ways:
std::cout << "b\x65" "3" << std::endl;
std::cout << "b\x65" << "3" << std::endl;
A hex escape sequence consumes every hex digit that follows it, so the escapes are read as \x65e and \x653. You need to help the compiler stop after 65:
#include <iostream>
int main() {
std::cout << "b\x65""e" << std::endl;//2
std::cout << "b\x65""3" << std::endl;//2
}

How to display ASCII of alphabet , numbers and special characters on console?

I tried
char c='A';
cout<<(int)c;
But I think this does not work for special characters and numbers. What should I do?
This will kinda do it:
#include <iostream>
int main()
{
char c = 0;
for (int i = 0; i < 128; ++i,++c)
std::cout << c << " = " << i << std::endl;
// or: std::cout << c << " = " << static_cast<int>(c) << std::endl;
}
But really, it is easier (and better) to just go look at http://www.asciitable.com/ or a similar site, because everything below 32 (and 127 itself) is not printable.
Also note that only the first 128 characters (codes 0 to 127) are ASCII. Above that we run into extended ASCII, which depends on your console settings.

Locate the source of a bug with sscanf

I've been struggling with this for too long.
Let's say I have this minimal code:
test.cxx
#include <iostream>
#include <cstdio>
int main (int argc, char *argv[])
{
const char *text = "1.01 foo";
float value = 0;
char other[8];
int code = sscanf(text, "%f %7s", &value, other);
std::cout << code << " | " << text << " | => | " << value << " | " << other << " | " << std::endl;
return 0;
}
$ g++ test.cxx; ./a.out produces this output, as expected:
$ 2 | 1.01 foo | => | 1.01 | foo |
Now I have these 5 lines embedded into a project with several thousand lines, and lots of includes ...
Compiling, running, and the output is now:
$ 2 | 1.01 foo | => | 1 | .01 |
What strategy could I use to locate the source of this inconsistency ?
EDIT:
export LC_ALL=C (or LC_NUMERIC=C); ./a.out seems to solve my problem
It might be caused by a different locale in your test and in your destination application. I was able to reproduce it on coliru by using:
setlocale(LC_ALL, "cs_CZ.utf8");
http://coliru.stacked-crooked.com/a/5a8f2ea7ac330d66
You can find some solutions in this SO:
sscanf() and locales. How does one really parse things like "3.14"?
[edit]
Here is a solution with uselocale, but since you tagged this question with C++, why not use std::stringstream and imbue it with the proper locale (see the link to SO above)?
http://coliru.stacked-crooked.com/a/dc0fac7d2533d95c
const char *text = "1.01 foo";
float value = 0;
char other[8];
// set for testing, sscanf will assume floating point numbers use comma instead of dots
setlocale(LC_ALL, "cs_CZ.utf8");
// Temporarily use C locale (uses dot in floats) on current thread
locale_t locale = newlocale(LC_NUMERIC_MASK, "C", NULL);
locale_t old_locale = uselocale(locale);
int code = sscanf(text, "%f %7s", &value, other);
std::cout << code << " | " << text << " | => | " << value << " | " << other << " | " << std::endl;
// Go back to original locale
uselocale(old_locale);
freelocale(locale);

print to output at specific location on line in c++

I am trying to print to a uniform position in a line after printing out a header. here's an example:
PHRASE              TYPE
"hello there"    => greeting
"yo"             => greeting
"where are you?" => question
"It's 3:00"      => statement
"Wow!"           => exclamation
Assume each of these are stored in a std::map<string, string>, where key = phrase and value = type. My issue is that simply using tabs is dependent on the console or text editor that I view the output in. If the tab width is too small I won't know for sure where it will be printed. I have tried using setw, but that only prints the separator ("=>") a fixed distance from the end of the phrase. Is there a simple way to do this?
NOTE Assume for now that we just always know that the phrase will not be longer than, say, 16 characters. We don't need to account for what to do if it is.
Use std::left and std::setw:
std::cout << std::left; // This is "sticky", but setw is not.
std::cout << std::setw(16) << phrase << " => " << type << "\n";
For example:
#include <iostream>
#include <string>
#include <iomanip>
#include <map>
int main()
{
std::map<std::string, std::string> m;
m["hello there"] = "greeting";
m["where are you?"] = "question";
std::cout << std::left;
for (std::map<std::string, std::string>::iterator i = m.begin();
i != m.end();
i++)
{
std::cout << std::setw(16)
<< std::string("\"" + i->first + "\"")
<< " => "
<< i->second
<< "\n";
}
return 0;
}
Output:
"hello there"    => greeting
"where are you?" => question
See http://ideone.com/JTv6na for demo.
printf("\"%s\"%*c => %s\n",
it->first.c_str(),
std::max(0, 16 - static_cast<int>(it->first.size())), // cast: size() is unsigned, so the subtraction would wrap
' ',
it->second.c_str());
The same idea as Peter's solution, but puts the padding outside the quotes. It uses %c with a length argument to insert padding.
If you're not averse to C-style printing, printf is great for this sort of thing, and much more readable:
printf("\"%16s\" => %s\n", it->first.c_str(), it->second.c_str());
There's nothing wrong with using printf and friends in a C++ program, just be careful mixing iostreams and stdio. You can always sprintf into a buffer, then output that with iostreams.
You might find this function useful:
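#include <iostream>
#include <iomanip>
#include <string>
void printRightPadded(std::ostream &stream,const std::string &s,size_t width)
{
// Set left alignment (clearing any other adjustment flag) and remember the old flags
std::ios::fmtflags old_flags = stream.setf(std::ios::left, std::ios::adjustfield);
stream << std::setw(width) << s;
stream.flags(old_flags); // restore the caller's formatting
}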
You could use it like this:
void
printKeyAndValue(
std::ostream &stream,
const std::string &key,
const std::string &value
)
{
printRightPadded(stream,"\"" + key + "\"",18);
stream << " => " << value << "\n";
}
If you can't get setw to work, a simple alternative is to pad every phrase with spaces so that they are all 16 characters long.
I personally find the C-style printing more readable for formatted printing. Using printf you could also handle the column width using the * formatter.
#include <cstdio>
int main() {
printf("%-*s%-*s\n", 10, "Hello", 10, "World");
printf("%-*s%-*s\n", 15, "Hello", 15, "World");
// in the above, '-' left aligns the field
// '*' denotes that the field width is a parameter specified later
// ensure that the width is specified before what it is used to print
}
Output
Hello     World
Hello          World