Garbled character output when using cout API


I am trying to run this simple code in VS 2015:
#include "stdafx.h"
#include <iostream>
int main()
{
    char * szOldPath = "\"C:\icm\scripts\StartupSync\runall.bat\" nonprod";
    std::cout << szOldPath << std::endl;
    return 0;
}
However, szOldPath is not printed correctly. The console shows:
unall.bat" nonprodupSync
I suspected this might be because of Unicode and that I should be using wcout. So I disabled Unicode by going to Configuration Properties -> General -> Character Set and setting it to Not Set or Multi Byte, but I still run into this issue.
I understand it is not good to disable UNICODE, but I am trying to understand some legacy code written at our company and this experiment is part of that exercise. Is there any way I can get cout to print szOldPath correctly?

Your issue has nothing to do with Unicode.
\r is the escape sequence for a carriage return. So, you are printing out "C:\icm\scripts\StartupSync, and then \r tells the terminal to move the cursor back to the beginning of the current line, and then unall.bat" nonprod is printed, overwriting what was already there.
You need to escape all of the \ characters in your string literal, just like you had to escape the " characters.
Also, your variable needs to be declared as a pointer to const char when assigning a string literal to the pointer. This is enforced in C++11 and later:
#include "stdafx.h"
#include <iostream>
int main()
{
    const char * szOldPath = "\"C:\\icm\\scripts\\StartupSync\\runall.bat\" nonprod";
    std::cout << szOldPath << std::endl;
    return 0;
}
Alternatively, in C++11 and later, you can use a raw string literal instead to avoid having to escape any characters with a leading \:
const char * szOldPath = R"("C:\icm\scripts\StartupSync\runall.bat" nonprod)";
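To make the difference concrete, here is a minimal sketch comparing the three spellings. The path `C:\temp\run.bat` and the function names are made up for illustration; the path was chosen because both `\t` and `\r` are valid escape sequences, so the unescaped version compiles but silently contains a tab and a carriage return.

```cpp
#include <string>

// Unescaped backslashes: "\t" becomes a tab and "\r" a carriage return,
// so this string does NOT contain the path it appears to contain.
inline std::string unescaped_path() { return "C:\temp\run.bat"; }

// Every backslash doubled: the string holds the literal path.
inline std::string escaped_path() { return "C:\\temp\\run.bat"; }

// Raw string literal (C++11): backslashes are taken literally.
inline std::string raw_path() { return R"(C:\temp\run.bat)"; }
```

The escaped and raw versions hold the same 15 characters, while the unescaped one is two characters shorter and has a carriage return in the middle, which is what makes the terminal overwrite the start of the line.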

Related

how to define my own special character in cout

for example :
cout << " hello\n400";
will print:
hello
400
another example:
cout << " hello\r400";
will print:
400ello
Is there an option to define my own special character?
I would like to make something like:
cout << " hello\d400";
which would give:
hello
400
(\d is my special character. I already have a function, cursorDown(), that moves the stdout cursor one line down, but I don't know how to define a special character that calls cursorDown() each time it is written.)

As others have said, there is no way to make cout understand user-defined characters. However, there are two things you could do.
std::cout is an object of type std::ostream, which overloads operator<<. You could create an object of your own struct that parses your string for your special characters (and any other user-defined characters) before printing it to a file or the console through an ostream, similar to a log stream.
Or, instead of calling cout << "something\dsomething", you can call a function such as special_cout(std::string), which parses the string for user-defined characters and executes the corresponding calls.

There is no way to define "new" special characters.
But you can make the stream interpret specific characters to have new meanings (that you can define). You can do this using locales.
Some things to note:
The characters in the string "xyza" are just a simple way of encoding a string. Escaped characters are the C++ way of allowing you to represent characters that are not visible but are well defined. Have a look at an ASCII table and you will see that all characters in the range 00 -> 31 (decimal) have special meanings (often referred to as control characters).
See here: http://www.asciitable.com/
You can place any character into a string by using the escape sequence to specify its exact value; e.g. \x0A used in a string puts the "New Line" character in the string.
The more commonly used "control characters" have shorthand versions (defined by the C++ language), e.g. '\n' => '\x0A', but you cannot add new shorthand characters, as this is just a convenience supplied by the language (it's like a tradition that most languages support).
But given a character, can you give it a special meaning in an IO stream? Yes. You need to define a facet for a locale and then apply that locale to the stream.
Note: there is a problem with applying locales to std::cin/std::cout. If the stream has already been used (in any way), applying a locale may fail, and the OS may do things with the stream before you reach main(), so applying a locale to std::cin/std::cout may fail (but you can do it to file and string streams easily).
So how do we do it?
Let's use "Vertical Tab" as the character whose meaning we want to change. I pick this because there is a shortcut for it, \v (so it's shorter to type than \x0B), and it usually has no meaning for terminals.
Let's define the meaning as a new line followed by an indent of 3 spaces.
#include <locale>
#include <algorithm>
#include <iostream>
#include <fstream>
class IndentFacet: public std::codecvt<char,char,std::mbstate_t>
{
  public:
    explicit IndentFacet(size_t ref = 0): std::codecvt<char,char,std::mbstate_t>(ref) {}
    typedef std::codecvt_base::result result;
    typedef std::codecvt<char,char,std::mbstate_t> parent;
    typedef parent::intern_type intern_type;
    typedef parent::extern_type extern_type;
    typedef parent::state_type state_type;
  protected:
    virtual result do_out(state_type& tabNeeded,
                          const intern_type* rStart, const intern_type* rEnd, const intern_type*& rNewStart,
                          extern_type* wStart, extern_type* wEnd, extern_type*& wNewStart) const
    {
        result res = std::codecvt_base::ok;
        for(;(rStart < rEnd) && (wStart < wEnd);++rStart,++wStart)
        {
            if (*rStart == '\v')
            {
                if (wEnd - wStart < 4)
                {
                    // We do not have enough space to convert the '\v'.
                    // So stop converting; a subsequent call should do it.
                    res = std::codecvt_base::partial;
                    break;
                }
                // If we find the special character, add a new line and three spaces.
                wStart[0] = '\n';
                wStart[1] = ' ';
                wStart[2] = ' ';
                wStart[3] = ' ';
                // Note: the for() loop adds the remaining +1.
                wStart += 3;
            }
            else
            {
                // Otherwise just copy the character.
                *wStart = *rStart;
            }
        }
        // Update the read and write points.
        rNewStart = rStart;
        wNewStart = wStart;
        // Return the appropriate result.
        return res;
    }
    // Override so the do_out() virtual function is called.
    virtual bool do_always_noconv() const throw()
    {
        return false; // Sometimes we add extra characters (a newline and three spaces)
    }
};
Some code that uses the locale:
int main()
{
    std::ios::sync_with_stdio(false);
    /* Imbue std::cout before it is used */
    std::cout.imbue(std::locale(std::locale::classic(), new IndentFacet()));
    // Notice the use of '\v' after the first line
    std::cout << "Line 1\vLine 2\nLine 3\n";
    /* You must imbue a file stream before it is opened. */
    std::ofstream data;
    data.imbue(std::locale(std::locale::classic(), new IndentFacet()));
    data.open("PLOP");
    // Notice the use of '\v' after the first line
    data << "Loki\vUses Locale\nTo do something silly\n";
}
The output:
> ./a.out
Line 1
   Line 2
Line 3
> cat PLOP
Loki
   Uses Locale
To do something silly
BUT
Writing all this is not really worth it. If you want a fixed indent like that, use a named variable that holds those specific characters. It makes your code slightly more verbose but does the trick.
#include <string>
#include <iostream>
std::string const newLineWithIndent = "\n   ";
int main()
{
    std::cout << " hello" << newLineWithIndent << "400";
}

Farsi character utf8 in c++

I'm trying to read and write Farsi characters in C++ and I want to show them in CMD.
The first thing I fixed was the font: I added Farsi characters to it, and now I can write to the screen, for example ب (U+0628), with this code:
#include <iostream>
#include <io.h>
#include <fcntl.h>
using namespace std;
int main() {
    _setmode(_fileno(stdout), _O_U16TEXT);
    wcout << L"\u0628 \n";
    wcout << L"ب"<<endl;
    system("pause");
}
But how can I store this character? For Latin characters we can use char or string, but what about Farsi characters in UTF-8?
And how can I read them? For Latin characters we use cin >> or gets_s.
Should I use wchar_t? If yes, how?
With this code it shows the wrong character:
wchar_t a='\u0628';
wcout <<a;
Also, I can't display the character بـ (U+FE91), even though it exists in my installed font, while ب (U+0628) is shown correctly.
Thanks in advance.

The solution is the following line:
wchar_t a=L'\u0628';
The L prefix tells the compiler that the literal is a wide character literal (type wchar_t). Without it, '\u0628' is an ordinary narrow char literal, which cannot hold the value 0x0628, so the value does not survive intact. With the prefix it is not truncated to 8 bits, and this works as intended.
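As for how to keep such characters, a minimal sketch is to store them in wchar_t and std::wstring, the wide counterparts of char and std::string. The function name is made up for illustration, and the sketch only covers storage, not console input or output.

```cpp
#include <string>

// Hypothetical helper: builds a wide string holding the two characters
// from the question, U+0628 and U+FE91, using wide character literals.
std::wstring farsi_sample() {
    wchar_t beh = L'\u0628';   // the L prefix keeps the full code point
    std::wstring text;
    text += beh;
    text += L'\uFE91';
    return text;
}
```

Each element of the resulting std::wstring holds one full code point from this range, so nothing is cut down to 8 bits.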
UPDATE
If you are building and running this as a console application on Windows, you need to manage your code pages accordingly. The following code worked for me with Cyrillic input (Windows code page 1251) when I set the proper code page before any wcin and wcout calls, basically at the very top of my main():
SetConsoleOutputCP(1251);
SetConsoleCP(1251);
For Farsi I'd expect you should use code page 1256.
Full test code for your reference:
#include <iostream>
#include <Windows.h>
using namespace std;
int main()
{
    SetConsoleOutputCP(1256); // to manage console output
    SetConsoleCP(1256); // to properly process console input
    wchar_t b;
    wcin >> b;
    wcout << b << endl;
}

Is it possible to print UTF-8 string with Boost and STL in windows console?

I'm trying to output a UTF-8 encoded string with cout, with no success. I'd like to use Boost.Locale in my program. I've found some information specific to the Windows console. For example, this article http://www.boost.org/doc/libs/1_60_0/libs/locale/doc/html/running_examples_under_windows.html says that I should set the console output code page to 65001 and save all my sources in UTF-8 encoding with a BOM. So, here is my simple example:
#include <windows.h>
#include <boost/locale.hpp>
using namespace std;
using namespace boost::locale;
int wmain(int argc, const wchar_t* argv[])
{
    //system("chcp 65001 > nul"); // It's the same as SetConsoleOutputCP(CP_UTF8)
    SetConsoleOutputCP(CP_UTF8);
    locale::global(generator().generate(""));
    static const char* utf8_string = u8"♣☻▼►♀♂☼";
    cout << "cout: " << utf8_string << endl;
    printf("printf: %s\n", utf8_string);
    return 0;
}
I compile it with Visual Studio 2015 and it produces the following output in console:
cout: ���������������������
printf: ♣☻▼►♀♂☼
Why does printf handle it correctly while cout doesn't? Can the Boost locale generator help with it? Or should I use something else to print UTF-8 text to the console in stream mode (a cout-like approach)?

It looks like std::cout is much too clever here: it tries to interpret your UTF-8 encoded string as an ASCII one and finds 21 non-ASCII characters, which it outputs as the unmapped character �. AFAIK the Windows C++ console driver insists on each character from a narrow char string being mapped to a position on screen and does not support multi-byte character sets.
Here is what happens under the hood.
utf8_string is the following char array (just look at a Unicode table and do the UTF-8 conversion):
const char utf8_string[] = { '\xe2', '\x99', '\xa3', '\xe2', '\x98', '\xbb', '\xe2', '\x96',
    '\xbc', '\xe2', '\x96', '\xba', '\xe2', '\x99', '\x80', '\xe2', '\x99',
    '\x82', '\xe2', '\x98', '\xbc', '\0' };
That is 21 bytes, none of which is in the ASCII range 0-0x7f.
printf, on the other hand, just outputs the bytes without any conversion, giving the correct output.
I'm sorry, but even after many searches I could not find an easy way to correctly display UTF-8 output on a Windows console using a narrow stream such as std::cout.
But you should notice that your code fails to imbue the Boost locale into cout.
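The byte count is easy to check. The sketch below is an illustration only (the function name is made up, and it assumes the input is valid UTF-8): it counts code points by skipping continuation bytes, which always have the bit pattern 10xxxxxx.

```cpp
#include <cstddef>
#include <string>

// Counts UTF-8 code points by counting every byte that is NOT a
// continuation byte (continuation bytes look like 10xxxxxx).
// Assumes `bytes` holds valid UTF-8; no validation is performed.
std::size_t count_utf8_code_points(const std::string& bytes) {
    std::size_t count = 0;
    for (unsigned char b : bytes) {
        if ((b & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the seven symbols in the question, std::string::size() reports 21 while this function reports 7, which matches the 21 replacement characters in the cout output.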

The key problem is that the implementation of cout << "some string", after long and painful adventures, calls WriteFile for every single character.
If you'd like to debug it, set a breakpoint inside the _write function in the write.c file of the CRT sources, write something to cout, and you'll see the whole story.
So we can rewrite your code
static const char* utf8_string = u8"♣☻▼►♀♂☼";
cout << utf8_string << endl;
with an equivalent (and faster!) one:
static const char* utf8_string = u8"♣☻▼►♀♂☼";
const size_t utf8_string_len = strlen(utf8_string);
DWORD written = 0;
for(size_t i = 0; i < utf8_string_len; ++i)
    WriteFile(GetStdHandle(STD_OUTPUT_HANDLE), utf8_string + i, 1, &written, NULL);
output: ���������������������
Replace the loop with a single call to WriteFile and the UTF-8 console works brilliantly:
static const char* utf8_string = u8"♣☻▼►♀♂☼";
const size_t utf8_string_len = strlen(utf8_string);
DWORD written = 0;
WriteFile(GetStdHandle(STD_OUTPUT_HANDLE), utf8_string, utf8_string_len, &written, NULL);
output: ♣☻▼►♀♂☼
I tested it on msvc.2013 and msvc.net (2003); both behave identically.
Obviously the Windows console implementation wants whole characters in each call to WriteFile/WriteConsole and cannot take a UTF-8 character one byte at a time. :)
What can we do here?
My first idea is to make the output buffered, as with files. It's easy:
static char cout_buff[128];
cout.rdbuf()->pubsetbuf(cout_buff, sizeof(cout_buff));
cout << utf8_string << endl; // works
cout << utf8_string << endl; // does nothing
output: ♣☻▼►♀♂☼ (only once; I explain why below)
The first issue is that console output becomes delayed: it waits until the end of the line or until the buffer overflows.
The second issue: it doesn't work.
Why? After the first buffer flush (at the first << endl), cout switches to a bad state (badbit set). That's because WriteFile normally returns the number of bytes written in *lpNumberOfBytesWritten, but for a UTF-8 console it returns the number of characters written (problem described here). The CRT detects that the number of bytes requested and the number written differ, and stops writing to the 'failed' stream.
What else can we do?
Well, I suppose we could implement our own std::basic_streambuf that writes to the console the correct way, but it's not easy and I have no time for it. If anyone wants to, I'll be glad.
Other options are (a) use std::wcout and strings of wchar_t characters, or (b) use WriteFile/WriteConsole directly. Sometimes those solutions are acceptable.
Working with a UTF-8 console in Microsoft versions of C++ is really horrible.

Matching russian vowels in C++

I wanted to write a function which returns true if a given character is a Russian vowel. But the results I get are strange to me. This is what I've got so far:
#include <iostream>
using namespace std;
bool is_vowel_p(char working_char)
// returns true if the character is a russian vowel
{
    string matcher = "аяё×эеуюыи";
    if (find(matcher.begin(), matcher.end(), working_char) != matcher.end())
        return true;
    else
        return false;
}
void main()
{
    cout << is_vowel_p('е') << endl; // russian vowel
    cout << is_vowel_p('Ж') << endl; // russian consonant
    cout << is_vowel_p('D') << endl; // latin letter
}
The result is:
1
1
0
which is strange to me. I expected the following result:
1
0
0
It seems that there is some kind of internal mechanism which I don't know about yet. First, I'm interested in how to fix this function so that it works properly. Second, I'd like to know what is going on that produces this result.

string and char are only guaranteed to represent characters in the basic character set - which does not include the Cyrillic alphabet.
Using wstring and wchar_t, and adding L before the string and character literals to indicate that they use wide characters, should allow you to work with those letters.
Also, for portability you need to include <algorithm> for find, and give main a return type of int.
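A minimal sketch of the wide-character version might look like this. The function name is made up, and the vowel list is an assumption: it uses the ten lowercase Russian vowels written as universal character names, so the result does not depend on how the source file is encoded.

```cpp
#include <string>

// Returns true if `c` is one of the ten lowercase Russian vowels.
// The letters are spelled as universal character names so that the
// encoding of the source file does not matter.
bool is_russian_vowel(wchar_t c) {
    static const std::wstring vowels =
        L"\u0430\u044f\u0451\u043e\u044d"   // а я ё о э
        L"\u0435\u0443\u044e\u044b\u0438";  // е у ю ы и
    return vowels.find(c) != std::wstring::npos;
}
```

Called with L'\u0435' (е), L'\u0416' (Ж) and L'D', it should give the expected 1, 0, 0. Uppercase vowels are deliberately not handled here.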

The basic C++ source character set is a subset of ASCII, but you are entering Unicode characters, and the comparison is being done on 8-bit char values. I bet one of the vowels satisfies the following:
(vowel & 255) == ((code point for 'Ж') & 255)
You need to use Unicode functions to do this, not ASCII functions, i.e. use functions that take wchar_t values. Also, make sure your compiler can parse the non-ASCII vowel string. With MS VC, the compiler requires:
L"аяё×эеуюыи" or TEXT("аяё×эеуюыи")
The latter is a macro that adds the L when compiling with Unicode support.
Convert the code to use wchar_t and it should work.

There is a very useful function in locale.h:
setlocale(LC_ALL, "Russian");
Paste this at the beginning of the program.
Example:
#include <stdio.h>
#include <locale.h>
int main()
{
    setlocale(LC_ALL, "Russian");
    printf("Здравствуй, мир!\n");//Hello, world!
    return 0;
}

Make sure your system default locale is Russian, and make sure your file is saved as codepage 1251 (Cyrillic/Windows). If it's saved as Unicode, this won't ever work.
The system default locale is the one used by non-Unicode-compliant programs. It's in Control Panel, under Regional settings.
Alternatively, rewrite it to use wstring and wchar_t and L"" string/char literals.

C++ printf: newline (\n) from commandline argument

How can I print a format string passed as an argument?
example.cpp:
#include <iostream>
int main(int ac, char* av[])
{
    printf(av[1],"anything");
    return 0;
}
try:
example.exe "print this\non newline"
output is:
print this\non newline
instead I want:
print this
on newline

No, do not do that! That is a very severe vulnerability. You should never accept format strings as input. If you would like to print a newline whenever you see a "\n", a better approach would be:
#include <iostream>
#include <cstdlib>
int main(int argc, char* argv[])
{
    if ( argc != 2 ){
        std::cerr << "Exactly one parameter required!" << std::endl;
        return 1;
    }
    int idx = 0;
    const char* str = argv[1];
    while ( str[idx] != '\0' ){
        if ( (str[idx]=='\\') && (str[idx+1]=='n') ){
            std::cout << std::endl;
            idx+=2;
        }else{
            std::cout << str[idx];
            idx++;
        }
    }
    return 0;
}
Or, if you are including the Boost C++ Libraries in your project, you can use the boost::replace_all function to replace instances of "\\n" with "\n", as suggested by Pukku.

At least if I understand correctly, your question is really about converting the "\n" escape sequence into a new-line character. That conversion happens at compile time, so if (for example) you enter "\n" on the command line, it gets printed out as "\n" instead of being converted to a new-line character.
I wrote some code years ago to convert escape sequences when you want it done. Please don't pass it as the first argument to printf though. If you want to print a string entered by the user, use fputs, or the "%s" conversion format:
int main(int argc, char **argv) {
    if (argc > 1)
        printf("%s", translate(argv[1]));
    return 0;
}
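The translate function called above is not reproduced on this page. As a stand-in, here is a minimal sketch of what such a routine could look like. It is not the answerer's original code: the name translate_escapes, the std::string interface, and the choice to handle only \n, \t and \\ are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the conversion routine: turns the two-character
// sequences \n, \t and \\ into the characters they name. Any other
// backslash sequence is copied through unchanged.
std::string translate_escapes(const std::string& input) {
    std::string output;
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (input[i] == '\\' && i + 1 < input.size()) {
            switch (input[i + 1]) {
                case 'n':  output += '\n'; ++i; continue;
                case 't':  output += '\t'; ++i; continue;
                case '\\': output += '\\'; ++i; continue;
                default:   break;  // unknown escape: keep the backslash
            }
        }
        output += input[i];
    }
    return output;
}
```

The result can then be printed as plain data, for example with std::fputs(translate_escapes(argv[1]).c_str(), stdout), rather than being used as a format string.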

You can't do that because \n and the like are parsed by the C compiler. In the generated code, the actual numerical value is written.
What this means is that your input string will have to actually contain the character value 13 (or 10 or both) to be considered a new line because the C functions do not know how to handle these special characters since the C compiler does it for them.
Alternatively you can just replace every instance of \\n with \n in your string before sending it to printf.

Passing user arguments directly to printf opens the door to an exploit known as a format string attack.
See Wikipedia and Much more details.
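To illustrate the safe pattern, here is a small sketch in which user-supplied text is always passed as an argument to a fixed "%s" format and never used as the format itself. The function name is made up for illustration, and the sketch assumes the text contains no embedded null characters.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Formats user-supplied text as DATA: the format string is the fixed "%s",
// so conversion specifiers inside `user_text` are copied, not interpreted.
std::string format_as_data(const std::string& user_text) {
    std::vector<char> buffer(user_text.size() + 1);
    std::snprintf(buffer.data(), buffer.size(), "%s", user_text.c_str());
    return std::string(buffer.data());
}
```

Text such as "100%d%n%s" comes back unchanged, whereas handing the same text to printf as its first argument would make printf look for arguments that were never supplied.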

There's no way to automatically have the string contain a newline. You'll have to do some kind of string replace on your own before you use the parameter.

It is only the compiler that converts \n etc to the actual ASCII character when it finds that sequence in a string.
If you want to do it for a string that you get from somewhere, you need to manipulate the string directly and replace the string "\n" with a CR/LF etc. etc.
If you do that, don't forget that "\\" becomes '\' too.
Please never ever use char* buffers in C++, there is a nice std::string class that's safer and more elegant.

I know the answer, but is this thread still active?
By the way, you can try:
example.exe "print this$(echo -e "\n ")on newline"
I tried it and it ran.
Regards,
Shahid nx
