How to convert a string to its unicode escapes? - c++

I'm trying to convert a char array to a Unicode-escaped char array.
Say I have a string "C:/İmüp".
How can I convert it to C:/\u0130m\u00fcp as a char array or const char*?
(I get "C:/Hello İmüp" as a char array via ExpandEnvironmentStrings(), and then I need to write that to a file with its Unicode escapes.)
I tried type-cast conversions, std::stringstream and ASCII tables, and looked for examples in C++ JSON encoders, but I couldn't get it working.

Try this:
#include <iomanip>
#include <sstream>
#include <string>

std::wstring env;
// fill env with data from ExpandEnvironmentStringsW()...
std::stringstream ss;
for (std::wstring::iterator iter = env.begin(); iter != env.end(); ++iter)
{
    if (*iter <= 127)
        ss << (char) *iter;  // ASCII: copy unchanged
    else                     // non-ASCII: \u followed by 4 hex digits
        ss << "\\u" << std::hex << std::setfill('0') << std::setw(4) << (int)*iter;
}
std::string str = ss.str();
// use str as needed...

First convert it from char array to wchar_t array, using the system-default code page.
Then write trivial code that walks over your wchar_t array and escapes every Unicode character with code >= 128.
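The two steps described here can be sketched portably as follows. This is a sketch under stated assumptions: std::mbsrtowcs (which converts according to the current C locale) stands in for the Windows-specific MultiByteToWideChar with CP_ACP, and the names toWide and escapeNonAscii are hypothetical. Call std::setlocale(LC_ALL, "") first if you want the user's default code page rather than the "C" locale.

```cpp
#include <cwchar>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Step 1: narrow (multibyte, current C locale) -> wide.
// Assumption: mbsrtowcs replaces MultiByteToWideChar(CP_ACP, ...) here.
std::wstring toWide(const std::string& narrow)
{
    std::mbstate_t state = std::mbstate_t();
    const char* src = narrow.c_str();
    // With a null destination, mbsrtowcs only counts the wide characters.
    std::size_t len = std::mbsrtowcs(nullptr, &src, 0, &state);
    if (len == static_cast<std::size_t>(-1))
        return std::wstring(); // invalid sequence for the current locale
    std::vector<wchar_t> buf(len + 1);
    src = narrow.c_str();
    state = std::mbstate_t();
    std::mbsrtowcs(buf.data(), &src, buf.size(), &state);
    return std::wstring(buf.data(), len);
}

// Step 2: copy ASCII as-is, escape every code >= 128 as \uXXXX.
std::string escapeNonAscii(const std::wstring& wide)
{
    std::ostringstream ss;
    for (wchar_t ch : wide)
    {
        unsigned long code = static_cast<unsigned long>(ch);
        if (code < 128)
            ss << static_cast<char>(code);
        else
            ss << "\\u" << std::hex << std::setw(4) << std::setfill('0') << code;
    }
    return ss.str();
}
```

On Windows, wchar_t is 16 bits, so characters outside the BMP arrive as surrogate pairs and are escaped as two \uXXXX sequences; on platforms with a 32-bit wchar_t, such characters would produce more than four hex digits.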
P.S. Better yet, make your application Unicode so it uses the Unicode version of ExpandEnvironmentStrings (ExpandEnvironmentStringsW). This way you only have to escape the string, and your app will still work correctly if some environment variable contains a character that doesn't fit in a char with your system-default code page.

Try this code (note that it is C#, using the .NET System.Text.Encoding classes, not C++):
string yourAsciiString = "this is test";
string yourUnicodeString = System.Text.Encoding.Unicode.GetString(System.Text.Encoding.ASCII.GetBytes(yourAsciiString));

Related

Converting a char to wchar_t without using the L prefix

In my code I tried to create an array of 4 wide chars, where every char contains a Cyrillic letter.
wchar_t OUT_STRING[4] = { L'т',L'л',L'о',L'р' };
This works and I get the expected output. It's only a test; in reality I need to convert an element of a string to the same type as in OUT_STRING. I tried something like this:
wchar_t OUT_STRING[4] = { (wchar_t)'т',L'л',L'о',L'р' };
But it didn't work: in the output I get a rectangle.
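Your cast doesn't work because, assuming the source file is saved as UTF-8, 'т' is two bytes long, so the narrow literal 'т' is not a single char holding the letter's code point (compilers typically treat it as a multicharacter literal with an implementation-defined value); L'т' is what gives you the wide value. I think you want to pass in a string using std::string in UTF-8 encoding and process it one character at a time, each time converting the single character to a wide character string of length 1 so that you can pass it to TTF_SizeUNICODE and TTF_RenderUNICODE_Blended.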
I will demonstrate the relevant string conversion code.
Here is a test function that expects a null-terminated wide character string with just one character in it. The body of main shows how to convert a UTF-8 string to UTF-16 (using codecvt_utf8_utf16) and how to convert a single character to a string (using std::wstring(1, ch)).
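#include <string>
#include <codecvt>
#include <cwchar>   // std::wcslen
#include <iostream>
#include <locale>   // std::wstring_convert

// Prints the length and the code unit values of a null-terminated wide string.
void test(const wchar_t* str) {
    std::cout << "Number of characters in string: " << std::wcslen(str) << std::endl;
    for (const wchar_t* ch = str; *ch; ++ch) {
        std::cout << ((int)*ch) << std::endl;
    }
}

int main() {
    // u8 literal holds UTF-8 bytes (C++11 to C++17; in C++20 u8"" is const char8_t[])
    std::string input = u8"тлор";
    // UTF-8 -> UTF-16; std::wstring_convert is deprecated since C++17 but still available
    for (wchar_t ch : std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t>().from_bytes(input)) {
        std::wstring string_with_just_one_character(1, ch);
        test(string_with_just_one_character.c_str());
    }
    return 0;
}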
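Because std::wstring_convert is deprecated since C++17, a small hand-written decoder is one possible replacement. This is a swapped-in technique, not part of the answer: a minimal sketch that assumes the input is well-formed UTF-8 (no validation of continuation bytes, overlong forms or surrogates), and decodeUtf8 is a hypothetical name.

```cpp
#include <cstddef>
#include <string>

// Decodes UTF-8 bytes into Unicode code points.
// Assumption: the input is valid UTF-8; malformed input is not detected.
std::u32string decodeUtf8(const std::string& in)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size())
    {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;                                      // continuation bytes
        if (lead < 0x80)      { cp = lead;        extra = 0; } // 0xxxxxxx
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; } // 110xxxxx
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; } // 1110xxxx
        else                  { cp = lead & 0x07; extra = 3; } // 11110xxx
        // Each continuation byte (10xxxxxx) contributes 6 more bits.
        for (std::size_t k = 1; k <= extra && i + k < in.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        out += cp;
        i += extra + 1;
    }
    return out;
}
```

For characters in the BMP (Cyrillic included), each decoded value can then be turned into a one-character string with std::wstring(1, static_cast<wchar_t>(cp)), as in the answer's loop.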

Writing class object to file using streams

I have this code to serialize/deserialize class objects to file, and it seems to work.
However, I have two questions.
What if, instead of two wstrings (as I have now), I want one wstring and one string member variable in my class? (I think in that case my code won't work?)
Finally, below, in main, when I initialize s2.product_name_= L"megatex"; if instead of megatex I write something in Russian (e.g., s2.product_name_= L"логин"), the code no longer works as intended.
What can be wrong? Thanks.
Here is code:
// ConsoleApplication3.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
#include <iostream>
#include <string>
#include <fstream> // std::ifstream
#include <limits>  // std::numeric_limits
using namespace std;
// product
struct Product
{
    double price_;
    double product_index_;
    wstring product_name_;
    wstring other_data_;
    friend std::wostream& operator<<(std::wostream& os, const Product& p)
    {
        return os << p.price_ << endl
                  << p.product_index_ << endl
                  << p.product_name_ << endl
                  << p.other_data_ << endl;
    }
    friend wistream& operator>>(std::wistream& is, Product& p)
    {
        is >> p.price_ >> p.product_index_;
        is.ignore(std::numeric_limits<streamsize>::max(), '\n');
        getline(is, p.product_name_);
        getline(is, p.other_data_);
        return is;
    }
};
int _tmain(int argc, _TCHAR* argv[])
{
    Product s1, s2;
    s1.price_ = 100;
    s1.product_index_ = 0;
    s1.product_name_ = L"flex";
    s1.other_data_ = L"dat001";
    s2.price_ = 300;
    s2.product_index_ = 2;
    s2.product_name_ = L"megatex";
    s2.other_data_ = L"dat003";
    // write
    wofstream binary_file("c:\\test.dat", ios::out | ios::binary | ios::app);
    binary_file << s1 << s2;
    binary_file.close();
    // read
    wifstream binary_file2("c:\\test.dat");
    Product p;
    while (binary_file2 >> p)
    {
        if (2 == p.product_index_) {
            cout << p.price_ << endl;
            cout << p.product_index_ << endl;
            wcout << p.product_name_ << endl;
            wcout << p.other_data_ << endl;
        }
    }
    if (!binary_file2.eof())
        std::cerr << "error during parsing of input file\n";
    else
        std::cerr << "Ok \n";
    return 0;
}
What if, instead of two wstrings (as I have now), I want one wstring and one string member variable in my class? (I think in that case my code won't work?)
There is an inserter for const char* defined for every basic_ostream (both ostream and wostream), so you can insert the result of calling c_str() on the string member. For example, if the string member is other_data_:
return os << p.price_ << endl
<< p.product_index_ << endl
<< p.product_name_ << endl
<< p.other_data_.c_str() << endl;
The extractor case is more complex, since you'll have to read a wstring and then convert it to a string. The simplest way to do this is to read a wstring and then narrow each character:
wstring temp;
getline(is, temp);
p.other_data_ = string(temp.begin(), temp.end());
I'm not using locales in this sample, just converting a sequence of bytes (8 bits) to a sequence of words (16 bits) for output, and the opposite (truncating values) for input. That is OK if you are using ASCII characters, or single-byte characters and you don't require a specific format (such as Unicode) for the output.
Otherwise, you will need to deal with locales. A locale gives the cultural context needed to interpret the string (remember that a string is just a sequence of bytes, not characters in the sense of letters or symbols; the mapping between the bytes and the symbols they represent is defined by the locale). Locales are not a very easy concept to use (neither is human culture). As you suggest yourself, it would be better to first investigate how they work.
Anyway, the idea is:
Identify the charset used in the string and the charset used in the file (Unicode, e.g. UTF-16).
For output, convert the strings from their original charset to Unicode using a locale.
For input, convert the wstrings read from the file (in Unicode) back to strings using a locale.
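The locale-based steps above can be sketched for the "one wstring and one string member" case. This is a minimal sketch under assumptions the answer does not fix: other_data_ becomes a std::string holding UTF-8, and the conversion uses std::wstring_convert with the std::codecvt_utf8 facet (both deprecated since C++17 but still shipped by the major standard libraries). Utf8Converter is a hypothetical alias.

```cpp
#include <codecvt>
#include <istream>
#include <limits>
#include <locale>
#include <ostream>
#include <sstream>
#include <string>

// Assumption: the narrow member other_data_ holds UTF-8 text.
using Utf8Converter = std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t>;

struct Product
{
    double price_;
    double product_index_;
    std::wstring product_name_;
    std::string other_data_; // UTF-8 encoded

    friend std::wostream& operator<<(std::wostream& os, const Product& p)
    {
        Utf8Converter conv; // from_bytes throws std::range_error on invalid UTF-8
        return os << p.price_ << L'\n'
                  << p.product_index_ << L'\n'
                  << p.product_name_ << L'\n'
                  << conv.from_bytes(p.other_data_) << L'\n'; // UTF-8 -> wide
    }

    friend std::wistream& operator>>(std::wistream& is, Product& p)
    {
        is >> p.price_ >> p.product_index_;
        is.ignore(std::numeric_limits<std::streamsize>::max(), L'\n');
        std::getline(is, p.product_name_);
        std::wstring temp;
        if (std::getline(is, temp))
        {
            Utf8Converter conv;
            p.other_data_ = conv.to_bytes(temp); // wide -> UTF-8
        }
        return is;
    }
};
```

Unlike narrowing each wchar_t, this keeps non-ASCII text intact in both directions, at the cost of fixing the narrow member's encoding to UTF-8.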
Finally, below, in main, when I initialize s2.product_name_= L"megatex"; if instead of megatex I write something in Russian (e.g., s2.product_name_= L"логин"), the code no longer works as intended.
When you define a string with L"", the compiler does create an array of wchar_t, but it must first decode the non-ASCII characters of your source file using the source encoding it assumes. If that assumption doesn't match how the file was saved (for example, a UTF-8 file without a BOM that MSVC reads in the system ANSI code page), the bytes of "логин" end up as several unrelated wide characters, and s2.product_name_ does not hold the Russian letters you intended. ASCII characters work either way because they are encoded identically in ASCII, the common ANSI code pages, and UTF-8. Unicode was poorly supported in the C++ standard until C++11 (and support is still limited).
To put Unicode characters in a string literal regardless of the source encoding, you can use the escape sequence \uXXXX. Doing that for every non-English character is tedious, I know. You can find lists of Unicode characters on many websites, for example on Wikipedia: http://en.wikipedia.org/wiki/List_of_Unicode_characters.
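Independently of how the literal is compiled, the file stream itself can drop non-ASCII text: a wofstream converts each wchar_t to bytes through the codecvt facet of its locale, and with the default "C" locale, characters such as Cyrillic letters typically cannot be converted, which puts the stream into a failed state. The sketch below imbues a UTF-8 facet, assuming UTF-8 is an acceptable encoding for the file; writeUtf8Line and readUtf8Line are hypothetical helper names, and std::codecvt_utf8 is deprecated since C++17 but still available.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

// Writes one line of wide text to `path`, encoded as UTF-8.
bool writeUtf8Line(const std::string& path, const std::wstring& text)
{
    std::wofstream out(path, std::ios::out | std::ios::trunc);
    // The new locale takes ownership of the facet and deletes it later.
    out.imbue(std::locale(out.getloc(), new std::codecvt_utf8<wchar_t>));
    out << text << L'\n';
    out.flush(); // force the wide -> UTF-8 conversion now so errors show up
    return static_cast<bool>(out);
}

// Reads the first line of a UTF-8 encoded file back into a wide string.
std::wstring readUtf8Line(const std::string& path)
{
    std::wifstream in(path);
    in.imbue(std::locale(in.getloc(), new std::codecvt_utf8<wchar_t>));
    std::wstring line;
    std::getline(in, line);
    return line;
}
```

In the question's code, the same imbue call would go right after constructing binary_file and binary_file2, before any reading or writing.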

c++ custom string format using stringstreams

I am trying to use stringstreams to convert a float+int combination into a specific format, and I'd like to know whether there is a better way to handle this.
Currently I'm using the //string String = static_cast<ostringstream*>( &(ostringstream() << Number) )->str(); idiom. How can I get the result stored as a string in the format "1.10(3)"? The precision is equal to decimals. The catch is that none of these values are constants. Even if the solution can't be an inline function or use stringstreams, that's fine as long as it's generic enough. Also note that in the end I plan to use this string as GDI text.
Thanks in advance if anyone can help.
Here is my current sample code(and looking for an alternate efficient way to get this done):
#include <iomanip>
#include <sstream>
#include <string>
using namespace std;

string Convert(float number, int decimals)
{
    std::ostringstream buff;
    buff << setprecision(decimals) << fixed << number;
    return buff.str();
}
float f = 1.1f;   // can be any value, e.g. 1, 1.5 or 1.52
int decimals = 2; // dynamic number, calculated by other means, not fixed
int i = 3;        // also a dynamic number, calculated by other means
string s = Convert(f, decimals) + "(" + Convert(i, 0) + ")"; // output: 1.10(3)
You can use std::fixed (declared in <ios>) and std::setprecision, std::setw and std::setfill (declared in <iomanip>):
float f=1.1;
int decimals=2;
int i=3;
ostringstream ss;
ss << std::fixed << std::setprecision(decimals) << f << '(' << i << ')';
string str = ss.str();
Which outputs:
1.10(3)
You can also configure the stringstream once and keep this configuration:
ostringstream ss;
ss.precision(5);
ss.setf(std::ios::fixed);
EDIT
You can still do this in one line if you really want to:
string str = ((ostringstream&)(ostringstream() << fixed << setprecision(decimals) << f << '(' << i << ')')).str();
If you want an LPCWSTR (const wchar_t*) instead of an LPCSTR (const char*), use wostringstream instead of ostringstream:
ostringstream ss;
string str = ss.str();
LPCSTR c_str = str.c_str();    // valid only while str is alive and unmodified
wostringstream wss;
wstring wstr = wss.str();
LPCWSTR wc_str = wstr.c_str(); // same lifetime rule
If you want an LPCTSTR (LPCSTR, or LPCWSTR if UNICODE is defined), you can use typedefs like these:
typedef std::basic_string<TCHAR> tstring;
typedef std::basic_ostringstream<TCHAR, std::char_traits<TCHAR> > tostringstream;
tostringstream tss;
tstring tstr = tss.str();
LPCTSTR tc_str = tstr.c_str();
TCHAR is char if UNICODE is not defined in your project and wchar_t if UNICODE is defined.
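Putting the pieces together, one way to get a generic helper is to template the formatting on the character type, so the same code yields a std::string or a std::wstring (or, on Windows, a tstring via formatValue<TCHAR>). This is a sketch; formatValue is a hypothetical name and the "value(count)" layout is taken from the question.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Formats "<value with `decimals` digits>(<count>)", e.g. 1.10(3).
// CharT = char for LPCSTR-style text, wchar_t for LPCWSTR.
template <class CharT>
std::basic_string<CharT> formatValue(float value, int decimals, int count)
{
    std::basic_ostringstream<CharT> ss;
    ss << std::fixed << std::setprecision(decimals) << value
       << '(' << count << ')'; // '(' and ')' are widened for wide streams
    return ss.str();
}
```

For GDI, store the returned string in a variable and pass its c_str() while that variable is still alive; calling c_str() on the temporary returned by formatValue would leave a dangling pointer once the full expression ends.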

Store a string of hex into a char?

So I have a string that contains a hex value. For example, my string may be
string a = "0x4D";
Would it be possible to assign 0x4D to a char? Because I know that if I had
char c = 0x4D;
then I could print out its ASCII symbol, which would be M.
Is it possible to store "0x4D" into a char so that I can print out its ASCII symbol? If anyone has any tips, that would be appreciated! If there's a better way to do this, please let me know! Thanks!
You can use strtol to convert the string to a number. You can then print this number or do other things you like with it.
Oh wait, you tagged it C++, and strtol is very much C-style. In C++, you can use a stringstream, and extract a number from it.
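For completeness, the strtol route mentioned here can still be used from C++ with error checking via the end pointer. A minimal sketch; hexToChar is a hypothetical name, and rejecting values outside 0..255 is an assumption made for this example.

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>

// Parses "0x4D" or "4D" into a char; returns false on bad input.
bool hexToChar(const std::string& text, char& out)
{
    errno = 0;
    char* end = nullptr;
    // Base 16 accepts an optional "0x"/"0X" prefix.
    long value = std::strtol(text.c_str(), &end, 16);
    if (end == text.c_str() || *end != '\0' || errno == ERANGE)
        return false; // no digits, trailing junk, or overflow
    if (value < 0 || value > 255)
        return false; // assumption: only single-byte values accepted
    out = static_cast<char>(value);
    return true;
}
```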
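You can use std::stoi to convert the string to an integer. Pass 16 (or 0, to auto-detect the base from the 0x prefix) as the base, because the default base is 10 and parsing would stop at the x:
std::string str = "0x4D";
char c = static_cast<char>(std::stoi(str, nullptr, 16));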
std::cout << c << std::endl;
However, this is not guaranteed to give you the ASCII character for that value. There are various translations between character sets that occur in this simple code alone. For example, the chars in the string literal "0x4D" are initialized with the corresponding value in the implementation-defined execution character set. The printed character is also up to interpretation by the medium that is displaying it.
The best you could do is provide a mapping from ASCII values to characters. You could do this with an array where the index is the ASCII value and the element is the corresponding character.
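Such a mapping might look like the following sketch. Because the table is written as character literals, the compiler stores each entry in the execution character set, so the table translates an ASCII value into whatever char represents that symbol locally. It covers only the printable range 0x20..0x7E (hence the 0x20 offset); kPrintableAscii, asciiToChar and the '?' fallback are hypothetical choices for this example.

```cpp
// Printable ASCII characters 0x20..0x7E, in code order.
const char kPrintableAscii[] =
    " !\"#$%&'()*+,-./0123456789:;<=>?@"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`"
    "abcdefghijklmnopqrstuvwxyz{|}~";

static_assert(sizeof(kPrintableAscii) - 1 == 95, "expected 95 printable ASCII characters");

// Returns the local char for a printable ASCII code, or '?' otherwise.
char asciiToChar(int code)
{
    if (code < 0x20 || code > 0x7E)
        return '?';
    return kPrintableAscii[code - 0x20]; // index = ASCII value - 0x20
}
```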
To use stringstreams as Bas suggests:
int x;
string s = "0x10";
stringstream ss;
ss << hex << s;
ss >> x;
But I think it's a wasteful way to do it.
Here is a solution based on std::stringstream:
std::istringstream iss ("0x4D");
iss.flags(std::ios::hex);
int i;
iss >> i;
std::cout << "[" << (char)i << "]" << std::endl; //--> prints "[M]"

How to cast wchar_t into int for displaying the code point?

I have a simple function in my program that I wrote while messing around with Unicode. In this function, I wanted to display the code value of the character the user entered. It seemed possible; here's my function:
wstring listcode(wchar_t arg) {
wstring str = L"";
str += static_cast<int> (arg); //I tried (int) arg as well
str += L": ";
str += (wchar_t) arg;
return str;
}
As you can see, I just wanted to display the integer value (as for an ASCII character, such as (int)'a'), but something like listcode(L'&') is displayed as &: &!
Is it not possible to find the integer value of a wide character like that?
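In C++, std::wstring's operator+= only appends characters and other strings; there is no conversion from a number to its text representation. Your static_cast<int>(arg) is implicitly converted right back to wchar_t, so the same character is appended again, which is why you see &: &. That's just the way the string type is designed.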
What you do instead is to use string streams:
#include <sstream>
#include <string>

std::wstring listcode(wchar_t arg)
{
    std::wostringstream oss;
    oss << static_cast<int>(arg); // the numeric code, written as text
    oss << L": ";
    oss << arg;                   // the character itself
    return oss.str();
}
In practice, however, when converting to strings in C++, it's better to have functions writing to a stream, than returning a string:
void listcode(std::wostream& os, wchar_t arg) // streams can't be copied: take by reference
{
    os << static_cast<int>(arg);
    os << L": ";
    os << arg;
}
That way, if you want to output something to the console or to a file, you can directly pass std::wcout or a wide file stream (std::wofstream), and if you want a string, you just pass a std::wostringstream.
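A usage sketch of the stream-based version with a std::wostringstream, plus a hypothetical helper (not from the answer) that prints the value in the conventional U+XXXX hex notation, which is often what you want when inspecting code points. Note that on Windows wchar_t holds UTF-16 code units, so characters outside the BMP show up as two surrogate values.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Stream-based variant: writes "<decimal code>: <character>" to os.
void listcode(std::wostream& os, wchar_t arg)
{
    os << static_cast<int>(arg) << L": " << arg;
}

// Hypothetical helper: formats the code unit as U+XXXX (uppercase hex).
std::wstring codePointLabel(wchar_t arg)
{
    std::wostringstream oss;
    oss << L"U+" << std::uppercase << std::hex << std::setw(4)
        << std::setfill(L'0') << static_cast<unsigned long>(arg);
    return oss.str();
}
```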