Consider the following example:
#include <iostream>
#include <clocale>
#include <cstdlib>
#include <string>
int main()
{
    std::setlocale(LC_ALL, "en_US.utf8");
    std::string s = "03A0";
    wchar_t wstr = std::strtoul(s.c_str(), nullptr, 16);
    std::wcout << wstr;
}
This outputs Π on Coliru.
Question
std::strtoul is from <cstdlib>. I'm perfectly fine with using it, but I was wondering if the above example is possible using only the C++ standard library (perhaps stringstreams)?
Note also that there is no 0x prefix on the string indicating hexadecimal.
Sure, std::stoul:
wchar_t wstr = std::stoul(s, nullptr, 16);
The main difference is that it reports errors by throwing exceptions.
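Since the question mentions stringstreams: the same conversion also works with std::istringstream and the std::hex manipulator. A minimal sketch (keeping the setlocale call from the question so the character actually prints):
#include <clocale>
#include <iostream>
#include <sstream>
#include <string>
int main()
{
    std::setlocale(LC_ALL, "en_US.utf8");
    std::string s = "03A0";
    unsigned long value = 0;
    std::istringstream iss(s);
    iss >> std::hex >> value;                    // parses hex without a "0x" prefix
    std::wcout << static_cast<wchar_t>(value);   // prints Π
}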
Related
Here is a little code snippet that reads a line from a UTF-8 file:
#include <iostream>
#include <io.h>
#include <fcntl.h>
#include <locale>
#include <fstream>
#include <codecvt>
int main()
{
    _setmode(_fileno(stdout), _O_U8TEXT);
    auto inputFileStream = std::wifstream("input.txt");
    const auto utf8Locale = std::locale(std::locale(), new std::codecvt_utf8<wchar_t>());
    inputFileStream.imbue(utf8Locale);
    std::wstring line;
    std::getline(inputFileStream, line);
    std::wcout << line << std::endl;
    inputFileStream.close();
    return 0;
}
When I build it with the Visual C++ compiler in Visual Studio, I get the following result:
test τεστ тест
as expected.
But when I use MinGW with the GCC compiler, I get
琀攀猀琀 쐃딃쌃쐃 䈄㔄䄄䈄
As you can see, this is not the expected result.
Is there a simple way to fix the output for GCC so it matches the expected string?
OR
Is there a simple way to use UTF-8 with both MSVC and GCC?
Answer (thanks to Igor Tandetnik and Remy Lebeau):
It seems we must specify the endianness explicitly, because MSVC and GCC have different defaults. So
new std::codecvt_utf8<wchar_t, 0x10ffff, std::little_endian>()
should be used.
Fixed code:
#include <iostream>
#include <io.h>
#include <fcntl.h>
#include <locale>
#include <fstream>
#include <codecvt>
int main()
{
    _setmode(_fileno(stdout), _O_U8TEXT);
    auto inputFileStream = std::wifstream("input.txt");
    const auto utf8Locale = std::locale(std::locale(), new std::codecvt_utf8<wchar_t, 0x10ffff, std::little_endian>());
    inputFileStream.imbue(utf8Locale);
    std::wstring line;
    std::getline(inputFileStream, line);
    std::wcout << line << std::endl;
    inputFileStream.close();
    return 0;
}
For your second question, one option is to limit the use of UTF-16 and the std::w-prefixed types to the cases where you need to exchange UTF-16-encoded strings with the operating system. This happens when you receive arguments in wmain, open a file with _wfopen, call a Windows API function, and so on. Otherwise, you would store, read from the user, and return to the user UTF-8 strings using the char type (char*, std::string, etc.). Conversion between UTF-8 and UTF-16 can be done with MultiByteToWideChar and WideCharToMultiByte, bypassing the clunky C++ encoding API.

The place where this does not work well is console input/output. Overall, you can output UTF-8 to the console if the user sets chcp 65001 and a TTF font. At least on Windows 7, you also have to make sure not to split a character between two write calls, otherwise it will not print correctly (this also implies you cannot use std::cout, because msvcrt will call putc for every byte separately; use puts, fprintf, etc. instead). I heard this was fixed in Windows 10, but cannot confirm it. Reading UTF-8 from the console with the file API does not work as far as I know; if you want that, you need to detect that stdin is attached to a console and use the console API instead.
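To illustrate the MultiByteToWideChar / WideCharToMultiByte route mentioned above, here is a minimal sketch of two helper functions (the helper names are made up, and error handling is omitted; real code should check the API return values for 0):
#include <iostream>
#include <string>
#include <windows.h>

// UTF-8 std::string -> UTF-16 std::wstring
std::wstring Utf8ToUtf16(const std::string& utf8)
{
    if (utf8.empty()) return std::wstring();
    int len = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                  utf8.data(), static_cast<int>(utf8.size()),
                                  nullptr, 0);
    std::wstring utf16(len, L'\0');
    MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                        utf8.data(), static_cast<int>(utf8.size()),
                        &utf16[0], len);
    return utf16;
}

// UTF-16 std::wstring -> UTF-8 std::string
std::string Utf16ToUtf8(const std::wstring& utf16)
{
    if (utf16.empty()) return std::string();
    int len = WideCharToMultiByte(CP_UTF8, 0,
                                  utf16.data(), static_cast<int>(utf16.size()),
                                  nullptr, 0, nullptr, nullptr);
    std::string utf8(len, '\0');
    WideCharToMultiByte(CP_UTF8, 0,
                        utf16.data(), static_cast<int>(utf16.size()),
                        &utf8[0], len, nullptr, nullptr);
    return utf8;
}

int main()
{
    // "τεστ" written out as its UTF-8 bytes to avoid string-literal encoding issues
    std::wstring wide = Utf8ToUtf16("\xCF\x84\xCE\xB5\xCF\x83\xCF\x84");
    std::string back = Utf16ToUtf8(wide);
    std::cout << back.size() << " bytes after the round trip\n";
    return 0;
}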
#include <iostream>
using namespace std;
int main() {
    std::wstring str = L"\u00A2";
    std::wcout << str;
    return 0;
}
Why doesn't this work? And how can I solve this?
It doesn't work because in the default C locale, there is no character which corresponds to U+00A2.
If you're using a standard ubuntu install, it is most likely that your user locale uses a larger character set than US-ASCII, quite possibly Unicode encoded with UTF-8. So you just need to switch to the locale specified in the environment, as follows:
#include <iostream>
/* <clocale> is needed for std::setlocale */
#include <clocale>
#include <string>
int main() {
    /* The following switches to the locale specified
     * by the LC_ALL environment variable.
     */
    std::setlocale(LC_ALL, "");
    std::wstring str = L"\u00A2";
    std::wcout << str;
    return 0;
}
If you use std::string instead of std::wstring and std::cout instead of std::wcout, then you don't need the setlocale because no translation is needed (provided the console expects UTF-8).
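For completeness, a minimal sketch of that char-based variant (assuming the terminal expects UTF-8):
#include <iostream>
#include <string>
int main() {
    std::string str = "\xC2\xA2";   // U+00A2 (cent sign) encoded as UTF-8 bytes
    std::cout << str << std::endl;
    return 0;
}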
I am working with C++ in Eclipse CDT and I am trying to convert a string to uint64_t using strtoull, but every time I get the error message below:
..\src\HelloTest.cpp:39:42: error: strtoull was not declared in this scope
Below is my C++ example
#include <iostream>
#include <cstring>
#include <string>
using namespace std;
int main() {
string str = "1234567";
uint64_t hashing = strtoull(str, 0, 0);
cout << hashing << endl;
}
return 0;
}
Is there anything wrong with what I am doing?
Why your solution doesn't work has already been pointed out by others. But there hasn't been a good alternative suggested yet.
Try this for C++03 strtoull usage instead:
#include <string>
#include <cstdlib>
int main()
{
    std::string str = "1234";
    // Using NULL for second parameter makes the call easier,
    // but reduces your chances to recover from error. Check
    // the docs for details.
    unsigned long long ul = std::strtoull( str.c_str(), NULL, 0 );
}
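If you do want to detect errors with strtoull, checking the end pointer and errno might look like this (a sketch; the variable names are just for illustration):
#include <cerrno>
#include <cstdlib>
#include <iostream>
#include <string>
int main()
{
    std::string str = "1234";
    char* end = NULL;
    errno = 0;
    unsigned long long ul = std::strtoull( str.c_str(), &end, 0 );
    if ( end == str.c_str() )
        std::cout << "no digits could be parsed\n";
    else if ( errno == ERANGE )
        std::cout << "value out of range\n";
    else
        std::cout << ul << "\n";
}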
Or, since C++11, do it directly from std::string via stoull (which is just a wrapper for the above, but saves on one include and one function call in your code):
#include <string>
int main()
{
    std::string str = "1234";
    // See comment above.
    unsigned long long ul = std::stoull( str, nullptr, 0 );
}
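std::stoull reports errors by throwing instead, so the equivalent check is a try/catch; a minimal sketch:
#include <iostream>
#include <stdexcept>
#include <string>
int main()
{
    std::string str = "1234";
    try
    {
        unsigned long long ul = std::stoull( str, nullptr, 0 );
        std::cout << ul << "\n";
    }
    catch ( const std::invalid_argument& )   // no digits at all
    {
        std::cout << "no conversion could be performed\n";
    }
    catch ( const std::out_of_range& )       // value does not fit
    {
        std::cout << "value out of range\n";
    }
}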
Never use char[] or pointers if you have a working alternative. The dark side of C++, they are. Quicker, easier, more seductive. If once you start down the dark path, forever will it dominate your destiny, consume you it will. ;-)
The signature of strtoull is: strtoull(const char *, char **, int)
You have given it a std::string, as pointed out by @juanchopanza.
This is the solution I came up with:
#include <iostream>
#include <cstring>
#include <string>
#include <cstdlib>
using namespace std;
int main() {
    char str[] = "1234567";
    unsigned long long ul;
    char* new_pos;
    ul = strtoull(str, &new_pos, 0);
    cout << ul << endl;
    return 0;
}
The output I got was: 1234567
Straight from the Eclipse console.
Also, at the end of your program the return 0; is out of scope because of an extra curly brace.
#include "stdafx.h"
#include <string>
#include <windows.h>
using namespace std;
int main()
{
    string FilePath = "C:\\Documents and Settings\\whatever";
    CreateDirectory(FilePath, NULL);
    return 0;
}
Error: error C2664: 'CreateDirectory' : cannot convert parameter 1 from 'const char *' to 'LPCTSTR'
How do I make this conversion?
The next step is to set today's date as a string or char and concatenate it with the filepath. Will this change how I do step 1?
I am terrible at data types and conversions; is there a good explanation for five-year-olds out there?
std::string is a class that holds char-based data. To pass std::string data to API functions, you have to use its c_str() method to get a const char* pointer to the string's actual data.
CreateDirectory() takes a TCHAR* as input. If UNICODE is defined, TCHAR maps to wchar_t, otherwise it maps to char instead. If you need to stick with std::string but do not want to make your code UNICODE-aware, then use CreateDirectoryA() instead, eg:
#include "stdafx.h"
#include <string>
#include <windows.h>
int main()
{
    std::string FilePath = "C:\\Documents and Settings\\whatever";
    CreateDirectoryA(FilePath.c_str(), NULL);
    return 0;
}
To make this code TCHAR-aware, you can do this instead:
#include "stdafx.h"
#include <string>
#include <windows.h>
int main()
{
    std::basic_string<TCHAR> FilePath = TEXT("C:\\Documents and Settings\\whatever");
    CreateDirectory(FilePath.c_str(), NULL);
    return 0;
}
However, ANSI-based OS versions are long dead; everything is Unicode nowadays. TCHAR should not be used in new code anymore:
#include "stdafx.h"
#include <string>
#include <windows.h>
int main()
{
    std::wstring FilePath = L"C:\\Documents and Settings\\whatever";
    CreateDirectoryW(FilePath.c_str(), NULL);
    return 0;
}
If you're not building a Unicode executable, calling c_str() on the std::string will result in a const char* (aka non-Unicode LPCTSTR) that you can pass into CreateDirectory().
The code would look like this:
CreateDirectory(FilePath.c_str(), NULL);
Please note that this will result in a compile error if you're trying to build a Unicode executable.
If you have to append to FilePath, I would recommend that you either continue to use std::string or use Microsoft's CString to do the string manipulation, as that's less painful than doing it the C way and juggling raw char*. Personally I would use std::string unless you are already in an MFC application that uses CString.
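As for the follow-up about appending today's date to the path, here is one possible sketch using the wide-string approach (the date format and the localtime_s call are assumptions; adapt them to your needs and toolchain):
#include <windows.h>
#include <ctime>
#include <cwchar>
#include <string>
int main()
{
    std::wstring FilePath = L"C:\\Documents and Settings\\whatever";

    // Format today's date, e.g. L"2015-06-01"
    std::time_t now = std::time(nullptr);
    std::tm local = {};
    localtime_s(&local, &now);              // MSVC-specific; use std::localtime elsewhere
    wchar_t date[16] = {};
    std::wcsftime(date, 16, L"%Y-%m-%d", &local);

    FilePath += L"\\";
    FilePath += date;

    CreateDirectoryW(FilePath.c_str(), NULL);
    return 0;
}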
I'm trying to convert UTF-16-encoded strings to UCS-4.
If I understand correctly, C++11 provides this conversion through codecvt_utf16.
My code is something like:
#include <iostream>
#include <locale>
#include <memory>
#include <codecvt>
#include <string>
using namespace std;
int main()
{
    u16string s;
    s.push_back('h');
    s.push_back('e');
    s.push_back('l');
    s.push_back('l');
    s.push_back('o');
    wstring_convert<codecvt_utf16<wchar_t>, wchar_t> conv;
    wstring ws = conv.from_bytes(reinterpret_cast<const char*>(s.c_str()));
    wcout << ws << endl;
    return 0;
}
Note: the explicit push_backs are there to get around the fact that my version of Clang (Xcode 4.2) doesn't have Unicode string literals.
When the code is run, it terminates with an exception. Am I doing something illegal here? I was thinking it should work because the const char* that I passed to wstring_convert is UTF-16-encoded, right? I have also considered endianness being the issue, but I have checked that it's not the case.
Two errors:
1) The from_bytes() overload that takes a single const char* expects a null-terminated byte string, but your very second byte is '\0'.
2) Your system is likely little-endian, so you need to convert from UTF-16LE to UCS-4:
#include <iostream>
#include <locale>
#include <memory>
#include <codecvt>
#include <string>
using namespace std;
int main()
{
    u16string s;
    s.push_back('h');
    s.push_back('e');
    s.push_back('l');
    s.push_back('l');
    s.push_back('o');
    wstring_convert<codecvt_utf16<wchar_t, 0x10ffff, little_endian>, wchar_t> conv;
    wstring ws = conv.from_bytes(
        reinterpret_cast<const char*>(&s[0]),
        reinterpret_cast<const char*>(&s[0] + s.size()));
    wcout << ws << endl;
    return 0;
}
Tested with Visual Studio 2010 SP1 on Windows and Clang++/libc++-svn on Linux.