I'm trying to read a text file written in utf-16 as utf-8.
#include <fstream>
#include <codecvt>
#include <cassert>

int main(int argc, char** argv)
{
    std::ios_base::sync_with_stdio(false);
    const wchar_t utf16_raw_string[] = L"Привет!";
    const char expected_string[] = u8"Привет!";
    std::ofstream("file.txt").write((char*)utf16_raw_string, sizeof(utf16_raw_string));
    std::ifstream ifs("file.txt");
    ifs.imbue(std::locale(std::locale::empty(), new std::codecvt<wchar_t, char, std::mbstate_t>()));
    std::string got_string;
    ifs >> got_string;
    assert(got_string == expected_string);
    return 0;
}
It seems that imbue has no effect. No matter what the codecvt is, got_string is always "\x1f\x4#\x48\x42\x45\x4B\x4" and the assertion fires. Any ideas?
I'm using Visual Studio 2015 with Update 3.
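For reference, a minimal sketch of the approach usually suggested for this: read the raw UTF-16 bytes and convert them with wstring_convert. It assumes the file is UTF-16LE without a BOM (matching what the snippet above writes) and uses the codecvt_utf16/codecvt_utf8 facets that ship with VS2015 (they were later deprecated in C++17):
#include <codecvt>
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

int main()
{
    // Read the raw bytes of the UTF-16LE file.
    std::ifstream ifs("file.txt", std::ios::binary);
    std::string bytes((std::istreambuf_iterator<char>(ifs)),
                      std::istreambuf_iterator<char>());

    // UTF-16LE bytes -> wide string (no BOM expected).
    std::wstring_convert<std::codecvt_utf16<wchar_t, 0x10ffff, std::little_endian>> u16conv;
    std::wstring wide = u16conv.from_bytes(bytes);

    // Wide string -> UTF-8 bytes.
    std::wstring_convert<std::codecvt_utf8<wchar_t>> u8conv;
    std::string utf8 = u8conv.to_bytes(wide);
    return 0;
}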
I have a problem with CString and STL's set.
It may look a bit strange to use CString and the STL together, but I was curious and tried it.
My code is below:
#include "stdafx.h"
#include <iostream>
#include <set>
#include <atlstr.h>
using namespace std;
int _tmain(int argc, _TCHAR* argv[])
{
wchar_t line[1024] = {0};
FILE * pFile = _wfopen(L"F:\\test.txt", L"rt");
set<CString> cstr_set;
while (fgetws(line, 1024, pFile))
{
CString cstr;
swscanf(line, L"%s\n", cstr);
cstr_set.insert(cstr);
}
fclose(pFile);
cout << "count" << cstr_set.size();
return 0;
}
The contents of the test.txt is:
13245
123
2344
45
After the loop ends, cstr_set contains only one value.
It works as if cstr were a static or const variable.
What is the problem?
A CString is a Microsoft implementation wrapping a character array into a C++ object to allow simpler processing.
But, swscanf is a good old C function that knows nothing about what a CString is: it just expects its arguments to be large enough to accept the decoded values. It should never be directly passed a CString.
The correct way would be:
...
#include <cwchar> // for wcscspn
...
while (fgetws(line, 1024, pFile))
{
    line[wcscspn(line, L"\n")] = 0; // remove an optional end of line
    CString cstr(line);
    cstr_set.insert(cstr);
}
...
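If a scanf-style parse is really wanted, a sketch of one alternative (not the original answer's code) is to scan into a plain wchar_t buffer and build the CString from that:
wchar_t word[1024] = {0};
if (swscanf(line, L"%1023ls", word) == 1) // bounded read into a real wide buffer
    cstr_set.insert(CString(word));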
I'm trying to convert the command line argument (*argv[]) to an integer using the atoi function:
int main(int argc, char *argv[]) {
This is my attempt:
#include <iostream>
#include <sstream>
#include <string>
#include <cstdlib>
#include <conio.h>

using namespace std;

int main(int argc, char *argv[]) {
    int x = 0;
    for (x = 0; x < argc; x++)
    {
        int x = atoi(argv[1]);
        cout << x;
    }
    return 0;
}
However, this prints 0 and I'm unsure why. Thank you.
It's hard to say without seeing the arguments you pass to your program, but there are a few problems here.
Your loop goes from 0 to argc, but inside your loop you always use argv[1]. If you didn't pass any arguments, you're going out of bounds, because argv[0] is always the path to your executable.
atoi is a function from C, and when it fails to parse its argument as an int, it returns 0. Replace it with std::stoi and you will get an exception if the conversion fails. You can catch this exception with try/catch and then check the string that you tried to convert to an int.
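A sketch of that suggestion, skipping argv[0] and converting each remaining argument with std::stoi:
#include <iostream>
#include <string>

int main(int argc, char* argv[]) {
    for (int i = 1; i < argc; i++) { // argv[0] is the program path, so start at 1
        try {
            int value = std::stoi(argv[i]);
            std::cout << value << '\n';
        } catch (const std::exception&) {
            std::cerr << "not a number: " << argv[i] << '\n';
        }
    }
    return 0;
}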
Well, this
#include <iostream>
#include <sstream>
#include <string>
#include <cstdlib>
#include <conio.h>

using namespace std;

int main(int argc, char* argv[]) {
    int x = 0;
    for (x = 0; x < argc; x++)
    {
        cout << argv[x];
    }
    return 0;
}
just prints the path to the .exe; the path is a string, it has no numbers. And as I understood from my "research" about command line arguments, you need to run your program from a command line (a terminal) to populate the argv argument.
Link: https://www.tutorialspoint.com/cprogramming/c_command_line_arguments.htm
Also, as I understood it at least, argv[0] is always the path of the .exe.
I hope I will be of some help. If I am mistaken about something, please tell me where and I will correct myself by editing the answer.
I'm trying to compute a SHA256 hash of the string iEk21fuwZApXlz93750dmW22pw389dPwOkm198sOkJEn37DjqZ32lpRu76xmw288xSQ9
When I run my C++ code, I get a string that's not even a valid SHA256 hash. However, when I run echo -n iEk21fuwZApXlz93750dmW22pw389dPwOkm198sOkJEn37DjqZ32lpRu76xmw288xSQ9 | openssl sha256, I get the correct hash. Here's my C++ code:
#include <iostream>
#include <time.h>
#include <sstream>
#include <string>
#include <iomanip>
#include <typeinfo>
#include <openssl/sha.h>
#include <cstdio>
#include <cstring>

std::string hash256(std::string string) {
    unsigned char digest[SHA256_DIGEST_LENGTH];
    SHA256_CTX ctx;
    SHA256_Init(&ctx);
    SHA256_Update(&ctx, string.c_str(), std::strlen(string.c_str()));
    SHA256_Final(digest, &ctx);
    char mdString[SHA256_DIGEST_LENGTH * 2 + 1];
    for (int i = 0; i < SHA256_DIGEST_LENGTH; i++)
        std::sprintf(&mdString[i * 2], "%02x", (unsigned int)digest[i]);
    return std::string(mdString);
}

int main(int argc, char *argv[])
{
    const char *hash = hash256("iEk21fuwZApXlz93750dmW22pw389dPwOkm198sOkJEn37DjqZ32lpRu76xmw288xSQ9").c_str();
    std::cout << hash << std::endl;
    return 0;
}
Another thing to note: when I run my code in an online compiler, such as Coliru, I get the correct hash. I am compiling with G++ on Cygwin, with OpenSSL 1.0.1g (7 Apr 2014).
As pointed out by @Alan Stokes, you have undefined behavior due to a dangling reference to the internal buffer of the string. Change your declaration of hash in main:
std::string hash = hash256("...");
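The temporary string returned by hash256 is destroyed at the end of the full expression, so the pointer obtained from c_str() dangles. A sketch of the corrected main:
int main(int argc, char *argv[])
{
    // Keep the returned string alive for as long as we use it.
    std::string hash = hash256("iEk21fuwZApXlz93750dmW22pw389dPwOkm198sOkJEn37DjqZ32lpRu76xmw288xSQ9");
    std::cout << hash << std::endl;
    return 0;
}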
I'm trying to implement an application where I would like users to enter accented chars on the command line. What I'm trying to do is to convert the char array into a vector of wstring.
I'm on Linux.
Here is what I got so far:
#include <vector>
#include <string>
#include <cstring>
#include <cstdlib>
#include <iostream>

std::vector<std::wstring> parse_args(int argc, const char* argv[]){
    std::vector<std::wstring> args;
    for(int i = 0; i < argc - 1; ++i){
        auto raw = argv[i+1];
        wchar_t* buf = new wchar_t[1025];
        auto size = mbstowcs(buf, raw, 1024);
        args.push_back(std::wstring(buf, size));
        delete[] buf;
    }
    return std::move(args);
}

int main(int argc, const char* argv[]){
    auto args = parse_args(argc, argv);
    for(auto& arg : args){
        std::wcout << arg << std::endl;
    }
}
It works as expected with normal characters, but does not with accented chars. For instance, if I do:
./a.out Ménage
it crashes:
terminate called after throwing an instance of 'std::length_error'
what(): basic_string::_S_create
[1] 30564 abort ./a.out Ménage
The exception comes from the constructor of wstring, because size == 18446744073709551615 ((size_t)-1, I think), which seems to indicate that there is an unexpected character.
I don't see where it goes wrong. What am I doing wrong?
EDIT: It's getting better
If I add
setlocale(LC_ALL, "");
at the beginning of the program, it no longer crashes, but it outputs a weird character:
M�nage
Could it be a problem with my console now?
The mbstowcs function uses the character encoding from the current locale. You are not setting the locale, so the default "C" locale gets used; the default locale supports ASCII characters only. Also, you should check the return value of mbstowcs, so it won't fail without you knowing it.
To fix this problem, set the locale in your program:
#include <clocale>
...
int main(int argc, const char* argv[]){
    setlocale(LC_ALL, ""); // Use locale from environment
    ....
}
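Since checking the return value of mbstowcs is also mentioned above, here is a sketch of the conversion loop with that check added ((size_t)-1 signals an invalid multibyte sequence for the current locale; parse_args_checked is a hypothetical name, and it still assumes setlocale(LC_ALL, "") was called first):
#include <cstdlib>
#include <string>
#include <vector>

std::vector<std::wstring> parse_args_checked(int argc, const char* argv[]) {
    std::vector<std::wstring> args;
    for (int i = 1; i < argc; ++i) {
        std::vector<wchar_t> buf(1025);
        size_t size = mbstowcs(buf.data(), argv[i], 1024);
        if (size == static_cast<size_t>(-1))
            continue; // invalid sequence for the current locale: skip this argument
        args.push_back(std::wstring(buf.data(), size));
    }
    return args;
}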
I have some strings read from the database, stored in a char* and in UTF-8 format (you know, "á" is encoded as 0xC3 0xA1). But, in order to write them to a file, I first need to convert them to ANSI (can't make the file in UTF-8 format... it's only read as ANSI), so that my "á" doesn't become "á". Yes, I know some data will be lost (chinese characters, and in general anything not in the ANSI code page) but that's exactly what I need.
But the thing is, I need the code to compile in various platforms, so it has to be standard C++ (i.e. no Winapi, only stdlib, stl, crt or any custom library with available source).
Does anyone have any suggestions?
A few days ago, somebody answered that if I had a C++11 compiler, I could try this:
#include <string>
#include <vector>
#include <codecvt>
#include <locale>

using namespace std;

string utf8_to_string(const char *utf8str, const locale& loc)
{
    // UTF-8 to wstring
    wstring_convert<codecvt_utf8<wchar_t>> wconv;
    wstring wstr = wconv.from_bytes(utf8str);
    // wstring to string
    vector<char> buf(wstr.size());
    use_facet<ctype<wchar_t>>(loc).narrow(wstr.data(), wstr.data() + wstr.size(), '?', buf.data());
    return string(buf.data(), buf.size());
}

int main(int argc, char* argv[])
{
    string ansi;
    char utf8txt[] = {0xc3, 0xa1, 0};
    // I guess you want to use Windows-1252 encoding...
    ansi = utf8_to_string(utf8txt, locale(".1252"));
    // Now do something with the string
    return 0;
}
I don't know what happened to that response; apparently someone deleted it. But it turns out that it is the perfect solution. To whoever posted it: thanks a lot, you deserve the AC and the upvote!!
If you mean ASCII, just discard any byte that has bit 7 set; this will remove all multibyte sequences. Note that you could create more advanced algorithms, like removing the accent from the "á", but that would require much more work.
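A minimal sketch of that byte filter (strip_non_ascii is a hypothetical name):
#include <string>

std::string strip_non_ascii(const std::string& utf8) {
    std::string out;
    for (unsigned char c : utf8)
        if (c < 0x80) // keep 7-bit ASCII, drop all bytes of multibyte sequences
            out += static_cast<char>(c);
    return out;
}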
This should work:
#include <string>
#include <codecvt>
#include <locale>

using namespace std::string_literals;

std::string to_utf8(const std::string& str, const std::locale& loc = std::locale{}) {
    using wcvt = std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t>;
    std::u32string wstr(str.size(), U'\0');
    std::use_facet<std::ctype<char32_t>>(loc).widen(str.data(), str.data() + str.size(), &wstr[0]);
    return wcvt{}.to_bytes(wstr.data(), wstr.data() + wstr.size());
}

std::string from_utf8(const std::string& str, const std::locale& loc = std::locale{}) {
    using wcvt = std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t>;
    auto wstr = wcvt{}.from_bytes(str);
    std::string result(wstr.size(), '0');
    std::use_facet<std::ctype<char32_t>>(loc).narrow(wstr.data(), wstr.data() + wstr.size(), '?', &result[0]);
    return result;
}

int main() {
    auto s0 = u8"Blöde C++ Scheiße äöü!!1Elf"s;
    auto s1 = from_utf8(s0);
    auto s2 = to_utf8(s1);
    return 0;
}
For VC++ (whose library at the time failed to link the char32_t codecvt specializations, hence the int32_t workaround):
#include <cstdint>
#include <string>
#include <codecvt>
#include <locale>

using namespace std::string_literals;

std::string to_utf8(const std::string& str, const std::locale& loc = std::locale{}) {
    using wcvt = std::wstring_convert<std::codecvt_utf8<int32_t>, int32_t>;
    std::u32string wstr(str.size(), U'\0');
    std::use_facet<std::ctype<char32_t>>(loc).widen(str.data(), str.data() + str.size(), &wstr[0]);
    return wcvt{}.to_bytes(
        reinterpret_cast<const int32_t*>(wstr.data()),
        reinterpret_cast<const int32_t*>(wstr.data() + wstr.size())
    );
}

std::string from_utf8(const std::string& str, const std::locale& loc = std::locale{}) {
    using wcvt = std::wstring_convert<std::codecvt_utf8<int32_t>, int32_t>;
    auto wstr = wcvt{}.from_bytes(str);
    std::string result(wstr.size(), '0');
    std::use_facet<std::ctype<char32_t>>(loc).narrow(
        reinterpret_cast<const char32_t*>(wstr.data()),
        reinterpret_cast<const char32_t*>(wstr.data() + wstr.size()),
        '?', &result[0]);
    return result;
}

int main() {
    auto s0 = u8"Blöde C++ Scheiße äöü!!1Elf"s;
    auto s1 = from_utf8(s0);
    auto s2 = to_utf8(s1);
    return 0;
}