WideCharToMultiByte doesn't work in Wine - c++

I'm trying to use WideCharToMultiByte in order to convert std::wstring to utf8 std::string. Here is my code:
const std::wstring & utf16("lorem ipsum"); // input
if (utf16.empty()) {
return "";
}
cout << "wstring -> string, input: , size: " << utf16.size() << endl;
for (size_t i = 0; i < utf16.size(); ++i) {
cout << i << ": " << static_cast<int>(utf16[i]) << endl;
}
for (size_t i = 0; i < utf16.size(); ++i) {
wcout << static_cast<wchar_t>(utf16[i]);
}
cout << endl;
std::string res;
int required_size = 0;
if ((required_size = WideCharToMultiByte(
CP_UTF8,
0,
utf16.c_str(),
utf16.size(),
nullptr,
0,
nullptr,
nullptr
)) == 0) {
throw std::invalid_argument("Cannot convert.");
}
cout << "required size: " << required_size << endl;
res.resize(required_size);
if (WideCharToMultiByte(
CP_UTF8,
0,
utf16.c_str(),
utf16.size(),
&res[0],
res.size(),
nullptr,
nullptr
) == 0) {
throw std::invalid_argument("Cannot convert.");
}
cout << "Result: " << res << ", size: " << res.size() << endl;
for (size_t i = 0; i < res.size(); ++i) {
cout << i << ": " << (int)static_cast<uint8_t>(res[i]) << endl;
}
exit(1);
return res;
It runs OK, no exceptions, no error. Only the result is wrong. Here is output from running the code:
wstring -> string, input: , size: 11
0: 108
1: 111
2: 114
3: 101
4: 109
5: 32
6: 105
7: 112
8: 115
9: 117
10: 109
lorem ipsum
required size: 11
Result: lorem , size: 11
0: 108
1: 0
2: 111
3: 0
4: 114
5: 0
6: 101
7: 0
8: 109
9: 0
10: 32
I don't understand why there are null bytes in the result. What am I doing wrong?

Summarizing from comments:
Your code is correct as far as the WideCharToMultiByte logic and arguments go; the only actual problem is the initialization of utf16, which needs to be initialized with a wide literal. The code gives the expected results with both VC++ 2015 RTM and Update 1, so this is a bug in the WideCharToMultiByte emulation layer you're using.
That said, for C++11 onwards, there is a portable solution you should prefer when possible: std::wstring_convert in conjunction with std::codecvt_utf8_utf16 (both are deprecated as of C++17, so expect compiler warnings in newer language modes):
#include <cstddef>
#include <string>
#include <locale>
#include <codecvt>
#include <iostream>
std::string test(std::wstring const& utf16)
{
std::wcout << L"wstring -> string, input: " << utf16 << L", size: " << utf16.size() << L'\n';
for (std::size_t i{}; i != utf16.size(); ++i)
std::wcout << i << L": " << static_cast<int>(utf16[i]) << L'\n';
for (std::size_t i{}; i != utf16.size(); ++i)
std::wcout << utf16[i];
std::wcout << L'\n';
std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> cvt;
std::string res = cvt.to_bytes(utf16);
std::wcout << L"Result: " << res.c_str() << L", size: " << res.size() << L'\n';
for (std::size_t i{}; i != res.size(); ++i)
std::wcout << i << L": " << static_cast<int>(res[i]) << L'\n';
return res;
}
int main()
{
test(L"lorem ipsum");
}
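If neither the Windows API nor the deprecated std::wstring_convert is an option, the conversion can also be written by hand. The following is only a minimal sketch and not the approach from the answer above: utf16_to_utf8 is a hypothetical helper name, and it takes std::u16string instead of std::wstring because wchar_t is only guaranteed to hold UTF-16 code units on Windows.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of the original post): encodes a UTF-16
// string as UTF-8, combining surrogate pairs into a single code point.
std::string utf16_to_utf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: it must be followed by a low surrogate.
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::invalid_argument("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        }

        if (cp < 0x80) {                       // 1 byte: plain ASCII
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For pure ASCII input such as u"lorem ipsum" the result has the same 11 bytes as the input has code units, with no interleaved zero bytes.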

Related

C++ bitshift causing issues to another variable

I'm currently learning C++ and decided, as a first project, to write a WAVE file parser. I have the logic I want for reading the ChunkSize, but for some reason the bit shifting for wav.ChunkSize causes wav.ChunkID to print with 2 extra characters appended at the end.
#include <fstream>
#include <iterator>
#include <vector>
#include <iostream>
#include <bitset>
#include "bwave.h"
#include <cstdint>
using namespace std;
bwave wav;
int main(){
std::ifstream input("cello.wav", std::ios::binary);
std::vector<uint8_t> buffer((std::istreambuf_iterator<char>(input)),(std::istreambuf_iterator<char>()));
for (int i = 0; i < 60 ; ++i)
{
std::cout << "Byte " << i << ": ";
std::cout << buffer[i] <<" ";
std::cout << "\n";
}
wav.ChunkID[0] = buffer[0];
wav.ChunkID[1] = buffer[1];
wav.ChunkID[2] = buffer[2];
wav.ChunkID[3] = buffer[3];
wav.Type[0] = buffer[8];
wav.Type[1] = buffer[9];
wav.Type[2] = buffer[10];
wav.Type[3] = buffer[11];
wav.DataStart[0] = buffer[36];
wav.DataStart[1] = buffer[37];
wav.DataStart[2] = buffer[38];
wav.DataStart[3] = buffer[39];
unsigned char tmp[4];
// ChunkSize
for (int i = 4; i < 8 ; ++i)
{
switch(i){
case 4:
tmp[3] = buffer[i];
std::cout << bitset<8>(tmp[3]) << "\n";
break;
case 5:
tmp[2] = buffer[i];
std::cout << bitset<8>(tmp[2]) << "\n";
break;
case 6:
tmp[1] = buffer[i];
std::cout << bitset<8>(tmp[1]) << "\n";
break;
case 7:
tmp[0] = buffer[i];
std::cout << bitset<8>(tmp[0]) << "\n";
break;
default:
printf("%s\n","Error!" );
break;
}
}
std::cout << bitset<24>(tmp[0] << 24 | tmp[1] << 16 | tmp[2] << 8 | tmp[3] ) << "\n";
wav.ChunkSize = tmp[0] << 24 | tmp[1] << 16 | tmp[2] << 8 | tmp[3];
// Datasize
for (int i = 40; i < 44 ; ++i)
{
switch(i){
case 40:
tmp[3] = buffer[i];
break;
case 41:
tmp[2] = buffer[i];
break;
case 42:
tmp[1] = buffer[i];
break;
case 43:
tmp[0] = buffer[i];
break;
default:
printf("%s\n","Error!" );
break;
}
}
//wav.DataSize = tmp[0] << 24 | tmp[1] << 16 | tmp[2] << 8 | tmp[3];
std::cout << "Header: " << wav.ChunkID << "\n";
std::cout << "Size: " << wav.ChunkSize << "\n";
std::cout << "Type: " << wav.Type << "\n";
std::cout << "Data: " << wav.DataStart << "\n";
std::cout << "Data Size: " << wav.DataSize << "\n";
return 0;
}
Output of the code above:
...
Byte 59:
00101010
01100011
00001100
00000000
000011000110001100101010
Header: RIFF*c
Size: 811818
Type: WAVE
Data: data
Data Size:
Also, I'm open to any suggestions on best practices.
Reference - http://soundfile.sapp.org/doc/WaveFormat/
Thank you for your time,
B
EDIT - The bwave file
struct bwave{
char ChunkID[4];
uint ChunkSize;
char Type[4];
char Format[4];
uint NumChannels;
uint SampleRate;
uint BPS;
char DataStart[4];
char DataSize;
};
Looks like my problem has been solved! I fixed it by making the char array one byte larger than the data it holds. I think this works because printing a char array with std::cout expects a null terminator; without one, the output continues into whatever memory follows the array. Here that memory was ChunkSize, whose bytes 0x2a and 0x63 print as "*c", so the bit shifting itself was not at fault.
The fix in the end was as follows:
char ChunkID[4]; -> char ChunkID[5];
Thank you everyone for the help!
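An alternative to enlarging the arrays is to avoid relying on null termination at all. The following is a minimal sketch, assuming the whole file has already been read into a std::vector<uint8_t> as in the question; readTag and readLe32 are hypothetical helper names, not part of the original code.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: copies a 4-byte tag such as "RIFF" into a std::string.
// std::string stores its own length, so no null terminator is needed.
std::string readTag(const std::vector<std::uint8_t>& buf, std::size_t offset)
{
    if (offset + 4 > buf.size())
        throw std::out_of_range("tag extends past end of buffer");
    return std::string(buf.begin() + offset, buf.begin() + offset + 4);
}

// Hypothetical helper: assembles a little-endian 32-bit value, which is the
// byte order the WAVE header uses for ChunkSize.
std::uint32_t readLe32(const std::vector<std::uint8_t>& buf, std::size_t offset)
{
    if (offset + 4 > buf.size())
        throw std::out_of_range("value extends past end of buffer");
    return  static_cast<std::uint32_t>(buf[offset])
         | (static_cast<std::uint32_t>(buf[offset + 1]) << 8)
         | (static_cast<std::uint32_t>(buf[offset + 2]) << 16)
         | (static_cast<std::uint32_t>(buf[offset + 3]) << 24);
}
```

With the bytes shown in the question's output (2a 63 0c 00 at offsets 4 to 7), readLe32(buffer, 4) yields 811818, and readTag(buffer, 0) yields "RIFF" without the trailing characters.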

creating a c++ program that displays hexadecimal-formatted data from a bmp file

I'm trying to create a program that displays output of a bmp file in the form of hexadecimal. So far I get the output, but I need it to be organized a certain way.
The output needs the offset within the BMP file in the left column, followed by 16 bytes of data in hex on each row, in the order they appear in the file, with an extra space after every 8 bytes. So far I have the hexadecimal showing up; I just need help organizing it.
What I have:
What I'm trying to make it look like:
Here is my code:
#include <iostream> // cout
#include <fstream> // ifstream
#include <iomanip> // setfill, setw
#include <stdlib.h>
using namespace std; // Use this to avoid repeated "std::cout", etc.
int main(int argc, char *argv[]) // argv[1] is the first command-line argument
{
// Open the provided file for reading of binary data
ifstream is("C:\\Users\\Test\\Documents\\SmallTest.bmp", ifstream::binary);
if (is) // if file was opened correctly . . .
{
is.seekg(0, is.end); // Move to the end of the file
int length = is.tellg(); // Find the current position, which is file length
is.seekg(0, is.beg); // Move to the beginning of the file
char * buffer = new char[length]; // Explicit allocation of memory.
cout << "Reading " << length << " characters... ";
is.read(buffer, length); // read data as a block or group (not individually)
if (is)
cout << "all characters read successfully.\n";
else
cout << "error: only " << is.gcount() << " could be read.\n";
is.close();
// Now buffer contains the entire file. The buffer can be printed as if it
// is a _string_, but by definition that kind of print will stop at the first
// occurrence of a zero character, which is the string-ending mark.
cout << "buffer is:\n" << buffer << "\n"; // Print buffer
for (int i = 0; i < 100; i++) // upper range limit is typically length
{
cout << setfill('0') << setw(4) << hex << i << " ";
cout << setfill('0') << setw(2) << hex << (0xff & (int)buffer[i]) << " ";
}
delete[] buffer; // Explicit freeing or de-allocation of memory.
}
else // There was some error opening file. Show message.
{
cout << "\n\n\tUnable to open file " << argv[1] << "\n";
}
return 0;
}
You could do it something like this:
#include <iostream>
#include <iomanip>
#include <fstream>
#include <vector>
#include <cctype>
#include <cstddef> // size_t
#include <cstdint> // uint8_t, uint32_t
std::ostream& fullLine(std::ostream& out, const std::vector<uint8_t>& v, size_t offset)
{
//save stream state so we can restore it after all the hex/setw/setfill nonsense.
std::ios oldState(0);
oldState.copyfmt(out);
out << std::hex << std::setfill('0') << std::setw(8) << offset << " ";
for (size_t i = 0; i < 16; ++i)
{
if (i == 8) out << " ";
out << std::hex << std::setfill('0') << std::setw(2) << static_cast<uint32_t>(v[i + offset]) << " ";
}
out << " ";
//restore stream state to print normal text
out.copyfmt(oldState);
for (size_t i = 0; i < 16; ++i)
{
out << (std::isprint(v[i + offset]) ? static_cast<char>(v[i + offset]) : '.');
}
out << "\n";
return out;
}
int main()
{
std::vector<uint8_t> data;
std::ifstream f("test.txt", std::ios::binary);
if (f)
{
f.seekg(0, f.end);
data.resize(static_cast<size_t>(f.tellg()));
f.seekg(0, f.beg);
f.read((char*)data.data(), data.size());
const size_t numFullLines = data.size() / 16;
const size_t lastLineLength = data.size() % 16;
for (size_t i = 0; i < numFullLines; ++i)
{
if (!fullLine(std::cout, data, i * 16))
{
std::cerr << "Error during output!\n";
return -1;
}
}
}
return 0;
}
There's probably a fancy way to do it, but I usually go for brute force when I'm looking for particular output using iostreams.
How to handle the partial last line is up to you. :)
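One way to handle the partial last line is simply to stop printing bytes once the data runs out. The following is a minimal sketch under that assumption; hexDump is a hypothetical name, it returns the text instead of writing to a stream, and it leaves out the printable-character column to stay short.

```cpp
#include <cstddef>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: formats data as rows of an 8-digit hex offset followed
// by up to 16 bytes, with an extra space after the 8th byte of each row.
std::string hexDump(const std::vector<std::uint8_t>& data)
{
    std::ostringstream out;
    out << std::hex << std::setfill('0');   // both settings persist
    for (std::size_t offset = 0; offset < data.size(); offset += 16) {
        out << std::setw(8) << offset;      // setw applies to this field only
        for (std::size_t i = 0; i < 16; ++i) {
            if (offset + i >= data.size()) break;  // partial last row
            if (i == 8) out << ' ';                // gap between the halves
            out << ' ' << std::setw(2)
                << static_cast<unsigned>(data[offset + i]);
        }
        out << '\n';
    }
    return out.str();
}
```

For 18 bytes with values 0 to 17 this produces one full row and a second row that holds only the offset 00000010 and the bytes 10 and 11.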
Use the % operator to break the line after every 16th count:
cout << hex;
for(int i = 0; i < 100; i++)
{
if(i && (i % 16) == 0)
cout << "\n";
cout << setfill('0') << setw(2) << (buffer[i] & 0xFF) << " ";
}
I need it to be organized a certain way.
In another answer, I submitted this form of dumpByteHex()... perhaps it can help you achieve what you want. (see also https://stackoverflow.com/a/46083427/2785528)
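// C++ support function
#include <cassert>
#include <cctype>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>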
std::string dumpByteHex (char* startAddr, // reinterpret_cast explicitly
size_t len, // allows to char* from T*
std::string label = "",
int indent = 0)
{
std::stringstream ss;
if(len == 0) {
std::cerr << "\n dumpByteHex() err: data length is 0? " << std::endl << std::dec;
assert(len != 0);
}
// Output description
ss << label << std::flush;
unsigned char* kar = reinterpret_cast<unsigned char*>(startAddr); // signed to unsigned
std::string echo; // holds input chars until eoln
size_t indx;
bool wSpaceAdded = false;
for (indx = 0; indx < len; indx++)
{
if((indx % 16) == 0)
{
if(indx != 0) // echo is empty the first time through for loop
{
ss << " " << echo << std::endl;
echo.erase();
}
// fields are typically < 8 bytes, so skip when small
if(len > 7) {
if (indent) { ss << std::setw(indent) << " "; }
ss << std::setfill('0') << std::setw(4) << std::hex
<< indx << " " << std::flush;
} // normally show index
}
// hex code
ss << " " << std::setfill('0') << std::setw(2) << std::hex
<< static_cast<int>(kar[indx]) << std::flush;
if((indx % 16) == 7) { ss << " "; wSpaceAdded = true; } // white space for readability
// defer the echo-of-input, capture to echo
if (std::isprint(kar[indx])) { echo += kar[indx]; }
else { echo += '.'; }
}
// finish last line when < 17 characters
if (((indx % 16) != 0) && wSpaceAdded) { ss << " "; indx++; } // when white space added
while ((indx % 16) != 0) { ss << " "; indx++; } // finish line
// the last echo
ss << " " << echo << '\n';
return ss.str();
} // std::string dumpByteHex()
Output format:
0000 11 22 33 44 55 66 00 00 00 00 77 88 99 aa ."3DUf....w...

Substring to divide text line by number of elements in the loop in C++ Producer Consumer pattern

I am writing code that reads the lines of a document and splits a line into roughly equal parts if it is longer than 100 characters. To split, I am using string.substr(i, i + adding + addCount). If I have to split a line into three parts, the first and third parts are OK, but the second part contains not only its own words but also those of the third part. It looks something like this:
linesize: 331
divider3
0 Output I 110
1 EXPRESSION: Mrs. Bennet and her daughters then departed, and Elizabeth returned instantly to Jane, leaving her own and her
0 I
110I 110 0
was here OST
110 Output I 220
2 (error) EXPRESSION: relations’ behaviour to the remarks of the two ladies and Mr. Darcy; the latter of whom, however, could not be prevailed on to join in their censure of her, in spite of all Miss Bingley’s witticisms on fine eyes
110 I
220I 110 0
was here OST
220 Output I 416
3 EXPRESSION: be prevailed on to join in their censure of her, in spite of all Miss Bingley’s witticisms on fine eyes.
220 I
416I 110 86
was here OST
#include <iostream>
#include <deque>
#include <mutex>
#include <thread>
#include <condition_variable>
#include <future>
#include <map>
#include <fstream>
#include <vector>
using namespace std;
atomic<bool> isReady{false};
mutex mtx;
condition_variable condvar;
map<string, int> mapper;
string line;
vector<string> block;
size_t line_index = 0;
int block_size = 100;
int limit_chars = 100;
int c = 0;
deque<vector<string>> dq;
void Producer() {
std::cout << " Producer " << std::endl;
fstream fl("/home/ostap/CLionProjects/WordsCount2/file.txt"); //full path to the file
if (!fl.is_open()) {
cout << "error reading from file" << endl;
}
else {
cout << "SUCCESS!!!" << endl;
while (getline(fl, line) && line_index < block_size) {
if (line.find_first_not_of(' ') != string::npos) { // Checks whether it is a non-space.
// There's a non-space.
cout<< "linesize: " << line.length() << endl;
if (line.length() / limit_chars > 1.4) {
int divider = (int) (line.length() / limit_chars);
int adding = (int) line.length()/divider;
// try changing everything to a while loop driven by addCount
int addCount = 0;
int i = 0;
cout <<"divider" << divider<<endl;
while ( i < line.length()){
while (line[i + adding + addCount] != ' ') {addCount+=1;}
cout << i << " Output I " << i + adding + addCount << endl;
cout << "EXPRESSION: " << line.substr(i, i + adding + addCount) << endl; //to del
block.push_back(line.substr(i, i + adding + addCount));
cout << i << " I" << endl;
i = i + adding + addCount;
cout << i << "I" <<" " <<adding <<" "<< addCount <<endl;
++line_index;
addCount = 0;
cout << "was here OST" << endl;
}
}
else {
++line_index;;
block.push_back(line);
cout << "Line: " << line << endl;
}
if (line_index >= block_size) {
c++;
cout << c << endl;
{
lock_guard<mutex> guard(mtx);
//cout << "Producing message: " << x << " th" << endl;
dq.push_back(block);
}
line_index = 0;
block.clear();
}
condvar.notify_one();
}
cout << "Producer completed" << endl;
isReady = true;
// for (unsigned i = 0; i < block.size(); ++i) cout << ' ' << block[i];
// cout << '\n';
//this_thread::sleep_for(chrono::seconds(1));
}
}
}
void Consumer() {
while (true) {
unique_lock<mutex> lk(mtx);
if (!dq.empty()) {
vector<string> & i = dq.front();
dq.pop_front();
lk.unlock();
cout << "Consuming: " << i.data() << " th" << endl;
} else {
if(isReady){
break;
}
else {
condvar.wait(lk);
cout << "There are no messages remained from producer" << endl;
}
}
cout << "\nConsumer is done" << endl;
}
}
int main() {
//cout << "Hello, World!" << endl;
auto t1 = async(launch::async, Producer);
auto t2 = async(launch::async, Consumer);
//auto t3 = async(launch::async, Consumer);
t1.get();
t2.get();
//t3.get();
return -1;
}
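A likely cause of the overlapping second part is that std::string::substr takes a start position and a length, not a start and an end index. With i = 110, the call line.substr(i, i + adding + addCount) asks for 220 characters starting at position 110, which runs to the end of the line. The inner while loop can also index past the end of the string while searching for a space. The following is a minimal sketch of one way to split at spaces; splitNearLength is a hypothetical name and the target length is passed in by the caller.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits line into parts of at least `target` characters,
// cutting at the first space at or after the target position.
std::vector<std::string> splitNearLength(const std::string& line,
                                         std::size_t target)
{
    std::vector<std::string> parts;
    std::size_t start = 0;
    while (start < line.size()) {
        std::size_t end = start + target;
        if (end >= line.size()) {
            end = line.size();                 // last part: take the rest
        } else {
            end = line.find(' ', end);         // bounded search, no overrun
            if (end == std::string::npos) end = line.size();
        }
        // substr(pos, count): the second argument is a length.
        parts.push_back(line.substr(start, end - start));
        start = end;
        while (start < line.size() && line[start] == ' ') ++start;
    }
    return parts;
}
```

In the producer, each returned part could then be pushed into block in place of the manual index arithmetic.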

Why does printing a char in hex show ffffff89?

I have a char[] named buffer; the data is read into it from an ifstream in binary mode:
void File::mostrarBuffer(){
for (int a = 0; a < std::strlen(buffer); a++){
std::cout << std::hex << ((int)buffer[a]) << std::endl;
}
// to look at charTest, only for testing
std::cout << "===" << std::endl;
for (int a = 0; a < std::strlen(buffer); a++){
std::cout << buffer[a] << std::endl;
}
char charTest = '\211';
std::cout << "===" << std::endl;
std::cout << std::hex << (int)charTest << std::endl;
std::cout << std::hex << (int)buffer[0] << std::endl;
}
The shell output:
ffffff89
50
4e
47
===
\211
P
N
G
===
ffffff89
ffffff89
the file in hexdump ("little-endian"):
0000000 5089 474e 0a0d 0a1a 0000 0d00 4849 5244
My question is why ffffff89 appears instead of 89, and only for the first element of the char[]. I have been going around in circles with this and cannot find the solution. Thanks for reading.
this solution works for me:
std::cout << std::hex << ((unsigned int)(unsigned char)buffer[a])
<< std::endl;
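Because your chars are signed and the highest bit of 0x89 is set, the value is negative and is sign-extended when it is converted to int, which gives ffffff89.
I'm sorry, I'm not familiar with using std::hex, but you somehow need to treat it like an unsigned char value. Try casting the char to an unsigned type.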
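The double cast from the working solution can be wrapped in a small function. This is a minimal sketch; byteToHex is a hypothetical name, and the two-digit zero padding is an addition that the original code does not have.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: formats one char as two hex digits.
// Casting to unsigned char first keeps the value in the range 0..255, so the
// widening to unsigned int cannot sign-extend it.
std::string byteToHex(char c)
{
    std::ostringstream out;
    out << std::hex << std::setfill('0') << std::setw(2)
        << static_cast<unsigned>(static_cast<unsigned char>(c));
    return out.str();
}
```

Note also that std::strlen stops at the first zero byte, so for binary data the loop bound should come from the number of bytes actually read rather than from strlen.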

Three questions regarding command line arguments / char*

My post is organized in three sections:
1. My code
2. Example input and output
3. My three questions
MY CODE:
#include <iostream>
#include <cstdlib>
#include <cmath>
#include <iomanip>
#include <fstream>
#include <sstream>
#include <string>
#include <cstring>
using namespace std;
void deleteTrash(char*, char*);
const int kStr = 2;
const int kStrLen = 3;
int main(int argc, char* argv[])
{
if (argc < 4) {
cout << "Incorrect argument given." << endl;
cout << "Try again." << endl;
return 0;
}
cout << "PRINT argv[2]" << endl;
cout << "-----" << endl;
for (int i = 0; i < sizeof(argv[2]); i++) {
cout << "Iterator: " << i << endl;
cout << argv[2][i] << endl;
}
char* inputString;
deleteTrash(argv[kStr], inputString);
cout << "PRINT inputString" << endl;
cout << "-----" << endl;
for (int i = 0; i < sizeof(inputString); i++) {
cout << i << endl;
cout << inputString[i] << endl;
}
int strLen;
stringstream num;
num << argv[kStrLen];
num >> strLen;
if ( num.fail() ) {
cout << "Incorrect argument given." << endl;
cout << "Try again." << endl;
return 0;
}
if ( strLen < sizeof(inputString) ) {
cout << "Incorrect argument given." << endl;
cout << "Try again." << endl;
return 0;
}
return 0;
}
void deleteTrash(char* tempString, char* inputString)
{
int tempStringLen = sizeof(tempString);
int newSize = 0;
while (tempString[newSize] != '\0')
newSize++;
char newString[newSize + 1];
int iterator = 0;
while (tempString[iterator] != '\0') {
newString[iterator] = tempString[iterator];
iterator++;
}
newString[newSize] = '\0';
cout << "PRINT newString" << endl;
cout << "-----" << endl;
for (int i = 0; i < sizeof(newString); i++) {
cout << newString[i] << endl;
}
inputString = newString;
cout << "PRINT inputString" << endl;
cout << "-----" << endl;
for (int i = 0; i < sizeof(inputString); i++) {
cout << "Iterator: " << i << endl;
cout << inputString[i] << endl;
}
return;
}
EXAMPLE INPUT:
./hw1q5 4 W# 3
OUTPUT:
PRINT argv[2]
-----
Iterator: 0
W
Iterator: 1
#
Iterator: 2
Iterator: 3
3
Iterator: 4
Iterator: 5
T
Iterator: 6
E
Iterator: 7
R
PRINT newString
-----
W
#
PRINT inputString
-----
Iterator: 0
W
Iterator: 1
#
Iterator: 2
Iterator: 3
Iterator: 4
Iterator: 5
Iterator: 6
Iterator: 7
PRINT inputString
-----
0
Segmentation fault: 11
MY QUESTIONS:
1. Why does argv[2] contain more than 3 elements (W, #, and \0)? It prints out 8 elements (the print statement iterates 0 - 7), which, after W, #, \0, are garbage. Should it not be printing only the 3 elements (W, #, and \0)? Why is this happening? How may I fix it?
2. Why is it that when I assign newString (a char array) to inputString (type char*), by doing inputString = newString, inputString iterates 8 times in the print statement, printing blanks after printing W, #, and \0?
3. Why is the seg fault happening in the third print statement?
sizeof does not return the length of a null-terminated character string; applied to a char* it returns the size of the pointer itself. Instead you need something like strlen().
Let's just take one of the problems here:
// Wrong!
int main(int argc, char* argv[])
...
for (int i = 0; i < sizeof(argv[2]); i++) {
cout << "Iterator: " << i << endl;
cout << argv[2][i] << endl;
}
// Better
int main(int argc, char* argv[])
...
for (int i = 0; i < argc; i++) {
cout << "argv[" << i << "]: " << argv[i] << endl;
}
argv[] is an array of one or more "C" strings.
argc tells you how many strings are in the array.
You want to iterate through the strings in the array (argv[i]), not the characters in the string (for example, "argv[0][0]").
... AND ...
"sizeof(argv)" just gives you the size of a pointer (4 bytes on a 32-bit system, 8 bytes on a 64-bit one, which is why your loops ran 8 times). It does NOT give you the number of elements in the array. That's what "argc" is for.
In answer to your first question, I'm going to refer you to another SO article: What does int argc, char *argv[] mean?
The first command line argument is always the command itself. So in your example:
./hw1q5 4 W# 3
There are four command line arguments: hw1q5, 4, W#, and 3.
In regards to your other questions, and the remainder of the first question, the majority of your problems stem from the assumption that sizeof(char*) returns the length of a null terminated string, which it does not (as has been pointed out both in comments and an earlier answer).
A good reference for understanding sizeof can be found here: http://en.cppreference.com/w/cpp/language/sizeof, or as I suspect this is a homework assignment based on your compiled program name, your C++ textbook.
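To make the points above concrete, here is a minimal sketch with two hypothetical helpers, argLength and copyArg. It assumes the goal of deleteTrash was simply to obtain a copy of the argument. In the question, deleteTrash receives inputString by value and points its local copy at a local array, so the inputString in main is never set; reading through that uninitialized pointer appears to be what triggers the segmentation fault. Returning a std::string by value avoids both problems.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: number of characters before the terminating '\0'.
// This is what the loops in the question should use as their bound;
// sizeof(argv[2]) is only the size of a char* pointer.
std::size_t argLength(const char* arg)
{
    return std::strlen(arg);
}

// Hypothetical replacement for deleteTrash: returns an owned copy, so the
// caller never holds a pointer to a local array that no longer exists.
std::string copyArg(const char* arg)
{
    return std::string(arg);
}
```

In main this could be used as std::string inputString = copyArg(argv[kStr]); followed by a loop up to inputString.size().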