Character conversion using iconv without the Unicode Byte Order Mark

How do I make iconv do conversions to UTF-* encodings without adding a byte order mark?
The problem is shown by this code, compiled with G++ 4.8 under 64-bit Linux:
#include <iostream>
#include <iconv.h>
#include <cstdint>
#include <iomanip>
using namespace std;
int main() {
const wchar_t *ws = L"Hello, world.\n";
for(size_t ii = 0; ii < 15; ii++) {
cout << setw(2) << hex << uint32_t(ws[ii]) << " ";
}
cout << endl;
iconv_t conv = iconv_open("UTF-16", "UTF-32");
char *outbuf = new char[14 * 4];
char *inptr = const_cast<char*>(reinterpret_cast<const char *>(ws));
char *outptr = outbuf;
size_t in_len = 14 * 4;
size_t out_len = 14 * 4;
size_t result = iconv(conv, &inptr, &in_len, &outptr, &out_len);
uint16_t *encoded = reinterpret_cast<uint16_t*>(outbuf);
for(size_t ii = 0; ii < 15; ii++) {
cout << setw(2) << hex << encoded[ii] << " ";
}
cout << endl;
}
This outputs:
48 65 6c 6c 6f 2c 20 77 6f 72 6c 64 2e a 0
feff 48 65 6c 6c 6f 2c 20 77 6f 72 6c 64 2e a
clearly showing the BOM at the start of the resulting string where it is not present at the

I found that this works for me:
iconv_t conv = iconv_open("UTF16LE", "UTF32");
Though I believe it's implementation-dependent.
There is a similar question: Using iconv to convert from UTF-16BE to UTF-8 without BOM
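A hedged sketch of why the explicit name helps: with the generic name UTF-16, the converter has to pick a byte order itself and announces its choice with a BOM (as the output in the question shows). Names that spell out the byte order, such as UTF-16LE or UTF-16BE, leave nothing to announce, so no BOM is written. The helper below is only an illustration. The function name utf32_to_utf16le is made up, and it assumes the encoding names UTF-16LE and UTF-32LE (accepted by glibc and GNU libiconv) and the glibc signature of iconv, which takes a char** input pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <iconv.h>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: converts UTF-32 code points to UTF-16LE bytes.
// Naming the byte order explicitly is what avoids the BOM.
std::string utf32_to_utf16le(const std::u32string& text)
{
    // Serialize the code points as little-endian bytes, so the input
    // matches "UTF-32LE" regardless of the host's byte order.
    std::string in;
    for (char32_t c : text) {
        for (int shift = 0; shift < 32; shift += 8)
            in.push_back(static_cast<char>((c >> shift) & 0xFF));
    }

    iconv_t conv = iconv_open("UTF-16LE", "UTF-32LE");
    if (conv == reinterpret_cast<iconv_t>(-1))
        throw std::runtime_error("iconv_open failed");

    // UTF-16 never needs more than 4 bytes per code point, so the output
    // cannot be larger than the UTF-32 input (plus a little slack).
    std::vector<char> out(in.size() + 4);
    char* inptr = &in[0];
    char* outptr = out.data();
    std::size_t in_left = in.size();
    std::size_t out_left = out.size();

    std::size_t rc = iconv(conv, &inptr, &in_left, &outptr, &out_left);
    iconv_close(conv);
    if (rc == static_cast<std::size_t>(-1))
        throw std::runtime_error("iconv failed");

    assert(in_left == 0);  // all input was consumed
    return std::string(out.data(), out.size() - out_left);
}
```

Under these assumptions, utf32_to_utf16le(U"Hi") yields the four bytes 48 00 69 00, with no feff in front.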

Related

How can I perform a hex memory dump on an address in memory?

I am trying to write a C++ program for my Computer Machine Organization class in which I perform a memory dump in hex on some address stored in memory. I don't really understand what a memory dump is, and am pretty new to writing C++. My questions are:
How can I create a method that takes two arguments in which they specify address in memory?
How can I further modify those arguments to specify a word address that is exactly 4 bytes long?
How can I then convert those addresses into hex values?
I know that this is a lot, but thank you for any suggestions.
For anyone who needs it, here is my code so far:
#include <stdio.h>
// Create something to do the methods on
char array[3] = {'a', 'b', 'c'};
void mdump(char start, char end){
// Create pointers to get the address of the starting and ending characters
char* pointer1 = (char *)& start;
char* pointer2 = (char *)& end;
// Check to see if starting pointer is in lower memory than ending pointer
if(pointer1 < pointer2){
printf("Passed");
}
else{
printf("Failed");
}
// Modify both the arguments so that each of them are exactly 4 bytes
// Create a header for the dump
// Iterate through the addresses, from start pointer to end pointer, and produce lines of hex values
// Declare a struct to format the values
// Add code that creates printable ASCII characters for each memory location (print "cntrl-xx" for values 0-31, or map them into a blank)
// Print the values in decimal and in ASCII form
}
int main(){
mdump(array[0], array[2]);
return 0;
}

How to write a Hex dump tool while learning C++:
Start with something simple:
#include <iostream>
int main()
{
char test[32] = "My sample data";
// output character
std::cout << test[0] << '\n';
}
Output:
M
Print the hex-value instead of the character:
#include <iostream>
int main()
{
char test[32] = "My sample data";
// output a character as hex-code
std::cout << std::hex << test[0] << '\n'; // Uh oh -> still a character
std::cout << std::hex << (unsigned)(unsigned char)test[0] << '\n';
}
Output:
M
4d
Note:
The stream output operator for char is intended to print a character (of course). There is another stream output operator for unsigned which fits better. To select it, the char has to be converted to unsigned.
But be prepared: The C++ standard doesn't mandate whether char is signed or unsigned; this decision is left to the compiler vendor. Where char is signed, converting a negative value such as '\xef' directly to unsigned sign-extends it, and it typically prints as ffffffef instead of ef. To be on the safe side, the char is first converted to unsigned char and then to unsigned.
Print the address of the variable with the character:
#include <iostream>
int main()
{
char test[32] = "My sample data";
// output an address
std::cout << &test[0] << '\n'; // Uh oh -> wrong output stream operator
std::cout << (const void*)&test[0] << '\n';
}
Output:
My sample data
0x7ffd3baf9b70
Note:
There is one stream output operator for const char* which is intended to print a (zero-terminated) string. This is not what is intended. Hence, the (ugly) trick with the cast to const void* is necessary which triggers another stream output operator which fits better.
What if the data is not a 2 digit hex?
#include <iomanip>
#include <iostream>
int main()
{
// output character as 2 digit hex-code
std::cout << (unsigned)(unsigned char)'\x6' << '\n'; // Uh oh -> output not with two digits
std::cout << std::hex << std::setw(2) << std::setfill('0')
<< (unsigned)(unsigned char)'\x6' << '\n';
}
Output:
6
06
Note:
There are I/O manipulators which can be used to modify the formatting of (some) stream output operators.
Now, put it all together (in loops) et voilà: a hex-dump.
#include <iomanip>
#include <iostream>
int main()
{
char test[32] = "My sample data";
// output an address
std::cout << (const void*)&test[0] << ':';
// output the contents
for (char c : test) {
std::cout << ' '
<< std::hex << std::setw(2) << std::setfill('0')
<< (unsigned)(unsigned char)c;
}
std::cout << '\n';
}
Output:
0x7ffd345d9820: 4d 79 20 73 61 6d 70 6c 65 20 64 61 74 61 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Make it nice:
#include <algorithm>
#include <iomanip>
#include <iostream>
int main()
{
char test[32] = "My sample data";
// hex dump
const size_t len = sizeof test;
for (size_t i = 0; i < len; i += 16) {
// output an address
std::cout << (const void*)&test[i] << ':';
// output the contents
for (size_t j = 0, n = std::min<size_t>(len - i, 16); j < n; ++j) {
std::cout << ' '
<< std::hex << std::setw(2) << std::setfill('0')
<< (unsigned)(unsigned char)test[i + j];
}
std::cout << '\n';
}
}
Output:
0x7fffd341f2b0: 4d 79 20 73 61 6d 70 6c 65 20 64 61 74 61 00 00
0x7fffd341f2c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Make it a function:
#include <algorithm>
#include <iomanip>
#include <iostream>
void hexdump(const char* data, size_t len)
{
// hex dump
for (size_t i = 0; i < len; i += 16) {
// output an address
std::cout << (const void*)&data[i] << ':';
// output the contents
for (size_t j = 0, n = std::min<size_t>(len - i, 16); j < n; ++j) {
std::cout << ' '
<< std::hex << std::setw(2) << std::setfill('0')
<< (unsigned)(unsigned char)data[i + j];
}
std::cout << '\n';
}
}
int main()
{
char test[32] = "My sample data";
std::cout << "dump test:\n";
hexdump(test, sizeof test);
std::cout << "dump 4 bytes of test:\n";
hexdump(test, 4);
std::cout << "dump an int:\n";
int n = 123;
hexdump((const char*)&n, sizeof n);
}
Output:
dump test:
0x7ffe900f4ea0: 4d 79 20 73 61 6d 70 6c 65 20 64 61 74 61 00 00
0x7ffe900f4eb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
dump 4 bytes of test:
0x7ffe900f4ea0: 4d 79 20 73
dump an int:
0x7ffe900f4e9c: 7b 00 00 00
Note:
(const char*)&n may look a bit adventurous. In general, pointer conversions are something to avoid wherever possible. However, for the dump tool this is the easiest way to access the bytes of arbitrary data. (This is one of the rare cases which is explicitly allowed by the standard.)
An even nicer hexdump can be found in
SO: How would I create a hex dump utility in C++?
(which I recommended OP beforehand).
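Two items from the question's to-do list are not covered by the steps above: rounding an address down to a 4-byte word boundary, and printing an ASCII column next to the hex values. A minimal sketch of both follows. The names align_down4 and dump_line are made up for illustration, and non-printable bytes are shown as '.', which is just one possible choice (the question mentions mapping them to a blank instead).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Rounds an address down to the previous multiple of 4 (a word boundary)
// by clearing the two lowest bits.
std::uintptr_t align_down4(std::uintptr_t address)
{
    return address & ~static_cast<std::uintptr_t>(3);
}

// Formats up to 16 bytes as two-digit hex values followed by an ASCII
// column, for example "4d 79 |My|".
std::string dump_line(const unsigned char* data, std::size_t len)
{
    assert(len <= 16);
    std::ostringstream os;
    os << std::hex << std::setfill('0');
    for (std::size_t i = 0; i < len; ++i)
        os << std::setw(2) << static_cast<unsigned>(data[i]) << ' ';
    os << '|';
    for (std::size_t i = 0; i < len; ++i)
        os << (std::isprint(data[i]) ? static_cast<char>(data[i]) : '.');
    os << '|';
    return os.str();
}
```

To align a real pointer, convert it with reinterpret_cast<std::uintptr_t>(ptr) first. Whether the bytes below the original address may be read at all depends on what the pointer points into, so this is best applied only inside a buffer you own.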

Qt C++ macOS problem. I am searching for words in multiset with function .find("a word") it works on windows but not on mac

I have written some code that loads files containing a list of words (one word per line). Each word is added to a multiset. Later I search the multiset with multiset.find("aWord"), looking for the word and for substrings of the word.
This code works fine if I compile it with Qt on a Windows system.
But it does not work if I compile it with Qt on my Mac!
My goal is to make it work from Qt on my Mac.
I am working on a MacBook Air (13", early 2018) with a
macOS Mojave version 10.14.4 installation
Build version 18E226
local 18.5.0 Darwin Kernel Version 18.5.0: Mon Mar 11 20:40:32 PDT
2019; root:xnu-4903.251.3~3/RELEASE_X86_64 x86_64
Using a Qt installation:
QTKit:
Version: 7.7.3
Obtained from: Apple
Last Modified: 13/04/2019 12.11
Kind: Intel
64-Bit (Intel): Yes
Get Info String: QTKit 7.7.3, Copyright 2003-2012, Apple Inc.
Location: /System/Library/Frameworks/QTKit.framework
Private: No
And xcode installation:
Xcode 10.2
Build version 10E125
I have tried to print out, in hex format:
every string that I am searching for
and every string I should find in the multiset
and concluded that some of the bytes do not match, even though I think my whole system runs UTF-8 and the file is also UTF-8 encoded.
Dictionary.h
#ifndef DICTIONARY_H
#define DICTIONARY_H
#include <iostream>
#include <vector>
#include <set>
class Dictionary
{
public:
Dictionary();
void SearchForAllPossibleWordsIn(std::string searchString);
private:
std::multiset<std::string, std::less<std::string>> mDictionary;
void Initialize(std::string folder);
void InitializeLanguage(std::string folder, std::string languageFileName);
};
#endif // DICTIONARY_H
Dictionary.cpp
#include "Dictionary.h"
#include <vector>
#include <set>
#include <iostream>
#include <fstream>
#include <exception>
Dictionary::Dictionary()
{
Initialize("../Lektion10Projekt15-1/");
}
void Dictionary::Initialize(std::string folder)
{
InitializeLanguage(folder,"da-utf8.wl");
}
void Dictionary::InitializeLanguage(std::string folder, std::string languageFileName)
{
std::ifstream ifs;
ifs.open(folder+languageFileName,std::ios_base::in);
if (ifs.fail()) {
std::cerr <<"Error! Class: Dictionary. Function: InitializeLanguage(...). return: ifs.fail to load file '" + languageFileName + "'" << std::endl;
}else {
std::string word;
while (!ifs.eof()) {
std::getline(ifs,word);
mDictionary.insert(word);
}
}
ifs.close();
}
void Dictionary::SearchForAllPossibleWordsIn(std::string searchString)
{
std::vector<std::string> result;
for (unsigned int a = 0 ; a <= searchString.length(); ++a) {
for (unsigned int b = 1; b <= searchString.length()-a; ++b) {
std::string substring = searchString.substr(a,b);
if (mDictionary.find(substring) != mDictionary.end())
{
result.push_back(substring);
}
}
}
if (!result.empty()) {
for (unsigned int i = 0; i < result.size() ;++i) {
std::cout << result[i] << std::endl;
}
}
}
main.cpp
#include <iostream>
#include "Dictionary.h"
int main()
{
Dictionary myDictionary;
myDictionary.SearchForAllPossibleWordsIn("byggearbejderen");
return 0;
}
I have tried to change the following line in main.cpp
myDictionary.SearchForAllPossibleWordsIn("byggearbejderen");
to (note: the first word in the word list is byggearbejderen)
std::ifstream ifs;
ifs.open("../Lektion10Projekt15-1/da-utf8.wl",std::ios::in);
if (ifs.fail()) {
std::cerr <<"Error!" << std::endl;
}else {
std::getline(ifs,searchword);
}
ifs.close();
myDictionary.SearchForAllPossibleWordsIn(searchword);
I then added some printouts to main.cpp that show the expected string and substrings as hex values:
std::cout << " cout as hex test:" << std::endl;
myDictionary.SearchForAllPossibleWordsIn(searchword);
std::cout << "Suposet search resul for ''bygearbejderen''" << std::endl;
for (char const elt: "byggearbejderen")
std::cout << std::hex << std::setw(2) << std::setfill('0') << static_cast<int>(elt) << " ";
std::cout << "byggearbejderen" << std::endl;
for (char const elt: "arbejderen")
std::cout << std::hex << std::setw(2) << std::setfill('0') << static_cast<int>(elt) << " ";
std::cout << "arbejderen" << std::endl;
for (char const elt: "ren")
std::cout << std::hex << std::setw(2) << std::setfill('0') << static_cast<int>(elt) << " ";
std::cout << "ren" << std::endl;
for (char const elt: "en")
std::cout << std::hex << std::setw(2) << std::setfill('0') << static_cast<int>(elt) << " ";
std::cout << "en" << std::endl;
for (char const elt: "n")
std::cout << std::hex << std::setw(2) << std::setfill('0') << static_cast<int>(elt) << " ";
std::cout << "n" << std::endl;
I also added the same printout to the result output in Dictionary.cpp:
std::cout << "result of seartchword as hex" << std::endl;
if (!result.empty()) {
for (unsigned int i = 0; i < result.size() ;++i)
{
for (char const elt: result[i] )
{
std::cout << std::hex << std::setw(2) << std::setfill('0') << static_cast<int>(elt) << " ";
}
std::cout << result[i] << std::endl;
}
}
which gave the following output:
result of seartchword as hex
ffffffef ffffffbb ffffffbf 62 79 67 67 65 61 72 62 65 6a 64 65 72 65 6e 0d byggearbejderen
61 72 62 65 6a 64 65 72 65 6e 0d arbejderen
72 65 6e 0d ren
65 6e 0d en
6e 0d n
Suposet search resul for ''bygearbejderen''
62 79 67 67 65 61 72 62 65 6a 64 65 72 65 6e 00 byggearbejderen
61 72 62 65 6a 64 65 72 65 6e 00 arbejderen
72 65 6e 00 ren
65 6e 00 en
6e 00 n
where I notice that some values are different.
I don't know why this is the case on macOS but not on Windows. I do not know whether there are encoding settings in my environment that I need to change or set correctly.
I would like my main.cpp to look like this:
#include <iostream>
#include "Dictionary.h"
int main()
{
Dictionary myDictionary;
myDictionary.SearchForAllPossibleWordsIn("byggearbejderen");
return 0;
}
resulting in the following output:
byggearbejderen
arbejderen
ren
en
n

Line endings for text files are different on Windows than they are on a Mac. Windows uses both the CR and LF characters (ASCII codes 13 and 10, respectively). Classic Mac OS used the CR character alone; macOS and Linux use just the LF. If you create a text file on Windows, then copy it to your Mac, the line endings might not be handled correctly.
If you look at the last character in your output, you'll see it is a 0d, which would be the CR character. I don't know how you generated that output, but it is possible that the getline on the Mac is treating that as a normal character, and including it in the string that has been read in.
The simplest solution is to either process that text file beforehand to get the line endings correct, or strip the CR off the end of the words after they are read in.
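A minimal sketch of the second option follows. Besides the trailing 0d, the hex output in the question also starts with ef bb bf (printed as ffffffef ffffffbb ffffffbf because of sign extension), which is the UTF-8 byte order mark, so the sketch strips that as well. The names clean_line and load_words are made up for illustration. Unlike the question's code, the loop tests the result of std::getline directly instead of calling eof().

```cpp
#include <cassert>
#include <istream>
#include <set>
#include <sstream>
#include <string>

// Removes a trailing CR (left over from a CRLF line ending) and a leading
// UTF-8 byte order mark (EF BB BF), if present.
std::string clean_line(std::string line)
{
    if (!line.empty() && line.back() == '\r')
        line.pop_back();
    if (line.compare(0, 3, "\xEF\xBB\xBF") == 0)
        line.erase(0, 3);
    return line;
}

// Reads one word per line and skips lines that are empty after cleaning.
std::multiset<std::string> load_words(std::istream& in)
{
    std::multiset<std::string> words;
    std::string line;
    while (std::getline(in, line)) {
        line = clean_line(line);
        if (!line.empty())
            words.insert(line);
    }
    return words;
}

// Usage example: the first line carries a BOM, both lines end in CRLF.
void load_words_example()
{
    std::istringstream file("\xEF\xBB\xBF" "byggearbejderen\r\nren\r\n");
    std::multiset<std::string> words = load_words(file);
    assert(words.count("byggearbejderen") == 1);
    assert(words.count("ren") == 1);
}
```

In the Dictionary class, load_words could be called with the std::ifstream, since an ifstream is also a std::istream.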

Send std::vector<float> over TCP, from boost::asio to QTcpSocket

The server needs to send a std::vector<float> to a Qt application over a TCP socket. I am using Qt 5.7.
On the server side, using boost::asio:
std::vector<float> message_ = {1.2, 8.5};
asio::async_write(socket_, asio::buffer<float>(message_),
[this, self](std::error_code ec, std::size_t)
This works and I manage to get it back on my client using boost::asio's read_some(). As both Qt and asio have their own event manager, I want to avoid using asio in my Qt app.
So on the client side I have (which does not work):
client.h:
#define FLOATSIZE 4
QTcpSocket *m_socket;
QDataStream m_in;
QString *m_string;
QByteArray m_buff;
client.cpp (constructor):
m_in.setDevice(m_socket);
m_in.setFloatingPointPrecision(QDataStream::SinglePrecision);
// m_in.setByteOrder(QDataStream::LittleEndian);
client.cpp (read function, which is connected via QObject::connect(m_socket, &QIODevice::readyRead, this, &mywidget::ask2read); ):
uint availbytes = m_socket->bytesAvailable(); // which is 8, so that seems good
while (availbytes >= FLOATSIZE)
{
nbytes = m_in.readRawData(m_buff.data(), FLOATSIZE);
bool conv_ok = false;
const float f = m_buff.toFloat(&conv_ok);
availbytes = m_socket->bytesAvailable();
m_buff.clear();
}
The m_buff.toFloat() call returns 0.0 which is a fail according to the Qt doc. I have tried to change the float precision, little or big endian, but I can not manage to get my std::vector<float> back. Any hints?
Edit: everything runs on the same PC/compiler.
Edit: see my answer for a solution and sehe's for more detail on what is going on

I managed to resolve the issue, by editing the Qt side (client), to read the socket:
uint availbytes = m_socket->bytesAvailable();
while (availbytes >= 4)
{
char buffer[FLOATSIZE];
nbytes = m_in.readRawData(buffer, FLOATSIZE);
float f = bytes2float(buffer);
availbytes = m_socket->bytesAvailable();
}
I use those two conversion functions, bytes2float and bytes2int:
float bytes2float(char* buffer)
{
union {
float f;
uchar b[4];
} u;
u.b[3] = buffer[3];
u.b[2] = buffer[2];
u.b[1] = buffer[1];
u.b[0] = buffer[0];
return u.f;
}
and:
int bytes2int(char* buffer)
{
int a = int((unsigned char)(buffer[3]) << 24 |
(unsigned char)(buffer[2]) << 16 |
(unsigned char)(buffer[1]) << 8 |
(unsigned char)(buffer[0]));
return a;
}
I also found that function to display bytes, which is useful to see what is going on behind the scene (from https://stackoverflow.com/a/16063757/7272199):
template <typename T>
void print_bytes(const T& input, std::ostream& os = std::cout)
{
const unsigned char* p = reinterpret_cast<const unsigned char*>(&input);
os << std::hex << std::showbase;
os << "[";
for (unsigned int i=0; i<sizeof(T); ++i)
os << static_cast<int>(*(p++)) << " ";
os << "]" << std::endl;
}
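Two hedged remarks on the code above. QByteArray::toFloat() parses a textual number such as "1.2", which would explain why it reports failure on raw binary bytes. Reading a union member other than the one last written is not guaranteed to work in C++, although many compilers tolerate it; std::memcpy is the well-defined way to reinterpret bytes. The sketch below is a possible replacement for bytes2float. The name float_from_le_bytes is made up, and it assumes that the sender is a little-endian machine with 32-bit IEEE floats, as in the byte dump 9a 99 99 3f for 1.2.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Rebuilds a float from 4 bytes that were sent by a little-endian machine.
float float_from_le_bytes(const unsigned char* bytes)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch expects a 32-bit float");
    // Assemble the 32-bit pattern explicitly, so the result does not
    // depend on the byte order of the receiving machine.
    std::uint32_t bits = static_cast<std::uint32_t>(bytes[0])
                       | static_cast<std::uint32_t>(bytes[1]) << 8
                       | static_cast<std::uint32_t>(bytes[2]) << 16
                       | static_cast<std::uint32_t>(bytes[3]) << 24;
    float value;
    std::memcpy(&value, &bits, sizeof value);  // well-defined reinterpretation
    return value;
}

// Usage example with the bytes of { 1.2f, 8.5f } from a little-endian sender.
void float_from_le_bytes_example()
{
    const unsigned char wire[] = { 0x9a, 0x99, 0x99, 0x3f,
                                   0x00, 0x00, 0x08, 0x41 };
    assert(float_from_le_bytes(wire) == 1.2f);
    assert(float_from_le_bytes(wire + 4) == 8.5f);
}
```

On the Qt side, the char buffer filled by readRawData() can be passed in as reinterpret_cast<const unsigned char*>(buffer).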

Re. your answer: Which side is this on? Also, are your platforms not the same (OS/architecture?). I had assumed from the question that both processes run on the same PC and compiler etc.
For one thing, you can see that ASIO does not do anything related to endianness.
#include <boost/asio.hpp>
#include <iostream>
#include <iomanip>
namespace asio = boost::asio;
#include <vector>
void print_bytes(unsigned char const* b, unsigned char const* e)
{
std::cout << std::hex << std::setfill('0') << "[ ";
while (b!=e)
std::cout << std::setw(2) << static_cast<int>(*b++) << " ";
std::cout << "]\n";
}
template <typename T> void print_bytes(const T& input) {
using namespace std;
print_bytes(reinterpret_cast<unsigned char const*>(std::addressof(*begin(input))),
reinterpret_cast<unsigned char const*>(std::addressof(*end(input))));
}
int main() {
float const fs[] { 1.2, 8.5 };
std::cout << "fs: "; print_bytes(fs);
{
std::vector<float> gs(2);
asio::buffer_copy(asio::buffer(gs), asio::buffer(fs));
for (auto g : gs) std::cout << g << " "; std::cout << "\n";
std::cout << "gs: "; print_bytes(gs);
}
{
std::vector<char> binary(2*sizeof(float));
asio::buffer_copy(asio::buffer(binary), asio::buffer(fs));
std::cout << "binary: "; print_bytes(binary);
std::vector<float> gs(2);
asio::buffer_copy(asio::buffer(gs), asio::buffer(binary));
for (auto g : gs) std::cout << g << " "; std::cout << "\n";
std::cout << "gs: "; print_bytes(gs);
}
}
Prints
fs: [ 9a 99 99 3f 00 00 08 41 ]
1.2 8.5
gs: [ 9a 99 99 3f 00 00 08 41 ]
binary: [ 9a 99 99 3f 00 00 08 41 ]
1.2 8.5
gs: [ 9a 99 99 3f 00 00 08 41 ]
Theory
I suspect the Qt side ruins things. Since the naming of the function readRawData certainly implies a lack of endianness awareness, I'd guess your system's endianness wreaks havoc (https://stackoverflow.com/a/2945192/85371, also the comment).
Suggestion
In that case, consider using Boost Endian.

I think it's a bad idea to use high level send method server side (you try to send a c++ vector) and low level client side.
I'm quite sure there is an endianness problem somewhere.
Anyway try to do this client side:
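char buffer[FLOATSIZE];
bytes = m_in.readRawData(buffer, FLOATSIZE);
if (bytes != FLOATSIZE)
return ERROR;
// swap the bit pattern to host order, then reinterpret it (a plain cast would convert the integer value)
uint32_t raw;
memcpy(&raw, buffer, sizeof raw);
raw = ntohl(raw);
float f;
memcpy(&f, &raw, sizeof f);
This only works if the sender puts the floats into network byte order first. asio::buffer sends the bytes exactly as they are laid out in memory, so with both ends on the same machine no swap is needed.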

Construct a string who contain the hexValue of a binary array

I'm trying to construct a string from a byte array (libcrypto++), in order to connect to SQS in C++, but I have issues with '0' digits.
The result is almost correct, except that some '0' digits are missing and extra zeros appear at the end of the string.
std::string shaDigest(const std::string &key = "") {
byte out[64] = {0};
CryptoPP::SHA256().CalculateDigest(out, reinterpret_cast<const byte*>(key.c_str()), key.size());
std::stringstream ss;
std::string rep;
for (int i = 0; i < 64; i++) {
ss << std::hex << static_cast<int>(out[i]);
}
ss >> rep;
rep.erase(rep.begin()+64, rep.end());
return rep;
}
output:
correct : c46268185ea2227958f810a84dce4ade54abc4f42a03153ef720150a40e2e07b
mine : c46268185ea2227958f810a84dce4ade54abc4f42a3153ef72015a40e2e07b00
^ ^
Edit: I'm trying to do the same thing that hashlib.sha256('').hexdigest() does in Python.

If that indeed works, here's the solution with my suggestions incorporated.
std::string shaDigest(const std::string &key = "") {
// SHA-256 produces 32 bytes, which is 64 hex characters
std::array<byte, CryptoPP::SHA256::DIGESTSIZE> out {};
CryptoPP::SHA256().CalculateDigest(out.data(), reinterpret_cast<const byte*>(key.c_str()), key.size());
std::stringstream ss;
ss << std::hex << std::setfill('0');
for (byte b : out) {
ss << std::setw(2) << static_cast<int>(b);
}
// No `.substr(0,64)` is needed here:
// the hex ASCII form of a 32-byte digest always has 64 characters
return ss.str();
}

You convert the bytes to hexadecimal correctly, and it works as long as the byte value is greater than 15. Below that, the first hex digit is a 0, which is not printed by default. The two missing 0s come from 0x03 -> 3 and 0x0a -> a.
You should use:
for (int i = 0; i < 64; i++) {
ss << std::hex << std::setw(2) << std::setfill('0') << static_cast<int>(out[i]);
}

You need to set the field width so that numbers with fewer than two hexadecimal digits are zero-padded. Note that the width has to be set again before every number that is inserted into the stream, because it is reset after each insertion.
Example:
#include <iostream>
#include <iomanip>
int main() {
std::cout << std::hex << std::setfill('0');
for (int i=0; i<0x11; i++)
std::cout << std::setw(2) << i << "\n";
}
Output:
$ g++ test.cc && ./a.out
00
01
02
03
04
05
06
07
08
09
0a
0b
0c
0d
0e
0f
10
For reference:
http://en.cppreference.com/w/cpp/io/manip/setw
http://en.cppreference.com/w/cpp/io/manip/setfill
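Applied to a byte array, the same idea can be sketched as a small helper. The name to_hex is made up for illustration. std::hex and std::setfill stay in effect once set, while std::setw has to be repeated for every byte.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Returns the bytes as lowercase hex, always two digits per byte.
std::string to_hex(const unsigned char* data, std::size_t len)
{
    std::ostringstream os;
    os << std::hex << std::setfill('0');  // both stay set for the stream
    for (std::size_t i = 0; i < len; ++i)
        os << std::setw(2) << static_cast<unsigned>(data[i]);  // width applies once
    return os.str();
}

// Usage example with four bytes taken from the digest in the question.
void to_hex_example()
{
    const unsigned char bytes[] = { 0x2a, 0x03, 0x15, 0x3e };
    assert(to_hex(bytes, sizeof bytes) == "2a03153e");
}
```

Without std::setw(2) the same four bytes would come out as 2a3153e, which is the kind of missing zero seen in the question.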

segmentation fault on getline in Ubuntu

I'm having the famous segmentation fault. I've tracked it down to a single line in the code (getline). Here's someone with a similar issue, also on Ubuntu:
http://www.daniweb.com/software-development/cpp/threads/329191
Note that getline returns -1 after the segmentation fault, but it couldn't have been really the end of the stream (in my case).
When the stream is smaller, everything goes ok. As we can deduce from the output, the segmentation fault is on line 98.
1 /*
2 * File: RequestDispatcher.cpp
3 * Author: albert
4 *
5 * Created on July 8, 2011, 7:15 PM
6 */
7
8 #include "iostream"
9 #include "fstream"
10 #include "stdlib.h"
11 #include "stdio.h"
12 #include "cstring"
13 #include "algorithm"
14
15 #include "RequestDispatcher.h"
16 #include "Functions.h"
17
18 #define PROXIES 1
19
20 RequestDispatcher::RequestDispatcher()
21 {
22 }
23
24 RequestDispatcher::RequestDispatcher(const RequestDispatcher& orig)
25 {
26 }
27
28 RequestDispatcher::~RequestDispatcher()
29 {
30 }
31
32 int RequestDispatcher::addRequest(string host, string request, IResponseReceiver* response_receiver)
33 {
34 RequestInfo info;
35 info.request_index = request_info.size();
36 info.host = host;
37 info.request = request;
38 info.response_receiver = response_receiver;
39 request_info.push_back(info);
40 return info.request_index;
41 }
42
43 void RequestDispatcher::run()
44 {
45 if (request_info.size()==0)
46 {
47 return;
48 }
49 FILE* pipe[PROXIES];
50 int per_proxy = (request_info.size() + PROXIES - 1) / PROXIES;
51 int count_pipes = (request_info.size() + per_proxy - 1) / per_proxy;
52 for (int pipe_index=0; pipe_index<count_pipes; ++pipe_index)
53 {
54 int from = pipe_index * per_proxy;
55 int to = min(from + per_proxy, int(request_info.size()));
56 cout << "FROM: "<< from << "; TO: " << to;
57 const char* cmd = generateCmd(from, to);
58 pipe[pipe_index] = popen(cmd, "r");
59 if (!pipe[pipe_index])
60 {
61 cerr << "Error executing command in RequestDispatcher::run()";
62 }
63 }
64 string result[PROXIES];
65 bool finished[PROXIES];
66 for (int pipe_index=0; pipe_index<count_pipes; pipe_index++)
67 {
68 finished[pipe_index] = false;
69 }
70 int count_finished = 0;
71 char* buffer;
72 size_t buffer_length=1024;
73 buffer = (char *) malloc (buffer_length + 1);
74 while (count_finished < count_pipes)
75 {
76 cout << "D\n";
77 fflush(stdout);
78 for(int pipe_index=0; pipe_index<count_pipes; ++pipe_index)
79 {
80 cout << "E\n";
81 fflush(stdout);
82 if (finished[pipe_index])
83 {
84 continue;
85 }
86 cout << "Getline" << buffer_length << "\n";
87 ssize_t bytes_read = getline(&buffer, &buffer_length, pipe[pipe_index]);
88 cout << "Getline Done ("<<bytes_read<< "," << buffer_length << ")\n";
89 fflush(stdout);
90 while (bytes_read>0)
91 {
92 for (int i=0; i<bytes_read; i++)
93 {
94 result[pipe_index] += buffer[i];
95 }
96 cout << "P\n";
97 fflush(stdout);
98 bytes_read = getline(&buffer, &buffer_length, pipe[pipe_index]);
99 cout << "Bytes read ("<<bytes_read<<","<< buffer_length << ")\n";
100 fflush(stdout);
101
102 }
103 if (bytes_read == -1) // then finished this pipe
104 {
105 string* r = &result[pipe_index];
106 //cout << *r;
107 finished[pipe_index] = true;
108 ++count_finished;
109 cout << "HI\n";
110 fflush(stdout);
111 // delete trailing '\0' from result
112 pclose(pipe[pipe_index]);
113 result[pipe_index] = result[pipe_index].substr(0, result[pipe_index].length()-1);
114 int pos = r->find("RESPONSE_DATA");
115 int valuepos, endvaluepos;
116 int request_index, length;
117 string headers;
118 int headerslength;
119 string body;
120 int bodypos, bodylength;
121 while (pos!=r->npos)
122 {
123 valuepos = r->find("REQUEST_INDEX=", pos) + 14;
124 endvaluepos = r->find("\n", valuepos);
125 request_index = pipe_index * per_proxy + atoi(r->substr(valuepos, endvaluepos-valuepos).c_str());
126
127 cout << "REQUEST_INDEX " << request_index;
128
129 valuepos = r->find("LENGTH=", pos) + 7;
130 endvaluepos = r->find("\n", valuepos);
131 length = atoi(r->substr(valuepos, endvaluepos-valuepos).c_str());
132
133 pos = r->find("START", pos)+5;
134 bodypos = r->find("\r\n\r\n", pos)+4;
135 headerslength = bodypos-pos-4;
136 bodylength = length-headerslength-4;
137 headers = r->substr(pos, headerslength);
138 body = r->substr(bodypos, bodylength);
139 request_info[request_index].response_receiver->notifyResponse(headers, body, request_index);
140
141 pos=r->find("RESPONSE_DATA", pos+length);
142 }
143 }
144 }
145 }
146 cout << "\n?\n";
147 fflush(stdout);
148 free(buffer);
149 request_info.clear();
150 }
151
152 const char* RequestDispatcher::generateCmd(int first_request, int to_request)
153 {
154 string r("/home/albert/apachebench-standalone-read-only/ab -a");
155 for (int i=first_request; i<to_request; i++)
156 {
157 r.append(" '");
158 r.append(request_info.at(i).request);
159 r.append("'");
160 }
161 ofstream out("/home/albert/apachebench-standalone-read-only/debug");
162 if(! out)
163 {
164 cerr<<"Cannot open output file\n";
165 return "";
166 }
167 out << r.c_str();
168 out.close();
169 return "/home/albert/apachebench-standalone-read-only/debug";
170 /*int size = strlen("/home/albert/apachebench-standalone-read-only/ab -a");
171 for (int i=first_request; i<to_request; i++)
172 {
173 size += 2+strlen(request_info.at(i).request)+1;
174 cout << "len: " << strlen(request_info.at(i).request) << "\n";
175 cout << "total: " << size << "\n";
176 }
177 size += 1;
178 char* cmd = new char[size];
179 strcpy(cmd, "/home/albert/apachebench-standalone-read-only/ab -a");
180 for (int i=first_request; i<to_request; i++)
181 {
182 cout << "LEN: " << strlen(cmd) << "\n";
183 cout << "NEXT: " << strlen(request_info.at(i).request) << "\n";
184 fflush(stdout);
185 strcat(cmd, " '");
186 strcat(cmd, request_info.at(i).request);
187 strcat(cmd, "'");
188 }
189 cout << "LEN: " << strlen(cmd) << "\n";
190 fflush(stdout);
191 return cmd;*/
192 }
When I run /home/albert/apachebench-standalone-read-only/debug from the command line everything works perfectly fine. It returns binary data.
The end of the output is:
P
Bytes read (272,6828)
P
Bytes read (42,6828)
P
Bytes read (464,6828)
P
Bytes read (195,6828)
P
Bytes read (355,6828)
P
Bytes read (69,6828)
P
Bytes read (111,6828)
P
Segmentation fault
Bytes read (368,6828)
P
Bytes read (-1,6828)
HI
REQUEST_INDEX 46REQUEST_INDEX 48REQUEST_INDEX 44REQUEST_INDEX 0REQUEST_INDEX 45
?
Mind the "?" for exiting the loop. After this, the program is finished.
By the way, I always thought the program would terminate on a segmentation fault (edit: I did not do anything to catch it).
In reply to some answers: There seem to be different versions of getline and I seem to be using the one documented here:
http://www.kernel.org/doc/man-pages/online/pages/man3/getline.3.html

After some thought, I believe the issue is that your buffer is being written to while you're reading it. In some cases the buffer has not been completely written and you remove some of the data from it, which could mean that you read an empty buffer because the write isn't done. This is because you are using popen and simply piping data from another process. I would recommend that you use the C++ standard getline (although both are somewhat unsafe) and that you allow some leeway when reading data from the pipe. Retry logic might be what you need, as I can't think of a clean way to solve this. If anyone knows one, please post it; I'm posting this because I believe it to be the likely culprit.
Also, if you're coding in C++, I highly recommend that you use the C++ libraries so that you're not constantly mixing or casting between types (such as string to char *, which just saves you some hassle), and that you use the safer versions of functions so you avoid errors such as buffer overflows.
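Since the program in the question keeps running after "Segmentation fault" is printed, one possibility (not confirmed anywhere in this thread) is that the message refers to the command started by popen rather than to the program itself. A way to check is to look at the wait status that pclose returns. The sketch below reads a pipe with POSIX getline and reports that status. The names read_command_output and child_crashed are made up for illustration, and the exit-code test relies on the common shell convention of reporting 128 plus the signal number.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>

// Reads everything the command writes to its standard output.
// wait_status receives the value returned by pclose(), or -1 on failure.
std::string read_command_output(const char* command, int& wait_status)
{
    assert(command != nullptr);
    std::string output;
    wait_status = -1;
    FILE* pipe = popen(command, "r");
    if (!pipe)
        return output;

    char* line = nullptr;  // getline allocates and grows this buffer
    size_t capacity = 0;
    ssize_t n;
    while ((n = getline(&line, &capacity, pipe)) != -1)
        output.append(line, static_cast<size_t>(n));  // keeps embedded '\0' bytes
    free(line);

    wait_status = pclose(pipe);
    return output;
}

// True if the command was killed by a signal, or if the shell reported
// such a death through an exit code above 128 (a convention, not a rule).
bool child_crashed(int wait_status)
{
    if (wait_status == -1)
        return false;
    return WIFSIGNALED(wait_status)
        || (WIFEXITED(wait_status) && WEXITSTATUS(wait_status) > 128);
}
```

If child_crashed returns true for the ab command, the crash happened in that process, and -1 from getline then simply means the pipe reached end of file.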
