How can I iterate through string bytes numerically? - c++

I have a string encoded in Windows-1256 that is displayed as ÓíÞÑÕäí áßí ¿.
The string should be displayed in Arabic if the operating system is configured to use the encoding.
Here is the HEX representation of the string:
My intention is to convert the text to utf8 manually (using lookup tables to see which bytes need to be altered, and which should be left as-is).
I will need to iterate through all bytes in the string to see the binary value of the byte.
The string is printed to the output stream as ÓíÞÑÕäí áßí ¿ and is 13 visible characters long, but when I try to iterate through the bytes, the loop runs for 24 iterations, nearly double that. Maybe it wrongly assumes UTF-8 or UTF-16.
How can I access the numerical value of each byte in the string?
#include <iostream>
#include <bitset>
#include <string>   // for std::string
#include <cstdint>  // for uint8_t

using std::string;
using std::cout;
using std::endl;

int main() {
    string myString = "ÓíÞÑÕäí áßí ¿";
    // text is written in Windows-1256 encoding
    cout << "string is : " << myString << endl;
    // outputs: string is : ÓíÞÑÕäí áßí ¿
    cout << "length : " << myString.size() << endl;
    // outputs : length : 24
    for (std::size_t i = 0; i < myString.size(); ++i)
    {
        uint8_t b1 = (uint8_t) myString.c_str()[i];
        unsigned char b2 = (unsigned char) myString.c_str()[i];
        unsigned int b3 = (unsigned int) myString.c_str()[i];
        int b4 = (int) myString.c_str()[i];
        cout << i << " - "
             << std::bitset<8>(myString.c_str()[i])
             << " : " << b1 // prints �
             << " : " << b2 // prints �
             << " : " << b3 // prints very large numbers, except for spaces (32)
             << " : " << b4 // negative values, except for the space (32)
             << endl;
    }
    return 0;
}
output
string is : ÓíÞÑÕäí áßí ¿
length : 24
0 - 11000011 : � : � : 4294967235 : -61
1 - 10010011 : � : � : 4294967187 : -109
2 - 11000011 : � : � : 4294967235 : -61
3 - 10101101 : � : � : 4294967213 : -83
4 - 11000011 : � : � : 4294967235 : -61
5 - 10011110 : � : � : 4294967198 : -98
6 - 11000011 : � : � : 4294967235 : -61
7 - 10010001 : � : � : 4294967185 : -111
8 - 11000011 : � : � : 4294967235 : -61
9 - 10010101 : � : � : 4294967189 : -107
10 - 11000011 : � : � : 4294967235 : -61
11 - 10100100 : � : � : 4294967204 : -92
12 - 11000011 : � : � : 4294967235 : -61
13 - 10101101 : � : � : 4294967213 : -83
14 - 00100000 : : : 32 : 32
15 - 11000011 : � : � : 4294967235 : -61
16 - 10100001 : � : � : 4294967201 : -95
17 - 11000011 : � : � : 4294967235 : -61
18 - 10011111 : � : � : 4294967199 : -97
19 - 11000011 : � : � : 4294967235 : -61
20 - 10101101 : � : � : 4294967213 : -83
21 - 00100000 : : : 32 : 32
22 - 11000010 : � : � : 4294967234 : -62
23 - 10111111 : � : � : 4294967231 : -65

I was finally able to iterate through the string byte by byte using the following code (copied from another answer that I can no longer find the link to):
// this function receives any std::string and returns a std::vector<std::byte>
// containing the numerical value of each byte in the string
// (std::byte is C++17, from <cstddef>; also needs <vector>, <algorithm>, <iterator>)
std::vector<std::byte> getBytes(std::string const &s) {
    std::vector<std::byte> bytes;
    bytes.reserve(std::size(s));
    std::transform(std::begin(s), std::end(s),
                   std::back_inserter(bytes),
                   [](char const &c) { return std::byte(c); });
    return bytes;
}
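If all that is needed is the numeric value of each byte (without std::byte), here is a minimal sketch along the same lines: the key point is to cast each char to unsigned char before widening it, which avoids the sign extension responsible for the negative and very large values in the output above.

#include <cstddef>
#include <iostream>
#include <string>

int main() {
    std::string myString = "ÓíÞÑÕäí áßí ¿";
    for (std::size_t i = 0; i < myString.size(); ++i) {
        // char is signed on most platforms; converting it straight to int
        // sign-extends, so go through unsigned char to get a 0-255 value.
        unsigned int value = static_cast<unsigned char>(myString[i]);
        std::cout << i << " - " << value << '\n';
    }
    return 0;
}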

Related

C++ u8 literal - unexpected encoding on Windows

I'm sure there's something I am missing here, but I am comparing the contents of a regular string literal (in a UTF-8 encoded document) with a u8 string literal, and on Windows the u8 literal doesn't contain the expected UTF-8 encoded data, while on Linux it does.
Details:
cpp file is utf8 encoded
C++17 is enabled
compiling using vs 2019 on Windows
compiling using gcc 9.2.1 on Linux
Here's the code:
#include <iostream>
#include <string>
struct HexCharStruct {
    unsigned char c;
    HexCharStruct(unsigned char _c) : c(_c) { }
};

inline std::ostream& operator<<(std::ostream& o, const HexCharStruct& hs) {
    return (o << std::hex << (int)hs.c);
}

inline HexCharStruct hex(unsigned char _c) {
    return HexCharStruct(_c);
}

int main( int argc, char** argv ) {
    std::string s1 = "🎂";
    std::string s2 = u8"🎂";
    std::cout << "s1: ";
    for (const char& c : s1)
        std::cout << hex(c) << " ";
    std::cout << "\ns2: ";
    for (const char& c : s2)
        std::cout << hex(c) << " ";
    return 0;
}
Here are the hex values printed on Windows and Linux for s1 and s2 when I run this:
s1 (Windows): f0 9f 8e 82
s1 (Linux): f0 9f 8e 82
s2 (Windows): c3 b0 c5 b8 c5 bd e2 80 9a
s2 (Linux): f0 9f 8e 82
The UTF-8 hex values for 🎂 are f0 9f 8e 82, so everything is as expected except for s2 on Windows. Can anyone explain this?
The Microsoft compiler assumes the source is ANSI-encoded, which depends on the localized version of Windows in use. On U.S. and Western European Windows the encoding is assumed to be Windows-1252.
When the compiler assumes Windows-1252, it decodes the UTF-8 bytes in the source using the wrong encoding, treats them as four Windows-1252 characters, and then re-encodes those characters in UTF-8. A quick demo (Python):
>>> '🎂'.encode('utf8') # bytes in the file
b'\xf0\x9f\x8e\x82'
>>> b'\xf0\x9f\x8e\x82'.decode('Windows-1252') # What the compiler reads.
'ðŸŽ‚'
>>> 'ðŸŽ‚'.encode('utf8') # What the compiler generates for the u8 string.
b'\xc3\xb0\xc5\xb8\xc5\xbd\xe2\x80\x9a'
To use UTF-8 sources, two options are to encode the source as UTF-8 with a BOM, or to add the /utf-8 compiler switch.
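One way to sanity-check this from inside the code (my own sketch, not part of the original answer): in C++17 a u8 literal has type const char[N], so sizeof can detect the mis-decoding at compile time.

// 🎂 is U+1F382, i.e. 4 UTF-8 bytes, so the correctly encoded u8 literal is
// 5 bytes including the terminating '\0'. If the compiler mis-reads the UTF-8
// source as Windows-1252, the literal grows to 10 bytes and this assert fires.
static_assert(sizeof(u8"🎂") == 5, "source file was not compiled as UTF-8");

int main() { return 0; }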

Qt C++ macOS problem: searching for words in a multiset with .find("a word") works on Windows but not on Mac

I have written some code that loads files containing a list of words (one word per line). Each word is added to a multiset. Later I try to search the multiset with multiset.find("aWord"), where I look for the word and substrings of the word in the multiset.
This code works fine if I compile it with Qt on a Windows system.
But it doesn't work if I compile it with Qt on my Mac!
My goal is to make it work from Qt on my Mac.
I am working on a MacBook Air (13", early 2018) with a
macOS Mojave version 10.14.4 installation
Build version 18E226
local 18.5.0 Darwin Kernel Version 18.5.0: Mon Mar 11 20:40:32 PDT 2019; root:xnu-4903.251.3~3/RELEASE_X86_64 x86_64
Using a qt installation:
QTKit:
Version: 7.7.3
Obtained from: Apple
Last Modified: 13/04/2019 12.11
Kind: Intel
64-Bit (Intel): Yes
Get Info String: QTKit 7.7.3, Copyright 2003-2012, Apple Inc.
Location: /System/Library/Frameworks/QTKit.framework
Private: No
And xcode installation:
Xcode 10.2
Build version 10E125
I have tried to print out:
every string that I am searching for
and every string I should find in the multiset, in hex format
and concluded that some of the letters do not match in their hex values, despite the fact that I think my whole system runs UTF-8 and the file is also UTF-8 encoded.
Dictionary.h
#ifndef DICTIONARY_H
#define DICTIONARY_H
#include <iostream>
#include <vector>
#include <set>
class Dictionary
{
public:
Dictionary();
void SearchForAllPossibleWordsIn(std::string searchString);
private:
std::multiset<std::string, std::less<std::string>> mDictionary;
void Initialize(std::string folder);
void InitializeLanguage(std::string folder, std::string languageFileName);
};
#endif // DICTIONARY_H
Dictionary.cpp
#include "Dictionary.h"
#include <vector>
#include <set>
#include <iostream>
#include <fstream>
#include <exception>
Dictionary::Dictionary()
{
Initialize("../Lektion10Projekt15-1/");
}
void Dictionary::Initialize(std::string folder)
{
InitializeLanguage(folder,"da-utf8.wl");
}
void Dictionary::InitializeLanguage(std::string folder, std::string languageFileName)
{
std::ifstream ifs;
ifs.open(folder+languageFileName,std::ios_base::in);
if (ifs.fail()) {
std::cerr <<"Error! Class: Dictionary. Function: InitializeLanguage(...). return: ifs.fail to load file '" + languageFileName + "'" << std::endl;
}else {
std::string word;
while (!ifs.eof()) {
std::getline(ifs,word);
mDictionary.insert(word);
}
}
ifs.close();
}
void Dictionary::SearchForAllPossibleWordsIn(std::string searchString)
{
    std::vector<std::string> result;
    for (unsigned int a = 0; a <= searchString.length(); ++a) {
        for (unsigned int b = 1; b <= searchString.length() - a; ++b) {
            std::string substring = searchString.substr(a, b);
            if (mDictionary.find(substring) != mDictionary.end())
            {
                result.push_back(substring);
            }
        }
    }
    if (!result.empty()) {
        for (unsigned int i = 0; i < result.size(); ++i) {
            std::cout << result[i] << std::endl;
        }
    }
}
main.cpp
#include <iostream>
#include "Dictionary.h"
int main()
{
Dictionary myDictionary;
myDictionary.SearchForAllPossibleWordsIn("byggearbejderen");
return 0;
}
I have tried to change the following line in main.cpp
myDictionary.SearchForAllPossibleWordsIn("byggearbejderen");
to the following (note: the first word in the word list is byggearbejderen):
std::string searchword;
std::ifstream ifs;
ifs.open("../Lektion10Projekt15-1/da-utf8.wl", std::ios::in);
if (ifs.fail()) {
    std::cerr << "Error!" << std::endl;
} else {
    std::getline(ifs, searchword);
}
ifs.close();
myDictionary.SearchForAllPossibleWordsIn(searchword);
And then, in main.cpp, I added some printouts with the expected string and substrings as hex values:
std::cout << " cout as hex test:" << std::endl;
myDictionary.SearchForAllPossibleWordsIn(searchword);
std::cout << "Suposet search resul for ''bygearbejderen''" << std::endl;
for (char const elt: "byggearbejderen")
std::cout << std::hex << std::setw(2) << std::setfill('0') << static_cast<int>(elt) << " ";
std::cout << "byggearbejderen" << std::endl;
for (char const elt: "arbejderen")
std::cout << std::hex << std::setw(2) << std::setfill('0') << static_cast<int>(elt) << " ";
std::cout << "arbejderen" << std::endl;
for (char const elt: "ren")
std::cout << std::hex << std::setw(2) << std::setfill('0') << static_cast<int>(elt) << " ";
std::cout << "ren" << std::endl;
for (char const elt: "en")
std::cout << std::hex << std::setw(2) << std::setfill('0') << static_cast<int>(elt) << " ";
std::cout << "en" << std::endl;
for (char const elt: "n")
std::cout << std::hex << std::setw(2) << std::setfill('0') << static_cast<int>(elt) << " ";
std::cout << "n" << std::endl;
I also added the same printout of the result in Dictionary.cpp:
std::cout << "result of seartchword as hex" << std::endl;
if (!result.empty()) {
for (unsigned int i = 0; i < result.size() ;++i)
{
for (char const elt: result[i] )
{
std::cout << std::hex << std::setw(2) << std::setfill('0') << static_cast<int>(elt) << " ";
}
std::cout << result[i] << std::endl;
}
}
which gave the following output:
result of seartchword as hex
ffffffef ffffffbb ffffffbf 62 79 67 67 65 61 72 62 65 6a 64 65 72 65 6e 0d byggearbejderen
61 72 62 65 6a 64 65 72 65 6e 0d arbejderen
72 65 6e 0d ren
65 6e 0d en
6e 0d n
Suposet search resul for ''bygearbejderen''
62 79 67 67 65 61 72 62 65 6a 64 65 72 65 6e 00 byggearbejderen
61 72 62 65 6a 64 65 72 65 6e 00 arbejderen
72 65 6e 00 ren
65 6e 00 en
6e 00 n
Here I noticed that some values were different.
I don't know why this is the case on macOS but not on Windows, and I do not know whether there are any encoding settings in my environment that I need to change or set correctly.
I would like my main.cpp to look like this:
#include <iostream>
#include "Dictionary.h"
int main()
{
Dictionary myDictionary;
myDictionary.SearchForAllPossibleWordsIn("byggearbejderen");
return 0;
}
resulting in the following output:
byggearbejderen
arbejderen
ren
en
n
Line endings for text files are different on Windows than they are on a Mac. Windows uses both CR/LF characters (ASCII codes 13 and 10, respectively). Old Macs used the CR character alone, Linux systems use just the LF. If you create a text file on Windows, then copy it to your Mac, the line endings might not be handled correctly.
If you look at the last character in your output, you'll see it is a 0d, which would be the CR character. I don't know how you generated that output, but it is possible that the getline on the Mac is treating that as a normal character, and including it in the string that has been read in.
The simplest solution is to either process that text file beforehand to get the line endings correct, or strip the CR off the end of the words after they are read in.
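Here is a minimal sketch of the second option, assuming the InitializeLanguage reading loop shown above (strip the CR after each std::getline before inserting into the multiset):

std::string word;
while (std::getline(ifs, word)) {
    // A file saved with Windows (CR/LF) line endings leaves a trailing '\r'
    // on every line when it is read on macOS or Linux; drop it before
    // inserting, otherwise mDictionary.find("byggearbejderen") never matches.
    if (!word.empty() && word.back() == '\r')
        word.pop_back();
    mDictionary.insert(word);
}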

Key agreement with XTR-DH Crypto++

I tried to apply XTR-DH for key agreement, starting from this example:
//////////////////////////////////////////////////////////////////////////
// Alice
// Initialize the Diffie-Hellman class with a random prime and base
AutoSeededRandomPool rngA;
DH dhA;
dhA.Initialize(rngA, 128);
// Extract the prime and base. These values could also have been hard coded
// in the application
Integer iPrime = dhA.GetGroupParameters().GetModulus();
Integer iGenerator = dhA.GetGroupParameters().GetSubgroupGenerator();
SecByteBlock privA(dhA.PrivateKeyLength());
SecByteBlock pubA(dhA.PublicKeyLength());
SecByteBlock secretKeyA(dhA.AgreedValueLength());
// Generate a pair of integers for Alice. The public integer is forwarded to Bob.
dhA.GenerateKeyPair(rngA, privA, pubA);
//////////////////////////////////////////////////////////////////////////
// Bob
AutoSeededRandomPool rngB;
// Initialize the Diffie-Hellman class with the prime and base that Alice generated.
DH dhB(iPrime, iGenerator);
SecByteBlock privB(dhB.PrivateKeyLength());
SecByteBlock pubB(dhB.PublicKeyLength());
SecByteBlock secretKeyB(dhB.AgreedValueLength());
// Generate a pair of integers for Bob. The public integer is forwarded to Alice.
dhB.GenerateKeyPair(rngB, privB, pubB);
//////////////////////////////////////////////////////////////////////////
// Agreement
// Alice calculates the secret key based on her private integer as well as the
// public integer she received from Bob.
if (!dhA.Agree(secretKeyA, privA, pubB))
return false;
// Bob calculates the secret key based on his private integer as well as the
// public integer he received from Alice.
if (!dhB.Agree(secretKeyB, privB, pubA))
return false;
// Just a validation check. Did Alice and Bob agree on the same secret key?
if (!VerifyBufsEqual(secretKeyA.begin(), secretKeyB.begin(), dhA.AgreedValueLength()))
return false;
return true;
And here my code :
//Alice
AutoSeededRandomPool aSRPA;
XTR_DH xtrA(aSRPA, 512, 256);
Integer iPrime = xtrA.GetModulus();
Integer i_qnumber = xtrA.GetSubgroupOrder();
Integer iGeneratorc1 = xtrA.GetSubgroupGenerator().c1;
Integer iGeneratorc2 = xtrA.GetSubgroupGenerator().c2;
SecByteBlock privateA(xtrA.PrivateKeyLength());
SecByteBlock publicA(xtrA.PublicKeyLength());
SecByteBlock secretKeyA(xtrA.AgreedValueLength());
xtrA.GenerateKeyPair(aSRPA, privateA, publicA);
//Bob
AutoSeededRandomPool aSRPB;
XTR_DH xtrB(iPrime, i_qnumber, iGeneratorc1); // Use c1 or c2 or both ???
SecByteBlock privateB(xtrB.PrivateKeyLength());
SecByteBlock publicB(xtrB.PublicKeyLength());
SecByteBlock secretKeyB(xtrB.AgreedValueLength());
xtrB.GenerateKeyPair(aSRPB, privateB, publicB);
// Agreement
// Alice calculates the secret key based on her private integer as well as the
// public integer she received from Bob.
if (!xtrA.Agree(secretKeyA, privateA, publicB))
return false;
// Bob calculates the secret key based on his private integer as well as the
// public integer he received from Alice.
if (!xtrB.Agree(secretKeyB, privateB, publicA))
return false;
// Just a validation check. Did Alice and Bob agree on the same secret key?
if (!VerifyBufsEqual(secretKeyA.begin(), secretKeyB.begin(), xtrA.AgreedValueLength()))
return false;
return true;
I got this error
Severity Code Description Project File Line Suppression State
Error C2664 'CryptoPP::XTR_DH::XTR_DH(CryptoPP::XTR_DH &&)': cannot convert argument 3 from 'CryptoPP::Integer' to 'const CryptoPP::GFP2Element &' ConsoleApplication1 d:\tugas akhir\code\consoleapplication1\consoleapplication1\consoleapplication1.cpp 91
My questions are:
The generator has two components, c1 and c2. Do I need both to construct xtrB, or just one?
I have tried taking the values of p, q and g from xtrA and using them to initialize xtrB, but they are too long for an integer. What is the solution?
Thanks in advance.
XTR_DH xtrB(iPrime, i_qnumber, iGeneratorc1); // Use c1 or c2 or both ???
You should use the following constructor from XTR-DH | Constructors:
XTR_DH (const Integer &p, const Integer &q, const GFP2Element &g)
There are two ways to setup xtrB. First, the way that uses the constructor (and artificially small parameters):
$ cat test.cxx
#include "cryptlib.h"
#include "osrng.h"
#include "xtrcrypt.h"
#include <iostream>

int main()
{
    using namespace CryptoPP;
    AutoSeededRandomPool aSRP;

    XTR_DH xtrA(aSRP, 170, 160);

    const Integer& iPrime = xtrA.GetModulus();
    const Integer& iOrder = xtrA.GetSubgroupOrder();
    const GFP2Element& iGenerator = xtrA.GetSubgroupGenerator();

    XTR_DH xtrB(iPrime, iOrder, iGenerator);

    std::cout << "Prime: " << std::hex << xtrB.GetModulus() << std::endl;
    std::cout << "Order: " << std::hex << xtrB.GetSubgroupOrder() << std::endl;
    std::cout << "Generator" << std::endl;
    std::cout << " c1: " << std::hex << xtrB.GetSubgroupGenerator().c1 << std::endl;
    std::cout << " c2: " << std::hex << xtrB.GetSubgroupGenerator().c2 << std::endl;

    return 0;
}
And then:
$ g++ -DNDEBUG -g2 -O3 -fPIC -pthread test.cxx ./libcryptopp.a -o test.exe
$ ./test.exe
Prime: 2d4c4f9f4de9e32e84a7be42f019a1a4139e0fe7489h
Order: 89ab07fa5115443f51ce9a74283affaae2d7748fh
Generator
c1: 684fedbae519cb297f3448d5e564838ede5ed1fb81h
c2: 39112823212ccd7b01f10377536f51bf855752c7a3h
Second, the way that stores the domain parameters in an ASN.1 object (and artificially small parameters):
$ cat test.cxx
#include "cryptlib.h"
#include "osrng.h"
#include "files.h"
#include "xtrcrypt.h"
#include <iostream>

int main()
{
    using namespace CryptoPP;
    AutoSeededRandomPool prng;

    XTR_DH xtrA(prng, 170, 160);
    xtrA.DEREncode(FileSink("params.der").Ref());

    XTR_DH xtrB(FileSource("params.der", true).Ref());

    std::cout << "Prime: " << std::hex << xtrB.GetModulus() << std::endl;
    std::cout << "Order: " << std::hex << xtrB.GetSubgroupOrder() << std::endl;
    std::cout << "Generator" << std::endl;
    std::cout << " c1: " << std::hex << xtrB.GetSubgroupGenerator().c1 << std::endl;
    std::cout << " c2: " << std::hex << xtrB.GetSubgroupGenerator().c2 << std::endl;

    return 0;
}
And then:
$ g++ -DNDEBUG -g2 -O3 -fPIC -pthread test.cxx ./libcryptopp.a -o test.exe
$ ./test.exe
Prime: 2ee076b3254c1520151bbe0391a77971f92e277ba37h
Order: f7674a8c2dd68d32c3da8e74874a48b9adf00fcbh
Generator
c1: 2d469e63b474ac45578a0027a38864f303fad03ba9h
c2: 1d5e5714bc19ef25eee0535584176889df8f26c4802h
And finally:
$ dumpasn1 params.der
0 94: SEQUENCE {
2 22: INTEGER 02 EE 07 6B 32 54 C1 52 01 51 BB E0 39 1A 77 97 1F 92 E2 77 BA 37
26 21: INTEGER 00 F7 67 4A 8C 2D D6 8D 32 C3 DA 8E 74 87 4A 48 B9 AD F0 0F CB
49 21: INTEGER 2D 46 9E 63 B4 74 AC 45 57 8A 00 27 A3 88 64 F3 03 FA D0 3B A9
72 22: INTEGER 01 D5 E5 71 4B C1 9E F2 5E EE 05 35 58 41 76 88 9D F8 F2 6C 48 02
: }
In practice you probably want to use something like this, which validates the parameters after loading them. You should always validate your security parameters.
// Load the domain parameters from somewhere
const Integer& iPrime = ...;
const Integer& iOrder = ...;
const GFP2Element& iGenerator = ...;

// Create the key agreement object using the parameters
XTR_DH xtrB(iPrime, iOrder, iGenerator);

// Verify the parameters using the key agreement object
if (xtrB.Validate(aSRP, 3) == false)
    throw std::runtime_error("Failed to validate parameters");
You are probably going to use something like the second method shown above. That is, you are going to generate your domain parameters once, and then both parties will use them. Below both parties xtrA and xtrB use params.der:
int main()
{
    using namespace CryptoPP;
    AutoSeededRandomPool prng;

    XTR_DH xtrA(FileSource("params.der", true).Ref());
    XTR_DH xtrB(FileSource("params.der", true).Ref());

    if (xtrA.Validate(prng, 3) == false)
        throw std::runtime_error("Failed to validate parameters");
    if (xtrB.Validate(prng, 3) == false)
        throw std::runtime_error("Failed to validate parameters");

    ...
}
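To complete the exchange, the agreement itself follows the same pattern as the DH example at the top of this question. A minimal sketch, continuing inside the main() above once both xtrA and xtrB have been loaded and validated (variable names other than the Crypto++ API are mine):

SecByteBlock privateA(xtrA.PrivateKeyLength()), publicA(xtrA.PublicKeyLength());
SecByteBlock privateB(xtrB.PrivateKeyLength()), publicB(xtrB.PublicKeyLength());

// Each party generates its own key pair; the public parts are exchanged.
xtrA.GenerateKeyPair(prng, privateA, publicA);
xtrB.GenerateKeyPair(prng, privateB, publicB);

SecByteBlock secretA(xtrA.AgreedValueLength()), secretB(xtrB.AgreedValueLength());

// Each side combines its own private key with the other side's public key.
if (!xtrA.Agree(secretA, privateA, publicB) || !xtrB.Agree(secretB, privateB, publicA))
    throw std::runtime_error("Key agreement failed");

// Both sides should now hold the same shared secret.
if (!VerifyBufsEqual(secretA.begin(), secretB.begin(), xtrA.AgreedValueLength()))
    throw std::runtime_error("Shared secrets do not match");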

Construct a string that contains the hex value of a binary array

I'm trying to construct a string from a byte array (libcrypto++) in order to connect to SQS in C++, but I have issues with '0'.
The result is almost correct, except that some '0's end up at the end of the string.
std::string shaDigest(const std::string &key = "") {
byte out[64] = {0};
CryptoPP::SHA256().CalculateDigest(out, reinterpret_cast<const byte*>(key.c_str()), key.size());
std::stringstream ss;
std::string rep;
for (int i = 0; i < 64; i++) {
ss << std::hex << static_cast<int>(out[i]);
}
ss >> rep;
rep.erase(rep.begin()+64, rep.end());
return rep;
}
output:
correct : c46268185ea2227958f810a84dce4ade54abc4f42a03153ef720150a40e2e07b
mine : c46268185ea2227958f810a84dce4ade54abc4f42a3153ef72015a40e2e07b00
^ ^
Edit: I'm trying to do the same thing that hashlib.sha256('').hexdigest() does in Python.
If that indeed works, here's the solution with my suggestions incorporated.
std::string shaDigest(const std::string &key = "") {
    std::array<byte, 64> out {};
    CryptoPP::SHA256().CalculateDigest(out.data(), reinterpret_cast<const byte*>(key.c_str()), key.size());

    std::stringstream ss;
    ss << std::hex << std::setfill('0');
    for (byte b : out) {
        ss << std::setw(2) << static_cast<int>(b);
    }

    // I don't think `.substr(0,64)` is needed here;
    // hex ASCII form of 64-byte array should always have 128 characters
    return ss.str();
}
You correctly convert the bytes to hexadecimal, and it works whenever the byte value is greater than 15. But below that, the first hex digit is a 0, which is not printed by default. The two missing 0s are for 0x03 -> 3 and 0x0a -> a.
You should use :
for (int i = 0; i < 64; i++) {
ss << std::hex << std::setw(2) << std::setfill('0') << static_cast<int>(out[i]);
}
You need to set the width for the integer numbers for the proper zero-padding of numbers with otherwise less than two hexadecimal digits. Note that you need to re-set the width before every number that is inserted into the stream.
Example:
#include <iostream>
#include <iomanip>
int main() {
std::cout << std::hex << std::setfill('0');
for (int i=0; i<0x11; i++)
std::cout << std::setw(2) << i << "\n";
}
Output:
$ g++ test.cc && ./a.out
00
01
02
03
04
05
06
07
08
09
0a
0b
0c
0d
0e
0f
10
For reference:
http://en.cppreference.com/w/cpp/io/manip/setw
http://en.cppreference.com/w/cpp/io/manip/setfill

segmentation fault on getline in Ubuntu

I'm having the famous segmentation fault. I've tracked it down to a single line in the code (getline). Here's someone with a similar issue, also on Ubuntu:
http://www.daniweb.com/software-development/cpp/threads/329191
Note that getline returns -1 after the segmentation fault, but it couldn't really have been the end of the stream (in my case).
When the stream is smaller, everything works fine. As we can deduce from the output, the segmentation fault is on line 98.
1 /*
2 * File: RequestDispatcher.cpp
3 * Author: albert
4 *
5 * Created on July 8, 2011, 7:15 PM
6 */
7
8 #include "iostream"
9 #include "fstream"
10 #include "stdlib.h"
11 #include "stdio.h"
12 #include "cstring"
13 #include "algorithm"
14
15 #include "RequestDispatcher.h"
16 #include "Functions.h"
17
18 #define PROXIES 1
19
20 RequestDispatcher::RequestDispatcher()
21 {
22 }
23
24 RequestDispatcher::RequestDispatcher(const RequestDispatcher& orig)
25 {
26 }
27
28 RequestDispatcher::~RequestDispatcher()
29 {
30 }
31
32 int RequestDispatcher::addRequest(string host, string request, IResponseReceiver* response_receiver)
33 {
34 RequestInfo info;
35 info.request_index = request_info.size();
36 info.host = host;
37 info.request = request;
38 info.response_receiver = response_receiver;
39 request_info.push_back(info);
40 return info.request_index;
41 }
42
43 void RequestDispatcher::run()
44 {
45 if (request_info.size()==0)
46 {
47 return;
48 }
49 FILE* pipe[PROXIES];
50 int per_proxy = (request_info.size() + PROXIES - 1) / PROXIES;
51 int count_pipes = (request_info.size() + per_proxy - 1) / per_proxy;
52 for (int pipe_index=0; pipe_index<count_pipes; ++pipe_index)
53 {
54 int from = pipe_index * per_proxy;
55 int to = min(from + per_proxy, int(request_info.size()));
56 cout << "FROM: "<< from << "; TO: " << to;
57 const char* cmd = generateCmd(from, to);
58 pipe[pipe_index] = popen(cmd, "r");
59 if (!pipe[pipe_index])
60 {
61 cerr << "Error executing command in RequestDispatcher::run()";
62 }
63 }
64 string result[PROXIES];
65 bool finished[PROXIES];
66 for (int pipe_index=0; pipe_index<count_pipes; pipe_index++)
67 {
68 finished[pipe_index] = false;
69 }
70 int count_finished = 0;
71 char* buffer;
72 size_t buffer_length=1024;
73 buffer = (char *) malloc (buffer_length + 1);
74 while (count_finished < count_pipes)
75 {
76 cout << "D\n";
77 fflush(stdout);
78 for(int pipe_index=0; pipe_index<count_pipes; ++pipe_index)
79 {
80 cout << "E\n";
81 fflush(stdout);
82 if (finished[pipe_index])
83 {
84 continue;
85 }
86 cout << "Getline" << buffer_length << "\n";
87 ssize_t bytes_read = getline(&buffer, &buffer_length, pipe[pipe_index]);
88 cout << "Getline Done ("<<bytes_read<< "," << buffer_length << ")\n";
89 fflush(stdout);
90 while (bytes_read>0)
91 {
92 for (int i=0; i<bytes_read; i++)
93 {
94 result[pipe_index] += buffer[i];
95 }
96 cout << "P\n";
97 fflush(stdout);
98 bytes_read = getline(&buffer, &buffer_length, pipe[pipe_index]);
99 cout << "Bytes read ("<<bytes_read<<","<< buffer_length << ")\n";
100 fflush(stdout);
101
102 }
103 if (bytes_read == -1) // then finished this pipe
104 {
105 string* r = &result[pipe_index];
106 //cout << *r;
107 finished[pipe_index] = true;
108 ++count_finished;
109 cout << "HI\n";
110 fflush(stdout);
111 // delete trailing '\0' from result
112 pclose(pipe[pipe_index]);
113 result[pipe_index] = result[pipe_index].substr(0, result[pipe_index].length()-1);
114 int pos = r->find("RESPONSE_DATA");
115 int valuepos, endvaluepos;
116 int request_index, length;
117 string headers;
118 int headerslength;
119 string body;
120 int bodypos, bodylength;
121 while (pos!=r->npos)
122 {
123 valuepos = r->find("REQUEST_INDEX=", pos) + 14;
124 endvaluepos = r->find("\n", valuepos);
125 request_index = pipe_index * per_proxy + atoi(r->substr(valuepos, endvaluepos-valuepos).c_str());
126
127 cout << "REQUEST_INDEX " << request_index;
128
129 valuepos = r->find("LENGTH=", pos) + 7;
130 endvaluepos = r->find("\n", valuepos);
131 length = atoi(r->substr(valuepos, endvaluepos-valuepos).c_str());
132
133 pos = r->find("START", pos)+5;
134 bodypos = r->find("\r\n\r\n", pos)+4;
135 headerslength = bodypos-pos-4;
136 bodylength = length-headerslength-4;
137 headers = r->substr(pos, headerslength);
138 body = r->substr(bodypos, bodylength);
139 request_info[request_index].response_receiver->notifyResponse(headers, body, request_index);
140
141 pos=r->find("RESPONSE_DATA", pos+length);
142 }
143 }
144 }
145 }
146 cout << "\n?\n";
147 fflush(stdout);
148 free(buffer);
149 request_info.clear();
150 }
151
152 const char* RequestDispatcher::generateCmd(int first_request, int to_request)
153 {
154 string r("/home/albert/apachebench-standalone-read-only/ab -a");
155 for (int i=first_request; i<to_request; i++)
156 {
157 r.append(" '");
158 r.append(request_info.at(i).request);
159 r.append("'");
160 }
161 ofstream out("/home/albert/apachebench-standalone-read-only/debug");
162 if(! out)
163 {
164 cerr<<"Cannot open output file\n";
165 return "";
166 }
167 out << r.c_str();
168 out.close();
169 return "/home/albert/apachebench-standalone-read-only/debug";
170 /*int size = strlen("/home/albert/apachebench-standalone-read-only/ab -a");
171 for (int i=first_request; i<to_request; i++)
172 {
173 size += 2+strlen(request_info.at(i).request)+1;
174 cout << "len: " << strlen(request_info.at(i).request) << "\n";
175 cout << "total: " << size << "\n";
176 }
177 size += 1;
178 char* cmd = new char[size];
179 strcpy(cmd, "/home/albert/apachebench-standalone-read-only/ab -a");
180 for (int i=first_request; i<to_request; i++)
181 {
182 cout << "LEN: " << strlen(cmd) << "\n";
183 cout << "NEXT: " << strlen(request_info.at(i).request) << "\n";
184 fflush(stdout);
185 strcat(cmd, " '");
186 strcat(cmd, request_info.at(i).request);
187 strcat(cmd, "'");
188 }
189 cout << "LEN: " << strlen(cmd) << "\n";
190 fflush(stdout);
191 return cmd;*/
192 }
When I run /home/albert/apachebench-standalone-read-only/debug from the command line everything works perfectly fine. It returns binary data.
The end of the output is:
P
Bytes read (272,6828)
P
Bytes read (42,6828)
P
Bytes read (464,6828)
P
Bytes read (195,6828)
P
Bytes read (355,6828)
P
Bytes read (69,6828)
P
Bytes read (111,6828)
P
Segmentation fault
Bytes read (368,6828)
P
Bytes read (-1,6828)
HI
REQUEST_INDEX 46REQUEST_INDEX 48REQUEST_INDEX 44REQUEST_INDEX 0REQUEST_INDEX 45
?
Note the "?" printed on exiting the loop; after this, the program is finished.
By the way, I always thought a program would terminate on a segmentation fault (edit: I did not do anything to catch it).
In reply to some answers: There seem to be different versions of getline and I seem to be using the one documented here:
http://www.kernel.org/doc/man-pages/online/pages/man3/getline.3.html
After some thought, I believe the issue is that your buffer is being written to while you're reading it. In some cases the write into the buffer is not finished when you pull data out of it (which could mean you read an empty or partially filled buffer). This is because you are using popen and simply piping data from another process. I would recommend, for one, that you use the C++ standard getline (although both variants are somewhat unsafe) and that you allow some leeway when reading data from the pipe. Retry logic might be what you need, as I can't think of a clean way to solve this. If anyone knows one, please post it; I'm posting this because it's what I believe to be the likely culprit of the problem.
Also, if you're coding in C++, I highly recommend using the C++ standard library so that you're not constantly mixing or casting between types (string to char* and so on; it saves you some hassle), and using the safer versions of functions so you avoid errors such as buffer overflows.
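The answer suggests the C++ getline, but std::getline cannot read directly from the FILE* that popen returns. One alternative along the same lines (a sketch of my own, not the answerer's code) is to read each pipe in fixed-size chunks with fread and append them to a std::string, which also copes with the binary output mentioned above (embedded NUL bytes, lines of arbitrary length):

#include <cstdio>
#include <string>

// Read everything the command writes to stdout into a std::string.
// Returns false if the command could not be started. popen/pclose are POSIX.
static bool readPipe(const char* cmd, std::string& out) {
    FILE* p = popen(cmd, "r");
    if (!p)
        return false;
    char buf[4096];
    std::size_t n;
    // fread returns 0 at EOF (or on error), which ends the loop; there is no
    // manually grown buffer, and embedded '\0' bytes are preserved by append.
    while ((n = std::fread(buf, 1, sizeof(buf), p)) > 0)
        out.append(buf, n);
    pclose(p);
    return true;
}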