2 functions that should be the same give different results - c++

I'm reading a file into a vector so that I can read bytes from it. The standalone function works fine, but I don't want to open and close the file and create a vector on every access, since I'm going to be grabbing lots of bytes. So I opted to put it in a file class. However, this has led to entirely different results when I use the file class.
This is a working standalone function:
template<class T>
T readFromNewFile(std::string path, size_t size = 0, uint offset = 0)
{
std::ifstream in(path.c_str(), std::ios::binary | std::ios::ate);
if (in.is_open())
{
std::streamoff max = in.tellg();
size_t _size = size == 0 ? (size_t)max : size;
in.seekg(offset, std::ios::beg);
std::vector<uint> v(_size);
in.read((char*)v.data(), _size);
uint temp = 0;
std::stringstream ss;
for (uint it = 0; it < v.size(); it++)
{
std::cout << "readFromNewFile: Adding offset: " << offset + it << ", with value: " << v[it] << "\n";
temp += v[it];
}
ss << temp;
T myVar;
ss >> myVar;
return myVar;
}
return T();
}
The file class in question:
class File
{
public:
File() {}
File(std::string path)
{
LOAD(path);
}
~File() {}
void LOAD(std::string path)
{
std::ifstream in(path.c_str(), std::ios::binary | std::ios::ate);
if (in.is_open())
{
std::streamoff max = in.tellg();
size_t _size = (size_t)max;
in.seekg(0, std::ios::beg);
data.clear();
data.resize(_size);
in.read((char*)data.data(), _size);
}
}
template<class T>
T readFromVector(size_t size = 0, uint offset = 0)
{
uint temp = 0;
std::stringstream ss;
for (uint it = offset; it < (offset + size) && it < data.size(); it++)
{
std::cout << "readFromVector: reading offset: " << it << ", with value: " << data[it] << "\n";
temp += data[it];
}
ss << temp;
T myVar;
ss >> myVar;
return myVar;
}
private:
std::vector<uint> data;
};
The code is all called in this block:
bool ArmourManager::init()
{
std::string armourFile = "resources\\armor.am_dat";
Offsets::ArmourOffsets offsets;
data.LOAD(armourFile); // data is a File object in ArmourManager class
uint armourOffset = 0x0A;
uint textOffset = 0x41C06;
// armours start at offset 0x0A, increase by 0x3C for every armour in file
// appears to end at 187390 (0x2DBFE)
//std::cout << "[OFFSET]\t[IDX]\t[RAR]\t[SLOT]\t[DEF]\t[SLOTS]\n";
while (armourOffset < 0x2DBFE)
{
uint ind = SetMaker::readFromNewFile<uint>(armourFile, 4, armourOffset + offsets.Index);
uint x = data.readFromVector<uint>(4, armourOffset + offsets.Index);
std::cout << ind << " / " << x << "\n";
[Image: console output]
From what I can tell from debugging, the data vector isn't being populated correctly; it's being filled with unexpected values.
[Image: checking the first index of both vectors]
Even when I tried something like this,
std::vector<uint> x(_size);
in.read((char*)x.data(), _size);
std::cout << "x byte 0: " << x[0] << "\n";
the result is still the exact same.
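One plausible explanation, offered as a sketch rather than a confirmed diagnosis: both versions read `_size` bytes into a `std::vector<uint>` with `_size` elements, so every element packs four file bytes together. In `readFromNewFile` the read starts at the requested offset, so the wanted bytes land in `v[0]` and the remaining elements stay zero. In `File`, `data[it]` indexes whole `uint` elements, so `data[offset]` refers to file bytes starting at `4 * offset`. Storing one element per byte avoids the mismatch. The names `ByteFile` and `readAt` below are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical replacement for the File class: one vector element per file byte.
class ByteFile {
public:
    // Reads the whole file into memory; returns false if it cannot be opened.
    bool load(const std::string& path) {
        std::ifstream in(path, std::ios::binary);
        if (!in) return false;
        bytes.assign(std::istreambuf_iterator<char>(in),
                     std::istreambuf_iterator<char>());
        return true;
    }

    // Copies sizeof(T) bytes starting at `offset` into a T (host byte order).
    // Returns T() if the requested range does not fit inside the loaded data.
    template <class T>
    T readAt(std::size_t offset) const {
        T value{};
        if (offset > bytes.size() || sizeof(T) > bytes.size() - offset) return value;
        std::memcpy(&value, bytes.data() + offset, sizeof(T));
        return value;
    }

    std::size_t size() const { return bytes.size(); }

private:
    std::vector<unsigned char> bytes;
};
```

With this layout, `data.readAt<std::uint32_t>(armourOffset + offsets.Index)` would read the four bytes at that byte offset, which is what the standalone function effectively does on a little-endian machine.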

Related

Collecting many printf calls in a single string

I am working with some code that runs the RC4 encryption algorithm on parameters passed into a function. I am trying to append the generated hash to an initially empty string, but my attempts so far have failed. I have seen snprintf() used for this, but how would I convert the code below so that what gets printed is saved to a string instead?
for (size_t i = 0, len = strlen(plaintext); i < len; i++) {
printf("|x%02hhx| ", hash[i]);
}
Why not use C++?
#include <iomanip>
#include <iostream>
#include <sstream>
#include <cstring>
int main() {
char plaintext[] = "12345";
char hash[] = "123\xf0\x0f";
std::stringstream out;
for (size_t i = 0, len = strlen(plaintext); i < len; i++) {
out << "|x"
<< std::setfill('0') << std::setw(2) << std::setbase(16)
// Parenthesize: << binds tighter than &, and the mask avoids sign extension of char.
<< (0xff & hash[i])
<< "| ";
}
std::cout << out.str();
}
Just work with std::string::data after determining the size of the output of std::snprintf:
#include <cassert>
#include <cstdio>
#include <iostream>
#include <stdexcept>
#include <string>
template<class...Args>
std::string PrintFToString(char const* format, Args...args)
{
std::string result;
char c;
int requiredSize = std::snprintf(&c, 1, format, args...);
if (requiredSize < 0)
{
throw std::runtime_error("error with snprintf");
}
result.resize(requiredSize);
int writtenSize = std::snprintf(result.data(), requiredSize+1, format, args...);
assert(writtenSize == requiredSize);
return result;
}
template<class...Args>
void AppendPrintFToString(std::string& target, char const* format, Args...args)
{
char c;
int requiredSize = std::snprintf(&c, 1, format, args...);
if (requiredSize < 0)
{
throw std::runtime_error("error with snprintf");
}
auto const oldSize = target.size();
target.resize(oldSize + requiredSize);
int writtenSize = std::snprintf(target.data() + oldSize, requiredSize+1, format, args...);
assert(writtenSize == requiredSize);
}
int main() {
std::cout << PrintFToString("|x%02hhx| ", 33) << '\n';
std::string output;
for (int i = 0; i != 64; ++i)
{
AppendPrintFToString(output, "|x%02hhx| ", i);
output.push_back('\n');
}
std::cout << output;
}
Note: If you know a reasonable upper bound for the number of characters of the output, you could use a char array allocated on the stack for output instead of having to use 2 calls to std::snprintf...
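The single-call variant mentioned in the note could look roughly like this. It is a minimal sketch that assumes the format always produces six characters per byte, so a small fixed buffer is sufficient; the helper names `appendHexByte` and `hashToString` are made up for illustration.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: appends one formatted byte such as "|x0f| " to `target`.
// The output is always 6 characters, so a small stack buffer is enough.
void appendHexByte(std::string& target, unsigned char byte) {
    char buffer[16];
    const int written = std::snprintf(buffer, sizeof buffer, "|x%02x| ",
                                      static_cast<unsigned>(byte));
    // Append only if snprintf succeeded and the text was not truncated.
    if (written > 0 && static_cast<std::size_t>(written) < sizeof buffer) {
        target.append(buffer, static_cast<std::size_t>(written));
    }
}

// Formats `length` bytes of `hash` into one string.
std::string hashToString(const unsigned char* hash, std::size_t length) {
    std::string result;
    result.reserve(length * 6);
    for (std::size_t i = 0; i < length; ++i) appendHexByte(result, hash[i]);
    return result;
}
```

Taking the bytes as `unsigned char` sidesteps the sign-extension problem that `%02hhx` is meant to handle when the hash is stored in plain `char`.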

Hash multiple files

I'm trying to hash multiple files, but I get an error.
My file names run from Cheque 083654.tif to Cheque 08365122.tif.
My code:
for (int i = 4; i < 123; i++)
{
stringstream file;
file<< "C:/Users/user/Desktop/datasets/Cheque 08365" << i << ".tif";
string filename = file.str();
cout << filename << '\n';
unsigned char *sha256digest = calculateSHA256(filename);
char *sha256hash = (char *)malloc(sizeof(char) * 65);
sha256hash[64] = '\0';
for (int i = 0; i < SHA256_DIGEST_LENGTH; i++)
{
sprintf(&sha256hash[i * 2], "%02x", sha256digest[i]);
}
printf("SHA256 HASH: %s\n", sha256hash);
system("pause");
}
The error states that no suitable conversion function from string to char * exists at the filename in:
unsigned char *sha256digest = calculateSHA256(filename);
How can I solve this error?
If calculateSHA256 returns std::string, that assignment is illegal in more ways than one. A char * is just a pointer to storage, while the string returned by the function is a temporary object that stops existing at the end of the full expression. First you have to save that string, and then access its data through the appropriate member function. There is no implicit conversion from std::string to a pointer.
Or don't use a pointer at all. You will find it easier to avoid C idioms altogether.
std::string sha256digest = calculateSHA256(filename);
// FORMATTED OUTPUT
std::stringstream hashstr;
hashstr << std::hex << std::setfill('0');
for( auto x : sha256digest ) // this would iterate through entirety of string
{
hashstr << std::setw(2) << static_cast<int>(static_cast<unsigned char>(x));
}
std::string output;
hashstr >> output;
std::cout << "SHA256 HASH: " << output;
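The compiler message quoted in the question points at the argument, so another possibility is that calculateSHA256 is declared to take a C string. Its declaration is not shown, so this is an assumption. If the parameter is `const char*`, passing `filename.c_str()` is enough; if it is a non-const `char*`, that will not compile and the function signature would need to change. The function `legacyTakesCString` below is a hypothetical stand-in, not the real hashing routine.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a C-style API that takes a file name as a C string.
// It only returns the length of the name; the real calculateSHA256 is not shown.
std::size_t legacyTakesCString(const char* path) {
    return std::strlen(path);
}

// Builds the file name as a std::string and passes a C-string view of it on.
std::size_t callWithStdString(int index) {
    const std::string filename = "Cheque 08365" + std::to_string(index) + ".tif";
    // c_str() yields a null-terminated pointer that stays valid while
    // `filename` is alive and unmodified.
    return legacyTakesCString(filename.c_str());
}
```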
You do not need a stringstream to construct the filename. Use std::to_string().
I think most of the simplification can be done inside your own calculateSHA256() function. Let it return a std::vector or std::string instead of a char*.
Here's an example where I let it return a std::vector<std::uint8_t> instead:
#include <openssl/evp.h>
#include <openssl/sha.h>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>
// An EVP_MD_CTX helper class
class EvpMdCtx {
public:
explicit EvpMdCtx(const EVP_MD* type, ENGINE* impl = nullptr) : EvpMdCtx() {
if(init(type, impl) == 0)
throw std::runtime_error("EVP_DigestInit_ex failed");
}
EvpMdCtx() : ctx(EVP_MD_CTX_new()) {
if(ctx == nullptr) throw std::runtime_error("EVP_MD_CTX_new failed");
}
EvpMdCtx(const EvpMdCtx&) = delete;
EvpMdCtx& operator=(const EvpMdCtx&) = delete;
~EvpMdCtx() { EVP_MD_CTX_free(ctx); }
int init(const EVP_MD* type, ENGINE* impl = nullptr) {
return EVP_DigestInit_ex(ctx, type, impl);
}
int update(const void* d, size_t cnt) { return EVP_DigestUpdate(ctx, d, cnt); }
auto finalize() {
std::vector<std::uint8_t> md_value(EVP_MAX_MD_SIZE);
unsigned md_len;
if(EVP_DigestFinal_ex(ctx, md_value.data(), &md_len) == 0)
md_value.clear();
else
md_value.resize(md_len);
return md_value;
}
private:
EVP_MD_CTX* ctx;
};
std::vector<std::uint8_t> calculateSHA256(const std::string& filename) {
std::ifstream is(filename, std::ios::binary); // binary mode: hash the bytes exactly as stored
if(not is) return {};
EvpMdCtx ctx(EVP_sha256());
char buf[BUFSIZ]; // a buffer to fill
while(true) {
is.read(buf, std::size(buf));
auto len = is.gcount();
if(len > 0) {
if(ctx.update(buf, static_cast<size_t>(len)) == 0) return {};
} else {
break;
}
}
// finalize
return ctx.finalize();
}
int main() {
const std::string file = "C:/Users/user/Desktop/datasets/Cheque 08365";
for(int i = 4; i <= 122; ++i) {
std::string filename = file + std::to_string(i) + ".tif";
auto res = calculateSHA256(filename);
if(res.empty()) {
std::cout << "Failed: " << filename << '\n';
} else {
std::cout << std::hex << std::setfill('0');
for(auto v : res) {
std::cout << std::setw(2) << static_cast<int>(v);
}
std::cout << ' ' << filename << '\n';
}
}
}

parsing text and output "time of appearance" of some lines [closed]

Closed 3 years ago.
I have a file that looks like this:
$GPGGA,124613.90,5543.3221231,N,03739.1368442,E,1,15,0.69,147.0851,M,14.4298,M,,*54
$GPGSV,3,1,10,27,12,078,41,05,31,308,49,16,25,043,44,02,11,268,44*7E
$GPGSV,3,2,10,26,03,031,39,07,74,216,52,09,58,121,52,30,39,234,48*71
$GPGSV,3,3,10,23,30,116,46,04,37,114,47*79
$GLGSV,2,1,07,84,17,338,43,78,15,212,48,85,12,032,46,67,84,223,53*67
$GLGSV,2,2,07,77,67,195,47,76,50,047,54,66,32,144,52*5C
$GPGGA,124614.00,5543.3221239,N,03739.1368445,E,1,15,0.69,147.0864,M,14.4298,M,,*53
$GPGSV,3,1,10,27,12,078,41,05,31,308,49,16,25,043,43,02,11,268,44*79
$GPGSV,3,2,10,26,03,031,39,07,74,216,52,09,58,121,52,30,39,234,48*71
$GPGSV,3,3,10,23,30,116,46,04,37,114,47*79
$GLGSV,2,1,07,84,17,338,43,78,15,212,48,85,12,032,46,67,84,223,53*67
$GLGSV,2,2,07,77,67,195,47,76,50,047,54,66,32,144,52*5C
My code verifies the checksum of each line and outputs some values from the lines.
In the $GPGGA line, "124614.00" is the time: 12 hours, 46 minutes, 14.00 seconds. I need to output the time of "appearance" of the $GPGSV lines. I tried subtracting the first value from the following ones through pointers, but I must have messed up somewhere.
#include <iostream>
#include <fstream>
#include <string>
#include <stdlib.h>
#include <numeric>
#include <cstdlib>
#include <cstring>
#include <stdio.h>
int checksum(const char* s) {
int c = 0;
while (*s)
c ^= *s++;
return c;
}
int main() {
char linec_h[200];
int k, key;
int* hour = NULL;
int* minute = NULL;
float* sec = NULL;
std::string line, key_s;
std::ifstream logs_("C:/Users/Olya/Desktop/broken.txt");
std::ofstream pout("C:/Users/Olya/Desktop/outLOG.txt");
if (logs_.is_open()) {
while (getline(logs_, line)) {
key_s = line.substr(line.length() - 2, 2);
key = strtol(key_s.c_str(), NULL, 16);
line = line.substr(1, line.length() - 4);
strcpy_s(linec_h, line.c_str());
if (key != checksum(linec_h))
pout << "Line is corrupted!" << std::endl;
else {
k = 0;
if (line.substr(0, 5) == "GPGGA") {
if (hour, minute, sec) {
*hour = stoi(line.substr(5, 2)) - *hour;
*minute = stoi(line.substr(7, 2)) - *minute;
*sec = stof(line.substr(9, 4)) - *sec;
}
else {
hour = new int;
minute = new int;
sec = new float;
*hour = stoi(line.substr(5, 2));
*minute = stoi(line.substr(7, 2));
*sec = stof(line.substr(9, 4));
}
} else if (line.substr(0, 5) == "GPGSV") {
for (size_t i = 0, SNR = 7, N = 4; i < line.size(); i++) {
if (line[i] == ',')
k++;
if (k == N) {
pout << "Satellite number -- " << line.substr(i + 1, 2) << " ";
if ((N += 4) > 16)
;
} else if (k == SNR) {
pout << "SNR -- " << line.substr(i + 1, 2) << " time -- " << hour
<< "." << minute << "." << sec << std::endl;
if ((SNR += 4) > 19)
break;
}
}
}
}
delete hour;
delete minute;
delete sec;
}
logs_.close();
std::cout << "Success" << std::endl;
} else
std::cout << "File is not open" << '\n';
pout.close();
return 0;
}
Just for the fun of it, I created a complete solution that parses your GPS NMEA format and puts all results in structs, so you can get all of the satellite data.
However, I show only the values that you used in your example.
I adapted my coding style to yours. In C++ I would do things completely differently.
Here is a complete example:
#include <string>
#include <ctime>
#include <cstring>
#include <iostream>
#include <fstream>
#include <iomanip>
constexpr size_t NumberOfFixQualityStrings = 9;
constexpr size_t NumberOfSatellitesPerGSVSentencePart = 4;
constexpr size_t MaxNumberOfPartsInSentence = 10;
constexpr size_t MaxTokensInSentence = 64;
constexpr size_t NumberOfFieldsInGGA = 12;
std::string fixQualityString[NumberOfFixQualityStrings]{
"invalid", "GPS fix (SPS)", "DGPS fix", "PPS fix", "Real Time Kinematic", "Float RTK",
"estimated (dead reckoning)", "Manual input mode", "Simulation mode" };
// essential fix data which provide 3D location and accuracy data
struct GGA {
// Time of last satellite fix
unsigned int fixTimeInUtcHours{};
unsigned int fixTimeInUtcMinutes{};
unsigned int fixTimeInUtcSeconds{};
unsigned int fixTimeInUtcMilliSeconds{};
// Position: Lattitude
unsigned int lattitudeInDegree{};
double lattitudeInMinutes{};
std::string lattitideDirection{};
// Position: Longitude
unsigned int longitudeInDegree{};
double longitudeInMinutes{};
std::string longitudeDirection{};
// FixQuality // see dteails as string above
unsigned int fixQuality{};
std::string fixQualityString{};
// Number of satellites being tracked (can be more than shown in GSV, not all are beeing used for calculation)
unsigned int numberOfTrackedSatellites{};
// Horizontal dilution of position
double horizontalDilution{};
// Altitude, Meters, above mean sea level
double altitude{};
std::string altitudeDimension{};
// Height of geoid (mean sea level) above WGS84 ellipsoid
double goidHight{};
std::string goidHightDimension{};
};
// Detail information for satellites in satellit view (GSV)
struct SatelliteData {
std::string satellitePRNnumber{};
double elevationInDegress{};
double azimuthInDegrees{};
double snr{}; // signal noise ratio
};
// Part of a GSV sentence
struct GSVSentencePart {
size_t numberOfSentencesForFullData{};
size_t sentencePartNumber{};
size_t numberOfSatellitesInView{};
size_t numberOfSatellitesInThisPart{};
SatelliteData satelliteData[NumberOfSatellitesPerGSVSentencePart];
};
struct GSV
{
GSVSentencePart gsvSentencePart[MaxNumberOfPartsInSentence];
size_t numberOfParts{};
};
bool checksumTest(std::string& line) {
bool result{ false };
// Check, if there is a 2 digt checksum at the end and convert it to decimal
if (size_t pos{}, checkSumGiven{ std::stoul(line.substr(line.size() - 2), &pos, 16) }; pos == 2)
{
// Strip off checksum part
line = line.substr(1,line.size() - 4);
// Calculate checksum
unsigned char calculatedChecksum{ 0U }; for (const unsigned char c : line) calculatedChecksum ^= c;
// Get result
result = (calculatedChecksum == checkSumGiven);
}
return result;
}
// Split all strings into a tokens
size_t splitIntoTokens(std::string& s, std::string (&tokens)[MaxTokensInSentence]) {
// Number of converted tokens
size_t numberOfTokens{ 0 };
// First check checksum
if (checksumTest(s)) {
// Now split along each comma
for (size_t i{ 0U }, startpos{ 0U }; i < s.size() && numberOfTokens < MaxTokensInSentence; ++i) {
// A comma ends the current token
if (s[i] == ',') {
tokens[numberOfTokens++] = s.substr(startpos, i - startpos);
startpos = i + 1;
}
// The last character ends the final token and belongs to it
else if (i == (s.size() - 1)) {
tokens[numberOfTokens++] = s.substr(startpos);
}
}
}
return numberOfTokens;
}
GGA convertStringToGGA(std::string& s) {
GGA gga;
// Split string into tokens and check, if it worked
if (std::string tokens[MaxTokensInSentence]; splitIntoTokens(s, tokens) > NumberOfFieldsInGGA && tokens[0] == "GPGGA") {
gga.fixTimeInUtcHours = std::stoul(tokens[1].substr(0, 2));
gga.fixTimeInUtcMinutes = std::stoul(tokens[1].substr(2, 2));
gga.fixTimeInUtcSeconds = std::stoul(tokens[1].substr(4, 2));
gga.fixTimeInUtcMilliSeconds = std::stod(tokens[1].substr(6, 2))*1000.0;
gga.lattitudeInDegree = std::stoul(tokens[2].substr(0, 2));
gga.lattitudeInMinutes = std::stod(tokens[2].substr(2));
gga.lattitideDirection = tokens[3];
gga.longitudeInDegree = std::stoul(tokens[4].substr(0, 3)); // longitude is dddmm.mmmm
gga.longitudeInMinutes = std::stod(tokens[4].substr(3));
gga.longitudeDirection = tokens[5];
gga.fixQuality = std::stoul(tokens[6]);
gga.fixQualityString = (gga.fixQuality < NumberOfFixQualityStrings) ? fixQualityString[gga.fixQuality] : fixQualityString[0];
gga.numberOfTrackedSatellites = std::stoul(tokens[7]);
gga.horizontalDilution = std::stod(tokens[8]);
gga.altitude = std::stod(tokens[9]);
gga.altitudeDimension = tokens[10];
gga.goidHight = std::stod(tokens[11]);
gga.goidHightDimension = tokens[12];
}
return gga;
}
GSVSentencePart convertToGSVSentencePart(std::string& s) {
GSVSentencePart gsvsp;
// Split string into tokens and check, if it worked
std::string tokens[MaxTokensInSentence];
if (size_t numberOfCOnvertedTokens = splitIntoTokens(s, tokens); numberOfCOnvertedTokens > 0 && tokens[0] == "GPGSV") {
gsvsp.numberOfSentencesForFullData = std::stoul(tokens[1]);
gsvsp.sentencePartNumber = std::stoul(tokens[2]);
gsvsp.numberOfSatellitesInView = std::stoul(tokens[3]);
gsvsp.numberOfSatellitesInThisPart = 0;
for (size_t currentToken = 4; currentToken < numberOfCOnvertedTokens; currentToken += 4) {
gsvsp.satelliteData[gsvsp.numberOfSatellitesInThisPart].satellitePRNnumber = tokens[currentToken];
gsvsp.satelliteData[gsvsp.numberOfSatellitesInThisPart].elevationInDegress = stod(tokens[currentToken + 1]);
gsvsp.satelliteData[gsvsp.numberOfSatellitesInThisPart].azimuthInDegrees= stod(tokens[currentToken + 2]);
gsvsp.satelliteData[gsvsp.numberOfSatellitesInThisPart].snr = stod(tokens[currentToken + 3]);
++gsvsp.numberOfSatellitesInThisPart;
}
}
return gsvsp;
}
std::string calculateElapsedTime(const GGA& previousGGA, const GGA& nextGGA) {
std::tm tmPrevious{}, tmNext{};
tmPrevious.tm_year = 100; tmPrevious.tm_mon = 1; tmPrevious.tm_mday = 1;
tmNext.tm_year = 100; tmNext.tm_mon = 1; tmNext.tm_mday = 1;
tmPrevious.tm_hour = previousGGA.fixTimeInUtcHours;
tmPrevious.tm_min = previousGGA.fixTimeInUtcMinutes;
tmPrevious.tm_sec = previousGGA.fixTimeInUtcSeconds;
std::time_t previousTime = std::mktime(&tmPrevious);
tmNext.tm_hour = nextGGA.fixTimeInUtcHours;
tmNext.tm_min = nextGGA.fixTimeInUtcMinutes;
tmNext.tm_sec = nextGGA.fixTimeInUtcSeconds;
std::time_t nextTime = std::mktime(&tmNext);
double diff = std::difftime(nextTime, previousTime);
diff = diff + 1.0*nextGGA.fixTimeInUtcMilliSeconds/1000.0- 1.0*previousGGA.fixTimeInUtcMilliSeconds/1000.0;
return std::to_string(diff);
}
int main() {
// Open file and check, if it is open
if (std::ifstream nmeaFile("r:\\log.txt"); nmeaFile) {
GGA previousGGA;
GGA nextGGA;
GSV gsv;
size_t state{ 0 };
for (std::string line{}; std::getline(nmeaFile, line); ) {
switch ( state) {
case 0: // wait for first GGA data
if (line.substr(0, 6) == "$GPGGA") {
previousGGA = nextGGA;
nextGGA = convertStringToGGA(line);
state = 1;
gsv = {};
}
break;
case 1: // wait for GSV
if (line.substr(0, 6) == "$GPGSV") {
gsv.gsvSentencePart[gsv.numberOfParts] = convertToGSVSentencePart(line);
if (gsv.gsvSentencePart[gsv.numberOfParts].numberOfSentencesForFullData ==
gsv.gsvSentencePart[gsv.numberOfParts].sentencePartNumber) {
state = 0;
++gsv.numberOfParts;
// Now all data are available in a readable and structured format.
// You can do what you want with them.
// For example, we can print all satellite data:
size_t counter{ 0 };
for (size_t i = 0; i < gsv.numberOfParts; ++i) {
for (size_t j = 0; j < gsv.gsvSentencePart[i].numberOfSatellitesInThisPart; j++) {
std::cout << "Satellite: " << std::setw(2) << ++counter << " Satellite name: " <<
std::setw(3) << gsv.gsvSentencePart[i].satelliteData[j].satellitePRNnumber <<
" SNR: " << std::setw(8) << gsv.gsvSentencePart[i].satelliteData[j].snr <<
" Elapsed time: "<< calculateElapsedTime(previousGGA, nextGGA)<< " s\n";
}
}
--gsv.numberOfParts;
}
++gsv.numberOfParts;
}
break;
}
}
}
return 0;
}
I can see bugs like if (hour, minute, sec) { and a lot of C-style code operating on raw pointers. I do not want to debug your code.
As a small hint, I created a parser that reads all source lines, splits them into tokens, and validates the checksum.
Only a few lines of code will do the trick. From there you can develop it further.
#include <iostream>
#include <regex>
#include <vector>
#include <iterator>
#include <string>
#include <utility>
#include <algorithm>
#include <functional>
#include <numeric>
#include <fstream>
#include <tuple>
const std::regex re{ R"(\$(.*)\*[abcdefABCDEF\d]{2})" };
const std::regex delimiter{ "," };
using Tokens = std::vector<std::string>;
std::tuple<bool, Tokens> checkString(const std::string& str) {
// Return value of the function. Assume that string is not ok
std::tuple<bool, std::vector<std::string>> result(false, {});
// We want to find a string in the given format
std::smatch sm{};
if (std::regex_match(str, sm, re)) {
// OK, found. Validate checksum
if (std::string s = sm[1]; std::stoul(str.substr(str.size() - 2), nullptr, 16) == std::accumulate(s.begin(), s.end(), 0U, std::bit_xor<unsigned char>())) {
// Tokenize string
Tokens tokens(std::sregex_token_iterator(str.begin(), str.end(), delimiter, -1), {});
// Build return value
result = std::make_tuple(true, std::move(tokens));
}
}
return result;
}
int main() {
std::vector<Tokens> csvData{};
// Open file and check if it is open
if (std::ifstream logs("r:\\log.txt"); logs) {
// Read all lines of files
for (std::string line{}; std::getline(logs, line);) {
if (const auto& [ok, data] = checkString(line); ok) {
csvData.push_back(std::move(data));
}
else {
std::cerr << "**** Corrupted: " << line << "\n";
}
}
}
// Now all valid lines have been read and tokenized
// Show the time of each GPGGA line and the first satellite's PRN and SNR of each GPGSV line
for (const Tokens& t : csvData) {
if (t[0] == "$GPGGA") {
std::cout << "$GPGGA -->" << t[1] << "\n";
}
else if (t[0] == "$GPGSV") {
std::cout << "$GPGSV -->" << t[4] << " " << t[7] << "\n";
}
}
return 0;
}
Of course, there are many other possibilities.

Difference in file size on hard disk and RAM

I have a 36 MB data file on disk (each value in the file is a double). My question is: when I read this file in C++ and put its contents into a matrix (provided by the Boost library), will it occupy only 36 MB of RAM, or a different amount? Am I running out of memory?
The reason I ask is that I am on a 64-bit Ubuntu platform with 8 GB of RAM and I am getting a bad allocation error. The same file-reading program works fine for small data files.
Below is the snippet that loads the real-sim data set (https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html). x and y are a Boost matrix and vector respectively, declared as extern in a .h file.
void load_data(const char* filename)
{
ifstream in(filename);
string line;
int line_num = 0;
if (in.is_open()) {
while (getline(in, line)) {
if (line.empty()) continue;
int cat = 0;
if (!parse_line(line, cat, line_num)) {
cout << "parse line: " << line << ", failed.." << endl;
continue;
}
y(line_num) = cat;
line_num += 1;
}
in.close();
}
}
bool debug = false;
using namespace boost::numeric::ublas;
vector<double> y(no_records);
matrix<double> x(no_records,no_features);
using namespace std;
template < class T>
void convert_from_string(T& value, const string& s)
{
stringstream ss(s);
ss >> value;
}
int get_cat(const string& data) {
int c;
convert_from_string(c, data);
return c;
}
bool get_features(const string& data, int& index, double& value) {
int pos = data.find(":");
if (pos == -1) return false;
convert_from_string(index, data.substr(0, pos));
convert_from_string(value, data.substr(pos + 1));
return true;
}
bool parse_line(const string& line, int& cat, const int line_num) {
if (line.empty()) return false;
size_t start_pos = 0;
char space = ' ';
while (true) {
size_t pos = line.find(space, start_pos);
if ((int)pos != -1) {
string data = line.substr(start_pos, pos - start_pos);
if (!data.empty()) {
if (start_pos == 0) {
cat = get_cat(data);
}
else {
int index = -1;
double v = 0;
get_features(data, index, v);
if (debug)
cout << "index: " << index << "," << "value: " << v << endl;
if (index != -1) {
index -= 1; // index from 0
x(line_num, index) = v;
}
}
}
start_pos = pos + 1;
}
else {
string data = line.substr(start_pos, pos - start_pos);
if (!data.empty()) {
cout << "read data: " << data << endl;
int index = -1;
double v = 0;
get_features(data, index, v);
if (debug)
cout << "index: " << index << "," << "value: " << v << endl;
if (index != -1) {
index -= 1; // index from 0
x(line_num, index) = v;
}
}
break;
}
}
return true;
}
I found the culprit. The bad allocation error occurred because I was running out of memory. I was using a dense matrix representation (as provided by the Boost library). Storing a 20000x40000 matrix of doubles densely requires 20000 × 40000 × 8 bytes = 6.4 GB of RAM. If that much memory is not available, the allocation fails with a bad allocation error.
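The arithmetic can be checked with a few lines of code. The sketch below compares the dense footprint with a rough estimate for a compressed-row sparse layout; the sparse formula and its 64-bit index width are assumptions for illustration, not the exact layout of any Boost type. Boost.uBLAS does provide sparse matrix types such as compressed_matrix, which only store the non-zero entries.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes needed for a dense rows x cols matrix of double (element storage only).
std::uint64_t denseBytes(std::uint64_t rows, std::uint64_t cols) {
    return rows * cols * sizeof(double);
}

// Rough estimate for a compressed-row layout: one double and one column index
// per stored non-zero, plus one row-start offset per row (and one sentinel).
// The 64-bit index width is an assumption.
std::uint64_t sparseBytesEstimate(std::uint64_t rows, std::uint64_t nonZeros) {
    return nonZeros * (sizeof(double) + sizeof(std::uint64_t))
         + (rows + 1) * sizeof(std::uint64_t);
}
```

`denseBytes(20000, 40000)` evaluates to 6,400,000,000 bytes, matching the 6.4 GB figure. Because the input file lists only index:value pairs, its size on disk tracks the number of non-zeros rather than rows × columns.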

boost::fibonacci_heap copy constructor corrupts the source heap

I have a member function that prints a snapshot of a boost::fibonacci_heap
virtual void printSnapshot(std::ostream& ss) {
Heap heap(this->heap);
double prev_price = DBL_MAX;
while(heap.size() > 0) {
const Order& order = heap.top();
if(order.price != prev_price) {
if(prev_price != DBL_MAX) ss << std::endl;
ss << order.price << " | ";
}
ss << order.quantity << " ";
prev_price = order.price;
heap.pop();
}
ss << std::endl;
}
I call this member function in another member function, which does
while(std::getline(stream, line)) {
... // do something on this->heap.
this->printSnapshot(std::cout);
}
Since the heap is created through a copy constructor at the beginning of "printSnapshot", "printSnapshot" should not change this->heap. However, this program leads to a segmentation fault, while the following does not:
while(std::getline(stream, line)) {
... // do something on this->heap.
// this->printSnapshot(std::cout);
}
Now, if we add a const keyword to the definition of printSnapshot, i.e.
virtual void printSnapshot(std::ostream& ss) const {
Heap heap(this->heap);
double prev_price = DBL_MAX;
while(heap.size() > 0) {
const Order& order = heap.top();
if(order.price != prev_price) {
if(prev_price != DBL_MAX) ss << std::endl;
ss << order.price << " | ";
}
ss << order.quantity << " ";
prev_price = order.price;
heap.pop();
}
ss << std::endl;
}
The segmentation fault disappears. How can this be explained?
The constructor of fibonacci_heap that takes a (non-const) lvalue reference apparently doesn't do the right thing.
It's not documented what it should do: http://www.boost.org/doc/libs/1_55_0/doc/html/boost/heap/fibonacci_heap.html#idp21129704-bb
I assume this might be a reportable bug. I'll look into this a bit.
UPDATE Surprisingly the behaviour of this constructor is apparently equivalent to move-construction:
#ifndef BOOST_NO_CXX11_RVALUE_REFERENCES
/// \copydoc boost::heap::priority_queue::priority_queue(priority_queue &&)
fibonacci_heap(fibonacci_heap && rhs):
super_t(std::move(rhs)), top_element(rhs.top_element)
{
roots.splice(roots.begin(), rhs.roots);
rhs.top_element = NULL;
}
fibonacci_heap(fibonacci_heap & rhs):
super_t(rhs), top_element(rhs.top_element)
{
roots.splice(roots.begin(), rhs.roots);
rhs.top_element = NULL;
}
The latter has the weird side-effect of simply removing all roots from the original (intrusive) list. This looks like a clear-cut bug.
Simply removing this constructor makes the code work.
The essential workaround is to avoid the lvalue-ref constructor:
Heap cloned(static_cast<Heap const&>(this->heap));
Meanwhile here's a self-contained reproducer:
#include <boost/heap/fibonacci_heap.hpp>
#include <cstdlib>
#include <iostream>
#include <limits>
#include <random>
namespace {
#undef DBL_MAX
static double DBL_MAX = std::numeric_limits<double>::max();
std::mt19937 rng;
//std::uniform_real_distribution<double> dist(100, 4000);
std::discrete_distribution<int> dist({1,1,1,1,1,1});
static auto price_gen = [&] {
static double values[] = {52.40, 12.30, 87.10, 388., 0.10, 23.40};
return values[dist(rng)];
};
}
struct Order {
double price = price_gen();
unsigned quantity = rand() % 4 + 1;
double subtotal() const { return price * quantity; }
bool operator<(Order const& other) const { return subtotal() < other.subtotal(); }
};
using Heap = boost::heap::fibonacci_heap<Order>;
struct Y {
virtual void printSnapshot(std::ostream &ss) {
//Heap cloned(static_cast<Heap const&>(this->heap));
Heap cloned(this->heap);
double prev_price = DBL_MAX;
while (cloned.size() > 0) {
const Order &order = cloned.top();
if (order.price != prev_price) {
if (prev_price != DBL_MAX)
ss << std::endl;
ss << order.price << " | ";
}
ss << order.quantity << " ";
prev_price = order.price;
cloned.pop();
}
ss << std::endl;
}
void generateOrders() {
for (int i=0; i<3; ++i) {
heap.push({});
}
}
Heap heap;
};
int main() {
Y y;
for(int i=0; i<10; ++i) {
y.generateOrders();
y.printSnapshot(std::cout);
}
}
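To see why marking printSnapshot as const changes the outcome, it helps to look at overload resolution in isolation. The sketch below uses a hypothetical type `Probe` with both a const& and a non-const& constructor, mirroring the overload set quoted above; it is not Boost code. `std::as_const` requires C++17 and has the same effect as the static_cast workaround.

```cpp
#include <string>
#include <utility>

// Hypothetical type that records which "copy" constructor was selected.
struct Probe {
    std::string chosen = "default";
    Probe() = default;
    Probe(const Probe&) : chosen("const&") {}
    Probe(Probe&) : chosen("non-const&") {}
};

struct Holder {
    Probe member;
    // In a non-const member function `member` is a non-const lvalue,
    // so Probe(Probe&) is the better match.
    std::string copyInNonConst() { Probe p(member); return p.chosen; }
    // In a const member function `member` is const,
    // so only Probe(const Probe&) is viable.
    std::string copyInConst() const { Probe p(member); return p.chosen; }
    // Forcing the const overload from a non-const context.
    std::string copyForcedConst() { Probe p(std::as_const(member)); return p.chosen; }
};
```

The same rule applies to the heap: inside a const printSnapshot, this->heap is const, so the well-behaved const& copy constructor is selected instead of the splicing one.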