String manipulation , at a complete loss


I am trying to grab substrings out of a larger string. I have it working in a small test program, but when I move it into the real program it goes wrong. I am building on someone else's function and adapted it for my purpose, but cannot get it to work in the main program where I need it. I have cut the program down to where I think the error occurs.
Problem: I pass the same value into the function findStats(std::string sString) but get different results.
Case I:
stats = findStats("^9dff9d[Attribute 0% Active Defense 0]\r^f2f3f2Mana: 1411 ^00ff00(+1975)\r^f2f3f2^9dff9d[Attribute 0% Active Mana 0]\r^f2f3f2^ffc000Fortify Level: 12/12\r^f2f3f2^006effIdentified Attribute: + 6% Crit Damage\rIdentified Attribute: + 6 Accuracy\r^f2f3f2^006eff^O053Sacrifice Elapse(6/8)\r^00ff00 ^O041Desollar's Shadow\rÌÌÌÌÌÌÌÌL«");
The above case will output correctly and stores \r offsets correctly.
Case II:
stats = findStats((std::string)((char*)&buffer));
Case II is the case I need to work. It has the same value as Case I at the start of findStats, but the offsets for \r are not stored for whatever reason, even though sString holds the same value at the start of the function.
//Function that finds positioning of \r
void calc_z (std::string &s, std::vector<int> & z)
{
int len = s.size();
z.resize (len);
int l = 0, r = 0;
for (int i=1; i<len; ++i)
if (z[i-l]+i <= r)
z[i] = z[i-l];
else
{
l = i;
if (i > r) r = i;
for (z[i] = r-i; r<len; ++r, ++z[i])
if (s[r] != s[z[i]])
break;
--r;
}
}
std::vector<std::string> findStats(std::string sString){
//sString is exactly the same in value for both cases of stats at this point
int offSet = 0;
int sOffsets[100] = {};
std::vector<std::string> t1;
std::string main_string = sString;
std::string substring = "\r";
std::string working_string = substring + main_string;
std::vector<int> z;
calc_z(working_string, z);
for(int i = substring.size(); i < working_string.size(); ++i){
if(z[i] >=substring.size()){
sOffsets[offSet] = i;
offSet++;
}
}
.... code ....problem occurs right above offsets are not stored for \r
}
void main()
{
std::vector<std::string> stats;
std::string buffer[10];
...code...
...code to find string and store in buffer...
stats = findStats((std::string)((char*)&buffer));
//stats = findStats("^9dff9d[Attribute 0% Active Defense 0]\r^f2f3f2Mana: 1411 ^00ff00(+1975)\r^f2f3f2^9dff9d[Attribute 0% Active Mana 0]\r^f2f3f2^ffc000Fortify Level: 12/12\r^f2f3f2^006effIdentified Attribute: + 6% Crit Damage\rIdentified Attribute: + 6 Accuracy\r^f2f3f2^006eff^O053Sacrifice Elapse(6/8)\r^00ff00 ^O041Desollar's Shadow\rÌÌÌÌÌÌÌÌL«");
for( std::vector<std::string>::const_iterator i = stats.begin(); i != stats.end(); ++i)std::cout << *i << ' ' << std::endl;
std::cin.get();
}

This statement: (std::string)((char*)&buffer) does not do what you think it does.
buffer is an array of std::string objects, and a std::string (like a std::vector) is not a simple character array.
If you take the address of such an object, that won't be the address of the first character it stores.
You can't reinterpret an arbitrary object's address as char* and expect to get its text back. You can, however, construct a new std::string from a real const char* or char* C-style string: const char *str = "asdf"; std::string s = std::string(str);.
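As a minimal sketch of that last point, assuming the text really lives in a raw char buffer (the helper name makeString and its explicit length parameter are illustrative and not from the question), a std::string can be built from a pointer plus a length, which also copes with buffers that are not null-terminated:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: copy exactly `length` bytes from a raw buffer
// into a std::string. Unlike std::string(ptr), this does not rely on
// a terminating '\0' being present in the buffer.
std::string makeString(const char* data, std::size_t length)
{
    return std::string(data, length);
}
```

If the buffer is known to be null-terminated, plain std::string(data) is enough.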
So, to summarize:
If you want to pass several strings at once in std::vector, pass the buffer by const reference
typedef std::vector<std::string> StringVector;
void test(const StringVector& v){
for (StringVector::const_iterator i = v.begin(); i != v.end(); i++)
std::cout << *i << std::endl;
}
...
StringVector strings;
test(strings);
If you want to WRITE something into std::vector, pass it by non-const reference:
typedef std::vector<std::string> StringVector;
void test(StringVector& out){
out.push_back("test");
}
...
StringVector strings;
test(strings);
If you want to pass a single string from the vector, just pass the element itself (by reference, const reference, or by value, depending on what you want to do with it), without casts.
typedef std::vector<std::string> StringVector;
void test(const std::string& s){
std::cout << s << std::endl;
}
...
StringVector strings;
strings.push_back("test");
test(strings[0]);
--edit--
In addition to that:
std::vector<std::string> findStats(std::string sString){
//sString is exactly the same in value for both cases of stats at this point
int offSet = 0;
int sOffsets[100] = {};//<<here's a problem
Using a fixed-size array here is a bad idea. The array is small, and it WILL overflow as soon as the string contains more than 100 matches, corrupting memory or crashing your program. You can simply store the results in a std::vector<std::string>, make a vector of structs, or use a std::map, depending on your goals.
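As a rough sketch of the vector-based approach, here is one way to cut a string at every \r without any fixed-size offset array. Note that it swaps the Z-function search from the question for a plain std::string::find, and the function name splitOnCarriageReturn is made up for illustration:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical replacement for the offset bookkeeping: split `text` on '\r'.
// The vector grows as needed, so there is no fixed-size array to overflow.
std::vector<std::string> splitOnCarriageReturn(const std::string& text)
{
    std::vector<std::string> parts;
    std::size_t start = 0;
    for (;;) {
        const std::size_t pos = text.find('\r', start);
        if (pos == std::string::npos) {
            // No further '\r': keep whatever follows the last separator.
            parts.push_back(text.substr(start));
            break;
        }
        parts.push_back(text.substr(start, pos - start));
        start = pos + 1; // continue after the separator
    }
    return parts;
}
```

A string that ends in \r produces an empty last element; drop it afterwards if that is not wanted.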

Related

C++ Call string into function?

Not sure how to exactly explain this, sorry. I'm creating a function to find the first instance of a char in an array built from a given string. I have the function that creates an array from the string and the loop through the array, but I'm not sure how to get the array into the find function.
the tester is built like
stringName("Test test test");
stringName.find("e",0); //where 0 is the starting position, so it would return 1.
int SuperString::find(char c, int start) {
// put array grabber thing here
size = *(&data + 1) - data;
for(int i = start; i < size ; i++){
if(data[i] == c){
return i;
}
}
return -1;
}
This is what I have to make the string into an array.
SuperString::SuperString(std::string str) {
size = str.size();
data = new char[size];
for (int i = 0; i < size; i++) {
data[i] = str.at(i);
}
}
This is probably something easy I'm missing, but any help is appreciated.

You are passing a string literal, specifically a const char[2], where a single char is expected. Use 'e' instead of "e":
stringName.find('e', 0);
More importantly, size = *(&data + 1) - data; will only work when data is a (reference to a) fixed array (see How does *(&arr + 1) - arr give the length in elements of array arr?). It will not work when data is a pointer to an array, as it is in your case since you are allocating the array with new char[]. You will have to keep track of the array's size separately, which you appear to be doing, except that you are not actually using the size you obtained in the SuperString constructor. Just get rid of the line in find() that is trying to re-calculate size, use the value you already have:
int SuperString::find(char c, int start) {
// size = *(&data + 1) - data; // <-- GET RID OF THIS
for(int i = start; i < size; ++i){
if (data[i] == c){
return i;
}
}
return -1;
}
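To illustrate why the length can only be recovered from a real array and not from a pointer, here is a small sketch. The helper name arrayLength is an assumption for this example; it uses template argument deduction rather than the *(&arr + 1) - arr trick, but it has the same limitation:

```cpp
#include <cstddef>

// Hypothetical helper: N can only be deduced when an actual array is
// passed by reference, because only then is the size part of the type.
template <typename T, std::size_t N>
constexpr std::size_t arrayLength(const T (&)[N])
{
    return N;
}

// Usage:
//   char fixed[16];
//   arrayLength(fixed);          // 16
//   char* heap = new char[16];
//   arrayLength(heap);           // does not compile: heap is a pointer
```

With new char[], the size is not part of the pointer's type, which is why the class has to store it in a member.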
That being said, your SuperString class can be greatly simplified if you just make its data member a std::string instead of a char*, e.g.:
#include <string>
class SuperString {
private:
std::string data;
...
public:
SuperString(const std::string &str);
int find(char c, int start = 0);
...
};
SuperString::SuperString(const std::string &str) : data(str) {
}
int SuperString::find(char c, int start) {
return (int) data.find(c, start);
}

Given a list of strings and word S. Check if S exists in the list or not

/*
What is the error in this code ? I always get false(0) even if the
string is included in the list. Is the logic served correct for the above question ?
*/
#include <iostream>
using namespace std;
bool ispresent(char (*stringlist)[100] , char *arr){
for (int i = 0 ; i < 7 ; i++){
if (stringlist[i] == arr){
return true;
}
}
return false;
}
int main(){
//given a list of strings
char stringlist[7][100] ={
"He",
"is",
"very",
"bad",
"instead",
"do",
"yourself"
};
//input word to check
char arr[50];
cin.getline(arr , 50 , '\n');
//check if word is present or not
bool found = ispresent(stringlist , arr) ;
cout << found;
return 0;
}

You should use the string comparison functions instead of ==, which does not compare the contents of C-style strings. strcmp returns 0 when the two strings are equal, so the test becomes:
strcmp(stringlist[i], arr) == 0
And include the header <cstring> (string.h in C).
Applied to pointers (or to arrays, which decay to pointers), the == operator only compares the addresses, not what they point to. To compare the data behind the pointers, implement your own functions or use the ones provided by libraries.
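A minimal sketch of the question's function rewritten around strcmp might look like this. The extra count parameter is an addition for this example (the question hard-codes 7):

```cpp
#include <cstring>

// Sketch: compare contents with strcmp instead of comparing addresses.
// `count` is an added parameter; the original loop was fixed at 7 entries.
bool ispresent(const char (*stringlist)[100], int count, const char* word)
{
    for (int i = 0; i < count; ++i) {
        // strcmp returns 0 when both strings hold the same characters.
        if (std::strcmp(stringlist[i], word) == 0) {
            return true;
        }
    }
    return false;
}
```

Called as ispresent(stringlist, 7, arr), this returns true only for an exact, case-sensitive match.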

if (stringlist[i] == arr)
The reason you always get false is that the == operator here compares the addresses of the two character arrays, not their contents, and two different arrays never share an address. string::find() can do the job, with the caveat that it also matches substrings; use == on two std::string objects if you need an exact match.
You should use std::string where possible so you don't have to allocate/deallocate memory. std::string has the str.find(str1) function, which returns the first index at which str1 was found in str, or std::string::npos if it was not found. You can use it as shown further down.
Information about string::npos:
From cplusplus.com:
static const size_t npos = -1;
Maximum value for size_t
This value, when used as the value for a len (or sublen) parameter in
string's member functions, means "until the end of the string".
As a return value, it is usually used to indicate no matches.
This constant is defined with a value of -1, which, because size_t is an unsigned integral type, is the largest possible representable value for this type.
This should work:
#include <iostream>
#include <string>
// str is the string array
// str_size is the size of the array passed to the funcion
// str 1 is the string you are looking for.
bool ispresent(std::string str[], int str_size, std::string str1);
int main()
{
const int SIZE = 4;
std::string str0[SIZE];
std::cout << "Enter four strings:\n";
for (int i = 0; i < SIZE; i++)
std::cin >> (str0)[i];
std::string search_term;
std::cout << "Enter a search term:";
std::cin >> search_term;
bool result = ispresent(str0, SIZE, search_term);
// If output is 1 then it was found
std::cout << result;
return 0;
}
bool ispresent(std::string str[], int str_size, std::string str1)
{
for (int i = 0; i < str_size; i++)
{
// Use the find function in string on each element of the array.
if (str[i].find(str1) != std::string::npos)
return true; // Return true if found
}
// String not found
return false;
}
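Because find() also reports a hit when the search term is only part of a list entry (searching for "ver" matches "very"), a whole-word check may be closer to what the question asks. The following is a sketch of that variant; the name containsWord and the use of std::vector are choices made for this example:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Sketch: exact, whole-string match. std::find compares elements with
// operator==, which for std::string compares the contents.
bool containsWord(const std::vector<std::string>& words, const std::string& target)
{
    return std::find(words.begin(), words.end(), target) != words.end();
}
```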

Remove extra white spaces in C++

I tried to write a script that removes extra white spaces but I didn't manage to finish it.
Basically I want to collapse runs of spaces, for example transform "abc  sssd g g sdg    gg  gf" into "abc sssd g g sdg gg gf".
In languages like PHP or C#, it would be very easy, but not in C++, I see. This is my code:
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <cstring>
#include <unistd.h>
#include <string.h>
char* trim3(char* s) {
int l = strlen(s);
while(isspace(s[l - 1])) --l;
while(* s && isspace(* s)) ++s, --l;
return strndup(s, l);
}
char *str_replace(char * t1, char * t2, char * t6)
{
char*t4;
char*t5=(char *)malloc(10);
memset(t5, 0, 10);
while(strstr(t6,t1))
{
t4=strstr(t6,t1);
strncpy(t5+strlen(t5),t6,t4-t6);
strcat(t5,t2);
t4+=strlen(t1);
t6=t4;
}
return strcat(t5,t4);
}
void remove_extra_whitespaces(char* input,char* output)
{
char* inputPtr = input; // init inputPtr always at the last moment.
int spacecount = 0;
while(*inputPtr != '\0')
{
char* substr;
strncpy(substr, inputPtr+0, 1);
if(substr == " ")
{
spacecount++;
}
else
{
spacecount = 0;
}
printf("[%p] -> %d\n",*substr,spacecount);
// Assume the string last with \0
// some code
inputPtr++; // After "some code" (instead of what you wrote).
}
}
int main(int argc, char **argv)
{
printf("testing 2 ..\n");
char input[0x255] = "asfa sas f f dgdgd dg ggg";
char output[0x255] = "NO_OUTPUT_YET";
remove_extra_whitespaces(input,output);
return 1;
}
It doesn't work. I tried several methods. What I am trying to do is to iterate the string letter by letter and dump it in another string as long as there is only one space in a row; if there are two spaces, don't write the second character to the new string.
How can I solve this?

There are already plenty of nice solutions. I propose an alternative based on a dedicated <algorithm> function meant to avoid consecutive duplicates: unique_copy():
#include <algorithm>
#include <cctype>
#include <iostream>
#include <iterator>
#include <string>
using namespace std;
void remove_extra_whitespaces(const string &input, string &output)
{
output.clear(); // unless you want to append to the end of an existing string...
// unsigned char parameters keep isspace() well-defined for non-ASCII bytes
unique_copy (input.begin(), input.end(), back_insert_iterator<string>(output),
[](unsigned char a, unsigned char b){ return isspace(a) && isspace(b);});
cout << output<<endl;
}
Here is a live demo. Note that I changed from c style strings to the safer and more powerful C++ strings.
Edit: if keeping c-style strings is required in your code, you could use almost the same code but with pointers instead of iterators. That's the magic of C++. Here is another live demo.

Here's a simple, non-C++11 solution, using the same remove_extra_whitespace() signature as in the question:
#include <cstdio>
void remove_extra_whitespaces(char* input, char* output)
{
int inputIndex = 0;
int outputIndex = 0;
while(input[inputIndex] != '\0')
{
output[outputIndex] = input[inputIndex];
if(input[inputIndex] == ' ')
{
while(input[inputIndex + 1] == ' ')
{
// skip over any extra spaces
inputIndex++;
}
}
outputIndex++;
inputIndex++;
}
// null-terminate output
output[outputIndex] = '\0';
}
int main(int argc, char **argv)
{
char input[0x255] = "asfa sas f f dgdgd dg ggg";
char output[0x255] = "NO_OUTPUT_YET";
remove_extra_whitespaces(input,output);
printf("input: %s\noutput: %s\n", input, output);
return 1;
}
Output:
input: asfa sas f f dgdgd dg ggg
output: asfa sas f f dgdgd dg ggg

Since you use C++, you can take advantage of standard-library features designed for that sort of work. You could use std::string (instead of char[0x255]) and std::istringstream, which will replace most of the pointer arithmetic.
First, make a string stream:
std::istringstream stream(input);
Then, read strings from it. It will remove the whitespace delimiters automatically:
std::string word;
while (stream >> word)
{
...
}
Inside the loop, build your output string:
if (!output.empty()) // special case: no space before first word
output += ' ';
output += word;
A disadvantage of this method is that it allocates memory dynamically (including several reallocations, performed when the output string grows).
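Putting the three fragments above together, a complete function could look like the sketch below. Returning the result by value and taking the input as a std::string are choices made for this example, not part of the question's signature:

```cpp
#include <sstream>
#include <string>

// Sketch: rebuild the string word by word. operator>> skips any run of
// whitespace, so leading, trailing and repeated spaces all disappear.
std::string remove_extra_whitespaces(const std::string& input)
{
    std::istringstream stream(input);
    std::string output;
    std::string word;
    while (stream >> word) {
        if (!output.empty()) { // no separator before the first word
            output += ' ';
        }
        output += word;
    }
    return output;
}
```

Note that this also turns tabs and newlines into single spaces, since the stream treats them as whitespace too.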

There are plenty of ways of doing this (e.g., using regular expressions), but one way you could do this is using std::copy_if with a stateful functor remembering whether the last character was a space:
#include <algorithm>
#include <string>
#include <iostream>
struct if_not_prev_space
{
// Is last encountered character space.
bool m_is = false;
bool operator()(const char c)
{
// Copy if last was not space, or current is not space.
const bool ret = !m_is || c != ' ';
m_is = c == ' ';
return ret;
}
};
int main()
{
const std::string s("abc sssd g g sdg gg gf into abc sssd g g sdg gg gf");
std::string o;
std::copy_if(std::begin(s), std::end(s), std::back_inserter(o), if_not_prev_space());
std::cout << o << std::endl;
}

You can use std::unique, which reduces adjacent duplicates to a single instance according to how you define what makes two elements equal.
Here I have defined elements as equal if they are both whitespace characters:
inline std::string& remove_extra_ws_mute(std::string& s)
{
s.erase(std::unique(std::begin(s), std::end(s), [](unsigned char a, unsigned char b){
return std::isspace(a) && std::isspace(b);
}), std::end(s));
return s;
}
inline std::string remove_extra_ws_copy(std::string s)
{
return remove_extra_ws_mute(s);
}
std::unique shifts the elements to keep towards the front and returns an iterator to the new logical end; everything from there to the physical end is left in an unspecified state and can be erased.
Additionally, if you must work with low level strings then you can still use std::unique on the pointers:
char* remove_extra_ws(char const* s)
{
std::size_t len = std::strlen(s);
char* buf = new char[len + 1];
std::strcpy(buf, s);
// Note that std::unique will also retain the null terminator
// in its correct position at the end of the valid portion
// of the string
std::unique(buf, buf + len + 1, [](unsigned char a, unsigned char b){
return (a && std::isspace(a)) && (b && std::isspace(b));
});
return buf;
}

For in-place modification you can apply the erase-remove idiom:
#include <string>
#include <iostream>
#include <algorithm>
#include <cctype>
int main()
{
std::string input {"asfa sas f f dgdgd dg ggg"};
bool prev_is_space = true;
input.erase(std::remove_if(input.begin(), input.end(), [&prev_is_space](unsigned char curr) {
bool r = std::isspace(curr) && prev_is_space;
prev_is_space = std::isspace(curr);
return r;
}), input.end());
std::cout << input << "\n";
}
So you first shift the characters you want to keep to the front of the string and then truncate the leftover tail.
The great advantage of C++ is that it is universal enough to port this code to plain C static strings with only a few modifications:
void erase(char * p) {
// note that this only works well when the string is stored in a fixed-size array
// so we do not need to rearrange memory
*p = 0;
}
int main()
{
char input [] {"asfa sas f f dgdgd dg ggg"};
bool prev_is_space = true;
erase(std::remove_if(std::begin(input), std::end(input), [&prev_is_space](unsigned char curr) {
bool r = std::isspace(curr) && prev_is_space;
prev_is_space = std::isspace(curr);
return r;
}));
std::cout << input << "\n";
}
Interestingly enough, the remove step here is independent of the string representation. It will work with std::string without any modification.

I have the sinking feeling that good ol' scanf will do (in fact, this is the C school equivalent to Anatoly's C++ solution):
void remove_extra_whitespaces(char* input, char* output)
{
int srcOffs = 0, destOffs = 0, numRead = 0;
while(sscanf(input + srcOffs, "%s%n", output + destOffs, &numRead) > 0)
{
srcOffs += numRead;
destOffs += strlen(output + destOffs);
output[destOffs++] = ' '; // overwrite 0, advance past that
}
output[destOffs > 0 ? destOffs-1 : 0] = '\0';
}
We exploit the fact that scanf has magical built-in space skipping capabilities. We then use the perhaps less known %n "conversion" specification which gives us the amount of chars consumed by scanf. This feature frequently comes in handy when reading from strings, like here. The bitter drop which makes this solution less-than-perfect is the strlen call on the output (there is no "how many bytes have I actually just written" conversion specifier, unfortunately).
Last but not least, the use of scanf is easy here because sufficient memory is guaranteed to exist at output; if that were not the case, the code would become more complex due to buffering and overflow handling.

Since you are writing c-style, here's a way to do what you want.
Note that you can remove '\r' and '\n' which are line breaks (but of course that's up to you if you consider those whitespaces or not).
This function should be as fast or faster than any other alternative and no memory allocation takes place even when it's called with std::strings (I've overloaded it).
char temp[] = " alsdasdl gasdasd ee";
remove_whitesaces(temp);
printf("%s\n", temp);
int remove_whitesaces(char *p)
{
int len = strlen(p);
int new_len = 0;
bool space = false;
for (int i = 0; i < len; i++)
{
switch (p[i])
{
case ' ': space = true; break;
case '\t': space = true; break;
case '\n': break; // you could set space true for \r and \n
case '\r': break; // if you consider them spaces, I just ignore them.
default:
if (space && new_len > 0)
p[new_len++] = ' ';
p[new_len++] = p[i];
space = false;
}
}
p[new_len] = '\0';
return new_len;
}
// and you can use it with strings too,
inline int remove_whitesaces(std::string &str)
{
int len = remove_whitesaces(&str[0]);
str.resize(len);
return len; // returning len for consistency with the primary function
// but you can return std::string instead.
}
// again no memory allocation is going to take place,
// since resize does not free memory when the new length is equal or lower
If you take a brief look at the C++ Standard library, you will notice that a lot of C++ functions that return std::string, or other std:: objects, are basically wrappers around a well-written extern "C" function. So don't be afraid to use C functions in C++ applications, if they are well written and you can overload them to support std::strings and such.
For example, in Visual Studio 2015, std::to_string is written exactly like this:
inline string to_string(int _Val)
{ // convert int to string
return (_Integral_to_string("%d", _Val));
}
inline string to_string(unsigned int _Val)
{ // convert unsigned int to string
return (_Integral_to_string("%u", _Val));
}
and _Integral_to_string is a wrapper to a C function sprintf_s
template<class _Ty> inline
string _Integral_to_string(const char *_Fmt, _Ty _Val)
{ // convert _Ty to string
static_assert(is_integral<_Ty>::value,
"_Ty must be integral");
char _Buf[_TO_STRING_BUF_SIZE];
int _Len = _CSTD sprintf_s(_Buf, _TO_STRING_BUF_SIZE, _Fmt, _Val);
return (string(_Buf, _Len));
}

Well here is a longish(but easy) solution that does not use pointers.
It can be optimized further but hey it works.
#include <iostream>
#include <string>
using namespace std;
void removeExtraSpace(string str);
int main(){
string s;
cout << "Enter a string with extra spaces: ";
getline(cin, s);
removeExtraSpace(s);
return 0;
}
void removeExtraSpace(string str){
int len = str.size();
if(len==0){
cout << "Simplified String: " << endl;
cout << "I would appreciate it if you could enter more than 0 characters. " << endl;
return;
}
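// Variable-length arrays (char ch1[len]) are not standard C++; use strings as buffers.
string ch1(len, '\0');
string ch2(len, '\0');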
//Placing characters of str in ch1[]
for(int i=0; i<len; i++){
ch1[i]=str[i];
}
//Computing index of 1st non-space character
int pos=0;
for(int i=0; i<len; i++){
if(ch1[i] != ' '){
pos = i;
break;
}
}
int cons_arr = 1;
ch2[0] = ch1[pos];
for(int i=(pos+1); i<len; i++){
char x = ch1[i];
if(x==char(32)){
//Checking whether character at ch2[i]==' '
if(ch2[cons_arr-1] == ' '){
continue;
}
else{
ch2[cons_arr] = ' ';
cons_arr++;
continue;
}
}
ch2[cons_arr] = x;
cons_arr++;
}
//Printing the char array
cout << "Simplified string: " << endl;
for(int i=0; i<cons_arr; i++){
cout << ch2[i];
}
cout << endl;
}

I don't know if this helps, but this is how I did it for my homework. The only case where it might break a bit is when there are spaces at the beginning of the string, e.g. " wor ds ". In that case, it will change it to " wor ds", so a leading space remains.
void ShortenSpace(string &usrStr){
char cha1;
char cha2;
for (int i = 0; i + 1 < static_cast<int>(usrStr.size()); ++i) { // also safe for an empty string
cha1 = usrStr.at(i);
cha2 = usrStr.at(i + 1);
if ((cha1 == ' ') && (cha2 == ' ')) {
usrStr.erase(usrStr.begin() + 1 + i);
--i;//edit: was ++i instead of --i, made code not work properly
}
}
}

I ended up here for a slightly different problem. Since I don't know where else to put it, and I found out what was wrong, I share it here. Don't be cross with me, please.
I had some strings that would print additional spaces at their ends, while showing up without spaces in debugging. The strings were formed in Windows calls like VerQueryValue(), which besides other stuff outputs a string length, e.g. iProductNameLen in the following line, which converts the result to a string named strProductName:
strProductName = string((LPCSTR)pvProductName, iProductNameLen)
This then produced a string with a \0 byte at the end, which did not show easily in the debugger, but printed on screen as a space. I'll leave the solution of this as an exercise, since it is not hard at all once you are aware of it.
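For readers who would rather skip the exercise, one possible fix is sketched below. It assumes the unwanted bytes are \0 characters at the very end of the string; the helper name stripTrailingNuls is made up for this example:

```cpp
#include <string>

// Sketch: drop any '\0' characters that ended up at the end of the string,
// for example because a reported length included the terminator.
void stripTrailingNuls(std::string& s)
{
    while (!s.empty() && s.back() == '\0') {
        s.pop_back();
    }
}
```

An alternative along the same lines is to construct the string with the reported length minus one, if the API is documented to count the terminator.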

Array element never successfully added? (C++)

I'm fairly new to C++. I tried implementing a really simple hash table, and then I wanted to see if my hashing algorithm put the element in the correct position. However, apparently the element wasn't even added to the array at all:
void add(string str, array<string, 2000> data) {
int i = makeHash(str) % data.size();
while (data[i++ % data.size()].compare("") != 0)
continue;
data[i % data.size()] = str;
cout << "Added!"; // successfully prints, meaning str was added to data
}
int main() {
array<string, 2000> data;
string str = "The quick brown fox something something";
add(str, data);
for (int i = 0; i < data.size(); i++)
if (data[i].compare(str) == 0)
cout << i; // never prints... so str was never added to data?
return 0;
}

You need to pass data variable as reference -
void add(string str, array<string, 2000> &data)
What you are doing here is passing by value, so add() modifies its own copy of data, and that copy is destroyed as soon as the function ends; the array in main is never touched.

Try passing data by reference, i.e.
void add ( string str, array<string, 2000>& data ){...}
Passing by value will mean a copy of data will be passed into the function.
Also, there's an off by one error. I'm sure you want to have:
data[(i-1) % data.size()] = str;
because i will still be incremented when you exit the while loop.
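Both points can be combined in a sketch like the one below. It is only an illustration: std::hash stands in for the question's makeHash (which is not shown), and returning the chosen slot is an addition that makes the result easy to check:

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <string>

// Sketch of linear probing with the table passed by reference.
// std::hash is a stand-in for the question's makeHash.
// Returns the slot that was used, or data.size() if the table is full.
std::size_t add(const std::string& str, std::array<std::string, 2000>& data)
{
    const std::size_t start = std::hash<std::string>()(str) % data.size();
    for (std::size_t probe = 0; probe < data.size(); ++probe) {
        const std::size_t slot = (start + probe) % data.size();
        if (data[slot].empty()) {
            data[slot] = str; // writes into the caller's array, not a copy
            return slot;
        }
    }
    return data.size(); // every slot is occupied
}
```

Computing the slot once per iteration, instead of incrementing inside the loop condition, avoids the off-by-one, and bounding the loop avoids spinning forever on a full table.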

Efficient parsing of mmap file

Following is the code for creating a memory map file using boost.
boost::iostreams::mapped_file_source file;
boost::iostreams::mapped_file_params param;
param.path = "\\..\\points.pts"; //! Filepath
file.open(param, fileSize);
if(file.is_open())
{
//! Access the buffer and populate the ren point buffer
const char* pData = file.data();
char* pData1 = const_cast<char*>(pData); //! this gives me all the data from Mmap file
std::vector<RenPoint> readPoints;
ParseData( pData1, readPoints);
}
The implementation of ParseData is as follows
void ParseData ( char* pbuffer , std::vector<RenPoint>& readPoints)
{
if(!pbuffer)
throw std::logic_error("no Data in memory mapped file");
stringstream strBuffer;
strBuffer << pbuffer;
//! Get the max number of points in the pts file
std::string strMaxPts;
std::getline(strBuffer,strMaxPts,'\n');
auto nSize = strMaxPts.size();
unsigned nMaxNumPts = GetValue<unsigned>(strMaxPts);
readPoints.clear();
//! Offset buffer
pbuffer += nSize;
strBuffer << pbuffer;
std::string cur_line;
while(std::getline(strBuffer, cur_line,'\n'))
{
//! How do I read the data from mmap file directly and populate my renpoint structure
int yy = 0;
}
//! Working but very slow
/*while (std::getline(strBuffer,strMaxPts,'\n'))
{
std::vector<string> fragments;
istringstream iss(strMaxPts);
copy(istream_iterator<string>(iss),
istream_iterator<string>(),
back_inserter<vector<string>>(fragments));
//! Logic to populate the structure after getting data back from fragments
readPoints.push_back(pt);
}*/
}
I have say a minimum of 1 million points in my data structure and I want to optimize my parsing. Any ideas ?

1) read in header information to get the number of points
2) reserve space in a std::vector for N*num_points (N=3 assuming only X,Y,Z, 6 with normals, 9 with normals and rgb)
3) load the remainder of the file into a string
4) boost::spirit::qi::phrase_parse into the vector.
//code here can parse a file with 40M points (> 1GB) in about 14s on my 2 year old macbook:
#include <boost/spirit/include/qi.hpp>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>
template <typename Iter>
bool parse_into_vec(Iter p_it, Iter p_end, std::vector<float>& vf) {
using boost::spirit::qi::phrase_parse;
using boost::spirit::qi::float_;
using boost::spirit::qi::ascii::space;
bool ret = phrase_parse(p_it, p_end, *float_, space, vf);
return p_it != p_end ? false : ret;
}
int main(int argc, char **args) {
if(argc < 2) {
std::cerr << "need a file" << std::endl;
return -1;
}
std::ifstream in(args[1]);
size_t numPoints;
in >> numPoints;
std::istreambuf_iterator<char> eos;
std::istreambuf_iterator<char> it(in);
std::string strver(it, eos);
std::vector<float> vf;
vf.reserve(3 * numPoints);
if(!parse_into_vec(strver.begin(), strver.end(), vf)) {
std::cerr << "failed during parsing" << std::endl;
return -1;
}
return 0;
}

AFAICT, you're currently copying the entire contents of the file into strBuffer.
What I think you want to do is use boost::iostreams::stream with your mapped_file_source instead.
Here's an untested example, based on the linked documentation:
// Create the stream
boost::iostreams::stream<boost::iostreams::mapped_file_source> str("some/path/file");
// Alternately, you can create the mapped_file_source separately and tell the stream to open it (using a copy of your mapped_file_source)
boost::iostreams::stream<boost::iostreams::mapped_file_source> str2;
str2.open(file);
// Now you can use std::getline as you normally would.
std::getline(str, strMaxPts);
As an aside, I'll note that by default mapped_file_source maps the entire file, so there's no need to pass the size explicitly.

You can go with something like this (just a fast concept, you'll need to add some additional error checking etc.):
#include <algorithm>
#include <cstring>
#include <string>
#include <vector>
#include "boost/iostreams/stream.hpp"
#include "boost/iostreams/device/mapped_file.hpp"
#include "boost/filesystem.hpp"
#include "boost/lexical_cast.hpp"
double parse_double(const std::string & str)
{
double value = 0;
bool decimal = false;
double divisor = 1.0;
for (std::string::const_iterator it = str.begin(); it != str.end(); ++it)
{
switch (*it)
{
case '.':
case ',':
decimal = true;
break;
default:
{
const int x = *it - '0';
value = value * 10 + x;
if (decimal)
divisor *= 10;
}
break;
}
}
return value / divisor;
}
void process_value(const bool initialized, const std::string & str, std::vector< double > & values)
{
if (!initialized)
{
// convert the value count and prepare the output vector
const size_t count = boost::lexical_cast< size_t >(str);
values.reserve(count);
}
else
{
// convert the value
//const double value = 0; // ~ 0:20 min
const double value = parse_double(str); // ~ 0:35 min
//const double value = atof(str.c_str()); // ~ 1:20 min
//const double value = boost::lexical_cast< double >(str); // ~ 8:00 min ?!?!?
values.push_back(value);
}
}
bool load_file(const std::string & name, std::vector< double > & values)
{
const int granularity = boost::iostreams::mapped_file_source::alignment();
const boost::uintmax_t chunk_size = ( (256 /* MB */ << 20 ) / granularity ) * granularity;
boost::iostreams::mapped_file_params in_params(name);
in_params.offset = 0;
boost::uintmax_t left = boost::filesystem::file_size(name);
std::string value;
bool whitespace = true;
bool initialized = false;
while (left > 0)
{
in_params.length = static_cast< size_t >(std::min(chunk_size, left));
boost::iostreams::mapped_file_source in(in_params);
if (!in.is_open())
return false;
const boost::iostreams::mapped_file_source::size_type size = in.size();
const char * data = in.data();
for (boost::iostreams::mapped_file_source::size_type i = 0; i < size; ++i, ++data)
{
const char c = *data;
if (strchr(" \t\n\r", c))
{
// c is whitespace
if (!whitespace)
{
whitespace = true;
// finished previous value
process_value(initialized, value, values);
initialized = true;
// start a new value
value.clear();
}
}
else
{
// c is not whitespace
whitespace = false;
// append the char to the value
value += c;
}
}
if (size < chunk_size)
break;
in_params.offset += chunk_size;
left -= chunk_size;
}
if (!whitespace)
{
// convert the last value
process_value(initialized, value, values);
}
return true;
}
Note that your main problem will be the conversion from string to float, which is very slow (insanely slow in the case of boost::lexical_cast). With my custom special parse_double func it is faster, however it only allows a special format (e.g. you'll need to add sign detection if negative values are allowed etc. - or you can just go with atof if all possible formats are needed).
If you want to parse the file faster, you'll probably need to go for multithreading: for example, one thread only parsing the string values and one or more other threads converting the loaded string values to floats. In that case you probably won't even need the memory mapped file, as a regular buffered file read might suffice (the file will be read only once anyway).
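As a sketch of the sign detection mentioned above, the custom parser could be extended roughly as follows. This variant reads straight from a character range instead of a std::string, which is a change made for this example; the name parse_signed_double is hypothetical, and like the original it performs no validation and does not handle exponents:

```cpp
// Sketch: parses an optionally signed decimal such as "-12.5" from the
// range [begin, end). No exponent support and no input validation.
double parse_signed_double(const char* begin, const char* end)
{
    bool negative = false;
    if (begin != end && (*begin == '-' || *begin == '+')) {
        negative = (*begin == '-');
        ++begin; // skip the sign character
    }
    double value = 0.0;
    double divisor = 1.0;
    bool decimal = false;
    for (; begin != end; ++begin) {
        if (*begin == '.') {
            decimal = true; // digits from here on are fractional
            continue;
        }
        value = value * 10.0 + (*begin - '0');
        if (decimal) {
            divisor *= 10.0;
        }
    }
    const double result = value / divisor;
    return negative ? -result : result;
}
```

Working on a pointer range means the value can be parsed directly out of the mapped memory without first copying it into a temporary string.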

A few quick comments on your code:
1) you're not reserving space for your vector so it's doing expansion every time you add a value. You have read the number of points from the file so call reserve(N) after the clear().
2) you're forcing a map of the entire file in one hit which will work on 64 bits but is probably slow AND is forcing another allocation of the same amount of memory with strBuffer << pbuffer;
http://www.boost.org/doc/libs/1_53_0/doc/html/interprocess/sharedmemorybetweenprocesses.html#interprocess.sharedmemorybetweenprocesses.mapped_file.mapped_file_mapping_regions shows how to getRegion
Use a loop through getRegion to load an estimated chunk of data containing many lines. You are going to have to handle partial buffers - each getRegion will likely end with part of a line you need to preserve and join to the next partial buffer starting the next region.
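The partial-buffer handling described above can be sketched independently of any mapping API. In the example below the caller is assumed to hand over one chunk at a time (however it was obtained); the function name consumeChunk and its parameters are illustrative only:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Sketch: split one chunk into complete lines. An unfinished line at the
// end of the chunk is kept in `carry` and completed by the next call.
void consumeChunk(const char* data, std::size_t size,
                  std::string& carry, std::vector<std::string>& lines)
{
    std::size_t lineStart = 0;
    for (std::size_t i = 0; i < size; ++i) {
        if (data[i] == '\n') {
            // Join the leftover from the previous chunk with this piece.
            carry.append(data + lineStart, i - lineStart);
            lines.push_back(carry);
            carry.clear();
            lineStart = i + 1;
        }
    }
    // Whatever follows the last newline is an incomplete line.
    carry.append(data + lineStart, size - lineStart);
}
```

After the last chunk, a non-empty carry holds a final line that had no trailing newline and still needs to be processed.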
