Proper implementation for string find() function? - c++

I am trying to push doubles to a stack given a string from stdin and until EOF. The string can be composed of doubles, ints, chars, and single spaces.
Currently, I'm utilizing the substring and find() function to account for the whitespace. It works most of the time, but for various input in which a single int is being read (exhibited below), the find() function appears to be clobbering any trailing char.
I've tried to use a variety of the different string functions to try and re-implement the way that I parse the input -- none of which has been successful.
while(std::getline(std::cin, string, '\n')){
for(unsigned int x = 0; x < string.size(); x++){
std::cout << "You read " << string[x] << std::endl;
if(isdigit(string[x])){
do{
// Get the number, stopping at the first instance of ws
std::string get_str = string.substr(x, string.find(' '));
std::cout << "You're converting " << get_str << std::endl;
// Convert it to a double
double number = std::stod(get_str);
std::cout << "You pushed " << number << std::endl;
// Push it to the stack
stack.push(number);
// Get the new increment
std::cout << "The size is " << get_str.size() << std::endl;
x+= get_str.size();
} while(string[x] >= '0' && string[x] <= '9');
}
/* else, do other things... */
Given an input of
100 200 + 2 /
The output is:
You read 1
You're converting 100
You pushed 100
The size is 3
You read 2
You're converting 200
You pushed 200
The size is 3
You read +
You read
You read 2
You're converting 2 /
You pushed 2
The size is 3
Specifically, I am wondering why the 3rd to last line 'You're converting 2 / ' includes the '/' when I had utilized string.find(' ') in my code as a delimiter. And given this issue, how would I be able to fix it so that only 2 is 'converted'?
Any assistance and feedback is appreciated!

The one-parameter find starts its search at the beginning of the string and returns the index of the first match, so string.find(' ') returns 3 for your input no matter what x is. The second parameter to substr is a count of characters, not an end index. Put those together with your input and string.substr(10, 3) gives you the three characters "2 /".
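One possible fix, as a minimal sketch: pass the current index as the second argument of find so the search starts at x, then turn the returned index into a length for substr. The helper name token_at is hypothetical and only used for illustration.

```cpp
#include <cstddef>
#include <string>

// Returns the space-delimited token that starts at index `start`.
// `token_at` is a hypothetical helper name used only for illustration.
std::string token_at(const std::string& line, std::size_t start)
{
    // Search for the next space beginning at `start`, not at index 0.
    std::size_t end = line.find(' ', start);
    if (end == std::string::npos) {
        end = line.size();          // no more spaces: token runs to the end
    }
    // substr takes (position, count), so convert the end index to a length.
    return line.substr(start, end - start);
}
```

With the input from the question, token_at(line, 10) yields "2" rather than "2 /".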

Related

A 'stack overflow' error returns upon any array size I enter above 36603. How can I make a string capable of capturing my entire .txt file?

I need to create a string capable of holding the entire book 'The Hunger Games' which comes out to around 100500 words. My code can capture samples of the txt, but anytime I exceed a string size of 36603(tested), I receive a 'stack overflow' error.
I can successfully capture anything below 36603 elements and can output them perfectly.
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main()
{
int i;
char set[100];
string fullFile[100000]; // this will not execute if set to over 36603
ifstream myfile("HungerGames.txt");
if (myfile.is_open())
{
// saves 'i limiter' words from the .txt to fullFile
for (i = 0; i < 100000; i++) {
//each word is saparated by a space
myfile.getline(set, 100, ' ');
fullFile[i] = set;
}
myfile.close();
}
else cout << "Unable to open file";
//prints 'i limiter' words to window
for (i = 0; i < 100000; ++i) {
cout << fullFile[i] << ' ';
}
What is causing the 'stack overflow' and how can I successfully capture the txt? I will later be doing a word counter and word frequency counter, so I need it in "word per element" form.
There's a limit on how much stack a function can use; use std::vector instead, which keeps its elements on the heap.
The default stack size in Visual Studio is 1 MB. You can change it with /F, but this is generally a bad idea.
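A minimal sketch of the std::vector suggestion, assuming the words are separated by whitespace. The function name read_words is hypothetical. Note that operator>> splits on any whitespace (spaces, tabs, newlines), which differs slightly from the original getline(set, 100, ' ') call that splits on spaces only.

```cpp
#include <istream>
#include <sstream>   // std::istringstream, handy for trying this without a file
#include <string>
#include <vector>

// Reads whitespace-separated words from any input stream into a vector.
// The vector's elements live on the heap, so the number of words is not
// limited by the size of the stack.
// `read_words` is a hypothetical name used only for illustration.
std::vector<std::string> read_words(std::istream& in)
{
    std::vector<std::string> words;
    std::string word;
    while (in >> word) {            // stops cleanly at end of input
        words.push_back(word);
    }
    return words;
}
```

It can be called with the already opened ifstream from the question, for example read_words(myfile), and the result keeps the "word per element" form.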
My system is Lubuntu 18.04, with g++ 7.3. The following snippet shows some "implementation details" of my system, and how to report them on yours. It would help you to understand what your system provides ...
void foo1()
{
int i; // Lubuntu
cout << "\n sizeof(i) " << sizeof(i) << endl; // 4 bytes
char c1[100];
cout << "\n sizeof(c1) " << sizeof(c1) << endl; // 100 bytes
string s1; // empty string
cout << "\n s1.size() " << s1.size() // 0 bytes
<< " sizeof(s1) " << sizeof(s1) << endl; // 32 bytes
s1 = "1234567890"; // now has 10 chars
cout << "\n s1.size() " << s1.size() // 10 bytes
<< " sizeof(s1) " << sizeof(s1) << endl; // 32 bytes
string fullFile[100000]; // this is an array of 100,000 strings
cout << "\n sizeof(fullFile) " // total is vvvvvvvvv
<< sops.digiComma(sizeof(fullFile)) << endl; // 3,200,000 bytes
uint64_t totalChars = 0;
for( auto ff : fullFile ) totalChars += ff.size();
cout << "\n total chars in all strings " << totalChars << endl;
}
What is causing the 'stack overflow' and how can I successfully
capture the txt?
The fullFile array is an unfortunate choice ... because each std::string, even when empty, consumes 32 bytes of automatic memory (~stack), for a total of 3,200,000 bytes, and this is with no data in the strings! This will stack overflow your system when the stack is smaller than the automatic var space.
On Lubuntu the default automatic-memory size (lately) is 10 M Bytes, so not a problem for me. But you will have to check on what your version of your target os defaults to. I think Windows defaults down near 1 M Byte. (Sorry, I don't know how to check Windows automatic-memory size.)
How can I make a string capable of capturing my entire .txt file.
The answer is -- you don't need to make your own. (unless you have some unstated requirement)
Also, you really should look at en.cppreference.com/w/cpp/string/basic_string/append.
In my 1st snippet above, you should take notice that the sizeof(string) reports 32 bytes, regardless of how many chars are in it.
Think on that a while ... if you put 1000 chars into a string, where do they go? The object stays at 32 bytes! You might guess or read that the string object handles memory management on your behalf, and puts the characters into dynamic memory (heap).
On my system, heap is about 4 G bytes. That's a lot more than stack.
In summary, every single std::string expands auto-magically, using heap, so if your text input will fit in heap, it will fit into '1 std::string'.
While browsing around in the cppreference, check out the 'string::reserve()' command.
Conclusion:
Any std::string you declare can auto-magically 'grow' to support your need, and will thus hold the entire text (if it will fit in memory).
Operationally, you simply get a line of text from the file, then append it to the single string, until the entire file is contained. You only need the one array, which std::string provides.
With this new idea ... I suggest you change fullFile from an array to a string.
string fullFile; // grows as needed to handle the appends,
// up to the limit of available heap.
string line; // holds one line of text at a time
// open file ... check status
while (getline(myfile, line)) { // fetch a line of text up through the line feed;
// the loop ends at eof or on a read error
// Note that getline does not put the \n into 'line'
// tbd - line += '\n';
// you may need the line feed in your fullFile string?
fullFile += line; // append the line
}
// ... other file cleanup.
foo1() output on Lubuntu 18.04, g++ v7.3
sizeof(i) 4
sizeof(c1) 100
s1.size() 0 sizeof(s1) 32
s1.size() 10 sizeof(s1) 32
sizeof(fullFile) 3,200,000
total chars in all strings 0
Example slurp() :
string slurp(ifstream& sIn)
{
stringstream ss;
ss << sIn.rdbuf();
dtbAssert(!sIn.bad());
if(sIn.bad())
throw "\n DTB::slurp(sIn) 'ss << sIn.rdbuf()' is bad";
ss.clear(); // clear flags
return ss.str();
}

Comparing Two C++ Vectors of different length when one holds lines and one holds letters, want to see what letters are in lines

Basically, I'm writing a program that is supposed to compare a char vector to a string vector built from the lines of a file; the char vector is supposed to be matched against the contents of the string vector.
When it finds a match, it should print the line number and the offset of the character and save them as variables.
I've started on this, but with my current approach, if the message to be encoded contains the same letter more than once, that letter is only printed once, and the characters are also printed out of order.
For example, if the message was AGAIN, A would only be printed once and the rest of the message would be scrambled. Here is what I have so far:
for (int i = 0; i < bookVector.size(); i++)
for (int x = 0; x < bookVector[i].size(); x++)
if (find(messageVector.begin(), messageVector.end(), bookVector[i][x]) != messageVector.end())
{
cout << "found a match at "
<< bookVector[i][x]
<< " at positions "
<< i << "," << x << endl;
}
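The loops above walk the book first, so matches come out in book order and each book position is reported at most once. One possible approach, sketched here under the assumption that reusing the same book position for a repeated letter is acceptable, is to walk the message in the outer loop instead. The function name locate_message is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// For each message character, in message order, record the (line, offset)
// of its first occurrence in the book. Characters not found are skipped.
// `locate_message` is a hypothetical name used only for illustration.
std::vector<std::pair<std::size_t, std::size_t>>
locate_message(const std::vector<std::string>& bookVector,
               const std::vector<char>& messageVector)
{
    std::vector<std::pair<std::size_t, std::size_t>> positions;
    for (char wanted : messageVector) {             // outer loop: the message
        for (std::size_t i = 0; i < bookVector.size(); ++i) {
            std::size_t x = bookVector[i].find(wanted);
            if (x != std::string::npos) {
                positions.emplace_back(i, x);       // line number, offset
                break;                              // first match is enough
            }
        }
    }
    return positions;
}
```

Because the message drives the iteration, AGAIN produces five positions in the order A, G, A, I, N, with both A entries pointing at the same place in the book.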

Storing data in char array causing corruption around variable

I am working on a C++ project and I am having an issue.
Below is my code
tempfingerprint = libssh2_hostkey_hash(session, LIBSSH2_HOSTKEY_TYPE_RSA);
char temp[48];
memset(temp, 0, sizeof(temp));
for (i = 0; i < 16; i++)
{
//fingerprintstream << (unsigned char)tempfingerprint[i] << ":";
if (temp[0] == 0)
{
sprintf(temp, "%02X:", (unsigned char)tempfingerprint[i]);
}
else
{
//sprintf(temp, "%s:%02X", temp, (unsigned char)tempfingerprint[i]);
char characters[3];
memset(characters, 0, sizeof(characters));
//If less than 16, then add the colon (:) to the end otherwise don't bother as we're at the end of the fingerprint
sprintf(characters, "%02X:", (unsigned char)tempfingerprint[i]);
strcat(temp, characters);
}
}
//Remove the end colon as its not needed. 48 Will already be null terminated, so the previous will contain the last colon
temp[47] = 0;
return string(temp);
When I run my app, I get the following error from visual studio
Run-Time-Check Failure #2 - Stack around the variable 'temp' was corrupted.
I've ran the same code on Linux through Valgrind and no errors were shown so I'm not sure what the problem is with Windows.
Here's an approach based on what Paul McKenzie is talking about (though he might implement it differently) and on what it looks like you were trying to do with the stream:
#include <iostream>
#include <sstream>
#include <iomanip> // output format modifiers
using namespace std;
int main()
{
stringstream fingerprintstream;
// set up the stream to print uppercase hex with 0 padding if required
fingerprintstream << hex << uppercase << setfill('0');
// print out the first value without a ':'
fingerprintstream << setw(2) << 0;
for (int i = 1; i < 16; i++) // starting at 1 because first has already been handled.
{
// print out the rest prepending the ':'
fingerprintstream << ":" << setw(2) << i;
}
// print results
std::cout << fingerprintstream.str();
return 0;
}
Output:
00:01:02:03:04:05:06:07:08:09:0A:0B:0C:0D:0E:0F
Just realized what I think OP ran up against with the garbage output. When you output a number, << will use the appropriate conversion to get text, but if you output a character << prints the character. So fingerprintstream << (unsigned char)tempfingerprint[i]; takes the binary value at tempfingerprint[i] and, thanks to the cast, tries to render it as a character. Rather than "97", you will get (assuming ASCII) "a". A large amount of what you try to print will give nonsense characters.
Example: If I change
fingerprintstream << ":" << setw(2) << i;
to
fingerprintstream << ":" << setw(2) << (unsigned char)i;
the output becomes
0?:0?:0?:0?:0?:0?:0?:0?:0?:0?:0 :0
:0?:0?:0
:0?:0?
Note the tab and the line feeds.
I need to know the definition of tempfingerprint to be sure, but you can probably solve the garbage output problem by removing the cast.
Based on new information, tempfingerprint is const char *, so tempfingerprint[i] is a char and will be printed as a character.
We want a number, so we have to force the sucker to be an integer.
static_cast<unsigned int>(tempfingerprint[i]&0xFF)
the &0xFF masks out everything but the last byte, eliminating sign extension of negative numbers into huge positive numbers when displayed unsigned.
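Putting the stream setup and the masked cast together might look like the following sketch. It assumes the fingerprint arrives as a const char* plus a length; the function name to_hex_fingerprint is hypothetical.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Formats `length` raw bytes as colon-separated uppercase hex pairs.
// `to_hex_fingerprint` is a hypothetical name used only for illustration.
std::string to_hex_fingerprint(const char* bytes, std::size_t length)
{
    std::ostringstream out;
    out << std::hex << std::uppercase << std::setfill('0');
    for (std::size_t i = 0; i < length; ++i) {
        if (i != 0) {
            out << ':';                 // separator goes between bytes only
        }
        // Mask to one byte and widen to an integer so << prints a number.
        out << std::setw(2) << static_cast<unsigned int>(bytes[i] & 0xFF);
    }
    return out.str();
}
```

For the 16-byte hash in the question the call would be to_hex_fingerprint(tempfingerprint, 16).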
There are, as far as I see, two issues in the code which lead to exceeding array boundaries:
First, with char temp[48] you reserve exactly 48 characters for the result. However, once strcat(temp, characters) has been called for the 16th value, temp holds 16*3 digits and colons plus one terminating '\0' character, i.e. 49 characters (not 48). Note that strcat always appends a string terminator.
Second, you define char characters[3] such that you reserve space for two digits and the colon, but not for the terminating '\0' character. Hence, sprintf(characters, "%02X:",...) will exceed the bounds of characters, as sprintf also appends the string terminator.
So, if you do not want to rewrite your code in general, changing your definitions to char temp[49] and char characters[4] will solve the problem.
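For readers who prefer to stay with C-style buffers, here is a minimal sketch with the sizes worked out explicitly. It swaps sprintf plus strcat for snprintf writing directly into the result buffer, which is a different technique from the original code; the function name format_fingerprint is hypothetical, and the input is assumed to be exactly 16 bytes.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Formats 16 bytes as "XX:XX:...:XX" using a fixed-size buffer.
// `format_fingerprint` is a hypothetical name used only for illustration.
std::string format_fingerprint(const char* bytes)
{
    const std::size_t count = 16;
    char temp[count * 3 + 1] = {0};     // 16 * "XX:" plus the '\0' = 49 bytes
    for (std::size_t i = 0; i < count; ++i) {
        // Each write needs 4 bytes: two digits, ':', and the '\0'.
        std::snprintf(temp + i * 3, 4, "%02X:",
                      static_cast<unsigned int>(bytes[i] & 0xFF));
    }
    temp[count * 3 - 1] = '\0';         // drop the trailing colon (index 47)
    return std::string(temp);
}
```

The last snprintf call writes to indices 45 through 48, which is why the buffer needs 49 bytes rather than 48.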

Using fscanf to read from tabbed file with ints and floats in C++

I have looked for a day or so on StackOverflow and other sites, and I can't find a solution to my problem. There are some that are similar, but I can't seem to make them work.
I have a tab-delimited .txt file. One line contains a heading, and 500 lines after that each contain an integer, an integer, a float, an integer, and an integer, in that order. I have a function that is supposed to read the first and third values (the first integer and the float) from each line. It skips the first line. This is in a do-while loop, because I need to be able to process files of different lengths. However, it's getting stuck in the loop. I have it set to output the mean, but it just outputs zeros forever.
void HISTS::readMeans(int rnum) {
int r;
char skip[500];
int index = 0; int area = 0; double mean = 0; int min = 0; int max = 0;
FILE *datafile = fopen(fileName,"r");
if(datafile == NULL) cout << "No such file!\n";
else {
//ignore the first line of the file
r = fscanf(datafile,"%s\n",skip);
cout << skip << endl; //just to check
//this is the problematic code
do {
r = fscanf(datafile,"%d\t%d\t%f\t%d\t%d\n",&index,&area,&mean,&min,&max);
cout << mean << " ";
} while(feof(datafile) != 1);
}
fclose(datafile);
}
Here is a sample data file of the format I'm trying to read:
Area Mean Min Max
1 262144 202.448 160 687
2 262144 201.586 155 646
3 262144 201.803 156 771
Thanks!
Edit: I said I need to read the first and third value, and I know I'm reading all of them. Eventually I need to store the first and third value, but I cut that part for the sake of brevity. Not that this comment is brief.
You should do it C++ style,
#include <iostream>
#include <fstream>
int main() {
std::ifstream inf("file.txt");
if (!inf) { return 1; } // could not open the file
int idx, area, min, max;
double mean;
// the loop ends as soon as a full row can no longer be read, which covers EOF
while (inf >> idx >> area >> mean >> min >> max) {
std::cout << idx << " " << area << " " << mean << " " << min << " " << max << std::endl;
}
return 0;
}
It is :
1) Easy to read.
2) Less code, so less chance of error.
3) Correct handling of EOF.
I have left out handling of the first line; that is up to you.
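One way to handle that first line, as a sketch: discard everything up to the first newline with ignore before entering the extraction loop. The function name read_means and the choice to return (index, mean) pairs are assumptions made for illustration, based on the question's wish to keep the first and third values.

```cpp
#include <istream>
#include <limits>
#include <sstream>   // std::istringstream, handy for trying this without a file
#include <string>
#include <utility>
#include <vector>

// Skips one header line, then collects (index, mean) from each data row.
// `read_means` is a hypothetical name used only for illustration.
std::vector<std::pair<int, double>> read_means(std::istream& in)
{
    // Discard everything up to and including the first newline.
    in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');

    std::vector<std::pair<int, double>> rows;
    int idx, area, min, max;
    double mean;
    // The loop ends when a full row can no longer be extracted.
    while (in >> idx >> area >> mean >> min >> max) {
        rows.emplace_back(idx, mean);
    }
    return rows;
}
```

Since operator>> skips tabs and newlines alike, the tab-delimited layout needs no special handling.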
fscanf returns the number of arguments read. Thus, if it returns less than 5 you should exit the loop.
OP ended up using operator>>, which is the correct way to do this in C++. However, for the interested C reader, there were a couple of issues in the code posted:
mean was declared as double but read using the wrong format specifier %f instead of %lf.
The first line wasn't completely read, but only the first token, Area.
A possible way to implement the desired task is as follows:
r = fscanf(datafile,"%[^\n]\n",skip);
// ^^^^^ read till newline
while ( (r = fscanf(datafile,"%d%d%lf%d%d",&index,&area,&mean,&min,&max)) == 5 ) {
// ^^ correct format specifier for double
// ...
}

How to convert vector to string and convert back to vector

----------------- EDIT -----------------------
Based on juanchopanza's comment : I edit the title
Based on jrok's comment : I'm using ofstream to write, and ifstream to read.
I'm writing 2 programs. The first program does the following tasks:
Has a vector of integers
convert it into array of string
write it in a file
The code of the first program :
vector<int> v = {10, 200, 3000, 40000};
int i;
stringstream sw;
string stringword;
cout << "Original vector = ";
for (i=0;i<v.size();i++)
{
cout << v.at(i) << " " ;
}
cout << endl;
for (i=0;i<v.size();i++)
{
sw << v[i];
}
stringword = sw.str();
cout << "Vector in array of string : "<< stringword << endl;
ofstream myfile;
myfile.open ("writtentext");
myfile << stringword;
myfile.close();
The output of the first program :
Original vector : 10 200 3000 40000
Vector in string : 10200300040000
Writing to File .....
The second program will do the following tasks:
read the file
convert the array of string back into original vector
----------------- EDIT -----------------------
Now the writing and reading is fine, thanks to Shark and Jrok,I am using a comma as a separator. The output of first program :
Vector in string : 10,200,3000,40000,
Then I wrote the rest of 2nd program :
string stringword;
ifstream myfile;
myfile.open ("writtentext");
getline (myfile,stringword);
cout << "Read From File = " << stringword << endl;
cout << "Convert back to vector = " ;
for (int i=0;i<stringword.length();i++)
{
if (stringword.find(','))
{
int value;
istringstream (stringword) >> value;
v.push_back(value);
stringword.erase(0, stringword.find(','));
}
}
for (int j=0;j<v.size();i++)
{
cout << v.at(i) << " " ;
}
But it can only convert and push back the first element, the rest is erased. Here is the output :
Read From File = 10,200,3000,40000,
Convert back to vector = 10
What did I do wrong? Thanks
The easiest thing would be to insert a space character as a separator when you're writing, as that's the default separator for operator>>
sw << v[i] << ' ';
Now you can read back into an int variable directly, formatted stream input will do the conversion for you automatically. Use vector's push_back method to add values to it as you go.
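A minimal sketch of that round trip, using string streams in place of the files so it stays self-contained. The function names to_text and from_text are hypothetical.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Writes each integer followed by a space, e.g. "10 200 3000 40000 ".
// `to_text` is a hypothetical name used only for illustration.
std::string to_text(const std::vector<int>& v)
{
    std::ostringstream sw;
    for (int value : v) {
        sw << value << ' ';
    }
    return sw.str();
}

// Parses whitespace-separated integers back into a vector.
// `from_text` is a hypothetical name used only for illustration.
std::vector<int> from_text(const std::string& text)
{
    std::istringstream in(text);
    std::vector<int> v;
    int value;
    while (in >> value) {       // operator>> skips the spaces for us
        v.push_back(value);
    }
    return v;
}
```

The same loop works unchanged on an ifstream, so the second program can read straight from the file without going through getline.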
Yes, this question is over a year old, and probably completely irrelevant to the original asker, but Google led me here so it might lead others here too.
When posting, please post a complete minimal working example; time spent adding #include lines, main, and so on is time better spent helping. It's also important because of your very problem.
Why your second code isn't working is all in this block
for (int i=0;i<stringword.length();i++)
{
if (stringword.find(','))
{
int value;
istringstream (stringword) >> value;
v.push_back(value);
stringword.erase(0, stringword.find(','));
}
}
istringstream (stringword) >> value interprets the data up to the comma as an integer, the first value, which is then stored.
stringword.find(',') gets you the 0-indexed position of the comma. A return value of 0 means that the character is the first character in the string, it does not tell you whether there is a comma in the string. In that case, the return value would be string::npos.
stringword.erase deletes that many characters from the start of the string. In this case, it deletes 10, making stringword ,200,3000,40000. This means that in the next iteration stringword.find(',') returns 0.
if (stringword.find(',')) does not behave as wished. if(0) casts the integer to a bool, where 0 is false and everything else is true. Therefore, it never enters the if-block again, as the next iterations will keep checking against this unchanged string.
And besides all that there's this:
for (int j=0;j<v.size();i++)
{
cout << v.at(i) << " " ;
}
it uses i. That was declared in a for loop, in a different scope.
The code you gave simply doesn't compile, even with the added main and includes. Heck, v isn't even defined in the second program.
Fixing all of that is still not enough, because the for condition stringword.length() is re-evaluated on every iteration while the string keeps shrinking. In this specific instance it works, because your integers get an extra digit each time, but let's say your input file is 1,2,3,4,:
The loop executes normally three times
The fourth time, stringword is "4,", so stringword.length() returns 2, but i is already 3, so i<stringword.length() is false and the loop exits.
If you want to use the string's length as a condition, but edit the string during processing, store the value before editing. Even if you don't edit the string, this means fewer calls to length().
If you save length beforehand, in this new scenario that would be 8. However, after 4 loops string is already empty, and it executes the for loop some more times with no effect.
Instead, as we are editing the string to become empty, check for that.
All this together makes for radically different code altogether to make this work:
while (!stringword.empty())
{
int value;
istringstream (stringword) >> value;
v.push_back(value);
stringword.erase(0, stringword.find(',')+1);
}
for (int i = 0; i < v.size(); i++)
{
cout << v.at(i) << " " ;
}
A different way to solve this would have been to not try to find from the start, but from index i onwards, leaving a string of commas. But why stick to messy stuff if you can just do this.
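Another alternative, shown here as a sketch and not part of the answer above, is to let std::getline split on the comma instead of erasing from the string by hand. The function name parse_csv_ints is hypothetical.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits comma-separated integers such as "10,200,3000,40000,".
// `parse_csv_ints` is a hypothetical name used only for illustration.
std::vector<int> parse_csv_ints(const std::string& text)
{
    std::vector<int> v;
    std::istringstream in(text);
    std::string field;
    while (std::getline(in, field, ',')) {  // one field per comma
        if (!field.empty()) {
            v.push_back(std::stoi(field));  // throws on non-numeric input
        }
    }
    return v;
}
```

This version also terminates when the input has no trailing comma, since the loop is driven by the stream state and not by the remaining string length.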
And that's about it.