Reading from a text file properly - c++

I am reading strings from a line in a text file and for some reason the code will not read the whole text file. It reads to some random point and then stops, leaving out several words from a line or a few lines. Here is my code.
string total;
while(file >> word){
    if(total.size() <= 40){
        total += ' ' + word;
    }
    else{
        my_vector.push_back(total);
        total.clear();
    }
}
Here is an example of a file
The programme certifies that all nutritional supplements and/or ingredients that bear the Informed-Sport logo have been tested for banned substances by the world class sports anti-doping lab, LGC. Athletes choosing to use supplements can use the search function above to find products that have been through this rigorous certification process.
It reads until "through" and leaves out the last four words.
I expected the output to be the whole file, not just part of it.
This is how I printed the vector.
for(int x = 0; x < my_vector.size(); ++x){
    cout << my_vector[x];
}

You missed two things here:
First: when total.size() is greater than 40, control goes to the else branch, where you push total into my_vector but discard the word you just read from the file. You need to add that word to total after total.clear().
Second: when the loop terminates, whatever is still in total is ignored as well. You need to push_back() it into the vector too (if required, depending on your program logic).
So overall your code is going to look like this.
string total;
while(file >> word)
{
    if(total.size() <= 40)
    {
        total += ' ' + word;
    }
    else
    {
        my_vector.push_back(total);
        total.clear();
        total += ' ' + word;
    }
}
my_vector.push_back(total); // this step depends on your logic,
                            // i.e. on what you actually want to do

Your loop finishes when the end of file is read. However at this point you still have data in total. Add something like this after the loop:
if(!total.empty()) {
    my_vector.push_back(total);
}
to add the last bit to the vector.

There are two problems:
When 40 < total.size() only total is pushed to my_vector but the current word is not. You should probably unconditionally append the word to total and then my_vector.push_back(total) if 40 < total.size().
When the loop terminates you still need to push_back() the content of total, as it may not have reached a size of more than 40. That is, if total is non-empty after the loop has terminated, you still need to append it to my_vector.

Related

Time limit exceeded on test 10 on Codeforces

Hello, I am a beginner in programming and am currently on the array lessons. I only know the very basics, like if conditions, loops, and data types. I ran into trouble when I tried to solve this problem.
Problem Description
When Serezha was three years old, he was given a set of cards with letters for his birthday. They were arranged into words in the way which formed the boy's mother favorite number in binary notation. Serezha started playing with them immediately and shuffled them because he wasn't yet able to read. His father decided to rearrange them. Help him restore the original number, on condition that it was the maximum possible one.
Input Specification
The first line contains a single integer n (1 ⩽ n ⩽ 10^5), the length of the string. The second line contains a string consisting of English lowercase letters: 'z', 'e', 'r', 'o' and 'n'.
It is guaranteed that it is possible to rearrange the letters in such a way that they form a sequence of words, each being either "zero", which corresponds to the digit 0, or "one", which corresponds to the digit 1.
Output Specification
Print the maximum possible number in binary notation. Print binary digits separated by a space. The leading zeroes are allowed.
Sample input:
4
ezor
Output:
0
Sample Input:
10
nznooeeoer
Output:
1 1 0
I got "Time limit exceeded" on test 10 on Codeforces. This is my code:
#include <iostream>
using namespace std;
int main()
{
    int n;
    char arr[10000];
    cin >> n;
    for (int i = 0; i < n; i++) {
        cin >> arr[i];
    }
    for (int i = 0; i < n; i++) {
        if (arr[i] == 'n') {
            cout << "1"
                 << " ";
        }
    }
    for (int i = 0; i < n; i++) {
        if (arr[i] == 'z') {
            cout << "0"
                 << " ";
        }
    }
}
Your problem is a buffer overrun. You put an awful 10K array on the stack, but the problem description says you can have up to 100K characters.
After your array fills up, you start overwriting the stack, including the variable n. This makes you try to read too many characters. When your program gets to the end of the input, it waits forever for more.
Instead of putting an even more awful 100K array on the stack, just count the number of z's and n's as you're reading the input, and don't bother storing the string at all.
According to the compromise (applicable to homework and challenge questions) described in "How do I ask and answer homework questions?", I will hint, without giving a code solution.
In order to fix TLEs you need to be more efficient.
In this case I'd start by getting rid of one of the three loops and of all of the array accesses.
You only need to count two things during input, followed by one output loop.

Vector is not acting as expected

I am trying to open a text file and pass the lines of the text file to a vector. The first digit in each line is the size of the vector and since I do not know the end point of the text file I am using a while loop to find the end. The idea is that I can take a text file and run a merge sort on it. So, for example:
3 5 4 9
5 0 2 6 8 1
Sorted, it would become:
4 5 9
0 1 2 6 8
The problem I am having is that when I sort a vector that is larger than the prior vector (as in the example) I do not get output. It is probably something simple that I have just overlooked. I am pretty sure the issue is in the code below. Thanks for any pointers.
while (!file.eof())
{
    int size;
    file >> size;
    vector<int> myVector(size);
    int n = 0;
    while (n < size && file >> myVector[n])
    {
        ++n;
    }
    sort(myVector);
    for (int j = 0; j < size; ++j)
    {
        if (file.eof()) break;
        cout << myVector[j] << ' ';
    }
    cout << '\n';
}
The problem is this line:
if (file.eof()) break;
Once you've read the last line of the file, file.eof() will be true. So the first time through the loop that's supposed to print the sorted vector, you break out of the loop and don't print anything. It has nothing to do with whether the vector is larger than the previous vector, it's just a problem with the last line of the file. The fix is to get rid of that unnecessary line.
You also need to change the main loop. while (!file.eof()) is the wrong way to loop over a file's contents (see the linked questions for full explanations). Use:
int size;
while (file >> size) {
...
}
Because of the line:
if(file.eof()) break;
If you reach EOF, your program won't print anything, since you break out of the printing loop on its first iteration.
For instance, if there are no characters after the last number in your example input, you won't get output, but even a single trailing space can change that.
Besides that, is there any chance that your sorting function clears the vector, or changes it in some other way?

How to read a character array of a name, without reading white space?

I am new to C++ and I have a dynamic char array with a max size of 30. I need to extract the name from the array into a string, but I have no way of knowing how long the name is.
For example someone named Bob should look like:
Bob____________________ (_ is blank space)
but it instead reads like:
Bob%#$(%$#)%##*##$$#
or something of the sort. I know it's doing that because it's just random unassigned memory, but how can I cut it off at the end of Bob so I can add the blank spaces manually?
In case it is unclear, I am reading the letters character by character using a for loop that runs exactly 30 times.
This is my current loop:
string n = "";
for(int i=0; i<MAX_NAME_LEN; i++)
{
    n += name[i];
}
In this case n should be equal to "Bob___________________",
but it is Bob followed by a bunch of random garbage, as stated earlier.
If you read these characters from an array of char, i.e. a C string, you can do something like this (it must be a null-terminated string, ending in '\0').
n = std::string(name);
After that, you can append as many spaces as you want. In my code I want 30 characters at most, name included, so it can be something like this:
if(n.size() < 30)
    n = n + std::string(30-n.size(), ' ');
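Put together as one function, a sketch could look like this. The name padded_name is hypothetical, and it assumes the buffer holds at least width chars; it also stops after width characters in case the terminating '\0' is missing:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: copies a name out of a fixed-size char buffer and
// pads it with spaces to exactly `width` characters.
// Assumes `name` points to at least `width` readable chars.
std::string padded_name(const char* name, std::size_t width) {
    std::size_t len = 0;
    while (len < width && name[len] != '\0') {
        ++len;                    // stop at the terminator or at the buffer end
    }
    std::string n(name, len);     // copy only the real characters
    n.resize(width, ' ');         // pad the rest with blanks
    return n;
}
```

With the names from the question this would be called as padded_name(name, MAX_NAME_LEN). Whatever sits in the buffer after the terminator is never read.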

cpp stringstream read input file algorithm to find LCS

Hi, this is my first question here. I will write as clearly as possible; if I am too much of a newbie, please bear with me. Thanks.
Background: I was asked to solve the longest common substring (LCS) problem for given input files in C++.
The goal is to optimize the algorithm, so there are run-time and RAM limits. (Matching is case insensitive.)
My approach: I used a stringstream to parse every input line and stored the lines in a vector. I use something like a suffix tree to chop each string, sort the pieces, and put them into a vector array (a vector that stores vectors). I then compare every two lines (v1, v2) to find common substrings (using a nested for loop to compare each word inside every vector), put the common substrings back into the array, and remove v1 and v2.
Suffix tree, e.g. banana -> anana -> nana -> ana -> na -> a. [I stored all 5 elements in the vector.]
Result: it works for most of the files (normal text files).
Problem: I have 2 special test cases for which finding the LCS takes forever.
1. One has 10000 input lines, and each line has 3000 chars on average (including spaces). It took me 50 minutes to find the LCS; the requirement is not to exceed 5 minutes.
2. One has 100 input lines, and each line has 60k chars on average. It never finishes running.
What I tried: build a common-word dictionary from the first 2 sentences.
read the first two lines and store them into vectors
use the suffix tree again to find common elements (substrings) and call the result the dictionary
for the rest of the input lines,
    if (word read is within dictionary)
        fine, do what I did before, read next one
    else if (word is not in dictionary)
        ignore this word, read next one
Help needed: I still cannot process the first two lines if each line contains 60k chars, so building the dictionary itself would exceed the run-time limit. I am not sure whether a hash table would work much better than vectors. I know a bit about hash tables but have never written anything with one, so if you can explain hash tables with patience, I would appreciate that.
Update:
As suggested, I put some code here (the first part parses each line and stores it into a vector; the second shows how I compare 2 strings and find the common elements).
vector< vector<string> > parsed_array;
vector<string> choped_element;
// Num::1 read from file in a while loop
while (getline (myfile,line))
{
    cout << "< InputlineLoopCounter: "<<InputlineLoopCounter<<endl;
    choped_element.clear();
    choped_element.push_back(line); // whole string as first element, e.g. "Hello World"
    stringstream ss(line);
    string copystr (line);
    while (ss >> temp)
    {
        copystr.erase(0,copystr.find_first_of(" \t")+1); // here it turns into "World"
        choped_element.push_back(copystr);
    }
    choped_element.pop_back(); // since I stored the whole string as the first element, the last one is not necessary
    sort(choped_element.begin(),choped_element.end());
    parsed_array.push_back(choped_element); // stored into the vector array
    InputlineLoopCounter ++;
}
// Num::2 compare part: compare 2 different strings and assemble the result into a new string
// v1 and v2 are 2 vectors full of chopped strings and v3 should hold the common elements
// e.g. v1[0]="hello world"; v1[1]="world"
// e.g. v2[0]="I dislike hello world"; v2[1]="dislike hello world"; v2[2]="hello world"; v2[3]="world"
// e.g. v3 as result would be v3[0]="hello world"; v3[1]="world"
for (size_t i = 0; i < v1len; i++)
{
    for (size_t j = 0; j < v2len; j++)
    {
        stringstream ss1(v1[i]);
        string fword1;
        ss1>>fword1;
        stringstream ss2(v2[j]);
        string fword2;
        ss2>>fword2;
        if(fword1 == fword2) // v1[i] and v2[j] are space-separated words
        {
            string nword1;
            string nword2;
            string lcommon;
            int comlen = 1;
            string combine;
            combine.append(fword1);
            combine.append(space);
            while (ss1>>nword1 && ss2>>nword2)
            {
                if (nword1 == nword2)
                {
                    combine.append(nword1);
                    combine.append(space);
                    comlen ++;
                }
                else
                    break;
            }
            combine.erase(combine.find_last_of(" "));
            cout<< "common word: "<<combine<<endl;
            v3.push_back(combine);
        }
    }
}

C++ adding a carriage return at beginning of string when reading file

I have two questions:
1) Why is my code adding a carriage return at the beginning of the selected_line string?
2) Do you think the algorithm I'm using to return a random line from the file is good enough and won't cause any problems?
A sample file is:
line
number one
#
line number two
My code:
int main()
{
    srand(time(0));
    ifstream read("myfile.dat");
    string line;
    string selected_line;
    int nlines = 0;
    while(getline(read, line, '#')) {
        if((rand() % ++nlines) == 0)
            selected_line = line;
    }
    // this is adding a \n at the beginning of the string
    cout << selected_line << endl;
}
EDIT: OK, what some of you suggested makes a lot of sense. The string is probably being read as "\nmystring". So I guess my question now is, how would I remove the first \n from the string?
What you probably want is something like this:
std::vector<std::string> allParagraphs;
std::string currentParagraph;
while (std::getline(read, line)) {
    if (line == "#") { // modify this condition, if needed
        // paragraph ended, store to vector
        allParagraphs.push_back(currentParagraph);
        currentParagraph = "";
    } else {
        // paragraph continues...
        if (!currentParagraph.empty()) {
            currentParagraph += "\n";
        }
        currentParagraph += line;
    }
}
// store the last paragraph, as well
// (in case it was not terminated by #)
if (!currentParagraph.empty()) {
    allParagraphs.push_back(currentParagraph);
}
// this is not extremely random, but will get you started
size_t selectedIndex = rand() % allParagraphs.size();
std::string selectedParagraph = allParagraphs[selectedIndex];
For better randomness, you could opt for this instead:
size_t selectedIndex
    = rand() / (RAND_MAX + 1.0) * allParagraphs.size(); // 1.0 avoids int overflow when RAND_MAX == INT_MAX
This is because the least significant bits returned by rand() tend to behave not so randomly at all.
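If C++11 is available, the <random> header avoids the question of which bits of rand() to trust. A small sketch, where the helper name random_index is hypothetical:

```cpp
#include <cstddef>
#include <random>

// Hypothetical helper: returns a uniformly distributed index in [0, count).
// `count` must be greater than 0.
std::size_t random_index(std::size_t count, std::mt19937& gen) {
    std::uniform_int_distribution<std::size_t> dist(0, count - 1);  // inclusive bounds
    return dist(gen);
}
```

The generator would typically be created once, for example std::mt19937 gen(std::random_device{}());, and then reused as random_index(allParagraphs.size(), gen).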
Because you don't specify \n as a delimiter.
Your "random" selection is also questionable. On the first iteration it always selects the first line:
rand() % 1 is always 0.
A later line replaces it only when rand() % nlines happens to be 0. The simplest way to select a line uniformly is to count the lines first.
In addition, why are you using # as a delimiter? Getline, by default, gets a line (ending with \n).
The newlines can appear from the second line that you print. This is because the getline function halts on seeing the # character and resumes, the next time it is called, from where it left off, i.e. one character past the #, which in your input file is a newline. Read the C FAQ 13.16 on effectively using rand().
One suggestion is to read the entire file in one go, store the lines in a vector and then output them as required.
Because # is your delimiter, the \n that exists right after that delimiter will be the beginning of your next line, thus putting the \n in front of your line.
1) You're not adding a \n to selected_line. Instead, by specifying '#' you are simply not removing the extra \n characters in your file. Note that your file actually looks something like this:
line\n
number one\n
#\n
line number two\n
So line number two is actually "\nline number two\n".
2) No. If you want to randomly select a line then you need to determine the number of lines in your file first.
You could use the substr method of the std::string class to remove the \n after you decide which line to use:
if ( line.substr(0,1) == "\n" ) { line = line.substr(1); }
As others have said, if you want to select the lines with uniform randomness, you'll need to read all the lines first and then select a line number. You could also use if (rand() % (++nlines+1)) which will select line 1 with 1/2 probability, line 2 with 1/2*1/3 probability, etc.