Why does replacing a string replace part of another string? - c++

I wrote a small piece of code today that replaces a word in a text file.
It does replace the given word, but it also removes some spaces and overwrites part of another string.
I want it to replace only the given word and keep the rest as it is.
I don't know what I should do. Any help would be appreciated!
Original Data of file:
Is anyone there?
Who survived?
Somebody new?
Anyone else but you
On a lonely night
Was a burning light
A hundred years, we'll be born again
Output after replacing "anyone" with "porter":
Is anyonportere?
Who survived?
Somebody new?
anporterlse but you
On a lonely night
Was a burning light
A hundred years, we'll be born again
Code:
#include<iostream>
#include<fstream>
#include<cstdlib>
#include<cstring>
using namespace std;
int main(int argc , char* argv[])
{
string old_word,new_word;
int no=0;
old_word=argv[1];
new_word=argv[2];
if(argc<4)
{
cout<<"\nSome Arguments are missing";
return 0;
}
if(strlen(argv[1])!=strlen(argv[2]))
{
cout<<"\nReplacement is not possible as size of New wor dis not equal to old word";
return 0;
}
fstream obj;
obj.open(argv[3],ios::in|ios::out);
if(!obj)
{
cout<<"\nError in file creating";
return 0;
}
string fetch_word;
while(1)
{
if(obj.eof())
{
break;
}
else
{
int pos=obj.tellg();
obj>>fetch_word;
if(fetch_word==old_word)
{
no++;
obj.seekp(pos);
obj<<new_word;
}
}
}
if(no==0)
{
cout<<"\nNo Replacement Done . Zero Replacement Found";
}
else
{
cout<<"\nReplacement Done . Replacement Found ="<<no<<endl;
}
obj.close();
return 0;
}

Take the string "Is anyone there?"
After reading the word "Is", the read head is on the space after "Is", so tellg will return 2.
When you read the next word, the stream skips the whitespace and reads until the next whitespace character, so you read "anyone" and then write its replacement at the saved position (2).
So it should give you the string: "Isportere there?"
That is not what you meant, but it is also not the result you got.
To fix it, skip the whitespace before saving the position,
like this:
//#include <cctype> for isspace
//eat white spaces
while(isspace(obj.peek()))
obj.ignore();
//now read head is on the beginning of a word, you can take position.
int pos=obj.tellg();
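As a rough illustration of "skip the whitespace, then take the position", here is a minimal sketch that only collects the offsets instead of writing anything. The helper name find_word_offsets is hypothetical (it is not from the original post), and it uses std::ws as an alternative to the peek/ignore loop.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (an assumption, not part of the original code):
// returns the offset at which each whitespace-delimited occurrence of
// `target` starts.
std::vector<std::streamoff> find_word_offsets(std::istream& in, const std::string& target)
{
    std::vector<std::streamoff> offsets;
    std::string word;
    while (in >> std::ws) {                     // skip leading whitespace first
        const std::streampos pos = in.tellg();  // now at the first character of the next word
        if (!(in >> word))                      // nothing left to read
            break;
        if (word == target)
            offsets.push_back(static_cast<std::streamoff>(pos));
    }
    return offsets;
}
```

For the input "Is anyone there?" this reports offset 3 for "anyone", which is the position the replacement should be written to. Whether offsets from tellg are reliable for a file opened in text mode may depend on the platform, so opening the file with std::ios::binary is one thing worth trying.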
Edit
You'll have to debug and check whether tellg returns 3 on the first line before you read the word "anyone". I suggest adding a debug print of the position for each replacement,
like:
if(fetch_word==old_word)
{
no++;
cout<<"Replacing in pos "<< pos <<endl;
obj.seekp(pos);
obj<<new_word;
}
Now you can check:
Was pos correct? (You can seekg to it and read the word again.)
Did the seekp succeed? (You can use tellp to check!)
What happens when you just do obj.seekp(3); obj<<"porter";? Does it replace the string at the correct position?

#include <cctype>
#include <cstdlib>
#include <string>
#include <fstream>
#include <iostream>
int main()
{
std::string old_word{ "anyone" };
std::string new_word{ "porter" };
if (old_word.length() != new_word.length()) {
std::cerr << "Sorry, I can only replace words of equal length :(\n\n";
return EXIT_FAILURE;
}
char const *filename{ "test.txt" };
std::fstream obj{ filename, std::ios::in | std::ios::out };
if (!obj.is_open()) {
std::cerr << "Couldn't open \"" << filename << "\" for reading and writing.\n\n";
return EXIT_FAILURE;
}
std::string word;
for (std::streampos pos; pos = obj.tellg(), obj >> word;) {
if (word == old_word) {
obj.seekg(pos); // set the "get" position to where it was before extracting word
while (std::isspace(obj.peek())) // for every whitespace we peek at
obj.get(); // discard it
obj.seekp(obj.tellg()); // set the "put" position to the current "get" position
obj << new_word; // overwrite word with new_word
obj.seekg(obj.tellp()); // set the "get" position to the current "put" position
}
}
}
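If seeking inside the file keeps causing trouble, one alternative (swapped in here, not what the answers above do) is to load the text into a string, replace whole whitespace-delimited words in memory, and write the result back out. A minimal sketch of the in-memory step follows; replace_word is a hypothetical name, and like the original code it treats "there?" as one word and is case-sensitive.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: replaces every whitespace-delimited token equal to
// old_word with new_word and copies all whitespace through unchanged.
std::string replace_word(const std::string& text,
                         const std::string& old_word,
                         const std::string& new_word)
{
    std::string out;
    std::size_t i = 0;
    while (i < text.size()) {
        if (std::isspace(static_cast<unsigned char>(text[i]))) {
            out += text[i++];           // keep spaces and newlines as they are
            continue;
        }
        std::size_t j = i;              // find the end of the current token
        while (j < text.size() && !std::isspace(static_cast<unsigned char>(text[j])))
            ++j;
        const std::string token = text.substr(i, j - i);
        out += (token == old_word) ? new_word : token;
        i = j;
    }
    return out;
}
```

Because the output is built from scratch, this version does not need the two words to have the same length.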

Related

Need to read a line of text from an archive.txt until "hhh" its found and then go to the next line

My teacher gave me this assignment, where I need to read words from a file and then do some stuff with them. My issue is that these words have to end with "hhh", for example: groundhhh, wallhhh, etc.
While looking for a way to do this, I came up with the getline function from the <fstream> library. The thing is that getline(a,b,c) uses 3 arguments, where the third argument is to read until c is found but c has to be a char, so it doesn't work for me.
In essence, what I'm trying to achieve is reading a word from a file, "egghhh" for example, and make it so if "hhh" is read then it means the line finishes there and I receive "egg" as the word. My teacher used the term "sentinel" to describe this hhh thing.
This is my try:
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
ifstream read_archive;
void showListWords(){
string word;
string sentinel = "hhh";
while (getline(read_archive,word,sentinel)){
cout << word << endl;
}
}
void openReadArchive(){
read_archive.open("words.txt");
if(read_archive.fail())
cout << "There is something wrong with the archive" << endl;
}
As you discovered, std::getline() allows you to specify only 1 char for the "sentinel" that it stops reading on, where '\n' (line break) is the default.
What you can do is use this default to read an entire line of text from the file into a std::string, then put that string into an std::istringstream and read words from that stream until the end of the stream is reached or you find a word that ends with "hhh", eg:
#include <sstream>
#include <limits>
void showListWords()
{
string line, word;
string sentinel = "hhh";
while (getline(read_archive, line))
{
istringstream iss(line);
while (iss >> word)
{
if (word.length() > sentinel.length())
{
string::size_type index = word.length() - sentinel.length();
if (word.compare(index, sentinel.length(), sentinel) == 0)
{
word.resize(index);
iss.ignore(numeric_limits<streamsize>::max());
}
}
cout << word << endl;
}
}
}
In which case, you could alternatively just read words from the original file stream, stopping when you find a word that ends with "hhh" and ignore() the rest of the current line before continuing to read words from the next line, eg:
#include <limits>
void showListWords()
{
string word;
string sentinel = "hhh";
while (read_archive >> word)
{
if (word.length() > sentinel.length())
{
string::size_type index = word.length() - sentinel.length();
if (word.compare(index, sentinel.length(), sentinel) == 0)
{
word.resize(index);
read_archive.ignore(numeric_limits<streamsize>::max(), '\n');
}
}
cout << word << endl;
}
}
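Both versions above repeat the same "does the word end with the sentinel?" check, so it could be factored into a small function. This is only a sketch; strip_sentinel is a hypothetical name, and it keeps the same rule as the code above that a word consisting of nothing but the sentinel is left alone.

```cpp
#include <string>

// Hypothetical helper: if word ends with sentinel (and is longer than it),
// removes the sentinel from word and returns true; otherwise leaves word
// untouched and returns false.
bool strip_sentinel(std::string& word, const std::string& sentinel)
{
    if (word.length() <= sentinel.length())
        return false;
    const std::string::size_type index = word.length() - sentinel.length();
    if (word.compare(index, sentinel.length(), sentinel) != 0)
        return false;
    word.resize(index);
    return true;
}
```

Inside the reading loop, a true return value would be the signal to ignore() the rest of the current line.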

How to replace Hi with Bye in a file

I want to replace hi with a bye by reading a file and outputting another file with the replaced letters.
#include <iostream>
#include <fstream>
using namespace std;
int main() {
ifstream myfile;
ofstream output;
output.open("outputfile.txt");
myfile.open("infile.txt");
char letter;
myfile.get(letter);
while (!myfile.eof()) {
if (letter == 'H') {
char z = letter++;
if (z == 'i')
output << "BYE";
}
else output << letter;
}
output.close();
myfile.close();
return 0;
}
My output is the capital letter I repeated infinitely.
Here is my input file
Hi
a Hi Hi a
Hi a a Hi
Don't check eof
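The eof method reports whether a previous read attempt ran into the end of the input. It does not report whether the last get succeeded, and it does not predict whether the next one will. Your loop also calls get only once, before the loop starts, so letter never changes. You could at least move the read inside the loop:
while (!myfile.eof()) {
char letter;
myfile.get(letter);
//...
}
In this way, you would at least be getting a new letter at each iteration. However, eof only becomes true after a get has already failed at the end of the input, so the loop body still runs one extra time without a freshly read letter.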
But, there are other cases that might cause the get to not succeed. Fortunately, these are captured by the stream itself, which is returned by get. Testing the status of the stream is as easy as treating the stream as a boolean. So, a more idiomatic way to write the loop is:
char letter;
while (myfile.get(letter)) {
//...
}
Peek at the next letter
When you want to look at the next letter in the input following the detected 'H', you perform an increment.
char z = letter++;
But, this does not achieve the desired result. Instead, it copies the old value 'H' into z and then increments letter to 'I', the numerical successor of 'H' ('H' + 1). It does not observe the next letter in the input stream, which is why z is never 'i' and the output is an endless run of I's.
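To see exactly what the post-increment does to the two variables, here is a tiny sketch that involves no file at all. The function name post_increment_demo is made up for this illustration.

```cpp
#include <utility>

// Returns {z, letter} after running the statement from the question.
std::pair<char, char> post_increment_demo()
{
    char letter = 'H';
    char z = letter++;   // z receives the old value; letter is then incremented
    return {z, letter};
}
```

The returned pair holds 'H' for z and 'I' for letter, so nothing in this statement ever touches the input stream.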
There is another method you can use that is like get, but leaves the input in the input stream. It is called peek.
char z;
auto peek = [&]() -> std::istream& {
if (myfile) z = myfile.peek();
return myfile;
};
if (peek()) {
//...
}
And now, you can check the value of z, but it is still considered input for the next get on letter.
Close to what you implemented
So, the complete loop could look like:
char letter;
while (myfile.get(letter)) {
if (letter == 'H') {
char z;
auto peek = [&]() -> std::istream& {
if (myfile) z = myfile.peek();
return myfile;
};
if (peek() && z == 'i') {
myfile.get(z);
output << "BYE";
continue;
}
}
output << letter;
}
With this approach, you will be able to correctly handle troublesome cases like HHi as input, or the last letter in the input being an H.
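The same look-ahead idea can be written against generic streams, which makes it easy to try on a string before pointing it at real files. This is a simplified sketch rather than the exact code above: it calls peek directly instead of going through a lambda, and replace_hi is a hypothetical name.

```cpp
#include <istream>
#include <ostream>
#include <sstream>

// Copies in to out, writing "BYE" wherever the input contains "Hi".
void replace_hi(std::istream& in, std::ostream& out)
{
    char letter;
    while (in.get(letter)) {
        if (letter == 'H' && in.peek() == 'i') {
            in.get();          // consume the 'i' that was only peeked at
            out << "BYE";
        } else {
            out << letter;     // everything else, including a lone 'H', is copied
        }
    }
}
```

With an std::istringstream holding "HHi H" as input, the first H is copied, "Hi" becomes "BYE", and the trailing H at the end of the input is copied as well, because peek reports end of input instead of 'i'.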
Your two lines:
myfile.get(letter);
while (!myfile.eof()) {
are wrong.
First off, you only read letter once, hence your infinite loop.
Secondly, you should not use eof as the condition of a while loop.
You want something more like:
while (myfile.get(letter)) {
Also:
char z = letter++;
is wrong, you want to read another letter:
myfile.get(z);
but you have to be careful that you get something, so
if(!myfile.get(z)) {
output << letter;
break;
}
So finally:
char letter;
while (myfile.get(letter)) {
if (letter == 'H') {
char z;
if(!myfile.get(z)) {
output << letter;
break;
}
if (z == 'i') {
output << "BYE";
}
else output << letter << z;
}
else output << letter;
}
But now we are consuming the character after any H which may not be desirable.
See @jxh's answer for a way to do this with look ahead.
There is a dedicated function for replacing patterns in strings: std::regex_replace. It is very simple to use. We define what should be searched for and what it should be replaced with.
Some comments. On StackOverflow, I cannot use files, so my example program uses a std::istringstream instead. That is also a std::istream, and any other std::istream works the same way. If you define a std::ifstream to read from a file, you can simply swap it in for the std::istringstream. For the output I use the same mechanism to show the result on the console.
Please see the simple solution:
#include <iostream>
#include <sstream>
#include <regex>
// The source file
std::istringstream myfile{ R"(Hi
a Hi Hi a
Hi a a Hi)" };
// The destination file
std::ostream& output{ std::cout };
int main() {
// Temporary string, to hold one line that was read from a file
std::string line{};
// Read all lines from the file
while (std::getline(myfile, line)) {
// Replace the sub-string and write to output file
output << std::regex_replace(line, std::regex("Hi"), "Bye") << "\n";
}
return 0;
}
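One thing to keep in mind with the plain pattern "Hi" is that it also matches inside longer words such as "Hill". If only the standalone word should be replaced, a word-boundary pattern is one option. The sketch below assumes the default ECMAScript grammar of std::regex, where \b is a word boundary; replace_whole_hi is a hypothetical name.

```cpp
#include <regex>
#include <string>

// Replaces "Hi" only where it stands as a whole word.
std::string replace_whole_hi(const std::string& line)
{
    static const std::regex whole_word(R"(\bHi\b)");
    return std::regex_replace(line, whole_word, "Bye");
}
```

It can be dropped into the loop above in place of the inline std::regex_replace call.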

find lines (from a file) that contain a specified word

I cannot figure out how to list out the lines that contain a specified word. I am provided a .txt file that contains lines of text.
So far I have come this far, but my code is outputting the amount of lines there are. Currently this is the solution that made sense in my head:
#include <iostream>
#include <fstream>
#include <iomanip>
using namespace std;
void searchFile(istream& file, string& word) {
string line;
int lineCount = 0;
while(getline(file, line)) {
lineCount++;
if (line.find(word)) {
cout << lineCount;
}
}
}
int main() {
ifstream infile("words.txt");
string word = "test";
searchFile(infile, word);
}
However, this code simply doesn't get the results I expect.
The output should just simply state which lines have the specified word on them.
To sum up the solution from the comments: the issue is how the result of std::string's find member function is tested. find does not return a success flag. It returns the index of the match if one is found, or the special constant std::string::npos if not. Converted to bool, index 0 (a match at the very start of the line) is false and npos is true.
So testing it as if (line.find(word)) is wrong. It should be checked this way instead:
if (line.find(word) != std::string::npos) {
std::cout << "Found the string at line: " << lineCount << "\n";
} else {
// String not found (of course this else block could be omitted)
}
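Putting the corrected check back into the search loop could look like the sketch below. It returns the line numbers instead of printing them so that it is easy to try on an std::istringstream; matching_lines is a hypothetical name, and as in the question this is a substring search, so "test" also matches "testing".

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Returns the 1-based numbers of all lines that contain word.
std::vector<int> matching_lines(std::istream& in, const std::string& word)
{
    std::vector<int> result;
    std::string line;
    int lineCount = 0;
    while (std::getline(in, line)) {
        ++lineCount;
        if (line.find(word) != std::string::npos)  // npos means "not found"
            result.push_back(lineCount);
    }
    return result;
}
```

A line that starts with the word is reported too, which is exactly the case the original if (line.find(word)) got wrong.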

Finding pattern in a text in C++

I have written the following code to find the number of "ATA" in a text that is read into a string as "GCTATAATAGCCATA". The count returned should be 3 but it returns 0. When I check in the debugger, the string text is populated at first. However, an empty string is passed to the function patternCount. Am I reading the contents of the file into the string text correctly?
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
void patternCount(string text, string pattern);
int main()
{
string text;
fstream file_("test.txt");
if(file_.is_open())
{
while(getline(file_,text))
{
cout << text << '\n';
}
file_.close();
}
cout << "Enter a string ";
string pattern;
getline(cin, pattern);
patternCount(text, pattern);
return 0;
}
void patternCount(string text, string pattern)
{
int count = 0;
size_t nPos = text.find(pattern, 0);
while (nPos != string::npos)
{
nPos = text.find(pattern, nPos + 1);
++count;
}
cout << "There are " << count <<" " << pattern << " in your text.\n";
}
This code only counts the occurrences of the input string in the last line of the text file. If that line is empty or does not contain the string, the result will be 0.
But I guess the OP wants to search the whole file, in which case the main function needs to be fixed accordingly.
std::ifstream file{"test.txt"};
std::ostringstream text;
std::copy(std::istream_iterator<char>{file}, std::istream_iterator<char>{},std::ostream_iterator<char>{text});
//...
patternCount(text.str(), pattern);
So if I understand correctly, you're not sure whether you're reading the contents of the file test.txt correctly. If you want to read the whole content, try this instead:
ifstream file_("test.txt");
string s,text;
file_>>s;
text=s;
while(file_>>s)
{
text=text+" "+s;
}
This should probably work. Note that reading from a file with file_>>s only reads up to the next whitespace. That's why I'm using the while loop. You can also use getline(), which reads a whole line including its spaces. Also note that you should include fstream. Printing out the text should help as well.
#include <iostream>
#include <fstream>
#include <string>
using std::cout;
using std::cerr;
using std::string;
int count = 0; // we will count the total pattern count here
void patternCount(string text, string pattern);
int main() {
cout << "Enter a string ";
string pattern;
std::getline(std::cin, pattern);
string text;
std::fstream file_("test.txt");
if(file_.is_open()){
while(std::getline(file_,text))
patternCount(text,pattern);
file_.close();
}else
cerr<<"Failed to open file";
cout << "There are " << count <<" " << pattern << " in your text.\n";
return 0;
}
void patternCount(string text, string pattern){
size_t nPos = text.find(pattern, 0);
while (nPos != string::npos) {
nPos = text.find(pattern, nPos + 1);
++count;
}
}
The Problem
Your code was good; there were no bugs in the patternCount function.
But you were reading the file incorrectly. Every time you call std::getline(file_, text), the previous contents of text are overwritten by the new line. So, at the end of the loop, when you pass text to the patternCount function, text contains at most the last line of the file.
The Solution
You could solve it in two ways:
Call patternCount() on each line inside the while loop and update a global count variable, as in the code above.
Append all the lines to text inside the while loop and call patternCount once at the end.
I have implemented the first; the second is shown in the other answers.
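For the second option, a minimal sketch that avoids the global counter could look like this. The names read_all and count_occurrences are hypothetical, and reading through rdbuf() is just one common way to load a whole stream. Note that the newlines stay in the text, so a pattern split across two lines is not counted.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Reads everything that is left in the stream into one string.
std::string read_all(std::istream& in)
{
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

// Counts occurrences of pattern in text; overlapping matches are counted,
// as in the original patternCount.
int count_occurrences(const std::string& text, const std::string& pattern)
{
    if (pattern.empty())
        return 0;                       // avoid matching the empty string everywhere
    int count = 0;
    for (std::size_t pos = text.find(pattern);
         pos != std::string::npos;
         pos = text.find(pattern, pos + 1))
        ++count;
    return count;
}
```

With the text from the question, count_occurrences("GCTATAATAGCCATA", "ATA") gives 3.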

C++ Compare words between 2 different text files

I have 2 text files:
Main file: Library.txt
File to compare: fileToCompare.txt
The main file (Library.txt) contains a lot of words, but it is still not complete. So I searched online for more words and saved them in fileToCompare.txt. Many words will appear in both Library.txt and fileToCompare.txt, so to eliminate the duplicates I need to compare fileToCompare.txt with Library.txt and determine which words are the same.
My way of eliminating duplicates is to compare each word, one by one, with Library.txt. For example, if the first word is "apple", then "apple" is compared with each word in Library.txt in turn; if it is found, "apple" occurs in both files. If it is not found, "apple" is written to the console and saved to a text file (the user is asked beforehand for the name of the file in which to save the non-existing words).
I found out that if fileToCompare.txt contains many words, e.g. 1 MB of file size, it takes an hour to compare all the words. So I thought of a way to speed it up:
fileToCompare.txt is sorted alphabetically, so it starts with words beginning with "a" (if there are any). It compares as usual, and when it reaches the letter "b", it creates another text file, Library2.txt, in the "lib/" directory.
I ofstream all the words starting from the letter "b" to Library2.txt. From then on, instead of comparing with the main file, it compares with Library2.txt. In other words, Library2.txt is the main file now.
The comparison continues from the letter "b", and when it reaches the letter "c", it creates another text file, Library3.txt, and ofstreams all the words starting from the letter "c", and so on until the last word starting with "z", which is the end of the comparison.
But the problem is that it doesn't eliminate all the duplicates. It eliminates some, but many remain. I checked the main file, and some words in the output file are also in it.
Here are the download links for Library.txt and fileToCompare.txt if you need them:
Library.txt -> https://www.dropbox.com/s/ihqpaju3b33ysgv/Library.txt?dl=0
fileToCompare.txt -> https://www.dropbox.com/s/pioy77g9mfz9och/fileToCompare.txt?dl=0
What I explain above might be confusing and the code is quite messy actually, I know it's hard to understand, sure to take you a whole evening to figure out.
#include<iostream>
#include<conio.h>
#include<fstream>
using namespace std;
int main(){
string txt="fileToCompare.txt";
ifstream lib;
lib2.open(txt.c_str());
if(!lib2){
cout<<"\n Oops! "<<txt<<" is missing!\n If such file exists, be sure to check the file extension is .txt\n";
getch();
main();
}
cout<<"\n Enter the file name to save the non-existing words\n (required an extension at the end)\n";
getline(cin,word);
string libPath="lib/"+word,alphaStr="a",libtxt[26]={"Library.txt","lib/Library2.txt","lib/Library3.txt","lib/Library4.txt","lib/Library5.txt","lib/Library6.txt","lib/Library7.txt","lib/Library8.txt","lib/Library9.txt","lib/Library10.txt","lib/Library11.txt","lib/Library12.txt","lib/Library13.txt","lib/Library14.txt","lib/Library15.txt","lib/Library16.txt","lib/Library17.txt","lib/Library18.txt","lib/Library19.txt","lib/Library20.txt","lib/Library21.txt","lib/Library22.txt","lib/Library23.txt","lib/Library24.txt","lib/Library25.txt","lib/Library26.txt"};
const char* wordChar=libPath.c_str();
const char* libManip=libtxt[0].c_str();
int alphaI=1,boolcheck=1;
lib.open(libManip);
outWord.open(wordChar);
while(getline(lib2,libStr2)){
if(libStr2.substr(0,1)!=alphaStr){
lib.close();
lib.open(libManip);
libMO.open(libtxt[alphaI].c_str());
while(getline(lib,libStr)){
if(libStr.substr(0,1)!=alphaStr){
libMO<<libStr<<endl;
}
}
libManip=libtxt[alphaI].c_str();
libMO.close();
lib.close();
alphaI++;
alphaStr=libStr2.substr(0,1);
boolcheck=1;
}
if(boolcheck==1){
lib.close();
lib.open(libManip);
boolcheck=0;
}
while(getline(lib,libStr)){
if(libStr==libStr2){
found=1;
break;
}
}
if(!found){
cout<<"\n "<<libStr2;
outWord<<libStr2<<endl;
countNF++;
}
count++;
found=0;
}
cout<<"\n\n\n Total words: "<<count<<"\n Total words reserved: "<<countNF;
lib2.close();
lib.close();
getch();
return 0;
}
You should use a different algorithm / data structure for the comparison.
The following example uses a std::set. It reads both files and writes the merged result into merged.txt:
#include <iostream>
#include <set>
#include <string>
#include <fstream>
int main()
{
std::ifstream lib("Library.txt");
std::set<std::string> lib_set;
std::string word;
while (lib >> word)
{
lib_set.insert(word);
}
std::ifstream check("fileToCompare.txt");
while (check >> word)
{
lib_set.insert(word);
}
std::ofstream merged("merged.txt");
std::set<std::string>::iterator it;
for (it = lib_set.begin(); it != lib_set.end(); ++it)
{
merged << *it << std::endl;
}
}
Executing this for your dataset takes 0.8 seconds on my computer.
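The example above merges the two files. If the goal is instead to list only the words of fileToCompare.txt that are missing from Library.txt, as described in the question, the same idea works with a lookup set. This is a sketch under that assumption; missing_words is a hypothetical name and std::unordered_set is swapped in for std::set because only membership tests are needed.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <unordered_set>
#include <vector>

// Returns the words read from check that do not occur in lib,
// in the order in which they appear in check.
std::vector<std::string> missing_words(std::istream& lib, std::istream& check)
{
    std::unordered_set<std::string> known;
    std::string word;
    while (lib >> word)
        known.insert(word);

    std::vector<std::string> missing;
    while (check >> word) {
        if (known.find(word) == known.end())
            missing.push_back(word);
    }
    return missing;
}
```

Passing two std::ifstream objects opened on Library.txt and fileToCompare.txt would give the list that the original program tried to write to the output file. Neither file needs to be sorted for this.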
Since the files fileToCompare.txt and Library.txt are sorted alphabetically, your code can take advantage of that.
Read a word from each file.
If the two words are same, read the next words from the files.
If the word from fileToCompare.txt is less than the word from Library.txt, keep the word from Library.txt and read the next word from fileToCompare.txt. Otherwise, keep the word from fileToCompare.txt and read the next word from Library.txt.
Keep doing that until there are no more words to read.
At the end, if there are still any more words left in fileToCompare.txt, read and print them.
The following program follows the above logic and seems to work for me.
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
void compareFiles(ifstream& txtf, ifstream& libf)
{
string txtWord;
string libWord;
// Read the first word from each file.
bool haveTxt = static_cast<bool>(txtf >> txtWord);
bool haveLib = static_cast<bool>(libf >> libWord);
while ( haveTxt && haveLib )
{
if ( txtWord == libWord )
{
// The same word exists in both files.
// Read the next words from both files.
haveTxt = static_cast<bool>(txtf >> txtWord);
haveLib = static_cast<bool>(libf >> libWord);
}
else if ( txtWord < libWord )
{
// The library file is sorted and has already moved past txtWord,
// so txtWord cannot exist in the library file. Print it, then
// read the next txtWord but keep the current libWord.
cout << txtWord << endl;
haveTxt = static_cast<bool>(txtf >> txtWord);
}
else
{
// txtWord may still match a later word in the library file.
// Read the next libWord but keep the current txtWord.
haveLib = static_cast<bool>(libf >> libWord);
}
}
// When the previous loop ends because libf ran out of words, the current
// txtWord (if any) and all remaining words in txtf are not in the library.
// Print them.
while ( haveTxt )
{
cout << txtWord << endl;
haveTxt = static_cast<bool>(txtf >> txtWord);
}
}
void compareFiles(string const& txt, string const& lib)
{
ifstream txtf(txt);
ifstream libf(lib);
compareFiles(txtf, libf);
}
int main()
{
string txt="fileToCompare.txt";
string lib="Library.txt";
compareFiles(txt, lib);
return 0;
}
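When both inputs are sorted with the same ordering that std::string's operator< uses, the walk described above is what std::set_difference from <algorithm> performs. The sketch below rests on that assumption; it loads both word lists into memory first, and words_not_in_library is a hypothetical name.

```cpp
#include <algorithm>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Returns the words of txt that are not in lib.
// Assumption: both inputs are already sorted according to operator<.
std::vector<std::string> words_not_in_library(std::istream& txt, std::istream& lib)
{
    std::istream_iterator<std::string> txtBegin(txt), libBegin(lib), end;
    const std::vector<std::string> txtWords(txtBegin, end);
    const std::vector<std::string> libWords(libBegin, end);

    std::vector<std::string> result;
    std::set_difference(txtWords.begin(), txtWords.end(),
                        libWords.begin(), libWords.end(),
                        std::back_inserter(result));
    return result;
}
```

If the files were sorted with a different rule, for example ignoring case, the inputs would need to be re-sorted with std::sort before the call, or a matching comparison passed to std::set_difference.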