Finding different strings in two files - C++

I'm trying to export all lines from the second file that aren't in the first one. The order of the lines doesn't matter; I just want to find the ones that aren't already in the first file and save them to difference.txt.
Example:
firstfile.txt
This is first line
This is second line
This is third line
secondfile.txt
This is first line
This is some line
This is third line
Now compare them...
difference.txt
This is some line
This is what I came up with so far. I know I need to loop through all lines in the second file and compare each of them with each line of the first file. It's not making any sense to me why it isn't working:
void compfiles()
{
std::string diff;
std::cout << "-------- STARTING TO COMPARE FILES --------\n";
ifstream file2;
file2.open("C:\\\\firstfile.txt",ios::binary);
//---------- compare two files line by line ------------------
std::string str;
int j = 0;
while(!file2.eof())
{
getline(file2, str);
if(!CheckWord(str))
{
cout << "appending";
diff.append(str);
diff.append("\n");
}
j++;
}
ofstream myfile;
myfile.open ("C:\\\\difference.txt");
myfile << diff;
myfile.close();
}
bool CheckWord(std::string search)
{
ifstream file;
int matches = 0;
int c = 0;
file.open("C:\\\\secondfile.txt",ios::binary);
std::string stringf;
while(!file.eof())
{
getline(file, stringf);
if(strcmp(stringf.c_str(), search.c_str()))
{
matches += 1;
}
c++;
}
if(matches == 0)
{
return false;
}
else
{
return true;
}
}
Any help would be appreciated. Thanks for reading this block of text.

This code doesn't do what you think it does:
if (strcmp(stringf.c_str(), search.c_str()))
{
matches += 1;
}
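strcmp() returns 0 when the strings are equal and a nonzero value when they differ, so this condition is true for every line that does not match. Your code increments matches for mismatches and never for an actual match.
Note also that compfiles() reads firstfile.txt and CheckWord() searches secondfile.txt, which is the reverse of what you describe.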
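Since stringf and search are already std::string objects, the comparison can be written with operator== and strcmp() can be dropped entirely. A minimal sketch, assuming a hypothetical helper containsLine() that takes any std::istream (so it also works with a std::ifstream):

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: returns true if `in` contains a line equal to `wanted`.
bool containsLine(std::istream& in, const std::string& wanted)
{
    std::string line;
    while (std::getline(in, line)) {   // the loop ends when a read fails
        if (line == wanted) {          // true only when the strings are equal
            return true;
        }
    }
    return false;
}
```

With this, a line from the second file belongs in difference.txt exactly when containsLine() returns false for the first file.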

Here is a simple but much more effective and idiomatic solution using std::set:
std::ifstream file1("firstfile.txt");
std::set<std::string> str_in_file1;
std::string s;
while (std::getline(file1, s)) {
str_in_file1.insert(s);
}
file1.close();
std::ifstream file2("secondfile.txt");
std::ofstream file_diff("diff.txt");
while (std::getline(file2, s)) {
if (str_in_file1.find(s) == str_in_file1.end()) {
file_diff << s << std::endl;
}
}
file2.close();
file_diff.close();
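Also, you might want to look at a tool called diff, which reports the lines that differ between two files. Note that diff is order-sensitive, so it fits best when both files list their lines in the same order.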

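If you want to do it manually, it sounds like you don't need a C++ program; you can do this from the command line using grep. The -F flag treats each pattern as a fixed string, -x matches whole lines only, -v inverts the match, and -f reads the patterns from firstfile.txt: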
grep -vxFf firstfile.txt secondfile.txt > difference.txt

Related

How to reverse the order of a input file using C++?

I need to reverse the order of the lines in a file and output the result to another file. For example, the
input:
hello
this is a testing file again
just so much fun
Expected output:
just so much fun
this is a testing file again
hello
This is my current code. It reverses the order of the lines, but it also reverses the order of the characters in each line.
Current:
nuf hcum os tsuj
niaga elif gnitset a si siht
olleh
int print_rev(string filename, char c){
ifstream inStream (filename);
ofstream outStream;
inStream.seekg(0,inStream.end);
int size = inStream.tellg();
outStream.open("output.txt");
for (int j=1; j<=size; j++){
inStream.seekg(-j, ios::end);
c=inStream.get();
outStream << c;
}
inStream.close();
outStream.close();
return 0;
}
You're reversing the whole file, character by character. What you want to do is read in each line separately, and then reverse the line order.
A stack of lines seems like a good choice for this:
int printRev(string filename)
{
stack<string> lines;
ifstream in(filename);
string line;
while (getline(in, line))
{
lines.push(line);
}
ofstream out("output.txt");
while (!lines.empty())
{
out << lines.top() << endl;
lines.pop();
}
return 0;
}
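The same idea also works with a std::vector walked back to front. A minimal sketch, assuming a hypothetical function reverseLines() that reads from and writes to arbitrary streams instead of fixed file names:

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: copies the lines of `in` to `out` in reverse order.
void reverseLines(std::istream& in, std::ostream& out)
{
    std::vector<std::string> lines;
    for (std::string line; std::getline(in, line);) {
        lines.push_back(line);
    }
    // Walk the vector from the last line to the first.
    for (auto it = lines.rbegin(); it != lines.rend(); ++it) {
        out << *it << '\n';
    }
}
```

Passing a std::ifstream and a std::ofstream gives the file-to-file behaviour from the question.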

How can I label the lines of an existing file?

Let's say I have a text file containing something like:
Four
score
and
seven
years
ago
...
I want to be able to label these lines so that after the program runs, the file looks like:
1.Four
2.score
3.and
4.seven
5.years
6.ago
...
I've prepared a solution; however, I find it heavyweight, and it has the problem of labeling one line past the last line...
std::string file = "set_test - Copy.txt";
std::ifstream in_test{file};
std::vector<std::string> lines;
while(in_test) {
std::string temp;
getline(in_test, temp);
lines.push_back(temp);
}
in_test.close();
std::ofstream out_test{file};
for(unsigned int i = 0; i < lines.size(); ++i) {
out_test << i+1 << '.' << lines[i] << '\n';
}
On top of being heavy-weight, this solution also labels the line beyond the last line of text.
Does anyone have a better solution to this problem?
The cause of your problem is this structure
while (stream is good)
read from stream
do something
as it will read too much. (See this Q&A for explanation.)
What's happening is that the very last getline, the one that actually reaches the end of the file, will fail and leave temp empty.
Then you add that empty line to your lines.
The "canonical" stream-reading loop structure is
while (attempt to read)
do something with the result
in your case,
std::string temp;
while (getline(in_test, temp)) {
lines.push_back(temp);
}
If you write to a different file, you don't need to store anything except the current line; you can write each line immediately.
If you want to replace the original, you can replace the old with the new afterwards.
Something like this:
std::ifstream in_test{"set_test - Copy.txt"};
std::ofstream out_test{"set_test - Numbered.txt"};
if (!in_test || !out_test) {
std::cerr << "There was an error in the opening of the files.\n";
return;
}
int i = 1;
std::string line;
while (getline(in_test, line) && out_test << i << '.' << line << '\n') {
i++;
}
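The "replace the old with the new afterwards" step can be sketched as follows. This is only an illustration: the helper names numberLines() and numberFileInPlace() and the ".tmp" suffix for the temporary file are assumptions, not part of the original code.

```cpp
#include <cstdio>    // std::remove, std::rename
#include <fstream>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: writes "1.first", "2.second", ... to `out`.
// Returns the number of lines written.
int numberLines(std::istream& in, std::ostream& out)
{
    int count = 0;
    std::string line;
    while (std::getline(in, line) && out << (count + 1) << '.' << line << '\n') {
        ++count;
    }
    return count;
}

// Hypothetical helper: numbers the lines of `path` in place by going through
// a temporary file. The temporary name is an assumption for illustration.
bool numberFileInPlace(const std::string& path)
{
    const std::string tempPath = path + ".tmp";
    {
        std::ifstream in(path);
        std::ofstream out(tempPath);
        if (!in || !out) {
            return false;
        }
        numberLines(in, out);
    }   // both streams are closed here, before the files are renamed

    // Remove the original first: std::rename may fail on some platforms
    // if the destination already exists.
    if (std::remove(path.c_str()) != 0) {
        return false;
    }
    return std::rename(tempPath.c_str(), path.c_str()) == 0;
}
```

Keeping the numbering logic in a function that takes streams makes it easy to try out with a std::istringstream before touching real files.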

Encode a string of characters given a custom code table

I want to programmatically convert a string of characters stored in a file to a string of character codes (encode) by following a code table. The string of binary codes should then go to a file, from which I can revert it back to the string of characters later (decode). The codes in the code table were generated using Huffman algorithm and the code table is stored in a file.
For example, by following a code table where characters and its corresponding codes are single spaced like this:
E 110
H 001
L 11
O 111
encoding "HELLO" should output as "0011101111111"
My C++ code cannot seem to complete the encoded string. Here is my code:
int main()
{
string English;
ifstream infile("English.txt");
if (!infile.is_open())
{
cout << "Cannot open file.\n";
exit(1);
}
while (!infile.eof())
{
getline (infile,English);
}
infile.close();
cout<<endl;
cout<<"This is the text in the file:"<<endl<<endl;
cout<<English<<endl<<endl;
ofstream codefile("codefile.txt");
ofstream outfile ("compressed.txt");
ifstream codefile_input("codefile.txt");
char ch;
string st;
for (int i=0; i<English.length();)
{
while(!codefile_input.eof())
{
codefile_input >> ch >> st;
if (English[i] == ch)
{
outfile<<st;
cout<<st;
i++;
}
}
}
return 0;
}
For an input string of "The_Quick_brown_fox_jumps_over_the_lazy_dog", the output string is 011100110, but it should be longer than that!
Please help! Is there anything I have missed?
(n.b. my C++ code has no syntax errors)
Let's take a look at the main loop where you do your work:
for (int i=0; i<English.length();)
{
while(!codefile_input.eof())
{
codefile_input >> ch >> st;
if (English[i] == ch)
{
outfile<<st;
cout<<st;
i++;
}
}
}
Your code reads through codefile_input once and then gets stuck with codefile_input.eof() == true. From that point on the inner while loop never runs again, so for (int i=0; i<English.length();) becomes an infinite loop: no code path increments i, and it never reaches English.length().
As a side note, have a read of Why is iostream::eof inside a loop condition considered wrong?.
To avoid the issue explained above, consider reading the dictionary file into a container (e.g. std::map) and then using it while iterating through the string that you want to encode.
For example:
std::ifstream codefile_input("codefile.txt");
char ch;
std::string str;
std::map<char, std::string> codes;
while (codefile_input >> ch >> str)
{
codes[ch] = str;
}
codefile_input.close ();
for (int i=0; i<English.length(); ++i)
{
auto it = codes.find (English[i]);
if (codes.end () != it)
{
outfile << it->second;
cout << it->second;
}
}
Note, you will need to #include <map> to use std::map.
Apart from the issue your question was actually about, your loop:
while (!infile.eof())
{
getline (infile,English);
}
only keeps the last line read and discards all the lines that came before it. If you want to process all the lines in a file, consider changing that loop to:
while (std::getline (infile, English))
{
/* Line processing goes here */
}
And since your dictionary is unlikely to differ from line to line, you can move that logic in front of this loop:
std::ifstream codefile_input("codefile.txt");
char ch;
std::string str;
std::map<char, std::string> codes;
while (codefile_input >> ch >> str)
{
codes[ch] = str;
}
codefile_input.close ();
ifstream infile("English.txt");
if (!infile.is_open())
{
cout << "Cannot open file.\n";
exit(1);
}
ofstream outfile ("compressed.txt");
string English;
while (std::getline (infile, English))
{
for (int i=0; i<English.length(); ++i)
{
auto it = codes.find (English[i]);
if (codes.end () != it)
{
outfile << it->second;
cout << it->second;
}
}
}
In addition, consider adding error checking for all of the files that you open. You check if you can open file English.txt, and exit if you can't, but you don't check if you could open any other file.
On unrelated note #2, consider reading Why is “using namespace std” considered bad practice? (that's why you see me using std:: explicitly in the code that I added).
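Putting the pieces together, the table lookup can be isolated into two small functions. This is a sketch under assumptions: loadCodes() and encode() are hypothetical names, and the table format is the "character, space, code" layout shown in the question.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: reads "character code" pairs, one pair per line.
std::map<char, std::string> loadCodes(std::istream& in)
{
    std::map<char, std::string> codes;
    char ch;
    std::string code;
    while (in >> ch >> code) {
        codes[ch] = code;
    }
    return codes;
}

// Hypothetical helper: concatenates the code of every character in `text`.
// Characters without an entry in the table are skipped.
std::string encode(const std::string& text,
                   const std::map<char, std::string>& codes)
{
    std::string result;
    for (char c : text) {
        auto it = codes.find(c);
        if (it != codes.end()) {
            result += it->second;
        }
    }
    return result;
}
```

With the four-entry table from the question, encode("HELLO", codes) yields 0011101111111. Keep in mind that operator>> skips whitespace, so a table entry for the space character itself cannot be read this way.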

Deleting specific line from file

These are the contents of my example file:
abcdefg hijk lmnopqrstAB CSTAKLJSKDJD KSA FIND ME akjsdkjhwjkjhasfkajbsdh ADHKJAHSKDJH
I need to find and delete the 'FIND ME' inside of the file so the output would look like this:
abcdefg hijk lmnopqrstAB CSTAKLJSKDJD KSA akjsdkjhwjkjhasfkajbsdh ADHKJAHSKDJH
I have tried the following method of doing getline and then writing all of the contents except the FIND ME into a temporary file and then rename the temporary file back.
string deleteline;
string line;
ifstream fin;
fin.open("example.txt");
ofstream temp;
temp.open("temp.txt");
cout << "Which line do you want to remove? ";
cin >> deleteline;
while (getline(fin,line))
{
if (line != deleteline)
{
temp << line << endl;
}
}
temp.close();
fin.close();
remove("example.txt");
rename("temp.txt","example.txt");
but it doesn't work.
Just as a side note: the file has NO newline/linefeeds. So the file contents are all written in 1 line.
EDIT:
FIXED CODE:
while (getline(fin,line))
{
line.replace(line.find(deleteline),deleteline.length(),"");
temp << line << endl;
}
This gets me the results I expected. Thank you everyone for helping!
In case anyone would like it I have converted Venraey's useful code into a function:
#include <cstdio> // remove() and rename()
#include <fstream>
#include <iostream>
#include <string>
void eraseFileLine(std::string path, std::string eraseLine) {
std::string line;
std::ifstream fin;
fin.open(path);
// contents of path must be copied to a temp file then
// renamed back to the path file
std::ofstream temp;
temp.open("temp.txt");
while (getline(fin, line)) {
// write all lines to temp other than the line marked for erasing
if (line != eraseLine)
temp << line << std::endl;
}
temp.close();
fin.close();
// required conversion for remove and rename functions
const char * p = path.c_str();
remove(p);
rename("temp.txt", p);
}
Try this:
line.replace(line.find(deleteline),deleteline.length(),"");
I'd like to clarify something. Although the answer provided by gmas80 could work, for me, it didn't. I had to modify it somewhat, and here's what I ended up with:
position = line.find(deleteLine);
if (position != string::npos) {
// reuse the position found above instead of searching a second time
line.replace(position, deleteLine.length(), "");
}
Another thing that didn't satisfy me was that it left blank lines in the output file. So I wrote another check to skip the blank lines:
if (!line.empty()) {
temp << line << endl;
}
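The npos check above can be wrapped into one small function. A minimal sketch, assuming a hypothetical helper eraseFirst() that removes the first occurrence of a substring and leaves the line untouched when there is no match. Note also that cin >> deleteline stops at the first whitespace, so typing FIND ME stores only FIND; std::getline(std::cin, deleteline) reads the whole phrase.

```cpp
#include <string>

// Hypothetical helper: returns `line` with the first occurrence of `target`
// removed, or `line` unchanged if `target` is empty or does not occur in it.
std::string eraseFirst(std::string line, const std::string& target)
{
    const std::string::size_type position = line.find(target);
    if (position != std::string::npos && !target.empty()) {
        line.erase(position, target.length());
    }
    return line;
}
```

Because the file in the question has no newlines, the whole content arrives as a single line from getline(), and one call per line is enough to remove the first match.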

C++ file handling, is_open returning bad

If I include the if test in my code, the error message is printed, and I'm not sure why.
When it's not used, my program gets stuck in a loop where it never reaches the end of the file. I don't understand what's going wrong.
int countlines()
{
fstream myfile;
myfile.open("questions.txt", ios::in);
string contents;
int linenumber = 0;
//if (myfile.is_open())
// {
while (!myfile.eof())
{
getline( myfile, contents );
if (contents != "")
{
linenumber++;
}
}
cout << "there are " << linenumber << " lines.\n";
//}else {cout<<"Unable to get file.\n";}
myfile.close();
return(linenumber);
}
What's going on is that your file is not being opened. That's why is_open fails.
Then, when you comment out the check, you're breaking your loop because you're iterating incorrectly (see my comment) and not detecting stream failures (.eof() will never be true on that stream).
Make sure that the file is in the right place, and that it is accessible.
The correct idiom for reading a file line-by-line in C++ is using a loop like this:
for (std::string line; std::getline(file,line);)
{
// process line.
}
Inserting this in your example (+fixing indentation and variable names) gives something like this:
int countlines(const std::string& path)
{
// Open the file.
std::ifstream file(path.c_str());
if (!file.is_open()) {
return -1; // or better, throw exception.
}
// Count the lines.
int count = 0;
for (std::string line; std::getline(file,line);)
{
if (!line.empty()) {
++count;
}
}
return count;
}
Note that if you don't intend to process the line contents, you can skip reading them into strings by using std::istreambuf_iterator, which can make your code look like:
int countlines(const std::string& path)
{
// Open the file.
std::ifstream file(path.c_str());
if (!file.is_open()) {
return -1; // or better, throw exception.
}
// Refer to the beginning and end of the file with
// iterators that process the file character by character.
std::istreambuf_iterator<char> current(file);
const std::istreambuf_iterator<char> end;
// Count the number of newline characters.
return std::count(current, end, '\n');
}
The second version completely bypasses copying the file contents into strings and avoids allocating large chunks of memory for long lines. It needs #include <algorithm> and #include <iterator>. Note that it counts newline characters, so empty lines are counted too, and a final line without a trailing newline is not.
When using std::istream and std::ostream (the base classes of std::fstream), the recommended approach is to use the stream itself in a bool context instead of calling eof(). eof() only returns true once a read has actually reached the end of the file. If an error occurs before that (for example, the file failed to open), eof() keeps returning false and the loop never ends.
So, you should have written your code as:
int countlines() {
ifstream myfile;
int linenumber = 0;
string linecontent;
myfile.open("questions.txt", ios::in);
while (getline(myfile, linecontent)) {
if (!linecontent.empty()) {
++linenumber;
}
}
return linenumber;
}
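To see why the eof() loop hangs when the file cannot be opened, it helps to look at the stream state directly. A minimal sketch, assuming a hypothetical helper countNonEmptyLines() that works on any std::istream and reports an already-failed stream with -1:

```cpp
#include <ios>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: counts non-empty lines, or returns -1 if the stream
// is already in a failed state (for example, the file could not be opened).
int countNonEmptyLines(std::istream& in)
{
    if (!in) {
        return -1;
    }
    int count = 0;
    for (std::string line; std::getline(in, line);) {
        if (!line.empty()) {
            ++count;
        }
    }
    return count;
}
```

A stream whose open failed has failbit set but not eofbit, so !myfile.eof() stays true forever while every getline() fails immediately. Testing the stream itself, as above, catches both conditions.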
Try the following code. It will also (hopefully) give you an idea why the file open is failing...
int countlines()
{
ifstream myfile;
myfile.open("questions.txt");
string contents;
int linenumber = 0;
if (myfile.is_open())
{
while (getline(myfile, contents))
{
if (contents != "")
linenumber++;
}
cout << "there are " << linenumber << " lines." << endl;
myfile.close();
}
else
cout << "Unable to get file (reason: " << strerror(errno) << ")." << endl;
return linenumber;
}