Order files with MergeSort in C++

I'm trying to implement my own MergeSort, but I've run into some problems; maybe someone can help me a little.
I have a big file with some info separated by commas (Name,city,mail,telf). I would like to apply mergesort to order it, because I suppose that the client computer won't have enough memory to do it in one pass.
So I split it into files of MAX_CUSTOMERS lines and sort them individually. Everything is correct up to here, but when I take the first two files and merge them, the problems start: some records are repeated and others disappear. Here's my code:
void MergeSort(string file1Name, string file2Name,string name){
printf("Enter MERGE SORT %s AND %s\n",file1Name.c_str(),file2Name.c_str());
string temp;
string fileName;
string lineFile1, lineFile2;
bool endFil1 = false, endFil2 = false;
int numCust1 = 0;
int numCust2 = 0;
int x1 = 0, x2 = 0;
ifstream file1;
file1.open(file1Name.c_str());
ifstream file2;
file2.open(file2Name.c_str());
ofstream mergeFile;
fileName = "customers_" +name +".txt";
cout << "Result file " << fileName << endl;
mergeFile.open("temp.txt");
getline(file1,lineFile1);
getline(file2,lineFile2);
while(!endFil1 && !endFil2){
if(CompareTelf(lineFile1,lineFile2)==1){
mergeFile << lineFile1 << endl;
if(!getline(file1,lineFile1)){
cout << lineFile1 << endl;
cout << "1st file end" << endl;
endFil1 = true;
}
}else{
mergeFile << lineFile2 << endl;
if(!getline(file2,lineFile2)){
cout << lineFile2 << endl;
cout << "2nd file end" << endl;
endFil2 = true;
}
}
}
if(endFil1){
//mergeFile << lineFile2 << endl;
while(getline(file2,lineFile2)){
mergeFile << lineFile2 << endl;
}
}else{
//mergeFile << lineFile1 << endl;
while(getline(file1,lineFile1)){
mergeFile << lineFile1 << endl;
}
}
file1.close();
file2.close();
mergeFile.close();
rename("temp.txt",fileName.c_str());
return;
}
Customer SplitLine(string line){
string splitLine;
string temp;
Customer cust;
int actProp = 0;
int number;
istringstream readLineStream(line); //convert String readLine to Stream readLine
while(getline(readLineStream,splitLine,',')){
if (actProp == 0)cust.name = splitLine;
else if (actProp == 1)cust.city = splitLine;
else if (actProp == 2)cust.mail = splitLine;
else if (actProp == 3)cust.telf = atoi(splitLine.c_str());
actProp++;
}
//printf("Customer read: %s, %s, %s, %i\n",cust.name.c_str(), cust.city.c_str(), cust.mail.c_str(), cust.telf);
return cust;
}
int CompareTelf(string str1, string str2){
Customer c1 = SplitLine(str1);
Customer c2 = SplitLine(str2);
if(c1.telf<c2.telf)return 1; //return 1 if 1st string its more important than second, otherwise, return -1
else return -1;
}
struct Customer{
string name;
string city;
string mail;
long telf;
};
If have some question about the code, just say it! I tried to use varNames as descriptive as possible!
Thanks a lot.

Your code seems quite good, but it has several flaws and one important omission.
One of the minor flaws is the lack of initialization of the Customer structure: you didn't provide a constructor for the struct, and you do no explicit initialization of the cust variable. Fortunately the string members are properly initialized by the string class constructor, but long telf may hold any initial value.
Another one is the lack of format checking when splitting an input line. Are you sure that every input line has the same format? If there are lines with too many commas (say, a comma inside a name), then the loop may incorrectly try to assign 'email' data to the 'telf' member...
OTOH if there are too few commas, the 'telf' member may remain uninitialized, with a random initial value...
Together with the first one, this flaw may lead to an incorrect order of the output data.
Similar problems arise when you use the atoi function: it returns int but your variable is long. I suppose you have chosen the long type because of the expected range of values; if so, converting the input data to int may lose a significant part of the data! The behaviour of atoi is undefined when the converted value does not fit in an int, so the result cannot be relied on and may lead to incorrect sorting. You'd better use atol instead.
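The points above (uninitialized telf, no field-count check, lossy number conversion) could be addressed along these lines. This is only a sketch: ParseCustomer is a hypothetical replacement for SplitLine, and it uses std::strtol rather than the atol suggested above, because strtol makes it possible to detect junk and out-of-range values.

```cpp
#include <cerrno>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

struct Customer {
    std::string name;
    std::string city;
    std::string mail;
    long telf = 0;  // default member initializer: never left indeterminate
};

// Hypothetical replacement for SplitLine. Returns false when the line does
// not have exactly four comma-separated fields or the phone is not a number.
bool ParseCustomer(const std::string& line, Customer& out) {
    std::vector<std::string> fields;
    std::istringstream stream(line);
    std::string field;
    while (std::getline(stream, field, ',')) {
        fields.push_back(field);
    }
    if (fields.size() != 4) return false;  // too few or too many commas

    errno = 0;
    char* end = nullptr;
    const long telf = std::strtol(fields[3].c_str(), &end, 10);
    if (end == fields[3].c_str()) return false;  // no digits at all
    if (*end != '\0') return false;              // trailing junk after the number
    if (errno == ERANGE) return false;           // value does not fit in a long

    out.name = fields[0];
    out.city = fields[1];
    out.mail = fields[2];
    out.telf = telf;
    return true;
}
```

A caller would then skip or report any line for which ParseCustomer returns false instead of comparing garbage values.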
Next issue is reading first line from both input files. You don't check if getline() succeeded. If an input file is empty, the corresponding lineFile_num string will be empty, but endFil_num will not reflect that - it will still be false. So you again go into comparing invalid data.
Finally, the main problem. Assume the contents of file1 are 'greater than' (that is: go after) the whole of file2. Then the first line stored in lineFile1 makes CompareTelf() return -1 all the time. The main loop copies the whole of file2 into the output, and...? And the final while() loop starts with getline(file1,lineFile1), thus discarding the first line of file1!
Similar result happens with files consisting of records (A,C) and (B), to be merged as (A,B,C): first A and B are read in, then A is saved and C is read in, then B is saved and end of file 2 detected. Then while(getline(...)) cancels C in memory and finds end of file 1, which terminates the loop. Record C gets lost.
Generally, when the main merging loop while(!endFil1 && !endFil2) exhausts one of files, the first unsaved line of the other file gets discarded. To avoid this you need to store the result of the first read:
endFil1 = ! getline(file1,lineFile1);
endFil2 = ! getline(file2,lineFile2);
then, after the main loop, start copying the input file's tail with the unsaved line:
while(!endFil1) {
mergeFile << lineFile1 << endl;
endFil1 = !getline(file1,lineFile1);
}
while(!endFil2) {
mergeFile << lineFile2 << endl;
endFil2 = !getline(file2,lineFile2);
}
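Put together, the corrected merge step might look like the sketch below. It works on generic streams so it can be tried without files; MergeSorted and TelfOf are hypothetical names, TelfOf treats a missing phone field as 0 (an assumption of this sketch), and ties are taken from the first input, which differs slightly from the original CompareTelf.

```cpp
#include <cstdlib>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: returns the phone number, i.e. the 4th comma-separated
// field of the line, or 0 when the line has fewer than four fields.
long TelfOf(const std::string& line) {
    std::istringstream fields(line);
    std::string field;
    for (int i = 0; i < 4; ++i) {
        if (!std::getline(fields, field, ',')) return 0;
    }
    return std::atol(field.c_str());
}

// Merges two inputs that are each already sorted by phone number.
void MergeSorted(std::istream& in1, std::istream& in2, std::ostream& out) {
    std::string line1, line2;
    // Record whether the very first reads succeeded (handles empty inputs).
    bool end1 = !std::getline(in1, line1);
    bool end2 = !std::getline(in2, line2);

    while (!end1 && !end2) {
        if (TelfOf(line1) <= TelfOf(line2)) {
            out << line1 << '\n';
            end1 = !std::getline(in1, line1);
        } else {
            out << line2 << '\n';
            end2 = !std::getline(in2, line2);
        }
    }
    // Copy the remaining tail, starting with the line already held in memory.
    while (!end1) {
        out << line1 << '\n';
        end1 = !std::getline(in1, line1);
    }
    while (!end2) {
        out << line2 << '\n';
        end2 = !std::getline(in2, line2);
    }
}
```

With files, the same function can be called with two std::ifstream objects and one std::ofstream. For the (A,C) and (B) example above, record C is now written by the first tail loop instead of being lost.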

Related

My code seems to enter an infinite loop after loading a list from a binary file

I was studying the STL and decided to write some code to practice the writing and reading of files. The problem consists of creating a list of int (0, 1,...,9), save it in a binary file, and finally load it again.
There are 5 basic code blocks:
1. create list
2. present list
3. save list
4. load list
5. present list again
It seems simple and straightforward; however, the code seems to get in an infinite loop.
int main(){
list<int> numbers;
/////// Create list of 10 integers ///////
for(int i=0; i<10; i++){
numbers.push_back(i);
}
/////// Present List ///////
cout << "List created: [";
list<int>::iterator it;
for(it = numbers.begin(); it != numbers.end(); it++){
if(*it != 9){
cout << *it << ", ";
}
else{
cout << *it;
}
}
cout << "]" << endl;
/////// Save list ///////
string fileName = "test.bin";
ofstream outputFile;
outputFile.open(fileName, ios::binary);
if(outputFile.is_open()){
outputFile.write(reinterpret_cast<char *>(&numbers), sizeof(numbers));
outputFile.close();
cout << "List saved to file." << endl;
}
else{
cout << "Could not open file named " << fileName << endl;
}
/////// Load list ///////
list<int> anotherList;
ifstream inputFile;
inputFile.open(fileName, ios::binary);
if(inputFile.is_open()){
inputFile.read(reinterpret_cast<char *>(&anotherList), sizeof(anotherList));
inputFile.close();
cout << "List loaded from file." << endl;
}
else{
cout << "Could not open file named " << fileName << endl;
}
/////// Present List ///////
cout << "List loaded: [";
for(it = anotherList.begin(); it != anotherList.end(); it++){
if(*it != 9){
cout << *it << ", ";
}
else{
cout << *it;
}
}
cout << "]" << endl;
return 0;
}
The problem is in the "Load List" code block, since, if I comment it out, everything works fine.
Am I saving the object correctly? What am I doing wrong?
Thanks in advance.
The problem lies in the flawed logic of reinterpret_cast<char *>(&numbers). Why?
std::list manages its storage using pointers. It simply holds a pointer to a chain of elements consisting of some objects and a pointer to the next element. You cannot simply treat it like a sequence of bytes and expect it to maintain its functionality.
What you instead need to do is to loop over the elements and write them to the file one by one:
#include <fstream>
#include <iostream>
#include <list>
int main() {
std::fstream file{};
file.open("data.txt", std::ios::binary | std::ios::out);
std::list<int> ints{2, 4, 5, 6, 8, 1, 3, 5, 7, 9};
for (int i : ints) {
file.write(reinterpret_cast<char*>(&i), sizeof(i));
}
file.flush();
file.close();
file.open("data.txt", std::ios::binary | std::ios::in);
ints.clear();
std::cout << "Before reading the file, size of the list is: " << ints.size() << '\n';
for (int i; file.read(reinterpret_cast<char*>(&i), sizeof(i)); ints.push_back(i));
for (int i : ints) {
std::cout << i << ' ';
}
}
Clarification of the second for loop:
for (int i; file.read(reinterpret_cast<char*>(&i), sizeof(i)); ints.push_back(i));
We declare a variable i since we need a place where we read the data. This one should be quite clear. We do not need to initialize i, since the condition of the loop will take care of that (although it would probably be a good practice to do it anyway).
The condition part: file.read(reinterpret_cast<char*>(&i), sizeof(i)). This may seem tricky at first, but it really isn't! First of all, we have a method call. We call std::basic_istream::read, specifying two arguments: first, the memory address where the data should be stored, and second, the number of bytes we want to read. The trick is that the read method not only reads and saves the data, it also returns the stream, so after the data processing we are left with file as the condition. But it's not a bool, is it? Correct, it's not a bool (nor an int), but stream objects can be contextually converted to bool, which is exactly what happens here! The rules are as follows: if the stream is in a good state (neither failbit nor badbit is set), the conversion yields true. It yields false otherwise. A failed state may be caused, for example, by a failure to read, which happens when you have already read the whole file. Essentially, this part both reads from the file and checks whether the read succeeded. It's both the reading logic and the condition!
The third part: ints.push_back(i). Notice that this part only executes if the condition (reading from the file) executed successfully. It simply adds the read int (i) to the ints container.
All in all, you can read the for loop in the following way:
create a variable i, which will store, one by one, the variables from the file
as long as reading from the file is successful...
...add the read value to the container
outputFile.write(reinterpret_cast<char *>(&numbers), sizeof(numbers));
What you actually write is the binary representation of the list object itself. Unfortunately, it does not contain the data of the list directly; internally, a list looks something like this:
template <typename T>
class list
{
struct node
{
node* next;
node* previous;
T data;
};
node* m_head;
node* m_tail;
size_t m_size;
public:
// ...
};
No direct link to the data. Even worse: with std::list, the data can be scattered all over your memory (in contrast to std::vector, which guarantees contiguous data).
So you only can iterate over your list again (either with the iterator variant you chose already before or, more convenient, with a range based for loop):
for(auto n : numbers)
{
outputFile.write(reinterpret_cast<char*>(&n), sizeof(n));
}
Reading is different; you don't know the size in advance, do you? Well, there are ways to retrieve it (seekg, tellg), but that's mainly of interest if you want to read all the data into contiguous memory at once (for example after reserving sufficient space in a std::vector); that's another issue.
For the list approach:
int n;
while(inputFile.read(reinterpret_cast<char*>(&n), sizeof(n)))
{
anotherList.push_back(n);
}
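For completeness, the seekg/tellg variant mentioned above could look roughly like this when the data goes into a std::vector instead of a list. It is a sketch under assumptions: SaveInts and LoadInts are hypothetical helper names, and the file is assumed to contain nothing but raw int values written on the same platform.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Writes the raw bytes of all values in one call; a vector's storage is
// contiguous, so this is valid here (it would not be for std::list).
bool SaveInts(const std::string& path, const std::vector<int>& values) {
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    out.write(reinterpret_cast<const char*>(values.data()),
              static_cast<std::streamsize>(values.size() * sizeof(int)));
    return static_cast<bool>(out);
}

// Determines the file size with seekg/tellg, then reads everything at once.
// Returns an empty vector when the file cannot be opened or read.
std::vector<int> LoadInts(const std::string& path) {
    std::vector<int> values;
    std::ifstream in(path, std::ios::binary);
    if (!in) return values;

    in.seekg(0, std::ios::end);
    const std::streamoff bytes = in.tellg();  // size in bytes, -1 on failure
    in.seekg(0, std::ios::beg);
    if (bytes <= 0) return values;

    values.resize(static_cast<std::size_t>(bytes) / sizeof(int));
    in.read(reinterpret_cast<char*>(values.data()),
            static_cast<std::streamsize>(values.size() * sizeof(int)));
    if (!in) values.clear();  // short read: report nothing rather than garbage
    return values;
}
```

The element-by-element loop shown above remains the simpler choice for a std::list; the bulk read only pays off when the target container is contiguous.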

How to determine in C++ if an element in a text file is a character or numeric?

I am trying to write code in C++ that reads a text file containing a series of numbers. For example, I have this .txt file, which contains the following series of numbers mixed with a character:
1 2 3 a 5
I am trying to make the code capable of telling numbers from characters, such as the 4th entry above (which is a character), and then reporting an error.
What I am doing is something like this:
double value;
while(in) {
in >> value;
if(!isdigit(value)) {
cout << "Has non-numeric entry!" << endl;
break;
}
else
// some codes for storing the entry
}
However, the isdigit function doesn't work for text file. It seems when I am doing in >> value, the code will implicitly type-cast a into double.
Can anyone give me some suggestion?
Thanks a lot!
Your while loop doesn't do what you think it does.
It only iterates one statement:
in >> value;
The rest of the statements are actually outside the loop.
Using curly braces for the while body is always recommended.
I wrote a small program that reads the file through a standard fstream object, as I was a little unsure what your "in" represented.
Essentially, try to read in every element as a character and check it with isdigit. If you're reading in elements that are longer than one character, a few modifications would have to be made. Let me know if that's the case and I'll try to help!
int main() {
std::fstream fin("detect_char.txt");
char x;
while (fin >> x) {
if (!isdigit(x)) {
std::cout << "found non-int value = " << x << '\n';
}
}
std::cout << '\n';
return 0;
}
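Try reading each token into a string and parsing it explicitly:
ifstream infile("data.txt");
string token;
while (infile >> token) {
    try {
        size_t pos = 0;
        double num = stod(token, &pos); // pos receives the number of characters parsed
        if (pos != token.size()) throw invalid_argument(token); // trailing junk, e.g. "3.14A"
        cout << num << endl;
    }
    catch (const invalid_argument&) { // catch by const reference, not by value
        cerr << "Has non-numeric entry!" << endl;
    }
}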
Since it looks like the Asker's end goal is to have a double value for their own nefarious purposes and not simply detect the presence of garbage among the numbers, what the heck. Let's read a double.
double value;
while (in) // loop until failed even after the error handling case
{
if (in >> value) // read a double.
{
std::cout << value; // printing for now. Store as you see fit
}
else // failed to read a double
{
in.clear(); // clear error
std::string junk;
in >> junk; // easiest way I know of to read up to any whitespace.
// It's kinda gross if the discard is long and the string resizes
}
}
Caveat:
What this can't handle is input like 3.14A. This will be read as 3.14 and stop, returning the 3.14 and leaving the A for the next read, where it will fail to parse and then be consumed and discarded by in >> junk;. Catching that efficiently is a bit trickier and is covered by William Lee's answer. If the exception handling of stod is deemed too expensive, use strtod and test that the end parameter reached the end of the string and that no range errors were generated. See the example in the linked strtod documentation.
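The strtod check described in the caveat could be sketched as follows. ParseDouble is a hypothetical helper name, not something from the answers above; it accepts a token only if strtod consumed all of it and reported no range error.

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>

// Hypothetical helper: converts the whole token to a double.
// Returns false, leaving value untouched, if any part of it is not numeric.
bool ParseDouble(const std::string& token, double& value) {
    if (token.empty()) return false;

    errno = 0;
    char* end = nullptr;
    const double parsed = std::strtod(token.c_str(), &end);

    if (end == token.c_str()) return false;  // nothing was converted ("a")
    if (*end != '\0') return false;          // trailing junk ("3.14A")
    if (errno == ERANGE) return false;       // overflow or underflow

    value = parsed;
    return true;
}
```

In the reading loop, each token read with in >> token would be passed to ParseDouble, and a false result reported as a non-numeric entry. Note that strtod also accepts spellings such as inf, nan and hexadecimal floats, so stricter input rules would need an extra check.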

Using fscanf to read from tabbed file with ints and floats in C++

I have looked for a day or so on StackOverflow and other sites, and I can't find a solution to my problem. There are some that are similar, but I can't seem to make them work.
I have a tab-delimited .txt file. One line contains a heading, and 500 lines after that each contain an integer, an integer, a float, an integer, and an integer, in that order. I have a function that is supposed to read the first and third values (the first integer and the float) from each line. It skips the first line. This is in a do-while loop, because I need to be able to process files of different lengths. However, it's getting stuck in the loop. I have it set to output the mean, but it just outputs zeros forever.
void HISTS::readMeans(int rnum) {
int r;
char skip[500];
int index = 0; int area = 0; double mean = 0; int min = 0; int max = 0;
FILE *datafile = fopen(fileName,"r");
if(datafile == NULL) cout << "No such file!\n";
else {
//ignore the first line of the file
r = fscanf(datafile,"%s\n",skip);
cout << skip << endl; //just to check
//this is the problematic code
do {
r = fscanf(datafile,"%d\t%d\t%f\t%d\t%d\n",&index,&area,&mean,&min,&max);
cout << mean << " ";
} while(feof(datafile) != 1)
}
fclose(datafile);
}
Here is a sample data file of the format I'm trying to read:
Area Mean Min Max
1 262144 202.448 160 687
2 262144 201.586 155 646
3 262144 201.803 156 771
Thanks!
Edit: I said I need to read the first and third value, and I know I'm reading all of them. Eventually I need to store the first and third value, but I cut that part for the sake of brevity. Not that this comment is brief.
You should do it C++ style,
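#include <fstream>
#include <iostream>
int main() {
    std::ifstream inf("file.txt");
    if (!inf) { return 1; } // could not open the file
    int idx, area, min, max;
    double mean;
    // The extraction itself is the loop condition: it fails at EOF or on bad input.
    // No extra eof() check is needed (it would drop a last line that has no trailing newline).
    while (inf >> idx >> area >> mean >> min >> max) {
        std::cout << idx << " " << area << " " << mean << " " << min << " " << max << std::endl;
    }
    return 0;
}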
This is:
1) Easy to read.
2) Less code, so less chance of error.
3) Correct handling of EOF.
I have left out the handling of the first (header) line; that is up to you.
fscanf returns the number of input items successfully assigned. Thus, if it returns fewer than 5, you should exit the loop.
OP ended up using operator>>, which is the correct way to do this in C++. However, for the interested C reader, there were a couple of issues in the code posted:
mean was declared as double but read using the wrong format specifier %f instead of %lf.
The first line wasn't completely read, but only the first token, Area.
A possible way to implement the desired task is as follows:
r = fscanf(datafile,"%[^\n]\n",skip);
// ^^^^^ read till newline
while ( (r = fscanf(datafile,"%d%d%lf%d%d",&index,&area,&mean,&min,&max)) == 5 ) {
// ^^ correct format specifier for double
// ...
}

How to convert vector to string and convert back to vector

----------------- EDIT -----------------------
Based on juanchopanza's comment : I edit the title
Based on jrok's comment : I'm using ofstream to write, and ifstream to read.
I'm writing 2 programs. The first program does the following tasks:
Has a vector of integers
convert it into array of string
write it in a file
The code of the first program :
vector<int> v = {10, 200, 3000, 40000};
int i;
stringstream sw;
string stringword;
cout << "Original vector = ";
for (i=0;i<v.size();i++)
{
cout << v.at(i) << " " ;
}
cout << endl;
for (i=0;i<v.size();i++)
{
sw << v[i];
}
stringword = sw.str();
cout << "Vector in array of string : "<< stringword << endl;
ofstream myfile;
myfile.open ("writtentext");
myfile << stringword;
myfile.close();
The output of the first program :
Original vector : 10 200 3000 40000
Vector in string : 10200300040000
Writing to File .....
second program will do the following tasks :
read the file
convert the array of string back into original vector
----------------- EDIT -----------------------
Now the writing and reading is fine, thanks to Shark and Jrok,I am using a comma as a separator. The output of first program :
Vector in string : 10,200,3000,40000,
Then I wrote the rest of 2nd program :
string stringword;
ifstream myfile;
myfile.open ("writtentext");
getline (myfile,stringword);
cout << "Read From File = " << stringword << endl;
cout << "Convert back to vector = " ;
for (int i=0;i<stringword.length();i++)
{
if (stringword.find(','))
{
int value;
istringstream (stringword) >> value;
v.push_back(value);
stringword.erase(0, stringword.find(','));
}
}
for (int j=0;j<v.size();i++)
{
cout << v.at(i) << " " ;
}
But it can only convert and push back the first element, the rest is erased. Here is the output :
Read From File = 10,200,3000,40000,
Convert back to vector = 10
What did I do wrong? Thanks
The easiest thing would be to insert a space character as a separator when you're writing, as that's the default separator for operator>>
sw << v[i] << ' ';
Now you can read back into an int variable directly, formatted stream input will do the conversion for you automatically. Use vector's push_back method to add values to it as you go.
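A minimal sketch of that round trip, using string streams in place of the files so it stays short. The function names Join and Split are made up for this illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Writes every value followed by a single space, e.g. "10 200 3000 40000 ".
std::string Join(const std::vector<int>& v) {
    std::ostringstream sw;
    for (int x : v) {
        sw << x << ' ';
    }
    return sw.str();
}

// Reads whitespace-separated integers back; operator>> skips the separators
// and the loop ends when no further integer can be extracted.
std::vector<int> Split(const std::string& text) {
    std::istringstream in(text);
    std::vector<int> v;
    int value;
    while (in >> value) {
        v.push_back(value);
    }
    return v;
}
```

The same two loops work unchanged with an ofstream and an ifstream in place of the string streams.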
Yes, this question is over a year old, and probably completely irrelevant to the original asker, but Google led me here so it might lead others here too.
When posting, please post a complete minimal working example, having to add #include and main and stuff is time better spent helping. It's also important because of your very problem.
Why your second code isn't working is all in this block
for (int i=0;i<stringword.length();i++)
{
if (stringword.find(','))
{
int value;
istringstream (stringword) >> value;
v.push_back(value);
stringword.erase(0, stringword.find(','));
}
}
istringstream (stringword) >> value interprets the data up to the comma as an integer, the first value, which is then stored.
stringword.find(',') gets you the 0-indexed position of the comma. A return value of 0 means that the character is the first character in the string, it does not tell you whether there is a comma in the string. In that case, the return value would be string::npos.
stringword.erase deletes that many characters from the start of the string. In this case, it deletes 10, making stringword ,200,3000,40000. This means that in the next iteration stringword.find(',') returns 0.
if (stringword.find(',')) does not behave as wished. if(0) casts the integer to a bool, where 0 is false and everything else is true. Therefore, it never enters the if-block again, as the next iterations will keep checking against this unchanged string.
And besides all that there's this:
for (int j=0;j<v.size();i++)
{
cout << v.at(i) << " " ;
}
It uses i, which was declared in the previous for loop and is out of scope here, and it increments i instead of j.
The code you gave simply doesn't compile, even with the added main and includes. Heck, v isn't even defined in the second program.
Fixing all of that is still not enough, however, because the for condition stringword.length() is re-evaluated on every iteration while the string keeps shrinking. In this specific instance it works, because your integers get an extra digit each time, but let's say your input file is 1,2,3,4,:
The loop executes normally three times
The fourth time, stringword is "4," and stringword.length() returns 2, but i is already 3, so i<stringword.length() is false and the loop exits before the last value is read.
If you want to use the string's length as a condition, but edit the string during processing, store the value before editing. Even if you don't edit the string, this means fewer calls to length().
If you save length beforehand, in this new scenario that would be 8. However, after 4 loops string is already empty, and it executes the for loop some more times with no effect.
Instead, as we are editing the string to become empty, check for that.
All this together makes for radically different code altogether to make this work:
while (!stringword.empty())
{
int value;
istringstream (stringword) >> value;
v.push_back(value);
stringword.erase(0, stringword.find(',')+1);
}
for (int i = 0; i < v.size(); i++)
{
cout << v.at(i) << " " ;
}
A different way to solve this would have been to not try to find from the start, but from index i onwards, leaving a string of commas. But why stick to messy stuff if you can just do this.
And that's about it.
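As an alternative to erasing from the front of the string, the comma-separated line can also be split with std::getline and a ',' delimiter. This is a sketch rather than part of the answer above; ParseCommaSeparated is a hypothetical name, and std::stoi (C++11) will throw if a token is not a number.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits text such as "10,200,3000,40000," at the commas and converts each
// non-empty token to int. The string itself is never modified.
std::vector<int> ParseCommaSeparated(const std::string& text) {
    std::vector<int> values;
    std::istringstream in(text);
    std::string token;
    while (std::getline(in, token, ',')) {
        if (token.empty()) continue;         // tolerate ",," in the input
        values.push_back(std::stoi(token));  // throws std::invalid_argument on junk
    }
    return values;
}
```

Because the loop is driven by the stream state rather than by the string length, the trailing comma and the 1,2,3,4, case need no special handling.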

compile c++ project on ubuntu

I'm writing a C++ project. In Visual Studio everything works, but when I compile it on Ubuntu many things go wrong.
example:
int main (int argsNum, char* args[]){
Country* country = new Country("USA");
Military* military = new Military("Army",country);
Shalishut* shalishut = new Shalishut(military);
Manager* manager = Manager::GetInstance();
FileReader* fileReader = FileReader::GetInstance();
fileReader->ReadCityConfig(args,country);
fileReader->ReadRoadConfig(args,country);
fileReader->ReadMilitrayCampConfig(args,military);
military->ShowBases();
return 0;
}
void FileReader::ReadMilitrayCampConfig(char* args[], Military* military){
string line;
char inputFileName [MAX_FILE_NAME_LEN];
strcpy (inputFileName,args[3]);
ifstream myfile (inputFileName); //inputFileName
char* campName;
string cityName;
if (myfile.is_open()){
while (!myfile.eof()){ //until the end of file
getline (myfile,line); //separate each line.
if ((line.size() != 0) && (line[0] != '#')) {
campName = strtok(&line[0],",");
cityName = (string)strtok(NULL,",");
Shalishut::FixName(campName); Shalishut::FixName(&cityName[0]);
if (!(military->IsBaseExist(campName))){
if (military->GetCountry()->IsCityExist(cityName)){
Base* baseToAdd = new Base(campName,cityName);
if (baseToAdd != NULL){
military->AddBaseToMilitary(baseToAdd);
military->GetCountry()->FindCity(cityName)->AddBaseToCity(baseToAdd);
}
}
else cout << "ERROR: City named \"" << cityName << "\" does not exist, can't add base \"" << campName << "\" !" << endl<<endl;
}
else cout << "ERROR: Base Named \"" << campName << "\" is already exist in Military, can't create base!" << endl<<endl;
}
}
myfile.close();
}
else throw ExceptionMilitaryCampConfigFileFault(); /*cout << "ERROR: Unable to open MilitaryConfig file!"<< endl;*/
}
bool Country::IsCityExist(const string cityName){
map<string ,City*>::iterator itCities;
itCities = m_cities.find((string)cityName);
if (itCities != m_cities.end()) return true;
else return false;
}
void Shalishut::FixName(char* name){
int i;
name[0] = toupper(name[0]);
for (i=1 ; name[i] ; i++){
name[i] = tolower (name[i]);
}
}
}
The problem is that the program reads the cities and the roads, but when it reads the military camps I get:
" does not exist, can't add base "Hazerim" !
even though the config file contains a base with that name.
Reminder: in Visual Studio it works perfectly!
Assuming the error message is actually ERROR: City named _____ does not exist, can't add base "Hazerim" I would look carefully at the capitalization/spelling of the cities and city-for-base in your inputs. They probably don't match.
Also, using strtok on a std::string is just asking for trouble, as it's destructive and strings don't expect their internal state to be modified behind their back. There are methods like find_first_of that will help you parse C++ strings.
Like others have said:
double check line endings (maybe run dos2unix on input files in lieu of a more robust / error-prone solution)
make sure the case of everything is correct, file names are case sensitive
be aware of where it is looking for files, make sure everything is in the CWD
I'd advise not messing around with std::string internals. I don't know that it's legal, and it certainly could cause problems. Use .c_str() to get the C-style string and copy it to a char [], or use string functions to parse the input.
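One possible cause that fits the truncated error message is a Windows line ending: getline on Linux leaves the '\r' in the line, so it ends up at the end of the city name and the lookup fails. The sketch below shows how the line could be cleaned and split with string functions instead of strtok. Both function names are hypothetical, and the line is assumed to have the form camp,city with nothing after the city.

```cpp
#include <string>

// Removes a trailing '\r' left over from a Windows (CRLF) line ending.
void StripCarriageReturn(std::string& line) {
    if (!line.empty() && line.back() == '\r') {
        line.pop_back();
    }
}

// Splits "camp,city" at the first comma without touching string internals.
// Takes the line by value so the caller's string is left unmodified.
// Returns false when the line contains no comma.
bool SplitCampLine(std::string line, std::string& camp, std::string& city) {
    StripCarriageReturn(line);
    const std::string::size_type comma = line.find(',');
    if (comma == std::string::npos) return false;
    camp = line.substr(0, comma);
    city = line.substr(comma + 1);
    return true;
}
```

In ReadMilitrayCampConfig this would replace the two strtok calls, with campName becoming a std::string as well.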
To debug, put in some output statements so you can see what the string values are, or learn a bit about gdb and step through a short initialization run.
That cityName = (string)... is just plain ugly. Since you're not using cityName outside that scope, you can declare string cityName(...); there, so cityName will always be initialized and will be defined close to where it's used.