I wrote a program to do an external mergesort on a file of 100,000 doubles. I couldn't quickly find any external storage libraries for C++ because googling it just leads to a bunch of pages about the extern keyword, so I decided to write my own, and I think that's where the problem is.
The program actually works, except for a couple of details. The output file will have all of the doubles in sorted order, but at the end of the file are 30 lines of
-9.2559631349317831e+061
which is not in the input file. I also have 21 more values in the output file than in the input file, not counting the 30 lines of the single number I just mentioned.
The program reads the 100,000 doubles ~4000 lines at a time, sorts each chunk, and stores the chunks in 26 text files. Those 26 files are then merged into 13 files, those 13 into 7, and so on, until there is only one file.
I'm sorry if the code is really ugly, I figured out all of the external storage stuff on my own by pencil, paper, trial, and error. The program is not going to be used for anything. I haven't cleaned it up yet. The driver doesn't do much other than call these methods.
//reads an ifstream file and stores the data in a deque. returns a bool indicating if the file has not reached EOF
bool readFile(ifstream &file, deque<DEQUE_TYPE> &data){
double d;
for(int i = 0; i < DEQUE_SIZE && file.good(); i++){
file >> d;
data.push_back(d);
}
return file.good();
}
//opens a file with the specified filename and prints the contents of the deque to it. if append is true, the data will be appended to the file, else it will be overwritten
void printFile(string fileName, deque<DEQUE_TYPE> &data, bool append){
ofstream outputFile;
if(append)
outputFile.open(fileName, ios::app);
else
outputFile.open(fileName);
outputFile.precision(23);
while(data.size() > 0){
outputFile << data.front() << endl;
data.pop_front();
}
}
//merges the sortfiles until there is one file left
void mergeFiles(){
ifstream inFile1, inFile2;
ofstream outFile;
string fileName1, fileName2;
int i, k, max;
deque<DEQUE_TYPE> data1;
deque<DEQUE_TYPE> data2;
bool fileGood1, fileGood2;
i = 0;
k = 0;
max = 25;
while(max > 1){
fileName1 = ""; fileName1 += "sortfile_"; fileName1 += to_string(i); fileName1 += ".txt";
fileName2 = ""; fileName2 += "sortfile_"; fileName2 += to_string(i+1); fileName2 += ".txt";
try{
inFile1.open(fileName1);
inFile2.open(fileName2);
} catch(int e){
cout << "Could not open the open the files!\nError " << e;
}
fileGood1 = true;
fileGood2 = true;
while(fileGood1 || fileGood2){
fileGood1 = readFile(inFile1, data1);
fileGood2 = readFile(inFile2, data2);
data1 = merge(data1, data2);
printFile("temp", data1, true);
data1.clear();
}
inFile1.close();
inFile2.close();
remove(fileName1.c_str());
remove(fileName2.c_str());
fileName1 = ""; fileName1 += "sortfile_"; fileName1 += to_string(k); fileName1 += ".txt";
rename("temp", fileName1.c_str());
i = i + 2;
k++;
if(i >= max){
max = max / 2 + max % 2;
i = 0;
k = 0;
}
}
}
//merge function
deque<double> merge(deque<double> &left, deque<double> &right){
deque<double> result;
while(left.size() > 0 || right.size() > 0){
if (left.size() > 0 && right.size() > 0){
if (left.front() <= right.front()){
result.push_back(left.front());
left.pop_front();
}
else{
result.push_back(right.front());
right.pop_front();
}
}
else if(left.size() > 0){
result.push_back(left.front());
left.pop_front();
}
else if(right.size() > 0){
result.push_back(right.front());
right.pop_front();
}
}
return result;
}
I sorted a file of 26 numbers (0 - 25), as ThePosey suggested, and here are the results:
-9.2559631349317831e+061 (47 lines of this)
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
25
25
25
25
25
So I'm pretty sure the last number of the file is being duplicated, but I'm still not sure what causes the 47 occurrences of the random large number. I checked, and the last number of the 100,000-number file is only in the output file twice, not 22 times, so I think I have 11 separate last numbers being duplicated.
I don't know if this is the whole problem or not, but you have a classic error in your input loop. file.good() doesn't guarantee that the next read will succeed, it only tells you that the previous one did. Try restructuring it like this:
for(int i = 0; i < DEQUE_SIZE && (file >> d); i++){
data.push_back(d);
}
The expression file >> d returns a reference to file, and evaluating that reference as a boolean checks !file.fail(), so the loop body only runs when the read actually succeeded.
Is there a reason why you can't use a few megs of memory to read the entire list into RAM and sort it all at once? It would simplify your program a lot. If you are doing this as a challenge, I would start by shrinking the problem to, say, 1 file of 100 doubles split into 4 reads of 25 doubles each; then it should be very easy to trace through and see where the additional lines are coming from.
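Here is a minimal sketch of the restructured read loop, written against std::istream so it can be tried on an in-memory stream. The name readChunk and the chunkSize parameter are placeholders of mine, not identifiers from the question. In the original loop the push_back runs even when the extraction failed, so whatever happens to be in d gets stored; that would be consistent with both the duplicated last value and the garbage number, though I haven't run the asker's program to confirm it.

```cpp
#include <cstddef>
#include <deque>
#include <istream>
#include <sstream>  // only needed for the in-memory usage example

// Reads up to chunkSize doubles from in and appends them to data.
// A value is pushed only after its extraction succeeded, so a failed
// read at end of file never adds a stale or uninitialized value.
// Returns true if the stream is still readable afterwards.
bool readChunk(std::istream& in, std::deque<double>& data, std::size_t chunkSize) {
    double d;
    for (std::size_t i = 0; i < chunkSize && (in >> d); ++i) {
        data.push_back(d);
    }
    return in.good();
}
```

For example, calling readChunk twice with chunkSize 2 on a std::istringstream holding "1.5 2.5 3.5\n" stores two values on the first call and exactly one on the second, with no extra entry at the end.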
Assuming your files are in text format, you can use std::merge to do an external merge just as well as an internal one, by using std::istream_iterators.
std::ifstream in1("temp1.txt");
std::ifstream in2("temp2.txt");
std::ofstream out("output.txt");
std::merge(std::istream_iterator<double>(in1),
std::istream_iterator<double>(),
std::istream_iterator<double>(in2),
std::istream_iterator<double>(),
std::ostream_iterator<double>(out, "\n"));
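The same idea can be wrapped in a small function, sketched below under the assumption that both inputs are already sorted and contain whitespace-separated doubles. The name mergeSorted is hypothetical, and the max_digits10 precision is my addition so that doubles survive being written to text and read back; the question used precision(23) instead. Because std::merge pulls one value at a time from each stream, the size of the files never has to fit in memory.

```cpp
#include <algorithm>
#include <iomanip>
#include <istream>
#include <iterator>
#include <limits>
#include <ostream>
#include <sstream>  // only needed for the in-memory usage example

// Merges two sorted streams of doubles into out, one value per line.
void mergeSorted(std::istream& a, std::istream& b, std::ostream& out) {
    // Enough digits for a double to round-trip through text.
    out << std::setprecision(std::numeric_limits<double>::max_digits10);
    std::merge(std::istream_iterator<double>(a), std::istream_iterator<double>(),
               std::istream_iterator<double>(b), std::istream_iterator<double>(),
               std::ostream_iterator<double>(out, "\n"));
}
```

It takes std::istream and std::ostream references, so it works with std::ifstream/std::ofstream for the real sort files and with string streams for a quick check.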
I am trying to pull a series of video game console names from a text file. The text file reads as such,
Nintendo Entertainment System
57
Sega Genesis
34
Microsoft Xbox 360
58
Sony PlayStation 4
261
Atari 2600
26
Nintendo Game Cube
52
So what I set up was a do-while loop to save the names in an array, then their prices, then repeat. The problem seems to be that it gets the first console name and the first price, but when it repeats it fails: no more names load and the prices go to -858993460.
This is my loadArrays function -
void loadArrays(string consoleNames[], int consolePrices[], int& size)
{
ifstream dataFile("prices.txt");
int i = 0;
int cont = 1;
do
{
getline(dataFile, consoleNames[i]);
dataFile >> consolePrices[i];
i++;
size += 1;
/*if (consolePrices = 0)
{
size = size - 1;
for (int j = 0; i < size; j++)
{
consoleNames[j] = consoleNames[j + 1];
consolePrices[j] = consolePrices[j + 1];
}
cont = 0;
}*/
if (size == 6)
break;
} while (cont == 1);
}
And this is what I get back
[Screenshot: Failed Output]
The problem could be the getline not moving to the next string properly?
dataFile >> consolePrices[i]; on the first iteration leaves \n in the istream.
Then getline(dataFile, consoleNames[i]); extracts that newline on the second iteration and you get an empty string. After that, dataFile >> consolePrices[i]; tries to read an int but meets the string Sega Genesis, so it does not read a valid price into consolePrices[i] and puts the istream into a failed state. Further iterations can't read from the failed istream.
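One way to fix it, sketched here with hypothetical names (loadRecords, capacity), is to throw away the rest of the price line after each numeric read and to let the read operations themselves control the loop. I'm assuming the file strictly alternates a name line and a price line, as in the sample.

```cpp
#include <istream>
#include <limits>
#include <sstream>  // only needed for the in-memory usage example
#include <string>

// Reads alternating "name line / price line" records from in.
// Stops at capacity or at the first incomplete record and returns
// the number of complete records stored.
int loadRecords(std::istream& in, std::string names[], int prices[], int capacity) {
    int count = 0;
    while (count < capacity
           && std::getline(in, names[count])
           && (in >> prices[count])) {
        // Discard the rest of the price line, including its '\n',
        // so the next getline starts at the next console name.
        in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
        ++count;
    }
    return count;
}
```

With an std::ifstream opened on prices.txt and arrays of size 6, the return value can be used as size instead of counting inside the loop and breaking at a hard-coded 6.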
I'm trying to find the fastest way to read numbers from a file. There can be negative numbers. My example input:
5 3
-5 -6 2 -1 4
1 2 3 4
4 3 2 1
I'm using:
getline(cin, line);
istringstream linestream(line);
linestream >> var;
The result is okay, but my program gets a runtime error on the last test, which probably has at least 100,000 numbers. My question is: is there a faster way to read a line and split it into numbers than my solution? Time is the most important factor.
If there's only numbers in your input, you could do:
std::vector<int> numbers;
int i;
while(cin >> i) {
numbers.push_back(i);
}
To stop the input from cin you'll need to send the EOF (End Of File) signal, which is either Ctrl+D or Ctrl+Z depending on your OS.
The input from a file will automatically stop when the end of the file is reached.
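If speed matters, a common tweak is to turn off synchronization with C stdio before reading. The sketch below assumes the program uses only C++ streams for I/O; useFastStreams and readInts are names I made up for illustration, and how much time this saves depends on the standard library.

```cpp
#include <iostream>
#include <sstream>  // only needed for the in-memory usage example
#include <vector>

// Call once at the start of main, before any input or output.
void useFastStreams() {
    std::ios::sync_with_stdio(false);  // stop syncing with C stdio
    std::cin.tie(nullptr);             // don't flush cout before each read
}

// Reads whitespace-separated ints until extraction fails (e.g. at EOF).
std::vector<int> readInts(std::istream& in) {
    std::vector<int> numbers;
    int value;
    while (in >> value) {
        numbers.push_back(value);
    }
    return numbers;
}
```

In the real program that would be useFastStreams(); followed by readInts(std::cin). Line breaks are treated like any other whitespace, so this fits when the counts on the first line are enough to know where each row ends.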
See c++ stringstream is too slow, how to speed up?
For your runtime error, you didn't post compilable code, and your error is in what you didn't post.
The best approach is to write a function that reads the file line by line and puts each line's elements into an array (if you are just printing them, print them and don't store them in an array). I am using C functions instead of C++ streams because they are faster for large data. The function should use fgetc, which is faster than fscanf for large data. If fgetc_unlocked works on your system, you should use it in place of fgetc.
-5 -6 2 -1 4
1 2 3 4
Assume the input is like the above and is stored in input.txt. Create input.txt in your directory and run the following code from the same directory. You can later change how the numbers are used.
#include <cstdio>
#include <iostream>
using namespace std;
#define GC fgetc // replace with fgetc_unlocked if it works on your system (Linux)
// Reads one line of f and puts all of its integers into A;
// len is set to the number of elements filled.
void getNumsFromLine(FILE* f, int* A, int& len) {
    int ch = GC(f); // int rather than char, so EOF can be detected
    len = 0;
    while (ch != '\n' && ch != EOF) {
        // skip separators, but never run past the end of the line
        while (ch != '-' && (ch > '9' || ch < '0') && ch != '\n' && ch != EOF) ch = GC(f);
        if (ch == '\n' || ch == EOF) break;
        bool neg = false;
        if (ch == '-') {
            neg = true;
            ch = GC(f);
        }
        int n = 0;
        while (ch >= '0' && ch <= '9') {
            n = n * 10 + (ch - '0');
            ch = GC(f);
        }
        A[len++] = neg ? -n : n;
    }
}
int main() {
    FILE* f = fopen("input.txt", "r");
    if (!f) return 1; // input.txt is missing
    int A[2][10], len; // 2 lines, at most 10 numbers per line (not bounds-checked)
    for (int i = 0; i < 2; i++) {
        getNumsFromLine(f, A[i], len);
        for (int j = 0; j < len; j++) cout << A[i][j] << " ";
        cout << endl;
    }
    fclose(f);
    return 0;
}
I have hit a brick wall trying to format one of my files. I have a file that I have formatted to look like this:
0 1 2 3 4 5
0.047224 0.184679 -0.039316 -0.008939 -0.042705 -0.014458
-0.032791 -0.039254 0.075326 -0.000667 -0.002406 -0.010696
-0.020048 -0.008680 -0.000918 0.302428 -0.127547 -0.049475
...
6 7 8 9 10 11
[numbers as above]
12 13 14 15 16 17
[numbers as above]
...
Each block of numbers has exactly the same number of lines. What I am trying to do is basically move every block (including the headers) to the right of the first block so in the end my output file would look like this:
0 1 2 3 4 5 6 7 8 9 10 11 ...
0.047224 0.184679 -0.039316 -0.008939 -0.042705 -0.014458 [numbers] ...
-0.032791 -0.039254 0.075326 -0.000667 -0.002406 -0.010696 [numbers] ...
-0.020048 -0.008680 -0.000918 0.302428 -0.127547 -0.049475 [numbers] ...
...
So in the end I should basically get an nxn matrix (only considering the numbers). I already have a Python/bash hybrid script that can format this file exactly like this, BUT I've switched from running my code on Linux to Windows and hence cannot use the bash part of the script anymore (since my code has to be compliant with all versions of Windows). To be honest I have no idea how to do it, so any help would be appreciated!
Here's what I tried until now (it's completely wrong I know but maybe I can build on it...):
void finalFormatFile()
{
ifstream finalFormat;
ofstream finalFile;
string fileLine = "";
stringstream newLine;
finalFormat.open("xxx.txt");
finalFile.open("yyy.txt");
int countLines = 0;
while (!finalFormat.eof())
{
countLines++;
if (countLines % (nAtoms*3) == 0)
{
getline(finalFormat, fileLine);
newLine << fileLine;
finalFile << newLine.str() << endl;
}
else getline(finalFormat, fileLine);
}
finalFormat.close();
finalFile.close();
}
For such a task, I would do it the simple way. As we already know the number of lines per block and the pattern, I would simply keep a vector of strings (one entry per line of the final file) and update it while parsing the input file. Once that's done, I would iterate through the strings and print them to the final file. Here is some code that does it:
#include <iostream>
#include <string>
#include <fstream>
#include <vector>
int main(int argc, char * argv[])
{
int n = 6; // n = 3 * nAtoms
std::ifstream in("test.txt");
std::ofstream out("test_out.txt");
std::vector<std::string> lines(n + 1);
std::string line("");
int cnt = 0;
// Reading the input file
while(getline(in, line))
{
lines[cnt] = lines[cnt] + " " + line;
cnt = (cnt + 1) % (n + 1);
}
// Writing the output file
for(unsigned int i = 0; i < lines.size(); i ++)
{
out << lines[i] << std::endl;
}
in.close();
out.close();
return 0;
}
Note that, depending on the structure of your input/output files, you might want to adjust the line lines[cnt] = lines[cnt] + " " + line in order to separate the columns with the right delimiter.
When I use C++ to process a file, I find there is always a blank line at the end of the file. Someone says that vim appends a '\n' at the end of the file, but when I use gedit I have the same problem. Can anyone tell me the reason?
#include<iostream>
#include<fstream>

using namespace std;
const int K = 10;
int main(){
    string arr[K];
    ifstream infile("test1");
    int L = 0;
    while(!infile.eof()){
        getline(infile, arr[(L++)%K]);
    }
    //line
    int start,count;
    if (L < K){
        start = 0;
        count = L;
    }
    else{
        start = L % K;
        count = K;
    }
    cout << count << endl;
    for (int i = 0; i < count; ++i)
        cout << arr[(start + i) % K] << endl;
    infile.close();
    return 1;
}
The test1 file contains just:
abcd
but the program output is:
2
abcd
(followed by a blank line)
while(!infile.eof())
infile.eof() only is true after you tried to read beyond the end of the file. So the loop tries to read one more line than there is and gets an empty line on that attempt.
It's a matter of order: you're reading, assigning, and only afterwards checking.
You should change your code a little so that it reads, checks, and then assigns:
std::string str;
while (getline(infile, str)) {
arr[(L++)%K] = str;
}
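Putting the corrected loop together with the ring buffer from the question gives something like the following. This is only a sketch: lastLines is a name I picked, and it takes any std::istream so it can be tried without a file.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>  // only needed for the in-memory usage example
#include <string>
#include <vector>

// Returns the last k lines of in, oldest first.
std::vector<std::string> lastLines(std::istream& in, std::size_t k) {
    std::vector<std::string> result;
    if (k == 0) return result;

    std::vector<std::string> ring(k);
    std::size_t total = 0;
    std::string line;
    // getline is the loop condition, so a failed read never stores a line.
    while (std::getline(in, line)) {
        ring[total % k] = line;
        ++total;
    }

    std::size_t count = total < k ? total : k;
    std::size_t start = total < k ? 0 : total % k;
    for (std::size_t i = 0; i < count; ++i) {
        result.push_back(ring[(start + i) % k]);
    }
    return result;
}
```

For a file containing just abcd and a trailing newline, this returns one line rather than two, because the read that hits end of file is never counted.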
http://www.parashift.com/c++-faq-lite/istream-and-eof.html
How to determine whether it is EOF when using getline() in c++
I have written code to read the file below, but it's not working.
Input file:
2 1 16
16 0 0
1 1 1234
16 0 0
1 1 2345
code is:
std::ifstream input_file;
evl_wire wire;
int num_pins,width,cycles,no;
std::vector<int>IP;
while(input_file)
{
input_file >> num_pins;//num_pins=2
if (pins_.size() != num_pins) return false;
for (size_t i = 0; i < pins_.size(); ++i)
{
input_file >> width;//width=1 for 1=0 ,=16 for i=2
if (wire.width != width) return false;
pins_[i]->set_as_output();
}
for (size_t i = 1; i < file_name.size(); i=i+1)
input_file>>cycles;
input_file>>no;
pins_=IP;
}
where std::vector<pin *> pins_; is in the gate class and void set_as_output(); is in the pin class.
In the first line, 2 is the number of pins, 1 is the width of the first pin, and 16 is the width of the second pin.
From the second line on, the first number is a cycle count: 16 is the number of cycles the pins must remain at 0 0, and for the next 1 cycle the pins must be assigned 1 and 1234 as inputs.
Some parts of your code are almost certainly wrong. Other parts I'm less certain about -- they don't make much sense to me, but maybe I'm just missing something.
while(input_file)
This is almost always a mistake. It won't sense the end of the file until after an attempt at reading from the file has failed. In a typical case, your loop will execute one more iteration than intended. What you probably want is something like:
while (input_file >> num_pins)
This reads the data (or tries to, anyway) from the file, and exits the loop if that fails.
if (pins_.size() != num_pins) return false;
This is less clear. It's not at all clear why we'd read num_pins from the file if we already know what value it needs to be (and the same seems to be true with width vs. wire.width).
for (size_t i = 1; i < file_name.size(); i=i+1)
input_file>>cycles;
This strikes me as the most puzzling part of all. What does the size of the string holding the file name have to do with anything? This has me fairly baffled.
I don't fully understand your code, but I don't see where you open the input file. I think it should be:
std::ifstream input_file;
evl_wire wire;
int num_pins,width,cycles,no;
std::vector<int>IP;
input_file.open("name of the file");
if(input_file.is_open())
{
while(input_file >> num_pins) //num_pins=2
{
if (pins_.size() != num_pins) return false;
for (size_t i = 0; i < pins_.size(); ++i)
{
input_file >> width;//width=1 for 1=0 ,=16 for i=2
if (wire.width != width) return false;
pins_[i]->set_as_output();
}
for (size_t i = 1; i < file_name.size(); i=i+1)
input_file>>cycles;
input_file>>no;
pins_=IP;
}
input_file.close();
}
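Going only by the description of the file in the question, a reader for that layout might look like the sketch below. Everything here is an assumption on my part: the Step and Stimulus types and parseStimulus are invented names, I'm assuming the first line is the pin count followed by one width per pin and every later line is a cycle count followed by one value per pin, and I'm reading the values as plain decimal ints. It deliberately leaves out the gate/pin classes, which aren't shown in full.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>  // only needed for the in-memory usage example
#include <vector>

struct Step {
    int cycles;               // how many cycles the values are held
    std::vector<int> values;  // one value per pin
};

struct Stimulus {
    std::vector<int> widths;  // one width per pin
    std::vector<Step> steps;
};

// Returns true only if the whole stream was parsed without errors.
bool parseStimulus(std::istream& in, Stimulus& out) {
    int numPins;
    if (!(in >> numPins) || numPins < 0) return false;

    out.widths.resize(static_cast<std::size_t>(numPins));
    for (int& w : out.widths) {
        if (!(in >> w)) return false;
    }

    out.steps.clear();
    Step step;
    // The read is the loop condition, so no extra iteration at EOF.
    while (in >> step.cycles) {
        step.values.resize(static_cast<std::size_t>(numPins));
        for (int& v : step.values) {
            if (!(in >> v)) return false;  // truncated line
        }
        out.steps.push_back(step);
    }
    return in.eof();  // false if we stopped on something unparsable
}
```

Checks such as comparing the pin count against pins_.size() could then be done on the parsed Stimulus after the read, which keeps the file handling separate from the validation.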
The function I used:
bool input::validate_structural_semantics()
{
evl_wire wire;
std::ifstream input_file;std::string line;
int x[]={1000};
for (int line_no = 1; std::getline(input_file, line); ++line_no)
std::string s; int i=0;
std::istringstream iss;
do
{
std::string sub;
iss >> sub;
x[i]=atoi(sub.c_str());
i++;
}
while (iss);
if (pins_.size()!=x[0]) return false;
for (size_t i = 0; i < pins_.size(); ++i)
{
if (wire.width != x[i+1]) return false;
pins_[i]->set_as_input();
}
for(size_t i=4;i<1000;i++)
{
for(size_t j=0;j<pins_.size();j++)
pins_.assign(x[i-1],x[i+j]);
}
return true;
}
This implementation uses arrays, but it didn't work, although there isn't any compilation error.