Fixed-Length Flat File Table Algorithms in C++

I am doing a simple project for table processing using flat files in C++. I have two types of files for accessing the table data:
1) Index file (employee.idx)
2) Table file (employee.tbl)
The index file holds the table's column details in tab-delimited format, i.e.,
Column-name Column-Type Column-Offset Column-Size
For example, employee.idx:
ename string 0 10
eage number 10 2
ecity string 12 10
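Since the offsets and sizes in the index file never change, one option is to convert them to integers once while loading the index, instead of keeping them as strings. A minimal sketch, assuming whitespace-separated fields (tabs work) and no spaces inside a column name; ColumnDef and loadIndex are hypothetical names, not from the original post:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One line of the index file: name, type, offset, size (hypothetical type).
struct ColumnDef {
    std::string name;
    std::string type;
    std::size_t offset = 0;
    std::size_t size = 0;
};

// Parses the index file, one column definition per line. Offsets and sizes
// are converted to integers here, once, so the record-loading loop does not
// need to call stoi for every field.
std::vector<ColumnDef> loadIndex(std::istream& in) {
    std::vector<ColumnDef> columns;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) continue;  // tolerate blank lines
        std::istringstream fields(line);
        ColumnDef col;
        // operator>> splits on any whitespace, including tabs
        if (fields >> col.name >> col.type >> col.offset >> col.size)
            columns.push_back(col);
    }
    return columns;
}
```

In a real program you would pass a std::ifstream opened on employee.idx.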
The table file holds the data in fixed-length format.
For example, employee.tbl:
first 25Address0001
second 31Address0002
Here is the algorithm I use in my program.
1) First, I load the index file data into a 2D vector of strings (the index vector) using fstream.
2) This is my code to load the table file data into a 2D vector:
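// Assumes: ifstream fsMain; vector<vector<string>> idxTable, mainTable;
//          vector<string> recordVec; string s;
while (getline(fsMain, s))
{
    // Cut each column out of the fixed-length line using the
    // offset (index column 2) and size (index column 3)
    for (size_t i = 0; i < idxTable.size(); i++)
    {
        int fieldOffset = stoi(idxTable[i].at(2));
        int fieldSize = stoi(idxTable[i].at(3));
        string data(s, fieldOffset, fieldSize);
        recordVec.push_back(trim_right_inplace(data));
    }
    mainTable.push_back(recordVec);  // store the finished record
    recordVec.clear();
}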
OK. Now my question is: is there a better way to load fixed-length data into memory? I tested this process with 60 tables of 200 records each, and it takes nearly 20 seconds. I want to load 100 tables of 200 records in under one second, but it takes much longer. How can I make this more efficient?
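One possible way to speed this up, sketched here (not from the original thread, and not benchmarked): parse the offsets and sizes once, read each table file in a single pass, and slice the fields out of the in-memory buffer, so stoi is no longer called for every field of every record. It assumes '\n' line endings; Field, trimRight and loadTable are hypothetical names, with trimRight standing in for the poster's trim_right_inplace.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Offset and size of one column, converted to integers once when the
// index file is loaded (hypothetical type).
struct Field {
    std::size_t offset;
    std::size_t size;
};

// Removes trailing spaces; a stand-in for the poster's trim_right_inplace.
std::string trimRight(std::string s) {
    s.erase(s.find_last_not_of(' ') + 1);  // npos + 1 wraps to 0: all spaces -> ""
    return s;
}

// Loads every '\n'-terminated fixed-length record into memory.
std::vector<std::vector<std::string>> loadTable(std::istream& in,
                                                const std::vector<Field>& fields) {
    // Read the whole file in one pass instead of line by line.
    const std::string data((std::istreambuf_iterator<char>(in)),
                           std::istreambuf_iterator<char>());
    std::vector<std::vector<std::string>> table;
    std::size_t pos = 0;
    while (pos < data.size()) {
        std::size_t end = data.find('\n', pos);
        if (end == std::string::npos) end = data.size();
        const std::size_t lineLen = end - pos;
        if (lineLen > 0) {  // skip blank lines
            std::vector<std::string> record;
            record.reserve(fields.size());
            for (const Field& f : fields) {
                if (f.offset >= lineLen) {  // short line: field is missing
                    record.emplace_back();
                    continue;
                }
                const std::size_t len = std::min(f.size, lineLen - f.offset);
                record.push_back(trimRight(data.substr(pos + f.offset, len)));
            }
            table.push_back(std::move(record));
        }
        pos = end + 1;
    }
    return table;
}
```

In practice you would pass a std::ifstream opened on employee.tbl. Profiling first is still worthwhile: 60 tables of 200 records is only 12,000 short records, so 20 seconds suggests the time may be going somewhere other than the field-slicing loop.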

Related

Read a file in by columns in C++

I have a file that contains numbers:
23 899 234 12
12 366 100 14
10 256 500 23
14 888 564 30
How can I read this file by column using C++? I searched YouTube, but the videos only read the file row by row. If I want to find the highest value in the first column, do I need to read it by columns?
Try initialising a variable called column = 0.
Then, for the current column, iterate over the rows with a for loop until the end of the column (the number of rows equals the number of lines in the file) and perform whatever operation you need. Then increment column to move on to the next column.
This way you can hold the values as n vectors,
where n is the number of rows
and the dimension of each vector is the length of a column.
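The file still has to be read row by row, but the values can be stored column by column as they are read. A minimal sketch, assuming every row holds the same number of whitespace-separated integers; readColumns is a hypothetical helper:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads whitespace-separated integers and stores them column by column:
// columns[c][r] is the value in column c of row r.
std::vector<std::vector<int>> readColumns(std::istream& in) {
    std::vector<std::vector<int>> columns;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream ss(line);
        int value;
        std::size_t col = 0;
        while (ss >> value) {
            if (col == columns.size()) columns.emplace_back();  // first value seen in this column
            columns[col].push_back(value);
            ++col;
        }
    }
    return columns;
}
```

With this layout, the highest value of the first column is just std::max_element(columns[0].begin(), columns[0].end()).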
Files are sequential-access storage, so in general you have to read everything; it's essentially like reading from a tape, and the format of your file doesn't offer any shortcuts.
However, if the file is on permanent storage, you can fast-forward and rewind within it using seekg(). This isn't efficient, and you have to know the position you want to go to. If your records are all the same size, you can advance by a fixed number of bytes.
This is usually done with binary formats, which are designed to include some kind of directory or other auxiliary data to help locate the right position.
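For files whose records all have the same size (like employee.tbl in the first question), the fixed-size jump can be sketched with seekg. This assumes every record is exactly recordSize bytes including its line terminator, and that a file stream is opened with std::ios::binary so byte offsets are exact; readRecord is a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Returns record number `index` (0-based) from a stream of fixed-size
// records, or an empty string if that record is past the end of the stream.
std::string readRecord(std::istream& in, std::size_t index, std::size_t recordSize) {
    in.clear();  // reset EOF/fail flags left by an earlier read
    in.seekg(static_cast<std::streamoff>(index * recordSize), std::ios::beg);
    std::string record(recordSize, '\0');
    if (!in.read(&record[0], static_cast<std::streamsize>(recordSize)))
        return "";
    return record;
}
```

Note that a text file written on Windows typically ends lines with "\r\n", which adds one byte to every record.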
Read each line, split it on spaces, and store the first value as the highest value so far. Repeat this for every row until you reach the end of the file, comparing each row's first value with the stored highest value.
I don't have a C++ compiler available to test this, but it should work conceptually.
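#include <fstream>
#include <iostream>
#include <limits>
#include <sstream>
#include <string>
using namespace std;
int main() {
    ifstream inFile("Read.txt");
    string line;
    // Start below any possible value so negative numbers also work
    int greatest = numeric_limits<int>::min();
    while (getline(inFile, line)) {
        stringstream ss(line);
        int firstColumn;
        // Read only the first value on the line (blank lines are skipped)
        if (ss >> firstColumn && firstColumn > greatest)
            greatest = firstColumn;
    }
    // Print once, after the whole file has been scanned
    cout << greatest << endl;
}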

Regarding writing to a CSV file

I have a CSV which must be read and have duplicate values removed before it gets written.
A row counts as a duplicate based on two columns, date and price (an AND condition). Therefore, in the example below, rows 1, 2, and 4 would be written to the CSV. Row 3 qualifies as a duplicate (its date and price match row 1) and would be excluded (not written to the CSV).
address floor date price
40 B STREET 18 3/29/2015 2200000
40 B STREET 23 1/7/2015 999000
40 B STREET 18 3/29/2015 2200000
40 B STREET 18 4/29/2015 2200000
You can use a map of date and price to find duplicates. I am not sharing the complete code, but this should give you a pointer on how to do what you want.
1) Create a map from each date to the set of prices already seen on that date (a single price per date would miss duplicates when one date has several prices).
std::map<std::string, std::set<long>> dataMap;  // needs <map>, <set>, <string>
2) Read a line from the CSV and look up its date in dataMap. If the date is found and its set already contains the price, the record is a duplicate and you should ignore it. Otherwise, remember the price under that date and write the record.
// Read a line from the CSV and parse it into the
// variables below. This should be inside a loop that
// fetches one line from the CSV per iteration and
// continues until the end of the input CSV file.
int floorValue;
std::string addressValue;
std::string dateValue;
long priceValue;
// The benefit of using a map is seen here: a particular
// date can be found in O(log n) time.
auto it = dataMap.find(dateValue);
if (it != dataMap.end() && it->second.count(priceValue) > 0)
{
    // Same date and same price: duplicate, ignore this record.
}
else
{
    // Remember this (date, price) combination.
    dataMap[dateValue].insert(priceValue);
    // Write this entry into another CSV file. You can
    // later rename it to the original CSV file, which
    // gives the impression that the duplicate entries
    // have been removed from the original CSV file.
}
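A complete, self-contained variant of this idea, sketched under assumptions not stated in the question: the file really is comma-separated with columns address,floor,date,price, the first line is a header, and no field contains quoted commas. Instead of a map it keeps a std::set of (date, price) pairs, whose insert() reports whether the pair was new; splitCsv and removeDuplicates are hypothetical names.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Splits one CSV line on commas (no support for quoted fields).
std::vector<std::string> splitCsv(const std::string& line) {
    std::vector<std::string> fields;
    std::istringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ','))
        fields.push_back(field);
    return fields;
}

// Copies `in` to `out`, dropping every row whose (date, price) pair has
// already been written. Columns assumed: address,floor,date,price.
void removeDuplicates(std::istream& in, std::ostream& out) {
    std::set<std::pair<std::string, std::string>> seen;
    std::string line;
    if (std::getline(in, line)) out << line << '\n';  // copy the header row
    while (std::getline(in, line)) {
        std::vector<std::string> f = splitCsv(line);
        if (f.size() < 4) continue;  // skip malformed rows
        // insert(...).second is false when the pair is already present
        if (seen.insert({f[2], f[3]}).second)
            out << line << '\n';
    }
}
```

To rewrite a real file, pass a std::ifstream for the input and a std::ofstream for a temporary output file, then rename the output over the original.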

C++ --- leading zeros in filenames (CSV-files) --- storing/splitting values of a std::vector

I want to store the values of a std::vector<T> into a bunch of csv-files.
The values should be split so that N of them (with N < std::vector<T>.size()) are stored in each csv-file, except the last file, which can store between 1 and N values.
Assuming std::vector<T>.size() == 113 and N == 10, this results in
00.csv
01.csv
.
.
.
11.csv (with only 3 elements)
Is there an easy way to do this?
To generate the filename, use sprintf(fileName, "%02d.csv", fileNumber); (snprintf, which also takes the buffer size, is the safer variant).
See http://www.cplusplus.com/reference/cstdio/sprintf/
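Building on that, a sketch of the whole task: split the vector into chunks of N, then write each chunk to a zero-padded file name. It uses std::setw and std::setfill instead of sprintf; splitIntoChunks, chunkFileName and writeChunks are hypothetical names, and one value per line is an assumed file layout.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Splits `values` into consecutive chunks of at most n elements;
// only the last chunk can be shorter than n.
template <typename T>
std::vector<std::vector<T>> splitIntoChunks(const std::vector<T>& values, std::size_t n) {
    assert(n > 0);
    std::vector<std::vector<T>> chunks;
    for (std::size_t start = 0; start < values.size(); start += n) {
        std::size_t end = std::min(start + n, values.size());
        chunks.emplace_back(values.begin() + start, values.begin() + end);
    }
    return chunks;
}

// Builds a zero-padded file name, e.g. index 7 with width 2 -> "07.csv".
std::string chunkFileName(std::size_t index, int width = 2) {
    std::ostringstream name;
    name << std::setw(width) << std::setfill('0') << index << ".csv";
    return name.str();
}

// Writes each chunk to its own file in the current directory, one value per line.
template <typename T>
void writeChunks(const std::vector<T>& values, std::size_t n) {
    const auto chunks = splitIntoChunks(values, n);
    for (std::size_t i = 0; i < chunks.size(); ++i) {
        std::ofstream file(chunkFileName(i));
        for (const T& v : chunks[i]) file << v << '\n';
    }
}
```

For the example in the question, writeChunks(values, 10) on a 113-element vector would create 00.csv through 11.csv, with 3 values in the last file.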

Reading a file of numbers into an array while skipping the first two values of every 1026 entries

I am trying to read in a text file of numbers in which each group starts with 2 values that I do not care about, followed by 1024 values that I do care about. The file has approximately 100000 entries, and I need to do a calculation on every 1024 of them. The format is something like:
1
1025
3000
3572
3579
4023
3593
2930
.
.
.
1
1025
.
.
.
So basically, the 1 and the 1025 are header values describing the data set, which I need to ignore; then I need to read every value after those header values into an array so I can run calculations on the values in the array. I was thinking of using while(!file.eof()), but I cannot think of how to make the code skip those two numbers while it reads through the 100000 entries. I am pretty new to C++ (I usually use GUIs to do my data analysis), but I am on a project that requires me to use C++, so I'm really out of my comfort zone here. I appreciate any help I can get.
There are a lot of ways you can do it. The most straightforward example I could think of was:
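#include <iostream>
#include <string>
int main()
{
    int i = 0;  // position within the current block of 1026 values
    std::string s;
    while( std::cin >> s )
    {
        if( i++ < 2 ) continue;   // skip the two header values
        std::cout << s << '\n';   // one of the 1024 data values
        if( i == 1026 ) i = 0;    // block done: 2 headers + 1024 values
    }
}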
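If the goal is to run a calculation on each block rather than print the values, the same skip-the-headers idea can fill an array per block. A sketch, assuming whitespace-separated numbers and using the mean as a placeholder calculation; blockMeans is a hypothetical helper whose blockSize and headerCount parameters default to the 1024 and 2 from the question:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <vector>

// Reads consecutive blocks of (headerCount ignored values + blockSize data
// values) and returns the mean of each block's data. An incomplete final
// block is ignored.
std::vector<double> blockMeans(std::istream& in, std::size_t blockSize = 1024,
                               std::size_t headerCount = 2) {
    assert(blockSize > 0);
    std::vector<double> means;
    std::vector<double> block(blockSize);
    while (true) {
        // Skip the header values (e.g. "1" and "1025") that start each block.
        double ignored;
        std::size_t skipped = 0;
        while (skipped < headerCount && in >> ignored) ++skipped;
        if (skipped < headerCount) break;  // no more blocks
        // Read this block's data values into the array.
        std::size_t count = 0;
        while (count < blockSize && in >> block[count]) ++count;
        if (count < blockSize) break;  // incomplete final block
        double sum = 0.0;
        for (double v : block) sum += v;
        means.push_back(sum / static_cast<double>(blockSize));
    }
    return means;
}
```

Checking the result of in >> value in the loop condition also avoids the usual while(!file.eof()) pitfall, where the loop body can run one extra time after the last value has been read.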

Reading from a file with multiple columns of integers and putting them into arrays

I am creating a command-line minesweeper game which has a save and continue capability. My code generates a file called "save.txt" which stores the position of the mines and the cells that the player has opened. It is separated into two columns delimited by a space where the left column represents the row of the cell and the right column represents the column of the cell in the matrix generated by my code. The following is the contents of save.txt after a sample run:
3 7
3 9
5 7
6 7
8 4
Mine end
2 9
1 10
3 5
1 1
Cell open end
You may have noticed "Mine end" and "Cell open end". These two lines separate the numbers into two groups: the first holds the positions of the mines, and the second holds the positions of the cells opened by the player. I have written code that fills an array for each column, provided the text file contains only integers:
int arrayRow[9];
int arrayCol[9];
ifstream infile("save.txt");
int a, b;
int count = 0;
// Store each (row, col) pair in the next free slot, up to 9 pairs
while (count < 9 && infile >> a >> b) {
    arrayRow[count] = a;
    arrayCol[count] = b;
    count++;
}
As you can see, this won't quite work with my text file, since it contains non-integer text. Basically, I want to create four arrays, mineRow, mineCol, openedRow, and openedCol, as described in the first paragraph.
Aside from parsing the strings yourself and doing string operations, you could redefine the file format to have a header. Then you can parse the file once and keep everything as numbers. Namely:
Let the Header be the first two rows
Row 1 = mineRowLen mineColLen
Row 2 = openedRowLen openedColLen
Row 3...N = data
save.txt:
40 30
20 10
// rest of the entries
Then you just read 40 values for mineRow, 30 for mineCol, 20 for openedRow, and 10 for openedCol, since you know their lengths. This may be harder to debug, but it would hide the save state better and discourage easy modification of it.
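You can read the file line by line.
If the line matches "Mine end" or "Cell open end", skip it (continue;), switching from the mine arrays to the opened-cell arrays after "Mine end".
Otherwise, split the line on the space (" ") and append the two numbers to the current pair of arrays.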
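The line-by-line approach can be sketched as follows. Because the number of mines and opened cells is not known in advance, it uses std::vector instead of fixed-size arrays; SaveData and loadSave are hypothetical names, and the section markers are matched exactly as they appear in save.txt.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for the four arrays the question asks for.
struct SaveData {
    std::vector<int> mineRow, mineCol, openedRow, openedCol;
};

// Reads "row col" lines; "Mine end" closes the mine section and
// "Cell open end" closes the opened-cell section.
SaveData loadSave(std::istream& in) {
    SaveData save;
    bool inMineSection = true;  // lines before "Mine end" are mines
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
        if (line == "Mine end") { inMineSection = false; continue; }
        if (line == "Cell open end") break;
        std::istringstream ss(line);
        int row, col;
        if (!(ss >> row >> col)) continue;  // skip blank or malformed lines
        if (inMineSection) {
            save.mineRow.push_back(row);
            save.mineCol.push_back(col);
        } else {
            save.openedRow.push_back(row);
            save.openedCol.push_back(col);
        }
    }
    return save;
}
```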