Trouble getting string to print random line from text file

Trouble getting string to print random line from text file - c++

I picked up this bit of code a while back as a way to select a random line from a text file and output the result. Unfortunately, it only seems to output the first letter of the line that it selects and I can't figure out why its doing so or how to fix it. Any help would be appreciated.
#include "stdafx.h"
#include <stdio.h>
#include <iostream>
#include <fstream>
#include <string>
#include <time.h>
using namespace std;
#define MAX_STRING_SIZE 1000
string firstName()
{
string firstName;
char str[MAX_STRING_SIZE], pick[MAX_STRING_SIZE];
FILE *fp;
int readCount = 0;
fp = fopen("firstnames.txt", "r");
if (fp)
{
if (fgets(pick, MAX_STRING_SIZE, fp) != NULL)
{
readCount = 1;
while (fgets (str, MAX_STRING_SIZE, fp) != NULL)
{
if ((rand() % ++readCount) == 0)
{
strcpy(pick, str);
}
}
}
}
fclose(fp);
firstName = *pick;
return firstName;
}
int main()
{
srand(time(NULL));
int n = 1;
while (n < 10)
{
string fn = firstName();
cout << fn << endl;
++n;
}
system("pause");
}

firstName = *pick;
I am guessing this is the problem.
pick here is essentially a pointer to the first element of the array, char*, so of course *pick is of type char.. or the first character of the array.
Another way to see it is that *pick == *(pick +0) == pick[0]
There are several ways to fix it. Simplest is to just do the below.
return pick;
The constructor will automatically make the conversion for you.

Since you didn't specify the format of your file, I'll cover both cases: fixed record length and variable record length; assuming each text line is a record.
Reading Random Names, Fixed Length Records
This one is straight forward.
Determine the index (random) of the record you want.
Calculate the file position = record length * index.
Set file to the position.
Read text from file, using std::getline.
Reading Random Names, Variable Length Records
This assumes that the length of the text lines vary. Since they vary, you can't use math to determine the file position.
To randomly pick a line from a file you will either have to put each line into a container, or put the file offset of the beginning of the line into a container.
After you have your container establish, determine the random name number and use that as an index into the container. If you stored the file offsets, position the file to the offset and read the line. Otherwise, pull the text from the container.
Which container should be used? It depends. Storing the text is faster but takes up memory (you are essentially storing the file into memory). Storing the file positions takes up less room but you will end up reading each line twice (once to find the position, second to fetch the data).
Augmentations to these algorithms is to memory-map the file, which is an exercise for the reader.
Edit 1: Example
include <iostream>
#include <fstream>
#include <vector>
#include <string>
using std::string;
using std::vector;
using std::fstream;
// Create a container for the file positions.
std::vector< std::streampos > file_positions;
// Create a container for the text lines
std::vector< std::string > text_lines;
// Load both containers.
// The number of lines is the size of either vector.
void
Load_Containers(std::ifstream& inp)
{
std::string text_line;
std::streampos file_pos;
file_pos = inp.tellg();
while (!std::getline(inp, text_line)
{
file_positions.push_back(file_pos);
file_pos = inp.tellg();
text_lines.push_back(text_line);
}
}

Related

Parsing a CSV file - C++

C++14
Generally, the staff in university has recommended us to use Boost to parse the file, but I've installed it and not succeeded to implement anything with it.
So I have to parse a CSV file line-by-line, where each line is of 2 columns, separated of course by a comma. Each of these two columns is a digit. I have to take the integral value of these two digits and use them to construct my Fractal objects at the end.
The first problem is: The file can look like for example so:
1,1
<HERE WE HAVE A NEWLINE>
<HERE WE HAVE A NEWLINE>
This format of file is okay. But my solution outputs "Invalid input" for that one, where the correct solution is supposed to print only once the respective fractal - 1,1.
The second problem is: The file can look like:
1,1
<HERE WE HAVE A NEWLINE>
1,1
This is supposed to be an invalid input but my solution treats it like a correct one - and just skips over the middle NEWLINE.
Maybe you can guide me how to fix these issues, it would really help me as I'm struggling with this exercise for 3 days from morning to evening.
This is my current parser:
#include <iostream>
#include "Fractal.h"
#include <fstream>
#include <stack>
#include <sstream>
const char *usgErr = "Usage: FractalDrawer <file path>\n";
const char *invalidErr = "Invalid input\n";
const char *VALIDEXT = "csv";
const char EXTDOT = '.';
const char COMMA = ',';
const char MINTYPE = 1;
const char MAXTYPE = 3;
const int MINDIM = 1;
const int MAXDIM = 6;
const int NUBEROFARGS = 2;
int main(int argc, char *argv[])
{
if (argc != NUBEROFARGS)
{
std::cerr << usgErr;
std::exit(EXIT_FAILURE);
}
std::stack<Fractal *> resToPrint;
std::string filepath = argv[1]; // Can be a relative/absolute path
if (filepath.substr(filepath.find_last_of(EXTDOT) + 1) != VALIDEXT)
{
std::cerr << invalidErr;
exit(EXIT_FAILURE);
}
std::stringstream ss; // Treat it as a buffer to parse each line
std::string s; // Use it with 'ss' to convert char digit to int
std::ifstream myFile; // Declare on a pointer to file
myFile.open(filepath); // Open CSV file
if (!myFile) // If failed to open the file
{
std::cerr << invalidErr;
exit(EXIT_FAILURE);
}
int type = 0;
int dim = 0;
while (myFile.peek() != EOF)
{
getline(myFile, s, COMMA); // Read to comma - the kind of fractal, store it in s
ss << s << WHITESPACE; // Save the number in ss delimited by ' ' to be able to perform the double assignment
s.clear(); // We don't want to save this number in s anymore as we won't it to be assigned somewhere else
getline(myFile, s, NEWLINE); // Read to NEWLINE - the dim of the fractal
ss << s;
ss >> type >> dim; // Double assignment
s.clear(); // We don't want to save this number in s anymore as we won't it to be assigned somewhere else
if (ss.peek() != EOF || type < MINTYPE || type > MAXTYPE || dim < MINDIM || dim > MAXDIM)
{
std::cerr << invalidErr;
std::exit(EXIT_FAILURE);
}
resToPrint.push(FractalFactory::factoryMethod(type, dim));
ss.clear(); // Clear the buffer to update new values of the next line at the next iteration
}
while (!resToPrint.empty())
{
std::cout << *(resToPrint.top()) << std::endl;
resToPrint.pop();
}
myFile.close();
return 0;
}

You do not need anything special to parse .csv files, the STL containers from C++11 on provide all the tools necessary to parse virtually any .csv file. You do not need to know the number of values per-row you are parsing before hand, though you will need to know the type of value you are reading from the .csv in order to apply the proper conversion of values. You do not need any third-party library like Boost either.
There are many ways to store the values parsed from a .csv file. The basic "handle any type" approach is to store the values in a std::vector<std::vector<type>> (which essentially provides a vector of vectors holding the values parsed from each line). You can specialize the storage as needed depending on the type you are reading and how you need to convert and store the values. Your base storage can be struct/class, std::pair, std::set, or just a basic type like int. Whatever fits your data.
In your case you have basic int values in your file. The only caveat to a basic .csv parse is the fact you may have blank lines in between the lines of values. That's easily handled by any number of tests. For instance you can check if the .length() of the line read is zero, or for a bit more flexibility (in handling lines with containing multiple whitespace or other non-value characters), you can use .find_first_of() to find the first wanted value in the line to determine if it is a line to parse.
For example, in your case, your read loop for your lines of value can simply read each line and check whether the line contains a digit. It can be as simple as:
...
std::string line; /* string to hold each line read from file */
std::vector<std::vector<int>> values {}; /* vector vector of int */
std::ifstream f (argv[1]); /* file stream to read */
while (getline (f, line)) { /* read each line into line */
/* if no digits in line - get next */
if (line.find_first_of("0123456789") == std::string::npos)
continue;
...
}
Above, each line is read into line and then line is checked on whether or not it contains digits. If so, parse it. If not, go get the next line and try again.
If it is a line containing values, then you can create a std::stringstream from the line and read integer values from the stringstream into a temporary int value and add the value to a temporary vector of int, consume the comma with getline and the delimiter ',', and when you run out of values to read from the line, add the temporary vector of int to your final storage. (Repeat until all lines are read).
Your complete read loop could be:
while (getline (f, line)) { /* read each line into line */
/* if no digits in line - get next */
if (line.find_first_of("0123456789") == std::string::npos)
continue;
int itmp; /* temporary int */
std::vector<int> tmp; /* temporary vector<int> */
std::stringstream ss (line); /* stringstream from line */
while (ss >> itmp) { /* read int from stringstream */
std::string tmpstr; /* temporary string to ',' */
tmp.push_back(itmp); /* add int to tmp */
if (!getline (ss, tmpstr, ',')) /* read to ',' w/tmpstr */
break; /* done if no more ',' */
}
values.push_back (tmp); /* add tmp vector to values */
}
There is no limit on the number of values read per-line, or the number of lines of values read per-file (up to the limits of your virtual memory for storage)
Putting the above together in a short example, you could do something similar to the following which just reads your input file and then outputs the collected integers when done:
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
int main (int argc, char **argv) {
if (argc < 2) { /* validate at least 1 argument given for filename */
std::cerr << "error: insufficient input.\nusage: ./prog <filename>\n";
return 1;
}
std::string line; /* string to hold each line read from file */
std::vector<std::vector<int>> values {}; /* vector vector of int */
std::ifstream f (argv[1]); /* file stream to read */
while (getline (f, line)) { /* read each line into line */
/* if no digits in line - get next */
if (line.find_first_of("0123456789") == std::string::npos)
continue;
int itmp; /* temporary int */
std::vector<int> tmp; /* temporary vector<int> */
std::stringstream ss (line); /* stringstream from line */
while (ss >> itmp) { /* read int from stringstream */
std::string tmpstr; /* temporary string to ',' */
tmp.push_back(itmp); /* add int to tmp */
if (!getline (ss, tmpstr, ',')) /* read to ',' w/tmpstr */
break; /* done if no more ',' */
}
values.push_back (tmp); /* add tmp vector to values */
}
for (auto row : values) { /* output collected values */
for (auto col : row)
std::cout << " " << col;
std::cout << '\n';
}
}
Example Input File
Using an input file with miscellaneous blank lines and two-integers per-line on the lines containing values as you describe in your question:
$ cat dat/csvspaces.csv
1,1
2,2
3,3
4,4
5,5
6,6
7,7
8,8
9,9
Example Use/Output
The resulting parse:
$ ./bin/parsecsv dat/csvspaces.csv
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
Example Input Unknown/Uneven No. of Columns
You don't need to know the number of values per-line in the .csv or the number of lines of values in the file. The STL containers handle the memory allocation needs automatically allowing you to parse whatever you need. Now you may want to enforce some fixed number of values per-row, or rows per-file, but that is simply up to you to add simple counters and checks to your read/parse routine to limit the values stored as needed.
Without any changes to the code above, it will handle any number of comma-separated-values per-line. For example, changing your data file to:
$ cat dat/csvspaces2.csv
1
2,2
3,3,3
4,4,4,4
5,5,5,5,5
6,6,6,6,6,6
7,7,7,7,7,7,7
8,8,8,8,8,8,8,8
9,9,9,9,9,9,9,9,9
Example Use/Output
Results in the expected parse of each value from each line, e.g.:
$ ./bin/parsecsv dat/csvspaces2.csv
1
2 2
3 3 3
4 4 4 4
5 5 5 5 5
6 6 6 6 6 6
7 7 7 7 7 7 7
8 8 8 8 8 8 8 8
9 9 9 9 9 9 9 9 9
Let me know if you have questions that I didn't cover or if you have additional questions about something I did and I'm happy to help further.

I will not update your code. I look at your title Parsing a CSV file - C++ and would like to show you, how to read csv files in a more modern way. Unfortunately you are still on C++14. With C++20 or the ranges library it would be ultra simple using getlines and split.
And in C++17 we could use CTAD and if with initializer and so on.
But what we do not need is boost. C++`s standard lib is sufficient. And we do never use scanf and old stuff like that.
And in my very humble opinion the link to the 10 years old question How can I read and parse CSV files in C++? should not be given any longer. It is the year 2020 now. And more modern and now available language elements should be used. But as said. Everybody is free to do what he wants.
In C++ we can use the std::sregex_token_iterator. and its usage is ultra simple. It will also not slow down your program dramatically. A double std::getline would also be ok. Although it is not that flexible. The number of columns must be known for that. The std::sregex_token_iterator does not care about the number of columns.
Please see the following example code. In that, we create a tine proxy class and overwrite its extractor operator. Then we us the std::istream_iterator and read and parse the whole csv-file in a small one-liner.
#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <regex>
#include <string>
#include <vector>
// Define Alias for easier Reading
// using Columns = std::vector<std::string>;
using Columns = std::vector<int>;
// The delimiter
const std::regex re(",");
// Proxy for the input Iterator
struct ColumnProxy {
// Overload extractor. Read a complete line
friend std::istream& operator>>(std::istream& is, ColumnProxy& cp) {
// Read a line
std::string line;
cp.columns.clear();
if(std::getline(is, line) && !line.empty()) {
// Split values and copy into resulting vector
std::transform(
std::sregex_token_iterator(line.begin(), line.end(), re, -1), {},
std::back_inserter(cp.columns),
[](const std::string& s) { return std::stoi(s); });
}
return is;
}
// Type cast operator overload. Cast the type 'Columns' to
// std::vector<std::string>
operator Columns() const { return columns; }
protected:
// Temporary to hold the read vector
Columns columns{};
};
int main() {
std::ifstream myFile("r:\\log.txt");
if(myFile) {
// Read the complete file and parse verything and store result into vector
std::vector<Columns> values(std::istream_iterator<ColumnProxy>(myFile), {});
// Show complete csv data
std::for_each(values.begin(), values.end(), [](const Columns& c) {
std::copy(c.begin(), c.end(),
std::ostream_iterator<int>(std::cout, " "));
std::cout << "\n";
});
}
return 0;
}
Please note: There are tons of other possible solutions. Please feel free to use whatever you want.
EDIT
Because I see a lot of complicated code here, I would like to show a 2nd example of how to
Parsing a CSV file - C++
Basically, you do not need more than 2 statements in the code. You first define a regex for digits. And then you use a C++ language element that has been exactly designed for the purpose of tokenizing strings into substrings. The std::sregex_token_iterator. And because such a most-fitting language element is available in C++ since years, it would may be worth a consideration to use it. And maybe you could do basically the task in 2 lines, instead of 10 or more lines. And it is easy to understand.
But of course, there are thousands of possible solutions and some like to continue in C-Style and others like more moderen C++ features. That's up to everybodies personal decision.
The below code reads the csv file as specified, regardless of how many rows(lines) it contains and how many columns are there for each row. Even foreing characters can be in it. An empty row will be an empty entry in the csv vector. This can also be easly prevented, with an "if !empty" before the emplace back.
But some like so and the other like so. Whatever people want.
Please see a general example:
#include <algorithm>
#include <iterator>
#include <iostream>
#include <regex>
#include <sstream>
#include <string>
#include <vector>
// Test data. Can of course also be taken from a file stream.
std::stringstream testFile{ R"(1,2
3, a, 4
5 , 6 b , 7
abc def
8 , 9
11 12 13 14 15 16 17)" };
std::regex digits{R"((\d+))"};
using Row = std::vector<std::string>;
int main() {
// Here we will store all the data from the CSV as std::vector<std::vector<std::string>>
std::vector<Row> csv{};
// This extremely simple 2 lines will read the complete CSV and parse the data
for (std::string line{}; std::getline(testFile, line); )
csv.emplace_back(Row(std::sregex_token_iterator(line.begin(), line.end(), digits, 1), {}));
// Now, you can do with the data, whatever you want. For example: Print double the value
std::for_each(csv.begin(), csv.end(), [](const Row& r) {
if (!r.empty()) {
std::transform(r.begin(), r.end(), std::ostream_iterator<int>(std::cout, " "), [](const std::string& s) {
return std::stoi(s) * 2; }
); std::cout << "\n";}});
return 0;
}
So, now, you may get the idea, you may like it, or you do not like it. Whatever. Feel free to do whatever you want.

Reading a specific line from a .txt file

I have a text file full of names:
smartgem
marshbraid
seamore
stagstriker
meadowbreath
hydrabrow
startrack
wheatrage
caskreaver
seaash
I want to code a random name generator that will copy a specific line from the.txt file and return it.

While reading in from a file you must start from the beginning and continue on. My best advice would be to read in all of the names, store them in a set, and randomly access them that way if you don't have stringent concerns over efficiency.
You cannot pick a random string from the end of the file without first reading up that name in the file.
You may also want to look at fseek() which will allow you to "jump" to a location within the input stream. You could randomly generate an offset and then provide that as an argument to fseek().
http://www.cplusplus.com/reference/cstdio/fseek/

You cannot do that unless you do one of two things:
Generate an index for that file, containing the address of each line, then you can go straight to that address and read it. This index can be stored in many different ways, the easiest one being on a separate file, this way the original file can still be considered a text file, or;
Structure the file so that each line starts at a fixed distance in bytes of each other, so you can just go to the line you want by multiplying (desired index * size). This does not mean the texts on each line need to have the same length, you can pad the end of the line with null-terminators (character '\0'). In this case it is not recommended to work this file as a text file anymore, but a binary file instead.
You can write a separate program that will generate this index or generate the structured file for your main program to use.
All this of course, considering you want the program to run and read the line without having to load the entire file in memory first. If your program will constantly read lines from the file, you should probably just load the entire file into a std::vector<std::string> and then read the lines at will from there.

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <cstdlib>
#include <ctime>
using namespace std;
int main()
{
string filePath = "test.txt";
vector<std::string> qNames;
ifstream openFile(filePath.data());
if (openFile.is_open())
{
string line;
while (getline(openFile, line))
{
qNames.push_back(line.c_str());
}
openFile.close();
}
if (!qNames.empty())
{
srand((unsigned int)time(NULL));
for (int i = 0; i < 10; i++)
{
int num = rand();
int linePos = num % qNames.size();
cout << qNames.at(linePos).c_str() << endl;
}
}
return 0;
}

Big csv file c++ parsing performance

I have a big csv file (25 mb) that represents a symmetric graph (about 18kX18k). While parsing it into an array of vectors, i have analyzed the code (with VS2012 ANALYZER) and it shows that the problem with the parsing efficiency (about 19 seconds total) occurs while reading each character (getline::basic_string::operator+=) as shown in the picture below:
This leaves me frustrated, as with Java simple buffered line file reading and tokenizer i achieve it with less than half a second.
My code uses only STL library:
int allColumns = initFirstRow(file,secondRow);
// secondRow has initialized with one value
int column = 1; // dont forget, first column is 0
VertexSet* rows = new VertexSet[allColumns];
rows[1] = secondRow;
string vertexString;
long double vertexDouble;
for (int row = 1; row < allColumns; row ++){
// dont do the last row
for (; column < allColumns; column++){
//dont do the last column
getline(file,vertexString,',');
vertexDouble = stold(vertexString);
if (vertexDouble > _TH){
rows[row].add(column);
}
}
// do the last in the column
getline(file,vertexString);
vertexDouble = stold(vertexString);
if (vertexDouble > _TH){
rows[row].add(++column);
}
column = 0;
}
initLastRow(file,rows[allColumns-1],allColumns);
init first and last row basically does the same thing as the loop above, but initFirstRow also counts the number of columns.
VertexSet is basically a vector of indexes (int). Each vertex read (separated by ',') goes no more than 7 characters length long (values are between -1 and 1).

At 25 megabytes, I'm going to guess that your file is machine generated. As such, you (probably) don't need to worry about things like verifying the format (e.g., that every comma is in place).
Given the shape of the file (i.e., each line is quite long) you probably won't impose a lot of overhead by putting each line into a stringstream to parse out the numbers.
Based on those two facts, I'd at least consider writing a ctype facet that treats commas as whitespace, then imbuing the stringstream with a locale using that facet to make it easy to parse out the numbers. Overall code length would be a little greater, but each part of the code would end up pretty simple:
#include <iostream>
#include <fstream>
#include <vector>
#include <string>
#include <time.h>
#include <stdlib.h>
#include <locale>
#include <sstream>
#include <algorithm>
#include <iterator>
class my_ctype : public std::ctype<char> {
std::vector<mask> my_table;
public:
my_ctype(size_t refs=0):
my_table(table_size),
std::ctype<char>(my_table.data(), false, refs)
{
std::copy_n(classic_table(), table_size, my_table.data());
my_table[',']=(mask)space;
}
};
template <class T>
class converter {
std::stringstream buffer;
my_ctype *m;
std::locale l;
public:
converter() : m(new my_ctype), l(std::locale::classic(), m) { buffer.imbue(l); }
std::vector<T> operator()(std::string const &in) {
buffer.clear();
buffer<<in;
return std::vector<T> {std::istream_iterator<T>(buffer),
std::istream_iterator<T>()};
}
};
int main() {
std::ifstream in("somefile.csv");
std::vector<std::vector<double>> numbers;
std::string line;
converter<double> cvt;
clock_t start=clock();
while (std::getline(in, line))
numbers.push_back(cvt(line));
clock_t stop=clock();
std::cout<<double(stop-start)/CLOCKS_PER_SEC << " seconds\n";
}
To test this, I generated an 1.8K x 1.8K CSV file of pseudo-random doubles like this:
#include <iostream>
#include <stdlib.h>
int main() {
for (int i=0; i<1800; i++) {
for (int j=0; j<1800; j++)
std::cout<<rand()/double(RAND_MAX)<<",";
std::cout << "\n";
}
}
This produced a file around 27 megabytes. After compiling the reading/parsing code with gcc (g++ -O2 trash9.cpp), a quick test on my laptop showed it running in about 0.18 to 0.19 seconds. It never seems to use (even close to) all of one CPU core, indicating that it's I/O bound, so on a desktop/server machine (with a faster hard drive) I'd expect it to run faster still.

The inefficiency here is in Microsoft's implementation of std::getline, which is being used in two places in the code. The key problems with it are:
It reads from the stream one character at a time
It appends to the string one character at a time
The profile in the original post shows that the second of these problems is the biggest issue in this case.
I wrote more about the inefficiency of std::getline here.
GNU's implementation of std::getline, i.e. the version in libstdc++, is much better.
Sadly, if you want your program to be fast and you build it with Visual C++ you'll have to use lower level functions than std::getline.

The debug Runtime Library in VS is very slow because it does a lot of debug checks (for out of bound accesses and things like that) and calls lots of very small functions that are not inlined when you compile in Debug.
Running your program in release should remove all these overheads.
My bet on the next bottleneck is string allocation.

I would try read bigger chunks of memory at once and then parse it all.
Like.. read full line. and then parse this line using pointers and specialized functions.

Hmm good answer here. Took me a while but I had the same problem. After this fix my write and process time went from 38 sec to 6 sec.
Here's what I did.
First get data using boost mmap. Then you can use boost thread to make processing faster on the const char* that boost mmap returns. Something like this: (the multithreading is different depending on your implementation so I excluded that part)
#include <boost/iostreams/device/mapped_file.hpp>
#include <boost/thread/thread.hpp>
#include <boost/lockfree/queue.hpp>
foo(string path)
{
boost::iostreams::mapped_file mmap(path,boost::iostreams::mapped_file::readonly);
auto chars = mmap.const_data(); // set data to char array
auto eofile = chars + mmap.size(); // used to detect end of file
string next = ""; // used to read in chars
vector<double> data; // store the data
for (; chars && chars != eofile; chars++) {
if (chars[0] == ',' || chars[0] == '\n') { // end of value
data.push_back(atof(next.c_str())); // add value
next = ""; // clear
}
else
next += chars[0]; // add to read string
}
}

C++ Read file into Array / List / Vector

I am currently working on a small program to join two text files (similar to a database join). One file might look like:
269ED3
86356D
818858
5C8ABB
531810
38066C
7485C5
948FD4
The second one is similar:
hsdf87347
7485C5
rhdff
23487
948FD4
Both files have over 1.000.000 lines and are not limited to a specific number of characters. What I would like to do is find all matching lines in both files.
I have tried a few things, Arrays, Vectors, Lists - but I am currently struggling with deciding what the best (fastest and memory easy) way.
My code currently looks like:
#include iostream>
#include fstream>
#include string>
#include ctime>
#include list>
#include algorithm>
#include iterator>
using namespace std;
int main()
{
string line;
clock_t startTime = clock();
list data;
//read first file
ifstream myfile ("test.txt");
if (myfile.is_open())
{
for(line; getline(myfile, line);/**/){
data.push_back(line);
}
myfile.close();
}
list data2;
//read second file
ifstream myfile2 ("test2.txt");
if (myfile2.is_open())
{
for(line; getline(myfile2, line);/**/){
data2.push_back(line);
}
myfile2.close();
}
else cout data2[k], k++
//if data[j] > a;
return 0;
}
My thinking is: With a vector, random access on elements is very difficult and jumping to the next element is not optimal (not in the code, but I hope you get the point). It also takes a long time to read the file into a vector by using push_back and adding the lines one by one. With arrays the random access is easier, but reading >1.000.000 records into an array will be very memory intense and takes a long time as well. Lists can read the files faster, random access is expensive again.
Eventually I will not only look for exact matches, but also for the first 4 characters of each line.
Can you please help me deciding, what the most efficient way is? I have tried arrays, vectors and lists, but am not satisfied with the speed so far. Is there any other way to find matches, that I have not considered? I am very happy to change the code completely, looking forward to any suggestion!
Thanks a lot!
EDIT: The output should list the matching values / lines. In this example the output is supposed to look like:
7485C5
948FD4

Reading a 2 millions lines won't be too much slow, what might be slowing down is your comparison logic :
Use : std::intersection
data1.sort(data1.begin(), data1.end()); // N1log(N1)
data2.sort(data2.begin(), data2.end()); // N2log(N2)
std::vector<int> v; //Gives the matching elements
std::set_intersection(data1.begin(), data1.end(),
data2.begin(), data2.end(),
std::back_inserter(v));
// Does 2(N1+N2-1) comparisons (worst case)
You can also try using std::set and insert lines into it from both files, the resultant set will have only unique elements.

If the values for this are unique in the first file, this becomes trivial when exploiting the O(nlogn) characteristics of a set. The following stores all lines in the first file passed as a command-line argument to a set, then performs a O(logn) search for each line in the second file.
EDIT: Added 4-char-only preamble searching. To do this, the set contains only the first four chars of each line, and the search from the second looks for only the first four chars of each search-line. The second-file line is printed in its entirety if there is a match. Printing the first file full-line in entirety would be a bit more challenging.
#include <iostream>
#include <fstream>
#include <string>
#include <set>
int main(int argc, char *argv[])
{
if (argc < 3)
return EXIT_FAILURE;
// load set with first file
std::ifstream inf(argv[1]);
std::set<std::string> lines;
std::string line;
for (unsigned int i=1; std::getline(inf,line); ++i)
lines.insert(line.substr(0,4));
// load second file, identifying all entries.
std::ifstream inf2(argv[2]);
while (std::getline(inf2, line))
{
if (lines.find(line.substr(0,4)) != lines.end())
std::cout << line << std::endl;
}
return 0;
}

One solution is to read the entire file at once.
Use istream::seekg and istream::tellg to figure the size of the two files. Allocate a character array large enough to store them both. Read both files into the array, at appropriate location, using istream::read.
Here is an example of the above functions.

Visual Studio C++ Data from CSV File into Variable Arrays

Basically, I want to make a simple function for a larger project, this will actually end up as a Header file that handles all my inventory. Any who, I just want it to be able to read/pull/input the Data from a .csv file format... Or I can even do a .txt if it is easier to make this function work, As long as I can open it up in MS Excel and edit the items and add new items, when the function is ran it will Open up "Example.csv" and for example, the function will look for the cell in Column 1 - 'sSwordName' and it will pull the data in the row FOLLOWING that cell, excluding that first column completely in all the inputs essentially... It will group it together, or what ever it may have to do in order to ASSIGN it to that Variable. Please see Code.h (Comments) to see my questions about my source code.
Code.h -
#include "stdafx.h"
#include <string>
#include <fstream>
#include <sstream>
#include <iostream>
#include <vector>
using namespace std;
char sSwordName[100][25] = {};
int sSwordLvlR[100] = {};
vector<string> split_at_commas(const string & row)
{
vector<string> res;
istringstream buf(row);
string s;
while (getline(buf, s, ','))
res.push_back(s);
return res;
system("pause");
}
/* Question 1: Where & How do I properly add 'ifstream wInv("Example.csv");' in order to Load the CSV file it is reading? */
/* Question 2: Line 38 & 46. */
/* Question 3: Line 39 & 47 */
int main()
{
int i = 1;
string line;
string row;
vector<string> values = split_at_commas(line);
if (values[0] == "sSwordName")
{
for(int i = 1; i < values.size(); ++i);
{
/*int i error: Object must have a pointer-to-object type*/
sSwordName[100][25][i - 1] = /*How do I convert string to char?*/(vector[i]);
}
}
else if (values[1] == "sSwordLvlR")
{
for(int i = 1; i < values.size(); ++i);
{
/*int i error: Object must have a pointer-to-object type*/
sSwordLvlR[i - 1] = /*How do I Convert string to int?*/(vector[i]);
}
}
}
/* Question 4: Is there anything else that is wrong in this? If so, how would I fix it */
Example.csv -
sSwordName,Wooden Shortsword,Bronze Shortsword,Iron Shortsword,Steel Shortsword,Titanium Shortsword
sSwordLvlR,1,3,5,6,10
More Information On The CSV:
sSwordName,Wooden Shortsword,Bronze Shortsword,Iron Shortsword,Steel Shortsword,Titanium Shortsword
sSwordLvlR,1,3,5,6,10
^Is not the ONLY way I can format it, for convenience I can do something like this (Below, E2.csv). If it is easier to make the function, function.;
sSwordName,"Wooden Shortsword","Bronze Shortsword","Iron Shortsword","Steel Shortsword","Titanium Shortsword"
sSwordLvlR,"1,","3,","5,","6,","10,"
I can even format it in this way;
sSwordName,"{"Wooden Shortsword","Bronze Shortsword","Iron Shortsword","Steel Shortsword","Titanium Shortsword"};
sSwordLvlR,"{1,3,5,6,10};"
Once again, I greatly appreciate any help. I thank you in Advanced!
-Leaum

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Trouble getting string to print random line from text file - c++

Related

Parsing a CSV file - C++

Reading a specific line from a .txt file

Big csv file c++ parsing performance

C++ Read file into Array / List / Vector

Visual Studio C++ Data from CSV File into Variable Arrays

Categories

Resources