Reading a specific line from a .txt file - c++

I have a text file full of names:
smartgem
marshbraid
seamore
stagstriker
meadowbreath
hydrabrow
startrack
wheatrage
caskreaver
seaash
I want to code a random name generator that will copy a specific line from the .txt file and return it.

When reading from a file, you must start at the beginning and read forward. My best advice would be to read in all of the names, store them in a container such as a std::vector, and randomly access them that way if you don't have stringent concerns over efficiency.
You cannot pick a random string from the end of the file without first reading up to that name in the file.
You may also want to look at fseek(), which will allow you to "jump" to a location within the input stream. You could randomly generate an offset and then provide that as an argument to fseek(). Note that a random offset will usually land in the middle of a line, so you would then have to skip forward to the next newline before reading a name.
http://www.cplusplus.com/reference/cstdio/fseek/

You cannot do that unless you do one of two things:
Generate an index for that file, containing the address of each line, then you can go straight to that address and read it. This index can be stored in many different ways, the easiest one being on a separate file, this way the original file can still be considered a text file, or;
Structure the file so that each line starts at a fixed distance in bytes of each other, so you can just go to the line you want by multiplying (desired index * size). This does not mean the texts on each line need to have the same length, you can pad the end of the line with null-terminators (character '\0'). In this case it is not recommended to work this file as a text file anymore, but a binary file instead.
You can write a separate program that will generate this index or generate the structured file for your main program to use.
All this of course, considering you want the program to run and read the line without having to load the entire file in memory first. If your program will constantly read lines from the file, you should probably just load the entire file into a std::vector<std::string> and then read the lines at will from there.

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <cstdlib>
#include <ctime>
using namespace std;
int main()
{
string filePath = "test.txt";
vector<std::string> qNames;
ifstream openFile(filePath.data());
if (openFile.is_open())
{
string line;
while (getline(openFile, line))
{
qNames.push_back(line.c_str());
}
openFile.close();
}
if (!qNames.empty())
{
srand((unsigned int)time(NULL));
for (int i = 0; i < 10; i++)
{
int num = rand();
int linePos = num % qNames.size();
cout << qNames.at(linePos).c_str() << endl;
}
}
return 0;
}

Related

C++ - Opening text files sequentially

I have hundreds of .txt files ordered by number: 1.txt, 2.txt, 3.txt,...n.txt. In each file there are two columns with decimal numbers.
I wrote an algorithm that does some operations to one .txt file alone, and now I want to recursively do the same to all of them.
This helpful question gave me some idea of what I'm trying to do.
Now I'm trying to write an algorithm to read all of the files:
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main ()
{
int i, n;
char filename[6];
double column1[100], column2[100];
for (n=1;n=200;n++)
{
sprintf(filename, "%d.txt", n);
ifstream datafile;
datafile.open(filename);
for (i=0;i<100;i++)
{
datafile >> column1[i] >> column2[i];
cout << column1[i] << column2[i];
}
datafile.close();
}
return 0;
}
What I think the code is doing: it is creating string names from 1.txt till 200.txt, then it opens files with these names. For each file, the first 100 columns will be associated to the arrays column1 and column2, then the values will be shown on the screen.
I don't get any error when compiling it, but when I run it the output is huge and simply won't stop. If I redirect the output to a .txt file, it easily reaches several GB!
I also tried decreasing the loop count and reducing the number of columns (to 3 or so), but I still get infinite output. I would be glad if someone could point out the mistakes I'm making in the code...
I am using gcc 5.2.1 with Linux.
Thanks!
A 6-element array is too short to store "200.txt". It must have at least 8 elements (7 characters plus the terminating null).
The condition n=200 is an assignment, not a comparison, so it is always true. It should be n<=200.
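A minimal sketch of the loop with both fixes applied might look like this. It assumes a C++11 compiler (for std::to_string and for opening a stream from a std::string); make_filename and print_all_files are illustrative names, not taken from the question. It also stops reading as soon as extraction fails, so a short or missing file cannot produce garbage output.

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Build "<n>.txt" without a fixed-size char buffer, so it cannot overflow.
std::string make_filename(int n) {
    return std::to_string(n) + ".txt";
}

// Print up to max_rows pairs of numbers from each of 1.txt .. last.txt.
void print_all_files(int last, int max_rows) {
    for (int n = 1; n <= last; ++n) {            // comparison, not assignment
        std::ifstream datafile(make_filename(n));
        if (!datafile) {
            std::cerr << "could not open " << make_filename(n) << '\n';
            continue;                            // skip missing files
        }
        double a, b;
        // Stop at max_rows or at the first failed read, whichever comes first.
        for (int i = 0; i < max_rows && (datafile >> a >> b); ++i) {
            std::cout << a << ' ' << b << '\n';
        }
    }
}
```

Calling print_all_files(200, 100) from main would correspond to the loop bounds used in the question.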
If all your files are in the same directory, you could also use boost::filesystem, e.g.:
auto path = "path/to/folder";
std::for_each(boost::filesystem::directory_iterator{path},
boost::filesystem::directory_iterator{},
[](boost::filesystem::directory_entry file){
// test if file is of the correct type
// do sth with file
});
I think this is a cleaner solution.

to find a word from a text file and then also display the line number in which the word lies using C++

I am an absolute beginner at programming. No, this is not homework either; I am trying to learn it by myself. As stated in the title, I would like to read a .txt file and find a particular word in it. But that's not exactly what I want. I actually want the line number on which the word appears, so that I can use this line number to print out all the lines in the .txt file after that line. I found this code on YouTube, where it showed me how to find the word. I modified it a bit and it's now giving me the line in which the searched word appears (NOT THE LINE NUMBER). I am attaching the code.
#include "stdafx.h"
#include <iostream>
#include <fstream>
#include <string>
#include <stdio.h>
using namespace std;
string find_word(string file, string word)
{
int offset;
string line1;
ifstream Myfile;
Myfile.open(file);
if (Myfile.is_open())
{
while (!Myfile.eof())
{
getline(Myfile, line1);
if ((offset = line1.find(word,0)) != string::npos)
{
return line1;
}
}
Myfile.close();
}
else
cout << "couldn't open...." << endl;
}
int main ()
{
string c = find_word("test.txt", "$COOR");
cout << c;
cin.get();
return 0;
}
Right now the text file contains just 8 lines and "$COOR" lies in line 4. The program just gives me the entire line, but I want the line number so that I can print out the lines after line number 4.
I would later like to test it for a file having many lines i.e. more than 50000000 or so.
Think logically. You don't know whether a given line contains the word you're looking for until you read it. Therefore, you always need to know the number of the current line in the text file that you've read, so if the line contains the word, you then print the line number.
You need another variable int, initialized to zero, and incremented every time your code reads a line of text. So, when your code reads the first line of text, it gets incremented to 1. Then, when the code reads the next line of text, the new variable gets incremented to 2, and so on. So, when you find the word, you know where to look to find the line number.
Your code already has a loop for reading each line of text. Don't you think it's now obvious where you will need to increment the line counter?

C++ Read file into Array / List / Vector

I am currently working on a small program to join two text files (similar to a database join). One file might look like:
269ED3
86356D
818858
5C8ABB
531810
38066C
7485C5
948FD4
The second one is similar:
hsdf87347
7485C5
rhdff
23487
948FD4
Both files have over 1.000.000 lines and are not limited to a specific number of characters. What I would like to do is find all matching lines in both files.
I have tried a few things, Arrays, Vectors, Lists - but I am currently struggling with deciding what the best (fastest and memory easy) way.
My code currently looks like:
#include <iostream>
#include <fstream>
#include <string>
#include <ctime>
#include <list>
#include <algorithm>
#include <iterator>
using namespace std;
int main()
{
string line;
clock_t startTime = clock();
list<string> data;
//read first file
ifstream myfile ("test.txt");
if (myfile.is_open())
{
for(line; getline(myfile, line);/**/){
data.push_back(line);
}
myfile.close();
}
list<string> data2;
//read second file
ifstream myfile2 ("test2.txt");
if (myfile2.is_open())
{
for(line; getline(myfile2, line);/**/){
data2.push_back(line);
}
myfile2.close();
}
else cout data2[k], k++
//if data[j] > a;
return 0;
}
My thinking is: With a vector, random access on elements is very difficult and jumping to the next element is not optimal (not in the code, but I hope you get the point). It also takes a long time to read the file into a vector by using push_back and adding the lines one by one. With arrays the random access is easier, but reading >1.000.000 records into an array will be very memory intense and takes a long time as well. Lists can read the files faster, random access is expensive again.
Eventually I will not only look for exact matches, but also for the first 4 characters of each line.
Can you please help me deciding, what the most efficient way is? I have tried arrays, vectors and lists, but am not satisfied with the speed so far. Is there any other way to find matches, that I have not considered? I am very happy to change the code completely, looking forward to any suggestion!
Thanks a lot!
EDIT: The output should list the matching values / lines. In this example the output is supposed to look like:
7485C5
948FD4
Reading 2 million lines won't be too slow; what is probably slowing you down is your comparison logic.
Use std::set_intersection on the sorted data:
data1.sort(); // N1log(N1)
data2.sort(); // N2log(N2)
std::vector<std::string> v; // Receives the matching elements
std::set_intersection(data1.begin(), data1.end(),
data2.begin(), data2.end(),
std::back_inserter(v));
// Does at most 2(N1+N2)-1 comparisons
You can also try using std::set and insert lines into it from both files, the resultant set will have only unique elements.
If the values for this are unique in the first file, this becomes trivial when exploiting the O(nlogn) characteristics of a set. The following stores all lines in the first file passed as a command-line argument to a set, then performs a O(logn) search for each line in the second file.
EDIT: Added 4-char-only preamble searching. To do this, the set contains only the first four chars of each line, and the search from the second looks for only the first four chars of each search-line. The second-file line is printed in its entirety if there is a match. Printing the first file full-line in entirety would be a bit more challenging.
#include <iostream>
#include <fstream>
#include <string>
#include <set>
#include <cstdlib>
int main(int argc, char *argv[])
{
if (argc < 3)
return EXIT_FAILURE;
// load set with first file
std::ifstream inf(argv[1]);
std::set<std::string> lines;
std::string line;
for (unsigned int i=1; std::getline(inf,line); ++i)
lines.insert(line.substr(0,4));
// load second file, identifying all entries.
std::ifstream inf2(argv[2]);
while (std::getline(inf2, line))
{
if (lines.find(line.substr(0,4)) != lines.end())
std::cout << line << std::endl;
}
return 0;
}
One solution is to read the entire file at once.
Use istream::seekg and istream::tellg to figure the size of the two files. Allocate a character array large enough to store them both. Read both files into the array, at appropriate location, using istream::read.
Here is an example of the above functions.

Trouble getting string to print random line from text file

I picked up this bit of code a while back as a way to select a random line from a text file and output the result. Unfortunately, it only seems to output the first letter of the line that it selects and I can't figure out why its doing so or how to fix it. Any help would be appreciated.
#include "stdafx.h"
#include <stdio.h>
#include <iostream>
#include <fstream>
#include <string>
#include <time.h>
using namespace std;
#define MAX_STRING_SIZE 1000
string firstName()
{
string firstName;
char str[MAX_STRING_SIZE], pick[MAX_STRING_SIZE];
FILE *fp;
int readCount = 0;
fp = fopen("firstnames.txt", "r");
if (fp)
{
if (fgets(pick, MAX_STRING_SIZE, fp) != NULL)
{
readCount = 1;
while (fgets (str, MAX_STRING_SIZE, fp) != NULL)
{
if ((rand() % ++readCount) == 0)
{
strcpy(pick, str);
}
}
}
}
fclose(fp);
firstName = *pick;
return firstName;
}
int main()
{
srand(time(NULL));
int n = 1;
while (n < 10)
{
string fn = firstName();
cout << fn << endl;
++n;
}
system("pause");
}
firstName = *pick;
I am guessing this is the problem.
pick here is essentially a pointer to the first element of the array, char*, so of course *pick is of type char.. or the first character of the array.
Another way to see it is that *pick == *(pick +0) == pick[0]
There are several ways to fix it. Simplest is to just do the below.
return pick;
The constructor will automatically make the conversion for you.
Since you didn't specify the format of your file, I'll cover both cases: fixed record length and variable record length; assuming each text line is a record.
Reading Random Names, Fixed Length Records
This one is straightforward.
Determine the index (random) of the record you want.
Calculate the file position = record length * index.
Set file to the position.
Read text from file, using std::getline.
Reading Random Names, Variable Length Records
This assumes that the length of the text lines vary. Since they vary, you can't use math to determine the file position.
To randomly pick a line from a file you will either have to put each line into a container, or put the file offset of the beginning of the line into a container.
After you have your container established, determine the random name number and use that as an index into the container. If you stored the file offsets, position the file to the offset and read the line. Otherwise, pull the text from the container.
Which container should be used? It depends. Storing the text is faster but takes up memory (you are essentially storing the file into memory). Storing the file positions takes up less room but you will end up reading each line twice (once to find the position, second to fetch the data).
A further refinement of these algorithms is to memory-map the file, which is left as an exercise for the reader.
Edit 1: Example
#include <iostream>
#include <fstream>
#include <vector>
#include <string>
using std::string;
using std::vector;
using std::fstream;
// Create a container for the file positions.
std::vector< std::streampos > file_positions;
// Create a container for the text lines
std::vector< std::string > text_lines;
// Load both containers.
// The number of lines is the size of either vector.
void
Load_Containers(std::ifstream& inp)
{
std::string text_line;
std::streampos file_pos;
file_pos = inp.tellg();
while (std::getline(inp, text_line))
{
file_positions.push_back(file_pos);
file_pos = inp.tellg();
text_lines.push_back(text_line);
}
}

C++ length of file and vectors

Hi I have a file with some text in it. Is there some easy way to get the number of lines in the file without traversing through the file?
I also need to put the lines of the file into a vector. I am new to C++ but I think vector is like ArrayList in java so I wanted to use a vector and insert things into it. So how would I do it?
Thanks.
There is no way of finding the number of lines in a file without reading it. To read all lines:
1) create a std::vector of std::string
2) open a file for input
3) read a line as a std::string using getline()
4) if the read failed, stop
5) push the line into the vector
6) goto 3
You would need to traverse the file to determine the number of lines (or at least call a library method that traverses the file).
Here is a sample code for parsing text file, assuming that you pass the file name as an argument, by using the getline method:
#include <string>
#include <vector>
#include <fstream>
#include <iostream>
int main(int argc, char* argv[])
{
std::vector<std::string> lines;
std::string line;
lines.clear();
// open the desired file for reading
std::ifstream infile (argv[1], std::ios_base::in);
// read each line individually (watch out for Windows new lines)
while (getline(infile, line, '\n'))
{
// add line to vector
lines.push_back (line);
}
// do anything you like with the vector. Output the size for example:
std::cout << "Read " << lines.size() << " lines.\n";
return 0;
}
Update: The code could fail for many reasons (e.g. file not found, concurrent modifications to file, permission issues, etc). I'm leaving that as an exercise to the user.
1) No way to find number of lines without reading the file.
2) Take a look at getline function from the C++ Standard Library. Something like:
string line;
fstream file;
vector <string> vec;
...
while (getline(file, line)) vec.push_back(line);
Traversing the file is fundamentally required to determine the number of lines, regardless of whether you do it or some library routine does it. New lines are just another character, and the file must be scanned one character at a time in its entirety to count them.
Since you have to read the lines into a vector anyways, you might as well combine the two steps:
// Read lines from input stream in into vector out
// Return the number of lines read
int getlines(std::vector<std::string>& out, std::istream& in = std::cin) {
out.clear(); // remove any data in vector
std::string buffer;
while (std::getline(in, buffer))
out.push_back(buffer);
// return number of lines read
return out.size();
}