I'm having trouble reading in lines from a text file? - c++

I want to read in a line from a text file, store the line in an array of strings to later display the line numbers a word can be found on, and also break down the line into words for marking how many times unique words come up. I have successfully been able to break down the lines word by word and mark the frequency in which they appear but I'm struggling with storing the line in an array of strings so that I can use it later.
void get_word(istream& in_stream, string& w,
list<string> &wordlist, int& lineCount, string *line)
{
string t;
getline(in_stream,t);
for (int j=0; t[j]; j++)
t[j] = tolower(t[j]);
line = &t;
istringstream iss(t);
string word;
while(iss >> word)
{
insert_word(word, wordlist);
}
}
So far this is what I have. No matter what I try on the line where I assign the string "t" to the "line" array being pointed to, nothing ends up in the array. I think I'm just completely missing something.
line is initialized as:
string line[0];

First of all, string line[0] gives you an array of no strings. That's not likely to be what you intended. Why not simply string line? That's one string. You'd pass a pointer to that string into your function, with &line.
Secondly, line = &t replaces the input pointer with a pointer to the local variable. This information is lost once the function ends. If you write *line = t then you are instead setting "the string that line points to" to the value of "the string t".
Ideally you'd avoid the "out argument" and just return the string you want, though.
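A minimal sketch of both fixes, assuming the caller keeps its own storage for the lines. The names read_line_lowercase, split_words and read_line_into are hypothetical, not from the question:

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads one line, lowercases it, and returns it by value (no out argument).
// Returns an empty string if nothing could be read.
std::string read_line_lowercase(std::istream& in)
{
    std::string t;
    std::getline(in, t);
    for (std::string::size_type j = 0; j < t.size(); ++j)
        t[j] = static_cast<char>(std::tolower(static_cast<unsigned char>(t[j])));
    return t;
}

// Splits a line into whitespace-separated words, as the question's loop does.
std::vector<std::string> split_words(const std::string& line)
{
    std::istringstream iss(line);
    std::vector<std::string> words;
    std::string word;
    while (iss >> word)
        words.push_back(word);
    return words;
}

// Variant that keeps the out argument: assign through the pointer so the
// caller's string receives a copy, instead of re-pointing the local pointer.
void read_line_into(std::istream& in, std::string* line)
{
    *line = read_line_lowercase(in);
}
```

With the first form the caller can simply write lines.push_back(read_line_lowercase(in_stream)) on a std::vector<std::string> and look the line up again later by index.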

Related

Reading lines of txt file into array prints only the last element

First of all, I haven't coded in C++ for more than 8 years, but there is a hobby project I would like to work on where I ran into this issue.
I checked a similar question: Only printing last line of txt file when reading into struct array in C
but in my case I don't have a semicolon at the end of the while cycle.
Anyway, so I have a nicknames.txt file where I store nicknames, one in each line.
Then I want to read these nicknames into an array and select one random element of it.
Example nicknames.txt:
alpha
beta
random nickname
...
Pirate Scrub
Then I read the TXT file:
int nicknameCount = 0;
char *nicknames[2000];
std::string line;
std::ifstream file("nicknames.txt");
FILE *fileID = fopen("asd.txt", "w");
while (std::getline(file, line))
{
nicknames[nicknameCount++] = line.data();
// (1)
fprintf(fileID, "%i: %s\n", nicknameCount - 1, nicknames[nicknameCount - 1]);
}
int randomNickIndex = rand() % nicknameCount;
// (2)
for (int i = 0; i < nicknameCount; i++)
fprintf(fileID, "%i: %s\n", i, nicknames[i]);
fprintf(fileID, "Result: %s\n", nicknames[randomNickIndex]);
fprintf(fileID, "Result: %i\n", randomNickIndex);
fclose(fileID);
exit(0);
What then I see at point (1) is what I expect; the nicknames. Then later at point (2) every single member of the array is "Pirate Scrub", which is the last element of the nicknames.txt.
I think it must be something obvious, but I just can't figure it out. Any ideas?
line.data() returns a pointer to the string's internal character buffer. That buffer belongs to line, so every entry you store points at the same storage, and each time you read a new line its contents are overwritten. To fix this, you need to copy the contents of line.
Change:
char *nicknames[2000];
to
char nicknames[2000][256];
and
nicknames[nicknameCount++] = line.data();
to
strcpy(nicknames[nicknameCount++], line.data());
Note that strcpy will overflow the buffer if a line is longer than 255 characters. Using a vector to store the lines is probably better, since this is C++.
Your nicknames array does not contain copies of the strings, all the nicknames are pointers to the same data owned by line.
Instead of char* nicknames[2000] I would recommend you use
std::vector<std::string> nicknames;
and then inside the loop:
nicknames.push_back(line);
This:
char *nicknames[2000];
is an array of 2000 pointers to char. Nowhere in your code are you actually storing the strings from the file. This
nicknames[nicknameCount++] = line.data();
merely stores pointers to the line's internal buffer in the array. In the next iteration this buffer is overwritten with the contents of the next line.
Forget about all the C i/o. Mixing C and C++ is advanced and you don't need it here. If you want to store a dynamically sized array of strings in C++, that is a std::vector<std::string>:
std::vector<std::string> lines;
std::string line;
while (std::getline(file, line))
{
lines.push_back(line);
}
Also for writing to the output file you should use an std::ofstream.
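Putting those pieces together might look like the sketch below. The function names are made up for illustration, and the random pick keeps the question's rand() approach:

```cpp
#include <cstdlib>
#include <fstream>
#include <istream>
#include <sstream> // only needed to try this on an in-memory stream
#include <string>
#include <vector>

// Reads every line of the stream into its own std::string, so each entry
// owns a copy of its text.
std::vector<std::string> read_nicknames(std::istream& in)
{
    std::vector<std::string> nicknames;
    std::string line;
    while (std::getline(in, line))
        nicknames.push_back(line); // push_back copies the current contents
    return nicknames;
}

// Writes the indexed list with std::ofstream in place of fopen/fprintf.
// Returns false if the file could not be opened.
bool write_nicknames(const std::string& path,
                     const std::vector<std::string>& nicknames)
{
    std::ofstream out(path.c_str());
    if (!out)
        return false;
    for (std::vector<std::string>::size_type i = 0; i < nicknames.size(); ++i)
        out << i << ": " << nicknames[i] << '\n';
    return true;
}

// Picks one entry at random; the caller must make sure the vector is not empty.
const std::string& pick_random(const std::vector<std::string>& nicknames)
{
    return nicknames[std::rand() % nicknames.size()];
}
```

A caller would open std::ifstream file("nicknames.txt"), pass it to read_nicknames, check that the result is not empty, and then call pick_random.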

How do I only read a certain part of this line into a structure?

I am working with a csv file with a comma(,) as the delimiter. A certain line in the text file version of the csv file looks like this.
Station Name,MONTREAL/PIERRE ELLIOTT TRUDEAU INTL,,,,,,,,,,,,,,,,,,,,,,,
I want to be able to only store "MONTREAL/PIERRE ELLIOTT TRUDEAU INTL", minus the quotes. In other words, I do not want to store "Station Name". Based on my research, my code looks like this.
#include<string>
#include<sstream>
#include<fstream>
using namespace std;
struct company_data
{
string station_name, province, climate_identifier, TC_identifier, time_info;
float latitude, longitude;
int WMO_identifier;
string E, M, NA, symbol;
};
void accept_company_data (company_data initial)
{
ifstream infile;
infile.open("eng-hourly-montreal-wind_dec_2015.csv");
string line, temp1,temp2;
getline (infile, line);
istringstream iss(line);
iss>>temp1;
iss>>initial.station_name;
cout<<initial.station_name;
}
Any help would be greatly appreciated.
There are two ways to solve this. Both work on a C string, which you can get from a std::string with c_str().
The first is strtok(), which breaks up a string based on some delimiter (in your case the comma). On Linux/UNIX, type 'man strtok'. Note that strtok() modifies the buffer it scans, so give it a writable copy rather than the pointer returned by c_str(), and that it treats a run of consecutive commas as a single delimiter.
The second is to scan by hand. Set a pointer to the beginning of the string and loop until you hit the comma. Then increment the pointer by one (to pass over the comma) and save that position (set a pointer to it). Now continue to look for the next comma. When you have that next comma, you can copy all the characters from the start pointer to the end pointer.
For example:
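const char *input = "your input,with commas, in it";
const char *start_pointer, *end_pointer, *ptr;
ptr = input;
while (*ptr != ',' && *ptr) ptr++; // scan along looking for the first comma
if (*ptr == ',') ptr++; // the loop above stopped on the comma, so step past it
start_pointer = ptr;
while (*ptr != ',' && *ptr) ptr++; // scan to the next comma or the end of the string
end_pointer = ptr;
// now you can copy to your destination
char destination_buffer[128];
char *des = destination_buffer;
for (ptr = start_pointer; ptr < end_pointer && des < destination_buffer + sizeof(destination_buffer) - 1;) *des++ = *ptr++;
*des = '\0'; // terminate the copied field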
The above is not very efficient, since it scans the field twice (once to find its end, once to copy it).
Instead, after finding the first comma, you can copy while you scan:
while (*ptr != ',' && *ptr) *des++ = *ptr++;
The "&& *ptr" test looks for the null terminator ('\0') that marks the end of the string.
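As a C++ alternative to the pointer scanning above, std::getline accepts a delimiter character, so the same field can be pulled out of a std::istringstream. This is only a sketch: csv_field is a hypothetical helper, and it assumes no field contains a quoted comma.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Returns the comma-separated field at position `index` (0-based) of `line`,
// or an empty string if the line has fewer fields.
// Quoted fields that contain commas are not handled.
std::string csv_field(const std::string& line, std::size_t index)
{
    std::istringstream iss(line);
    std::string field;
    for (std::size_t i = 0; std::getline(iss, field, ','); ++i)
        if (i == index)
            return field; // found the requested column
    return std::string();
}
```

For the line in the question, csv_field(line, 1) yields the station name. Note also that accept_company_data takes its company_data argument by value, so anything stored in initial is lost when the function returns; taking it by reference (company_data& initial) avoids that.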

C++ Reading from a text file into a const char array

I want to read in lines from a text file into a 2-d char array but without the newline character.
Example of .txt:
TCAGC
GTAGA
AGCAG
ATGTC
ATGCA
ACAGA
CTCGA
GCGAC
CGAGC
GCTAG
...
So far, I have:
ifstream infile;
infile.open("../barcode information.txt");
string samp;
getline(infile,samp,',');
BARCLGTH = samp.length();
NUMSUBJ=1;
while(!infile.eof())
{
getline(infile,samp,',');
NUMSUBJ++;
}
infile.close(); //I read the file the first time to determine how many sequences
//there are in total and the length of each sequence to determine
//the dimensions of my array. Not sure if there is a better way?
ifstream file2;
file2.open("../barcode information.txt");
char store[NUMSUBJ][BARCLGTH+1];
for(int i=0;i<NUMSUBJ;i++)
{
for(int j=0;j<BARCLGTH+1;j++)
{
store[i][j] = file2.get();
}
}
However, I do not know how to ignore the newline character. I want the array to be indexed so that I can access a sequence with the first index and then a specific char within that sequence with the second index; i.e. store[0][0] would give me 'T', but I do not want store[0][5] to give me '\n'.
Also, as an aside, store[0][6], which I think should be out of bounds since BARCLGTH is 5, returns 'G',store[0][7] returns 'T',store[0][8] returns 'A', etc. These are the chars from the next line. Alternatively, store[1][0],store[1][1], and store[1][2] also return the same values. Why does the first set return values, shouldn't they be out of bounds?
As you're coding in C++, you could do it like this instead:
std::vector<std::string> barcodes;
std::ifstream infile("../barcode information.txt");
std::string line;
while (std::getline(infile, line))
barcodes.push_back(line);
infile.close();
After this the vector barcodes contains all the contents from the file. No need for arrays, and no need to count the number of lines.
And as both vectors and strings can be indexed like arrays, you can use syntax such as barcodes[2][0] to get the first character of the third entry.
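Here is the same idea wrapped in a function so it can be tried on any stream; read_barcodes is a hypothetical name. As for the aside in the question: a 2-D char array is one contiguous block and operator[] does no bounds checking, so store[0][6] simply lands on the memory of store[1][0]. That is undefined behaviour rather than a guaranteed result. With strings, barcodes[0].at(5) would throw std::out_of_range instead.

```cpp
#include <istream>
#include <sstream> // only needed to try this on an in-memory stream
#include <string>
#include <vector>

// Reads one barcode per line. std::getline discards the '\n',
// so no newline character ever ends up in the stored strings.
std::vector<std::string> read_barcodes(std::istream& in)
{
    std::vector<std::string> barcodes;
    std::string line;
    while (std::getline(in, line))
        barcodes.push_back(line);
    return barcodes;
}
```

After the call, barcodes.size() plays the role of NUMSUBJ and barcodes[0].size() the role of BARCLGTH, so the file only needs to be read once.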

Why is this char stopping my program?

Does the newline character have some kind of special significance in c++? Is it a non-ASCII character?
I'm trying to build a Markov chain for each unique n-character substring within a larger piece of text. Every time I come across a new unique substring I enter it into a map whose value is a 256-element vector (one element for each character in the extended ASCII table).
There's no problem when I print out the entire contents of the file ("lines" is a vector of lines of text built using ifstream and getline):
for(int i=0; i<lines.size(); i++) cout << lines[i] << endl;
The whole text file shows up in the console. The problem happens when I try to return the newline character to a function that's expecting a char. "MOVESPACES" is an integer constant that determines how many characters further ahead to move in the vector of strings on each iteration.
char GetNextChar(int row, int col){
for (int i=0; i<MOVESPACES; i++) {
if (col+1<lines[row].size()) { // If you're not at the end of the line keep going
col+=1;
} else { // Otherwise, move to the beginning of the next row
row+=1;
col=0;
}
}
}
return lines[row].at(col);
}
I've walked through with the debugger, and when it gets to the 1st column of the 2nd line it craps out on me – no error or anything. It fails within this function, not the calling function.
The file I'm using is A Christmas Carol (first thing that came up on Project Gutenberg). For reference here are the first few lines:
STAVE I: MARLEY'S GHOST
MARLEY was dead: to begin with. There is no doubt
whatever about that. The register of his burial was
The function breaks when it should return the first character on the second line. This doesn't happen if I get rid of the newline, or if I build the "lines" vector myself line by line in the program. Any idea what's wrong?
Your GetNextChar function is assuming that if you are at the last character in some line, there will be a character in the next line. What happens if there is no character in that next line? This can happen in two places: When you have hit end of file, or when the next line is the empty string.
The second line is the empty string.
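One way to guard against both cases is sketched below. It is an assumption about the desired behaviour, not the asker's code: empty lines are skipped, and the function reports failure instead of indexing past the end. The name get_next_char and the steps parameter (standing in for MOVESPACES) are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Advances `steps` characters from (row, col), skipping empty lines.
// Returns false if the end of the text is reached before a character is
// found; otherwise stores the character in `out` and returns true.
bool get_next_char(const std::vector<std::string>& lines,
                   std::size_t row, std::size_t col, int steps, char& out)
{
    for (int i = 0; i < steps; ++i) {
        if (row < lines.size() && col + 1 < lines[row].size()) {
            ++col; // still inside the current line
        } else {
            // Move to the start of the next non-empty line, if there is one.
            ++row;
            col = 0;
            while (row < lines.size() && lines[row].empty())
                ++row;
        }
        if (row >= lines.size())
            return false; // ran off the end of the text
    }
    if (row >= lines.size() || col >= lines[row].size())
        return false; // the starting position itself was not a valid character
    out = lines[row][col];
    return true;
}
```

If the Markov chain should treat line breaks as characters, an alternative is to append '\n' to each line when building the vector, so that no line is ever empty.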

Tokenization of a text file with frequency and line occurrence. Using C++

Once again I ask for help. I haven't coded anything for some time!
Now I have a text file filled with random gibberish. I already have a basic idea on how I will count the number of occurrences per word.
What really stumps me is how I will determine what line the word is in. Gut instinct tells me to look for the newline character at the end of each line. However, I have to do this while going through the text file the first time, right? If I do it afterwards, it will do no good.
I already am getting the words via the following code:
vector<string> words;
string currentWord;
while(!inputFile.eof())
{
inputFile >> currentWord;
words.push_back(currentWord);
}
This is for a text file with no set structure. Using the above code gives me a nice little(big) vector of words, but it doesn't give me the line they occur in.
Would I have to get the entire line, then process it into words to make this possible?
Use a std::map<std::string, int> to count the word occurrences -- the int is the number of times it exists.
If you need line-by-line input, use std::getline(std::istream&, std::string&), like this:
std::vector<std::string> lines;
std::ifstream file(...); //Fill in accordingly.
std::string currentLine;
while(std::getline(file, currentLine))
lines.push_back(currentLine);
You can split a line apart by putting it into a std::istringstream first and then using operator>>. (Alternatively, you could cobble up some sort of splitter using std::find and other algorithmic primitives.)
EDIT: This is the same thing as in @dash-tom-bang's answer, but modified to be correct with respect to error handling:
vector<pair<string, int> > words; // each word paired with the line it was found on
int currentLine = 1; // or 0, however you wish to count...
string line;
while (getline(inputFile, line))
{
istringstream inputString(line);
string word;
while (inputString >> word)
words.push_back(make_pair(word, currentLine));
++currentLine; // advance once per line read
}
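Combining the map suggestion with the line-by-line loop, a rough sketch that records both the frequency and the lines of each word could look like this. WordInfo and index_words are hypothetical names, and words are compared exactly as read (case and punctuation included):

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical record: how often a word occurs and on which lines.
struct WordInfo
{
    std::size_t count;
    std::set<int> lines;
    WordInfo() : count(0) {}
};

// Builds an index from word to its frequency and the set of line numbers.
std::map<std::string, WordInfo> index_words(std::istream& in)
{
    std::map<std::string, WordInfo> index;
    std::string line;
    int lineNumber = 0;
    while (std::getline(in, line)) {
        ++lineNumber; // lines are counted from 1
        std::istringstream words(line);
        std::string word;
        while (words >> word) {
            WordInfo& info = index[word]; // creates the entry on first use
            ++info.count;
            info.lines.insert(lineNumber);
        }
    }
    return index;
}
```

Using a std::set for the line numbers means a word that appears twice on one line is still listed once for that line, while count keeps the total.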
Short and sweet.
vector< map< string, size_t > > line_word_counts;
string line, word;
while ( getline( cin, line ) ) {
line_word_counts.push_back( map< string, size_t >() );
map< string, size_t > &word_counts = line_word_counts.back();
istringstream line_is( line );
while ( line_is >> word ) ++ word_counts[ word ];
}
cout << "'Hello' appears on line 5 " << line_word_counts[5-1]["Hello"]
<< " times\n";
You're going to have to abandon reading into strings, because operator >>(istream&, string&) discards white space and the contents of the white space (== '\n' or != '\n', that is the question...) is what will give you line numbers.
This is where OOP can save the day. You need to write a class to act as a "front end" for reading from the file. Its job will be to buffer data from the file, and return words one at a time to the caller.
Internally, the class needs to read data from the file a block (say, 4096 bytes) at a time. Then a string GetWord() (yes, returning by value here is good) method will:
First, read any white space characters, taking care to increment the object's lineNumber member every time it hits a \n.
Then read non-whitespace characters, putting them into the string object you'll be returning.
If it runs out of stuff to read, read the next block and continue.
If you hit the end of file, the string you have is the whole word (which may be empty) and should be returned.
If the function returns an empty string, that tells the caller that the end of file has been reached. (Files usually end with whitespace characters, so reading whitespace characters cannot imply that there will be a word later on.)
Then you can call this method at the same place in your code as your inputFile >> currentWord and the rest of the code doesn't need to know the details of your block buffering.
An alternative approach is to read things a line at a time. The read functions that fill a char array (istream::getline, istream::read) require you to create a fixed-size buffer beforehand, and if the line is longer than that buffer, you have to deal with it somehow, which could get more complicated than the class I described. The free function std::getline(std::istream&, std::string&) does not have that limitation, because the string grows as needed.
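The front-end class described in this answer could be sketched roughly as follows. WordReader, Peek and LineNumber are hypothetical names, the block size is assumed to be greater than zero, and only '\n' is treated as a line break:

```cpp
#include <cctype>
#include <cstddef>
#include <istream>
#include <sstream> // only needed to try this on an in-memory stream
#include <string>
#include <vector>

// Hypothetical front end: buffers the stream in blocks and hands out one
// word at a time while counting the newlines it skips.
class WordReader
{
public:
    explicit WordReader(std::istream& in, std::size_t blockSize = 4096)
        : in_(in), buffer_(blockSize), pos_(0), end_(0), lineNumber_(1) {}

    // Returns the next word, or an empty string at end of file.
    std::string GetWord()
    {
        std::string word;
        int c;
        // Skip whitespace, counting lines as we go.
        while ((c = Peek()) != -1 && std::isspace(c)) {
            if (c == '\n')
                ++lineNumber_;
            ++pos_;
        }
        // Collect non-whitespace characters.
        while ((c = Peek()) != -1 && !std::isspace(c)) {
            word += static_cast<char>(c);
            ++pos_;
        }
        return word;
    }

    // Line on which the most recently returned word was found (1-based).
    int LineNumber() const { return lineNumber_; }

private:
    // Returns the current character without consuming it, refilling the
    // buffer when it is exhausted; returns -1 at end of file.
    int Peek()
    {
        if (pos_ == end_) {
            in_.read(&buffer_[0], static_cast<std::streamsize>(buffer_.size()));
            end_ = static_cast<std::size_t>(in_.gcount());
            pos_ = 0;
            if (end_ == 0)
                return -1;
        }
        return static_cast<unsigned char>(buffer_[pos_]);
    }

    std::istream& in_;
    std::vector<char> buffer_;
    std::size_t pos_, end_;
    int lineNumber_;
};
```

The caller loops on GetWord() until it returns an empty string and reads LineNumber() after each word to learn which line it came from.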