Parsing text file into list gives segmentation fault - c++

I'm getting a segmentation fault while trying to parse a big text file. The file contains 91 529 mRNA transcripts and details about these transcripts. I've created a RefSeqTranscript object that will take these details. When I parse the file, I create a list of these objects and fill in their details as I go. It works fine for the first 1829 transcripts and then crashes with a segmentation fault. The method I'm running is:
void TranscriptGBFFParser::ParseFile(list<RefSeqTranscript> &transcripts, const char* filepath)
{
cout << "Parsing " << filepath << "..." << endl;
ifstream infile;
infile.open(filepath);
int num = 0;
RefSeqTranscript *transcript = new RefSeqTranscript();
for(string line; getline(infile, line); )
{
in.clear();
in.str(line);
if (boost::starts_with(line, "LOCUS"))
{
if((*transcript).transcriptRefSeqAcc.size() > 0)
{
cout << (*transcript).transcriptRefSeqAcc << ":" << (*transcript).gi << ":" << (*transcript).gene.geneName << ":" << ++num << endl;
transcripts.push_back(*transcript);
delete transcript;
RefSeqTranscript *transcript = new RefSeqTranscript();
}
}
else if (boost::starts_with(line, " var"))
{
TranscriptVariation variant;
(*transcript).variations.push_back(variant);
}
//Store the definition of the transcript in the description attribute
else if (boost::starts_with(line, "DEFINITION"))
{
(*transcript).description = line.substr(12);
for(line; getline(infile, line); )
{
if(boost::starts_with(line, "ACCESSION "))
break;
(*transcript).description += line.substr(12);
}
}
//The accession number and GI number are obtained from the VERSION line
else if (boost::starts_with(line, "VERSION"))
{
string versions = line.substr(12);
vector<string> strs;
boost::split(strs, versions, boost::is_any_of( " GI:" ), boost::token_compress_on);
boost::trim_left(strs[0]);
(*transcript).transcriptRefSeqAcc = strs[0];
(*transcript).gi = atoi(strs[1].c_str());
}
//Gene information is obtained from the "gene" sections of each transcript
else if (boost::starts_with(line, " gene"))
{
for(line; getline(infile, line); )
{
if(boost::starts_with(line.substr(21), "/gene="))
{
Gene *gene = new Gene();
string name = line.substr(27);
Utilities::trim(name, '\"');
(*gene).geneName = name;
(*transcript).gene = *gene;
delete gene;
break;
}
}
(*transcript).gene.geneID = 0;
}
else if (boost::starts_with(line, " CDS"))
{
(*transcript).proteinRefSeqAcc = "";
}
else if (boost::starts_with(line, "ORIGIN"))
{
(*transcript).sequence = "";
}
}
cout << (*transcript).transcriptRefSeqAcc << ":" << (*transcript).gi << ":" << (*transcript).gene.geneName << endl;
transcripts.push_back(*transcript);
delete transcript;
cout << "No. transcripts: " << transcripts.size() << endl;
cout << flush;
infile.close();
cout << "Finished parsing " << filepath << "." << endl;
}
I'm new to C++ and don't have a great understanding of how to work with pointers etc so I'm guessing I might have done something wrong there. I don't understand why it would work for almost 2000 objects before cutting out though.
The file I'm parsing is 2.1 GB and consists of about 44 000 000 lines so any tips on how to improve the efficiency would also be much appreciated.

This is probably not the only problem, but you have a leak: the marked line declares a new local transcript instead of assigning to the existing one...
if (boost::starts_with(line, "LOCUS"))
{
if((*transcript).transcriptRefSeqAcc.size() > 0)
{
cout << (*transcript).transcriptRefSeqAcc << ":" << (*transcript).gi << ":" << (*transcript).gene.geneName << ":" << ++num << endl;
transcripts.push_back(*transcript);
delete transcript;
// LEAK!
RefSeqTranscript *transcript = new RefSeqTranscript();
}
}
You probably mean:
transcript = new RefSeqTranscript();
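To see why the type name on that line matters, here is a minimal sketch stripped down to plain ints (the function names are made up for illustration). Writing the type again creates a second variable that only exists inside the inner block; leaving the type out assigns to the outer one.

```cpp
// Returns the outer variable after an inner block *declares* a same-named variable.
int afterShadowing() {
    int value = 1;
    {
        int value = 2;   // new variable; hides the outer one inside this block only
        (void)value;     // silence unused-variable warnings
    }
    return value;        // the outer value was never touched
}

// Returns the outer variable after an inner block *assigns* to it.
int afterAssignment() {
    int value = 1;
    {
        value = 2;       // no type name: this writes to the outer variable
    }
    return value;
}
```

In the parser the same thing happens with a pointer: the outer transcript keeps pointing at memory that was just deleted.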

It's hard to say anything specific unless you provide more details:
Which line does it crash on?
Do you really need all of those transcripts in memory at the same time?
That said, I would suggest a couple of improvements:
Don't use a raw pointer (or at least use a smart pointer) for RefSeqTranscript *transcript;
Don't use a pointer for Gene *gene;
Generally, don't use pointers unless you really need them.
And you have a bug here:
delete transcript;
RefSeqTranscript *transcript = new RefSeqTranscript();
Since you've already declared transcript outside the loop body, this declaration hides it with a new variable of the same name. The new object leaks, and, worse, the outer transcript has been deleted and is never replaced with anything. So you probably get a crash on the next iteration, when the dangling outer pointer is used again.
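A minimal sketch of the pointer-free version suggested above. The Transcript struct and the parseTranscripts name are stand-ins invented for this example (the real RefSeqTranscript has more fields), and only two kinds of lines are handled. The length checks are there because substr throws std::out_of_range when the start position is past the end of the line.

```cpp
#include <list>
#include <sstream>   // std::istringstream, convenient for feeding the parser test data
#include <string>
#include <utility>

// Hypothetical, cut-down record; the real RefSeqTranscript has more fields.
struct Transcript {
    std::string accession;
    std::string description;
};

// Reads records that start with a "LOCUS" line and appends them to `out`.
void parseTranscripts(std::istream& in, std::list<Transcript>& out) {
    Transcript current;        // plain value: no new/delete needed
    bool inRecord = false;     // true once the first LOCUS line has been seen

    for (std::string line; std::getline(in, line); ) {
        if (line.compare(0, 5, "LOCUS") == 0) {
            if (inRecord) {
                out.push_back(std::move(current));  // hand the finished record to the list
            }
            current = Transcript();                 // start the next record from a clean state
            inRecord = true;
        } else if (line.compare(0, 9, "ACCESSION") == 0 && line.size() > 12) {
            current.accession = line.substr(12);
        } else if (line.compare(0, 10, "DEFINITION") == 0 && line.size() > 12) {
            current.description = line.substr(12);
        }
    }
    if (inRecord) {
        out.push_back(std::move(current));          // do not forget the last record
    }
}
```

Moving the finished record into the list also avoids copying its strings, which may help a little with a 2.1 GB input.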

Related

Why can't my Node tempNode show the right data?

I have a bit of an issue with my program. I have a function void loadData() which loads the data from a text file customers.txt and stores each line of data in a linked list. My concern is specifically with how I/O works. I managed to get the data from the text file stored in a member variable of a linked-list node. When I print that variable right away, I get the answer I want on the console.
std::cout << "Group Name: " << tempCustomer->groupName << std::endl;
However, when I added console output later in the function to test whether all the variables had the right data, I realized that it was all over the place. I'm not sure why it's not working.
Here is the loadData() function
void Groups::loadData(){
fin.open("customers.txt");
char holder[MAX_SIZE];
if(!fin.is_open())
std::cerr << "Could not access file" << std::endl;
else{
while(!fin.eof()){
Customers *tempCustomer = new Customers;
fin.getline(holder,MAX_SIZE,';');
tempCustomer->groupName = holder;
std::cout << "Group Name: " << tempCustomer->groupName << std::endl;
fin.getline(holder,MAX_SIZE,';');
tempCustomer->name = holder;
fin.getline(holder,MAX_SIZE,';');
tempCustomer->email = holder;
fin >> tempCustomer->choice;
fin.get(); //gets the last character, which is '\n'
fin.ignore(); //ignores the next character which is the '\n'
tempCustomer->next = NULL;
std::cout << "What does the temp Node Store?" << std::endl;
std::cout << "Group Name: " << tempCustomer->groupName << std::endl;
std::cout << "Name: " << tempCustomer->name << std::endl;
std::cout << "Email: " << tempCustomer->email << std::endl;
std::cout << "Choice: " << tempCustomer->choice << std::endl;
//addCustomerToLL(tempCustomer);
tempCustomer = NULL;
delete tempCustomer;
}
}
fin.close();
}
Here is the Console out put:
Group Name: Jonathan Group
What does the temp Node Store?
Group Name: vazquez.jonathan#pcc.edu
Name: vazquez.jonathan#pcc.edu
Email: vazquez.jonathan#pcc.edu
Choice: 2
Here is the text file customers.txt
Jonathan Group;Jonathan;vazquez.jonathan#pcc.edu;2
This is a school assignment; I'm to store all the customers from the text file in a linked list. I'm also required to use C strings rather than C++ std::string. Let me know if the other files are necessary. I didn't include them since nothing in this function uses anything outside it besides the ifstream fin; private member of the class and the const int MAX_SIZE = 256; global variable.
Assuming you're not allowed to use std::string, you need to allocate memory for each string.
So replace this:
fin.getline(holder,MAX_SIZE,';');
tempCustomer->groupName = holder;
with:
fin.getline(holder, MAX_SIZE, ';');
char *s = new char[strlen(holder) + 1];
strcpy(s, holder);
tempCustomer->groupName = s;
You should release the memory you allocate when you no longer need it, so create a destructor for your Customers class:
Customers::~Customers()
{
delete[] groupName;
// Release name and email here too if they are allocated the same way.
}
It's because the contents of holder change every time you read a new field, but all the strings in your Customers node point to the same holder, which stores the last field you read.
Changing the type of name, email etc. to char[MAX_SIZE] and copying holder into them with strcpy may help.
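Both answers come down to giving each field its own storage. Here is a small sketch of the heap-allocating variant, assuming the members are plain char pointers; Customer and duplicate are names invented for this example, not the asker's actual class.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the Customers node from the question.
struct Customer {
    char* groupName = nullptr;
    char* name = nullptr;

    Customer() = default;
    Customer(const Customer&) = delete;             // forbid accidental shallow copies
    Customer& operator=(const Customer&) = delete;

    ~Customer() {
        delete[] groupName;   // delete[] on a null pointer is a no-op
        delete[] name;
    }
};

// Returns a freshly allocated copy of `src`; the caller owns it and must delete[] it.
char* duplicate(const char* src) {
    assert(src != nullptr);
    char* copy = new char[std::strlen(src) + 1];    // +1 for the terminating '\0'
    std::strcpy(copy, src);
    return copy;
}
```

With this, tempCustomer->groupName = duplicate(holder); keeps its text even after holder is overwritten by the next getline.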

Why does this leveldb code truncate "std::string"s that have spaces in them?

I wrote this piece of code to try leveldb. I am using Unix time as keys. For values that have spaces, only the last part gets saved. Here is the code. I am running Linux Kernel 4.4.0-47-generic
while (true) {
std::string note;
std::string key;
std::cout << "Test text here ";
std::cin >> note;
std::cout << std::endl;
if(note.size() == 0 || tolower(note.back()) == 'n' ) break;
key = std::to_string(std::time(nullptr));
status = db->Put(write_options, key, note);
if(!status.ok()) break;
}
std::cout << "Read texts........" << std::endl;
leveldb::Iterator* it = db->NewIterator(leveldb::ReadOptions());
for(it->SeekToFirst(); it->Valid(); it->Next()){
std::cout << it->key().ToString() << " " << it->value().ToString() << std::endl;
}
delete db;
The issue is not in leveldb, but in the way you read the input:
std::string note;
std::cin >> note;
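This reads only up to the first whitespace, so a note with spaces is consumed one word per loop iteration. All of those iterations most likely run within the same second and therefore use the same key, so each Put overwrites the previous one and only the last word is kept. This is a common mistake; see for example: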
reading a line from ifstream into a string variable
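A short sketch of the difference, using an istringstream as a stand-in for std::cin so it can be run without typing anything. The two function names are made up for the example.

```cpp
#include <sstream>
#include <string>

// What `in >> note` yields for the given input: the first whitespace-delimited token.
std::string readWithExtraction(const std::string& typed) {
    std::istringstream in(typed);   // stands in for std::cin
    std::string note;
    in >> note;
    return note;
}

// What std::getline yields: everything up to the end of the line, spaces included.
std::string readWithGetline(const std::string& typed) {
    std::istringstream in(typed);
    std::string note;
    std::getline(in, note);
    return note;
}
```

In the loop from the question, replacing std::cin >> note; with std::getline(std::cin, note); should store the whole line under one key.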

Final element of list corrupted after removing an element

I'm having a weird issue. I'm writing a function to delete a line from a list of names created elsewhere, which, after some research, seems like it should be fairly simple. I write the current list of names into a list, display the list, have the user input the name they want to delete, remove the user-inputted name from the list, then display the updated list to the user.
Up to here, everything works perfectly, but when I write the list back into the file, the last name gets a random amount of characters chopped off of it, ranging from a couple of characters to the entire line. Now, this is where it gets strange. If I open the file and look at it without exiting the program, the last line of the file is messed up and continues to be whenever I display it later in the program. But, if I exit the program and then open the file, the last line is back to how it was originally written! That file is not written to again by the program after the list is written in, so I cannot imagine why this is happening.
I almost decided that since the file ultimately comes out of the program correct, I could just ignore the issue, but I want the user to be able to view the list of names after the deletion for various reasons, which is made impossible while the last list item prints incorrectly.
I am still fairly beginner with C++, so I'm kind of hoping that this is just an issue of me not fully understanding lists or something. Regardless, dumbed down explanations would be ace.
I included the function below, any help is much appreciated.
char act, charname[50];
string namestr;
list <string> c1;
list <string>::iterator c1_Iter;
//write the names from the file into a list
ifstream names("List of Names.txt");
while (std::getline(names, namestr))
{
c1.push_back(namestr);
}
//print the current names
cout << "Registered names:";
for (c1_Iter = c1.begin(); c1_Iter != c1.end(); c1_Iter++)
cout << "\n" << setw(5) << " " << *c1_Iter;
//choose which names to delete and confirm
cout << "\n\nEnter the name you would like to delete: ";
cin.getline(charname, 50);
cin.getline(charname, 50);
cout << "\nAre you sure? Enter 'y' to permanently delete " << charname << ", and any other key to return to the start screen.";
cin >> act;
if (act == 'y' || act == 'Y')
{
//delete a file associated with each name
string strname(charname);
strname.append(".txt");
if (remove(strname.c_str()) < 0)
perror("Error deleting file");
else
{
//delete name from the file only if that person's individual file is successfully deleted
c1.remove(charname);
cout << "\n" << charname << " successfully deleted!\n";
//print the updated list of names
cout << "\nUpdated list of registered names:\n";
for (c1_Iter = c1.begin(); c1_Iter != c1.end(); c1_Iter++)
cout << *c1_Iter << endl;
//write updated list of names over "List of Names" to update the file
ofstream newNames("List of Names.txt");
for (c1_Iter = c1.begin(); c1_Iter != c1.end(); c1_Iter++)
newNames << *c1_Iter << endl;
newNames.close();
}
}
As Mohit Jain mentioned in the comments, you need to call names.close() on the ifstream before opening the file for writing as a separate ofstream. Also, you can use a std::string charname rather than char charname[50].
You could also use an fstream with appropriate seeking. If I'm not mistaken, having active ifstream and ofstream objects handling the same file can lead to undefined behavior.
Here's a more C++ friendly solution:
#include <iostream>
#include <string>
#include <fstream>
#include <list>
#include <iomanip>
#include <cstdio> // remove()
int main()
{
char act;
std::string charname;
std::string namestr;
std::list<std::string> c1;
std::list<std::string>::iterator c1_Iter;
//write the names from the file into a list
std::ifstream names("names.txt");
while (std::getline(names, namestr))
{
c1.push_back(namestr);
}
//print the current names
std::cout << "Registered names:";
for (c1_Iter = c1.begin(); c1_Iter != c1.end(); c1_Iter++)
std::cout << "\n" << std::setw(5) << " " << *c1_Iter;
//choose which names to delete and confirm
std::cout << "\n\nEnter the name you would like to delete: ";
std::getline(std::cin, charname); // read the whole line so names containing spaces work
std::cout << "\nAre you sure? Enter 'y' to permanently delete " << charname << ", and any other key to return to the start screen.";
std::cin >> act;
if (act == 'y' || act == 'Y')
{
//delete a file associated with each name
std::string strname(charname);
strname.append(".txt");
if (remove(strname.c_str()) < 0)
{
std::cerr << "Error deleting file " << strname << std::endl;
return 1;
}
else
{
//delete name from the file only if that person's individual file is successfully deleted
c1.remove(charname);
std::cout << "\n" << charname << " successfully deleted!\n";
//print the updated list of names
std::cout << "\nUpdated list of registered names:\n";
for (c1_Iter = c1.begin(); c1_Iter != c1.end(); c1_Iter++)
std::cout << *c1_Iter << std::endl;
//write updated list of names over "List of Names" to update the file
names.close(); //Close the ifstream before opening the file for editing
std::ofstream newNames("names.txt");
for (c1_Iter = c1.begin(); c1_Iter != c1.end(); c1_Iter++)
newNames << *c1_Iter << std::endl;
newNames.close();
}
}
return 0;
}
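Another way to make sure the input stream is closed before the file is rewritten is to confine each stream to its own function, so the destructor closes it on return. This is only a sketch; loadNames and saveNames are hypothetical helper names.

```cpp
#include <fstream>
#include <list>
#include <string>

// Loads every line of `path` into a list.
std::list<std::string> loadNames(const std::string& path) {
    std::list<std::string> names;
    std::ifstream in(path);
    for (std::string line; std::getline(in, line); ) {
        names.push_back(line);
    }
    return names;   // `in` goes out of scope here, which closes the file
}

// Overwrites `path` with the given names, one per line. Returns false on a write error.
bool saveNames(const std::string& path, const std::list<std::string>& names) {
    std::ofstream out(path);   // an ofstream truncates the file by default
    for (const std::string& name : names) {
        out << name << '\n';
    }
    return static_cast<bool>(out.flush());
}
```

The deletion then becomes: load the list, call remove on it, and save it, with no stream left open in between.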

how to access images in sequence using loop in openCV

I ran into a problem accessing images in sequential order. I have images whose names change with an incrementing number, i.e. cube_0.jpg, cube_1.jpg, and so on. I want to access each image one by one and show it.
Below is the code I have been playing with for two days; I don't know how to handle this situation or what is wrong with it.
ostringstream s;
for (int fileNumber = 0; fileNumber<=40; fileNumber++)
{
s<<"\"cube_"<<fileNumber<<"\.jpg\""<<endl;
string fullfileName(s.str());
images[i] = fullfileName;
}
stringstream ss;
cout<<"file name"<<images[0]<<endl;
for (int file = 0; file<41; file++)
{
string str = images[file];
cout<<"str "<<str<<endl;
img_raw = imread(ss.str(), 1); // load as color image Error
cout<<"Done"<<endl<<"size"<<img_raw.size();
system("pause");
}
This code runs fine until it reaches "img_raw = imread(ss.str())"; that line is what keeps me from accessing the file. Since imread requires "string& filename", I tried a stringstream, but nothing is working!
Any help would be greatly appreciated!
There are a few errors.
Your stringstream ss is empty. You declared it but did not fill with any values. I am pretty sure you meant imread(str, 1); instead of imread(ss.str(), 1);
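In the first for loop, you keep appending file names to the same ostringstream, so its contents go like this:
0: "cube_0.jpg"
1: "cube_0.jpg""cube_1.jpg"
2: "cube_0.jpg""cube_1.jpg""cube_2.jpg"
...
so the ostringstream just grows and grows. It needs to be declared inside the loop so that it starts out empty on every iteration. Note also that the escaped quotes (and the newline written by endl) become part of the name itself, so no file on disk would match it, and that the loop indexes images with i instead of fileNumber.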
Edited code:
string images[41];
Mat img_raw;
for (int fileNumber = 0; fileNumber < 41; fileNumber++)
{
stringstream ss;
ss << "cube_" << fileNumber << ".jpg"; // no stray backslashes, no trailing newline
string fullfileName;
ss >> fullfileName;
images[fileNumber] = fullfileName;
}
for (int file = 0; file < 41; file++)
{
cout << "Loading " << images[file] << endl;
img_raw = imread(images[file], 1);
if (!img_raw.empty())
{
cout << "Successfully loaded " << images[file] << " with size " << img_raw.size() << endl;
}
else
{
cout << "Error loading file " << images[file] << endl;
}
system("pause");
}
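If C++11 is available, the names can also be built without any stream, using std::to_string. A small sketch (makeFileNames is a made-up helper; the "cube_" prefix and ".jpg" suffix are taken from the question):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Builds "cube_0.jpg", "cube_1.jpg", ... for `count` files.
std::vector<std::string> makeFileNames(int count) {
    assert(count >= 0);
    std::vector<std::string> names;
    names.reserve(static_cast<std::size_t>(count));
    for (int i = 0; i < count; ++i) {
        names.push_back("cube_" + std::to_string(i) + ".jpg");
    }
    return names;
}
```

Each element can then be passed straight to imread, as in the second loop above.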

How to input a file into C++ and comparing console input

I'm working on a homework assignment and I'm stuck. We have to ask the user to enter a file name together with one of the commands wc, cc or lc (word count, character count and line count of a file), for example wc filename.txt. I'm supposed to check whether the file is valid, which I understand, and I know how to compare the user's input to decide which function to run, but I don't understand how to do both together. Any ideas? This is what I have so far.
#include <iostream>
#include <string>
#include <fstream>
using namespace std;
int main()
{
string line;
string file;
ifstream input; //input file stream
int i;
cout << "Enter a file name" << endl;
while(true){
cout << ">" ;
getline(cin,file);
input.open(file.c_str());
if (input.fail()) {
cerr << "ERROR: Failed to open file " << file << endl;
input.clear();
}
else {
i = 0;
while (getline(input, line))
if(line == "wc"){
cout << "The word count is: " << endl;
}
else if(line == "cc"){
cout << "The character count is: " << endl;
}
else if(line == "lc"){
cout << "The line count is: " << endl;
}
else if(line == "exit"){
return 0;
}
else{
cout << "----NOTE----" << endl;
cout << "Available Commands: " << endl;
cout <<"lc \"filename\"" << endl;
cout <<"cc \"filename\"" << endl;
cout <<"wc \"filename\"" << endl;
cout <<"exit" << endl;
}
}
}
return 0;
}
void wordCount(){
//TBD
}
void characterCount(){
//TBD
}
void lineCount(){
//TBD
}
You have to find the space between the command and the file name in the user's input and then split the string where you find the space. Something like this:
cout << "Enter a command\n";
string line;
getline(cin, line);
// get the position of the space as an index
size_t space_pos = line.find(' ');
if (space_pos == string::npos)
{
// user didn't enter a space, so error message and exit
cout << "illegal command\n";
exit(1);
}
// split the string at the first space
string cmd = line.substr(0, space_pos);
string file_name = line.substr(space_pos + 1);
This is untested code.
You could do better than this, for instance this would not work if the user entered two spaces between the command and the file name. But this kind of work rapidly gets very tedious. As this is an assignment I would be tempted to move on to more interesting things. You can always come back and improve things later if you have the time.
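For what it's worth, the two-spaces case can be handled without much extra work by letting a string stream skip the whitespace. This is a sketch, not a full command parser; splitCommand is a name made up for it.

```cpp
#include <sstream>
#include <string>

// Splits a line such as "wc   my file.txt" into the command ("wc") and the
// rest of the line ("my file.txt"). Returns false if either part is missing.
bool splitCommand(const std::string& line, std::string& cmd, std::string& file) {
    cmd.clear();
    file.clear();                  // getline leaves `file` untouched if the stream has already failed
    std::istringstream in(line);
    if (!(in >> cmd)) {
        return false;              // empty or all-whitespace input
    }
    in >> std::ws;                 // skip any run of spaces after the command
    std::getline(in, file);        // the remainder, which may itself contain spaces
    return !file.empty();
}
```

Usage would look like: if (!splitCommand(line, cmd, file_name)) { cout << "illegal command\n"; }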
I think you are asking how to validate multiple arguments: the command and the file.
A simple strategy is to have function like the following:
#include <fstream> // Note: this is for ifstream below
#include <iostream>
#include <string>
// Returns true if either argument is invalid, false if both are usable.
bool argumentsInvalid(const std::string& command, const std::string& filename) {
// Validate the command
// Note: Not ideal, just being short for demo
if("wc" != command && "cc" != command && "lc" != command) {
std::cout << "Invalid command" << std::endl;
return true;
}
// Validate the file
// Note: This is a cheat that uses the fact that if its valid, its open.
std::ifstream fileToRead(filename);
if(!fileToRead) {
std::cout << "Invalid file: \"" << filename << "\"" << std::endl;
return true;
}
// Note: This does rely on the ifstream destructor closing the file and would mean
// opening the file twice. Simple to show here, but not ideal real code.
return false;
}
If you want to evaluate ALL arguments before returning an error, insert a flag at the top of that function, like:
// To be set true if there is an error
bool errorFound = false;
and change all of the returns in the conditions to:
errorFound = true;
and the final return to:
return errorFound;
Usage:
....
if(argumentsInvalid(command, filename)) {
std::cout << "Could not perform command. Skipping..." << std::endl;
// exit or continue or whatever
}
// Now do your work
Note: The specific validity tests here are oversimplified.