How to optimise code to read a large CSV file in C++? - c++

I am trying to read entries of my csv file into an entries vector whose data type is object OrderBookEntry, this is an object that stores the tokenized variables from the csv line which itself is converted through the function tokenise. My problem is that my csv file has around a million lines and it takes over 30 seconds to load, how can I reduce this?
This is my code that actually reads lines from the file and converts them into entries.
std::vector<OrderBookEntry> CSVReader::readCSV(std::string csvFilename)
{
std::vector<OrderBookEntry> entries;
std::ifstream csvFile{csvFilename};
std::string line;
if (csvFile.is_open())
{
while(std::getline(csvFile, line))
{
try {
OrderBookEntry obe = stringsToOBE(tokenise(line, ','));
entries.push_back(obe);
//std::cout << "pushing entry" << std::endl;
}catch(const std::exception& e)
{
//std::cout << "CSVReader::readCSV bad data" << std::endl;
}
}// end of while
}
std::cout << "CSVReader::readCSV read " << entries.size() << " entries" <<
std::endl;
return entries;
}
This is the part of my code that tokenises the read line
std::vector<std::string> CSVReader::tokenise(std::string csvLine, char separator)
{
std::vector<std::string> tokens;
signed int start, end;
std::string token;
start = csvLine.find_first_not_of(separator, 0);
do{
end = csvLine.find_first_of(separator, start);
if (start == csvLine.length() || start == end) break;
if (end >= 0) token = csvLine.substr(start, end - start);
else token = csvLine.substr(start, csvLine.length() - start);
tokens.push_back(token);
start = end + 1;
}while(end > 0);
return tokens;
}
This is the part of my code that converts the tokens into a suitable object to work with
OrderBookEntry CSVReader::stringsToOBE(std::vector<std::string> tokens)
{
double price, amount;
if (tokens.size() != 5) // bad
{
//std::cout << "Bad line " << std::endl;
throw std::exception{};
}
// we have 5 tokens
try {
price = std::stod(tokens[3]);
amount = std::stod(tokens[4]);
}catch(const std::exception& e){
std::cout << "Bad float! " << tokens[3]<< std::endl;
std::cout << "Bad float! " << tokens[4]<< std::endl;
throw;
}
OrderBookEntry obe{price,
amount,
tokens[0],
tokens[1],
OrderBookEntry::stringToOrderBookType(tokens[2])};
return obe;
}

Related

Reading in fasta file C++

I'm trying to read in a fasta file. I want to remove/ignore the header/info lines that begin with ">" and store the following sequences into sperate strings. Below is the code I have to do that (partially reworked from https://rosettacode.org/wiki/FASTA_format#C++, as what I had originally worked even less). They have a good example of what I want to do.
My problem is given this fasta file:
">sequence_1
MSTAGKVIKCKAAVLWELHKPFTIEDIEVAPPKAHEVRIKMVATGVCRSDDHVVSGTLVTPLPAVLGHE
GAGIVEGVTCVKPGDKVIPLFSPQCGECRICKHPESNFCSRSDLLMPRGTLREGTSRFSCKGKQIHNFI
STSTFSQYTVVDDIAVAKIDGASPLDKVCLIGCGFSTGYGSAVKVAKVTPGSTCAVFGLGGVGLSVIIG
CKAAGAARIIAVDINKDKFAKAKELGATECIYSKPIQEVLQEMTDGGVDFSFEVIGRLDTMTSALLSCH
AACGVSVVVGVPPNAQNLSMNPMLLLLGRTWKGAIFGGFKSKDSVPKLVAKKFPLDPLITHVLPFEKIN
EAFDLLRSGKSIRTVLTF
">sequence_2
MNQGKVIKCKAAVLWEVKKPFSIEDVEVAPPKAYEVRIKMVAVGICHTDDHVVSGNLVTPLPVILGHEA
AGIVESVGEGVTTVKPGDKVIPLFTCRVCKNPESNYCLKNDLGNPRGTLQDGTRRFTCRGKPIHHFLGT
STFSQYTVVDENAVAKIDAASPLEKVCLIGCGFSTGYGSAVNVAKVTPGSTCAVFGLGGVGLSAVMGCK
AAGAARIIAVDINKDKFAKAKELGATECINPQDYKLPIQEVLKEMTDGSTVIGRLDTMMASLLCCGTSV
IVEDTPASQNLSINPMLLLTGRTWKGAVYGGFKSKEGIPKLVADFMAKKFSLDALITHVLPFEKINEGF
DLLHSGKSIRTVLTF
My output:
Sequence 1: MSTAGKVIKCKAAVLWELHKPFTIEDIEVAPPKAHEVRIKMVATGVCRSDDHVVSGTLVTPLPAVLGHEGAGIVEGVTCVKPGDKVIPLFSPQCGECRICKHPESNFCSRSDLLMPRGTLREGTSRFSCKGKQIHNFISTSTFSQYTVVDDIAVAKIDGASPLDKVCLIGCGFSTGYGSAVKVAKVTPGSTCAVFGLGGVGLSVIIGCKAAGAARIIAVDINKDKFAKAKELGATECIYSKPIQEVLQEMTDGGVDFSFEVIGRLDTMTSALLSCHAACGVSVVVGVPPNAQNLSMNPMLLLLGRTWKGAIFGGFKSKDSVPKLVAKKFPLDPLITHVLPFEKINEAFDLLRSGKSIRTVLTF
Sequence 2: MNQGKVIKCKAAVLWEVKKPFSIEDVEVAPPKAYEVRIKMVAVGICHTDDHVVSGNLVTPLPVILGHEAAGIVESVGEGVTTVKPGDKVIPLFTCRVCKNPESNYCLKNDLGNPRGTLQDGTRRFTCRGKPIHHFLGTSTFSQYTVVDENAVAKIDAASPLEKVCLIGCGFSTGYGSAVNVAKVTPGSTCAVFGLGGVGLSAVMGCKAAGAARIIAVDINKDKFAKAKELGATECINPQDYKLPIQEVLKEMTDGSTVIGRLDTMMASLLCCGTSVIVEDTPASQNLSINPMLLLTGRTWKGAVYGGFKSKEGIPKLVADFMAKKFSLDALITHVLPFEKINEGF
The last line or so of Sequence 2 is cut off..... Any help/solutions?
void read_in_Protein(string Protein_filename)
{ // read in the sequences
fstream myfile;
myfile.open(Protein_filename, ios::in);
if (!myfile.is_open()) {
cerr << "Error can not open file" << endl;
exit(1);
}
string Protein_Sequences{};
string Protein_Seq_names{};
// string temp{};
string Prot_Seq1{};
string Prot_Seq2{};
string line{};
while (getline(myfile, line).good()) {
//std::cout << "Line input received (" << line.length() << "): " << line << std::endl;
if (line.empty() || line[0] == '>') { // Identifier marker
if (!Protein_Seq_names.empty()) { // Print out what we read from the last entry
//std::cout << "\tReseting to new sequence" << std::endl;
// cout << Protein_Sequences << endl;
Protein_Seq_names.clear();
Prot_Seq1 = Protein_Sequences;
}
if (!line.empty()) {
//std::cout << "\tSetting sequence start" << std::endl;
Protein_Seq_names = line.substr(1);
}
// std::cout << "\tClearing sequences..." << std::endl;
Protein_Sequences.clear();
}
else if (!Protein_Seq_names.empty()) {
line = line.substr(0, line.length() - 1);
if (line.find(' ') != string::npos) { // Invalid sequence--no spaces allowed
//std::cout << "\tSpace found, clearing buffers..." << std::endl;
Protein_Seq_names.clear();
Protein_Sequences.clear();
}
else {
//std::cout << "\tAppending line to protein sequence..." << std::endl;
Protein_Sequences += line;
}
}
//std::cout << "Protein_Sequences: " << Protein_Sequences << std::endl;
}
if (!Protein_Seq_names.empty()) { // Print out what we read from the last entry
// cout << Protein_Sequences << endl;
Prot_Seq2 = Protein_Sequences;
}
cout << "\nSequence 1: " << Prot_Seq1 << endl;
cout << Prot_Seq1.length();
cout << "\nSequence 2: " << Prot_Seq2 << endl;
cout << Prot_Seq2.length();
}
Assuming your file doesn't end with a new line then the last call to std::getline will set the eof bit to indicate that it reached the end of the file before finding the line ending. As you are checking .good() in your while loop the last line will be discarded. You should instead check !fail() (or just the boolean value of the stream itself which is equivalent to !fail()):
while (getline(myfile, line))
After reading the final line the next iteration of the loop will try to read whilst the stream is in the eof state and immediately fail and break out of the loop.

fstream stops to read at substitute control character

I'm writing a simple encryption program in C++ to encrypt a text-based file.
It's using a simple XOR cipher algorithm, but this produces ASCII control characters in the output file. When I try to read from the newly encrypted file with std::ifstream, it stumbles upon character #26, it stops and becomes unable to read the rest of the file.
Example if I try to encrypt this text:
This is just a simple sample
text with two rows and one sentence.
It turns it to this
/[[[[[
[[[ [[[U
When I try to read that file in my program, it can't read past the character at position 15, so I get a half encrypted file.
How can I fix this?
Here's the code:
#include <iostream>
#include <Windows.h>
#include <string>
#include <fstream>
void Encrypt(char encryptionKey, std::string filename)
{
std::ifstream sourceFile(filename);
std::ofstream outputFile(filename.substr(0, filename.find_last_of("\\")) + "\\Encrypted" + filename.substr(filename.find_last_of("\\") + 1), std::ofstream::out | std::ofstream::trunc);
std::string sourceLine;
std::string outputLine;
long numLines = 0;
if (sourceFile.is_open())
{
std::cout << "Opening file: " + filename + " for encryption" << std::endl;
while (sourceFile.good()) // This iterates over the whole file, once for each line
{
sourceLine = ""; //Clearing the line for each new line
outputLine = ""; //Clearing the line for each new line
std::getline(sourceFile, sourceLine);
for (int i = 0; i < sourceLine.length(); i++) // Looping through all characters in each line
{
char focusByte = sourceLine[i] ^ encryptionKey;
std::cout << " focusByte: " << focusByte << std::endl;
outputLine.push_back(focusByte);
//std::cout << sourceLine << std::flush;
}
numLines++;
outputFile << outputLine << std::endl;
}
}
sourceFile.close();
outputFile.close();
}
void Decrypt(unsigned int encryptionKey, std::string filename)
{
std::ifstream sourceFile(filename);
std::ofstream outputFile(filename.substr(0, filename.find_last_of("\\")) + "\\Decrypted" + filename.substr(filename.find_last_of("\\") + 1), std::ofstream::out | std::ofstream::trunc);
std::string sourceLine;
std::string outputLine;
long numLines = 0;
if (sourceFile.is_open())
{
std::cout << "Opening file: " + filename + " for decryption" << std::endl;
while (sourceFile.good()) // This iterates over the whole file, once for each line
{
if (sourceFile.fail() == true)
std::cout << "eof" << std::endl;
sourceLine = ""; //Clearing the line for each new line
outputLine = ""; //Clearing the line for each new line
std::getline(sourceFile, sourceLine);
for (int i = 0; i < sourceLine.length(); i++) // Looping through all characters in each line
{
char focusByte = sourceLine[i] ^ encryptionKey;
std::cout << " focusByte: " << focusByte << std::endl;
outputLine.push_back(focusByte);
}
numLines++;
outputFile << outputLine << std::endl;
}
}
sourceFile.close();
outputFile.close();
}
int main(int argument_count,
char * argument_list[])
{
system("color a");
std::string filename;
if (argument_count < 2)
{
std::cout << "You didn't supply a filename" << std::endl;
}
else
{
filename = argument_list[1];
std::cout << "Target file: " << filename << std::endl;
std::cout << "Press e to encrypt the selected file, Press d to decrypt the file > " << std::flush;
char choice;
while (true)
{
std::cin >> choice;
if (choice == 'e')
{
Encrypt(123, filename);
break;
}
else if (choice == 'd')
{
Decrypt(123, filename);
break;
}
else
{
std::cout << "please choose option e or d for encryption respectivly decryption" << std::endl;
}
}
}
std::cout << "\nPaused, press Enter to continue > " << std::flush;
system("Pause");
return EXIT_SUCCESS;
}
In Decrypt(), after the first call to std::getline(), sourceFile.good() is false and sourceFile.fail() is true, which is why you stop reading subsequent lines from the encrypted file.
The reason is because the encrypted file has an encoded 0x1A byte in it, and depending on your platform and STL implementation, that character likely gets interpreted as an EOF condition, thus enabling the std::ifstream's eofbit state, terminating further reading.
In my compiler's STL implementation on Windows, when std::ifstream reads from a file, it ultimately calls a function named _Fgetc():
template<> inline bool _Fgetc(char& _Byte, _Filet *_File)
{ // get a char element from a C stream
int _Meta;
if ((_Meta = fgetc(_File)) == EOF) // <-- here
return (false);
else
{ // got one, convert to char
_Byte = (char)_Meta;
return (true);
}
}
When it tries to read an 0x1A character, fgetc() returns EOF, and when _Fgetc() returns false, std::getline() sets the eofbit on the std::ifstream and exits.
Check your compiler's STL for similar behavior.
This behavior is because you are opening the encrypted file in text mode. You need to open the encrypted file in binary mode instead:
std::ifstream sourceFile(..., std::ifstream::binary);
Also, you should enable binary mode on the encrypted file in Encrypt() as well:
std::ofstream outputFile(..., std::ofstream::binary | std::ofstream::trunc);
Try something more like this instead:
#include <Windows.h>
#include <iostream>
#include <string>
#include <fstream>
#include <cstdlib>
void Encrypt(char encryptionKey, const std::string &filename)
{
std::string::size_type pos = filename.find_last_of("\\");
std::string out_filename = filename.substr(0, pos+1) + "Encrypted" + filename.substr(pos + 1);
std::ifstream sourceFile(filename.c_str());
std::ofstream outputFile(out_filename.c_str(), std::ofstream::binary | std::ofstream::trunc);
if (sourceFile.is_open())
{
std::cout << "Opened file: " + filename + " for encryption" << std::endl;
std::string line;
long numLines = 0;
while (std::getline(sourceFile, line)) // This iterates over the whole file, once for each line
{
for (std::string::size_type i = 0; i < line.length(); ++i) // Looping through all characters in each line
{
char focusByte = line[i] ^ encryptionKey;
std::cout << " focusByte: " << focusByte << std::endl;
line[i] = focusByte;
//std::cout << line << std::flush;
}
outputFile << line << std::endl;
++numLines;
}
}
}
void Decrypt(char encryptionKey, const std::string &filename)
{
std::string::size_type pos = filename.find_last_of("\\");
std::string out_filename = filename.substr(0, pos+1) + "Decrypted" + filename.substr(pos + 1);
std::ifstream sourceFile(filename.c_str(), std::ifstream::binary);
std::ofstream outputFile(out_filename.c_str(), std::ofstream::trunc);
if (sourceFile.is_open())
{
std::cout << "Opened file: " + filename + " for decryption" << std::endl;
std::string line;
long numLines = 0;
while (std::getline(sourceFile, line)) // This iterates over the whole file, once for each line
{
for (std::string::size_type i = 0; i < line.length(); ++i) // Looping through all characters in each line
{
char focusByte = line[i] ^ encryptionKey;
std::cout << " focusByte: " << focusByte << std::endl;
line[i] = focusByte;
}
outputFile << line << std::endl;
++numLines;
}
std::cout << "eof" << std::endl;
}
}
int main(int argument_count, char* argument_list[])
{
std::system("color a");
std::string filename;
if (argument_count < 2)
{
std::cout << "Enter a file to process: " << std::flush;
std::getline(std::cin, filename);
}
else
{
filename = argument_list[1];
}
if (filename.empty())
{
std::cout << "You didn't supply a filename" << std::endl;
return EXIT_FAILURE;
}
std::cout << "Target file: " << filename << std::endl;
std::cout << "Press e to encrypt the file" << std::endl;
std::cout << "Press d to decrypt the file" << std::endl;
char choice;
while (true)
{
std::cout << "> " << std::flush;
std::cin >> choice;
if (choice == 'e')
{
Encrypt(123, filename);
break;
}
else if (choice == 'd')
{
Decrypt(123, filename);
break;
}
else
{
std::cout << "please choose option e or d for encryption or decryption, respectively" << std::endl;
}
}
std::cout << std::endl << "Paused, press Enter to continue" << std::flush;
std::system("pause");
return EXIT_SUCCESS;
}
That being said, keep in mind that when using XOR, some of the encrypted characters might end up being \r (0x0D) or \n (0x0A), which will interfere with std::getline() when decrypting the file later on, producing a decrypted output that does not match the original text input.
Since you should be treating the encrypted file as binary, you should not be reading/writing the file as text at all. Choose a different format for your encrypted output that does not rely on line-break semantics in text vs binary mode.
For example:
#include <Windows.h>
#include <iostream>
#include <string>
#include <fstream>
#include <cstdlib>
void Encrypt(char encryptionKey, const std::string &filename)
{
std::string::size_type pos = filename.find_last_of("\\");
std::string out_filename = filename.substr(0, pos+1) + "Encrypted" + filename.substr(pos + 1);
std::ifstream sourceFile(filename.c_str());
std::ofstream outputFile(out_filename.c_str(), std::ofstream::binary | std::ofstream::trunc);
if (sourceFile.is_open())
{
std::cout << "Opened file: " + filename + " for encryption" << std::endl;
std::string line;
std::string::size_type lineLen;
long numLines = 0;
while (std::getline(sourceFile, line)) // This iterates over the whole file, once for each line
{
lineLen = line.length();
for (std::string::size_type i = 0; i < lineLen; ++i) // Looping through all characters in each line
{
char focusByte = line[i] ^ encryptionKey;
std::cout << " focusByte: " << focusByte << std::endl;
line[i] = focusByte;
//std::cout << line << std::flush;
}
outputFile.write((char*)&lineLen, sizeof(lineLen));
outputFile.write(line.c_str(), lineLen);
++numLines;
}
}
}
void Decrypt(char encryptionKey, const std::string &filename)
{
std::string::size_type pos = filename.find_last_of("\\");
std::string out_filename = filename.substr(0, pos+1) + "Decrypted" + filename.substr(pos + 1);
std::ifstream sourceFile(filename.c_str(), std::ifstream::binary);
std::ofstream outputFile(out_filename.c_str(), std::ofstream::trunc);
if (sourceFile.is_open())
{
std::cout << "Opened file: " + filename + " for decryption" << std::endl;
std::string line;
std::string::size_type lineLen;
long numLines = 0;
while (sourceFile.read((char*)&lineLen, sizeof(lineLen))) // This iterates over the whole file, once for each line
{
line.resize(lineLen);
if (!sourceFile.read(&line[0], lineLen))
break;
for (std::string::size_type i = 0; i < lineLen; ++i) // Looping through all characters in each line
{
char focusByte = line[i] ^ encryptionKey;
std::cout << " focusByte: " << focusByte << std::endl;
line[i] = focusByte;
}
outputFile << line << std::endl;
++numLines;
}
std::cout << "eof" << std::endl;
}
}
int main(int argument_count, char* argument_list[])
{
std::system("color a");
std::string filename;
if (argument_count < 2)
{
std::cout << "Enter a file to process: " << std::flush;
std::getline(std::cin, filename);
}
else
{
filename = argument_list[1];
}
if (filename.empty())
{
std::cout << "You didn't supply a filename" << std::endl;
return EXIT_FAILURE;
}
std::cout << "Target file: " << filename << std::endl;
std::cout << "Press e to encrypt the file" << std::endl;
std::cout << "Press d to decrypt the file" << std::endl;
char choice;
while (true)
{
std::cout << "> " << std::flush;
std::cin >> choice;
if (choice == 'e')
{
Encrypt(123, filename);
break;
}
else if (choice == 'd')
{
Decrypt(123, filename);
break;
}
else
{
std::cout << "please choose option e or d for encryption or decryption, respectively" << std::endl;
}
}
std::cout << std::endl << "Paused, press Enter to continue" << std::flush;
std::system("pause");
return EXIT_SUCCESS;
}
ASCII value 26 is EOF on some operating systems.
You should probably treat your encrypted file as a byte stream rather than a text file for reading and writing. That means either using read() and write() functions of the IOStream or at the very least opening the files in binary mode.
If you're just enciphering your text instead of encrypting, maybe choose a different cipher (eg. ROT13) that is closed on the set of printable ASCII or UTF-8 characters.
I compiled your code in Linux (minus all the Windows stuff)...
I get this when encrypting your sentence with your code:
/[[[[[
[[[ [[[U
It also decrypts back to the original sentence. Without the goofy characters, it is the same as your output so your actual issue seems related to the encoding of the file and the program you are using to view the results. Stephan is correct in saying you should be reading/writing bytes instead of text. This can cause all sorts of issues with the characters you create. For example, line feeds and carriage returns since you are using getline().
Edit: Strange. After editing this answer, all the odd characters disappeared. Here is a screenshot:

Where can I use OpenMP in my C++ code

I am writing a C++ code to calculate the code coverage and I want to used the OpenMP to help enhance my code by minimizing the overall run time by making the functions work in parallel so I can get less run time.
Can someone please tell me how and where to use the OpenMP?
int _tmain(int argc, _TCHAR* argv[])
{
std::clock_t start;
start = std::clock();
char inputFilename[] = "Test-Case-3.cs"; // Test Case File
char outputFilename[] = "Result.txt"; // Result File
int totalNumberOfLines = 0;
int numberOfBranches = 0;
int statementsCovered = 0;
float statementCoveragePercentage = 0;
double overallRuntime = 0;
ifstream inFile; // object for reading from a file
ofstream outFile; // object for writing to a file
inFile.open(inputFilename, ios::in);
if (!inFile) {
cerr << "Can't open input file " << inputFilename << endl;
exit(1);
}
totalNumberOfLines = NoOfLines(inFile);
inFile.clear(); // reset
inFile.seekg(0, ios::beg);
numberOfBranches = NoOfBranches(inFile);
inFile.close();
statementsCovered = totalNumberOfLines - numberOfBranches;
statementCoveragePercentage = (float)statementsCovered * 100/ totalNumberOfLines;
outFile.open(outputFilename, ios::out);
if (!outFile) {
cerr << "Can't open output file " << outputFilename << endl;
exit(1);
}
outFile << "Total Number of Lines" << " : " << totalNumberOfLines << endl;
outFile << "Number of Branches" << " : " << numberOfBranches << endl;
outFile << "Statements Covered" << " : " << statementsCovered << endl;
outFile << "Statement Coverage Percentage" << " : " << statementCoveragePercentage <<"%"<< endl;
overallRuntime = (std::clock() - start) / (double)CLOCKS_PER_SEC;
outFile << "Overall Runtime" << " : " << overallRuntime << " Seconds"<< endl;
outFile.close();
}
i want to minimize the time taken to count the number of branches by allowing multiple threads to work in parallel to calculate the number faster? how can i edit the code so that i can use the open mp and here you can find my functions:bool is_only_ascii_whitespace(const std::string& str)
{
auto it = str.begin();
do {
if (it == str.end()) return true;
} while (*it >= 0 && *it <= 0x7f && std::isspace(*(it++)));
// one of these conditions will be optimized away by the compiler,
// which one depends on whether char is signed or not
return false;
}
// Function 1
int NoOfLines(ifstream& inFile)
{
//char line[1000];
string line;
int lines = 0;
while (!inFile.eof()) {
getline(inFile, line);
//cout << line << endl;
if ((line.find("//") == std::string::npos)) // Remove Comments
{
if (!is_only_ascii_whitespace(line)) // Remove Blank
{
lines++;
}
}
//cout << line << "~" <<endl;
}
return lines;
}
// Function 2
int NoOfBranches(ifstream& inFile)
{
//char line[1000];
string line;
int branches = 0;
while (!inFile.eof()) {
getline(inFile, line);
if ((line.find("if") != std::string::npos) || (line.find("else") != std::string::npos))
{
branches++;
}
}
return branches;
}

FASTA reader written in C++?

Let me start off by stating that I am a beginner in C++. Anyways, the FASTA format goes as follows:
Any line starting with a '>' indicates the name/id of the gene sequence right below it. There is a gene sequence right below the id. This gene sequence can be 1 or multiple lines.
So... what I want to do is print: id << " : " << gene_sequence << endl;
Here is my code:
#include <iostream>
#include <fstream>
int main(int argc, char **argv) {
if (argc < 2) {
std::cerr << " Wrong format: " << argv[0] << " [infile] " << std::endl;
return -1;
}
std::ifstream input(argv[1]);
if (!input.good()) {
std::cerr << "Error opening: " << argv[1] << " . You have failed." << std::endl;
return -1;
}
std::string line, id, DNA_sequence;
while (std::getline(input, line).good()) {
if (line[0] == '>') {
id = line.substr(1);
std::cout << id << " : " << DNA_sequence << std::endl;
DNA_sequence.clear();
}
else if (line[0] != '>'){
DNA_sequence += line;
}
}
}
For the second argument inputted into the command line, here is the content of my file:
>DNA_1
GATTACA
>DNA_2
TAGACCA
TAGACCA
>DNA_3
ATAC
>DNA_4
AT
Please copy and paste into text file.
After this has been done, and the code has been executed, I want to point out the problem. The code skips inputting the sequence of DNA_1 into its correct respective place, and instead placing DNA_1 's sequence into DNA_2. The results get pushed forward 1 as a result. Any assistance or tips would be greatly appreciated?
As I've said before, I am new to C++. And the semantics are quite hard to learn compared to Python.
I see a few problems with your code.
First you loop on std::ifstream::good() which doesn't work because it won't allow for End Of File (which happens even after a good read).
Then you access line[0] without checking if the line is empty which could cause a seg-fault.
Next you output the "previous line" before you have even collected it.
Finally you don't output the final line because the loop terminates when it doesn't find another >.
I added comments to my corrections to your code:
#include <iostream>
#include <fstream>
int main(int argc, char **argv) {
if (argc < 2) {
std::cerr << " Wrong format: " << argv[0] << " [infile] " << std::endl;
return -1;
}
std::ifstream input(argv[1]);
if (!input.good()) {
std::cerr << "Error opening: " << argv[1] << " . You have failed." << std::endl;
return -1;
}
std::string line, id, DNA_sequence;
// Don't loop on good(), it doesn't allow for EOF!!
// while (std::getline(input, line).good()) {
while (std::getline(input, line)) {
// line may be empty so you *must* ignore blank lines
// or you have a crash waiting to happen with line[0]
if(line.empty())
continue;
if (line[0] == '>') {
// output previous line before overwriting id
// but ONLY if id actually contains something
if(!id.empty())
std::cout << id << " : " << DNA_sequence << std::endl;
id = line.substr(1);
DNA_sequence.clear();
}
else {// if (line[0] != '>'){ // not needed because implicit
DNA_sequence += line;
}
}
// output final entry
// but ONLY if id actually contains something
if(!id.empty())
std::cout << id << " : " << DNA_sequence << std::endl;
}
Output:
DNA_1 : GATTACA
DNA_2 : TAGACCATAGACCA
DNA_3 : ATAC
DNA_4 : AT
You're storing the new id before printing the old one:
id = line.substr(1);
std::cout << id << " : " << DNA_sequence << std::endl;
Swap the lines around for proper order. You probably also want to check if you have any id already present to skip the first entry.
working implementation is here
https://rosettacode.org/wiki/FASTA_format#C.2B.2B
only corrected
while( std::getline( input, line ).good() ){
to
while( std::getline( input, line ) ){
Code
#include <iostream>
#include <fstream>
int main( int argc, char **argv ){
if( argc <= 1 ){
std::cerr << "Usage: "<<argv[0]<<" [infile]" << std::endl;
return -1;
}
std::ifstream input(argv[1]);
if(!input.good()){
std::cerr << "Error opening '"<<argv[1]<<"'. Bailing out." << std::endl;
return -1;
}
std::string line, name, content;
while( std::getline( input, line ) ){
if( line.empty() || line[0] == '>' ){ // Identifier marker
if( !name.empty() ){ // Print out what we read from the last entry
std::cout << name << " : " << content << std::endl;
name.clear();
}
if( !line.empty() ){
name = line.substr(1);
}
content.clear();
} else if( !name.empty() ){
if( line.find(' ') != std::string::npos ){ // Invalid sequence--no spaces allowed
name.clear();
content.clear();
} else {
content += line;
}
}
}
if( !name.empty() ){ // Print out what we read from the last entry
std::cout << name << " : " << content << std::endl;
}
return 0;
}
input:
>Rosetta_Example_1
THERECANBENOSPACE
>Rosetta_Example_2
THERECANBESEVERAL
LINESBUTTHEYALLMUST
BECONCATENATED
output:
Rosetta_Example_1 : THERECANBENOSPACE
Rosetta_Example_2 : THERECANBESEVERALLINESBUTTHEYALLMUSTBECONCATENATED

Reading from a file C++ Ifstream

I am trying to do a problem using dynamic programming to find which items to select from a text file under a certain weight capacity while maximizing value. The file is in the format:
item1, weight, value
item2, weight, value
With a variable number of items. I am having a problem reading the file in the main method, where I am trying to error check for bad input. My problem comes from when I check if the weight is either missing, or is a string instead of an int. I can't figure out how to check separately for a string or if there is nothing in that spot. I am supposed to output a different error depending on which it is. Thanks.
int main(int argc, char * const argv[])
{
std::vector<Item> items;
if (argc != 3)
{
std::cerr << "Usage: ./knapsack <capacity> <filename>";
return -1;
}
int capacity;
std::stringstream iss(argv[1]);
if (!(iss >> capacity) || (capacity < 0))
{
std::cerr << "Error: Bad value '" << argv[1] << "' for capacity." << std::endl;
return -1;
}
std::ifstream ifs(argv[2]);
if (ifs.is_open())
{
std::string description;
unsigned int weight;
unsigned int value;
int line = 1;
while (!ifs.eof())
{
if (!(ifs >> description))
{
std::cerr << "Error: Line number " << line << " does not contain 3 fields." << std::endl;
return -1;
}
if (!(ifs >> weight))
{
if (ifs.eof())
std::cerr << "Error: Line number " << line << " does not contain 3 fields." << std::endl;
else
std::cerr << "Error: Invalid weight '" << ifs << "' on line number " << line << "." << std::endl;
return -1;
}
else if (!(ifs >> weight) || (weight < 0))
{
std::cerr << "Error: Invalid weight '" << ifs << "' on line number " << line << "." << std::endl;
return -1;
}
if (!(ifs >> value))
{
if (ifs.eof())
std::cerr << "Error: Line number " << line << " does not contain 3 fields." << std::endl;
else
std::cerr << "Error: Invalid value '" << ifs << "' on line number " << line << "." << std::endl;
return -1;
}
else if (!(ifs >> value) || (value < 0))
{
std::cerr << "Error: Invalid value '" << ifs << "' on line number " << line << "." << std::endl;
return -1;
}
Item item = Item(line, weight, value, description);
items.push_back(item);
line++;
}
ifs.close();
knapsack(capacity, items, items.size());
}
else
{
std::cerr << "Error: Cannot open file '" << argv[2] << "'." << std::endl;
return -1;
}
return 0;
}
I think better approach will be using getline function, by which you will take a whole line and then you can try to split it into three strings. If three parts are not found or any of the three parts is not in a right format you can output an error.
An example code is give below,
#include<fstream>
#include<iostream>
#include<vector>
#include <sstream>
#include<string>
using namespace std;
void split(const std::string &s, std::vector<std::string> &elems, char delim) {
elems.clear();
std::stringstream ss(s);
std::string item;
while (std::getline(ss, item, delim)) {
elems.push_back(item);
}
}
int main()
{
vector<string> elems;
ifstream fi("YOUR\\PATH");
string inp;
while(getline(fi, inp))
{
split(inp, elems, ' ');
if(elems.size() != 3)
{
cout<<"error\n";
continue;
}
int value;
bool isint = istringstream(elems[1])>>value;
if(!isint)
{
cout << "error\n";
continue;
}
cout<<"value was : " << value << endl;
}
return 0;
}
You could read the whole line at once and check it's format using regular expressions.