C++ Load text file, optimization [closed] - c++

Closed. This question is off-topic. It is not currently accepting answers.
Closed 11 years ago.
Here is my source code, which loads a text file and splits each line into individual items (words).
How can I optimize this code further? In my opinion, the test for empty lines (and some other constructs) is a little inefficient.
#include <cctype>
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

const int BUFF = 1024; // buffer size; not given in the original, assumed here

typedef std::vector < std::string > TLines;
typedef std::vector < std::vector < std::string > > TItems;
TItems TFloadFile ( const char * file_name )
{
    //Load projection from file
    unsigned int lines = 0;
    char buffer[BUFF];
    FILE * file;
    TItems file_words;
    TLines file_lines;
    file = fopen ( file_name, "r" );
    if ( file != NULL )
    {
        for ( ; fgets ( buffer, BUFF, file ); )
        {
            //Remove empty lines
            bool empty_line = true;
            for ( unsigned i = 0; i < strlen ( buffer ); i++ )
            {
                if ( !isspace ( ( unsigned char ) buffer[i] ) )
                {
                    empty_line = false;
                    break;
                }
            }
            if ( !empty_line )
            {
                file_lines.push_back ( buffer );
                lines++;
            }
        }
        file_words.resize ( lines ); // one entry per non-empty line
        for ( unsigned int i = 0; i < lines; i++ )
        {
            //strtok modifies its argument, so pass the string's own writable buffer
            char * word = strtok ( &file_lines[i][0], " \t,;\r\n" );
            for ( ; word; word = strtok ( 0, " \t,;\r\n" ) ) // same delimiters every call
            {
                file_words[i].push_back ( word );
            }
        }
        fclose ( file );
    }
    return file_words;
}
Thanks for your help...

The line for ( unsigned i = 0; i < strlen ( buffer ); i++ ) is quite inefficient, because strlen() rescans the whole buffer on every iteration, so for a whitespace-only line the check takes time quadratic in the line length. However, it's possible that the compiler will optimise this away.
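A minimal sketch of not relying on the optimiser: compute the length once, before the loop. `is_blank_line` is a hypothetical helper name, not part of the original code:

```cpp
#include <cctype>
#include <cstring>

// Hypothetical helper: true if the C string contains only whitespace.
// strlen() is called once, so the scan is linear in the line length.
bool is_blank_line(const char* buffer)
{
    const std::size_t len = std::strlen(buffer);
    for (std::size_t i = 0; i < len; ++i)
    {
        // Cast to unsigned char: passing a negative char to isspace() is undefined.
        if (!std::isspace(static_cast<unsigned char>(buffer[i])))
            return false;
    }
    return true;
}
```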
You're pushing items onto your std::vectors without reserve()ing any space. For a large file, this adds overhead, because each vector has to reallocate and copy its contents several times as it grows. I just read @Notinlist's answer, which already talks about the inefficiencies of std::vector::resize().
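Where the final count is known, reserve() can allocate once up front. In the question's code the number of non-empty lines is known after the first pass, so the outer vector of words can be reserved. A sketch assuming the lines have already been read into a vector; `split_lines` is a hypothetical name, and the delimiter set is the one from the question:

```cpp
#include <string>
#include <utility>
#include <vector>

typedef std::vector<std::string> TLines;
typedef std::vector<std::vector<std::string> > TItems;

// Hypothetical helper: split already-read lines into words.
TItems split_lines(const TLines& lines)
{
    const std::string delims = " \t,;\r\n";
    TItems items;
    items.reserve(lines.size());            // one allocation for the outer vector
    for (const std::string& line : lines)
    {
        std::vector<std::string> words;
        std::string::size_type start = line.find_first_not_of(delims);
        while (start != std::string::npos)
        {
            std::string::size_type end = line.find_first_of(delims, start);
            words.push_back(line.substr(start, end - start)); // substr clamps npos
            start = line.find_first_not_of(delims, end);
        }
        items.push_back(std::move(words));  // move the inner vector, don't copy it
    }
    return items;
}
```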
Instead of reading each line into a vector through repeated fgets() calls, could you not simply determine the number of bytes in the file, dynamically allocate a char array to hold them, and then read all the bytes into it in one go? Then you could parse the words and store them in file_words. This is likely to be more efficient than the method you're currently using.
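A sketch of that idea in standard C++, assuming the same delimiters as the question and that lines without any tokens are skipped; `load_words` is a hypothetical name. It sizes the buffer with seekg/tellg, reads the file with a single read() call, and then splits the buffer in memory without strtok:

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

typedef std::vector<std::vector<std::string> > TItems;

// Hypothetical replacement for TFloadFile: one read, then in-memory parsing.
TItems load_words(const char* file_name)
{
    TItems items;
    std::ifstream file(file_name, std::ios::binary);
    if (!file)
        return items;

    // Determine the file size and read everything in one call.
    file.seekg(0, std::ios::end);
    const std::streamoff size = file.tellg();
    if (size < 0)
        return items;
    file.seekg(0, std::ios::beg);
    std::string data(static_cast<std::size_t>(size), '\0');
    if (size > 0)
        file.read(&data[0], size);

    const std::string delims = " \t,;\r";         // word separators within a line
    const std::string line_delims = delims + "\n"; // also stop at end of line
    std::string::size_type line_start = 0;
    while (line_start < data.size())
    {
        std::string::size_type line_end = data.find('\n', line_start);
        if (line_end == std::string::npos)
            line_end = data.size();

        std::vector<std::string> words;
        std::string::size_type pos = data.find_first_not_of(delims, line_start);
        while (pos < line_end)
        {
            std::string::size_type end = data.find_first_of(line_delims, pos);
            if (end == std::string::npos || end > line_end)
                end = line_end;
            words.push_back(data.substr(pos, end - pos));
            pos = data.find_first_not_of(delims, end);
        }
        if (!words.empty())           // blank lines produce no entry
            items.push_back(words);
        line_start = line_end + 1;
    }
    return items;
}
```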

Before optimizing, can you explain how big the file is, how long the code currently takes to execute, and why you think it isn't already IO bound (i.e. limited by hard disk speed)? How long do you think it should take? Some idea of the type of data in the file would be good too (such as average line length, average proportion of empty lines, etc.).
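To answer "how long does it take", a wall-clock measurement with <chrono> is usually enough. A minimal sketch; `time_ms` is a hypothetical helper, and you would pass it a lambda that calls TFloadFile on a representative file:

```cpp
#include <chrono>

// Hypothetical helper: run f once and return the elapsed wall-clock time in ms.
template <typename F>
double time_ms(F f)
{
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

For example, time_ms([] { TFloadFile("data.txt"); }) (the file name is a placeholder). Comparing that with the time to merely read the same file's bytes gives a rough idea of whether the code is IO bound.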
That said, combine the remove-empty-line loop with the word-tokenising loop. Then you can remove TLines altogether and avoid the std::string constructions and vector push-back. I haven't checked this code works, but it should be close enough to give you the idea. It also includes a more efficient empty line spotter:
if ( file != NULL )
{
    for ( ; fgets ( buffer, BUFF, file ); )
    {
        bool is_empty = true;
        for (char *c = buffer; *c != '\0'; c++)
        {
            if (!isspace((unsigned char)*c))   // test the character, not the pointer
            {
                is_empty = false;
                break;
            }
        }
        if (is_empty)
            continue;
        file_words.resize ( lines + 1 );
        char * word = strtok ( buffer, " \t,;\r\n" );
        for ( ; word; word = strtok ( 0, " \t,;\r\n" ) )
        {
            file_words[lines].push_back ( word );   // the entry just added
        }
        lines++;
    }
    fclose ( file );
}

For one thing:
file_lines.push_back ( buffer );
That is a very expensive line. If you don't need a vector, use a std::list instead, and convert the list to a vector once you have finished, if necessary.
If you absolutely need a vector for this purpose, grow its capacity exponentially yourself, for example:
if(file_lines.capacity()<=lines){
    file_lines.reserve((size_t)(lines * 1.3 + 1));
}
file_lines.push_back ( buffer );
That way you will have far fewer CPU-intensive reallocations, at the cost of a small memory overhead.
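For comparison: the C++ standard requires std::vector::push_back to run in amortized constant time, so library implementations already grow the capacity geometrically (the growth factor differs between libraries). If you can estimate the number of lines, a single reserve() skips the intermediate reallocations entirely. A small sketch that counts reallocations; `count_reallocations` is a hypothetical name:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: push n strings and count how often the storage was reallocated.
// If expected >= n, reserve() guarantees that no reallocation happens in the loop.
std::size_t count_reallocations(std::size_t n, std::size_t expected)
{
    std::vector<std::string> lines;
    lines.reserve(expected);
    std::size_t reallocations = 0;
    std::size_t capacity = lines.capacity();
    for (std::size_t i = 0; i < n; ++i)
    {
        lines.push_back("line");
        if (lines.capacity() != capacity)   // capacity grew: storage was reallocated
        {
            ++reallocations;
            capacity = lines.capacity();
        }
    }
    return reallocations;
}
```

The exact count without reserve() depends on the library, so none is given here.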

Here is a simplified version, converted to use std::list instead of std::vector:
#include <cstring>
#include <fstream>
#include <list>
#include <string>

typedef std::list< std::list< std::string > > TItems;
TItems TFloadFile ( const char * file_name )
{
    using namespace std;
    //Load projection from file
    ifstream file(file_name);
    TItems file_words;
    string line;
    while (getline(file, line))   // also keeps a last line without a trailing newline
    {
        if (line.empty())
            continue;
        file_words.push_back(list<string>());
        list<string> &words(file_words.back());
        // strtok modifies its argument, so pass the string's own writable buffer
        char *word = strtok(&line[0], " \t,;\r\n" );
        for(; word; word=strtok( 0, " \t,;\r\n" ))
        {
            words.push_back( word );
        }
        if(words.empty())
            file_words.pop_back();
    }
    return file_words;
}

Related

How do you read n bytes from a file and put them into a vector<uint8_t> using iterators?

Based on this question:
How to read a binary file into a vector of unsigned chars
In the answer they have:
std::vector<BYTE> readFile(const char* filename)
{
// open the file:
std::basic_ifstream<BYTE> file(filename, std::ios::binary);
// read the data:
return std::vector<BYTE>((std::istreambuf_iterator<BYTE>(file)),
std::istreambuf_iterator<BYTE>());
}
Which reads the entire file into the vector.
What I want to do is read (for example) 100 bytes at a time into the vector, do stuff, and then read the next 100 bytes into the vector (clearing the vector in between). I don't see how to specify how much of the file to read (i.e. how to set up the iterators). Is that even possible?
I am trying to avoid having to write my own code loop to copy each byte at a time.
You can use ifstream::read for that.
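std::vector<BYTE> v(100);
// file should be opened in binary mode, e.g. std::ifstream file(filename, std::ios::binary);
// read() fails on the last, shorter chunk, so also check gcount() to process it
while ( file.read(reinterpret_cast<char*>(v.data()), v.size()) || file.gcount() > 0 )
{
    // Find out how many characters were actually read.
    auto count = file.gcount();
    // Use v up to count BYTEs.
}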
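You could wrap that in a function:
#include <fstream>
#include <functional>
#include <vector>
void readFile( const std::string &fileName, size_t chunk, std::function<void(const std::vector<BYTE>&)> proc )
{
    std::ifstream f( fileName, std::ios::binary );
    std::vector<BYTE> v(chunk);
    // read() fails on the last, shorter chunk, so also check gcount()
    while( f.read( reinterpret_cast<char*>( v.data() ), v.size() ) || f.gcount() > 0 ) {
        v.resize( f.gcount() );
        proc( v );
        v.resize( chunk );
    }
}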
then usage is simple:
void process( const std::vector<BYTE> &v ) { ... }
readFile( "foobar", 100, process ); // call process for every 100 bytes of data
Or you can use a lambda (or any other callable) as the callback.
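A self-contained version of the same idea that can be compiled and tested on its own. It assumes `BYTE` is `unsigned char` (as in the Windows headers) and uses the hypothetical name `for_each_chunk`; like the corrected loop, it also delivers the final, shorter chunk:

```cpp
#include <cstddef>
#include <fstream>
#include <functional>
#include <string>
#include <vector>

typedef unsigned char BYTE; // assumption: BYTE as defined in the Windows headers

// Hypothetical helper: call proc for every chunk of up to `chunk` bytes.
// Returns the number of chunks delivered.
std::size_t for_each_chunk(const std::string& fileName, std::size_t chunk,
                           const std::function<void(const std::vector<BYTE>&)>& proc)
{
    std::ifstream f(fileName, std::ios::binary);
    std::vector<BYTE> v(chunk);
    std::size_t chunks = 0;
    while (f)
    {
        f.read(reinterpret_cast<char*>(v.data()), static_cast<std::streamsize>(chunk));
        const std::streamsize got = f.gcount();   // fewer than chunk at end of file
        if (got <= 0)
            break;
        v.resize(static_cast<std::size_t>(got));
        proc(v);
        v.resize(chunk);
        ++chunks;
    }
    return chunks;
}
```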
Or you can write your own function for that:
template<typename Data>
std::istreambuf_iterator<char> readChunk(std::istreambuf_iterator<char>& curr, std::vector<Data>& vec, size_t chunk = 100) {
    for (size_t i = 0; curr != std::istreambuf_iterator<char>() && i < chunk; ++i, ++curr) {
        vec.emplace_back(static_cast<Data>(*curr));
    }
    return curr;
}
and use it as:
std::ifstream file("test.cpp", std::ios::binary);
std::vector<BYTE> v;
std::istreambuf_iterator<char> curr(file); // iterator char type must match the stream's
readChunk<BYTE>(curr, v);
And you can call this function again (after clearing v) to read the next chunk.

reading csv file for specific information

I am wondering how to read a specific value from a csv file in C++, and then read the next four items in the file. For example, this is what the file would look like:
fire,2.11,2,445,7891.22,water,234,332.11,355,5654.44,air,4535,122,334.222,16,earth,453,46,77.3,454
What I want to do is let my user select one of the values, let's say "air", and also read the next four items (4535 122 334.222 16).
I only want to use the fstream, iostream and iomanip libraries. I am a newbie and I am horrible at writing code, so please be gentle.
You should read about parsers and the full CSV specification (RFC 4180).
If your fields are free of commas and double quotes, and you need a quick solution, search for getline/strtok, or try this (not compiled/tested):
#include <fstream>
#include <string>
#include <vector>
typedef std::vector< std::string > svector;
bool get_line( std::istream& is, svector& d, const char sep = ',' )
{
    d.clear();
    if ( ! is )
        return false;
    char c;
    std::string s;
    while ( is.get(c) && c != '\n' )
    {
        if ( c == sep )
        {
            d.push_back( s );
            s.clear();
        }
        else if ( c != '\r' )   // ignore Windows line endings
        {
            s += c;
        }
    }
    if ( ! s.empty() )
        d.push_back( s );
    return ! d.empty();   // true if this line produced at least one field
}
int main()
{
std::ifstream is( "test.txt" );
if ( ! is )
return -1;
svector line;
while ( get_line( is, line ) )
{
//...
}
return 0;
}
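For the concrete task in the question (find "air" and read the four values after it), std::getline with ',' as the delimiter may be enough. A sketch under these assumptions: fields contain no quoted commas, and the data is one comma-separated sequence as in the example (a field that spans a line break is not split). `values_after` and `values_after_in_file` are hypothetical names, and the code uses <string> and <vector>, which go slightly beyond the libraries listed in the question:

```cpp
#include <cstddef>
#include <fstream>
#include <istream>
#include <string>
#include <vector>

// Strip surrounding whitespace, including the '\r' / '\n' that end a line.
static std::string trim(const std::string& s)
{
    const char* ws = " \t\r\n";
    const std::string::size_type b = s.find_first_not_of(ws);
    if (b == std::string::npos)
        return "";
    const std::string::size_type e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Hypothetical helper: read comma-separated fields until one equals `key`,
// then return the next `count` fields (fewer if the input ends first).
// An empty result means the key was not found.
std::vector<std::string> values_after(std::istream& is, const std::string& key,
                                      std::size_t count = 4)
{
    std::vector<std::string> result;
    std::string field;
    while (std::getline(is, field, ','))
    {
        if (trim(field) == key)
        {
            while (result.size() < count && std::getline(is, field, ','))
                result.push_back(trim(field));
            break;
        }
    }
    return result;
}

// Convenience wrapper for a file on disk.
std::vector<std::string> values_after_in_file(const char* path, const std::string& key)
{
    std::ifstream is(path);
    return values_after(is, key);
}
```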

string usage with textfiles c/c++

I have a problem using strings. I had the idea of writing a program that multiplies two parenthesized expressions, since I had some with 10 variables each. I put one parenthesized expression in a .txt file and wanted to read it and simply print it into another .txt file. I am not sure whether the problem is caused by the specific symbols.
Here is the text file I read:
2*x_P*x_N - x_P^2 + d_P - 2*x_N*x_Q + x_Q^2 - d_Q
and here is what it actually prints:
2*x_--x_P^++d_P-2*x_++x_Q^--
As you can see, it is completely wrong. In addition, I get an error after execution, but it still writes the output to the .txt file. Here is my code:
#include <stdio.h>
#include <string>
using namespace std;
int main()
{
int i;
const int size = 11;
string array[ size ];
FILE * file_read;
file_read = fopen( "alt.txt", "r" );
for( i = 0; i < size; i++ ) //Read
{
fscanf( file_read, "%s", &array[ i ] );
}
fclose( file_read );
FILE * file_write;
file_write = fopen( "neu.txt", "w" );
for( i = 0; i < size; i++ ) //Write
{
fprintf( file_write, "%s", &array[ i ] );
}
fclose( file_write ); printf("test");
return 1;
}
Thanks for any suggestions. Suggestions using iostream are welcome as well.
You are mixing C++ and C forms of file input:
When you write:
fscanf( file_read, "%s", &array[ i ] );
the C standard library expects a pointer to a buffer in which the string read from the file will be stored as a C string, that is, a null-terminated array of characters.
Instead, you pass a pointer to a C++ std::string object, which results in undefined behaviour (most probably memory corruption). The same problem occurs when writing with fprintf( file_write, "%s", &array[ i ] ).
Solution 1
If you want to keep using the C standard library file i/o, you have to use an interim buffer:
char mystring[1024]; //for storing the C string
...
fscanf( file_read, "%1023s", mystring );
array[ i ] = string(mystring); // now make it a C++ string
Note that the format is slightly changed (%1023s) to avoid a buffer overflow if the file contains a string longer than the buffer.
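A self-contained version of Solution 1 for reading, assuming the same limit on the number of tokens; `read_tokens_c` is a hypothetical name. It also checks fscanf's return value, so it stops at end of file instead of storing stale buffer contents:

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: read up to max whitespace-separated tokens with fscanf,
// going through a C buffer and then converting each token to std::string.
int read_tokens_c(const char* path, std::string* out, int max)
{
    std::FILE* file_read = std::fopen(path, "r");
    if (file_read == NULL)
        return 0;
    char mystring[1024];   // C string buffer for fscanf
    int n = 0;
    while (n < max && std::fscanf(file_read, "%1023s", mystring) == 1)
        out[n++] = std::string(mystring);
    std::fclose(file_read);
    return n;
}
```

For the writing loop, the matching fix is fprintf( file_write, "%s ", array[ i ].c_str() ), which passes the C string that %s expects.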
Solution 2
If you are learning C++ (judging by your C++ tag and the string header), I'd strongly suggest that you have a look at fstream in the C++ library. It's designed to work very well with strings.
Here is how it could look:
#include <iostream>
#include <string>
#include <fstream>
using namespace std;
int main()
{
const int size = 11;
string array[ size ];
ifstream file_read( "alt.txt");
for(int i = 0; i < size && file_read >> array[ i ]; i++ ) //Read
;
file_read.close();
ofstream file_write("neu.txt");
for(int i = 0; i < size; i++ ) //Write
file_write << array[ i ] <<" "; // with space separator
file_write.close();
cout << "test"<<endl;
return 0;
}
The next thing you should consider would be to replace the classic arrays with vectors (you don't have to define their size in advance).
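A sketch of the vector-based version, with hypothetical helper names `read_all_tokens` and `write_tokens`. Like the answer's code, it reads whitespace-separated tokens with >>, so the original spacing is not preserved; std::getline would copy each line unchanged instead:

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: read every whitespace-separated token; no fixed size needed.
std::vector<std::string> read_all_tokens(const char* path)
{
    std::vector<std::string> tokens;
    std::ifstream file_read(path);
    std::string token;
    while (file_read >> token)
        tokens.push_back(token);
    return tokens;
}

// Hypothetical helper: write the tokens back, separated by single spaces.
bool write_tokens(const char* path, const std::vector<std::string>& tokens)
{
    std::ofstream file_write(path);
    for (std::size_t i = 0; i < tokens.size(); ++i)
        file_write << (i ? " " : "") << tokens[i];
    file_write << '\n';
    return static_cast<bool>(file_write);
}
```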

Data loss issue while converting the string from std::string to const char *

I pass a string holding a huge amount of data as the argument of SendCompressedString (below).
The SendBytes method it calls is defined like this:
bool NetOutputBuffer_c::SendBytes ( const void * pBuf, int iLen )
{
BYTE * pMy = (BYTE*)pBuf;
while ( iLen>0 && !m_bError )
{
int iLeft = m_iBufferSize - ( m_pBufferPtr-m_pBuffer );
if ( iLen<=iLeft )
{
printf("iLen is %d\n",iLen);
memcpy ( m_pBufferPtr, pMy, iLen );
m_pBufferPtr += iLen;
break;
}
ResizeIf ( iLen );
}
return !m_bError;
}
bool NetOutputBuffer_c::SendCompressedString ( std::string sStr )
{
std::cout << sStr << endl;
const char *cStr = sStr.c_str();
std::cout << cStr <<endl;
int iLen = cStr ? strlen(cStr) : 0;
SendInt ( iLen );
return SendBytes ( cStr, iLen );
}
I tried printing the value of sStr to check whether it contains the proper data.
Then I convert the std::string to a const char *.
After converting it, I printed the value of cStr, but cStr contains only about 5% of the actual data in sStr.
What do I need to do to get all of the data?
Can someone guide me in this regard?
strlen works on C strings, that is, null-terminated strings. It searches for a '\0' character and returns that character's position as the length of the string. You have compressed binary data, not a C string, and it almost certainly contains a zero byte before the end. (The same applies to std::cout << cStr, which also stops at the first '\0', whereas std::cout << sStr prints every byte.) Use sStr.size() instead:
SendInt ( static_cast<int>( sStr.size() ) );
return SendBytes ( sStr.data(), static_cast<int>( sStr.size() ) );
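A small illustration of the difference, independent of the NetOutputBuffer_c class and using made-up data; `lengths_of` is a hypothetical helper:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// A std::string can hold '\0' bytes, but strlen() stops at the first one.
struct Lengths
{
    std::size_t by_strlen;
    std::size_t by_size;
};

Lengths lengths_of(const std::string& s)
{
    Lengths l;
    l.by_strlen = std::strlen(s.c_str()); // counts up to the first '\0'
    l.by_size = s.size();                 // counts every stored byte
    return l;
}
```

For std::string("ab\0cd", 5), strlen reports 2 while size() reports 5.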

Need help in c++ code [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
Could you help me find the error in these two lines, which I took from the code below? I am a newbie in C++, so I need your help, folks. In addition, how do I change this code to C++? I am used to the C programming language rather than C++.
fgets( line, 80, in )
error :
rewind( in );
rows = countLines(in);
Code:
int container:: countLines( ifstream in )
{
int count = 0;
char line[80];
if ( in.good())
{
while ( !in.eof() )
if (in>>line ) count++;
rewind( in );
}
return count;
}
// opens the file and stores the strings
//
// input: string of passenger data
// container to store strings
//
int container:: processFile( char* fn )
{
char line[80];
ifstream in ;
in.open(fn);
int count = 0;
if ( !in.fail() )
{
rows = countLines(in);
strings = new char* [rows];
while ( !in.eof() )
{
if ( in>>line )
{
strings[count] =new char [strlen(line)+1];
strcpy(strings[count],line);
count++;
}
}
}
else
{
//printf("Unable to open file %s\n",fn);
//cout<<"Unable to open file "<<fn<<endl;
exit(0);
}
in.close();
return count;
}
Streams cannot be copied (their copy constructor is deleted), so you don't pass a stream by value; a call such as countLines(in) does not compile with this signature:
int container:: countLines( ifstream in )
You pass by reference:
int container:: countLines( ifstream& in )
This logic is wrong:
if ( in.good())
{
while ( !in.eof() )
if (in>>line ) count++;
}
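Don't use eof() this way. eof() only tells you that an earlier read reached the end of the file; if a read fails for any other reason, the loop never terminates. Test the result of the read itself instead: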
while (in >> line)
count++;
This is how to rewind in C:
rewind( in );
In C++, look at the seekg function:
http://en.cppreference.com/w/cpp/io/basic_istream/seekg
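A sketch of counting and then rewinding in C++, taking the stream by reference. `countTokens` is a hypothetical name; like the original countLines, it counts whitespace-separated words. The clear() call matters: after reading to the end, failbit is set, and seekg() does nothing on a stream in the fail state:

```cpp
#include <fstream>
#include <string>

// Counts whitespace-separated words, then rewinds the stream to the start.
int countTokens(std::ifstream& in)
{
    int count = 0;
    std::string word;
    while (in >> word)
        ++count;
    in.clear();                  // reset eofbit/failbit left by the last read
    in.seekg(0, std::ios::beg);  // C++ counterpart of rewind()
    return count;
}
```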
Prefer to use std::string over char*:
strings = new char* [rows];
Again, don't use eof():
while (in >> line)
{
strings[count] =new char [strlen(line)+1];
strcpy(strings[count],line);
count++;
}
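As a sketch of the C++ style for the whole of processFile, assuming a std::vector<std::string> may replace the strings/rows members (`loadStrings` is a hypothetical free function, not part of the original container class). No line counting, rewinding, new[] or strcpy is needed:

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical replacement for container::processFile.
bool loadStrings(const char* fn, std::vector<std::string>& strings)
{
    std::ifstream in(fn);
    if (!in)
        return false;          // let the caller decide instead of calling exit(0)
    std::string line;
    while (in >> line)         // reads whitespace-separated words, like the original
        strings.push_back(line);
    return true;
}
```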