I am wondering how to read a specific value from a CSV file in C++, and then read the next four items in the file. For example, this is what the file would look like:
fire,2.11,2,445,7891.22,water,234,332.11,355,5654.44,air,4535,122,334.222,16,earth,453,46,77.3,454
What I want to do is let my user select one of the values, say "air", and also read the next four items (4535 122 334.222 16).
I only want to use the fstream, iostream, and iomanip libraries. I am a newbie, and I am horrible at writing code, so please, be gentle.
You should read about parsers, and the full CSV specification.
If your fields are free of commas and double quotes, and you need a quick solution, search for getline/strtok, or try this (not compiled/tested):
#include <istream>
#include <string>
#include <vector>

typedef std::vector< std::string > svector;

// Split one line of `is` on `sep` into `d`. Returns false at end of stream.
bool get_line( std::istream& is, svector& d, const char sep = ',' )
{
    d.clear();
    if ( !is )
        return false;
    char c;
    std::string s;
    while ( is.get( c ) && c != '\n' )
    {
        if ( c == sep )
        {
            d.push_back( s );
            s.clear();
        }
        else
        {
            s += c;
        }
    }
    if ( !s.empty() || !d.empty() )
        d.push_back( s );   // keep the last field, even if it is empty
    return !d.empty();
}
#include <fstream>

int main()
{
    std::ifstream is( "test.txt" );
    if ( !is )
        return -1;
    svector line;
    while ( get_line( is, line ) )
    {
        //...
    }
    return 0;
}
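With the line parsed into fields, the lookup the question asks for is just a search for the key followed by a copy of the next four fields. A sketch (find_group is an invented helper name, not part of the answer above):

```cpp
#include <string>
#include <vector>

typedef std::vector< std::string > svector;

// Find `key` among the parsed fields and copy the following `count`
// fields into `out`. Returns false if the key is missing or there
// are not enough fields after it.
bool find_group( const svector& fields, const std::string& key,
                 svector& out, std::size_t count = 4 )
{
    out.clear();
    for ( std::size_t i = 0; i < fields.size(); ++i )
    {
        if ( fields[i] == key && i + count < fields.size() )
        {
            out.assign( fields.begin() + i + 1,
                        fields.begin() + i + 1 + count );
            return true;
        }
    }
    return false;
}
```

For the sample line, find_group( line, "air", items ) would leave "4535", "122", "334.222", "16" in items.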
My program changes the first letter of each word to uppercase in a .txt file.
I enter the address of the file. The program saves a word as a character array named "word", changes the first cell of the array to uppercase, then counts the letters of that word, moves back to the first letter of the word, and writes the new word to the file.
But it does not work correctly!
#include <iostream>
#include <stdio.h>
using namespace std;

int main ()
{
    int t = 0, i = 0, j = 0;
    char word[5][20];
    FILE *f;
    char adres[20];
    cin >> adres; // K:\\t.txt
    f = fopen(adres, "r+");
    {
        t = ftell(f);
        cout << t << "\n";
        fscanf(f, "%s", &word[i]);
        word[i][0] -= 32;
        for (j = 0; word[i][j] != 0; j++) {}
        fseek(f, -j, SEEK_CUR);
        fprintf(f, "%s", word[i]);
        t = ftell(f);
        cout << t << "\n";
    }
    i++;
    {
        fscanf(f, "%s", &word[i]);
        word[i][0] -= 32;
        for (j = 0; word[i][j] != 0; j++) {}
        fseek(f, -j, SEEK_CUR);
        fprintf(f, "%s", word[i]);
        t = ftell(f);
        cout << t << "\n";
    }
    return 0;
}
and the file is like:
hello kami how are you
and the output is:
Hello kaAmihow are you
I think this is what you need.
#include <iostream>
#include <string>
#include <fstream>
using namespace std;

void readFile()
{
    string word;
    ifstream fin;
    ofstream fout;
    fin.open("read.txt");
    fout.open("write.txt");
    if (!fin.is_open()) return;
    while (fin >> word)
    {
        if (word[0] >= 'a' && word[0] <= 'z')
            word[0] -= 32;
        fout << word << ' ';
    }
}

int main()
{
    readFile();
    return 0;
}
This looks like homework.
Don't try to read and write in the same file. Use different files (in.txt & out.txt, for instance). You may delete & rename the files at the end.
Use C++ streams.
Read one character at a time.
Divide your algorithm into three parts:
Read & write white-space until you find a non-white-space character.
Change the character to uppercase and write it.
Read and write the rest of the word.
Here it is a starting point:
#include <fstream>
#include <locale>

int main()
{
    using namespace std;
    ifstream is( "d:\\temp\\in.txt" );
    if ( !is )
        return -1;
    ofstream os( "d:\\temp\\out.txt" );
    if ( !os )
        return -2;
    while ( is )
    {
        char c;
        while ( is.get( c ) && isspace( c, locale() ) )
            os.put( c );
        is.putback( c );
        // fill in the blanks
    }
    return 0;
}
[EDIT]
Your program has too many problems.
It is not clear what you're trying to do. You probably want to capitalize each word.
scanf functions skip white-space in front of a string. If the file contains " abc" (notice the white space in front of 'a') and you use fscanf, you will get "abc" - no white-space.
Subtracting 32 from a character does not necessarily convert it to a capital letter. What if it is a digit, not a letter? Instead, you should use the toupper function.
etc.
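The toupper point is easy to demonstrate. A tiny sketch (capitalize is an invented helper; the unsigned char cast avoids undefined behaviour for negative char values):

```cpp
#include <cctype>

// Unlike subtracting 32, toupper leaves non-letters alone.
char capitalize( char c )
{
    return (char)toupper( (unsigned char)c );
}
```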
[EDIT] One file & C style:
#include <stdio.h>
#include <ctype.h>

int main()
{
    FILE* f = fopen( "d:\\temp\\inout.txt", "r+b" );
    if ( !f )
        return -1;
    while ( 1 )
    {
        int c;
        // skip white-space
        while ( ( c = getc( f ) ) != EOF && isspace( c ) )
            ;
        if ( c == EOF )
            break;
        // overwrite the first letter of the word with its capital
        fseek( f, -1, SEEK_CUR );
        putc( toupper( c ), f );
        fseek( f, ftell( f ), SEEK_SET ); // add this line if you're using visual c
        // skip the rest of the word
        while ( ( c = getc( f ) ) != EOF && !isspace( c ) )
            ;
        if ( c == EOF )
            break;
    }
    fclose( f );
    return 0;
}
I have a pattern in the following format:
AUTHOR, "TITLE" (PAGES pp.) [CODE STATUS]
For example, I have a string
P.G. Wodehouse, "Heavy Weather" (336 pp.) [PH.409 AVAILABLE FOR LENDING]
I want to extract
AUTHOR = P.G. Wodehouse
TITLE = Heavy Weather
PAGES = 336
CODE = PH.409
STATUS = AVAILABLE FOR LENDING
I only know how to do that in Python; however, is there an efficient way to do the same thing in C++?
Exactly the same way as in Python. C++11 has regular expressions (and for earlier C++, there's Boost regex). As for the read loop:
std::string line;
while ( std::getline( file, line ) ) {
    // ...
}
is almost exactly the same as:
for line in file:
    # ...
The only differences are:
The C++ version will not put the trailing '\n' in the buffer. (In general, the C++ version may be less flexible with regards to end of line handling.)
In case of a read error, the C++ version will break the loop; the Python version will raise an exception.
Neither should be an issue in your case.
EDIT:
It just occurs to me that while regular expressions in C++ and in Python are very similar, the syntax for using them isn't quite the same. So:
In C++, you'd normally declare an instance of the regular expression before using it; something like Python's re.match( r'...', line ) is theoretically possible, but not very idiomatic (and it would still involve explicitly constructing a regular expression object in the expression). Also, the match function simply returns a boolean; if you want the captures, you need to define a separate object for them. Typical use would probably be something like:
static std::regex const matcher( "the regular expression" );
std::smatch forCaptures;
if ( std::regex_match( line, forCaptures, matcher ) ) {
    std::string firstCapture = forCaptures[1];
    // ...
}
This corresponds to the Python:
m = re.match( 'the regular expression', line )
if m:
    firstCapture = m.group(1)
    # ...
EDIT:
Another answer has suggested overloading operator>>; I heartily concur. Just out of curiosity, I gave it a go; something like the following works well:
struct Book
{
    std::string author;
    std::string title;
    int pages;
    std::string code;
    std::string status;
};

std::istream&
operator>>( std::istream& source, Book& dest )
{
    std::string line;
    std::getline( source, line );
    if ( source )
    {
        static std::regex const matcher(
            R"^(([^,]*),\s*"([^"]*)"\s*\((\d+) pp\.\)\s*\[(\S+)\s*([^\]]*)\])^"
        );
        std::smatch capture;
        if ( !std::regex_match( line, capture, matcher ) ) {
            source.setstate( std::ios_base::failbit );
        } else {
            dest.author = capture[1];
            dest.title = capture[2];
            dest.pages = std::stoi( capture[3] );
            dest.code = capture[4];
            dest.status = capture[5];
        }
    }
    return source;
}
Once you've done this, you can write things like:
std::vector<Book> v( (std::istream_iterator<Book>( inputFile )),
(std::istream_iterator<Book>()) );
And load an entire file in the initialization of a vector.
Note the error handling in the operator>>. If a line is malformed, we set failbit; this is the standard convention in C++.
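The failbit convention works the same way for any type; a minimal self-contained illustration (a toy Point type and a comma format invented for the example, not the Book format above):

```cpp
#include <istream>
#include <sstream>
#include <string>

struct Point { int x; int y; };

// Read one "x,y" line; on a malformed line, set failbit on the source.
std::istream& operator>>( std::istream& source, Point& dest )
{
    std::string line;
    if ( std::getline( source, line ) )
    {
        std::istringstream parser( line );
        char comma;
        if ( !(parser >> dest.x >> comma >> dest.y) || comma != ',' )
            source.setstate( std::ios_base::failbit );
    }
    return source;
}
```

Callers then test the stream as usual: `while ( input >> p ) { ... }` stops on either end of file or a malformed line, and the two cases can be distinguished with eof().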
EDIT:
Since there's been so much discussion: the above is fine for small, one-time programs: school projects, or throwaway programs that read the current file, output it in a new format, and are then discarded. In production code, I would insist on support for comments and empty lines; on continuing in case of error, in order to report multiple errors (with line numbers); and probably on continuation lines (since titles can get long enough to become unwieldy). It's not practical to do this with operator>>, if for no other reason than the need to output line numbers, so I'd use a parser along the following lines:
int
getContinuationLines( std::istream& source, std::string& line )
{
    int results = 0;
    while ( source.peek() == '&' ) {
        std::string more;
        std::getline( source, more ); // Cannot fail, because of peek
        more[0] = ' ';
        line += more;
        ++results;
    }
    return results;
}
void
trimComment( std::string& line )
{
    char quoted = '\0';
    std::string::iterator position = line.begin();
    // Advance until an unquoted '#' (or the end of the line).
    while ( position != line.end() && (quoted != '\0' || *position != '#') ) {
        if ( *position == '\\' && std::next( position ) != line.end() ) {
            ++position;
        } else if ( *position == quoted ) {
            quoted = '\0';
        } else if ( *position == '\"' || *position == '\'' ) {
            quoted = *position;
        }
        ++position;
    }
    line.erase( position, line.end() );
}
bool
isEmpty( std::string const& line )
{
    return std::all_of(
        line.begin(),
        line.end(),
        []( unsigned char ch ) { return isspace( ch ); } );
}
std::vector<Book>
parseFile( std::istream& source )
{
    std::vector<Book> results;
    int lineNumber = 0;
    std::string line;
    bool errorSeen = false;
    while ( std::getline( source, line ) ) {
        ++lineNumber;
        int extraLines = getContinuationLines( source, line );
        trimComment( line );
        if ( !isEmpty( line ) ) {
            static std::regex const matcher(
                R"^(([^,]*),\s*"([^"]*)"\s*\((\d+) pp\.\)\s*\[(\S+)\s*([^\]]*)\])^"
            );
            std::smatch capture;
            if ( !std::regex_match( line, capture, matcher ) ) {
                std::cerr << "Format error, line " << lineNumber << std::endl;
                errorSeen = true;
            } else {
                results.push_back( Book{
                    capture[1],
                    capture[2],
                    std::stoi( capture[3] ),
                    capture[4],
                    capture[5] } );
            }
        }
        lineNumber += extraLines;
    }
    if ( errorSeen ) {
        results.clear(); // Or more likely, throw some sort of exception.
    }
    return results;
}
The real issue here is how you report the error to the caller; I suspect that in most cases an exception would be appropriate, but depending on the use case, other alternatives may be valid as well. In this example, I just return an empty vector. (The interaction between comments and continuation lines probably needs to be better defined as well, with modifications according to how it has been defined.)
Your input string is well delimited so I'd recommend using an extraction operator over a regex, for speed and for ease of use.
You'd first need to create a struct for your books:
struct book {
    string author;
    string title;
    int pages;
    string code;
    string status;
};
Then you'd need to write the actual extraction operator:
istream& operator>>(istream& lhs, book& rhs) {
    lhs >> ws;
    getline(lhs, rhs.author, ',');
    lhs.ignore(numeric_limits<streamsize>::max(), '"');
    getline(lhs, rhs.title, '"');
    lhs.ignore(numeric_limits<streamsize>::max(), '(');
    lhs >> rhs.pages;
    lhs.ignore(numeric_limits<streamsize>::max(), '[');
    lhs >> rhs.code >> ws;
    getline(lhs, rhs.status, ']');
    return lhs;
}
This gives you a tremendous amount of power. For example you can extract all the books from an istream into a vector like this:
istringstream foo("P.G. Wodehouse, \"Heavy Weather\" (336 pp.) [PH.409 AVAILABLE FOR LENDING]\nJohn Bunyan, \"The Pilgrim's Progress\" (336 pp.) [E.1173 CHECKED OUT]");
vector<book> bar{ istream_iterator<book>(foo), istream_iterator<book>() };
Use flex (it generates C or C++ code, to be used as part of, or as the whole of, the program):
%%
^[^,]+/, {printf("Autor: %s\n",yytext );}
\"[^"]+\" {printf("Title: %s\n",yytext );}
\([^ ]+/[ ]pp\. {printf("Pages: %s\n",yytext+1);}
..................
.|\n {}
%%
(untested)
Here's the code:
#include <iostream>
#include <string>
using namespace std;

string extract (string a)
{
    string str = "AUTHOR = "; // the result string
    int i = 0;
    while (a[i] != ',')
        str += a[i++];
    while (a[i++] != '\"');
    str += "\nTITLE = ";
    while (a[i] != '\"')
        str += a[i++];
    while (a[i++] != '(');
    str += "\nPAGES = ";
    while (a[i] != ' ')
        str += a[i++];
    while (a[i++] != '[');
    str += "\nCODE = ";
    while (a[i] != ' ')
        str += a[i++];
    while (a[i] == ' ')
        i++; // skip the spaces without losing the first character of the status
    str += "\nSTATUS = ";
    while (a[i] != ']')
        str += a[i++];
    return str;
}

int main ()
{
    string a;
    getline (cin, a);
    cout << extract (a) << endl;
    return 0;
}
Happy coding :)
I need to dig a bit into some HTML files, and I wanted to first transform them into the readable form of a tree, one tag per line. Nevertheless, I have no experience with HTML. Could someone correct my code and point out the rules I've forgotten?
My code does not work for real-life pages. At the end of the program's execution the nesting counter should be 0, as the program should have left all the nested tags it has met. It does not: for a Facebook page, more than 2000 tags remain open.
Before anyone suggests using a library: I haven't seen any good one out there. For my pages, transforming to XML somehow fails, and the htmlcxx library has no proper documentation.
#include <cstdio>

char get_char( FILE *stream ) {
    char c;
    do
        c = getc(stream);
    while ( c == ' ' || c == '\n' || c == '\t' || c == '\r' );
    return c;
}

void fun( FILE *stream, FILE *out ) {
    int counter = -1;
    char c;
    do {
        c = get_char(stream);
        if ( c == EOF )
            break;
        if ( c != '<' ) { // print text
            for ( int i = counter + 1; i; --i )
                putc( ' ', out );
            fprintf( out, "TEXT: " );
            do {
                if ( c == '\n' )
                    fprintf( out, "<BR>" ); // random separator
                else
                    putc( c, out );
                c = getc( stream );
            } while ( c != '<' );
            putc( '\n', out );
        }
        c = getc( stream );
        if ( c != '/' ) { // nest deeper
            ++counter;
            for ( int i = counter; i; --i )
                putc( ' ', out );
        } else { // go back in nesting
            --counter;
            // maybe here should be some exception handling
            do // assuming there's no strings in quotation marks here
                c = getc( stream );
            while ( c != '>' );
            continue;
        }
        ungetc( c, stream );
        do { // reading tag
            c = getc(stream);
            if ( c == '/' ) { // checking if it's not a <blahblah/>
                c = getc(stream);
                if ( c == '>' ) {
                    --counter;
                    break;
                }
                putc( '/', out );
                putc( c, out );
            } else if ( c == '"' ) { // not parsing strings put in quotation marks
                do {
                    putc( c, out ); c = getc( stream );
                    if ( c == '\\' ) {
                        putc( c, out ); c = getc( stream );
                        if ( c == '"' ) {
                            putc( c, out ); c = getc( stream );
                        }
                    }
                } while ( c != '"' );
                putc( c, out );
            } else if ( c == '>' ) { // end of tag
                break;
            } else // standard procedure
                putc( c, out );
        } while ( true );
        putc( '\n', out );
    } while (true);
    fprintf( out, "Counter: %d", counter );
}

int main() {
    const char *name = "rfb.html";
    const char *oname = "out.txt";
    FILE *file = fopen(name, "r");
    FILE *out = fopen(oname, "w");
    fun( file, out );
    return 0;
}
HTML != XML
Tags can be unclosed; for example, <img ...> is considered equal to <img ... />.
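A tree builder therefore needs a list of these "void" elements, so it does not increase the nesting depth for them. A sketch (is_void_element is an invented helper; the list follows the HTML5 void-element set):

```cpp
#include <set>
#include <string>

// HTML "void" elements never take a closing tag, so a tree builder
// must not increment its nesting counter when it meets one.
bool is_void_element( const std::string& tag )
{
    static const std::set<std::string> voids = {
        "area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "param", "source", "track", "wbr"
    };
    return voids.count( tag ) != 0;
}
```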
Such an interesting and useful topic, and almost no answers. Really strange...
It's hard to find a good C++ HTML parser! I'll try to point you in the right direction... it may help you move on...
The libcurl page has some source code to get you going; it documents traversing the DOM tree. You don't need an XML parser, and it doesn't fail on badly formatted HTML.
http://curl.haxx.se/libcurl/c/htmltidy.html
Another option is htmlcxx. From the website description:
htmlcxx is a simple non-validating css1 and html parser for C++.
You can try libs like HTML Tidy - http://tidy.sourceforge.net (free)
If you're using Qt 4.6, you can use QWebElement. A simple example:
frame->setHtml(HTML);
QWebElement document = frame->documentElement();
QList<QWebElement> imgs = document.findAll("img");
Here is another example. http://doc.qt.digia.com/4.6/webkit-simpleselector.html
So I've seen lots of solutions on this site and tutorials about reading in from a text file in C++, but have yet to figure out a solution to my problem. I'm new at C++ so I think I'm having trouble piecing together some of the documentation to make sense of it all.
What I am trying to do is read numbers from a text file while ignoring the comments in the file, which are denoted by "#". An example file would look like:
#here is my comment
20 30 40 50
#this is my last comment
60 70 80 90
My code can read numbers fine when there aren't any comments, but I don't understand parsing the stream well enough to ignore the comments. It's kind of a hack solution right now.
/////////////////////// Read the file ///////////////////////
std::string line;
if (input_file.is_open())
{
    // While we can still read the file
    while (std::getline(input_file, line))
    {
        std::istringstream iss(line);
        float num; // The number in the line
        // while the iss is a number
        while ((iss >> num))
        {
            // look at the number
        }
    }
}
else
{
    std::cout << "Unable to open file";
}
/////////////////////// done reading file /////////////////
Is there a way I can incorporate comment handling with this solution or do I need a different approach? Any advice would be great, thanks.
If your file always has # in the first column, then just test whether the line starts with #, like this:
while (std::getline(input_file, line))
{
    if (!line.empty() && line[0] != '#')
    {
        std::istringstream iss(line);
        float num; // The number in the line
        // while the iss is a number
        while ((iss >> num))
        {
            // look at the number
        }
    }
}
It is wise, though, to trim the line of leading and trailing whitespace, as shown here for example: Remove spaces from std::string in C++
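If the comment marker may be preceded by whitespace, the trim and the test can be combined into one check. A sketch (is_comment_or_blank is an invented helper name):

```cpp
#include <string>

// Returns true if the line is blank or its first non-space
// character is the comment marker.
bool is_comment_or_blank( const std::string& line, char marker = '#' )
{
    std::string::size_type pos = line.find_first_not_of( " \t\r" );
    return pos == std::string::npos || line[pos] == marker;
}
```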
If this is just a one-off, for line-oriented input like yours, the simplest solution is just to strip the comment from the line you just read:
line.erase( std::find( line.begin(), line.end(), '#' ), line.end() );
A more generic solution would be to use a filtering streambuf, something
like:
class FilterCommentsStreambuf : public std::streambuf
{
    std::istream& myOwner;
    std::streambuf* mySource;
    char myCommentChar;
    char myBuffer;

protected:
    int underflow()
    {
        int const eof = traits_type::eof();
        int results = mySource->sbumpc();
        if ( results == myCommentChar ) {
            while ( results != eof && results != '\n' ) {
                results = mySource->sbumpc();
            }
        }
        if ( results != eof ) {
            myBuffer = results;
            setg( &myBuffer, &myBuffer, &myBuffer + 1 );
        }
        return results;
    }

public:
    FilterCommentsStreambuf( std::istream& source,
                             char comment = '#' )
        : myOwner( source )
        , mySource( source.rdbuf() )
        , myCommentChar( comment )
    {
        myOwner.rdbuf( this );
    }

    ~FilterCommentsStreambuf()
    {
        myOwner.rdbuf( mySource );
    }
};
In this case, you could even forgo getline:
FilterCommentsStreambuf filter( input_file );
double num;
while ( input_file >> num || !input_file.eof() ) {
    if ( !input_file ) {
        // Formatting error: output an error message, clear the
        // error, and resynchronize the input---probably by
        // ignore'ing until end of line.
    } else {
        // Do something with the number...
    }
}
(In such cases, I've found it useful to also track the line number in
the FilterCommentsStreambuf. That way you have it for error
messages.)
An alternative to "read a line and parse it as a string" is to use the stream itself as the incoming buffer:
while (input_file)
{
    int n = 0;
    char c;
    input_file >> c; // will skip spaces and read the first non-blank
    if (c == '#')
    {
        while (c != '\n' && input_file) input_file.get(c);
        continue; // may not be so beautiful, but does not introduce useless dynamic memory
    }
    // c is part of something other than a comment, so give it back to parse it as a number
    input_file.unget(); //< this is what all the fuss is about!
    if (input_file >> n)
    {
        // look at the number
        continue;
    }
    // something else, but not an integer, is there ....
    // if you cannot recover, the loop will exit
}
Here is my source code, which loads a text file and splits each line into single items (words).
How can I further optimize the code? Testing for empty lines (and other constructions) is, in my opinion, a little inefficient...
typedef std::vector < std::string > TLines;
typedef std::vector < std::vector < std::string > > TItems;

TItems TFloadFile ( const char * file_name )
{
    //Load projection from file
    unsigned int lines = 0;
    char buffer[BUFF];
    FILE * file;
    TItems file_words;
    TLines file_lines;
    file = fopen ( file_name, "r" );
    if ( file != NULL )
    {
        for ( ; fgets ( buffer, BUFF, file ); )
        {
            //Remove empty lines
            bool empty_line = true;
            for ( unsigned i = 0; i < strlen ( buffer ); i++ )
            {
                if ( !isspace ( ( unsigned char ) buffer[i] ) )
                {
                    empty_line = false;
                    break;
                }
            }
            if ( !empty_line )
            {
                file_lines.push_back ( buffer );
                lines++;
            }
        }
        file_words.resize ( lines + 1 );
        for ( unsigned int i = 0; i < lines; i++ )
        {
            char * word = strtok ( const_cast<char *> ( file_lines[i].c_str() ), " \t,;\r\n" );
            for ( int j = 0; word; j++, word = strtok ( 0, " \t;\r\n" ) )
            {
                file_words[i].push_back ( word );
            }
        }
        fclose ( file );
    }
    return file_words;
}
Thanks for your help...
The line for ( unsigned i = 0; i < strlen ( buffer ); i++ ) is quite inefficient, as you're calculating the length of buffer each time through the loop. However, it's possible that this will be optimised away by the compiler.
You're pushing items onto your std::vectors without reserve()ing any space. For a large file, this will involve a lot of overhead, as the contents of the vectors will need to be copied in order to resize them. I just read #Notinlist's answer, which already talks about the inefficiencies of std::vector::resize().
Instead of reading each line into a vector through repeated fgets() calls, could you not simply determine the number of bytes in the file, dynamically allocate a char array to hold them, and then dump the bytes into it? Then you could parse the words and store them in file_words. This would be more efficient than the method you're currently using.
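A sketch of that approach, using a std::string as the buffer instead of a raw char array (slurp is an invented name; error handling kept minimal):

```cpp
#include <fstream>
#include <string>

// Read the whole file into one buffer in a single read() call,
// then tokenize in memory instead of line by line.
std::string slurp( const char* file_name )
{
    std::ifstream is( file_name, std::ios::binary );
    std::string contents;
    if ( is )
    {
        is.seekg( 0, std::ios::end );            // find the file size...
        contents.resize( (std::size_t)is.tellg() );
        is.seekg( 0, std::ios::beg );
        is.read( &contents[0], contents.size() ); // ...and read it all at once
    }
    return contents;
}
```

The tokenizing loop can then walk the returned buffer directly, without any per-line string construction.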
Before optimizing, can you explain how big the file is, how long the code currently takes to execute, and why you think it isn't already I/O bound (i.e., limited by hard disk speed)? How long do you think it should take? Some idea of the type of data in the file would be good too (such as average line length, average proportion of empty lines, etc.).
That said, combine the remove-empty-lines loop with the word-tokenising loop. Then you can remove TLines altogether and avoid the std::string constructions and vector push-backs. I haven't checked that this code works, but it should be close enough to give you the idea. It also includes a more efficient empty-line spotter:
if ( file != NULL )
{
    for ( ; fgets ( buffer, BUFF, file ); )
    {
        bool is_empty = true;
        for (char *c = buffer; *c != '\0'; c++)
        {
            if (!isspace((unsigned char)*c))
            {
                is_empty = false;
                break;
            }
        }
        if (is_empty)
            continue;
        file_words.resize ( lines + 1 );
        char * word = strtok ( buffer, " \t,;\r\n" );
        for ( ; word; word = strtok ( 0, " \t,;\r\n" ) )
        {
            file_words[lines].push_back ( word );
        }
        lines++;
    }
    fclose ( file );
}
For one thing:
file_lines.push_back ( buffer );
That is a very expensive line. If you don't have to use a vector, then use a list instead. You could convert your list to a vector after you've finished the job.
If you absolutely need a vector for this purpose, then you should grow it exponentially instead, like:
if (file_lines.size() <= lines) {
    file_lines.resize((int)(lines * 1.3 + 1));
}
That way you will have far fewer CPU-intensive .resize() operations, at the cost of a minimal memory-consumption overhead.
Simplified and converted to use std::list instead of std::vector.
typedef std::list< std::list< std::string > > TItems;

TItems TFloadFile ( const char * file_name )
{
    using namespace std;
    //Load projection from file
    ifstream file(file_name);
    TItems file_words;
    string line;
    for (getline(file, line); !file.fail() && !file.eof(); getline(file, line))
    {
        file_words.push_back(list<string>());
        list<string> &words(file_words.back());
        // strtok writes into the buffer, so don't cast away c_str()'s const
        char *word = strtok(&line[0], " \t,;\r\n");
        for (; word; word = strtok(0, " \t,;\r\n"))
        {
            words.push_back(word);
        }
        if (!words.size())
            file_words.pop_back();
    }
    return file_words;
}