Text parsing a log file effectively in C++

I want to build a log browser, so I need to parse log files efficiently. Below is some simple parsing code.
Please let me know whether this code is okay or whether it can be improved.
Also, the call strtok(0, DELIMITER) in the program below is not clear to me. Please explain what it does.
// parsing ex.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
#define _CRT_SECURE_NO_WARNINGS
#include <iostream>
using std::cout;
using std::endl;
#include <fstream>
using std::ifstream;
#include <cstring>
const int MAX_CHARS_PER_LINE = 512;
const int MAX_TOKENS_PER_LINE = 20;
const char* const DELIMITER = " ";
int main()
{
// create a file-reading object
ifstream fin;
fin.open("C:\\Personal\\data.txt"); // open a file
if (!fin.good())
return 1; // exit if file not found
// read each line of the file
while (!fin.eof())
{
// read an entire line into memory
char buf[MAX_CHARS_PER_LINE];
fin.getline(buf, MAX_CHARS_PER_LINE);
// parse the line into blank-delimited tokens
int n = 0; // a for-loop index
// array to store memory addresses of the tokens in buf
const char* token[MAX_TOKENS_PER_LINE] = {}; // initialize to 0
// parse the line
token[0] = strtok(buf, DELIMITER); // first token
if (token[0]) // zero if line is blank
{
for (n = 1; n < MAX_TOKENS_PER_LINE; n++)
{
token[n] = strtok(0, DELIMITER); // subsequent tokens
if (!token[n]) break; // no more tokens
}
}
// process (print) the tokens
for (int i = 0; i < n; i++) // n = #of tokens
cout << "Token[" << i << "] = " << token[i] << endl;
cout << endl;
}
}

Your code works for well-behaved input, but it has no boundary checks. It fails if a line in the file is longer than MAX_CHARS_PER_LINE - 1 characters: fin.getline then sets failbit, and because the loop only tests eof(), it may never terminate. while (!fin.eof()) {...} is prone to other errors as well; for example, the body typically runs one extra time with an empty buffer after the last line has been read.
You can easily solve this problem with std::string.
The code also loses data if a line contains more than MAX_TOKENS_PER_LINE tokens, because the extra tokens are silently dropped. You can solve this by using std::vector.
In short:
Use std::string instead of character arrays.
Use std::vector instead of C-style arrays.
Use std::stringstream instead of strtok.
The advantage is that you don't have to worry about the maximum line length or the maximum number of tokens per line.
#include <iostream>
#include <string>
#include <fstream>
#include <sstream>
#include <vector>
const char CDELIMITER = ' ';
int main()
{
...
std::string buf;
//read the file line by line
while (std::getline(fin, buf))
{
//convert the line in to stream:
std::istringstream ss(buf);
//declare vector of string (instead of fixed array)
std::vector<std::string> vec;
//read the line, word by word
while (std::getline(ss, buf, CDELIMITER))
vec.push_back(buf);
for (size_t i = 0; i < vec.size(); i++)
std::cout << "Token[" << i << "] = " << vec[i] << "\n";
std::cout << "\n";
}
return 0;
}

Related

arrange line in txt file in ASCII order using array and display them

#include <stdio.h>
#include <string.h>
#include <fstream>
#include <iostream>
using namespace std;
int main() {
ifstream infile; // ifstream is reading file
infile.open("read.txt"); // read.txt is the file we need to read
std::cout << infile;
string str;
if (infile.is_open()) {
while (getline(infile, str)) {
char str[2000], ch;
int i, j, len;
len = strlen(str);
for (i = 0; i < len; i++) {
for (j = 0; j < (len - 1); j++) {
if (str[j] > str[j + 1]) {
ch = str[j];
str[j] = str[j + 1];
str[j + 1] = ch;
}
}
}
}
cout << "\nProcessed data:" << str;
}
return 0;
}
My txt file:
Today is a fine day.
It’s sunny.
Let us go out now!
My result should be:
.Taaaddefiinosyy
’.Innsstuy
!Legnooosttuuw
Spaces are considered here as well.
I'm new to C++.
I need some help from the pros.
Thank you very much!
Making use of the STL:
Read your file line by line into a std::string using std::getline.
Sort every line using std::ranges::sort (C++20).
Print it.
The example below also uses the fmt library instead of std::cout, and reads from a std::istringstream instead of a std::ifstream.
#include <algorithm> // sort
#include <fmt/core.h>
#include <sstream> // istringstream
#include <string> // getline
int main() {
std::istringstream iss{
"Today is a fine day.\n"
"It's sunny.\n"
"Let us go out now!\n"
};
fmt::print("Original file:\n{}\n", iss.str());
fmt::print("Processed file:\n");
std::string line{};
while (std::getline(iss, line)) {
std::ranges::sort(line);
fmt::print("{}\n", line);
}
}
// Outputs:
//
// Original file:
// Today is a fine day.
// It's sunny.
// Let us go out now!
//
// Processed file:
//     .Taaaddefiinosyy
//  '.Innsstuy
//     !Legnooosttuuw
Your code does not work, because:
The line std::cout << infile; is wrong. If you want to print the result of istream::operator bool() in order to determine whether the file was successfully opened, then you should write std::cout << infile.operator bool(); or std::cout << static_cast<bool>(infile); instead. However, it would probably be better to simply write std::cout << infile.fail(); or std::cout << !infile.fail();.
The function std::strlen requires as a parameter a pointer to a valid string. Maybe you intended to write str.length()? In that case, you should delete the declaration char str[2000], because it shadows the declaration string str;.
You should print the sorted result immediately after sorting it, before it gets overwritten by the next line. Currently you print the contents of str only once, at the end of your program, so you only print the last line.
After performing the fixes mentioned above, your code should look like this:
#include <string>
#include <fstream>
#include <iostream>
using namespace std;
int main() {
ifstream infile; // ifstream is reading file
infile.open("read.txt"); // read.txt is the file we need to read
std::cout << infile.fail();
string str;
if (infile.is_open()) {
while (getline(infile, str)) {
char ch;
int i, j, len;
len = str.length();
for (i = 0; i < len; i++) {
for (j = 0; j < (len - 1); j++) {
if (str[j] > str[j + 1]) {
ch = str[j];
str[j] = str[j + 1];
str[j + 1] = ch;
}
}
}
cout << "\nProcessed data:" << str;
}
}
return 0;
}
For the input
Today is a fine day.
It's sunny.
Let us go out now!
this program has the following output:
0
Processed data:    .Taaaddefiinosyy
Processed data: '.Innsstuy
Processed data:    !Legnooosttuuw
Note that the input posted in your question contains a typographic apostrophe ’ (U+2019, right single quotation mark) instead of an ASCII apostrophe '. This can cause trouble. For example, when I tested your program, this character was encoded as a multi-byte UTF-8 sequence, because it is not representable in 7-bit US-ASCII. This caused your sorting algorithm to fail, because it sorts individual bytes and therefore only supports single-byte characters. I was only able to fix this bug by replacing the typographic apostrophe with an ASCII apostrophe in the input.

unique words from a file c++

It's been 3 days and I just can't identify what's wrong with this program. It should compare word by word, but instead it only compares character by character. For example, if the file contains words like (aaa bbb cc dd), the result it prints is a b. The same happens with a file of sentences: if I put in paragraphs to compare, it only compares a few characters. Please help me.
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main()
{
ifstream myfile("unique.text");
int count = 0;
string temp;
string a;
int i,j;
while(getline(myfile,temp))
{
for(i=0 ; i < sizeof(temp); i++)
{
for(int j = 0; j < i; j++)
{
if (temp[i] == temp[j])
break;
}
if (i == j)
cout << temp [i] <<" , ";
}
myfile.close ();
}
You have a couple of problems
temp is of type string. sizeof is not the way to determine the length of a string (it's used for determining things like the number of bytes in an int). You want:
temp.length()
Secondly, indexing into a string (temp[n]) gives you the nth character, not the nth word.
You can make getline split at spaces instead of at newlines by adding a third parameter, the delimiter:
getline(myfile, temp, ' ')
So, there are some bugs in your code:
mixing up characters and strings, closing the file inside the while loop, and not storing the words already read.
One recommendation: before you write code, write comments for what you want to do.
That is, make a design before you start coding. That is very important.
For the problem at hand in the title of this thread:
unique words from a file c++
I prepared 3 different solutions. The first uses only very simple constructs. The second uses a std::vector. The third is the C++ solution using the C++ algorithm library.
Please see:
Simple, but lengthy.
And not recommended, because we should not use raw pointers for owned memory and should not use new.
#include <iostream>
#include <fstream>
#include <string>
const std::string fileName{ "unique.text" };
unsigned int numberOfWords() {
// Here we will count the number of words in the file
unsigned int counter = 0;
// Open the file. File must not be already open
std::ifstream sourceFileStream(fileName);
// Check, if we could open the file
if (sourceFileStream) {
// Simply read all words and increment the counter
std::string temp;
while (sourceFileStream >> temp) ++counter;
}
else {
// In case of problem
std::cerr << "\nCould not open file '" << fileName << "'\n";
}
return counter;
}
int main() {
// Get the number of words in the source file
unsigned size = numberOfWords();
// Allocate a dynamic array of strings. Size is the count of the words in the file
// Including doubles. So we will waste a little bit of space
std::string* words = new std::string[size+1];
// Open the source file
std::ifstream sourceFileStream(fileName);
// Check, if it could be opened
if (sourceFileStream) {
// We will read first into a temporary variable
std::string temp;
// Here we will count the number of unique words
unsigned int wordCounter = 0;
// Read all words in the file
while (sourceFileStream >> temp) {
// Check whether we have already read this word before. We assume NO to begin with
bool wordIsAlreadyPresent = false;
// Go through all words read so far and check whether the word just read already exists
for (unsigned int i = 0; i < wordCounter; ++i) {
// Check, if just read word is already in the word array
if (temp == words[i]) {
// Yes it is, set flag, and stop the loop.
wordIsAlreadyPresent = true;
break;
}
}
// if the word was not already there
if (! wordIsAlreadyPresent) {
// Then add the just read temporary word into our array
words[wordCounter] = temp;
// And increment the counter
++wordCounter;
}
}
// Show all read unique words
for (unsigned int i = 0; i < wordCounter; ++i) {
std::cout << words[i] << "\n";
}
}
else { // In case of error
std::cerr << "\nCould not open file '" << fileName << "'\n";
}
delete[] words;
}
Using a vector. Already more compact and better readable
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
const std::string fileName{ "unique.text" };
int main() {
// Open the source file
std::ifstream sourceFileStream(fileName);
// Check, if the source file is open
if (sourceFileStream) {
// Temporary string for holding just read words
std::string temp;
// In this vector we will store all unique words
std::vector<std::string> words;
// Read all words from the source file
while (sourceFileStream >> temp) {
// Check whether we have already read this word before. We assume NO to begin with
bool wordIsAlreadyPresent = false;
// Go through all words read so far and check whether the word just read already exists
for (unsigned int i = 0; i < words.size(); ++i) {
// Check, if just read word is already in the word vector
if (temp == words[i]) {
// Yes it is, set flag, and stop the loop.
wordIsAlreadyPresent = true;
break;
}
}
// if the word was not already there
if (not wordIsAlreadyPresent) {
// Then add the just read temporary word to our vector
words.push_back(temp);
}
}
for (unsigned int i = 0; i < words.size(); ++i) {
std::cout << words[i] << "\n";
}
}
else {
std::cerr << "\nCould not open file '" << fileName << "'\n";
}
}
And 3., more advanced C++ programming: just a few lines of elegant code.
But probably too difficult to understand for beginners.
#include <iostream>
#include <fstream>
#include <set>
#include <string>
#include <iterator>
#include <algorithm>
const std::string fileName{ "unique.text" };
int main() {
// Open the source file and check whether it could be opened without failure
if (std::ifstream sourceFileStream(fileName); sourceFileStream) {
// Read all words (everything delimited by a white space) into a set
std::set words(std::istream_iterator<std::string>(sourceFileStream), {});
// Now we have a set with all unique words. Show this on the screen
std::copy(words.begin(), words.end(), std::ostream_iterator<std::string>(std::cout, "\n"));
}
// If we could not open the source file
else {
std::cerr << "\nCould not open file '" << fileName << "'\n";
}
return 0;
}

Tokenization of strings in C++

I am using the following code for splitting of each word into a Token per line. My problem lies here: I want a continuous update on my number of tokens in the file. The contents of the file are:
Student details:
Highlander 141A Section-A.
Single 450988012 SA
Program:
#include <iostream>
using std::cout;
using std::endl;
#include <fstream>
using std::ifstream;
#include <cstring>
const int MAX_CHARS_PER_LINE = 512;
const int MAX_TOKENS_PER_LINE = 20;
const char* const DELIMITER = " ";
int main()
{
// create a file-reading object
ifstream fin;
fin.open("data.txt"); // open a file
if (!fin.good())
return 1; // exit if file not found
// read each line of the file
while (!fin.eof())
{
// read an entire line into memory
char buf[MAX_CHARS_PER_LINE];
fin.getline(buf, MAX_CHARS_PER_LINE);
// parse the line into blank-delimited tokens
int n = 0; // a for-loop index
// array to store memory addresses of the tokens in buf
const char* token[MAX_TOKENS_PER_LINE] = {}; // initialize to 0
// parse the line
token[0] = strtok(buf, DELIMITER); // first token
if (token[0]) // zero if line is blank
{
for (n = 1; n < MAX_TOKENS_PER_LINE; n++)
{
token[n] = strtok(0, DELIMITER); // subsequent tokens
if (!token[n]) break; // no more tokens
}
}
// process (print) the tokens
for (int i = 0; i < n; i++) // n = #of tokens
cout << "Token[" << i << "] = " << token[i] << endl;
cout << endl;
}
}
Output:
Token[0] = Student
Token[1] = details:
Token[0] = Highlander
Token[1] = 141A
Token[2] = Section-A.
Token[0] = Single
Token[1] = 450988012
Token[2] = SA
Expected:
Token[0] = Student
Token[1] = details:
Token[2] = Highlander
Token[3] = 141A
Token[4] = Section-A.
Token[5] = Single
Token[6] = 450988012
Token[7] = SA
So I want it to be incremented so that I could easily identify the value by its variable name. Thanks in advance...
What's wrong with the standard, idiomatic solution:
std::string line;
int i = 0; // declared outside the loop so the index keeps counting across lines
while ( std::getline( fin, line ) ) {
std::istringstream parser( line );
std::string token;
while ( parser >> token ) {
std::cout << "Token[" << i << "] = " << token << std::endl;
++ i;
}
}
Obviously, in real life, you'll want to do more than just
output each token, and you'll want more complicated parsing.
But anytime you're doing line oriented input, the above is the
model you should be using (probably keeping track of the line
number as well, for error messages).
It's probably worth pointing out that in this case, an even
better solution would be to use boost::split in the outer
loop, to get a vector of tokens.
I would just let iostream do the splitting
std::vector<std::string> token;
std::string s;
while (fin >> s)
token.push_back(s);
Then you can output the whole array at once with proper indexes.
for (std::size_t i = 0; i < token.size(); ++i)
cout << "Token[" << i << "] = " << token[i] << endl;
Update:
You can even omit the vector altogether and output the tokens as you read them from the input stream
std::string s;
for (int i = 0; fin >> s; ++i)
std::cout << "Token[" << i << "] = " << s << std::endl;

parsing a text file with first line all 1's and second line all 2's

I have created a program that reads a text file line by line and separates out the individual words in each line (as separated by blanks).
Now I want to edit the code so that all the tokens from the first line are labeled 1 and all the tokens from the second line are labeled 2.
If anyone could help me with this, please do.
My code is below:
#include <iostream>
using std::cout;
using std::endl;
#include <fstream>
using std::ifstream;
#include <cmath>
#include <string>
const int MAX_CHARS_PER_LINE = 512;
const int MAX_TOKENS_PER_LINE = 20;
const char* const DELIMITER = " ";
using namespace std;
int main()
{
string filename;
// create a file-reading object
/*std::ifstream file1("file1.txt", ios_base::app);
std::ifstream file2("file2.txt");
std::ofstream combinedfile("combinedfile.txt");
combinedfile << file1.rdbuf() << file2.rdbuf();*/
ifstream fin;
//enter in file name combinedfile.txt
cout <<"please enter file name (including .txt)";
cin >> filename ;
fin.open(filename); // open a file
if (!fin.good())
return 1; // exit if file not found
// read each line of the file
while (!fin.eof())
{
// read an entire line into memory
char buf[MAX_CHARS_PER_LINE];
fin.getline(buf, MAX_CHARS_PER_LINE);
// parse the line into blank-delimited tokens
int n = 0; // a for-loop index
// array to store memory addresses of the tokens in buf
const char* token[MAX_TOKENS_PER_LINE] = {}; // initialize to 0
// parse the line
token[0] = strtok(buf, DELIMITER); // first token
if (token[0]) // zero if line is blank
{
for (n = 1; n < MAX_TOKENS_PER_LINE; n++)
{
token[n] = strtok(0, DELIMITER); // subsequent tokens
if (!token[n]) break; // no more tokens
}
}
// process (print) the tokens
for (int i = 0; i < n; i++) // n = #of tokens
cout << "Token[" << i << "] = " << token[i] << endl;
cout << endl;
}
system("pause");
return 0;
}
so the output should be like this:
Token[1] = This
Token[1] = course
Token[1] = provides
Token[1] = detailed
Token[1] = coverage
Token[1] = of
Token[1] = the
Token[1] = concepts
Token[1] = and
Token[1] = syntax
Token[2] = Coverage
Token[2] = includes
Token[2] = inheritance,
Token[2] = overloaded
Token[2] = operators,
Token[2] = overloaded
Token[2]= default
Token[2] = operators,
If you just want to separate words into two containers such that the first contains words from the first line, and the second contains words from the second line, you can use vectors to store them and string streams to extract words from a line of text:
#include <sstream>
#include <string>
#include <vector>
#include <fstream>
using namespace std;
int main()
{
ifstream infile("test.txt");
string line;
string word;
vector< vector<string> > tokens(2);
for (int ix = 0; ix < 2; ++ix)
{
getline(infile, line);
istringstream iss(line);
while(iss >> word)
tokens[ix].push_back(word);
}
}
Here, tokens[0] is a vector containing words from the first line, and tokens[1] contains words from the second line.
You can create a vector of vectors as
vector<vector<string>> VEC;
Keep adding words to VEC[0] for the first line. Each time you reach the end of a line, increment the index so that the following words go to VEC[1], and so on.

splitting int from a string

I have a string with a number of ints separated by spaces. Can someone help me split the string into ints? I tried using find and then substr. Is there a better way to do it?
Use a stringstream:
#include <string>
#include <sstream>
int main() {
std::string s = "100 123 42";
std::istringstream is( s );
int n;
while( is >> n ) {
// do something with n
}
}
This has been discussed as part of Split a string in C++?
Also, you can use the Boost library's split function to do the splitting without a loop in your program.
E.g.:
boost::split(epoch_vector, epoch_string, boost::is_any_of(","));
A version using boost. The stringstream version from Neil is so much simpler!
#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>
#include <boost/lexical_cast.hpp>
#include <boost/tokenizer.hpp>
int main()
{
const std::string str( "20 30 40 50" );
std::vector<int> numbers;
boost::tokenizer<> tok(str);
std::transform( tok.begin(), tok.end(), std::back_inserter(numbers),
&boost::lexical_cast<int,std::string> );
// print them
std::copy( numbers.begin(), numbers.end(), std::ostream_iterator<int>(std::cout,"\n") );
}
I had some trouble when reading and converting more than one string (I found I had to clear the string stream). Here is a test I made with multiple int/string conversions, reading from and writing to an I/O file.
#include <iostream>
#include <fstream> // for the file i/o
#include <string> // for the string class work
#include <sstream> // for the string stream class work
using namespace std;
int main(int argc, char *argv[])
{
// Aux variables:
int aData[3];
string sData;
stringstream ss;
// Creation of the i/o file:
// ...
// Check for file open correctly:
// ...
// Write initial data on file:
for (unsigned i=0; i<6; ++i)
{
aData[0] = 1*i;
aData[1] = 2*i;
aData[2] = 3*i;
ss.str(""); // Empty the string stream
ss.clear();
ss << aData[0] << ' ' << aData[1] << ' ' << aData[2];
sData = ss.str(); // number-to-string conversion done
my_file << sData << endl;
}
// Simultaneous read and write:
for (unsigned i=0; i<6; ++i)
{
// Read string line from the file:
my_file.seekg(0, ios::beg);
getline (my_file, sData); // reads from start of file
// Convert data:
ss.str(""); // Empty the string stream
ss.clear();
ss << sData;
for (unsigned j = 0; j<3; ++j)
if (ss >> aData[j]) // string-to-num conversion done
;
// Write data to file:
my_file.seekp(0, ios::end);
my_file << 100+aData[0] << ' '; // appends at the end of stream.
my_file << 100+aData[1] << ' ';
my_file << 100+aData[2] << endl;
}
// R/W complete.
// End work on file:
my_file.close();
cout << "Bye, world! \n";
return 0;
}