C++ - Remove or skip quote characters when reading a file line with a tokenizer

I have a csv file that has records like:
837478739*"EP"1"3FB2B464BD5003B55CA6065E8E040A2A"*"F"*21*15*"NH"*"N"0*-1*"-1"*0*0**-1*223944*-1*"23"1"-1""-1""78909""-1""-1""-1""-1""-1""-1""-1""-1""-1""-1""-1""-1""-1""74425""26""-1"*"-1"*1*1*69*23.58*0*0*0*0*"MC"
The file has lots of records, so I need a fast way to break down each line and push_back each of its parts into a vector. The main reason I chose tokenizer is that I have heard a lot about its performance. I have a function:
void break(){
//using namespace boost;
string s = "This is a , test '' file";
boost::tokenizer<> tok(s);
vector<string> line;
for(boost::tokenizer<>::iterator beg=tok.begin();beg!=tok.end();++beg){
line.push_back(*beg);
}
cout << line[3] << " and " << line[5] << endl;
}
That way I can get each part of the sentence and ignore everything that is not a letter. Can the tokenizer read the records I have, split them on the "*" delimiter, and remove the quotes from each string? There won't be any special characters between the quotes; I just need to remove the quote marks. I read the tokenizer documentation but could not find an answer.

You need to give your tokenizer a different TokenizerFunction so that it parses the string differently; the default one splits on whitespace and punctuation.
http://www.boost.org/doc/libs/1_37_0/libs/tokenizer/tokenizerfunction.htm

You can use regex_replace.
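A minimal sketch of the regex_replace idea, using only the standard library rather than Boost: each record is split on '*' with std::getline, and std::regex_replace then strips the quote marks from every field. The function name split_record is hypothetical, and the sketch assumes, as the question states, that no '*' ever appears between quotes.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits one record on '*' and removes every
// double-quote character from each field.
// Assumption: '*' never occurs inside a quoted value.
std::vector<std::string> split_record(const std::string& record) {
    static const std::regex quote("\"");   // matches a single quote mark
    std::vector<std::string> fields;
    std::istringstream in(record);
    std::string field;
    while (std::getline(in, field, '*')) {
        // Replace each quote mark with nothing, i.e. delete it.
        fields.push_back(std::regex_replace(field, quote, ""));
    }
    return fields;
}
```

Empty fields between two consecutive '*' characters are kept as empty strings. If speed matters more than brevity, the regex call could probably be swapped for the erase-remove idiom with std::remove, which avoids the regex engine entirely.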
"break" is keyword. You shouldn't use it for function name.

Related

How to remove duplicate phrases that are separated by being inside double quotes or separated by a comma in a file with c++

I use this function to remove duplicate words in a file, but I need it to remove duplicate expressions instead.
For example, this is what the function currently does. If I have the expressions
"Hello World"
"beautiful world"
the function will remove the word "world" from both expressions.
I need the function to remove an entire expression, and only if it is found more than once in the file.
For example, if I have the expressions
"Hello World"
"Hello World"
"beautiful world"
"beautiful world"
the function should remove the duplicates of "Hello World" and "beautiful world", leaving only one copy of each, but it should not touch the word "world", because it should treat everything inside the quotes as a single word.
This is the code I use now:
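#include <fstream>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
#include <unordered_set>
using namespace std;
using ist = istreambuf_iterator<char>; // alias used in main() below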
void Remove_Duplicate_Words(string str)
{
ofstream Write_to_file{ "test.txt" };
// Used to split string around spaces.
istringstream ss(str);
// To store individual visited words
unordered_set<string> hsh;
// Traverse through all words
do
{
string word;
ss >> word;
// If current word is not seen before.
if (hsh.find(word) == hsh.end()) {
cout << word << '\n';
Write_to_file << word << endl; // write to outfile
hsh.insert(word);
}
} while (ss);
}
int main()
{
ifstream Read_from_file{ "test.txt" };
string file_content{ ist {Read_from_file}, ist{} };
Remove_Duplicate_Words(file_content);
return 0;
}
How do I remove duplicate expressions instead of duplicate words?
Unfortunately my knowledge on this subject is very basic and usually what I do is try all kinds of things until I succeed. I tried to do it here too and I just can not figure out how to do it
Any help would be greatly appreciated
This requires a little bit of string parsing.
Your example works by reading tokens, which are similar to words (but not exactly). For your problem, the token becomes word OR quoted string. The more complex your definition of tokens, the harder the problem becomes. Try starting by thinking of tokens as either words or quoted strings on the same line. A quoted string across lines might be a little more complex.
Here's a similar SO question to get you started: Reading quoted string in c++. You need to do something similar, but instead of having set positions, your quoted string can occur anywhere in the line. So you read tokens something like this:
Read next word token (as you're doing now)
If last read token is quote character ("), read till next (") as a single token
Check on the set and output token only if it isn't already there (if token is quoted, don't forget to output the quotes)
Insert token into set.
Repeat till EOF
Hope that helps
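The steps above can be sketched roughly as follows. This is one possible reading, not the only one: the helper names next_token and unique_tokens are hypothetical, a quoted phrase is assumed to contain no escaped quotes, and the comparison is case-sensitive, so "Hello World" and "Hello world" count as different expressions.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <unordered_set>
#include <vector>

// Reads the next token: either a bare word or a whole "quoted phrase".
// The quotes are kept so the phrase can be written back unchanged.
bool next_token(std::istream& in, std::string& token) {
    in >> std::ws;                          // skip leading whitespace
    if (in.peek() == '"') {
        in.get();                           // consume the opening quote
        std::string body;
        if (!std::getline(in, body, '"'))   // read up to the closing quote
            return false;
        token = '"' + body + '"';
        return true;
    }
    return static_cast<bool>(in >> token);  // ordinary word
}

// Returns each distinct token once, in order of first appearance.
std::vector<std::string> unique_tokens(const std::string& text) {
    std::istringstream in(text);
    std::unordered_set<std::string> seen;
    std::vector<std::string> out;
    std::string token;
    while (next_token(in, token)) {
        if (seen.insert(token).second)      // true only for a new token
            out.push_back(token);
    }
    return out;
}
```

Writing the result back is then a matter of streaming each element of the returned vector to the output file, one per line.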

C++ retrieve numerical values in a line of string

Here is the content of the txt file that I've managed to read:
X-axis=0-9
y-axis=0-9
location.txt
temp.txt
I'm not sure whether it's possible, but after reading the contents of this txt file I'm trying to store just the x-axis and y-axis ranges in two variables so that I can use them in later functions. Any suggestions? Do I need to use vectors? Here is the code that reads the file:
string configName;
ifstream inFile;
do {
cout << "Please enter config filename: ";
cin >> configName;
inFile.open(configName);
if (inFile.fail()){
cerr << "Error finding file, please re-enter again." << endl;
}
} while (inFile.fail());
string content;
string tempStr;
while (getline(inFile, content)){
if (content.size() >= 2 && content[0] == '/' && content[1] == '/') continue; // skip "//" comment lines
cout << endl << content << endl;
It depends on the format of your file. If you are sure the format will never change, you can read the file character by character and match known patterns, for example
if (tempstr == "y-axis=")
and then convert the relevant substring to an integer with a function such as
std::stoi
and store it.
I'm going to assume you already have the whole contents of the .txt file in a single string somewhere. In that case, your next task should be to split the string. Personally, yes, I would recommend using vectors. Say you wanted to split that string by newlines. A function like this:
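#include <string>
#include <vector>
std::vector<std::string> split(const std::string& str)
{
std::vector<std::string> ret;
std::string::size_type cur_pos = 0;
std::string::size_type next_delim = str.find('\n');
while (next_delim != std::string::npos) {
ret.push_back(str.substr(cur_pos, next_delim - cur_pos));
cur_pos = next_delim + 1;
next_delim = str.find('\n', cur_pos);
}
// Keep the text after the last newline, which the loop above never reaches.
if (cur_pos < str.size()) ret.push_back(str.substr(cur_pos));
return ret;
}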
will split an input string by newlines. From there, you can begin parsing the strings in that vector. The key functions you'll want to look at are std::string's substr() and find() methods. A quick search should get you to the relevant documentation, but here it is, just in case:
http://www.cplusplus.com/reference/string/string/substr/
http://www.cplusplus.com/reference/string/string/find/
Now, say you have the string "X-axis=0-9" in vec[0]. Then, what you can do is do a find for = and then get the substrings before and after that index. The stuff before will be "X-axis" and the stuff after will be "0-9". This will allow you to figure that the "0-9" should be ascribed to whatever "X-axis" is. From there, I think you can figure it out, but I hope this gives you a good idea as to where to start!
std::string::find() can be used to search for a character in a string;
std::string::substr() can be used to extract part of a string into another new sub-string;
std::atoi() can be used to convert a string into an integer.
So then, these three functions will allow you to do some processing on content, specifically: (1) search content for the start/stop delimiters of the first value (= and -) and the second value (- and string::npos), (2) extract them into temporary sub-strings, and then (3) convert the sub-strings to ints. Which is what you want.
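As a rough sketch of those three steps applied to a line such as "X-axis=0-9": the function name parse_range is hypothetical, std::stoi is used in place of std::atoi so that malformed input throws instead of silently yielding 0, and both values are assumed to be non-negative (a leading minus sign would be mistaken for the separator).

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical helper: parses "name=low-high" and returns {low, high}.
// Assumption: both numbers are non-negative integers.
std::pair<int, int> parse_range(const std::string& line) {
    const std::string::size_type eq = line.find('=');
    if (eq == std::string::npos)
        throw std::invalid_argument("missing '=' in: " + line);

    // Search for '-' only after '=', because the key itself
    // ("X-axis") also contains a dash.
    const std::string::size_type dash = line.find('-', eq + 1);
    if (dash == std::string::npos)
        throw std::invalid_argument("missing '-' in: " + line);

    const int low  = std::stoi(line.substr(eq + 1, dash - eq - 1));
    const int high = std::stoi(line.substr(dash + 1));
    return {low, high};
}
```

Calling parse_range(content) inside the getline loop, for the lines that start with "X-axis" or "y-axis", gives the two pairs of ints directly, so no vector is strictly needed for this part.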

ifstream get line change output from char to string

I have a text file that I read like this:
char data[50];
readFile.open(filename.c_str());
while(readFile.good())
{
readFile.getline(data,50,',');
cout << data << endl;
}
My question is: instead of creating a char array of size 50 named data, can I have getline read into a string instead, something like
string myData;
readFile.getline(myData,',');
My text file is something like this
Line2D, [3,2]
Line3D, [7,2,3]
I tried that and the compiler says:
no matching function for getline(std::string&,char)
So is it possible to still split on a delimiter but assign the value to a string instead of a char array?
Updates:
Using
while (std::getline(readFile, line))
{
std::cout << line << std::endl;
}
it reads line by line, but I want to split the string on a delimiter. Originally, with the char array, I specified the delimiter as the third argument:
readFile.getline(data,50,',');
How do I do the same with a string, splitting each line on the comma delimiter as above?
Use std::getline():
std::string line;
while (std::getline(readFile, line, ','))
{
std::cout << line << std::endl;
}
Always check the result of a read operation immediately; otherwise the code will attempt to process the result of a failed read, as the posted code does.
Although getline() accepts a different delimiter, using ',' here could mistakenly process two invalid lines as a single valid line. I recommend reading each line in full and then splitting it. A useful utility for splitting lines is boost::split().
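A small sketch of the "read each line in full, then split it" approach, using std::getline on a std::istringstream instead of boost::split() so that it has no dependency outside the standard library. The names split_fields and read_records are hypothetical.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits one line into fields on the given delimiter.
std::vector<std::string> split_fields(const std::string& line, char delim) {
    std::vector<std::string> fields;
    std::istringstream in(line);
    std::string field;
    while (std::getline(in, field, delim))
        fields.push_back(field);
    return fields;
}

// Reads the stream line by line and splits each complete line, so a
// field can never run across a line boundary.
std::vector<std::vector<std::string>> read_records(std::istream& in, char delim) {
    std::vector<std::vector<std::string>> records;
    std::string line;
    while (std::getline(in, line))
        records.push_back(split_fields(line, delim));
    return records;
}
```

With the sample file, the commas inside the brackets are split as well, and the space after the first comma is kept, so further trimming may be needed depending on what the fields are used for.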

How to read a file and get words in C++

I am curious as to how I would go about reading the input from a text file with no set structure (Such as notes or a small report) word by word.
The text for example might be structured like this:
"06/05/1992
Today is a good day;
The worm has turned and the battle was won."
I was thinking maybe getting the line using getline, and then seeing if I can split it into words via whitespace from there. Then I thought using strtok might work! However I don't think that will work with the punctuation.
Another method I was thinking of was getting everything char by char and omitting the characters that were undesired. Yet that one seems unlikely.
So, to cut a long story short:
Is there an easy way to read an input from a file and split it into words?
Since it's easier to write than to find the duplicate question,
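#include <cstddef>
#include <iostream>
#include <iterator>
#include <string>
std::istream_iterator<std::string> word_iter( my_file_stream ), word_iter_end;
std::size_t wordcnt = 0; // must be initialized before it is printed
for ( ; word_iter != word_iter_end; ++ word_iter, ++ wordcnt ) {
std::cout << "word " << wordcnt << ": " << * word_iter << '\n';
}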
The std::string argument to istream_iterator tells it to return a string when you do *word_iter. Every time the iterator is incremented, it grabs another word from its stream.
If you have multiple iterators on the same stream at the same time, you can choose between data types to extract. However, in that case it may be easier just to use >> directly. The advantage of an iterator is that it can plug into the generic functions in <algorithm>.
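To illustrate the point about plugging into generic code, here is a brief sketch in which a pair of istream_iterators feeds a vector's range constructor, and the result is then passed to std::count from <algorithm>. The function names read_words and count_word are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Collects every whitespace-separated word from the stream.
std::vector<std::string> read_words(std::istream& in) {
    std::istream_iterator<std::string> first(in), last;  // 'last' is the end-of-stream iterator
    return std::vector<std::string>(first, last);
}

// Counts exact (case-sensitive) occurrences of one word.
std::size_t count_word(const std::vector<std::string>& words,
                       const std::string& target) {
    return static_cast<std::size_t>(
        std::count(words.begin(), words.end(), target));
}
```

Any std::istream works here, so the same functions can be tried on a std::istringstream before pointing them at a file stream.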
Yes. You're looking for std::istream::operator>> :) Note that it will remove consecutive whitespace but I doubt that's a problem here.
i.e.
std::ifstream file("filename");
std::vector<std::string> words;
std::string currentWord;
while(file >> currentWord)
words.push_back(currentWord);
You can use getline with a space character, getline(buffer,1000,' ');
Or perhaps you can use this function to split a string into several parts, with a certain delimiter:
string StrPart(string s, char sep, int i) {
string out="";
int n=0, c=0;
for (c=0;c<(int)s.length();c++) {
if (s[c]==sep) {
n+=1;
} else {
if (n==i) out+=s[c];
}
}
return out;
}
Notes: This function assumes that you have declared using namespace std;.
s is the string to be split.
sep is the delimiter
i is the part to get (0 based).
You can use the scanner technique to grab words, numbers, dates, etc. It is very simple and flexible. The scanner normally returns tokens (word, number, real, keyword, etc.) to a parser.
If you later intend to interpret the words, I would recommend this approach.
I can warmly recommend the book "Writing Compilers and Interpreters" by Ronald Mak (Wiley Computer Publishing)

How to read a word into a string ignoring a certain character

I am reading a text file which contains a word with a punctuation mark on it and I would like to read this word into a string without the punctuation marks.
For example, a word may be " Hello, "
I would like the string to get " Hello " (without the comma). How can I do that in C++ using only the ifstream libraries?
Can I use the ignore function to ignore the last character?
Thank you in advance.
Try ifstream::get(Ch* p, streamsize n, Ch term).
An example:
char buffer[64];
std::cin.get(buffer, 64, ',');
// reads up to 63 characters (the 64th slot holds the terminating '\0'),
// stopping early if a ',' is found; the ',' itself is left in the stream
// For the string "Hello," it would stream in "Hello"
If you need to be more robust than simply a comma, you'll need to post-process the string. The steps might be:
Read the stream into a string
Use string::find_first_of() to help "chunk" the words
Return the word as appropriate.
If I've misunderstood your question, please feel free to elaborate!
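The post-processing steps above could look something like this sketch, which works on text already read into a std::string. The function name chunk_words and the default set of delimiter characters are assumptions chosen for illustration; the set would need adjusting to whatever punctuation the file actually contains.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits text into words, treating every character
// in 'delims' (whitespace and common punctuation by default) as a separator.
std::vector<std::string> chunk_words(const std::string& text,
                                     const std::string& delims = " \t\r\n,.;:!?\"") {
    std::vector<std::string> words;
    // Find the first character that is not a delimiter.
    std::string::size_type start = text.find_first_not_of(delims);
    while (start != std::string::npos) {
        // The word ends at the next delimiter (or at the end of the text).
        const std::string::size_type end = text.find_first_of(delims, start);
        words.push_back(text.substr(start, end - start));
        // Skip the run of delimiters that follows the word.
        start = text.find_first_not_of(delims, end);
    }
    return words;
}
```

For the example in the question, the input " Hello, " yields the single word Hello with neither the comma nor the surrounding spaces.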
If you only want to ignore , then you can use getline.
const int MAX_LEN = 128;
ifstream file("data.txt");
char buffer[MAX_LEN];
while(file.getline(buffer,MAX_LEN,','))
{
cout<<buffer;
}
EDIT: This uses std::string and does away with MAX_LEN
ifstream file("data.txt");
string string_buffer;
while(getline(file,string_buffer,','))
{
cout<<string_buffer;
}
One way would be to use the Boost String Algorithms library. There are several "replace" functions that can be used to replace (or remove) specific characters or strings in strings.
You can also use the Boost Tokenizer library for splitting the string into words after you have removed the punctuation marks.