I want to read data from a stream in a specific format, such as:
"number:name_that_can_contain_spaces:string,string,string..." (without the quotes), where ... means that I don't know how many comma-separated strings there are. The strings can have spaces before and after them but not in the middle of the string, and I want to stop reading at the newline.
So far I have only come up with using getline() to store each line in a string, but I don't know how to continue: is there something like strtok(line, ":",":",",","\n") that would parse it for me, or do I have to parse it myself character by character?
An example of a valid line is:
54485965:abc abc abc: some string, next string , third string\n
The parsed result would be:
int 54485965
string "abc abc abc"
string "some string"
string "next string"
string "third string"
You can read a line with std::getline and then split it with std::string::find and std::string::substr. The code below reads each line from the file data, finds the first : (everything before it is the number, which is parsed into an int with std::stoi) and throws away that first part. The name is handled the same way. Finally, the strings separated by , are stored in a std::list. Note that each piece is stored as-is, including any surrounding spaces.
#include <iostream>
#include <fstream>
#include <string>
#include <list>
#include <exception>
#include <stdexcept>
struct entry {
std::string name;
int number;
std::list<std::string> others;
};
int main(int argc, char** argv) {
std::ifstream input("data");
std::list<entry> list;
std::string line;
while(std::getline(input, line)) {
entry e;
std::string::size_type i = 0;
/* get number from line */
i = line.find(":");
if(i != std::string::npos) {
e.number = std::stoi(line.substr(0, i)); // throws if the text is not a number
line = line.substr(i + 1);
} else {
throw std::runtime_error("error reading file");
}
/* get name from line */
i = line.find(":");
if(i != std::string::npos) {
e.name = line.substr(0, i);
line = line.substr(i + 1);
} else {
throw std::runtime_error("error reading file");
}
/* get other strings */
do {
i = line.find(",");
e.others.push_back(line.substr(0, i));
line = line.substr(i + 1);
} while(i != std::string::npos);
list.push_back(e);
}
/* output data */
for(entry& e : list) {
std::cout << "name: " << e.name << std::endl;
std::cout << "number: " << e.number << std::endl;
std::cout << "others: ";
for(std::string& s : e.others) {
std::cout << s << ",";
}
std::cout << std::endl;
}
return 0;
}
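The loop above stores each piece exactly as it appears, so " some string" keeps its leading space. If trimmed strings are wanted, as in the question, one possible sketch is shown below. The names trim and split_trimmed are hypothetical helpers, not part of the answer's code, and only spaces and tabs are assumed to count as padding.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: strip leading and trailing spaces/tabs.
std::string trim(const std::string& s) {
    const auto first = s.find_first_not_of(" \t");
    if (first == std::string::npos) return "";   // nothing but padding
    const auto last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Hypothetical helper: split the tail of a line on ',' and trim each piece.
std::vector<std::string> split_trimmed(const std::string& tail) {
    std::vector<std::string> out;
    std::istringstream in(tail);
    std::string piece;
    while (std::getline(in, piece, ',')) {
        out.push_back(trim(piece));
    }
    return out;
}
```

Calling split_trimmed on the part of the line after the second : would then fill the list of strings without the surrounding spaces.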
I am trying to parse a large text file and split it up into single words using strtok. The delimiters remove all special characters, whitespace, and new lines. For some reason when I printf() it, it only prints the first word and a bunch of (null) for the rest.
ifstream textstream(textFile);
string textLine;
while (getline(textstream, textLine))
{
struct_ptr->numOfCharsProcessedFromFile[TESTFILEINDEX] += textLine.length() + 1;
char *line_c = new char[textLine.length() + 1]; // creates a character array the length of the line
strcpy(line_c, textLine.c_str()); // copies the line string into the character array
char *word = strtok(line_c, delimiters); // removes all unwanted characters
while (word != nullptr && wordCount(struct_ptr->dictRootNode, word) > struct_ptr->minNumOfWordsWithAPrefixForPrinting)
{
MyFile << word << ' ' << wordCount(struct_ptr->dictRootNode, word) << '\n'; // writes each word and number of times it appears as a prefix in the tree
word = strtok(NULL, delimiters); // move to next word
printf("%s", word);
}
}
Rather than jumping through the hoops necessary to use strtok, I'd write a little replacement that works directly with strings, without modifying its input, something on this general order:
std::vector<std::string> tokenize(std::string const &input, std::string const &delims = " ") {
std::vector<std::string> ret;
std::string::size_type start = 0; // unsigned type, so it compares correctly with npos
while ((start = input.find_first_not_of(delims, start)) != std::string::npos) {
auto stop = input.find_first_of(delims, start+1);
ret.push_back(input.substr(start, stop-start));
start = stop;
}
return ret;
}
At least to me, this seems to simplify the rest of the code quite a bit:
std::string textLine;
while (std::getline(textStream, textLine)) {
struct_ptr->numOfCharsProcessedFromFile[TESTFILEINDEX] += textLine.length() + 1;
auto words = tokenize(textLine, delims);
for (auto const &word : words) {
MyFile << word << ' ' << wordCount(struct_ptr->dictRootNode, word) << '\n';
std::cout << word << '\n';
}
}
This also avoids (among other things) the massive memory leak you had, allocating memory every iteration of your loop, but never freeing any of it.
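Move the printf two lines up, so that it prints the current word before strtok advances to the next one (which may be a null pointer):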
while (word != nullptr && wordCount(struct_ptr->dictRootNode, word) > struct_ptr->minNumOfWordsWithAPrefixForPrinting)
{
printf("%s", word);
MyFile << word << ' ' << wordCount(struct_ptr->dictRootNode, word) << '\n'; // writes each word and number of times it appears as a prefix in the tree
word = strtok(NULL, delimiters); // move to next word
}
As @j23 pointed out, your printf is in the wrong place.
As @Jerry-Coffin points out, there are more modern, idiomatic C++ ways to accomplish what you are trying to do. Besides avoiding mutation, you can also avoid copying the words out of the text string. (The code below reads line by line, but if you know your whole text fits into memory, you could just as well read the whole content into a std::string.)
Using std::string_view avoids the extra copies, since it is essentially just a pointer into your string plus a length.
Here is how it looks for a use case where you do not need to store the words in another data structure, i.e. some kind of one-pass processing of the words:
#include <iostream>
#include <fstream>
#include <string>
#include <string_view>
#include <cctype>
template <class F>
void with_lines(std::istream& stream, F body) {
for (std::string line; std::getline(stream,line);) {
body(line);
}
}
template <class F>
void with_words(std::istream& stream, F body) {
with_lines(stream,[&body](std::string& line) {
std::string_view line_view{line}; // view over the whole line (the iterator-pair constructor needs C++20)
while (!line_view.empty()) {
// skip whitespaces
for (; !line_view.empty() && std::isspace(static_cast<unsigned char>(line_view[0]));
line_view.remove_prefix(1));
size_t position = 0;
for (; position < line_view.size() &&
!std::isspace(static_cast<unsigned char>(line_view[position]));
position++);
if (position > 0) {
body(line_view.substr(0,position));
line_view.remove_prefix(position);
}
}
});
}
int main (int argc, const char* argv[]) {
size_t word_count = 0;
std::ifstream stream{"input.txt"};
if(!stream) {
std::cerr
<< "could not open file input.txt" << std::endl;
return -1;
}
with_words(stream, [&word_count] (std::string_view word) {
std::cout << word_count << " " << word << std::endl;
word_count++;
});
std::cout
<< "input.txt contains "
<< word_count << " words."
<< std::endl;
return 0;
}
I get a line like: "1001", Name
I want to know how to grab the number in between the quotes without atoi.
The problem asks to make the function just grab the integer that's between two quotes in a string, then grab the name and place it in a string, but I don't understand that part.
Search using the regular expressions:
#include <regex>
#include <iostream>
int main()
{
const std::string s = "\"1001\", John Martin";
std::regex rgx("\"(\\d+)\", *([\\w ]+)"); // this will extract quoted numbers in any string
std::smatch match;
if (std::regex_search(s.begin(), s.end(), match, rgx))
std::cout << "ID: " << match[1] << ", Name: " << match[2] << '\n';
}
Have a look at std::istringstream, eg:
std::string s = "\"1001\", Name";
std::string name;
int num;
std::istringstream iss(s);
iss.ignore();
iss >> num;
iss.ignore();
iss.ignore();
std::getline(iss >> std::ws, name); // std::ws skips the space after the comma
Or
std::string s = "\"1001\", Name";
std::string name;
int num;
std::istringstream iss(s);
iss.ignore(std::numeric_limits<std::streamsize>::max(), '"');
iss >> num;
iss.ignore(std::numeric_limits<std::streamsize>::max(), ',');
std::getline(iss >> std::ws, name); // std::ws skips the space after the comma
Or
std::string s = "\"1001\", Name";
std::string name;
int num;
std::string::size_type start = s.find('"') + 1;
std::string::size_type end = s.find('"', start);
std::string snum = s.substr(start, end - start);
std::istringstream(snum) >> num;
start = s.find(',', end+1) + 1;
start = s.find_first_not_of(' ', start);
name = s.substr(start);
You can also make use of the std::string functions find, find_first_not_of, and substr to parse the information.
You simply work your way down the original string: find the opening quote " and store its index, then find the closing quote and its index. The integer string is the characters in between.
Next, you can use find_first_not_of locating the first character not a ", \t" (comma, space, tab), taking the name as the remainder of the original string.
#include <iostream>
#include <string>
int main (void) {
std::string s = "\"1001\", Name", ssint, ssname;
size_t begin, end;
begin = s.find ("\""); /* locate opening quote */
if (begin == std::string::npos) { /* validate found */
std::cerr << "error: '\"' not found.\n";
return 1;
}
end = s.find ("\"", begin + 1); /* locate closing quote */
if (end == std::string::npos) { /* validate found */
std::cerr << "error: closing '\"' not found.\n";
return 1;
}
ssint = s.substr (begin + 1, end - begin - 1); /* int is chars between (2nd arg is a count) */
begin = s.find_first_not_of (", \t", end + 1); /* find not , space tab */
if (begin == std::string::npos) { /* validate found */
std::cerr << "error: no non-excluded characters found.\n";
return 1;
}
ssname = s.substr (begin); /* name is remaining chars */
std::cout << "int : " << ssint << "\nname: " << ssname << '\n';
}
(note: always validate the results of find and find_first_not_of by ensuring the return was not std::string::npos)
Example Use/Output
$ ./bin/parse_str
int : 1001
name: Name
You can find details on all of the string library member functions at cppreference - std::basic_string. Let me know if you have any questions.
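The example above leaves the number as text in ssint. Since the question asks for an integer without atoi, one possible sketch is to convert the digits with std::from_chars (C++17, from <charconv>). parse_record is a hypothetical helper name, and the input is assumed to look like the line in the question: a quoted number, a comma, then the name.

```cpp
#include <charconv>
#include <string>
#include <system_error>

// Hypothetical helper: parse a line such as  "1001", Name
// Returns false if the quotes, the digits, or the name are missing.
bool parse_record(const std::string& s, int& number, std::string& name) {
    const auto open = s.find('"');
    if (open == std::string::npos) return false;
    const auto close = s.find('"', open + 1);
    if (close == std::string::npos) return false;

    // Convert exactly the characters between the quotes.
    const char* first = s.data() + open + 1;
    const char* last = s.data() + close;
    const auto result = std::from_chars(first, last, number);
    if (result.ec != std::errc() || result.ptr != last) return false;

    // The name starts at the first character that is not ',', space, or tab.
    const auto start = s.find_first_not_of(", \t", close + 1);
    if (start == std::string::npos) return false;
    name = s.substr(start);
    return true;
}
```

Unlike atoi, this reports failure when the quoted text is empty or contains anything other than an integer.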
I'm having difficulty creating a function that reverses the order of the words in a sentence. I've read about many functions that recursively reverse the letters, and I have done that successfully, but I do not want to reverse the letters in the words. I want to reverse the placement of the words in the sentence.
Example would be:
This is a sentence.
sentence. a is This
This is my code so far. How do I go from reversing order of letters of the entire sentence to placement order of words in a sentence?
The output of the current code would provide: !dlroW olleH
void reverse(const std::string str)
{
int length = str.size();
if(length > 0)
{
reverse(str.substr(0,length-1));
std::cout << str[0];
}
}
Edit: Additional question. If this was a char array would the logic be different?
Simplify your logic by using a std::istringstream and a helper function. The program below works for me.
#include <sstream>
#include <iostream>
void reverse(std::istringstream& stream)
{
std::string word;
if ( stream >> word )
{
reverse(stream);
std::cout << word << " ";
}
}
void reverse(const std::string str)
{
std::istringstream stream(str);
reverse(stream);
std::cout << std::endl;
}
int main(int argc, char** argv)
{
if (argc > 1) reverse(argv[1]); // argv[1] is null when no argument is given
return 0;
}
// Pass string which comes after space
// reverse("This is a sentence.")
// reverse("is a sentence.")
// reverse("a sentence.")
// reverse("sentence.")
// will not find space
// start print only word in that function
void reverse(const std::string str)
{
std::string::size_type pos = str.find_first_of(" "); // size_type, so the npos comparison is exact
if (pos == std::string::npos) // exit condition
{
std::string str1 = str.substr(0, pos);
std::cout << str1.c_str() << " " ;
return;
}
reverse(str.substr(pos+1));
std::cout << str.substr(0, pos).c_str() << " ";
}
Simple to understand:
void reverse(const std::string str)
{
std::string::size_type pos = str.find_first_of(" ");
if (pos != std::string::npos) // recurse while there are more words
{
reverse(str.substr(pos + 1));
}
std::cout << str.substr(0, pos).c_str() << " ";
}
std::vector<std::string> splitString(const std::string &s, char delim) {
std::stringstream ss(s);
std::string item;
std::vector<std::string> tokens;
while (getline(ss, item, delim)) {
tokens.push_back(item);
}
return tokens;
}
void reverseString(const std::string& string) {
std::vector<std::string> words = splitString(string, ' ');
auto end = words.rend();
for (auto it = words.rbegin(); it != end; ++it) { // stop before rend(); dereferencing it is undefined
std::cout << *it << std::endl;
}
}
reverseString("This is a sentence.");
You can split the input into words and print them in reverse order, as above.
Or, if you want to keep a recursive structure, just move the cout after the recursive call, like this:
void reverse(const std::string str)
{
std::stringstream ss(str);
std::string firstWord, rest;
if(ss >> firstWord)
{
getline(ss , rest);
reverse(rest);
std::cout << firstWord << " ";
}
}
I am not a C++ programmer, but I would create another array (tempWord[]) to store the individual words.
1. Scan each word and store it in the tempWord array. In your case, the words are separated by spaces, so:
a. get the index of the next space,
b. take the substring up to the index of the next space, and
c. you should end up with {"This", "is", "a", "sentence."}
2. Add them up again in reverse:
a. loop index i from "tempWord.length - 1" down to "0",
b. new String += tempWord[i] + " ";
3. Print out the result.
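In C++ the steps above could be sketched as follows, with a std::vector<std::string> standing in for the tempWord array. reverse_words is a hypothetical name, and words are assumed to be separated by whitespace.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: return the words of `sentence` in reverse order.
std::string reverse_words(const std::string& sentence) {
    // Step 1: collect the words.
    std::istringstream in(sentence);
    std::vector<std::string> words;
    for (std::string word; in >> word;) {
        words.push_back(word);
    }

    // Step 2: join them again, last word first.
    std::string result;
    for (auto it = words.rbegin(); it != words.rend(); ++it) {
        if (!result.empty()) result += ' ';
        result += *it;
    }
    return result;
}
```

For example, reverse_words("This is a sentence.") gives "sentence. a is This".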
Sometimes when you copy code from a document it picks up line numbers and strange quotes. I've written a script to remove those leading numbers, but it is very hard to find a way to remove those strange quotes ‘’“”, so I've included my full code. It reads in a file and writes out a formatted file. But the compiler warns that these quotes are multi-character constants, which I guess means they are not standard ASCII chars. It kind of works, but it's not a great solution. Any help appreciated:
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
string replaceChar(string str, char ch1, char ch2);
// Main
int main(int argc, char *argv[]) {
string line;
fstream stri, stro;
// ifstream in
stri.open(argv[1], ios::in);
if(stri.fail()){
cerr << "File failed to open for input" << endl;
return 1;
}
// ofstream out
stro.open("file_out.txt", ios::out);
if(stro.fail()){
cerr << "File failed to open for output" << endl;
return 1;
}
// Read - Write
//stri.get(c);
getline(stri, line, '\n');
while(!stri.eof()){
// Remove numbers
line.erase(0,3);
//line.replace( line.begin(), line.end(), "‘", "\'" );
//line.replace( line.begin(), line.end(), "’", "\'" );
//line.replace( line.begin(), line.end(), "“", "\'" );
//line.replace( line.begin(), line.end(), "”", "\'" );
line = replaceChar(line, '‘','\'');
line = replaceChar(line, '’','\'');
line = replaceChar(line, '“','\"');
line = replaceChar(line, '”','\"');
stro << line << endl;
getline(stri, line, '\n');
}
// Close files
stri.close();
stro.close();
// Output
cout << "File Edited Ok!";
//cout << count -1 << " characters copied."<< endl;
}
string replaceChar(string str, char ch1, char ch2) {
for (int i = 0; i < str.length(); ++i) {
if (str[i] == ch1)
str[i] = ch2;
}
return str;
}
Ok, it ain't pretty, but it works. If anyone wants to refine the search for those damned strange quote marks, be my guest!
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
// Function Declaration
bool replace(string& str, const string& from, const string& to);
bool checkMyLine(string line);
// Main
int main(int argc, char *argv[]) {
// line to edit
string line;
fstream stri, stro;
// ifstream in
stri.open(argv[1], ios::in);
if(stri.fail()){
cerr << "File failed to open for input" << endl;
return 1;
}
// ofstream out
stro.open("file_out.txt", ios::out);
if(stro.fail()){
cerr << "File failed to open for output" << endl;
return 1;
}
// Read - Write
while(getline(stri, line, '\n')){
// Remove numbers at start of each line followed by space, eg: "001: "
int i;
for(i = 0;i < line.length();i++)
{
if(line[i] == ' ') break;
}
line.erase(0,i+1);
//Replace Odd Chars
// each call replaces one occurrence, so repeat until none is left
while(replace(line, "\u2018","\u0027")) {} // replaces ‘
while(replace(line, "\u2019","\u0027")) {} // replaces ’
while(replace(line, "\u201C","\u0022")) {} // replaces “
while(replace(line, "\u201D","\u0022")) {} // replaces ”
// Write to file
stro << line << endl;
}
// Close files
stri.close();
stro.close();
// Output Message
cout << "File Edited Ok!";
}// End of Main
//
bool replace(string& str, const string& from, const string& to)
{
size_t start_pos = str.find(from);
if(start_pos == string::npos)
return false;
str.replace(start_pos, from.length(), to);
return true;
}
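The replace function above changes only the first match per call, which is why main has to call it repeatedly. A possible refinement, sketched under the assumption that the input file is UTF-8 encoded, is a helper that replaces every match in one pass. replace_all and straighten_quotes are hypothetical names; the byte sequences are the UTF-8 encodings of U+2018, U+2019, U+201C and U+201D.

```cpp
#include <string>

// Hypothetical helper: replace every occurrence of `from` in `str` with `to`.
void replace_all(std::string& str, const std::string& from, const std::string& to) {
    if (from.empty()) return;                 // nothing to search for
    std::string::size_type pos = 0;
    while ((pos = str.find(from, pos)) != std::string::npos) {
        str.replace(pos, from.length(), to);
        pos += to.length();                   // continue after the replacement
    }
}

// Hypothetical helper: turn curly quotes into plain ASCII quotes.
// Assumption: `line` holds UTF-8 text.
std::string straighten_quotes(std::string line) {
    replace_all(line, "\xE2\x80\x98", "'");   // U+2018 left single quote
    replace_all(line, "\xE2\x80\x99", "'");   // U+2019 right single quote
    replace_all(line, "\xE2\x80\x9C", "\"");  // U+201C left double quote
    replace_all(line, "\xE2\x80\x9D", "\"");  // U+201D right double quote
    return line;
}
```

Spelling out the bytes avoids depending on how the compiler encodes non-ASCII characters in narrow string literals.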
What kind of script did you write to remove the leading numbers?
Do you have access to sed or tr? They exist for just this kind of problem.
sed -e 's/[‘’“”]//g'
No need to re-invent the wheel
I'm building a simple interpreter for a language that I'm developing. How can I cout the text that comes after a keyword and is surrounded by "", like this:
#include <iostream>
#include <fstream>
#include <string>
#include <cstdlib>
using namespace std;
int main( int argc, char* argv[] )
{
if(argc != 2)
{
cout << "Error syntax is incorrect!\nSyntax: " << argv[ 0 ] << " <file>\n";
return 0;
}
ifstream file(argv[ 1 ]);
if (!file.good()) {
cout << "File " << argv[1] << " does not exist.\n";
return 0;
}
string linha;
while(!file.eof())
{
getline(file, linha);
if(linha == "print")
{
cout << text after print;
}
}
return 0;
}
And how can I remove the "" when printing the text? Here is the example file:
print "Hello, World"
Read my post in the middle of the answers!
Thanks
I hope this simple example would help.
std::string code = " print \" hi \" ";
std::string::size_type beg = code.find("\"");
std::string::size_type end = code.find("\"", beg+1);
// end-beg-1 = the length of the string between ""
std::cout << code.substr(beg+1, end-beg-1);
This code finds the first occurrence of ", then finds the next occurrence after it. Finally, it extracts the string between the two quotes and prints it.
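The snippet above assumes both quotes are present. A slightly more defensive sketch, which also checks that the line starts with the print keyword from the question, might look like this. print_argument is a hypothetical helper name, and treating any line that begins with the letters print as a print statement is a simplifying assumption.

```cpp
#include <string>

// Hypothetical helper: if `line` looks like  print "text"  store text in `out`.
// Returns false when the keyword or either quote is missing.
bool print_argument(const std::string& line, std::string& out) {
    // Skip leading spaces/tabs, then expect the keyword.
    const auto word = line.find_first_not_of(" \t");
    if (word == std::string::npos || line.compare(word, 5, "print") != 0) {
        return false;
    }
    const auto beg = line.find('"', word + 5);  // opening quote
    if (beg == std::string::npos) return false;
    const auto end = line.find('"', beg + 1);   // closing quote
    if (end == std::string::npos) return false;
    out = line.substr(beg + 1, end - beg - 1);  // characters between the quotes
    return true;
}
```

Inside the read loop, the interpreter could then call print_argument(linha, text) and cout the text only when it returns true.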
I'm assuming what you want is to identify quoted strings in the file, and print them without the quotes. If so, the below snippet should do the trick.
This replaces your while(!file.eof()) loop (looping on getline itself avoids processing a failed final read):
string linha;
while(getline(file, linha))
{
string::size_type idx = linha.find("\""); //find the first quote on the line
while ( idx != string::npos ) {
string::size_type idx_end = linha.find("\"",idx+1); //end of quote
if ( idx_end == string::npos ) break; // unmatched quote: stop scanning this line
string quotes;
quotes.assign(linha,idx,idx_end-idx+1);
// do not print the start and end " strings
cout << "quotes:" << quotes.substr(1,quotes.length()-2) << endl;
//check for another quote on the same line
idx = linha.find("\"",idx_end+1);
}
}
I don't understand your problem. On input of
print "Hello, World"
your test of linha == "print" will never be true, because linha contains the whole line, not just the first word.
Are you looking for help on string processing, i.e. splitting of the input line?
Or are you looking for regular expression help? There are libraries you can use for the latter.