Elegant solution for string parsing - c++

so i got a dozen of strings which i download, example's below which i need to parse.
"Australija 036 AUD 1 4,713831 4,728015 4,742199"
"Vel. Britanija 826 GBP 1 10,300331 10,331325 10,362319"
So my first idea was to count manually where the number i need is (the second one, 4,728015 or 10,331325 in exampels up) and get substring.(52,8)
But then i realized that few of the the strings im parsing has a >9 number in it, so i would need a substring of (51,9) for that case, so i cant do it this way
Second idea was to save all the number like chars in a vector, and then get vector[4] and save it into a seperate variable.
And third one is to just loop the string until i position myself after the 5th group of spaces and then substring it.
Just looking for some feedback on what would be "best".

The problem
is that we can have multiple words at the beginning of the string. I.e. the first element may contain spaces.
The solution
Start from the end of the string where we are stable.
Split the string up at the spaces. Start counting from the end, and pick the previous-last element.
Solution 1: Boost string algorithms
#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>
using namespace std;
using namespace boost;
string extractstring(string & fullstring)
{
vector<string> vs;
split(vs, fullstring);
return vs[vs.size() - 2];
}
Solution 2: QString (from Qt framework)
#include <QString>
QString extractstring(QString & fullstring)
{
QStringlist sl = fullstring.split(" ");
return sl[vs.size() - 2];
}
Solution 3: STL only
#include <iostream>
#include <string>
#include <sstream>
#include <algorithm>
#include <iterator>
using namespace std;
string extractstring(string & fullstring)
{
istringstream iss(fullstring);
vector<string> elements;
copy(istream_iterator<string>(iss),
istream_iterator<string>(),
back_inserter(elements));
return elements[elements.size() - 2];
}
Other solutions: regex, C-pointer acrobatic.
Update:
I would not use sscanf based solutions because it may be difficult to identify multiple words at the beginning of the string.

I believe you can do it with a single line using sscanf?
http://www.cplusplus.com/reference/cstdio/sscanf/
For example (http://ideone.com/e2cCT9):
char *str = "Australija 4,713831 4,728015 4,742199";
char tmp[255];
int a,b,c,d;
sscanf(str, "%*[^0-9] %d,%d %d,%d", &a, &b, &c, &d);
printf("Parsed values: %d %d %d %d\n",a,b,c,d);

The hurdle is that the first field is allowed to have spaces, but the remaining fields are separated by spaces.
This may not be elegant, but the concept should work:
std::string text_line;
getline(my_file, text_line);
std::string::size_type field_1_start;
const unsigned int text_length = text_line.length();
for (field_1_start = 0; field_1_start < text_length; ++field_1_start)
{
if (is_digit(text_line[field_1_start])
{
break;
}
}
if (field_1_start < text_length)
{
std::string remaining_text = text_line.substr(field_1_start, text_length - field_1_start);
std::istringstream input_data(remaining_text);
int field_1;
std::string field2;
input_data >> field_1;
input_data >> field_2;
//...
}

Related

Extracting integers from a string which has no space

so i'm currently trying to extract integers from a string. This is what i've done so far
#include <iostream>
#include <string>
#include <vector>
#include <sstream>
using namespace std;
int main() {
string s="VA 12 RC13 PO 44";
stringstream iss;
iss << s;
string temp;
vector<int> a1;
int j;
while (!iss.eof()) {
iss >> temp;
if (stringstream(temp)>>j) {
a1.push_back(j);
}
temp="";
}
}
Now, this works fine, but if i change the string to something like s="VA12RC13PO44", aka with no spaces this code doesn't work. Does anyone know how to go about solving this?
Thanks
There are really many many possible solutions. It depends on how advanced you are. Personally I would always use a std::regex approach for such tasks, but this is maybe too complicated.
A simple handcrafted solution could be this:
#include <iostream>
#include <string>
#include <vector>
#include <cctype>
int main() {
std::string s = "VA 12 RC13 PO 44";
std::vector<int> a;
// We want to iterate over all characters in the string
for (size_t i = 0; i < s.size();) {
// Basically, we want to continue with the next character in the next loop run
size_t increments{1};
// If the current character is a digit, then we convert this and the following characters to an int
if (std::isdigit(s[i])) {
// The stoi function will inform us, how many characters ahve been converted.
a.emplace_back(std::stoi(s.substr(i), &increments));
}
// Either +1 or plus the number of converted characters
i += increments;
}
return 0;
}
So, here we check character for character of the string. If we find a digit, then we build a substring of the string, starting at the current character position and hand it of to std::stoi for conversion.
std::stoi will convert all characters it can get an convert it to an int. If there is a non-digit character, it stops conversion and informs on how many characters have been converted. We add this value to the current position of the evaluated character, to avoid to convert the digits from the same integer substring over and over again.
At the end, we put the resulting integer into the vector. We use emplace_back to avoid unnecessary temporary value.
This will of course work with or without blanks.

How to delete part of a string c++ [duplicate]

I got a string and I want to remove all the punctuations from it. How do I do that? I did some research and found that people use the ispunct() function (I tried that), but I cant seem to get it to work in my code. Anyone got any ideas?
#include <string>
int main() {
string text = "this. is my string. it's here."
if (ispunct(text))
text.erase();
return 0;
}
Using algorithm remove_copy_if :-
string text,result;
std::remove_copy_if(text.begin(), text.end(),
std::back_inserter(result), //Store output
std::ptr_fun<int, int>(&std::ispunct)
);
POW already has a good answer if you need the result as a new string. This answer is how to handle it if you want an in-place update.
The first part of the recipe is std::remove_if, which can remove the punctuation efficiently, packing all the non-punctuation as it goes.
std::remove_if (text.begin (), text.end (), ispunct)
Unfortunately, std::remove_if doesn't shrink the string to the new size. It can't because it has no access to the container itself. Therefore, there's junk characters left in the string after the packed result.
To handle this, std::remove_if returns an iterator that indicates the part of the string that's still needed. This can be used with strings erase method, leading to the following idiom...
text.erase (std::remove_if (text.begin (), text.end (), ispunct), text.end ());
I call this an idiom because it's a common technique that works in many situations. Other types than string provide suitable erase methods, and std::remove (and probably some other algorithm library functions I've forgotten for the moment) take this approach of closing the gaps for items they remove, but leaving the container-resizing to the caller.
#include <string>
#include <iostream>
#include <cctype>
int main() {
std::string text = "this. is my string. it's here.";
for (int i = 0, len = text.size(); i < len; i++)
{
if (ispunct(text[i]))
{
text.erase(i--, 1);
len = text.size();
}
}
std::cout << text;
return 0;
}
Output
this is my string its here
When you delete a character, the size of the string changes. It has to be updated whenever deletion occurs. And, you deleted the current character, so the next character becomes the current character. If you don't decrement the loop counter, the character next to the punctuation character will not be checked.
ispunct takes a char value not a string.
you can do like
for (auto c : string)
if (ispunct(c)) text.erase(text.find_first_of(c));
This will work but it is a slow algorithm.
Pretty good answer by Steve314.
I would like to add a small change :
text.erase (std::remove_if (text.begin (), text.end (), ::ispunct), text.end ());
Adding the :: before the function ispunct takes care of overloading .
The problem here is that ispunct() takes one argument being a character, while you are trying to send a string. You should loop over the elements of the string and erase each character if it is a punctuation like here:
for(size_t i = 0; i<text.length(); ++i)
if(ispunct(text[i]))
text.erase(i--, 1);
#include <iostream>
#include <string>
#include <algorithm>
using namespace std;
int main() {
string str = "this. is my string. it's here.";
transform(str.begin(), str.end(), str.begin(), [](char ch)
{
if( ispunct(ch) )
return '\0';
return ch;
});
}
#include <iostream>
#include <string>
using namespace std;
int main()
{
string s;//string is defined here.
cout << "Please enter a string with punctuation's: " << endl;//Asking for users input
getline(cin, s);//reads in a single string one line at a time
/* ERROR Check: The loop didn't run at first because a semi-colon was placed at the end
of the statement. Remember not to add it for loops. */
for(auto &c : s) //loop checks every character
{
if (ispunct(c)) //to see if its a punctuation
{
c=' '; //if so it replaces it with a blank space.(delete)
}
}
cout << s << endl;
system("pause");
return 0;
}
Another way you could do this would be as follows:
#include <ctype.h> //needed for ispunct()
string onlyLetters(string str){
string retStr = "";
for(int i = 0; i < str.length(); i++){
if(!ispunct(str[i])){
retStr += str[i];
}
}
return retStr;
This ends up creating a new string instead of actually erasing the characters from the old string, but it is a little easier to wrap your head around than using some of the more complex built in functions.
I tried to apply #Steve314's answer but couldn't get it to work until I came across this note here on cppreference.com:
Notes
Like all other functions from <cctype>, the behavior of std::ispunct
is undefined if the argument's value is neither representable as
unsigned char nor equal to EOF. To use these functions safely with
plain chars (or signed chars), the argument should first be converted
to unsigned char.
By studying the example it provides, I am able to make it work like this:
#include <string>
#include <iostream>
#include <cctype>
#include <algorithm>
int main()
{
std::string text = "this. is my string. it's here.";
std::string result;
text.erase(std::remove_if(text.begin(),
text.end(),
[](unsigned char c) { return std::ispunct(c); }),
text.end());
std::cout << text << std::endl;
}
Try to use this one, it will remove all the punctuation on the string in the text file oky.
str.erase(remove_if(str.begin(), str.end(), ::ispunct), str.end());
please reply if helpful
i got it.
size_t found = text.find('.');
text.erase(found, 1);

c++ extract an int from inside a string using stringstream

So I have the following string, line is it possible to extract the int that's inside?
I can use a very rudimentary regex expression but some stringstream solutions I found here look way cleaner and convert to type int.
string line = " <li id="episode_275">"
I have the following code, but I don't know to deal with the rest of the string like: the 4 tab indent,the "
int value;
stringstream ss(line);
ss >> value;
This can be done quite simply by just looking for the first digit, then having strtol do the integer parsing for you from that point:
#include <string>
#include <cctype>
#include <cstdlib>
int extractFirstIntInString(std::string const& s)
{
for (std::size_t i = 0; i != s.size(); ++i)
if (std::isdigit(s[i]))
return std::strtol(s.c_str() + i, nullptr, 10);
return 0; // no integer in string
}

Finding all occurrences of a character in a string

I have comma delimited strings I need to pull values from. The problem is these strings will never be a fixed size. So I decided to iterate through the groups of commas and read what is in between. In order to do that I made a function that returns every occurrence's position in a sample string.
Is this a smart way to do it? Is this considered bad code?
#include <string>
#include <iostream>
#include <vector>
#include <Windows.h>
using namespace std;
vector<int> findLocation(string sample, char findIt);
int main()
{
string test = "19,,112456.0,a,34656";
char findIt = ',';
vector<int> results = findLocation(test,findIt);
return 0;
}
vector<int> findLocation(string sample, char findIt)
{
vector<int> characterLocations;
for(int i =0; i < sample.size(); i++)
if(sample[i] == findIt)
characterLocations.push_back(sample[i]);
return characterLocations;
}
vector<int> findLocation(string sample, char findIt)
{
vector<int> characterLocations;
for(int i =0; i < sample.size(); i++)
if(sample[i] == findIt)
characterLocations.push_back(sample[i]);
return characterLocations;
}
As currently written, this will simply return a vector containing the int representations of the characters themselves, not their positions, which is what you really want, if I read your question correctly.
Replace this line:
characterLocations.push_back(sample[i]);
with this line:
characterLocations.push_back(i);
And that should give you the vector you want.
If I were reviewing this, I would see this and assume that what you're really trying to do is tokenize a string, and there's already good ways to do that.
Best way I've seen to do this is with boost::tokenizer. It lets you specify how the string is delimited and then gives you a nice iterator interface to iterate through each value.
using namespace boost;
string sample = "Hello,My,Name,Is,Doug";
escaped_list_seperator<char> sep("" /*escape char*/, ","/*seperator*/, "" /*quotes*/)
tokenizer<escaped_list_seperator<char> > myTokens(sample, sep)
//iterate through the contents
for (tokenizer<escaped_list_seperator<char>>::iterator iter = myTokens.begin();
iter != myTokens.end();
++iter)
{
std::cout << *iter << std::endl;
}
Output:
Hello
My
Name
Is
Doug
Edit If you don't want a dependency on boost, you can also use getline with an istringstream as in this answer. To copy somewhat from that answer:
std::string str = "Hello,My,Name,Is,Doug";
std::istringstream stream(str);
std::string tok1;
while (stream)
{
std::getline(stream, tok1, ',');
std::cout << tok1 << std::endl;
}
Output:
Hello
My
Name
Is
Doug
This may not be directly what you're asking but I think it gets at your overall problem you're trying to solve.
Looks good to me too, one comment is with the naming of your variables and types. You call the vector you are going to return characterLocations which is of type int when really you are pushing back the character itself (which is type char) not its location. I am not sure what the greater application is for, but I think it would make more sense to pass back the locations. Or do a more cookie cutter string tokenize.
Well if your purpose is to find the indices of occurrences the following code will be more efficient as in c++ giving objects as parameters causes the objects to be copied which is insecure and also less efficient. Especially returning a vector is the worst possible practice in this case that's why giving it as a argument reference will be much better.
#include <string>
#include <iostream>
#include <vector>
#include <Windows.h>
using namespace std;
vector<int> findLocation(string sample, char findIt);
int main()
{
string test = "19,,112456.0,a,34656";
char findIt = ',';
vector<int> results;
findLocation(test,findIt, results);
return 0;
}
void findLocation(const string& sample, const char findIt, vector<int>& resultList)
{
const int sz = sample.size();
for(int i =0; i < sz; i++)
{
if(sample[i] == findIt)
{
resultList.push_back(i);
}
}
}
How smart it is also depends on what you do with those subtstrings delimited with commas. In some cases it may be better (e.g. faster, with smaller memory requirements) to avoid searching and splitting and just parse and process the string at the same time, possibly using a state machine.

Finding character in String in Vector

Judging from the title, I kinda did my program in a fairly complicated way. BUT! I might as well ask anyway xD
This is a simple program I did in response to question 3-3 of Accelerated C++, which is an awesome book in my opinion.
I created a vector:
vector<string> countEm;
That accepts all valid strings. Therefore, I have a vector that contains elements of strings.
Next, I created a function
int toLowerWords( vector<string> &vec )
{
for( int loop = 0; loop < vec.size(); loop++ )
transform( vec[loop].begin(), vec[loop].end(),
vec[loop].begin(), ::tolower );
that splits the input into all lowercase characters for easier counting. So far, so good.
I created a third and final function to actually count the words, and that's where I'm stuck.
int counter( vector<string> &vec )
{
for( int loop = 0; loop < vec.size(); loop++ )
for( int secLoop = 0; secLoop < vec[loop].size(); secLoop++ )
{
if( vec[loop][secLoop] == ' ' )
That just looks ridiculous. Using a two-dimensional array to call on the characters of the vector until I find a space. Ridiculous. I don't believe that this is an elegant or even viable solution. If it was a viable solution, I would then backtrack from the space and copy all characters I've found in a separate vector and count those.
My question then is. How can I dissect a vector of strings into separate words so that I can actually count them? I thought about using strchr, but it didn't give me any epiphanies.
Solution via Neil:
stringstream ss( input );
while( ss >> buffer )
countEm.push_back( buffer );
From that I could easily count the (recurring) words.
Then I did a solution via Wilhelm that I will post once I re-write it since I accidentally deleted that solution! Stupid of me, but I will post that once I have it written again ^^
I want to thank all of you for your input! The solutions have worked and I became a little better programmer. If I could vote up your stuff, then I would :P Once I can, I will! And thanks again!
If the words are always space separated, the easiest way to split them is to use a stringstream:
string words = .... // populat
istringstream is( words );
string word;
while( is >> word ) {
cout << "word is " << word << endl;
}
You'd want to write a function to do this, of course, and apply it to your strings. Or it may be better not to store the strings at allm but to split into words on initial input.
You can use std::istringstream to extract the words one by one and count them. But this solution consumes O(n) in space complexity.
string text("So many words!");
size_t count = 0;
for( size_t pos(text.find_first_not_of(" \t\n"));
pos != string::npos;
pos = text.find_first_not_of(" \t\n", text.find_first_of(" \t\n", ++pos)) )
++count;
Perhaps not as short as Neil's solution, but takes no space and extra-allocation other than what's already used.
Use a tokenizer such as the one listed here in section 7.3 to split the strings in your vector into single words (or rewrite it so that it just returns the number of tokens) and loop over your vector to count the total number of tokens you encounter.
Since C++11 there is a special and very powerful iterator, for iterating over patterns (for example words) in a string: The std::sregex_token_iterator
With that and iterator function std::distance, we can simply count all words (or other patterns in a string, by calculating the distance between the first and the last pattern.
The resulting program is always a one-liner:
#include <iostream>
#include <string>
#include <algorithm>
#include <iterator>
#include <regex>
const std::regex re{R"(\w+)"};
const std::string test{"the quick brown fox jumps over the lazy dog"};
int main()
{
std::cout << std::distance(std::sregex_token_iterator(test.begin(), test.end(), re), {});
}
With this method, we can of course also split the string and show the resulting words:
#include <iostream>
#include <string>
#include <algorithm>
#include <iterator>
#include <regex>
const std::regex re{R"(\w+)"};
const std::string test{"the quick brown fox jumps over the lazy dog"};
int main()
{
std::copy(std::sregex_token_iterator(test.begin(), test.end(), re), {}, std::ostream_iterator<std::string>(std::cout, "\n"));
}
By using the std::vectors range constructor, we can store also the words in a std::vector:
#include <iostream>
#include <string>
#include <algorithm>
#include <iterator>
#include <regex>
#include <vector>
const std::regex re{R"(\w+)"};
const std::string test{"the quick brown fox jumps over the lazy dog"};
int main()
{
std::vector<std::string> words(std::sregex_token_iterator(test.begin(), test.end(), re), {});
std::cout << words.size();
}
You see. There are really many possibilities.
If you have a stream, then you can use the std::istream iterator for the same purpose-