boost split with a single character or just one string - c++

I wish to split a string on a single character or a string. I would like to use boost::split since boost string is our standard for basic string handling (I don't wish to mix several techniques).
In the single-character case I could do split(vec,str,is_any_of(':')), but I'd like to know if there is a way to specify just a single character. It may improve performance, but more importantly I think the code would be clearer with just a single character, since is_any_of conveys a different meaning than what I want.
For matching against a string I don't know what syntax to use. I don't wish to construct a regex; some simple syntax like split(vec,str,match_str("::")) would be good.

I was looking for the same answer but I couldn't find one. Finally I managed to produce one on my own.
You can use std::equal_to to form the predicate you need. Here's an example:
boost::split(container, str, std::bind1st(std::equal_to<char>(), ','));
This is exactly how I do it when I need to split a string using a single character.

In the following code, let me assume using namespace boost for brevity.
As for splitting on a character, if only algorithm/string is allowed,
is_from_range might serve the purpose:
split(vec,str, is_from_range(':',':'));
Alternatively, if lambda is allowed:
split(vec,str, lambda::_1 == ':');
or if preparing a dedicated predicate is allowed:
struct match_char {
char c;
match_char(char c) : c(c) {}
bool operator()(char x) const { return x == c; }
};
split(vec,str, match_char(':'));
As for matching against a string, as David Rodríguez mentioned,
there seems to be no way to do it with split.
If iter_split is allowed, probably the following code will meet the purpose:
iter_split(vec,str, first_finder("::"));

On the simple token, I would just leave is_any_of, as it is quite easy to understand what is_any_of( single_option ) means. If you really want to change it, the third argument is a predicate functor, so you could pass an equality functor to the split function.
That approach will not really work with multi-character tokens, as the iteration is done character by character. I don't know the library well enough to offer prebuilt alternatives, but you can implement the functionality on top of split_iterators.

Related

Recognize a specific string from a string array without iteration

I'm writing a derivative calculator in C++, and in order to properly perform the derivation operations, I need to parse the input equation at various character indexes along the equation's string.
I'm using isdigit() to parse out the numerical values from the equation and then store them into a separate string array, however now I need to parse out the mathematical symbols from the equation to identify which operation I need to perform.
Is there any way I can modify (overwrite?) isdigit() to recognize custom values from a string array? I'd like to avoid iteration to make my code a little less cluttered, since I'm already going to be using plenty of loops for the rest of this program and I want my code to be easy to follow. Do overriding and inheritance in C++ work similarly to Java (with the exception of multiple inheritance/interfaces)?
Please refrain from posting solutions that are irrelevant to the scope of this question, i.e. different approaches to deriving equations in C++, as I've used this approach for some fairly specific reasons.
Thanks
You can use the powerful regular expressions library introduced in C++11, which can handle almost any parsing you want. This way you avoid writing the character-by-character iteration yourself and keep the code uncluttered.
You can just use strchr. (Not everyone will like the macros here, but they do make combining character classes easy.)
#define OPERATOR "+-*/"
#define DIGIT "0123456789"
// Is c an operator? (strchr also "finds" the terminating '\0', so exclude it)
if (c != '\0' && strchr(OPERATOR, c)) {
// Yes it is
}
or:
// Is c an operator or a digit?
if (c != '\0' && strchr(OPERATOR DIGIT, c)) {
// Yup
}
Overriding and inheritance work much as in Java, with one difference: in C++ you need to declare a function virtual in the base class and redefine it in the derived class, whereas Java methods are virtual by default.
I know the "Please refrain from posting...", but I've written a library that does function parsing and derivation.
It is available at https://github.com/B3rn475/MathParseKit
I hope you can find some tips there.

Is it better to use std::string or single char when possible?

In my class I want to store certain characters. I have a CsvReader
class, and I want to store the columnDelimiter character. I wonder:
is it better to have it as a char, or just use a std::string?
In terms of usage I suppose std::string is far better, but I wonder
maybe there will be major performance differences?
If your delimiter is constrained to be a single character, use a char.
If your delimiter may be a string, use a std::string.
Seems fairly self-explanatory. Refer to the requirements of the project, and the constraints of the feature that follow from those requirements.
Personally it seems to me that a CSV field delimiter will always be a single character, in which case std::string is not only misleading, but pointlessly heavy.
In terms of usage I suppose std::string is far better
I have largely ignored this claim as you did not provide any rationale, but let me just say that I reject the hypothetical premise of the claim.
I wonder maybe there will be major performance differences?
Absolutely! A string typically manages a dynamically-allocated block of characters; this is considerably heavier than a single byte in memory. Notwithstanding the small-string optimisation that your implementation may perform, it's simply pointless to add all this weight when all you wish to represent is a single character. A single character is a char, so use a char in such a case.
A character is a character. A string is a string: conceptually, a sequence of N characters, where N is any natural number.
If your design requires a character, use char. If it requires a string, use string.
In both cases you may have multilanguage issues (what happens if the character is 青? what happens if the string is 青い?), but these are totally independent of your choice of whether you need a character or a sequence of N characters, i.e. a string.

Replacing instances of a given std::string with another std::string in C++

I have been looking online without success for something that does the following. I have an ugly string returned as part of a Betfair SOAP response that uses a number of different char delimiters to identify certain parts of the information. What makes it awkward is that they are not always just one character long. Specifically, I need to split a string at ':' characters, but only after I have first replaced all instances of "\\:" with my personal flag "-COLON-" (which must then be replaced again AFTER the first split).
Basically I need all portions of a string like this
"6(2.5%,11\:08)~true~5.0~1162835723938~"
to become
"6(2.5%,11-COLON-08)~true~5.0~1162835723938~"
In perl it is (from memory)
$mystring =~ s/\\:/-COLON-/g;
I have been looking for some time at the functions of std::string, specifically std::find and std::replace, and I know that I can code up what I need using these basic functions, but I was wondering if there was a function in the standard library (or elsewhere) that already does this?
boost::replace_all(input_string, "\\:", "-COLON-");
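If you have C++11, something like this ought to do the trick:
#include <iostream>
#include <string>
#include <regex>
int main()
{
std::string str("6(2.5%,11\\:08)~true~5.0~1162835723938~");
// The regex \\: (written "\\\\:" in C++) matches a backslash followed by a colon.
std::regex rx("\\\\:");
std::string fmt("-COLON-");
// regex_replace returns a new string rather than modifying its argument.
str = std::regex_replace(str, rx, fmt);
std::cout << str << '\n';
return 0;
}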
Edit: std::regex_replace also takes an optional fourth parameter of type std::regex_constants::match_flag_type. For example, std::regex_constants::format_first_only replaces only the first occurrence of the regular expression match with the supplied format.

How to compare the string in C++?

bool checkFormat(string &s)
{
}
string s is a string that represents a date.
I want to check whether s is in the format "yyyy:mm:dd" or not.
What should I do?
Compare it char by char? What if the string is "600:12:01"?
Sorry for my poor English.
Don't use regex. Use strptime(), which is designed to parse time strings (hence the name: str p time, string -> parse -> time). Note that strptime() is a POSIX function, not part of standard C++. A regex can't figure out that 2013:2:29 is invalid.
Here's one idea for an algorithm:
Check that the length is the expected one. This is quick.
Check that the colons are in the expected places.
Check that the first four characters are digits.
Check that the middle two characters are digits.
Check that the final two characters are digits.
If any test fails, return false. If you get through them all, return true.
Of course, this doesn't validate the ranges of the values. Also, you're not really "comparing", you are "validating".
You can use Boost Regex to check whether the string matches your pattern.
This is a job for regular expressions. Since you're using C++, Boost.Regex is one option.
The easiest way would be to slice the string into its component parts (year, month, day) and compare those.
See here to split strings by delimiter.
Does your compiler support regular expressions, i.e. are you using a somewhat C++11 compliant compiler? This would make the task much easier … Otherwise you might want to resort to Boost.Regex.
Assuming that you can use C++11, the following code should do what you want (untested though):
std::regex rx("\\d{4}:\\d{2}:\\d{2}");
return regex_match(s.begin(), s.end(), rx);
John Cook has written an introduction to C++ regular expressions. Just replace every occurrence of std::tr1 with std if your compiler supports C++11.

Overloading a method on default arguments

Is it possible to overload a method on default parameters?
For example, suppose I have a method split() to split a string, and the string has two delimiters, say '_' and "delimit". Can I have two methods like:
split(const char *str, char delim = ' ')
and
split(const char *str, const char* delim = "delimit");
Or is there a better way of achieving this? Somehow my brain isn't working right now and I am unable to think of any other solution.
Edit: The problem in detail:
I have a string with two delimiters, for example nativeProbableCause_Complete|Alarm|Text. I need to separate nativeProbableCause from Complete|Alarm|Text, and then further separate Complete|Alarm|Text into individual words and join them back with a space as separator later (I have already written a utility for that, so it isn't a big deal). It is only the separation of the delimited string that is troubling me.
No, not usefully. If you think about it, the notion of a default means 'use this unless I say otherwise'. Both overloads can be declared, but for a call such as split(str) the compiler has 2 equally good defaults to choose from, so the call is ambiguous and will not compile.
How about implementing as 2 different methods like
split_with_default_delimiter_space
split_with_default_delimiter_delimit
Personally I'd prefer something like this (it is more readable and conveys intent) over the type of overloading you mentioned, even if it were somehow possible for the compiler to do that.
Why not just call split() twice and explicitly pass the delimiter the second time? Will delimiters always be single characters?
Do you perform any other processing on the 2nd set of words before joining them? If not, then for the second task what you really want to do is replace substrings. This is most easily done with std::string::find and std::string::replace. If you must use c-strings, you could use strstr/strchr/strpbrk, strcpy and strcat, or use just strstr/strchr/strpbrk and join them in place.
You could use a version of split that accepts a variable number of delimiters (for example split(const char*, vector<string>), or split(const char*, const char**) if you want to stay with C strings), or just use Boost Tokenizer.