How to compare a string in C++?

bool checkFormat(const std::string &s)
{
    // should return true if s is in the form "yyyy:mm:dd"
}
s is a string that represents a date.
I want to check whether s is in the form "yyyy:mm:dd" or not.
What should I do?
Should I compare it char by char? And what about a string like "600:12:01"?
Sorry for my poor English.

Don't use regex. Use strptime(), which is designed to parse time strings (hence the name: str p time, string -> parse -> time). A regex can't figure out that 2013:2:29 is invalid.
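A minimal sketch of the strptime() route, assuming a POSIX system (strptime is declared in <time.h> on POSIX platforms but is not part of ISO C++, so it may be missing on e.g. MSVC). As far as I know, strptime does not enforce fixed field widths ("600" is a valid %Y) and may accept day 29 to 31 in any month, so this sketch adds a shape check and then round-trips the result through std::mktime, which normalizes impossible dates such as 2013-02-29 to 2013-03-01:

```cpp
#include <cctype>   // std::isdigit
#include <ctime>    // std::tm, std::mktime, std::time_t
#include <string>
#include <time.h>   // strptime: POSIX, not part of ISO C++

// Sketch: true if s is exactly "yyyy:mm:dd" and names a real calendar date.
bool checkFormat(const std::string &s)
{
    // Shape first: 10 characters, colons at indices 4 and 7, digits elsewhere.
    if (s.size() != 10 || s[4] != ':' || s[7] != ':')
        return false;
    for (std::size_t i = 0; i < s.size(); ++i)
        if (i != 4 && i != 7 && !std::isdigit(static_cast<unsigned char>(s[i])))
            return false;

    // strptime returns a pointer just past the parsed text, or nullptr on
    // failure (e.g. a month of 13); require that the whole string was consumed.
    std::tm tm{};
    const char *end = strptime(s.c_str(), "%Y:%m:%d", &tm);
    if (end == nullptr || *end != '\0')
        return false;

    // mktime normalizes out-of-range days, so any changed field means the
    // date does not exist in the calendar.
    std::tm normalized = tm;
    normalized.tm_hour = 12;   // midday keeps DST transitions away from the date
    normalized.tm_isdst = -1;  // let mktime work out DST itself
    if (std::mktime(&normalized) == static_cast<std::time_t>(-1))
        return false;
    return normalized.tm_year == tm.tm_year && normalized.tm_mon == tm.tm_mon &&
           normalized.tm_mday == tm.tm_mday;
}
```

For example, checkFormat("2012:02:29") should hold (leap year) while checkFormat("2013:02:29") and checkFormat("600:12:01") should not.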

Here's one idea for an algorithm:
Check that the length is the expected one. This is quick.
Check that the colons are in the expected places.
Check that the first four characters are digits.
Check that the middle two characters are digits.
Check that the final two characters are digits.
If any test fails, return false. If you get through them all, return true.
Of course, this doesn't validate the ranges of the values. Also, you're not really "comparing", you are "validating".
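The checklist above translates almost line for line into C++. A minimal sketch, reusing the question's checkFormat name; like the algorithm it follows, it checks only the shape, not the value ranges:

```cpp
#include <cctype>   // std::isdigit
#include <string>

// Validates the shape "dddd:dd:dd" only; month/day ranges are not checked.
bool checkFormat(const std::string &s)
{
    // 1. Length must be exactly 10 ("yyyy:mm:dd"); this is the quick check.
    if (s.size() != 10)
        return false;
    // 2. Colons must sit at indices 4 and 7.
    if (s[4] != ':' || s[7] != ':')
        return false;
    // 3-5. Every other position (year, month, day) must be a decimal digit.
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        if (i == 4 || i == 7)
            continue;
        if (!std::isdigit(static_cast<unsigned char>(s[i])))
            return false;
    }
    return true;
}
```

Note that checkFormat("2013:02:29") returns true here; rejecting impossible dates needs an extra range check on top.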

You can use Boost Regex to check whether the string matches your pattern.

This is the job for regular expressions. Since you're using C++, Boost.Regex is one option.

The easiest way would be to slice the string into its component parts (year, month, day) and check those.
See here for how to split a string by a delimiter.
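The linked splitting code isn't reproduced here, so this is a sketch with a hypothetical split helper built on std::string::find. It keeps empty fields, so inputs like "2013:02:28:" or "2013::02" produce the wrong number of parts and are rejected. The month/day range check is deliberately coarse:

```cpp
#include <cctype>   // std::isdigit
#include <cstddef>  // std::size_t
#include <string>   // std::string, std::stoi
#include <vector>

// Hypothetical helper: splits on a single-character delimiter, keeping empty
// fields, e.g. "2013:02:28" -> {"2013", "02", "28"}.
std::vector<std::string> split(const std::string &s, char delim)
{
    std::vector<std::string> parts;
    std::size_t start = 0;
    while (true) {
        std::size_t pos = s.find(delim, start);
        parts.push_back(s.substr(start, pos - start));  // npos: rest of string
        if (pos == std::string::npos)
            break;
        start = pos + 1;
    }
    return parts;
}

// Checks "yyyy:mm:dd" part by part, plus simple month/day ranges.
bool checkFormat(const std::string &s)
{
    std::vector<std::string> parts = split(s, ':');
    if (parts.size() != 3 || parts[0].size() != 4 || parts[1].size() != 2 ||
        parts[2].size() != 2)
        return false;
    for (const std::string &p : parts)
        for (char c : p)
            if (!std::isdigit(static_cast<unsigned char>(c)))
                return false;
    const int month = std::stoi(parts[1]);
    const int day = std::stoi(parts[2]);
    // Days per month and leap years are not checked in this sketch.
    return month >= 1 && month <= 12 && day >= 1 && day <= 31;
}
```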

Does your compiler support regular expressions, i.e. are you using a somewhat C++11 compliant compiler? This would make the task much easier … Otherwise you might want to resort to Boost.Regex.
Assuming that you can use C++11, the following code should do what you want (untested though):
std::regex rx("\\d{4}:\\d{2}:\\d{2}");           // requires #include <regex>
return std::regex_match(s.begin(), s.end(), rx);  // true only if the whole string matches
John Cook has written an introduction to C++ regular expressions. Just replace every occurrence of std::tr1 with std if your compiler supports C++11.

Related

Java Regex to find if a given String contains a set of characters in the same order of their occurrence.

We need Java Regex to find if a given String contains a set of characters in the same order of their occurrence.
E.g. if the given String is "TYPEWRITER",
the following strings should return a match:
"YERT", "TWRR" & "PEWRR" (character by character match in the order of occurrence),
but not
"YERW" or "YERX" (these contain characters that are either not present in the given string or not in the same order of occurrence).
This can be done by matching character by character in a for loop, but that would be more time-consuming. A regex for this, or any pointers, will be highly appreciated.
First of all, regex is not a good fit here. Regex is powerful, but this is not the kind of problem it is best at.
What you are asking for is part of the Longest Common Subsequence (LCS) algorithm. For your case you need to change the algorithm a bit: instead of matching a part of both strings, you need to match your shorter string as a whole subsequence of the larger one.
LCS is a dynamic programming algorithm. If you take a look at the LCS example here, you'll see what I am talking about.
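For a plain yes/no answer, the full LCS table isn't actually needed: a single greedy left-to-right scan decides whether one string is a subsequence of another in O(n + m) time. A minimal sketch in C++ (the question is about Java, but the loop translates directly; isSubsequence is a hypothetical name):

```cpp
#include <string>

// True if every character of needle appears in haystack in the same order,
// not necessarily adjacent (i.e. needle is a subsequence of haystack).
bool isSubsequence(const std::string &needle, const std::string &haystack)
{
    std::string::size_type i = 0;  // index of the next needle char to find
    for (char c : haystack) {
        if (i < needle.size() && c == needle[i])
            ++i;  // greedily take the earliest possible match
    }
    return i == needle.size();
}
```

Taking the earliest match is always safe: it leaves the longest possible remainder of haystack for the rest of needle, so no backtracking is required.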

Boost regex does not match

I made a python regular expression and now I'm supposed to code the program in C++.
I was told by the person responsible to use Boost's regex.
It is supposed to match a group of 1 to 80 lowercase alphanumeric characters (including underscore), followed by a slash, then another group of 1 to 80 lowercase alphanumeric characters (again including underscore), and finally a question mark. The total string must be at least 1 character long and must not exceed 256.
Here is my python regex:
^((?P<grp1>[a-z0-9_]{1,80})/(?P<grp2>[a-z0-9_]{1,80})([?])){1,256}$
My current boost regex is:
^(([a-z0-9_]{1,80})\/([a-z0-9_]{1,80})([?])){1,256}$
Stripped down, my code looks basically like this:
boost::cmatch match;
bool isMatch;
boost::regex myRegex = "^(([a-z0-9_]{1,80})\/([a-z0-9_]{1,80})([?])){1,256}$";
isMatch = boost::regex_match(str.c_str(), match, myRegex);
Edit: whoops, I totally forgot the actual question. My problem is quite simple: the regex doesn't match even though it's supposed to.
Example matches would be:
some/more?
object/value?
devel42/version_number?
The last requirement
The total string must be at least 1 character long and is not allowed to exceed 256.
is always true, as your string is already limited to between 4 and 162 characters. You only need to keep the first part of your regex:
^[a-z0-9_]{1,80}/[a-z0-9_]{1,80}\?$
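A sketch of the simplified pattern in use. It swaps in std::regex (C++11) for Boost so it builds without extra libraries; the Boost version should look almost the same. The raw string literal avoids double escaping, and isValidRequest is a hypothetical name:

```cpp
#include <regex>
#include <string>

// One pair: 1-80 chars of [a-z0-9_], a '/', 1-80 more such chars, then '?'.
bool isValidRequest(const std::string &s)
{
    static const std::regex rx(R"(^[a-z0-9_]{1,80}/[a-z0-9_]{1,80}\?$)");
    return std::regex_match(s, rx);  // the entire string must match
}
```

Because regex_match must consume the whole input, a trailing newline such as "some/more?\n" makes it return false.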
My g++ gives me the warning "unknown escape sequence: '\/'"; that means you should use "\\/" instead of "\/". You need a backslash character stored in the string, which the regex parser then consumes as an escape.
By the way, my Boost version also requires an explicit constructor call, so
boost::regex myRegex("^(([a-z0-9_]{1,80})\\/([a-z0-9_]{1,80})([?])){1,256}$");
seems to work.
You can also use C++11 raw string literal to avoid C++ escaping:
boost::regex myRegex(R"(^(([a-z0-9_]{1,80})\/([a-z0-9_]{1,80})([?])){1,256}$)");
By the way, testing <regex> in libstdc++ svn is welcome. It should come with GCC 4.9 ;)
The actual error was a newline that the client sent to the server when the user entered the string, which was later compared against the regex.
Funny how the root of an error is rarely where you expect it to be.
Anyway, thank you all for your answers. They helped me clean up my regular expressions.
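Since the culprit was a line ending arriving with the input, one common fix is to strip it before matching. A minimal sketch; stripLineEnding is a hypothetical helper, and it assumes both "\n" and "\r\n" endings may arrive:

```cpp
#include <string>

// Removes trailing '\r' and '\n' characters, e.g. from a line read off a socket.
std::string stripLineEnding(std::string s)
{
    while (!s.empty() && (s.back() == '\n' || s.back() == '\r'))
        s.pop_back();
    return s;
}
```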

Regular Expression extract first three characters from a string

Using a regular expression, how can I extract the first 3 characters from a string (regardless of which characters they are)? Also, using a separate expression, I want to extract the last 3 characters from a string; how would I do this? I can't find any examples on the web that work, so thanks if you do know.
Thanks
Steven
Any programming language should have a better solution than using a regular expression (namely some kind of substring function or a slice function for strings). However, this can of course be done with regular expressions (in case you want to use it with a tool like a text editor). You can use anchors to indicate the beginning or end of the string.
^.{0,3}
.{0,3}$
This matches up to 3 characters of a string (as many as possible). I added the "0 to 3" semantics instead of "exactly 3", so that this would work on shorter strings, too.
Note that . generally matches any character except line breaks. There is usually an s or singleline option that changes this behavior, but an alternative that needs no option is this (which really matches any 3 characters):
^[\s\S]{0,3}
[\s\S]{0,3}$
But as I said, I strongly recommend against this approach if you want to use this in some code that provides other string manipulation functions. Plus, you should really dig into a tutorial.
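In C++ the substring route is std::string::substr. The sketch below shows it next to the regex version of ^[\s\S]{0,3} via std::regex_search for comparison; the function names are hypothetical. Like the regexes above, both return fewer than 3 characters for shorter strings:

```cpp
#include <algorithm>  // std::min
#include <cstddef>    // std::size_t
#include <regex>
#include <string>

// Preferred: plain substring calls. substr clamps the count, so short strings
// come back whole.
std::string firstThree(const std::string &s) { return s.substr(0, 3); }

std::string lastThree(const std::string &s)
{
    return s.substr(s.size() - std::min<std::size_t>(3, s.size()));
}

// Regex equivalent of ^[\s\S]{0,3}; [\s\S] also matches line breaks.
std::string firstThreeRegex(const std::string &s)
{
    static const std::regex rx(R"(^[\s\S]{0,3})");
    std::smatch m;
    return std::regex_search(s, m, rx) ? m.str(0) : std::string();
}
```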

Finding a string *and* its substrings in a haystack

Suppose you have a string (e.g. needle). Its 19 distinct contiguous substrings are:
needle
needl eedle
need eedl edle
nee eed edl dle
ne ee ed dl le
n e d l
If I were to build a regex to match, in a haystack, any of the substrings I could simply do:
/(needle|needl|eedle|need|eedl|edle|nee|eed|edl|dle|ne|ee|ed|dl|le|n|e|d|l)/
but it doesn't look very elegant. Is there a better way to create a regex that will greedily match any one of the substrings of a given string?
Additionally, what if I added another constraint and wanted to match only substrings at or above a length threshold, e.g. substrings of at least 3 characters:
/(needle|needl|eedle|need|eedl|edle|nee|eed|edl|dle)/
note: I deliberately did not mention any particular regex dialect. Please state which one you're using in your answer.
As Qtax suggested, the expression
n(e(e(d(l(e)?)?)?)?)?|e(e(d(l(e)?)?)?)?|e(d(l(e)?)?)?|d(l(e)?)?|l(e)?|e
would be the way to go if you wanted to write an explicit regular expression (egrep syntax, optionally replace (...) by (?:...)). The reason why this is better than the initial solution is that the condensed version requires only O(n^2) space compared to O(n^3) space in the original version, where n is the length of the input. Try this with extraordinarily as input to see the difference. I guess the condensed version is also faster with many regexp engines out there.
The expression
nee(d(l(e)?)?)?|eed(l(e)?)?|edl(e)?|dle
will look for substrings of length 3 or longer.
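The condensed expressions follow a mechanical pattern (one alternative per start position: a mandatory prefix of the threshold length, then nested optional groups), so they can be generated. A sketch under these assumptions: the target is an ECMAScript-style engine such as std::regex, non-capturing (?:...) groups are used, minLen is between 1 and the needle's length, and the helper names are hypothetical. It reproduces the redundancy mentioned below rather than optimizing it away:

```cpp
#include <cstddef>
#include <string>

// Escapes one character so an ECMAScript regex matches it literally.
std::string escapeChar(char c)
{
    static const std::string special = "\\^$.|?*+()[]{}";
    return special.find(c) != std::string::npos ? std::string("\\") + c
                                                : std::string(1, c);
}

// Builds an alternation matching every substring of needle with length >= minLen.
// Precondition: 1 <= minLen <= needle.size().
std::string substringRegex(const std::string &needle, std::size_t minLen)
{
    std::string result;
    for (std::size_t start = 0; start + minLen <= needle.size(); ++start) {
        // Mandatory part: the first minLen characters from this position.
        std::string alt;
        for (std::size_t i = start; i < start + minLen; ++i)
            alt += escapeChar(needle[i]);
        // Optional nested tail, built from the last character backwards.
        std::string tail;
        for (std::size_t j = needle.size(); j-- > start + minLen;)
            tail = "(?:" + escapeChar(needle[j]) + tail + ")?";
        if (!result.empty())
            result += '|';
        result += alt + tail;
    }
    return result;
}
```

For ("needle", 3) this yields nee(?:d(?:l(?:e)?)?)?|eed(?:l(?:e)?)?|edl(?:e)?|dle, the same expression as above with non-capturing groups; the result can be passed to std::regex and used with std::regex_search.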
As pointed out by vhallac, the generated regular expressions are a bit redundant and can be optimized. Apart from the proposed Emacs tool, there is a Perl package Regexp::Optimizer that I hoped would help here, but a quick check failed for the first regular expression.
Note that many regexp engines perform non-overlapping search by default. Check this with the requirements of your problem.
I have found an elegant almost-solution, depending on how badly you need a single regexp. For example, here is a regexp (Perl) that finds a common substring of length 7:
"$needle\0$haystack" =~ /(.{7}).*?\0.*\1/s
The matching substring ends up in $1 (\1 inside the pattern). The strings must not contain the null character, which is used as the separator.
You would write a loop that starts with the length of the needle, counts down to the threshold, and tries the regexp at each length.
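The count-down loop can be sketched in C++. For illustration this swaps the Perl backreference regex for std::string::find on each candidate substring, which does the same search without needing a separator character; longestCommonPiece is a hypothetical name:

```cpp
#include <cstddef>
#include <string>

// Returns the longest substring of needle (length >= minLen) that occurs in
// haystack, or "" if there is none. Lengths are tried from longest down.
std::string longestCommonPiece(const std::string &needle,
                               const std::string &haystack, std::size_t minLen)
{
    for (std::size_t len = needle.size(); len >= minLen && len > 0; --len) {
        for (std::size_t start = 0; start + len <= needle.size(); ++start) {
            std::string piece = needle.substr(start, len);
            if (haystack.find(piece) != std::string::npos)
                return piece;  // first hit at the largest length wins
        }
    }
    return "";
}
```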
Is there a better way to create a regex that will match any one of the
substrings of a given string?
No. But you can generate such an expression easily.
Perhaps you're just looking for
.*(.{1,6}).*

What is a good delimiter for multiple Regex expressions in one string?

I have a configuration where I need to store multiple Regex expressions in one string so I can split the string into an array of expressions that I can process individually. What would be a good delimiter I can use for the split that won't be too complex and at the same time not get confused with parts of the actual regex expression?
you could take
the comment syntax (?#COMMENTTEXT) with a magic word to mark the break
or
you can insert a magic word, something like BREAKHEREVOODOO
or
something that is unlikely to occur, like two underscores (__)
edit:
or you could put the regexes in an XML string that contains a list of CDATA elements :-)
A common delimiter is / but it can be changed if you want to use it in the regex.
If you really have to use a delimiter (I think a JSON array, for example, would be a better alternative), you could introduce an escaping scheme: if the delimiter stands alone, it is a delimiter; if it's preceded by a certain character (for example ) it is part of the regex.
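A sketch of one such escaping scheme, under a hypothetical convention: ';' separates entries, a literal ';' is written as "\;", and a literal backslash as "\\" (so the backslashes that regexes already contain get doubled). The function names are made up for illustration:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Joins entries with ';', escaping ';' and '\' inside each entry.
std::string joinEscaped(const std::vector<std::string> &items)
{
    std::string out;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i > 0)
            out += ';';
        for (char c : items[i]) {
            if (c == '\\' || c == ';')
                out += '\\';
            out += c;
        }
    }
    return out;
}

// Splits on unescaped ';' and removes the escapes again.
std::vector<std::string> splitEscaped(const std::string &s)
{
    std::vector<std::string> items(1);
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == '\\' && i + 1 < s.size())
            items.back() += s[++i];  // escaped char: keep it literally
        else if (s[i] == ';')
            items.emplace_back();    // bare delimiter: start a new entry
        else
            items.back() += s[i];
    }
    return items;
}
```

One limitation of this sketch: an empty list and a list holding one empty regex both encode to "", so the round trip is exact only for non-empty lists.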
You could use something that's unlikely to occur in a real regex, for example a string that can never match, and thus will most likely never be used:
$!^
for example looks safe.
If I absolutely had to do this, I would either use hacktick's idea of using Regex-comments, or I would prepend the regexes with a header of counts.
Say I had 3 regexes: I would begin the data with 5;11;20;; which would tell the parser that after ;; a regex 5 characters long follows, then one 11 characters long, and so forth. The actual details are debatable, but I hope you understand my idea.
The final string would be something like 5;11;20;;barns[a-zA-Z_]*?^Bonobo Monkey Hope$
Technically they also pass as a regex, but your code would of course require the header no matter what.
It's not beautiful but it's the most robust idea I can come up with.
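A sketch of encoding and decoding that length header, assuming the exact layout above (decimal lengths each followed by ';', an extra ';' ending the header, then the regexes back to back). The function names are hypothetical, and very long length fields are simply rejected rather than parsed:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Encodes e.g. {"barns", "[a-zA-Z_]*?"} as "5;11;;barns[a-zA-Z_]*?".
std::string encodeWithHeader(const std::vector<std::string> &items)
{
    std::string header, body;
    for (const std::string &item : items) {
        header += std::to_string(item.size()) + ';';
        body += item;
    }
    return header + ';' + body;  // the extra ';' closes the header (";;")
}

// Decodes the format above into out; returns false if it is malformed.
bool decodeWithHeader(const std::string &s, std::vector<std::string> &out)
{
    out.clear();
    std::vector<std::size_t> lengths;
    std::size_t pos = 0;
    while (true) {  // read "len;" fields until the empty field
        std::size_t semi = s.find(';', pos);
        if (semi == std::string::npos)
            return false;
        if (semi == pos) {  // empty field: end of header
            ++pos;
            break;
        }
        std::string field = s.substr(pos, semi - pos);
        if (field.size() > 9 || field.find_first_not_of("0123456789") != std::string::npos)
            return false;
        lengths.push_back(std::stoul(field));
        pos = semi + 1;
    }
    for (std::size_t len : lengths) {  // slice the body by the header lengths
        if (len > s.size() - pos)
            return false;
        out.push_back(s.substr(pos, len));
        pos += len;
    }
    return pos == s.size();  // no leftover characters allowed
}
```

Because each regex is located purely by its length, any character (including ';') may appear inside a regex without escaping.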