How to use regex to format input number as string? - c++

I want to take a number input as a string and then strip redundant zeroes. I've done this using substrings but was wondering if there's a way to do it with a regex, for example with regex_replace.
Currently I only use a regex to check whether the string is a valid number; the stripping itself is done with substrings and repeated if conditions. I want to be able to convert, say, "0012340.3200E6" to "12340.32E6" using a regex. My validation regex is:
std::regex rr("((\\+|-)?[[:digit:]]+)(\\.(([[:digit:]]+)?))?((e|E)((\\+|-)?)[[:digit:]]+)?");

You would be much better off converting the number into a floating-point value and then using string formatting, e.g.
char buffer[50];
sprintf(buffer, "%E", ...);
or sprintf(buffer, "%fE%d", ..., ...) if that works better for whatever you're doing.
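A minimal sketch of the convert-then-format approach, using std::stod and std::snprintf (the bounded variant of sprintf). The function name to_scientific is hypothetical. Note that %E always produces normalized scientific notation, with six digits after the point by default, so the output is not the "12340.32E6" form from the question.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: parse the text as a double, then let printf-style
// formatting produce a canonical form without the input's redundant zeros.
// std::stod throws std::invalid_argument if the input is not a number.
std::string to_scientific(const std::string& input) {
    double value = std::stod(input);  // "0012340.3200E6" -> 12340320000.0
    char buffer[50];
    std::snprintf(buffer, sizeof buffer, "%E", value);
    return std::string(buffer);
}
```

With this sketch, to_scientific("0012340.3200E6") returns "1.234032E+10". Whether that normalized form (and the default precision) is acceptable depends on your requirements.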
Anyway, the regex should look something like
([+-]?)0*(\d+)(?:(\.\d*[1-9])|\.)0*(?:([Ee][+-]?)0*(\d+))?
(see regex101.com) and the substitution pattern would then be
$1$2$3$4$5
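so everything except the redundant zeros is captured in groups, and the replacement just concatenates the contents of all the groups.
What's left is to escape the above expression for a C++ string literal (or use a raw string literal) and deal with the \d for digits; I've seen you're using [[:digit:]] instead, although std::regex's default ECMAScript grammar accepts both.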
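A sketch of the regex approach with std::regex_replace, putting the pattern from this answer in a raw string literal so it needs no extra escaping. The function name strip_zeros is hypothetical. As written, the pattern requires a decimal point, so an input such as "00123" has no match and is returned unchanged; an input like "1.000" becomes "1" (the point is dropped too).

```cpp
#include <regex>
#include <string>

// Hypothetical helper: removes leading zeros of the integer part, trailing
// zeros of the fraction, and leading zeros of the exponent.
std::string strip_zeros(const std::string& number) {
    // Groups: 1 sign, 2 integer digits, 3 fraction without trailing zeros,
    // 4 exponent marker plus sign, 5 exponent digits. The zeros sit outside
    // the groups, so they disappear in the replacement.
    static const std::regex re(
        R"(([+-]?)0*(\d+)(?:(\.\d*[1-9])|\.)0*(?:([Ee][+-]?)0*(\d+))?)");
    // Optional groups that did not participate are substituted as "".
    return std::regex_replace(number, re, "$1$2$3$4$5");
}
```

For example, strip_zeros("0012340.3200E6") gives "12340.32E6" and strip_zeros("-007.500") gives "-7.5".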

Related

Parsing a string using regex

I have a string like this:
022/03/17 05:53:40.376949 1245680 029 DSA- DREP COLS log debug S 1
I need to extract the number 1245680 with a regex.
I'm using \d+, but that returns many results.
First: are you sure you want a regex here? Wouldn't a string cut operation be better?
Cut a fixed number of characters (29, as this is the prefix length) and then search for the next space in the rest of the string to drop the remainder.
If you have to use regex for some other reason (e.g. you don't have the ability to implement a routine where you need it), you can use a regex with a group to extract just the number you want: ^.{29}(\d+).*$
Here you have to use group(1), or the equivalent group reference in the language you are using, to get the value you want.
As the rest of the line can also contain numbers (and, I suppose, a variable number of characters if this is a log entry), my simple attempts with a lookbehind and lookahead combination failed because they also matched the other numbers in the line.
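A minimal sketch of the string-cut idea without any regex. Instead of hard-coding the 29-character prefix (the sample line as shown has a shorter, 26-character prefix, perhaps because whitespace was collapsed), it assumes the line starts with a date field and a time field separated by spaces and returns the third field. The name third_field is hypothetical.

```cpp
#include <string>

// Hypothetical helper: skips the first two space-separated fields
// (date and time) and returns the third one, or "" if it is missing.
std::string third_field(const std::string& line) {
    std::string::size_type pos = 0;
    for (int skipped = 0; skipped < 2; ++skipped) {
        pos = line.find(' ', pos);                 // end of the current field
        if (pos == std::string::npos) return "";
        pos = line.find_first_not_of(' ', pos);    // skip a run of spaces
        if (pos == std::string::npos) return "";
    }
    std::string::size_type end = line.find(' ', pos);  // end of third field
    return line.substr(pos, end == std::string::npos ? std::string::npos : end - pos);
}
```

Skipping whole fields rather than a fixed character count keeps working if the width of the date or time part changes.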
If 022/03/17 05:53:40.376949 is always in that format, you can use:
\d{2}:\d{2}:\d{2}\.\d{1,6}\s*(\d*)\s*
or more generally:
\d*\/\d*\/\d*\s+.*?\s+(\d*)
These will match the date/time segment, whitespace, the sequence of (captured) digits you desire, and then more whitespace.
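A sketch of the first pattern with std::regex_search, reading the number from capture group 1. It slightly tightens the pattern (an assumption, not part of the answer): \s+ and \d+ instead of \s* and \d*, so at least one space and one digit are required. extract_id is a hypothetical name.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: finds "hh:mm:ss.ffffff", then whitespace, and returns
// the digits that follow (capture group 1), or "" if nothing matches.
std::string extract_id(const std::string& line) {
    static const std::regex re(R"(\d{2}:\d{2}:\d{2}\.\d{1,6}\s+(\d+))");
    std::smatch m;
    if (std::regex_search(line, m, re)) return m[1].str();
    return "";
}
```

Anchoring on the time stamp is what stops the later numbers in the line (029, 1) from being picked up.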

Regular Expression using C++

I want to create a regular expression that only allows the input string to contain the characters [G|y|M|d|D|F|E|h|H|m|s|S|w|W|a|z|Z], so I came up with the code below:
std::regex Reg = regex("[G|y|M|d|D|F|E|h|H|m|s|S|w|W|a|z|Z]");
My problem is that my regex is still not correct, because the input string can contain other characters alongside the ones in the group above, for example:
std::string myInputString = "Gx"; // accepted
but Gx should be rejected.
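I'm not a user of C++'s regex library, but I understand it uses ECMAScript syntax by default, so you don't need the pipe characters: the "any character in set" [] syntax doesn't use pipes, and inside [] a | is just one more literal character that the set accepts. Secondly, if you want to match the entire input string (instead of any part of it), you need to use the ^ and $ anchors (or call std::regex_match, which only succeeds when the whole string matches).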
Try:
std::regex( "^[GyMdDFEhHmsSwWazZ]+$" );
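A sketch contrasting the two checks, assuming the original code tested the input with std::regex_search (the question doesn't show the call). regex_search succeeds if any part of the string matches, which is why "Gx" passed; std::regex_match requires the whole string to match, so ^ and $ become implicit. The function names are hypothetical.

```cpp
#include <regex>
#include <string>

// Original set: the '|' characters are literal members of the class.
const std::regex original("[G|y|M|d|D|F|E|h|H|m|s|S|w|W|a|z|Z]");
// Corrected set without pipes; '+' requires at least one character.
const std::regex corrected("[GyMdDFEhHmsSwWazZ]+");

// Succeeds if ANY substring matches, e.g. the "G" in "Gx".
bool accepted_by_original(const std::string& s) {
    return std::regex_search(s, original);
}

// Succeeds only if the WHOLE string consists of allowed letters.
bool accepted_by_corrected(const std::string& s) {
    return std::regex_match(s, corrected);
}
```

With these, accepted_by_original("Gx") is true while accepted_by_corrected("Gx") is false, and the original set also accepts a lone "|".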
What I understand from your question is that you want the input string to contain only those chosen characters.
As far as I know, the regex is correct,
but you need to compare the string character by character; otherwise you may keep getting results like the ones you are seeing now.
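A minimal sketch of the character-by-character check suggested above, with no regex: every character must appear in the allowed set. The name only_format_letters is hypothetical, and rejecting the empty string is an assumption chosen to mirror the + in the regex answer.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: true if s is non-empty and every character is one of
// the allowed format letters.
bool only_format_letters(const std::string& s) {
    const std::string allowed = "GyMdDFEhHmsSwWazZ";
    return !s.empty() &&
           std::all_of(s.begin(), s.end(), [&](char c) {
               return allowed.find(c) != std::string::npos;
           });
}
```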

Regular expression: matching part of words [duplicate]

I'm trying to make a Regex that matches this string {Date HH:MM:ss}, but here's the trick: HH, MM and ss are optional, but it needs to be "HH", not just "H" (the same thing applies to MM and ss). If a single "H" shows up, the string shouldn't be matched.
I know I can use H{2} to match HH, but I can't seem to use that functionality plus the ? to match zero or one time (zero because it's optional, and one time max).
So far I'm doing this (which is obviously not working):
Regex dateRegex = new Regex(@"\{Date H{2}?:M{2}?:s{2}?\}");
Next question. Now that I have the match on the first string, I want to take only the HH:MM:ss part and put it in another string (that will be the format for a TimeStamp object).
I used the same approach, like this:
Regex dateFormatRegex = new Regex(@"(HH)?:?(MM)?:?(ss)?");
But when I try that on "{Date HH:MM}" I don't get any matches. Why?
If I add a space like this, Regex dateFormatRegex = new Regex(@" (HH)?:?(MM)?:?(ss)?");, I get the result, but I don't want the space...
I thought that the first parenthesis needed to be escaped, but \( doesn't work in this case. I guess that's because it's not a parenthesis that is part of the string to match, but a metacharacter.
(H{2})? matches zero or two H characters.
However, in your case, writing it twice would be more readable:
Regex dateRegex = new Regex(@"\{Date (HH)?:(MM)?:(ss)?\}");
Besides that, check whether a library function already does what you are trying to do. Parsing dates is pretty common and most programming languages have functions for it in their standard library - I'd almost bet 1k of my reputation that .NET has such functions, too.
In your edit you mention an unwanted leading space in the result. To check a leading or trailing condition together with your regex without including it in the result, you can use the lookaround feature of regex:
new Regex(@"(?<=Date )(HH)?:?(MM)?:?(ss)?")
(?<=...) is a lookbehind pattern.
For input Date HH:MM:ss, it will match both regexes (with or without lookbehind).
But input FooBar HH:MM:ss will still match a simple regex, but the lookbehind will fail here. Lookaround doesn't change the content of the result, but it prevents false matches (e.g., this second input that is not a Date).
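std::regex (ECMAScript grammar) supports lookahead but not lookbehind, so in C++ the usual substitute is to match the "Date " prefix normally and read the wanted part from a capture group. This sketch does that and also requires each unit to be either absent or exactly two letters; the name date_format_part is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the "HH:MM:ss"-style part of a "{Date ...}"
// token, or "" if the token does not match (or the part is empty).
std::string date_format_part(const std::string& token) {
    // Each (?:XX)? is "zero or exactly two letters", so a single H fails.
    // Group 1 holds the format part without the leading "Date ".
    static const std::regex re(R"(\{Date ((?:HH)?:?(?:MM)?:?(?:ss)?)\})");
    std::smatch m;
    if (std::regex_match(token, m, re)) return m[1].str();
    return "";
}
```

For example, date_format_part("{Date HH:MM}") yields "HH:MM", while "{Date H:MM}" yields an empty string because the single H cannot be matched.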

RegEx to match string between delimiters or at the beginning or end

I am processing a CSV file and want to search and replace strings as long as it is an exact match in the column. For example:
xxx,Apple,Green Apple,xxx,xxx
Apple,xxx,xxx,Apple,xxx
xxx,xxx,Fruit/Apple,xxx,Apple
I want to replace 'Apple' if it is the EXACT value in the column (if it is contained in text within another column, I do not want to replace). I cannot see how to do this with a single expression (maybe not possible?).
The desired output is:
xxx,GRAPE,Green Apple,xxx,xxx
GRAPE,xxx,xxx,GRAPE,xxx
xxx,xxx,Fruit/Apple,xxx,GRAPE
So the expression I want is: match the beginning of input OR a comma, followed by desired string, followed by a comma OR the end of input.
You cannot put ^ or $ in character classes, so I tried \A and \Z but that didn't work.
([\A,])Apple([\Z,])
This didn't work, sadly. Can I do this with one regular expression? Seems like this would be a common enough problem.
It will depend on your language, but if the one you use supports lookarounds, then you would use something like this:
(?<=,|^)Apple(?=,|$)
Replace with GRAPE.
Otherwise, you will have to put back the commas:
(^|,)Apple(,|$)
Or
(\A|,)Apple(,|\Z)
And replace with:
\1GRAPE\2
Or
$1GRAPE$2
Depending on what's supported.
The above are raw regex (and replacement) strings. Escape as necessary.
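For C++, std::regex (ECMAScript) supports lookahead but not lookbehind, so a sketch would use the (^|,)Apple(?=,|$) hybrid from further down: the leading comma (or start of line) is captured and written back with $1, while the trailing comma is only checked, not consumed. Because the trailing comma stays available, consecutive fields like ",Apple,Apple," are both replaced in one pass. The name replace_apple is hypothetical, and the sketch assumes one CSV line per call.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replaces every field that is exactly "Apple" with
// "GRAPE" in a single CSV line.
std::string replace_apple(const std::string& line) {
    // (^|,)     start of the line or a comma, captured so it can be kept
    // (?=,|$)   followed by a comma or the end of the line, not consumed
    static const std::regex re("(^|,)Apple(?=,|$)");
    return std::regex_replace(line, re, "$1GRAPE");
}
```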
Note: The disadvantage of the latter solution is that it will not work on strings like:
xxx,Apple,Apple,xxx,xxx
because the comma after the first Apple is consumed by the first match. If you have such cases, you'd have to run the regex replacement a second time.
Oh, and I forgot to mention, you can have some 'hybrids', since some languages have different levels of support for lookbehinds (in all the patterns below, ^ and \A, $ and \Z, \1 and $1 are interchangeable, just so I don't make this longer than it already is):
(?:(?<=,)|(?<=^))Apple(?=,|$)
For those where lookbehinds cannot be of variable width, replace with GRAPE.
(^|,)Apple(?=,|$)
The above one is for flavors where lookaheads are supported but lookbehinds are not. Replace with \1GRAPE.
This does as you wish:
Find what: (^|,)(?:Apple)(,|$)
Replace with: $1GRAPE$2
This works on regex101, in all flavors.
http://regex101.com/r/iP6dZ8
I wanted to share my original work-around (before the other answers), though it feels like more of a hack.
I simply prepend and append a comma on the string before doing the simpler:
/,Apple,/,GRAPE,/g
then cut off the first and last character.
In PHP this looks like:
$line = substr(preg_replace($search, $replace, ','.$line.','), 1, -1);
This still suffers from the problem of consecutive columns (e.g. ",Apple,Apple,").

C# String Format Placeholders in Regular Expressions

I have a regular expression, defined in a verbatim C# string type, like so:
private static string importFileRegex = @"^{0}(\d{4})[W|S]\.(csv|cur)";
The first 3 letters, right after the regex line start (^) can be one of many possible combinations of alphabetic characters.
I want to do an elegant String.Format using the above, placing my 3-letter combination of choice at the start and using this in my matching algorithm, like this:
string regex = String.Format(importFileRegex, "ABC");
Which will give me a regex of ^ABC(\d{4})[W|S]\.(csv|cur)
The problem is that, because I have other curly braces in the string (e.g. \d{4}), String.Format looks for something to put there, can't find it, and gives me an error:
System.FormatException : Index (zero based) must be greater than or equal to zero and less than the size of the argument list.
Does anyone know how, short of splitting the strings out, I can escape the other curly braces or otherwise avoid the above error?
Try this (notice the double curly braces):
@"^{0}(\d{{4}})[W|S]\.(csv|cur)"
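For comparison in C++: C++20's std::format uses the same {{ and }} escaping for literal braces, but building the pattern by plain string concatenation sidesteps placeholders entirely, so {4} needs no escaping. This sketch does that; the name import_file_regex is hypothetical, it assumes the prefix contains only letters (no regex metacharacters), and it writes the class as [WS], since [W|S] would also accept a literal '|'.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: builds the import-file pattern for a given
// three-letter prefix by concatenation instead of a format string.
std::regex import_file_regex(const std::string& prefix) {
    // Yields e.g. ^ABC(\d{4})[WS]\.(csv|cur)
    return std::regex("^" + prefix + R"((\d{4})[WS]\.(csv|cur))");
}
```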