Regex match Pattern through Characters

I have a pattern, (^([-]?\d+([.]\d+)?,){6}([10],)([-]?\d+([.]\d+)?)$), which matches "26.9841,300.007666,4,1,0,15,1,0". That is what I want. However, the pattern does not match the following strings:
"26 . 9841,300 . 007666,4,1,0,15,1,0"
"26.9841\n,300.007666\n,4,\n1,0,15,1,0"
"2 6 . 9 8 4 1 ,\n 3 0 0 .0 0 7 6 6 6 , 4 \n, 1 , 0 , 1 5 , 1 , 0"
These are the exact same string, just with random spaces and newlines thrown in.
I could match those with the following Pattern:
(^([-]?\s*?\n*?[0-9 ]+\s*?\n*?(\s*?\n*?[.]\s*?\n*?[0-9 ]+\s*?\n*?)?\s*?\n*?,\s*?\n*?){6}([10]\s*?\n*?,)(\s*?\n*?[-]?\s*?\n*?[0-9 ]+\s*?\n*?([.]\s*?\n*?[0-9 ]+\s*?\n*?)?)$)
This matches strings 1, 2, and 3, but the pattern is absurd and can most likely be simplified. It also doesn't handle every newline (it won't match a newline occurring inside a [0-9]+ chunk); it just slaps "\s*?\n*?" in wherever it can.
Question
Is there a way to match through those characters, ignoring their occurrence, as long as the pattern would match if they weren't there?
Note:
Input String should match: ((Decimal|Int),){6}(1|0),(Decimal|Int)
If New line characters appear at the end of the pattern assume that no more input can be found.
I cannot remove those characters from the input String as I need to know that they were there.
I do not care about leading or trailing spaces/new-lines
Pattern will always start with "-" or "[0-9]" (yes 0 can be the first char)
Pattern will always end with [0-9]
Edit
This Regex works and passes my test suite: (^(-?\s*[0-9]\s*[\s.0-9]*,){6}(\s*[10]\s*,)(\s*-?\s*[0-9][\s.0-9]*?)$)

If you want a test that does a little more validation, the pattern below is more appropriate.
However, any time you intersperse the whitespace construct \s between like clusters such as (?:\s*\d)+, and most of your data looks like that, there is a risk that the engine has no fixed waypoints at which to end the search, so a failing match can backtrack heavily.
This particular regex might work, though.
^\s*((?:\s*[-]?(?:\s*\d)+(?:\s*[.](?:\s*\d)+)?\s*,){6}\s*[10]\s*,\s*[-]?(?:\s*\d)+?(?:\s*[.](?:\s*\d)+?)?)$
https://regex101.com/r/YmKJgW/1
The capture group 1 is a convenience that strips leading white space from the match.
^
\s*
( # (1 start)
(?:
\s* [-]?
(?: \s* \d )+
(?:
\s* [.]
(?: \s* \d )+
)?
\s* ,
){6}
\s*
[10] \s* , \s*
[-]?
(?: \s* \d )+?
(?:
\s* [.]
(?: \s* \d )+?
)?
) # (1 end)
$
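A quick way to sanity-check the whitespace-tolerant pattern is to run it against the sample strings from the question. The sketch below uses Python's re module, which is an assumption since the question does not name a language; the names TOLERANT and matches are illustrative only, and the \n sequences in the samples are treated as real newline characters.

```python
import re

# The answer's pattern: optional whitespace is allowed before every sign,
# digit, decimal point, and comma.
TOLERANT = re.compile(
    r"^\s*((?:\s*[-]?(?:\s*\d)+(?:\s*[.](?:\s*\d)+)?\s*,){6}"
    r"\s*[10]\s*,\s*[-]?(?:\s*\d)+?(?:\s*[.](?:\s*\d)+?)?)$"
)

def matches(text: str) -> bool:
    """Return True when text fits the pattern despite interspersed whitespace."""
    return TOLERANT.match(text) is not None

samples = [
    "26.9841,300.007666,4,1,0,15,1,0",
    "26 . 9841,300 . 007666,4,1,0,15,1,0",
    "26.9841\n,300.007666\n,4,\n1,0,15,1,0",
    "2 6 . 9 8 4 1 ,\n 3 0 0 .0 0 7 6 6 6 , 4 \n, 1 , 0 , 1 5 , 1 , 0",
]
results = [matches(s) for s in samples]

# The seventh field must be 1 or 0, so this variant is rejected.
rejected = matches("26.9841,300.007666,4,1,0,15,2,0")
```

Because the original string is never modified, the caller can still inspect it afterwards to see where the spaces and newlines were.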

Related

replace groups in regex loop

I have these 2 lines:
What is P(output1|cause1=2, cause2=2)
What is P(output2|cause3=2)
I would like to change it to:
method_to_use(model, {"cause1": 2, "cause2": 2}, "output1")
method_to_use(model, {"cause3": 2}, "output2")
this is my regex:
.*P[(]([a-z1-9]+)[|](([a-z1-9]+)=([1-9]),?)+[)]
and I try to replace it like this:
method_to_use(model, {"$3": $4}, "$1")
but I only get the last fit of the group:
method_to_use(model, {"cause2": 2}, "output1")
Is it possible to do some kind of "loop" and replace all of the matches along the way?

You could match the string with the following regular expression.
^.*P\(([^|]+)\|([^=]+)=(\d+)(?:, +([^=]+)=(\d+))?\)$
If capture group 4 is non-empty replace the (whole-string) match with
method_to_use(model, {"$2": $3, "$4": $5}, "$1")
This causes the string
What is P(output1|cause1=2, cause2=2)
to be replaced with
method_to_use(model, {"cause1": 2, "cause2": 2}, "output1")
Demo 1
If capture group 4 is empty replace the match with
method_to_use(model, {"$2": $3}, "$1")
This causes the string
What is P(output2|cause3=2)
to be replaced with
method_to_use(model, {"cause3": 2}, "output2")
Demo 2
Note that the regular expressions at the two links are equivalent; the only difference is that at Demo 1 I expressed the regex in free-spacing mode, which permits it to be self-documenting.
Instead of replacing the entire string, one could of course simply form the new string from the values of the capture groups. If that is done, ^.*P at the beginning of the regex could be changed to simply P.
The regex engine performs the following operations.
^ # match beginning of line
.*P\( # match 0+ chars then 'P('
([^|]+) # save 1+ chars except '|' in cap grp 1 (output)
\| # match '|'
([^=]+) # save 1+ chars except '=' in cap grp 2 (causeA)
= # match '='
(\d+) # save 1+ digits in cap grp 3 (causeAval)
(?: # begin non-cap grp
,\ + # match ',' then 1+ spaces
([^=]+) # save 1+ chars except '=' in cap grp 4 (causeB)
= # match '='
(\d+) # save 1+ digits in cap grp 5 (causeBval)
)? # end non-cap grp and make it optional
\) # match ')'
$ # match end of line
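The "form the new string from the capture groups" idea can be sketched in Python as follows. This is only an illustration under the assumption that Python's re module is available; PATTERN and rewrite are made-up names, while model and method_to_use come from the question. Like the regex itself, it handles one or two causes and leaves any other line unchanged.

```python
import re

# Same regex as above: group 1 = output, groups 2/3 = first cause and value,
# groups 4/5 = optional second cause and value.
PATTERN = re.compile(r"^.*P\(([^|]+)\|([^=]+)=(\d+)(?:, +([^=]+)=(\d+))?\)$")

def rewrite(line: str) -> str:
    """Build the replacement text from the capture groups."""
    m = PATTERN.match(line)
    if m is None:
        return line  # not in the expected shape; leave untouched
    output, cause_a, val_a, cause_b, val_b = m.groups()
    pairs = f'"{cause_a}": {val_a}'
    if cause_b is not None:  # the optional group took part in the match
        pairs += f', "{cause_b}": {val_b}'
    return f'method_to_use(model, {{{pairs}}}, "{output}")'

two_causes = rewrite("What is P(output1|cause1=2, cause2=2)")
one_cause = rewrite("What is P(output2|cause3=2)")
```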

One thing is certain: you can't do that with a single regex.
You may use a 3-step approach:
Replace .*P\( with method_to_use(
Replace \(\K(\w+)\|([^()]+) with model, {$2}, "$1"
Replace (\w+)=(\w+) with "$1": $2
Note that
.*P\( matches any 0 or more chars other than line break chars, as many as possible, and then P(
\(\K(\w+)\|([^()]+) matches (, then discards it from the match value with \K, and then 1+ word chars are captured into Group 1 ($1 in the replacement pattern), | is matched and then 1+ chars other than ) and ( are captured into Group 2 ($2)
(\w+)=(\w+) - 1+ word chars are captured into Group 1 ($1 in the replacement pattern), = is matched and then 1+ word chars are captured into Group 2 ($2).
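The three steps above can be sketched in Python. Python's re module does not support \K, so this sketch swaps in a lookbehind, (?<=\(), for the second step; that substitution and the function name three_step are assumptions made for illustration.

```python
import re

def three_step(line: str) -> str:
    """Apply the three replacements in order."""
    # Step 1: everything up to and including "P(" becomes "method_to_use(".
    line = re.sub(r".*P\(", "method_to_use(", line)
    # Step 2: "output|pairs" becomes 'model, {pairs}, "output"'.
    # (?<=\() stands in for \(\K, which Python's re does not have.
    line = re.sub(r"(?<=\()(\w+)\|([^()]+)", r'model, {\2}, "\1"', line)
    # Step 3: every name=value pair becomes "name": value.
    line = re.sub(r"(\w+)=(\w+)", r'"\1": \2', line)
    return line

first = three_step("What is P(output1|cause1=2, cause2=2)")
second = three_step("What is P(output2|cause3=2)")
third = three_step("P(o|a=1, b=2, c=3)")
```

Because step 3 replaces every pair independently, this approach is not limited to one or two causes.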

How can I allow one space in a regular expression

The following Regex checks for a number which starts with 6, 8 or 9 and has to be exactly 8 digits long.
/^(6|8|9)\d{7}$/
Now I want to accept one space in between digits as well, but don't know where to start.
For example both 61234567 and 6123 4567 should be allowed, but only the first one passes my current regex.
Can you help me create it?

You may use
^(?!.*(?:\s\d+){2})[689](?:\s?\d){7}$
See the regex demo
Details
^ - start of string
(?!.*(?:\s\d+){2}) - a negative lookahead that fails the match if, after any 0+ chars other than line break chars (as many as possible), there are two occurrences of a whitespace char followed by 1+ digits
[689] - 6, 8 or 9
(?:\s?\d){7} - seven occurrences of an optional whitespace followed with a single digit
$ - end of string.
To allow leading/trailing whitespace, add \s? (1 or 0) or \s* (0 or more) right after ^ and before $.
To allow a single chunk of one or more whitespace chars in the digit string, use
^(?!.*(?:\s+\d+){2})[689](?:\s*\d){7}$
See this regex demo.
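A small check of the first pattern, written in Python as an assumption about the host language (the question's /.../ delimiters suggest otherwise, but the pattern itself is portable); ONE_SPACE and is_valid are illustrative names.

```python
import re

# 8 digits starting with 6, 8 or 9, with at most one single whitespace
# character somewhere between the digits.
ONE_SPACE = re.compile(r"^(?!.*(?:\s\d+){2})[689](?:\s?\d){7}$")

def is_valid(number: str) -> bool:
    return ONE_SPACE.match(number) is not None

checks = {
    "61234567": is_valid("61234567"),      # no space
    "6123 4567": is_valid("6123 4567"),    # one space
    "61 23 4567": is_valid("61 23 4567"),  # two spaces
    "71234567": is_valid("71234567"),      # wrong first digit
    "6123456": is_valid("6123456"),        # too short
}
```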

You could use the regular expression
/^[689](?:\d{7}|(?=.{8}$)\d* \d+)$/
demo
We can make this self-documenting by writing it in free-spacing mode:
/
^ # match beginning of line
[689] # match '6', '8' or '9'
(?: # begin non-capture group
\d{7} # match 7 digits
| # or
(?=.{8}$) # require the remainder of the line to have 8 chars
\d*\ \d+ # match 0+ digits, a space, 1+ digits
) # end non-capture group
$ # match end of line
/x # free-spacing regex definition mode
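The same free-spacing layout can be tried in Python with re.VERBOSE. This is a sketch under the assumption that Python is acceptable for testing; EIGHT_DIGITS and accepts are hypothetical names. In verbose mode an escaped space (backslash followed by a space) is needed to match a literal space.

```python
import re

EIGHT_DIGITS = re.compile(r"""
    ^              # beginning of the string
    [689]          # '6', '8' or '9'
    (?:
        \d{7}      # 7 more digits
      |            # or
        (?=.{8}$)  # exactly 8 chars remain (7 digits plus one space)
        \d*\ \d+   # 0+ digits, a space, 1+ digits
    )
    $              # end of the string
""", re.VERBOSE)

def accepts(number: str) -> bool:
    return EIGHT_DIGITS.match(number) is not None

outcomes = [
    accepts("61234567"),
    accepts("6123 4567"),
    accepts("6 1234567"),
    accepts("61234567 "),
    accepts("612 34 567"),
]
```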

Using regex on a file to pull data out. Having issues with multi-line

I am looking to get to the next line of data within a text file. Here is an example of data from the file I am working with.
0519 ABF 244 AN A1 ADV STUFF 1.0 2.0 Somestuff 018 0155 MTWTh 10:30A 11:30A 20 20 0 6.7
Somestuff 011 0145 MTWTh 12:30P 1:30P
I have been trying to move to the next line in a variety of ways: matching the line break with \n, using \s+ to consume the large gap after 6.7, and adding the m modifier (//m). None of these has produced a result yet.
Here is some example code
while !regex_file.eof?
  line = regex_file.gets.chomp
  # a regex literal needs its / delimiters
  if line =~ /^.*?\d{4}\s+[A-Z]+\s+\d{3}.+$/
    puts line
  end
end
Using https://rubular.com/ this particular set of code matches my desired output for the first line
0519 ABF 244 AN A1 ADV STUFF 1.0 2.0 Somestuff 018 0155 MTWTh 10:30A 11:30A 20 20 0 6.7
but does not match and haven't figured out how to match the next line.
Somestuff 011 0145 MTWTh 12:30P 1:30P

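Try something like this: the \n matches the newline, and you can add your own rules after it to capture whatever follows on the next line. Note that this only helps if the string being matched contains both lines (for example, the whole file contents); a single line read with gets.chomp never contains \n.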
^.*\d{4}\s+[A-Z]+\s+\d{3}.+\n.*$

I've made an arbitrary assumption about the requirements for matching the second line. It is more demanding than the requirements for matching the first that are reflected in your regex, but I thought the additional complexity would have some educational value for you.
Here is a regular expression (untested) for matching both lines. Note you don't need ^.*? at the beginning of the regex and for the part of the regex that matches the first line .+$ adds nothing, so I removed it. After all you are just matching each line separately (line), and will display the entire line if there's a match. As well, the end-of-string anchor \z is more appropriate than the end-of-line anchor ($), though either can be used.
r = /
(?: # begin non-capture group
\d{4} # match 4 digits
\s+ # match > 0 whitespaces
[A-Z]+ # match > 0 uppercase letters
\s+ # match > 0 whitespaces
\d{3} # match 3 digits
| # or
\b # match a (zero-width) word break
[A-Z] # match 1 uppercase letter
[a-z]* # match >= 0 lowercase letter
\s+ # match > 0 whitespaces
\d{3} # match 3 digits
\s+ # match > 0 whitespaces
\d{4} # match 4 digits
\s+ # match > 0 whitespaces
[A-Za-z]+ # match > 0 letters
(?: # begin non-capture group
\s+ # match > 0 whitespaces
(?: # begin a non-capture group
0?\d # match an optional 0 followed by any digit
| # or
1[012] # match 1 followed by 0, 1 or 2
) # end non-capture group
: # match a colon
[0-5][0-9] # match 0-5 followed by 0-9
[AP] # match the A or P suffix
){2} # end non-capture group and execute twice
) # end non-capture group
/x # free-spacing regex definition mode
This regular expression is conventionally written as follows.
r = /(?:\d{4}\s+[A-Z]+\s+\d{3}|\b[A-Z][a-z]*\s+\d{3}\s+\d{4}\s+[A-Za-z]+(?:\s+(?:0?\d|1[012]):[0-5][0-9][AP]){2})/
You might go through the file putsing matching lines as follows:
File.foreach(fname) { |line| puts line if line.match? r }
See IO::foreach, which is a very convenient method for reading files line-by-line. Note IO class methods (such as foreach) are commonly invoked with File as their receiver. That's OK, as File.superclass #=> IO, so File inherits those methods from IO.
When used without a block foreach returns an enumerator, which is often convenient as well. If, for example, you wished to return an array of matching lines (rather than puts them), you could write:
File.foreach(fname).with_object([]) do |line, arr|
arr << line.chomp if line.match? r
end
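For readers who want to try the idea outside Ruby, here is a rough Python rendering. It is a sketch, not the answerer's code: the second alternative assumes that hours may be one or two digits and that each time carries an A or P suffix, as in the sample data, and the names ORIGINAL and BOTH_LINES are made up for illustration.

```python
import re

# The regex from the question, for comparison.
ORIGINAL = re.compile(r"^.*?\d{4}\s+[A-Z]+\s+\d{3}.+$")

# Assumed two-alternative pattern covering both line shapes.
BOTH_LINES = re.compile(r"""
    (?:
        \d{4} \s+ [A-Z]+ \s+ \d{3}           # first-line shape: 0519 ABF 244
      |
        \b [A-Z][a-z]* \s+ \d{3} \s+ \d{4}   # second-line shape: Somestuff 011 0145
        \s+ [A-Za-z]+                        # day codes such as MTWTh
        (?: \s+ (?:0?\d|1[012]) : [0-5][0-9] [AP] ){2}   # two times such as 12:30P
    )
""", re.VERBOSE)

lines = [
    "0519 ABF 244 AN A1 ADV STUFF 1.0 2.0 Somestuff 018 0155 MTWTh 10:30A 11:30A 20 20 0 6.7",
    "Somestuff 011 0145 MTWTh 12:30P 1:30P",
    "Course listing",
]

# Keep only the lines that fit one of the two shapes.
matched = [line for line in lines if BOTH_LINES.search(line)]
```

In this sketch ORIGINAL matches only the first sample line, while BOTH_LINES keeps the first two lines and drops the third.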

Your current regex:
^.*?\d{4}\s+[A-Z]+\s+\d{3}.+$
matches in this order:
the beginning of the line (^)
zero or more characters non-greedy .*?
four digits (\d{4})
one or more whitespace characters (\s+)
one or more capital letters ([A-Z]+)
one or more whitespace characters (\s+)
three digits (\d{3})
one or more characters (.+)
the end of the line ($)
The second line of your file is:
Somestuff 011 0145 MTWTh 12:30P 1:30P
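The regex starts matching at 0145 MTWT but then fails: the next character is the lowercase h rather than whitespace, and no three-digit group (\d{3}) follows either.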

Specific password regular expression

I am having problems creating a regular expression. It needs to fulfill the following:
1) Has 8-12 characters
2) At least 1 uppercase letter
3) At least 3 lowercase letters
4) At least 1 number
5) At least 1 special character
6) Has to start with a lowercase letter, an uppercase letter, or a digit
7) Maximum of 2 repeating characters
Thanks in advance!

This should work
^(?=.*[A-Z])(?=(?:.*[a-z]){3})(?=.*[0-9])(?=.*[!"#$%&'()*+,\-./:;<=>?@[\]^_`{|}~])(?=(?:(.)(?!\1\1))+$)[a-zA-Z0-9].{7,11}$
Explained / Expanded
^ # BOS
(?= .* [A-Z] ) # 1 upper
(?=
(?: .* [a-z] ){3} # 3 lower
)
(?= .* [0-9] ) # 1 number
(?=
.* [!"#$%&'()*+,\-./:;<=>?@[\]^_`{|}~] # 1 special
)
(?= # Maximum 2 repeating
(?:
( . ) # (1)
(?! \1 \1 )
)+
$
)
[a-zA-Z0-9] # First alnum
.{7,11} # 8 to 12 max chars
$ # EOS
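The lookahead-per-rule structure can be exercised with a few sample passwords. The Python sketch below is an assumption about the target engine (the question does not name one); it takes the special-character class to be ASCII punctuation including @, escapes the [ inside the class for safety, and uses made-up names PASSWORD and is_acceptable.

```python
import re

PASSWORD = re.compile(
    r'''^(?=.*[A-Z])(?=(?:.*[a-z]){3})(?=.*[0-9])'''   # 1 upper, 3 lower, 1 digit
    r'''(?=.*[!"#$%&'()*+,\-./:;<=>?@\[\]^_`{|}~])'''  # 1 special character
    r'''(?=(?:(.)(?!\1\1))+$)'''                       # no char three times in a row
    r'''[a-zA-Z0-9].{7,11}$'''                         # alnum first, 8 to 12 chars
)

def is_acceptable(password: str) -> bool:
    return PASSWORD.match(password) is not None

verdicts = {
    "abcD1!xy": is_acceptable("abcD1!xy"),            # satisfies every rule
    "abcD1!x": is_acceptable("abcD1!x"),              # only 7 characters
    "!bcD1axy": is_acceptable("!bcD1axy"),            # starts with a special char
    "aaaD1!xy": is_acceptable("aaaD1!xy"),            # 'a' three times in a row
    "abcd1!xy": is_acceptable("abcd1!xy"),            # no uppercase letter
    "abD1!XYZ": is_acceptable("abD1!XYZ"),            # only 2 lowercase letters
    "abcD1!xyzuvwq": is_acceptable("abcD1!xyzuvwq"),  # 13 characters
}
```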

What have you got so far?
Also, which regex flavor are you using?
I'd start with the length of the expression.
Restrict it to 8-12 characters, with something like [a-zA-Z]{8,12}
For the requirement on the first character you can use a [] character class
The other requirements are a little trickier

Regex to fail if multiple matches found

Take the following regex:
P[0-9]{6}(\s|\.|,)
This is designed to check for a 6 digit number preceded by a "P" within a string - works fine for the most part.
The problem is that we need the match to fail if more than one occurrence is found. Is that possible?
i.e. make Text 4 in the following screenshot fail but still keep all the others failing / passing as shown:
(this RegEx is being executed in a SQL .net CLR)

If the regex engine used by this tool is indeed the .NET engine, then you can use
^(?:(?!P[0-9]{6}[\s.,]).)*P[0-9]{6}[\s.,](?:(?!P[0-9]{6}[\s.,]).)*$
If it's the native SQL engine, then you can't do it with a single regex match because those engines don't support lookaround assertions.
Explanation:
^ # Start of string
(?: # Start of group which matches...
(?!P[0-9]{6}[\s.,]) # unless it's the start of Pnnnnnn...
. # any character
)* # any number of times
P[0-9]{6}[\s.,] # Now match Pnnnnnn exactly once
(?:(?!P[0-9]{6}[\s.,]).)* # Match anything but Pnnnnnn
$ # until the end of the string
Test it live on regex101.com.
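To see the "exactly once" behaviour in action, the pattern can be compared with simply counting occurrences. The sketch below uses Python rather than .NET, which is an assumption made for convenience; CODE, EXACTLY_ONE, has_exactly_one and count_codes are illustrative names.

```python
import re

CODE = r"P[0-9]{6}[\s.,]"  # the building block from the question

# Anything that does not start a code, then one code, then again anything
# that does not start a code.
EXACTLY_ONE = re.compile(rf"^(?:(?!{CODE}).)*{CODE}(?:(?!{CODE}).)*$")

def has_exactly_one(text: str) -> bool:
    return EXACTLY_ONE.match(text) is not None

def count_codes(text: str) -> int:
    """Reference check: count the occurrences directly."""
    return len(re.findall(CODE, text))

texts = [
    "Ref P123456, approved",   # one code
    "P123456, and P654321.",   # two codes
    "no code here",            # none
]
flags = [has_exactly_one(t) for t in texts]
counts = [count_codes(t) for t in texts]
```

Note that . does not match a line break by default, so multi-line input would need the engine's "dot matches newline" option (re.DOTALL in Python, RegexOptions.Singleline in .NET).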

or use this pattern
^(?!(.*P[0-9]{6}[\s.,]){2})(.*P[0-9]{6}[\s.,].*)$
Demo
Basically, this checks that the pattern occurs and that it does not occur twice.
^ Start of string
(?! Negative Look-Ahead
( Capturing Group \1
. Any character except line break
* (zero or more)(greedy)
P "P"
[0-9] Character Class [0-9]
{6} (repeated {6} times)
[\s.,] Character Class [\s.,]
) End of Capturing Group \1
{2} (repeated {2} times)
) End of Negative Look-Ahead
( Capturing Group \2
. Any character except line break
* (zero or more)(greedy)
P "P"
[0-9] Character Class [0-9]
{6} (repeated {6} times)
[\s.,] Character Class [\s.,]
. Any character except line break
* (zero or more)(greedy)
) End of Capturing Group \2
$ End of string
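A brief check of this lookahead variant, again in Python as an assumed test harness; NOT_TWICE and occurs_once are made-up names.

```python
import re

# The negative lookahead rejects any string containing two codes; the rest
# of the pattern requires at least one.
NOT_TWICE = re.compile(r"^(?!(.*P[0-9]{6}[\s.,]){2})(.*P[0-9]{6}[\s.,].*)$")

def occurs_once(text: str) -> bool:
    return NOT_TWICE.match(text) is not None

single = occurs_once("Ref P123456, approved")
double = occurs_once("P123456, and P654321.")
absent = occurs_once("no code here")
```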
