Customised regex in ruby [closed] - regex

I want to check if a string states Above 60. Example:
'>60', '> 60', 'above60', 'above 60', 'Above60', 'Above 60', OR more than 1 space in between (> and 60), (above and 60), (Above and 60).
How can I write a regex to validate a string that starts with either (>, above or Above) then any number of spaces and ends with 60?

How can I write a regex to validate a string that starts with either (>, above or Above) then any number of spaces and ends with 60?
It's very straightforward:
a string that starts with either >, above or Above
/^(>|above|Above)/
then any number of spaces
/^(>|above|Above)\s*/
and ends with 60
/^(>|above|Above)\s*60$/
Note that in Ruby, ^ and $ match beginning and end of a line, not string. You might want to change that to \A and \z respectively. And instead of specifying both cases explicitly (above / Above), you could append an i to make the regexp case-insensitive, i.e. /^(>|above)\s*60$/i.
As always, there's more than one way to get the desired result. I can recommend Rubular for fiddling with regular expressions: https://rubular.com/r/EEHBSOB3PK2Djk
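The difference between line anchors and string anchors is easiest to see with a multi-line input. The following is a minimal sketch; the constant names and the sample strings are made up for illustration.

```ruby
# Line anchors (^ and $) versus string anchors (\A and \z) in Ruby.
LINE_ANCHORED   = /^(?:>|above)\s*60$/i
STRING_ANCHORED = /\A(?:>|above)\s*60\z/i

# Hypothetical input: only the middle line looks like "above 60".
MULTI_LINE = "hello\n> 60\nworld"

puts MULTI_LINE.match?(LINE_ANCHORED)      # true: ^ and $ match around the middle line
puts MULTI_LINE.match?(STRING_ANCHORED)    # false: the whole string has to match
puts 'Above   60'.match?(STRING_ANCHORED)  # true
```

If the input is user-supplied and may contain line breaks, the \A ... \z form is probably the safer default.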

r = /\A(?:>|[aA]bove) *60\z/
['>60', '> 60', 'above60', 'above 60', 'Above60', 'Above 60'].all? do |s|
s.match?(r)
end
#=> true
[' >60', '> 600', ' above60', 'above 600', 'Above60 '].any? do |s|
s.match?(r)
end
#=> false
We can write the regular expression in free-spacing mode to make it self-documenting.
/
\A # match beginning of string
(?: # begin a non-capture group
> # match '>'
| # or
[aA] # match 'a' or 'A'
bove # match 'bove'
) # end non-capture group
[ ]* # match 0+ spaces
60 # match '60'
\z # match end of string
/x # invoke free-spacing regex definition mode
Notice that in the above I placed the space in a character class ([ ]). Alternatively I could have escaped the space, used [[:space:]] or one of a few other options. Without protecting the space in one of these ways it would be stripped out (when using free-spacing mode) before the regex is parsed.
When spaces are to be reflected in a regex I use space characters rather than whitespace characters (\s), mainly because the latter also match end-of-line terminators (\n and possibly \r) which can result in problems.
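Both points can be checked quickly. This is only a sketch with invented constant names and inputs: it compares a literal space with \s on a string containing a line break, and shows what happens to an unprotected space in free-spacing mode.

```ruby
SPACES_ONLY    = /\A(?:>|[aA]bove) *60\z/
ANY_WHITESPACE = /\A(?:>|[aA]bove)\s*60\z/

# Hypothetical input with a line break between the word and the number.
WITH_NEWLINE = "above\n60"

puts WITH_NEWLINE.match?(SPACES_ONLY)     # false: a newline is not a space
puts WITH_NEWLINE.match?(ANY_WHITESPACE)  # true: \s also matches "\n"

# In free-spacing mode a bare space is ignored, so this behaves like /\Aabove60\z/.
UNPROTECTED = /\A above 60 \z/x
# Putting the space in a character class keeps it.
PROTECTED   = /\A above [ ] 60 \z/x

puts 'above 60'.match?(UNPROTECTED)  # false
puts 'above60'.match?(UNPROTECTED)   # true
puts 'above 60'.match?(PROTECTED)    # true
```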

This should work :
/(>|above|more than){0,1} *60\+*/i
The i at the end of the regex is for case insensitivity.
If you need additional prefixes, add each one after "more than", separated by a pipe (|).
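Note that this pattern has no anchors and its prefix is optional, so it accepts more than the question asks for. The sketch below shows that behaviour next to an anchored variant; ANCHORED_ABOVE is a hypothetical rewrite for comparison, not part of the answer above.

```ruby
LOOSE_ABOVE = /(>|above|more than){0,1} *60\+*/i

puts 'more than 60'.match?(LOOSE_ABOVE)  # true
puts 'below 60'.match?(LOOSE_ABOVE)      # true: the prefix is optional
puts '1600'.match?(LOOSE_ABOVE)          # true: "60" may appear anywhere

# Hypothetical stricter variant: mandatory prefix, anchored at both ends.
ANCHORED_ABOVE = /\A(?:>|above|more than) *60\+*\z/i

puts 'More than  60+'.match?(ANCHORED_ABOVE)  # true
puts 'below 60'.match?(ANCHORED_ABOVE)        # false
puts '1600'.match?(ANCHORED_ABOVE)            # false
```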

Get data between two pipes present in payload [duplicate]

I have recently started learning regex in Ruby and I want to extract specific data from a payload.
My payload looks something like this:
2021-02-01T16:06:06.703Z CEF:0|ABCD|Sample text|Numbers|Sample random Text |This value is random and i want to take this value out from payload|9|rest of the payload
Since my data is present between pipes (||), I wrote this regex:
(?<=\|)[^|]++(?=\|)
But the problem is that this regex returns every value found between two pipes.
Can anyone help me extract only the value between the 5th and 6th pipe?
You wish to extract the text that is between the 5th and 6th pipe. You can do that with the following regular expression.
r = /\A(?:[^|]*\|){5}\K[^|]*(?=\|)/
str = "2021-02-01T16:06:06.703Z CEF:0|ABCD|Sample text|Numbers|Sample random Text |My dog has fleas|9|rest of the payload"
str[r] #=> "My dog has fleas"
"a|b|c|d|e|My dog has fleas"[r]
#=> nil
We can write the regular expression in free-spacing mode to make it self-documenting. Free-spacing mode causes Ruby's regex engine to remove all comments and spaces before parsing the expression (which means that any spaces that are intended need to be protected, by escaping them, by putting them in a character class, etc.).
/
\A # match beginning of the string
(?: # begin a non-capture group
[^|]* # match any character other than a pipe zero or more times
\| # match a pipe
){5} # end non-capture group and execute it 5 times
\K # discard all previous matches and reset the start of the
# match to the current location
[^|]* # match any character other than a pipe zero or more times
(?= # begin a positive lookahead to assert that the next
# character is a pipe
\| # match a pipe
)
/x # invoke free-spacing mode
Another way is to remove \K and add a capture group:
str[/\A(?:[^|]*\|){5}([^|]*)(?=\|)/, 1]
#=> "My dog has fleas"
Of course, you don't need to use a regular expression for this:
str.count('|') > 5 && str.split('|')[5]
#=> "My dog has fleas"
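The split-based one-liner differs from the regex in two small ways: it yields false rather than nil when there are too few pipes, and a plain split('|') drops trailing empty fields. A minimal sketch of a helper that behaves more like the regex follows; the method name field_between_pipes is invented for this example.

```ruby
# Returns the text between pipe number n and pipe number n + 1 (counting from 1),
# or nil when the string contains fewer than n + 1 pipes.
def field_between_pipes(str, n)
  fields = str.split('|', -1) # the -1 limit keeps trailing empty fields
  fields.length > n + 1 ? fields[n] : nil
end

PAYLOAD = '2021-02-01T16:06:06.703Z CEF:0|ABCD|Sample text|Numbers|' \
          'Sample random Text |My dog has fleas|9|rest of the payload'

p field_between_pipes(PAYLOAD, 5)                       # "My dog has fleas"
p field_between_pipes('a|b|c|d|e|My dog has fleas', 5)  # nil: there is no 6th pipe
p field_between_pipes('a|b|c|d|e||', 5)                 # "": empty field between pipes 5 and 6
```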

regex for certain characters and rules [closed]

I am trying to build a regex to ensure a given string only contains these 13 certain characters/rules. I am having a bit of trouble. Any help would be appreciated.
Allowed Characters:
a-z
A-Z
0-9
! (Exclamation point)
- (Hyphen)
_ (Underscore)
. (Period)
* (Asterisk)
' (Single Quote)
( (Open parenthesis)
) (Close parenthesis)
(No consecutive spaces)
*. (CANNOT end with a period)
So Far I have this
/^[+\-0-9().!-_*' ]+$/g
But not getting expected results. Thank you in advance.
EDIT:
Sorry, first time posting here. Here are some test cases (JS). The second one should not pass because it has consecutive spaces and ends with a period:
let testOne = "Testing Regex - 2021 Th*i)s_On(e_pa!'ss.es.end";
let testTwo = "Testing Regex - 2021 Th*i)s_On(e_pa!'ss.es.end but  shouldn't.";
testOne.match(/^[+\-\w().!-_*' ]+$/g);
testTwo.match(/^[+\-\w().!-_*' ]+$/g);
Some issues:
Your regex does not allow for Latin letters: you didn't list them.
Your regex allows for some additional characters (including $, # and %) because of !-_, which specifies a range.
There is no provision for not allowing more than a single space.
There is no provision for not allowing a dot as last character
The g modifier has little purpose when you have end-of-string markers requiring that a match will always match the whole input.
From your regular expression it seems you also require that the input has at least 1 character.
Taking all that together, we get this:
/^(?!.*? {2})(?!.*?\.$)[\w+\-().!*' ]+$/
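The range problem is easy to reproduce. In the question's character class, the hyphen in !-_ sits between two characters and is therefore read as a range from "!" to "_" in ASCII order. The sketch below uses Ruby only for consistency with the earlier examples (character classes behave the same way in JavaScript here); HYPHEN_LAST is a hypothetical variant for comparison.

```ruby
# The character class from the question, anchored to the whole string.
ORIGINAL_CLASS = /\A[+\-0-9().!-_*' ]+\z/

puts '$#%'.match?(ORIGINAL_CLASS)  # true: all three lie between "!" and "_"
puts 'ABC'.match?(ORIGINAL_CLASS)  # true: uppercase letters lie in that range too
puts 'abc'.match?(ORIGINAL_CLASS)  # false: lowercase letters come after "_"

# Hypothetical variant: with the hyphen last, it is a literal hyphen and no range is formed.
HYPHEN_LAST = /\A[+0-9().!_*' -]+\z/

puts '$#%'.match?(HYPHEN_LAST)  # false
puts '1-2'.match?(HYPHEN_LAST)  # true
```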
You can try this:
^(?!.* {2})[\w!()\-*'\s.]+[\w!()\-*'\s]$
https://regex101.com/r/kTcJUN/3
And if you don't want to allow a space character at the end of the string, then:
^(?!.* {2})[\w!()\-*'\s.]+[\w!()\-*']$
Explanation:
(?!.* {2}) - Exclude double space in string
\w - any word character. Matches any letter, digit or underscore. Equivalent to [a-zA-Z0-9_].
! - literally !
( - literally (
) - literally )
- - literally -
* - literally *
' - literally '
\s - space character
. - literally .
+ - quantifier. Matches between one and unlimited times.
[\w!()\-*'\s] - Allow a single character from the list. Putting this just before $ (end of line) makes this character last in string.
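The rules from the question can be tried against the two test strings. This is a sketch in Ruby, used here only for consistency with the earlier examples; the pattern is one possible combination of the ideas above, and the name VALID_INPUT is invented. It assumes the second test string contains a double space, as the question describes.

```ruby
VALID_INPUT = /
  \A
  (?!.*[ ]{2})                  # no two consecutive spaces anywhere
  (?!.*\.\z)                    # must not end with a period
  [A-Za-z0-9!\-_.*'()\x20]+     # one or more allowed characters (\x20 is a space)
  \z
/x

TEST_ONE = "Testing Regex - 2021 Th*i)s_On(e_pa!'ss.es.end"
TEST_TWO = "Testing Regex - 2021 Th*i)s_On(e_pa!'ss.es.end  but shouldn't."

puts TEST_ONE.match?(VALID_INPUT)        # true
puts TEST_TWO.match?(VALID_INPUT)        # false: double space and trailing period
puts 'ends with dot.'.match?(VALID_INPUT)  # false
puts 'bad$char'.match?(VALID_INPUT)        # false: "$" is not in the allowed set
```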

Regex to move return value to function parameter [closed]

I have many lines like the following one:
PlayerInfo[playerid][pValue] = cache_get_value_name_int(i, "field");
However, due to library changes, I now need to replace this line with the following:
cache_get_value_name_int(i, "field", PlayerInfo[playerid][pValue]);
The problem is that PlayerInfo[playerid][pValue] is the part that changes: it is different on every line, and so is "field".
I have a lot of lines that need replacing, at least a couple of hundred, so I want to find some sort of regex to replace them easily.
Any solutions for this?
In Notepad++ you can use this regex:
([^\s]+) = cache_get_value_name_int\(i,\s*("[^"]+")\s*\);
It searches for some number of non-space characters (captured as group 1), followed by = (you might want to use \s*=\s* if spacing can vary), followed by cache_get_value_name_int(i,, a string enclosed in " (captured as group 2) and then a trailing ) and ;.
and replace it with
cache_get_value_name_int\(i, $2, $1\);
Note that you may need to add \s* in places to account for different spacing.
If the value i can also change, you can use this regex which captures that string as well:
([^\s]+) = cache_get_value_name_int\((\w+),\s*("[^"]+")\s*\);
and replace it with:
cache_get_value_name_int\($2, $3, $1\);
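If the files are easier to process with a script than in an editor, the same capture-and-reorder idea can be sketched in Ruby. The method name and the second sample line (pLevel, "level") are hypothetical; only the first line comes from the question.

```ruby
# Group 1: assignment target, group 2: first argument, group 3: quoted field name.
CALL = /(\S+)\s*=\s*cache_get_value_name_int\((\w+),\s*("[^"]+")\s*\);/

# Moves the assignment target into the call as a third argument.
def move_target_into_call(source)
  source.gsub(CALL, 'cache_get_value_name_int(\2, \3, \1);')
end

SAMPLE = <<~PAWN
  PlayerInfo[playerid][pValue] = cache_get_value_name_int(i, "field");
  PlayerInfo[playerid][pLevel] = cache_get_value_name_int(i, "level");
PAWN

puts move_target_into_call(SAMPLE)
# cache_get_value_name_int(i, "field", PlayerInfo[playerid][pValue]);
# cache_get_value_name_int(i, "level", PlayerInfo[playerid][pLevel]);
```

Leading indentation is left alone, because the match only starts at the first non-space character of the assignment target.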
Ctrl+H
Find what: ^(\S+) = (cache_get_value_name_int\(\w+, "\w+")
Replace with: $2, $1
check Wrap around
check Regular expression
Replace all
Explanation:
^ # beginning of line
(\S+) # group 1, 1 or more non space characters
= # space, equal sign, space
( # start group 2
cache_get_value_name_int\( # literally
\w+ # 1 or more word characters
, # comma, space
"\w+" # 1 or more word characters surrounded with quotes
) # end group 2
Replacement:
$2 # content of group 2
, # comma & space
$1 # content of group 1
Result for given example:
cache_get_value_name_int(i, "field", PlayerInfo[playerid][pValue]);

Perl comprehensive phone number regex [duplicate]

I have a file that contains phone numbers of the following formats:
(xxx) xxx.xxxx
(xxx).xxx.xxxx
(xxx) xxx-xxxx
(xxx)-xxx-xxxx
xxx.xxx.xxxx
xxx-xxx-xxxx
xxx xxx-xxxx
xxx xxx.xxxx
I must parse the file for phone numbers of those and ONLY those formats, and output them to a separate file. I'm using perl, and so far I have what I think is a valid regex for two of these numbers
my $phone_regex = qr/^(\d{3}\-)?(\(\d{3}\))?\d{3}\-\d{4}$/;
But I'm not sure if this is correct, or how to do the rest all in one regex. Thank you!
Here you go
\(?\d{3}\)?[-. ]\d{3}[-. ]\d{4}
See a demo on regex101.com.
Broken down this is
\(? # "(", optional
\d{3} # three digits
\)? # ")", optional
[-. ] # one of "-", "." or " "
\d{3} # three digits
[-. ] # same as above
\d{4} # four digits
If you want, you can add a word boundary (\b) on the right side; some potential matches may then be filtered out.
You haven't escaped the parentheses properly, and you have escaped the hyphen, which isn't needed. The regex you are trying to create is this:
^\(?\d{3}\)?[ .-]\d{3}[ .-]\d{4}$
Explanation:
^ - Start of string
\(? - Optional starting parenthesis (
\d{3} - Followed by three digits
\)? - Optional closing parenthesis )
[ .-] - A single character either a space or . or -
\d{3} - Followed by three digits
[ .-] - Again a single character either a space or . or -
\d{4} - Followed by four digits
$ - End of string
Your current regex allows too much, as it will allow xxx-(xxx) at the beginning. It also doesn't handle any of the . or space separated cases. You want to have only three sets of digits, and then allow optional parentheses around the first set which you can use an alternation for, and then you can make use of character classes to indicate the set of separators you want to allow.
Additionally, don't use \d as it will match any Unicode digit. Since you likely only want to allow ASCII digits, use the character class [0-9] (there are other options, but this is the simplest).
Finally, $ allows a newline at the end of the string, so use \z instead which does not. Make sure if you are reading these from a file that you chomp them so they do not contain trailing newlines.
This leaves us with:
qr/^(?:[0-9]{3}|\([0-9]{3}\))[-. ][0-9]{3}[-.][0-9]{4}\z/
If you want to ensure that the two separators are the same if the first is a . or -, it is easiest to do this in multiple regex checks (these can be more lenient since we already validated the general format):
if ($str =~ m/^[0-9()]+ /
or $str =~ m/^[0-9()]+\.[0-9]{3}\./
or $str =~ m/^[0-9()]+-[0-9]{3}-/) {
# allowed
}
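To compare the patterns side by side, here is a sketch in Ruby (chosen only for consistency with the earlier examples; the question itself uses Perl). It ports the looser and the stricter pattern plus the separator checks. The constant and method names and all phone numbers are made up for illustration.

```ruby
LOOSE_PHONE  = /\A\(?\d{3}\)?[ .-]\d{3}[ .-]\d{4}\z/
STRICT_PHONE = /\A(?:[0-9]{3}|\([0-9]{3}\))[-. ][0-9]{3}[-.][0-9]{4}\z/

# True when the first separator is a space, or both separators are the same.
def consistent_separators?(str)
  str.match?(/\A[0-9()]+ /) ||
    str.match?(/\A[0-9()]+\.[0-9]{3}\./) ||
    str.match?(/\A[0-9()]+-[0-9]{3}-/)
end

def valid_phone?(str)
  str.match?(STRICT_PHONE) && consistent_separators?(str)
end

# One sample per format listed in the question.
FORMATS = ['(555) 123.4567', '(555).123.4567', '(555) 123-4567', '(555)-123-4567',
           '555.123.4567', '555-123-4567', '555 123-4567', '555 123.4567'].freeze

puts FORMATS.all? { |s| valid_phone?(s) }    # true
puts '(555 123-4567'.match?(LOOSE_PHONE)     # true: unbalanced parenthesis slips through
puts valid_phone?('(555 123-4567')           # false
puts valid_phone?('555.123-4567')            # false: mixed separators
puts valid_phone?('555 123 4567')            # false: second separator may not be a space
```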

complex regular expression question on stop set [duplicate]

What regular expression would find a header that starts with a number, such as 1. Humility?
Here's a screenshot of the sample data: http://www.knowledgenotebook.com/issue/sampleData.html
Thanks.
I don't know which regex flavour you're using, so I assume it's Perl-compatible.
You should always post some example data in case your perceptions of regex are unclear.
Breaking down what your 'stop signs' are:
## left out of regex, this could be anything up here
##
(?: # Start of non-capture group START sign
\d+\. # 1 or more digits followed by '.'
| # or
\(\d+\) # '(' followed by 1 or more digits followed by ')'
# note that \( could be start of capture group1 in bizarro world
) # End group
\s? # 0 or 1 whitespace (includes \n)
[^\n<]+ # 1 or more of not \n AND not '<' STOP sign's
It seems you want all characters after the group, up to but not including the very next \n or the very next '<'. In that case you should get rid of the \s?: because \s includes newline, if it matches a newline here, the match will continue onto the following line until [^\n<]+ is satisfied.
(?:\d+\.|\(\d+\))[^\n<]+
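A quick way to try this pattern is to scan a small sample. The sketch below uses Ruby for consistency with the earlier examples; the sample text is invented and is not taken from the linked page.

```ruby
HEADER = /(?:\d+\.|\(\d+\))[^\n<]+/

# Hypothetical sample: two numbered headers, one followed by a tag, and a plain line.
SAMPLE_TEXT = "1. Humility<br>Some body text\n(2) Courage\nplain line"

p SAMPLE_TEXT.scan(HEADER)  # ["1. Humility", "(2) Courage"]
```

Because the pattern is not anchored, a number followed by a period in the middle of a line would also start a match; adding ^ at the front is one way to restrict it to line starts.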
Edit - After viewing your sample, it appears that you are searching unrendered HTML pasted into HTML content. In that case the header appears to be:
'1. Self-Knowledge&lt;br&gt;' which, when the entities are converted, would be
1. Self-Knowledge<br>
Self-Knowledge
Superior leadership ...
You can add the entity to the mix so that all your bases are covered (i.e. entity, \n, <):
((?:\d+\.|\(\d+\)))[^\S\n]+((?:(?!&lt;|[\n<]).)+)
Where:
Capture group1 = '1.'
Capture group2 = 'Self-Knowledge'
Other than that, I don't know what it could be.