Spacing at the last part of the regex

I've the following regex:
/(\(*)?(\d+)(\)*)?([+\-*\/^%]+)(\(*)?(\)*)?/g
I have the following string:
(5+4+8+5)+(5^2/(23%2))
I use the regex to add space between numbers, arithmetic operators and parentheses.
I do that like:
\1 \2 \3 \4 \5 \6
That turns the string into:
( 5 + 4 + 8 + 5 ) + ( 5 ^ 2 / ( 23 % 2))
As you can see the last two parentheses don't get spaced.
How can I make them space as well?
The output should look like:
( 5 + 4 + 8 + 5 ) + ( 5 ^ 2 / ( 23 % 2 ))
Try out the regex here.

You could try a simple and fast solution
edit
Some tips:
I know you are not validating the simple math expressions, but it would not hurt to do that before trying to beautify.
Either way you should remove all whitespace ahead of time
Find \s+ Replace nothing
To condense summation symbols you could do:
Find (?:--)+|\++ Replace +
Find [+-]*-[+-]* Replace -
The meaning of the division and power symbols varies with the implementation,
so condensing them is not advisable; it is better to just validate the form.
Validation is the more complex task, complicated by the meaning of parentheses
and their balance. That is another topic.
A minimum character validation should be done though.
String must be matched by ^[+\-*/^%()\d]+$ at least.
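The two preparatory steps (strip whitespace, then check the character set) could be sketched in Python roughly as follows. The function name `normalize` is hypothetical, and raising `ValueError` on bad input is an assumption rather than part of the answer.

```python
import re

def normalize(expr):
    """Strip all whitespace, then require only digits, operators and parentheses."""
    expr = re.sub(r"\s+", "", expr)                  # Find \s+, replace with nothing
    if not re.fullmatch(r"[+\-*/^%()\d]+", expr):    # minimum character validation
        raise ValueError("unexpected character in expression")
    return expr

print(normalize(" (5 + 4) * 2 "))  # (5+4)*2
```

This checks characters only; balanced parentheses and operator placement are not validated.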
After optionally doing the above, run the beautifier on it.
https://regex101.com/r/NUj036/2
Find ((?:(?<=[+\-*/^%()])-)?\d+(?!\d)|[+\-*/^%()])(?!$)
Replace '$1 '
Explained
( # (1 start)
(?: # Allow negation if symbol is behind it
(?<= [+\-*/^%()] )
-
)?
\d+ # Many digits
(?! \d ) # - don't allow digits ahead
| # or,
[+\-*/^%()] # One of these operators
) # (1 end)
(?! $ ) # Don't match if at end of string
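As a quick check, the beautifier can be tried in Python, whose `re` module accepts this pattern unchanged apart from the replacement syntax (`\1` instead of `$1`). The pattern follows the expanded form above; the names `TOKEN` and `beautify` are hypothetical.

```python
import re

# One token (optionally negated number, or a single operator/parenthesis)
# that is not the last thing in the string.
TOKEN = re.compile(r"((?:(?<=[+\-*/^%()])-)?\d+(?!\d)|[+\-*/^%()])(?!$)")

def beautify(expr):
    """Append one space after every token except the final one."""
    return TOKEN.sub(r"\1 ", expr)

print(beautify("(5+4+8+5)+(5^2/(23%2))"))
# ( 5 + 4 + 8 + 5 ) + ( 5 ^ 2 / ( 23 % 2 ) )
```

Note that this puts a space between the two closing parentheses as well, which is slightly more spacing than the output requested in the question.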

You can try something like this based on word boundaries and on non-word characters:
\b(?!^|$)|\W\K(?=\W)
and replace with a space.
demo
details:
\b # a word-boundary
(?!^|$) # not at the start or at the end of the string
| # OR
\W # a non-word character
\K # remove characters on the left from the match result
(?=\W) # followed by a non-word character
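Python's `re` module has no `\K`, so the sketch below swaps in a fixed-width lookbehind, `(?<=\W)`, for `\W\K`. That is a substitution made for illustration, not the pattern from the answer; the function name `add_spaces` is hypothetical.

```python
import re

def add_spaces(expr):
    # \b(?!^|$)      : a word boundary that is not at the start or end of the string
    # (?<=\W)(?=\W)  : the gap between two adjacent non-word characters
    return re.sub(r"\b(?!^|$)|(?<=\W)(?=\W)", " ", expr)

print(add_spaces("(5+4+8+5)+(5^2/(23%2))"))
# ( 5 + 4 + 8 + 5 ) + ( 5 ^ 2 / ( 23 % 2 ) )
```

Every match is zero-width, so the replacement only inserts spaces and never consumes a character.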

Related

How to do text wrapping without adding newline if the residue is short?

Description
Say I have a lot of strings, some of them are very long:
Aim for the moon. If you miss, you may hit a star. – Clement Stone
Nothing about us without us
I want to have a text wrapper doing this algorithm:
1. Starting from the beginning of the string, identify the nearest blank character ( ) around position 25
2. If the residue is shorter than 5 characters, do nothing. If not, replace that blank character with \n
3. Identify the nearest blank character around the end of the next 25 characters
4. Return to step 2 until the end of the line
So the text should become:
Aim for the moon. If you\nmiss, you may hit a star.\n– Clement Stone
Nothing about us without us
Attempt 1
Consulting Wrapping Text With Regular Expressions
Matching pattern: (.{1,25})( +|$\n?)
Replacing pattern: $1\n
But this will produce Nothing about us without\nus, which is not preferable.
Attempt 2
Using a Lookahead Construct in a If-Then-Else Conditionals:
Matching pattern: (.{1,25})(?(?=(.{1,5}$).*))( +|$\n?)
Replacing pattern: $1$2\n
It still produces Nothing about us without\nus, which is not preferable.
I created this based on @sln's answer to a different word-wrap problem.
All I have added is this alternative point at which to add a line break:
"Expand by up to 5 characters until before a linebreak or EOS"
and changed the number of characters allowed from 50 to 25.
[^\r\n]{1,5}(?=\r?\n|$)
Compressed
(?:((?>.{1,25}(?:[^\r\n]{1,5}(?=\r?\n|$)|(?<=[^\S\r\n])[^\S\r\n]?|(?=\r?\n)|$|[^\S\r\n]))|.{1,25})(?:\r?\n)?|(?:\r?\n|$))
Replacement
$1 followed by a linebreak
$1\r\n
Preview
https://regex101.com/r/pRqdhi/1
Detailed Regular Expression
(?:
# -- Words/Characters
( # (1 start)
(?> # Atomic Group - Match words with valid breaks
.{1,25} # 1-N characters
# Followed by one of 4 prioritized, non-linebreak whitespace
(?: # break types:
[^\r\n]{1,5}(?=\r?\n|$) # Expand by up to 5 characters until before a linebreak or EOS
|
(?<= [^\S\r\n] ) # 1. - Behind a non-linebreak whitespace
[^\S\r\n]? # ( optionally accept an extra non-linebreak whitespace )
| (?= \r? \n ) # 2. - Ahead a linebreak
| $ # 3. - EOS
| [^\S\r\n] # 4. - Accept an extra non-linebreak whitespace
)
) # End atomic group
|
.{1,25} # No valid word breaks, just break on the N'th character
) # (1 end)
(?: \r? \n )? # Optional linebreak after Words/Characters
|
# -- Or, Linebreak
(?: \r? \n | $ ) # Stand alone linebreak or at EOS
)
If your input is run line-by-line, and there is no newline character in the middle of a line, then you can try this:
Pattern: (.{1,25}.{1,5}$|.{1,25}(?= ))
Substitution: $1\n
Then apply this:
Pattern: \n
Substitution: \n
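For comparison, the steps listed in the question can also be written procedurally, without a regex. This is only a sketch under assumptions: "around position 25" is read as "the last blank at or before position 25", and the names `wrap`, `width` and `min_residue` are hypothetical.

```python
def wrap(text, width=25, min_residue=5):
    """Break at blanks near `width`, unless the leftover would be too short."""
    lines = []
    rest = text
    while len(rest) > width:
        cut = rest.rfind(" ", 0, width + 1)      # last blank at or before `width`
        if cut <= 0:
            break                                # no usable blank: leave the rest alone
        if len(rest) - cut - 1 < min_residue:
            break                                # residue too short: do not break here
        lines.append(rest[:cut])
        rest = rest[cut + 1:]
    lines.append(rest)
    return "\n".join(lines)

print(wrap("Nothing about us without us"))
# Nothing about us without us
```

With the two sample strings from the question this gives the requested output, including the unbroken second string.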

Ruby regex to extract a number from string containing only one number and trim the part after comma

I have a method to extract number from a string using regex, like that:
def format
str = "R$ 10.000,00 + Benefits"
str.split(/[^\d]/).join
end
It returns 1000000. I need to modify the regex to return 10000, removing the zeros after the comma.
You can use
str.gsub(/(?<=\d),\d+|\D/, '')
See the regex demo.
Regex details
(?<=\d),\d+ - a comma that is immediately preceded with a digit ((?<=\d) is a positive lookbehind) and then one or more digits
| - or
\D - any non-digit symbol
The order of the alternatives matters: \D must be the last alternative. Otherwise \D can match the , first, the digits after it survive, and the solution won't work.
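The effect of the ordering can be seen in a small Python rendering of the same substitution (Python's `re` accepts this pattern as is; the variable names are hypothetical).

```python
import re

s = "R$ 10.000,00 + Benefits"

# Comma-plus-digits alternative first: ",00" is removed as a unit.
good = re.sub(r"(?<=\d),\d+|\D", "", s)
# \D first: it consumes the comma alone, so the trailing "00" survives.
bad = re.sub(r"\D|(?<=\d),\d+", "", s)

print(good)  # 10000
print(bad)   # 1000000
```

The second result is the same unwanted value the original `split`/`join` code produced.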
str = "R$ 10.000,00 R$1.200.000,03 R$ 0,09 R$ 4.00,10 R$ 3.30005,00 R$ 6.700 R$ 6, R$ 6,0 R$ 00,20 R$6,001 US$ 5.122,00 Benefits"
R = /(?:(?<=\bR\$)|(?<=\bR\$ ))(?:0|[1-9]\d{0,2}(?:\.\d{3})*),\d{2}(?!\d)/
str.scan(R).map { |s| s.delete('.') }
#=> ["10000,00", "1200000,03", "0,09"]
None of the following substrings match because they have invalid formats: "4.00,10", "3.30005,00", "6.700", "6,", "6,0", "00,20", "6,001" and "5.122,00" (the last because it is not preceded by "R$" or "R$ ").
The regular expression can be written in free-spacing mode (/x) to make it self-documenting.
R = /
(?: # begin non-capture group
(?<=\bR\$) # positive lookbehind asserts match is preceded by 'R$'
# that is preceded by a word break
| # or
(?<=\bR\$\ ) # positive lookbehind asserts match is preceded by 'R$ '
# that is preceded by a word break
) # end non-capture group
(?: # begin non-capture group
0 # match zero
| # or
[1-9] # match a digit other than zero
\d{0,2} # match 0-2 digits
(?:\.\d{3}) # match '.' followed by three digits in a non-capture group
* # execute preceding non-capture group 0+ times
) # end non-capture group
,\d{2} # match ',' followed by two digits
(?!\d) # negative lookahead asserts match is not followed by a digit
/x
Here is a slightly longer, but perhaps simpler and easier to understand solution. You can use it as an alternative to the excellent and concise answer by Wiktor Stribiżew, and the very thorough and complete answer by Cary Swoveland. Note that my answer may not work for some (more complex) strings, as mentioned in the comment by Cary below.
str = "R$ 10.000,00 + Benefits"
puts str.gsub(/^.*?(\d+[\d.]*).*$/, '\1').gsub(/[.]/, '')
# => 10000
Here gsub is applied to the input string twice:
gsub(/^.*?(\d+[\d.]*).*$/, '\1') : grab 10.000 part.
^ is the beginning of the string.
.*? is any character repeated 0 or more times, non-greedy (that is, minimum number of times).
(\d+[\d.]*) is one or more digits followed by any number of digits or literal dots (.). The parentheses capture this and put it into the first capture group (to be used later as '\1' in the replacement string).
.* is any character repeated 0 or more times, greedy (that is, as many as possible).
$ is the end of the string.
Thus, we replace the entire string with the first captured group: '\1', which is 10.000 here. Remember to use single quotes around \1, otherwise escape it twice like so: "\\1".
gsub(/[.]/, '') : remove all literal dots (.) in the string.
Note that this code does the expected replacements for a number of similar strings, but nothing fancier (for example, it leaves 001 as is):
['R$ 10.000,00 + Benefits',
'R$ 0,00 + Benefits',
'R$ .001,00 + Benefits',
'. 10.000,00 + Benefits',].each do |str|
puts [str, str.gsub(/^.*?(\d+[\d.]*).*$/, '\1').gsub(/[.]/, '')].join(" => ")
end
Output:
R$ 10.000,00 + Benefits => 10000
R$ 0,00 + Benefits => 0
R$ .001,00 + Benefits => 001
. 10.000,00 + Benefits => 10000

REGEX unique numbers delimited by comma

I am trying to validate a comma separated list of numbers 1-31 unique (not repeating).
i.e.
2,4,6,7,1 is valid input.
2,2,6 is invalid
2 is valid
2, is invalid
1,2,3,4,15,6,7,31 is valid
1,2,3,4,15,6,7,32 is invalid
20,15,3
I tried
^((([0]?[1-9])|([1-2][0-9])|(3[01]))(?!([0]?[1-9])|([1-2][0-9])|(3[01])*,\\1(?!([0]?[1-9])|([1-2][0-9])|(3[01]))
but it accepts repeating numbers.
For a number range that exceeds 1 digit, just add word boundaries around
the capture group and the back reference.
This isolates a complete number.
This particular one is for the number range 1-31:
^ # BOS
(?! # Validate no dups
.*
( # (1 start)
\b
(?: [1-9] | [1-2] \d | 3 [0-1] ) # number range 1-31
\b
) # (1 end)
.*
\b \1 \b
)
(?: [1-9] | [1-2] \d | 3 [0-1] ) # Unrolled-loop, match 1 to many numbers
(?: # in the number range 1-31
,
(?: [1-9] | [1-2] \d | 3 [0-1] )
)*
$ # EOS
var data = [
'2,4,6,7,1',
'2,2,6',
'2,30,16,3',
'2,',
'1,2,3,2',
'1,2,2,3',
'1,2,3,4,5,6,7,8'
];
data.forEach(function(str) {
document.write(str + ' gives ' + /^(?!.*(\b(?:[1-9]|[1-2]\d|3[0-1])\b).*\b\1\b)(?:[1-9]|[1-2]\d|3[0-1])(?:,(?:[1-9]|[1-2]\d|3[0-1]))*$/.test(str) + '<br/>');
});
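The same pattern also runs under Python's `re`. The sketch below builds it from a shared piece and checks the inputs listed in the question; the names `NUM`, `UNIQUE_1_31` and `is_valid` are hypothetical.

```python
import re

NUM = r"(?:[1-9]|[1-2]\d|3[0-1])"            # one number in the range 1-31
UNIQUE_1_31 = re.compile(
    r"^(?!.*(\b" + NUM + r"\b).*\b\1\b)"     # fail if any whole number occurs twice
    + NUM + r"(?:," + NUM + r")*$"           # comma-separated list of numbers
)

def is_valid(s):
    return UNIQUE_1_31.match(s) is not None

for s in ["2,4,6,7,1", "2,2,6", "2", "2,", "1,2,3,4,15,6,7,32", "20,15,3"]:
    print(s, "gives", is_valid(s))
```

The word boundaries are what keep 1 from being treated as a repeat of the 1 inside 15 or 31.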
I have created a pattern that can do this.
The pattern: ^((?!(\d+),[^\n]*\b\2\b)([1-9]\b|[1-2]\d|3[0-1])(,(?1))?)$
A demo.
A short description.
^ - matches start of a line
(?!(\d+),[^\n]*\b\2\b) - Looks ahead to ensure the next number is not repeated
(\d+) - grab next number
,[^\n]* - a comma followed by anything but a new line
\b\2\b - The next number again repeated
([1-9]\b|[1-2]\d|3[0-1]) - Checks next number between 1-31
[1-9]\b - Checks for single digit. Boundary used so to prevent two digit numbers matching.
[1-2]\d - Checks for 10-29
3[0-1] - Checks for 30,31
(,(?1))? - If followed by a comma, recurse on the main pattern to check whether the next number is repeated.
, - checks that a comma follows
(?1) - recurses on main pattern.
$ - End of line
Updated: Forgot to check 1-31
I totally agree that there are much better ways than regex to look for duplicates, but if you must do this as regex, here's a way (depending on your regex flavor).
See on regex101 (I have made it multiline and extended just for testing and readability).
^
(?!.*\b(\d+)\b.*\b\1\b)
(0?[1-9]|[12][0-9]|3[01])
(,(0?[1-9]|[12][0-9]|3[01]))*
$
Explanation:
(?!.*\b(\d+)\b.*\b\1\b) is a negative lookahead to ensure there are no duplicates
(0?[1-9]|[12][0-9]|3[01]) matches the first number
(,(0?[1-9]|[12][0-9]|3[01]))* matches any further numbers
Note: updated to use word boundaries, based on the answer from @sln
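As this answer points out, duplicates are easier to detect outside a regex. A minimal non-regex sketch in Python could look like the following; the function name is hypothetical, and rejecting leading zeros (such as 02) is an assumption taken from the question's examples.

```python
def is_valid_list(s):
    """Comma-separated numbers, each in 1-31, none repeated."""
    seen = set()
    for part in s.split(","):
        # Plain ASCII digits only, no empty items, no leading zero.
        if not (part.isascii() and part.isdigit()) or part[0] == "0":
            return False
        n = int(part)
        if not 1 <= n <= 31 or n in seen:
            return False
        seen.add(n)
    return True

print(is_valid_list("2,4,6,7,1"))  # True
print(is_valid_list("2,2,6"))      # False
```

A set lookup keeps the duplicate check independent of how many digits each number has.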

Specific password regular expression

I am having problems creating a regular expression. It needs to fulfill the following:
1) Has 8-12 characters
2) At least 1 uppercase letter
3) At least 3 lowercase letters
4) At least 1 number
5) At least 1 special character
6) Has to start with a lowercase letter, an uppercase letter or a number
7) Maximum of 2 repeating characters
Thanks in advance!
This should work
^(?=.*[A-Z])(?=(?:.*[a-z]){3})(?=.*[0-9])(?=.*[!"#$%&'()*+,\-./:;<=>?@[\]^_`{|}~])(?=(?:(.)(?!\1\1))+$)[a-zA-Z0-9].{7,11}$
Explained / Expanded
^ # BOS
(?= .* [A-Z] ) # 1 upper
(?=
(?: .* [a-z] ){3} # 3 lower
)
(?= .* [0-9] ) # 1 number
(?=
.* [!"#$%&'()*+,\-./:;<=>?@[\]^_`{|}~] # 1 special
)
(?= # Maximum 2 repeating
(?:
( . ) # (1)
(?! \1 \1 )
)+
$
)
[a-zA-Z0-9] # First alnum
.{7,11} # 7 to 11 more chars (8 to 12 total)
$ # EOS
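A way to exercise the pattern is sketched below in Python. One assumption is made for brevity: the special-character class is built from `string.punctuation` (escaped with `re.escape`), which also includes the backslash, so it is slightly wider than the class written out above. The names `SPECIAL` and `PASSWORD_RE` are hypothetical.

```python
import re
import string

SPECIAL = re.escape(string.punctuation)   # assumption: ASCII punctuation counts as "special"
PASSWORD_RE = re.compile(
    r"^(?=.*[A-Z])"                 # at least 1 uppercase letter
    r"(?=(?:.*[a-z]){3})"           # at least 3 lowercase letters
    r"(?=.*[0-9])"                  # at least 1 number
    r"(?=.*[" + SPECIAL + r"])"     # at least 1 special character
    r"(?=(?:(.)(?!\1\1))+$)"        # no character three times in a row
    r"[a-zA-Z0-9].{7,11}$"          # starts alphanumeric, 8-12 chars in total
)

for candidate in ["aBcd1!ef", "aaaB1!cd", "!aBcd1ef", "aBcd1!e"]:
    print(candidate, bool(PASSWORD_RE.match(candidate)))
```

Only the first candidate passes; the others fail on the repeat rule, the first-character rule and the minimum length respectively.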
What have you got so far?
Also, which regex flavor are you using?
I'd start with the length of the expression
Restrict it to be 8-12, something like [a-zA-Z]{8,12}
For the requirements on the first one you can use a []+
For the other requirements it's a little trickier

How do I perform this regex in order to extract the value of the variable

You can test everything out here:
I would like to extract the value of individual variables, paying attention to the different ways they have been defined. For example, for dtime we want to extract 0.004. It also has to handle exponential notation; for example, for the variable vis it should extract 10e-6.
The problem is that each variable has its own number of white spaces between the variable name and the equal sign (I don't have control over how they have been coded).
Text to test:
dtime = 0.004D0
case = 0
newrun = 1
periodic = 0
iscalar = 1
ieddy = 1
mg_level = 5
nstep = 20000
vis = 10e-6
ak = 10e-6
g = 9.81D0
To extract dtime's value this REGEX works:
(?<=dtime =\s)[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?
To extract vis's value this REGEX works:
(?<=vis =\s)[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?
The problem is that I need to know the exact number of spaces between the variable name and the equal sign. I tried using \s+ but it does not work, why?
(?<=dtime\s+=\s)[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?
If you are using PHP or Perl, or more generally PCRE, then you can use the \K escape to solve this problem like this:
dtime\s+=\s\K[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?
^^
Notice the \K, it tells the expression to ignore everything
behind it as if it was never matched
Regex101 Demo
Edit: I think you need to capture the number in a capturing group if you can't use look behinds or eliminate what was matched so:
dtime\s*=\s*([-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?)
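The capturing-group variant carries over to engines without `\K`. A small Python sketch, assuming one assignment per line, might look like this; the helper `extract`, the constant `NUMBER` and the anchoring at the start of a line are illustrative choices, not part of the answer.

```python
import re

NUMBER = r"[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?"

def extract(name, text):
    """Return the number assigned to `name`, or None if it is not found."""
    pattern = r"^\s*" + re.escape(name) + r"\s*=\s*(" + NUMBER + r")"
    m = re.search(pattern, text, re.MULTILINE)
    return m.group(1) if m else None

text = """dtime     = 0.004D0
case = 0
vis  =   10e-6
mg_level = 5
g = 9.81D0
"""
print(extract("dtime", text))  # 0.004
print(extract("vis", text))    # 10e-6
```

Because the whitespace is matched with `\s*` outside any lookbehind, the number of spaces around the equal sign no longer matters.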
(?<=dtime\s+=\s) is a variable-length lookbehind because of \s+. Most (not all) engines support only fixed-length lookbehinds.
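Python's built-in `re` module is one of the engines with this restriction, which makes the difference easy to observe. The helper name `compiles` is hypothetical.

```python
import re

def compiles(pattern):
    """Report whether the engine accepts the pattern at all."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(compiles(r"(?<=dtime =\s)[0-9]+"))    # True: the lookbehind has a fixed width
print(compiles(r"(?<=dtime\s+=\s)[0-9]+"))  # False: \s+ makes the width variable
```

The second pattern is rejected at compile time, before any text is searched.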
Also, your regex requires a digit before the exponential form, so if there is no digit, it won't match. Something like this might work -
# dtime\s*=\s*([-+]?[0-9]*\.?[0-9]*(?:[eE][-+]?[0-9]+)?)
dtime \s* = \s*
( # (1)
[-+]? [0-9]* \.? [0-9]*
(?: [eE] [-+]? [0-9]+ )?
)
Edit: After review, I see you're trying to fold multiple optional forms into one regex.
I think this is not really that straightforward. Just as a point of interest, this is probably a baseline:
# dtime\s*=\s*([-+]?(?(?=[\d.]+)(\d*\.\d+|\d+\.\d*|\d+|(?!))|)(?(?=[eE][-+]?\d+)([eE][-+]?\d+)|))(?(2)|(?(3)|(?!)))
dtime \s* = \s*
( # (1 start)
[-+]? # optional -+
(?(?= # conditional check for \d*\.\d*
[\d.]+
)
( # (2 start), yes, force a match on one of these
\d* \. \d+ # \. \d+
| \d+ \. \d* # \d+ \.
| \d+ # \d+
| (?!) # or, Fail the match, the '.' dot is there without a number
) # (2 end)
| # no, match nothing
)
(?(?= # conditional check for [eE] [-+]? \d+
[eE] [-+]? \d+
)
( [eE] [-+]? \d+ ) # (3), yes, force a match on it
| # no, match nothing
)
) # (1 end)
(?(2) # Conditional check - did we match something? One of grp2 or grp3 or both
| (?(3)
| (?!) # Did not match a number, Fail the match
)
)