I am trying to match the following cases:
1. Get the value between the two x's or, if only one x exists, the value at the end
Example:
| Matches/Cases | Result |
|-------------------|--------|
| 200 x 90 x 14 | 90 |
| 90x200 | 200 |
| 200 x 90x20 | 90 |
| 60,4 x46,5 x 42,6 | 46,5 |
| 90x190,9 | 190,9 |
2. If two x's exist, get the final value; if only one exists, return no result
Examples:
| Matches/Cases | Result |
|-------------------|--------|
| 200 x 90 x 14 | 14 |
| 90x200 | - |
| 200 x 90x20 | 20 |
| 60,4 x46,5 x 42,6 | 42,6 |
| 90x190,9 | - |
I am stuck on one specific case! I tried to match with the following regex x\s?((\d+(?:,\d+)?))\s?, but I still get only the last part of the cases: for 90x200 I get 200, but for 200 x 90 x 14 I get 90 x 14.
Any suggestions for two regexes that work for case 1 and case 2?
I appreciate your replies!
I tried to match with the following regex x\s?((\d+(?:,\d+)?))\s?, but
I still get only the last part of the cases.
Actually, your own regex captures every integer or decimal number that is preceded by an x, so it matches not only the last part but every such occurrence.
Solution (main regex):
(?: *(\d+(?:,\d+)?) *(?:x|$))
If you want it for case #1 then append quantifier {2}
(?: *(\d+(?:,\d+)?) *(?:x|$)){2}
If you want it for case #2 append quantifier {3}
(?: *(\d+(?:,\d+)?) *(?:x|$)){3}
The m (multiline) modifier should be set in both cases.
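As a rough check of the quantifier approach, here is a minimal Python sketch. It assumes Python's `re` module behaves like the flavor used in the answer (a repeated capture group keeps the value of its last repetition); the helper name `extract` is made up for illustration.

```python
import re

# Base group from the answer: optional spaces, a number with an optional
# comma decimal part, optional spaces, then either an x or the end of line.
BASE = r"(?: *(\d+(?:,\d+)?) *(?:x|$))"

def extract(text, repeat):
    """Return the number captured by the last repetition, or None (hypothetical helper)."""
    m = re.search(BASE + "{%d}" % repeat, text, flags=re.MULTILINE)
    return m.group(1) if m else None

cases = ["200 x 90 x 14", "90x200", "200 x 90x20", "60,4 x46,5 x 42,6", "90x190,9"]
case1 = [extract(c, 2) for c in cases]  # second number
case2 = [extract(c, 3) for c in cases]  # third number, if any

for c, first, second in zip(cases, case1, case2):
    print(c, "|", first, "|", second)
```

Because group 1 is overwritten on every repetition, `{2}` leaves the second number in the group and `{3}` leaves the third, which is why a single base pattern can serve both cases.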
Just turning the comments into an answer.
For the first case, you could use:
[\d,]+\h*x\h*([\d,]+)(?:\h*x*[\d,]+)?
See a demo on regex101.com.
And the second:
[\d,]+\h*x\h*[\d,]+\h*x\h*([\d,]+)
Another demo on regex101.com.
Hint: Replace \h* with either [ ]* or \s* if \h is not supported.
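Python's `re` module is one engine without `\h`, so the sketch below swaps every `\h*` for `[ \t]*` (an assumption about what counts as horizontal whitespace here) and otherwise uses the two patterns unchanged. The names `CASE1`, `CASE2` and `first_group` are illustrative only.

```python
import re

# The two patterns from the answer, with \h* replaced by [ \t]*.
CASE1 = re.compile(r"[\d,]+[ \t]*x[ \t]*([\d,]+)(?:[ \t]*x*[\d,]+)?")
CASE2 = re.compile(r"[\d,]+[ \t]*x[ \t]*[\d,]+[ \t]*x[ \t]*([\d,]+)")

def first_group(pattern, text):
    """Return capture group 1 of the first match, or None (hypothetical helper)."""
    m = pattern.search(text)
    return m.group(1) if m else None

cases = ["200 x 90 x 14", "90x200", "200 x 90x20", "60,4 x46,5 x 42,6", "90x190,9"]
middle = [first_group(CASE1, c) for c in cases]
last = [first_group(CASE2, c) for c in cases]

print(middle)
print(last)
```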
Related
I am trying to create a pattern that matches numeric values but excludes lines that start with letters or words.
This is the sample text that I am trying to match :
| 30 | 00:45.3 | 00:42.4 | 2.4869 | 5.6578
| event/slno1 | 00:45.3 | 00:42.4 | 2.4869 | 5.6578
| event/slno2 | 00:00.0 | 00:00.0 | 0.0000 | 0.0000
| event/slno3 | 00:45.3 | 00:42.4 | 2.4869 | 5.6578
| event/slno4 | 00:00.0 | 00:00.0 | 0.0000 | 0.0000
I wrote this:
(\d+)|\s+(\d\d):(\d+\.\d)\s+|(\d\d):(\d+\.\d)\s+|(\d+\.\d+)\s+|(\d+\.\d+)
I want to match only the
30 00:45.3 00:42.4 2.4869 5.6578 part and ignore the rest. How can I ignore the additional matches? I am not sure how I can negate the other ones.
Here the sample : https://regex101.com/r/ArZB3O/1
Since you want to match whole lines, you should anchor your matches to the beginning and end of each line. Your input is composed of fields prefixed by a vertical bar, and the lines of interest are those whose fields consist only of digits, colons and dots (to avoid complicating the format of the numeric fields any further). So you can use this regexp:
^\s*(\|\s+[0-9:.]+\s*)+$
as demonstrated by this demo. Since your matching lines start after some whitespace, I have added support for it with the leading \s*. Then comes a repeating group of one or more sequences of a vertical bar, some whitespace, and a run of digits, colons or dots. If you want to be more precise, you can spell out the substructure of [0-9:.]+, since the fields follow a pattern, but I think this is enough for your problem.
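A small Python sketch of the anchored pattern, applied line by line. The sample rows are abbreviated from the question; pulling the individual fields out with a second `findall` is an addition for illustration, not part of the answer.

```python
import re

# Anchored pattern from the answer: every field must be digits, colons or dots.
ROW = re.compile(r"^\s*(\|\s+[0-9:.]+\s*)+$")

lines = [
    "| 30 | 00:45.3 | 00:42.4 | 2.4869 | 5.6578",
    "| event/slno1 | 00:45.3 | 00:42.4 | 2.4869 | 5.6578",
    "| event/slno2 | 00:00.0 | 00:00.0 | 0.0000 | 0.0000",
]

# Keep only the lines made entirely of numeric-looking fields.
numeric_rows = [line for line in lines if ROW.match(line)]

# Illustrative extra step: split each accepted row into its fields.
fields = [re.findall(r"[0-9:.]+", line) for line in numeric_rows]

print(numeric_rows)
print(fields)
```

The rows starting with event/slno fail at the first field, so the whole line is rejected instead of contributing partial matches.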
I am trying to match a param string but exclude any matches when a substring is present.
From my limited regex knowledge, this should exclude any string containing "porcupine", but it does not. What am I doing wrong?
(\/animal\?.*(?!porcupine).*color=white)
Expected Outcome
| string | matches? |
| ----------------------------------------------- | -------- |
| /animal?nose=wrinkly&type=porcupine&color=white | false |
| /animal?nose=wrinkly&type=puppy&color=white | true |
Actual Outcome
| string | matches? |
| ----------------------------------------------- | -------- |
| /animal?nose=wrinkly&type=porcupine&color=white | true |
| /animal?nose=wrinkly&type=puppy&color=white | true |
Use a Tempered Greedy Token:
/animal\?(?:(?!porcupine).)*color=white
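A quick Python check of the tempered greedy token against the two strings from the question. This is only a sketch; the name `TEMPERED` is illustrative.

```python
import re

# Each character between "/animal?" and "color=white" is consumed only if
# "porcupine" does not start at that position.
TEMPERED = re.compile(r"/animal\?(?:(?!porcupine).)*color=white")

urls = [
    "/animal?nose=wrinkly&type=porcupine&color=white",
    "/animal?nose=wrinkly&type=puppy&color=white",
]

results = {url: bool(TEMPERED.search(url)) for url in urls}
for url, matched in results.items():
    print(url, matched)
```

The difference from the original attempt is that the lookahead is tested before every single character, not at one position that the surrounding `.*` is free to choose.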
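The .* matches any character any number of times, greedily, so the first .* can always stop at some position that is not immediately followed by porcupine, and the lookahead succeeds there. You could replace it with a literal prefix: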
(\/animal\?nose=wrinkly\&type=(?!porcupine).*color=white)
See example here: https://regex101.com/r/HJiM2N/1
This may seem overly verbose but it is actually relatively efficient in the number of steps:
(?!\/animal.*?porcupine.*?color)\/animal\?.*color=white
If the input string consists of exactly one occurrence of what you are trying to match and nothing else, then just use the following to ensure that porcupine does not occur anywhere in the input string:
(?!.*porcupine)\/animal\?.*color=white
The code:
import re

tests = [
    '/animal?nose=wrinkly&type=porcupine&color=white',
    '/animal?nose=wrinkly&type=puppy&color=white'
]

rex = r'(?!\/animal.*?porcupine.*?color)\/animal\?.*color=white'
for test in tests:
    m = re.search(rex, test)
    print(test, 'True' if m else 'False')
Prints:
/animal?nose=wrinkly&type=porcupine&color=white False
/animal?nose=wrinkly&type=puppy&color=white True
I am trying to clean up some text that contains separators with no text before or after them.
The 'types' are
3150779 | 3674-4 |Water Supply Plan
3637730 |
| 10903-155 | Layout 10903 DWG 155 29 M |
| 10903-155 | | Water Supply |
I understand [^\|]+ splits this but I want to get rid of the separator when there's no text before/after the separator. So the regex should result in
3150779 | 3674-4 | Water Supply Plan
3637730
10903-155 | Layout 10903 DWG 155 29 M
10903-155 | Water Supply
I would like to apply this in a google sheet where the cleaned text goes only into one column.
See https://regex101.com/r/GzbCEU/1
I have also tried [\s]+\|\s(.*) and this selects the separators but doesn't clean the text.
--- UPDATE ---
When I try the suggestion from Pushpesh Kumar Rajwanshi I get no value in GSheet...
and also same issue
You can use this regex,
^ *(?:\| *)+| *(?:\| *)+$|(\| *){2,}
Explanation:
There are three alternatives, one for each case.
^ *(?:\| *)+ - matches all the | at the beginning of a line, optionally with spaces between them
| - alternation
 *(?:\| *)+$ - matches all the | at the end of a line, optionally with spaces between them
(\| *){2,} - matches two or more consecutive |, optionally with spaces between them, and captures the last | together with its trailing spaces
Replace the match with $1, which works in Google Sheets.
Note that $1 is non-empty only when the | are matched by the third alternative, where it keeps just one | out of several.
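To see the three alternatives at work outside Google Sheets, here is a minimal Python sketch. It is an illustration under assumptions: Python's `re.sub` stands in for the Sheets replace, the function name `clean` is made up, and the callback returns an empty string when group 1 did not participate in the match, which mimics replacing with $1.

```python
import re

# Leading pipes | trailing pipes | runs of two or more pipes (last one captured).
SEPARATORS = re.compile(r"^ *(?:\| *)+| *(?:\| *)+$|(\| *){2,}")

def clean(line):
    """Drop separators with no text beside them (hypothetical helper)."""
    # Group 1 is set only by the third alternative; otherwise substitute nothing.
    return SEPARATORS.sub(lambda m: m.group(1) or "", line)

rows = [
    "3150779 | 3674-4 |Water Supply Plan",
    "3637730 |",
    "| 10903-155 | Layout 10903 DWG 155 29 M |",
    "| 10903-155 | | Water Supply |",
]

cleaned = [clean(row) for row in rows]
for row in cleaned:
    print(row)
```

The first row is left untouched because each of its separators has text on both sides; the pattern does not normalize spacing around separators it keeps.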
I think this re should work for you:
/[ ]*(?<![\d][ \*])\| | \|$/gm
Demo (be sure to open the "Substitution" accordion at the bottom of the demo page to see output)
$re = '/[ ]*(?<![\d][ \*])\| | \|$/m';
$str = '3150779 | 3674-4 | Water Supply Plan
3637730 |
| 10903-155 | Layout 10903 DWG 155 29 M |
| 10903-155 | | Water Supply |';
// Remove the matched separators by replacing them with an empty string
$result = preg_replace($re, '', $str);
echo $result;
Output:
3150779 | 3674-4 | Water Supply Plan
3637730
10903-155 | Layout 10903 DWG 155 29 M
10903-155 |Water Supply
I have a string $s1 = "a_b"; and I want to match this string but only capture the letters. I tried to use a lookahead:
if($s1 =~ /([a-z])(?=_)([a-z])/){print "Captured: $1, $2\n";}
but this does not seem to match my string. I have solved the original problem by using (?:_) instead, but I am curious as to why my original attempt did not work. To my understanding, a lookahead matches but does not capture, so what did I do wrong?
A lookahead checks the positions immediately ahead, and if the assertion is true the engine backtracks to where the previous match ended (right after a) and continues matching from there. Your regex would work only if you put a _ right after the positive lookahead: ([a-z])(?=_)_([a-z])
You don't even need a (non-)capturing group here:
if ($s1 =~ /([a-z])_([a-z])/) { print "Captured: $1, $2\n"; }
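The same behaviour can be reproduced in Python, whose lookahead is also zero-width. This is only a sketch for comparison with the Perl above; the variable names are illustrative.

```python
import re

s1 = "a_b"

# The lookahead asserts that "_" follows but consumes nothing, so the second
# group is tried against "_" itself and the match fails.
lookahead_only = re.search(r"([a-z])(?=_)([a-z])", s1)

# Consuming the underscore after the assertion lets the second group reach "b".
lookahead_then_consume = re.search(r"([a-z])(?=_)_([a-z])", s1)

# Without any lookahead: the literal "_" is matched but not captured.
plain = re.search(r"([a-z])_([a-z])", s1)

print(lookahead_only)
print(lookahead_then_consume.groups())
print(plain.groups())
```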
Edit
In reply to @Borodin's comment
I think that moving backwards is the same as a backtrack, which is easier to recognize by debugging the whole thing (Perl debug mode):
Matching REx "a(?=_)_b" against "a_b"
.
.
.
0 <> <a_b> | 0| 1:EXACT <a>(3)
1 <a> <_b> | 0| 3:IFMATCH[0](9)
1 <a> <_b> | 1| 5:EXACT <_>(7)
2 <a_> <b> | 1| 7:SUCCEED(0)
| 1| subpattern success...
1 <a> <_b> | 0| 9:EXACT <_b>(11)
3 <a_b> <> | 0| 11:END(0)
Match successful!
As the debug output above shows, on the fourth line of results (the third step) the engine has consumed the characters a_ while inside the lookahead assertion. After the positive lookahead succeeds, a backtrack happens: the engine unwinds the whole sub-pattern and resumes at the position right after a.
At line 5, the engine has consumed only one character: a.
How I interpret this backtrack is clearer in this illustration (thanks to @JDB, whose style of representation I borrowed):
a(?=_)_b
*
|\
| \
| : a (match)
| * (?=_)
| |↖
| | ↖
| |↘ ↖
| | ↘ ↖
| | ↘ ↖
| | : _ (match)
| | ^ SUBPATTERN SUCCESS (OP_ASSERT :=> MATCH_MATCH)
| * _b
| |\
| | \
| | : _ (match)
| | : b (match)
| | /
| |/
| /
|/
MATCHED
By this I mean that if the lookahead assertion succeeds, then, since part of the input string has already been examined, the engine goes back up to the previous match offset (eptr, the pointer into the subject, is not changed, but the offset is), resets the consumed characters, and tries to continue matching from there. I call that a backtrack. Below is a visual representation of the steps taken by the engine, produced with Regexp::Debugger.
So I see it as a backtrack, or a kind of one. However, if I'm wrong about any of this, I'd welcome corrections with open arms.
I'm trying to match a number pattern in a text file.
The file can contain values such as
12345 567890
90123 string word word 54616
98765
The pattern should match on any line that contains a 5 digit number that does not start with 1234
I have tried using ((?!1234).*)[[:digit:]]{5} but it does not give the desired results.
Edit: The pattern can occur anywhere in the line and should still match
Any suggestions?
This regex should work for matching a line containing a number at least 5 digits long iff the line does not start with '12345':
^((?!12345).*\d{5}.*)$
Short explanation:
^ : match the start of a line
(?!12345) : look ahead and make sure the line does not begin with "12345"
.* : match any character sequence
\d{5} : match exactly 5 digits
.* : match any character sequence
$ : match the end of the line
EDIT:
It seems that the question has been edited, so this solution no longer reflects the OP's requirements. Still I'll leave it here in case someone looking for something similar lands on this page.
The following would work, using \b to match word boundaries such as start of string or space:
\b(?!12345)\d{5}.*
Try this: it matches at least 5 decimal digits, but not 12345, using a negative lookbehind:
\d{5,}(?<!12345)
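For comparison, here is a minimal Python sketch that runs the word-boundary pattern and the lookbehind pattern from the last two answers over a few lines. The sample lines are partly made up for illustration, and both patterns are used exactly as posted, so they exclude 12345 and not the 1234 prefix from the question.

```python
import re

# Word boundary, then five digits that are not exactly the run "12345".
BOUNDARY = re.compile(r"\b(?!12345)\d{5}.*")
# Five or more digits whose last five are not "12345".
LOOKBEHIND = re.compile(r"\d{5,}(?<!12345)")

lines = [
    "12345",
    "90123 string word word 54616",
    "98765",
    "abc 12345 def",
]

boundary_hits = [bool(BOUNDARY.search(line)) for line in lines]
lookbehind_hits = [bool(LOOKBEHIND.search(line)) for line in lines]

for line, b, l in zip(lines, boundary_hits, lookbehind_hits):
    print(repr(line), b, l)
```

Both patterns search anywhere in the line, so a line that contains 12345 still matches if it also contains another qualifying number.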