I think I have solved this, but I'm wondering if anyone sees a flaw or a better method:
Using a regular expression in Notepad++, I'm trying to remove all strings that contain static and variable info like this:
{start of line},1,NRAG-E21-PRDCT-DT-CRWLR-8416 Result Data,NRAG-E21-PRDCT-DT-CRWLR-8416 Result Data,1,http://www.url.com/product/10E026,
Note: both ,1, strings are variable as well (,1, ,2, ,3, etc.).
The advantage I have is that the part I can anchor on appears at the end of the string, just before the comma, and its pattern is always [0-9] [A-Z] [0-9].
It therefore seems that this should work:
^.*?\/[0-9]+[A-Z]+[0-9]+,
That selects the start of the line ^ followed by everything before the pattern that looks like /10E026 and the comma at the end.
Does anybody see a flaw or a better way to find a string like that?
You wrote: "That selects the start of the line ^ followed by everything before the pattern that looks like /10E026 and the comma at the end."
That is not so. Your ^.*?\/[0-9]+[A-Z]+[0-9]+, matches the start of a line (^), then any 0+ chars other than a newline, as few as possible, up to the first / that is followed by 1+ digits, 1+ uppercase ASCII letters, 1+ digits, and a comma. Because .*? is lazy, the match stops at the first such occurrence on the line, which is not necessarily the last one.
It seems you need to match up to the last occurrence of the /xxAAAxxx, pattern:
^.*/[0-9]+[A-Z]+[0-9]+,
See the regex demo
Pattern details:
^ - start of a line (in Notepad++, ^ matches line start by default)
.* - 0+ any chars but a newline, greedily, up to the last...
/ - forward slash (no need to escape it here)
[0-9]+[A-Z]+[0-9]+ - 1+ digits, 1+ uppercase letters, 1+ digits
, - a comma.
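To see the difference between the lazy and the greedy version, here is a minimal sketch in Python (the question is about Notepad++, so this only illustrates the matching behaviour). The two-segment test line is made up for the example; on the sample line from the question both patterns remove the same text, because it contains only one /10E026, style segment.

```python
import re

# Hypothetical line with two "/digits-letters-digits," segments.
line = "x,http://a.example/1A1,y,http://b.example/2B2,tail"

# Sample line from the question (only one such segment).
sample = (",1,NRAG-E21-PRDCT-DT-CRWLR-8416 Result Data,"
          "NRAG-E21-PRDCT-DT-CRWLR-8416 Result Data,1,"
          "http://www.url.com/product/10E026,")

lazy = re.compile(r"^.*?/[0-9]+[A-Z]+[0-9]+,")    # stops at the first segment
greedy = re.compile(r"^.*/[0-9]+[A-Z]+[0-9]+,")   # runs to the last segment

lazy_rest = lazy.sub("", line)
greedy_rest = greedy.sub("", line)

print(lazy_rest)    # what is left after removing the lazy match
print(greedy_rest)  # what is left after removing the greedy match
```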
I have fields which contain data in the following possible formats (each line is a different possibility):
AAA - Something Here
AAA - Something Here - D
Something Here
Note that the first group of letters (AAA) can be of varying lengths.
What I am trying to capture is the "Something Here" or "Something Here - D" (if it exists) using PCRE, but I can't get the Regex to work properly for all three cases. I have tried:
- (.*) which works fine for cases 1 and 2 but obviously not 3;
(?<= - )(.*) which also works fine for cases 1 and 2;
(?! - )(.+)| - (.+) works for cases 2 and 3 but not 1.
I feel like I'm on the verge of it but I can't seem to crack it.
Thanks in advance for your help.
Edit: I realized that I was unclear in my requirements. If there is a trailing " - D" (the letter in the data is arbitrary but should only be a single character), that needs to be captured as well.
About the patterns that you tried:
- (.*) This pattern matches the first occurrence of " - " and then the rest of the line. It matches too much for the second example, because .* also matches the second occurrence of " - ".
(?<= - )(.*) This pattern matches the same text as the first one, but without the leading " - ", because the lookbehind only asserts that it occurs directly to the left.
(?! - )(.+)| - (.+) This pattern starts with a negative lookahead, (?! - ), which asserts that what is directly to the right is not " - ". As none of the examples starts with " - ", .+ matches the whole line right after the lookahead, and the second alternative after | is never tried.
If the first group of letters can be of varying length, you could make the match more specific by matching 1 or more uppercase characters with [A-Z]+, or 1+ word characters with \w+.
For a broader match, you could match 1 or more non-whitespace characters using \S+:
^(?:\S+\h-\h)?\K\S+(?:\h(?!-\h)\S+)*
Explanation
^ Start of string
(?:\S+\h-\h)? Optionally match the first group of non whitespace chars followed by - between horizontal whitespace chars
\K Clear the match buffer (Forget what is currently matched)
\S+ Match 1+ non whitespace characters
(?: Non capture group
\h(?!-\h) Match a horizontal whitespace char and assert what is directly to the right is not - followed by another horizontal whitespace char
\S+ Match 1+ non whitespace chars
)* Close non capture group and repeat 0+ times to match more "words" separated by spaces
Edit
To match an optional hyphen and trailing single character, you could add an optional non capturing group (?:-\h\S\h*)?$ and assert the end of the string if the pattern should match the whole string:
^(?:\S+\h-\h)?\K\S+(?:\h(?!-\h)\S+)*\h*(?:-\h\S\h*)?$
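For anyone who wants to try this outside PCRE, here is a rough Python sketch of the same idea. Python's re module has neither \K nor \h, so this version is an adaptation, not the pattern above verbatim: a capture group stands in for \K and [ \t] stands in for \h. The helper name extract is made up for the example.

```python
import re

# Adaptation of the PCRE pattern: capture group instead of \K, [ \t] instead of \h.
pattern = re.compile(
    r"^(?:\S+[ \t]-[ \t])?"          # optional leading "AAA - " part (not captured)
    r"(\S+(?:[ \t](?!-[ \t])\S+)*"   # words separated by one blank, stopping before " - "
    r"[ \t]*(?:-[ \t]\S[ \t]*)?)$"   # optional trailing "- D" (single character)
)

def extract(field):
    """Return the captured text, or None if the field does not match."""
    m = pattern.match(field)
    return m.group(1) if m else None

for field in ("AAA - Something Here", "AAA - Something Here - D", "Something Here"):
    print(repr(extract(field)))
```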
You may use
^(?:.*? - )?\K.*?(?= - | *$)
^(?:.*?\h-\h)?\K.*?(?=\h-\h|\h*$)
See the regex demo
Details
^ - start of string
(?:.*? - )? - an optional non-capturing group matching any 0+ chars other than line break chars, as few as possible, up to the first space-hyphen-space
\K - match reset operator
.*? - any 0+ chars other than line break chars as few as possible
(?= - | *$) - space-hyphen-space, or 0+ spaces up to the end of string, should follow immediately on the right.
Note that \h matches any horizontal whitespace chars.
^(?:[A-Z]+ - \K)?.*\S
Since "Something Here" can be anything, there's no reason to specially describe the eventual last letter in the pattern. You don't need something more complicated.
With this pattern I assume that you are not interested in the trailing spaces, which is why I ended it with \S. If you want to keep them, remove the \S and change the previous quantifier to +.
I cannot figure out how to combine two regexes. I have these requirements:
Letters and space ^[\p{L} ]+$
Cannot be whitespace ^[^\s]+$
How can I write one regex that combines both? Or is there perhaps some other solution?
You may use
^(?! +$)[\p{L} ]+$
^(?!\s+$)[\p{L}\s]+$
^\s*\p{L}[\p{L}\s]*$
Details
^ - start of string
(?!\s+$) - a negative lookahead that fails the match if the string consists only of 1 or more whitespaces
[\p{L}\s]+ - 1+ letters or whitespaces
$ - end of string.
See the regex demo.
The ^\s*\p{L}[\p{L}\s]*$ is a regex that matches any 0+ whitespaces at the start of the string, then requires a letter that it consumes, and then any 0+ letters/whitespaces may follow.
See the regex demo.
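A quick way to check the lookahead version is the Python sketch below. It is an adaptation under one loud assumption: Python's built-in re module does not support \p{L}, so [^\W\d_] (a word character that is neither a digit nor an underscore) is used as an approximation of "letter". The function name is made up for the example.

```python
import re

# [^\W\d_] approximates \p{L}; it is not an exact equivalent.
# (?!\s+\Z) rejects strings that consist only of whitespace.
LETTERS_AND_SPACES = re.compile(r"(?!\s+\Z)(?:[^\W\d_]|\s)+")

def is_letters_and_spaces(text):
    """True if text has only letters/whitespace and is not all whitespace."""
    return LETTERS_AND_SPACES.fullmatch(text) is not None

for candidate in ("John Smith", "   ", "", "abc1"):
    print(repr(candidate), is_letters_and_spaces(candidate))
```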
I have a string which contains the rego number of the car like
1FX9JE - 2012 Audi A3 Ambition Sportback MY12 Stronic
I would like to match everything except the rego number, so anything after the dash.
The regex I came up with is (php)
\s.[^-]*$
The regex I came up with matches everything after the dash only if the string contains a single dash. For example: https://regex101.com/r/Jao8W0/1
However, if the string has more than one dash, the regex is not usable.
For example: https://regex101.com/r/Jao8W0/2
Is there any way for me to match everything after the first dash, even if the string contains additional dashes after it?
Thank you
Try this Regex:
^[^-\r\n]+-\s*\K.*$
Explanation:
^ - asserts the start of the string
[^-\r\n]+ - matches 1+ occurrences of any character that is neither a - nor a newline
-\s* - matches the first - in the string followed by 0+ whitespaces
\K - forgets everything matched so far
.* - matches 0+ occurrences of any character other than a newline
$ - asserts the end of the string
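The question is about PHP, but the idea can be sketched in Python as well. Python's re module has no \K, so in this adaptation a capture group takes its place; the function name is made up for the example.

```python
import re

# Same structure as the pattern above, with (.*) replacing \K.*
AFTER_FIRST_DASH = re.compile(r"^[^-\r\n]+-\s*(.*)$")

def after_first_dash(text):
    """Return everything after the first dash (and following whitespace), or None."""
    m = AFTER_FIRST_DASH.match(text)
    return m.group(1) if m else None

print(after_first_dash("1FX9JE - 2012 Audi A3 Ambition Sportback MY12 Stronic"))
print(after_first_dash("1FX9JE - 2012 Audi A3 - Sportback"))
```

Later dashes stay in the result because [^-\r\n]+ can only consume text up to the first dash.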
If there is only one space after the dash, you can use this pattern:
(?<=\-\s)(.*)
Otherwise, if there may be more than one space, take group(1) from the match of this pattern:
(?<=\-)\s*(.*)
(?<=...) Ensures that the given pattern will match, ending at the
current position in the expression. The pattern must have a fixed
width. Does not consume any characters.
I want my regex to allow alphanumeric characters, "/_-", and whitespace in between, but it must always contain at least one alphanumeric character.
My validation looks like this:
/^([A-Za-z0-9/-]+[A-Za-z0-9/-\s]*[A-Za-z0-9/_-]+)$/
It should accept **ABC_1-2-3 but it must not allow 123 or -_/ alone
Can somebody help me please.
The regex below will capture strings of alphanumeric characters with optional whitespace, hyphens and underscores in them. Try it:
([*A-Za-z]+(\s+)?([\d\-_]+)?)
Your regex is almost right; you need to add 2 positive lookaheads at the start to require at least 1 letter and at least 1 digit:
/^(?=.*[a-z])(?=.*\d)[a-z0-9\/_-][a-z0-9\/_\s-]*[a-z0-9\/_-]$/i
See the regex demo (in the demo, \s is replaced with a space since the demo is multiline).
Details:
^ - start of string
(?=.*[a-z]) - after any 0+ chars other than line break chars, there must be at least 1 letter (replace .* with [^a-z]* for better performance)
(?=.*\d) - after any 0+ chars other than line break chars, there must be at least 1 digit (replace .* with \D* for better performance)
[a-z0-9\/_-] - a letter, digit, /, _ or -
[a-z0-9\/_\s-]* - 0+ letters, digits, /, whitespaces, _ or -
[a-z0-9\/_-] - a letter, digit, /, _ or -
$ - end of string.
The i modifier makes the pattern case insensitive.
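As a sanity check, the pattern can be exercised with a few inputs. This is a small Python sketch, not the JavaScript from the question; re.ASCII is an extra assumption added here so that \d and the case-insensitive [a-z] stay limited to ASCII, and the function name is made up.

```python
import re

VALID = re.compile(
    r"^(?=.*[a-z])(?=.*\d)[a-z0-9/_-][a-z0-9/_\s-]*[a-z0-9/_-]$",
    re.IGNORECASE | re.ASCII,
)

def is_valid_code(value):
    """True if value has a letter and a digit and uses only the allowed characters."""
    return VALID.search(value) is not None

for value in ("ABC_1-2-3", "123", "-_/", "AB 12"):
    print(repr(value), is_valid_code(value))
```

Note that the pattern needs at least two characters, since the first and the last character are matched separately.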
I have a source string that looks like this: mID00231mID00008mID00231mID00054mID00013mID00008mID00065
The pattern I am trying to create, using this example, is: For the last occurrence of "mID00231" in the string, one or more occurrences of each of {mID00054, mID00013, mID00008, mID00065} must follow it (in any order).
Examples of matches:
mID00231mID00008mID00231mID00054mID00013mID00008mID00065
mID00231mID00013mID00054mID00008mID00065mID00008
Example of no match because of missing "mID00065":
mID00231mID00054mID00013mID00008
Example of no match because the last occurrence of "mID00231" is not followed by a "mID00054" and a "mID00008":
mID00231mID00013mID00065mID00054mID00008mID00231mID00013mID00065
I am fairly new to regex but usually arrive at something that works. This one has been very difficult. I tried this:
(?:mID00231)(?:(?=.*mID00054)(?=.*mID00013)(?=.*mID00008)(?=.*mID00065).*)
It works if there is only one occurrence of the first element (mID00231). If the element repeats, the pattern fails. Any help is appreciated.
You need a negative lookahead that fails the match if the same value occurs again further on:
mID00231((?!.*mID00231)(?=.*mID00054)(?=.*mID00013)(?=.*mID00008)(?=.*mID00065).*)
         ^^^^^^^^^^^^^^
See the regex demo.
Details:
mID00231 - match a literal mID00231 text
( - start of the capturing group
(?!.*mID00231) - there cannot be mID00231 anywhere after 0+ any chars but a newline
(?=.*mID00054) - there must be mID00054 anywhere after 0+ any chars but a newline
(?=.*mID00013) - there must be mID00013 anywhere after 0+ any chars but a newline
(?=.*mID00008) - there must be mID00008 anywhere after 0+ any chars but a newline
(?=.*mID00065) - there must be mID00065 anywhere after 0+ any chars but a newline
.* - 0+ any chars but a newline
) - end of the capturing group.
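If the IDs change from case to case, the same pattern can be assembled from a list. Below is a minimal Python sketch of that idea; the helper name build_pattern and its parameters are made up for the example, and the regex flavor in the question is not stated, so treat this only as an illustration.

```python
import re

def build_pattern(anchor, required):
    """Match the last occurrence of anchor, followed by every ID in required."""
    # One positive lookahead per required ID, in any order.
    lookaheads = "".join(f"(?=.*{re.escape(item)})" for item in required)
    # The negative lookahead rejects any anchor that is followed by another anchor.
    return re.compile(
        f"{re.escape(anchor)}((?!.*{re.escape(anchor)}){lookaheads}.*)"
    )

pattern = build_pattern("mID00231", ["mID00054", "mID00013", "mID00008", "mID00065"])

tests = [
    "mID00231mID00008mID00231mID00054mID00013mID00008mID00065",
    "mID00231mID00013mID00054mID00008mID00065mID00008",
    "mID00231mID00054mID00013mID00008",
    "mID00231mID00013mID00065mID00054mID00008mID00231mID00013mID00065",
]
for s in tests:
    print(bool(pattern.search(s)), s)
```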