Match everything but Regular Expression - regex

I have been searching for a long time and reading up on negative/positive lookaheads, but I can't get a pattern that matches everything except my regular expression.
\b[A-Z]{1}\d{3,6}[A-Z0-9]+
is the string I don't want to extract.
(?!\b[A-Z]{1}\d{3,6}[A-Z0-9]+).*
is my best attempt using a negative lookahead, but it still matches the data.
I am using this Regex on:
11/02/2019 1 475.50 453.345 Serial number : C580A0453WD7996 
AFJ_LowGuard_NewNew
End User Details:
The output I want is:
11/02/2019 1 475.50 453.345 Serial number :
AFJ_LowGuard_NewNew
End User Details:

One approach is to match with your own regex and replace each match with an empty string.
Another approach, which is what you seem to be attempting, is to use the following regex to match anything but your pattern:
\b(?:(?![A-Z]\d{3,6}[A-Z0-9]+).)+\b
This matches anything except your pattern. Personally, though, I suggest matching your pattern and replacing it, since that is simpler.
Edit:
OK, I read your comment saying you want to replace everything except the string matched by your pattern. In that case, you can use the following regex to match everything except your pattern and replace it with an empty string to get your result:
\b(?:(?![A-Z]\d{3,6}[A-Z0-9]+).)+
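As a rough illustration of the first approach (match the pattern, replace it with an empty string), here is a minimal Python sketch. Python's `re` module and the helper name `strip_serials` are assumptions made for the example, not part of the question; trailing spaces left behind by the removal are trimmed so the result lines up with the desired output.

```python
import re

# Pattern from the question: one capital letter, 3-6 digits, then capitals/digits.
SERIAL = re.compile(r"\b[A-Z]\d{3,6}[A-Z0-9]+")

text = (
    "11/02/2019 1 475.50 453.345 Serial number : C580A0453WD7996 \n"
    "AFJ_LowGuard_NewNew\n"
    "End User Details:"
)


def strip_serials(s: str) -> str:
    """Remove every match of SERIAL, then trim trailing spaces on each line."""
    cleaned = SERIAL.sub("", s)
    return "\n".join(line.rstrip() for line in cleaned.split("\n"))


print(strip_serials(text))

# The opposite goal (keep only the matched strings) needs no negation either:
print(SERIAL.findall(text))
```

If only the matched strings are wanted, `findall` with the original pattern is likely easier to maintain than a negated pattern.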

Related

Regex formation and Issue in Negation

I need to create two regexes.
The first should catch strings of this type:
/xyz-courses/test/test
/abc-courses/test-abc/test-xyz
/abc-courses/test-abc/test-xyz?itsok=yes
But I don't want to match strings where the word fixed comes right before -courses:
/fixed-courses/test/test
/fixed-courses/test-abc/test-xyz
/fixed-courses/test-abc/test-xyz?itsok=yes
I have created the following regex, which works fine, but I am not sure how to exclude the prefix fixed:
/([^/]+)-courses/([^/]+)/([^/]+)$
Second, I need to create a regex that negates the regex created in the previous step.
I tried:
[^/([^/]+)-courses/([^/]+)/([^/]+)]$
But every regex checker reports this as invalid.
You may use this regex to disallow fixed- before courses:
^/((?!fixed-)[^/-]+)-courses/([^/]+)/([^/]+)$
(?!fixed-) is a negative lookahead that will fail the match if fixed- appears right after / and before courses/.
For the second part, use this to negate the first regex:
^/(?!((?!fixed-)[^/-]+)-courses/([^/]+)/([^/]+)$).+
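To check both patterns against the sample paths from the question, a small Python sketch could look like this. The names `ALLOWED` and `NEGATED` are made up for the example, and Python's `re` module is an assumption, since the question does not name a regex flavor.

```python
import re

# First regex: any "<word>-courses" prefix except "fixed-courses".
ALLOWED = re.compile(r"^/((?!fixed-)[^/-]+)-courses/([^/]+)/([^/]+)$")
# Second regex: wraps the first in a negative lookahead to negate it.
NEGATED = re.compile(r"^/(?!((?!fixed-)[^/-]+)-courses/([^/]+)/([^/]+)$).+")

wanted = [
    "/xyz-courses/test/test",
    "/abc-courses/test-abc/test-xyz",
    "/abc-courses/test-abc/test-xyz?itsok=yes",
]
unwanted = [
    "/fixed-courses/test/test",
    "/fixed-courses/test-abc/test-xyz",
    "/fixed-courses/test-abc/test-xyz?itsok=yes",
]

for path in wanted + unwanted:
    # Each path should be matched by exactly one of the two patterns.
    print(path, bool(ALLOWED.search(path)), bool(NEGATED.search(path)))
```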

Trying to combine two Regex

I'm trying to combine two working regex patterns into one. Please let me know the correct syntax and if this can be better written.
Pattern 1: (?P<date>.*)\s+(?P<timezone>.*)\|.*\|.*\|(?P<ip>[\w*.:-]+)\|.*\|
Pattern 2: (?P<path>[^\/]+(?=\-[^\/-]*$))
Sample line:
06/Mar/2020:00:01:04 -0500|/TESTSTREAM|5766764|4.2.2.1|123290|path1/path2/x-fr-US.OPEN.1-Turtle-2020.30.04-64.mp3
The first expression matches the start of the string and the second matches the end, so you can combine them by putting a non-greedy .*? between them, like this:
(?P<date>.*)\s+(?P<timezone>.*)\|.*\|.*\|(?P<ip>[\w*.:-]+)\|.*\|.*?(?P<path>[^\/]+(?=\-[^\/-]*$))
This expression works, but it takes 1660 steps to match the string. This is because each .* between the | characters first captures the whole string up to the end and then steps back character by character in order to find the match.
If you use the non-greedy modifier here (.*?), the regex engine initially matches an empty string and then moves forward character by character until it finds the matching |. This reduces the number of steps to 1183.
However, if you want to remove this backtracking (or forward-tracking) entirely, you can skip as many non-| characters as possible in one go with [^|]*. The other .* patterns in the regex can be replaced similarly. The resulting regex finds a match in just 47 steps, more than 30 times fewer than the original regex:
(?P<date>\S*)\s+(?P<timezone>[^|]*)\|[^|]*\|[^|]*\|(?P<ip>[\w*.:-]+)\|[^|]*\|(?:[^\/\n]*\/)*(?P<path>.*)-.*
Update 2020-03-09
If you want to keep the last slash you can use this regex:
(?P<date>\S*)\s+(?P<timezone>[^|]*)\|[^|]*\|[^|]*\|(?P<ip>[\w*.:-]+)\|[^|]*\|.*?(?P<path>\/[^\/]*)-[^\/]*
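As a quick check, the two optimized patterns can be run against the sample line in Python, whose `re` module accepts the `(?P<name>...)` group syntax used here. The variable names are illustrative only, and the step counts quoted above are not measured by this sketch.

```python
import re

LINE = (
    "06/Mar/2020:00:01:04 -0500|/TESTSTREAM|5766764|4.2.2.1|123290|"
    "path1/path2/x-fr-US.OPEN.1-Turtle-2020.30.04-64.mp3"
)

# Negated character classes ([^|]*) consume each field in one pass.
FAST = re.compile(
    r"(?P<date>\S*)\s+(?P<timezone>[^|]*)\|[^|]*\|[^|]*\|"
    r"(?P<ip>[\w*.:-]+)\|[^|]*\|(?:[^\/\n]*\/)*(?P<path>.*)-.*"
)

# Variant from the update: the path group keeps the last slash.
WITH_SLASH = re.compile(
    r"(?P<date>\S*)\s+(?P<timezone>[^|]*)\|[^|]*\|[^|]*\|"
    r"(?P<ip>[\w*.:-]+)\|[^|]*\|.*?(?P<path>\/[^\/]*)-[^\/]*"
)

fields = FAST.match(LINE).groupdict()
slash_fields = WITH_SLASH.match(LINE).groupdict()

print(fields)
print(slash_fields["path"])
```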

Regex: ignore characters that follow

I'd like to know how I can ignore characters that follow a particular pattern in a regex.
I tried positive lookaheads, but they do not work because they preserve those characters for other matches, while I want them to be just... discarded.
For example, a part of my regex is: (?<DoubleQ>\"\".*?\"\")|(?<SingleQ>\".*?\")
in order to match some "key-parts" of this string:
This is a ""sample text"" just for "testing purposes": not to be used anywhere else.
I want to capture the entire ""sample text"", but then I want to "extract" only sample text and the same with testing purposes. That is, I want the group to match to be ""sample text"", but then I want the full match to be sample text. I partially achieved that with the use of the \K option:
(?<DoubleQ>\"\"\K.*?\"\")|(?<SingleQ>\"\K.*?\")
Which ignores the first "" (or ") from the full match but takes it into account when matching the group. How can I ignore the following "" (")?
Note: positive lookahead does not work: it does not ignore characters from the following matches, it just does not include them in the current match.
Thanks a lot.
I hope I understood your question correctly: you want to match the whole string including the quotes, but extract (or replace it with) only the text without the quotes, right?
You can typically use the regex replace functionality to extract just a part of the match.
This is the regex:
""?(.*?)""?
And this is the replacement expression:
$1
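A minimal Python sketch of this capture-group approach follows. Python's `re` module is an assumption here (the question does not name a flavor), and it writes the replacement as `\1` where other engines use `$1`.

```python
import re

text = 'This is a ""sample text"" just for "testing purposes": not to be used anywhere else.'

# One or two opening quotes, a lazily captured body, one or two closing quotes.
QUOTED = re.compile(r'""?(.*?)""?')

# findall returns only the captured group, i.e. the text without quotes.
extracted = QUOTED.findall(text)

# sub replaces each whole match (quotes included) with the captured group.
unquoted = QUOTED.sub(r"\1", text)

print(extracted)
print(unquoted)
```

The surrounding quotes are part of each match, so they are consumed and cannot start a following match, yet they never appear in the extracted text.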

Go regex to match all lines that don't start with timestamp

Can anybody explain what the correct Go regex is to match all lines that don't start with the timestamp [0-9]{4}-[0-9]{2}-[0-9]{2}?
I am trying to use ^(^[0-9]{4}-[0-9]{2}-[0-9]{2}) but it doesn't work.
Your ^(^[0-9]{4}-[0-9]{2}-[0-9]{2}) pattern matches a string starting with the pattern you defined (the ^ here just matches the start of a string).
In Go, the regex engine does not support lookarounds, so it is difficult to write a readable regex that does the required job.
I suggest you remove all lines that match your pattern
(?m)\s*^[0-9]{4}-[0-9]{2}-[0-9]{2}.*
(see demo) and then split the result with line breaks to get the lines that did not match the pattern.
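The remove-then-split idea is sketched below in Python purely for illustration; the question is about Go, so treat this as a description of the logic rather than Go code. The pattern itself contains no lookarounds. The function name `lines_without_timestamp` and the sample log text are invented for the example, and dropping the empty strings left behind after removal is an added assumption.

```python
import re

# Same pattern as above: optional leading whitespace, then a line that
# starts with a YYYY-MM-DD timestamp. No lookarounds are involved.
TIMESTAMPED = re.compile(r"(?m)\s*^[0-9]{4}-[0-9]{2}-[0-9]{2}.*")


def lines_without_timestamp(text: str) -> list[str]:
    """Delete timestamped lines, then split what is left into lines."""
    remaining = TIMESTAMPED.sub("", text)
    # Removal can leave empty strings behind; drop them.
    return [line for line in remaining.split("\n") if line]


sample = "2020-01-01 start\n  at foo()\n2020-01-02 next\nplain line"
print(lines_without_timestamp(sample))
```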

Regex - Match the pattern of a string representing a variable and its assigned value

I am trying to find a regex pattern that would enable me to swiftly search through my source code to find the following string pattern:
placeholder="any text here"
I have tried the following regex pattern; however, it does not capture only the strings beginning with the substring "placeholder".
placeholder=\".+\"
You need to make the quantifier lazy by adding ?. Otherwise it captures the longest possible match. Also, there is no need to escape the quotes.
placeholder=".+?"
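The difference between the greedy and lazy forms shows up as soon as a line contains more than one quoted attribute. Here is a small Python sketch; the HTML snippet is invented for the example.

```python
import re

source = '<input placeholder="Name" class="wide"> <input placeholder="Email">'

# Greedy: .+ runs to the last double quote on the line.
greedy = re.findall(r'placeholder=".+"', source)

# Lazy: .+? stops at the first closing quote after each placeholder=.
lazy = re.findall(r'placeholder=".+?"', source)

print(greedy)
print(lazy)
```

An alternative with the same effect on input like this is `placeholder="[^"]+"`, which cannot run past a closing quote at all.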