Go regex to match all lines that don't start with a timestamp

Can anybody explain what the correct Go regex is to match all lines that don't start with a timestamp of the form [0-9]{4}-[0-9]{2}-[0-9]{2}?
I am trying to use ^(^[0-9]{4}-[0-9]{2}-[0-9]{2}) but it doesn't work.

Your ^(^[0-9]{4}-[0-9]{2}-[0-9]{2}) pattern matches a string that starts with the date pattern: both ^ characters here are start-of-string anchors, not negations.
Go's regexp package uses RE2 syntax, which does not support lookarounds, so it is difficult to write a readable regex that does the required job directly.
I suggest you remove all lines that match your pattern with
(?m)\s*^[0-9]{4}-[0-9]{2}-[0-9]{2}.*
and then split the result on line breaks to get the lines that did not match the pattern.
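
As a rough illustration of the remove-then-split suggestion, here is a minimal sketch. It is written in Python rather than Go purely for brevity; the pattern uses no lookarounds, so it should carry over to RE2. The function name lines_without_timestamp and the sample log text are made up for the example, and dropping the empty strings left behind by the split is an assumption about the desired output.

```python
import re

# The pattern from the answer: optional preceding whitespace (including the
# previous line break), then a line that starts with YYYY-MM-DD.
TIMESTAMP_LINE = re.compile(r"(?m)\s*^[0-9]{4}-[0-9]{2}-[0-9]{2}.*")


def lines_without_timestamp(text):
    """Return the lines of text that do not start with a timestamp."""
    # Step 1: delete every line that matches the pattern.
    remaining = TIMESTAMP_LINE.sub("", text)
    # Step 2: split what is left on line breaks. Empty strings are dropped
    # (an assumption: blank lines are not wanted in the result).
    return [line for line in remaining.split("\n") if line]


if __name__ == "__main__":
    # Hypothetical log text, invented for this example.
    log = (
        "2020-03-06 12:00:01 request started\n"
        "    at handler.go:42\n"
        "2020-03-06 12:00:02 request finished\n"
        "panic: something went wrong"
    )
    for line in lines_without_timestamp(log):
        print(line)
```

If blank lines matter, a per-line check (keep each line for which the timestamp pattern does not match at the start) may be the simpler design.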

Trying to combine two Regex

I'm trying to combine two working regex patterns into one. Please let me know the correct syntax and if this can be better written.
Pattern 1: (?P<date>.*)\s+(?P<timezone>.*)\|.*\|.*\|(?P<ip>[\w*.:-]+)\|.*\|
Pattern 2: (?P<path>[^\/]+(?=\-[^\/-]*$))
Sample line:
06/Mar/2020:00:01:04 -0500|/TESTSTREAM|5766764|4.2.2.1|123290|path1/path2/x-fr-US.OPEN.1-Turtle-2020.30.04-64.mp3
The first expression matches the start of the string and the second matches the end, so you can combine them by putting a non-greedy .*? between them, like this:
(?P<date>.*)\s+(?P<timezone>.*)\|.*\|.*\|(?P<ip>[\w*.:-]+)\|.*\|.*?(?P<path>[^\/]+(?=\-[^\/-]*$))
This expression works, but it takes 1660 steps to match the string. This is because each .* between | characters first consumes the whole string up to the end and then backtracks character by character to find the match.
If you use the non-greedy form .*? instead, the regex engine initially matches an empty string and then moves forward character by character until it finds the matching |. This reduces the number of steps to 1183.
However, if you want to remove this backtracking (or forward-tracking) altogether, you can skip as many non-| characters as possible in one go with [^|]*. The other .* patterns in the regex can be replaced similarly. The resulting regex finds a match in just 47 steps, more than 30 times fewer than the original regex:
(?P<date>\S*)\s+(?P<timezone>[^|]*)\|[^|]*\|[^|]*\|(?P<ip>[\w*.:-]+)\|[^|]*\|(?:[^\/\n]*\/)*(?P<path>.*)-.*
Update 2020-03-09
If you want to keep the last slash you can use this regex:
(?P<date>\S*)\s+(?P<timezone>[^|]*)\|[^|]*\|[^|]*\|(?P<ip>[\w*.:-]+)\|[^|]*\|.*?(?P<path>\/[^\/]*)-[^\/]*
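
To see what the optimized pattern (the one built from [^|]* classes) captures on the sample line, here is a small Python sketch. The helper name parse_line is made up for the example; the pattern and the sample line are copied from the thread. It does not measure step counts, which depend on the regex engine.

```python
import re

# The optimized pattern from the answer: negated character classes such as
# [^|]* replace the backtracking-prone .* between the | separators.
LOG_LINE = re.compile(
    r"(?P<date>\S*)\s+(?P<timezone>[^|]*)\|[^|]*\|[^|]*\|"
    r"(?P<ip>[\w*.:-]+)\|[^|]*\|(?:[^\/\n]*\/)*(?P<path>.*)-.*"
)


def parse_line(line):
    """Return the named groups as a dict, or None if the line does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = (
        "06/Mar/2020:00:01:04 -0500|/TESTSTREAM|5766764|4.2.2.1|123290|"
        "path1/path2/x-fr-US.OPEN.1-Turtle-2020.30.04-64.mp3"
    )
    print(parse_line(sample))
```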

Match everything but Regular Expression

I have been searching for a long time and reading up on negative/positive lookahead, but I can't get this to match everything but my regular expression.
\b[A-Z]{1}\d{3,6}[A-Z0-9]+
is the pattern I don't want to extract.
(?!\b[A-Z]{1}\d{3,6}[A-Z0-9]+).*
is my best attempt using a negative lookahead, but it still matches the data.
I am using this Regex on:
11/02/2019 1 475.50 453.345 Serial number : C580A0453WD7996 
AFJ_LowGuard_NewNew
End User Details:
The output I want is:
11/02/2019 1 475.50 453.345 Serial number :
AFJ_LowGuard_NewNew
End User Details:
One approach is to use your regex to match the unwanted text and replace the match with an empty string.
The other approach, which you seem to be attempting, is to use the following regex to match anything but your pattern:
\b(?:(?![A-Z]\d{3,6}[A-Z0-9]+).)+\b
This matches anything except your pattern, but personally I suggest the first approach: matching your pattern and replacing it is simpler.
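
The match-and-replace approach can be sketched in a few lines of Python. The function name strip_serials is made up for the example, and trimming the trailing space left behind on each line is an assumption based on the desired output shown in the question.

```python
import re

# The asker's pattern for the text that should disappear from the output.
SERIAL = re.compile(r"\b[A-Z]\d{3,6}[A-Z0-9]+")


def strip_serials(text):
    """Remove every serial-like token and tidy trailing whitespace per line."""
    cleaned = SERIAL.sub("", text)
    # Assumption: the space that preceded a removed token is not wanted.
    return "\n".join(line.rstrip() for line in cleaned.split("\n"))


if __name__ == "__main__":
    sample = (
        "11/02/2019 1 475.50 453.345 Serial number : C580A0453WD7996\n"
        "AFJ_LowGuard_NewNew\n"
        "End User Details:"
    )
    print(strip_serials(sample))
```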
Edit:
OK, I read your comment that you want to replace everything except the string matched by your pattern. In that case you can use the following regex to match everything except your pattern and replace it with an empty string to get your result:
\b(?:(?![A-Z]\d{3,6}[A-Z0-9]+).)+

Regex: ignore characters that follow

I'd like to know how I can ignore characters that follow a particular pattern in a regex.
I tried positive lookaheads, but they do not work because they preserve those characters for other matches, while I want them to be just... discarded.
For example, a part of my regex is: (?<DoubleQ>\"\".*?\"\")|(?<SingleQ>\".*?\")
in order to match some "key-parts" of this string:
This is a ""sample text"" just for "testing purposes": not to be used anywhere else.
I want to capture the entire ""sample text"", but then I want to "extract" only sample text and the same with testing purposes. That is, I want the group to match to be ""sample text"", but then I want the full match to be sample text. I partially achieved that with the use of the \K option:
(?<DoubleQ>\"\"\K.*?\"\")|(?<SingleQ>\"\K.*?\")
Which ignores the first "" (or ") from the full match but takes it into account when matching the group. How can I ignore the following "" (")?
Note: positive lookahead does not work: it does not ignore characters from the following matches, it just does not include them in the current match.
Thanks a lot.
I hope I got your question right. So you want to match the whole string including the quotes, but you want to extract only the expression without the quotes, right?
You can typically use the regex replace functionality to extract just a part of the match.
This is the regex:
""?(.*?)""?
And this is the replace expression:
$1
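
A short Python sketch of this capture-group idea, using the sample sentence from the question. The function names extract_quoted and unquote are made up for the example; note that Python writes the first group as \1 in a replacement string where other flavors write $1.

```python
import re

# The answer's pattern: one or two opening quotes, a lazily captured body,
# then one or two closing quotes.
QUOTED = re.compile(r'""?(.*?)""?')


def extract_quoted(text):
    """Return only the captured bodies, without the surrounding quotes."""
    return QUOTED.findall(text)


def unquote(text):
    """Replace each quoted span with its captured body (group 1)."""
    return QUOTED.sub(r"\1", text)


if __name__ == "__main__":
    sample = (
        'This is a ""sample text"" just for "testing purposes": '
        "not to be used anywhere else."
    )
    print(extract_quoted(sample))
    print(unquote(sample))
```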

Regex to match certain word but not a particular combination

I have 15 titles as follows:
fruits-and-flowers-themeA
fruits-and-flowers-themeB
fruits-and-flowers-just-test-themeA
themeAfruitsandflowers
nice-fruits-and-flowers-themeA
botanical-names-themeA
I want a regex to help me get only those titles with "themeA" in them, but it should not include "nice" and not include "just-test" or "just-tests".
I tried
^(?!.*just-test|*just-tests|nice).*?(?:themeA).*,
but I still get fruits-and-flowers-just-test-themeA in the output.
How to fix this?
Thanks
You can use this regex with negative lookahead:
^(?!.*?(?:just-tests?|nice)).*?themeA.*$
Option 1
You can use a single regex with lookaheads:
^(?!.*nice)(?!.*just-tests?).*themeA.*
The ^ asserts that the match starts at the beginning of the string (so we don't match a subset of the string).
The (?!.*nice) is a negative lookahead that asserts that at this position in the string, we cannot find any characters followed by nice.
The (?!.*just-tests?) is a negative lookahead that asserts that at this position in the string, we cannot find any characters followed by just-test and an optional s.
As a further tweak, you can compress the lookaheads into one using an | alternation as in anubhava's answer.
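
The combined single-lookahead form can be tried against the titles from the question with a short Python sketch. The names WANTED, TITLES and filter_titles are made up for the example; only the six titles actually listed in the question are used.

```python
import re

# One negative lookahead with an alternation rejects both unwanted words;
# the rest of the pattern requires themeA somewhere in the title.
WANTED = re.compile(r"^(?!.*?(?:just-tests?|nice)).*?themeA.*$")

TITLES = [
    "fruits-and-flowers-themeA",
    "fruits-and-flowers-themeB",
    "fruits-and-flowers-just-test-themeA",
    "themeAfruitsandflowers",
    "nice-fruits-and-flowers-themeA",
    "botanical-names-themeA",
]


def filter_titles(titles):
    """Keep the titles that contain themeA but neither excluded word."""
    return [title for title in titles if WANTED.match(title)]


if __name__ == "__main__":
    for title in filter_titles(TITLES):
        print(title)
```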
Option 2 without lookaheads (Perl, PHP/PCRE)
^(?:.*(?:nice|just-tests?).*)(*SKIP)(?!)|.*themeA.*
Rather than guarding the match with lookahead conditions, this one matches the unwanted titles and discards them with (*SKIP)(?!), where the empty (?!) always fails.
Use two different regular expressions for clarity and simplicity.
Match your string against one regex that matches themeA:
/themeA/
and then check that the string does NOT match the one you don't want:
/nice|just-tests?/
Doing it in two different regexes makes it far easier to understand and maintain.
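
A minimal Python sketch of the two-regex approach. The names MUST_MATCH, MUST_NOT_MATCH and is_wanted are made up for the example.

```python
import re

# First check: the title has to contain themeA.
MUST_MATCH = re.compile(r"themeA")
# Second check: the title must not contain any of the excluded words.
MUST_NOT_MATCH = re.compile(r"nice|just-tests?")


def is_wanted(title):
    """True when the title passes the first check and fails the second."""
    return bool(MUST_MATCH.search(title)) and not MUST_NOT_MATCH.search(title)


if __name__ == "__main__":
    for title in ("fruits-and-flowers-themeA", "nice-fruits-and-flowers-themeA"):
        print(title, is_wanted(title))
```

Each condition can then be changed or extended on its own, which is the maintainability point made above.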

Delphi TRegEx zero-length

I want to match the content between '(' and ')' of
Path()
Path(C:\...)
with
(?<=^Path\()(.*)(?=\))
In Notepad++ it matches '' (a zero-length match) and 'C:\...'.
But using Delphi XE3:
if TRegEx.IsMatch(pDef, '(?<=^Path\()(.*)(?=\))') then begin
only matches 'C:\...', but I need the empty match as well.
Try this regex:
Path\((.*)\)
This also matches the empty content, as in your example.
Delphi's TRegEx skips all zero-length matches. See QC104562 for details.
Your regex will work with Delphi's TPerlRegEx if you exclude preNotEmpty from the State property.
That said, using lookaround to try to isolate part of the regex match results in inefficient regexes. Much better to use something like Path\(([^)\r\n]*)\) or Path\((.*)\) and retrieve the text matched by the first capturing group to get the actual path. The first regex will correctly match Path(...) when there are additional ) characters on the same line but will not correctly handle paths that contain ) characters.
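
The capturing-group approach and the trade-off between the two suggested patterns can be illustrated with a short sketch. It uses Python as a stand-in, so it says nothing about how Delphi's TRegEx treats zero-length matches; the names STRICT, GREEDY and path_content, and the inputs other than Path() and Path(C:\...), are made up for the example.

```python
import re

# Stops at the first ")" on the line.
STRICT = re.compile(r"Path\(([^)\r\n]*)\)")
# Runs to the last ")" on the line.
GREEDY = re.compile(r"Path\((.*)\)")


def path_content(text, pattern=STRICT):
    """Return the text inside Path(...), or None when there is no match."""
    match = pattern.search(text)
    # Group 1 may be an empty string; that still counts as a match.
    return match.group(1) if match else None


if __name__ == "__main__":
    for text in ("Path()", r"Path(C:\...)", "Path(a) and (b)", r"Path(C:\dir(1)\file)"):
        print(repr(text), repr(path_content(text, STRICT)), repr(path_content(text, GREEDY)))
```

Because the path comes from group 1 rather than from the overall match, an empty Path() yields an empty string instead of relying on a zero-length overall match.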