Regex: ignore characters that follow

I'd like to know how I can ignore characters that follow a particular pattern in a regex.
I tried positive lookaheads, but they do not work because they preserve those characters for other matches, whereas I want them to be just... discarded.
For example, a part of my regex is: (?<DoubleQ>\"\".*?\"\")|(?<SingleQ>\".*?\")
in order to match some "key-parts" of this string:
This is a ""sample text"" just for "testing purposes": not to be used anywhere else.
I want to capture the entire ""sample text"", but then I want to "extract" only sample text, and the same with testing purposes. That is, I want the group to match ""sample text"", but the full match to be sample text. I partially achieved that with the use of the \K option:
(?<DoubleQ>\"\"\K.*?\"\")|(?<SingleQ>\"\K.*?\")
Which ignores the first "" (or ") from the full match but takes it into account when matching the group. How can I ignore the following "" (")?
Note: positive lookahead does not work: it does not ignore characters from the following matches, it just does not include them in the current match.
Thanks a lot.

I hope I understood your question correctly: you want to match the whole string including the quotes, but replace or extract only the expression without the quotes, right?
You can typically use the regex replace functionality to extract just a part of the match.
This is the regex:
""?(.*?)""?
And this is the replacement expression:
$1
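As a rough illustration of the capture-and-replace idea, here is a minimal Python sketch. The question does not name a language or engine, so Python's built-in re module is an assumption; there the group reference in the replacement is written \1 rather than $1, and \K is not supported, which is another reason to rely on the capture group. The variable names are made up for the example.

```python
import re

# Sample string from the question.
text = 'This is a ""sample text"" just for "testing purposes": not to be used anywhere else.'

# One or two opening quotes, a lazily captured body, one or two closing quotes.
pattern = re.compile(r'""?(.*?)""?')

# Replace each quoted span with just its captured body (group 1).
stripped = pattern.sub(r'\1', text)

# Or pull the captured bodies out directly.
extracted = pattern.findall(text)

print(stripped)
print(extracted)  # ['sample text', 'testing purposes']
```

The full match still covers the surrounding quotes, so nothing is left over for a later match to pick up; only the group content is kept.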

Related

Match everything but Regular Expression

I've been searching for a long time and reading up on negative/positive lookaheads, but I can't get this to match everything but my regular expression.
\b[A-Z]{1}\d{3,6}[A-Z0-9]+
is the pattern for the string I don't want to extract.
(?!\b[A-Z]{1}\d{3,6}[A-Z0-9]+).*
is my best attempt using a negative lookahead, but it still matches the data.
I am using this Regex on:
11/02/2019 1 475.50 453.345 Serial number : C580A0453WD7996 
AFJ_LowGuard_NewNew
End User Details:
The output I want is:
11/02/2019 1 475.50 453.345 Serial number :
AFJ_LowGuard_NewNew
End User Details:
One approach is to use your regex to match the string and replace the match with an empty string.
Another approach, which you seem to be attempting, is to use the following regex to match anything but your pattern:
\b(?:(?![A-Z]\d{3,6}[A-Z0-9]+).)+\b
This will match anything except your pattern. Personally, though, I suggest the first approach: matching your pattern and replacing it should be easier.
Edit:
OK, I read your comment saying that you want to replace anything except the string matched by your pattern. In that case, you can use the following regex to match everything except your pattern and replace it with an empty string to get your result:
\b(?:(?![A-Z]\d{3,6}[A-Z0-9]+).)+
Demo with replacement with empty string
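The first approach mentioned in this answer (match the pattern and replace it with an empty string) could look roughly like this in Python. This is only a sketch: the language is an assumption, the variable names are invented, and the optional leading [ \t]* is an addition of mine so that the space before the serial number is removed too.

```python
import re

# Input from the question (trailing space on the first line omitted here).
text = (
    "11/02/2019 1 475.50 453.345 Serial number : C580A0453WD7996\n"
    "AFJ_LowGuard_NewNew\n"
    "End User Details:"
)

# The asker's pattern; [A-Z]{1} is equivalent to [A-Z].
serial_pattern = r'\b[A-Z]\d{3,6}[A-Z0-9]+'

# Delete every match, together with any spaces or tabs directly before it.
cleaned = re.sub(r'[ \t]*' + serial_pattern, '', text)

print(cleaned)
```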

Regex - Match the pattern of a string representing a variable and its assigned value

I am trying to find a regex pattern that would enable me to swiftly search through my source code to find the following string pattern:
placeholder="any text here"
I have tried the following regex pattern; however, it does not exclusively capture strings beginning with the sub-string "placeholder".
placeholder=\".+\"
You need to make the quantifier lazy by adding ? after it. Otherwise it captures the longest possible match. Also, there is no need to escape the quotes.
placeholder=".+?"
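To see the difference between the greedy and the lazy quantifier, here is a small Python sketch. The input line is a hypothetical piece of markup invented for the example, and Python is an assumption, since the question does not say which tool performs the search.

```python
import re

# Hypothetical line of source code with two quoted attributes.
line = '<input placeholder="any text here" class="wide">'

# Greedy: .+ runs to the last quote on the line.
greedy = re.findall(r'placeholder=".+"', line)

# Lazy: .+? stops at the first closing quote.
lazy = re.findall(r'placeholder=".+?"', line)

print(greedy)  # ['placeholder="any text here" class="wide"']
print(lazy)    # ['placeholder="any text here"']
```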

Trying to figure out how to capture text between slashes regex

I have a regex
/([/<=][^/]*[/=?])$/g
I'm trying to capture text between the last slashes in a file path
/1/2/test/
but this regex matches "/test/" instead of just test. What am I doing wrong?
You need to use lookaround assertions.
(?<=\/)[^\/]*(?=\/[^\/]*$)
or
Use the below regex and then grab the string you want from group index 1.
\/([^\/]*)\/[^\/]*$
The easy way
Match:
1. every character that is not a "/". Capture what was matched here by putting it inside parentheses, which creates a capturing group.
2. followed by "/" and then the end of string $
Code:
([^/]*)/$
Get the text in group(1)
Harder to read, only if you want to avoid groups
Match exactly the same as before, except now we're telling the regex engine not to consume characters when trying to match (2). This is done with a lookahead: (?= ).
Code:
[^/]*(?=/$)
Get what is returned by the match object.
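Both variants from this answer can be tried side by side. The sketch below assumes Python, where the pattern is an ordinary string and "/" needs no escaping; the variable names are illustrative only.

```python
import re

path = "/1/2/test/"

# "Easy way": capture the run of non-slash characters before the final "/".
with_group = re.search(r'([^/]*)/$', path).group(1)

# Lookahead variant: the final "/" is asserted but not consumed,
# so the whole match is already the wanted text.
with_lookahead = re.search(r'[^/]*(?=/$)', path).group(0)

print(with_group)      # test
print(with_lookahead)  # test
```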
The issue with your code is your opening and closing slashes are part of your capture group.
Demo
text: /1/2/test/
regex: /\/([^\/]*?)(?=\/)/g
captures a list of three: "1", "2", "test"
The language you're using affects the results. For instance, older JavaScript engines do not support lookbehind. However, the above should work as intended. In PHP, a / inside the pattern must be escaped when / is also the delimiter (according to regex101.com), which is why the cleaner [/] wasn't used.
If you're only after the last match (i.e., test), you don't need the positive lookahead:
/\/([^\/]*?)\/$/
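A quick way to check both patterns is a short Python sketch like the one below. Python is an assumption here; since it has no /.../ delimiters, the slashes are written unescaped and re.findall plays the role of the g flag.

```python
import re

path = "/1/2/test/"

# Every segment that is followed by another slash.
all_segments = re.findall(r'/([^/]*?)(?=/)', path)

# Only the segment enclosed by the last two slashes.
last_segment = re.search(r'/([^/]*?)/$', path).group(1)

print(all_segments)  # ['1', '2', 'test']
print(last_segment)  # test
```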

Regex to match one or two quotes but not three in a row

For the life of me I can't figure this one out.
I need to search the following text, matching only the quotes in bold:
Don't match: """This is a python docstring"""
Match: " This is a regular string "
Match: "" ← That is an empty string
How can I do this with a regular expression?
Here's what I've tried:
Doesn't work:
(?!"")"(?<!"")
Close, but doesn't match double quotes.
Doesn't work:
"(?<!""")|(?!"")"(?<!"")|(?!""")"
I naively thought that I could add the alternates that I don't want but the logic ends up reversed. This one matches everything because all quotes match at least one of the alternates.
(Please note: I'm not running the code, so solutions around using __doc__ won't help, I'm just trying to find and replace in my code editor.)
You can use /(?<!")"{1,2}(?!")/
Autopsy:
(?<!") a negative look-behind for the literal ". The match cannot have this character in front
"{1,2} the literal " matched once or twice
(?!") a negative look-ahead for the literal ". The match cannot have this character after
Your first try might've failed because (?!"") is a negative look-ahead, and (?<!"") is a negative look-behind. A look-ahead placed before your match, or a look-behind placed after it, rarely does what you intend.
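The pattern from this answer can be checked against the three sample lines with a short Python sketch. The asker is working in a code editor, so Python is only an assumption used for testing, and the pattern is written without the surrounding / delimiters.

```python
import re

# The three sample lines from the question.
text = (
    'Don\'t match: """This is a python docstring"""\n'
    'Match: " This is a regular string "\n'
    'Match: "" \u2190 That is an empty string'
)

# One or two quotes, with no quote directly before or after them.
quotes = re.findall(r'(?<!")"{1,2}(?!")', text)

print(quotes)  # ['"', '"', '""']
```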
I realized that my original problem description was actually slightly wrong. That is, I actually need to match only a single quote character, unless it's part of a group of 3 quote characters.
The difference is that this is desirable for editing so that I can find and replace with '. If I match "one or two quotes" then I can't automatically replace with a single character.
I came up with this modification to h20000000's answer that satisfies that case:
(?<!"")(?<=(?!""").)"(?!"")
In the demo, you can see that the "" are matched individually, instead of as a group.
This works very similarly to the other answer, except:
it only matches a single "
that leaves us matching everything we want, except that it still matches the middle quote of a """
Finally, adding (?<=(?!""").) excludes that case specifically. It says: look back one character, then fail the match if the three characters starting there are """.
I decided not to change the question because I don't want to hijack the answer, but I think this can be a useful addition.
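To illustrate the find-and-replace use case, here is a minimal sketch that applies the modified pattern with Python's re module. Python is an assumption, since the original replacement was done in a code editor. One caveat of the pattern as written: the positive look-behind needs at least one character before the quote, so a quote at the very start of the text is never matched.

```python
import re

# The three sample lines from the question.
text = (
    'Don\'t match: """This is a python docstring"""\n'
    'Match: " This is a regular string "\n'
    'Match: "" \u2190 That is an empty string'
)

# Matches one quote at a time, skipping all three quotes of a """ run.
single_quote = re.compile(r'(?<!"")(?<=(?!""").)"(?!"")')

# Replace each matched double quote with a single quote.
converted = single_quote.sub("'", text)

print(converted)
```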

Match Sequence using RegEx After a Specified Character

The initial string is [image:salmon-v5-09-14-2011.jpg]
I would like to capture the text "salmon-v5-09-14-2011.jpg" and used GSkinner's RegEx Tool
The closest I can get to my desired output is using this RegEx:
:([\w+.-]+)
The problem is that this sequence includes the colon and the output becomes
:salmon-v5-09-14-2011.jpg
How can I capture the desired output without the colon? Thanks for the help!
Use a look-behind:
(?<=:)[\w+.-]+
A look-behind (coded as (?<=someregex)) is a zero-width assertion: it checks that the text is there, but does not include it in the match.
Also, you may be able to simplify your regex to this:
(?<=:)[^\]]+
which simply grabs anything between (but not including) a : and a ]
If you are always looking at strings in that format, I would use this pattern:
(?<=\[image:)[^\]]+
This looks behind for [image:, then matches until the closing ]
You have the correct regex; it's just that the tool you're using highlights the entire match and not just your capture group. Hover over the match and see what "group 1" actually is.
If you want a slightly more robust regex you could try :([^\]]+) which will allow for any characters other than ] to appear in the file name portion.
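The difference between the full match and the capture group, and how the look-behind avoids it, can be seen in a short Python sketch. Python is an assumption (the question used an online regex tool), and the variable names are invented for the example.

```python
import re

s = "[image:salmon-v5-09-14-2011.jpg]"

# Look-behind: the prefix is asserted, so the match itself excludes it.
via_lookbehind = re.search(r'(?<=\[image:)[^\]]+', s).group(0)

# Capture group: the full match includes ":", group 1 does not.
m = re.search(r':([^\]]+)', s)
full_match, group_one = m.group(0), m.group(1)

print(via_lookbehind)  # salmon-v5-09-14-2011.jpg
print(full_match)      # :salmon-v5-09-14-2011.jpg
print(group_one)       # salmon-v5-09-14-2011.jpg
```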