I've only used regex for very simple things.
Now I've run into a situation where my knowledge isn't sufficient.
I need to extract some information from a lot of lines.
What I need is everything after the first quote and before the # sign, copied to a new file.
0: "mailname#…"
6: "mailname2#yahoo.com"
etc.
I first tried the following:
(?<=")\S\D[^"]+(?=")
But this takes everything between the quotes. It should exclude everything outside the quotes and give me just the mail name, i.e. the part before the # sign.
This is what I have so far to match the part before the mail name, and I'm stuck on removing the # and everything after it.
(\d{0,2})([:])\s(["(.+)"$])
First, make a backup copy of your file, then use this in Notepad++:
Find what: ^.*"(.+)#.+
Replace with: $1
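The same find-and-replace can be tried outside Notepad++. Below is a minimal Python sketch, assuming the lines look like the samples in the question; the sample addresses and the helper name extract_name are made up for illustration. Python writes the group reference as \1 rather than $1.

```python
import re

# Same pattern as the Notepad++ "Find what" field.
PATTERN = re.compile(r'^.*"(.+)#.+')

def extract_name(line: str) -> str:
    # Replace the whole matched line with capture group 1.
    # Lines that do not match are returned unchanged.
    return PATTERN.sub(r'\1', line)

# Hypothetical sample lines in the format shown in the question.
lines = ['0: "mailname#example.com"', '6: "mailname2#yahoo.com"']
names = [extract_name(line) for line in lines]
print(names)
```

Because non-matching lines pass through unchanged, it is worth checking the output for lines that still contain a quote.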
If you want to find and match the parts, you could use
(?<=")[^"\s#]+(?=#[^\s"]+")
The pattern matches:
(?<=") Positive lookbehind to assert " to the left
[^"\s#]+ Match 1+ occurrences of any char except ", a whitespace char or #
(?=#[^\s"]+") Positive lookahead to assert # followed by 1+ occurrences of any char except a whitespace char or ", followed by a " at the right
Regex demo
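As a rough check of the lookaround version, here is a small Python sketch. The sample text is hypothetical; only the pattern comes from the answer. Python's re module accepts this pattern because the lookbehind has a fixed width of one character.

```python
import re

# Lookbehind asserts the opening quote; lookahead asserts "#...", then the closing quote.
NAME = re.compile(r'(?<=")[^"\s#]+(?=#[^\s"]+")')

# Hypothetical input: two lines with a # and one without.
text = '0: "mailname#example.com"\n6: "mailname2#yahoo.com"\n7: "no-hash-here"'

matches = NAME.findall(text)
print(matches)
```

Only the names are returned, since lookarounds assert context without consuming it. The third line is skipped because nothing in it satisfies the lookahead.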
If using a quantifier in the lookbehind is supported, a bit more precise match asserting from the start of the string taking the digit and the colon into account:
(?<=^\d+:\s")[^"\s#]+(?=#[^\s"]+"$)
Regex demo
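Python's built-in re module is one engine that does not allow a quantifier in a lookbehind. A possible workaround, sketched here as an assumption rather than as part of the original answer, is to match the surrounding context normally and capture only the name. The helper name name_from_line is hypothetical.

```python
import re

# The digits, colon and quotes are matched as ordinary context;
# only the part before the # is captured in group 1.
LINE = re.compile(r'^\d+:\s"([^"\s#]+)#[^\s"]+"$')

def name_from_line(line: str):
    m = LINE.match(line)
    return m.group(1) if m else None

print(name_from_line('6: "mailname2#yahoo.com"'))
print(name_from_line('x: "mailname2#yahoo.com"'))
```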
I have the following scenario sending Auth Headers to an application that can range from the following:
"APIAuth 5b6b7ed3b9708d1168455da4:hW1ZeYYLJFGBP8tEHAEGoiGD1xM="
"APIAuth-HMAC-SHA256 5b6b7ed3b9708d1168455da4:hW1ZeYYLJFGBP8tEHAEGoiGD1xM="
etc.
What I'd like to do is to be able to capture APIAuth and APIAuth-HMAC-SHA256 from the header leaving me the client_id:signature like so:
string = '5b6b7ed3b9708d1168455da4:hW1ZeYYLJFGBP8tEHAEGoiGD1xM='
I want to be able to grab this value for any APIAuth-WHATEVER-ENCRYPTION scheme.
I've been playing around with regexes, but the best I had was this: /\ABearer\s+/i. I thought this would work to grab both, because I took \s+ to mean one or more of any single character, so I don't know why it's not working. Could someone please assist? Regexes are not my strong suit. Thank you.
For the example strings, you could match the parts that you want:
\bAPIAuth(?:-\S+)?\s+\K[^\s:"]+:[^\s:"]+
Explanation
\bAPIAuth A word boundary, followed by APIAuth
(?:-\S+)? Optionally match - and 1+ non whitespace chars
\s+\K Match 1+ whitespace chars and forget what is matched so far using \K
[^\s:"]+:[^\s:"]+ Match : surrounded by chars other than a whitespace char or : or " if those are also part of the string
See a rubular regex demo.
You could also match only the first part, and then replace with an empty string.
\bAPIAuth(?:-\S+)?\s+
See another regex demo
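The replace-with-empty-string variant carries over to engines without \K, such as Python's re module. The following is a minimal Python sketch using the two header values from the question; the helper name strip_scheme is hypothetical.

```python
import re

# Scheme name, optional "-SUFFIX", then the whitespace before the credentials.
SCHEME = re.compile(r'\bAPIAuth(?:-\S+)?\s+')

def strip_scheme(header: str) -> str:
    # Remove only the first occurrence, leaving client_id:signature.
    return SCHEME.sub('', header, count=1)

headers = [
    "APIAuth 5b6b7ed3b9708d1168455da4:hW1ZeYYLJFGBP8tEHAEGoiGD1xM=",
    "APIAuth-HMAC-SHA256 5b6b7ed3b9708d1168455da4:hW1ZeYYLJFGBP8tEHAEGoiGD1xM=",
]
credentials = [strip_scheme(h) for h in headers]
print(credentials)
```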
I am looking for regex to match following set:
/VIDEO_PRE_MINE
/VIDEO_PRE
/VIDEO_PRE/
/VIDEO_PRE/SOMETHING
And I want to exclude expressions like these:
/VIDEO_PRESOMETHING
/VIDEO_PREsomething/something
In other words, '_PRE' must not be followed directly by a letter, but it may be followed by the end of the string.
Here are the regexes that I tried:
1. ^\/[^\/]*_PRE[^a-z|A-Z]
2. ^\/[^\/]*_PRE[^a-z|A-Z]?$
However, I didn't manage to cover all the cases from both sets with those regexes.
I would really appreciate any help with this.
Thanks
For your example data, you could add an optional group (?:[_/].*)? to match either a _ or / followed by matching any char except a newline 0+ times until the end of the string $
^/[^/]*_PRE(?:[_/].*)?$
^ Start of string
/[^/]* Match /, then 0+ times any char except /
_PRE Match literally
(?: Non capturing group
[_/].* Match either _ or / followed by 0+ times any char except a newline
)? Close non capturing group and make it optional
$ End of string
Regex demo
Note that the forward slashes are not escaped. Depending on the language or delimiters you might have to escape them.
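To check the pattern against both sets from the question, here is a short Python sketch. In a Python raw string the forward slashes need no escaping. The variable names are only for illustration.

```python
import re

PATTERN = re.compile(r'^/[^/]*_PRE(?:[_/].*)?$')

# The two sets listed in the question.
should_match = ['/VIDEO_PRE_MINE', '/VIDEO_PRE', '/VIDEO_PRE/', '/VIDEO_PRE/SOMETHING']
should_not_match = ['/VIDEO_PRESOMETHING', '/VIDEO_PREsomething/something']

matched = [s for s in should_match + should_not_match if PATTERN.match(s)]
print(matched)
```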
My guess is that we might want to have some right boundaries, such as
^\/VIDEO_PRE(?:\b\/?|\/[^\/\s]+\/?|_[^\/\s]+\/?)$
in this specific form, and in a more general form:
^\/[^_]+_PRE(?:\b\/?|\/[^\/\s]+\/?|_[^\/\s]+\/?)$
which might work. You would likely want to test and modify the expression. It is explained in the top right panel of regex101.com if you wish to explore or simplify it, and the linked demo shows how it matches against some sample inputs.
DEMO
I am looking to capture all characters after the last instance of a string in regex.
The string (that which we're searching after the last instance of) is as follows, sans quotes: " - ", or \b\s\-\s\b: boundary(whitespace character, preceded by -, preceded by whitespace character).
Test string as follows:
One Thing - Two Things - Three Things - Four Things
Desired match:
Four Things
This regex only matches everything after the first instance of the string:
(?<=\b\s\-\s\b)(.*)$
(Returns, sans quotes: "Two Things - Three Things - Four Things")
Whereas this matches everything after the last single character -:
[^\-]+$
(Returns, sans quotes: " Four Things")
Thoughts?
Try using a positive lookbehind then negating on the - delimiter and taking the last result
(?<=- )[^-]+$
https://regex101.com/r/sMX9FC/1
I think you could get your match without using lookarounds.
You could match any char except a newline from the start of the string followed by matching your pattern. That will match the last instance.
Then capture in a group matching 0+ times any char except a newline until the end of the string.
^.*\b\s\-\s\b(.*)$
^ Start of string
.* Match 0+ times any char except a newline
\b\s\-\s\b Match your pattern
(.*) Capture in group 1 matching 0+ times any char except a newline
$ End of string
Regex demo
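Since no language was given, the following is a small Python sketch of the capture-group approach, using the test string from the question. The helper name after_last_separator is hypothetical.

```python
import re

# Greedy .* runs to the end and backtracks, so the separator that ends up
# matched is the last " - " in the string; group 1 holds what follows it.
LAST_PART = re.compile(r'^.*\b\s-\s\b(.*)$')

def after_last_separator(text: str):
    m = LAST_PART.match(text)
    return m.group(1) if m else None

result = after_last_separator('One Thing - Two Things - Three Things - Four Things')
print(result)
```

When the separator is a fixed string, a plain string split from the right is an alternative that avoids regex altogether.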
There is no tool or programming language listed, but if \K is supported to forget what was matched so far, you might also use:
^.*\b\s\-\s\b\K.*$
Regex demo
This matches, at the end of the string, everything that is not a - following the last -.
-\s*([^-]+)$
It's the simplest regex I could think of.
.*(?<=\b\s\-\s\b)(.*)$, or putting a .* before your current regex should achieve what you're after, since that's a greedy match by default.
There are now different requirements to the regex I am looking for, and it is too complex to solve it on my own.
I need to search for a specific string with the following requirements:
1. String starts with "fu: and ends with "
2. In between those start and end requirements there can be any other string, which has the following requirements:
2.1. Less than 50 characters
2.2. Only lower case
2.3. No trailing spaces
2.4. No space between "fu: and the other string.
The result of the regex should be the cases where requirement 1 is met but requirements 2.1/2.2/2.3/2.4 are not.
At the moment I have the following regex: "fu:([^"]*?[A-Z][^"]*?)",
which finds strings which start with "fu: and end with " with any upper case letter in between, like this one:
"fu:this String is wrong cause the s from string is upper case"
I hope it all makes sense. I tried to get into regex, but this problem seems too complex for someone who is not working with regex every day.
[Edit]
Apparently I was not clear enough. I want to have matches which are "wrong".
I am looking for the complement of this regex: "fu:(?:[a-z][a-z ]{0,47}[a-z]|[a-z]{0,2})"
some examples:
Match: "fu: this is a match"
Match: "fu:This is a match"
Match: "fu:this is a match "
NO Match: "fu:this is no match"
Sorry, it's not easy to explain :)
Try the following:
"fu:([a-z](?:[a-z ]{0,48}[a-z])?)"
This will match any string that begins with "fu: and ends with a " and the string between those will contain 1-50 characters - only lower-case and not able to begin with a space nor have trailing spaces.
"fu: # begins with "fu:
( # group to match
[a-z] # starts with one lowercase letter
(?: # non-capturing sub-group
[a-z ]{0,48} # matches 0-48 lowercase letters or spaces
[a-z] # sub-group must end with a lowercase letter
)? # sub-group is optional
)
" # ends with "
EDIT: In the event that you need an empty-string to match too, i.e. the full string is "fu:", you can add another ? to the end of the matching-group in the regex:
"fu:([a-z](?:[a-z ]{0,48}[a-z])?)?"
I've kept the two regexes separated (one that allows 1-50 characters in the string and one that allows 0-50) to show the minor difference.
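To see the difference between the two variants, here is a minimal Python sketch. The sample strings are invented for illustration; only the two patterns come from the answer.

```python
import re

VALID_1_50 = re.compile(r'"fu:([a-z](?:[a-z ]{0,48}[a-z])?)"')
VALID_0_50 = re.compile(r'"fu:([a-z](?:[a-z ]{0,48}[a-z])?)?"')

# Hypothetical samples: one well-formed, one empty, several violations.
samples = [
    '"fu:lower case only"',
    '"fu:"',
    '"fu: leading space"',
    '"fu:Upper case"',
    '"fu:trailing space "',
    '"fu:' + 'a' * 51 + '"',
]

accepted_1_50 = [s for s in samples if VALID_1_50.fullmatch(s)]
accepted_0_50 = [s for s in samples if VALID_0_50.fullmatch(s)]
print(accepted_1_50)
print(accepted_0_50)
```

The only sample on which the two patterns disagree is the empty "fu:" string.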
EDIT #2: To match the inverse of the above, i.e. - to find all strings that do not match the required format, you can use:
^((?!"fu:([a-z](?:[a-z ]{0,48}[a-z])?)?").)*$
This will explicitly match any line that does not match that pattern. This will consequently also match lines that do not contain "fu: - if that matters.
The only way I can figure out to truly match the opposite of the above and still include the anchors of "fu: and " is to explicitly attempt to match the rules that fail:
"fu:([^a-z].*|[^"]{51,}|[a-z]([^"]*?[A-Z][^"]*?)+|[a-z ]{0,49}[ ])"
This regex will match anything that starts with not a lowercase a-z character, any string that's longer than 50 characters, any string that contains an uppercase letter, or any string that has trailing whitespace. For each additional rule, you'll need to update the regex to match the opposite of what's needed.
My recommendation is, in whatever language you're using, to match all input strings that actually follow your requirements - and if there are no matches then that string must violate your rules.
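That recommendation could look roughly like the following Python sketch. It is an illustration under assumptions: candidates are found with a loose pattern ("fu: up to the next quote), the 0-50 character variant of the valid pattern is used, and the function name find_violations is made up. The input reuses the four examples from the question.

```python
import re

# Loose pattern: anything that starts with "fu: and runs to the next quote.
CANDIDATE = re.compile(r'"fu:[^"]*"')
# Strict pattern: what a well-formed string looks like (0-50 characters).
VALID = re.compile(r'"fu:(?:[a-z](?:[a-z ]{0,48}[a-z])?)?"')

def find_violations(text: str) -> list:
    # A candidate that is not fully matched by the strict pattern breaks a rule.
    return [c for c in CANDIDATE.findall(text) if not VALID.fullmatch(c)]

text = '\n'.join([
    '"fu: this is a match"',
    '"fu:This is a match"',
    '"fu:this is a match "',
    '"fu:this is no match"',
])
violations = find_violations(text)
print(violations)
```

A new rule then only requires changing the strict pattern, rather than adding another alternative to a negated regex.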
"fu:([^A-Z" ](?:[^A-Z"]{0,48}[^A-Z" ])?)"
The above regex should match the specified requirements.
That's probably what you need
"fu:([a-z](?:[a-z ]{,48}[a-z])?)"
Try this:
"fu:(?:[a-z][a-z ]{0,47}[a-z]|[a-z]?)"
I am working in VB.Net and trying to use Regex.Replace to format a string I use to query SQL. What I'm going for is to cut out "--" comments. I've found that in most cases the code below works for what I need.
string = Regex.Replace(command, "--.*\n", "")
and
string = Regex.Replace(command, "--.*$", "")
However, I have run into a problem. If I have a string inside my query that contains the double dash, it doesn't work: the replace just cuts out the whole line starting at the double dash. It makes sense to me why, but I can't figure out the regular expression I need to match on.
Logically, I need to match on a string that starts with "--" and is not preceded by "'" and not followed by "'", with any number of characters in between. But I'm not sure how to express that in a regular expression. I have tried variations of:
string = Regex.Replace(cmd, "[^('.*)]--.*\n[^(.*')]", "")
Which I know is obviously wrong. I have looked at a couple of online resources including http://www.codeproject.com/KB/dotnet/regextutorial.aspx
but due to my lack of understanding I can't seem to figure this one out.
I think you meant "match on a string that starts with -- and is not preceded by ' and not followed by ' with any number of characters in between".
If so, then this is what you are looking for:
string = Regex.Replace(cmd, "(?<!'.*?--)--(?!.*?').*(?=\r\n)", "")
'EDIT: modified a little
Of course, it means you can't have apostrophes in your comments... and would be exceedingly easy to hack if someone wanted to (you aren't thinking of using this to protect against injection attacks, are you? ARE YOU!??! :D )
I can break down the expression if you'd like, but it's essentially the same as my modified quote above!
EDIT:
I modified the expression a little, so it does not consume any carriage return, only the comment itself... the expression says:
(?<! # negative lookbehind assertion*
' # match a literal single quote
.*? # followed by anything (reluctantly*)
-- # two literal dashes
) # end assertion
-- # match two literal dashes
(?! # negative lookahead assertion
.*? # match anything (reluctant)
' # followed by a literal single quote
) # end assertion
.* # match anything
(?= # positive lookahead assertion
\r\n # match carriage-return, line-feed
) # end assertion
negative lookbehind assertion means at this point in the match, look backward here and assert that this cannot be matched
negative lookahead assertion means look forward from this point and assert this cannot be matched
positive lookahead asserts the following expression CAN be matched
reluctant means only consume a match for the previous atom (the . which matches any character in this case) if you cannot match the expression that follows. Thus the .*? in .*?-- (when applied against the string abc--) will first consume nothing and check whether the -- can be matched, and fail; it will then consume a and check again, and fail; then ab, and fail; once it has consumed abc, the -- can be matched (success), so the overall match is the entire abc--
non-reluctant or "greedy", which would be .* without the ?, will first match all of abc-- with the .*, then try to match the -- at the end of the string and fail; it will then backtrack until it can match the --
one additional note is that the . "anything" does not by default include newlines (carriage-return/line-feed), which is needed for this to work properly (there is a switch that will allow . to match newlines and it will break this expression)
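The reluctant versus greedy behaviour and the newline note can be observed directly. Here is a small Python sketch (Python rather than VB.Net, with made-up sample strings); in Python the switch that lets . match newlines is re.DOTALL.

```python
import re

text = 'a--b--c'

# Reluctant: stops at the first place where -- can follow.
reluctant = re.match(r'.*?--', text).group(0)
# Greedy: takes everything, then backtracks to the last --.
greedy = re.match(r'.*--', text).group(0)

# By default . stops at a newline; with re.DOTALL it does not.
first_line_only = re.match(r'.*', 'one\ntwo').group(0)
with_dotall = re.match(r'.*', 'one\ntwo', re.DOTALL).group(0)

print(reluctant, greedy, first_line_only, repr(with_dotall))
```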
A good resource - where I've learned 90% of what I know about regex - is Regular-Expressions.info
Tread carefully and good luck!
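For engines that do not allow unbounded lookbehind (Python's re module, for example), a different technique can be used. This is not the approach of the answer above. The sketch matches either a whole single-quoted string literal or a -- comment, and removes only the comments. It assumes '' is the only quote escape and does not handle /* */ block comments or unterminated literals. The function name strip_sql_comments is hypothetical.

```python
import re

# Alternative 1 (captured): a single-quoted literal, where '' is an escaped quote.
# Alternative 2: a -- comment running to the end of the line.
TOKEN = re.compile(r"('(?:[^']|'')*')|--[^\r\n]*")

def strip_sql_comments(sql: str) -> str:
    # Literals are put back unchanged; comments are replaced by nothing.
    return TOKEN.sub(lambda m: m.group(1) or '', sql)

query = "SELECT '--not a comment' AS x -- real comment\nFROM t"
print(strip_sql_comments(query))
```

Because a literal is consumed as one token, a -- inside it is never tried as the start of a comment. As with the answer above, this is a formatting aid and not a defence against injection.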
OK what you are doing here is not right :
/[^('.*)]--.*\n[^(.*')]/
You are saying the following :
Match one character that is not (, ), ', . or *, then match --, then match anything up to a newline, and then match one character that is again not in the same character class as the one at the start.
What you probably meant to do is this :
/(?<!['"])\s*--.*[\r\n]*/
Which says: make sure the match is not preceded by a ' or ", then match any whitespace, then --, and then anything else up to the end of the line, including any trailing carriage return or line feed characters.