Regex: get string between two characters

I am trying to extract a string between the two characters # and : for the string:
test23#server:/var/
So, when I try to do something like,
#([^.]*):
or even
\#(\S+):
I get #server:
I just want both # and : removed so I can get just the word server. Please help!

You only need to reference the capturing group you just constructed, or you can use the \K token. You don't need to escape the # character, and [^.]* greedily matches everything except a literal dot, so it is better changed to [^:]+ (one or more characters other than a colon):
#\K[^:]+
or more strictly:
#\K[^:]++(?=:)

Try (?<=#).*(?=:), which uses a positive lookbehind and a positive lookahead.
It matches everything after a # and before a :, excluding both characters. Because .* is greedy, the match runs up to the last : on the line.
For details, see https://www.regular-expressions.info/lookaround.html
See the demo at https://regex101.com/r/jIIB9Q/1
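
As a rough illustration, here is how the capturing-group and lookaround approaches might look in Python. This is only a sketch: Python's built-in re module is an assumption (the question does not name an engine), and it does not support \K, so that variant is left out. The variable names are made up for the example.

```python
import re

text = "test23#server:/var/"

# Capturing group: the parentheses isolate the part between '#' and ':'.
group_match = re.search(r"#([^:]+):", text)
between_group = group_match.group(1) if group_match else None

# Lookarounds: '#' and ':' are asserted but never become part of the match.
look_match = re.search(r"(?<=#)[^:]+(?=:)", text)
between_lookaround = look_match.group(0) if look_match else None

print(between_group, between_lookaround)
```

Both variants read group 1 or the whole match respectively, which is why the original attempts appeared to return #server: when only the whole match was inspected.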

Related

Regex to capture ApiAuth Headers

I have the following scenario sending Auth Headers to an application that can range from the following:
"APIAuth 5b6b7ed3b9708d1168455da4:hW1ZeYYLJFGBP8tEHAEGoiGD1xM="
"APIAuth-HMAC-SHA256 5b6b7ed3b9708d1168455da4:hW1ZeYYLJFGBP8tEHAEGoiGD1xM="
etc.
What I'd like to do is to be able to capture APIAuth and APIAuth-HMAC-SHA256 from the header leaving me the client_id:signature like so:
string = '5b6b7ed3b9708d1168455da4:hW1ZeYYLJFGBP8tEHAEGoiGD1xM='
I want to be able to grab this value from any APIAuth-WHATEVER-ENCRYPTION header.
I've been playing around with regexes, but the best I have is /\ABearer\s+/i. I thought this would work for both because \s+ is more than one of any single character, so I don't know why it's not working. Could someone please assist? Regexes are not my strong suit. Thank you.

For the example strings, you could match the parts that you want:
\bAPIAuth(?:-\S+)?\s+\K[^\s:"]+:[^\s:"]+
Explanation
\bAPIAuth A word boundary, followed by APIAuth
(?:-\S+)? Optionally match - and 1+ non whitespace chars
\s+\K Match 1+ whitespace chars and forget what is matched so far using \K
[^\s:"]+:[^\s:"]+ Match : surrounded by chars other than a whitespace char or : or " if those are also part of the string
You could also match only the first part, and then replace with an empty string.
\bAPIAuth(?:-\S+)?\s+
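
A minimal sketch of the replace-with-empty-string idea, written in Python purely for illustration (the question appears to target a different engine). The helper name strip_scheme is hypothetical, and the header values are the ones from the question.

```python
import re

# Scheme prefix: "APIAuth", an optional "-SOMETHING" suffix, then whitespace.
SCHEME_PREFIX = re.compile(r"\bAPIAuth(?:-\S+)?\s+")

def strip_scheme(header):
    """Return the header with the leading scheme removed (hypothetical helper)."""
    return SCHEME_PREFIX.sub("", header, count=1)

plain = strip_scheme("APIAuth 5b6b7ed3b9708d1168455da4:hW1ZeYYLJFGBP8tEHAEGoiGD1xM=")
hmac = strip_scheme("APIAuth-HMAC-SHA256 5b6b7ed3b9708d1168455da4:hW1ZeYYLJFGBP8tEHAEGoiGD1xM=")

print(plain)
print(hmac)
```

Removing the prefix avoids \K altogether, which helps in engines that lack it.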

Match a part of a string using regex

I have a string and would like to match a part of it.
The string is Accept: multipart/mixedPrivacy: nonePAI: <sip:4168755400#1.1.1.238>From: <sip:4168755400#1.1.1.238>;tag=5430960946837208_c1b08.2.3.1602135087396.0_1237422_3895152To: <sip:4168755400#1.1.1.238>
I want to match PAI: <sip:4168755400#
The whitespace can be a word, so I would like to use .*, but if I use that it matches most of the string.
This pattern shows what I match when I use a literal space instead of .*:
(PAI: <sip:)((?:\([2-9]\d{2}\)\ ?|[2-9]\d{2}(?:\-?|\ ?))[2-9]\d{2}[- ]?\d{4})#
This pattern shows what I'm trying to achieve with .*, but it should only match PAI: <sip:4168755400#
(PAI:.*<sip:)((?:\([2-9]\d{2}\)\ ?|[2-9]\d{2}(?:\-?|\ ?))[2-9]\d{2}[- ]?\d{4})#
I tried lookarounds, but that failed.
Any idea?
thanks

To allow a word in place of the single space, use a character class that matches either a space or a word character, and repeat it one or more times so that at least one character is required.
Note that you don't have to escape the spaces, and in both places you can use an optional character class matching either a space or a hyphen: [ -]?
If you only want the match, you can omit the two capturing groups.
(PAI:[ \w]+<sip:)((?:\([2-9]\d{2}\) ?|[2-9]\d{2}[ -]?)[2-9]\d{2}[- ]?\d{4})#

The regex should look like this:
PAI:.*?(<sip:.*?#)
Explanation:
PAI:.*? finds the literal PAI: followed by anything (.*); the ? makes the quantifier lazy, so it matches as few characters as possible before the next part of the pattern.
(<sip:.*?#) is the capturing group that holds the result we want.
<sip:.*?# finds <sip: followed by as few characters as possible up to the first #.
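
To compare the two suggestions side by side, here is a small sketch in Python (an assumption; the question does not say which engine is used). It runs the character-class pattern and the lazy pattern against the string from the question; the variable names are invented for the example.

```python
import re

header = (
    "Accept: multipart/mixedPrivacy: nonePAI: <sip:4168755400#1.1.1.238>"
    "From: <sip:4168755400#1.1.1.238>;"
    "tag=5430960946837208_c1b08.2.3.1602135087396.0_1237422_3895152"
    "To: <sip:4168755400#1.1.1.238>"
)

# Character-class version: [ \w]+ allows spaces or word characters after "PAI:".
strict = re.compile(
    r"(PAI:[ \w]+<sip:)((?:\([2-9]\d{2}\) ?|[2-9]\d{2}[ -]?)[2-9]\d{2}[- ]?\d{4})#"
)
# Lazy version: .*? stops at the first "<sip:" and then at the first "#".
lazy = re.compile(r"PAI:.*?(<sip:.*?#)")

strict_match = strict.search(header)
lazy_match = lazy.search(header)

print(strict_match.group(0))
print(lazy_match.group(0))
```

The character-class version also validates the shape of the number, while the lazy version accepts anything between <sip: and #.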

Regex excluding catches that ending with a dot

First of all, I don't need full e-mail address validation, my given task doesn't require it. I just want to upgrade my current regex code so that it won't match addresses ending with a dot.
My current code: [0-9A-Za-z.]+[#][0-9A-Za-z.]+
It catches both "user#example.com" and "user#example.com."
I'd like it to catch only the address that doesn't end with a dot: user#example.com
Example string:
dasd.fas#fsaf.dfas.dsa, zghs#gas.gsq, adg32.dsa12#cas, ksak#c.csa., gs32.basaa#scaa.upc.
From the example, I'd like to catch only the addresses that don't end with a dot.
Edit: I have only one line with multiple e-mail addresses separated with a , and a space after them.

You might add [0-9A-Za-z] after your pattern so that the match ends with a character from your class other than the dot, followed by a positive lookahead (?=, |$) that asserts that what follows is either a comma and a space or the end of the string.
[0-9A-Za-z.]+#[0-9A-Za-z.]+[0-9A-Za-z](?=, |$)
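
A quick way to check this pattern against the example string from the question, sketched in Python (an assumed engine; the pattern uses no engine-specific features). The variable names are illustrative.

```python
import re

line = "dasd.fas#fsaf.dfas.dsa, zghs#gas.gsq, adg32.dsa12#cas, ksak#c.csa., gs32.basaa#scaa.upc."

# The final [0-9A-Za-z] forbids a trailing dot; the lookahead requires
# ", " or the end of the string right after the address.
pattern = re.compile(r"[0-9A-Za-z.]+#[0-9A-Za-z.]+[0-9A-Za-z](?=, |$)")

addresses = pattern.findall(line)
print(addresses)
```

The two entries that end with a dot are skipped entirely rather than matched without the dot, because the lookahead rejects any shorter match that is followed by more address characters.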

([0-9A-Za-z.]+#(?:\.?[0-9A-Za-z]+)+)(?=,|$)

Just modify your pattern slightly:
[0-9A-Za-z.]+[#](?:[a-zA-Z]|\.(?=[a-zA-Z]))+
After the #, it uses alternation to match one or more of either a letter or a dot that is followed by another letter, thanks to the positive lookahead \.(?=[a-zA-Z]).

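Try this one:
Require a letter or digit as the last character of the address, then match either a comma or the end of the string in a group. Note that the trailing comma becomes part of the match.
[0-9A-Za-z.]+[#][0-9A-Za-z.]+[0-9A-Za-z](?:(,|$))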

How can I match all instances of the first letter?

For example, for this string I want to match all A and a:
"All the apples make good cake."
Here's what I did: /(.)[^.]*\1*/ig
I started by getting the first character in the group, which can be any character: (.) Then I added [^.]* because I don't want to match any other character that isn't the first one. Finally I added \1* because I wanted to match the first character again. All other similar variations that I've tried don't seem to work.

The regex you are trying to build would capture the very first character, then match as many characters as possible up to the next occurrence of the same character, using a negative lookahead (a tempered dot):
(?i)(\w)(?:(?!\1).)*
Capturing group 1 holds the character you need.
If your regex engine supports the \K match-reset token, you can append it to the regex above to match only the desired part:
(?i)(\w)(?:(?!\1).)*\K
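
Here is one way the tempered-dot pattern could be exercised, sketched in Python as an assumption (the question uses JavaScript-style /.../ig syntax). Python's re module has no \K, so the sketch reads capturing group 1 from each match instead.

```python
import re

sentence = "All the apples make good cake."

# (\w) captures a character; (?:(?!\1).)* then consumes characters until the
# same character (case-insensitively) is about to appear again.
pattern = re.compile(r"(?i)(\w)(?:(?!\1).)*")

# Each successive match starts at the next occurrence of the first letter,
# so group 1 of every match is one instance of that letter.
letters = [m.group(1) for m in pattern.finditer(sentence)]
print(letters)
```

This works because every match stops right before the next A or a, so the following match begins exactly there.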

regex to match word (url) only if it does not contain character

I'm using an API that sometimes truncates links inside the text that it returns and instead of "longtexthere https://fancy.link" I get "longtexthere https://fa…".
I'm trying to get to match the link only if it's complete, or in other words does not contain "…" character.
So far I am able to get links by using the following regex:
((?:https?:)?\/\/\S+\/?)
but obviously it returns every link including broken ones.
I've tried to do something like this:
((?:https?:)?\/\/(?:(?!…)\S)+\/?)
Although that started to ignore the "…" character it was still returning the link but just without including the character, so with the case of "https://fa…" it returned "https://fa" whereas I simply want it to ignore that broken link and move on.
Been fighting this for hours and just can't get my head around it. :(
Thanks for any help in advance.

You can use
(?:https?:)?\/\/[^\s…]++(?!…)\/?
The possessive quantifier in [^\s…]++ matches all characters that are neither whitespace nor … without later backtracking, and the lookahead then checks that the next character is not …. If it is, no match is found.
As an alternative, if your regex engine does not support possessive quantifiers, use a negative lookahead version:
(?!\S+…)(?:https?:)?\/\/\S+\/?
The lookahead (?!\S+…) fails the match if one or more non-whitespace characters are followed by ….
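
The negative lookahead version can be tried with a short sketch like the following. Python is an assumption here, and the lookahead form is used because Python's re module only gained possessive quantifiers in version 3.11. The sample text combines the two strings from the question.

```python
import re

text = "longtexthere https://fancy.link and longtexthere https://fa…"

# (?!\S+…) rejects any start position whose run of non-whitespace characters
# contains the ellipsis character, so truncated links are skipped entirely.
pattern = re.compile(r"(?!\S+…)(?:https?:)?//\S+/?")

complete_links = pattern.findall(text)
print(complete_links)
```

Because the lookahead is evaluated at every candidate start position, the engine cannot fall back to a shortened match such as https://fa.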

You can try following regex
https?:\/\/\w+(?:\.\w+\/?)+(?!\.{3})(\s|$)
See demo https://regex101.com/r/bS6tT5/3

Try:
((?:https?:)?\/\/\S+[^ \.]{3}\/?)
It's the same as your original pattern; you just tell it that the last three characters must not be '.' (period) or ' ' (space).
UPDATE: Your second link worked.
If you tweak your regex just slightly, it will do what you want:
((?:https?:)?\/\/\S+[^ …] \/?)
It looks just like what you had, except that I added a ' ' (space) after the part we do not want. This forces the regular expression to match up to and including the space, which it cannot do for a URL that contains the '…' character. Without the space at the end, it would match up to but not including the '…', which is why it was not doing what we wanted ;)

Please try:
https?:\/\/[^ ]*?…|(https?:\/\/[^ ]+\.[^ ]+)
