How to create a regex function to select and extract a query? - regex

I'm trying to extract part of a URL from a string. I tried writing my own regex, but it doesn't fully meet my needs.
What I need is:
Given www.website.com/8056432988456?id=5, I need 8056432988456, i.e. the number preceding the ?, with or without a / between the number and the ?.
This is the regex I made for it : (?<=\/)(.*?)(?=\?)|(?<=\?)
Can someone help me out?

You can use
(?<=\/)\d+(?=(?:\/?\?.*)?$)
See the regex demo.
Details:
(?<=\/) - there must be a / immediately on the left
\d+ - one or more digits
(?=(?:\/?\?.*)?$) - a positive lookahead: immediately on the right, there must be:
(?:\/?\?.*)? - an optional occurrence of an optional /, then ? and then any zero or more chars other than line break chars as many as possible
$ - end of string.
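As a rough illustration, the pattern above can be tried in Python's built-in re module. This is only a sketch: the helper name extract_id and the extra test URLs are made up for the example, and the / is left unescaped because Python does not need it escaped.

```python
import re

# Pattern from the answer above: digits preceded by "/" and followed by
# an optional "/", then "?..." (or nothing), up to the end of the string.
ID_PATTERN = re.compile(r"(?<=/)\d+(?=(?:/?\?.*)?$)")

def extract_id(url):
    """Return the numeric segment before the query string, or None (hypothetical helper)."""
    match = ID_PATTERN.search(url)
    return match.group(0) if match else None

print(extract_id("www.website.com/8056432988456?id=5"))   # 8056432988456
print(extract_id("www.website.com/8056432988456/?id=5"))  # 8056432988456
print(extract_id("www.website.com/8056432988456"))        # 8056432988456
print(extract_id("www.website.com/about?id=5"))           # None
```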

Related

Regex to match multiple cases

I have the following examples that must match with my regex
1,[]
1,[0,0,0,[]]
1,[0,0,0,0,0,[]]
1,1
1
I came up with a simple way of matching the middle ones with .?,\[.*\[\]\] but it doesn't match the first and the last one.
Maybe this is too much to handle with regex but I want to check the following things:
If there is a ',' it should have a following character or characters(numbers or letters)
If a bracket is opened: it should close '[]'
The bracket insides can be whatever but it must respect rule 1 and 2.
I am trying to find a solution so I'm grateful if you can help me. Thank you.
You can use
^\d+(?:,(?:(\[(?:[^][]++|\g<1>)*])|\d+))?$
See the regex demo. Details:
^ - start of string
\d+ - one or more digits
(?:,(?:(\[(?:[^][]++|\g<1>)*])|\d+))? - an optional sequence of
, - a comma
(?:(\[(?:[^][]++|\g<1>)*])|\d+) - one of the alternatives:
(\[(?:[^][]++|\g<1>)*]) - Group 1: [, then zero or more occurrences of one or more chars other than [ and ] or Group 1 pattern recursed
| - or
\d+ - one or more digits
$ - end of string.
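The pattern above relies on recursion (\g<1>), which PCRE-style engines support but Python's built-in re module does not. As a substitute technique, the same checks can be sketched in Python with a small hand-written depth counter. The function names is_balanced_group and is_valid are hypothetical, and like the regex, this sketch only checks bracket nesting, not the contents inside the brackets.

```python
import re

def is_balanced_group(text):
    """True if text is a single '[...]' group whose brackets nest properly."""
    if not text.startswith("["):
        return False
    depth = 0
    for index, char in enumerate(text):
        if char == "[":
            depth += 1
        elif char == "]":
            depth -= 1
            if depth == 0:
                # The outermost group must close exactly at the end of the text.
                return index == len(text) - 1
    return False  # ran out of characters with a bracket still open

def is_valid(line):
    """Digits, optionally followed by a comma and either digits or a bracket group."""
    match = re.fullmatch(r"(\d+)(?:,(.+))?", line)
    if match is None:
        return False
    tail = match.group(2)
    if tail is None:
        return True  # just a number, e.g. "1"
    return re.fullmatch(r"\d+", tail) is not None or is_balanced_group(tail)

for sample in ["1,[]", "1,[0,0,0,[]]", "1,[0,0,0,0,0,[]]", "1,1", "1", "1,[", "1,"]:
    print(sample, is_valid(sample))
```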

Regex With Conditional - Not Desired Output

Was actually glossing over a question and found myself struggling to perform something really simple.
If a string contains % I want to use a particular regex, else I want to use a different one.
I tried the following: https://regex101.com/r/UvFZpo/1/
Regex: (%)(?(1)[^$]+|[^%]+).
Test string: abc%
But I'm not getting the expected results.
I was expecting to see abc% matched as it contains %.
If the string was, abc$, I'd expect it to use the second expression.
Where am I going wrong?
Regex parses strings from left to right, position by position.
Once your pattern matches %, the regex index is at the end of the string, so the match fails: there are no more chars left for the subsequent [^$]+ pattern to match.
You can use a mere alternation here:
^(?:([^$]*%[^$]*)|([^%]+))$
See the regex demo
If the string contains %, the Group 1 will be populated, else, Group 2 will.
Details
^ - start of string
(?:([^$]*%[^$]*)|([^%]+)) - either of the two alternatives:
([^$]*%[^$]*) - Group 1: any 0+ chars other than $, as many as possible, then %, then any 0+ chars other than $, as many as possible
| - or
([^%]+) - Group 2: any 1+ chars other than %, as many as possible
$ - end of string.
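A quick way to see which branch fires is to check which group is populated. The sketch below uses Python's re module, which also accepts the conditional syntax from the question; the function name which_branch is invented for the example.

```python
import re

# The conditional pattern from the question: "%" must match first, and then
# [^$]+ needs at least one more character after it.
CONDITIONAL = re.compile(r"(%)(?(1)[^$]+|[^%]+)")

# The alternation suggested in the answer.
ALTERNATION = re.compile(r"^(?:([^$]*%[^$]*)|([^%]+))$")

def which_branch(text):
    """Return 1 or 2 for the group that matched, or None if nothing matched."""
    match = ALTERNATION.match(text)
    if match is None:
        return None
    return 1 if match.group(1) is not None else 2

print(CONDITIONAL.search("abc%"))  # None: nothing is left after "%"
print(which_branch("abc%"))        # 1
print(which_branch("abc$"))        # 2
print(which_branch("a%$"))         # None: contains both "%" and "$"
```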

Regexp Substring From URL

I need to retrieve some words from a URL:
WebViewActivity - https://google.com/search/?term=iphone_5s&utm_source=google&utm_campaign=search_bar&utm_content=search_submit
The result I want:
search/iphone_5s
but I'm stuck and don't really understand how to use regexp_substr to get that data.
I'm trying to use this query
regexp_substr(web_url, '\google.com/([^}]+)\/', 1,1,null,1)
which only returns the word 'search', and when I try
regexp_substr(web_url, '\google.com/([^}]+)\&', 1,1,null,1)
it turns out I get everything up to the last '&'
You may use a REGEXP_REPLACE to match the whole string but capture two substrings and replace with two backreferences to the capture group values:
REGEXP_REPLACE(
'WebViewActivity - https://google.com/search/?term=iphone_5s&utm_source=google&utm_campaign=search_bar&utm_content=search_submit',
'.*//google\.com/([^/]+/).*[?&]term=([^&]+).*',
'\1\2')
See the regex demo and the online Oracle demo.
Pattern details
.* - any zero or more chars other than line break chars as many as possible
//google\.com/ - a //google.com/ substring
([^/]+/) - Capturing group 1: one or more chars other than / and then a /
.* - any zero or more chars other than line break chars as many as possible
[?&]term= - ? or & and a term= substring
([^&]+) - Capturing group 2: one or more chars other than &
.* - any zero or more chars other than line break chars as many as possible
NOTE: To use this approach and get an empty result if the match is not found, append |.+ at the end of the regex pattern.
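The same capture-and-replace idea can be tried outside the database. The sketch below is a Python analogue using re.sub, not Oracle code; the function name path_and_term is hypothetical, and it assumes Python 3.5 or later, where a backreference to a group that did not participate in the match is replaced with an empty string.

```python
import re

# Same pattern as in the REGEXP_REPLACE call, plus the "|.+" fallback from the NOTE.
URL_PATTERN = r".*//google\.com/([^/]+/).*[?&]term=([^&]+).*" + r"|.+"

def path_and_term(text):
    """Return '<first path segment>/<term value>', or '' when the URL does not match."""
    # When only the ".+" fallback matches, groups 1 and 2 are unset,
    # so the replacement collapses to an empty string.
    return re.sub(URL_PATTERN, r"\1\2", text)

sample = ("WebViewActivity - https://google.com/search/?term=iphone_5s"
          "&utm_source=google&utm_campaign=search_bar&utm_content=search_submit")
print(path_and_term(sample))             # search/iphone_5s
print(repr(path_and_term("no url here")))  # ''
```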

Regex pattern: Validating a single occurrence

I have implemented the following Regex pattern
^[\d,|+\d,]+$
It validates the following pattern
14,+96,4,++67
I need to invalidate ++67 from my pattern and I need to keep values with only a single leading + sign.
How should I change my Regex pattern?
You may use
^\+?\d+(?:,\+?\d+)*$
See the regex demo.
Details
^ - start of string
\+? - an optional + char
\d+ - 1+ digits
(?:,\+?\d+)* - zero or more repetitions of a sequence of patterns:
, - a comma
\+? - an optional plus
\d+ - 1+ digits
$ - end of string
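For a quick check of this pattern, here is a minimal Python sketch; the function name is_valid_list is made up for the example.

```python
import re

# Comma-separated numbers, each with at most one leading "+".
LIST_PATTERN = re.compile(r"^\+?\d+(?:,\+?\d+)*$")

def is_valid_list(text):
    """True when every comma-separated item is digits with an optional single '+'."""
    return LIST_PATTERN.match(text) is not None

print(is_valid_list("14,+96,4,+67"))   # True
print(is_valid_list("14,+96,4,++67"))  # False: "++67" has two plus signs
print(is_valid_list("14,"))            # False: trailing comma
```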
Perhaps you meant to do this?
^(\d,|\+\d,)+$
Square brackets define a character class, which matches any single one of the characters or character classes listed inside; that does not appear to be what you really want. For alternation you need round brackets (a group).
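The difference is easy to see by running both patterns. This Python sketch is only an illustration, and the test strings other than the one from the question are invented.

```python
import re

# Original pattern: one character class, so any mix of digits, ",", "|" and "+" passes.
CLASS_PATTERN = re.compile(r"^[\d,|+\d,]+$")

# Grouped pattern from this answer: repeated "digit," or "+digit," items.
GROUP_PATTERN = re.compile(r"^(\d,|\+\d,)+$")

print(bool(CLASS_PATTERN.match("14,+96,4,++67")))  # True
print(bool(CLASS_PATTERN.match("+|,+")))           # True: the class allows any order
print(bool(GROUP_PATTERN.match("1,+2,")))          # True
print(bool(GROUP_PATTERN.match("1,++2,")))         # False
```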
You can try this one
^(\d+\,?|\+\d+,?)+$

How to get the first match in regexp?

I have three strings as list below:
Levofloxacin 500mg/100mL
Levofloxacin 500mg
Procaterol Hydrochloride …………… 25μg
For the first line, I want to get just 'mg', without 'mL', in my result.
For the second line, I want to get 'mg'.
For the third line, I want to get 'μg'.
I have tried a regexp pattern like:
(?!(.*[ ]{1}[0-9]+))[a-zA-Zμ]+
However, the first line always returns 'mg' with 'mL'...
How could I just acquire 'mg' with regexp?
Any suggestions will be appreciated.
As mentioned in the comment section, try this regex:
^\D*[\d.]+\K[a-zμ]+
Click for Demo
Explanation:
^ - asserts the start of the string
\D* - matches 0+ occurrences of any character that is not a digit
[\d.]+ - matches 1+ occurrences of any character that is a digit or a dot
\K - resets the start of the reported match, so everything matched so far is dropped from the result
[a-zμ]+ - this is what you want. This will contain the units like mg, ml appearing after the first number. If there are any other special characters like μ, you can add them too in this character list
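Not every engine supports \K; Python's built-in re module, for one, does not. A common substitute is a capturing group around the unit, sketched below. The function name first_unit is hypothetical, and the character class lists both the micro sign (U+00B5) and the Greek letter mu (U+03BC) on the assumption that either may appear in the data.

```python
import re

# Same idea as the answer's pattern, with a capturing group in place of \K.
# \u00b5 is the micro sign and \u03bc is the Greek small letter mu.
UNIT_PATTERN = re.compile(r"^\D*[\d.]+([a-z\u00b5\u03bc]+)")

def first_unit(text):
    """Return the lowercase unit that follows the first number, or None."""
    match = UNIT_PATTERN.match(text)
    return match.group(1) if match else None

print(first_unit("Levofloxacin 500mg/100mL"))  # mg
print(first_unit("Levofloxacin 500mg"))        # mg
print(first_unit("Procaterol Hydrochloride \u2026\u2026 25\u03bcg"))
```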