Regex needed for optional last digit - regex

I am trying to figure out a regex for a version parser.
I need to parse a major.minor.patch.build version string with three or four numeric components, where the last (fourth) component is optional.
For example the version could be:
1.2.3.4
or
1.2.3
I have my regex as the following, but it fails for 1.2.3 version string:
regex = "(\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)?"
Also, do I need the double backslashes?

The following should do what you want:
(\d+)\.(\d+)\.(\d+)(\.(\d+))?
\d matches any single digit, equivalent to [0-9]
\. matches a literal . character (an unescaped . in a regex matches any single character)
You can prepend '^' and append '$' to the regex to ensure there's no garbage before or after your version.

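In your regex the last dot is mandatory and only the digits after it are optional, so it matches 1.2.3.4 and also 1.2.3. (with a trailing dot), but not 1.2.3. You have to make the whole last part \.(\d+), including the dot, optional.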
Try it like this with an optional last group where the dot and the digits are optional:
^\d+\.\d+\.\d+(?:\.\d+)?$
Or with capturing groups, where the optional last part is a non-capturing group that contains the dot and a capturing group for the last digits:
^(\d+)\.(\d+)\.(\d+)(?:\.(\d+))?$
Instead of using the anchors ^ and $ you could use a word boundary \b.
There is no programming language specified, so whether the backslashes need doubling depends on the language: where the regex is written inside an ordinary string literal, each backslash usually has to be escaped. If you open the regex101 demo link, there is a code generator under Tools where you can select a programming language, which might help.
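As a rough illustration of both points in Python (the question does not name a language, so Python and the helper name parse_version are assumptions made for this sketch), the pattern can be written as a raw string with single backslashes or as an ordinary string with doubled ones; both spell the same regex:

```python
import re

# Raw string: backslashes reach the regex engine unchanged.
VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:\.(\d+))?$")

# Ordinary string: every backslash must be doubled to survive the literal.
ESCAPED_RE = re.compile("^(\\d+)\\.(\\d+)\\.(\\d+)(?:\\.(\\d+))?$")


def parse_version(text):
    """Hypothetical helper: return (major, minor, patch, build) as ints.

    build is None when the optional fourth component is absent;
    the whole result is None when the text is not a version string.
    """
    m = VERSION_RE.match(text)
    if m is None:
        return None
    return tuple(int(g) if g is not None else None for g in m.groups())


print(parse_version("1.2.3.4"))
print(parse_version("1.2.3"))
print(parse_version("1.2.3."))
```

Because the dot sits inside the optional group, a trailing dot with no digits after it is rejected rather than silently accepted.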

Related

Regex - extract last term between _ and before . from path

This is the regex that I'm currently testing
[\w\. ]+(?=[\.])
My ultimate goal is to include a regex expression to extract using regexp_extract in Impala/Hive query.
regexp_extract(col, '[\w\. ]+(?=[\.])', 1)
This doesn't work in Impala however.
Examples of path to extract from:
D:\mypath\Temp\abs\device\Program1.lua
D:\mypath\Temp\abs\device\SE1_Test-program.lua
D:\mypath\Temp\abs\device\Test_program.lua
D:\mypath\Temp\abs\device\Device_Test_Case-general.lua
The regex I've tested extracts the term I'm looking for, but it's not good enough: for the second, third and fourth cases I need to extract only the part after the last underscore.
My expectations are:
Program1
Test-program
program
Case-general
Any suggestions? I'm also open to using something other than regexp_extract.
Note that Impala regex does not support lookarounds, and thus you need a capturing group to get a submatch out of the overall match. Also, if you use escaping \ in the pattern, make sure it is doubled.
You can use
regexp_extract(col, '([^-_\\\\]+)\\.\\w+$', 1)
See the regex demo.
The regex means
([^-_\\]+) - Group 1: one or more chars other than -, _ and \
\. - a dot
\w+ - one or more word chars
$ - end of string.
\w also matches an underscore; you can use [a-zA-Z0-9] instead.
Add a dot and a hyphen to the character class, capture that in group 1, and match the expected trailing dot.
Note that you don't have to escape dots in a character class.
([a-zA-Z0-9.-]+)[.]
See a regex101 demo
Example using regexp_extract where the , 1 gets the group 1 value:
regexp_extract(col, '([a-zA-Z0-9.-]+)[.]', 1)
If it should be at the end of the string only, matching the last dot without matching any backslashes in between:
regexp_extract(col, '([a-zA-Z0-9.-]+)[.][^\\\\.]+$', 1)
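To check these patterns outside Impala, here is a minimal Python sketch. It assumes Python's re module behaves like Impala's regex engine for these particular patterns, and extract_name is a hypothetical stand-in for regexp_extract(col, pattern, 1). In this sketch the pattern that also excludes the hyphen returns program rather than Test-program for the second path, while the pattern that allows a hyphen returns the values listed in the question.

```python
import re

PATHS = [
    r"D:\mypath\Temp\abs\device\Program1.lua",
    r"D:\mypath\Temp\abs\device\SE1_Test-program.lua",
    r"D:\mypath\Temp\abs\device\Test_program.lua",
    r"D:\mypath\Temp\abs\device\Device_Test_Case-general.lua",
]

# In a raw string, \\ inside the pattern is one literal backslash for the engine.
HYPHEN_EXCLUDED = r"([^-_\\]+)\.\w+$"
HYPHEN_ALLOWED = r"([a-zA-Z0-9.-]+)[.][^\\.]+$"


def extract_name(path, pattern):
    """Hypothetical stand-in for regexp_extract(col, pattern, 1)."""
    m = re.search(pattern, path)
    return m.group(1) if m else None


for p in PATHS:
    print(extract_name(p, HYPHEN_ALLOWED), "|", extract_name(p, HYPHEN_EXCLUDED))
```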

RegEx: Excluding a pattern from the match

I know some basics of regex but I'm no pro, and I am still learning. Currently, I am using the following very simple regex to match any digit in a given sentence.
\d
Now I want to match all digits except those that are part of patterns like e074663, e123444 or e7736. So for the following input,
Edit 398e997979 the Expression 9798729889 & T900980980098ext to see e081815 matches. Roll over matches or e081815 the expression e081815 for details.e081815 PCRE & JavaScript flavors of RegEx are e081815 supported. Validate your expression with Tests mode e081815.
Only the bold digits should be matched, and not any e081815. I tried the following without success.
(^[e\d])(\d)
Also, going forward, some more patterns need to be added for exclusion, e.g. cg636553 or cg(any digits). Any help in this regard will be much appreciated. Thanks!
Try this:
(?<!\be)(?<!\d)\d+
Test it live on regex101.com.
Explanation:
(?<!\be) # make sure we're not right after a word boundary and "e"
(?<!\d) # make sure we're not right after a digit
\d+ # match one or more digits
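A short Python sketch of this pattern, assuming Python's re module (which accepts these fixed-width lookbehinds). The extra (?<!\bcg) lookbehind is an assumption added here to cover the cg prefix mentioned in the question, and the sample text is shortened:

```python
import re

# Skip digit runs that directly follow a standalone "e" or "cg" prefix.
# (?<!\d) stops the match from restarting in the middle of a skipped run.
DIGITS = re.compile(r"(?<!\be)(?<!\bcg)(?<!\d)\d+")

text = (
    "Edit 398e997979 the Expression 9798729889 & T900980980098ext "
    "to see e081815 and cg636553."
)

found = DIGITS.findall(text)
print(found)
```

The "e" in 398e997979 is not preceded by a word boundary, so the digits after it are still matched.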
If you want to match individual digits, you can achieve that using the \G anchor that matches at the position after a successful match:
(?:(?<!\be)(?<=\D)|\G)\d
Test it here
Another option is to use a capturing group with lookarounds
(?:\b(?!e|cg)|(?<=\d)\D)[A-Za-z]?(\d+)
(?: Non capture group
\b(?!e|cg) Word boundary, assert what is directly to the right is not e or cg
| Or
(?<=\d)\D Match any char except a digit, asserting what is directly on the left is a digit
) Close group
[A-Za-z]? Match an optional char a-zA-Z
(\d+) Capture 1 or more digits in group 1
Regex demo
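The capturing-group variant can be tried the same way. In this sketch (Python's re module is an assumption, and the sample text is shortened), re.findall returns only the contents of group 1, so any letter consumed in front of the digits is dropped:

```python
import re

# Group 1 holds the digits; the prefix part decides whether a run qualifies.
PATTERN = re.compile(r"(?:\b(?!e|cg)|(?<=\d)\D)[A-Za-z]?(\d+)")

text = (
    "Edit 398e997979 the Expression 9798729889 & T900980980098ext "
    "to see e081815 and cg636553."
)

digits = PATTERN.findall(text)
print(digits)
```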

Regex to match ISO languages ISO

I have the following language or language-locale codes in a URL and I am trying to identify them with a regex. I was partially successful in identifying them, but it fails for some scenarios.
Languages that I am testing with
en-us -- Passes
us -- Fails
Here is the REGEX that i have
([a-zA-Z]{2}|[a-zA-Z]{2}-[a-zA-Z]{2}\/)c\/(deals-and-tips\/)?
For instance:
https://forum.leasehackr.com/en-us/c/deals-and-tips (passes)
https://forum.leasehackr.com/us/c/deals-and-tips (fails)
What am I missing in the above REGEX?
The regex you wanted is:
([a-zA-Z]{2}|[a-zA-Z]{2}-[a-zA-Z]{2})\/c\/(deals-and-tips\/)?
The difference from your regex is that I moved the first \/ from inside the parentheses to outside (to sit with c\/).
Test here.
The last / keeps the deals-and-tips group from matching in any case, since your URLs don't have it. In any case, I would rewrite your regex like this: ([a-zA-Z]{2})(-[a-zA-Z]{2})?\/c\/(deals-and-tips)?.
This way it always looks for the first part (en) and treats the second (-us) as optional.
Alternatively use (\w{2})(-\w{2})?\/c\/(deals-and-tips)?, if you don't mind the risk of also matching digits and underscores.
The reason your pattern does not match us is because the alternation ([a-zA-Z]{2}|[a-zA-Z]{2}-[a-zA-Z]{2}\/) only matches the \/ in the second part of the alternation.
Also it does not match the last group with deals-and-tips because there is no trailing \/ in the example data.
Your updated pattern might look like
([a-zA-Z]{2}|[a-zA-Z]{2}-[a-zA-Z]{2})\/c\/(deals-and-tips)?
Regex demo
You could shorten the pattern a bit by using an optional non capturing group (?:-[a-zA-Z]{2})? inside the first capturing group to optionally match the part starting with a hyphen.
As in the example data you could match the leading \/ in front of the capturing group to get a more efficient match.
\/([a-zA-Z]{2}(?:-[a-zA-Z]{2})?)\/c\/(deals-and-tips)?
In parts
\/ To be a bit more precise, match the leading /
( Capture group 1
[a-zA-Z]{2} Match 2 chars a-z
(?:-[a-zA-Z]{2})? Optionally match - and 2 chars a-z
) Close group
\/c\/ Match /c/
(deals-and-tips)? Optional capture group 2 match deals-and-tips
Regex demo
Note that if you use another delimiter than / you don't have to escape the forward slash.
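As a minimal sketch of the last pattern in Python (an assumption, since the question does not name a language; locale_of is a hypothetical helper name), the forward slashes need no escaping because Python patterns have no delimiter:

```python
import re

# Leading "/", then a 2-letter language with an optional "-xx" locale part.
LOCALE_RE = re.compile(r"/([a-zA-Z]{2}(?:-[a-zA-Z]{2})?)/c/(deals-and-tips)?")


def locale_of(url):
    """Return the language or language-locale code, or None if absent."""
    m = LOCALE_RE.search(url)
    return m.group(1) if m else None


print(locale_of("https://forum.leasehackr.com/en-us/c/deals-and-tips"))
print(locale_of("https://forum.leasehackr.com/us/c/deals-and-tips"))
```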

How to create proper regular expression to find last character which I want to?

I need to create a regex to find the last underscore in a string like 012344_2.0224.71_3 or 012354_5.00123.AR_3.335_8
I tried to find the last part with the expression [^.]+$ and then find the underscore in the part it found, but I cannot get it to work.
I hope you can help me :)
Just use a negated character class [^_] that will match everything except an underscore (this helps to ensure no other underscores are found afterwards) and end of string $
Pattern would look as such:
(_)[^_]*$
The final underscore _ is in a capturing group, so you can work with that submatch: when replacing, you would replace group 1 (your underscore).
See it live: Regex101
Notice the green highlighted portion on Regex101, this is your submatch and is what would be replaced.
The simplest solution I can imagine is using .*\K_, however not all regex flavours support \K.
If not, another idea would be to use _(?=[^_]*$)
You have a demo of the first and second option.
Explanation:
.*\K_: Matches any characters up to an underscore. Since the * quantifier is greedy, it will match up to the last underscore. Then \K discards what was matched so far, and only the underscore remains in the match.
_(?=[^_]*$): Matches an underscore followed only by non-underscore characters until the end of the line.
If you want nothing but the "net" (i.e., nothing matched except the last underscore), use positive lookahead to check that no more underscores are in the string:
/_(?=[^_]*$)/gm
Demo
The pattern [^.]+$ matches one or more characters that are not a dot and then asserts the end of the string. That will give you the matches 71_3 and 335_8
What you want to match is an underscore when there are no more underscores following.
One way to do that is using a negative lookahead (?!.*_), if that is supported, which asserts that what is to the right is not any run of characters followed by an underscore
_(?!.*_)
Pattern demo
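The two lookahead forms can be compared in a few lines of Python (an assumption, since the question names no language). Python's built-in re module does not support \K, so only the lookahead patterns are shown in this sketch:

```python
import re

LAST_UNDERSCORE = re.compile(r"_(?=[^_]*$)")  # only non-underscores follow
NO_LATER_ONE = re.compile(r"_(?!.*_)")        # no underscore anywhere later

SAMPLES = ("012344_2.0224.71_3", "012354_5.00123.AR_3.335_8")

for s in SAMPLES:
    m = LAST_UNDERSCORE.search(s)
    # Position of the match, and the string with that underscore replaced.
    print(s, m.start(), LAST_UNDERSCORE.sub("-", s))
```

Both patterns match only the underscore itself, so a replacement leaves the rest of the string untouched.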

Regex to find file version C#

Below are some examples of the file name without extension, from which I want to extract version and type of the file.
1] 2.13.1801.221
Expected output-[Version: 2.13.1801.221 and Type: Null]
2] 2.17.1801.221.SQLServer
Expected output-[Version: 2.17.1801.221 and Type: SQLServer]
3] 2.19.1801.SQLite
Expected output-[Version: 2.19.1801 and Type: SQLite]
I am using below regex to extract version and type from file name
^(?<version>(\d+\.\d+)+)\.(?<type>\w*)$
But this doesn't work.
Tested with an online regex tester, which shows the result as: [https://i.stack.imgur.com/c9FlW.png]
Match groups formed as: [https://i.stack.imgur.com/V0azi.png]
What am I missing here?
Please suggest a good regex.
Thanks in advance!
Your regex is slightly incorrect, which is why it is not working. The correct regex you should use is the following:
^(?<version>\d+(?:\.\d+)+)(?:\.(?<type>[a-zA-Z]+))?$
Demo
Here is an explanation of the problems in your ^(?<version>(\d+\.\d+)+)\.(?<type>\w*)$ regex:
The (\d+\.\d+)+ part will not capture the version correctly, because it expects one or more digits, a literal dot and one or more digits again, with that whole unit repeated one or more times. The corrected version of this part is \d+(?:\.\d+)+, which can capture strings like 1.1 or 1.2.33.11 etc.
The second problem is the \.(?<type>\w*) part, which matches a literal dot and then zero or more word characters. Because \w also matches digits, it will match the last numeric part when there is actually no type, so it matches 221 in the string 2.13.1801.221 as the type, which is not what you want. Since the type can be absent from the string, you need the ? operator to make the whole group optional, and [a-zA-Z] to capture the type, so the corrected part is (?:\.(?<type>[a-zA-Z]+))?. If the type can contain digits, you can change [a-zA-Z]+ to [a-zA-Z][a-zA-Z\d]*, which means the type must start with a letter and digits may follow.
Also, I have made some groups in your regex non-capturing by placing ?: just after ( as you don't need to capture them separately.
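For a quick check of the corrected pattern outside C#, here is a minimal sketch in Python. It is a translation made under assumptions: Python spells named groups (?P<name>...) instead of (?<name>...), and split_file_name is a hypothetical helper name.

```python
import re

# Version: digits with at least one ".digits" part; type: optional letters.
FILE_RE = re.compile(
    r"^(?P<version>\d+(?:\.\d+)+)(?:\.(?P<type>[a-zA-Z]+))?$"
)


def split_file_name(name):
    """Return (version, type); type is None when the name has no type part."""
    m = FILE_RE.match(name)
    if m is None:
        return None
    return m.group("version"), m.group("type")


for name in ("2.13.1801.221", "2.17.1801.221.SQLServer", "2.19.1801.SQLite"):
    print(name, "->", split_file_name(name))
```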
You are always assuming that there would be . after the version numbers. However, if there is no type specified after the version, the extra . would not exist. So instead, you could use the following:
^(?<version>[\d+\.]+\d)\.*(?<type>\w*)$
Demo
^ matches the beginning of the line
The version capture group is defined by (?<version>[\d+\.]+\d)
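[\d+\.]+ is a character class repeated one or more times: it matches any run of digits, dots and literal + characters (inside the brackets, + is not a quantifier)
\d makes sure the version ends with a digit
\.* matches zero or more dots between the version numbers and the type, if a type is specified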
The type capture group is defined by (?<type>\w*)
\w* matches any number of word characters
$ matches the end of the line
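A small Python sketch of this second pattern (again using Python's (?P<name>...) spelling, which is an assumption of the sketch since the question is about C#). It handles the three sample names, and because [\d+\.] accepts digits, dots and + in any order, it also accepts looser input such as 1+2..3:

```python
import re

LOOSE_RE = re.compile(r"^(?P<version>[\d+\.]+\d)\.*(?P<type>\w*)$")

NAMES = (
    "2.13.1801.221",
    "2.17.1801.221.SQLServer",
    "2.19.1801.SQLite",
    "1+2..3",  # not a real version, but the character class lets it through
)

for name in NAMES:
    m = LOOSE_RE.match(name)
    # type is an empty string (not None) when no type follows the version
    print(name, "->", m.group("version"), repr(m.group("type")))
```

If stricter validation matters, the \d+(?:\.\d+)+ form for the version part rejects such input.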