Regex for the filename of a Unix path without the first two digits

I have a filename in a Unix path that starts with two digits. How can I extract the name without those digits and without the extension?
For example, /this/is/my/path/to/the/file/01filename.ext should give filename.
I currently have [^/]+(?=\.ext$), which gives me 01filename. How do I get rid of the first two digits?

You can add a lookbehind in front of what you already have, looking for two digits:
(?<=\d\d)[^/]+(?=\.ext$)
This only works if you have exactly two digits! Unfortunately, most regex engines do not allow quantifiers like * or + inside lookbehinds.
(?<=\d\d) - checks for two digits before the match
[^/]+ - matches 1 or more characters, except /
(?=\.ext$) - checks for .ext after the match, at the end of the string
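A minimal sketch of the lookbehind approach in Python's re module, which, like most engines, accepts only fixed-width lookbehinds. The helper name strip_two_digits is hypothetical.

```python
import re
from typing import Optional

# (?<=\d\d)  : the match must be preceded by two digits (fixed-width lookbehind)
# [^/]+      : the name itself, never crossing a path separator
# (?=\.ext$) : a literal ".ext" must follow, at the very end of the string
PATTERN = re.compile(r"(?<=\d\d)[^/]+(?=\.ext$)")


def strip_two_digits(path: str) -> Optional[str]:
    """Return the file name without its two-digit prefix and .ext suffix."""
    m = PATTERN.search(path)
    return m.group(0) if m else None


print(strip_two_digits("/this/is/my/path/to/the/file/01filename.ext"))  # filename
# Only the first two digits are skipped, so a three-digit prefix leaks one digit:
print(strip_two_digits("/path/123name.ext"))  # 3name
```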

Try this one:
/\d\d(.*?)\.\w{3}$
Explanation:
/\d\d : a slash followed by two digits
(.*?) : the capture (the filename)
\.\w{3} : a dot followed by three word characters
$ : end of string
It works for me in Expresso.
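The same idea in Python, as a sketch: the name comes back in group 1 rather than as the whole match. Note that (.*?) is not restricted to one path segment, so a directory whose name is two digits changes the result, as the last line shows. The helper name is hypothetical.

```python
import re
from typing import Optional

# /\d\d    : a slash followed by two digits (consumed, not captured)
# (.*?)    : the name, captured lazily in group 1
# \.\w{3}$ : a dot and a three-character extension at the end of the string
PATTERN = re.compile(r"/\d\d(.*?)\.\w{3}$")


def name_without_digits(path: str) -> Optional[str]:
    m = PATTERN.search(path)
    return m.group(1) if m else None


print(name_without_digits("/this/is/my/path/to/the/file/01filename.ext"))  # filename
print(name_without_digits("/12/dir/name.ext"))  # /dir/name
```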

Consider the following regex:
(?<=\d{2})[^/]+(?=\.ext$)
Good luck!

A more general regex:
(?:^|\/)[\d]+([^\/.]+)\.[\w.]+$
(Excluding / from the captured part keeps the match inside the last path segment, even if a directory name also starts with digits.)
Explanation:
(?:          group, but do not capture:
  ^          the beginning of the string
  |          OR
  \/         '/'
)            end of grouping
[\d]+        any character of: digits (0-9) (1 or more times (matching the most amount possible))
(            group and capture to \1:
  [^\/.]+    any character except: '/', '.' (1 or more times (matching the most amount possible))
)            end of \1
\.           '.'
[\w.]+       any character of: word characters (a-z, A-Z, 0-9, _), '.' (1 or more times (matching the most amount possible))
$            before an optional \n, and the end of the string
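A Python sketch of this more general pattern (with the slash left unescaped, which Python does not require). It accepts any number of leading digits and a multi-part extension, and the capture excludes / as well as ., so it stays within the last path segment even when a directory name (2021 below) starts with digits. The sample paths are made up for illustration.

```python
import re
from typing import Optional

# (?:^|/)   : start of string or a path separator
# \d+       : one or more leading digits
# ([^/.]+)  : group 1, the name (no slash, no dot)
# \.[\w.]+$ : the extension, which may itself contain dots (e.g. tar.gz)
PATTERN = re.compile(r"(?:^|/)\d+([^/.]+)\.[\w.]+$")


def bare_name(path: str) -> Optional[str]:
    m = PATTERN.search(path)
    return m.group(1) if m else None


for p in ["/this/is/my/path/to/the/file/01filename.ext",
          "/data/2021/0042report.tar.gz",
          "123notes.txt"]:
    print(p, "->", bare_name(p))
```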

Related

How do I write a regex to pull the patch version out of a semver?

I'm trying to use a regex to pull just the patch version out of some semvers in the form v1.2.3.
I've got a regex that matches the v1.2. part, but I'm struggling to get the remaining part, the 3 (which is what I actually want back).
I'm using ^v\d+\.\d+\. to select the first part.
I'm trying to combine it with a negative lookahead, (?!(v\d+\.\d+\.)).*, to then select everything after it, but this just seems to return everything after the v rather than everything after the group.
Any pointers would be really appreciated, thanks!
In this special case:
'(?<=^v\d\.\d\.)[[:alnum:]]+'
The regular expression matches as follows:
Node            Explanation
(?<=            look behind to see if there is:
  ^             start of string
  v             'v'
  \d            digits (0-9)
  \.            '.'
  \d            digits (0-9)
  \.            '.'
)               end of look-behind
[[:alnum:]]+    any character of: letters and digits (1 or more times (matching the most amount possible))
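Python's re module has no POSIX classes like [[:alnum:]], so this sketch substitutes [0-9A-Za-z]. It keeps the fixed-width lookbehind with ^ inside it, so the prefix must start the string and the major and minor parts must be single digits. (An anchor placed before the lookbehind would instead ask for characters before the start of the string, so nothing could match.) The helper name is hypothetical.

```python
import re
from typing import Optional

# Fixed-width lookbehind anchored at the start: exactly "v<digit>.<digit>."
PATTERN = re.compile(r"(?<=^v\d\.\d\.)[0-9A-Za-z]+")


def patch(version: str) -> Optional[str]:
    m = PATTERN.search(version)
    return m.group(0) if m else None


print(patch("v1.2.3"))   # 3
print(patch("v1.2.34"))  # 34
print(patch("v1.10.3"))  # None: a two-digit minor version breaks the fixed width
```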
A more generic solution that works with any number of digits:
'^v\d+\.\d+\.\K[[:alnum:]]+'
The regular expression matches as follows:
Node            Explanation
^               start of string
v               'v'
\d+             digits (0-9) (1 or more times (matching the most amount possible))
\.              '.'
\d+             digits (0-9) (1 or more times (matching the most amount possible))
\.              '.'
\K              resets the start of the reported match (what is Kept), a shorter alternative to a look-behind assertion (see: look arounds, and Support of K in regex)
[[:alnum:]]+    any character of: letters and digits (1 or more times (matching the most amount possible))
Run man tr | grep -FA1 '[:' to list the POSIX character classes, such as [[:alnum:]].
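Python's built-in re module does not support \K, so a sketch of the generic version there uses a capture group to leave the v1.2. prefix out of the result; [0-9A-Za-z] again stands in for [[:alnum:]]. The helper name is hypothetical.

```python
import re
from typing import Optional

# ^v\d+\.\d+\.  : prefix of any length, matched but not returned
# ([0-9A-Za-z]+): the patch part, captured in group 1 (stands in for \K)
PATTERN = re.compile(r"^v\d+\.\d+\.([0-9A-Za-z]+)")


def patch(version: str) -> Optional[str]:
    m = PATTERN.match(version)
    return m.group(1) if m else None


print(patch("v1.2.3"))      # 3
print(patch("v10.20.300"))  # 300
```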

Extract the last path-segments of a URI or path using RegEx

I am trying to extract the last section of the following string:
"/subscriptions/5522233222-d762-666e-555a-e6666666666/resourcegroups/rg-sql-Belguim-01/providers/Microsoft.Compute/snapshots/vm-sql-image-v3.3-pre-sysprep-Oct-2021-BG"
I want to capture:
"snapshots/vm-sql-image-v3.3-pre-sysprep-Oct-2021-BG"
I tried the following, with no luck:
(\w*?\/\w*?)$
How can I pull this off using regex?
Use
[^\/]+\/[^\/]+$
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
[^\/]+      any character except: '\/' (1 or more times (matching the most amount possible))
--------------------------------------------------------------------------------
\/          '/'
--------------------------------------------------------------------------------
[^\/]+      any character except: '\/' (1 or more times (matching the most amount possible))
--------------------------------------------------------------------------------
$           before an optional \n, and the end of the string
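A quick check of this pattern in Python, as a sketch (the slashes need no escaping there), using the string from the question:

```python
import re

resource_id = ("/subscriptions/5522233222-d762-666e-555a-e6666666666/resourcegroups/"
               "rg-sql-Belguim-01/providers/Microsoft.Compute/snapshots/"
               "vm-sql-image-v3.3-pre-sysprep-Oct-2021-BG")

# Two runs of non-slash characters joined by one slash, anchored at the end.
m = re.search(r"[^/]+/[^/]+$", resource_id)
print(m.group(0))  # snapshots/vm-sql-image-v3.3-pre-sysprep-Oct-2021-BG
```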
Your issues
(\w*?/\w*?)$ only works when the last two segments are simple (word characters only) or empty (tested), e.g.:
matched hello/world/subscriptions123/snap_shots, capturing subscriptions123/snap_shots
matched /1/2//, capturing the last 2 empty segments
What was OK:
the capture group
/ to match the last path separator before the end ($)
\w*? intended to match a path segment of any length
What to improve:
*? is a bit too unrestricted; use the + quantifier for at least one character (instead of * for any number, or ? for zero or one)
\w is the word-character class and does not match hyphens or dots (OK for snapshots, not for the given last segment)
Quick fix
(\w+/[\w\.-]+)$ (tested)
added the dot \. and the hyphen - to the character set containing \w
Simple but solid
(snapshots/[^\/]+)$ (tested)
the second-to-last path segment is assumed to be the fixed constant snapshots
[^\/] matches any character except (^) a slash in the last segment
Note: the slash doesn't need to be escaped as \/, as in Ryszard's answer, unless / is also the pattern delimiter.
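A sketch comparing the original attempt with both fixes in Python on the question's string: the original finds nothing because \w cannot cross the hyphens and dots of the last segment, while both fixes capture the same two segments.

```python
import re

resource_id = ("/subscriptions/5522233222-d762-666e-555a-e6666666666/resourcegroups/"
               "rg-sql-Belguim-01/providers/Microsoft.Compute/snapshots/"
               "vm-sql-image-v3.3-pre-sysprep-Oct-2021-BG")

patterns = {
    "original": r"(\w*?/\w*?)$",
    "quick fix": r"(\w+/[\w\.-]+)$",
    "simple but solid": r"(snapshots/[^/]+)$",
}

results = {}
for label, pattern in patterns.items():
    m = re.search(pattern, resource_id)
    # group(1) is the captured pair of segments; None means no match at all
    results[label] = m.group(1) if m else None
    print(f"{label:17} -> {results[label]}")
```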

Regex: if (text contains this text) match this

I have these two sentences:
TAGGING ODP:-7.160792, 113.496069
TAGGING pel:-7.160792, 113.496069
I want to match the -7.160792 part only if the full sentence contains "odp".
I tried (?(?=odp)-\d+.\d+), but it doesn't work, and I don't know why.
Any help is appreciated.
(?(?=odp)-\d+\.\d+) won't work because (?=odp) is a positive lookahead that imposes a constraint on the pattern to its right, -\d+\.\d+. Namely, it requires the string odp to occur at exactly the same location where - and a number are expected.
Use either
(?<=ODP:)-\d+\.\d+
or
ODP:(-\d+\.\d+)
If lookbehinds are supported, the first variant is preferable.
Otherwise, use the second one, which captures the number in a group.
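Both variants sketched in Python on the two sample sentences; the helper names are hypothetical. With the lookbehind the whole match is the number, while the second variant reads it from group 1.

```python
import re
from typing import Optional

behind = re.compile(r"(?<=ODP:)-\d+\.\d+")  # lookbehind: match is just the number
grouped = re.compile(r"ODP:(-\d+\.\d+)")    # no lookbehind: number in group 1


def via_lookbehind(line: str) -> Optional[str]:
    m = behind.search(line)
    return m.group(0) if m else None


def via_group(line: str) -> Optional[str]:
    m = grouped.search(line)
    return m.group(1) if m else None


for line in ["TAGGING ODP:-7.160792, 113.496069",
             "TAGGING pel:-7.160792, 113.496069"]:
    print(via_lookbehind(line), via_group(line))
```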
And if odp can appear anywhere, even after the number:
(?i)^(?=.*odp).*(-\d+\.\d+)
This will capture the value into a group.
EXPLANATION
--------------------------------------------------------------------------------
(?i)        set flags for this block (case-insensitive) (with ^ and $ matching normally) (with . not matching \n) (matching whitespace and # normally)
--------------------------------------------------------------------------------
^           the beginning of the string
--------------------------------------------------------------------------------
(?=         look ahead to see if there is:
--------------------------------------------------------------------------------
  .*        any character except \n (0 or more times (matching the most amount possible))
--------------------------------------------------------------------------------
  odp       'odp'
--------------------------------------------------------------------------------
)           end of look-ahead
--------------------------------------------------------------------------------
.*          any character except \n (0 or more times (matching the most amount possible))
--------------------------------------------------------------------------------
(           group and capture to \1:
--------------------------------------------------------------------------------
  -         '-'
--------------------------------------------------------------------------------
  \d+       digits (0-9) (1 or more times (matching the most amount possible))
--------------------------------------------------------------------------------
  \.        '.'
--------------------------------------------------------------------------------
  \d+       digits (0-9) (1 or more times (matching the most amount possible))
--------------------------------------------------------------------------------
)           end of \1
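A Python sketch of the "odp anywhere" pattern; the helper name is hypothetical. Python accepts the global (?i) flag because it sits at the very start of the pattern.

```python
import re
from typing import Optional

# (?i): case-insensitive; (?=.*odp): "odp" must occur somewhere on the line;
# the greedy .* then backtracks so group 1 holds the last "-digits.digits".
PATTERN = re.compile(r"(?i)^(?=.*odp).*(-\d+\.\d+)")


def odp_value(line: str) -> Optional[str]:
    m = PATTERN.search(line)
    return m.group(1) if m else None


print(odp_value("TAGGING ODP:-7.160792, 113.496069"))  # -7.160792
print(odp_value("TAGGING -7.160792, 113.496069 odp"))  # -7.160792
print(odp_value("TAGGING pel:-7.160792, 113.496069"))  # None
```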
You can use the regex (?i)(?<=odp:)[^,]*
Explanation:
(?i): Case-insensitive flag
(?<=odp:): Positive lookbehind for odp:
[^,]*: Anything but ,
👉 If you want the match to be restricted to numbers only, you can use the regex (?i)(?<=odp:)(?:-\d+\.\d+)
Explanation:
(?i): Case-insensitive flag
(?<=odp:): Positive lookbehind for odp:
(?:: Start non-capturing group
-: Literal, -
\d+: 1+ digit(s)
\.\d+: A literal . followed by 1+ digit(s)
): End non-capturing group
👉 If the sign can be either + or -, you can use the regex (?i)(?<=odp:)(?:[+-]\d+\.\d+)
The pattern (?(?=odp)\-\d+\.\d+) uses a conditional, stating in its if clause:
If what is directly to the right of the current position is odp,
then match -\d+\.\d+
That can never match, because the same position cannot be followed by both odp and -.
What you could also do is match odp, followed by any characters other than digits using \D*, and capture the number in a group. Use it with the case-insensitive flag, since the sample text contains ODP in uppercase:
\bodp\b\D*(-\d+\.\d+)\b
The pattern matches:
\bodp\b match odp between word boundaries to prevent a partial match
\D* Optionally match any char other than a digit
(-\d+\.\d+) Capture - and 1+ digits with a decimal part in group 1
\b A word boundary
Regex demo
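A Python sketch of this pattern with re.IGNORECASE so that ODP matches; the helper name is hypothetical.

```python
import re
from typing import Optional

# \bodp\b       : "odp" as a whole word (IGNORECASE lets it match "ODP")
# \D*           : any non-digits such as ":"; backtracking hands back the "-"
# (-\d+\.\d+)\b : the signed decimal number, captured in group 1
PATTERN = re.compile(r"\bodp\b\D*(-\d+\.\d+)\b", re.IGNORECASE)


def odp_number(line: str) -> Optional[str]:
    m = PATTERN.search(line)
    return m.group(1) if m else None


print(odp_number("TAGGING ODP:-7.160792, 113.496069"))  # -7.160792
print(odp_number("TAGGING pel:-7.160792, 113.496069"))  # None
```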
(?<=ODP:)(-\d+\.\d+)
You can try using a positive lookbehind.
This should work for the input you've provided.

Ruby regex counting characters

I am trying to create a regex in Ruby that matches strings with 10 characters that are not special characters, i.e. characters that would match \w.
So far I have come up with this:
/\w{10,}/
but the issue is that it only counts a consecutive sequence of word characters. I want to match any string that contains at least 10 "word" characters in total. Is this possible? I am fairly new to regex as a whole, so any help would be appreciated.
If I understood correctly, this should work:
/(?:\w[^\w]*){9,}\w/
Explanation:
We start with a single
\w
We want to match all the other characters up to the next \w, hence:
\w[^\w]*
[^<list of chars>] matches any character other than those listed in the brackets, so [^\w] means any character that is not a word character. * denotes 0 or more. The above will match "a-- ", "b" and "c!" in the string "a-- bc!".
Since we need 10 \w, we will match 9 (or more) groups like that, followed by a single \w:
(\w[^\w]*){9,}\w
We don't really care about captures here (especially since Ruby keeps only the last repetition of a repeated group anyway), so we make the group non-capturing:
(?:\w[^\w]*){9,}\w
Alternatively, we could just use a simpler regex:
(?:\w[^\w]*){10,}
But it will also match the non-word characters after the last word character in a string; I'm not sure whether that is required here.
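A Python translation of this pattern, as a sketch; re.ASCII keeps Python's \w to [a-zA-Z0-9_], since for str patterns it would otherwise also count non-ASCII letters. The helper name is hypothetical.

```python
import re

# (?:\w[^\w]*){9,} : nine or more word characters, each followed by any
#                    run of non-word characters; then one final \w
TEN_WORD_CHARS = re.compile(r"(?:\w[^\w]*){9,}\w", re.ASCII)


def has_ten_word_chars(text: str) -> bool:
    return TEN_WORD_CHARS.search(text) is not None


print(has_ten_word_chars("a-b-c-d-e-f-g-h-i-j"))  # True
print(has_ten_word_chars("hello, world!"))        # True (5 + 5)
print(has_ten_word_chars("hi there!"))            # False (7)
```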
Match anywhere in the string:
/\w(?:\W*\w){9,19}/
/(?:\W*\w){10,20}/
Validate that a string contains 10 to 20 word characters:
/\A(?:\W*\w){10,20}\W*\z/
Prefer non-capturing groups, particularly when extracting found matches.
Watch out for ^ and $: in Ruby's regex they mark the start and end of a line, not of the whole string (use \A and \z for that).
EXPLANATION
--------------------------------------------------------------------------------
\A          the beginning of the string
--------------------------------------------------------------------------------
(?:         group, but do not capture (between 10 and 20 times (matching the most amount possible)):
--------------------------------------------------------------------------------
  \W*       non-word characters (all but a-z, A-Z, 0-9, _) (0 or more times (matching the most amount possible))
--------------------------------------------------------------------------------
  \w        word characters (a-z, A-Z, 0-9, _)
--------------------------------------------------------------------------------
){10,20}    end of grouping
--------------------------------------------------------------------------------
\W*         non-word characters (all but a-z, A-Z, 0-9, _) (0 or more times (matching the most amount possible))
--------------------------------------------------------------------------------
\z          the end of the string
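The validation pattern translated to Python, as a sketch: Python's \Z already means the absolute end of the string (like Ruby's \z), and re.ASCII keeps \w to [a-zA-Z0-9_]. The helper name is hypothetical.

```python
import re

# \A ... \Z anchor the whole string; 10 to 20 word characters in total,
# with any number of non-word characters before, between and after them.
VALID = re.compile(r"\A(?:\W*\w){10,20}\W*\Z", re.ASCII)


def is_valid(text: str) -> bool:
    return VALID.search(text) is not None


print(is_valid("hello, world!"))  # True (10 word characters)
print(is_valid("hi there"))       # False (7)
print(is_valid("a" * 21))         # False (more than 20)
```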

REGEXEXTRACT() both integers and numbers with decimals

I'm working in Google Sheets and wondering if it's possible to use one REGEXEXTRACT() function to extract both whole and decimal numbers.
I have a column with:
1 Ml/ 2 Ml
2 Ml/ 2.02 Ml
3 Ml/ 4.01 Ml
and want a column with:
2
2.02
4.01
The first value could be 2.00 as well.
I'm asking specifically about regular expressions; I know how to do it without them. I currently have REGEXEXTRACT(cell#, "\/\D(\d+)\D").
Thanks!
I think all you need as a pattern is:
(\d+(?:\.\d+)?) Ml$
( - Open 1st capture group.
\d+ - One or more digits.
(?: - Open non-capture group.
\.\d+ - A literal dot followed by one or more digits.
)? - Close non-capture group and make it optional.
) - Close 1st capture group.
Ml$ - Match " Ml" (with its leading space) literally, up to the end-of-string anchor ($).
Wrap it in ARRAYFORMULA() like this:
=ARRAYFORMULA(REGEXEXTRACT(A1:A3,"(\d+(?:\.\d+)?) Ml$"))
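Google Sheets uses the RE2 engine; for this simple pattern Python's re behaves the same way, so here is a sketch that runs it over the three sample cells (the list stands in for A1:A3):

```python
import re

cells = ["1 Ml/ 2 Ml", "2 Ml/ 2.02 Ml", "3 Ml/ 4.01 Ml"]

# Digits with an optional ".digits" part, followed by " Ml" at the end.
PATTERN = re.compile(r"(\d+(?:\.\d+)?) Ml$")

values = [PATTERN.search(cell).group(1) for cell in cells]
print(values)  # ['2', '2.02', '4.01']
```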
Without regex:
We want to grab a value enclosed by slash-space on the left and a space on the right:
=TRIM(MID(A1,FIND("/ ",A1)+2,FIND(" ",A1,FIND("/ ",A1)+2)-(FIND("/ ",A1)+2)))
(This should work the same way in both Excel and Google Sheets. If we had to grab multiple instances, I would use regex.)
Use:
=ARRAYFORMULA(IFNA(REGEXEXTRACT(A1:A2, " (\d+\.\d+) Ml"),
REGEXEXTRACT(A1:A2, " (\d+) Ml")))
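The IFNA fallback can be sketched in Python as "try the decimal pattern first, then the integer one"; the function name is hypothetical and the cells mirror the question's data.

```python
import re
from typing import Optional


def second_value(cell: str) -> Optional[str]:
    # First try a number with a decimal part, like the first REGEXEXTRACT.
    m = re.search(r" (\d+\.\d+) Ml", cell)
    if m is None:
        # Fallback, like IFNA: a whole number after a space.
        m = re.search(r" (\d+) Ml", cell)
    return m.group(1) if m else None


for cell in ["1 Ml/ 2 Ml", "2 Ml/ 2.02 Ml", "3 Ml/ 4.01 Ml"]:
    print(second_value(cell))
```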
You can also try this simpler formula, which handles errors and returns the results as numbers at the same time:
=ArrayFormula(IFERROR(
REGEXEXTRACT(A1:A,"/ (.*) ")*1))
Use
=REGEXEXTRACT(A1, "(\d[\d.]*)\s*Ml$")
See proof
Explanation
--------------------------------------------------------------------------------
(           group and capture to \1:
--------------------------------------------------------------------------------
  \d        digits (0-9)
--------------------------------------------------------------------------------
  [\d.]*    any character of: digits (0-9), '.' (0 or more times (matching the most amount possible))
--------------------------------------------------------------------------------
)           end of \1
--------------------------------------------------------------------------------
\s*         whitespace (\n, \r, \t, \f, and " ") (0 or more times (matching the most amount possible))
--------------------------------------------------------------------------------
Ml          'Ml'
--------------------------------------------------------------------------------
$           before an optional \n, and the end of the string