Regex to match PowerShell drive path

PowerShell's New-PsDrive Cmdlet allows for drives to be created with more-flexible names like HKLM.
I'd like to match these drive\path\file patterns in the NavigationCmdletProvider that I'm building:
csb:
csb:\
csb:\foo\bar
csb:\foo\bar\
csb:\foo\bar bar\test.txt
but not these
csb:\\
csb:\\\
([a-zA-Z]+:)?(\\[a-zA-Z0-9_.-: :]+)*\\? matches everything that I want, but still includes the two that I don't. I can't seem to get it to match 0 or 1 \ at the end of the string.
What am I missing?

All you should need to do is tie your regular expression to the beginning and end of the line using a ^ and a $ respectively:
^([a-zA-Z]+:)?(\\[a-zA-Z0-9_.-: :]+)*\\?$
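This is necessary almost any time you are trying to count a specific number of characters in a regex: without anchors, the pattern is free to match only part of the string and ignore the extra backslashes.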
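As a rough check of the anchoring advice, here is a minimal Python sketch that runs the question's pattern with and without ^ and $ against the sample paths. Python's re module stands in for the .NET engine PowerShell uses (an assumption; the two agree on this pattern as far as I know), and the helper name accepted is made up for illustration.

```python
import re

# Pattern from the question, without and with the ^ ... $ anchors.
UNANCHORED = re.compile(r"([a-zA-Z]+:)?(\\[a-zA-Z0-9_.-: :]+)*\\?")
ANCHORED = re.compile(r"^([a-zA-Z]+:)?(\\[a-zA-Z0-9_.-: :]+)*\\?$")

wanted = [
    "csb:",
    "csb:\\",
    "csb:\\foo\\bar",
    "csb:\\foo\\bar\\",
    "csb:\\foo\\bar bar\\test.txt",
]
unwanted = ["csb:\\\\", "csb:\\\\\\"]


def accepted(pattern, paths):
    """Return the paths in which the pattern finds a match (hypothetical helper)."""
    return [p for p in paths if pattern.search(p)]


# Unanchored, the search succeeds on a prefix and ignores the extra backslash.
print(repr(UNANCHORED.search(unwanted[0]).group(0)))
# Anchored, all wanted paths still pass and the unwanted ones are rejected.
print(accepted(ANCHORED, wanted) == wanted)
print(accepted(ANCHORED, unwanted))
```

Note that because every part of the pattern is optional, the anchored version also accepts an empty string; add a separate check if that matters.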

Related

Regular Expression to Match List of File Extensions

I would like to have a regular expression that will match a list of file extensions that are delimited with a pipe |, such as doc|xls|pdf. This list could also just be a single extension such as pdf, or it could be a wildcard * or ?. I would also like to exclude the | at the start or the end of the list and not match the \<>/:" characters.
I have tried the following but it doesn't account for a single * wildcard.
^([^|\\<>\/:"]|[^\\<>:"])[^\/\\<>:"]*[^|\/\\<>:"]$
I have been on one of the online testers but can't seem to get over the final hurdle. If someone could point me in the right direction I would be most grateful.
You can construct this from smaller building blocks. A single extension, excluding the characters you mention, would be:
[^\\<>/:"]+
We should probably also exclude | since that's our delimiter:
[^\\<>/:"|]+
This can automatically match wildcards as well, since they're not forbidden.
To construct the |-separated list from those is then easy:
[^\\<>/:"|]+
followed by an arbitrary number of the same thing with a | before that:
[^\\<>/:"|]+(\|[^\\<>/:"|]+)*
And if you want a complete string to match this, add the ^ and $ anchors:
^[^\\<>/:"|]+(\|[^\\<>/:"|]+)*$
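The final anchored pattern can be tried out with a short Python sketch. The function name is_valid_extension_list and the sample inputs are made up for illustration; the pattern itself is the one built up above.

```python
import re

# One or more allowed characters, then any number of "|" + allowed characters.
EXTENSION_LIST = re.compile(r'^[^\\<>/:"|]+(\|[^\\<>/:"|]+)*$')


def is_valid_extension_list(text):
    """True if text is a |-separated list of extensions (hypothetical helper)."""
    return EXTENSION_LIST.match(text) is not None


for sample in ["doc|xls|pdf", "pdf", "*", "?", "|doc", "doc|", "doc||pdf", "a:b"]:
    print(repr(sample), is_valid_extension_list(sample))
```

Leading, trailing, and doubled pipes are rejected because every | must be followed by at least one allowed character.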

pulling mixed letters and numbers [duplicate]

I'm trying to get a grasp on regular expressions and I came across the one used inside the str.extract method:
movies['year']=movies['title'].str.extract('.*\((.*)\).*',expand=True)
It is supposed to detect and extract whichever is in parentheses. So, if given this string: foobar (1995) it should return 1995. However, if I open a terminal and type the following
echo 'foobar (1995)' | grep '.*\((.*)\).*'
it matches the whole string instead of only the content between parentheses. I assume the method is working with the BRE flavor because of the parentheses escaping, and so is grep (default behavior). Also, regex matches in blue the whole string and green the year (capturing group). Am I missing something here? The regex works perfectly inside Python.
First of all, the behavior of Pandas .str.extract() is quite expected: it returns only the capturing group contents. The pattern used with extract requires at least 1 capturing group:
pat : string
Regular expression pattern with capturing groups
If you use a named capturing group, the new column will be named after the named group.
The grep command you provided can be reduced to
grep '\((.*)\)'
as grep is capable of matching a line partially (it does not require a full line match) and works on a per-line basis: once a match is found, the whole line is returned. To print only the matched part instead, you may use the -o switch.
With grep, you cannot return the capturing group contents. This can be worked around with a PCRE pattern enabled by the -P option, but that option is not available on Mac, for example. sed or awk may help in those situations, too.
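The difference between "the whole match" and "the capturing group" can be seen directly in Python's re module. This is only an illustrative sketch: the pattern is written in Python syntax (where \( is a literal parenthesis), not in grep's BRE syntax, and the variable names are made up.

```python
import re

line = "foobar (1995)"

# \( and \) are literal parentheses here; the bare ( ) form the capturing group.
match = re.search(r"\((.*)\)", line)

whole_match = match.group(0)  # the full matched text, parentheses included
captured = match.group(1)     # only the capturing group, which .str.extract() keeps

print(whole_match)
print(captured)
```

grep reports at the level of the line (or of the whole match with -o), while .str.extract() reports at the level of the group, which is why the same idea gives different output in the two tools.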
Try using this:
movies['year'] = movies['title'].str.extract(r'.*\((\d{4})\).*', expand=False)
Set expand=True if you want it to return a DataFrame or when using multiple capturing groups.
A year here is assumed to be composed of 4 digits, so the regex \((\d{4})\) matches any four-digit number between parentheses.
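To see why \d{4} is a safer group than .*, here is a small sketch using plain re rather than pandas. The title string and the helper extract_year are hypothetical; the two patterns are the ones from the question and from this answer.

```python
import re

LOOSE = re.compile(r".*\((.*)\).*")       # pattern from the question
STRICT = re.compile(r".*\((\d{4})\).*")   # pattern from this answer


def extract_year(title, pattern=STRICT):
    """Return the captured group, or None when the title has no match."""
    m = pattern.match(title)
    return m.group(1) if m else None


title = "Toy Story (1995) (remastered)"  # hypothetical title with two groups
print(extract_year(title, LOOSE))   # greedy .* lands on the last parentheses
print(extract_year(title, STRICT))  # only a four-digit group qualifies
print(extract_year("foobar"))       # no parentheses at all
```

With the loose pattern the leading greedy .* pushes the match to the last parenthesized group, so titles with extra parentheses would produce non-year values.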

Find and trim part of what is found using regular expression

I'm a newbie in writing regular expressions
I have a file name like this TST0101201304-123.txt and my target is to get the numbers between '-' and '.txt'
So I wrote this pattern: -([0-9]*)\.txt. It gets me the numbers that I want, but it also retrieves the hyphen '-' and the last part of the string, '.txt', so the result in the example above is '-123.txt'.
So my question is:
Is there a way in regular expressions to get only part of the matched string, like a submatch of the match without the need to trim it in my shell script code for unix?
I found this answer but it is getting the same result:
Regexp: Trim parts of a string and return what ever is left
Tip: To test my regular expressions I used this website
You can use lookbehind and lookahead
(?<=-)[0-9]*(?=[.]txt)
Don't know if it would work in unix
Different regex-engines are different. Since you're using expr match, you need to make two changes:
expr match expects a regex that matches the entire string; so, you need to add .* at the beginning of yours, to cover everything before the hyphen.
expr match uses POSIX Basic Regular Expressions (BREs), which use \( and \) for grouping (and capturing) rather than merely ( and ).
But, conveniently, when you give expr match a regex that contains a capture-group, its output is the content of that capture-group; you don't need to do anything else special. So:
$ expr match TST0101201304-123.txt '.*-\([0-9]*\)\.txt'
123
sed is your friend.
echo filename | sed -e 's/.*-\([0-9]*\)\.txt/\1/'
should get you what you want.
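For comparison, the two ideas from the answers above (lookaround versus a capture group) can be sketched in Python. This is not the shell solution the question asks for, just an illustration of how each approach isolates the digits; the variable names are made up.

```python
import re

filename = "TST0101201304-123.txt"

# Lookaround: the hyphen and ".txt" are required but not part of the match.
lookaround = re.search(r"(?<=-)[0-9]*(?=[.]txt)", filename)

# Capture group: the whole match includes "-" and ".txt"; group 1 is the digits.
grouped = re.search(r"-([0-9]*)\.txt", filename)

print(lookaround.group(0))
print(grouped.group(0))
print(grouped.group(1))
```

Lookaround needs an engine that supports it (POSIX BRE/ERE do not), whereas capture groups are available almost everywhere, which is why the expr and sed answers rely on groups.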

Regular expression get filename without extension from full filepath

How can I extract the filename without extension from the following file path:
D:\Projects\Extract\downtown - second.pdf
The following regular expression gives me the filename with extension: [^\\]*$
e.g. downtown - second.pdf
The following regular expression gives me the full path without extension: (.+)(?=(\.))
e.g. D:\Projects\Extract\downtown - second
I'm struggling to combine the two into one regular expression to give me the results I want: downtown - second
I suspect that your 2nd regex would not give you the output you want. Because .+ is greedy, it will give you the complete string, path included, up to the last period (.).
To get just the file name without extension, you can use this regex:
[^\\]*(?=[.][a-zA-Z]+$)
I have just replaced (.+) in your 2nd regex with the [^\\]* from your first regex, and added a pattern to match the extension up to the end of the string.
Now this pattern will match 0 or more repetitions of any character except a backslash (\), followed by a . and then 1 or more letters making up the extension.
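A quick way to confirm the behaviour described above is to run the pattern in Python. The helper name stem is hypothetical, and the extra sample names are made up to show the pattern's limits.

```python
import re

# Any run of non-backslash characters that is followed by ".letters" at the end.
STEM = re.compile(r"[^\\]*(?=[.][a-zA-Z]+$)")


def stem(path):
    """Return the file name without its extension, or None if nothing matches."""
    m = STEM.search(path)
    return m.group(0) if m else None


print(stem(r"D:\Projects\Extract\downtown - second.pdf"))
print(stem(r"D:\Projects\archive.tar.gz"))  # only the last extension is dropped
print(stem(r"D:\Music\song.mp3"))           # digits in the extension: no match
```

Because the lookahead only allows letters in the extension, names such as song.mp3 are not matched; widen the class to [a-zA-Z0-9] if such extensions are expected.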
I came up with this one, which captures most of the possibilities:
/[^\\\/]+(?=\.[\w]+$)|[^\\\/]+$/
/path/to/file
/path/to/file.txt
/path.with/dots.to/file.txt
/path/to/file.with.dots.txt
file.txt
C:\path\to\file.txt
and so on...
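The sample paths listed above can be checked against this pattern with a short Python sketch. The surrounding / delimiters are dropped because Python takes the pattern as a string; the expected values in the dictionary are what I get when tracing the pattern by hand, and the helper name stem_any_separator is made up.

```python
import re

# First alternative: name followed by a final ".ext"; second: name with no extension.
STEM_ANY_SEPARATOR = re.compile(r"[^\\\/]+(?=\.[\w]+$)|[^\\\/]+$")


def stem_any_separator(path):
    """Return the last path component without its final extension, if any."""
    m = STEM_ANY_SEPARATOR.search(path)
    return m.group(0) if m else None


expected = {
    "/path/to/file": "file",
    "/path/to/file.txt": "file",
    "/path.with/dots.to/file.txt": "file",
    "/path/to/file.with.dots.txt": "file.with.dots",
    "file.txt": "file",
    r"C:\path\to\file.txt": "file",
}

for path, want in expected.items():
    print(path, "->", stem_any_separator(path), "(expected", want + ")")
```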
I captured file from /path/to/file.pdf by using following regex:
[^/]*(?=\.[^.]+($|\?))
Hope this helps you
I had to use an extra backslash before the first ']' to make this work
[^\\\]*(?=[.][a-zA-Z]+$)
I use this pattern
[^\/]+[.+\.].*$ for / path separator
[^\\]+[.+\.].*$ for \ path separator
which matches the filename at the end of the string without worrying about which characters it contains. There is one exception: if the path for some reason has a folder with a period in it, this will get upset. Linux hidden directories that are preceded with a . like .rvm are unaffected.
Hope this helps.
http://rubular.com/r/LNrI4inMU1

URL Rewrite Pattern to exclude application name from path

I'm trying to use the IIS 7 URL Rewrite feature for the first time, and I'm having trouble getting my regular expression working. It seems like it should be simple enough. All I need to do is rewrite a URL like this:
http://localhost/myApplication/MySpecialFolder
To:
http://localhost/MySpecialFolder
Is this possible? I want the regular expression to ignore everything before "myApplication" in the original URL, so that I could use "http://localhost" OR "http://mysite", etc.
Here's what I've got so far:
^myApplication/MySpecialFolder$
But using the "Test Pattern..." feature in IIS, it says my patterns don't match unless I supply "myApplication/MySpecialFolder" exactly. Does anyone know how I can update my regular expression so that everything prior to "myApplication" is ignored and the following URLs will be seen as a match?
http://localhost/myApplication/MySpecialFolder
http://mysite/myApplication/MySpecialFolder
Many thanks in advance!
SOLUTION:
I needed to change my regex to:
myApplication/MySpecialFolder
Without the ^ at the beginning and without the $ at the end.
Your regular expression is correct. The pattern is matched against the path that starts after the first slash following the domain.
So for http://localhost/myApplication/MySpecialFolder, only myApplication/MySpecialFolder is used for matching.
To limit the rewriting to a specific domain, you have to use the Conditions section with Condition input = {HTTP_HOST}.
Unless there is something radically different with regexes in IIS, you would want to take out the anchor (^) at the beginning to match.
myApplication/MySpecialFolder$
The caret ^ tells it to match at the beginning of the string and the dollar sign $ tells it to match at the end. A regex like abc finds "abc" anywhere in the string, ^abc matches strings that start with "abc", abc$ matches strings that end with "abc", and ^abc$ only matches when the whole string is "abc".
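The four cases above can be demonstrated with a few lines of Python. Python's re module is used here only as a stand-in (IIS URL Rewrite has its own regex engine), and the helper name found is made up; the last two lines show why an anchored pattern fails against a full URL but succeeds against the path alone.

```python
import re


def found(pattern, text):
    """True if the pattern matches somewhere in text (hypothetical helper)."""
    return re.search(pattern, text) is not None


print(found("abc", "xxabcxx"))    # anywhere in the string
print(found("^abc", "xxabc"))     # must start with "abc"
print(found("abc$", "xxabc"))     # must end with "abc"
print(found("^abc$", "xxabc"))    # whole string must be "abc"

anchored = "^myApplication/MySpecialFolder$"
print(found(anchored, "http://localhost/myApplication/MySpecialFolder"))
print(found(anchored, "myApplication/MySpecialFolder"))
```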