How to exclude file extension from string with regex - regex

I want to be able to get two matching groups from a regex and exclude a third.
This is an example of a string I want to match:
my-file-name-0.44.0.6-SOME-SNAPSHOT.zip
I want two matching groups, one for the file name without the version and one for the version without the file extension.
Group 1: my-file-name
Group 2: 0.44.0.6-SOME-SNAPSHOT
Excluded: .zip
The file name can be anything, but the version will always be preceded by a hyphen; the file extension can also be anything.
This is what I have come up with, but I can't figure out the exclude part.
(.*?)-([0-9.]{1,4}.*)

Append \. to your regex:
(.*?)-([0-9.]{1,4}.*)\.
However you may want to modify it a little:
(.*?)-(\d.*)\.\w+
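As a quick check of the modified pattern, here is a minimal Python sketch. Python and the helper name `split_name_version` are my own choices for illustration and are not part of the question. It uses `re.fullmatch`, so the whole file name has to fit the pattern:

```python
import re

# Group 1: name (lazy), group 2: version starting at the first "-<digit>".
# The trailing \.\w+ consumes the extension but sits outside both groups.
PATTERN = re.compile(r"(.*?)-(\d.*)\.\w+")


def split_name_version(filename):
    """Return (name, version), or None if the file name does not fit."""
    m = PATTERN.fullmatch(filename)
    return m.groups() if m else None


print(split_name_version("my-file-name-0.44.0.6-SOME-SNAPSHOT.zip"))
# ('my-file-name', '0.44.0.6-SOME-SNAPSHOT')
```

One caveat: a name that itself contains a hyphen directly followed by a digit, such as lib-2go-1.0.zip, would be split at that earlier hyphen.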

Use this regular expression to remove file extension:
/(.*)\.[^.]+$/
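A minimal sketch of that replacement in Python, assuming you substitute the whole match with capture group 1 (the function name `strip_extension` is hypothetical):

```python
import re


def strip_extension(filename):
    # [^.]+$ allows no further dots, so only the last ".ext" is dropped.
    return re.sub(r"(.*)\.[^.]+$", r"\1", filename)


print(strip_extension("my-file-name-0.44.0.6-SOME-SNAPSHOT.zip"))
# my-file-name-0.44.0.6-SOME-SNAPSHOT
print(strip_extension("README"))  # no dot, so nothing is replaced
# README
```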

Related

Regular Expression remove specific text in file name

I am using a file transfer tool that allows regular expressions to rename files as they are copied into a new folder, so I am working with regular expressions only and not inside a code base. I have a large set of files with a specific naming convention, with a version number at the end of the file name. My goal is to remove this version number along with the underscore before it.
Here are some examples of the file names:
the_file_name_DS_017_EN_35.pdf
the_file_name_DS_037_SP_35.pdf
different_filename_DS_EN_5.pdf
I am looking to change them to:
the_file_name_DS_017_EN.pdf
the_file_name_DS_037_SP.pdf
different_filename_DS_EN.pdf
I am trying to remove the version number so that the file naming convention on my new server will always be the same. I am not good with regex and this is what I tried so far but to no avail:
Using _[^_]+$ it selects last underscore along with the .pdf extension.
Using \_(.*?)\. it selects the first underscore until the period.
How do I select the last underscore until the period removing that text but keeping the period? Maybe there is a better method? Thanks in advance!
If your regex engine supports positive lookaheads, you can match the following and replace it with nothing:
(_\d+)(?=\.pdf$)
Explanation :
(_\d+) matches an underscore followed by one or more digits
(?=\.pdf$) is a positive lookahead that requires the .pdf extension at the end of the file name, without making it part of the match
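The transfer tool in the question only takes a pattern and a replacement, but the behaviour can be checked in any engine with lookahead support. A small Python sketch, using the file names from the question:

```python
import re

names = [
    "the_file_name_DS_017_EN_35.pdf",
    "the_file_name_DS_037_SP_35.pdf",
    "different_filename_DS_EN_5.pdf",
]

# Only the "_<digits>" that sits directly before ".pdf" at the end is removed;
# "_017" and "_037" are followed by "_", so the lookahead rejects them.
cleaned = [re.sub(r"(_\d+)(?=\.pdf$)", "", n) for n in names]

for name in cleaned:
    print(name)
# the_file_name_DS_017_EN.pdf
# the_file_name_DS_037_SP.pdf
# different_filename_DS_EN.pdf
```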
Try this regular expression:
_[0-9]*\.
and replace it by
.
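This variant avoids lookaheads by consuming the period and putting it back in the replacement. A short Python sketch of the same substitution (the function name `drop_version` is made up for this example):

```python
import re


def drop_version(filename):
    # The match includes the period, so the replacement restores it.
    return re.sub(r"_[0-9]*\.", ".", filename)


print(drop_version("the_file_name_DS_017_EN_35.pdf"))
# the_file_name_DS_017_EN.pdf
print(drop_version("draft_.pdf"))  # "*" also accepts zero digits
# draft.pdf
```

If an underscore directly before the period should be left alone, using + instead of * would require at least one digit.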

Improving a regex

I am looking for alternate methods to get john from the provided example.
My expression works as is but was hoping for some examples of better methods.
Example: john&home
my regexp: [a-z]{3,6}[^&home]
I'm matching any 3-6 characters up to but not including &home.
Every item I run the regexp on is in the same format: 3-6 characters followed by &home.
I have looked at other posts but was hoping for a reply specific to my regexp.
Most regex engines allow you to capture parts of a regex with capture groups. For instance:
^([A-Za-z]{3,6})&home$
The parentheses here form a capture group around the part before &home, which is the part you are interested in. The ^ and $ anchor the match to the entire string. Without them, averylongname&homeofsomeone would be matched as well.
Since you use rubular, I assume you use the Ruby regex engine. In that case you can for instance use:
full = "john&home"
name = full.match(/^([A-Za-z]{3,6})&home$/)[1]
And name will in this case contain john.
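For comparison, here is roughly the same idea in Python, including what happens without the anchors. This is only a sketch; the names `ANCHORED` and `UNANCHORED` are mine:

```python
import re

ANCHORED = re.compile(r"^([A-Za-z]{3,6})&home$")
UNANCHORED = re.compile(r"([A-Za-z]{3,6})&home")

m = ANCHORED.match("john&home")
name = m.group(1) if m else None
print(name)
# john

# With anchors, the longer string is rejected outright.
print(ANCHORED.match("averylongname&homeofsomeone"))
# None

# Without anchors, a 6-letter tail of the long name is picked up instead.
print(UNANCHORED.search("averylongname&homeofsomeone").group(1))
# ngname
```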

Notepad++ Regex to Insert Line between certain lines

I was trying this in Notepad++, but I'm not entirely sure if it's possible there. I have an iCal file where I need to insert some missing "Organizer" fields. For example, I need
DTSTAMP:20140821T160519Z
UID:ExampleUID1
to be this
DTSTAMP:20140821T160519Z
ORGANIZER;CN=Test:mailto:me@example.com
UID:ExampleUID1
The organizer is there in some cases, and is not a static value. How can I do this while keeping the DTSTAMP string intact? Finding DTSTAMP:[A-Za-z0-9_]*$\r\nUID: finds the entries, but I can't find out what to replace with. Using
DTSTAMP:^([A-Za-z0-9_])*$\r\nORGANIZER;CN=Test:mailto:me@example.com\r\nUID:
or any variation thereof injects the actual regex text (^[A-Za-z0-9_]*$.) into the results.
Replace this pattern:
^(DTSTAMP:.+)$
With this replacement string:
\1\r\nORGANIZER;CN=Test:mailto:me@example.com
You have to check the Regular expression mode (obviously), and uncheck the . matches newline option.
For some additional safety, you could also use this pattern:
^(DTSTAMP:.+)$(?!\r\nORGANIZER)
This won't insert an ORGANIZER field if one already exists just below the DTSTAMP field.
Also, if your iCal file is in UNIX newline format, replace every \r\n with \n.
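If you would rather script the change than do it in Notepad++, the lookahead variant carries over to other engines. Below is a hedged Python sketch, assuming the text uses \n line endings; the function name and the second sample event are invented for illustration:

```python
import re

# Placeholder organizer line taken from the question.
ORGANIZER = "ORGANIZER;CN=Test:mailto:me@example.com"


def add_missing_organizer(ical_text):
    # MULTILINE makes ^ and $ work per line; the negative lookahead skips
    # DTSTAMP lines that already have an ORGANIZER line right below them.
    pattern = re.compile(r"^(DTSTAMP:.+)$(?!\nORGANIZER)", re.MULTILINE)
    return pattern.sub(lambda m: m.group(1) + "\n" + ORGANIZER, ical_text)


sample = (
    "DTSTAMP:20140821T160519Z\n"
    "UID:ExampleUID1\n"
    "DTSTAMP:20140821T160519Z\n"
    "ORGANIZER;CN=Other:mailto:other@example.com\n"  # hypothetical entry
    "UID:ExampleUID2\n"
)
print(add_missing_organizer(sample))
```

Only the first event gets the new ORGANIZER line; the second one is left untouched because it already has one.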
You can use this regex:
(DTSTAMP.*)

Repeating groups regex url path, node.js

I am trying to extract express route named parameters with regex.
So, for example:
www.test.com/something/:var/else/:var2
I am trying with this regex:
.*\/?([:]+\w+)+
but I am getting only the last matched group.
Does anyone know how to match both :var and :var2?
The first problem is that .* is greedy, and will therefore bypass all matches until the final one is found. This means that the first :var is bypassed.
However, as you are searching for a variable number of capture groups (with thanks to @MichaelTang), I recommend using two regexes in sequence. First, use
^(?:.*?\/?\:\w+)+$
to detect which lines contain colon-elements...
...and then search that line repeatedly for, simply
\/:(\w+)
This places the text post-colon into capture group one.
Here is how you can match both of them:
'www.test.com/something/:var/else/:var2'.match(/\:(\w+)/g)
[":var", ":var2"]
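The difference between the greedy attempt and a repeated search can also be seen outside node.js. A small Python sketch, using `re.findall` as the "search repeatedly" step:

```python
import re

route = "www.test.com/something/:var/else/:var2"

# The original attempt: the greedy .* swallows everything up to the last
# colon, so only the final parameter ends up in the capture group.
greedy_result = re.findall(r".*\/?([:]+\w+)+", route)
print(greedy_result)
# [':var2']

# Searching repeatedly for "/:name" returns every parameter name.
params = re.findall(r"/:(\w+)", route)
print(params)
# ['var', 'var2']
```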

Regex to parse file paths

I have this text:
Unexpected error creating debug information file
'c:\Users\Path1\Path2\Strategies\Path3\CustomStrategy.PDB' --
'c:\Users\Path1\Path2\Strategies\Path3\CustomStrategy.pdb: The system
cannot find the path specified.
I need to parse out the file paths c:\Users\Path1\Path2\Strategies\Path3 or c:\Users\Path1\Path2\Strategies\Path3\CustomStrategy.PDB, whatever is easier. I tried to use the following Regex
\w:.+[.]\w{3}
But this regex doesn't stop at the first file extension; it continues into the second instance of the path and stops at the second .pdb, putting both file paths in one match.
What do I need to change in order for the regex to parse the two paths as two separate matches? Thanks.
Use a non-greedy quantifier:
\w:.+?[.]\w{3}
Note ? after +.
Also, if your path contains no dots except the last one, you can write it like this:
\w:[^.]+[.]\w{3}
If you are not sure that the extension consists of three letters, you must specify the range:
\w:[^.]+[.]\w{1,3}
And when you are not sure that your path has an extension at all, but it contains no spaces, then:
\w:\S+
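To see the greedy and non-greedy versions side by side, here is a minimal Python sketch. The error message from the question is joined onto a single line, which is an assumption on my part since . does not cross line breaks by default:

```python
import re

message = (
    r"Unexpected error creating debug information file "
    r"'c:\Users\Path1\Path2\Strategies\Path3\CustomStrategy.PDB' -- "
    r"'c:\Users\Path1\Path2\Strategies\Path3\CustomStrategy.pdb: The system "
    r"cannot find the path specified."
)

# Greedy .+ runs to the last ".xxx" it can find: one long match.
greedy = re.findall(r"\w:.+[.]\w{3}", message)
print(len(greedy))
# 1

# Lazy .+? stops at the first ".xxx": one match per path.
lazy = re.findall(r"\w:.+?[.]\w{3}", message)
for path in lazy:
    print(path)
# c:\Users\Path1\Path2\Strategies\Path3\CustomStrategy.PDB
# c:\Users\Path1\Path2\Strategies\Path3\CustomStrategy.pdb
```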
What about this?
\w:\\(?:[^\\\s]+\\)+
\w:\\ matches a word character, a : and a backslash
(?:[^\\\s]+\\)+ matches the directories: one or more characters that are neither backslashes nor whitespace, up to a backslash, and this repeated.
So, this would match both occurrences of c:\Users\Path1\Path2\Strategies\Path3\ (including the trailing backslash). It works as long as the directory names do not contain spaces.
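A quick Python check of the directory-only pattern, again assuming the message is on one line:

```python
import re

message = (
    r"Unexpected error creating debug information file "
    r"'c:\Users\Path1\Path2\Strategies\Path3\CustomStrategy.PDB' -- "
    r"'c:\Users\Path1\Path2\Strategies\Path3\CustomStrategy.pdb: The system "
    r"cannot find the path specified."
)

# The group is non-capturing, so findall returns the full matches.
# The file name is not followed by a backslash, so it is left out.
dirs = re.findall(r"\w:\\(?:[^\\\s]+\\)+", message)
for d in dirs:
    print(d)
# c:\Users\Path1\Path2\Strategies\Path3\
# c:\Users\Path1\Path2\Strategies\Path3\
```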
Actually, here you may as well do without regex at all.
Split the text by ' and use the second part.
As for regex, I would use something more complicated, but allowing to catch other filenames, not just those ending with a 3-letter extension:
'([a-z]:(?:[\\/][^\\/]*)+?)' --
(and use first subpattern from the match)
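Both suggestions, the plain split and the quoted pattern, can be sketched in Python as follows. The message is joined onto one line for the example, and the variable names are my own:

```python
import re

message = (
    r"Unexpected error creating debug information file "
    r"'c:\Users\Path1\Path2\Strategies\Path3\CustomStrategy.PDB' -- "
    r"'c:\Users\Path1\Path2\Strategies\Path3\CustomStrategy.pdb: The system "
    r"cannot find the path specified."
)

# Without a regex: the first path sits between the first pair of quotes.
by_split = message.split("'")[1]
print(by_split)
# c:\Users\Path1\Path2\Strategies\Path3\CustomStrategy.PDB

# With the regex: the path must be quoted and followed by " --".
m = re.search(r"'([a-z]:(?:[\\/][^\\/]*)+?)' --", message)
by_regex = m.group(1) if m else None
print(by_regex)
# c:\Users\Path1\Path2\Strategies\Path3\CustomStrategy.PDB
```

The regex version depends on the ' -- separator being present, so it only finds the first path in this message.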