Regex: how do I capture the file extension? - regex

How do I determine the file extension of a file name string?
Let's say I have
I'm.a.file.name.tXt
The regex should return tXt

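Something like \.[^.]*$ should do it. Note that the match includes the leading dot; use \.([^.]*)$ and take the first group if you want just tXt.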

You probably don't need regex - most languages will have the equivalent to this:
ListLast(Filename,'.')
(If you do need regex for some reason, Scharron's answer is correct.)

What language is this in? It's quite possible you don't want to actually use a regex - assuming the name is a String you'll probably just want to do something like split it over periods and then choose the last segment. Okay, that's sort of a regex answer, but not a proper one.
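A minimal Python sketch of the "split on periods and take the last segment" idea, assuming the name is a plain string. The helper name `extension_by_split` is made up for illustration.

```python
def extension_by_split(filename: str) -> str:
    """Return the text after the last period, or '' if there is no period."""
    # rpartition splits on the LAST occurrence of the separator.
    _, dot, ext = filename.rpartition(".")
    return ext if dot else ""


print(extension_by_split("I'm.a.file.name.tXt"))  # tXt
```

Returning an empty string for names without a period is a design choice of this sketch; a plain split would hand back the whole name instead.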

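/^(.*)\.(.*)$/
The first group is greedy, so it runs up to the last dot. Your result will be in the second group. (A '?' after the first '*' would make that group lazy, and it would stop at the first dot instead.)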
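For reference, a small Python sketch comparing a lazy and a greedy first group on the sample name. A `?` placed after `*` makes the quantifier lazy (non-greedy), which changes which dot the pattern splits on.

```python
import re

name = "I'm.a.file.name.tXt"

# Lazy first group: stops at the FIRST dot.
lazy = re.match(r"^(.*?)\.(.*)$", name)
# Greedy first group: runs to the LAST dot.
greedy = re.match(r"^(.*)\.(.*)$", name)

print(lazy.group(2))    # a.file.name.tXt
print(greedy.group(2))  # tXt
```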

Related

Regex for file name in a directory

I have two files in a directory: FileAbc_1.xml and FileAbc.xml. I want to write a regex that selects only FileAbc_1.xml.
My regex is: FileAbc.*.xml
It picks up both file names, but I only want FileAbc_1.xml. Any help would be appreciated.
This will work for you
FileAbc_[0-9]+\.xml
That should just be: FileAbc_\d\.xml
(assuming there's never more than one digit after the underscore)
For anything that starts with FileAbc, has at least one more character, and ends with .xml, you can use FileAbc.+\.xml.
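A short Python sketch of the suggestions above, assuming the two file names from the question are already in a list. It uses `\d+` so that more than one digit after the underscore is accepted.

```python
import re

names = ["FileAbc_1.xml", "FileAbc.xml"]

# Require an underscore and at least one digit before a literal ".xml".
pattern = re.compile(r"FileAbc_\d+\.xml")

# fullmatch() only succeeds when the whole name fits the pattern.
selected = [n for n in names if pattern.fullmatch(n)]
print(selected)  # ['FileAbc_1.xml']
```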

Exclude a certain String from variable in regex

Hi, I have a stylesheet where I use xsl:analyze-string with the following regex:
(&journal_abbrevs;)[\s ]*([0-9]{{4}})[,][\s ][S]?[\.]?[\s ]?([0-9]{{1,4}})([\s ][(][0-9]{{1,4}}[)])?
You don't need to look at the whole thing :)
&journal_abbrevs; looks like this:
"example-String1|example-String2|example-String3|..."
What I need to do now is exclude one of the strings in &journal_abbrevs; from this regex. E.g. I don't want example-String1 to be matched.
Any ideas on how to do that?
It seems XSLT regex does not support look-around, so I don't think you'll find a solution that avoids writing out all the strings from journal_abbrevs in your regex.
To minimize the amount of writing out, you could split journal_abbrevs into say journal_abbrevs1, journal_abbrevs2 and journal_abbrevs3 (or how many you decide to use) and only write out whichever one that contains the string you wish to exclude. If journal_abbrevs1 contains the string, you'd then end up with something like:
((&journal_abbrevs2;)|(&journal_abbrevs3;)|example-String2|example-String3|...)...
If it supported look-around, you could've used a very simple:
(?!example-String1)(&journal_abbrevs;)...
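Both ideas can be sketched in Python, whose `re` module does support look-around. This is only an illustration and does not run in XSLT: the list below is a hypothetical stand-in for the contents of &journal_abbrevs;, and the tail of the pattern is cut down to a four-digit year for brevity.

```python
import re

# Hypothetical stand-in for the &journal_abbrevs; entity.
journal_abbrevs = ["example-String1", "example-String2", "example-String3"]
excluded = "example-String1"

alternation = "|".join(re.escape(s) for s in journal_abbrevs)

# Option 1: negative lookahead (needs look-around support).
with_lookahead = re.compile(
    rf"(?!{re.escape(excluded)})({alternation})\s*([0-9]{{4}})"
)

# Option 2: leave the string out of the alternation (no look-around needed).
kept = "|".join(re.escape(s) for s in journal_abbrevs if s != excluded)
without_lookahead = re.compile(rf"({kept})\s*([0-9]{{4}})")

for text in ["example-String1 2001", "example-String2 2002"]:
    print(text, bool(with_lookahead.search(text)), bool(without_lookahead.search(text)))
```

Option 2 is the same "write out the remaining strings" workaround described above, just generated from a list instead of typed by hand.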

Regex to parse the first letter of each line?

This is the list:
Work
Work
Fire
Global
And I want to extract the string WWFG from it. [(?).*\n] just gives me Global. What should I be using instead?
For context, I'm using Rainmeter's webparser plugin.
Try this: (?simU)^(.)
RainRegExp seems to lack the replacement feature, so it is impossible to get all the captures concatenated into one string.
You need to use the multiline flag with an anchor. I would use: /^(.)/gm (syntax differs from language to language)
See example here: http://regex101.com/r/uC1gV5
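As one concrete instance of the "multiline flag with an anchor" approach, here is a minimal Python sketch. Python is an assumption here, since the question is about Rainmeter; the list is taken from the question.

```python
import re

text = "Work\nWork\nFire\nGlobal"

# re.MULTILINE makes ^ match at the start of every line, not just the string.
initials = "".join(re.findall(r"^(.)", text, flags=re.MULTILINE))
print(initials)  # WWFG
```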
The easiest way depends on what language you're using, but you want to replace
(.).*\n
with
$1
(?siU)(?(?=.)(.))(?(?=.*\n).*\n(.))(?(?=.*\n).*\n(.))(?(?=.*\n).*\n(.))
Answered by #moshi here. And it works perfectly with Rainmeter.
I have no idea what language you're using, but here's some Python!
>>> import re
>>> ''.join(re.findall("(.).*", "Work\nWork\nFire\nGlobal"))
'WWFG'
This will capture the first character from each line
([a-z])[^\n]+\n*
then replace with \1 or $1
Depending on what is in the text you might need to change [a-z] to something more all-encompassing
If that is Lua, try s:gsub("(.).-\n","%1").

REGEX - Allow Numbers and . - /

I need a regular expression that allows only digits and the characters . - /
It should allow something like this:
011.235673.98923/0001-12
The pattern you're looking for, that matches only those strings with numbers, ., -, and /:
^[0-9\.\-\/]+$
If you have a specific language you're looking to implement this in, I may be able to help you.
You're looking for ^[\d./-]+$
To enforce the order of the parts (the leading digits and the final -digits are required; the . and / groups may repeat or be absent):
^(\d+)(\.(\d+))*(\/(\d+))*-(\d+)$
edit: Forgot to add the / sry
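A quick Python check of the order-enforcing pattern against the sample value from the question. The second input is a made-up counterexample that uses only allowed characters but in the wrong order.

```python
import re

# Digits, optional ".digits" groups, optional "/digits" groups, then "-digits".
ordered = re.compile(r"^(\d+)(\.(\d+))*(/(\d+))*-(\d+)$")

print(bool(ordered.match("011.235673.98923/0001-12")))  # True
print(bool(ordered.match("011-/.12")))                  # False
```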
^[\d./-]*$
does this. What regex flavor are you using? Perhaps it needs to be adjusted for it.
How about something like
(\d|\.|\-|\/)*
Does it matter how many - and . and / you get? Does the order matter?
You can use this pattern to detect characters that are not allowed:
[^0-9./-]
If it finds a match, the input contains something other than digits, ., - or /.
This should do the work
^[\d\.\-\/]+$
If the sequence to be matched isn't at the start of the string, you can drop the ^. Similarly, $ is only needed when the sequence must run to the end of the string.
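The effect of the anchors can be sketched in Python. The helper name `is_valid` and the second and third inputs are made up for illustration; `fullmatch` is used in place of writing ^ and $ explicitly.

```python
import re

allowed = re.compile(r"[\d./-]+")


def is_valid(value: str) -> bool:
    # fullmatch() requires the whole string to fit, much like ^...$.
    return allowed.fullmatch(value) is not None


print(is_valid("011.235673.98923/0001-12"))  # True
print(is_valid("011.235a"))                  # False

# Without anchors, search() finds an allowed run anywhere in the string.
print(allowed.search("id: 12-3/4 ok").group())  # 12-3/4
```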

Regex: Get Filename Without Extension in One Shot?

I want to get just the filename using regex, so I've been trying simple things like
([^\.]*)
which of course works only if the filename has one extension. But if it is adfadsfads.blah.txt I just want adfadsfads.blah. How can I do this with regex?
In regards to David's question, 'why would you use regex' for this, the answer is, 'for fun.' In fact, the code I'm using is simple
length_of_ext = File.extname(filename).length
filename = filename[0,(filename.length-length_of_ext)]
but I like to learn regex whenever possible because it always comes up at Geek cocktail parties.
Try this:
(.+?)(\.[^.]*$|$)
This will:
Capture filenames that start with a dot (e.g. .logs is a file named .logs, not a file extension), which is common in Unix.
Get everything but the last dot and what follows it: foo.bar.jpeg gets you foo.bar.
Handle files with no dot: secret-letter gets you secret-letter.
Note: as commenter j_random_hacker suggested, this performs as advertised, but you might want to precede things with an anchor for readability purposes.
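The three cases listed above can be checked with a short Python sketch. It uses the anchored form of the pattern, as the note suggests; the file names come from this thread.

```python
import re

# Lazy name, then either a final ".ext" or the end of the string.
pattern = re.compile(r"^(.+?)(\.[^.]*$|$)")

for name in [".logs", "foo.bar.jpeg", "secret-letter", "adfadsfads.blah.txt"]:
    # group(1) is the file name without its last extension.
    print(name, "->", pattern.match(name).group(1))
```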
Everything followed by a dot followed by one or more characters that's not a dot, followed by the end-of-string:
(.+?)\.[^\.]+$
The everything-before-the-last-dot is grouped for easy retrieval.
If you aren't 100% sure every file will have an extension, try:
(.+?)(\.[^\.]+$|$)
How about two groups: a capturing one for the filename and a non-capturing one for the extension?
E.g.
(.+?)(?:\.[^\.]*$|$)
^(.*)\\(.*)(\..*)$
Group 1 gets the path without the last \
Group 2 gets the file name without the extension
Group 3 gets the extension with a .
Examples:
c:\1\2\3\Books.accdb
(c:\1\2\3)(Books)(.accdb)
With multiple . in the file name, only the part after the last . is treated as the extension
Does support . in file path
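A minimal Python sketch that runs this pattern on the example path. The variable names `directory`, `stem` and `ext` are only labels chosen for illustration.

```python
import re

pattern = re.compile(r"^(.*)\\(.*)(\..*)$")

# Raw string, so the backslashes in the Windows path are kept literally.
directory, stem, ext = pattern.match(r"c:\1\2\3\Books.accdb").groups()

print(directory)  # c:\1\2\3
print(stem)       # Books
print(ext)        # .accdb
```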
I realize this question is a bit outdated; however, I had some trouble finding a good source and wound up making the regex myself. To save time for whoever finds this:
If you're looking for a ~standalone~ regex
This will match the extension without the dot
\w+(?![\.\w])
This will always match the file name if it has an extension
[\w\. ]+(?=[\.])
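Here is how the two standalone patterns behave in Python on a sample name (`file.name.txt` is a made-up input). Both rely on look-ahead, so they need an engine that supports it.

```python
import re

name = "file.name.txt"

# Word characters not followed by a dot or another word character.
ext = re.search(r"\w+(?![\.\w])", name).group()
# Longest run of word characters, dots and spaces that is followed by a dot.
stem = re.search(r"[\w\. ]+(?=[\.])", name).group()

print(ext)   # txt
print(stem)  # file.name
```

Note that the first pattern returns the whole name when there is no dot at all, so it is only meaningful for names that do have an extension.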
OK, I am not sure why I would use a regular expression for this. If I know, for example, that the string is a full file path, then I would use another API to get the file name. Regular expressions are very powerful but at the same time quite complex (you have just proved that by asking how to create such a simple regex). As somebody said: you have a problem, and you decide to solve it using regular expressions. Now you have two problems.
Think again. If you are on .NET platform for example, then take a look at System.IO.Path class.
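The same "use a path API" advice can be illustrated in Python with the standard `pathlib` module. This is an analogous example, not the .NET class mentioned above, and the path is made up from names used in this thread.

```python
from pathlib import PureWindowsPath

# PureWindowsPath parses Windows-style paths on any operating system.
path = PureWindowsPath(r"C:\Log\test\bin\adfadsfads.blah.txt")

print(path.name)    # adfadsfads.blah.txt
print(path.stem)    # adfadsfads.blah
print(path.suffix)  # .txt
```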
I used this pattern for simple search:
^\s*[^\.\W]+$
for this text:
file.ext
fileext
file.ext.ext
file.ext
fileext
It finds fileext in the second and last lines.
I applied it in a text tree view of a folder (with spaces as indents).
Just the name of the file, without path and suffix.
^.*[\\\/](.+?)\.[^\.]+$
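A small Python sketch of this kind of pattern, written with a character class that accepts either path separator. Both paths are hypothetical inputs.

```python
import re

# [\\/] accepts either separator; the lazy group stops before the final ".ext".
pattern = re.compile(r"^.*[\\/](.+?)\.[^.]+$")

print(pattern.match(r"C:\Log\test\bin\report.final.pdf").group(1))  # report.final
print(pattern.match("/var/log/app.log").group(1))                    # app
```

Inside a character class `|` is a literal character rather than "or", so listing just the two separators is enough.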
Try
(?<=[\\\w\d-:]*\\)([\w\d-:]*)(?=\.[\.\w\d-:]*)
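Captures just the filename of any kind within an entire file path. It purposefully excludes the file path and the file extension. Note that the look-behind is variable-length, so this needs an engine that allows that, such as .NET.
E.g.: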
C:\Log\test\bin\fee105d1-5008-410c-be39-883e5e40a33d.pdf
Doesn't capture (C:\Log\test\bin)
Captures (fee105d1-5008-410c-be39-883e5e40a33d)
Doesn't capture (.pdf)
This RegExp works for me:
(.+(?=\..+$))|(.+[^\.])
Results (input -> first match):
test.txt -> test
test 234!.something123 -> test 234!
.test -> .test
.test.txt -> .test
test.test2.txt -> test.test2
. -> (no match)