Regex to find directory in text - regex

How do I find a path and file name in a block of text?
Before you mark this as a duplicate: I know questions about file paths exist, but none of them cover this case:
Regex for parsing directory and filename (does not match within a paragraph)
Regex that matches directory path excluding filepath (matches only the file name; the answer doesn't work within paragraphs and doesn't handle . or spaces)
java regular expression to match file path (doesn't handle . or spaces)
Regex for extracting filename from path (doesn't handle paths inside a paragraph)
For example
In file included from /some/directoy/3.33A.37.2/something else/dogs.txt,
from /some/directoy/something else/dogs.txt,
from /some/directoyr/3.33A.37.2/something else/dogs.txt,
from /var/log/xyz/10032008.log,
from /var/log/xyz/test.c:29:
Solution:
please the file something.h has to be alone without others include, it has to be present in release letter,
in order to be included in /var/log/xyz/test.c and /var/log/xyz/test.h automatically
Other Note:
The file something.c must contain the somethinge.h and not the ecpfmbsd.h because it doesn't contain C operative code.. everything good..
The following are the ideal matches:
/some/directoy/3.33A.37.2/something else/dogs.txt
/some/directoy/something else/dogs.txt
/some/directoyr/3.33A.37.2/something else/dogs.txt
/var/log/xyz/10032008.log
/var/log/xyz/test.c:29 (this is a tricky one; OK without it)
/var/log/xyz/test.c
/var/log/xyz/test.h
Going further: once I have an answer, how can I change it to work with \ instead of / as the directory separator?

You can use a regex like this:
\/.*\.[\w:]+
By the way, if you want to allow backslashes as well as forward slashes in the path, you can use:
[\\\/].*\.[\w:]+
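A short Python sketch can be used to check this pattern. Python's `re` module is an assumption here (the question names no language), and the Windows-style sample line is made up for illustration. Note that the greedy `.*` runs to the last dot on the line, and that `[\w:]+` keeps a trailing `:29`.

```python
import re

# The answer's pattern, accepting either slash style as the first character.
PATH_RE = re.compile(r"[\\/].*\.[\w:]+")

def find_paths(text):
    """Return every substring of text that the pattern matches."""
    return PATH_RE.findall(text)

unix_line = "from /var/log/xyz/10032008.log,"
windows_line = r"see C:\logs\xyz\test.c:29 for details"  # hypothetical sample

print(find_paths(unix_line))     # ['/var/log/xyz/10032008.log']
print(find_paths(windows_line))  # the match starts at the first backslash
```

The drive letter is not part of the Windows match, because the pattern begins at the first slash or backslash.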

This looks to be working:
\/[^,:]*\.\w+
You can fine-tune this if you know the exact extensions, their lengths and what characters they have. As for me, \w+ would do to match extensions.
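As a rough check of this pattern, here is a minimal Python sketch run against a few lines from the question. The use of Python's `re` module is an assumption, not something the answer specifies. It also shows one limitation: two paths with no comma or colon between them come back as a single match.

```python
import re

# The answer's pattern: a slash, then anything except ',' or ':', then .ext
PATH_RE = re.compile(r"/[^,:]*\.\w+")

sample = (
    "In file included from /some/directoy/3.33A.37.2/something else/dogs.txt,\n"
    " from /var/log/xyz/10032008.log,\n"
    " from /var/log/xyz/test.c:29:\n"
)

matches = PATH_RE.findall(sample)
for m in matches:
    print(m)

# No ',' or ':' separates these two paths, so they are merged into one match.
merged = PATH_RE.findall(
    "included in /var/log/xyz/test.c and /var/log/xyz/test.h automatically"
)
print(merged)
```

If the merged case matters, the character class would need to exclude more than `,` and `:`, at the cost of no longer allowing spaces in directory names.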

Related

NotePad++ regex match and replace and also keep match to convert to different markdown image reference link

I have the following link syntax that needs to be changed:
![[afoldernamenolongerneededandwillbedeprecated/somemarkdownfilename_image1.png]]
I tried (successfully) with this regex to match:
![[].*[\/].*_image[0-9].png[]]]
Although I have a hunch it may not be what I should use. As a novice, I suspect it is only good for matching and not for replacing. All images are PNGs, by the way. All file names contain _image, prefixed by the Markdown file name.
Desired end format:
![image](imagenamefromabovestring1,2,orhowevermanythereare.png)
The ![]() form is a known Markdown syntax for referencing images. Images will be populated in subdirectories that the program/app will find.
It goes without saying I want to run find and replace recursively on some 4000 files containing image references.
I put up the unfinished substitution example here:
https://regex101.com/r/Bl8HJC/1
To clarify what I need: the folder name that used to be there should be gone; I don't need it anymore. After the slash comes the name of the image, which always follows the same pattern: the name of the Markdown file currently being processed by Notepad++ (it can be a file named Ab, Aba, Abracadabra, etc.) serves as the prefix, followed by an underscore and 'image' plus a number that depends on how many images are attached to that Markdown file. The file names in the attachment folder will look like this:
AB_image3.png
Abracadabra_image2.png
.
.
.
Zodiac_image45.png
I am looking for the right syntax as I couldn't figure it out with the dollar sign.
Cheers,
Otto
I have modified your example to get it working. You need to escape the square brackets so they are interpreted literally, since they have special meaning in regex, and use a capture group to store the matching value in $1 so you can use it in the replacement.
Regular expression:
!\[\[.*\/(.*_image[0-9]{1,2}\.png)\]\]
Substitution format:
![image]\($1\)
Edit: Question was revised to state that the folder name was unwanted in the final output, so matches are delimited after the final / character in the file path.
Edit 2: Support for file numbers 1 through 99.
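For anyone who prefers to script the same substitution instead of using Notepad++, the following is a minimal Python sketch. It reuses the answer's pattern unchanged; the function name `convert_links` is hypothetical, and Python writes the backreference as `\1` where Notepad++ uses `$1`.

```python
import re

# Same pattern as the answer; the capture group holds the image file name.
LINK_RE = re.compile(r"!\[\[.*/(.*_image[0-9]{1,2}\.png)\]\]")

def convert_links(text):
    """Rewrite ![[folder/name_imageN.png]] as ![image](name_imageN.png)."""
    return LINK_RE.sub(r"![image](\1)", text)

old = "![[afoldernamenolongerneededandwillbedeprecated/somemarkdownfilename_image1.png]]"
print(convert_links(old))  # ![image](somemarkdownfilename_image1.png)
```

Because `.*` is greedy, this sketch assumes at most one image link per line, as in the question's example.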

Path Validation - My RegEx is matching leading spaces in directory names and I can't fix it

I'm back again with more RegEx shenanigans.
I had what I thought was a perfect Windows path validation expression.
Here it is at Regex101: https://regex101.com/r/BertHu/6
^(?:(?:[a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\|\\?[^\\\/:*?"<>|\r\n]+\\?)(?:[^\\\/:*?"<>|\r\n]+\\)*[^\\\/:*?"<>|\r\n]*$
Breakdown:
# (?:(?:[a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\| # Drive
# \\?[^\\\/:*?"<>|\r\n]+\\?) # Relative path
# (?:[^\\\/:*?"<>|\r\n]+\\)* # Folder
# [^\\\/:*?"<>|\r\n]* # File
The issue I'm having now is that the expression matches paths whose directory names have leading spaces.
Example: C:\ Leading Space\ Shouldnt Match is matching.
I tried adding [^\s] to the folder portion of the expression:
(?:[^\s][^\\\/:*?"<>|\r\n]+\\)*
But that only invalidates a leading space in the first path segment:
C:\ LeadingSpace\ShouldntMatch Doesn't match (Good)
C:\LeadingSpace\ ShouldntMatch Matches incorrectly (Bad)
I think the problem lies here:
If anyone could help or point me in the right direction that would be great.
Sorry for all the RegEx questions!
It depends on what the exact rules are. Taking your regex101 script as a basis, I would say:
The file, folder, and relative-path parts are more or less the same (if you ignore the non-capturing group and the backslashes):
\\?[^\\\/:*?"<>|\r\n]+\\?
(?:[^\\\/:*?"<>|\r\n]+\\)*
[^\\\/:*?"<>|\r\n]*
So there are three potential places where a name could start with a leading space.
You could add a [^\s] in front of all of them, like this:
^(?:(?:[a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\|\\?([^\s][^\\\/:*?"<>|\r\n])+\\?)(?:[^\s][^\\\/:*?"<>|\r\n]+\\)*([^\s][^\\\/:*?"<>|\r\n])*$
I saved the modified regex101 https://regex101.com/r/Pd3lcR/1
It should now work, at least for my limited test cases and what I know about the restrictions.
By the way, I don't know your use case, but this regex is pretty long for simple matching and file name capture; maybe there is a more readable way (for people who don't read regex).
Update:
To fix the bug this introduced, the relative-path alternative must not match a share path, so a double backslash is prevented with (?!\\):
^(?:(?:[a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\|\\?((?!\\)[^\\\/:*?"<>|\r\n])+\\?)(?:[^\s][^\\\/:*?"<>|\r\n]+\\)*([^\s][^\\\/:*?"<>|\r\n])*$
Here is the updated regex: https://regex101.com/r/RMVkTC/3
Update (Version 2):
I rewrote the regex the way I would build it. It is not perfectly optimized (short), but it is easier to test and debug this way.
The regex consists of exactly three parts, joined with |:
Drive + path + folder/file: (^[a-z]:\\([^\s][^\\\/:*?"<>|\r\n]+\\)*[^\s][^\\\/:*?"<>|\r\n]+$)
relativepath + folder/file: (^(\.?\.?\\[^\\])*([^\\\/:*?"<>|\r\n]+\\?)*$)
Share + folder/file: (^\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+\\([^\s][^\\\/:*?"<>|\r\n]*\\?)*$)
This way, if you have to change something for one edge case, the change is more contained and easier to make.
Here is the updated regex: https://regex101.com/r/Qxj3Ni/1
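To see how the "Drive + path + folder/file" part behaves on the paths from the question, here is a small Python sketch. Python and the `re.IGNORECASE` flag are assumptions (the flag is needed because `[a-z]` lists only lowercase drive letters), and `is_valid_drive_path` is a hypothetical helper name.

```python
import re

# Drive part of the answer's Version 2 regex, matched case-insensitively.
DRIVE_PATH_RE = re.compile(
    r'^[a-z]:\\([^\s][^\\/:*?"<>|\r\n]+\\)*[^\s][^\\/:*?"<>|\r\n]+$',
    re.IGNORECASE,
)

def is_valid_drive_path(path):
    """True when the whole string matches the drive-path pattern."""
    return DRIVE_PATH_RE.match(path) is not None

print(is_valid_drive_path(r"C:\Folder\file.txt"))                 # True
print(is_valid_drive_path(r"C:\ Leading Space\ Shouldnt Match"))  # False
print(is_valid_drive_path(r"C:\LeadingSpace\ ShouldntMatch"))     # False
print(is_valid_drive_path(r"C:\a\file.txt"))                      # False
```

The last case shows a side effect of this form: each segment is `[^\s]` followed by `+`, so it needs at least two characters, and single-character folder or file names are rejected.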

regex match file name in a path

I am trying to match the file name in a path. For example:
/path/test/index.html
I would want to match index.html
However I also can have a path with no / so the path could be
index.html
and would want to match index.html
I have the following to match the first case and can grab it with a group.
.*/([^/]+)
But how can I also match a file name when the only thing in the path is the file name?
There is probably no need to have anything but [^/]+$ unless you want to match the entire line and your engine's matcher requires it.
There is usually more than one way to write a regex, and edge cases often end up complicating a simple task.
If you want to match every valid file name in a string, then perhaps:
[A-Za-z0-9_-]+\.?[A-Za-z0-9]*
or, since you can have a file named ConfigurationFile.txt.bac, for example:
[A-Za-z0-9_.-]+\.?[A-Za-z0-9]*
But that is not what you want, because each directory name is also a valid file name. So
this will match only valid file names with an extension:
[A-Za-z0-9_-]+\.[A-Za-z0-9]+
or
[A-Za-z0-9_.-]+\.[A-Za-z0-9]+
Clearly there are many options. The accepted answer (AA) only matches the string at the end of a path. It does not match a file name without a path. The AA may well do for the OP, but I find it helpful to be able to match any file name within a string.
There are always edge cases. For example, in my case this regex still matches version numbers. I have a workaround for my case, but that is getting too specific.
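The version-number edge case can be reproduced with a short Python sketch. Python is an assumption, the second path is made up for illustration, and the hyphen is placed at the end of the character class so that it is a literal hyphen rather than a range operator.

```python
import re

# Name-with-extension pattern; '-' sits last in the class so it is literal.
FILE_RE = re.compile(r"[A-Za-z0-9_.-]+\.[A-Za-z0-9]+")

print(FILE_RE.findall("/path/test/index.html"))  # ['index.html']
print(FILE_RE.findall("/opt/app-1.2/run.sh"))    # hypothetical path
```

In the second call the versioned directory name `app-1.2` is returned alongside `run.sh`, because a version number looks like a name with an extension.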
Make the .*/ into an optional group:
(?:.*/)?([^/]+)
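A minimal Python sketch of both suggestions, assuming Python's `re` module and two hypothetical helper names. `fullmatch` is used for the optional-group pattern so that the file name must run to the end of the string.

```python
import re

OPTIONAL_DIR_RE = re.compile(r"(?:.*/)?([^/]+)")  # optional leading directories
TAIL_RE = re.compile(r"[^/]+$")                   # everything after the last '/'

def name_via_group(path):
    """File name from group 1 of the optional-group pattern, or None."""
    m = OPTIONAL_DIR_RE.fullmatch(path)
    return m.group(1) if m else None

def name_via_tail(path):
    """File name as the trailing run of non-slash characters, or None."""
    m = TAIL_RE.search(path)
    return m.group(0) if m else None

for p in ("/path/test/index.html", "index.html"):
    print(name_via_group(p), name_via_tail(p))
```

Both helpers return `None` for a path that ends in a slash, since there is no file name after the last `/`.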

Regex expression to match a string but exclude something at the same time

I want to ask this as concisely as possible, so please forgive me if I'm leaving something out. I want the expression to match all cases except where an exact filename string is present.
The backup software I'm using uses regular expressions, and I want to set up an exclusion that skips all files with a particular extension, except for certain files I need to back up, which must not match.
For this example, say the files I want to exclude are *.FLV:
(?i).*\.flv
I want to include in my backups three files: abc123.flv, ghk432.flv, and fdw917.flv
This is where I'm having trouble, even just including one file from the three to be included to backup
(?i).*\.flv^(?!(abc123\.flv))&
The expression is being added to an Exclusion List for code42 CrashPlan backup, their support unfortunately cannot assist with complex RegEx expressions.
The closest thing I can supply as an example is their Example 3: Using An Exclude To Include:
.*/Documents/((?!(.*\.(doc|rtf)|.*/)$).)*$
http://support.code42.com/Administrator/3.6_And_4.0/Configuring/Using_Include_And_Exclude_Filters
However, it excludes all files within directories named "Documents" except those with doc or rtf file extensions. I'm trying to create an expression that works on file extensions regardless of folder location.
Logically, it seems like I need to write this as some kind of if-then-else statement, but regex is not my forte.
Use an anchored negative look ahead with an alternation for the files you want to keep:
^(?i)(?!.*(abc123|ghk432|fdw917)\.flv).*\.flv
The negative lookahead asserts that the following input does not match its regex, and the pipe character means "or".
Try to put the negative lookahead at the position of the filename in the path:
^([^/]*/)*(?!(abc123|ghk432|fdw917)\.flv$)[^/]*\.flv$
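The difference between the two patterns shows up on a file such as xabc123.flv. Below is a hedged Python sketch, not CrashPlan's actual matching code: Python's `re` stands in for the backup tool's engine, `(?i)` is written at the very start because recent Python versions reject it elsewhere, `re.IGNORECASE` is added to the second pattern to mirror the question, and the paths are made up.

```python
import re

# First answer: lookahead anchored at the start of the whole path.
LOOSE_RE = re.compile(r"(?i)^(?!.*(abc123|ghk432|fdw917)\.flv).*\.flv")

# Second answer: lookahead placed where the file name begins.
STRICT_RE = re.compile(
    r"^([^/]*/)*(?!(abc123|ghk432|fdw917)\.flv$)[^/]*\.flv$",
    re.IGNORECASE,
)

def is_excluded(pattern, path):
    """True when the exclusion pattern matches, i.e. the file is skipped."""
    return pattern.match(path) is not None

for path in ("/videos/movie.flv", "/videos/abc123.flv", "/videos/xabc123.flv"):
    print(path, is_excluded(LOOSE_RE, path), is_excluded(STRICT_RE, path))
```

Both patterns skip movie.flv and keep abc123.flv. Only the second also skips xabc123.flv, because its lookahead compares the complete file name rather than any substring of the path.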

regex to get portion of file name after last dot without file extension

I have a bunch of files, some examples are as follows:
/foo1/foo2/bar1.bar2.bar3.answer.jar
/foo1/bar1.bar2.answer.jar
/foo1/foo2/answer.jar
and for all of the above I would like a regex that matches 'answer'. In other words, I'm looking to get an alias for the file that is the portion of the file name after the last dot (or the file name itself if there are no dots) with the file extension (.jar can be guaranteed here to make it simpler) stripped off.
I know I can do this with a simpler regex that splits the value on dots and takes the second-to-last part, but in this case I'm building a back-end component that will ideally take a regex defined in the configuration for the given file type and spit out the alias, which might be different for other file types.
Yep, I'm over-engineering. :)
Any ideas?
Following regex should work for you:
[^/.]+(?=\.jar$)
If you are using JavaScript or a similar flavor where / is the regex delimiter, you need to escape / like this:
[^\/.]+(?=\.jar$)
You can use the following regexp (assuming that the answer part doesn't contain . or /):
[/\.]([^/\.]+)\.jar
The first capturing group is the part you want.
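Both suggestions can be compared with a short Python sketch. Python is an assumption, and the helper names `alias_lookahead` and `alias_group` are hypothetical.

```python
import re

LOOKAHEAD_RE = re.compile(r"[^/.]+(?=\.jar$)")  # first answer
GROUP_RE = re.compile(r"[/.]([^/.]+)\.jar")     # second answer

def alias_lookahead(path):
    """Alias as the whole match of the lookahead pattern, or None."""
    m = LOOKAHEAD_RE.search(path)
    return m.group(0) if m else None

def alias_group(path):
    """Alias as group 1 of the capture-group pattern, or None."""
    m = GROUP_RE.search(path)
    return m.group(1) if m else None

paths = [
    "/foo1/foo2/bar1.bar2.bar3.answer.jar",
    "/foo1/bar1.bar2.answer.jar",
    "/foo1/foo2/answer.jar",
]
for p in paths:
    print(alias_lookahead(p), alias_group(p))  # answer answer
```

One difference: the capture-group pattern requires a `/` or `.` before the alias, so a bare `answer.jar` with no directory matches only the lookahead version.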