Regular expression selection of last two folder path names - regex

I'm working in Perl, but I am having trouble cutting out the leading folder names so that only the last two folder names remain. Given these paths:
Trainings and Events
General Office\Archive
General Office\Office Opperations\Contacts
General Office\Office Opperations\Office Procedures\How Tos
public_html\Accordion\.svn\tmp\text-base
I would like to strip the leading folders so that each path keeps only its last two folder names, like this:
Trainings and Events
General Office\Archive
Office Opperations\Contacts
Office Procedures\How Tos
tmp\text-base
I've read another Stack Overflow question that showed how to pick out the last two folder names, but I cannot reverse the expression. (I will link it if I can find it.)
I am currently using Sublime Text 3 to test my expressions.
Thank you

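The pattern you want is
s/.+\\(.+\\.+)$/$1/g
It captures the last two folder names and replaces the whole string with them. Your sample paths use backslashes, so the separator is written as \\ here. A path with fewer than two backslashes does not match and is left unchanged.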
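A quick way to check that substitution against the sample paths is a short Python sketch. Python's re.sub stands in for Perl's s/// here, the helper name last_two is made up for illustration, and the pattern assumes backslash separators as in the question:

```python
import re

# Greedy prefix, one backslash, then capture "folder\folder" at the end.
# Assumption: "\" is the path separator, as in the sample data.
LAST_TWO = re.compile(r'.+\\(.+\\.+)$')

def last_two(path):
    """Reduce a path to its last two folder names (hypothetical helper)."""
    # Paths with fewer than two separators do not match and are returned as-is.
    return LAST_TWO.sub(r'\1', path)

paths = [
    r'Trainings and Events',
    r'General Office\Archive',
    r'General Office\Office Opperations\Contacts',
    r'General Office\Office Opperations\Office Procedures\How Tos',
    r'public_html\Accordion\.svn\tmp\text-base',
]

for p in paths:
    print(last_two(p))
```

Because the leading .+ is greedy, the regex engine backtracks from the right, so the capture group always ends up holding the final two segments.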

(?:[^\\]*\\|^)[^\\]*$
You actually want "up to two", because otherwise you won't catch "Trainings and Events". To solve this, I match the two cases separately:
Either a folder name (any number of non-backslashes) followed by a backslash, or the start of the string
Another folder name
End of string
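The three parts listed above can be tried out with a small Python sketch. This one extracts the match instead of substituting; the helper name tail is hypothetical, and the sample paths are the ones from the question:

```python
import re

# (folder name + backslash | start of string), final folder name, end of string
UP_TO_TWO = re.compile(r'(?:[^\\]*\\|^)[^\\]*$')

def tail(path):
    """Return the last one or two folder names of a backslash-separated path."""
    match = UP_TO_TWO.search(path)
    # Fall back to the input if nothing matches (defensive; not expected here).
    return match.group(0) if match else path

for p in [r'Trainings and Events',
          r'General Office\Archive',
          r'public_html\Accordion\.svn\tmp\text-base']:
    print(tail(p))
```

The search scans left to right and the first position where the whole pattern can reach the end of the string is the start of the second-to-last folder, or the start of the string when there is only one folder.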

Related

What is the name for the type of "loose" search/filter that allows for other characters in the middle?

I'm not sure which community this belongs in, feel free to suggest a better one if this doesn't fit here.
In Visual Studio Code, when searching for a file, you can press CMD/Ctrl + P to bring up the Quick Open search box for finding a file by name. The query doesn't have to be the exact name: a file stays in the list as long as its name contains the query's characters in that order, and the match is "loose" enough to ignore any characters between them.
Example:
Searching "cat" would show the following:
bigcat.txt
cat.txt
candlelight.txt
In the above, every name contains the letters "c", "a", "t" in that order, even if other characters sit between them. The regex would probably be something like /.*c.*a.*t.*/.
Is there a name for this type of search/filter?
Fuzzy Filter/Search
After looking through VS Code's GitHub issues list, I found an issue that mentioned it.
I also found a node module that does this exact same thing.
There is also a Wikipedia entry on Approximate String Matching, which is similar to the above.
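As a rough illustration of the idea in the question (not how VS Code or any particular module implements it), the /.*c.*a.*t.*/ pattern can be built from an arbitrary query in Python. The function names are made up for this sketch, and case-insensitive matching is an assumption:

```python
import re

def fuzzy_pattern(query):
    # Escape each character so "." or "*" in the query is taken literally,
    # then allow any run of characters between consecutive query characters.
    return re.compile('.*'.join(re.escape(ch) for ch in query), re.IGNORECASE)

def fuzzy_filter(query, names):
    """Keep the names that contain the query's characters in order."""
    pattern = fuzzy_pattern(query)
    return [name for name in names if pattern.search(name)]

names = ['bigcat.txt', 'cat.txt', 'candlelight.txt', 'dog.txt', 'act.txt']
print(fuzzy_filter('cat', names))
```

This only filters. Real fuzzy finders usually also score and rank the candidates, which this sketch does not attempt.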

In Atom, is there a way to search only in files whose name match a regex?

I use Atom as my primary coding environment, and generally I love it. There is one feature that I could really use right now, and I'm not sure if it exists or not.
Basically, I want to do a project-wide search for a string ("1.1.0"), but I only want to search within files that have the word "build" in their names. I know that Atom allows me to search a file/directory pattern, such as src/assets or even src/assets/*.cs or src/assets/buildFile.*
But in this particular project there are tons of files that have the word build in their names: CustomBuild.xml, BuildScript.cs, FinalBuild.xml, etc. Is there any way that I can tell Atom to search for my query string in a regex-defined file/directory pattern? (I'm also open to other ways of solving my problem.)
Thank you for your time!
Update: Just to clarify, some things I've tried so far:
Searching using "*build*" for my file/directory pattern (only returns file names that are build.*)
Using */**/*build*.* (same issue)

Regex to match only last segment of a folder structure

I have a recursive list of folders that I need to find characters in, but I do not want subfolders included in the result. I need to find many different characters that will be an issue when migrating data, including asterisks, double periods, etc.
For this example I will use double-period (..). I only need the first, fourth, and seventh lines
/System/Modules/Aspect/dmc_attachments_aspect..J5_D65
/System/Modules/Aspect/dmc_attachments_aspect..J5_D65/External Interfaces
/System/Modules/Aspect/dmc_attachments_aspect..J5_D65/Miscellaneous
/System/Modules/Collaboration/com.documentum.services.collaboration.IAttachmentsManager..J5_D65
/System/Modules/Collaboration/com.documentum.services.collaboration.IAttachmentsManager..J5_D65/External Interfaces
/System/Modules/Collaboration/com.documentum.services.collaboration.IAttachmentsManager..J5_D65/Miscellaneous
/System/Modules/TBO/dm_message_archive..J5_D65
/System/Modules/TBO/dm_message_archive..J5_D65/External Interfaces
Another example would be an asterisk -- I only need the first, fourth, and seventh lines.
/Public/Test/*Training
/Public/Test/*Training*/Documentation
/Public/Test/*Training*/SOPs
/Public/Test/Project**Tracking
/Public/Test/Project**Tracking/01
/Public/Test/Project**Tracking/02
/Public/Home*
/Public/Home*/Test
Is there a regex I could use to do this? I am happy to run multiple queries/reports, swapping in the character being searched for (.. or *) each time.
I wanted to give some clarity on the issue so I can avoid the XY problem.
We are migrating data from Documentum to SharePoint, and Documentum does not have the same file and folder name restrictions, so we will have to address those ahead of the migration or on the fly. I have a big text file (950k lines) containing all of the folders currently in Documentum, and I am attempting to find all folders that will not migrate due to containing these characters.
The issue is that doing a basic egrep '\*' will give not just the top level folder containing this character but all subfolders, which will throw off counts.
Let's say you were looking for the double period:
.*\.\.[^/]*$
would match two periods followed by an unlimited number of non-slash characters until the end of the string. In general, replace \.\. with whatever you are looking for.
Check it out at regex101.com. (Asterisk version here).
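To run the same check over a folder list for different characters, the pattern can be built from the token being searched for. The following is a minimal Python sketch; the function names are hypothetical and the leading .* is dropped because a search does not need it:

```python
import re

def last_segment_pattern(token):
    # The literal token, followed only by non-slash characters up to the end,
    # so the token must sit in the final path segment.
    return re.compile(re.escape(token) + r'[^/]*$')

def folders_with(token, folders):
    """Return the folders whose last segment contains the token."""
    pattern = last_segment_pattern(token)
    return [f for f in folders if pattern.search(f)]

folders = [
    '/Public/Test/*Training',
    '/Public/Test/*Training*/Documentation',
    '/Public/Test/*Training*/SOPs',
    '/Public/Test/Project**Tracking',
    '/Public/Test/Project**Tracking/01',
    '/Public/Test/Project**Tracking/02',
    '/Public/Home*',
    '/Public/Home*/Test',
]

print(folders_with('*', folders))
```

Using re.escape means the same helper works for "..", "*", or any other problem character without hand-escaping each one.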

Regex for Current NTUSER.DAT files

I am trying to come up with a regex (PCRE) that finds current Windows NTUSER.DAT files when cycling through a file list (valid NTUSER.DAT files are the ones that are in the correct path for use by Windows).
I am trying to exclude any NTUSER.DAT files that have been copied by a user and placed in a different location (e.g. on the Desktop). In the following sample data, the first 4 results are valid, the next 3 are invalid:
\Users\John Thomas Hamilton\ntuser.dat
\Users\Default\NTUSER.DAT
\Users\Mary Thomas\NTUSER.DAT
\Users\UpdatusUser\NTUSER.DAT
\Users\John Thomas Hamilton\Desktop\My Stuff\Windows\Users\Default\NTUSER.DAT
\Users\John Thomas Hamilton\Desktop\My Stuff\Windows\Users\Student\NTUSER.DAT
\Users\John Thomas Hamilton\Desktop\My Stuff\My stuff to sort\Tech Support Fix it\NTUSER.DAT
Currently the best/simplest regex I have is:
\\USERS\\[A-Z0-9]+\\NTUSER.DAT$
but of course there are plenty of valid Windows file name characters other than letters and numbers that could exist in the user name.
I think I need to match up to the first occurrence of the next folder separator "\" and reject the path if NTUSER.DAT does not come right after it. I have not had any luck doing this, so any help would be appreciated.
Well, assuming you have a valid file list, this would work:
^\\Users\\[^\\]+?\\NTUSER\.DAT$
Make sure you ignore case.
The secret is using [^\\]+? instead of .+? so that the match can span exactly one folder level and no more.
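The pattern can be checked against the sample data with a short Python sketch. Python's re module is used here in place of PCRE, the helper name is hypothetical, and re.IGNORECASE covers the "ignore case" requirement:

```python
import re

# \Users\<exactly one folder>\NTUSER.DAT, case-insensitive
NTUSER = re.compile(r'^\\Users\\[^\\]+?\\NTUSER\.DAT$', re.IGNORECASE)

def is_current_ntuser(path):
    """True if the path is an NTUSER.DAT directly inside a user profile folder."""
    return NTUSER.search(path) is not None

samples = [
    r'\Users\John Thomas Hamilton\ntuser.dat',
    r'\Users\Default\NTUSER.DAT',
    r'\Users\Mary Thomas\NTUSER.DAT',
    r'\Users\UpdatusUser\NTUSER.DAT',
    r'\Users\John Thomas Hamilton\Desktop\My Stuff\Windows\Users\Default\NTUSER.DAT',
    r'\Users\John Thomas Hamilton\Desktop\My Stuff\Windows\Users\Student\NTUSER.DAT',
    r'\Users\John Thomas Hamilton\Desktop\My Stuff\My stuff to sort\Tech Support Fix it\NTUSER.DAT',
]

for s in samples:
    print(is_current_ntuser(s), s)
```

Since [^\\] cannot cross a backslash, the copied files under Desktop fail at the second folder level.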

RegEx to rewrite folder structure of varying length and file name to string

This is the code I'm using, developed with the help of #anubhava. It rewrites a path generated by a CGI script, redirecting from the location of my jpg image files to another folder that contains watermarked image files in the same folder structure as the originals. It excludes files that begin with tn_ or AM (plus _category_image.jpg):
RewriteRule ^ImageFolio4_files/1/([^/]+)/((?!AM|tn_)[^.]+\.jpg)$ /ImageFolio4_files/cache/images/~$1~$2 [L,R=302,NC]
The original path of:
/ImageFolio4_files/1/Casual_Portraits/abc123_789-xyz.jpg
And the above RegEx works to properly generate this output:
/ImageFolio4_files/cache/images/~Casual_Portraits~abc123_789-xyz.jpg
My CHALLENGE: I need to accommodate a multi-folder structure up to three folders deep underneath the ImageFolio4_files/1/ structure. The current code doesn't accommodate that. I also need to exclude any files named _category_image.jpg, which occur at each of the folder levels beneath ImageFolio4_files/1/ (these files are unique small display icons that appear next to the category names).
I really have no idea how to accommodate the multi-folder structure, so your help would be appreciated.
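First, change
([^/]+)/ to (([^/]+)/)+
in your expression. Note that this adds a capture group, so the filename moves from $2 to $3, and because the group repeats, $1 and $2 hold only the last folder matched.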
Second, change
(?!AM|tn_) to (?!AM|tn_|_category_image\.jpg)
You can use the negative lookahead (?!) for the whole filename as well: it doesn't consume characters, it just checks whether the regex "AM|tn_|_category_image\.jpg" matches at the current position.
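To see what the modified pattern captures, here is a sketch using Python's re module rather than mod_rewrite itself. Two things are assumptions of this sketch and not part of the answer: re.IGNORECASE stands in for the NC flag, and the filename class is narrowed from [^.]+ to [^./]+ so the filename group cannot absorb a folder separator. The helper name captures is hypothetical:

```python
import re

RULE = re.compile(
    r'^ImageFolio4_files/1/(([^/]+)/)+'                 # one or more folders
    r'((?!AM|tn_|_category_image\.jpg)[^./]+\.jpg)$',   # filename, with exclusions
    re.IGNORECASE,                                      # assumption: mirrors [NC]
)

def captures(path):
    """Return the capture groups for a path, or None if the rule does not apply."""
    match = RULE.search(path)
    return match.groups() if match else None

for path in [
    'ImageFolio4_files/1/Casual_Portraits/abc123_789-xyz.jpg',
    'ImageFolio4_files/1/A/B/C/photo.jpg',
    'ImageFolio4_files/1/A/B/tn_photo.jpg',
    'ImageFolio4_files/1/A/_category_image.jpg',
]:
    print(path, captures(path))
```

A repeated group keeps only its last iteration, so for a three-level path the groups describe the innermost folder and the filename. That matters if the substitution needs every folder name in the target path.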