Regex to match only last segment of a folder structure

I have a recursive list of folders that I need to search for certain characters, but I do not want subfolders included in the result. I need to find many different characters that will cause problems when migrating data, including asterisks, double periods, etc.
For this example I will use the double period (..). I only need the first, fourth, and seventh lines:
/System/Modules/Aspect/dmc_attachments_aspect..J5_D65
/System/Modules/Aspect/dmc_attachments_aspect..J5_D65/External Interfaces
/System/Modules/Aspect/dmc_attachments_aspect..J5_D65/Miscellaneous
/System/Modules/Collaboration/com.documentum.services.collaboration.IAttachmentsManager..J5_D65
/System/Modules/Collaboration/com.documentum.services.collaboration.IAttachmentsManager..J5_D65/External Interfaces
/System/Modules/Collaboration/com.documentum.services.collaboration.IAttachmentsManager..J5_D65/Miscellaneous
/System/Modules/TBO/dm_message_archive..J5_D65
/System/Modules/TBO/dm_message_archive..J5_D65/External Interfaces
Another example would be an asterisk; again, I only need the first, fourth, and seventh lines:
/Public/Test/*Training
/Public/Test/*Training*/Documentation
/Public/Test/*Training*/SOPs
/Public/Test/Project**Tracking
/Public/Test/Project**Tracking/01
/Public/Test/Project**Tracking/02
/Public/Home*
/Public/Home*/Test
Is there a regex I could use for this? I am happy to run multiple queries/reports, changing the target character (.. or *) each time.
Some context on the issue, to avoid the XY problem:
We are migrating data from Documentum to SharePoint, and Documentum does not have the same file and folder name restrictions, so we will have to address those ahead of the migration or on the fly. I have a big text file (950k lines) containing all of the folders currently in Documentum, and I am attempting to find all folders that will not migrate due to containing these characters.
The issue is that a basic egrep '\*' returns not just the top-level folder containing the character but all of its subfolders too, which throws off the counts.

Let's say you were looking for the double period:
.*\.\.[^/]*$
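would match two periods followed by any number of non-slash characters up to the end of the string, so it only matches when the double period is in the last path segment. In general, replace \.\. with whatever you are looking for (escaped if it is a regex metacharacter).
Check it out at regex101.com. For the asterisk, the pattern becomes:
.*\*[^/]*$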
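To apply this across the whole 950k-line folder list, one option is a short Python filter. This is a sketch: the function name top_level_hits and the sample list are illustrative, and the leading .* from the answer is dropped because re.search already scans the whole line.

```python
import re

# Target character(s) followed only by non-slash characters to end of line,
# so only the folder whose last segment contains the target is reported.
DOUBLE_PERIOD = re.compile(r"\.\.[^/]*$")
ASTERISK = re.compile(r"\*[^/]*$")


def top_level_hits(lines, pattern):
    """Return the lines whose final path segment contains the pattern."""
    return [line for line in lines if pattern.search(line)]


sample = [
    "/Public/Test/*Training",
    "/Public/Test/*Training*/Documentation",
    "/Public/Test/*Training*/SOPs",
    "/Public/Test/Project**Tracking",
    "/Public/Test/Project**Tracking/01",
    "/Public/Test/Project**Tracking/02",
    "/Public/Home*",
    "/Public/Home*/Test",
]

for hit in top_level_hits(sample, ASTERISK):
    print(hit)
```

For the real file, iterating over an open file object also works, since $ in Python's re matches just before a trailing newline. With grep, grep -c '\.\.[^/]*$' folders.txt would give the count directly (folders.txt is a placeholder name).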

Related

Regex to select two spaces between words (or two spaces before a letter)

I'm cleaning up some ancient HTML help files, and there are a lot of double spaces that I'd like to clean up (replacing each double space with a single space).
Sample: <li><b>% Successful</b>. The percentage of jobs that returned a confirmation.</li>
I want to find double spaces only before the start of sentences or between words (so after the setting label, or between 'percentage' and 'of'), not spaces in isolation or before the tags.
I tried a simple search for two spaces, but that also brings up the tab/space mixture that the creator used for formatting the indents, so I'm getting five useless results for every relevant one.
Is there a single regex that would handle both cases, or is it better to use a separate one for each? I'm fine either way; I'm just still pretty new to regexes and not sure where to start on this one.
Spaces between periods and text (three here for a better visual):
<li><b>Submit Time</b>.   The time the job was scheduled.</li>
Spaces between words (three here as well):
<li><b>End Time</b>. The date/time when the job was completed or canceled.</li>
I'm looking for multiple spaces either between words or at the start of sentences (generally after the setting name period).
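The question leaves open exactly what counts as "between words", so here is one possible reading as a Python sketch (an assumption, not something from the thread): collapse a run of two or more plain spaces only when the character before it is visible and not >, and the character after it is visible and not <. Leading indentation and tab/space mixtures are skipped because they are bordered by whitespace.

```python
import re

# One interpretation (assumed): two or more spaces bordered on the left by a
# visible character other than ">" and on the right by one other than "<".
INNER_SPACES = re.compile(r"(?<=[^\s>]) {2,}(?=[^\s<])")


def collapse_inner_spaces(text):
    """Replace each qualifying run of spaces with a single space."""
    return INNER_SPACES.sub(" ", text)


line = "\t  <li><b>End   Time</b>.   The date/time when the job was completed.</li>"
print(collapse_inner_spaces(line))
```

Because only literal spaces are matched, tabs are never touched, and the indentation at the start of the line stays as it was.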

Regular expression selection of last two folder path names

I'm working in Perl, but I'm having trouble cutting out the other folder names so that only the last two folder names remain.
Trainings and Events
General Office\Archive
General Office\Office Opperations\Contacts
General Office\Office Opperations\Office Procedures\How Tos
public_html\Accordion\.svn\tmp\text-base
I would like to remove the folder path names so the folders will end up like this with the last two path names for each:
Trainings and Events
General Office\Archive
Office Opperations\Contacts
Office Procedures\How Tos
tmp\text-base
I've read another Stack Overflow question that showed how to pick out the last two folder names, but I cannot reverse the expression. (I will link it if I can find it.)
I am currently using Sublime Text 3 to test my expressions.
Thank you
The pattern you want is (with \\ as the separator, since these paths use backslashes):
s/.+\\(.+\\.+)$/$1/
It captures the last two folder names and replaces the whole string with them.
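A quick way to check the substitution outside Perl is Python's re.sub. Since the sample paths use backslashes, this sketch uses \\ as the separator; last_two_segments is a hypothetical helper name.

```python
import re

# Greedy "^.+" gives back characters only until the rest of the string fits
# "(.+\\.+)$", which first happens at the second-to-last backslash, so
# group 1 holds exactly the last two segments.
LAST_TWO = re.compile(r"^.+\\(.+\\.+)$")


def last_two_segments(path):
    """Keep the last two folder names; shorter paths don't match and come back unchanged."""
    return LAST_TWO.sub(r"\1", path)


for p in [
    r"Trainings and Events",
    r"General Office\Archive",
    r"General Office\Office Opperations\Office Procedures\How Tos",
    r"public_html\Accordion\.svn\tmp\text-base",
]:
    print(last_two_segments(p))
```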
(?:[^\\]*\\|^)[^\\]*$
You actually want "up to two", because otherwise you won't catch "Trainings and Events". To solve this, I match the two cases separately:
Either a folder name (any number of non-backslashes) followed by a backslash, or the start of the string
Another folder name
End of string
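The "up to two" pattern is a match rather than a substitution, so in a Python sketch (helper name hypothetical) it pairs naturally with re.search, which returns the leftmost match, i.e. the start of the second-to-last segment when there is one.

```python
import re

# Optional "folder\" (or start of string), then a final folder with no backslash.
UP_TO_TWO = re.compile(r"(?:[^\\]*\\|^)[^\\]*$")


def last_up_to_two(path):
    """Return the last one or two backslash-separated folder names."""
    # A match always exists (start of the string, or the start of the
    # second-to-last segment), so calling .group() is safe here.
    return UP_TO_TWO.search(path).group()


for p in [r"Trainings and Events", r"General Office\Office Opperations\Contacts"]:
    print(last_up_to_two(p))
```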

Regex for Current NTUSER.DAT files

I am trying to come up with a regex (PCRE) that finds current Windows NTUSER.DAT files when cycling through a file list (valid NTUSER.DAT files are the ones in the correct path for use by Windows).
I am trying to exclude any NTUSER.DAT files that have been copied by a user and placed in a different location (e.g. on the Desktop). In the following sample data, the first 4 results are valid, the next 3 are invalid:
\Users\John Thomas Hamilton\ntuser.dat
\Users\Default\NTUSER.DAT
\Users\Mary Thomas\NTUSER.DAT
\Users\UpdatusUser\NTUSER.DAT
\Users\John Thomas Hamilton\Desktop\My Stuff\Windows\Users\Default\NTUSER.DAT
\Users\John Thomas Hamilton\Desktop\My Stuff\Windows\Users\Student\NTUSER.DAT
\Users\John Thomas Hamilton\Desktop\My Stuff\My stuff to sort\Tech Support Fix it\NTUSER.DAT
Currently the best/simplest regex I have is:
\\USERS\\[A-Z0-9]+\\NTUSER.DAT$
but of course there are plenty of valid Windows file name characters other than letters and numbers that could appear in the user name.
I think I need to match up to the next folder separator "\" and reject the path if NTUSER.DAT does not come right after it. I have not had any luck doing this, so any help would be appreciated.
Well, assuming you have a valid file list, this would work:
^\\Users\\[^\\]+?\\NTUSER\.DAT$
Make sure you ignore case (e.g. the i flag), since the first sample uses lowercase ntuser.dat.
The key is using [^\\]+? instead of .+?, so that you match exactly one folder level below \Users.
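A quick Python check of this pattern against the sample list. Two details are this sketch's own assumptions rather than part of the original answer text: the dot in NTUSER.DAT is escaped so it only matches a literal period, and re.IGNORECASE provides the "ignore case" behaviour.

```python
import re

# Exactly one folder level under \Users, then NTUSER.DAT at the end.
CURRENT_HIVE = re.compile(r"^\\Users\\[^\\]+?\\NTUSER\.DAT$", re.IGNORECASE)

paths = [
    r"\Users\John Thomas Hamilton\ntuser.dat",
    r"\Users\Default\NTUSER.DAT",
    r"\Users\Mary Thomas\NTUSER.DAT",
    r"\Users\UpdatusUser\NTUSER.DAT",
    r"\Users\John Thomas Hamilton\Desktop\My Stuff\Windows\Users\Default\NTUSER.DAT",
    r"\Users\John Thomas Hamilton\Desktop\My Stuff\Windows\Users\Student\NTUSER.DAT",
    r"\Users\John Thomas Hamilton\Desktop\My Stuff\My stuff to sort\Tech Support Fix it\NTUSER.DAT",
]

valid = [p for p in paths if CURRENT_HIVE.search(p)]
print(valid)
```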

Give sudo permission to log files on different paths, like /a/b1/c.log and /a/b2/d.log

I need a single entry for the Centrify tool that covers all the log files under different folders, for example:
/oradata1/oracle/admin/A/scripts/rman_logs/*.log
/oracle/oracle/admin/B/scripts/rman_logs/*.log
/oradata2/admin/C/scripts/logs/*.log
I used the following, but because of the * the user can see all logs:
/ora(data(1|2)|cle)/oracle|admin/admin/*/scripts/rman_logs
Which expression should I use?
If I understand your question correctly, you want only .log files. You can use a positive lookahead to assert that it is indeed a log file (it contains .log at the end of the filename), and then match the filename, whatever it is (.*).
Then it's really easy:
(?=.*\.log(?:$|\s)).*
Of course, you can also add specific folders if you wish to restrict the matches, and the positive lookahead will still do its work, e.g.:
(?=.*\.log(?:$|\s)).*/scripts/.*
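To see what the lookahead version accepts, here is a Python check with made-up sample paths (the file names are hypothetical). It only shows how the pattern behaves in Python's re; whether the Centrify tool accepts lookaheads at all is not established in this thread.

```python
import re

# Lookahead: ".log" must appear at the end (or before whitespace);
# the main body then requires a /scripts/ folder somewhere in the path.
LOG_UNDER_SCRIPTS = re.compile(r"(?=.*\.log(?:$|\s)).*/scripts/.*")

candidates = [
    "/oradata1/oracle/admin/A/scripts/rman_logs/backup.log",
    "/oradata1/oracle/admin/A/scripts/rman_logs/backup.txt",
    "/var/log/messages.log",
]
for path in candidates:
    print(path, bool(LOG_UNDER_SCRIPTS.fullmatch(path)))
```

Note that the lookahead alone does not stop a user from reaching .log files in other folders under /scripts/; that is what the folder list in the EDIT addresses.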
EDIT: Per your comment, you only need those folders, so just list their file paths as alternatives and add [^.\s\/]*\.log at the end:
(?:\/oradata1\/oracle\/admin\/A\/scripts\/rman_logs\/|\/oracle\/oracle\/admin\/B\/scripts\/rman_logs\/|\/oradata2\/admin\/C\/scripts\/logs\/)[^\s.\/]*\.log
You may be able to shorten the regex by combining file path elements, but in my opinion it's not necessary; you might as well specify each file path individually if they don't overlap much.
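The EDIT's alternation can be checked the same way in Python, using fullmatch so the whole path must fit (the sample file names are hypothetical). Note that [^\s.\/]* excludes dots and slashes, so it only accepts files directly inside the listed folders whose names contain no other period.

```python
import re

# Copied from the answer: three explicit folder prefixes, then a dot-free,
# slash-free file name ending in .log.
ALLOWED_LOGS = re.compile(
    r"(?:\/oradata1\/oracle\/admin\/A\/scripts\/rman_logs\/"
    r"|\/oracle\/oracle\/admin\/B\/scripts\/rman_logs\/"
    r"|\/oradata2\/admin\/C\/scripts\/logs\/)[^\s.\/]*\.log"
)

samples = [
    "/oradata1/oracle/admin/A/scripts/rman_logs/full_backup.log",
    "/oradata2/admin/C/scripts/logs/nightly.log",
    "/oradata2/admin/C/scripts/logs/old/nightly.log",
    "/oracle/oracle/admin/B/scripts/rman_logs/full.backup.log",
]
for path in samples:
    print(path, bool(ALLOWED_LOGS.fullmatch(path)))
```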
I have found a global expression.
This is not an elegant approach, but it works and saves me a lot of work. The main files are under ....../scripts/rman_logs/ on all servers, so I use this approach.
I can produce these lines and put them in a command group for users, so this works well:
tail /*/*/*/*/scripts/rman_logs/*.log
tail /*/*/*/scripts/rman_logs/*.log
Thanks for your help.

PowerShell isolating parts of strings

I have no experience with regular expressions and would love some help and suggestions on a possible solution for deleting parts of file names contained in a CSV file.
Problem:
A list of exported file names contains a random unique identifier that I need to isolate. The unique identifier has no predictable pattern; however, the parts that need removing do. Each file name ends with one of the following variations:
_V, -V, or %20V (an encoded space before the V), followed by a random number sequence that may contain spaces and additional "-" or "_" characters, and ending with .PDF
examples:
GTD-LVOE-43-0021 V10 0.PDF
GTD-LVOE-43-0021-V34-2.PDF
GTD-LVOE-43-0021_V02_9.PDF
GTD-LVOE-43-0021 V49.9.PDF
Solution:
My plan was to write a script that finds the first occurrence of a V counting from the end of the string, and deletes it and everything to its right. The file names can then be cleaned up by deleting any "-", "_" or whitespace left at the end of the string.
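The plan described above can be sketched in Python with only a small regex for the cleanup step (strip_version_suffix is a hypothetical helper name). It assumes every name actually has the V suffix: a name without one would be cut at an earlier V, such as the one in LVOE. Trimming a trailing %20 is also an addition in this sketch, to cover the encoded-space variant.

```python
import re


def strip_version_suffix(name):
    """Cut at the last 'V', then trim trailing '-', '_', whitespace or '%20'."""
    cut = name.rfind("V")
    if cut == -1:
        return name  # no V anywhere: leave the name alone
    stem = name[:cut]
    return re.sub(r"(?:%20|[\s_-])+$", "", stem)


examples = [
    "GTD-LVOE-43-0021 V10 0.PDF",
    "GTD-LVOE-43-0021-V34-2.PDF",
    "GTD-LVOE-43-0021_V02_9.PDF",
    "GTD-LVOE-43-0021 V49.9.PDF",
]
for n in examples:
    print(strip_version_suffix(n))
```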
Question:
How can I do this with a regular expression, and is my line of thinking even close to the right approach?
REGEX: [\s\-_]V.*?\.PDF
It might do the trick. You'd still need to replace away any leading - and _, but hopefully it gets you down the path.
It reads as follows:
start with a whitespace character, - or _, followed by a V. Then take everything up to the first .PDF.
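Applied with Python's re.sub, the answer's pattern removes the suffix from all four examples. The %20 alternative is an addition in this sketch (not in the answer) to cover the encoded-space variant from the question. PowerShell's -replace operator uses .NET regular expressions, which should accept the same pattern, but that has not been verified here.

```python
import re

# Answer's pattern, with "%20" added as an alternative separator (assumption).
SUFFIX = re.compile(r"(?:%20|[\s\-_])V.*?\.PDF")


def remove_suffix(name):
    """Delete the separator, the V, and everything up to the first .PDF."""
    return SUFFIX.sub("", name)


names = [
    "GTD-LVOE-43-0021 V10 0.PDF",
    "GTD-LVOE-43-0021-V34-2.PDF",
    "GTD-LVOE-43-0021_V02_9.PDF",
    "GTD-LVOE-43-0021 V49.9.PDF",
]
for n in names:
    print(remove_suffix(n))
```

The lazy .*? stops at the first .PDF, so the dot in "V49.9.PDF" does not cause an early or missed match.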