I am trying to come up with a regex (PCRE) that finds live Windows NTUSER.DAT files when cycling through a file list (a valid NTUSER.DAT is one that sits in the path Windows actually uses).
I am trying to exclude any NTUSER.DAT files that have been copied by a user and placed in a different location (e.g. on the Desktop). In the following sample data, the first 4 results are valid, the next 3 are invalid:
\Users\John Thomas Hamilton\ntuser.dat
\Users\Default\NTUSER.DAT
\Users\Mary Thomas\NTUSER.DAT
\Users\UpdatusUser\NTUSER.DAT
\Users\John Thomas Hamilton\Desktop\My Stuff\Windows\Users\Default\NTUSER.DAT
\Users\John Thomas Hamilton\Desktop\My Stuff\Windows\Users\Student\NTUSER.DAT
\Users\John Thomas Hamilton\Desktop\My Stuff\My stuff to sort\Tech Support Fix it\NTUSER.DAT
Currently the best/simplest regex I have is:
\\USERS\\[A-Z0-9]+\\NTUSER.DAT$
but of course there are plenty of valid Windows file name characters other than letters and numbers that could exist in the user name.
I think I need to search up to the first occurrence of the next folder separator "\" and then, if NTUSER.DAT does not come right after it, reject the path. I have not had any luck doing this, so any help would be appreciated.
Well assuming you have a valid file list, this would work:
^\\Users\\[^\\]+?\\NTUSER\.DAT$
Make sure you ignore case.
The secret is using [^\\]+? instead of .+? so that the match cannot cross a backslash, which means it goes exactly one folder level deep.
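As a quick check, here is a minimal Python sketch that runs the pattern against the sample paths from the question. It assumes Python's re module behaves like PCRE for this pattern, escapes the dot before DAT as a small tightening, and uses re.IGNORECASE for the "ignore case" advice. The helper name is_profile_hive is made up for illustration.

```python
import re

# The answer's pattern, with the dot escaped; IGNORECASE covers "ntuser.dat".
NTUSER_RE = re.compile(r"^\\Users\\[^\\]+?\\NTUSER\.DAT$", re.IGNORECASE)

# Sample paths copied from the question: first four valid, last three not.
paths = [
    r"\Users\John Thomas Hamilton\ntuser.dat",
    r"\Users\Default\NTUSER.DAT",
    r"\Users\Mary Thomas\NTUSER.DAT",
    r"\Users\UpdatusUser\NTUSER.DAT",
    r"\Users\John Thomas Hamilton\Desktop\My Stuff\Windows\Users\Default\NTUSER.DAT",
    r"\Users\John Thomas Hamilton\Desktop\My Stuff\Windows\Users\Student\NTUSER.DAT",
    r"\Users\John Thomas Hamilton\Desktop\My Stuff\My stuff to sort\Tech Support Fix it\NTUSER.DAT",
]


def is_profile_hive(path):
    """Return True when the path is \\Users\\<one folder>\\NTUSER.DAT."""
    return NTUSER_RE.match(path) is not None


valid = [p for p in paths if is_profile_hive(p)]
for p in valid:
    print(p)
```

Because [^\\] cannot match a backslash, the copies under Desktop are rejected even though they end in \Users\Default\NTUSER.DAT.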
I use Atom as my primary coding environment, and generally I love it. There is one feature that I could really use right now, and I'm not sure if it exists or not.
Basically, I want to do a project-wide search for a string ("1.1.0"), but I only want to search within files that have the word "build" in them. I know that Atom allows me to search a file/directory pattern, such as src/assets or even src/assets/*.cs or src/assets/buildFile.*
But in this particular project there are tons of files that have the word build in their names: CustomBuild.xml, BuildScript.cs, FinalBuild.xml, etc. Is there any way that I can tell Atom to search for my query string in a regex-defined file/directory pattern? (I'm also open to other ways of solving my problem.)
Thank you for your time!
Update: Just to clarify, some things I've tried so far:
Searching using "*build*" for my file/directory pattern (only returns file names that are build.*)
Using */**/*build*.* (same issue)
I have a recursive list of folders that I need to find characters in, but I do not want subfolders included in the result. I need to find many different characters that will be an issue when migrating data, including asterisks, double periods, etc.
For this example I will use double-period (..). I only need the first, fourth, and seventh lines
/System/Modules/Aspect/dmc_attachments_aspect..J5_D65
/System/Modules/Aspect/dmc_attachments_aspect..J5_D65/External Interfaces
/System/Modules/Aspect/dmc_attachments_aspect..J5_D65/Miscellaneous
/System/Modules/Collaboration/com.documentum.services.collaboration.IAttachmentsManager..J5_D65
/System/Modules/Collaboration/com.documentum.services.collaboration.IAttachmentsManager..J5_D65/External Interfaces
/System/Modules/Collaboration/com.documentum.services.collaboration.IAttachmentsManager..J5_D65/Miscellaneous
/System/Modules/TBO/dm_message_archive..J5_D65
/System/Modules/TBO/dm_message_archive..J5_D65/External Interfaces
Another example would be an asterisk -- I only need the first, fourth, and seventh lines.
/Public/Test/*Training
/Public/Test/*Training*/Documentation
/Public/Test/*Training*/SOPs
/Public/Test/Project**Tracking
/Public/Test/Project**Tracking/01
/Public/Test/Project**Tracking/02
/Public/Home*
/Public/Home*/Test
Is there a regex I could use to do this? I am happy to run multiple queries/reports and swap in the character being searched for (.. or *) each time.
I wanted to give some clarity on the issue so I can avoid the XY problem.
We are migrating data from Documentum to SharePoint, and Documentum does not have the same file and folder name restrictions, so we will have to address those ahead of the migration or on the fly. I have a big text file (950k lines) containing all of the folders currently in Documentum, and I am attempting to find all folders that will not migrate due to containing these characters.
The issue is that doing a basic egrep '\*' will give not just the top level folder containing this character but all subfolders, which will throw off counts.
Let's say you were looking for the double period:
.*\.\.[^/]*$
would match two periods followed by an unlimited number of non-slash characters until the end of the string. In general, replace \.\. with whatever you are looking for.
Check it out at regex101.com. (Asterisk version here).
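For a 950k-line file it may be handy to script this. Below is a minimal Python sketch of the same idea, assuming the folder list has been read into a list of strings; the function name top_level_matches is hypothetical, and re.escape is used so that the search string (.. or *) is treated literally.

```python
import re


def top_level_matches(lines, needle):
    """Keep only lines whose LAST path component contains `needle`.

    Builds the pattern  .*<needle>[^/]*$  so that nothing after the
    needle may be a slash, which drops the subfolder lines.
    """
    pattern = re.compile(r".*" + re.escape(needle) + r"[^/]*$")
    return [line for line in lines if pattern.match(line)]


# Sample lines copied from the question.
folders = [
    "/Public/Test/*Training",
    "/Public/Test/*Training*/Documentation",
    "/Public/Test/*Training*/SOPs",
    "/Public/Test/Project**Tracking",
    "/Public/Test/Project**Tracking/01",
    "/Public/Test/Project**Tracking/02",
    "/Public/Home*",
    "/Public/Home*/Test",
]

hits = top_level_matches(folders, "*")
for line in hits:
    print(line)
```

Calling the same function with ".." as the needle gives the double-period report.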
I need a tidy column entry for the Centrify tool that includes all the log files under the different folders, for example:
/oradata1/oracle/admin/A/scripts/rman_logs/*.log
/oracle/oracle/admin/B/scripts/rman_logs/*.log
/oradata2/admin/C/scripts/logs/*.log
I tried the following, but with the * character the user can see all of the logs:
/ora(data(1|2)|cle)/oracle|admin/admin/*/scripts/rman_logs
Which expression should I use?
If I understand your question correctly, you want only .log files. You can use a positive lookahead to assert that the path is indeed a log file (it has .log at the end of the filename), and then match the path itself, whatever it is (.*).
Then it's really easy:
(?=.*\.log(?:$|\s)).*
Of course, you can also add specific folders if you wish to restrict the matches, and the positive lookahead will still do its work, e.g.:
(?=.*\.log(?:$|\s)).*/scripts/.*
EDIT: As per your comment, you only need those folders, so you can just list their paths as alternations and add [^.\s\/]*\.log at the end. So:
(?:\/oradata1\/oracle\/admin\/A\/scripts\/rman_logs\/|\/oracle\/oracle\/admin\/B\/scripts\/rman_logs\/|\/oradata2\/admin\/C\/scripts\/logs\/)[^\s.\/]*\.log
You could shorten the regex by combining the common path elements but, in my opinion, that is not necessary; you might as well specify each path individually if they don't overlap too much.
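To see which paths the alternation accepts, here is a minimal Python sketch. It only illustrates the regex itself; how Centrify evaluates the expression is not covered, and the file names (full_backup.log and so on) are invented for the example.

```python
import re

# Same alternation as above, split over lines for readability.
# Slashes need no escaping in Python, so \/ is written as /.
LOG_RE = re.compile(
    r"(?:/oradata1/oracle/admin/A/scripts/rman_logs/"
    r"|/oracle/oracle/admin/B/scripts/rman_logs/"
    r"|/oradata2/admin/C/scripts/logs/)"
    r"[^\s./]*\.log"
)


def is_allowed_log(path):
    """True only for a .log file directly inside one of the three folders."""
    return LOG_RE.fullmatch(path) is not None


# Hypothetical paths, used only to exercise the pattern.
candidates = [
    "/oradata1/oracle/admin/A/scripts/rman_logs/full_backup.log",
    "/oracle/oracle/admin/B/scripts/rman_logs/incr.log",
    "/oradata2/admin/C/scripts/logs/arch.log",
    "/oradata1/oracle/admin/A/scripts/rman_logs/old/full_backup.log",
    "/oradata1/oracle/admin/A/scripts/rman_logs/notes.txt",
]

allowed = [p for p in candidates if is_allowed_log(p)]
for p in allowed:
    print(p)
```

The [^\s./]* part is what stops the match from descending into subfolders or accepting names with extra dots.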
I have found a global expression.
This is not an elegant approach, but it works and saves me a lot of work. The main files are under ....../scripts/rman_logs/ on all servers, so I use this approach.
I can generate these lines and put them in a command group for users, so this works well:
tail /////scripts/rman_logs/*.log
tail ////scripts/rman_logs/.log
Thanks for your help.
I spent most of yesterday putting together a collection of regular expressions to convert all my image names and paths to lower case. Today, I processed a folder full of files and was surprised to discover that many image names are still capitalized.
So I decided to try it one step at a time, first renaming .jpg's, then .gif's, .png's, etc.
I'm working on a Mac, using Dreamweaver and TextWrangler as my text editors. The following regex works perfectly for jpg's, with one major flaw - it deletes the extension...
([\w/-]+)\.jpe?g
\L\1
In other words, it changes South-America.jpg to south-america.
How can I change it so that it retains the file extension? I assume I can then just change it to...
([\w/-]+)\.png
\L\1
...to process png's, etc.
([\w\/-]+)(\.jpe?g)
and replace with \L\1\2
It's deleting your extension because you never save it in a capture group.
You could perhaps capture the extension too?
([\w/-]+)(\.jpe?g)
\L\1\2
And I think you should be able to use something like this for all the files:
([\w/-]+)(\.[^.]+$)
\L\1\2
Or if you specifically want to convert those jpegs, pngs and gifs:
([\w/-]+)(\.(?:jpe?g|gif|png))
\L\1\2
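The \L replacement syntax is an editor feature (TextWrangler supports it), and not every engine has it. As a rough equivalent, here is a Python sketch that uses a replacement function instead; the function name lowercase_image_paths is made up, and the sample HTML snippet is an assumption about what the processed files look like.

```python
import re

# Group 1 is the path/name, group 2 is the extension (kept as-is).
IMAGE_RE = re.compile(r"([\w/-]+)(\.(?:jpe?g|gif|png))")


def lowercase_image_paths(text):
    """Lowercase the name part of each image reference, keep the extension."""
    return IMAGE_RE.sub(lambda m: m.group(1).lower() + m.group(2), text)


sample = '<img src="Images/South-America.jpg"> <img src="Maps/Asia.png">'
result = lowercase_image_paths(sample)
print(result)
```

Like the original pattern, this is case-sensitive on the extension, so a name ending in .JPG would be left untouched unless re.IGNORECASE is added.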
If it's okay for the extension to become lowercase as well, you could just do
^(.*)$
\L\1
As long as you're certain that all lines contain file names.
If you want to process only certain file formats, use
^(.*\.(jpe?g|png|gif))$
\L\1
Crashplan allows for excluding files from a backup set by using regex for the exclusion criteria (there is no inclusion criteria functionality). For my particular use case I have a folder that contains these files:
C_VOL-b001.spf
C_VOL-b001-i001.md5
C_VOL-b001-i001.spi
E_VOL-b001.spf
E_VOL-b001-i001.md5
E_VOL-b001-i001.spi
F_VOL-b001.spf
F_VOL-b001-i001.md5
F_VOL-b001-i001.spi
G_VOL-b001.spf
G_VOL-b001-i001.md5
G_VOL-b001-i001.spi
and I want to exclude any file that doesn't begin with the C_VOL filename. These are backup files from another backup software, Shadowprotect, but I only want to include the C volume files and exclude the others. The incremental files will continue to be added to each of the volume sets using the naming schema of -i001, -i002, etc.
So far I've tried the following:
^E_VOL
^E_VOL.*
and a few other variations, with no success. I'm not sure if Crashplan only allows for selecting based on the filetype extension (their regex examples are here http://goo.gl/qDAEcR ). They do mention that "Note that CrashPlan treats all file separators as forward slashes (/)."
I'm not sure if Crashplan recognizes all regex expressions. If it helps, back in 2008 I emailed their tech support with a regex question and one of the founders of Crashplan, Matt Dornquast, helped me with the following regex.
I was trying to exclude any file that either:
1. has an extension of .spf, or
2. has a file name of the type XXXXXX-cd.spi,
3. while still allowing backup of files with a name of the type xxxxx.spi
And his regex worked perfectly:
(?i).+(?:\-cd\.spi|\.spf)$
I've contacted their tech support again but they said they will no longer help with regex questions.
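For reference, the behaviour of that 2008 pattern can be checked outside CrashPlan. The sketch below uses Python's re module as a stand-in, which is an assumption, since CrashPlan's own regex engine may differ in details; the file names are invented to cover the three cases listed above.

```python
import re

# The 2008 pattern, unchanged: case-insensitive, ends in -cd.spi or .spf.
EXCLUDE_RE = re.compile(r"(?i).+(?:\-cd\.spi|\.spf)$")


def is_excluded(name):
    """True when the pattern would exclude this file from the backup."""
    return EXCLUDE_RE.match(name) is not None


# Hypothetical file names, one per case in the list above.
checks = {
    "C_VOL-b001.spf": is_excluded("C_VOL-b001.spf"),
    "C_VOL-b001-cd.spi": is_excluded("C_VOL-b001-cd.spi"),
    "C_VOL-b001-i001.spi": is_excluded("C_VOL-b001-i001.spi"),
}
for name, excluded in checks.items():
    print(name, "excluded" if excluded else "kept")
```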
It seems that you could use the following regex:
.*/C_VOL.*
I created this based on this example (link) they featured on the website you linked in your question. Please let us know if it's working :)