Find file paths that have spaces - regex

I am trying to create a regular expression to select file paths that contain spaces and are not wrapped in quotes. In addition, I only want paths that begin with a volume letter (e.g., C:\, D:\, E:) and I want to ignore any switches or commands that come after the path.
Take, for instance, the following list. The text I want to match and return is the unquoted, drive-letter path containing spaces, up to the end of the filename:
C:\This path has spaces\system.sys -switch /command
C:\Thispathhasnospaces\filename.exe
\sytem32\ThisDidNotBeginWithADriveLetter\something.doc
D:\This path also has spaces\something.xlsx
"C:\I don't care if it is wrapped in quotes\something.abc" -switch
So far what I have come up with is:
^\w:\(.+)(.\w\w\w)
Which sort of works, but it selects paths both with and without spaces. It also doesn't select the full filename if the path has a four-character extension, such as .xlsx.
Any help would be very much appreciated. If you post a better regex, some explanation would really help, because I am trying to learn.
Thanks!

I would go by
^[A-Z]:\\.+\s.+\.\S+
^ is an anchor for the start of the string
[A-Z]:\\ matches a letter followed by colon and backslash
.+ matches any character, 1 or more times
\s matches a single whitespace character
.+\.\S+ matches any characters followed by dot and non-spaces
See https://regex101.com/r/fC5tF8/2 for a demo
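As a rough check of the pattern above, here is a minimal Python sketch that runs it against the sample lines from the question. The names SPACED_PATH, SAMPLES and find_spaced_path are made up for illustration, and using Python's re module is an assumption; the question does not say which regex engine is in use.

```python
import re

# Pattern taken from the answer above, unchanged.
SPACED_PATH = re.compile(r"^[A-Z]:\\.+\s.+\.\S+")

# Sample lines from the question (raw strings keep the backslashes literal).
SAMPLES = [
    r"C:\This path has spaces\system.sys -switch /command",
    r"C:\Thispathhasnospaces\filename.exe",
    r"\sytem32\ThisDidNotBeginWithADriveLetter\something.doc",
    r"D:\This path also has spaces\something.xlsx",
    r'''"C:\I don't care if it is wrapped in quotes\something.abc" -switch''',
]


def find_spaced_path(line):
    """Return the matched path, or None if the line does not qualify."""
    m = SPACED_PATH.match(line)
    return m.group(0) if m else None


matches = [find_spaced_path(s) for s in SAMPLES]
for sample, found in zip(SAMPLES, matches):
    print(repr(sample), "->", repr(found))
```

One limitation worth keeping in mind: because .+ is greedy, a dot inside a later switch or argument would extend the match past the filename.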

a regex I would use is
^\w:[^\s]+[\.]*[^\s]+
This will find anything that starts with an alphanumeric character followed by a colon and contains no spaces.


Match all spaces after a particular string

Doing a find and replace in VsCode on a large amount of files. I'm looking to replace all spaces after a set of quotes, but only on a specific line.
I can very easily find all spaces using \s+, but I don't understand how to capture only the spaces after a specific string (on one specific line). I've tried positive lookbehinds, but I can only get them to match the first space, and I need to match all spaces on that line.
Example code:
variable = "01 - Testing this thing"
I need to find and replace all the spaces between the quotation marks with underscores, but I can't get any regex to match all the spaces between the quotes. I might want to replace the dash(-) as well, but the spaces are more important and I'm struggling to figure it out.
Here is a pretty good workflow.
Open a Search Editor (from the Command Palette or set a keybinding to it).
Use this regex: (?<=variable = ")[^"]*
That will find all matches in all files in your workspace or whatever folders you designate in the file to include filter. I suggest setting the context lines option to 0.
Ctrl+Shift+L to select all your matches. The matches are the 01 - Testing this thing part.
Now do a regular find in that search editor tab - with the Find in Selection option enabled.
Simply doing a find of a single space and a Replace All with _ will make all those changes (in the Search Editor only).
To apply those changes to all the files in your initial search results, use the Apply Search Editor Changes... command from the search-editor-apply-changes extension.
Then you can check to see if the changes were as you expected and save all. It will open all affected files so you can inspect them.
It seems like a few steps, but notice that the first regex can be very simple, and then you are doing a simple find/replace in just those selections.
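Outside VS Code, the same two-step idea (first select the quoted text, then replace spaces only inside that selection) can be sketched in a script. This is a minimal Python sketch, not part of the workflow above; the names QUOTED_VALUE and underscore_spaces are hypothetical.

```python
import re

# Same lookbehind as in the Search Editor step: the text after `variable = "`
# up to (but not including) the closing quote.
QUOTED_VALUE = re.compile(r'(?<=variable = ")[^"]*')


def underscore_spaces(line):
    """Replace spaces with underscores, but only inside the matched value."""
    return QUOTED_VALUE.sub(lambda m: m.group(0).replace(" ", "_"), line)


result = underscore_spaces('variable = "01 - Testing this thing"')
untouched = underscore_spaces('other = "leave these spaces alone"')
print(result)
print(untouched)
```

The replacement callback plays the role of the second find/replace restricted to the selection, so lines that do not contain the variable assignment are left as they are.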
Search for a string that has a space between the quotes, and replace it with the text before and after the space, joined by an underscore. You have to apply this as many times as the maximum number of spaces in a string; it can't be done in one regex search-and-replace.
In the Search Bar
Find Regex:
(variable = "[^" ]*) ([^"]*")
Replace:
$1_$2
Then apply Replace All (button) and Refresh (button) until no more matches are found.
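To see why the replacement has to be repeated, here is a small Python sketch of the same find/replace pair applied in a loop until nothing changes. The function name replace_until_stable is hypothetical, and Python's \1 and \2 stand in for VS Code's $1 and $2.

```python
import re

# Same find regex as above: text up to the first space inside the quotes,
# the space itself, then the rest of the quoted value.
STEP = re.compile(r'(variable = "[^" ]*) ([^"]*")')


def replace_until_stable(text):
    """Apply the replacement repeatedly; return the text and the pass count."""
    passes = 0
    while True:
        text, count = STEP.subn(r"\1_\2", text)
        if count == 0:
            return text, passes
        passes += 1


final_text, passes_needed = replace_until_stable('variable = "01 - Testing this thing"')
print(final_text, passes_needed)
```

Each pass replaces only the first remaining space in each matching line, so the example line with four spaces needs four passes.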

Extract specific string using regular expression

I want to extract a string only if it matches a specific pattern.
Example input strings:
13.10.0/
13.10.1/
13.10.2/
13.10.3/
13.10.4.2/
13.10.4.4/
13.10.4.5/
I'm using the regex [0-9]+.[0-9]+.[0-9] to extract only digit.digit.digit from a string if it matches,
but it gives the wrong output:
13.10.0
13.10.1
13.10.2
13.10.3
13.10.4.2 (13.10.4 should not be matched in this string)
13.10.4.4 (13.10.4 should not be matched in this string)
13.10.4.5 (13.10.4 should not be matched in this string)
the correct output that I need :
13.10.0
13.10.1
13.10.2
13.10.3
It's hard to say without knowing how you're passing these strings in -- are they lines in a file? An array of strings in a programming language?
If you're searching a file using grep or a similar tool, it will give you all lines that match anywhere, even if only part of the line matches.
Normally, you'd deal with this using anchors to specify that the regex must start on the first character of the line and end on the last (e.g. ^[0-9]+\.[0-9]+\.[0-9]$). ^ matches the start of the line, and $ matches the end. The dots are escaped because an unescaped . matches any character.
In your case, you've got slashes at the end of all the lines, so the easiest fix is to match that final slash, with ^[0-9]+\.[0-9]+\.[0-9]/.
You could also use lookahead or groups to match the slash without returning it -- but that depends a bit more on what tool you're running this regex in and how you're processing it.
If your strings are separated by whitespace (other than newlines), replacing ^ with (^|\s) (either the beginning of the string, or some whitespace character) may work -- but it will add a leading space to some of your results.
You may also need to set your regex tool to match multiple times in a line (e.g. the -o flag in grep). Again, it's hard to give useful advice about this without knowing what regular-expression tool you're using, or how you're processing the results.
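As one possible illustration of the "match the slash without returning it" idea, here is a minimal Python sketch that uses a capturing group. It assumes the strings arrive as lines of a single text block, which the question does not confirm; the names VERSION_LINE and LISTING are made up for the example.

```python
import re

# Three dot-separated numbers at the start of a line, followed by the
# trailing slash. Only the parenthesised part is returned by findall.
VERSION_LINE = re.compile(r"^(\d+\.\d+\.\d+)/$", re.MULTILINE)

LISTING = "\n".join([
    "13.10.0/",
    "13.10.1/",
    "13.10.2/",
    "13.10.3/",
    "13.10.4.2/",
    "13.10.4.4/",
    "13.10.4.5/",
])

versions = VERSION_LINE.findall(LISTING)
print(versions)
```

The four-part entries are skipped because, after the third number, the pattern requires a slash but finds another dot.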
I think you want:
^\d+\.\d+\.\d+$
Which is exactly three groups of digits separated by literal dots.
Some tools (like grep) match all lines that contain your regex, and may have additional characters before/after.
Use the $ character to match the end of the line after your regex. (Also note that . matches any character, not a literal dot.)
[0-9]+\.[0-9]+\.[0-9]$
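The remark about the dot can be checked quickly. The sketch below, in Python for illustration, compares the unescaped and escaped forms against a string that contains no dots at all; the variable names are hypothetical.

```python
import re

loose = re.compile(r"[0-9]+.[0-9]+.[0-9]")     # . matches any character
strict = re.compile(r"[0-9]+\.[0-9]+\.[0-9]")  # \. matches only a literal dot

# A string with letters where the dots should be.
loose_hit = loose.fullmatch("13x10y0") is not None
strict_hit = strict.fullmatch("13x10y0") is not None
strict_ok = strict.fullmatch("13.10.0") is not None

print(loose_hit, strict_hit, strict_ok)
```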

How to extract file location using Regular Expressions(VB.NET)

I am facing a problem whereby I am given a string that contains a path to a file and the file's name and I only want to extract the path (without the file's name)
For example, I will receive something like
C:\Users\OopsD\Projects\test.acdbd
and from that string I want to extract only
C:\Users\OopsD\Projects
I was trying to create a RegEx to match a backslash followed by a word, followed by a dot followed by another word - this is to match the
\test.acdbd
part and replace it with empty string so that the final result is
C:\Users\OopsD\Projects
Can anyone, familiar with RegEx, help me on this one? Also, I will be using regular expressions quite a lot in the future. Is there a (free) program I can download to create regular expressions?
Are you really sure you need to be using Regex for such a simple task? How about this:
Dim file As New IO.FileInfo("C:\Users\OopsD\Projects\test.acdbd")
MsgBox(file.Directory.FullName)
Regarding the free program for Regex, I would definitely recommend http://www.gskinner.com/RegExr/; I use it all the time. But you always have to consider alternatives before going the Regex way.
The regex that you are looking for is as below. It matches the final backslash and the file name after it, which you then replace with an empty string:
\\[^\\]+$
where,
^ (caret): Outside a character class, matches at the start of the string the regex pattern is applied to. Matches a position rather than a character. Most regex flavors have an option to make the caret match after line breaks (i.e. at the start of a line in a file) as well. Inside a character class, as in [^\\] here, a leading caret instead negates the class, so [^\\] matches any character that is not a backslash.
$ (dollar): Matches at the end of the string the regex pattern is applied to. Matches a position rather than a character. Most regex flavors have an option to make the dollar match before line breaks (i.e. at the end of a line in a file) as well. Also matches before the very last line break if the string ends with a line break.
+ (plus): Repeats the previous item once or more. Greedy, so as many items as possible will be matched before trying permutations with fewer matches of the preceding item, up to the point where the preceding item is matched only once.
More reference material can be found at this link.
Many regex tools are available. Some of them are:
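For comparison, here is a minimal sketch of both approaches from this thread, written in Python rather than VB.NET purely for illustration. It assumes the backslash-separated Windows path from the question, so the character class excludes a backslash; the name strip_filename is hypothetical.

```python
import re
from pathlib import PureWindowsPath

PATH = r"C:\Users\OopsD\Projects\test.acdbd"


def strip_filename(path):
    """Remove the last backslash and everything after it."""
    return re.sub(r"\\[^\\]+$", "", path)


via_regex = strip_filename(PATH)

# Non-regex alternative, in the spirit of the FileInfo suggestion above.
via_pathlib = str(PureWindowsPath(PATH).parent)

print(via_regex)
print(via_pathlib)
```

Both give the directory part; the library call is usually easier to read than the regex for a task like this.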
www.gskinner.com/RegExr/
www.txt2re.com
Rubular- It is not just for Ruby.

Notepad++ Regex: Find all 1 and 2 letter words

I’m working with a text file with 200.000+ lines in Notepad++. Each line has only one word. I need to strip out and remove all words which contain only one letter (e.g.: I) and words which contain only two letters (e.g.: as).
I thought I could just paste in a regular regex like this [a-zA-Z]{1,2} but it does not recognize anything (I’m trying to Mark them).
I’ve done a manual search and I know that words of that length do exist, so it can only be my regex that’s wrong. Does anyone know how to do this in Notepad++?
Cheers,
- Mestika
If you want to remove only the words but leave the lines empty, this works:
^[a-zA-Z]{1,2}$
Replace this with an empty string. ^ and $ are anchors for the beginning and the end of a line (because Notepad++'s regexes work in multi-line mode).
If you want to remove the lines completely, search for this:
^[a-zA-Z]{1,2}\r\n
And replace with an empty string. However, this won't work before Notepad++ 6, so make sure yours is up-to-date.
Note that you will have to replace \r\n with the specific line-endings of your file!
As Tim Pietzker suggested, a platform independent solution that also removes empty lines would be:
^[a-zA-Z]{1,2}[\r\n]+
A platform-independent solution that does not remove empty lines but only those with one or two letters would be:
^[a-zA-Z]{1,2}(\r\n?|\n)
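The behaviour of that last pattern can be tried outside Notepad++ as well. The following is a minimal Python sketch, assuming the word list is available as one string; re.MULTILINE is needed in Python so that ^ matches at every line start, and the name drop_short_words is made up for the example.

```python
import re

# One- or two-letter word filling a whole line, plus its line ending
# (\r\n, lone \r, or \n).
SHORT_WORD_LINE = re.compile(r"^[a-zA-Z]{1,2}(\r\n?|\n)", re.MULTILINE)


def drop_short_words(text):
    """Delete lines consisting only of a one- or two-letter word."""
    return SHORT_WORD_LINE.sub("", text)


unix_result = drop_short_words("I\nas\nword\nto\nhello\n")
windows_result = drop_short_words("I\r\nword\r\nas\r\n")
keeps_empty = drop_short_words("a\n\nbb\nccc\n")

print(repr(unix_result), repr(windows_result), repr(keeps_empty))
```

Note that a short word on the final line is only removed if that line ends with a line break, since the pattern requires one.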
I don't use Notepad++ but my guess is it could be because you have too many matches - try including word boundaries (your exp will match every set of 2 letters)
\b[a-zA-Z]{1,2}\b
The regex you specified should find 1-or-2 characters (even in Notepad++'s Find dialog), but not in the way you'd think. You want the regex to make sure it starts at the beginning of the line and ends at the end, with ^ and $ respectively:
^[a-zA-Z]{1,2}$
Notepad++ version 6.0 introduced the PCRE engine, so if this doesn't work in your current version try updating to the most recent.
You seem to use the version of Notepad++ that doesn't support explicit quantifiers: that's why there's no match at all (as { and } are treated as literals, not special symbols).
The solution is to use their somewhat more lengthy replacement:
\w\w?
... but that's only part of the story, as this regex will match any symbol, and not just short words. To do that, you need something like this:
^\w\w?$

Regex to parse file paths

I have this text:
Unexpected error creating debug information file
'c:\Users\Path1\Path2\Strategies\Path3\CustomStrategy.PDB' --
'c:\Users\Path1\Path2\Strategies\Path3\CustomStrategy.pdb: The system
cannot find the path specified.
I need to parse out the file paths c:\Users\Path1\Path2\Strategies\Path3 or c:\Users\Path1\Path2\Strategies\Path3\CustomStrategy.PDB, whatever is easier. I tried to use the following Regex
\w:.+[.]\w{3}
But this regex doesn't stop at the first file extension and continues to match the second instance of the path, stopping at the second instance of .pdb, thus putting both file paths in one regex match.
What do I need to change in order for the regex to parse the two paths as two separate matches? Thanks.
Non-greedy re:
\w:.+?[.]\w{3}
Note ? after +.
Also, if your path contains no dots except the last one, you can write it so:
\w:[^.]+[.]\w{3}
If you are not sure that the extension consists of three letters, you must specify the range:
\w:[^.]+[.]\w{1,3}
And when you are not sure that your path has extension at all, but it contains no spaces, then:
\w:\S+
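The difference between the greedy and non-greedy versions can be seen with a short Python sketch. It assumes the error message arrives as a single line (the line breaks in the question look like wrapping), and MESSAGE is a made-up name for that string.

```python
import re

# The error text from the question, joined into one line.
MESSAGE = (
    "Unexpected error creating debug information file "
    r"'c:\Users\Path1\Path2\Strategies\Path3\CustomStrategy.PDB' -- "
    r"'c:\Users\Path1\Path2\Strategies\Path3\CustomStrategy.pdb: The system "
    "cannot find the path specified."
)

greedy = re.findall(r"\w:.+[.]\w{3}", MESSAGE)   # original attempt
lazy = re.findall(r"\w:.+?[.]\w{3}", MESSAGE)    # note the ? after +

print(len(greedy), len(lazy))
for path in lazy:
    print(path)
```

The greedy pattern returns one long match covering both paths, while the non-greedy one stops at the first extension and returns the two paths separately.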
What about this
\w:\\(?:[^\\\s]+\\)+
See it here on Regexr
\w:\\ matches a word character, a : and a backslash
(?:[^\\\s]+\\)+ matches the directories, non-backslash or non whitespace characters till a backslash, and this repeated.
So, for both paths this would match the directory part, c:\Users\Path1\Path2\Strategies\Path3\ (including the trailing backslash). It works as long as the directory names do not contain spaces.
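A quick way to try the directory-only pattern is sketched below in Python. The message is again assumed to be a single line, MESSAGE and directories are hypothetical names, and the trailing backslash that the pattern includes is stripped afterwards.

```python
import re

# The error text from the question, joined into one line.
MESSAGE = (
    "Unexpected error creating debug information file "
    r"'c:\Users\Path1\Path2\Strategies\Path3\CustomStrategy.PDB' -- "
    r"'c:\Users\Path1\Path2\Strategies\Path3\CustomStrategy.pdb: The system "
    "cannot find the path specified."
)

# Drive letter, colon, backslash, then one or more "name\" segments that
# contain neither backslashes nor whitespace.
DIRECTORY = re.compile(r"\w:\\(?:[^\\\s]+\\)+")

# Each raw match ends with a backslash; rstrip removes it.
directories = [m.rstrip("\\") for m in DIRECTORY.findall(MESSAGE)]
print(directories)
```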
Actually, here you may as well do without regex at all.
Split the text by ' and use the second part.
As for regex, I would use something more complicated, but allowing to catch other filenames, not just those ending with a 3-letter extension:
'([a-z]:(?:[\\/][^\\/]*)+?)' --
(and use first subpattern from the match)