How to search for files with a specified extension in Windows File Explorer?

We can search for files in Windows 7 or later using the search box in File Explorer.
(I don't have image-uploading privileges; I mean the top-right area of the File Explorer window.)
When I search for MATLAB files using "*.m", it not only returns *.m files but also *.mp3 and *.mp4 files. Is there any way to show *.m files exclusively?
Thanks!

I assume you used the quotation marks here just to show the text you typed, because ironically that is exactly how it should work: put the search term in quotation marks...
so
*.m
finds .mp3 as well as .m but
"*.m"
should only find the .m files. Alternatively you could also write
ext:".m"
which guarantees that only the extension is searched. (Although I am not sure this is ever necessary here: Windows filenames can contain dots, and files can exist without extensions, but I am not sure a file can have both a dot in its name and no extension at the same time.)
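If Explorer's search still misbehaves, the same strict filtering is easy from PowerShell (a minimal sketch; the path is hypothetical and -File needs PowerShell 3.0 or later):
Get-ChildItem -Path C:\projects -Recurse -File | Where-Object { $_.Extension -eq '.m' }
Comparing against $_.Extension exactly means .mp3 and .mp4 can never match.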

Using the following
"*.m"
will solve your problem. You can find more information on the search syntax on MSDN at the following link: Advanced Query Syntax

Beyond that, you can also take advantage of the wildcard character *.
For example, if you want to search for a file with a name ending with 024 or starting with 024, you can type *024.* or 024*.* respectively into the search box.
Here the * after the . matches files with any extension; if you want a particular extension, specify it, like 024.png.

Explorer does not support searching with regular expressions.
You can use PowerShell instead of Windows Explorer;
for example, where '(?i)Out' is a regex:
Get-ChildItem -Path e:\temp -Recurse -File | Where-Object { $_.Name -match '(?i)Out' }
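To answer the original question with this approach, anchor the regex on the end of the name so .mp3 and .mp4 files can't sneak in (a sketch; adjust the path as needed):
Get-ChildItem -Path e:\temp -Recurse -File | Where-Object { $_.Name -match '\.m$' }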

Alternatively, you can simply search for your extension like this:
.extension
For example, typing .exe will give you all the files with the .exe extension in a folder.
PS: Typing .xml OR .vmcx will give you both types of files. This is useful if you want to make an archive of different kinds of files stored in different folders or locations.

You can get close to proper regex support with the mostly awesome Cygwin, and as a bonus you get nearly every Linux tool running natively on Windows. Windows search, by contrast, still doesn't know that .* means "zero or more of anything", or that ^ means the start of a line (and $ the end), so some things stay weird, and a startlingly large bunch of weird corner cases that only deranged Perl programmers would notice fail the test.
Windows search gets so many other things wrong, but Cygwin is more workable than anything built into any Windows OS, plus you get perl, grep, diff, wget, curl, etc. -- the whole GNU toolset for free.
If you want a full-on bash shell with proper respect for regex, install the super neat-o Bash on Windows 10 (WSL).
Either will do what you want. And they're a billion times faster than that stupid search bar that takes off at 100 mph and then crawls to 1 pixel per 10 minutes near the end.
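For example, once Cygwin or Bash on Windows is installed, either of these confines the match to the .m extension (run from the folder you want to search):
find . -type f -name '*.m'
find . -type f | grep '\.m$'
The first uses a plain glob; the second pipes the file list through grep, where \.m$ anchors the match to the end of the path.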

Related

List only files but not directories using list.files

How can I list only files, but not directories using list.files (not recursively)? It has an include.dirs argument, but this is ignored when not being used recursively.
I had been thinking something like
list.files(path=myDir, pattern="[^/]$")
but that doesn't seem to work, nor do a few variations on it. Is there a regex I can plug in here, or a function? I know I can do list.dirs and take a setdiff, but this is already slow enough; I want this to be quicker.
PS: currently on linux, but need something that works cross-platform.
PPS: file.info is really slow, so I think that is also not going to work.
PPPS: It doesn't need to be list.files, that is just the function I had thought should do it.
Consider this regex pattern, which matches any file name made up of letters or numbers and containing a dot extension (it leaves out subdirectories, but unfortunately also files without extensions):
# WITH ANCHORING
files <- list.files(path, pattern=("[a-zA-Z0-9]*[.][a-zA-Z0-9]*$"))
# MATCHING LETTER AND/OR NUMBER FILES WITH EXTENSION
files <- list.files(myDir, pattern="[a-zA-Z0-9]*[.]")
# ANY FILE NAME CONTAINING A DOT (a leading glob-style "*" is invalid in a regex)
files <- list.files(myDir, pattern="[.]")
Some other regex variations to catch files with periods (note these also get directories with periods and miss files with no extensions)
list.files(pattern="\\..+$")
list.files(pattern="\\.[[:alnum:]]+$")
And using system2 with ls seems to work pretty well (thanks @42- from the comments):
system2("ls", args=c("-al", "|", "grep", "^-"))
should get only regular files (including ones without extensions), or
system2("ls", args=c("--classify"))
should return files with directories having a "/" appended so they can be determined.
For an alternative open-source solution, consider this Python approach, which explicitly tests whether each item is a file and, by using os.path.join(), is agnostic to the OS platform:
import os
files = [f for f in os.listdir(myDir) if os.path.isfile(os.path.join(myDir, f))]
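A base-R alternative is dir.exists(), which is vectorized and avoids file.info() entirely (a sketch, assuming R >= 3.2.0, where dir.exists() was introduced); it also keeps files that have no extension:
files <- list.files(myDir, full.names = TRUE)
files <- files[!dir.exists(files)]   # drop directories, keep regular files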

Applescript to extract the Digital Object Identifier (DOI) from a PDF file

I looked for an AppleScript to extract the DOI from a PDF file, but could not find one. There is enough information available on the actual format of the DOI (i.e. the regular expression), but how could I use this to get the identifier from the PDF file?
(It would be no problem if some external program were used, such as Hazel.)
If you're ok with using an app, I'd recommend Skim. Good AppleScript support. I'd probably structure it like this (especially if the document might be large):
set DOIFound to false
tell application "Skim"
    set pp to pages of document 1
    repeat with p in pp
        set t to text of p
        -- look for the DOI here and set DOIFound to true
        if DOIFound then exit repeat -- if it's not found, then use the URL?
    end repeat
end tell
I'm assuming a DOI would always appear on a single page (not spread across two). It looks like they are invariably (?) on the first page of an article, which would make this quick of course, even with a large document.
[edit]
Another way would be to get the Xpdf OSX binaries from http://www.foolabs.com/xpdf/download.html and use pdftotext in the command line (just tested this; it works well) and parse the text using AppleScript. If you want to stay in AppleScript, you can do something like:
do shell script "path/to/pdftotext 'path/to/pdf/file.pdf'"
which outputs a file with a .txt extension in the same directory -- you then parse that for the DOI.
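pdftotext can also write to standard output ("-" as the output file), so the extraction can happen in one shell call. A sketch, reusing the placeholder path from above and a deliberately loose DOI pattern (real DOIs have the form 10.<registrant>/<suffix>, but the exact allowed character set varies):
do shell script "pdftotext -q 'path/to/pdf/file.pdf' - | grep -o -m 1 '10\\.[0-9]\\{4,9\\}/[^[:space:]]*'"
The doubled backslashes are AppleScript string escapes; grep's -o prints only the matched text and -m 1 stops at the first hit.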
Have you tried pdfgrep? It works really well on the command line:
pdfgrep -n --max-count 1 --include "*.pdf" "DOI"
I have no idea how to build an AppleScript for it though, but I would be interested in one as well, so that if I drop a PDF into a folder it automatically extracts the DOI and renames the file with the DOI in the filename.

Text editor that searches within search results

Does anyone know of a text editor that searches within search results using regex?
I would like to perform a regex search on several text files and get a list of matches and then apply another regex search on the search results to further narrow down results. I would prefer a Windows GUI editor rather than a specialized editor with a steeper learning curve like Vim or Emacs.
You might want to look at PowerGrep. It's not exactly a text editor, but you can open files containing your search results within its built-in text editor, and edit stuff there.
The main thing though is that it allows you to search using a regex (or list of regexes), then apply an additional regex to each search result, before returning a 'final' result, which I believe is what you are asking for. Kind of hard to explain, but maybe you get the idea.
The only problem with PowerGrep is that its UI is not very good. To say it takes some getting used to is an understatement. But once you figure it out, you can do a lot of powerful stuff (search/replace, data collection, etc on multiple files whose file names can also be regexes).
The companion product EditPadPro by the same company is also a great editor that has a really good regex engine built-in (probably the same one as in PowerGrep), but it doesn't allow you to do the 'regex-applied-to-a-regex-result' that I think you are asking for.
Do you want a list of files in which the text matches both regexps, or a list of lines?
In the first case you can do:
{ grep -l -R 'pattern1' * ; grep -l -R 'pattern2' * ; } | sort | uniq -d
Note that with Windows you can get those binaries from GnuWin32 and use nearly the same syntax in a batch file:
( grep -l -R "pattern1" *
grep -l -R "pattern2" *
) | sort | uniq -d
In the latter case, with vim, you can use my answer about narrowing quickfix results with a regexp.
Of course you can also copy your search results to a buffer and do some linewise filtering.
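Outside vim, the line-wise case is just one grep piped into another (a sketch):
grep -n -R 'pattern1' * | grep 'pattern2'
The first grep prints matching lines with file:line prefixes; the second keeps only the lines that also match the second pattern.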

Doing a 'diff/st' and ignoring the first line if it matches a specific criterion

In a repository for a well known open source project, all files contain a version string with a timestamp as their first line:
<?php // $Id: index.php,v 1.201.2.10 2009-04-25 21:18:24 stronk7 Exp $
Even if I don't really understand why they do this (the files are already under version control), I have to live with it.
The main problem is that if I try to 'st' or 'diff' a release to get an idea of what was changed from the previous one, every single file contained in the repository is obviously marked as modified and the diffs become unreadable and unmanageable.
I'm wondering if there's a way to ignore the first line when doing a diff/st, whenever it matches a regexp.
The project is under cvs - cvs, yes, you've read correctly - and included in a bigger mercurial repository.
I don't know about cvs, but with hg you can use any external diff tool with the bundled extdiff extension, and any modern tool should have the ability to let you ignore diffs that match certain patterns.
I swear by Beyond Compare, which allows arbitrary syntax definition.
kdiff3 has preprocessor commands that you can pipe the input through.
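For the mercurial route, a minimal .hgrc sketch wiring GNU diff through extdiff (the alias name iddiff is arbitrary):
[extensions]
extdiff =

[extdiff]
cmd.iddiff = diff
opts.iddiff = -u --ignore-matching-lines=[$]Id:
After that, hg iddiff behaves like hg diff but skips lines matching the $Id: pattern.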
If you try
man diff
you'll find
--ignore-matching-lines=RE Ignore changes whose lines all match RE.
search "ignore matching lines" on the web gives examples :
diff --unified --recursive --new-file \
  --ignore-matching-lines='[$]Author.*[$]' \
  --ignore-matching-lines='[$]Date.*[$]' ...
(http://www.cygwin.com/ml/cygwin-apps/2005-01/msg00000.html)
Thus try:
diff --ignore-matching-lines='[<][?]php [/][/] [$]Id:'
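Applied to two release trees, that would look like this (the directory names are hypothetical):
diff --unified --recursive --ignore-matching-lines='[<][?]php [/][/] [$]Id:' release-1.0/ release-1.1/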

Whitespace-free path to My Documents

In building a C++ project with the GNU toolchain, make tells me:
src/Adapter_FS5HyDE.d:1: *** multiple target patterns. Stop.
Search, search, search, and I found out that make thinks it has multiple targets because the path to my included headers has spaces in it. If you've got your headers stored in some sane place like C:\Program Files, you can take care of this by using the old DOS paths (e.g. C:\PROGRA~1). However, when you have your headers in a truly insane place like My Documents, you can't get around the problem with MY DOC~1 because there's still a space.
Any idea how to tell my compiler to look in My Documents for headers without make confusing the path as two objects?
(Note: Feel free to throw tomatoes at me for putting header files in My Documents if you'd like, but there is a little rationale for doing that which I don't feel like explaining. If the solution to this question is easy, I'd rather keep it the way it is.)
You can figure out what the old path is by doing a DIR /X in your command prompt.
Or, most of the time you can fake it: take the first six characters, minus any spaces, plus ~1 and the extension (8.3 paths won't have spaces).
Or, you can use quotes: "C:\Documents and Settings\Administrator\My Documents".
I don't know about make specifically, but the normal way around this is to put quotes around the path, i.e.
cd "C:\Program Files\"
does that work?
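In the makefile itself, quoting only helps in recipe lines, because those are parsed by the shell rather than by make (make's own target/prerequisite parsing is a different story). A minimal sketch with a hypothetical header location (note the recipe line must start with a tab):
INCDIR = C:/Documents and Settings/Administrator/My Documents/headers

main.o: main.cpp
	g++ -I"$(INCDIR)" -c main.cpp -o main.o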
Side note: the short name (8.3) for the same folder might not be the same on different OS installations. Thus, you can't be sure that C:\Program Files will always be C:\PROGRA~1.
Short names can't contain spaces in them either, so the usual short name for My Documents is MYDOCU~1, not MY DOC~1.
You can find the exact short name for any folder or file (including My Documents) using dir /x <filename>.
If you are using the GNU toolchain from Windows command line (cmd.exe), you should be able to use quotes (") around the folder/file names to work around this problem.
For some folders, including My Documents, you can specify an alternative location. To do this, right-click the folder, select Properties, select Location tab, and away you go. I use this to put my downloads and music on another drive (D:).
Write a wrapper script (e.g. a batch file) to translate the path names to short form.
I have a script "runwin" that does stuff like this: instead of, e.g., gcc <args> I can call runwin gcc <args>;
runwin will make heuristic guesses as to which arguments are filename paths and translate them, then call gcc on the resulting string of arguments.
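The translation itself is easy in a batch file via the %~s modifier; a minimal sketch (the path is just an example):
@echo off
rem %%~sA expands the loop variable to its 8.3 short path
for %%A in ("C:\Documents and Settings\Administrator\My Documents") do echo %%~sA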