Does anyone know of a text editor that searches within search results using regex?
I would like to perform a regex search on several text files and get a list of matches and then apply another regex search on the search results to further narrow down results. I would prefer a Windows GUI editor rather than a specialized editor with a steeper learning curve like Vim or Emacs.
You might want to look at PowerGrep. It's not exactly a text editor, but you can open files containing your search results within its built-in text editor, and edit stuff there.
The main thing though is that it allows you to search using a regex (or list of regexes), then apply an additional regex to each search result, before returning a 'final' result, which I believe is what you are asking for. Kind of hard to explain, but maybe you get the idea.
The only problem with PowerGrep is that its UI is not very good. To say it takes some getting used to is an understatement. But once you figure it out, you can do a lot of powerful stuff (search/replace, data collection, etc on multiple files whose file names can also be regexes).
The companion product EditPadPro by the same company is also a great editor that has a really good regex engine built-in (probably the same one as in PowerGrep), but it doesn't allow you to do the 'regex-applied-to-a-regex-result' that I think you are asking for.
Do you want a list of files whose text matches both regexps, or a list of lines?
In the first case you can do:
{ grep -l -R 'pattern1' * ; grep -l -R 'pattern2' * ; } | sort | uniq -d
Note that with Windows you can get those binaries from GnuWin32 and use nearly the same syntax in a batch file:
( grep -l -R "pattern1" *
grep -l -R "pattern2" *
) | sort | uniq -d
In the latter case, with Vim you can use my answer on narrowing quickfix results with a regexp.
Of course you can also copy your search results to a buffer and do some linewise filtering.
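If it is the list of lines you are after, the "search within search results" idea can be sketched by piping one grep into another. This is only a minimal sketch and not part of the answers above: the helper name narrow and the sample files are made up, and the second pattern also sees the file:line: prefix that the first grep adds.

```shell
# narrow FIRST SECOND DIR  (hypothetical helper name)
# Print the lines under DIR that match FIRST, keeping only those
# that also match SECOND.
narrow() {
    # Run inside DIR so the reported paths are short and relative.
    # -r recurse, -n show line numbers, -E extended regex.
    ( cd "$3" && grep -rnE -- "$1" . | grep -E -- "$2" )
}

# Demo on throwaway sample files (made up for this example).
demo=$(mktemp -d)
printf 'error: disk full\nerror: timeout\nok\n' > "$demo/a.log"
printf 'warning: timeout\nerror: timeout on db\n' > "$demo/b.log"
narrow '^error' 'timeout' "$demo" | sort
rm -rf "$demo"
```

More stages can be added by appending further grep calls to the pipe, each one narrowing the previous result.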
Having some trouble figuring out the command line for the following issue and hoping you guys can help!
Basically, I have a folder which contains ~1000 PDFs. I need to search through every PDF and return the file names of the PDFs that match certain words X number of times.
For example, I have 10 PDF's which all contain the word "Fragile". I would like to return a list of all files that contain "Fragile" a minimum of 3 times throughout the PDF.
I am currently using pdfgrep and giving it a regex to look for, but it returns every file that matches at least once. I have seen a few recommendations out there for piping the command into "awk", but I'm not sure what this really does...
Don't know much about pdfgrep, but if the output is like on https://pdfgrep.org/ it should be fairly easy to get the number of the lines in the output, doing something like:
for f in *.pdf; do if [ "$(pdfgrep -nHm 10 "Fragile" "$f" | wc -l)" -gt 2 ]; then echo "$f"; fi; done
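Note that the loop above counts matching lines, not occurrences. To show the counting idea on its own, here is a sketch on plain text files using grep -o, which prints every match on a separate line. For PDFs the same threshold test would have to wrap pdfgrep (or text extracted from the PDF) instead; that part is an assumption and is not shown. The helper name files_with_at_least is hypothetical.

```shell
# files_with_at_least WORD MIN FILE...  (hypothetical helper name)
# Print each FILE in which WORD occurs at least MIN times.
files_with_at_least() {
    word=$1 min=$2
    shift 2
    for f in "$@"; do
        # grep -o prints every match on its own line, so wc -l counts
        # occurrences rather than matching lines; tr strips wc's padding.
        n=$(grep -o -- "$word" "$f" | wc -l | tr -d ' ')
        if [ "$n" -ge "$min" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Demo with made-up sample files.
demo=$(mktemp -d)
printf 'Fragile Fragile\nFragile\n' > "$demo/three.txt"
printf 'Fragile\n' > "$demo/one.txt"
files_with_at_least Fragile 3 "$demo"/*.txt
rm -rf "$demo"
```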
Within a particular directory, I have a series of files that are labelled sequentially:image0000.png, image0001.png, image0002.png, etc.. They are labelled by number, but I don't necessarily know how many preceding zeroes there are in the filename, i.e. whether it will be image0001.png or image00001.png.
Within a bash script, I wish to find a single file at a time (over a for loop), and then apply some processing to the file. This search could start at zero and end before I've reached the end, or could be of varying steps. To expand, I could want to find image0000.png, image0001.png, image0002.png and so forth, or I could start at image0010.png and find every other file, i.e. the next two would be image0012.png and image0014.png.
To try and find the first file (image0000.png), I've tried using find and ls, with the following outputs:
$ find video/figs/ -name 'image*[0]0.png'
video/figs/image00100.png
video/figs/image00000.png
$ ls video/figs/image*[0]0.png
-rw-r--r-- 1 user machine 165K Feb 19 09:06 video/figs/image00000.png
-rw-r--r-- 1 user machine 207K Feb 19 09:06 video/figs/image00100.png
Similar results occur when looking for the second file, i.e. find video/figs/ -name 'image*[0]1.png' finds both image00101.png and image00001.png. So it's finding the file I want (image00001.png), but also one that I don't (image00101.png). Can anyone help me understand why, and fix it?
I would use ls and grep for that:
ls | grep -oP '0*[1-9]+\.png'
Example:
$:/tmp/test$ ls
00001.png 00002.png 00010.png 00013.png 00201.png
$:/tmp/test$ ls | grep -oP '0*[1-9]+\.png'
00001.png
00002.png
00013.png
01.png
I suspect you don't want to dive into subdirectories, and find files, sorted by number, spread over subdirs.
So find isn't necessary.
ls image*{08..10}.png
image00010.png image0008.png image0009.png image0010.png image008.png image009.png
Part 2 of your question, only find every other file:
ls image*{08..10..2}.png
image00010.png image0008.png image0010.png image008.png
Maybe you know for-loops. It's like that,
for (i in 8 to 10 by 2)
or
for (int i=8; i <= 10; i+=2)
Restricting the search so that it finds image00010.png but not imageAB010.png wouldn't work.
The reason to exclude 101 is still unclear. Maybe it's only a sorting thing.
With directories, which aren't the PWD, there is no big difference:
ls video/figs/image*{08..10..2}.png
Note that instead of ls, you can directly use the program you want to run on the files, provided it can handle more than one file at a time, as ls can.
Sincere thanks to everyone who contributed an answer - perhaps I explained it poorly, or I was too wedded to the code I'd already written to use any of the provided answers. However, I've found the following solutions:
1) Why did I find more answers than I expected?
find video/figs/ -name 'image*[0]0.png' treats its argument as a shell glob pattern, not a regular expression: * matches any run of characters and [0] matches a single 0, so the pattern was interpreted as image<wildcard>00.png. There is no way, using the -name option, to restrict * to repetitions of a given character (in this case, zero or more occurrences of 0).
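This is easy to check without find, because a case statement uses the same kind of glob pattern as -name. A small sketch, using the pattern from my second attempt (the glob_match helper is made up for illustration):

```shell
# glob_match NAME PATTERN  (hypothetical helper)
# Succeeds if NAME matches the shell glob PATTERN.
glob_match() {
    case $1 in
        $2) return 0 ;;
        *)  return 1 ;;
    esac
}

for name in image00001.png image00101.png image00002.png; do
    # '*' matches any run of characters, so the "001" in image00101.png
    # is swallowed by it and the name matches like image00001.png does.
    if glob_match "$name" 'image*[0]1.png'; then
        echo "$name matches"
    else
        echo "$name does not match"
    fi
done
```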
2) How do I find the image files with an unknown number of padding zeroes?
The following is a MWE from my final code. It demonstrates how to search within a given directory SEARCH_DIR (not necessarily including subdirectories, but I haven't checked)
f1=0 # Starting number
f2=10 # End number
df=2 # number to skip between images
for ((f=$f1; f<=$f2; f=$f+$df)); do
    export iFile=$(find "$SEARCH_DIR" -regex '.*/image0*'$f'.png')
done
The export ensures the variable is available to sub-processes, and the iFile=$() syntax stores the output of the command in the variable iFile. The bit within the parentheses is the bit I was looking for: find $SEARCH_DIR -regex '.*/image0*'$f'.png'
a) find $SEARCH_DIR specifies the location for the search
b) -regex specifies to use regular expressions, which are more powerful than shell glob patterns and allow me to limit wildcards as required
c) '.*/image0*'$f'.png': -regex has to match the entire path, not just part of it, so I need the initial .*/ for the match to succeed. The 0* now behaves as I originally wanted: in a regular expression, * means zero or more repetitions of the preceding term, which here is 0 (so to allow zero or more of any digit, I would use [0-9]*). The $f term inserts the number of the file currently being looked for in the for loop. Strictly, the . before png matches any character; \. would match only a literal dot.
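For anyone who wants to try this on its own, here is a self-contained sketch of the same idea. The find_numbered helper and the sample file names are made up for the demo, and it assumes a find that supports -regex (GNU and BSD find both do).

```shell
# find_numbered DIR N  (hypothetical helper)
# Print the imageN.png file under DIR, whatever its zero padding.
find_numbered() {
    # -regex is tested against the whole path, hence the leading '.*/'.
    # '0*' allows any amount of zero padding; '\.' is a literal dot.
    find "$1" -regex ".*/image0*$2\.png"
}

# Demo directory with made-up files.
demo=$(mktemp -d)
touch "$demo/image0000.png" "$demo/image0002.png" \
      "$demo/image00102.png" "$demo/image0004.png"

f=0
while [ "$f" -le 4 ]; do
    # image00102.png is never printed: "1" is not part of the padding.
    find_numbered "$demo" "$f"
    f=$((f + 2))
done
rm -rf "$demo"
```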
We can search for files in Windows 7 or later using the following tool:
(I don't have the image uploading privilege. I mean the top-right area in Windows File Explorer.)
When I search for MATLAB files using "*.m", it not only returns *.m files, but also returns *.mp3, *.mp4 files. Is there any way to show *.m files exclusively?
Thanks!
I assume you used the quotation marks here only to show the text you typed, because ironically the way to make it work is to put the search term in quotation marks...
so
*.m
finds .mp3 as well as .m but
"*.m"
should only find the .m files. Alternatively you could also write
ext:".m"
which would guarantee that only extensions are searched. (Although I am not sure if this is ever necessary here, because while windows can have a dot in the filename and also can have files without extensions I am not sure if it is possible to have both at the same time.)
using the following
"*.m"
will solve your problem. You can find more information on the search syntax on MSDN under "Advanced Query Syntax".
Beyond that, you can also take advantage of the wildcard character *.
For example, if you want to search for a file with a name ending with 024 or starting with 024, you can type *024.* or 024*.* respectively into the search box.
Here the * after the . stands for any extension; if you want a particular one, give the extension explicitly, like 024.png.
Explorer doesn't have a function for finding with regex.
You need to use PowerShell instead of Windows Explorer.
For example, where '(?i)Out' is a regex:
Get-ChildItem -Path e:\temp -Recurse -File | Where-Object { $_.Name -match '(?i)Out' }
Alternatively, you can simply search for your extension like this:
.extension
E.g.:
Typing .exe will give you all the files with the .exe extension in a folder.
PS: Typing .xml OR .vmcx will give you both types of files. This is useful if you want to make an archive of different kinds of files stored in different folders or locations.
You can get close to proper regex support from the mostly awesome Cygwin, and as a bonus you get most every Linux tool running natively on Windows. But it still doesn't know that .* means "zero or more of anything", ^ means the start of a line (and $ the end), so some things are still weird.
And a startlingly large bunch of weird corner cases that only deranged perl programmers notice fail the test.
So many other things it gets wrong, but it's more workable than anything in any windows OS, plus you get perl, grep, diff, wget, curl, etc. -- the whole GNU lib for free.
If you want a full-on bash shell with proper respect for regex, install the super neat-o Bash for Windows 10.
Either will do what you want. And they're a billion times faster than that stupid search bar that takes off at 100 mph then crawls to 1 pixel per 10 minutes near the end.
I want to apply a certain regular expression substitution globally to about 40 Javascript files in and under a directory. I'm a vim user, but doing this by hand can be tedious and error-prone, so I'd like to automate it with a script.
I tried sed, but handling more than one line at a time is awkward, especially if there is no limit to how many lines the pattern might match.
I also tried this script (on a single file, for testing):
ex $1 <<EOF
gs/,\(\_\s*[\]})]\)/\1/
EOF
The pattern will eliminate a trailing comma in any Perl/Ruby-style list, so that "[a, b, c,]" will come out as "[a, b, c]" in order to satisfy Internet Explorer, which alone among browsers, chokes on such lists.
The pattern works beautifully in vim but does nothing if I run it in ex, as per the above script.
Can anyone see what I might be missing?
You asked for a script, but you mentioned that you are a Vim user. I tend to do project-wide find and replace inside of Vim, like so:
:args **/*.js | argdo %s/,\(\_\s*[\]})]\)/\1/ge | update
This is very similar to the :bufdo solution mentioned by another commenter, but it will use your args list rather than your buflist (and thus doesn't require a brand new vim session nor for you to be careful about closing buffers you don't want touched).
:args **/*.js - sets your arglist to contain all .js files in this directory and subdirectories
| - pipe is vim's command separator, letting us have multiple commands on one line
:argdo - run the following command(s) on all arguments. it will "swallow" subsequent pipes
% - a range representing the whole file
:s - substitute command, which you already know about
:s_flags, ge - global (substitute as many times per line as possible) and suppress errors (i.e. "No match")
| - this pipe is "swallowed" by the :argdo, so the following command also operates once per argument
:update - like :write but only when the buffer has been modified
This pattern will obviously work for any vim command which you want to run on multiple files, so it's a handy one to keep in mind. For example, I like to use it to remove trailing whitespace (%s/\s\+$//), set uniform line-endings (set ff=unix) or file encoding (set fileencoding=utf8), and retab my files.
1) Open all the files with vim:
bash$ vim $(find . -name '*.js')
2) Apply substitute command to all files:
:bufdo %s/,\(\_\s*[\]})]\)/\1/ge
3) Save all the files and quit:
:wall
:q
I think you'll need to recheck your search pattern, it doesn't look right. I think where you have \_\s* you should have \_s* instead.
Edit: You should also use the /ge options for the :s... command (I've added these above).
You can automate the actions of both vi and ex by passing the argument +'command' from the command line, which enables them to be used as text filters.
In your situation, the following command should work fine:
find /path/to/dir -name '*.js' -exec ex -s +'%s/,\(\_\s*[\]})]\)/\1/ge' +'wq' {} \;
You can use a combination of the find command and sed:
find /path -type f -iname "*.js" -exec sed -i.bak 's/,[ \t]*]/]/g' {} +
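The sed command above works line by line, so it misses a comma whose closing bracket sits on the next line. One way around that, sketched here on the assumption that GNU sed is available, is to pull the whole input into the pattern space first and then apply a POSIX-style version of the question's pattern. The helper name strip_trailing_commas is made up for this example, and like the original pattern it will also touch commas inside string literals.

```shell
# strip_trailing_commas  (hypothetical helper; reads stdin, writes stdout)
strip_trailing_commas() {
    # ':a' ... 'ba' keeps appending the next line (N) until the last one,
    # so the substitution sees the whole input, newlines included.
    # '[]})]' is a bracket expression matching ']', '}' or ')'.
    sed -e ':a' -e '$!{' -e 'N' -e 'ba' -e '}' \
        -e 's/,\([[:space:]]*[]})]\)/\1/g'
}

# Made-up sample input: one list closed on the next line, one inline.
printf 'var x = [1, 2,\n];\nf({a: 1,});\n' | strip_trailing_commas
```

To rewrite files in place, the same sed expressions could be passed to find -exec with -i.bak as in the command above.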
If you are on windows, Notepad++ allows you to run simple regexes on all opened files.
Search for ,\s*\] and replace with ]
should work for the type of lists you describe.
Is there any way to write a regex which can be used to find files with different extensions?
This works in Bash:
find . -regex '.*\.\(pdf\|chm\|doc\)'
Assuming you have a list of files and you are looking for .pdf, .chm and .doc, you can check it with:
\.pdf$|\.chm$|\.doc$
Regex above should work if you will check it against single filenames.
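If the regex flavour of your find is a concern, the same selection can be sketched with plain -name tests joined by -o, which should work in any POSIX find. The helper name find_by_ext and the sample files are made up for the demo.

```shell
# find_by_ext DIR  (hypothetical helper)
# List regular files under DIR ending in .pdf, .chm or .doc.
find_by_ext() {
    # The escaped parentheses group the -o (OR) alternatives.
    find "$1" -type f \( -name '*.pdf' -o -name '*.chm' -o -name '*.doc' \)
}

# Demo with made-up files; d.docx and e.txt should not be listed.
demo=$(mktemp -d)
touch "$demo/a.pdf" "$demo/b.chm" "$demo/c.doc" "$demo/d.docx" "$demo/e.txt"
find_by_ext "$demo" | sort
rm -rf "$demo"
```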
I'm sure there is, but the question you should be asking is "What's the best way to find files which have specific extensions?".
Regular expressions are not the best answer to every question.
I would suggest just getting a list of all files and passing them into a function like IsThisFileOneIWant(fileName,extensionList). That's far easier than trying to shoehorn the use of regular expressions into your problem.
Something like this should do it:
function IsThisFileOneIWant(fileName,extensionList):
    for each extension in extensionList:
        if fileName.endsWith (extension):
            return true
    return false
Done in pseudo-code since it should be simple enough to turn into any other language.
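As one possible rendering, here is roughly how that pseudo-code might look as a shell function. It is a sketch only: the name is_wanted_file and the sample file names are made up, and the extensions are passed as separate arguments rather than as a list.

```shell
# is_wanted_file FILENAME EXT...  (hypothetical name)
# Succeeds if FILENAME ends with one of the given extensions.
is_wanted_file() {
    name=$1
    shift
    for ext in "$@"; do
        case $name in
            # "$ext" is quoted, so it is compared literally, not as a glob.
            *"$ext") return 0 ;;
        esac
    done
    return 1
}

for f in report.pdf notes.txt "xx yy.sh"; do
    if is_wanted_file "$f" .pdf .sh; then
        printf '%s\n' "$f"
    fi
done
```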
If you must have a regex, it's going to look something like (based on the values in your question):
"ASPX$|ASCX$|\.js$|\.rpt$|\.xml$"
but it depends entirely on the RE engine that you want to use. For example, here's the output from an egrep command in my work directory:
pax#paxbox1:~/work$ ls -1 | egrep '\.sh$|\.c$'
backup0.sh
backup1.sh
eclipse.sh
monbt.sh
qq.c
qq.sh
xx yy.sh