grep through an explicit filelist

Stack,
We have many files in our library that were never used in subsequent projects. We are now at a development phase where we can do some good housekeeping and carefully remove unused library code. I am trying to optimize my grep command; its current implementation is quite slow.
grep --include=*.cpp --recursive --files-with-matches <library function name> <network path to subsequent projects>
The main reason is that the projects path is expansive and the bulk of the time is spent just navigating the directory tree and applying the file mask. This grep command is called many times on the same set of project files.
Rather than navigating the directory tree on every call, I would like grep to reference a static filelist stored on my local disk.
Something akin to this:
grep --from-filelist=c:\MyProjectFileList.txt
The MyProjectFileList.txt would be:
\\server1\myproject1\main.cpp
\\server1\myproject1\func1.cpp
\\server1\myproject2\main.cpp
\\server1\myproject2\method.cpp
Grep would apply the pattern-expression to the contents of each of those files. Grep output would be the fully qualified path of each project file that uses a specific library function.
Library functions for which grep returns no project files are extraneous and can be deleted.
How do you force grep to scan files from an external filelist stored in a text file?
(Thereby removing directory scanning.)

Experiment a little with the 'xargs' command and pipes ("|").

Try the following:
tr '\n' '\0' < list_of_files.txt | xargs -0 grep YOUR_GREP_ARGS_HERE
or in a Windows environment with Powershell installed try...
Get-Content List_of_files.txt | ForEach-Object { grep GREP_ARGS_HERE $_ }
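As a rough sketch of the file-list idea (not a tested recipe for the asker's network share), the snippet below builds a small throwaway file list and feeds it to grep through xargs. The temporary directory, the file names, and the function name lib_func are all made up for illustration. tr turns each newline into a NUL so that xargs -0 passes every path through verbatim, including spaces and backslashes; it assumes an xargs that supports -0, and a list with exactly one path per line.

```shell
# Hypothetical demo data: two small source files and a list that names them.
workdir=$(mktemp -d)
printf 'int main() { lib_func(); }\n' > "$workdir/main one.cpp"
printf 'int helper() { return 0; }\n' > "$workdir/func1.cpp"
printf '%s\n' "$workdir/main one.cpp" "$workdir/func1.cpp" > "$workdir/filelist.txt"

# Newline-separated list -> NUL-separated arguments -> grep.
# -l prints only the names of files that contain a match.
matches=$(tr '\n' '\0' < "$workdir/filelist.txt" | xargs -0 grep -l 'lib_func')
printf '%s\n' "$matches"

rm -rf "$workdir"
```

Because the list is read from a local file, no directory tree is walked; grep only opens the files that are named in the list.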

I googled for a Windows equivalent of xargs and found this:
FOR /F %k in (filelist.txt) DO grep yourgrepargs %k
(but I use linux, no idea if it works)

Related

Using xargs, eval, and mv ensemble

I've been using the command line more frequently lately to increase my proficiency. I've created a .txt file containing URLs for libraries that I'd like to download. I batch-downloaded these files using
$ cat downloads.txt | xargs wget
When using the wget command I didn't specify a destination directory. I'd like to move each of the files that I've just downloaded into a directory called "vendor".
For the record, it has occurred to me that if I ran...
$ open .
...I could drag-and-drop these files into the desired directory. But in my opinion that would defeat the purpose of this exercise.
Now that I have the files in my cwd, I'd like to be able to target them and move them into the "vendor" directory.
As a side-question: Is there a useful way to print the most recently created files to STDOUT? Currently, I can grab the filenames from the URLs within downloads.txt pretty simply using the following pipeline and Perl script...
$ cat downloads.txt | perl -n -e 'if (/(?<=\/)([-.a-z]+)$/) { print $1 . "\n" }'
This will produce...
react.js
redux.js
react-dom.js
expect.js
...which is great, as these are the files that I intended to target. I'd like to transform each of these lines into a command within a pipeline that resembles this...
$ mv {./,./vendor/}<filename>
... where <filename> is "react.js" then "redux.js", and so forth.
I figure that I may be able to accomplish this using some combination of xargs, eval, and mv. This is where my bash skills drop off.
Just to reiterate, I'm aware that the method in which I am approaching this problem is neither simple nor ideal. This is intentionally a convoluted exercise intended to stretch my bash knowledge.
Is there anyone who knows how I can use xargs, eval, and mv to accomplish this goal?
Thank you!

xargs -L 1 -a downloads.txt basename | xargs -I {} mv {} ./vendor
How this works: The first instance of xargs reads the file names from downloads.txt and calls basename for each of these file names individually (alternatively, you could use basename -a). These basenames are then piped to another instance of xargs, which uses the arguments to call mv, replacing the string {} with the current argument.
mv $(basename -a $(<downloads.txt)) ./vendor
How this works: Since you want to move all the files into the same directory, you can use a single call to mv. The command substitution $(...) (the modern form of backticks) inserts the output of the command basename -a, which, in turn, takes its arguments from the file via $(<downloads.txt).
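For comparison, here is a minimal sketch that does the same job with a plain read loop instead of xargs, which is a different technique from the two commands above. It uses the parameter expansion ${url##*/} in place of basename. The URLs, the temporary directory, and the empty placeholder files are assumptions made up so the example can run on its own.

```shell
# Hypothetical setup: a scratch directory that mimics the state after wget.
workdir=$(mktemp -d)
mkdir "$workdir/vendor"
printf '%s\n' 'https://example.com/libs/react.js' \
              'https://example.com/libs/redux.js' > "$workdir/downloads.txt"
touch "$workdir/react.js" "$workdir/redux.js"

# Read one URL per line, keep only the part after the last slash,
# and move that file into vendor/ if it exists.
while IFS= read -r url; do
  name=${url##*/}
  if [ -f "$workdir/$name" ]; then
    mv -- "$workdir/$name" "$workdir/vendor/"
  fi
done < "$workdir/downloads.txt"

moved=$(cd "$workdir/vendor" && echo *)
echo "$moved"

rm -rf "$workdir"
```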

Use [msys] bash to remove all files whose name matches a pattern, regardless of file-name letter-case

I need a way to clean up a directory, which is populated with C/C++ built-files (.o, .a, .EXE, .OBJ, .LIB, etc.) produced by (1) some tools which always create files having UPPER-CASE names, and (2) other tools which always create lower-case file names. (I have no control over the tools.)
I need to do this from a MinGW 'msys' bash.exe shell script (or bash command prompt). I understand piping (|), but haven't come up with the right combination of exec's yet. I have successfully filtered the file names, using commands like this example:
ls | grep '.\.[eE][xX][eE]'
to list all files having any case-combination of letters in the file-extension--this example gets all the executable (e.g. ".EXE") files.
(I'll be doing similar for .o, .a, .OBJ, .LIB, .lib, .MAP, etc., which all share the same directory as the C/C++ source files. I don't want to delete the source files, only the built-files. And yes, I probably should rework the directory structure, to use a separate directory for the built-files [only], but that will take time, and I need a quick solution now.)
How can I merge the above command with "something" else (e.g., like the 'rm -f' command???), to carry this the one step further, to actually delete [only] those filtered-out files from the current directory? (I'm hopeful for a solution which does not require a temporary file to hold the filtered file names.)

Adding this answer because the accepted answer is suggesting practices which are not-recommended in actual scripts. (Please don't feel bad, I was also on that track once..)
Parsing ls output is a NO-NO! See http://mywiki.wooledge.org/ParsingLs for more detailed explanation on why.
In short, ls separates the filenames with a newline, which can itself be present in a filename. (Plus, ls does not handle other special characters properly; it prints its output in human-readable form.) In Unix/Linux, it's perfectly valid to have a newline in a filename.
A Unix filename cannot contain a NUL character, though. Hence the command below should work.
find /path/to/some/directory -iname '*.exe' -print0 | xargs -0 rm -f
find: is a tool used to, well, find files matching the required pattern/criterion.
-iname: search using particular names, case insensitive. Note that the argument to -iname is wildcard, not regex.
-print0: Print the file names separated by NULL character.
xargs: Takes the input from stdin & runs the command supplied (rm -f in this case) on it. The input is separated by whitespace by default.
-0 specifies that the input is separated by null character.
Or, an even better approach:
find /path/to/some/directory -iname '*.exe' -delete
-delete is a built-in feature of find, which deletes the files found with the pattern.
Note that if you want to do some other operation, like move them to particular directory, you'd need to use first option with xargs.
Finally, this command find /path/to/some/directory -iname '*.exe' -delete would recursively find the *.exe files/directories. You can restrict the search to current directory with -maxdepth 1 & filetype to simple file (not directory, pipe etc.) using -type f. Check the manual link I provided for more details.
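The following sketch puts those options together for several extensions at once: -maxdepth 1 and -type f keep the search to regular files in one directory, and the parenthesized -o group lists the extensions. The directory and file names are invented for the demo, and it assumes a find that supports -iname, -maxdepth, and -delete (GNU and BSD find both do). Running the -print form first is a cheap way to preview what -delete would remove.

```shell
# Hypothetical build directory with mixed-case build outputs and one source file.
builddir=$(mktemp -d)
mkdir "$builddir/sub"
touch "$builddir/app.EXE" "$builddir/util.o" "$builddir/Lib.LIB" \
      "$builddir/main.c" "$builddir/sub/keep.exe"

# Preview: list what would be deleted (top level only, regular files only).
find "$builddir" -maxdepth 1 -type f \
  \( -iname '*.exe' -o -iname '*.o' -o -iname '*.lib' \) -print

# Same expression, but delete instead of print.
find "$builddir" -maxdepth 1 -type f \
  \( -iname '*.exe' -o -iname '*.o' -o -iname '*.lib' \) -delete

remaining=$(cd "$builddir" && echo *)    # top-level entries left over
nested=$(cd "$builddir/sub" && echo *)   # untouched because of -maxdepth 1

rm -rf "$builddir"
```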

Is this what you mean?
rm -f `ls | grep '.\.[eE][xX][eE]'`
Plain ls prints only the file names, so the command above is enough. If you use ls -l instead, the output has other fields that you have to strip out, such as the date, so you will want to print just the file name itself.
Try something like:
rm -f `ls -l | grep '.\.[eE][xX][eE]' | awk '{print $9}'`
where your file name is in the 9th field, like:
-rwxr-xr-x 1 Administrators None 283 Jul 2 2014 search.exe

You can use following command:
ls | grep '.\.[eE][xX][eE]' | xargs rm -f
Use of "xargs" would turn standard input ( in this case output of the previous command) as arguments for "rm -f" command.

Remove duplicate filename extensions

I have thousands of files named something like filename.gz.gz.gz.gz.gz.gz.gz.gz.gz.gz.gz
I am using the find command like this find . -name "*.gz*" to locate these files and either use -exec or pipe to xargs and have some magic command to clean this mess, so that I end up with filename.gz
Someone please help me come up with this magic command that would remove the unneeded instances of .gz. I had tried experimenting with sed 's/\.gz//' and sed 's/(\.gz)//' but they do not seem to work (or to be more honest, I am not very familiar with sed). I do not have to use sed by the way, any solution that would help solve this problem would be welcome :-)

one way with find and awk:
find $(pwd) -name '*.gz'|awk '{n=$0;sub(/(\.gz)+$/,".gz",n);print "mv",$0,n}'|sh
Note:
I assume there are no special characters (such as spaces) in your filenames. If there are, you need to quote the filenames in the mv command.
I added a $(pwd) to get the absolute path of found name.
You can remove the trailing |sh to check the generated mv commands first.
If everything looks good, add the |sh back to execute them.

You may use
ls a.gz.gz.gz |sed -r 's/(\.gz)+/.gz/'
or without the regex flag
ls a.gz.gz.gz |sed 's/\(\.gz\)\+/.gz/'

ls *.gz | perl -ne '/((.*?\.gz).*)/; print "mv $1 $2\n"'
It will print shell commands to rename your files, it won't execute those commands. It is safe. To execute it, you can save it to file and execute, or simply pipe to shell:
ls *.gz | ... | sh
sed is great for replacing text inside files.

You can do that with bash string substitution:
for file in *.gz.gz; do
    mv "${file}" "${file%%.*}.gz"
done

This might work for you (GNU sed):
printf '%s\n' *.gz | sed -r 's/^([^.]*)(\.gz){2,}$/mv -v & \1\2/e'

find . -name "*.gz.gz" |
while IFS= read -r f; do echo mv "$f" "$(sed -r 's/(\.gz)+$/.gz/' <<<"$f")"; done
This only previews the renaming (mv) command; remove the echo to perform actual renaming.
Processes matching files in the current directory tree, as in the OP (and not just files located directly in the current directory).
Limits matching to files that end in at least 2 .gz extensions (so as not to needlessly process files that end in just one).
When determining the new name with sed, makes sure that substring .gz doesn't just match anywhere in the filename, but only as part of a contiguous sequence of .gz extensions at the end of the filename.
Handles filenames with special chars. such as embedded spaces correctly (with the exception of filenames with embedded newlines.)

Using bash string substitution:
for f in *.gz.gz; do
    mv "$f" "${f%%.gz.gz*}.gz"
done
This is a slight modification of jaypal's nice answer (which would fail if any of your files had a period as part of its name, such as foo.c.gz.gz). (Mine is not perfect, either) Note the use of double-quotes, which protects against filenames with "bad" characters, such as spaces or stars.
If you wish to use find to process an entire directory tree, the variant is:
find . -name \*.gz.gz | \
while read f; do
    mv "$f" "${f%%.gz.gz*}.gz"
done
And if you are fussy and need to handle filenames with embedded newlines, change the while read to while IFS= read -r -d $'\0', and add a -print0 to find; see How do I use a for-each loop to iterate over file paths output by the find utility in the shell / Bash?.
But is this renaming a good idea? How was your filename.gz.gz created? gzip has guards against accidentally doing so. If you circumvent these via something like gzip -c $1 > $1.gz, buried in some script, then renaming these files will give you grief.
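If awkward filenames are a concern, one alternative to the read loop is to let find hand the names straight to a small inline sh script, so they never pass through a pipe at all. This is only a sketch under assumptions: the file names are invented, and the inner loop strips one trailing .gz at a time for as long as the name still ends in .gz.gz, which also copes with names such as foo.c.gz.gz. It does not check whether the target name already exists.

```shell
# Hypothetical test files, including one with a space and one with an inner dot.
dir=$(mktemp -d)
touch "$dir/a.gz.gz.gz" "$dir/foo.c.gz.gz" "$dir/with space.gz.gz" "$dir/single.gz"

# find passes the matching names as arguments ("$@") to the inline script.
find "$dir" -type f -name '*.gz.gz' -exec sh -c '
  for f in "$@"; do
    new=$f
    # Drop one ".gz" per iteration while the name still ends in ".gz.gz".
    while [ "${new%.gz.gz}" != "$new" ]; do
      new=${new%.gz}
    done
    mv -- "$f" "$new"
  done' sh {} +

renamed=$(cd "$dir" && echo *)
echo "$renamed"

rm -rf "$dir"
```

Replacing mv with echo mv inside the loop turns this into a preview, in the same spirit as the dry runs suggested in the other answers.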

Another way with rename:
find . -iname '*.gz.gz' -exec rename -n 's/(\.\w+)\1+$/$1/' {} +
When happy with the results remove -n (dry-run) option.

Howto: Searching for a string in a file from the Windows command line?

Is there a way to search a directory and its subdirectories' files for a string? The string is rather unique. I want to return the name of the file and hopefully the line that the string is on in that file. Is there anything built into Windows for doing this?

You're looking for the built-in findstr command.
The /S option performs a recursive search.

There is the find.exe command, but it's pretty limited in its capabilities. You could install Cygwin or Unxutils and use a pipeline including its Unix-style find and grep:
find . -type f | xargs grep unique-string
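A variant of that pipeline, sketched here under assumptions, also reports the line number that the question asks for. It uses find -exec instead of xargs so that paths containing spaces are passed intact; grep -n adds line numbers, and the extra /dev/null argument makes grep print the file name even when it is handed only one file. The directory layout and file contents are invented for the demo.

```shell
# Hypothetical directory tree with one file that contains the string.
root=$(mktemp -d)
mkdir -p "$root/sub dir"
printf 'alpha\nunique-string here\n' > "$root/sub dir/notes.txt"
printf 'nothing to see\n' > "$root/other.txt"

# Output format is path:line-number:matching-line.
hits=$(cd "$root" && find . -type f -exec grep -n 'unique-string' /dev/null {} +)
printf '%s\n' "$hits"

rm -rf "$root"
```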

script to add files to SVN with filters

My bash scripting is weak. I want to create a script that filters files and adds them to SVN.
So far I have this:
ls | egrep -v "(\.tab\.|\.yy\.|\.o$|\.exe$|~$)"
I tried to output it using exec but couldn't figure out how. Before that I checked whether svn add accepts a regex. I am not sure if it does, and I couldn't figure out how to invert the above without the -v (I tried "[^((\.tab\.|\.yy\.|\.o$|\.exe$|~$))]" but that didn't work as expected; it seems to only ignore .tab. files).
How do I create a script to add files to svn after applying a filter? Would this be the simplest way: use ls and grep, put the results into a bash array, then loop over it with svn add $element?
NOTE: This is on Linux. I don't think I'll have this running on Windows (I couldn't set up bison), so as long as it works on most Linux distros I am happy. Ignore the fact that the above uses .exe.

A number of ways:
Use backticks: svn add `ls | egrep stuff`
Use xargs: ls | egrep stuff | xargs svn add
Use find and xargs: find . -type f -name '*.c' -print | grep -v '\.svn' | xargs svn add
Obviously, change "stuff" and the "-name *.c" to suit your requirements...

Try using find.
find . -name .svn -prune -o -name '<pattern>' -exec svn add {} \;
The command following exec will be executed for each file and {} will be replaced with the filename at each iteration.
I'm not in front of my linux system so I can't get you a pattern that you need right now but if you read the man, you might get there.
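As a sketch of what such a find expression might look like for the filter in the question, the snippet below prunes .svn and excludes the same kinds of names as the egrep -v pattern. It is a dry run: echo stands in for the real command, so nothing is added to any repository, and the file names are invented. The final xargs splits on whitespace, which is fine for these demo names but would need -print0 and xargs -0 for names containing spaces.

```shell
# Hypothetical working copy with sources, generated files, and a .svn directory.
src=$(mktemp -d)
mkdir "$src/.svn"
touch "$src/.svn/entries" "$src/parser.y" "$src/parser.tab.c" "$src/lex.yy.c" \
      "$src/main.c" "$src/main.o" "$src/tool.exe" "$src/main.c~"

# Skip .svn entirely, then keep regular files that match none of the patterns.
dry_run=$(cd "$src" && find . -name .svn -prune -o -type f \
    ! -name '*.tab.*' ! -name '*.yy.*' ! -name '*.o' \
    ! -name '*.exe' ! -name '*~' -print |
  LC_ALL=C sort | xargs echo svn add)
echo "$dry_run"

rm -rf "$src"
```

Dropping the echo would run svn add on the surviving names.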

Another solution to this is to add those file extensions and the .svn folder to your SVN ignore pattern.
Armed with a client configured as such, you could then do svn add * and get only what you want into SVN.
