Search for files in a git repository by extensions - regex

I have a string like this *.{jpg,png} for example, but the string could also be just *.scss - in fact it is an editorconfig.
Then I want to search for every file of this extension which is tracked by my git repository.
I've tried several methods but didn't find any sufficient solution.
The closest one I've got is:
git ls-tree -r master --name-only | grep -E ".*\.jpg"
But this is only working for single file extensions not for something like this git ls-tree -r master --name-only | grep -E ".*\.{jpg,png}".
Could anyone help me?

Try this:
git ls-tree -r master --name-only | grep -E '.*\.(jpg|png)'
The expression you passed with the -E option is interpreted as any characters (.*), a literal dot (\.), and the literal string {jpg,png}. I guess you are confusing Bash brace expansion with alternation (|) inside a regular expression group (the parentheses).
Consider using the end-of-line anchor: '.*\.(jpg|png)$'.
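Since the pattern comes from an editorconfig, the conversion to a regular expression could be scripted. The following is only a minimal sketch: glob_to_ere is a hypothetical helper name, it handles just the two simple forms from the question (*.ext and *.{a,b}), and the file list is sample data standing in for the output of git ls-tree.

```shell
# Hypothetical helper: turn '*.{jpg,png}' or '*.scss' into an anchored ERE.
# Assumption: only the simple "*.<ext>" and "*.{a,b,...}" forms are handled.
glob_to_ere() {
  exts=${1#\*.}     # drop the leading "*."
  exts=${exts#\{}   # drop an opening brace, if present
  exts=${exts%\}}   # drop a closing brace, if present
  # Commas become alternation bars; the result is anchored at end of line.
  printf '\\.(%s)$\n' "$(printf '%s' "$exts" | tr ',' '|')"
}

# Sample list standing in for `git ls-tree -r master --name-only`.
files='img/a.jpg
img/b.png
css/site.scss
notes/jpg.txt'

matches=$(printf '%s\n' "$files" | grep -E "$(glob_to_ere '*.{jpg,png}')")
printf '%s\n' "$matches"
```

In real use, the printf that feeds grep would be replaced by the git ls-tree command from the question.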
Without grep
As @0andriy pointed out, you can pass patterns to git ls-files as follows:
git ls-files '*.jpg' '*.png'
Note that you should quote the arguments to prevent the shell from performing filename expansion (globbing) on them, so that git receives the patterns unexpanded and matches them itself. In such a pattern, the asterisk (*) matches any string, including the empty string.
But this obviously will work only for the simple git patterns. For a slightly more complicated case such as "extension matching N characters from a given set" you will need a regular expression (and grep, for example).

Related

Match X or Y in grep regular expression

I'm trying to run a fairly simple regular expression to clear out some home directories. For background: I'm trying to ask users on my system to clear out their unnecessary files to clear up space on their home directories, so I want to inform users with scripts such as Anaconda / Miniconda installation scripts that they can clear that out.
To generate a list of users who might need such an email, I'm trying to run a simple regular expression to list all homedirs that contain such an installation script. So my assumption would be that the following should suffice:
for d in $(ls -d /home/); do
if $(ls $d | grep -q "(Ana|Mini)conda[23].*\.sh"); then
echo $d;
fi;
done;
But after running this, it resulted in nothing at all, sadly. After a while looking, I noticed that grep does not interpret regular expressions as I would expect it to. The following:
echo "Lorem ipsum dolor sit amet" | grep "(Lorem|Ipsum) ipsum"
results in no matches at all, which would then explain why the above for loop wouldn't work either.
My question then is: is it possible to match the specified regular expression (Ana|Mini)conda[23].*\.sh, in the same way it matches strings in https://regex101.com/r/yxN61p/1? Or is there some other way to find all users who have such a file in their homedir using a simple for-loop in bash?
Short answer: grep defaults to Basic Regular Expressions (BRE), but unescaped () and | are part of Extended Regular Expressions (ERE). GNU grep, as an extension, supports alternation in BRE (which isn't technically part of BRE), but you have to escape the parentheses and the | with backslashes:
grep -q "\(Ana\|Mini\)conda[23].*\.sh"
Or you can indicate that you want to use ERE:
grep -Eq "(Ana|Mini)conda[23].*\.sh"
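To see the difference without touching any home directory, the two modes can be compared on a few sample names. This is just a sketch; the file names are made up for illustration.

```shell
# Sample names (illustrative only), one per line.
names='Anaconda3-2020.02-Linux-x86_64.sh
Miniconda2-latest-Linux-x86_64.sh
notes.txt'

# Default BRE: ( ) and | are ordinary characters, so the pattern looks
# for a literal "(Ana|Mini)conda..." and finds nothing.
bre_count=$(printf '%s\n' "$names" | grep -c '(Ana|Mini)conda[23].*\.sh')

# ERE via -E: ( ) group and | alternates.
ere_count=$(printf '%s\n' "$names" | grep -Ec '(Ana|Mini)conda[23].*\.sh')

echo "BRE matches: $bre_count, ERE matches: $ere_count"
```

The single quotes keep the shell from touching the pattern before grep sees it.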
Longer answer: this all being said, you don't need grep, and parsing the output of ls comes with a lot of pitfalls. Instead, you can use globs:
printf '%s\n' /home/*/*{Ana,Mini}conda[23]*.sh
should do it, if I understand the intention correctly.
This uses the fact that printf just repeats its formatting string if supplied with more parameters than formatting directives, printing each file on a separate line.
/home/*/*{Ana,Mini}conda[23]*.sh uses brace expansion, i.e., it first expands to
/home/*/*Anaconda[23]*.sh /home/*/*Miniconda[23]*.sh
and each of those is then expanded with filename expansion. [23] works the same way as in a regular expression; * is "zero or more of any character except /".
If you don't know how deep in the directory tree the files you're looking for are, you could use globstar and **:
shopt -s globstar
printf '%s\n' /home/**/*{Ana,Mini}conda[23]*.sh
** matches all files and zero or more subdirectories.
Finally, if you want to handle the case where nothing matches, you could set either shopt -s nullglob (expand to nothing if nothing matches) or shopt -s failglob (error if nothing matches).
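Putting the pieces together, here is a rough sketch that lists the owners of matching installers. It assumes Bash, and it uses a temporary directory with made-up user names in place of /home so that it can be tried safely.

```shell
#!/usr/bin/env bash
shopt -s nullglob               # unmatched globs expand to nothing

root=$(mktemp -d)               # stand-in for /home (assumption)
mkdir -p "$root/alice" "$root/bob" "$root/carol"
touch "$root/alice/Anaconda3-2020.02-Linux-x86_64.sh" \
      "$root/bob/Miniconda2-latest-Linux-x86_64.sh" \
      "$root/carol/notes.txt"

# Brace expansion runs first, then each resulting word is globbed.
found=("$root"/*/*{Ana,Mini}conda[23]*.sh)

owners=()
for f in "${found[@]}"; do
  d=${f%/*}                     # directory containing the installer
  owners+=("${d##*/}")          # its last path component, i.e. the user
done
printf '%s\n' "${owners[@]}"

rm -rf "$root"
```

Collecting the matches in an array keeps file names with spaces intact, which is one of the pitfalls of parsing ls output.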
Shell patterns are described in the Pattern Matching section of the Bash manual.
You don't need ls or grep at all for this:
shopt -s extglob
for f in /home/*/@(Ana|Mini)conda[23]*.sh; do
echo "$f"
done
With extglob enabled, @(Ana|Mini) matches either Ana or Mini. Note that in a glob, unlike in a regular expression, "any string" is written as * alone, not .*.

Using sed with regex to replace text on OSX and Linux

I am trying to replace some strings inside a file with sed using Regular Expressions. To complicate the matter, this is being done inside a Makefile script that needs to work on both osx and linux.
Specifically, within file.tex I want to replace
\subimport{chapters/}{xxx}
with
\subimport{chapters/}{xxx-yyy}
(xxx and yyy are just example text.)
Note, xxx could contain any letters, numbers, and _ (underscore) but really the regex can simply match anything inside the brackets. Sometimes there is some whitespace at the beginning of the line before \subimport....
The design of the string being searched for requires a lot of escaping (when searched for with regex) and I am guessing somewhere therein lies my error.
Here's what I've tried so far:
sed -i'.bak' -e 's/\\subimport\{chapters\/\}\{xxx\}/\\subimport\{chapters\/\}\{xxx-yyy\}/g' file.tex
# the -i'.bak' is required so SED works on OSX and Linux
rm -f file.tex.bak # because of this, we have to delete the .bak files after
This results in an error of RE error: invalid repetition count(s) when I build my Makefile that contains this script.
I thought part of my problem was that the -E option for sed was not available in the osx version of sed. It turns out, when using the -E option, fewer things should be escaped (see comments on my question).
With a basic regular expression (note that \+ is a GNU extension; strict POSIX BRE spells it \{1,\}):
sed 's#^\(\\subimport{chapters/}{[[:alnum:]_]\+\)}$#\1-yyy}#'
# is used as the parameter separator for sed's s (Substitution)
\(\\subimport{chapters/}{[[:alnum:]_]\+\) is the captured group, containing everything required up to the last }, which is preceded by one or more letters, digits, or underscores
In the replacement, the first captured group is followed by the required string, closed by a }
Example:
$ sed 's#^\(\\subimport{chapters/}{[[:alnum:]_]\+\)}$#\1-yyy}#' <<<'\subimport{chapters/}{foobar9}'
\subimport{chapters/}{foobar9-yyy}
$ sed 's#^\(\\subimport{chapters/}{[[:alnum:]_]\+\)}$#\1-yyy}#' <<<'\subimport{chapters/}{spamegg923}'
\subimport{chapters/}{spamegg923-yyy}
Here is the version that ended up working for me.
sed -i.bak -E 's#^([[:blank:]]*\\subimport{chapters/}{[[:alnum:]_]+)}$#\1-yyy}#' file.tex
rm -f file.tex.bak
Many thanks to @heemayl. Their answer is the better written one; it simply required some tweaking to get a version that worked for me.
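For anyone who wants to try the substitution on a scratch file first, here is a small sketch. The chapter names are invented for the test, and the literal braces are escaped here as a cautious choice, on the assumption that \{ and \} are read as literal braces under -E by both GNU and BSD sed.

```shell
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
\subimport{chapters/}{intro}
    \subimport{chapters/}{related_work2}
\section{Unrelated}
EOF

# -i.bak (suffix attached to the flag) is the in-place spelling accepted
# by both sed flavours; -E selects extended regular expressions.
sed -i.bak -E 's#^([[:blank:]]*\\subimport\{chapters/\}\{[[:alnum:]_]+)\}$#\1-yyy}#' "$tmp"
rm -f "$tmp.bak"

result=$(cat "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

Both \subimport lines get the -yyy suffix, including the indented one, while the \section line is left alone.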

Grep or in part of a string

Good day All,
A filename can either be
abc_source_201501.csv Or,
abc_source2_201501.csv
Is it possible to do something like grep abc_source|source2_201501.csv without fully listing out filename as the filenames I'm working with are much longer than examples given to get both options?
Thanks for assistance here.
Use extended regex flag in grep.
For example:
grep -E 'abc_source.?_201501\.csv'
would match both names in your example. The quotes keep the shell from treating ? as a glob character. You can think of other regex patterns that would suit your data more.
You can use Bash globbing to grep in several files at once.
For example, to grep for the string "hello" in all files with a filename that starts with abc_source and ends with 201501.csv, issue this command:
grep hello abc_source*201501.csv
You can also use the -r flag, to recursively grep in all files below a given folder - for example the current folder (.).
grep -r hello .
If you are asking about patterns for file name matching in the shell, the extended globbing facility in Bash lets you say
shopt -s extglob
grep stuff abc_source@(|2)_201501.csv
to search through both files with a single glob expression.
The simplest possibility is to use brace expansion:
grep pattern abc_{source,source2}_201501.csv
That's exactly the same as:
grep pattern abc_source{,2}_201501.csv
You can use several brace patterns in a single word:
grep pattern abc_source{,2}_2015{01..04}.csv
expands to
grep pattern abc_source_201501.csv abc_source_201502.csv \
abc_source_201503.csv abc_source_201504.csv \
abc_source2_201501.csv abc_source2_201502.csv \
abc_source2_201503.csv abc_source2_201504.csv
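A quick way to convince yourself of what the shell does with such a word is to echo it and then grep with it. This sketch assumes Bash and creates three throwaway files in a temporary directory; the file contents are made up.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)                 # scratch directory with sample files
cd "$dir" || exit 1
printf 'hello,1\n' > abc_source_201501.csv
printf 'hello,2\n' > abc_source2_201501.csv
printf 'hello,3\n' > abc_other_201501.csv

# Bash rewrites the word into two file names before the command starts.
expanded=$(echo abc_source{,2}_201501.csv)

# grep -l prints the names of the files that contain a match.
hits=$(grep -l hello abc_source{,2}_201501.csv)

echo "$expanded"
echo "$hits"
cd / && rm -rf "$dir"
```

abc_other_201501.csv also contains hello, but it is never opened because the brace expression does not produce its name.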

Use grep to find strings at the beginning of a line or after a delimiter in Git Bash for Windows

I have such file:
blue|1|red|2
green|3|blue|4
darkblue|0|yellow|3
I want to use grep to find anything containing blue| at the beginning of a line or |blue| anywhere, but not any darkblue| or |darkblue| or |blueberry|
I tried to use grep [^|\|]blue\| but Git Bash gives me error:
$ grep [^|\|]blue\| *.*
grep: Unmatched [ or [^
sh.exe": |]blue|: command not found
What did I do wrong? What's the proper way to do it?
Here's a quick & dirty one:
grep -E '(^|\|)blue\|' *
Matches start of line or |, followed by blue|. The important note is that you need extended regular expressions (via egrep or the -E flag) to use the | (or) construct.
Also, note the single quotes around the regular expression.
So, in answer to the OP's "What did I do wrong?",
You forgot to put the regexp in single quotes;
You chose the wrong type of brackets to enclose the alternate expressions; and finally
You forgot to use egrep or the -E flag
It's always easier to see other people's errors; I wish I was as quick to spot my own :-|
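The pattern can be checked against the sample data from the question, plus one extra blueberry line added here for illustration:

```shell
data='blue|1|red|2
green|3|blue|4
darkblue|0|yellow|3
red|5|blueberry|6'

# (^|\|) means "start of line or a literal pipe"; the trailing \| requires
# a pipe right after blue, which rules out blueberry.
matches=$(printf '%s\n' "$data" | grep -E '(^|\|)blue\|')
printf '%s\n' "$matches"
```

Only the first two lines are printed: darkblue fails because blue is preceded by k, and blueberry fails because blue is not followed by a pipe.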

Regular Expressions for file name matching

In Bash, how does one match a regular expression with multiple criteria against a file name?
For example, I'd like to match against all the files with .txt or .log endings.
I know how to match one type of criteria:
for file in *.log
do
echo "${file}"
done
What's the syntax for a logical or to match two or more types of criteria?
Bash does not support regular expressions per se when globbing (filename matching). Its globbing syntax, however, can be quite versatile. For example:
for i in A*B.{log,txt,r[a-z][0-9],c*} Z[0-5].c; do
...
done
will apply the loop contents on all files that start with A and end in a B, then a dot and any of the following extensions:
log
txt
r followed by a lowercase letter followed by a single digit
c followed by pretty much anything
It will also apply the loop commands to any file starting with Z, followed by a digit in the 0-5 range and then by the .c extension.
If you really want/need to, you can enable extended globbing with the shopt builtin:
shopt -s extglob
which then allows significantly more features while matching filenames, such as sub-patterns etc.
See the Bash manual for more information on supported expressions:
http://www.gnu.org/software/bash/manual/bash.html#Pattern-Matching
EDIT:
If an expression does not match a filename, bash by default will substitute the expression itself (e.g. it will echo *.txt) rather than an empty string. You can change this behaviour by setting the nullglob shell option:
shopt -s nullglob
This will replace a *.txt that has no matching files with an empty string.
EDIT 2:
I suggest that you also check out the shopt builtin and its options, since quite a few of them affect filename pattern matching, as well as other aspects of the shell:
http://www.gnu.org/software/bash/manual/bash.html#The-Shopt-Builtin
Do it the same way you'd invoke ls. You can specify multiple wildcards one after the other:
for file in *.log *.txt
for file in *.{log,txt} ..
for f in $(find . -regex ".*\.log")
do
echo "$f"
done
You simply add the other conditions to the end:
for VARIABLE in 1 2 3 4 5 .. N
do
command1
command2
commandN
done
So in your case:
for file in *.log *.txt
do
echo "${file}"
done
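Here is a small self-contained sketch of that loop, assuming Bash and using a temporary directory with made-up file names. It also turns on nullglob so that a pattern with no matches (here *.csv) contributes nothing to the loop.

```shell
#!/usr/bin/env bash
shopt -s nullglob                # unmatched patterns expand to nothing

dir=$(mktemp -d)
touch "$dir/app.log" "$dir/readme.txt" "$dir/photo.jpg"

names=()
# Several patterns side by side act as a logical "or".
for file in "$dir"/*.log "$dir"/*.txt "$dir"/*.csv; do
  names+=("${file##*/}")         # keep only the base name
done
printf '%s\n' "${names[@]}"

rm -rf "$dir"
```

Without nullglob, the loop would run once more with the literal, unexpanded *.csv pattern as the value of file.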
You can also do this:
shopt -s extglob
for file in *.+(log|txt)
which could be easily extended to more alternatives:
for file in *.+(log|txt|mp3|gif|foo)
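One detail worth knowing: +(...) matches one or more repetitions of the alternatives, while @(...) matches exactly one. The sketch below (Bash, with invented file names in a temporary directory) shows where the two differ.

```shell
#!/usr/bin/env bash
shopt -s extglob nullglob        # must be enabled before the patterns are parsed

dir=$(mktemp -d)
touch "$dir/app.log" "$dir/readme.txt" "$dir/dump.logtxt" "$dir/photo.jpg"

plus=("$dir"/*.+(log|txt))       # one or more of log/txt after the dot
at=("$dir"/*.@(log|txt))         # exactly one of log/txt after the dot

echo "+(...) matched ${#plus[@]} files, @(...) matched ${#at[@]} files"
rm -rf "$dir"
```

dump.logtxt is picked up by the +(...) form because logtxt is log followed by txt; the @(...) form skips it.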