How to list all files with a given extension? [duplicate] - regex

I want to search a filename which may contain kavi or kabhi.
I wrote command in the terminal:
ls -l *ka[vbh]i*
Between ka and i there may be v or bh .
The code I wrote isn't correct. What would be the correct command?

A nice way to do this is to use extended globs. They add regex-like operators, such as alternation, to Bash filename patterns.
To start you have to enable the extglob feature, since it is disabled by default:
shopt -s extglob
Then, write a pattern with the required condition: stuff + ka + either v or bh + i + stuff. All together:
ls -l *ka@(v|bh)i*
The syntax is a bit different from normal regular expressions; the Extended Globs documentation says:
@(list): Matches one of the given patterns.
Test
$ ls
a.php AABB AAkabhiBB AAkabiBB AAkaviBB s.sh
$ ls *ka@(v|bh)i*
AAkabhiBB AAkaviBB
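To make the difference between the question's bracket expression and the extglob alternation concrete, here is a minimal sketch. It assumes Bash and recreates the sample file names from the test above in a scratch directory; the variable names are only illustrative.

```bash
#!/usr/bin/env bash
# extglob must be enabled before any line containing @(...) is parsed.
shopt -s extglob

# Scratch directory holding the sample names from the test above.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch a.php AABB AAkabhiBB AAkabiBB AAkaviBB s.sh

# [vbh] is a bracket expression: exactly ONE character out of v, b, h.
bracket_matches=( *ka[vbh]i* )

# @(v|bh) is an alternation: the string "v" or the string "bh".
extglob_matches=( *ka@(v|bh)i* )

printf 'bracket: %s\n' "${bracket_matches[*]}"   # AAkabiBB AAkaviBB
printf 'extglob: %s\n' "${extglob_matches[*]}"   # AAkabhiBB AAkaviBB
```

The bracket version picks up AAkabiBB and misses AAkabhiBB, which is why the original command did not behave as intended.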

A slightly longer command line could use find, grep and xargs. It has the advantage of being easily extended to other search terms (by extending the grep statement or by using additional options of find), it is a bit more readable (imho), and it lets you run specific commands on the files that are found:
find . | grep -e "kabhi" -e "kavi" | xargs ls -l

You can get what you want by using curly braces in bash:
ls -l *ka{v,bh}i*
Note: this is not a regular expression question so much as a "shell globbing" question. Shell "glob patterns" are different from regular expressions, though they are similar in many ways.
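One practical consequence of that distinction: brace expansion is purely textual and runs before filename expansion, so it produces one pattern per alternative whether or not a file matches. The sketch below assumes Bash with default options (no nullglob or failglob) and a scratch directory containing only a "kavi" file.

```bash
#!/usr/bin/env bash
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch AAkaviBB    # only a "kavi" file exists; there is no "kabhi" file

# Brace expansion alone just generates words; no files are consulted.
brace_words=( ka{v,bh}i )

# Each generated pattern is then globbed separately. With default options,
# a pattern that matches nothing is passed through as a literal string.
glob_words=( *ka{v,bh}i* )

printf 'brace: %s\n' "${brace_words[*]}"   # kavi kabhi
printf 'glob:  %s\n' "${glob_words[*]}"    # AAkaviBB *kabhi*
```

This is why ls -l *ka{v,bh}i* can print a "No such file or directory" error for one alternative while still listing the other.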

Related

Match X or Y in grep regular expression

I'm trying to run a fairly simple regular expression to clear out some home directories. For background: I'm trying to ask users on my system to clear out their unnecessary files to clear up space on their home directories, so I want to inform users with scripts such as Anaconda / Miniconda installation scripts that they can clear that out.
To generate a list of users who might need such an email, I'm trying to run a simple regular expression to list all homedirs that contain such an installation script. So my assumption would be that the following should suffice:
for d in $(ls -d /home/); do
if $(ls $d | grep -q "(Ana|Mini)conda[23].*\.sh"); then
echo $d;
fi;
done;
But after running this, it resulted in nothing at all, sadly. After a while looking, I noticed that grep does not interpret regular expressions as I would expect it to. The following:
echo "Lorem ipsum dolor sit amet" | grep "(Lorem|Ipsum) ipsum"
results in no matches at all, which would explain why the for loop above doesn't work either.
My question then is: is it possible to match the specified regular expression (Ana|Mini)conda[23].*\.sh, in the same way it matches strings in https://regex101.com/r/yxN61p/1? Or is there some other way to find all users who have such a file in their homedir using a simple for-loop in bash?
Short answer: grep defaults to Basic Regular Expressions (BRE), but unescaped () and | are part of Extended Regular Expressions (ERE). GNU grep, as an extension, supports alternation (which isn't technically part of BRE), but you have to escape the parentheses and the | with backslashes:
grep -q "\(Ana\|Mini\)conda[23].*\.sh"
Or you can indicate that you want to use ERE:
grep -Eq "(Ana|Mini)conda[23].*\.sh"
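The difference is easy to check on the sample line from the question. This is a small sketch, assuming a grep that supports -E and -c; the escaped BRE form relies on the GNU extension mentioned above, so its result is printed but not taken for granted on other implementations.

```bash
#!/usr/bin/env bash
line="Lorem ipsum dolor sit amet"

# BRE (default): unescaped ( | ) are literal characters, so nothing matches.
# "|| true" keeps the script going, since grep exits non-zero on no match.
bre_plain=$(printf '%s\n' "$line" | grep -c "(Lorem|Ipsum) ipsum") || true

# ERE (-E): ( ) group and | alternates.
ere=$(printf '%s\n' "$line" | grep -Ec "(Lorem|Ipsum) ipsum") || true

# BRE with backslashes: \| is a GNU extension, not POSIX BRE.
bre_escaped=$(printf '%s\n' "$line" | grep -c "\(Lorem\|Ipsum\) ipsum") || true

echo "BRE unescaped: $bre_plain matching line(s)"
echo "ERE:           $ere matching line(s)"
echo "BRE escaped:   $bre_escaped matching line(s)"
```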
Longer answer: this all being said, you don't need grep, and parsing the output of ls comes with a lot of pitfalls. Instead, you can use globs:
printf '%s\n' /home/*/*{Ana,Mini}conda[23]*.sh
should do it, if I understand the intention correctly.
This uses the fact that printf just repeats its formatting string if supplied with more parameters than formatting directives, printing each file on a separate line.
/home/*/*{Ana,Mini}conda[23]*.sh uses brace expansion, i.e., it first expands to
/home/*/*Anaconda[23]*.sh /home/*/*Miniconda[23]*.sh
and each of those is then expanded with filename expansion. [23] works the same way as in a regular expression; * is "zero or more of any character except /".
If you don't know how deep in the directory tree the files you're looking for are, you could use globstar and **:
shopt -s globstar
printf '%s\n' /home/**/*{Ana,Mini}conda[23]*.sh
** matches all files and zero or more subdirectories.
Finally, if you want to handle the case where nothing matches, you could set either shopt -s nullglob (expand to nothing if nothing matches) or shopt -s failglob (error if nothing matches).
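The effect of these options can be seen in a scratch directory. The sketch below assumes Bash 4 or newer (for globstar); the directory and file names (alice, bob, the installer scripts) are hypothetical stand-ins for real home directories.

```bash
#!/usr/bin/env bash
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
mkdir -p alice/downloads bob
touch alice/Anaconda3-installer.sh alice/downloads/Miniconda2-old.sh

# Default behaviour: a pattern with no match is kept as a literal string.
default_result=( bob/*conda[23]*.sh )

# nullglob: a pattern with no match expands to nothing at all.
shopt -s nullglob
null_result=( bob/*conda[23]*.sh )

# globstar: ** also descends into subdirectories at any depth.
shopt -s globstar
deep_result=( **/*{Ana,Mini}conda[23]*.sh )

printf 'default:  %d word(s)\n' "${#default_result[@]}"   # 1 (the literal pattern)
printf 'nullglob: %d word(s)\n' "${#null_result[@]}"      # 0
printf 'globstar: %s\n' "${deep_result[@]}"
```

With nullglob set, a loop over the pattern simply runs zero times when nothing matches, which is usually what a reporting script wants.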
Shell patterns are described in the Pattern Matching section of the Bash manual.
You don't need ls or grep at all for this:
shopt -s extglob
for f in /home/*/@(Ana|Mini)conda[23].*.sh; do
echo "$f"
done
With extglob enabled, @(Ana|Mini) matches either Ana or Mini.

Grep multiple files using regex for specifying filenames to search for

Let's say I have n files with names like link123.txt, link345.txt, link645.txt, etc.
I'd like to grep a subset of these n files for a keyword. For example:
grep 'searchtext' link123.txt link345.txt ...
I'd like to do something like
grep 'searchtext' link[123\|345].txt
How can I mention the filenames as regex in this case?
You can use find and grep together like this:
find . -regex '.*/link\(123\|345\)\.txt' -exec grep 'searchtext' {} \;
Thanks for ghoti's comment.
You can use the bash option extglob, which allows extended use of globbing, including | separated pattern lists.
@(123|456)
Matches one of 123 or 456 once.
shopt -s extglob
grep 'searchtext' link@(123|345).txt
shopt -u extglob
I think you're probably asking for find functionality to search for filenames with regex.
You can easily use find . -regex '.*/link\([0-9]\{3\}\).txt' to show all three of these files. Now you only have to play with the regex.
PS: Don't forget to specify .*/ in the beginning of pattern.
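The reason for the leading .*/ is that -regex is matched against the whole path that find prints (for example ./link123.txt), not just the file name. A minimal sketch, assuming GNU find with its default regex syntax, where \( \| \) provide grouping and alternation:

```bash
#!/usr/bin/env bash
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch link123.txt link345.txt link645.txt

# The pattern must cover the entire path, including the leading "./".
mapfile -t with_prefix < <(find . -regex '.*/link\(123\|345\)\.txt' | LC_ALL=C sort)

# Without the .*/ prefix the same pattern matches nothing.
mapfile -t without_prefix < <(find . -regex 'link\(123\|345\)\.txt' | LC_ALL=C sort)

printf 'with .*/ prefix:    %s\n' "${with_prefix[*]}"
printf 'without the prefix: %d match(es)\n' "${#without_prefix[@]}"
```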
It seems, you don't need regex to determine the files to grep, since you enumerate them all (well, actually you enumerate the minimal unique part without repeating common prefix/suffix).
If regex functionality is not needed and the only aim is to avoid repeating common prefix/suffix, then simple iterating would be an option:
for i in 123 345 645; do grep searchpattern link$i.txt; done

How do I grep multiple possible extensions recursively

This question is different from other grep pattern matching questions because we're looking for a large number of file extensions, and thus the following from this question will be too long and tedious to type:
grep -r -i --include '*.ade' --include '*.adp' ... CP_Image ~/path[12345]
I was trying to email the backup of a static site when Google blocked my attachment upload for security reasons. Their support page says:
You can't send or receive the following file types:
.ade, .adp, .bat, .chm, .cmd, .com, .cpl, .exe, .hta, .ins, .isp, .jar, .jse, .lib, .lnk, .mde, .msc, .msp, .mst, .pif, .scr, .sct, .shb, .sys, .vb, .vbe, .vbs, .vxd, .wsc, .wsf, .wsh
I converted and tested the following Regular Expression here:
/.*\.(ade|adp|bat|chm|cmd|com|cpl|exe|hta|ins|isp|jar|jse|lib|lnk|mde|msc|msp|mst|pif|scr|sct|shb|sys|vb|vbe|vbs|vxd|wsc|wsf|wsh)/gi
And tried running it with:
ls -lahR | grep '.*\.(ade|adp|bat|chm|cmd|com|cpl|exe|hta|ins|isp|jar|jse|lib|lnk|mde|msc|msp|mst|pif|scr|sct|shb|sys|vb|vbe|vbs|vxd|wsc|wsf|wsh)'
It doesn't work. I don't think grep interprets the and (|) symbol properly because ls -lahR | grep '.*\.html' works
By default, grep uses Basic Regular Expressions (BRE). In BRE, groups are written \(...\), and GNU grep accepts \| as the alternation operator:
grep '.*\.\(ade\|adp\|bat\|chm\|cmd\|com\|cpl\|exe\|hta\|ins\|isp\|jar\|jse\|lib\|lnk\|mde\|msc\|msp\|mst\|pif\|scr\|sct\|shb\|sys\|vb\|vbe\|vbs\|vxd\|wsc\|wsf\|wsh\)'
OR
grep -E '.*\.(ade|adp|bat|chm|cmd|com|cpl|exe|hta|ins|isp|jar|jse|lib|lnk|mde|msc|msp|mst|pif|scr|sct|shb|sys|vb|vbe|vbs|vxd|wsc|wsf|wsh)'
The -E parameter enables extended regular expressions (long form: --extended-regexp).
Reference
Add the flag -E to indicate it's an extended regular expression. From GNU Grep 2.1: The default is "basic regular expression", and
[i]n basic regular expressions the meta-characters ‘?’, ‘+’, ‘{’, ‘|’, ‘(’, and ‘)’ lose their special meaning.
I'm recursively trying to find files with the specified extensions.
Better to use find with -iregex option:
find . -regextype posix-egrep -iregex '.*\.(ade|adp|bat|chm|cmd|com|cpl|exe|hta|ins|isp|jar|jse|lib|lnk|mde|msc|msp|mst|pif|scr|sct|shb|sys|vb|vbe|vbs|vxd|wsc|wsf|wsh)'
On OSX use:
find -E . -iregex '.*\.(ade|adp|bat|chm|cmd|com|cpl|exe|hta|ins|isp|jar|jse|lib|lnk|mde|msc|msp|mst|pif|scr|sct|shb|sys|vb|vbe|vbs|vxd|wsc|wsf|wsh)'
A bash method to exclude the given extensions: use extended globbing
shopt -s extglob nullglob
ls *.!(ade|adp|bat|chm|cmd|com|cpl|exe|hta|ins|isp|jar|jse|lib|lnk|mde|msc|msp|mst|pif|scr|sct|shb|sys|vb|vbe|vbs|vxd|wsc|wsf|wsh)
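If the goal is to keep grep -r but avoid typing --include once per extension, Bash brace expansion can generate the options. This is only a sketch: it assumes a grep that supports -r and --include (GNU grep does), uses a shortened extension list, and the site/ tree and its files are made up for the demonstration.

```bash
#!/usr/bin/env bash
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
mkdir -p site/js
printf 'CP_Image\n' > site/run.bat
printf 'CP_Image\n' > site/js/tool.jar
printf 'CP_Image\n' > site/index.html

# Brace expansion produces one --include option per extension. The quoted
# '*.' keeps the shell from globbing the patterns before grep sees them.
include_opts=( --include='*.'{ade,adp,bat,jar} )

# -l lists only the names of files that contain the search text.
mapfile -t hits < <(grep -r -l -i "${include_opts[@]}" CP_Image site | LC_ALL=C sort)

printf 'options: %s\n' "${include_opts[*]}"
printf 'hit: %s\n' "${hits[@]}"
```

Here index.html contains the text but is skipped because no --include pattern covers it.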

Grep or in part of a string

Good day All,
A filename can either be
abc_source_201501.csv Or,
abc_source2_201501.csv
Is it possible to do something like grep abc_source|source2_201501.csv without fully listing out filename as the filenames I'm working with are much longer than examples given to get both options?
Thanks for assistance here.
Use extended regex flag in grep.
For example:
grep -E 'abc_source.?_201501\.csv'
would match both names in your example. You can think of other regex patterns that suit your data better.
You can use Bash globbing to grep in several files at once.
For example, to grep for the string "hello" in all files with a filename that starts with abc_source and ends with 201501.csv, issue this command:
grep hello abc_source*201501.csv
You can also use the -r flag, to recursively grep in all files below a given folder - for example the current folder (.).
grep -r hello .
If you are asking about patterns for file name matching in the shell, the extended globbing facility in Bash lets you say
shopt -s extglob
grep stuff abc_source@(|2)_201501.csv
to search through both files with a single glob expression.
The simplest possibility is to use brace expansion:
grep pattern abc_{source,source2}_201501.csv
That's exactly the same as:
grep pattern abc_source{,2}_201501.csv
You can use several brace patterns in a single word:
grep pattern abc_source{,2}_2015{01..04}.csv
expands to
grep pattern abc_source_201501.csv abc_source_201502.csv \
abc_source_201503.csv abc_source_201504.csv \
abc_source2_201501.csv abc_source2_201502.csv \
abc_source2_201503.csv abc_source2_201504.csv
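Before handing such a word to grep, it can help to preview what the braces generate. A minimal sketch, assuming Bash 4 or newer (older versions do not zero-pad {01..04}):

```bash
#!/usr/bin/env bash
# Capture the expansion in an array instead of passing it straight to grep.
names=( abc_source{,2}_2015{01..04}.csv )

# One name per line: 2 alternatives x 4 months = 8 names. The leftmost
# brace varies slowest, so all "source" names come before "source2".
printf '%s\n' "${names[@]}"
count=${#names[@]}
echo "total: $count"
```

Because no files are consulted, every generated name is passed on even if the file does not exist, in which case grep reports it as missing.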

How to use ls to list out files that end in numbers

I'm not sure if I'm using regular expressions in bash correctly. I'm on a Centos system using bash shell. In our log directory, there are log files with digits appended to them, i.e.
stream.log
stream.log.1
stream.log.2
...
stream.log.nnn
Unfortunately there are also log files with the new naming convention,
stream.log.2014-02-14
stream.log.2014-02-13
I need to get files with the old log file naming format. I found something that works but I'm wondering if there's another more elegant way to do this.
ls -v stream.log* | grep -v 2014
I don't know how regular expressions work in bash and/or what command (other than possibly grep) to pipe output to. The cmd/regex I was thinking of is something like this:
ls -v stream.log(\.\d{0,2})+
Not surprisingly, this didn't work. Maybe my logic is incorrect but I wanted to say from the cmdline list files with the name stream.log with an optional .xyz at the end where xyz={1..999} is appended at the end. Please let me know if this is doable or if the solution I came up with is the only way to do something like this. Thanks in advance for your help.
EDIT: Thanks for everyone's prompt comments and replies. I just wanted to bring up that there's also a file called stream.log that doesn't have any digits appended to it and that also needs to make it into my ls listing. I tried the tips in the comment and answer and they work, but they leave out that file.
You can do this with extended pattern matching in bash, e.g.
> shopt -s extglob
> ls *'.'+([0-9])
Where
+(pattern-list)
Matches one or more occurrences of the given patterns
And other useful syntaxes.
?(pattern-list)
Matches zero or one occurrence of the given patterns
*(pattern-list)
Matches zero or more occurrences of the given patterns
@(pattern-list)
Matches one of the given patterns
!(pattern-list)
Matches anything except one of the given patterns
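These operators can be tried without touching the filesystem by using the pattern match inside [[ ... ]]. The sketch below assumes Bash with extglob enabled; matches is a hypothetical helper defined only for this demonstration.

```bash
#!/usr/bin/env bash
shopt -s extglob

# matches NAME PATTERN: print "yes" if NAME matches the glob PATTERN.
matches() {
  # $2 is deliberately unquoted so it is treated as a pattern, not a literal.
  if [[ $1 == $2 ]]; then echo yes; else echo no; fi
}

r_plus=$(matches stream.log.12 'stream.log.+([0-9])')           # one or more digits
r_date=$(matches stream.log.2014-02-14 'stream.log.+([0-9])')   # dashes are not digits
r_opt=$(matches stream.log 'stream.log?(.1)')                   # zero or one ".1"
r_star=$(matches aaa '*(a)')                                    # zero or more "a"
r_at=$(matches kabhi 'ka@(v|bh)i')                              # exactly one alternative
r_not=$(matches a.php '!(*.sh)')                                # anything but *.sh

echo "+(...): $r_plus  date: $r_date  ?(...): $r_opt"
echo "*(...): $r_star  @(...): $r_at  !(...): $r_not"
```

Only the date-stamped name prints "no"; every other check prints "yes".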
Alternatively, without extended pattern matching, you could use a less neat solution:
ls *'.'{1..1000} 2>/dev/null
And replace 1000 with some larger number if you have a lot of log files. Though I would prefer the grep option to this one.
An approach using sed:
ls -v stream.log* | sed -nE '/log(\.[0-9]+)?$/p'
and one using egrep:
ls -v stream.log* | egrep 'log(\.[0-9]+)?$'
These print out lines that end in "log" and optionally a period and any positive number of digits, followed by the end of the line.
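The same regular expression can be checked against a fixed list of names instead of ls output. This sketch assumes Bash 4 or newer (for mapfile) and a grep that supports -E; it anchors the pattern at both ends, which is slightly stricter than the commands above.

```bash
#!/usr/bin/env bash
names=(
  stream.log
  stream.log.1
  stream.log.2
  stream.log.2014-02-13
  stream.log.2014-02-14
)

# ^stream\.log   literal prefix
# (\.[0-9]+)?    optionally a dot followed by one or more digits
# $              nothing else may follow, so the date-stamped names fail
mapfile -t old_style < <(printf '%s\n' "${names[@]}" | grep -E '^stream\.log(\.[0-9]+)?$')

printf '%s\n' "${old_style[@]}"
```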
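You can do this much more simply by focusing on the dash '-'. Note that in the question the dash appears only in the new, date-stamped names, so the patterns below list those files rather than the old numbered ones. Here is the minimal version: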
ls *-*
This may be a little safer if there are different types of logfiles in the same directory:
ls stream.log.*-*
To ensure that you get the one extra file, it does not make sense to generate a confusing regex that will fit it - just include it on the ls line:
ls stream.log stream.log.*-*
Referring to @BroSlow's answer, here is a fix that also includes stream.log:
shopt -s extglob
ls stream.log*(.)*([0-9])
stream.log stream.log.1 stream.log.2
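The pattern above is fairly loose: *(.)*([0-9]) would also accept names such as stream.log..1 or stream.log7. A somewhat tighter variant, offered as a sketch rather than as the answer's own method, nests the operators so that the suffix is either absent or exactly one dot followed by digits. It assumes Bash with extglob; the extra files stream.log.10 and stream.log.bak are hypothetical additions for the test.

```bash
#!/usr/bin/env bash
shopt -s extglob nullglob

demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch stream.log stream.log.1 stream.log.2 stream.log.10 \
      stream.log.2014-02-14 stream.log.bak

# ?( ... )     the whole suffix is optional, so plain stream.log matches
# .+([0-9])    a literal dot followed by one or more digits
old_logs=( stream.log?(.+([0-9])) )

printf '%s\n' "${old_logs[@]}"
```

The date-stamped file and stream.log.bak are left out because a dash or a letter cannot be part of +([0-9]).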