regular expression for "11th to 16th letter" - regex

I am new to regular expressions and need help listing files on a Unix system. I want to apply a regular expression to the ls command.
I have below files :
DLERMS08001708161708209683.csv.gz
DLERMS13001708161330170816.csv.gz
DLERMS13001708171330170816.csv.gz
and would like to list only the files that have 170816 in the 11th to 16th characters of the name.
I tried ls *170816*.gz, but I get all three file names instead of two. I want only the first two. Could you please help?
Note that the third file name, DLERMS13001708171330170816.csv.gz, also contains 170816, but at the end. I want to exclude it from the ls output.

Using bash parameter-expansion alone,
for file in *.csv.gz; do
[ -e "$file" ] || continue
[ "${file:10:6}" == "170816" ] && printf "%s\n" "$file"
done
${PARAMETER:OFFSET:LENGTH}
This one can expand only a part of a parameter's value, given a position to start and maybe a length. If LENGTH is omitted, the parameter will be expanded up to the end of the string. If LENGTH is negative, it's taken as a second offset into the string, counting from the end of the string
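As a rough illustration of the offset/length form, here is a small sketch that checks the three file names from the question without touching the file system. The helper name has_key_at_11 is made up for this example and bash is assumed.

```shell
# Hypothetical helper: succeeds when characters 11-16 of the name equal the key.
has_key_at_11() {
    local name=$1 key=$2
    # Offsets are zero-based, so offset 10 is the 11th character.
    [ "${name:10:6}" = "$key" ]
}

for name in DLERMS08001708161708209683.csv.gz \
            DLERMS13001708161330170816.csv.gz \
            DLERMS13001708171330170816.csv.gz; do
    if has_key_at_11 "$name" 170816; then
        printf 'match: %s\n' "$name"
    fi
done
```

Only the first two names should be reported, since the third has 170817 at that position.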
Based on the comments below, the OP apparently wants to copy the matching files to another path, in which case the printf should be replaced with cp and the necessary arguments:
[ "${file:10:6}" == "170816" ] && cp -- "$file" path/to/destination

Firstly, be careful not to confuse regular expressions with shell glob patterns (which is what you want here).
Your glob could be:
??????????170816*.gz
Which matches 10 unknown characters followed by the sequence you specified.
Depending on your next step, you might not need to use ls at all, for example you can loop over these files like this:
for file in ??????????170816*.gz; do
something_with "$file"
done
Or output the files that match using one of the following:
echo ??????????170816*.gz
printf '%s\n' ??????????170816*.gz
If there is a possibility that no files match, then you may wish to consider enabling nullglob (using shopt -s nullglob), which would expand to nothing in that case.
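To see the glob and nullglob together, the following sketch recreates the three files in a scratch directory (created with mktemp, purely for this illustration) and collects the matches into arrays. The second pattern, with 999999, is an invented non-matching case.

```shell
# Work in a scratch directory so no real files are touched.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch DLERMS08001708161708209683.csv.gz \
      DLERMS13001708161330170816.csv.gz \
      DLERMS13001708171330170816.csv.gz

shopt -s nullglob                    # an unmatched pattern expands to nothing
matches=( ??????????170816*.gz )     # ten arbitrary characters, then the key
none=( ??????????999999*.gz )        # matches no file in this directory

printf 'matched %d file(s)\n' "${#matches[@]}"
printf '%s\n' "${matches[@]}"
printf 'unmatched pattern gave %d word(s)\n' "${#none[@]}"
```

Without nullglob, the second array would hold the literal pattern as its single element.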

If you want to use globbing, it's not the same as using regular expression.
In your example you can use "?" as a placeholder for matching a single character:
Hence to achieve what you want as output, use ls with pattern below -
ls ??????????170816*

You want to use the wildcard (not regex) for "any single character", ?, the appropriate number of times.
ls DLERMS????170816*.csv.gz
Regexes are much more flexible/powerful and overkill for this simple use case.
But as far as I know, ls does not support them, so you would have to go via other bash tools to identify the files in case you ever need to actually use regexes for anything.
I also included what I perceive to be another common part of your file names, the DLERMS at the beginning; if that is NOT common, replace those letters with ? too.

Try this:
ls ??????????170816*

A solution with find and regex
find . -regextype egrep -regex "^.{12}170816.*\.gz"
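find prints each path as ./name, so the regex is matched against the name with two extra leading characters. .{12} therefore covers ./ plus the first ten characters of the file name, which puts 170816 at characters 13 to 18 of the path, i.e. 11 to 16 of the file name.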

I don't think you can use regular expressions with ls directly, but with egrep, it works fine.
ls * | egrep "DLERMS[0-9]{4}170816[0-9]{10}.csv.gz"
[0-9]{4} - any number, four times.
[0-9]{10} - any number, ten times.
Instead of egrep you can also use grep -E; the -E option enables extended regular expressions, in which characters such as {, | and ( act as operators without needing a backslash.
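A slightly stricter variant of the same filter might anchor the pattern and escape the dots, so that "." only matches a literal dot. This is a sketch that feeds the three names from printf rather than ls so that it runs in any directory; the variable names are arbitrary.

```shell
# Anchored pattern: 4 digits, the key, 10 more digits, then a literal ".csv.gz".
pattern='^DLERMS[0-9]{4}170816[0-9]{10}\.csv\.gz$'

result=$(printf '%s\n' \
    DLERMS08001708161708209683.csv.gz \
    DLERMS13001708161330170816.csv.gz \
    DLERMS13001708171330170816.csv.gz | grep -E "$pattern")

printf '%s\n' "$result"
```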

Related

grep and regex stored in string

my question is quite short:
a="'[0-9]*'"
grep -E '[0-9]*' #for example, line containing 000 will be recognized and printed
but
grep -E $a #line containing 000 WILL NOT be printed, why is that?
Does substitution for grep regex change the command's behaviour or have I missed something from a syntactic point of view? In other words, how do I make it so that grep accepts regex from a string stored in a variable.
Thank you in advance.
Quotes go around data, not in data. That means, when you store data (in this case, a regex expression) in a variable, don't embed quotes in the variable; instead, put double-quotes around the variable when you use it:
a="[0-9]*"
grep -E "$a"
You can sometimes get away with leaving the double-quotes off when using variables (as in Avinash Raj's comment), but it's not generally safe. In this case, it'll work fine provided there are no files or subdirectories in the current working directory with names that happen to start with a digit. You see, without double-quotes around $a, the shell will take its value, try to split it into multiple words (not a problem here), try to expand each word that contains shell wildcards into a list of matching files (potential problem here), and pass that to the command (grep) as its list of arguments. That means that if you happen to have files that start with digits in the current directory, grep thinks you ran a command like this:
grep -E 1file.txt 2file.jpg 3file.etc
... and it treats the first filename as the pattern to search for, and any other filenames as files to be searched. And you'll be scratching your head wondering why your script works or fails depending on which directory you happen to be in.
Note: the pattern [0-9]* is a valid regular expression, and a valid shell glob (wildcard) pattern, but it means very different things in the two contexts. As a regex, it means 0 or more digits in a row. As a shell glob, it means something that starts with a digit. Speaking of which, grep -E '[0-9]*' is not actually going to be very useful, since everything contains strings of 0 or more digits, so it'll match every line of every file you feed it.
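The difference between quoting the expansion and embedding quotes in the value can be checked with a small experiment. This sketch uses [0-9]+ rather than [0-9]* so that the match is meaningful, and the sample input lines are invented.

```shell
input=$(printf '%s\n' 'abc' 'line 000' 'x9y')

# Quotes go around the expansion, not inside the value.
regex='[0-9]+'
digits=$(printf '%s\n' "$input" | grep -E "$regex" || true)
printf 'plain pattern matched:\n%s\n' "$digits"

# With literal quotes in the value, grep searches for quote, digits, quote.
bad="'[0-9]+'"
embedded=$(printf '%s\n' "$input" | grep -E "$bad" || true)
printf 'embedded-quote pattern matched: [%s]\n' "$embedded"
```

The second search finds nothing because no input line contains a digit run wrapped in single quotes.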

How to rename a file using regex capture group in Linux?

I want to rename a_1.0.tgz to b_1.0.tgz, since 1.0 may be changed to any version number, how can I achieve that?
For example, I can use mv a*.tgz b.tgz if I don't need to keep the version number.
zsh comes with the utility zmv, which is intended for exactly that. While zmv does not support regex, it does provide capture groups for filename generation patterns (aka globbing).
First, you might need to enable zmv. This can be done by adding the following to your ~/.zshrc:
autoload -Uz zmv
You can then use it like this:
zmv 'a_(*)' 'b_$1'
This will rename any file matching a_* so that a_ is replaced by b_. If you want to be less general, you can of course adjust the pattern:
to rename only .tgz files:
zmv 'a_(*.tgz)' 'b_$1'
to rename only .tgz files while changing the extension to .tar.gz
zmv 'a_(*).tgz' 'b_$1.tar.gz'
to only rename a_1.0.tgz:
zmv 'a_(1.0.tgz)' 'b_$1'
To be on the safe side, you can run zmv with the option -n first. This will only print what would happen, without actually changing anything. For more information, have a look at man zshcontrib.
I'm not too familiar with zsh so I don't know if it supports regular expressions but I don't think you really need them here.
You can match the file using a glob and use a substitution:
for file in a_[0-9].[0-9].tgz; do
echo "$file" "${file/a/b}"
done
In the glob pattern, [0-9] matches any single digit from 0 to 9, so this pattern only covers versions of the form x.y with one digit each. ${file/a/b} substitutes the first occurrence of a with b.
Change the echo to mv if you're happy with the result.
Assuming you would like to replace the first character in all files matching a*.tgz with the letter b:
for f in a*.tgz; do
echo mv "$f" "b${f:1}"
done
Remove the echo when you are certain that this does what you want it to do.
The ${f:1} uses the ${name:offset} parameter expansion. From the zshexpn manual (on OS X):
If offset is non-negative, then if the variable name is a
scalar substitute the contents starting offset characters
from the first character of the string, [...]
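For bash, a variation on the loops above is to strip the a_ prefix with ${f#a_} and prepend b_, which keeps whatever version string follows. The sketch below runs in a scratch directory created with mktemp, and the two sample file names are invented for the illustration.

```shell
# Work in a scratch directory so no real files are renamed.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch a_1.0.tgz a_2.13.tgz

for f in a_*.tgz; do
    [ -e "$f" ] || continue      # skip the literal pattern when nothing matches
    mv -- "$f" "b_${f#a_}"       # ${f#a_} removes the leading "a_"
done

renamed=$(printf '%s\n' *)
printf '%s\n' "$renamed"
```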

Grep Search Specific Character Trouble

I have searched extensively and cannot figure out what I am doing wrong here. I have a text file that may contain a string similar to the following:
/dev/dir1/dir2 200G 22G 179G 11% /usr/dir3/dir4
I generally know what the string will look like up until the disk percentage indicator (i.e. 11%), but in the final part of the string I need to figure out if it ends in the usr (or sub) directories.
I want to use grep to do this search but am having problems. For example, the following command gives me output, but once I replace any of the "." characters where the "G" or "%" would be, or if I try to add "/usr/.*" at the end, it refuses to return anything.
$ egrep ^/dev/dir1/dir2\s*\d*.\s*\d*.\s*\d*.\s*\d*.\s*.*$ testfile
/dev/dir1/dir2 200G 22G 179G 11% /usr/dir3/dir4
grep's extended regular expressions do not support using \d to match digits. Instead, use [0-9] or [[:digit:]]. You can use the following grep command:
egrep '^/dev/dir1/dir2\s*[0-9]*G\s*[0-9]*G\s*[0-9]*G\s*[0-9]*%\s*.*$'
You can also pass grep the -P option to enable Perl compatible regular expressions, which do support \d:
grep -P '^/dev/dir1/dir2\s*\d*G\s*\d*G\s*\d*G\s*\d*%\s*.*$'
Note the use of grep instead of egrep in the above command; -P is incompatible with egrep.
As a side note, I prefer to use + instead of * when I can, because it is stricter and can cause errors to become apparent sooner. For example, I assume there will always be at least one space and one digit in each place in the input, so you can use \s+ and [0-9]+ (or \d+). If your original pattern had used +, it would not have matched at all in the first place (whether it was quoted or not), and you would have known you had a problem even before adding the G or % to it. A working example is
egrep '^/dev/dir1/dir2\s+[0-9]+.\s+[0-9]+.\s+[0-9]+.\s+[0-9]+.\s+.+$'
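To answer the original question (does the mount point sit under /usr?), one possible sketch uses POSIX character classes, which avoid relying on the \s extension, and ends the pattern with /usr followed by a slash or end of line. The function name mounted_under_usr and the second sample line are made up for this example.

```shell
# Hypothetical helper: succeeds if the line describes /dev/dir1/dir2 mounted at or below /usr.
mounted_under_usr() {
    local sp='[[:space:]]+'
    local pattern="^/dev/dir1/dir2${sp}[0-9]+G${sp}[0-9]+G${sp}[0-9]+G${sp}[0-9]+%${sp}/usr(/|\$)"
    printf '%s\n' "$1" | grep -Eq "$pattern"
}

line='/dev/dir1/dir2 200G 22G 179G 11% /usr/dir3/dir4'
other='/dev/dir1/dir2 200G 22G 179G 11% /var/dir3'

if mounted_under_usr "$line";  then echo "first line: under /usr";  fi
if mounted_under_usr "$other"; then echo "second line: under /usr"; else echo "second line: elsewhere"; fi
```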

How to use ls to list out files that end in numbers

I'm not sure if I'm using regular expressions in bash correctly. I'm on a Centos system using bash shell. In our log directory, there are log files with digits appended to them, i.e.
stream.log
stream.log.1
stream.log.2
...
stream.log.nnn
Unfortunately there are also log files with the new naming convention,
stream.log.2014-02-14
stream.log.2014-02-13
I need to get files with the old log file naming format. I found something that works but I'm wondering if there's another more elegant way to do this.
ls -v stream.log* | grep -v 2014
I don't know how regular expressions work in bash and/or what command (other than possibly grep) to pipe output to. The cmd/regex I was thinking of is something like this:
ls -v stream.log(\.\d{0,2})+
Not surprisingly, this didn't work. Maybe my logic is incorrect but I wanted to say from the cmdline list files with the name stream.log with an optional .xyz at the end where xyz={1..999} is appended at the end. Please let me know if this is doable or if the solution I came up with is the only way to do something like this. Thanks in advance for your help.
EDIT: Thanks for everyone's prompt comments and replies. I just wanted to bring up that there's also a file called stream.log that doesn't have any digits appended to it, and it also needs to make it into my ls listing. I tried the tips in the comment and answer and they work, but they leave out that file.
You can do this with extended pattern matching in bash, e.g.
> shopt -s extglob
> ls *'.'+([0-9])
Where
+(pattern-list)
Matches one or more occurrences of the given patterns
And other useful syntaxes.
?(pattern-list)
Matches zero or one occurrence of the given patterns
*(pattern-list)
Matches zero or more occurrences of the given patterns
@(pattern-list)
Matches one of the given patterns
!(pattern-list)
Matches anything except one of the given patterns
Alternatively without extended pattern matching could use a less neat solution
ls *'.'{1..1000} 2>/dev/null
And replace 1000 with some larger number if you have a lot of log files. Though I would prefer the grep option to this one.
An approach using sed:
ls -v stream.log* | sed -nE '/log(\.[0-9]+)?$/p'
and one using egrep:
ls -v stream.log* | egrep 'log(\.[0-9]+)?$'
These print out lines that end in "log" and optionally a period and any positive number of digits, followed by the end of the line.
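You can do this much more simply by just focusing on the dash '-'. Note that the dash appears in the new date-stamped names (e.g. stream.log.2014-02-14), not in the old numbered ones, so the patterns below select the date-stamped files. Here is the minimal version: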
ls *-*
This may be a little safer if there are different types of logfiles in the same directory:
ls stream.log.*-*
To ensure that you get the one extra file, it does not make sense to generate a confusing regex that will fit it - just include it on the ls line:
ls stream.log stream.log.*-*
Referring to @BroSlow's answer, here is a fix that includes stream.log as well.
shopt -s extglob
ls stream.log*(.)*([0-9])
stream.log stream.log.1 stream.log.2
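A somewhat tighter pattern is stream.log?(.+([0-9])), which reads as "stream.log, optionally followed by one dot and one or more digits". The sketch below tries it in a scratch directory created with mktemp; the file list is a made-up sample based on the question.

```shell
shopt -s extglob nullglob   # must be enabled before the pattern line is parsed

# Work in a scratch directory with a sample of old and new log names.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch stream.log stream.log.1 stream.log.2 stream.log.10 \
      stream.log.2014-02-13 stream.log.2014-02-14

# Optional group: a dot followed by one or more digits, and nothing else.
old_logs=( stream.log?(.+([0-9])) )

printf '%s\n' "${old_logs[@]}"
```

The date-stamped files are excluded because the dash is not a digit, so the whole name cannot match.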

Regular Expressions for file name matching

In Bash, how does one match a regular expression with multiple criteria against a file name?
For example, I'd like to match against all the files with .txt or .log endings.
I know how to match one type of criteria:
for file in *.log
do
echo "${file}"
done
What's the syntax for a logical or to match two or more types of criteria?
Bash does not support regular expressions per se when globbing (filename matching). Its globbing syntax, however, can be quite versatile. For example:
for i in A*B.{log,txt,r[a-z][0-9],c*} Z[0-5].c; do
...
done
will apply the loop contents on all files that start with A and end in a B, then a dot and any of the following extensions:
log
txt
r followed by a lowercase letter followed by a single digit
c followed by pretty much anything
It will also apply the loop commands to any file starting with Z, followed by a digit in the 0-5 range and then by the .c extension.
If you really want/need to, you can enable extended globbing with the shopt builtin:
shopt -s extglob
which then allows significantly more features while matching filenames, such as sub-patterns etc.
See the Bash manual for more information on supported expressions:
http://www.gnu.org/software/bash/manual/bash.html#Pattern-Matching
EDIT:
If an expression does not match a filename, bash by default will substitute the expression itself (e.g. it will echo *.txt) rather than an empty string. You can change this behaviour by setting the nullglob shell option:
shopt -s nullglob
This will replace a *.txt that has no matching files with an empty string.
EDIT 2:
I suggest that you also check out the shopt builtin and its options, since quite a few of them affect filename pattern matching, as well as other aspects of the shell:
http://www.gnu.org/software/bash/manual/bash.html#The-Shopt-Builtin
Do it the same way you'd invoke ls. You can specify multiple wildcards one after the other:
for file in *.log *.txt
for file in *.{log,txt} ..
for f in $(find . -regex ".*\.log")
do
echo "$f"
done
You simply add the other conditions to the end:
for VARIABLE in 1 2 3 4 5 .. N
do
command1
command2
commandN
done
So in your case:
for file in *.log *.txt
do
echo "${file}"
done
You can also do this:
shopt -s extglob
for file in *.+(log|txt)
which could be easily extended to more alternatives:
for file in *.+(log|txt|mp3|gif|foo)
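One detail worth checking is that +(...) accepts repeated alternatives, while @(...) accepts exactly one. The following sketch compares the two in a scratch directory created with mktemp; the file names, including the odd d.loglog, are invented to show the difference.

```shell
shopt -s extglob nullglob   # must be enabled before the pattern lines are parsed

# Work in a scratch directory with a few sample files.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch a.log b.txt c.md d.loglog

exactly_one=( *.@(log|txt) )   # extension is exactly "log" or "txt"
one_or_more=( *.+(log|txt) )   # also accepts repeats such as "loglog"

printf '@(...) matched: %s\n' "${exactly_one[*]}"
printf '+(...) matched: %s\n' "${one_or_more[*]}"
```

For plain "or" matching on extensions, @(log|txt) or the brace form *.{log,txt} is probably closer to what is intended.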