How to ignore digits - regex

I have a log file at this location:
/appl/bcm_prod/u/scratch/markit/markitdownloader_20160420_25918.log
I know the format specifiers for the year, month, and day, but how do I ignore the rest of the string?
For example, in
/appl/bcm_prod/u/scratch/markit/markitdownloader_%Y%m%d_25918.log
what do I put in place of the 25918 ID, which can change every day?

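Which regex flavour are you using?
Here are some examples:
find /appl/bcm_prod/u/scratch/markit/ -regex ".*/markitdownloader_20160420_[0-9][0-9]*\.log" -type f
find /appl/bcm_prod/u/scratch/markit/ -type f | grep "markitdownloader_20160420_[0-9][0-9]*\.log"
ls /appl/bcm_prod/u/scratch/markit/markitdownloader_20160420_*.log
ls /appl/bcm_prod/u/scratch/markit/markitdownloader_20160420_+([[:digit:]]).log
The first two use regular expressions, so the dot before "log" is escaped. The last two are shell globs, and the +(...) form requires bash with extended globbing enabled (shopt -s extglob).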

In a regular expression, * means "zero or more of the preceding item", so "any run of characters" is written .*; in a shell glob, * on its own matches any run of characters. So put the date in as needed and follow it with *.log (glob) or .*\.log (regex), depending on which one your tool expects. In bash I would say logname_date_*.log is what you are looking for.
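As a rough sketch of the glob approach in a POSIX-style shell: build the date part with date +%Y%m%d and let * stand in for the changing ID. The function name markit_logs_for_day and its arguments are made up for illustration; only the file naming scheme comes from the question.

```shell
# Hypothetical helper: print every log for a given day, whatever the ID is.
# $1 = directory, $2 = date as YYYYMMDD (defaults to today)
markit_logs_for_day() {
    log_dir=$1
    log_day=${2:-$(date +%Y%m%d)}
    # The variables are quoted, so * is the only wildcard in the pattern.
    for log_file in "$log_dir"/markitdownloader_"$log_day"_*.log; do
        # If nothing matches, the pattern is left unexpanded, so check existence.
        if [ -e "$log_file" ]; then
            printf '%s\n' "$log_file"
        fi
    done
}
```

Calling markit_logs_for_day /appl/bcm_prod/u/scratch/markit 20160420 would then print that day's log regardless of the ID.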

Related

regex quantifiers in bash --simple vs extended matching {n} times

I'm using the bash shell and trying to list files in a directory whose names match regex patterns. Some of these patterns work, while others don't. For example, the * wildcard is fine:
$ ls FILE_*
FILE_123.txt FILE_2345.txt FILE_789.txt
And the range pattern captures the first two of these with the following:
$ ls FILE_[1-3]*.txt
FILE_123.txt FILE_2345.txt
but not the filename with the "7" character after "FILE_", as expected. Great. But now I want to count digits:
$ ls FILE_[0-9]{3}.txt
ls: FILE_[0-9]{3}.txt: No such file or directory
Shouldn't this give me the filenames with three numeric digits following "FILE_" (i.e. FILE_123.txt and FILE_789.txt, but not FILE_2345.txt)? Can someone tell me how I should be using the {n} quantifier (i.e. "match this pattern n times")?

ls itself does not match patterns; the shell expands glob patterns before ls runs, and globs have no {3} quantifier. You have to use FILE_[0-9][0-9][0-9].txt. Or you could use the following command:
ls | grep -E '^FILE_[0-9]{3}\.txt$'
Edit:
Or you can also use the find command:
find . -regextype egrep -regex '.*/FILE_[0-9]{3}\.txt'
The .*/ prefix is needed because -regex matches against the complete path. On Mac OS X:
find -E . -regex ".*/FILE_[0-9]{3}\.txt"

Bash filename expansion does not use regular expressions. It uses glob pattern matching, which is distinctly different. With FILE_[0-9]{3}.txt, bash first attempts brace expansion and then filename expansion; because {3} is not a valid brace expansion, it is left in place as literal characters, so the pattern would only match a name such as FILE_1{3}.txt. Even bash's extended globbing feature doesn't have an equivalent to the regular expression {N}, so as already mentioned you have to use FILE_[0-9][0-9][0-9].txt.
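To illustrate, here is a small sketch that lists names with exactly three digits using only glob syntax, by repeating the bracket expression. The function name three_digit_files is hypothetical.

```shell
# Hypothetical helper: list FILE_<three digits>.txt in directory $1 using a glob only.
three_digit_files() {
    for candidate in "$1"/FILE_[0-9][0-9][0-9].txt; do
        # An unmatched glob is left as-is, so skip it if no such file exists.
        if [ -e "$candidate" ]; then
            # Strip the directory part and print just the file name.
            printf '%s\n' "${candidate##*/}"
        fi
    done
}
```

Run against the directory from the question, this should print FILE_123.txt and FILE_789.txt but not FILE_2345.txt.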

find that excludes directories that are number dash number

I'm trying to write a find command that excludes directories named number-dash-number but allows other directories.
Sample directories
./135888897-135954433/
./135888897-135954434/
./135888897-135954435/
./BLAG-DEF-JOB1/
./TOM-DEPLOYDEV-JOB1/
./FRANK-RELEASE-JOB1/
./STEVE-RELEASE-JOB1/
Here's part of my find command. I can't seem to get it to skip the number directories.
find . -type f ! -regex '\./[0-9]+\-[0-9]+/*'
Any help would be great. Thanks!

You should use .* instead of *.
In a regex, * means 'match the preceding token 0 or more times', so /* only matches a run of slashes, and -regex has to match the whole path.
This results in the following command:
find . -type f ! -regex '\./[0-9]+\-[0-9]+/.*'
Update: I also thought you forgot to escape the / in your command, but after doing a little bit of research it seems escaping / is not necessary when using the find command.

You can use:
find . -type f ! -regex '\./[0-9]+-[0-9]+/.*'
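The -regex test behaves differently across find implementations (GNU find accepts + in its default syntax, others may not), so a more portable variation is to let find list everything and filter the paths with grep -E. This is a swapped-in technique, not the command from the answers above, and the function name is made up. It assumes file names contain no newlines.

```shell
# Hypothetical helper: list regular files under $1, skipping top-level
# directories named <digits>-<digits>.
files_outside_numeric_dirs() {
    # cd inside a subshell so the paths start with ./ as in the question.
    # grep -v drops every path whose first component is digits-dash-digits.
    ( cd "$1" && find . -type f | grep -Ev '^\./[0-9]+-[0-9]+/' | sort )
}
```

With the sample directories above, only files under BLAG-DEF-JOB1, TOM-DEPLOYDEV-JOB1 and the other named directories should be listed.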

How can I use regular expression to search files in Unix?

I have the following files from two different categories:
Category 1:
MAA
MAB
MAC
MAD
MAE
MAF
MAG
MAH
MAJ
MBA
MBB
MBC
MBD
MBE
MDA
MDD
and Category 2:
MCA
MCB
MCC
MCD
MCE
MCF
MCG
MDB
So my question is: how can I write a regular expression that finds files from category 1 only?
I don't want a hard-coded script; I'm hoping for some pattern-based logic.
I am trying this:
find . -regex "*[M][A,B,D][A,B,C,D,E,F,J].txt"

It's quite simple:
ls -l | grep "MAA\|MAB\|MAC\|MAD\|MAE\|MAF\|MAG\|MAH\|MAJ\|MBA\|MBB\|MBC\|MBD\|MBE\|MDA\|MDD"
OK, so you don't want it hard-coded. Then state the patterns that should NOT match, using -v:
ls -l | grep -v "MC." | grep -v "pattern2" | ....

Your question is not very precise, but from your attempt I conclude that you are looking for files whose names end in ...MAA.txt, ...MAB.txt and so on, located either in your working directory or somewhere below it.
You also didn't mention which shell you are using. Here is an example using zsh, with no need to write a regular expression:
ls ./**/*M{AA,AB,AC,AD,AE,AF,AG,AH,AJ,BA,BB,BC,BD,BE,DA,DD}.txt

I am trying this : find . -regex "*[M][A,B,D][A,B,C,D,E,F,J].txt"
The errors in this are:
The wildcard for any characters in a regex is .*, unlike just * in a normal filename pattern.
You forgot G and H in the third bracket expression.
You didn't exclude the category 2 name MDB.
Besides:
The characters of a bracket expression are not to be separated by ,.
A bracket expression with a single item ([M]) can be replaced by just the item (M).
This leads to:
find . -regex ".*M[ABD].*" -not -name "MDB*"
or, without regex:
find . -name "M[ABD]*" -not -name "MDB*"
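The same selection logic can also be written as shell patterns in a case statement: reject MDB first, then accept anything starting with M followed by A, B, or D. This is only a sketch, and the function name is_category1 is invented for illustration.

```shell
# Hypothetical helper: succeed if file name $1 belongs to category 1.
is_category1() {
    case $1 in
        MDB*)    return 1 ;;  # the one category 2 name that starts with MD
        M[ABD]*) return 0 ;;  # MA?, MB? and the remaining MD? names
        *)       return 1 ;;  # everything else, including MC?
    esac
}
```

The order of the branches matters: case uses the first pattern that matches, so the MDB exclusion has to come before the broader M[ABD]* pattern.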

How to list files with numbers in their name and retrieve the numbers?

I am very new to regex, so I imagine this is quite a simple question that must have been asked several times already, but unfortunately I can't find any of those answers.
Given a directory, I need the list of all of its subdirectories whose names follow the pattern "nw=[number].a=[number]", and for every directory I need to retrieve those numbers and do a few things based on them. Some of these directories are nw=82.a=40, nw=100.a=9, etc.
My guess to accomplish this would be
#! /bin/bash
cd $mydir
for dir in `ls | grep nw=[:digit:]+.a=[:digit:]`: do
retrieve the numbers
a few things
done
Why doesn't it work, and how could I retrieve the numbers?
Thank you in advance,
Ferdinando

Some corrections on your grep command:
grep -E 'nw=[[:digit:]]+\.a=[[:digit:]]+'
Use the "-E" flag so you can use an extended regex, which includes the '+' operator, for example.
Use double square brackets: [:digit:] is only valid inside a bracket expression, so the class is written [[:digit:]]
Escape the period, otherwise it will be used as an operator to match any character
A final '+' was missing from the end. It is not strictly necessary, since grep matches anywhere in the line, but it probably represents your path names better
It is probably good practice to place your regex between quotes (in this case, single quotes will do)
Hope this helps =)

perl -e '@a=`ls`;m/nw=(\d+)\.a=(\d+)(?{print"$1\t$2\n"})/ for@a'
Enjoy.
Call the terminal's ls command and store the list in the array @a.
@a=`ls`;
looking for match
m/
nw=(digits that I capture in $1).a=(digits that I capture in $2)
nw=(\d+)\.a=(\d+)
start evaluation of code from within a pattern
(?{
print first number, tab, second number, newline
print"$1\t$2\n"})
end matching pattern group
/
perform this match attempt with embedded code on each filename (with newlines still appended) in array @a
for@a
Yes, that was cryptic.

Don't parse ls. Use find instead:
find . -maxdepth 1 -type d -regex '.*nw=[0-9]+\.a=[0-9]+.*' | while IFS= read -r dir
do
    echo "Found directory: $dir"
    if [[ "$dir" =~ nw=([0-9]+)\.a=([0-9]+) ]]
    then
        echo "numbers are ${BASH_REMATCH[1]} and ${BASH_REMATCH[2]}"
    fi
done
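If bash's =~ operator is not available, the numbers can also be pulled out with plain POSIX parameter expansion once a name has roughly the right shape. This is an alternative sketch rather than the method in the answers; the function name parse_nw_a is hypothetical, and its shape check is a loose glob, not a full digits-only validation.

```shell
# Hypothetical helper: print "<nw> <a>" for a name like nw=82.a=40.
parse_nw_a() {
    name=${1##*/}                 # drop any leading path such as ./
    case $name in
        nw=[0-9]*.a=[0-9]*) ;;    # loose shape check: a glob, not a regex
        *) return 1 ;;
    esac
    rest=${name#nw=}              # remove the nw= prefix
    nw_value=${rest%%.a=*}        # keep what precedes .a=
    a_value=${rest#*.a=}          # keep what follows .a=
    printf '%s %s\n' "$nw_value" "$a_value"
}
```

For a directory named nw=82.a=40 this prints the two numbers separated by a space, which can then be read into variables with read.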

grep - search for "<?\n" at start of a file

I have a hunch that I should probably be using ack or egrep instead, but what should I use to basically look for
<?
at the start of a file? I'm trying to find all files that contain the php short open tag since I migrated a bunch of legacy scripts to a relatively new server with the latest php 5.
I know the regex would probably be '/^<\?\n/'

I RTFM and ended up using:
grep -RlIP '^<\?\n' *
The -P option enables full Perl-compatible regexes.

If you're looking for all php short tags, use a negative lookahead
/<\?(?!php)/
will match <? but will not match <?php
[meder ~/project]$ grep -rP '<\?(?!php)' .

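find . -name "*.php" | xargs grep -nHo "<?[^px]"
The x in [^px] excludes the XML start tag (<?xml); inside a bracket expression only the leading ^ negates, so it is not repeated.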

If you are worried about Windows line endings, just add \r?.

grep '^<?$' filename
In case that is not displaying correctly, the pattern is the four characters ^ < ? $ with no spaces between them.

Do you mean a literal "backslash n" or do you mean a newline?
For the former:
grep '^<?\\n' [files]
For the latter:
grep '^<?$' [files]
Note that grep will search all lines, so if you want to find matches just at the beginning of the file, you'll need to either filter each file down to its first line, or ask grep to print out line numbers and then only look for line-1 matches.
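One way to restrict the check to the first line, sketched below: read only the first line with head, strip a possible carriage return, and compare the result with the literal string <?. The function name starts_with_short_tag is hypothetical.

```shell
# Hypothetical helper: succeed if the first line of file $1 is exactly "<?".
starts_with_short_tag() {
    # head -n 1 keeps only the first line; tr removes a Windows carriage return.
    first_line=$(head -n 1 "$1" | tr -d '\r')
    [ "$first_line" = '<?' ]
}
```

It could be combined with find, for example by piping find . -name '*.php' -type f into a while read loop that calls starts_with_short_tag on each path and prints the ones that succeed.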
