Regular Expression in Find command - regex

I want to list the files whose names start with a digit and end with the ".c" extension. This is the find command I am using, but it does not give the expected output.
Command:
find -type f -regex "^[0-9].*\\.c$"

It's because the regex option works with the full path and you specified only the file name. From man find:
-regex pattern
File name matches regular expression pattern. This is a match on the whole
path, not a search. For example, to match a file named './fubar3', you can use
the regular expression '.*bar.' or '.*b.*3', but not 'f.*r3'.
The regular expressions understood by find are by default Emacs Regular
Expressions, but this can be changed with the -regextype option.
Try with this:
find -type f -regex ".*/[0-9][^/]*\.c$"
Here the leading .*/ matches everything up to the last slash, so the rest of the pattern applies to the file name only.
UPDATE: I corrected the regex. I changed .* in the file name part to [^/]* because, after "any string that terminates with a slash", the match must not contain another slash; otherwise that part would be a directory rather than the file name.
NOTE: An unrestricted .* can be harmful, because it also matches slashes and can therefore match much more than intended.
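The whole-path behaviour described above can be checked with a small experiment. This is only a sketch: it assumes GNU find, the file names are made up for illustration, and it uses [^/]* so that a one-character name such as 1.c also matches.

```shell
# Build a throwaway tree (hypothetical file names, not from the question).
demo=$(mktemp -d)
mkdir -p "$demo/src/2nd"
touch "$demo/1.c" "$demo/src/9lives.c" "$demo/src/main.c" "$demo/src/2nd/util.c"

# Pattern written for the bare file name: matches nothing, because find
# tests the whole path (./src/9lives.c), which starts with "." not a digit.
name_only=$(cd "$demo" && find . -type f -regex '^[0-9].*\.c$' | LC_ALL=C sort)

# Pattern written for the whole path: .*/ consumes the directories and
# [^/]* keeps the rest of the match inside the last path component.
whole_path=$(cd "$demo" && find . -type f -regex '.*/[0-9][^/]*\.c' | LC_ALL=C sort)

printf 'name-only pattern matched: [%s]\n' "$name_only"
printf 'whole-path pattern matched:\n%s\n' "$whole_path"
rm -rf "$demo"
```

Note that ./src/2nd/util.c is not reported: the digit has to start the last path component, not a directory name.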

Just use the -name option. It accepts a pattern for the last component of the path name, as the documentation says:
-name pattern
True if the last component of the pathname being examined matches
pattern. Special shell pattern matching characters (``['',
``]'', ``*'', and ``?'') may be used as part of pattern. These
characters may be matched explicitly by escaping them with a
backslash (``\'').
So:
$ find -type f -name "[0-9]*.c"
should work.
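As a quick check of the -name approach, here is a minimal sketch with invented file names. It relies only on -name testing the last path component, so a digit in a directory name does not count.

```shell
# Throwaway tree; all names here are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/3rdparty"
touch "$demo/7zip.c" "$demo/main.c" "$demo/4notes.txt" \
      "$demo/3rdparty/zlib.c" "$demo/3rdparty/2.c"

# The glob is quoted so the shell passes it to find unexpanded.
matches=$(cd "$demo" && find . -type f -name '[0-9]*.c' | LC_ALL=C sort)

printf '%s\n' "$matches"
rm -rf "$demo"
```

Here 3rdparty/zlib.c is skipped even though its directory starts with a digit, and 2.c is found because * may match nothing between the digit and .c.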

Related

Linux find-command with regex and write in a file working strange

So I have a folder with .wav files that look like this:
afr_(4 digits)_(random digits).wav
I want to create a list with all files that start with for example afr_0184_ and afr_1919_ and I use this linux command line
find /directory/ -name "arf_0184_*.wav" -o -name "afr_1919_*.wav" > train.list
For some reason the list only has afr_1919_ files in it as if it overwrites the afr_0184_ that are found before them.
I also tried this
find /directory/ -name 'afr_(0184|1919)_*.wav' > train.list
but the list is empty in this case.
What am I doing wrong here?
You can use
find /directory/ -regextype posix-egrep -regex '.*/afr_(0184|1919)_[0-9]*\.wav' > train.list
To match any 4 digit variations instead of the two 'hardcoded' values, use
'.*/afr_[0-9]{4}_[0-9]*\.wav'
With -name you can only use a glob (wildcard) pattern; for alternation you need -regex.
With -regextype posix-egrep, you can use alternation and unescaped capturing parentheses. [0-9]* matches zero or more digits (replace * with + to match one or more).
The .*/ is added because the regex should match the whole path string.
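The two approaches can be compared side by side. The sketch below assumes GNU find (for -regextype) and uses made-up file names. The parentheses around the two -name tests are my addition: they are needed once another test such as -type f is combined with -o, because -o binds more loosely than the implicit "and".

```shell
# Throwaway directory with hypothetical recordings.
demo=$(mktemp -d)
touch "$demo/afr_0184_001.wav" "$demo/afr_0184_002.wav" \
      "$demo/afr_1919_7.wav" "$demo/afr_2020_5.wav"

# Glob version: one -name test per prefix, grouped so -type f applies to both.
by_name=$(cd "$demo" && find . -type f \
    \( -name 'afr_0184_*.wav' -o -name 'afr_1919_*.wav' \) | LC_ALL=C sort)

# Regex version: alternation inside a single whole-path pattern.
by_regex=$(cd "$demo" && find . -regextype posix-egrep \
    -regex '.*/afr_(0184|1919)_[0-9]*\.wav' | LC_ALL=C sort)

printf 'by -name:\n%s\n' "$by_name"
printf 'by -regex:\n%s\n' "$by_regex"
rm -rf "$demo"
```

Both lists should contain the same three files and leave out afr_2020_5.wav.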

regex quantifiers in bash --simple vs extended matching {n} times

I'm using the bash shell and trying to list files in a directory whose names match regex patterns. Some of these patterns work, while others don't. For example, the * wildcard is fine:
$ ls FILE_*
FILE_123.txt FILE_2345.txt FILE_789.txt
And the range pattern captures the first two of these with the following:
$ ls FILE_[1-3]*.txt
FILE_123.txt FILE_2345.txt
but not the filename with the "7" character after "FILE_", as expected. Great. But now I want to count digits:
$ ls FILE_[0-9]{3}.txt
ls: FILE_[0-9]{3}.txt: No such file or directory
Shouldn't this give me the filenames with three numeric digits following "FILE_" (i.e. FILE_123.txt and FILE_789.txt, but not FILE_2345.txt)? Can someone tell me how I should be using the {n} quantifier (i.e. "match this pattern n times")?
ls itself does no pattern matching; the shell expands the glob before ls runs, and glob patterns have no {3} quantifier. You have to use FILE_[0-9][0-9][0-9].txt. Or you could use the following command:
ls | grep -E '^FILE_[0-9]{3}\.txt$'
Edit:
Or you can also use the find command:
find . -regextype egrep -regex '.*/FILE_[0-9]{3}\.txt'
The .*/ prefix is needed to match a complete path. On Mac OS X:
find -E . -regex ".*/FILE_[0-9]{3}\.txt"
Bash filename expansion does not use regular expressions. It uses glob pattern matching, which is distinctly different. With FILE_[0-9]{3}.txt, bash first attempts brace expansion, which leaves {3} unchanged because it contains neither a comma nor a range, and then performs filename expansion, where {3} is matched literally. Even bash's extended globbing feature doesn't have an equivalent to the regular expression {N} quantifier, so, as already mentioned, you have to use FILE_[0-9][0-9][0-9].txt.
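To see the difference between the glob and the regex forms, the following sketch recreates the three files from the question in a temporary directory. The variable names are mine, and it assumes the default shell options (no nullglob or failglob).

```shell
# Recreate the question's files in a throwaway directory.
demo=$(mktemp -d)
touch "$demo/FILE_123.txt" "$demo/FILE_2345.txt" "$demo/FILE_789.txt"

# Glob with {3}: nothing matches, so the shell passes the pattern through
# unchanged, which is why ls reports "No such file or directory".
literal=$(cd "$demo" && printf '%s\n' FILE_[0-9]{3}.txt)

# Glob with the digit class written out three times.
glob_matches=$(cd "$demo" && printf '%s\n' FILE_[0-9][0-9][0-9].txt)

# Regex via grep -E, anchored and with the dot escaped.
grep_matches=$(cd "$demo" && ls | grep -E '^FILE_[0-9]{3}\.txt$')

printf 'literal: %s\n' "$literal"
printf 'glob:\n%s\n' "$glob_matches"
printf 'grep:\n%s\n' "$grep_matches"
rm -rf "$demo"
```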

Regex for find command to match files with a non-empty file extension

I have a series of files that I want to clean up that are .log files that have been rotated. Examples:
error.log
access.log
error.log-2016-02-05
access.log.1
debug.log
debug.log--2
Regex is matching all of the log files with:
find . -regextype posix-extended -regex '^.*.log.*'
How can I match ONLY the files that have characters after *.log?
Replace the last occurrence of .* with .+.
* matches 0 or more instances of the previous character.
+ matches 1 or more instances.
You also need to escape the . before log with a \, otherwise it will match any character rather than just a literal period.
In summary, use this:
find . -regextype posix-extended -regex '^.*\.log.+'
A few other adjustments might also be useful:
You probably don't want to match files with empty filenames, so you should also switch the first .* to .+ (thanks, Jan!).
You probably don't want to allow files with the file extension .log. (a single . character after .log), so you could switch the final .+ to \..+. Note that this stricter form also stops matching names such as error.log-2016-02-05, where the suffix does not begin with a dot.
This would give you the final command:
find . -regextype posix-extended -regex '^.+\.log\..+'
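The effect of each variant can be checked against the example file names from the question. This is a sketch that assumes GNU find; the -type f test and the variable names are my additions.

```shell
# Recreate the question's log files in a throwaway directory.
demo=$(mktemp -d)
touch "$demo/error.log" "$demo/access.log" "$demo/error.log-2016-02-05" \
      "$demo/access.log.1" "$demo/debug.log" "$demo/debug.log--2"

# At least one character after a literal ".log".
rotated=$(cd "$demo" && find . -type f -regextype posix-extended \
    -regex '^.*\.log.+' | LC_ALL=C sort)

# Stricter: the suffix must itself start with a dot.
dotted=$(cd "$demo" && find . -type f -regextype posix-extended \
    -regex '^.+\.log\..+' | LC_ALL=C sort)

printf 'rotated:\n%s\n' "$rotated"
printf 'dotted suffix only:\n%s\n' "$dotted"
rm -rf "$demo"
```

The first pattern reports the three rotated files; the stricter one keeps only access.log.1.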

Find all text within square brackets using regex

I have a problem that because of PHP version, I need to change my code from $array[stringindex] to $array['stringindex'];
So I want to find all the text using regex, and replace them all. How to find all strings that look like this? $array[stringindex].
Here's a solution in PHP:
$re = "/(\\$[[:alpha:]][[:alnum:]]+\\[)([[:alpha:]][[:alnum:]]+)(\\])/";
$str = "here is \$array[stringindex] but not \$array['stringindex'] nor \$3array[stringindex] nor \$array[4stringindex]";
$subst = "$1'$2'$3";
$result = preg_replace($re, $subst, $str);
I search for variables beginning with a letter, otherwise things like $foo[42] would be converted to $foo['42'], which might not be desirable.
Note that none of the solutions here will handle every case correctly.
Looking at the Sublime Text regex help, it would seem you could just paste (\$[[:alpha:]][[:alnum:]]+\[)([[:alpha:]][[:alnum:]]+)(\]) into the Search box and $1'$2'$3 into the Replace field. (The doubled backslashes in the PHP version are only string escaping and are not needed there.)
It depends on the tool you want to use to do the replacement.
With sed, for example, it would be something like this:
sed "s/\(\$array\)\[\([^]]*\)\]/\1['\2']/g"
If sed is allowed you could simply do:
sed -i "s/\(\$[^[]*\[\)\([^]]*\)\]/\1'\2']/g" file
Explanation:
sed "s/pattern/replace/g" is a sed command which searches for pattern and replaces it with replace. The g flag means that every match on a line is replaced, not just the first.
\(\$[^[]*\[\)\([^]]*\)\] this pattern consists of two groups (between the escaped parentheses). The first is a dollar sign followed by a series of non-[ characters and then an opening square bracket. The second is a series of characters other than a closing bracket, which is then followed by a closing square bracket.
\1'\2'] is the replacement string: \1 inserts the first captured group (and likewise \2 the second). Basically we wrap \2 in quotes (which is what you wanted).
The -i option means that the changes are applied to the original file, which is supplied at the end.
For more information, see man sed.
This can be combined with the find command, as follows:
find . -name '*.php' -exec sed -i "s/\(\$[^[]*\[\)\([^]]*\)\]/\1'\2']/g" '{}' \;
This will apply the sed command to all PHP files found.
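Before running a substitution like this over real files, it may help to try it on a single line first. The sketch below pipes made-up PHP snippets through the substitution, with the group and bracket backslashes written out in basic regular expression syntax, and leaves out -i so nothing is modified. It also shows one of the cases that is not handled well: an index that is already quoted gets quoted again.

```shell
# Hypothetical PHP fragments, used only as test input.
input='$a = $array[stringindex] + $other[key];'
already_quoted='$array['"'"'x'"'"']'

# Inside double quotes, \$ reaches sed as a literal dollar sign.
output=$(printf '%s\n' "$input" |
    sed "s/\(\$[^[]*\[\)\([^]]*\)\]/\1'\2']/g")

# Limitation: the pattern does not check whether the index is quoted already.
double_quoted=$(printf '%s\n' "$already_quoted" |
    sed "s/\(\$[^[]*\[\)\([^]]*\)\]/\1'\2']/g")

printf '%s\n' "$output"
printf '%s\n' "$double_quoted"
```

If the second result is a concern, the more selective PHP pattern earlier in the thread, which only accepts an index made of letters and digits, avoids it.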

Sed on Mac not recognizing regular expressions

In terminal, I am attempting to clean up some .txt files so they can be imported into another program. Only literal search/replaces seem to be working. I cannot get regular expression searches to work.
If I attempt a search and replace with a literal string, it works:
find . -type f -name '*.txt' -exec sed -i '' 's/Title Page//' {} +;
(remove the words "Title Page" from every text file)
But if I am attempting even the most basic of regular expressions, it does not work:
find . -type f -name '*.txt' -exec sed -i '' s/\n\nDOWN/\\n<DOWN\>/ {} +;
(In every text file, reformat any word "DOWN" that follows a double return: remove the extra newline and put the word in angle brackets: "\n<DOWN>")
This does not work. The only thing at all "regular expression" about this is looking for the newline.
I must be doing something incorrectly.
Any help is much appreciated.
Update: part 2
John1024's answer helped me out a lot for one aspect.
find . -type f -name '*.txt' -exec sed -i '' '/^$/{N; s/\n[0-9]+/\n/;}' {} +;
Now I am having trouble getting other types of regular expressions to work properly. In the example above, I want to remove all numbers that appear at the beginning of a line.
Argh! What am I missing?
By default, sed handles only one line at a time. When a line is read into sed's pattern space the newline character is removed.
I see that you want to look for an empty line followed by DOWN and, when found, remove the empty line and change the text to <DOWN>. That can be done. Consider this as the test file:
$ cat file
some
thing
DOWN

DOWN
other
Try:
$ sed '/^$/{N; s/\nDOWN/<DOWN>/;}' file
some
thing
DOWN
<DOWN>
other
How it works
/^$/
This looks for empty lines. The commands in braces which follow are executed only on empty lines.
{N; s/\nDOWN/<DOWN>/;}
The N command reads the next line into the pattern space, separated from the current line by a newline character.
If the pattern space matches an empty line followed by DOWN, the substitution command, s/\nDOWN/<DOWN>/, removes the newline and replaces the DOWN with <DOWN>.
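The same steps can be run without creating a file by feeding the sample text through a pipe. This sketch uses printf to generate the test input, with an empty line between the two DOWN lines; the variable name is mine.

```shell
# Sample text: "DOWN", an empty line, then "DOWN" again.
output=$(printf 'some\nthing\nDOWN\n\nDOWN\nother\n' |
    sed '/^$/{N; s/\nDOWN/<DOWN>/;}')

# On the empty line, N appends the next line, so the pattern space is
# "\nDOWN"; the substitution then replaces it with "<DOWN>".
printf '%s\n' "$output"
```

Only the DOWN that follows the empty line is rewritten; the first DOWN is left alone.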
Special Case: DOS/Windows Files
If a file has DOS/Windows line endings, \r\n, sed will only remove the \n when the line is read in. The \r will remain. When dealing with these files, the presence of that character, if unanticipated, may lead to surprising results.
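One possible way to deal with such files, which is my suggestion rather than part of the answer above, is to strip the carriage returns with tr before sed sees the text. The sketch below shows that the empty-line test fails on CRLF input and succeeds once the \r characters are removed; crlf_input is a hypothetical helper that generates the test data.

```shell
# Hypothetical helper that emits sample text with DOS/Windows line endings.
crlf_input() { printf 'DOWN\r\n\r\nDOWN\r\n'; }

# With CRLF input the "empty" line still holds a \r, so /^$/ never matches.
# (tr runs afterwards here only to make the result easy to print and compare.)
untouched=$(crlf_input | sed '/^$/{N; s/\nDOWN/<DOWN>/;}' | tr -d '\r')

# Removing the carriage returns first lets the same sed script match.
converted=$(crlf_input | tr -d '\r' | sed '/^$/{N; s/\nDOWN/<DOWN>/;}')

printf 'without conversion:\n%s\n' "$untouched"
printf 'with conversion:\n%s\n' "$converted"
```

Keep in mind that this changes the line endings of the output, which may or may not be acceptable for the program that imports the files.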