How to use mksquashfs -regex? - regex

I want to mksquashfs a chroot, and include the /cdrom dir, but exclude everything inside it. I already know how to do this with -wildcards, but I want to see if -regex has a bug. Test case:
cd $(mktemp -d)
mkdir -p cdrom cdrom2/why
mksquashfs . /tmp/chroot.squashfs -info -noappend -regex -e '^cdrom/.*$'
The problem is that cdrom2/why was omitted! It seems to me like "/" is actually ignored there. Is this a mksquashfs bug?

This is because you don't fully understand how regexes work in Mksquashfs exclude files.
When wildcards are used, each exclude entry is treated as a series of wildcard patterns separated by slashes (/): wildcard1/wildcard2/wildcard3 matches wildcard1 against the top-level directory, wildcard2 against its subdirectories, and so on.
Specifying -regex simply replaces wildcard matching with regex matching. It is still evaluated as regexes separated by slashes (/), i.e. regex1/regex2/regex3.
In your example the regex "^cdrom" is evaluated against the files in the top level directory, and matches both "cdrom" and "cdrom2".
If you wanted the regex to only match "cdrom" you should use
mksquashfs . /tmp/chroot.squashfs -info -noappend -regex -e '^cdrom$/.*'
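The anchoring point can be checked without building an image. The sketch below is only an illustration: it uses grep -E as a stand-in for the matcher mksquashfs applies to the first path component (an assumption, the two are not guaranteed to behave identically) and recreates the directory names from the question.

```shell
# Recreate the test layout from the question in a scratch directory.
tmp=$(mktemp -d)
mkdir -p "$tmp/cdrom" "$tmp/cdrom2/why"

# First path component only: the unanchored regex matches both names,
# the anchored one matches just "cdrom".
unanchored=$(ls "$tmp" | grep -E '^cdrom' | paste -sd ' ' -)
anchored=$(ls "$tmp" | grep -E '^cdrom$' | paste -sd ' ' -)

echo "unanchored: $unanchored"
echo "anchored:   $anchored"
rm -rf "$tmp"
```

If the anchored line lists only cdrom, the same '^cdrom$' component should keep cdrom2 out of the exclude list.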

Related

How to find multiple files with different endings in Linux using regex?

Let's say that I have multiple files such as:
root.file991
root.file81
root.file77
root.file989
If I want to delete all of them, I would need to use a regex first, so I have tried:
find ./ -regex '\.\/root'
...which would find everything in root file, but how do I filter all these specific files?
You can use
find ./ -regextype posix-extended -regex '\./root\.file[0-9]+'
The regex has to match the whole path and breaks down as follows:
\. - a literal dot
/root\.file - the literal text /root.file
[0-9]+ - one or more digits, up to the end of the path.
I'm not quite sure what you mean by "files in root file" but if I understand correctly regular POSIX glob(7) pattern matching should be sufficient:
rm root.file[0-9]*
Depending on how complex the other files are, you may have to build up the regex more. $ man find has useful help as well. Try the following:
$ find ./ -regex '\.\/root.file[0-9].*'
# if that works to find what you are looking for, add the -delete
$ find ./ -regex '\.\/root.file[0-9].*' -delete
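Before adding -delete it may help to try the pattern on throwaway files. This is a minimal sketch that assumes GNU find (for -regextype); the extra names root.filex and other.txt are hypothetical and exist only to show what is not matched.

```shell
# Scratch directory with the four files from the question plus two decoys.
tmp=$(mktemp -d)
touch "$tmp/root.file991" "$tmp/root.file81" "$tmp/root.file77" \
      "$tmp/root.file989" "$tmp/root.filex" "$tmp/other.txt"

# -regex matches against the whole path, so the leading "./" is part of the pattern.
matches=$(cd "$tmp" && find ./ -regextype posix-extended -regex '\./root\.file[0-9]+')
printf '%s\n' "$matches"

# Count the matched paths; arithmetic expansion strips any padding from wc.
count=$(( $(printf '%s\n' "$matches" | wc -l) ))
rm -rf "$tmp"
```

Once the printed list looks right, the same expression can be rerun with -delete appended.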

How to use grep to find in a directory by a regex?

I tried
grep -R '.*invalidTemplateName.*' -regex './online_admin/.*/UTF-8/.*'
to find all occurrences of possible matches of the '.*invalidTemplateName.*' regex within a directory regex pattern './online_admin/.*/UTF-8/.*', but it doesn't work. I got the message:
grep: ./online_admin/.*/UTF-8/.*: No such file or directory
If I use
grep -R '.*invalidTemplateName.*' .
it searches every subdirectory of the current directory, which returns an overwhelming number of results. How can I specify a directory pattern in grep? Is it possible?
Find might be a better choice here:
find ./online_admin/*/UTF-8/* -type f -exec grep -H "invalidTemplateName" {} \;
Find will locate all files in the locations you want, including subdirs of UTF-8, and then execute grep on each file. The -H argument ensures the filename will be printed along with the match. If you want only the filename, use the -l switch instead (-L would list the files that do not match).
With find you could do something like this:
find /abs/path/to/directory -maxdepth 1 -name '*invalidTemplateName*'
The -name argument filters directly by file name (not by file contents). It takes shell wildcards rather than a regex, so the pattern is written with * instead of .*.
The -maxdepth argument limits how deep find recurses: 1 looks only in /abs/path/to/directory, 2 also looks in the first level of directories below it.
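One way to express "only files somewhere under a UTF-8 directory" is to let find filter on the path and hand the survivors to grep. The sketch below is an illustration only: the en and latin1 directory names and the file contents are made up, and -path is used as a wildcard filter on the full path rather than a true regex.

```shell
# Hypothetical tree: one hit under UTF-8, one hit elsewhere, one non-hit under UTF-8.
tmp=$(mktemp -d)
mkdir -p "$tmp/online_admin/en/UTF-8/sub" "$tmp/online_admin/en/latin1"
echo 'error: invalidTemplateName here' > "$tmp/online_admin/en/UTF-8/sub/a.txt"
echo 'error: invalidTemplateName here' > "$tmp/online_admin/en/latin1/b.txt"
echo 'nothing to see' > "$tmp/online_admin/en/UTF-8/c.txt"

# -path matches the whole path with shell wildcards ("*" also spans "/");
# grep -l prints only the names of files that contain the text.
hits=$(cd "$tmp" && find ./online_admin -type f -path '*/UTF-8/*' \
         -exec grep -l 'invalidTemplateName' {} +)
echo "$hits"
rm -rf "$tmp"
```

Using -exec ... {} + batches many files into one grep call, which tends to be faster than \; on large trees.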

Unix - Using find to List all .html files. (Do not use shell wildcards or the ls command)

I've tried 'find -name .html$', 'find -name .html\>'.
None worked.
I'd like to know why these two are wrong and what's the right one to use with no wildcards?
What you needed was
find -name '*.html'
Or for regex:
find -regex '.*/.*\.html'
To ignore case, use -iname or -iregex:
find -iname '*.html'
find -iregex '.*/.*\.html'
Manual for -name:
-name pattern
Base of file name (the path with the leading directories
removed) matches shell pattern pattern. The metacharacters
(`*', `?', and `[]') match a `.' at the start of the base name
(this is a change in findutils-4.2.2; see section STANDARDS CON‐
FORMANCE below). To ignore a directory and the files under it,
use -prune; see an example in the description of -path. Braces
are not recognised as being special, despite the fact that some
shells including Bash imbue braces with a special meaning in
shell patterns. The filename matching is performed with the use
of the fnmatch(3) library function. Don't forget to enclose
the pattern in quotes in order to protect it from expansion by
the shell.
find . -name '*.html'
You have to single quote the wildcard to keep the shell from globbing it when passing it to find.
You want
find . -name "*.html"
Note that -name takes a shell pattern, not a regex. For -regex, GNU find uses Emacs-style regular expressions by default, not the POSIX syntax you are probably used to.
You are missing a couple of things here. First, the path: to search from the current directory, use . (for example, find . lists every file and directory under the current directory recursively). Second, * is a wildcard, and it must be quoted so the shell passes it to find unexpanded. So to find all the .html files under the current directory, try
find . -name '*.html'
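The difference between the attempts can be seen on a small scratch tree. This is a sketch with hypothetical file names; -iname is an extension available in GNU and BSD find rather than a POSIX feature.

```shell
# Scratch tree with mixed-case extensions and one unrelated file.
tmp=$(mktemp -d)
mkdir -p "$tmp/sub"
touch "$tmp/a.html" "$tmp/sub/b.HTML" "$tmp/c.txt"

# Quoted glob: find receives the pattern itself and matches base names.
exact=$(cd "$tmp" && find . -name '*.html' | LC_ALL=C sort)
# Case-insensitive variant.
nocase=$(cd "$tmp" && find . -iname '*.html' | LC_ALL=C sort)
# -regex matches the whole path, so a leading ".*" is needed.
byregex=$(cd "$tmp" && find . -regex '.*\.html' | LC_ALL=C sort)
# The original attempt: "$" is not an anchor in a glob, so this looks
# for a file literally named ".html$" and finds nothing here.
literal=$(cd "$tmp" && find . -name '.html$')

printf 'exact:\n%s\nnocase:\n%s\nregex:\n%s\n' "$exact" "$nocase" "$byregex"
rm -rf "$tmp"
```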

Putty command to find files that are not named in Roman letters

I need a command through Putty or something to find any file in my server that isn't named in Roman alphabet. So the result of the command gives me the path of the file(s) that match this.
My website's server uses Ubuntu Linux 12.04.1.
And I want to search in this path (/var/www/) and all sub-folders of it.
This should work to find files and directories in your /var/www which contain characters other than upper/lowercase A to Z:
find /var/www -iregex '^[.]*[/A-Za-z]*$' -o -print
Leave out the -o to see what it is matching.
The leading [.]* lets you use find . -iregex ... without needing to change the regexp; you can drop it for the /var/www case.
You might need to adjust the [A-Za-z] if you mean classical Roman, which didn't have J, U, or W.
I still liked my erroneous first interpretation, where I thought you were looking for files which weren't named with roman numbers, which had a solution along the lines of:
find /var/www -iregex '^[.]*\(/\|/m*c?d?x?c*x?l?i*x?i?v?i*\)*$' -o -print
(same note on "[.]" as on the prior example)
If you're really looking for filenames containing non-ASCII characters:
find /var/www -iregex '^[ -~]+$' -o -print
That one doesn't need the "[.]" at the front because "." is part of the range anyway.
Or non-ISO-8859-1 (which includes a number of non-Latin characters, but no Asian, Hindi, etc.):
find /var/www -iregex '^[ -ÿ]+$' -o -print # note the dieresis over the "y"
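How bracket ranges such as [ -~] behave depends on the locale, so a byte-oriented check can be easier to reason about. The sketch below swaps in a different technique from the -iregex commands above: a -name wildcard evaluated under LC_ALL=C, which is assumed to compare raw bytes. The file names are hypothetical, and the non-ASCII one is built from explicit UTF-8 bytes.

```shell
# Scratch directory with one plain ASCII name and one name containing
# the UTF-8 bytes for "e with acute accent" (0xC3 0xA9).
tmp=$(mktemp -d)
touch "$tmp/plain.txt"
touch "$tmp/$(printf 'caf\303\251.txt')"

# In the C locale, [! -~] matches any single byte outside the printable
# ASCII range 0x20-0x7E, so only names containing such a byte are listed.
count=$(( $(LC_ALL=C find "$tmp" -name '*[! -~]*' | wc -l) ))
echo "non-ASCII names found: $count"
rm -rf "$tmp"
```

For the real case the starting point would be /var/www instead of the scratch directory.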

What's the most compact version of "match everything but these strings" in the shell or regex?

Linux: I want to list all the files in a directory and within its subdirectories, except some strings. For that, I've been using a combination of find/grep/shell globbing. For instance, I want to list all files except those in the directories
./bin
./lib
./resources
I understand this can be done as shown in this question and this other one. But neither version solves the case "everything, but this pattern" in general terms.
It seems that it is much easier to use a conditional for filtering the results, but I wonder if there is any compact and elegant way of describing this in regexp or in the shell extended globbing.
Thanks.
yourcommand | egrep -v "pattern1|pattern2|pattern3"
Use the -prune option of find.
find . -path './bin' -prune -o -path './lib' -prune -o -path './resources' -prune -o «rest of your find params»
With bash's extglob shopt setting enabled, you can exclude files with ! in your wildcard pattern. Some examples:
Everything except bin, lib, and resources
shopt -s extglob
ls -Rl !(bin|lib|resources)
Everything with an i in it, except bin and lib
ls -Rl !(bin|lib|!(*i*))
(Everything that has an i in it is the same as everything except the things that don't have i's in them.)
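The -prune approach can be tried on a scratch tree before pointing it at real data. This is a minimal sketch with hypothetical file names; the three -path tests are grouped in parentheses so that a single -prune covers all of them, which is equivalent to repeating -prune after each one.

```shell
# Scratch tree: three directories to skip, plus content that should be listed.
tmp=$(mktemp -d)
mkdir -p "$tmp/bin" "$tmp/lib" "$tmp/resources" "$tmp/src"
touch "$tmp/bin/tool" "$tmp/lib/libx.so" "$tmp/resources/icon.png" \
      "$tmp/src/main.c" "$tmp/README"

# Prune the unwanted directories; everything else that is a regular file is printed.
kept=$(cd "$tmp" && find . \( -path ./bin -o -path ./lib -o -path ./resources \) \
         -prune -o -type f -print | LC_ALL=C sort)
printf '%s\n' "$kept"
rm -rf "$tmp"
```

Unlike filtering with egrep -v afterwards, -prune stops find from descending into the excluded directories at all.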