Putty command to find files that are not named in Roman letters - regex

I need a command, run through PuTTY or similar, that finds every file on my server whose name isn't written in the Roman alphabet. The output should give me the path of each file that matches.
My website's server uses Ubuntu Linux 12.04.1.
And I want to search in this path (/var/www/) and all sub-folders of it.

This should work to find files and directories in your /var/www which contain characters other than upper/lowercase A to Z:
find /var/www -iregex '^[.]*[/A-Za-z]*$' -o -print
Leave out the -o to see what it is matching.
The leading [.]* lets you use find . -iregex... without needing to change the regex; you can drop it for the /var/www case.
You might need to adjust the [A-Za-z] if you mean classical Roman, which didn't have J, U, or W.
I still liked my erroneous first interpretation, where I thought you were looking for files which weren't named with Roman numerals, which had a solution along the lines of:
find /var/www -iregex '^[.]*\(/\|/m*c?d?x?c*x?l?i*x?i?v?i*\)*$' -o -print
(same note on "[.]" as on the prior example)
If you're really looking for filenames containing non-ASCII characters:
find /var/www -iregex '^[ -~]+$' -o -print
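That one doesn't need the "[.]" at the front because "." is part of the range anyway. Character ranges depend on the locale, so prefix the command with LC_ALL=C if you want the range to mean exactly the bytes 0x20 to 0x7E.
Or non-ISO-8859-1 (ISO-8859-1 adds a number of accented Western European letters, but no Asian, Hindi, etc. scripts):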
find /var/www -iregex '^[ -ÿ]+$' -o -print # note the dieresis over the "y"
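As a quick check of the non-ASCII variant, here is a minimal sketch that builds a throwaway directory and runs the same kind of regex against it. The names demo_dir, plain.txt and the UTF-8 "café.txt" are made up for illustration, and forcing LC_ALL=C is my assumption so that [ -~] means exactly the printable ASCII bytes.

```shell
# Scratch tree with one ASCII-only name and one UTF-8 name (hypothetical names).
demo_dir=$(mktemp -d)
touch "$demo_dir/plain.txt"
touch "$demo_dir/caf$(printf '\303\251').txt"   # "café.txt" written as UTF-8 bytes

# LC_ALL=C makes find treat each byte as one character.
# "*" is used instead of "+" so the pattern is also valid as a basic regex.
non_ascii=$(LC_ALL=C find "$demo_dir" -iregex '^[ -~]*$' -o -print)

printf '%s\n' "$non_ascii"    # only the path containing non-ASCII bytes
rm -r "$demo_dir"
```

Only the second file should be listed; the directory itself and plain.txt match the regex and are therefore skipped by the -o -print branch.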

Related

How to find multiple files with different endings in Linux using regex?

Let's say that I have multiple files such as:
root.file991
root.file81
root.file77
root.file989
If I want to delete all of them, I would need to use a regex first, so I have tried:
find ./ -regex '\.\/root'
...which would find everything in root file, but how do I filter all these specific files?
You can use
find ./ -regextype posix-extended -regex '\./root\.file[0-9]+'
The regex breaks down as follows:
\. - a dot
/root\.file - a /root.file text
[0-9]+ - ending with one or more digits.
I'm not quite sure what you mean by "files in root file" but if I understand correctly regular POSIX glob(7) pattern matching should be sufficient:
rm root.file[0-9]*
Depending on how complex the other files are, you may have to build up the regex more. $ man find has useful help as well. Try the following:
$ find ./ -regex '\.\/root.file[0-9].*'
# if that works to find what you are looking for, add the -delete
$ find ./ -regex '\.\/root.file[0-9].*' -delete
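To try the preview-then-delete approach safely, here is a small sketch that works in a temporary directory rather than on real data. The extra files root.filex and keep.txt are hypothetical, added only to show what the pattern leaves alone, and the regex is written as [0-9][0-9]* so it does not depend on -regextype.

```shell
work=$(mktemp -d)
touch "$work/root.file991" "$work/root.file81" "$work/root.file77" \
      "$work/root.file989" "$work/root.filex" "$work/keep.txt"

# Step 1: preview. -regex must match the whole path, hence the leading ".*/".
matched=$(find "$work" -type f -regex '.*/root\.file[0-9][0-9]*' | sort)
printf '%s\n' "$matched"

# Step 2: same expression, now with -delete appended.
find "$work" -type f -regex '.*/root\.file[0-9][0-9]*' -delete

remaining=$(ls "$work" | xargs)     # join the surviving names on one line
printf 'left behind: %s\n' "$remaining"
rm -r "$work"
```

Running the expression once without -delete first is the design choice worth keeping: the delete step then reuses exactly the pattern you have already inspected.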

How to use mksquashfs -regex?

I want to mksquashfs a chroot, and include the /cdrom dir, but exclude everything inside it. I already know how to do this with -wildcards, but I want to see if -regex has a bug. Test case:
cd $(mktemp -d)
mkdir -p cdrom cdrom2/why
mksquashfs . /tmp/chroot.squashfs -info -noappend -regex -e '^cdrom/.*$'
The problem is that cdrom2/why was omitted! It seems to me like "/" is actually ignored there. Is this a mksquashfs bug?
This is because you don't fully understand how regexes work in Mksquashfs exclude files.
When wildcards are used, an exclude entry is basically treated as a series of wildcard patterns separated by slashes (/), i.e. wildcard1/wildcard2/wildcard3 will match wildcard1 against the top level directory, wildcard2 against the subdirectory, and so on.
Specifying -regex simply replaces wildcard matching with regex matching. It is still evaluated as regexes separated by slashes (/), i.e. regex1/regex2/regex3.
In your example the regex "^cdrom" is evaluated against the files in the top level directory, and matches both "cdrom" and "cdrom2".
If you wanted the regex to only match "cdrom" you should use
mksquashfs . /tmp/chroot.squashfs -info -noappend -regex -e '^cdrom$/.*'
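The per-component behaviour described above can be imitated in a few lines of shell. This is only a toy model of the explanation, not mksquashfs code: the function name component_match is made up, and it assumes each component is tested as an unanchored extended regex via grep -E.

```shell
# Toy model: compare pattern component N with path component N.
# Returns 0 ("would be excluded") when every pattern component matched.
component_match() {
    pattern=$1
    path=$2
    while [ -n "$pattern" ] && [ -n "$path" ]; do
        pat_head=${pattern%%/*}
        path_head=${path%%/*}
        printf '%s\n' "$path_head" | grep -Eq -- "$pat_head" || return 1
        # Drop the component that was just compared.
        case $pattern in */*) pattern=${pattern#*/} ;; *) pattern= ;; esac
        case $path in */*) path=${path#*/} ;; *) path= ;; esac
    done
    [ -z "$pattern" ]
}

if component_match '^cdrom/.*$' 'cdrom2/why'; then r1=excluded; else r1=kept; fi
if component_match '^cdrom$/.*' 'cdrom2/why'; then r2=excluded; else r2=kept; fi
if component_match '^cdrom$/.*' 'cdrom/file'; then r3=excluded; else r3=kept; fi
printf '%s %s %s\n' "$r1" "$r2" "$r3"
```

In this model the first pattern excludes cdrom2/why because "^cdrom" on its own also matches "cdrom2", while anchoring the first component with "$" keeps cdrom2 and still excludes the contents of cdrom.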

How to Recursively Remove Files of a Certain Type

I misread the gzip documentation, and now I have to remove a ton of ".gz" files from many directories inside one another. I tried using 'find' to locate all .gz files. However, whenever there's a file with a space in the name, rm interprets that as another file. And whenever there's a dash, rm interprets that as a new flag. I decided to use 'sed' to replace the spaces with "\ " and the space-dashes with "\ -", and here's what I came up with.
find . -type f -name '*.gz' | sed -r 's/\ /\\ /g' | sed -r 's/\ -/ \\-/g'
When I run the find/sed query on a file that, for example, has a name of "Test - File - for - show.gz", I get the output
./Test\ \-\ File\ \-\ for\ \-\ show.gz
Which appears to be acceptable for rm, but when I run
rm $(find . -type f -name '*.gz'...)
I get
rm: cannot remove './Test\\': No such file or directory
rm: cannot remove '\\-\\': No such file or directory
rm: cannot remove 'File\\': No such file or directory
rm: cannot remove '\\-\\': No such file or directory
...
I haven't made extensive use of sed, so I have to assume I'm doing something wrong with the regular expressions. If you know what I'm doing wrong, or if you have a better solution, please tell me.
Adding backslashes before spaces protects the spaces against expansion in shell source code. But the output of a command in a command substitution does not undergo shell parsing, it only undergoes wildcard expansion and field splitting. Adding backslashes before spaces doesn't protect them against field splitting.
Adding backslashes before dashes is completely useless since it's rm that interprets dashes as special, and it doesn't interpret backslashes as special.
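The field-splitting point is easy to see without touching any files. In this sketch, count_args and escaped_output are hypothetical helpers: the first reports how many arguments it received, the second prints one backslash-escaped name like the sed pipeline does. It assumes the default IFS.

```shell
count_args() { printf '%s\n' "$#"; }
escaped_output() { printf '%s\n' './Test\ -\ File.gz'; }

# Unquoted substitution: the text is split on spaces, backslashes or not.
unquoted_count=$(count_args $(escaped_output))

# Quoted substitution: the whole output stays one argument
# (but it still contains literal backslashes, so it is not the real name).
quoted_count=$(count_args "$(escaped_output)")

printf 'unquoted: %s, quoted: %s\n' "$unquoted_count" "$quoted_count"
```

The unquoted form hands rm three separate words, which is the same kind of breakage as in the error messages in the question.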
The output of find is ambiguous in general — file names can contain newlines, so you can't use a newline as a file name separator. Parsing the output of find is usually broken unless you're dealing with file names in a known, restricted character set, and it's often not the simplest method anyway.
find has a built-in way to execute external programs: the -exec action. There's no parsing going on, so this isn't subject to any problem with special characters in file names. (A path beginning with - could still be interpreted as an option, but all paths begin with . since that's the directory being traversed.)
find . -type f -name '*.gz' -exec rm {} +
Many find implementations (Linux, Cygwin, BSD) can delete files without invoking an external utility:
find . -type f -name '*.gz' -delete
See Why does my shell script choke on whitespace or other special characters? for more information on writing robust shell scripts.
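Here is a minimal sketch that exercises the -exec form on deliberately awkward names in a scratch directory. The file names are invented for the test; the important part is that no output of find is ever parsed by the shell.

```shell
scratch=$(mktemp -d)
nl='
'
touch "$scratch/Test - File - for - show.gz" \
      "$scratch/-dash.gz" \
      "$scratch/two${nl}lines.gz" \
      "$scratch/keep.txt"

# find passes each path to rm as a single argument, spaces and newlines included.
find "$scratch" -type f -name '*.gz' -exec rm {} +

left=$(ls "$scratch")
printf 'left: %s\n' "$left"
rm -r "$scratch"
```

All three .gz files are removed in one pass, and keep.txt is the only file left.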
There is no need to pipe to sed, etc. Instead, you can make use of the -exec flag on find, that allows you to execute a command on each one of the results of the command.
For example, for your case this would work:
find . -type f -name '*.gz' -exec rm {} \;
which is approximately the same as:
find . -type f -name '*.gz' -exec rm {} +
The last form does not start a separate rm process for each result; it passes as many file names as fit to each rm invocation, which makes it faster.
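The difference between the two terminators can be made visible by swapping rm for echo, so that each invocation prints one line. This is just an illustrative sketch with made-up file names in a temporary directory.

```shell
batch_dir=$(mktemp -d)
touch "$batch_dir/a.gz" "$batch_dir/b.gz" "$batch_dir/c.gz"

# "\;" runs the command once per file: three echo calls, three lines.
per_file=$(find "$batch_dir" -type f -name '*.gz' -exec echo {} \; | wc -l | tr -d ' ')

# "+" appends all names to one command line: one echo call, one line.
batched=$(find "$batch_dir" -type f -name '*.gz' -exec echo {} + | wc -l | tr -d ' ')

printf 'per-file runs: %s, batched runs: %s\n' "$per_file" "$batched"
rm -r "$batch_dir"
```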
From man find:
-exec command ;
Execute command; true if 0 status is returned. All following
arguments to find are taken to be arguments to the command until an
argument consisting of `;' is encountered. The string `{}' is
replaced by the current file name being processed everywhere it occurs
in the arguments to the command, not just in arguments where it is
alone, as in some versions of find. Both of these constructions
might need to be escaped (with a `\') or quoted to protect them from
expansion by the shell. See the EXAMPLES section for examples of the
use of the -exec option. The specified command is run once for
each matched file. The command is executed in the starting directory.
There are unavoidable security problems surrounding use of the -exec
action; you should use the -execdir option instead.

Unix - Using find to List all .html files. (Do not use shell wildcards or the ls command)

I've tried 'find -name .html$', 'find -name .html\>'.
None worked.
I'd like to know why these two are wrong and what's the right one to use with no wildcards?
What you needed was
find -name '*.html'
Or for regex:
find -regex '.*/.*\.html'
To ignore case, use -iname or -iregex:
find -iname '*.html'
find -iregex '.*/.*\.html'
Manual for -name:
-name pattern
Base of file name (the path with the leading directories
removed) matches shell pattern pattern. The metacharacters
(`*', `?', and `[]') match a `.' at the start of the base name
(this is a change in findutils-4.2.2; see section STANDARDS CON‐
FORMANCE below). To ignore a directory and the files under it,
use -prune; see an example in the description of -path. Braces
are not recognised as being special, despite the fact that some
shells including Bash imbue braces with a special meaning in
shell patterns. The filename matching is performed with the use
of the fnmatch(3) library function. Don't forget to enclose
the pattern in quotes in order to protect it from expansion by
the shell.
find . -name '*.html'
You have to single quote the wildcard to keep the shell from globbing it when passing it to find.
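To see what the shell does to an unquoted pattern, consider this sketch in a scratch directory with exactly one .html file at the top level and one in a subdirectory (both names are hypothetical). The unquoted line is intentionally wrong.

```shell
site=$(mktemp -d)
mkdir "$site/sub"
touch "$site/index.html" "$site/sub/page.html"

# Unquoted: the shell expands *.html to index.html before find starts,
# so find only ever looks for that one name.
unquoted=$(cd "$site" && find . -name *.html | sort)

# Quoted: find receives the pattern itself and applies it at every level.
quoted=$(cd "$site" && find . -name '*.html' | sort)

printf 'unquoted:\n%s\nquoted:\n%s\n' "$unquoted" "$quoted"
rm -r "$site"
```

With two or more matching files in the current directory the unquoted form usually fails outright, because find then sees extra path-like arguments after -name.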
You want
find . -name "*.html"
find uses Emacs-style regular expressions by default, not the POSIX syntax you are probably used to.
You are missing a couple of things here. First, the path: to search the current directory, use . (a single dot). For example, find . will list every file and directory recursively under the current directory. Second, * is a wildcard, and it has to be quoted so that the shell hands it to find unchanged. So to find all the .html files under the current directory, try
find . -name '*.html'

How to exclude a directory in a recursive search using grep?

How can I do a recursive search using grep while excluding a particular directory?
Background: I have a large directory of log files that I would like to exclude from the search. The easiest way would be to move the log folder. Unfortunately I cannot do that, as the project mandates the location.
Any idea how to do it?
Are you looking for this?
from grep man page:
--exclude-dir=DIR
Exclude directories matching the pattern DIR from recursive searches.
As an alternate, if you can use find in your search, it may also be useful:
find [directory] -name "*.log" -prune -o -type f -print|grep ...
The [directory] can actually be the current directory if you want (just a . will do).
The next part, -name "*.log" -prune, belongs together. It matches names ending in .log and prunes them, which leaves them OUT of your results and, for a directory, skips everything beneath it.
Next is -o (for "or")
Then, -type f -print which says "print (to stdout) any type that is a file."
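Those results should include every file (no directories are returned) found in [directory] except those that end in .log. Then you can grep the results as you need. Note that piping the list into grep filters the file names themselves; to search inside the files, hand the names to grep instead, for example with -exec grep ... {} + in place of -print.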
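Both approaches can be compared side by side on a small scratch tree. In this sketch the directory name logs, the file names and the search word needle are all made up; the find variant prunes a directory by name and searches file contents with -exec rather than piping names into grep.

```shell
proj=$(mktemp -d)
mkdir -p "$proj/src" "$proj/logs"
echo 'needle here'   > "$proj/src/app.txt"
echo 'needle in log' > "$proj/logs/run.log"

# grep alone: recurse, but skip any directory named "logs".
grep_hits=$(grep -rl --exclude-dir=logs needle "$proj")

# find: prune the "logs" directory, grep the contents of every other file.
find_hits=$(find "$proj" -type d -name logs -prune -o -type f -exec grep -l needle {} +)

printf 'grep: %s\nfind: %s\n' "$grep_hits" "$find_hits"
rm -r "$proj"
```

Both commands should report only src/app.txt; the log file contains the word too, but its directory is never entered.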