How to grep only the matching regex? - regex

I write the names of all my non-empty text files in a directory dir into a file NotEmpty.txt with the following command:
find dir/ -not -empty -ls | grep -E "*.txt" > NotEmpty.txt
I'd like to print only the part matched by the regex, and not all the other information on the line. How can I do that?

The problem is that you are executing ls against every match, so the output contains a lot of stuff. Instead, use this find command to print the name.
Note, in fact, that you can do everything in one shot, including selecting just .txt files:
find your_dir/ -not -empty -name "*.txt" -print > NotEmpty.txt
#   -name "*.txt"   selects just the .txt files
#   -print          prints its name instead of `ls`ing it
You can also add -type f to check only regular files, although I guess that is already implied by the -not -empty test.

Use the -o parameter of grep to specify that you only want the matching portion.
Example:
$ echo foo bar baz | grep -o "foo"
foo
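Applied to the original command, grep -o can extract just the file names from the -ls output. This is only a sketch: the pattern [^ /]*\.txt$ is an assumption that the .txt name is the last field on each line.
find dir/ -not -empty -ls | grep -oE '[^ /]*\.txt$' > NotEmpty.txt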


UNIX: Finding lines

I need to write a small script which finds lines according to a regular expression (for example "^folder$") and writes out the numbers of the lines where it matches.
My idea is that I will use "find", then delete all the slashes, and then use grep with a regular expression.
I don't know why it doesn't work. Could you give some advice on how to improve it, or how I could find those lines with another approach?
In
./example
./folder/.DS_Store
./folder/file.png
Out
2: ./folder/.DS_Store
3: ./folder/file.png
IGN="^folder$"
find . -type f | sed -e "s/\// /g" | grep -n "${IGN}"
You say you want to use the ^folder$ pattern, but you want to get output like:
2: ./folder/.DS_Store
3: ./folder/file.png
These two requests contradict each other. A line like ./folder/.DS_Store cannot match pattern ^folder$ because the line doesn't start with "folder" and doesn't end with "folder".
To get the output you describe you need to change the pattern used with grep to ^\./folder/
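With that pattern, a minimal sketch against the sample listing above would be:
find . -type f | grep -n "^\./folder/"
# 2:./folder/.DS_Store
# 3:./folder/file.png
(The exact numbers depend on the order in which find lists the files.)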
You tried
IGN="^folder$"
find . -type f | sed -e "s/\// /g" | grep -n "${IGN}"
This doesn't work because ^ and $ in IGN anchor to the start and end of the whole line, not of a word, and after the sed substitution each whole path is still a single line.
You can turn each path component into a line of its own with
IGN="^folder$"
find . -type f | tr -s "/" "\n" | grep -n "${IGN}"

Finding soft links with grep

I'm writing a quick program that lists all of the soft/symbolic links in the working directory to a file which is given in argument 1. I'm aware that I need to use grep in order to do so, but in general I have difficulty figuring out how to write the regular expression. In this case, it is especially difficult due to the fact that a variable ($argv[1]) is involved.
The (poorly-written) line of code in question is as follows:
ls -l | xargs grep '-> $argv[1]'
My intention with this was to catch all of the lines that contained the -> and the specified file, such as
link1 -> file
link2 -> file
and so on. Is there any way that I can use grep to accomplish this?
What kind of language is $argv[1]? The (POSIX) Bourne Shell doesn't support arrays. Arguments to scripts and functions are referenced by $1, $2 and so on.
In order for grep to not treat the first hyphen in the pattern as an option, use -- to signal the end of options. Next, there is no parameter substitution in single quotes, only in double quotes. Putting it all together, this might work:
set somename # Sets $1 to somename
ls -l | xargs grep -- "-> $1"
If your grep doesn't understand --, try
ls -l | xargs grep ".*-> $1"
The script below finds only the soft/symbolic link files and lists each one only if the argument is found in it:
# cat sygrep.sh
#!/bin/bash
if [ $# -eq 0 ]
then
    echo "No arguments supplied"
else
    # Double quotes around $1 are required; with single quotes grep
    # would search for the literal string $1.
    for a in $(find . -type l); do grep -irl "$1" "$a"; done
fi
Output:
# ./sygrep.sh
No arguments supplied
# ./sygrep.sh root
./mytest.sh

Finding file names without a specified character

Is there a good regex to find all of the files that do not contain a certain character? I know there are lots to find lines containing matches, but I want something that will find all files that do not contain my match.
Using ls and sed to replace all filenames with no extension (i.e. not containing a .) with NoExtension:
ls | sed -e 's/^[^.]*$/NoExtension/g'
replacing filenames that have an extension with their extension:
ls | sed -e 's/^[^.]*$/NoExtension/g' -e 's/.*\.\(.*\)/\1/'
For bash, to list all files in a directory whose names contain no dot:
shopt -s extglob
ls !(*.*)
The extglob setting is required to enable the !(...) pattern, which here matches every name except those matching *.*.
You should discard all the answers that parse the output of ls (there are well-known reasons why parsing ls is unreliable). The tool find is perfect for this.
# Show files in cwd
$ ls
file file.txt
# Find the files with an extension
$ find -type f -regex '.*/.*\..*$'
./file.txt
# Invert the match using the -not option
$ find -type f -not -regex '.*/.*\..*$'
./file
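If the character you want to exclude is something other than a dot, the same -not idea works with a plain -name glob; 'x' below is just a hypothetical example character:
# Files whose names do not contain the character 'x'
find . -type f -not -name '*x*'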
And an awk solution, for good measure (this one prints a count of the files without an extension rather than listing them):
ls | awk '$0 !~ /\..+$/{a++}END{print a}'
This might work for you (find, GNU sed & wc):
find . -type f | sed -rn '\|.*/\.?[^.]+$|w NoExtensions' && wc -l NoExtensions
This gives you a count and a list.
N.B. dot files without extensions are included.

BASH - find specific folder with find and filter with regex

I have a folder containing many folders with subfolders (/...), with the following structure:
_30_photos/combined
_30_photos/singles
_47_foo.bar
_47_foo.bar/combined
_47_foo.bar/singles
_50_foobar
With the command find . -type d -print | grep '_[0-9]*_' all folders matching that pattern are shown. But I need a regex which captures only the */combined folders.
I tried _[0-9]*_[a-z.]+/combined, but when I insert that into the find command, nothing is printed.
The next step would be to create, for each combined folder, a new folder somewhere on my hdd and copy the content of the combined folder into it. The new folder's name should be the same as the parent of the combined subfolder, e.g. _47_foo.bar. Could that be achieved with an xargs command after the search?
You do not need grep:
find . -type d -regex ".*_[0-9]*_.*/combined"
For the rest:
find . -type d -regex "^\./.*_[0-9]*_.*/combined" | \
sed 's!\./\(.*\)/combined$!& /somewhere/\1!' | \
xargs -n2 cp -r
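For illustration, walking one match through that pipeline (the /somewhere prefix comes from the sed command above):
# For the directory ./_47_foo.bar/combined the sed step prints
#   ./_47_foo.bar/combined /somewhere/_47_foo.bar
# and xargs -n2 then runs
#   cp -r ./_47_foo.bar/combined /somewhere/_47_foo.bar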
With basic grep you will need to escape the +:
... | grep '_[0-9]*_[a-z.]\+/combined'
Or you can use the "extended regexp" version (egrep or grep -E [thanks chepner]) in which the + does not have to be escaped.
xargs may not be the most flexible way of doing the copying you describe above, as it is tricky to use with multiple commands. You may find more flexibility with a while loop:
... | grep '_[0-9]*_[a-z.]\+/combined' | while read combined_dir; do
    mkdir some_new_dir
    cp -r "${combined_dir}" some_new_dir/
done
Have a look at bash string manipulation if you want a way to automate the name of some_new_dir.
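A sketch of that string manipulation, assuming the new directories should live under a hypothetical /somewhere and be named after the parent of each combined folder:
... | grep '_[0-9]*_[a-z.]\+/combined' | while read combined_dir; do
    parent=$(dirname "$combined_dir")      # e.g. ./_47_foo.bar
    target="/somewhere/${parent#./}"       # strip the leading ./
    mkdir -p "$target"
    cp -r "$combined_dir"/. "$target"/     # copy the contents of combined
done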
target_dir="your target dir"
find . -type d -regex ".*_[0-9]+_.*/combined" | \
(while read s; do
    n=$(dirname "$s")
    cp -pr "$s" "$target_dir/${n#./}"
done
)
NOTE:
this fails if you have linebreaks "\n" in your directory names
this uses a subshell to not clutter your env - inside a script you don't need that
changed the regex slightly: [0-9]* to [0-9]+
You can use this command:
find . -type d | grep -P "_[0-9]*_[a-z.]+/combined"

Using grep to search files provided by find: what is wrong with find . | xargs grep '...'?

When I use the command:
find . | xargs grep '...'
I get the wrong matches. I'm trying to search for the string ... in all files in the current folder.
As Andy White said, you have to use fgrep in order to match a literal ., or escape the dots.
So you have to write the following (-type f restricts the search to regular files: you obviously don't want the directories):
find . -type f | xargs fgrep '...'
or, if you still want to use grep:
find . -type f | xargs grep '\.\.\.'
And if you only want the current directory and not its subdirectories:
find . -maxdepth 1 -type f | xargs fgrep '...'
'.' matches any character, so you'll be finding all lines that contain 3 or more characters.
You can either escape the dots, like this:
find . | xargs grep '\.\.\.'
Or you can use fgrep, which does a literal match instead of a regex match:
find . | xargs fgrep '...'
(Some versions of grep also accept a -F flag which makes them behave like fgrep.)
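For example, with a grep that supports -F:
find . | xargs grep -F '...'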
OP, if you are looking for files that contain the literal string ...,
grep -R "\.\.\." *
If you're looking for a filename that matches, try:
find . -name "filename pattern"
or
find . | grep "filename pattern"
If you're looking for files whose contents match (i.e. the file contains the grep string),
find . | xargs grep "string pattern"
works fine. Or simply:
grep "string pattern" -R *
If you are literally typing grep '...' you'll match just about any string. I doubt you're actually typing '...' for your grep command, but if you are, the ... will match any three characters.
Please post more info on what you're searching for, and maybe someone can help you out more.
To complete Jeremy's answer, you may also want to try
find . -type f | xargs grep 'your_pattern'
or
find . -type f -exec grep 'your_pattern' {} +
which is similar to using xargs.
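If any file names contain spaces or other special characters, a safer variant (assuming GNU find and xargs, which support the null-separated options) is:
find . -type f -print0 | xargs -0 grep 'your_pattern'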
I might add: RTFM! Or, to put it more politely: use and abuse the man command!