How can I use the files found by one ag search as the domain for a second ag search? - ag

Suppose I do ag -l foo. That gets me a list of files.
How can I use ag a second time to search within just those files?

Assuming you're in the bash shell, you can do this with command substitution:
ag whatever $(ag -l foo)
So to find all the files that match both cat and dog:
ag cat $(ag -l dog)
You could also use xargs:
ag -l dog | xargs ag cat
If you used ack, another grep-like tool, you could use the -x option to read the list of input files from stdin:
ack -l dog | ack -x cat
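If neither ag nor ack is at hand, the same two-pass idea works with plain grep. A minimal sketch, with invented file names and contents:

```shell
# Build a scratch directory with sample files (invented for the demo).
dir=$(mktemp -d)
printf 'cat\ndog\n' > "$dir/both.txt"
printf 'dog only\n' > "$dir/dogonly.txt"
printf 'cat only\n' > "$dir/catonly.txt"
cd "$dir"

# First pass: list files containing "dog".
# Second pass: search only those files for "cat".
grep -l dog ./*.txt | xargs grep -l cat
# → ./both.txt
```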

Related

Using sed to trim the beginning of stdout

I'm writing a small script to list all the directories being shared on a macOS system. macOS has a simple tool called sharing; running sharing -l | grep path lists all the paths. The problem is the output looks like this:
path: /Volumes/Storage A/File Server/
and I need it to look like this instead
/Volumes/Storage\ A/File\ Server/
So the white spaces need to be escaped, and the leading path: together with the whitespace after it needs to be trimmed. I've been messing about with sed for hours now, but I just don't know enough about it to do this all in one command. I'm hoping to append something to the end of sharing -l | grep path.
You may use this:
sharing -l | sed -En '/^path:/{ s/^path:[[:blank:]]*//; s/[[:blank:]]+/\\&/g; p;}'
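Since sharing exists only on macOS, the pipeline can be tried out by feeding it a sample line instead (the path is the one from the question):

```shell
# Simulate one line of `sharing -l` output and run it through the sed command.
printf 'path:\t/Volumes/Storage A/File Server/\n' |
  sed -En '/^path:/{ s/^path:[[:blank:]]*//; s/[[:blank:]]+/\\&/g; p;}'
# → /Volumes/Storage\ A/File\ Server/
```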
Could you please try the following (note: it escapes field by field, so it assumes the path contains exactly two spaces, as in the sample):
sharing -l | awk '{$2=$2"\\";$3=$3"\\";sub(/^path: +/,"")} 1'
If you don't need the white spaces escaped:
$ sharing -l | sed -n 's/^path:[[:space:]]*//p'
/Volumes/Storage A/File Server/
and if you do:
$ sharing -l | awk 'sub(/^path:[[:space:]]*/,""){gsub(/[[:space:]]/,"\\\\&"); print}'
/Volumes/Storage\ A/File\ Server/

Extract filename up to first dash

We've got thousands of files saved in one directory. The common pattern there is date. For example:
foo-2013-09-01.gz
bar-2013-09-01.gz
fu-2013-09-02.gz
ba-2013-09-02.gz
cat-2013-09-01.gz
dog-2013-09-02.gz
dog-2013-09-03.gz
How could we then get the list of unique file names just before the first dash? E.g.
foo
bar
fu
ba
cat
dog
We're not bothered with the path names, just the first part (the type in a type-date.filext naming scheme). We intend to use the final result in a for-loop, which will create a subdirectory for each type holding all of that type's files by date.
One way would be to say:
ls -1 | sed 's/-.*//g' | sort -u
To avoid parsing ls output, you could say:
find . -mindepth 1 -maxdepth 1 -type f -printf "%P\n" | sed 's/-.*//g' | sort -u
Pure BASH way:
s='foo-2013-09-01.gz'
echo "${s%%-*}"
foo
Assuming you have the list of files:
... | awk -F'-' '!x[$0=$1]++' | xargs mkdir
Use sed 's/-.*//':
falsetru@ubuntu:/tmp/t$ ls
ba-2013-09-02.gz cat-2013-09-01.gz dog-2013-09-03.gz fu-2013-09-02.gz
bar-2013-09-01.gz dog-2013-09-02.gz foo-2013-09-01.gz
falsetru@ubuntu:/tmp/t$ ls | sed 's/-.*//'
ba
bar
cat
dog
dog
foo
fu
This might work for you (GNU sed):
sed -r 's/-.*//;G;/^([^\n]+)\n.*\<\1\>/d;h;P;d' file
Truncate the file name, then use the hold space to check for unique keys.
If the key already exists, delete that line; otherwise add it to the hold space and print the unique key.
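Tying it back to the question's stated goal (a subdirectory per type, with that type's files moved in), a minimal sketch using some of the sample names from the question:

```shell
# Scratch directory with a few of the question's sample files.
dir=$(mktemp -d); cd "$dir"
touch foo-2013-09-01.gz bar-2013-09-01.gz dog-2013-09-02.gz dog-2013-09-03.gz

# One directory per unique prefix; move each type's files into it.
for type in $(ls | sed 's/-.*//' | sort -u); do
  mkdir -p "$type"
  mv "$type"-*.gz "$type"/
done
```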

How to find all files in a Directory with grep and regex?

I have a directory (Linux/Unix) on an Apache server with a lot of subdirectories containing lots of files, like this:
- Dir
- 2010_01/
- 142_78596_101_322.pdf
- 12_10.pdf
- ...
- 2010_02/
- ...
How can I find all files with filenames matching *_*_*_*.pdf, where each * is always a run of digits?
I tried to solve it like this:
ls -1Rl 2010-01 | grep -i '\(\d)+[_](\d)+[_](\d)+[_](\d)+[.](pdf)$' | wc -l
But the regular expression \(\d)+[_](\d)+[_](\d)+[_](\d)+[.](pdf)$ doesn't work with grep, since \d is a Perl extension that grep doesn't understand.
Edit 1: Trying ls -l 2010-03 | grep -E '(\d+_){3}\d+\.pdf' | wc -l, for example, just returns 0, so that doesn't work either.
Try using find.
A command that satisfies your specification *_*_*_*.pdf, where each * is always a run of digits (GNU find; note that -regex matches the whole path, and \d must be written [0-9] here):
find 2010_10/ -regextype posix-extended -regex '.*/[0-9]+_[0-9]+_[0-9]+_[0-9]+\.pdf'
You seem to be wanting a sequence of 4 numbers separated by underscores, however, based on the regex that you tried (a Perl-style pattern, so it needs grep -P):
(\d+_){3}\d+\.pdf
Or do you want to match all names containing only numbers and underscores?
[\d_]+\.pdf
First, you should be using egrep, or call grep with -E, for extended patterns. Note that \d and the non-capturing group (?:...) are Perl extensions which POSIX extended patterns don't support, so write [0-9] instead.
So this works for me:
$ cat test2.txt
- Dir
- 2010_01/
- 142_78596_101_322.pdf
- 12_10.pdf
- ...
- 2010_02/
- ...
Now egrep that file:
cat test2.txt | egrep '([0-9]+_){3}[0-9]+\.pdf$'
- 142_78596_101_322.pdf
grep prints every line containing a match, so the whole file-name line shows up in the output.
Note that the pattern does NOT work with grep in traditional (basic) mode, where +, ( ), and { } would all need backslashes:
$ cat test2.txt | grep '([0-9]+_){3}[0-9]+\.pdf$'
... no return
But DOES work if you use the extended pattern switch (the same as calling egrep):
$ cat test2.txt | grep -E '([0-9]+_){3}[0-9]+\.pdf$'
- 142_78596_101_322.pdf
Thanks to gbchaosmaster and the wolf, I found a way that works for me:
Within a directory:
find . | grep -P "(\d+_){3}\d+\.pdf" | wc -l
From the root directory:
find 20*/ | grep -P "(\d+_){3}\d+\.pdf" | wc -l
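As a self-contained check of that count, here is a scratch tree with the sample names from the question (9_8_7_6.pdf is invented as a second match):

```shell
# Rebuild a miniature version of the directory layout.
dir=$(mktemp -d); cd "$dir"
mkdir 2010_01 2010_02
touch 2010_01/142_78596_101_322.pdf 2010_01/12_10.pdf 2010_02/9_8_7_6.pdf

# Count paths whose file name is four underscore-separated digit runs.
find . | grep -Pc '(\d+_){3}\d+\.pdf'
# → 2
```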

listing files in the directory using grep

List all the files in /usr/bin whose filenames contain lowercase English alphabet
characters only and also contain the word file as a (contiguous) substring.
For example, file and profiles are such files, but git-ls-files is not.
This is the exact question I have, and I can only use grep, ls, cat and wc for it.
ls /usr/bin/ | grep '[^-]*file'
This is what I got so far, and the output is below. I don't know how to display, for example, just file, since * means zero or more occurrences. And I have no idea how to express the lowercase-only requirement in the regex either.
check-binary-files
clean-binary-files
desktop-file-install
desktop-file-validate
ecryptfs-rewrite-file
file
filep
git-cat-file
git-diff-files
git-ls-files
git-merge-file
git-merge-one-file
git-unpack-file
lockfile
nsrfile
pamfile
pcprofiledump
pnmfile
ppufiles
profiles
ls /usr/bin/ | grep --regex '^[[:lower:]]*file[[:lower:]]*$'
The ^ and $ match the beginning and end of the string, respectively.
Using ls piped with grep is really redundant in that situation. You can use find:
$> find /usr/bin -regex "/usr/bin/[a-z]*file[a-z]*" -type f -printf "%f\n"
profiles
keditfiletype
inifile
dotlockfile
pamfile
pnmfile
file
konsoleprofile
ls -1 /usr/bin | grep '^[a-z]*file[a-z]*$'
ls -1 makes sure the files are listed one per line. ^ and $ are anchors for the start and end of the line, which is what you were missing (otherwise the pattern can match just a substring of the filename).
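The anchored pattern can be sanity-checked against a few names from the question's own output, without reading /usr/bin:

```shell
# Feed sample names to the anchored pattern; only all-lowercase
# names containing "file" should survive.
printf '%s\n' file profiles git-ls-files lockfile desktop-file-install |
  grep '^[a-z]*file[a-z]*$'
# → file, profiles, lockfile (one per line)
```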
Use the -P option, meaning Perl regular expressions; your command would then look like:
ls /usr/bin | grep -P '[a-z]file'
(Note this pattern requires a lowercase letter immediately before file, so the bare name file itself won't match, as the demo below shows.)
insider@gfl ~/test/bin$ ls
123file desktop-file-install file git-diff-files git-merge-one-file nsrfile pnmfile testlist
check-binary-files desktop-file-validate filep git-ls-files git-unpack-file pamfile ppufiles
clean-binary-files ecryptfs-rewrite-file git-cat-file git-merge-file lockfile pcprofiledump profiles
insider@gfl ~/test/bin$ ls . | grep -P '[a-z]file'
lockfile
nsrfile
pamfile
pcprofiledump
pnmfile
ppufiles
profiles

How can I exclude one word with grep?

I need something like:
grep ^"unwanted_word"XXXXXXXX
You can do it using -v (for --invert-match) option of grep as:
grep -v "unwanted_word" file | grep XXXXXXXX
grep -v "unwanted_word" file will filter the lines that have the unwanted_word and grep XXXXXXXX will list only lines with pattern XXXXXXXX.
EDIT:
From your comment it looks like you want to list all lines without the unwanted_word. In that case all you need is:
grep -v 'unwanted_word' file
I understood the question as "How do I match one word but exclude another", for which one solution is two greps in series: the first grep finds the wanted "word1", the second grep excludes "word2":
grep "word1" | grep -v "word2"
In my case: I need to differentiate between "plot" and "#plot", which grep's word option (-w) won't do ("#" not being alphanumeric).
If your grep supports Perl regular expressions with the -P option, you can do this (in bash; in tcsh you'll need to escape the !):
grep -P '(?!.*unwanted_word)keyword' file
Demo:
$ cat file
foo1
foo2
foo3
foo4
bar
baz
Let us now list all foo except foo3
$ grep -P '(?!.*foo3)foo' file
foo1
foo2
foo4
$
The right solution is to use grep -v "word" file, with its awk equivalent:
awk '!/word/' file
However, if you happen to have a more complex situation in which you want, say, XXX to appear and YYY not to appear, then awk comes handy instead of piping several greps:
awk '/XXX/ && !/YYY/' file
#     ^^^^^    ^^^^^^
#  I want it   I don't want it
You can even say something more complex. For example: I want those lines containing either XXX or YYY, but not ZZZ:
awk '(/XXX/ || /YYY/) && !/ZZZ/' file
etc.
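A quick demonstration of the combined conditions on invented sample lines:

```shell
# Only lines containing XXX but not YYY pass both conditions.
printf '%s\n' 'has XXX' 'has XXX and YYY' 'has YYY' 'neither' |
  awk '/XXX/ && !/YYY/'
# → has XXX
```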
Invert match using grep -v:
grep -v "unwanted_word" file
grep provides the -v (or --invert-match) option to select non-matching lines.
e.g.
grep -v 'unwanted_pattern' file_name
This will output all the lines from file file_name which do not contain 'unwanted_pattern'.
If you are searching for the pattern in multiple files inside a folder, you can use the recursive search option as follows:
grep -r 'wanted_pattern' * | grep -v 'unwanted_pattern'
Here grep will list all the occurrences of 'wanted_pattern' in all the files within the current directory and pass them to the second grep, which filters out the lines containing 'unwanted_pattern'.
The pipe '|' tells the shell to connect the standard output of the left program (grep -r 'wanted_pattern' *) to the standard input of the right program (grep -v 'unwanted_pattern').
The -v option will show you all the lines that don't match the pattern.
grep -v ^unwanted_word
I excluded the root ("/") mount point by using grep -vw "^/".
# cat /tmp/topfsfind.txt| head -4 |awk '{print $NF}'
/
/root/.m2
/root
/var
# cat /tmp/topfsfind.txt| head -4 |awk '{print $NF}' | grep -vw "^/"
/root/.m2
/root
/var
I've a directory with a bunch of files. I wanted to find all the files that DO NOT contain the string "speedup", so I successfully used the following command:
grep -iL speedup *
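A minimal sketch of how -iL behaves (case-insensitive match, list files WITHOUT a match), using invented files:

```shell
# Scratch directory: one file mentions speedup, one doesn't.
dir=$(mktemp -d); cd "$dir"
printf 'Speedup achieved\n' > a.txt
printf 'nothing relevant\n' > b.txt

grep -iL speedup ./*.txt
# → ./b.txt
```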