UNIX: Finding lines - regex

I need to write a small script that finds lines matching a regular expression (for example "^folder$") and prints the line numbers of the matches.
My idea is to use "find", then replace every slash with a space, and then use grep with the regular expression.
I don't know why it doesn't work. Could you give me some advice on how to improve it, or on how to find those lines another way?
In
./example
./folder/.DS_Store
./folder/file.png
Out
2: ./folder/.DS_Store
3: ./folder/file.png
IGN="^folder$"
find . -type f | sed -e "s/\// /g" | grep -n "${IGN}"

You say you want to use ^folder$ pattern but you want to get output like:
2: ./folder/.DS_Store
3: ./folder/file.png
These two requests contradict each other. A line like ./folder/.DS_Store cannot match pattern ^folder$ because the line doesn't start with "folder" and doesn't end with "folder".
To get the output you describe you need to change the pattern used with grep to ^\./folder/
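A quick way to check the suggested pattern is to run it against the sample paths from the question. This is only a minimal sketch: printf stands in for a real find, and the variable names are illustrative.

```shell
# Sample paths copied from the question (printf stands in for `find . -type f`).
paths=$(printf '%s\n' ./example ./folder/.DS_Store ./folder/file.png)

# -n prefixes each matching line with its line number in the input.
matches=$(printf '%s\n' "$paths" | grep -n '^\./folder/')
printf '%s\n' "$matches"
```

Note that grep -n prints the number and the line separated by a colon only, with no space after it.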

You tried
IGN="^folder$"
find . -type f | sed -e "s/\// /g" | grep -n "${IGN}"
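This script doesn't work because ^ and $ in IGN anchor to the start and end of the whole line, not of a word, and after the sed step each path is still a single line (for example ". folder file.png").
You can put each path component on its own line with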
IGN="^folder$"
find . -type f | tr -s "/" "\n" | grep -n "${IGN}"
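One thing worth checking with the tr approach is what the line numbers refer to afterwards. The sketch below uses the sample paths from the question; the second command is a hypothetical alternative, not something proposed in the answers, that keeps whole paths and matches "folder" only as a complete path component.

```shell
# Sample paths copied from the question (printf stands in for `find . -type f`).
paths=$(printf '%s\n' ./example ./folder/.DS_Store ./folder/file.png)

# One path component per line: the numbers now count components, not paths.
by_component=$(printf '%s\n' "$paths" | tr -s '/' '\n' | grep -n '^folder$')

# Hypothetical alternative: keep whole paths, match "folder" as a full component.
by_path=$(printf '%s\n' "$paths" | grep -n -E '(^|/)folder(/|$)')

printf '%s\n' "$by_component" "$by_path"
```

With tr, the reported numbers are positions in the stream of components, so they no longer line up with the original list of paths.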

Related

How to grep for a pattern in every file involving brackets ${<any-word>}

Environment: bash (cygwin)
I need to grep every file in a directory with a specific extension and print to the screen just the pattern I am looking for.
It must handle multiple matches per line.
The pattern is: dollar sign, left curly, then any word or no word, then right curly bracket, like so:
$P{<anyword>}
Preferably with a single grep command, or with find:
find . -type f -name '*.txt' -exec grep <something> {} \;
The issue is that I have a statement to do this, but it returns the whole line where the expression is found, and I only want the pattern found to be displayed.
I need help finding the regex that matches this pattern:
$P{any-series-of-characters-or-numbers-or-dashes-or-underlines-anything-at-all-up-until-the-next-closing-curly-bracket}
I have tried several things that do not work. I also want to print just what is found, not the name of the file it is found in.
given myFile.txt:
asd
asd
asdf
fdg dsfg dsf g
askldf ${foo}
${bar} dfsdfg ${}
asdf asdf
asdfl asdf ${zzzzz
AKSDHA ASDF {aaaa}
grep -o -E '[$]{[^}]*}' myFile.txt results in:
${foo}
${bar}
${}
The regex can definitely be tightened up to cover more use cases.
You can try to use grep flag -o (--only-matching): "print only the matched (non-empty) parts of a matching line".
UPD:
grep --only-matching --no-filename -P '(?<=\$\{)[^}]*(?=\})'
And if you need to include ${ and } in the result, use
grep --only-matching --no-filename -E '\$\{[^}]*\}'
Using Perl regular expression :
find . -type f -name '*.txt' -exec grep -Po '\$\{[\w-]*\}' {} \;
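The -o approach can be tried against a shortened copy of myFile.txt from the question. This is a sketch under a few assumptions: the file is recreated in a temporary directory, the short flags -o and -h are used in place of --only-matching and --no-filename, and the ERE variant is used because -P depends on grep being built with PCRE support.

```shell
# Recreate a shortened myFile.txt from the question in a temporary directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/myFile.txt" <<'EOF'
asd
askldf ${foo}
${bar} dfsdfg ${}
asdfl asdf ${zzzzz
AKSDHA ASDF {aaaa}
EOF

# -o prints only the matched text, -h suppresses file names, -E selects ERE.
found=$(grep -o -h -E '\$\{[^}]*\}' "$tmpdir/myFile.txt")
printf '%s\n' "$found"
rm -rf "$tmpdir"
```

The unterminated ${zzzzz and the bare {aaaa} are skipped because the expression requires both the leading ${ and a closing } on the same line.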

Sed : print all lines after match

I got my research result after using sed :
zcat file* | sed -e 's/.*text=\(.*\)status=[^/]*/\1/' | cut -f 1 - | grep "pattern"
But it only shows the part that I cut. How can I print all lines after a match ?
I'm using zcat so I cannot use awk.
Thanks.
Edited :
This is my log file :
[01/09/2015 00:00:47] INFO=54646486432154646 from=steve idfrom=55516654455457 to=jone idto=5552045646464 guid=100021623456461451463 num=6 text=hi my number is 0 811 22 1/12 status=new survstatus=new
My aim is to find all users that spam my site with their telephone numbers (using grep "pattern") then print all the lines to get all the information about each spam. The problem is there may be matches in INFO or id, so I use sed to get the text first.
Printing all lines after a match in sed:
$ sed -ne '/pattern/,$ p'
# alternatively, if you don't want to print the match:
$ sed -e '1,/pattern/ d'
Filtering lines when pattern matches between "text=" and "status=" can be done with a simple grep, no need for sed and cut:
$ grep 'text=.*pattern.* status='
You can use awk
awk '/pattern/,EOF'
n.b. don't be fooled: EOF is just an uninitialized variable, and by default 0 (false). So that condition cannot be satisfied until the end of file.
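The sed and awk forms above can be compared on a small made-up input. This is a minimal sketch; the four sample lines and the variable names are illustrative only.

```shell
# Hypothetical four-line input with the match on the second line.
sample=$(printf '%s\n' 'a' 'b pattern' 'c' 'd')

# From the first matching line through the last line ($).
sed_from=$(printf '%s\n' "$sample" | sed -n '/pattern/,$p')

# Delete line 1 through the first match, leaving only what follows it.
sed_after=$(printf '%s\n' "$sample" | sed '1,/pattern/d')

# awk range whose end condition (the unset variable EOF) is never true.
awk_from=$(printf '%s\n' "$sample" | awk '/pattern/,EOF')

printf '%s\n' "$sed_from" "$sed_after" "$awk_from"
```

One caveat with 1,/pattern/d: sed starts looking for the end of the range on the line after the one that opened it, so a match on line 1 itself does not close the range.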
Perhaps this could be combined with all the previous answers using awk as well.
Maybe this is what you actually want? Find lines matching "pattern" and extract the field after text= up through just before status=?
zcat file* | sed -e '/pattern/s/.*text=\(.*\)status=[^/]*/\1/'
You are not revealing what pattern actually is -- if it's a variable, you cannot use single quotes around it.
Notice that \(.*\)status=[^/]* would match up through survstatus=new in your example. That is probably not what you want? There doesn't seem to be a status= followed by a slash anywhere -- you really should explain in more detail what you are actually trying to accomplish.
Your question title says "all line after a match" so perhaps you want everything after text=? Then that's simply
sed 's/.*text=//'
i.e. replace up through text= with nothing, and keep the rest. (I trust you can figure out how to change the surrounding script into zcat file* | sed '/pattern/s/.*text=//' ... oops, maybe my trust failed.)
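The greedy-match caveat about survstatus= can be seen on a trimmed version of the log line. This is a sketch: the shortened line is hypothetical, and the second sed expression is an assumed alternative that relies on a space always preceding status=, which the question does not confirm.

```shell
# Trimmed, hypothetical version of the log line from the question.
line='from=steve to=jone num=6 text=hi my number is 0 811 22 1/12 status=new survstatus=new'

# Original expression: the greedy \(.*\) runs up to the LAST "status=".
greedy=$(printf '%s\n' "$line" | sed 's/.*text=\(.*\)status=[^/]*/\1/')

# Assumed alternative: require a space before "status=" so "survstatus=" is skipped.
text_only=$(printf '%s\n' "$line" | sed 's/.*text=\(.*\) status=.*/\1/')

printf '%s\n' "$greedy" "$text_only"
```

The first result still carries "status=new surv" at the end, while the second keeps only the message text.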
The seldom used branch command will do this for you. Until you match, use n for next then branch to beginning. After match, use n to skip the matching line, then a loop copying the remaining lines.
cat file | sed -n -e ':start; /pattern/b match;n; b start; :match n; :copy; p; n ; b copy'
Your pipeline is:
zcat file* | sed -e 's/.*text=\(.*\)status=[^/]*/\1/' | cut -f 1 - | grep "pattern"
Instead, replace the last two segments (cut and grep) with awk:
zcat file* | sed -e 's/.*text=\(.*\)status=[^/]*/\1/' | awk '$1 ~ "pattern" {print $0}'

How exactly does this Sed command work?

sed 's#.*/.*\.#.#'
The command is part of a larger command to find all file extensions in a directory.
find . -type f -name '*.*' | sed 's#.*/.*\.#.#' | sort | uniq
I understand that find returns all files with an extension, I understand that sed returns just the extensions and then sort/uniq are self-explanatory.
At first, I was confused about the # symbol, but my best guess now is that it is part of the regex.
What really confuses me is that I can't figure out exactly how it works, and the closest matching syntax I can find in a manual is s/regexp/new/, which still doesn't match the syntax of the command.
In the s/regexp/replacement/ syntax, the / can be replaced by any other character, such as ,, :, #, etc. This is very useful if your regexp itself contains / characters, such as your example of .*/.*\..
Your command could be simplified a bit, though:
find . -type f -name '*.*' | sed 's/.*\././' | sort -u
Here, I simplified the regexp so that it no longer contains a / character.
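To see that the delimiter is interchangeable, both forms can be run on a few made-up paths. This is a sketch: the paths are hypothetical, and LC_ALL=C is set only to keep the sort order predictable.

```shell
# Hypothetical paths standing in for the output of find.
paths=$(printf '%s\n' ./notes.txt ./src/main.c ./dist/app.tar.gz ./src/util.c)

# "#" as the delimiter, so the "/" inside the regexp needs no escaping.
with_hash=$(printf '%s\n' "$paths" | sed 's#.*/.*\.#.#' | LC_ALL=C sort -u)

# The simplified regexp contains no "/", so the usual delimiter works.
with_slash=$(printf '%s\n' "$paths" | sed 's/.*\././' | LC_ALL=C sort -u)

printf '%s\n' "$with_hash"
```

Because .* is greedy, only the text after the last dot survives, which is why app.tar.gz is reported as .gz.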

Find/sed: How can I recursively search/replace a string in files but only for lines that match a particular regexp

I'm aware that the following command can be used to recursively replace all instances of a particular string with another:
find /path/to/files -type f -print0 | xargs -0 sed -i 's/oldstring/newstring/g'
However, I need to do this only for lines that start with a particular string ("matchstr").
For example, if a file contained the following lines:
This line containing oldstring should remain untouched
matchstr sometext oldstring somethingelse
I want to have this as output:
This line containing oldstring should remain untouched
matchstr sometext newstring somethingelse
Any suggestions as to how I could proceed would be appreciated.
You could do:
sed -i '/^matchstr/{s/oldstring/newstring/g}'
i.e.
find /path/to/files -type f -print0 | \
xargs -0 sed -i '/^matchstr/{s/oldstring/newstring/g}'
The first /^matchstr/ looks for lines matching that regex, and for those lines the s/old/new/g is performed.
You can do this easily using sed:
sed -e '/^matchstr/ s/oldstring/newstring/g' inputfile
/^matchstr/ acts as a condition; it makes it execute the following block only if the regex matches. In this case we're looking for matchstr at the beginning of the line.
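The effect of the address can be checked on the two sample lines from the question without touching any files. This is a minimal sketch that reads from a pipe instead of using -i.

```shell
# The two sample lines from the question.
input=$(printf '%s\n' \
  'This line containing oldstring should remain untouched' \
  'matchstr sometext oldstring somethingelse')

# The address /^matchstr/ limits the s command to lines starting with "matchstr".
output=$(printf '%s\n' "$input" | sed '/^matchstr/ s/oldstring/newstring/g')
printf '%s\n' "$output"
```

Once the output looks right, the same expression can be passed to sed -i through find and xargs as shown above.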

Using grep to search files provided by find: what is wrong with find . | xargs grep '...'?

When I use the command:
find . | xargs grep '...'
I get the wrong matches. I'm trying to search for the string ... in all files in the current folder.
As Andy White said, you have to either use fgrep to match a literal ., or escape the dots.
So you have to write one of the following (-type f restricts the search to regular files, since you obviously don't want directories):
find . -type f | xargs fgrep '...'
or if you still want to use grep :
find . -type f | xargs grep '\.\.\.'
And if you only want the current directory and not its subdirs :
find . -maxdepth 1 -type f | xargs fgrep '...'
'.' matches any character, so you'll be finding all lines that contain 3 or more characters.
You can either escape the dots, like this:
find . | xargs grep '\.\.\.'
Or you can use fgrep, which does a literal match instead of a regex match:
find . | xargs fgrep '...'
(Some versions of grep also accept a -F flag which makes them behave like fgrep.)
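The difference between the three forms shows up clearly when counting matching lines in a small made-up input. This is a sketch; the sample lines are hypothetical, and grep -F is used rather than fgrep because recent GNU grep releases warn that fgrep is obsolescent.

```shell
# Hypothetical input: only the second line contains three literal dots.
sample=$(printf '%s\n' 'abc' 'wait...' 'ab')

# Unescaped dots match any three characters.
any_three=$(printf '%s\n' "$sample" | grep -c '...')

# Escaped dots match only literal dots.
escaped=$(printf '%s\n' "$sample" | grep -c '\.\.\.')

# -F treats the whole pattern as a fixed string.
fixed=$(printf '%s\n' "$sample" | grep -c -F '...')

printf '%s\n' "$any_three" "$escaped" "$fixed"
```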
@OP, if you are looking for files that contain ...,
grep -R "\.\.\." *
If you're looking for a filename that matches, try:
find . -name "filename pattern"
or
find . | grep "filename pattern"
If you're looking for files whose contents match (i.e. files that contain the grep string),
find . | xargs grep "string pattern"
works fine, or simply:
grep "string pattern" -R *
If you are literally typing grep '...' you'll match just about any string. I doubt you're actually typing '...' for your grep command, but if you are, the ... will match any three characters.
Please post more info on what you're searching for, and maybe someone can help you out more.
To complete Jeremy's answer, you may also want to try
find . -type f | xargs grep 'your_pattern'
or
find . -type f -exec grep 'your_pattern' {} +
which is similar to using xargs.
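The two forms can be compared in a scratch directory. This is a sketch under stated assumptions: the file names and contents are made up, and -print0 with xargs -0 is used for the xargs variant so that a name containing a space is passed as a single argument (both options are common extensions rather than strict POSIX).

```shell
# Scratch directory with hypothetical files, one of which has a space in its name.
tmpdir=$(mktemp -d)
printf 'needle\n' > "$tmpdir/with space.txt"
printf 'hay\n' > "$tmpdir/other.txt"

# -exec ... {} + batches file names into as few grep invocations as possible.
via_exec=$(cd "$tmpdir" && find . -type f -exec grep -l 'needle' {} +)

# NUL-delimited names keep "with space.txt" as one argument for xargs.
via_xargs=$(cd "$tmpdir" && find . -type f -print0 | xargs -0 grep -l 'needle')

printf '%s\n' "$via_exec" "$via_xargs"
rm -rf "$tmpdir"
```

A plain find . -type f | xargs grep would split "with space.txt" into two arguments, which is one reason to prefer either of the forms shown here.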
I might add: RTFM! Or, more politely: make liberal use of
man command