regex for one character or another? - regex

I'm searching for a regex code which lists all the lines that contain an a OR an i.
I tryed this:
grep -E '[(a|i)]{1}' testFile.txt
but this gives me the words containing a or i and words that contain a en i.
What's wrong?

You can achieve that with:
grep -E "^[^ai]*(a|i){1}[^ai]*$" testFile.txt

If you want an exclusive OR then try this:
grep -E '^[^i]*a[^i]*$|^[^a]*i[^a]*$'

Related

grep strings between "{{_(" and ")}}"

I want to parse html files to extract strings between "{{_(" and ")}}" using GREP. I tried something like this:
grep '"[^{{_(|)}}$]"' *.html
but it didn't work.
Can someone help me please?
Thanks!
You may use
grep -oP '(?<={{_\().+?(?=\)}})' file
Details
-o - output only matched substrings
-P - enable the PCRE regex engine
(?<={{_\().+?(?=\)}}) match:
(?<={{_\() - a location that is immediately preceded with {{+(
.+? - any 1 or more more chars other than line break chars, as few as possible
(?=\)}}) - a location that is immediately followed with )}} .
See the regex demo.
#Wiktor Stribiżew's answer works really good. However, if you have multiple files, you would get an output like this, where the respective file name per each match is also displayed:
foo.html: content abc
foo.html: test 123
bar.html: first match
bar.html: second match
So, if you are only interested in the matching string as output, you can try sed instead
sed -n 's/.*{{_(\(.*\))}}.*/\1/p' *.html
You can also count the unique occurrence of matches and things like that...
Update:
Or just use the -h | --no-filename with the grep that #Wiktor Stribiżew has provided.
grep -h -oP '(?<={{_\().+?(?=\)}})' *.html
Or the -c flag in order to display the count of matches per each file:
grep -c -oP '(?<={{_\().+?(?=\)}})' *.html
As in the posts before with it is possible to grep the value of an HTML property.
placeholder="SOME TEXT_HERE" -> grep -> "SOME TEXT_HERE"
grep -oP '(?<=placeholder=").+?(?=")' *html

Capture text between two tokens

I'm trying to get the text between two tokens.
For example, let's say the text is:
arn:aws:dfasdfasdf/asdfa:start:CaptureThis/end
The output should be: CaptureThis
And the two tokens are: :start: and /end
The closest I could get was using this regex:
INPUT="arn:aws:dfasdfasdf/asdfa:start:CaptureThis/end"
VALUE=$(echo "${INPUT}" | sed -e 's/:start:\(.*\)\/end/\1/')
... but this returns most of the string: arn:aws:dfasdfasdf/asdfa:start:CaptureThis/end
How do I get all of the other text out of the way?
You could use (GNU) grep with Perl regular expressions (look-arounds) and the -o option to only return the match:
$ grep -Po '(?<=:start:).*(?=/end)' <<< 'arn:aws:dfasdfasdf/asdfa:start:CaptureThis/end'
CaptureThis
Try this:
$ sed 's/^.*:start:\(.*\)\/end.*$/\1/' <<<'arn:aws:dfasdfasdf/asdfa:start:CaptureThis/end'
CaptureThis
The problem with your approach was that you only replaced part of the input line, because your regex didn't capture the entire line.
Note how the command above anchors the regex both at the beginning of the line (^.*) and at the end (.*$) so as to ensure that the entire line is matched and thus replaced.
You could use :
VALUE=$(echo "${INPUT}" | sed -e 's/.*:start:\(.*\)\/end.*/\1/')
If the tokens are liable to change, you could use variables - but since "/end" has a "/", that could lead to sed getting confused, so you'd probably want to change its delimiter to some non-conflicting character (like a "?"), so :
TOKEN1=":start:"
TOKEN2="/end"
VALUE=$(echo "${INPUT}" | sed -e "s?.*$TOKEN1\(.*\)$TOKEN2.*?\1?")
There is no need for any external utilities, bash parameter-expansion will handle it all for you:
INPUT="arn:aws:dfasdfasdf/asdfa:start:CaptureThis/end"
token=${INPUT##*:}
echo ${token%/*}
Output
CaptureThis

How to use grep with two patterns?

I have followed many links and reference documentation but still facing errors.
I want to list all the words start or ending with 'sx'.
I already found a solution which works fine:
awk '/^sx|sx$/' usr/dict/words
However I tried to use grep or egrep and neither worked.
And then I made it simpler. Any word which contains 'ax' or 'sx', and again it did not work!
grep 'ax|sx' words
grep -E '^ax\|sx$' words
egrep '^ax\|sx$' words
Error:
grep: illegal option -- E
Usage: grep -hblcnsviw pattern file . . .
Any suggestion appreciated.
This should work for you:
egrep '(^ax|sx$)' usr/dict/words
NOTE:
I couldn't find any words ending in sx on the dictionary I've used.

how to select lines containing several words using sed?

I am learning using sed in unix.
I have a file with many lines and I wanna delete all lines except lines containing strings(e.g) alex, eva and tom.
I think I can use
sed '/alex|eva|tom/!d' filename
However I find it doesn't work, it cannot match the line. It just match "alex|eva|tom"...
Only
sed '/alex/!d' filename
works.
Anyone know how to select lines containing more than 1 words using sed?
plus, with parenthesis like "sed '/(alex)|(eva)|(tom)/!d' file" doesn't work, and I wanna the line containing all three words.
sed is an excellent tool for simple substitutions on a single line, for anything else just use awk:
awk '/alex/ && /eva/ && /tom/' file
delete all lines except lines containing strings(e.g) alex, eva and tom
As worded you're asking to preserve lines containing all those words but your samples preserve lines containing any. Just in case "all" wasn't a misspeak: Regular expressions can't express any-order searches, fortunately sed lets you run multiple matches:
sed -n '/alex/{/eva/{/tom/p}}'
or you could just delete them serially:
sed '/alex/!d; /eva/!d; /tom/!d'
The above works on GNU/anything systems, with BSD-based userlands you'll have to insert a bunch of newlines or pass them as separate expressions:
sed -n '/alex/ {
/eva/ {
/tom/ p
}
}'
or
sed -e '/alex/!d' -e '/eva/!d' -e '/tom/!d'
You can use:
sed -r '/alex|eva|tom/!d' filename
OR on Mac:
sed -E '/alex|eva|tom/!d' filename
Use -i.bak for inline editing so:
sed -i.bak -r '/alex|eva|tom/!d' filename
You should be using \| instead of |.
Edit: Looks like this is true for some variants of sed but not others.
This might work for you (GNU sed):
sed -nr '/alex/G;/eva/G;/tom/G;s/\n{3}//p' file
This method would allow a range of values to be present i.e. you wanted 2 or more of the list then use:
sed -nr '/alex/G;/eva/G;/tom/G;s/\n{2,3}//p' file

Having trouble with GREP and REGEX

I have a text file that stores combinations of four numbers in the following format:
Num1,Num2,Num3,Num4
Num5,Num6,Num7,Num8
.............
I have a whole bunch of such files and what I need is to grep for all filenames that contains the pattern described above.
I constructed my grep as follows:
grep -l "{d+},{d+},{d+},{d+}" /some/path/to/file/name
The grep terminates without returning anything.
Can somebody point out what I might be doing wrong with my grep statement?
Thanks
This should do what you want:
egrep -l '[[:digit:]]+,[[:digit:]]+,[[:digit:]]+,[[:digit:]]+' /some/path/to/file/name
One way is using a perl regexp:
grep -Pl "\d+,\d+,\d+,\d+" /some/path/to/file/name
In your syntax d is literal. It should be escaping that letter, but is not accepted by grep regular regexp.