Matching end of line in GREP - regex

I have this piece of bash script which is supposed to match words that end with an 'a'. However, when I run this, I get no output, despite the fact that my text file has words that end with 'a'.
cat $1 | cut -d'|' -f3 | cut -d',' -f2 | sed 's/^ //' | egrep -i "a$"
If I remove the '$' it shows output, but with the '$' it returns nothing. The command still runs; it just doesn't match anything.
Would appreciate some help with this. Thanks.
A sample of the file:
MTNG1511|5013566|Xin, Mackenzie Darren
MTNG9902|5079970|Park, Xue Hannah Vanessa
MTNG1511|5059072|Chung, Michael Jia Tianyu
MTNG1521|5060774|Lim, Stephanie Lauren
MTNG1531|5060774|Lim, Stephanie Lauren
MTNG2521|5060774|Lim, Stephanie Lauren
MTNG9020|5060538|Bi, Samuel Shiyu
MTNG9021|5060538|Bi, Samuel Shiyu
MTNG9902|5072116|Hu, Kai Zhi Patrick
The output should be
Park, Xue Hannah Vanessa
since it ends with an 'a'.

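There's probably extra whitespace at the end of your words.
Try adding
sed 's/[ \t]*$//'
to the pipeline to remove it, or else change your grep to allow for whitespace at the end. (\t inside a bracket expression is understood by GNU sed; with other seds, [[:space:]] is the portable spelling, and it also removes a DOS carriage return.)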
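Here is a minimal sketch of that fix. It assumes bash and a sed that supports POSIX character classes, and `strip_trailing` is a hypothetical helper name. The sketch uses `[[:space:]]` instead of `[ \t]`, so a DOS carriage return (`\r`) is also removed. A trailing `\r` is a common reason why `a$` never matches.

```shell
#!/usr/bin/env bash
# strip_trailing (hypothetical helper): remove trailing spaces, tabs and
# carriage returns from every line read on standard input
strip_trailing() {
  sed 's/[[:space:]]*$//'
}

# A line that ends in "a", then a space and a DOS carriage return
sample=$'Xue Hannah Vanessa \r'

# Anchored match without stripping: grep -c reports 0 matching lines
# (grep exits 1 when nothing matches, hence the || true)
printf '%s\n' "$sample" | grep -ci 'a$' || true

# After stripping, the last character is "a" and the line is printed
printf '%s\n' "$sample" | strip_trailing | grep -i 'a$'
```

In the question's pipeline, this means putting the same `sed` call just before the final `egrep -i "a$"`.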

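This is very simple using grep:
grep -o '[^|]\+$' < "$1" | grep 'a\s*$'
Output:
$ bash example file.txt
Park, Xue Hannah Vanessa
$
[^|]\+ matches one or more characters that aren't | up to the end of the line, i.e. the last field.
a\s*$ matches an a as the last visible character, allowing zero or more whitespace characters (including a trailing carriage return) before the end of the line. Note that \+ and \s are GNU grep extensions.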
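For comparison, the same filter can be written as a single awk program. This sketch swaps in a different technique and is not part of the answer above. It assumes the name is always the last `|`-separated field, and `names_ending_in_a` is a hypothetical function name. It trims trailing whitespace before testing, so CRLF input is also handled.

```shell
#!/usr/bin/env bash
# names_ending_in_a FILE... (hypothetical): print the last |-separated field of
# each record if, once trailing blanks are trimmed, it ends in "a" or "A"
names_ending_in_a() {
  awk -F'|' '{
    name = $NF
    sub(/[ \t\r]+$/, "", name)   # trim trailing spaces, tabs and carriage returns
    if (name ~ /[aA]$/) print name
  }' "$@"
}
```

With the sample records stored one per line, `names_ending_in_a file.txt` should print only `Park, Xue Hannah Vanessa`.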

Related

how to regex replace before colon?

this is my original string:
NetworkManager/system connections/Wired 1.nmconnection:14 address1=10.1.10.71/24,10.1.10.1
I want to add a backslash only to the spaces before the ':'.
So this is what I want in the end:
NetworkManager/system\ connections/Wired\ 1.nmconnection:14 address1=10.1.10.71/24,10.1.10.1
I need to do this in bash, so sed, awk, and grep are all OK for me.
I have tried the following sed commands, but none of them work:
echo NetworkManager/system connections/Wired 1.nmconnection:14 address1=10.1.10.71/24,10.1.10.1 | sed 's/ .*\(:.*$\)/\\ .*\1/g'
echo NetworkManager/system connections/Wired 1.nmconnection:14 address1=10.1.10.71/24,10.1.10.1 | sed 's/\( \).*\(:.*$\)/\\ \1.*\2/g'
echo NetworkManager/system connections/Wired 1.nmconnection:14 address1=10.1.10.71/24,10.1.10.1 | sed 's/ .*\(:.*$\)/\\ \1/g'
echo NetworkManager/system connections/Wired 1.nmconnection:14 address1=10.1.10.71/24,10.1.10.1 | sed 's/\( \).*\(:.*$\)/\\ \1\2/g'
Thanks for answering my question.
I am still quite new to Stack Overflow and don't know how to format code in comments, so I have edited my original question instead.
My real story is:
when I use grep or cscope to search for a keyword, for example "address1" under the /etc folder,
the result looks like:
./NetworkManager/system connections/Wired 1.nmconnection:14 address1=10.1.10.71/24,10.1.10.1
If I use vim to open the file under the cursor (suppose the cursor is on the word "NetworkManager"),
vim interprets it as
"./NetworkManager/system"
That's why I want to add "\" before the spaces, so that the search result is more vim-friendly :)
I did try to change cscope's source code, but it was very difficult to fully achieve this, so I have to do the replacement afterwards :(

If you only want to do the replacements when there is a : in the string, you can check that there are at least 2 fields, setting both the field separator and the output field separator to a colon.
Data:
$ cat file
NetworkManager/system connections/Wired 1.nmconnection:14 address1=10.1.10.71/24,10.1.10.1
NetworkManager/system connections/Wired 1.nmconnection 14 address1=10.1.10.71/24,10.1.10.1
Example in awk:
awk 'BEGIN {FS=OFS=":"}{if(NF>1)gsub(" ","\\ ",$1)}1' file
Output
NetworkManager/system\ connections/Wired\ 1.nmconnection:14 address1=10.1.10.71/24,10.1.10.1
NetworkManager/system connections/Wired 1.nmconnection 14 address1=10.1.10.71/24,10.1.10.1

This can be done simply with an awk program. With your shown samples, please try the following:
awk 'BEGIN{FS=OFS=":"} {gsub(/ /,"\\\\&",$1)} 1' Input_file
Explanation: set both the field separator and the output field separator to : for this program. Then, in the main program, use awk's gsub (global substitution) function to replace each space in the 1st field only with a backslash followed by that space (as per the OP's remarks, the replacement should only happen before the :), and print the line.
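Here is a minimal sketch that wraps this awk program in a shell function. The name `escape_spaces_before_colon` is hypothetical. It also adds the `NF > 1` guard from the previous answer, so lines without a colon pass through unchanged. Without that guard, a colon-less line is a single field and all of its spaces would be escaped.

The replacement is written as `"\\\\&"` because awk processes backslashes twice. First, the string literal turns it into `\\&`. Then `gsub` turns `\\&` into a literal backslash followed by the matched text (`&`, here the space).

```shell
#!/usr/bin/env bash
# escape_spaces_before_colon (hypothetical): backslash-escape every space
# before the first ":" on each line; the rest of the line is left untouched
escape_spaces_before_colon() {
  # FS=OFS=":" so that rebuilding $0 after changing $1 restores the colons
  awk 'BEGIN { FS = OFS = ":" } NF > 1 { gsub(/ /, "\\\\&", $1) } 1' "$@"
}

printf '%s\n' 'NetworkManager/system connections/Wired 1.nmconnection:14 address1=10.1.10.71/24,10.1.10.1' |
  escape_spaces_before_colon
```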

An idea for a Perl one-liner in bash using \G and \K (similar to @CarySwoveland's comment):
perl -pe 's/\G[^ :]*\K /\\ /g' myfile
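\G anchors each match where the previous one ended, [^ :]* skips characters that are neither a space nor a colon, and \K drops them from the match, so only the space itself is replaced. Once a : is reached, the chain of matches stops. See this demo at tio.run or a pattern demo at regex101.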

This might work for you (GNU sed):
sed -E ':a;s/^([^: ]*) /\1\n/;ta;s/\n/\\ /g' file
This replaces the spaces before the : with newlines, one per pass of the :a/ta loop, and then replaces every newline with \ followed by a space.
Alternative using the hold space:
sed -E 's/:/\n:/;h;s/ /\\ /g;G;s/\n.*\n//' file
Split the line on the first :.
Amend the front section, remove the middle and append the unadulterated back section.
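Below is the hold-space command above, written out one command per line with comments. This sketch assumes GNU sed, because it relies on `\n` in the replacement and on `#` comment lines inside the script. `escape_with_hold_space` is a hypothetical name.

Note a limitation of the script as written: a line with no `:` is printed twice, escaped first and then as the original. That happens because the final substitution needs two embedded newlines to match.

```shell
#!/usr/bin/env bash
# escape_with_hold_space (hypothetical): GNU sed version of the answer above
escape_with_hold_space() {
  sed -E '
    # put a newline in front of the first colon to mark the split point
    s/:/\n:/
    # copy the marked line to the hold space
    h
    # escape every space (the escaped part after the marker is discarded below)
    s/ /\\ /g
    # append the untouched copy: escaped-front \n :escaped-back \n front \n :back
    G
    # delete from the first newline to the last one: escaped-front + :back remains
    s/\n.*\n//
  ' "$@"
}
```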

My answer is ugly and I think RavinderSingh13's answer is THE ONE, but I already took the time to write mine and it works (it's written step by step, but it ends up as a one-line command).
I was inspired by HatLess's answer.
First, get the text before the : with cut (I put the string in a file to make it easier to read, but this also works with echo):
cut -d':' -f1 infile
Then replace spaces using sed:
cut -d':' -f1 infile | sed 's/\([a-z]\) /\1\\ /g'
Then echo the output with no new line:
echo -n "$(cut -d':' -f1 infile | sed -e 's/\([a-z]\) /\1\\ /g')"
Add the missing : and what comes after it:
echo -n "$(cut -d':' -f1 infile | sed -e 's/\([a-z]\) /\1\\ /g')" | cat - <(echo -n :) | cat - <(cut -d':' -f2 infile)

Using grep to remove text after the first, or second, occurrence of a four digit string. Issue with hyphenated text

I am trying to use grep and sed to format text and need help with my grep statement to include hyphens and preceding text in the output.
Example strings:
Merry.Ex-Mas.2014.1080p.Text.x265-JOHN
30.Rock.A.One-Time.Special.2020.1080p.Text.x265-JOHN
Creature.from.the.Black.Lagoon.REMASTERED.1954.1080p.BluRay.x265-JOHN
1984.1984.1080p.Text.x265-JOHN
The desired output would be:
Merry Ex-Mas 2014
30 Rock A One-Time Special 2020
Creature from the Black Lagoon 1954
1984 1984
Thanks to @grzegorz-pudłowski I have this line of code (but for some reason the hyphens and everything in front of them are being removed):
`grep -E -o '(\\w*[\\.]?)*(19|20)[0-9]{2}'`
(the extra escapes are needed in AppleScript)
Those grep commands result in:
Mas.2014
Time.Special.2020
Creature.from.the.Black.Lagoon.1954
1984.1984
I then pipe to sed to replace periods with spaces:
| sed 's/\\. */ /g'"
The original answer from @grzegorz-pudłowski that was removed from Stack Overflow:
grep should be better than sed in this situation. I guess you have a bunch of files and you want to rename them or whatnot, so I would use something like this:
echo "Title.Text.2012.1080p.text.text" | grep -E -o "(\w*[\.]?)*(19|20)[0-9]{2}"
So... -E is the "extended regex" flag; you can use egrep instead. The next flag, -o, makes grep print only the matched part (as you want to throw away the rest of the string).
Regexp is simple:
(\w*[\.]?)* match zero or more groups of zero or more word characters (letters, digits or underscore), each with an optional dot at the end.
(19|20) match 19 or 20, as you want to match a year (this assumes years 1900-2099, so change this part if you want a wider range).
[0-9]{2} match two digits from 0 to 9.
After that you can pipe the result to mv or whatnot. If you grep a file, however, just use:
grep -E -o "(\w*[\.]?)*(19|20)[0-9]{2}" filename.txt
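`\w` matches only letters, digits and underscore. So the match cannot cross the `-` in `Ex-Mas` or `One-Time`, and it restarts after the hyphen. That is why the question gets `Mas.2014`.

One way around this is a bracket expression that also allows `.` and `-`. This is a minimal sketch, assuming a grep that supports `-o` (GNU or BSD). `title_year` is a hypothetical name. The pattern keeps everything up to the last 19xx/20xx number in the run, so a later four-digit number such as `1920x1080` would also be treated as a year.

```shell
#!/usr/bin/env bash
# title_year (hypothetical): keep everything up to the last 19xx/20xx number,
# allowing hyphens in the title, then turn the dots into spaces
title_year() {
  grep -E -o '[[:alnum:]_.-]*(19|20)[0-9]{2}' "$@" | sed 's/\./ /g'
}

printf '%s\n' \
  'Merry.Ex-Mas.2014.1080p.Text.x265-JOHN' \
  '30.Rock.A.One-Time.Special.2020.1080p.Text.x265-JOHN' \
  '1984.1984.1080p.Text.x265-JOHN' | title_year
```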

EDIT2: In case the OP wants to stick with the original solution plus additional steps, try the following.
grep -E -o "(\w+\.){1,}.*(19|20)[0-9]{2}" Input_file | sed 's/\./ /g'
EDIT: As per OP's comment adding more generic solution.
awk '
match($0,/[0-9]{4}\.[0-9]+[a-zA-Z]+\..*/){
val=substr($0,1,RSTART+3)
gsub(/\./," ",val)
print val
val=""
}
' Input_file
Could you please try the following, written and tested with the shown samples in GNU sed.
sed -E 's/\.[0-9]+p\.Text\..*Text//;s/\./ /g' Input_file
2nd solution: Using awk.
awk '
BEGIN{
FS="."
}
match($0,/\.[0-9]+p\.Text\..*Text/){
$1=$1
print substr($0,1,RSTART-1)
}
' Input_file

A sed expression using BRE (Basic Regular Expressions) can be written as:
sed 's/[.]/ /g;s/\w\w*p\s.*$//' file
The first substitution globally replaces each '.' with a space, and the second deletes from the word ending in 'p' to the end of the line. \w matches [A-Za-z0-9_], so you can tighten the matching criteria by adjusting the match of the characters before 'p' if needed.
Example Use/Output
$ sed 's/[.]/ /g;s/\w\w*p\s.*$//' file
Merry Ex-Mas 2014
30 Rock A One-Time Special 2020
1984 1984
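`\w` and `\s` in the command above are GNU extensions. The sketch below does the same two substitutions using only POSIX bracket classes, which should also work with BSD/macOS sed. `strip_release_info` is a hypothetical name.

```shell
#!/usr/bin/env bash
# strip_release_info (hypothetical): POSIX-class version of the sed command above
strip_release_info() {
  # 1st substitution: every "." becomes a space
  # 2nd substitution: delete from the first run of word characters ending in "p"
  #   and followed by whitespace (e.g. "1080p ") to the end of the line
  sed 's/[.]/ /g;s/[[:alnum:]_][[:alnum:]_]*p[[:space:]].*$//' "$@"
}
```

Like the original, this leaves at the end of each line the space that preceded the deleted part.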
Per-Edits To Include Additional Strings
Including additional strings such as:
"WALL-E.2008.1080p.BluRay.x265-JOHN", and
"WALL-E.2008.REMASTERED.1080p.BluRay.x265-RARBG"
To use BRE you would need:
sed 's/[.]/ /g;s/^[0-9][0-9]*[ ]\([0-9][0-9][0-9][0-9]\).*$/\1 \1/;s/[ ]\([0-9][0-9][0-9][0-9]\).*$/ \1/' file
Example Input File
$ cat file
Merry.Ex-Mas.2014.1080p.Text.x265.Text
30.Rock.A.One-Time.Special.2020.1080p.Text.x265.Text
1984.1984.1080p.Text.x265.Text
WALL-E.2008.1080p.BluRay.x265-JOHN
WALL-E.2008.REMASTERED.1080p.BluRay.x265-RARBG
Example Use/Output
$ sed 's/[.]/ /g;s/^[0-9][0-9]*[ ]\([0-9][0-9][0-9][0-9]\).*$/\1 \1/;s/[ ]\([0-9][0-9][0-9][0-9]\).*$/ \1/' file
Merry Ex-Mas 2014
30 Rock A One-Time Special 2020
1984 1984
WALL-E 2008
WALL-E 2008

This can be solved using sed substitution:
sed -E 's/(.*(19|20)[0-9]{2}).*/\1/; s/\./ /g' file
Merry Ex-Mas 2014
30 Rock A One-Time Special 2020
1984 1984
Details:
(.*(19|20)[0-9]{2}): match the longest string that ends in a year and capture it in group #1
.*: match the remaining part up to the end of the line
\1: put the 1st capture group back
s/\./ /g: replace each dot with a space

You may use
sed -E 's/\.1080p\..*//g;s/\./ /g' file
See the online sed demo
Details
-E - enables POSIX ERE syntax
s/\.1080p\..*//g - removes .1080p. and all the text after it up to the end of the string
s/\./ /g - replaces dots with spaces.
Test:
#!/bin/bash
s='Merry.Ex-Mas.2014.1080p.
30.Rock.A.One-Time.Special.2020.1080p.
1984.1984.1080p.'
sed -E 's/\.1080p\..*//g;s/\./ /g' <<< "$s"
Output:
Merry Ex-Mas 2014
30 Rock A One-Time Special 2020
1984 1984

grep matching but not printing if line end in dos ^M

I need to search multiple files for a PATTERN and, if it is found, display the file, line and PATTERN surrounded by a few extra chars. My problem is that if the line matching the PATTERN ends with ^M (CRLF), grep prints an empty line instead.
Create a file like this: first line "a^M", second line "a", third line empty, fourth line "a" (not followed by a newline).
a^M
a

a
Without trying to match a few chars after the PATTERN, all occurrences are found and displayed:
# grep -srnoEiI ".{0,2}a" *
1:a
2:a
4:a
If I try to match any chars at the end of the PATTERN, it prints an empty line instead of line one, the one ending in CRLF:
# grep -srnoEiI ".{0,2}a.{0,2}" *
2:a
4:a
How can I change this to act as expected?
P.S. I would like to fix this grep, but I will accept other solutions, for example in awk.
EDIT:
Based on the answers below, I chose to strip the \r and force grep to pipe the colors to tr:
grep --color=always -srnoEiI ".{0,2}a.{0,2}" * | tr -d '\r'

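Here's a simpler case that reproduces your problem:
# Prints a
echo $'a\r' | grep -o "a"
# Appears to print nothing
echo $'a\r' | grep -o "a."
This is because the ^M (carriage return) matches like a regular character. It becomes part of the printed match and sends your terminal's cursor back to the start of the line. When grep colors its output, it also emits an erase-to-end-of-line sequence (\33[K) after each match, and that sequence wipes what was just printed. The effect is purely cosmetic.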
How you want to fix this depends on what you want to do.
# Show the output in hex format to ensure it's correct
$ echo $'a\r' | grep -o "a." | od -t x1 -c
0000000 61 0d 0a
a \r \n
# Show the output in visually less ambiguous format
$ echo $'a\r' | grep -o "a." | cat -v
a^M
# Strip the carriage return
$ echo $'a\r' | grep -o "a." | tr -d '\r'
a

awk -v pattern="a" '$0 ~ pattern && !/\r$/ {print NR ": " $0}' file
or
sed -n '/a/{/\r$/!{=;p}}' ~/tmp/srcfile | paste -d: - -
Both of these do: find the pattern, see if the line does not end in a carriage return, print the line number and the line. For the sed, the line number is on its own line, so we have to join two consecutive lines with a colon.

You could use pcregrep:
pcregrep -n '.{0,2}a.{0,2}' inputfile
For your sample input:
$ printf $'a\r\na\n\na\n' | pcregrep -n '.{0,2}a.{0,2}'
1:a
2:a
4:a

A couple more ways:
Use the dos2unix utility to convert the dos-style line endings to unix-style:
dos2unix myfile.txt
Or preprocess the file using tr to remove the CR characters, then pipe to grep:
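$ tr -d '\r' < myfile.txt | grep -noEi ".{0,2}a.{0,2}"
1:a
2:a
4:a
$
Note that dos2unix may need to be installed on whatever OS you are using. tr is part of POSIX, so it should be available on any POSIX-compliant OS. Leave out -r when grep reads from a pipe: with -r and no file operand, GNU grep searches the working directory instead of standard input.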
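With the tr approach, grep reads standard input, so it no longer knows the file name. Below is a sketch of a loop that keeps the name. It assumes GNU grep for `--label`, and `grep_crlf_safe` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# grep_crlf_safe PATTERN FILE... (hypothetical): search each file with CRs removed,
# printing file:line:match like grep -Hno would
grep_crlf_safe() {
  local pattern=$1 f
  shift
  for f in "$@"; do
    # --label names the stdin stream, -H forces that name to be printed
    tr -d '\r' < "$f" | grep -HnoEi --label="$f" -- "$pattern"
  done
}
```

Usage would look like `grep_crlf_safe '.{0,2}a.{0,2}' *`. For a recursive search, the file list would have to come from something like `find` instead of `-r`.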

You can use awk with a custom field separator:
awk -F '[[:blank:]\r]' '/.{0,2}a.{0,2}/{print FILENAME, NR, $1}' OFS=':' file
TESTING:
Your grep command:
grep -srnoEiI ".{0,2}a.{0,2}" file|cat -vte
file:1:a^M$
file:2:a$
file:4:a$
Suggested awk command:
awk -F '[[:blank:]\r]' '/.{0,2}a.{0,2}/{print FILENAME, NR, $1}' OFS=':' file|cat -vte
file:1:a$
file:2:a$
file:4:a$

Regex: return the first line after the matching line

How do I return the first line after the matching line? I realize different regex engines vary; I'm interested particularly in the grep version.

There is a straightforward way to do this using grep.
grep -A1 'PATTERN' file
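The -A NUM option prints NUM lines of trailing context after each matching line.
grep -A1 'PATTERN' file | grep -v 'PATTERN'
The -v option inverts the sense of matching, to select non-matching lines. You can use it if you only want the line after the matched pattern to be printed. Keep in mind that it also drops a following line that itself matches PATTERN, and that GNU grep puts -- separator lines between non-adjacent groups unless you add --no-group-separator.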

Actually, you can do it quite easily with two grep invocations:
grep --no-group-separator -A1 pattern input.dat | grep -v pattern
but this is a bit ugly because you have to enter the pattern twice, and this can be error prone. You could wrap it with a script, but personally, I tend to use sed for this sort of thing.
Anyway, here's an example:
[lineafter.sed $] cat input.dat
aaa0
bbb0
ccc0
ddd0
eee0
fff0
ggg0
hhh0
aaa1
bbb1
ccc1
ddd1
eee1
fff1
ggg1
hhh1
[lineafter.sed $] grep --no-group-separator -A1 ccc input.dat | grep -v ccc
ddd0
ddd1
[lineafter.sed $]
(Note that if you don't use --no-group-separator, you'll end up with -- markers between the groups of lines for each match, which might be what you want.)
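The wrapper mentioned above can be sketched so that the pattern is typed only once. This version swaps in the awk flag technique from the awk answer below instead of two greps. `line_after` is a hypothetical name. The pattern is an awk (ERE) regex passed with `-v`, so awk processes any backslashes in it once more.

```shell
#!/usr/bin/env bash
# line_after PATTERN [FILE...] (hypothetical): print the line that follows each
# line matching PATTERN; the pattern is given only once
line_after() {
  local pattern=$1
  shift
  # f is set on a matching line; the next line is printed and f is cleared.
  # The print rule comes first, so a match directly after a match is printed too.
  awk -v pat="$pattern" 'f { print; f = 0 } $0 ~ pat { f = 1 }' "$@"
}
```

With the `input.dat` shown above, `line_after ccc input.dat` should print `ddd0` and `ddd1`.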

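Using GNU sed (the addr,+N range form is a GNU extension). Note that this prints the matching line as well as the line after it:
sed '/pattern/,+1!d' filename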

This prints only the line after each line that matches the pattern:
awk 'f {print;f=0} /pattern/ {f=1}' file
example
cat file
one
two
three
four
awk 'f {print;f=0} /two/ {f=1}' file
three

I don't know how to do this with grep, but it's very easy with sed:
sed -n '/pattern/{n;p}' input.dat
example:
[lineafter.sed $] cat input.dat
aaa0
bbb0
ccc0
ddd0
eee0
fff0
ggg0
hhh0
aaa1
bbb1
ccc1
ddd1
eee1
fff1
ggg1
hhh1
[lineafter.sed $]
[lineafter.sed $] sed -n '/ccc/{n;p}' input.dat
ddd0
ddd1
[lineafter.sed $]

Using regex to extract a substring while excluding a certain phrase

Say for the string:
test.1234.mp4
I would like to extract the numbers
1234
without extracting the 4 in mp4
What would the regex be for this?
The numbers aren't always in the second position and can be in different positions and might not always be four digits. I would like to extract the number without extracting the 4 in mp4 essentially.
More examples:
test.abc.1234.mp4
test.456.abc.mp4
test.aaa.bbb.c.111.mp4
test.e666.123.mp4
Essentially, only the numbers would be extracted. Hence, for the last example, the 666 in e666 would not be extracted, only 123.
To extract I have been using
echo "example.123.mp4" | grep -o "REGEX"
Edit: test456 was meant to be test.456

The accepted answer will fail on "test.e666.123.mp4" (print 666).
This should work
$ cat | perl -ne 'print "$1\n" if /\.(\d+)\./'
test.abc.1234.mp4
test.456.abc.mp4
test.aaa.bbb.c.111.mp4
test.e666.123.mp4
1234
456
111
123
Note that this will only print the first group of numbers: for test.123.456.mp4, only 123 will be printed.
The idea is to match a dot, followed by the numbers we are interested in (the parentheses capture them), followed by another dot. This means that it will fail on 123.mp4.
To fix this you could use:
$ cat | perl -ne 'print "$2\n" if /(^|\.)(\d+)\./'
test.abc.1234.mp4
test.456.abc.mp4
test.aaa.bbb.c.111.mp4
test.e666.123.mp4
781.test.mp4
1234
456
111
123
781
First match is either beginning of line (^) or a dot, followed by numbers and a dot. We use $2 here since $1 is either beginning of a line or a dot.

cut can make it:
$ echo "test.1234.mp4" | cut -d. -f2
1234
where, in
cut -d'.' -f2
-d'.' sets the delimiter to a dot and -f2 selects the 2nd field.
If you provide more examples we can improve the output. With the current code you would extract whatever is in the second field, e.g. something in blablabla.something.blablabla.
Update: from your question update we can do this:
grep -o '\.[0-9]*\.' | sed 's/\.//g'
test:
$ echo "test.abc.1234.mp4
test456.abc.mp4
test.aaa.bbb.c.111.mp4
test.e666.123.mp4" | grep -o '\.[0-9]*\.' | sed 's/\.//g'
1234
111
123

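With GNU grep's PCRE mode (-P), the lookbehind (?<=\.) and lookahead (?=\.) require a dot on each side of the digits without including the dots in the match:
grep -Po "(?<=\.)\d+(?=\.)"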

echo "test.1234.mp4" | perl -lpe 's/[^.\d]+\d*//g;s/\D*(\d+).*/$1/'
or:
echo "1321.test.mp4" | perl -lpe 's/.*(?:^|\.)(\d+)\..*/$1/'
-p prints each line after the code has run, so we don't need an explicit print.
-e says the program is given on the command line, not in a script file.
-l strips the newline from each input line and adds it back on output.
These will also work if the number is at the start of the name.

perl -F'\.' -lane 'print "$F[scalar(@F)-2]" if(/\d+\.mp4$/)' your_file
tested:
> perl -F'\.' -lane 'print "$F[scalar(@F)-2]" if(/\d+\.mp4$/)' temp
1234
111
123

$ cat file
test.abc.1234.mp4
test.456.abc.mp4
test.aaa.bbb.c.111.mp4
test.e666.123.mp4
$ sed 's/.*\.\([0-9][0-9]*\)\..*/\1/' file
1234
456
111
123
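Another option is to split on dots instead of writing a regex. The sketch below does this in awk; it is a swapped-in technique, not one of the answers above, and `numeric_fields` is a hypothetical name. It prints every field made only of digits and skips the last field (the extension). As a result, `e666` is ignored, a leading number as in `781.test.mp4` is found, and `test.123.456.mp4` yields both numbers.

```shell
#!/usr/bin/env bash
# numeric_fields [FILE...] (hypothetical): print each all-digit, dot-separated
# field, ignoring the last field (the file extension)
numeric_fields() {
  awk -F. '{ for (i = 1; i < NF; i++) if ($i ~ /^[0-9]+$/) print $i }' "$@"
}

printf '%s\n' test.abc.1234.mp4 test.456.abc.mp4 test.e666.123.mp4 781.test.mp4 |
  numeric_fields
```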
