Replace the first occurrence with sed - regex

As in the example below, I want to keep only the words before the first 'John'.
However, the pattern I applied seems to match the last John rather than the first, so I need to call sed twice.
How can I do this correctly?
PATTERN="I am John, you are John also"
OUTPUT=$( echo "$PATTERN" | sed -r "s/(^.*)([ \t]*John[ ,\t]*)(.*$)/\1/" )
echo "$OUTPUT"
OUTPUT=$( echo "$OUTPUT" | sed -r "s/(^.*)([ \t]*John[ ,\t]*)(.*$)/\1/" )
echo "$OUTPUT"
I would like to call sed only once, because if "John" appears several times, this approach becomes a problem.
The procedure above produces the following output:
the first call trims everything from the last John onward, then the second call trims from the first John onward.
I am John, you are
I am
I want to run sed only once and get:
I am

The following sed command may help with this:
echo "I am John, you are John also" | sed 's/ John.*//'
Or with variables:
pattern="I am John, you are John also"
output=$(echo "$pattern" | sed 's/ John.*//')
echo "$output"
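Why the question's command needed two passes: (^.*) is greedy, so it takes as much of the line as it can and gives back only enough for [ \t]*John to match, which means it settles on the last John. The answer's pattern puts nothing greedy in front of John, and sed uses the leftmost match, so it starts at the first " John". A side-by-side sketch (the variable names are illustrative; -r is a GNU sed option, as in the question):

```shell
#!/usr/bin/env bash
s="I am John, you are John also"

# The question's pattern: the greedy (^.*) backs off only as far as the LAST John.
greedy=$(printf '%s\n' "$s" | sed -r 's/(^.*)([ \t]*John[ ,\t]*)(.*$)/\1/')

# The answer's pattern: the leftmost match begins at the FIRST " John",
# and .* then swallows the rest of the line.
first=$(printf '%s\n' "$s" | sed 's/ John.*//')

printf 'greedy: [%s]\n' "$greedy"   # still contains "John, you are"
printf 'first:  [%s]\n' "$first"    # [I am]
```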

Another way of doing it is to use the grep command in Perl mode:
echo "I am John, you are John also" | grep -oP '^(?:(?!John).)*';
I am
#there will be a space at the end
echo "I am John, you are John also" | grep -oP '^(?:(?!John).)*(?=\s)';
I am
#there is no space at the end
Regex explanation:
^(?:(?!John).)*
This matches every character from the beginning of the line up to, but not including, the first John.
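The tempered token works by checking (?!John) before . consumes each character, so the match can never run into John. A sketch comparing it with a pure-bash alternative that is not part of the answer: ${var%%pattern} removes the longest suffix matching a glob, and the longest suffix matching " John*" begins at the first " John". The -P option assumes a grep built with PCRE support:

```shell
#!/usr/bin/env bash
s="I am John, you are John also"

# Tempered greedy token: (?!John) is checked at every position before "." eats
# a character; (?=\s) then backs off so the trailing space is not included.
tempered=$(printf '%s\n' "$s" | grep -oP '^(?:(?!John).)*(?=\s)')

# Pure-bash alternative: strip the longest suffix that matches " John*".
expanded=${s%% John*}

printf '[%s]\n' "$tempered" "$expanded"   # [I am] twice
```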

Awk solution:
s="I am John, you are John also and there is another John there"
awk '{ sub(/[[:space:]]+John.*/, "") }1' <<<"$s"
The output:
I am


how to regex replace before colon?

This is my original string:
NetworkManager/system connections/Wired 1.nmconnection:14 address1=10.1.10.71/24,10.1.10.1
I only want to add a backslash before each space that comes before the ':'.
So this is what I want in the end:
NetworkManager/system\ connections/Wired\ 1.nmconnection:14 address1=10.1.10.71/24,10.1.10.1
I need to do this in bash, so sed, awk, or grep are all fine.
I have tried the following sed commands, but none of them work:
echo NetworkManager/system connections/Wired 1.nmconnection:14 address1=10.1.10.71/24,10.1.10.1 | sed 's/ .*\(:.*$\)/\\ .*\1/g'
echo NetworkManager/system connections/Wired 1.nmconnection:14 address1=10.1.10.71/24,10.1.10.1 | sed 's/\( \).*\(:.*$\)/\\ \1.*\2/g'
echo NetworkManager/system connections/Wired 1.nmconnection:14 address1=10.1.10.71/24,10.1.10.1 | sed 's/ .*\(:.*$\)/\\ \1/g'
echo NetworkManager/system connections/Wired 1.nmconnection:14 address1=10.1.10.71/24,10.1.10.1 | sed 's/\( \).*\(:.*$\)/\\ \1\2/g'
Thanks for answering my question.
I am still quite new to Stack Overflow and don't know how to format code in comments, so I have edited my original question instead.
My actual use case:
when I use grep or cscope to search for a keyword, for example "address1", under the /etc folder,
the result looks like:
./NetworkManager/system connections/Wired 1.nmconnection:14 address1=10.1.10.71/24,10.1.10.1
If I use vim to open the file under the cursor (say the cursor is on the word "NetworkManager"),
vim interprets the file name as
"./NetworkManager/system"
That's why I want to add "\" before each space: it makes the search results more vim-friendly. :)
I did try changing cscope's source code, but it was very hard to get it fully working, so I have to do the replacement afterwards. :(
If you only want to do the replacements when the string contains a :, set both the field separator and the output field separator to a colon and check that there are at least 2 fields.
Data:
$ cat file
NetworkManager/system connections/Wired 1.nmconnection:14 address1=10.1.10.71/24,10.1.10.1
NetworkManager/system connections/Wired 1.nmconnection 14 address1=10.1.10.71/24,10.1.10.1
Example in awk:
awk 'BEGIN {FS=OFS=":"}{if(NF>1)gsub(" ","\\ ",$1)}1' file
Output
NetworkManager/system\ connections/Wired\ 1.nmconnection:14 address1=10.1.10.71/24,10.1.10.1
NetworkManager/system connections/Wired 1.nmconnection 14 address1=10.1.10.71/24,10.1.10.1
This can be done simply in awk. With your shown samples, please try the following:
awk 'BEGIN{FS=OFS=":"} {gsub(/ /,"\\\\&",$1)} 1' Input_file
Explanation: set both the field separator and the output field separator to : for this program. Then, in the main block, use awk's gsub (global substitution) function to replace each space in the 1st field only (per the OP, only spaces before the : should change) with a backslash followed by that space, and finally print the line.
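The "\\\\&" replacement goes through two rounds of backslash processing: once when awk reads the string literal, and again when gsub interprets the result. A small sketch on a made-up input (a b:c d) shows the three common cases; the outputs in the comments are what gawk produces, and very old awk implementations may treat backslashes in gsub replacements differently:

```shell
#!/usr/bin/env bash
s='a b:c d'

# & in the replacement stands for the matched text (here, the space).
r1=$(printf '%s\n' "$s" | awk 'BEGIN{FS=OFS=":"} {gsub(/ /, "[&]", $1)} 1')

# "\\&" is the awk string \& , which gsub treats as a literal ampersand.
r2=$(printf '%s\n' "$s" | awk 'BEGIN{FS=OFS=":"} {gsub(/ /, "\\&", $1)} 1')

# "\\\\&" is the awk string \\& : a literal backslash followed by the match.
r3=$(printf '%s\n' "$s" | awk 'BEGIN{FS=OFS=":"} {gsub(/ /, "\\\\&", $1)} 1')

printf '%s\n' "$r1" "$r2" "$r3"
# a[ ]b:c d
# a&b:c d
# a\ b:c d
```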
An idea for a Perl one-liner in bash that uses \G and \K (similar to @CarySwoveland's comment):
perl -pe 's/\G[^ :]*\K /\\ /g' myfile
See this demo at tio.run or a pattern demo at regex101.
This might work for you (GNU sed):
sed -E ':a;s/^([^: ]*) /\1\n/;ta;s/\n/\\ /g' file
This replaces the spaces before the : with newlines (looping via :a and ta until none are left), then replaces each newline with a backslash and a space.
Alternative using the hold space:
sed -E 's/:/\n:/;h;s/ /\\ /g;G;s/\n.*\n//' file
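Insert a newline before the first : to split the line there, and copy the result to the hold space.
Escape every space, append the untouched copy, then delete everything from the first newline to the last one: what remains is the amended front section followed by the unadulterated back section.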
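To watch those steps happen, GNU sed's l command can print the pattern space between commands. This is a sketch on a made-up input (a b:c d); in the l output, \n is a newline, \\ a backslash and $ the end of the pattern space. -n suppresses the normal output, so only the trace and the final p are printed:

```shell
#!/usr/bin/env bash
# Same commands as the hold-space answer, with "l" inserted after each step.
trace=$(printf '%s\n' 'a b:c d' |
  sed -n -E 's/:/\n:/;l;h;s/ /\\ /g;l;G;l;s/\n.*\n//;p')
printf '%s\n' "$trace"
# Expected:
# a b\n:c d$
# a\\ b\n:c\\ d$
# a\\ b\n:c\\ d\na b\n:c d$
# a\ b:c d
```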
My answer is ugly, and I think RavinderSingh13's answer is THE ONE, but I had already taken the time to write mine and it works (it's explained step by step, but it ends up as a one-line command).
I was inspired by HatLess's answer.
First, get the text before the : with cut (I put the string in a file to make it easier to read, but this also works with echo):
cut -d':' -f1 infile
Then replace spaces using sed:
cut -d':' -f1 infile | sed 's/\([a-z]\) /\1\\ /g'
Then echo the output with no new line:
echo -n "$(cut -d':' -f1 infile | sed -e 's/\([a-z]\) /\1\\ /g')"
Add back the missing : and everything after it (-f2- keeps any further colons):
echo -n "$(cut -d':' -f1 infile | sed -e 's/\([a-z]\) /\1\\ /g')" | cat - <(echo -n :) | cat - <(cut -d':' -f2- infile)
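The same split-escape-rejoin idea can be written with bash parameter expansion instead of cut and process substitution. This is a sketch rather than part of the answer above; it assumes the line contains at least one :, and the variable names are illustrative:

```shell
#!/usr/bin/env bash
line='NetworkManager/system connections/Wired 1.nmconnection:14 address1=10.1.10.71/24,10.1.10.1'

# Assumes at least one ':'; without one, both halves would be the whole line.
before=${line%%:*}   # text before the first ':'
after=${line#*:}     # text after the first ':' (later colons are kept)

# Escape the spaces in the front part only, then glue the halves back together.
escaped=$(printf '%s\n' "$before" | sed 's/ /\\ /g')
result=$escaped:$after

printf '%s\n' "$result"
# NetworkManager/system\ connections/Wired\ 1.nmconnection:14 address1=10.1.10.71/24,10.1.10.1
```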

Extract a digit out after a quote in bash

I have output that looks like this:
[1] "0 found |2018-07-15 22:21:09 - no new item is submitted "
How can I extract the 0 with grep, using a regular expression in bash?
Thank you!
If you're OK with awk, could you please try the following and let me know if it helps.
awk 'match($0,/\"[0-9]+/){print substr($0,RSTART+1,RLENGTH-1)}' Input_file
OR
your_command | awk 'match($0,/\"[0-9]+/){print substr($0,RSTART+1,RLENGTH-1)}'
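In case the match/substr combination looks opaque: match() sets RSTART to the 1-based position where the regex matched and RLENGTH to the length of the match. Here the match is the quote plus the digits, so substr() starts one character later and is one character shorter. A sketch on the question's sample line that prints both values before extracting the digits:

```shell
#!/usr/bin/env bash
out='[1] "0 found |2018-07-15 22:21:09 - no new item is submitted "'

# The match is "0 : it starts at column 5 and is 2 characters long.
res=$(printf '%s\n' "$out" |
  awk 'match($0, /"[0-9]+/) { print "RSTART=" RSTART, "RLENGTH=" RLENGTH; print substr($0, RSTART + 1, RLENGTH - 1) }')

printf '%s\n' "$res"
# RSTART=5 RLENGTH=2
# 0
```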
Try this grep option:
grep -Po '^([0-9]+) (?=found)'
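The line above uses the pattern ^([0-9]+) (?=found), which matches a number at the start of the line followed by a space, as long as the text found comes right after it. Note that the space is part of the match (so it is printed too), and that ^ anchors at the very start of the line, so this only works when the line begins with the number rather than with the [1] prefix.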
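A variant that works on the raw line, including the [1] prefix (a sketch, not from the answer above; it assumes GNU grep with -P support), anchors on the opening quote instead of the line start. \K drops the quote from the reported match, and (?= found) checks for the word without consuming it, so only the digits are printed:

```shell
#!/usr/bin/env bash
out='[1] "0 found |2018-07-15 22:21:09 - no new item is submitted "'

# "\K      : match the quote, then forget it
# [0-9]+   : the digits we want
# (?= found): must be followed by " found", which is not printed
num=$(printf '%s\n' "$out" | grep -oP '"\K[0-9]+(?= found)')

printf '%s\n' "$num"   # 0
```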
Here is a sed solution:
$ sed 's/^.*"\(.*\) found .*/\1/' <<< '[1] "0 found |2018-07-15 22:21:09 - no new item is submitted "'
0
If Perl counts, here is another easy way:
$ perl -ne 'print $1 if /(\d+) found/' <<< '[1] "0 found |2018-07-15 22:21:09 - no new item is submitted "'
0
I also came up with this solution
($LOG_FILE is the file I write all my output to):
grep -oE '\[1\] " *[0-9]+' "$LOG_FILE" | cut -d'"' -f2 | head -1

Bash Regex Capture Groups

I have a single string in this format:
"Mike H<michael.haken#email1.com>" michael.haken#email2.com "Mike H<hakenmt#email1.com>"
If I were writing a normal regex in JS, C#, etc., I'd do this:
(?:"(.+?)"|'(.+?)'|(\S+))
Then I'd iterate over the match groups to grab each string, ideally without the quotes. I ultimately want to add each value to an array, so in the example I'd end up with 3 items in the array:
Mike H<michael.haken#email1.com>
michael.haken#email2.com
Mike H<hakenmt#email1.com>
I can't figure out how to replicate this with grep, sed, or bash regexes. I've tried things like:
echo "$email" | grep -oP "\"\K(.+?)(?=\")|'\K(.+?)(?=')|(\S+)"
The problem is that while it kind of mimics capture groups, it doesn't work well with multiple matches, so I get output like:
"Mike
H<michael.haken#email1.com>"
michael.haken#email2.com
If I remove the look ahead/behind logic, I at least get the 3 strings, but the first and last are still wrapped in quotes. In that approach, I pipe the output to read so I can individually add each string to the array, but I'm open to other options.
EDIT:
I think my input example may have been confusing; it's just one possible input. The real input could be double-quoted, single-quoted, or unquoted (space-free) strings in any order and in any quantity. The JavaScript/C# regex I provided is the real behavior I'm trying to achieve.
You can use Perl:
$ email='"Mike H<michael.haken#email1.com>" michael.haken#email2.com "Mike H<hakenmt#email1.com>"'
$ echo "$email" | perl -lane 'while (/"([^"]+)"|(\S+)/g) {print $1 ? $1 : $2}'
Mike H<michael.haken#email1.com>
michael.haken#email2.com
Mike H<hakenmt#email1.com>
Or in pure Bash, it gets kinda wordy:
re='"([^"]+)"[[:space:]]*|([^[:space:]]+)[[:space:]]*'
while [[ $email =~ $re ]]; do
    echo "${BASH_REMATCH[1]}${BASH_REMATCH[2]}"
    i=${#BASH_REMATCH}
    email=${email:i}
done
# same output
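The loop above handles double quotes and bare words. Extending it to the single-quoted alternative from the question's (?:"(.+?)"|'(.+?)'|(\S+)) and collecting the values into an array might look like the sketch below. ERE has no lazy .+?, so [^"]+ and [^']+ stand in for it; the items array name is just for illustration, and like the original regex it does not handle escaped quotes inside a value:

```shell
#!/usr/bin/env bash
email=$'"Mike H<michael.haken#email1.com>" michael.haken#email2.com \'Mike H<hakenmt#email1.com>\''

# Group 2: double-quoted value, group 3: single-quoted value, group 4: bare word.
re=$'^[[:space:]]*("([^"]+)"|\'([^\']+)\'|([^[:space:]]+))'

items=()
while [[ $email =~ $re ]]; do
  # Exactly one of groups 2, 3 and 4 is non-empty for each match.
  items+=("${BASH_REMATCH[2]}${BASH_REMATCH[3]}${BASH_REMATCH[4]}")
  # Drop the consumed text (including leading spaces) and look for the next item.
  email=${email:${#BASH_REMATCH[0]}}
done

printf '%s\n' "${items[@]}"
```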
You can use sed to achieve that (it relies on the input having exactly this three-part shape):
$ sed -r 's/"(.*)" (.*)"(.*)"/\1\n\2\n\3/g' <<< "$email"
Mike H<michael.haken#email1.com>
michael.haken#email2.com
Mike H<hakenmt#email1.com>
gawk + bash solution (adding each item to array):
email_str='"Mike H<michael.haken#email1.com>" michael.haken#email2.com "Mike H<hakenmt#email1.com>"'
readarray -t email_arr < <(awk -v FPAT="[^\"'[:space:]]+[^\"']+[^\"'[:space:]]+" \
    '{ for(i=1;i<=NF;i++) print $i }' <<<"$email_str")
Now, all items are in email_arr
Accessing the 2nd item:
echo "${email_arr[1]}"
michael.haken#email2.com
Accessing the 3rd item (bash arrays are zero-indexed):
echo "${email_arr[2]}"
Mike H<hakenmt#email1.com>
Your first expression is fine; just be careful with the quoting (use single quotes when the pattern contains \). At the end, trim the " characters with sed:
$ echo "$email" | grep -Po '".*?"|\S+' | sed -r 's/"$|^"//g'
Mike H<michael.haken#email1.com>
michael.haken#email2.com
Mike H<hakenmt#email1.com>
Using gawk, which allows RS to be a multi-character regular expression:
awk -v RS='"|" ' 'NF' inputfile
Mike H<michael.haken#email1.com>
michael.haken#email2.com
Mike H<hakenmt#email1.com>
Modify your regex like this:
grep -oP '("?\s*)\K.*?(?=")' file
Output:
Mike H<michael.haken#email1.com>
michael.haken#email2.com
Mike H<hakenmt#email1.com>
Using GNU awk and FPAT to define fields by content:
$ awk '
BEGIN { FPAT="([^ ]*)|(\"[^\"]*\")" }   # define a field to be space-separated or in quotes
{
    for(i=1;i<=NF;i++) {                # iterate every field
        gsub(/^\"|\"$/,"",$i)           # remove leading and trailing quotes
        print $i                        # output
    }
}' file
Mike H<michael.haken#email1.com>
michael.haken#email2.com
Mike H<hakenmt#email1.com>
What I was able to do that worked, but wasn't as concise as I wanted the code to be:
arr=()
while IFS= read -r line; do
    line="${line//\"/}"
    arr+=("${line//\'/}")
done < <(echo "$email" | grep -oP "\"(.+?)\"|'(.+?)'|(\S+)")
This gave me an array of the captured strings. It handles the values in any order, whether they are wrapped in double quotes, single quotes, or (if they contain no spaces) no quotes at all, and the array elements don't include the wrapping quotes. I appreciate all of the suggestions.

How to match and keep the first number in a line using sed?

Question
Let's say I have one line of text with a number placed somewhere (it could be at the beginning, in the middle or at the end of the line).
How to match and keep the first number found in a line using sed?
Minimal example
Here is my attempt (following this page of a tutorial on regular expressions) and the output for different positions of the number:
$ echo "SomeText 123SomeText" | sed 's:.*\([0-9][0-9]*\).*:\1:'
3
$ echo "123SomeText" | sed 's:.*\([0-9][0-9]*\).*:\1:'
3
$ echo "SomeText 123" | sed 's:.*\([0-9][0-9]*\).*:\1:'
3
As you can see, only the last digit is kept, whereas the desired output is 123.
Using sed:
echo "SomeText 123SomeText 456" | sed -r 's/^[^0-9]*([0-9]+).*$/\1/'
123
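The only real difference from the attempt in the question is what sits in front of the group. A greedy .* keeps as much as it can and leaves the group a single digit, while ^[^0-9]* cannot consume digits, so the group starts at the first one. A side-by-side sketch on the question's first sample, using plain BRE so it also runs without -r:

```shell
#!/usr/bin/env bash
s="SomeText 123SomeText"

# Greedy .*: the group is left with only the last digit.
last=$(printf '%s\n' "$s" | sed 's:.*\([0-9][0-9]*\).*:\1:')

# ^[^0-9]*: stops at the first digit, so [0-9][0-9]* takes the whole number.
first=$(printf '%s\n' "$s" | sed 's:^[^0-9]*\([0-9][0-9]*\).*:\1:')

printf '%s %s\n' "$last" "$first"   # 3 123
```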
You can also do this in GNU awk:
echo "SomeText 123SomeText 456" | awk '{print gensub(/^[^0-9]*([0-9]+).*$/, "\\1", $0)}'
123
To complement the sed solutions, here's an awk alternative. It assumes the goal is to extract the 1st number on each line, if any, and to ignore lines without any numbers:
awk -F'[^0-9]*' '/[0-9]/ { print ($1 != "" ? $1 : $2) }'
-F'[^0-9]*' defines any sequence of non-digit characters (including the empty string) as the field separator; awk automatically breaks each input line into fields based on that separator, with $1 representing the first field, $2 the second, and so on.
/[0-9]/ is a pattern (condition) that ensures that output is only produced for lines that contain at least one digit, via its associated action (the {...} block) - in other words: lines containing NO number at all are ignored.
{ print ($1 != "" ? $1 : $2) } prints the 1st field if it is nonempty, and otherwise the 2nd one. Rationale: if the line starts with a number, the 1st field contains the 1st number on the line (because the line starts with a field rather than a separator); otherwise, the 2nd field contains the 1st number (because the line starts with a separator).
You can also use grep, which is ideally suited to this task. sed is a Stream EDitor, which is only going to indirectly give you what you want. With grep, you only have to specify the part of the line you want.
$ cat file.txt
SomeText 123SomeText
123SomeText
SomeText 123
$ grep -o '[0-9]\+' file.txt
123
123
123
grep -o prints only the matching parts of a line, each on a separate line. The pattern is simple: one or more digits.
If your version of grep is compatible with the -P switch, you can use Perl-style regular expressions and make the command even shorter:
$ grep -Po '\d+' file.txt
123
123
123
Again, this matches one or more digits.
Using grep is a lot simpler and has the advantage that if the line doesn't match, nothing is printed:
$ echo "no number" | grep -Po '\d+' # no output
$ echo "yes 123number" | grep -Po '\d+'
123
Edit:
As pointed out in the comments, one possible problem is that this doesn't print only the first number on the line: if the line contains more than one number, they will all be printed. As far as I'm aware, this can't be done using grep -o.
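With a grep that supports -P (PCRE; this is a GNU grep option and may be missing from some builds), one workaround is to anchor the match at the start of the line and discard the non-digit prefix with \K. Because ^ cannot match again in the middle of a line, -o then reports at most one number per line. A sketch on made-up sample lines:

```shell
#!/usr/bin/env bash
# ^\D* consumes the non-digits before the first number and \K drops them
# from the reported match; lines with no digits print nothing.
res=$(printf '%s\n' 'SomeText 123SomeText 456' 'yes123number456' 'no number' |
  grep -oP '^\D*\K\d+')

printf '%s\n' "$res"   # 123 and 123, one per line
```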
In that case, I'd go with perl:
perl -lne 'print $1 if /.*?(\d+).*/'
This uses lazy matching (the question mark), so the .*? at the start of the pattern consumes as few characters as possible and stops right before the first digit. $1 holds the first capture group, like \1 in sed. If there is more than one number on the line, this prints only the first. If there aren't any, it prints nothing:
$ echo "no number" | perl -ne 'print "$1\n" if /.*?(\d+).*/'
$ echo "yes123number456" | perl -lne 'print $1 if /.*?(\d+).*/'
123
If for some reason you still really want to use sed, you can do this:
sed -n 's/^[^0-9]*\([0-9]\{1,\}\).*$/\1/p'
Unlike the other answers, this works with any POSIX-compliant sed (it uses only basic regular expressions, with \{1,\} instead of +), and it only prints lines that contain a match.
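A quick check of that command on the question's three samples plus a line with no digits. -n turns off sed's automatic printing, and the p flag prints a line only when the substitution succeeded, which is why the last line produces no output:

```shell
#!/usr/bin/env bash
res=$(printf '%s\n' 'SomeText 123SomeText' '123SomeText' 'SomeText 123' 'no number here' |
  sed -n 's/^[^0-9]*\([0-9]\{1,\}\).*$/\1/p')

printf '%s\n' "$res"   # 123 three times, one per line
```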
Try this sed command:
$ echo "SomeText 123SomeText" | sed -r '/[^0-9]*([0-9][0-9]*)[^0-9]*/ s//\1 /g'
123
Another example,
$ echo "SomeText 123SomeText 456" | sed -r '/[^0-9]*([0-9][0-9]*)[^0-9]*/ s//\1 /g'
123 456
It prints all the numbers on each line, separated by spaces (with a trailing space after the last number).

Sed usage, remove / change text

I have a string that is generated as follows:
OK::82.44.127.13:GB:UNITED KINGDOM:ENGLAND:WOKING:-:51.000:-0.55813:+01:00
I need a sed command to change it to:
82.44.127.13;GB UNITED KINGDOM ENGLAND WOKING;51.000;-0.55813
I think this will require multiple sed runs. So far I have:
sed 's/:-:/;/g' (which changes :-: to ;)
sed 's/:-/;-/g' (which changes :- to ;-)
sed 's/OK:://g' (which strips the OK::)
But I'm stuck on how to change the : between the IP address and the location to a ;, remove all the other : characters, and also strip off the trailing time (+01:00).
(Note: these are not real IP addresses.)
With awk it can be easier:
$ awk -F ":" '{print $3";"$4" "$5" "$6" "$7";"$9";"$10}' <<< "OK::82.44.127.13:GB:UNITED KINGDOM:ENGLAND:WOKING:-:51.000:-0.55813:+01:00"
82.44.127.13;GB UNITED KINGDOM ENGLAND WOKING;51.000;-0.55813
Note that I am printing things like $3";"$4" "$5 because the separator changes: some fields are joined with ; and others with a space. If every field used the same separator, we could simply set BEGIN{OFS=";"}:
$ awk -F ":" 'BEGIN{OFS=";"}{print $3,$4,$5,$6,$7,$9,$10}' <<< "OK::82.44.127.13:GB:UNITED KINGDOM:ENGLAND:WOKING:-:51.000:-0.55813:+01:00"
82.44.127.13;GB;UNITED KINGDOM;ENGLAND;WOKING;51.000;-0.55813
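The two styles can also be combined: commas in print insert OFS, while plain concatenation with " " joins the location fields, so a single print can emit both separators. A sketch on the question's sample string:

```shell
#!/usr/bin/env bash
s='OK::82.44.127.13:GB:UNITED KINGDOM:ENGLAND:WOKING:-:51.000:-0.55813:+01:00'

# Commas -> OFS (";"); concatenation with " " -> space-joined location.
res=$(awk -F ':' 'BEGIN{OFS=";"} {print $3, $4 " " $5 " " $6 " " $7, $9, $10}' <<< "$s")

printf '%s\n' "$res"   # 82.44.127.13;GB UNITED KINGDOM ENGLAND WOKING;51.000;-0.55813
```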
This will do everything in one sed command:
$ echo "OK::82.44.127.13:GB:UNITED KINGDOM:ENGLAND:WOKING:-:51.000:-0.55813:+01:00" | \
sed -r 's/OK::([0-9.]*):([A-Z ]*):([A-Z ]*):([A-Z ]*):([A-Z ]*):-:(-?[0-9.]*):(-?[0-9.]*):.*/\1;\2 \3 \4 \5;\6;\7/'
82.44.127.13;GB UNITED KINGDOM ENGLAND WOKING;51.000;-0.55813
Without awk:
cut -d: -f 3-7,9,10 | tr ":" ";" | sed -r 's/([A-Z]);([A-Z])/\1 \2/g'
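Run on the question's sample string, the pipeline goes through three steps: cut keeps fields 3-7, 9 and 10, tr turns the remaining colons into semicolons, and sed puts a space back wherever a ; sits between two capital letters. The sketch below uses sed -E, which GNU sed accepts as a synonym for -r:

```shell
#!/usr/bin/env bash
s='OK::82.44.127.13:GB:UNITED KINGDOM:ENGLAND:WOKING:-:51.000:-0.55813:+01:00'

# cut:  82.44.127.13:GB:UNITED KINGDOM:ENGLAND:WOKING:51.000:-0.55813
# tr:   every ':' becomes ';'
# sed:  "B;U", "M;E" and "D;W" become "B U", "M E" and "D W"
res=$(printf '%s\n' "$s" | cut -d: -f 3-7,9,10 | tr ':' ';' | sed -E 's/([A-Z]);([A-Z])/\1 \2/g')

printf '%s\n' "$res"   # 82.44.127.13;GB UNITED KINGDOM ENGLAND WOKING;51.000;-0.55813
```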