sed retrieve part of line - regex

I have lines of code that look like this:
hi:12345:234 (second line)
How do I write a line of code using the sed command that only prints out the 2nd item in the second line?
My current command looks like this:
sed -n '2p' file which gets the second line, but I don't know what regex to use to match only the 2nd item '12345' and combine with my current command

Try the following, written and tested with the shown samples in GNU sed.
sed -n '2s/\([^:]*\):\([^:]*\).*/\2/p' Input_file
Explanation: The -n option stops sed from printing every line automatically; output happens only where the p flag is given explicitly. 2s means the substitution is performed on the 2nd line only. The regex uses sed's capture groups, whose matched text is stored in a temporary buffer and can be retrieved later by number (\1, \2, and so on). The first group captures everything before the first :, and the second group captures everything between the first and the second :, as the OP requested. In the replacement, \2 replaces the whole line with the second captured value, and the p flag then prints the result for the 2nd line only.
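To see what each numbered group holds, here is a minimal sketch that applies the same regex to the sample line and prints \1 and \2 separately. The variable names are only for illustration, and the sample line is fed in as the whole input, so no line address is needed.

```shell
# Sample line from the question, used here as the entire input.
line='hi:12345:234'

# \1 is the text before the first colon;
# \2 is the text between the first and the second colon.
first=$(printf '%s\n' "$line" | sed 's/\([^:]*\):\([^:]*\).*/\1/')
second=$(printf '%s\n' "$line" | sed 's/\([^:]*\):\([^:]*\).*/\2/')

printf '%s\n' "$first" "$second"
```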

A couple of solutions:
printf 'first line\nhi:12345:234\n' | sed -n '2s/.*:\([0-9]*\):.*/\1/p'
printf 'first line\nhi:12345:234\n' | sed -n '2{s/^[^:]*://; s/:.*//p; q}'
printf 'first line\nhi:12345:234\n' | awk -F':' 'FNR==2{print $2}'
All display 12345.
sed -n '2s/.*:\([0-9]*\):.*/\1/p' only displays the captured value thanks to -n and p option/flag. It matches a whole string capturing digits between two colons, \1 only keeps the capture.
The sed -n '2{s/^[^:]*://;s/:.*//p;q}' removes all from start till first :, all from the second to end, and then quits (q) so if your file is big, it will be processed quicker.
awk -F':' 'FNR==2{print $2}' splits the second line with a colon and fetches the second item.
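As a quick check, the three commands can be run against the same multi-line input. This is only a sketch: the three-line sample and the variable names are made up for illustration, and the q}' form assumes GNU sed.

```shell
# Hypothetical three-line input; only line 2 should be reported.
input=$(printf '%s\n' 'skip:me:please' 'hi:12345:234' 'never:99999:read')

# Capture the digits between two colons on line 2.
a=$(printf '%s\n' "$input" | sed -n '2s/.*:\([0-9]*\):.*/\1/p')

# Strip up to the first colon, strip from the next colon on, print, then quit
# so that line 3 is never processed.
b=$(printf '%s\n' "$input" | sed -n '2{s/^[^:]*://; s/:.*//p; q}')

# Split on colons and print field 2 of record 2.
c=$(printf '%s\n' "$input" | awk -F':' 'FNR==2{print $2}')

printf '%s\n' "$a" "$b" "$c"
```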

Related

Sed handle patterns over multiple lines

For example if I have a file like
yo#gmail.com, yo#
gmail.com yo#gmail
.com
And I want to replace the string yo#gmail.com.
If the file had the target string in a single line then we could've just used
sed "s/yo#gmail.com/e#email.com/g" file
So, is it possible for me to catch patterns that are spread across multiple lines without replacing the \n?
Something like this
e#email.com, e#
email.com e#email
.com
Thank you.
You can do this (note that it also removes every newline from the output):
tr -d '\n' < file | sed 's/yo#gmail.com/e#email.com/g'
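A small sketch of what the tr approach does to the sample from the question: every occurrence is replaced, but the result is a single line because all newlines are gone. The variable name is illustrative, and the dot is escaped here so that it only matches a literal dot.

```shell
# Sample input from the question, with the address split across lines.
flattened=$(printf '%s\n' 'yo#gmail.com, yo#' 'gmail.com yo#gmail' '.com' |
  tr -d '\n' |
  sed 's/yo#gmail\.com/e#email.com/g')

# Everything ends up on one line.
printf '%s\n' "$flattened"
```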
This might work for you (GNU sed):
sed -E 'N;s/yo#gmail\.com/e#email.com/g
h;s/(\S+)\n(\S+)/\1\2\n\1/;/yo#gmail\.com/!{g;P;D}
s//\ne#email.com/;:a;s/\n(\S)(\S+)\n\S/\1\n\2\n/;ta
s/(.*\n.*)\n/\1/;P;D' file
Append the following line to the pattern space.
Replace all occurrences of matching email address in both the first and second lines.
Make a copy of the pattern space.
Concatenate the last word of the first line with the first word of the second and keep the second line as is. If there is no match with the email address, revert the line, print/delete the first line and repeat.
Otherwise, replace the match and re-insert the newline as of the length of the first word of the second line (deleting the first word of the second line too).
Remove the newline used for scaffolding, print/delete the first line and repeat.
N.B. The lines will not be of the same length as the originals if the replacement string length does not match the matching string length. Also there has been no attempt to break the replacement string in the same relative split if the match and replacement strings are not the same length.
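The "append the next line, handle the first line, delete it and repeat" pattern in the steps above keeps a sliding two-line window in the pattern space. The sketch below is not the solution itself; it only makes that window visible with the l command on a made-up three-line input (GNU sed assumed).

```shell
# N appends the next line, l prints the pattern space unambiguously
# (newline shown as \n, end of pattern space shown as $),
# D deletes up to the first newline and restarts the cycle without reading input.
window=$(printf '%s\n' 1 2 3 | sed -n 'N;l;D')

printf '%s\n' "$window"
```

Each output line shows one window: lines 1 and 2 first, then lines 2 and 3.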
Alternative:
echo $'yo#gmail.com\ne#email.com' |
sed -E 'N;s#(.*)\n(.*)#s/\\n\1/\\n\2/g#
:a;\#\\n([^/])(.*)\\n(.)?(.*/g)#{s//\1\\n\2\3\\n\4/;H;ba}
x;s/.//;s#\\n/g$#/g#gm;s#\\n/#/#;s/\./\\./g' |
sed -e 'N' -f - -e 'P;D' file
or:
echo 's/yo#gmail.com/e#email.com/' |
sed -E 'h;s#/#/\\n#g;:a;H;s/\\n([^/])/\1\\n/g;ta;x;s/\\n$//mg;s/\./\\./g' |
sed -zf - file
N.B. With the last alternative solution, the last sed invocation can be swapped for the last sed invocation of the first alternative solution.

replace n occurrences of a character in a string from the end

I am struggling to come up with a solution to replace n occurrences of a character with another character in a string starting from the end of the string. For example, if I want to replace last 5 occurrences of "," with "|" in a string like
abc, def,,{"data":{"xyz":null,"uan":"5643df"},{"path":"/abc/def/xyz"}},546,453,,,
to get a result like
abc, def,,{"data":{"xyz":null,"uan":"5643df"},{"path":"/abc/def/xyz"}}|546|453|||
I have looked at multiple solutions which help you find the last occurrence, all occurrences, or 5 occurrences from the beginning, but nothing which helps me do it from the end of the string. Reversing the string, doing it from the beginning, and then reversing the string again is not an option because of the sheer size of the file.
With GNU sed. The substitution s/,([^,]*)$/|\1/ replaces the last comma and the rest of the line with a pipe and the rest of the line; repeat it five times:
echo 'a,b,c,d,e,f,g,h' | sed -r 's/,([^,]*)$/|\1/; s/,([^,]*)$/|\1/; s/,([^,]*)$/|\1/; s/,([^,]*)$/|\1/; s/,([^,]*)$/|\1/;'
Output:
a,b,c|d|e|f|g|h
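If the count needs to vary, the repeated substitution can be generated instead of typed out. This is a minimal sketch; replace_last_n is a hypothetical helper name, and it uses basic regex syntax so that -r is not needed.

```shell
# Hypothetical helper: replace the last N commas on each line of stdin with "|".
replace_last_n() {
  n=$1
  script=''
  i=0
  while [ "$i" -lt "$n" ]; do
    # Same substitution as above, appended once per requested replacement.
    script="${script}s/,\([^,]*\)\$/|\1/;"
    i=$((i + 1))
  done
  sed "$script"
}

short=$(echo 'a,b,c,d,e,f,g,h' | replace_last_n 5)
long=$(echo 'abc, def,,{"data":{"xyz":null,"uan":"5643df"},{"path":"/abc/def/xyz"}},546,453,,,' | replace_last_n 5)
printf '%s\n' "$short" "$long"
```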
An awk version:
echo 'a,b,c,d,e,f,g,h' | awk -F, '{printf "%s",$1;for(i=2;i<=NF;i++) printf (NF-5<i?"|%s":",%s"),$i;print ""}'
a,b,c|d|e|f|g|h
It uses a loop to print each field and decides for each one whether it is preceded by , or |. Change the number 5 to replace a different number of commas.
Example for the last two commas:
echo 'a,b,c,d,e,f,g,h' | awk -F, '{printf "%s",$1;for(i=2;i<=NF;i++) printf (NF-2<i?"|%s":",%s"),$i;print ""}'
a,b,c,d,e,f|g|h
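The hard-coded number can also be passed in as an awk variable. A sketch, assuming a hypothetical wrapper named replace_tail and an awk variable n that are not part of the original answer:

```shell
# Hypothetical wrapper: replace the last N commas of each stdin line with "|".
replace_tail() {
  awk -F, -v n="$1" '{
    printf "%s", $1
    for (i = 2; i <= NF; i++)
      # Fields among the last n get "|" in front of them, the others get ",".
      printf "%s%s", (NF - n < i ? "|" : ","), $i
    print ""
  }'
}

five=$(echo 'a,b,c,d,e,f,g,h' | replace_tail 5)
two=$(echo 'a,b,c,d,e,f,g,h' | replace_tail 2)
printf '%s\n' "$five" "$two"
```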
This might work for you (GNU sed):
sed -E '/(,[^,]*){5}$/{s//\n&/;h;y/,/|/;H;g;s/\n.*\n//}' file
Insert a newline just before the fifth comma from the end of a line, make a copy, replace all ,'s by |'s, append the current line to the copy and remove everything between the first and last newlines.
An alternative using GNU parallel and sed:
parallel -n0 -q echo 's/\(.*\),/\1|/' ::: {1..5} | sed -f - file
N.B. The first solution only amends a line if there are at least 5 commas, whereas the second solution amends a line regardless of how many commas there are.
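The difference between the two solutions shows up on a line with fewer than five commas. In this sketch (GNU sed assumed) the repeated script of the second solution is built with a plain shell loop instead of GNU parallel, which is a substitution made here only to keep the example self-contained; the sample lines are made up.

```shell
# One short line (2 commas) and one long line (7 commas).
data=$(printf '%s\n' 'a,b,c' 'a,b,c,d,e,f,g,h')

# First solution: only touches lines that have at least 5 commas.
strict=$(printf '%s\n' "$data" |
  sed -E '/(,[^,]*){5}$/{s//\n&/;h;y/,/|/;H;g;s/\n.*\n//}')

# Second solution: the same substitution five times, one command per line.
script=$(for i in 1 2 3 4 5; do printf '%s\n' 's/\(.*\),/\1|/'; done)
loose=$(printf '%s\n' "$data" | sed "$script")

printf '%s\n' "$strict" "$loose"
```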

How can I print 2 lines if the second line contains the same match as the first line?

Let's say I have a file with several million lines, organized like this:
#1:N:0:ABC
XYZ
#1:N:0:ABC
ABC
I am trying to write a one-line grep/sed/awk matching function that returns both lines if the string after the last colon in the first line (ABC in the sample) is found in the second line.
When I try to use grep -A1 -P and pipe the matches with a match like '(?<=:)[A-Z]{3}', I get stuck. I think my creativity is failing me here.
With awk
$ awk -F: 'NF==1 && $0 ~ s{print p ORS $0} {s=$NF; p=$0}' ip.txt
#1:N:0:ABC
ABC
-F: use : as delimiter, makes it easy to get last column
s=$NF; p=$0 save last column value and entire line for printing later
NF==1 if line doesn't contain :
$0 ~ s if line contains the last column data saved previously
if search data can contain regex meta characters, use index($0,s) instead to search literally
note that this code assumes input file having line containing : followed by line which doesn't have :
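The remark about regex metacharacters can be illustrated with a made-up input in which the saved value contains a dot. This is a sketch: the sample data and variable names are assumptions, not taken from the question.

```shell
# The value "A.C" contains a regex metacharacter.
data=$(printf '%s\n' '#1:N:0:A.C' 'ABC' '#1:N:0:A.C' 'A.C')

# Regex match: "A.C" also matches "ABC", so both pairs are printed.
regex_hits=$(printf '%s\n' "$data" |
  awk -F: 'NF==1 && $0 ~ s{print p ORS $0} {s=$NF; p=$0}')

# Literal match with index(): only the pair containing a real "A.C" is printed.
literal_hits=$(printf '%s\n' "$data" |
  awk -F: 'NF==1 && index($0, s){print p ORS $0} {s=$NF; p=$0}')

printf '%s\n' "$literal_hits"
```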
With GNU sed (might work with other versions too, syntax might differ though)
$ sed -nE '/:/{N; /.*:(.*)\n.*\1/p}' ip.txt
#1:N:0:ABC
ABC
/:/ if line contains :
N add next line to pattern space
/.*:(.*)\n.*\1/ capture string after last : and check if it is present in next line
again, this assumes input like shown in question.. this won't work for cases like
#1:N:0:ABC
#1:N:0:XYZ
XYZ
This might work for you (GNU sed):
sed -n 'N;/.*:\(.*\)\n.*\1/p;D' file
Use grep-like option -n to explicitly print lines. Read two lines into the pattern space and print both if they meet the requirements. Always delete the first and repeat.
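On the problematic input shown earlier (two header lines in a row), the two sed approaches behave differently, because the N;...;D version slides over the input one line at a time. A sketch under the assumption of GNU sed, with illustrative variable names:

```shell
# Two consecutive header lines followed by a matching data line.
data=$(printf '%s\n' '#1:N:0:ABC' '#1:N:0:XYZ' 'XYZ')

# Pairs line 1 with line 2, finds no match, and then sees "XYZ" on its own.
paired=$(printf '%s\n' "$data" | sed -nE '/:/{N; /.*:(.*)\n.*\1/p}')

# Checks lines 1+2, then lines 2+3, so the second pair is found.
sliding=$(printf '%s\n' "$data" | sed -n 'N;/.*:\(.*\)\n.*\1/p;D')

printf 'paired: [%s]\nsliding:\n%s\n' "$paired" "$sliding"
```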
If your actual Input_file is the same as the shown example, then the following may help too.
awk -v FS="[: \n]" -v RS="" '$(NF-1)==$NF' Input_file
EDIT: Adding one more solution as per Sundeep's suggestion.
awk -v FS='[:\n]' -v RS= 'index($NF, $(NF-1))' Input_file

Sed Match Number followed by string and return Number

Hi i have a file containing the following:
7 Y-N2
8 Y-H
9 Y-O2
I want to match it with the following sed command and get the number at the beginning of the line:
abc=$(sed -n -E "s/([0-9]*)(^[a-zA-Z])($j)/\1/g" file)
$j is a variable and contains exactly Y-O2 or Y-H.
The Number is not the linenumber.
The Number is always followed by a Letter.
Before the Number are Whitespaces.
Echoing $abc prints an empty line.
Thanks
Many problems here:
there are spaces, and you don't account for them
the ^ must be inside the character class to negate it
you're using the -n option, so you must use the p flag or nothing will ever be printed (and the g flag is useless here)
Working command (I have replaced -E with -r because -E was unsupported by my sed version; both should work):
sed -nr "s/ *([0-9]+) +([^a-zA-Z])($j)/\1/p" file
Note: awk seems more suited for the job. Ex:
awk -v j="$j" '$2 == j { print $1 }' file
Sed seems to be overly complex for this task, but with awk you can write:
awk -vk="$var" '$2==k{print $1}' file
With -vk="$var" we set the awk variable k to the value of the $var shell variable.
Then, we use the 'filter{command}' syntax, where the filter $2==k is that the second field is equal to the variable k. If we have a match, we print the first field with {print $1}.
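A short run of the awk filter on the sample data, as a sketch. The leading spaces in the sample and the value assigned to var are assumptions based on the question's description.

```shell
# Sample file content from the question, with leading whitespace.
data=$(printf '%s\n' '   7 Y-N2' '   8 Y-H' '   9 Y-O2')
var='Y-O2'

# awk splits on runs of whitespace by default, so leading spaces are ignored.
num=$(printf '%s\n' "$data" | awk -vk="$var" '$2==k{print $1}')

printf '%s\n' "$num"
```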
Try this:
abc=$(sed -n "s/^ *\([0-9]*\) *Y-[OH]2*.*/\1/p" file)
Explanations:
^ *: match lines starting with any number of spaces
\([0-9]*\): the number that follows is captured for use as a backreference
 *: followed by any number of spaces
Y-[OH]2*: search for the Y- string followed by O or H and an optional 2
\1/p: the captured string \1 is output with the p flag
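The same idea can be written with the shell variable from the question instead of a hard-coded Y-[OH]2*. This is only a sketch: number_for is a hypothetical helper, and it assumes the value contains no regex metacharacters, since it is pasted into the pattern as-is.

```shell
# Hypothetical helper: print the number in front of the given label.
number_for() {
  # ^ *          leading spaces
  # \([0-9][0-9]*\)  one or more digits, captured as \1
  #  *$1$        spaces, then the label, anchored at the end of the line
  sed -n "s/^ *\([0-9][0-9]*\) *$1\$/\1/p"
}

data=$(printf '%s\n' '   7 Y-N2' '   8 Y-H' '   9 Y-O2')
h=$(printf '%s\n' "$data" | number_for 'Y-H')
o2=$(printf '%s\n' "$data" | number_for 'Y-O2')
printf '%s\n' "$h" "$o2"
```

Anchoring the label at the end of the line keeps Y-H from matching a longer label that merely starts with Y-H.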

sed: display lines before a match

I am using sed to find the last of the matching lines:
echo -e "aaa\nbbb\nccc\naaa\nccc\naaa\nbbb\nccc" | sed '/aaa/!d' | sed '$!d' # the order and number of aaa, bbb, ccc, ..., nnn lines is arbitrary
The example above works well. The second method:
echo -e "aaa\nbbb\nccc\naaa\nccc\naaa\nbbb\nccc" | sed -e '/aaa/!d' -e '$!d'
or:
echo -e "aaa\nbbb\nccc\naaa\nccc\naaa\nbbb\nccc" | sed -e '/aaa/!d;$!d'
The second method does not work for me. Someone wrote on Wikipedia that sed commands can be combined, but this does not work. What am I doing wrong, and how should it properly look?
This might work for you (GNU sed):
sed '/aaa/h;$!d;x' file
To catch the last match you must store it in the hold space then retrieve it at the end of the file.
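A sketch of the hold-space approach on a made-up input in which the matching lines are distinguishable, so it is visible that the last one is returned:

```shell
# h copies every matching line to the hold space (overwriting the previous one),
# $!d deletes every line except the last,
# x swaps in the held line on the last line, where it is printed.
last=$(printf '%s\n' 'aaa 1' 'bbb' 'aaa 2' 'ccc' | sed '/aaa/h;$!d;x')

printf '%s\n' "$last"
```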
What is the desired output? The first command gives a single line aaa. The second and third commands give no output. There's a solid reason for the discrepancy in the behaviour.
In the first command, you have:
sed '/aaa/!d' | sed '$!d'
The first sed here deletes each line that is not aaa. The output (3 lines containing aaa) is then filtered so that only the last line is printed.
In the second and third commands (which are equivalent), you have:
sed -e '/aaa/!d' -e '$!d'
The first operand deletes each line that is not aaa and starts the next cycle. The second operand deletes every remaining aaa because none of them is on the last line of input (the last line in the input is ccc, which has already been deleted by virtue of not being aaa). So the output you see is exactly what you should expect.
If you want just one aaa, consider using:
grep '^aaa$' | uniq
Though that's a long-winded way of writing:
echo aaa
Presumably, though, this is a simplified version of the real situation (which is a good thing).
$ in the address means the last line. It does not change even if the last line is not being printed because of a previous command. In the pipeline, though, only the printed lines get to the second invocation of sed, and $ again means the last line - now only from the lines printed by the previous sed invocation.
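The two behaviours can be reproduced side by side. A sketch using printf instead of echo -e for portability; the variable names are illustrative.

```shell
data=$(printf '%s\n' aaa bbb ccc aaa ccc aaa bbb ccc)

# Two processes: the second sed only sees the three aaa lines, so $ is the last aaa.
piped=$(printf '%s\n' "$data" | sed '/aaa/!d' | sed '$!d')

# One process: $ is the final ccc of the original input, which was already deleted.
combined=$(printf '%s\n' "$data" | sed -e '/aaa/!d' -e '$!d')

printf 'piped: [%s]\ncombined: [%s]\n' "$piped" "$combined"
```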