sed replace word delimiter with multiple occurrences of word

I'm fairly new to sed. I made a script to replace various text in a file. As an example, the file test.txt contains:
My name is <Jack>.
My dad calls me <Jack>. My mum calls me <Jack>, too.
I want to replace "<" and ">" with ":". I used this command
sed -re 's/<(.+?)>/:\1:/g' test.txt
It returns
My name is :Jack:.
My dad calls me :Jack>. My mum calls me <Jack:, too.
So it works well with a single occurrence in a line, but the result is wrong with multiple occurrences in a line, because the captured group is all the text between the first "<" and the last ">".
Any hints? (And a little explanation, too...)
Thanks!
EDIT:
The same regular expression works correctly using replace in Gedit or other editors.

Easiest:
kent$ echo "My name is <Jack>.
dquote> My dad calls me <Jack>. My mum calls me <Jack>, too."|sed 's/[<>]/:/g'
My name is :Jack:.
My dad calls me :Jack:. My mum calls me :Jack:, too.
If you want to use a group:
kent$ echo "My name is <Jack>.
My dad calls me <Jack>. My mum calls me <Jack>, too."|sed -r 's/<([^>]*)>/:\1:/g'
My name is :Jack:.
My dad calls me :Jack:. My mum calls me :Jack:, too.
In your code you want non-greedy matching; unfortunately, sed doesn't support that: its regexes have no lazy quantifier, so .+? still matches as much as it can. That is why you got your output:
the whole
<Jack>. My mum calls me <Jack>
has the form <....>, so
the .+ matches Jack>. My mum calls me <Jack
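A minimal sketch of the difference, assuming a POSIX shell and a sed that accepts -E for extended regexes (GNU and BSD sed both do). The function names greedy and negated are hypothetical. It uses plain .+, which is how sed effectively treats the question's .+?.

```shell
# Sample line from the question: two <...> pairs on one line.
line='My dad calls me <Jack>. My mum calls me <Jack>, too.'

# Greedy: .+ runs to the LAST ">" on the line, so there is only one match.
greedy() {
  printf '%s\n' "$1" | sed -E 's/<(.+)>/:\1:/g'
}

# Negated class: [^>]* cannot cross a ">", so each pair is matched separately.
negated() {
  printf '%s\n' "$1" | sed -E 's/<([^>]*)>/:\1:/g'
}

greedy "$line"   # My dad calls me :Jack>. My mum calls me <Jack:, too.
negated "$line"  # My dad calls me :Jack:. My mum calls me :Jack:, too.
```

A negated character class is the usual stand-in for a lazy quantifier in sed; it works whenever the closing delimiter is a single character.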

I updated the example.
Here is test.html:
My name is Jack.
My dad calls me Jack. My mum calls me Jack, too.
This command gives me the expected result:
sed -re 's/<a href="filename.html#[^>]*>([^<]*)<\/a>/:\1:/g' test.html
Result:
My name is :Jack:.
My dad calls me :Jack:. My mum calls me :Jack:, too.
So sed searches for the tag that starts with <a href="filename.html# and uses [^>]* to match everything up to the ">" that closes it. The captured argument is any run of characters except "<" ([^<]*), and then the delimiter is </a>.
Did I get it?
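A sketch of the updated command on a made-up line of test.html; the #jack1 and #jack2 fragments are hypothetical, since the original markup did not survive. A regex like this only copes with simple, flat <a> tags, not arbitrary HTML. The function name strip_links is also hypothetical.

```shell
# Hypothetical test.html line; the "#jack1"/"#jack2" fragments are made up.
html_line='My dad calls me <a href="filename.html#jack1">Jack</a>. My mum calls me <a href="filename.html#jack2">Jack</a>, too.'

# [^>]* consumes the rest of the opening tag (it stops at that tag's ">"),
# ([^<]*) captures the link text (it stops at the "<" of "</a>"),
# and <\/a> matches the closing tag.
strip_links() {
  printf '%s\n' "$1" | sed -E 's/<a href="filename.html#[^>]*>([^<]*)<\/a>/:\1:/g'
}

strip_links "$html_line"
# My dad calls me :Jack:. My mum calls me :Jack:, too.
```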

Related

Backreference with sed

I am trying to rearrange headers in my FASTA file. I thought I could select the first 10 characters, then the rest of the line, and then backreference the first selection to move it to the end.
AY843768_1 Genus species 12S
would then be
Genus species 12S AY843768_1
I need to use sed for a learning exercise, but I am unsure how to make the selections to switch the 10-digit ID to the end of the line for every header in my file.
sed ‘s/^\(.\{10\}\)(.*$)/\2 \1/g' file1.fasta > file2.fasta

This will do it:
sed 's/^\(.\{10\}\) \(.*\)/\2 \1/' file1.fasta > file2.fasta
The backslashes before the second pair of parentheses were missing (in a basic regex, groups are written \( and \)).
.* already matches the rest of the line, so $ isn't needed.
g isn't needed, because an expression that begins with ^ can match at most once per line.
Anyway, using ERE instead of BRE makes it more readable:
sed -E 's/^(.{10}) (.*)/\2 \1/' file1.fasta > file2.fasta
Off topic: the same (or similar) can be achieved with bash alone:
while read -r id rest; do echo "$rest $id"; done <file1.fasta >file2.fasta
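A sketch comparing the sed and read approaches on the header from the question; the function names are hypothetical. Both assume the ID is followed by a single space. Real FASTA headers usually begin with >, which the 10-character pattern does not expect, so a third variant (an assumption about your file, not part of the answer) anchors on > and leaves sequence lines alone.

```shell
header='AY843768_1 Genus species 12S'

# ERE: group 1 = first 10 characters, a literal space, group 2 = the rest.
swap_sed() {
  printf '%s\n' "$1" | sed -E 's/^(.{10}) (.*)/\2 \1/'
}

# Shell only: read puts the first whitespace-separated word in "id"
# and the remainder of the line in "rest" (-r keeps backslashes intact).
swap_read() {
  printf '%s\n' "$1" | while read -r id rest; do
    printf '%s %s\n' "$rest" "$id"
  done
}

# Assumption: headers start with ">" and sequence lines do not.
# Only lines matching ^>ID<space>... are rewritten; others pass through.
swap_fasta() {
  printf '%s\n' "$1" | sed -E 's/^>([^ ]+) (.*)/>\2 \1/'
}

swap_sed "$header"    # Genus species 12S AY843768_1
swap_read "$header"   # Genus species 12S AY843768_1
swap_fasta '>AY843768_1 Genus species 12S'   # >Genus species 12S AY843768_1
```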

Is there a way to use sed to remove only the exact string match?

I have recently started learning bash and ran into a problem while doing an assignment. I have a txt file that contains something like
foo:abc:200:1:1:1
foobar:asd:100:3:2:1
bar:test:100:2:2:2
where the first column is the title of the book, followed by the author name, the price, the quantity available and the quantity sold, all separated by the delimiter ":".
The goal here is to remove a book based on the name and author the user types in.
I have searched around and found that sed might be able to help me with this problem. I have tested sed by deleting based on the title alone with
sed /"foo"/d Book.txt
I expected the output to be
foobar:asd:100:3:2:1
bar:test:100:2:2:2
however the output was
bar:test:100:2:2:2
which tells me that any line in the txt file containing "foo" will get deleted
Hence I would like to ask
Is there any way to use sed so it deletes the exact match only instead of lines containing foo?
Is there any way to use delimiters with sed so I can use both title and author?
Should I be using something other than sed?

Using sed it is better to use:
sed -E '/(^|:)foo(:|$)/d' file
foobar:asd:100:3:2:1
bar:test:100:2:2:2
This makes sure foo is preceded by the start of the line or a : and followed by the end of the line or a :, so it only matches foo as a complete field.
However this job is more suitable for awk as data is delimited by colon:
awk -F: '$1 != "foo"' file
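A sketch of the title-and-author case with awk, assuming the title is field 1 and the author is field 2 as described in the question; the function name delete_book is hypothetical. Passing the user's input with -v compares it with == instead of splicing it into the program text or a regex (note that -v still interprets backslash escapes).

```shell
books='foo:abc:200:1:1:1
foobar:asd:100:3:2:1
bar:test:100:2:2:2'

# Print every record EXCEPT the one whose title ($1) and author ($2)
# are exactly equal to the given values.
delete_book() {
  awk -F: -v title="$1" -v author="$2" '!($1 == title && $2 == author)'
}

printf '%s\n' "$books" | delete_book foo abc
# foobar:asd:100:3:2:1
# bar:test:100:2:2:2
```

awk writes to standard output, so to update Book.txt you would redirect to a temporary file and move it back, e.g. delete_book foo abc < Book.txt > Book.tmp && mv Book.tmp Book.txt.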

Is there any way to use sed so it deletes the exact match only instead of lines containing foo?
Yes, you can for the given example: if you anchor your search pattern so that it matches exactly foo:, only that line is deleted. For example:
sed '/^foo:/d' file
The ^ anchors the match to the start of the line, so the pattern only matches lines that start with foo followed by a colon (:), which fits your use case. This assumes foo is only ever meant to match the first column.
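Extending the anchored pattern to the author means adding the second field to the prefix. A sketch, assuming the values contain no :, no / and no regex metacharacters such as . or *; sed_delete_book is a hypothetical helper.

```shell
books='foo:abc:200:1:1:1
foobar:asd:100:3:2:1
bar:test:100:2:2:2'

# Title only: "^foo:" cannot match "foobar:", because ":" must follow "foo".
printf '%s\n' "$books" | sed '/^foo:/d'

# Title and author: the prefix now pins both field 1 and field 2.
sed_delete_book() {
  sed "/^$1:$2:/d"
}

printf '%s\n' "$books" | sed_delete_book foo abc
# foobar:asd:100:3:2:1
# bar:test:100:2:2:2
```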
Is there any way to use delimiters with sed so I can use both title and author?
Should I be using something other than sed?
If you are dealing with an input file that has a fixed delimiter like :, which will never be part of valid column content, then awk or perl are better suited, as they split each line into fields once a delimiter is set.
For example, if you want to change the quantity in the fourth column for one particular book named foobar, with awk you can just do
awk -F: 'BEGIN { OFS = FS } $1 == "foobar" { $4 = 6 }1' input-file
To decode the line above: the content within '...' is left untouched by the shell and passed literally to awk, which is why we wrap it in single quotes. The statements inside it are awk code and have no meaning to the shell.
The -F: sets the input field separator to :, so when awk reads the file line by line, each line is split into fields separated by :. Fields are numbered from $1 up to $NF, where NF is the number of fields on the line, so $NF is the last column. The part BEGIN { OFS = FS } sets the output field separator to the same value as the input one, i.e. it keeps the : delimiters when awk writes the output.
The part $1 == "foobar" { $4 = 6 } is almost self-explanatory: if the first column equals the string in quotes, run the action inside {...}, which sets the fourth column to 6. Assigning to a field makes awk rebuild the line using OFS. The trailing 1 is a pattern that is always true; a pattern with no action defaults to { print }, so every line is printed, whether it was modified or not.
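A runnable version of the command on the sample data from the question, plus the same program without BEGIN { OFS = FS } to show why that part matters: only the modified line is rebuilt, and without it the rebuild uses awk's default OFS, a single space. The function names are hypothetical.

```shell
books='foo:abc:200:1:1:1
foobar:asd:100:3:2:1
bar:test:100:2:2:2'

# With OFS = FS the rebuilt line keeps its ":" separators.
set_qty() {
  awk -F: 'BEGIN { OFS = FS } $1 == "foobar" { $4 = 6 } 1'
}

# Without it, assigning $4 rebuilds that line with the default OFS (" ").
set_qty_no_ofs() {
  awk -F: '$1 == "foobar" { $4 = 6 } 1'
}

printf '%s\n' "$books" | set_qty
# foo:abc:200:1:1:1
# foobar:asd:100:6:2:1
# bar:test:100:2:2:2

printf '%s\n' "$books" | set_qty_no_ofs
# foo:abc:200:1:1:1
# foobar asd 100 6 2 1
# bar:test:100:2:2:2
```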

This might work for you (GNU sed):
sed '/\<foo\>/d' file
Or
sed '/\bfoo\b/d' file
The first solution uses \< start word and \> end word. The second solution uses the \b word boundary.
P.S. The dual of \b is \B so to delete lines that contain foobar or foobaz but not foo only, use:
sed '/\bfoo\B/d' file
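A quick check of both commands on the sample file, assuming GNU sed (\<, \>, \b and \B are GNU extensions). Keep in mind that word boundaries are not field boundaries: a line whose author is foo, or a title like foo-bar, would also be deleted by the first command.

```shell
books='foo:abc:200:1:1:1
foobar:asd:100:3:2:1
bar:test:100:2:2:2'

# foo followed by ":" (a non-word character) is a whole word; foo in foobar is not.
printf '%s\n' "$books" | sed '/\<foo\>/d'
# foobar:asd:100:3:2:1
# bar:test:100:2:2:2

# \B after foo requires another word character next, so only foobar is deleted.
printf '%s\n' "$books" | sed '/\bfoo\B/d'
# foo:abc:200:1:1:1
# bar:test:100:2:2:2
```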

Translate cut code to grep regex code for locating and returning up to the first hyphen

I have a list of filenames similar to the following:
NAME - Something something something
ANOTHER NAME - More stuff
THIRD - This is a title
FOURTH - This is a title - With an extra hyphen
FIFTH NAME - And some more
What I would like is to grab just the names up to the first hyphen. That is, my results should be:
NAME
ANOTHER NAME
THIRD
FOURTH
FIFTH NAME
I was able to accomplish this via cut -d'-' -f1 but I was wondering what would be a way to translate this into a grep command?
I have tried expressions like grep -o "^[[:upper:]]* -" but I run into issues when there is a second hyphen in the line (e.g. FOURTH) and with names that have more than one word (e.g. ANOTHER NAME and FIFTH NAME).

This awk should do:
awk -F" -" '{print $1}' file
NAME
ANOTHER NAME
THIRD
FOURTH
FIFTH NAME
Or a sed version:
sed 's/ -.*//' file
NAME
ANOTHER NAME
THIRD
FOURTH
FIFTH NAME

You can use the extended flavour: anchor to the beginning of the line and match every character until it finds a dash, like:
grep -oE '^[^-]*' infile
It yields:
NAME
ANOTHER NAME
THIRD
FOURTH
FIFTH NAME
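One detail the output above hides: ^[^-]* stops just before the first -, so each match keeps the space in front of it (NAME followed by a space), just as cut -d'-' -f1 does. A sketch, assuming GNU or BSD grep for -o: requiring the match to end on a character that is neither a space nor a hyphen trims that space, because grep reports the longest match at the start of the line. The function names are hypothetical.

```shell
names='NAME - Something something something
ANOTHER NAME - More stuff
THIRD - This is a title
FOURTH - This is a title - With an extra hyphen
FIFTH NAME - And some more'

# Everything before the first "-", including the trailing space.
first_field_raw() {
  grep -oE '^[^-]*'
}

# Same, but the match must end on a character that is not " " or "-".
first_field_trimmed() {
  grep -oE '^[^-]*[^ -]'
}

printf '%s\n' "$names" | first_field_raw       # "NAME ", "ANOTHER NAME ", ...
printf '%s\n' "$names" | first_field_trimmed   # "NAME", "ANOTHER NAME", ...
```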

Regex to replace a string in context but not the context

I am new to regex and want to do the following task:
Say I have the string JOHN.S and I want to replace the period with a tab. However, the replacement should only occur if the period is between two letters. What I don't want is to replace the period in John, S. with a tab; there I will just replace the , with a tab, which I know how to do.
If I try to replace /[a-zA-Z]\.[a-zA-Z]/, then the surrounding letters will be removed, but obviously I want to keep them. They should just be used to identify the context.
I have searched for a long time but have not come up with a solution. More specifically, I am working with bash, so sed is probably what I am going to use.
Thank you.

It is just a matter of capturing the surrounding characters with () and printing them back with \1, \2, etc.:
sed -r 's/(\w)\.(\w)/\1\t\2/g' file
Using your syntax:
sed -r 's/([a-zA-Z])\.([a-zA-Z])/\1\t\2/g' file
Test
$ cat file
John, S.
JOHN.S
blabla
$ sed -r 's/(\w)\.(\w)/\1\t\2/g' file
John, S.
JOHN S
blabla
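A sketch assuming GNU sed (\t in the replacement is a GNU extension). The question asks for a period between two letters, and \w also matches digits and _, so this uses [[:alpha:]]. It also shows one limitation of /g: matches cannot overlap, so in A.B.C the B consumed by the first match cannot start a second one; looping with a label and t until nothing changes handles that. The function names are hypothetical.

```shell
# Replace "letter.letter" with "letter<TAB>letter" in one left-to-right pass.
tab_once() {
  sed -E 's/([[:alpha:]])\.([[:alpha:]])/\1\t\2/g'
}

# Repeat the substitution until it no longer applies (:a is a label,
# "ta" jumps back to it whenever the previous s/// made a change).
tab_loop() {
  sed -E -e ':a' -e 's/([[:alpha:]])\.([[:alpha:]])/\1\t\2/' -e 'ta'
}

printf '%s\n' 'John, S.' 'JOHN.S' | tab_once   # John, S.  and  JOHN<TAB>S
printf '%s\n' 'A.B.C' | tab_once               # A<TAB>B.C
printf '%s\n' 'A.B.C' | tab_loop               # A<TAB>B<TAB>C
```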

Grepping two regexes at the same time

How can I use grep to search for two regexes at the same time? Say I am looking for "My name is" and "my bank account" in a text like:
My name is Mike. I'm 16 years old.
I have no clue how to solve my grep problem,but
if I manage to solve it, then I'll transfer
you some money from my bank account.
I'd like grep to return:
My name is
my bank account
Is it possible to do it with just one grep call or should I write a script to do that for me?

If you do not care about a trailing newline, simply use grep:
< file.txt grep -o "My name is\|my bank account" | tr '\n' ' '
If you would prefer a trailing newline, use GNU awk (RT is a gawk extension):
awk -v RS="My name is|my bank account" 'RT != "" { printf "%s ", RT } END { printf "\n" }' file.txt
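For reference, a sketch of the plain grep version on the text from the question, assuming GNU or BSD grep (-o is not part of POSIX). -E makes | an alternation without a backslash (\| in a basic regex is a GNU extension), and -o prints each match on its own line, which is already the output format the question asks for; the tr step above only joins the matches onto one line. The function name find_phrases is hypothetical.

```shell
text="My name is Mike. I'm 16 years old.
I have no clue how to solve my grep problem,but
if I manage to solve it, then I'll transfer
you some money from my bank account."

# -E: extended regex, so "|" separates alternatives.
# -o: print only the matched text, one match per line.
find_phrases() {
  grep -oE 'My name is|my bank account'
}

printf '%s\n' "$text" | find_phrases
# My name is
# my bank account
```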

I'm not quite sure what you're after. Unless you use -o, grep is line oriented, so if it finds a match in a line, it includes that entire line in the output. Assuming that's what you really want, you can just OR the two patterns together:
grep -E 'My name is|my bank account' file
Given the input above, this should produce:
My name is Mike. I'm 16 years old.
you some money from my bank account.
Alternatively, since you haven't included any metacharacters in your patterns, you could use fgrep (or grep -F) and put your patterns in a file, one per line, passed with -f. For two patterns this probably doesn't make a big difference, but if you want to look for lots of patterns, it'll probably be quite a bit faster (it uses the Aho-Corasick string search algorithm to search for all the patterns at once instead of one at a time).
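A sketch of the pattern-file approach, assuming mktemp is available for the temporary file (it is common but not POSIX); lines_with_any is a hypothetical helper. -F treats each line of the file as a fixed string and -f reads the patterns from that file; without -o, whole matching lines are printed.

```shell
text="My name is Mike. I'm 16 years old.
I have no clue how to solve my grep problem,but
if I manage to solve it, then I'll transfer
you some money from my bank account."

lines_with_any() {
  pats=$(mktemp) || return 1
  # One fixed-string pattern per line.
  printf '%s\n' 'My name is' 'my bank account' > "$pats"
  grep -F -f "$pats"
  status=$?
  rm -f "$pats"
  return "$status"
}

printf '%s\n' "$text" | lines_with_any
# My name is Mike. I'm 16 years old.
# you some money from my bank account.
```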
The other possibility would be that you're looking for a single line that includes both My name is and my bank account. That's what djechlin's answer would do. From the input above, that would produce no output, so I doubt it's what you want, but if it is, his answer is fairly reasonable. An alternative would be a single pattern like grep -E 'My name is.*my bank account|my bank account.*My name is' file.

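Yes, it is possible; I used sed. Replace S1 and S2 with your two patterns (plain text without /, :, & or backslashes, because each one is used both inside /.../ and as replacement text in s:...:):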
sed '/S1/{ s:.*:S1:;H};/S2/{ s:.*:S2:;H};${x;s:\n: :g;p};d'
sed is much more powerful than grep, and here I used it to simulate the grep behaviour you want.
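A run of that command on the text from the question with S1 and S2 filled in, assuming GNU sed; collect is a hypothetical wrapper name. Each matching line is replaced by the bare pattern text and appended to the hold space with H; on the last line ($), x swaps the collected text into the pattern space, newlines become spaces, and p prints it, while d suppresses everything else. Because H adds a newline before each entry, the result starts with a space.

```shell
text="My name is Mike. I'm 16 years old.
I have no clue how to solve my grep problem,but
if I manage to solve it, then I'll transfer
you some money from my bank account."

collect() {
  sed '/My name is/{ s:.*:My name is:;H};/my bank account/{ s:.*:my bank account:;H};${x;s:\n: :g;p};d'
}

printf '%s\n' "$text" | collect
# prints ' My name is my bank account' (note the leading space)
```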

For AND, pipe one grep into another: grep expr1 file | grep expr2
For OR: grep -E '(expr1|expr2)' file (egrep is the older, now obsolescent spelling of grep -E)
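A sketch contrasting the pipe (AND) and alternation (OR) approaches on the question's text, using grep -E in place of egrep; the function names either and both are hypothetical. With this input no single line contains both phrases, so the AND pipeline prints nothing and exits with status 1.

```shell
text="My name is Mike. I'm 16 years old.
I have no clue how to solve my grep problem,but
if I manage to solve it, then I'll transfer
you some money from my bank account."

# OR: lines containing either pattern, in one pass.
either() { grep -E 'My name is|my bank account'; }

# AND: the second grep keeps only lines from the first that also match.
both() { grep 'My name is' | grep 'my bank account'; }

printf '%s\n' "$text" | either
# My name is Mike. I'm 16 years old.
# you some money from my bank account.

printf '%s\n' "$text" | both || echo 'no line contains both phrases'
```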
