Replacing Part of Text Using Sed - regex

I have the following text file
Eif2ak1.aSep07
Eif2ak1.aSep07
LOC100042862.aSep07-unspliced
NADH5_C.0.aSep07-unspliced
LOC100042862.aSep07-unspliced
NADH5_C.0.aSep07-unspliced
What I want to do is remove everything from the first period (.) to the end of each line.
Why doesn't this command do it?
sed 's/\.*//g' myfile.txt
What's the right way to do it?

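You're missing a period there. \.* matches zero or more literal periods, so only the periods themselves get removed. You want a literal period followed by .* (any characters):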
s/\..*$//g
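To see the difference concretely, here is a small sketch that runs both expressions on three of the sample lines from the question. The variable names (input, dots_only, trimmed) are only for illustration.

```shell
# Sample input taken from the question.
input='Eif2ak1.aSep07
LOC100042862.aSep07-unspliced
NADH5_C.0.aSep07-unspliced'

# \.* means "zero or more literal dots", so only the dots are deleted.
dots_only=$(printf '%s\n' "$input" | sed 's/\.*//g')

# \..* means "a literal dot followed by anything", so the whole suffix goes.
trimmed=$(printf '%s\n' "$input" | sed 's/\..*$//')

printf '%s\n' "$dots_only" "$trimmed"
```

The first three printed lines still contain the suffix with the dots stripped out; the last three contain only the part before the first dot.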

You can use awk or cut, since dots are your delimiters.
$ awk -F"." '{print $1}' file
Eif2ak1
Eif2ak1
LOC100042862
NADH5_C
LOC100042862
NADH5_C
$ cut -d"." -f1 file
Eif2ak1
Eif2ak1
LOC100042862
NADH5_C
LOC100042862
NADH5_C
This is easier than using a regular expression.

Related

unix - pattern matching in file

So I have a file with the following:
username=jsmith
api=3434kjklj23j4l3kj4l34j3l4j
I would like to use a regular expression to return "jsmith" and "3434kjklj23j4l3kj4l34j3l4j".
I know the regular expression for it is:
(username=)(.*) > \2
(api=)(.*) > \2
However, using grep, sed, or awk, I can't seem to figure out how to do this without returning the entire line.
How would you do that from the command line?
awk is made for this task:
awk -F= '{print$2}' file
If the file has other entries, you can limit the output with a condition:
awk -F= '$1=="username"||$1=="api"{print$2}' file
Here is one using bash, PCRE and positive lookbehind (where supported):
$ grep -Po "((?<=^username=)|(?<=^api=)).*" file
jsmith
3434kjklj23j4l3kj4l34j3l4j
i.e., output everything that is preceded by username= or api= at the start of a line.
And one in awk:
$ awk 'sub(/^(username|api)=/,""){print}' file
jsmith
3434kjklj23j4l3kj4l34j3l4j
i.e., print only the lines from which a leading username= or api= was removed.
Since you want to see chess with the input game=chess, here are some solutions that do not match username= or api= explicitly:
cut -d"=" -f2- file
# or
sed -n 's/[^=]*=//p' file
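One practical difference between printing $2 and keeping everything after the first = shows up when a value itself contains an = sign. The sketch below uses a made-up sample (the token line is hypothetical, not from the question) to compare the approaches.

```shell
# Hypothetical sample: the second value itself contains "=" characters.
sample='game=chess
token=abc=123=='

# Printing only field 2 drops anything after a second "=".
second_field=$(printf '%s\n' "$sample" | awk -F= '{print $2}')

# These keep everything after the first "=".
with_cut=$(printf '%s\n' "$sample" | cut -d"=" -f2-)
with_sed=$(printf '%s\n' "$sample" | sed -n 's/[^=]*=//p')

printf '%s\n' "$second_field" "$with_cut" "$with_sed"
```

If the values are known never to contain =, all three give the same result.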
Here's the answer that worked on macOS and RHEL 7:
awk -F= '$1=="username"{print$2}' testfile.txt
awk -F= '$1=="api"{print$2}' testfile.txt
where testfile.txt contains:
username=user1
api=pass1
username=user2
api =pass2

sed replace AFTER match and retain

I've been racking my brains for hours on this, though it seems simple enough. I have a large list of strings similar to the ones below and would like to replace only the hyphens after the comma with commas:
abc-d-ef,1-2-3-4
gh-ij,1-2-3-4
to this
abc-def,1,2,3,4
gh-ij,1,2,3,4
I can't use s/-/,/2g to replace from the second occurrence because the data varies. I also thought about using cut, but there must be a way to use sed with something like:
"s/\(,\).*-/\1,&/g"
Thank you
This is better suited to awk, since we can split each line using the comma as the field separator:
awk 'BEGIN{FS=OFS=","} {gsub(/-/, OFS, $2)} 1' file
abc-d-ef,1,2,3,4
gh-ij,1,2,3,4
If you want a sed-only solution, use:
sed -E -e ':a' -e 's/([^,]+,[^-]+)-/\1,/g;ta' file
abc-d-ef,1,2,3,4
gh-ij,1,2,3,4
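The sed command above relies on a loop: each pass converts the next remaining hyphen after the comma, and ta jumps back to the label a for as long as the substitution keeps changing the line. Here is a sketch that runs it next to the awk version on the question's data, so the two can be compared (the variable names are illustrative only).

```shell
data='abc-d-ef,1-2-3-4
gh-ij,1-2-3-4'

# [^,]+, anchors the match to the first comma; [^-]+ then runs up to the
# next hyphen after it. "ta" repeats the substitution until nothing changes.
via_sed=$(printf '%s\n' "$data" | sed -E -e ':a' -e 's/([^,]+,[^-]+)-/\1,/g;ta')

# awk only touches field 2, so hyphens before the comma are never considered.
via_awk=$(printf '%s\n' "$data" | awk 'BEGIN{FS=OFS=","} {gsub(/-/, OFS, $2)} 1')

printf '%s\n' "$via_sed" "$via_awk"
```

Both leave the hyphens before the comma untouched.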
An awk proposal that also joins d-ef into def, as in the expected output:
awk -F, '{sub(/d-ef/,"def"); gsub(/-/,",",$2)}1' OFS=, file
abc-def,1,2,3,4
gh-ij,1,2,3,4

Get specific Text between Specific Tags

At the top of my HTML files, I have...
<H2>City</H2>
<P>Liverpool</P>
or
<H2>City</H2>
<P>Dublin</P>
I want to output the text between the tags straight after <H2>City</H2> instances. So in the examples above, which are separate files, I want to print Liverpool for the first and Dublin for the second.
Looking at this thread, I tried:
sed -e 's/City\(.*\)\/P/\1/'
which I hope would get me half way there... but that just prints out the entire file. Any ideas?
awk to the rescue! You need multi-char RS support though (gawk has it)
$ awk -F'[<>]' -v RS='<H2>City</H2>' 'NF{print $3}' file
Another approach:
$ awk 'c&&c--{gsub(/<[^>]*>/,""); print} /<H2>City<\/H2>/{c=1}' file
This finds the line after the City heading and strips the tags from it.
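The c&&c-- idiom can be hard to read at first. The sketch below spells it out on a slightly longer, made-up document (the Country block is an assumption added to show that only the line after the City heading is printed). It uses gsub so that both the opening and the closing tag are stripped.

```shell
html='<H2>City</H2>
<P>Liverpool</P>
<H2>Country</H2>
<P>England</P>'

city=$(printf '%s\n' "$html" | awk '
  c && c-- { gsub(/<[^>]*>/, ""); print }  # true only on the line right after a match
  /<H2>City<\/H2>/ { c = 1 }               # arm the counter for the next line
')

printf '%s\n' "$city"
```

Setting c to a larger number would print that many lines following the heading instead of one.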
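Try the following regex, which needs a PCRE-capable engine:
(?s)(?<=City<\/H2>\n<P>).*?(?=<\/P>)
sed does not support (?s) or lookarounds, so use a PCRE tool such as perl and read the whole file at once:
perl -0777 -ne 'print "$&\n" if /(?s)(?<=City<\/H2>\n<P>).*?(?=<\/P>)/' file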
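I checked and \s does not seem to work for the line break here. Use the newline character \n instead, and pull the following line into the pattern space with N so the pattern can span both lines:
sed -n '/<H2>City<\/H2>/{N;s/<H2>City<\/H2>\n<P>\(.*\)<\/P>/\1/p;}' file
There is no need to use a lookbehind (as above); that is overkill.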
With sed, you can use the n command to read the next line after your pattern. Then just remove the tags to output your content (the braces restrict the substitution to that line):
sed -n '/<H2>City<\/H2>/{n;s/ *<\/*P> *//gp;}' file
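Whether the substitution is grouped with the address matters once the file contains other <P> lines. The following sketch compares the grouped and ungrouped forms on a made-up document (the Country block is an assumption for illustration).

```shell
html='<H2>City</H2>
<P>Liverpool</P>
<H2>Country</H2>
<P>England</P>'

# Grouped: n and s run only when the City heading matched.
braced=$(printf '%s\n' "$html" | sed -n '/<H2>City<\/H2>/{n;s/ *<\/*P> *//gp;}')

# Ungrouped: the s command runs on every line, so every <P> line is printed.
unbraced=$(printf '%s\n' "$html" | sed -n '/<H2>City<\/H2>/n;s/ *<\/*P> *//gp;')

printf 'braced:\n%s\nunbraced:\n%s\n' "$braced" "$unbraced"
```

With a file that holds only the City block, both forms print the same single line.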
I think this should work on your Mac:
echo -e "<H2>City</H2>\n<P>Dublin</P>" |awk -F"[<>]" '/City/{getline;print $3}'
Dublin

Finding and replacing the last space at or before nth character works with sed but not awk, what am I doing wrong?

I have a string in a test.csv file like this:
here is my string
when I use sed it works just as I expect:
cat test.csv | sed -r 's/^(.{1,9}) /\1,/g'
here is,my string
Then when I use awk it doesn't work and I'm not sure why:
cat test.csv | awk '{gsub(/^(.{1,9}) /,","); print}'
,my string
I need to use awk because once I get this figured out I will be selecting only one column to split into two columns with the added comma. I'm using extended regex with sed, "-r" and was wondering how or if it's supported with awk, but I don't know if that really is the problem or not.
awk does not support back references in gsub. If you are on GNU awk, then gensub can be used to do what you need.
echo "here is my string" | awk '{print gensub(/^(.{1,9}) /,"\\1,","G")}'
here is,my string
Note the use of double \ inside the quoted replacement part. The gawk manual has more on gensub.
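gensub is a GNU awk extension. Where only a POSIX awk is available, one possible workaround is to locate the last space within the first 10 characters by hand and rebuild the line with substr. This is a sketch under that assumption, not the answer's method; comma_at_last_space is a hypothetical helper name.

```shell
# Hypothetical helper: replace the last space found within the first 10
# characters of each input line with a comma, using only POSIX awk features.
comma_at_last_space() {
  awk '{
    head = substr($0, 1, 10)              # up to 9 characters plus the space
    pos = 0
    for (i = 1; i <= length(head); i++)   # remember the last space in that window
      if (substr(head, i, 1) == " ") pos = i
    if (pos > 1)                          # need at least one character before it
      $0 = substr($0, 1, pos - 1) "," substr($0, pos + 1)
    print
  }'
}

result=$(printf '%s\n' 'here is my string' | comma_at_last_space)
printf '%s\n' "$result"
```

Lines with no space in the first 10 characters pass through unchanged, which mirrors what the sed expression does when it finds no match.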

awk: chop stuff off beginning of line according to regex

Say I have a few lines of output that look like this:
blah <foo> I want this
baz < nom> I want this too
bit <#hi> And this...
How do I use awk to chop off everything before, and including, the first ">" character on each line?
If the > character appears only once per line, a simple sed substitution will do:
sed 's/.*>//' file
If there can be several, the greedy .* above will consume everything up to the last > character. In that case, you are better off with:
sed 's/[^>]*>//' file
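A quick way to see the difference between the two expressions is to feed them a line with more than one > character. The sample line below is made up for this purpose.

```shell
# Hypothetical input with a second ">" later in the line.
line='blah <foo> I want this -> and this'

# .* is greedy: it runs to the LAST ">" on the line.
greedy=$(printf '%s\n' "$line" | sed 's/.*>//')

# [^>]* cannot cross a ">", so the match stops at the FIRST one.
first_only=$(printf '%s\n' "$line" | sed 's/[^>]*>//')

printf '[%s]\n[%s]\n' "$greedy" "$first_only"
```

The brackets in the output make the leading space left over after the > visible.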
Let's not forget cut; this is what it was invented for:
cut -d\> -f2- file
This may do (if you have only one > per line):
awk -F\> '{print $2}' file
I want this
I want this too
And this...
Using awk you can do:
awk '{sub(/^[^>]*>/, "");} 1' file
I want this
I want this too
And this...
Or using sed:
sed 's/^[^>]*>//' file
I want this
I want this too
And this...
Try this:
awk -F">" '{print $2}' filename