cannot match multiple occurrences of character in sed regexp

I am trying to remove As at the end of line.
alice$ cat pokusni
SALALAA
alice$ sed -n 's/\(.*\)A$/\1/p' pokusni
SALALA
one A is removed just fine
alice$ sed -n 's/\(.*\)A+$/\1/p' pokusni
alice$ sed -n 's/\(.*\)AA*$/\1/p' pokusni
SALALA
multiple occurrences not:(
I am probably doing just some very stupid mistake, any help? Thanks.

Try this one 's/\(.*[^A]\)AA*$/\1/p'
Why + does not work:
Because it is just a normal character here.
Why 's/\(.*\)AA*$/\1/p' does not work:
Because the regex engine is greedy: .* consumes as many characters as it can, including every trailing A except the one that the literal A in AA* requires. A* is then left to match nothing, so only one A is removed.
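To see where the trailing As end up, one can wrap each group in brackets. This is only an illustrative sketch; the helper names show_greedy and show_fixed are made up for this example and do not appear in the question.

```shell
# show_greedy and show_fixed are hypothetical helper names.
# Each reads lines on stdin and prints "[group 1][group 2]".

show_greedy() {
    # .* grabs everything it can, leaving a single A for "AA*".
    sed 's/\(.*\)\(AA*\)$/[\1][\2]/'
}

show_fixed() {
    # [^A] forces group 1 to end on a non-A, so "AA*" gets every trailing A.
    sed 's/\(.*[^A]\)\(AA*\)$/[\1][\2]/'
}

echo 'SALALAA' | show_greedy   # [SALALA][A]
echo 'SALALAA' | show_fixed    # [SALAL][AA]
```

One limitation of the [^A] form: a line made up only of As contains no non-A character, so the pattern does not match and the line is left unchanged.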

This might work for you:
sed -n 's/AA*$//p' file
This replaces an A and zero or more A's at the end of line with nothing.
N.B.
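sed -n 's/A*$//p' file
would produce the correct string, but A*$ also matches the empty string at the end of every line, so the substitution succeeds on every line and lines without a trailing A are printed too (false positives).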
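The difference between the two patterns shows up as soon as the input contains a line that does not end in A. A small sketch, with made-up helper names and a made-up second input line (XYZ):

```shell
# Hypothetical helper names; both read lines on stdin.

print_changed_only() {
    # AA* needs at least one A, so only lines that really end in A are printed.
    sed -n 's/AA*$//p'
}

print_every_line() {
    # A* can match the empty string just before the end of line,
    # so the substitution "succeeds" on every line and p prints them all.
    sed -n 's/A*$//p'
}

printf 'SALALAA\nXYZ\n' | print_changed_only   # SALAL
printf 'SALALAA\nXYZ\n' | print_every_line     # SALAL, then XYZ
```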

Using awk
awk '{sub(/AA$/,"A")}1' pokusni
SALALA
EDIT
Corrected version, which removes all As from the end of the line:
awk '{sub(/A*$/,"")}1' pokusni

You can use perl:
> echo "SALALAA" | perl -lne 'if(/(.*?)[A]+$/){print $1}else{print}'
SALAL

Related

Deleting everything between two string matches in a file

I got this text in file.txt:
Osmun.Prez#mail.com:c7lB2m6b#3.a.a:tt_webid_v2=6990226111024612869; tt_webid=6990226111024612869; tt_csrf_token=VD5Nb_TQFH4RKhoJeSe2nzLB; R6kq3TV7=AHkh4PB6AQAA3LIS90nWf2ss0Q7ZTCQjUat4axctvhQY68DdUEz92RwpmVSX|1|0|e9d6917c2fe555827dcf5ee916ba9778079ab2a9; ttwid=1%7CAFodeNF0iZM2fyy-ZeiZ6HTpZoG_MSx6SmXHgGVQ-V4%7C1627538859%7C59ca1e4a56f9f537b55e655a6dabff88e44eb48502b164ed6b4199f5a5263cb0; passport_csrf_token_default=6f7653c3ce946a6ce5444723fb0c509b; passport_csrf_token=6f7653c3ce946a6ce5444723fb0c509b; sid_guard=0483b7d37f4e4bd20ab3046e29724798%7C1627538893%7C5184000%7CMon%2C+27-Sep-2021+06%3A08%3A13+GMT; uid_tt=27b52febe6222486b9f6b6a90ef4ffeace5ea25c09d29a1583be5a1ecf760996; uid_tt_ss=27b52febe6222486b9f6b6a90ef4ffeace5ea25c09d29a1583be5a1ecf760996; sid_tt=0483b7d37f4e4bd20ab3046e29724798; sessionid=0483b7d37f4e4bd20ab3046e29724798; sessionid_ss=0483b7d37f4e4bd20ab3046e29724798; store-idc=maliva; store-country-code=us; odin_tt=294845c8f7711db177f7c549a9f44edb1555031b27a2a485df809cd92c4e544ac0772bf462df5b7a100f6e488c45303cd62df3b6b950f0842520cd887850137b035d990f29cc8b752765e594560c977f; cmpl_token=AgQQAPNSF-RMpbE89z5HYF0_-2PcrxjXf4fZYP5_ZA
How can I delete everything from :tt_ to _ZA (the first and only occurrence) in file.txt, keeping only Osmun.Prez#mail.com:c7lB2m6b#3.a.a, using bash on Linux?
Thank you

Something like:
sed -i "s/:tt_.*//" file.txt
This edits the file in place; to print the result instead, drop the -i switch.
The sed command means: on each line of file.txt, replace (s) the pattern :tt_ and every character after it (.*) with an empty string (//).
Or the command:
sed -i "s/:tt_.*_ZA//" file.txt
which is closer to what you asked for, but produces the same output for this input.
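The two commands only differ when something follows _ZA on the line. A sketch with a shortened, made-up sample line (the real cookie string is much longer) and hypothetical helper names:

```shell
# Hypothetical sample line with extra text after _ZA.
line='user#mail.com:secret:tt_webid=1; token=abc_ZA trailing'

cut_to_end() {
    # Removes :tt_ and everything after it.
    sed 's/:tt_.*//'
}

cut_to_marker() {
    # Removes :tt_ up to the last _ZA on the line and keeps what follows.
    sed 's/:tt_.*_ZA//'
}

printf '%s\n' "$line" | cut_to_end      # user#mail.com:secret
printf '%s\n' "$line" | cut_to_marker   # user#mail.com:secret trailing
```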

Use pattern substitution:
i=$(cat file.txt)
echo "${i/:tt*_ZA}"

Assuming the general requirement is to remove everything after the 2nd : ...
Sample data:
$ cat file.txt
Osmun.Prez#mail.com:c7lB2m6b#3.a.a:tt_webid_v ... to end of line
some.one#home.com:B52_m6b#9_az.more.stuff:delete from here ... to end of line
One sed idea:
$ sed -En 's/^([^:]*:[^:]*).*$/\1/p' file.txt
Osmun.Prez#mail.com:c7lB2m6b#3.a.a
some.one#home.com:B52_m6b#9_az.more.stuff

Using awk
awk 'BEGIN{FS=OFS=":"}{print $1,$2}' file.txt
With : as the field separator, the first two fields are exactly the part before :tt_.

This deletes all characters from ":tt_" to the last "_ZA", inclusive, in each line of file.txt:
Mac_3.2.57$ sed 's/:tt_.*_ZA//' file.txt
Osmun.Prez#mail.com:c7lB2m6b#3.a.a
Mac_3.2.57$

Or, if it is always the first two colon-separated values that you want (as in your example):
cut -d':' -f1,2 file.txt
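The field-based approaches agree on well-formed lines but behave differently when a line has no colon at all. The comparison below is a sketch on made-up input (the sample function and its two lines are assumptions, not data from the question):

```shell
# Hypothetical input: one normal line and one line without any colon.
sample() {
    printf 'a#mail.com:pw:tt_rest\nnocolon\n'
}

# sed -n with p prints only lines where the pattern matched,
# so the colon-less line is dropped.
sample | sed -En 's/^([^:]*:[^:]*).*$/\1/p'

# awk always prints $1, OFS, $2, so the colon-less line becomes "nocolon:".
sample | awk 'BEGIN{FS=OFS=":"}{print $1,$2}'

# cut passes lines without the delimiter through unchanged.
sample | cut -d':' -f1,2
```

Which behavior is preferable depends on whether malformed lines should be dropped, flagged, or kept as they are.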

Sed version extract

I am trying to extract the version number from a string. I am unable to find the exact regex to find what I need.
For eg -
1012-EPS-Test-OF-Something-1.3
I need sed to only extract 1.3 from the above line.
I have tried quite a few things so far, such as the following, but it is clearly not working:
sed 's/[^0-9.0-9]*//'

With your shown samples, the easiest way is to pipe the value of the shell variable into awk, set the field separator to -, and print the last field:
echo "$var" | awk -F'-' '{print $NF}'
2nd solution: If the last field (with - as the field delimiter) may contain something besides the version number, use awk's match function:
echo "$var" |
awk -F'-' 'match($NF,/[0-9]+(\.[0-9]+)*/){print substr($NF,RSTART,RLENGTH)}'
3rd solution: With GNU grep, use -P (PCRE) and \K. The pattern matches everything up to the last -, then \K discards what has been matched so far, so only the part matched after it is printed:
echo "$var" | grep -oP '.*-\K\d+(\.\d+)*'

This should work in any grep that supports -o (GNU and BSD grep both do); no -P is needed:
s='1012-EPS-Test-OF-Something-1.3'
grep -Eo '[0-9]+(\.[0-9]+)+' <<< "$s"
1.3

This might work for you (GNU sed):
sed -n 's/.*[^0-9.]//p' file
The regexp is greedy and swallows the whole line .* then steps back a character at a time till the first match of [^0-9.], removes the front portion and prints the remainder.
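A short sketch of that behavior, including one case worth knowing about: if the line contains nothing but digits and dots, [^0-9.] never matches, so with -n nothing is printed. The helper name last_version and the second input are made up for illustration.

```shell
last_version() {
    # Hypothetical helper: delete everything up to and including the last
    # character that is neither a digit nor a dot, then print the rest.
    sed -n 's/.*[^0-9.]//p'
}

echo '1012-EPS-Test-OF-Something-1.3' | last_version   # 1.3
echo '2.0' | last_version                              # prints nothing
```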

You can use string manipulation to get the last part after -:
s='1012-EPS-Test-OF-Something-1.3'
s="${s##*-}"
See this online demo:
#!/bin/bash
s='1012-EPS-Test-OF-Something-1.3'
s="${s##*-}"
echo "$s"
# => 1.3
See 10.1. Manipulating Strings:
${string##substring}
    Deletes longest match of $substring from front of $string.
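For comparison, here is a sketch of the neighbouring expansions on the same string. All three forms are standard POSIX shell parameter expansions, so they are not limited to bash.

```shell
s='1012-EPS-Test-OF-Something-1.3'

echo "${s##*-}"   # longest match of "*-" removed from the front:  1.3
echo "${s#*-}"    # shortest match of "*-" removed from the front: EPS-Test-OF-Something-1.3
echo "${s%-*}"    # shortest match of "-*" removed from the end:   1012-EPS-Test-OF-Something
```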

Regex Pattern Replace

I want to replace the following
<duration>89</duration>
with
(expected result, or at least what it should become:)
\n<duration>89</duration>
So basically, replace every < with \n< using a regex. So I figured:
sed -e 's/<[^/]/\n</g'
The only problem is that it obviously outputs
\n<uration>89</duration>
Which brings me to my question: how can I tell the regex to match a character that follows < (and is not /) without replacing it, so that I get my expected result?

Try this, which inserts a literal \n before each match:
sed -e 's/<[^/]/\\n&/g' file
or this, which inserts an actual newline (in GNU sed):
sed -e 's/<[^/]/\n&/g' file
&: refers to the portion of the pattern space that matched
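A runnable sketch of the & form on the sample line. The helper name mark_open_tags is made up, and | is used as the s delimiter here (an assumption of this example, not part of the answer) so that the / inside the bracket expression cannot be mistaken for a delimiter.

```shell
mark_open_tags() {
    # & in the replacement stands for the whole match: "<" plus the
    # following non-slash character, so nothing is lost.
    # \\n in the replacement yields a literal backslash followed by n.
    sed 's|<[^/]|\\n&|g'
}

printf '%s\n' '<duration>89</duration>' | mark_open_tags
# \n<duration>89</duration>
```

The closing tag is left alone because its < is followed by /, which [^/] refuses to match.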

It can be nicely done with awk:
echo '<duration>89</duration>' | awk '1' RS='<' ORS='\n<'
RS='<' sets the input record separator to <
ORS='\n<' sets the output record separator to \n<
1 always evaluates to true. A true condition without a subsequent action tells awk to print the record.
Note that this breaks at every <, closing tags included, and also emits the separator after the last record.

echo "<duration>89</duration>" | sed -E 's/<([^\/])/\\n<\1/g'
should do it.
Sample Run
$ echo "<duration>89</duration>
> <tag>Some Stuff</tag>"| sed -E 's/<([^\/])/\\n<\1/g'
\n<duration>89</duration>
\n<tag>Some Stuff</tag>

Your command is almost correct, with one small problem: sed replaces the entire match, including the character matched by [^/]. You need to preserve that character, so try either of the following two commands:
sed -e 's/<\([^/]\)/\n<\1/g' file
or as pointed by Cyrus
sed -e 's/<[^/]/\n&/g' file
Cheers!

echo '<duration>89</duration>' | awk '{sub(/<dur/,"\\n<dur")}1'
\n<duration>89</duration>

regex: not match a group rather than single characters

echo test.a.wav|sed 's/[^(.wav)]*//g'
.a.wav
What I want is to remove every character until it reaches the whole group .wav(that is, I want the result to be .wav), but it seems that sed would remove every character until it reaches any of the four characters. How to do the trick?

Groups do not work inside [], so the dot and the parentheses are simply members of the bracket expression.
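One way to convince yourself: reordering the characters inside the brackets changes nothing, because a bracket expression is an unordered set of single characters. A small sketch with made-up helper names:

```shell
# Hypothetical helper names; both read lines on stdin.

strip_original() {
    # The expression from the question: "not one of ( . w a v )".
    sed 's/[^(.wav)]*//g'
}

strip_shuffled() {
    # Same set of characters in a different order: identical behavior.
    sed 's/[^)(vaw.]*//g'
}

echo 'test.a.wav' | strip_original   # .a.wav
echo 'test.a.wav' | strip_shuffled   # .a.wav
```

Only t, e, and s are outside the set, so only "test" is removed.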
How about:
echo test.a.wav|sed 's/.*\(\.wav\)/\1/g'
Note, there may be other valid solutions, but you provide no context on what you are trying to do to determine what may be the best solution.

The feature you're requesting wouldn't be supported by sed (negative lookahead) but Perl does the trick.
$ echo 'test.a.wav' | perl -pe 's/^(?:(?!\.wav).)*//g'
.wav

Instead of regex, you can use awk like this:
echo test.a.wav.more | awk -F".wav" '{print FS$2}'
.wav.more
It splits the line on your pattern, then prints the pattern followed by the rest of the line.

This might work for you (GNU sed):
sed ':a;/^\.wav/!s/.//;ta;/./!d' file
or:
sed 's/\.wav/\n&/;s/^[^\n]*\n//;/./!d' file
N.B. This deletes the line if it is empty. If this is not wanted just remove /./!d from the above commands.

A sed command to swap first and last character of each line

I want to write a one liner sed command to swap first and last character of every line of file. The below shown command is not working
sed 's/\(.\)\(.+\)\(.\)/\3\2\1/' input.txt
I even tried adding start of line and end of line characters
sed 's/^\(.\)\(.+\)\(.\)$/\3\2\1/' input.txt
It doesn't seem to match anything in the file.

sed -E 's/(.)(.+)(.)/\3\2\1/' input.txt

In basic regular expressions + is an ordinary character. With GNU sed you can escape it to get the one-or-more meaning:
sed 's/^\(.\)\(.\+\)\(.\)$/\3\2\1/' input.txt
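The reason the original attempts matched nothing can be checked directly: without -E, an unescaped + only matches a literal plus sign. A minimal sketch (helper names and inputs are made up for illustration):

```shell
# Hypothetical helper names; both read lines on stdin.

bre_plus() {
    # Basic regex: "a+b" means the three literal characters a, +, b.
    sed 's/a+b/X/'
}

ere_plus() {
    # Extended regex (-E): "a+" means one or more a.
    sed -E 's/a+b/X/'
}

echo 'a+b' | bre_plus   # X
echo 'aab' | bre_plus   # aab (unchanged, there is no literal + in the input)
echo 'aab' | ere_plus   # X
```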

If you like to try some other, here is a gnu awk version
awk '{a=$1;$1=$NF;$NF=a}1' FS= OFS= input.txt
This sets a to the first character, then sets the first character to the last and the last to a.
It needs GNU awk, since setting FS to the empty string is not defined in standard awk.

This works portably:
echo abcd | sed 's/^\(.\)\(.*\)\(.\)$/\3\2\1/'
because it uses .* instead of .+. It prints
dbca
It also works with a two-character line such as ad:
echo ad | sed 's/^\(.\)\(.*\)\(.\)$/\3\2\1/'
prints
da
Not every sed understands .+ (or \+) in basic regular expressions; for example, it didn't work on OS X. I therefore recommend using .* or simulating .+ with ..*, like
echo ad | sed 's/^\(.\)\(..*\)\(.\)$/\3\2\1/'
which prints
ad
that is, a two-character line is not swapped, because ..* needs at least one character in the middle.

echo 'are' | sed 's/\(.\)\(.*\)\(.\)/\3\2\1/'
There is no need for ^ or $, because sed takes the longest possible match by default (so the whole line).
Use * instead of + because with + the line must have at least 3 characters, whereas a 2-character line should still have its two ends swapped.
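A quick check of the .* form on lines of different lengths. The function name swap_ends and the three input lines are made up for this sketch.

```shell
swap_ends() {
    # Group 1 = first character, group 3 = last character,
    # group 2 = whatever lies between them (possibly nothing).
    sed 's/^\(.\)\(.*\)\(.\)$/\3\2\1/'
}

printf 'abcd\nad\nx\n' | swap_ends
# dbca
# da
# x
```

The single-character line is left as it is, because the pattern needs at least two characters to match.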
