How to match and partial substitute with sed - regex

How can I match the substring "2153846-11" (sometimes composed only of digits, like "2153846", sometimes like "2153846-11" or "2153846_11", sometimes like "2153846-1"; always digits, and never fewer than 5 in the first group) inside the following:
"01/16/2015","2153846-11","2015-01-16 02:50:18.0","lch_demo_hidemi-19459072-2","","01/16/2015"
and substitute the matched string with the first group (before dash/underscore) removing the second one.
The final result will be:
"01/16/2015","2153846","2015-01-16 02:50:18.0","lch_demo_hidemi-19459072-2","","01/16/2015"
The command should be a single sed line, like:
sed -e 's/...//g' < myfile
Thanks

You can use this sed:
sed 's/"\([0-9]*\)[_-][0-9]*"/"\1"/g' file
"01/16/2015","2153846","2015-01-16 02:50:18.0","lch_demo_hidemi-19459072-2","","01/16/2015"

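The first command above accepts any number of digits in the first group. A minimal sketch that also enforces the "at least 5 digits" requirement from the question, using only POSIX basic-regex interval syntax; the function name strip_suffix is made up for illustration:

```shell
# strip_suffix (hypothetical helper): inside a quoted field, keep a run of
# 5 or more digits and drop a following -NN or _NN suffix.
strip_suffix() {
  sed 's/"\([0-9]\{5,\}\)[_-][0-9]\{1,\}"/"\1"/g'
}

line='"01/16/2015","2153846-11","2015-01-16 02:50:18.0","lch_demo_hidemi-19459072-2","","01/16/2015"'
printf '%s\n' "$line" | strip_suffix
```

Fields that are already digits only, such as "2153846", do not match the pattern and pass through unchanged.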
You could try the below sed command.
$ echo '"01/16/2015","2153846-11","2015-01-16 02:50:18.0","lch_demo_hidemi-19459072-2","","01/16/2015"' | sed -r 's/"(2153846)([_-]11)?"/"\1"/g'
"01/16/2015","2153846","2015-01-16 02:50:18.0","lch_demo_hidemi-19459072-2","","01/16/2015"

Related

How to get the release value?

I've a file with the below name formats:
rzp-QAQ_SA2-5.12.0.38-quality.zip
rzp-TEST-5.12.0.38-quality.zip
rzp-ASQ_TFC-5.12.0.38-quality.zip
I want the value as: 5.12.0.38-quality.zip from the above file names.
I'm trying as below, but not getting the correct value though:
echo "$fl_name" | sed 's#^[-[:alpha:]_[:digit:]]*##'
fl_name is the variable containing the file name.
Thanks a lot in advance!
You are matching too much with all the alpha, digit - and _ in the same character class.
You can match alpha and - and optionally _ and alphanumerics
sed -E 's#^[-[:alpha:]]+(_[[:alnum:]]*-)?##' file
Or you can shorten the first character class, and match a - at the end:
sed -E 's#^[-[:alnum:]_]*-##' file
Output of both examples
5.12.0.38-quality.zip
5.12.0.38-quality.zip
5.12.0.38-quality.zip
With GNU grep you could try following code. Written and tested with shown samples.
grep -oP '(.*?-){2}\K.*' Input_file
Or, as an alternative, use a non-capturing group (as per the fourth bird's nice suggestion):
grep -oP '(?:[^-]*-){2}\K.*' Input_file
Explanation: the -o option of GNU grep prints only the matched part of each line, and -P enables PCRE syntax. The regex (.*?-){2} lazily matches up to and including a - twice, i.e. everything through the second hyphen. \K then discards what has been matched so far, so only the text matched by the trailing .* (the rest of the line) is printed.
It is much easier to use cut here:
cut -d- -f3- file
5.12.0.38-quality.zip
5.12.0.38-quality.zip
5.12.0.38-quality.zip
If you want sed then use:
sed -E 's/^([^-]*-){2}//' file
5.12.0.38-quality.zip
5.12.0.38-quality.zip
5.12.0.38-quality.zip
Assumptions:
all filenames contain 3 hyphens (-)
the desired result always consists of stripping off the 1st two hyphen-delimited strings
OP wants to perform this operation on a variable
We can eliminate the overhead of sub-process calls (eg, grep, cut and sed) by using parameter substitution:
$ f1_name='rzp-ASQ_TFC-5.12.0.38-quality.zip'
$ new_f1_name="${f1_name#*-}" # strip off first hyphen-delimited string
$ echo "${new_f1_name}"
ASQ_TFC-5.12.0.38-quality.zip
$ new_f1_name="${new_f1_name#*-}" # strip off next hyphen-delimited string
$ echo "${new_f1_name}"
5.12.0.38-quality.zip
On the other hand if OP is feeding a list of file names to a looping construct, and the original file names are not needed, it may be easier to perform a bulk operation on the list of file names before processing by the loop, eg:
while read -r new_f1_name
do
... process "${new_f1_name}"
done < <( command-that-generates-list-of-file-names | cut -d- -f3-)
In plain bash:
echo "${fl_name#*-*-}"
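As a small sketch of the parameter-expansion approach applied to all three sample names: the helper name release_of is hypothetical, and it assumes every name has at least two hyphens before the release part.

```shell
# release_of (hypothetical helper): strip the shortest prefix matching
# "*-*-", i.e. the first two hyphen-delimited parts of the name.
release_of() {
  printf '%s\n' "${1#*-*-}"
}

for fl_name in rzp-QAQ_SA2-5.12.0.38-quality.zip \
               rzp-TEST-5.12.0.38-quality.zip \
               rzp-ASQ_TFC-5.12.0.38-quality.zip
do
  release_of "$fl_name"
done
```

No external process is started here, which is the advantage over the sed, grep and cut variants when the name is already in a variable.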
You can reverse each line, keep the last two elements separated by "-", and then reverse again:
echo "$fl_name" | rev | cut -f1,2 -d'-' | rev
A Perl solution capturing digits and characters trailing a '-'
cat f_name | perl -lne 'chomp; /.*?-(\d+.*?)\z/g;print $1'

Deleting everything between two string matches in a file

I got this text in file.txt:
Osmun.Prez#mail.com:c7lB2m6b#3.a.a:tt_webid_v2=6990226111024612869; tt_webid=6990226111024612869; tt_csrf_token=VD5Nb_TQFH4RKhoJeSe2nzLB; R6kq3TV7=AHkh4PB6AQAA3LIS90nWf2ss0Q7ZTCQjUat4axctvhQY68DdUEz92RwpmVSX|1|0|e9d6917c2fe555827dcf5ee916ba9778079ab2a9; ttwid=1%7CAFodeNF0iZM2fyy-ZeiZ6HTpZoG_MSx6SmXHgGVQ-V4%7C1627538859%7C59ca1e4a56f9f537b55e655a6dabff88e44eb48502b164ed6b4199f5a5263cb0; passport_csrf_token_default=6f7653c3ce946a6ce5444723fb0c509b; passport_csrf_token=6f7653c3ce946a6ce5444723fb0c509b; sid_guard=0483b7d37f4e4bd20ab3046e29724798%7C1627538893%7C5184000%7CMon%2C+27-Sep-2021+06%3A08%3A13+GMT; uid_tt=27b52febe6222486b9f6b6a90ef4ffeace5ea25c09d29a1583be5a1ecf760996; uid_tt_ss=27b52febe6222486b9f6b6a90ef4ffeace5ea25c09d29a1583be5a1ecf760996; sid_tt=0483b7d37f4e4bd20ab3046e29724798; sessionid=0483b7d37f4e4bd20ab3046e29724798; sessionid_ss=0483b7d37f4e4bd20ab3046e29724798; store-idc=maliva; store-country-code=us; odin_tt=294845c8f7711db177f7c549a9f44edb1555031b27a2a485df809cd92c4e544ac0772bf462df5b7a100f6e488c45303cd62df3b6b950f0842520cd887850137b035d990f29cc8b752765e594560c977f; cmpl_token=AgQQAPNSF-RMpbE89z5HYF0_-2PcrxjXf4fZYP5_ZA
How can I delete everything from :tt_ to _ZA (the first and only instance) in file.txt, keeping only Osmun.Prez#mail.com:c7lB2m6b#3.a.a, using bash on Linux?
Thank you
Something like:
sed -i "s/:tt_.*//" file.txt
if you want to edit the file in place. If not, remove the -i switch.
The sed command means: in each line of file.txt, replace (s) the pattern :tt_ and all the characters that follow it (.*) with an empty string (//).
Or the command:
sed -i "s/:tt_.*_ZA//" file.txt
which is closer to what you asked for, but produces the same output on your data.
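The two commands only differ when something follows _ZA on the line. A short sketch with a made-up sample line (the variable names and the sample text are assumptions, not the data from the question):

```shell
# Hypothetical sample with extra text after the _ZA marker.
sample='user#mail.com:secret:tt_webid=1; cmpl_token=abc_ZA trailing text'

# Deletes from :tt_ to the end of the line.
to_end=$(printf '%s\n' "$sample" | sed 's/:tt_.*//')

# Deletes from :tt_ through the last _ZA and keeps whatever follows.
to_marker=$(printf '%s\n' "$sample" | sed 's/:tt_.*_ZA//')

printf '%s\n' "$to_end" "$to_marker"
```

Because .* is greedy, the second form removes everything up to the last _ZA on the line, not the first.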
Use pattern substitution:
i=$(cat file.txt)
echo "${i/:tt*_ZA}"
Assuming the general requirement is to remove everything after the 2nd : ...
Sample data:
$ cat file.txt
Osmun.Prez#mail.com:c7lB2m6b#3.a.a:tt_webid_v ... to end of line
some.one#home.com:B52_m6b#9_az.more.stuff:delete from here ... to end of line
One sed idea:
$ sed -En 's/^([^:]*:[^:]*).*$/\1/p' file.txt
Osmun.Prez#mail.com:c7lB2m6b#3.a.a
some.one#home.com:B52_m6b#9_az.more.stuff
Using awk
awk 'BEGIN{FS=OFS=":"}{print $1,$2}'
Using : as the delimiter, it is easy to extract the columns before :tt
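A minimal sketch of the awk approach reading from standard input; first_two_fields is a hypothetical name, and the sketch assumes that neither of the first two fields contains a colon:

```shell
# first_two_fields (hypothetical helper): print the first two
# colon-separated fields, joined again with a colon.
first_two_fields() {
  awk 'BEGIN { FS = OFS = ":" } { print $1, $2 }'
}

printf '%s\n' 'Osmun.Prez#mail.com:c7lB2m6b#3.a.a:tt_webid_v2=1; tt_webid=2' |
  first_two_fields
```

To run it on the file from the question, redirect the file into the function, e.g. first_two_fields < file.txt.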
This deletes all chars from ":tt_" to the last "_ZA", inclusive, in file.txt
Mac_3.2.57$cat file.txt | sed 's/\(\)[:]tt.*_ZA\(.*\)/\1\2/'
Osmun.Prez#mail.com:c7lB2m6b#3.a.a
Mac_3.2.57$
Or, if it is always the first 2 values separated by a colon (as per your example):
cat file.txt | cut -f1,2 -d':'

Regex Pattern Replace

So i wanted to replace the following
<duration>89</duration>
with
(Expected result, or at least it should become this:)
\n<duration>89</duration>
So basically, replace every < with \n< using a regex. So I figured:
sed -e 's/<[^/]/\n</g'
Only problem it obviously outputs
\n<uration>89</duration>
Which brings me to my question: how can I tell the regex to match a character that follows < (and is not /) without replacing it, so that I get my expected result?
Try this:
sed -e 's/<[^/]/\\n&/g' file
or
sed -e 's/<[^/]/\n&/g' file
&: refer to that portion of the pattern space which matched
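A short sketch of the & form with the literal backslash-n variant; prefix_open_tags is a hypothetical name. With \\n the replacement inserts the two characters \ and n. With a single \n, GNU sed inserts a real newline instead, and other sed implementations may behave differently.

```shell
# prefix_open_tags (hypothetical helper): put a literal "\n" in front of
# every opening tag. & stands for the whole match, so the character after
# "<" is kept instead of being replaced.
prefix_open_tags() {
  sed 's/<[^/]/\\n&/g'
}

escaped=$(printf '%s\n' '<duration>89</duration>' | prefix_open_tags)
printf '%s\n' "$escaped"
```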
It can be nicely done with awk:
echo '<duration>89</duration>' | awk '1' RS='<' ORS='\n<'
RS='<' sets the input record separator to <
ORS='\n<' sets the output record separator to \n<
1 always evaluates to true. A true condition without a subsequent action tells awk to print the record.
echo "<duration>89</duration>" | sed -E 's/<([^\/])/\\n<\1/g'
should do it.
Sample Run
$ echo "<duration>89</duration>
> <tag>Some Stuff</tag>"| sed -E 's/<([^\/])/\\n<\1/g'
\n<duration>89</duration>
\n<tag>Some Stuff</tag>
Your statement is almost correct, with one small problem: sed replaces the entire match, including the character matched by [^/]. You need to preserve that character, so you can try either of the following two commands:
sed -e 's/<\([^/]\)/\n<\1/g' file
or as pointed by Cyrus
sed -e 's/<[^/]/\n&/g' file
Cheers!
echo '<duration>89</duration>' | awk '{sub(/<dur/,"\\n<dur")}1'
\n<duration>89</duration>

Sed or Awk or Perl substitution in a sentence

I need to make a substitution using sed or another program. I have these patterns <ehh> <mmm> <mhh> repeated at the beginning of a sentence, and I need to replace them with nothing.
I am trying this:
echo "$line" | sed 's/<[a-zA-z]+>//g'
But I get the same result, nothing changes. Anyone can help?
Thank you!
For me, for the test file
<ahh> test
<mmm>test 1
the following
sed 's/^<[a-zA-Z]\+>//g' testfile
produces
test
test 1
which seems to be what you want. Note that for basic regular expressions, you use \+ whereas for extended regular expressions, you use + (and need to use the -r switch for sed).
NB: I added a ^ to the pattern since you said: at the beginning of the line.
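To make the basic versus extended difference concrete, here is a small sketch; the function names are hypothetical. It uses \{1,\} in the basic form, which is the POSIX spelling of "one or more" (\+ is a GNU extension).

```shell
# Basic regular expressions: "one or more" is written \{1,\}.
strip_tag_bre() {
  sed 's/^<[a-zA-Z]\{1,\}>//'
}

# Extended regular expressions (-E): a plain + means "one or more".
strip_tag_ere() {
  sed -E 's/^<[a-zA-Z]+>//'
}

printf '%s\n' '<ahh> test' '<mmm>test 1' | strip_tag_bre
printf '%s\n' '<ahh> test' '<mmm>test 1' | strip_tag_ere

# Without -E an unescaped + is a literal plus sign, which is why the
# attempt in the question changed nothing: it only matches text like "<a+>".
printf '%s\n' '<a+>x' | sed 's/<[a-zA-Z]+>//'
```

Both functions leave the space after <ahh> in place; add [[:space:]]* after the > if it should go as well.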
echo '&lt;ehh&gt; &lt;mmm&gt; &lt;mhh&gt;blabla bla' | \
sed 's/^\([[:space:]]*\&lt;[a-zA-Z]\{3\}\&gt;\)\{1,\}//'
This removes all leading occurrences of your pattern (including leading spaces), assuming the input contains the literal entities &lt; and &gt;.
I escape & to be safe, because of the special meaning this character has in sed (it works without the escape on my AIX).
I don't use g: there is only one beginning of line (^), so instead of repeating the substitution I repeat the group itself with \(\)\{1,\}.
If the goal is to get the last parameter from lines like this:
&lt;ahh&gt; test
&lt;mmm&gt;test 1
You can do:
awk -F\; '/^&lt;[[:alpha:]]+&gt/ {print $NF}' <<< "$line"
test
test 1
It searches for the pattern &lt;[[:alpha:]]+&gt and prints the last field on the line, with fields separated by ;

replace number in a string

I am trying to match this string
'12.34.5.6',#### OR
'12.34.5.6', #### (Note the space after the comma)
in a series of files and replace #### with 2222.
I started small and this command successfully changed 1234 to 2222
sed -i 's/'12.34.5.6\''\,1234/'12.34.5.6\''\, 2222/g' file.txt
so I moved on to work on replacing 1234 with regex, below are some of the commands i've tried but do not work.
sed -i 's/'12.34.5.6\''\,\(\s?[0-9]{4,5}\)/'12.34.5.6\''\, 2222/g' file.txt
sed -i 's/'12.34.5.6\''\,[0-9][0-9][0-9][0-9][0-9]?/'12.34.5.6\''\, 2222/g' file.txt
Can someone help me out with this or give some pointers?
sed -r "s/('12[.]34[.]5[.]6',[ ]?)[0-9]{4}/\\12222/g"
This might do the trick:
sed -E "s/('12.34.5.6',\s?)[0-9]{4,5}/\12222/g"
Examples:
$ echo "'12.34.5.6', 2134" | sed -E "s/('12.34.5.6',\s?)[0-9]{4,5}/\12222/g"
'12.34.5.6', 2222
$ echo "'12.34.5.6',9230" | sed -E "s/('12.34.5.6',\s?)[0-9]{4,5}/\12222/g"
'12.34.5.6',2222
Explanation:
With -E we ask sed to use extended regex (this is mainly a matter of taste). The beginning of the regex is fairly simple: '12.34.5.6', just matches this same string (note that an unescaped . matches any character; write \. to match only a literal dot). We then add \s for a whitespace character, followed by a ? to indicate that it is optional. This first part is enclosed in parentheses so that it can be reused in the replacement.
Then, we add the #'s to the pattern. I assumed you used #'s in place of digits, based on your attempts with [0-9]{4,5} and [0-9][0-9][0-9][0-9][0-9].
Finally, in the replacement we refer to the first parenthesized group with \1 and add our 2's: \12222 (the digits standing in for the #'s are discarded because they are outside the parentheses).
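A sketch of the same substitution with the dots escaped, so that only a literal 12.34.5.6 matches. The function name set_value is hypothetical, and a plain space is used instead of \s to stay within POSIX extended syntax.

```shell
# set_value (hypothetical helper): replace the 4 or 5 digits that follow
# '12.34.5.6', (with or without one space) by 2222.
# \1 re-inserts the quoted address, the comma and the optional space.
set_value() {
  sed -E "s/('12\.34\.5\.6', ?)[0-9]{4,5}/\12222/g"
}

printf '%s\n' "'12.34.5.6', 2134" "'12.34.5.6',9230" "'12x34.5.6',9230" |
  set_value
```

The third input line is left alone because the escaped dots no longer match the x.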
PS. Next time please format your question for better readability.
PPS. I think the real issue here is not the regex but the quote escaping in your command. Maybe take a look at [this question].
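On the quoting point, a minimal sketch of how a literal single quote can be written inside a single-quoted sed script: close the quote, add an escaped quote, and reopen it ('\''). The function name set_value_sq is hypothetical; the script uses basic regex syntax, where the interval and group operators need backslashes.

```shell
# set_value_sq (hypothetical helper): same idea as the answers above, but
# written as a single-quoted script. Each '\'' contributes one literal '.
# After the shell removes the quoting, sed sees:
#   s/\('12\.34\.5\.6', \{0,1\}\)[0-9]\{4,5\}/\12222/g
set_value_sq() {
  sed 's/\('\''12\.34\.5\.6'\'', \{0,1\}\)[0-9]\{4,5\}/\12222/g'
}

printf '%s\n' "'12.34.5.6',1234" "'12.34.5.6', 12345" | set_value_sq
```

Wrapping the script in double quotes, as the answers above do, avoids the '\'' sequences altogether, at the cost of having to watch for characters the shell expands inside double quotes.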