Deleting everything between two string matches in a file - regex

I have this text in file.txt:
Osmun.Prez#mail.com:c7lB2m6b#3.a.a:tt_webid_v2=6990226111024612869; tt_webid=6990226111024612869; tt_csrf_token=VD5Nb_TQFH4RKhoJeSe2nzLB; R6kq3TV7=AHkh4PB6AQAA3LIS90nWf2ss0Q7ZTCQjUat4axctvhQY68DdUEz92RwpmVSX|1|0|e9d6917c2fe555827dcf5ee916ba9778079ab2a9; ttwid=1%7CAFodeNF0iZM2fyy-ZeiZ6HTpZoG_MSx6SmXHgGVQ-V4%7C1627538859%7C59ca1e4a56f9f537b55e655a6dabff88e44eb48502b164ed6b4199f5a5263cb0; passport_csrf_token_default=6f7653c3ce946a6ce5444723fb0c509b; passport_csrf_token=6f7653c3ce946a6ce5444723fb0c509b; sid_guard=0483b7d37f4e4bd20ab3046e29724798%7C1627538893%7C5184000%7CMon%2C+27-Sep-2021+06%3A08%3A13+GMT; uid_tt=27b52febe6222486b9f6b6a90ef4ffeace5ea25c09d29a1583be5a1ecf760996; uid_tt_ss=27b52febe6222486b9f6b6a90ef4ffeace5ea25c09d29a1583be5a1ecf760996; sid_tt=0483b7d37f4e4bd20ab3046e29724798; sessionid=0483b7d37f4e4bd20ab3046e29724798; sessionid_ss=0483b7d37f4e4bd20ab3046e29724798; store-idc=maliva; store-country-code=us; odin_tt=294845c8f7711db177f7c549a9f44edb1555031b27a2a485df809cd92c4e544ac0772bf462df5b7a100f6e488c45303cd62df3b6b950f0842520cd887850137b035d990f29cc8b752765e594560c977f; cmpl_token=AgQQAPNSF-RMpbE89z5HYF0_-2PcrxjXf4fZYP5_ZA
How can I delete everything from :tt_ to _ZA (the first and only occurrence) in file.txt, keeping only Osmun.Prez#mail.com:c7lB2m6b#3.a.a, using bash on Linux?
Thank you

Something like:
sed -i "s/:tt_.*//" file.txt
if you want to edit the file in place. If not, remove the -i switch.
The sed command means: in each line of file.txt, replace (s) the pattern :tt_ and every character after it (.*) with an empty string (//).
Or the command:
sed -i "s/:tt_.*_ZA//" file.txt
which matches your request more closely and gives the same output here, because the line ends with _ZA.
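`sed -i` is not portable: GNU sed accepts a bare `-i`, while BSD/macOS sed expects a backup-suffix argument (possibly empty, as in `-i ''`). A minimal Bash sketch that avoids `-i` altogether, assuming `mktemp` is available; the function name `strip_tt_section` is hypothetical:

```shell
# Hypothetical helper: same substitution as `sed -i "s/:tt_.*_ZA//" file`,
# but without -i, which GNU sed and BSD/macOS sed spell differently.
strip_tt_section() {
    local file=$1 tmp status=0
    tmp=$(mktemp) || return 1
    # Write the edited text to a temporary file, then copy it back over the
    # original (cat > keeps the original file's permissions and inode).
    { sed 's/:tt_.*_ZA//' "$file" > "$tmp" && cat "$tmp" > "$file"; } || status=$?
    rm -f "$tmp"
    return "$status"
}
```

Call it as `strip_tt_section file.txt`. Copying back with `cat` rather than `mv` keeps the original file's permissions.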

Use Bash pattern substitution:
i=$(cat file.txt)
echo "${i/:tt*_ZA}"
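Because `*` in a Bash pattern also matches newlines, applying `${i/:tt*_ZA}` to the contents of a multi-line file would delete everything from the first `:tt` in the file to the last `_ZA`, across line boundaries. A sketch that applies the same substitution to each line separately (the function name `strip_tt_lines` is hypothetical):

```shell
# Hypothetical helper: run the Bash pattern substitution on every line read
# from stdin instead of on the whole file at once.
strip_tt_lines() {
    local line
    # `|| [[ -n $line ]]` also handles a last line with no trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        # ${var/pattern/} deletes the longest match of the glob ":tt*_ZA".
        printf '%s\n' "${line/:tt*_ZA/}"
    done
}
```

For example: `strip_tt_lines < file.txt`.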

Assuming the general requirement is to remove everything from the 2nd : onward...
Sample data:
$ cat file.txt
Osmun.Prez#mail.com:c7lB2m6b#3.a.a:tt_webid_v ... to end of line
some.one#home.com:B52_m6b#9_az.more.stuff:delete from here ... to end of line
One sed idea:
$ sed -En 's/^([^:]*:[^:]*).*$/\1/p' file.txt
Osmun.Prez#mail.com:c7lB2m6b#3.a.a
some.one#home.com:B52_m6b#9_az.more.stuff

Using awk:
awk 'BEGIN{FS=OFS=":"}{print $1,$2}' file.txt
With : as the delimiter, it is easy to extract the fields before :tt.
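One caveat with `print $1,$2`: a line that contains no `:` is printed with a trailing `:`, because `print` always inserts `OFS` between its arguments. A small variation (the function name `first_two_fields` is hypothetical) that prints such lines unchanged:

```shell
# Hypothetical helper: keep the first two ":"-separated fields of each line.
# Lines with no ":" are printed as-is instead of gaining a trailing ":".
first_two_fields() {
    awk 'BEGIN { FS = OFS = ":" } { print (NF > 1 ? $1 OFS $2 : $0) }' "$@"
}
```

It reads the files given as arguments, or stdin when there are none.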

This deletes everything from ":tt" to the last "_ZA", inclusive, and prints the result (file.txt itself is not modified):
Mac_3.2.57$ sed 's/\(\)[:]tt.*_ZA\(.*\)/\1\2/' file.txt
Osmun.Prez#mail.com:c7lB2m6b#3.a.a
Mac_3.2.57$

Or, if it is always the first 2 values, separated by a colon (as in your example):
cut -d':' -f1,2 file.txt

Related

A sed command to swap first and last character of each line

I want to write a one-liner sed command to swap the first and last characters of every line of a file. The command below is not working:
sed 's/\(.\)\(.+\)\(.\)/\3\2\1/' input.txt
I even tried adding start-of-line and end-of-line anchors:
sed 's/^\(.\)\(.+\)\(.\)$/\3\2\1/' input.txt
It doesn't seem to match anything in the file.
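With extended regular expressions (-E), + works unescaped:
sed -E 's/(.)(.+)(.)/\3\2\1/' input.txt
In basic regular expressions you need to escape the + (\+ is a GNU sed extension):
sed 's/^\(.\)\(.\+\)\(.\)$/\3\2\1/' input.txt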
If you'd like to try something else, here is a GNU awk version:
awk '{a=$1;$1=$NF;$NF=a}1' FS= OFS= input.txt
This sets a to the first character, then sets the first to the last, and the last to a.
It needs GNU awk, since POSIX leaves the behavior of an empty FS (splitting into single characters) unspecified.
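If GNU awk is not available, the same swap can be sketched with the POSIX string functions `length()` and `substr()` instead of an empty `FS`. The function name `swap_ends_awk` is hypothetical, and multibyte characters are only handled correctly by awks that count characters rather than bytes:

```shell
# Hypothetical helper: swap the first and last characters of every line
# using only POSIX awk string functions (no empty FS needed).
swap_ends_awk() {
    awk '{
        n = length($0)
        if (n > 1)   # lines of 0 or 1 characters are left alone
            $0 = substr($0, n, 1) substr($0, 2, n - 2) substr($0, 1, 1)
        print
    }' "$@"
}
```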
This is portable:
echo abcd | sed 's/^\(.\)\(.*\)\(.\)$/\3\2\1/'
Using .* instead of .+, it prints
dbca
It also works with a two-character line such as ad:
echo ad | sed 's/^\(.\)\(.*\)\(.\)$/\3\2\1/'
prints
da
Not every sed supports .+ (for example, it didn't work on OS X), so I recommend using .* or emulating .+ with ..*, as in:
echo ad | sed 's/^\(.\)\(..*\)\(.\)$/\3\2\1/'
prints
ad   # not swapped: with ..* the line needs at least three characters to match
echo 'are' | sed 's/\(.\)\(.*\)\(.\)/\3\2\1/'
There is no need for ^ or $, because sed matches the longest possible string by default (here, the whole line).
Use * instead of +, because with + the line needs at least 3 characters to match, whereas a 2-character line should still have its first and last characters swapped.
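The same swap can also be done without sed in pure Bash, using substring expansion; strings shorter than two characters are returned unchanged. A sketch with a hypothetical function name `swap_ends` (the `${s: -1}` form needs the space, otherwise it is read as a default-value expansion):

```shell
# Hypothetical helper: swap the first and last characters of one string.
swap_ends() {
    local s=$1
    if (( ${#s} < 2 )); then
        printf '%s\n' "$s"   # nothing to swap
        return
    fi
    # last char + middle part + first char
    printf '%s\n' "${s: -1}${s:1:${#s}-2}${s:0:1}"
}
```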

Remove everything after 2nd occurrence in a string in unix

I would like to remove everything after the 2nd occurrence of a particular
pattern in a string. What is the best way to do it in Unix? What is the most elegant and simple method to achieve this: sed, awk, or just Unix commands like cut?
My input would be
After-u-math-how-however
Output should be
After-u
Everything from the 2nd - onward should be stripped out. The regex should also handle
zero occurrences of the pattern: with zero or one occurrence the string should be left unchanged, and
from the 2nd occurrence onward everything should be removed.
So if the input is as follows
After
Output should be
After
Something like this would do it.
echo "After-u-math-how-however" | cut -f1,2 -d'-'
This will split up (cut) the string into fields, using a dash (-) as the delimiter. Once the string has been split into fields, cut will print the 1st and 2nd fields.
This might work for you (GNU sed):
sed 's/-[^-]*//2g' file
You could use the following regex to select what you want:
^[^-]*-\?[^-]*
For example:
echo "After-u-math-how-however" | grep -o "^[^-]*-\?[^-]*"
Results:
After-u
@EvanPurkisher's cut -f1,2 -d'-' solution is IMHO the best one, but since you asked about sed and awk:
With GNU sed for -r
$ echo "After-u-math-how-however" | sed -r 's/([^-]+-[^-]*).*/\1/'
After-u
With GNU awk for gensub():
$ echo "After-u-math-how-however" | awk '{$0=gensub(/([^-]+-[^-]*).*/,"\\1",1)}1'
After-u
Can be done with non-GNU sed using \( and *, and with non-GNU awk using match() and substr() if necessary.
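Those portable variants might look like the following sketch (function names are hypothetical). The sed version uses only basic-regex `\(` `\)` and `*`; the awk version uses `match()`, which sets `RSTART` and `RLENGTH`, together with `substr()`:

```shell
# Hypothetical helpers: keep the text before the 2nd "-" using only POSIX
# features (no -r/-E, no gensub(), no GNU `2g` flag).
before_second_dash_sed() {
    # Matches only when a 2nd "-" exists; other lines are left unchanged.
    sed 's/^\([^-]*-[^-]*\)-.*/\1/' "$@"
}

before_second_dash_awk() {
    awk '{
        # Keep the leading "x-y" part if the line has a "-" at all.
        if (match($0, /^[^-]*-[^-]*/))
            $0 = substr($0, RSTART, RLENGTH)
        print
    }' "$@"
}
```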
awk -F - '{print $1 (NF>1? FS $2 : "")}' <<<'After-u-math-how-however'
Split the line into fields based on the field separator - (option -F -), which is accessible as the special variable FS inside the awk program.
Always print the 1st field (print $1), followed by:
If there's more than 1 field (NF>1), append FS (i.e., -) and the 2nd field ($2)
Otherwise: append "", i.e.: effectively only print the 1st field (which in itself may be empty, if the input is empty).
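The same idea generalizes to "keep everything before the Nth occurrence". A sketch, assuming a single-character delimiter (which POSIX awk treats literally when used as `FS`); the function name `before_nth_delim` and its argument order are hypothetical:

```shell
# Hypothetical generalization: keep everything before the Nth occurrence of a
# single-character delimiter; lines with fewer occurrences pass through intact.
before_nth_delim() {
    local delim=$1 n=$2
    shift 2
    awk -v d="$delim" -v n="$n" '
        BEGIN { FS = d }
        {
            out = $1                              # always keep field 1
            for (i = 2; i <= n && i <= NF; i++)   # then fields 2..n, if present
                out = out d $i
            print out
        }' "$@"
}
```

For this question, `before_nth_delim - 2` reproduces the output above.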
This can be done in pure bash (which means no fork, no external process). Read into an array split on '-', then slice the array:
$ IFS=-
$ read -ra val <<< After-u-math-how-however
$ echo "${val[*]}"
After-u-math-how-however
$ echo "${val[*]:0:2}"
After-u
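Note that `IFS=-` as typed above stays in effect for the rest of the shell session. A sketch that confines the change to a function with `local IFS` (the name `first_two_dash_fields` is hypothetical):

```shell
# Hypothetical helper: split on "-", then rejoin the first two elements.
first_two_dash_fields() {
    local IFS=-        # used by read for splitting and by "${val[*]}" for joining
    local -a val
    read -ra val <<< "$1"
    printf '%s\n' "${val[*]:0:2}"
}
```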
awk '$0 = $2 ? $1 FS $2 : $1' FS=-
Result (for the inputs After-u-math-how-however and After):
After-u
After
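This will do it in awk (the loop also stops at the last field, so a line without - gets no trailing -):
echo "After" | awk -F "-" '{printf "%s", $1; for (i = 2; i <= 2 && i <= NF; i++) printf "-%s", $i; printf "\n"}'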

grep - regular expression - match till a specific word

Let's say I have a file with lines like this:
abcefghijklxyz
abcefghijkl
I want to get only the string between abc and the end of the line. End of the line can be defined as the normal end of line or the string xyz.
My question is
How can I get only the matched string using grep and regular expressions? For example, the expected output for the two lines shown above would be
efghijkl
efghijkl
I don't want the starting and ending markers.
What I have tried so far:
grep -oh "abc.*xyz"
I use Ubuntu 13.04 and Bash shell.
This line chops off the leading abc and the trailing xyz (if there is one) and gives you the part you need:
grep -oP '^abc\K.*?(?=xyz$|$)'
With your example:
kent$ echo "abcefghijklxyz
abcefghijkl"|grep -oP '^abc\K.*?(?=xyz$|$)'
efghijkl
efghijkl
Another example, with xyz in the middle of the text:
kent$ echo "abcefghijklxyz
abcefghijkl
abcfffffxyzbbbxyz
abcffffxyzbbb"|grep -oP '^abc\K.*?(?=xyz$|$)'
efghijkl
efghijkl
fffffxyzbbb
ffffxyzbbb
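`-P` is a GNU grep option that not every grep provides. Where it is missing, a pure Bash sketch can mimic this answer's behavior: keep lines starting with `abc`, drop that prefix, and drop one trailing `xyz`. The function name is hypothetical, and unlike `grep -o` it also prints empty results (for a line that is just `abc` or `abcxyz`):

```shell
# Hypothetical helper mirroring grep -oP '^abc\K.*?(?=xyz$|$)'.
between_abc_and_xyz() {
    local line s
    while IFS= read -r line || [[ -n $line ]]; do
        [[ $line == abc* ]] || continue   # like ^abc: skip other lines
        s=${line#abc}                     # remove the leading abc
        s=${s%xyz}                        # remove a trailing xyz, if present
        printf '%s\n' "$s"
    done
}
```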
Using sed:
sed -n '/abc/{s/.*abc\(.*\)/\1/;s/xyz.*//;p}' input
Produces:
efghijkl
efghijkl
Use a look-behind like this:
$ grep -Po '(?<=abc)[^x]*' file
efghijkl
efghijkl
It fetches everything after abc up to the first x (so it stops early if the wanted text itself contains an x).
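Based on Kent's answer (not to copy, but for completeness), you can grep everything between abc and xyz (or the end of the line). The lazy .*? stops before xyz; a greedy .* would run to the end of the line and keep the xyz:
$ grep -Po '(?<=abc).*?(?=xyz|$)' file
efghijkl
efghijkl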
Or you can just remove what you do not like:
awk '/^abc/{sub(/^abc/,x);sub(/xyz.*$/,x)}1' file
efghijkl
efghijkl
xyz.*$ represents everything from xyz to the end of the line.

cannot match multiple occurrences of character in sed regexp

I am trying to remove all the As at the end of a line.
alice$ cat pokusni
SALALAA
alice$ sed -n 's/\(.*\)A$/\1/p' pokusni
SALALA
one A is removed just fine
alice$ sed -n 's/\(.*\)A+$/\1/p' pokusni
alice$ sed -n 's/\(.*\)AA*$/\1/p' pokusni
SALALA
Multiple occurrences are not removed :(
I am probably making some very silly mistake; any help? Thanks.
Try this one: sed -n 's/\(.*[^A]\)AA*$/\1/p' pokusni ([^A] forces the group to end before the run of As).
Why + does not work:
Because in a basic regular expression it is just a normal character.
Why 's/\(.*\)AA*$/\1/p' does not work:
Because .* is greedy: it consumes as many As as it can, leaving only the single A required by AA*, so A* matches nothing.
This might work for you:
sed -n 's/AA*$//p' file
This deletes an A followed by zero or more As at the end of the line (that is, all trailing As); with -n and the p flag, only the lines that changed are printed.
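For comparison, a pure Bash sketch that strips all trailing `A`s by removing one at a time with `${line%A}`. Unlike `sed -n ...p`, it prints every line, changed or not; the function name `strip_trailing_a` is hypothetical:

```shell
# Hypothetical helper: strip every trailing "A" from each input line.
strip_trailing_a() {
    local line
    while IFS= read -r line || [[ -n $line ]]; do
        # ${line%A} removes one final "A"; repeat until none is left.
        while [[ $line == *A ]]; do
            line=${line%A}
        done
        printf '%s\n' "$line"
    done
}
```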
N.B.
sed -n 's/A*$//p' file
would produce the correct string; however, since A*$ also matches the empty string, the substitution succeeds on every line, so p prints lines with no trailing A too (false positives).
Using awk
awk '{sub(/AA$/,"A")}1' pokusni
SALALA
EDIT
Correct version, removing all As from the end of the line:
awk '{sub(/A*$/,x)}1' pokusni
You can use perl:
> echo "SALALAA" | perl -lne 'if(/(.*?)[A]+$/){print $1}else{print}'
SALAL

Using sed to find and replace within matched substrings

I'd like to use sed to process a property file such as:
java.home=/usr/bin/java
groovy-home=/usr/lib/groovy
workspace.home=/build/me/my-workspace
I'd like to replace the . and - characters with _, but only up to the = token. The output would be
java_home=/usr/bin/java
groovy_home=/usr/lib/groovy
workspace_home=/build/me/my-workspace
I've tried various approaches including using addresses but I keep failing. Does anybody know how to do this?
What about...
$ echo foo.bar=/bla/bla-bla | sed -e 's/\([^-.]*\)[-.]\([^-.]*=.*\)/\1_\2/'
foo_bar=/bla/bla-bla
This won't work for the case where you have more than one dot or dash on the left, though. I'll have to think about it further.
awk makes life easier in this case:
awk -F= -vOFS="=" '{gsub(/[.-]/,"_",$1)}1' file
Here you go:
kent$ echo "java.home=/usr/bin/java
groovy-home=/usr/lib/groovy
workspace.home=/build/me/my-workspace"|awk -F= -vOFS="=" '{gsub(/[.-]/,"_",$1)}1'
java_home=/usr/bin/java
groovy_home=/usr/lib/groovy
workspace_home=/build/me/my-workspace
If you really want to do it with sed (GNU sed, which provides the e flag):
sed -r 's/([^=]*)(.*)/echo -n \1 \|sed -r "s:[-.]:_:g"; echo -n \2/ge' file
same example:
kent$ echo "java.home=/usr/bin/java
groovy-home=/usr/lib/groovy
workspace.home=/build/me/my-workspace"|sed -r 's/([^=]*)(.*)/echo -n \1 \|sed -r "s:[-.]:_:g"; echo -n \2/ge'
java_home=/usr/bin/java
groovy_home=/usr/lib/groovy
workspace_home=/build/me/my-workspace
In this case I would use AWK instead of sed:
awk -F"=" '{gsub("\\.|-","_",$1); print $1"="$2;}' file.properties
Output:
java_home=/usr/bin/java
groovy_home=/usr/lib/groovy
workspace_home=/build/me/my-workspace
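Because this version prints only `$1"="$2`, anything after a second `=` in a value is lost. A pure Bash sketch (hypothetical function name `normalize_keys`) that splits each line at its first `=` only and rewrites just the key:

```shell
# Hypothetical helper: replace . and - with _ in the key only, splitting each
# line at its first "=" so values that contain "=" survive intact.
normalize_keys() {
    local line key
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line == *=* ]]; then
            key=${line%%=*}                        # text before the first "="
            key=${key//[.-]/_}                     # every . or - in the key becomes _
            printf '%s=%s\n' "$key" "${line#*=}"   # value after the first "="
        else
            printf '%s\n' "$line"                  # no "=": print unchanged
        fi
    done
}
```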
This might work for you (GNU sed):
sed -r 's/=/\n&/;h;y/-./__/;G;s/\n.*\n//' file
"You wait ages for a bus..."
This works with any number of dots and hyphens in the line and does not require GNU sed:
sed 'h; s/.*=//; x; s/=.*//; s/[.-]/_/g; G; s/\n/=/' < data
Here's how:
h: save a copy of the line in the hold space
s: throw away everything up to and including the last = in the pattern space
x: swap the pattern and hold spaces
s: blow away everything from the first = onward in the pattern space
s: replace dots and hyphens with underscores
G: append the hold space to the pattern space, separated by a newline
s: replace that newline with an = to glue it all back together
Other way using sed
sed -re 's/(.*)([.-])(.*)=(.*)/\1_\3=\4/g' temp.txt
Output
java_home=/usr/bin/java
groovy_home=/usr/lib/groovy
workspace_home=/build/me/my-workspace
In case there is more than one . or - on the left-hand side, use this instead (t a loops back until no more substitutions are made):
sed -re ':a; s/^([^.-]+)([\.-])(.*)=/\1_\3=/1;t a' temp.txt