Matching zero or more characters in sed

I was practicing some commands using sed when I was confused by the output of the following command:
echo 'first:second' | sed 's_[^:]*_(&)_g'
My question is: Why would this command only wrap the strings "first" and "second" in parentheses?
Shouldn't the colon be wrapped too since I specified "zero or more non-colons" in my regex condition?
Please clarify.

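You use
[^:]
which matches any single character except :.
So [^:]* matches a run of non-colon characters (possibly an empty one) and can never include the colon itself. What you see is the normal behavior.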
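To see this concretely, here is a small sketch. The function names are made up for illustration, and the output shown is what GNU sed produces; GNU sed does not accept an empty match immediately after a previous match, which is why no () appears right before the colon.

```shell
# Hypothetical helpers, named only for this illustration.

# Wrap every run of non-colon characters, as in the question.
wrap_non_colons() {
    printf '%s\n' "$1" | sed 's_[^:]*_(&)_g'
}

# Wrap the colons themselves: the pattern has to match ':' explicitly.
wrap_colons() {
    printf '%s\n' "$1" | sed 's_:_(&)_g'
}

wrap_non_colons 'first:second'   # (first):(second)
wrap_colons 'first:second'       # first(:)second
```

The colon is only ever wrapped when the pattern itself is able to match a colon.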

Related

regex in sed removing only the first occurrence from every line

I have the following file I would like to clean up
cat file.txt
MNS:N+ GYPA*01 or GYPA*M
MNS:M+ GYPA*02 or GYPA*N
MNS:Mc GYPA*08 or GYP*Mc
MNS:Vw GYPA*09 or GYPA*Vw
MNS:Mg GYPA*11 or GYPA*Mg
MNS:Vr GYPA*12 or GYPA*Vr
My desired output is:
MNS:N+ GYPA*01 or GYPA*M
MNS:M+ GYPA*02 or GYPA*N
MNS:Mc GYPA*08 or GYP*Mc
MNS:Vw GYPA*09 or GYPA*Vw
MNS:Mg GYPA*11 or GYPA*Mg
MNS:Vr GYPA*12 or GYPA*Vr
I would like to remove everything between ":" and the first occurrence of "or".
I tried sed 's/MNS:d*?or /MNS:/g', but it removes the second "or" as well.
I tried every option in https://www.geeksforgeeks.org/sed-command-in-linux-unix-with-examples/
to no avail. Should I create alias sed='perl -pe'? It seems that sed does not properly support regex.

perl is more suitable here because we need a lazy (non-greedy) match.
perl -pe 's|(:.*?or +)(.*)|:$2|' Input_file
With .*?or the match stops at the nearest or after the colon instead of the last one on the line.

This might work for you (GNU sed):
sed '/:.*\<or\>/{s/\<or\>/\n/;s/:.*\n */:/}' file
If a line contains : followed by the word or, substitute the first occurrence of the word or with a unique delimiter (e.g. \n), then remove everything between : and that delimiter, along with the delimiter and any spaces after it.
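A minimal sketch of this marker technique, assuming GNU sed (for \< \> and \n). The input line is invented, since the text to be removed is not visible in the question, and the helper name is hypothetical. Note that this version puts the : back in the replacement and also swallows the spaces after the marker.

```shell
# Hypothetical input line; the real file's removed text is not shown in the question.
line='MNS:junk text or N+ GYPA*01 or GYPA*M'

# 1. Turn the first standalone word "or" into a newline marker.
# 2. Delete from ':' through the marker (and any spaces after it), keeping the ':'.
strip_to_first_or() {
    printf '%s\n' "$1" | sed '/:.*\<or\>/{s/\<or\>/\n/;s/:.*\n */:/}'
}

strip_to_first_or "$line"   # MNS:N+ GYPA*01 or GYPA*M
```

A newline works as the marker because a line read by sed never contains one until a command inserts it.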

Wrt I would like to remove everything between ":" and the first occurrence of "or": no, you wouldn't. The first occurrence of or in the 2nd line of the sample input is at the start of orweqqwe. The text immediately after : looks like it could be any set of characters, so couldn't it contain a standalone or, e.g. MNS:2 or eqqwe or M+ GYPA*02 or GYPA*N?
Given that, and the fact that it's apparently a fixed number of characters to be removed on every line, it seems like this is what you should really be using:
$ sed 's/:.\{14\}/:/' file
MNS:N+ GYPA*01 or GYPA*M
MNS:M+ GYPA*02 or GYPA*N
MNS:Mc GYPA*08 or GYP*Mc
MNS:Vw GYPA*09 or GYPA*Vw
MNS:Mg GYPA*11 or GYPA*Mg
MNS:Vr GYPA*12 or GYPA*Vr

If or is guaranteed to occur exactly twice per line, as in the provided example, please try (GNU sed, since \+ is an extension):
sed 's/\(MNS:\).\+ or \(.\+ or .*\)/\1\2/' file.txt
Result:
MNS:N+ GYPA*01 or GYPA*M
MNS:M+ GYPA*02 or GYPA*N
MNS:Mc GYPA*08 or GYP*Mc
MNS:Vw GYPA*09 or GYPA*Vw
MNS:Mg GYPA*11 or GYPA*Mg
MNS:Vr GYPA*12 or GYPA*Vr
Otherwise perl, which supports shortest (lazy) matching, is the better solution, as in RavinderSingh13's answer.

Vim's ex supports lazy matching with \{-}:
ex -s '+%s/:\zs.\{-}or //g|wq' input_file
The pattern :\zs.\{-}or matches the shortest run of characters after the first : up to and including the first or; \zs keeps the colon itself out of the replaced text.

Select a single character in an alphanumeric string in bash

I have an issue with string manipulation in bash. I have a list of names, each name being composed of two parts, chars and numbers: for example
abcdef01234
I want to cut the last character before the numeric part starts, in this case
f
I think there is a regular expression to help me with this but just can't figure it out. AWK/sed solutions are accepted too. Hope someone can help.
Thank you.

In bash this can be done with parameter expansion, using substring removal and string indexing, e.g.:
a=abcdef01234 # your string
tmp=${a%%[0-9]*} # remove everything from the first digit to the end
echo "${tmp:(-1)}" # output the last remaining character
Output: f
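If the script may run under a plain POSIX sh, where the ${var:offset} form is not available, the same idea can be sketched with prefix and suffix removal only. The function name is hypothetical.

```shell
# Hypothetical helper: print the last character before the first digit.
last_before_digits() {
    prefix=${1%%[0-9]*}                       # drop everything from the first digit onward
    printf '%s\n' "${prefix#"${prefix%?}"}"   # strip all but the final character
}

last_before_digits abcdef01234   # f
```

The inner ${prefix%?} is the string without its last character; removing that as a prefix leaves only the last character.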

You can use a regexp like [a-zA-Z]+([a-zA-Z])[0-9]+. If you know how to use sed, it's pretty easy.
Check https://regex101.com/r/XCkKM5/1

The first capture group will be the letter you want.
^\w+([a-zA-Z])\d+$
As a sed command (on OSX) this would be:
echo "abcdef12345" | sed -E "s#^[a-zA-Z]+([a-zA-Z])[0-9]+\$#\1#"

You could also try the following:
echo "abcdef01234" | awk '{match($0,/[a-zA-Z]+/);print substr($0,RSTART+RLENGTH-1,1)}'

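Assuming "a list of names" means a file named file, you can use grep's PCRE mode and a (positive) lookahead:
$ grep -oP "[a-z](?=[^a-z])" file
f
It prints every lowercase letter that is immediately followed by a character other than a lowercase letter, which for names like abcdef01234 is the one letter before the digits.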

Matching pattern containing parentheses with sed [duplicate]

I need to insert '--' at the beginning of a line if the line contains VARCHAR(1000).
Sample of my file is:
TRIM(CAST("AP_RQ_MSG_TYPE_ID" AS NVARCHAR(1000))) AP_RQ_MSG_TYPE_ID,
TRIM(CAST("AP_RQ_PROCESSING_CD" AS NVARCHAR(1000)))
AP_RQ_PROCESSING_CD, TRIM(CAST("AP_RQ_ACQ_INST_ID" AS NVARCHAR(11)))
AP_RQ_ACQ_INST_ID, TRIM(CAST("AP_RQ_LOCAL_TXN_TIME" AS NVARCHAR(10)))
AP_RQ_LOCAL_TXN_TIME, TRIM(CAST("AP_RQ_LOCAL_TXN_DATE" AS
NVARCHAR(10))) AP_RQ_LOCAL_TXN_DATE, TRIM(CAST("AP_RQ_RETAILER" AS
NVARCHAR(11))) AP_RQ_RETAILER,
I used this command
sed 's/\(^.*VARCHAR\(1000\).*$\)/--\1/I' *.sql
But the result is not as expected.
Does anyone have an idea what I am doing wrong?

This should do:
sed 's/.*VARCHAR(1000).*/--&/' file
The problem in your sed command is in the regex. By default sed uses BRE, in which ( and ) (the ones wrapping 1000) are just literal parentheses. You should not escape them; by escaping them you gave them their special meaning, regex grouping.
You were right to escape the first and last ( and ), since you want to reference that group later with \1. So your problem comes down to when to escape and when not to. :)
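The escaping rule flips between the two regex flavors, which a short sketch can show side by side. The sample lines and function names below are made up; -E selects ERE in both GNU and BSD sed.

```shell
# Two made-up sample lines; only the first contains VARCHAR(1000).
sample='CAST("A" AS NVARCHAR(1000)) A,
CAST("B" AS NVARCHAR(11)) B,'

# BRE (default): bare parentheses are literal characters.
prefix_bre() {
    printf '%s\n' "$sample" | sed 's/.*VARCHAR(1000).*/--&/'
}

# ERE (-E): bare parentheses group, so literal ones need backslashes.
prefix_ere() {
    printf '%s\n' "$sample" | sed -E 's/.*VARCHAR\(1000\).*/--&/'
}

prefix_bre
prefix_ere
```

Both calls print the first line with -- in front and leave the second line untouched.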

Use the following sed command (GNU sed):
sed '/VARCHAR(1000)/ s/.*/--\0/' *.sql
The s command applies only to lines containing VARCHAR(1000). It replaces the whole line (.*) with itself (\0) prefixed by --. \0 is a GNU extension; the portable spelling is &.

With awk:
awk '/VARCHAR\(1000\)/ {sub (/^/,"--")}1' infile > outfile

Extract string located after or between matched pattern(s)

Given a string "pos:665181533 pts:11360 t:11.360000 crop=720:568:0:4 some more words"
Is it possible to extract string between "crop=" and the following space using bash and grep?
So if I match "crop=" how can I extract anything after it and before the following white space?
Basically, I need "720:568:0:4" to be printed.

I'd do it this way:
grep -o -E 'crop=[^ ]+' | sed 's/crop=//'
It uses sed which is also a standard command. You can, of course, replace it with another sequence of greps, but only if it's really needed.

I would use sed as follows:
echo "pos:665181533 pts:11360 t:11.360000 crop=720:568:0:4 some more words" | sed 's/.*crop=\([0-9.:]*\)\(.*\)/\1/'
Explanation:
s/ : substitute
.*crop= : everything up to and including "crop="
\([0-9.:]*\) : a capture group that matches only digits, '.' and ':'
\(.*\) : a second group matching everything else (a plain .* would do, since \2 is never used)
/\1/ : replace the whole match with the contents of the first capture group
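One caveat: when a line has no crop= at all, the substitution does nothing and sed prints the line unchanged. A hedged variant, using -n together with the p flag so that only successful substitutions are printed, might look like this (the function name is hypothetical):

```shell
# Hypothetical helper: print only the crop value, or nothing if there is none.
crop_value() {
    printf '%s\n' "$1" | sed -n 's/.*crop=\([0-9.:]*\).*/\1/p'
}

crop_value 'pos:665181533 pts:11360 t:11.360000 crop=720:568:0:4 some more words'   # 720:568:0:4
crop_value 'no crop here'   # prints nothing
```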

awk has no s/// syntax or \1 backreferences, but match() and substr() do the job:
awk 'match($0, /crop=[0-9:]+/) { print substr($0, RSTART + 5, RLENGTH - 5) }'

Yet another way, with bash parameter expansion:
PAT="pos:665181533 pts:11360 t:11.360000 crop=720:568:0:4 some more words"
RES=${PAT#*crop=}
echo "${RES%% *}"
First, # removes the shortest prefix matching *crop=, i.e. everything up to and including crop=.
Then %% removes the longest suffix matching ' *', i.e. everything from the first remaining space to the end.

using sed to copy lines and delete characters from the duplicates

I have a file that looks like this:
#"Afghanistan.png",
#"Albania.png",
#"Algeria.png",
#"American_Samoa.png",
I want it to look like this
#"Afghanistan.png",
#"Afghanistan",
#"Albania.png",
#"Albania",
#"Algeria.png",
#"Algeria",
#"American_Samoa.png",
#"American_Samoa",
I thought I could use sed to do this but I can't figure out how to store something in a buffer and then modify it.
Am I even using the right tool?
Thanks

You don't have to get tricky with regular expressions and replacement strings: use sed's p command to print the line intact, then modify the line and let it print implicitly:
sed 'p; s/\.png//'

Glenn Jackman's response is OK, but it also doubles the rows which do not match the expression.
This one, instead, doubles only the rows which match the expression:
sed -n 'p; s/\.png//p'
Here, -n stands for "print nothing unless explicitly printed". The first p prints every line once, and the p flag in s/\.png//p prints the line again only if a substitution was made.
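The difference only shows up on input that contains a line without .png. Here is a small sketch with made-up input and hypothetical function names:

```shell
# Made-up input: two .png entries and one line that should not be duplicated.
input='#"Albania.png",
#"Notes.txt",
#"Algeria.png",'

# Duplicates every line, matching or not.
dup_all() {
    printf '%s\n' "$input" | sed 'p; s/\.png//'
}

# Prints each line once, plus a stripped copy only where the substitution happened.
dup_matching() {
    printf '%s\n' "$input" | sed -n 'p; s/\.png//p'
}

dup_all        # 6 lines: the Notes.txt line appears twice
dup_matching   # 5 lines: the Notes.txt line appears once
```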

That is pretty easy to do with sed, and you do not even need to use the hold space (the sed auxiliary buffer). Given the input file below:
$ cat input
#"Afghanistan.png",
#"Albania.png",
#"Algeria.png",
#"American_Samoa.png",
you should use this command:
sed 's/#"\([^.]*\)\.png",/&\
#"\1",/' input
The result:
$ sed 's/#"\([^.]*\)\.png",/&\
#"\1",/' input
#"Afghanistan.png",
#"Afghanistan",
#"Albania.png",
#"Albania",
#"Algeria.png",
#"Algeria",
#"American_Samoa.png",
#"American_Samoa",
This command is just a substitution command (s///). It matches anything starting with #" followed by non-period chars ([^.]*) and then by .png",. The non-period chars before .png", are enclosed in the group delimiters \( and \), so we can later refer to what this group matched. So, this is the to-be-replaced regular expression:
#"\([^.]*\)\.png",
Next comes the replacement part of the command. The & inserts everything that was matched by #"\([^.]*\)\.png", into the changed content. If it were the only element of the replacement part, nothing would change in the output. However, following the & there is a newline character - represented by the backslash \ followed by an actual newline - and on the new line we add the #" string followed by the content of the first group (\1) and then the string ",.
This is just a brief explanation of the command. Hope this helps. Also, note that you can use \n to represent a newline in the replacement in some versions of sed (such as GNU sed). It would render a more concise and readable command:
sed 's/#"\([^.]*\)\.png",/&\n#"\1",/' input

I prefer this over Carles Sala's and Glenn Jackman's:
sed '/\.png/p;s/\.png//'
Call it personal preference.

Or one can combine both versions and apply the duplication only to lines matching the required pattern:
sed -e '/^#".*\.png",/{p;s/\.png//;}' input
