Matching pattern containing parentheses with sed [duplicate] - regex

This question already has answers here:
Whether to escape ( and ) in regex using GNU sed
(4 answers)
Closed 4 years ago.
I need to insert '--' at the beginning of a line if the line contains the string VARCHAR(1000).
A sample of my file:
TRIM(CAST("AP_RQ_MSG_TYPE_ID" AS NVARCHAR(1000))) AP_RQ_MSG_TYPE_ID,
TRIM(CAST("AP_RQ_PROCESSING_CD" AS NVARCHAR(1000)))
AP_RQ_PROCESSING_CD, TRIM(CAST("AP_RQ_ACQ_INST_ID" AS NVARCHAR(11)))
AP_RQ_ACQ_INST_ID, TRIM(CAST("AP_RQ_LOCAL_TXN_TIME" AS NVARCHAR(10)))
AP_RQ_LOCAL_TXN_TIME, TRIM(CAST("AP_RQ_LOCAL_TXN_DATE" AS
NVARCHAR(10))) AP_RQ_LOCAL_TXN_DATE, TRIM(CAST("AP_RQ_RETAILER" AS
NVARCHAR(11))) AP_RQ_RETAILER,
I used this command
sed 's/\(^.*VARCHAR\(1000\).*$\)/--\1/I' *.sql
But the result is not what I expected.
Does anyone have an idea what I am doing wrong?

This should do:
sed 's/.*VARCHAR(1000).*/--&/' file
The problem in your sed command is in the regex. By default sed uses BRE (basic regular expressions), in which unescaped ( and ) are literal parentheses. By escaping the pair around 1000 you gave them their special meaning, regex grouping, so the pattern looks for VARCHAR1000 rather than VARCHAR(1000).
Escaping the first and last \( ... \) was right if you want to reference the group later as \1, although & already stands for the whole match. So your problem comes down to when to escape and when not to. :)
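As a minimal sketch of the difference (the sample line comes from the question; the variable names are only illustrative, and -E is assumed to be available, as it is in GNU and BSD sed):

```shell
# Sample line from the question.
line='TRIM(CAST("AP_RQ_MSG_TYPE_ID" AS NVARCHAR(1000))) AP_RQ_MSG_TYPE_ID,'

# BRE (sed's default): unescaped ( and ) are literal, so the line matches.
bre_result=$(printf '%s\n' "$line" | sed 's/.*VARCHAR(1000).*/--&/')

# ERE (-E): ( and ) are grouping operators, so literal ones need a backslash.
ere_result=$(printf '%s\n' "$line" | sed -E 's/.*VARCHAR\(1000\).*/--&/')

# BRE with \( \): a group around "1000", so the pattern means VARCHAR1000
# and the line is printed unchanged.
grouped_result=$(printf '%s\n' "$line" | sed 's/.*VARCHAR\(1000\).*/--&/')

printf '%s\n' "$bre_result" "$ere_result" "$grouped_result"
```

The first two results carry the -- prefix; the third is the original line, which is the symptom described in the question.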

Use the following sed command:
sed '/VARCHAR(1000)/ s/.*/--\0/' *.sql
The s command applies only to lines containing VARCHAR(1000). It replaces the whole line (.*) with itself (\0), prefixed with --. Note that \0 works in GNU sed; & is the portable way to refer to the whole match.
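A small variation on the same idea, sketched with two made-up sample lines: since the address already selects the lines, the substitution only has to insert text at the start of the line, so no back-reference is needed.

```shell
# Two sample lines: only the first contains VARCHAR(1000).
input='TRIM(CAST("A" AS NVARCHAR(1000))) A,
TRIM(CAST("B" AS NVARCHAR(11))) B,'

# The address /VARCHAR(1000)/ selects lines; s/^/--/ inserts at line start.
commented=$(printf '%s\n' "$input" | sed '/VARCHAR(1000)/ s/^/--/')

printf '%s\n' "$commented"
```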

Through awk,
awk '/VARCHAR\(1000\)/ {sub (/^/,"--")}1' infile > outfile

Related

Remove everything after a changing string [duplicate]

This question already has an answer here:
How to get first N parts of a path?
(1 answer)
Closed 2 years ago.
I'm having some trouble with the following problem;
As input, I get a few lines of paths to files as follows:
root/child/abc/somefile.txt
root/child/def/123/somefile.txt
root/child/ghijklm/somefile.txt
The root/child piece is always in the path, everything after can differ.
I would like to remove everything after the grandchild folder. So the output would be:
root/child/abc/
root/child/def/
root/child/ghijklm/
I've tried the following:
sed 's/\/child\/.*/\/child\/.*/'
But of course that would just give the following output:
root/child/.*
root/child/.*
root/child/.*
Any help would be appreciated!
with cut:
cut -d\/ -f1,2,3 file
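A quick sketch of the cut approach on the sample paths from the question (variable names are illustrative). Note that cut prints the first three fields without a trailing delimiter, so the slash is appended in a second step if the exact output from the question is required:

```shell
paths='root/child/abc/somefile.txt
root/child/def/123/somefile.txt
root/child/ghijklm/somefile.txt'

# Fields 1-3 with / as the delimiter; -f1-3 is the same as -f1,2,3.
trimmed=$(printf '%s\n' "$paths" | cut -d/ -f1-3)

# Append the trailing slash afterwards.
with_slash=$(printf '%s\n' "$trimmed" | sed 's%$%/%')

printf '%s\n' "$with_slash"
```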
With awk (written and tested against the samples shown, in GNU awk):
awk 'match($0,/root\/child\/[^/]*/){print substr($0,RSTART,RLENGTH)}' Input_file
Explanation of the above:
awk ' ##Start of the awk program.
match($0,/root\/child\/[^/]*/){ ##Use match() to find root/child/ plus everything up to the next / in the current line.
print substr($0,RSTART,RLENGTH) ##Print the matched substring: RLENGTH characters starting at RSTART.
}
' Input_file ##Mentioning Input_file name here.
With sed:
sed 's/.*\(root\/child\/[^/]*\).*/\1/' Input_file
Explanation: The substitution matches root/child/ plus everything up to the next /, saves that part in a capture group, and replaces the whole line with the captured text (\1).
This might work for you (GNU sed):
sed -E 's/^(([^/]*[/]){3}).*/\1/' file
Keep the first three runs of non-slash characters, each followed by a slash, and delete everything after them.
You were close.
sed 's%\(/child/[^/]*\)/.*%\1%'
The regex [^/]* matches as many characters as possible which are not a slash; then we replace the entire match with just the part we captured in parentheses, effectively trimming off the rest.
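The command above drops the slash after the grandchild directory. If the trailing slash from the expected output in the question is wanted, one option (a sketch, not the only way) is to move that slash inside the capture group:

```shell
paths='root/child/abc/somefile.txt
root/child/def/123/somefile.txt
root/child/ghijklm/somefile.txt'

# The capture group now ends with the slash after the grandchild directory,
# so the slash survives when the match is replaced by \1.
dirs=$(printf '%s\n' "$paths" | sed 's%\(/child/[^/]*/\).*%\1%')

printf '%s\n' "$dirs"
```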
With Perl:
perl -pe 's{ ^ ( ( [^/]+ / ){3} ) .* $ }{$1}x' in_file > out_file
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
The regex uses this modifier:
x : Disregard whitespace and comments, for readability.
The substitution statement, explained:
^ : beginning of the line.
$ : end of the line.
[^/]+ / : one or more characters that are not slashes (/), followed by a slash.
( [^/]+ / ){3} : one or more non-slash characters, followed by a slash, repeated exactly 3 times.
( ( [^/]+ / ){3} ) : the above, with parentheses to capture the matched part into the first capture variable, $1, to be used later in the substitution. Capture groups are counted left to right.
.* : zero or more occurrences of any character.
s{THIS}{THAT} : replace THIS with THAT.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlre: Perl regular expressions (regexes)
perldoc perlre: Perl regular expressions (regexes): Quantifiers; Character Classes and other Special Escapes; Assertions; Capture groups
perldoc perlrequick: Perl regular expressions quick start

Regexp delete using sed works on regex101 but not with sed [duplicate]

This question already has answers here:
sed multiline delete with pattern
(2 answers)
Closed 2 years ago.
I need to remove text that matches a regex pattern from a text file. On regex101 my pattern matches fine, but when I run it with sed nothing gets deleted:
https://regex101.com/r/oLNrDB/1/
I need to remove all occurrences of all text including newlines between the following 2 strings:
DELIMITER ;;
some text with newlines
DELIMITER ;
The sed command I am using is:
sed '/DELIMITER ;;[\S\s]*?DELIMITER ;/d' myfile.sql;
But the output is identical to the input file. What am I doing wrong?
The problem here is that sed reads its input line by line and applies the pattern to each line separately. A single pattern therefore cannot match both the starting and the finishing delimiter, because no one line contains them both. In addition, sed's regex syntax has no lazy quantifier such as *?, and \S and \s are not class shorthands inside a bracket expression.
The sed way of doing this is to use a range with the delete command, /start pattern/,/end pattern/d, which deletes all lines from the one matching the start pattern through the one matching the end pattern, inclusive. For example:
$ cat foo.txt
some text before
DELIMITER ;;
some text with newlines
DELIMITER ;
some text after
$ sed '/DELIMITER ;;/,/DELIMITER ;/d' foo.txt
some text before
some text after
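The same range delete can be sketched on a file with more than one block. The SQL below is made up for illustration; the ^ and $ anchors are an optional tightening so that only lines consisting of exactly the delimiter start or end a range:

```shell
# Hypothetical dump with two DELIMITER blocks.
sql='SELECT 1;
DELIMITER ;;
CREATE TRIGGER t1 BEFORE INSERT ON a FOR EACH ROW SET @x = 1;;
DELIMITER ;
SELECT 2;
DELIMITER ;;
CREATE TRIGGER t2 BEFORE INSERT ON b FOR EACH ROW SET @y = 2;;
DELIMITER ;
SELECT 3;'

# Each range starts at a "DELIMITER ;;" line and ends at the next
# "DELIMITER ;" line; both boundary lines are deleted as well.
cleaned=$(printf '%s\n' "$sql" | sed '/^DELIMITER ;;$/,/^DELIMITER ;$/d')

printf '%s\n' "$cleaned"
```

Only the three SELECT lines remain, because the range address is re-armed after each closing delimiter.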

Replace string variable with string variable using Sed [duplicate]

This question already has answers here:
"sed" special characters handling
(3 answers)
Is it possible to escape regex metacharacters reliably with sed
(4 answers)
Escape a string for a sed replace pattern
(17 answers)
Closed 5 years ago.
I have a file called ethernet containing multiple lines. I have saved one of these lines as a variable called old_line. The contents of this variable looks like this:
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="2r:11:89:89:9g:ah", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth1"
I have created a second variable called new_line that is similar to old_line but with some modifications in the text.
I want to substitute the contents of old_line with the contents of new_line using sed. So far I have the following, but it doesn't work:
sed -i "s/${old_line}/${new_line}/g" ethernet
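You need to escape old_line so that its regex special characters are taken literally; luckily, this can be done with sed. Note that the command below does not escape the / delimiter, and that new_line would also need escaping if it contained &, \ or /.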
old_line=$(echo "${old_line}" | sed -e 's/[]$.*[\^]/\\&/g' )
sed -i -e "s/${old_line}/${new_line}/g" ethernet
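A fuller sketch that escapes both sides, under some stated assumptions: the values below are hypothetical, shortened stand-ins for the udev rule, neither value spans multiple lines, and the result is captured in a variable instead of using -i so that the example does not touch a real file.

```shell
# Hypothetical, shortened values standing in for the rule from the question.
old_line='KERNEL=="eth*", NAME="eth1"'
new_line='KERNEL=="eth*", NAME="lan/0 & co"'
file_text='ACTION=="add", KERNEL=="eth*", NAME="eth1"'

# Pattern side: escape the BRE metacharacters ] $ . * [ \ ^ and the / delimiter.
old_esc=$(printf '%s\n' "$old_line" | sed 's#[]$.*[\^/]#\\&#g')

# Replacement side: escape backslash, & and the / delimiter.
new_esc=$(printf '%s\n' "$new_line" | sed 's#[\&/]#\\&#g')

# Both values are now safe to interpolate into an s/// command.
result=$(printf '%s\n' "$file_text" | sed "s/${old_esc}/${new_esc}/g")

printf '%s\n' "$result"
```

The two escaping commands use # as their own delimiter so that / can appear in the bracket expression without any ambiguity.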
Your sed command fails because ${old_line} contains regex metacharacters such as *.
Use this awk command instead; it does a literal string search with index(), so no regex is involved. It prints every line to standard output, replacing the first occurrence of old on a line with new:
awk -v old="$old_line" -v new="$new_line" '(p = index($0, old)) {
$0 = substr($0, 1, p-1) new substr($0, p+length(old)) } 1' ethernet

Just give me the words between the "" [duplicate]

This question already has answers here:
Getting values between quotes
(2 answers)
Closed 9 years ago.
I have text lines like this
blahblah"word1"blahblah"word2"blahblah"word3"
I only want the text between the quotes, without the quotes themselves. I could use awk with " as the separator and then take every second field. However, is there any way I can just use awk (or another command) to return the words between sets of quotes, so that I'd get back word1, word2, word3?
Thanks,
Here you go:
echo 'blahblah"word1"blahblah"word2"blahblah"word3"' | perl -ne 'print map("$_\n", m/"([^"]*)"/g)'
It depends on which language you're using, but the regular expression to do this would be:
(?<=^[^"]*(("[^"]*){2})*")[^"]+(?=")
This relies on variable-length lookbehind, which only some engines (.NET, for example) support. It matches everything between "s. If you want it to match only words between "s, use:
(?<=^[^"]*(("[^"]*){2})*")\w+(?=")
The main difference is that with the second expression, spaces and most special characters are not allowed between the "s. With the first, every character except " is allowed, and that includes newlines.
Non-robust, but fun:
sed -E 's/(^|")[^"]*("|$)/ /g'
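The field-splitting idea mentioned in the question can be sketched in awk as follows. It assumes the quotes on each line are balanced; the variable names are illustrative.

```shell
line='blahblah"word1"blahblah"word2"blahblah"word3"'

# With " as the field separator, the quoted words are the even-numbered fields.
words=$(printf '%s\n' "$line" | awk -F'"' '{ for (i = 2; i <= NF; i += 2) print $i }')

printf '%s\n' "$words"
```

Each quoted word is printed on its own line; an unbalanced quote would cause the text after it to be printed as well.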

Matching zero or more characters in sed

I was practicing some commands using sed when I was confused by the output of the following command:
echo 'first:second' | sed 's_[^:]*_(&)_g'
My question is: Why does this command wrap only the strings "first" and "second" in parentheses?
Shouldn't the colon be wrapped too since I specified "zero or more non-colons" in my regex condition?
Please clarify.
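You use
[^:]
which matches any character except :, so the colon itself can never be part of a match.
[^:]* can match the empty string just before the colon, but sed does not substitute an empty match that immediately follows the previous match, so nothing is inserted there. What you see is the normal behavior.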
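A short sketch that makes the empty-match behavior visible. The a::b input is made up for illustration, and the outputs described below are what GNU sed is expected to print:

```shell
# The case from the question: the colon is copied through untouched.
wrapped=$(echo 'first:second' | sed 's_[^:]*_(&)_g')

# An empty field: the empty match between the two colons is not adjacent
# to a previous match, so it is replaced by ().
empty_field=$(echo 'a::b' | sed 's_[^:]*_(&)_g')

# Requiring at least one character (\{1,\} in BRE) rules out empty matches.
non_empty=$(echo 'a::b' | sed 's_[^:]\{1,\}_(&)_g')

printf '%s\n' "$wrapped" "$empty_field" "$non_empty"
```

This should print (first):(second), then (a):():(b), then (a)::(b): the colon is never wrapped, but an empty match that does not touch the previous match is.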