How to exclude patterns in regex conditionally in bash? - regex

This is the content of input.txt:
hello=123
1234
stack=(23(4))
12341234
overflow=345
=
friends=(987)
Now I'm trying to match all the lines that contain an equals sign, removing the outer parentheses (if the line has them).
To be clear, this is the result I'm looking for:
hello=123
stack=23(4)
overflow=345
friends=987
I thought of something like this:
cat input.txt | grep -Poh '.+=(?=\()?.+(?=\))?'
But it returns nothing. What am I doing wrong? Do you have any idea how to do this?

Using awk:
awk 'BEGIN{FS=OFS="="} NF==2 && $1!=""{gsub(/^\(|\)$/, "", $2); print}' file
hello=123
stack=23(4)
overflow=345
friends=987
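Roughly how the awk one-liner works (my annotation, not part of the original answer):
awk 'BEGIN{FS=OFS="="}            # split (and rejoin) fields on "="
     NF==2 && $1!="" {            # keep lines with exactly one "=" and a non-empty key
       gsub(/^\(|\)$/, "", $2)    # strip a leading "(" and a trailing ")" from the value
       print
     }' file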

Here is an alternate way with sed:
sed -nr '            # -n disables default printing; -r enables extended regex
/.+=.+/ {            # look for lines with key=value pairs separated by =
  /[(]/!ba;          # if the line does not contain a paren, branch to label a
  s/\((.*)\)/\1/;    # otherwise strip the outermost pair of parentheses
  :a                 # our label
  p                  # print the line
}' file
$ sed -nr '/.+=.+/{/[(]/!ba;s/\((.*)\)/\1/;:a;p}' file
hello=123
stack=23(4)
overflow=345
friends=987
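For reference, the same output can also be produced with a single substitution per line (my sketch, not part of the original answers, assuming a sed with -E support):
$ sed -nE '/^[^=]+=.+/{s/=\((.*)\)$/=\1/;p}' input.txt
hello=123
stack=23(4)
overflow=345
friends=987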

Related

How to use 'sed' to add dynamic prefix to each number in integer list?

How can I use sed to add a dynamic prefix to each number in an integer list?
For example:
I have a string "A-1,2,3,4,5" and I want to transform it into "A-1,A-2,A-3,A-4,A-5" - that is, I want to add the prefix of the first integer, i.e. "A-", to each number in the list.
If I have a string like "B-1,20,300", then I want to transform it into "B-1,B-20,B-300".
I am not able to use regex capturing groups because, with a global match, they do not retain their values across subsequent matches.
When it comes to looping constructs in sed, I like to use newlines as markers for the places I have yet to process. This makes matching much simpler, and I know they're not in the input because my input is a text line.
For example:
$ echo A-1,2,3,4,5 | sed 's/,/\n/g;:a s/^\([^0-9]*\)\([^\n]*\)\n/\1\2,\1/; ta'
A-1,A-2,A-3,A-4,A-5
This works as follows:
s/,/\n/g                              # replace all commas with newlines (insert markers)
:a                                    # label for looping
s/^\([^0-9]*\)\([^\n]*\)\n/\1\2,\1/   # replace the next marker with a comma followed
                                      # by the prefix
ta                                    # loop unless there's nothing more to do
The approach is similar to @potong's, but I find the regex much more readable -- \([^0-9]*\) captures the prefix, \([^\n]*\) captures everything up to the next marker (i.e. everything that's already been processed), and then it's just a matter of reassembling it in the substitution.
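For instance, the same command (assuming GNU sed, since the \n in the replacement and the label syntax rely on GNU extensions) also handles the second sample string:
$ echo B-1,20,300 | sed 's/,/\n/g;:a s/^\([^0-9]*\)\([^\n]*\)\n/\1\2,\1/; ta'
B-1,B-20,B-300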
Don't use sed, just use the other standard UNIX text manipulation tool, awk:
$ echo 'A-1,2,3,4,5' | awk '{p=substr($0,1,2); gsub(/,/,"&"p)}1'
A-1,A-2,A-3,A-4,A-5
$ echo 'B-1,20,300' | awk '{p=substr($0,1,2); gsub(/,/,"&"p)}1'
B-1,B-20,B-300
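Note that substr($0,1,2) assumes the prefix is always two characters long (like "A-" or "B-"); a possible generalization (my sketch, not part of the answer) takes everything before the first digit as the prefix:
$ echo 'ABC-1,20,300' | awk '{match($0,/^[^0-9]*/); p=substr($0,1,RLENGTH); gsub(/,/,"&"p)} 1'
ABC-1,ABC-20,ABC-300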
This might work for you (GNU sed):
sed -E ':a;s/^((([^-]+-)[^,]+,)+)([0-9])/\1\3\4/;ta' file
This uses pattern matching and a loop to replace each number following a comma with the prefix from the first field followed by that number.
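For example, a quick check with the second sample string (assuming GNU sed):
$ echo 'B-1,20,300' | sed -E ':a;s/^((([^-]+-)[^,]+,)+)([0-9])/\1\3\4/;ta'
B-1,B-20,B-300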
Assuming this is for shell scripting, you can do it with two seds:
string="A1,2,3,4,5"
prefix=$(echo "$string" | sed 's/^\([A-Z]\).*/\1/')
echo "$string" | sed 's/,\([0-9]\)/,'"$prefix"'-\1/g'
Output is
A1,A-2,A-3,A-4,A-5
With
set string = "B-1,20,300"
Output is
B-1,B-20,B-300
Could you please try the following (if awk is OK)?
awk '
BEGIN{
  FS=OFS=","
}
{
  for(i=1;i<=NF;i++){
    if($i !~ /^A/ && $i !~ /\"A/){
      $i="A-"$i
    }
  }
}
1' Input_file
If your data is in file 'd', this was tried on GNU sed:
sed -E 'h;s/^(\w-).+/\1/;x;G;:s s/,([0-9]+)(.*\n(.+))/,\3\1\2/;ts; s/\n.+//' d

How can I replace a character on specific strings on a file?

I have to replace the character '.' with '_', but only in specific regions of the file (the function names). I have a file like this:
\name{function.name.something}
\usage{function.name.something(parameter.something, parameter2.something)}
I was thinking of using Notepad++ or sed, and only replacing within the captured groups. For example, for the first line the pattern would be:
\\name\{(.+)\}
and the replacement \\name\{\1\},
but with group 1 (\1) having the dots replaced by underscores.
I appreciate any help, thank you.
Using gnu-awk:
awk -v FPAT='\\\\name{[^}]+}|\\S+' '{gsub(/\./, "_", $1)} 1' file
\name{function_name_something}
\usage{function.name.something(parameter.something, parameter2.something)}
FPAT='\\\\name{[^}]+}|\\S+' makes awk build each field from the given regex, which matches either a \name{...} group OR a run of non-space characters (the default awk field).
More testing:
cat file
\name{function.name.something} abc.foo.bar
\usage{function.name.something(parameter.something, parameter2.something)}
awk -v FPAT='\\\\name{[^}]+}|\\S+' '{gsub(/\./, "_", $1)} 1' file
\name{function_name_something} abc.foo.bar
\usage{function_name_something(parameter_something, parameter2.something)}
Perl solution:
< file.txt perl -pe '
($n, $f) = /(\\name|\\usage)\{(.*?[}(])/
and s/\Q$n\E\{\Q$f\E/"$n\{" . ($f=~s=\.=_=gr)/e'
This needs Perl 5.14+ (for the non-destructive /r substitution flag); otherwise you have to write
($n, $f) = /(\\name|\\usage)\{(.*?[}(])/
and s/\Q$n\E\{\Q$f\E/"$n\{" . do { ($ff = $f) =~ s=\.=_=g; $ff }/e
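For comparison, a plain sed loop (my sketch, not from the original answers) can restrict the replacement to the text between \name{ and }:
$ sed ':a; s/\(\\name{[^}.]*\)\./\1_/; ta' file
\name{function_name_something}
\usage{function.name.something(parameter.something, parameter2.something)}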

regex - match exactly to a string portion in awk

I have a file where one column contains strings that are composed of entries separated by ','.
example:
a123456, a54321, a12312
I need to find lines that contain a specific entry in the comma-separated list.
Example: I want to find all lines that contain a12345 as an exact entry.
I tried to use the following:
awk ' $1~/a12345/ {print}'
but this prints out the line containing:
a123456, a54321, a12312
because the regex is matching the first 6 characters in a123456, I guess.
My question is, how can I write a regex that will only print out the lines that contain an exact match?
$ awk '/(^|[^[:alnum:]])a12345([^[:alnum:]]|$)/' file
$ awk '/(^|[^[:alnum:]])a123456([^[:alnum:]]|$)/' file
a123456, a54321, a12312
With GNU awk you could use word-delimiters:
$ awk '/\<a12345\>/' file
$ awk '/\<a123456\>/' file
a123456, a54321, a12312
Try using grep's word match (-w), like below:
grep -w a123456 myfile.txt
If you need the match only where the line starts with that field, then use something like:
egrep -w ^a123456 myfile.txt
With awk:
awk -F ',\\s*' '$1 == "a12345"' filename
This splits the line on commas (optionally followed by whitespace) and selects only those lines whose first field is exactly "a12345". It will work even if the field contains characters after "a12345" that count as a word boundary, which is to say that
a12345.foo, bar, baz
is filtered out.
If more than a single field is to be tested, then you'll have to test all fields:
awk -F ',\\s*' 'function check() { for(i = 1; i <= NF; ++i) { if($i == "a12345") return 1; } return 0 } check()' filename
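For instance (assuming GNU awk, since the \s in the field separator is a GNU extension), only the line whose fields include an exact a12345 entry is printed:
$ printf 'a123456, a54321, a12312\na54321, a12345\n' | awk -F ',\\s*' 'function check() { for(i = 1; i <= NF; ++i) { if($i == "a12345") return 1; } return 0 } check()'
a54321, a12345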

AWK end of line sign in regular expressions

I have a simple awk script named "script.awk" that contains:
/\/some_simple_string/ { print $0;}
I'm using it to parse some file that contains:
(by using: cat file | awk -f script.awk)
14 catcat one_two/some_thing
15 catcat one_three/one_more_some_simple_string
16 dogdog one_two/some_simple_string_again
17 dogdog one_four/some_simple_string
18 qweqwe firefire/ppp
I want the script to print only the line that fully matches "/some_simple_string[END_OF_LINE]", but not the second or third sample lines.
Is there any simple way to do it?
I think the most appropriate way is to add an end-of-line sign to the regular expression.
So it will match only strings that start with "/some..." and have the end of the line right after "...string".
Desired output:
17 dogdog one_four/some_simple_string
Sorry for the confusion, I was asking about the END OF LINE sign in regular expressions.
The correct answer is:
/\/some_simple_string$/ { print $0;}
You can always use:
/\/some_simple_string$/ { print $0 }
I.e. match not only "some_simple_string" but match "/some_simple_string" followed by the end of the line ($ is end of line in regex)
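For example, with the sample file shown in the question this prints only the fully matching line:
$ awk '/\/some_simple_string$/' file
17 dogdog one_four/some_simple_string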
grep '/some_simple_string$' file | tail -n 1 should do the trick.
Or if you really want to use awk, do awk '/\/some_simple_string/{x = $0}END{print x}'
To return just the last of a group of matches, store the line in a variable and print it in the END block:
/some_simple_string/ { x = $0 }
END{ print x }
To print all the matches that end with the string /some_simple_string using a regular expression, you need to anchor to the end of the line using $. The most suitable tool for this job is grep:
$ grep '/some_simple_string$' file
In awk the command is much the same:
$ awk '/[/]some_simple_string$/' file
To print the line following each match you would do:
$ awk 'print_flag{print;print_flag=0} /[/]some_simple_string$/{print_flag=1}' file
Or just combine grep and tail, if that makes it clearer, using the context option -A to print the following line:
$ grep -A1 '/some_simple_string$' file | tail -n 1
I sometimes find that the input records can have a trailing carriage return (\r).
Yes, I deal with both Windows and Linux text files.
So I add the following 'pre-processor' to my awk scripts:
1 == 1 {                     # preprocess all records
  res = gsub("\r", "")       # remove unwanted trailing char
  if(res>0 && NR<100) { print "(removed stuff)" > "/dev/stderr" }   # optional
}
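Putting the pre-processor together with the end-of-line match (a sketch, not from the original answer):
awk '{ gsub(/\r/, "") }                 # preprocess: remove any carriage returns
     /\/some_simple_string$/ { print }' file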
More optimally, let FS do the work instead of having awk perform unnecessary and unrelated field splitting (the \r bit is added for Windows/DOS completeness):
mawk '!_<NF' FS='[/]some_simple_string[\r]?$'
17 dogdog one_four/some_simple_string

Regex with sed, search across multiple lines

I'd like to concatenate a few lines, perform a regex match on them and print them.
I tried to do that with sed.
Namely, I used:
cat add | sed -rn '/FIRST_LINE_REGEX/,/LAST_LINE_REGEX/s/SOME_REGEX/&/p'
It prints only the lines that match SOME_REGEX while I expect it to concatenate the lines from the range between FIRST_LINE and LAST_LINE and print the concatenation if it matches SOME_REGEX.
When using '/FIRST_LINE_REGEX/,/LAST_LINE_REGEX/', each line is still processed separately. To concatenate lines you need to use the hold space or the N command to append the next line to the pattern space. Here is one option:
cat add | sed -rn '/FIRST_LINE_REGEX/{:a;N;/LAST_LINE_REGEX/{/SOME_REGEX/p;d};ba}'
Commented version:
cat add | sed -rn '/FIRST_LINE_REGEX/ {   # if the line matches /FIRST_LINE_REGEX/
  :a                                      # create label a
  N                                       # read the next line into the pattern space
  /LAST_LINE_REGEX/ {                     # if it matches /LAST_LINE_REGEX/
    /SOME_REGEX/p                         # print if it matches /SOME_REGEX/
    d                                     # return to the start
  }
  ba                                      # return to label a
}'
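As a concrete illustration (a sketch with made-up BEGIN/END/foo patterns, assuming GNU sed), the block is accumulated and only printed when the whole thing matches:
$ printf 'BEGIN\nfoo\nbar\nEND\nother\n' | sed -rn '/BEGIN/{:a;N;/END/{/foo\nbar/p;d};ba}'
BEGIN
foo
bar
END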
sed -n '/FIRST_LINE_REGEX/,/LAST_LINE_REGEX/p' add | sed -n '/FIRST_LINE_REGEX/ b check; H; $ b check; b; :check; x; /SOME_REGEX/p'
The motivation of the second pipe part comes from here: https://stackoverflow.com/a/6287105/992834
Edit: Amended for when SOME_REGEX is in between.