awk sed perl, replacing specific pattern within a range of lines

I'm working in Verilog and need to edit a specific line within a unique block, but am unsure how to proceed.
file.v
...
block1 block1(
.port1(port1),
.port2(port2),
);
block2 block2(
.(port2)(port2),
.(port3)(port3)
);
....
I need to remove the trailing comma after port2 in block1 without modifying block2. There are also multiple blocks elsewhere that contain port2. The desired result for block1 is:
block1 block1(
.port1(port1),
.port2(port2)
);
I've been trying various awk and sed line ranges, but haven't managed to modify the file successfully. Any suggestions or solutions are much appreciated.

This will remove any comma that occurs just before the end of a block (whitespace then );):
perl -0777 -pe 's/,(?=\s*\);)//g'
Notes:
-0777 causes perl to slurp all the input in as a single string. This is required because there are newlines in between, so we can't read line by line, and there might be empty lines between the comma and the closing parenthesis, so reading by "paragraph" won't work either.
-p causes perl to print the input after modifications.
The regex is the trickiest part: it finds a comma and then looks ahead to match zero or more whitespace characters (including spaces, tabs, newlines, etc.) followed by a closing parenthesis and a semicolon.
The lookahead text is not part of the matched text (lookaheads are known as "zero-width assertions"), so the matched text is just the comma.
If there's a match, the comma is replaced with an empty string.
The g flag says to do this globally in the string.
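For experimenting outside Perl, here is a rough Python translation of the same lookahead substitution (a sketch, not part of the original answer; strip_trailing_commas is a made-up name). Holding the whole file in one string plays the role of -0777:

```python
import re


def strip_trailing_commas(text: str) -> str:
    # Remove a comma only when it is followed by optional whitespace
    # (spaces, tabs, newlines) and then ");". The lookahead is zero-width,
    # so only the comma itself is matched and replaced with "".
    return re.sub(r",(?=\s*\);)", "", text)


verilog = (
    "block1 block1(\n"
    ".port1(port1),\n"
    ".port2(port2),\n"
    ");\n"
    "block2 block2(\n"
    ".(port2)(port2),\n"
    ".(port3)(port3)\n"
    ");\n"
)
print(strip_trailing_commas(verilog))
```

Like the Perl one-liner, this removes the last comma of every block, not only block1's; block2 is left alone here only because it has no trailing comma.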

This might do the job for you
sed '/block1 block1/,/);/{s/\((port2)\),/\1/}' file.v
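To make the address-range idea concrete, a minimal Python sketch of what /block1 block1/,/);/{s/.../.../} does, assuming sed's usual semantics: the end address is only tested from the line after the start line, the end line is still inside the range, and s without g replaces the first match on each line. sed_range_sub is a hypothetical helper name, not a library function:

```python
import re


def sed_range_sub(lines, start, end, pattern, repl):
    """Apply re.sub(pattern, repl) once per line, but only to lines from one
    matching `start` through the next line matching `end` (inclusive)."""
    out = []
    active = False
    for line in lines:
        if not active:
            if re.search(start, line):
                active = True  # the end address is not checked on this line
                out.append(re.sub(pattern, repl, line, count=1))
            else:
                out.append(line)
        else:
            if re.search(end, line):
                active = False  # this closing line is still edited
            out.append(re.sub(pattern, repl, line, count=1))
    return out


lines = [
    "block1 block1(", ".port1(port1),", ".port2(port2),", ");",
    "block2 block2(", ".(port2)(port2),", ".(port3)(port3)", ");",
]
# In a Python regex ")" must be escaped; in sed's basic regex it is literal.
for line in sed_range_sub(lines, r"block1 block1", r"\);", r"(\(port2\)),", r"\1"):
    print(line)
```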

how about:
awk -v RS="" '/block1/{sub("port2),","port2)")}7' file

I guess you want to remove a comma that follows a closing paren, ")", when the next line starts with a closing paren and a semicolon, ");"?
In this case this might work for you:
sed -r ':a;N;s/\),\n\s*\);/)\n);/;P;D;ba'
:a - branch label "a"
N - read next line into pattern space (append)
s - substitute
\),\n\s*\); - search pattern
)\n); - replacement
P - print up to the first newline of the pattern space
D - delete up to the first newline of the pattern space
ba - branch to label "a"
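The idea behind N;P;D is a sliding two-line window: each line is examined together with the next one. A minimal Python sketch of the same check (drop_comma_before_close is a made-up name; unlike the sed replacement, it leaves any indentation before ");" untouched):

```python
import re


def drop_comma_before_close(lines):
    """Look at each line together with the line after it. If a line ends
    in ")," and the next line starts with optional whitespace and ");",
    drop that trailing comma."""
    out = list(lines)
    for i in range(len(out) - 1):
        if out[i].endswith("),") and re.match(r"\s*\);", out[i + 1]):
            out[i] = out[i][:-1]  # remove only the comma
    return out


lines = [
    "block1 block1(", ".port1(port1),", ".port2(port2),", ");",
    "block2 block2(", ".(port2)(port2),", ".(port3)(port3)", ");",
]
print("\n".join(drop_comma_before_close(lines)))
```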

Related

RegEx, Substituting a variable number of replacements

Hopefully I'm missing something obvious.
I've got a file that contains some lines like:
| A | B | C |
|-----------|
Ignore this line
| And | Ignore | This |
| D | E | F | G |
|---------------|
I want to find the |----| lines, remove those... and replace all of the | characters with a ^ in the preceding line. e.g.
^ A ^ B ^ C ^
Ignore this line
| And | Ignore | This |
^ D ^ E ^ F ^ G ^
So far I've got:
perl -0pe 's/^(\|.*\|)\n\|-+\|/$1/mg'
This takes input from stdin (some other modifications have already happened with sed)... and it's using -0 and /m to support multiline replacements.
The match seems to be correct, and it removes the |----| lines, but I can't see how I can do the | to ^ substitution with the $1 (or \1) backreference.
I can't remember where I did it before, but another language allowed me to use ${1/A/B} to substitute A to B, but that's upsetting perl.
And I've been wondering if this is where /e or /ee could be used, but I'm not familiar enough with perl on how to do that.
You can use
perl -0pe 's{^(.*)\R\|-+\|$}{$1 =~ s,\|,^,gr}gme' t
Details:
^(.*)\R\|-+\|$ - matches all occurrences (see the g flag at the end)
^ - start of a line (note the m flag that makes ^ match start of a line and $ match end of a line)
(.*) - Group 1: whole line
\R - a line break sequence
\| - a | char
-+ - one or more - chars
\| - a | char
$ - end of line (the line break after the separator line is not consumed, so the following line stays on its own line)
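Once the match is found, all | in the Group 1 value are replaced with ^ using $1 =~ s,\|,^,gr; the r flag makes the inner substitution return the modified string instead of changing $1 in place. Running the replacement part as Perl code is enabled by the e flag.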
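Python has no /e flag, but re.sub accepts a function as the replacement, which plays the same role: the replacement text is computed from the match. A minimal sketch, assuming plain \n line endings since Python's re has no \R (HEADER and convert_headers are made-up names):

```python
import re

# A row of cells, then a separator line such as |-----|. The pattern stops
# at the end of the separator line, so the line break after it is kept.
HEADER = re.compile(r"^(.*)\n\|-+\|$", re.M)


def convert_headers(text: str) -> str:
    # The lambda receives the match object and returns Group 1 with
    # every "|" turned into "^"; the separator line is dropped.
    return HEADER.sub(lambda m: m.group(1).replace("|", "^"), text)


sample = (
    "| A | B | C |\n"
    "|-----------|\n"
    "Ignore this line\n"
    "| And | Ignore | This |\n"
    "| D | E | F | G |\n"
    "|---------------|\n"
)
print(convert_headers(sample), end="")
```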
I could see this being done using 2 substitutions:
\|(?=.*[\r\n]+\|-+\|$)
https://regex101.com/r/x7d15d/1/
And then:
^\|-+\|(?:[\r\n]+|$)
https://regex101.com/r/ZdEzuM/1/
With one pattern that checks the next line in a lookahead assertion:
perl -0pe 's/\|(?=.*\R\|-+\|$)(?:\R.*)?/^/gm' file
If you absolutely want to use an evaluation, you can put a transliteration in the replacement part with this pattern:
perl -0pe 's#^(.*)\R\|-+\|$#$1=~y/|/^/r#gme' file
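The single-pattern lookahead approach translates to Python almost unchanged. A sketch assuming plain \n line endings (Python's re has no \R); PIPE_BEFORE_SEPARATOR and convert_headers_lookahead are made-up names:

```python
import re

# Each "|" whose line is followed by a separator line such as |-----|.
# The optional (?:\n.*)? group lets the last "|" of the row also swallow
# the separator line, so it is removed in the same pass.
PIPE_BEFORE_SEPARATOR = re.compile(r"\|(?=.*\n\|-+\|$)(?:\n.*)?", re.M)


def convert_headers_lookahead(text: str) -> str:
    return PIPE_BEFORE_SEPARATOR.sub("^", text)


sample = (
    "| A | B | C |\n"
    "|-----------|\n"
    "Ignore this line\n"
    "| And | Ignore | This |\n"
    "| D | E | F | G |\n"
    "|---------------|\n"
)
print(convert_headers_lookahead(sample), end="")
```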

PowerShell -replace with multiple occurrences next to each other in the line

I have a |-delimited file, and for null values the data has a single space. So, in my data file I'll have something like this:
2080| | | | | | | | | | | | | |2000225
I tried this:
-replace '\| \|', '||'
but it only matches separate pairs of |, so a space is still left between every other pair of | when it's done. I'm just not really good with regex and totally new to PowerShell.
2080|| || || ....|2000225
I'm not sure if recursion would solve this or if I'm going to need to write a short Java program to do it.
You can use the regex-based -replace operator as follows:
PS> ' |2080| | | | | | | | | | | | | |2000225| ' -replace ' (\||$)', '$1'
|2080||||||||||||||2000225|
This assumes that no non-empty fields have trailing spaces - if they do, their (last) trailing space will be removed; to avoid this, use the appropriate solution from Wiktor Stribiżew's helpful answer.
Regex ' (\||$)' matches a single space character followed by either a literal | (escaped as \|) or (|) the end of the string ($). $1 in the replacement string then refers to whatever the first capture group ((...)) matched: if the space was followed by a literal |, it is effectively replaced with just |; if it was followed by the end of the string, it is effectively removed.
A slight simplification is to use a positive lookahead assertion ((?=...)), as also used in Wiktor's answer. The match then consists of the space character only, which allows omitting the substitution-text operand of -replace; that operand defaults to the empty string, so the spaces are simply removed:
PS> ' |2080| | | | | | | | | | | | | |2000225| ' -replace ' (?=\||$)'
|2080||||||||||||||2000225|
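Python's re module uses the same lookahead syntax as .NET, so the idea can be tried outside PowerShell. A minimal sketch with a shorter, made-up input (drop_padding_spaces is a hypothetical name); the last assertion in spirit shows the caveat above, that a trailing space of a non-empty field is removed too:

```python
import re


def drop_padding_spaces(line: str) -> str:
    # ' (?=\||$)': a space immediately followed by "|" or by the end of
    # the string. The lookahead is zero-width, so only the space is
    # matched, and replacing it with "" deletes it.
    return re.sub(r" (?=\||$)", "", line)


print(drop_padding_spaces(" |2080| | |2000225| "))  # |2080|||2000225|
print(drop_padding_spaces("ab |c"))                 # ab|c
```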
Using -replace with a regex-based search, you can:
Remove all whitespace between two | chars:
$text -replace '(?<=\|)\s+(?=\|)'
Only remove whitespace between | and/or the start or end of the string, using either of:
$text -replace '(?<=\||^)\s+(?=\||$)'
$text -replace '(?<![^|])\s+(?![^|])'
Remove all whitespace characters that are followed by either | or the end of the string, using either of:
$text -replace '\s+(?=\||$)'
$text -replace '\s+(?![^|])'
Output: 2080||||||||||||||2000225.
Details
\s+ - 1 or more whitespace characters
(?=\||$) - a positive lookahead that requires a | char (\|) or (|) end of string ($) immediately to the right of the current location.
(?![^|]) - a negative lookahead that fails the match if there is a char other than | immediately to the right of the current location.
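The negative-lookaround forms are handy in Python in particular: Python's re module rejects the variable-width lookbehind (?<=\||^) ("look-behind requires fixed-width pattern"), while (?<![^|]) is fixed-width and means the same thing. A minimal sketch comparing two of the variants on a made-up input (WHOLE_FIELD and BEFORE_PIPE are hypothetical names):

```python
import re

# Whitespace that forms a whole field: no non-"|" char directly before it
# and none directly after it (string start/end count as boundaries).
WHOLE_FIELD = re.compile(r"(?<![^|])\s+(?![^|])")

# Whitespace followed by "|" or the end of the string; this also strips
# trailing spaces from non-empty fields.
BEFORE_PIPE = re.compile(r"\s+(?![^|])")

text = "ab | |c d|"
print(WHOLE_FIELD.sub("", text))  # ab ||c d|
print(BEFORE_PIPE.sub("", text))  # ab||c d|
```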
You don't need a recursive function to do that; just run the replacement twice. The problem is that once you match | |, the search continues after the second |, which is also the start of the next occurrence. So in the first pass, in a run such as | | |, only the first pair is replaced (after the first match <| |> |, the search resumes at the last |, which on its own can't start a new match), and in longer runs every other pair is skipped. A second pass matches and fixes all the pairs left over from the first one.
Just do:
PS> '2080| | | | | | | | | | | | | |2000225' -replace '\| \|', '||' -replace '\| \|', '||'
2080||||||||||||||2000225
You won't need more.
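The "every other pair" effect can be reproduced in Python, whose re.sub also scans for non-overlapping matches from left to right. A small sketch on a shorter, made-up input:

```python
import re

PAIR = re.compile(r"\| \|")

data = "2080| | | | |2000225"
once = PAIR.sub("||", data)   # pairs 1-2 and 3-4 fixed, 2-3 and 4-5 skipped
twice = PAIR.sub("||", once)  # the second pass fixes the leftovers
print(once)   # 2080|| || |2000225
print(twice)  # 2080|||||2000225
```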

Regex for lines containing an odd number of pipe characters

I'm cleaning up a LaTeX file, and I'm in a situation where I need to distinguish absolute value |x| from the set "such that" symbol i.e. {x | x < 0}.
The first step for me is to find all lines containing an odd number of | characters (i.e. the pipe symbol).
In principle, I know how to do this, but I've tried the following regex command with no luck.
egrep '^[^\|]*\|([^\|]*\|[^\|]*\|)*[^\|]*$'
The idea is that a matching line contains, in order:
The line start
0 or more non-pipe characters
Exactly one pipe character
0 or more copies of text containing exactly 2 pipes
The line end
However, for some reason this isn't working.
I run the command on the following file:
\[
S = \{ x | x < 0}
y = |x|
\]
and none of the lines match.
I suspect I'm making a silly mistake somewhere, possibly to do with escaping the pipe characters,
but I'm stumped as to what's wrong.
Can anybody tell me either how to fix this, or provide an alternate expression which matches lines containing an odd number of pipe characters?
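Inside a bracket expression ([...]), | is not a special character, so it should not be escaped by \. Worse, a backslash is literal inside brackets, so [^\|] excludes both \ and |, and a line that contains a backslash, such as S = \{ x | x < 0}, can never match. Try: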
egrep '^[^|]*\|([^|]*\|[^|]*\|)*[^|]*$'
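A quick way to check the corrected pattern against the sample file is a Python sketch with the same structure (ODD_PIPES is a made-up name; fullmatch replaces the ^...$ anchors):

```python
import re

# One pipe, then any number of pipe pairs, with runs of non-pipe chars
# in between. Inside [^|] the pipe is literal and needs no backslash.
ODD_PIPES = re.compile(r"[^|]*\|(?:[^|]*\|[^|]*\|)*[^|]*")

lines = [r"\[", r"S = \{ x | x < 0}", "y = |x|", r"\]"]
for line in lines:
    if ODD_PIPES.fullmatch(line):
        print(line)  # only: S = \{ x | x < 0}
```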
Better to use awk for this purpose:
awk -F '|' '!(NF%2)'
TESTING:
echo "a|bc|d|erg" | awk -F '|' '!(NF%2)'
OUTPUT:
a|bc|d|erg
echo "abc|d|ergxy" | awk -F '|' '!(NF%2)'
OUTPUT: (none, since this line has an even number of pipes)
how about:
awk -F'|' 'NF&&(NF-1)%2' file
example:
kent$ cat file
|foo|bar
| | | | |
||||||
|||||||
kent$ awk -F'|' 'NF&&(NF-1)%2' file
| | | | |
|||||||
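The field-count trick carries over to other languages: splitting on | yields one more field than there are pipes, just like awk's NF with -F'|'. A minimal Python sketch (has_odd_pipes is a hypothetical name); Python's str.split never returns zero fields, so the NF&& guard for empty lines isn't needed, and line.count("|") would work just as well:

```python
def has_odd_pipes(line: str) -> bool:
    # len(fields) - 1 is the number of "|" characters, as in awk's NF-1.
    fields = line.split("|")
    return (len(fields) - 1) % 2 == 1


for line in ["|foo|bar", "| | | | |", "||||||", "|||||||"]:
    if has_odd_pipes(line):
        print(line)
```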
Perl, which is cross platform (Windows too) and generally installed everywhere these days, is my axe of choice:
perl -ne 'print if (s/\|/\|/g) %2 == 1' file
script.sed
#!/bin/sed -nf
# Save to hold
h
# Delete all non | chars
s#[^|]##g
# Odd match
/^\(||\)*|$/ {
# Fetch hold
g
s#^#odd\t:#
}
# Even match
/^\(||\)\+$/ {
# Fetch hold
g
s#^#even\t:#
}
# No match
/^$/ {
# Fetch hold
g
s#^#none\t:#
}
# Print
p
data.txt
do|odd
do|odd|match|me
|even match|me
do|even match|me
do|even match|also|me|please
no-match
shell
sed -nf script.sed data.txt
stdout
odd :do|odd
odd :do|odd|match|me
even :|even match|me
even :do|even match|me
even :do|even match|also|me|please
none :
none :no-match

regex mixed case excluding specific case

I need a regex able to match:
a) All combinations of lower-/upper-cases of a certain word
b) Except a couple of certain case-combinations.
I must search, from bash, through thousands of source-code files for occurrences of misspelled variables.
Specifically, the word I'm searching for is FrontEnd which in our coding-style guide can be written exactly in 2 ways depending on the context:
FrontEnd (F and E upper)
frontend (all lower)
So I need to "catch" any occurrences that do not follow our coding standards, such as:
frontEnd
FRONTEND
fRonTenD
I have been reading many tutorials of regex for this specific example and I cannot find a way to say "match this pattern BUT do not match if it is exactly this one or this other one".
I guess it would be similar to trying to match "any number from 000000 to 999999, except exactly the number 555555 or the number 123456"; I suppose the logic is similar (of course, I don't know how to do this either :) )
Thnx
Additional comment:
I cannot use grep piped to grep -v because I could miss lines; for example if I do:
grep -i frontend | grep -v FrontEnd | grep -v frontend
would miss a line like this:
if( frontEnd.name == 'hello' || FrontEnd.value == 3 )
because the second occurrence would hide the whole line. Therefore I'm searching for a regex to use with egrep that is capable of doing the exact match I need.
You won't be able to do this easily with egrep because it doesn't support lookaheads. It's probably easiest to do this with perl.
perl -ne 'print if /(?!frontend|FrontEnd)(?i)frontend/;'
To use it, just pipe the text through stdin.
How this works:
perl - Perl command, love it or hate it.
-n - wrap the code in while (<>) { }, which takes each line from stdin and puts it in $_.
-e - evaluate Perl code from the command line.
print - print the content of $_, which is filled in by the -n flag.
if - this expression uses Perl's reverse if semantics (statement if condition;).
/.../ - begin regular expression match; returns true if it matches.
(?!...) - negative lookahead, ensures that the correct versions won't be matched.
frontend|FrontEnd (inside the lookahead) - the pattern that matches the correct versions.
(?i) - turns on case-insensitive matching for the rest of the regular expression (use (?-i) to turn it off).
frontend (after (?i)) - the pattern that matches both the correct and incorrect versions.
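The same idea works in Python, with one adjustment: Python 3.11 and later reject a global (?i) that is not at the start of the pattern, so the case-insensitive part is written as a scoped group (?i:...) instead. A minimal sketch (BAD_SPELLING is a made-up name):

```python
import re

# The negative lookahead is case-sensitive and rejects the two allowed
# spellings at the current position; only then does the case-insensitive
# (?i:frontend) try to match.
BAD_SPELLING = re.compile(r"(?!frontend|FrontEnd)(?i:frontend)")

line = "if( frontEnd.name == 'hello' || FrontEnd.value == 3 )"
print(BAD_SPELLING.findall(line))  # ['frontEnd']
```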
This really should be a comment, but is there any reason you cannot use sed? I'm thinking something like
sed 's/frontend/FrontEnd/ig' input.txt
That is, of course, assuming you want to correct the deviant versions... Note that this also rewrites the legitimate all-lowercase frontend to FrontEnd.

Extract multiple occurrences on the same line using sed/regex

I am trying to loop through each line in a file and extract the text that starts with ${ and ends with }. So as the final output I am expecting only SOLDIR and TEMP (from inputfile.sh).
I have tried the following script, but it seems to match and extract only the second occurrence of the pattern, TEMP. Adding g at the end doesn't help either. Could anybody please let me know how to match and extract both (or multiple) occurrences on the same line?
inputfile.sh:
.
.
SOLPORT=\`grep -A 4 '\[LocalDB\]' \${SOLDIR}/solidhac.ini | grep \${TEMP} | awk '{print $2}'\`
.
.
script.sh:
infile='inputfile.sh'
while read line ; do
echo $line | sed 's%.*${\([^}]*\)}.*%\1%g'
done < "$infile"
May I propose a grep solution?
grep -oP '(?<=\${).*?(?=})'
It uses Perl-style lookaround assertions and lazily matches anything between '${' and '}'.
Feeding your line to it, I get
$ echo "SOLPORT=\`grep -A 4 '[LocalDB]' \${SOLDIR}/solidhac.ini | grep \${TEMP} | awk '{print $2}'\`" | grep -oP '(?<=\${).*?(?=})'
SOLDIR
TEMP
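The lazy-quantifier idea is the same in Python's re module. A minimal sketch (extract_names is a made-up name) run on the line from the question:

```python
import re


def extract_names(line: str) -> list:
    # The lazy .*? stops at the first "}", so each ${...} is matched on its
    # own instead of one greedy match running from the first ${ to the
    # last }. findall returns only the capture group, i.e. the name.
    return re.findall(r"\$\{(.*?)\}", line)


line = "SOLPORT=`grep -A 4 '[LocalDB]' ${SOLDIR}/solidhac.ini | grep ${TEMP} | awk '{print $2}'`"
print(extract_names(line))  # ['SOLDIR', 'TEMP']
```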
This might work for you (but maybe only for your specific input line):
sed 's/[^$]*\(${[^}]\+}\)[^$]*/\1\t/g;s/$[^{$]\+//g'
Extracting multiple matches from a single line using sed isn't as bad as I thought it'd be, but it's still fairly esoteric and difficult to read:
$ echo 'Hello ${var1}, how is your ${var2}' | sed -En '
# Replace ${PREFIX}${TARGET}${SUFFIX} with ${PREFIX}\a${TARGET}\n${SUFFIX}
s#\$\{([^}]+)\}#\a\1\n#
# Continue to next line if no matches.
/\n/!b
# Remove the prefix.
s#.*\a##
# Print up to the first newline.
P
# Delete up to the first newline and reprocess what's left of the line.
D
'
var1
var2
And all on one line:
sed -En 's#\$\{([^}]+)\}#\a\1\n#;/\n/!b;s#.*\a##;P;D'
Since POSIX extended regexes don't support non-greedy quantifiers or putting a newline escape in a bracket expression, I've used a BEL character (\a) as a sentinel at the end of the prefix instead of a newline. A newline could be used, but then the second substitution would have to be the questionable s#.*\n(.*\n.*)##, which might involve a pathological amount of backtracking by the regex engine.
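For comparison, the control flow of that sed loop (find the first match, print it, continue with the rest of the line) looks like this as an explicit Python loop. A sketch only; PLACEHOLDER and extract_loop are made-up names, and in practice re.findall does the same in one call:

```python
import re

PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def extract_loop(line: str) -> list:
    """Find the first ${...}, emit its name, then search again in
    whatever follows that match, until nothing is left."""
    names = []
    rest = line
    while True:
        m = PLACEHOLDER.search(rest)
        if m is None:             # like /\n/!b: nothing left to extract
            break
        names.append(m.group(1))  # like P: output the extracted name
        rest = rest[m.end():]     # like D: drop the consumed part, loop again
    return names


print(extract_loop("Hello ${var1}, how is your ${var2}"))  # ['var1', 'var2']
```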