PowerShell -replace with multiple occurrences next to each other in the line - regex

I have a | delimited file and I have some data where for null values it has a space. So, in my data file I'll have something like this:
2080| | | | | | | | | | | | | |2000225
I tried this:
-replace '\| \|', '||'
but each match consumes a pair of |, so every other space is still left between | when it's done, as shown below. I'm just not really good with regex and totally new to PowerShell.
2080|| || || ....|2000225
I'm not sure if recursion would solve this or if I'm going to need to write a short Java program to do it.

You can use the regex-based -replace operator as follows:
PS> ' |2080| | | | | | | | | | | | | |2000225| ' -replace ' (\||$)', '$1'
|2080||||||||||||||2000225|
This assumes that no non-empty fields have trailing spaces - if they do, their (last) trailing space will be removed; to avoid this, use the appropriate solution from Wiktor Stribiżew's helpful answer.
The regex  (\||$) (note the leading space) matches a single space char. followed by either a literal | (escaped as \|) or (|) the end of the string ($); $1 in the replacement string then refers to whatever the 1st capture group ((...)) matched; that is, if the space char. was followed by a literal |, it is effectively replaced with just |; if it was followed by the end of the string, it is effectively removed.
A slight simplification is to use a positive lookahead assertion ((?=...)), as also used in Wiktor's answer, which makes the regex match the space character only, and therefore allows omission of the substitution-text -replace operand, which defaults to the empty string and therefore effectively removes the spaces:
PS> ' |2080| | | | | | | | | | | | | |2000225| ' -replace ' (?=\||$)'
|2080||||||||||||||2000225|
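For comparison outside PowerShell, here is a minimal sketch of the same two patterns using Python's re module. The function names are made up for illustration, and the shortened sample line is an assumption, not the original data.

```python
import re

def strip_with_group(line: str) -> str:
    # Match a space plus what follows it ("|" or end of string),
    # then put back only the captured follower.
    return re.sub(r" (\||$)", r"\1", line)

def strip_with_lookahead(line: str) -> str:
    # The lookahead only asserts what follows, so the match is the space alone
    # and replacing it with "" simply deletes it.
    return re.sub(r" (?=\||$)", "", line)

sample = " |2080| | | |2000225| "
print(strip_with_group(sample))
print(strip_with_lookahead(sample))
```

Both calls should give the same result; the lookahead form just avoids having to re-insert the captured character.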

Using -replace with a regex based search, you may....
Remove all whitespace between two | chars:
$text -replace '(?<=\|)\s+(?=\|)'
Remove whitespace only when it sits between a | (or the start of the string) and a | (or the end of the string):
$text -replace '(?<=\||^)\s+(?=\||$)'
$text -replace '(?<![^|])\s+(?![^|])'
Remove all whitespace characters that are followed by either | or the end of the string:
$text -replace '\s+(?=\||$)'
$text -replace '\s+(?![^|])'
Output: 2080||||||||||||||2000225. See the regex demo.
Details
\s+ - 1 or more whitespace characters
(?=\||$) - a positive lookahead that requires a | char (\|) or (|) end of string ($) immediately to the right of the current location.
(?![^|]) - a negative lookahead that fails the match if there is a char other than | immediately to the right of the current location.
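To see how the three variants differ, here is a small Python sketch. It is an illustration only: the constant names and the sample string are invented, and it uses the (?<![^|]) form because Python's re module requires fixed-width lookbehinds, so (?<=\||^) is not accepted there.

```python
import re

# Whitespace strictly between two pipes.
BETWEEN_PIPES = re.compile(r"(?<=\|)\s+(?=\|)")
# Whitespace between (pipe or start of string) and (pipe or end of string).
BETWEEN_PIPES_OR_EDGES = re.compile(r"(?<![^|])\s+(?![^|])")
# Whitespace followed by a pipe or the end of the string, whatever precedes it.
BEFORE_PIPE_OR_END = re.compile(r"\s+(?![^|])")

def strip_matches(pattern: re.Pattern, text: str) -> str:
    return pattern.sub("", text)

sample = " |a | | b| "
for pattern in (BETWEEN_PIPES, BETWEEN_PIPES_OR_EDGES, BEFORE_PIPE_OR_END):
    print(repr(strip_matches(pattern, sample)))
```

Only the last variant touches the trailing space of the non-empty field "a ", which is the trade-off mentioned for fields with trailing spaces.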

You don't need a recursive function for that. Just run the replacement twice. The problem is that once | | has matched, the search resumes after the second |, which is already past the start of the next occurrence. So in a run such as | | |, the first pass matches <| |> and continues from the following space, which doesn't match; every second space in a contiguous run is left in place. A second pass picks up exactly those leftovers, because each remaining space now sits between two |. Run it a second time and you'll see that it works.
Just do:
PS> '2080| | | | | | | | | | | | | |2000225' -replace '\| \|', '||' -replace '\| \|', '||'
2080||||||||||||||2000225
You won't need more.
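The reason two passes are enough can be checked with a short Python sketch (hypothetical helper names; Python is used here only because its re.sub also finds non-overlapping matches left to right):

```python
import re

def one_pass(line: str) -> str:
    # Each match consumes both pipes, so the space right after a match
    # has no leading pipe left to match against in this pass.
    return re.sub(r"\| \|", "||", line)

def two_passes(line: str) -> str:
    # The second pass sees the leftover spaces, each now between two pipes.
    return one_pass(one_pass(line))

line = "2080" + "| " * 3 + "|2000225"  # "2080| | | |2000225"
print(one_pass(line))
print(two_passes(line))
```

After the first pass no two leftover spaces share a pipe, so the second pass can match all of them without overlap.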

Related

Regex to check the exact number of occurrences of multiple characters

I wish to write a regex that matches a string only when it contains an exact number of occurrences of '3', '2', and '1'.
For instance, having a string "(((3x2x2+1)x2x2+1)x2+1)", I wish to have a regex expression to match exactly one occurrence of '3', five occurrences of '2', and three occurrences of '1'. If there would be more or less '3's or '2's or '1's, the regex shouldn't match.
I have a solution with positive and negative lookaheads to do it in one regex. However, I must warn that it can become very messy very quickly. For this problem I would advise simply counting the occurrences of each digit in your string without regexes.
That being said,
let's consider the following regex :
^(?=.*(?:1[^1]*){2})(?!.*(?:1[^1]*){3}).*$
^ and .*$ are there to say we want to match the whole string
?: just means we don't want to capture the group
?= is a positive lookahead, which means that our match must satisfy the condition .*(?:1[^1]*){2}
?! is a negative lookahead, which means that our match must NOT satisfy the condition .*(?:1[^1]*){3}
To summarize, the whole string is matched only if the positive lookahead succeeds (the digit 1 occurs at least 2 times) and the pattern in the negative lookahead does not (the digit 1 does not occur 3 times), which together means exactly 2 occurrences.
So in the example above 1*5*1 is matched, 11 is matched but 1*1*1 is not
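The three examples can be checked with a few lines of Python. This is only a sketch; the constant and function names are invented for illustration.

```python
import re

# At least two "1" (positive lookahead) but not three (negative lookahead).
EXACTLY_TWO_ONES = re.compile(r"^(?=.*(?:1[^1]*){2})(?!.*(?:1[^1]*){3}).*$")

def has_exactly_two_ones(text: str) -> bool:
    return EXACTLY_TWO_ONES.match(text) is not None

for candidate in ["1*5*1", "11", "1*1*1", "1"]:
    print(candidate, has_exactly_two_ones(candidate))
```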
So now, let's say you want your string to have exactly 3 '1', 1 '2', and 2 '3',
it will look like this
^(?=.*(?:1[^1]*){3})(?!.*(1[^1]*){4})(?=.*(?:2[^2]*){1})(?!.*(2[^2]*){2})(?=.*(?:3[^3]*){2})(?!.*(3[^3]*){3}).*$
It will match (1*2*1*3*1*3) but not (1*2*1*3*1*3*1) or (1*2*1*3*1)
Note that it matches (1*2*1*3*1*37) because there is a 3 in 37
Then again, I would advise against using this solution if you have too many digits to check, as you need to write a positive and a negative lookahead for each of them.
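The non-regex alternative recommended above, counting each character directly, could look roughly like this in Python. The function name and the dictionary-based interface are assumptions made for the sketch.

```python
from collections import Counter

def has_exact_counts(text: str, required: dict) -> bool:
    # Count every character once, then compare only the ones we care about.
    counts = Counter(text)
    return all(counts[char] == wanted for char, wanted in required.items())

required = {"3": 1, "2": 5, "1": 3}
print(has_exact_counts("(((3x2x2+1)x2x2+1)x2+1)", required))
print(has_exact_counts("(((3x2x2+1)x2x2+1)x2+1+1)", required))
```

Adding another digit to check is then a one-entry change to the dictionary instead of another pair of lookaheads.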
I'd do it in 3 steps, as follows:
Mac_3.2.57$echo "(((3x2x2+1)x2x2+1)x2+1)" | egrep '^([^3]*3[^3]*){1}$' | egrep '^([^2]*2[^2]*){5}$' | egrep '^([^1]*1[^1]*){3}$'
(((3x2x2+1)x2x2+1)x2+1)
Mac_3.2.57$echo "(((3x2x2+1)x2x2+1)x2+0)" | egrep '^([^3]*3[^3]*){1}$' | egrep '^([^2]*2[^2]*){5}$' | egrep '^([^1]*1[^1]*){3}$'
Mac_3.2.57$echo "(((3x2x2+1)x2x2+1)x2+1+1)" | egrep '^([^3]*3[^3]*){1}$' | egrep '^([^2]*2[^2]*){5}$' | egrep '^([^1]*1[^1]*){3}$'
Mac_3.2.57$echo "(((3x2x2+1)x2x2+1)x0+1)" | egrep '^([^3]*3[^3]*){1}$' | egrep '^([^2]*2[^2]*){5}$' | egrep '^([^1]*1[^1]*){3}$'
Mac_3.2.57$echo "(((3x2x2+1)x2x2+1)x2+2+1)" | egrep '^([^3]*3[^3]*){1}$' | egrep '^([^2]*2[^2]*){5}$' | egrep '^([^1]*1[^1]*){3}$'
Mac_3.2.57$echo "(((0x2x2+1)x2x2+1)x2+1)" | egrep '^([^3]*3[^3]*){1}$' | egrep '^([^2]*2[^2]*){5}$' | egrep '^([^1]*1[^1]*){3}$'
Mac_3.2.57$echo "(((3X3x2x2+1)x2x2+1)x2+1)" | egrep '^([^3]*3[^3]*){1}$' | egrep '^([^2]*2[^2]*){5}$' | egrep '^([^1]*1[^1]*){3}$'
Mac_3.2.57$echo "(((2x2+1)x2x2+1)x2+1+3)" | egrep '^([^3]*3[^3]*){1}$' | egrep '^([^2]*2[^2]*){5}$' | egrep '^([^1]*1[^1]*){3}$'
(((2x2+1)x2x2+1)x2+1+3)
Mac_3.2.57$
You can check for each one separately and then combine the results.
\A[^3]*(3[^3]*){1}\z
\A[^2]*(2[^2]*){5}\z
\A[^1]*(1[^1]*){3}\z
Depending on your regular expressions engine, you may need to use ^ and $ instead of \A and \z.
I initially thought that it may be possible to combine them, but that would still match if there is one '3' and one '2', for example. You'll probably need code similar to this:
match = input.match?(/[123]/)
match = match && input.match?(/\A[^3]*(3[^3]*){1}\z/) if input.include?('3')
match = match && input.match?(/\A[^2]*(2[^2]*){5}\z/) if input.include?('2')
match = match && input.match?(/\A[^1]*(1[^1]*){3}\z/) if input.include?('1')
Note that the code above assumes that it's ok to have some but not all of these characters as long as the existing ones match the requirements.
It just needs a series of lookaheads to verify an exact number of specific characters.
This is a short version.
^(?=[^3]*3[^3]*$)(?=[^2]*(?:2[^2]*){5}$)(?=[^1]*(?:1[^1]*){3}$).+
https://regex101.com/r/2CJ2U6/1
^
(?= [^3]* 3 [^3]* $ ) # 1 of three
(?= # 5 of two
[^2]*
(?: 2 [^2]* ){5}
$
)
(?= # 3 of one
[^1]*
(?: 1 [^1]* ){3}
$
)
.+
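As a quick check, the expanded form carries over almost verbatim to Python with re.VERBOSE. This is a sketch under the assumption that the input is a single line; the names are invented, and the test strings are taken from the egrep transcript earlier in the thread.

```python
import re

EXACT_COUNTS = re.compile(
    r"""
    ^
    (?= [^3]* 3 [^3]* $ )              # exactly one "3"
    (?= [^2]* (?: 2 [^2]* ){5} $ )     # exactly five "2"
    (?= [^1]* (?: 1 [^1]* ){3} $ )     # exactly three "1"
    .+
    """,
    re.VERBOSE,
)

def matches_exact_counts(text: str) -> bool:
    return EXACT_COUNTS.match(text) is not None

print(matches_exact_counts("(((3x2x2+1)x2x2+1)x2+1)"))
print(matches_exact_counts("(((3X3x2x2+1)x2x2+1)x2+1)"))
```

Each lookahead is anchored with $ so it has to account for the whole string, which is what turns "at least N" into "exactly N".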

RegEx, Substituting a variable number of replacements

Hopefully I'm missing something obvious.
I've got a file that contains some lines like:
| A | B | C |
|-----------|
Ignore this line
| And | Ignore | This |
| D | E | F | G |
|---------------|
I want to find the |----| lines, remove those... and replace all of the | characters with a ^ in the preceding line. e.g.
^ A ^ B ^ C ^
Ignore this line
| And | Ignore | This |
^ D ^ E ^ F ^ G ^
So far I've got:
perl -0pe 's/^(\|.*\|)\n\|-+\|/$1/mg'
This takes input from stdin (some other modifications have already happened with sed)... and it's using -0 and /m to support multiline replacements.
The match seems to be correct, and it removes the |----| lines, but I can't see how I can do the | to ^ substitution with the $1 (or \1) backreference.
I can't remember where I did it before, but another language allowed me to use ${1/A/B} to substitute A to B, but that's upsetting perl.
And I've been wondering if this is where /e or /ee could be used, but I'm not familiar enough with perl on how to do that.
You can use
perl -0pe 's{^(.*)\R\|-+\|$\R?}{$1 =~ s,\|,^,gr}gme' t
Details:
^(.*)\R\|-+\|$\R? - matches all occurrences (see the g flag at the end)
^ - start of a line (note the m flag that makes ^ match start of a line and $ match end of a line)
(.*) - Group 1: whole line
\R - a line break sequence
\| - | char
-+ - one or more - chars
\| - a | char
$ - end of line
\R? - an optional line break sequence.
Once the match is found, all | are replaced with ^ using $1 =~ s,\|,^,gr, that replaces inside the Group 1 value. This syntax is enabled with the e flag.
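The same idea, computing the replacement from the captured line, can be sketched in Python, where a function passed to re.sub plays the role of the e flag. This is an illustration rather than a translation of the one-liner: the names are made up, it assumes \n line endings, and it leaves the line break after the |---| line in place instead of matching it.

```python
import re

# A line (group 1) followed by a "|-----|" rule line.
HEADER_AND_RULE = re.compile(r"^(.*)\n\|-+\|$", re.MULTILINE)

def mark_headers(text: str) -> str:
    # The callable receives each match and returns the replacement text:
    # the captured line with every "|" turned into "^"; the rule line is dropped.
    return HEADER_AND_RULE.sub(lambda m: m.group(1).replace("|", "^"), text)

sample = (
    "| A | B | C |\n"
    "|-----------|\n"
    "Ignore this line\n"
    "| And | Ignore | This |\n"
    "| D | E | F | G |\n"
    "|---------------|\n"
)
print(mark_headers(sample), end="")
```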
I could see this being done using 2 substitutions:
\|(?=.*[\r\n]+\|-+\|$)
https://regex101.com/r/x7d15d/1/
And then:
^\|-+\|(?:[\r\n]+|$)
https://regex101.com/r/ZdEzuM/1/
With one pattern that checks the next line in a lookahead assertion:
perl -0pe 's/\|(?=.*\R\|-+\|$)(?:\R.*)?/^/gm' file
If you absolutely want to use an evaluation, you can put a transliteration in the replacement part with this pattern:
perl -0pe 's#^(.*)\R\|-+\|$#$1=~y/|/^/r#gme' file

perl --> regex for searching for character | in a file

how to match a line containing character "|" using perl?
File:
1. Some header
2. | A| B| C| D| E| F|
I want to match with the line containing "|" character leaving the rest.
I tried below code but it didn't work.
if($line =~ /|/){
}
| is a metacharacter in regexes (alternation); an unescaped /|/ means "empty string or empty string" and therefore matches every line. If you want a literal | character, you need to escape it with a backslash, so:
if($line =~ /\|/){
...
}
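The effect of the missing backslash is easy to reproduce in other regex flavours too. A minimal Python sketch (function names are hypothetical):

```python
import re

def has_pipe_unescaped(line: str) -> bool:
    # "|" alone is an alternation of two empty patterns, so it matches anywhere.
    return re.search(r"|", line) is not None

def has_pipe(line: str) -> bool:
    # The backslash turns "|" into a literal character.
    return re.search(r"\|", line) is not None

for line in ["1. Some header", "2. | A| B| C| D| E| F|"]:
    print(has_pipe_unescaped(line), has_pipe(line))
```

For a fixed character like this, a plain substring test (index in Perl, the in operator in Python) avoids the escaping question entirely.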

perl alternative for sed to split multiple |

I was able to accomplish this in sed command, but could not get it working in perl. Would like to add spaces between pipe characters that are close together without any spaces or alphanumerics.
input ==> a|123|##||||
expected output ==> a|123|##| | | |
This sed command works fine:
echo "a|123|##||||" | sed 's/\([^[:blank:][:alnum:]]\)|/\1 | /g'
output for above command ==> a|123|## | | | |
In perl, I could not get it working
echo "a|123|##||||" | perl -pe 's/\([^[:blank:][:alnum:]]\)|/\1 | /g'
with output for above command
| a | | | 1 | 2 | 3 | | | # | # | | | | | | | | |
To add space only between those | that come next to each other
echo "a|123|##||||" | perl -pe's/\|(?=\|)/\| /g'
I use a lookahead in order to be able to detect consecutive (and overlapping!) pairs, with more than two | strung together: Only the first one in a match is consumed so the second one stays there for the next match, in case there is yet another after it (again asserted with the lookahead).
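To make the difference between consuming and merely asserting the second | concrete, here is a small Python sketch with invented function names; the regex syntax is the same as in the Perl one-liner.

```python
import re

def space_out(line: str) -> str:
    # Only the first pipe is consumed; the second is asserted by the lookahead
    # and stays available as the start of the next match.
    return re.sub(r"\|(?=\|)", "| ", line)

def space_out_consuming(line: str) -> str:
    # Both pipes are consumed, so overlapping pairs are missed.
    return re.sub(r"\|\|", "| |", line)

print(space_out("a|123|##||||"))
print(space_out_consuming("a|123|##||||"))
```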
Another way using both lookahead and lookbehind.
$ echo "a|123|##||||" | perl -pe's/(?<=\|)(?=\|)/ /g '
a|123|##| | | |
$
Correct Perl syntax would be:
echo "a|123|##||||" | perl -pe 's/([^\s\w])\|/$1 | /g'
Pipe character must be escaped
$1 is used for 1st group match

awk sed perl, replacing specific pattern within a range of lines

I'm working in verilog and need to edit a specific line within a unique block, but am unsure of how to proceed
file.v
...
block1 block1(
.port1(port1),
.port2(port2),
);
block2 block2(
.(port2)(port2),
.(port3)(port3)
);
....
I need to somehow remove the "," after port2 in block1 without modifying block2. There are also multiple blocks elsewhere that contain port2.
block1 block1(
.port1(port1),
.port2(port2)
);
I've been trying ranges of awk and sed lines, but not getting the results to modify the file successfully. Any suggestions or solutions are much appreciated.
This will remove any comma that occurs just before the end of a block (whitespace then );):
perl -0777 -pe 's/,(?=\s*\);)//g'
Notes:
-0777 causes perl to slurp all the input in as a single string. This is required because
we know there's newlines in between so we don't want to read line-by-line
there might be empty lines between the comma and the parentheses so reading by "paragraph" won't work either.
-p causes perl to print the input after modifications.
the regex is the trickiest part
it finds a comma and then looks ahead to match zero or more whitespace characters (includes spaces, tabs, newlines, etc) followed by a close parenthesis and a semicolon.
the lookahead text is not part of the matched text (lookaheads are known as "zero width assertions") -- the matched text will be just the comma
if there's a match, replace the comma with an empty string.
the g flag says do this globally in the string.
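The notes above can be tried out in Python as well, where reading the whole file into one string corresponds to -0777. A minimal sketch with a hypothetical function name and an abbreviated sample:

```python
import re

def drop_trailing_commas(source: str) -> str:
    # Match a comma only if optional whitespace (including newlines) and ");" follow.
    # The lookahead is zero-width, so just the comma is removed.
    return re.sub(r",(?=\s*\);)", "", source)

sample = "block1 block1(\n.port1(port1),\n.port2(port2),\n);\n"
print(drop_trailing_commas(sample), end="")
```

Like the Perl command, this touches every block in the input, not just block1.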
This might do the job for you
sed '/block1 block1/,/);/{s/\((port2)\),/\1/}' file.v
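The range idea (only edit between the block header and its closing );) can also be sketched in Python. Everything here is an assumption for illustration: the function name, and the premise that a block starts with "<name> <name>(" and ends at the first ");" after it.

```python
import re

def drop_last_comma_in_block(source: str, name: str) -> str:
    # Assumption: the block opens with "<name> <name>(" and closes at the first ");".
    block = re.compile(
        rf"{re.escape(name)}\s+{re.escape(name)}\(.*?\);", re.DOTALL
    )
    # Inside each matched block, remove a comma that directly precedes the ");".
    return block.sub(
        lambda m: re.sub(r",(?=\s*\);)", "", m.group(0)), source
    )

sample = (
    "block1 block1(\n.port1(port1),\n.port2(port2),\n);\n"
    "block2 block2(\n.port2(port2),\n);\n"
)
print(drop_last_comma_in_block(sample, "block1"), end="")
```

Restricting the edit to the matched block text is what keeps block2 and any other block containing port2 unchanged.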
how about:
awk -v RS="" '/block1/{sub("port2),","port2)")}7' file
I guess you want to remove commas located after a closing paren ()) followed by a newline and a closing paren and a semicolon ();)?
In this case this might work for you:
sed -r ':a;N;s/\),\n\s*\);/)\n);/;P;D;ba'
        |  | | |---------| |---|  | | |
        |  | |      |        |    | | -- branch to label "a"
        |  | |      |        |    | -- delete up to first newline of pattern space
        |  | |      |        |    -- print up to first newline of pattern space
        |  | |      |        -- replace pattern
        |  | |      -- search pattern
        |  | -- substitute
        |  -- read next line into pattern space (append)
        -- branch label "a"