How to match a line containing the character "|" using Perl?
File:
1. Some header
2. | A| B| C| D| E| F|
I want to match only the lines containing the "|" character and skip the rest.
I tried below code but it didn't work.
if($line =~ /|/){
}
| is a meaningful character in regexes (it is the alternation operator); if you want a literal | character, you need to escape it with a backslash, so:
if($line =~ /\|/){
...
}
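For comparison, the same behaviour can be sketched in Python, whose re module also treats an unescaped | as alternation. This is only an illustration using the two sample lines from the question; the variable names are made up for the example.

```python
import re

# The two sample lines from the question (without the "1." / "2." numbering).
lines = ["Some header", "| A| B| C| D| E| F|"]

# Unescaped: "|" means "empty pattern OR empty pattern", which matches anywhere,
# so every line is selected.
unescaped = [ln for ln in lines if re.search(r"|", ln)]

# Escaped: only lines containing a literal pipe are selected.
escaped = [ln for ln in lines if re.search(r"\|", ln)]

print(unescaped)
print(escaped)
```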
I have a | delimited file and I have some data where for null values it has a space. So, in my data file I'll have something like this:
2080| | | | | | | | | | | | | |2000225
I tried this:
-replace '\| \|', '||'
but it only replaces every other pair of | and still leaves a space between some of the | characters when it's done. I'm just not really good with regex and totally new to PowerShell.
2080|| || || ....|2000225
I'm not sure if recursion would solve this or if I'm going to need to write a short Java program to do it.
You can use the regex-based -replace operator as follows:
PS> ' |2080| | | | | | | | | | | | | |2000225| ' -replace ' (\||$)', '$1'
|2080||||||||||||||2000225|
This assumes that no non-empty fields have trailing spaces - if they do, their (last) trailing space will be removed; to avoid this, use the appropriate solution from Wiktor Stribiżew's helpful answer.
The regex ' (\||$)' matches a single space char. followed by either a literal | (escaped as \|) or (|) the end of the string ($); $1 in the replacement string then refers to whatever the 1st capture group ((...)) matched; that is, if the space char. was followed by a literal |, the match is effectively replaced with just |; if it was followed by the end of the string, the space is effectively removed.
A slight simplification is to use a positive lookahead assertion ((?=...)), as also used in Wiktor's answer. The lookahead is not part of the match, so the regex matches the space character only; that allows omitting the substitution-text -replace operand, which defaults to the empty string and therefore effectively removes the spaces:
PS> ' |2080| | | | | | | | | | | | | |2000225| ' -replace ' (?=\||$)'
|2080||||||||||||||2000225|
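If it helps to see the lookahead variant outside PowerShell, here is a rough Python equivalent. It is a sketch only: the input string is rebuilt programmatically to mirror the sample record, and the names text and cleaned are made up for the example.

```python
import re

# Sample record: a leading space, 13 single-space "null" fields, a trailing space.
text = " |2080" + "| " * 13 + "|2000225| "

# Remove every space that is immediately followed by a literal | or by the end
# of the string. The lookahead consumes nothing, so only the space is removed.
cleaned = re.sub(r" (?=\||$)", "", text)

print(cleaned)
```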
Using -replace with a regex-based search, you may:
Remove all whitespace between two | chars:
$text -replace '(?<=\|)\s+(?=\|)'
Remove whitespace that sits between two | chars, or between a | and the start/end of the string:
$text -replace '(?<=\||^)\s+(?=\||$)'
$text -replace '(?<![^|])\s+(?![^|])'
Remove all whitespace characters that are followed by either | or the end of the string:
$text -replace '\s+(?=\||$)'
$text -replace '\s+(?![^|])'
Output: 2080||||||||||||||2000225. See the regex demo.
Details
\s+ - 1 or more whitespace characters
(?=\||$) - a positive lookahead that requires a | char (\|) or (|) end of string ($) immediately to the right of the current location.
(?![^|]) - a negative lookahead that fails the match if there is a char other than | immediately to the right of the current location.
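The difference between the variants can be checked with a small Python sketch. One assumption to flag: Python's re module only accepts fixed-width lookbehinds, so the (?<=\||^) form is not used here and the equivalent (?<![^|]) form stands in for it. The sample string and variable names are illustrative.

```python
import re

text = " |2080" + "| " * 13 + "|2000225| "

# Whitespace strictly between two pipes: the outer spaces survive.
between = re.sub(r"(?<=\|)\s+(?=\|)", "", text)

# Whitespace bounded by a pipe or a string edge on both sides: outer spaces go too.
edges = re.sub(r"(?<![^|])\s+(?![^|])", "", text)

print(repr(between))
print(repr(edges))
```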
You don't need a recursive function to do that. Just run the replacement twice. The problem is that matches cannot overlap: once the regex has matched | |, the search resumes after the second |, so that | cannot also serve as the start of the next match. In a run such as | | |, the first pass matches <| |> and then continues from the following space, which doesn't match. If you have a longer run, every second space is left untouched. On the second pass, those remaining spaces are each surrounded by | on both sides with no overlap between the matches, so they are all replaced. Run it a second time and you'll see that it works.
Just do:
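PS> ' |2080| | | | | | | | | | | | | |2000225| ' -replace '\| \|', '||' -replace '\| \|', '||'
 |2080||||||||||||||2000225|
You won't need more than two passes. Note that | must be escaped as \| in the search pattern, and that the leading and trailing spaces are left alone because they are not between two | characters.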
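To see why one pass is not enough but two are, here is a minimal Python sketch of the same two-pass idea. The sample record is rebuilt programmatically, and once and twice are names chosen for the example.

```python
import re

text = "2080" + "| " * 13 + "|2000225"

# First pass: matches cannot overlap, so only every other space is removed.
once = re.sub(r"\| \|", "||", text)

# Second pass: the remaining spaces now sit between non-overlapping pipe pairs.
twice = re.sub(r"\| \|", "||", once)

print(once)
print(twice)
```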
echo "xxabc jkl" | grep -onP '\w+(?!abc\b)'
1:xxabc
1:jkl
Why is the result not as below?
echo "xxabc jkl" | grep -onP '\w+(?!abc\b)'
1:jkl
The first word is xxabc, which ends with abc.
I want to extract all words that do not end with abc; why is xxabc matched?
How can I fix it so that only 1:jkl is output?
Why doesn't '\w+(?!abc\b)' work?
The \w+(?!abc\b) pattern matches xxabc because \w+ matches 1 or more word chars greedily, and thus grabs xxabc at once. Then, the negative lookahead (?!abc\b) makes sure there is no abc with a trailing word boundary immediately to the right of the current location. Since there is no abc with a trailing word boundary after xxabc (only a space follows), the match succeeds.
To match all words that do not end with abc using a PCRE regex, you may use
echo "xxabc jkl" | grep -onP '\b\w+\b(?<!abc)'
See the online demo
Details
\b - a leading word boundary
\w+ - 1 or more word chars
\b - a trailing word boundary
(?<!abc) - a negative lookbehind that fails the match if the 3 letters immediately to the left of the current location are abc.
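The two patterns behave the same way in Python's re module, which makes for a quick side-by-side check. This is just an illustrative sketch; the variable names are made up.

```python
import re

text = "xxabc jkl"

# Lookahead version: \w+ greedily consumes "xxabc", then only checks that the
# text AFTER the word is not "abc", so "xxabc" is still returned.
with_lookahead = re.findall(r"\w+(?!abc\b)", text)

# Lookbehind version: after the whole word is matched, check that the three
# characters just BEFORE the current position are not "abc".
with_lookbehind = re.findall(r"\b\w+\b(?<!abc)", text)

print(with_lookahead)
print(with_lookbehind)
```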
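Without PCRE features (grep -P or pcregrep), you can do it by piping through sed first. The trailing \b (a GNU sed extension) keeps the substitution from deleting abc at the start or in the middle of a longer word:
echo "xxabc jkl" | sed 's/[a-zA-Z]*abc\b//g' | grep -onE '[a-zA-Z]+'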
or with awk:
echo "xxabc jkl" | awk -F'[^a-zA-Z]+' '{for(i=1;i<=NF;i++){ if ($i!~/abc$/) printf "%s: %s\n",NR,$i }}'
other approach:
echo "xxabc jkl" | awk -F'([^a-zA-Z]|[a-zA-Z]*abc\\>)+' '{OFS="\n"NR": ";if ($1) printf OFS;$1=$1}1'
I'm working in verilog and need to edit a specific line within a unique block, but am unsure of how to proceed
file.v
...
block1 block1(
.port1(port1),
.port2(port2),
);
block2 block2(
.(port2)(port2),
.(port3)(port3)
);
....
I need to remove the "," after port2 in block1 without modifying block2. There are also multiple blocks elsewhere that contain port2. The desired result:
block1 block1(
.port1(port1),
.port2(port2)
);
I've been trying range expressions in awk and sed, but haven't managed to modify the file successfully. Any suggestions or solutions are much appreciated.
This will remove any comma that occurs just before the end of a block (whitespace then );):
perl -0777 -pe 's/,(?=\s*\);)//g'
Notes:
-0777 causes perl to slurp all the input in as a single string. This is required because:
  we know there's newlines in between so we don't want to read line-by-line
  there might be empty lines between the comma and the parentheses so reading by "paragraph" won't work either.
-p causes perl to print the input after modifications.
The regex is the trickiest part:
  it finds a comma and then looks ahead to match zero or more whitespace characters (includes spaces, tabs, newlines, etc) followed by a close parenthesis and a semicolon.
  the lookahead text is not part of the matched text (lookaheads are known as "zero width assertions") -- the matched text will be just the comma
  if there's a match, replace the comma with an empty string.
  the g flag says do this globally in the string.
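The same lookahead works unchanged in Python when the whole text is held in one string, which may make the "only the comma is matched" point easier to see. A small sketch, with a made-up sample block:

```python
import re

source = "block1 block1(\n  .port1(port1),\n  .port2(port2),\n);\n"

# Delete a comma only when it is followed by optional whitespace and ");".
# \s also matches newlines, so no special flag is needed here.
fixed = re.sub(r",(?=\s*\);)", "", source)

print(fixed)
```

The comma after port1 is kept because what follows it is ".port2", not ");".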
This might do the job for you
sed '/block1 block1/,/);/{s/\((port2)\),/\1/}' file.v
how about:
awk -v RS="" '/block1/{sub(/port2\),/,"port2)")}7' file
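If the edit has to be limited to one named block, one possible approach is to locate the block's text first and apply the comma removal only inside it. The Python sketch below assumes each instance looks like "name name( ... );" and that ");" does not occur inside the port list; drop_trailing_comma is a hypothetical helper name, not something from the answers here.

```python
import re

def drop_trailing_comma(source: str, block: str) -> str:
    """Remove the comma before ');' in the first 'block block( ... );' instance only."""
    name = re.escape(block)
    # Non-greedy match from the instance header up to the first ");".
    span = re.compile(rf"\b{name}\s+{name}\s*\(.*?\);", re.DOTALL)
    # Apply the lookahead-based removal only to the matched block text.
    return span.sub(lambda m: re.sub(r",(?=\s*\);)", "", m.group(0)), source, count=1)

source = (
    "block1 block1(\n.port1(port1),\n.port2(port2),\n);\n"
    "block2 block2(\n.port2(port2),\n.port3(port3),\n);\n"
)
result = drop_trailing_comma(source, "block1")
print(result)
```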
I guess you want to remove commas located after a closing paren ()) followed by a newline and a closing paren and a semicolon ();)?
In this case this might work for you:
sed -r ':a;N;s/\),\n\s*\);/)\n);/;P;D;ba'
        |  | | |---------| |---|  | | |
        |  | | |           |      | | -- branch to label "a"
        |  | | |           |      | -- delete up to first newline of pattern space
        |  | | |           |      -- print up to first newline of pattern space
        |  | | |           -- replace pattern
        |  | | -- search pattern
        |  | -- substitute
        |  -- read next line into pattern space (append)
        -- branch label "a"
I'm cleaning up a LaTeX file, and I'm in a situation where I need to distinguish absolute value |x| from the set "such that" symbol i.e. {x | x < 0}.
The first step for me is to find all lines containing an odd number of | characters (i.e. the pipe symbol).
In principle, I know how to do this, but I've tried the following regex command with no luck.
egrep '^[^\|]*\|([^\|]*\|[^\|]*\|)*[^\|]*$'
The idea is that a matching line contains, in order:
The line start
0 or more non-pipe characters
Exactly one pipe character
0 or more copies of text containing exactly 2 pipes
The line end
However, for some reason this isn't working.
I run the command on the following file:
\[
S = \{ x | x < 0}
y = |x|
\]
and none of the lines match.
I suspect I'm making a silly mistake somewhere, possibly to do with escaping the pipe characters,
but I'm stumped as to what's wrong.
Can anybody tell me either how to fix this, or provide an alternate expression which matches lines containing an odd number of pipe characters?
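Inside the [], | is not a special character, so it should not be escaped with \. In a POSIX bracket expression the backslash is itself a literal character, so [^\|] also excludes backslashes; that is why the line containing \{ did not match even though it has exactly one pipe. Try: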
egrep '^[^|]*\|([^|]*\|[^|]*\|)*[^|]*$'
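A quick way to sanity-check the corrected pattern is to compare it against a plain count of pipe characters. The Python sketch below does that on the sample file from the question. Note that Python's re module, unlike POSIX egrep, does treat \ as an escape inside brackets, so it would not reproduce the original failure; only the corrected pattern is tested here, and the names are made up for the example.

```python
import re

ODD_PIPES = re.compile(r"^[^|]*\|([^|]*\|[^|]*\|)*[^|]*$")

# The four lines of the sample file.
lines = [r"\[", r"S = \{ x | x < 0}", "y = |x|", r"\]"]

by_regex = [ln for ln in lines if ODD_PIPES.search(ln)]
by_count = [ln for ln in lines if ln.count("|") % 2 == 1]

print(by_regex)
print(by_count)
```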
Better to use awk for this purpose:
awk -F '|' '!(NF%2)'
TESTING:
echo "a|bc|d|erg" | awk -F '|' '!(NF%2)'
OUTPUT:
a|bc|d|erg
echo "abc|d|ergxy" | awk -F '|' '!(NF%2)'
OUTPUT: (none, because the line contains an even number of pipes)
how about:
awk -F'|' 'NF&&(NF-1)%2' file
example:
kent$ cat file
|foo|bar
| | | | |
||||||
|||||||
kent$ awk -F'|' 'NF&&(NF-1)%2' file
| | | | |
|||||||
Perl, which is cross platform (Windows too) and generally installed everywhere these days, is my axe of choice:
perl -ne 'print if (s/\|/\|/g) %2 == 1' file
script.sed
#!/bin/sed -nf
# Save to hold
h
# Delete all non | chars
s#[^|]##g
# Odd match
/^\(||\)*|$/ {
# Fetch hold
g
s#^#odd\t:#
}
# Even match
/^\(||\)\+$/ {
# Fetch hold
g
s#^#even\t:#
}
# No match
/^$/ {
# Fetch hold
g
s#^#none\t:#
}
# Print
p
data.txt
do|odd
do|odd|match|me
|even match|me
do|even match|me
do|even match|also|me|please
no-match
shell
sed -nf script.sed data.txt
stdout
odd :do|odd
odd :do|odd|match|me
even :|even match|me
even :do|even match|me
even :do|even match|also|me|please
none :
none :no-match
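For readers who find the hold-space juggling hard to follow, the classification the script performs can be sketched in a few lines of Python. This is an illustrative restatement using the visible lines of data.txt, not a replacement for the sed script; classify is a made-up name.

```python
def classify(line: str) -> str:
    """Label a line by the parity of its pipe count, like the sed script does."""
    pipes = line.count("|")
    if pipes == 0:
        return "none"
    return "odd" if pipes % 2 else "even"

data = [
    "do|odd",
    "do|odd|match|me",
    "|even match|me",
    "do|even match|me",
    "do|even match|also|me|please",
    "no-match",
]
labels = [classify(line) for line in data]
for label, line in zip(labels, data):
    print(f"{label}\t:{line}")
```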
I have a string with the following data in it
"email#domain.com | firstname | lastname"
I want to replace | lastname with a different value. If I use s/ to do a substitution, what do I need to do to the | to get it recognized as a literal character? If I do
$foo =~ s/ lastname/fillertext/;
it works fine, but if I do
$foo =~ s/ | lastname/fillertext/;
it doesn't work. I tried to do - \|/ lastname, "| lastname", '| lastname'.
| has a special meaning in a regular expression; if you want to match a literal |, you just need to escape it:
$foo =~ s/ \| lastname/fillertext/;
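What the unescaped version actually does can be seen with an equivalent substitution in Python, where | has the same alternation meaning. This is a sketch for illustration; count=1 is used to mimic a substitution without the g flag, and the variable names are made up.

```python
import re

foo = "email#domain.com | firstname | lastname"

# Unescaped: the pattern means "a space" OR " lastname", so the first space
# in the string is what gets replaced.
unescaped = re.sub(r" | lastname", "fillertext", foo, count=1)

# Escaped: the literal text " | lastname" is replaced.
escaped = re.sub(r" \| lastname", "fillertext", foo, count=1)

print(unescaped)
print(escaped)
```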