I need the following:
input:
NAME-LIST:
name1
<any text>
name_to_be_changed;
NAME-LIST:
name3
<any text>
name_to_be_changed;
output: replace "name_to_be_changed" by first name in the block
NAME-LIST:
name1
<any text>
name1;
NAME-LIST:
name3
<any text>
name3;
result:
I would prefer a perl one-liner :-)
I suggest a search expression similar to what Sam already posted:
(NAME-LIST:[\t ]*[\r\n]+)([^\r\n]+)([\r\n]+[^\r\n]*[\r\n]+)name_to_be_changed;
The replace string is \1\2\3\2; or $1$2$3$2;
Each pair of opening and closing round brackets specifies a marking (capturing) group. There are three such groups in the search expression.
[\t ]* allows trailing spaces or tabs after the fixed string NAME-LIST: at the end of the first line of a block.
[\r\n]+ matches 1 or more carriage returns or linefeeds. That is similar to \v as used by Sam but does not match other vertical whitespace characters such as formfeed.
[^\r\n]+ matches 1 or more characters which are neither a carriage return nor a linefeed. That is like . if the matching behavior for a dot is defined as matching all characters except line terminators.
[^\r\n]* matches 0 or more characters which are neither a carriage return nor a linefeed. So <any text> can also be no text at all, which means the third line can be a blank line.
The three strings found by the marking groups are backreferenced by \1, \2 and \3 (or $1, $2 and $3, depending on the regex flavor). Only the second one is referenced twice, which copies the name from line 2 into line 4 and leaves the other three lines unchanged.
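For readers who want to try the expression outside a text editor, here is a minimal Python sketch of the same search and replace. The function name fill_names and the sample text are made up for illustration; only the pattern itself comes from the answer above.

```python
import re

# Same pattern as in the answer; "name_to_be_changed" is the placeholder
# used in the question.
PATTERN = re.compile(
    r"(NAME-LIST:[\t ]*[\r\n]+)([^\r\n]+)([\r\n]+[^\r\n]*[\r\n]+)name_to_be_changed;"
)


def fill_names(text: str) -> str:
    """Copy the first name of each block over the placeholder (hypothetical helper)."""
    # \g<2> is the first name of the block; it is emitted twice.
    return PATTERN.sub(r"\g<1>\g<2>\g<3>\g<2>;", text)


sample = (
    "NAME-LIST:\nname1\n<any text>\nname_to_be_changed;\n"
    "NAME-LIST:\nname3\n<any text>\nname_to_be_changed;\n"
)
result = fill_names(sample)
print(result)
```

Python's re.sub replaces every match by default, so no separate "global" flag is needed here.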
Using a perl one-liner
perl -00 -pe 's/NAME-LIST:\n(.*)\n.*\n\K.*;/$1;/g' file.txt
Explanation:
Switches:
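-00: Paragraph mode (records are separated by blank lines; a file without blank lines is read as one record)
-p: Creates a while(<>){...; print} loop that runs once for each record in your input file.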
-e: Tells perl to execute the code on command line.
First of all, thanks for your input...
Unfortunately I could not make use of either of your suggested solutions, but I found one of my own:
perl -00 -pe 's/(NAME-LIST:\s+)(\w+)(.*?)\w+;/$1$2$3$2;/gs'
\s+ = 1 or more whitespace characters (space, newline, tab, ...)
\w+ = 1 or more word characters (letters, digits or underscore)
The /gs modifiers are important:
g = global (replace every match; otherwise only the first name will be replaced)
s = treat the string as a single line, so . also matches newlines
I am working on a Powershell script to parse SWIFT messages (text based) into a database. I am using REGEX to find the appropriate strings in the file and extract them. I now run into the issue that one of the data fields can have CR/LF characters in the string - in the example below I would need to extract the second line as well.
:61:2111261126D12000,00NTRF11000004217657P//03MT211124101166
JANE DOE 1232
I tested the regex pattern (:61:.*[\r\n].*) in RegExr, and there it requires the [\r\n] characters for a match. My plan was to use two expressions, one with and one without the CR/LF characters, to identify both kinds of message. However, the code below returns all matches regardless of whether a line break is included, so it seems that PowerShell stops evaluating the string after the CR/LF.
$transaction = $swift | select-string ':61:.*[\r\n].*' -AllMatches | % { $_.Matches } | % { $_.Value }
Can I use REGEX for this task or do I have to create a function to read the entire string and check for the next line tag to determine the end of this string?
Describe the first line more accurately, then whatever is left is necessarily the message:
$swift = @'
:61:2111261126D12000,00NTRF11000004217657P//03MT211124101166
JANE DOE 1232
'@
$swift |Select-String -Pattern '(?m):\d+:[^,]+,[^/]+//\d+MT\d+[\s\r\n]+.*$'
The regex pattern breaks down as follows:
(?m) # Multi-line mode, this will make `$` match end-of-line positions as well as end-of-string
:\d+: # 1 or more digits, surrounded by colons, matches `:61:`
[^,]+, # 1 or more non-commas followed by a comma, matches `2111261126D12000,`
[^/]+// # 1 or more non-slashes, followed by 2, matches `00NTRF11000004217657P//`
\d+MT\d+ # 1 or more digits followed by `MT` and more digits, matches `03MT211124101166`
[\s\r\n]+ # 1 or more white-space/CR/LF characters
.*$ # everything until the end of the current line, matches `JANE DOE 1232`
Since we're using [\s\r\n]+ to describe the potential line break, it'll still work when the linebreak is replaced with other whitespace characters.
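The same pattern can be checked quickly outside PowerShell. The following is a small Python sketch, not the PowerShell solution itself; the names FIELD_61 and value are invented for the example, and the sample text is the one from the question.

```python
import re

# Pattern taken from the answer above; (?m) lets $ match at each line end.
FIELD_61 = re.compile(r"(?m):\d+:[^,]+,[^/]+//\d+MT\d+[\s\r\n]+.*$")

swift = (
    ":61:2111261126D12000,00NTRF11000004217657P//03MT211124101166\n"
    "JANE DOE 1232\n"
)

# search() works on the whole string, so the line break is part of the match.
match = FIELD_61.search(swift)
value = match.group(0) if match else None
print(value)
```

One thing to keep in mind: as written, the pattern always consumes the line after the tag line, so a :61: record without a continuation line would pull in whatever follows it. If both forms occur in the same file, the second line probably needs a stricter description.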
Thank you in advance and sorry for the bad english!
I want
'Odd rows' 'CRLF' 'Even rows' 'CRLF' → 'Odd rows' ',' 'Even rows' 'CRLF'
Example Input:
0
SECTION
2
HEADER
Desired Output:
0,SECTION
2,HEADER
What I have tried:
Find: (.*)\n(.*)\n
Replace: $1,$2\n
Goal: a DXF file that is easy to read.
When . is allowed to match a newline like any other character, the first .* gobbles up the whole string and leaves nothing for the rest of the pattern.
Instead, use a character class that excludes \n. Also, it's not clear whether your final line ends with a \n or not, so the regex should handle both cases:
Find
([^\n]*)\n([^\n]*)(\n|$)
Replace
$1,$2$3
Breakdown:
([^\n]*) - 0 or more characters that are not \n
\n
([^\n]*)
(\n|$) - \n or end of string
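As a quick check of this find/replace pair, here is a minimal Python sketch. The helper name join_pairs is hypothetical, and the sketch assumes plain \n line endings (with CRLF input, the \r would stay in front of the comma).

```python
import re


def join_pairs(text: str) -> str:
    """Join every odd line with the even line after it (hypothetical helper)."""
    # Group 3 keeps the terminator of the second line, or is empty at end of string.
    return re.sub(r"([^\n]*)\n([^\n]*)(\n|$)", r"\1,\2\3", text)


joined = join_pairs("0\nSECTION\n2\nHEADER\n")
print(joined)
```

The same call also handles input whose last line has no trailing newline, which is the case the (\n|$) alternative is there for.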
For your example data you could capture one or more digits in capturing group 1, followed by matching a newline.
In the replacement use group 1 followed by a comma.
Match
(\d+)(?:\r?\n|\r)
Regex demo
Replace
$1,
You should also match line breaks and spaces, because the string may contain several spaces and newlines between the values.
Try this regex:
"0\nSECTION\n 2\nHEADER".replace(/([\d]+)([\s\n]+)([^\d\s\n]*)/g,"$1,$3")
var myStr = ` 0
SECTION
2
HEADER`;
var output = myStr.replace(/([\d]+)([\s\n]+)([^\d\s\n]*)/g,"$1,$3");
console.log(output);
DXF file ok
ODD line abc...
(AWK)
NR%2!=0{L1=$0}
NR%2==0{print L1 "," $0;L1=""}
I've got a text file with the following lines:
201174480 11-01-1911 J Student 25-07 11585 2 0 SPOED BEZORGEN 1ST 25,00
320819019 11-01-1911 T. Student 28-07 13561 1 15786986 DESLORATADINE TABL OMH 5MG 60ST 3,60
706059901 11-01-1911 ST Student-Student 30-06 14956 1 15356221 METOPROLOLSUCC RET T 100MG 180ST 12,90-
I want to change these lines with sed into:
201174480 11-01-1911 J Student 25-07 11585 2 0 SPOED BEZORGEN 1ST 25,00
320819019 11-01-1911 T. Student 28-07 13561 1 15786986 DESLORATADINE TABL OMH 5MG 60ST 3,60
706059901 11-01-1911 ST Student-Student 30-06 14956 1 15356221 METOPROLOLSUCC RET T 100MG 180ST -12,90
So I want to move the minus sign so that I get -12,90 instead of 12,90- with sed. I tried:
try 1:
sed 's/\([0-9.]\+\)-/-\1/g' file.txt > file1.txt
try 2:
sed 's/\([0-9].\+\)-$/-\1/g' file.txt > file1.txt
So there must be something wrong with the regex, but I do not really understand it. Please help.
You may use
sed 's/\([0-9][0-9,.]\+\)-\($\|[^0-9]\)/-\1\2/g'
See the online demo
The point is that after matching a number and a - (see \([0-9][0-9,.]\+\)-), there should come either end of string or non-digit (\($\|[^0-9]\)). Thus, we have 2 capturing groups now, and that is why we need a second backreference in the replacement pattern (\2).
I added a dot . to the bracket expression just in case you have mixed number formats, you may remove it if you always have a comma as the decimal separator.
Pattern details:
\([0-9][0-9,.]\+\) - Group 1 capturing
[0-9] - a digit
[0-9,.]\+ - one or more digits, commas or dots
- - a literal hyphen
\($\|[^0-9]\) - Group 2 capturing the end of string $ or a non-digit ([^0-9])
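To see the two groups at work without sed, here is a rough Python port of the pattern. The names TRAILING_MINUS and move_minus are made up for the example; the sample line is the third line from the question.

```python
import re

# Python port of the sed pattern: a number, a trailing "-", then either
# the end of the line or a non-digit.
TRAILING_MINUS = re.compile(r"([0-9][0-9,.]+)-($|[^0-9])", re.MULTILINE)


def move_minus(text: str) -> str:
    """Move a trailing minus sign in front of its number (hypothetical helper)."""
    return TRAILING_MINUS.sub(r"-\1\2", text)


line = (
    "706059901 11-01-1911 ST Student-Student 30-06 14956 1 15356221 "
    "METOPROLOLSUCC RET T 100MG 180ST 12,90-"
)
fixed = move_minus(line)
print(fixed)
```

Dates such as 11-01-1911 and 30-06 are left alone because the hyphen there is followed by a digit, which group 2 rejects.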
In your example, both files are identical, but I think I know what you mean.
For this particular file, you want to match a space, followed by zero or more digits, a comma, at least one digit and a dash, followed by zero or more spaces up to the end of the line.
Then you want to replace the space in front of the matched number with a dash and drop the trailing dash. This will do the trick:
sed -e 's/ \([0-9]*,[0-9][0-9]*\)- *$/-\1/' <file.txt >file1.txt
Your first regular expression attempts to match against a string of numbers and .s, but the text contains a comma, not a .. It does the substitution you want if you replace [0-9.] with [0-9,], giving:
sed 's/\([0-9,]\+\)-/-\1/g' file.txt > file1.txt
However, it also replaces 25-07 in that case with -2507. I suggest you explicitly match against the end of the line:
sed 's/\([0-9,]\+\)-$/-\1/g'
or alternatively, you can demand that the match contains exactly one comma:
sed 's/\([0-9]\+,[0-9]\+\)-$/-\1/g'
I also find these things easier to read if you use the -r option to sed, which enables "extended regular expressions":
sed -r 's/([0-9]+,[0-9]+)-$/-\1/g'
Fewer special characters need to be escaped (on the other hand, more literal characters need to be escaped, but I find that tends to be a rarer occurrence).
(Aside: note that . usually means "any character", but inside a character class [.] it means "literally a .", since after all having it mean "any character" in there would be pretty useless.)
I am trying to read a regex format in Perl. Sometimes instead of a single line I also see the format in 3 lines.
For the below single line format I can regex as
/^\s*(.*)\s+([a-zA-Z0-9._]+)\s+(\d+)\s+(.*)/
to get the first 3 individual items in line
Hi There FirstName.LastName 10 3/23/2011 2:46 PM
Below is the multi-line format I see. I am trying to use something like
/^\s*(.*)\n*\n*|\s+([a-zA-Z0-9._]+)\s+(\d+)\s+(.*)$/m
to get the individual items, but it doesn't seem to work.
Hi There
FirstName-LastName 8 7/17/2015 1:15 PM
Testing - 12323232323 Hello There
Any suggestions? Is multi-line regex possible?
NOTE: In the same output I can see the single-line format, the multi-line format, or both, so the output can look like this:
Hello Line1 FirstName.LastName 10 3/23/2011 2:46 PM
Hello Line2
Line2FirstName-LastName 8 7/17/2015 1:15 PM
Testing - 12323232323 Hello There
Hello Line3 Line3FirstName.LastName 8 3/21/2011 2:46 PM
You can certainly apply a regex over multiple lines.
I've used the negated word class \W+ between words to match the spaces and newlines between them (\W is equal to [^a-zA-Z0-9_]).
The chat is treated as a repeated \w+\W+ block.
If you provide a more specific input/output case I can refine the example code:
#!/usr/bin/env perl
my $input = <<'__END__';
Hi There
FirstName-LastName 8 7/17/2015 1:15 PM
Testing - 12323232323 Hello There
__END__
my ($chat,$username,$chars,$timestamp) = $input =~ m/(?im)^\s*((?:\w+\W+)+)(\w+[-,\.]\w+)\W+(\d+)\W+([0-1]?\d\/[0-3]?\d\/[1-2]\d{3}\s+[0-2]?\d:[0-5]?\d\s?[ap]m)/;
$chat =~ s/\s+$//; #remove trailing spaces
print "chat -> ${chat}\n";
print "username -> ${username}\n";
print "chars -> ${chars}\n";
print "timestamp -> ${timestamp}\n";
Legend
m/^.../ match operator (not a substitution), anchored at the start of a line
(?im): case-insensitive and multi-line matching (^ and $ also match at the start and end of each line)
\s* matches zero or more whitespace characters (spaces, tabs, line breaks or form feeds)
((?:\w+\W+)+) (match group $chat) matches one or more repetitions of a word \w+ (letters, digits, '_') followed by non-word characters \W+ (everything that is not \w, including the newline \n). Trailing whitespace is stripped from this group afterwards
(\w+[-,\.]\w+): (match group $username) this is the weak point. If the username is not made of two regex words separated by a dash '-', a comma ',' or (UPDATE) a dot '.', the entire regex cannot work properly (I took both variants from your question; the format is not specified directly).
(\d+): (match group $chars) a number made of one or more digits
([0-1]?\d\/[0-3]?\d\/[1-2]\d{3}\s+[0-2]?\d:[0-5]?\d\s?[ap]m): (match group $timestamp) this one is longer than the others, so let's split it up:
[0-1]?\d\/[0-3]?\d\/[1-2]\d{3} matches a date made of a month (with an optional leading zero), a day (with an optional leading zero) and a year from 1000 to 2999 (a relaxed constraint :)
[0-2]?\d:[0-5]?\d\s?[ap]m matches the time: hour:minutes, an optional space and 'pm, PM, am, AM, Am, Pm...' thanks to the case-insensitive modifier above
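The pattern is not specific to Perl. As a rough illustration, here is a Python port of the same expression applied to the multi-line sample from the question; the name CHAT_LINE is invented, and the slashes are left unescaped because Python has no regex delimiters.

```python
import re

# Port of the Perl pattern above; (?im) = case-insensitive + multi-line.
CHAT_LINE = re.compile(
    r"(?im)^\s*((?:\w+\W+)+)(\w+[-,.]\w+)\W+(\d+)\W+"
    r"([0-1]?\d/[0-3]?\d/[1-2]\d{3}\s+[0-2]?\d:[0-5]?\d\s?[ap]m)"
)

text = (
    "Hi There\n"
    "FirstName-LastName 8 7/17/2015 1:15 PM\n"
    "Testing - 12323232323 Hello There\n"
)

m = CHAT_LINE.search(text)
chat, username, chars, timestamp = m.groups()
chat = chat.rstrip()  # remove the trailing newline captured by \W+

print(f"chat -> {chat}")
print(f"username -> {username}")
print(f"chars -> {chars}")
print(f"timestamp -> {timestamp}")
```

The greedy ((?:\w+\W+)+) group first swallows far too much and then backtracks until a username and timestamp can follow, so this approach is only reasonable for short records.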
You can test it online here
Your regex says:
^\s*(.*)\n*\n* # line starts with optional space followed by anything
| # or
\s+([a-zA-Z0-9._]+)\s+(\d+)\s+(.*)$ # spaces followed by any words followed by spaces, digits, spaces, anything at the end of the line
Consider this:
/^From|To$/
Alternation has the lowest precedence of all regex operators, so | splits the entire pattern, anchors included.
The above really says: find a line that starts with 'From', or a line that ends with 'To'.
Compare to this:
/^(From|To)$/
Above will find lines that only have 'From' or 'To'
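The difference is easy to see on a few test strings. This is a small Python sketch with made-up sample values; the point is that | has the lowest precedence, so the ungrouped pattern means "^From" or "To$".

```python
import re

loose = re.compile(r"^From|To$")     # "^From" OR "To$"
strict = re.compile(r"^(From|To)$")  # the whole string is "From" or "To"

samples = ["From", "To", "From here", "Go To", "Tomorrow"]

loose_hits = [s for s in samples if loose.search(s)]
strict_hits = [s for s in samples if strict.search(s)]

print(loose_hits)
print(strict_hits)
```

The regex in the question has the same shape: its ^ belongs only to the left alternative and its $ only to the right one.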
I need help to build a regex that can remove EVEN lines in a plain textfile.
Given this input:
line1
line2
line3
line4
line5
line6
It would output this:
line1
line3
line5
Thanks !
Actually, you don't need regex for that. With your favourite language, iterate over the file, keep a counter and take it modulo 2, e.g. with awk (*nix):
$ awk 'NR%2==1' file
line1
line3
line5
even lines:
$ awk 'NR%2==0' file
line2
line4
line6
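The same counter idea carries over to other languages. A minimal Python sketch, with odd_lines as a made-up helper name, could use slicing instead of an explicit modulus:

```python
def odd_lines(text: str) -> str:
    """Keep lines 1, 3, 5, ... and drop the even ones (hypothetical helper)."""
    # keepends=True preserves each line's own terminator (e.g. \n or \r\n).
    lines = text.splitlines(keepends=True)
    # Index 0 is line 1, so a step of 2 keeps the odd-numbered lines.
    return "".join(lines[::2])


kept = odd_lines("line1\nline2\nline3\nline4\nline5\nline6\n")
print(kept)
```

Using lines[1::2] instead would keep the even lines, mirroring the NR%2==0 variant.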
Well, if you do a search-and-replace-all-matches on
^(.*)\r?\n.*
in "^ matches start-of-line mode" and ". doesn't match linebreaks mode"; replacing with
\1
then you lose every even line.
E. g. in C#:
resultString = Regex.Replace(subjectString, @"^(.*)\r?\n.*", "$1", RegexOptions.Multiline);
or in Python:
result = re.sub(r"(?m)^(.*)\r?\n.*", r"\1", subject)
First, I fully agree with the consensus that this is not something regex should be doing.
Here's a Java demo:
public class Test {
public static String voodoo(String lines) {
return lines.replaceAll("\\G(.*\r?\n).*(?:\r?\n|$)", "$1");
}
public static void main(String[] args) {
System.out.println("a)\n"+voodoo("1\n2\n3\n4\n5\n6"));
System.out.println("b)\n"+voodoo("1\r\n2\n3\r\n4\n5\n6\n7"));
System.out.println("c)\n"+voodoo("1"));
}
}
output:
a)
1
3
5
b)
1
3
5
7
c)
1
A short explanation of the regex:
\G # match the end of the previous match
( # start capture group 1
.* # match any character except line breaks and repeat it zero or more times
\r? # match the character '\r' and match it once or none at all
\n # match the character '\n'
) # end capture group 1
.* # match any character except line breaks and repeat it zero or more times
(?: # start non-capture group 1
\r? # match the character '\r' and match it once or none at all
\n # match the character '\n'
| # OR
$ # match the end of the input
) # end non-capture group 1
\G begins at the start of the string. Every pair of lines (where the second line is optional, in case of the last uneven line) gets replaced by the first line in the pair.
But again: using a normal programming language (if one can call awk "normal" :)) is the way to go.
EDIT
And as Tim suggested, this also works:
replaceAll("(?m)^(.*)\r?\n.*", "$1")
I use capture groups (.*) --> $1 in Sublime Text's regex find-and-replace mode to
remove the line break in every other line and place a tab character between the values, using
replace (.*)\n(.*)\n
with $1\t$2\n
For this specific question the OP could change this to
replace (.*)\n(.*)\n
with $1\n
Well, this will remove the even lines from this particular text file, but only because each sample line ends in its own line number:
grep '[13579]$' textfile > textfilewithoddlines
And output this:
line1
line3
line5
Perhaps you are on the command line. In PowerShell:
$x = 0; gc .\foo.txt | ? { $x++; $x % 2 -eq 1 }