Regex to find every second new line (match only new line characters) - regex

Input:
LINE1
LINE2
LINE3
LINE4
LINE5
LINE6
Output:
LINE1LINE2
LINE3LINE4
LINE5LINE6
I have tried \n[^\n]*\n but it matches text as well for replacement and does not give desired output.
I am having issues in matching every second new line character only.
Thanks in advance!

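You could use the regular expression
^(.*)\n(.*\n)
with the multiline flag set (so that ^ matches at the start of every line) and replace each match with $1$2, that is, the two captured groups joined without the newline between them.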
Alternatively, you could simply match each pair of lines and remove the first newline character. That requires a bit of code, of course. As you have not indicated which language you are using I will illustrate that with some Ruby code, which readers should find easy to translate to any high-level language. Suppose str is a variable holding your multi-line string. Then:
r = /^(?:.*\n){2}/
s = str.gsub(r) { |s| s.sub(/\n/, '') }
puts s
LINE1LINE2
LINE3LINE4
LINE5LINE6
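For readers who are not using Ruby, the capture-group replacement from this answer can be sketched in Python. This is only a minimal sketch: it assumes \n line endings and a trailing newline after the last line (the pattern needs one after the second line of each pair), and the function name join_line_pairs is made up for illustration.

```python
import re

def join_line_pairs(text: str) -> str:
    """Remove the newline between the first and second line of each pair."""
    # re.MULTILINE lets ^ match at the start of every line.
    # Group 1 is the first line; group 2 is the second line plus its newline.
    return re.sub(r"^(.*)\n(.*\n)", r"\1\2", text, flags=re.MULTILINE)

sample = "LINE1\nLINE2\nLINE3\nLINE4\nLINE5\nLINE6\n"
print(join_line_pairs(sample), end="")
```

A leftover odd line at the end is left untouched, because there is no second line to pair it with.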

For an even number of lines, you could use a positive lookahead to assert that what is on the right is zero or more repetitions of two lines that each end with a newline, followed by the last line and the end of the string.
In the replacement use an empty string.
\n(?=(?:.+\n.+\n)*.+$)
Explanation
\n Match a newline
(?= Positive lookahead, assert what is on the right is
(?:.+\n.+\n)* Match 0+ times two lines, each followed by a newline
.+$ Match any char except a newline 1+ times and assert end of string
) Close lookahead
Output
LINE1LINE2
LINE3LINE4
LINE5LINE6
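The lookahead version can be tried out in Python as well. A minimal sketch, assuming \n line endings, non-empty lines and no trailing newline; the names EVERY_OTHER_NEWLINE and drop_odd_newlines are illustrative only.

```python
import re

# A newline matches only if, after it, an odd number of non-empty lines
# remains up to the end of the string (zero or more pairs plus a last line).
EVERY_OTHER_NEWLINE = re.compile(r"\n(?=(?:.+\n.+\n)*.+$)")

def drop_odd_newlines(text: str) -> str:
    """Delete the 1st, 3rd, 5th ... newline, keeping the 2nd, 4th ..."""
    return EVERY_OTHER_NEWLINE.sub("", text)

print(drop_odd_newlines("LINE1\nLINE2\nLINE3\nLINE4\nLINE5\nLINE6"))
```

Unlike the capture-group approach, the match here contains only the newline itself, which is what the question asked for.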

Related

Match certain string on second line of text with regex

I'm new to regex, and would appreciate some guidance/help.
Currently, I'm looking to write an expression, that derives a certain part of text from the 2nd line of the provided text.
Here is the text:
123 anywhere Avenue
Winnipeg, Manitoba R3E 0L7
Canada
Pharmacy Manager: person person
Pharmacy Licence Holder/Owner: 123456 Manitoba Ltd.
My goal is to derive the 'Manitoba' string from the second line, however I'd like to make it dynamic rather than writing an expression to always fetch Manitoba as a static. I used the below code to target the second line:
(.*)(?=(\n.*){3}$)
(It matches 3 lines up from the last line, thus targeting the desired line)
I noticed, that within the dataset, that the Province (Manitoba) is always in between two spaces.
Is there any addition I can make to the code, so that the expression only targets the second line, then matches the first string in-between spaces?
Perhaps using a lazy expression with a positive lookaround?
If I target all matches in between spaces, it would take both 'Manitoba' and 'R3E 0L7' which I dont want.
I want it to only match the first piece of text in between spaces on the second line.
Any help is much appreciated :-)
Thanks.
One option could be to match the first line, then capture the second word of the second line in capturing group 1.
Then match the rest of the second line and assert that exactly three more lines follow before the end of the string.
^.*\r?\n\S+[^\S\r\n]+(\S+).*(?=(?:\r?\n.*){3}$)
In parts:
^ Start of string
.*\r?\n Match the whole first line and a newline
\S+ Match 1+ non whitespace char (the first "word")
[^\S\r\n]+ Match 1+ times a whitespace char except newlines
(\S+) Capture group 1, match 1+ times a non whitespace char (the second "word")
.* Match the rest of the line
(?= Positive lookahead, assert what follows on the right is
(?:\r?\n.*){3}$ Match 3 times a newline followed by 0+ times any except a newline and assert the end of the string
) Close lookahead
You could also turn the lookahead into a match instead
^.*\r?\n\S+[^\S\r\n]+(\S+).*(?:\r?\n.*){3}$
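To see the capture group in action, here is a minimal Python sketch using the second pattern (the one without the lookahead) on the sample address from the question. The names ADDRESS, PROVINCE and second_word_of_second_line are made up for this example.

```python
import re

ADDRESS = (
    "123 anywhere Avenue\n"
    "Winnipeg, Manitoba R3E 0L7\n"
    "Canada\n"
    "Pharmacy Manager: person person\n"
    "Pharmacy Licence Holder/Owner: 123456 Manitoba Ltd."
)

# Without re.MULTILINE, ^ and $ anchor to the start and end of the string,
# so the pattern only fits a text of exactly five lines.
PROVINCE = re.compile(r"^.*\r?\n\S+[^\S\r\n]+(\S+).*(?:\r?\n.*){3}$")

def second_word_of_second_line(text):
    """Return capture group 1, or None if the text does not fit the layout."""
    match = PROVINCE.search(text)
    return match.group(1) if match else None

print(second_word_of_second_line(ADDRESS))  # Manitoba
```

Note that this relies on the province being a single word; a two-word province would only yield its first word.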

How to use a regex to match if any pattern appears once out of many times in a given sequence

Hard to word this correctly, but TL;DR.
I want to match, in a given text sentence (let's say "THE TREE IS GREEN") if any space is doubled (or more).
Example:
"In this text,
THE TREE IS GREEN should not match,
THE  TREE IS GREEN should
and so should THE TREE IS   GREEN
but double-spaced  TEXT  SHOULD  NOT BE  FLAGGED outside the pattern."
My initial approach would be
/THE( {2,})TREE( {2,})IS( {2,})GREEN/
but this only matches if all spaces are double in the sequence, therefore I'd like to make any of the groups trigger a full match. Am I going the wrong way, or is there a way to make this work?
You can use a negative lookahead, if your regex flavor supports it.
First rule out the sentence that should fail (in your case the single-spaced "THE TREE IS GREEN") with the lookahead, then match the most generic form, which allows one or more spaces between the words.
(?!THE TREE IS GREEN)(THE[ ]+TREE[ ]+IS[ ]+GREEN)
https://regex101.com/r/EYDU6g/2
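A quick way to check this pattern is a small Python sketch like the one below. The sample text is an assumption: it is a variant of the question's example with doubled spaces placed for illustration, and flagged_phrases is a made-up helper name.

```python
import re

# The negative lookahead rejects the exact single-spaced phrase; the rest of
# the pattern accepts the phrase with one or more spaces between the words.
PATTERN = re.compile(r"(?!THE TREE IS GREEN)THE +TREE +IS +GREEN")

def flagged_phrases(text):
    """Return every occurrence of the phrase that has at least one doubled space."""
    return PATTERN.findall(text)

sample = (
    "THE TREE IS GREEN should not match,\n"
    "THE  TREE IS GREEN should\n"
    "and so should THE TREE IS   GREEN\n"
    "but double-spaced  TEXT  SHOULD  NOT BE  FLAGGED"
)
print(flagged_phrases(sample))
```

Doubled spaces outside the phrase are ignored because the pattern only ever matches the four fixed words.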
You can just search for the spaces that you're looking for:
/ {2,}/ will work to match two or more of the space character. (https://regexr.com/4h4d4)
You can capture the results by surrounding it with parentheses - /( {2,})/
You may want to broaden it a bit.
/\s{2,}/ will match any doubling of whitespace.
(\s - means any whitespace - space, tab, newline, etc.)
No need to match the whole string, just the piece that's of interest.
If I am not mistaken you want the whole match if there is a part present where there are 2 or more spaces between 2 uppercased parts.
If that is the case, you might use:
^.*[A-Z]+ {2,}[A-Z]+.*$
^ Start of string
.*[A-Z]+ Match any char except a newline 0+ times, then match 1+ times [A-Z]
[ ]{2,} Match 2 or more times a space (used square brackets for clarity)
[A-Z]+ Match 1+ times an uppercase char
.*$ Match any char except a newline 0+ times until the end of the string
You could do this:
import re

pattern = r"THE +TREE +IS +GREEN"
test_str = ("In this text,\n"
            "THE TREE IS GREEN should not match,\n"
            "THE  TREE IS GREEN should\n"
            "and so should THE TREE IS   GREEN\n"
            "but double-spaced  TEXT  SHOULD  NOT BE  FLAGGED outside the pattern.")

# Find every spacing variant of the phrase, then skip the single-spaced one.
for match in re.finditer(pattern, test_str):
    if match.group() != 'THE TREE IS GREEN':
        print(match.group())

A regular expression for matching a group followed by a specific character

So I need to match the following:
1.2.
3.4.5.
5.6.7.10
((\d+)\.(\d+)\.((\d+)\.)*) will do fine for the very first line, but the problem is: there could be many lines: could be one or more than one.
\n will only appear if there are more than one lines.
In string version, I get it like this: "1.2.\n3.4.5.\n1.2."
So my issue is: if there is only one line, there need not be a \n at the end, but if there is more than one line, every line except the very last must end with \n.
Here is the pattern I suggest:
^\d+(?:\.\d+)*\.?(?:\n\d+(?:\.\d+)*\.?)*$
Here is a brief explanation of the pattern:
^ from the start of the string
\d+ match a number
(?:\.\d+)* followed by a dot and another number, zero or more times
\.? followed by an optional trailing dot
(?:\n followed by a newline
\d+(?:\.\d+)*\.?)* and another dotted-number sequence, zero or more times
$ end of the string
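As a rough check of this pattern, here is a minimal Python sketch. One deliberate difference, which is my own choice and not part of the answer: the ^ and $ anchors are dropped and re.fullmatch is used instead, because Python's $ would also accept a single trailing newline. The names DOTTED_LINES and is_valid are illustrative.

```python
import re

# One dotted-number line, optionally followed by more such lines,
# each introduced by a newline.
DOTTED_LINES = re.compile(r"\d+(?:\.\d+)*\.?(?:\n\d+(?:\.\d+)*\.?)*")

def is_valid(text: str) -> bool:
    """True if the whole string consists of dotted-number lines only."""
    return DOTTED_LINES.fullmatch(text) is not None

print(is_valid("1.2."))                 # single line, no newline needed
print(is_valid("1.2.\n3.4.5.\n1.2."))   # several lines separated by \n
print(is_valid("1.2.\n"))               # trailing newline is rejected
```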
You might check if there is a newline at the end using a positive lookahead (?=.*\n):
(?=.*\n)(\d+)\.(\d+)\.((\d+)\.)*
Edit
You could use an alternation to either match when on the next line there is the same pattern following, or match the pattern when not followed by a newline.
^(?:\d+\.\d+\.(?:\d+\.)*(?=.*\n\d+\.\d+\.)|\d+\.\d+\.(?:\d+\.)*(?!.*\n))
^ Start of string
(?: Non capturing group
\d+\.\d+\. Match 1+ digits and a dot, twice
(?:\d+\.)* Repeat 0+ times matching 1+ digits and a dot
(?=.*\n\d+\.\d+\.) Positive lookahead, assert that what follows is a newline and then a line starting with the pattern
| Or
\d+\.\d+\. Match 1+ digits and a dot, twice
(?:\d+\.)* Repeat 0+ times matching 1+ digits and a dot
(?!.*\n) Negative lookahead, assert that no newline follows
) Close non capturing group
(\d+\.*)+\n* will match the text you provided. If you need to make sure the final line also ends with a . then (\d+\.)+\n* will work.
Most programming languages offer the m flag, which is the multiline modifier. Enabling it lets $ match at the end of each line as well as at the end of the string.
The solution below only appends the $ to your current regex and sets the m flag. This may vary depending on your programming language.
var text = "1.2.\n3.4.5.\n1.2.\n12.34.56.78.123.\nthis 1.2. shouldn't hit",
    regex = /((\d+)\.(\d+)\.((\d+)\.)*)$/gm,
    match;
while (match = regex.exec(text)) {
  console.log(match);
}
You could simplify the regex to /(\d+\.){2,}$/gm, then split the full match based on the dot character to get all the different numbers. I've given a JavaScript example below, but getting a substring and splitting a string are pretty basic operations in most languages.
var text = "1.2.\n3.4.5.\n1.2.\n12.34.56.78.123.\nthis 1.2. shouldn't hit",
    regex = /(\d+\.){2,}$/gm;

/* Slice is used to drop the dot at the end, otherwise resulting in
 * an empty string on split.
 *
 * "1.2.3.".split(".")     //=> ["1", "2", "3", ""]
 * "1.2.3.".slice(0, -1)   //=> "1.2.3"
 * "1.2.3".split(".")      //=> ["1", "2", "3"]
 */
console.log(
  text.match(regex)
      .map(match => match.slice(0, -1).split("."))
);
For more info about regex flags/modifiers have a look at: Regular Expression Reference: Mode Modifiers
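The same idea carries over to other languages. As a minimal sketch in Python, where the m flag is spelled re.MULTILINE (the names LINE_END_NUMBERS and dotted_numbers are made up here, and the group is made non-capturing because only the full match is used):

```python
import re

# With re.MULTILINE, $ matches at the end of every line, not only at the
# end of the string.
LINE_END_NUMBERS = re.compile(r"(?:\d+\.){2,}$", re.MULTILINE)

def dotted_numbers(text):
    """Return the numbers of each dotted sequence that ends a line."""
    # [:-1] drops the trailing dot so split(".") yields no empty string.
    return [m.group(0)[:-1].split(".") for m in LINE_END_NUMBERS.finditer(text)]

text = "1.2.\n3.4.5.\n1.2.\n12.34.56.78.123.\nthis 1.2. shouldn't hit"
print(dotted_numbers(text))
```

The last line of the sample is skipped because its "1.2." is not at the end of the line.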

(Regular Expressions) 2Liner→1Liner

Thank you in advance, and sorry for my bad English!
I want
'Odd rows' 'CRLF' 'Even rows' 'CRLF' → 'Odd rows' ',' 'Even rows' 'CRLF'
Example Input:
0
SECTION
2
HEADER
Desired Output:
0,SECTION
2,HEADER
What I have tried:
Find: (.*)\n(.*)\n
Replace: $1,$2\n
My goal is to make the DXF file easier to read.
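If your tool lets . match newlines (a "dot matches newline" or single-line option), . matches a newline the same as it matches any other character, so the first .* is going to gobble up the whole string and leave nothing for the rest of the pattern. (In most regex flavors, . does not match \n by default.)
Instead, use a character class that excludes \n. Also, it's not clear whether your final line terminates with a \n or not, so the regex should handle both cases: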
Find
([^\n]*)\n([^\n]*)(\n|$)
Replace
$1,$2$3
Breakdown:
([^\n]*) - 0 or more characters that are not \n
\n
([^\n]*)
(\n|$) - \n or end of string
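For anyone who wants to try this outside an editor, here is a minimal Python sketch of the same find/replace. It assumes the line endings are plain \n (with CRLF input the \r would end up inside the groups, so normalize first); PAIR and pair_lines are illustrative names, and Python writes the replacement as \1,\2\3 instead of $1,$2$3.

```python
import re

# Two lines without their newlines, plus either the closing newline
# or the end of the string as group 3.
PAIR = re.compile(r"([^\n]*)\n([^\n]*)(\n|$)")

def pair_lines(text: str) -> str:
    """Join each odd line to the following even line with a comma."""
    return PAIR.sub(r"\1,\2\3", text)

print(pair_lines("0\nSECTION\n2\nHEADER\n"), end="")
```

Group 3 is put back as it was found, so the output ends with a newline only if the input did.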
For your example data you could capture one or more digits in capturing group 1, followed by matching a newline.
In the replacement use group 1 followed by a comma.
Match
(\d+)(?:\r?\n|\r)
Replace
$1,
You should also match line breaks and spaces, because the string may contain multiple spaces and newlines.
Try this regex:
"0\nSECTION\n 2\nHEADER".replace(/([\d]+)([\s\n]+)([^\d\s\n]*)/g,"$1,$3")
var myStr = ` 0
SECTION
2
HEADER`;
var output = myStr.replace(/([\d]+)([\s\n]+)([^\d\s\n]*)/g,"$1,$3");
console.log(output);
This works for a DXF file. It stores each odd line and prints it joined to the following even line (AWK):
NR%2!=0{L1=$0}
NR%2==0{print L1 "," $0;L1=""}

Perl multiline regex for first 3 individual items

I am trying to read a regex format in Perl. Sometimes instead of a single line I also see the format in 3 lines.
For the below single line format I can regex as
/^\s*(.*)\s+([a-zA-Z0-9._]+)\s+(\d+)\s+(.*)/
to get the first 3 individual items in line
Hi There FirstName.LastName 10 3/23/2011 2:46 PM
Below is the multi-line format I see. I am trying to use something like
/^\s*(.*)\n*\n*|\s+([a-zA-Z0-9._]+)\s+(\d+)\s+(.*)$/m
to get individual items but it doesn't seem to work.
Hi There
FirstName-LastName 8 7/17/2015 1:15 PM
Testing - 12323232323 Hello There
Any suggestions? Is multi-line regex possible?
NOTE: The same output can contain single-line entries, multi-line entries, or both, so it can look like this:
Hello Line1 FirstName.LastName 10 3/23/2011 2:46 PM
Hello Line2
Line2FirstName-LastName 8 7/17/2015 1:15 PM
Testing - 12323232323 Hello There
Hello Line3 Line3FirstName.LastName 8 3/21/2011 2:46 PM
You can certainly apply a regex over multiple lines.
I've used the negated word class \W+ between words to match the spaces and newlines between them (\W is equivalent to [^a-zA-Z0-9_]).
The chat is treated as a repeated \w+\W+ block.
If you provide a more specific input/output case, I can refine the example code:
#!/usr/bin/env perl
my $input = <<'__END__';
Hi There
FirstName-LastName 8 7/17/2015 1:15 PM
Testing - 12323232323 Hello There
__END__
my ($chat,$username,$chars,$timestamp) = $input =~ m/(?im)^\s*((?:\w+\W+)+)(\w+[-,\.]\w+)\W+(\d+)\W+([0-1]?\d\/[0-3]?\d\/[1-2]\d{3}\s+[0-2]?\d:[0-5]?\d\s?[ap]m)/;
$chat =~ s/\s+$//; #remove trailing spaces
print "chat -> ${chat}\n";
print "username -> ${username}\n";
print "chars -> ${chars}\n";
print "timestamp -> ${timestamp}\n";
Legend
m/^.../ match operator (not a substitution), anchored at the start of a line
(?im): case-insensitive search and multiline mode (^/$ also match at the start/end of each line)
\s* match zero or more whitespace chars (spaces, tabs, line breaks or form feeds)
((?:\w+\W+)+) (match group $chat) match one or more repetitions of a single word \w+ (letters, digits, '_') followed by non-word chars \W+ (everything that is not \w, including the newline \n). The result is later trimmed to remove trailing whitespace
(\w+[-,\.]\w+): (match group $username) this is the weak point. If the username is not composed of two regex words separated by a dash '-', a comma ',' or (UPDATE) a dot '.', the entire regex cannot work properly (I extracted both possibilities from your question; the format is not directly specified).
(\d+): (match group $chars) a number composed of one or more digits
([0-1]?\d\/[0-3]?\d\/[1-2]\d{3}\s+[0-2]?\d:[0-5]?\d\s?[ap]m): (match group $timestamp) this is longer than the others, so let's split it up:
[0-1]?\d\/[0-3]?\d\/[1-2]\d{3} match a date composed of a month (with an optional leading zero), a day (with an optional leading zero) and a year from 1000 to 2999 (a relaxed constraint :)
[0-2]?\d:[0-5]?\d\s?[ap]m match the time: hour:minutes, an optional space and 'pm,PM,am,AM,Am,Pm...' thanks to the case-insensitive modifier above
Your regex says:
^\s*(.*)\n*\n* # line starts with optional space followed by anything
| # or
\s+([a-zA-Z0-9._]+)\s+(\d+)\s+(.*)$ # spaces followed by any words followed by spaces, digits, spaces, anything at the end of the line
Consider this:
/^From|To$/
Alternation has the lowest precedence, so | splits the entire pattern, anchors included.
The above is really saying: find a line that starts with 'From', or a line that ends with 'To'.
Compare to this:
/^(From|To)$/
Above will find lines that only have 'From' or 'To'
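The difference between the two patterns is easy to check. Perl is not needed for that; a minimal Python sketch is enough, since the precedence of | is the same there. The names loose and strict and the sample lines are made up for this example.

```python
import re

loose = re.compile(r"^From|To$")     # groups as (^From) OR (To$)
strict = re.compile(r"^(From|To)$")  # the whole line must be From or To

for line in ["From", "To", "From here", "Go To", "Unrelated"]:
    # search() returns a match object or None; bool() turns that into a flag.
    print(f"{line!r}: loose={bool(loose.search(line))}, "
          f"strict={bool(strict.search(line))}")
```

The same grouping effect is why the | in the question's pattern splits it into two independent alternatives, each carrying only one of the anchors.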