Regex to remove EVEN lines - regex

I need help to build a regex that can remove EVEN lines in a plain textfile.
Given this input:
line1
line2
line3
line4
line5
line6
It would output this:
line1
line3
line5
Thanks !

Actually, you don't need a regex for that. In your favourite language, iterate over the file, keep a line counter and take it modulo 2. E.g. with awk (*nix), to keep the odd lines:
$ awk 'NR%2==1' file
line1
line3
line5
even lines:
$ awk 'NR%2==0' file
line2
line4
line6
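For readers without awk, the same counter-and-modulus idea can be sketched in Python. The function name odd_lines and the sample text are illustrative assumptions, not part of the answer above; the counter starts at 1 so that it lines up with awk's NR.

```python
def odd_lines(text: str) -> str:
    """Keep lines 1, 3, 5, ... (1-based) and drop the even ones."""
    lines = text.splitlines(keepends=True)
    # Start counting at 1 so the counter behaves like awk's NR.
    return "".join(line for n, line in enumerate(lines, start=1) if n % 2 == 1)


sample = "line1\nline2\nline3\nline4\nline5\nline6\n"
print(odd_lines(sample), end="")
```

Changing the test to n % 2 == 0 would keep the even lines instead.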

Well, if you do a search-and-replace-all-matches on
^(.*)\r?\n.*
in "^ matches start-of-line mode" and ". doesn't match linebreaks mode"; replacing with
\1
then you lose every even line.
E. g. in C#:
resultString = Regex.Replace(subjectString, @"^(.*)\r?\n.*", "$1", RegexOptions.Multiline);
or in Python:
result = re.sub(r"(?m)^(.*)\r?\n.*", r"\1", subject)

First, I fully agree with the consensus that this is not something regex should be doing.
Here's a Java demo:
public class Test {
public static String voodoo(String lines) {
return lines.replaceAll("\\G(.*\r?\n).*(?:\r?\n|$)", "$1");
}
public static void main(String[] args) {
System.out.println("a)\n"+voodoo("1\n2\n3\n4\n5\n6"));
System.out.println("b)\n"+voodoo("1\r\n2\n3\r\n4\n5\n6\n7"));
System.out.println("c)\n"+voodoo("1"));
}
}
output:
a)
1
3
5
b)
1
3
5
7
c)
1
A short explanation of the regex:
\G # match the end of the previous match
( # start capture group 1
.* # match any character except line breaks and repeat it zero or more times
\r? # match the character '\r' and match it once or none at all
\n # match the character '\n'
) # end capture group 1
.* # match any character except line breaks and repeat it zero or more times
(?: # start non-capture group 1
\r? # match the character '\r' and match it once or none at all
\n # match the character '\n'
| # OR
$ # match the end of the input
) # end non-capture group 1
\G begins at the start of the string. Every pair of lines (where the second line is optional, in case of the last uneven line) gets replaced by the first line in the pair.
But again: using a normal programming language (if one can call awk "normal" :)) is the way to go.
EDIT
And as Tim suggested, this also works:
replaceAll("(?m)^(.*)\r?\n.*", "$1")

I use capture groups, (.*) --> $1, in Sublime Text's regex find-and-replace mode to
remove the line break in every other line and place a tab character between the values, using
replace (.*)\n(.*)\n
with $1\t$2\n
For this specific question the OP could change this to
replace (.*)\n(.*)\n
with $1\n
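One caveat with this pattern is that it needs a newline after the second line of every pair. The sketch below replays the same search and replace with Python's re module, which is an assumption for illustration only (Sublime Text uses a different regex engine, though this simple pattern should behave the same way); the names PAIR and drop_even are made up here.

```python
import re

# Same pattern as above: two consecutive lines, each followed by \n.
PAIR = re.compile(r"(.*)\n(.*)\n")


def drop_even(text: str) -> str:
    """Replace every matched pair of lines with the first line of the pair."""
    return PAIR.sub(r"\1\n", text)


with_final_newline = "line1\nline2\nline3\nline4\nline5\nline6\n"
without_final_newline = "line1\nline2\nline3\nline4\nline5\nline6"

print(repr(drop_even(with_final_newline)))     # 'line1\nline3\nline5\n'
print(repr(drop_even(without_final_newline)))  # 'line1\nline3\nline5\nline6'
```

In the second call the last pair is not matched because line6 has no trailing newline, so line6 survives. Adding a final newline to the file before running the replacement avoids this.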

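Well, for this particular input, where every line ends in its own line number, this will remove the EVEN lines from the text file (it keeps the lines whose last character is an odd digit, so it is not a general solution):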
grep '[13579]$' textfile > textfilewithoddlines
And output this:
line1
line3
line5

Perhaps you are on the command line. In PowerShell:
$x = 0; gc .\foo.txt | ? { $x++; $x % 2 -eq 1 }

Related

Find :: outside of markdown code formatting

I have a bunch of markdown files, where I want to search for Ruby's double colon :: outside of some code formatting (e.g. where I forgot to apply proper markdown). For example
`foo::bar`
hello `foo::bar` test
` example::with::whitespace `
```
Proper::Formatted
```
```
Module::WithIndendation
```
```
Some::Nested::Modules
```
```ruby
CodeBlock::WithSyntax
```
# Some::Class
## Another::Class Heading
some text
The regex only should match Some::Class and Another::Class, because they miss the surrounding backticks, and are also not within a multiline code fence block.
I have this regex, but it also matches the multi line block
[\s]+[^`]+(::)[^`]+[\s]?
Any idea, how to exclude this?
EDIT:
It would be great, if the regex would work in Ruby, JS and on the command line for grep.
For the original input, you may use this regex in Ruby to match a :: string that is
not preceded by a ` and
not preceded by a ` followed by a whitespace character:
Regex:
(?<!`\s)(?<!`)\b\w+::\w+
RegEx Breakup:
(?<!`\s): Negative lookbehind to assert that a backtick followed by whitespace is not at the preceding position
(?<!`): Negative lookbehind to assert that a backtick is not at the preceding position
\b: Match word boundary
\w+: Match 1+ word characters
::: Match a ::
\w+: Match 1+ word characters
You can use this regex in Javascript:
(?<!`\w*\s*|::)\b\w+(?:::\w+)+
For gnu-grep, consider this command:
grep -ZzoP '`\w*\s*\b\w+::\w+(*SKIP)(*F)|\b\w+::\w+' file |
xargs -0 printf '%s\n'
Some::Class
Another::Class
One can use the regular expression
rgx = /`[^`]*`|([^`\r\n]*::[^`\r\n]*)/
with the form of String#gsub that takes one argument and no block, and therefore returns an enumerator (str holding the example string given in the question):
str.gsub(rgx).select { $1 }
#=> ["# Some::Class", "## Another::Class Heading"]
The idea is that the first part of the regex's alternation, `[^`]*`, matches, but does not capture, all strings delimited by backticks (including ``), whereas the second part, ([^`\r\n]*::[^`\r\n]*), matches and captures all strings on a single line that contain '::' but no backticks. We therefore concern ourselves with captures only, by invoking select { $1 } on the enumerator returned by gsub.
The regular expression can be made self-documenting by writing it in free-spacing mode.
rgx = /
` # match a backtick
[^`]* # match zero or more characters other than backticks
` # match a backtick
| # or
( # begin capture group 1
[^`\r\n]* # match zero or more characters other than backticks and
# line terminators
:: # match two colons
[^`\r\n]* # ditto line before previous
) # end capture group 1
/x # invoke free-spacing regex definition mode
[^`\r\n] contains \r (carriage return) in the event that the file was created with Windows. If desired, [^`]* can be replaced with .*? (match zero or more characters, as few as possible).
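The same "match what you want to skip, capture what you want to keep" idea carries over to other languages. Here is a minimal Python sketch of it, using the regex from this answer unchanged; the function name unformatted_double_colons and the shortened sample string are assumptions made for illustration.

```python
import re

# First alternative: a backtick-delimited span (matched, not captured).
# Second alternative: a run on one line containing '::' and no backticks (captured).
RGX = re.compile(r"`[^`]*`|([^`\r\n]*::[^`\r\n]*)")


def unformatted_double_colons(text: str) -> list[str]:
    """Return only the matches that came from the capturing alternative."""
    return [m.group(1) for m in RGX.finditer(text) if m.group(1) is not None]


sample = (
    "hello `foo::bar` test\n"
    "# Some::Class\n"
    "## Another::Class Heading\n"
    "some text\n"
)
print(unformatted_double_colons(sample))
# ['# Some::Class', '## Another::Class Heading']
```

Filtering on group(1) plays the role of select { $1 } in the Ruby version: matches produced by the backtick alternative leave group 1 unset and are discarded.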

Regex to find every second new line (match only new line characters)

I need a regex that finds every second newline and matches only the newline characters.
Input:
LINE1
LINE2
LINE3
LINE4
LINE5
LINE6
Output:
LINE1LINE2
LINE3LINE4
LINE5LINE6
I have tried \n[^\n]*\n but it matches text as well for replacement and does not give desired output.
I am having issues in matching every second new line character only.
Thanks in advance!
You could use the regular expression
^(.*)\n(.*\n)
and replace each match with $1$2 (the contents of capture group 1 followed by those of capture group 2).
Alternatively, you could simply match each pair of lines and remove the first newline character. That requires a bit of code, of course. As you have not indicated which language you are using I will illustrate that with some Ruby code, which readers should find easy to translate to any high-level language. Suppose str is a variable holding your multi-line string. Then:
r = /^(?:.*\n){2}/
s = str.gsub(r) { |s| s.sub(/\n/, '') }
puts s
LINE1LINE2
LINE3LINE4
LINE5LINE6
For an even number of lines, you could make use of a positive lookahead to assert what is on the right side is 0 or more times repetition of 2 lines that end with a newline, followed by matching the last line and the end of the string.
In the replacement use an empty string.
\n(?=(?:.+\n.+\n)*.+$)
Explanation
\n Match a newline
(?= Positive lookahead, assert what is on the right is
(?:.+\n.+\n)* Match 0+ times 2 lines followed by a newline
.+$ Match any char except a newline 1+ times and assert end of string
) Close lookahead
Output
LINE1LINE2
LINE3LINE4
LINE5LINE6
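The lookahead pattern can be tried out in Python as well. This is only a sketch: the names JOINING_NEWLINE and join_pairs are invented here, and it relies on Python's $ (without re.MULTILINE) matching at the very end of the string or just before a single trailing newline.

```python
import re

# A newline that is followed by an odd number of remaining lines,
# i.e. the newline between the first and second line of each pair.
JOINING_NEWLINE = re.compile(r"\n(?=(?:.+\n.+\n)*.+$)")


def join_pairs(text: str) -> str:
    """Remove the newline inside every pair of lines."""
    return JOINING_NEWLINE.sub("", text)


sample = "LINE1\nLINE2\nLINE3\nLINE4\nLINE5\nLINE6"
print(join_pairs(sample))
# LINE1LINE2
# LINE3LINE4
# LINE5LINE6
```

Because the lookahead counts lines from the end of the input, the result depends on the total number of lines being even, as the answer states.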

(Regular Expressions) 2Liner→1Liner

Thank you in advance and sorry for the bad english!
I want
'Odd rows' 'CRLF' 'Even rows' 'CRLF' → 'Odd rows' ',' 'Even rows' 'CRLF'
Example Input:
0
SECTION
2
HEADER
Desired Output:
0,SECTION
2,HEADER
What I have tried:
Find: (.*)\n(.*)\n
Replace: $1,$2\n
I want the DXF file to be easier to read.
If . is allowed to match a newline (the "dot matches newline" or single-line option), it matches a newline the same as it matches any other character, so the first .* greedily runs across line boundaries instead of stopping at the end of the first line.
Instead, use a character class that excludes \n. Also, it's not clear whether your final line terminates with a \n or not, so the regex should handle both cases:
Find
([^\n]*)\n([^\n]*)(\n|$)
Replace
$1,$2$3
Breakdown:
([^\n]*) - 0 or more characters that are not \n
\n
([^\n]*)
(\n|$) - \n or end of string
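The find and replace pair above can be checked with a short Python sketch. The names PAIR and join_pairs are assumptions for illustration, and the sample assumes the text uses plain \n line endings; with CRLF input a \r would be left in front of the comma, so normalize the line endings first.

```python
import re

# Line 1, a newline, line 2, then either a newline or the end of the string.
PAIR = re.compile(r"([^\n]*)\n([^\n]*)(\n|$)")


def join_pairs(text: str) -> str:
    """Turn each pair of lines into 'odd,even', keeping the pair's terminator."""
    return PAIR.sub(r"\1,\2\3", text)


print(join_pairs("0\nSECTION\n2\nHEADER"))
# 0,SECTION
# 2,HEADER
```

The third group is what makes the pattern work both with and without a newline after the final line.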
For your example data you could capture one or more digits in capturing group 1 followed by matching a newline.
In the replacement use group 1 followed by a comma.
Match
(\d+)(?:\r?\n|\r)
Replace
$1,
You should also match newlines and spaces, because the string may contain several spaces and newlines between the values.
Try this regex:
"0\nSECTION\n 2\nHEADER".replace(/([\d]+)([\s\n]+)([^\d\s\n]*)/g,"$1,$3")
var myStr = ` 0
SECTION
2
HEADER`;
var output = myStr.replace(/([\d]+)([\s\n]+)([^\d\s\n]*)/g,"$1,$3");
console.log(output);
Works on a DXF file, even when an odd line is text such as abc...
(AWK)
NR%2!=0{L1=$0}
NR%2==0{print L1 "," $0;L1=""}

Replace character between two string in PCRE (Perl) syntax

How can I replace a special character between two special strings?
I have something like this:
"start 1
2-
G
23
end"
I want to have the following:
"start 1 2- G 23 end"
Only replace \n with space between "start and end"
Test1;Hello;"Text with more words";123
Test2;want;"start
1-
76 end";123
Test3;Test;"It's a test";123
Test4;Hellp;"start
1234
good-
the end";1234
Test5;Test;"It's a test";123
Is it possible in notepad++?
You can use this pattern:
(?:\G(?!\A)|\bstart\b)(?:(?!\bend\b).)*\K\R
details:
(?:
\G(?!\A) # contiguous to a previous match
|
\bstart\b # this is the first branch that matches
)
(?:(?!\bend\b).)* # zero or more chars that are not a newline nor the start of the word "end"
\K # remove all on the left from the match result
\R # any newline sequence (\n or \r\n or \r)
Note: (?:(?!\bend\b).)* isn't very efficient, feel free to replace it by something better for your particular case.
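For comparison, here is a sketch of the same replacement in Python. Python's re module has no \G, \K or \R, so this swaps in a different technique: find each start ... end span first, then replace the newlines inside it with a callback. The names SPAN, NEWLINE and join_start_end are assumptions for illustration, and unlike the pattern above, this version only touches spans that actually have a closing "end".

```python
import re

# A span from the word "start" to the nearest following word "end".
SPAN = re.compile(r"\bstart\b.*?\bend\b", re.DOTALL)
# Any of the three newline sequences that \R covers in the PCRE pattern.
NEWLINE = re.compile(r"\r\n|\r|\n")


def join_start_end(text: str) -> str:
    """Replace newlines with spaces, but only inside start ... end spans."""
    return SPAN.sub(lambda m: NEWLINE.sub(" ", m.group(0)), text)


sample = (
    'Test1;Hello;"Text with more words";123\n'
    'Test2;want;"start\n'
    "1-\n"
    '76 end";123\n'
)
print(join_start_end(sample), end="")
# Test1;Hello;"Text with more words";123
# Test2;want;"start 1- 76 end";123
```

The newlines that separate the records stay untouched because they fall outside every matched span.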
The magic words are lazy quantifiers, lookaheads and single-line mode.
A solution for PHP (uses PCRE) would be:
<?php
$string = __your_string_here__;
$regex = '~(?s)(?:start)(?<content>.*?)(?=end)(?-s)~';
# ~ delimiter
# (?s) starts single line mode - aka dot matches everything
# (?:start) matches start literally (non-capturing group)
# (?<content>.*?) matches everything lazily and captures it as "content"
# (?=end) positive lookahead
# (?-s) turn single line mode off
# ~ delimiter
preg_match_all($regex, $string, $matches);
$content = str_replace("\n", '', $matches["content"][1]);
echo $content; // 1234good-the
?>

replace in multiline - refer to content for replacement

I need the following:
input:
NAME-LIST:
name1
<any text>
name_to_be_changed;
NAME-LIST:
name3
<any text>
name_to_be_changed;
output: replace "name_to_be_changed" by first name in the block
NAME-LIST:
name1
<any text>
name1;
NAME-LIST:
name3
<any text>
name3;
result:
I would prefer a perl one-liner :-)
I suggest a search expression similar to what Sam already posted:
(NAME-LIST:[\t ]*[\r\n]+)([^\r\n]+)([\r\n]+[^\r\n]*[\r\n]+)name_to_be_changed;
The replace string is \1\2\3\2; or $1$2$3$2;
Each pair of opening and closing round brackets specify a marking group. There are three such marking groups in the search expression.
[\t ]* makes it possible that there are trailing spaces or tabs after fixed string NAME-LIST: at end of first line of a block.
[\r\n]+ matches 1 or more carriage returns or linefeeds. That is similar to \v as used by Sam but does not match other vertical whitespaces like formfeed.
[^\r\n]+ matches 1 or more characters which are neither a carriage return nor a linefeed. That is like . if the matching behavior for a dot is defined as matching all characters except line terminators.
[^\r\n]* matches 0 or more characters which are neither a carriage return nor a linefeed. So <any text> can also be no text at all, which means the third line can be a blank line.
The 3 strings found by the expressions in the marking groups are backreferenced by \1, \2 and \3, or by $1, $2 and $3, depending on the tool. Only the second one is referenced twice, which copies the string from line 2 to line 4 and keeps the other 3 lines unchanged.
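To see the three groups and the doubled backreference at work, the search and replace strings above can be run through Python's re.sub. This is only a sketch: the names BLOCK and fix_names are made up here, and the sample text assumes the blocks follow each other without blank lines, as shown in the question.

```python
import re

# Group 1: the NAME-LIST: line, group 2: the first name, group 3: the <any text> line.
BLOCK = re.compile(
    r"(NAME-LIST:[\t ]*[\r\n]+)([^\r\n]+)([\r\n]+[^\r\n]*[\r\n]+)name_to_be_changed;"
)


def fix_names(text: str) -> str:
    """Copy the first name of each block over the placeholder on its last line."""
    return BLOCK.sub(r"\1\2\3\2;", text)


sample = (
    "NAME-LIST:\nname1\n<any text>\nname_to_be_changed;\n"
    "NAME-LIST:\nname3\n<any text>\nname_to_be_changed;\n"
)
print(fix_names(sample), end="")
```

re.sub replaces every non-overlapping match, so each block gets the name from its own second line.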
Using a perl one-liner
perl -00 -pe 's/NAME-LIST:\n(.*)\n.*\n\K.*;/$1;/' file.txt
Explanation:
Switches:
-00: Paragraph mode
-p: Creates a while(<>){...; print} loop over the input records (with -00, each paragraph rather than each line).
-e: Tells perl to execute the code on command line.
First of all, thanks for your input.
Unfortunately I could not get either of your suggested solutions to work, but I have found one of my own:
perl -00 -pe 's/(NAME-LIST:\s+)(\w+)(.*?)\w+;/$1$2$3$2;/gs'
\s+ = 1 or more white spaces (space, newline, tab,...)
\w+ = 1 or more word characters (letters, digits or underscore)
The /gs at the end is important:
g = global (replace every occurrence; otherwise only the first name would be replaced)
s = treat the string as a single line, so . also matches newlines