Regex to match string that isn't a specific number? - regex

I have a tab-separated file that looks like this:
Something 1 Text...
Something 2 Text...
Something 2001 Text...
Something 1 Text
I want to match all lines that do not have 1 in the second to last column. So I tried this:
\t[^1][^\t]*\t[^\t]*$
But for some reason this does not work. Any hints?
Thanks!

You can use this regex:
/^\S+\s+(?!.*1).*$/gm
Or else if you want 1 to be a complete word then use:
/^\S+\s+(?!.*\b1\b).*$/gm
EDIT:
To check for presence of 1 in last 2 columns only:
/\t(?!.*1)\S+\t+\S+$/gm

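Your regex \t[^1][^\t]*\t[^\t]*$ does not work because [^1] only tests a single character. The pattern matches a tab, then exactly one character other than 1, then 0 or more characters other than tabs, a tab, and 0 or more characters other than a tab before the end of line (if you are using m mode). The check therefore applies only to the first character of the column, so a value such as 10 is rejected as well.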
I suggest reading everything in the first column, then a tab, and then set a check so that we do not have "1":
^[^\t]*\t(?!.*1).*$
Pay attention to the multiline m flag.
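As a rough check of the pattern above, here is a small Python sketch. The sample rows are an assumption (the tabs are not visible in the question), and `re.MULTILINE` plays the role of the m flag.

```python
import re

# Hypothetical sample; the real file's contents and tab positions are assumed.
text = (
    "Something\t1\tText...\n"
    "Something\t2\tText...\n"
    "Something\t2001\tText...\n"
    "Something\t1\tText"
)

# re.MULTILINE makes ^ and $ anchor at every line, like the m flag.
pattern = re.compile(r"^[^\t]*\t(?!.*1).*$", re.MULTILINE)

# The pattern has no capturing groups, so findall returns whole matches.
matched_lines = pattern.findall(text)
print(matched_lines)
```

Note that the 2001 row is rejected too, because the lookahead refuses any 1 after the first column, not only a column that is exactly 1.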
EDIT:
If you need to only make sure there is no 1 in the last 2 columns, use this regex:
^.*(?!.*1)[^\t]+\t[^\t]+$
EXPLANATION:
^ - Start of line
.* - Consume any characters from the start
(?!.*1) - Set a check for 1 - it should not appear before the end of line from here!
[^\t]+ - 1 or more characters other than a tab
\t - a tab
[^\t]+ - 1 or more characters other than a tab
$ - End of line.

You can use the following regex:
/^[^\t]*\t((?!1).)*$/gm
(?!1) is a negative lookahead that asserts the next character is not 1, so ((?!1).)* matches any run of characters that are not 1.

If you want to match only lines without the character 1 from the second column until the end of the line, you can use this pattern:
^[^\t]*\t[^1]*$
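The patterns above reject any line that contains a 1 after the first column. If the goal is only to reject lines whose second-to-last column is exactly 1, one possible variant is to replace [^1] in the question's regex with the lookahead (?!1\t). This is a sketch and not taken from any of the answers; the sample rows, including the 10 row, are assumed.

```python
import re

# Hypothetical rows; the 10 row is added to show where the two patterns differ.
rows = [
    "Something\t1\tText...",
    "Something\t2\tText...",
    "Something\t2001\tText...",
    "Something\t10\tText",
]

# Pattern from the question: only looks at the column's first character.
original = re.compile(r"\t[^1][^\t]*\t[^\t]*$")

# Variant: reject the column only when it is exactly "1" (a 1 followed by a tab).
exact = re.compile(r"\t(?!1\t)[^\t]*\t[^\t]*$")

kept_by_original = [r for r in rows if original.search(r)]
kept_by_exact = [r for r in rows if exact.search(r)]

print(kept_by_original)
print(kept_by_exact)
```

The question's pattern drops the 10 row because it starts with 1, while the lookahead variant keeps it.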

Related

How to transpose pieces of data using Regular expression in Notepad++

I am very new to the world of regular expressions. I am trying to use Notepad++ using Regex for the following:
Input file is something like this and there are multiple such files:
Code:
abc
17
015
0 7
4.3
5/1
***END***
abc
6
71
8/3
9 0
***END***
abc
10.1
11
9
***END***
I need to be able to edit the text in all of these files so that all the files look like this:
Code:
abc
1,2,3,4,5
***END***
abc
6,7,8,9
***END***
abc
10,11,12
***END***
Also:
In some files the number of * around the word END varies, is there a way to generalize the number of * so I don't have to worry about it?
There is some additional data before abcs which does not need to be transposed, how do I keep that data as it is along with transposing the data between abc and ***END***.
Kindly help me. Your help is much appreciated!
Try the following find and replace, in regex mode:
Find: ^(\d+)\R(?!\*{1,}END\*{1,})
Replace: $1,
Here is an explanation of the regex pattern:
^ from the start of the line
(\d+) match AND capture a number
\R followed by a platform independent newline, which
(?!\*{1,}END\*{1,}) is NOT followed by ***END***
Note carefully the negative lookahead at the end of the pattern, which makes sure that we don't do the replacement on the final number in each section. Without this, the last number would bring the END marker onto the same line.
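For anyone who wants to try the same idea outside Notepad++, here is a minimal Python sketch. Python's re module has no \R, so \r?\n is used instead, and the sample text is an assumed, cleaned-up version of the input.

```python
import re

# Assumed sample: two blocks with a different number of asterisks around END.
text = "abc\n1\n2\n3\n***END***\nabc\n6\n7\n*END*\n"

# A line of digits plus its line break is replaced by "digits," unless
# the next line is the END marker (negative lookahead).
joined = re.sub(
    r"^(\d+)\r?\n(?!\*+END\*+)",
    r"\1,",
    text,
    flags=re.MULTILINE,
)

print(joined)
```

Like the Notepad++ version, this also joins purely numeric lines that appear outside an abc block, so it only fits files where that cannot happen.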
This will replace line breaks only between "abc" and "***END***", with any number of asterisks.
Ctrl+H
Find what: (?:(?<=^abc)\R|\G(?!^)).+\K\R(?!\*+END\*+)
Replace with: ,
CHECK Match case
CHECK Wrap around
CHECK Regular expression
UNCHECK . matches newline
Replace all
Explanation:
(?: # non capture group
(?<=^abc) # positive look behind, make sure we have "abc" at the beginning of line before
\R # any kind of linebreak
| # OR
\G # restart from last match position
(?!^) # negative look ahead, make sure we are not at the beginning of line
) # end group
.+ # 1 or more any character but newline
\K # forget all we have seen until this position
\R # any kind of linebreak
(?!\*+END\*+) # negative lookahead, make sure we haven't ***END*** after
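Python's re module supports neither \G nor \K, so the pattern above cannot be ported directly. A different technique that gives the same result is to match each block as a whole and join its lines in a callback. The sketch below assumes that abc sits alone on its line, that every block has at least one data line, and that lines end with \n; the sample text is hypothetical.

```python
import re

# Hypothetical input: two lines of unrelated data, then two blocks.
text = (
    "10\n20\n"                      # data before the first block, left untouched
    "abc\n1\n2\n3\n***END***\n"
    "abc\n6\n7\n*END*\n"
)

# Capture everything between an "abc" line and the line break that
# precedes an END marker with any number of asterisks.
BLOCK = re.compile(r"^abc\n(.*?)\n(?=\*+END\*+$)", re.MULTILINE | re.DOTALL)

def join_block(match):
    """Rebuild one block with its data lines joined by commas."""
    values = match.group(1).split("\n")
    return "abc\n" + ",".join(values) + "\n"

result = BLOCK.sub(join_block, text)
print(result)
```

Because only the text between abc and the END marker is rewritten, the lines before the first block stay as they are.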

(Regular Expressions) 2Liner→1Liner

Thank you in advance and sorry for the bad english!
I want
'Odd rows' 'CRLF' 'Even rows' 'CRLF' → 'Odd rows' ',' 'Even rows' 'CRLF'
Example Input:
0
SECTION
2
HEADER
Desired Output:
0,SECTION
2,HEADER
What I have tried:
Find: (.*)\n(.*)\n
Replace: $1,$2\n
I want to make the DXF file easier to read.
If . is allowed to match a newline the same as it matches any other character (for example, with ". matches newline" checked in Notepad++), the first .* is going to gobble up as much of the string as it can, so the lines will not be paired up as intended.
Instead, use a character group that excludes \n. Also, it's not clear whether your final line terminates with a \n or not, so the regex should handle both cases:
Find
([^\n]*)\n([^\n]*)(\n|$)
Replace
$1,$2$3
Breakdown:
([^\n]*) - 0 or more characters that are not \n
\n
([^\n]*)
(\n|$) - \n or end of string
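Here is a quick Python sketch of that find/replace, using the example input from the question. Python writes the replacement groups as \1 instead of $1, and the input is assumed to use plain \n line endings.

```python
import re

text = "0\nSECTION\n2\nHEADER"

# Two lines at a time: keep both lines, swap the first line break for a comma,
# and keep whatever ended the second line (a newline or nothing).
paired = re.sub(r"([^\n]*)\n([^\n]*)(\n|$)", r"\1,\2\3", text)

print(paired)
```

With CRLF files the \r would stay attached to each captured line, so the pattern would need adjusting for that case.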
For your example data you could capture one or more digits in capturing group 1 followed by matching a newline.
In the replacement use group 1 followed by a comma.
Match
(\d+)(?:\r?\n|\r)
Replace
$1,
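A small Python sketch of this match/replace pair, assuming the file uses CRLF line endings as described in the question:

```python
import re

# Example input from the question, written with CRLF line endings (assumed).
text = "0\r\nSECTION\r\n2\r\nHEADER"

# Digits followed by any common line break become "digits,".
paired = re.sub(r"(\d+)(?:\r?\n|\r)", r"\1,", text)

print(repr(paired))
```

This relies on the odd rows being purely numeric, which holds for the sample but may not hold for every DXF file.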
You should also match whitespace and line breaks, because the string may contain several spaces and newlines.
Try this regex:
"0\nSECTION\n 2\nHEADER".replace(/([\d]+)([\s\n]+)([^\d\s\n]*)/g,"$1,$3")
var myStr = ` 0
SECTION
2
HEADER`;
var output = myStr.replace(/([\d]+)([\s\n]+)([^\d\s\n]*)/g,"$1,$3");
console.log(output);
DXF file ok
ODD line abc...
(AWK)
NR%2!=0{L1=$0}
NR%2==0{print L1 "," $0;L1=""}
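The same odd/even pairing can be done without a regex at all. Here is a rough Python equivalent of the AWK script, using the example input from the question; like the AWK version, it drops a trailing odd line that has no partner.

```python
# Example input from the question.
text = "0\nSECTION\n2\nHEADER\n"

lines = text.splitlines()

# Pair line 1 with line 2, line 3 with line 4, and so on.
pairs = [f"{odd},{even}" for odd, even in zip(lines[0::2], lines[1::2])]

output = "\n".join(pairs)
print(output)
```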

RegExp checking for sign only if there is text afterwards

I have some cases, which I need to filter with a regex. The values which need to be filtered are listed below:
// These should be catched
123456_Test.pdf
123456 Test.pdf
123456.pdf
// These shouldn't be catched
123456Abcasd.pdf
123456-Abcasd.pdf
123456_.pdf
The current regEx looks like this:
(\d{6,7})((\_| ){0,1})(.*)\..*
The problem here is that the latter 3 are also matched. To give you a short overview of what's wrong with the "wrongly" matched strings:
The 1st capture group has to consist of 6-7 digits. (Also, the capture group is needed in the end.) If there are letters after these numbers, there has to be a whitespace or underscore between them. The 1st example of the "shouldn't be catched" list shows this. The entry is invalid, since there are letters after 123456 without the needed sign.
The last entry isn't really important, it's just there for convenience.
What am I missing? How do I adjust my regex so that it checks for the sign only if there are letters following the digits?
You may use
^(\d{6,7})([_ ][A-Za-z].*)?\..*$
Details
^ - start of a string
(\d{6,7}) - Group 1: 6 or 7 digits
([_ ][A-Za-z].*)? - an optional capturing group #2: a _ or space followed by a letter and then any 0+ chars, as many as possible, up to the last . on the line
\. - a literal .
.* - the rest of the line
$ - end of string.
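To see how this pattern behaves on the six file names from the question, here is a short Python sketch. The names `PATTERN`, `accepted` and `ids` are only illustrative.

```python
import re

PATTERN = re.compile(r"^(\d{6,7})([_ ][A-Za-z].*)?\..*$")

names = [
    "123456_Test.pdf",
    "123456 Test.pdf",
    "123456.pdf",
    "123456Abcasd.pdf",
    "123456-Abcasd.pdf",
    "123456_.pdf",
]

# Keep the names the pattern accepts and pull out capture group 1 (the digits).
accepted = [name for name in names if PATTERN.match(name)]
ids = [PATTERN.match(name).group(1) for name in accepted]

print(accepted)
print(ids)
```

Only the first three names are accepted, and group 1 holds the 6-7 digit number for each of them.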
Check if this perl solution works for you.
> cat regex_catch.dat
123456_Test.pdf
123456 Test.pdf
123456.pdf
123456Abcasd.pdf
123456-Abcasd.pdf
123456_.pdf
> perl -ne ' print if m/\d+(([ _])[a-zA-Z]+| [a-zA-Z]*)?\.pdf/ ' regex_catch.dat
123456_Test.pdf
123456 Test.pdf
123456.pdf
>

Notepad++ How to remove a lines containing 3 same characters in order

lemme show an example. My file looks like this:
AaaAab
AacAaa
AacAap
AaaBbb
I would like to delete all the lines which contain 3 identical characters in the first or second group of 3 chars. That means only AacAap should remain from the example above.
You can use something like:
^(?:(.)\1\1.*|.{3}(.)\2\2.*)$
Put that in the "Find what" field, and put an empty string in the "Replace with" field.
Ctrl+H
Find what: ^(?:(.)\1\1|...(.)\2\2).*\R
Replace with: LEAVE EMPTY
UNcheck Match case
check Wrap around
check Regular expression
DO NOT CHECK . matches newline
Replace all
Explanation:
^ : beginning of line
(?: : start non capture group
(.) : group 1, any character but newline
\1\1 : same as group 1, twice
| : OR
... : any 3 characters
(.) : group 2, any character but newline
\2\2 : same as group 2, twice
) : end group
.* : 0 or more of any character
\R : any kind of linebreak
Result for given example:
AacAap
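The same removal can be tried in Python as a sketch. \R is not available in Python's re module, so an optional \n is used instead, and `re.IGNORECASE` stands in for the unchecked "Match case" box (without it, Aaa would not count as three identical characters).

```python
import re

# Example lines from the question, with a trailing newline assumed.
text = "AaaAab\nAacAaa\nAacAap\nAaaBbb\n"

# A line is removed when chars 1-3 or chars 4-6 are the same character,
# compared case-insensitively.
TRIPLE = re.compile(
    r"^(?:(.)\1\1|...(.)\2\2).*\n?",
    re.MULTILINE | re.IGNORECASE,
)

remaining = TRIPLE.sub("", text)
print(remaining)
```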
You can use this pattern:
^(?:...)?(.)\1\1.*\r?\n?
The part (.)\1\1 matches three consecutive same characters with a capture and two backreferences. (?:...)? makes the three first characters optional, this way the consecutive characters can be at the beginning of the line or at the 4th position.
.*\r?\n? is only here to match all remaining characters of the line including the line break (you can preserve line breaks if you want, you only have to remove \r?\n?).
Try this regex: (?im)^(?:...)?(.)\1\1.*(?:\R|\z)

Regex for finding string with more than 1 space but does not end with newline

Regex gurus, please help me here:
I need a regex that finds more than one space which does not end in a newline. For example consider the text below:
Column 1 Column 2 Column 3 Column 4
Column 1 Column 2 Column 3 Column 4
Regex should match the spaces between Column X, where X = 1, 2 or 3, but should not match the space after Column 4. Regex should also not match the single space between the word "Column" and its respective number.
I have tried \s+^(\n) but it is not working
You can use this regex:
/ {2,}(?! *$)/gm
Explanation:
 {2,} - matches 2 or more spaces (the pattern starts with a literal space character)
(?! *$) - is a negative lookahead that prevents a match when only 0 or more spaces remain before the end of the line
m flag makes the anchor $ (used above in the lookahead) match at the end of every line, not just at the end of input
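A quick way to check this in Python is to replace every matched gap with a visible marker. The sample text and the | marker below are assumptions, since the exact spacing of the original lines is not visible in the question.

```python
import re

# Assumed sample: gaps of 2-3 spaces between columns, trailing spaces on line 1.
text = "Column 1  Column 2   Column 3  Column 4  \nColumn 1  Column 2"

# Two or more spaces, unless only spaces remain before the end of the line.
GAP = re.compile(r" {2,}(?! *$)", re.MULTILINE)

marked = GAP.sub("|", text)
print(marked)
```

The single space inside each "Column N" and the trailing spaces after Column 4 are left alone; only the gaps between columns are replaced.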
This works:
\b(\s(?![\n\r])){2,}\b
It essentially says match:
whole "words", hence the \b word boundary
with two or more spaces \s {2,}
not followed by a return character, by using a negative lookahead (?![\n\r])