I can remove lines that contain no space character with Notepad++:
^[^ ]*$
How can I remove lines that do not contain 2 space characters?
To match a line that does not contain exactly 2 spaces, you could use a negative lookahead asserting that the line does not consist of 2 spaces surrounded by \S*, where \S* matches a non whitespace char zero or more times.
^(?!\S* \S* \S*$).+$
^ Start of string
(?! Negative lookahead, assert what is on the right is not
\S* \S* \S*$ Match 2 spaces between 0+ non whitespace chars \S*
) Close lookahead
.+ Match any char 1+ times except a newline
$ End of string
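The effect of the lookahead can also be checked outside Notepad++. Here is a minimal sketch in Python, assuming the built-in re module treats this pattern the same way (it uses no editor-specific syntax). The sample lines are made up for illustration.

```python
import re

# Same pattern as above; MULTILINE makes ^ and $ work per line.
pattern = re.compile(r"^(?!\S* \S* \S*$).+$", re.MULTILINE)

# Hypothetical sample: lines with 0, 1, 2 and 3 spaces.
text = "a\na b\na b c\na b c d"

# Lines the pattern matches, i.e. the ones that would be removed.
matched = pattern.findall(text)

# Lines that survive: only the one with exactly 2 spaces.
kept = [line for line in text.splitlines() if not pattern.fullmatch(line)]

print(matched)
print(kept)
```

Only the line with exactly 2 spaces is left alone; empty lines are not matched either, because .+ needs at least one character.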
If you want to remove lines consisting of 1 space, or of 3 or more spaces, then maybe
^ {1}$|^ {3,}$
might be OK to look into.
If you wish to simplify/modify/explore the expression, it is explained on the top right panel of regex101.com. If you'd like, you can also watch in this link how it would match against some sample inputs.
I am trying to solve http://play.inginf.units.it/#/level/10
I have some strings as follows:
title={AUTOMATIC ROCKING DEVICE},
author={Diaz, Navarro David and Gines, Rodriguez Noe},
year={2006},
title={The sitting position in neurosurgery: a retrospective analysis of 488 cases},
author={Standefer, Michael and Bay, Janet W and Trusso, Russell},
journal={Neurosurgery},
title={Fuel cells and their applications},
author={Kordesch, Karl and Simader, G{"u}nter and Wiley, John},
volume={117},
I need to match the names in bold. I tried the following regex:
(?<=author={).+(?=})
But it matches the entire string inside {}. I understand why that is so, but how can I break the pattern at and?
It took me a little while to get the samples to show up in your link. What about:
(?:^\s*author={|\G(?!^) and )\K(?:(?! and |},).)+
See an online demo
(?:^\s*author={|\G(?!^) and ) - Either match the start of a line followed by 0+ whitespace chars and the literal 'author={', or assert the position at the end of the previous match (but not at the start of a line) and match ' and ';
\K - Reset starting point of reported match;
(?:(?! and |},).)+ - Match 1+ characters, each of which is not the start of ' and ' or of a '}' followed by a comma.
Above will also match 'others' as per last sample in linked test. If you wish to exclude 'others' then maybe add the option to the negated list as per:
(?:^\s*author={|\G(?!^) and )\K(?:(?! and |},|\bothers\b).)+
See an online demo
In the comment section we established that the above would not work on the linked website. Apparently it's JS-based, which would support zero-width lookbehind. Therefore try:
(?<=\bauthor={(?:(?!\},).*?))\b[A-Z]\S*\b(?:,? [A-Z]\S*\b)*
See the demo
(?<= - Open lookbehind;
\bauthor={ - Match word-boundary and literally 'author={';
(?:(?!\},).*?)) - Open non-capture group to match a negative lookahead for '},' and 0+ (lazy) characters. Close lookbehind;
\b[A-Z]\S*\b - Match anything between two word-boundaries starting with a capital letter A-Z followed by 0+ non-whitespace chars;
(?:,? [A-Z]\S*\b)* - A 2nd non-capture group to keep matching comma/space separated parts of a name.
If a lookbehind assertion is supported and the names consist of word characters, you might use:
(?<=\bauthor={[^{}]*(?:{[^{}]*}[^{}]*)*)[A-Z][^\s,]*,(?:\s+[A-Z][^\s,]*)+\b
Explanation
(?<= Positive lookbehind, assert that to the left of the current position is
\bauthor={ Match author={ preceded by a word boundary
[^{}]*(?:{[^{}]*}[^{}]*)* Match optional chars other than { } or match {...}
) Close the lookbehind
[A-Z] Match an uppercase char A-Z
[^\s,]*, Optionally match non whitespace chars except , and then match ,
(?: Non capture group to repeat as a whole part
\s+[A-Z][^\s,]* Match 1+ whitespace chars, uppercase char A-Z, optional non whitespace chars except ,
)+ Close the non capture group and repeat it 1 or more times
\b a word boundary
See a regex101 demo.
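Engines without \K, \G or variable-length lookbehind (Python's built-in re module is one) cannot run the patterns above as written. Here is a minimal sketch of a different, two-step technique in Python: first capture the contents of author={...}, then split on ' and '. The function name author_names is made up for this illustration, and dropping 'others' follows the note above.

```python
import re

def author_names(line):
    """Return the individual names from an 'author={...},' line."""
    # Step 1: capture everything between 'author={' and the closing '}'
    # at the end of the line (an optional trailing comma is allowed).
    m = re.search(r"\bauthor=\{(.*)\},?\s*$", line)
    if m is None:
        return []
    # Step 2: split on the literal separator and drop 'others'.
    return [name for name in m.group(1).split(" and ") if name != "others"]

print(author_names('author={Kordesch, Karl and Simader, G{"u}nter and Wiley, John},'))
```

This gives up the single-regex constraint of the linked exercise, so it only helps when the matching can be done in code.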
I have strings that look like some text - other text and I need to delete everything before and including the hyphen - and the space after it
But due to typos I might have:
some text -other text or some text- other text or some text-other text or double spaces instead of single spaces
I am using RegEx ^.*\s+\-\s+ and this works for some text - other text with single or multiple spaces before and after the -
But for the other possibilities, where the whitespace is missing, I have used two ORs, so I have ^.*\s+\-\s+|.*\-\s|.*\-
Is there a more concise pattern that does not use multiple ORs for this?
Thank you for any help on this
https://regex101.com/r/TNU7i6/1
Instead of using an alternation with 3 patterns, you might use a pattern to match all except the -, then match the - and optional whitespace chars.
^[^-]*-\s*
If there should be a non whitespace char following, and a lookahead is supported:
^[^-]*-\s*(?=\S)
^ Start of string
[^-]*- Match 0+ times any char except -, then match -
\s* Match optional whitespace chars
(?=\S) Positive lookahead, assert a non whitespace char to the right
Note that \s and the negated character class [^-] can also match a newline.
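To see the first pattern handle all of the typo variants from the question, here is a minimal sketch in Python. The helper name strip_prefix is made up for this illustration.

```python
import re

def strip_prefix(s):
    # Remove everything up to and including the first hyphen,
    # plus any whitespace that follows it.
    return re.sub(r"^[^-]*-\s*", "", s)

samples = [
    "some text - other text",
    "some text -other text",
    "some text- other text",
    "some text-other text",
    "some text  -  other text",
]
results = [strip_prefix(s) for s in samples]
print(results)
```

A string without any hyphen is returned unchanged, because the pattern then has nothing to match.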
1st solution: With your shown samples, please try the following.
^.*?\s+\S+\s?-\s*(.*)$
OR
^.*?\s+\S+\s*-\s*(.*)$
Online demo for above regex
2nd solution: You could use \K option too to forget matched regex part, in that case try:
^.*?\s+\S+\s?-\s*\K.*$
OR
^.*?\s+\S+\s*-\s*\K.*$
Online demo for above regex
1st solution explanation:
^.*?\s+ ##From starting of value matching till 1st occurrence of space(s).
\S+\s? ##Matching 1 or more non-space occurrences followed by optional space here.
-\s* ##Matching - followed by optional space.
(.*)$ ##Matching everything till last of value.
2nd solution explanation:
^.*?\s+ ##Matching everything till 1st space occurrence(s) from starting of value.
\S+\s? ##Matching non spaces 1 or more occurrences followed by space optional.
-\s*\K ##Matching - followed by spaces(0 or more occurrences) and \K will discard all previous matched values(so that we can match exact values as per output).
.*$ ##Matching everything after previously matched values(which is discarded by \K).
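Python's built-in re module does not support \K, so only the 1st solution carries over directly. Here is a minimal sketch of it, reading the wanted text from capture group 1. The helper name after_hyphen is made up for this illustration.

```python
import re

# 1st solution, \s* variant: the text after the hyphen lands in group 1.
pattern = re.compile(r"^.*?\s+\S+\s*-\s*(.*)$")

def after_hyphen(s):
    m = pattern.match(s)
    return m.group(1) if m else None

for s in ["some text - other text", "some text-other text"]:
    print(after_hyphen(s))
```

Note that this pattern expects whitespace somewhere before the hyphen, so an input like text-other produces no match at all.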
I have fields which contain data in the following possible formats (each line is a different possibility):
AAA - Something Here
AAA - Something Here - D
Something Here
Note that the first group of letters (AAA) can be of varying lengths.
What I am trying to capture is the "Something Here" or "Something Here - D" (if it exists) using PCRE, but I can't get the Regex to work properly for all three cases. I have tried:
- (.*) which works fine for cases 1 and 2 but obviously not 3;
(?<= - )(.*) which also works fine for cases 1 and 2;
(?! - )(.+)| - (.+) works for cases 2 and 3 but not 1.
I feel like I'm on the verge of it but I can't seem to crack it.
Thanks in advance for your help.
Edit: I realized that I was unclear in my requirements. If there is a trailing " - D" (the letter in the data is arbitrary but should only be a single character), that needs to be captured as well.
About the patterns that you tried:
- (.*) This pattern will match the first occurrence of - followed by matching the rest of the line. It will match too much for the second example as the .* will also match the second occurrence of -
(?<= - )(.*) This pattern will match the same as the first example without the - as it asserts that it should occur directly to the left
(?! - )(.+)| - (.+) This pattern uses a negative lookahead which asserts that what is directly to the right is not - . As none of the examples start with - , the whole line will be matched directly after the negative lookahead due to .+ and the second part after the alternation | will not be evaluated
If the first group of letters can be of varying length, you could make the match either specific matching 1 or more uppercase characters [A-Z]+ or 1+ word characters \w+.
To get a more broad match, you could match 1 or more non whitespace characters using \S+
^(?:\S+\h-\h)?\K\S+(?:\h(?!-\h)\S+)*
Explanation
^ Start of string
(?:\S+\h-\h)? Optionally match the first group of non whitespace chars followed by - between horizontal whitespace chars
\K Clear the match buffer (Forget what is currently matched)
\S+ Match 1+ non whitespace characters
(?: Non capture group
\h(?!-\h) Match a horizontal whitespace char and assert what is directly to the right is not - followed by another horizontal whitespace char
\S+ Match 1+ non whitespace chars
)* Close non capture group and repeat 0+ times to match more "words" separated by spaces
Edit
To match an optional hyphen and trailing single character, you could add an optional non capturing group (?:-\h\S\h*)?$ and assert the end of the string if the pattern should match the whole string:
^(?:\S+\h-\h)?\K\S+(?:\h(?!-\h)\S+)*\h*(?:-\h\S\h*)?$
You may use
^(?:.*? - )?\K.*?(?= - | *$)
^(?:.*?\h-\h)?\K.*?(?=\h-\h|\h*$)
See the regex demo
Details
^ - start of string
(?:.*? - )? - an optional non-capturing group matching any 0+ chars other than line break chars, as few as possible, up to the first space, hyphen, space
\K - match reset operator
.*? - any 0+ chars other than line break chars as few as possible
(?= - | *$) - space, hyphen, space or 0+ spaces till the end of string should follow immediately on the right.
Note that \h matches any horizontal whitespace chars.
^(?:[A-Z]+ - \K)?.*\S
Since "Something Here" can be anything, there's no reason to specially describe the eventual last letter in the pattern. You don't need something more complicated.
With this pattern I assume that you are not interested in the trailing spaces, that's why I ended it with \S. If you want to keep them, remove the \S and change the previous quantifier to +.
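The patterns in these answers rely on \K (and some on \h), which Python's built-in re module does not support. As a rough sketch of the same idea outside PCRE, the last pattern can be rewritten with a capture group instead of \K. The helper name payload is made up for this illustration.

```python
import re

# The optional "AAA - " prefix is matched but not captured;
# group 1 holds the rest, up to the last non-whitespace char.
pattern = re.compile(r"^(?:[A-Z]+ - )?(.*\S)")

def payload(field):
    m = pattern.match(field)
    return m.group(1) if m else None

for field in ["AAA - Something Here", "AAA - Something Here - D", "Something Here"]:
    print(payload(field))
```

As in the answer above, a trailing " - D" is kept simply because .* runs to the end of the line.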
I need to come up with a regular expression with flavor PCRE. It must be a regular expression <
I want to grab all lines of text that end in a newline character up until I encounter <zz> where zz is a digit enclosed in '<' and '>'.
e.g.
111a z
222 aset
333 //+
12 <zz> 11
abc
def
It would need to capture "111a z", "222 aset", "333 //+" in this case [and nothing else].
Right now I have ^(?!.*<zz>)[^\n]+(?=\n) but it's pretty far off from what it needs to be.
For clarification purposes, the regex I was using shows <zz>, but definitely looking for a digit enclosed in angle brackets.
Would really appreciate some help.
Edit
This is /really/ difficult for me, because at least one of the answers looks like it does the job. I'll try to mark one... Thank you, everyone.
You could repeat matching all lines including a Unicode newline sequence while the <\d+> pattern does not occur in the line.
\A(?:(?!.*<\d+>).*\R)+
Explanation
\A Start of string
(?: Non capture group
(?!.*<\d+>) Negative lookahead, assert that the pattern <\d+> does not occur
.*\R Match 0+ times any char except a newline, followed by a Unicode newline sequence
)+ Close the non capturing group, and repeat it 1+ times to match at least a single line
If the <\d+> has to be present, you could assert that with a positive lookahead at the end
\A(?:(?!.*<\d+>).*\R)+(?=.*<\d+>)
I'm not sure why you're using a negative lookahead, but I think you want a positive lookahead. This lets you only match the line if you see the <zz> in a lookahead. I would solve the problem using something like this:
^.*(?=.*(?:\n.*)*<\d+>)\n
^ Anchors match to beginning of line (like yours)
.* Matches all the characters it can. In this case it matches the whole line because it has to satisfy the \n at the end.
(?=...) Performs a positive lookahead (makes sure the string exists somewhere ahead)
.*(?:\n.*)* Allows any number of characters on any number of lines
<\d+> Only matches one or more digits enclosed in angle brackets
\n ensures that there is a newline at the end of the line.
I have assumed that the text may have more than one line that contains one or more digits bracketed in '<' and '>', and that those lines are not themselves to be matched.
You can use the following expression to match the lines of interest.
^(?!.*<\d+>).*\r?\n(?=[\s\S]*?<\d+>)
The regex engine performs the following operations.
^ match beginning of line
(?! begin negative lookahead (prevent matching line with '<12>')
.* match 0+ characters other than newlines
<\d+> match '<', 1+ digits, '>'
) end negative lookahead
.* match 0+ characters other than newlines
\r?\n match newline optionally preceded by '\r'
(?= begin positive lookahead
[\s\S]*? match 0+ characters (incl. newlines), non-greedily
<\d+> match '<', 1+ digits, '>'
) end positive lookahead
'\r', a carriage return, will be present if the file was produced when using the Windows operating system.
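This last expression uses nothing PCRE-specific, so it can be tried in other engines too. Here is a minimal sketch in Python, where '<7>' is an assumed stand-in for the question's <zz> placeholder.

```python
import re

# MULTILINE lets ^ match at the start of every line.
pattern = re.compile(r"^(?!.*<\d+>).*\r?\n(?=[\s\S]*?<\d+>)", re.MULTILINE)

# The question's sample, with <zz> replaced by an assumed digit.
text = "111a z\n222 aset\n333 //+\n12 <7> 11\nabc\ndef\n"

# Each match is one full line including its newline; strip it off.
lines = [m.group().rstrip("\r\n") for m in pattern.finditer(text)]
print(lines)
```

The lines after the '<7>' line are skipped because the positive lookahead no longer finds a bracketed number ahead of them.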
I have the following template :
1251 Left Random Text I want to fill
It can go through multiple lines
As you can see
9841 Right Again we see a lot of random text with 3115 numbers
And this also goes
To multiple lines
0121 Right
5151 Right This one is just one line
I was wrong
9731 Left This one is just a line
5123 NA Instruction 5151 was wrong
4113 Right Instr 9841 was correct
We checked
I want to have 3 groups:
1251
Left
Random Text I want to fill
It can go through multiple lines
As you can see
I'm using
(\d+)\s(\w+)\s(.*)
but it stops at the current line only (so I get only Random Text I want to fill in group 3, although I want everything up to and including As you can see).
If I use the single-line flag, I get only 1 match, with group 3 containing almost everything.
Here it is live: https://regex101.com/r/W3x0mH/4
You could use a repeating group matching all the following lines while asserting that the next line does not start with a digit, which would indicate the start of a new record:
(\d+)\s(\w+)\s(.*(?:\r?\n(?!\d).*)*)
Explanation
(\d+)\s(\w+)\s Match the first 2 groups
( Third capturing group
.* Match 0+ times any char except a newline
(?: Non capturing group
\r?\n(?!\d).* Match newline, assert what is on the right is not a digit
)* Close non capturing group and repeat 0+ times
) Close capturing group
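This pattern needs no flags and no PCRE-only syntax, so here is a minimal sketch of it in Python on the first two records of the sample (the last line has no trailing newline).

```python
import re

pattern = re.compile(r"(\d+)\s(\w+)\s(.*(?:\r?\n(?!\d).*)*)")

text = (
    "1251 Left Random Text I want to fill\n"
    "It can go through multiple lines\n"
    "As you can see\n"
    "9841 Right Again we see a lot of random text with 3115 numbers\n"
    "And this also goes\n"
    "To multiple lines"
)

# One (number, side, text) tuple per record.
records = pattern.findall(text)
for number, side, body in records:
    print(number, side, repr(body))
```

One caveat: because \s also matches a newline, a record with nothing after the second field (such as 0121 Right in the sample) takes the following line as its third group.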
You may use this regex with a lookahead:
^(\d+)\s(\w+)\s(.*?)(?=\n\d|\z)
with DOTALL and MULTILINE modifiers.
RegEx Details:
^: Line start
(\d+): Match and capture 1+ digits in group #1
\s: match a whitespace
(\w+): Match and capture 1+ word characters in group #2
\s: match a whitespace
(.*?): Match 0 or more of any character (non-greedy) provided next lookahead assertion is satisfied
(?=\n\d|\z): Lookahead assertion to assert that we have a newline followed by a digit or there is end of input
Faster Regex:
If you are using this regex on a long string then you should also keep overall performance in mind, as a regex with the DOTALL modifier will tend to get slow on large texts. For that I suggest using this regex, which doesn't need the DOTALL modifier:
^(\d+)\s(\w+)\s(.*(?:\n.*)*?)(?=\n\d|\z)
On regex101 demo this regex takes just 181 steps as compared to first one that takes 1300 steps.
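To check that the two variants capture the same groups, here is a minimal sketch in Python. It uses \Z, which in Python's re module matches only at the very end of the input, in place of PCRE's \z. The step counts quoted above come from regex101 and are not reproduced here.

```python
import re

text = (
    "1251 Left Random Text I want to fill\n"
    "It can go through multiple lines\n"
    "As you can see\n"
    "9841 Right Again we see a lot of random text with 3115 numbers\n"
    "And this also goes\n"
    "To multiple lines"
)

# DOTALL version: the lazy dot advances one character at a time.
dotall = re.compile(r"^(\d+)\s(\w+)\s(.*?)(?=\n\d|\Z)", re.DOTALL | re.MULTILINE)

# Unrolled version: consumes a whole line per step, no DOTALL needed.
unrolled = re.compile(r"^(\d+)\s(\w+)\s(.*(?:\n.*)*?)(?=\n\d|\Z)", re.MULTILINE)

print(dotall.findall(text) == unrolled.findall(text))
```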
For the third group, repeat any character while using negative lookahead for ^\d, which would indicate the start of a new match:
(\d+)\s(\w+)\s((?:(?!^\d)[\s\S])*)
https://regex101.com/r/W3x0mH/5
You may try with this regex:
^(\d+)\s+(\w+)\s+(.*?)(?=^\d|\z)
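^(\d+)\s+ The line begins with one or more digits \d+ followed by one or more whitespace characters \s+
(\w+)\s+ One or more word characters \w+ (Left, Right, NA or something else) followed by one or more whitespace characters \s+
(.*?) Matches everything until it finds a line beginning with a digit or \z, the end of the string. This assumes the multiline and single-line flags are enabled, so that ^ matches at line starts and . matches newlines.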
I think it fits your requirement....