Regex non greedy match with tab character


I do not understand why, given the following line (shown in vi with set list on):
10.0.6.5^IVirtual^IVmware^IHTTP, MS SQL, Windows SVC^IcpHelpdesk $
the following regex:
^I.*?^I
does not match the string 'Virtual'. I am using it in a vi search and replace:
:%s/^I.*?^I/replace/g
This returns no match. However, if I use the following on the same string:
^I.*^I
I get
10.0.6.5replacecpHelpdesk $
What I am attempting to say with ^I.*?^I is: from the first tab character (^I), match anything (the dot matches any character except line breaks) zero or more times, as few as possible ( *? ), until you reach the next token, which is the tab character (^I).
I don't see what I am missing, and any help would be appreciated. Thank you.

Are you talking about vim regex here? In that case the non-greedy quantifier is \{-}:
\t.\{-}\t
Otherwise, you can get the same effect by excluding tab characters with a negated character class:
\t[^\t]*\t
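
As a rough illustration outside vim, the three patterns can be compared in Python, whose re module (unlike vim) accepts .*? as a lazy quantifier. The sample line is the one from the question, with ^I written as real tab characters; the variable names are made up for this sketch.

```python
import re

# The line from the question, with ^I written as real tab characters.
line = "10.0.6.5\tVirtual\tVmware\tHTTP, MS SQL, Windows SVC\tcpHelpdesk "

# Lazy quantifier: stops at the first tab that follows.
lazy = re.search(r"\t.*?\t", line).group()

# Negated character class: the middle part cannot contain a tab at all.
negated = re.search(r"\t[^\t]*\t", line).group()

# Greedy quantifier: runs on to the last tab on the line.
greedy = re.search(r"\t.*\t", line).group()

print(repr(lazy))
print(repr(negated))
print(repr(greedy))
```

The lazy and negated-class versions both stop after Virtual, while the greedy version swallows everything up to the last tab, which is the behaviour described in the question.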

Related

Find lines without specified string and remove empty lines too

So, I know from this question how to find all the lines that don't contain a specific string. But it leaves a lot of empty newlines when I use it, for example, in a text editor substitution (Notepad++, Sublime, etc).
Is there a way to also remove the empty lines left behind by the substitution in the same regex or, as it's mentioned on the accepted answer, "this is not something regex ... should do"?
Example, based on the example from that question:
Input:
aahoho
bbhihi
cchaha
sshede
ddhudu
wwhada
hede
eehidi
Desired output:
sshede
hede
[edit-1]
Let's try this again: what I want is a way to use a regex replace in the text editor to remove every line that does not contain hede. If I try .*hede.* it finds all the lines containing hede,
but it does not remove the others. On a short file, this is easy to do manually, but the idea here is to run the replace on a larger file with 1000+ lines, of which only 20-50 contain the desired string.
If I use ^((?!hede).)*$ and replace it with nothing, I end up with empty lines.
I thought it was a simple question for people with a better understanding of regex than me: can a single regex replace also remove the empty lines left behind?

An alternative try
Find what: ^(?!.*hede).*\s?
Replace with: nothing
Explanation:
^ # start of a line
(?!) # a Negative Lookahead
. # matches any character (except for line terminators)
* # matches the previous token between zero and unlimited times,
hede # matches the characters hede literally
\s # matches any whitespace character (equivalent to [\r\n\t\f\v ])
? # matches the previous token between zero and one times,

Using Notepad++.
Ctrl+H
Find what: ^((?!hede).)*(?:\R|\z)
Replace with: LEAVE EMPTY
CHECK Match case
CHECK Wrap around
CHECK Regular expression
UNCHECK . matches newline
Replace all
Explanation:
^ # beginning of line
((?!hede).)* # tempered greedy token, ensures the line does not contain hede
(?:\R|\z) # non capture group, any kind of line break OR end of file
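
For comparison, here is a minimal sketch of the same idea in Python. It is not the Notepad++ expression itself: Python's re module does not support \R, so this version assumes plain \n line endings and uses an optional \n to consume the line break. The sample text is the input from the question.

```python
import re

# Input from the question, assuming LF line endings.
text = "aahoho\nbbhihi\ncchaha\nsshede\nddhudu\nwwhada\nhede\neehidi\n"

# (?m) lets ^ match at the start of every line.
# (?!.*hede) rejects lines that contain hede anywhere.
# .*\n? matches the rest of the line and its line break, if there is one,
# so the substitution leaves no empty line behind.
kept = re.sub(r"(?m)^(?!.*hede).*\n?", "", text)

print(kept)
```

Consuming the line break as part of the match is what avoids the empty lines that ^((?!hede).)*$ leaves behind.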

Have you tried:
.*hede.*
I don't know why you are doing an inverse search for this.
You can use sed like:
sed -e '/.*hede.*/!d' input.txt
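
If a script is an option, the same "keep only matching lines" filter can be sketched in Python without any regex. The function name and the sample string below are made up for illustration.

```python
def keep_matching_lines(text, needle="hede"):
    """Return only the lines of text that contain needle, line breaks included."""
    return "".join(
        line for line in text.splitlines(keepends=True) if needle in line
    )


sample = "aahoho\nsshede\nddhudu\nhede\neehidi\n"
print(keep_matching_lines(sample))
```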

select everything that does not match pattern

I'm trying to write a regex which gets everything but a specified pattern. I've been trying to use negative lookahead but whenever testing my expression, it never works.
I have files that are of this form:
(garbage info) filename (other garbage).extension
or
[garbage info] filename [other garbage].extension
For example, one of the files is [O2CXDR] report january [77012].pdf or
(XEW7CK) sales commissions (99723).xls
I'm using the regex.h library in C so I believe that it is a POSIX library.
I'm hoping to extract "filename" and ".extension" so that I can write a script which will rename the files to filename.extension
So far, I have an expression that selects the garbage info with the brackets and the spaces around it, but I'm unable to select the rest.
\s*(\[|\().*?(\]|\))+\s*
and the negative lookahead I tried was:
.*(?!(\s*(\[|\().*?(\]|\))+\s*)).*
but it's just selecting everything in a single match.
I'm sure that I'm not understanding the lookaheads and lookbehind correctly. What do I have to do to fix my expression? Could somebody explain how they work since I'm a bit lost. Thanks!

Since you haven't specified a regex engine, I'll target engines that support the tokens \K, \G, and \A (such as PCRE). Note that POSIX regex.h supports none of these, nor lookaheads or lazy quantifiers.
The following uses a combination of a match reset (\K), a tempered greedy token, and start of match but not start of string (\G(?!\A)), further explained below:
Note: discard empty matches
\s*[[(].*?[])]\s*\K|\G(?!\A)(?:(?!\s*[[(].*?[])]\s*).)+
Match one of the following:
Option 1:
\s* Match any whitespace any number of times
[[(] Match either [ or (
.*? Match any character any number of times, but as few as possible (lazy matching)
[])] Match either ] or )
\s* Match any whitespace any number of times
\K Reset match - sets the given position in the regex as the new start of the match. This means that nothing preceding this token will be included in the overall match.
Option 2:
\G(?!\A) Match only at the starting point of the search or position of the previous successful match end, but not at the start of the string.
(?:(?!\s*[[(].*?[])]\s*).)+ Tempered greedy token matching any character one or more times, as long as the negative lookahead pattern (which is the same as the first option) does not match at that position.
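
Where \K and \G are not available, one alternative is to turn the problem around and delete the garbage instead of matching what is left. The sketch below does that in Python with a simplified version of the first option; it is a different technique from the pattern above, and the names GARBAGE and clean_name are hypothetical.

```python
import re

# A bracketed or parenthesized block plus any whitespace around it.
# [^\])]* keeps the match inside a single pair of delimiters.
GARBAGE = re.compile(r"\s*[\[(][^\])]*[\])]\s*")


def clean_name(name):
    """Remove every garbage block from a file name."""
    return GARBAGE.sub("", name)


print(clean_name("[O2CXDR] report january [77012].pdf"))
print(clean_name("(XEW7CK) sales commissions (99723).xls"))
```

This assumes the file name proper never contains brackets or parentheses of its own.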

$ cat input_file
(garbage info) filename (other garbage).extension
(garbage info)filename(other garbage).extension
(garbage info)file name(other garbage).extension
[garbage info] filename [other garbage].extension
[garbage info]filename[other garbage].extension
[garbage info]file name[other garbage].extension
$ sed -re 's/^\s*(\([^\)]*\)|\[[^]]*\])\s*(.*\S)\s*(\([^\)]*\)|\[[^]]*\])(\..*)$/\2\4/' input_file
filename.extension
filename.extension
file name.extension
filename.extension
filename.extension
file name.extension

Maybe something as simple as
^(?:\(([^)]*)\)\s*([^(\r\n]*?)\s*\(([^)]*)\)|\[([^\]]*)\]\s*([^(\r\n]*?)\s*\[([^\]]*)\])\.(.*)$
could extract those values.
If you don't need all of those capturing groups, simply remove the ones you don't want:
^(?:\([^)]*\)\s*([^(\r\n]*?)\s*\([^)]*\)|\[[^\]]*\]\s*([^(\r\n]*?)\s*\[[^\]]*\])\.(.*)$
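
As a quick check, the reduced pattern can be tried in Python, which accepts this syntax unchanged. The helper name extract is made up for this sketch; it joins whichever name group matched with the extension group.

```python
import re

# The reduced pattern from above, split over two string literals for length.
PATTERN = re.compile(
    r"^(?:\([^)]*\)\s*([^(\r\n]*?)\s*\([^)]*\)"
    r"|\[[^\]]*\]\s*([^(\r\n]*?)\s*\[[^\]]*\])\.(.*)$"
)


def extract(name):
    """Return 'filename.extension', or None if the name does not fit the form."""
    m = PATTERN.match(name)
    if m is None:
        return None
    # Group 1 is set for the (...) form, group 2 for the [...] form.
    stem = m.group(1) if m.group(1) is not None else m.group(2)
    return f"{stem}.{m.group(3)}"


print(extract("[O2CXDR] report january [77012].pdf"))
print(extract("(XEW7CK) sales commissions (99723).xls"))
```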

How would you replace only a single character in the middle of text with duplicates?

How would you use a regex in Notepad++ to replace only the first occurrence of a character on every line, leaving any later occurrences on the same line untouched?
test1:_|TEST:-TEST.|
test2:_|TEST:-TEST.|
test3:_|TEST:-TEST.|
As shown in the sample, each line contains two colons. I'm trying to replace the first colon on each line with a ; and NOT the second one. The result should be:
test1;_|TEST:-TEST.|
test2;_|TEST:-TEST.|
test3;_|TEST:-TEST.|

Ctrl+H
Find what: ^.+?\K:
Replace with: ;
CHECK Wrap around
CHECK Regular expression
UNCHECK . matches newline
Replace all
Explanation:
^ # beginning of line
.+? # 1 or more of any character except newline, non-greedy
\K # forget all we have seen until this position
: # colon
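
The same replacement can be sketched in Python. Python's built-in re module has no \K, so this version captures the text before the first colon and puts it back in the replacement; the pattern is an adaptation, not the Notepad++ expression itself.

```python
import re

text = "test1:_|TEST:-TEST.|\ntest2:_|TEST:-TEST.|\ntest3:_|TEST:-TEST.|"

# (?m) lets ^ match at every line start.
# ([^:\n]*) captures everything before the first colon on the line,
# and \1 restores it so only that colon is replaced.
result = re.sub(r"(?m)^([^:\n]*):", r"\1;", text)

print(result)
```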

I'm guessing that maybe this expression,
(\w+)\s*(?::)(\s*_\s*\|\s*\w+\s*:\s*-\w+\.\|)
with a replacement of $1;$2, might work.
Or, with fewer constraints, this expression:
([^:]+):(.*)
with the same replacement.

It's done like this
Find (?m)^[^:\r\n]*\K:
Replace ;
https://regex101.com/r/rT1vG9/1

Regex Multiline Capitalize all first letter of words and remove space notepad ++

I made this working regex demo: https://regex101.com/r/WSwEbY/6
When I use it in Notepad++, it doesn't work with multiple lines:
hello ladies how are you Today
hello ladies how are you Today
-> result is on a single line:
helloLadiesHowAreYouTodayHelloLadiesHowAreYouToday
Information:
search: [^\w]+(\w)
replace with: \U$1
Notepad++ version: 7.5.8
I also tried checking 'multiline' and adding '$' to the end of the search.

Here, you tried to match everything that is not a word character:
[^\w]
However, the new line character \n is also not a word character so it will also be matched by [^\w] and replaced.
You should exclude \n from the character class as well:
[^\w\n]+(\w)

How about matching just the space or the start(^) with multiline flag?
(?:^| +)(\w)
sub:
\U$1

In addition to not matching newlines in the repeated character set, you should also alternate with a check for if you're at the start of a line - that way the first word on a line will be capitalized as well. Use the m flag so that ^ matches the start of a line:
(?:^|[^\w\n]+)(\w)
Replace with:
\U$1
Output:
HelloLadiesHowAreYouToday
IAmFineThankYou
https://regex101.com/r/dsOcOD/1
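
Here is a small sketch of the same substitution in Python. Python's re module does not understand \U in the replacement string, so a function does the uppercasing instead; the input is the two-line sample from the question.

```python
import re

text = "hello ladies how are you Today\nhello ladies how are you Today"

# (?m) lets ^ match at every line start.
# (?:^|[^\w\n]+) matches either a line start or a run of non-word
# characters that are not newlines, so line breaks are preserved.
# (\w) is the first letter of the word, uppercased by the lambda.
result = re.sub(
    r"(?m)(?:^|[^\w\n]+)(\w)",
    lambda m: m.group(1).upper(),
    text,
)

print(result)
```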

Add to end of line that contains a specific word and starts with x

I would like to add some custom text to the end of all lines in my document opened in Notepad++ that start with 10 and contain a specific word (for example "frog").
So far, I managed to solve the first part.
Search: ^(10)$
Replace: \1;Batteries (to add ;Batteries to the end of the line)
What I need now is to edit this regex pattern to recognize only those lines that also contain a specific word.
For example:
Before: 1050;There is this frog in the lake
After: 1050;There is this frog in the lake;Batteries

You can use this regex to match the lines you want:
(^(10).*?(frog).*)
The .*? is a lazy quantifier that matches as little as possible before frog.
Replace with:
$1;Batteries

You should allow any characters between the number and the end of line:
^10.*frog.*
And replacement will be $0;Batteries. You do not even need a $ anchor as .* matches till the end of a line since . matches any character but a line break char.
NOTE: There is no need to wrap the whole pattern with capturing parentheses, the $0 placeholder refers to the whole match value.
More details:
^ - start of a line
10 - a literal 10 text
.* - zero or more chars other than line break chars as many as possible
frog - a literal string
.* - zero or more chars other than line break chars as many as possible
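
The same pattern can be checked quickly in Python, where the whole match is written \g<0> in the replacement rather than $0. The second and third sample lines are made up to show lines that should stay unchanged.

```python
import re

text = (
    "1050;There is this frog in the lake\n"
    "2050;Another frog, but the line starts with 20\n"
    "1060;No amphibians here"
)

# (?m) lets ^ match at every line start; . does not match a line break,
# so each match stays within a single line.
result = re.sub(r"(?m)^10.*frog.*", r"\g<0>;Batteries", text)

print(result)
```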

try this
find with: (^(10).*(frog).*)
replace with: $1;Batteries

Use ^(10.*frog.*)$ as regex. Replace it with something like $1;Batteries
