To match multiple conditions using regex

I need help creating a regex filter that is applied to each line. The requirements are:
Any line including "Country=IN": ALLOW
Any line including "Country=" followed by anything other than "IN": REJECT
Any line not including "Country": ALLOW
The word "Country" may appear anywhere in the line, or may not appear in the line at all.
Can someone please help?

Assuming you're looking to match anything with the exception of lines containing "Country=not IN" (and assuming the extra quotes are a typo):
/^((?!Country=not IN).)*$/
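If the three rules in the question are instead read literally (reject a line only when "Country=" is followed by something other than "IN"), a nested negative lookahead is one way to express them. The sketch below uses Python's re module. The pattern and the helper name is_allowed are illustrative assumptions, not part of the answer above, and the sketch assumes that a value such as "INDIA" should count as different from "IN".

```python
import re

# Assumed reading of the rules: a line is rejected only if it contains
# "Country=" followed by a value other than the whole word "IN".
ALLOW_PATTERN = re.compile(r'^(?!.*Country=(?!IN\b)).*$')


def is_allowed(line: str) -> bool:
    """Return True if the line passes the filter (hypothetical helper)."""
    return ALLOW_PATTERN.match(line) is not None


for sample in ("id=7 Country=IN status=ok",
               "id=7 Country=US status=ok",
               "id=7 status=ok"):
    print(sample, "->", "ALLOW" if is_allowed(sample) else "REJECT")
```

The outer lookahead scans the whole line for any "Country=" that is not immediately followed by "IN" as a complete word, so lines without "Country" pass untouched.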

Related

Regex: Find word that does not have a different target anywhere ahead of it in the file?

How do I find a word that does not have a specific target text anywhere ahead of it in a file?
Let's say I want to find "setting3" not preceded by a right square bracket "]" (which denotes a header). The following file would fail that test, due to [header]:
[header] #can be named anything
setting1=True
setting2=various setting values, can include any type of text
setting3=1
But a file with an orphaned setting should be a match:
setting3=1
Lookbehinds won’t work, because I may have arbitrary settings in between the header and the text I'm looking for. Because my terms span multiple lines, it makes it trickier.
For context, this is to set a rule with a tool that only offers one regex line (Ansible, which I think uses Python's regex engine). I don't believe I have access to special settings (global, etc.)
You might be able to use the following regular expression:
^[^[]*setting3=1
It matches from the start of the file up to the setting you're looking for, but only through characters that aren't [. This guarantees that the setting is matched only if it isn't preceded by a header, since headers contain [.
Note that this could miss some settings that should be matched, in particular if comments preceding the setting contain a [ character.
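As a rough check of this behaviour, here is a small Python sketch. It assumes the regex is applied to the whole file contents as a single string, which may not be how a given tool applies it. The function name and sample texts are made up for illustration.

```python
import re

# The "[" is escaped inside the class only to avoid ambiguity; it means the
# same thing as the [^[] form shown above.
ORPHAN_SETTING = re.compile(r'^[^\[]*setting3=1')


def has_orphan_setting(file_text: str) -> bool:
    # Without re.MULTILINE, "^" anchors only at the start of the whole text,
    # and the negated class also matches newlines.
    return ORPHAN_SETTING.search(file_text) is not None


with_header = "[header]\nsetting1=True\nsetting3=1\n"
orphaned = "setting1=True\nsetting3=1\n"
bracket_in_comment = "# see [docs]\nsetting3=1\n"

print(has_orphan_setting(with_header))         # header present, no match
print(has_orphan_setting(orphaned))            # orphaned setting, match
print(has_orphan_setting(bracket_in_comment))  # the caveat: "[" in a comment blocks the match
```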

Deleting lines with specific words in multiple files in Notepad++

I'm trying to remove a specific line from many files I'm working on with Notepad++.
Upon searching, I found Notepad++ Remove line with specific word in multiple files within a directory but somehow the regex provided (^.*(?:YOURSTRINGHERE).*\r\n$) from the answers doesn't work for me (screenshot: https://cdn.discordapp.com/attachments/311547963883388938/407737068475908096/unknown.png).
I read on some other questions/answers that certain regex doesn't work in newer/older Notepad++ versions. I was using Notepad++ 5.x.x then updated to the latest 7.5.4, but neither worked with the regex provided in the question above.
At the moment I can work around it by replacing that line with nothing, twice (because there are only 2 variants that I need to remove from those files) but that leaves an empty line at the end of the files. So I have to do another step further to remove that empty line.
I'm hoping someone can offer help that allows me to remove that line and leave no empty line/space behind.
The regex you attempt to use will only match your line, if it is followed by an empty line and Windows linebreaks (CR LF) are used. This is due to \r\n$ which matches a linebreak sequence followed by the end of the line.
Instead you might want to use
^.*(?:YOURSTRINGHERE).*\R?
to match the line containing your string, optionally followed by a line break sequence, so that the line is removed instead of emptied out. This still leaves a trailing newline if your word is contained in the last line of a file. You can use
(\R)?.*(?:YOURSTRINGHERE).*(?(1)|\R)
to avoid this. It uses a conditional to match either the preceding line break or, if there is none, the following one.
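The conditional can also be tried outside Notepad++. The sketch below is a Python approximation, not the Notepad++ pattern itself: Python's re module has no \R, so the line break is spelled \r?\n, and [^\r\n]* replaces .* because Python's dot would also match \r. "DEBUG" is a hypothetical stand-in for YOURSTRINGHERE.

```python
import re

# Group 1 captures a preceding line break if there is one. The conditional
# (?(1)|\r?\n) requires a following line break only when group 1 did not match.
LINE_WITH_WORD = re.compile(r'(\r?\n)?[^\r\n]*(?:DEBUG)[^\r\n]*(?(1)|\r?\n)')


def remove_lines(text: str) -> str:
    """Delete every line containing the word, together with one line break."""
    return LINE_WITH_WORD.sub('', text)


print(repr(remove_lines("a\nDEBUG x\nb")))  # word in a middle line
print(repr(remove_lines("a\nb DEBUG")))     # word in the last line
print(repr(remove_lines("DEBUG\na\nb")))    # word in the first line
```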

searching for text that contain and not contain text and symbol

This regex works correctly for finding lines that begin with an exclamation mark and do not contain a colon (:):
^!([^:\n]*)$
In addition, I need it to match only lines that contain the word "spelling". I tried the pattern below, but it does not work:
^!([^:\n]spelling*)$
You could do this:
^![^:\n]*spelling[^:\n]*$
If you are looping through a file line by line, as is typical, there is no need to exclude the newlines from the match:
^![^:]*spelling[^:]*$
Another option to consider when you have complex requirements is breaking the match down into multiple steps. This makes for simpler, easier-to-understand code that is less error-prone:
if (/^!/ and /spelling/ and not /:/)
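For comparison, here is a minimal Python sketch of both approaches: the single pattern and the three separate checks. The function names and sample lines are invented for illustration, and the lines are assumed to have their trailing newline already stripped.

```python
import re

SINGLE_PATTERN = re.compile(r'^![^:]*spelling[^:]*$')


def matches_single_regex(line: str) -> bool:
    # One pattern: starts with "!", contains "spelling", no ":" anywhere.
    return SINGLE_PATTERN.search(line) is not None


def matches_in_steps(line: str) -> bool:
    # The same three conditions, checked one at a time.
    return line.startswith('!') and 'spelling' in line and ':' not in line


samples = [
    "!check spelling here",
    "!spelling: colon present",
    "no bang but spelling",
    "!nothing relevant",
]
results = [(matches_single_regex(s), matches_in_steps(s)) for s in samples]
for sample, result in zip(samples, results):
    print(sample, result)
```

Both functions agree on these samples. The step-by-step version is longer but each condition can be read and changed independently.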
spelling* matches
spellin
spelling
spellingg
spellinggg
etc. You were trying for
^([^:\n]*spelling[^\n]*)$
aka
^([^:\n]*spelling.*)$ # Assuming /s isn't used
But that would allow : after spelling, so you really want
^([^:\n]*spelling[^:\n]*)$
What about ^([^:\n]*spelling.*)$ ?
Adding .* allows any character (except newline) to be present after 'spelling'

Use REGEX to find line breaks within a wrapped content

The direct question: How can I use REGEX lookarounds to find instances of \r\n that occur between a set of characters (stand in open and closing tags), "[ and ]" with arbitrary characters and line breaks inside as well?
The situation:
I have a large database exported to tab- or comma-delimited text files that I'm trying to import into Excel. The problem is that some of the cells come from text areas that contain line breaks and are qualified by double quotes. When importing into Excel, these line breaks are treated as new rows. I cannot adjust how the file is exported. The data needs to be preserved, but the exact format doesn't, so I was planning on using some placeholder for the returns or ~
Here's a generic illustration of the format of my data:
column1rowA column2rowA column3rowA column4rowA
column1rowB column2rowB "column3rowB
3Bcont
3Bcont
3Bcont
" column4rowB
column1rowC column2rowC column4rowC
column1rowD column2rowD "column3rowD
3Dcont" column4rowD
My thought has been to try to select and replace line breaks within the quotes using REGEX search and replace in Notepad++. To try to make it simpler, I have tried adding a character to the double quotes to help indicate whether it is an opening or closing quote:
"[column3rowB
3Bcont
3Bcont
3Bcont
]"
I am new to REGEX. The progress I've made (which isn't much) is:
(?<="[) missing some sort of wildcard \r\n(?=.*]")
Every iteration I've tried has also included every line break between the first "[ and last ]"
I would also appreciate any other approaches that solve the underlying problem
If you can use some tool other than Notepad++, you can use this regex (see my working example on regex101):
(?!\n(([^"]*"){2})*[^"]*$)\n
It uses a negative lookahead to find line breaks only when not followed by an even number of quotes. You could replace them with <br>, spaces, or whatever is appropriate.
Breakdown:
(?! ... ) This is the negative lookahead, necessary because it's zero-width. Anything matched by it will still be available to match again.
(([^"]*"){2})* This is the other key piece. It ensures even-numbered pairs of non-quote characters followed by a quote.
[^"]*$ This is ensuring that there are no more quotes from there until the end of the string.
Caveat:
I couldn't get it to work in Notepad++ because it always recognizes $ as the end of a line, not the end of the entire string.
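One tool where the end-of-string anchor behaves as needed is Python, where \Z always means the end of the whole string. The following is a sketch only: it assumes the file has been read into one string with \n line endings and that quotes inside cells are balanced. The function name, the "~" placeholder default, and the sample data are illustrative.

```python
import re

# Same idea as the pattern above, with \Z (end of the whole string) in place
# of "$". A newline is matched only when it is NOT followed by an even number
# of quotes, i.e. when it sits inside a quoted cell.
NEWLINE_IN_QUOTES = re.compile(r'(?!\n(?:(?:[^"]*"){2})*[^"]*\Z)\n')


def flatten_quoted_newlines(text: str, placeholder: str = "~") -> str:
    # A function is used as the replacement so the placeholder is inserted
    # literally, without backslash processing.
    return NEWLINE_IN_QUOTES.sub(lambda m: placeholder, text)


sample = 'a\tb\t"c\nd\ne"\tf\ng\th\n'
print(repr(flatten_quoted_newlines(sample)))
```

The lookahead rescans the rest of the text for every newline, so this can be slow on very large exports.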
Great answer from Brian. I added an option that considers both kinds of line break character (\n and \r), which worked for my CSV file:
(?!(?:\n|\r)(([^"]*"){2})*[^"]*$)(?:\n|\r)

find all text before using regex

How can I use regex to find all text before the text "All text before this line will be included"?
I have included some sample text below as an example:
This can include deleting, updating, or adding records to your database, which would then be reflex.
All text before this line will be included
You can make this a bit more sophisticated by encrypting the random number and then verifying that it is still a number when it is decrypted. Alternatively, you can pass a value and a key instead.
Starting with an explanation... skip to end for quick answers
To match up to a specific piece of text, and confirm it's there without including it in the match, you can use a positive lookahead, written (?=regex).
This confirms that 'regex' exists at that position, but matches the start position only, not the contents of it.
So, this gives us the expression:
.*?(?=All text before this line will be included)
Where . is any character, and *? is a lazy match (consumes least amount possible, compared to regular * which consumes most amount possible).
However, in almost all regex flavours . will exclude newline, so we need to explicitly use a flag to include newlines.
The flag to use is s (which stands for "single-line mode", although it is also referred to as "DOTALL" mode in some flavours).
And this can be implemented in various ways, including...
Globally, for /-based regexes:
/regex/s
Inline, global for the regex:
(?s)regex
Inline, applies only to bracketed part:
(?s:reg)ex
And as a function argument (depends on which language you're doing the regex with).
So, probably the regex you want is this:
(?s).*?(?=All text before this line will be included)
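As a quick illustration, this is how that expression might be used from Python, whose re module supports the inline (?s) flag, lazy quantifiers, and lookaheads. The sample text and variable names are made up, and re.escape is used only so the marker is treated as literal text.

```python
import re

MARKER = "All text before this line will be included"
sample = "first line\nsecond line\n" + MARKER + "\ntext after the marker"

# (?s) lets "." match newlines; .*? stops at the first place where the
# lookahead finds the marker, and the marker itself is not consumed.
pattern = re.compile(r'(?s).*?(?=' + re.escape(MARKER) + r')')

match = pattern.search(sample)
text_before = match.group(0) if match else None
print(repr(text_before))
```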
However, there are some caveats:
Firstly, not all regex flavours support lazy quantifiers - you might have to use just .*, (or potentially use more complex logic depending on precise requirements if "All text before..." can appear multiple times).
Secondly, not all regex flavours support lookaheads, so you will instead need to use captured groups to get the text you want to match.
Finally, you can't always specify flags, such as the s above, so may need to either match "anything or newline" (.|\n) or maybe [\s\S] (whitespace and not whitespace) to get the equivalent matching.
If you're limited by all of these (I think the XML implementation is), then you'll have to do:
([\s\S]*)All text before this line will be included
And then extract the first sub-group from the match result.
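Python is not limited in this way, but the capture-group fallback can be tried there too. The sketch below also shows the point about repeated markers: the greedy form captures up to the last occurrence, while a lazy form (where supported) stops at the first. All names and sample text are illustrative assumptions.

```python
import re

MARKER = "All text before this line will be included"

# Greedy: works without lazy quantifiers, lookaheads, or flags.
GREEDY = re.compile(r'([\s\S]*)' + re.escape(MARKER))
# Lazy variant, for flavours that do support *?.
LAZY = re.compile(r'([\s\S]*?)' + re.escape(MARKER))


def text_before(pattern, text):
    """Return the first capture group, or None if the marker is absent."""
    match = pattern.search(text)
    return match.group(1) if match else None


sample = "intro\n" + MARKER + "\nmiddle\n" + MARKER + "\nend"
print(repr(text_before(GREEDY, sample)))  # runs up to the last marker
print(repr(text_before(LAZY, sample)))    # stops at the first marker
```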
(.*?)All text before this line will be included
Depending on what particular regular expression framework you're using, you may need to include a flag to indicate that . can match newline characters as well.
The first (and only) subgroup will include the matched text. How you extract that will again depend on what language and regular expression framework you're using.
If you want to include the "All text before this line..." text, then the entire match is what you want.
This should do it:
<?php
$str = "This can include deleting, updating, or adding records to your database, which would then be reflex.
All text before this line will be included
You can make this a bit more sophisticated by encrypting the random number and then verifying that it is still a number when it is decrypted. Alternatively, you can pass a value and a key instead.";
echo preg_filter("/(.*?)All text before this line will be included.*/s","\\1",$str);
?>
Returns:
This can include deleting, updating, or adding records to your database, which would then be reflex.