Regular Expression That Contains All Of The Specific Letters in Notepad++ - regex

I have a dictionary list as a text file and want to select the words that contain all of the characters in a given list. I am using the text editor Notepad++ to apply regular expressions to the dictionary list. I first tried the following regular expression in Notepad++:
[BLT]+
However, this matches any of the letters in the square brackets, not all of them. I then tried the following regular expression, which adds a word boundary:
\b[BLT]+
This expression, again, matches every word that includes any, but not necessarily all, of the letters listed between the square brackets.
Desired Behaviour
Let's say the dictionary contains the list below:
AL
BAL
BAK
LABAT
TAL
LAT
BALAT
LA
AB
LATAB
TAB
What I need is an expression that matches words containing all of the letters 'B', 'L', 'T' (not any!), so the expected result is:
LABAT
BALAT
LATAB
What is the most minimalist and generic regular expression for this problem?

You can use lookaheads:
^(?=.*B)(?=.*L)(?=.*T).+$
As an example for a more general case, the optimized regex for at least 1 B, 2 Ls and 3 Ts:
^(?=[^B\n]*B)(?=(?:[^L\n]*L){2})(?=(?:[^T\n]*T){3}).+$
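To check the first pattern outside Notepad++, here is a minimal Python sketch that applies it to the word list from the question. The helper name `contains_all` is hypothetical; it simply emits one lookahead per required letter, which is how the pattern above is assembled.

```python
import re

# Word list copied from the question.
WORDS = ["AL", "BAL", "BAK", "LABAT", "TAL", "LAT",
         "BALAT", "LA", "AB", "LATAB", "TAB"]

def contains_all(letters):
    """Build ^(?=.*X)(?=.*Y)...+$ with one lookahead per required letter."""
    lookaheads = "".join(f"(?=.*{re.escape(ch)})" for ch in letters)
    return re.compile(f"^{lookaheads}.+$")

pattern = contains_all("BLT")
matches = [w for w in WORDS if pattern.match(w)]
print(matches)  # ['LABAT', 'BALAT', 'LATAB']
```

Each lookahead starts from the beginning of the word, so the order in which the letters occur does not matter.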

Related

RegExp set contains one or multiple words

Is there a way in regular expressions to match a subset of words against a set of words separated by a separator, without creating a new pattern for every new word added to the set?
Right now I cannot think of anything else than creating a (?:{item1, item2, ...}) pattern for every extra item in the set (see example below).
Example matching a single word of the set:
Set: foo,bar,baz
Match: foo
RegExp:/^(foo|bar|baz)$/ <- MATCH
Example that will match a subset of words:
Set: foo,bar,baz
Match: foo,bar
RegExp: /^(foo|bar|baz)(?:,(foo|bar|baz)(?:,(foo|bar|baz))?)?$/ <- MATCH
The pattern grows rapidly when adding new items to the set. Is there some (magical) way to do this in a shorter version?
One general approach which looks slightly better than your current attempt would be to use lookaheads:
^(?=.*\bfoo\b)(?=.*\bbar\b).*$
You may add one lookahead assertion for each CSV term which needs to be matched in the input CSV list.
Edit: If you want OR behavior here, then we can use an alternation of lookaheads. To match either foo or bar as a CSV term we can try:
^(?:(?=.*\bfoo\b)|(?=.*\bbar\b)).*$
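As a quick check of the two patterns above, the following Python sketch runs them against a few sample CSV strings. The sample inputs are made up for illustration; the patterns are copied unchanged from the answer.

```python
import re

# AND: both foo and bar must appear as whole terms.
ALL_TERMS = re.compile(r"^(?=.*\bfoo\b)(?=.*\bbar\b).*$")
# OR: at least one of foo or bar must appear as a whole term.
ANY_TERM = re.compile(r"^(?:(?=.*\bfoo\b)|(?=.*\bbar\b)).*$")

# Hypothetical sample inputs.
samples = ["foo,bar", "bar,baz,foo", "foo,baz", "baz", "foobar,baz"]

all_hits = [s for s in samples if ALL_TERMS.match(s)]
any_hits = [s for s in samples if ANY_TERM.match(s)]

print(all_hits)  # ['foo,bar', 'bar,baz,foo']
print(any_hits)  # ['foo,bar', 'bar,baz,foo', 'foo,baz']
```

Note that these patterns only test for the presence of the listed terms; they do not reject input that also contains terms from outside the set.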

Regex Conditionals

I would like to control orphans in InDesign by applying a "No Break" character style based on a GREP expression. Basically, I need to target the last 2 words of a paragraph (That is to say: The last 2 strings of characters separated by a space).
I found a solution for my English publications where (\H+?\h?){2}$ works like a charm.
The problem is with my French publications, where some punctuation marks require a space before them. I am trying to choose the matching pattern based on the last character of the paragraph: if it is a ?, ! or :, I match the last 3 "words" using (\H+?\h?){3}$; if not, then I match the last 2.
I thought the following expression would work:
(?(?=[\?!:]$)((\H+?\h?){3}$)|(\H+?\h?){2}$)
but somehow it always defaults to the "else" branch.
Can someone tell me where I went wrong?
Maybe you want option (A) below.
Let me check that I understand correctly.
The requirements are:
Capture the last two words
Even if the paragraph ends with ?, ! or :
(A) Use this to capture as group: https://regexr.com/4lr6h
(\w*)(?:\s*)(\w*)(?:\s*)(\w*)(?:[\?!:]|$)
(B) Use this to capture only words: https://regexr.com/4lr84
\w*\s\w*(?=(?:$|[\?!:]))
(C) Use this to capture three last words with marks: https://regexr.com/4lr87
\w*\s\w*[\?!:]?$
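The same "two words, or three tokens when the paragraph ends in spaced punctuation" idea can also be written as a plain alternation instead of a conditional. The sketch below uses Python's `re` module purely as an illustration; it is not InDesign GREP syntax (Python has no \H or \h, so \S and \s stand in), and the helper name `last_words` and the sample sentences are assumptions.

```python
import re

# First alternative: two words, a space, then a lone ?, ! or : at the end.
# Second alternative: just the last two words.
LAST_WORDS = re.compile(r"\S+\s\S+\s[?!:]$|\S+\s\S+$")

def last_words(paragraph):
    """Return the trailing span that should not break, or None."""
    m = LAST_WORDS.search(paragraph)
    return m.group(0) if m else None

print(last_words("Comment allez-vous aujourd'hui ?"))
print(last_words("This is the last line."))
```

Putting the punctuation case first means it is tried before the two-word case at each starting position, so a paragraph ending in " ?" yields the longer span.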

Regular expression replace double and single quotes with nothing

I am using a sphinx search module on a site I am developing and there is the option to enter regular expressions to be replaced with specified characters.
The available options are Match Expression,Replace Expression and Replace Char (these are input fields in a CMS admin panel so I'm unsure of the actual code function used behind the scenes unfortunately). My understanding is the search checks for any expressions which match Match Expression and replaces the expressions specified in Replace Expression with those specified in Replace Char. So it's a sort of find and replace on matched terms.
Some examples that work:
Example 1
Match Expression: /[a-zA-Z0-9]*-[a-zA-Z0-9]*/
Replace Expression: /-/
Replace Char: empty
Matched text: SX500-123, GLX-11A, GLZX-VXV, GLZ/123, GLZV 123, CNC-PWR1
Result text: SX500123, GLX11A, GLZXVXV, GLZ/123, GLZV-123-123, CNCPWR1
More examples here: http://mirasvit.com/doc/ssp/2.3.2/ssp/global/long_tail
What I want to do is strip any single or double quotes or apostrophes from a search query.
Example inputs: "examination papers",'examination papers,'examination' "papers",pa"pers,pa'pers
Desired outputs: examination papers,examination papers,papers,papers,papers
I have tried just replacing the - with a " in the examples listed above for now but even this hasn't worked.
Any help would be greatly appreciated! Thank you
You can use these expressions:
Match Expression - /["'][\w\s]+["']|\w+["']\w+/
This will match the following text:
"examination papers",'examination papers','examination' "papers",pa"pers,pa'pers
Then you can use this regex to replace your quotes:
Replace Expression - /["']/
Replace Char - empty
So, your output will be:
examination papers,examination papers,examination papers,papers,papers
As context for this answer: I understand from the tool you are using that the Match Expression gathers a result set, to which a second regular expression (the Replace Expression) is applied, and whatever that second expression matches is replaced with the Replace Char.
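Under that reading of the tool (an assumption, since the module's actual code is not shown), the two-step behaviour can be sketched in Python as follows. The function name `replace_within_matches` is hypothetical; the two expressions are the ones given in this answer.

```python
import re

def replace_within_matches(text, match_expr, replace_expr, replace_char=""):
    """For every span matched by match_expr, substitute replace_expr
    with replace_char inside that span only."""
    return re.sub(
        match_expr,
        lambda m: re.sub(replace_expr, replace_char, m.group(0)),
        text,
    )

query = "\"examination papers\",'examination papers','examination' \"papers\",pa\"pers,pa'pers"
result = replace_within_matches(
    query,
    r"""["'][\w\s]+["']|\w+["']\w+""",  # Match Expression
    r"""["']""",                        # Replace Expression
)
print(result)
# examination papers,examination papers,examination papers,papers,papers
```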

How to use a Perl regex to parse [key=value] if a value has multiple items

I could not solve the problem below, so I used a Perl script to parse the string without regular expressions, but I believe there is a regular expression for it.
Input String (there's no newline):
ObjectAddress=120.146.128.250,ObjectName=psyseds-tt1y,ObjectClass=SCM F5,ObjectDescription=,Aliases=psyseds-tt1y.site.com.,NameService=A,PTR,DynamicDNSUpdate=A,PTR,CNAME
Expected Output:
ObjectAddress=120.146.128.250
ObjectName=psyseds-tt1y
ObjectClass=SCM F5
ObjectDescription=
Aliases=psyseds-tt1y.site.com.
NameService=A,PTR
DynamicDNSUpdate=A,PTR,CNAME
I tried some regular expressions to parse the string, but failed because some values contain multiple comma-separated items.
For example, NameService has two values: A,PTR.
Please help me build a regular expression to parse the above.
(.+?=.*?) does not pick up multiple values.
In general, it doesn't seem that your format is unambiguous — something like A=B,C=D could mean either that A maps to B and C maps to D, or that A maps to B,C=D — but for a good approximation, you can write:
my @output = split /,(?=\w+=)/, $input;
This will split $input on commas (,), with the added restriction that the comma must be followed by one or more "word characters" (\w — letters, digits, underscores) plus an equals sign. (This is called a lookahead assertion.)
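For readers who want to try the same split outside Perl, here is a minimal Python sketch using the identical lookahead with `re.split`. The variable names are illustrative, and the input is the string from the question.

```python
import re

line = ("ObjectAddress=120.146.128.250,ObjectName=psyseds-tt1y,"
        "ObjectClass=SCM F5,ObjectDescription=,"
        "Aliases=psyseds-tt1y.site.com.,NameService=A,PTR,"
        "DynamicDNSUpdate=A,PTR,CNAME")

# Split only on commas that are followed by "key=".
pairs = re.split(r",(?=\w+=)", line)

# Turn "key=value" strings into a dict, splitting on the first "=" only.
record = dict(p.split("=", 1) for p in pairs)

for p in pairs:
    print(p)
```

As the answer notes, this is an approximation: a value that itself contains ",word=" would be split in the wrong place.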
You can match with this regex
(?<=^|,)(?<key>.*?)=(?<value>.*?)(?=,|$)
You can now access the values by their group names.

Regex multi word search

What do I use to search for multiple words in a string? I would like the logical operation to be AND, so that all the words are in the string somewhere. I have a bunch of nonsense paragraphs and one plain English paragraph, and I'd like to narrow it down by specifying a couple of common words like "the" and "and", but I would like it to match all the words I specify.
Regular expressions support a "lookaround" condition that lets you search for a term within a string and then forget the location of the result, starting again at the beginning of the string for the next search term. This allows searching a string for a group of words in any order.
The regular expression for this is:
^(?=.*\bword1\b)(?=.*\bword2\b)(?=.*\bword3\b)
Where \b is a word boundary and (?=...) is a positive lookahead, one kind of lookaround assertion.
If you have a variable number of words you want to search for, you will need to build this regular expression string with a loop - just wrap each word in the lookaround syntax and append it to the expression.
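A minimal Python sketch of that loop follows. The function name `all_words_regex` is hypothetical, and the flags are an assumption: `re.DOTALL` lets `.` cross newlines inside a paragraph, and `re.IGNORECASE` catches words at the start of a sentence.

```python
import re

def all_words_regex(words):
    """Build ^(?=.*\\bw1\\b)(?=.*\\bw2\\b)... from a list of words."""
    parts = []
    for word in words:
        # Escape the word so regex metacharacters are treated literally.
        parts.append(rf"(?=.*\b{re.escape(word)}\b)")
    return re.compile("^" + "".join(parts), re.IGNORECASE | re.DOTALL)

rx = all_words_regex(["the", "and"])
print(rx.pattern)                               # ^(?=.*\bthe\b)(?=.*\band\b)
print(bool(rx.search("The cat and the dog")))   # True
print(bool(rx.search("Another band played")))   # False
```

The second sample contains "the" and "and" only inside longer words, so the word boundaries reject it.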
AND as concatenation
^(?=.*?\b(?:word1)\b)(?=.*?\b(?:word2)\b)(?=.*?\b(?:word3)\b)
OR as alternation
^(?=.*?\b(?:word1|word2|word3)\b)
^(?=.*?\b(?:word1)\b)|^(?=.*?\b(?:word2)\b)|^(?=.*?\b(?:word3)\b)
Maybe using a language recognition chart to recognize english would work. Some quick tests seem to work (this assumes paragraphs separated by newlines only).
The regexp matches if any one of those alternatives is found: \bword\b is a whole word delimited by boundaries, word\b is a word ending, and a bare word matches anywhere in the paragraph.
# Split the text into paragraphs, one per line.
my @paragraphs = split(/\n/, $text);
for my $p (@paragraphs) {
    # Flag the paragraph if it contains any common English word or fragment.
    if ($p =~ m/\bthe\b|\band\b|\ban\b|\bin\b|\bon\b|\bthat\b|\bis\b|\bare\b|th|sh|ough|augh|ing\b|tion\b|ed\b|age\b|’s\b|’ve\b|n’t\b|’d\b/) {
        print "Probable english\n$p\n";
    }
}
Firstly I'm not certain what you're trying to return... the whole sentence? The words in between your two given words?
Something like:
\b(word1|word2)\b(\w+\b)*(word1|word2)\b(\w+\b)*\.
(where \b is the word boundary in your language)
would match a complete sentence that contained either of the two words, or both.
You'd probably need to make it case-insensitive so that it still matches if a word appears at the start of the sentence.
Assuming PCRE (Perl regexes), I am not sure that you can do it at all easily. The AND operation is concatenation of regexes, but you want to be able to permute the order in which the words appear without having to formally generate the permutation. For N words, when N = 2, it is bearable; with N = 3, it is barely OK; with N > 3, it is unlikely to be acceptable. So, the simple iterative solution - N regexes, one for each word, and iterate ensuring each is satisfied - looks like the best choice to me.
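The iterative approach described here, one regex per word with every one required to succeed, might look like this in Python. This is only a sketch of the idea; the function name `contains_all_words` and the sample paragraphs are assumptions.

```python
import re

def contains_all_words(text, words):
    """Run one small regex per word and require every one to match."""
    return all(
        re.search(rf"\b{re.escape(w)}\b", text, re.IGNORECASE)
        for w in words
    )

# Hypothetical sample paragraphs.
paragraphs = [
    "Xq zvp wlorb kkd.",
    "The quick fox and the lazy dog.",
    "Brand new weather station.",
]
english = [p for p in paragraphs if contains_all_words(p, ["the", "and"])]
print(english)  # ['The quick fox and the lazy dog.']
```

Because each word is tested independently, the order of the words in the text does not matter and no permutations need to be generated.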