Conditional replace depending on which character is found - regex

This is NOT a duplicate of "How to use conditionals when replacing in Notepad++ via regex": I am asking something very specific here that I could not implement from the information in that question, so kindly allow this question.
I want to replace a range of characters with a corresponding range of characters. So far, I can only do it with multiple operations.
For example, match any word that starts with a capital Latin character in the range [ABEZHIKMNOPTYX] followed by a Greek lowercase letter [α-ωά-ώ], and replace the character in the first matched group with the similar-looking character from the Greek range [ΑΒΕΖΗΙΚΜΝΟΡΤΥΧ] (note: they look the same but are different characters).
What I have come up with so far is a separate replacement for each letter, i.e.:
(A)([α-ωά-ώ])
Α\2
(B)([α-ωά-ώ])
Β\2
....
So that for example:
Aνθρώπινος would become Ανθρώπινος
Bάτος would become Βάτος
Preferably this should work in EmEditor, Notepad++ being the 2nd option.

Notepad++ supports conditional replacement; you can use it like this:
Find what: (?:(A)|(B)|(E)|(Z)|(H)|(I)|(K)|(M)|(N)|(O)|(P)|(T)|(Y)|(X))(?=[α-ωά-ώ])
Replace with: (?{1}Α:(?{2}Β:(?{3}Ε:(?{4}Ζ:)))) and add the other Greek letters the same way
Explanation:
(?:             # start non capture group
  (A)           # group 1, Latin "A"
  |             # or
  (B)           # group 2, Latin "B"
  |             # or
  ...           # and so on, one group per letter
)               # end non capture group
(?=             # positive lookahead, make sure we have after:
  [α-ωά-ώ]      # a small greek letter
)               # end lookahead
Replacement:
(?{1}           # if group 1 exists ("A")
  Α             # replace with greek letter
:               # else
  (?{2}         # if group 2 exists ("B")
    Β           # replace with greek letter
  :             # else
    (?{3}       # and so on ...
      Ε
    :
      (?{4}
        Ζ
      :
      )
    )
  )
)
I've tested this for only 2 letters, "A" and "B", replacing them with the visually different letters "X" and "Y" just to show how it works.
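If a script is an option instead of EmEditor or Notepad++, the same single-pass replacement can be sketched in Python by looking each matched Latin capital up in a table inside a callback, instead of chaining conditionals. This is an illustrative sketch, not syntax either editor accepts; the names TO_GREEK and fix_mixed_words are hypothetical, and the Greek letters are written as \u escapes so they cannot be confused with their Latin lookalikes.

```python
import re

# Latin capitals that have a visually identical Greek capital (each listed once).
LATIN = "ABEZHIKMNOPTYX"
# The matching Greek capitals, in the same order: Alpha, Beta, Epsilon, Zeta, Eta,
# Iota, Kappa, Mu, Nu, Omicron, Rho, Tau, Upsilon, Chi.
GREEK = ("\u0391\u0392\u0395\u0396\u0397\u0399\u039A"
         "\u039C\u039D\u039F\u03A1\u03A4\u03A5\u03A7")
TO_GREEK = dict(zip(LATIN, GREEK))

# One of those Latin capitals, only when a Greek lowercase letter follows it.
# The lookahead class is the same as [α-ωά-ώ], spelled with code points.
PATTERN = re.compile("[" + LATIN + "](?=[\u03b1-\u03c9\u03ac-\u03ce])")


def fix_mixed_words(text: str) -> str:
    """Swap every matched Latin capital for its Greek twin in a single pass."""
    return PATTERN.sub(lambda m: TO_GREEK[m.group(0)], text)


if __name__ == "__main__":
    fixed = fix_mixed_words("Aνθρώπινος Bάτος")
    # Print the code points of the first letters to show which alphabet they are in.
    print([hex(ord(word[0])) for word in fixed.split()])
```

A dictionary lookup keeps the mapping in one place, so adding a letter means extending LATIN and GREEK rather than nesting another conditional.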


Notepad++ and regex - how to title case string between two particular strings?

I have hundreds of bib references in a file, and they have the following syntax:
@article{tabata1999precise,
title={Precise synthesis of monosubstituted polyacetylenes using Rh complex catalysts.
Control of solid structure and $\pi$-conjugation length},
author={Tabata, Masayoshi and Sone, Takeyuchi and Sadahiro, Yoshikazu},
journal={Macromolecular chemistry and physics},
volume={200},
number={2},
pages={265--282},
year={1999},
publisher={Wiley Online Library}
}
I would like to title case (aka Proper Case) the journal name in Notepad++ using regular expression. For example, from Macromolecular chemistry and physics to Macromolecular Chemistry and Physics.
I am able to find all instances using:
(?<=journal\=\{).*?(?=\})
but I am unable to change the case of all the matches at once via Edit > Convert Case to: apparently it doesn't work with Find All, so I have to go through them one by one.
Next, I tried recording and running a macro, but Notepad++ hangs indefinitely when I run it with the option to run until the end of the file.
So my question is: does anyone know the replace regex syntax I could use to change the case? Ideally, I would also like to use "|" exclusions for particular words such as " of ", " an ", " the ", etc. I tried to play with some of the examples provided here, but I was not able to integrate them into my look-aheads.
Thank you in advance, I'd appreciate any help.
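This works for any number of words. Words of 4 or more characters get an uppercase first letter, while shorter words such as "and", "of" or "the" are left unchanged: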
Ctrl+H
Find what: (?:journal={|\G)\K(?:(\w{4,})|(\w+))(\h*)
Replace with: \u$1\E$2$3
CHECK Wrap around
CHECK Regular expression
Replace all
Explanation:
(?: # non capture group
journal={ # literally
| # OR
\G # restart from last match position
) # end group
\K # forget all we have seen until this position
(?: # non capture group
(\w{4,}) # group 1, a word with 4 or more characters
| # OR
(\w+) # group 2, a word of any length
) # end group
(\h*) # group 3, 0 or more horizontal spaces
Replacement:
\u # uppercase the first letter of what follows
$1 # content of group 1
\E # end case conversion
$2 # content of group 2
$3 # content of group 3
If the format is always of the form:
journal={Macromolecular chemistry and physics},
i.e. journal followed by 4 words, then use the following:
Find: journal={(\w+)\s*(\w+)\s*(\w+)\s*(\w+)
Replace with: journal={\u\1 \u\2 \l\3 \u\4
You can extend that to more words by adding another \s*(\w+) to the Find pattern and another \u\x to the replacement, where x is the position of the word.
Hope this gives you an idea for moving forward towards a better solution.
\u translates the next letter to uppercase (used for all other words)
\l translates the next letter to lowercase (used for the word "and")
\1 inserts the 1st captured () search group
\2 inserts the 2nd captured () search group
\3 inserts the 3rd captured () search group
\4 inserts the 4th captured () search group
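The question also asked about excluding small words such as "of", "an" and "the" by name, which neither answer does with an explicit list. If a script is acceptable, a hedged Python sketch of that idea (not Notepad++ syntax); SMALL_WORDS is a hypothetical list to adjust to your own style:

```python
import re

# Hypothetical list of small words to keep lowercase; adjust it to your style guide.
SMALL_WORDS = {"a", "an", "and", "at", "for", "in", "of", "on", "the"}


def title_case(text: str) -> str:
    """Capitalize each word, except listed small words that are not the first word."""
    def fix(m):
        word = m.group(0)
        # Small words stay lowercase, except as the very first word.
        if m.start() > 0 and word.lower() in SMALL_WORDS:
            return word.lower()
        # Uppercase only the first letter and keep the rest as written.
        return word[0].upper() + word[1:]
    return re.sub(r"\w+", fix, text)


if __name__ == "__main__":
    print(title_case("journal of the american chemical society"))
    # Journal of the American Chemical Society
```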

How to negate string pattern using re2 regex?

I'm using Google RE2 regex to query Prometheus on a Grafana dashboard. I'm trying to get the value of key from the following 3 possible types of input string:
1. object{one="ab-vwxc",two="value1",key="abcd-eest-ed-xyz-bnn",four="obsoleteValues"}
2. object{one="ab-vwxc",two="value1",key="abcd-eest-xyz-bnn",four="obsoleteValues"}
3. object{one="ab-vwxc",two="value1",key="abcd-eest-xyz-bnn-ed",four="obsoleteValues"}
with the following validation:
should contain abcd-
shouldn't contain -ed
This regex
\bkey="(abcd(?:-\w+)*[^-][^e][^d]\w)"
satisfies the first condition (abcd-) but not the second one (negating -ed).
The expected output is abcd-eest-xyz-bnn, from the 2nd input string. Any help would be really appreciated. Thanks a lot.
If I understand your requirements correctly, the following pattern should work:
\bkey="(abcd(?:-e|-(?:[^e\W]|e[^d\W])\w*)*)"
Breakdown for the important part:
(?: # Start a non-capturing group.
-e # Match '-e' literally.
| # Or the following...
- # Match '-' literally.
(?: # Start a second non-capturing group.
[^e\W] # Match any word character except 'e'.
| # Or...
e[^d\W] # Match 'e' followed by any word character except 'd'.
) # Close non-capturing group.
\w* # Match zero or more additional word characters.
) # Close non-capturing group.
Or in simple terms:
Match a hyphen followed by:
only the letter 'e'. Or..
a word* not starting with 'e'. Or..
a word starting with 'e' not followed by 'd'.
*A "word" here means a string of word characters as defined in regex.
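Python's re is a backtracking engine, not RE2, but the pattern above uses no lookarounds or backreferences, so it should behave the same. A quick check against the three sample strings from the question (SAMPLES and extract_key are illustrative names):

```python
import re

# The answer's pattern, unchanged; it needs no lookarounds, so it is also valid RE2 syntax.
KEY = re.compile(r'\bkey="(abcd(?:-e|-(?:[^e\W]|e[^d\W])\w*)*)"')

SAMPLES = [
    'object{one="ab-vwxc",two="value1",key="abcd-eest-ed-xyz-bnn",four="obsoleteValues"}',
    'object{one="ab-vwxc",two="value1",key="abcd-eest-xyz-bnn",four="obsoleteValues"}',
    'object{one="ab-vwxc",two="value1",key="abcd-eest-xyz-bnn-ed",four="obsoleteValues"}',
]


def extract_key(s):
    """Return the captured key value, or None when the string is rejected."""
    m = KEY.search(s)
    return m.group(1) if m else None


if __name__ == "__main__":
    for s in SAMPLES:
        print(extract_key(s))  # None, then abcd-eest-xyz-bnn, then None
```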
Maybe have a go with:
\bkey="((?:ktm-(?:(?:e-|[^e]\w*-|e[^d]\w*-)*)abcd(?:(?:-e|-[^e]\w*|-e[^d]\w*)*)|abcd(?:(?:-e|-[^e]\w*|-e[^d]\w*)*)))"
This would ensure that:
The string starts with either ktm- or abcd.
If it starts with ktm-, there must be at least one element called abcd.
If it starts with abcd, there doesn't have to be another element.
Both options ensure that no element starts with -ed.
The struggle without lookarounds...

Regex for text file

I have a text file with the following text:
andal-4.1.0.jar
besc_2.1.0-beta
prov-3.0.jar
add4lib-1.0.jar
com_lab_2.0.jar
astrix
lis-2_0_1.jar
Is there any way I can split the name and the version using regex? I want to use the results to make two columns, 'Name' and 'Version', in Excel.
So I want the results from the regex to look like this:
andal 4.1.0.jar
besc 2.1.0-beta
prov 3.0.jar
add4lib 1.0.jar
com_lab 2.0.jar
astrix
lis 2_0_1.jar
So far I have used ^(?:.*-(?=\d)|\D+) to get the Version and -\d.*$ to get the Name separately. The problem is that when I do this on a large text file, the results from the two regexes are not in the same order. Is there any way to get the results in the format shown above?
Ctrl+H
Find what: ^(.+?)[-_](\d.*)$
Replace with: $1\t$2
check Wrap around
check Regular expression
UNCHECK . matches newline
Replace all
Explanation:
^ # beginning of line
(.+?) # group 1, 1 or more any character but newline, not greedy
[-_] # a dash or underscore
(\d.*) # group 2, a digit then 0 or more any character but newline
$ # end of line
Replacement:
$1 # content of group 1
\t # a tabulation, you may replace with what you want
$2 # content of group 2
Result for given example:
andal 4.1.0.jar
besc 2.1.0-beta
prov 3.0.jar
add4lib 1.0.jar
com_lab 2.0.jar
astrix
lis 2_0_1.jar
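Because each line is matched once and split into both parts in the same step, the name and version cannot get out of order. A Python sketch of the same pattern that prints tab-separated rows ready to paste into Excel (split_name_version is an illustrative name, and treating lines without a version as an empty Version cell is an assumption):

```python
import csv
import re
import sys

# Same pattern as above: shortest name, then "-" or "_", then a version starting with a digit.
NAME_VERSION = re.compile(r"^(.+?)[-_](\d.*)$")


def split_name_version(line: str) -> tuple:
    """Return (name, version) for one line of the file."""
    m = NAME_VERSION.match(line)
    # Lines without a version (e.g. "astrix") keep the name and get an empty version.
    return (m.group(1), m.group(2)) if m else (line, "")


if __name__ == "__main__":
    lines = ["andal-4.1.0.jar", "besc_2.1.0-beta", "astrix", "lis-2_0_1.jar"]
    writer = csv.writer(sys.stdout, delimiter="\t")
    writer.writerow(["Name", "Version"])
    for line in lines:
        writer.writerow(split_name_version(line))
```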
I'm not quite sure what you meant by the problem with a large file, and I believe the two regexes you showed do the opposite of what you said: the first one should get you the name and the second one the version.
Anyway, here are the assumptions I made about what may make sense to you:
"Name" may be followed by - or _, followed by the version string.
"Version" string is preceded by - or _ and consists of some digits, followed by a dot or underscore, followed by some digits, and then any string.
If these assumptions make sense, you may use
^(.+?)(?:[-_](\d+[._]\d+.*))?$
as your regex. Group 1 will be the name and Group 2 will be the version.
Demo in regex101: https://regex101.com/r/RnwMaw/3
Explanation of regex
^ start of line
(.+?) "Name" part, using reluctant match of
at least 1 character
(?: )? Optional group of "Version String", which
consists of:
[-_] - or _
( ) Followed by the "Version" , which is
\d+ at least 1 digit,
[._] then 1 dot or underscore,
\d+ then at least 1 digit,
.* then any string
$ end of line
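To see how the optional group behaves, here is a small Python sketch of this second pattern: lines without a version come back with None as the version instead of failing to match. The parse name is illustrative, and returning (line, None) for an empty line is an assumption.

```python
import re

# The second answer's pattern: the "-version" part is optional, and a version
# must be digits, a dot or underscore, more digits, then anything.
NAME_VERSION = re.compile(r"^(.+?)(?:[-_](\d+[._]\d+.*))?$")


def parse(line):
    """Return (name, version); version is None when the optional group is absent."""
    m = NAME_VERSION.match(line)
    if m is None:  # only happens for an empty line, since (.+?) needs one character
        return line, None
    return m.group(1), m.group(2)


if __name__ == "__main__":
    for line in ["besc_2.1.0-beta", "astrix", "com_lab_2.0.jar"]:
        print(parse(line))
```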

Regex expression Giving all letters

I need all groups of 4 capital letters in a string.
So I am using REGEXP_REPLACE([Description],'\b(?![A-Z]{4}\b)\w+\b',' ')
in Tableau to replace all lowercase letters and extra characters. I only want to get the runs of 4 capital letters.
Through Google I found out that I cannot use REGEXP_EXTRACT (since /g is not supported).
My String:
"The following trials have no study data-available, in the RBM mart. It appears as is this because they were . In y HIWEThe trials currently missing data are:
JADA, JPBD, JVCS, JADQ, JVDI, JVDO, JVTZ"
I have written [^A-Z]{4}/g.
I want:
HIWE JADA JPBD JVCS JADQ JVDI JVDO JVTZ
But this also gives me single capital letters, with spaces included.
Thanks
You can use this regex:
((?<=[A-Z]{4})|^).*?(?=[A-Z]{4}|$)
Explaining:
(                  # one of:
  (?<=[A-Z]{4})    # any position right after four uppercase letters
  |                # or
  ^                # the starting position
)                  #
.*?                # match anything (lazily) up to the first:
(?=                # position that is followed by
  [A-Z]{4}         # four uppercase letters
  |                # or
  $                # the string's end
)                  #
Any doubt feel free to ask :)
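The answer doesn't spell out the replacement string; assuming the pattern is used as REGEXP_REPLACE([Description], pattern, ' '), every stretch between two 4-letter runs collapses to one space. A Python sketch to check that idea outside Tableau (Python's re is not Tableau's regex engine, so treat it as an approximation; the DOTALL flag and the four_letter_codes name are my assumptions):

```python
import re

# The answer's pattern. DOTALL is an assumption so that ".*?" can also cross
# line breaks inside a multi-line Description field.
PATTERN = re.compile(r"((?<=[A-Z]{4})|^).*?(?=[A-Z]{4}|$)", re.DOTALL)


def four_letter_codes(text):
    """Replace everything around the 4-capital runs with single spaces, then trim."""
    return PATTERN.sub(" ", text).strip()


if __name__ == "__main__":
    print(four_letter_codes("In y HIWEThe trials are:\nJADA, JPBD"))
    # HIWE JADA JPBD
```

Note that a match can only start at the beginning of the string or right after four capitals, which is why the "The" glued to "HIWE" is removed while "HIWE" itself survives.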

How to search for words that contains upper case letters except a few words in Notepad++ Regex

Given a SQL query with some columns that have uppercase letters, how can I search for words that contain an uppercase letter but are not SQL keywords? For example:
SELECT ... ,
table1.thisColumn AS column,
...
FROM table1
I've tried without success to build something like this:
(?!AS+FROM+LEFT+JOIN+ON)[A-Z]{2,}
Two things to note about your existing regex.
The "or" operator in the exception-list should be a | and not a +. The + is a quantifier meaning "one or more of the preceding character", so AS+FROM means "A, one or more S, then FROM", whereas the | tells the regex engine "left" or "right", as you want in this case.
To match a full-word that contains at least one upper-case letter, you actually need to check for the lower-case letters as well - but "require" at least one upper-case letter.
I've come up with the following:
(?:^|\.|[\t ])(?!AS|FROM|LEFT|JOIN|ON)([a-z0-9]*[A-Z][a-zA-Z0-9]+)
I tested this against your sample query and also added other random SQL to it (such as an expanded field list, WHERE clause, etc.). It successfully found each word that contained at least one capital letter and was not in the list of keywords to ignore.
If you're using this just to "search" via Find Next or Find All in Current Document / All Opened Documents, it will highlight the matching word together with the preceding . or whitespace character.
If you're using this to "replace", the matched word (without the preceding . or whitespace character) is available as \1.
Regex Explained:
(?: # non-capturing group;
# a "word" is required to be preceded by one of the following
^ # beginning of line
|\. # period
|[\t ] # tab or space;
# note: we don't use \s here because a newline would break Notepad++'s "Find All" feature
)
(?!AS|FROM|LEFT|JOIN|ON) # list of words to ignore
( # group to match
[a-z0-9]* # word can start with lowercase a-z or 0-9
[A-Z] # required uppercase letter
[a-zA-Z0-9]+ # word can end with lowercase/uppercase a-z or 0-9
)
You'll also want to add in any missing SQL keywords (if needed), or other allowed characters in the "words can have these characters" list.
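One caveat: without a word boundary, the lookahead (?!AS|FROM|LEFT|JOIN|ON) also skips identifiers that merely start with a keyword, such as "ONLINEUsers". A hedged Python variant that adds \b after the keyword list (this is a modification, not the answer's exact regex; KEYWORDS, including the extra SELECT and WHERE, and mixed_case_words are illustrative):

```python
import re

# Keywords to skip. The answer lists AS, FROM, LEFT, JOIN, ON; SELECT and WHERE
# are added here only to illustrate extending the list.
KEYWORDS = ["SELECT", "AS", "FROM", "LEFT", "JOIN", "ON", "WHERE"]

# Variant of the answer's regex with \b after the keyword list, so an identifier
# that merely starts with a keyword (e.g. "ONLINEUsers") is still reported.
MIXED_CASE_WORD = re.compile(
    r"(?:^|\.|[\t ])"
    + r"(?!(?:" + "|".join(map(re.escape, KEYWORDS)) + r")\b)"
    + r"([a-z0-9]*[A-Z][a-zA-Z0-9]+)",
    re.MULTILINE,  # let ^ match at the start of every line, as in Notepad++
)


def mixed_case_words(sql):
    """Return group 1 of every match: the word without its preceding . or blank."""
    return MIXED_CASE_WORD.findall(sql)


if __name__ == "__main__":
    query = "SELECT a,\n    table1.thisColumn AS column,\n    b\nFROM table1"
    print(mixed_case_words(query))  # ['thisColumn']
```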