RegEx: Find & Replace snake_case to UpperCamelCase/PascalCase Between Characters - regex

I am using my IDE's Find & Replace (w/ RegEx) feature to find & replace the type parameter of arguments to go from snake_case to PascalCase (AKA UpperCamelCase). There are several files and lines throughout the project that need to be changed, and manually doing so is quite error prone and tedious (plus I am sure I am going to need the essential pattern again for future changes).
For example:
CURRENT: function find_all_by_name_and_status(_i_find_all_by_name_and_statusCriteria find_all_by_name_and_status_criteria) ...
Should be:
DESIRED: function find_all_by_name_and_status(IFindAllByNameAndStatusCriteria find_all_by_name_and_status_criteria) ...
The patterns I am using are the following:
FIND: (?<=\()_(.)(Criteria)*
REPLACE: \U$1\L
The replace pattern will work, as far as I can see, if the 1st found capture group is correct (the letter just after an "_").
The core pattern of _(.) finds the correct components to replace, however, it captures the other parts of the string as well. So, I added a positive lookbehind (?<=\() to start at the opening parentheses and an ending dummy capture for (Criteria)*. The entire pattern seems to cause the core pattern to only match once and not repeatedly. (?R) does not seem to help either.
P.S.
It looks like the (Criteria)* does not do anything either, but I figured that is the second problem to address after getting the core pattern to find all matches / repeat.
I feel like I am close to a solution, but not quite there yet. I, of course, could be VERY off base on the solution. Any help would be appreciated.

This expression,
(.*\()|(_)([a-z])([a-z]*)|(Criteria.*)
which is admittedly not the most elegant one, with a replacement similar to:
$1\U$3\L$4\E$5
might work here (the trailing \E, which ends the case conversion, is only there for demonstration).
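For anyone who wants to try the same alternation outside an IDE, here is a minimal Python sketch. Python's re module has no \U or \L in replacement strings, so this assumes a callback does the case change instead; the names PATTERN and replace_branch are made up for the illustration.

```python
import re

# The alternation from the answer; each branch consumes a different part of the line.
PATTERN = re.compile(r"(.*\()|(_)([a-z])([a-z]*)|(Criteria.*)")

def replace_branch(match):
    """Hypothetical helper: return the replacement for whichever branch matched."""
    if match.group(1) is not None:
        # Everything up to and including "(" is copied unchanged.
        return match.group(1)
    if match.group(5) is not None:
        # "Criteria" and the rest of the line are copied unchanged.
        return match.group(5)
    # "_" + first letter + remaining letters: drop the underscore, uppercase the letter.
    return match.group(3).upper() + match.group(4)

line = ("function find_all_by_name_and_status("
        "_i_find_all_by_name_and_statusCriteria find_all_by_name_and_status_criteria) ...")
converted = PATTERN.sub(replace_branch, line)
print(converted)
```

Because the last branch swallows everything from Criteria to the end of the line, the second parameter name is left alone.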

This works in Notepad++:
Ctrl+H
Find what: (\(|\G)_(.[^\W_]*)(?=\w+Criteria)
Replace with: $1\u$2
check Match case
check Wrap around
check Regular expression
Replace all
Explanation:
(\(|\G) # group 1, opening parenthesis or restart from last match position
_ # underscore
(.[^\W_]*) # group 2, any 1 character followed by 0 or more alphanumerics
(?=\w+Criteria) # positive lookahead, make sure we have 1 or more word character and Criteria
Replacement:
$1 # content of group 1
\u$2 # content of group 2 with first character uppercased
Result for given example:
function find_all_by_name_and_status(IFindAllByNameAndStatusCriteria find_all_by_name_and_status_criteria) ...
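If the same rename has to be scripted, a rough Python equivalent is sketched below. Python's re module supports neither \G nor \u, so this assumes a different technique: first locate the whole type name after the opening parenthesis, then convert it in a nested substitution. The helper name to_pascal is hypothetical.

```python
import re

def to_pascal(match):
    """Hypothetical helper: convert the captured snake_case type name to PascalCase."""
    type_name = match.group(1)
    # Uppercase the character after each underscore and drop the underscore itself.
    return "(" + re.sub(r"_(.)", lambda m: m.group(1).upper(), type_name)

source = ("function find_all_by_name_and_status("
          "_i_find_all_by_name_and_statusCriteria find_all_by_name_and_status_criteria) ...")

# Only a word that directly follows "(" and ends in "Criteria" is rewritten.
renamed = re.sub(r"\((\w+Criteria)\b", to_pascal, source)
print(renamed)
```

Restricting the outer pattern to the text right after "(" is what keeps the function name and the argument name untouched.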

Related

How would I match all data between 2 symbols with Regex?

I'm trying to find all data (including and after) a dash (-) appears, only up to the first delimiter which is a colon.
Example data:
Input:
bart23-testaccount#test.test:Test:Test:Test
Desired output:
bart23:Test:Test:Test
I've done some research and found this regex, but it's not fit for purpose -(.*):
My purpose is for thousands of lines which are all in various types of order, however the purpose remains the same, highlight all text between the - and the first : (which I will then proceed to delete). I will be using Notepad++
I can answer any questions or make my post more specific if need be, it's kind of hard to explain.
In Notepad++ you can use regex find/replace. Look for:
^([^-]+)-[^:]+(:.*)$
which captures everything up to the first - in group 1, and everything after (and including) the first : in group 2, and replace with
\1\2
Using Notepad++, without any capture group:
Ctrl+H
Find what: -[^:]+
Replace with: LEAVE EMPTY
check Wrap around
check Regular expression
Replace all
Explanation:
- # a hyphen (the leftmost one in the line is matched first)
[^:]+ # 1 or more not colon
Result for given example:
bart23:Test:Test:Test
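Both approaches can be checked quickly in Python. This is only a sketch using the sample line from the question; the variable names are made up, and count=1 is an assumption added here so that only the first hyphen on a line is touched.

```python
import re

line = "bart23-testaccount#test.test:Test:Test:Test"

# Capture-group variant: keep what precedes the first "-" and what starts at the first ":".
with_groups = re.sub(r"^([^-]+)-[^:]+(:.*)$", r"\1\2", line)

# Variant without capture groups: delete the hyphen and everything up to the next colon.
# count=1 limits the replacement to the first hyphen found in the line.
without_groups = re.sub(r"-[^:]+", "", line, count=1)

print(with_groups)
print(without_groups)
```

For a file with thousands of lines, the same substitution would be applied to each line in turn.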

Regex substring matching on capture group

I have an advanced regex question (unless I am overthinking this).
With my basic knowledge of regex, it is trivial to match a static capture group further down in the string.
P(.): D:\1
Correctly matches
Pb: D:b
Pa: D:a
and (correctly) does not match
Pa: D:b
So far so good. However, what I need to capture is a set of [a-z]+ after the P and match the one character. So that these should also match:
Pabc: D:c
Pabc: D:a
Pba: D:b
Pba: D:a
but not
Pabc: D:x
Pba: D:g
I started going down the path of writing separate patterns like so (spaces added around the alternation for clarity):
P(.): D:\1 | P(.)(.): D:(\1|\2) | P(.)(.)(.): D:(\1|\2|\3)
But I cannot make even this clumsy solution work in JavaScript regex.
Is there an elegant, correct way to do this? Can it be done with Javascript's limited engine?
The following regex will do it:
P.*(.).*: D:\1
.*(.).* will match one or more characters, capturing one of them.
If the captured character matches the character after D:, then the regex matches.
If the captured character doesn't match, backtracking will ensure that it tries again with a different captured character, until all combinations have been tried.
See regex101.com for a running example.
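The backtracking behaviour is easy to confirm with the examples from the question. Below is a small Python sketch (the question is about JavaScript, but backreferences behave the same way here); fullmatch is an assumption added so that the whole string has to fit the pattern.

```python
import re

pattern = re.compile(r"P.*(.).*: D:\1")

should_match = ["Pabc: D:c", "Pabc: D:a", "Pba: D:b", "Pba: D:a"]
should_not_match = ["Pabc: D:x", "Pba: D:g"]

# The engine tries each character after P as the captured one until \1 fits.
results = {s: bool(pattern.fullmatch(s)) for s in should_match + should_not_match}
for sample, matched in results.items():
    print(sample, "->", matched)
```

If only lowercase letters are allowed after the P, replacing each . with [a-z] would tighten the pattern accordingly.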

Regex: ignore characters that follow

I'd like to know how I can ignore characters that follow a particular pattern in a regex.
I tried positive lookaheads, but they do not work because they preserve those characters for other matches, while I want them to be just... discarded.
For example, a part of my regex is: (?<DoubleQ>\"\".*?\"\")|(?<SingleQ>\".*?\")
in order to match some "key-parts" of this string:
This is a ""sample text"" just for "testing purposes": not to be used anywhere else.
I want to capture the entire ""sample text"", but then I want to "extract" only sample text and the same with testing purposes. That is, I want the group to match to be ""sample text"", but then I want the full match to be sample text. I partially achieved that with the use of the \K option:
(?<DoubleQ>\"\"\K.*?\"\")|(?<SingleQ>\"\K.*?\")
Which ignores the first "" (or ") from the full match but takes it into account when matching the group. How can I ignore the following "" (")?
Note: positive lookahead does not work: it does not ignore characters from the following matches, it just does not include them in the current match.
Thanks a lot.
I hope I understood your question correctly. You want to match the whole string including the quotes, but replace it with (or extract) only the text without the quotes, right?
You can typically use the regex replace functionality to extract just a part of the match.
This is the regex expression:
""?(.*?)""?
And this is the replacement expression:
$1
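As a quick check of the capture-group idea, here is a Python sketch on the sample sentence from the question. Python's re module has no \K, so the sketch relies only on group 1; the variable names are made up.

```python
import re

text = ('This is a ""sample text"" just for "testing purposes": '
        'not to be used anywhere else.')

# One or two opening quotes, a lazy capture, then one or two closing quotes.
quoted = re.compile(r'""?(.*?)""?')

# findall returns only group 1, i.e. the text without the surrounding quotes.
extracted = quoted.findall(text)

# Replacing each match with group 1 strips the quotes in place.
stripped = quoted.sub(r"\1", text)

print(extracted)
print(stripped)
```

The closing quotes are consumed by the match, so they are not available to start the next match.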

Regular expression to find specific string and add characters when they're not already there in Notepad++

Okay, I have zero knowledge of regular expressions so if someone can direct me to a better way to figure this out then by all means please do.
I figured out that a series of files are missing a particular naming convention for the database they will write to. So some might be dbname1, dbname2, dbname3, abcdbname4, abcdbname5 and they all need to have that abc at the beginning. I want to write a regular expression that will find all names in the file that are not immediately preceded by abc and add abc. Any ideas how I can do this?
Again, forgive me if this is poorly worded/expressed. I really have absolutely zero knowledge of regular expressions. I can't find any questions that are asking this. I know that there are questions asking how to add strings to lines but not how to add only to lines that are missing the string when some already have it.
I thought I had written this in but I'm looking at lines that look like this
<Name>dbname</Name>
or
<Name>abcdbname</Name>
and I need to get them all to have that abc at the beginning
Cameron's answer will work, but so will this. It's called a negative lookbehind.
(?<!abc)(dbname\d+)
This regex looks for any occurrence of dbname followed by 1 or more digits that is not immediately prefixed by the string "abc". So it will capture dbname113, but not the dbname113 inside abcdbname113. The original name is in capture group \1, so you can replace with abc\1 and every name in your files will be properly prefixed.
Not every program/language that implements regex supports lookbehinds (JavaScript famously lacked them until ES2018), but most do and Notepad++ certainly does. Lookarounds (lookbehinds / lookaheads) are exceedingly handy once you get the hang of them.
?<! negative lookbehind, ?<= positive lookbehind, ?! negative lookahead, and ?= positive lookahead must all be used within parentheses as I did above, but they do not capture, so they do not create capture groups. That is why the second set of parentheses can be referenced as \1 (or $1 depending on the language).
Edit: Given some better example criteria, this is possibly more what you're looking for.
Find: (<Name>)(.*?(?<!abc)dbname\d+)(</Name>)
Replace: \1abc\2\3
Alternatively, something a bit easier to understand, you can do this or something like this:
Find: (<Name>)(abc)?(dbname\d+)(</Name>)
Replace: \1abc\3\4
What this does:
Matches <Name> and captures it as backreference 1.
Looks for abc and, if it is there, captures it as backreference 2; otherwise group 2 is empty. The ? after (abc) means match 0 or 1 times.
Looks for dbname followed by one or more digits and captures it as backreference 3.
Matches </Name> and captures it as backreference 4.
By replacing with \1abc\3\4, you drop any existing abc in front of dbname and write abcdbname in every instance.
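The optional-group version can be tried in Python as follows. This is a sketch on two made-up sample tags; \g<1> is Python's explicit spelling of the \1 backreference in a replacement string.

```python
import re

xml = "<Name>dbname1</Name>\n<Name>abcdbname4</Name>"

# Group 2 (abc) is optional; it is matched but never written back.
pattern = re.compile(r"(<Name>)(abc)?(dbname\d+)(</Name>)")

# Rebuild every tag as <Name> + abc + dbnameN + </Name>.
fixed = pattern.sub(r"\g<1>abc\g<3>\g<4>", xml)
print(fixed)
```

Tags that already carry the prefix are rewritten to the same text, so running the replacement twice changes nothing.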
You can take this a step further and prefix the abc with ?: to create a noncapturing group, so the backreferences for replacing are sequential:
Find: (<Name>)(?:abc)?(dbname\d+)(</Name>)
Replace: \1abc\2\3
Replace \bdbname(\d+) with abcdbname\1.
The \b means "word boundary", so it won't match the abc versions, but will match the others. The (...) parentheses represent a capturing group, which capture everything that's matched in-between into a numbered variable that can be later referenced (there's only one here so it goes in \1). The \d+ matches one or more digit characters.
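To compare the negative-lookbehind and word-boundary approaches side by side, here is a short Python sketch on two made-up sample tags (Python supports fixed-width lookbehinds such as (?<!abc)).

```python
import re

text = "<Name>dbname1</Name>\n<Name>abcdbname4</Name>"

# Negative lookbehind: match dbnameN only when it is not directly preceded by abc.
with_lookbehind = re.sub(r"(?<!abc)(dbname\d+)", r"abc\1", text)

# Word boundary: there is no boundary between "c" and "d" in abcdbname4, so it is skipped.
with_boundary = re.sub(r"\bdbname(\d+)", r"abcdbname\1", text)

print(with_lookbehind)
print(with_boundary)
```

On this input both give the same result; they would differ for names with some other prefix glued directly onto dbname.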

Negative lookahead to match server directories not properly working

Given the following 3 example paths representing server paths, I am trying to create a skiplist for my FTP client via PCRE regular expressions but can't seem to get the desired result.
/subdir-level-1/subdir-level-2/.../Author1_-_Title1-(1234)-Publisher1
/subdir-level-1/subdir-level-2/.../Author2_-_Title2_(5678)-PUBLiSHER2
/subdir-level-1/subdir-level-2/.../Author3_-_Title3-4951-publisher3
I want to skip all folders (not paths) that do not end with
-Publisher1
I am trying to create a working pattern with the help of this online help and this regex tester, but I don't get any further than this negative lookahead pattern
.*-(?!Publisher1)
But with this pattern all lines match, because in all of them the substring up to the pattern does not contain the pattern.
/subdir/subdir/.../Author1_-_Title1-(1234) -Publisher1
/subdir/subdir/.../Author2_-_Title2_(5678) -PUBLiSHER2
/subdir/subdir/.../Author3_-_Title3-4951 -publisher3
What is my mistake and how would the correct pattern be just to match only the second and third line as line to be skipped but keep the first line?
EDIT to make it clearer what to highlight and what not.
Everything from the beginning of the path to the last slash must be ignored (allowed).
Everything after the last slash that matches the defined regex must be skipped.
EDIT to present an advanced pattern matching only the red part
[^/]*(?<!-Publisher2)$
The regex which you have used is:
.*-(?!Publisher1)
Here is what is wrong with it.
This regex matches any line that contains a - not immediately followed by Publisher1. Notice the other hyphens in your text, between author and title or after the title: each of them is a - not followed by Publisher1, so every line satisfies the condition. If instead the hyphen is part of the negative lookahead, the match has a chance of working.
So you might move the hyphen inside the parentheses and write the regex like this:
^.*(?!-Publisher1)
but this will not work either, because here .* consumes everything up to the end of the line, where nothing follows, so the negative lookahead trivially succeeds. Thus we will use a negative lookbehind instead.
.*(?<!-Publisher1)
What now? This still does not work. Why?
Because a negative lookbehind looks back and only checks that the current position is not preceded by -Publisher1.
This is subtle, so bear with me. Suppose your string is
/subdir/subdir/.../Author1_-_Title1-(1234)-Publisher1
We do a negative lookbehind for -Publisher1. From the position after the final 1, i.e. at the end of the string, -Publisher1 is visible when we look back, so the negative lookbehind fails there. The engine then backtracks one character to the left, to a position where looking back shows only "-Publisher" and no longer -Publisher1. The condition is now satisfied, so the regex matches everything up to that position, and the line still counts as a match.
So it is essential to anchor the lookbehind to the end of the string so that the engine cannot move one character to the left to find a match.
final regex:
.*(?<!-Publisher1)$
demo here : http://regex101.com/r/lE1vW2
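The effect of the $ anchor can be reproduced in Python with the three paths from the question. This is only a sketch; Python's re module is not PCRE, but fixed-width lookbehinds like this one behave the same way, and the variable names are made up.

```python
import re

paths = [
    "/subdir/subdir/.../Author1_-_Title1-(1234)-Publisher1",
    "/subdir/subdir/.../Author2_-_Title2_(5678)-PUBLiSHER2",
    "/subdir/subdir/.../Author3_-_Title3-4951-publisher3",
]

unanchored = re.compile(r".*(?<!-Publisher1)")
anchored = re.compile(r".*(?<!-Publisher1)$")

# Without "$" the engine backtracks one character and still reports a match.
partial = unanchored.match(paths[0]).group(0)

# With "$" only the folders that do not end in -Publisher1 match (and would be skipped).
skipped = [p for p in paths if anchored.match(p)]

print(partial)
print(skipped)
```

The unanchored match on the first path stops one character short of the end, which is exactly the backtracking step described above.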
This should suit your needs:
^.*(?<!-Publisher1)$
I want to skip all folders that do not end with -Publisher1
You can use this negative lookahead based regex:
^(?!.*?-Publisher1$).+$
You could use the following regex in order to exclude lines containing Publisher1:
^((?!Publisher1).)*$
Online demo: http://regex101.com/r/gD8jK0
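The two lookahead patterns are not interchangeable, which a small Python sketch can show. The first two paths come from the question; the third one is a hypothetical path invented here to have Publisher1 somewhere other than the end.

```python
import re

# Matches lines that do not END with -Publisher1.
not_ending = re.compile(r"^(?!.*?-Publisher1$).+$")
# Matches lines that do not CONTAIN Publisher1 anywhere.
not_containing = re.compile(r"^((?!Publisher1).)*$")

keep = "/subdir/subdir/.../Author1_-_Title1-(1234)-Publisher1"
skip = "/subdir/subdir/.../Author2_-_Title2_(5678)-PUBLiSHER2"
# Hypothetical path, not from the question: Publisher1 appears, but not at the end.
middle = "/subdir/subdir/.../Publisher1_-_Title4-(1111)-Other"

for path in (keep, skip, middle):
    print(path, bool(not_ending.match(path)), bool(not_containing.match(path)))
```

Only the first pattern would put the hypothetical third path on the skiplist, so the choice depends on whether Publisher1 may legitimately appear elsewhere in a folder name.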