VSCode chaining regex transforms in a snippet - regex

I'm trying to transform a filename automatically in VSCode in two ways.
Let's say I have test-file-name.md, I want to end up with Test File Name in my document.
Right now I can do both parts of the transform separately, but I'm struggling to find how to combine them.
To remove all the - and replace them with a space I do this: ${TM_FILENAME_BASE/[-]/ /g}
And for capitalizing the first letter of each word I do: ${TM_FILENAME_BASE/(\b[a-z])/${1:/upcase}/g} (the \b is probably useless in this case I guess)
I've tried multiple ways of writing them together, but I can't find a way to apply them one after the other.

Try this:
"body": "${TM_FILENAME_BASE/([^-]+)(-*)/${1:/capitalize}${2:+ }/g}"
Because of the g flag, it will match every occurrence and apply the transform to both capture groups each time. In your test case, (test-)(file-)(name), that is three times. It should work for any number of hyphenated words.
([^-]+) everything up to a hyphen.
${1:/capitalize} capitalize capture group 1.
${2:+ } means: if the 2nd capture group, the (-*), matched anything, add a space. I added this because there is no hyphen at the end of the name, so the 2nd capture group is empty there and no extra space is added at the end.
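To sanity-check the pattern outside VSCode, here is a rough Python emulation. It is only a sketch: transform_filename_base is a made-up helper name, and the callback imitates ${1:/capitalize} and ${2:+ } on the assumption that an empty group 2 inserts nothing.

```python
import re

def transform_filename_base(name: str) -> str:
    """Rough emulation of ${TM_FILENAME_BASE/([^-]+)(-*)/${1:/capitalize}${2:+ }/g}."""
    def repl(m: re.Match) -> str:
        word, hyphens = m.group(1), m.group(2)
        # Imitate :/capitalize by upper-casing only the first character.
        capitalized = word[:1].upper() + word[1:]
        # Imitate ${2:+ }: add a space only when group 2 matched something.
        return capitalized + (" " if hyphens else "")
    # re.sub replaces every non-overlapping match, like the g flag.
    return re.sub(r"([^-]+)(-*)", repl, name)

print(transform_filename_base("test-file-name"))  # Test File Name
```

The last word has no hyphen after it, so group 2 is empty there and no trailing space is produced.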

Related

Regular Expression: Two words in any order but with a string between?

I want to use positive lookaheads so that RegEx will pick up two words from two different sets in any order, but with a string between them of length 1 to 20 that is always in the middle.
It is already case insensitive and allows any number of characters (including 0) before the first word found and after the second word found. I am unsure whether it is more correct to terminate in $.
Without the any order matching I am so far as:
(?i:.*(new|launch|releas)+.{1,20}(product1|product2)+.*)
I have attempted to add any order matching with the following but it only picks up the first word:
(?i:.*(?=new|launch|releas)+.{1,20}(?=product1|product2)+.*)
I thought perhaps this was because of the +.{1,20} in the middle but I am unsure how it could work if I add this to both sets instead, as for instance this could cause a problem if the first word is the very first part of the source text it is parsing, and so no character before it.
I have seen examples where \b is used with lookaheads, but that also seems like it may cause a problem, as I want it to match when the first word is at the start of the source text but also when it is not.
How should I edit my RegEx here please?

Regex search string contains with specific count of letter

I am trying to work out a regex that matches strings containing a given count of each letter, where the letter groups are not in any specific order,
such as:
AAABBBCCCDDD
BBBAAADDDCCC
CCCAAABBBDDD
These should all be TRUE.
So far I have A{3}B{3}C{3}D{3}, which matches the first line, but the other lines need a different order.
Is there a good solution that handles all of them?
You can match and capture a letter, then backreference that captured character. Repeat the whole thing as many times as needed, which looks to be 4 here:
(?:([A-Z])\1{2}){4}
https://regex101.com/r/vrQVgD/1
If the same character can't appear as a sequence more than once, I don't think this can be done in such a DRY manner; you'll need separate capture groups:
([A-Z])\1{2}(?!\1)([A-Z])\2{2}(?!\1|\2)([A-Z])\3{2}(?!\1|\2|\3)([A-Z])\4{2}
https://regex101.com/r/vrQVgD/2
which is essentially 4 of a variation on the below put together:
(?!\1|\2|\3)([A-Z])\4{2}
The (?!\1|\2|\3) checks that the next character hasn't occurred in any of the previously matched capture groups.
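Both patterns can be tried quickly in Python. This is just a sketch: the names ANY_ORDER and DISTINCT are made up, and fullmatch is used on the assumption that the whole string has to match (the patterns above carry no anchors of their own).

```python
import re

# Four runs of three identical letters, repeats allowed.
ANY_ORDER = re.compile(r"(?:([A-Z])\1{2}){4}")

# Same, but each run must use a letter not seen in an earlier run.
DISTINCT = re.compile(
    r"([A-Z])\1{2}(?!\1)([A-Z])\2{2}(?!\1|\2)([A-Z])\3{2}(?!\1|\2|\3)([A-Z])\4{2}"
)

samples = ["AAABBBCCCDDD", "BBBAAADDDCCC", "CCCAAABBBDDD", "AAAAAABBBCCC", "AAABBBCCCDD"]
for s in samples:
    print(s, bool(ANY_ORDER.fullmatch(s)), bool(DISTINCT.fullmatch(s)))
```

AAAAAABBBCCC is the interesting case: the first pattern accepts it (AAA twice), while the second rejects it because of (?!\1).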

Regex taking too many characters

I need some help with building up my regex.
What I am trying to do is match a specific part of text with unpredictable parts in between the fixed words. An example is the sentence one gets when replying to an email:
On date at time person name has written:
The cursive parts are variable and might contain spaces, or a new line might start at that point.
To get this, I built up my regex as such: On[\s\S]+?at[\s\S]+?person[\s\S]+?has written:
Basically, the [\s\S]+? is supposed to fill in any letter, number, space or break/new line, as I am unable to predict what could be between the fixed words that I am sure will always be there.
Now comes the hard part, when I would add the word "On" somewhere in the text above the sentence that I want to match, the regex now matches a much bigger text than I want. This is due to the use of [\s\S]+.
How can I make my regex match as few characters as possible? Adding "?" after the "+" to make it lazy does not help.
Example is here with words "From - This - Point - Everything:". Cases are ignored.
Correct: https://regexr.com/3jdek.
Wrong because of added "From": https://regexr.com/3jdfc
The regex is to be used in VB.NET
A more real-life example, with HTML tags, can be found here. Here, I avoided using [\s\S]+? or (.+)?(\r)?(\n)?(.+?)
Correct: https://regexr.com/3jdd1
Wrong: https://regexr.com/3jdfu after adding certain parts of the regex in the text above. Although this is barely possible in HTML, as the user would never write the matching tag himself, I do want to make sure my regex is correct, just in case.
These things are certain: I know what the part of text starts with, no matter where it sits in the entire text, I know what the part of text ends with, and there are specific fixed words that might make the regex more reliable, but they can be omitted. Any text below the searched part is also allowed to be matched, but no text above may be matched at all.
Another example where it goes wrong: https://regexr.com/3jdli. Basically, I have less to go with in this text, so the regex has less tokens to work with. Adding just the first < already makes the regex take too much.
From my own experience, most problems are avoided when making sure I do not use any [\s\S]+? before I did a (\r)?(\n)? first
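[\s\S] matches any character because it is the union of two complementary sets; it behaves like . with the special option /s (dot matches newlines). Quantifiers are greedy by default, so the largest match is returned. A lazy quantifier only shortens the match on the right-hand side: the engine still starts at the leftmost position where a match is possible, which is why an earlier "On" or "From" gets pulled in.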
In the "Correct" link, the token just after the shortest match must be geschreven, so another, more flexible way to write this without lazy expansion is to put a negative lookahead in front of the repeated character set, inside the loop,
so
<blockquote type="cite" [^>]+?>[^O]+?Op[^h]+?heeft(.+?(?=geschreven))geschreven:
becomes
<blockquote type="cite" [^>]+?>[^O]+?Op[^h]+?heeft((?:(?!geschreven).)+)geschreven:
(?: ) is for non capturing the group which just encapsulates the negative lookahead and the . (which can be replaced by [\s\S])
(?! ) inside is the negative lookahead which ensures current position before next character is not the beginning of end token.
Following the comments, you can state explicitly what must not appear in the repeated sequence:
From(?:(?!this)[\s\S])+this(?:(?!point)[\s\S])+point(?:(?!everything)[\s\S])+everything:
or
From(?:(?!From|this)[\s\S])+this(?:(?!point)[\s\S])+point(?:(?!everything)[\s\S])+everything:
or
From(?:(?!From|this)[\s\S])+this(?:(?!this|point)[\s\S])+point(?:(?!everything)[\s\S])+everything:
To understand what the technique (?:(?!tokens)[\s\S])+ does:
in the first, this can't appear between From and this
in the second, From or this can't appear between From and this
in the third, additionally, this or point can't appear between this and point
etc.
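A quick way to see the difference is to run the lazy pattern and the lookahead version side by side. The sketch below uses Python rather than VB.NET, and the sample text is made up; only the two patterns come from the discussion above.

```python
import re

# Made-up sample: an extra "From" appears above the sentence we want.
text = "From somewhere else.\nFrom this point everything:"

lazy = re.compile(r"From[\s\S]+?this[\s\S]+?point[\s\S]+?everything:", re.IGNORECASE)

tempered = re.compile(
    r"From(?:(?!From|this)[\s\S])+this"
    r"(?:(?!point)[\s\S])+point"
    r"(?:(?!everything)[\s\S])+everything:",
    re.IGNORECASE,
)

lazy_match = lazy.search(text).group(0)
tempered_match = tempered.search(text).group(0)

print(repr(lazy_match))      # starts at the first "From", so it spans both lines
print(repr(tempered_match))  # 'From this point everything:'
```

The lazy version still starts at the first From. The lookahead version cannot cross the second From, so the attempt from the first one fails and the match starts at the second.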

Regex specific word after equals sign

The problem is to find and replace a word throughout a file. I'm trying to rename several fields but not edit the system names for those fields. In the end, I need to find/replace Issue, Issues, issue and issues. I'm using NetBeans (find and replace with regex) to open the .properties file which contains this code, but I'm open to using something else.
The text to the right of the equals sign needs to be replaced. Sometimes there are periods next to the word on the right, most of the time there aren't. I'm trying to use regex because there are around 10000 lines in the file (the goal is to change the display text of system fields without changing the field references themselves).
Example of text to be searched:
issue.columns.admin.title=Issue Navigator Default Columns
browseproject.issues.by.status.more=View these issues in the Issue Navigator
sorted by Status
issue.operations.voting.resolved=You cannot vote or change your vote on
resolved issues.
I'm using the following pattern in NetBeans, which gets around 90% correct:
(?<!([=.[a-z]]))issue(?!([[a-z]]))
However, it also matches the 'issue.' in fields like 'issue.operations.voting.resolved', which means that finding and replacing would cause problems by changing reference to the system fields. Is there a way to add to what I have already done to make it match words with periods appearing after the equals sign but not before?
You may use
(?i)(\G(?!^)|^[^=\n\r]*=)(.*?)\bissues?\b
Replace with $1$2<SOME_REPLACEMENT>.
See the regex demo.
It matches:
(?i) - an inline case insensitive modifier making the pattern case insensitive
(\G(?!^)|^[^=\n\r]*=) - Group 1 (referred to with $1 from the replacement pattern): either the location after the previous match (\G(?!^)) or the start of a line that is followed with 0+ chars other than = and line break chars and then a =
(.*?) - Group 2 (referred to with $2 from the replacement pattern): any 0+ chars, as few as possible, up to the first occurrence of...
\bissues?\b - a whole word issue or issues
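If a script is an option instead of the editor's find/replace, the same effect can be sketched in Python. Note this swaps in a different technique: Python's built-in re module has no \G, so the sketch splits each line at the first = and only rewrites the right-hand side. The function name replace_values and the replacement text are placeholders, not anything from NetBeans.

```python
import re

# Whole word "issue" or "issues", case-insensitive (same as (?i)\bissues?\b).
WORD = re.compile(r"\bissues?\b", re.IGNORECASE)

def replace_values(lines, replacement):
    """Replace the word only to the right of the first '=' on each line."""
    out = []
    for line in lines:
        key, sep, value = line.partition("=")
        if sep:  # lines without '=' are left untouched
            line = key + sep + WORD.sub(replacement, value)
        out.append(line)
    return out

sample = [
    "issue.columns.admin.title=Issue Navigator Default Columns",
    "browseproject.issues.by.status.more=View these issues in the Issue Navigator",
]
for line in replace_values(sample, "<SOME_REPLACEMENT>"):
    print(line)
```

Like the regex above, this leaves the key on the left of the = alone and also skips lines that contain no = at all.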

Find/Match every similar words in word list in notepad++

I have a word list in alphabetical order.
It is arranged as a single column.
I do not use any programming languages.
The list is a plain text file opened in Notepad++.
I need to match all similar words and put them on the same line.
I use regex but I can't achieve correct results.
First list is like:
accept
accepted
accepts
accepting
calculate
calculated
calculates
calculating
fix
fixed
A list I want:
accept accepted accepts accepting
calculate calculated calculates calculating
fix fixed
This seems to work, but you will have to do Replace All multiple times:
Find (^(.+?)\s*?.*?)\R\2 and replace with \1\t\2. The ". matches newline" option should be disabled.
How it works:
It finds some characters at the start of line ^(.+?), then any linebreak \R, and those same characters again \2.
\s*?.*? is used to skip unnecessary characters after multiple Replace All. \s*? skips the first whitespace, and .*? any remaining chars on the line.
Match is replaced with \1\t\2, where \1 is anything matched in (^(.+?)\s*?.*?), and \2 is anything matched with (.+?). \t is used to insert tab character to replace linebreak.
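The repeated Replace All can be imitated in Python to see what each pass does. This is only an approximation of Notepad++: Python's re has no \R, so \n stands in for it, and merge_lines is a made-up helper that repeats the replacement until nothing changes.

```python
import re

# \n stands in for Notepad++'s \R; MULTILINE lets ^ match at every line start.
PATTERN = re.compile(r"(^(.+?)\s*?.*?)\n\2", re.MULTILINE)

def merge_lines(text: str) -> str:
    """Repeat the replacement (like pressing Replace All) until nothing changes."""
    while True:
        text, count = PATTERN.subn(r"\1\t\2", text)
        if count == 0:
            return text

words = "\n".join([
    "accept", "accepted", "accepts", "accepting",
    "calculate", "calculated", "calculates", "calculating",
    "fix", "fixed",
])
merged = merge_lines(words)
print(merged)
```

Each pass joins pairs of neighbouring lines, so the sample list needs two passes before every group sits on one tab-separated line.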
How it breaks:
Note that this will not work well with different words that share a prefix, like:
hand
hands
handle
handles
This will be hand hands handle handles after 2 replaces.
I can imagine doing this programmatically with limited success (take the first word as a root; if the next word is derived from this root, place it on the same line, otherwise take it as a new root and put it on a new line). This will still fail for irregular words where the root is not the same for all forms.
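A minimal sketch of that root-based idea, in Python. It assumes "derived from the root" simply means "starts with the root", which is a simplification; group_by_root is a made-up name.

```python
def group_by_root(words):
    """Group consecutive words that start with the first word of the current group."""
    groups = []
    for word in words:
        if groups and word.startswith(groups[-1][0]):
            groups[-1].append(word)   # derived from the current root
        else:
            groups.append([word])     # start a new root on a new line
    return groups

words = [
    "accept", "accepted", "accepts", "accepting",
    "calculate", "calculated", "calculates", "calculating",
    "fix", "fixed",
]
groups = group_by_root(words)
for group in groups:
    print(" ".join(group))
```

On the sample list this already shows the limits: calculating does not start with calculate, so it ends up on its own line.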
Without programming, the only way is (manual) preprocessing: if there are fewer than 4 forms for a given word in the list, you insert a blank line for each missing verb form, so there are always 4 lines for each word. Then you can use a regex to put each such quadruple on one line.