Notepad++ regex to capitalise the first word of selected sentence

Relates to customising Notepad++. I know TextFX 'Sentence case.' exists but I wanted to control this using my own regex/macro.
Testing against: hello my name is john. hello my name is john. hello my name is john.
Currently I have this which works fine when nothing is selected/highlighted with the mouse.
Find what: ((?<=^|(?<=[.!?]\s)))(\w)
Replace with: \u$0
However, when I select/highlight the second (middle) sentence only (starting at the h and finishing on the period .), the regex does nothing. Note: I have 'Use selection' ticked in N++ and am using 'Replace All'.
This makes sense because the regex is looking for the start of a line or the char pattern .!? followed by a space.
My question is how to alter the regex so that it works when selecting/highlighting any sentence, no matter if it isn't at the beginning of a line as per my example.
I have tried adding in a negative lookbehind to match when no characters are found but I only managed to uppercase the first word of every sentence.

The ^ matches the start of a line, while your selected region does not begin at a line start. You may replace it with \A, which matches the start of the string being searched. Since \A can match again at each position where the search resumes within the selected region, you cannot use a single \w; add + after it so that the whole word is consumed and the subsequent word chars are not each turned to upper case.
Use
(?<=\A|(?<=[.!?]\s))(\w+)
and replace with \u$0.
An alternative is to use capturing groups (this also lets you match cases where more than one whitespace char separates !, ? or . from the next word char):
(\A|[.!?]\s+)(\w+)
to replace with $1\u$2.
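Outside Notepad++, the capturing-group idea can be sketched in JavaScript. This is only an illustration under some assumptions: JavaScript has neither \A nor the \u replacement escape, so the sketch relies on ^ without the m flag (which matches only at the very start of the string, here standing in for the selection) and a replacer function. The helper name capitaliseSentences is made up for this example.

```javascript
// Hypothetical helper: capitalise the first word char of every sentence
// in a selected fragment of text.
function capitaliseSentences(selection) {
  // Group 1: start of the fragment, or sentence punctuation plus whitespace.
  // Group 2: the first word char that follows, which gets upper-cased.
  return selection.replace(
    /(^|[.!?]\s+)(\w)/g,
    (match, lead, first) => lead + first.toUpperCase()
  );
}

console.log(capitaliseSentences("hello my name is john. hello my name is john."));
```

Because the replacer only touches the single captured char, there is no need for \w+ here; that workaround is specific to how \A behaves in Notepad++.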

Using a regex to append to the end of non-blank lines

I would think that this would be a common question, but I can't find anybody asking how to do this. There are people asking how to do the opposite (find blank lines) and add a <br><br> at the end of each one. For human readability, this document has blank lines between paragraphs.
(I don't want to replace the blank lines with <br><br>. I know this would achieve the same result, but for human readability and personal preference, I don't like how this makes the document one giant block of text.)
How can I write a regex that captures -- I don't know if this is the right word to use; maybe "groups"? -- the end of lines that aren't blank so that I can append to the end of them?
I am using Visual Studio code, so I'd like this to work in the search/replace box:
I'm assuming in the replacement box above, I'd need to say $some group number(s?), so I just said $x as a temporary placeholder. Here's what I've tried as search patterns:
^(?!:($))$
^(?!:(\S$))$
^(?!:([^\S]$))$
^(?!:([^\s]$))$
^(?!([^\S]+))$
All of these seem to grab the inverse of what I'm trying to find. I guess my strategy has been, between the beginning and end of the line, there shouldn't be only whitespace. But I'm pretty sure that's not what I'm saying.
You can use
Find What:      (\S)[^\S\n]*(\n)
Replace With: $1<br><br>$2
NOTE: The above replacement will not add the <br>s at the end of the last line if it is not blank. If you need that, use
Find What:      (\S)[^\S\n]*$
Replace With: $1<br><br>
The first regex matches the last non-whitespace char on a line (capturing it in Group 1 to keep it), then matches horizontal whitespace (if any), and then captures the line break in Group 2 so that it is kept in the output.
Details
(\S) - Group 1: any non-whitespace char
[^\S\n]* - zero or more horizontal whitespace chars
(\n) - Group 2: line break.
$ - end of a line (note that m flag (in its PCRE meaning) is always on, by default, in VSCode regex).
The replacement is $1<br><br>$2, Group 1 value + <br><br> + Group 2 value (if you use the first regex).
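For anyone who wants to check the second pattern outside the VS Code search box, here is a minimal JavaScript sketch. The function name appendBreaks and the sample text are assumptions made for illustration; the m flag is set explicitly because, unlike in VS Code, it is not on by default in JavaScript.

```javascript
// Hypothetical helper: append <br><br> to every line that contains
// at least one non-whitespace char, leaving blank lines untouched.
function appendBreaks(text) {
  // (\S)       Group 1: last non-whitespace char on the line
  // [^\S\n]*   trailing horizontal whitespace (matched, so it is dropped)
  // $          end of line, thanks to the m flag
  return text.replace(/(\S)[^\S\n]*$/gm, "$1<br><br>");
}

console.log(appendBreaks("First paragraph.  \n\nSecond paragraph."));
```

Note that any trailing spaces on a line are consumed by the match and therefore removed from the output.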
is changed into
This works to retain the spaces at the end of lines:
Find: (?<=^.*)(\S+.*)
Replace: $1<br><br>

RegEx help for NotePad++

I need help with a regex that I just can't figure out. I need to search for broken hashtags that contain a space.
So the strings are for Example:
#ThisIsaHashtagWith Space
But the words "With Space" could also appear on their own, and I don't want to replace those.
So the important part is that the string starts with "#", followed by any characters, and then the words "With Space", which I want to replace with "WithSpace" to repair the hashtag.
I have a document with 10k of these broken hashtags and I've been trying the whole day without success.
I have tried on regex101.com
with following RegEx:
^#+(?:.*?)+(With Space)
Although I think it works on regex101.com, it doesn't in Notepad++.
Any help is appreciated.
Thanks a lot.
BR
In your current regex you match a # and then any character and in a capturing group match (With Space).
You could change the capturing group to capture the first part of the match.
(#+.*?)With Space
Then you could use that group in the replacement:
$1WithSpace
As an alternative, you could first match one or more # followed by any character zero or more times, non-greedily (.*?), and then use \K to reset the starting point of the reported match.
Then match With Space.
#+(?:.*?)\KWith Space
In the replacement use WithSpace
The quantifier + after # lets the match start with one or more # characters. If the match should start at the beginning of a line, you could add the anchor ^ at the start of the regex.
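The capturing-group variant can also be tried in JavaScript, which has no \K. This is a rough sketch, assuming one hashtag per line and that only lines beginning with # should be changed; the helper name repairHashtags and the sample input are invented for the example.

```javascript
// Hypothetical helper: join "With Space" into "WithSpace", but only on
// lines that start with a hashtag.
function repairHashtags(text) {
  // ^ with the m flag anchors at each line start; Group 1 keeps the
  // hashtag prefix so it can be put back in the replacement.
  return text.replace(/^(#+.*?)With Space/gm, "$1WithSpace");
}

console.log(repairHashtags("#ThisIsaHashtagWith Space\nPlain text With Space"));
```

The second line is left alone because it does not start with #, which is the distinction the question asks for.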
Try using ^(#.+?)(With\s+Space) for your regex, as it also matches multiple spaces and tab characters. If you test it on regex101 against multiple rows, set the flags to gmi. I just tried it with the following two strings, each on a separate line in Notepad++:
#blablaWith Space
#hello###$aWith Space
The replace with value is set to $1WithSpace and I've tried both replaceAll and replace one by one - seems to result in the following.
#blablaWithSpace
#hello###$aWithSpace
Feel free to comment with other strings you want replaced. Also be sure that you have selected the Regular expression search mode in NPP.
Try this? (#.*)( ).
I tried this in Notepad++ and you should be able to just replace all with $1. Make sure you set the find mode to regular expressions first.
// Group 1 greedily captures everything up to the last space; replacing with $1 drops that space.
const str = "#ThisIsAHashtagWith Space";
console.log(str.replace(/(#.*)( )/g, "$1"));

Regex match till end of text

I'm using Regex to match whole sentences in a text containing a certain string. This is working fine as long as the sentence ends with any kind of punctuation. It does not work however when the sentence is at the end of the text without any punctuation.
This is my current expression:
[^.?!]*(?<=[.?\s!])string(?=[\s.?!])[^.?!]*[.?!]
Works for:
This is a sentence with string. More text.
Does not work for:
More text. This is a sentence with string
Is there any way to make this work as intended? I can't find any character class for "end of text".
End of text is matched by the anchor $, not a character class.
You have two separate issues you need to address: (1) the sentence ending directly after string, and (2) the sentence ending sometime after string but with no end-of-sentence punctuation.
To do this, you need to make the match after string optional, but anchor that match to the end of the string. This also means that, after you recognize an (optional) end-of-sentence punctuation mark, you need to match everything that follows, so the end-of-string anchor will match.
My changes: Take everything after string in your original regex and surround it in (?:...)? - the (?:...) being a "non-remembered" group, and the ? making the entire group optional. Follow that with $ to anchor the end of the string.
Within that optional group, you also need to make the end-of-sentence itself optional, by replacing the simple [.?!] with (?:[.?!].*)? - again, the (?:...) is to make a "non-remembered" group, the ? makes the group optional - and the .* allows this to match as much as you want after the end-of-sentence has been found.
[^.?!]*(?<=[.?\s!])string(?:(?=[\s.?!])[^.?!]*(?:[.?!].*)?)?$
The symbol for end-of-text is $ (and, the symbol for beginning-of-text, if you ever need it, is ^).
You probably won't get what you're looking for by just adding the $ to your punctuation list, though (e.g., [.?!$]), because inside a character class $ is a literal dollar sign. You'll find it works better as an alternative choice: ([.?!]|$).
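As a rough illustration of the ([.?!]|$) idea, here is a JavaScript sketch. It keeps the lookbehind from the question, lets the lookahead after the word also accept end of text, and allows the sentence to end with either punctuation or end of text. The helper name findSentence is made up, and the sketch assumes the search word contains no regex metacharacters and that the runtime supports lookbehind.

```javascript
// Hypothetical helper: return the sentence containing `word`, even when
// the text ends without closing punctuation. Returns null if not found.
function findSentence(text, word) {
  const pattern = new RegExp(
    "[^.?!]*(?<=[.?\\s!])" + word + "(?=[\\s.?!]|$)[^.?!]*(?:[.?!]|$)"
  );
  const match = text.match(pattern);
  return match ? match[0].trim() : null;
}

console.log(findSentence("This is a sentence with string. More text.", "string"));
console.log(findSentence("More text. This is a sentence with string", "string"));
```

Without the m flag, $ here means the end of the whole input, which is what "end of text" calls for.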
Your regex is way too complex for what you want to achieve.
To match only a word just use
"\bstring\b"
It will match start, end and any non-alphanum delimiters.
It works with the following:
string is at the start
this is the end string
this is a string.
stringing won't match (you don't want a match here)
You should mention the language in the question so that answers can be more specific.
Here is my example using javascript:
var reg = /^([\w\s\.]*)string([\w\s\.]*)$/;
console.log(reg.test('This is a sentence with string. More text.'));
console.log(reg.test('More text. This is a sentence with string'));
console.log(reg.test('string'))
Note:
* : Match zero or more times.
? : Match zero or one time.
+ : Match one or more times.
You can change * with ? or + if you want more definition.

Regex: Find multiple matching strings in all lines

I'm trying to match multiple strings in a single line using regex in Sublime Text 3.
I want to match all values and replace them with null.
Part of the string that I'm matching against:
"userName":"MyName","hiScore":50,"stuntPoints":192,"coins":200,"specialUser":false
List of strings that it should match:
"MyName"
50
192
200
false
Result after replacing:
"userName":null,"hiScore":null,"stuntPoints":null,"coins":null,"specialUser":null
Is there a way to do this without using sed or any other substitution method, but just by matching the wanted pattern in regex?
You can use this find pattern:
:(.*?)(,|$)
And this replace pattern:
:null\2
The first group will match any symbol (dot) zero or more times (asterisk) with this last quantifier lazy (question mark), this last part means that it will match as little as possible. The second group will match either a comma or the end of the string. In the replace pattern, I substitute the first group with null (as desired) and I leave the symbol matched by the second group unchanged.
Here is an alternative to amaurs's answer that doesn't need to put the comma back in after each substitution:
:\K(.*?)(?=,|$)
And this replacement pattern:
null
This works like amaurs's answer, but starts matching after the colon is found (using \K to reset the match starting point) and matches until a comma or the end of the line (using a positive lookahead).
I have tested and this works in Sublime Text 2 (so should work in Sublime Text 3)
Another slightly better alternative to this is:
(?<=:).+?(?=,|$)
which uses a positive lookbehind instead of resetting the regex starting point
Another good alternative (so far the most efficient here):
:\K[^,]*
This may help.
Find: (?<=:)[^,]*
Replace: null
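The lookbehind version carries over to JavaScript more or less unchanged (\K does not exist there). A small sketch follows, assuming that no value contains a comma and that the runtime supports lookbehind; the function name nullifyValues is invented for the example.

```javascript
// Hypothetical helper: replace every value after a colon with null.
// Assumes values never contain a comma, as in the sample line.
function nullifyValues(line) {
  // (?<=:)  only start matching right after a colon
  // [^,]*   consume the value up to the next comma or the end of the line
  return line.replace(/(?<=:)[^,]*/g, "null");
}

const sample =
  '"userName":"MyName","hiScore":50,"stuntPoints":192,"coins":200,"specialUser":false';
console.log(nullifyValues(sample));
```

If the values can contain commas or nested objects, parsing the data as JSON is likely a safer route than a regex.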

Regex that selects everything after first consecutive capitalized words

I'd like to select everything after the first few consecutive capitalized words. ie:
Terry Smith is a good school teacher. She works tirelessly.
would become;
is a good school teacher. She works tirelessly.
So far this doesn't work:
(^[A-Z][a-z]+(?=\s[A-Z])(?:\s[A-Z][a-z]+)+)([\s\S]*)
I'm using it in Drupal's feeds tamper plugin with the "find replace regex" feature in order to replace everything after "Terry Smith" with blank space.
The following expression will match all consecutive capitalized words at the beginning of the sentence.
^(?:(?:[A-Z][a-z]+)(?>\s*))+
If you want to remove that part from the sentence, then all you have to do is replace it with the empty string.
If you want to replace the part that comes after it then you can use the following expression:
^((?:(?:[A-Z][a-z]+)(?>\s*))+)([\s\S]+)
and use a replacement string of $1 or whatever in your language that is used to reference the first captured group.
This will find the capital words:
[A-Z][a-z]+(?=\b)\s*
You might want to replace the + with * after [a-z] to also match single-character capital words.
To get all capitalized words at the beginning of the string, add ^( and )+ around it:
^([A-Z][a-z]+(?=\b)\s*)+
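Both directions (dropping the leading capitalized words, or keeping only them) can be sketched in JavaScript. JavaScript has no atomic groups, so this sketch uses a plain non-capturing group instead of (?>...); the function names are made up, and the PCRE patterns above remain the ones to use in Drupal.

```javascript
// Hypothetical helper: remove the run of capitalized words at the start.
function stripLeadingCapitalisedWords(text) {
  return text.replace(/^(?:[A-Z][a-z]+\s*)+/, "");
}

// Hypothetical helper: keep only that run and drop everything after it.
function keepLeadingCapitalisedWords(text) {
  // Group 1 holds the capitalized words; [\s\S]* swallows the rest.
  return text.replace(/^((?:[A-Z][a-z]+\s*)+)[\s\S]*$/, "$1").trim();
}

const sentence = "Terry Smith is a good school teacher. She works tirelessly.";
console.log(stripLeadingCapitalisedWords(sentence));
console.log(keepLeadingCapitalisedWords(sentence));
```

Text that does not start with a capitalized word is returned unchanged by both helpers.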