How to find capital letter using RegExp? - regex

I need a simple solution. I have a text that is improperly punctuated and in many places a comma is followed by a capital letter. Example: Here you are, You sicko. A comma followed by a cap. Any string to find these? ,\w doesn't work. I only want caps.
I only know basic regex. I'll use it to search in Notepad++
Thank you.

Try this one:
, [A-Z]
In the general case, for any punctuation:
[.,!?\\-]+ [A-Z]+
Link: https://regex101.com/r/BrGZmF/1
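For readers who want to try the first pattern outside Notepad++, here is a minimal Python sketch. The helper name find_comma_capitals and the sample text are made up for illustration. Note that ,\w fails in the question because \w matches any word character (not only capitals) and the comma is followed by a space.

```python
import re

# Pattern from the answer: a comma, one space, then an uppercase ASCII letter.
COMMA_THEN_CAPITAL = re.compile(r", [A-Z]")

def find_comma_capitals(text):
    """Return (offset, matched_text) pairs for each comma followed by a capital."""
    return [(m.start(), m.group()) for m in COMMA_THEN_CAPITAL.finditer(text)]

sample = "Here you are, You sicko. No problem, really."
print(find_comma_capitals(sample))  # [(12, ', Y')]
```

The lowercase ", really" is skipped, which is the behavior the question asks for.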

Related

Regex to replace first lowercase character in a line into uppercase

I have a very large file containing thousands of sentences. In all of them, the first word of each sentence begins with lowercase, but I need them to begin with uppercase.
I looked through the site trying to find a regex to do this but I was unable to. I learned a lot about regex in the process, which is always a plus for my job, but I was unable to find specifically what I am looking for.
I tried to find a way of compiling the code from several answers, including the following:
Convert first lowercase to uppercase and uppercase to lowercase (regex?)
how to change first two uppercase character to lowercase character on each line in vim
Regex, two uppercase characters in a string
Convert a char to upper case using regular expressions (EditPad Pro)
But for different reasons none of them served my purpose.
I am working with a translation-specific application which accepts regex.
Do you think this is possible at all? It would save me hours of tedious work.
You can use this regex to search for the first letters of sentences:
(?<=[\.!?]\s)([a-z])
It matches a lowercase letter [a-z], following the end of a previous sentence (which might end with one of the following: [\.!?]) and a space character \s.
Then make a substitution with \U$1.
The only case it misses is the very first sentence. I intentionally kept the regex simple, because it's easy to capitalize the very first letter manually.
Working example: https://regex101.com/r/hqwK26/1
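If the text can be processed with a script instead, the same idea can be sketched in Python. Python's built-in re module has no \U in replacement strings, so this sketch uses a replacement function; the function name and the sample text are assumptions for illustration.

```python
import re

# A lowercase ASCII letter preceded by sentence-ending punctuation
# and exactly one whitespace character (fixed-width lookbehind).
SENTENCE_START = re.compile(r"(?<=[.!?]\s)([a-z])")

def capitalize_sentence_starts(text):
    # The callable receives each match and returns the uppercased letter.
    return SENTENCE_START.sub(lambda m: m.group(1).upper(), text)

sample = "first one. second one! third one? done."
print(capitalize_sentence_starts(sample))
# first one. Second one! Third one? Done.
```

As with the original pattern, the very first sentence is left untouched because nothing precedes it.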
UPD: If your software doesn't support \U, you might want to copy your text to Notepad++ and make the replacement there. \U is fully supported there; I just checked.
UPD2: According to the comments, the task is slightly different, and just the first letters of each line should be capitalized.
There is a simple regex for that: ^([a-z]), with the same substitution pattern.
Here is a working example: https://regex101.com/r/hqwK26/2
Taking Ildar's answer and combining both of his patterns should work with no compromises.
(?<=[\.!?]\s)([a-z])|^([a-z])
This is basically saying: match the first pattern OR the second pattern. Because you are now technically capturing two groups instead of one, you also have to reference the second group as $2. That is fine, because only one of the two alternatives matches at a time, so the other group is empty.
So your substitution pattern would then be as follows...
\U$1$2
Here's a working example, again based on Ildar's answer...
https://regex101.com/r/hqwK26/13
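A rough Python equivalent of the combined pattern is sketched below. It assumes re.MULTILINE so that ^ matches at the start of every line, and it uppercases the whole match with a function, which sidesteps the $1$2 question since only one group participates in each match. The names are illustrative.

```python
import re

# Either alternative may match: a letter after sentence punctuation,
# or a letter at the start of a line (re.MULTILINE makes ^ per-line).
FIRST_LETTER = re.compile(r"(?<=[.!?]\s)([a-z])|^([a-z])", re.MULTILINE)

def capitalize_first_letters(text):
    # group(0) is the single matched letter, whichever alternative fired.
    return FIRST_LETTER.sub(lambda m: m.group(0).upper(), text)

sample = "first line. second sentence.\nthird line? yes."
print(capitalize_first_letters(sample))
# First line. Second sentence.
# Third line? Yes.
```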

Regex for extracting each word between hyphens

I am learning regex and trying to write a pattern that exactly matches each of the strings without '-', so that I can iterate over the groups and print the respective strings.
I have a string that looks like "Abcd001-wd2s-vwe1-20180e3103.txt"
I was able to write a regex for extracting Abcd001, wd2s and .txt from above text as shown below
(\A[^-]+) => Abcd001
(-[^-]+-) => wd2s
(\..*) => .txt
However, I was unable to come up with the correct pattern for extracting the exact strings vwe1 and 20180e3103
It will be really helpful if you can guide me on this or if there is a better approach to achieve this?
Please note: [^-.]+ may give me all the words separately, but I am looking for an option where I have a group defined for each of these strings, so that it's a one-to-one mapping.
Thanks!
To get vwe1 or 20180e3103 from the example data, you might use a quantifier {2} or {3} to repeat matching one or more word characters followed by a hyphen: (?:\w+-){2}.
Then you could capture in a group ([^-.]+) matching not a hyphen or a dot.
(?:\w+-){2}([^-.]+)
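To see how the repeat count selects the field, here is a small Python sketch. The helper nth_field is hypothetical; it simply builds the pattern above with n - 1 as the quantifier and matches from the start of the string.

```python
import re

def nth_field(name, n):
    """Return the n-th hyphen-separated field (1-based), or None if absent.

    Skips n - 1 'word-' prefixes, then captures characters that are
    neither a hyphen nor a dot.
    """
    pattern = r"(?:\w+-){" + str(n - 1) + r"}([^-.]+)"
    m = re.match(pattern, name)
    return m.group(1) if m else None

filename = "Abcd001-wd2s-vwe1-20180e3103.txt"
print(nth_field(filename, 3))  # vwe1
print(nth_field(filename, 4))  # 20180e3103
```

Using re.match anchors the pattern at the beginning, so the skipped prefixes are always counted from the first field.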
Try the below regex
/\-([^\)]+)\-/gmi;
Also check the similar implementation:
https://stackoverflow.com/a/50336050/8179245

Stop when meet special word - Regex

I am looking for help with a specific regex: (THE_WORD_I_WANT_TO_FIND)[^.?!\w]+([^.?!\s]+[^.?!\w]+){0,NUMBER_OF_WORDS}(MY_WORD_AT_END)
To explain, I'm looking for a specific word that appears before another word. I have some conditions: the match must stay within the sentence that contains the WORD_AT_END, and within a specific number of words before it.
This regex does the job but I want to add a sentence delimiter : (\s\-\s) (in addition to . ? !).
Example :
Blablabla. A full Reference - Help is available in the Library, or watch the video Tutorial.
With this text, the regex (Help)[^.?!\w]+([^.?!\s]+[^.?!\w]+){0,}(watch) matches, and (Reference)[^.?!\w]+([^.?!\s]+[^.?!\w]+){0,}(watch) must not match...
Could you please help me?
Thank you !
SOLUTION (Thanks to #MostafaHussein) :
(Help)((?!\s-\s)\s(([\w|\w-|\pL|\pL-])+(?!\s-\s)\s+){0,})?(watch)
Here, - is a sentence delimiter if it is surrounded by two spaces.
The following Regex:
(Help)\s(?!-)(?s).+?(watch)
would match only:
Help is available in the Library, or watch
And not:
Reference - Help is available in the Library, or watch
This is because the lookahead (?!-) rejects a - found right after the first specified word and the space that follows it, e.g. Reference -
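The behavior can be checked with a short Python sketch. This is an adaptation, not the exact pattern: Python expects global inline flags at the start of a pattern, so re.DOTALL stands in for the mid-pattern (?s), and the helper phrase_pattern is a name invented here.

```python
import re

def phrase_pattern(first_word, last_word):
    # (?!-) rejects a hyphen right after the space that follows first_word.
    # re.DOTALL plays the role of (?s): "." also matches newlines.
    return re.compile(
        rf"({re.escape(first_word)})\s(?!-).+?({re.escape(last_word)})",
        re.DOTALL,
    )

text = ("Blablabla. A full Reference - Help is available in the Library, "
        "or watch the video Tutorial.")

m = phrase_pattern("Help", "watch").search(text)
print(m.group(0) if m else None)  # Help is available in the Library, or watch
print(phrase_pattern("Reference", "watch").search(text))  # None
```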
Update:
This regex will match any sentence as long as it does not contain a - surrounded by whitespace:
Help((?!\s-\s)\s(([\w|\w-|\pL|\pL-])+\s+){0,7})?watch
Note: there can be at most 7 words between Help and watch (not counting Help itself), and nothing matches if there is a - surrounded by spaces. Unicode letters are also taken into consideration, so something like ê will be matched correctly.
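The effect of the {0,7} quantifier, which permits up to seven words, can be illustrated with a Python adaptation. The built-in re module does not support \pL, so this sketch substitutes [\w-] (Python's \w is Unicode-aware for str patterns); treat it as an approximation of the pattern above, with made-up test sentences.

```python
import re

# Adapted pattern: [\w-] replaces the original character class because
# Python's re has no \pL. {0,7} allows zero to seven words before "watch".
BOUNDED = re.compile(r"Help((?!\s-\s)\s([\w-]+\s+){0,7})?watch")

def has_match(text):
    return BOUNDED.search(text) is not None

print(has_match("Help is available in the Library or watch"))           # True  (6 words)
print(has_match("Help one two three four five six seven eight watch"))  # False (8 words)
print(has_match("Help - or watch"))                                     # False (" - " follows Help)
```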

Using a Regex Pattern that finds Abbreviations

I am looking through volumes of data and need to identify certain patterns, one of which is abbreviations. The basic rules to identify them in the content I am going through are:
They are all in capital letters.
They are separated by dots.
They may consist of one or more letters.
They may or may not end with a dot.
I am looking at individual words therefore looking for multiple occurrences in the string is not required.
Examples
U.S., U.S, U.S.S.R., V.
Can someone help construct a regex search pattern for me?
Many thanks
MS
You can use this regex:
^([A-Z]\.)*[A-Z]\.?$
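Since the question is about individual words, the anchored pattern can be tried word by word. A minimal Python sketch follows; the function name and the extra negative examples are assumptions added for illustration.

```python
import re

# Zero or more "capital + dot" pairs, one more capital, optional final dot.
ABBREVIATION = re.compile(r"^([A-Z]\.)*[A-Z]\.?$")

def is_abbreviation(word):
    return ABBREVIATION.match(word) is not None

words = ["U.S.", "U.S", "U.S.S.R.", "V.", "USA", "u.s.", "U..S"]
print([w for w in words if is_abbreviation(w)])
# ['U.S.', 'U.S', 'U.S.S.R.', 'V.']
```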
This should do the trick:
\b(?:\p{Lu}\.)*\p{Lu}\b\.?
I've used \p{Lu} (unicode uppercase letters) since you want to match any alphabet.
If you can't make \b unicode aware in your dialect, here's an alternative:
(?<!\p{L})(?:\p{Lu}\.)*\p{Lu}(?!\p{L})\.?
This will work. It also matches the trailing dot.
\b([A-Z]\.)*[A-Z]\b\.?

Detect the uppercase with Regex Yahoo Pipes

I have trouble with regex. I would like to detect the uppercase words and put a # in front. And also remove the spaces...
Example: Chris Pratt talks Jurassic World
#ChrisPratt talks #JurassicWorld
Any idea?
The regex to find two consecutive capitalized words separated by a space would be:
/([A-Z][a-z]*)\s([A-Z][a-z]*)/g
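The answer gives only the search pattern. As a sketch of the full transformation, the Python below pairs it with the replacement #\1\2, which is an assumption of this example and not part of the answer; the replacement syntax in Yahoo Pipes may differ.

```python
import re

# Two capitalized words separated by one whitespace character.
TWO_CAPITALIZED_WORDS = re.compile(r"([A-Z][a-z]*)\s([A-Z][a-z]*)")

def hashtag_names(text):
    # Assumed replacement: "#" + first word + second word, dropping the space.
    return TWO_CAPITALIZED_WORDS.sub(r"#\1\2", text)

print(hashtag_names("Chris Pratt talks Jurassic World"))
# #ChrisPratt talks #JurassicWorld
```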