Detect the uppercase with Regex Yahoo Pipes - regex

I have trouble with Regex. I would like to detect capitalized words and put a # in front of them, and also remove the spaces between them.
Example: Chris Pratt talks Jurassic World
#ChrisPratt talks #JurassicWorld
Any idea?

The regex to find two consecutive capitalized words separated by a space would be:
/([A-Z][a-z]*)\s([A-Z][a-z]*)/g
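As a rough illustration of how that pattern could be used for the replacement, here is a minimal Python sketch. It assumes Python's re module rather than Yahoo Pipes, and the function name hashtag_names is made up for this example. The two capture groups are joined without the space and prefixed with #.

```python
import re

# Same pattern as above: two capitalized words separated by one whitespace character.
PAIR = re.compile(r"([A-Z][a-z]*)\s([A-Z][a-z]*)")

def hashtag_names(text):
    # \1 and \2 are the two words; dropping the \s between them removes the space.
    return PAIR.sub(r"#\1\2", text)

print(hashtag_names("Chris Pratt talks Jurassic World"))
```

Note that this only handles pairs of capitalized words; a single capitalized word or a run of three would need a different pattern.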

Related

How to find capital letter using RegExp?

I need a simple solution. I have a text that is improperly punctuated and in many places a comma is followed by a capital letter. Example: Here you are, You sicko. A comma followed by a cap. Any string to find these? ,\w doesn't work. I only want caps.
I only know basic regex. I'll use it to search in Notepad++
Thank you.
Try this one:
, [A-Z]
In the general case, for any punctuation:
[.,!?\\-]+ [A-Z]+
Link: https://regex101.com/r/BrGZmF/1
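To check both patterns outside Notepad++, a small Python sketch such as the following could be used. The helper names are invented for this example, and the general pattern is written with a single escaped hyphen, which is an assumption about what the answer intended.

```python
import re

def comma_then_cap(text):
    # A comma, one space, then a single uppercase ASCII letter.
    return re.findall(r", [A-Z]", text)

def punct_then_caps(text):
    # One or more punctuation marks, one space, then one or more uppercase letters.
    return re.findall(r"[.,!?\-]+ [A-Z]+", text)

sample = "Here you are, You sicko. A comma followed by a cap."
print(comma_then_cap(sample))
print(punct_then_caps(sample))
```

Python's re is case-sensitive by default, so [A-Z] only matches capitals here.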

How can I extract an exact set of words from a string using a regex?

I've looked everywhere and haven't been able to find a question that answers this specific use case (maybe I've missed it). But basically I'm wanting to extract the following text from a string: Welcome James:
This text must be at the start of the string, e.g:
Welcome James: Now some text follows...blahblah - This would be a match
However
This is some text Welcome James: some more text... - This would not be a match.
So basically I'd hard code Welcome James: into the regex (I don't need any other variants of Welcome <name>:).
Is this possible? All I've been able to find is regexes that match single words without spaces or characters.
To search at the start of a string, just prefix the regex with the ^ (caret) character:
/^Welcome James/
Here is the answer :) But @charles gave it too!
^(Welcome James)
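A quick way to confirm the anchoring behaviour is a sketch like this one in Python. The function name starts_with_greeting is hypothetical, and the trailing colon from the question is included in the pattern.

```python
import re

# ^ anchors the literal text to the start of the string.
GREETING = re.compile(r"^Welcome James:")

def starts_with_greeting(text):
    return GREETING.search(text) is not None

print(starts_with_greeting("Welcome James: Now some text follows...blahblah"))
print(starts_with_greeting("This is some text Welcome James: some more text..."))
```

The first call should report a match and the second should not, since the phrase appears mid-string there.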

Using a Regex Pattern that finds Abbreviations

I am looking through volumes of data and need to identify certain patterns, one of which is abbreviations. The basic rules to identify them in the content I am going through are:
They are all in capital letters.
They are separated by dots.
They may consist of one or more letters.
They may or may not end with a dot.
I am looking at individual words therefore looking for multiple occurrences in the string is not required.
Examples
U.S., U.S, U.S.S.R., V.
Can someone help construct a regex search pattern for me?
Many thanks
MS
You can use this regex:
^([A-Z]\.)*[A-Z]\.?$
This should do the trick:
\b(?:\p{Lu}\.)*\p{Lu}\b\.?
I've used \p{Lu} (Unicode uppercase letters) since you want to match letters from any alphabet.
If you can't make \b Unicode-aware in your dialect, here's an alternative:
(?<!\p{L})(?:\p{Lu}\.)*\p{Lu}(?!\p{L})\.?
This will work; it also matches the trailing dot.
\b([A-Z]\.)*[A-Z]\b\.?
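For checking individual words, the anchored ASCII pattern from the first answer can be tried out with a short Python sketch. The name is_abbreviation is invented here, and the \p{Lu} variants are left out because Python's built-in re module does not support \p{...} classes.

```python
import re

# Zero or more "letter + dot" groups, a final capital letter, and an optional trailing dot.
ABBREVIATION = re.compile(r"^([A-Z]\.)*[A-Z]\.?$")

def is_abbreviation(word):
    return ABBREVIATION.match(word) is not None

for word in ["U.S.", "U.S", "U.S.S.R.", "V.", "USA", "u.s.", "U..S"]:
    print(word, is_abbreviation(word))
```

Be aware that a lone capital letter such as A also satisfies these rules, so ordinary one-letter words may need extra filtering.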

Regular Expression to match sentences that end with special characters like . ! ? but ignore words like George W. Bush, Mr. etc

I'm looking for a regular expression to parse a text file in which the sentences end with special characters like ., ! and ? but ignore words like George W. Bush, Mr. Hopkins Mrs. Violet etc.
I tried (?!Mr|Mrs|[A-Za-z]\.\s)\S.+?[.!?](?=\s+|$) but this does not seem to be working.
English is a decidedly non-regular language. I don't think a regex will be sufficient: you'll probably need a full tokenizer, plus some kind of machine learning, possibly a Markov model, to detect where one sentence ends and the next begins. And even then it would only be a heuristic -- since human language use is sloppy, an exact solution may never be possible.
A regex cannot intelligently recognise what is an abbreviation and what is the end of a sentence.
What a regex can do is define a set of characters that mark the end of a sentence and are therefore not matched, together with a set of exceptions where those characters should be matched anyway.
Try:
([^.!?]|(?<=etc|Dr|Mr|Mrs|\b[A-Za-z]|\s)[.!?])+
See it here on Regexr.
This will not match the characters . ! ? on their own, so each match stops at the end of a sentence.
It will match those characters anyway when they are preceded by something from this alternation: etc|Dr|Mr|Mrs|\b[A-Za-z]|\s
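The same exception-list idea can be sketched in Python without a lookbehind, since Python's re module only accepts fixed-width lookbehinds and would reject the alternation above. This is an approximation of the approach, not a port of the regex: the ABBREVIATIONS set and the function name split_sentences are assumptions made for this example.

```python
import re

# Hypothetical exception list; extend it for your own text.
ABBREVIATIONS = {"etc", "Dr", "Mr", "Mrs"}

def split_sentences(text):
    sentences = []
    start = 0
    # Candidate boundaries: ., ! or ? followed by whitespace or the end of the text.
    for m in re.finditer(r"[.!?](?=\s+|$)", text):
        # Find the word immediately before the punctuation mark.
        word = re.search(r"(\w+)$", text[start:m.start()])
        prev = word.group(1) if word else ""
        is_initial = len(prev) == 1 and prev.isalpha()
        # A dot after a listed abbreviation or a single-letter initial is not a boundary.
        if m.group() == "." and (prev in ABBREVIATIONS or is_initial):
            continue
        sentences.append(text[start:m.end()].strip())
        start = m.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences

print(split_sentences("I want to thank Mr. Shea for his work. George W. Bush spoke! Really?"))
```

Like the regex, this is only a heuristic: a sentence that genuinely ends in "etc." or in a one-letter word will be merged with the sentence that follows it.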
I'm no regex expert, but I found this regex to work well at identifying breaks between sentences.
(?<!\b\p{Upper}\w{0,4})(?=[.?!]\s*\p{Upper})[.?!]\s*
It looks for sentence punctuation followed by a capital letter, excluding cases where the punctuation comes right after a short word beginning with a capital, because titles are capitalized.
Also note this is Java regex, so \p{Upper} might not work in other dialects.
Also, the title length of 4 is arbitrary: the regex engine requires a bounded length for lookbehind, and I couldn't think of any title abbreviations longer than 4 characters.
Let me break it down for anyone learning regex.
(?<!\b\p{Upper}\w{0,4}) # Don't match after a short word beginning with a capital (for titles)
(?=[.?!]\s*\p{Upper}) # Only match when followed by a capital (for abbreviations)
[.?!] # Match the punctuation
\s* # Also match whitespace, so no trimming is required (optional)
And here's a nonsense testing paragraph that puts this regex through its paces:
This is a sentence. I really want to win, etc. and win more. This is pretty neat. I want to thank Mr. Shea for his work. Mr. Hugo helped as well. M. Thénardier is thankful as well. The wonderful Mr. Albert Einstien PhD. is a cool dude as well.
Edit: I've been thinking about this, and I've found one case where this regex doesn't work. Consider this phrase:
Joey loved talking to Max. This was because Max is his best friend.
In this example, Max. This is picked up as a name and title, so no sentence break is found. This only happens with short names (up to five characters with \w{0,4}; the 4 could be lowered so that fewer names are mistaken for titles). I can't think of any way to fix this other than learning which words are names or titles. I guess my method isn't perfect, but I think it's close enough for most circumstances.

What regex can I use to match only letters, numbers, and one space between each word?

How can I create a regex expression that will match only letters and numbers, and one space between each word?
Good Examples:
Amazing
Hello World
I am 500 years old
Bad Examples:
Hello  world
I am 500 years old.
I am Chuck Norris
Most regex implementations support named character classes:
^[[:alnum:]]+( [[:alnum:]]+)*$
You could be clever, though a little less clear, and simplify this to:
^([[:alnum:]]+ ?)*$
FYI, the second one allows a spurious space character at the end of the string. If you don't want that, stick with the first regex.
Also as other posters said, if [[:alnum:]] doesn't work for you then you can use [A-Za-z0-9] instead.
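The difference between the two variants can be seen with a small Python sketch. Python's re module does not support [[:alnum:]], so this uses the [A-Za-z0-9] form; the names STRICT and LOOSE are invented for the example.

```python
import re

# Words of letters/digits separated by exactly one space, no leading or trailing space.
STRICT = re.compile(r"^[A-Za-z0-9]+( [A-Za-z0-9]+)*$")
# Shorter variant: each word may be followed by one optional space.
LOOSE = re.compile(r"^([A-Za-z0-9]+ ?)*$")

samples = [
    "Amazing",
    "Hello World",
    "I am 500 years old",
    "Hello  world",         # two spaces
    "I am 500 years old.",  # trailing period
    "Hello world ",         # trailing space
]

for s in samples:
    print(repr(s), bool(STRICT.match(s)), bool(LOOSE.match(s)))
```

The last sample is where the two patterns disagree: the shorter variant accepts the trailing space, and it also accepts an empty string.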
([a-zA-Z0-9]+ ?)+?
^([a-zA-Z0-9]+\s?)*$
It works.
^[a-zA-Z]+([\s][a-zA-Z]+)*$
(?:[a-zA-Z0-9]+[ ])+[a-zA-Z0-9]+
If I understand you correctly the above regex should work.
This would match a single word and an optional trailing space:
'[a-zA-Z0-9]+\ ?'