This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 3 years ago.
I am very new to Python. I am trying to write a regex that will find all instances of a period, a space, then a capital letter in a corpus.
I have this:
print (re.findall(r'(\.|\!|\?) (A-Z\w+\b)',text))
I got it to print when there was only one capital letter (e.g. I went to the movie.) but not when it's a capitalized word.
Thoughts?
You could use findall with this pattern:
(\.|!|\?) ([A-Z]\w+)
The word boundary is not needed here.
The alternation can be replaced with the character class [.!?], but that is not necessary.
A-Z is a character range, so it needs to be enclosed in square brackets [].
findall will return two elements per match: the punctuation and the alphanumeric string.
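A minimal sketch of the corrected pattern in Python, to show what findall returns when a pattern has two groups. The names SENTENCE_START, sentence_starts, and the sample text are made up for illustration.

```python
import re

# Terminal punctuation, one space, then a capital letter followed by
# at least one more word character (so a lone "I" is not matched).
SENTENCE_START = re.compile(r'(\.|!|\?) ([A-Z]\w+)')

def sentence_starts(text):
    """Return one (punctuation, word) tuple per match in text."""
    return SENTENCE_START.findall(text)

sample = "I went home. Then I slept! Was it late? yes."
print(sentence_starts(sample))
```

Because the pattern has two capturing groups, each result is a tuple rather than a single string.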
This question already has answers here:
How can I "inverse match" with regex?
(10 answers)
Closed 6 months ago.
Regex: /^[0-9\p{L}.,\s]+$/u
I would like to replace the characters that do not match the regex with "".
As I understand, you simply want to drop all chars not matching your regex. So the idea is to invert the class of chars:
/^[0-9\p{L}.,\s]+$/u should become /[^\d\p{L}.,\s]+/gu. I added the ^ after the [ to say "not in this list of chars", dropped the ^ and $ anchors, and replaced 0-9 with \d for digits. Use the g modifier (global) to match multiple times.
Running it: https://regex101.com/r/IQz6K5/1
I'm not sure that the comma, the period and whitespace will be enough punctuation. It would be interesting to have a complete example of what you are trying to achieve. You could use another Unicode character class for punctuation if needed, typically \p{P}. See more info about Unicode classes here: https://www.regular-expressions.info/unicode.html#category
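The same "invert the class" idea can be sketched in Python. This is only an approximation: Python's built-in re module does not support \p{L}, so \w (letters, digits, underscore) stands in for it and the underscore is stripped separately. The names UNWANTED and keep_allowed are hypothetical.

```python
import re

# Approximation of [^\d\p{L}.,\s]: anything that is not a word character,
# period, comma, or whitespace, plus the underscore (which \w would keep).
UNWANTED = re.compile(r'[^\w.,\s]|_')

def keep_allowed(text):
    """Drop every character outside the allowed set."""
    return UNWANTED.sub('', text)

print(keep_allowed("Hello, world! 42 #ok_"))
```

re.sub already replaces every occurrence, so no equivalent of the g modifier is needed here.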
This question already has answers here:
Regular expression to match a line that doesn't contain a word
(34 answers)
Closed 2 years ago.
I am currently using the following character class:
[^\)\(] in my regex
I want to add the word 'hello' to this class so it is also not matched in my string.
I have tried
[^\)\((hello)]
but it does not work.
What can I do?
One typical way you would enforce that hello does not appear would be to use a negative lookahead, e.g.
^(?!.*hello)[^t()]+$
If you only wanted to exclude hello when it appears as a bona fide word, then surround it with word boundaries in the lookahead:
^(?!.*\bhello\b)[^t()]+$
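A short Python sketch of the word-boundary lookahead. As an assumption, it pairs the lookahead with the question's own class (everything except parentheses); NO_HELLO and is_allowed are illustrative names.

```python
import re

# Assumed pattern: the negative lookahead from the answer combined with
# the question's class [^()] (any character except parentheses).
NO_HELLO = re.compile(r'^(?!.*\bhello\b)[^()]+$')

def is_allowed(s):
    """True if s has no parentheses and no standalone word 'hello'."""
    return NO_HELLO.match(s) is not None

for candidate in ["say hi", "say hello", "othello rules", "f(x)"]:
    print(candidate, is_allowed(candidate))
```

The lookahead is checked once at the start of the string, which is why the word can be excluded even though a character class can only list single characters.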
This question already has answers here:
Regex not to allow double underscores
(3 answers)
Closed 3 years ago.
I have tried different regular expressions already, but I am not sure how to have it accept single underscores. If there are two underscores together, the string must be invalid.
The first character must be a capital letter, followed by any word characters; the problem is the underscore.
I have this: (^[A-Z])(\w{6,30} ?=*(_))
This regex may work for you with a negative lookahead condition:
^[A-Z](?!.*__)\w{6,30}$
(?!.*__) is a negative lookahead condition that fails the match if __ appears anywhere after the first capital letter.
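A minimal Python check of the lookahead approach. The sketch uses .* inside the lookahead so that a __ is caught wherever it occurs, including after an earlier single underscore; NAME and is_valid_name are hypothetical names.

```python
import re

# Capital letter, then 6 to 30 word characters, with no "__" anywhere
# in the rest of the string.
NAME = re.compile(r'^[A-Z](?!.*__)\w{6,30}$')

def is_valid_name(s):
    """True if s fits the pattern and contains no double underscore."""
    return NAME.match(s) is not None

for candidate in ["Abcdefg", "Ab_cdef", "Ab__cdef", "Ab_cd__ef", "Abc"]:
    print(candidate, is_valid_name(candidate))
```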
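If you mean a pattern which is a word starting with a capital letter, followed by any number of groups each consisting of a single underscore and a word (note that \w itself matches an underscore, so the words use [A-Za-z0-9] here):
^[A-Z][A-Za-z0-9]{6,30}(_[A-Za-z0-9]{6,30})*$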
This question already has answers here:
How do I match any character across multiple lines in a regular expression?
(26 answers)
Closed 5 years ago.
I am trying to create a regex that will give me everything that appears between [start-flashcards] and [end-flashcards]
I am using \[start-flashcards\](.*?)\[end-flashcards\] but this doesn't match. I must be missing something?
<p>[start-flashcards]</p>
<p>[London|This is the capital city of the United Kingdom]</p>
<p>[Paris|This is the capital city of France]</p>
<p>[Madrid|This is the capital city of Spain]</p>
<p>[Tokyo|This is the capital city of Japan]</p>
<p>[Moscow|This is the capital city of Russia]</p>
<p>[end-flashcards]</p>
You need this:
\[start-flashcards\]([\s\S]*?)\[end-flashcards\]
You've used ., which doesn't match line-breaks.
EDIT:
Turns out there is an efficient way of achieving the same result:
Use your regex \[start-flashcards\](.*?)\[end-flashcards\] with the /s modifier flag. This flag allows . to match newline characters.
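In Python, the counterpart of the /s modifier is the re.DOTALL flag. Here is a small sketch on a shortened copy of the sample markup; PATTERN and between_markers are illustrative names.

```python
import re

html = """<p>[start-flashcards]</p>
<p>[London|This is the capital city of the United Kingdom]</p>
<p>[Paris|This is the capital city of France]</p>
<p>[end-flashcards]</p>"""

PATTERN = r'\[start-flashcards\](.*?)\[end-flashcards\]'

def between_markers(text):
    """Return the text between the two markers, or None if not found."""
    # re.DOTALL lets . match newline characters as well.
    m = re.search(PATTERN, text, re.DOTALL)
    return m.group(1) if m else None

print(re.search(PATTERN, html))   # without the flag, . stops at line breaks
print(between_markers(html))
```

The [\s\S]*? variant gives the same result without any flag, which helps in engines that lack a dotall mode.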
This question already has an answer here:
Learning Regular Expressions [closed]
(1 answer)
Closed 7 years ago.
I have the following regex pattern:
^[A-Za-z][A-Za-z0-9_-]+$
It is used to match alphanumeric characters, underscores and dashes, with the first character being alphabetical.
This works as expected, but I also need it to be able to match single characters. For example, an input of a fails.
How can I modify the pattern to make a single alphabetical character pass?
The + means "one or more". Replace it with * for "zero or more".
^[A-Za-z][A-Za-z0-9_-]*$
This should do it for you.
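A quick Python comparison of the two quantifiers, as a sketch; WITH_PLUS, WITH_STAR, and is_identifier are made-up names.

```python
import re

WITH_PLUS = re.compile(r'^[A-Za-z][A-Za-z0-9_-]+$')  # needs 2+ characters
WITH_STAR = re.compile(r'^[A-Za-z][A-Za-z0-9_-]*$')  # accepts 1+ characters

def is_identifier(s):
    """True if s starts with a letter, followed by zero or more allowed characters."""
    return WITH_STAR.match(s) is not None

for candidate in ["a", "a1_-", "1a", ""]:
    print(repr(candidate), is_identifier(candidate))
```

The leading [A-Za-z] still requires one character, so the empty string is rejected even with *.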