I have been trying to capture a run of the symbol '!' followed by a space and then the word(s) after it.
Example:
!!! !!! intense beatdown
Right now I can only get "!!! intense", but what I want is the whole phrase:
!!! intense beatdown
Here is the regex that I'm using:
import re
text = '!!! !!! intense beatdown'
matches = re.findall(r'(\!+ \w+)', text)
Use this regex:
!!!\s([!\s]+.+)
You could match 1 or more exclamation marks followed by a space and 1+ word chars.
Then repeat a non capturing group 0+ times, each time matching a space and 1+ word chars.
!+ \w+(?: \w+)*
In parts
!+ Match 1+ times !
 \w+ Match a space and 1+ word chars (the pattern starts with a literal space)
(?: Non capturing group
 \w+ Match a space and 1+ word chars (again with a leading literal space)
)* Close group and repeat 0+ times using *
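As a quick check, the pattern above can be run against the example string from the question. This is a minimal sketch using Python's re module; the variable names are only illustrative.

```python
import re

text = '!!! !!! intense beatdown'

# 1+ exclamation marks, a space, a word, then 0+ further " word" repetitions
pattern = re.compile(r'!+ \w+(?: \w+)*')

matches = pattern.findall(text)
print(matches)  # ['!!! intense beatdown']
```

The first "!!!" is skipped because it is followed by a space and another "!", not by a word character.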
I have a string that has the following structure:
digit-word(s)-digit.
For example:
2029 AG.IZTAPALAPA 2
I want to extract the word(s) in the middle, and the digit at the end of the string.
I want to capture AG.IZTAPALAPA and 2 in the same capture group, so that the result looks like:
AG.IZTAPALAPA 2
I managed to capture them as individual capture groups but not as a single group:
town_state['municipality'] = town_state['Town'].str.extract(r'(\D+)', expand=False)
town_state['number'] = town_state['Town'].str.extract(r'(\d+)$', expand=False)
Thank you for your help!
You can use a single capturing group for the example string. After the leading digits and a space, it matches a single "word" that consists of uppercase chars A-Z with optional dots in the middle (a dot cannot be at the start or end), followed by a space and 1 or more digits.
\b\d+ ([A-Z]+(?:\.[A-Z]+)* \d+)\b
Explanation
\b A word boundary
\d+ Match 1+ digits followed by a space
( Capture group 1
[A-Z]+ Match 1+ occurrences of an uppercase char A-Z
(?:\.[A-Z]+)* \d+ Repeat 0+ times matching a dot and 1+ chars A-Z, then match a space and 1+ digits
) Close group 1
\b A word boundary
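To illustrate, here is a small sketch that applies the pattern to the example string with Python's re module. The question uses pandas; since the pattern has a single capture group, the same pattern string should also work with str.extract, but that part is an assumption and is not shown here.

```python
import re

# Leading digits are matched but not captured; group 1 holds "WORD digits"
pattern = re.compile(r'\b\d+ ([A-Z]+(?:\.[A-Z]+)* \d+)\b')

sample = '2029 AG.IZTAPALAPA 2'
m = pattern.search(sample)
captured = m.group(1) if m else None
print(captured)  # AG.IZTAPALAPA 2
```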
Or you can make the pattern a bit broader matching either a dot or a word character
\b\d+ ([\w.]+(?: [\w.]+)* \d+)\b
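A short sketch of where the broader pattern differs from the stricter one. The sample string with a two-word name is hypothetical and not taken from the question's data.

```python
import re

strict = re.compile(r'\b\d+ ([A-Z]+(?:\.[A-Z]+)* \d+)\b')
broad = re.compile(r'\b\d+ ([\w.]+(?: [\w.]+)* \d+)\b')

sample = '2029 SAN JUAN 3'  # hypothetical multi-word name

strict_match = strict.search(sample)
broad_match = broad.search(sample)

print(strict_match)          # None: the strict pattern allows only one "word"
print(broad_match.group(1))  # SAN JUAN 3
```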
You can use the following simple regex:
[0-9]+\s([A-Z]+.[A-Z]+(?: [0-9]+)*)
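Note:
(?: [0-9]+)* makes the trailing digits optional.
The unescaped . between the two [A-Z]+ parts matches any character; use \. to match only a literal dot.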
I need an expression that allows only the pattern below:
end word(dot)(space)start word [e.g. end. start]
In other words:
no space before a colon, semicolon or dot
one space after a colon, semicolon or dot
All other patterns should be captured so they can be identified, such as:
end.start || end . start || end .start
I used
"([\s{0,}][\.]|[\.][\s{2,}a-z]|[\.][\s{0,}a-z])"
but it is not working as I expected. I would appreciate your support.
You could match 1+ word characters using \w+, then match either a colon or a semicolon using the character class [;:] with an optional space on each side (a space followed by ?).
After that, match 1+ word characters again.
\w+ ?[;:] ?\w+
For the variant with a dot followed by a single space, you don't need a character class; match the dot on its own using \.
\w+\. \w+
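The two patterns above can be tried out with a short sketch like the following. The test strings are made up for illustration, and fullmatch is used so that the whole string has to fit the pattern.

```python
import re

colon_pat = re.compile(r'\w+ ?[;:] ?\w+')  # optional space on each side
dot_pat = re.compile(r'\w+\. \w+')         # dot, then exactly one space

colon_results = {s: bool(colon_pat.fullmatch(s))
                 for s in ['end: start', 'end :start', 'end;start']}
dot_results = {s: bool(dot_pat.fullmatch(s))
               for s in ['end. start', 'end.start', 'end .start']}

print(colon_results)  # all three are accepted
print(dot_results)    # only 'end. start' is accepted
```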
Edit
To highlight all the matches for the punctuations:
(?: [.:;]|[.:;] {2,}|(?<=\S)[;:.](?=\S))
Explanation
(?: Non capture group
 [.:;] Match a space followed by either . : or ; (the alternative starts with a literal space)
| Or
[.:;] {2,} Match one of the listed followed by 2 or more spaces
| Or
(?<=\S)[;:.](?=\S) Match one of the listed surrounded by non whitespace chars
) Close group
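As a rough illustration, the highlighting pattern can be wrapped in a small helper that returns the offending punctuation for a string. The helper name find_bad_spacing and the sample strings are assumptions made for this sketch.

```python
import re

# Space before the mark | mark followed by 2+ spaces | mark with no space around it
bad_spacing = re.compile(r'(?: [.:;]|[.:;] {2,}|(?<=\S)[;:.](?=\S))')

def find_bad_spacing(s):
    """Return every badly spaced punctuation match found in s."""
    return bad_spacing.findall(s)

print(find_bad_spacing('end. start'))   # []  (the allowed form)
print(find_bad_spacing('end.start'))    # ['.']
print(find_bad_spacing('end . start'))  # [' .']
print(find_bad_spacing('end.  start'))  # ['.  ']
```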
So I currently have a regex (https://regex101.com/r/zBE4Ju/1) that highlights the words before and after a linebreak. This is nice, but the issue is sometimes there are whitespaces after the word that appears BEFORE the line break. So they end up
You can see on my regex101 how the issue happens, and I have outlined the problem. I need to recognize the word before and after the line break, regardless of if there is a space after the word.
(\w*(?:[\n](?![\n])\w*)+)
You can see it in action here https://regex101.com/r/zBE4Ju/3
Expected: Line 1
Actual: Line 3
You can use $1 from:
/([^ ]+) *(\r|\n)/gm
https://regex101.com/r/o87VP7/5
If you want to highlight the last "word" in the sentence followed by possible spaces and a newline, you could repeat 0+ times a group matching 1+ non whitespace chars followed by 1+ spaces.
Then capture in a group matching non whitespace chars (\S+) and match possible spaces followed by a newline.
^ *(?:\S+ +)*(\S+) *\r?\n
Explanation
^ Start of the string (or of each line when the multiline flag is set)
 * Match 0+ times a space
(?: Non capturing group
\S+ + Match 1+ non whitespace chars and 1+ spaces
)* Close non capturing group and repeat 0+ times (to also match a single word at the beginning)
(\S+) Capture group 1, match 1+ times a non whitespace char
 *\r?\n Match 0+ times a space followed by an optional carriage return and a newline
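A minimal sketch of the pattern in Python, assuming the multiline flag so that ^ matches at the start of every line. The sample text is invented for the example; note the trailing spaces after "world".

```python
import re

text = 'hello world  \nfoo bar\nbaz'

# Group 1 is the last "word" on each line that ends with a newline
pattern = re.compile(r'^ *(?:\S+ +)*(\S+) *\r?\n', re.MULTILINE)

last_words = pattern.findall(text)
print(last_words)  # ['world', 'bar']
```

The final line "baz" is not reported because it is not followed by a newline.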
Would anyone kindly help with a regex for Notepad++ to replace Word with #Word (only after the first occurrence of #)?
#Celebrity #Glad #Known #Lord Byron #British #Poet
should become
#Celebrity #Glad #Known #Lord #Byron #British #Poet
^
To replace Word with #Word only after the first occurrence of #, you could use an alternation:
Find what
(?>^[^#]*#\w+\h*|#\w+\h*|\G)\K(\w+\h*)
Replace with
#\1
Explanation
(?> Atomic group
^[^#]*#\w+\h* From the start of the line, match 0+ characters other than # using a negated character class, then match a #, followed by 1+ word characters and 0+ horizontal whitespace characters
| Or
#\w+\h* Match #, a word character 1+ times followed by a horizontal whitespace character 0+ times
| Or
\G Assert position at the end of the previous match
) Close atomic group
\K Forget what was previously matched
(\w+\h*) Capture in a group 1+ word characters followed by 0+ times a horizontal whitespace character
You can use the following regex to match and replace:
\s([^#]\w+)
It starts by matching a whitespace character, then captures a group whose first character is not '#' and which is followed by one or more word characters.
You then replace with:
' #$1'
That will add '#' to the words that don't start with it.
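The question is about Notepad++, but the effect of this simpler find/replace pair can be sketched in Python for illustration. Python's re.sub writes the backreference as \1 instead of $1. Note that this pattern does not check that a # occurred earlier in the line, and [^#]\w+ needs at least two characters, so a one-letter word would be skipped.

```python
import re

line = '#Celebrity #Glad #Known #Lord Byron #British #Poet'

# Whitespace followed by a word that does not begin with '#'
result = re.sub(r'\s([^#]\w+)', r' #\1', line)
print(result)  # #Celebrity #Glad #Known #Lord #Byron #British #Poet
```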
Need help with this regex
ABC 130 zlis 02-03/12 N180 Grouping req
A B Csd 130 pain 02/12 I80 alias
(\w+\s{0,3})(\d+)
The regex does not seem to group as I need it to.
Desired output (brackets mark the groups I'm trying to detect):
(A B Csd) (130) (pain) (02/12) (I80) (alias)
Try this regex:
([a-z ]+?)\s+(\d+)\s+([a-z]+)\s+([\d-\/]+)\s+([\w ]+)
Explanation:
([a-z ]+?) - match 1+ occurrences(as few as possible) of a letter or a space and capture it as Group1
\s+ - matches 1+ occurrences of a whitespace character
(\d+) - match 1+ occurrences of digits and capture as Group2
\s+ - matches 1+ occurrences of a whitespace character
([a-z]+) - match 1+ occurrences of a letter and Capture as Group 3
\s+ - matches 1+ occurrences of a whitespace character
([\d-\/]+) - match 1+ occurrences of a digit or - or / and capture it as Group4
\s+ - matches 1+ occurrences of a whitespace character
([\w ]+) - match 1+ occurrences of a word-character or a space and capture as Group5
Note that I have used the g, i, m flags for Global matches, Case-insensitive and Multiline respectively.
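Here is a sketch of the same idea in Python, run against the two sample lines from the question. One adjustment is an assumption on my part: the class [\d-\/] is written as [\d/-], because Python's re module rejects a shorthand such as \d as the start of a range. findall plays the role of the g flag and re.IGNORECASE the role of i; the m flag has no effect here because the pattern has no anchors.

```python
import re

text = ('ABC 130 zlis 02-03/12 N180 Grouping req\n'
        'A B Csd 130 pain 02/12 I80 alias')

pattern = re.compile(
    r'([a-z ]+?)\s+(\d+)\s+([a-z]+)\s+([\d/-]+)\s+([\w ]+)',
    re.IGNORECASE,
)

rows = pattern.findall(text)
for row in rows:
    print(row)
# ('ABC', '130', 'zlis', '02-03/12', 'N180 Grouping req')
# ('A B Csd', '130', 'pain', '02/12', 'I80 alias')
```

Group 5 takes the rest of the line, so the last two items of the desired output ("I80" and "alias") end up in one group rather than two.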