RegEx - match all periods except the period preceded by 'single capital letter' - regex

Any ideas on how to remove all periods from a large text document, by using a regex on a text editor for the following example:
J. don't match
F.C. don't match
word. match
Word. match
WORD. match

The regex below matches a period at the end of a line when it is preceded by either two or more word characters or a single character that is not a capital letter:
((\w{2,})|([^A-Z]))\.$

You can try this too,
(?<!(?<=^|[^A-Z])[A-Z])\.

You can try something like this: \w{2,}?\.
You can go to Regex101 and try it yourself with more test strings to find the variant you want. If you want to keep the word and drop the period, use a capturing group and replace the match with that group: (\w{2,}?)\.
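For anyone scripting this instead of using an editor, here is a minimal Python sketch of the lookbehind and capturing-group ideas above. It assumes Python's `re` module, whose lookbehinds must be fixed-width, so the inner `(?<=^|[^A-Z])` is rewritten as the equivalent `(?<![A-Z])`. The sample text and variable names are made up for illustration.

```python
import re

# Hypothetical sample text covering the cases from the question.
text = "J. Smith plays for F.C. Barca. Word. WORD."

# Lookbehind variant: match a period unless it follows a single capital
# letter, i.e. a capital that is not itself preceded by another capital.
# (?<![A-Z]) stands in for (?<=^|[^A-Z]) because Python's re module
# only accepts fixed-width lookbehinds.
single_capital = re.compile(r"(?<!(?<![A-Z])[A-Z])\.")
without_periods = single_capital.sub("", text)

# Capturing-group variant: keep the word (group 1) and drop the period.
captured = re.sub(r"(\w{2,}?)\.", r"\1", text)

print(without_periods)
print(captured)
```

On this sample both variants give the same result. They differ on a single lowercase letter followed by a period (such as "a."): the lookbehind variant removes that period, while the capturing-group variant needs at least two word characters and leaves it alone.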


Deleted everything before the dot

How can I use regex in notepad++ to make a query like this:
I have a list with subdomains containing three words such as
web1.com
test.web2.com
www.test.web3.com
I want to filter so that only three words remain and something like this comes out:
web1.com
test.web2.com
test.web3.com
I was able to reduce each line to just the domain, but that is not what I want:
^(?:.+\.)?([^.\r\n]+\.[^.\r\n]+)$
One idea is to match lazily up to the point where the final three parts begin, and capture those parts:
^.*?\.([\w-]+\.[\w-]+\.[\w-]+)$
Replace with $1 (what was captured by the first group)
.*? matches lazily any amount of any characters (besides newline)
[\w-]+ char-class matches one or more word characters and hyphen
In Notepad++ be sure to have unchecked: [ ] dot matches newline
Another take uses a positive lookahead to assert the three "words" to the right, where [^\s.] matches any non-whitespace character except a dot.
In the replacement use an empty string.
^\S+?\.(?=[^\s.]+\.[^\s.]+\.[^\s.]+$)
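Outside Notepad++, the same two patterns can be tried in a few lines of Python. This is only a sketch: it assumes `re.MULTILINE` so that `^` and `$` anchor at each line, and the variable names are invented for the example.

```python
import re

# The list from the question, one host name per line.
domains = "web1.com\ntest.web2.com\nwww.test.web3.com"

# Approach 1: lazily consume the leading part, capture the last three
# dot-separated parts, and replace the whole line with the capture.
keep_last_three = re.compile(r"^.*?\.([\w-]+\.[\w-]+\.[\w-]+)$", re.MULTILINE)
result_capture = keep_last_three.sub(r"\1", domains)

# Approach 2: match only the leading part, using a lookahead to assert
# that three dot-separated parts follow up to the end of the line, and
# replace that leading part with an empty string.
strip_prefix = re.compile(r"^\S+?\.(?=[^\s.]+\.[^\s.]+\.[^\s.]+$)", re.MULTILINE)
result_lookahead = strip_prefix.sub("", domains)

print(result_capture)
print(result_lookahead)
```

Lines with three or fewer parts do not match either pattern, so they pass through unchanged.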

Match a part of a string using regex

I have a string and would like to match a part of it.
The string is Accept: multipart/mixedPrivacy: nonePAI: <sip:4168755400#1.1.1.238>From: <sip:4168755400#1.1.1.238>;tag=5430960946837208_c1b08.2.3.1602135087396.0_1237422_3895152To: <sip:4168755400#1.1.1.238>
I want to match PAI: <sip:4168755400#
The whitespace after PAI: can also be a word, so I would like to use .* there, but when I do, it matches most of the string.
This is what I match if I use the literal whitespace instead of .*:
(PAI: <sip:)((?:\([2-9]\d{2}\)\ ?|[2-9]\d{2}(?:\-?|\ ?))[2-9]\d{2}[- ]?\d{4})#
This is what I am trying to achieve with .*, but it should only match PAI: <sip:4168755400#
(PAI:.*<sip:)((?:\([2-9]\d{2}\)\ ?|[2-9]\d{2}(?:\-?|\ ?))[2-9]\d{2}[- ]?\d{4})#
I tried lookarounds but could not get them to work.
Any idea?
thanks
Instead of matching a single space, you can use a character class that matches either a space or a word character, and repeat it one or more times so that at least one occurrence is required.
Note that you don't have to escape the spaces, and in both places you can use an optional character class matching either a space or a hyphen: [ -]?
If you only want the match, you can omit the two capturing groups.
(PAI:[ \w]+<sip:)((?:\([2-9]\d{2}\) ?|[2-9]\d{2}[ -]?)[2-9]\d{2}[- ]?\d{4})#
The regex could look like this:
PAI:.*?(<sip:.*?#)
Explanation:
PAI:.*? finds the literal PAI: followed by anything (.*); the ? makes the quantifier lazy, so it matches as few characters as possible before the next part of the expression.
(<sip:.*?#) is the capturing group that holds the result we want.
<sip:.*?# finds <sip: followed by as few characters as possible up to the first #.
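To compare the two answers side by side, here is a small Python sketch that runs both patterns against the string from the question. The variable names and the second sample (with a word between PAI: and <sip:) are assumptions added for illustration.

```python
import re

# The string from the question, split across lines only for readability.
header = (
    "Accept: multipart/mixedPrivacy: nonePAI: <sip:4168755400#1.1.1.238>"
    "From: <sip:4168755400#1.1.1.238>;tag=5430960946837208_c1b08.2.3."
    "1602135087396.0_1237422_3895152To: <sip:4168755400#1.1.1.238>"
)

# Character-class answer: [ \w]+ allows spaces or a word after "PAI:".
strict = re.compile(
    r"(PAI:[ \w]+<sip:)"
    r"((?:\([2-9]\d{2}\) ?|[2-9]\d{2}[ -]?)[2-9]\d{2}[- ]?\d{4})#"
)

# Lazy answer: .*? stops at the first "<sip:" and then at the first "#".
lazy = re.compile(r"PAI:.*?(<sip:.*?#)")

strict_match = strict.search(header)
lazy_match = lazy.search(header)
print(strict_match.group(0))  # whole match
print(strict_match.group(2))  # the phone number only
print(lazy_match.group(1))    # the <sip:...# part only

# Hypothetical input where a word sits between "PAI:" and "<sip:".
with_word = strict.search("PAI: anonymous <sip:4168755400#1.1.1.238>")
```

The stricter pattern also validates the shape of the number, while the lazy one accepts anything between `<sip:` and the first `#`.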

Regex to find if all the characters in a word are the same specific character

I have a set of words coming in one by one like aa, ##, ???, ~~~, ?~ etc
I need a regex to find if any of these words is containing only ? or only ~.
Of the above input examples, ??? and ~~~ should match but not the others.
I tried ^[\s?]*$ and ^[\s~]*$ separately and they work; now I am trying to combine them.
^[\s?||~]*$ doesn't work as it also recognizes ?~ as valid.
Any help?
You can use this regex, which looks for a string starting with a ~ or a ?, and then asserts that every other character in the string is the same as the first one using a backreference (\1):
^([~?])\1+$
You need to use a backreference to achieve your desired result.
If you want only ~ or ? use
^([~?])\1+$
If you want any repetitive pattern, use
^(.)\1+$
Explanation: (.) or ([~?]) captures the first character.
Then \1+ checks for that same character one or more times (a backreference).
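You want to match lines that consist, from start to end, of any number of tildes or question marks. In basic regex syntax, as in GNU grep's default mode, that would be ^\(~\|?\)*$. There, the parentheses that make a group and the vertical bar that does the 'or' need to be backslash-escaped. Note that this pattern also accepts mixed lines such as ?~.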
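The backreference answers can be checked quickly in Python against the words from the question. This is a sketch with invented variable names; the `\1*` variant at the end is an assumption for the case where a single `?` or `~` should also count, which the question does not specify.

```python
import re

words = ["aa", "##", "???", "~~~", "?~"]

# Capture the first character (only ~ or ? allowed), then require every
# remaining character to repeat it via the backreference \1.
same_char = re.compile(r"^([~?])\1+$")
matches = [w for w in words if same_char.match(w)]
print(matches)

# Assumed variant: \1* instead of \1+ also accepts a one-character word.
same_char_or_single = re.compile(r"^([~?])\1*$")
```

Because `\1+` needs at least one repetition, the original pattern rejects a word that is just one `?` or one `~`.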

How do I match these text lines in regex?

I'm trying to match the three first text lines in regex, i.e. the ones ending with form.
value="something form"
value="Second cool form"
value="another silly old form"
value="blabla"
How can I do that?
I don't know what tool you are using, but the following pattern should match the first three lines:
.*form"$
You could simply use:
.*form"$
For this to work, you have to turn on multiline mode.
The dot (.) matches any character except a newline, and the asterisk (*) repeats it zero or more times; after that comes the literal text form". The dollar sign ($) anchors the match to the end of the string, or to the end of each line in multiline mode.
You can try using this:
\w*form\b
\w*: Allows word characters directly in front of form
\b: A word boundary, which makes sure that form ends a word (it does not have to be at the end of the string).
Actually if you want to match the 'form' as a separate word, you need something like this:
\Wform\W
\W (capital W) is any character which does not represent a word character, at least in perl-like regex.
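Since the question does not name a tool, here is a short Python sketch of the suggestions above, assuming the four lines sit in one multi-line string. `re.MULTILINE` plays the role of the "multiline mode" mentioned earlier; the variable names are illustrative.

```python
import re

lines = '''value="something form"
value="Second cool form"
value="another silly old form"
value="blabla"'''

# .*form"$ with MULTILINE: $ matches at the end of every line, so each
# line ending in form" is returned as a whole.
ends_with_form = re.findall(r'.*form"$', lines, flags=re.MULTILINE)

# \Wform\W: keep lines where "form" appears as a separate word, with a
# non-word character on both sides.
has_form_word = [ln for ln in lines.splitlines() if re.search(r"\Wform\W", ln)]

print(ends_with_form)
print(has_form_word)
```

Both checks select the first three lines here. They would differ on a line such as value="platform", which ends in form" but does not contain form as a separate word.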

How to extract piece with '\' and spaces?

"This is a piece of 432432\5321 text".
The numbers can be of any length and may also contain letters. How do I get only the 432432\5321 part of this?
Here is a sample:
(\d+\\\d+)
A group of digits followed by a backslash and another group of digits. The surrounding parentheses form a capturing group.
Here is the fiddle: https://regex101.com/r/gI5rG4/2
EDIT:
I missed that you also want letters. In that case, use \w instead of \d.
You can use the following example:
import re
text = 'This is a piece of 432432\\5321 text'
print(re.findall(r'(\d+(?:\\\d+)+)', text))
It handles input like 111\222 as well as 111\222\333, etc.
Use \w for matching alphanumeric characters and \\ for matching the backslash:
(\w+\\\w+)
This would match inputs like 32432\5321 as well as those with letters in them, e.g. 32A1\BB1
Fiddle: https://regex101.com/r/yF2aX1/2
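Putting the answers together, a minimal Python sketch could combine `\w` (to allow letters) with the repeated `(?:\\\w+)+` group (to allow more than one backslash). The sample strings other than the one from the question are made up for illustration.

```python
import re

samples = [
    "This is a piece of 432432\\5321 text",  # from the question
    "id 32A1\\BB1 here",                     # hypothetical: letters and digits
    "path 111\\222\\333 end",                # hypothetical: two backslashes
]

# One or more word characters, then one or more "\word" segments.
# A raw string keeps the regex-level escape \\ readable.
pattern = re.compile(r"\w+(?:\\\w+)+")
found = [pattern.findall(s) for s in samples]

for hits in found:
    print(hits)
```

Plain words such as "piece" or "text" are skipped because the pattern requires at least one backslash-separated segment.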