Regex to find where space is missing between number and word - regex

I am using regex to clean some text files.
In some places, spaces are missing as in the second line below:
1.9 Beef Curry
1.10Banana Pie
1.11 Corn Gravy
I need an expression that finds a zero-length match at the position between 0 and B, so that I can replace it (in Notepad++) with a space. Note that each number can be one or two digits, and there can also be one level (e.g. 1. Exotic Dishes) or three levels (e.g. 2.5.1 Chicken).
Can someone please give the answer?
I would have thought one of the following should work, but Notepad++ calls them invalid. I would also appreciate it if someone could tell me why...
(?<=\.\d\d|\.\d)(?! )(?!\.)
(?<=\.\d{1,3)(?! )(?!\.)
Thanks in advance!

Maybe it is enough to look for the zero-length positions matched by \B (non-word boundaries) between word characters and check whether each one is preceded by a digit and not followed by a digit. If so, replace it with a space.
\B(?<=\d)(?!\d)
See this demo at regex101
\B matches at any non-word boundary
(?<=\d) looks behind for a digit
(?!\d) looks ahead for no digit
To further restrict the digit part to a dot followed by 1-3 digits, try something like \.\d{1,3}\B\K(?!\d), where \K resets the beginning of the reported match. Or drop \K and replace with $0 followed by a space.
Just to mention: the underscore also belongs to the word characters. If your input contains underscores, e.g. something like 1_, and you don't want to add a space there, change the lookahead to (?![\d_])
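For readers who want to try the \B(?<=\d)(?!\d) idea outside Notepad++, here is a minimal sketch using Python's re module. Python is an assumption here (the question is about Notepad++), and the helper name insert_missing_spaces is hypothetical. The \K variant is left out because Python's re does not support \K.

```python
import re

# Zero-width position that is NOT a word boundary, is preceded by a digit,
# and is not followed by a digit (e.g. between "0" and "B" in "1.10Banana").
MISSING_SPACE = re.compile(r"\B(?<=\d)(?!\d)")


def insert_missing_spaces(text: str) -> str:
    """Insert a space at every position matched by MISSING_SPACE."""
    return MISSING_SPACE.sub(" ", text)


sample = "\n".join([
    "1.9 Beef Curry",
    "1.10Banana Pie",
    "1.11 Corn Gravy",
    "1. Exotic Dishes",
    "2.5.1 Chicken",
])
print(insert_missing_spaces(sample))
```

Only the second line should change, because every other digit is followed either by a dot or a space (a word boundary) or by another digit.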

You may use one of
^\d[\d.]*+(?!\h)
^\d[\d.]*+(?! )
^(?>\d+(?:\.\d+)*\.?)(?!\h)
Replace with $& followed by a space ($& stands for the whole match).
Details
^\d[\d.]*+(?!\h) matches a digit and then 0 or more digits/dots and once they are all matched, a horizontal whitespace is checked for. If there is no whitespace, there is a match.
^\d[\d.]*+(?! ) is the same, just the check is performed for a regular space.
^(?>\d+(?:\.\d+)*\.?)(?!\h) is more specific; it matches:
^ - start of line
(?>\d+(?:\.\d+)*\.?) - an atomic group preventing backtracking:
\d+ - 1+ digits
(?:\.\d+)* - 0 or more sequences of . and 1+ digits
\.? - an optional dot
(?!\h) - no horizontal whitespace allowed immediately on the right

My alternative attempt also works:
Find what: ^(\d\.\d+) ?(?=\w)
Replace with: $1 followed by a space

Related

Using regex to find abbreviations

I am trying to create a regular expression that will identify possible abbreviations within a given string in Python. I am fairly new to regex and am having difficulty creating the expression, though I believe it should be fairly simple. The expression should pick up words that have two or more capital letters. It should also pick up words where a dash has been used in between, and report the whole word (both before and after the dash). If numbers are also present, they should be reported with the word.
As such, it should pick up:
ABC, AbC, ABc, A-ABC, a-ABC, ABC-a, ABC123, ABC-123, 123-ABC.
I have already made the following expression: r'\b(?:[a-z]*[A-Z\-][a-z\d[^\]*]*){2,}'.
However this does also pick up these wrong words:
A-bc, a-b-c
I believe the problem is that it looks for either multiple capital letters or dashes. I want it to give me only words that have at least two capital letters. I understand that it will also "mistakenly" pick up words such as "Abc-Abc", but I don't believe there is a way to avoid these.
If a lookahead is supported and you don't want to match double -- you might use:
\b(?=(?:[a-z\d-]*[A-Z]){2})[A-Za-z\d]+(?:-[A-Za-z\d]+)*\b
Explanation
\b A word boundary
(?= Positive lookahead, assert that from the current location to the right is
(?:[a-z\d-]*[A-Z]){2} Match 2 times: optionally any of the allowed characters, followed by an uppercase char A-Z
) Close the lookahead
[A-Za-z\d]+ match 1+ times the allowed characters without the hyphen
(?:-[A-Za-z\d]+)* Optionally repeat - and 1+ times the allowed characters
\b A word boundary
See a regex101 demo.
To also avoid matching when hyphens surround the characters, you can use negative lookarounds asserting that there is no hyphen to the left or right.
\b(?<!-)(?=(?:[a-z\d-]*[A-Z]){2})[A-Za-z\d]+(?:-[A-Za-z\d]+)*\b(?!-)
See another regex demo.
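Since the question is about Python, the two patterns above can be tried with a short sketch like the one below. The names LOOSE, STRICT and find_abbreviations are hypothetical; the patterns themselves are copied from the answer.

```python
import re

# Pattern from the answer: at least two uppercase letters somewhere in a
# run of letters, digits and single hyphens.
LOOSE = re.compile(
    r"\b(?=(?:[a-z\d-]*[A-Z]){2})[A-Za-z\d]+(?:-[A-Za-z\d]+)*\b"
)

# Variant that also refuses a hyphen directly before or after the word.
STRICT = re.compile(
    r"\b(?<!-)(?=(?:[a-z\d-]*[A-Z]){2})[A-Za-z\d]+(?:-[A-Za-z\d]+)*\b(?!-)"
)


def find_abbreviations(text: str, pattern: re.Pattern = LOOSE) -> list[str]:
    """Return every candidate abbreviation found in text."""
    return pattern.findall(text)


wanted = "ABC, AbC, ABc, A-ABC, a-ABC, ABC-a, ABC123, ABC-123, 123-ABC."
unwanted = "A-bc, a-b-c"
print(find_abbreviations(wanted))
print(find_abbreviations(unwanted))
print(find_abbreviations("-ABC ABC- A-ABC", STRICT))
```

The first call should report all nine words from the question, the second none, and the third only A-ABC, because the other two candidates touch a stray hyphen.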

Regex stopped matching after the first match

I need some help here.
Here is an example of what I'm trying to match:
1 ScreenMail Enable friendly none Internal any 5
I need to match everything except the last digits (5), meaning the first digit (1), spaces, letters, special characters, etc. I tried using /^(\d), but after matching the first digit, it stopped. Your assistance would be appreciated.
The simplest way is probably to remove the last digits with one of the following (the second also removes trailing whitespace):
\d+$
\d+\s*$
See the regex demo.
You may want to use a matching regex like
^.*[^\d\s]
that matches any zero or more chars other than line break chars (.*) as many as possible and then a char other than a digit and whitespace. See this regex demo.
However, if the digits are followed by optional whitespace, or if you allow any text after the last digits, it will fail. You can then use
^.*[^\d\s](?=\s*\d)
See this regex demo. The (?=\s*\d) positive lookahead requires zero or more whitespaces and then a digit immediately to the right of the current location.

Minimum letter constraint in regex pattern along with special characters

I am not an expert in regex. I tried the pattern below and want to improve it; can someone please help me?
Pattern can contain ASCII letters, spaces, commas, periods, ', . and - special characters, and there can be one digit at the end of string.
So far, this works well:
/^[a-z ,.'-]+(\d{1})?$/i
But I want to add the condition that there must be at least 2 letters. Could you please tell me how to achieve this, and explain it a bit as well?
Note that {1} is always redundant in any regex; please remove it to make the pattern more readable. (\d{1})? is equal to \d? and matches an optional digit.
Taking into account the string must start with a letter, you can use
/^(?:[a-z][ ,.'-]*){2,}\d?$/i
Details:
^ - start of string
(?: - start of a non-capturing group (it is used here as a container for a pattern sequence to quantify):
[a-z] - an ASCII letter
[ ,.'-]* - zero or more spaces, commas, dots, single quotation marks or hyphens
){2,} - end of group, repeat two or more ({2,}) times
\d? - an optional digit
$ - end of string
i - case insensitive matching is ON.
See the regex demo.
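The pattern is written as a JavaScript-style literal; a minimal Python sketch of the same check might look like this, with the i flag expressed as re.IGNORECASE. The function name has_two_letters is hypothetical.

```python
import re

# Each repetition of the group consumes exactly one letter, so {2,}
# guarantees at least two letters; punctuation may follow each letter.
NAME_PATTERN = re.compile(r"^(?:[a-z][ ,.'-]*){2,}\d?$", re.IGNORECASE)


def has_two_letters(value: str) -> bool:
    """True if value fits the allowed characters and has at least 2 letters."""
    return NAME_PATTERN.match(value) is not None


for candidate in ["ab", "a", "a-b", "O'Neil, Jr. 3", "a1", "ab12", " ab"]:
    print(repr(candidate), has_two_letters(candidate))
```

Note that this version requires the string to start with a letter, as stated in the answer, so a leading space or punctuation mark is rejected.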
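The thing to change in your regex is the + after the list of allowed characters.
+ means one or more occurrences of the listed characters. If you want 2 or more, you can use {2,}
So your regex should look something like
/^[a-z ,.'-]{2,}\d?$/i
Note that this requires at least two characters from the allowed set, not necessarily two letters: a string such as .. would also match.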

RegEx to contain at least one dot

Here's my regex :
\b(https?|www)://[-a-zA-Z0-9+&##/%?=~_|!:,.;]*[-a-zA-Z0-9+&##/%=~_|]*[.]{1,256}
I know I'm doing something wrong because I use RegEx very rarely.
The idea of the last [.]{1,256} was to make sure of having at least one "." in.
So, without it I got "https://www" match, so I wanted to make sure that at least one dot exists.
But with the expression above, it cuts to the first dot, not the whole thing.
First of all, www before :// does not make much sense, it can occur after ://, so it can be removed.
Both [-a-zA-Z0-9+&##/%?=~_|!:,.;]* and [-a-zA-Z0-9+&##/%=~_|]* can match an empty string, and the [.]{1,256} at the end of your pattern matches 1 to 256 dots, that is why you get matches up to a dot.
You may refactor the pattern to match all those chars you allow before a dot, then match a dot, and then match any amount of chars you allow, together with a dot:
\bhttps?://[-a-zA-Z0-9+&##/%?=~_|!:,;]*\.[-a-zA-Z0-9+&##/%?=~_|!:,.;]*
Here,
[-a-zA-Z0-9+&##/%?=~_|!:,;]* - matches 0 or more chars you allow but a dot
\. - this matches a dot
[-a-zA-Z0-9+&##/%?=~_|!:,.;]* - 0 or more allowed chars including a dot.
So, at least 1 dot will get matched.
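A quick way to check the refactored pattern is a sketch like the following. Python is an assumption (the question does not name an engine), the helper name find_urls is hypothetical, and the character classes are copied verbatim from the pattern above.

```python
import re

# First class: allowed characters without a dot; then one mandatory dot;
# then the allowed characters again, this time including the dot.
URL_WITH_DOT = re.compile(
    r"\bhttps?://[-a-zA-Z0-9+&##/%?=~_|!:,;]*\.[-a-zA-Z0-9+&##/%?=~_|!:,.;]*"
)


def find_urls(text: str) -> list[str]:
    """Return every URL-like match that contains at least one dot."""
    return URL_WITH_DOT.findall(text)


print(find_urls("see https://www here"))
print(find_urls("see https://www.example.com/path?x=1 here"))
```

The first call should find nothing, since there is no dot after the scheme, while the second should return the whole URL rather than stopping at the first dot.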

How to find a particular string

I'm using Visual Studio 2017, and in a very long text file I'm searching for a particular function call but am unable to find it.
Here's the regex I'm using:
c\.CreateMap\<(\w)+\,\s+Address\>
and I want to match these:
c.CreateMap<ClientAddress, Address>()
c.CreateMap<Responses.SiteAddress, Data.Address>()
and so on.
As soon as I add "Address" to the regex, it stops matching anything.
What am I doing wrong?
You can try this
c\.CreateMap\<\w+\.?\w+?\,\s*\w*?\.?Address\>
Explanation
c\.CreateMap\< - Matches the literal text c.CreateMap<.
\w+ - Matches a word character one or more times.
\.? - Matches '.' zero or one time.
\w+? - Matches a word character one or more times, as few as possible.
\, - Matches ','.
\s* - Matches whitespace zero or more times.
\w*? - Matches a word character zero or more times, as few as possible.
\.? - Matches '.' zero or one time.
Address\> - Matches the literal text Address>.
Demo
P.S. In case you also want to match something like this:
c.CreateMap<Responses.SiteAddress.abc, Data.Address.xyz>()
You can use this.
c\.CreateMap\<(\w+\.?\w+?)*\,\s*(?:\w*?\.?)*Address(\.\w*)?\>
Demo
Here is a general regex I can suggest:
c\.CreateMap\<[\w.]+,\s+(?:[\w.]+\.)?Address\>\s*\(\s*\)
This will match any term made of dots or word characters in the first position in the diamond. In the second position, it will match Address, optionally preceded by some parent class names and a dot separator.
Demo
Note that I also include the empty function call parentheses in the regex. I also allow for flexibility in the whitespace that may appear after the diamond or between the parentheses.
In your second example, you have an extra dot that is not handled. Your regex needs a small modification. Also, you don't need to escape <, > or , here. Use this:
c\.CreateMap<([\w.])+,\s+[\w.]*Address>
Demo
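To see this pattern against the two sample lines from the question, here is a small sketch. It uses Python's re as a stand-in for the Visual Studio search box, which is an assumption (the two engines differ in some details), and the name is_address_map is hypothetical.

```python
import re

# [\w.] allows dotted names such as Responses.SiteAddress in the first
# slot and an optional dotted prefix such as Data. before Address.
CREATE_MAP = re.compile(r"c\.CreateMap<([\w.])+,\s+[\w.]*Address>")


def is_address_map(line: str) -> bool:
    """True if line contains a CreateMap call whose second type ends in Address."""
    return CREATE_MAP.search(line) is not None


lines = [
    "c.CreateMap<ClientAddress, Address>()",
    "c.CreateMap<Responses.SiteAddress, Data.Address>()",
    "c.CreateMap<ClientAddress, Customer>()",
]
for line in lines:
    print(line, is_address_map(line))
```

The third line is a made-up negative case, included only to show that a different second type is not matched.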
To match any of the functions in your question, you can use:
c\.CreateMap[^)]+\)
Regex Demo
Regex Explanation: