trying to find the correct regular expression - regex

I have the following cases that should match a regular expression. I've tried several combinations and read a lot of answers, but I still have no clue how to solve it.
The rule is: find any combination of . inside a quoted string. At the moment I have the following regexp:
\"\w*((..)|(.))\w*\"
that covers most of the cases:
mmmas"A.F"asdaAA
196.34.45.."asd."#
".add"
sss"a.aa"sss
".."
"a.."
"a..a"
"..A"
but still having problems with this one:
"WERA.HJJ..J"
I've been testing the regexp on the http://regexr.com/ site.
I would really appreciate any help with this.

Change your regex to
\"\w*(\.+\w*)+\"
Update: the . is escaped so that it matches a literal dot and not any character.

From the question, it seems that you need to find every occurrence of one or more dots (along with optional word characters) inside a pair of quotes. The following regex would do this:
\"\w*(\.+\w*)+\"
In "WERA.HJJ..J", some word characters are followed by a dot, then more word characters, then two dots and a final word character. Your regex only allows a single run of one or two dots with an optional block of word characters on either side, so it cannot match a string that has dots in two separate places.
The dots in the regex are escaped because an unescaped . is a metacharacter that matches any character.
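To try the pattern outside an online tester, here is a minimal Python sketch. The function name find_quoted_dots and the choice of Python are assumptions for illustration; the pattern is the one from the answer, with a non-capturing group so that the whole quoted string is returned.

```python
import re

# A quote, optional word characters, then one or more groups of
# "dots followed by optional word characters", then a closing quote.
QUOTED_DOTS = re.compile(r'"\w*(?:\.+\w*)+"')

def find_quoted_dots(text):
    """Return every quoted substring that contains at least one dot."""
    return [m.group(0) for m in QUOTED_DOTS.finditer(text)]

samples = ['mmmas"A.F"asdaAA', '196.34.45.."asd."#', '"WERA.HJJ..J"', '"abc"']
for sample in samples:
    print(sample, '->', find_quoted_dots(sample))
```

The last sample has no dot inside the quotes, so it should produce an empty list.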

Related

Regex - returning a match without a period

I'm using the below regex string to match the word "kohls" which is located in a group of other words.
\W*((?i)kohls(?-i))\W*
It works great when the word is alone, but if the word is in a url, the match includes a period on both sides.
See the below examples:
Thank you for shopping at Kohls - returns a match for kohls.
https://www.kohls.com - returns a match for .kohls.
Edit. https://www.KohlsAndMichaels.com - doesn't return any match for kohls.
I want it to only extract the exact match for kohls without periods or any other symbols/text in front or behind it. Can you tell me what I'm doing wrong?
In cases like this you can use a site such as regex101.com, which explains the regular expression and highlights the matches in color.
The problem with the dots is the \W*, which matches any run of non-word characters, including the periods on either side of the word. To fix this, you can use the following regular expression:
\b((?i)kohls(?-i))\b
The \b (before and after the word you want to match) asserts the position at a word boundary without consuming any characters, so the periods are no longer part of the match.
If you still have questions, look at the explanation of the regular expression provided by that website. It is worth a look.
The \W metacharacter matches non-word characters, so adding the star operator matches zero or more of them (such as periods). Did you mean to use a word boundary instead?
\b(?i)kohls(?-i)\b
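As a rough check of the word-boundary approach, here is a small Python sketch. It assumes Python's re module, which does not accept the (?i)...(?-i) toggles in this form, so the re.IGNORECASE flag is used instead; the helper name find_kohls is made up for the example.

```python
import re

# \b asserts a word boundary and consumes nothing, so surrounding
# periods are never included in the match.
KOHLS = re.compile(r'\bkohls\b', re.IGNORECASE)

def find_kohls(text):
    """Return all standalone occurrences of 'kohls', ignoring case."""
    return KOHLS.findall(text)

print(find_kohls('Thank you for shopping at Kohls'))
print(find_kohls('https://www.kohls.com'))
print(find_kohls('https://www.KohlsAndMichaels.com'))
```

Note that the third URL still gives no match, because the A that follows Kohls is a word character and so there is no boundary after the s. If that case should match too, the trailing \b would have to be dropped.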
Replace both \W* with [\W,\.\-]* etc.
Should be enough.

Regex expression for any times any character between two set of literals

Hi, I am new to regex and programming. In a text file, I want to find everything (all characters) between the first occurrences of two literals, namely 'html' and 'http'. I have tried a lot of expressions, but with no success. Any help will be appreciated.
You could try this regex,
(?<=html).*?(?=http)
Use the s modifier (DOTALL) so that the dot also matches newlines.
Explanation:
(?<=html) Positive lookbehind. It asserts that the match starts immediately after the word html, without including html in the match.
.*? Matches any character zero or more times. The ? after * makes the quantifier lazy, so the regex engine takes the shortest possible match.
(?=http) Positive lookahead. It asserts that the match is immediately followed by http, without including http in the match.
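The explanation above can be sketched in Python as follows. The helper name text_between and the sample string are assumptions for illustration; re.DOTALL plays the role of the s modifier.

```python
import re

# Lookbehind and lookahead only assert context; the lazy .*? in between
# is the only part that ends up in the match.
BETWEEN = re.compile(r'(?<=html).*?(?=http)', re.DOTALL)

def text_between(text):
    """Return the text between the first 'html' and the next 'http', or None."""
    match = BETWEEN.search(text)
    return match.group(0) if match else None

sample = 'x html first\nsecond http rest http'
print(repr(text_between(sample)))
```

Because the quantifier is lazy, the match stops at the first http after html rather than the last one.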

Need RegEx to remove dots from a string only if there are more than one

Hello I have strings like
tda2030 100.200.300 circuit
I want to check whether this string contains any keyword (separated by whitespace, but possibly at the start or end) that contains more than one dot, and if so remove the dots from it.
The result should be
tda2030 100200300 circuit
in the example.
I tried a lot but I think I need a regex-pert :) Thanks in advance.
That's an interesting question because of your requirement to have multiple dots and the fact that PCRE does not allow infinite-width lookbehinds to see if we might have a dot behind us. We'll get over that limitation by using \K and \G.
Here is a regex that will find the right dots:
(?<=\w)\.(?=\w+\.)|\G\w+\K\.
Use preg_replace to replace with an empty string:
$replaced = preg_replace("~(?<=\w)\.(?=\w+\.)|\G\w+\K\.~","",$string);
How does it work?
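We have two cases separated by a | (OR):
First, match a dot that is preceded by at least one word character and followed by some word characters and another dot.
Second, match a dot that follows the previous match (which had to be a dot) and some word characters. Here \G anchors at the end of the previous match and \K drops the word characters from the reported match, so only the dot is replaced.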
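For comparison, here is a Python sketch of the stated requirement. Python's built-in re module does not support \K or \G, so this is a different technique and not a port of the regex above: it visits each whitespace-separated token and strips the dots only when the token contains more than one. The function name strip_multi_dots is hypothetical.

```python
import re

def strip_multi_dots(text):
    """Remove all dots from any whitespace-separated token with 2+ dots."""
    def fix(match):
        token = match.group(0)
        # Tokens with zero or one dot are returned unchanged.
        return token.replace('.', '') if token.count('.') > 1 else token
    # \S+ matches each token; whitespace between tokens is left as-is.
    return re.sub(r'\S+', fix, text)

print(strip_multi_dots('tda2030 100.200.300 circuit'))
```

A callback keeps the "more than one dot" rule in ordinary code, which may be easier to adjust than the lookaround version if the definition of a keyword changes.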

Get text using Regular Expression

I have the sentence as below:
First learning of regular expression.
And I want to extract only First learning and expression by means of regular expressions.
Where would I start?
Regular expressions are for pattern matching, which means we'd need to know a pattern that is to be matched.
If you literally just want those strings, you'd just use First learning and expression as your patterns.
As @orique says, this is kind of pointless; you don't need RegEx for that. If you want something more complicated, you'd need to explain what you're trying to match.
Regex is not usually used to match literal text like what you're doing, but instead is used to match patterns of text. If you insist on using regex, you'll have to match the trivial expression
(First learning|expression)
As already pointed out, it is unusual to match a literal string like you are asking; it is more common to match patterns, such as several word characters followed by a space character.
Here is a pattern that matches several word characters (a-z, A-Z, 0-9 and _) followed by a space, followed by several more word characters, and so on. It captures three groups: the first group matches the first two words, the second group the next two words, and the third group the fifth word and the preceding space.
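$words = "First learning of regular expression.";
// The pattern must be a quoted string with delimiters.
preg_match('/(\w+\s\w+)\s(\w+\s\w+)(\s\w+)/', $words, $matches);
// Strings are joined with . in PHP; $matches[3] already starts with a space.
$result = $matches[1] . $matches[3];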
I hope this matches your requirement.

Perl matching characters bigger than a given length

I have been struggling to write a regex that matches words longer than a given length within parentheses. At first I thought I could do this with \(\w{a,}\), but I realized that it doesn't match words containing white space, such as (ab cd ef). All I want is to find any run of characters within parentheses that is longer than, for instance, 3 characters. How can I resolve this problem?
What is a word with white space?
If you want to match any character, then use .
\(.{3,}\)
. matches any character except newlines
But be careful: this is greedy. It will also match, for example,
(a)123(b)
To avoid this you could do something like
\([^)]{3,}\)
See it here online on Regexr
[^)] means any character except a )
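The difference between the two patterns can be checked with a short Python sketch. The question is about Perl, so Python is only a stand-in here, and the helper names greedy_matches and bounded_matches are made up for the example.

```python
import re

def greedy_matches(text):
    """\\(.{3,}\\): the dot can run across ')' characters."""
    return re.findall(r'\(.{3,}\)', text)

def bounded_matches(text):
    """\\([^)]{3,}\\): the match cannot extend past the first ')'."""
    return re.findall(r'\([^)]{3,}\)', text)

print(greedy_matches('(a)123(b)'))
print(bounded_matches('(a)123(b)'))
print(bounded_matches('(a)123(b) (ab cd ef)'))
```

The greedy pattern reports the whole of (a)123(b) as one match, while the negated character class rejects it and only accepts (ab cd ef) in the last example.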
You could use a character class that includes both \w and \s:
\([\w\s]{a,}\)
Maybe you mean this?
\([\w\s]{a,}\)
If it has a space in it, it's not a word anymore.
Is matching any characters fine, as in \(.{a,}\)? Or do you just need the whitespace as well, as in \((\w|\s){a,}\)?