Get text using Regular Expression - regex

I have the sentence as below:
First learning of regular expression.
I want to extract only "First learning" and "expression" by means of regular expressions.
Where should I start?

Regular expressions are for pattern matching, which means we'd need to know a pattern that is to be matched.
If you literally just want those strings, you'd just use First learning and expression as your patterns.
As @orique says, this is kind of pointless; you don't need RegEx for that. If you want something more complicated, you'd need to explain what you're trying to match.

Regex is not usually used to match literal text like what you're doing, but instead is used to match patterns of text. If you insist on using regex, you'll have to match the trivial expression
(First learning|expression)
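A minimal JavaScript sketch of that alternation, assuming the sentence is held in a plain string (the variable names are illustrative). The g flag makes match() return every occurrence instead of only the first.

```javascript
// Sketch: extract the two literal phrases with an alternation.
const sentence = "First learning of regular expression.";
// The g flag makes match() return every occurrence, not just the first.
const found = sentence.match(/(First learning|expression)/g);
console.log(found); // [ 'First learning', 'expression' ]
```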

As already pointed out, it is unusual to match a literal string like you are asking, but more common to match patterns such as several word characters followed by a space character etc...
Here is a pattern that matches several word characters (a-z, A-Z, 0-9 and _) followed by a space, followed by several more word characters, and so on. It captures three groups: the first matches the first two words, the second the next two words, and the third the fifth word together with its preceding space.
$words = "First learning of regular expression.";
preg_match('/(\w+\s\w+)\s(\w+\s\w+)(\s\w+)/', $words, $matches);
$result = $matches[1] . $matches[3]; // "First learning expression"
I hope this matches your requirement.

Related

Regular expression for "This Specific A" formatting rule

I'm trying to find the regular expression that validates a specific rule, but I'm quite a beginner with regular expressions.
Rule
There can be any number of words
Words are space-separated
Words only contain letters
Words start with a capital
The last word must be a single capitalized character
Expression
Here is where I am so far: ([A-Z][a-z]+[ ]*)*[A-Z]
Examples
Match
Example Name A
A New Example C
No match
a Test B
Wrong Name
Another_Wrong_Name A
Nop3 A
Notes:
Your regex matches only words with two or more letters before the final one-letter word. You need to match words of one or more letters using [A-Z][a-z]*.
You use a character class, [ ], to match a single space; this is redundant, so remove the brackets.
You need to match the entire string, with anchors, ^ and $, or \A and \z/\Z (depending on regex flavor).
You can use
^([A-Z][a-z]* )*[A-Z]$
^(?:[A-Z][a-z]* )*[A-Z]$
^(?:[A-Z][a-z]*\h)*[A-Z]$
^(?:[A-Z][a-z]*[^\S\r\n])*[A-Z]$
Note [^\S\r\n] and \h match horizontal whitespace, not just a regular space.
The non-capturing group, (?:...), is used merely for grouping patterns without keeping the text they matched in the dedicated memory slot, which is best practice, especially with repeated groups.
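To check the answer against the question's examples, here is a small JavaScript sketch. It uses the plain-space variant, since JavaScript regexes do not support \h; the arrays simply restate the examples from the question, and the unanchored original attempt is included only for comparison.

```javascript
// Plain-space variant of the answer's pattern; JavaScript regexes have no \h.
const rule = /^(?:[A-Z][a-z]* )*[A-Z]$/;
// The question's original attempt, unanchored, for comparison.
const attempt = /([A-Z][a-z]+[ ]*)*[A-Z]/;

const shouldMatch = ["Example Name A", "A New Example C"];
const shouldNotMatch = ["a Test B", "Wrong Name", "Another_Wrong_Name A", "Nop3 A"];

for (const s of shouldMatch) {
  console.log(`${s}: ${rule.test(s)}`); // true for each
}
for (const s of shouldNotMatch) {
  console.log(`${s}: ${rule.test(s)}`); // false for each
}

// Without anchors, the attempt finds a capital letter somewhere and accepts bad input.
console.log(attempt.test("Wrong Name")); // true
```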

trying to find the correct regular expression

I have the following cases that should match a regular expression. I've tried several combinations and read a lot of answers, but I still have no clue how to solve it.
The rule is: find any combination of . inside a quoted string. At the moment I have the following regexp:
\"\w*((..)|(.))\w*\"
which covers most of the cases:
mmmas"A.F"asdaAA
196.34.45.."asd."#
".add"
sss"a.aa"sss
".."
"a.."
"a..a"
"..A"
but I'm still having problems with this one:
"WERA.HJJ..J"
I've been testing the regexp on the http://regexr.com/ site.
I would really appreciate any help on this.
Change your regex to
\"\w*(\.+\w*)+\"
Update: escape . to match the dot and not any character
From the question, it seems that you need to find every occurrence of one or more dots (along with optional word characters) inside a pair of quotes. The following regex would do this:
\"\w*(\.+\w*)+\"
In "WERA.HJJ..J", you have some word characters followed by a dot, which is followed by a sequence of word characters, again followed by a dot and word characters. Your regex would match only one or two dots with a pair of optional word-character blocks on either side.
The dots in the regex are escaped to avoid them being matched against any character, since it is a metacharacter.
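A quick JavaScript check of the suggested pattern against the strings listed in the question (each assumed to be tested on its own). In a JavaScript regex literal the quote character needs no backslash, so \" is written as a bare ".

```javascript
// The answers' pattern; inside a JavaScript regex literal, " needs no backslash.
const dotted = /"\w*(\.+\w*)+"/;

const samples = [
  'mmmas"A.F"asdaAA',
  '196.34.45.."asd."#',
  '".add"',
  'sss"a.aa"sss',
  '".."',
  '"a.."',
  '"a..a"',
  '"..A"',
  '"WERA.HJJ..J"',
];

for (const s of samples) {
  const m = s.match(dotted);
  // m[0] is the quoted part that matched, quotes included.
  console.log(`${s} -> ${m ? m[0] : "no match"}`);
}

// A quoted string without any dot is rejected.
console.log(dotted.test('"abc"')); // false
```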

Regular Expression to find matches of String series

I'm a newbie in regular expressions and need help delimiting a string that follows a certain pattern.
My string will always follow a pattern like ".(0.satQA).(1.SomewhatEnjoyable).(0.satQC).(0.ShorterThanExpected).(0.Q12).(0._1)".
My first search should return (0.satQA).(1.SomewhatEnjoyable)
the second (0.satQC).(0.ShorterThanExpected)
and the third (0.Q12).(0._1)
In short, I need to delimit this into 3 parts (in this case). Each part should start with "(", be followed by any characters, include ").(" in the middle, and end with ")".
The regex for the pattern you are looking for is \(.*?\)\.\(.*?\)
.*? is a lazy (reluctant) quantifier, meaning that it matches as few characters as it can before the next part of the regex matches.
You also need to escape characters like ., ( and ).
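A JavaScript sketch applying the answer's pattern with the g flag to the sample string from the question; the greedy variant is an added comparison that shows why the lazy quantifier matters.

```javascript
const input =
  ".(0.satQA).(1.SomewhatEnjoyable).(0.satQC).(0.ShorterThanExpected).(0.Q12).(0._1)";

// Lazy .*? stops at the first ")" that lets the rest of the pattern succeed.
const lazy = /\(.*?\)\.\(.*?\)/g;
const parts = input.match(lazy);
// three matches: '(0.satQA).(1.SomewhatEnjoyable)',
// '(0.satQC).(0.ShorterThanExpected)', '(0.Q12).(0._1)'
console.log(parts);

// Greedy .* runs to the last possible ")" instead, giving one long match.
const greedy = /\(.*\)\.\(.*\)/g;
console.log(input.match(greedy).length); // 1
```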

Regular expression doesn't work as expected

How can it be that this regular expression also returns strings that have a _ underscore as their last character?
It should only return strings with alphabetical characters, mixed lower- and uppercase.
However, the regular expression returns: 'action_'
$regEx = '/^([a-zA-Z])[a-zA-Z]*[\S]$|^([a-zA-Z])*[\S]$|^[a-zA-Z]*[\S]$/';
Because \S means "not whitespace character", \S matches _
A group should not have an underscore though, so, if you meant that, it could be that you are getting the whole match back and not just the first group.
Please show how you are using the regex to clarify that, if needed.
The [\S] will match everything that is not whitespace, including underscore.
Also, your expression is very odd!
If you want a string that only contains letters, then use ^[a-zA-Z]*$ or ^[a-zA-Z]+$ (depending on if blank is allowed or not).
If you're trying to do something else, you will need to expand on what that is.
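A small JavaScript check, assuming the goal is letters only: the original pattern (the same syntax works in a JavaScript literal) accepts "action_", while ^[a-zA-Z]+$ does not.

```javascript
// The question's pattern: every alternative ends in [\S], i.e. any non-whitespace.
const original = /^([a-zA-Z])[a-zA-Z]*[\S]$|^([a-zA-Z])*[\S]$|^[a-zA-Z]*[\S]$/;
// Letters only, at least one.
const lettersOnly = /^[a-zA-Z]+$/;

console.log(original.test("action_"));    // true: "_" satisfies [\S]
console.log(lettersOnly.test("action_")); // false
console.log(lettersOnly.test("Action"));  // true
```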
\S matches any non-whitespace char - thus _
You should show the text and which part of it you want to extract.
A regular expression shouldn't be as big as yours.
Work on small pieces of the expression... At this size, it is very difficult to help you.

What does /([^.]*)\.(.*)/ mean?

While searching for something, I found an answered question on this site. Two of the answers contain
/([^.]*)\.(.*)/
in them.
The question is located at Find & replace jquery. I'm a newbie in JavaScript, so I wonder, what does it mean? Thanks.
/([^.]*)\.(.*)/
Let us deconstruct it. The beginning and trailing slash are delimiters, and mark the start and end of the regular expression.
Then there is a parenthesized group: ([^.]*). The parentheses group part of the pattern together (and capture what it matches). The square brackets denote a "character class", meaning that any single character inside them is accepted in its place. However, this class is negated by its first character being ^, which reverses its meaning. Since the only character besides the negation is a period, this matches a single character that is not a period. After the square brackets is a * (asterisk), which means that the bracketed class can be matched zero or more times.
Then we get to the \.. This is an escaped period. Periods in regular expressions have a special meaning (except when escaped or inside a character class). This matches a literal period in the text.
(.*) is a new parenthesized sub-group. This time, the period matches any character, and the asterisk says it can be repeated as many times as needed.
In summary, the expression finds any sequence of characters that are not periods, followed by a single period, followed by any characters.
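Putting the pieces together, a JavaScript sketch of what the two groups hold; the sample strings are illustrative.

```javascript
const re = /([^.]*)\.(.*)/;

const m = re.exec("archive.tar.gz");
console.log(m[1]); // "archive": non-period characters before the first period
console.log(m[2]); // "tar.gz": everything after it, later periods included

// Without any period there is nothing for \. to match, so exec returns null.
console.log(re.exec("README")); // null
```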
It's a regular expression (it matches non-periods, followed by a period followed by anything (think "file.ext")). And you should run, not walk, to learn about them. Explaining how this particular regular expression works isn't going to help you as you need to start simpler. So start with a regex tutorial and pick up Mastering Regular Expressions.
Original: /([^.]*)\.(.*)/
Split this as:
[1] ([^.]*) : It says match all characters except . [ period ]
[2] \. : match a period
[3] (.*) : match any characters (zero or more)
so it becomes
[1] match all characters that are not a . [period] [2] until you find a . [period], then [3] match all the remaining characters.
Anything except a dot, followed by a dot, followed by anything.
You can test regexes on regexpal.
It's a regular expression that roughly searches for a string that doesn't contain a period, followed by a period, and then a string containing any characters.
That is a regular expression. Regular expressions are powerful tools if you use them right.
That particular regex extracts filename and extension from a string that looks like "file.ext".
It's a regular expression that splits a string into two parts: everything before the first period, and then the remainder. Most regex engines (including the Javascript one) allow you to then access those parts of the string separately (using $1 to refer to the first part, and $2 for the second part).
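A short JavaScript illustration of $1 and $2 in String.prototype.replace; the file names are made-up examples.

```javascript
const re = /([^.]*)\.(.*)/;

// In a replacement string, $1 and $2 stand for the two captured groups.
const swapped = "file.ext".replace(re, "$2 / $1");
console.log(swapped); // "ext / file"

// Group 1 ends at the FIRST period, so a double extension is replaced entirely.
const renamed = "archive.tar.gz".replace(re, "$1.zip");
console.log(renamed); // "archive.zip"
```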
This is a regular expression with some advanced use.
Consider a simpler version: /[^.]*\..*/, which is the same as above without parentheses. It matches any string with at least one dot. When the parentheses are added and a match happens, the variables \1 and \2 contain the matched parts from the parentheses. The first one has everything before the first dot; the second has everything after the first dot.
Examples:
input: foo...bar
\1: foo
\2: ..bar
input: .foobar
\1:
\2: foobar
This regular expression captures two groups that can be retrieved.
The two parts are the string before the first dot (which may be empty), and the string after the first dot (which may contain other dots).
The only restriction on the input is that it contain at least one dot. Contrary to some of the other answers, it will match ".", but the retrieved groups will be empty.
IMO /.*\..*/g would do the same thing.
const senExample = 'I am test. Food is good.';
const result1 = senExample.match(/([^.]*)\.(.*)/g);
console.log(result1); // ["I am test. Food is good."]
const result2 = senExample.match(/^.*\..*/g);
console.log(result2); // ["I am test. Food is good."]
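The whole matches above are identical, but the captured groups are not. Below, /(.*)\.(.*)/ is an assumed capturing version of the suggested pattern, added for comparison.

```javascript
const senExample = "I am test. Food is good.";

// Same whole match, different groups: [^.]* stops at the first period,
// while a greedy .* gives back characters only as far as the last one.
const firstPeriod = /([^.]*)\.(.*)/.exec(senExample);
const lastPeriod = /(.*)\.(.*)/.exec(senExample);

console.log(firstPeriod.slice(1)); // [ 'I am test', ' Food is good.' ]
console.log(lastPeriod.slice(1));  // [ 'I am test. Food is good', '' ]
```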
the . character matches any character except line-break characters such as \r or \n.
inside square brackets, a leading ^ negates the class, and a . there is just a literal dot, so [^.] means "any character except a dot"
the * means "zero or more times"
the parentheses group and capture,
the \ allows you to match a special character (like the dot or the star)
so this ([^.]*) means any run of characters other than a dot, zero or more times (line breaks included, since only the dot is excluded).
this (.*) part means any string of characters, zero or more times (except line breaks)
and the \. means a literal dot
so the whole thing matches zero or more non-dot characters, followed by a dot, followed by any number of characters.
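A quick JavaScript check of how [^.] and . treat a line break; the sample string is illustrative.

```javascript
const text = "line1\nline2.ext";

// [^.] excludes only the period, so it also consumes the line break.
const negated = /([^.]*)\.(.*)/.exec(text);
console.log(negated[1] === "line1\nline2"); // true

// . does not match \n (without the s flag), so this match starts on line 2.
const anyChar = /(.*)\.(.*)/.exec(text);
console.log(anyChar[1]); // line2
```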
For more information and a really great reference on Regular Expressions check out: http://www.regular-expressions.info/reference.html
It's a regular expression, which basically is a pattern of characters that is used to describe another pattern of characters. I once used regexps to find an email address inside a text file, and they can be used to find pretty much any pattern of text within a larger body of text provided you write the regexp properly.