Another ActionScript problem. I need to extract the first sentence from a block of text, but if the first sentence is 80 characters or fewer, I need to extract the first and second sentences.
The example code below is an attempt to find the sentences without getting confused by the other periods/full stops in the example text.
I have this test code:
import flash.text.TextField;
var descr:String =
"The temperature was 32.8 degrees Celsius. His B.Sc. degree was deemed insufficient. the Dr. owed B. the bank USD 4000.50 which he had not paid back. On 27.07.2004 a major earthquake occurred. It was 17.05 by the clock.";
var array:Array;
array = descr.split(/\s[a-zA-Z]{3,30}\.\s/);
trace(descr); //put original above output for checking against
trace(array+"\n"+array.length); // output
Any suggestions would be appreciated. Will check back when I get up.
Thanks
You could try using a lazy quantifier of the form {m,n}? and a positive lookahead to make sure that the period is one that ends a sentence:
^.{0,79}?(?=\.(?:$| [A-Z]))\..+?(?=\.(?:$| [A-Z]))\.|^.{80,}?(?=\.(?:$| [A-Z]))\.
The regex has two parts:
^.{0,79}?(?=\.(?:$| [A-Z]))\..+?(?=\.(?:$| [A-Z]))\.
This part matches the first two sentences when the first sentence is 80 characters or fewer (at most 79 characters plus its period).
^.{80,}?(?=\.(?:$| [A-Z]))\.
This part matches only the first sentence, for when the first part fails, that is, when the first sentence is longer than 80 characters.
^ matches the beginning of the string.
.{0,79}? matches at most 79 characters, lazily, so it stops at the closest sentence-ending period.
.{80,}? matches at least 80 characters, lazily, and stops at the closest sentence-ending period after that.
.+? is for the second sentence and can contain any number of characters.
(?=\.(?:$| [A-Z])) is a positive lookahead that matches a period that is either at the end of the string (\.$) or followed by a space and a capital letter (\. [A-Z]).
Then \. matches the period itself.
regex101 demo
NOTE: This regex is meant for matching, not for splitting.
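To check the pattern outside ActionScript, here is a minimal Python sketch that applies it with a match (not a split) to the question's sample text. It assumes Python's re handles the constructs used here (lazy {m,n}?, lookahead, $) the same way an ECMAScript-style engine does for this input; LEAD_RE and extract_lead are hypothetical names, not part of the answer.

```python
import re
from typing import Optional

# The pattern from the answer, split into two raw-string pieces.
# Alternative 1: first sentence is at most 79 chars plus its period -> two sentences.
# Alternative 2: first sentence is longer -> first sentence only.
# A sentence end is a period followed by " " and a capital letter, or a final period.
LEAD_RE = re.compile(
    r"^.{0,79}?(?=\.(?:$| [A-Z]))\..+?(?=\.(?:$| [A-Z]))\."
    r"|^.{80,}?(?=\.(?:$| [A-Z]))\."
)


def extract_lead(text: str) -> Optional[str]:
    """Return the leading sentence(s) of text, or None if nothing matches."""
    m = LEAD_RE.match(text)
    return m.group(0) if m else None


descr = (
    "The temperature was 32.8 degrees Celsius. His B.Sc. degree was deemed "
    "insufficient. the Dr. owed B. the bank USD 4000.50 which he had not paid "
    "back. On 27.07.2004 a major earthquake occurred. It was 17.05 by the clock."
)

print(extract_lead(descr))
print(extract_lead("Short one. Second sentence. Third sentence."))
print(extract_lead("x" * 85 + ". Second sentence."))
```

On the sample, the third sentence starts with a lowercase "the", so "insufficient." is not treated as a sentence end and the match runs on to "paid back.". Note also that a text consisting of a single short sentence matches neither alternative, so extract_lead returns None for it.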
I have this type of regex
\b[0-9][0-9][0-9][0-9][0-9]\b
It's not complete: it matches every 5-digit number, but I need only the first match from this structure:
Reference Number WW
30966 CFUN22 098765334
30967 CFUN22 098765335
30968 CFUN22 098765336
30969 CFUN22 098765337
In this case I need just "30966", not 30967, 30968 and so on.
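You can use a positive lookbehind to make sure that you're grabbing the first 5-digit number right after the header line. The pattern below assumes the header is the word "Comments"; for the sample above, put the actual header text (Reference Number WW) in its place: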
(?<=Comments\n)\d{5}\b
https://regex101.com/r/pZLj4K/1
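A minimal Python sketch of the lookbehind approach on the question's table. It assumes the header line is exactly "Reference Number WW" (swapped in for the answer's "Comments"); FIRST_REF_RE is a hypothetical name. Python only allows fixed-width lookbehinds, which this one is.

```python
import re

# Sample table from the question.
table = (
    "Reference Number WW\n"
    "30966 CFUN22 098765334\n"
    "30967 CFUN22 098765335\n"
    "30968 CFUN22 098765336\n"
    "30969 CFUN22 098765337\n"
)

# Only a 5-digit run that immediately follows the header line can match.
FIRST_REF_RE = re.compile(r"(?<=Reference Number WW\n)\d{5}\b")

m = FIRST_REF_RE.search(table)
print(m.group(0) if m else None)
```

Because the lookbehind pins the match to the position after the header, even a global search (findall) finds only that one number.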
Try using the following regex:
^\N+\n.*?(\d{5})
It will match:
^: start of string
\N+\n: one or more non-newline characters followed by a newline, i.e. the whole header line (\N is PCRE syntax for "any character except newline")
.*?: the shortest possible run of characters (possibly empty) on the next line
(\d{5}): "Group 1", a sequence of five digits
Your needed digits can be found within Group 1.
Given you're dealing with a textual table, \N+\n lets you skip the header line, while .*? lets you match your code even when it is not at the beginning of the second line.
Check the regex demo here.
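Python's re has no bare \N, so this sketch assumes [^\n] as the portable stand-in for it; otherwise the pattern is the one above. SECOND_LINE_RE is a hypothetical name.

```python
import re

table = (
    "Reference Number WW\n"
    "30966 CFUN22 098765334\n"
    "30967 CFUN22 098765335\n"
)

# [^\n]+\n skips the header line; .*? (no DOTALL, so it stays on one line)
# lazily skips ahead to the first five-digit run, captured as group 1.
SECOND_LINE_RE = re.compile(r"^[^\n]+\n.*?(\d{5})")

m = SECOND_LINE_RE.match(table)
print(m.group(1) if m else None)
```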
/[\w|A-Z]{1,3}[a-z]/g
but I want to match only the first 3 characters of each word.
For example:
I WANt THE FIRst 3 CHAr OF WORds ONLy.
It's for a speed-reading tool: only the beginning of each word is uppercased.
Ideally: (first 3 characters)(rest of the word or space)
https://regex101.com/r/PCi8Dn/2
Thank you!
Original answer
Use a positive lookahead, (?=[pattern]), to require something after the match without including it in the match.
[A-Z]{1,3}(?=[a-z])
appears to do what you want (if I've understood your spec correctly).
You can see it in action here.
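A quick Python check of the original answer's pattern against the example sentence from the question (the variable names are illustrative only):

```python
import re

text = "I WANt THE FIRst 3 CHAr OF WORds ONLy."

# Up to three capitals, but only when a lowercase letter follows; the
# lookahead checks that letter without including it in the match.
matches = re.findall(r"[A-Z]{1,3}(?=[a-z])", text)
print(matches)
```

All-caps words such as "THE" and "OF" are skipped, because no lowercase letter follows them.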
New answer following clarification on spec
I think this does what you want:
(\S{1,3})(\S*[\s\.]+)
The breakdown is:
1st capturing group: (\S{1,3})
Matches a maximum of 3 non-space characters (\S is used instead of \w because I think you want to match characters with diacritics, such as à, and punctuation inside words, such as the apostrophe ').
2nd capturing group: (\S*[\s\.]+)
Matches zero or more non-space characters (the remaining characters in each word) followed by one or more delimiter characters (space or period). I included period as a delimiter to match the last word. You might want to adjust that part depending on your exact needs.
See it in action here.
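To show how the two groups could drive the speed-reading transform, here is a minimal Python sketch that uppercases group 1 and keeps group 2 as-is. speed_read and WORD_RE are hypothetical names; the replacement logic is an assumption about how the groups would be used.

```python
import re

WORD_RE = re.compile(r"(\S{1,3})(\S*[\s\.]+)")


def speed_read(text: str) -> str:
    """Uppercase the first (up to) three characters of each word.

    Group 1 is the word's head; group 2 is the rest of the word plus its
    delimiter. A final word with no trailing space or period is not matched,
    so it is left unchanged.
    """
    return WORD_RE.sub(lambda m: m.group(1).upper() + m.group(2), text)


print(speed_read("I want the first 3 char of words only."))
```

This is why the period is included as a delimiter: without it, the last word of a sentence would never match.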
I am working on validating PAN card numbers. The first character and the fifth character must be the same: whatever the first character of the string is, the fifth character must match it. Can anyone help me apply this condition?
Regex I have tried: [A-Za-z]{4}\d{4}[A-Za-z]{1}
Here is my pan card example: ABCDA9999K
If you want to match the full example string, where the first A should match the fifth A, the pattern has to start with 5 letters, [A-Za-z]{5}, instead of [A-Za-z]{4}.
You could use a capturing group with a backreference ([A-Za-z])[A-Za-z]{3}\1 to account for the first 5 chars.
You might add word boundaries \b to the start and end to prevent a partial match or add anchors to assert the start ^ and the end $ of the string.
This part of the pattern {1} can be omitted.
([A-Za-z])[A-Za-z]{3}\1\d{4}[A-Za-z]
Regex demo
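A minimal Python sketch of the backreference pattern, using fullmatch in place of the ^ and $ anchors. is_valid_pan is a hypothetical helper that checks only the shape discussed here, not any other PAN rules.

```python
import re

# ([A-Za-z]) captures the first letter; \1 requires the fifth character to be
# that same letter (case-sensitively). fullmatch anchors both ends, like ^...$.
PAN_RE = re.compile(r"([A-Za-z])[A-Za-z]{3}\1\d{4}[A-Za-z]")


def is_valid_pan(value: str) -> bool:
    return PAN_RE.fullmatch(value) is not None


print(is_valid_pan("ABCDA9999K"))  # first and fifth letters are both "A"
print(is_valid_pan("ABCDE9999K"))  # fifth letter differs
```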
I want to match specific words only when they appear between the beginning of an article title and its 5th word.
Input string:
The 14 best US colleges in the West are dominated by California — here's who makes the cut.
regex:
/^.*(\bbest\b|\btop\b|\bhot\b).*$/
Currently it matches the whole article title, but I want to search only up to "colleges" (the 5th word).
I also need to ignore words that merely contain a keyword, such as laptop or hot-spot.
You can use this expression
^((?:\w+\s?){1,5}).*
Explanation:
^ assert position at start of the string
\w+ matches one or more word characters
\s? matches an optional whitespace character
{1,5} Quantifier - Between 1 and 5 times, as many times as possible
.* matches any character (except newline)
This matches the first 5 words (and spaces).
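A minimal Python sketch of this expression on the question's title (shortened after "California"); group 1 ends with the trailing space, so it is stripped here. The variable names are illustrative.

```python
import re

title = "The 14 best US colleges in the West are dominated by California"

# Group 1 greedily collects up to five "word + optional whitespace" units.
m = re.match(r"^((?:\w+\s?){1,5}).*", title)
first_five = m.group(1).rstrip() if m else None
print(first_five)
```

Note that this pattern only extracts the first five words; it does not check for best, top or hot.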
^(\w+\s){0,4}\b(best|top|hot)(\s|$)
You want to match a keyword within the first five words of the input sentence. Counting from the start of the sentence, there must be 0 to 4 words before the keyword, so you need ^(\w+\s){0,4} in front of the keywords you want to match. See https://regex101.com/r/nS0dU6/4
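A quick Python check of this pattern: group 2 holds the keyword, and the trailing (\s|$) rejects hot-spot, while the leading word units rule out laptop. KEYWORD_RE is a hypothetical name.

```python
import re

KEYWORD_RE = re.compile(r"^(\w+\s){0,4}\b(best|top|hot)(\s|$)")

title = "The 14 best US colleges in the West are dominated by California"

m = KEYWORD_RE.search(title)
print(m.group(2) if m else None)
```

The match is case-sensitive, so "Top" at the start of a title would need the pattern adjusted or an ignore-case flag.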
regex101 comes to help again.
^(?=(?:\w+\s){0,4}?(?:best|top|hot)\b(?!-))(\w+(?:\s\w+){0,4})
(?=(?:\w+\s){0,4}?(?:best|top|hot)\b(?!-)) checks that the keyword is within the first 5 words (the (?!-) is added to cater for words such as hot-spot)
(\w+(?:\s\w+){0,4}) then captures at most the first 5 words
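A minimal Python sketch of the lookahead version on three sample titles (the second and third are made up for illustration); LOOKAHEAD_RE is a hypothetical name.

```python
import re

# Lookahead: a keyword not followed by "-" appears within the first five words.
# Group 1: the first (at most) five words, captured only if the check passes.
LOOKAHEAD_RE = re.compile(
    r"^(?=(?:\w+\s){0,4}?(?:best|top|hot)\b(?!-))(\w+(?:\s\w+){0,4})"
)

for title in (
    "The 14 best US colleges in the West are dominated by California",
    "Get a laptop now",
    "Visit the hot-spot map",
):
    m = LOOKAHEAD_RE.match(title)
    print(repr(m.group(1) if m else None))
```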
I want to match a string if it begins with a letter or number and, counting only non-whitespace characters, it is at least 5 characters long.
I believe I'm pretty close, my current regex is:
\s*(?:\S[\t ]*){5,}
What I still need is to make sure the string starts with a letter or number (or, if it begins with whitespace, that the first non-whitespace character is a letter or number).
http://regex101.com/r/lD7mZ2/1
How about the regex
^\s*[a-zA-Z0-9]\s*(?:\S[\t ]*){4,}
Example: http://regex101.com/r/lD7mZ2/4
Changes made
^ anchors the regex at the start of the string.
[a-zA-Z0-9] matches letter or digit
{4,} requires at least 4 further non-whitespace characters; together with the preceding [a-zA-Z0-9], that makes a minimum length of 5.
OR
a shorter version would be
^\s*[a-zA-Z0-9]\s*(?:\S\s*){4,}
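A quick Python check that both versions accept and reject the same sample strings (the samples are made up for illustration). The two can differ on multi-line input, since \s also matches newlines while [\t ] does not.

```python
import re

ORIGINAL_RE = re.compile(r"^\s*[a-zA-Z0-9]\s*(?:\S[\t ]*){4,}")
SHORTER_RE = re.compile(r"^\s*[a-zA-Z0-9]\s*(?:\S\s*){4,}")

# "  -abcd" starts with a symbol; "ab c" and "abcd" are too short.
for s in ["  a b c d e", "  -abcd", "ab c", "abcd", "hello"]:
    print(repr(s), bool(ORIGINAL_RE.match(s)), bool(SHORTER_RE.match(s)))
```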