I am trying to split text into sentences, with exceptions so that abbreviations like Mr., Mrs., etc. are ignored, and add the sentences to an array.
This worked for me in vanilla JS:
/(?<!\sMrs)(?<!\sMr)(?<!\sSr)(\.\s)|(!\s)|(\?\s)/gm
Unfortunately, React Native doesn't support negative lookbehind.
Is there another way I can achieve the same result?
You can create exceptions in the following way:
let str = "Hello Mr. Jackson. How are you doing today?"
let sentences = str.match(/(?:\b(?:Mrs?|Sr)\.\s|[^!?.])+[!?.]?/g).map(x => x.trim())
console.log(sentences)
The regex (see its online demo) matches
(?:\b(?:Mrs?|Sr)\.\s|[^!?.])+ - one or more occurrences of
\b(?:Mrs?|Sr)\.\s - Mr, Mrs or Sr as whole words, followed by . and a whitespace char
| - or
[^!?.] - any single char other than ?, ! and .
[!?.]? - an optional !, ? or . char
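The same pattern can be wrapped in a reusable function so the exception list is easy to extend. This is a sketch under the assumption that you want to keep the abbreviations in an array; `splitSentences` and `escapeRegExp` are hypothetical helper names, not part of any library. It builds a regex with the same shape as the one above and uses no lookbehind, which was the problem in React Native.

```javascript
// Hypothetical helper: escape regex metacharacters in each abbreviation.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: split text into sentences, treating "<abbr>. " as part of a sentence.
function splitSentences(text, abbreviations = ["Mr", "Mrs", "Sr"]) {
  const exceptions = abbreviations.map(escapeRegExp).join("|");
  // One or more of: a whole-word abbreviation followed by "." and a whitespace char,
  // or any single char other than ! ? . ; then an optional terminator.
  const rx = new RegExp(`(?:\\b(?:${exceptions})\\.\\s|[^!?.])+[!?.]?`, "g");
  return (text.match(rx) || [])
    .map(s => s.trim())
    .filter(s => s.length > 0); // drop whitespace-only pieces
}

console.log(splitSentences("Hello Mr. Jackson. How are you doing today?"));
// ['Hello Mr. Jackson.', 'How are you doing today?']
```

Passing a different list, e.g. `splitSentences(text, ["Mr", "Mrs", "Dr"])`, changes which abbreviations are skipped without touching the pattern itself.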
I ended up doing this:
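let str = "Hello Mr. Jackson. How are you doing today?"
let sentences = str
  .replace(
    /(!”\s|\?”\s|\.”|!\)\s|\.\)\s|\?\)|!"\s|\."\s|\.\s|!\s|\?\s|[.?!])\s*/g,
    "$1|"
  )
  .split("|")
  .filter(s => s.length > 0) // drop the empty piece left after the final "|"
let arr = []
for (let i = 0; i < sentences.length; i++) {
  // If a piece contains a title like "Mr." or "Mrs.", glue it to the next piece
  if ((sentences[i].includes("Mr.") || sentences[i].includes("Mrs.")) && i + 1 < sentences.length) {
    arr.push(sentences[i].trim() + " " + sentences[i + 1].trim())
    i++
  } else {
    arr.push(sentences[i].trim())
  }
}
console.log(arr)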
If anyone has a more efficient solution, let me know!
I am working on a regex to match either (but not both) of two conditions:
Digits preceded by A, e.g. A123: regex (?<=A)\d+
Digits followed by Z, e.g. 123Z: regex \d+(?=Z)
The naive way is to combine them with '|':
(?<=A)\d+|\d+(?=Z)
This may be better for readability and maintainability, but I'm curious whether there is another way to write it without |.
Here is C# test code for it:
using System;
using System.Text.RegularExpressions;
string pattern = @"(?<=A)\d+|\d+(?=Z)";
var currencies = new string[]{
"A123", // match
"123Z", // match
"123", // NOT match
};
foreach (var c in currencies)
{
Match m = Regex.Match(c, pattern, RegexOptions.IgnoreCase);
Console.WriteLine("Matched:" + m.Success + ". Value:" + m.Value);
}
The output of the code:
Matched:True. Value:123
Matched:True. Value:123
Matched:False. Value:
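For readers working in JavaScript, the same test can be ported roughly as follows. This is a sketch that assumes an engine with lookbehind support (ES2018+); `check` is a hypothetical helper name, and the `i` flag stands in for `RegexOptions.IgnoreCase`. An extra input, "A123Z", is included to show how the alternation behaves when both conditions hold.

```javascript
// JavaScript port of the C# test above (needs lookbehind support, ES2018+).
// The "i" flag corresponds to RegexOptions.IgnoreCase.
const pattern = /(?<=A)\d+|\d+(?=Z)/i;

// Hypothetical helper mirroring m.Success / m.Value from the C# code.
function check(input) {
  const m = input.match(pattern); // non-global: first match or null
  return { matched: m !== null, value: m ? m[0] : "" };
}

for (const c of ["A123", "123Z", "123", "A123Z"]) {
  const r = check(c);
  console.log("Matched:" + r.matched + ". Value:" + r.value);
}
// Matched:true. Value:123   (A123)
// Matched:true. Value:123   (123Z)
// Matched:false. Value:     (123)
// Matched:true. Value:123   (A123Z)
```

Note that "A123Z" also matches through the first alternative, so this pattern expresses "either or both" rather than strictly "either but not both"; exclusivity would need additional conditions.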
I have strings in this format:
object[i].base.base_x[i] and I get lists like List(0,1).
I want to use regular expressions in Scala to find the occurrences of [i] in the given string and replace the first occurrence with 0 and the second with 1, getting something like object[0].base.base_x[1].
I have the following code:
val stringWithoutIndex = "object[i].base.base_x[i]" // basically this string is generated dynamically
val indexReplacePattern = raw"\[i\]".r
val indexValues = List(0,1) // list generated dynamically
if(indexValues.nonEmpty){
indexValues.map(row => {
indexReplacePattern.replaceFirstIn(stringWithoutIndex , "[" + row + "]")
})
} else stringWithoutIndex
Since String is immutable, I cannot update stringWithoutIndex, which results in an output like List("object[0].base.base_x[i]", "object[1].base.base_x[i]").
I tried looking into StringBuilder but I am not sure how to update it. Also, is there a better way to do this? Suggestions other than regex are also welcome.
You could loop through the integers in indexValues using foldLeft and pass the string stringWithoutIndex as the start value.
Then use replaceFirst to replace the first match with the current value of indexValues.
If you want to use a regex, you might use a positive lookahead (?=]) and a positive lookbehind (?<=\[) to assert that the i is between opening and closing square brackets.
(?<=\[)i(?=])
For example:
val strRegex = """(?<=\[)i(?=])"""
val res = indexValues.foldLeft(stringWithoutIndex) { (s, row) =>
s.replaceFirst(strRegex, row.toString)
}
See the regex demo | Scala demo
How about this:
scala> val str = "object[i].base.base_x[i]"
str: String = object[i].base.base_x[i]
scala> str.replace('i', '0').replace("base_x[0]", "base_x[1]")
res0: String = object[0].base.base_x[1]
This sounds like a job for foldLeft. No need for the if (indexValues.nonEmpty) check.
indexValues.foldLeft(stringWithoutIndex) { (s, row) =>
indexReplacePattern.replaceFirstIn(s, "[" + row + "]")
}
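For comparison, the same fold can be sketched in JavaScript, where `Array.prototype.reduce` with an initial value plays the role of `foldLeft`. `fillIndices` is a hypothetical name. Because reduce is given the template as its start value, an empty list simply returns the template unchanged, which is why no nonEmpty check is needed.

```javascript
// The foldLeft idea in JavaScript: reduce threads the partially filled string
// through each index value. fillIndices is a hypothetical helper name.
function fillIndices(template, indexValues) {
  return indexValues.reduce(
    // A string pattern in replace() only replaces the first occurrence,
    // like replaceFirstIn / replaceFirst in the Scala answers.
    (s, row) => s.replace("[i]", "[" + row + "]"),
    template
  );
}

console.log(fillIndices("object[i].base.base_x[i]", [0, 1])); // object[0].base.base_x[1]
```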
I am trying to write a regular expression that returns the string between parentheses. For example, I want to get the string that resides between "(" and ")" in:
I expect five hundred dollars ($500).
would return
$500
I found "Regular expression to get a string between two strings in JavaScript",
but I don't know how to use '(' and ')' in a regexp.
You need to create a set of escaped (with \) parentheses (that match the parentheses) and a group of regular parentheses that create your capturing group:
var regExp = /\(([^)]+)\)/;
var matches = regExp.exec("I expect five hundred dollars ($500).");
//matches[1] contains the value between the parentheses
console.log(matches[1]);
Breakdown:
\( : match an opening parenthesis
( : begin capturing group
[^)]+: match one or more non ) characters
) : end capturing group
\) : match a closing parenthesis
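One detail worth guarding against: `exec` returns `null` when nothing matches, so reading `matches[1]` directly would throw on input without parentheses. A minimal sketch with a null check (`betweenParens` is a hypothetical helper name):

```javascript
// Returns the text inside the first (...) pair, or null if there is none.
// betweenParens is a hypothetical helper name.
function betweenParens(text) {
  const m = /\(([^)]+)\)/.exec(text);
  return m ? m[1] : null; // m[1] holds capturing group 1
}

console.log(betweenParens("I expect five hundred dollars ($500).")); // $500
console.log(betweenParens("No parentheses here")); // null
```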
Here is a visual explanation on RegExplained
Try string manipulation:
var txt = "I expect five hundred dollars ($500). and new brackets ($600)";
var newTxt = txt.split('(');
for (var i = 1; i < newTxt.length; i++) {
console.log(newTxt[i].split(')')[0]);
}
or a regex (which may be somewhat slower than the above):
var txt = "I expect five hundred dollars ($500). and new brackets ($600)";
var regExp = /\(([^)]+)\)/g;
var matches = txt.match(regExp);
for (var i = 0; i < matches.length; i++) {
var str = matches[i];
console.log(str.substring(1, str.length - 1));
}
Simple solution
Note: this solution only works for strings with a single "(" and ")", like the string in this question.
("I expect five hundred dollars ($500).").match(/\((.*)\)/).pop();
Online demo (jsfiddle)
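The reason for that restriction is that `.*` is greedy: it runs to the last ")" in the string. The sketch below contrasts it with a negated character class on an example string containing two pairs of parentheses (the input is illustrative only).

```javascript
const text = "I expect ($500) and ($1).";

// Greedy .* backtracks only as far as the LAST ")", so everything between
// the first "(" and the last ")" is captured.
const greedy = text.match(/\((.*)\)/).pop();

// [^()]* cannot cross a parenthesis, so only the first pair's content is captured.
const bounded = text.match(/\(([^()]*)\)/).pop();

console.log(greedy);  // $500) and ($1
console.log(bounded); // $500
```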
To match a substring inside parentheses excluding any inner parentheses you may use
\(([^()]*)\)
pattern. See the regex demo.
In JavaScript, use it like
var rx = /\(([^()]*)\)/g;
Pattern details
\( - a ( char
([^()]*) - Capturing group 1: a negated character class matching any 0 or more chars other than ( and )
\) - a ) char.
To get the whole match, grab the Group 0 value; if you need the text inside the parentheses, grab the Group 1 value.
Most up-to-date JavaScript code demo (using matchAll):
const strs = ["I expect five hundred dollars ($500).", "I expect.. :( five hundred dollars ($500)."];
const rx = /\(([^()]*)\)/g;
strs.forEach(x => {
const matches = [...x.matchAll(rx)];
console.log( Array.from(matches, m => m[0]) ); // All full match values
console.log( Array.from(matches, m => m[1]) ); // All Group 1 values
});
Legacy JavaScript code demo (ES5 compliant):
var strs = ["I expect five hundred dollars ($500).", "I expect.. :( five hundred dollars ($500)."];
var rx = /\(([^()]*)\)/g;
for (var i=0;i<strs.length;i++) {
console.log(strs[i]);
// Grab Group 1 values:
var res=[], m;
while(m=rx.exec(strs[i])) {
res.push(m[1]);
}
console.log("Group 1: ", res);
// Grab whole values
console.log("Whole matches: ", strs[i].match(rx));
}
Ported Mr_Green's answer to a functional programming style to avoid use of temporary global variables.
var string2 = "I expect five hundred dollars [$500] and [$600]."; // example input (assumed)
var matches = string2.split('[')
.filter(function(v){ return v.indexOf(']') > -1})
.map( function(value) {
return value.split(']')[0]
})
Alternative:
var str = "I expect five hundred dollars ($500) ($1).";
str.match(/\(.*?\)/g).map(x => x.replace(/[()]/g, ""));
→ (2) ["$500", "$1"]
You can swap the parentheses for square or curly brackets if you need to.
For digits after a currency sign, \(.+\s*\d+\s*\) may work
Or \(.+\) for anything inside brackets
let str = "Before brackets (Inside brackets) After brackets".replace(/.*\(|\).*/g, '');
console.log(str) // Inside brackets
var str = "I expect five hundred dollars ($500) ($1).";
var rex = /\$\d+(?=\))/;
alert(rex.exec(str));
This matches the first number that starts with $ and is followed by ')'; the ')' is not part of the match. The code alerts the first match.
var str = "I expect five hundred dollars ($500) ($1).";
var rex = /\$\d+(?=\))/g;
var matches = str.match(rex);
for (var i = 0; i < matches.length; i++)
{
alert(matches[i]);
}
This code alerts all the matches.
References:
search for "?=n"
http://www.w3schools.com/jsref/jsref_obj_regexp.asp
search for "x(?=y)"
https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/RegExp
Simple:
(?<value>(?<=\().*(?=\)))
I hope I've helped.
I have a string that contains data in XML format, like this:
str = "<p><a>_a_10gd_</a><a>_a_xy8a_</a><a>_a_1020_</a><a>_a_dfa7_</a><a>_a_ABCD_</a></p>";
I want to capture the (Value) part of _a_(Value)_ from every match. I have tried it this way.
Let's say I am doing this in JavaScript:
var regex = /_a_(.+)_/g ;
var str = "<a>_a_10gd_</a><a>_a_xy8a_</a><a>_a_1020_</a><a>_a_dfa7_</a><a>_a_ABCD_</a>";
var m;
while(m = regex.exec(str)){
console.log(m[1]); // m[1] should contain each match
}
I want to get all the matching groups in an array like this:
var a = ['10gd', 'xy8a', '1020', 'dfa7', 'ABCD'];
Please tell me what regex is required, and explain it too, because I am new to regexes and capturing groups.
Just change the greedy (.+) to the lazy (.+?), see:
var regex = /_a_(.+?)_/g ;
var str = "<a>_a_10gd_</a><a>_a_xy8a_</a><a>_a_1020_</a><a>_a_dfa7_</a><a>_a_ABCD_</a>";
var m;
while(m = regex.exec(str)){
console.log(m[1]); // m[1] contains each match
}
For more information about greediness, see "What do lazy and greedy mean in the context of regular expressions?"
Another option is to accept only characters except _ before the _ (instead of . which you have used), like so:
var regex = /_a_([^_]+)_/g ;
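To collect the group values directly into an array, as the question asks, `String.prototype.matchAll` (ES2020) can be combined with `Array.from`. This sketch reuses the negated-class pattern from the answer above; `matchAll` requires the `g` flag.

```javascript
const str = "<p><a>_a_10gd_</a><a>_a_xy8a_</a><a>_a_1020_</a><a>_a_dfa7_</a><a>_a_ABCD_</a></p>";

// matchAll yields one match array per match; m[1] is capturing group 1.
const a = Array.from(str.matchAll(/_a_([^_]+)_/g), m => m[1]);

console.log(a); // ['10gd', 'xy8a', '1020', 'dfa7', 'ABCD']
```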
I don't know regular expressions at all. Can anybody help me with a very simple regular expression for
extracting 'word:word' from a sentence, e.g. "Java Tutorial Format:Pdf With Location:Tokyo Javascript"?
A small modification:
the first 'word' comes from a list, but the second can be anything: "word1 in [ABC, FGR, HTY]".
The situation demands a little more modification:
the matching form can be "word11:word12 word13 .. " up to the next "word21: ... ".
Things are becoming complex... I have to learn regex :(
Thanks in advance.
You can use the regex:
\w+:\w+
Explanation:
\w - a single char that is a letter (uppercase or lowercase), a digit or _.
\w+ - one or more of the above chars, basically a word
So \w+:\w+
matches a pair of words separated by a colon.
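In JavaScript, the pattern can be applied with the `g` flag to collect every pair, and each pair can then be split on the colon. A short sketch using the sentence from the question:

```javascript
const sentence = "Java Tutorial Format:Pdf With Location:Tokyo Javascript";

// With the g flag, match() returns every substring matching \w+:\w+ (or null if none).
const pairs = sentence.match(/\w+:\w+/g) || [];

// Split each pair into its two words.
const parsed = pairs.map(p => p.split(":"));

console.log(pairs);  // ['Format:Pdf', 'Location:Tokyo']
console.log(parsed); // [['Format', 'Pdf'], ['Location', 'Tokyo']]
```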
Try \b(\S+?):(\S+?)\b. Group 1 will capture "Format" and group 2, "Pdf".
A working example:
<html>
<head>
<script type="text/javascript">
function test() {
var re = /\b(\S+?):(\S+?)\b/g; // without 'g' matches only the first
var text = "Java Tutorial Format:Pdf With Location:Tokyo Javascript";
var match = null;
while ( (match = re.exec(text)) != null) {
alert(match[1] + " -- " + match[2]);
}
}
</script>
</head>
<body onload="test();">
</body>
</html>
A good reference for regexes is https://developer.mozilla.org/en/Core_JavaScript_1.5_Reference/Global_Objects/RegExp
Use this snippet:
$str=" this is pavun:kumar hello world bk:systesm" ;
if ( preg_match_all ( '/(\w+\:\w+)/',$str ,$val ) )
{
print_r ( $val ) ;
}
else
{
print "Not matched \n";
}
Continuing Jaú's function with your additional requirement:
function test() {
var words = ['Format', 'Location', 'Size'],
text = "Java Tutorial Format:Pdf With Location:Tokyo Language:Javascript",
match = null;
var re = new RegExp( '(' + words.join('|') + '):(\\w+)', 'g');
while ( (match = re.exec(text)) != null) {
alert(match[1] + " = " + match[2]);
}
}
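The question's last requirement (a value such as "word11:word12 word13 .." that runs until the next "word21:") can be sketched by making the value lazy and stopping at a lookahead for the next known key or the end of the text. `extractPairs` is a hypothetical helper name, and the sketch assumes the key words contain no regex metacharacters.

```javascript
// Hypothetical helper: the key must come from `words`; the value runs up to the
// next "<key>:" (preceded by whitespace) or to the end of the text.
function extractPairs(text, words) {
  const keys = words.join("|"); // assumes no regex metacharacters in the words
  const re = new RegExp(`\\b(${keys}):(.*?)(?=\\s+(?:${keys}):|$)`, "g");
  const result = [];
  let match;
  while ((match = re.exec(text)) !== null) {
    result.push([match[1], match[2]]); // [key, value]
  }
  return result;
}

console.log(extractPairs("Java Tutorial Format:Pdf With Location:Tokyo Javascript", ["Format", "Location"]));
// [['Format', 'Pdf With'], ['Location', 'Tokyo Javascript']]
```

Every match consumes at least "key:", so the `exec` loop always advances and cannot spin on an empty match.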
I am currently solving this problem in my Node.js app and found that the following is, I think, suitable for colon-paired words:
([\w]+:)("(([^"])*)"|'(([^'])*)'|(([^\s])*))
It also matches quoted values, like a:"b" c:'d e' f:g
Example code in ES6:
const regex = /([\w]+:)("(([^"])*)"|'(([^'])*)'|(([^\s])*))/g;
const str = `category:"live casino" gsp:S1aik-UBnl aa:"b" c:'d e' f:g`;
let m;
while ((m = regex.exec(str)) !== null) {
// This is necessary to avoid infinite loops with zero-width matches
if (m.index === regex.lastIndex) {
regex.lastIndex++;
}
// The result can be accessed through the `m`-variable.
m.forEach((match, groupIndex) => {
console.log(`Found match, group ${groupIndex}: ${match}`);
});
}
Example code in PHP:
$re = '/([\w]+:)("(([^"])*)"|\'(([^\'])*)\'|(([^\s])*))/';
$str = 'category:"live casino" gsp:S1aik-UBnl aa:"b" c:\'d e\' f:g';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
// Print the entire match result
var_dump($matches);
You can check and test your regular expressions with this online tool: https://regex101.com
By the way, if it has not been deleted from regex101.com, you can browse that example code here.
Here's a non-regex way, in your favourite language: split on whitespace, go through the elements, check for ":", and print them if found. E.g. in Python:
>>> s="Java Tutorial Format:Pdf With Location:Tokyo Javascript"
>>> for i in s.split():
...     if ":" in i:
...         print(i)
...
Format:Pdf
Location:Tokyo
You can do further checks to make sure it's really "someword:someword" by splitting again on ":" and checking that the resulting list has 2 elements, e.g.
>>> for i in s.split():
...     if ":" in i:
...         a=i.split(":")
...         if len(a) == 2:
...             print(i)
...
Format:Pdf
Location:Tokyo
([^:]+):(.+)
Meaning: (any character except :, one or more times), then :, then (any character, one or more times)
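This pattern is anchored to nothing, so it behaves best on a single "key:value" token, such as the pieces produced by splitting on whitespace in the Python answer above. The sketch below shows both cases in JavaScript; on a whole sentence, [^:]+ also swallows the words before the first colon and (.+) runs to the end of the line.

```javascript
const re = /([^:]+):(.+)/;

// On a single "key:value" token, group 1 is the key and group 2 the value.
const token = re.exec("Format:Pdf");
console.log(token[1], token[2]); // Format Pdf

// On a whole sentence, group 1 starts at the beginning of the text
// and group 2 runs to the end of the line.
const whole = re.exec("Java Tutorial Format:Pdf With Location:Tokyo Javascript");
console.log(whole[1]); // Java Tutorial Format
console.log(whole[2]); // Pdf With Location:Tokyo Javascript
```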
You'll find good manuals on the net... Maybe it's time for you to learn...