Pattern Substitution in Haxe - regex

var str2 : String = "Expander Detected (%MSG_ID%)";
var r2 = ~/[\(%MSG_ID%\)]+/g;
trace(r2.replace(str2, ""));
Expected Result: Expander Detected
Actual Result: Expander etected
I need to remove (%MSG_ID%) from my strings. The characters before (%MSG_ID%) are dynamic, so we cannot replace them manually.

You need to remove the surrounding []. This works as expected:
var r2 = ~/\(%MSG_ID%\)+/g;
(The trailing + now applies only to the closing \) and can be dropped.)
[] is a character set, which matches any single character contained in the set. Since the set happens to contain D, the D of "Detected" is also removed when calling replace(). What you want instead is to match only when all the characters are present, in that order.
I'd recommend a tool like regex101.com for testing regexes; it highlights which characters each part of the pattern matches, which makes issues like this one easy to see.
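The difference between the character set and the literal sequence can be reproduced outside Haxe. The sketch below uses JavaScript rather than Haxe, on the assumption that this particular pattern behaves the same way in both; the variable names are illustrative only.

```javascript
// Same input as in the question.
const text = "Expander Detected (%MSG_ID%)";

// Character set: removes every run of the characters ( % M S G _ I D ),
// wherever they occur, so the "D" of "Detected" is removed too.
const withSet = text.replace(/[\(%MSG_ID%\)]+/g, "");

// Literal sequence: removes only the exact text "(%MSG_ID%)".
const withSequence = text.replace(/\(%MSG_ID%\)/g, "");

console.log(JSON.stringify(withSet));      // "Expander etected "
console.log(JSON.stringify(withSequence)); // "Expander Detected "
```

Both results keep the space that preceded the placeholder, so a trim may still be wanted afterwards.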

Pattern Validator in Angular Reactive Forms using Regex [duplicate]

I'm writing a small JavaScript method that receives a list of points, and I have to read those points to create a polygon on a Google map.
I receive the points in this form:
(lat, long), (lat, long),(lat, long)
So I've done the following regex:
\(\s*([0-9.-]+)\s*,\s([0-9.-]+)\s*\)
I've tested it with RegexPal and the exact data I receive:
(25.774252, -80.190262),(18.466465, -66.118292),(32.321384, -64.75737),(25.774252, -80.190262)
and it works. So why do I get null as the result when I use this code in my JavaScript?
var polygons="(25.774252, -80.190262),(18.466465, -66.118292),(32.321384, -64.75737),(25.774252, -80.190262)";
var reg = new RegExp("/\(\s*([0-9.-]+)\s*,\s([0-9.-]+)\s*\)/g");
var result = polygons.match(reg);
I've no javascript error when executing(with debug mode of google chrome). This code is hosted in a javascript function which is in a included JS file. This method is called in the OnLoad method.
I've searched a lot, but I can't find why this isn't working. Thank you very much!
Use a regex literal [MDN]:
var reg = /\(\s*([0-9.-]+)\s*,\s([0-9.-]+)\s*\)/g;
You are making two errors when you use RegExp [MDN]:
The "delimiters" / should not be part of the expression
If you define an expression as string, you have to escape the backslash, because it is the escape character in strings
Furthermore, modifiers are passed as second argument to the function.
So if you wanted to use RegExp (which you don't have to in this case), the equivalent would be:
var reg = new RegExp("\\(\\s*([0-9.-]+)\\s*,\\s([0-9.-]+)\\s*\\)", "g");
(and I think now you see why regex literals are more convenient)
I always find it helpful to copy and paste a RegExp expression into the console and see its output. Taking your original expression, we get:
/(s*([0-9.-]+)s*,s([0-9.-]+)s*)/g
which means that the expression tries to match /, s and g literally, and the parens () are still treated as special characters.
Update: .match() returns an array:
["(25.774252, -80.190262)", "(18.466465, -66.118292)", ... ]
which does not seem to be very useful.
You have to use .exec() [MDN] to extract the numbers:
["(25.774252, -80.190262)", "25.774252", "-80.190262"]
This has to be called repeatedly until the whole string has been processed.
Example:
var reg = /\(\s*([0-9.-]+)\s*,\s([0-9.-]+)\s*\)/g;
var result, points = [];
while((result = reg.exec(polygons)) !== null) {
    points.push([+result[1], +result[2]]);
}
This creates an array of arrays and the unary plus (+) will convert the strings into numbers:
[
[25.774252, -80.190262],
[18.466465, -66.118292],
...
]
Of course if you want the values as strings and not as numbers, you can just omit the +.
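As a quick check of the points above, here is a minimal sketch that compares the literal with the correctly escaped constructor form and shows what the unescaped string really contains. It uses a shortened version of the sample data, and matchAll (available in current browsers and Node.js) as an alternative to the exec loop; the variable names are illustrative.

```javascript
const polygons = "(25.774252, -80.190262),(18.466465, -66.118292)";

// Regex literal, as recommended in the answer.
const literal = /\(\s*([0-9.-]+)\s*,\s([0-9.-]+)\s*\)/g;

// Equivalent constructor call: backslashes doubled, flags as second argument.
const built = new RegExp("\\(\\s*([0-9.-]+)\\s*,\\s([0-9.-]+)\\s*\\)", "g");
console.log(literal.source === built.source); // true

// In a plain string, "\(" is just "(" and "\s" is just "s",
// which is why the original pattern never matched.
console.log("\(\s*" === "(s*"); // true

// matchAll yields one match object per point, including the capture groups.
const points = Array.from(polygons.matchAll(literal), (m) => [Number(m[1]), Number(m[2])]);
console.log(points); // [ [ 25.774252, -80.190262 ], [ 18.466465, -66.118292 ] ]
```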

Why is this seemingly correct Regex not working correctly in Rascal?

I have the following code:
set[str] noNnoE = { v | str v <- eu, (/\b[^eEnN]*\b/ := v) };
The goal is to filter out of a set of strings (called 'eu'), those strings that have no 'e' or 'n' in them (both upper- and lowercase). The regular expression I've provided:
/\b[^eEnN]?\b/
seems to work like it should, when I try it out in an online regex-tester.
When trying it out in the Rascal terminal it doesn't seem to work:
rascal>/\b[^eEnN]*\b/ := "Slander";
bool: true
I expected no match. What am I missing here? I'm using the latest (stable) Rascal release in Eclipse Oxygen1a.
Actually, the online regex-tester is giving the same match that we are giving. You can look at the match as follows:
if (/<w1:\b[^eEnN]?\b>/ := "Slander")
println("The match is: |<w1>|");
This is assigning the matched string to w1 and then printing it between the vertical bars, assuming the match succeeds (if it doesn't, it returns false, so the body of the if will not execute). If you do this, you will get back a match to the empty string:
The match is: ||
The online regex tester says the same thing:
Match 1
Full match 0-0 ''
If you want to prevent this, you can force at least one occurrence of the characters you are looking for by using a + instead of a ? (or a *):
rascal>/\b[^eEnN]+\b/ := "Slander";
bool: false
Note that you can also make the regex match case insensitive by following it with an i, like so:
/\b[^en]+\b/i
This may make it easier to write if you need to add more characters into the character class.
This solution (/\b[^en]+\b/i) doesn't work for strings consisting of two words, such as the Czech Republic.
Try /\b[^en]+\b$/i. That seems to work for me.
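The empty match and the two-word problem can both be reproduced in JavaScript, which is used here instead of Rascal on the assumption that \b and negated character classes behave the same way for these patterns. The sketch also shows one possible alternative that is not from the thread: anchoring the whole string with ^ and $ instead of relying on word boundaries. The sample strings other than "Slander" and "Czech Republic" are made up.

```javascript
// The optional class can match nothing at all, so the match succeeds at index 0.
const emptyMatch = /\b[^eEnN]?\b/.exec("Slander");
console.log(JSON.stringify(emptyMatch[0]), emptyMatch.index); // "" 0

// With +, a single space between two words is still a valid match,
// which is why "Czech Republic" slips through.
console.log(/\b[^en]+\b/i.test("Czech Republic")); // true

// Alternative (an assumption, not the thread's solution): require that
// every character of the string is something other than e or n.
const hasNoEOrN = (s) => /^[^en]+$/i.test(s);

const eu = ["Slander", "Czech Republic", "Costa Rica", "Portugal"];
console.log(eu.filter(hasNoEOrN)); // [ 'Costa Rica', 'Portugal' ]
```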

Regex to select text outside of underscores

I am looking for a regex to select the text which falls outside of underscore characters.
Sample text:
PartIWant_partINeedIgnored_morePartsINeedIgnored_PartIwant
Basically I need to be able to select the first keyword, which is always before the first underscore, and the last keyword, which is always after the last underscore. As an additional complexity, there can also be texts which have no underscore at all; these need to be selected completely as well.
The best I got yet was this expression:
^((?! *\_[^)]*\_ *).)*
which is only yielding me the first part, not the second and it has no support for the non-underscore yet at all.
This regex is used in a tool which monitors our http traffic, which means I can only 'select' the part I need but can't invoke functions or replace logic.
Thanks!
Use JavaScript string function split(). Check below example.
var t = "PartIWant_partINeedIgnored_morePartsINeedIgnored_PartIwant";
var arr = t.split('_');
console.log(arr);
//Access the required parts like this
console.log(arr[0] + ' ' + arr[arr.length - 1]);
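One detail of the split approach: when the text has no underscore, arr[0] and arr[arr.length - 1] are the same element, so the text would appear twice. A minimal sketch that handles that case follows; outerParts is a hypothetical helper name, not something from the question.

```javascript
// Returns the first and last underscore-separated parts,
// or the whole text once if it contains no underscore.
function outerParts(text) {
  const parts = text.split("_");
  if (parts.length === 1) {
    return [parts[0]];
  }
  return [parts[0], parts[parts.length - 1]];
}

console.log(outerParts("PartIWant_partINeedIgnored_morePartsINeedIgnored_PartIwant"));
// [ 'PartIWant', 'PartIwant' ]
console.log(outerParts("A_D")); // [ 'A', 'D' ]
console.log(outerParts("AD"));  // [ 'AD' ]
```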
Perhaps something like this:
/(^[^_]+)|([^_]+$)/g
That is, match either:
^[^_]+ the beginning of the string followed by non-underscores, or
[^_]+$ non-underscores followed by the end of the string.
var regex = /(^[^_]+)|([^_]+$)/g
console.log("A_b_c_D".match(regex)) // ["A", "D"]
console.log("A_b_D".match(regex)) // ["A", "D"]
console.log("A_D".match(regex)) // ["A", "D"]
console.log("AD".match(regex)) // ["AD"]
I'm not sure if you should use a regex here. I think splitting the string at underscore, and using the first and last element of the resulting array might be faster, and less complicated.
Trivial with .replace:
str.replace(/_.*_/, '')
// "PartIWantPartIwant"
With matching, you'd need to be selecting and concatenating groups:
parts = str.match(/^([^_]*).*?([^_]*)$/)
parts[1] + parts[2]
// "PartIWantPartIwant"
EDIT
This regex is used in a tool which monitors our http traffic, which means I can only 'select' the part I need but can't invoke functions or replace logic.
This is not possible: a regular expression cannot match a discontinuous span.

How to include 2 words within Regex and result must be based on only those 2 words VB.NET

I would like to know how to require two or more keywords within a Regex, so that results are shown only when all of the defined words are present, not just one of them.
What I currently have works with multiple keywords, but I want it to require BOTH words rather than either one.
For example:
Dim pattern As String = "(?i)[\t ](?<w>((arma)|(crapo))[a-z0-9]*)[\t ]"
This code works fine for documents that include 'arma' or 'crapo'. I want it to match only when BOTH 'arma' AND 'crapo' are present, and otherwise show no results.
I'm searching for certain keywords within a PDF document and only want results if the document includes BOTH 'arma' and 'crapo'. (It currently shows results for 'arma' OR 'crapo'; I want results for 'arma' AND 'crapo'.)
Sorry for sounding so repetitive.
Edit: Here is my code. Please read comment.
Dim filesz() As String = GetPatternedFiles("c:\temp\", New String() {"tes*.pdf", "fes*.pdf", "Bas*.pdf"})
' GetPatternedFiles and GetTextFromPDF are helper functions defined elsewhere.
For Each s As String In filesz
    Dim thetext As String = Nothing
    Dim pattern As String = "(?i)[\t ](?<w>(crapo)|(arma)[a-z0-9]*)[\t ]"
    thetext = GetTextFromPDF(s)
    For Each m As Match In Regex.Matches(thetext, pattern)
        ListBox1.Items.Add(s)
    Next
Next
You can use this regex:
\barma\b.*?\bcrapo\b|\bcrapo\b.*?\barma\b
The idea is to match arma whatever crapo or crapo whatever arma and use word boundaries to avoid words like karma.
However, if you want to match karma or crapotos as you asked in your comment you can use:
arma.*?crapo|crapo.*?arma
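The behaviour of the two patterns can be checked with a short sketch. It is written in JavaScript rather than VB.NET, and it adds two flags that are assumptions on top of the answer: i for case-insensitive matching (as the question's (?i) does) and s so that . also matches line breaks, since text extracted from a PDF will usually span several lines. In .NET the corresponding options would be RegexOptions.IgnoreCase and RegexOptions.Singleline. The sample strings are made up.

```javascript
// Word-boundary version: both whole words must occur, in either order.
const strict = /\barma\b.*?\bcrapo\b|\bcrapo\b.*?\barma\b/is;

// Loose version: also accepts the keywords inside longer words.
const loose = /arma.*?crapo|crapo.*?arma/is;

console.log(strict.test("crapo and then arma"));      // true
console.log(strict.test("only arma here"));           // false
console.log(strict.test("karma and crapo"));          // false
console.log(loose.test("karma and crapo"));           // true
console.log(strict.test("Arma\nsecond line CRAPO"));  // true
```

Because a single match already proves that both words are present, it is enough to test once per document rather than adding the file for every match.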

RegEx for a price in £

i have: \£\d+\.\d\d
should find: £6.95 £16.95 etc
+ is one or more
\. is the dot
\d is for a digit
am i wrong? :(
JavaScript for Greasemonkey
// ==UserScript==
// @name CurConvertor
// @namespace CurConvertor
// @description noam smadja
// @include http://www.zavvi.com/*
// ==/UserScript==
textNodes = document.evaluate(
"//text()",
document,
null,
XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE,
null);
var searchRE = /\£[0-9]\+.[0-9][0-9];
var replace = 'pling';
for (var i=0;i<textNodes.snapshotLength;i++) {
var node = textNodes.snapshotItem(i);
node.data = node.data.replace(searchRE, replace);
}
when i change the regex to /Free for example it finds and changes. but i guess i am missing something!
Had this written up for your last question just before it was deleted.
Here are the problems you're having with your GM script.
You're checking absolutely every text node on the page for some reason. This isn't causing it to break but it's unnecessary and slow. It would be better to look for text nodes inside .price nodes and .rrp .strike nodes instead.
When creating new regexp objects in this way, backslashes must be escaped, ex:
var searchRE = new RegExp('\\d\\d','gi');
not
var searchRE = new RegExp('\d\d','gi');
So you can add the backslashes, or create your regex like this:
var searchRE = /\d\d/gi;
Your actual regular expression is only checking for numbers like ##ANYCHARACTER##, and will ignore £5.00 and £128.24
Your replacement needs to be either a string or a callback function, not a regular expression object.
Putting it all together
textNodes = document.evaluate(
"//p[contains(@class,'price')]/text() | //p[contains(@class,'rrp')]/span[contains(@class,'strike')]/text()",
document,
null,
XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE,
null);
var searchRE = /£(\d+\.\d\d)/gi;
var replace = function(str,p1){return "₪" + ( (p1*5.67).toFixed(2) );}
for (var i=0,l=textNodes.snapshotLength;i<l;i++) {
var node = textNodes.snapshotItem(i);
node.data = node.data.replace(searchRE, replace);
}
Changes:
XPath now includes only p.price and p.rrp span.strike nodes
Search regular expression created with /regex/ instead of new RegExp
Search variable now includes target currency symbol
Replace variable is now a function that replaces the currency symbol with a new symbol and multiplies the first captured substring (the amount) by 5.67
for loop sets a variable to the snapshot length at the beginning of the loop, instead of checking textNodes.snapshotLength at the beginning of every loop.
Hope that helps!
[edit]Some of these points don't apply, as the original question changed a few times, but the final script is relevant, and the points may still be of interest to you for why your script was failing originally.
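The replacement callback can be tried without a page or Greasemonkey. The sketch below applies the same regular expression and the answer's rate of 5.67 to a plain string; convertPrices and RATE are illustrative names, and the currency symbols are written as escapes so the file encoding cannot interfere.

```javascript
const RATE = 5.67; // conversion factor taken from the answer

// Replaces every "£<digits>.<two digits>" with the converted amount in ₪.
function convertPrices(text) {
  // \u00a3 is £, \u20aa is ₪
  return text.replace(/\u00a3(\d+\.\d\d)/g, (match, amount) => {
    return "\u20aa" + (Number(amount) * RATE).toFixed(2);
  });
}

console.log(convertPrices("Now \u00a310.00, was \u00a316.95"));
// Now ₪56.70, was ₪96.11
console.log(convertPrices("Free delivery")); // Free delivery
```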
You are not wrong, but there are a few things to watch out for:
The £ sign is not a standard ASCII character, so you may have an encoding issue, or you may need to enable a unicode option on your regular expression.
The use of \d is not supported in all regular expression engines. [0-9] or [[:digit:]] are other possibilities.
To get a better answer, say which language you are using, and preferably also post your source code.
£[0-9]+(,[0-9]{3})*\.[0-9]{2}$
this will match anything from £dd.dd to £d[dd]*,ddd.dd. So it can fetch millions and hundreds as well.
The above regexp is not strict about where the commas go. It also accepts, for example: £1123213123.23
Now, if you want a stricter regexp, and you're 100% sure that the prices will follow the comma and period conventions, then use
£[0-9]{1,3}(,[0-9]{3})*\.[0-9]{2}$
Try your regexps here to see what works for you and what not http://tools.netshiftmedia.com/regexlibrary/
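A minimal sketch of how the two patterns differ, written in JavaScript with £ as the escape \u00a3 to sidestep encoding problems. The variable names and sample prices are illustrative; note that neither pattern is anchored at the start, only at the end with $.

```javascript
// Loose: any number of digits, optionally followed by ,ddd groups.
const loose = /\u00a3[0-9]+(,[0-9]{3})*\.[0-9]{2}$/;

// Strict: at most three digits before the first comma group.
const strict = /\u00a3[0-9]{1,3}(,[0-9]{3})*\.[0-9]{2}$/;

const samples = ["\u00a36.95", "\u00a31,234,567.89", "\u00a31123213123.23"];
for (const price of samples) {
  console.log(price, loose.test(price), strict.test(price));
}
// £6.95 true true
// £1,234,567.89 true true
// £1123213123.23 true false
```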
It depends on what flavour of regex you are using - what is the programming language?
Some older versions of regex require the + to be escaped - sed and vi for example.
Also some older versions of regex do not recognise \d as matching a digit.
Most modern regex follow the perl syntax and £\d+\.\d\d should do the trick, but it does also depend on how the £ is encoded - if the string you are matching encodes it differently from the regex then it will not match.
Here is an example in Python 2 - the £ character is represented differently in a regular (byte) string and a unicode string (prefixed with a u):
>>> "£"
'\xc2\xa3'
>>> u"£"
u'\xa3'
>>> import re
>>> print re.match("£", u"£")
None
>>> print re.match(u"£", "£")
None
>>> print re.match(u"£", u"£")
<_sre.SRE_Match object at 0x7ef34de8>
>>> print re.match("£", "£")
<_sre.SRE_Match object at 0x7ef34e90>
>>>
£ isn't an ASCII character, so you need to work out encodings. Depending on the language, you will either need to escape the byte(s) of £ in the regex, or convert all the strings into Unicode before applying the regex.
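The same effect can be sketched in JavaScript, where strings are sequences of UTF-16 code units and TextEncoder (a global in browsers and recent Node.js) produces UTF-8 bytes. The "misread" string below simulates text whose UTF-8 bytes were each treated as one character, which is an assumed failure mode for illustration, not something taken from the question.

```javascript
const pound = "\u00a3"; // £ as a single character

// UTF-8 encodes £ as two bytes: 0xC2 0xA3.
const utf8Bytes = Array.from(new TextEncoder().encode(pound));
console.log(utf8Bytes); // [ 194, 163 ]

// Treating each byte as its own character yields two characters, "Â£".
const misread = String.fromCharCode(...utf8Bytes);
console.log(misread.length); // 2

// An anchored price pattern matches the correctly decoded text only.
const price = /^\u00a3\d+\.\d\d$/;
console.log(price.test(pound + "6.95"));   // true
console.log(price.test(misread + "6.95")); // false
```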
In Ruby you could just write the following
/£\d+\.\d{2}/
Using the braces to specify the number of digits after the point makes it slightly clearer. Note that the dot has to be escaped as \. or it will match any character.