Limit regex scope to match items only in a section of code

This is a general regex question.
Suppose I have the following code:
function (item1, item2, item3) {
var item4 = null
var item5 = null
}
I know that's javascript, but I don't want a javascript-specific answer: I'm curious about a pure regex answer.
Suppose I want to write a regex that matches any word that starts with "item", but I only want matches that are between parentheses.
So my question: is there a way to write a regex that matches everything that starts with "item" but is also between parens? In other words, a way to limit my regex scope to just the things within parens?
UPDATE: Just so people know, I am asking because I am working on language support for Atom (the text editor), which (to my knowledge) supports only pure regex to match patterns to add language styling. Because of this, I'm stuck with pure regex, even though I am parsing JS.

Pull out parenthesized things, then look inside them.
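var str = "function (item1, item2, item3) {\
var item4 = null\
var item5 = null\
}";
// concat.apply passes each inner array as a separate argument, flattening one level
var results = [].concat.apply([],
  str.match(/\(.*?\)/g)
    .map(function(submatch) {
      // item* words inside this parenthesized chunk, or [] if there are none
      return submatch.match(/\bitem\w*\b/g) || [];
    })
);
document.write(results);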
This code first calls String#match to retrieve all the parenthesized portions of the input, so in this case it returns ['(item1, item2, item3)'] (an array with one element). Then it calls Array#map on that array, transforming each element into a list of matches for item* (\b matches a word boundary, and \w matches letters, digits and underscores). Since the result is now a nested array ([['item1', 'item2', 'item3']]), we use Array#concat to flatten it out.

You could try the regex below:
(?:[^()]*(?=\()|[^()]*$)(*SKIP)(*F)|item\w+
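The first alternative matches a run of text outside the parentheses (non-paren characters followed by "(" or by the end of the input), and (*SKIP)(*F) then discards it and resumes after it, so item\w+ can only match inside the parentheses. (*SKIP) and (*F) are PCRE/Perl backtracking-control verbs, so this needs an engine that supports them.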
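If your engine lacks (*SKIP) and (*F) (JavaScript, for instance), a plain lookahead can do a similar job, assuming the parentheses are balanced and not nested: accept an item word only when the next parenthesis to its right is a closing one. This is a minimal sketch in JavaScript; findItemsInParens is a hypothetical helper name, and a word followed by a nested "(" (as in "(item8 (y))") would be missed.

```javascript
// Match item* words only when the next parenthesis after them is ")".
// Assumes parentheses are balanced and not nested.
const itemsInParens = /\bitem\w*(?=[^()]*\))/g;

function findItemsInParens(src) {
  // String#match with a /g regex returns every match, or null if none.
  return src.match(itemsInParens) || [];
}

const code = "function (item1, item2, item3) {\n" +
  "var item4 = null\n" +
  "var item5 = null\n" +
  "}";

console.log(findItemsInParens(code)); // [ 'item1', 'item2', 'item3' ]
```

item4 and item5 are rejected because no ")" follows them before the end of the input.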

Related

Regex to select text outside of underscores

I am looking for a regex to select the text which falls outside of underscore characters.
Sample text:
PartIWant_partINeedIgnored_morePartsINeedIgnored_PartIwant
Basically I need to be able to select the first keyword, which is always before the first underscore, and the last keyword, which is always after the last underscore. As an additional complexity, there can also be texts which have no underscore at all; these need to be selected completely as well.
The best I have come up with so far is this expression:
^((?! *\_[^)]*\_ *).)*
which only yields the first part, not the second, and doesn't handle the no-underscore case at all.
This regex is used in a tool which monitors our http traffic, which means I can only 'select' the part I need but can't invoke functions or replace logic.
Thanks!
Use the JavaScript string method split(). See the example below.
var t = "PartIWant_partINeedIgnored_morePartsINeedIgnored_PartIwant";
var arr = t.split('_');
console.log(arr);
//Access the required parts like this
console.log(arr[0] + ' ' + arr[arr.length - 1]);
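One caveat with split: when the text has no underscore, arr[0] and arr[arr.length - 1] are the same element, so the line above prints it twice. A small sketch that returns it only once in that case (outerParts is a hypothetical name):

```javascript
// Text before the first "_" and after the last "_";
// with no underscore at all, return the whole string once.
function outerParts(t) {
  const arr = t.split("_");
  return arr.length === 1 ? [arr[0]] : [arr[0], arr[arr.length - 1]];
}

console.log(outerParts("PartIWant_partINeedIgnored_morePartsINeedIgnored_PartIwant"));
// [ 'PartIWant', 'PartIwant' ]
console.log(outerParts("NoUnderscoreHere")); // [ 'NoUnderscoreHere' ]
```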
Perhaps something like this:
/(^[^_]+)|([^_]+$)/g
That is, match either:
^[^_]+ the beginning of the string followed by non-underscores, or
[^_]+$ non-underscores followed by the end of the string.
var regex = /(^[^_]+)|([^_]+$)/g
console.log("A_b_c_D".match(regex)) // ["A", "D"]
console.log("A_b_D".match(regex)) // ["A", "D"]
console.log("A_D".match(regex)) // ["A", "D"]
console.log("AD".match(regex)) // ["AD"]
I'm not sure you should use a regex here. I think splitting the string at the underscores and using the first and last elements of the resulting array might be faster and less complicated.
Trivial with .replace:
str.replace(/_.*_/, '')
// "PartIWantPartIwant"
With matching, you'd need to be selecting and concatenating groups:
parts = str.match(/^([^_]*).*?([^_]*)$/)
parts[1] + parts[2]
// "PartIWantPartIwant"
EDIT
Regarding "This regex is used in a tool which monitors our http traffic, which means I can only 'select' the part I need but can't invoke functions or replace logic":
This is not possible: a regular expression cannot match a discontinuous span.

Regex without brackets

I have the following tag from an XML file:
<msg><![CDATA[Method=GET URL=http://test.de:80/cn?OP=gtm&Reset=1(Clat=[400441379], Clon=[-1335259914], Decoding_Feat=[], Dlat=[0], Dlon=[0], Accept-Encoding=gzip, Accept=*/*) Result(Content-Encoding=[gzip], Content-Length=[7363], ntCoent-Length=[15783], Content-Type=[text/xml; charset=utf-8]) Status=200 Times=TISP:270/CSI:-/Me:1/Total:271]]>
Now I am trying to extract Clon, Dlat, Dlon and Clat from this message.
I have already created the following regex:
(?<=Clat=)[\[\(\d+\)\n\n][^)n]+]
But the problem is that I would like to get only the numbers, without the brackets. I have tried some other expressions as well.
Do you know how I can extend this expression so that it returns only the values, without the brackets?
Thank you very much in advance.
Best regards
The regex
(clon|dlat|dlon|clat)=\[(-?\d+)\]
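captures the field name in group 1 and the number, without the brackets, in group 2 (match it case-insensitively, since the input uses "Clat", "Clon" and so on).
Using this regex to extract the information from the CDATA text is fine, but you should get at the contents of that element itself with an XML parser.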
Example usage
using System.Text.RegularExpressions;

Regex r = new Regex(@"(clon|dlat|dlon|clat)=\[(-?\d+)\]", RegexOptions.IgnoreCase);
string s = ".. here's your cdata content .. ";
foreach (Match match in r.Matches(s))
{
    var name = match.Groups[1].Value; // the field name as written in the input, e.g. "Clat"
    var innerValue = match.Groups[2].Value; // the value inside the square brackets, e.g. "400441379"
    // Do something with the matches
}
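For comparison, here is a sketch of the same extraction in JavaScript, using a shortened copy of the message from the question. extractCoords keys each value by its field name; the second pattern, bareNumber, assumes an engine with lookbehind support (ES2018 or later) and wraps the number in a lookbehind/lookahead pair so the whole match is just the number, without the brackets. All names here are illustrative.

```javascript
// Shortened sample CDATA text from the question.
const cdata =
  "Method=GET URL=http://test.de:80/cn?OP=gtm&Reset=1(Clat=[400441379], " +
  "Clon=[-1335259914], Decoding_Feat=[], Dlat=[0], Dlon=[0], " +
  "Accept-Encoding=gzip, Accept=*/*)";

// Group 1: field name, group 2: signed integer inside the brackets.
const fieldPattern = /(Clat|Clon|Dlat|Dlon)=\[(-?\d+)\]/gi;

function extractCoords(text) {
  const out = {};
  for (const m of text.matchAll(fieldPattern)) {
    out[m[1]] = Number(m[2]);
  }
  return out;
}

// Lookaround variant: the brackets are checked but not consumed.
const bareNumber = /(?<=(?:Clat|Clon|Dlat|Dlon)=\[)-?\d+(?=\])/g;

console.log(extractCoords(cdata));
console.log(cdata.match(bareNumber)); // [ '400441379', '-1335259914', '0', '0' ]
```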

How to replace all found occurrences in a Google Doc with a hyperlink

We are wondering how you can, for example, find Bible verses in the document text and replace them with a link to the verse on the web.
For example, if you have the text "Jn 3.1", it should be replaced with a hyperlink like this:
Text= Jn 3.1
Link= https://www.bible.com/1/jn.3.1
We thought of using Body.replaceText(searchPattern, replacement), but you can't use that to insert a hyperlink.
We also have to consider that the length of the verse reference can vary; for example, it can be:
Jn 1.3
which is 6 characters, or it can be
John 10.10
which is 10 characters. I think this can be covered with a regex (if we are able to use one in the solution), so it doesn't matter whether the solution itself covers it.
For this kind of modification you will have to use the Apps Script functions. They work the same way as normal JavaScript functions, but here you are able to work directly with the document text.
For this case the replace function is replaceText(searchPattern, replacement),
and this is how you can search for the word in your document and then replace the text:
function myFunction() {
  var doc = DocumentApp.getActiveDocument();
  var word = 'example';
  var rep = 'replacement';
  // findText returns a RangeElement, or null if the word isn't found
  var found = doc.getBody().editAsText().findText(word);
  if (!found) return;
  var elem = found.getElement().asText();
  // offset of the word inside that element's text
  var idx = elem.getText().indexOf(word);
  elem.replaceText(word, rep);
}
So you find the element that contains the desired word, get that element as text, and then edit the text it contains.
I personally don't like putting complete URLs in the text; I would rather use an inline link, so in this case "Jn 1.3" would be the text of the hyperlink.
For that, instead of the replaceText line, you can use:
var result = elem.setLinkUrl(idx, idx + word.length - 1, 'https://www.google.com');
This keeps the document easier to read. I hope it helps.
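To handle the variable-length references from the question, the regex part can be kept separate from the Docs API: scan the text for every "Jn 3.1" / "John 10.10" style reference and compute the inclusive offsets that setLinkUrl(startOffset, endOffsetInclusive, url) expects. This is a plain JavaScript sketch, not Apps Script code: BOOK_CODES and findVerseLinks are hypothetical, and the URL layout is generalized from the single example URL in the question. In Apps Script you would, assuming the offsets are computed on the same Text object (for example doc.getBody().editAsText()), pass each entry to setLinkUrl on it; since setLinkUrl does not change the characters, the offsets stay valid while you apply them.

```javascript
// Hypothetical book-name -> URL-code table; extend as needed.
const BOOK_CODES = { Jn: "jn", John: "jn" };

// Book, chapter and verse, e.g. "Jn 3.1" or "John 10.10".
const VERSE_REF = /\b(Jn|John) (\d+)\.(\d+)\b/g;

function findVerseLinks(text) {
  const links = [];
  let m;
  VERSE_REF.lastIndex = 0; // reset the shared /g state before scanning
  while ((m = VERSE_REF.exec(text)) !== null) {
    links.push({
      start: m.index,
      endInclusive: m.index + m[0].length - 1, // setLinkUrl uses an inclusive end
      url: "https://www.bible.com/1/" + BOOK_CODES[m[1]] + "." + m[2] + "." + m[3],
    });
  }
  return links;
}

console.log(findVerseLinks("Read Jn 3.1 and John 10.10 today."));
```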

Using a Variable in an AS3 RegExp

Using Actionscript 3.0 (Within Flash CS5)
A standard regex to match any digit is:
var myRegexPattern:RegExp = /\d/g;
What would the regex look like to incorporate a string variable to match?
(this example is an 'IDEAL', not a 'WORKING', snippet), i.e.:
var myString:String = "MatchThisText";
var myRegexPattern_WithString:RegExp = /\d[myString]/g;
I've seen some workarounds that involve creating multiple regex instances and then combining them via their source with the variable in question, which seems wrong; or using Flash's string-to-RegExp constructor, which is just plain sloppy with all the double and triple escape sequences it requires.
There must be some pain-free way that I can't find in the live docs or on Google. Does AS3 even have this functionality? If not, it really should.
Or am I missing a much easier way of avoiding this task altogether, one I'm naive to because I'm new to regex?
I've actually blogged about this, so I'll just point you there: http://tyleregeto.com/using-vars-in-regular-expressions-as3 It talks about the possible solutions, but there is no ideal one like you mention.
EDIT
Here is a copy of the important parts of that blog entry:
Here is a regex to strip the tags from a block of text.
/<("[^"]*"|'[^']*'|[^'">])*>/ig
This nifty expression works like a charm. But I wanted to update it so the developer could limit which tags it stripped to those specified in an array. Pretty straightforward stuff: to use a variable value in a regex, you first need to build it as a string and then convert it. Something like the following:
var exp:String = 'start-exp' + someVar + 'more-exp';
var regex:RegExp = new RegExp(exp);
Pretty straightforward. So when approaching this small upgrade, that's what I did. Of course, one big problem was pretty clear.
var exp:String = '/<' + tag + '("[^"]*"|'[^']*'|[^'">])*>/';
Guess what, invalid string! Better escape those quotes in the string. Whoops, that will break the regex! I was stumped. So I opened up the language reference to see what I could find. The "source" property (which I'd never used before) caught my eye. It returns a String described as "the pattern portion of the regular expression." It did the trick perfectly. Here is the solution:
var start:RegExp = /</;
var end:RegExp = /("[^"]*"|'[^']*'|[^'">])*>/ig;
var complete:RegExp = new RegExp(start.source + tag + end.source, 'ig');
You can reduce it down to this for convenience:
var complete:RegExp = new RegExp(/</.source + tag + /("[^"]*"|'[^']*'|[^'">])*>/.source, 'ig');
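The same .source trick carries over to JavaScript, whose RegExp has the same source property. A minimal sketch, assuming the tag names are plain words with no regex metacharacters; stripTags is a hypothetical helper, and the optional "/" (so closing tags are stripped too), the \b after the name (so "b" does not also hit "br"), and the non-capturing group are choices made for this example rather than taken from the blog post.

```javascript
// Rest-of-tag part of the blog's pattern, kept as a literal so no quotes need escaping.
const restOfTag = /(?:"[^"]*"|'[^']*'|[^'">])*>/;

// Strip only the named tags, opening or closing. .source drops the flags,
// so "gi" is passed to the constructor again.
function stripTags(html, tags) {
  const pattern = "</?(?:" + tags.join("|") + ")\\b" + restOfTag.source;
  return html.replace(new RegExp(pattern, "gi"), "");
}

console.log(stripTags('<b class="x">bold</b> <br/> <i>it</i>', ["b", "i"]));
// bold <br/> it
```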
As Tyler correctly points out (and his answer works just fine), you can assemble your regex as a string and then pass this string to the RegExp constructor with the new RegExp("pattern", "flags") syntax.
function assembleRegex(myString) {
var re = new RegExp('\\d' + myString, "i");
return re;
}
Note that when using a string to store a regex pattern, you do need to add some extra backslashes to get it to work right (e.g. to get a \d in the regex, you need to specify \\d in the string). Note also that the string pattern does not use the forward slash delimiters. In other words, the following two statements are equivalent:
var re1 = /\d/ig;
var re2 = new RegExp("\\d", "ig");
Additional note: You may need to process the myString variable to escape any backslashes it might contain (if they are to be interpreted as literal). If this is the case the function becomes:
function assembleRegex(myString) {
myString = myString.replace(/\\/g, '\\\\'); // g: double every backslash, not just the first
var re = new RegExp('\\d' + myString);
return re;
}
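Backslashes are not the only characters that change meaning once spliced into a pattern: ".", "*", "(", "[" and the rest do too. A sketch in JavaScript (not AS3) that escapes all of them before building the same \d + myString regex; escapeRegExp is a hypothetical helper, and "$&" in the replacement string inserts the matched character.

```javascript
// Backslash-escape every character that is special in a RegExp pattern.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function assembleRegex(myString) {
  return new RegExp("\\d" + escapeRegExp(myString), "i");
}

const re = assembleRegex("a.b");
console.log(re.source);       // \da\.b
console.log(re.test("7a.b")); // true
console.log(re.test("7axb")); // false: "." is now literal
```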

Problem with Actionscript Regular Expressions

I have to parse out color information from HTML data. The colors can either be RGB colors or file names to a swatch image.
I used http://www.gskinner.com/RegExr/ to develop and test the patterns. I copied the AS regular expression code verbatim from the tool into Flex Builder, but when I exec the pattern against the string I get null.
Here are the patterns and an example of the string (I took the correct HTML tags out so the strings would show correctly):
DIV data:
<div style="background-color:rgb(2,2,2);width:10px;height:10px;">
DIV pattern:
/([0-9]{1,3},[0-9]{1,3},[0-9]{1,3})/
IMG data:
<img src="/media/swatches/jerzeesbirch.gif" width="10" height="10" alt="Birch">
IMG pattern:
/[a-z0-9_-]+/[a-z0-9_-]+/[a-z0-9_-]+\.[a-z0-9_-]+/
Here's my Actionscript code:
var divPattern : RegExp = new RegExp("/([0-9]{1,3},[0-9]{1,3},[0-9]{1,3})/");
var imgPattern : RegExp = new RegExp("/[a-z0-9_-]+/[a-z0-9_-]+/[a-z0-9_-]+\.[a-z0-9_-]+/");
var divResult : Array = divPattern.exec(object.swatch);
var imgResult : Array = imgPattern.exec(object.swatch);
Both of the arrays are null.
This is my first foray into AS coding, so I think I'm declaring something wrong.
Steve
(I don't know ActionScript but I know Javascript and they should be close enough to solve your problem.)
To construct a RegExp object for e.g. the pattern ^[a-z]+$, you either use
var pattern : RegExp = new RegExp("^[a-z]+$");
or, better,
var pattern : RegExp = /^[a-z]+$/
The code new RegExp("/^[a-z]+$/") is wrong because the slashes become part of the pattern itself, so it expects a literal slash before the ^ and after the $.
Therefore, your DIV pattern should be written as
var divPattern : RegExp = /([0-9]{1,3},[0-9]{1,3},[0-9]{1,3})/;
but note that ( and ) are special characters (they create a capturing group), so to match the literal parentheses in rgb(2,2,2) you need to escape them:
var divPattern : RegExp = /\([0-9]{1,3},[0-9]{1,3},[0-9]{1,3}\)/;
For the IMG pattern, since / delimits a regex literal, you need to escape it as well:
var imgPattern : RegExp = /[a-z0-9_-]+\/[a-z0-9_-]+\/[a-z0-9_-]+\.[a-z0-9_-]+/
Finally, you could use \d in place of [0-9] and \w in place of [a-zA-Z0-9_] (note that \w does not include -, so [a-z0-9_-] would become [\w-]).
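Since the answer reasons from JavaScript, here is a JavaScript sketch of the corrected patterns run against the sample data from the question. It adds a capturing group around the three numbers (an assumption about what you want back) and shows the literal form next to the equivalent string form, where the delimiters go away but every backslash is doubled.

```javascript
// Sample data from the question.
const divData = '<div style="background-color:rgb(2,2,2);width:10px;height:10px;">';
const imgData = '<img src="/media/swatches/jerzeesbirch.gif" width="10" height="10" alt="Birch">';

// Literal form: slashes delimit the pattern; \( \) and \/ are escaped.
const divPattern = /\((\d{1,3},\d{1,3},\d{1,3})\)/;
const imgPattern = /[\w-]+\/[\w-]+\/[\w-]+\.[\w-]+/;

// String form: no delimiters, and each regex backslash is written as "\\".
const divPatternFromString = new RegExp("\\((\\d{1,3},\\d{1,3},\\d{1,3})\\)");

console.log(divPattern.exec(divData)[1]);           // 2,2,2
console.log(divPatternFromString.exec(divData)[1]); // 2,2,2
console.log(imgPattern.exec(imgData)[0]);           // media/swatches/jerzeesbirch.gif
```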
I don't know enough to tell if your regex patterns are correct, but from the docs on the AS3 RegExp class, it looks like your new RegExp() call needs a second argument to declare flags for case sensitivity etc.
EDIT: Also, as Bart K has pointed out, you don't need the / delimiters when using the new method.
So you can use either:
var divPattern:RegExp = new RegExp("([0-9]{1,3},[0-9]{1,3},[0-9]{1,3})", "");
OR you can also use the alternate syntax with /:
var divPattern:RegExp = /([0-9]{1,3},[0-9]{1,3},[0-9]{1,3})/;
... in which case the flag string (if any) is included after the final /