Using regular expression extractor to extract a value? - regex

I am trying to extract the value from the following markup. My regular expression looks correct, but the extractor still does not return the value.
token" value="(.+?)"
The pattern gives me the exact match when I check it on regex101.com against this input:
<input type="hidden" name="token" value="GSYGEP2UUWOTMZ2SFV1G5D2M8L247KIG">
What should the regular expression be?

Your original regular expression is just fine:
value="(.+?)"
The problem might be extra whitespace in the response, or the way the expression is embedded in your code. Try removing the leading token" part, or escaping the " characters if your tool requires it.
DEMO 1
DEMO 2
Reference:
Regular Expressions
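A minimal sketch in Python (an assumption, since the question does not say which engine the extractor runs on) that checks the pattern from the question against the sample markup. The value ends up in capture group 1:

```python
import re

html = '<input type="hidden" name="token" value="GSYGEP2UUWOTMZ2SFV1G5D2M8L247KIG">'

# Same pattern as in the question; group 1 holds the value.
match = re.search(r'token" value="(.+?)"', html)
token = match.group(1) if match else None
print(token)
```

If the pattern matches here but not in your extractor, the difference is more likely in the text being searched (attribute order, whitespace, quote style) or in which group the tool is told to return than in the pattern itself.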

Try this
<input(?=(?:[^>"']|"[^"]*"|'[^']*')*?\sname\s*=\s*(['"])\s*token\s*\1)(?=(?:[^>"']|"[^"]*"|'[^']*')*?\svalue\s*=\s*(['"])((?:(?!\2)[\S\s])*)\2)\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>
The Value content you're after is in Capture Group 3
https://regex101.com/r/HJhStT/1
https://regex101.com/r/8BWONb/1
Explained
< input # Input tag
(?= # Name attribute: Assert (a pseudo atomic group)
(?: [^>"'] | " [^"]* " | ' [^']* ' )*?
\s name \s* = \s* # name =
( ['"] ) # (1), Quote
\s* token \s* # token
\1
)
(?= # Value attribute
(?: [^>"'] | " [^"]* " | ' [^']* ' )*?
\s value \s* = \s* # value =
( ['"] ) # (2), Quote
( # (3 start), value content
(?:
(?! \2 )
[\S\s]
)*
) # (3 end)
\2
)
# Just get rest of tag
\s+
(?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
>
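As a rough check, the expression above can be tried in Python, whose re module supports the lookaheads and backreferences it uses. The helper name extract_token and the two extra test inputs are made up for illustration:

```python
import re

# Skips attribute text lazily: any character that is not > or a quote,
# or a complete quoted string.
SKIP = r'''(?:[^>"']|"[^"]*"|'[^']*')*?'''

TOKEN_INPUT = re.compile("".join([
    r'<input',
    # Assert: name = "token" (group 1 is the quote)
    r'(?=' + SKIP + r'''\sname\s*=\s*(['"])\s*token\s*\1)''',
    # Assert: value = "..." (group 2 is the quote, group 3 the content)
    r'(?=' + SKIP + r'''\svalue\s*=\s*(['"])((?:(?!\2)[\S\s])*)\2)''',
    # Consume the rest of the tag
    r'''\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>''',
]))

def extract_token(html):
    """Return the value of the token input, or None (hypothetical helper)."""
    match = TOKEN_INPUT.search(html)
    return match.group(3) if match else None

print(extract_token('<input type="hidden" name="token" value="GSYGEP2UUWOTMZ2SFV1G5D2M8L247KIG">'))
```

Because both attributes are checked in lookaheads, the attribute order and the quote style should not matter.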

Related

Regex to select newlines in between quotation marks

I have a string in Ruby that resembles the following:
{
"a boolean": true,
"multiline": "
my
multiline
value
",
"a normal key": "a normal value"
}
I would like to match only the newline characters in the substring:
"
my
multiline
value
",
This is so that I can replace them with escaped newline characters. The aim here is to make the JSON easier to work with in the long run.
Update - These regexes work as expected.
From @faissaloo: "it seemed to fail however on my large JSON."
I ran this large string through both regexes:
PCRE https://regex101.com/r/3jtqea/1
Ruby https://regex101.com/r/1HVCCC/1
They both work identically, and without flaw.
If you have any other concerns, please let me know.
I think Ruby supports the Perl like constructs.
If so, it can be done in a single global find and replace.
Like this:
Edit - Ruby doesn't do Backtracking Control Verbs (*SKIP)(*FAIL)
so, to do this in Ruby code, requires the regex to be more explicit.
So, with a slight modification of the pcre/perl regex, the Ruby equivalent is:
Ruby
Find
(?-m)((?!\A)\G|(?:(?>[^"]*"[^"\r\n]*"[^"]*))*")([^"\r\n]*)\K\r?\n(?=[^"]*")((?:[^"\r\n]*"(?:(?>[^"]*"[^"\r\n]*"))*[^"]*)?)
Replace
\\n\3
https://regex101.com/r/BaqjEE/1
https://rextester.com/NVFD38349
Explained ( but it's complex )
(?-m) # Non-multiline mode safety check
( # (1 start), Prefix. Capture for debug
(?! \A ) # Not BOS
\G # Test where last match left off
| # or,
(?: # Optionally align to next " ( only used once )
(?> [^"]* " [^"\r\n]* " [^"]* )
)*
" # A new quote to test
) # (1 end)
( [^"\r\n]* ) # (2), Line break Preamble. Capture for debug
\K # Exclude from the match (group 0) up to this point
\r? \n # Line break to escape
(?= [^"]* " ) # Validate we have " closure
( # (3 start), Optional end quote and alignment.
# To be written back.
(?:
[^"\r\n]* "
(?: # Optionally align to next "
(?> [^"]* " [^"\r\n]* " )
)*
[^"]*
)?
) # (3 end)
# Ruby Code:
#----------------------
# #ruby 2.3.1
#
# re = /(?-m)((?!\A)\G|(?:(?>[^"]*"[^"\r\n]*"[^"]*))*")([^"\r\n]*)\K\r?\n(?=[^"]*")((?:[^"\r\n]*"(?:(?>[^"]*"[^"\r\n]*"))*[^"]*)?)/
# str = '{
# "a boolean": true,
# "a boolean": true,
# "a boolean": true,
# "a boolean": true,
# "multiline": "
# my
# multiline
# value
# asdf"
# ,
#
# "a multiline boo
# lean": true,
# "a normal key": "a multiline
#
# value"
# }'
# subst = '\\n\3'
#
# result = str.gsub(re, subst)
#
# # Print the result of the substitution
# puts result
For Pcre/Perl
Find
(?:((?:(?>[^"]*"[^"\n]*"[^"]*))+(*SKIP)(*FAIL)|"|(?!^)\G)([^"\n]*)\K\n(?=[^"]*")((?:[^"\n]*")?))
Replace
\\n$3
https://regex101.com/r/06naae/1
Explained ( but it's complex )
Note if you're on a windows box where editors need CRLF breaks,
add a \r in front of the LF, like this \r\n.
(?:
( # (1 start), Prefix capture, for debug
(?:
(?> [^"]* " [^"\n]* " [^"]* )
)+
(*SKIP) (*FAIL) # Consume false positives, but ignore them
# (need this to align next ")
| # or,
" # A new quote to test
| # or,
(?! ^ ) # Not BOS
\G # Test where last match left off
) # (1 end)
( [^"\n]* ) # (2), Preamble capture, for debug
\K # Exclude from the match (group 0) up to this point
\n # Line break to escape
(?= [^"]* " ) # Validate we have " closure
( # (3 start), End quote, to be written back
(?: [^"\n]* " )?
) # (3 end)
)
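For comparison, here is a sketch of the same job without a regex, written in Python rather than Ruby. It is a different technique from the patterns above: a single pass that tracks whether the current position is inside a double-quoted string and escapes line breaks only there. The function name is made up, and the sample is the JSON-like text from the question:

```python
import json

def escape_newlines_in_strings(text):
    """Replace raw line breaks inside double-quoted strings with \\n."""
    out = []
    in_string = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string and ch == "\\" and i + 1 < len(text):
            # Copy an existing escape sequence (such as \") unchanged.
            out.append(text[i:i + 2])
            i += 2
            continue
        if in_string and text.startswith("\r\n", i):
            # Treat a CRLF pair as one line break.
            out.append("\\n")
            i += 2
            continue
        if ch == '"':
            in_string = not in_string
        if in_string and ch == "\n":
            out.append("\\n")
        else:
            out.append(ch)
        i += 1
    return "".join(out)

raw = '''{
"a boolean": true,
"multiline": "
my
multiline
value
",
"a normal key": "a normal value"
}'''

fixed = escape_newlines_in_strings(raw)
print(json.loads(fixed)["multiline"].splitlines())
```

The trade-off is the usual one: more lines than a regex, but the handling of escapes and quote state is explicit and easy to adjust.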
I think this could help you. You capture the \n inside the string and then can replace it:
"[^"]*(\n)*",
Test it
Another option would be something like this:
string = '{
"a boolean": true,
"multiline": "my
multiline
value",
"a normal value"
}'
puts string.match(/"(\w+)(\n+\w*)+"/).to_s.gsub("\n", '\n')
This matches the regex in your string and then substitutes the newlines with escaped newlines.
Late answer, but you can use a regex like:
'"(?=\n).*?"'
Matches:
"
my
multiline
value
",
Demo:
Regex Demo & Explanation
If your multiline strings never have a comma right before a line break, you can use the fact that in this JSON every line has to end with , or { or [, or the next line has to start with } or ]:
json_string.gsub(/(?<!,|\{|\[)\n(?!\s*[}\]])/, '\n')
If you have commas in your strings (or curly and square brackets) you can improve this approach by adding more details to the list of valid line endings:
valid_line_ends = %w(true, false, ", }, ], { [)
line_end_matcher = valid_line_ends.map(&Regexp.method(:escape)).join('|')
json_string.gsub(/(?<!#{line_end_matcher})\n(?!\s*[}\]])/, '\n')
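The first, simpler expression translates almost directly to Python. This is only a sketch under the same assumption as above (no string line ends in a comma, brace, or bracket); the three single-character alternatives are written as one character class so the lookbehind stays fixed-width:

```python
import json
import re

raw = '''{
"a boolean": true,
"multiline": "
my
multiline
value
",
"a normal key": "a normal value"
}'''

# A newline is treated as structural when the line ends with , { or [
# or when the next line starts with } or ]; every other newline is escaped.
fixed = re.sub(r'(?<![,{\[])\n(?!\s*[}\]])', r'\\n', raw)
data = json.loads(fixed)
print(data["multiline"].splitlines())
```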

Match a pattern not preceded by a quotation mark

I have this pattern (?<!')(\w*)\((\d+|\w+|.*,*)\) that is meant to match strings like:
c(4)
hello(54, 41)
Following some answers on SO, I added a negative lookbehind so that if the input string is preceded by a ', the string shouldn't match at all. However, it still partially matches.
For example:
'c(4) returns (4) even though it shouldn't match anything because of the negative lookbehind.
How do I make it so if a string is preceded by ' NOTHING matches?
Since nobody came along, I'll throw this out to get you started.
This regex will match things like
aa(a , sd,,,f,)
aa( as , " ()asdf)) " ,, df, , )
asdf()
but not
'ab(s)
This will fix the basic problem (?<!['\w])\w*
Where (?<!['\w]) will not let the engine skip over a word char just
to satisfy the not quote.
Then the optional words \w* to grab all the words.
And if a 'aaa( quote is before it, then it won't match.
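The effect of the stricter lookbehind can be seen in a small Python sketch. The function body is simplified here to [^()]* (an assumption made only to keep the example short), so just the lookbehind differs between the two patterns:

```python
import re

# Original idea: only the character right before the match must not be a quote.
naive = re.compile(r"(?<!')(\w*)\(([^()]*)\)")

# Fix: the preceding character must be neither a quote nor a word character,
# so the engine cannot start the match in the middle of the name.
strict = re.compile(r"(?<!['\w])(\w*)\(([^()]*)\)")

quoted = "'c(4)"
naive_hit = naive.search(quoted)
strict_hit = strict.search(quoted)

print(naive_hit.group(0) if naive_hit else None)
print(strict_hit.group(0) if strict_hit else None)
print(strict.search("hello(54, 41)").groups())
```

The naive pattern still finds (4) because the engine is free to begin at the opening parenthesis, where the preceding character is c and not a quote.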
This regex here embellishes what I think you are trying to accomplish
in the function body part of your regex.
It might be a little overwhelming to understand at first.
(?s)(?<!['\w])(\w*)\(((?:,*(?&variable)(?:,+(?&variable))*[,\s]*)?)\)(?(DEFINE)(?<variable>(?:\s*(?:"[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*')\s*|[^()"',]+)))
Readable version (via: http://www.regexformat.com)
(?s) # Dot-all modifier
(?<! ['\w] ) # Not a quote, nor word behind
# <- This will force matching a complete function name
# if it exists, thereby blocking a preceding quote '
( \w* ) # (1), Function name (optional)
\(
( # (2 start), Function body
(?: # Parameters (optional)
,* # Comma (optional)
(?&variable) # Function call, get first variable (required)
(?: # More variables (optional)
,+ # Comma (required)
(?&variable) # Variable (required)
)*
[,\s]* # Whitespace or comma (optional)
)? # End parameters (optional)
) # (2 end)
\)
# Function definitions
(?(DEFINE)
(?<variable> # (3 start), Function for a single Variable
(?:
\s*
(?: # Double or single quoted string
"
[^"\\]*
(?: \\ . [^"\\]* )*
"
|
'
[^'\\]*
(?: \\ . [^'\\]* )*
'
)
\s*
| # or,
[^()"',]+ # Not quote, paren, comma (can be whitespace)
)
) # (3 end)
)

How to match fuzzy empty div with a regular expression?

I have the following HTML code:
<div id="page126-div" style="position:relative;width:918px;height:1188px;">
</div>
<div id="page127-div" style="position:relative;width:918px;height:1188px;">
sometext for example
</div>
<div id="page128-div" style="position:relative;width:918px;height:1188px;">
</div>
My task is to match empty divs. Empty means in this context that they have no content at all (no characters between the opening > and the closing <), or contain just a newline, just a space or newline, or fewer than 5 characters. So emptiness is pretty fuzzy.
If I wanted to match all divs, not only the empty ones, I would use the following regex:
\<div id="page.*?"\>.*?\<\/div\>
Naturally I should use it with dotall modifier.
But when I try to match only empty divs I try to use this expression:
\<div id="page.*?"\>.{0,5}?\<\/div\>
I expect to get first and last(third) divs, because they contain: opening div tag with attributes, then div content that can be from 0 to 5 characters and closing div tag.
First match is right, but second match is second and third divs stacked together instead of third div only.
I do not understand why.
This regex is pretty straight-forward:
<div id=\"[^"]+?\" style=[^>]+?>(\s|\n|[^\n]{0,5})<\/div>
Note that it doesn't require the exact same id and style values.
You can give this a try.
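As for why the original attempt stacks the second and third divs: with dotall on, the lazy .*? after id="page is allowed to grow past the end of the second opening tag when the rest of the pattern fails there, and it keeps growing until it reaches the "> of the third div. Restricting each part to characters that cannot leave the tag avoids this. A minimal Python sketch, using the HTML from the question and a simplified pattern that is an assumption, not one of the answers:

```python
import re

html = '''<div id="page126-div" style="position:relative;width:918px;height:1188px;">
</div>
<div id="page127-div" style="position:relative;width:918px;height:1188px;">
sometext for example
</div>
<div id="page128-div" style="position:relative;width:918px;height:1188px;">
</div>'''

# [^"]* cannot leave the id value, [^>]* cannot leave the opening tag,
# and [^<]{0,5} allows at most five characters of content.
EMPTY_DIV = re.compile(r'<div id="(page[^"]*)"[^>]*>[^<]{0,5}</div>')

empty_ids = EMPTY_DIV.findall(html)
print(empty_ids)
```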
Scraper Series
/(?><div(?=(?:[^>"']|"[^"]*"|'[^']*')*?\sid\s*=\s*(?:(['"])\s*page(?:(?!\1)[\S\s])*\1))\s+(?:"[\S\s]*?"|'[\S\s]*?'|(?:(?!\/>)[^>])?)+>)\s*[\S\s]{0,5}\s*<\/div\s*>/
https://regex101.com/r/x8jf8D/1
Formatted
(?>
< div # div tag
(?= # Assertion (a pseudo atomic group)
(?: [^>"'] | " [^"]* " | ' [^']* ' )*?
\s id \s* = \s*
(?:
( ['"] ) # (1), Quote
\s* page # With 'id = "page XXX"
(?:
(?! \1 )
[\S\s]
)*
\1
)
)
\s+
(?:
" [\S\s]*? "
| ' [\S\s]*? '
| (?:
(?! /> )
[^>]
)?
)+
>
)
\s* # Optional whitespaces (remove if necessary)
[\S\s]{0,5} # Optional 0-5 anything (including wsp)
\s* # Optional whitespaces (remove if necessary)
</div \s* >

how to match the iframe text, then skip and match another string in wordpress

I have this iframe code. I want to match both the text right at the beginning of the string and, further along in the code, the "soundcloud" text:
<iframe src="https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/297769462&color=%23ff5500&auto_play=false&hide_related=false&show_comments=true&show_user=true&show_reposts=false&show_teaser=true" width="100%" height="166" frameborder="no" scrolling="no"></iframe>
My regex is (<iframe.*?><\/iframe>), which matches the iframe and anything in between.
What I want is to match <iframe, then skip everything in between until it finds soundcloud. If both conditions are fulfilled, then it's a match.
Any help would be great, thank you.
Try this
(?i)<iframe(?=((?:[^>"']|"[^"]*"|'[^']*')*?\s(src\s*=\s*(['"])(?:(?!\3)[\S\s])*?soundcloud(?:(?!\3)[\S\s])*\3)(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>))\1\s*</iframe\s*>
https://regex101.com/r/KkJH6x/1
Formatted
(?i) # Case insensitive modifier
< iframe # The iframe tag
(?= # Assertion (a pseudo atomic group)
( # (1 start)
(?: [^>"'] | " [^"]* " | ' [^']* ' )*?
\s
( # (2 start), src attribute with 'soundcloud' in value
src \s* = \s*
( ['"] ) # (3), Quote
(?:
(?! \3 )
[\S\s]
)*?
soundcloud # 'Soundcloud'
(?:
(?! \3 )
[\S\s]
)*
\3 # Close quote
) # (2 end)
# The remainder of the tag parts
(?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
>
) # (1 end)
)
\1
\s*
</iframe \s* >
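A simplified Python sketch of the same idea: require an iframe whose src value contains soundcloud, then the closing tag. Unlike the expression above, it uses [^>]* for the other attributes, so it assumes no > appears inside a quoted attribute value. The query string of the sample is shortened, and the second iframe is a made-up counterexample:

```python
import re

SOUNDCLOUD_IFRAME = re.compile(
    r'''<iframe\b[^>]*?\ssrc\s*=\s*(["'])'''      # opening tag up to the src quote
    r'''(?:(?!\1).)*?soundcloud(?:(?!\1).)*\1'''  # src value containing "soundcloud"
    r'''[^>]*>\s*</iframe\s*>''',                 # rest of the tag and the closing tag
    re.IGNORECASE | re.DOTALL,
)

soundcloud = (
    '<iframe src="https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/297769462'
    '&color=%23ff5500&auto_play=false" width="100%" height="166" frameborder="no" scrolling="no"></iframe>'
)
other = '<iframe src="https://example.com/embed/1" width="100%"></iframe>'

print(bool(SOUNDCLOUD_IFRAME.search(soundcloud)))
print(bool(SOUNDCLOUD_IFRAME.search(other)))
```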

Regex to match specific functions and their arguments in files

I'm working on a gettext javascript parser and I'm stuck on the parsing regex.
I need to catch every argument passed to a specific method call _n( and _(. For example, if I have these in my javascript files:
_("foo") // want "foo"
_n("bar", "baz", 42); // want "bar", "baz", 42
_n(domain, "bux", var); // want domain, "bux", var
_( "one (optional)" ); // want "one (optional)"
apples === 0 ? _( "No apples" ) : _n("%1 apple", "%1 apples", apples) // could have on the same line two calls..
This refs this documentation: http://poedit.net/trac/wiki/Doc/Keywords
I'm planning on doing it in two passes (with two regexes):
1. catch all function arguments for _n( or _( method calls
2. catch the stringy ones only
Basically, I'd like a regex that could say "catch everything after _n( or _( and stop at the last parenthesis )", that is, when the function call is done. I don't know if it is possible with a regex and without a JavaScript parser.
What could also be done is "catch every "string" or 'string' after _n( or _( and stop at the end of the line OR at the beginning of a new _n( or _(".
In everything I've done I get either stuck on _( "one (optional)" ); with its inside parenthesis or apples === 0 ? _( "No apples" ) : _n("%1 apple", "%1 apples", apples) with two calls on the same line.
Here is what I have implemented so far, with imperfect regexes: a generic parser and the javascript one or the handlebars one
Note: Read this answer if you're not familiar with recursion.
Part 1: match specific functions
Who said that regex can't be modular? Well PCRE regex to the rescue!
~ # Delimiter
(?(DEFINE) # Start of definitions
(?P<str_double_quotes>
(?<!\\) # Not escaped
" # Match a double quote
(?: # Non-capturing group
[^\\] # Match anything not a backslash
| # Or
\\. # Match a backslash and a single character (ie: an escaped character)
)*? # Repeat the non-capturing group zero or more times, ungreedy/lazy
" # Match the ending double quote
)
(?P<str_single_quotes>
(?<!\\) # Not escaped
' # Match a single quote
(?: # Non-capturing group
[^\\] # Match anything not a backslash
| # Or
\\. # Match a backslash and a single character (ie: an escaped character)
)*? # Repeat the non-capturing group zero or more times, ungreedy/lazy
' # Match the ending single quote
)
(?P<brackets>
\( # Match an opening bracket
(?: # A non capturing group
(?&str_double_quotes) # Recurse/use the str_double_quotes pattern
| # Or
(?&str_single_quotes) # Recurse/use the str_single_quotes pattern
| # Or
[^()] # Anything not a bracket
| # Or
(?&brackets) # Recurse the bracket pattern
)*
\)
)
) # End of definitions
# Let's start matching for real now:
_n? # Match _ or _n
\s* # Optional white spaces
(?P<results>(?&brackets)) # Recurse/use the brackets pattern and put it in the results group
~sx
The s is for matching newlines with . and the x modifier is for this fancy spacing and commenting of our regex.
Online regex demo
Online php demo
Part 2: getting rid of opening & closing brackets
Since our regex will also get the opening and closing brackets (), we might need to filter them. We will use preg_replace() on the results:
~ # Delimiter
^ # Assert begin of string
\( # Match an opening bracket
\s* # Match optional whitespaces
| # Or
\s* # Match optional whitespaces
\) # Match a closing bracket
$ # Assert end of string
~x
Online php demo
Part 3: extracting the arguments
So here's another modular regex, you could even add your own grammar:
~ # Delimiter
(?(DEFINE) # Start of definitions
(?P<str_double_quotes>
(?<!\\) # Not escaped
" # Match a double quote
(?: # Non-capturing group
[^\\] # Match anything not a backslash
| # Or
\\. # Match a backslash and a single character (ie: an escaped character)
)*? # Repeat the non-capturing group zero or more times, ungreedy/lazy
" # Match the ending double quote
)
(?P<str_single_quotes>
(?<!\\) # Not escaped
' # Match a single quote
(?: # Non-capturing group
[^\\] # Match anything not a backslash
| # Or
\\. # Match a backslash and a single character (ie: an escaped character)
)*? # Repeat the non-capturing group zero or more times, ungreedy/lazy
' # Match the ending single quote
)
(?P<array>
Array\s*
(?&brackets)
)
(?P<variable>
[^\s,()]+ # I don't know the exact grammar for a variable in ECMAScript
)
(?P<brackets>
\( # Match an opening bracket
(?: # A non capturing group
(?&str_double_quotes) # Recurse/use the str_double_quotes pattern
| # Or
(?&str_single_quotes) # Recurse/use the str_single_quotes pattern
| # Or
(?&array) # Recurse/use the array pattern
| # Or
(?&variable) # Recurse/use the variable pattern
| # Or
[^()] # Anything not a bracket
| # Or
(?&brackets) # Recurse the bracket pattern
)*
\)
)
) # End of definitions
# Let's start matching for real now:
(?&array)
|
(?&variable)
|
(?&str_double_quotes)
|
(?&str_single_quotes)
~xis
We will loop and use preg_match_all(). The final code would look like this:
$functionPattern = <<<'regex'
~ # Delimiter
(?(DEFINE) # Start of definitions
(?P<str_double_quotes>
(?<!\\) # Not escaped
" # Match a double quote
(?: # Non-capturing group
[^\\] # Match anything not a backslash
| # Or
\\. # Match a backslash and a single character (ie: an escaped character)
)*? # Repeat the non-capturing group zero or more times, ungreedy/lazy
" # Match the ending double quote
)
(?P<str_single_quotes>
(?<!\\) # Not escaped
' # Match a single quote
(?: # Non-capturing group
[^\\] # Match anything not a backslash
| # Or
\\. # Match a backslash and a single character (ie: an escaped character)
)*? # Repeat the non-capturing group zero or more times, ungreedy/lazy
' # Match the ending single quote
)
(?P<brackets>
\( # Match an opening bracket
(?: # A non capturing group
(?&str_double_quotes) # Recurse/use the str_double_quotes pattern
| # Or
(?&str_single_quotes) # Recurse/use the str_single_quotes pattern
| # Or
[^()] # Anything not a bracket
| # Or
(?&brackets) # Recurse the bracket pattern
)*
\)
)
) # End of definitions
# Let's start matching for real now:
_n? # Match _ or _n
\s* # Optional white spaces
(?P<results>(?&brackets)) # Recurse/use the brackets pattern and put it in the results group
~sx
regex;
$argumentsPattern = <<<'regex'
~ # Delimiter
(?(DEFINE) # Start of definitions
(?P<str_double_quotes>
(?<!\\) # Not escaped
" # Match a double quote
(?: # Non-capturing group
[^\\] # Match anything not a backslash
| # Or
\\. # Match a backslash and a single character (ie: an escaped character)
)*? # Repeat the non-capturing group zero or more times, ungreedy/lazy
" # Match the ending double quote
)
(?P<str_single_quotes>
(?<!\\) # Not escaped
' # Match a single quote
(?: # Non-capturing group
[^\\] # Match anything not a backslash
| # Or
\\. # Match a backslash and a single character (ie: an escaped character)
)*? # Repeat the non-capturing group zero or more times, ungreedy/lazy
' # Match the ending single quote
)
(?P<array>
Array\s*
(?&brackets)
)
(?P<variable>
[^\s,()]+ # I don't know the exact grammar for a variable in ECMAScript
)
(?P<brackets>
\( # Match an opening bracket
(?: # A non capturing group
(?&str_double_quotes) # Recurse/use the str_double_quotes pattern
| # Or
(?&str_single_quotes) # Recurse/use the str_single_quotes pattern
| # Or
(?&array) # Recurse/use the array pattern
| # Or
(?&variable) # Recurse/use the variable pattern
| # Or
[^()] # Anything not a bracket
| # Or
(?&brackets) # Recurse the bracket pattern
)*
\)
)
) # End of definitions
# Let's start matching for real now:
(?&array)
|
(?&str_double_quotes)
|
(?&str_single_quotes)
|
(?&variable)
~six
regex;
$input = <<<'input'
_ ("foo") // want "foo"
_n("bar", "baz", 42); // want "bar", "baz", 42
_n(domain, "bux", var); // want domain, "bux", var
_( "one (optional)" ); // want "one (optional)"
apples === 0 ? _( "No apples" ) : _n("%1 apple", "%1 apples", apples) // could have on the same line two calls..
// misleading cases
_n("foo (")
_n("foo (\)", 'foo)', aa)
_n( Array(1, 2, 3), Array(")", '(') );
_n(function(foo){return foo*2;}); // Is this even valid?
_n (); // Empty
_ (
"Foo",
'Bar',
Array(
"wow",
"much",
'whitespaces'
),
multiline
); // PCRE is awesome
input;
if(preg_match_all($functionPattern, $input, $m)){
$filtered = preg_replace(
'~ # Delimiter
^ # Assert begin of string
\( # Match an opening bracket
\s* # Match optional whitespaces
| # Or
\s* # Match optional whitespaces
\) # Match a closing bracket
$ # Assert end of string
~x', // Regex
'', // Replace with nothing
$m['results'] // Subject
); // Getting rid of opening & closing brackets
// Part 3: extract arguments:
$parsedTree = array();
foreach($filtered as $arguments){ // Loop
if(preg_match_all($argumentsPattern, $arguments, $m)){ // If there's a match
$parsedTree[] = array(
'all_arguments' => $arguments,
'branches' => $m[0]
); // Add an array to our tree and fill it
}else{
$parsedTree[] = array(
'all_arguments' => $arguments,
'branches' => array()
); // Add an array with empty branches
}
}
print_r($parsedTree); // Let's see the results;
}else{
echo 'no matches';
}
Online php demo
You might want to create a recursive function to generate a full tree. See this answer.
You might notice that the function(){} part isn't parsed correctly. I will leave that as an exercise for the readers :)
Try this:
(?<=\().*?(?=\s*\)[^)]*$)
See live demo
Below regex should help you.
^(?=\w+\()\w+?\(([\s'!\\\)",\w]+)+\);
Check the demo here
\(( |"(\\"|[^"])*"|'(\\'|[^'])*'|[^)"'])*?\)
This should get anything between a pair of parenthesis, ignoring parenthesis in quotes.
Explanation:
\( // Literal open paren
(
| //Space or
"(\\"|[^"])*"| //Anything between two double quotes, including escaped quotes, or
'(\\'|[^'])*'| //Anything between two single quotes, including escaped quotes, or
[^)"'] //Any character that isn't a quote or close paren
)*? // All that, as many times as necessary
\) // Literal close paren
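The same idea can be sketched as a two-step extraction in Python. This is an illustration rather than the expression above verbatim: the quoted-string alternatives are rewritten so that a backslash always consumes the next character, and a lookbehind (an added assumption) keeps _( from matching inside a longer identifier. Like the expression above, it does not handle nested parentheses outside of strings:

```python
import re

# A double- or single-quoted string with backslash escapes.
STRING = r'''(?:"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')'''

# Step 1: a _( or _n( call; group 1 is everything between the parentheses.
CALL = re.compile(r'''(?<![\w$])_n?\s*\(((?:''' + STRING + r'''|[^()"'])*)\)''')

# Step 2: one argument, either a quoted string or a bare token.
ARGUMENT = re.compile(STRING + r'''|[^\s,()"']+''')

def gettext_arguments(source):
    """Return one list of arguments per call found (hypothetical helper)."""
    return [ARGUMENT.findall(call.group(1)) for call in CALL.finditer(source)]

source = '''_("foo")
_n("bar", "baz", 42);
_n(domain, "bux", var);
_( "one (optional)" );
apples === 0 ? _( "No apples" ) : _n("%1 apple", "%1 apples", apples)'''

for arguments in gettext_arguments(source):
    print(arguments)
```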
No matter how you slice it, regular expressions are going to cause problems. They're hard to read, hard to maintain, and highly inefficient. I'm unfamiliar with gettext, but perhaps you could use a for loop?
// A sketch in JavaScript. A loop like this can be more readable, maintainable, and predictable than a regular expression.
// capture() is a placeholder for whatever you do with each captured argument list.
for (let i = 0; i < input.length; i++) {
    // Ignore anything that isn't an opening paren
    if (input[i] !== '(') continue;
    let capturedText = '';
    // Skip the opening paren, then loop until a close paren or the end of input is reached
    for (i++; i < input.length && input[i] !== ')'; i++) {
        if (input[i] === '"' || input[i] === "'") {
            const quote = input[i];
            // Copy the opening quote, then loop until the unescaped closing quote or the end of input
            capturedText += input[i++];
            for (; i < input.length && (input[i] !== quote || input[i - 1] === '\\'); i++) {
                capturedText += input[i];
            }
        }
        // Append the current character (the closing quote, if we just left a string)
        if (i < input.length) capturedText += input[i];
    }
    capture(capturedText);
}
Note: I didn't cover how to determine if it's a function or just a grouping symbol. (ie, this will match a = (b * c)). That's complicated, as is covered in detail here. As your code gets more and more accurate, you get closer and closer to writing your own javascript parser. You might want to take a look at the source code for actual javascript parsers if you need that sort of accuracy.
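One way to narrow the loop to gettext calls and to cope with nested parentheses is to locate each _( or _n( with a small regex and then scan forward while counting depth. A sketch in Python, with a made-up function name; it is still not a JavaScript parser (comments and regex literals in the source are not understood, for example):

```python
import re

# Start of a call: _( or _n( that is not part of a longer identifier.
CALL_START = re.compile(r"(?<![\w$])_n?\s*\(")

def extract_call_arguments(source):
    """Return the raw argument text of every _() / _n() call in source."""
    results = []
    pos = 0
    while True:
        m = CALL_START.search(source, pos)
        if m is None:
            break
        i = m.end()      # first character after the opening paren
        depth = 1        # currently open parentheses
        quote = None     # the quote character we are inside, if any
        while i < len(source) and depth > 0:
            ch = source[i]
            if quote:
                if ch == "\\":
                    i += 1           # skip the escaped character
                elif ch == quote:
                    quote = None
            elif ch in "\"'":
                quote = ch
            elif ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            i += 1
        if depth == 0:
            # Drop the closing paren that ended the call.
            results.append(source[m.end():i - 1])
        pos = i
    return results

sample = '_n( Array(1, 2, 3), Array(")", \'(\') ); _( "one (optional)" );'
print(extract_call_arguments(sample))
```

Calls whose closing parenthesis is never found are skipped rather than reported.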
One bit of code (you can test this PHP code at http://writecodeonline.com/php/ to check):
$string = '_("foo")
_n("bar", "baz", 42);
_n(domain, "bux", var);
_( "one (optional)" );
apples === 0 ? _( "No apples" ) : _n("%1 apple", "%1 apples", apples)';
preg_match_all('/(?<=(_\()|(_n\())[\w", ()%]+(?=\))/i', $string, $matches);
foreach($matches[0] as $test){
$opArr = explode(',', $test);
foreach($opArr as $test2){
echo trim($test2) . "\n";
}
}
you can see the initial pattern and how it works here: http://regex101.com/r/fR7eU2/1
Output is:
"foo"
"bar"
"baz"
42
domain
"bux"
var
"one (optional)"
"No apples"
"%1 apple"
"%1 apples"
apples
We can do this in two steps:
1)catch all function arguments for _n( or _( method calls
(?:_\(|_n\()(?:[^()]*\([^()]*\))*[^()]*\)
See demo.
http://regex101.com/r/oE6jJ1/13
2)catch the stringy ones only
"([^"]*)"|(?:\(|,)\s*([^"),]*)(?=,|\))
See demo.
http://regex101.com/r/oE6jJ1/14