I have some obfuscated code which calls functions, like this:
getAny([["text with symbols \"()[],.;\" and maybe 'ImVerySeriousFn'"], ...]);
setAny([["other text with \"()[],.;\""], ...]);...
Arguments contain random text. Functions follow each other without a new line.
How can I get the arguments of getAny, setAny and other functions using a set of regular expressions?
I need this result:
regex1 result: [["text with symbols \"()[],.;\" and maybe 'ImVerySeriousFn'"], ...]
regex2 result: [["other text with \"()[],.;\""], ...]
...
I tried writing regex1:
getAny\((.*)\)
but the match also contains the setAny call:
[["text with symbols \"()[],.;\" and maybe 'ImVerySeriousFn'"], ...]);setAny([["other text with \"()[],.;\""], ...]
When I tried:
getAny\((.*?)\)
the match breaks off inside the argument string:
[["text with symbols \"(
I can't split by ; or ); because the text in the arguments can contain ; or );.
Is it perhaps impossible to do this with a regex?
Your regex needs to be \(.*?\); since your code is obfuscated (and presumably on one line).
Note that this will fail if one of your arguments contains ); inside of it.
Explanation (From Regex101.com):
/\((.*?)\);/g
\( matches the character ( literally
1st Capturing group (.*?)
.*? matches any character (except newline)
Quantifier: Between zero and unlimited times, as few times as possible, expanding as needed [lazy]
\) matches the character ) literally
; matches the character ; literally
g modifier: global. All matches (don't return on first match)
The main problem with your regex is that you never required ; to end a match, so it went ahead and grabbed everything up to the last ) it saw. That happened because you used .*, which is greedy (matches as much as possible) unless it is followed by ?.
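To make the difference between the three patterns concrete, here is a small sketch using Python's `re` module as a stand-in for whatever engine the asker uses (this part of the syntax is the same). The input strings are hypothetical, shortened versions of the ones in the question.

```python
import re

# Hypothetical, shortened version of the obfuscated one-line input.
code = 'getAny([["text (a)"]]);setAny([["other"]]);'

# Greedy: .* runs to the LAST ")" in the string, swallowing the setAny call.
greedy = re.search(r"getAny\((.*)\)", code).group(1)

# Lazy: .*? stops at the FIRST ")", even one inside the string argument.
lazy = re.search(r"getAny\((.*?)\)", code).group(1)

# Lazy, but the ")" must be followed by ";": stops at the first ");".
lazy_semicolon = re.search(r"getAny\((.*?)\);", code).group(1)

# The caveat from the answer: an argument that itself contains ");" still breaks it.
broken = re.search(r"getAny\((.*?)\);", 'getAny([["a);b"]]);').group(1)

print(greedy)          # [["text (a)"]]);setAny([["other"]]
print(lazy)            # [["text (a
print(lazy_semicolon)  # [["text (a)"]]
print(broken)          # [["a
```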
I'm not sure I understand your question, but if I do, you could use a group and list the allowed characters in it.
Your regex could be: \( ( ) " [ ],\.; a-zA-Z \)
The outer brackets enclose the group.
If I understand your pattern correctly, your function argument will always start with [[" and end with "]].
Regex:
/getAny\((\[\[".*?[^\\]"\]\])\);/
Demo: http://regex101.com/r/jC3vX5/2
Note the lazy .*?, and [^\\] to make sure the matching quote is not escaped.
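The same pattern can be tried in Python's `re` module (used here only as a stand-in engine). The helper `args_of` is hypothetical: it splices a function name into the answer's pattern so that both calls from the question can be extracted. The `, ...` parts of the question's input are omitted.

```python
import re

# Both calls on one line, as in the question (the ", ..." parts are omitted).
code = r'''getAny([["text with symbols \"()[],.;\" and maybe 'ImVerySeriousFn'"]]);setAny([["other text with \"()[],.;\""]]);'''


def args_of(name, text):
    """Hypothetical helper: return the [[...]] argument of the first call to `name`."""
    # [^\\]" means: a closing quote that is not preceded by a backslash.
    pattern = re.escape(name) + r'\((\[\[".*?[^\\]"\]\])\);'
    match = re.search(pattern, text)
    return match.group(1) if match else None


print(args_of("getAny", code))
print(args_of("setAny", code))
```

Note that this still relies on the assumption stated above: every argument starts with `[["` and ends with `"]]`.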
I am new to the regex world. I would like to rename files that have a timestamp added at the end of the file name, basically removing the last 25 characters before the extension.
Examples of file names to rename:
IMG523314(2021-12-05-14-51-25_UTC).jpg > IMG523314.jpg
Test run1(2021-08-05-11-32-18_UTC).txt > Test run1.txt
That is, remove the 25 characters before the extension, (2021-12-05-14-51-25_UTC),
or, if you like, remove the brackets ( ), which are always there, and everything inside them.
The right bracket is always followed by a dot (.).
Will the regex syntax shown in the title, \(.*\), select the above? If so, how does it actually work?
Many Thanks,
Dan
Yes, \(.*\) will select the parentheses and anything inside them.
Assuming that when you ask how it works you mean why the symbols behave the way they do, here's a breakdown:
\( & \): Parentheses are special characters in regex (they denote groups), so in order to match them literally, you need to escape them with backslashes.
.: Periods are wildcard matchers, meaning they match any single character.
*: Asterisks are quantifiers, meaning match zero to an infinite number of the previous matcher.
So to put everything together you have:
Match exactly one opening parenthesis
Match an unlimited number of any character
Match exactly one closing parenthesis
Because of that closing-parenthesis requirement, the unlimited matching of the asterisk is bounded, so you only grab the parentheses and the characters inside them. Note that .* is greedy, so it extends to the last ) in the name; these file names contain only one pair of parentheses, so that is fine here.
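A quick way to check the pattern is a short Python sketch (Python's `re` stands in for whatever renaming tool is used, and only the new names are computed; actually renaming files on disk is left out). It also shows the greedy caveat on a hypothetical name with two pairs of parentheses, plus a hypothetical stricter variant.

```python
import re

names = [
    "IMG523314(2021-12-05-14-51-25_UTC).jpg",
    "Test run1(2021-08-05-11-32-18_UTC).txt",
]

# \(.*\): a literal "(", then any characters (greedy), then a literal ")".
renamed = [re.sub(r"\(.*\)", "", name) for name in names]
print(renamed)  # ['IMG523314.jpg', 'Test run1.txt']

# Because .* is greedy, it runs to the LAST ")": both pairs below are removed together.
greedy = re.sub(r"\(.*\)", "", "a(1)b(2).txt")
print(greedy)  # a.txt

# Hypothetical stricter variant: no ")" inside the parentheses, and the
# extension must follow immediately.
strict = re.sub(r"\([^)]*\)(?=\.[^.]*$)", "", "a(1)b(2).txt")
print(strict)  # a(1)b.txt
```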
Yes, it's possible:
a='IMG523314(2021-12-05-14-51-25_UTC).jpg'
echo "${a/\(*\)/}"
and
b='Test run1(2021-08-05-11-32-18_UTC).txt'
echo "${b/\(*\)/}"
Explanation:
The first item is the variable.
The second is the content to be replaced, \(*\), that is, anything inside parentheses. This is a shell glob pattern rather than a regex: \( and \) are literal parentheses and * matches any string.
The third is the string to replace it with (an empty string in this case).
The regex s/\A\s*\n// removes every all-whitespace line from the beginning of a string.
It leaves everything else alone, including any whitespace that might begin the first visible line.
By "visible line," I mean a line that satisfies /\S/.
The code below demonstrates this.
But how does it work?
\A anchors the start of the string
\s* greedily grabs all whitespace. But without the (?s) modifier, it should stop at the end of the first line, should it not?
See
https://perldoc.perl.org/perlre.
Suppose that without the (?s) modifier it nevertheless "treats the string as a single line".
Then I would expect the greedy \s* to grab every whitespace character it sees,
including linefeeds. So it would pass the linefeed that precedes the "dogs" string, keep grabbing whitespace, run into the "d", and we would never get a match.
Nevertheless, the code does exactly what I want. Since I can't explain it, it feels like a kludge: something that happens to work, discovered through trial and error. Why does it work?
#!/usr/bin/env perl
use strict; use warnings;
print $^V; print "\n";
my @strs=(
join('',"\n", "\t", ' ', "\n", "\t", ' dogs',),
join('',
"\n",
"\n\t\t\x20",
"\n\t\t\x20",
'......so what?',
"\n\t\t\x20",
),
);
my $count=0;
for my $onestring(@strs)
{
$count++;
print "\n$count ------------------------------------------\n";
print "|$onestring|\n";
(my $try1=$onestring)=~s/\A\s*\n//;
print "|$try1|\n";
}
But how does it work?
...
I would expect the greedy \s* to grab every whitespace character it sees, including linefeeds. So it would pass the linefeed that precedes the "dogs" string, keep grabbing whitespace, run into the "d", and we would never get a match.
Correct, the \s* at first grabs everything up to the d (in dogs), and with that the match would fail... so it backs up, one character at a time, shortening that greedy grab so as to give the following pattern, here \n, a chance to match.
And that works! So \s* matches up to (the last!) \n, which is matched by the following \n in the pattern, and all is well. That part is removed, leaving "\t dogs", which is printed.
This is called backtracking; see also perlretut. Backtracking can be suppressed, most notably by possessive quantifiers (like \w++) or, more generally, by the atomic-group construct (?>...).
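The same substitution behaves identically in Python's `re` module, where `\A` and `\s` mean the same as in Perl for this purpose. The sketch below rebuilds the two test strings from the Perl script and prints what \s* ends up matching, which makes the backtracking visible.

```python
import re

# The two test strings from the Perl script above.
s1 = "\n" + "\t" + " " + "\n" + "\t" + " dogs"
s2 = "\n" + "\n\t\t\x20" + "\n\t\t\x20" + "......so what?" + "\n\t\t\x20"

pattern = re.compile(r"\A\s*\n")

# \s* first grabs all the leading whitespace, then backtracks until the
# trailing \n can match the last newline of that leading whitespace run.
print(repr(pattern.match(s1).group()))     # '\n\t \n'
print(repr(pattern.sub("", s1, count=1)))  # '\t dogs'
print(repr(pattern.sub("", s2, count=1)))  # '\t\t ......so what?\n\t\t '
```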
But without the (?s) modifier, it should stop at the end of the first line, should it not?
Here you may be confusing \s with ., which indeed does not match \n (without /s).
There are two questions here.
The first is about the interaction of \s and (lack of) (?s). Quite simply, there is no interaction.
\s matches whitespaces characters, which includes Line Feed (LF). It's not affected by (?s) whatsoever.
(?s) affects only the . metacharacter:
(?-s) causes . to match all characters except LF. [Default]
(?s) causes . to match all characters.
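The same distinction holds in Python's `re` module (used here as a stand-in for Perl), where `(?s)` is the inline form of the DOTALL flag:

```python
import re

text = "a\nb"

no_flag = re.search(r"a.b", text)       # "." does not match the newline
with_flag = re.search(r"(?s)a.b", text)  # with (?s), "." matches it
whitespace = re.search(r"a\sb", text)    # \s matches "\n" with or without (?s)

print(no_flag)                 # None
print(with_flag is not None)   # True
print(whitespace is not None)  # True
```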
If one wanted to match whitespace on the current line, one could use \h instead of \s. It only matches horizontal whitespace, thus excluding CR and LF (among others).
Alternatively, (?[ \s - \n ]) [1], [^\S\n] [2] and \s(?<!\n) [3] all match whitespace characters other than LF.
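Python's `re` module has neither `\h` nor Perl's `(?[ ... ])` regex sets, but the other two alternatives work there unchanged. A small sketch (the input strings are made up for illustration):

```python
import re

text = "a \t\nb"

# Both match whitespace other than LF.
double_negative = re.findall(r"[^\S\n]", text)
lookbehind = re.findall(r"\s(?<!\n)", text)
print(double_negative)  # [' ', '\t']
print(lookbehind)       # [' ', '\t']

# Example use: strip leading whitespace from the first line only.
first_line_stripped = re.sub(r"\A[^\S\n]+", "", "  \t x\n y")
print(repr(first_line_stripped))  # 'x\n y'
```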
The second is about a misconception of what greediness means.
Greediness or lack thereof doesn't affect if a pattern can match, just what it matches. For example, for a given input, /a+/ and /a+?/ will both match, or neither will match. It's impossible for one to match and not the other.
"aaaa" =~ /a+/ # Matches 4 characters at position 0.
"aaaa" =~ /a+?/ # Matches 1 character at position 0.
"bbbb" =~ /a+/ # Doesn't match.
"bbbb" =~ /a+?/ # Doesn't match.
When something is greedy, it means it will match the most possible at the current position that allows the entire pattern to match. Take the following for example:
"ccccd" =~ /.*d/
This pattern can match by having .* match only cccc instead of ccccd, and thus does so. This is achieved through backtracking. .* initially matches ccccd, then it discovers that d doesn't match, so .* tries matching only cccc. This allows the d and thus the entire pattern to match.
You'll find backtracking used outside of greediness too. "efg" =~ /^(e|.f)g/ matches because it tries the second alternative when it's unable to match g when using the first alternative.
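The examples in this answer can be reproduced with Python's `re` module (a stand-in for Perl here; `re.search` plays the role of `=~`):

```python
import re

# Greediness changes WHAT is matched, never WHETHER a match exists.
print(re.search(r"a+", "aaaa").group())   # aaaa
print(re.search(r"a+?", "aaaa").group())  # a
print(re.search(r"a+", "bbbb"))           # None
print(re.search(r"a+?", "bbbb"))          # None

# Backtracking: .* first takes "ccccd", d then fails, so .* gives back one character.
print(re.search(r"(.*)d", "ccccd").group(1))  # cccc

# Alternation: "e" matches but g then fails against "f", so the engine
# backtracks and tries ".f", after which g matches.
print(re.match(r"^(e|.f)g", "efg").group(1))  # ef
```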
In the same way that .* avoids matching the d in the earlier example, \s* avoids matching the last LF and the tab and space before dogs in your example.
[1] Requires use experimental qw( regex_sets ); before Perl 5.36, but it has been safe to use since 5.18, as it was accepted without change after its introduction as an experimental feature.
[2] Less clear because it uses double negatives:
[^\S\n]
= A char that's ( not( not(\s) or LF ) )
= A char that's ( not(not(\s)) and not(LF) )
= A char that's ( \s and not LF )
[3] Less efficient, and far from as pretty as the regex set.
Hi everybody,
I need help with a regular expression pattern.
I need to search text for this:
( anything ) -
I need to check this for every statement:
I need to detect whether this pattern exists in the statement that I feed to my function, and get the matched string.
Note the spaces, the parentheses and the dash. "anything" means any content, Arabic or English, no matter what it is: the pattern just starts with ( and ends with -, and if the pattern exists in the statement, the function should report that it exists.
Thanks, everyone.
The task becomes easier if it is described in a way that guides you to
the proper solution. Let's rewrite your task as follows:
The text to match:
It should start with ( and a space.
Then there is a non-empty sequence of characters other than )
(the text between parentheses, which you wrote as anything).
The last part is a space, ), another space and -.
With such a description, it is clear that the pattern should be:
\( [^)]+ \) -
where each fragment: \(, [^)]+ and \) - expresses each of the
above conditions.
Note: If spaces after ( and before ) are optional, then you can express it
with ? after each such space, and then the whole regex will change to
\( ?[^)]+ ?\) -.
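A minimal sketch of the function the question asks for, written in Python (an assumption, since the asker's language isn't stated; `find_pattern` is a hypothetical name). It uses the second pattern, with optional spaces, and returns the matched text or `None`. Because `[^)]` matches any character other than `)`, Arabic text inside the parentheses works the same way as English.

```python
import re

# Spaces after "(" and before ")" are optional, as in the second pattern above.
PATTERN = re.compile(r"\( ?[^)]+ ?\) -")


def find_pattern(statement):
    """Hypothetical helper: return the matched text, or None if there is no match."""
    match = PATTERN.search(statement)
    return match.group(0) if match else None


print(find_pattern("start ( some text ) - end"))  # ( some text ) -
print(find_pattern("مرحبا ( نص عربي ) - شكرا"))
print(find_pattern("(no dash here)"))             # None
```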
I have the following regular expression:
"\[(\d+)\].?\s+(\S+)\s+(\/+?)\\r\n"
I am pretty new to regex. I have this regexp and a string, and I am trying to see whether they match. I believe they should, but my program says they don't, and an online analyser agrees. I am pretty sure I am missing something small. Here is my string:
[1]+ Stopped sleep 60
Why does this string not match the regexp above? Any ideas?
You appear to have escaped the \ before the r, so the regex searches for a literal backslash followed by the letter r.
RegExp interpretation and allowed characters vary slightly with implementation, so you should give your execution context, but this is probably generic enough.
Decomposing your regexp gives
\[ - an open bracket character.
(\d+) - one or more digits; save this as capture group 1 ($1).
\] - a close bracket character.
.? - 0 or 1 character, of any kind
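\s+ - 1 or more whitespace characters.
(\S+) - 1 or more non-whitespace characters; save this as $2
\s+ - 1 or more whitespace characters
(\/+?) - 1 or more forward-slash characters, matched lazily (+? means "as few as possible", not "optional"); save this as $3
(this is an odd construct, and since your string contains no slash at all, it alone is enough to prevent a match)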
\\r\n" - an (incorrectly specified) end of line sequence, I think.
First of all, if you want to match the end of a line, use $, not \r\n. That should match the end of a line in most contexts. ^ matches the beginning of a line.
Second, I can't tell from your regexp what you are trying to capture after the "Stopped" word, so I'm going to assume you want the rest as one block, including internal spaces. A reg-exp basically the same as yours will do it.
"\[(\d+)\].?\s+(\S+)\s+(.+)\s*$"
This captures
$1 = 1,
$2 = Stopped
$3 = sleep 60
This is basically the same as yours except for the end, which grabs everything after "Stopped" up to the end of the line as a single capture group, $3. Leading blanks are excluded, but because (.+) is greedy, any trailing blanks end up in $3 as well; use (.+?) if you want to drop them. If you want to do additional parsing, replace the (.+) as appropriate. Note that there must be at least 1 non-blank character after "Stopped " for this to match. If you want it to match even if there is no string for $3, use \s*(.*)\s*$ instead of \s+(.+)\s*$.
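Both patterns can be checked side by side in Python's `re` module (a stand-in for the asker's unspecified engine). The sketch confirms that the original pattern cannot match, shows the answer's captures, and demonstrates the trailing-blank behavior on a hypothetical padded line.

```python
import re

line = "[1]+ Stopped sleep 60"

# The original pattern: (\/+?) needs at least one "/", and \\r needs a literal
# backslash followed by "r", so it cannot match this line.
original = r"\[(\d+)\].?\s+(\S+)\s+(\/+?)\\r\n"
print(re.search(original, line))  # None

# The answer's pattern.
answer = r"\[(\d+)\].?\s+(\S+)\s+(.+)\s*$"
print(re.search(answer, line).groups())  # ('1', 'Stopped', 'sleep 60')

# With trailing blanks, the greedy (.+) keeps them; the lazy (.+?) drops them.
padded = line + "   "
print(repr(re.search(answer, padded).group(3)))  # 'sleep 60   '
lazy = r"\[(\d+)\].?\s+(\S+)\s+(.+?)\s*$"
print(repr(re.search(lazy, padded).group(3)))    # 'sleep 60'
```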
Try to use this pattern:
\[\d+\]\+\s*\w+\s*\w+\s*\d+
I have a string 1/temperatoA,2/CelcieusB!23/33/44,55/66/77 and I would like to extract the words temperatoA and CelcieusB.
I have this regular expression (\d+/(\w+),?)*! but I only get the match 1/temperatoA,2/CelcieusB!
Why?
Your whole match evaluates to '1/temperatoA,2/CelcieusB' because that matches the following expression:
qr{ ( # begin group
\d+ # at least one digit
/ # followed by a slash
(\w+) # followed by at least one word character
,? # maybe a comma
)* # ANY number of repetitions of this pattern.
}x;
'1/temperatoA,' fulfills capture #1 first, but since you are asking the engine to match as many repetitions as it can, it continues and finds that the pattern is repeated in '2/CelcieusB' (the comma being optional). So the whole match is what you said it is, but what you probably weren't expecting is that '2/CelcieusB' replaces '1/temperatoA,' as $1, so $1 reads '2/CelcieusB'.
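Python's `re` module (used here as a stand-in for Perl) behaves the same way: a repeated capture group only keeps its last iteration.

```python
import re

s = "1/temperatoA,2/CelcieusB!23/33/44,55/66/77"

# The question's pattern: the group is repeated, so it ends up holding only
# the text from its final repetition.
m = re.search(r"(\d+/(\w+),?)*!", s)
print(m.group(0))  # 1/temperatoA,2/CelcieusB!
print(m.group(1))  # 2/CelcieusB
print(m.group(2))  # CelcieusB
```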
Anytime you want to capture anything that fits a certain pattern in a certain string it is always best to use the global flag and assign the captures into an array. Since an array is not a single scalar like $1, it can hold all the values that were captured for capture #1.
When I do this:
use Data::Dumper;
my $str = '1/temperatoA,2/CelcieusB!23/33/44,55/66/77';
my $regex = qr{(\d+/(\w+))};
if ( my @matches = $str =~ /$regex/g ) {
    print Dumper( \@matches );
}
I get this:
$VAR1 = [
'1/temperatoA',
'temperatoA',
'2/CelcieusB',
'CelcieusB',
'23/33',
'33',
'55/66',
'66'
];
Now, I figure that's probably not what you expected. But '33' and '66' consist of word characters, and so, coming after a slash, they comply with the expression.
So, if this is an issue, you can change your regex to the stricter qr{(\d+/(\p{Alpha}\w*))}, specifying that the first character must be alphabetic, followed by any number of word characters. Then the dump looks like this:
$VAR1 = [
'1/temperatoA',
'temperatoA',
'2/CelcieusB',
'CelcieusB'
];
And if you only want 'temperatoA' or 'CelcieusB', then you're capturing more than you need to and you'll want your regex to be qr{\d+/(\p{Alpha}\w*)}.
In short, the secret to capturing more than one chunk with a single capture group is to assign the result of a global match to an array; you can then look through the array to see whether it contains the data you want.
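For comparison, a sketch of the same idea in Python's `re` module, where `re.findall` plays the role of Perl's list-context `//g`. One difference: with more than one group, `findall` returns a list of tuples rather than Perl's flat list. Python's `re` has no `\p{Alpha}`, so `[^\W\d_]` (a word character that is not a digit or underscore, i.e. a letter) is swapped in as a stand-in.

```python
import re

s = "1/temperatoA,2/CelcieusB!23/33/44,55/66/77"

# Two groups: one tuple per match.
pairs = re.findall(r"(\d+/(\w+))", s)
print(pairs)
# [('1/temperatoA', 'temperatoA'), ('2/CelcieusB', 'CelcieusB'), ('23/33', '33'), ('55/66', '66')]

# One group, first character must be a letter: just the words.
words = re.findall(r"\d+/([^\W\d_]\w*)", s)
print(words)  # ['temperatoA', 'CelcieusB']
```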
The question here is: why are you using a regular expression that’s so obviously wrong? How did you get it?
The expression you want is simply as follows:
(\w+)
With a Perl-compatible regex engine you can search for
(?<=\d/)\w+(?=.*!)
(?<=\d/) asserts that there is a digit and a slash before the start of the match
\w+ matches the identifier. This allows for letters, digits and underscore. If you only want to allow letters, use [A-Za-z]+ instead.
(?=.*!) asserts that there is a ! ahead in the string, i.e. the regex will fail once we have passed the !.
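Python's `re` module also supports this fixed-width lookbehind and the lookahead, so the pattern can be tried unchanged (Python is used only for illustration; raw strings avoid the double escaping needed in C):

```python
import re

s = "1/temperatoA,2/CelcieusB!23/33/44,55/66/77"

# (?<=\d/): digit and slash before; \w+: the word; (?=.*!): a "!" somewhere ahead.
words = re.findall(r"(?<=\d/)\w+(?=.*!)", s)
print(words)  # ['temperatoA', 'CelcieusB']
```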
Depending on the language you're using, you might need to escape some of the characters in the regex.
E.g., for use in C (with the PCRE library), you need to escape the backslashes:
myregexp = pcre_compile("(?<=\\d/)\\w+(?=.*!)", 0, &error, &erroroffset, NULL);
Will this work?
/([[:alpha:]]\w+)\b(?=.*!)
I made the following assumptions...
A word begins with an alphabetic character.
A word always immediately follows a slash. No intervening spaces, no words in the middle.
Words after the exclamation point are ignored.
You have some sort of loop to capture more than one word. I'm not familiar enough with the C library to give an example.
[[:alpha:]] matches any alphabetic character.
The \b matches a word boundary.
And the (?=.*!) came from Tim Pietzcker's post.
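The "sort of loop" mentioned above might look like this in Python (an illustrative stand-in, not the C library). Python's `re` has no POSIX `[[:alpha:]]` bracket expressions, so `[^\W\d_]` is substituted to mean "a letter".

```python
import re

s = "1/temperatoA,2/CelcieusB!23/33/44,55/66/77"

# [^\W\d_] replaces [[:alpha:]], which Python's re does not support.
pattern = re.compile(r"/([^\W\d_]\w+)\b(?=.*!)")

words = []
for match in pattern.finditer(s):  # loop over successive matches
    words.append(match.group(1))
print(words)  # ['temperatoA', 'CelcieusB']
```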