Unable to match csrf-token in Lua - regex

Gone around the houses on this one tonight. All I want to do is pull out the csrf-token in the following script, but it returns nil:
local html = '<meta content="authenticity_token" name="csrf-param" /><meta content="ndcZ+Vp8MuM/hF6LizdrvJqgcRh22zF8w/DnIX0DvR0=" name="csrf-token" />'
local csrf_token = string.match(html, 'content="(.*)" name="csrf-token"')
If I modify the script and take off the "-token" part, it matches something, but of course not the right thing.
I know it is the hyphen, because if I change the string to "csrftoken", the match works as expected.
I tried to escape the - as \-, but that threw an error.
Help...

There are two problems:
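The - does need to be escaped, but Lua uses % instead of \ as the escape character. Unescaped, - is a quantifier in Lua patterns (zero or more repetitions, as few as possible), so csrf-token is read as "csr, then as few f's as possible, then token", which never matches the literal text csrf-token.
Further, the reason you get something odd is that . can match anything, including across tags (or attributes), and .* takes as much as possible. Since the engine returns the left-most possible match, a non-greedy quantifier wouldn't help either: the capture would still start at the first content=". What you should do is restrict the allowed characters so that the capture cannot go outside the attribute quotes, like [^"] (any character except a quote):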
Taking all of that together:
local csrf_token = string.match(html, 'content="([^"]*)" name="csrf%-token"')
In any case, you shouldn't actually be matching HTML with regular expressions.
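The effect of the negated class can be checked outside Lua too. The sketch below uses Python's re module purely as an illustration (Lua patterns are not regular expressions, and in Python a - outside a character class is already literal, so it needs no escape). It runs the greedy capture and the restricted capture against the same html string:

```python
import re

html = ('<meta content="authenticity_token" name="csrf-param" />'
        '<meta content="ndcZ+Vp8MuM/hF6LizdrvJqgcRh22zF8w/DnIX0DvR0=" name="csrf-token" />')

# Greedy .* starts at the FIRST content=" and runs across attributes and tags.
greedy = re.search(r'content="(.*)" name="csrf-token"', html).group(1)

# [^"]* cannot leave the quoted value, so only the second content="..."
# can be followed by name="csrf-token".
m = re.search(r'content="([^"]*)" name="csrf-token"', html)
token = m.group(1) if m else None

print(greedy)
print(token)
```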

name="csrf-token'"
You have an extra apostrophe at the end of this line.
I would also escape the ", the = and the hyphen, though only the hyphen is actually a magic character in Lua patterns; escaping the others with % is harmless but not necessary.

Related

How to write a regular expression in AutoIt script

In AutoIt script I am unable to write a regular expression for the string below. The numbers in it change every time.
Actual String = _WinWaitActivate("RX_IST2_AM [PID:942564 NPID:10991 SID:498702881] sbivvrwm060.dev.ib.tor.Test.com:30000","")
The PID, NPID and SID values change; everything else is always constant.
What I have tried is:
_WinWaitActivate("RX_IST2_AM [PID:'([0-9]{1,6})' NPID:'([0-9]{1,5})' SID:'([0-9]{1,9})' sbivvrwm060.dev.ib.tor.Test.com:30000","")
Can someone please help me?
As stated in the documentation, you should write the prefix REGEXPTITLE: and surround everything with square brackets, but escape the dots (.), the spaces ( ) and the inner square brackets with a backslash (\). Instead of [0-9] you can use \d, e.g. "[REGEXPTITLE:RX_IST2_AM\ \[PID:(\d{1,6})\ NPID:(\d{1,5})\ SID:(\d{1,9})\] sbivvrwm060\.dev\.ib\.tor\.Test\.com:30000]" as the parameter for the Win...(...) functions.
You can even omit the round brackets ((...)) but keep their content: the brackets only capture text for further processing, as with StringRegExp(...) or StringRegExpReplace(...). With _WinWaitActivate(...) capturing makes no sense anyway, because it only matches the title and does not replace or return anything from your regular expression.
According to regex101, both versions work, with and without the round brackets. You should always use a tool like that site to confirm that your expression actually works for your input string.
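The same check regex101 does can be sketched in Python, whose re module agrees with AutoIt's PCRE engine on everything this pattern uses. This is only a sanity check of the pattern body inside [REGEXPTITLE:...] against the example title, not AutoIt code:

```python
import re

# Pattern body from inside [REGEXPTITLE:...]: dots, spaces and the inner
# square brackets are escaped, and \d replaces [0-9].
pattern = (r"RX_IST2_AM\ \[PID:(\d{1,6})\ NPID:(\d{1,5})\ SID:(\d{1,9})\]"
           r" sbivvrwm060\.dev\.ib\.tor\.Test\.com:30000")

title = "RX_IST2_AM [PID:942564 NPID:10991 SID:498702881] sbivvrwm060.dev.ib.tor.Test.com:30000"

m = re.fullmatch(pattern, title)
print(m.groups() if m else "no match")

# Without the round brackets the pattern still matches; it just captures nothing.
plain = pattern.replace("(", "").replace(")", "")
print(re.fullmatch(plain, title) is not None)
```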
Not familiar with AutoIt, but remember that the whole pattern has to match (somewhere in your string) before any group captures anything. For example, (goat)s will NOT capture the word goat if your string is goat or goater, because there is no s after goat.
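A quick illustration of that point, using Python's re only as a stand-in engine:

```python
import re

# The group captures only when the WHOLE pattern, including the trailing "s", matches.
for text in ["goat", "goater", "goats"]:
    m = re.search(r"(goat)s", text)
    print(text, "->", m.group(1) if m else None)
```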
You have forgotten to add a ] in your regex, so your pattern doesn't match the string and the capture groups are not extracted. Also, I'm not completely sold on the usage of '. Based on this page, you can do something like StringRegExp(yourstring, 'RX_IST2_AM \[PID:([0-9]{1,6}) NPID:([0-9]{1,5}) SID:([0-9]{1,9})\]', $STR_REGEXPARRAYGLOBALMATCH), escaping the literal square brackets so they aren't read as a character class; the returned array then holds your three results in order. But maybe your approach works too.
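Why the literal [ has to be escaped can be sketched with Python's re as a stand-in for PCRE: an unescaped [ after RX_IST2_AM opens a character class that swallows PID:( and ends at the ] of [0-9], which leaves an unmatched ). The list of groups below only plays the role of the array StringRegExp returns; this is an illustration, not AutoIt code:

```python
import re

title = "RX_IST2_AM [PID:942564 NPID:10991 SID:498702881] sbivvrwm060.dev.ib.tor.Test.com:30000"

unescaped = r"RX_IST2_AM [PID:([0-9]{1,6}) NPID:([0-9]{1,5}) SID:([0-9]{1,9})]"
escaped = r"RX_IST2_AM \[PID:([0-9]{1,6}) NPID:([0-9]{1,5}) SID:([0-9]{1,9})\]"

try:
    re.compile(unescaped)
    broken = False
except re.error as exc:
    # "[PID:([0-9]" is parsed as one character class, so the ")" after it is unbalanced.
    broken = True
    print("unescaped pattern rejected:", exc)

m = re.search(escaped, title)
results = list(m.groups()) if m else []
print(results)
```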

Replacing Or Stripping Special Characters With Regex (Smarty)

In one part of my website I have a URL that looks like this:
http://www.webizzi.com/mp3/search.html?q=+Hush+Hush+-+(Avril)++Lavigne's+
I would like a cleaner URL, so I want to strip every special character from it except +. I also don't want anything like ++, or a + at the beginning or end of the URL. The URL should look like the one below:
http://www.webizzi.com/mp3/search.html?q=Hush+Hush+Avril+Lavigne
What I use to process the URL at the moment is:
{$config.siteurl}search.html?q={$tags[row].tag|regex_replace:"/\s+/":"+"|stripslashes}
Assuming your input is everything after ?q=:
s/(^\++|\++$|\+\++|[\(\)]+)//g
Inside the last pair of brackets, put any other characters you want stripped.
This matches one or more leading +'s, one or more trailing +'s, two or more +'s anywhere, or one or more of the special characters inside the brackets (so far, just parentheses), and replaces the match with nothing: an empty string, zilch, nada.
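As written, that substitution deletes a run of two or more +'s entirely, so (Avril)++Lavigne comes out as AvrilLavigne. A minimal sketch in Python (chosen only for illustration; in Smarty each step would be its own regex_replace) does the cleanup in three passes instead. It assumes "special character" means anything other than an ASCII letter, a digit or +, and clean_query is a hypothetical helper name:

```python
import re

def clean_query(q: str) -> str:
    """Hypothetical helper: tidy a '+'-separated search term."""
    q = re.sub(r"[^A-Za-z0-9+]", "", q)   # drop special characters, keep +
    q = re.sub(r"\++", "+", q)            # collapse runs of + to a single +
    return q.strip("+")                   # no + at the start or end

raw = "+Hush+Hush+-+(Avril)++Lavigne's+"

# The single-pass pattern from the answer, for comparison.
single_pass = re.sub(r"(^\++|\++$|\+\++|[\(\)]+)", "", raw)

print(single_pass)
print(clean_query(raw))
```

Removing the apostrophe leaves Lavignes rather than the Lavigne in the desired URL; dropping the 's as well would need an extra rule.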
I don't know jack about Smarty, but I think you should try something like
{$config.siteurl}search.html?q={$tags[row].tag|regex_replace:"/(^\++|\++$|\+\++|[\(\)]+)/":""|stripslashes}
I'm not quite sure if you need to escape the parentheses here, so if it doesn't work, lose some backslashes.
href="{$site_url}tests/tests/view/{$test.test_id}/{strtolower($test.test_name|replace:' ':'-'|regex_replace:'/(^\++|\++$|\+\++|[()]+)/':'')}">Test

How to capture text between two markers?

For clarity, I have created this:
http://rubular.com/r/ejYgKSufD4
My strings:
http://blablalba.com/foo/bar_soap/foo/dir2
http://blablalba.com/foo/bar_soap/dir
http://blablalba.com/foo/bar_soap
My Regular expression:
\/foo\/(.*)
This returns:
/foo/bar_soap/foo/dir2
/foo/bar_soap/dir
/foo/bar_soap
But I only want
/foo/bar_soap
Any ideas how I can achieve this? As illustrated above, I want everything after foo up until the first forward slash.
Thanks in advance.
Edit: I only want the text after foo up to the next forward slash. Some directories may also be named foo, and that would give incorrect results. Thanks.
. will match anything, so you should change it to [^/] (not slash) instead:
\/foo\/([^\/]*)
Some of the other answers use + instead of *. That might be correct depending on what you want to do. Using + forces the regex to match at least one non-slash character, so this URL would not match since there isn't a trailing character after the slash:
http://blablalba.com/foo/
Using * instead would allow that to match since it matches "zero or more" non-slash characters. So, whether you should use + or * depends on what matches you want to allow.
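A quick check of [^/]* versus [^/]+ on the question's URLs plus the bare /foo/ case, sketched with Python's re as a stand-in engine (first_segment is a hypothetical helper):

```python
import re

urls = [
    "http://blablalba.com/foo/bar_soap/foo/dir2",
    "http://blablalba.com/foo/bar_soap/dir",
    "http://blablalba.com/foo/bar_soap",
    "http://blablalba.com/foo/",
]

def first_segment(url, pattern):
    """Hypothetical helper: group 1 of the left-most match, or None."""
    m = re.search(pattern, url)
    return m.group(1) if m else None

for url in urls:
    star = first_segment(url, r"/foo/([^/]*)")   # zero or more non-slash characters
    plus = first_segment(url, r"/foo/([^/]+)")   # at least one non-slash character
    print(url, repr(star), repr(plus))
```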
Update
If you want to filter out query strings too, you could also exclude ?, which always starts a query string. (I think the examples you posted below are actually missing the leading ?):
\/foo\/([^?\/]*)
However, rather than rolling your own solution, it might be better to just use split from the URI module. You could use URI::split to get the path part of the URL, then use String#split to split it on /, and grab the segment after foo. This would handle all the weird cases for URLs. One that you probably haven't thought of yet is a URL with a fragment, e.g.:
http://blablalba.com/foo#bar
You would need to add # to your filtered-character class to handle those as well.
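The parse-first approach can be sketched in Python, where urllib.parse.urlsplit plays the role of Ruby's URI::split (this is Python's standard library, not the Ruby API the answer names). segment_after is a hypothetical helper; because the query and fragment are removed before the path is split, ? and # never leak into the result:

```python
from urllib.parse import urlsplit

def segment_after(url: str, marker: str = "foo"):
    """Hypothetical helper: the path segment right after the first `marker`, or None."""
    path = urlsplit(url).path            # query string and #fragment removed
    segments = path.split("/")           # "/foo/bar" -> ["", "foo", "bar"]
    if marker in segments:
        i = segments.index(marker)
        if i + 1 < len(segments) and segments[i + 1]:
            return segments[i + 1]
    return None

print(segment_after("http://blablalba.com/foo/bar_soap/foo/dir2"))
print(segment_after("http://blablalba.com/foo/bar_soap?x=1#top"))
print(segment_after("http://blablalba.com/foo#bar"))
```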
You can try this regular expression
/\/foo\/([^\/]+)/
\/foo\/([^\/]+)
[^\/]+ gives you a series of characters that are not a forward slash.
The parentheses cause the regex engine to store the matched contents in a group, ([^\/]+), so you can get bar_soap out of the entire match /foo/bar_soap.
For example, in JavaScript you would get the matched group as follows:
const regexp = /\/foo\/([^\/]+)/;
const match = regexp.exec("/foo/bar_soap/dir");
console.log(match[1]); // prints bar_soap

Parsing a URL for a specific param value

I'm looking to use a regular expression to extract a specific section of a URL, and to get nothing if the pattern isn't found.
An example URL is:
/te/file/value/jifle?uil=testing-cdas-feaw:jilk:&jklfe=https://value-value.jifels/temp.html/topic?id=e997aad4-92e0-j30e-a3c8-jfkaliejs5#c452fds-634d-f424fds-cdsa&bf_action=jildape
I want to get the value of the id parameter in it (e997aad4-92e0-j30e-a3c8-jfkaliejs5, the part between id= and the #).
Currently I'm using the regex "d=([^#]*)", but the problem is that I'm also running across URLs of this pattern:
/te/file/value/jifle?uil=testing-cdas-feaw:jilk:&jklfe=https://value-value.jifels/temp.html/topic?id=e997aad4-92e0-j30e-a3c8-jfkaliejs5&bf_action=jildape
and for these I get a match too (everything after id= to the end of the URL, since there is no # to stop it).
I would prefer no match for this URL, because it doesn't contain the #.
Regexes are not a magic tool that you should always use just because the problem involves a string. In this case, your language probably has a tool to break apart URLs for you. In PHP, this is parse_url(). In Perl, it's the URI::URL module.
You should almost always prefer an existing, well-tested solution to a common problem like this rather than writing your own.
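With a URL parser the question can be sketched in Python's urllib.parse. This assumes the id lives in the query of the URL carried by the jklfe parameter, and that the # belongs to the outer URL (which is how a parser reads it, since the inner URL is not percent-encoded). id_if_fragment is a hypothetical helper name:

```python
from urllib.parse import urlsplit, parse_qs

def id_if_fragment(url: str):
    """Hypothetical helper: the inner URL's id value, only if the outer URL has a #fragment."""
    outer = urlsplit(url)
    if not outer.fragment:                      # no '#': return nothing
        return None
    inner = parse_qs(outer.query).get("jklfe")  # the embedded URL, if present
    if not inner:
        return None
    ids = parse_qs(urlsplit(inner[0]).query).get("id")
    return ids[0] if ids else None

with_hash = ("/te/file/value/jifle?uil=testing-cdas-feaw:jilk:&jklfe=https://value-value.jifels/"
             "temp.html/topic?id=e997aad4-92e0-j30e-a3c8-jfkaliejs5#c452fds-634d-f424fds-cdsa&bf_action=jildape")
without_hash = ("/te/file/value/jifle?uil=testing-cdas-feaw:jilk:&jklfe=https://value-value.jifels/"
                "temp.html/topic?id=e997aad4-92e0-j30e-a3c8-jfkaliejs5&bf_action=jildape")

print(id_if_fragment(with_hash))
print(id_if_fragment(without_hash))
```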
So you want to match the value of the id parameter, but only if it has a trailing section containing a '#' symbol (without matching the '#' or what's after it)?
Not knowing the specifics of what style of regexes you're using, how about something like:
id=([^#&]*)#
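A quick check with Python's re as a stand-in (your regex flavour may differ in details) that id=([^#&]*)# picks up the id from the first URL and finds nothing in the second, because the excluded & stops the class before any # is reached:

```python
import re

first = ("/te/file/value/jifle?uil=testing-cdas-feaw:jilk:&jklfe=https://value-value.jifels/"
         "temp.html/topic?id=e997aad4-92e0-j30e-a3c8-jfkaliejs5#c452fds-634d-f424fds-cdsa&bf_action=jildape")
second = ("/te/file/value/jifle?uil=testing-cdas-feaw:jilk:&jklfe=https://value-value.jifels/"
          "temp.html/topic?id=e997aad4-92e0-j30e-a3c8-jfkaliejs5&bf_action=jildape")

pattern = re.compile(r"id=([^#&]*)#")

m1 = pattern.search(first)
m2 = pattern.search(second)
print(m1.group(1) if m1 else None)
print(m2.group(1) if m2 else None)
```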
regex = "id=([\\w-]+?)#"
This will grab everything in the character class [a-zA-Z_0-9-] between 'id=' and '#', assuming everything between 'id=' and '#' is in that character class (i.e. if an '&' is in there, the regex will fail).
id=
-Self-explanatory: this looks for the exact text 'id='.
([\\w-]+?)
-This defines a character class and groups it. \\w is \w escaped for a Java string literal; '\w' is a predefined character class in Java equal to [a-zA-Z_0-9]. I added '-' to this class because of the assumed pattern from your examples. The quantifier sits inside the group so that the group captures the whole run.
-+? is a reluctant quantifier: it matches as few characters as possible while still letting the rest of the regex match.
#
-The end of the regex, the last character we are looking for to match the pattern.
If you are looking to grab every character between 'id=' and the first '#' following it, the following will work and it uses the same logic as above, but replaces the character class [\\w-] with ., which matches anything.
regex = "id=(.+?)#"
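Where the quantifier sits relative to the group changes what group 1 holds. In this Python sketch (raw strings, so \w needs no doubling as it does in a Java string literal), ([\w-])+? repeats a one-character group and keeps only its last repetition, whereas ([\w-]+?) and (.+?) capture the whole id:

```python
import re

url = ("/te/file/value/jifle?uil=testing-cdas-feaw:jilk:&jklfe=https://value-value.jifels/"
       "temp.html/topic?id=e997aad4-92e0-j30e-a3c8-jfkaliejs5#c452fds-634d-f424fds-cdsa&bf_action=jildape")

repeated_group = re.search(r"id=([\w-])+?#", url).group(1)     # group repeated: last char only
quantified_inside = re.search(r"id=([\w-]+?)#", url).group(1)  # quantifier inside the group
dot_version = re.search(r"id=(.+?)#", url).group(1)            # any characters up to the first #

print(repeated_group)
print(quantified_inside)
print(dot_version)
```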

Trying to pick out a specific part of a string with regex

I've tried and tried again to find a regex for this pattern.
I have a string like this, taken from HTML source:
<!-- TAG=Something / Something else -->
And sometimes it's just:
<!-- TAG=Something -->
In both cases I want the regex to just match "Something", i.e. everything between TAG= and an optional /.
My first attempt was:
TAG=(.*)[/]?(.*) -->
But the first group matches everything between TAG= and --> no matter what. So what is the correct way to do this?
Try this:
TAG=([^/]*)(?:/(.*))?-->
Group 1 will contain "Something" (with its trailing space, so you may want to trim it).
Group 2 will contain "Something else" (again with surrounding spaces) or null.
Test it.
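Checked with Python's re as a stand-in engine, the pattern keeps the surrounding spaces in both groups, and group 2 is None when there is no slash:

```python
import re

pattern = re.compile(r"TAG=([^/]*)(?:/(.*))?-->")

for s in ["<!-- TAG=Something / Something else -->", "<!-- TAG=Something -->"]:
    m = pattern.search(s)
    print(repr(m.group(1)), repr(m.group(2)))
    # group 1 keeps the space before "/" or "-->"; strip() it if needed
    print(m.group(1).strip())
```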
<!--.*?=(.*?)(-->|/)
It matches everything you need: group 1 holds the text between the = and the first / or --> (including the space in front of it).
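Use a non-greedy quantifier (*? instead of *), and make sure what follows the group forces it to stop at the right place; otherwise the lazy group is free to match an empty string:
TAG=(.*?)\s*(?:[/].*)? -->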
Also, your usage of [/] seems unusual: you don't need a character class to write a single character. The most likely explanation for this syntax is that you are using / as the regular expression delimiter, so / is treated as a special character. In many (not all) regex dialects it is possible to solve this issue by using a different delimiter, such as #. This saves you from having to escape the slashes.
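A lazy group only helps when something after it forces it to grow. This Python check (Python's re as a stand-in engine) compares the pattern TAG=(.*?)[/]?.* --> with one where optional whitespace and an optional /... part sit between the group and -->:

```python
import re

strings = ["<!-- TAG=Something / Something else -->", "<!-- TAG=Something -->"]

loose = re.compile(r"TAG=(.*?)[/]?.* -->")           # .* can absorb everything, group stays empty
anchored = re.compile(r"TAG=(.*?)\s*(?:[/].*)? -->")  # group must reach the / or the -->

for s in strings:
    print(repr(loose.search(s).group(1)), repr(anchored.search(s).group(1)))
```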