parse URL params in Perl - regex

I am working on some tutorials to explain things like GET/POST requests and need to parse the URI manually. The following Perl code works, but I am trying to do two things:
list each key/value
be able to look up one specific value
What I do NOT care about is replacing the special chars to spaces or anything, the one value I need to get should be a number. In other languages I have used, the regular expression in question should group each key/value into one grouping with a part 1/part 2, does Perl do the same? If so, how do I put that into a map?
my @paramList = split /(?:\?|&|;)([^=]+)=([^&|;]+)/, $ENV{'REQUEST_URI'};
if (@paramList)
{
    print "<h1>The Params</h1><ul>";
    foreach my $i (@paramList) {
        if ($i) {
            print "<li>$i</li>";
        }
    }
    print "</ul>";
}
Per the request, here is a basic example of the input:
REQUEST_URI = /cgi-bin/printenv_html.pl?customer_name=fdas&phone_number=fdsa&email_address=fads%40fd.com&taxi=van&extras=tip&pickup_time=2020-01-14T20%3A45&pickup_place=&dropoff_place=Airport&comments=
The goal is the following, where the text left of the equals sign is the key and the text right of it is the value:
customer_name=fdas
phone_number=fdsa
email_address=fads%40fd.com
taxi=van
extras=tip
pickup_time=2020-01-14T20%3A45
pickup_place=
dropoff_place=Airport
comments=

How about feeding your list of key-value pairs into a hash?
my %paramList = $ENV{'REQUEST_URI'} =~ /(?:\?|&|;)([^=]+)=([^&|;]+)/g;
(no reason for the split as far as I can tell)
This relies crucially on there being an even-sized list of matches, where each "before-=" thing becomes a key in the hash, with the value being its pairing "after-=" thing.
In order to also get "pairs" without a value (like comments=), change the + in the last character class, [^&|;]+, to *.
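For readers who think in another language, here is a minimal sketch of the same idea in Python: a global match whose capture groups are fed straight into a map, with * in the value class so that empty values survive. The function name parse_params is my own, and the pattern is a direct port of the one above, not a full URL parser (values stay percent-encoded).

```python
import re

# Port of the Perl pattern above, with * instead of + so empty values are kept.
PAIR = re.compile(r"(?:\?|&|;)([^=]+)=([^&|;]*)")

def parse_params(request_uri):
    """Return a dict of raw (still percent-encoded) key/value pairs."""
    # findall yields (key, value) tuples, which dict() accepts directly.
    return dict(PAIR.findall(request_uri))

uri = ("/cgi-bin/printenv_html.pl?customer_name=fdas"
       "&pickup_time=2020-01-14T20%3A45&pickup_place=&comments=")
params = parse_params(uri)

# List every key/value pair ...
for key, value in params.items():
    print(f"{key}={value}")

# ... or look up one specific value.
print(params["customer_name"])
```

As in the Perl version, a repeated key simply overwrites the earlier value.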

Related

Extracting key-value pairs from a string using ruby & regex

I want to accomplish the following with ruby and if possible a regex:
Input: "something {\"key\":\"value\",\"key2\":3}"
Output: [["\"key\"", "\"value\""], ["\"key2\"", "3"]]
My attempt so far:
s = "something {key:\"value\",key2:3}"
s.scan(/.* {(?:([^:]+):([^,}]+),?)+}$/)
# Output: [["\"key2\"", "3"]]
For some reason the regex above only matches the last key value pair. Does someone know how to retrieve all the pairs?
Just to be clear, "something" can be any kind of string. For this reason, solutions such as (1) splitting the text directly on the equal or (2) a regex as used in s.scan(/(?:([^:]+):([^,}]+),?)/) don't work for me.
I know there are similar questions on SO. Still, from what I saw, they mostly tend towards the solutions 1 & 2 or focus on a single key value pair.
Your string looks like a JSON data structure encoded as a string. You can use JSON.parse for this, as long as you remove the word "something " from the string:
require 'json'
string = "something {\"key\":\"value\",\"key2\":3}"
# the following line drops everything before the first "{" (here, the word something)
string = string[string.index("{")..-1]
x = JSON.parse(string)
puts x["key"]
puts x["key2"]
You can then convert that to an array if required.
Alternatively, if you want to use regular expressions, try:
string.scan(/(?:"(\w+)":"?(\w+)"?)/)
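The JSON route can be sketched in Python as well. This is only an illustration under two assumptions of mine: the first "{" in the text opens the object, and nothing follows the closing "}". The helper name extract_pairs is hypothetical.

```python
import json

def extract_pairs(text):
    """Parse the {...} part of text as JSON and return [key, value] pairs."""
    start = text.index("{")          # assumes the first "{" opens the object
    data = json.loads(text[start:])  # assumes nothing follows the closing "}"
    return [[key, value] for key, value in data.items()]

pairs = extract_pairs('something {"key":"value","key2":3}')
print(pairs)
```

Unlike the regex, this keeps the value types, so 3 comes back as a number rather than a string.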

regex Match a capture group's items only once

So I'm trying to split a string in several options, but those options are allowed to occur only once. I've figured out how to make it match all options, but when an option occurs twice or more it matches every single option.
Example string: --split1 testsplit 1 --split2 test split 2 --split3 t e s t split 3 --split1 split1 again
Regex: /-{1,2}(split1|split2|split3) [\w|\s]+/g
Right now it is matching all cases and I want it to match --split1, --split2 and --split3 only once (so --split1 split1 again will not be matched).
I'm probably missing something really straightforward, but anyone care to help out? :)
Edit:
Decided to handle the extra occurrences showing up in a script and not through RegEx, easier error handling. Thanks for the help!
EDIT: Somehow I ended up here from the PHP section, hence the PHP code. The same principles apply to any other language, however.
I realise that OP has said they have found a solution, but I am putting this here for future visitors.
function splitter(string $str, int $splits, $split = "--split")
{
$a = array();
for ($i = $splits; $i > 0; $i--) {
if (strpos($str, "$split{$i} ") !== false) {
$a[] = substr($str, strpos($str, "$split{$i} ") + strlen("$split{$i} "));
$str = substr($str, 0, strpos($str, "$split{$i} "));
}
}
return array_reverse($a);
}
This function will take the string to be split, as well as how many segments there will be. Use it like so:
$array = splitter($str, 3);
It splits the string around the $split delimiter.
The parameters are used as follows:
$str
The string that you want to split. In your instance it is: --split1 testsplit 1 --split2 test split 2 --split3 t e s t split 3 --split1 split1 again.
$splits
This is how many elements of the array you wish to create. In your instance, there are 3 distinct splits.
If a split is not found, then it will be skipped. For instance, if you were to have --split1 and --split3 but no --split2 then the array will only be split twice.
$split
This is the string that will be the delimiter. Note that it must follow the format in the question: if you want to split using --myNewSplit, the function appends a number from 1 to $splits to that string.
All elements end with a space since the function looks for $split and you have a space before each split. If you don't want to have the trailing whitespace then you can change the code to this:
$a[] = trim(substr($str, strpos($str, "$split{$i} ") + strlen("$split{$i} ")));
Also, notice that strpos looks for a space after the delimiter. Again, if you don't want the space then remove it from the string.
The reason I have used a function is that it will make it flexible for you in the future if you decide that you want to have four splits or change the delimiter.
Obviously, if you no longer want a numerically changing delimiter then the explode function exists for this purpose.
-{1,2}((split1)|(split2)|(split3)) [\w|\s]+
Something like this? This will, in this case, create 3 arrays which all will have an array of elements of the same name in them. Hope this helps
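The approach the asker settled on (match everything, then discard repeats in code) might look like the following Python sketch. The name first_occurrences is hypothetical, and I dropped the | from the character class because inside brackets it is a literal pipe rather than an alternation.

```python
import re

# One capture for the option name, one for the text that follows it.
OPTION = re.compile(r"-{1,2}(split1|split2|split3) ([\w\s]+)")

def first_occurrences(text):
    """Map each option to the value of its first occurrence only."""
    seen = {}
    for name, value in OPTION.findall(text):
        # setdefault ignores later repeats of an option already seen.
        seen.setdefault(name, value.strip())
    return seen

line = ("--split1 testsplit 1 --split2 test split 2 "
        "--split3 t e s t split 3 --split1 split1 again")
options = first_occurrences(line)
print(options)
```

Handling duplicates outside the regex also makes it easy to report them as errors instead of silently dropping them.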

How to replace parts of a string in lua "in a single pass"?

I have the following string of anchors (where I want to change the contents of the href) and a Lua table of replacements, which says what each word should be replaced with:
s1 = '<a href="word7">'
replacementTable = {}
replacementTable["word1"] = "potato1"
replacementTable["word2"] = "potato2"
replacementTable["word3"] = "potato3"
replacementTable["word4"] = "potato4"
replacementTable["word5"] = "potato5"
The expected result should be:
<a href="word7">
I know I could do this by iterating over each element in the replacementTable and processing the string each time, but my gut feeling tells me that if the string is very big and/or the replacement table becomes big, this approach is going to perform poorly.
So I thought it would be best to do the following: apply the regular expression to find all the matches, get an iterator over the matches, and replace each match with its value in the replacementTable.
Something like this would be great (writing it in Javascript because I don't know yet how to write lambdas in Lua):
var newString = patternReplacement(s1, '<a[^>]* href="([^"]*)"', function(match) { return replacementTable[match] })
Where the first parameter is the string, the second one the regular expression and the third one a function that is executed for each match to get the replacement. This way I think s1 gets parsed once, being more efficient.
Is there any way to do this in Lua?
In your example, this simple code works:
print((s1:gsub("%w+",replacementTable)))
The point is that gsub already accepts a table of replacements.
In the end, the solution that worked for me was the following one:
local updatedBody = string.gsub(body, '(<a[^>]* href=")(/[^"%?]*)([^"]*")', function(leftSide, url, rightSide)
local replacedUrl = url
if (urlsToReplace[url]) then replacedUrl = urlsToReplace[url] end
return leftSide .. replacedUrl .. rightSide
end)
It kept out any querystring parameter giving me just the URI. I know it's a bad idea to parse HTML bodies with regular expressions but for my case, where I required a lot of performance, this was performing a lot faster and just did the job.
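For comparison, the same single-pass, callback-based replacement can be sketched in Python, where re.sub also accepts a function. The names rewrite_links and urls_to_replace mirror the Lua above but are otherwise my own, and the sample body is made up for illustration.

```python
import re

# Same three groups as the Lua pattern: prefix, path without query, rest.
HREF = re.compile(r'(<a[^>]* href=")(/[^"?]*)([^"]*")')

def rewrite_links(body, urls_to_replace):
    """Replace known href paths in one pass, leaving query strings intact."""
    def swap(match):
        left, url, right = match.groups()
        # Fall back to the original path when no replacement is listed.
        return left + urls_to_replace.get(url, url) + right
    return HREF.sub(swap, body)

body = '<a class="x" href="/old?page=2">link</a> <a href="/keep">k</a>'
updated = rewrite_links(body, {"/old": "/new"})
print(updated)
```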

Split string and get last element

Let's say I have a column which has values like:
foo/bar
chunky/bacon/flavor
/baz/quz/qux/bax
I.e. a variable number of strings separated by /.
In another column I want to get the last element from each of these strings, after they have been split on /. So, that column would have:
bar
flavor
bax
I can't figure this out. I can split on / and get an array, and I can see the function INDEX to get a specific numbered indexed element from the array, but can't find a way to say "the last element" in this function.
Edit:
This one is simpler:
=REGEXEXTRACT(A1,"[^/]+$")
You could use this formula:
=REGEXEXTRACT(A1,"(?:.*/)(.*)$")
It can also be used as an ArrayFormula:
=ARRAYFORMULA(REGEXEXTRACT(A1:A3,"(?:.*/)(.*)$"))
Here's some more info:
the RegExExtract function
Some good examples of syntax
my personal list of Regex Tricks
This formula will do the same:
=INDEX(SPLIT(A1,"/"),LEN(A1)-len(SUBSTITUTE(A1,"/","")))
But it references A1 three times, which is not preferable.
You could do this too
=index(SPLIT(A1, "/"), COLUMNS(SPLIT(A1, "/"))-1)
Also possible, perhaps best on a copy, with Find:
.+/
(Replace with blank) and Search using regular expressions ticked.
You can try this!
You've got an array of Strings, so you can access the last element via its length:
String message = "chunky/bacon/flavor";
String[] outSplited = message.split("/");
System.out.println(outSplited[outSplited.length -1]);
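Outside a spreadsheet, the same "last element after splitting on /" idea is a one-liner in most languages. A small Python sketch (the helper name last_segment is mine) that splits only once, from the right:

```python
def last_segment(path):
    """Return the text after the final "/" (or the whole string if none)."""
    # rsplit with maxsplit=1 cuts only at the last "/".
    return path.rsplit("/", 1)[-1]

for value in ["foo/bar", "chunky/bacon/flavor", "/baz/quz/qux/bax"]:
    print(last_segment(value))
```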

How to parse GET tokens from URL with regular expression

Given a URL with GET arguments such as
http://www.domain.com?key1=value1+value2+value3&key2=value4+value5
I wish to capture all the values for a given key (into separate references if possible). For example, if the desired key was key1, I would want to capture value1 in \1 (or $1 depending on language), value2 in \2, and value3 in \3.
My flawed regex is:
/[?&](?:key1)=((?:[^+&]+[+&$])+)/
which yields 0 results.
I am writing this in c++ using ECMA syntax, but I think I could convert a solution or advice from any regex flavor to ECMA. Any help would be appreciated.
This has been answered before and there are compact scripts written for it.
Regular expressions are not optimal for extracting query string values. At the end of this answer, I will give you an expression which can extract the value(s) for a given field into separate references. But note that it takes a "lot" of time to extract the parameters one at a time using regular expressions, whereas they can all be extracted very quickly with no regular expression engine at all. For instance, http://www.htmlgoodies.com/beyond/javascript/article.php/3755006/How-to-Use-a-JavaScript-Query-String-Parser.htm
What language are you trying to use to extract these parameters, C++?
If you are using JavaScript, you can use the small functions mentioned in the article above, i.e.,
function ptq(q)
{
/* parse the query */
var x = q.replace(/;/g, '&').split('&'), i, name, t;
/* q changes from string version of query to object */
for (q={}, i=0; i<x.length; i++)
{
t = x[i].split('=', 2);
name = unescape(t[0]);
if (!q[name])
q[name] = [];
if (t.length > 1)
{
q[name][q[name].length] = unescape(t[1]);
}
/* next two lines are nonstandard, allowing programmer-friendly Boolean parameters */
else
q[name][q[name].length] = true;
}
return q;
}
function param() {
return ptq(location.search.substring(1).replace(/\+/g, ' '));
}
Once you have that code included in your page's scripts, you can parse the current URL's data by doing query = param(); and then using the value of query.key1, etc.
You can parse other query-string formatted data by using the ptq() function directly, i.e., query_object = ptq(query_string).
If you are using another language and regular expressions are the way you want to do it, then this would return all values matching key1, for instance:
/key1=([^&;]*)/g
That will return all the values with a certain field name (which in the query string definition, are written like this, key1=value1&key1=value2&key1=value3, etc.).
The way you ask your question makes it sound like you want to create your own programmer-friendly way of supplying values (i.e., by constructing your own custom URLs rather than receiving data from form submissions through browsers) in which your values are separated by spaces (spaces are encoded as + signs in an HTTP GET query string, and as %20 in generic query strings).
You could make a complicated regular expression to do this in one step, but it is faster to match the entire field (all the values and the + signs as well), and then split the result at the + signs.
For each of the results from the regular expression I indicate, you can extract the plus-sign separated values by simply doing /[^+]*/g
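The two-step approach (match the whole field, then split it at the + signs) can be sketched as follows. This is a Python illustration rather than C++ with ECMA syntax; the function name values_for is hypothetical, and it only looks at the first occurrence of the key.

```python
import re

def values_for(key, url):
    """Return the '+'-separated values of the first occurrence of key."""
    # Step 1: capture the entire field value, plus signs included.
    field = re.search(r"[?&;]" + re.escape(key) + r"=([^&;]*)", url)
    if field is None:
        return []
    # Step 2: split the captured text at the + signs.
    return field.group(1).split("+")

url = "http://www.domain.com?key1=value1+value2+value3&key2=value4+value5"
print(values_for("key1", url))
print(values_for("key2", url))
```

Anchoring the key on ?, & or ; avoids matching a longer name that merely ends in key1.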