lua gsub special replacement producing invalid capture index

lua gsub special replacement producing invalid capture index - regex

I have a piece of lua code (executing in Corona):
local loginstr = "emailAddress={email} password={password}"
print(loginstr:gsub( "{email}", "tester#test.com" ))
This code generates the error:
invalid capture index
While I now know it is because of the curly braces not being specified appropriately in the gsub pattern, I don't know how to fix it.
How should I form the gsub pattern so that I can replace the placeholder string with the email address value?
I've looked around on all the lua-oriented sites I can find but most of the documentation seems to revolve around unassociated situations.

As I've suggested in the comments above, when the e-mail is encoded as a URL parameter, the %40 used to encode the '#' character will be used as a capture index. Since the search pattern doesn't have any captures (let alone 40 of them), this will cause a problem.
There are two possible solutions: you can either decode the encoded string, or encode your replacement string to escape the '%' character in it. Depending on what you are going to do with the end result, you may need to do both.
the following routine (I picked up from here - not tested) can decode an encoded string:
function url_decode(str)
str = string.gsub (str, "+", " ")
str = string.gsub (str, "%%(%x%x)",
function(h) return string.char(tonumber(h,16)) end)
str = string.gsub (str, "\r\n", "\n")
return str
end
For escaping the % character in string str, you can use:
str:gsub("%%", "%%%%")
The '%' character is escaped as '%%', and it needs to be ascaped on both the search pattern and the replace pattern (hence the amount of % characters in the replace).

Are you sure your problem isn't that you're trying to gsub on loginurl rather than loginstr?
Your code gives me this error (see http://ideone.com/wwiZk):
lua: prog.lua:2: attempt to index global 'loginurl' (a nil value)
and that sounds similar to what you're seeing. Just fixing it to use the right variable:
print(loginstr:gsub( "{email}", "tester#test.com" ))
says (see http://ideone.com/mMj0N):
emailAddress=tester#test.com password={password}
as desired.

I had this in value part so You need to escape value with: value:gsub("%%", "%%%%").
Example of replacing "some value" in json:
local resultJSON = json:gsub(, "\"SOME_VALUE\"", value:gsub("%%", "%%%%"))

Related

Extracting key-value pairs from a string using ruby & regex

I want to accomplish the following with ruby and if possible a regex:
Input: "something {\"key\":\"value\",\"key2\":3}"
Output: [["\"key\"", "\"value\""], [["\"key2\"", "3"]]
My attempt so far:
s = "something {key:\"value\",key2:3}"
s.scan(/.* {(?:([^:]+):([^,}]+),?)+}$/)
# Output: [["\"key2\"", "3"]]
For some reason the regex above only matches the last key value pair. Does someone know how to retrieve all the pairs?
Just to be clear, "something" can be any kind of string. For this reason, solutions such as (1) splitting the text directly on the equal or (2) a regex as used in s.scan(/(?:([^:]+):([^,}]+),?)/) don't work for me.
I know there are similar questions on SO. Still, from what I saw, they mostly tend towards the solutions 1 & 2 or focus on a single key value pair.

your string looks like a json data structure encoded as a string, you can use JSON.parse for this as long as you remove the word "something " from the string
require 'json'
string = "something {\"key\":\"value\",\"key2\":3}"
# the following line removes the word something
string = string[string.index("{")..-1]
x = JSON.parse(string)
puts x["key"]
puts x["key2"]
you can then convert that to an array if required
alternatively if you want to use regular expressions try
string.scan(/(?:"(\w+)":"?(\w+)"?)/)

Need to remove specific parameter from url

I am new to the Lua library, I have one use case which I have to remove on a specific parameter and its value:
for example:
String 1 : ?xyz=true&toekn=4234dadsasda
String 2 : ?toekn=4234dadsasda&test=pass
Need output like this after removing token and its value
String 1 : ?xyz=true
String 2 : ?test=pass
I have tried the below Lua gsub function but no luck:
string.gsub(args, "token=.*", " ")
any help apricated, thanks

You may want to considers additional conditions (using & and ; as separators[1]) and corner cases (trailing separators and substrings with token):
text:gsub("([&;]?)%f[%a]token=[^&;]+([&;]?)",
function(s1, s2) return s1 and s2 and #(s1..s2) > 1 and s1 or "" end)
This solution works correctly on query strings that include parameters like subtoken and that use ; as separators. The template is using %f[%a], which is a frontier pattern that describes a zero-length boundary where non-letter changes to a letter (this includes the first character in a string).
[1] W3C recommends that all web servers support semicolon separators in addition to ampersand separators to allow application/x-www-form-urlencoded query strings in URLs within HTML documents without having to entity escape ampersands (wikipedia article on query string).

If you can only have two query params and no more than two as shown in your input you can use
text:gsub("&?token=[^&]+&?", "")
Or, if you have multiple query params, you can use
text:gsub("([&?])token=[^&]+&?", "%1"):gsub("(.*)&$", "%1")
See the online Lua demo #1 and the online Lua demo #2.
Details:
&? - an optional &
token= - a literal string
[^&]+ - one or more chars other than &
&? - an optional & char.
In the second solution, :gsub("([&?])token=[^&]+&?", "%1") replaces the match with either ? or & before the token, and the next gsub("(.*)&$", "%1") removes the & at the end of string in case the param occurs at the end of string.

Find group of strings starting and ending by a character using regular expression

I have a string, and I want to extract, using regular expressions, groups of characters that are between the character : and the other character /.
typically, here is a string example I'm getting:
'abcd:45.72643,4.91203/Rou:hereanotherdata/defgh'
and so, I want to retrieved, 45.72643,4.91203 and also hereanotherdata
As they are both between characters : and /.
I tried with this syntax in a easier string where there is only 1 time the pattern,
[tt]=regexp(str,':(\w.*)/','match')
tt = ':45.72643,4.91203/'
but it works only if the pattern happens once. If I use it in string containing multiples times the pattern, I get all the string between the first : and the last /.
How can I mention that the pattern will occur multiple time, and how can I retrieve it?

Use lookaround and a lazy quantifier:
regexp(str, '(?<=:).+?(?=/)', 'match')
Example (Matlab R2016b):
>> str = 'abcd:45.72643,4.91203/Rou:hereanotherdata/defgh';
>> result = regexp(str, '(?<=:).+?(?=/)', 'match')
result =
1×2 cell array
'45.72643,4.91203' 'hereanotherdata'

In most languages this is hard to do with a single regexp. Ultimately you'll only ever get back the one string, and you want to get back multiple strings.
I've never used Matlab, so it may be possible in that language, but based on other languages, this is how I'd approach it...
I can't give you the exact code, but a search indicates that in Matlab there is a function called strsplit, example...
C = strsplit(data,':')
That should will break your original string up into an array of strings, using the ":" as the break point. You can then ignore the first array index (as it contains text before a ":"), loop the rest of the array and regexp to extract everything that comes before a "/".
So for instance...
'abcd:45.72643,4.91203/Rou:hereanotherdata/defgh'
Breaks down into an array with parts...
1 - 'abcd'
2 - '45.72643,4.91203/Rou'
3 - 'hereanotherdata/defgh'
Then Ignore 1, and extract everything before the "/" in 2 and 3.

As John Mawer and Adriaan mentioned, strsplit is a good place to start with. You can use it for both ':' and '/', but then you will not be able to determine where each of them started. If you do it with strsplit twice, you can know where the ':' starts :
A='abcd:45.72643,4.91203/Rou:hereanotherdata/defgh';
B=cellfun(#(x) strsplit(x,'/'),strsplit(A,':'),'uniformoutput',0);
Now B has cells that start with ':', and has two cells in each cell that contain '/' also. You can extract it with checking where B has more than one cell, and take the first of each of them:
C=cellfun(#(x) x{1},B(cellfun('length',B)>1),'uniformoutput',0)
C =
1×2 cell array
'45.72643,4.91203' 'hereanotherdata'

Starting in 16b you can use extractBetween:
>> str = 'abcd:45.72643,4.91203/Rou:hereanotherdata/defgh';
>> result = extractBetween(str,':','/')
result =
2×1 cell array
{'45.72643,4.91203'}
{'hereanotherdata' }
If all your text elements have the same number of delimiters this can be vectorized too.

How can I parse a char array with octal values in Python?

EDIT: I should note that I want a general case for any hex array, not just the google one I provided.
EDIT BACKGROUND: Background is networking: I'm parsing a DNS packet and trying to get its QNAME. I'm taking in the whole packet as a string, and every character represents a byte. Apparently this problem looks like a Pascal string problem, and using the struct module seems like the way to go.
I have a char array in Python 2.7 which includes octal values. For example, let's say I have an array
DNS = "\03www\06google\03com\0"
I want to get:
www.google.com
What's an efficient way to do this? My first thought would be iterating through the DNS char array and adding chars to my new array answer. Every time i see a '\' char, I would ignore the '\' and two chars after it. Is there a way to get the resulting www.google.com without using a new array?
my disgusting implementation (my answer is an array of chars, which is not what i want, i want just the string www.google.com:
DNS = "\\03www\\06google\\03com\\0"
answer = []
i = 0
while i < len(DNS):
if DNS[i] == '\\' and DNS[i+1] != 0:
i += 3
elif DNS[i] == '\\' and DNS[i+1] == 0:
break
else:
answer.append(DNS[i])
i += 1

Now that you've explained your real problem, none of the answers you've gotten so far will work. Why? Because they're all ways to remove sequences like \03 from a string. But you don't have sequences like \03, you have single control characters.
You could, of course, do something similar, just replacing any control character with a dot.
But what you're really trying to do is not replace control characters with dots, but parse DNS packets.
DNS is defined by RFC 1035. The QNAME in a DNS packet is:
a domain name represented as a sequence of labels, where each label consists of a length octet followed by that number of octets. The domain name terminates with the zero length octet for the null label of the root. Note that this field may be an odd number of octets; no padding is used.
So, let's parse that. If you understand how "labels consisting of "a length octet followed by that number of octets" relates to "Pascal strings", there's a quicker way. Also, you could write this more cleanly and less verbosely as a generator. But let's do it the dead-simple way:
def parse_qname(packet):
components = []
offset = 0
while True:
length, = struct.unpack_from('B', packet, offset)
offset += 1
if not length:
break
component = struct.unpack_from('{}s'.format(length), packet, offset)
offset += length
components.append(component)
return components, offset

import re
DNS = "\\03www\\06google\\03com\\0"
m = re.sub("\\\\([0-9,a-f]){2}", "", DNS)
print(m)

Maybe something like this?
#!/usr/bin/python3
import re
def convert(adorned_hostname):
result1 = re.sub(r'^\\03', '', adorned_hostname )
result2 = re.sub(r'\\0[36]', '.', result1)
result3 = re.sub(r'\\0$', '', result2)
return result3
def main():
adorned_hostname = r"\03www\06google\03com\0"
expected_result = 'www.google.com'
actual_result = convert(adorned_hostname)
print(actual_result, expected_result)
assert actual_result == expected_result
main()

For the question as originally asked, replacing the backslash-hex sequences in strings like "\\03www\\06google\\03com\\0" with dots…
If you want to do this with a regular expression:
\\ matches a backslash.
[0-9A-Fa-f] matches any hex digit.
[0-9A-Fa-f]+ matches one or more hex digits.
\\[0-9A-Fa-f]+ matches a backslash followed by one or more hex digits.
You want to find each such sequence, and replace it with a dot, right? If you look through the re docs, you'll find a function called sub which is used for replacing a pattern with a replacement string:
re.sub(r'\\[0-9A-Fa-f]+', '.', DNS)
I suspect these may actually be octal, not hex, in which case you want [0-7] rather than [0-9A-Fa-f], but nothing else would change.
A different way to do this is to recognize that these are valid Python escape sequences. And, if we unescape them back to where they came from (e.g., with DNS.decode('string_escape')), this turns into a sequence of length-prefixed (aka "Pascal") strings, a standard format that you can parse in any number of ways, including the stdlib struct module. This has the advantage of validating the data as you read it, and not being thrown off by any false positives that could show up if one of the string components, say, had a backslash in the middle of it.
Of course that's presuming more about the data. It seems likely that the real meaning of this is "a sequence of length-prefixed strings, concatenated, then backslash-escaped", in which case you should parse it as such. But it could be just a coincidence that it looks like that, in which case it would be a very bad idea to parse it as such.

Regex - If contains '%', can only contain '%20'

I am wanting to create a regular expression for the following scenario:
If a string contains the percentage character (%) then it can only contain the following: %20, and cannot be preceded by another '%'.
So if there was for instance, %25 it would be rejected. For instance, the following string would be valid:
http://www.test.com/?&Name=My%20Name%20Is%20Vader
But these would fail:
http://www.test.com/?&Name=My%20Name%20Is%20VadersAccountant%25
%%%25
Any help would be greatly appreciated,
Kyle
EDIT:
The scenario in a nutshell is that a link is written to an encoded state and then launched via JavaScript. No decoding works. I tried .net decoding and JS decoding, each having the same result - The results stay encoded when executed.

Doesn't require a %:
/^[^%]*(%20[^%]*)*$/

Which language are you using?
Most languages have a Uri Encoder / Decoder function or class.
I would suggest you decode the string first and than check for valid (or invalid) characters.
i.e. something like /[\w ]/ (empty is a space)
With a regex in the first place you need to respect that www.example.com/index.html?user=admin&pass=%%250 means that the pass really is "%250".

Another solution if look-arounds are not available:
^([^%]|%([013-9a-fA-F][0-9a-fA-F]|2[1-9a-fA-F]))*$

Reject the string if it matches %[^2][^0]

I think that would find what you need
/^([^%]|%%|%20)+$/
Edit: Added case where %% is valid string inside URI
Edit2: And fixed it for case where it should fail :-)
Edit3:
In case you need to use it in editor (which would explain why you can't use more programmatic way), then you have to correctly escape all special characters, for example in Vim that regex should lool:
/^\([^%]\|%%\|%20\)\+$/

Maybe a better approach is to deal with that validation after you decode that string:
string name = HttpUtility.UrlDecode(Request.QueryString["Name"]);

/^([^%]|%20)*$/

This requires a test against the "bad" patterns. If we're allowing %20 - we don't need to make sure it exists.
As others have said before, %% is valid too... and %%25would be %25
The below regex matches anything that doesn't fit into the above rules
/(?<![^%]%)%(?!(20|%))/
The first brackets check whether there is a % before the character (meaning that it's %%) and also checks that it's not %%%. it then checks for a %, and checks whether the item after doesn't match 20
This means that if anything is identified by the regex, then you should probably reject it.

I agree with dominic's comment on the question. Don't use Regex.
If you want to avoid scanning the string twice, you can just iteratively search for % and then check that it is being followed by 20 and nothing else. (Update: allow a % after to be interpreted as a literal %nnn sequence)
// pseudo code
pos = 0
while (pos = mystring.find(pos, '%'))
{
if mystring[pos+1] = "%" then
pos = pos + 2 // ok, this is a literal, skip ahead
else if mystring.substring(pos,2) != "20"
return false; // string is invalid
end if
}
return true;

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

lua gsub special replacement producing invalid capture index - regex

I had this in value part so You need to escape value with: value:gsub("%%", "%%%%"). Example of replacing "some value" in json: local resultJSON = json:gsub(, "\"SOME_VALUE\"", value:gsub("%%", "%%%%"))

Related

Extracting key-value pairs from a string using ruby & regex

Need to remove specific parameter from url

Find group of strings starting and ending by a character using regular expression

How can I parse a char array with octal values in Python?

Regex - If contains '%', can only contain '%20'

Categories

Resources