usage of extended true in body parser? - body-parser

Please explain the difference between extended: true and extended: false in the urlencoded body parser. Some tutorials use extended: true while others use extended: false, and I'm confused about what these two boolean options actually do.

The extended option lets you choose between parsing the URL-encoded data with the querystring library (when false) or the qs library (when true). The "extended" syntax allows rich objects and arrays to be encoded into the URL-encoded format, allowing for a JSON-like experience with URL-encoded data.
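A rough sketch of the practical difference (the body string and key names are made up for illustration; qs is the same package body-parser uses internally):

const querystring = require('querystring'); // what extended: false uses
const qs = require('qs');                   // what extended: true uses

const body = 'user[name]=Alice&user[age]=30&tags[]=a&tags[]=b';

console.log(querystring.parse(body));
// { 'user[name]': 'Alice', 'user[age]': '30', 'tags[]': [ 'a', 'b' ] } - keys stay flat strings

console.log(qs.parse(body));
// { user: { name: 'Alice', age: '30' }, tags: [ 'a', 'b' ] } - bracket syntax becomes nested objects and arrays

In an Express app this is the difference between app.use(express.urlencoded({ extended: false })) and app.use(express.urlencoded({ extended: true })), or the equivalent bodyParser.urlencoded calls.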

Related

Django | generate random URLs based on url or path entries from the URLconf

I have a question regarding generating example URLs according to Django url and path entries.
For example if I have 2 urls like:
path('example/<int:example_id>/', views.some_view, name='example')
url(r'^example/(?P<example_id>[0-9]+)/$', views.some_view, name='example')
Is it possible to somehow generate, by Django's built-in means, example full URLs for tests like:
example/234/
example/93/
example/228/
etc…
randomly.
For different URLs, based on a concrete name plus regex or converter parameters?
In other words, where in the source code does Django understand that <int:example_id> is an integer, and so on? If I had something like: 1) I expect example/ 2) I expect an int - I would be able to use it to generate a random example URL.
Hopefully this is clear...
These urls are just an example. Real urls will differ.
Thank you.
In other words, where in the source code does Django understand that <int:example_id> is an integer, and so on?
This is in essence looking for <…:…> patterns in the path. Furthermore, Django has a set of path converters. The standard ones are int, str, slug, uuid, and path, but you can define your own path converters. Indeed, the Django documentation has a section on registering custom path converters that explains how you can design your own path converters.
What a path basically does is construct a regex. It thus searches for patterns like <int:example_id>. Since you used int here, the IntConverter [GitHub] is used:
class IntConverter:
    regex = '[0-9]+'

    def to_python(self, value):
        return int(value)

    def to_url(self, value):
        return str(value)
Here it will thus replace <int:example_id> with (?P<example_id>[0-9]+), where the converter's regex attribute is used for the group's pattern.
A path(…) does not only construct a regex. It also automatically maps the captured items to Python objects by calling .to_python(…). This means that if you work with re_path(r'^example/(?P<example_id>[0-9]+)/$', …), then example_id will be a string, but for path('example/<int:example_id>/', …), Django will automatically call int(…) on the captured string and thus pass an int object. Often that does not make much difference.
This also holds when you serialize an object: you can pass a value, and Django will call .to_url(…) on it to convert it to a string representation for the generated URL. This means that you can write custom path converters that perform more sophisticated conversions.
If I had something like: 1) I expect example/ 2) I expect an int - I would be able to use it to generate a random example URL.
Well, eventually it boils down to a regular expression. A regular expression can be converted into a finite state machine, so we can generate random valid strings. For example, exrex [GitHub] is capable of generating all valid strings (often as a generator that keeps proposing new strings), or random strings that match the regular expression. However, the fact that a string matches the path(…) does not mean it will make sense to the view. If you, for example, use this int as the primary key of a Post object the view is fetching, then random URLs will likely point to non-existing Posts.
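A minimal sketch of both directions, assuming the route name 'example' from the question and a configured URLconf (exrex is a third-party package, pip install exrex):

import random
import exrex
from django.urls import reverse

# Built-in direction: reverse() the named route with a random converter value;
# IntConverter.to_url() turns the int back into a string.
print(reverse('example', kwargs={'example_id': random.randint(1, 1000)}))
# e.g. /example/234/

# Regex direction: generate a random string matching the compiled pattern.
print(exrex.getone(r'example/(?P<example_id>[0-9]+)/'))
# e.g. example/93/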

Regex Error - (incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string))

I'm trying to do something which seems like it should be very simple. I'm trying to see if a specific string e.g. 'out of stock' is found within a page's source code. However, I don't care if the string is contained within an html comment or javascript. So prior to doing my search, I'd like to remove both of these elements using regular expressions. This is the code I'm using.
urls.each do |url|
  response = HTTP.get(url)
  if response.status.success?
    source_code = response.to_s
    # Remove comments
    source_code = source_code.gsub(/<!--(.*?)-->/su, '')
    # Remove scripts
    source_code = source_code.gsub(/<script(.*?)<\/script>/msu, '')
    if source_code.match(/out of stock/i)
      # Flag URL for further processing
    end
  end
end
This works for 99% of all the urls I tried it with, but certain urls have become problematic. When I try to use these regular expressions on the source code returned for the url "https://www.sunski.com" I get the following error message:
Encoding::CompatibilityError (incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string))
The page is definitely UTF-8 encoded, so I don't really understand the error message. A few people on Stack Overflow recommended adding the # encoding: UTF-8 magic comment at the top of the file, but this didn't work.
If anyone could help with this it would be hugely appreciated. Thank you!
The Net::HTTP standard library only returns binary (ASCII-8BIT) strings. See the long-standing feature request: Feature #2567: Net::HTTP does not handle encoding correctly. So if you want UTF-8 strings you have to manually set their encoding to UTF-8 with String#force_encoding:
source_code.force_encoding(Encoding::UTF_8)
If the website's character encoding isn't UTF-8, you have to implement a heuristic based on the Content-Type header or <meta>'s charset attribute, but even then it might not be the correct encoding. You can validate a string's encoding with String#valid_encoding? if you need to deal with such cases. Thankfully most websites use UTF-8 nowadays.
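Adapted to the asker's loop, a minimal sketch of the fix might look like this (using String#scrub as a fallback for invalid bytes is my assumption, not part of the original answer):

source_code = response.to_s
# The response body comes back tagged as ASCII-8BIT (binary), so retag the
# bytes as UTF-8 before matching them against UTF-8 regexps.
source_code.force_encoding(Encoding::UTF_8)
# Assumption: if the bytes are not actually valid UTF-8, scrub the offending
# sequences so later regexp matches cannot raise.
source_code = source_code.scrub unless source_code.valid_encoding?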
Also, as @WiktorStribiżew already wrote in the comments, the regexp encoding modifiers s (Windows-31J) and u (UTF-8) aren't necessary here and only very rarely are, especially the latter, since modern Ruby defaults to UTF-8 (or, if sufficient, its subset US-ASCII) anyway. In other programming languages these modifiers may have a different meaning, e.g. in Perl s means single-line.

Query string degenerate cases

I have been looking around for a correct regular expression for validating URI query strings. I found some answers here or here, but I still have doubts about the edge cases, where the key or the value could be empty. For example, should the following be treated as valid query strings?
?&&
?=
?a=
?a=&
?=a
?&=a
I am looking [...] for a correct regular expression for [valid] URI query strings.
Sure thing, no prob. As per RFC 3986, appendix B, here it is:
^([^#]*)$
If you want something more elaborate, you can check section 3.4 for the allowed characters in addition to percent-encoded entities. The regex would look something like this:
^(%[[:xdigit:]]{2}|[[:print:]])*$
As far as RFC 3986 is concerned, all your examples are valid so far. The RFC is telling us how the query string has to be encoded while saying little about how the query string has to be structured. Older RFCs are continuously shifting authority over the structure of query strings between CGI and HTTP without ever formally specifying a grammar (see e.g. RFC 3875, sec. 4.1.7, RFC 2396, sec. 3.4, RFC 1808, sec. 2.1, …).
An interesting note can be found in RFC 7230, section 2.4:
Applications MUST NOT directly specify the syntax of queries, as this can cause operational difficulties for deployments that do not support a particular form of a query.
[…]
HTML constrains the syntax of query strings used in form submission. New form languages SHOULD NOT emulate it, but instead allow creation of a broader variety of URIs
For a full validity check on such query strings, you would have to implement the algorithm for decoding formdata recommended by the W3C. Could be done in regex, but I would advise against it for reasons of sanity.
With regard to your examples: I believe they are all valid. How they are interpreted should be left to the receiving application. Some are not as much of a fringe case as you may think, though: ?&& is simply an empty dictionary while ?=a could map to { "": "a" }.
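As one illustration of "left to the receiving application", here is how the WHATWG URL parser (URLSearchParams, available in browsers and Node) happens to interpret the examples from the question:

for (const q of ['&&', '=', 'a=', 'a=&', '=a', '&=a']) {
  console.log(q, [...new URLSearchParams(q)]);
}
// &&  -> []                (an empty dictionary)
// =   -> [ [ '', '' ] ]
// a=  -> [ [ 'a', '' ] ]
// a=& -> [ [ 'a', '' ] ]
// =a  -> [ [ '', 'a' ] ]   (i.e. { "": "a" })
// &=a -> [ [ '', 'a' ] ]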

Using regex with p:keyFilter

I have a p:inputMask with a p:keyFilter to match time in the HH:mm pattern, as follows:
<p:inputMask mask="99:99" ...>
<p:keyFilter regEx="([0-9]|0[0-9]|1[0-9]|2[0-3]):[0-5][0-9]"/>
</p:inputMask>
But it doesn't work and it accepts all values from 00:00 to 99:99.
How can I solve this?
p:keyFilter versus f:validateRegex – regEx versus inputRegEx
p:keyFilter with the regEx attribute is used to filter characters (on each keystroke); it does not allow you to validate an expression against the complete inputted value. If you want to validate whether your input matches a regular expression, use the inputRegEx attribute or f:validateRegex.
So, in your case you could use:
<p:inputXxx ...>
<f:validateRegex pattern="([0-9]|0[0-9]|1[0-9]|2[0-3]):[0-5][0-9]"/>
</p:inputXxx>
Please note that p:keyFilter requires JavaScript regular expressions, while f:validateRegex requires a Java regular expression. Also, p:keyFilter inputRegEx is checked on key up, while f:validateRegex is executed when the field is processed. The proper way to use p:keyFilter would be:
<p:inputXxx ...>
<p:keyFilter inputRegEx="/[0-9:]/"/>
</p:inputXxx>
But that will still allow invalid input.
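A plain JavaScript illustration of the difference (a hypothetical snippet; the anchored pattern is just the time regex from above wrapped in ^ and $, which is effectively how a validator applies it):

const perKeystroke = /[0-9:]/;                                    // what a key filter sees: one character at a time
const wholeValue = /^([0-9]|0[0-9]|1[0-9]|2[0-3]):[0-5][0-9]$/;   // what a validator checks: the complete value

const typed = '99:99';
console.log([...typed].every(ch => perKeystroke.test(ch)));  // true  - every single keystroke is allowed
console.log(wholeValue.test(typed));                         // false - the finished value is not a valid time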
So, in summary:

Property                   p:keyFilter inputRegEx="..."                f:validateRegex pattern="..."
Regular expression type    JavaScript                                  Java
Executed when              the key-up JavaScript event is triggered    the component is processed

This applies to all text input components (like p:inputText), not only to the p:inputMask you are using.
See also:
https://primefaces.github.io/primefaces/10_0_0/#/components/keyfilter
Convert Javascript regular expression to Java syntax
Before PrimeFaces 6
Note that p:keyFilter has been available since PrimeFaces 6.0. For older versions you need the PrimeFaces Extensions pe:keyFilter; note that PFE versions before 6.0 do not align with PF versions.
For something completely different
You could simply use p:datePicker, which nowadays can be used to enter time (hours and minutes) only:
<p:datePicker pattern="HH:mm" .../>
Or you could have a look at pe:timePicker.

RegEx to parse or validate Base64 data

Is it possible to use a RegEx to validate, or sanitize Base64 data? That's the simple question, but the factors that drive this question are what make it difficult.
I have a Base64 decoder that can not fully rely on the input data to follow the RFC specs. So, the issues I face are issues like perhaps Base64 data that may not be broken up into 78 (I think it's 78, I'd have to double check the RFC, so don't ding me if the exact number is wrong) character lines, or that the lines may not end in CRLF; in that it may have only a CR, or LF, or maybe neither.
So, I've had a hell of a time parsing Base64 data formatted as such. Due to this, examples like the following become impossible to decode reliably. I will only display partial MIME headers for brevity.
Content-Transfer-Encoding: base64
VGhpcyBpcyBzaW1wbGUgQVNDSUkgQmFzZTY0IGZvciBTdGFja092ZXJmbG93IGV4YW1wbGUu
Ok, so parsing that is no problem, and is exactly the result we would expect. And in 99% of the cases, using any code to at least verify that each char in the buffer is a valid base64 char, works perfectly. But, the next example throws a wrench into the mix.
Content-Transfer-Encoding: base64
http://www.stackoverflow.com
VGhpcyBpcyBzaW1wbGUgQVNDSUkgQmFzZTY0IGZvciBTdGFja092ZXJmbG93IGV4YW1wbGUu
This is a version of Base64 encoding that I have seen in some viruses and other things that attempt to take advantage of some mail readers' desire to parse MIME at all costs, versus ones that go strictly by the book, or rather the RFC, if you will.
My Base64 decoder decodes the second example to the following data stream. And keep in mind here, the original stream is all ASCII data!
[0x]86DB69FFFC30C2CB5A724A2F7AB7E5A307289951A1A5CC81A5CC81CDA5B5C1B19481054D0D
2524810985CD94D8D08199BDC8814DD1858DAD3DD995C999B1BDDC8195E1B585C1B194B8
Anyone have a good way to solve both problems at once? I'm not sure it's even possible, outside of doing two transforms on the data with different rules applied, and comparing the results. However if you took that approach, which output do you trust? It seems that ASCII heuristics is about the best solution, but how much more code, execution time, and complexity would that add to something as complicated as a virus scanner, which this code is actually involved in? How would you train the heuristics engine to learn what is acceptable Base64, and what isn't?
UPDATE:
Due to the number of views this question continues to get, I've decided to post the simple RegEx that I've been using in a C# application for 3 years now, with hundreds of thousands of transactions. Honestly, I like the answer given by Gumbo the best, which is why I picked it as the selected answer. But to anyone using C# and looking for a very quick way to at least detect whether a string or byte[] contains valid Base64 data or not, I've found the following to work very well for me.
[^-A-Za-z0-9+/=]|=[^=]|={3,}$
And yes, this is just for a STRING of Base64 data, NOT a properly formatted RFC 1341 message. So, if you are dealing with data of this type, please take that into account before attempting to use the above RegEx. If you are dealing with Base16, Base32, Radix or even Base64 for other purposes (URLs, file names, XML encoding, etc.), then it is highly recommended that you read RFC 4648, which Gumbo mentioned in his answer, as you need to be well aware of the charset and terminators used by the implementation before attempting to use the suggestions in this question/answer set.
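For illustration, a hedged JavaScript port of that pattern (the original is C#, where Regex.IsMatch would play the same role); a string is treated as Base64 when the pattern finds nothing invalid:

// Flags anything that is NOT allowed in a plain Base64 string: a character outside
// the alphabet, an '=' followed by something other than '=', or three or more '=' at the end.
const invalid = /[^-A-Za-z0-9+\/=]|=[^=]|={3,}$/;
const looksLikeBase64 = (s) => !invalid.test(s);

console.log(looksLikeBase64('VGhpcyBpcyBzaW1wbGU='));          // true
console.log(looksLikeBase64('http://www.stackoverflow.com'));  // false (':' and '.' are outside the alphabet)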
From RFC 4648:
Base encoding of data is used in many situations to store or transfer data in environments that, perhaps for legacy reasons, are restricted to US-ASCII data.
So whether the data should be considered dangerous depends on the purpose the encoded data is used for.
But if you’re just looking for a regular expression to match Base64 encoded words, you can use the following:
^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$
This one is good, but it will match an empty string.
This one does not match the empty string:
^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{4})$
The answers presented so far fail to check that the Base64 string has all pad bits set to 0, as required for it to be the canonical representation of Base64 (which is important in some environments, see https://www.rfc-editor.org/rfc/rfc4648#section-3.5) and therefore, they allow aliases that are different encodings for the same binary string. This could be a security problem in some applications.
Here is the regexp that verifies that the given string is not just valid base64, but also the canonical base64 string for the binary data:
^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/][AQgw]==|[A-Za-z0-9+/]{2}[AEIMQUYcgkosw048]=)?$
The cited RFC considers the empty string valid (see https://www.rfc-editor.org/rfc/rfc4648#section-10); therefore the above regex also does.
The equivalent regular expression for base64url (again, refer to the above RFC) is:
^(?:[A-Za-z0-9_-]{4})*(?:[A-Za-z0-9_-][AQgw]==|[A-Za-z0-9_-]{2}[AEIMQUYcgkosw048]=)?$
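A quick way to see the aliasing this answer is talking about, sketched in Node (Buffer is Node-specific):

const canonical = /^(?:[A-Za-z0-9+\/]{4})*(?:[A-Za-z0-9+\/][AQgw]==|[A-Za-z0-9+\/]{2}[AEIMQUYcgkosw048]=)?$/;

// 'AA==' and 'AB==' are both accepted by lenient decoders and decode to the same
// single byte 0x00, but only 'AA==' has all pad bits set to 0.
console.log(Buffer.from('AA==', 'base64')); // <Buffer 00>
console.log(Buffer.from('AB==', 'base64')); // <Buffer 00>
console.log(canonical.test('AA=='));        // true  (canonical encoding)
console.log(canonical.test('AB=='));        // false (alias with non-zero pad bits)
console.log(canonical.test(''));            // true  (the RFC treats the empty string as valid)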
Neither a ":" nor a "." will show up in valid Base64, so I think you can unambiguously throw away the http://www.stackoverflow.com line. In Perl, say, something like
my $sanitized_str = join q{}, grep {!/[^A-Za-z0-9+\/=]/} split /\n/, $str;
say decode_base64($sanitized_str);
might be what you want. It produces
This is simple ASCII Base64 for StackOverflow example.
The best regexp I could find up till now is here:
https://www.npmjs.com/package/base64-regex
which in its current version looks like:
module.exports = function (opts) {
  opts = opts || {};
  var regex = '(?:[A-Za-z0-9+\/]{4}\\n?)*(?:[A-Za-z0-9+\/]{2}==|[A-Za-z0-9+\/]{3}=)';

  return opts.exact ? new RegExp('(?:^' + regex + '$)') :
    new RegExp('(?:^|\\s)' + regex, 'g');
};
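Usage would look roughly like this (a sketch based on the code above; the package is installed with npm install base64-regex):

const base64Regex = require('base64-regex');

console.log(base64Regex({ exact: true }).test('YW55IGNhcm5hbCBwbGVhcw=='));            // true: the whole string is Base64
console.log(base64Regex().test('payload YW55IGNhcm5hbCBwbGVhcw== embedded in text'));  // true: found inside a larger string
console.log(base64Regex({ exact: true }).test('not base64!'));                         // false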
Here's an alternative regular expression:
^(?=(.{4})*$)[A-Za-z0-9+/]*={0,2}$
It satisfies the following conditions:
The string length must be a multiple of four - (?=^(.{4})*$)
The content must be alphanumeric characters or + or / - [A-Za-z0-9+/]*
It can have up to two padding (=) characters on the end - ={0,2}
It accepts empty strings
To validate a Base64-encoded image data URI we can use this regex:
/^data:image\/(?:gif|png|jpeg|bmp|webp)(?:;charset=utf-8)?;base64,(?:[A-Za-z0-9]|[+/])+={0,2}/
private validBase64Image(base64Image: string): boolean {
  const regex = /^data:image\/(?:gif|png|jpeg|bmp|webp|svg\+xml)(?:;charset=utf-8)?;base64,(?:[A-Za-z0-9]|[+/])+={0,2}/;
  return !!base64Image && regex.test(base64Image);
}
The shortest regex to check RFC 4648 compliance while enforcing canonical encoding (i.e. all pad bits set to 0):
^(?=(.{4})*$)[A-Za-z0-9+/]*([AQgw]==|[AEIMQUYcgkosw048]=)?$
Actually, this is a mix of this and that answer.
I found a solution that works very well
^(?:([a-z0-9A-Z+\/]){4})*(?1)(?:(?1)==|(?1){2}=|(?1){3})$
It will match the following strings
VGhpcyBpcyBzaW1wbGUgQVNDSUkgQmFzZTY0IGZvciBTdGFja092ZXJmbG93IGV4YW1wbGUu
YW55IGNhcm5hbCBwbGVhcw==
YW55IGNhcm5hbCBwbGVhc3U=
YW55IGNhcm5hbCBwbGVhc3Vy
while it won't match any of these invalid ones:
YW5#IGNhcm5hbCBwbGVhcw==
YW55IGNhc=5hbCBwbGVhcw==
YW55%%%%IGNhcm5hbCBwbGVhc3V
YW55IGNhcm5hbCBwbGVhc3
YW55IGNhcm5hbCBwbGVhc
YW***55IGNhcm5hbCBwbGVh=
YW55IGNhcm5hbCBwbGVhc==
YW55IGNhcm5hbCBwbGVhc===
My simplified version of Base64 regex:
^[A-Za-z0-9+/]*={0,2}$
The simplification is that it doesn't check that the length is a multiple of 4. If you need that, use the other answers. Mine focuses on simplicity.
To test it: https://regex101.com/r/zdtGSH/1