Regex to parse the "Accept" header

I'm working on a REST API. The client is using the Accept header in their request to send in stuff like
...application/vnd.mywebsite+json; version=1... or
...application/vnd.mywebsite+xml; version=2....
Currently, I am parsing the headers and picking out the media type and version to serve with string functions:
json and 1
xml and 2
I was wondering if I could do that faster with a regex.
How can I pull out the format and version from an "Accept" header in the request? I suppose, I would need to make 2 regex calls to get this done, and that's okay.
Update :
Using the answer below, I tried extracting those using ColdFusion, but the pattern just matches the whole string.
Ideally, I want an array of 2 elements, ie ['json', '1']. Any ideas ?
<cfscript>
arrTitles = reMatch(
"application/vnd.website\+([A-Za-z]+);\s*version=(\d+)",
"application/vnd.website+json; version=2"
);
writedump(arrTitles);
</cfscript>
Please refer to this runnable example.

You could use something simple like this:
application/vnd.mywebsite\+([A-Za-z]+);\s*version=(\d+)
The type (json or xml) would be in capturing group 1, the version in group 2.
You can see it working here.
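As a rough illustration of reading the two capturing groups, here is a minimal Python sketch (Python rather than ColdFusion, and not the asker's code). The helper name parse_accept is hypothetical, and the dot in "vnd.mywebsite" is escaped here as a small tightening of the pattern above.

```python
import re

# Pattern from the answer above, with the literal dot escaped (an assumption of this sketch).
ACCEPT_RE = re.compile(r"application/vnd\.mywebsite\+([A-Za-z]+);\s*version=(\d+)")

def parse_accept(header):
    """Return [format, version] taken from the capturing groups, or None if there is no match."""
    match = ACCEPT_RE.search(header)
    return list(match.groups()) if match else None

if __name__ == "__main__":
    print(parse_accept("application/vnd.mywebsite+json; version=1"))
    print(parse_accept("application/vnd.mywebsite+xml; version=2"))
```

The point is that the groups, not the whole match, carry the format and version, so a single regex call is enough.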

Related

Adapting Regular Expression in Django URL to match filepath

So I am currently working on a web application that takes as input the location of a malware file for one of the functions.
This is passed via the views file. However after some altering of the models section of the application I found it was unable to parse the full filepath.
The code below works for the following pcap as input:
8cdddcd3-35fa-468d-8647-816518a9836a435be1c6e904836ad65f97f3eac4cbe19ee7ba0da48178fc7f00206270469165.pcap
url(r'^analyse/(?P<pcap>[\w\-]+\.pcap)$', views.analyse, name='analyse'),
However this code no longer works when it is a pcap containing the full filepath.
/home/freddie/malwarepcaps/8cdddcd3-35fa-468d-8647-816518a9836a435be1c6e904836ad65f97f3eac4cbe19ee7ba0da48178fc7f00206270469165.pcap
Any suggestions or pointers on how exactly I would alter the regular expression to accommodate the full filepath in the string being passed to the route would be very much appreciated.
regex: ((/\w+?)+/)?([\w-]+\.pcap)
django regex: ^analyse(?P<pcap>((/\w+?)+/)?([\w-]+\.pcap))$
note that there is no slash after analyse because it's part of pcap now.
so analyse/home/freddie/malwarepcaps/foo-bar.pcap should match this pattern and pcap will be equal to /home/freddie/malwarepcaps/foo-bar.pcap
test:
https://pythex.org/?regex=((%2F%5Cw%2B%3F)%2B%2F)%3F(%5B%5Cw-%5D%2B%5C.pcap)&test_string=8cdddcd3-35fa-468d-8647-816518a9836a435be1c6e904836ad65f97f3eac4cbe19ee7ba0da48178fc7f00206270469165.pcap%20%0A%2Fhome%2Ffreddie%2Fmalwarepcaps%2F8cdddcd3-35fa-468d-8647-816518a9836a435be1c6e904836ad65f97f3eac4cbe19ee7ba0da48178fc7f00206270469165.pcap&ignorecase=0&multiline=0&dotall=0&verbose=0
PS: I think it's better to move such a parameter (the path, /home/f/m/f.pcap) into the query string (for a GET request) or into the HTTP body (for a POST request),
so it will be easier to obtain the parameter without URL matching.
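The pattern can be checked outside Django with Python's re module. This is only a sketch: extract_pcap is a hypothetical helper and the sample paths are made up for illustration.

```python
import re

# Route pattern copied from the answer above.
PCAP_ROUTE = re.compile(r"^analyse(?P<pcap>((/\w+?)+/)?([\w-]+\.pcap))$")

def extract_pcap(url_path):
    """Return the value of the named group 'pcap', or None when the route does not match."""
    match = PCAP_ROUTE.match(url_path)
    return match.group("pcap") if match else None

if __name__ == "__main__":
    print(extract_pcap("analyse/home/freddie/malwarepcaps/foo-bar.pcap"))
    print(extract_pcap("analysefoo-bar.pcap"))
```

Note that with this pattern a bare filename has to follow "analyse" directly, without a slash, because the slash now belongs to the optional directory part.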

Regex to differentiate APIs

I need to create a regex to help determine the number of times an API is called. We have multiple APIs and this API is of the following format:
/foo/bar/{barId}/id/{id}
The above endpoint also supports query parameters so the following requests would be valid:
/foo/bar/{barId}/id/{id}?start=0&limit=10
The following requests are also valid:
/foo/bar/{barId}/id/{id}/
/foo/bar/{barId}/id/{id}
We also have the following endpoints:
/foo/bar/{barId}/id/type/
/foo/bar/{barId}/id/name/
/foo/bar/{barId}/id/{id}/price
My current regex to extract calls made only to /foo/bar/{barId}/id/{id} looks something like this:
\/foo\/bar\/(.+)\/id\/(?!type|name)(.+)
But the above regex also includes calls made to /foo/bar/{barId}/id/{id}/price endpoint.
I can check if the string after {id}/ isn't price and exclude calls made to price but it isn't a long term solution since if we add another endpoint we may need to update the regex.
Is there a way to filter calls made only to:
/foo/bar/{barId}/id/{id}
/foo/bar/{barId}/id/{id}/
/foo/bar/{barId}/id/{id}?start=0&limit=10
Such that /foo/bar/{barId}/id/{id}/price isn't also pulled in?
\/foo\/bar\/(.+)\/id\/(?!type|name)(.+)
The "(.+)" in your regex is the cause of the problem: "." matches any character, including "/", so the second group also swallows "/price". Replace each "(.+)" with "([^/]+)" and append "/?(?!.+)" so that nothing except an optional trailing slash may follow the id. This is working for me.
/foo/bar/([^/]+)/id/(?!type|name)([^/]+)/?(?!.+)
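A quick way to try the pattern against the endpoints from the question is a small Python sketch like the one below. The helper count_calls and the concrete ids (12, 34) are assumptions made for illustration only.

```python
import re

# Pattern from the answer above; slashes need no escaping in Python.
ENDPOINT = re.compile(r"/foo/bar/([^/]+)/id/(?!type|name)([^/]+)/?(?!.+)")

def count_calls(request_paths):
    """Count the requests that hit /foo/bar/{barId}/id/{id} and nothing deeper."""
    return sum(1 for path in request_paths if ENDPOINT.match(path))

SAMPLE_REQUESTS = [
    "/foo/bar/12/id/34",
    "/foo/bar/12/id/34/",
    "/foo/bar/12/id/34?start=0&limit=10",
    "/foo/bar/12/id/34/price",
    "/foo/bar/12/id/type/",
    "/foo/bar/12/id/name/",
]

if __name__ == "__main__":
    print(count_calls(SAMPLE_REQUESTS))
```

Because "[^/]+" cannot cross a slash, any new sub-endpoint under {id}/ is excluded without touching the regex again. One caveat: the lookahead also rejects ids that merely start with "type" or "name".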

URI encoding in C++ REST SDK ("Casablanca")

I'm using the http listener of the C++ REST SDK 2.8 and noticed the following. If I send the following URL to this listener:
http://my_server/my%2fpath?key=xxx%26yyy%3Dzzz
and I do:
auto uri = request.relative_uri();
auto v_path_components = web::uri::split_path(web::uri::decode(uri.path()));
auto m_query_components = web::uri::split_query(web::uri::decode(uri.query()));
then I find that v_path_components contains 2 elements ["my", "path"], and m_query_components contains 2 pairs [("key","xxx"), ("yyy","zzz")].
What I want and would have expected is v_path_components to contain 1 element ["my/path"], and m_query_components to contain 1 pair [("key","xxx&yyy=zzz")].
To achieve the latter, relative_uri shouldn't decode/encode the URI, as that loses information. In addition, web::uri::decode() should be executed on the split results rather than before splitting. But as the REST SDK itself, as well as many samples shipped with it, uses it in the above way, I suspect that I might be wrong.
Could anyone confirm my findings or explain why I'm on the wrong track?
Your findings make sense.
Since you are decoding first, the encoded ampersand (%26) becomes a key/value pair separator and the encoded equals sign (%3D) becomes a key/value delimiter. The same happens to the path components: the encoded slash (%2f) becomes a path separator and is parsed as such.
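The ordering effect is not specific to the C++ REST SDK. Here is a minimal sketch using Python's urllib.parse, where split_path and split_query are hypothetical stand-ins for the SDK functions, not their actual implementation:

```python
from urllib.parse import unquote

def split_path(path):
    """Split a path on '/', dropping empty segments."""
    return [segment for segment in path.split("/") if segment]

def split_query(query):
    """Split a query string into a dict of key/value pairs."""
    pairs = {}
    for item in query.split("&"):
        if item:
            key, _, value = item.partition("=")
            pairs[key] = value
    return pairs

RAW_PATH = "/my%2fpath"
RAW_QUERY = "key=xxx%26yyy%3Dzzz"

# Decode first, then split: the escaped delimiters turn into real ones.
decode_first_path = split_path(unquote(RAW_PATH))
decode_first_query = split_query(unquote(RAW_QUERY))

# Split first, then decode each piece: the escaped delimiters stay data.
split_first_path = [unquote(segment) for segment in split_path(RAW_PATH)]
split_first_query = {unquote(k): unquote(v) for k, v in split_query(RAW_QUERY).items()}

if __name__ == "__main__":
    print(decode_first_path, decode_first_query)
    print(split_first_path, split_first_query)
```

Splitting on the raw string and decoding each component afterwards gives the single path element and single query pair that the question expects.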

Coldfusion JSON Breaking with DataTables

While working on a task, I am using the jsStringFormat function to handle JSON data that may contain special characters, but that does not seem to handle all the issues.
My JSON still breaks.
I am using it like this:
"<a href='edit.cfm?id=#jsStringFormat(qFiltered.randomnumber)#' style='color:##066D99'>#trim(jsStringFormat(qFiltered[thisColumn][qFiltered.currentRow]))#</a>"
I am lost here: what else can I use, such as a regex or reReplace, so that it does not break?
Thanks
You're doing multiple things here.
You're putting the string into a URL: use UrlEncodedFormat.
You're also putting it in an HTML tag: use HtmlEditFormat.
The whole thing is going into a JavaScript variable, so I would use JSStringFormat to wrap the whole thing.
Try building your string before assigning it.
<cfsavecontent variable="htmlLink"><cfoutput>
#HtmlEditFormat(Trim(qFiltered[thisColumn][qFiltered.currentRow]))#
</cfoutput></cfsavecontent>
myJsVar = "#JsStringFormat(Trim(htmlLink))#";
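The same layering (URL-encode the query value, HTML-escape what goes into the tag, then escape the whole string for JavaScript/JSON) can be sketched in Python as an analogue. This is not ColdFusion code: build_link_cell is a hypothetical helper, and json.dumps stands in for the final JavaScript string escaping.

```python
import html
import json
from urllib.parse import quote

def build_link_cell(record_id, label):
    """Build the anchor markup, escaping for each context from the inside out."""
    # 1. URL context: percent-encode the id that goes into the query string.
    href = "edit.cfm?id=" + quote(str(record_id), safe="")
    # 2. HTML context: escape the attribute value and the visible text.
    link = '<a href="{}" style="color:#066D99">{}</a>'.format(
        html.escape(href, quote=True),
        html.escape(label.strip(), quote=True),
    )
    # 3. JavaScript/JSON context: serialise the finished markup as a string literal.
    return json.dumps(link)

if __name__ == "__main__":
    print(build_link_cell("a b&c", ' Tom "T" <O\'Neil> '))
```

Escaping once per context, in that order, is what keeps quotes, angle brackets and ampersands in the data from breaking the surrounding JSON.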

Classic ASP: encoding text outside tags with regex

I need a function that can process a string and replace its textual parts with HTML-encoded ones:
Sample 1
Input: "<span>Total amount:<br>€ 50,00</span>"
Output: "<span>Total amount:<br>&euro; 50,00</span>"
Sample 2
Input: "<span>When threshold > x<br>act as described below:</span>"
Output: "<span>When threshold &gt; x<br>act as described below:</span>"
These are simplified cases of course, and yes, I know I could do that by a series of replace on each specific char I need to encode, but I'd rather have a function that can recognize and skip HTML tags using Regex and perform a Server.HTMLEncode on the textual part of the input string.
Any help will be highly appreciated.
I'm not sure why you'd want to do this. Why don't you just pass the innerHTML into your parser using javascript and have Javascript create your span tag? Then you can encode the entire thing. I'd be worried that the encoding here won't have any added security for your application if that's what you are trying to do.
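For reference, the regex approach described in the question (skip the tags, encode only the text between them) can be sketched in Python rather than Classic ASP. Several things here are assumptions: encode_text_outside_tags is a hypothetical helper, html.escape stands in for Server.HTMLEncode, and the simple tag pattern will misfire if the text itself contains a "<".

```python
import html
import re

# Anything that looks like <...> is treated as a tag (a simplifying assumption).
TAG_RE = re.compile(r"(<[^>]+>)")

def encode_text_outside_tags(markup):
    """HTML-encode the text between tags and leave the tags untouched."""
    parts = TAG_RE.split(markup)
    # With a capturing group, re.split puts the tags at odd indices
    # and the text between them at even indices.
    return "".join(
        part if index % 2 else html.escape(part, quote=False)
        for index, part in enumerate(parts)
    )

if __name__ == "__main__":
    print(encode_text_outside_tags(
        "<span>When threshold > x<br>act as described below:</span>"
    ))
```

html.escape only handles "&", "<" and ">" here, so a character such as "€" passes through unchanged; mapping it to an entity would need an extra step.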