How to create a reg exp to parse such url?

So we have http://127.0.0.1:4773/robot10382.flv?action=read. We need to extract from it the protocol, IP/address, port, actual URL (robot10382.flv here) and actions (action=read here). How can we parse all of that into string variables with one regular expression?

I'm surprised that AS3 does not include proper URL parsing facilities. To put it simply, it is not easy to safely parse a URL using an RE. Here's an example of doing it though.

/(\w+)\:\/\/(\d+\.\d+\.\d+\.\d+)\:(\d+)\/([\w.]+)\?(.+)/ : $1 - protocol, $2 - ip, $3 - port, $4 - actual url, $5 - actions
There's also another way:
protocol : url.split('://')[0]
ip/domain name : url.split('://')[1].split(':')[0] (or, if no port is specified, url.split('://')[1].split('/')[0])
port : url.split('://')[1].split(':')[1].split('/')[0]
actual url : url.split('?')[0].split('/').reverse()[0]
actions : url.split('?')[1].split('&') /* the most likely separator imho */. The elements of this array can in turn be split('=') to separate variable names from values.
I know there's an opinion that split shouldn't be used, but I think it's just beautiful when used properly.
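
For comparison, here is a minimal sketch of the single-regex approach in Python rather than AS3. The function name parse_url and the group names are made up for illustration. It assumes the URL always has a numeric IP, an explicit port, a single path segment and a query string, as in the question; anything else returns None. The group for the file name accepts dots so that robot10382.flv matches.

```python
import re

# One pattern, five named groups, mirroring $1..$5 in the answer.
URL_RE = re.compile(
    r"^(?P<protocol>\w+)://"
    r"(?P<ip>\d+\.\d+\.\d+\.\d+)"
    r":(?P<port>\d+)"
    r"/(?P<file>[\w.]+)"
    r"\?(?P<actions>.+)$"
)

def parse_url(url):
    """Return a dict of the five parts, or None if the URL does not fit."""
    m = URL_RE.match(url)
    if m is None:
        return None
    parts = m.groupdict()
    # Split "a=1&b=2" into {"a": "1", "b": "2"}; names without "=" get "".
    parts["actions"] = dict(
        pair.split("=", 1) if "=" in pair else (pair, "")
        for pair in parts["actions"].split("&")
    )
    return parts

print(parse_url("http://127.0.0.1:4773/robot10382.flv?action=read"))
```

All values stay strings (including the port), which is what the question asked for.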

Sometimes when passing a file path to a SWF you want to perform a file-existence check before passing the file to an AS3 class. To do so you need to know whether a URI is a file, an http URL or any other URI with a specific protocol (moniker).
The code at the following link will tell you if you are dealing with a local full or relative path.
http://younsi.blogspot.com/2009/08/as3-uri-parser-and-code-sequence-to.html

Related

Adapting Regular Expression in Django URL to match filepath

So I am currently working on a web application that takes as input the location of a malware file for one of the functions.
This is passed via the views file. However after some altering of the models section of the application I found it was unable to parse the full filepath.
The code below works for the following pcap as input:
8cdddcd3-35fa-468d-8647-816518a9836a435be1c6e904836ad65f97f3eac4cbe19ee7ba0da48178fc7f00206270469165.pcap
url(r'^analyse/(?P<pcap>[\w\-]+\.pcap)$', views.analyse, name='analyse'),
However this code no longer works when it is a pcap containing the full filepath.
/home/freddie/malwarepcaps/8cdddcd3-35fa-468d-8647-816518a9836a435be1c6e904836ad65f97f3eac4cbe19ee7ba0da48178fc7f00206270469165.pcap
Any suggestions or pointers on how exactly I would alter the regular expression to accommodate the full filepath in the string being passed to the route would be very much appreciated.

regex: ((/\w+?)+/)?([\w-]+\.pcap)
django regex: ^analyse(?P<pcap>((/\w+?)+/)?([\w-]+\.pcap))$
Note that there is no slash after analyse because it is now part of pcap.
So analyse/home/freddie/malwarepcaps/foo-bar.pcap should match this pattern, and pcap will be equal to /home/freddie/malwarepcaps/foo-bar.pcap
test:
https://pythex.org/?regex=((%2F%5Cw%2B%3F)%2B%2F)%3F(%5B%5Cw-%5D%2B%5C.pcap)&test_string=8cdddcd3-35fa-468d-8647-816518a9836a435be1c6e904836ad65f97f3eac4cbe19ee7ba0da48178fc7f00206270469165.pcap%20%0A%2Fhome%2Ffreddie%2Fmalwarepcaps%2F8cdddcd3-35fa-468d-8647-816518a9836a435be1c6e904836ad65f97f3eac4cbe19ee7ba0da48178fc7f00206270469165.pcap&ignorecase=0&multiline=0&dotall=0&verbose=0
PS: I think it's better to move such a parameter (the path, /home/f/m/f.pcap) into the query string (for a GET request) or into the HTTP body (for a POST request),
so that it is easier to obtain the parameter without URL matching.
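
A quick way to check the pattern outside Django is Python's re module. The sketch below only exercises the regex from this answer; the helper name extract_pcap and the sample routes are made up. It also shows one consequence of the missing slash: since the slash now belongs to the pcap group, a bare file name matches only when it follows analyse directly.

```python
import re

# The Django pattern from the answer, unchanged.
PATTERN = re.compile(r"^analyse(?P<pcap>((/\w+?)+/)?([\w-]+\.pcap))$")

def extract_pcap(route):
    """Return the captured pcap path, or None when the route does not match."""
    m = PATTERN.match(route)
    return m.group("pcap") if m else None

# Full path: the leading slash is part of the captured value.
print(extract_pcap("analyse/home/freddie/malwarepcaps/foo-bar.pcap"))
# Bare file name directly after "analyse": matches.
print(extract_pcap("analysefoo-bar.pcap"))
# Bare file name after "analyse/": does not match this pattern.
print(extract_pcap("analyse/foo-bar.pcap"))
```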

Cant get path to a string using Poco

I'm trying to put the path in a string, but it is always empty:
Poco::URI uri("http://10.10.10.10:3535");
std::string path(uri.getPathAndQuery());
This string never receives anything and stays empty.
The code snippet is taken from an example: https://gist.github.com/FatalCatharsis/749d93b4592e7d59d91a

In your URI the Path and Query are empty, so it is correct that you see an empty string. Your URI only has a Scheme, Host and Port.
Here is a diagram of the URI format from Wikipedia:
                    authority              path
        ┌───────────────┴───────────────┐┌───┴────┐
  abc://username:password@example.com:123/path/data?key=value#fragid1
  └┬┘   └───────┬───────┘ └────┬────┘ └┬┘           └───┬───┘ └──┬──┘
scheme  user information     host     port            query  fragment

http://10.10.10.10:3535 does not have any path or query in the string, that is why it is empty.
Here is how a URI breaks down:
scheme:[//[user:password@]host[:port]][/]path[?query][#fragment]
e.g. http://192.168.11.2:3000/user?action=edit#basic
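
The same behaviour can be seen without Poco. As a rough illustration in Python (this is not Poco code), the standard library's urllib.parse.urlsplit splits a URI into these components; for a URI that has only a scheme, host and port, the path and query come back as empty strings. The helper name describe is made up.

```python
from urllib.parse import urlsplit

def describe(uri):
    """Return the URI components as a plain dict."""
    p = urlsplit(uri)
    return {
        "scheme": p.scheme,
        "host": p.hostname,
        "port": p.port,
        "path": p.path,
        "query": p.query,
        "fragment": p.fragment,
    }

# No path or query present, so both are empty strings.
print(describe("http://10.10.10.10:3535"))
# Every component present.
print(describe("http://192.168.11.2:3000/user?action=edit#basic"))
```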

nginx - URL encode query string

I have an nginx reverse-proxy which needs to pass on the query string it receives. However this query string it receives is not well formatted and can contain JSON that is not URL encoded i.e. it contains curly brackets i.e. {}, commas, colons and double quotes! Unfortunately, I have no control over this and this causes the downstream server to barf when parsing the string.
Is there a way to correctly URL encode this string before proxying it?
I can replace the curly brackets as I know there will only be one instance of each using the config:
if ($args ~* '(.*){(.*)}(.*)') {
set $args $1%7B$2%7D$3;
rewrite (.*)$ $1;
}
proxy_pass http://127.0.0.1:8080;
However, I don't know in advance how many fields the JSON will have so it's difficult to use the same logic as above for the rest of the object.
I should also mention that I don't think this is related to nginx url-decoding parameters as I am not using a URI in the proxy_pass.
Thanks!
UPDATE: For the time being, the JSON object seems to be sending the same properties so this is what I've used as a workaround. It's pretty hideous and will break if the number of properties changes but does the job for now.
if ($args ~* '(.*){"(.*)":"(.*)","(.*)":"(.*)","(.*)":"(.*)","(.*)":"(.*)","(?<group10>.*)":"(?<group11>.*)"}(?<group12>.*)') {
set $args $1%7B%22$2%22%3A%22$3%22%2C%22$4%22%3A%22$5%22%2C%22$6%22%3A%22$7%22%2C%22$8%22%3A%22$9%22%2C%22${group10}%22%3A%22${group11}%22%7D${group12};
rewrite (.*)$ $1;
}
proxy_pass http://127.0.0.1:8080;
Note that since this returns more than 9 regex groups, I had to name groups 10, 11 and 12 otherwise they get interpreted as $1 + the digit 0, 1 or 2.
Is there a more robust way of doing this?

Personally, I don't like a solution with a single if statement, because it doesn't look very readable, flexible or maintainable. You may see whether having a combination of location or rewrite statements, where each one handles a specific encoding case, may work; see http://mdoc.su/ for a fun project that's very heavy with internal redirects, although I believe at one point nginx may have a limit on the total number of indirections.
Otherwise, provided that you cannot modify the backend, another option is to automatically redirect misbehaving clients and/or requests to an auxiliary backend, whose only purpose is to re-encode the string correctly, providing an X-Accel-Redirect HTTP Response Header as its output (as per http://nginx.org/r/proxy_ignore_headers), which nginx will use to make a subsequent internal redirect / request to the actual backend.
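
To make the second option a bit more concrete, here is a minimal sketch in Python of the re-encoding step such an auxiliary backend might perform. The function name reencode_query is hypothetical. It assumes that `&`, `=` and existing `%XX` escapes in the incoming string should be left alone, so a literal `%`, `&` or `=` inside the JSON would pass through unescaped. Wiring this into a server that answers with X-Accel-Redirect is not shown.

```python
from urllib.parse import quote

def reencode_query(raw_query):
    """Percent-encode unsafe characters, keeping '&', '=' and '%' as they are."""
    return quote(raw_query, safe="&=%")

# Braces, quotes, colons and commas are escaped; the key=value layout survives.
print(reencode_query('id=7&filter={"name":"bob","age":"3"}'))
```

Unlike the fixed-width regex in the question, this does not depend on how many properties the JSON object has.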

URL general format

I have written a C++ program that allows URLs to be posted onto YouTube. It works by taking in the URL as input either from you typing it into the program or from direct input, and then it will replace every '/', '.' in the string with '*'. This modified string is then put on your clipboard (this is solely for Windows-users).
Of course, before I can even call the program usable, it has to go back: I will need to know when '.', '/' are used in URLs. I have looked at this article: http://en.wikipedia.org/wiki/Uniform_Resource_Locator , and know that '.' is used when dealing with the "master website" (in the case of this URL, "en.wikipedia.org"), and then '/' is used afterwards, but I have been to other websites, http://msdn.microsoft.com/en-us/library/windows/desktop/ms649048%28v=vs.85%29.aspx , where this simply isn't the case (it even replaced '(', ')' with "%28", "%29", respectively!)
I also seemed to have requested a .aspx file, whatever that is. Also, there is a '.' inside the parentheses in that URL. I have even tried to view the regular expressions (I don't quite fully understand those yet...) regarding URLs. Could someone tell me (or link me to) the rules regarding the use of '.', '/' in URLs?

Can you explain why you are doing this convoluted thing? What are you trying to achieve? It may be that you don't need to know as much as you think, once you answer that question.
In the meantime, here is some information. A URL is really composed of a number of sections:
http: - the "scheme" or protocol used to access the resource. "HTTP", "HTTPS",
"FTP", etc are all examples of a scheme. There are many others
// - separates the protocol from the host (server) address
myserver.org - the host. The host name is looked up against a DNS (Domain Name System)
service and resolved to an IP address - the "phone number" of the machine
which can serve up the resource (like "98.139.183.24" for www.yahoo.com)
www.myserver.org - the host with a prefix. Sometimes the same domain (`myserver.org`)
connects multiple servers (or ports) and you can be sent straight to the
right server with the prefix (mail., www., ftp., ... up to the
administrators of the domain). Conventionally, a server that serves content
intended for viewing with a browser has a `www.` prefix, but there's no rule
that says this must be the case.
:8080/ - sometimes, you see a colon followed by up to five digits after the domain.
this indicates the PORT on the server where you are accessing data
some servers allow certain specific services on just a particular port
they might have a "public access" website on port 80, and another one on 8080
the https:// protocol defaults to port 443, there are ports for telnet, ftp,
etc. Add these things only if you REALLY know what you are doing.
/the/pa.th/ this is the path relative to DOCUMENTROOT on the server where the
resource is located. `.` characters are legal here, just as they are in
directory structures.
file.html
file.php
file.asp
etc - usually the resource being fetched is a file. The file may have
any of a great number of extensions; some of these indicate to the server that
instead of sending the file straight to the requester,
it has to execute a program or other instructions in this file,
and send the result of that
Examples of extensions that indicate "active" pages include
(this is not nearly exhaustive - just "for instance"):
.php = contains a php program
.py = contains a python program
.js = contains a javascript program
(usually called from inside an .htm or .html)
.asp = "active server page" associated with a
Microsoft Internet Information Server
?something=value&somethingElse=%23othervalue%23
parameters that are passed to the server can be shown in the URL.
This can be used to pass parameters, entries in a form, etc.
Any character might be passed here - including '.', '&', '/', ...
But you can't just write those characters in your string...
Now comes the fun part.
URLs cannot contain certain characters (quite a few, actually). To get around this, there is a mechanism called "escaping" a character. Typically this means replacing the character with its hexadecimal code, prefixed with a % sign. Thus, you frequently see a space character represented as %20, for example. Handy lists of these codes are easy to find online.
There are many functions available for converting "illegal" characters in a URL automatically to a "legal" value.
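
As one example of such functions, Python's standard library has urllib.parse.quote and unquote for escaping and unescaping, and parse_qs for decoding a query string. A short sketch using strings from this thread:

```python
from urllib.parse import parse_qs, quote, unquote

# Escape every reserved character (safe="" means even "/" would be escaped).
escaped = quote("(v=vs.85)", safe="")

# Undo the escaping seen in the MSDN link.
plain = unquote("ms649048%28v=vs.85%29.aspx")

# Decode a query string; %23 is the escaped form of "#".
params = parse_qs("something=value&somethingElse=%23othervalue%23")

print(escaped)
print(plain)
print(params)
```

Note that '.' is never escaped by quote: it is one of the characters that are always legal in a URL.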
To learn about exactly what is and isn't allowed, you really need to go back to the original specifications. See for example
http://www.ietf.org/rfc/rfc1738.txt
http://www.ietf.org/rfc/rfc2396.txt
http://www.ietf.org/rfc/rfc3986.txt
I list them in chronological order - the last one being the most recent.
But I repeat my question -- what are you really trying to do here, and why?

Regexp to grab protocol from URL

Let's say I have a variable called URL and it's assigned a value of http://www.google.com. I can also received the URL via ftp, hence it'll be ftp://ftp.google.com. How can I have it so I grab everything before the :? I'll have an if/else condition afterwards to test the logic.

/^[^:]+/
If you want to prevent 'www.foobar.com' (which has no protocol specified) to match as protocol:
/^[^:]+(?=:\/\/)/
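
A small sketch of the second pattern in Python (the helper name get_protocol is made up). The lookahead keeps `://` out of the match and makes URLs without a protocol return None, which fits the if/else test mentioned in the question:

```python
import re

# Everything before the first ':' provided that '://' follows.
PROTOCOL_RE = re.compile(r"^[^:]+(?=://)")

def get_protocol(url):
    """Return the text before '://', or None when no protocol is present."""
    m = PROTOCOL_RE.match(url)
    return m.group(0) if m else None

for url in ("http://www.google.com", "ftp://ftp.google.com", "www.foobar.com"):
    print(url, "->", get_protocol(url))
```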

You mean like this?
/^(.*?):/
