Extract IP:PORT - regex

How can I extract all IP:PORT pairs from a given website? I currently use the regex PATTERN below, but I think it doesn't catch all of them.
Or is there a better way to do it?
PATTERN = '((?:1?\d{1,2}|2[0-4]\d|25[0-5])\.){3}(?:1?\d{1,2}|2[0-4]\d|25[0-5]):\d{2,5}';

Instead of RegEx, you can use the Internet Direct (Indy) unit IdURI. It can parse any URI into its protocol parts. It supports IPv4 and IPv6. The unit is quite self-contained.
MyURI := TIdURI.Create('http://127.0.0.1:8080');
try
  MyHost := MyURI.Host;
  MyPort := MyURI.Port;
finally
  MyURI.Free;
end;
Properties expose detailed information about the URI:
property Bookmark : string read FBookmark write FBookMark;
property Document: string read FDocument write FDocument;
property Host: string read FHost write FHost;
property Password: string read FPassword write FPassword;
property Path: string read FPath write FPath;
property Params: string read FParams write FParams;
property Port: string read FPort write FPort;
property Protocol: string read FProtocol write FProtocol;
property URI: string read GetURI write SetURI;
property Username: string read FUserName write FUserName;
property IPVersion : TIdIPVersion read FIPVersion write FIPVersion;
See also this warning; however, I think it does not affect simple host:port URI parsing:
https://stackoverflow.com/a/502011/80901
I recommend downloading a current release of Indy to get the latest fixes.

This will work if a port always follows the IP:
\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\:\d{2,5}\b
Matches:
1.2.3.4:80
001.002.003.004:2345
255.255.255.255:13245
Does not match:
1.2.3
1.2.3:01
1.2.3.4.5:99 (as a whole string; a substring search still finds 2.3.4.5:99 inside it)
299.299.299.299:123
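To check the pattern above against a whole page of text rather than one string at a time, a minimal Python sketch follows. The function name `extract_ip_ports` and the sample text are made up for illustration. Note that the pattern only checks the octet ranges: the port is any run of 2 to 5 digits and is not limited to 65535.

```python
import re

# Pattern taken from the answer above. All groups are non-capturing,
# so findall() returns the full IP:PORT matches.
IP_PORT = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?):\d{2,5}\b"
)

def extract_ip_ports(text):
    """Return every IP:PORT substring found in text, in order of appearance."""
    return IP_PORT.findall(text)

# Hypothetical sample text; the last entry has out-of-range octets.
sample = "proxies: 1.2.3.4:80, 255.255.255.255:13245 and 299.299.299.299:123"
print(extract_ip_ports(sample))

# Searching inside a longer dotted string picks up the trailing four octets.
print(extract_ip_ports("1.2.3.4.5:99"))
```

If partial hits such as the second one are unwanted, the surrounding characters need an extra check, for example a lookbehind that rejects a preceding dot or digit.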

Regexes are not a magic wand that you should wave at every problem relating to strings. In this case, the language you're using probably has support for URL parsing.
In PHP, you parse URLs with the parse_url() function. http://php.net/manual/en/function.parse-url.php
In Perl, you use the URI::URL class http://search.cpan.org/dist/URI/
If you really want to use a regex, the Perl module http://search.cpan.org/dist/Regexp-Common/ has already-built regexes for you to detect IP addresses.
Whatever language that you're using, someone has already written, debugged and tested code that already does what you want. Use that existing code rather than writing your own.
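As an illustration of the "use the existing parser" advice, here is a small sketch in Python, chosen only as an example language, using the standard library's `urllib.parse.urlsplit`. The helper name `host_and_port` is hypothetical.

```python
from urllib.parse import urlsplit

def host_and_port(url):
    """Split a URL with the standard library and return (host, port)."""
    parts = urlsplit(url)
    # hostname drops the brackets around IPv6 literals; port is an int or None.
    return parts.hostname, parts.port

print(host_and_port("http://127.0.0.1:8080"))
print(host_and_port("http://[::1]:8080/index.html"))
```

This only helps when the input already is a URL. Pulling bare IP:PORT pairs out of arbitrary page text still needs some kind of scanning step first.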

Related

Adapting Regular Expression in Django URL to match filepath

So I am currently working on a web application that takes as input the location of a malware file for one of the functions.
This is passed via the views file. However, after some changes to the models section of the application, I found it was unable to parse the full file path.
The code below works for the following pcap as input:
8cdddcd3-35fa-468d-8647-816518a9836a435be1c6e904836ad65f97f3eac4cbe19ee7ba0da48178fc7f00206270469165.pcap
url(r'^analyse/(?P<pcap>[\w\-]+\.pcap)$', views.analyse, name='analyse'),
However, this code no longer works when the pcap value contains the full file path:
/home/freddie/malwarepcaps/8cdddcd3-35fa-468d-8647-816518a9836a435be1c6e904836ad65f97f3eac4cbe19ee7ba0da48178fc7f00206270469165.pcap
Any suggestions or pointers on how to alter the regular expression to accommodate the full file path in the string passed to the route would be very much appreciated.
regex: ((/\w+?)+/)?([\w-]+\.pcap)
Django regex: ^analyse(?P<pcap>((/\w+?)+/)?([\w-]+\.pcap))$
Note that there is no slash after analyse, because the slash is now part of pcap.
So analyse/home/freddie/malwarepcaps/foo-bar.pcap should match this pattern, and pcap will be equal to /home/freddie/malwarepcaps/foo-bar.pcap
test:
https://pythex.org/?regex=((%2F%5Cw%2B%3F)%2B%2F)%3F(%5B%5Cw-%5D%2B%5C.pcap)&test_string=8cdddcd3-35fa-468d-8647-816518a9836a435be1c6e904836ad65f97f3eac4cbe19ee7ba0da48178fc7f00206270469165.pcap%20%0A%2Fhome%2Ffreddie%2Fmalwarepcaps%2F8cdddcd3-35fa-468d-8647-816518a9836a435be1c6e904836ad65f97f3eac4cbe19ee7ba0da48178fc7f00206270469165.pcap&ignorecase=0&multiline=0&dotall=0&verbose=0
PS: I think it is better to move such a parameter (the path, e.g. /home/f/m/f.pcap) into the query string (for a GET request) or into the HTTP body (for a POST request), so that it is easier to obtain the parameter without URL matching.
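The route pattern can also be checked outside Django with Python's `re` module. The sketch below assumes Django hands the resolver the path without a leading slash, and the helper name `captured_pcap` is made up. It also shows two limits of the pattern as written: a bare file name directly after `analyse/` no longer matches, and directory names may only contain `\w` characters.

```python
import re

# Route pattern from the answer above, compiled on its own for testing.
ROUTE = re.compile(r"^analyse(?P<pcap>((/\w+?)+/)?([\w-]+\.pcap))$")

def captured_pcap(url_path):
    """Return the value of the pcap group, or None if the route does not match."""
    match = ROUTE.match(url_path)
    return match.group("pcap") if match else None

# Full path: the leading slash ends up inside the captured value.
print(captured_pcap("analyse/home/freddie/malwarepcaps/foo-bar.pcap"))

# Bare file name after "analyse/": at least one directory is required, so no match.
print(captured_pcap("analyse/foo-bar.pcap"))

# A hyphen in a directory name is outside \w, so this does not match either.
print(captured_pcap("analyse/home/my-dir/foo-bar.pcap"))
```

If both the old and the new URL shapes must keep working, one option is to keep the original route alongside the new one.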

extract specific string from a file using c++

file content:
/function name: input\n\t\tworking: inputs details\n\t\t/
/*fuction name: datawrite\n\t\tworking: write the data into the file
*/
/function name: dataread\n\tworking: read the data in the file for
display/
/*funtion name: accsummary\n\t\tworking: display the content of the
file of specific person */
I want to extract only the "function names" and the "workings" from the file and store them in arrays of strings, using C++.
That is, if I have declared an array of strings function[10], it should store "input, datawrite, dataread, accsummary", and similarly for "working".
This sounds like a task for regular expressions. C++ has regex support in the standard library, notably regex_match.
I guess this should get you started. But be warned: what you are trying to accomplish will not be solved by a simple regex alone.
Your matching string might look something like this
/\/function name: ([^\\]*).*/
This looks for the string "/function name: ", followed by any characters other than a backslash, and then any characters up to the end of the line. The part in parentheses is captured and can be read back from the match results of regex_match.
Try it in an online regex tester and modify it to suit your needs. Just note that the regex itself is passed without the leading and trailing /.
Oh, I noticed that you also asked to extract the workings, while my example extracts only the function names. But you will get there once you get the concept.
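To see what the capture group picks up, here is a quick sketch in Python rather than C++, used only to try the pattern out. The `working:` pattern is an assumption built the same way as the one above, and only the two correctly spelled sample lines from the question are used, since the lines with "fuction" and "funtion" would not match this pattern.

```python
import re

# Two sample lines from the question; raw strings keep the literal \n and \t text.
lines = [
    r"/function name: input\n\t\tworking: inputs details\n\t\t/",
    r"/function name: dataread\n\tworking: read the data in the file for",
]

# The pattern from the answer, without the surrounding / delimiters.
NAME = re.compile(r"/function name: ([^\\]*)")
# Assumed companion pattern for the "working" part (not from the answer).
WORKING = re.compile(r"working: ([^\\]*)")

names, workings = [], []
for line in lines:
    name_match = NAME.search(line)
    working_match = WORKING.search(line)
    if name_match:
        names.append(name_match.group(1))
    if working_match:
        workings.append(working_match.group(1))

print(names)
print(workings)
```

The second "working" text continues on the next line of the file, so a line-by-line approach only captures its first half. Reading the whole file into one string first would be one way around that.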
You need to take a look at the std::stringstream class:
http://www.cplusplus.com/reference/sstream/stringstream/
Then you need to look at the substr method in std::string:
http://www.cplusplus.com/reference/string/string/substr/

Read string which contains \" from mongodb

In my MongoDB database, I have stored the string below:
"description" : "25\"",
But when I try to read it with the C++ driver using either of the two ways below, I always get "25""
d->description=record.getStringField("description");
or
d->description = record.getField("description").jsonString(Strict);
I need to keep the backslash \ here, because the string will be sent to a web browser, where JavaScript code will parse it into a JSON object.
Any way to do this?
Not sure how that string got in there, but this will not serialize or deserialize properly without the proper escaping. It should look more like this:
{ "description" : "25\\\"" }
You should update these values through your driver, which should do the serialization properly based on your regular input, i.e. 25".
When the fields in the document look like above then they will deserialize how you want.
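The role of the backslash can be illustrated with Python's `json` module, used here only as a stand-in for the driver and the browser. Under standard JSON escaping, the backslash belongs to the JSON text and not to the stored value, so it reappears whenever the value is serialized again.

```python
import json

value = '25"'  # the text the user actually entered

# Serializing adds the escape: the JSON text contains \" for the quote.
encoded = json.dumps({"description": value})
print(encoded)

# Parsing the JSON text gives back the plain value without the backslash.
decoded = json.loads(encoded)["description"]
print(decoded)

# A document written as "25\\\"" holds a real backslash followed by a quote.
escaped_in_store = json.loads(r'{"description": "25\\\""}')["description"]
print(escaped_in_store)
```

Which of the two forms is appropriate depends on whether the consumer receives properly serialized JSON or a raw string that it parses a second time.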

How to create a reg exp to parse such url?

Given http://127.0.0.1:4773/robot10382.flv?action=read, we need to extract the protocol, IP/address, port, actual URL (robot10382.flv here) and actions (action=read here). How can all of that be parsed into string variables with one regular expression?
I'm surprised that AS3 does not include proper URL parsing facilities. To put it simply, it is not easy to safely parse a URL using an RE. Here's an example of doing it though.
/(\w+)\:\/\/(\d+\.\d+\.\d+\.\d+)\:(\d+)\/([\w.]+)\?(.+)/ : $1 - protocol, $2 - ip, $3 - port, $4 - actual url, $5 - actions
there's also another way:
protocol : url.split('://')[0]
ip/domain name : url.split('://')[1].split(':')[0] (or, if no port is specified, url.split('://')[1].split('/')[0])
port : url.split('://')[1].split(':')[1].split('/')[0]
actual url : url.split('?')[0].split('/').reverse()[0]
actions : url.split('?')[1].split('&')/*the most likely separator imho*/ The elements of this array can also be split('=') to separate variable names and values.
I know there's an opinion that split shouldn't be used, but I think it's just beautiful when used properly.
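The single-regex approach can be tried out quickly in Python, used here only for illustration since the question is about AS3. In this sketch the file-name group is written as `[\w.]+` so that the dot in robot10382.flv is accepted, because `\w+` alone stops at the dot. The function name `split_url` and the dictionary keys are made up.

```python
import re

# protocol :// dotted IPv4 : port / file name ? query string
URL_RE = re.compile(r"(\w+)://(\d+\.\d+\.\d+\.\d+):(\d+)/([\w.]+)\?(.+)")

def split_url(url):
    """Return the five URL parts as a dict, or None if the URL has another shape."""
    match = URL_RE.match(url)
    if match is None:
        return None
    keys = ("protocol", "ip", "port", "file", "actions")
    return dict(zip(keys, match.groups()))

print(split_url("http://127.0.0.1:4773/robot10382.flv?action=read"))
```

The pattern is deliberately narrow: URLs with a host name instead of an IPv4 address, without a port, or with nested paths return None, which is the kind of fragility the first answer warns about.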
Sometimes, when passing a file path to a SWF, you would like to perform a file-existence check before passing the file to an AS3 class. To do so, you need to know whether a URI is a file, an http URL, or any other URI with a specific protocol (moniker).
The following code will tell you if you are dealing with a local full or relative path.
http://younsi.blogspot.com/2009/08/as3-uri-parser-and-code-sequence-to.html

What's the best way to validate a user-entered URL in a Cocoa application?

I am trying to build a homebrew web browser to get more proficient at Cocoa. I need a good way to validate whether the user has entered a valid URL. I have tried some regular expressions, but NSString has some interesting quirks and doesn't like some of the back-quoting that most regular expressions I've seen use.
You could start with the + (id)URLWithString:(NSString *)URLString method of NSURL, which returns nil if the string is malformed.
If you need further validation, you can use the baseURL, host, parameterString, path, etc methods to give you particular components of the URL, which you can then evaluate in whatever way you see fit.
I've found that it is possible to enter some URLs that seem to be OK but are rejected by the NSURL creation methods. So we have a method to escape the string first to make sure it's in a good format. Here is the meat of it:
NSString *escapedURLString =
    NSMakeCollectable(CFURLCreateStringByAddingPercentEscapes(NULL,
        (CFStringRef)URLString,
        (CFStringRef)@"%+#", // Characters to leave unescaped
        NULL,
        kCFStringEncodingUTF8));