Regex: Matching, parsing an FTP response to a request

Here's what I'm trying to do:
I want to have some FTP functionality in one of my apps (this is just for myself, not a business application or such), and since I didn't want to write all that FTP request/response code myself, I (being the lazy man I am) searched the internet for an FTP wrapper.
I have found this DLL.
This all works like a charm, except for one thing: when I request the LastWriteTime of a specific file on the FTP server, the DLL gives me strange (fictional) dates. I've been able to find the problem. Whenever you send a request to the FTP server, it sends back a one-line response, which has a very particular format. From what I've been able to gather, this format differs between servers. My wrapper DLL comes with 6 pre-defined response formats, but my FTP server sends back a 7th one. Here's a response to a request, followed by the regex formats:
-rw-r--r-- 1 user user 594 Jun 11 03:44 random_log.file
Here are my regex parsing formats:
"(?<dir>[\-d])(?<permission>([\-r][\-w][\-xs]){3})\s+\d+\s+\w+\s+\w+\s+(?<size>\d+)\s+(?<timestamp>\w+\s+\d+\s+\d{4})\s+(?<name>.+)", _
"(?<dir>[\-d])(?<permission>([\-r][\-w][\-xs]){3})\s+\d+\s+\d+\s+(?<size>\d+)\s+(?<timestamp>\w+\s+\d+\s+\d{4})\s+(?<name>.+)", _
"(?<dir>[\-d])(?<permission>([\-r][\-w][\-xs]){3})\s+\d+\s+\d+\s+(?<size>\d+)\s+(?<timestamp>\w+\s+\d+\s+\d{1,2}:\d{2})\s+(?<name>.+)", _
"(?<dir>[\-d])(?<permission>([\-r][\-w][\-xs]){3})\s+\d+\s+\w+\s+\w+\s+(?<size>\d+)\s+(?<timestamp>\w+\s+\d+\s+\d{1,2}:\d{2})\s+(?<name>.+)", _
"(?<dir>[\-d])(?<permission>([\-r][\-w][\-xs]){3})(\s+)(?<size>(\d+))(\s+)(?<ctbit>(\w+\s\w+))(\s+)(?<size2>(\d+))\s+(?<timestamp>\w+\s+\d+\s+\d{2}:\d{2})\s+(?<name>.+)", _
"(?<timestamp>\d{2}\-\d{2}\-\d{2}\s+\d{2}:\d{2}[Aa|Pp][mM])\s+(?<dir>\<\w+\>){0,1}(?<size>\d+){0,1}\s+(?<name>.+)"
None of these seems to be able to parse the datetime correctly, and since I have no idea how to do that, can a regex pro please write me a ParsingFormat that would be able to parse the above FTP response?

Both a hand-check and an irb check of the fourth format show that it does match:
> re=/(?<dir>[\-d])(?<permission>([\-r][\-w][\-xs]){3})\s+\d+\s+\w+\s+\w+\s+(?<size>\d+)\s+(?<timestamp>\w+\s+\d+\s+\d{1,2}:\d{2})\s+(?<name>.+)/
=> /(?<dir>[\-d])(?<permission>([\-r][\-w][\-xs]){3})\s+\d+\s+\w+\s+\w+\s+(?<size>\d+)\s+(?<timestamp>\w+\s+\d+\s+\d{1,2}:\d{2})\s+(?<name>.+)/
> m=re.match("-rw-r--r-- 1 user user 594 Jun 11 03:44 random_log.file")
=> #<MatchData "-rw-r--r-- 1 user user 594 Jun 11 03:44 random_log.file" dir:"-" permission:"rw-r--r--" size:"594" timestamp:"Jun 11 03:44" name:"random_log.file">
> m['dir']
=> "-"
> m['permission']
=> "rw-r--r--"
> m['size']
=> "594"
> m['timestamp']
=> "Jun 11 03:44"
> m['name']
=> "random_log.file"
>
I think the pile of regular expressions is fine. Perhaps you need to look elsewhere for the problem.
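As a rough illustration of where "elsewhere" might be, here is a minimal Python sketch that ports the fourth format ((?<name>...) becomes (?P<name>...)) and then converts the captured timestamp to a date. One possibility, offered as a guess rather than a diagnosis: the listing carries no year, so whatever code turns "Jun 11 03:44" into a date has to supply one. The names LISTING_RE and parse_listing and the year argument are assumptions made for this sketch, not part of the DLL.

```python
import re
from datetime import datetime

# Python port of the fourth format from the question.
LISTING_RE = re.compile(
    r"(?P<dir>[\-d])(?P<permission>([\-r][\-w][\-xs]){3})\s+\d+\s+\w+\s+\w+\s+"
    r"(?P<size>\d+)\s+(?P<timestamp>\w+\s+\d+\s+\d{1,2}:\d{2})\s+(?P<name>.+)"
)


def parse_listing(line, year):
    """Return (name, size, datetime) for one listing line, or None if it does not match."""
    m = LISTING_RE.match(line)
    if m is None:
        return None
    # The listing has no year, so the caller must supply one (an assumption of this sketch).
    when = datetime.strptime(f"{year} {m.group('timestamp')}", "%Y %b %d %H:%M")
    return m.group("name"), int(m.group("size")), when


print(parse_listing("-rw-r--r-- 1 user user 594 Jun 11 03:44 random_log.file", 2013))
```

If the regex matches but the resulting date is still wrong, the date-conversion step is the next thing worth inspecting.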

Related

Regex for filtering out the groups and if there is specific string in the group then extract that into another group

Hi, I am trying to match 3 log lines with a regex. The issue I face is that the pattern is not dynamic: if a value changes, the regex stops working for that group.
A practical example should give a better understanding: https://regex101.com/r/sdoZaH/1
In this example, group 1 (<address>) works on the 1st log line only; it is not able to identify the string in the 2nd line.
In the <message> group, if there is an IP Addr I want it to be captured in a separate group; otherwise <message> should cover the remaining part of the line.
How do I make the pattern dynamic so that it matches all lines?
The lines I am trying to match:
Mar 21 23:31:19 c10sw1 raslogd: AUDIT, 2022/03/21-23:31:19 (PDT), [SEC-3020], INFO, SECURITY, admin/admin/test.domain.com/ssh/CLI, ad_0/c10sw1/FID 128, 8.2.1c, , , , , , , Event: login, Status: success, Info: Successful login attempt via REMOTE, IP Addr: test.domain.com.
Mar 21 23:37:13 c10-M1000e-SW1 raslogd: AUDIT, 2022/03/21-23:37:13 (PDT), [SEC-3022], INFO, SECURITY, admin/admin/test.domain.com/ssh/CLI, ad_0/c10-M1000e-SW1/FID 128, 8.2.2b, , , , , , , Event: logout, Status: success, Info: Successful logout by user [admin].
Mar 21 23:37:13 c10-M1000e-SW1 raslogd: AUDIT, 2022/03/21-23:37:13 (PDT), [SEC-3022], INFO, SECURITY, admin/admin/test.domain.com/ssh/CLI, ad_0/c10-M1000e-SW1/FID 128, 8.2.2b, , , , , , , Event: logout, Status: success, Info: Successful logout by user [admin].
Please try the following pattern:
^[A-Za-z]+[\d\s:]+(?<address>\D\w+)\s.+?,\s(?<time>\d+\/\d+\/\d+\-\d+\:\d+\:\d+).+?\s\w+\/.+?\/(?<domain>.+?)\/(?<destinationprocess>.+?)\/(?<sourceprocess>.+?),.+Event:\s(?<eventtype>.+?),.+Status:\s(?<status>.+?),\sInfo:\s(?<message>.+)$
Could you please add various valid and invalid strings to the question?
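For comparison, here is a reduced Python sketch of the same idea that addresses the two points raised in the question. It assumes the host name is simply the first run of non-whitespace after the syslog time (so the hyphens in c10-M1000e-SW1 are accepted, which \w+ would not allow), and it puts the trailing "IP Addr:" part into its own optional group. The group name ip and the names LOG_RE and LINES are assumptions for illustration; the domain and process groups are left out for brevity.

```python
import re

LOG_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<address>\S+)\s"   # syslog date and time, then host
    r".+?,\s(?P<time>\d+/\d+/\d+-\d+:\d+:\d+)"      # audit timestamp
    r".+?Event:\s(?P<eventtype>[^,]+)"
    r",\sStatus:\s(?P<status>[^,]+)"
    r",\sInfo:\s(?P<message>.+?)"                   # lazy, so the optional group can match
    r"(?:,\sIP Addr:\s(?P<ip>\S+?))?\.?$"           # optional IP Addr, optional final dot
)

LINES = [
    "Mar 21 23:31:19 c10sw1 raslogd: AUDIT, 2022/03/21-23:31:19 (PDT), [SEC-3020], "
    "INFO, SECURITY, admin/admin/test.domain.com/ssh/CLI, ad_0/c10sw1/FID 128, 8.2.1c, "
    ", , , , , , Event: login, Status: success, "
    "Info: Successful login attempt via REMOTE, IP Addr: test.domain.com.",
    "Mar 21 23:37:13 c10-M1000e-SW1 raslogd: AUDIT, 2022/03/21-23:37:13 (PDT), [SEC-3022], "
    "INFO, SECURITY, admin/admin/test.domain.com/ssh/CLI, ad_0/c10-M1000e-SW1/FID 128, 8.2.2b, "
    ", , , , , , Event: logout, Status: success, "
    "Info: Successful logout by user [admin].",
]

for line in LINES:
    m = LOG_RE.match(line)
    # ip is None when the line has no "IP Addr:" part.
    print(m.group("address"), m.group("message"), m.group("ip"), sep=" | ")
```

Because <message> is lazy and the IP group is optional, the message stops before ", IP Addr:" when that part is present and otherwise runs to the end of the line.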

Why might mutt email be accepted/rejected by windows recipient as a function of alphabetic string content in the body of html file being sent?

Calling mutt-1.5.24 on linux.
I'm seeing some very odd behavior when emailing an html file from linux to windows/outlook using mutt on linux. Example of the mutt call...
mutt -e 'set content_type=text/html' -s 'yuk, yuk, yuk' 'moe.howard@stooge.com' < a.html
The email does not show up on the windows side. mutt returned no error or warning on the linux side. Now, here's the odd part... If I global/replace the string "pcie" in the body of the html to "pcix", the email appears on the windows/outlook side just fine. OR... if I global/replace "ity" to "..." it also works fine (even if I leave "pcie" alone). But changing "ity" to "xxx" fails. Very odd character sensitivity behavior like this.
In my home dir on the linux side I see a file ~/sent getting created. The header (whether the email made it to the windows/outlook side or not) looks like...
From m.howard@theserver.stooge.com Thu Jan 28 18:49:29 2021
Date: Thu, 28 Jan 2021 18:49:29 -0500
From: Moe Howard <mhoward@theserver.stooge.com>
To: moe.howard@stooge.com
Subject: yuk, yuk, yuk
Message-ID: <20210128234929.GA48266@atletx7-reg062.amd.com>
MIME-Version: 1.0
Content-Type: text/html; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.24 (2015-08-30)
Status: RO
Content-Length: 20537
Lines: 122
<html>
....etc... the rest of the html which firefox reads just fine if I get rid of the header above
Grasping at straws, I looked at the "charset=us-ascii" in the "sent" file, thinking it should be something else. So I tried providing other options by adding "-e 'set assumed_charset=utf-8:us-ascii'" to the command. No luck.
Any ideas what might be happening and what a solution might be?
Figured it out. All my email actually arrived in Outlook. It's just that it got sent to the junk folder, labeled as spam. So if the body of the html contains "pcie", it's spam. But "pcix" is not. Got to go undo that now.

Scripting the cisco banner with Net::Appliance::Session

Has anyone run into this issue? When the script gets to the banner text, it just hangs.
I am using Net::Appliance::Session
Here is the error I get in debug. The rest of the script inserts code perfectly. I did test what I read about adding a # to the banner for each line. Same result.
banner login +
[ 4.092880] tr nope, doesn't (yet) match (?-xism:[\/a-zA-Z0-9._\[\]-]+ ?(?:\(config[^)]*\))? ?[#>] ?$)
[ 4.093124] du SEEN:
banner login +
[ 4.093304] tr nope, doesn't (yet) match (?-xism:[\/a-zA-Z0-9._\[\]-]+ ?(?:\(config[^)]*\))? ?[#>] ?$)
[ 4.305872] du SEEN:
Enter TEXT message. End with the character '+'
[ 4.306121] tr nope, doesn't (yet) match (?-xism:[\/a-zA-Z0-9._\[\]-]+ ?(?:\(config[^)]*\))? ?[#>] ?$)
We had an issue when accessing the device : 10.49.216.74
The reported error was : read timed-out at /usr/lib/perl5/site_perl/5.10.0/Net/CLI/Interact/Transport/Wrapper/Net_Telnet.pm line 35
Here is a snip of code.
my $session_obj = Net::Appliance::Session->new(
    host => $ios_device_ip,
    transport => 'Telnet',
    personality => 'ios',
    timeout => 60,
);
#interace
$session_obj->set_global_log_at('debug');
eval {
    # try to login to the ios device, ignoring host check
    $session_obj->connect(
        username => $ios_username,
        password => $ios_password,
        #SHKC => 0
    );
    # get our running config
    $version_info = $session_obj->begin_privileged;
    $session_obj->cmd('conf t');
    $session_obj->cmd('line con 0');
    $session_obj->cmd('exec-character-bits 8');
    $session_obj->cmd('international');
    $session_obj->cmd('line vty 0 4');
    $session_obj->cmd('exec-character-bits 8');
    $session_obj->cmd('international');
    $session_obj->cmd('line vty 5 15');
    $session_obj->cmd('exec-character-bits 8');
    $session_obj->cmd('international');
    $session_obj->cmd('exit');
    $session_obj->cmd('no banner login');
    $session_obj->cmd('banner login +');
    $session_obj->cmd('*************************************************************************');
    $session_obj->cmd('* test *');
    $session_obj->cmd('* *');
    $session_obj->cmd('*************************************************************************');
    $session_obj->cmd('+');
    $session_obj->cmd('no banner MOTD');
    $session_obj->cmd('banner motd +');
    $session_obj->cmd('*************************************************************************');
    $session_obj->cmd('* test *');
    $session_obj->cmd('* *');
    $session_obj->cmd('*************************************************************************');
    $session_obj->cmd('+');
    $session_obj->cmd('exit');
    $session_obj->cmd('write memory');
    $session_obj->end_privileged;
    # close down our session
    $session_obj->close;
};
If you look at the regexp that matches the prompt before sending a new command, you'll see that it requires a specific string that closely matches the user, privileged or config mode prompt of a router.
When you send the banner login + command, the router replies with Enter TEXT message. End with the character '+' followed by a blank line (instead of the Router(config)# prompt that your script expects). After a while the session just times out, since nothing matches the regexp.
The easiest solution is to send the whole banner in one command. Try concatenating your banner lines with \r into one string and sending it as a single command, like this (note the double quotes):
$session_obj->cmd("banner login + line1 \r line2 \r line3\r +");
Took way too long to figure this out... spaces are not your friend.
$session_obj->cmd("banner login + \rline1\rline2\rline3\r+");
Example with my original problem (double-quoted, so that \r is interpolated as a carriage return):
$session_obj->cmd("banner login +\r*************************************************************************\r* test *\r* *\r*************************************************************************\r+");

Perl Regex issues

Why isn't this Perl regex working? I'm grabbing the date and username (the date works fine). It grabs the usernames correctly until it hits bob.thomas, where it grabs the entire line.
Code:
m/^(.+)\s-\sUser\s(.+)\s/;
print "$2------\n";
Sample Data:
Feb 17, 2013 12:18:02 AM - User plasma has logged on to client from host
Feb 17, 2013 12:13:00 AM - User technician has logged on to client from host
Feb 17, 2013 12:09:53 AM - User john.doe has logged on to client from host
Feb 17, 2013 12:07:28 AM - User terry has logged on to client from host
Feb 17, 2013 12:04:10 AM - User bob.thomas has been logged off from host because its web server session timed out. This means the web server has not received a request from the client in 3 minute(s). Possible causes: the client process was killed, the client process is hung, or a network problem is preventing access to the web server.
for the user that asked for the full code
open (FILE, "log") or die print "couldn't open file";
$record=0;
$first=1;
while (<FILE>)
{
    if(m/(.+)\sto now/ && $first==1) # find the area to start recording
    {
        $record=1;
        $first=0;
    }
    if($record==1)
    {
        m/^(.+)\s-\sUser\s(.+)\s/;
        <STDIN>;
        print "$2------\n";
        if(!exists $user{$2})
        {
            $users{$2}=$1;
        }
    }
}
.+ is greedy, it matches the longest possible string. If you want it to match the shortest, use .+?:
/^(.+)\s-\sUser\s(.+?)\s/;
Or use a regexp that doesn't match whitespace:
/^(.+)\s-\sUser\s(\S+)/;
Use the reluctant (ungreedy) quantifier to match up to the first occurrence rather than the last. You should do this for both groups, in case the line also contains " - User " later on:
m/^(.+?)\s-\sUser\s(.+?)\s/;
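The difference between the greedy and the reluctant quantifier can be seen in a small Python sketch (Python's re module uses the same .+ / .+? semantics for this case). The sample line is a shortened copy of the bob.thomas line from the question; the variable names are only illustrative.

```python
import re

line = ("Feb 17, 2013 12:04:10 AM - User bob.thomas has been logged off "
        "from host because its web server session timed out.")

# Greedy: (.+) before the final \s runs up to the LAST whitespace on the line.
greedy = re.match(r"^(.+)\s-\sUser\s(.+)\s", line)
# Reluctant: (.+?) stops at the FIRST whitespace after the user name.
lazy = re.match(r"^(.+?)\s-\sUser\s(.+?)\s", line)
# Alternative: \S+ cannot cross whitespace at all.
non_space = re.match(r"^(.+)\s-\sUser\s(\S+)", line)

print(repr(greedy.group(2)))
print(repr(lazy.group(2)))
print(repr(non_space.group(2)))
```

With the greedy pattern the second group swallows almost the whole message, while the other two variants stop after the user name.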

Separating HTTP Response Body from Header in C++

I'm currently writing my own C++ HTTP class for a certain project, and I'm trying to find a way to separate the response body from the headers, because the body is the only part I need to return.
Here's a sample of the raw HTTP headers, in case you're not familiar with them:
HTTP/1.1 200 OK
Server: nginx/0.7.65
Date: Wed, 29 Dec 2010 06:13:07 GMT
Content-Type: text
Connection: keep-alive
Vary: Cookie
Content-Length: 82
Below that is the HTML/Response body. What would be the best way to do this? I'm only using Winsock library for the requests by the way (I don't even think this matters).
Thanks in advance.
HTTP headers are terminated by the sequence \r\n\r\n (a blank line). Just search for that, and return everything after it. (The body may be absent, of course, e.g. if the response was to a HEAD request.)
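A minimal sketch of that search, written in Python for brevity (the same idea carries over to std::string::find in C++). It assumes the complete response has already been read into a bytes buffer; the function name split_http_response is made up for this example, and chunked transfer encoding is not handled.

```python
def split_http_response(raw: bytes):
    """Split a raw HTTP response into (headers, body) at the first blank line.

    If no blank line is found, everything is treated as headers and the body is empty.
    """
    headers, _separator, body = raw.partition(b"\r\n\r\n")
    return headers, body


raw = b"HTTP/1.1 200 OK\r\nContent-Type: text\r\nContent-Length: 5\r\n\r\nhello"
headers, body = split_http_response(raw)
print(body)
```

Splitting at the first occurrence matters, since the body itself may contain further blank lines.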
Do you need to roll your own? There are C/C++ libraries out there for doing HTTP, e.g. libcurl. If you need to support the full gamut of HTTP, then it's not always a simple delineation. You might also have to cater, for example, for chunked encoding.
DO
    IF Socket.IsServerReady(Sock) THEN
        Text = text + Socket.Read(Sock, 65000) 'print text '' 32000 bytes... whatever they give us
        Bytes = bytes + Socket.Transferred
        StatusBar.Panel(0).Caption = "Bytes Read: " + STR$(Bytes)
    END IF
    'RichEdit.addstrings text
    zzz=Bytes
LOOP UNTIL Socket.Transferred = 0
RichEdit.Clear
RichEdit.Text = text
Socket.Close(Sock)
dim mem as qmemorystream
dim S$ as string
S$ = text
for n=0 to 400
    buff$=mid$(S$,n,5)
    if buff$="alive" then ' found end of headers
        richedit1.addstrings (buff$)
        richedit1.addstrings (mid$(S$,n,9))
        richedit1.addstrings str$(n+9)
        zzz=n+8 'offset + 8 bit space after headers and before Bof
    end if
next n
Mem.WriteStr(S$, LEN(S$)) 'write entire file to memory
Mem.Position = zzz ' use offset as Start position
S$ = Mem.ReadStr(LEN(S$)) ' read rest of file into string till Eof
Mem.Close ' dont forget to close
'PRINT S$ '' print it
Filex.Open("c:/CAP.AVI", fmCreate) 'create file on system
filex.WriteBinStr(S$,len(S$)-zzz) ' write to it
filex.close 'dont forget to close