Multiline regex on a Spark RDD[String]

I'm trying to parse a log file in Spark 1.6 using Scala. Here is the sample data:
2017-02-04 04:48:11,123 DEBUG [org.quartz.core.QuartzSchedulerThread] - <batch acquisition of 0 triggers>
2017-02-04 04:48:20,892 INFO [org.jasig.inspektr.audit.support.Slf4jLoggingAuditTrailManager] - <Audit trail record BEGIN
=============================================================
WHO: audit:unknown
WHAT: TGT-7d937-yRqp6ObM7JOtkUZ7Ff4yEo95-casino1.example.org
ACTION: TICKET_GRANTING_TICKET_DESTROYED
APPLICATION: CASINO
WHEN: Sat Feb 04 04:48:20 AEDT 2017
CLIENT IP ADDRESS: 160.50.201.557
SERVER IP ADDRESS: login.cfu.asg
=============================================================
>
2017-02-04 04:48:32,165 INFO [org.jasig.cas.services.DefaultServicesManagerImpl] - <Reloading registered services.>
2017-02-04 04:48:32,167 INFO [org.jasig.casino.services.DefaultServicesManagerImpl] - <Loaded 2 services.>
2017-02-04 04:48:38,889 DEBUG [org.quartz.core.QuartzSchedulerThread] - <batch acquisition of 1 triggers>
2017-02-04 04:48:52,790 DEBUG [org.quartz.core.QuartzSchedulerThread] - <batch acquisition of 0 triggers>
2017-02-04 04:48:52,790 DEBUG [org.quartz.core.JobRunShell] - <Calling execute on job DEFAULT.serviceRegistryReloaderJobDetail>
2017-02-04 04:48:52,790 INFO [org.jasig.casino.services.DefaultServicesManagerImpl] - <Reloading registered services.>
2017-02-04 04:48:52,792 DEBUG [org.jasig.casino.services.DefaultServicesManagerImpl] - <Adding registered service ^(https?|imaps?)://.*>
2017-02-04 04:48:52,792 DEBUG [org.jasig.casino.services.DefaultServicesManagerImpl] - <Adding registered service
2017-02-04 04:48:52,792 INFO [org.jasig.casino.services.DefaultServicesManagerImpl] - <Loaded 2 services.>
2017-02-04 04:49:14,365 INFO [org.jasig.casino.services.DefaultServicesManagerImpl] - <Reloading registered services.>
2017-02-04 04:49:14,366 INFO [org.jasig.casino.services.DefaultServicesManagerImpl] - <Loaded 2 services.>
2017-02-04 04:49:19,699 DEBUG [org.quartz.core.QuartzSchedulerThread] - <batch acquisition of 0 triggers>
2017-02-04 04:49:43,465 DEBUG [org.quartz.core.QuartzSchedulerThread] - <batch acquisition of 0 triggers>
2017-02-04 04:50:00,978 INFO [org.jasig.casino.authentication.PolicyBasedAuthenticationManager] - <JaasAuthenticationHandler successfully authenticated >
2017-02-04 04:50:00,978 INFO [org.jasig.casino.authentication.PolicyBasedAuthenticationManager] - <Authenticated 3785973 with credentials.>
2017-02-04 04:50:00,978 INFO [org.jasig.inspektr.nhgij.support.Slf4jLogggbhAuditTrailManaver] - <Audit trail record BEGIN
=============================================================
WHO: z3705z73
WHAT: supplied credentials: [d37c5973]
ACTION: AUTHENTICATION_SUCCESS
APPLICATION: casinoINO
WHEN: Sat Feb 04 04:50:00 AEDT 2017
CLIENT IP ADDRESS: 101.181.28.555
SERVER IP ADDRESS: login.cfu.asg
=============================================================
>
The data goes on like this. Other log lines can appear between the audit records, but they are not relevant to my parsing. I have about 40 GB of files, each containing one day's data.
All these files are gzip-compressed. I tried using sc.wholeTextFiles to get a pair RDD, but I ran into Java heap space errors because each file is between 400 MB and 800 MB (uncompressed).
So I started using sc.textFile and experimenting with reading a single file. I can create an RDD[String], and luckily sc.textFile does not cause any heap space issues when I run an action on this RDD.
Here is the code I tried:
import scala.util.matching.Regex
// sc.textFile yields an RDD[String] with one element per line
val casinop2 = sc.textFile("/logdata/casino/catalina.out-20150228.gz")
val casop = casinop2.flatMap(x => x.split("\n"))
  .filter(x => !(x.contains("Reloading registered services") || x.contains("Loaded 2 services.") || x.contains("DEBUG") || x.contains("ERROR") || x.contains("java.lang.RuntimeException") || x.contains("Caused by:") || x.contains("Granted ticket") || x.contains("java.lang.IllegalStateException") || x.startsWith("\t") || x.contains("org.jasig.cas.authentication.PolicyBasedAuthenticationManager")))
val pattern = new Regex("""((\d{4})-(\d{2})-\d{2}\s\d{2}:\d{2}:\d{2}),\d{3}\s+(\w+)\s+\[(.*)\]\s+\-\s+\<.*\s\=*\s+([W][H][O]\:)\s+(.*)\s+([W][H][A][T]\:)\s+(.*)\s+([A][C][T][I][O][N]\:)\s+(.*)\s+([A][P][P][L][I][C][A][T][I][O][N]\:)\s+(.*)\s+([W][H][E][N]\:)\s+(.*)\s+([A-Z\s]{17}\:)\s+(.*)\s+([A-Z\s]{17}\:)\s+(.*)\s+\=*\s\s\>""")
pattern: scala.util.matching.Regex = ((\d{4})-(\d{2})-\d{2}\s\d{2}:\d{2}:\d{2}),\d{3}\s+(\w+)\s+\[(.*)\]\s+\-\s+\<.*\s\=*\s+([W][H][O]\:)\s+(.*)\s+([W][H][A][T]\:)\s+(.*)\s+([A][C][T][I][O][N]\:)\s+(.*)\s+([A][P][P][L][I][C][A][T][I][O][N]\:)\s+(.*)\s+([W][H][E][N]\:)\s+(.*)\s+([A-Z\s]{17}\:)\s+(.*)\s+([A-Z\s]{17}\:)\s+(.*)\s+\=*\s\s\>
case class MLog(datetime: String, message: String, process: String, who: String, what: String, action: String, application: String, when: String, clientipaddress: String, serveripaddress: String,year: String, month: String)
pattern.findAllMatchIn(casop.collect.mkString("\n")).toList
The last statement throws a heap space error. The reason I want the RDD in a single string variable is that the regex needs multiline input, not a single line. For single-line input I would use map, flatMap, etc.
The output I should get from the log file is:
|2017-02-04 04:54:41| INFO|org.jasig.inspekt...| s4542732|supplied credenti...|AUTHENTICATION_SU...| CAS|Sat Feb 04 04:54:...| 175.163.28.77|login.vu.edu.au|2017| 02|
|2017-02-04 04:54:41| INFO|org.jasig.inspekt...| s4542732|TGT-78959-EX63Wf2...|TICKET_GRANTING_T...| CAS|Sat Feb 04 04:54:...| 175.163.28.77|login.vu.edu.au|2017| 02|
|2017-02-04 04:54:41| INFO|org.jasig.inspekt...| 4542732|ST-474481-jTxCJFB...|SERVICE_TICKET_CR...| CAS|Sat Feb 04 04:54:...| 175.163.28.77|login.vu.edu.au|2017| 02|
|2017-02-04 04:54:44| INFO|org.jasig.inspekt...|audit:unknown|ST-474481-jTxCJFB...|SERVICE_TICKET_VA...| CAS|Sat Feb 04 04:54:...| 203.13.194.68|login.vu.edu.au|2017| 02|
|2017-02-04 04:55:02| INFO|org.jasig.inspekt...| s3785573|supplied credenti...|AUTHENTICATION_SU...| CAS|Sat Feb 04 04:55:...| 101.181.28.125|login.vu.edu.au|2017| 02|
|2017-02-04 04:55:02| INFO|org.jasig.inspekt...| s3785573|TGT-78960-yWaWkcN...|TICKET_GRANTING_T...| CAS|Sat Feb 04 04:55:...| 101.181.28.125|login.vu.edu.au|2017| 02|
|2017-02-04 04:55:02| INFO|org.jasig.inspekt...| 3785573|ST-474482-rARxdUG...|SERVICE_TICKET_CR...| CAS|Sat Feb 04 04:55:...| 101.181.28.125|login.vu.edu.au|2017| 02|
|2017-02-04 04:55:02| INFO|org.jasig.inspekt...|audit:unknown|ST-474482-rARxdUG...|SERVICE_TICKET_VA...| CAS|Sat Feb 04 04:55:...| 203.13.194.68|login.vu.edu.au|2017| 02|
+-------------------+-------+--------------------+-------------+--------------------+--------------------+-----------+--------------------+---------------+---------------+----+-----+
How can I read multiline input and feed it to a regex?

I have fixed and improved your regex, and it should now work for your last logs, which span several lines.
The regex is the following beast:
(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}),\d{3}\s+(\w+)\s+\[(.*)\]\s+\-\s+<[^>]*\s\=*\s+WHO\:\s+([^>\n]*)\s+WHAT\:\s+([^>\n]*)\s+ACTION\:\s+([^>\n]*)\s+APPLICATION\:\s+([^>\n]*)\s+WHEN\:\s+([^>\n]*)\s+([A-Z\s]{17}\:)\s+([^>\n]*)\s+([A-Z\s]{17}\:)\s+([^>\n]*)\s+\=*\s\s>
I tried it on your logs using the following replacement pattern, which you should adapt to your exact needs:
\1 | \2 | \3 | WHO:\4 | WHAT: \5 | ACTION: \6 | APPLICATION: \7 | WHEN: \8 | \9 $10 | $11 $12
Last but not least, you might have to increase your heap size, for example with --executor-memory 10g.
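An alternative to collecting the whole RDD into one string is to regroup the lines into one string per log record and run the regex on each record. The following Python sketch illustrates that idea outside Spark. The names group_records, parse_audit, RECORD_START and AUDIT are hypothetical, and AUDIT is a simplified pattern written for the sample data, not the exact regex above. A gzip file is not splittable, so sc.textFile should read each file as a single ordered partition; under that assumption the same streaming logic could presumably run inside mapPartitions instead of on the driver.

```python
import re

# A new record starts on a line that begins with a timestamp such as
# "2017-02-04 04:48:20,892"; any other line continues the previous record.
RECORD_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} ")

# Simplified audit-record pattern (an assumption based on the sample data).
AUDIT = re.compile(
    r"^(?P<datetime>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\s+"
    r"(?P<level>\w+)\s+\[(?P<process>[^\]]+)\]\s+-\s+"
    r"<Audit trail record BEGIN\s+=+\s+"
    r"WHO: (?P<who>[^\n]*)\n"
    r"WHAT: (?P<what>[^\n]*)\n"
    r"ACTION: (?P<action>[^\n]*)\n"
    r"APPLICATION: (?P<application>[^\n]*)\n"
    r"WHEN: (?P<when>[^\n]*)\n"
    r"CLIENT IP ADDRESS: (?P<client_ip>[^\n]*)\n"
    r"SERVER IP ADDRESS: (?P<server_ip>[^\n]*)\n"
    r"=+\s+>"
)


def group_records(lines):
    """Yield one string per log record, joining continuation lines."""
    current = []
    for line in lines:
        line = line.rstrip("\n")
        # A timestamped line closes the record collected so far.
        if RECORD_START.match(line) and current:
            yield "\n".join(current)
            current = []
        current.append(line)
    if current:
        yield "\n".join(current)


def parse_audit(lines):
    """Yield a dict of named fields for every audit record in `lines`."""
    for record in group_records(lines):
        match = AUDIT.match(record)
        if match:
            yield match.groupdict()
```

Because both functions are generators, only one record is held in memory at a time, which is the point of avoiding collect on files of this size.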

Related

Vtiger 7.2 query operations to REST API returning 500 errors

I've recently upgraded from Vtiger 6 to 7.2 (a clean installation) and all my requests to the REST API that use the query operation are no longer working. It doesn't matter which module the request is for e.g. Contacts, Leads, Accounts. All other types of operations are working e.g. retrieve, describe, but a query such as select * from Contacts where email = 'foo#bar.com'; will fail with a 500 Internal Server Error returned from the Vtiger server.
Here's an example of my HTTP request (query param is left unencoded for readability):
https://crm.myendpoint.com/webservice.php?sessionName=[mysession]&operation=query&query=select * from Contacts where email = 'foo#bar.com';
The code I'm using to make my queries is completely unmodified from when I was using version 6 of Vtiger, and the requests were working fine then. I've switched on debug logging on the server, but there are no errors.
The server is receiving and processing the request, though. At one point it dumps the data for the Contact that I'm querying to the log (which all looks correct), and then here are the last few lines of logging before it ends:
Mon Jan 20 17:13:41 2020,292 [8010] DEBUG webservice - Entering isPermitted(Contacts,DetailView,) method ...
Mon Jan 20 17:13:41 2020,292 [8010] DEBUG webservice - Entering getActionid(DetailView) method ...
Mon Jan 20 17:13:41 2020,292 [8010] INFO webservice - get Actionid DetailView
Mon Jan 20 17:13:41 2020,292 [8010] INFO webservice - action id selected is 4
Mon Jan 20 17:13:41 2020,292 [8010] DEBUG webservice - Exiting getActionid method ...
Mon Jan 20 17:13:41 2020,292 [8010] DEBUG webservice - Exiting isPermitted method ...
Mon Jan 20 17:13:41 2020,293 [8010] DEBUG webservice - Entering getColumnFields(Accounts) method ...
Mon Jan 20 17:13:41 2020,293 [8010] DEBUG webservice - in getColumnFields Accounts
Mon Jan 20 17:13:41 2020,293 [8010] DEBUG webservice - Prepared sql query being executed : SELECT tabid, fieldname, fieldid, fieldlabel, columnname, tablename, uitype, typeofdata, presence
FROM vtiger_field WHERE tabid in (?)
Mon Jan 20 17:13:41 2020,293 [8010] DEBUG webservice - Prepared sql query parameters : [6]
Mon Jan 20 17:13:41 2020,293 [8010] DEBUG webservice - Exiting getColumnFields method ...
Mon Jan 20 17:13:41 2020,294 [8010] DEBUG webservice - Entering getColumnFields(Accounts) method ...
Mon Jan 20 17:13:41 2020,294 [8010] DEBUG webservice - in getColumnFields Accounts
Mon Jan 20 17:13:41 2020,294 [8010] DEBUG webservice - Exiting getColumnFields method ...
Mon Jan 20 17:13:41 2020,294 [8010] DEBUG webservice - Prepared sql query being executed : select 1 from vtiger_crmentity where crmid=? and deleted=0 and setype='Accounts'
Mon Jan 20 17:13:41 2020,294 [8010] DEBUG webservice - Prepared sql query parameters : [9637]
Mon Jan 20 17:13:41 2020,294 [8010] DEBUG user - Entering Users() method ...
Mon Jan 20 17:13:41 2020,294 [8010] DEBUG webservice - Entering getColumnFields(Users) method ...
Mon Jan 20 17:13:41 2020,294 [8010] DEBUG webservice - in getColumnFields Users
Mon Jan 20 17:13:41 2020,294 [8010] DEBUG webservice - Exiting getColumnFields method ...
Mon Jan 20 17:13:41 2020,295 [8010] DEBUG user - Exiting Users() method ...
Mon Jan 20 17:13:41 2020,295 [8010] DEBUG webservice - Entering getColumnFields(Users) method ...
Mon Jan 20 17:13:41 2020,295 [8010] DEBUG webservice - in getColumnFields Users
Mon Jan 20 17:13:41 2020,295 [8010] DEBUG webservice - Exiting getColumnFields method ...
Mon Jan 20 17:13:41 2020,295 [8010] DEBUG webservice - Prepared sql query being executed : select 1 from vtiger_users where id=? and deleted=0 and status='Active'
Mon Jan 20 17:13:41 2020,295 [8010] DEBUG webservice - Prepared sql query parameters : [1]
Mon Jan 20 17:13:41 2020,295 [8010] DEBUG webservice - Prepared sql query being executed : select groupname from vtiger_groups where groupid = ?
Mon Jan 20 17:13:41 2020,295 [8010] DEBUG webservice - Prepared sql query parameters : [1]
Mon Jan 20 17:13:41 2020,295 [8010] DEBUG webservice - Prepared sql query being executed : select first_name from vtiger_users where id = ?
Mon Jan 20 17:13:41 2020,295 [8010] DEBUG webservice - Prepared sql query parameters : [1]
I was thinking that this might be some kind of permission issue, but surely the server wouldn't return a 500 in that situation. In any case, I've tried running queries with 2 different users, both of which have an admin role.
This problem only happens with query operations, irrespective of the module being queried. Is there any way I can debug this further?
This fix got it working.
In summary, the non-existent method getAllAccessibleTags, called at line 199 of include/Webservices/VtigerModuleOperation.php, needs to be changed to getAllAccessible.
This bug was addressed by issue #1217, "Fetch tag details of records for requests made via Webservices", and the fix should be included in Vtiger 7.2.1, which has not yet been released at the time of writing. Here are the changes implemented in commits 7881fde4 and 072b5cee:
include/Webservices/VtigerModuleOperation.php [176]
$result = $this->pearDB->pquery($mysql_query, array());
+ $tableIdColumn = $meta->getIdColumn();
$error = $this->pearDB->hasFailedTransaction();
include/Webservices/VtigerModuleOperation.php [191]
- if(!$meta->hasPermission(EntityMeta::$RETRIEVE,$row["crmid"])){
+ if(!$meta->hasPermission(EntityMeta::$RETRIEVE,$row[$tableIdColumn])){
include/Webservices/VtigerModuleOperation.php [194]
- $output[] = DataTransform::sanitizeDataWithColumn($row,$meta);
+ $output[$row[$tableIdColumn]] = DataTransform::sanitizeDataWithColumn($row,$meta);
modules/Vtiger/models/Tag.php [302]
+
+    /**
+     * Function used to return tags for list for records
+     * @param <Array> $records - record ids
+     * @return <Array> tags
+     */
+    public static function getAllAccessibleTags($records) {
+        $tagsList = array();
+        if(count($records) == 0) return $tagsList;
+
+        $currentUser = Users_Record_Model::getCurrentUserModel();
+
+        $db = PearDatabase::getInstance();
+        $query = "SELECT tag,object_id FROM vtiger_freetags
+            INNER JOIN vtiger_freetagged_objects ON vtiger_freetags.id = vtiger_freetagged_objects.tag_id
+            WHERE (vtiger_freetagged_objects.tagger_id = ? OR vtiger_freetags.visibility='public')
+            AND vtiger_freetagged_objects.object_id IN
+            (" . generateQuestionMarks($records) . ")";
+        $params = array($currentUser->getId());
+        $params = array_merge($params, $records);
+
+        $result = $db->pquery($query , $params);
+        $num_rows = $db->num_rows($result);
+
+
+        for($i=0; $i<$num_rows; $i++) {
+            $tagName = decode_html($db->query_result($result, $i, 'tag'));
+            $record = decode_html($db->query_result($result, $i, 'object_id'));
+
+            if(empty($tagsList[$record])) {
+                $tagsList[$record] = $tagName;
+            } else {
+                $tagsList[$record] .= ','.$tagName;
+            }
+        }
+        return $tagsList;
+    }

How to prefix a piece of text in "git log" output using a shell script

I need to turn each commit hash in the output of git log into a GitHub link in an email, so that the recipient can click the link and go directly to the change.
I receive a long list containing lines with the text:
commit some_long_string_of_hexadecimals
and I need to transform this into:
commit https://github.com/account/repo/commit/some_long_string_of_hexadecimals
The log I receive contains any number of these entries, so I need the script to do this for every occurrence of some_long_string_of_hexadecimals.
Here are a few example log statements:
commit a98a897a67896a987698a769786a987a6987697a6
Author: Some Person <some#email.com>
Date: Thu Sep 29 09:48:52 2016 +0200
long message describing change.
commit a98a897a67896a987698a769786a987a6987697a6
Author: Some Person <some#email.com>
Date: Thu Sep 29 09:48:52 2016 +0200
more description
I'd like it to look like this:
commit https://github.com/account/repo/commit/a98a897a67896a987698a769786a987a6987697a6
Author: Some Person <some#email.com>
Date: Thu Sep 29 09:48:52 2016 +0200
added handling of running tests from within a docker container
How do I achieve this using a shell command?
Thanks in advance.
awk '$1 == "commit" {$2 = "https://github.com/account/repo/commit/" $2} 1'
Check whether field 1 equals "commit".
If so, prepend the URL to field 2.
The trailing 1 prints every line: the modified line if it matched, otherwise the line as is.
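For cases where the log is already being processed in a script rather than piped through awk, the same transformation can be sketched in Python. This is only an illustration: link_commits, COMMIT_LINE and BASE_URL are hypothetical names, and the URL is the placeholder from the question.

```python
import re

# Placeholder repository URL taken from the question.
BASE_URL = "https://github.com/account/repo/commit/"

# "commit <hex hash>" at the start of a line; indented lines (commit
# messages in default git log output) are left alone.
COMMIT_LINE = re.compile(r"^commit ([0-9a-f]+)\b", re.MULTILINE)


def link_commits(log_text, base_url=BASE_URL):
    """Return log_text with every commit hash turned into a full URL."""
    # A function replacement avoids any escaping issues in base_url.
    return COMMIT_LINE.sub(lambda m: "commit " + base_url + m.group(1), log_text)
```

Usage would look like print(link_commits(sys.stdin.read())) with the output of git log piped in.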

How to Mass Change with Regex

Before
Live! => [ cambax83#gmail.com:xxxxx ] + [Name: Cameron - Following: 225 - Follower: 2 - Bio: GAFY - Location: Australia - URL: Empty - Translator: No - Verified: No (Joined At Wed Oct 12 21:54:04 +0000 2011 [1256 Days])]
Live! => [ kingrozny#hotmail.com:xxxxx ] + [Name: Edgar - Following: 236 - Follower: 9 - Bio: Empty - Location: Empty - URL: Empty - Translator: No - Verified: No (Joined At Sun Jan 15 07:45:52 +0000 2012 [1162 Days])]
Live! => [ voonshin#gmail.com:xxxxx ] + [Name: Voonshin - Following: 381 - Follower: 1 - Bio: Empty - Location: Empty - URL: Empty - Translator: No - Verified: No (Joined At Thu Sep 20 04:14:04 +0000 2012 [913 Days])]
Live! => [ y0ng4n#gmail.com:xxxxx ] + [Name: Surabaya Jaya - Following: 539 - Follower: 0 - Bio: Surabaya Jaya merupakan Distributor Peralatan Safety, Sarung tangan, Terpal, dsb. Distributor kita berpusat di kota Surabaya dan memiliki cabang di Balikpapan. - Location: Balikpapan - URL: Empty - Translator: No - Verified: No (Joined At Fri Aug 01 11:38:33 +0000 2014 [233 Days])]
Live! => [ Honeybee104#hotmail.com:xxxxx ] + [Name: Diane - Following: 84 - Follower: 1 - Bio: Empty - Location: Empty - URL: Empty - Translator: No - Verified: No (Joined At Fri Jul 25 23:03:26 +0000 2014 [239 Days])]
After:
cambax83#gmail.com:xxxxx
kingrozny#hotmail.com:xxxxx
voonshin#gmail.com:xxxxx
y0ng4n#gmail.com:xxxxx
Honeybee104#hotmail.com:xxxxx
How can I clean these lines up in bulk with a regex? Please include a demo on https://regex101.com/ if possible.
See the demo here: https://regex101.com/r/cN7vT8/1
/(\w+#\w+\.\w{2,4}:xxxxx)/gm
And for the replacement, see the demo here: https://regex101.com/r/cN7vT8/2
// str is assumed to hold the input text
var re = /.*\[ (\w+#\w+\.\w{2,4}:xxxxx).*/gm;
var subst = '$1';
var result = str.replace(re, subst);
Given the loose spec, I'd use https://regex101.com/r/mI4mJ6/1
\[\s*(.*xxxx) \]
Find: ^[^\[]*\[\s*|\s*\].*$
Replace: empty string.
Try this. See the demo:
https://regex101.com/r/pT4tM5/32
http://regexr.com/3alde
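The extraction approach from the answers above can also be sketched in Python. The names CREDENTIAL and extract_credentials are hypothetical, and the pattern assumes, as in the sample data, that the wanted token sits alone inside the first pair of square brackets and contains a # and a colon with no spaces.

```python
import re

# A bracketed token of the form user#domain:password, with optional
# padding spaces inside the brackets. The "#" mirrors the sample data.
CREDENTIAL = re.compile(r"\[\s*(\S+#\S+:\S+)\s*\]")


def extract_credentials(text):
    """Return every user#domain:password token found in text."""
    return CREDENTIAL.findall(text)
```

Joining the result with newlines gives the "After" listing from the question.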

fail2ban custom filter on multiline

Is it possible to catch an authentication failure that is spread over multiple lines with a fail2ban regex?
Here is an example:
Sep 08 11:54:59.207814 afpd[16190] {dsi_tcp.c:241} (I:DSI): AFP/TCP session from 10.0.71.149:53863
Sep 08 11:54:59.209504 afpd[16190] {uams_dhx2_pam.c:329} (I:UAMS): DHX2 login: thierry
Sep 08 11:54:59.272092 afpd[16190] {uams_dhx2_pam.c:214} (I:UAMS): PAM DHX2: PAM Success
Sep 08 11:55:01.522258 afpd[16190] {uams_dhx2_pam.c:666} (I:UAMS): DHX2: PAM_Error: Authentication failure
Thanks
Yes, fail2ban uses Python regexes with the multiline option. In your case, try:
"afpd\[[0-9]+\] {dsi_tcp.c:241} \(I:DSI\): AFP/TCP session from <HOST>:[0-9]+\n.*afpd\[[0-9]+\] {uams_dhx2_pam.c:[0-9]+}.*\n.*afpd\[[0-9]+\] {uams_dhx2_pam.c:[0-9]+}.*\n.*afpd\[[0-9]+\] {uams_dhx2_pam.c:[0-9]+}.*PAM_Error: Authentication failure"
As you can see, you just have to put \n where needed. Don't forget to set the maxlines option to 4 in your case, so that fail2ban uses 4 lines to match the regex. Your filter file should look something like this:
[Init]
maxlines = 4
[Definition]
failregex = "afpd\[[0-9]+\] {dsi_tcp.c:241} \(I:DSI\): AFP/TCP session from <HOST>:[0-9]+\n.*afpd\[[0-9]+\] {uams_dhx2_pam.c:[0-9]+}.*\n.*afpd\[[0-9]+\] {uams_dhx2_pam.c:[0-9]+}.*\n.*afpd\[[0-9]+\] {uams_dhx2_pam.c:[0-9]+}.*PAM_Error: Authentication failure"
ignoreregex =
Use fail2ban-regex to test your regex.
I was just looking for a solution to the same problem, but I think the answer given by wpoely86 can lead to blocking innocent IPs if multiple IPs connect at more or less the same time:
Sep 08 11:54:59.207814 afpd[16190] {dsi_tcp.c:241} (I:DSI): AFP/TCP session from 10.0.71.149:53863
Sep 08 11:54:59.207815 afpd[99999] {dsi_tcp.c:241} (I:DSI): AFP/TCP session from 10.10.10.10:53864
Sep 08 11:54:59.209504 afpd[16190] {uams_dhx2_pam.c:329} (I:UAMS): DHX2 login: thierry
Sep 08 11:54:59.272092 afpd[16190] {uams_dhx2_pam.c:214} (I:UAMS): PAM DHX2: PAM Success
Sep 08 11:55:01.522258 afpd[16190] {uams_dhx2_pam.c:666} (I:UAMS): DHX2: PAM_Error: Authentication failure
Sep 08 11:55:01.522258 afpd[99999] {uams_dhx2_pam.c:666} (I:UAMS): DHX2: PAM_success: Authentication succeeded
Above, the offending connection came from 10.0.71.149. However, the regex would block 10.10.10.10. In other words, the regex would need to distinguish between afpd[99999] and afpd[16190] (which identify the PID of the afpd process).
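One way to tie the failure line to the right session is a backreference on the PID. The sketch below tests that idea with Python's re module directly, not with fail2ban: the named groups pid and host and the function failing_hosts are hypothetical, host stands in for fail2ban's <HOST> tag, and whether fail2ban accepts such a backreference inside failregex is an untested assumption. In fail2ban, maxlines would also have to cover the whole window between the session line and the failure line.

```python
import re

# The session line captures the PID; the failure line must repeat the same
# PID via the (?P=pid) backreference. Lines in between are skipped lazily.
# Wrapping everything in a lookahead lets overlapping sessions be found.
SESSION_FAILURE = re.compile(
    r"(?=afpd\[(?P<pid>\d+)\] \{dsi_tcp\.c:\d+\} \(I:DSI\): "
    r"AFP/TCP session from (?P<host>[\d.]+):\d+\n"
    r"(?:.*\n)*?"
    r".*afpd\[(?P=pid)\] \{uams_dhx2_pam\.c:\d+\}.*"
    r"PAM_Error: Authentication failure)"
)


def failing_hosts(log_text):
    """Return the hosts whose own afpd process logged a PAM failure."""
    return [m.group("host") for m in SESSION_FAILURE.finditer(log_text)]
```

Run against the six sample lines above, this should report 10.0.71.149 and leave 10.10.10.10 alone, because the failure line carries PID 16190.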

RegEx match IP on Mail-Header Received:

I'm trying to put together a regex that returns only the sender IP address:
http://regexr.com?38atl
This is the regex I have built so far, but I can't complete it:
(?<=\bReceived: from .*\[)(?:\d{1,3}\.){3}\d{1,3}
or
(?<=\bReceived: from )(.*\[)(?:\d{1,3}\.){3}\d{1,3}
So it should match only this (on lines beginning with Received: from):
127.0.0.1
127.0.0.1
21.22.23.24
And these are the example mail headers I'm searching in:
To: a#domain.de
Return-Path: <t#domain.de>
X-Original-To: a#domain.de
Delivered-To: c#domain.tld
Received: from localhost (localhost [127.0.0.1])
by mail1.domain.tld (Postfix) with ESMTP id 3fT3TR72zNz8m8
for <a#domain.de>; Tue, 18 Feb 2014 14:54:35 +0100 (CET)
X-Virus-Scanned: Debian amavisd-new at mail1.domain.tld
X-Spam-Flag: YES
X-Spam-Score: 5.773
X-Spam-Level: *****
X-Spam-Status: Yes, score=5.773 tagged_above=1 required=4.5
tests=[BAYES_05=-0.5, MISSING_MID=0.497, RCVD_IN_PBL=3.335,
RCVD_IN_RP_RNBL=1.31, RDNS_DYNAMIC=0.982, TO_NO_BRKTS_DYNIP=0.139,
T_RCVD_IN_SEMBLACK=0.01] autolearn=no
Received: from mail1.domain.tld ([127.0.0.1])
by localhost (mail1.domain.tld [127.0.0.1]) (amavisd-new, port 10024)
with ESMTP id lDJqiZjBn2t4 for <a#domain.de>;
Tue, 18 Feb 2014 14:54:34 +0100 (CET)
Received: from mail.domain.tld (pAAAAAAAA.dip0.t-ipconnect.de [21.22.23.24])
by mail1.domain.tld (Postfix) with SMTP id 3fT3TQ4Nwgz8m5
for <a#domain.de>; Tue, 18 Feb 2014 14:54:34 +0100 (CET)
Date: Tue, 18 Feb 2014 15:02:11 +0100
Sender: "From" <t#domain.de>
From: "From" <t#domain.de>
Subject: Subbbb (192.168.123.123)
Reply-To: t#domain.de
MIME-Version: 1.0
Content-type: text/plain; charset=UTF-8
Message-Id: <3fT3TR72zNz8m8#mail1.domain.tld>
Try this expression:
Received: +from[^\n]*?\[([0-9\.]+)\]
Edit:
For a PHP script try something like this (where $emailHeader contains the data you are searching):
$regex = '/Received: +from[^\\n]*?\\[([0-9\\.]+)\\]/s';
if (preg_match_all($regex, $emailHeader, $matches_out)) {
    print_r($matches_out);
} else {
    print('Sender IP not found');
}
The <= at the start looks funny, but other than that it seems to be working fine:
(?:\bReceived: from .*\[)((\d{1,3}\.){3}\d{1,3})(?:]\))
I believe what you're looking for is:
(?:\bReceived: from .*?\[)(?<ip>(?:\d{1,3}\.){3}\d{1,3})
The matched IP address will be in the capture group named "ip".
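For comparison, here is how the named-group approach might look in Python, which spells named groups as (?P<ip>...) rather than (?<ip>...). RECEIVED_IP and sender_ips are hypothetical names. The pattern is anchored to the start of a line so that folded continuation lines such as "by localhost (mail1.domain.tld [127.0.0.1])" are skipped.

```python
import re

# Only header lines that start with "Received: from" are considered; the
# lazy [^\n]*? stops at the first bracketed dotted quad on that line.
RECEIVED_IP = re.compile(
    r"^Received: from [^\n]*?\[(?P<ip>(?:\d{1,3}\.){3}\d{1,3})\]",
    re.MULTILINE,
)


def sender_ips(headers):
    """Return the bracketed IP of every 'Received: from' header line."""
    return [m.group("ip") for m in RECEIVED_IP.finditer(headers)]
```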