Put email into multiple folders in case of multiple recipients using procmail

I'm building an email system at home and I'm subscribed to a lot of mailing lists. The emails are fetched to my local machine by fetchmail and filtered by procmail. But there is a situation I cannot solve with my current knowledge, and I have been googling for 2-3 hours without finding a solution.
What I want is this: when I get an email with multiple recipients, I would like to copy it into several folders. Here is an example:
Cc: linux-kernel@vger.kernel.org, kernel-janitors@vger.kernel.org
I would like to put this email into both the linux-kernel and the kernel-janitors folder. How can I do that with procmail?
Thanks in advance!

You can make procmail loop over the list of recipients by using SWITCHRC=, but that is rather hackish. Alternatively, if the set of folders you want to multiplex into is limited, you can deliver a copy into each of them separately and then drop the message if it was delivered at least once.
LASTFOLDER=
:0c:
* ^TO_linux-kernel@vger\.kernel\.org\>
linux-kernel
:0c:
* ^TO_kernel-janitors@vger\.kernel\.org\>
kernel-janitors
# ... repeat for other addresses you want to multiplex ...
# If it was delivered, LASTFOLDER will be set
:0
* LASTFOLDER ?? .
/dev/null
If you may already have copied the message into additional inboxes before reaching this section, you want to explicitly reset LASTFOLDER to the empty string, as above. It should not be necessary otherwise, but I left it in as a precaution. (This variable contains the name of the last folder or program the message was delivered to.)

The solution looks like this:
First of all, an if-style recipe is needed because my .procmailrc contains more than just the kernel mailing list conditions. If it matches, another list of conditions follows; I expect this to become more fine-grained over time.
:0
* ^(To|Cc):.*vger\.kernel\.org
{
  LASTFOLDER=
}
:0Ac
* ^(To|Cc):.*linux-janitors@vger\.kernel\.org
| DoItSomethingWithIt
:0Ac
* ^(To|Cc):.*linux-kernel@vger\.kernel\.org
| DoItSomethingWithIt2
:0A
* LASTFOLDER ?? .
| DoItSomethingWithIt3

Related

Regex to extract multiple pieces across multiple lines

I am working on making basic Zabbix items for Wazuh. It's not meant to replace Wazuh, but our techs live in Zabbix, and this gives them an alert in Zabbix so they know something happened and can go check it in Wazuh.
The issue is that Wazuh alerts span multiple lines and we need two pieces of information.
From the example below, we would like to get:
(server)
(level 10) -> 'High amount of POST requests in a small period of time (likely bot).'
I use the following regex:
([\r\n].*?)(?:=?\r|\n)(.*?(?:(level 10.*)).*)
This will match on level 10, and then I can use group 1 to get the host name (server). But I am unable to get the second part. I can create an item for each rule level (1-10, for example) and get the host name, but I cannot get the alert message itself. I read that I need to create individual items for each piece, but I found that Zabbix does not always grab the right piece from the alert: maybe the level 10 message is captured from one log entry while the host name comes from another.
Is there a way to capture all of these in one item using regex in Zabbix?
Thank you. I appreciate all your help.
** Alert 1646336311.8104996: - web,appsec,attack,pci_dss_6.5,pci_dss_11.4,gdpr_IV_35.7.d,nist_800_53_SA.11,nist_800_53_SI.4,tsc_CC6.6,tsc_CC7.1,tsc_CC8.1,tsc_CC6.1,tsc_CC6.8,tsc_CC7.2,tsc_CC7.3,
2022 Mar 03 19:38:31 (server) any->/var/log/nginx/access.log
Rule: 31533 (level 10) -> 'High amount of POST requests in a small period of time (likely bot).'
If I understand correctly, you can use JavaScript in Preprocessing, building the final item value from several replace() calls and concatenating the results, along these lines:
value1 = value.replace(/.*1.*/g,'Mobile"')
value2 = value.replace(/.*2.*/g,'Mobile"')
finishvalue = value1.concat(value2);
return(finishvalue)
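For what it's worth, here is a minimal sketch showing that one multi-line-aware pattern can pull out both pieces from the sample alert. It is written in Python purely to demonstrate the pattern; the regex, the anchors and the group names are my own assumptions and would need to be adapted to the JavaScript (or regular expression) preprocessing step in Zabbix.
import re

# Sample alert from the question (the long tag list is abbreviated here).
alert = (
    "** Alert 1646336311.8104996: - web,appsec,attack,pci_dss_6.5,...\n"
    "2022 Mar 03 19:38:31 (server) any->/var/log/nginx/access.log\n"
    "Rule: 31533 (level 10) -> 'High amount of POST requests in a small "
    "period of time (likely bot).'"
)

# Two named groups: the host on the timestamp line and the "(level N) -> '...'"
# part of the Rule line. DOTALL lets ".*?" cross the line break between them,
# MULTILINE lets "^" anchor at the start of each line.
pattern = re.compile(
    r"^\d{4} \w{3} \d{2} [\d:]+ \((?P<host>[^)]+)\).*?"
    r"^Rule: \d+ (?P<msg>\(level \d+\) -> '[^']*')",
    re.DOTALL | re.MULTILINE,
)

m = pattern.search(alert)
if m:
    print(m.group("host"))  # server
    print(m.group("msg"))   # (level 10) -> 'High amount of POST requests ...'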

PDI - Check data types of field

I'm trying to create a transformation that reads CSV files and checks the data type of each field in the CSV.
For example: field A should be a string(1), i.e. a single character, and field B should be an integer/number.
What I want is to check/validate: if A is not a string(1), set Status = Not Valid, and likewise if B is not an integer. Then every file with status Not Valid should be moved to an error folder.
I know I can use the Data Validator to do the check, but how do I move the file based on that status? I can't find any step to do it.
You can read the files in a loop and add steps as below:
After the data validation, filter the rows with a negative (not matched) result -> an Add constants step with error = 1 -> a Set variables step for the error field with default value 0.
After the transformation finishes, add a Simple evaluation entry in the parent job to check the value of the ERROR variable.
If it has the value 1, move the file; else ....
I hope this helps.
You can do the same as in this question. Once the files are read, use a Group by to get one flag per file. However, this time you cannot do it in one transformation; you have to use a job.
Your use case is among the samples shipped with your PDI distribution. The sample is in the folder your-PDI/samples/jobs/run_all. Open Run all sample transformations.kjb and replace the Filter 2 of Get Files - Get all transformations.ktr with your logic, which includes a Group by so that you get one status per file and not one status per row.
In case you wonder why you need such complex logic for such a simple task, remember that PDI starts all the steps of a transformation at the same time. That is its great power, but it also means you cannot know whether it is safe to move the file before every row has been processed.
Alternatively, you have the quick and dirty solution of your similar question: change the Filter rows into a type check, and the final Synchronize after merge into a Process files / Move.
And a final piece of advice: instead of checking the type with a Data Validator, which is a good solution in itself, you may use a JavaScript step like the one there. It is more flexible if you need maintenance in the long run.
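Just to make the intended check concrete, here is a small sketch of the row-level rule, written in Python for illustration only (in PDI it would live in a Data Validator or JavaScript step); the field names A and B come from the question, everything else is assumed.
def validate_row(row):
    # Rule from the question: A must be a 1-character string, B an integer.
    a, b = row.get("A"), row.get("B")
    a_ok = isinstance(a, str) and len(a) == 1
    try:
        int(b)
        b_ok = True
    except (TypeError, ValueError):
        b_ok = False
    return "Valid" if (a_ok and b_ok) else "Not Valid"

rows = [{"A": "X", "B": "42"}, {"A": "XY", "B": "42"}, {"A": "X", "B": "4.2x"}]
print([validate_row(r) for r in rows])  # ['Valid', 'Not Valid', 'Not Valid']
# Any file containing a 'Not Valid' row would then be flagged (the Group by /
# Set variables logic above) and moved to the error folder by the parent job.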

HDFS sink: "clever" folder routing

I am new to Flume (and to HDFS), so I hope my question is not stupid.
I have a multi-tenant application (about 100 different customers as of now) and 16 different data types. (In production, we have approx. 15 million messages/day going through our RabbitMQ.)
I want to write all my events to HDFS, separated by tenant, data type, and date, like this:
/data/{tenant}/{data_type}/2014/10/15/file-08.csv
Is it possible with one sink definition? I don't want to duplicate the configuration, and new clients arrive every week or so.
In the documentation, I see
agent1.sinks.hdfs-sink1.hdfs.path = hdfs://server/events/%Y/%m/%d/%H/
Is something like this possible?
agent1.sinks.hdfs-sink1.hdfs.path = hdfs://server/events/%tenant/%type/%Y/%m/%d/%H/
I want to write to different folders according to my incoming data.
Yes, this is indeed possible. You can use either the event metadata (headers) or some field in the incoming data to decide where the output goes.
For example, in my case I am getting different types of log data and I want to store them in the corresponding folders. In my case the first word of each log line is the file name. Here is the config snippet for that.
Interceptor:
dataplatform.sources.source1.interceptors = i3
dataplatform.sources.source1.interceptors.i3.type = regex_extractor
dataplatform.sources.source1.interceptors.i3.regex = ^(\\w*)\t.*
dataplatform.sources.source1.interceptors.i3.serializers = s1
dataplatform.sources.source1.interceptors.i3.serializers.s1.name = filename
HDFS sink:
dataplatform.sinks.sink1.type = hdfs
dataplatform.sinks.sink1.hdfs.path = hdfs://server/events/provider=%{filename}/years=%Y/months=%Y%m/days=%Y%m%d/hours=%H
Hope this helps.
A possible solution may be to write an interceptor which passes the tenant value as an event header.
Please refer to the link below:
http://hadoopi.wordpress.com/2014/06/11/flume-getting-started-with-interceptors/

What is an efficient method to compare file lists between a client and a remote server

I have the following situation which needs to be addressed efficiently.
I'm doing file sync from client devices to a server. Sometimes a file from one device doesn't get fetched to another device from the server, due to some issue on the server. I need to make sure that all the files on the server are synced to all the client devices, using a separate thread. I am using C++ for the development and libcurl for client-server communication.
On the client device we keep an entry for each downloaded file in an SQLite database. Likewise, the server keeps similar records in its database (MySQL). I need to list all the files available on the client device, send that list to the server, and compare it with the list taken from the server database to find the missing files.
I did a rough estimate: a list of 1 million files (file name with full path) is about 85 MB in size, and roughly 10 MB after compression. So transferring the entire file list (even compressed) from client to server is not a good idea. I planned to implement Bloom filters for this, as follows:
Fetch the file list from the client-side database and load it into a Bloom filter.
Transfer only the Bloom filter from the client to the server.
Fetch the file list from the server-side database, check it against the Bloom filter received from the client, and find the missing files.
Please note that this client-initiated process should run in a thread at a regular interval, say every hour or so.
The problem with Bloom filters is the false positive rate, even if it is very low. I don't want to miss even a single file. Is there any better way of doing this?
As you've noticed, this isn't a problem for which Bloom filters are appropriate. With a Bloom filter, when you get a hit you must then check the authoritative source to differentiate between a false positive and a true positive; they're useful in situations where most queries against the filter are expected to give a negative result, which is the opposite of your case.
What you could do is have each side build a partial prefix tree in memory of all the filenames known to that side. It wouldn't be a full prefix tree: once the number of filenames below a node drops below a certain level, you'd just store the full list of those filenames in that node. You then synchronise those prefix trees using a recursive algorithm starting at the root of the trees:
Each side creates a hash of all the sorted, concatenated filenames below the current node.
If the hashes are equal, then this node and all its descendants are synchronised; return.
If there are no child nodes, send the (short) list of filenames at this terminal node from one side to the other to synchronise and return.
Otherwise, recursively synchronise the child nodes and return.
The hash should be at least 128 bits, and make sure that when you concatenate the filenames for the hash you do so in a reversible manner (i.e. separate them with a character that cannot appear in filenames, such as \0, or prefix each one with its length). A rough sketch of this scheme is given below.
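To make the recursion concrete, here is a minimal single-process sketch of the idea in Python (illustration only; the asker works in C++). The leaf limit, the hash choice and all helper names are my own assumptions, and in the real system each hash comparison would be a client/server round-trip over libcurl rather than a local call.
import hashlib
from collections import defaultdict

LEAF_LIMIT = 2  # toy value so the example splits; hundreds would be realistic

def build_tree(filenames, depth=0):
    # Partial prefix tree: a node is either a leaf (sorted list of names)
    # or a dict mapping the next character to a child node.
    if len(filenames) <= LEAF_LIMIT:
        return sorted(filenames)
    children = defaultdict(list)
    for name in filenames:
        key = name[depth] if depth < len(name) else ""  # "" marks "name ends here"
        children[key].append(name)
    return {key: build_tree(names, depth + 1) for key, names in children.items()}

def names_below(node):
    if isinstance(node, list):
        return node
    out = []
    for key in sorted(node):
        out.extend(names_below(node[key]))
    return sorted(out)

def node_hash(node):
    # \0 cannot appear in filenames, so the concatenation is reversible.
    return hashlib.sha256("\0".join(names_below(node)).encode()).hexdigest()

def diff(local, remote, out):
    # Recursive walk described above. Both trees live in one process here;
    # client and server would each hold their own tree and exchange hashes.
    if node_hash(local) == node_hash(remote):
        return                                   # whole subtree already in sync
    if isinstance(local, list) or isinstance(remote, list):
        l, r = set(names_below(local)), set(names_below(remote))
        out["missing_on_server"] |= l - r
        out["missing_on_client"] |= r - l
        return
    for key in set(local) | set(remote):         # recurse into the child nodes
        diff(local.get(key, []), remote.get(key, []), out)

client = build_tree(["/a/1.txt", "/a/2.txt", "/b/3.txt"])
server = build_tree(["/a/1.txt", "/b/3.txt", "/b/4.txt"])
result = {"missing_on_server": set(), "missing_on_client": set()}
diff(client, server, result)
print(result)  # {'missing_on_server': {'/a/2.txt'}, 'missing_on_client': {'/b/4.txt'}}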
For file/pathname compression I've found prefix-suffix compression to work better, even on its own, than generic (bz2) compression. Combined, the filename list can be reduced even further.
The trick is to use escape codes (e.g. byte values below 32) to indicate the number of characters shared with the previous row, then regular characters for the unique part, and finally (optionally) to encode the number of characters shared at the end of the string.
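As a rough illustration of the prefix part of that idea, here is a small Python sketch (illustration only; the answer describes control-byte escape codes, while this sketch just stores the shared-prefix length as a number and leaves out the optional common-suffix encoding):
def front_encode(sorted_names):
    # For each name, store how many leading characters it shares with the
    # previous name plus only the differing tail (front coding).
    encoded, prev = [], ""
    for name in sorted_names:
        common = 0
        limit = min(len(prev), len(name))
        while common < limit and prev[common] == name[common]:
            common += 1
        encoded.append((common, name[common:]))
        prev = name
    return encoded

def front_decode(encoded):
    names, prev = [], ""
    for common, tail in encoded:
        prev = prev[:common] + tail
        names.append(prev)
    return names

names = sorted([
    "/var/log/app/2024-01-01.log",
    "/var/log/app/2024-01-02.log",
    "/var/log/db/error.log",
])
encoded = front_encode(names)
print(encoded)  # [(0, '/var/log/app/2024-01-01.log'), (22, '2.log'), (9, 'db/error.log')]
assert front_decode(encoded) == names
Feeding the encoded form into a generic compressor (the bz2 step mentioned above) would then shrink the list further.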

Search a list of terms on this website, and do not stop even if some of the terms are missing

I am trying to use the RCurl package to get data from the GeneCards database
http://www-bimas.cit.nih.gov/cards//
I read a wonderful solution in a previously posted question:
How can I use R (Rcurl/XML packages ?!) to scrape this webpage?
However, my problem is different in that I need further support from the experts. Instead of extracting all the links from the webpage, I have a list of ~1000 genes in mind. They are given as gene symbols (some of the gene symbols can be found on the webpage, some of them are new to the database). Here is part of my list of genes:
TP53
SOD1
EGFR
C2d
AKT2
NFKB1
C2d is not in the database, so when I do the search manually I see:
"Sorry, there is no GeneCard for C2d".
When I use the solution posted in the previous question for my analysis:
How can I use R (Rcurl/XML packages ?!) to scrape this webpage?
(1) I first read in the list.
(2) I then use the get_structs function from the previous solution to substitute each gene symbol from the list into the following URL:
http://www-bimas.cit.nih.gov/cgi-bin/cards/carddisp.pl?gene=genesymbol
(3) I scrape the information that I need for each gene in the list, using the get_data_url function from the previous answer.
It works for TP53, SOD1 and EGFR, but when the search comes to C2d, the process stops.
As I have ~1000 genes, I am sure some of them are missing from the webpage.
How can I automatically get a modified gene list telling me which of the ~1000 genes are missing, so that I can use the same approach as in the previous question to get all the data I need, based on the new list of genes that DO exist on the webpage?
Or is there a way to ask R to skip the missing items and continue scraping until the end of the list, but mark those missing items in the final results?
To facilitate the discussion, I have made a pseudo input file using the scripts from the previous question, for the same webpage they used.
u <- c ("Aero_pern", "Ppate", "didnotexist", "Sbico")
library(RCurl)
base_url<-"http://gtrnadb.ucsc.edu/"
base_html<-getURLContent(base_url)[[1]]
links<-strsplit(base_html,"a href=")[[1]]
get_structs<-function(u) {
struct_url<-paste(base_url,u,"/",u,"-structs.html",sep="")
raw_data<-getURLContent(struct_url)
s_split1<-strsplit(raw_data,"<PRE>")[[1]]
all_data<-s_split1[seq(3,length(s_split1))]
data_list<-lapply(all_data,parse_genomes)
for (d in 1:length(data_list)) {data_list[[d]]<-append(data_list[[d]],u)}
return(data_list)
}
I guess the problem can be solved by modifying the get_structs script above, or perhaps an ifelse construct may help, but I cannot figure out how to modify it further. Please comment.
You can enclose your function call inside try() so that the process won't break if you get an error. This usually lets you loop over the problematic cases, and it returns an error message instead of breaking your process. Afterwards you can tell which genes failed by checking the class of each result, e.g. with inherits(dat[[i]], "try-error"). For example:
dat <- list()
for (i in 1:length(u)){
dat[[i]] <- try(get_structs(u[i]))
}