Logstash grok filter apache pattern - regex

This is a sample Apache Tomcat log:
portal.portal.some.thing.int:8443 13.233.220.113 - - [09/Sep/2019:00:08:02 +0200] "GET /en/search-results?p_p_id=portal201_WAR_portal201_INSTANCE_q8EzsBteHybf&p_p_lifecycle=1&p_p_state=normal&queryText=Poll&facet.collection=AΜLex%2CAMsom%2CAMss%2WebPage%2SummariesOfSomething&startRow=1&resultsPerPage=10&SEARCH_TYPE=SIMPLE HTTP/1.1" 230 334734 6261 - - 35S64857F6860FDFC0F60B5B47A97E18
10.235.350.103 94.62.15.157, 10.435.230.101,10.134.046.2
I would like to capture the following variables:
09/Sep/2019:00:08:02 +0200
/en/search-results?p_p_id=portal2....
35S64857F6860FDFC0F60B5B47A97E18
Can you help me with that? I want to index only those fields and drop the others. Is that possible? Thank you.

Use this grok pattern:
%{GREEDYDATA:field1} %{IP:ip1} - - \[%{GREEDYDATA:date}\] "%{WORD:method} %{GREEDYDATA:request}" %{WORD:numbers} %{WORD:numbers} %{WORD:numbers} - - %{WORD:last_parameter}
input:
portal.portal.some.thing.int:8443 13.233.220.113 - - [09/Sep/2019:00:08:02 +0200] "GET /en/search-results?p_p_id=portal201_WAR_portal201_INSTANCE_q8EzsBteHybf&p_p_lifecycle=1&p_p_state=normal&queryText=Poll&facet.collection=AΜLex%2CAMsom%2CAMss%2WebPage%2SummariesOfSomething&startRow=1&resultsPerPage=10&SEARCH_TYPE=SIMPLE HTTP/1.1" 230 334734 6261 - - 35S64857F6860FDFC0F60B5B47A97E18
10.235.350.103 94.62.15.157, 10.435.230.101,10.134.046.2
output:
{
  "field1": [
    [
      "portal.portal.some.thing.int:8443"
    ]
  ],
  "ip1": [
    [
      "13.233.220.113"
    ]
  ],
  "IPV6": [
    [
      null
    ]
  ],
  "IPV4": [
    [
      "13.233.220.113"
    ]
  ],
  "date": [
    [
      "09/Sep/2019:00:08:02 +0200"
    ]
  ],
  "method": [
    [
      "GET"
    ]
  ],
  "request": [
    [
      "/en/search-results?p_p_id=portal201_WAR_portal201_INSTANCE_q8EzsBteHybf&p_p_lifecycle=1&p_p_state=normal&queryText=Poll&facet.collection=AΜLex%2CAMsom%2CAMss%2WebPage%2SummariesOfSomething&startRow=1&resultsPerPage=10&SEARCH_TYPE=SIMPLE HTTP/1.1"
    ]
  ],
  "numbers": [
    [
      "230",
      "334734",
      "6261"
    ]
  ],
  "last_parameter": [
    [
      "35S64857F6860FDFC0F60B5B47A97E18"
    ]
  ]
}
The fields you want are:
date
request
last_parameter
You can remove the other fields using the remove_field option in a mutate filter.
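As a sketch, the whole filter section might look like this (the field names come from the pattern above; adjust the remove_field list to taste):
filter {
  grok {
    match => { "message" => "%{GREEDYDATA:field1} %{IP:ip1} - - \[%{GREEDYDATA:date}\] \"%{WORD:method} %{GREEDYDATA:request}\" %{WORD:numbers} %{WORD:numbers} %{WORD:numbers} - - %{WORD:last_parameter}" }
  }
  mutate {
    # keep only date, request and last_parameter
    remove_field => [ "field1", "ip1", "method", "numbers" ]
  }
}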

How to use neptune-export

Can someone please list out a detailed, stepwise process to export data from Neptune to S3 (or local storage) in the form of CSV?
I followed the doc (which seems to be the only resource available online), but it is not very clear.
TIA
The Neptune Export tool has many options that can be used to configure an export of both property graph and RDF data. The overall syntax of the command (if invoked via the command line) is:
NAME
neptune-export.sh export-pg - Export property graph from Neptune to CSV
or JSON.
SYNOPSIS
neptune-export.sh export-pg
[ --alb-endpoint <applicationLoadBalancerEndpoint> ]
[ --approx-edge-count <approxEdgeCount> ]
[ --approx-node-count <approxNodeCount> ]
[ {-b | --batch-size} <batchSize> ]
[ {-c | --config-file | --filter-config-file} <configFile> ]
[ --clone-cluster ]
[ --clone-cluster-instance-type <cloneClusterInstanceType> ]
[ --clone-cluster-replica-count <replicaCount> ]
[ {--cluster-id | --cluster | --clusterid} <clusterId> ]
[ {-cn | --concurrency} <concurrency> ]
[ {--config | --filter} <configJson> ] {-d | --dir} <directory>
[ --disable-ssl ] [ {-e | --endpoint} <endpoint>... ]
[ --edge-label-strategy <edgeLabelStrategy> ]
[ {-el | --edge-label} <edgeLabels>... ]
[ --escape-csv-headers ] [ --escape-newline ]
[ --exclude-type-definitions ] [ --export-id <exportId> ]
[ --format <format> ] [ --janus ]
[ --lb-port <loadBalancerPort> ] [ --limit <limit> ]
[ --log-level <log level> ]
[ --max-content-length <maxContentLength> ] [ --merge-files ]
[ --multi-value-separator <multiValueSeparator> ]
[ {-nl | --node-label} <nodeLabels>... ]
[ --nlb-endpoint <networkLoadBalancerEndpoint> ]
[ {-o | --output} <output> ] [ {-p | --port} <port> ]
[ --partition-directories <partitionDirectories> ]
[ --per-label-directories ] [ --profile <profiles>... ]
[ {-r | --range | --range-size} <rangeSize> ]
[ {--region | --stream-region} <region> ]
[ {-s | --scope} <scope> ] [ --serializer <serializer> ]
[ --skip <skip> ]
[ --stream-large-record-strategy <largeStreamRecordHandlingStrategy> ]
[ --stream-name <streamName> ] [ --strict-cardinality ]
[ {-t | --tag} <tag> ] [ --token-prefix <tokenPrefix> ]
[ --tokens-only <tokensOnly> ] [ --use-iam-auth ] [ --use-ssl ]
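For example, a minimal export to a local directory might look like this (a sketch only; the endpoint and output directory are placeholders to replace with your own values):
neptune-export.sh export-pg \
  -e my-cluster.cluster-xxxxxxxxxxxx.us-east-1.neptune.amazonaws.com \
  -d /tmp/neptune-export \
  --format csv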
There are detailed instructions on the tool's GitHub page that describe the alternative ways to export data: https://github.com/awslabs/amazon-neptune-tools/tree/master/neptune-export
If you still have questions, I suggest editing the original question to clarify the precise challenges you have encountered.

AssetParameters changes when CDK code is updated for a deployed EKS cluster

CDK is used to deploy EKS in our company. A co-worker created an EKS cluster with CDK, then I pulled the CDK code and modified something. Before deploying, I ran the 'cdk diff' command, and the result showed that many resources would be changed. The changes, excluding what I had modified, were all AssetParameters, as shown below.
# cdk diff
Stack eks-cluster
Parameters
[-] Parameter AssetParametersxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx: {"Type":"String","Description":"S3 bucket for asset \"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\""}
[-] Parameter AssetParametersxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx: {"Type":"String","Description":"S3 key for asset version \"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\""}
...
[+] Parameter AssetParameters/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/S3Bucket AssetParametersxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx: {"Type":"String","Description":"S3 bucket for asset \"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\""}
[+] Parameter AssetParameters/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/S3VersionKey AssetParametersxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx: {"Type":"String","Description":"S3 key for asset version \"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\""}
...
Resources
[~] AWS::CloudFormation::Stack @aws-cdk--aws-eks.ClusterResourceProvider.NestedStack/@aws-cdk--aws-eks.ClusterResourceProvider.NestedStackResource awscdkawseksClusterResourceProviderNestedStackawscdkawseksClusterResourceProviderNestedStackResourcexxxxx
[~] TemplateURL
[~] .Fn::Join:
@@ -7,7 +7,7 @@
[ ] },
[ ] "/",
[ ] {
[-] "Ref": "AssetParametersxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
[+] "Ref": "AssetParametersxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
[ ] },
[ ] "/",
[ ] {
@@ -17,7 +17,7 @@
[ ] "Fn::Split": [
[ ] "||",
[ ] {
[-] "Ref": "AssetParametersxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
[+] "Ref": "AssetParametersxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
[ ] }
[ ] ]
[ ] }
@@ -30,7 +30,7 @@
[ ] "Fn::Split": [
[ ] "||",
[ ] {
[-] "Ref": "AssetParametersxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
[+] "Ref": "AssetParametersxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
[ ] }
[ ] ]
[ ] }
[~] AWS::CloudFormation::Stack @aws-cdk--aws-eks.KubectlProvider.NestedStack/@aws-cdk--aws-eks.KubectlProvider.NestedStackResource awscdkawseksKubectlProviderNestedStackawscdkawseksKubectlProviderNestedStackResourcexxxxx
[~] Parameters
[+] Added: .referencetoeksclustereksAssetParametersxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxRef
[+] Added: .referencetoeksclustereksAssetParametersxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxRef
[-] Removed: .referencetoeksclustereksAssetParametersxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxRef
[-] Removed: .referencetoeksclustereksAssetParametersxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxRef
...
[~] TemplateURL
[~] .Fn::Join:
@@ -7,7 +7,7 @@
[ ] },
[ ] "/",
[ ] {
[-] "Ref": "AssetParametersxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
[+] "Ref": "AssetParametersxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
[ ] },
[ ] "/",
[ ] {
@@ -17,7 +17,7 @@
[ ] "Fn::Split": [
[ ] "||",
[ ] {
[-] "Ref": "AssetParametersxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
[+] "Ref": "AssetParametersxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
[ ] }
[ ] ]
[ ] }
@@ -30,7 +30,7 @@
[ ] "Fn::Split": [
[ ] "||",
[ ] {
[-] "Ref": "AssetParametersxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
[+] "Ref": "AssetParametersxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
[ ] }
[ ] ]
[ ] }
When this code is deployed, will there be any impact on the running EKS cluster?
According to this issue, this can happen when the Node.js version differs between machines:
https://github.com/aws/aws-cdk/issues/12427

How do you extract a timestamp using Logstash and grok?

I'm trying to extract a timestamp using TIME from grok in Logstash, but the extraction is unsuccessful.
I'm using a grok pattern, but it is not matching or returning anything.
2019-07-30 14:12:23 - __main__ - INFO - metro crawler completed runtime:00:00:02
%{TIMESTAMP_ISO8601:timestamp}%{GREEDYDATA}-%{SPACE}%{GREEDYDATA:crawler}%{SPACE}-%{SPACE}%{LOGLEVEL:level}%{TIME:time}
I'm getting no matches
You may use
%{TIMESTAMP_ISO8601:timestamp}%{SPACE}-%{SPACE}%{DATA:crawler}%{SPACE}-%{SPACE}%{LOGLEVEL:level}%{DATA}%{TIME:time}
See the debug output:
{
  "timestamp": [
    [
      "2019-07-30 14:12:23"
    ]
  ],
  "crawler": [
    [
      "__main__"
    ]
  ],
  "level": [
    [
      "INFO"
    ]
  ],
  "time": [
    [
      "00:00:02"
    ]
  ]
}
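As a sketch, the corresponding Logstash filter block would be (assuming the log line arrives in the default message field):
filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp}%{SPACE}-%{SPACE}%{DATA:crawler}%{SPACE}-%{SPACE}%{LOGLEVEL:level}%{DATA}%{TIME:time}" }
  }
}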

Grok parse error when using custom pattern definitions

I'm trying to use a grok filter in logstash version 1.5.0 to parse several fields of data from a log file.
I'm able to parse a simple WORD field with no issues, but when I try to define a custom pattern and add that in as well, the grok parse fails.
I've tried a couple of grok debuggers that have been recommended elsewhere to find the issue:
http://grokconstructor.appspot.com/do/match
and
http://grokdebug.herokuapp.com/
Both say that my regex should be fine and return the fields that I want, but when I add it to my logstash.conf, grok fails to parse the log line and simply passes the raw data through to Elasticsearch.
My sample line is as follows:
APPERR [2015/06/10 11:28:56.602] C1P1405 S39 (VPTestSlave002_001)| 8000B Connect to CGDialler DB (VPTest - START)| {39/A612-89A0-A598/60B9-1917-B094/9E98F46E} Failed to get DB connection: SQLConnect failed. 08001 (17) [Microsoft][ODBC SQL Server Driver][DBNETLIB]SQL Server does not exist or access denied.
My logstash.conf grok config looks like this:
grok {
  patterns_dir => ["D:\rt\Logstash-1.5.0\bin\patterns"]
  match => { "message" => "%{WORD:LogLevel} \[%{KERNELTIMESTAMP:TimeStamp}\]" }
}
and the contents of my custom pattern file are:
KERNELTIMESTAMP %{YEAR}/%{MONTHNUM}/%{MONTHDAY} %{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?
I am expecting this to return the following set of data:
{
  "LogLevel": [
    [
      "APPERR"
    ]
  ],
  "TimeStamp": [
    [
      "2015/06/10 11:28:56.602"
    ]
  ],
  "YEAR": [
    [
      "2015"
    ]
  ],
  "MONTHNUM": [
    [
      "06"
    ]
  ],
  "MONTHDAY": [
    [
      "10"
    ]
  ],
  "HOUR": [
    [
      "11",
      null
    ]
  ],
  "MINUTE": [
    [
      "28",
      null
    ]
  ],
  "SECOND": [
    [
      "56.602"
    ]
  ],
  "ISO8601_TIMEZONE": [
    [
      null
    ]
  ]
}
Can anyone tell me where my issue is?

Chaining grok filter patterns for logstash

I am trying to configure logstash to manage my various log sources, one of which is Mongrel2. The format used by Mongrel2 is tnetstring, where a log message will take the form
86:9:localhost,12:192.168.33.1,5:57089#10:1411396297#3:GET,1:/,8:HTTP/1.1,3:200#6:145978#]
I want to write my own grok patterns to extract certain fields from the above format. I received help on this question trying to extract the host. So if in grok-patterns I define
M2HOST ^(?:[^:]*\:){2}(?<hostname>[^,]*)
and then in the logstash conf specify
filter {
  grok {
    match => [ "message", "%{M2HOST}" ]
  }
}
it works as expected. The problem I now have is that I want to specify multiple patterns, e.g. M2HOST, M2ADDR, etc. I tried defining additional ones in the same grok-patterns file
M2HOST ^(?:[^:]*\:){2}(?<hostname>[^,]*)
M2ADDR ^(?:[^:]*\:){3}(?<address>[^,]*)
and changing the logstash conf
filter {
  grok {
    match => [ "message", "%{M2HOST} %{M2ADDR}" ]
  }
}
but now I just get the _grokparsefailure tag.
With your sample input from the other question, and with some guessing about the value names, the full match would be as follows. (Note that chaining the patterns as %{M2HOST} %{M2ADDR} cannot work: both patterns are anchored with ^, and there is no literal space between those fields in the input, so the combined expression never matches.)
(?:[^:]*:){2}(?<hostname>[^,]*)[^:]*:(?<address>[^,]*)[^:]*:(?<pid>[^#]*)[^:]*:(?<time>[^#]*)[^:]*:(?<method>[^,]*)[^:]*:(?<query>[^,]*)[^:]*:(?<protocol>[^,]*)[^:]*:(?<code>[^#]*)[^:]*:(?<bytes>[^#]*).*
Producing:
{
  "hostname": [
    [
      "localhost"
    ]
  ],
  "address": [
    [
      "192.168.33.1"
    ]
  ],
  "pid": [
    [
      "57089"
    ]
  ],
  "time": [
    [
      "1411396297"
    ]
  ],
  "method": [
    [
      "GET"
    ]
  ],
  "query": [
    [
      "/"
    ]
  ],
  "protocol": [
    [
      "HTTP/1.1"
    ]
  ],
  "code": [
    [
      "200"
    ]
  ],
  "bytes": [
    [
      "145978"
    ]
  ]
}
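To use this in Logstash you can inline the expression in the match setting instead of a pattern file (a sketch, assuming the default message field):
filter {
  grok {
    match => [ "message", "(?:[^:]*:){2}(?<hostname>[^,]*)[^:]*:(?<address>[^,]*)[^:]*:(?<pid>[^#]*)[^:]*:(?<time>[^#]*)[^:]*:(?<method>[^,]*)[^:]*:(?<query>[^,]*)[^:]*:(?<protocol>[^,]*)[^:]*:(?<code>[^#]*)[^:]*:(?<bytes>[^#]*).*" ]
  }
}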