Regex - extract ip - regex

I'm trying to pull some data from a plain log file with a JSON converter.
This is the log entry:
01/04/2022 15:29:34.2934 +03:00 - [INFO] - [w3wp/LPAPI-Last Casino/177] - AppsFlyerPostback?re_targeting_conversion_type=&is_retargeting=false&app_id=id624512118&platform=ios&event_type=in-app-event&attribution_type=organic&ip=8.8.8.8&name=blabla
This is the regex I'm using:
(?P<date>[0-9]{2}\/[0-9]{2}\/[0-9]{4}).(?P<time>\s*[0-9]{2}:[0-9]{2}:[0-9]{2}).*(?P<level>\[\D+\]).-.\[(?P<application_subsystem_thread>.*)\].-.(?P<message>.*)
This is the output I'm getting:
{
"application_subsystem_thread": "w3wp/LPAPI-Last Casino/177",
"date": "01/04/2022",
"level": "[INFO]",
"message": "AppsFlyerPostback?re_targeting_conversion_type=&is_retargeting=false&app_id=id624512118&platform=ios&event_type=in-app-event&attribution_type=organic&ip=8.8.8.8&name=blabla",
"time": "15:29:34"
}
As you can see, the converter uses the group names as the JSON keys.
I would like to get the following output instead:
{
"application_subsystem_thread": "w3wp/LPAPI-Last Casino/177",
"date": "01/04/2022",
"level": "[INFO]",
"message": "AppsFlyerPostback?re_targeting_conversion_type=&is_retargeting=false&app_id=id624512118&platform=ios&event_type=in-app-event&attribution_type=organic&ip=8.8.8.8&name=blabla",
"time": "15:29:34",
"ip": "8.8.8.8"
}
As you can see, I would like to get the IP as well. How can I do it?

You could extract it from the message part of the entry.
Given how it appears in the message, the IP can be captured with:
ip\=(?P<ip_address>(?:[0-9]+\.){3}[0-9]+)
We then incorporate it into the larger message group:
(?P<message>.*ip\=(?P<ip_address>(?:[0-9]+\.){3}[0-9]+).*)
This results in the final expression (rename the ip_address group to ip if the JSON key must be exactly "ip"):
(?P<date>[0-9]{2}\/[0-9]{2}\/[0-9]{4}).(?P<time>\s*[0-9]{2}:[0-9]{2}:[0-9]{2}).*(?P<level>\[\D+\]).-.\[(?P<application_subsystem_thread>.*)\].-.(?P<message>.*ip\=(?P<ip_address>(?:[0-9]+\.){3}[0-9]+).*)
var message = `01/04/2022 15:29:34.2934 +03:00 - [INFO] - [w3wp/LPAPI-Last Casino/177] - AppsFlyerPostback?re_targeting_conversion_type=&is_retargeting=false&app_id=id624512118&platform=ios&event_type=in-app-event&attribution_type=organic&ip=8.8.8.8&name=blabla`;
// NOTE - The regex in this code sample has been modified to be ECMAScript compliant
console.log(/(?<date>[0-9]{2}\/[0-9]{2}\/[0-9]{4}).(?<time>\s*[0-9]{2}:[0-9]{2}:[0-9]{2}).*(?<level>\[\D+\]).-.\[(?<application_subsystem_thread>.*)\].-.(?<message>.*ip\=(?<ip_address>(?:[0-9]+\.){3}[0-9]+).*)/gm.exec(message).groups)
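Since the original pattern uses the (?P<name>...) syntax, it can also be tried unchanged in Python. The following is only a minimal sketch: the names LOG_PATTERN and parse_line are made up for illustration, and it assumes your converter treats named groups the same way Python's re module does.

```python
import json
import re

# The final expression from the answer, unchanged and split across lines for readability.
LOG_PATTERN = re.compile(
    r"(?P<date>[0-9]{2}\/[0-9]{2}\/[0-9]{4})."
    r"(?P<time>\s*[0-9]{2}:[0-9]{2}:[0-9]{2}).*"
    r"(?P<level>\[\D+\]).-.\[(?P<application_subsystem_thread>.*)\].-."
    r"(?P<message>.*ip\=(?P<ip_address>(?:[0-9]+\.){3}[0-9]+).*)"
)


def parse_line(line):
    """Return the named groups as a dict, or None if the line does not match."""
    match = LOG_PATTERN.search(line)
    return match.groupdict() if match else None


# The log entry from the question.
line = (
    "01/04/2022 15:29:34.2934 +03:00 - [INFO] - [w3wp/LPAPI-Last Casino/177] - "
    "AppsFlyerPostback?re_targeting_conversion_type=&is_retargeting=false"
    "&app_id=id624512118&platform=ios&event_type=in-app-event"
    "&attribution_type=organic&ip=8.8.8.8&name=blabla"
)
fields = parse_line(line)
print(json.dumps(fields, indent=2, sort_keys=True))
```

Note that because the ip= part sits inside the message group without being optional, a line with no ip parameter will not match at all. If such lines occur, the IP part would have to be made optional or extracted in a second pass.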

Related

Assistance with NGINX and Regex Rustexp Parsing with Vector VRL in YAML .yml Format

I am attempting to parse an unfiltered and unformatted generic NGINX access log using Vector VRL. I am looking for a couple of examples and a push in the right direction. This is my first attempt at using VRL syntax, but I have experience with basic regex and Sumo Logic (other logging systems).
Here is an example of what I mean by a generic basic NGINX access log (sourced from Digital Ocean's blog):
47.29.201.179 - - [28/Feb/2019:13:17:10 +0000] "GET /?p=1 HTTP/2.0" 200 5316 "https://domain1.com/?p=1" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36" "2.75"
My goal is to parse and transform the log into the JSON format below, or at least into some JSON so I can then use jq to interpret it. The message key names give an idea of the fields I am looking to parse:
{
"file": "X",
"host": "X",
"message": "<MESSAGE> \"GET .."",
"nginx": {
"agent": "X",
"client": "X",
"method": "GET",
"path": "/",
"protocol": "HTTP/1.1",
"request": "GET / HTTP/1.1",
"server": "X", <---- Attempting to add this field here by extracting client IP from the log
"size": X,
"status": 200,
"timestamp": "YYYY-MM-DDTHH:MM:SSZ"
},
"source_type": "file",
"timestamp": "YYYY-MM-DDTHH:MM:SS.<unixtimestamp>Z"
}
Please also note that I am trying to nest the nginx fields as a nested JSON object within the formatted log.
VRL has a ton of functions available, but I am not sure of my best choice going forward. Should I use parse_regex as well as parse_nginx_log, and in which order? Are any other transforms needed? I saw you can also use multiple parsing strategies, but I am unsure how to achieve this, especially in YAML.
I started testing with the example transform configuration below (I am strictly attempting to achieve this in YAML .yml), but I am not having much luck thus far:
transforms:
  parse-nginx:
    inputs:
      - nginx
    type: remap
    source: |
      .nginx = parse_nginx_log!(.message,"combined")
      .nginx = parse_regex!(.message, r'^(?P<server>\d+\.\d+\.\d+\.\d+)
      #.nginx = parse_json!(.message) # sets `.` to an array of objects
      #.nginx = parse_syslog(.message) ??
      #.nginx = parse_common_log(.message) ??
Additionally, to test the configuration would you suggest this is the best approach?
sudo vector --config-yaml <path>/nginx-vector-config.yml
Any help much appreciated...
TYIA!
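This is not VRL, but as a rough illustration of the target shape described above, here is a minimal Python sketch that parses the sample line and nests the result under an nginx key. The pattern, the names NGINX_LINE and to_event, and the choice to copy the client IP into server are all assumptions made for illustration; they are not Vector's parse_nginx_log implementation.

```python
import json
import re
from datetime import datetime

# Hypothetical pattern for the sample line: client, two unused fields,
# [timestamp], "request", status, size, "referer", "user agent".
NGINX_LINE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def to_event(line):
    """Build an event dict with the parsed fields nested under "nginx"."""
    match = NGINX_LINE.match(line)
    if match is None:
        return None
    nginx = match.groupdict()
    nginx["request"] = "{method} {path} {protocol}".format(**nginx)
    nginx["status"] = int(nginx["status"])
    nginx["size"] = int(nginx["size"])
    # %b relies on English month abbreviations (the default C locale).
    parsed = datetime.strptime(nginx["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    nginx["timestamp"] = parsed.isoformat()
    # Assumption: "server" is meant to hold the client IP, per the note in the question.
    nginx["server"] = nginx["client"]
    return {"message": line, "nginx": nginx}


# The sample access log line from the question.
sample = (
    '47.29.201.179 - - [28/Feb/2019:13:17:10 +0000] "GET /?p=1 HTTP/2.0" 200 5316 '
    '"https://domain1.com/?p=1" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36" "2.75"'
)
event = to_event(sample)
print(json.dumps(event, indent=2))
```

The same idea should carry over to a remap transform: parse once, then assign the extra field onto the nested object rather than overwriting .nginx with a second parse result.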

AWS Cloud Watch: How to specify which field to use for timestamp in json?

I have
datetime_format = "%Y-%m-%dT%H:%M:%S.%f%z"
in /etc/awslogs/awslogs.conf
And I have log like this:
{
"level": "info",
"ts": "2023-01-08T21:46:03.381067Z",
"caller": "bot/bot.go:172",
"msg": "Creating test subscription declined",
"user_id": "0394c017-2a94-416c-940c-31b1aadb12ee"
}
However, the timestamp is not parsed.
I see this warning in the logs:
2023-01-08 21:46:03,423 - cwlogs.push.reader - WARNING - 9500 - Thread-4 - Fall back to previous event time: {'timestamp': 1673211877689, 'start_position': 6469L, 'end_position': 6640L}, previousEventTime: 1673211877689, reason: timestamp could not be parsed from message.
Update: I tried removing level:
{
"ts": "2023-01-08T23:15:00.518545Z",
"caller": "bot/bot.go:172",
"msg": "Creating test subscription declined",
"user_id": "0394c017-2a94-416c-940c-31b1aadb12ee"
}
and it still does not work.
There are two different formats of CloudWatch Logs agent configuration:
1. https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AgentReference.html. This one is deprecated, as mentioned in the alert section of the page.
2. https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-Configuration-File-Details.html. This is the configuration for the new unified CloudWatch agent. It has no datetime_format parameter; it uses timestamp_format instead.
Since you mention datetime_format, I assume you are using the old agent. In that case, %z refers to a UTC offset of the form +HHMM or -HHMM (for example +0000, -0400, +1030), as per the documentation in link 1 above. Your timestamp ends in a literal Z rather than such an offset, so your format should be %Y-%m-%dT%H:%M:%S.%fZ. Here the Z, like the T, just represents a literal character. Also, set time_zone to UTC.
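To check the suggested format string outside the agent, something like the following Python sketch can help. It assumes the agent's format codes behave like those of Python's datetime.strptime, which may not hold in every detail; TS_FORMAT and parse_ts are names made up for this example.

```python
from datetime import datetime, timezone

# "T" and "Z" are matched as literal characters; %f takes the microseconds.
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_ts(value):
    """Parse a timestamp such as the "ts" field and mark it as UTC."""
    naive = datetime.strptime(value, TS_FORMAT)
    # The format carries no offset, so UTC is attached explicitly,
    # mirroring the time_zone = UTC setting suggested above.
    return naive.replace(tzinfo=timezone.utc)


record = {"ts": "2023-01-08T21:46:03.381067Z"}  # value taken from the question
parsed = parse_ts(record["ts"])
print(parsed.isoformat())
```

If the application ever switches to emitting a numeric offset instead of Z, this format will stop matching and the format string will need %z again.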

python, google cloud platform: unable to overwrite a file from google bucket: CRC32 does not match

I am using the Python 3 client to connect to Google Cloud Storage buckets and am trying to do the following:
download 'my_rules_file.yaml'
modify the yaml file
overwrite the file
Here is the code that I used:
from google.cloud import storage
import yaml
client = storage.Client()
bucket = client.get_bucket('bucket_name')
blob = bucket.blob('my_rules_file.yaml')
yaml_file = blob.download_as_string()
doc = yaml.load(yaml_file, Loader=yaml.FullLoader)
doc['email'].clear()
doc['email'].extend(["test@gmail.com"])
yaml_file = yaml.dump(doc)
blob.upload_from_string(yaml_file, content_type="application/octet-stream")
This is the error I get from the last line for upload
BadRequest: 400 POST https://storage.googleapis.com/upload/storage/v1/b/fc-sandbox-datastore/o?uploadType=multipart: {
"error": {
"code": 400,
"message": "Provided CRC32C \"YXQoSg==\" doesn't match calculated CRC32C \"EyDHsA==\".",
"errors": [
{
"message": "Provided CRC32C \"YXQoSg==\" doesn't match calculated CRC32C \"EyDHsA==\".",
"domain": "global",
"reason": "invalid"
},
{
"message": "Provided MD5 hash \"G/rQwQii9moEvc3ZDqW2qQ==\" doesn't match calculated MD5 hash \"GqyZzuvv6yE57q1bLg8HAg==\".",
"domain": "global",
"reason": "invalid"
}
]
}
}
: ('Request failed with status code', 400, 'Expected one of', <HTTPStatus.OK: 200>)
Why is this happening? It seems to happen only for ".yaml" files.
The error occurs because you are using the same blob object for both the download and the upload; you need two separate instances. Presumably the object keeps the checksum metadata of the content it downloaded, which no longer matches the modified content you upload. You can find some good examples under Python google.cloud.storage.Blob() Examples.
Use a separate blob instance to handle the upload:
.....
blob = bucket.blob('my_rules_file.yaml')
yaml_file = blob.download_as_string()
.....
# the second instance is needed here
upload_blob = bucket.blob('my_rules_file.yaml')
upload_blob.upload_from_string(yaml_file, content_type="application/octet-stream")
...

Dataflow Job - HTTP 400 Non Empty Data

I launched a dataflow batch job to load csv data from GCS to Pubsub.
Dataflow job is failing with the following log portion:
Error message from worker: com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request { "code" : 400, "errors" : [ { "domain" : "global", "message" : "One or more messages in the publish request is empty. Each message must contain either non-empty data, or at least one attribute.", "reason" : "badRequest" } ], "message" : "One or more messages in the publish request is empty. Each message must contain either non-empty data, or at least one attribute.", "status" : "INVALID_ARGUMENT" } com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:150) com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113) com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40) com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:443) com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1108) com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:541) com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:474) com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:591) org.apache.beam.sdk.io.gcp.pubsub.PubsubJsonClient.publish(PubsubJsonClient.java:138) org.apache.beam.sdk.io.gcp.pubsub.PubsubIO$Write$PubsubBoundedWriter.publish(PubsubIO.java:1195) org.apache.beam.sdk.io.gcp.pubsub.PubsubIO$Write$PubsubBoundedWriter.finishBundle(PubsubIO.java:1184)
So basically it's saying that at least one line is empty, but the CSV data only contains some empty fields, not a completely empty line.
Here is a sample:
2019-12-01 00:00:00 UTC,remove_from_cart,5712790,1487580005268456287,,f.o.x,6.27,576802932,51d85cb0-897f-48d2-918b-ad63965c12dc
2019-12-01 00:00:00 UTC,view,5764655,1487580005411062629,,cnd,29.05,412120092,8adff31e-2051-4894-9758-224bfa8aec18
2019-12-01 00:00:02 UTC,cart,4958,1487580009471148064,,runail,1.19,494077766,c99a50e8-2fac-4c4d-89ec-41c05f114554
2019-12-01 00:00:05 UTC,view,5848413,1487580007675986893,,freedecor,0.79,348405118,722ffea5-73c0-4924-8e8f-371ff8031af4
2019-12-01 00:00:07 UTC,view,5824148,1487580005511725929,,,5.56,576005683,28172809-7e4a-45ce-bab0-5efa90117cd5
2019-12-01 00:00:09 UTC,view,5773361,1487580005134238553,,runail,2.62,560109803,38cf4ba1-4a0a-4c9e-b870-46685d105f95
2019-12-01 00:00:18 UTC,cart,5629988,1487580009311764506,,,1.19,579966747,1512be50-d0fd-4a92-bcd8-3ea3943f2a3b
Any help ?
Thanks
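One way to double-check the assumption that the file has no empty lines is to scan a local copy for lines that are empty or whitespace-only, including a possible trailing one. The sketch below is only a diagnostic aid: find_blank_lines is a made-up helper, and whether the job turns such a line into an empty Pub/Sub message is an assumption based on the error text.

```python
def find_blank_lines(lines):
    """Return the 1-based numbers of lines that are empty or whitespace-only."""
    return [number for number, line in enumerate(lines, start=1) if not line.strip()]


# Two rows from the question followed by a hypothetical blank line, such as
# a stray newline at the end of the file might produce.
sample = (
    "2019-12-01 00:00:00 UTC,view,5764655,1487580005411062629,,cnd,29.05,"
    "412120092,8adff31e-2051-4894-9758-224bfa8aec18\n"
    "2019-12-01 00:00:07 UTC,view,5824148,1487580005511725929,,,5.56,"
    "576005683,28172809-7e4a-45ce-bab0-5efa90117cd5\n"
    "\n"
)
blank = find_blank_lines(sample.split("\n"))
print(blank)
```

Rows that merely contain empty fields are not reported, since they still hold other characters. On a real file, the same function can be given the open file object instead of a list.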

Submit an App by Url to Firefox Marketplace

I can install the app from its URL, but I can't upload it to the Firefox Marketplace.
I get 2 errors:
JSON Parse Error
Error: The webapp extension could not be parsed due to a syntax error in the JSON.
No JSON object could be decoded: line 1 column 0 (char 0)
The JSON is this:
{
"name": "Snake",
"description": "Snake in html and js",
"launch_path": "/index.html",
"developer": {
"name": "ZiTAL",
"url": "https://github.com/ZiTAL/snakejs"
},
"icons": {
"128": "/img/snake-128.png"
},
"installs_allowed_from": ["*"]
}
Second error:
Manifests must be served with the HTTP header "Content-Type: application/x-web-app-manifest+json". See https://developer.mozilla.org/docs/Web/Apps/Manifest#Serving_manifests for more information.
However, when I download it with wget:
wget http://myurl/manifest.webapp
the header is OK:
HTTP request sent, awaiting response... 200 OK
Length: 267 [application/x-web-app-manifest+json]
Saving to: ‘manifest.webapp’
For the JSON parse error: to validate the app, you need to submit the manifest.webapp URL, not the app URL:
http://myurl/manifest.webapp
Second error:
You could try wget --save-headers and look in the output file to check whether the Content-Type header is really correct.
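Both checks can also be scripted. The following Python sketch takes a response body and its Content-Type header and reports the two problems the validator complained about. The function name check_manifest and the wording of the messages are made up for illustration; this is not the Marketplace validator's own logic.

```python
import json

EXPECTED_TYPE = "application/x-web-app-manifest+json"


def check_manifest(body, content_type):
    """Return a list of problems found in a fetched manifest response."""
    problems = []
    # Ignore optional parameters such as "; charset=utf-8".
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != EXPECTED_TYPE:
        problems.append("unexpected Content-Type: " + content_type)
    try:
        manifest = json.loads(body)
    except ValueError as exc:
        problems.append("body is not valid JSON: " + str(exc))
    else:
        if not isinstance(manifest, dict):
            problems.append("manifest is not a JSON object")
    return problems


# The manifest from the question, served with the expected header.
manifest_body = """{
  "name": "Snake",
  "description": "Snake in html and js",
  "launch_path": "/index.html",
  "developer": {"name": "ZiTAL", "url": "https://github.com/ZiTAL/snakejs"},
  "icons": {"128": "/img/snake-128.png"},
  "installs_allowed_from": ["*"]
}"""
print(check_manifest(manifest_body, EXPECTED_TYPE))

# What submitting the app URL instead of the manifest URL might look like.
print(check_manifest("<!DOCTYPE html><html></html>", "text/html"))
```

To run it against the live URL, the body and header could be fetched with urllib.request.urlopen, reading resp.read() and resp.headers.get("Content-Type"). An HTML page in place of the manifest would explain a parse failure at line 1 column 0.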