Hi, I'm in need of some serious help.
I have logs that I want to parse using GROK, but they are not always consistent in content or spacing. Here are some obfuscated examples:
title_access_log:ipaddress1, ipaddress2, ipaddress3 - - [14/Nov/2017:08:30:00 +0000] "GET /url HTTP/1.1" 200 198454 - 153261 - 0000fD5b5OSuS2C7ZdhgwqYufJk:GH809 url
title_access_log:ipaddress1, ipaddress2 - - [14/Nov/2017:08:30:00 +0000] "GET /url HTTP/1.1" 200 2326 - 20482 V22843489635e0e42e864037eccb8ad4857500ea 0000BDzHfUFhjJmcs9R4-CyglGS:GH806 url
title_access_log:ipaddress1, ipaddress2 - - [14/Nov/2017:08:30:00 +0000] "POST /url HTTP/1.1" 200 30031 - 17942 - 0000PjpQluI9BZ0w4EDB9o2fow-:GH809 url
I have managed to write GROK patterns that extract everything up to the date and time for logs that contain 2 IPs, but I get stuck going further, or when trying to handle logs with 3 IPs.
Has anyone got any advice on how to tackle this?
I'm using Graylog to extract the data, so I do have the option of using formats other than GROK.
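Since Graylog also accepts plain regular expressions, one possible approach is to capture the whole comma-separated IP list as a single field and split it afterwards, rather than writing one pattern per IP count. Below is a minimal sketch in Python, not a tested Graylog extractor. The field names (title, ips, extra and so on) and the helper parse_line are made up for illustration, and since the meaning of the trailing fields is not clear from the samples, they are kept as a raw list.

```python
import re

# Hypothetical pattern: all field names are assumptions chosen for illustration.
LINE_RE = re.compile(
    r'^(?P<title>[^:]+):'                    # prefix before the first colon
    r'(?P<ips>[^,\s]+(?:,\s*[^,\s]+)*)'      # one or more comma-separated addresses
    r'\s+\S+\s+\S+\s+'                       # the two "-" placeholders
    r'\[(?P<timestamp>[^\]]+)\]\s+'          # [14/Nov/2017:08:30:00 +0000]
    r'"(?P<method>[A-Z]+)\s+(?P<path>\S+)\s+(?P<protocol>[^"]+)"\s+'
    r'(?P<status>\d{3})\s+(?P<size>\d+|-)'   # status code and response size
    r'(?P<extra>.*)$'                        # whatever follows, left unparsed
)

def parse_line(line):
    """Return a dict of fields, or None when the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec['ips'] = re.split(r',\s*', rec['ips'])  # works for 2, 3, or more IPs
    rec['extra'] = rec['extra'].split()         # trailing fields, split on whitespace
    return rec

sample = ('title_access_log:ipaddress1, ipaddress2, ipaddress3 - - '
          '[14/Nov/2017:08:30:00 +0000] "GET /url HTTP/1.1" 200 198454 '
          '- 153261 - 0000fD5b5OSuS2C7ZdhgwqYufJk:GH809 url')
print(parse_line(sample)['ips'])
```

Keeping the IP list as one capture means the rest of the pattern does not change when a third address appears.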
I'm writing a Python 3.7.2 program to parse Apache logs, looking for all successful response codes. I've got a regex right now that parses all correct Apache log entries into individual tuples of [origin] [date/time] [HTTP method/file/protocol] [response code] and [file size], and then I just check whether the response code is 3xx.
The problem is that several entries are corrupt. Some are corrupt enough to be unreadable, so I've stripped them out in a different part of the program. Several others are just missing the closing " (quotation mark) on the method/protocol item, which causes an error each time I parse such a line. I'm thinking I need a regex OR expression for " OR whitespace, but that seems to split the quote into a separate tuple item instead of matching, say, "GET 613.html HTTP/1.0" OR "GET 613.html HTTP/1.0
I'm new to regex and thoroughly stumped. Can anyone explain what I'm doing wrong?
I should note that the logs have been scrubbed of some info, instead of origin IP it only shows 'local' or 'remote' and the OS/browser info is removed entirely.
This is the regex for the relevant tuple item that works with valid entries: "(.*)?"
I've also tried:
"(.*)?("|\s) - creates another tuple item and still throws the error
Here's a snippet of the log entries, including the last entry, which is missing its closing "
local - - [27/Oct/1994:18:47:03 -0600] "GET index.html HTTP/1.0" 200 3185
local - - [27/Oct/1994:18:48:53 -0600] "GET index.html HTTP/1.0" 404 -
local - - [27/Oct/1994:18:49:55 -0600] "GET index.html HTTP/1.0" 303 3185
local - - [27/Oct/1994:18:50:25 -0600] "GET 612.html HTTP/1.0" 404 -
local - - [27/Oct/1994:18:50:41 -0600] "GET index.html HTTP/1.0" 200 388
local - - [27/Oct/1994:18:50:52 -0600] "GET 613.html HTTP/1.0 303 728
import re
regex = r'([(\w+)]+) - - \[(.*?)\] "(.*)?" (\d+) (\S+)'
with open("validlogs.txt") as validlogs:
    i = 0
    array = []
    successcodes = 0
    for line in validlogs:
        array.append(line)
    loglength = len(array)
    while (i < loglength):
        line = re.match(regex, array[i]).groups()
        if(line[3].startswith("3")):
            successcodes+=1
        i+=1
    print("Number of successcodes: ", successcodes)
Parsing the log entries above should give: Number of successcodes: 2
Instead I get:
Traceback (most recent call last):
  File "test.py", line 24, in <module>
    line = re.match(regex, array[i]).groups()
AttributeError: 'NoneType' object has no attribute 'groups'
because (I believe) the regex explicitly looks for a closing " and can't handle the entry that's missing it.
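One way to tolerate the missing quotation mark is to make the closing " optional ("?) and to check for a failed match before calling .groups(). This is a minimal sketch, assuming the scrubbed format shown in the snippet above. LOG_RE and count_redirects are illustrative names, and the origin group is simplified to \w+.

```python
import re

# The closing quote is optional ("?), so entries that lost it still match.
LOG_RE = re.compile(r'(\w+) - - \[(.*?)\] "(.*?)"? (\d{3}) (\S+)\s*$')

def count_redirects(lines):
    """Count entries whose status code is 3xx, skipping lines that do not match."""
    count = 0
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:  # corrupt entry: skip it instead of raising AttributeError
            continue
        origin, when, request, status, size = m.groups()
        if status.startswith("3"):
            count += 1
    return count

sample = [
    'local - - [27/Oct/1994:18:47:03 -0600] "GET index.html HTTP/1.0" 200 3185',
    'local - - [27/Oct/1994:18:48:53 -0600] "GET index.html HTTP/1.0" 404 -',
    'local - - [27/Oct/1994:18:49:55 -0600] "GET index.html HTTP/1.0" 303 3185',
    'local - - [27/Oct/1994:18:50:25 -0600] "GET 612.html HTTP/1.0" 404 -',
    'local - - [27/Oct/1994:18:50:41 -0600] "GET index.html HTTP/1.0" 200 388',
    'local - - [27/Oct/1994:18:50:52 -0600] "GET 613.html HTTP/1.0 303 728',
]
print("Number of successcodes:", count_redirects(sample))  # prints: Number of successcodes: 2
```

Iterating over the file directly also removes the need for the intermediate array and the manual index.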
So I originally used re.match with ([(\w+)]+) - - \[(.*?)\] "(.*?)" (\d+) (\d+) inside a try: / except: continue block to parse all the log lines that actually matched the pattern. Since ~100,000 of the ~750,000 lines didn't conform to the correct Apache log pattern, I wound up changing my code to use re.search with much smaller patterns instead.
For instance:
import re
redirectCounter = 0
with open("./http_access_log.txt") as logs:
    for line in logs:
        if re.search(r'\s*(30\d)\s\S+', line):  # Checking for 30x redirect codes
            redirectCounter += 1
I've read that re.match is faster than re.search, but I felt that capturing as many log entries as possible (this handles all but about 2000 lines, most of which have no usable info) was more important.
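If re.search is kept, it may help to anchor the pattern to the end of the line, so that a number inside the request itself is not mistaken for a status code. A small sketch, assuming the status code and size are always the last two fields, as in the scrubbed snippet above. STATUS_RE and is_redirect are illustrative names.

```python
import re

# Status code and size are assumed to be the final two fields on the line.
STATUS_RE = re.compile(r'\s(30\d)\s+(\S+)\s*$')

def is_redirect(line):
    """True when the second-to-last field is a 30x status code."""
    return STATUS_RE.search(line) is not None

print(is_redirect('local - - [27/Oct/1994:18:50:52 -0600] "GET 613.html HTTP/1.0 303 728'))
print(is_redirect('local - - [27/Oct/1994:18:50:41 -0600] "GET 301 page.html HTTP/1.0" 200 388'))
```

The first call returns True even though the closing quote is missing; the second returns False because the 301 appears in the request rather than in the status position.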
After following the documentation in https://www.odoo.com/documentation/9.0/api_integration.html I have encountered a problem with the generated PDF report.
I call the web service to generate an invoice report, but the rendered PDF is returned without its layout (located in account.report_invoice).
I do the following to render the report:
url = 'http://{0}:{1}/xmlrpc/2/report'.format(self._connect['host'], self._connect['port'])
sock_print = xmlrpclib.ServerProxy(url)
#Here, the 'render_report' function returns the base64 pdf without the specified layout
result = sock_print.render_report(db_name, uid, pwd, report_name, ids, {'model': 'account.invoice', 'report_type': 'qweb-pdf'})
string_pdf = base64.decodestring(result['result'])
return True, string_pdf
After the function above is done, I save the file to a directory to check whether it was generated with the correct layout.
So far, the PDF is generated, but without its layout for account.report_invoice.
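To rule out a corrupted payload before looking at the layout, the decode-and-save step can be sketched as below. This assumes the dictionary returned by render_report carries the base64 payload under the 'result' key, as in the call above. save_report and its path argument are hypothetical names, and base64.b64decode is used here in place of decodestring. It does not address the missing layout itself.

```python
import base64

def save_report(result, path):
    """Decode the base64 payload returned by render_report and write it to path."""
    pdf_bytes = base64.b64decode(result['result'])
    if not pdf_bytes.startswith(b'%PDF'):
        # Every PDF file begins with the marker %PDF
        raise ValueError("payload does not look like a PDF")
    with open(path, 'wb') as fh:
        fh.write(pdf_bytes)
    return len(pdf_bytes)
```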
Any ideas on what might be happening or what I might be missing?
Thank you for your time.
[EDIT 1]
2018-09-17 14:34:09,599 30522 INFO ? werkzeug: 127.0.0.1 - - [17/Sep/2018 14:34:09] "GET /web/content/323-c1e807b/report.assets_common.0.css HTTP/1.1" 404 -
2018-09-17 14:34:09,617 30522 INFO ? werkzeug: 127.0.0.1 - - [17/Sep/2018 14:34:09] "GET /web/content/328-9a5a204/report.assets_pdf.0.css HTTP/1.1" 404 -
2018-09-17 14:34:09,879 30522 INFO ? werkzeug: 127.0.0.1 - - [17/Sep/2018 14:34:09] "GET /web/content/328-9a5a204/report.assets_pdf.0.css HTTP/1.1" 404 -
2018-09-17 14:34:09,883 30522 INFO ? werkzeug: 127.0.0.1 - - [17/Sep/2018 14:34:09] "GET /web/content/323-c1e807b/report.assets_common.0.css HTTP/1.1" 404 -
I found this in the log when calling via the web service.
When I print the reports directly from the Odoo interface it's OK, but via the web service it doesn't recognise its own core CSS.
Consider the access log of a REST API. You will see (simplified) lines that look like this:
2017-01-01T12:12:41Z "GET /api/posts" HTTP/1.1 200 "-"
2017-01-01T12:12:42Z "GET /api/posts/56/comments" HTTP/1.1 200 "-"
2017-01-01T12:12:42Z "GET /api/posts" HTTP/1.1 200 "-"
2017-01-01T12:12:56Z "POST /api/posts" HTTP/1.1 202 "Safari"
2017-01-01T12:12:58Z "GET /api/posts/134/comments" HTTP/1.1 200 "-"
To parse that, you could write something like:
_collector=access.log | regex parse "(?<method>[A-Z]+) /api/(?<path>[\w\d\/]+) HTTP"
This would extract METHOD and PATH from the log lines, BUT you would see these unique values:
GET posts
POST posts
GET posts/56/comments
GET posts/134/comments
I want to throw away all the dynamic parts of the URL, so that I get the following instead:
GET posts
POST posts
GET posts/{id}/comments
I could figure this out with a search-and-replace regex easily enough, but is it even possible in Sumologic?
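For reference, the search-and-replace idea looks like this outside Sumologic. This is a plain Python sketch, not Sumologic query syntax, and it assumes the dynamic parts are purely numeric path segments. normalize_path is an illustrative name.

```python
import re

def normalize_path(path):
    """Replace purely numeric path segments with the placeholder {id}."""
    return re.sub(r'(?<=/)\d+(?=/|$)', '{id}', path)

paths = ["posts", "posts/56/comments", "posts/134/comments"]
print(sorted({normalize_path(p) for p in paths}))
# prints ['posts', 'posts/{id}/comments']
```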
I am trying to grab the output from an nginx log file and send it to logstash.
10.1.10.20 - bob [14/Feb/2014:18:57:05 +0000] “POST /main/foo.git/git-upload-pack HTTP/1.1” 200 3653189 “-” “git/1.8.3.4 (Apple Git–47)”
Grok is able to find the first 3 words fine
10.1.10.20 - bob [14/Feb/2014:18:57:05 +0000]
%{IPV4:user_ip} - %{USERNAME:user_name} \[%{HTTPDATE:time_local}\]
Grok is able to find the 3rd and 4th words fine
[14/Feb/2014:18:57:05 +0000] “POST /main/foo.git/git-upload-pack HTTP/1.1”
\[%{HTTPDATE:time_local}\] %{QUOTEDSTRING:request}
However, when I combine them and try to find all 4, Grok says there are no results (using http://grokdebug.herokuapp.com/ for testing)
10.1.10.20 - bob [14/Feb/2014:18:57:05 +0000] “POST /main/foo.git/git-upload-pack HTTP/1.1”
%{IPV4:user_ip} - %{USERNAME:user_name} \[%{HTTPDATE:time_local}\] %{QUOTEDSTRING:request}
#not found
Anyone know how to get the quoted string in the above example?
I'm brand new to grok, so perhaps I'm not approaching this correctly.
Update
Interestingly, if I use the following log line and then manually type in the URL, it does work:
bob 14/Feb/2014:18:57:05 +0000 "herp"
#Once herp works, replace herp, with POST
bob 14/Feb/2014:18:57:05 +0000 "POST"
#Once POST works, keep expounding until the whole thing is in place
autobuild 14/Feb/2014:18:57:05 +0000 "POST /main/builder.git/git-upload-pack HTTP/1.1"
"POST /main/builder.git/git-upload-pack HTTP/1.1" in pattern
"%{WORD:verb} %{URIPATHPARAM:request} HTTP/%{NUMBER:httpversion}"
The process of posting to Stack Overflow identified the problem.
If you look carefully, the double quotes are different characters:
"POST
vs
“POST
Manually typing in the double quote fixes the problem.
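If the log lines themselves may contain typographic quotes (for example after being copied through a web page or a word processor), one option is to normalize them to plain ASCII quotes before matching. A minimal Python sketch of that preprocessing step follows. normalize_quotes is an illustrative name, and how such a step would fit into a particular Logstash pipeline is not covered here.

```python
# Map typographic quotation marks to their plain ASCII equivalents.
QUOTE_MAP = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
})

def normalize_quotes(line):
    """Return line with curly quotes replaced by straight ones."""
    return line.translate(QUOTE_MAP)

print(normalize_quotes("\u201cPOST /main/foo.git/git-upload-pack HTTP/1.1\u201d"))
```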
You can also use this expression for cases where the log format varies:
"%{WORD:verb}(?:| %{URIPATHPARAM:request})(?:| HTTP/%{NUMBER:httpversion})"
It matches:
"POST /main/builder.git/git-upload-pack HTTP/1.1"
or
"POST /main/builder.git/git-upload-pack"
or
"POST"
try it.. ;)
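The same optional-group idea can be checked outside grok. The sketch below is a rough Python regex analogue, not the grok pattern itself: \w+, [^\s"]+ and \d+(?:\.\d+)? are simplified stand-ins for WORD, URIPATHPARAM and NUMBER, and REQUEST_RE is an illustrative name.

```python
import re

# Each trailing part sits in an optional non-capturing group.
REQUEST_RE = re.compile(
    r'"(?P<verb>\w+)'
    r'(?: (?P<request>[^\s"]+))?'
    r'(?: HTTP/(?P<httpversion>\d+(?:\.\d+)?))?"'
)

for text in ('"POST /main/builder.git/git-upload-pack HTTP/1.1"',
             '"POST /main/builder.git/git-upload-pack"',
             '"POST"'):
    print(REQUEST_RE.fullmatch(text).groupdict())
```

Groups that did not take part in the match come back as None, which makes the missing parts easy to detect.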
I found that {wso2am_home}repository/logs/ contains these logs:
http_access_2013-10-28.log
tm.out
wso2-apigw-errors.log
wso2-apigw-service.log
wso2-apigw-trace.log
wso2carbon-trace-messages.log
wso2carbon.log
I set every INFO level in log4j.properties to OFF, but I don't know where to disable http_access.log.
When I call the API once, it writes the following to http_access.log:
gwmanager.apim-wso2.com:8280 - - - "GET /direct/1.0.5 HTTP/1.1" - - "-" "Jakarta Commons-HttpClient/3.1"
128.6.X.X:80 - - - "GET http://128.6.X.X:80 HTTP/1.1" - - "-" "Synapse-HttpComponents-NIO
So the more I call the API, the bigger the file gets.
Do you know how to disable http_access.log?
If you want to disable HTTP access logs in WSO2 products, go to catalina-server.xml, which is located in the {CARBON_HOME}/repository/conf/tomcat directory, and remove the following Valve element:
<Valve className="org.apache.catalina.valves.AccessLogValve" directory="${carbon.home}/repository/logs"
prefix="http_access_" suffix=".log"
pattern="combined" />
Please refer to this for more details.