I am using the Telegraf [[inputs.logparser]] plugin to grab access_log data from Apache for a local web page I have running.
Using the ["%{COMBINED_LOG_FORMAT}"] patterns, I am able to retrieve the default measurements provided by the access_log, including http_version, request, resp_bytes, etc.
I have extended the LogFormat in httpd.conf to append the response time of each request (%D) to the end of every access_log record, and that part works: the value shows up in the access_log.
However, I have so far been unable to get Telegraf to pick up this new value with inputs.logparser. I am monitoring the data with a Grafana dashboard on top of InfluxDB, and the response time has not yet appeared as an additional measurement.
So far I have attempted the following:
The first [[inputs.logparser]] section remains the same throughout my attempts and is always present/active; this seems necessary in order to keep collecting the default measurements.
######## default logparser using COMBINED to obtain default access_log measurements ######
# Stream and parse log file(s).
[[inputs.logparser]]
  files = ["/var/log/httpd/access_log"]
  from_beginning = true
  ## Parse logstash-style "grok" patterns:
  [inputs.logparser.grok]
    patterns = ["%{COMBINED_LOG_FORMAT}"]
    measurement = "apache_access_log"
    custom_patterns = '''
'''
Attempt 1 at matching the response time appended to access_log:
############# Grok/RegEx for matching response time ######################
# Stream and parse log file(s).
[[inputs.logparser]]
  ## Log files to parse.
  files = ["/var/log/httpd/access_log"]
  from_beginning = true
  ## Parse logstash-style "grok" patterns:
  [inputs.logparser.grok]
    patterns = ["%{METRICS_INCLUDE_RESPONSE}"]
    measurement = "apache_access_log"
    custom_patterns = '''
METRICS_INCLUDE_RESPONSE [%{NUMBER:resp}]
'''
For my second attempt, I tried a plain regular expression instead:
############# Grok/RegEx for matching response time ######################
# Stream and parse log file(s).
[[inputs.logparser]]
  ## Log files to parse.
  files = ["/var/log/httpd/access_log"]
  from_beginning = true
  ## Parse logstash-style "grok" patterns:
  [inputs.logparser.grok]
    patterns = ["%{METRICS_INCLUDE_RESPONSE}"]
    measurement = "apache_access_log"
    custom_patterns = '''
METRICS_INCLUDE_RESPONSE [%([0-9]{1,3})]
'''
After both of these attempts, the default measurements are still collected fine by Telegraf, but the response time still does not appear as an additional field.
I believe the issue is the syntax of my custom grok pattern: it is not matching as I intended because I am not telling it to pull the right part of the line. But I am unsure.
I have provided example access_log output below. All details are pulled by Telegraf without issue under COMBINED_LOG_FORMAT, except for the number at the end, which is the response time.
10.30.20.32 - - [09/Jan/2020:11:08:14 +0000] "POST /404.php HTTP/1.1" 200 252 "http://10.30.10.77/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36" 600
10.30.20.32 - - [09/Jan/2020:11:08:15 +0000] "POST /boop.html HTTP/1.1" 200 76 "http://10.30.10.77/404.php" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36" 472
You are essentially extending a pre-defined pattern. In your sample log the response time is simply appended after the user agent, with no square brackets around it, so the pattern only needs a space and a NUMBER capture at the end:

######## logparser using COMBINED plus the appended response time ######
# Stream and parse log file(s).
[[inputs.logparser]]
  files = ["/var/log/httpd/access_log"]
  from_beginning = true
  ## Parse logstash-style "grok" patterns:
  [inputs.logparser.grok]
    patterns = ["%{COMBINED_LOG_FORMAT} %{NUMBER:responseTime:float}"]
    measurement = "apache_access_log"

(If the value were wrapped in square brackets in the log, you would escape them in the pattern: "%{COMBINED_LOG_FORMAT} \\[%{NUMBER:responseTime:float}\\]".)
You will then get the response time as a field named responseTime with float type.
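As a quick sanity check outside Telegraf, the same idea can be exercised with a plain regex in Python. The pattern below is a hand-rolled stand-in for "combined format followed by a trailing number", not Telegraf's actual grok definition, and the user agents are abbreviated:

```python
import re

# Match everything up to the closing quote of the User-Agent string,
# then capture the trailing number as the response time.
pattern = re.compile(r'"\s+(?P<responseTime>\d+)\s*$')

lines = [
    '10.30.20.32 - - [09/Jan/2020:11:08:14 +0000] "POST /404.php HTTP/1.1" 200 252 '
    '"http://10.30.10.77/" "Mozilla/5.0 ..." 600',
    '10.30.20.32 - - [09/Jan/2020:11:08:15 +0000] "POST /boop.html HTTP/1.1" 200 76 '
    '"http://10.30.10.77/404.php" "Mozilla/5.0 ..." 472',
]

times = [float(pattern.search(l).group("responseTime")) for l in lines]
print(times)  # [600.0, 472.0]
```

If this extraction works on your real log lines, the remaining differences are down to grok syntax rather than the log format itself.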
My setup is ELB --https--> traefik --https--> service
I get back a 500 Internal Server Error from traefik on every request, and it doesn't appear the request ever makes it to the service. The service runs Apache with access logging enabled, and I see no incoming requests logged. I am able to curl the service directly and receive the expected response. Both traefik and the service run in Docker containers. Using port 80 all the way through works, and so does https to traefik with port 80 to the service (Apache returns an error in that case, but the request does go all the way through).
traefik.toml
logLevel = "DEBUG"
RootCAs = [ "/etc/certs/ca.pem" ]
#InsecureSkipVerify = true
defaultEntryPoints = ["https"]

[entryPoints]
  [entryPoints.https]
    address = ":443"
    [entryPoints.https.tls]
      [[entryPoints.https.tls.certificates]]
        certFile = "/etc/certs/cert.pem"
        keyFile = "/etc/certs/key.pem"
  [entryPoints.http]
    address = ":80"

[web]
address = ":8080"

[traefikLog]

[accessLog]

[consulCatalog]
endpoint = "127.0.0.1:8500"
domain = "consul.localhost"
exposedByDefault = false
prefix = "traefik"
The tags used for the consul service:
"traefik.enable=true",
"traefik.protocol=https",
"traefik.frontend.passHostHeader=true",
"traefik.frontend.redirect.entryPoint=https",
"traefik.frontend.entryPoints=https",
"traefik.frontend.rule=Host:hostname"
The debug output from traefik for each request:
time="2018-04-08T02:46:36Z"
level=debug
msg="vulcand/oxy/roundrobin/rr: begin ServeHttp on request"
Request="{"Method":"GET","URL":{"Scheme":"","Opaque":"","User":null,"Host":"","Path":"/","RawPath":"","ForceQuery":false,"RawQuery":"","Fragment":""},"Proto":"HTTP/1.1","ProtoMajor":1,"ProtoMinor":1,"Header":{"Accept":["text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8"],"Accept-Encoding":["gzip, deflate, br"],"Accept-Language":["en-US,en;q=0.9"],"Cache-Control":["max-age=0"],"Cookie":["__utmc=80117009; PHPSESSID=64c928bgf265fgqdqqbgdbuqso; _ga=GA1.2.573328135.1514428072; messagesUtk=d353002175524322ac26ff221d1e80a6; __hstc=27968611.cbdd9ce39324304b461d515d0a8f4cb0.1523037648547.1523037648547.1523037648547.1; __hssrc=1; hubspotutk=cbdd9ce39324304b461d515d0a8f4cb0; __utmz=80117009.1523037658.5.2.utmcsr=|utmccn=(referral)|utmcmd=referral|utmcct=/; __utma=80117009.573328135.1514428072.1523037658.1523128344.6"],"Upgrade-Insecure-Requests":["1"],"User-Agent":["Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.81 Safari/537.36"],"X-Amzn-Trace-Id":["Root=1-5ac982a8-b9615451a35258e3fd2a825d"],"X-Forwarded-For":["76.105.255.147"],"X-Forwarded-Port":["443"],"X-Forwarded-Proto":["https"]},"ContentLength":0,"TransferEncoding":null,"Host":"hostname","Form":null,"PostForm":null,"MultipartForm":null,"Trailer":null,"RemoteAddr":"10.200.20.130:4880","RequestURI":"/","TLS":null}"
time="2018-04-08T02:46:36Z" level=debug
msg="vulcand/oxy/roundrobin/rr: Forwarding this request to URL"
Request="{"Method":"GET","URL":{"Scheme":"","Opaque":"","User":null,"Host":"","Path":"/","RawPath":"","ForceQuery":false,"RawQuery":"","Fragment":""},"Proto":"HTTP/1.1","ProtoMajor":1,"ProtoMinor":1,"Header":{"Accept":["text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8"],"Accept-Encoding":["gzip, deflate, br"],"Accept-Language":["en-US,en;q=0.9"],"Cache-Control":["max-age=0"],"Cookie":["__utmc=80117009; PHPSESSID=64c928bgf265fgqdqqbgdbuqso; _ga=GA1.2.573328135.1514428072; messagesUtk=d353002175524322ac26ff221d1e80a6; __hstc=27968611.cbdd9ce39324304b461d515d0a8f4cb0.1523037648547.1523037648547.1523037648547.1; __hssrc=1; hubspotutk=cbdd9ce39324304b461d515d0a8f4cb0; __utmz=80117009.1523037658.5.2.utmcsr=|utmccn=(referral)|utmcmd=referral|utmcct=/; __utma=80117009.573328135.1514428072.1523037658.1523128344.6"],"Upgrade-Insecure-Requests":["1"],"User-Agent":["Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.81 Safari/537.36"],"X-Amzn-Trace-Id":["Root=1-5ac982a8-b9615451a35258e3fd2a825d"],"X-Forwarded-For":["76.105.255.147"],"X-Forwarded-Port":["443"],"X-Forwarded-Proto":["https"]},"ContentLength":0,"TransferEncoding":null,"Host":"hostname","Form":null,"PostForm":null,"MultipartForm":null,"Trailer":null,"RemoteAddr":"10.200.20.130:4880","RequestURI":"/","TLS":null}" ForwardURL="https://10.200.115.53:443"
Assume "hostname" is the correct host name. Any assistance is appreciated.
I think your problem comes from "traefik.protocol=https"; remove that tag.
You can also remove "traefik.frontend.redirect.entryPoint=https" because it's useless: that tag creates a redirect to the https entry point, but your frontend is already on the https entry point ("traefik.frontend.entryPoints=https").
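Dropping "traefik.protocol=https" (so traefik speaks plain HTTP to the backend container) and the redundant redirect tag, the tag list for the consul service would look like this (sketch; keep your own rule/host values):

```json
[
  "traefik.enable=true",
  "traefik.frontend.passHostHeader=true",
  "traefik.frontend.entryPoints=https",
  "traefik.frontend.rule=Host:hostname"
]
```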
The request header is as below.
Accept:application/json, text/plain, */*
Accept-Encoding:gzip, deflate, br
Accept-Language:en-US,en;q=0.8
Connection:keep-alive
Content-Length:129
Content-Type:text/plain
Host:localhost:9000
Origin:http://localhost:8000
Referer:http://localhost:8000/
User-Agent:Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
X-CSRFTOKEN:t5Nx0SW9haZTeOcErcBDtaq6psqBfeyuX4LRQ1WOOXq5g93tQkvcUZDGoWz8wSeD
The X-CSRFTOKEN header is there, but Django still complains that the CSRF cookie is not set. What is happening?
In settings.py, the header name is configured correctly:
CSRF_HEADER_NAME = "HTTP_X_CSRFTOKEN"
Check whether CSRF_COOKIE_SECURE is set to True.
You will get exactly this error message if CSRF_COOKIE_SECURE is True and you access the site over http instead of https.
Alternatively (for testing only), you can use the csrf_exempt decorator.
For example, curtisp mentions in the comments:
I had conditional dev vs prod settings and accidentally set the dev settings to CSRF_COOKIE_SECURE = True and SESSION_COOKIE_SECURE = True.
My dev site is localhost on a laptop and does not have SSL.
So changing the dev settings to False fixed it for me.
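One way to avoid that dev/prod mix-up is to derive the secure-cookie flags from DEBUG in settings.py. This is a minimal sketch; whether you key off DEBUG or a dedicated environment variable is your call:

```python
# settings.py (sketch)
DEBUG = True  # local development on http://localhost, no SSL

# Secure cookies are only sent over HTTPS. Enabling them in a dev
# environment without SSL means the browser never receives the CSRF
# cookie, producing "CSRF cookie not set" even with a valid header.
CSRF_COOKIE_SECURE = not DEBUG
SESSION_COOKIE_SECURE = not DEBUG
```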
I have run into an issue where my web crawler will only run correctly when I am connected to my home Internet.
Using Python 2.7 with the Mechanize module on Windows 7.
Here are a few details about the code (snippets below): this web crawler logs into a website, navigates through a series of links, locates a link to download a file, downloads the file, saves it to a preset folder, then repeats the process several thousand times.
I am able to run the code successfully at home on both my wired and wireless internet. When I connect to the Internet via a different source (e.g. work, starbucks, neighbor's house, mobile hotspot) the script runs but returns an error when trying to access the link to download a file:
httperror_seek_wrapper: HTTP ERROR 404: Not Found
This is what prints in the IDE when I access the site:
send: 'GET /download/8635/CLPOINT.E00.GZ HTTP/1.1\r\nHost: dl1.geocomm.com\r\nUser-Agent: Mozilla/5.0 (x11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1\r\nCookie: MSGPOPUP=1391465678; TBMSESSION=5dee7266e3dcfa0193972102c73a2543\r\nConnection: close\r\nAccept-Encoding: gzip\r\n\r\n'
reply: 'HTTP/1.1 404 Not Found\r\n'
header: Content-Type: text/html
header: Content-Length: 345
header: Connection: close
header: Date: Mon, 03 Feb 2014 22:14:44 GMT
header: Server: lighttpd/1.4.32
Simply changing back to my home internet makes it work again. What confuses me is that I am not changing anything but the source of the internet: I simply disconnect from one router, connect to another, and rerun the code.
I have tried to change the browser headers using these three options:
br.addheaders = [('User-agent', 'Mozilla/5.0 (x11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
br.addheaders = [('User-agent', 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11')]
br.addheaders = [('User-agent', 'Firefox')]
I am using the Mechanize module to access the Internet and create a browser session. Here is the login code snippet and download file code snippet (where I am getting the 404 error).
def websiteLogin():
    ## Logs into GeoComm website using predefined credentials (username/password hardcoded in definition)
    br = mechanize.Browser()
    cj = cookielib.LWPCookieJar()
    br.set_cookiejar(cj)
    br.set_handle_equiv(True)
    br.set_handle_gzip(True)
    br.set_handle_redirect(True)
    br.set_handle_referer(True)
    br.set_handle_robots(False)
    br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
    br.set_debug_http(True)
    br.set_debug_redirects(True)
    br.set_debug_responses(False)
    br.addheaders = [('User-agent', 'Mozilla/5.0 (x11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
    # (the login page is opened with br.open(...) at this point, before the form
    #  is selected; the URL is omitted from this snippet)
    br.select_form(nr=0)
    br.form['username'] = '**********'  ## stars replace my actual un and pw
    br.form['password'] = '**********'
    br.submit()
    return br
def downloadData(br, url, outws):
    br.open(url)
    for l in br.links(url_regex='download/[0-9]{4}'):
        fname = l.text
        outfile = os.path.join(outws, fname)
        if not os.path.exists(outfile):
            f = br.retrieve(l.absolute_url)[0]
            time.sleep(7.5)
            shutil.copy2(f, outfile)
This code does run as expected (i.e. downloads files without the 404 error) on my home internet, but that is a satellite internet service and my daily download and monthly data allotments are limited; that is why I need to run this over another internet source. I am looking for help better understanding why the code runs in one place but not another. Let me know if you need more information to troubleshoot this.
As you can see from your GET request, your mechanize browser object is trying to fetch the resource /download/8635/CLPOINT.E00.GZ from the host dl1.geocomm.com.
When you recheck this from the other network, you get the 404 because the resource is simply not available there: dl1.geocomm.com is redirected to another target.
What I'd recommend is to start debugging your application in an appropriate way.
You could start by adding at least some debugging print statements:
def downloadData(br, url, outws):
    br.open(url)
    for l in br.links(url_regex='download/[0-9]{4}'):
        print(l.url)
After that you'll see how the output differs between networks. Make sure to pass the url in the same way every time.
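To make the comparison concrete, a small helper (hypothetical; is_expected_host and EXPECTED_HOST are names invented for this sketch) can flag links whose host differs from the one you expect before you call retrieve():

```python
from urllib.parse import urlparse  # Python 3 stdlib; on Python 2 use urlparse.urlparse

EXPECTED_HOST = "dl1.geocomm.com"  # assumption: the host that served downloads at home

def is_expected_host(absolute_url):
    """Return True if the link points at the expected download host."""
    host = urlparse(absolute_url).netloc
    if host != EXPECTED_HOST:
        print("unexpected host: %s (wanted %s)" % (host, EXPECTED_HOST))
    return host == EXPECTED_HOST

print(is_expected_host("http://dl1.geocomm.com/download/8635/CLPOINT.E00.GZ"))  # True
```

If the hosts differ between networks, the site (or an intercepting proxy/captive portal) is steering you to a different download target, which would explain the 404.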
I need to exclude some sensitive details from my apache log, but I want to keep the log and the URIs in it. Is it possible to achieve the following in my access log:
127.0.0.1 - - [27/Feb/2012:13:18:12 +0100] "GET /api.php?param=secret HTTP/1.1" 200 7600 "http://localhost/api.php" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11"
I want to replace "secret" with "[FILTERED]" like this:
127.0.0.1 - - [27/Feb/2012:13:18:12 +0100] "GET /api.php?param=[FILTERED] HTTP/1.1" 200 7600 "http://localhost/api.php" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11"
I know I probably should have used POST to send this variable, but the damage is already done. I've looked at http://httpd.apache.org/docs/2.4/logs.html and LogFormat, but could not find any way to use regular expressions or similar. Any suggestions?
[edit]
Do NOT send sensitive variables as GET parameters if you have the possibility to choose.
I've found one way to solve the problem. If I pipe the log output to sed, I can perform a regex replace on the output before I append it to the log file.
Example 1
CustomLog "|/bin/sed -E s/'param=[^& \t\n]*'/'param=\[FILTERED\]'/g >> /your/path/access.log" combined
Example 2
It's also possible to exclude several parameters:
exclude.sh
#!/bin/bash
while read x ; do
    result=$x
    for ARG in "$@"
    do
        cleanArg=`echo $ARG | sed -E 's|([^0-9a-zA-Z_])|\\\\\1|g'`
        result=`echo $result | sed -E s/$cleanArg'=[^& \t\n]*'/$cleanArg'=\[FILTERED\]'/g`
    done
    echo $result
done
Move the script above to the folder /opt/scripts/ or somewhere else, give the script execute rights (chmod +x exclude.sh) and modify your apache config like this:
CustomLog "|/opt/scripts/exclude.sh param param1 param2 >> /your/path/access.log" combined
Documentation
http://httpd.apache.org/docs/2.4/logs.html#piped
http://www.gnu.org/software/sed/manual/sed.html
If you want to exclude several parameters but don't want to use a script, you can use a group like this:
CustomLog "|/bin/sed -E s/'(email|password)=[^& \t\n]*'/'\\\\\1=\[FILTERED\]'/g >> /var/log/apache2/${APACHE_LOG_FILENAME}.access.log" combined
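The substitution that the grouped sed command performs can be sanity-checked with the equivalent regex in Python (the parameter names and the sample log line here are just examples):

```python
import re

line = ('127.0.0.1 - - [27/Feb/2012:13:18:12 +0100] '
        '"GET /api.php?email=a@b.c&password=hunter2&page=1 HTTP/1.1" 200 7600')

# Same idea as the sed group: replace each listed parameter's value,
# keeping the parameter name via the backreference.
filtered = re.sub(r'(email|password)=[^& \t\n]*', r'\1=[FILTERED]', line)
print(filtered)
```

Only the listed parameters are masked; other query parameters (page=1 above) pass through untouched.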