Extract domain with grok

I'd like to make the simplest possible grok filter, just to extract the domain from a URL.
For example, for the url
https://stackoverflow.com/questions/ask?title=grok%20extract%20url
I'd like to get the result
stackoverflow.com
I tried to do so with the filter
%{URIPROTO}://%{URIHOST:domain}
It did extract stackoverflow.com, but when I use a different URL that has www at the start,
for example
https://www.elastic.co/
the result is
www.elastic.co
Is there a filter that returns the domain alone, without www?
Thank you!

You can add a custom pattern like this:
SLD ([a-z0-9-]+\.[a-z]{2,63})
This gives you the second-level domain name without the subdomain. To support Unicode domain names in their Punycode form, you can also allow an xn-- prefix:
SLD ((xn--)?[a-z0-9-]+\.[a-z]{2,63})
See the Logstash documentation for how to add custom patterns. You can then use this custom pattern like this:
%{URIPROTO}://(%{WORD:SUBDOMAIN}\.)?(%{SLD:domain})
Instead of %{WORD:SUBDOMAIN}, you can also add another regex to your custom pattern file:
SUBDOMAIN ([a-z0-9-]{1,63})
In the end, your pattern file looks like this:
SLD ((xn--)?[a-z0-9-]+\.[a-z]{2,63})
SUBDOMAIN ([a-z0-9-]{1,63})
And your Logstash configuration looks like this:
filter {
  grok {
    patterns_dir => ["./patterns"]
    match => { "uri" => "%{URIPROTO}://(%{SUBDOMAIN}\.)?(%{SLD:domain})" }
  }
}
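To sanity-check the regexes outside Logstash, the same idea can be sketched in Python. This is only an approximation: SCHEME is a hand-written stand-in for grok's URIPROTO, and extract_domain is a hypothetical helper name, not something Logstash provides.

```python
import re

# Same custom patterns as in the pattern file above.
SLD = r"(?:xn--)?[a-z0-9-]+\.[a-z]{2,63}"
SUBDOMAIN = r"[a-z0-9-]{1,63}"
# Assumption: a simplified stand-in for grok's URIPROTO.
SCHEME = r"[A-Za-z][A-Za-z0-9+.-]*"

# An optional single subdomain label, then the part we want to keep.
URL_RE = re.compile(
    rf"{SCHEME}://(?:(?P<subdomain>{SUBDOMAIN})\.)?(?P<domain>{SLD})"
)


def extract_domain(url):
    """Return the captured domain, or None if the URL does not match."""
    match = URL_RE.match(url)
    return match.group("domain") if match else None


print(extract_domain("https://stackoverflow.com/questions/ask?title=grok%20extract%20url"))
print(extract_domain("https://www.elastic.co/"))
```

Note that this approach only strips one subdomain label. A host such as a.b.example.com yields b.example, and www.bbc.co.uk yields bbc.co, so hosts with deeper nesting or multi-part suffixes need a different strategy.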

Define this grok pattern:
HOSTNAME \b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
and use it:
%{URIPROTO}://%{HOSTNAME:domain}

Related

match route using a splat. What does it do?

I see this route:
match "*a", to: 'application#some_error_handler', via: :all
What does that do in Rails? Is that a splat "a"?

This is called route globbing and is explained in the Routing guide's section on route globbing:
Route globbing is a way to specify that a particular parameter should be matched to all the remaining parts of a route. For example:
get 'photos/*other', to: 'photos#unknown'
This route would match photos/12 or /photos/long/path/to/12, setting params[:other] to "12" or "long/path/to/12". The fragments prefixed with a star are called "wildcard segments".
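Wildcard segments can occur anywhere in a route. For example:
get 'books/*section/:title', to: 'books#show'
In the route from the question, *a is a wildcard segment named a with nothing in front of it, so it matches any path that no earlier route has claimed and puts that path in params[:a]. With via: :all it applies to every HTTP verb, which makes it a catch-all that hands unmatched requests to application#some_error_handler.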
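For readers who think in regexes, wildcard segments can be approximated in Python. This is a rough sketch, not how Rails implements routing; compile_route and match_route are hypothetical helper names, and the translation rules (a star segment may contain slashes, a colon segment may not) are a simplification.

```python
import re


def compile_route(pattern):
    """Translate a Rails-style route pattern into a regex (illustrative only)."""
    parts = []
    # Split into literal text and *wildcard / :dynamic segment tokens.
    for token in re.split(r"([*:][A-Za-z_]\w*)", pattern.strip("/")):
        if token.startswith("*"):
            # Wildcard segment: may span several path components.
            parts.append(f"(?P<{token[1:]}>.+)")
        elif token.startswith(":"):
            # Dynamic segment: exactly one path component.
            parts.append(f"(?P<{token[1:]}>[^/]+)")
        else:
            parts.append(re.escape(token))
    return re.compile("^/?" + "".join(parts) + "/?$")


def match_route(pattern, path):
    """Return the captured params as a dict, or None if the path does not match."""
    match = compile_route(pattern).match(path)
    return match.groupdict() if match else None


print(match_route("photos/*other", "/photos/long/path/to/12"))
print(match_route("books/*section/:title", "/books/some/section/last-words-a-memoir"))
print(match_route("*a", "/any/missing/page"))
```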

URL not resolving correctly in Django URL

I just tried the following in my AJAX update:
[Server]/secTypes/Update
This maps to the following URL pattern in urls.py:
url(r'^secTypes/Update/', equity.views.updateSecTypes, name='updateSecTypes'),
This doesn't resolve to the updateSecTypes function in my views.
But when I change the URL expression to:
url(r'^su/', equity.views.updateSecTypes, name='updateSecTypes')
It works fine.
What in the URL resolver is not getting accurately mapped? Is it the forward slash?
I think it has something to do with the regex, so if someone who understands this better can help me, that would be appreciated.

From the URL patterns in your comments, it looks like you had another matching pattern before the one in your question.
There are two simple solutions for this.
The first is to move that first pattern down. Change this:
url(r'^secTypes/', equity.views.getSecTypes, name='getSecTypes'),
url(r'^secTypesAll/', equity.views.getSecTypesAll, name='getSecTypesAll'),
url(r'^secTypes/Update/', equity.views.updateSecTypes, name='updateSecTypes'),
url(r'^secTypes/Delete/', equity.views.deleteSecTypes, name='deleteSecTypes'),
url(r'^secTypes/Create/', equity.views.createSecTypes, name='createSecTypes'),
to this:
url(r'^secTypesAll/', equity.views.getSecTypesAll, name='getSecTypesAll'),
url(r'^secTypes/Update/', equity.views.updateSecTypes, name='updateSecTypes'),
url(r'^secTypes/Delete/', equity.views.deleteSecTypes, name='deleteSecTypes'),
url(r'^secTypes/Create/', equity.views.createSecTypes, name='createSecTypes'),
url(r'^secTypes/', equity.views.getSecTypes, name='getSecTypes'),
The order matters when resolving URL patterns: if an earlier one matches, the following ones are not processed.
Both r'^secTypes/' and r'^secTypes/Update/' match the string 'secTypes/Update/', so you need to be careful to put the more specific one first and the more general one afterwards.
The second solution is to update the regex to match the end of the URL string by adding a $, like this:
url(r'^secTypes/$', equity.views.getSecTypes, name='getSecTypes'),
url(r'^secTypesAll/$', equity.views.getSecTypesAll, name='getSecTypesAll'),
url(r'^secTypes/Update/$', equity.views.updateSecTypes, name='updateSecTypes'),
url(r'^secTypes/Delete/$', equity.views.deleteSecTypes, name='deleteSecTypes'),
url(r'^secTypes/Create/$', equity.views.createSecTypes, name='createSecTypes'),
This is the preferred solution, since it also stops Django from matching a URL like secTypes/Update/foobar.
However, if you have logic in the view that specifically uses the substring after the end of the URL pattern (i.e. foobar based on the above example), this wouldn't work.
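Both fixes can be checked without Django. The sketch below imitates first-match-wins resolution with plain re.search; resolve is a hypothetical stand-in for Django's resolver, and views are represented by their names only.

```python
import re


def resolve(urlpatterns, path):
    """Return the view name of the first pattern that matches the path."""
    for regex, view_name in urlpatterns:
        if re.search(regex, path):
            return view_name
    return None


original_order = [
    (r'^secTypes/', 'getSecTypes'),
    (r'^secTypesAll/', 'getSecTypesAll'),
    (r'^secTypes/Update/', 'updateSecTypes'),
]

# Fix 1: more specific patterns first.
specific_first = [
    (r'^secTypesAll/', 'getSecTypesAll'),
    (r'^secTypes/Update/', 'updateSecTypes'),
    (r'^secTypes/', 'getSecTypes'),
]

# Fix 2: keep the order, but anchor every pattern with $.
anchored = [(regex + '$', name) for regex, name in original_order]

print(resolve(original_order, 'secTypes/Update/'))   # the general pattern wins
print(resolve(specific_first, 'secTypes/Update/'))
print(resolve(anchored, 'secTypes/Update/'))
print(resolve(anchored, 'secTypes/Update/foobar'))   # no longer matches anything
```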

How to configure Fiddler's Autoresponder to "map" a host to a folder?

I'm already using Fiddler to intercept requests for specific remote files while I'm working on them (so I can tweak them locally without touching the published contents).
i.e. I use many rules like this
match: regex:(?insx).+/some_file([?a-z0-9-=&]+\.)*
respond: c:\somepath\some_file
This works perfectly.
What I'd like to do now is taking this a step further, with something like this
match: regex:http://some_dummy_domain/(anything)?(anything)
respond: c:\somepath\(anything)?(anything)
or, in plain text,
Intercept any http request to 'some_dummy_domain', go inside 'c:\somepath' and grab the file with the same path and name that was requested originally. Query string should pass through.
Some scenarios to further clarify:
http://some_domain/somefile --> c:\somepath\somefile
http://some_domain/path1/somefile --> c:\somepath\path1\somefile
http://some_domain/path1/somefile?querystring --> c:\somepath\path1\somefile?querystring
I tried to leverage what I already had:
match: regex:(?insx).+//some_dummy_domain/([?a-z0-9-=&]+\.)*
respond: ...
Basically, I'm looking for //some_dummy_domain/ in requests. This seems to match correctly when testing, but I'm missing how to respond.
Can Fiddler use matches in responses, and how could I set this up properly?
I tried to respond c:\somepath\$1 but Fiddler seems to treat it verbatim:
match: regex:(?insx).+//some_domain/([?a-z0-9-=&]+\.)*
respond: c:\somepath\$1
request: http://some_domain/index.html
response: c:\somepath\$1html <-----------

The problem is your use of insx at the front of your expression; the n means that you want to require explicitly-named capture groups, meaning that a group $1 isn't automatically created. You can either omit the n or explicitly name the capture group.
From the Fiddler Book:
Use RegEx Replacements in Action Text
Fiddler’s AutoResponder permits you to use regular expression group replacements to map text from the Match Condition into the Action Text. For instance, the rule:
Match Text: REGEX:.+/assets/(.*)
Action Text: http://example.com/mockup/$1
...maps a request for http://example.com/assets/Test1.gif to http://example.com/mockup/Test1.gif.
The following rule:
Match Text: REGEX:.+example\.com.*
Action Text: http://proxy.webdbg.com/p.cgi?url=$0
...rewrites the inbound URL so that all URLs containing example.com are passed as a URL parameter to a page on proxy.webdbg.com.
Match Text: REGEX:(?insx).+/assets/(?'fname'[^?]*).*
Action Text: C:\src\${fname}
...maps a request for http://example.com/assets/img/1.png?bunnies to C:\src\img\1.png.
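The difference between numbered and named groups can be tried out in Python. This is only an analogy: Python's re module has no counterpart to the .NET n (explicit capture) option, so the sketch uses a non-capturing group to imitate what n does to an unnamed group. The map_to_local helper and its root default are hypothetical, and the slash conversion is done by hand here.

```python
import re

URL = "http://some_domain/path1/somefile?querystring"

# A plain group is numbered, so it is available as group 1 (like $1).
numbered = re.compile(r"(?is).+//some_domain/(.*)")

# Stand-in for (?n): the unnamed group captures nothing, so there is no group 1.
explicit_only = re.compile(r"(?is).+//some_domain/(?:.*)")

# A named group still captures, and is referenced by name (like ${path}).
named = re.compile(r"(?is).+//some_domain/(?P<path>[^?]*).*")


def map_to_local(url, root="c:\\somepath"):
    """Map a matching URL to a local file path, dropping the query string."""
    match = named.match(url)
    if match is None:
        return None
    return root + "\\" + match.group("path").replace("/", "\\")


print(numbered.match(URL).group(1))
print(explicit_only.match(URL).groups())
print(map_to_local(URL))
```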

Why is my route not matching any action

Using Play Framework, I'm trying to match a route using a regular expression.
What I want is a single action that handles all of these URLs:
mydomain.com/my-post-title-123
mydomain.com/another-post-title-124
mydomain.com/a-third-post-title-125
and extracts the 123, 124 and 125 from the end of the URL so the controller can use it. Basically, ignore whatever post title comes in and only use the number at the end.
I have the following on my routes.conf
GET /$postId<\d$> controllers.Posts.viewPost(postId: Int)
But I get the error page "Action not found"

You are missing the URL prefix and the "+" in the regex in the route definition. Here is my route configuration, which works fine:
#Regex test
GET /$prefix<.*>$postId<\d+$> controllers.Application.viewPost(prefix:String,postId: Int)
The action in controllers.Application:
def viewPost(prefix: String, postId: Int) = Action {
  Ok("the post id is: " + postId + " the prefix is:" + prefix)
}
and the output will be
the post id is: 123 the prefix is "whatever/prefix/you/give"
** tested, it works.
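The regex side of this can be explored in Python. The sketch assumes that the router wraps each dynamic part in a group and joins the parts into one anchored regex; that is an assumption about Play, not something taken from its documentation. Under that assumption, a greedy prefix such as .* leaves only the last digit for the id when the path ends in several digits, and a non-greedy prefix (.*?) avoids this. The lazy variant is a suggestion that has not been tested in Play itself.

```python
import re

PATH = "/my-post-title-123"

# The route from the question: the whole path after "/" must be a single digit.
original = re.compile(r"^/(\d$)$")

# The route from the answer: greedy prefix followed by one or more digits.
greedy = re.compile(r"^/(.*)(\d+$)$")

# Hypothetical variant with a non-greedy prefix.
lazy = re.compile(r"^/(.*?)(\d+$)$")

print(original.match(PATH))        # no match, hence "Action not found"
print(greedy.match(PATH).groups())
print(lazy.match(PATH).groups())
```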

regex quandary - for word matching

I am out of my depth here, currently reading the tutorials and using python to learn regex.
I have a website where a php file http://www.example.com/showme.php?user=JOHN will load the visitor page of JOHN. However I want to let John have his own vanity URL like john.example.com and rewrite it to http://www.example.com/showme.php?user=JOHN .
I know it can be done and after fiddling with it it seems lighttpd mod_rewrite is the way to go. Now I am stumped as I am trying to come up with regex to match!
rewrite ("^![www]\.example\.com" => "www\.example\.com\?user=###");
I am playing with python re module to test out several ways of getting the john from john.example.com and recognize when the first segment of url is not www and then redirect. Above was my trial. Am I even in the right continent!
Any help will be appreciated with:
recognizing when first part of url before the first . is not www and is something else - so that example.com won't stump it.
getting the first part of the url before first . and tag it to user=###
Thanks a bunch

Use lighttpd's mod_rewrite module. Add this to your lighttpd.conf file:
$HTTP["host"] != "www.example.com" {
  $HTTP["host"] =~ "^([^.]+)\.example\.com$" {
    url.rewrite-once = (
      "^/?$" => "/showme.php?user=%1"
    )
  }
}
For an href value like /dir/page.php, the domain part of the link is added automatically from the current request, as shown in the browser's address bar. So, if you had used www.example.com, the link would point to http://www.example.com/dir/page.php, and likewise for john.example.com.
For all your links to point at www.example.com, you need to be accessing the page using www. This would be possible only if you do an external redirect from the vanity URL to the actual one i.e. users can still use the shortened URL but they would get redirected to the actual one.
$HTTP["host"] != "www.example.com" {
  $HTTP["host"] =~ "^([^.]+)\.example\.com$" {
    url.redirect = (
      "^/?$" => "http://www.example.com/showme.php?user=%1"
    )
  }
}
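Since the question mentions testing with Python's re module, here is a minimal sketch of the same host check. It only mirrors the logic of the rewrite rule above; rewrite_target is a hypothetical helper name, and lighttpd itself does the real work.

```python
import re

# Same host regex as in the lighttpd conditional.
VANITY_HOST = re.compile(r"^([^.]+)\.example\.com$")


def rewrite_target(host, path="/"):
    """Return the rewritten path for a vanity host, or None if no rewrite applies."""
    if host == "www.example.com":
        return None  # the main site is left alone
    match = VANITY_HOST.match(host)
    if match is None or not re.match(r"^/?$", path):
        return None  # bare example.com, deeper subdomains, or a non-root path
    return "/showme.php?user=" + match.group(1)


print(rewrite_target("john.example.com"))
print(rewrite_target("www.example.com"))
print(rewrite_target("example.com"))
```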
