Extract Splunk domain from payload_printable field with regex

I'm trying to extract a domain from the Splunk payload_printable field (the source is Suricata logs), and I found that this regex works fine in most cases:
source="*suricata*" alert.signature="ET JA3*"
| rex field=payload_printable "(?<dom>[a-zA-Z0-9\-\_]{1,}\.[a-zA-Z0-9\-\_]{2,}\.[a-zA-Z0-9\-\_]{2,})"
| table payload_printable, dom
The regular expression is:
(?<dom>[a-zA-Z0-9\-\_]{1,}\.[a-zA-Z0-9\-\_]{2,}\.[a-zA-Z0-9\-\_]{2,})
For example, if my payload_printable looks like this:
...........^aO+.t....]......$.....mT*l.......&.,.+.0./.$.#.(.'.
...........=.<.5./.
...].........activity.windows.com..........
.................
.......................#...........
The domain "activity.windows.com" is successfully extracted. However, it does not work for a payload like the following, because the regex first matches another part that is not the domain:
...........^aO+]v;.~........:.Y.zORw._I..K>..&.,.+.0./.$.#.(.'.
...........=.<.5./.
...].........activity.windows.com..........
.................
.......................#...........
It extracts "Y.zORw._I".
Another example:
...........^h.'`.o2...
.y.k>..e.ef...]..8.G..&.,.+.0./.$.#.(.'.
...........=.<.5./.
...p.........arc.msn.com..........
.................
.......................#.........h2.http/1.1...................
I don't know how to fix this. Thank you for your help.

This regex will match domain names and correctly matches the two examples you gave:
"(?<dom>(?:[a-z0-9](?:[a-z0-9-_]{0,61}[a-z0-9])?\.)+[a-z0-9][a-z0-9-_]{0,61}[a-z0-9])"
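To see the difference outside Splunk, here is a minimal Python sketch that runs both patterns against the second payload from the question. It assumes the payload is available as a plain string; the helper name `first_domain` is made up for illustration, and the patterns are rewritten with Python's `(?P<name>...)` group syntax and an escaped hyphen, which should not change what they match.

```python
import re

# The asker's pattern and the answer's pattern, in Python group syntax.
ORIGINAL = re.compile(
    r"(?P<dom>[a-zA-Z0-9\-_]+\.[a-zA-Z0-9\-_]{2,}\.[a-zA-Z0-9\-_]{2,})"
)
ANSWER = re.compile(
    r"(?P<dom>(?:[a-z0-9](?:[a-z0-9\-_]{0,61}[a-z0-9])?\.)+"
    r"[a-z0-9][a-z0-9\-_]{0,61}[a-z0-9])"
)

# Second payload from the question, first three lines as posted.
payload = "\n".join([
    r"...........^aO+]v;.~........:.Y.zORw._I..K>..&.,.+.0./.$.#.(.'.",
    r"...........=.<.5./.",
    r"...].........activity.windows.com..........",
])


def first_domain(pattern, text):
    """Return the first 'dom' match, or None (hypothetical helper)."""
    m = pattern.search(text)
    return m.group("dom") if m else None


print(first_domain(ORIGINAL, payload))  # Y.zORw._I
print(first_domain(ANSWER, payload))    # activity.windows.com
```

The sketch takes only the first match, which is also what rex does unless max_match is raised. Short fragments that look like `label.label` still satisfy the answer's pattern (`e.ef` in the third payload appears to be one), so it is worth checking the extracted value against each payload.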

Related

Splunk regex to match part of url string

I'm trying to use Splunk to search for all base path instances of a specific url (and maybe plot it on a chart afterwards).
Here are some example urls and the part I want to match for:
http://some-url.com/first/ # match "first"
http://some-url.com/first/second/ # match "first"
http://some-url.com/first/second/third/ # match "first"
Here's the regex I'm using, which works fine:
http:\/\/some-url\.com\/(.*?)\/
What should my Splunk search be to extract the desired text? Is this even possible in Splunk?
Assuming that the domain is always .com.
Using rex (replace _raw with your field name if the URL is in another field):
index=... | rex field=_raw "\.com/(?<match>\w+)/" | table match
To match any URL (.com or not), you can use the following command.
index=... | rex field=_raw "http(s)?://[^/]+/(?<match>[^/]+)"
This will match things such as
http://splunk.com/first/
https://simonduff.net/first/
https://splunk.com/first/middle/last
https://splunk.com/first
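If you want to check the pattern before putting it into a search, here is a rough Python equivalent. It is only a sketch: `FIRST_SEGMENT` is an illustrative name, `https?` stands in for `http(s)?`, and the named group uses Python's `(?P<match>...)` syntax.

```python
import re

# Scheme, then host (anything up to the first "/"), then the first path segment.
FIRST_SEGMENT = re.compile(r"https?://[^/]+/(?P<match>[^/]+)")

urls = [
    "http://splunk.com/first/",
    "https://simonduff.net/first/",
    "https://splunk.com/first/middle/last",
    "https://splunk.com/first",
]

for url in urls:
    m = FIRST_SEGMENT.search(url)
    # Each of the sample URLs yields "first".
    print(url, "->", m.group("match") if m else None)
```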

Extract website details from text file using regex pattern

I need to extract website URLs from the text. Can you tell me what I am missing?
Data:
gmail.com
2.0
Dolphins.com.
B.TECH
62.1%.
github.com/XYZ
abcd.com
github.com/abcd
linkedin.com/in/abcd
abcd.wordpress.com/
https://xyz/stackoverflow.com
Regex pattern:
urls = re.findall('(?:(?:https?|ftp):\/\/)?[\w+/\-?=%.]+\.[\w+/\-?=%.]+', text)
Expected Output:
github.com/XYZ
abcd.com
github.com/abcd
linkedin.com/in/abcd
abcd.wordpress.com/
https://xyz/stackoverflow.com
Current output:
It's extracting every item listed under Data. Can someone tell me what changes my regex needs to produce the expected output?
I used the regex below, and it worked on regex101.com:
.*(?:https?:\/\/)?(?:www\.)?[a-z-]+\.(?:com|org)(?:\.[a-z]{2,3})?.*
But when I use it in my code with re.findall(), it returns the entire text file, and with re.finditer() I get an error saying the result is not JSON serializable. I'm trying to return my output as JSON. What can be done here?
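On the JSON part specifically: re.finditer() returns an iterator of match objects rather than strings, and json cannot serialize either of those. The leading and trailing `.*` also appear to make each match swallow its whole line. The sketch below drops both `.*` and converts the matches to plain strings. It does not try to reproduce the expected output above, since the rule for excluding entries like gmail.com is not stated. `extract_urls` and the sample text are illustrative.

```python
import json
import re

# Pattern from the post with the leading and trailing ".*" removed.
URL_RE = re.compile(r"(?:https?://)?(?:www\.)?[a-z-]+\.(?:com|org)(?:\.[a-z]{2,3})?")


def extract_urls(text):
    """Return matched strings (not Match objects) so json can serialize them."""
    return [m.group(0) for m in URL_RE.finditer(text)]


sample = "gmail.com\n2.0\nB.TECH\nabcd.com\n"
urls = extract_urls(sample)
print(json.dumps(urls))  # ["gmail.com", "abcd.com"]
```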

splunk: Get the first three numbers from ip address

I'm trying to get the first three sets of numbers of an IP address which is in this format: 10.10.10.10
Desired value would be 10.10.10
Try this regex: ^(.+)(?=\.\d+$)
Next time, please also post what you have tried and how you plan to reach the solution.
Regex to match a valid IPv4 address:
/^(([01]?\d?\d|2[0-4]\d|25[0-5])\.){3}([01]?\d?\d|2[0-4]\d|25[0-5])$/
Regex to match the first three blocks of a valid IPv4 address:
/^(([01]?\d?\d|2[0-4]\d|25[0-5])\.){2}([01]?\d?\d|2[0-4]\d|25[0-5])$/
or, if it is acceptable for the match to include the dot after the third block:
/^(([01]?\d?\d|2[0-4]\d|25[0-5])\.){3}$/
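For anyone who wants to try the validating patterns quickly, here is a small Python sketch built from the same octet alternation. The names `OCTET`, `FULL_IPV4`, `FIRST_THREE` and `is_ipv4` are made up for illustration, and `fullmatch` is used in place of the `^...$` anchors.

```python
import re

# One octet, 0-255, using the same alternation as the patterns above.
OCTET = r"(?:[01]?\d?\d|2[0-4]\d|25[0-5])"

FULL_IPV4 = re.compile(rf"(?:{OCTET}\.){{3}}{OCTET}")
FIRST_THREE = re.compile(rf"(?:{OCTET}\.){{2}}{OCTET}")


def is_ipv4(value):
    """True if the whole string is four dot-separated octets in 0-255."""
    return FULL_IPV4.fullmatch(value) is not None


print(is_ipv4("10.10.10.10"))                       # True
print(is_ipv4("256.1.1.1"))                         # False
print(FIRST_THREE.fullmatch("10.10.10") is not None)  # True
```

Note that these patterns validate a string that already has the right shape; they do not cut the first three blocks out of a four-block address.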
I was able to get it this way:
rex field=IP "(?<first_three>\d+\.\d+\.\d+)\.\d+"
Another way to do it:
... | rex field=ip_addr "(?<split_ip>.+)\.[0-9]+"
Where,
ip_addr - field name
split_ip - variable under which the split IP address will be stored
Example:
Splunk Query:
| stats count | eval ip = "115.124.35.123" | rex field=ip "(?<split_ip>.+)\.[0-9]+" | table split_ip
Output:
115.124.35
The following works for me:
rex field=_raw "(?<ip_address>^\d+\.\d+\.\d+\.\d+)"|timechart count by ip_address
Use the regex below:
^(?P<result>.+(?=\.\d+))
https://regex101.com/r/bO4tY5/3
https://regex101.com/ is a super useful tool for this kind of stuff. It lets you write your regex and test it for different strings in real time.
Once you've got what you need, stick it into your Splunk search query with the rex command.
To answer your exact problem:
The regex code, where MY_FIELD_NAME_HERE is the name of the extracted field:
(?<MY_FIELD_NAME_HERE>\d+\.\d+\.\d+)\.\d+
The regex with examples from regex101:
https://regex101.com/r/qTTf4e/2
The command required for the Splunk query language, where ORIGINAL_FIELD is your original field holding 10.10.10.10 and MY_FIELD_NAME_HERE is the extracted field:
... | rex field="ORIGINAL_FIELD" "(?<MY_FIELD_NAME_HERE>\d+\.\d+\.\d+)\.\d+"
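The same capture can be sanity-checked in Python before it goes into a search. This is only a sketch: the group is renamed `first_three`, written in Python's `(?P<name>...)` syntax, and `first_three_octets` is an illustrative helper.

```python
import re

# Capture the first three dot-separated numbers; the fourth is matched but not captured.
FIRST_THREE = re.compile(r"(?P<first_three>\d+\.\d+\.\d+)\.\d+")


def first_three_octets(ip):
    """Return the first three blocks of a dotted address, or None if no match."""
    m = FIRST_THREE.search(ip)
    return m.group("first_three") if m else None


print(first_three_octets("10.10.10.10"))     # 10.10.10
print(first_three_octets("115.124.35.123"))  # 115.124.35
```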

How to use reg expression with multiple values?

I have multiple products on a website and each product has following info:
ProductId
ProductName
ProductPrice
I know how to use it for a single value like ProductId, but how should I use it for all the values?
Example: I select product1 from a list and then the other fields should be automatically updated, too.
JMeter's Regular Expression Extractor stores match groups as JMeter Variables in the form:
reference name -> underscore -> match number -> underscore -> group number
For example, if you have the following Regular Expression Extractor configuration:
Reference Name: LINKS
Regular Expression: <a href="(.+?)">(.+?)</a>
Template: $1$
and add it as a Post Processor to, e.g., an HTTP Request to http://example.com, you will get the following variables:
LINKS_1=http://www.iana.org/domains/example
LINKS_1_g=2
LINKS_1_g0=<a href="http://www.iana.org/domains/example">More information...</a>
LINKS_1_g1=http://www.iana.org/domains/example
LINKS_1_g2=More information...
So you'll be able to access link "href" attribute as ${LINKS_1_g1} and link text as ${LINKS_1_g2}
You should be able to use similar approach for your testing.
See the "Using RegEx (Regular Expression Extractor) with JMeter" guide for more information on the topic.
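To make the naming scheme concrete, here is a small Python emulation. It is not JMeter code: it only mimics how the variable names listed above are built. The anchor-tag pattern, the sample HTML and the function name `extractor_variables` are assumptions made for illustration.

```python
import re


def extractor_variables(ref_name, pattern, text):
    """Mimic the <ref>_<match>_g<group> naming described above (not JMeter itself)."""
    variables = {}
    for match_no, m in enumerate(re.finditer(pattern, text), start=1):
        prefix = f"{ref_name}_{match_no}"
        variables[prefix] = m.group(1)                # what the template $1$ selects
        variables[f"{prefix}_g"] = str(m.re.groups)   # number of groups in the pattern
        for group_no in range(m.re.groups + 1):       # g0 is the whole match
            variables[f"{prefix}_g{group_no}"] = m.group(group_no)
    return variables


# Assumed sample response body and pattern.
html = '<a href="http://www.iana.org/domains/example">More information...</a>'
result = extractor_variables("LINKS", r'<a href="(.+?)">(.+?)</a>', html)

for name, value in result.items():
    print(f"{name}={value}")
```

With several products on a page, the same idea gives you one numbered set of variables per match, with one group each for the ID, name and price.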

Searching a number in a specific string with regexp in jmeter

I want to find a specific number from a HTML response.
For example, I want to extract 3 from publicationID3publicationID.
Does someone know a solution with regexp?
Add a Regular Expression Extractor Post Processor as a child of the request that returns this string.
Configure it as follows:
Reference Name: publicationID (you can use any variable name here)
Regular Expression: publicationID(\d+)publicationID
Template: $1$
Other fields can be left blank.
You can later refer to the publication ID as ${publicationID} or ${__V(publicationID)}
You can see which matches your regular expression returns using the View Results Tree listener (select RegExp Tester from the dropdown). Another option is a Debug Sampler, again in combination with View Results Tree.
You can use \d to match a digit in a regex.
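If you want to try the expression outside JMeter first, a quick Python check could look like this. The surrounding `<td>` markup is a made-up stand-in for the real HTML response.

```python
import re

# Hypothetical response fragment containing the marker from the question.
response = "<td>publicationID3publicationID</td>"

# Group 1 captures the digits between the two markers.
m = re.search(r"publicationID(\d+)publicationID", response)
publication_id = m.group(1) if m else None

print(publication_id)  # 3
```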