I have the following value in a field, which needs to be split into multiple fields.
Classname:
abc.TestAutomation.NNNN.Specs.Prod/NDisableTransactionalAccessUsers.#()::TestAssembly:abc.TestAutomation
Required output:
Productname : abc.TestAutomation.NNNN.Specs.Prod
Feature name : NDisableTransactionalAccessUsers
Project : TestAssembly:abc.TestAutomation
I have been trying to extract the values into my fields using the rex command, but it is not working.
source="Reports.csv" index="prod_reports_data" sourcetype="ReportsData"
| rex "classname(?<Productname>/*)\.(?<Featurename>#*)\.(?<Project>.*)"
| table classname Productname Featurename Project
When I run this search, it returns no results. I am very new to Splunk; can someone guide me?
Thanks.
I almost always use multiple rex statements to get what I want, but if you know the data is consistent, this will work (tested on regex101.com):
| rex field=_raw "(?<classname>[^\/]+)\/(?<featurename>[^\.]+)\.[[:punct:]]+(?<project>[\w].+)"
What this regular expression does:
<classname> :: everything from the front of the event up to a forward slash (/)
<featurename> :: whatever follows the forward slash (/) until a literal dot (.)
[[:punct:]]+ :: the run of punctuation after that dot, which is discarded
<project> :: whatever is left on the line
According to regex101.com, this is likely the most efficient rex you can use (14 steps total)
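To check the pattern outside Splunk, here is a minimal Python sketch of the same idea. It is a translation rather than the rex itself: Python's re module writes named groups as (?P<name>...) and has no [[:punct:]] class, so [^\w\s]+ stands in for it (an assumption that holds for the sample value, where the punctuation run is "#()::"). The variable names are only illustrative.

```python
import re

# Sample value taken from the question.
value = ("abc.TestAutomation.NNNN.Specs.Prod/NDisableTransactionalAccessUsers"
         ".#()::TestAssembly:abc.TestAutomation")

# Python translation of the rex pattern: (?P<name>...) instead of (?<name>...),
# and [^\w\s]+ standing in for [[:punct:]]+ (an assumption, see the note above).
pattern = re.compile(
    r"(?P<classname>[^/]+)/(?P<featurename>[^.]+)\.[^\w\s]+(?P<project>\w.+)"
)

match = pattern.search(value)
fields = match.groupdict() if match else {}
for name, text in fields.items():
    print(f"{name} = {text}")
```

If the real events carry other text before the class name, the first group would also swallow that text, so pointing the extraction at the specific field is probably safer than matching the whole event.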
I have a field "hostname" in my Splunk logs, which appears in my events as
"host = server.region.ab1dc2.mydomain.com".
I can refer to it by the same name, "host", in a Splunk query. I want to extract the four characters that follow the second dot; for the example above, that is "ab1d". What should my Splunk query look like for this extraction?
Basically, given a string, I want to skip past two dots and then take the next four characters.
So long as there are at least three segments in the fully-qualified domain name, this should work (without using a regular expression):
index=ndx sourcetype=srctp host=*
| makemv delim="." host
| eval piece=substr(mvindex(host,2),1,4)
...
makemv converts a field into a multivalue field, splitting on the delim you give it.
Then use eval to grab the third item in the list with mvindex (its index is zero-based, so the third item is index 2), trimming it to four characters with substr.
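The split-then-index logic above can be sketched in Python to make the zero-based indexing concrete. This only mirrors the steps; it is not SPL, and the variable names are illustrative.

```python
host = "server.region.ab1dc2.mydomain.com"

# Split on "." (the makemv step), take the third segment, which is index 2
# because indexing is zero-based (the mvindex step), and keep its first four
# characters (the substr step).
segments = host.split(".")
piece = segments[2][:4]
print(piece)
```

For the sample host this prints ab1d.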
If you really want to use a regular expression, this will do it (again, presuming you have at least three pieces to the FQDN):
index=ndx sourcetype=srctp host=*
| rex field=host "\.[^\.]+\.(?<piece>[^\.]{4})"
...
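A quick way to sanity-check the regular expression is to run an equivalent in Python. This is a sketch under the assumption that the host value looks like the sample; Python spells the named group (?P<piece>...).

```python
import re

host = "server.region.ab1dc2.mydomain.com"

# Skip a dot, one segment, and the next dot, then capture four non-dot characters.
piece_pattern = re.compile(r"\.[^.]+\.(?P<piece>[^.]{4})")

match = piece_pattern.search(host)
piece = match.group("piece") if match else None
print(piece)

# Because the pattern is unanchored, a short third segment makes it match later.
short_match = piece_pattern.search("a.b.cd.efgh.com")
short_piece = short_match.group("piece") if short_match else None
print(short_piece)
```

The second call shows a caveat: when the third segment has fewer than four characters, the match slides along and returns characters from a later segment (efgh here), so anchoring the pattern at the start of the value may be worth considering.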
I have a scenario in which I want to replace a period when it is surrounded by letters, but not when it is surrounded by digits. I worked out a regular expression that matches only the periods in the key names, but the pattern does not work in SQL:
SELECT REGEXP_REPLACE("Amount.fee:0.75,Amount.tot:645.55","(?<!\d)(\.)(?!\d)","_","ig");
Expected output: Amount_fee:0.75,Amount_tot:645.55
Note: I am trying this because in MemSQL I couldn't access a JSON key that has a period in it.
I also verified the pattern "(?<!\d)(\.)(?!\d)" using https://coding.tools/regex-replace, and it works fine there, but not in SQL. I am using MemSQL 7.1.9, and POSIX Enhanced Regular Expressions are supposed to work. Any help is much appreciated.
Since it looks like you are trying to work around accessing a JSON key that contains a period, I will show you how to do that directly.
This can be done either by surrounding the JSON key name with backticks while using the shorthand JSON extract syntax:
select col::%`Amount.fee` from (select '{"Amount.fee":0.75,"Amount.tot":645.55}' col);
+--------------------+
| col::%`Amount.fee` |
+--------------------+
| 0.75 |
+--------------------+
or by using the json_extract_ builtins directly:
select json_extract_double('{"Amount.fee":0.75,"Amount.tot":645.55}', 'Amount.fee');
+------------------------------------------------------------------------------+
| json_extract_double('{"Amount.fee":0.75,"Amount.tot":645.55}', 'Amount.fee') |
+------------------------------------------------------------------------------+
| 0.75 |
+------------------------------------------------------------------------------+
Assuming you only want to target dots that sit between two non-digit characters, where the dot is not the first or last character in the string, you may match on ([^\d])\.([^\d]) and replace with \1_\2:
SELECT REGEXP_REPLACE("Amount.fee:0.75,Amount.tot:645.55", "([^\d])\.([^\d])", "\1_\2", "ig");
Here is a regex demo showing that the replacement is working. Note that you might have to use $1_$2 instead of \1_\2 as the replacement, depending on the regex flavor of your SQL tool.
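As a way to compare the two patterns on the sample string, here is a small Python sketch. Python's re module accepts both the capture-group form and the lookaround form, so this only illustrates what each pattern matches; it says nothing about which syntax a particular SQL engine accepts.

```python
import re

text = "Amount.fee:0.75,Amount.tot:645.55"

# Capture-group form: keep the non-digit on each side and put "_" between them.
by_groups = re.sub(r"([^\d])\.([^\d])", r"\1_\2", text)

# Lookaround form from the question: match only the dot itself.
by_lookaround = re.sub(r"(?<!\d)\.(?!\d)", "_", text)

print(by_groups)
print(by_lookaround)
```

Both calls give Amount_fee:0.75,Amount_tot:645.55 for this input. The two forms differ only at the edges: the lookaround form would also replace a dot at the very start or end of the string, while the capture-group form needs a character on both sides.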
I have some URLs in my cas_fnd_dwd_det table:
casi_imp_urls                       cas_code
----------------------------------- -----------
www.casiac.net/fnds/CASI/qnxp.pdf
www.casiac.net/fnds/casi/as.pdf
www.casiac.net/fnds/casi/vindq.pdf
www.casiac.net/fnds/CASI/mnip.pdf
How do I copy the letters between the last '/' and '.pdf' to another column?
Expected outcome:
casi_imp_urls                       cas_code
----------------------------------- -----------
www.casiac.net/fnds/CASI/qnxp.pdf   qnxp
www.casiac.net/fnds/casi/as.pdf     as
www.casiac.net/fnds/casi/vindq.pdf  vindq
www.casiac.net/fnds/CASI/mnip.pdf   mnip
The URL prefixes below are static:
www.casiac.net/fnds/CASI/
www.casiac.net/fnds/casi/
Please advise: how do I select the codes between the last '/' and '.pdf'?
I would recommend taking a look at REGEXP_SUBSTR, which lets you apply a regular expression. Db2 has string processing functions, but the regex function may be the easiest solution. See the SO question on regex and URI parts for different ways of writing the expression. The following would return the last slash, the filename and the extension:
SELECT REGEXP_SUBSTR('http://fobar.com/one/two/abc.pdf','\/(\w)*\.pdf' ,1,1)
FROM sysibm.sysdummy1
/abc.pdf
The following uses REGEXP_REPLACE, and the pattern is from this SO question with the pdf file extension added. It splits the string into three parts: everything up to the last slash (a non-capturing group), then the file name (group 1), then the ".pdf" extension (group 2). The '$1' returns group 1; group 0 is the entire match.
SELECT REGEXP_REPLACE('http://fobar.com/one/two/abc.pdf','(?:.+\/)(.+)(\.pdf)','$1' ,1,1)
FROM sysibm.sysdummy1
abc
You could apply LENGTH and SUBSTR to extract the relevant part or try to build that into the regex.
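To see the "last slash up to .pdf" idea in isolation, here is a minimal Python sketch run against the URLs from the question. It is not Db2 SQL; the pattern and the names code_pattern and codes are illustrative assumptions.

```python
import re

urls = [
    "www.casiac.net/fnds/CASI/qnxp.pdf",
    "www.casiac.net/fnds/casi/as.pdf",
    "www.casiac.net/fnds/casi/vindq.pdf",
    "www.casiac.net/fnds/CASI/mnip.pdf",
]

# Everything after the last "/" and before a trailing ".pdf" (any letter case).
code_pattern = re.compile(r"/([^/]+)\.pdf$", re.IGNORECASE)

codes = []
for url in urls:
    match = code_pattern.search(url)
    codes.append(match.group(1) if match else None)

print(codes)
```

Using [^/]+ rather than .+ for the file name keeps the capture from spanning a slash, and escaping the dot in \.pdf keeps it from matching an arbitrary character.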
For Db2 versions older than 11.1: I am not sure whether this works on 9.5, but it should work from 9.7 onward.
Try this as is:
with cas_fnd_dwd_det (casi_imp_urls) as (values
'www.casiac.net/fnds/CASI/qnxp.pdf'
, 'www.casiac.net/fnds/casi/as.pdf'
, 'www.casiac.net/fnds/casi/vindq.pdf'
, 'www.casiac.net/fnds/CASI/mnip.PDF'
)
select
casi_imp_urls
, xmlcast(xmlquery('fn:replace($s, ".*/(.*)\.pdf", "$1", "i")' passing casi_imp_urls as "s") as varchar(50)) cas_code
from cas_fnd_dwd_det
I am trying to come up with a regex (POSIX-like) for a vendor application that returns data like that illustrated below. It presents a single line of data at a time, so I do not need to account for multiple rows; I only need to match each row individually.
It can return one or more values in the string result.
The application doesn't let me use just "\d+\.\d+" to capture the component out of the string. Unfortunately, I need to map every component of a row to a variable, even the ones I am going to discard; otherwise it returns a negative match result.
My data looks like the following, with the weird underscore padding:
USER | ___________ 3.58625 | ___________ 7.02235 |
USER | ___________ 10.02625 | ___________ 15.23625 |
The syntax it supports is
Matches REGEX "(Var1 Regex), (Var2 Regex), (Var3 Regex), (Var 4 regex), (Var 5 regex)". The entire string must match the aggregation of the regex components; a single character off and you get nothing.
The "|" characters are field separators for the data.
So, in the above, what I need is a regex that takes everything up to the beginning of the first number and puts it in Var 1, captures the numeric value with its decimal point in Var 2, captures up to the next number in Var 3, keeps that number in Var 4, and captures the trailing space and end-of-field | character in Var 5. Only Var 2 and Var 4 will be useful, but I have to capture the entire string.
I have mainly tried capturing between the bars "|" using ^.*\|(.*).\|*$ from this question.
I have also tried the multiple variable ([0-9]+k?[.,]?[0-9]+)\s*-\s*.*?([0-9]+k?[.,]?[0-9]+) mentioned in this question.
When I try them in RegExr I cannot get it right, and I feel like I am missing something pretty simple.
In RegExr I never get more than one part of the string: I get either just the number or the entire string in a single variable, and neither works in this context.
The only example the documentation provides is the following, based on something like a syslog entry, which I am condensing here to "Fault with Resource Name: Disk Specific Problem: Offline":
WHERE value matches regex "(.*)Resource Name: (.*), Specific Problem: ([^,]*),(.*)"
SET _Rrsc = var02
SET _Prob = var03
I've spun my wheels on this for several hours, so I would appreciate any guidance or help to get me over this hump.
Something like this should work:
(\D+)([\d.]+)(\D+)([\d.]+)(.*)
Or in normal words: Capture everything but numbers, capture a decimal number, capture everything but numbers, capture a decimal number, capture everything.
Using USER | ___________ 10.02625 | ___________ 15.23625 |
$1 = USER | ___________
$2 = 10.02625
$3 = | ___________
$4 = 15.23625
$5 = |
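The breakdown above can be reproduced with a short Python sketch. It assumes the row looks like the sample and uses fullmatch to mimic the application's requirement that the whole string be consumed; how the vendor tool splits the expression into its comma-separated variables is not covered here.

```python
import re

row = "USER | ___________ 10.02625 | ___________ 15.23625 |"

# Five groups that together cover the whole row:
# non-digits, number, non-digits, number, remainder.
pattern = re.compile(r"(\D+)([\d.]+)(\D+)([\d.]+)(.*)")

match = pattern.fullmatch(row)
groups = match.groups() if match else ()
for index, text in enumerate(groups, start=1):
    # repr() makes the leading and trailing spaces in each group visible.
    print(f"${index} = {text!r}")
```

Joining the five groups back together gives the original row, which is the property the "entire string must match" rule depends on.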
It seems that there is no way to extract fields with a . in the name.
I'm trying to use field extractors on our older data to create fields matching the newer data JSON fields.
{ "pirate": { "say": "Shiver me timbers" } }
pirate.say = "Shiver me timbers"
To test this, you can do something like this:
| metadata type=hosts index=_internal
| head 1
| eval message="Shiver me timbers, goes the pirate"
| table message
| rex field=message "(?<pirate.say>[^,]+)"
But all I get for my efforts is the same error message in both the 'rex' prototype described above and 'Field extractions' page.
From the 'rex' prototype I get:
Error in 'rex' command: Encountered the following error while compiling the regex '(?<pirate.say>[^,]+)': Regex: syntax error in subpattern name (missing terminator)
From the 'Fields » Field extractions » Add new' I get:
Encountered the following error while trying to save: Regex: syntax error in subpattern name (missing terminator)
Any thoughts on how I can solve this one?
There are several different things going on here.
First, no: you cannot write a regex whose extracted field name contains a dot. (I tested this over at regex101.com, and it doesn't work.)
When extracting from JSON, Splunk can create fields that have a dot in them, signifying the hierarchy of the JSON.
On the other hand, when auto-extracting from normal data, Splunk will normally replace invalid characters with underscores.
To extract from JSON, you normally use the spath command.
To pull a field out with a regex, just give it a valid name and then rename it to contain the dot.
| makeresults
| eval message="Shiver me timbers, goes the pirate"
| table message
| rex field=message "(?<piratesays>[^,]+)"
| rename piratesays as "pirate.say"
I've forgotten whether you need single or double quotes for the odd name, so if that doesn't work, try this:
| rename piratesays as 'pirate.say'
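The same restriction and the same workaround can be seen in Python, shown here only as an analogy: this uses Python's re module, not Splunk's own regex engine, and the names dotted_name_compiles and fields are illustrative.

```python
import re

message = "Shiver me timbers, goes the pirate"

# A dot is not allowed in a named-group identifier, so this fails to compile.
try:
    re.compile(r"(?P<pirate.say>[^,]+)")
    dotted_name_compiles = True
except re.error:
    dotted_name_compiles = False

# Workaround: capture under a valid name, then rename the key afterwards.
match = re.search(r"(?P<piratesays>[^,]+)", message)
fields = match.groupdict() if match else {}
if "piratesays" in fields:
    fields["pirate.say"] = fields.pop("piratesays")

print(dotted_name_compiles)
print(fields)
```

The capture-then-rename step plays the same role as the rename command in the search above.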