Splunk - extract a field with dot/period - regex

It seems that there is no way to extract fields with a . in the name.
I'm trying to use field extractors on our older data to create fields matching the newer data JSON fields.
{ "pirate": { "say ": "Shiver me timbers" } }
pirate.say = "Shiver me timbers"
To test this you can do something like this:
| metadata type=hosts index=_internal
| head 1
| eval message="Shiver me timbers, goes the pirate"
| table message
| rex field=message "(?<pirate.say>[^,]+)"
But all I get for my efforts is the same error message in both the 'rex' prototype described above and the 'Field extractions' page.
From the 'rex' prototype I get:
Error in 'rex' command: Encountered the following error while compiling the regex '(?<pirate.say>[^,]+)': Regex: syntax error in subpattern name (missing terminator)
From the 'Fields » Field extractions » Add new' I get:
Encountered the following error while trying to save: Regex: syntax error in subpattern name (missing terminator)
Any thoughts on how I can solve this one?

There are several different things going on here.
First, no: you cannot create a regex with a dot in the name of the field being extracted (tested over at regex101.com, and it doesn't work).
When extracting from JSON, Splunk can create fields that have a dot in them, signifying the hierarchy of the JSON.
On the other hand, when auto-extracting from normal data, Splunk will normally replace invalid characters with underscores.
To extract a JSON, normally you use the spath command.
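For example, a minimal sketch, assuming the JSON above arrives as the raw event:
| makeresults
| eval _raw="{\"pirate\": {\"say\": \"Shiver me timbers\"}}"
| spath
| table pirate.say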
To pull a field out with a regex instead, just give the capture group a valid name and then rename the field to contain the dot.
| makeresults
| eval message="Shiver me timbers, goes the pirate"
| table message
| rex field=message "(?<piratesays>[^,]+)"
| rename piratesays as "pirate.say"
I've forgotten whether you need single or double quotes for the odd name, so if that doesn't work, try this.
| rename piratesays as 'pirate.say'
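Also note that once the field has a dot in its name, referencing it later in eval requires single quotes around the name (in eval, double quotes denote string literals while single quotes denote field names):
| eval shout=upper('pirate.say')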

Related

Splunk need help in extracting ERROR messages from logs

Can someone please help in extracting all the logs with "level": ERROR and extracting the message into another field? For example, message: Example1.Example2: Example3: 2 need to extracted to fied.TravellllgPeeele_INR
The log looks like this :
| "level":"ERROR","loggerName":"Log1.Log2LLogLog3","message":"Example1.Example2: Example3: 2 need to extracted to fied.TravellllgPeeele_INR","endOfBatch":false
I am able to extract all ERROR events with the rex below, but I am unable to extract the message.
| rex "(?<Err>ERROR)"
| search Err=*
You've got the first extraction correct; just do another like it:
| rex field=_raw "message[[:punct:]]+(?<message>[^\"]+)"
This will grab everything after "message":" until it hits another quote mark.
added per comment request
If you only want to extract message if "ERROR" is present in the event, the regex is similarly simple:
| rex field=_raw "ERROR.+?message[[:punct:]]+(?<message>[^\"]+)"
edit
If the message value itself can start with punctuation, match exactly the three characters ":" (quote, colon, quote) by using [[:punct:]]{3}:
| rex field=_raw "ERROR.+?message[[:punct:]]{3}(?<message>[^\"]+)"

How to replace a period within string and not in Numeric using singlestore (MemSQL) DB REGEXP_REPLACE function

I have a scenario wherein I want to replace a period when it's surrounded by letters but not when it's surrounded by digits. I figured out a regular expression pattern that identifies only the periods in the key names, but the pattern is not working in SQL:
SELECT REGEXP_REPLACE("Amount.fee:0.75,Amount.tot:645.55","(?<!\d)(\.)(?!\d)","_","ig");
Expected output: Amount_fee:0.75,Amount_tot:645.55
Note: I am trying this because, in MemSQL, I couldn't access a JSON key when it has a period in it.
I also verified the pattern "(?<!\d)(.)(?!\d)" using https://coding.tools/regex-replace and it works fine, but the SQL does not. I am using MemSQL 7.1.9, where POSIX Enhanced Regular Expressions are supposed to work. Any help is much appreciated.
Since it looks like you are trying to work around accessing a JSON key with a period, I will show you how to do that.
This can be done either by surrounding the JSON key name with backticks while using the shorthand JSON extract syntax:
select col::%`Amount.fee` from (select '{"Amount.fee":0.75,"Amount.tot":645.55}' col);
+--------------------+
| col::%`Amount.fee` |
+--------------------+
|               0.75 |
+--------------------+
or by using the json_extract_ builtins directly:
select json_extract_double('{"Amount.fee":0.75,"Amount.tot":645.55}', 'Amount.fee');
+------------------------------------------------------------------------------+
| json_extract_double('{"Amount.fee":0.75,"Amount.tot":645.55}', 'Amount.fee') |
+------------------------------------------------------------------------------+
|                                                                         0.75 |
+------------------------------------------------------------------------------+
As for the regex itself: POSIX Enhanced Regular Expressions do not support lookbehind assertions such as (?<!\d), which is most likely why your pattern fails in SQL. Assuming you only want to target dots that are in between two non-digit characters, where the dot is not the first or last character in the string, you may match on ([^\d])\.([^\d]) and replace with \1_\2:
SELECT REGEXP_REPLACE("Amount.fee:0.75,Amount.tot:645.55", "([^\d])\.([^\d])", "\1_\2", "ig");
Here is a regex demo showing that the replacement is working. Note that you might have to use $1_$2 instead of \1_\2 as the replacement, depending on the regex flavor of your SQL tool.
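For instance, if the backslash references come back inserted literally, the same statement with $-style references would be:
SELECT REGEXP_REPLACE("Amount.fee:0.75,Amount.tot:645.55", "([^\d])\.([^\d])", "$1_$2", "ig");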

Splunk - Split a field into multiple fields based on delimiters

I have the following value in a field which needs to be split into multiple fields,
Classname:
abc.TestAutomation.NNNN.Specs.Prod/NDisableTransactionalAccessUsers.#()::TestAssembly:abc.TestAutomation
Required output:
Productname : abc.TestAutomation.NNNN.Specs.Prod
Feature name : NDisableTransactionalAccessUsers
Project : TestAssembly:abc.TestAutomation
I have been trying to extract the values into my fields using the rex command, but I am failing.
source="Reports.csv" index="prod_reports_data" sourcetype="ReportsData"
| rex "classname(?<Productname>/*)\.(?<Featurename>#*)\.(?<Project>.*)"
| table classname Productname Featurename Project
When I execute this command, there are no results. I am very new to Splunk; can someone guide me?
Thanks.
I almost always use multiple rex statements to get what I want ... but if you "know" the data is consistent, this will work (tried on regex101.com):
| rex field=_raw "(?<classname>[^\/]+)\/(?<featurename>[^\.]+)\.[[:punct:]]+(?<project>[\w].+)"
What this regular expression does:
<classname> :: everything from the front of the event to a forward slash (/)
<featurename> :: whatever follows the forward slash (/) until a literal dot (.)
discard all found punctuation
<project> :: whatever is left on the line
According to regex101.com, this is likely the most efficient rex you can use (14 steps total)
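To sanity-check it against the sample value from the question, a makeresults sketch:
| makeresults
| eval _raw="abc.TestAutomation.NNNN.Specs.Prod/NDisableTransactionalAccessUsers.#()::TestAssembly:abc.TestAutomation"
| rex field=_raw "(?<classname>[^\/]+)\/(?<featurename>[^\.]+)\.[[:punct:]]+(?<project>[\w].+)"
| table classname featurename project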

Remove leading 0 in String with letters and digits

I have a comma-separated file where I need to change the first column, removing the leading zeroes in the string. The text file is as below:
ABC-0001,ab,0001
ABC-0010,bc,0010
I need to get the data out as below:
ABC-1,ab,0001
ABC-10,bc,0010
I can do a command-line replace, which I tried as below:
sed 's/ABC-0*[1-9]/ABC-[1-9]/g' file
I ended up getting output:
ABC-[1-9],ab,0001
ABC-[1-9]0,ac,0010
Can you please tell me what I am missing here?
Alternatively, I also tried to apply the formatting in the SQL that generates this file, as below:
select regexp_replace(key,'((0+)|1-9|0+)','(1-9|0+)') from file where key in ('ABC-0001','ABC-0010')
which gives output as
ABC-(1-9|0+)1
ABC-(1-9|0+)1(1-9|0+)
Help on either solution will be very helpful!
Try this (note that the replacement side of sed is literal text, not a pattern, so [1-9] was inserted verbatim; you need a capture group and a backreference instead):
sed -E 's/ABC-0*([1-9])/ABC-\1/g' file
                -------     --
                   |         |
           capturing group   |
                      captured group
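A quick check against the sample data from the question:
$ printf 'ABC-0001,ab,0001\nABC-0010,bc,0010\n' | sed -E 's/ABC-0*([1-9])/ABC-\1/g'
ABC-1,ab,0001
ABC-10,bc,0010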
To do it in the query using Oracle, where the key value with the zeroes you want to remove is in a column called "key" in a table called "file", it would look like this:
select regexp_replace(key, '(-)(0+)(.*)', '\1\3')
from file;
You need to capture the dash because it is "consumed" by the regex as it is matched, followed by the second group of one or more 0's, followed by the rest of the field. Replacing with captured groups 1 and 3 leaves out the 0's (if any) in between.
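For a quick check against a literal value (using Oracle's dual table):
select regexp_replace('ABC-0010', '(-)(0+)(.*)', '\1\3') from dual;
-- returns ABC-10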

Spark 2.2/Jupyter Notebook SQL regexp_extract function not matching regex pattern

I'm using the regexp_extract Spark 2.2 SQL function in a Jupyter (Scala) notebook to match a string of 11 or more repeating characters.
Here's the regex:
^(.)\1{10,}$
Now, let's look at that pattern with the regexp_extract function. Here's how I've used it in my notebook:
spark.sql("SELECT REGEXP_EXTRACT('hhhhhhhhhhhhh', '^(.)\\1{10,}$', 1) as ExtractedChar").show()
+-------------+
|ExtractedChar|
+-------------+
|             |
+-------------+
Odd, no output. Let's make sure my regex pattern is actually correct. Yep, looks right.
You may be wondering why the regex pattern contains "\\" (two backslash characters): it's because "\" is an escape character in Scala strings, so two are necessary. Here's some verification:
val string = "SELECT REGEXP_EXTRACT('hhhhhhhhhhhhhhhhhhhhh', '^(.)\\1{10,}$', 1) as ExtractedChar"
println(string)
SELECT REGEXP_EXTRACT('hhhhhhhhhhhhhhhhhhhhh', '^(.)\1{10,}$', 1) as ExtractedChar
Alright, let's make sure the regexp_extract function is working correctly:
spark.sqlContext.sql("SELECT REGEXP_EXTRACT('TESTING', '^.', 0) as test").show()
+----+
|test|
+----+
|   T|
+----+
Okay, maybe the issue is the Jupyter notebook? After checking with the Scala REPL, I'm still having the same issue.
Any ideas why I'm unable to get this regex to successfully match?
Edit: Spark SQL is a requirement for this. I could create my own UDF using Scala; however, UDFs are black boxes to Spark, meaning they will not be fully optimized.
I found the solution. The SQL string needs to include 4 "\" characters, like so:
'^(.)\\\\1{10,}$'
As explained here, four \ characters are needed for two reasons:
\ is a special character in SQL and needs to be escaped, so the query needs two of them.
The input is coming from a Scala string, where \ also needs to be escaped. Just having "\\" would give a single \; to get two you need "\\\\".
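Putting it together, the working call from the notebook would look like this (a sketch, assuming the same input string and Spark 2.2's default string-literal escaping):
spark.sql("SELECT REGEXP_EXTRACT('hhhhhhhhhhhhh', '^(.)\\\\1{10,}$', 1) as ExtractedChar").show()
+-------------+
|ExtractedChar|
+-------------+
|            h|
+-------------+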