Regular Expression for Splunk - extract between two phrases across multiple lines

I am trying to extract log data in Splunk and my current use case is more complicated than what the "regex builder" will allow for. Consider the below example: I would like to extract all the text between two phrases. I can get small, one-line samples to work between two words, but I've not been able to get this to work at all. The separate line breaks are not helping either.
Thanks for any help you can provide!
Phrase1: Stuff.Applications.Business.StuffApi.Common.Exceptions.ValidationException:
Phrase2: at Stuff.Applications.Business.StuffApi.Web.Controllers.Stuff.Things
Example Data:
02/26/2018 02:17:08 PM
LogName=Stuff
SourceName=StuffApi
EventCode=400
EventType=2
Type=Error
ComputerName=Stuff.things.Words
TaskCategory=%1
OpCode=Info
RecordNumber=3129
Keywords=Classic
Message=2018-02-26 14:17:08,767 [63] ERROR Things [(null)] - Something Number: ; Something Number: 9999999999 ; Source Application: ABCD ; Error Type: Validation ; Response Status Code: 400
Stuff.Applications.Business.StuffApi.Common.Exceptions.ValidationException: Validation Errors: Error:ErrorInfo.Error cannot be greater than the current date: 2/26/2018 12:00:00 AM, Incoming Value:2/27/2018 12:00:00 AM;
at Stuff.Applications.Business.StuffApi.Web.Controllers.Stuff.Things(SomeRequest request) in f:\Builds\348\Policy Systems\V.12_Release.Applications.Business.Things\src\src\Web\Controllers\Stuff.cs:line 288
at lambda_method(Closure , Object , Object[] )
at System.Web.Http.Controllers.ReflectedHttpActionDescriptor.ActionExecutor.<>c__DisplayClass10.<GetExecutor>b__9(Object instance, Object[] methodParameters)
at System.Web.Http.Controllers.ReflectedHttpActionDescriptor.ExecuteAsync(HttpControllerContext controllerContext, IDictionary`2 arguments, CancellationToken cancellationToken)
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)

Try this regex
Stuff\.Applications\.Business\.StuffApi\.Common\.Exceptions\.ValidationException:(?<text>[\s\S]+)at Stuff\.Applications\.Business\.StuffApi\.Web\.Controllers\.Stuff\.Things
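The same extraction as an added Python example (not part of the original answer), comparing a lazy `[\s\S]+?` with the greedy `[\s\S]+` when the closing phrase occurs more than once in an event. Python writes the named group as `(?P<text>...)` where Splunk's `rex` uses `(?<text>...)`; the function name and the sample event are constructed for illustration.

```python
import re

START = "Stuff.Applications.Business.StuffApi.Common.Exceptions.ValidationException:"
END = "at Stuff.Applications.Business.StuffApi.Web.Controllers.Stuff.Things"

def between(event, start=START, end=END, lazy=True):
    """Return the text between two literal phrases, or None if no match."""
    quantifier = "+?" if lazy else "+"
    # re.escape keeps the dots in the phrases literal; [\s\S] also matches
    # newlines, so no DOTALL flag is needed.
    pattern = re.escape(start) + r"(?P<text>[\s\S]" + quantifier + ")" + re.escape(end)
    match = re.search(pattern, event)
    return match.group("text").strip() if match else None

# Constructed event (not from the page): the end phrase occurs twice.
EVENT = """\
Message=2018-02-26 14:17:08,767 [63] ERROR Things [(null)] - Error Type: Validation
Stuff.Applications.Business.StuffApi.Common.Exceptions.ValidationException: Validation Errors: Incoming Value:2/27/2018;
   at Stuff.Applications.Business.StuffApi.Web.Controllers.Stuff.Things(SomeRequest request)
   at lambda_method(Closure , Object , Object[] )
--- End of stack trace from previous location where exception was thrown ---
   at Stuff.Applications.Business.StuffApi.Web.Controllers.Stuff.Things(SomeRequest request)
"""

if __name__ == "__main__":
    print("lazy  :", repr(between(EVENT)))
    print("greedy:", repr(between(EVENT, lazy=False)))
    print("no end:", repr(between(EVENT, end="at Not.Present")))
```

The lazy form stops at the first occurrence of the closing phrase, while the greedy form runs to the last one and pulls the intervening stack frames into the field.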

Related

Not able to match the regex

I need to write the regex to fetch the details from the following data
Type Time(s) Ops TPS(ops/s) Net(M/s) Get_miss Min(us) Max(us) Avg(us) Std_dev Geo_dist
Period 5 145443 29088 22.4 37006 352 116302 6600 7692.04 4003.72
Global 10 281537 28153 23.2 41800 281 120023 6797 7564.64 4212.93
The above is the log which I get from a log file.
I have tried writing the regex to get the details in table format but could not get it.
Below is the regex which I tried.
Type[\s+\S+].+\n(?<time>[\d+\S+\s+]+)[\s+\S+].*Period
When it comes to Period keyword the regex fails
If for some reason RichG's suggestion of using multikv doesn't work, the following should:
| rex field=_raw "(?<type>\w+)\s+(?<time>[\d\.]+)\s+(?<ops>[\d\.]+)\s+(?<tps>[\d\.]+)\s+(?<net>[\d\.]+)\s+(?<get_miss>[\d\.]+)\s+(?<min>[\d\.]+)\s+(?<max>[\d\.]+)\s+(?<avg>[\d\.]+)\s+(?<std_dev>[\d\.]+)\s+(?<geo_dist>[\d\.]+)"
Where is your data coming from?

How to format first 7 rows in this txt file using Regex

I have a text file with data formatted as below. Figured out how to format the second part of the file to format it for upload into a db table. Hitting a wall trying to get just the first 7 lines to format in the same way.
If it wasn't obvious, I'm trying to get it pipe delimited with the exact same number of columns, so I can easily upload it to the db.
Year: 2019 Period: 03
Office: NY
Dept: Sales
Acct: 111222333
SubAcct: 11122234-8
blahblahblahblahblahblahblah
Status: Pending
1000
AAAAAAAAAA
100,000.00
2000
BBBBBBBBBB
200,000.00
3000
CCCCCCCCCC
300,000.00
4000
DDDDDDDDDD
400,000.00
Some kind folks answered my question about the bottom part; using the following code I can format that to look like so -
(.*)\r?\n(.*)\r?\n(.*)(?:\r?\n|$)
substitute with |||||||$1|$2|$3\n
|||||||1000|AAAAAAAAAA|100,000.00
|||||||2000|BBBBBBBBBB|200,000.00
|||||||3000|CCCCCCCCCC|300,000.00
|||||||4000|DDDDDDDDDD|400,000.00
Just need help formatting the top part to look like this, so the entire file matches with the exact same number of columns.
Year: 2019|Period: 03|Office: NY|Dept: Sales|Acct: 111222333|SubAcct: 11122234-8|blahblahblahblahblahblahblah|Status: Pending|||
I'm ok with having multiple passes on the file to get the desired end result.
I've helped you on your previous question, so I will focus now on the first part of your file.
You can use this regex:
\n|\b(?=Period)
And use | as the replacement string
If you don't want the previous space before Period, then you can use:
\n|\s(?=Period)

fetching name and age from a text file

I have a .txt file from which I have to fetch name and age.
The .txt file has data in the format like:
Age: 71 . John is 47 years old. Sam; Born: 05/04/1989(29).
Kenner is a patient Age: 36 yrs Height: 5 feet 1 inch; weight is 56 kgs.
This medical record is 10 years old.
Output 1: John, Sam, Kenner
Output_2: 47, 29, 36
I am using the regular expression to extract data. For example, for age, I am using the below regular expressions:
re.compile(r'age:\s*\d{1,3}',re.I)
re.compile(r'(age:|is|age|a|) \s*\d{1,3}(\s|y)',re.I)
re.compile(r'.* Age\s*:*\s*[0-9]+.*',re.I)
re.compile(r'.* [0-9]+ (?:year|years|yrs|yr) \s*',re.I)
I will apply another regular expression to the output of these regular expressions to extract the numbers. The problem is with these regular expressions, I am also getting the data which I do not want. For example
This medical record is 10 years old.
I am getting '10' from the above sentence which I do not want.
I only want to extract the names of people and their age. I want to know what should be the approach? I would appreciate any kind of help.
Please take a look at the Cloud Data Loss Prevention API. Here is a GitHub repo with examples. This is what you'll likely want.
def inspect_string(project, content_string, info_types,
                   min_likelihood=None, max_findings=None, include_quote=True):
    """Uses the Data Loss Prevention API to analyze strings for protected data.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        content_string: The string to inspect.
        info_types: A list of strings representing info types to look for.
            A full list of info type categories can be fetched from the API.
        min_likelihood: A string representing the minimum likelihood threshold
            that constitutes a match. One of: 'LIKELIHOOD_UNSPECIFIED',
            'VERY_UNLIKELY', 'UNLIKELY', 'POSSIBLE', 'LIKELY', 'VERY_LIKELY'.
        max_findings: The maximum number of findings to report; 0 = no maximum.
        include_quote: Boolean for whether to display a quote of the detected
            information in the results.
    Returns:
        None; the response from the API is printed to the terminal.
    """
    # Import the client library.
    import google.cloud.dlp
    # Instantiate a client.
    dlp = google.cloud.dlp.DlpServiceClient()
    # Prepare info_types by converting the list of strings into a list of
    # dictionaries (protos are also accepted).
    info_types = [{'name': info_type} for info_type in info_types]
    # Construct the configuration dictionary. Keys which are None may
    # optionally be omitted entirely.
    inspect_config = {
        'info_types': info_types,
        'min_likelihood': min_likelihood,
        'include_quote': include_quote,
        'limits': {'max_findings_per_request': max_findings},
    }
    # Construct the `item`.
    item = {'value': content_string}
    # Convert the project id into a full resource id.
    parent = dlp.project_path(project)
    # Call the API.
    response = dlp.inspect_content(parent, inspect_config, item)
    # Print out the results.
    if response.result.findings:
        for finding in response.result.findings:
            try:
                if finding.quote:
                    print('Quote: {}'.format(finding.quote))
            except AttributeError:
                pass
            print('Info type: {}'.format(finding.info_type.name))
            print('Likelihood: {}'.format(finding.likelihood))
    else:
        print('No findings.')

If and REGEXTRACT

I have a text (see below) where
I would like to extract the date only for the specific status, when the date appears after "New on Date".
I want the formula to answer: if the status is "New" then extract the "date".
I tried this: =If(A2 = "New",REGEXEXTRACT(A2,"(\d{1,}?\/\d{1,}?\/\d{4})"),)
I also tried the same by adding Find and Search but still unsuccessful.
I know that this part of the formula works: REGEXEXTRACT(A2,"(\d{1,}?\/\d{1,}?\/\d{4})")
But I do not manage to find the other part: Would anyone have a guess?
Contract Rejected/Contract Withdrew on Date: 11/11/2016 6:23:33 AM and Modified by: Eletttt|| Offer Negotiation on Date: 6/2/2016 5:36:04 AM and Modified by: Dexx|| HR Screening on Date: 4/14/2016 2:30:57 AM and Modified by: Dexxx|| New on Date: 4/14/2016 2:24:58 AM and Modified by: Dexxx|| Contract sent on Date: 6/7/2016 11:03:58 AM and Modified by: Chrisyyy|| Pending Contract Approval on Date: 6/7/2016 4:56:55 AM and Modified by: Debxxx|| HM Interview on Date: 5/10/2016 12:40:30 AM and Modified by: Debxxx
If you need to extract the date after New on Date, you need to add this text to the pattern and keep the capturing group where it is now:
=REGEXEXTRACT(A2, "New on Date:\s*(\d{1,2}/\d{1,2}/\d{4})")
Here is a simpler one for you:
=REGEXEXTRACT(A1,"New on Date:\s(\d\S+)")

Regular expression to debatch MT940 message

I got a message with the below structure, where a message starts from tag :20: and ends at :86:. I want to write a regular expression to extract all the messages.
I would write a C# utility to extract each message and put it in ArrayList.
:20:160212-2359
:21:600******444
:28C:00001/00001
.
.
.
:86:DAILY SETTLEMENT /ENTRY-13 MAR
:62F:D160212GBP1229387,45
:64:D160212GBP1229387,45
:65:D120314GBP1229387,45
:65:D120315GBP1229387,45
:65:D120316GBP1229387,45
:65:D120317GBP1229387,45
:65:D120318GBP1229387,45
:86:FORWARD AVAILABLE FUNDS SHOW ITEMS KNOWN BUT NOT YET POSTED
some more comments in 86_2 segment
this is line2
:20:160212-2359
:21:B***22
:25:60*****88
.
.
.
:86:/ENTRY-13 MAR TRF/REF 6*******64 /ORD/ some line here
*********************** /BNF/ JO 88
:62F:C160212EUR13868931,00
:64:C160212EUR13868931,00
:65:C120314EUR13868931,00
:65:C120315EUR13791849,00
:65:C120316EUR13791849,00
:65:C120317EUR13791849,00
:65:C120318EUR13791849,00
:86:FORWARD AVAILABLE FUNDS SHOW ITEMS KNOWN BUT NOT YET POSTED
some more comments in 86_2 segment.
:20:160212-2359
:21:B****X
:25:6*************1
:28C:00001/00001
:86:STORE1 EUROPE B.V. /ENTRY-15 MAR RTS/REF 6*****6 RTS
SWEPT FROM 9999 1**** XX***********BILLING CHARGES -
28FEB12 TRF/REF 6641XXX43799053 /ITEMCNT/004 /BNF/ /ITEMCNT/004
BILLING CHARGES
:61:1203130313DR10000000,00****288//6*****6
:86:STORE1 CNRTY SRL /ENTRY-13 MAR CLG/REF 66**********6
:61:1*****000,00NT*****9846//6******74
:86:NAME /ENTRY-13 MAR CLG/REF 6******4 LA C****R
**** CASH DEPOSIT STORE1
:61:1203150315DR48531,00NCHGBILLING CHARGES//6641XXX43799053
:86:BILLING CHARGES - 28FEB12 /ENTRY-15 MAR TRF/REF
66******53 /ITEMCNT/004
:62F:C160212EUR0,00
:64:C160212EUR0,00
:65:C120314EUR0,00
:65:C120315EUR0,00
:65:C120316EUR0,00
:65:C120317EUR0,00
:65:C120318EUR0,00
:86:FORWARD AVAILABLE FUNDS SHOW ITEMS KNOWN BUT NOT YET POSTED
{newline}
Actual values are replaced with '*' character.
Thanks
Dhiraj Bhavsar
Try this
:20:(.*?):86:
in code
/:20:(.*?):86:/gs
https://regex101.com/r/dW4zS3/1
.*? matches any character between zero and unlimited times, as few times as possible, expanding as needed
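An added example, not part of the original question or answer, of debatching by splitting at each line that begins with `:20:`, so that a message containing several `:86:` fields and continuation lines stays whole. It is written in Python rather than the C# the question mentions; the function names and the sample batch are constructed for illustration, and it assumes every message starts with `:20:` at the beginning of a line.

```python
import re

# A message starts at a line that begins with the :20: tag.
MSG_START = re.compile(r"^:20:", re.MULTILINE)
# A field starts at a line beginning with :NN: or :NNx: (for example :62F:);
# any other line continues the previous field.
FIELD = re.compile(r"^:(\d{2}[A-Z]?):", re.MULTILINE)

def debatch(text):
    """Split a batch into messages, each from one :20: line to just before the next.

    Text before the first :20: line is ignored.
    """
    starts = [m.start() for m in MSG_START.finditer(text)]
    bounds = zip(starts, starts[1:] + [len(text)])
    return [text[a:b].rstrip() for a, b in bounds]

def fields(message):
    """Return (tag, value) pairs; continuation lines stay attached to their field."""
    parts = FIELD.split(message)  # ['', tag1, value1, tag2, value2, ...]
    return [(tag, value.strip()) for tag, value in zip(parts[1::2], parts[2::2])]

# Constructed batch (values are made up): two messages, each with two :86: fields.
BATCH = """\
:20:160212-2359
:25:60000088
:28C:00001/00001
:61:1203130313DR100,00NTRF//REF1
:86:DAILY SETTLEMENT /ENTRY-13 MAR
:62F:D160212GBP1229387,45
:86:FORWARD AVAILABLE FUNDS
second line of the last 86 field
:20:160212-2360
:25:60000099
:86:BILLING CHARGES /ENTRY-15 MAR
:62F:C160212EUR0,00
:86:FORWARD AVAILABLE FUNDS
"""

if __name__ == "__main__":
    for number, message in enumerate(debatch(BATCH), start=1):
        print(number, [tag for tag, _ in fields(message)])
```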