Problems with extracting JSON response value in Scala using brackets


I have tried using " and ' in different combinations to extract the value 13029416243 from the JSON response body I get in a Gatling/Scala script:
,\"initialString\":\"13029416243\"},
These are some of my attempts:
.check(regex("initialString(.*?)}").exists.saveAs("initialString"))
and
.check(regex("initialString\\\":\\\"(.*?)\\\"}").exists.saveAs("initialString"))
where the last one results in this output in the log:
---- Errors --------------------------------------------------------------------
regex(initialString\":\"(.*?)\"}).find.exists, found nothing
Any help on how to obtain the value?

You can make use of a lookbehind. To be honest, though, JSON should be parsed with a JSON parser; regex is not always a reliable tool for searching in JSON.
(?<=initialString\\":\\")\d+\b
https://regex101.com/r/Wgeurd/1/
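A minimal Scala sketch of the lookbehind above, assuming the response body really contains a literal backslash before each quote, as the sample fragment suggests. The second attempt in the question compiles to the regex initialString\":\"(.*?)\"}, where \" matches only a bare quote, so it never accounts for those backslashes. The object and method names are made up for illustration; only scala.util.matching.Regex from the standard library is used.

```scala
import scala.util.matching.Regex

object InitialStringExtract {
  // Fragment from the question; each quote is preceded by a literal backslash.
  val body: String = """,\"initialString\":\"13029416243\"},"""

  // Triple quotes avoid Scala-level escaping: in the regex, \\ matches one
  // literal backslash and " matches a plain double quote.
  val lookbehind: Regex = """(?<=initialString\\":\\")\d+\b""".r

  // Same idea with a capture group instead of a lookbehind.
  val captureGroup: Regex = """initialString\\":\\"(\d+)""".r

  // Returns the whole match (the digits only, thanks to the lookbehind).
  def viaLookbehind(s: String): Option[String] =
    lookbehind.findFirstIn(s)

  // Returns the first capture group of the first match.
  def viaCaptureGroup(s: String): Option[String] =
    captureGroup.findFirstMatchIn(s).map(_.group(1))
}

// InitialStringExtract.viaLookbehind(InitialStringExtract.body)   // Some("13029416243")
// InitialStringExtract.viaCaptureGroup(InitialStringExtract.body) // Some("13029416243")
```

If that assumption about the body holds, the capture-group form should drop into the check as .check(regex("""initialString\\":\\"(\d+)""").saveAs("initialString")), since triple quotes remove the second layer of Scala escaping.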

Related

JMeter Regular extraction looking to correlate a value between two dynamic variables

I'm in a bit of a pickle.
I'm trying to extract a value where the values before and after it are very redundant.
[sample of the response with the desired value highlighted][1]
Since the values before and after it are dynamic, that narrows down the regex anchors I can rely on,
and validating every idea I have gives back a wider string.
How can I extract this value?
[1]: https://i.stack.imgur.com/kYUXf.png

JSON is not a regular language, so using regular expressions to extract data from it is not the best idea.
Consider using, e.g., the JSON Extractor, which accepts arbitrary JsonPath queries and lets you fetch data from JSON in a more flexible, readable and powerful way.
If you post the code in text form and not as an image, I could come up with a proper JsonPath expression.

Need to extract a specific detail from a json

I have the following JSON, from which I'm looking to extract the specific string that comes after location_code and sits between quotes (XXX123):
{"location_code":"XXX123","location_uuid":"XXX-XXX-XXAA-4444-ASDFSDAF44","hotstamp":"1111","card_format":"ABC","accesses":[{"partition_name":"SSSSuljiro","SSSSS":"3","access_levels":["ASDASDASDA"],"location_code":"XXX123"}]}
Would greatly appreciate any help with this!!
I'm using Redshift and have made several attempts with regexp_substr.

This should work:
(?<="location_code":")\w+
but since you want to parse a json object, there could be better/easier ways to do so.

I believe you could use json_extract_path_text:
select json_extract_path_text(
json_column, -- your json
'location_code', -- json key to extract data from
true -- return null if input is invalid json
);
Make sure that your string is actually valid JSON format.

parse hl7 with regex

I have the following hl7 message:
MSH|^~\&|EPIC|SMHRMC|JCAPS|QHN|20170626165726|EDILABIH|ORU^R01^LAB|00004841|P|2.3|||||||||
PID|1||W00xxxxx^^^SMHRMC||mouse^Mickey^E||19860905|F||1|2601 somestreet AVE NO 8^^City^ST^zip^USA^^^county|MESA|(970)xxx-xxxx^P^PH|||Single||175375903|xxxxxxx||last^first^^|NON-HISPANIC||||||||||
PV1|1|I|MNEU^908^A^^R^^^^^^||||9999999^pcp^pcp^LYNNE^^^^^NPI^^^^NPI~999999999^last^first^LEE^^^^^NPI^^^^NPI||||||||||00000000^last^first^LYNNE^^^^^NPI^^^^NPI||000000603|CAID||||||||||||||||||||||||20170626000000
HL7 is hard to extract with regex; however, I have a field that is always in the same location, and I feel that might be easier. I need to pull the encounter number, which is the 'W00xxxxx' in the stream above. It is always in the 3rd pipe-delimited section of the PID and stops at the ^.
Currently I have: select substring(column from 'PID\|[1]\|\|(.)\^'), but this is not working. However, when I use select substring(column from 'PV1\|[1]\|(.)\|'), it pulls the 'I'. I can't see a big enough difference between the two regexes to know why this isn't working. Thanks.

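How about this? (.) captures exactly one character, which is why the PV1 pattern returned the single-character 'I'. A lazy (.+?) captures everything up to the first ^: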
PID\|[1]\|\|(.+?)\^

You can't reliably parse HL7 V2.x messages using regex because the encoding characters may change in MSH-1 and MSH-2. Whatever language you're using there's probably already an HL7 parsing library you can use instead.
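To illustrate the point about encoding characters, here is a rough Scala sketch that reads the field and component separators from the MSH segment instead of hard-coding | and ^, then takes the first component of PID-3. It is not a substitute for an HL7 library: it ignores escape sequences, repetitions and subcomponents, and the object and method names are hypothetical.

```scala
import java.util.regex.Pattern

object Hl7Pid3 {
  /** First component of PID-3, using the delimiters declared in the MSH segment. */
  def encounterNumber(message: String): Option[String] = {
    // HL7 v2 segments are normally separated by carriage returns; accept newlines too.
    val segments = message.split("[\r\n]+").map(_.trim).filter(_.nonEmpty)

    segments.find(_.startsWith("MSH")).filter(_.length >= 5).flatMap { msh =>
      // MSH-1 is the character right after "MSH"; MSH-2 starts with the component separator.
      val fieldSep = Pattern.quote(msh.charAt(3).toString)
      val componentSep = Pattern.quote(msh.charAt(4).toString)

      segments
        .find(_.startsWith("PID" + msh.charAt(3)))
        .map(_.split(fieldSep, -1)) // limit -1 keeps empty fields
        .collect { case fields if fields.length > 3 =>
          // fields(0) is "PID", so fields(3) is PID-3
          fields(3).split(componentSep, -1).head
        }
        .filter(_.nonEmpty)
    }
  }
}
```

For the message in the question, this would return the W00xxxxx value, and it keeps working if a sender declares different delimiters in MSH-1 and MSH-2.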

Are my regex just wrong or is there a buggy behaviour in td-agent's format behaviour?

I am using fluentd, elasticsearch and kibana to organize logs. Unfortunately, these logs are not written using any standard like apache, so I had to come up with the regex for the format myself. I used this site here to verify that they are working: http://fluentular.herokuapp.com/ .
The logs have roughly this format here:
DEBUG: 24.04.2014 16:00:00 [SingleActivityStrategy] Start Activitiy 'barbecue' zu verabeiten.
The format regex I am using is as follows:
format /(?<pri>([INFO]|[DEBUG]|[ERROR])+)...(?<date>(\d{2}\.\d{2}\.\d{4})).(?<time>(\d{2}:\d{2}:\d{2})).\[(?<subject>(.*))\].(?<msg>(.*))/
Now, judging by that website that is supposed to test specifically fluentd's behaviour with regexes, the output SHOULD be this one:
Record
Key Value
pri DEBUG
date 24.04.2014
subject SingleActivityStrategy
msg Start Activitiy 'barbecue' zu verabeiten.
Instead though, I have this ?bug? that pri is always shortened to DEBU. Same for ERROR which becomes ERRO, only INFO stays INFO. I am not very experienced with regular expressions and I find it hard to believe that this is a bug, still it confuses me and any help is greatly appreciated.
I'm not sure I can link the complete config file because I don't personally own these log files and I am trying to keep it on a level that my boss won't get mad at me for posting sensitive information, but should it definitely be needed, I will post them later on after having asked him how much I can reveal.
In general, the logs always look roughly like this:
First the priority, which is either DEBUG, ERROR or INFO, next the date, next what we call the subject, which is always written in [ ], and finally just a message.
Here is a link to fluentular with the format I am using and a teststring that produces the right result in fluentular, but not in my config file:
Fluentular
Sorry I couldn't make it work like a regular link to just click on.
Another link to test out regex with my format and test string is this one:
http://rubular.com/r/dfXOkQYNXP
tl;dr version:
My td-agent format regex cuts off the last letter, although fluentular says it shouldn't. My fault or a bug?

How the regex would look if you're trying to match the data specifically:
(INFO|DEBUG|ERROR)\:\s+(\d{2}\.\d{2}\.\d{4})\s(\d{2}:\d{2}:\d{2})\s\[(.*)\](.*)
In your format string, you were using . and ... where your spaces and colon should be. I'm not too sure why this works in Fluentular, but you should have matched the \: explicitly, along with each space between the values.
So you'd be looking at the following regular expression with the Fluentd fields (which are group names):
(?<pri>(INFO|ERROR|DEBUG))\:\s+(?<date>(\d{2}\.\d{2}\.\d{4}))\s(?<time>(\d{2}:\d{2}:\d{2}))\s\[(?<subject>(.*))\]\s(?<msg>(.*))
Meaning your td-agent.conf should look like:
<source>
type tail
path /var/log/foo/bar.log
pos_file /var/log/td-agent/foo-bar.log.pos
tag foo.bar
format /(?<pri>(INFO|ERROR|DEBUG))\:\s+(?<date>(\d{2}\.\d{2}\.\d{4}))\s(?<time>(\d{2}:\d{2}:\d{2}))\s\[(?<subject>(.*))\]\s(?<msg>(.*))/
</source>
I would also take a look into comparing Logstash vs. Fluentd. I like Logstash far more because you create Grok filters to match the type of data you want, and it makes formatting your fields much easier because you are providing an abstraction layer, but you essentially will get the same data.
And I would watch out when you're using sites like Rubular, as they are fairly particular about multi-line matching and the like. I'd suggest something like Regexr which gives immediate feedback and you can set global and multiline matching as well.
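The difference between the two formats can be checked with a small Scala sketch. It uses Java's regex engine as a stand-in for the Ruby engine that td-agent runs on, which is an assumption; the object and helper names are invented for the example. In the original format, [DEBUG] is a character class (any one of D, E, B, U, G) rather than the word DEBUG, and the three dots that follow must each consume a character, so the engine backtracks and hands the final G to the first dot.

```scala
import java.util.regex.Pattern

object PriGroupDemo {
  // Sample log line from the question.
  val line: String =
    "DEBUG: 24.04.2014 16:00:00 [SingleActivityStrategy] Start Activitiy 'barbecue' zu verabeiten."

  // Original format: character classes followed by "..." for the separator.
  val original: Pattern = Pattern.compile(
    """(?<pri>([INFO]|[DEBUG]|[ERROR])+)...(?<date>(\d{2}\.\d{2}\.\d{4})).(?<time>(\d{2}:\d{2}:\d{2})).\[(?<subject>(.*))\].(?<msg>(.*))""")

  // Corrected format from the answer: literal words, explicit colon and whitespace.
  val corrected: Pattern = Pattern.compile(
    """(?<pri>(INFO|ERROR|DEBUG))\:\s+(?<date>(\d{2}\.\d{2}\.\d{4}))\s(?<time>(\d{2}:\d{2}:\d{2}))\s\[(?<subject>(.*))\]\s(?<msg>(.*))""")

  // Returns the named group of the first match, if the pattern matches at all.
  def capture(p: Pattern, name: String, input: String): Option[String] = {
    val m = p.matcher(input)
    if (m.find()) Option(m.group(name)) else None
  }
}

// PriGroupDemo.capture(PriGroupDemo.original, "pri", PriGroupDemo.line)  // Some("DEBU")
// PriGroupDemo.capture(PriGroupDemo.corrected, "pri", PriGroupDemo.line) // Some("DEBUG")
```

With exactly one space after the colon, the three dots cover "G: " and pri ends up as DEBU. If INFO lines happen to carry an extra space after the colon (a guess, not something the question confirms), the dots would cover ":  " and INFO would survive intact, which would match the behaviour described.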

RGoogleAnalytics replacing unexpectedly escaped characters with gsub

I'm using RGoogleAnalytics, I'm just at the learning stage at the moment.
I'm following the code in the tutorial here https://code.google.com/p/r-google-analytics/
But when I try to run
ga.goals <- conf$GetGoals()
ga.goals
I get an error message telling me there is an unexpected escaped character '\.' at pos 7
I get a similar message for the next two lines of code (GetSegments)
This question deals with a similar problem in the Facebook Graph API:
How to replace "unexpected escaped character" in R
I've tried using a similar bit of code
confGoalsSub <- gsub('\\.', ' ', conf$GetGoals())
to remove the escaped characters, but I get another error :
cannot coerce type 'closure' to vector of type 'character'
Out of desperation I have tried confGoalsSub <- gsub('\\.', ' ', conf) which returns a character vector that is just garbage (it's just the code for conf with the decimal points stripped out).
Can anyone suggest a better expression than gsub that will return a useful object?
EDIT: As per the suggestion below, I've now added the brackets at the end of the function call, but I still get the same error message about unexpected escape characters. I get the same error when I try to call other, similar functions such as $GetSegments().
I saw on one video at the weekend that this package was broken for a long time, although the speaker did not provide details as to why. Perhaps I should give up and try one of the other Google Analytics packages in R.
Seems odd, given that this one is supposed to be Google supported.

I think this error arises when the RJSON library isn't able to parse the Google Analytics Data Feed properly and convert it into a nested list. The updated version of [RGoogleAnalytics](http://cran.r-project.org/web/packages/RGoogleAnalytics/index.html) fixes this problem. Currently, you won't be able to retrieve Goals and Segments from your Google Analytics account using the library, but beyond that it supports the full range of dimensions and metrics.

