I need to write a regex to extract the fields from the following data:
Type Time(s) Ops TPS(ops/s) Net(M/s) Get_miss Min(us) Max(us) Avg(us) Std_dev Geo_dist
Period 5 145443 29088 22.4 37006 352 116302 6600 7692.04 4003.72
Global 10 281537 28153 23.2 41800 281 120023 6797 7564.64 4212.93
The above is the output I get from a log file.
I have tried writing a regex to extract the details in table format, but could not get it to work.
Below is the regex I tried:
Type[\s+\S+].+\n(?<time>[\d+\S+\s+]+)[\s+\S+].*Period
The regex fails when it reaches the Period keyword.
If for some reason RichG's suggestion of using multikv doesn't work, the following should:
| rex field=_raw "(?<type>\w+)\s+(?<time>[\d\.]+)\s+(?<ops>[\d\.]+)\s+(?<tps>[\d\.]+)\s+(?<net>[\d\.]+)\s+(?<get_miss>[\d\.]+)\s+(?<min>[\d\.]+)\s+(?<max>[\d\.]+)\s+(?<avg>[\d\.]+)\s+(?<std_dev>[\d\.]+)\s+(?<geo_dist>[\d\.]+)"
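To sanity-check the pattern outside Splunk, here is a minimal JavaScript sketch that applies the same named groups to the two sample rows. The helper name parseRows and the FIELDS list are assumptions made for illustration, and `[ \t]+` stands in for `\s+` only so that a match cannot run across a line break.

```javascript
// Column names taken from the rex pattern above; parseRows is a hypothetical helper.
const FIELDS = ["time", "ops", "tps", "net", "get_miss", "min", "max", "avg", "std_dev", "geo_dist"];

// One named group per column: a word for the type, then ten numeric columns.
const ROW = new RegExp(
  "^(?<type>\\w+)" + FIELDS.map((f) => `[ \\t]+(?<${f}>[\\d.]+)`).join("") + "[ \\t]*$",
  "gm"
);

function parseRows(log) {
  // matchAll yields one match per data row; the header row does not match
  // because "Time(s)" is not numeric.
  return [...log.matchAll(ROW)].map((m) => ({ ...m.groups }));
}

const sampleLog = [
  "Type Time(s) Ops TPS(ops/s) Net(M/s) Get_miss Min(us) Max(us) Avg(us) Std_dev Geo_dist",
  "Period 5 145443 29088 22.4 37006 352 116302 6600 7692.04 4003.72",
  "Global 10 281537 28153 23.2 41800 281 120023 6797 7564.64 4212.93",
].join("\n");

console.log(parseRows(sampleLog));
```

Every captured value is a string; convert with Number() if numeric comparisons are needed.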
Where is your data coming from?
I have the following bit in a Google Apps Script that parses PDFs:
function extractPDFtext(text){
  const regexp = /[w,W,s,S]*(\d{3}).?(\d{3}).?(\d{3}).?(\d{3})?.?(\d{3})?[\w\W]*?(\d+.\d+)/gm;
  try{
    let array = [...text.match(regexp)];
    return array;
  }catch(e){
    let array = ["No items found"]
    return array;
  }
};
The existing regex only partially works, because the PDFs are not all alike, so I need to restrict the match to the text between specific words. When I try to do that, I get no results. I would like to retrieve the digits that follow the Reference and Amount tags, while ignoring any words and digits in between. This is where I'm having trouble: on regex101 I get the full match plus the correct capturing groups, but in the script I get no results.
Here is a regex based on a suggestion from another question of mine; in the end it has the same problem as my other attempts:
^Reference\b[^\d\n]*[\t ](\d{3})[\t ]*(\d{3})[\t ]*(\d{3})[\t ]*(\d{3})[\t ]*(\d{3})(?:\n(?!Amount\b)\S.*)*\nAmount\b[^\d\n]*[\t ](\d+(?:,\d+)?)\b
So I'm wondering whether the problem is with the regex or with the script, and how to solve it in either case.
Below is a dummy example of the variable text that the regex is applied to. Bear in mind that each "tag" can be followed by more words (for example: Reference of something // Amount of first payment:) and may or may not end with a colon.
Some dummy text that may have words in common like `reference` or `amount` throughout the document
Reference: 245 154 343 345 345
Entity: 34567
Amount: 11,11
Payment date: 14/07/2022
Some more text
Maybe you're trying to do too much with one command. Try breaking it up as shown below.
console.log(text);
// match() returns null when nothing matches, so test for null rather than length
let ref = text.match(/Reference.+/gm);
if (ref !== null) {
  ref = ref[0].match(/\d.+/); // keep everything from the first digit onward
  if (ref !== null) console.log(ref[0]);
}
ref = text.match(/Amount.+/);
if (ref !== null) {
  ref = ref[0].match(/\d.+/);
  if (ref !== null) console.log(ref[0]);
}
Execution log
8:55:50 AM Notice Execution started
8:55:50 AM Info Some dummy text that may have words in common like `reference` or `amount` throughout the document
Reference: 245 154 343 345 345
Entity: 34567
Amount: 11,11
Payment date: 14/07/2022
Some more text
8:55:50 AM Info 245 154 343 345 345
8:55:50 AM Info 11,11
8:55:50 AM Notice Execution completed
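The two-step idea can also be wrapped in a function that returns the values instead of logging them. This is only a sketch: the name extractReferenceAndAmount is hypothetical, and it assumes the tags are capitalised and start a line, as in the dummy text. One thing worth knowing here is that String.prototype.match called with the g flag returns only the full matches and drops the capture groups, which may explain a difference between regex101 and the script.

```javascript
// Hypothetical helper: find the tag line first, then pull the digits out of it.
function extractReferenceAndAmount(text) {
  // Assumption: each tag starts its line and is capitalised, as in the sample.
  const refLine = text.match(/^Reference\b.*$/m);
  const amountLine = text.match(/^Amount\b.*$/m);

  // Reference: every standalone group of three digits on that line.
  const reference = refLine ? refLine[0].match(/\b\d{3}\b/g) : null;
  // Amount: the first number, with an optional "," or "." decimal part.
  const amount = amountLine ? amountLine[0].match(/\d+(?:[.,]\d+)?/) : null;

  return {
    reference: reference || [],
    amount: amount ? amount[0] : null,
  };
}

const sampleText = [
  "Some dummy text that may have words in common like `reference` or `amount` throughout the document",
  "Reference: 245 154 343 345 345",
  "Entity: 34567",
  "Amount: 11,11",
  "Payment date: 14/07/2022",
  "Some more text",
].join("\n");

console.log(extractReferenceAndAmount(sampleText));
```

When a tag is missing, the function returns an empty array or null rather than throwing, so the try/catch is no longer needed.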
I'm trying to get the number between '-' and '-' in Google Sheets, but after trying many things I still haven't been able to find a solution.
Data record 1
England Premier League
West Ham vs Crystal Palace
2.090 - 3.47 - 3.770
Expected value = 3.47
Data record 2
England League Two
Carlisle vs Scunthorpe
2.830 - 3.15 - 2.820
Expected value = 3.15
Hopefully someone can help me out.
Try either of the following.
Option 1:
=INDEX(IFERROR(REGEXEXTRACT(AE1:AE4," \d+\.\d+ ")*1))
Option 2:
=INDEX(IFERROR(REGEXEXTRACT(AE1:AE4,".* - (\d+\.\d+) ")))
(Adjust the formula to match your ranges and locale.)
Use:
=INDEX(IFNA(REGEXEXTRACT(A1:A, "- (\d+(?:\.\d+)?) -")*1))
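For anyone who wants to test the pattern outside Sheets, the same capture can be sketched in JavaScript. The function name middleValue is an assumption for illustration; rows without a "- number -" part return null, which plays the role of IFNA/IFERROR in the formulas.

```javascript
// Hypothetical helper: return the number sitting between two dashes, or null.
function middleValue(line) {
  const m = line.match(/-\s*(\d+(?:\.\d+)?)\s*-/);
  return m ? Number(m[1]) : null;
}

const records = [
  "England Premier League",
  "West Ham vs Crystal Palace",
  "2.090 - 3.47 - 3.770",
  "2.830 - 3.15 - 2.820",
];

console.log(records.map(middleValue));
```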
I have a text file with data formatted as below. I figured out how to format the second part of the file for upload into a db table, but I'm hitting a wall trying to get just the first 7 lines formatted the same way.
In case it wasn't obvious, I'm trying to make it pipe-delimited with the exact same number of columns, so I can easily upload it to the db.
Year: 2019 Period: 03
Office: NY
Dept: Sales
Acct: 111222333
SubAcct: 11122234-8
blahblahblahblahblahblahblah
Status: Pending
1000
AAAAAAAAAA
100,000.00
2000
BBBBBBBBBB
200,000.00
3000
CCCCCCCCCC
300,000.00
4000
DDDDDDDDDD
400,000.00
Some kind folks answered my question about the bottom part. Using the following regex, I can format it to look like this:
(.*)\r?\n(.*)\r?\n(.*)(?:\r?\n|$)
substitute with |||||||$1|$2|$3\n
|||||||1000|AAAAAAAAAA|100,000.00
|||||||2000|BBBBBBBBBB|200,000.00
|||||||3000|CCCCCCCCCC|300,000.00
|||||||4000|DDDDDDDDDD|400,000.00
I just need help formatting the top part to look like this, so the entire file has the exact same number of columns:
Year: 2019|Period: 03|Office: NY|Dept: Sales|Acct: 111222333|SubAcct: 11122234-8|blahblahblahblahblahblahblah|Status: Pending|||
I'm OK with making multiple passes over the file to get the desired end result.
I've helped you on your previous question, so I will focus now on the first part of your file.
You can use this regex:
\n|\b(?=Period)
Use | as the replacement string.
If you don't want to keep the space before Period, use:
\n|\s(?=Period)
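As a rough check of the second pattern, here is a JavaScript sketch that applies it to the seven header lines. The function name headerToPipeRow is hypothetical, `\r?\n` is an assumption to cover Windows line endings, and the trailing `|||` is simply copied from the target row in the question.

```javascript
// Hypothetical helper: collapse the header lines into one pipe-delimited row.
function headerToPipeRow(header) {
  return (
    header
      .trimEnd() // drop the final line break so it does not become a pipe
      // every line break, and the whitespace just before "Period", becomes "|"
      .replace(/\r?\n|[ \t]+(?=Period\b)/g, "|") + "|||"
  );
}

const sampleHeader = [
  "Year: 2019 Period: 03",
  "Office: NY",
  "Dept: Sales",
  "Acct: 111222333",
  "SubAcct: 11122234-8",
  "blahblahblahblahblahblahblah",
  "Status: Pending",
].join("\n") + "\n";

console.log(headerToPipeRow(sampleHeader));
```

The sketch has to be run on the header block only; the data rows are handled by the separate three-line substitution.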
Help much appreciated - I have a field in Redshift giving data of the form:
{\"frequencyCapList\":[{\"frequencyCapped\":true,\"frequencyCapPeriodCount\":1,\"frequencyCapPeriodType\":\"DAYS\",\"frequencyCapCount\":501}]}
What I would like to do is parse this cleanly as the output of a Redshift query into some columns like:
Frequency Cap Period Count | Frequency Cap Period Type | Frequency Cap Count
1 | DAYS | 501
I believe I need to use the regexp_substr function to achieve this, but I cannot work out the syntax to get the required output :(
Thanks in advance for any assistance,
Carter
Here you go
select json_extract_path_text(
         json_extract_array_element_text(
           json_extract_path_text(
             replace('{\"frequencyCapList\":[{\"frequencyCapped\":true,\"frequencyCapPeriodCount\":1,\"frequencyCapPeriodType\":\"DAYS\",\"frequencyCapCount\":501}]}','\\',''),
             'frequencyCapList'),
           0),
         'frequencyCapPeriodCount');
Just replace the last string with each key you want to extract.
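To check the logic outside Redshift, the same steps (strip the backslashes, take the first element of frequencyCapList, read each key) can be sketched in JavaScript. This is only an illustration of the idea, not Redshift code, and the function name parseFrequencyCap is an assumption.

```javascript
// Hypothetical helper mirroring the query: unescape, parse, pick three keys.
function parseFrequencyCap(raw) {
  const json = raw.replace(/\\/g, ""); // remove the literal backslashes
  const cap = JSON.parse(json).frequencyCapList[0]; // first array element
  return {
    frequencyCapPeriodCount: cap.frequencyCapPeriodCount,
    frequencyCapPeriodType: cap.frequencyCapPeriodType,
    frequencyCapCount: cap.frequencyCapCount,
  };
}

// String.raw keeps the backslashes exactly as they appear in the field.
const rawField = String.raw`{\"frequencyCapList\":[{\"frequencyCapped\":true,\"frequencyCapPeriodCount\":1,\"frequencyCapPeriodType\":\"DAYS\",\"frequencyCapCount\":501}]}`;

console.log(parseFrequencyCap(rawField));
```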
I've got a regex that is responsible for matching the pattern A:B in lines where you might have multiple matches (e.g. "A:B A: B A : B A:B"). The problem lies in the complexity of what A represents.
I'm using the regex:
\b[\w|\(|\)+]+\s*:(?:(?![\w+]+\s*:).)*
to match items in:
Data_1: Tutor Elementary: 10 a F Test: 7.87 sips
Turning 1 Data (A Run), Data: 0.0 10.0 10.0 17.3 0.0
Turning 2 Data (A Run), Data2: 0.0 6.8 0.0 6.8 6.8
Data_1: Tutor Pool: Data2: A B C
Turning 2 (A Run), ABSOLUTE: 368 337 428 0 2 147
Data_4 : 4AZE Localization : 33.14 lat -86 long
Time: 0.75 Data Scenario: 3121.2
The question is this: if you examine this setup (I use https://regex101.com/), lines 2, 3 and 5 don't return exactly what I'm looking for. Where the match is the first in the line, I want it to grab everything from the beginning of the line up to the first ':'. Is this type of conditional regex possible? I've tried every way I could think of, but I haven't been successful yet.
Thanks in advance!
It's a little complex, but try this:
^(.*?:.*?)(\b\w+\b\s*:.*?)\b\w+\b:.*$|^(.*?:.*?)\b\w+\b\s*:(.*?)$|^(.*)$
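If a single pattern proves hard to maintain, a different approach (not the regex above) is to split each line in code. The sketch below assumes that the first key is everything up to the first ':' and that every later key is a single word directly before a ':'; the function name parsePairs is hypothetical, and values that themselves contain a colon would be split incorrectly.

```javascript
// Hypothetical helper: split one line into [key, value] pairs.
function parsePairs(line) {
  const firstColon = line.indexOf(":");
  if (firstColon === -1) return [];

  const pairs = [];
  // First key: everything from the start of the line to the first colon.
  let key = line.slice(0, firstColon).trim();
  let rest = line.slice(firstColon + 1);

  // Later keys: one word (letters, digits, _, parentheses) right before a colon.
  const nextKey = /([\w()]+)\s*:/;
  let m;
  while ((m = nextKey.exec(rest)) !== null) {
    pairs.push([key, rest.slice(0, m.index).trim()]);
    key = m[1];
    rest = rest.slice(m.index + m[0].length);
  }
  pairs.push([key, rest.trim()]);
  return pairs;
}

console.log(parsePairs("Turning 1 Data (A Run), Data: 0.0 10.0 10.0 17.3 0.0"));
console.log(parsePairs("Data_1: Tutor Elementary: 10 a F Test: 7.87 sips"));
```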