I am trying to write a regex that will parse the following Cisco log messages correctly:
<191>45902: DC-SWITCH2: Aug 30 18:15:16.478: %SFF8472-3-THRESHOLD_VIOLATION: Te0/2: Rx power high warning; Operating value: -0.8 dBm, Threshold value: -1.0 dBm.
Desired output:
Te0/2: Rx power high warning; Operating value: -0.8 dBm, Threshold value: -1.0 dBm.
And:
<191>45902: DC-SWITCH2: Aug 31 19:17:30.147: sensor num : 10 sensor_value :33, high :110 low:85
Desired output:
sensor num : 10 sensor_value :33, high :110 low:85
I have developed the following regex for the first case, but I cannot fathom how to make the mnemonic %STRING section optional:
>\d+:\s.+?:\s.+?(?=:\s):\s%.+?(?=:\s):?\s(.+)
It returns the desired result for the first example, but for the second I get:
10 sensor_value :33, high :110 low:85
You want to make the part that checks for the %STRING non-capturing.
Something like this:
>\d+:\s.+?:\s.+?(?=:\s):\s(?:%.+?:)?\s(.+)
See https://regex101.com/r/F30ALK/1
Why not try something generic like
\d{2}:\d{2}:\d{2}.\d{3}.*? (\b[A-Za-z].*)
where the required output will be in the Group 1.
Example shown here
Related
I need to write the regex to fetch the details from the following data
Type Time(s) Ops TPS(ops/s) Net(M/s) Get_miss Min(us) Max(us) Avg(us) Std_dev Geo_dist
Period 5 145443 29088 22.4 37006 352 116302 6600 7692.04 4003.72
Global 10 281537 28153 23.2 41800 281 120023 6797 7564.64 4212.93
The above is the log which i get from a log file
I have tried writing the reg ex to get the details in the table format but could not get.
Below is the reg ex which i tried.
Type[\s+\S+].+\n(?<time>[\d+\S+\s+]+)[\s+\S+].*Period
When it comes to Period keyword the regex fails
If for some reason RichG's suggestion of using multikv doesn't work, the following should:
| rex field=_raw "(?<type>\w+)\s+(?<time>[\d\.]+)\s+(?<ops>[\d\.]+)\s+(?<tps>[\d\.]+)\s+(?<net>[\d\.]+)\s+(?<get_miss>[\d\.]+)\s+(?<min>[\d\.]+)\s+(?<max>[\d\.]+)\s+(?<avg>[\d\.]+)\s+(?<std_dev>[\d\.]+)\s+(?<geo_dist>[\d\.]+)"
Where is your data coming from?
Does pandas have a built-in string matching function for exact matches and not regex? The code below for tropical_two has a slightly higher count. Documentation tells me it does a regex search.
tropical = reviews['description'].map(lambda x: "tropical" in x).sum()
print(tropical)
tropical_two = reviews['description'].str.count("tropical").sum()
print(tropical_two)
The first way is the answer key from Kaggle but something about it seems less readable and intuitive to me compared to a .str function because when I run this it returns True instead of 2 so I am a little confused about if the answer key method is actually counting all occurrences of "tropical" and not just the first.
def in_str(text):
return "tropical" in text
in_str("tropical is tropical")
First 2 lines of dataframe:
0 Italy Aromas include tropical fruit, broom, brimston... Vulkà Bianco 87 NaN Sicily & Sardinia Etna NaN Kerin O’Keefe #kerinokeefe Nicosia 2013 Vulkà Bianco (Etna) White Blend Nicosia
1 Portugal This is ripe and fruity, a wine that is smooth... Avidagos 87 15.0 Douro NaN NaN Roger Voss #vossroger Quinta dos Avidagos 2011 Avidagos Red (Douro) Portuguese Red Quinta dos Avidagos
Notebook here, tropical code in cell #2
https://www.kaggle.com/mikexie0/exercise-summary-functions-and-maps
You may use str.count with word boundary markers to match the exact search term:
tropical_two = reviews['description'].str.count(r'\btropical\b').sum()
print(tropical_two)
There may not be the need for a separate exact API, as str.count can be used for exact matches as well.
Apologize for the way I describe my question, maybe it will be much clarified if I give the instance as below:
Consider the case that I have a certain file, and it is separated into different section with each beginning with a certain string consider as flag: e.g.
consider "From Clock" as the flag I mentioned about
example_file:
From Clock: fdbk_bufg_cell_in_net
To Clock: fdbk_bufg_cell_in_net
Setup : NA Failing Endpoints, Worst Slack NA , Total Violation NA
Hold : NA Failing Endpoints, Worst Slack NA , Total Violation NA
PW : 0 Failing Endpoints, Worst Slack 5.501ns, Total Violation 0.000ns
Pulse Width Checks
Clock Name: fdbk_bufg_cell_in_net
Waveform(ns): { 0.000 3.500 }
Period(ns): 7.000
Sources: { my_atspeed_mmcm/CLKFBOUT }
Check Type Corner Lib Pin Reference Pin Required(ns) Actual(ns) Slack(ns) Location Pin
Min Period n/a BUFGCE/I n/a 1.499 7.000 5.501 BUFGCE_X0Y35 fdbk_bufg_cell/I
From Clock: mmcm_clkout
To Clock: mmcm_clkout
Setup : NA Failing Endpoints, Worst Slack NA , Total Violation NA
Hold : 0 Failing Endpoints, Worst Slack 0.123ns, Total Violation 0.000ns
PW : 625 Failing Endpoints, Worst Slack -0.195ns, Total Violation -121.875ns
Now I only want to match the "PW :" line in "From Clock: mmcm_clkout" part, how do I do so?
You can try something along these lines:
my $match = 0;
while (<>)
{
if (/^From Clock: (\w+)/)
{
$match = ($1 eq "mmcm_clkout");
} elsif ($match && /^PW: whatever/)
{
# do whatever you want with the line
}
}
I've got a regex that is responsible for matching the pattern A:B in lines where you might have multiple matches (i.e. "A:B A: B A : B A:B", etc.) The problem lies in the complexity of what A represents.
I'm using the regex:
\b[\w|\(|\)+]+\s*:(?:(?![\w+]+\s*:).)*
to match items in:
Data_1: Tutor Elementary: 10 a F Test: 7.87 sips
Turning 1 Data (A Run), Data: 0.0 10.0 10.0 17.3 0.0
Turning 2 Data (A Run), Data2: 0.0 6.8 0.0 6.8 6.8
Data_1: Tutor Pool: Data2: A B C
Turning 2 (A Run), ABSOLUTE: 368 337 428 0 2 147
Data_4 : 4AZE Localization : 33.14 lat -86 long
Time: 0.75 Data Scenario: 3121.2
The question is this, if you examine this setup (I use https://regex101.com/), lines 2,3,5 don't return exactly what I'm looking for. Where the match is the first in the line, I want it to grab everything from the beginning of the line to the first ':'. Is this type of conditional regex possible? I've tried every possible way I could imagine, but I haven't been successful yet.
Thanks in advance!
A little complex, but try this here
^(.*?:.*?)(\b\w+\b\s*:.*?)\b\w+\b:.*$|^(.*?:.*?)\b\w+\b\s*:(.*?)$|^(.*)$
I am using Weka for my internship but I have a little knowledge about data mining. So, maybe someone knows how can I apply the following results on my data-sets to get all data by cluster ? The method that I use now is to compute distances between my attributes and the mean value of each cluster then I classify them by the nearest value. But this method is too rough for me .
=== Run information ===
Scheme:weka.clusterers.EM -I 100 -N -1 -M 1.0E-6 -S 100
Relation: wcet_cluster6 - Copie-weka.filters.unsupervised.attribute.Remove-R1-3,5-weka.filters.unsupervised.attribute.Remove-R5-12
Instances: 467
Attributes: 4
max
alt
stmt
bb
Test mode:evaluate on training data
=== Model and evaluation on training set ===
EM
Number of clusters selected by cross validation: 6
Cluster
Attribute 0 1 2 3 4 5
(0.28) (0.11) (0.25) (0.16) (0.04) (0.17)
==================================================================
max
mean 9.0148 10.9112 11.2826 10.4329 11.2039 10.0546
std. dev. 1.8418 2.7775 3.0263 2.5743 2.2014 2.4614
alt
mean 0.0003 19.6467 0.4867 2.4565 44.191 8.0635
std. dev. 0.0175 5.7685 0.5034 1.3647 10.4761 3.3021
stmt
mean 0.7295 77.0348 3.2439 12.3971 140.9367 33.9686
std. dev. 1.0174 21.5897 2.3642 5.1584 34.8366 11.5868
bb
mean 0.4362 53.9947 1.4895 7.2547 114.7113 22.2687
std. dev. 0.5153 13.1614 0.9276 3.5122 28.0919 7.6968
Time taken to build model (full training data) : 4.24 seconds
=== Model and evaluation on training set ===
Clustered Instances
0 163 ( 35%)
1 50 ( 11%)
2 85 ( 18%)
3 73 ( 16%)
4 18 ( 4%)
5 78 ( 17%)
Log likelihood: -9.09081
Thanks for your help!!
I think no-one can really answer this. Some tips off the top of my head.
You have used the EM clustering algorithm, see animated gif on wikipedia page. From Weka's Documentation Synopsis:
"EM assigns a probability distribution to each instance which
indicates the probability of it belonging to each of the clusters. "
Is this complex output really what you want?
It also selects a number of clusters for you (unless you constrain that number).
In weka 3.7 you can use the unsupervised attribute filter "ClusterMembership" in the Preprocess dialog to replace your dataset with a result of the cluster assignments. You need to select one reference attribute, though. By default it selects the last one. This creates hard-to -interpret output.