regex grouping and decimal plus comma and Time - regex

I need a regex for strings like these:
17 h 13 min - 43,1 km - Ankunft ca. 11:48
17 h 13 min - 1.443,1 km - Ankunft ca. 11:48
13 min - 431 m - Ankunft ca. 11:48
17 h - 3 km - Ankunft ca. 09:04
The result has to be captured in 3 named groups (routearrivalin, routedistance, routeeta):
routearrivalin: 17 h 13 m
routedistance: 43,1 km
routeeta: 11:48
I tried a lot but keep failing. What I have right now is this regex:
(?<routearrivalin>\d+?\s?h?\s?\d+?\s*m).+(?<routedistance>\d\d,\d\s[km|m]*).+(?<routeeta>\d\d:\d\d)
I can't get the distance group to match every possibility I have.
If someone is curious what I'm doing here: the string is the navigation information you get on an Android phone from Google Maps. I want to split the string into the variables mentioned above, handle them with Tasker, and forward them to the display in my speedometer.

So it took a little tweaking, but I have included some extra groups to handle the optional and changing parts and I think this is what you are looking for:
(?<routearrivalin>([\d\s]+h)?([\d\s]*m)?)(in)?[\s-]+(?<routedistance>([\d.,]+\s+(km|m))).+(?<routeeta>\d\d:\d\d)
You can see it working on Regex101 with the examples you have given.
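For anyone who wants to check the pattern outside Regex101, here is a minimal Python sketch. It assumes Python's `re` module, which spells named groups `(?P<name>...)` instead of `(?<name>...)`; the helper name `parse_route` is made up for illustration, and the redundant extra parentheses around the distance are dropped.

```python
import re

# Same pattern as in the answer, rewritten with Python's (?P<name>...) syntax.
ROUTE_RE = re.compile(
    r"(?P<routearrivalin>([\d\s]+h)?([\d\s]*m)?)(in)?[\s-]+"
    r"(?P<routedistance>[\d.,]+\s+(km|m))"
    r".+(?P<routeeta>\d\d:\d\d)"
)

GROUPS = ("routearrivalin", "routedistance", "routeeta")


def parse_route(line):
    """Return the three named groups as a dict, or None if the line does not match."""
    match = ROUTE_RE.search(line)
    if match is None:
        return None
    return {name: match.group(name).strip() for name in GROUPS}


samples = [
    "17 h 13 min - 43,1 km - Ankunft ca. 11:48",
    "17 h 13 min - 1.443,1 km - Ankunft ca. 11:48",
    "13 min - 431 m - Ankunft ca. 11:48",
    "17 h - 3 km - Ankunft ca. 09:04",
]

for sample in samples:
    print(parse_route(sample))
```

Note that routearrivalin ends at "m" (for example "17 h 13 m"), because the trailing "in" of "min" is consumed by the separate `(in)?` group.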

Related

How to isolate specific words and years in group 1 no matter the combination

I got some values coming in with various look such as
autumn ux-s 2021
2021 pes-3 autumn P-S
pes-3 autumn 2021 32
autumn usd- fosd 2021 2
I really want to isolate "autumn" and "2021", both in group 1.
Of course "autumn" could also be "spring", "summer" or "winter", and the year part should match whichever year appears.
It doesn't matter whether I get "2021 autumn" or "autumn 2021", as long as I can isolate both within the same group 1.
How could I achieve this? I simply can't see how to keep it within one single group.
I can locate the relevant part with this, but of course it still matches the whole thing:
((?:(?:autumn|spring)(?:\s*[a-zA-Z]*\s*)\d{4})|(?:\d{4}(?:\s*[a-zA-Z]*\s*)(?:autumn|spring)))
Can I somehow extract only the parts I need from here and combine them into a single group result?
I did not find a regex to capture what you want in 1 group but maybe this one-liner solution helps you?
import re
text = ["autumn ux-s 2021", "2021 pes-3 autumn P-S", "pes-3 autumn 2021 32", "autumn usd- fosd 2021 2"]
# Two alternatives: season before year, or year before season.
pattern = r"(autumn|summer|winter|spring).*(\d{4})|(\d{4}).*(autumn|summer|winter|spring)"
# Only one alternative matches, so drop the unmatched (None) groups and join the rest.
print([' '.join(filter(None, re.search(pattern, txt, re.IGNORECASE).groups())) for txt in text])
# Output: ['autumn 2021', '2021 autumn', 'autumn 2021', 'autumn 2021']
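One caveat with the one-liner: `re.search` returns `None` when a value contains no season/year pair, so calling `.groups()` on it raises an `AttributeError`. A small sketch that guards against that case follows; the function name `season_and_year` is only illustrative.

```python
import re

# Same alternation as above: season then year, or year then season.
PATTERN = re.compile(
    r"(autumn|summer|winter|spring).*(\d{4})|(\d{4}).*(autumn|summer|winter|spring)",
    re.IGNORECASE,
)


def season_and_year(value):
    """Return e.g. 'autumn 2021' or '2021 autumn', or None when nothing matches."""
    match = PATTERN.search(value)
    if match is None:
        return None
    # Only one alternative participates, so skip the groups that are None.
    return " ".join(group for group in match.groups() if group)


print(season_and_year("2021 pes-3 autumn P-S"))
print(season_and_year("no match here"))
```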

How do I create a pivot table with weighted averages from a table in PowerBI?

I have data in the following format:
| Building | Tenant | Type | Floor | Sq Ft | Rent | Term Length |
|---|---|---|---|---|---|---|
| 1 Example Way | Jeff | Renewal | 5 | 100 | 100 | 6 |
| 47 Fake Street | Tom | New | 3 | 500 | 200 | 12 |
I need to create a visualisation in PowerBI that displays a pivot table of attribute by tenant, with a weighted-average (by square foot) column, like this:

| | Jeff | Tom | Weighted Average (by Sq Ft) |
|---|---|---|---|
| Building | 1 Example Way | 47 Fake Street | - |
| Type | Renewal | New | - |
| Floor | 5 | 3 | - |
| Sq Ft | 100 | 500 | 433.3333333 |
| Rent | 100 | 200 | 183.3333333 |
| Term Length (months) | 6 | 12 | 11 |
I have unpivoted the original data, like this:
| Tenant | Attribute | Value |
|---|---|---|
| Jeff | Building | 1 Example Way |
| Jeff | Type | Renewal |
| Jeff | Floor | 5 |
| Jeff | Sq Ft | 100 |
| Jeff | Rent | 100 |
| Jeff | Term Length (months) | 6 |
| Tom | Building | 47 Fake Street |
| Tom | Type | New |
| Tom | Floor | 3 |
| Tom | Sq Ft | 500 |
| Tom | Rent | 200 |
| Tom | Term Length (months) | 12 |
I can almost create what I need from the unpivoted data using a matrix (as below), but I can't calculate the weighted averages column from that matrix.
| | Jeff | Tom |
|---|---|---|
| Building | 1 Example Way | 47 Fake Street |
| Type | Renewal | New |
| Floor | 5 | 3 |
| Sq Ft | 100 | 500 |
| Rent | 100 | 200 |
| Term Length (months) | 6 | 12 |
I can also create a table with my attributes as headers (instead of in a column). This displays the right values and lets me calculate weighted averages (as below).
| | Building | Type | Floor | Sq Ft | Rent | Term Length (months) |
|---|---|---|---|---|---|---|
| Jeff | 1 Example Way | Renewal | 5 | 100 | 100 | 6 |
| Tom | 47 Fake Street | New | 3 | 500 | 200 | 12 |
| Weighted Average (by Sq Ft) | - | - | - | 433.3333333 | 183.3333333 | 11 |
However, it's important that these values are displayed vertically instead of horizontally. This is pretty straightforward in Excel, but I can't figure out how to do it in PowerBI. I hope this is clear. Can anyone help?
Thanks!
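For reference, the weighted-average figures in the expected output are sum(value × Sq Ft) / sum(Sq Ft) for each numeric attribute. The sketch below only checks that arithmetic in plain Python; it is not DAX and not a PowerBI solution, and the helper name `weighted_average` is made up for illustration.

```python
# Numeric attributes from the example data; Sq Ft is the weight.
rows = [
    {"Tenant": "Jeff", "Sq Ft": 100, "Rent": 100, "Term Length (months)": 6},
    {"Tenant": "Tom", "Sq Ft": 500, "Rent": 200, "Term Length (months)": 12},
]


def weighted_average(rows, column, weight="Sq Ft"):
    """Average of `column` across rows, weighted by the `weight` column."""
    total_weight = sum(row[weight] for row in rows)
    return sum(row[column] * row[weight] for row in rows) / total_weight


for column in ("Sq Ft", "Rent", "Term Length (months)"):
    print(column, round(weighted_average(rows, column), 7))
```

For Rent, for example, this is (100 × 100 + 200 × 500) / (100 + 500) = 110000 / 600 ≈ 183.33.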

PowerBI running Total formula

I have a dataset OvertimeHours with EMPLID, CheckDate and NumberOfHours (and other fields). I need a running total of NumberOfHours for each employee by CheckDate. I tried the Quick Measure option, but that only allows a single column and I have two. I do not want the measure to recalculate when filters are applied.
Ultimately, I am trying to identify the records that make up the first 6 hours of overtime worked on each check so that they get a category of OCB; all overtime beyond the first 6 hours is OTP. It does not have to be exact (as the output below shows). I have only been working with Power BI for about a month, and this is a pretty complex formula (for me) to figure out...
EMPLID CheckDate WkDate NumberOfHours RunningTotal Category
124 1/1/19 12/20/18 5 5 OCB
124 1/1/19 12/21/18 9 14 OTP
125 1/1/19 12/20/18 3 3 OCB
125 1/1/19 12/20/18 2 5 OCB
125 1/1/19 12/22/18 2 7 OTP
124 1/15/19 1/8/19 3 3 OCB
*Edited to add the WkDate.
Edit:
I have tweaked my query so that I have the running total and a sequential counter now:
Using the first 12 records, I am looking to get the following results:
I can either do it in a query if that is the easiest way or if there is a way to use DAX in PowerBI with this dataset now that I have the sequential piece, I can do that too.
I got it in the query:
select r.CheckDate,
       r.EMPLID,
       case
           when PayrollRunningOTHours <= 6
               then PayrollRunningOTHours
           else 6
       end as OCBHours,
       case
           when PayRollRunningOTHours > 6
               then PayRollRunningOTHours - 6
       end as OTPHours
from #rollingtotal r
inner join lastone l
    on r.CheckDate = l.CheckDate
    and r.EMPLID = l.EMPLID
    and r.OTCounter = l.lastRec
order by r.emplid,
         r.CheckDate,
         r.OTCounter
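To make the logic easier to follow outside SQL, here is a rough Python sketch of the same idea using the sample rows from the question. It assumes rows arrive already ordered within each (EMPLID, CheckDate) pair; the names `categorize` and `split_per_check` are illustrative only. Unlike the query, which leaves OTPHours null when the total is 6 or less, the sketch returns 0.

```python
from collections import defaultdict

# (EMPLID, CheckDate, WkDate, NumberOfHours) rows from the question, in order.
rows = [
    (124, "1/1/19", "12/20/18", 5),
    (124, "1/1/19", "12/21/18", 9),
    (125, "1/1/19", "12/20/18", 3),
    (125, "1/1/19", "12/20/18", 2),
    (125, "1/1/19", "12/22/18", 2),
    (124, "1/15/19", "1/8/19", 3),
]

OCB_LIMIT = 6  # first 6 overtime hours per check count as OCB


def categorize(rows, limit=OCB_LIMIT):
    """Append a running total per (EMPLID, CheckDate) and an OCB/OTP category."""
    running = defaultdict(int)
    result = []
    for emplid, checkdate, wkdate, hours in rows:
        running[(emplid, checkdate)] += hours
        total = running[(emplid, checkdate)]
        category = "OCB" if total <= limit else "OTP"
        result.append((emplid, checkdate, wkdate, hours, total, category))
    return result


def split_per_check(rows, limit=OCB_LIMIT):
    """Per (EMPLID, CheckDate): (OCB hours, OTP hours), like the query's last row."""
    totals = defaultdict(int)
    for emplid, checkdate, _wkdate, hours in rows:
        totals[(emplid, checkdate)] += hours
    return {key: (min(total, limit), max(total - limit, 0)) for key, total in totals.items()}


for row in categorize(rows):
    print(row)
print(split_per_check(rows))
```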

Pandas: select rows from columns using Regex

I want to select the rows whose feccandid value starts with H or S:
cid amount date catcode feccandid
0 N00031317 1000 2010 B2000 H0FL19080
1 N00027464 5000 2009 B1000 H6IA01098
2 N00024875 1000 2009 A5200 S2IL08088
3 N00030957 2000 2010 J2200 S0TN04195
4 N00026591 1000 2009 F3300 S4KY06072
5 N00031317 1000 2010 B2000 P0FL19080
6 N00027464 5000 2009 B1000 P6IA01098
7 N00024875 1000 2009 A5200 S2IL08088
8 N00030957 2000 2010 J2200 H0TN04195
9 N00026591 1000 2009 F3300 H4KY06072
I am using this code:
campaign_contributions.loc[campaign_contributions['feccandid'].astype(str).str.extractall(r'^(?:S|H)')]
Returns error:
ValueError: pattern contains no capture groups
Does anyone with experience using Regex know what I am doing wrong?
Why not just use str.match instead of extract and negate?
i.e. df[df['col'].str.match(r'^(S|H)')]
(I came here looking for the same answer, but the use of extract seemed odd, so I looked up the docs for the str operations.)
For something this simple, you can bypass the regex:
relevant = campaign_contributions.feccandid.str.startswith('H') | \
campaign_contributions.feccandid.str.startswith('S')
campaign_contributions[relevant]
However, if you want to use a regex, you can change this to
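relevant = campaign_contributions['feccandid'].str.extract(r'^(S|H)', expand=False).notnull()
Note that the astype is redundant, and that extract (rather than extractall) is enough. Passing expand=False makes extract return a Series, so the result can be used directly as a boolean row mask.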

Regular Expression to capture a TMatchCollection of Paragraphs (Using Delphi XE 6)

I'm trying to capture a collection of paragraphs that look like those shown below.
I would like to capture each paragraph in a separate collection. I have figured out how to capture each line independently, but not the full paragraph.
I'm using PCRE engine.
Any help would be greatly appreciated. I think there may also be line breaks at the end of each line, if that makes a difference. Some paragraphs may be 5 lines long, or as short as 2 lines.
FORECAST VALID 04/0000Z 33.8N 77.3W
MAX WIND 85 KT...GUSTS 105 KT.
64 KT... 20NE 20SE 0SW 20NW.
50 KT... 40NE 50SE 20SW 40NW.
34 KT...100NE 110SE 70SW 60NW.
FORECAST VALID 04/1200Z 36.3N 74.4W
MAX WIND 90 KT...GUSTS 110 KT.
64 KT... 30NE 30SE 0SW 20NW.
50 KT... 50NE 50SE 30SW 40NW.
34 KT...100NE 110SE 80SW 70NW.
FORECAST VALID 05/0000Z 39.4N 70.2W
MAX WIND 60 KT...GUSTS 75 KT.
50 KT... 60NE 80SE 60SW 60NW.
34 KT...100NE 130SE 110SW 90NW.
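One possible approach, sketched in Python rather than Delphi: match from each line that starts with "FORECAST VALID" up to (but not including) the next such line or the end of the text. This assumes every paragraph begins with "FORECAST VALID", which holds for the sample but is not confirmed for the full data. In PCRE the same idea would presumably use multiline and dot-matches-newline modes (inline `(?ms)`) and `\z` in place of Python's `\Z`; that variant is untested here.

```python
import re

REPORT = """\
FORECAST VALID 04/0000Z 33.8N 77.3W
MAX WIND 85 KT...GUSTS 105 KT.
64 KT... 20NE 20SE 0SW 20NW.
50 KT... 40NE 50SE 20SW 40NW.
34 KT...100NE 110SE 70SW 60NW.
FORECAST VALID 04/1200Z 36.3N 74.4W
MAX WIND 90 KT...GUSTS 110 KT.
64 KT... 30NE 30SE 0SW 20NW.
50 KT... 50NE 50SE 30SW 40NW.
34 KT...100NE 110SE 80SW 70NW.
FORECAST VALID 05/0000Z 39.4N 70.2W
MAX WIND 60 KT...GUSTS 75 KT.
50 KT... 60NE 80SE 60SW 60NW.
34 KT...100NE 130SE 110SW 90NW.
"""

# Lazy match from a "FORECAST VALID" line start to the next one or end of text.
PARAGRAPH_RE = re.compile(
    r"^FORECAST VALID.*?(?=^FORECAST VALID|\Z)",
    re.MULTILINE | re.DOTALL,
)


def split_forecasts(text):
    """Return one string per forecast paragraph, without trailing line breaks."""
    return [match.group(0).strip() for match in PARAGRAPH_RE.finditer(text)]


for paragraph in split_forecasts(REPORT):
    print(paragraph)
    print("---")
```

Each match would correspond to one item in the match collection, regardless of how many lines the paragraph has.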