how do I use f-string with regex in Python - regex

This code works if I use raw strings only. However, as soon as I add f to r it stops working.
Is there a way to make f-strings work with raw strings for re?
import re
lines = '''
04/20/2009; 04/20/09; 4/20/09; 4/3/09
Mar-20-2009; Mar 20, 2009; March 20, 2009; Mar. 20, 2009; Mar 20 2009;
20 Mar 2009; 20 March 2009; 20 Mar. 2009; 20 March, 2009
Mar 20th, 2009; Mar 21st, 2009; Mar 22nd, 2009
Feb 2009; Sep 2009; Oct 2010
6/2008; 12/2009
2009; 2010
'''
rmonth = 'a'
regex = fr'(\d{1,2})/(\d{1,2})/(\d{4}|\d{2})'
date_found = re.findall(regex, lines)
date_found

f-strings treat curly braces as replacement fields, so {1,2} in fr'...' is evaluated as the Python tuple (1, 2) and the regex quantifier is lost. You can escape the braces you want to keep in the output by doubling them:
regex = fr'(\d{{1,2}})/(\d{{1,2}})/(\d{{4}}|\d{{2}})'
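A minimal sketch of the doubled-brace approach, assuming you also want to interpolate a value into the pattern (the variable name sep is hypothetical and not part of the original question). Single braces are replacement fields; doubled braces survive as literal regex quantifiers:

```python
import re

lines = "04/20/2009; 04/20/09; 4/20/09; 4/3/09; 6/2008"

# Hypothetical value to interpolate; re.escape guards against regex metacharacters.
sep = re.escape("/")

# Without doubling, {1,2} is evaluated as a tuple and formatted as "(1, 2)".
broken = fr'\d{1,2}'

# Doubled braces become literal braces; {sep} is still interpolated.
regex = fr'(\d{{1,2}}){sep}(\d{{1,2}}){sep}(\d{{4}}|\d{{2}})'

date_found = re.findall(regex, lines)
print(regex)
print(date_found)
```

If nothing needs to be interpolated, a plain raw string r'...' is simpler and avoids the escaping altogether.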

Related

How to have a measure always show the ratio as percentage in scorecard/textbox POWER BI

There are many ways to have a measure show a percentage in a table column,
but I cannot find a way to always show the ratio of a SPECIFIC group as a percentage of the total of the two categories.
data sample:
YEAR MONTH TYPE AMOUNT
2020 Jan A 100
2020 Feb A 250
2020 Mar A 230
2020 Jan B 158
2020 Feb B 23
2020 Mar B 46
2019 Jan A 499
2019 Feb A 65
2019 Mar A 289
2019 Jan B 465
2019 Feb B 49
2019 Mar B 446
2018 Jan A 13
2018 Feb A 97
2018 Mar A 26
2018 Jan B 216
2018 Feb B 264
2018 Mar B 29
2018 Jan A 314
2018 Feb A 659
2018 Mar A 226
2018 Jan B 469
2018 Feb B 564
2018 Mar B 164
My goal is to always show the percentage of A compared with the total amount.
YEAR and MONTH are used to synchronize with the slicer.
e.g. if I select YEAR = 2020 and MONTH = Jan:
100/258 ≈ 38.8%
Manually inputted in textbox
First, create the following 3 measures in your table:
1.
amount_A =
CALCULATE(
    SUM(pie_chart_data[AMOUNT]),
    FILTER(
        ALLSELECTED(pie_chart_data),
        pie_chart_data[TYPE] = "A"
    )
)
2.
amount_overall =
CALCULATE(
    SUM(pie_chart_data[AMOUNT]),
    ALLSELECTED(pie_chart_data)
)
3.
amount_A_percentage = [amount_A]/[amount_overall]
Now add both measures, amount_A and amount_overall, to the Values field of your donut chart. Then place the amount_A_percentage measure in a Card and position the card at the center of the donut chart.
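To check the arithmetic behind these measures outside Power BI, here is a rough Python sketch. It is not DAX and not part of the original answer; the function name share_of_type and its year/month arguments are hypothetical stand-ins for the slicer selection. It uses only the 2020 rows from the sample data:

```python
# 2020 rows copied from the sample data: (year, month, type, amount).
rows = [
    (2020, "Jan", "A", 100),
    (2020, "Feb", "A", 250),
    (2020, "Mar", "A", 230),
    (2020, "Jan", "B", 158),
    (2020, "Feb", "B", 23),
    (2020, "Mar", "B", 46),
]

def share_of_type(rows, wanted_type, year=None, month=None):
    """Return the amount of wanted_type divided by the total amount in the selection."""
    # Mimic the slicer: None means "no filter on this column".
    selected = [
        r for r in rows
        if (year is None or r[0] == year) and (month is None or r[1] == month)
    ]
    total = sum(r[3] for r in selected)
    if total == 0:
        return None  # empty selection; avoid dividing by zero
    amount = sum(r[3] for r in selected if r[2] == wanted_type)
    return amount / total

ratio = share_of_type(rows, "A", year=2020, month="Jan")
label = f"{ratio:.1%}"
print(label)
```

For YEAR = 2020 and MONTH = Jan this divides 100 by 258, the same ratio the amount_A_percentage measure is meant to show.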

Substituting with regex in python does not work

I have a long string containing different dummy dates, and I want to replace part of a mail multiple times, once for each date value in my dummy_dates string. The code below returns a single result, substituted with the last date in dummy_dates. How can I do this for every date value in dummy_dates? I cannot iterate over the string. Also, when I write my string to a .txt file, only the last item is written.
dummy_dates = """14 B-Laycan Dec I-Laycan 2012 I-Laycan
15 B-Laycan Jan I-Laycan 2013 I-Laycan
16 B-Laycan Feb I-Laycan 2014 I-Laycan
17 B-Laycan Mar I-Laycan 2014 I-Laycan"""
dummy1= re.sub(r"((\d{1,2})([\w\W]{1,10})( B-Laycan)+(([\w\W]{1,10}( I-Laycan)+){0,6}))", dummy_dates ,mail)
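One possible approach, sketched under assumptions: split dummy_dates into lines and run one substitution per date, collecting every result instead of overwriting a single variable. The mail text below is hypothetical (the question does not show it), and the helper name substitute_each is made up for illustration. The pattern is the one from the question, unchanged:

```python
import re

dummy_dates = """14 B-Laycan Dec I-Laycan 2012 I-Laycan
15 B-Laycan Jan I-Laycan 2013 I-Laycan
16 B-Laycan Feb I-Laycan 2014 I-Laycan
17 B-Laycan Mar I-Laycan 2014 I-Laycan"""

# Hypothetical mail text; replace with the real one.
mail = "Vessel arrives 10 B-Laycan Nov I-Laycan 2011 I-Laycan at the port"

# Pattern taken from the question as-is.
pattern = re.compile(
    r"((\d{1,2})([\w\W]{1,10})( B-Laycan)+(([\w\W]{1,10}( I-Laycan)+){0,6}))"
)

def substitute_each(mail_text, dates_text):
    """Return one copy of mail_text per date, with the date part replaced."""
    results = []
    for date in dates_text.splitlines():
        date = date.strip()
        if not date:
            continue
        # A function replacement inserts the date literally (no backslash processing).
        results.append(pattern.sub(lambda m, d=date: d, mail_text))
    return results

variants = substitute_each(mail, dummy_dates)

# Join all variants before writing, so the file holds every item, not just the last.
output_text = "\n".join(variants)
print(output_text)
```

Writing output_text once (or opening the file once and writing inside the loop) avoids the situation where only the last item ends up in the .txt file.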

python program to grep the output of a file with a time range

I am using python 2.6.6
I have a sample file 1.csv
1.csv
11887788201606180000 value=1 sat sun mon tue , 998848494 992920209 992828282 kdkkdkdf 992828228 o333448482
28283838201606180000 value-2 jan feb mar apr , 8849494994 49499494 499494949 49949494 499494484 449494994
33838383201606180000 value-2 jan feb mar apr , 8849494994 49499494 499494949 49949494 499494484 449494994
47474747201606190000 value-2 jan feb mar apr , 8849494994 49499494 499494949 49949494 499494484 449494994
47474747201606200000 value-2 jan feb mar apr , 8849494994 49499494 499494949 49949494 499494484 449494994
I want to get the data in the time range 20160618 to 20160619,
and my expected output should look like this:
11887788201606180000 value=1 sat sun mon tue , 998848494 992920209 992828282 kdkkdkdf 992828228 o333448482
28283838201606180000 value-2 jan feb mar apr , 8849494994 49499494 499494949 49949494 499494484 449494994
33838383201606180000 value-2 jan feb mar apr , 8849494994 49499494 499494949 49949494 499494484 449494994
47474747201606190000 value-2 jan feb mar apr , 8849494994 49499494 499494949 49949494 499494484 449494994
The code I have written is:
import csv
import sys
import time
import datetime
if __name__ == '__main__':
    from_raw = raw_input('\nEnter From date :')
    from_date = datetime
    print 'From date: = ' + str(from_date)
    to_raw = raw_input('\nEnter TO Date :')
    to_date = datetime
    in_file = './file.csv'
    for line in in_file:
        fields = line.split(',')
        found_date = datetime.date
        if from_date <= found_date <= to_date:
            print line
    in_file.close()
I am executing it like
python script.py 1.csv
I am able to key in the start date and end date, but I do not get the expected output.
Please help.
From reading your code, the problem is in this line:
fields = line.split(',')
Splitting the line at "," is not what you want, because the date is not a comma-separated field. Since the date substring is always in the same position in the string, an easy solution is to slice it out directly:
found_date = line[8:16]
Also remove the following line:
found_date = datetime.date
This line rebinds found_date to the datetime.date class itself, discarding the value read from the line, which you do not want.
These simple changes should solve the issue as long as the input format is consistent and from_date and to_date hold values that can be compared with the sliced string.
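The slicing idea can be sketched as follows. This is a Python 3 sketch (the question uses Python 2.6), the function name filter_by_date is hypothetical, and it assumes the dates are entered as YYYYMMDD strings, which compare correctly as plain strings because they have a fixed width:

```python
def filter_by_date(lines, from_date, to_date):
    """Yield lines whose embedded YYYYMMDD stamp (characters 8 to 15) is in range."""
    for line in lines:
        found_date = line[8:16]
        if from_date <= found_date <= to_date:
            yield line

# Sample lines copied from 1.csv in the question.
sample = [
    "11887788201606180000 value=1 sat sun mon tue , 998848494 992920209 992828282 kdkkdkdf 992828228 o333448482",
    "28283838201606180000 value-2 jan feb mar apr , 8849494994 49499494 499494949 49949494 499494484 449494994",
    "33838383201606180000 value-2 jan feb mar apr , 8849494994 49499494 499494949 49949494 499494484 449494994",
    "47474747201606190000 value-2 jan feb mar apr , 8849494994 49499494 499494949 49949494 499494484 449494994",
    "47474747201606200000 value-2 jan feb mar apr , 8849494994 49499494 499494949 49949494 499494484 449494994",
]

selected = list(filter_by_date(sample, "20160618", "20160619"))
for line in selected:
    print(line)
```

To read from the real file, the same function can be fed an open file object instead of the list, since iterating over a file yields its lines. Note that the original code iterates over the path string itself, which yields single characters rather than lines.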

get mean of values that fit a specific criteria (pattern matching)

I asked this question before and got a reply that solved it for me. I have a dataframe that looks like this:
id weekdays halflife
241732222300860000 Friday, Aug 31, 2012, 22 0.4166666667
241689170123309000 Friday, Aug 31, 2012, 19 0.3833333333
241686878137512000 Friday, Aug 31, 2012, 19 0.4
241651117396738000 Friday, Aug 31, 2012, 16 1.5666666667
241635163505820000 Friday, Aug 31, 2012, 15 0.95
241633401382265000 Friday, Aug 31, 2012, 15 2.3666666667
And I would like to get average half life of items that were created on Monday, then on Tuesday...etc. (My date range spans over 6 months).
To get the date values I used strptime and difftime. Also, I found the maximum halflife with max(df$halflife), how can I find which id it corresponds to?
Reproducible code:
structure(list(id = c(241732222300860416, 241689170123309056,
241686878137511936, 241651117396738048, 241635163505819648, 241633401382264832
), weekdays = c("Friday, Aug 31, 2012, 22", "Friday, Aug 31, 2012, 19",
"Friday, Aug 31, 2012, 19", "Friday, Aug 31, 2012, 16", "Friday, Aug 31, 2012, 15",
"Friday, Aug 31, 2012, 15"), halflife = structure(c(0.416666666666667,
0.383333333333333, 0.4, 1.56666666666667, 0.95, 2.36666666666667
), class = "difftime", units = "mins")), .Names = c("id",
"weekdays", "halflife"), row.names = c(NA, 6L), class = "data.frame")
So now I have an average half life value for all Mondays, Tuesdays, etc. How can I get the average value for each hour within those weekdays, i.e. the average half life of all items that were created on all Mondays at 9am, then 10am, then 11am, etc., and then on Tuesdays at 9am, 10am, 11am, etc.? The dates in the weekdays column are formatted so that the last number after the comma is the hour the item was created at. I am really bad with regular expressions and pattern matching, which is why I am asking this follow-up question.
With base packages you can do the following.
> mydf
id weekdays halflife
1 2.417322e+17 Friday, Aug 31, 2012, 22 0.4166667 mins
2 2.416892e+17 Friday, Aug 31, 2012, 19 0.3833333 mins
3 2.416869e+17 Friday, Aug 31, 2012, 19 0.4000000 mins
4 2.416511e+17 Friday, Aug 31, 2012, 16 1.5666667 mins
5 2.416352e+17 Friday, Aug 31, 2012, 15 0.9500000 mins
6 2.416334e+17 Friday, Aug 31, 2012, 15 2.3666667 mins
Instead of using regex, we can just use strsplit on each element of weekdays, unlist the result, put it back into a 4-column matrix, and cbind that matrix with mydf.
> mydf2 <- cbind(mydf, matrix(unlist(sapply(mydf$weekdays, strsplit, split=',')), byrow=TRUE, ncol=4, dimnames=list(1:nrow(mydf), c('Weekday', 'Day', 'Year', 'Hour'))))
> mydf2
id weekdays halflife Weekday Day Year Hour
1 2.417322e+17 Friday, Aug 31, 2012, 22 0.4166667 mins Friday Aug 31 2012 22
2 2.416892e+17 Friday, Aug 31, 2012, 19 0.3833333 mins Friday Aug 31 2012 19
3 2.416869e+17 Friday, Aug 31, 2012, 19 0.4000000 mins Friday Aug 31 2012 19
4 2.416511e+17 Friday, Aug 31, 2012, 16 1.5666667 mins Friday Aug 31 2012 16
5 2.416352e+17 Friday, Aug 31, 2012, 15 0.9500000 mins Friday Aug 31 2012 15
6 2.416334e+17 Friday, Aug 31, 2012, 15 2.3666667 mins Friday Aug 31 2012 15
Now that we have split the weekdays column appropriately, we can use the aggregate function to calculate the mean over the desired grouping columns.
> aggregate(halflife ~ Weekday, data=mydf2, FUN = mean)
Weekday halflife
1 Friday 1.013889
If you want to group by Weekday as well as Hour then
> aggregate(halflife ~ Weekday + Hour, data=mydf2, FUN = mean)
Weekday Hour halflife
1 Friday 15 1.6583333
2 Friday 16 1.5666667
3 Friday 19 0.3916667
4 Friday 22 0.4166667
The first parameter of the aggregate function here is a formula object, which supports one ~ one, one ~ many, many ~ one, and many ~ many relationships. See the examples in ?aggregate to understand how to use it.
Here is a brief example of how to use a many ~ many relationship.
> set.seed(12345)
> mydf2 <- cbind(mydf2, newvar = rnorm(nrow(mydf2)))
> mydf2
id weekdays halflife Weekday Day Year Hour newvar
1 2.417322e+17 Friday, Aug 31, 2012, 22 0.4166667 mins Friday Aug 31 2012 22 0.5855288
2 2.416892e+17 Friday, Aug 31, 2012, 19 0.3833333 mins Friday Aug 31 2012 19 0.7094660
3 2.416869e+17 Friday, Aug 31, 2012, 19 0.4000000 mins Friday Aug 31 2012 19 -0.1093033
4 2.416511e+17 Friday, Aug 31, 2012, 16 1.5666667 mins Friday Aug 31 2012 16 -0.4534972
5 2.416352e+17 Friday, Aug 31, 2012, 15 0.9500000 mins Friday Aug 31 2012 15 0.6058875
6 2.416334e+17 Friday, Aug 31, 2012, 15 2.3666667 mins Friday Aug 31 2012 15 -1.8179560
> aggregate(cbind(newvar,halflife) ~ Weekday + Hour, data=mydf2, FUN = mean)
Weekday Hour newvar halflife
1 Friday 15 -0.6060343 1.6583333
2 Friday 16 -0.4534972 1.5666667
3 Friday 19 0.3000814 0.3916667
4 Friday 22 0.5855288 0.4166667
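For readers working in Python rather than R, the same split-then-aggregate idea can be sketched with the standard library only. This is not part of the original answer; the function name mean_halflife_by is hypothetical, and the rows are the six sample rows from the question:

```python
from collections import defaultdict
from statistics import mean

# (weekdays, halflife in minutes) copied from the sample data frame.
rows = [
    ("Friday, Aug 31, 2012, 22", 0.4166666667),
    ("Friday, Aug 31, 2012, 19", 0.3833333333),
    ("Friday, Aug 31, 2012, 19", 0.4),
    ("Friday, Aug 31, 2012, 16", 1.5666666667),
    ("Friday, Aug 31, 2012, 15", 0.95),
    ("Friday, Aug 31, 2012, 15", 2.3666666667),
]

def mean_halflife_by(rows, key_fields):
    """Split each weekdays string on commas and average halflife per group."""
    groups = defaultdict(list)
    for weekdays, halflife in rows:
        weekday, day, year, hour = [part.strip() for part in weekdays.split(",")]
        fields = {"Weekday": weekday, "Day": day, "Year": year, "Hour": hour}
        key = tuple(fields[name] for name in key_fields)
        groups[key].append(halflife)
    return {key: mean(values) for key, values in groups.items()}

by_weekday = mean_halflife_by(rows, ["Weekday"])
by_weekday_hour = mean_halflife_by(rows, ["Weekday", "Hour"])
for key, value in sorted(by_weekday_hour.items()):
    print(key, round(value, 7))
```

Grouping by ["Weekday"] corresponds to halflife ~ Weekday, and grouping by ["Weekday", "Hour"] corresponds to halflife ~ Weekday + Hour.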

javascript regular expression: how do I find date without year or date with year<2010

I need to find date without year, or date with year<2010.
basically,
Feb 15
Feb 20
Feb 20, 2009
Feb 20, 1995
should be accepted
Feb 20, 2010
Feb 20, 2011
should be rejected
How do I do it?
Thanks,
Cheng
Try this:
(Jan|Feb|Mar...Dec)\s\d{1,2},\s([1][0-9][0-9][0-9]|200[0-9])
Note: Expand the month list with the proper names. I was too lazy to spell it all out.
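As written, the pattern above requires a year, so "Feb 15" would not match. A possible variant, sketched here in Python (the question is about JavaScript, but this particular syntax is the same in both regex dialects), makes the year part optional and anchors the match so that a trailing ", 2010" cannot be ignored. The three-letter month abbreviations are an assumption about the input format:

```python
import re

# Assumed month abbreviations; adjust if full month names are used.
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Optional year: 1000-1999 or 2000-2009. fullmatch anchors both ends.
DATE_RE = re.compile(rf"({MONTHS})\s\d{{1,2}}(,\s(1\d{{3}}|200\d))?")

def is_accepted(text):
    """Return True for a date without a year or with a year before 2010."""
    return DATE_RE.fullmatch(text) is not None

for sample in ["Feb 15", "Feb 20", "Feb 20, 2009", "Feb 20, 1995",
               "Feb 20, 2010", "Feb 20, 2011"]:
    print(sample, is_accepted(sample))
```

In JavaScript the same pattern would need explicit ^ and $ anchors, since there is no direct equivalent of fullmatch.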