Regular expression for numeric range - regex

Looking for a regular expression to cover a number range. More specifically, consider a numeric format:
NN-NN
where N is a digit. So examples are:
04-11
07-12
06-06
I want to be able to specify a range. For example, anything between:
01-27 and 02-03
When I say range, I mean it as if the - were not there. So the range:
01-27 to 02-03
would cover:
01-28, 01-29, 01-30, 01-31 and 02-01
I want the regular expression so that I can plug in values for the range very easily. Any ideas?

Validating dates is not where the strengths of regexes lie.
For example, how would you validate February with respect to leap years?
The solution is to use the date APIs available in your language.
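As a rough sketch of that non-regex approach, the values can be parsed and compared as (month, day) tuples. This assumes the inputs are well-formed MM-DD strings, that the bounds are exclusive (as in the question's example), and that the range does not wrap around the end of the year. The helper names parse_mmdd and in_range are made up for illustration.

```python
def parse_mmdd(text):
    """Split an 'MM-DD' string into a (month, day) tuple of ints."""
    month, day = text.split("-")
    return int(month), int(day)


def in_range(value, start, end):
    """True if value lies strictly between start and end (all 'MM-DD' strings)."""
    # Tuples compare element by element, so month is compared before day.
    return parse_mmdd(start) < parse_mmdd(value) < parse_mmdd(end)


candidates = ["01-28", "01-31", "02-01", "04-11", "01-27"]
print([c for c in candidates if in_range(c, "01-27", "02-03")])
# ['01-28', '01-31', '02-01']
```

Changing the range only means changing the two bound strings. This checks ordering only, so an impossible date such as 02-31 is not rejected.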

'0[12]-[0-3][0-9]' would match all of the required dates; however, it would also match dates outside the range, like 01-03. If you want to match exactly and only the dates in that range, you'll need to do something a little more advanced.
Here's an easily configurable example in Python:
from calendar import monthrange
import re
startdate = (1, 27)
enddate = (2, 3)
# Walk day by day from startdate up to (but not including) enddate
d = startdate
dateList = []
while d != enddate:
    (month, day) = d
    dateList += ['%02i-%02i' % (month, day)]
    daysInMonth = monthrange(2011, month)[1]  # took a random non-leap year
                                              # but you might want to take the current year
    day += 1
    if day > daysInMonth:
        day = 1
        month += 1
        if month > 12:
            month = 1
    d = (month, day)
# Join the individual dates into one alternation
dateRegex = '|'.join(dateList)
testDates = ['01-28', '01-29', '01-30', '01-31', '02-01',
             '04-11', '07-12', '06-06']
isMatch = [re.match(dateRegex, x) is not None for x in testDates]
for i, testDate in enumerate(testDates):
    print(testDate, isMatch[i])
dateRegex looks like this:
'01-27|01-28|01-29|01-30|01-31|02-01|02-02'
And the output is:
01-28 True
01-29 True
01-30 True
01-31 True
02-01 True
04-11 False
07-12 False
06-06 False

It's not completely clear to me, and you didn't mention a language either, but in PHP it would look like this:
if (preg_match('~\d{2}-\d{2}~', $input, $matches)) {
    // do something here
}
Do you have a use case, so we can adjust the code to your needs?

Related

For Loop and If Statement not performing as expected

Here's the code:
# Scrape table data
alltable = driver.find_elements_by_id("song-table")
date = date.today()
simple_year_list = []
complex_year_list = []
dateformat1 = re.compile(r"\d\d\d\d")
dateformat2 = re.compile(r"\d\d\d\d-\d\d-\d\d")
for term in alltable:
    simple_year = dateformat1.findall(term.text)
    for year in simple_year:
        if 1800 < int(year) < date.year:  # Year can't be above what the current year is or below 1800,
            simple_year_list.append(simple_year)  # Might have to be changed if you have a song from before 1800
        else:
            continue
    complex_year = dateformat2.findall(term.text)
    complex_year_list.append(complex_year)
The code uses regular expressions to find four consecutive digits. Since there are multiple 4 digit numbers, I want to narrow it down to between 1800 and 2021 since that's a reasonable time frame. simple_year_list, however, prints out numbers that don't follow the conditions.
You aren't saving the right value here:
simple_year_list.append(simple_year)
You should be saving the year:
simple_year_list.append(year)
I would need more information to help further though. Maybe give us a sample of the data you're working through, and the output you're seeing?
You can do it all in regex.
Add start ^ and end $ anchors, and range restriction via pattern:
dateformat1 = re.compile(r"^(1[89]\d\d|20([01]\d|2[01]))$")
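One caveat when combining a pattern like this with findall on a longer text: the ^ and $ anchors only match when the whole string is a year, and capturing groups make findall return tuples rather than the matched text. A minimal sketch that sidesteps both, assuming word boundaries are an acceptable way to isolate years, uses \b and non-capturing groups. The names YEAR_RE, find_years and the sample text are made up for illustration, and the range is inclusive (1800 to 2021), unlike the strict comparison in the question.

```python
import re

# Years 1800-2021: 18xx or 19xx, then 2000-2019, then 2020-2021.
YEAR_RE = re.compile(r"\b(?:1[89]\d\d|20(?:[01]\d|2[01]))\b")


def find_years(text):
    """Return every standalone four-digit year in 1800..2021 found in text."""
    return [int(y) for y in YEAR_RE.findall(text)]


sample = "Released 1999, remastered 2021, catalog 4821, id 1750, 2022"
print(find_years(sample))
# [1999, 2021]
```

The upper bound is baked into the pattern, so it has to be edited by hand each year; the int comparison from the question avoids that.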

Filter and compare specific data (Power BI/DAX)

I am trying to write an IF formula in Power BI that filters and compares data. For every Client, I want to check, for each unique Transaction ID, whether the Legal firm is the same. If it is the same, it should return Yes; if not, No.
Client           | Transaction ID | Legal firm
American Express | 2295876        | Orrick Herrington
American Express | 2295877        | Orrick Herrington
American Express | 2295878        | Orrick Herrington
Swedbank AB      | 2287074        | Linklaters
Swedbank AB      | 2287074        | Clifford Chance
Swedbank AB      | 2287075        | Clifford Chance
I tried Calculate with distinct count, but it wasn't possible to include if statement.
You should be able to do it with COUNT, and removing the filter context on the Legal Firm using ALLEXCEPT, for example
Measure =
VAR rowCheck = CALCULATE(COUNT(Table1[Legal firm]), ALLEXCEPT(Table1, Table1[Transaction ID]))
VAR textValue = IF(rowCheck = 1, "Yes", "No")
RETURN
textValue
Hope that helps

python Find the most reported month

I am trying to find the most frequently mentioned month (here October, which is mentioned 2 times). I had the idea to use a dictionary to solve this problem. However, I struggled to figure out how to separate out the months, and my solution does not work for the first str value (commented out below), where there are some spaces. Can someone please suggest how I can modify the split section to handle -, commas, and white space?
import re
#str="May-29-1990, Oct-18-1980 ,Sept-1-1980, Oct-2-1990"
str="May-29-1990,Oct-18-1980,Sept-1-1980,Oct-2-1990"
val=re.split(',',str)
monthList=[]
myDictionary={}
#put the months in a list
def sep_month():
    for item in val:
        if not item.isdigit():
            month,day,year=item.split("-")
            monthList.append(month)
#process the month list from above
def count_month():
    for item in monthList:
        if item not in myDictionary.keys():
            myDictionary[item]=1
        else:
            myDictionary[item]=myDictionary.get(item)+1
    for k,v in myDictionary.items():
        if v==2:
            print(k)
sep_month()
count_month()
from datetime import datetime
import calendar
from collections import Counter
datesString = "May-29-1990,Oct-18-1980,Sep-1-1980,Oct-2-1990"
datesListString = datesString.split(",")
datesList = []
for dateStr in datesListString:
    datesList.append(datetime.strptime(dateStr, '%b-%d-%Y'))
monthsOccurrencies = Counter((calendar.month_name[date.month] for date in datesList))
print(monthsOccurrencies)
# Counter({'October': 2, 'May': 1, 'September': 1})
One thing to be aware of in my solution: to parse the month with %b (the month as the locale's abbreviated name), Sept has been changed to Sep. You can use either full month names (%B) or abbreviated names (%b). If you cannot get the input string with correctly formatted month names, just replace the wrong ones (for example, "Sept" with "Sep") and always work with date objects.
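The replacement suggested above could be sketched roughly like this, using the spaced input string from the question. It assumes an English (or default C) locale, so that %b recognises abbreviations such as "Sep" and "Oct", and that "Sept" is the only non-standard abbreviation in the data. The helper name parse_date is made up for illustration.

```python
import calendar
from collections import Counter
from datetime import datetime

dates_string = "May-29-1990, Oct-18-1980 ,Sept-1-1980, Oct-2-1990"


def parse_date(raw):
    """Strip surrounding whitespace and map 'Sept' to 'Sep' before parsing."""
    cleaned = raw.strip().replace("Sept-", "Sep-")
    return datetime.strptime(cleaned, "%b-%d-%Y")


dates = [parse_date(part) for part in dates_string.split(",")]
counts = Counter(calendar.month_name[d.month] for d in dates)
print(counts.most_common(1))
# [('October', 2)]
```

most_common(1) returns the top entry directly, so there is no need to hard-code a count of 2.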
Not sure that regex is the best tool for this job, I would just use strip() along with split() to handle your whitespace issues and get a list of just the month abbreviations. Then you could create a dict with counts by month using the list method count(). For example:
dates = 'May-29-1990, Oct-18-1980 ,Sept-1-1980, Oct-2-1990'
months = [d.split('-')[0].strip() for d in dates.split(',')]
month_counts = {m: months.count(m) for m in set(months)}
print(month_counts)
# {'May': 1, 'Oct': 2, 'Sept': 1}
Or even better with collections.Counter:
from collections import Counter
dates = 'May-29-1990, Oct-18-1980 ,Sept-1-1980, Oct-2-1990'
months = [d.split('-')[0].strip() for d in dates.split(',')]
month_counts = Counter(months)
print(month_counts)
# Counter({'Oct': 2, 'May': 1, 'Sept': 1})

Time Series manipulation

So I have a dataframe that I dump a time series into. The index is the date. I need to do calculations based on date.
For example, I have:
            XRT_Close
Date
2010-01-04      35.94
2010-01-05      36.17
2010-01-06      36.50
...
2015-02-07      36.60
2015-02-08      36.52
How would I go about doing say... Percentage change of beginning to end of the month? How would I construct a loop to cycle through the months?
Any help will be met with huge appreciation. Thank you.
First create year and month columns:
df['year'] = [x.year for x in df.index]
df['month'] = [x.month for x in df.index]
Group by them:
grouped = df.groupby(['year','month'])
Define the function you want to run on the groups:
def PChange(df):
    begin = df['column_name'].iloc[0]
    end = df['column_name'].iloc[-1]
    return (end - begin) / begin * 100  # percentage change relative to the starting value
Apply the function to the groups:
grouped.apply(PChange)
Let me know if it works.
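For reference, here is a compact, self-contained sketch of the same idea that groups directly on the index instead of adding helper columns. The prices are made up for illustration (only the column name XRT_Close comes from the question), the function name monthly_pct_change is hypothetical, and the change is assumed to be measured relative to the first value of each month.

```python
import pandas as pd

# Made-up prices for illustration; the index is a DatetimeIndex as in the question.
df = pd.DataFrame(
    {"XRT_Close": [100.0, 105.0, 110.0, 50.0, 45.0]},
    index=pd.to_datetime(
        ["2010-01-04", "2010-01-05", "2010-01-29", "2010-02-01", "2010-02-26"]
    ),
)


def monthly_pct_change(frame, column):
    """Percentage change from the first to the last row of each (year, month)."""
    frame = frame.sort_index()  # make sure first/last follow date order
    groups = frame.groupby([frame.index.year, frame.index.month])[column]
    first, last = groups.first(), groups.last()
    return (last - first) / first * 100


result = monthly_pct_change(df, "XRT_Close")
print(result)
```

The result is a Series indexed by (year, month), so no explicit loop over months is needed.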

Check if string is of SortableDateTimePattern format

Is there any way I can easily check if a string conforms to the SortableDateTimePattern ("s"), or do I need to write a regular expression?
I've got a form where users can input a copyright date (as a string), and these are the allowed formats:
Year: YYYY (eg 1997)
Year and month: YYYY-MM (eg 1997-07)
Complete date: YYYY-MM-DD (eg 1997-07-16)
Complete date plus hours and minutes: YYYY-MM-DDThh:mmTZD (eg 1997-07-16T19:20+01:00)
Complete date plus hours, minutes and seconds: YYYY-MM-DDThh:mm:ssTZD (eg 1997-07-16T19:20:30+01:00)
Complete date plus hours, minutes, seconds and a decimal fraction of a second: YYYY-MM-DDThh:mm:ss.sTZD (eg 1997-07-16T19:20:30.45+01:00)
I don't have much experience of writing regular expressions so if there's an easier way of doing it I'd be very grateful!
Not thoroughly tested and hence not foolproof, but the following seems to work:
var regex:RegExp = /(?<=\s|^)\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d{2})?)?\+\d{2}:\d{2})?)?)?(?=\s|$)/g;
var test:String = "23 1997 1998-07 1995-07s 1937-04-16 " +
"1970-0716 1993-07-16T19:20+01:01 1979-07-16T19:20+0100 " +
"2997-07-16T19:20:30+01:08 3997-07-16T19:20:30.45+01:00";
var result:Object;
while(result = regex.exec(test))
    trace(result[0]);
Traced output:
1997
1998-07
1937-04-16
1993-07-16T19:20+01:01
2997-07-16T19:20:30+01:08
3997-07-16T19:20:30.45+01:00
I am using ActionScript here, but the regex should work in most flavors. When implementing it in your language, note that the first and last / are delimiters and the last g stands for global.
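As an illustration of porting the idea to another language, here is a rough Python sketch that validates a single input string rather than scanning a longer text. It makes assumptions the question does not spell out: TZD is taken to be either Z or an offset of the form +hh:mm or -hh:mm, and the fraction of a second may have any number of digits. The names ALLOWED_DATE_RE and is_allowed_date are made up for illustration.

```python
import re

# Each optional part is nested inside the previous one, so a month requires
# a year, a day requires a month, and a time requires a full date.
ALLOWED_DATE_RE = re.compile(
    r"\d{4}"                                  # YYYY
    r"(?:-\d{2}"                              # -MM
    r"(?:-\d{2}"                              # -DD
    r"(?:T\d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?"   # Thh:mm[:ss[.s]]
    r"(?:Z|[+-]\d{2}:\d{2}))?"                # TZD (assumed: Z or +/-hh:mm)
    r")?)?"
)


def is_allowed_date(text):
    """True if the whole string has one of the allowed shapes."""
    return ALLOWED_DATE_RE.fullmatch(text) is not None


for candidate in ["1997", "1997-07-16T19:20+01:00", "1995-07s", "1970-0716"]:
    print(candidate, is_allowed_date(candidate))
```

Like the pattern above, this only checks the shape of the string; a value such as 1997-13-40 still passes, so range checks would have to be done separately.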
I'd split the input field into many (one for year, month, day etc.).
You can use JavaScript to advance from one field to the next once it is full (i.e. once four characters are in the year box, move focus to the month box) for smoother entry.
You can then validate each field independently and finally construct the complete date string.