Age checker 18+ with regular expressions - regex

Can you please help me build an age checker with a regular expression? I don't know how to calculate whether the user is 18 or not.
Input: user's birthday (in format of REGEX)
Output: "welcome" or "come back when you will be 18+"
Here is my code for checking whether the format is OK or bad:
import re
import datetime
pattern = re.compile(r"^(0[1-9]|[12][0-9]|3[01])[- \/.,_](0[1-9]|1[012])[- \/.,_](19|20)\d\d")
dob = input('Enter your birthday (dd/mm/yyyy): ')
result = pattern.match(dob)
if pattern.match(dob):
    print("format is ok")
else:
    print("format is bad")
Thank you in advance!!!

Instead of using a regex, just try to create a datetime object from the input: if that works, the format is good; otherwise the input is invalid (see datetime.strptime(date_string, format)).
Once you have it, together with datetime.now(), you can easily calculate the age.
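A minimal sketch of that approach, assuming the dd/mm/yyyy format from the question. The helper names parse_dob, age_on and greeting are made up for illustration, and `today` is passed in only so the result can be checked against a fixed date:

```python
from datetime import date, datetime

def parse_dob(text, fmt="%d/%m/%Y"):
    """Return a date if text matches fmt and is a real calendar date, else None."""
    try:
        return datetime.strptime(text, fmt).date()
    except ValueError:
        return None

def age_on(dob, today):
    """Number of full years between dob and today."""
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)

def greeting(text, today=None):
    today = today or date.today()
    dob = parse_dob(text)
    if dob is None:
        return "format is bad"
    if age_on(dob, today) >= 18:
        return "welcome"
    return "come back when you will be 18+"

print(greeting(input("Enter your birthday (dd/mm/yyyy): ")) if __name__ == "__main__" and False else "")
```

The last line is disabled on purpose so the sketch can be imported without prompting; call greeting(input(...)) directly in a real script. Unlike the regex, strptime also rejects impossible dates such as 31/02/2000.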

Okay, so you have the requirement to do it with a regular expression.
Be aware that this could lead to some edge cases not being covered!
import re
import datetime
pattern = re.compile(r"^(0[1-9]|[12][0-9]|3[01])[- \/.,_](0[1-9]|1[012])[- \/.,_](19|20)\d\d")
dob = input('Enter your birthday (dd/mm/yyyy): ')
result = pattern.match(dob)
if pattern.match(dob):
    print("format is ok")
else:
    print("format is bad")
Okay, the regular expression seems to be valid, except for the capturing group for the year, which only captures the century (19 or 20) instead of all four digits. You could use Regexr or similar services if you need to refine it.
Then you can deconstruct the matched groups to get the day, month and year:
[day, month, year] = result.groups() # As mentioned, year is currently either 19 or 20
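As a sketch of the year fix only (not the age calculation), the year group can wrap all four digits, with the century alternation moved into a non-capturing group. The trailing `$` and the sample input are additions made here for illustration; they are not part of the original pattern:

```python
import re

# Same day and month groups as above; the year group now captures all four digits.
pattern = re.compile(
    r"^(0[1-9]|[12][0-9]|3[01])[- /.,_](0[1-9]|1[012])[- /.,_]((?:19|20)\d\d)$"
)

result = pattern.match("24/02/1999")  # sample input, stands in for dob
day, month, year = (int(part) for part in result.groups())
print(day, month, year)  # 24 2 1999
```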
Then, the next step would be to compare the month against the current month. This will help decide whether to subtract a year or not. If it happens to be the same month, you need to look at the days, too.
Finally, subtract the entered year from the current one (once you have fixed the year capturing group) and do the math.
Since it's an assignment, I won't provide the code for this ;-)

Pandas: Grouping rows by list in CSV file?

In an effort to make our budgeting life a bit easier and to help myself learn, I am creating a small program in Python that takes data from our exported bank CSV.
Here is an example of what I want to do with this data. Say I want to group all of my fast food expenses together. There are many different names with different totals in the Description column, but I want to see it all tabulated as one "Fast Food" expense.
For instance the Csv is setup like this:
Date     Description            Debit  Credit
1/20/20  POS PIN BLAH BLAH ###  1.75   NaN
I figured out how to group them with an or statement:
contains = df.loc[df['Description'].str.contains('food court|whataburger', flags = re.I, regex = True)]
Ultimately, I would like it to read the names from a list. I would like to group all my expenses into categories, each stored in its own variable, so that the output only includes matches from that list.
I tried something like:
fast_food = ['Macdonald', 'Whataburger', 'pizza hut']
That obviously didn't work.
If there is a better way of doing this I am wide open to suggestions.
Also I have looked through quite a few posts here on stack and have yet to find the answer (although I am sure I overlooked it)
Any help would be greatly appreciated. I am still learning.
Thanks
You can assign a new column using str.extract and then groupby:
import re
import pandas as pd
df = pd.DataFrame({"description": ['Macdonald something', 'Whataburger something', 'pizza hut something',
                                   'Whataburger something', 'Macdonald something', 'Macdonald otherthing'],
                   "debit": [1.75, 2.0, 3.5, 4.5, 1.5, 2.0]})
fast_food = ['Macdonald', 'Whataburger', 'pizza hut']
# Pull the first matching name (case-insensitive) into a new column
df["found"] = df["description"].str.extract(f'({"|".join(fast_food)})', flags=re.I, expand=False)
# Sum only the numeric debit column for each matched name
print(df.groupby("found")[["debit"]].sum())
#              debit
# found
# Macdonald     5.25
# Whataburger   6.50
# pizza hut     3.50
Use dynamic pattern building:
import re
fast_food = ['Macdonald', 'Whataburger', 'pizza hut']
pattern = r"\b(?:{})\b".format("|".join(map(re.escape, fast_food)))
contains = df.loc[df['Description'].str.contains(pattern, flags=re.I, regex=True)]
The \b word boundaries make the pattern match whole words only, not parts of words.
re.escape protects special characters so that they are parsed as literal characters.
If \b does not work for you, check the other approaches in "Match a whole word in a string using dynamic regex".
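To extend this to several categories, one option is a dictionary of keyword lists with one pattern per category. The sketch below uses only the standard library so it runs on its own; the category names, keywords and sample rows are invented for illustration. With pandas, a function like categorize could presumably be applied to the Description column via df['Description'].apply(categorize) before grouping.

```python
import re
from collections import defaultdict

# Hypothetical category lists; replace with your own keywords.
categories = {
    "Fast Food": ["Macdonald", "Whataburger", "pizza hut", "food court"],
    "Fuel": ["shell", "exxon"],
}

# One case-insensitive, whole-word pattern per category, built dynamically.
patterns = {
    name: re.compile(r"\b(?:{})\b".format("|".join(map(re.escape, words))), re.I)
    for name, words in categories.items()
}

def categorize(description):
    """Return the first category whose pattern matches, else 'Other'."""
    for name, pattern in patterns.items():
        if pattern.search(description):
            return name
    return "Other"

# Made-up (description, debit) rows standing in for the bank CSV.
rows = [
    ("POS PIN WHATABURGER 123", 1.75),
    ("Pizza Hut #42", 3.50),
    ("SHELL OIL 5567", 20.00),
    ("Bookshop", 12.00),
]

totals = defaultdict(float)
for description, debit in rows:
    totals[categorize(description)] += debit

for name, total in sorted(totals.items()):
    print(name, total)
```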

PARSE_DATETIME formatting with day of year

I'm having an issue with the PARSE_DATETIME function in BigQuery when it is used with the day of year (%j) format element. The function seems to ignore the day of year element.
E.g.
select PARSE_DATETIME("%Y%j", "2013243")
returns 2013-01-01T00:00:00, lacking the day of year component.
However the reverse function with the same date formatting elements works as expected:
select FORMAT_DATETIME("%Y%j", "2013-02-02T00:00:00")
returns: 2013033
Bug? or user error?
Cheers
I think this is a bug that could be fixed! There is no logic in it working one way but not the other.
In the meantime, you can use the following to achieve your goal:
#standardSQL
CREATE TEMP FUNCTION PARSE_DATETIME_WITH_DAYS(x STRING) AS (
  DATETIME_ADD(PARSE_DATETIME('%Y%j', x), INTERVAL CAST(SUBSTR(x, -3) AS INT64) - 1 DAY)
);
SELECT PARSE_DATETIME_WITH_DAYS('2013243')
with the result:
Row  f0_
1    2013-08-31T00:00:00
Neither a bug nor an error! PARSE_DATETIME takes a format_string and a STRING representation of a DATETIME and returns a DATETIME; "2013243" does not represent a DATETIME string, nor a DATE.
To achieve what you are looking for, first take the day number minus 1, add it to the date (the first day of the year), and convert the output to DATETIME:
SELECT DATETIME(DATE_ADD((SELECT PARSE_DATE("%Y%j", "2013243")), INTERVAL CAST((SELECT SUBSTR("2013243", -3)) AS INT64) -1 DAY));
Output:
2013-08-31T00:00:00
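Both workarounds rely on the same arithmetic: January 1st of the year plus (day of year - 1) days. A small Python sketch to cross-check the expected dates outside BigQuery (the function name parse_year_doy is made up for illustration):

```python
from datetime import date, datetime, timedelta

def parse_year_doy(text):
    """Parse 'YYYYDDD' by adding (day of year - 1) days to January 1st."""
    year = int(text[:4])
    day_of_year = int(text[-3:])
    return date(year, 1, 1) + timedelta(days=day_of_year - 1)

print(parse_year_doy("2013243"))  # 2013-08-31

# Python's strptime honours %j when parsing, so it yields the same date.
print(datetime.strptime("2013243", "%Y%j").date())  # 2013-08-31
```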

regex year format authentication

I have a program where the user is asked for the session year which needs to be in the form of 20XX-20XX. The constraint here is that it needs to be a year followed by its next year. Eg. 2019-2020.
For example,
Valid Formats:
2019-2020
2018-2019
2000-2001
Invalid Formats:
2019-2021
2000-2000
2019-2018
I am trying to validate this input using regular expressions.
My work:
import re
def add_pages(matchObject):
    return "{0:0=3d}".format(int(matchObject) + 1)
try:
    a = input("Enter Session")
    p = r'2([0-9]{3})-2'
    p1 = re.compile(p)
    x = add_pages(p1.findall(a)[0])
    p2 = r'2([0-9]{3})-2' + x
    p3 = re.compile(p2)
    l = p3.findall(a)
    if not l:
        raise Exception
    else:
        print("Authenticated")
except Exception as e:
    print("Enter session. Eg. 2019-2020")
Question:
So far I have not been able to come up with a single regex that validates this input. I did have a look at backreferences in regex, but they only solved half of my problem. I am looking for ways to improve this authentication process. Is there a single regex that will check for this constraint? Let me know if you need any more information.
Do you really need to get the session year in one input?
I think it's better to have two inputs (or just automatically set the session year to be the first year + 1).
I don't know if you're aiming for something bigger and this is just an example, but using regex just doesn't seem appropriate for this task to me.
For example you could do this:
print("Enter session year")
first_year = int(input("First year: "))
second_year = int(input("Second year: "))
if second_year != (first_year + 1):
    pass  # some validation
else:
    pass  # program continues
First of all, why regex? Regex is terrible at math. It would be easier to do something like:
def check_years(string):
    # Split "2011-2012" into its two years and compare them numerically
    years = string.split("-")
    return int(years[0]) == (int(years[1]) - 1)
check_years("2011-2012")  # True
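If the 20XX-20XX format should still be enforced, one possible middle ground is to let a regex check the shape and capture both years, and leave the "next year" comparison to ordinary arithmetic. This is a sketch under that assumption; is_valid_session is a made-up name:

```python
import re

# Shape check only: two years of the form 20XX separated by a hyphen.
SESSION = re.compile(r"(20\d{2})-(20\d{2})")

def is_valid_session(text):
    match = SESSION.fullmatch(text.strip())
    if match is None:
        return False
    first, second = map(int, match.groups())
    # The arithmetic constraint is checked in Python, not in the regex.
    return second == first + 1

for sample in ["2019-2020", "2000-2001", "2019-2021", "2000-2000", "2019-2018"]:
    print(sample, is_valid_session(sample))
```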

Split a string using regex or other optimized way

I have a very simple string of the form
YYYYMMDDHHMMSS
Basically a full date/time string. Say an example is
20170224134523
Above implies
year: 2017
month: 02
day:24
hour:13
min:45
sec:23
I want to split it so that I have the parts in variables (year, month, day, hour, min, sec). I want to do this in Scala. I was thinking of using a 6-tuple on the left side and a regex on the right side, but I am not sure what the most efficient and concise way is. I am not very good with regular expressions.
Can anyone help?
I may want each variable in the 6-tuple to be an Option type, since that would also serve as my sanity check: if any variable comes out as None, I want to throw an exception.
java.text.SimpleDateFormat handles this kind of date parsing well.
scala> val sdf = new SimpleDateFormat("yyyyMMddkkmmss")
sdf: java.text.SimpleDateFormat = java.text.SimpleDateFormat@8e10adc0
scala> val date = sdf.parse("20170224134523")
date: java.util.Date = Fri Feb 24 13:45:23 PST 2017
You can get the date, day, hours, etc. from a successfully parsed date, as the tab completion in the REPL shows:
scala> date.get
getClass getDate getDay getHours getMinutes getMonth getSeconds getTime getTimezoneOffset getYear
Further, I'd suggest wrapping the parse call in a Try to handle the successful and unsuccessful parsing.
scala> val date = Try(sdf.parse("20170224134523"))
date: scala.util.Try[java.util.Date] = Success(Fri Feb 24 13:45:23 PST 2017)
scala> val date = Try(sdf.parse("asdf"))
date: scala.util.Try[java.util.Date] = Failure(java.text.ParseException: Unparseable date: "asdf")
Here's the same thing using the newer LocalDateTime instead of Date and its deprecated methods.
LocalDateTime.parse("20170224134523", DateTimeFormatter.ofPattern("yMMddkkmmss"))
java.time.LocalDateTime = 2017-02-24T13:45:23
Because it is a date string, it probably makes sense to use a dedicated date parsing library and parse to a datetime type. Fortunately, Java provides a very good one with the java.time package.
val dateTime = LocalDateTime.parse("20170224134523", DateTimeFormatter.ofPattern("yyyyMMddHHmmss"))
This produces a LocalDateTime object (a date and time without a timezone attached). If you need more complicated string parsing, you can use a DateTimeFormatterBuilder to customize the date format exactly as you need it.
With such a predictable format you can grab it by position using a substring function (from, to) into a date class.
The regex pattern to grab the sections as groups is:
(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})
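The pattern itself is not Scala-specific. As a quick illustration in Python (chosen only to show what the groups capture; split_stamp is a made-up helper), the six groups map directly onto a 6-tuple, and a failed match can be turned into an exception as the question asks:

```python
import re

STAMP = re.compile(r"(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})")

def split_stamp(text):
    """Split 'YYYYMMDDHHMMSS' into six integers, or raise ValueError."""
    match = STAMP.fullmatch(text)
    if match is None:
        raise ValueError("expected 14 digits (YYYYMMDDHHMMSS), got {!r}".format(text))
    return tuple(int(group) for group in match.groups())

year, month, day, hour, minute, second = split_stamp("20170224134523")
print(year, month, day, hour, minute, second)  # 2017 2 24 13 45 23
```

Note that the regex only checks the shape: a string such as 20171399000000 would still split cleanly, which is one reason the date-parsing answers above are the safer choice.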

Renaming files with no fixed char length in Python

I am currently learning Python 2.7 and am really impressed by how much it can do.
Right now, I'm working my way through basics such as functions and loops. I'd reckon a more 'real-world' problem would spur me on even further.
I use a satellite recording device to capture TV shows etc to hard drive.
The naming convention is set by the device itself. It makes the shows you want to watch harder to find after recording, as the show name is preceded by lots of redundant info...
The recordings (in .mts format) are dumped into a folder called "HBPVR" at the root of the drive. I'd be running the script on my Mac when the drive is connected to it.
Example.
"Channel_4_+1-15062015-2100-Exams__Cheating_the_....mts"
or
"BBC_Two_HD-19052015-2320-Newsnight.mts"
I included the double-quotes.
I'd like a Python script that (ideally) would remove the broadcaster name, reformat the date info, strip the time info and then put the show's name to the front of the file name.
E.g "BBC_Two_HD-19052015-2320-Newsnight.mts" ->> "Newsnight 19 May 2015.mts"
What may complicate matters is that the broadcaster names are not all of equal length.
The main pattern is that broadcaster name runs up until the first hyphen.
I'd like to be able to re-run this script at later points for newer recordings and not have already renamed recordings renamed further.
Thanks.
Try this:
import calendar
filename = "BBC_Two_HD-19052015-2320-Newsnight.mts"
# Remove broadcaster name (everything up to the first hyphen)
rest = '-'.join(filename.split("-")[1:])
# Get show name: everything after the date and time fields, minus the .mts extension
show = ''.join(' '.join(rest.split("-")[2:]).split(".mts")[:-1])
# Get date string (ddmmyyyy)
datestr = rest.split("-")[0]
day = int(datestr[0:2]) # The day is the first two digits
month = calendar.month_name[int(datestr[2:4])] # The month is the third and fourth digits
year = datestr[4:8] # The year is the fifth through eighth digits
# And the new string:
new = show + " " + str(day) + " " + month + " " + year + ".mts"
print(new) # "Newsnight 19 May 2015.mts"
I wasn't quite sure what the '2320' was, so I chose to ignore it.
Thanks Coder256.
That has given me a bit more insight into how Python can actually help solve real world (first world!) problems like mine.
I tried it out with some different combos of broadcaster and show names and it worked.
I would like though to use the script to rename a batch of recordings/files inside the folder from time to time.
The script did throw an error when processing an already renamed recording, which is to be expected I guess. Should the renamed file have a special character at the start of its name to help avoid this happening?
e.g "_Newsnight 19 May 2015.mts"
Or is there a more aesthetically pleasing way of doing this, without special chars being added on, etc.?
Thanks.
One way to approach this, since you have a defined pattern, is to use regular expressions:
>>> import datetime
>>> import re
>>> s = "BBC_Two_HD-19052015-2320-Newsnight.mts"
>>> ts, name = re.findall(r'.*?-(\d{8}-\d{4})-(.*?)\.mts', s)[0]
>>> '{} {}.mts'.format(name, datetime.datetime.strptime(ts, '%d%m%Y-%H%M').strftime('%d %b %Y'))
'Newsnight 19 May 2015.mts'
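Building on the regex idea, a batch rename can simply skip any file name that does not match the device's broadcaster-date-time-show pattern, which also leaves already renamed recordings alone without needing a marker character. This is only a sketch: new_name and rename_recordings are made-up names, the folder path is whatever your drive mounts as, and it does not guard against two recordings mapping to the same target name.

```python
import os
import re
from datetime import datetime

# broadcaster (no hyphens) - ddmmyyyy - hhmm - show name .mts
RECORDING = re.compile(r"^[^-]+-(\d{8})-\d{4}-(.+)\.mts$")

def new_name(filename):
    """Return the tidied file name, or None if filename is not a raw recording."""
    match = RECORDING.match(filename)
    if match is None:
        return None  # already renamed, or not a recording at all
    datestr, show = match.groups()
    try:
        when = datetime.strptime(datestr, "%d%m%Y")
    except ValueError:
        return None  # eight digits that are not a valid date
    return "{} {}.mts".format(show, when.strftime("%d %b %Y"))

def rename_recordings(folder):
    """Rename every raw recording in folder; safe to run repeatedly."""
    for filename in os.listdir(folder):
        target = new_name(filename)
        if target is None:
            continue
        os.rename(os.path.join(folder, filename), os.path.join(folder, target))

print(new_name("BBC_Two_HD-19052015-2320-Newsnight.mts"))
```

Calling rename_recordings on the HBPVR folder would then process a whole batch; it may be worth trying it on a copy of the folder first.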