I have a program where the user is asked for the session year, which must be in the form 20XX-20XX. The constraint is that it must be a year followed by the next year, e.g. 2019-2020.
For example,
Valid formats:
2019-2020
2018-2019
2000-2001
Invalid formats:
2019-2021
2000-2000
2019-2018
I am trying to validate this input using regular expressions.
My work:
import re
def add_pages(matchObject):
    return "{0:0=3d}".format(int(matchObject) + 1)
try:
    a = input("Enter Session")
    p = r'2([0-9]{3})-2'
    p1 = re.compile(p)
    x = add_pages(p1.findall(a)[0])
    p2 = r'2([0-9]{3})-2' + x
    p3 = re.compile(p2)
    l = p3.findall(a)
    if not l:
        raise Exception
    else:
        print("Authenticated")
except Exception as e:
    print("Enter session. Eg. 2019-2020")
Question:
So far I have not been able to retrieve a single regex that will validate this input. I did have a look at backreferencing in regex but it only solved half my query. I am looking for ways to improve this authentication process. Is there any single regex statement that will check for this constraint? Let me know if you need any more information.
Do you really need to get the session year in one input?
I think it's better to have two inputs (or just automatically set the session year to be the first year + 1).
I don't know if you're aiming for something bigger and this is just an example, but regex doesn't seem appropriate for this task to me.
For example you could do this:
print("Enter session year")
first_year = int(input("First year: "))
second_year = int(input("Second year: "))
if second_year != first_year + 1:
    # validation failed: report the error (or ask again)
    print("Enter session. Eg. 2019-2020")
else:
    # program continues
    print("Authenticated")
First of all, why regex? Regex is terrible at math. It would be easier to do something like:
def check_years(string):
    # string is expected to look like "2011-2012"
    years = string.split("-")
    return int(years[0]) == int(years[1]) - 1
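If the 20XX-20XX shape also has to be enforced, one option is to let a regex check only the format and leave the "next year" rule to plain arithmetic. This is a minimal sketch, not a single-regex solution; the names SESSION_RE and is_valid_session are made up for illustration.

```python
import re

# Hypothetical names for illustration; the regex checks only the 20XX-20XX shape.
SESSION_RE = re.compile(r"(20\d{2})-(20\d{2})")

def is_valid_session(text):
    """Return True if text is a 20XX year followed by the next year."""
    match = SESSION_RE.fullmatch(text.strip())
    if match is None:
        return False  # wrong shape, e.g. "abcd" or "2019/2020"
    first, second = (int(group) for group in match.groups())
    # The arithmetic part is done in Python, not in the regex
    return second == first + 1

for session in ["2019-2020", "2019-2021", "2000-2000", "2019-2018"]:
    print(session, is_valid_session(session))
```

Because both halves must start with 20, an input such as 2099-2100 is rejected by this sketch; loosen the pattern if that matters.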
Background
I have a list of "bad words" in a file called bad_words.conf, which reads as follows
(I've changed it so that it's clean for the sake of this post but in real-life they are expletives);
wrote (some )?rubbish
swore
I have a user input field which is cleaned and stripped of dangerous characters before being passed as data to the following script, score.py
(for the sake of this example I've just typed in the value for data)
import re
data = 'I wrote some rubbish and swore too'
# Get list of bad words
bad_words = open("bad_words.conf", 'r')
lines = bad_words.read().split('\n')
combine = "(" + ")|(".join(lines) + ")"
# set score in case there are no results
score = 0
# search for bad words
if re.search(combine, data):
    # add one for a hit
    score += 1
# show me the score
print(str(score))
bad_words.close()
Now this finds a result and adds a score of 1, as expected, without a loop.
Question
I need to adapt this script so that I can add 1 to the score every time a line of "bad_words.conf" is found within text.
So in the instance above, data = 'I wrote some rubbish and swore too' I would like to actually score a total of 2.
1 for "wrote some rubbish" and +1 for "swore".
Thanks for the help!
Changing combine to just:
combine = "|".join(lines)
And using re.findall():
In [33]: re.findall(combine,data)
Out[33]: ['rubbish', 'swore']
The problem with the multiple capturing groups you originally had is that re.findall() returns one tuple per match, with an empty string for every group that did not take part in that match.
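One caveat: any capturing group left in the pattern, such as (some )? in the sample file, makes re.findall() return the group contents rather than the whole match. A way to sidestep this is to count with re.finditer(), which always gives access to the full match. The sketch below uses an in-memory list as a stand-in for the lines read from bad_words.conf.

```python
import re

data = 'I wrote some rubbish and swore too'
# Stand-in for the lines read from bad_words.conf (note the trailing blank line)
lines = ['wrote (some )?rubbish', 'swore', '']

# Drop blank lines: an empty alternative would match at every position
patterns = [line for line in lines if line.strip()]
# (?:...) keeps each entry together without adding capturing groups
combine = "|".join("(?:" + pattern + ")" for pattern in patterns)

# Every occurrence of any bad phrase, as the full matched text
matches = [m.group(0) for m in re.finditer(combine, data)]
score = len(matches)

# Alternative: count each line of the file at most once
per_line = sum(1 for pattern in patterns if re.search(pattern, data))

print(matches, score, per_line)
```

Which of the two counts is wanted depends on whether a phrase that appears twice should score twice.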
Can you please help me write an age checker with a regular expression? I don't know how to calculate whether the user is 18 or not.
Input: the user's birthday (in the format checked by the regex)
Output: "welcome" or "come back when you will be 18+"
Here is my code for checking whether the format is ok or bad:
import re
import datetime
# raw string so that \/ and \d reach the regex engine unchanged
pattern = re.compile(r"^(0[1-9]|[12][0-9]|3[01])[- \/.,_](0[1-9]|1[012])[- \/.,_](19|20)\d\d")
dob = input('Enter your birthday (dd/mm/yyyy): ')
result = pattern.match(dob)
if result:
    print("format is ok")
else:
    print("format is bad")
Thank you in advance!!!
Instead of using regex just try to use the input to create a datetime object - if it works the format is good else input is invalid (see datetime.strptime(date_string, format)).
Once you have it and you have datetime.now() you can easily calculate the age
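A minimal sketch of that idea, assuming the input uses the dd/mm/yyyy format with slashes (other separators would need their own format string). The function names age_on and greeting are made up for illustration.

```python
from datetime import date, datetime

def age_on(dob_text, today):
    """Age in whole years on `today`; raises ValueError on a bad date."""
    born = datetime.strptime(dob_text, "%d/%m/%Y").date()
    # Has this year's birthday already happened?
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    return today.year - born.year - (0 if had_birthday else 1)

def greeting(dob_text, today=None):
    today = today or date.today()
    try:
        age = age_on(dob_text, today)
    except ValueError:
        # strptime rejects both wrong formats and impossible dates
        return "format is bad"
    return "welcome" if age >= 18 else "come back when you will be 18+"

print(greeting("01/01/2000", date(2020, 1, 1)))
```

Passing today explicitly keeps the function easy to test; leave it out to use the current date.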
Okay, so you have the requirement to do it with a regular expression.
Be aware that this could leave some edge cases uncovered!
import re
import datetime
# raw string so that \/ and \d reach the regex engine unchanged
pattern = re.compile(r"^(0[1-9]|[12][0-9]|3[01])[- \/.,_](0[1-9]|1[012])[- \/.,_](19|20)\d\d")
dob = input('Enter your birthday (dd/mm/yyyy): ')
result = pattern.match(dob)
if result:
    print("format is ok")
else:
    print("format is bad")
Okay, the regular expression seems to be valid, except for the capturing group for the year, which only captures 19 or 20. You could use Regexr or a similar service if you need to refine it.
Then you can deconstruct the matched groups to get the day, month and year:
[day, month, year] = result.groups() # As mentioned, year is currently either 19 or 20
Then, the next step would be to compare the month against the current month. This helps decide whether this year's birthday has already passed, and so whether the current year counts towards the age. If it happens to be the same month, you need to compare the days, too.
Finally, subtract the entered year from the current one (once you have fixed the year capturing group) and do the math.
Since it's an assignment, I won't provide the code for this ;-)
I am searching through a database looking for matches. I need to log the matches as well as those that don't match, so that I have the full database, but for those that match I specifically need to know the part that matches.
serv = ['6:00am-9:00pm', 'Unavailable', '7:00am-10:00pm', '8:00am-9:00pm', 'Closed']
if self.serv[data] == 'Today':
    clotime.append('')
elif self.serv[data] == 'Tomorrow':
    clotime.append('')
elif self.serv[data] == 'Yesterday':
    clotime.append('')
else:
    clo = re.findall('-(.*?):', self.serv[data])
    clotime.append(clo[0])
The bulk of the data ends up running through re.findall, but some is still left for the initial if/elif checks.
Is there a way to condense this code and do it all with re.findall, maybe even in just one line of code? I need everything (the entire database) to be gone through and logged so I can process the database correctly when I display the data on a map.
Using anchors you can match a whole string:
clo = re.search('^(?:To(?:day|morrow)|Yesterday)$|-(.*?):', self.serv[data])
if clo is not None:
    # group(1) is None when one of the day words matched instead of a time
    clotime.append(clo.group(1))
With your example list:
serv = ['6:00am-9:00pm', 'Unavailable', '7:00am-10:00pm', '8:00am-9:00pm', 'Closed']
clotime = []
for data in serv:
    clo = re.search('^(?:To(?:day|morrow)|Yesterday)$|-(.*?):', data)
    if clo is not None:
        clotime.append(clo.group(1))
print(clotime)
I would try something like this:
clo = re.findall(r'-(\d+):', self.serv[data])
clotime.append(clo[0] if clo else '')
If I understood your existing code, it looks like you want to append an empty string in the cases where a closing hour couldn't be found in the string. This example extracts the closing hour but uses an empty string whenever the regex doesn't match anything.
Also, if you're only matching digits, it's better to be explicit about that.
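Run against the sample list from the question, the same idea can be sketched as a small helper plus a list comprehension. The helper name closing_hour is made up for illustration.

```python
import re

serv = ['6:00am-9:00pm', 'Unavailable', '7:00am-10:00pm', '8:00am-9:00pm', 'Closed']

def closing_hour(entry):
    """Return the hour after the dash, or '' when there is no time range."""
    match = re.search(r'-(\d+):', entry)
    return match.group(1) if match else ''

# One entry per database row, so the positions stay aligned with serv
clotime = [closing_hour(entry) for entry in serv]
print(clotime)  # ['9', '', '10', '9', '']
```

Entries such as 'Today' or 'Closed' simply fall through to the empty string, so no separate if/elif chain is needed.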
--
I am attempting to scrape information from the website:
http://www.forexfactory.com/#tradesPositions
Now, I used to have one up and running which this forum helped me get going, but I think something has changed on the website and the script I had no longer works.
What do I need?
I would like to scrape the number of 'short' and 'long' positions for AUDUSD, EURUSD, GBPUSD, USDJPY, USDCAD, NZDUSD and USDCHF.
NOT the percentages, the actual number of traders.
What have I done?
This is for EURUSD
import lxml.html
from selenium import webdriver
driver = webdriver.Chrome(r"C:\Users\MY NAME\Downloads\Chrome Driver\chromedriver.exe")
url = ('http://www.forexfactory.com/#tradesPositions')
driver.get(url)
tree = lxml.html.fromstring(driver.page_source)
results_short = tree.xpath('//*[@id="flexBox_flex_trades/positions_tradesPositionsCopy1"]/div[1]/table/tbody/tr/td[2]/div[1]/ul[1]/li[2]/span/text()')
results_long = tree.xpath('//*[@id="flexBox_flex_trades/positions_tradesPositionsCopy1"]/div[1]/table/tbody/tr/td[2]/div[1]/ul[1]/li[1]/span/text()')
print("Forex Factory")
print("Traders Short EURUSD:", results_short)
print("Traders Long EURUSD:", results_long)
driver.quit()
This returns
Forex Factory
Traders Short EURUSD: ['337 Traders ', ' ']
Traders Long EURUSD: [' 259 Traders']
I would like to strip everything away from the result except for the numbers. I've tried .strip() and .replace() but neither works on a list, which will come as no surprise to you guys, I think!
Empty List
When I apply the same technique to AUDUSD I get an empty list.
import lxml.html
from selenium import webdriver
driver = webdriver.Chrome(r"C:\Users\Andrew G\Downloads\Chrome Driver\chromedriver.exe")
url = ('http://www.forexfactory.com/#tradesPositions')
driver.get(url)
tree = lxml.html.fromstring(driver.page_source)
results_short = tree.xpath('//*[@id="flexBox_flex_trades/positions_tradesPositionsCopy1"]/div[6]/table/tbody/tr/td[2]/div[1]/ul[1]/li[2]/span/text()')
results_long = tree.xpath('//*[@id="flexBox_flex_trades/positions_tradesPositionsCopy1"]/div[6]/table/tbody/tr/td[2]/div[1]/ul[1]/li[1]/span/text()')
s2 = results_short
l2 = results_long
print("Traders Short AUDUSD:", s2)
print("Traders Long AUDUSD:", l2)
This returns
Traders Short AUDUSD: []
Traders Long AUDUSD: []
What gives? Is the XPath not working? I just used Chrome's 'inspect element' feature, navigated to the desired number, and copied the path. Same method as for EURUSD.
Ideally, it would be nice to set up a list of div numbers that can be inserted into the tree.xpath call instead of repeating the lines of code for all the different currencies, to make it neater. So, in the XPath where it has:
/div[number]/
It would be nice to have a list, i.e. [1,2,3,4,5,6], that can be inserted there, because the rest of the XPath is the same for all the currencies. Anyway, that's an optional bonus; the priority is to get a result for all the currencies listed.
THANKS
You can remove all the whitespace in your results with the strip method, as you mentioned. Here is my sample code:
# strip each entry and drop the ones that end up empty
# (deleting items while looping over range(len(...)) can raise IndexError)
results_short = [item.strip() for item in results_short if item.strip()]
results_long = [item.strip() for item in results_long if item.strip()]
You cannot get the result for AUD because the values are not loaded into the page until you have clicked the "expand" button. However, I found that you can get the result from the following page: http://www.forexfactory.com/trades.php
So you can change the value of url as:
url = ('http://www.forexfactory.com/trades.php')
For this page, since the name of the id has changed, you need to update your XPath to:
results_short = tree.xpath('//*[@id="flexBox_flex_trades/positions_tradesPositions"]/div[6]/table/tbody/tr/td[2]/div[1]/ul[1]/li[2]/span/text()')
results_long = tree.xpath('//*[@id="flexBox_flex_trades/positions_tradesPositions"]/div[6]/table/tbody/tr/td[2]/div[1]/ul[1]/li[1]/span/text()')
Then apply the strip function as mentioned above and you should get the correct results.
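For the two remaining wishes (numbers only, and a list of div numbers), here is a rough sketch that needs no browser to try out. The helper names build_xpath and trader_counts are made up, the id in the template is the one quoted above and may change again, and the counts are assumed to be plain digits without thousands separators.

```python
import re

# Assumption: same container id and layout as in the XPath quoted above
XPATH_TEMPLATE = (
    '//*[@id="flexBox_flex_trades/positions_tradesPositions"]'
    '/div[{div}]/table/tbody/tr/td[2]/div[1]/ul[1]/li[{li}]/span/text()'
)

def build_xpath(div, li):
    """Fill the currency block number (div) and the short/long item (li)."""
    return XPATH_TEMPLATE.format(div=div, li=li)

def trader_counts(texts):
    """Pull the integers out of strings such as '337 Traders '."""
    counts = []
    for text in texts:
        counts.extend(int(number) for number in re.findall(r"\d+", text))
    return counts

print(trader_counts(['337 Traders ', ' ']))  # [337]

for div in [1, 2, 3, 4, 5, 6]:
    short_xpath = build_xpath(div, li=2)  # li[2] held the short count above
    long_xpath = build_xpath(div, li=1)   # li[1] held the long count above
    # With a parsed page: trader_counts(tree.xpath(short_xpath))
```

Which div number belongs to which currency still has to be checked on the page itself.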
Trying to get a partial match in a list, from a user input.
I am trying to make a simple diagnostic program. The user inputs their ailment and the program will output a suggested treatment.
print("What is wrong with you?")
answer = input()
answer = answer.lower()
problem = ""
heat = ["temperature", "hot"]
cold = ["freezing", "cold"]
if answer in heat:
    problem = "heat"
if answer in cold:
    problem = "cold"
print("you have a problem with", problem)
I can get it to pick an exact match from the list but I want it to find partial matches from my input. For example if the user types they are "too hot".
Try the code below. The key is the split() method.
answer = input('What is wrong with you?')
answer = answer.lower()
heat = ['temperature', 'hot']
cold = ['freezing', 'cold']
# default, so the final print works even when nothing matches
problem = ''
for word in answer.split():
    if word in heat:
        problem = 'heat'
    if word in cold:
        problem = 'cold'
print('you have a problem with', problem)
I would recommend you use something like this, which might be a bit more "pythonic":
answer = input().lower()
cold = ["freezing", "cold"]
# true if any keyword occurs anywhere in the answer
if any(c in answer for c in cold):
    problem = "cold"
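Both ideas can be combined by keeping the keyword lists in one dictionary, so adding an ailment means adding one entry. This is only a sketch; KEYWORDS and diagnose are made-up names, and matching is done word by word so that "hot" does not fire on a word like "photo".

```python
# Hypothetical lookup table: problem name -> keywords that indicate it
KEYWORDS = {
    "heat": ["temperature", "hot"],
    "cold": ["freezing", "cold"],
}

def diagnose(answer):
    """Return every problem whose keywords appear as words in the answer."""
    # Lowercase, split into words, and trim common punctuation
    words = [word.strip(".,!?") for word in answer.lower().split()]
    return [problem for problem, keys in KEYWORDS.items()
            if any(word in keys for word in words)]

print(diagnose("I am too hot"))  # ['heat']
```

An empty list means nothing matched, which the caller can turn into a "please describe it differently" message.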