Different spellings of Chanukah Regex [closed] - regex

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
Hannuka, Chanukah, Hanukkah... Due to transliteration from another language and character set, there are many ways to spell the name of this holiday. How many legitimate spellings can you come up with?
Now, write a regular expression that will recognise all of them.

According to http://www.holidays.net/chanukah/spelling.htm, it can be spelled any of the following ways:
Chanuka
Chanukah
Chanukkah
Channukah
Hanukah
Hannukah
Hanukkah
Hanuka
Hanukka
Hanaka
Haneka
Hanika
Khanukkah
Here is my regex that matches all of them:
/(Ch|H|Kh)ann?[aeiu]kk?ah?/
Edit: Or this, without branches:
/[CHK]h?ann?[aeiu]kk?ah?/
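A quick way to check both patterns is to run them over the list above. The sketch below is in Python and is not part of the original answer; the names SPELLINGS, PATTERNS and unmatched are made up for illustration, and re.fullmatch is used so that the whole word has to match.

```python
import re

# Spellings listed on the holidays.net page quoted above.
SPELLINGS = [
    "Chanuka", "Chanukah", "Chanukkah", "Channukah", "Hanukah",
    "Hannukah", "Hanukkah", "Hanuka", "Hanukka", "Hanaka",
    "Haneka", "Hanika", "Khanukkah",
]

# The two patterns from the answer, without the /.../ delimiters.
PATTERNS = {
    "branches": r"(Ch|H|Kh)ann?[aeiu]kk?ah?",
    "no branches": r"[CHK]h?ann?[aeiu]kk?ah?",
}

def unmatched(pattern, words):
    """Return the words that the pattern does not match in full."""
    return [w for w in words if re.fullmatch(pattern, w) is None]

for name, pattern in PATTERNS.items():
    # An empty list means every spelling was matched.
    print(name, unmatched(pattern, SPELLINGS))
```

Note that the version without branches is looser: it also accepts strings such as Canuka or Hhanuka, which the first pattern rejects.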

Call me a sucker for readability.
In Python:
import re

def find_hanukkah(s):
    # Join the known spellings into one alternation and match case-insensitively.
    spellings = ['hannukah', 'channukah', 'hanukkah']  # etc...
    for m in re.finditer('|'.join(spellings), s, re.I):
        print(m.group())

find_hanukkah("Hannukah Channukah, Hanukkah")

Something like C?hann?uk?kah? matches most of the common cases. There are also a bunch of weirder spellings; C?hann?uk?kah?|Han[aei]ka|Khanukkah matches almost every spelling I could think of (that had at least half a million hits on Google).

((Ch|H|X|Х|Kh|J)[aа](н|n{1,2})(у|ou|[auei])(к|k|q){1,2}[aа]h?)|(חנו?כה)
This regex is much more inclusive and covers all of the following options:
Channuka
Channukah
Channukka
Channukkah
Chanuka
Chanukah
Chanukka
Chanukkah
Chanuqa
Hanaka
Haneka
Hanika
Hannuka
Hannukah
Hannukka
Hannukkah
Hanoukka
Hanuka
Hanukah
Hanukka
Hanukkah
Januka
Khanukkah
Xanuka
Ханука
Ханука
חנוכה
חנכה
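To see the mixed-script pattern at work, here is a small Python sketch (an illustration, not part of the original answer). The Cyrillic and Hebrew letters are written as \u escapes so that look-alike Latin letters cannot slip in; the names INCLUSIVE, SAMPLES and results are assumptions made for this example.

```python
import re

# Pattern from the answer above, with non-Latin letters spelled as escapes.
# \u0425 \u0430 \u043d \u0443 \u043a are Cyrillic Kha, a, en, u, ka.
# \u05d7 \u05e0 \u05d5 \u05db \u05d4 are Hebrew het, nun, vav, kaf, he.
INCLUSIVE = re.compile(
    "((Ch|H|X|\u0425|Kh|J)[a\u0430](\u043d|n{1,2})(\u0443|ou|[auei])"
    "(\u043a|k|q){1,2}[a\u0430]h?)|(\u05d7\u05e0\u05d5?\u05db\u05d4)"
)

# A few entries from the list above, one per script or unusual letter.
SAMPLES = [
    "Channukkah",
    "Chanuqa",
    "Hanoukka",
    "Januka",
    "Xanuka",
    "\u0425\u0430\u043d\u0443\u043a\u0430",  # Cyrillic spelling
    "\u05d7\u05e0\u05d5\u05db\u05d4",        # Hebrew, with vav
    "\u05d7\u05e0\u05db\u05d4",              # Hebrew, without vav
]

# Map each sample to whether the whole string matches.
results = {s: INCLUSIVE.fullmatch(s) is not None for s in SAMPLES}
for spelling, ok in results.items():
    print(spelling, ok)
```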

Try this:
/^[ck]?hann?ukk?ah?$/i

I think the only approved spellings in English are Hanukkah and Chanukah, so it's something like
/(Ch|H)anuk?kah/
Or maybe even better
/(Chanukah|Hanukkah)/

I like Triptych's answer, but I would take it one step further, also in Python:
import re

def valid(spelling):
    # Optional leading c/k, one or two n's, one or two k's.
    # Apart from that first letter the match is case-sensitive.
    regex_spelling = re.compile(r'^[cCkK]?han{1,2}uk{1,2}ah$')
    if regex_spelling.match(spelling):
        print('Valid spelling')
    else:
        print(spelling, 'is not a spelling for the word')
To use it:
valid("hanukkah")


Regular Expressions - Snowflake [closed]

I am trying to get the text up to the fourth "\n \n" from the text below. Can you please help me write the Snowflake expression for this?
Hello Jeffrey,\n \nWe have not heard from you yet. I hope all is well with you.\n \nChecking in to gather your Goosehead approved office location address, so we can add you to our database here at ERGOS. Once added here, we can schedule your laptop setup.\n \nGoosehead requires all agents to be onboarded by ERGOS so that we can provide IT support as well as get your laptop in our database. \n \nDo you have a laptop ready for setup?
So everything up to the first \n \n can be fetched with regexp_substr:
select
regexp_substr(column1, '.*\n \n') as match
from values
('Hello Jeffrey,\n \nWe have not heard from you yet. I hope all is well with you.\n \nChecking in to gather your Goosehead approved office location address, so we can add you to our database here at ERGOS. Once added here, we can schedule your laptop setup.\n \nGoosehead requires all agents to be onboarded by ERGOS so that we can provide IT support as well as get your laptop in our database. \n \nDo you have a laptop ready for setup?');
MATCH
Hello Jeffrey,
Now, if we add a group ( ) around that, ask for 4 matches with {4}, and swap to a smaller sample text to make the output less ugly:
select
regexp_substr(column1, '(.*\n \n){4}') as match
from values
('1111\n \n222222222222222\n \n3333333333333333\n \n44444444444444444\n \n55555555555555555555555');
gives:
MATCH
1111 222222222222222 3333333333333333 44444444444444444
If you are expecting the \n in the output, then:
select
column1,
regexp_substr(column1, '[^\\\\]+\\\\n \\\\n') as match
from values
('1111\\n \\n22222\\n \\n33333333\\n \\n4444444\\n \\n55555\\n \\66666\\n \\n7777');
This shows how the backslashes need to be encoded in the SQL to appear in the output, and thus how to encode the match.
The match is greedy and gives:
COLUMN1: 1111\n \n22222\n \n33333333\n \n4444444\n \n55555\n \66666\n \n7777
MATCH: 1111\n \n
thus putting the grouping back in:
select
column1,
regexp_substr(column1, '([^\\\\]+\\\\n \\\\n){4}') as match
from values
('1111\\n \\n22222\\n \\n33333333\\n \\n4444444\\n \\n55555\\n \\66666\\n \\n7777');
COLUMN1: 1111\n \n22222\n \n33333333\n \n4444444\n \n55555\n \66666\n \n7777
MATCH: 1111\n \n22222\n \n33333333\n \n4444444\n \n
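For comparison, the same "group repeated four times" idea can be tried outside Snowflake. The sketch below uses Python's re module on real newline characters; it is only an analogue under that assumption, not Snowflake syntax, and the function name up_to_nth_delimiter is made up for the example.

```python
import re

def up_to_nth_delimiter(text, n=4, delimiter="\n \n"):
    """Return the text up to and including the n-th delimiter, or None."""
    # (?:.*?DELIM){n}: a lazy run of any characters followed by the
    # delimiter, repeated n times. DOTALL lets '.' cross newlines.
    pattern = r"(?:.*?" + re.escape(delimiter) + r"){" + str(n) + r"}"
    m = re.match(pattern, text, re.DOTALL)
    return m.group(0) if m else None

sample = "1111\n \n2222\n \n3333\n \n4444\n \n5555"
print(repr(up_to_nth_delimiter(sample)))
```

When the text holds fewer than n delimiters the function returns None, so the caller can decide whether to fall back to the whole string.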

Partial match in a list, from a user input

Trying to get a partial match in a list, from a user input.
I am trying to make a simple diagnostic program. The user inputs their ailment and the program will output a suggested treatment.
print("What is wrong with you?")
answer = input()
answer = answer.lower()
problem = ""
heat = ["temperature", "hot"]
cold = ["freezing", "cold"]
if answer in heat:
    problem = "heat"
if answer in cold:
    problem = "cold"
print("you have a problem with", problem)
I can get it to pick an exact match from the list but I want it to find partial matches from my input. For example if the user types they are "too hot".
Try the code below. The key is the split() method.
answer = input('What is wrong with you?')
answer = answer.lower()
heat = ['temperature', 'hot']
cold = ['freezing', 'cold']
problem = ''  # default when no keyword is found
for word in answer.split():
    if word in heat:
        problem = 'heat'
    if word in cold:
        problem = 'cold'
print('you have a problem with', problem)
I would recommend something like this, which might be a bit more "pythonic":
answer = input().lower()
cold = ["freezing", "cold"]
problem = ""
# True if any keyword occurs anywhere in the answer (substring test)
if any(c in answer for c in cold):
    problem = "cold"
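Both answers can be folded into one small function. The following is a minimal sketch, assuming a hypothetical KEYWORDS table and a diagnose helper (neither is from the answers above); it splits the input into words with a regex so that punctuation such as "hot!" does not get in the way.

```python
import re

# Hypothetical lookup table: problem name -> keywords that indicate it.
KEYWORDS = {
    "heat": ["temperature", "hot"],
    "cold": ["freezing", "cold"],
}

def diagnose(answer):
    """Return every problem whose keywords appear as whole words in answer."""
    words = re.findall(r"[a-z]+", answer.lower())
    return [
        problem
        for problem, keys in KEYWORDS.items()
        if any(word in keys for word in words)
    ]

print(diagnose("I'm too hot!"))
print(diagnose("fine, thanks"))
```

Matching whole words avoids false hits that a plain substring test would give, for example "hot" inside "photo".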

How to use regex for a persons name? [closed]

Here is my code. I cannot find the problem with it, so I'm looking for advice.
import re

name = input('What\'s your name? ')
if not re.match(r'[A-Za-z- ]+', name):
    print('Invalid name.\n')  # error message
You need to add an end-of-line anchor ($). re.match only anchors the pattern at the start of the string, so without the anchor foo? is considered valid: the pattern matches the prefix foo, and the message Invalid name is not printed for this name.
if not re.match(r'[A-Za-z- ]+$', name):
    print('Invalid name.\n')
Example:
>>> s = 'foo?'
>>> if not re.match(r'[A-Za-z- ]+', s):
...     print('Invalid name.\n')
...
>>> if not re.match(r'[A-Za-z- ]+$', s):
...     print('Invalid name.\n')
...
Invalid name.
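In Python 3 the same check can also be written with re.fullmatch, which requires the whole string to match and so needs no explicit anchor. This is a minimal sketch rather than the answerer's code; NAME_RE and is_valid_name are names chosen for the example.

```python
import re

# Letters, spaces and hyphens only; the hyphen goes last in the class
# so it cannot be read as a range.
NAME_RE = re.compile(r"[A-Za-z -]+")

def is_valid_name(name):
    """True if every character of name is a letter, space or hyphen."""
    return NAME_RE.fullmatch(name) is not None

print(is_valid_name("Anne-Marie Smith"))
print(is_valid_name("foo?"))
```

One difference from the $ version: $ also matches just before a trailing newline, so "foo\n" passes with re.match and $, while fullmatch rejects it.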

R : regular expression to match pattern in only the first line [closed]

In my R code, I have the following content of x as a result of lda prediction output.
[1] lamb
Levels: lamb cow chicken
I would like to capture the word "lamb" in the first line and not the second line.
I tried the following regular expression, which did not work.
if (regmatches(x,regexec(".*?([a-z]+)",x))[[1]][2]=="lamb"){
  cat("It is a lamb")
}
Instead, I got the following error:
Error in regexec(".*?([a-z]+)", x) : invalid 'text' argument
Can anyone help?
Thanks in advance.
mf
Direct Answer:
It is a variable type error. See ?predict.lda to learn why: the object returned by predict() for an object of class lda is a list. You just want the first element of that list, which is a factor (an object of type integer). Factors in R store a character label for every level, which can be accessed with levels() (read ?factor as well). But what you want is the explicit value your factor shows, which can be achieved with as.character(). By the way, the second line is not checked by the regex. It is just the standard console output of a factor; see ?print.factor.
Here's an example, based on the predict.lda() help page:
library(MASS)  # provides lda() and its predict() method
tr <- sample(1:50, 25)
train <- rbind(iris3[tr,,1], iris3[tr,,2], iris3[tr,,3])
test <- rbind(iris3[-tr,,1], iris3[-tr,,2], iris3[-tr,,3])
cl <- factor(c(rep("s",25), rep("c",25), rep("v",25)))
z <- lda(train, cl)
x_lda <- predict(z, test)
# x_lda is a list
typeof(x_lda)
# The first element of the list, called "class", is a factor of type integer.
typeof(x_lda$class)
# Now we create a character vector from the factor:
as.character(x_lda$class)
With an explicit character object, your code works for me:
x <- "lamb"
regmatches(x,regexec(".*?([a-z]+)",x))[[1]][2]=="lamb"
[1] TRUE
So you need to coerce your object to character, and then use it as the "text" argument for the regexec function.
Actual Answer:
There are better ways to do this.
You nest and chain a lot of functions in one line. This is barely readable and makes debugging hard.
If you know that the output will always consist of certain elements (especially, since you know the input of your lda prediction and therefore know the different factor levels beforehand), you can simply check them by == and maybe any() (continuing with the example from before):
levels(cl)
[1] "c" "s" "v"
any(as.character(x_lda$class)=="c")
[1] TRUE
See the help file for ?any, if you don't know what it does.
Finally, if you just want to print "It is a lamb" in the end, and your output will always just have one element, you can simply use paste():
paste("It is a", as.character(x))
[1] "It is a lamb"

regex remove seconds and milliseconds [closed]

This is linked to my previous question, regex to add hyphen in dates.
I would now also like to remove the seconds and milliseconds (or change them to zero), again using gsub,
i.e. something like:
x <- c("20130603 00:00:03.102","20130703 00:01:03.103","20130804 00:03:03.104")
y <- gsub([REGEX PATTERN TO MATCH],[REPLACEMENT PATTERN TO INSERT HYPHEN and REMOVE SECONDS] ,x)
> y
[1] "2013-06-03 00:00:00" "2013-07-03 00:01:00" "2013-08-04 00:03:00"
You can use strptime to parse your objects into POSIXlt objects which, when printed, are exactly in the format you expect:
y <- strptime(x, "%Y%m%d %H:%M:%S")
# [1] "2013-06-03 00:00:03" "2013-07-03 00:01:03" "2013-08-04 00:03:03"
To remove seconds, use trunc:
y <- trunc(y, units = "mins")
# [1] "2013-06-03 00:00:00" "2013-07-03 00:01:00" "2013-08-04 00:03:00"
Having your objects as date/time objects will open a lot of doors, but if you really mean to store the output as a character vector, then just use as.character:
y <- as.character(y)
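The parse-then-truncate approach carries over to other languages. As an illustration only (the answer above is R), here is a Python sketch using the standard datetime module; the function name drop_seconds is an assumption for this example, and %f is used to consume the millisecond part.

```python
from datetime import datetime

def drop_seconds(stamp):
    """Parse 'YYYYMMDD HH:MM:SS.mmm' and zero the seconds and fraction."""
    parsed = datetime.strptime(stamp, "%Y%m%d %H:%M:%S.%f")
    truncated = parsed.replace(second=0, microsecond=0)
    return truncated.strftime("%Y-%m-%d %H:%M:%S")

x = ["20130603 00:00:03.102", "20130703 00:01:03.103", "20130804 00:03:03.104"]
print([drop_seconds(s) for s in x])
```

Unlike a pure regex replacement, this raises ValueError on impossible dates such as month 13.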
A lubridate version:
library(lubridate)
dt <- ymd_hms(x)
dt2 <- update(dt, seconds = 0)
You can try this regex, which I extended a bit:
gsub("(\\d{4})(\\d{2})(\\d{2}) (\\d{2}:\\d{2}).*", "\\1-\\2-\\3 \\4:00", x, perl=TRUE)
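The pattern itself is not R-specific. A quick way to try it is the Python sketch below (an illustration, with PATTERN and reformat as made-up names): the first three groups capture year, month and day, the fourth captures HH:MM, and the trailing .* swallows the seconds and milliseconds so they can be replaced by :00.

```python
import re

# Same pattern as in the gsub call above, written as a Python raw string.
PATTERN = re.compile(r"(\d{4})(\d{2})(\d{2}) (\d{2}:\d{2}).*")

def reformat(stamp):
    """Insert hyphens into the date and reset the seconds to 00."""
    return PATTERN.sub(r"\1-\2-\3 \4:00", stamp)

x = ["20130603 00:00:03.102", "20130703 00:01:03.103", "20130804 00:03:03.104"]
print([reformat(s) for s in x])
```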