English alphabet recognition in Python

English alphabet recognition in Python - python-2.7

I am planning to do a semester project on English alphabet recognition using Python. I am using Python for the first time. What are the tools and methodologies needed? Please help.

Your question is not clear and I suggest you rephrase it or risk having it put on hold.
However, if you mean checking whether a character in a string is a letter, you can use the built-in isalpha() method.
Examples:
>>> 'a'.isalpha()
True
>>> 'x'.isalpha()
True
>>> my_char = 'd'
>>> my_char.isalpha()
True
>>> '1'.isalpha()
False
>>> my_var = '#'
>>> my_var.isalpha()
False
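
One caveat: on Unicode strings (the default in Python 3), isalpha() also returns True for letters outside the English alphabet, such as 'é'. If only A-Z and a-z should count, a minimal sketch could test membership in string.ascii_letters instead. The helper name is_english_letter is made up for illustration.

```python
import string

def is_english_letter(ch):
    """Return True only for a single character in A-Z or a-z."""
    return len(ch) == 1 and ch in string.ascii_letters

print(is_english_letter('d'))   # an English letter
print(is_english_letter('1'))   # a digit
print(is_english_letter('é'))   # a letter, but not in the English alphabet
```

This is only a character-level check; recognising letters in images or audio would be a different kind of project.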

Related

Python range and list - what's the reason of this output?

I am self-learning and not sure whether this is a duplicate question. In the example below:
Why does the first list(a)==a print nothing?
Also, why does a become False and then turn back into a range object?
>>> a=range(10,17)
>>> list(a)==a
>>> list(a)==a
False
>>> a
False
>>> a
range(10, 17)
>>>
Moreover, if I evaluate A once and then evaluate list(A)==A, it always gives range(10, 17). Could anyone explain this as well?
>>> A=range(10,17)
>>> A
>>> list(A)==A
range(10, 17)
>>>
Sorry for the rookie questions; I must be missing some very basic logic here. Thank you for any ideas!
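
For reference, here is a short sketch of how these comparisons normally behave in Python 3, where range() returns a lazy range object rather than a list. A list never compares equal to a range, so the comparison is False unless both sides are converted to the same type. This does not by itself explain the delayed output shown in the session above.

```python
a = range(10, 17)

# A list and a range are different types, so == is False here.
print(list(a) == a)

# Converting both sides to lists compares the actual elements.
print(list(a) == list(a))

# Two ranges that produce the same sequence compare equal.
print(a == range(10, 17))

# Evaluating a comparison never modifies the range itself.
print(a)
```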

Scandinavian letters (æøå) in python 2.7

So I am having this weird problem when using 'æ', 'ø' and 'å' in python.
I have included # -*- coding: utf-8 -*-
at the top of every file, and æøå prints fine, so no worries there. However, if I do len('æ') I get 2. I am making a program where I loop over and analyze Danish text, so this is a big problem.
Below are some examples from the Python terminal to illustrate the problem:
In [1]: 'a'.islower()
Out[1]: True
In [2]: 'æ'.islower()
Out[2]: False
In [3]: len('a')
Out[3]: 1
In [4]: len('æ')
Out[4]: 2
In [5]: for c in 'æ': print c in "æøå"
True
True
In [6]: print "æøå are troublesome characters"
æøå are troublesome characters
I can get around the problem of islower() and isupper() not working for 'æ', 'ø' and 'å' by simply doing c.islower() or c in "æøå" to check whether c is a lower-case letter, but as shown above, both parts of 'æ' will then count as lower case and be counted twice.
Is there a way that I can make those letters act like any other letter?
I run Python 2.7 on Windows 10 using Canopy, as it's an easy way to get sklearn and numpy, which I need.

You have stumbled across the problem that strings are bytes by default in Python 2. With your header # -*- coding: utf-8 -*- you have only told the interpreter that your source code is UTF-8, but this has no effect on the handling of strings.
The solution to your problem is to convert all your strings to unicode objects with the decode method, e.g.
danish_text_raw = 'æ' # here you would load your text
print(type(danish_text_raw)) # prints <type 'str'>
danish_text = danish_text_raw.decode('utf-8')
print(type(danish_text)) # prints <type 'unicode'>
The issues with islower and len should then be fixed. Make sure that all the strings you use in your program are unicode objects and not byte strings; otherwise comparisons can lead to strange results. For example:
danish_text_raw == danish_text # this yields False
To make sure that you use unicode strings, you can use a function like this:
def to_unicode(in_string):
    if isinstance(in_string, str):
        out_string = in_string.decode('utf-8')
    elif isinstance(in_string, unicode):
        out_string = in_string
    else:
        raise TypeError('not stringy')
    return out_string
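
For comparison, here is a sketch of the same distinction in Python 3, where text strings are Unicode by default and the byte form has to be requested explicitly. The variable names are illustrative only.

```python
text = 'æ'                      # str: a single Unicode character
raw = text.encode('utf-8')      # bytes: the UTF-8 encoding of that character

print(len(text))                # length in characters
print(len(raw))                 # length in bytes
print(text.islower())           # case methods work on the decoded text
print(raw.decode('utf-8') == text)

# Counting lower-case letters in decoded text counts each letter once.
sample = 'æøå ABC'
lower_count = sum(1 for c in sample if c.islower())
print(lower_count)
```

The length of 2 reported in the question corresponds to the byte form, which is what a plain string literal holds in Python 2.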

Testing for an item in lists - Python 3

As part of a school project we are creating a trouble shooting program. I have come across a problem that I cannot solve:
begin=['physical','Physical','Software','software',]
answer=input()
if answer in begin[2:3]:
    print("k")
    software()
if answer in begin[0:1]:
    print("hmm")
    physical()
When I try to input software/Software no output is created. Can anybody see a hole in my code as it is?

In Python, slice end values are exclusive. You are slicing a smaller list than you think you are:
>>> begin=['physical','Physical','Software','software',]
>>> begin[2:3]
['Software']
>>> begin[0:1]
['physical']
Use begin[2:4] and begin[0:2] or even begin[2:] and begin[:2] to get all elements from the 3rd to the end, and from the start until the 2nd (inclusive):
>>> begin[2:]
['Software', 'software']
>>> begin[2:4]
['Software', 'software']
>>> begin[:2]
['physical', 'Physical']
>>> begin[0:2]
['physical', 'Physical']
Better yet, use str.lower() to limit the number of inputs you need to provide:
if answer.lower() == 'software':
With only one string to test, you can now put your functions in a dictionary; this gives you the option to list the various valid answers too:
options = {'software': software, 'physical': physical}
while True:
    answer = input('Please enter one of the following options: {}\n'.format(
        ', '.join(options)))
    answer = answer.lower()
    if answer in options:
        options[answer]()
        break
    else:
        print("Sorry, {} is not a valid option, try again".format(answer))
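
The dictionary approach can be tried without typing at a prompt. The sketch below assumes stand-in software() and physical() functions (the originals are not shown in the question) and a hypothetical helper, dispatch, that takes the answer as an argument instead of calling input().

```python
def software():
    return "software troubleshooting"   # placeholder body, not from the question

def physical():
    return "physical troubleshooting"   # placeholder body, not from the question

options = {'software': software, 'physical': physical}

def dispatch(answer):
    """Call the function registered for the lower-cased answer.

    Returns None when the answer is not a valid option.
    """
    handler = options.get(answer.strip().lower())
    return handler() if handler is not None else None

print(dispatch('Software'))
print(dispatch('PHYSICAL'))
print(dispatch('network'))
```

Keeping the lookup in a function that takes the answer as a parameter makes it easy to check each option separately from the input loop.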

Your list slicing is wrong. Try the following script:
begin=['physical','Physical','Software','software',]
answer=input()
if answer in begin[2:4]:
    print("k")
    software()
if answer in begin[0:2]:
    print("hmm")
    physical()

Better code then a regex sub() python 2.7

I am trying to find out whether there are better, faster ways to clean this returned string, or whether this is already the best way. It works, but more efficient ways are always welcome.
I have a function that returns the following output:
"("This is your:, House")"
I clean it up before printing with:
a = re.sub(r'^\(|\)|\,|\'', '', a)
print a
>>> This is your: House
I also learn a lot from the different ways people do things.

You don't need to use a regular expression to do this.
>>> import string
>>> a = '"("This is your:, House")"'
>>> ''.join(x for x in a if x not in string.punctuation)
'This is your House'
>>> tbl = string.maketrans('', '')
>>> a.translate(tbl, string.punctuation)
'This is your House'

s='"("This is your:, House")"'
s.replace('\"','').replace('(','').replace(')','').replace(',','').replace(':','')
'This is your House'
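
Note that string.maketrans and the two-argument form of translate exist only in Python 2. On Python 3, a rough equivalent uses str.maketrans with a third argument listing the characters to delete:

```python
import string

a = '"("This is your:, House")"'

# The third argument to str.maketrans lists the characters to delete.
table = str.maketrans('', '', string.punctuation)
cleaned = a.translate(table)
print(cleaned)
```

Like the answers above, this removes every punctuation character, including the colon that the original regex kept.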

Select Everything After Greater Than in Python

I have a string
"Tyson Invitational 02/08/2013','#FFFFCC')""; ONMOUSEOUT=""kill()"" >6.54"
How would I use regex to select everything after the right-pointing bracket? Aka how would I get the 6.54?
I've tried
\>(.*)
but I'm not sure it's working properly. I use
m = re.search( '\>(.*)', row_out[5])
and get
<_sre.SRE_Match object at 0x10b6105d0>
Not sure what the issue is.
Thanks!

>>> import re
>>> str="""Tyson Invitational 02/08/2013','#FFFFCC')""; ONMOUSEOUT=""kill()"" >6.54"""
>>> re.search('\>(.*)',str)
<_sre.SRE_Match object at 0x7f0de9575558>
This is the same match object you got before. However, assign the search result to a variable and call groups() on it:
>>> f=re.search('\>(.*)',str)
>>> f.groups()
('6.54',)
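
re.search returns None when nothing matches, so it is worth checking the result before reading a group. Below is a minimal sketch using a raw-string pattern and a made-up helper name, text_after_gt:

```python
import re

def text_after_gt(text):
    """Return everything after the first '>' in text, or None if there is none."""
    match = re.search(r'>(.*)', text)
    return match.group(1) if match else None

row = """Tyson Invitational 02/08/2013','#FFFFCC')""; ONMOUSEOUT=""kill()"" >6.54"""
print(text_after_gt(row))
print(text_after_gt('no bracket here'))
```

The '>' character has no special meaning in a regular expression, so the backslash in the original pattern is not needed.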

We Keep Coding
