Largest digit in a string using isdigit for python 3 - python-2.7

I'm trying to find the largest digit in a string of text containing alphabetic and numeric characters.
The source works in Python 2 but not in Python 3. When I run the module in Python 3, it fails with the error "TypeError: unorderable types: str() > int()".
largestdigit = 0
n = 5000
with open('pg76.txt') as file:
    sentence = file.read()
#FIND LARGEST DIGIT FOR SPECIFIED N SIZE
for i in range(0,n):
    if sentence[i].isdigit():
        if sentence[i] > largestdigit:
            largestdigit = sentence[i]
#OUTPUT
print ("loaded \"pg76.txt\" of length", len(sentence))
print ("n =", n)
if largestdigit == 0:
    print ("largest digit = None")
else:
    print ("Largest digit =", largestdigit )

The TypeError that you see is the result of a deliberate change. Python 3 has stricter rules for ordering comparisons and, as a result, the older "unnatural" comparisons between unrelated types have been removed. This is documented in What's New In Python 3.0:
Python 3.0 has simplified the rules for ordering comparisons:
The ordering comparison operators (<, <=, >=, >) raise a TypeError exception when the operands don’t have a meaningful natural ordering.
Thus, expressions like 1 < '', 0 > None or len <= len are no longer
valid, and e.g. None < None raises TypeError instead of returning
False. A corollary is that sorting a heterogeneous list no longer
makes sense – all the elements must be comparable to each other. Note
that this does not apply to the == and != operators: objects of
different incomparable types always compare unequal to each other.
So, you need to either stick with characters or convert all the digits to integers. If you choose conversion:
if int(sentence[i]) > largestdigit:
    largestdigit = int(sentence[i])

In the statement
if sentence[i] > largestdigit:
you are trying to compare a string value with an integer value. Python does not automatically convert a string to an integer, so even though Python 2 doesn't show you an error, the code is not doing what you assume it is.
In Python 2, when you try to compare a string and an integer, the string ALWAYS evaluates to greater than the integer. So, in your code, sentence[i] will ALWAYS be greater than largestdigit, even if sentence[i] is '1' and largestdigit is 9.
In Python 3, instead of assuming that strings are always greater than integers, Python throws an error, which is what you are seeing.
You need to manually convert the string to an integer using the int() function. So, those lines of code will become:
if int(sentence[i]) > largestdigit:
    largestdigit = int(sentence[i])
EDIT: As user falsetru mentioned in the comments, another alternative is to make everything strings, in which case Python will evaluate them based on their ASCII code, and your digits comparison will work correctly. In this case, all you need to do is modify the line where you initialize largestdigit:
largestdigit = '0'
and also the comparison you make in the OUTPUT section:
if largestdigit == '0':
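Putting the integer-conversion advice together, here is a minimal Python 3 sketch. The function name largest_digit and the sample string are made up for illustration, and it takes the text as an argument instead of reading pg76.txt. It returns None when no digit is found, which also distinguishes "no digits" from "the largest digit is 0".

```python
def largest_digit(text, n):
    """Return the largest digit among the first n characters, or None."""
    largest = None
    for ch in text[:n]:  # slicing avoids an IndexError when len(text) < n
        # Assumes plain ASCII digits: characters such as a superscript two
        # pass isdigit() but are rejected by int().
        if ch.isdigit():
            value = int(ch)  # compare int with int, never str with int
            if largest is None or value > largest:
                largest = value
    return largest


sample = "Chapter 3, page 47 of 1884"  # hypothetical stand-in for the file contents
print("Largest digit =", largest_digit(sample, 5000))  # Largest digit = 8
```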

Related

In Ocaml, when comparing strings (which contain numbers), how are the boolean values evaluated?

The string comparison "3" <= "4";; evaluates as "bool = true"
Here 3 is less than 4 so this makes sense.
This string comparison "3" <= "9";; evaluates as "bool = true"
3 is less than 9 so this makes sense.
Why then does the string comparison "3" <= "10";; evaluate to "bool = false"?
Does it have to do with the length of strings, or perhaps their ASCII values?
Thank you for your time.
It's a normal lexicographical order.
"3" > "10" for the same reason that "d" > "ba".
The first character of string A is compared to the first character of string B. If they're different, you're done.
If they're the same, then the second character of string A is compared to the second character of string B. If they're different, you're done.
If they're the same, then the third character ...
This continues until either both strings run out of characters at the same time (then they're equal) or one of the strings runs out first (that string is "less than" the other one).
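The character-by-character procedure described above can be sketched as follows. This is written in Python rather than OCaml purely for illustration, and lex_less_or_equal is a hypothetical helper name; Python's built-in string comparison follows the same rule. To compare by numeric value instead, convert first (int in Python, int_of_string in OCaml).

```python
def lex_less_or_equal(a, b):
    """Lexicographic a <= b, comparing one character at a time."""
    for ca, cb in zip(a, b):
        if ca != cb:
            # The first differing character decides the result.
            return ca < cb
    # One string is a prefix of the other: the shorter one is "less".
    return len(a) <= len(b)


print(lex_less_or_equal("3", "9"))   # True
print(lex_less_or_equal("3", "10"))  # False, because "3" > "1"
print(int("3") <= int("10"))         # True once both are numbers
```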

Convert any Unicode string to int

I have an arbitrary Unicode string that represents a number, such as "2", "٢" (U+0662, ARABIC-INDIC DIGIT TWO) or "Ⅱ" (U+2161, ROMAN NUMERAL TWO). I want to convert that string into an int. I don't care about specific locales (the input might not be in the current locale); if it's a valid number then it should get converted.
I tried QString.toInt and QLocale.toInt, but they don't seem to get the job done. Example:
bool ok;
int n;
QString s = QChar(0x0662); // ARABIC-INDIC DIGIT TWO
n = s.toInt(&ok); // n == 0; ok == false
QLocale anyLocale(QLocale::AnyLanguage, QLocale::AnyScript, QLocale::AnyCountry);
n = anyLocale.toInt(s, &ok); // n == 0; ok == false
QLocale cLocale = QLocale::C;
n = cLocale.toInt(s, &ok); // n == 0; ok == false
QLocale arabicLocale = QLocale::Arabic; // Specific locale. I don't want that.
n = arabicLocale.toInt(s, &ok); // n == 2; ok == true
Is there a function I am missing?
I could try all locales:
QList<QLocale> allLocales = QLocale::matchingLocales(QLocale::AnyLanguage, QLocale::AnyScript, QLocale::AnyCountry);
for(int i = 0; i < allLocales.size(); i++)
{
    n = allLocales[i].toInt(s, &ok);
    if(ok)
        break;
}
But that feels slightly hackish. Also, it does not work for all strings (e.g. Roman numerals, but that's an acceptable limitation). Are there any pitfalls when doing it that way, such as conflicting rules in different locales (cf. Turkish vs. non-Turkish letter case rules)?
I'm not aware of any ready to use package which does this (but
maybe ICU supports it), but it isn't hard to do if you really
want to. First, you should download the UnicodeData.txt file
from http://www.unicode.org/Public/UNIDATA/UnicodeData.txt.
This is an easy to parse ASCII file; the exact syntax is
described in http://www.unicode.org/reports/tr44/tr44-10.html,
but for your purposes, all you need to know is that each line in
the file consists of semi-colon separated fields. The first
field contains the character code in hex, the third field the
"general category", and if the third field is "Nd" (numeric,
decimal), the seventh field contains the decimal value.
This file can easily be parsed using Python or a number of other
scripting languages, to build a mapping table. You'll want some
sort of sparse representation, since there are over a million
Unicode characters, of which very few (a couple of hundred) are
decimal digits. The following Python script will give you a C++
table which can be used to initialize a
std::map<int, int>. If the character is
in the map, the mapped element is its value.
Whether this is sufficient or not depends on your application.
It has several weaknesses:
It requires extra logic to recognize when two successive
digits are in different alphabets. Presumably a sequence "1١"
should be treated as two numbers (1 and 1), rather than as one
(11). (Because all of the sets of decimal digits are in 10
successive codes, it would be fairly easy, once you know the
digit, to check whether the preceding digit character was in the
same set.)
It ignores non-decimal digits, like ௰ or ൱ (Tamil ten and
Malayalam one hundred). There aren't that many of them, and they are
also in the UnicodeData.txt file, so it might be possible to
find them manually and add them to the table. I don't know
myself, however, how they combine with other digits when numbers
have been composed.
If you're converting numbers, you might have to worry about
the direction. I'm not sure how this is handled (but there is
documentation at the Unicode site); in general, text will appear
in its natural order. In the case of Arabic and related
languages, when reading in the natural order, the low order
digits appear first: something like "١٢" (literally "12",
but because the writing is from right to left, the digits will
appear in the order "21") should be interpreted as 12, and not 21. Except that I'm not sure whether a change direction mark is
present or not. (The exact rules are described in the
documentation at the Unicode site; in the UnicodeData.txt file,
the fifth field—index 4—gives this information. I
think if it's anything but "AN", you can assume the big-endian
standard used in Europe, but I'm not sure.)
Just to show how simple this is, here's the Python script to
parse the UnicodeData.txt file for the digit values:
print('std::pair<int, int> initUnicodeMap[] = {')
for line in open("UnicodeData.txt"):
    fields = line.split(';')
    if fields[2] == 'Nd':
        print(' {{{:d}, {:d}}},'.format(int(fields[0], 16), int(fields[7])))
print('};')
If you're doing any work with Unicode, this file is a gold mine
for generating all sorts of useful tables.
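For readers who can do the conversion in Python rather than C++, the standard unicodedata module already exposes the decimal values from this table, so no generated map is needed. The sketch below is only an illustration: unicode_digits_to_int is a hypothetical name, and it makes no attempt to detect digits from mixed scripts or to handle non-decimal numerals.

```python
import unicodedata


def unicode_digits_to_int(text):
    """Convert a string of Unicode decimal digits (category Nd) to an int."""
    if not text:
        raise ValueError("empty string")
    value = 0
    for ch in text:
        # unicodedata.decimal raises ValueError for anything that is not
        # a decimal digit, e.g. U+2161 ROMAN NUMERAL TWO.
        value = value * 10 + unicodedata.decimal(ch)
    return value


# U+0661 U+0662: ARABIC-INDIC DIGIT ONE followed by DIGIT TWO
print(unicode_digits_to_int("\u0661\u0662"))  # 12
```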
You can get the numeric value of a Unicode character with the static method QChar::digitValue:
int value = QChar::digitValue((uint)0x0662);
It will return -1 if the character does not have numeric value.
See the documentation if you need more help, I don't really know much about c++/qt
Chinese numerals mentioned in that wikipedia article belong to 0x4E00-0x9FCC. There is no useful metadata about individual characters in this range:
4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;
9FCC;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;
So if you wish to map chinese numerals to integers, you must do that mapping yourself, simple as that.
Here's simple mapping of the symbols in the wikipedia article where a single symbol maps to some single number:
0x96f6,0x3007 = 0
0x58f9,0x4e00,0x5f0c = 1
0x8cb3,0x8d30,0x4e8c,0x5f0d,0x5169,0x4e24 = 2
0x53c3,0x53c1,0x4e09,0x5f0e,0x53c3,0x53c2,0x53c4,0x53c1 = 3
0x8086,0x56db,0x4989 = 4
0x4f0d,0x4e94 = 5
0x9678,0x9646,0x516d = 6
0x67d2,0x4e03 = 7
0x634c,0x516b = 8
0x7396,0x4e5d = 9
0x62fe,0x5341,0x4ec0 = 10
0x4f70,0x767e = 100
0x4edf,0x5343 = 1000
0x842c,0x842c,0x4e07 = 10000
0x5104,0x5104,0x4ebf = 100000000
0x5e7a = 1
0x5169,0x4e24 = 2
0x5440 = 10
0x5ff5,0x5eff = 20
0x5345 = 30
0x534c = 40
0x7695 = 200
0x6d1e = 0
0x5e7a = 1
0x4e24 = 2
0x5200 = 4
0x62d0 = 7
0x52fe = 9
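A hand-written mapping like the one above is just a lookup table. As a rough sketch (in Python for brevity, using only a subset of the code points listed above; the names CJK_NUMERAL_VALUES and cjk_numeral_value are invented for this example), a single-symbol lookup could look like this. Composing multi-character numbers is a separate problem that this does not attempt.

```python
# Subset of the table above: code point -> value of that single symbol.
CJK_NUMERAL_VALUES = {
    0x96F6: 0, 0x3007: 0,
    0x4E00: 1, 0x4E8C: 2, 0x4E09: 3, 0x56DB: 4, 0x4E94: 5,
    0x516D: 6, 0x4E03: 7, 0x516B: 8, 0x4E5D: 9,
    0x5341: 10, 0x767E: 100, 0x5343: 1000, 0x4E07: 10000,
}


def cjk_numeral_value(ch):
    """Return the value of a single numeral character, or None if unknown."""
    return CJK_NUMERAL_VALUES.get(ord(ch))


print(cjk_numeral_value("\u4e94"))  # 5
print(cjk_numeral_value("x"))       # None
```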

python bitwise_xor

I am having a problem with an xor search.
I have an array composed of binary values. My list contains 1000 distinct binary values, and I want to time how long it takes for a double loop to find an element in the list. Therefore for a double loop search, I expect it to go through the loop [(1) + (2) +(3)+...+(1000)] = 500500 times. [n(n+1) / 2]
I use the bitwise_xor in the following code
from numpy import bitwise_xor
count = 0
for word1 in listOutTextnoB:
    for word2 in listOutTextnoB:
        count+=1
        if bitwise_xor(word1,word2)==0:
            break
print "count"
Unfortunately, when I print count, I get count = 1,000,000
If I change the if statement to
if bitwise_xor(word1,word2):
    break
count is 1000
I also tried to do:
if word1^word2==0:
    break
but it gives me "TypeError: unsupported operand type(s) for ^: 'str' and 'str'"
A working example would be:
1101110111010111011101101110110010111100101111001 XOR 1101110111010111011101101110110010111100101111001
it should give me 0 and exit the inner loop
What is wrong with my code?
^ works on integers, not arrays, so that is not surprising.
I don't know why you used strings but:
from numpy import bitwise_xor
listOutTextnoB = range(1000)
count = 0
for word1 in listOutTextnoB:
    for word2 in listOutTextnoB:
        count+=1
        if bitwise_xor(word1,word2)==0:
            break
print "count", count
prints
count 500500
as you predict.
EDIT: yes, you should be doing
if int(word1) ^ int(word2) == 0:
    break
bitwise_xor is actually returning 'NotImplemented' for every string, string input.
Your error shows the problem: the values in your list are strings, not numbers. I'm not sure what bitwise_xor does to them, but I'm pretty sure it won't convert them to numbers first. If you do this manually (bitwise_xor (int (word1), int (word2))), I think it should work.
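As a sketch of the conversion both answers point at, here is a Python 3 version without NumPy. It assumes the list holds text such as "1101", so it parses each word as base 2 with int(word, 2); the function name count_until_match and the generated test list are made up for the example.

```python
def count_until_match(words):
    """Count inner-loop iterations until each word XORs to 0 with a match."""
    count = 0
    for word1 in words:
        a = int(word1, 2)  # parse the text as a base-2 integer
        for word2 in words:
            count += 1
            if (a ^ int(word2, 2)) == 0:  # XOR is 0 only for equal values
                break
    return count


# 1000 distinct 10-bit binary strings, standing in for listOutTextnoB
words = [format(i, "010b") for i in range(1000)]
print(count_until_match(words))  # 500500
```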

Limit size of a list in python

I want to limit the size of a list in Python 2.7. I have been trying to do it with a while loop, but it doesn't work:
l=[]
i=raw_input() # this is the size of the list
count=0
while count<i:
    l.append(raw_input())
    count=count+1
The thing is that it does not finish the loop. I think this problem has an easy answer but I can't find it.
Thanks in advance
I think the problem is here:
i=raw_input() # this is the size of the list
raw_input() returns a string, not an integer, so comparisons between i and count don't make sense. [In Python 3, you'd get the error message TypeError: unorderable types: int() < str(), which would have made things clear.] If you convert i to an int, though:
i = int(raw_input())
it should do what you expect. (We'll ignore error handling etc. and possibly converting what you're adding to l if you need to.)
Note though that it would be more Pythonic to write something like
for term_i in range(num_terms):
    s = raw_input()
    l.append(s)
Most of the time you shouldn't need to manually keep track of indices by "+1", so if you find yourself doing it there's probably a better way.
That is because i is a string, and in Python 2 int < "string" always returns true.
What you want is:
l=[]
i=raw_input() #this is the size of the list
count=0
while count<int(i): #Cast to int
    l.append(raw_input())
    count=count+1
You should try changing your code to this:
l = []
i = input() # this is the size of the list
count = 0
while count < i:
    l.append(raw_input())
    count+=1
raw_input() returns a string, while in Python 2 input() evaluates what is typed, so entering a number gives you an integer. Also, count+=1 is more idiomatic than count = count + 1. Good luck
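For comparison, a Python 3 sketch of the same idea, where input() always returns a string and the conversion has to be explicit. The helper read_items and its read_line parameter are hypothetical; the parameter exists only so the example can run with canned answers instead of waiting on the keyboard.

```python
def read_items(size_text, read_line=input):
    """Read int(size_text) lines and return them as a list."""
    size = int(size_text)  # the size arrives as text, so convert it first
    return [read_line() for _ in range(size)]


# Canned answers stand in for what a user would type.
answers = iter(["red", "green", "blue"])
items = read_items("2", read_line=lambda: next(answers))
print(items)  # ['red', 'green']
```

Interactively, this would be called as read_items(input()).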

C++: Program converting postfix to evaluation

How can I convert the char in the array into an integer?
Ignore lines 5-100; they are just my stack.
http://ideone.com/KQytD
Scroll down: output #2 worked properly, but output #3 did not. Somehow, when I pushed a value back onto the stack and then popped it, it carried an extra +43 because of ASCII, and I cannot get it back into a regular integer value so that I can do these operations easily.
Line 116 reads the input into the char array postfix (NOTE: input must be in postfix notation). Line 117 stores the single integer result in final after it has run through the function.
convertPostfixToEvaluation works as follows: I step through each index of postfix until I read '=', then I output the total. The first if statement pushes the operands (0-9) onto a stack. The second if statement, on reading an operator, attempts the operation in lines 134-158. After the if statements I increase the index by 1 so that the whole array is scanned.
The issue lies within the switch, when I try adding, subtracting, multiplying, or dividing more than 3 operands. I believe the 3rd one still has the offset (+43 because of ASCII).
My outputs (at the bottom of my program) show the odd behaviour.
To cut to the chase: the issue is converting char to int the second time around.
There are many things very likely wrong with this code.
Look up the function isdigit. This should eliminate the huge if statement.
You may want to use a string lookup instead of the other complex if statement:
const std::string my_operators = "+-/*";
if (my_operators.find(postfix[i]) != std::string::npos)
{
    // Enter here if the character is a valid symbol.
}
If you "parse" character by character, you will have to build your number:
int number = 0;
// After detecting the character is a number:
number = number * 10 + (postfix[i] - '0');
The expression "postfix[i] - '0'" will return the distance between the number character and the character for zero. The C and C++ languages guarantee the following relationship:
'0' < '1' < '2' < '3' < '4' < '5' < '6' < '7' < '8' < '9'
The languages also state that those numbers are contiguous.
Suggestion: use std::string instead of an array of characters. The std::string contains some helpful functions for searching, skipping characters, and obtaining a substring.
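The core of the advice is to convert each digit character to a number once, when it is read, and to keep only integers on the stack so that intermediate results never need converting again. Here is a compact sketch of that idea, written in Python for brevity rather than C++. It assumes single-digit operands and a trailing '=', as in the question; evaluate_postfix is a made-up name, and this is not the asker's code.

```python
def evaluate_postfix(expr):
    """Evaluate a postfix expression with single-digit operands."""
    stack = []
    for ch in expr:
        if "0" <= ch <= "9":
            # Same trick as postfix[i] - '0': distance from the '0' character.
            stack.append(ord(ch) - ord("0"))
        elif ch in "+-*/":
            right = stack.pop()
            left = stack.pop()
            if ch == "+":
                stack.append(left + right)
            elif ch == "-":
                stack.append(left - right)
            elif ch == "*":
                stack.append(left * right)
            else:
                # Python's // floors; C++ int division truncates toward zero.
                # The two agree for non-negative operands.
                stack.append(left // right)
        elif ch == "=":
            break
    return stack.pop()


print(evaluate_postfix("23+4*="))  # 20
```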