Unicode to integer with preceding zeros in Python - python-2.7

I am new to Python and am using version 2.7. I have two Unicode string variables,
a = u'0125', b = u'1234'
Now I want to convert these variables to integers and append them to a list, like [0125, 1234]. This is my expected output.
I tried converting the variables to integers and appending them to the list, and got [125, 1234] as the output. The preceding zero is missing from the first value. Can someone give a better solution for this?
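For what it's worth, a minimal sketch of what is going on: a Python int cannot store a leading zero (the zero is purely a formatting concern), so the usual options are to keep the values as strings, or to zero-pad when formatting the integers back out:
a, b = u'0125', u'1234'

nums = [int(a), int(b)]
print nums                            # [125, 1234] (the leading zero is gone)
padded = ['%04d' % n for n in nums]
print padded                          # ['0125', '1234'] (re-padded for display)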

Related

Python floating point: why is the print result different (examples given)?

import random

a = (random.random(), random.random())
print(a)
print(a[0])
the result is:
(0.4817527913069962, 0.7017598562799067)
0.481752791307
What extra is happening when printing a tuple (similar behavior for a list)? Why are there extra fraction digits?
Thanks a lot.
BTW, this is python 2.7
What you are seeing is the difference between the formatting choices made by str(float) and repr(float). In Python 2.x, str(float) rounds to 12 significant digits, while repr(float) produces enough digits to reproduce the value exactly. The print statement converts each of its arguments with str(); that accounts for the 12 digits of precision when printing a float directly. But when the argument is a tuple or list, the container's string formatting logic uses repr() to format each element.
The output of repr(float) must be able to be converted back to the original value exactly; 17 significant digits are always enough to guarantee that. Python 2.7 and Python 3 use a more sophisticated algorithm that returns the shortest string that will round-trip back to the original value, which is why you see 16 digits above rather than 17. Since this repr(float) frequently returns a friendlier-looking result, Python 3 changed str(float) to be the same as repr(float).
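A short Python 2.7 sketch makes the difference visible; the round-trip assertion at the end is the guarantee described above:
import random

x = random.random()
print str(x)     # rounds to 12 significant digits, e.g. 0.481752791307
print repr(x)    # shortest round-tripping form, e.g. 0.4817527913069962
print (x,)       # a tuple formats its elements with repr()
assert float(repr(x)) == x   # repr(float) always converts back exactly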

Is there an equivalent of the "of some_array{*}" form for use in SAS functions

Our database predates our database software having good Unicode support, and in its place has a pseudo-base64 encoding which it uses to store UTF-16 characters in an ASCII field. I am writing a function to convert this type of field into straight UTF-8 within SAS.
The function loops through the string, converting each set of three ASCII characters into a Unicode character and placing it in an array. When experimenting with code in a data step I had used cat(of final{*}) to convert the array into a string, but the same code does not appear to be valid within a function.
I am currently collating the string in the loop with collate = trim(collate)!!trim(final{i}) and an arbitrary-length collate string, but I would like to produce the result directly from the array, or at least set the size of the collate string based on the length of the input string.
I've included a pastebin of the data and function here.
Edit: The version of SAS I was using is 9.3
The same code is valid in a function in SAS 9.4 TS1M3; it may not be in earlier versions (significant changes were made to how arrays were handled in FCMP in 9.4 and in maintenance releases TS1M2 and 3).
However, this doesn't really solve your arbitrary length problem; when I run your function with
outtext = cat(of final{*});
return (outtext);
I get... 1 character! And when I run
return(cats(of final{*}));
output:
Obs text_enc finaltext
1 ABCABlABjABhAB1ABzABlAAgABVABUABGAA4AAgABpABzAAgABoABhAByABk BecauseU
2 ABTABpABtABwABsABlAByAAgABsABpABrABlAAgAB0ABoABpABz Simplerl
3 ABJABvAAgABJABvAAgABCAByABvABtABpABvABz IoIoBrom
which is a bit better (cats trims for you), but I still only get 8 characters. That's because 8 characters is the default length in SAS for an undeclared character variable. Expand the length (using a length statement for outtext) and you get:
Obs text_enc finaltext
1 ABCABlABjABhAB1ABzABlAAgABVABUABGAA4AAgABpABzAAgABoABhAByABk BecauseUTF8ishard
2 ABTABpABtABwABsABlAByAAgABsABpABrABlAAgAB0ABoABpABz Simplerlikethis
3 ABJABvAAgABJABvAAgABCAByABvABtABpABvABz IoIoBromios
So you'll still need to define whatever length you need. FCMP doesn't, as far as I know, allow an undefined-length string; you need to define the default (and maximum) length of the string you're going to return. The user is welcome to define a shorter length, and should, when it's appropriate.

How to decode an NCR to a Unicode character in C++

I want to decode an NCR value such as &#35686; to its equivalent Chinese character. Example: test direct(&#35686;&#23519;) should be converted to test direct(警察). I have tried the algorithm below, similar to the Java approach:
find the decimal value between &# and ; (i.e. 35686)
convert it to an int
get the equivalent character with char(35686), which should give the Unicode character
In Java this produces the expected output, but in C++ it produces the string test direct(fß) instead of the Chinese characters.
Please help me out of this.
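The likely culprit: a C++ char holds a single byte, so char(35686) keeps only the low byte (0x66, the letter f), which matches the garbled output above. For comparison, a minimal Python 2 sketch of the same three steps, where unichr() yields a whole code point (the helper name decode_ncr is mine, not a standard function):
# -*- coding: utf-8 -*-
import re

def decode_ncr(text):
    # find each decimal value between "&#" and ";", convert it to an
    # int, and map that code point to a character with unichr()
    return re.sub(r'&#(\d+);', lambda m: unichr(int(m.group(1))), text)

print decode_ncr(u'test direct(&#35686;&#23519;)')   # test direct(警察)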

Removing items in a set

I have a function that takes a set, prints the min/max, and then asks the user to remove items in the set until the set is empty.
My code:
def deleteFromNumSet(numsSet):
    while len(numsSet) != 0:
        print("Max is", max(numsSet), "and Min is", min(numsSet))
        num = input("Enter a number between the two.")
        if num in numsSet:
            print("The number is in the set.")
            numsSet.remove(num)
        else:
            print("The number is not in the set.")
    return("No more numbers left in the set.")
The code says "the number is not in the set" regardless of whether it actually is in the set. It worked in the emulator on repl.it (where I originally wrote it), but it does not work in my local Python (currently version 3.4.1). I would like to know why it worked on the older version of Python but does not work now, and a solution that works for current versions of Python.
input() returns a string, not an integer. If your set contains integers, Python will not consider the string equal to the numbers.
Convert the input to an integer first:
num = int(input("Enter a number between the two."))
In Python 2.7, input() is a different function; it essentially does the same as eval(input()) would do in Python 3. As such it automatically interprets digits as a Python integer literal.
Making your code work in both Python 2 and 3 would require a lot more experience with both versions; see the Porting Python 2 Code to Python 3 how-to if you really want to go this way.
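For illustration only (the name read_line is mine), one common pattern is to pick the string-reading function once and then convert explicitly, which behaves the same under both versions:
try:
    read_line = raw_input   # Python 2: returns the raw line as a string
except NameError:
    read_line = input       # Python 3: input() already returns a string

num = int(read_line("Enter a number between the two."))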

Segment a Korean word into individual syllables - C++/Python

I am trying to segment a Korean string into individual syllables.
So the input would be a string like "서울특별시" and the expected output "서", "울", "특", "별", "시".
I have tried to segment the string in both C++ and Python, but the result is a series of ? characters or whitespace respectively (the string itself, however, prints correctly on the screen).
In C++ I first initialized the input string as string korean = "서울특별시" and then used a string::iterator to go through the string and print each individual component.
In Python I just used a simple for loop.
I was wondering if there is a solution to this problem. Thanks.
I don't know Korean at all, and can't comment on the division into syllables, but in Python 2 the following works:
# -*- coding: utf-8 -*-
print(repr(u"서울특별시"))
print(repr(u"서울특별시"[0]))
Output:
u'\uc11c\uc6b8\ud2b9\ubcc4\uc2dc'
u'\uc11c'
In Python 3 you don't need the u for Unicode strings.
The outputs are the Unicode values of the characters in the string, which means the string has been correctly cut up in this case. The reason I printed them with repr is that the font in the terminal I used can't represent them, so without repr I just see square boxes. But that's purely a rendering issue; repr demonstrates that the data is correct.
So, if you know logically how to identify the syllables, you can use repr to see what your code has actually done. Unicode NFC sounds like a good candidate for actually identifying them (thanks to R. Martinho Fernandes), and unicodedata.normalize() is the way to get that.
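A minimal Python 2 sketch along those lines, assuming the input may contain decomposed jamo (with precomposed text like the example above, the loop alone already yields one syllable per iteration):
# -*- coding: utf-8 -*-
import unicodedata

korean = u"서울특별시"
# NFC composes any decomposed jamo sequences into single precomposed
# Hangul syllable code points, so iterating over the normalized
# unicode string yields one syllable at a time
for syllable in unicodedata.normalize('NFC', korean):
    print repr(syllable)   # u'\uc11c', u'\uc6b8', u'\ud2b9', ...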