How do I access binary data via python registry? - python-2.7

The data in the registry key looks like:
Name Type Value
Data REG_BINARY 60 D0 DB 9E 2D 47 Cf 01
The data represents an 8-byte (little-endian QWORD) FILETIME value. Why they chose to use binary rather than REG_QWORD is anyone's guess.
In the Python 2.7 code I can see that the value has been located, and the value object contains the key information, such as:
print "***", value64.name(), value64.value_type(), value64.value
*** Data 3 <bound method RegistryValue.value of <Registry.Registry.RegistryValue object at 0x7f2d500b3990>>
The name 'Data' is correct and the value_type of 3 means REG_BINARY so that is correct.
The documentation to the python.registry (assuming I have the right doc) is
https://github.com/williballenthin/python-registry/blob/master/documentation/registry.html
However, I can't figure out what methods/functions have been provided to process binary data.
Because I know this binary data will always be 8 bytes I'm tempted to cast the object pointer to a QWORD (double) pointer and get the value directly but I'm not sure the object points to the data or how I would do this in python anyway.
Any pointers appreciated.

I figured out that value64.value() returns a 'str', so I used simple character indexing to reference each of the 8 bytes and converted the value to an integer.
def bin_to_longlong(binval):
    return ord(binval[7])*(2**56) + ord(binval[6])*(2**48) + ord(binval[5])*(2**40) + ord(binval[4])*(2**32) + \
           ord(binval[3])*(2**24) + ord(binval[2])*(2**16) + ord(binval[1])*(2**8) + ord(binval[0])
Code by me.
which can be tidied up by using struct.unpack like so:
    return struct.unpack('<Q', binval)[0] # '<Q' little endian unsigned long long
I then converted the integer (filetime value) to a date.
from datetime import datetime
EPOCH_AS_FILETIME = 116444736000000000 # January 1, 1970 as MS file time
HUNDREDS_OF_NANOSECONDS = 10000000
def filetime_to_dt(ft):
    return datetime.fromtimestamp((ft - EPOCH_AS_FILETIME) / HUNDREDS_OF_NANOSECONDS)
Code from : https://gist.github.com/Mostafa-Hamdy-Elgiar/9714475f1b3bc224ea063af81566d873
Like so :
value64date = filetime_to_dt(bin_to_longlong(value64.value()))
Now hopefully someone can show me how to do that elegantly in python!
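One possible tidier version, as a minimal sketch: it assumes the raw 8 bytes are what value64.value() returns, and the helper name filetime_bytes_to_dt is made up for illustration. It adds the tick count to the FILETIME epoch (1601-01-01 UTC) instead of going through a Unix timestamp, so the result is a naive UTC datetime.

```python
import struct
from datetime import datetime, timedelta

# FILETIME counts 100-nanosecond ticks since 1601-01-01 (UTC).
FILETIME_EPOCH = datetime(1601, 1, 1)

def filetime_bytes_to_dt(raw):
    """Convert 8 little-endian bytes holding a FILETIME to a naive UTC datetime."""
    (ticks,) = struct.unpack('<Q', raw)  # '<Q' = little-endian unsigned 64-bit
    # 10 ticks per microsecond; integer division drops the sub-microsecond part.
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)

# The bytes shown in the question: 60 D0 DB 9E 2D 47 CF 01
raw = b'\x60\xD0\xDB\x9E\x2D\x47\xCF\x01'
print(filetime_bytes_to_dt(raw))
```

In the question's code this would be called as filetime_bytes_to_dt(value64.value()).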

Related

Decoding register data from our Solar Power system

I have a Python program which is accessing one of the devices on our Solar Power system. I can read the registers which are supposed to conform to the SunSpec conventions. I have been able to decode most of the values, but I'm stuck on decoding the TCP_Address and gateway which are sourced from these two registers:
TCP Address:
reg 22 value 49320 in HEX 0xc0a8
reg 23 value 64 in HEX 0x40
Gateway Address:
reg 24 value 49320 in HEX 0xc0a8
reg 25 value 1 in HEX 0x1
The documentation says that the format for these values is "uint32", which I interpret to mean unsigned 32-bit integer. The result of decoding should be something like 192.168.0.?.
Can anyone assist to understand how to convert the above to that format in Python? Thanks...RDK
I would say that
0xc0 0xa8 (0x00) 0x01
is 192.168.0.1, your gateway. It seems you overlooked that both registers are 16 bits wide, so you neglected the high byte (0x00) of the second register.
Here is my solution to this problem:
def Decode_TCPIP(reg1,reg2):
    # print("Reg1 = " + str(reg1) + " Reg2 = " + str(reg2))
    UpperMask = 0xff00
    LowerMask = 0x00ff
    First = (reg1 & UpperMask) >> 8   # shift instead of /256 so the result stays an int
    Second = (reg1 & LowerMask)
    Third = (reg2 & UpperMask) >> 8
    Fourth = (reg2 & LowerMask)
    return First, Second, Third, Fourth
The returned values are then the four octets of the IP address....RDK
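The same decoding can be sketched with the standard library. This assumes, as the register values in the question suggest, that the first register holds the high 16 bits of the uint32; the function name registers_to_ipv4 is hypothetical.

```python
import socket
import struct

def registers_to_ipv4(high_word, low_word):
    """Join two 16-bit registers (high word first) into a dotted-quad string."""
    # '>HH' packs two unsigned 16-bit values big-endian, giving 4 bytes in network order.
    packed = struct.pack('>HH', high_word, low_word)
    return socket.inet_ntoa(packed)

print(registers_to_ipv4(49320, 64))  # 192.168.0.64
print(registers_to_ipv4(49320, 1))   # 192.168.0.1
```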

Python 2 str.decode('hex') in Python 3?

I want to send hex encoded data to another client via sockets in python. I managed to do everything some time ago in python 2. Now I want to port it to python 3.
Data looks like this:
""" 16 03 02 """
Then I used this function to get it into a string:
x.replace(' ', '').replace('\n', '').decode('hex')
It then looks like this (which is a type str by the way):
'\x16\x03\x02'
Now I managed to find this in python 3:
codecs.decode('160302', 'hex')
but it returns another type:
b'\x16\x03\x02'
And since what I encode is not text in any proper language, I cannot use utf-8 or similar decoders, as there are invalid bytes in it (e.g. \x00, \xFF). Any ideas on how I can get the escaped string result again, just like in Python 2?
Thanks
'str' objects in python 3 are not sequences of bytes but sequences of unicode code points.
If by "send data" you mean calling send then bytes is the right type to use.
If you really want the string (not 3 bytes but 12 unicode code points):
>>> import codecs
>>> s = str(codecs.decode('16ff00', 'hex'))[2:-1]
>>> s
'\\x16\\xff\\x00'
>>> print(s)
\x16\xff\x00
Note that you need to double backslashes in order to represent them in code.
There is a standard solution for Python 2 and Python 3. No imports needed:
hex_string = """ 16 03 02 """
some_bytes = bytearray.fromhex(hex_string)
In Python 3 you can treat it much like a str (slice it, iterate over it, etc.), and you can also append byte strings: b'\x00', b'text' or bytes('text','utf8').
You also mentioned UTF-8. A bytearray has decode() rather than encode(), so if the bytes happen to be valid UTF-8 you can get text with:
some_bytes.decode('utf-8')
As you can see, you don't need to clean the input first. If you want to get back to a hexadecimal string, some_bytes.hex() will do it for you (Python 3.5 and later).
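A minimal Python 3 sketch of this approach, with hypothetical variable names. The latin-1 step is an addition that is only needed if a str really is required: latin-1 maps every byte 0-255 to the code point with the same number, so bytes such as \x00 and \xFF survive the round trip. For sending over a socket, the bytes object can be used directly.

```python
hex_text = """ 16 03 02 """

# Strip all whitespace first so this also works on Python 3 versions
# where fromhex() accepts spaces but not newlines.
payload = bytes.fromhex("".join(hex_text.split()))

# Only if a str is really needed: one code point per byte, reversible.
as_text = payload.decode("latin-1")
round_trip = as_text.encode("latin-1")

print(payload)        # b'\x16\x03\x02'
print(payload.hex())  # 160302
```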
a = """ 16 03 02 """.encode("utf-8")
#Send things over socket
print(a.decode("utf-8"))
Why not encode with UTF-8, send over the socket, and decode with UTF-8 again?

"Unknown Label type" decision tree classifier with floats

I want to use a decision tree to predict the value of a float based on 6 features that are also float values. I realise that a decision tree may not be the best method, but I am comparing multiple methods to try and understand them better.
The error I am getting is "Unknown label type" on my y training data list. I have read that "DecisionTreeClassifier" accepts float values, and that typically the values are converted to float32 anyway. I am explicitly setting the values in my list to float32, yet there still seems to be a problem. Can anybody help?
sample of my x training data (features_x_train) :
[[ 2.49496743e-01 6.07936502e-01 -4.20752168e-01 -3.88045199e-02
-7.59323120e-01 -7.59323120e-01]
[ 4.07418489e-01 5.36915325e-02 2.95270741e-01 1.87122121e-01
9.89770174e-01 9.89770174e-01]]
sample of my y training data (predict_y_train): [ -7.59323120e-01 9.89770174e-01]
Code...
df_train = wellbeing_df[feature_cols].sample(frac=0.9)
#Split columns into predictor and result
features_x_train = np.array(df_train[list(top_features_cols)].values).astype(np.float32)
predict_y_train = np.asarray(df_train['Happiness score'], dtype=np.float32)
#Setup decision tree
decision_tree = tree.DecisionTreeClassifier()
decision_tree = decision_tree.fit(features_x_train, predict_y_train) #Train tree on 90% of available data
error:
ValueError Traceback (most recent call last)
<ipython-input-103-a44a03982bdb> in <module>()
19 #Setup decision tree
20 decision_tree = tree.DecisionTreeClassifier()
---> 21 decision_tree = decision_tree.fit(features_x_train, predict_y_train) #Train tree on 90% of available data
22
23 #Test on remaining 10%
C:\Users\User\Anaconda2\lib\site-packages\sklearn\tree\tree.pyc in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
175
176 if is_classification:
--> 177 check_classification_targets(y)
178 y = np.copy(y)
179
C:\Users\User\Anaconda2\lib\site-packages\sklearn\utils\multiclass.pyc in check_classification_targets(y)
171 if y_type not in ['binary', 'multiclass', 'multiclass-multioutput',
172 'multilabel-indicator', 'multilabel-sequences']:
--> 173 raise ValueError("Unknown label type: %r" % y)
174
175
ValueError: Unknown label type: array([[ -7.59323120e-01],
[ 9.89770174e-01],
Also, if I change the list to string values then the code runs.
A decision tree classifier is, well... a classifier. A classifier is an estimator of a function from some arbitrary space (usually R^d) into a finite space of values, called the label space. Consequently scikit-learn expects you to pass something label-like: integers, strings, etc. Floats are not a typical encoding of a finite space; they are used for regression.
In short, you seem to be confusing classification and regression. How to distinguish them?
If you have y as floats, but only a finite number of different values can occur, and all of them occur in the training set, then this is classification: just convert your values to strings or integers and you are good to go.
If you have y as floats that are actual real values, with many possible values, including ones not seen in the training set, and you expect your model to somehow "interpolate", then this is regression and you are supposed to use DecisionTreeRegressor instead.
Use sklearn.tree.DecisionTreeRegressor().
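A minimal sketch of the regression variant, assuming scikit-learn and NumPy are installed. The toy arrays below are made up for illustration and stand in for features_x_train and predict_y_train from the question.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Made-up toy data: 4 samples with 2 features each, and continuous targets.
X = np.array([[0.25, 0.61],
              [0.41, 0.05],
              [-0.42, -0.04],
              [0.30, 0.19]], dtype=np.float32)
y = np.array([-0.76, 0.99, 0.12, 0.45], dtype=np.float32)

# A regressor accepts float targets, so no "Unknown label type" error is raised.
regressor = DecisionTreeRegressor(random_state=0)
regressor.fit(X, y)

# An unrestricted tree reproduces its training targets when all rows are distinct.
predictions = regressor.predict(X)
print(predictions)
```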

Reproducing legacy binary file with Python

I'm trying to write a legacy binary file format in Python 2.7 (the file will be read by a C program).
Is there a way to output the hex representation of integers to a file? I suspect I'll have to roll my own (not least because I don't think Python has the concept of short int, int and long int), but just in case I thought I'd ask. If I have a list:
[0x20, 0x3AB, 0xFFFF]
Is there an easy way to write that to a file so a hex editor would show the file contents as:
20 00 AB 03 FF FF
(note the endianness)?
Since you have specific formatting needs, I think hex is out: you don't need the prefix, and the file needs raw bytes rather than hex text anyway.
data = [0x20, 0x3AB, 0xFFFF]
def split_digit(n):
    """ Masks out the low and high bytes of a number that fits in 16 bits.
    Consider raising an error if n > 0xFFFF.
    """
    return (0x00ff & n, (0xff00 & n) >> 8)
[hex(x) + ' ' + hex(y) for x, y in [split_digit(d) for d in data]]
# ['0x20 0x0', '0xab 0x3', '0xff 0xff']
with open('myFile.bin', 'wb') as fh:
    for datum in data:
        little, big = split_digit(datum)
        # write the raw byte values, low byte first; format(little, '02x')
        # would write the ASCII characters '2' and '0' instead
        fh.write(bytearray([little, big]))
...or something like that? You may need to adjust the details a bit.
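An alternative sketch using the struct module, which handles byte order and integer width directly. It assumes every value fits in an unsigned 16-bit field ('H'); struct.pack raises struct.error otherwise. The hex_view string only imitates what a hex editor would display.

```python
import struct

data = [0x20, 0x3AB, 0xFFFF]

# '<' = little-endian, 'H' = unsigned 16-bit; builds the format '<3H' here.
packed = struct.pack('<%dH' % len(data), *data)

# Hex-editor style view of the packed bytes.
hex_view = ' '.join('%02X' % b for b in bytearray(packed))
print(hex_view)  # 20 00 AB 03 FF FF

# To produce the file itself:
# with open('myFile.bin', 'wb') as fh:
#     fh.write(packed)
```

Other widths follow the same pattern, for example 'I' for unsigned 32-bit and 'Q' for unsigned 64-bit values.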

Python rounding error for ints in a list (list contains ints and strings)

I have a python (2.7) script that reads an input file that contains text setup like this:
steve 83 67 77
The script averages the numbers corresponding to each name and returns a list for each name that contains the person's name along with the average. For example, the output looks like this:
steve 75
However, the actual average value for "steve" is "75.66666667". Because of this, I would like the return value to be 76, not 75 (i.e. I would like it to round to the nearest whole integer). I'm not sure how to get this done... Here is my code:
filename = raw_input('Enter a filename: ')
file=open(filename,"r")
line = file.readline()
students=[]
while line != "":
    splitedline=line.split(" ")
    average=0
    for i in range(len(splitedline)-1) :
        average+=int(splitedline[i+1])
    average=average/(len(splitedline)-1)
    students.append([splitedline[0],average])
    line = file.readline()
for v in students:
    print " ".join(map(str, v))
file.close()
While your code is very messy and should be improved overall, the solution to your problem should be simple:
average=average/(len(splitedline)-1)
should be:
average /= float(len(splitedline) - 1)
average = int(round(average))
By default in Python 2.x, / with two integers does floor division. You must explicitly make one of the operands a floating-point number to get true division. Then you must round the result and turn it back into an integer.
In Python 3, floor division is // and true division is /. You can get this behavior in Python 2 with from __future__ import division.
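Put together, the fix can be sketched as a small helper. The function name rounded_average is hypothetical, and the sketch assumes each line is a name followed by whitespace-separated integers. One caveat: for exact .5 cases, round() rounds half away from zero in Python 2 but half to even in Python 3.

```python
def rounded_average(line):
    """Return (name, average rounded to the nearest integer) for one input line."""
    parts = line.split()
    scores = [int(p) for p in parts[1:]]
    # float() forces true division on Python 2 as well as Python 3.
    average = sum(scores) / float(len(scores))
    return parts[0], int(round(average))

name, avg = rounded_average("steve 83 67 77")
print("%s %d" % (name, avg))  # steve 76
```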