Formatting the binary representation of data in GDB


These days I use the examine-memory feature of GDB a lot. However, I find the binary representation of data hard to read because all the bits are crammed together. I'd like to add some spacing to make it more readable, so that, for example, instead of 01101100011011000110010101001000 I get 0110-1100-0110-1100-0110-0101-0100-1000 or something similar.
Is this possible? The closest I got was x/4bt s, but there are still two problems: the data is grouped in bytes (8 bits rather than 4), and the bytes are laid out in reverse order (so it's 01001000 01100101 01101100 01101100).
Thanks

What you need is something similar to GDB pretty-printers, which are most often used to print STL containers, linked lists, and the like. A solution for your need may already exist, so it is worth searching for existing pretty-printers first.
The script below is just an example of how to write a custom command with GDB's Python extensions. It should work with GDB 7.3 and later; I tested it only with version 7.5.
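import gdb


class ppbin(gdb.Command):
    """ppbin <address> <word-count>: dump 32-bit words in binary, in 4-bit groups."""

    def __init__(self):
        super(ppbin, self).__init__("ppbin", gdb.COMMAND_USER)

    def invoke(self, arg, from_tty):
        print(arg)
        arg_list = gdb.string_to_argv(arg)
        if len(arg_list) < 2:
            print("usage: ppbin <address> <word-count>")
            return
        # Run "x/<count>wt <address>" and capture its output as a string.
        res = gdb.execute("x/%swt %s" % (arg_list[1], arg_list[0]), False, True)
        # Split on any whitespace, not just tabs: the last word of each output
        # line is followed by a newline and the next line's address label, so a
        # tab-only split would fuse the two together.
        for bv in res.split():
            # Keep only the 32-character binary words; skip address labels.
            if len(bv) == 32 and set(bv) <= set("01"):
                print("-".join(bv[i:i + 4] for i in range(0, 32, 4)))


ppbin()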
Invoking the new ppbin command:
(gdb) source pp-bin.py
(gdb) ppbin 0x601040 10
0x601040 10
0000-0000-1010-1010-0011-0011-0101-0101
0000-0000-0000-0000-0000-0000-0000-0000
0000-0000-0000-0000-0000-0000-0000-0000
0000-0000-0000-0000-0000-0000-0000-0000
0000-0000-0000-0000-0000-0000-0000-0000
0000-0000-0000-0000-0000-0000-0000-0000
0000-0000-0000-0000-0000-0000-0000-0000
0000-0000-0000-0000-0000-0000-0000-0000
(gdb)
The above code is shared at https://skamath@bitbucket.org/skamath/ppbin.git
P.S. - I usually find debugging memory in hex (x command) is easier than binary, so I will not use my solution.
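The nibble grouping asked for in the question can also be done in plain Python and then exposed to GDB. The following is a minimal sketch rather than part of the answer above: group_bits and the $nibbles convenience function are hypothetical names chosen for illustration, and the gdb part only runs inside GDB's embedded Python.

```python
def group_bits(value, width=32, group=4, sep="-"):
    """Return `value` as a zero-padded binary string split into `group`-bit chunks."""
    bits = format(value & ((1 << width) - 1), "0%db" % width)
    return sep.join(bits[i:i + group] for i in range(0, width, group))


try:
    import gdb  # only importable inside GDB's embedded Python
except ImportError:
    gdb = None

if gdb is not None:
    class Nibbles(gdb.Function):
        """$nibbles(expr): format a 32-bit integer expression in 4-bit groups."""

        def __init__(self):
            super(Nibbles, self).__init__("nibbles")

        def invoke(self, value):
            return group_bits(int(value))

    Nibbles()

# The word from the question:
print(group_bits(0x6C6C6548))  # 0110-1100-0110-1100-0110-0101-0100-1000
```

Inside GDB, after sourcing the file, something like p $nibbles(*(unsigned int *)s) should format the word at s as a single value. Reading a whole word this way also sidesteps the reversed byte order seen with x/4bt on a little-endian machine.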

Related

How do I access binary data via python registry?

The data in the registry key looks like:
Name  Type        Value
Data  REG_BINARY  60 D0 DB 9E 2D 47 CF 01
The data represents an 8-byte (little-endian QWORD) FILETIME value. Why they chose REG_BINARY rather than REG_QWORD is anyone's guess.
In the Python 2.7 code I can see that the value has been located and that a value object contains the key information, such as:
print "***", value64.name(), value64.value_type(), value64.value
*** Data 3 <bound method RegistryValue.value of <Registry.Registry.RegistryValue object at 0x7f2d500b3990>>
The name 'Data' is correct, and a value_type of 3 means REG_BINARY, so that is correct too.
The documentation for python-registry (assuming I have the right doc) is
https://github.com/williballenthin/python-registry/blob/master/documentation/registry.html
However, I can't figure out which methods or functions are provided to process binary data.
Because I know this binary data will always be 8 bytes, I'm tempted to cast the object pointer to a QWORD (double) pointer and read the value directly, but I'm not sure the object points to the data, or how I would do this in Python anyway.
Any pointers appreciated.

I figured out that the type of value64.value() was 'str', so I used simple character indexing to reference each of the 8 bytes and converted the value to an integer.
def bin_to_longlong(binval):
    return ord(binval[7])*(2**56) + ord(binval[6])*(2**48) + ord(binval[5])*(2**40) + ord(binval[4])*(2**32) + \
           ord(binval[3])*(2**24) + ord(binval[2])*(2**16) + ord(binval[1])*(2**8) + ord(binval[0])
Code by me.
This can be tidied up by using struct.unpack (after import struct), replacing the function body like so:
    return struct.unpack('<Q', binval)[0]  # '<Q': little-endian unsigned long long
I then converted the integer (FILETIME value) to a date.
from datetime import datetime
EPOCH_AS_FILETIME = 116444736000000000  # January 1, 1970 as MS file time
HUNDREDS_OF_NANOSECONDS = 10000000
def filetime_to_dt(ft):
    return datetime.fromtimestamp((ft - EPOCH_AS_FILETIME) / HUNDREDS_OF_NANOSECONDS)
Code from: https://gist.github.com/Mostafa-Hamdy-Elgiar/9714475f1b3bc224ea063af81566d873
Like so:
value64date = filetime_to_dt(bin_to_longlong(value64.value()))
Now hopefully someone can show me how to do that elegantly in Python!
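One way to combine the two steps is sketched below in Python 3. It is an illustration under stated assumptions, not python-registry's own API: filetime_bytes_to_dt is a hypothetical helper, and the raw bytes are typed in by hand from the registry listing in the question instead of being read through value64.value(). It adds the tick count to the FILETIME epoch (January 1, 1601 UTC) with integer arithmetic, so no float is involved.

```python
import struct
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)  # FILETIME counts from here


def filetime_bytes_to_dt(raw):
    """Decode 8 little-endian bytes holding a FILETIME (100 ns ticks since 1601)."""
    (ticks,) = struct.unpack("<Q", raw)
    # 10 ticks per microsecond; integer division avoids float rounding.
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


# The REG_BINARY value from the question, entered by hand.
raw = bytes.fromhex("60 D0 DB 9E 2D 47 CF 01")
print(filetime_bytes_to_dt(raw).isoformat())
```

The result is a timezone-aware UTC datetime; call .astimezone() on it if local time is wanted.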

Python 2 str.decode('hex') in Python 3?

I want to send hex encoded data to another client via sockets in python. I managed to do everything some time ago in python 2. Now I want to port it to python 3.
Data looks like this:
""" 16 03 02 """
Then I used this function to get it into a string:
x.replace(' ', '').replace('\n', '').decode('hex')
It then looks like this (which is a type str by the way):
'\x16\x03\x02'
Now I managed to find this in python 3:
codecs.decode('160302', 'hex')
but it returns another type:
b'\x16\x03\x02'
Since the data I encode is not text in any proper language, I cannot use UTF-8 or a similar decoder, because it contains invalid bytes (e.g. \x00, \xFF). Any ideas on how I can get an escaped string again, just like in Python 2?
Thanks

'str' objects in python 3 are not sequences of bytes but sequences of unicode code points.
If by "send data" you mean calling send then bytes is the right type to use.
If you really want the string (not 3 bytes but 12 unicode code points):
>>> import codecs
>>> s = str(codecs.decode('16ff00', 'hex'))[2:-1]
>>> s
'\\x16\\xff\\x00'
>>> print(s)
\x16\xff\x00
Note that you need to double backslashes in order to represent them in code.

There is a standard solution that works in both Python 2 and Python 3, with no imports needed:
hex_string = """ 16 03 02 """
some_bytes = bytearray.fromhex(hex_string)
In Python 3 you can treat it much like a str (slice it, iterate over it, etc., although iterating yields integers), and you can concatenate it with byte strings such as b'\x00', b'text' or bytes('text', 'utf8').
You also mentioned UTF-8. A bytearray has no encode() method; to get text out of it you call decode(), which only succeeds if the bytes are valid in the chosen encoding:
some_bytes.decode()
As you can see, you don't need to clean up the hex string first, because fromhex skips the spaces. To get back to a hexadecimal string in Python 3.5 or later, use some_bytes.hex().
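The points made in the answers above can be put together in a short Python 3 sketch. The latin-1 step is an addition that none of the answers mention, and it assumes the asker really does need a str with one character per byte; for sending over a socket, bytes is already the right type.

```python
import codecs

hex_string = """ 16 03 02 """

# fromhex skips the spaces between byte pairs.
payload = bytes.fromhex(hex_string.replace("\n", ""))
assert payload == codecs.decode("160302", "hex")  # same result as the codecs route

# bytes is what sockets expect, e.g. sock.sendall(payload)

# If a str is really required, latin-1 maps bytes 0..255 to code points 0..255,
# so it round-trips arbitrary data such as \x00 and \xff without errors.
as_text = b"\x16\xff\x00".decode("latin-1")
assert as_text.encode("latin-1") == b"\x16\xff\x00"

print(payload, payload.hex())
```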

a = """ 16 03 02 """.encode("utf-8")
#Send things over socket
print(a.decode("utf-8"))
Why not encode with UTF-8, send it over the socket, and decode with UTF-8 again?

How to mix queue-based and feed-based input in TensorFlow

I've recently migrated to a fully_connected style model that reads inputs from a queue generated from a TFRecords file. This has proven much more efficient, but I still would like to pass parameters interactively with placeholder/feed_dict.
Is there a way to use the same computation graph (say you have a model class that builds a graph in the init method) for both a feed_dict and full_connected functionality? Can you get a placeholder to receive values from a dequeue?

One possibility is to use the recently added (in TensorFlow 0.8) tf.placeholder_with_default() op, which allows you to specify a default value (typically the output of the queue/reader), and also allows you to feed values that might have different shapes.
For example, let's say your queue produces batches of 32 elements, where each element has 784 features, giving a 32 x 784 matrix.
input_from_queue = ... # e.g. `queue.dequeue_many(32)` or `tf.train.batch(..., 32)`
# input_from_queue.get_shape() ==> (32, 784)
input = tf.placeholder_with_default(input_from_queue, shape=(None, 784))
# input.get_shape() ==> (?, 784)
# ...
train_op = ...
sess.run(train_op) # Takes examples from `queue`.
sess.run(train_op, feed_dict={input: ...}) # Takes examples from `feed_dict`.
This allows you to feed in variable-sized batches or use an input reader, as desired.

You could simply feed the output of the dequeue operation. TensorFlow would not actually dequeue any item, it would just use the value you provided. For example:
q = tf.FIFOQueue(capacity=10, dtypes=[tf.float32], shapes=[()])
v = tf.placeholder(tf.float32)
enqueue = q.enqueue([v])
dequeue = q.dequeue()
output = dequeue + 10.0
with tf.Session() as sess:
    sess.run(enqueue, feed_dict={v: 1.0})
    sess.run(enqueue, feed_dict={v: 2.0})
    sess.run(enqueue, feed_dict={v: 3.0})
    print(sess.run(output))  # 11.0
    print(sess.run(output, feed_dict={dequeue: 5.0}))  # 15.0
    print(sess.run(output))  # 12.0
    print(sess.run(output))  # 13.0

Reproducing legacy binary file with Python

I'm trying to write a legacy binary file format in Python 2.7 (the file will be read by a C program).
Is there a way to output the hex representation of integers to a file? I suspect I'll have to roll my own (not least because I don't think Python has the concept of short int, int and long int), but just in case I thought I'd ask. If I have a list:
[0x20, 0x3AB, 0xFFFF]
Is there an easy way to write that to a file so a hex editor would show the file contents as:
20 00 AB 03 FF FF
(note the endianness)?

Since you have some specific formatting needs, I think that using hex is out - you don't need the prefix. We use format instead.
data = [0x20, 0x3AB, 0xFFFF]
def split_digit(n):
    """ Bitmasks out the first and second bytes of a <=32 bit number.
    Consider checking if isinstance(n, long) and throwing an error.
    """
    return (0x00ff & n, (0xff00 & n) >> 8)
[hex(x) + ' ' + hex(y) for x, y in [split_digit(d) for d in data]]
# ['0x20 0x0', '0xab 0x3', '0xff 0xff']
with open('myFile.bin', 'wb') as fh:
    for datum in data:
        little, big = split_digit(datum)
        fh.write(format(little, '02x'))
        fh.write(format(big, '02x'))
...or something like that? You'll need to change the formatting a bit, I bet.
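Note that format(little, '02x') writes the ASCII characters of the hex digits, not the raw bytes a hex editor would display as 20 00 AB 03 FF FF. A minimal sketch of the raw-byte variant uses the struct module. It assumes every value is an unsigned 16-bit integer, which matches the expected output in the question but is not stated there, and it writes to the temporary directory purely for illustration.

```python
import os
import struct
import tempfile

data = [0x20, 0x3AB, 0xFFFF]

# '<' = little-endian, 'H' = unsigned 16-bit; one 'H' per value.
packed = struct.pack("<%dH" % len(data), *data)

# File name taken from the answer above; the directory is an arbitrary choice.
path = os.path.join(tempfile.gettempdir(), "myFile.bin")
with open(path, "wb") as fh:
    fh.write(packed)

# Show the bytes the way a hex editor would.
print(" ".join("%02X" % b for b in bytearray(packed)))  # 20 00 AB 03 FF FF
```

struct.pack raises struct.error if a value does not fit in 16 bits, which gives the range check suggested in the docstring above for free.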

gdb - how to see what function used stdout?

I have an OpenGL library bug I'm trying to trace, and the bug prints out something that looks like C code into stdout. This is the program, and the bug occurs as soon as glutMainLoop() is called, but I suspect that it's not that function that is faulty. How would I go tracing what function wrote to stdout?
As per request, the output:
arc_ccw_turn, p = 0
case b
arc_ccw_turn, p = 0
case d
arc_ccw_turn, p = 0
case a
arc_ccw_turn, p = 0
case c
I've reported the bug already, but I'd try and provide a GDB backtrace for the issue too.

If you are using Linux, set a breakpoint on write(); all output to stdout and stderr eventually goes through this function. The following is for x86-64; for other architectures you would need to change the register names:
$ gdb /usr/bin/cat
Reading symbols from /usr/bin/cat...(no debugging symbols found)...done.
(gdb) set args /proc/cpuinfo
(gdb) b write
Breakpoint 1 at 0x401740
(gdb) condition 1 ($rdi == 1 || $rdi == 2)
(gdb) display $rdi
(gdb) display $rsi
(gdb) display $rdx
(gdb) r
Starting program: /usr/bin/cat /proc/cpuinfo
3: $rdx = 3368
2: $rsi = 6348800
1: $rdi = 1
(gdb) p (char*)$rsi
$4 = 0x60e000 "processor\t: 0\nvendor_id\t: GenuineIntel\ncpu family\t: 6\nmodel\t\t: 30\nmodel name\t: Intel(R) Core(TM) i7 CPU 870 @ 2.93GHz\nstepping\t: 5\nmicrocode\t: 0x5\ncpu MHz\t\t: 1199.000\ncache size\t: 8192 KB\nphy"...

Put breakpoints on std::streambuf::sputc and std::streambuf::sputn. If necessary, do print std::cout.rdbuf() once you're in main, and condition the breakpoints on this being equal to the value you get back from that expression.
