How to debug a raw binary with gdb

I have an executable for an embedded device.
It does not have a header that gdb recognizes; instead it uses a proprietary header specified by the vendor.
I can analyse the file just fine using IDA Pro, but I'd like to run some code to see what it does.
The executable is loaded at address 0x52000000.
However, if I just load the file using
exec-file myfile
I get
"myfile": not in executable format: File format not recognized
And if I restore the memory at the correct location using:
restore myfile 52000000
I get:
You can't do that without a process to debug.
How do I get out of this chicken-and-egg problem?
I just want to jump into the middle of the code, set some registers to predetermined values, and run some code to see what happens.
Note that I'm using the gdb ARM toolchain from ARM itself.

As per @artless_noise's suggestion, I did the following:
objcopy.exe --output-target=elf32-bigarm --input-target=binary --change-start=0x52000000 INPUTFILE OUTPUTFILE
This adds an ELF header to the file.
However, it does not fix the whole problem.
The output of
readelf.exe -a OUTPUTFILE
gives:
ELF Header:
Magic: 7f 45 4c 46 01 02 01 61 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, big endian
Version: 1 (current)
OS/ABI: ARM
ABI Version: 0
Type: REL (Relocatable file)
Machine: ARM
Version: 0x1
Entry point address: 0x52000000
Start of program headers: 0 (bytes into file)
Start of section headers: 57316 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 0 (bytes)
Number of program headers: 0
Size of section headers: 40 (bytes)
Number of section headers: 5
Section header string table index: 2
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .data PROGBITS 00000000 000034 00df8c 00 WA 0 0 1
.....
Note that the .data section still has an address of 0x00000000. This should be 0x52000000.
To fix this, I opened the file in a hex editor at offset 0xdf8c.
This is close to where the section headers are (readelf reports that they start at byte 57316, i.e. 0xdfe4).
The structure of a section header is as follows, along with the values I expect to find for .data:
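typedef struct {
Elf32_Word sh_name;      /* offset of the section name in .shstrtab */
Elf32_Word sh_type;      /* = 1 (SHT_PROGBITS) */
Elf32_Word sh_flags;     /* = 3 (SHF_WRITE | SHF_ALLOC, shown as "WA") */
Elf32_Addr sh_addr;      /* = 0x00000000, should be 0x52000000 */
Elf32_Off sh_offset;     /* = 0x00000034 */
Elf32_Word sh_size;      /* = 0x0000df8c */
Elf32_Word sh_link;      /* = 0 */
Elf32_Word sh_info;      /* = 0 */
Elf32_Word sh_addralign; /* = 1 */
Elf32_Word sh_entsize;   /* = 0 */
} Elf32_Shdr;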
The first header (index 0) is always all zeros; the second header describes the .data section.
So I looked for these known values, filled in the starting address (written big-endian as 52 00 00 00, since this is an elf32-bigarm file), saved the file, and reloaded it into gdb.
Now it works.
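The hex-editor step can also be scripted. The following is a minimal Python sketch, not the author's method: set_section_addr is a hypothetical helper that walks the ELF32 section header table, finds a section by name through the section-name string table, and overwrites its sh_addr field in the file's own byte order. It assumes a well-formed ELF32 file like the one objcopy produced. GNU objcopy's --change-section-address option (for example --change-section-address .data=0x52000000) may achieve the same result directly, though that is untested here.

```python
import struct

def set_section_addr(elf: bytearray, name: str, addr: int) -> int:
    """Overwrite sh_addr of the named section in an ELF32 image, in place.

    Returns the file offset of the patched sh_addr field.
    """
    if elf[:4] != b"\x7fELF" or elf[4] != 1:  # EI_CLASS 1 = ELF32
        raise ValueError("not an ELF32 file")
    end = "<" if elf[5] == 1 else ">"  # EI_DATA: 1 = little, 2 = big endian
    (e_shoff,) = struct.unpack_from(end + "I", elf, 32)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from(end + "3H", elf, 46)

    def header_offset(i: int) -> int:
        return e_shoff + i * e_shentsize

    # sh_offset (at +16) of the section-name string table
    (strtab,) = struct.unpack_from(end + "I", elf, header_offset(e_shstrndx) + 16)
    for i in range(e_shnum):
        (sh_name,) = struct.unpack_from(end + "I", elf, header_offset(i))
        start = strtab + sh_name
        if elf[start:elf.index(b"\0", start)] == name.encode():
            field = header_offset(i) + 12  # sh_addr follows sh_name, sh_type, sh_flags
            struct.pack_into(end + "I", elf, field, addr)
            return field
    raise KeyError(name)
```

Typical use: read OUTPUTFILE into a bytearray, call set_section_addr(elf, ".data", 0x52000000), and write the bytes back to the file.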


Linking with Windows Projected File System DLL/LIB

I am trying to build the RegFS sample to better understand the Windows Projected File System. My code builds without warnings, but I get dynamic linking errors when I run it. Below is a sample error, followed by the code that causes it.
"The procedure entry point PrjWritePlaceholderInfo could not be located in the dynamic link library."
HRESULT VirtualizationInstance::WritePlaceholderInfo(
LPCWSTR relativePath,
PRJ_PLACEHOLDER_INFO* placeholderInfo,
DWORD length
) {
return PrjWritePlaceholderInfo(
_instanceHandle,
relativePath,
placeholderInfo,
length);
}
I'm sure I did something wrong when I was linking. Under [Project Property Pages] > Linker > Input, I prepended "ProjectedFSlib.lib" to "Additional Dependencies."
This is my first time using Visual Studio with libraries not linked in by default, and I've been unable to find instructions on how to locate and link libraries within the Windows SDK.
Thanks for your help!
EDIT:
The DUMPBIN output is:
Dump of file ProjectedFSLib.lib
File Type: LIBRARY
Exports
ordinal name
PrjAllocateAlignedBuffer
PrjClearNegativePathCache
PrjCloseFile
PrjCommandCallbacksInit
PrjCompleteCommand
PrjConfigureVolume
PrjConvertDirectoryToPlaceholder
PrjCreatePlaceholderAsHardlink
PrjDeleteFile
PrjDetachDriver
PrjDoesNameContainWildCards
PrjFileNameCompare
PrjFileNameMatch
PrjFillDirEntryBuffer
PrjFreeAlignedBuffer
PrjGetOnDiskFileState
PrjGetVirtualizationInstanceIdFromHandle
PrjGetVirtualizationInstanceInfo
PrjMarkDirectoryAsPlaceholder
PrjOpenFile
PrjReadFile
PrjStartVirtualizationInstance
PrjStartVirtualizationInstanceEx
PrjStartVirtualizing
PrjStopVirtualizationInstance
PrjStopVirtualizing
PrjUpdateFileIfNeeded
PrjUpdatePlaceholderIfNeeded
PrjWriteFile
PrjWriteFileData
PrjWritePlaceholderInfo
PrjWritePlaceholderInformation
PrjpReadPrjReparsePointData
Summary
D8 .debug$S
14 .idata$2
14 .idata$3
8 .idata$4
8 .idata$5
14 .idata$6
A DUMPBIN of the executable imports results in:
Dump of file regfs.exe
File Type: EXECUTABLE IMAGE
Section contains the following imports:
PROJECTEDFSLIB.dll
14006D2A0 Import Address Table
14006D9E0 Import Name Table
0 time date stamp
0 Index of first forwarder reference
1E PrjWritePlaceholderInfo
1D PrjWriteFileData
19 PrjStopVirtualizing
17 PrjStartVirtualizing
C PrjFileNameMatch
D PrjFillDirEntryBuffer
E PrjFreeAlignedBuffer
0 PrjAllocateAlignedBuffer
11 PrjGetVirtualizationInstanceInfo
12 PrjMarkDirectoryAsPlaceholder
B PrjFileNameCompare
KERNEL32.dll
14006D098 Import Address Table
14006D7D8 Import Name Table
0 time date stamp
0 Index of first forwarder reference
389 IsProcessorFeaturePresent
382 IsDebuggerPresent
466 RaiseException
1B1 FreeLibrary
BA CreateDirectoryW
116 DeleteFileW
59A TerminateProcess
4BD RemoveDirectoryW
621 WriteFile
C2 CreateFile2
86 CloseHandle
267 GetLastError
3F2 MultiByteToWideChar
21D GetCurrentProcess
57B SetUnhandledExceptionFilter
5BC UnhandledExceptionFilter
4E1 RtlVirtualUnwind
4DA RtlLookupFunctionEntry
4D3 RtlCaptureContext
477 ReadFile
2B5 GetProcAddress
5DD VirtualQuery
2BB GetProcessHeap
60D WideCharToMultiByte
450 QueryPerformanceCounter
21E GetCurrentProcessId
2F0 GetSystemTimeAsFileTime
36C InitializeSListHead
352 HeapFree
34E HeapAlloc
27E GetModuleHandleW
2D7 GetStartupInfoW
222 GetCurrentThreadId
ADVAPI32.dll
14006D000 Import Address Table
14006D740 Import Name Table
0 time date stamp
0 Index of first forwarder reference
299 RegQueryValueExW
293 RegQueryInfoKeyW
28C RegOpenKeyExW
27D RegEnumValueW
27A RegEnumKeyExW
25B RegCloseKey
281 RegGetValueW
ole32.dll
14006D438 Import Address Table
14006DB78 Import Name Table
0 time date stamp
0 Index of first forwarder reference
2A CoCreateGuid
MSVCP140D.dll
14006D228 Import Address Table
14006D968 Import Name Table
0 time date stamp
0 Index of first forwarder reference
A5 ??1_Lockit@std@@QEAA@XZ
6D ??0_Lockit@std@@QEAA@H@Z
296 ?_Xlength_error@std@@YAXPEBD@Z
297 ?_Xout_of_range@std@@YAXPEBD@Z
VCRUNTIME140D.dll
14006D360 Import Address Table
14006DAA0 Import Name Table
0 time date stamp
0 Index of first forwarder reference
3C memcpy
3D memmove
1 _CxxThrowException
E __CxxFrameHandler3
36 _purecall
3B memcmp
21 __std_exception_copy
22 __std_exception_destroy
8 __C_specific_handler
9 __C_specific_handler_noexcept
25 __std_type_info_destroy_list
2E __vcrt_GetModuleFileNameW
2F __vcrt_GetModuleHandleW
31 __vcrt_LoadLibraryExW
ucrtbased.dll
14006D498 Import Address Table
14006DBD8 Import Name Table
0 time date stamp
0 Index of first forwarder reference
2B6 _register_thread_local_exe_atexit_callback
B5 _configthreadlocale
2CE _set_new_mode
4D __p__commode
11D _free_dbg
52C strcpy_s
528 strcat_s
68 __stdio_common_vsprintf_s
2C2 _seh_filter_dll
B6 _configure_narrow_argv
171 _initialize_narrow_environment
172 _initialize_onexit_table
9F _c_exit
E5 _execute_onexit_table
C2 _crt_atexit
C1 _crt_at_quick_exit
54B terminate
39C _wmakepath_s
3B8 _wsplitpath_s
564 wcscpy_s
A4 _cexit
48D getchar
60 __stdio_common_vfwprintf
35 __acrt_iob_func
4 _CrtDbgReport
567 wcslen
176 _invalid_parameter
4B __p___wargv
49 __p___argc
2CB _set_fmode
EA _exit
450 exit
175 _initterm_e
174 _initterm
13E _get_initial_wide_environment
173 _initialize_wide_environment
B7 _configure_wide_argv
5B __setusermatherr
2C6 _set_app_type
561 wcscmp
5 _CrtDbgReportW
4D8 malloc
2B5 _register_onexit_function
A1 _callnewh
2C3 _seh_filter_exe
Summary
1000 .00cfg
1000 .data
2000 .idata
1000 .msvcjmc
5000 .pdata
17000 .rdata
1000 .reloc
1000 .rsrc
37000 .text
18000 .textbss
As is evident, the executable imports all the necessary functions from PROJECTEDFSLIB.dll.
Either add ProjectedFSLib.lib to your linker's Additional Dependencies or add the line
#pragma comment(lib, "ProjectedFSLib.lib")
to your code. Also, make sure you are using version 10.0.17763.0 of the Windows SDK. If you are using MinGW, it would not surprise me if this library has not been made available yet.
Projected FS is still an optional Windows feature that must be enabled manually. Go to Control Panel -> Programs and Features -> Turn Windows features on or off, scroll down to "Windows Projected File System" in the list of optional features, and make sure it is enabled. Only after that will ProjectedFSLib.dll show up in your System32 directory.
It's probably also worth noting that there appears to be only an x64 version of this DLL, so if you're building an x86 program, that may be why you're unable to dynamically link with it.
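A quick way to tell these explanations apart is to check whether the installed DLL can be loaded at all and whether it actually exports the function. The sketch below uses Python's ctypes; dll_exports is a hypothetical helper name. Run it from a 64-bit Python, since a 32-bit process cannot load an x64-only DLL.

```python
import ctypes

def dll_exports(dll_name: str, symbol: str) -> bool:
    """Return True if dll_name loads and exports symbol (Windows only)."""
    try:
        lib = ctypes.WinDLL(dll_name)  # ctypes.WinDLL only exists on Windows
    except (AttributeError, OSError):
        return False  # not Windows, or the DLL is missing (feature not enabled?)
    return hasattr(lib, symbol)  # a missing export raises AttributeError
```

For example, dll_exports("ProjectedFSLib.dll", "PrjWritePlaceholderInfo") returning False would suggest a disabled feature or a DLL from an older Windows build rather than a problem with your linker settings.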

ELF entry point is not valid

I am trying to set a breakpoint on the entry point of a stripped ELF. The ELF was compiled and stripped on a VirtualBox Linux machine.
root@xxxx:~# readelf -e yyyy_not_patched
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: DYN (Shared object file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x650
Start of program headers: 64 (bytes into file)
Start of section headers: 6792 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 9
Size of section headers: 64 (bytes)
Number of section headers: 31
Section header string table index: 30
Program Headers point to:
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000000040 0x0000000000000040
0x00000000000001f8 0x00000000000001f8 R E 0x8
INTERP 0x0000000000000238 0x0000000000000238 0x0000000000000238
0x000000000000001c 0x000000000000001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x00000000000009ec 0x00000000000009ec R E 0x200000
LOAD 0x0000000000000dd8 0x0000000000200dd8 0x0000000000200dd8
0x0000000000000268 0x0000000000000278 RW 0x200000
DYNAMIC 0x0000000000000df0 0x0000000000200df0 0x0000000000200df0
0x00000000000001e0 0x00000000000001e0 RW 0x8
NOTE 0x0000000000000254 0x0000000000000254 0x0000000000000254
0x0000000000000044 0x0000000000000044 R 0x4
GNU_EH_FRAME 0x00000000000008a0 0x00000000000008a0 0x00000000000008a0
0x000000000000003c 0x000000000000003c R 0x4
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 0x10
GNU_RELRO 0x0000000000000dd8 0x0000000000200dd8 0x0000000000200dd8
0x0000000000000228 0x0000000000000228 R 0x1
When I set a breakpoint in GDB, I get "Cannot access memory at address 0x650":
root@xxxx:~# gdb yyyy_not_patched
Reading symbols from login_not_patched...(no debugging symbols found)...done.
(gdb) b *0x650
Breakpoint 1 at 0x650
(gdb) r
Starting program: /root/yyyy_not_patched
Warning:
Cannot insert breakpoint 1.
Cannot access memory at address 0x650
Any idea what could be the issue?
This:
Type: DYN (Shared object file)
means that you are looking at a position-independent executable (a special form of a shared library). Such executables are relocated to a random address before they start running (GDB disables address randomization by default, so on x86-64 the base is typically 0x555555554000), so setting a breakpoint on the unrelocated address 0x650 will not work.
What works:
(gdb) set stop-on-solib-events 1
(gdb) run
(gdb) info proc map
# Figure out where the executable got loaded
(gdb) b *($exe_load_address + 0x650)
Example:
$ readelf -h a.out | grep 'Entry point'
Entry point address: 0x620
$ gdb -q ./a.out
(gdb) set stop-on-solib-events 1
(gdb) run
Starting program: /tmp/a.out
Stopped due to shared library event (no libraries added or removed)
(gdb) info proc map
process 67394
Mapped address spaces:
Start Addr End Addr Size Offset objfile
0x555555554000 0x555555555000 0x1000 0x0 /tmp/a.out
0x555555754000 0x555555756000 0x2000 0x0 /tmp/a.out
0x7ffff7dda000 0x7ffff7dfd000 0x23000 0x0 /lib/x86_64-linux-gnu/ld-2.19.so
...
(gdb) b *(0x555555554000+0x620)
Breakpoint 1 at 0x555555554620
(gdb) c
Continuing.
Stopped due to shared library event:
Inferior loaded /lib/x86_64-linux-gnu/libc.so.6
(gdb) c
Continuing.
Breakpoint 1, 0x0000555555554620 in _start ()
(gdb) bt
#0 0x0000555555554620 in _start ()
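The address arithmetic in the example (load base plus e_entry) can be sketched in Python. runtime_entry is a hypothetical helper: it reads e_type and e_entry from an ELF64 header and takes the base from the executable's file-offset-0 mapping in /proc/<pid>/maps, the same data info proc map prints. It assumes, as in the readelf output in the question, that the first PT_LOAD segment has a virtual address of 0, so the mapping start equals the load bias.

```python
import struct

ET_DYN = 3  # e_type of a PIE or shared object

def runtime_entry(elf_header: bytes, maps_text: str, exe_path: str) -> int:
    """Translate e_entry into the address it has in a running process.

    elf_header: at least the first 32 bytes of an ELF64 file.
    maps_text:  contents of /proc/<pid>/maps for that process.
    """
    if elf_header[:4] != b"\x7fELF" or elf_header[4] != 2:  # EI_CLASS 2 = ELF64
        raise ValueError("not an ELF64 file")
    end = "<" if elf_header[5] == 1 else ">"
    (e_type,) = struct.unpack_from(end + "H", elf_header, 16)
    (e_entry,) = struct.unpack_from(end + "Q", elf_header, 24)
    if e_type != ET_DYN:
        return e_entry  # ET_EXEC: e_entry is already an absolute address
    for line in maps_text.splitlines():
        # start-end perms offset dev inode [path]; the path may contain spaces
        fields = line.split(None, 5)
        if len(fields) == 6 and fields[5] == exe_path and int(fields[2], 16) == 0:
            base = int(fields[0].split("-")[0], 16)
            return base + e_entry
    raise LookupError(f"{exe_path} is not mapped")
```

With the mapping from the example (0x555555554000) and an entry point of 0x620, this yields 0x555555554620, the address used for the breakpoint above.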

How to identify unknown compression algorithm?

I'm trying to parse some text information from within project files of an Adobe program (Adobe Premiere Pro). The project files are gzip-compressed XML files. These contain base64-encoded fields, which in turn hold further information about a component of the project. I'm trying to look at the information within these fields.
I can't for the life of me identify the compression scheme used here. It doesn't look like gzip, as it doesn't have the appropriate header. Can anyone help?
An example of a base64 encoded field is:
AQAAAAAAAACk1QAAAAAAAENvbXByZXNzZWRUaXRsZQB4nO0da1MbSa5/iuu+L34Eg6maY4sQ2OWWhBRmk/hTytiGeM+xfbZJYH/83enRPf2cGfMIMyZdFOCR1Gp1t6TWSOPp//03Eb+KW/FVTERNfBMjsRBLMRYzMRX/FP8QTbElGvC/BpipGAB8CNipuCbsn+JCHItfgGqHaH4V+yIRB0AzE5fQ4rM4h08zsfLgF8BlBX2OCPPB6ndftKDPhujA7y78toCi7tEk4i1x5uvPogv4FV2jdEuieA899MUdYI+BaiG+w9UCpNgHuoW4gRbIOZsqAa4ruloBVvUzgOsRzce+uALsBHpjTkXUKNEinZcJ/OzD+FgGF56II+C1JOgJtWZKF4p0U5AWe7yCn3wJ82lRvhnQroIS+hhTxjOYz1VASgVHaPGK1XO0BMe7JHnHhF+IU1q3GfWwBP37kKm/2N6kdq+PaD5GwH1F/VwBPMSvJXmZ9BfAZU4yqrGHcCw/Ws/AWxUTg9dFkrnSv4G2X2kNlax5ss/FJ1jNE/h7BFy6AGuKPbI57NvHYosefD4R7wDG9GydTG/jkLorDkFLjuDnHaznObVgWh+TwNovQMbvMI4vNJKFHK2ydIU/B8g10OB48XpKPItau/Ppz5U7mweklaxleD2SutkFGlwJrZUH1FufsHh9JS1gS45XwxL4O4Pxm1gFSagvHhnj9whvQhOQZUZyzUB6k8qGsxfKk1NhD4xZyh5HQ7RzR6LweWNprzmadsF4siSur7V6r4FmIP4t9eWGLC5kJdvSSrogax9oEH9Oox9Rn7gLoj1vg9R7YAUsYT5tAl7Q98VnUvfmAEUfqDzhOVzjf2w5pR3pK7UzR8nWlN0O8aERI+YNSdgHyATa4p51l/rYopGGR1fP5WnjLuDqluxwlHrzuWWvTyNDcT+/AawPsC/kec9AX/6SvsPmxPozovk7pBbaqy/Ff4CqT7KgDGGqBDjPaJUU7BA4z0gTcE30+iCPfFqf1xn9nRby0XRo5YgZiY/wf0i+cz+1dB+TgAaOyZZwzx3LK9UijEvAJ49JlteA+xs+s6W5VpBFlRijOSQrmBLuGGgmpFu8Mq8gRtwCSWpP9Il91To9J6ArE9KqOa3Nc0u5fu9IfT9tV9ZzSJ4NVyhEZXontXf00zsEFQ+5UPb+t2QlU5IL929XL8I0qFUzis4/yvHYsbyPTch60A/iLJ1K7V5CfOP2mE2XxaO3Jo+evP+Z0B44lfEHWonaVzny9fEJ3VFcUuSM+4vpWT/KsfL+m0WVkE3OifeAbPOCaLu0fwzSVWnSPRfyWo8+i6/2eubYimjVOJW22leunrFuvNtw7TNbPkz3ijlEzXuc5oU17TKd7WrooLujRi18aVqYpW9LyXUquWye/sU9eDN00NW0+poxYv59WDhrp+7B89vWAneC3BrvFXdSHjy+Lv0dSKsZw73GiOLyHTnjGobzcCt+h886Y7ENkTfPiYth6j9ICp2ltWFob5wPVTPYpThc5VnY2rIp1rdr7Lcr7z9x/nfT/IwNdylD+YkQhWo3Ia+wssZrQpnuT7qHwlnQmuP3EaLi9uY9qLkC9r0pt2J7932BiUsAO4Y7TcxfD1PZbRjTnJD2s62YdDacaa9I3yfgm/vkNfYpOzUmuregf6qtT6faz2jWjgmOd253QV6aTx69zdP0Qmgt1yQD5itcbra/MjHaQ7S9Vqb3uKJZ6VMu4Jr84h21QRvrUCVnDzxcK+Xh07JX8e21XuAJHusrdqOviL4i+oroK9bwFZ3oK6KviL7ip/EVRTSqNqxG6V7XYG3fBO5LNMVraaF5d0IrY64a8oe1Vs/TnKpE41SyN1RDRT3tC1UnUzX7YkpdI+LKAXoDdUccsoUQXUJPjPAo2afup7PAvtHFJsQbJcA6PeJxDVeUAVDWkoXPatsraNuTknINF7Mcn+FqRj6ScwfLlEMRVUKrsSDMVFr0inRQzXwYez9v
cwSrdUk+fUIj+ZJmK7pytzkne/T94n1a4tNAqsLMle0+ZRC0bbLmosUh7yJqpPG1HuHHpDnXqSyncn1CFtGWFuG2sfdff+/lJ56wR7272LBEcP18YnAwIQntzfwUlDuvJiYh6Y/IZ4zkHjRMnxFAzfFlt7ndvz3Wp12o+1xSHgXajo07o7lxJcuiSig7xNb1u9zzcOXc9llUPPdY51mR91b+dwfWGz19J10Nnyarb/OpjBM5j7cCny5s0q6/DVxxD1ERwP24mM9tIO4tfEZ4k+qzP+4vSxrqGTEhzXicpTTXsJVtuYM/l7WYFYcq2YopV1mWsrsxltIAbX7M7/0twYeqnYYjMi1/3g7UzHxCrObsge9AG77KZ1XNJzCXXks3vjAxvG+G4PeX4FXpEmyXLkG7dAl2Spdgt3QJOqVLsFe6BM302fQyZWhWQIbyPWOzAr6xWQHv2Mz1jy3KuWzDXLUp6us8i0y/5OqoK9Pes9luoxL2i1JUwYYb9F2tashRBVtGOapgzw36lkY15Cg/8mE5yo9/WI7yoyCWoxqxUDXioUZFYqJGReKiRkVio0ZF4qNGQYz0EDlsuJmVMHPmLHFWZUn5tJ+7srQrYmUpVpZiZSlWlpgqVpZiZSlWlmJlKVaWfmxcXP4dQvn3BuVnWcrPr5SfWSk/p1KNbEr5MlQhi1K+Z6xG9qR877hOZakts5F7IG9VKktapvaz2W6sLMXKUqwsxcpSrCw9Vca+Gv60GjFRrCzFytJ9KkvKl/zclSX1rFOsLMXKUqwsxcpSrCzFylKsLMXKUqws/di4uPw7hPLvDcrPspSfXyk/s1J+TqUa2ZTyZahCFqV8z1iN7En53nG9ytIOxRuvnkmm9SpLSqbms9lurCzFylKsLMXKUqwsPVXGvhr+tBoxUawsxcqSX1kyr5feGHxIXlVpICVAfPh8qDBFQnODObcbgtsjD+OwN8wVYc3lzHov4LbsKYy155jfcGnm2XxsQjnIsVXV0HnMEM5swXk6u06QhcVzlbjKwSc6Xgg+uU290TOMdVu9FuYpba1gW5tGczjwKhYhTEL6iyeSTqjuwu8ANSuC2Xjd1yFo3SDYl4nhk5z6VA/DypCpvQ250ll43Zebj/XhCZ3SxBqj3kNZo1Uakv6hnqvdBPVwaPBSdbRr+szneKmV1hA+m+TGmiV1nUAPmHtcpDh9jVe+bOtJ3Nw4iVsbJ/GrjZN425G4RU8BZEut8VmSa4rHSq99o++LwjjTG2kf7Poh0ztraOhUUrdl+ORSjX8vxtKHuGe0+BRmO398YZw9Ptyd5nIHckeocW4bnLWxMN8jnY0322LkoN56nTU6k8Zs+5b2UD7z0txhwxS8L7Nuhd5M7WLdGOfxcUr4/Tn3i1jMN+jGuCXGLdWMW1pU28339+1Cf6/OBI4xTIxhYgwTY5gYw1Qvhtl7QAzzsKyL2tN+bPQSY5efPXb5MTsTZsZ3Y8yyURJvfsxStsQxTolxSplxivuN8ofkWtQJYTHXEuOVKsYrMdcS45aXFrfEXEuMYWIME34DTcy1xNjlpcQuMdcSY5aXErOULXGMU2KcUm6cYn/H6iG5ls6DopWYa4nxSsy1xLglxi0x1xJjmBjDPEUM03xADBNzLTF2qW7sEnMtMWZ5KTFL2RLHOCXGKQ+NU2wIf//5QpgxQwgWimjUOyEe/k0hG/6O9hmNV2+M8Tm4lBzd4B6l3tNrvpdV52eyKBJa1280whXZnE3pv280nxrfOXhDEQNLOJGW4fLJojLbq3cSuhoSpkANGVAfI/HJGKF6x08IZ7cJ0do0vRy+vQy+vSAt68A47cUcnwnXVL0Mqp6ctQnZ2TQw9jDObtMF+KWBbXgtXQpXhz+BJ0SrxRUJR+Q2hdm/P69hnN2mSGafwpW5Vyhzz5HZxuo3ufot9RsmUaI5eZ5uqgHmOH2c289BSvUNtOgPgNxl9BqmNCP3EN6N4cM87Aj6Qu61rmVnUbljCnnxEAW+
ExRnkve9pYwFVun4s7CIsz350/n2/CcTo2+Pvj369ujbo2/fRN+eXwmNvj369ujbo2+Pvv15fbsL40zNKXiSa/JjTGNfn4KMd0LnevQ1Xr2heftO7/C7TnsJQVXPIdghzfdYjlhfvQc+MzGXHvcuuBupN2l+pyznUHykvufe7Pr4RNZn9LlMnE/dB54t+N2i74lzBop1GTWhQ3cuW0DRkdWeOznqEDczI4o82tBuC1o206qD4qvOelDczJZoF5hxs7OLLhTpZrD+fG7RKcFxjV0Pm02VxcH1vtlUmJucwagOBVahfqP1++6tRZgmIT9zCb5jRtlFs7r0Ua4e20kWFe7naMMTsj5ciQui7VJOfJDOVDONL9ajz+KrK1oT2slXOTztc5myddy2AG6rMqr42X5XZQiW1ZptTa1XyJawFlSDq0vxl/Qk6iy1JmHmaYul4PNfbArsR50zZY9rTBqSd77UCqTTZ1TtgeW9AkvZpco2r/qtdYbVJ2lvIWvVuOIzpr6lEvIOpObJ1dlsusSasVPHKluy1zwaxOfNG1IcUVV3In0MtjszOGrtukytE7mu2yqhfY8jB32Cy0B+QlmQW4gGNeucrHCRjudfMBbWTheTUDwypRlAyAp0lWVkax5K7V2Q/XHPB2IleVxS7KtmZD3KmuwR4/UbkllruvYdHInrUwN3Ca/WOAv7ieKDRVrXUXVrxrEOrjJp1Iqr8/y4RnRFq25T1h893mbBeHdyx7tTynjrT6grK4o61P3jfrqPuDDXS9pQO06pavSyQ88wb8HfbXkfrqMMPFOKfWVT3h/fJ3pBvh1ojc8Zxfglxi+bE7+0CuMX9RzM08Qv/DTelnwmLz9+8e21SvHL5kYvpk6vG7kMBT4DcQWeYkyfb+DzF/g/rHg8w2/qz9vhO7k7fOcF7vBT+Vn39tgdX8PNLIZ+wob3ghvQSpZJX9foiS7+7N4xzWkOBtDTBfziLuWfTelTJGRnfemjdUbMhSaWzWDfNfAFyB13Qx0bqjnSMhePoFWREbRyRlDPWZ+6k82qe9muDwDp0qdjMTbsynyiOu/MyivZyvUa9UwMzthhmkXE+0fOnttQRcd2YtdMbAxCsmVnvB6lmoEZ2RZDToAWs/dsc3xehqZRM3kghrRLjMRnqk3MqP3/AUeAlD0=
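It's a zlib-compressed string preceded by a 32-byte uncompressed header. The header ends with the ASCII string "CompressedTitle" plus a NUL byte, and the zlib stream (78 9c) starts at offset 32.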
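using System;
using System.Text;
using Ionic.Zlib; // ZlibStream.UncompressBuffer comes from DotNetZip

var encoding = @""; // paste the base64 field here
byte[] data = Convert.FromBase64String(encoding);
// Skip the 32-byte uncompressed header; the zlib stream starts right after it.
var compressedArray = new byte[data.Length - 32];
Array.Copy(data, 32, compressedArray, 0, data.Length - 32);
var decompressed = ZlibStream.UncompressBuffer(compressedArray);
var str = Encoding.Unicode.GetString(decompressed); // decode as UTF-16LE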
The header contains the uncompressed length of the data in little-endian order at offset 8: a4 d5, i.e. 0xd5a4, which equals 54692. From this one example it is hard to tell whether the length is stored as a4 d5, a4 d5 00 00, or a4 d5 00 00 00 00 00 00.
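The same steps in Python, as a sketch under the assumptions stated in this answer: a 32-byte header, a little-endian length at offset 8, and a zlib stream after the header. decode_field is a hypothetical name, and only the low four bytes of the length field are read because its true width is unknown.

```python
import base64
import struct
import zlib

HEADER_SIZE = 32  # uncompressed bytes in front of the zlib stream

def decode_field(b64: str) -> tuple[int, bytes]:
    """Split a base64 field into (declared length, decompressed bytes)."""
    data = base64.b64decode(b64)
    header, compressed = data[:HEADER_SIZE], data[HEADER_SIZE:]
    # Only the low 4 bytes at offset 8 are read; the real field width is unknown.
    (declared,) = struct.unpack_from("<I", header, 8)
    return declared, zlib.decompress(compressed)  # raises zlib.error if not zlib
```

Comparing the declared length with len(raw) is a cheap check of the length hypothesis, and raw.decode("utf-16-le") matches what Encoding.Unicode does in the C# version.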

What does "symbol value" from nm command mean?

When you list the symbol table of a static library, like nm mylib.a, what does the 8-digit hex number that shows up next to each symbol mean? Is that the relative location of each symbol in the code?
Also, can multiple symbols have the same symbol value? Is there something wrong with a bunch of different symbols all having the symbol value 00000000?
Here's a snippet of code I wrote in C:
#include <stdio.h>
void foo();
int main(int argc, char* argv[]) {
foo();
}
void foo() {
printf("Foo bar baz!");
}
I ran gcc -c foo.c on that code. Here is what nm foo.o showed:
000000000000001b T foo
0000000000000000 T main
U printf
For this example I am running 64-bit Ubuntu Linux; that is why the 8-digit hex numbers you expect are 16 digits here. :-)
The hex number you see is the address of the code in question relative to the beginning of the .text section of the object file (assuming we address sections of the object file beginning at 0x0). If you run objdump -td foo.o, you'll see the following in the output:
Disassembly of section .text:
0000000000000000 <main>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 83 ec 10 sub $0x10,%rsp
8: 89 7d fc mov %edi,-0x4(%rbp)
b: 48 89 75 f0 mov %rsi,-0x10(%rbp)
f: b8 00 00 00 00 mov $0x0,%eax
14: e8 00 00 00 00 callq 19
19: c9 leaveq
1a: c3 retq
000000000000001b <foo>:
1b: 55 push %rbp
1c: 48 89 e5 mov %rsp,%rbp
1f: b8 00 00 00 00 mov $0x0,%eax
24: 48 89 c7 mov %rax,%rdi
27: b8 00 00 00 00 mov $0x0,%eax
2c: e8 00 00 00 00 callq 31
31: c9 leaveq
32: c3 retq
Notice that these two symbols line up exactly with the entries we saw in the symbol table from nm. Bear in mind that these addresses may change if you link this object file with other object files. Also bear in mind that the callq at 0x2c will change when you link this file against whatever libc your system provides, since it is currently an unresolved call to printf (its target isn't known until link time).
As for your mylib.a, there is more going on here. The file you have is an archive; it contains multiple object files, each with its own text section. As an example, here is part of the output of nm on /usr/lib/libm.a on my machine:
e_sinh.o:
0000000000000000 r .LC0
0000000000000008 r .LC1
0000000000000010 r .LC2
0000000000000018 r .LC3
0000000000000000 r .LC4
U __expm1
U __ieee754_exp
0000000000000000 T __ieee754_sinh
e_sqrt.o:
0000000000000000 T __ieee754_sqrt
e_gamma_r.o:
0000000000000000 r .LC0
U __ieee754_exp
0000000000000000 T __ieee754_gamma_r
U __ieee754_lgamma_r
U __rint
You'll see that multiple text symbols (indicated by the T in the second column) sit at address 0x0, but each individual object file has only one text symbol at 0x0.
As for individual files having multiple symbols at the same address, it seems possible. After all, a symbol is just an entry in a table used to determine the location and size of a chunk of data. But I don't know for certain; I have never seen multiple symbols referencing the same part of a section before. Anyone with more knowledge on this than me can chime in. :-)
Hope this helps some.
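To see which values repeat, nm's output can be grouped per archive member. parse_nm and shared_values are hypothetical helpers in the Python sketch below. Run on the libm.a excerpt above, they show that even within e_sinh.o, .LC0, .LC4 and __ieee754_sinh all have value 0. Because the value is an offset within the symbol's own section, and these symbols need not live in the same section, a shared value does not by itself mean a shared location.

```python
from collections import defaultdict
from typing import Optional

Symbol = tuple[Optional[int], str, str]  # (value, type letter, name)

def parse_nm(output: str) -> dict[str, list[Symbol]]:
    """Group `nm` output by archive member; plain `nm foo.o` output goes under ""."""
    members: dict[str, list[Symbol]] = defaultdict(list)
    current = ""
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):  # "e_sinh.o:" starts a new archive member
            current = line[:-1]
            continue
        parts = line.split()
        if len(parts) == 2:  # "U printf": undefined, no value yet
            members[current].append((None, parts[0], parts[1]))
        else:  # "0000000000000000 T main"
            members[current].append((int(parts[0], 16), parts[1], parts[2]))
    return dict(members)

def shared_values(symbols: list[Symbol]) -> dict[int, list[str]]:
    """Values carried by more than one defined symbol of the same member."""
    by_value: dict[int, list[str]] = defaultdict(list)
    for value, _type, name in symbols:
        if value is not None:
            by_value[value].append(name)
    return {v: names for v, names in by_value.items() if len(names) > 1}
```

Feed parse_nm the text of nm mylib.a, then call shared_values on each member's list.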
The hex number is the offset into the object file at which the symbol can be found (for a defined symbol in an unlinked object, relative to the start of its section). It's literally the number of bytes into the object code.
That value is used by the linker to locate and make a copy of the symbol's value. You can see generally how it's laid out if you add the -S option to nm, which will show you the size of the value for each symbol.
nm shows the values of symbols. Some symbols in a library or object file may show up as zero simply because they haven't been given a value yet. They'll get their actual value at link time.
Some symbols are code symbols, some are data, etc. Before linking, the symbol value is often the offset within the section the symbol resides in.

Example invalid utf8 string?

I'm testing how some of my code handles bad data, and I need a few series of bytes that are invalid UTF-8.
Can you post some, and ideally, an explanation of why they are bad/where you got them?
Take a look at Markus Kuhn's UTF-8 decoder capability and stress test file.
You'll find examples of many UTF-8 irregularities, including lonely start bytes, continuation bytes missing, overlong sequences, etc.
In PHP:
$examples = array(
'Valid ASCII' => "a",
'Valid 2 Octet Sequence' => "\xc3\xb1",
'Invalid 2 Octet Sequence' => "\xc3\x28",
'Invalid Sequence Identifier' => "\xa0\xa1",
'Valid 3 Octet Sequence' => "\xe2\x82\xa1",
'Invalid 3 Octet Sequence (in 2nd Octet)' => "\xe2\x28\xa1",
'Invalid 3 Octet Sequence (in 3rd Octet)' => "\xe2\x82\x28",
'Valid 4 Octet Sequence' => "\xf0\x90\x8c\xbc",
'Invalid 4 Octet Sequence (in 2nd Octet)' => "\xf0\x28\x8c\xbc",
'Invalid 4 Octet Sequence (in 3rd Octet)' => "\xf0\x90\x28\xbc",
'Invalid 4 Octet Sequence (in 4th Octet)' => "\xf0\x90\x8c\x28",
'Invalid 5 Octet Sequence (well-formed only in pre-RFC 3629 UTF-8)' => "\xf8\xa1\xa1\xa1\xa1",
'Invalid 6 Octet Sequence (well-formed only in pre-RFC 3629 UTF-8)' => "\xfc\xa1\xa1\xa1\xa1\xa1",
);
From http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php#54805
Patterns of ill-formed byte sequences can be derived from the table of well-formed byte sequences; see "Table 3-7. Well-Formed UTF-8 Byte Sequences" in the Unicode Standard 6.2.
Code Points          First Byte  Second Byte  Third Byte  Fourth Byte
U+0000..U+007F       00..7F
U+0080..U+07FF       C2..DF      80..BF
U+0800..U+0FFF       E0          A0..BF       80..BF
U+1000..U+CFFF       E1..EC      80..BF       80..BF
U+D000..U+D7FF       ED          80..9F       80..BF
U+E000..U+FFFF       EE..EF      80..BF       80..BF
U+10000..U+3FFFF     F0          90..BF       80..BF      80..BF
U+40000..U+FFFFF     F1..F3      80..BF       80..BF      80..BF
U+100000..U+10FFFF   F4          80..8F       80..BF      80..BF
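Table 3-7 translates directly into a validator, which is handy for confirming that a hand-crafted test string really is ill-formed. The Python sketch below (WELL_FORMED, _trail_ranges and first_invalid_offset are hypothetical names) checks each lead byte against the table's first column and then checks every following byte against that row's own range, not just 80..BF.

```python
# Rows of Table 3-7: (first-byte range, ranges for each following byte)
WELL_FORMED = [
    ((0x00, 0x7F), []),
    ((0xC2, 0xDF), [(0x80, 0xBF)]),
    ((0xE0, 0xE0), [(0xA0, 0xBF), (0x80, 0xBF)]),
    ((0xE1, 0xEC), [(0x80, 0xBF), (0x80, 0xBF)]),
    ((0xED, 0xED), [(0x80, 0x9F), (0x80, 0xBF)]),
    ((0xEE, 0xEF), [(0x80, 0xBF), (0x80, 0xBF)]),
    ((0xF0, 0xF0), [(0x90, 0xBF), (0x80, 0xBF), (0x80, 0xBF)]),
    ((0xF1, 0xF3), [(0x80, 0xBF), (0x80, 0xBF), (0x80, 0xBF)]),
    ((0xF4, 0xF4), [(0x80, 0x8F), (0x80, 0xBF), (0x80, 0xBF)]),
]

def _trail_ranges(lead: int):
    """Ranges for the bytes after lead, or None if lead cannot start a sequence."""
    for (lo, hi), trail in WELL_FORMED:
        if lo <= lead <= hi:
            return trail
    return None  # C0, C1, F5..FF or a stray continuation byte 80..BF

def first_invalid_offset(data: bytes) -> int:
    """Offset where the first ill-formed sequence starts, or -1 if data is valid."""
    i = 0
    while i < len(data):
        trail = _trail_ranges(data[i])
        if trail is None:
            return i
        end = i + 1 + len(trail)
        if end > len(data):
            return i  # sequence truncated by the end of the input
        for k, (lo, hi) in enumerate(trail):
            if not lo <= data[i + 1 + k] <= hi:
                return i
        i = end
    return -1
```

Python's built-in bytes.decode("utf-8") is strict in the same way, so it makes a convenient cross-check.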
Here are examples generated from U+24B62, which I used for a bug report (Bug #65045, "mb_convert_encoding breaks well-formed character"):
// U+24B62: "\xF0\xA4\xAD\xA2"
"\xF0\xA4\xAD" ."\xF0\xA4\xAD\xA2"."\xF0\xA4\xAD\xA2"
"\xF0\xA4\xAD\xA2"."\xF0\xA4\xAD\xA2"."\xF0\xA4\xAD"
An oversimplified range for trailing bytes ([0x80, 0xBF]) can be seen in various libraries. The following sequences pass such a check but are ill-formed according to Table 3-7:
// U+0800 - U+0FFF
\xE0\x80\x80
// U+D000 - U+D7FF
\xED\xBF\xBF
// U+10000 - U+3FFFF
\xF0\x80\x80\x80
// U+100000 - U+10FFFF
\xF4\xBF\xBF\xBF
,̆ (a comma followed by a combining breve, U+0306) was particularly evil. On Ubuntu I see it rendered as a single combined glyph.
comma-breve
This might not be exactly what the OP asked, but it's somewhat related:
if you already have byte ordinal values (0 - 255) and want to know whether a byte is a valid UTF-8 lead byte, I came up with this strange unified formula (in awk) that returns 1 (true) or 0 (false):
function newUTF8start(__) {
# awk's ^ is exponentiation: 118^1 = 118 below 194, 118^0 = 1 from 194 up
return 118^(+__< 194) < (246-__) }
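The formula depends on awk's operators: ^ is exponentiation, so 118^(cond) is 118 when the byte is below 194 and 1 otherwise, and the comparison with 246 - b then selects exactly 00..7F and C2..F4. A Python transcription (utf8_lead_awk and TABLE_LEADS are hypothetical names) makes it easy to check all 256 values against the first-byte column of Table 3-7:

```python
def utf8_lead_awk(b: int) -> int:
    """Python transcription of the awk expression 118^(+b < 194) < (246 - b)."""
    # ** is exponentiation here, as ^ is in awk; 118 ** True == 118, 118 ** False == 1
    return int(118 ** (b < 194) < (246 - b))

# Lead bytes allowed by Table 3-7: 00..7F and C2..F4
TABLE_LEADS = set(range(0x00, 0x80)) | set(range(0xC2, 0xF5))
```

In C, Java or JavaScript, ^ is bitwise XOR with lower precedence than <, so the same text would not compute this.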
Fuzz testing: generate random sequences of octets. You'll most likely get some illegal sequences sooner rather than later.
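A minimal Python sketch of that idea: draw random byte strings and keep the first one that Python's strict UTF-8 decoder rejects. random_invalid_utf8 is a hypothetical name; passing in a seeded generator makes failing cases reproducible.

```python
import random

def random_invalid_utf8(rng: random.Random, max_len: int = 8) -> bytes:
    """Draw random byte strings until one fails strict UTF-8 decoding."""
    while True:
        length = rng.randint(1, max_len)
        candidate = bytes(rng.randrange(256) for _ in range(length))
        try:
            candidate.decode("utf-8")  # strict: rejects overlongs, surrogates, > U+10FFFF
        except UnicodeDecodeError:
            return candidate
```

For example, random_invalid_utf8(random.Random(1234)) returns the same invalid sample on every run.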