PIN get assembly opcodes from instruction address - c++

I am using PIN to analyze a C program's instructions and perform the necessary operations. I compiled my C program with GCC on Ubuntu and then passed the generated executable as input to the pintool. My pintool registers an instruction instrumentation routine, which in turn inserts a call to an analysis routine for every instruction. This is my pintool in C++:
#include "pin.H"
#include <fstream>
#include <cstdint>
UINT64 icount = 0;
using namespace std;
KNOB<string> KnobOutputFile(KNOB_MODE_WRITEONCE, "pintool", "o", "test.out","A pin tool");
FILE * trace;
//====================================================================
// Analysis Routines
//====================================================================
VOID dump(VOID *ip, UINT32 size) {
unsigned int i;
UINT8 opcodeBytes[15];
UINT32 fetched = PIN_SafeCopy(&opcodeBytes[0], ip, size);
if (fetched != size) {
fprintf(trace, "*** error fetching instruction at address 0x%lx",(unsigned long)ip);
return;
}
fprintf(trace, "\n");
fprintf(trace, "\n%d\n",size);
for (i=0; i<size; i++)
fprintf(trace, " %02x", opcodeBytes[i]); //print the opcode bytes
fflush(trace);
}
//====================================================================
// Instrumentation Routines
//====================================================================
VOID Instruction(INS ins, void *v) {
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)dump,
                   IARG_INST_PTR, IARG_UINT32, INS_Size(ins), IARG_END);
}
VOID Fini(INT32 code, VOID *v) {
    printf("count = %ld\n", (long)icount);
}

INT32 Usage(VOID) {
    PIN_ERROR("This Pintool failed\n"
              + KNOB_BASE::StringKnobSummary() + "\n");
    return -1;
}

int main(int argc, char *argv[])
{
    trace = fopen("test.out", "w");
    if (PIN_Init(argc, argv)) return Usage();
    PIN_InitSymbols();
    INS_AddInstrumentFunction(Instruction, 0);
    PIN_AddFiniFunction(Fini, 0);
    // Never returns
    PIN_StartProgram();
    return 0;
}
When I check my output trace, I get output like this:
3
48 89 e7
5
e8 78 0d 00 00
1
55
The first row is the size of the instruction in bytes, and the second row is the instruction's opcode bytes.
I saw this particular forum thread-
https://groups.yahoo.com/neo/groups/pinheads/conversations/topics/4405#
where they mentioned that the Linux output is inconsistent because a 32-bit disassembler was being used on 64-bit instructions. I am getting the same output as the Linux ones mentioned there, while the Windows ones are the correct x86_64 opcodes I am expecting.
Any idea how I can get the correct opcodes, and, if I am doing the disassembly wrong, how I can correct it? I am using a 64-bit PC, so I don't know if I am doing 32-bit disassembly.

In 32-bit mode, 48 is the 1-byte dec eax instruction (0x40-0x47 encode inc r32, 0x48-0x4F encode dec r32).
In 64-bit mode, it's a REX prefix (with W=1, other bits unset, selecting 64-bit operand-size). (AMD64 repurposed the whole 0x40-0x4F range of inc/dec short encodings as REX prefixes.)
Decoding 48 89 e7 as one 3-byte instruction, instead of a 48 followed by 89 e7, is absolute proof that it's disassembling in 64-bit mode.
So how am I supposed to interpret the instruction here?
As x86-64 instructions, obviously.
For your case, I fed those hex bytes to a disassembler:
db 0x48, 0x89, 0xe7
db 0xe8, 0x78, 0x0d, 0x00, 0x00
db 0x55
nasm -f elf64 foo.asm && objdump -drwC -Mintel foo.o
400080: 48 89 e7 mov rdi,rsp
400083: e8 78 0d 00 00 call rel32
400088: 55 push rbp
objdump -d finds the same instruction breaks, so PIN was decoding it correctly.
The push is presumably at the start of the called function. Sticking the bytes together like this just flattens the trace; it isn't a way to make a runnable version, only a quick way to get the bytes disassembled.
So I should simply ignore the first byte and then use the remaining ones?
No, of course not. REX prefixes are part of the instruction. Without the 0x48, the first instruction would decode as mov edi,esp, which is a different instruction.
Try looking at some disassembly output for some existing code to get used to what x86-64 instructions look like. For specific encoding details, see Intel's vol.2 manual. It has some intro and appendix sections about instruction-encoding details. (The main body of the manual is the instruction-set reference, with the details of how every instruction works and its opcodes.) See https://software.intel.com/en-us/articles/intel-sdm#three-volume, and other links in the x86 tag wiki.

Pin has an API for disassembly; you should use it. See this question for how it should be done:
https://reverseengineering.stackexchange.com/questions/12404/intel-pin-how-to-access-the-ins-object-from-inside-an-analysis-function
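For reference, here is a minimal sketch of that approach, reusing the trace file handle from the question's tool: the instruction is disassembled once with Pin's INS_Disassemble() at instrumentation time, and a pointer to the resulting text is passed to the analysis routine.
#include <string>

// Analysis routine: receives the disassembly text computed at instrumentation time.
VOID printDisasm(ADDRINT ip, const std::string *disasm) {
    fprintf(trace, "0x%lx: %s\n", (unsigned long)ip, disasm->c_str());
}

// Instrumentation routine (register it with INS_AddInstrumentFunction):
// disassemble each static instruction once and pass the pointer to every call.
// Sketch only: the string is heap-allocated and intentionally never freed.
VOID InstrumentWithDisasm(INS ins, VOID *v) {
    std::string *disasm = new std::string(INS_Disassemble(ins));
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)printDisasm,
                   IARG_INST_PTR, IARG_PTR, disasm, IARG_END);
}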

Related

QDataStream reads and writes more bytes than QFile::length() reports to have

I have a utility that should copy files from one location to another.
The problem I have is that when reading X bytes using the QDataStream and writing them, the number of bytes being read/written exceeds the number of bytes the file has. I see this problem happen with a number of files.
I am using QDataStream::readRawData() and QDataStream::writeRawData() to facilitate reading from and writing to the files, as shown below:
QDataStream in(&sourceFile);
QDataStream out(&newFile);

// Read/Write byte containers
qint64 fileBytesRead = 0;
quint64 fileBytesWritten = 0;
qint64 bytesWrittenNow = 0;
quint8* buffer = new quint8[bufSize];

while ((fileBytesRead = in.readRawData((char*)buffer, bufSize)) != 0) {
    // Check if we have a read/write mismatch
    if (fileBytesRead == -1) {
        printCritical(TAG, QString("Mismatch read/write: [R:%1/W:%2], total file write/max [W:%3/M:%4]. File may be corrupted, skipping...").arg(QString::number(fileBytesRead), QString::number(bytesWrittenNow), QString::number(fileBytesWritten), QString::number(storageFile.size)));
        // Close source file handle
        sourceFile.close();
        // Close destination file handle
        newFile.close();
        return BackupResult::IOError;
    }

    // Write buffer to file stream
    bytesWrittenNow = out.writeRawData((const char*)buffer, fileBytesRead);

    // Check if we have a read/write mismatch
    if (bytesWrittenNow == -1) {
        printCritical(TAG, QString("Mismatch read/write: [R:%1/W:%2], total file write/max [W:%3/M:%4]. File may be corrupted, skipping...").arg(QString::number(fileBytesRead), QString::number(bytesWrittenNow), QString::number(fileBytesWritten), QString::number(storageFile.size)));
        // Close source file handle
        sourceFile.close();
        // Close destination file handle
        newFile.close();
        return BackupResult::IOError;
    }

    // Add current buffer size to written bytes
    fileBytesWritten += bytesWrittenNow;
    if (fileBytesWritten > storageFile.size) {
        qWarning() << "Extra bytes read/written exceeding file length"; // <== this line is hit every now and then
    }
    //...
This problem isn't consistent, but it happens every now and then, I have no idea why. Anyone have thoughts on a possible cause?
The name of the function QDataStream::writeRawData() sounds ideal for writing binary data. Unfortunately, that's only half of the story.
The open-mode of the file is relevant as well under certain conditions – e.g. if the QFile is opened on Windows with QIODevice::Text:
QIODevice::Text
When reading, the end-of-line terminators are translated to '\n'. When writing, the end-of-line terminators are translated to the local encoding, for example '\r\n' for Win32.
I prepared an MCVE to demonstrate that:
// Qt header:
#include <QtCore>

void write(const QString &fileName, const char *data, size_t size, QIODevice::OpenMode mode)
{
    qDebug() << "Open file" << fileName;
    QFile qFile(fileName);
    qFile.open(mode | QIODevice::WriteOnly);
    QDataStream out(&qFile);
    const int ret = out.writeRawData(data, size);
    qDebug() << ret << "bytes written.";
}

// main application
int main(int argc, char **argv)
{
    const char data[] = {
        '\x00', '\x01', '\x02', '\x03', '\x04', '\x05', '\x06', '\x07',
        '\x08', '\x09', '\x0a', '\x0b', '\x0c', '\x0d', '\x0e', '\x0f'
    };
    const size_t size = sizeof data / sizeof *data;
    write("data.txt", data, size, 0);
    write("test.txt", data, size, QIODevice::Text);
}
Built and tested in VS2017 on Windows 10:
Open file "data.txt"
16 bytes written.
Open file "test.txt"
16 bytes written.
Result inspected with the help of cygwin:
$ ls -l *.txt
-rwxrwx---+ 1 scheff Domänen-Benutzer 427 Jun 23 08:24 CMakeLists.txt
-rwxrwx---+ 1 scheff Domänen-Benutzer 16 Jun 23 08:37 data.txt
-rwxrwx---+ 1 scheff Domänen-Benutzer 17 Jun 23 08:37 test.txt
$
data.txt has 16 bytes as expected but test.txt has 17 bytes. Oops!
$ hexdump -C data.txt
00000000 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f |................|
00000010
$ hexdump -C test.txt
00000000 00 01 02 03 04 05 06 07 08 09 0d 0a 0b 0c 0d 0e |................|
00000010 0f |.|
00000011
$
Obviously, the underlying Windows file function "corrected" the \n to \r\n: the 09 0a 0b became 09 0d 0a 0b. Hence there is one additional byte that was not part of the originally written data.
Similar effects may happen when the QFile is opened for reading with QIODevice::Text involved.
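So the fix for the copy utility is to open both files without QIODevice::Text. A minimal sketch, assuming sourceFile and newFile are the QFile objects from the question:
// Sketch: open both QFile objects purely in binary mode, i.e. without
// QIODevice::Text, so no end-of-line translation happens on either side.
sourceFile.open(QIODevice::ReadOnly);
newFile.open(QIODevice::WriteOnly);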

Generate self-contained assembly subroutines with call to Windows API functions from C/C++

I have done some experiments with shellcode execution in which I wrote my own shellcode, wrote it into the memory of the target program in which I wanted it executed, and executed it with either a new thread or thread hijacking.
This works well, but manually writing the shellcode is rather time-consuming, so I am looking for a method that lets me write a function in C or C++ that will be completely self-contained once compiled. This means that any compiled function should be executable independently. That way I could write it directly into my target's memory, ready to execute, with WriteProcessMemory for example. Pushing the shellcode would then be done with code like this:
#include <Windows.h>
#include <iostream>
using namespace std;

BOOL MakeABeep() {
    return Beep(0x500, 0x500);
}
DWORD MakeABeepEnd() { return 0; }

int main() {
    DWORD pid = 0;
    cout << "PID: ";
    cin >> dec >> pid;
    HANDLE hProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, pid);
    if (!hProcess) { cout << "OpenProcess failed GLE = " << dec << GetLastError() << endl; return EXIT_FAILURE; }
    void* buf = VirtualAllocEx(hProcess, NULL, 4096, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
    if (!buf) { cout << "VirtualAllocEx failed GLE = " << dec << GetLastError() << endl; return EXIT_FAILURE; }
    // assumes the linker places MakeABeepEnd directly after MakeABeep
    SIZE_T size = (DWORD64)MakeABeepEnd - (DWORD64)MakeABeep;
    BOOL wpmStatus = WriteProcessMemory(hProcess, buf, MakeABeep, size, NULL);
    if (!wpmStatus) { cout << "WriteProcessMemory failed GLE = " << dec << GetLastError() << endl; return EXIT_FAILURE; }
    HANDLE hThread = CreateRemoteThread(hProcess, NULL, NULL, (LPTHREAD_START_ROUTINE)buf, NULL, NULL, NULL);
    if (!hThread) { cout << "CreateRemoteThread failed GLE = " << dec << GetLastError() << endl; return EXIT_FAILURE; }
    WaitForSingleObject(hThread, INFINITE);
    VirtualFreeEx(hProcess, buf, 0, MEM_RELEASE);
    return EXIT_SUCCESS;
}
If compiled with the default options of the MSVC compiler, only a bunch of jmp instructions is copied, which seems to be a jump table (the incremental-linking thunks). To avoid this problem, I disabled incremental linking in the linker options, and now the code of the function MakeABeep is properly copied, with the exception of calls to imported functions.
In my shellcode I pass the arguments as required by the calling convention, then I put the address of the function I want to call in the register rax, and finally I call the function with call rax.
Is it possible to make the compiler generate something like that?
The key thing is that the generated binary has to contain only self-contained subroutines that can be executed independently.
For example, in the assembly code generated for the function MakeABeep, the call to Beep goes through the import table with a mov rax, QWORD PTR [rip+0x?]. To be able to run it directly, the compiler would instead have to mov the full address of the Beep function into rax.
Please ignore the problems related to modules possibly not being loaded, or being loaded at a different address, in the target program; I only intend to call functions in kernel32 and ntdll, which are certainly loaded, and at the same address across processes.
Thank you for your help.
The compiler does not know the full address of the Beep function. The Beep function lives in kernel32.dll and this .DLL is marked as ASLR compatible and could in theory change its address every time you run the program. There is no compiler feature that lets you generate the real address of a function in a .DLL because such a feature is pretty useless.
One option I can think of is to use magic cookies that you replace with the correct function addresses at run time:
SIZE_T beepaddr = 0xff77ffffffff7001ull; // Magic value
((BOOL(WINAPI*)(DWORD,DWORD))beepaddr)(0x500, 0x500); // Call Beep()
compiles to
00011 b9 00 05 00 00 mov ecx, 1280 ; 00000500H
00016 48 b8 01 70 ff ff ff ff 77 ff mov rax, -38280596832686079 ; ff77ffffffff7001H
00020 8b d1 mov edx, ecx
00022 ff d0 call rax
You would then have to write a wrapper around WriteProcessMemory that knows how to look up and replace these magic values with the correct address.
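A sketch of what that wrapper's patching step might look like (illustrative names, not a hardened implementation; it relies on kernel32.dll being mapped at the same base address in both processes, as the question assumes):
#include <Windows.h>
#include <cstring>

const unsigned long long kBeepMagic = 0xff77ffffffff7001ull; // magic cookie from above

// Sketch: scan a local copy of the function body and replace every occurrence
// of the magic cookie with the address resolved in our own process.
void PatchMagic(unsigned char* code, SIZE_T size,
                unsigned long long magic, void* realAddress)
{
    for (SIZE_T i = 0; i + sizeof(magic) <= size; ++i) {
        if (memcmp(code + i, &magic, sizeof(magic)) == 0) {
            memcpy(code + i, &realAddress, sizeof(realAddress));
        }
    }
}

// Usage sketch, before calling WriteProcessMemory:
//   void* beep = (void*)GetProcAddress(GetModuleHandleA("kernel32.dll"), "Beep");
//   PatchMagic(localCopy, size, kBeepMagic, beep);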
Some shellcode will have its own mini implementation of GetModuleHandle and GetProcAddress, where it looks up the module in the PEB module list and then searches the export directory. They often use a mini hashing function for the names so they don't have to deal with strings.
If you are injecting a large amount of code you will probably get tired of these hacks and just load a .DLL in the remote process like everyone else.

Checking if a key is down in MS-DOS (C/C++)

Yes, I mean real MS-DOS, not Windows' cmd.exe shell console.
Is there a way to check if a key is down in MS-DOS, analogous to the GetAsyncKeyState() function in the WinAPI?
Currently I'm using kbhit() and getch(), but it's really slow, has a delay after the first character, doesn't allow multiple keys at the same time, etc.
I'm using Turbo C++ 3.1. Can anyone help?
(by the way, don't ask why I'm coding my game on such an ancient system)
There is no function provided by Turbo C++, MS-DOS or the BIOS that corresponds to Windows function GetAsyncKeyState. The BIOS only keeps track of which modifier keys (Shift, Ctrl, or Alt) are held down, it doesn't track any of the other keys. If you want to do that you need to talk to the keyboard controller directly and monitor the make (key pressed) and break (key released) scan codes it receives from the keyboard.
To do that you'll need to hook the keyboard interrupt (IRQ 1, INT 0x09), read the scancodes from the keyboard controller and then update your own keyboard state table.
Here's a simple program that demonstrates how to do this:
#include <conio.h>
#include <dos.h>
#include <stdio.h>

unsigned char normal_keys[0x60];
unsigned char extended_keys[0x60];

static void interrupt
keyb_int() {
    static unsigned char buffer;
    unsigned char rawcode;
    unsigned char make_break;
    int scancode;

    rawcode = inp(0x60); /* read scancode from keyboard controller */
    make_break = !(rawcode & 0x80); /* bit 7: 0 = make, 1 = break */
    scancode = rawcode & 0x7F;

    if (buffer == 0xE0) { /* second byte of an extended key */
        if (scancode < 0x60) {
            extended_keys[scancode] = make_break;
        }
        buffer = 0;
    } else if (buffer >= 0xE1 && buffer <= 0xE2) {
        buffer = 0; /* ignore these extended keys */
    } else if (rawcode >= 0xE0 && rawcode <= 0xE2) {
        buffer = rawcode; /* first byte of an extended key */
    } else if (scancode < 0x60) {
        normal_keys[scancode] = make_break;
    }

    outp(0x20, 0x20); /* must send EOI to finish interrupt */
}

static void interrupt (*old_keyb_int)();

void
hook_keyb_int(void) {
    old_keyb_int = getvect(0x09);
    setvect(0x09, keyb_int);
}

void
unhook_keyb_int(void) {
    if (old_keyb_int != NULL) {
        setvect(0x09, old_keyb_int);
        old_keyb_int = NULL;
    }
}

int
ctrlbrk_handler(void) {
    unhook_keyb_int();
    _setcursortype(_NORMALCURSOR);
    return 0;
}

static void
putkeys(int y, unsigned char const *keys) {
    int i;
    gotoxy(1, y);
    for (i = 0; i < 0x30; i++) {
        putch(keys[i] + '0');
    }
}

void
game(void) {
    _setcursortype(_NOCURSOR);
    clrscr();
    while (!normal_keys[1]) {
        putkeys(1, normal_keys);
        putkeys(2, normal_keys + 0x30);
        putkeys(4, extended_keys);
        putkeys(5, extended_keys + 0x30);
    }
    gotoxy(1, 6);
    _setcursortype(_NORMALCURSOR);
}

int
main() {
    ctrlbrk(ctrlbrk_handler);
    hook_keyb_int();
    game();
    unhook_keyb_int();
    return 0;
}
The code above was compiled with Borland C++ 3.1 and tested under DOSBox and under MS-DOS 6.11 running in VirtualBox. It shows the current state of the keyboard as a string of 0s and 1s, a 1 indicating that the key corresponding to that position's scan code is currently pressed. Press the ESC key to exit the program.
Note that the program doesn't chain to the original keyboard handler, so the normal MS-DOS and BIOS keyboard functions will not work while the keyboard interrupt is hooked. Also note that it restores the original keyboard handler before exiting. This is critical, because MS-DOS won't do this by itself. It also properly handles extended keys that send two-byte scan codes, which was the problem with the code in the question you linked to in your answer here.
Why are you coding your game on su…just kidding!
In MS-DOS, the "API" functions are implemented as interrupt servicers. In x86 assembly language, you use the INT instruction and specify the number of the interrupt that you want to execute. Most of the interrupts require that their "parameters" be set in certain registers before executing the INT. After the INT instruction returns control to your code, its result(s) will have been placed in certain registers and/or flags, as defined by the interrupt call's documentation.
I have no idea how Turbo C++ implements interrupts, since that pre-dates my involvement with programming, but I do know that it allows you to execute them. Google around for the syntax, or check your Turbo C++ documentation.
Knowing that these are interrupts will get you 90% of the way to a solution when you're searching. Ralf Brown compiled and published a famous list of DOS and BIOS interrupt codes. They should also be available in any book on DOS programming—if you're serious about retro-programming, you should definitely consider getting your hands on one. A used copy on Amazon should only set you back a few bucks. Most people consider these worthless nowadays.
Here is a site that lists the sub-functions available for DOS interrupt 21h. The ones that would be relevant to your use are 01, 06, 07, and 08. These are basically what the C standard library functions like getch are going to be doing under the hood. I find it difficult to imagine, but I have heard reports that programmers back in the day found it faster to call the DOS interrupts directly. The reason I question that is that I can't imagine the runtime library implementers would have been so stupid as to provide unnecessarily slow implementations. But maybe they were.
If the DOS interrupts are still too slow for you, your last recourse would be to use BIOS interrupts directly. This might make an appreciable difference in speed because you're bypassing every abstraction layer possible. But it does make your program significantly less portable, which is the reason operating systems like DOS provided these higher level function calls to begin with. Again, check Ralf Brown's list for the interrupt that is relevant to your use. For example, INT 16 with the 01h sub-function.
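For illustration, here is a minimal Turbo C sketch of calling that BIOS service through int86() from <dos.h>. Note this is a keystroke-waiting check, not a key-down check; for the latter you still need the INT 09h hooking shown in the other answers.
#include <dos.h>

/* Sketch: returns nonzero if a keystroke is waiting in the BIOS buffer
   (INT 16h, AH = 01h; the zero flag is set when no key is waiting). */
int bios_key_waiting(void)
{
    union REGS r;
    r.h.ah = 0x01;
    int86(0x16, &r, &r);
    return !(r.x.flags & 0x40); /* bit 6 of FLAGS is ZF */
}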
Pressing the arrow keys fires two keyboard interrupts? (INT 09h)
The implementation in this question works just fine, so if anyone for some reason wants a ready function for this, here you go:
unsigned char read_scancode() {
    unsigned char res;
    _asm {
        in al, 60h      /* read the scancode from the keyboard controller */
        mov res, al
        in al, 61h      /* pulse bit 7 of port 61h to acknowledge the byte */
        or al, 128
        out 61h, al
        xor al, 128
        out 61h, al
    }
    return res;
}
(EDIT: corrected char to unsigned char so putting this function's return value in "if" statements with things like scancode & 0x80 actually works)
When a key is pressed, it returns one of the scancodes listed there http://www.ctyme.com/intr/rb-0045.htm and when it's released it returns the same scancode but ORed with 80h.
If you actually run this in a game loop you'll eventually overflow the BIOS keyboard buffer and the computer will beep at you. A way to clear the keyboard buffer is of course while(kbhit()) getch(); but since we are in 286 real mode and have all of our hardware to f*ck with, here's a more low-level solution:
void free_keyb_buf() {
    /* head and tail pointers of the BIOS keyboard buffer; far pointers
       are needed in real mode to reach segment 0040h */
    *(unsigned int far*)MK_FP(0x0040, 0x001A) = 0x20;
    *(unsigned int far*)MK_FP(0x0040, 0x001C) = 0x20;
}
If you're looking for an explanation of how and why this works, here you go:
The BIOS keyboard buffer structure starts at 0040:001Ah and looks like this: a 2-byte "head" pointer, a 2-byte "tail" pointer, and 32 bytes of 2-byte keystroke codes. The "head" pointer indicates where to start reading from the keyboard buffer, and the "tail" pointer indicates where the next keystroke will be stored; the buffer is empty when the two are equal. So by setting both of them to 0x20 (so they actually point to 0040:0020h) we basically trick the computer into thinking that there are no new keystrokes ready for extraction.
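Using the same structure, a kbhit()-style check is a sketch away (assuming Turbo C's MK_FP from <dos.h>):
#include <dos.h>

/* Sketch: nonzero when the BIOS keyboard buffer contains unread keystrokes;
   the buffer is empty exactly when the head and tail pointers are equal. */
int bios_key_in_buffer(void)
{
    unsigned int far *head = (unsigned int far *)MK_FP(0x0040, 0x001A);
    unsigned int far *tail = (unsigned int far *)MK_FP(0x0040, 0x001C);
    return *head != *tail;
}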
So, I've gone through all this stuff somewhat recently, and I just happen to have the code that you need. (Also, I will link you to some great books with more information, in PDF format.)
So, the way that this works, is you need to overwrite the Interrupt Vector Table in memory at index 9h. The Interrupt Vector Table is simply a table of memory addresses that point to a piece of code to run when that interrupt is triggered (these are called interrupt handler routines or ISRs). Interrupt 9h is triggered when the keyboard controller has a scancode ready for use.
Anyways, we first need to overwrite the old int9h ISR by calling the KeyboardInstallDriver() function. Now, when int9h is triggered, the KeyboardIsr() function is called, and it gets the scancode from the keyboard controller, and sets a value in the keyStates[] array to either 1 (KEY_PRESSED) or 0 (KEY_RELEASED) based on the value of the scan code that was retrieved from the keyboard controller.
After the corresponding value in the keyStates[] array has been set, then you can call KeyboardGetKey() giving it the scancode of the key that you want to know the state of, and it will look it up in the keyStates[] array and return whatever the state is.
There are a lot of details to this, but it's WAY too much to write up here. All the details can be found in the books that I will link here:
IBM PC Technical Reference, IBM PC XT Technical Reference,
IBM PC AT Technical Reference, Black Art of 3D Game Programming
Hopefully those links stay active for a while. Also, the "Black Art of 3D Game Programming" book is not always completely accurate on every little detail. Sometimes there are typos, and sometimes there is misinformation, but the IBM Technical References have all the details (even if they are a bit cryptic at times), but they have no example code. Use the book to get the general idea, and use the references to get the details.
Here is my code for getting input from the keyboard:
(it's not completely finished for all the possible keys and certain other things, but it works quite well for most programs and games.)
Also, there is some code to handle the "extended" keys. The extended keys have 0xE0 prefixed to their regular scan code. There are even more crazy details to this, so I'm not gonna cover them here, but the mostly working code is there, anyways.
keyboard.h
#ifndef KEYBOARD_H_INCLUDED
#define KEYBOARD_H_INCLUDED
#include "keyboard_scan_codes.h"
unsigned char KeyboardGetKey(unsigned int scanCode);
void KeyboardClearKeys();
void KeyboardInstallDriver();
void KeyboardUninstallDriver();
void KeyboardDumpScancodeLog();
#endif // KEYBOARD_H_INCLUDED
keyboard.c
#define MAX_SCAN_CODES 256
#define KEYBOARD_CONTROLLER_OUTPUT_BUFFER 0x60
#define KEYBOARD_CONTROLLER_STATUS_REGISTER 0x64
#define KEY_PRESSED 1
#define KEY_RELEASED 0
#define PIC_OPERATION_COMMAND_PORT 0x20
#define KEYBOARD_INTERRUPT_VECTOR 0x09
// PPI stands for Programmable Peripheral Interface (which is the Intel 8255A chip)
// The PPI ports are only for IBM PC and XT, however port A is mapped to the same
// I/O address as the Keyboard Controller's (Intel 8042 chip) output buffer for compatibility.
#define PPI_PORT_A 0x60
#define PPI_PORT_B 0x61
#define PPI_PORT_C 0x62
#define PPI_COMMAND_REGISTER 0x63
#include <dos.h>
#include <string.h>
#include <stdio.h>
#include <conio.h>
#include "keyboard.h"
void interrupt (*oldKeyboardIsr)() = (void *)0;
unsigned char keyStates[MAX_SCAN_CODES];
unsigned char keyCodeLog[256] = {0};
unsigned char keyCodeLogPosition = 0;
static unsigned char isPreviousCodeExtended = 0;
unsigned char KeyboardGetKey(unsigned int scanCode)
{
    // Check for the extended code
    if((scanCode >> 8) == 0xE0)
    {
        // Get rid of the extended code
        scanCode &= 0xFF;
        return keyStates[scanCode + 0x7F];
    }
    else
    {
        return keyStates[scanCode];
    }
}

void KeyboardClearKeys()
{
    memset(&keyStates[0], 0, MAX_SCAN_CODES);
}
void interrupt far KeyboardIsr()
{
    static unsigned char scanCode;
    unsigned char ppiPortB;

    _asm {
        cli // disable interrupts
    };

    /* The keyboard controller, by default, will send scan codes
       in Scan Code Set 1 (reference the IBM Technical References
       for a complete list of scan codes).

       Scan codes in this set come as make/break codes. The make
       code is the normal scan code of the key, and the break code
       is the make code bitwise "OR"ed with 0x80 (the high bit is set).

       On keyboards after the original IBM Model F 83-key, an 0xE0
       is prepended to some keys that didn't exist on the original keyboard.

       Some keys have their scan codes affected by the state of
       the shift and num-lock keys. These certain keys have,
       potentially, quite long scan codes with multiple possible
       0xE0 bytes along with other codes to indicate the state of
       the shift and num-lock keys.

       There are two other Scan Code Sets, Set 2 and Set 3. Set 2
       was introduced with the IBM PC AT, and Set 3 with the IBM
       PS/2. Set 3 is by far the easiest and most simple set to work
       with, but not all keyboards support it.

       Note:
       The "keyboard controller" chip is different depending on
       which machine is being used. The original IBM PC uses the
       Intel 8255A-5, while the IBM PC AT uses the Intel 8042 (UPI-42AH).
       On the 8255A-5, port 0x61 can be read and written to for various
       things, one of which will clear the keyboard and disable it or
       re-enable it. There is no such function on the AT and newer, but
       it is not needed anyways. The 8042 uses ports 0x60 and 0x64. Both
       the 8255A-5 and the 8042 give the scan codes from the keyboard
       through port 0x60.

       On the IBM PC and XT and compatibles, you MUST clear the keyboard
       after reading the scancode by reading the value at port 0x61,
       flipping the 7th bit to a 1, and writing that value back to port 0x61.
       After that is done, flip the 7th bit back to 0 to re-enable the keyboard.

       On IBM PC ATs and newer, writing and reading port 0x61 does nothing (as far
       as I know), and using it to clear the keyboard isn't necessary. */

    scanCode = 0;
    ppiPortB = 0;
    ppiPortB = inp(PPI_PORT_B); // get the current settings in PPI port B
    scanCode = inp(KEYBOARD_CONTROLLER_OUTPUT_BUFFER); // get the scancode waiting in the output buffer
    outp(PPI_PORT_B, ppiPortB | 0x80); // set the 7th bit of PPI port B (clear keyboard)
    outp(PPI_PORT_B, ppiPortB); // clear the 7th bit of the PPI (enable keyboard)

    // Log scancode
    keyCodeLog[keyCodeLogPosition] = scanCode;
    if(keyCodeLogPosition < 255)
    {
        ++keyCodeLogPosition;
    }

    // Check to see what the code was.
    // Note that we have to process the scan code one byte at a time.
    // This is because we can't get another scan code until the current
    // interrupt is finished.
    switch(scanCode)
    {
    case 0xE0:
        // Extended scancode
        isPreviousCodeExtended = 1;
        break;
    default:
        // Regular scancode
        // Check the high bit; if set, then it's a break code.
        if(isPreviousCodeExtended)
        {
            isPreviousCodeExtended = 0;
            if(scanCode & 0x80)
            {
                scanCode &= 0x7F;
                keyStates[scanCode + 0x7F] = KEY_RELEASED;
            }
            else
            {
                keyStates[scanCode + 0x7F] = KEY_PRESSED;
            }
        }
        else if(scanCode & 0x80)
        {
            scanCode &= 0x7F;
            keyStates[scanCode] = KEY_RELEASED;
        }
        else
        {
            keyStates[scanCode] = KEY_PRESSED;
        }
        break;
    }

    // Send a "Non Specific End of Interrupt" command to the PIC.
    // See Intel 8259A datasheet for details.
    outp(PIC_OPERATION_COMMAND_PORT, 0x20);

    _asm {
        sti // enable interrupts
    };
}
void KeyboardInstallDriver()
{
    // Make sure the new ISR isn't already in use.
    if(oldKeyboardIsr == (void *)0)
    {
        oldKeyboardIsr = _dos_getvect(KEYBOARD_INTERRUPT_VECTOR);
        _dos_setvect(KEYBOARD_INTERRUPT_VECTOR, KeyboardIsr);
    }
}

void KeyboardUninstallDriver()
{
    // Make sure the new ISR is in use.
    if(oldKeyboardIsr != (void *)0)
    {
        _dos_setvect(KEYBOARD_INTERRUPT_VECTOR, oldKeyboardIsr);
        oldKeyboardIsr = (void *)0;
    }
}

void KeyboardDumpScancodeLog()
{
    FILE *keyLogFile = fopen("keylog.hex", "w+b");
    if(!keyLogFile)
    {
        printf("ERROR: Couldn't open file for key logging!\n");
    }
    else
    {
        int i;
        for(i = 0; i < 256; ++i)
        {
            fputc(keyCodeLog[i], keyLogFile);
        }
        fclose(keyLogFile);
    }
}
keyboard_scan_codes.h (simply defines all the scancodes to the qwerty button layout)
#ifndef KEYBOARD_SCAN_CODES_H_INCLUDED
#define KEYBOARD_SCAN_CODES_H_INCLUDED
// Original 83 Keys from the IBM 83-key Model F keyboard
#define SCAN_NONE 0x00
#define SCAN_ESC 0x01
#define SCAN_1 0x02
#define SCAN_2 0x03
#define SCAN_3 0x04
#define SCAN_4 0x05
#define SCAN_5 0x06
#define SCAN_6 0x07
#define SCAN_7 0x08
#define SCAN_8 0x09
#define SCAN_9 0x0A
#define SCAN_0 0x0B
#define SCAN_MINUS 0x0C
#define SCAN_EQUALS 0x0D
#define SCAN_BACKSPACE 0x0E
#define SCAN_TAB 0x0F
#define SCAN_Q 0x10
#define SCAN_W 0x11
#define SCAN_E 0x12
#define SCAN_R 0x13
#define SCAN_T 0x14
#define SCAN_Y 0x15
#define SCAN_U 0x16
#define SCAN_I 0x17
#define SCAN_O 0x18
#define SCAN_P 0x19
#define SCAN_LEFT_BRACE 0x1A
#define SCAN_RIGHT_BRACE 0x1B
#define SCAN_ENTER 0x1C
#define SCAN_LEFT_CONTROL 0x1D
#define SCAN_A 0x1E
#define SCAN_S 0x1F
#define SCAN_D 0x20
#define SCAN_F 0x21
#define SCAN_G 0x22
#define SCAN_H 0x23
#define SCAN_J 0x24
#define SCAN_K 0x25
#define SCAN_L 0x26
#define SCAN_SEMICOLON 0x27
#define SCAN_APOSTROPHE 0x28
#define SCAN_ACCENT 0x29
#define SCAN_TILDE 0x29 // Duplicate of SCAN_ACCENT with popular Tilde name.
#define SCAN_LEFT_SHIFT 0x2A
#define SCAN_BACK_SLASH 0x2B
#define SCAN_Z 0x2C
#define SCAN_X 0x2D
#define SCAN_C 0x2E
#define SCAN_V 0x2F
#define SCAN_B 0x30
#define SCAN_N 0x31
#define SCAN_M 0x32
#define SCAN_COMMA 0x33
#define SCAN_PERIOD 0x34
#define SCAN_FORWARD_SLASH 0x35
#define SCAN_RIGHT_SHIFT 0x36
#define SCAN_KP_STAR 0x37
#define SCAN_KP_MULTIPLY 0x37 // Duplicate of SCAN_KP_STAR
#define SCAN_LEFT_ALT 0x38
#define SCAN_SPACE 0x39
#define SCAN_CAPS_LOCK 0x3A
#define SCAN_F1 0x3B
#define SCAN_F2 0x3C
#define SCAN_F3 0x3D
#define SCAN_F4 0x3E
#define SCAN_F5 0x3F
#define SCAN_F6 0x40
#define SCAN_F7 0x41
#define SCAN_F8 0x42
#define SCAN_F9 0x43
#define SCAN_F10 0x44
#define SCAN_NUM_LOCK 0x45
#define SCAN_SCROLL_LOCK 0x46
#define SCAN_KP_7 0x47
#define SCAN_KP_8 0x48
#define SCAN_KP_9 0x49
#define SCAN_KP_MINUS 0x4A
#define SCAN_KP_4 0x4B
#define SCAN_KP_5 0x4C
#define SCAN_KP_6 0x4D
#define SCAN_KP_PLUS 0x4E
#define SCAN_KP_1 0x4F
#define SCAN_KP_2 0x50
#define SCAN_KP_3 0x51
#define SCAN_KP_0 0x52
#define SCAN_KP_PERIOD 0x53
// Extended keys for the IBM 101-key Model M keyboard.
#define SCAN_RIGHT_ALT 0xE038
#define SCAN_RIGHT_CONTROL 0xE01D
#define SCAN_LEFT_ARROW 0xE04B
#define SCAN_RIGHT_ARROW 0xE04D
#define SCAN_UP_ARROW 0xE048
#define SCAN_DOWN_ARROW 0xE050
#define SCAN_NUMPAD_ENTER 0xE01C
#define SCAN_INSERT 0xE052
#define SCAN_DELETE 0xE053
#define SCAN_HOME 0xE047
#define SCAN_END 0xE04F
#define SCAN_PAGE_UP 0xE049
#define SCAN_PAGE_DOWN 0xE051
#define SCAN_KP_FORWARD_SLASH 0xE035
#define SCAN_PRINT_SCREEN 0xE02AE037
#endif // KEYBOARD_SCAN_CODES_H_INCLUDED
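To give an idea of how this is meant to be used in a game loop, here is a minimal usage sketch (my own example, not from the original post), assuming the files above are compiled together:
/* Usage sketch: poll key states instead of reading buffered keystrokes. */
#include "keyboard.h"

int main(void)
{
    KeyboardInstallDriver();

    while(!KeyboardGetKey(SCAN_ESC))
    {
        if(KeyboardGetKey(SCAN_LEFT_ARROW))  { /* move left */ }
        if(KeyboardGetKey(SCAN_RIGHT_ARROW)) { /* move right */ }
    }

    /* Critical: restore the original ISR before exiting. */
    KeyboardUninstallDriver();
    return 0;
}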

Reading a tiff image with c++

I'm trying to get information from a TIFF image file. The output for the endianness is correct, but the rest of the values are all wrong. The first 8 bytes of the TIFF file are:
4d 4d 00 2a 00 02 03 60
The magicno I'm getting is 10752, which is 0x2A00 in hex. But I should be reading the third and fourth bytes, which should give 0x002A. Need help please!!
Here's my code.
#include <iostream>
#include <fstream>
using namespace std;

int main()
{
    char buffer[3];
    short magicno;
    int ifdaddress;
    short ifdcount;

    ifstream imfile;
    imfile.open("pooh.tif", ios::binary);

    imfile.seekg(0, ios::beg);
    imfile.read(buffer, 2);
    imfile.read((char*)&magicno, 2);
    imfile.read((char*)&ifdaddress, 4);
    imfile.seekg(ifdaddress, ios::beg);
    imfile.read((char*)&ifdcount, 2);
    imfile.close();

    buffer[2] = '\0';

    cout << "Endian: " << buffer << endl;
    cout << "Magic: " << magicno << endl;
    cout << "IFD Address: " << ifdaddress << endl;
    cout << "IFD CountL " << ifdcount << endl;

    return 0;
}
My output is:
Endian: MM
Magic: 10752
IFD Address: 1610809856
IFD CountL 0
You read the endianness marker correctly but you do not act upon it. From Adobe's "TIFF 6":
Bytes 0-1:
The byte order used within the file. Legal values are:
“II” (4949.H)
“MM” (4D4D.H)
In the “II” format, byte order is always from the least significant byte to the most significant byte, for both 16-bit and 32-bit integers. This is called little-endian byte order. In the “MM” format, byte order is always from most significant to least significant, for both 16-bit and 32-bit integers. This is called big-endian byte order.
(https://partners.adobe.com/public/developer/en/tiff/TIFF6.pdf)
You need two sets of routines to read a short integer from a TIFF file (and likewise for the longer integral types): one that reads Motorola ("MM") big-endian numbers, and one that reads Intel ("II") little-endian ones.
As it is, you must be on a little-endian system while attempting to natively read big-endian numbers.
The code to correctly read a word can be as simple as:
unsigned char d1, d2;
short word;
imfile.read((char*)&d1, 1);
imfile.read((char*)&d2, 1);
if (magicno == 0x4949)          // "II": little-endian
    word = d1 + (d2 << 8);
else                            // "MM": big-endian
    word = (d1 << 8) + d2;
Untested, but the general idea should be clear. Best make it a function, because you need a similar setup for the "LONG" data type, which in turn is needed for the "RATIONAL" data type.
Ultimately, for TIFF files, you may want a generalized read_data function which first checks what data type is stored in the file and then calls the correct routine.
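A sketch of what such helpers might look like (illustrative names; it assumes the byte-order marker has already been read from the header, as above):
#include <cstdint>
#include <fstream>

// Sketch: read a TIFF SHORT (16-bit) honoring the file's byte order.
uint16_t read_word(std::ifstream &in, bool littleEndian)
{
    unsigned char b[2];
    in.read(reinterpret_cast<char*>(b), 2);
    return littleEndian ? uint16_t(b[0] | (b[1] << 8))
                        : uint16_t((b[0] << 8) | b[1]);
}

// Sketch: read a TIFF LONG (32-bit); the byte order applies to all four bytes.
uint32_t read_long(std::ifstream &in, bool littleEndian)
{
    unsigned char b[4];
    in.read(reinterpret_cast<char*>(b), 4);
    return littleEndian
        ? uint32_t(b[0]) | (uint32_t(b[1]) << 8) | (uint32_t(b[2]) << 16) | (uint32_t(b[3]) << 24)
        : (uint32_t(b[0]) << 24) | (uint32_t(b[1]) << 16) | (uint32_t(b[2]) << 8) | uint32_t(b[3]);
}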

QDataStream uses sometimes 32 bit and sometimes 40 bit floats

I am writing an application that is supposed to write an array of floats to a WAVE file. I am using a QDataStream for this, but it results in a very improbable output that I can't explain: it seems like the QDataStream sometimes writes 32-bit floats and sometimes 40-bit floats. This messes up the entire output file, since it has to obey a strict format.
My code roughly looks like this:
float* array;
unsigned int nSamples;

void saveWAV(const QString& fileName) const
{
    QFile outFile(fileName);
    if (outFile.open(QIODevice::WriteOnly | QIODevice::Text))
    {
        QDataStream dataStream(&outFile);
        dataStream.setByteOrder(QDataStream::LittleEndian);
        dataStream.setFloatingPointPrecision(QDataStream::SinglePrecision);

        // ... do all the WAV file header stuff ...

        for (int ii = 0; ii < nSamples; ++ii)
            dataStream << array[ii];
    }
}
I can think of no reason of how this code could have such a side-effect. So I made a minimal example to find out what was going on. I replaced the for-loop by this:
float temp1 = 1.63006e-33f;
float temp2 = 1.55949e-32f;
dataStream << temp1;
dataStream << temp1;
dataStream << temp2;
dataStream << temp1;
dataStream << temp2;
Then I opened the output file using Matlab and had a look at the bytes written to the file. Those were:
8b 6b 07 09 // this is indeed 1.63006e-33f (notice it's Little Endian)
8b 6b 07 09
5b f2 a1 0d 0a // I don't know what this is, but it's a byte too long
8b 6b 07 09
5b f2 a1 0d 0a
I chose the values pretty arbitrarily, they just happened to have this effect. Some values are exported as 4-byte and other ones as 5-byte numbers. Does anyone have any idea what may be the cause of this?
Edit:
When checking the size of both floats, they do seem to be 4 bytes long, though:
qDebug() << sizeof(temp1); // prints '4'
qDebug() << sizeof(temp2); // prints '4'
The answer lies in the opening of the output file: the QIODevice::Text flag should have been left out, since it is a binary file. If the text flag is included, a 0d character is inserted before each 0a. So each float that contains an 0a byte comes out one byte longer.
All credits for this answer go to the answers given in:
Length of float changes between 32 and 40 bit
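Concretely, the only change needed in saveWAV() above is the open mode (a sketch):
// Open in pure binary mode: without QIODevice::Text there is no
// 0a -> 0d 0a end-of-line translation on Windows.
if (outFile.open(QIODevice::WriteOnly))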
Note: I'm not 100% sure I'm right below, and would love to hear that I'm wrong, but this is how I think it is:
QDataStream has its own serialization format, and while I did not check, the problem is probably related to that. The point is, it's not meant for what you are trying to do with it: writing an arbitrary binary format. You can use the class, but I believe you should use only the writeRawData() method, and take care of byte order etc. yourself.
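For instance, the per-sample loop could be replaced by a single raw write (a sketch; it assumes a little-endian host with IEEE-754 floats, which is what the WAV format stores):
// Sketch: bypass QDataStream's serialization format and write the raw sample bytes.
dataStream.writeRawData(reinterpret_cast<const char*>(array),
                        nSamples * sizeof(float));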
I had a similar issue even though I was not using the QIODevice::Text flag. I found that adding the line
dataStream.device()->setTextModeEnabled(false);
solved the problem, by making sure the device is in binary mode.