I have been messing with NTFS lately in order to perform a quick search (by parsing the MFT) that is supposed to reveal files with specific extensions (even if they were deleted) and find their paths. The first weird thing I have encountered is that in all the cases I have seen (3), drive C contains a lot of invalid MFT records (more than 3/4). Most of them (if not all) fail signature validation.
I would normally think that these records are simply not in use, but there is another problem which makes me think something is wrong: when all file records with the required extension are found, some of them point to parent MFT records which fail validation for the same reason. Yet the files are marked as 'in use' and I can see them in Explorer. Another weird thing is that the directory they are in is valid, since other files in the same directory/subdirectories point to valid directory records (e.g. I have a file log.txt on the Desktop which points to an invalid file record, while a folder data, also on the Desktop, contains a file info.txt and 'data' points to a valid file record).
Signature validation (simplified):
struct FILE_RECORD_HEADER
{
uint32 Magic; //Should match FILE_RECORD_SIGNATURE
uint16 OffsetOfUS; //Offset of Update Sequence
uint16 SizeOfUS; //Size in 2-byte ints of Update Sequence Number & Array
uint64 LSN; //$LogFile Sequence Number
uint16 SeqNo; //Sequence number
uint16 Hardlinks; //Hard link count
uint16 OffsetOfAttr; //Offset of the first Attribute
uint16 Flags; //Flags
uint32 RealSize; //Real size of the FILE record
uint32 AllocSize; //Allocated size of the FILE record
uint64 RefToBase; //File reference to the base FILE record. Low 6B - file reference, high 2B - MFT record sequence number
uint16 NextAttrId; //Next Attribute Id
uint16 Align; //Align to 4 uint8 boundary
uint32 RecordNo; //Number of this MFT Record
};
#define FILE_RECORD_SIGNATURE 'ELIF'
FILE_RECORD_HEADER * header = (FILE_RECORD_HEADER *)rawFileRecord; //where rawFileRecord is a pointer to a block of memory in which a file record is stored
if(header->Magic != FILE_RECORD_SIGNATURE) //The file record is invalid
Getting LCN of parent:
struct ATTR_FILE_NAME
{
uint64 ParentRef; //File reference to the parent directory. Low 6B - file reference, high 2B - MFT record sequence number
uint64 CreateTime; //File creation time
uint64 AlterTime; //File altered time
uint64 MFTTime; //MFT changed time
uint64 ReadTime; //File read time
uint64 AllocSize; //Allocated size of the file
uint64 RealSize; //Real size of the file
uint32 Flags; //Flags
uint32 ER; //Used by EAs and Reparse
uint8 NameLength; //Filename length in characters
uint8 NameSpace; //Filename space
uint16 Name[1]; //Filename
};
ATTR_FILE_NAME * attr = (ATTR_FILE_NAME*)filenameAttr; //where filenameAttr is a pointer to the beginning of filename attribute somewhere in the rawFileRecord
uint64 parentLCN = attr->ParentRef & 0x0000FFFFFFFFFFFF;
Is it possible to lose files (in terms of the search) due to this signature mismatch? (I think yes, but I want to be sure.) And why do some file records point to invalid parents while others point to valid ones, even though they are supposed to have the same parent?
The reference you are searching for is the RecordNo in FILE_RECORD_HEADER.
The 'Low 6B - file reference' part of ParentRef is supposed to match it.
If those two match, you have the right file.
It's the main reason why an NTFS volume can only contain 4'294'967'295 files, since it's stored in a uint32.
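In code, that check could look something like this (a minimal sketch based on the field layout described in the question's struct comments; comparing the high two bytes against SeqNo as well lets you spot references to records that have since been reused):
#include <stdint.h>
// Sketch only: split a 64-bit MFT file reference (as found in ParentRef or
// RefToBase) into its two parts, then compare against the header fields of
// the candidate parent record (RecordNo and SeqNo from the question's
// FILE_RECORD_HEADER).
struct MftReference
{
    uint64_t recordNo; // low 6 bytes: index of the MFT record
    uint16_t seqNo;    // high 2 bytes: sequence number of that record
};
inline MftReference SplitReference(uint64_t ref)
{
    MftReference r;
    r.recordNo = ref & 0x0000FFFFFFFFFFFFULL;
    r.seqNo    = static_cast<uint16_t>(ref >> 48);
    return r;
}
// True if the candidate record (identified by its RecordNo and SeqNo) is the
// one the reference points to, i.e. it has not been reused since.
inline bool ReferenceMatches(uint64_t parentRef, uint32_t recordNo, uint16_t seqNo)
{
    MftReference r = SplitReference(parentRef);
    return r.recordNo == recordNo && r.seqNo == seqNo;
}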
Personally, I found it easier to map everything with $INDEX_ROOT_ATTR and $INDEX_ALLOCATION_ATTR, since you can find the same type of reference there and it lets you follow the tree structure more easily: you can start with the root (its record number is always 5).
I am reading a tutorial on writing memory management tools for c++ programs. Here is the link to the tutorial.
One of the variants of this memory manager is a Bit-Mapped Memory Manager in which optimization is based on the idea of prefetching a large chunk of memory and using it in our programs later.
This chunk is further divided into smaller fixed-sized units called blocks to be used for the allocation of a particular type of object.
The tutorial clearly mentions, "All free blocks have their corresponding bit set to 1. Occupied blocks have their bits reset to 0."
With each chunk, a bitmap is associated which represents the above idea. But, in the implementation of BitMap class, each corresponding bit for each block is represented by a 32-bit integer instead of just a single bit boolean value. This is what I am not able to understand.
Also, below is the declaration of the above-mentioned class; you can see it in Listing 12 of the tutorial. I also think the line with memset is incorrect: the initialization is incomplete, and the size argument should be BIT_MAP_SIZE * 4 (i.e. BIT_MAP_SIZE * sizeof(int)) even if we go their way.
typedef struct BitMapEntry
{
int Index;
int BlocksAvailable;
int BitMap[BIT_MAP_SIZE];
public:
BitMapEntry(): BlocksAvailable(BIT_MAP_SIZE)
{
memset(BitMap, 0xff, BIT_MAP_SIZE / sizeof(char));
// initially all blocks are free and bit value 1 in the map denotes
// available block
}
void SetBit(int position, bool flag);
void SetMultipleBits(int position, bool flag, int count);
void SetRangeOfInt(int* element, int msb, int lsb, bool flag);
Complex* FirstFreeBlock(size_t size);
Complex* ComplexObjectAddress(int pos);
void* Head();
}
BitMapEntry;
The initialization does seem to be incorrect, but the choice of int as the "bit container" type is not necessarily invalid.
You see, C++ doesn't have a native bit type, and on typical computers you can't address single bits.
What could be happening (you haven't shown us the implementation) is that each int serves as a container for sizeof(int) * CHAR_BIT bits; so if you ask for bit position k, you'll be looking at bit k % (sizeof(int) * CHAR_BIT) within integer k / (sizeof(int) * CHAR_BIT).
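As a rough sketch of how a SetBit built on that arithmetic could work (a guess at the idea, not the tutorial's actual implementation):
#include <limits.h>
// Width of one "bit container" element.
static const int BITS_PER_INT = sizeof(int) * CHAR_BIT;
// Set or clear bit 'position' in an int array used as a bitmap.
void SetBit(int *bitMap, int position, bool flag)
{
    int element = position / BITS_PER_INT;   // which int holds the bit
    int bit     = position % BITS_PER_INT;   // which bit inside that int
    if (flag)
        bitMap[element] |=  (1u << bit);     // 1 = block is free
    else
        bitMap[element] &= ~(1u << bit);     // 0 = block is occupied
}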
I am using Eclipse with Cygwin. The application is 64-bit. In Cygwin the structure is defined as:
struct addrinfo {
int ai_flags; /* input flags */
int ai_family; /* address family of socket */
int ai_socktype; /* socket type */
int ai_protocol; /* ai_protocol */
socklen_t ai_addrlen; /* length of socket address */
char *ai_canonname; /* canonical name of service location */
struct sockaddr *ai_addr; /* socket address of socket */
struct addrinfo *ai_next; /* pointer to next in list */
};
The sizeof(addrinfo) result is 48. The size of socklen_t is 4 bytes. The int type is 4 bytes. A pointer is 8 bytes in a 64-bit application. That only adds up to 44 bytes (4 ints = 16 bytes, socklen_t = 4 bytes, 3 pointers = 24 bytes; 16 + 4 + 24 = 44). I am wondering what the missing 4 bytes are for. Are they padding? I thought 44 bytes would not need to be aligned. Any thoughts?
Thanks for the answer in advance.
It's padding, after the socklen_t. The next variable in the struct is a pointer, which is 64-bits in length, and will (in this case) be aligned to 64-bits as well. Note however that padding is dependent on architecture and compiler settings; it happens here, but is not guaranteed to always happen.
Note that since you are sharing this struct with the operating system, you should NOT try to change the padding yourself (most compilers allow this using compiler switches and/or pragmas). The OS is expecting this struct with a certain amount of padding included. If you fail to provide it, all the pointers at the end of the struct will have their values misinterpreted.
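If you want to see exactly where the hole ends up on your build, a quick offsetof dump makes it visible (a sketch; on Cygwin the struct should come from <netdb.h>):
#include <stdio.h>
#include <stddef.h>
#include <netdb.h>   /* struct addrinfo */
int main(void)
{
    /* On a 64-bit build, expect a 4-byte hole between ai_addrlen and
       ai_canonname so that the pointer starts on an 8-byte boundary. */
    printf("sizeof(struct addrinfo) = %zu\n", sizeof(struct addrinfo));
    printf("ai_addrlen   at offset %zu\n", offsetof(struct addrinfo, ai_addrlen));
    printf("ai_canonname at offset %zu\n", offsetof(struct addrinfo, ai_canonname));
    printf("ai_addr      at offset %zu\n", offsetof(struct addrinfo, ai_addr));
    printf("ai_next      at offset %zu\n", offsetof(struct addrinfo, ai_next));
    return 0;
}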
It's "struct member alignment", the /Zp flag
Data structure alignment,
Microsoft documentation
I don't understand what this syntax is used for: *(char *). What does it do, and can it be used with other data types like int?
void function(int a)
{
*(char*)(0x12345 + (0x3980 * a)) = 0xFF;
}
*(char *)hoge means: interpret hoge as a pointer to char and access the data at the location hoge points to.
It can be used with other data types such as int.
One usage example: comparison function for qsort
int cmp(const void *x, const void *y) {
int a = *(int *)x;
int b = *(int *)y;
if (a > b) return 1;
if (a < b) return -1;
return 0;
}
I don't know where you got your example from, but it doesn't quite make sense to me. Anyway, when you put the character "*" before something like (char*), what is happening is that you're telling the compiler to cast the value computed between the parentheses, (0x12345 + (0x3980 * a)), into a pointer to char, and then change the value stored at that memory location to 0xFF.
In other words, what just happened is that you grabbed a location in memory, told the compiler to act as if that location contains a char, "*(char*)", and stored the value 0xFF there.
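If you want to play with the same pattern without poking a hard-coded address (which would almost certainly crash), here is a small self-contained variant that writes through casts into memory the program actually owns; the values are arbitrary:
#include <stdio.h>
int main(void)
{
    int value = 0;
    void *p = &value;            // lose the type information
    *(int *)p = 0x41424344;      // treat p as "pointer to int" and store an int there
    *(char *)p = 0x46;           // treat p as "pointer to char" and overwrite one byte
    printf("value = 0x%x\n", value);  // which byte changed depends on endianness
    return 0;
}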
The question has been answered already, but here is a "real world" example where this kind of syntax is used.
In (low-level) embedded software development you often have to interface with an MCU's hardware peripherals. These peripherals are controlled by registers which are mapped to fixed memory addresses.
When an MCU has multiple instances of the same peripheral (e.g. 3 ADCs), it will usually have 3 identical register sets mapped right after each other.
When interfacing, you want to work with the addresses directly but add an abstraction. A simple API for control may look like this:
.H file
/* Header file, defines addresses for specific chip*/
#define ADC_BASE_ADDRESS 0x00001000 /* Start address of first register of ADC0 */
#define SIZEOF_ADC_REGISTERS 0x00000020 /* Size of all ADC0 registers */
#define ADC_REG_CFG_OFFSET 0x00 /* ADC Config register offset */
#define ADC_REG_BLA_BLA_OFFSET 0x04 /* ADC Config register offset */
/* etc, etc, etc*/
#define ADC_CFG_ENABLE 0x01 /* Enable command */
.C file
#include "chip.h"
void adc_enable(int adc){
*(uint32_t *)(ADC_BASE_ADDRESS + ADC_REG_CFG_OFFSET + (adc * SIZEOF_ADC_REGISTERS)) = ADC_CFG_ENABLE;
}
/* Calling code */
adc_enable(0);
adc_enable(2);
Do note, as mentioned this is typically done in C, not so much in C++.
I am trying to perform a less-than-32-bit read over the PCI bus to a VME bridge chip (Tundra Universe II), which will then go onto the VME bus and be picked up by the target.
The target VME application only accepts D32 (a data-width read of 32 bits) and will ignore anything else.
If I use a bit-field structure mapped over a VME window (mmap'd into main memory), I CAN read bit fields of 24 bits and up, but anything less fails, i.e.:
struct works {
unsigned int a:24;
};
struct fails {
unsigned int a:1;
unsigned int b:1;
unsigned int c:1;
};
struct main {
works work;
fails fail;
};
volatile struct main *reg = function_that_creates_and_maps_the_vme_windows_returns_address();
This shows that the works struct is read as 32 bits, but a read via the fails struct, e.g. reg->fail.a, gets factored down to an X-bit read (where X might be 16 or 8?).
So the questions are :
a) Where is this scaled down? Compiler? OS? or the Tundra chip?
b) What is the actual size of the read operation performed?
I basically want to rule out everything but the chip. Documentation on that is on the web, but if it can be proved that the data width requested over the PCI bus is 32 bits, then the problem can be blamed on the Tundra chip!
edit:-
Concrete example, code was:-
struct SVersion
{
unsigned title : 8;
unsigned pecversion : 8;
unsigned majorversion : 8;
unsigned minorversion : 8;
} Version;
So now I have changed it to this :-
union UPECVersion
{
struct SVersion
{
unsigned title : 8;
unsigned pecversion : 8;
unsigned majorversion : 8;
unsigned minorversion : 8;
} Version;
unsigned int dummy;
};
And the base main struct :-
typedef struct SEPUMap
{
...
...
UPECVersion PECVersion;
};
So I still have to change all my baseline code:
// perform dummy 32bit read
pEpuMap->PECVersion.dummy;
// get the bits out
x = pEpuMap->PECVersion.Version.minorversion;
And how do I know that the second read won't actually do a real read again, as my original code did, instead of using the already-read bits via the union?
Your compiler is adjusting the size of your struct to a multiple of its memory alignment setting. Almost all modern compilers do this. On some processors, variables and instructions have to begin on memory addresses that are multiples of some memory alignment value (often 32-bits or 64-bits, but the alignment depends on the processor architecture). Most modern processors don't require memory alignment anymore - but almost all of them see substantial performance benefit from it. So the compilers align your data for you for the performance boost.
However, in many cases (such as yours) this isn't the behavior you want. The size of your structure, for various reasons, can turn out to be extremely important. In those cases, there are various ways around the problem.
One option is to force the compiler to use different alignment settings. The options for doing this vary from compiler to compiler, so you'll have to check your documentation. It's usually a #pragma of some sort. On some compilers (the Microsoft compilers, for instance) it's possible to change the memory alignment for only a very small section of code. For example (in VC++):
#pragma pack(push) // save the current alignment
#pragma pack(1) // set the alignment to one byte
// Define variables that are alignment sensitive
#pragma pack(pop) // restore the alignment
Another option is to define your variables in other ways. Intrinsic types are not resized based on alignment, so instead of your 24-bit bitfield, another approach is to define your variable as an array of bytes.
Finally, you can just let the compilers make the structs whatever size they want and manually record the size that you need to read/write. As long as you're not concatenating structures together, this should work fine. Remember, however, that the compiler is giving you padded structs under the hood, so if you make a larger struct that includes, say, a works and a fails struct, there will be padded bits in between them that could cause you problems.
On most compilers, it's going to be darn near impossible to create a data type smaller than 8 bits. Most architectures just don't think that way. This shouldn't be a huge problem because most hardware devices that use datatypes of smaller than 8-bits end up arranging their packets in such a way that they still come in 8-bit multiples, so you can do the bit manipulations to extract or encode the values on the data stream as it leaves or comes in.
For all of the reasons listed above, a lot of code that works with hardware devices like this work with raw byte arrays and just encode the data within the arrays. Despite losing a lot of the conveniences of modern language constructs, it ends up just being easier.
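For instance, a hedged sketch of that byte-array style (the 3-bit 'mode' field and its position are invented, not a real register layout): read the raw bytes, then extract or encode the sub-byte values with shifts and masks.
#include <stdint.h>
#include <string.h>
// Host byte order assumed; the buffer would come from the device.
uint32_t extract_mode(const uint8_t *raw)
{
    uint32_t word;
    memcpy(&word, raw, sizeof word);   // assemble 32 bits from the byte array
    return word & 0x7u;                // keep only the 3 interesting bits
}
void encode_mode(uint8_t *raw, uint32_t mode)
{
    uint32_t word;
    memcpy(&word, raw, sizeof word);
    word = (word & ~0x7u) | (mode & 0x7u);  // replace just those 3 bits
    memcpy(raw, &word, sizeof word);
}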
I am wondering about the value of sizeof(struct fails). Is it 1? In this case, if you perform the read by dereferencing a pointer to a struct fails, it looks correct to issue a D8 read on the VME bus.
You can try to add a field unsigned int unused:29; to your struct fails.
The size of a struct is not equal to the sum of the size of its fields, including bit fields. Compilers are allowed, by the C and C++ language specifications, to insert padding between fields in a struct. Padding is often inserted for alignment purposes.
The common method in embedded systems programming is to read the data as an unsigned integer then use bit masking to retrieve the interesting bits. This is due to the above rule that I stated and the fact that there is no standard compiler parameter for "packing" fields in a structure.
I suggest creating an object (class or struct) for interfacing with the hardware. Let the object read the data, then extract the bits as bool members. This puts the implementation as close to the hardware as possible. The rest of the software should not care how the bits are implemented.
When defining bit field positions / named constants, I suggest this format:
#define VALUE (1 << BIT_POSITION)
// OR
const unsigned int VALUE = 1 << BIT_POSITION;
This format is more readable and has the compiler perform the arithmetic. The calculation takes place during compilation and has no impact during run-time.
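A small illustration of that approach (the bit positions, names and register are placeholders, not a real device map): read the whole 32-bit register once, then hand the bits out as bools.
#include <stdint.h>
#define READY_BIT_POSITION 0
#define ERROR_BIT_POSITION 3
const unsigned int STATUS_READY = 1u << READY_BIT_POSITION;
const unsigned int STATUS_ERROR = 1u << ERROR_BIT_POSITION;
struct StatusBits
{
    bool ready;
    bool error;
};
// Single 32-bit access, then the rest of the software never sees the raw layout.
StatusBits ReadStatus(volatile uint32_t *statusReg)
{
    uint32_t raw = *statusReg;
    StatusBits s;
    s.ready = (raw & STATUS_READY) != 0;
    s.error = (raw & STATUS_ERROR) != 0;
    return s;
}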
As another example, the Linux kernel has inline functions that explicitly handle memory-mapped IO reads and writes. In newer kernels it's a big macro wrapper that boils down to an inline assembly movl instruction, but in older kernels it was defined like this:
#define readl(addr) (*(volatile unsigned int *) (addr))
#define writel(b,addr) ((*(volatile unsigned int *) (addr)) = (b))
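Hypothetical usage of those macros might look like this (the base address and offset are placeholders; the macros are repeated so the snippet stands alone):
#include <stdint.h>
#define readl(addr) (*(volatile unsigned int *) (addr))
#define writel(b,addr) ((*(volatile unsigned int *) (addr)) = (b))
// 'base' would be the byte address of the mapped register window and
// 0x10 some register offset; both are made-up placeholders.
void set_enable_bit(uintptr_t base)
{
    unsigned int v = readl(base + 0x10);  // explicit 32-bit read
    writel(v | 0x1u, base + 0x10);        // explicit 32-bit write-back
}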
Ian, if you want to be sure of the size of the things you're reading/writing, I'd suggest not using structs like this to do it. It's possible that the sizeof of the fails struct is just 1 byte; the compiler is free to decide what it should be based on optimizations etc. I'd suggest reading/writing explicitly using ints, or generally types whose sizes you can be sure of, and then converting to a union/struct that doesn't have those limitations.
It is the compiler that decides what size read to issue. To force a 32 bit read, you could use a union:
union dev_word {
struct dev_reg {
unsigned int a:1;
unsigned int b:1;
unsigned int c:1;
} fail;
uint32_t dummy;
};
volatile union dev_word *vme_map_window();
If reading the union through a volatile-qualified pointer isn't enough to force a read of the whole union (I would think it would be - but that could be compiler-dependent), then you could use a function to provide the required indirection:
volatile union dev_word *real_reg; /* Initialised with vme_map_window() */
union dev_word * const *reg_func(void)
{
static union dev_word local_copy;
static union dev_word * const static_ptr = &local_copy;
local_copy = *real_reg;
return &static_ptr;
}
#define reg (*reg_func())
...then (for compatibility with the existing code) your accesses are done as:
reg->fail.a
The method described in another answer, using the gcc flag -fstrict-volatile-bitfields and defining bitfield variables as volatile u32, works, but the total number of bits defined must be greater than 16.
For example:
typedef union{
vu32 Word;
struct{
vu32 LATENCY :3;
vu32 HLFCYA :1;
vu32 PRFTBE :1;
vu32 PRFTBS :1;
};
}tFlashACR;
.
tFLASH* const pFLASH = (tFLASH*)FLASH_BASE;
#define FLASH_LATENCY pFLASH->ACR.LATENCY
.
FLASH_LATENCY = Latency;
causes gcc to generate code
.
ldrb r1, [r3, #0]
.
which is a byte read. However, changing the typedef to
typedef union{
vu32 Word;
struct{
vu32 LATENCY :3;
vu32 HLFCYA :1;
vu32 PRFTBE :1;
vu32 PRFTBS :1;
vu32 :2;
vu32 DUMMY1 :8;
vu32 DUMMY2 :8;
};
}tFlashACR;
changes the resultant code to
.
ldr r3, [r2, #0]
.
I believe the only solution is to
1) edit/create my main struct as all 32bit ints (unsigned longs)
2) keep my original bit-field structs
3) each access I require,
3.1) I have to read the struct member as a 32bit word, and cast it into the bit-field struct,
3.2) read the bit-field element I require. (and for writes, set this bit-field, and write the word back!)
(1) is a shame, because then I lose the intrinsic types that the members of the "main/SEPUMap" struct have.
End solution :-
Instead of :-
printf("FirmwareVersionMinor: 0x%x\n", pEpuMap->PECVersion);
This :-
SPECVersion ver = *(SPECVersion*)&pEpuMap->PECVersion;
printf("FirmwareVersionMinor: 0x%x\n", ver.minorversion);
The only problem I have is writing! (Writes are now read/modify/writes!)
// Read - Get current
_HVPSUControl temp = *(_HVPSUControl*)&pEpuMap->HVPSUControl;
// Modify - set to new value
temp.OperationalRequestPort = true;
// Write
volatile unsigned int *addr = reinterpret_cast<volatile unsigned int*>(&pEpuMap->HVPSUControl);
*addr = *reinterpret_cast<volatile unsigned int*>(&temp);
Just have to tidy that code up into a method!
#define writel(addr, data) ( *(volatile unsigned long*)(&addr) = (*(volatile unsigned long*)(&data)) )
I had the same problem on ARM using the GCC compiler, where writes to memory were done only as bytes rather than as a 32-bit word.
The solution is to define the bit-fields using volatile uint32_t (or whatever size you need to write):
union {
volatile uint32_t XY;
struct {
volatile uint32_t XY_A : 4;
volatile uint32_t XY_B : 12;
};
};
but when compiling you need to pass this parameter to gcc or g++:
-fstrict-volatile-bitfields
More in the gcc documentation.