How can a 32-bit APIC ID fit inside 4 bits out of an 8-bit IOAPIC destination field? - osdev

Reading the OSDev Wiki article and the Intel documentation it links to about the two available APIC types left me with more questions than answers, specifically when it comes to the lengths of the fields. According to both sources, the IOAPIC destination field is supposed to contain a local APIC ID in bits 56-59, which is 4 bits out of an 8-bit field:
Destination field. If the destination mode bit was clear, then the lower 4 bits contain the APIC ID to send the interrupt to. If the bit was set, the upper 4 bits also contain a set of processors. (See below)
Yet according to the same sources, the ID along with everything else in the LAPIC registers is 32 bits long:
The local APIC registers are memory mapped to an address that can be found in the MP/MADT tables. Make sure you map these to virtual memory if you are using paging. Each register is 32 bits long, and expects to be written and read as a 32 bit integer. Although each register is 4 bytes, they are all aligned on a 16 byte boundary.
This raises the question: how can the ID of the APIC that the interrupt is to be sent to possibly fit there?
What's also interesting is that those upper 4 bits are supposed to contain a set of CPU cores to send the interrupt to when the IOAPIC is configured in logical mode, which seems odd given that most modern CPUs have more than 4 cores (my own i5-8400 notwithstanding), but that's a completely different topic.

Solved my own problem by looking here. It turns out that only bits 24-31 of the ID register hold the actual ID; all the other bits are just padding.
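For anyone else who lands here, here is a minimal freestanding C sketch of reading that register. It is not from the original post: LAPIC_BASE is the conventional default physical base and is assumed to be identity-mapped here; in a real kernel you would take the address from the MP/MADT tables and map it yourself.

    #include <stdint.h>

    /* Hypothetical sketch: reading the local APIC ID in xAPIC mode.
       Assumes the LAPIC MMIO page is reachable at its default physical base. */
    #define LAPIC_BASE   0xFEE00000u   /* default LAPIC physical base (check the MADT) */
    #define LAPIC_ID_REG 0x20u         /* Local APIC ID register offset */

    static inline uint8_t lapic_id(void)
    {
        volatile uint32_t *reg =
            (volatile uint32_t *)(uintptr_t)(LAPIC_BASE + LAPIC_ID_REG);
        /* Only bits 24-31 of the 32-bit register hold the ID; the rest is reserved. */
        return (uint8_t)(*reg >> 24);
    }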

Related

Encode additional information in pointer

My problem:
I need to encode additional information about an object in a pointer to the object.
What I thought I could do is use part of the pointer to do so. That is, use a few bits to encode bool flags. As far as I know, the same thing is done with certain types of handles in the Windows kernel.
Background:
I'm writing a small memory management system that can garbage-collect unused objects. To reduce the memory consumption of object references and speed up copying, I want to use pointers with additional encoded data, e.g. the state of the object (alive or ready to be collected), a lock bit, and similar things that can be represented by a single bit.
My question:
How can I encode such information into a 64-bit pointer without actually overwriting the important bits of the pointer?
Since x64 Windows has a limited address space, I believe not all 64 bits of the pointer are used, so it should be possible. However, I wasn't able to find out which bits Windows actually uses for the pointer and which it doesn't. To clarify, this question is about user mode on 64-bit Windows.
Thanks in advance.
This is heavily dependent on the architecture, OS, and compiler used, but if you know all three, you can take advantage of it.
x86_64 defines a 48-bit[1] byte-oriented virtual address space in the hardware, which means essentially all OSes and compilers will use that. What that means is:
the top 17 bits of all valid addresses must be all the same (all 0s or all 1s)
the bottom k bits of any 2^k-byte aligned address must be all 0s
in addition, pretty much all OSes (Windows, Linux, and OSX at least) reserve the addresses with the upper bits set as kernel addresses -- all user addresses must have the upper 17 bits all 0s
So this gives you a variety of ways of packing a valid pointer into less than 64 bits, and then later reconstructing the original pointer with shift and/or mask instructions.
If you only need 3 bits and always use 8-byte aligned pointers, you can use the bottom 3 bits to encode extra info, and mask them off before using the pointer.
If you need more bits, you can shift the pointer up (left) by 16 bits, and use those lower 16 bits for information. To reconstruct the pointer, just right shift by 16.
To do shifting and masking operations on pointers, you need to cast them to intptr_t or int64_t (both of which are 64 bits wide on any 64-bit implementation of C or C++)
[1] There are hints that there may soon be hardware that extends this to 56 bits, so only the top 9 bits would need to be 0s or 1s, but it will be a while before any OS supports this
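As a concrete illustration of the low-bits variant, here is a small C sketch. It assumes a 64-bit platform where malloc returns at least 8-byte-aligned memory (true on x64 Windows and Linux); the helper names tag_pointer/untag_pointer are made up for this example, and uintptr_t is used because unsigned masking is a little cleaner than the signed types mentioned above.

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical helpers: stash 3 flag bits in the low bits of an
       8-byte-aligned pointer and strip them off before dereferencing. */
    #define TAG_MASK ((uintptr_t)0x7)

    static void *tag_pointer(void *p, unsigned tag)
    {
        uintptr_t bits = (uintptr_t)p;
        assert((bits & TAG_MASK) == 0);   /* pointer must be 8-byte aligned */
        assert(tag <= TAG_MASK);          /* only 3 bits of tag fit */
        return (void *)(bits | tag);
    }

    static void *untag_pointer(void *p)
    {
        return (void *)((uintptr_t)p & ~TAG_MASK);
    }

    static unsigned pointer_tag(void *p)
    {
        return (unsigned)((uintptr_t)p & TAG_MASK);
    }

    int main(void)
    {
        int *obj = malloc(sizeof *obj);
        *obj = 42;

        /* e.g. bit 0 = "alive", bit 2 = "locked"; 0x5 sets both made-up flags */
        void *tagged = tag_pointer(obj, 0x5);
        printf("tag = %u, value = %d\n",
               pointer_tag(tagged), *(int *)untag_pointer(tagged));

        free(untag_pointer(tagged));
        return 0;
    }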

How can a program on a 16-bit system access integers greater than 65535 but not addresses?

A 16-bit system can only access RAM up to 64 kbytes (normally). The idea with memory addresses is that a 16-bit system can form 2^16 numbers, so as unsigned integers it can only represent 2^16 = 65536 values (0 to 65535). Thus a 16-bit system can only use addresses up to 64 kbytes (after a small calculation). Now the main question is: when we define an integer as 'long int', how can it hold integers greater than 65535?
There are a bunch of misconceptions in this post:
I came to know in previous days that a 16-bit system can only access RAM up to 64 kbytes
This is factually wrong; the 8086 has an external address bus of 20 bits, so it can access 1,048,576 bytes (~1 MB). You can read more about the 8086 architecture here: https://en.wikipedia.org/wiki/Intel_8086.
When we define an integer to be 'long int', then how can it access integers greater than 65535?
Are you asking about register size? In that case the answer is easy: it doesn't. It can access the first 16 bits, and then it can access the other 16 bits, and whatever the application does with those two 16-bit values is up to it (and the framework used, like the C runtime).
As for how you can access the full 20-bit address space with just 16-bit integers, the answer is memory segmentation. You have a second register (CS, DS, SS, or ES on the 8086) that stores the high part of the address, and the CPU "stitches" the two together to form the address sent to the memory controller.
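As a rough illustration of that stitching, here is a few lines of C computing the 8086's physical address from a segment:offset pair (physical = segment * 16 + offset); the specific values are arbitrary.

    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative only: how the 8086 forms a 20-bit physical address
       from a 16-bit segment and a 16-bit offset. */
    int main(void)
    {
        uint16_t segment = 0x1234;   /* arbitrary example values */
        uint16_t offset  = 0x5678;
        uint32_t physical = ((uint32_t)segment << 4) + offset;

        printf("%04X:%04X -> physical 0x%05X\n",
               (unsigned)segment, (unsigned)offset, (unsigned)physical);  /* 0x179B8 */
        return 0;
    }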
Computers can perform arithmetic on values larger than a machine word in much the same way as humans can perform arithmetic on values larger than a digit: by splitting operations into multiple parts, and keeping track of "carries" that would move data between them.
On the 8086, for example, if AX holds the bottom half of a 32-bit number and DX holds the top half, the sequence:
ADD AX,[someValue]
ADC DX,[someValue+2]
will add to DX::AX the 32-bit value whose lower half is at address [someValue] and whose upper half is at [someValue+2]. The ADD instruction will update a "carry" flag indicating whether there was a carry out from the addition, and the ADC instruction will add an extra 1 if the carry flag was set.
Some processors don't have a carry flag, but have an instruction that will compare two registers, and set a third register to 1 if the first was greater than the second, and 0 otherwise. On those processors, if one wants to add R1::R0 to R3::R2 and place the result in R5::R4, one can use the sequence:
Add R0 to R2 and store the result in R4
Set R5 to 1 if R4 is less than R0 (will happen if there was a carry), and 0 otherwise
Add R1 to R5, storing the result in R5
Add R3 to R5, storing the result in R5
Four times as slow as a normal single-word addition, but still at least somewhat practical. Note that while the carry-flag approach is easily extensible to operate on numbers of any size, extending this approach beyond two words is much harder.
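Here is the same idea written out in C rather than assembly, purely as a sketch: two 32-bit numbers added using nothing wider than 16-bit arithmetic, with the carry propagated by hand the way ADD/ADC do in hardware.

    #include <stdint.h>
    #include <stdio.h>

    /* Sketch of multi-word addition: add two 32-bit values using only
       16-bit pieces, detecting and propagating the carry manually. */
    int main(void)
    {
        uint32_t a = 0x0001FFFF, b = 0x00000001;

        uint16_t a_lo = a & 0xFFFF, a_hi = a >> 16;
        uint16_t b_lo = b & 0xFFFF, b_hi = b >> 16;

        uint16_t sum_lo = (uint16_t)(a_lo + b_lo);
        uint16_t carry  = sum_lo < a_lo;            /* carry out of the low word */
        uint16_t sum_hi = (uint16_t)(a_hi + b_hi + carry);

        uint32_t sum = ((uint32_t)sum_hi << 16) | sum_lo;
        printf("0x%08X + 0x%08X = 0x%08X\n",
               (unsigned)a, (unsigned)b, (unsigned)sum);   /* 0x00020000 */
        return 0;
    }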

AMD HCC Swizzle Intrinsic

I've just recently discovered AMD's equivalent to CUDA's __byte_perm intrinsic: amdgcn_ds_swizzle (or at least I think it's the equivalent of a byte-permutation function). My problem is this: CUDA's __byte_perm takes two unsigned 32-bit integers and permutes their bytes based on the value of a selector argument (supplied as a hex value). However, AMD's swizzle function only takes a single unsigned 32-bit integer and one int named "pattern". How do I utilize AMD's swizzle intrinsic function?
ds_swizzle and __byte_perm do slightly different things: the former permutes a whole register across lanes, and the latter picks any four bytes from two 32-bit registers.
AMD's ds_swizzle_b32 GCN instruction actually swaps values with other lanes. You specify the 32-bit register in the lane you want to read and the 32-bit register you want to place it in. There is also a hard-coded value that specifies how the lanes are to be swapped. A great explanation of ds_swizzle_b32 is here, as user3528438 pointed out.
__byte_perm does not swap data with other lanes. It just gathers any 4 bytes from two 32-bit registers within its own lane and stores them to a register. There is no cross-lane traffic.
I'm guessing the next question would be how to do a "byte permute" on AMD GCN hardware. The instruction for that is v_perm_b32. (see page 12-152 here) It basically selects any four bytes from two specified 32-bit registers.
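For reference, here is a plain-C emulation of what __byte_perm computes in its default mode. This is only an illustration of the byte-selection behaviour, not an AMD intrinsic, and the MSB-replication mode selected by bit 3 of each selector nibble is ignored.

    #include <stdint.h>
    #include <stdio.h>

    /* Emulation of CUDA's __byte_perm(a, b, s) in its default mode:
       source bytes 0-3 come from a (byte 0 = LSB), bytes 4-7 from b;
       each selector nibble in s picks one of those 8 bytes for the
       corresponding result byte. */
    static uint32_t byte_perm_emulated(uint32_t a, uint32_t b, uint32_t s)
    {
        uint8_t bytes[8];
        for (int i = 0; i < 4; i++) {
            bytes[i]     = (uint8_t)(a >> (8 * i));
            bytes[i + 4] = (uint8_t)(b >> (8 * i));
        }

        uint32_t r = 0;
        for (int i = 0; i < 4; i++) {
            unsigned sel = (s >> (4 * i)) & 0x7;     /* low 3 bits select a source byte */
            r |= (uint32_t)bytes[sel] << (8 * i);
        }
        return r;
    }

    int main(void)
    {
        /* Selector 0x5410 picks source bytes 0, 1, 4, 5: the low halves of a and b. */
        printf("0x%08X\n",
               (unsigned)byte_perm_emulated(0x33221100u, 0x77665544u, 0x5410u));
        return 0;
    }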

Why does Valgrind's "valid value" need 8 bits?

Visit http://valgrind.org/docs/manual/mc-manual.html#mc-manual.machine
Valgrind uses V bits to verify the validity of data. In my opinion, only 1 bit is needed to verify validity, so why does Valgrind need 8 bits?
It seems like it's explained right here:
It is simplest to think of Memcheck implementing a synthetic CPU which is identical to a real CPU, except for one crucial detail. Every bit (literally) of data processed, stored and handled by the real CPU has, in the synthetic CPU, an associated "valid-value" bit, which says whether or not the accompanying bit has a legitimate value. In the discussions which follow, this bit is referred to as the V (valid-value) bit.
So every single bit in the Valgrind testing environment has a corresponding valid/invalid bit to track its validity. This can be especially important for bit-fields, where a single bit can represent something like a Boolean value.
At this level, Valgrind is going for absolute precision, tracking validity down to the individual bit, so that it has a complete picture of memory on which to observe and perform its analysis.
Allocations are in units of bytes. That is, the integer you pass to malloc is the number of bytes you want to allocate. So memcheck only needs one bit per byte to track whether a memory address has been allocated.
But initialization can work on individual bits, not just on whole bytes. If all bits of byte X are uninitialized, and then I execute X = X | (1 << 3), then just one bit of X is now initialized. So memcheck tracks whether each individual bit has been initialized. Since there are 8 bits in a byte (on all CPUs that memcheck supports), that means memcheck needs another 8 bits per byte to track which bits have been initialized.
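Here is a tiny example of that partial-initialization case, assuming gcc/clang and a system with Valgrind installed. Run it under valgrind --tool=memcheck: the first branch only tests the defined bit and should pass silently, while the second should be reported as a conditional jump depending on uninitialised values.

    #include <stdio.h>
    #include <stdlib.h>

    /* Only one bit of *x is defined after the OR, so Memcheck has to track
       validity per bit rather than per byte to tell these branches apart. */
    int main(void)
    {
        unsigned char *x = malloc(1);   /* allocated (A bit set) but undefined */
        *x |= 1 << 3;                   /* bit 3 now defined; bits 0-2 and 4-7 are not */

        if (*x & (1 << 3))              /* uses only the defined bit: no report expected */
            puts("bit 3 set");

        if (*x & 1)                     /* uses an undefined bit: Memcheck should warn */
            puts("bit 0 set");

        free(x);
        return 0;
    }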

Which bit is first and when you bit shift, does it actually shift in that direction?

So... wrestling with bits and bytes, it occurred to me that if I say "first bit of the nth byte", it might not mean what I think it means. So far I have assumed that if I have some data like this:
00000000 00000001 00001000
then the
First byte is the leftmost of the groups and has the value of 0
First bit is the leftmost of all 0's and has the value of 0
Last byte is the rightmost of the groups and has the value of 8
Last bit of the second byte is the rightmost of the middle group and has the value of 1
Then I learned that the byte order in a typed collection of bytes is determined by the endianness of the system. In my case it should be little-endian (Windows, Intel, right?), which would mean that something like 01 10 as a 16-bit unsigned integer should be 2551, while in most programs dealing with memory it would be represented as 265.. no idea what's going on there.
I also learned that the bits in a byte could be ordered either way, and there seems to be no clear answer as to which bit is the actual first one, since they could also be subject to bit-endianness and people's definitions of what is first differ. For me it's left to right; for somebody else it might be whichever bit first appears when you add 1 to 0, i.e. right to left.
Why does any of this matter? Well, curiosity mostly, but I was also trying to write a class that would be able to extract X number of bits starting from bit-address Y. I envisioned it sorta like a .NET string, where I can go and type ".SubArray(12(position), 5(length))", and then, in the case of data like at the top of this post, it would retrieve "0001 0", or 2.
So could somebody clarify what is first and last in terms of bits and bytes in my environment? Does it go right to left, left to right, or both? And why does this question exist in the first place; why couldn't the coding ancestors have agreed on something and stuck with it?
A shift is an arithmetic operation, not a memory-based operation: it is intended to work on the value, rather than on its representation. Shifting left by one is equivalent to a multiplication by two, and shifting right by one is equivalent to a division by two. These rules hold first, and if they conflict with the arrangement of the bits of a multibyte type in memory, then so much for the arrangement in memory. (Since shifts are the only way to examine bits within one byte, this is also why there is no meaningful notion of bit order within one byte.)
As long as you keep your operations within a single data type (rather than byte-shifting long integers and then examining them as character sequences), the results will stay predictable. Examining the same chunk of memory through different integer types is, in this case, a bit like performing integer operations and then reading the bits as a float; there will be some relationship between the two, but it's not the place of the integer arithmetic definitions to say exactly what. It's out of their scope.
You have some understanding, but a couple misconceptions.
First off, arithmetic operations such as shifting are not concerned with the representation of the bits in memory; they deal with the value. Where memory representation comes into play is usually in distributed environments with cross-platform communication in the mix, where data on one system is represented differently on another.
Your first comment...
I also learned that bits in a byte could be ordered as whatever and there seems to be no clear answer as to which bit is the actual first one, since they could also be subject to bit-endianness and people's definitions of what is first differ
This isn't entirely true. Though the bits are only given meaning by the reader and the writer of the data, generally the bits within an 8-bit byte are always read from left (MSB) to right (LSB). The byte order is what is determined by the endianness of the system architecture. It has to do with the representation of the data in memory, not with arithmetic operations.
Second...
And why does this question exist in the first place, why couldn't the coding ancestors have agreed on something and stuck with it?
From Wikipedia:
The initial endianness design choice was (is) mostly arbitrary, but later technology revisions and updates perpetuate the same endianness (and many other design attributes) to maintain backward compatibility. As examples, the Intel x86 processor represents a common little-endian architecture, and IBM z/Architecture mainframes are all big-endian processors. The designers of these two processor architectures fixed their endiannesses in the 1960s and 1970s with their initial product introductions to the market. Big-endian is the most common convention in data networking (including IPv6), hence its pseudo-synonym network byte order, and little-endian is popular (though not universal) among microprocessors in part due to Intel's significant historical influence on microprocessor designs. Mixed forms also exist, for instance the ordering of bytes within a 16-bit word may differ from the ordering of 16-bit words within a 32-bit word. Such cases are sometimes referred to as mixed-endian or middle-endian. There are also some bi-endian processors which can operate either in little-endian or big-endian mode.
Finally...
Why does any of this matter? Well, curiosity mostly, but I was also trying to write a class that would be able to extract X number of bits starting from bit-address Y. I envisioned it sorta like a .NET string, where I can go and type ".SubArray(12(position), 5(length))", and then, in the case of data like at the top of this post, it would retrieve "0001 0", or 2.
Many programming languages and libraries offer functions that allow you to convert to/from network order (big-endian) and host order (system dependent), so that you can ensure the data you're dealing with is in the proper format if you need to care about it. Since you're asking specifically about bit shifting, it doesn't matter in this case.
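To make the value-versus-representation distinction concrete, here is a short C sketch (illustrative only): the shift always produces the same numeric result, while the byte dump depends on the machine's endianness.

    #include <stdint.h>
    #include <stdio.h>

    /* The same 16-bit value viewed as an arithmetic quantity vs. as bytes in
       memory. Shifts act on the value; only the memory dump is endian-dependent. */
    int main(void)
    {
        uint16_t value   = 0x0110;
        uint16_t shifted = value << 1;            /* arithmetic: always 0x0220 */

        const unsigned char *bytes = (const unsigned char *)&value;
        printf("value  = 0x%04X, value << 1 = 0x%04X\n",
               (unsigned)value, (unsigned)shifted);
        printf("memory = %02X %02X  (%s-endian)\n",
               bytes[0], bytes[1],
               bytes[0] == 0x10 ? "little" : "big");
        return 0;
    }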
Read this post for more info