DFU bootloader immediately jumping to application in STM32F405RG - c++

I am trying to jump straight into the DFU bootloader via software on the STM32F405RG, but instead the part performs a software reset and the RCC_FLAG_SFTRST flag is set. The program then repeatedly re-executes the jump-to-bootloader code and only stays in bootloader mode after several seconds. I do enter bootloader mode on the first try if interrupts are disabled; however, I cannot upload code while in DFU mode that way, since DFU requires interrupts to be enabled. I am unsure what is causing the bootloader to jump back to the application and was hoping to get assistance on this. Below is the code for jumping to the bootloader, which is called immediately after Init_HAL() in main.
void JumpToBootloader(void) {
    void (*SysMemBootJump)(void);
    volatile uint32_t addr = 0x1FFF0000;

    HAL_RCC_DeInit();
    HAL_DeInit();

    /**
     * Step: Disable the SysTick timer and reset it to its default values
     */
    SysTick->CTRL = 0;
    SysTick->LOAD = 0;
    SysTick->VAL = 0;

    /**
     * Step: Interrupts are not disabled since they are needed for DFU mode
     */
    // __disable_irq();

    /**
     * Step: Remap system memory to address 0x0000 0000 in the address space.
     * The registers differ between families; check the reference manual for each family.
     *
     * For STM32F4xx, the MEMRMP register in SYSCFG is used (bits [1:0])
     * For STM32F0xx, the CFGR1 register in SYSCFG is used (bits [1:0])
     * For others, check the family reference manual
     */
    __HAL_SYSCFG_REMAPMEMORY_SYSTEMFLASH(); // HAL macro that performs the remap for you

    /**
     * Step: Set the jump location for system memory.
     * The word at offset +4 holds the bootloader's reset handler, i.e. where execution starts.
     */
    SysMemBootJump = (void (*)(void)) (*((uint32_t *)(addr + 4)));

    /**
     * Step: Set the main stack pointer.
     * This must be done last, otherwise local variables in this function
     * no longer hold valid values because the stack pointer has moved.
     *
     * The word at the base address holds the bootloader's initial stack pointer in SRAM.
     */
    __set_MSP(*(uint32_t *)addr);

    /**
     * Step: Actually call the function to jump to the set location.
     * This starts execution from system memory.
     */
    SysMemBootJump();
}

Taking a look at Application Note AN2606, Rev. 55, chapter 28 "STM32F40xxx/41xxx devices bootloader", page 127, one can see that the very first step is called "System Reset".
The DFU is not designed to be executed from your application, but to be entered through the bootloader selection after a system reset. Before you even reach the main function of your application, a few steps have already been executed to prepare the device, and these steps may be different for the DFU. Calling the DFU from an unexpected system state can lead to undefined behaviour.
The best way to enter the DFU is to latch the BOOT0 pin high and then perform a system reset, either by software reset or by the IWDG.
Another potential problem can arise from the HSE, which has to be provided externally; however, since you stated that the DFU does eventually work, this seems unlikely.
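If an external BOOT0 latch is not an option, a common software-only variant of the same idea (a minimal sketch under assumptions, not taken from AN2606; BOOTLOADER_MAGIC, RequestBootloader and CheckBootloaderRequest are illustrative names) is to store a magic value in an RTC backup register, trigger NVIC_SystemReset(), and check that flag at the very start of main(), before any clock or peripheral initialization, so the jump to system memory happens from a near-reset state:
#define BOOTLOADER_MAGIC 0xDEADBEEFUL    // illustrative value, anything unlikely to appear by accident

// Called from the application when a firmware update is requested.
static void RequestBootloader(void)
{
    __HAL_RCC_PWR_CLK_ENABLE();          // backup-domain access needs the PWR clock
    HAL_PWR_EnableBkUpAccess();
    RTC->BKP0R = BOOTLOADER_MAGIC;       // backup register survives the software reset
    NVIC_SystemReset();                  // come back up in a clean, near-reset state
}

// Called as the very first thing in main(), before clocks and peripherals are touched.
static void CheckBootloaderRequest(void)
{
    __HAL_RCC_PWR_CLK_ENABLE();
    HAL_PWR_EnableBkUpAccess();
    if (RTC->BKP0R == BOOTLOADER_MAGIC)
    {
        RTC->BKP0R = 0;                                        // boot normally next time
        void (*SysMemBootJump)(void) =
            (void (*)(void)) (*(uint32_t *)(0x1FFF0000 + 4));  // bootloader reset handler
        __set_MSP(*(uint32_t *)0x1FFF0000);                    // bootloader's initial stack pointer
        SysMemBootJump();                                      // never returns
    }
}
Since nothing has been configured yet at that point, the bootloader starts from essentially the state it expects after a plain reset.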

Related

Problem with Writing to the Flash Memory of the STM32L4R5 microcontroller

I'm trying to write to the flash memory of STM32L4R5 in 'FLASH_TYPEPROGRAM_FAST' mode of the HAL_FLASH_Program().
The flash of the MCU is configured as Single Bank.
Writing to the flash only works when using 'FLASH_TYPEPROGRAM_DOUBLEWORD'. The flash reads as 0xFFFFFFFF when written in 'FLASH_TYPEPROGRAM_FAST' mode.
This is my test project:
// Page Erase Structure
static FLASH_EraseInitTypeDef EraseInitStruct;
// Page Erase Status
uint32_t eraseStatus;
// Data Buffer
uint64_t pDataBuf[32] =
{
0x1111111122222222, 0x3333333344444444,
0x5555555566666666, 0x7777777788888888,
0x12345678ABC12345, 0x23456789DEF01234,
0x34567890AAABBB12, 0x4567890FABCDDD34,
0x1111111122222222, 0x3333333344444444,
0x5555555566666666, 0x7777777788888888,
0x12345678ABC12345, 0x23456789DEF01234,
0x34567890AAABBB12, 0x4567890FABCDDD34,
0x1111111122222222, 0x3333333344444444,
0x5555555566666666, 0x7777777788888888,
0x12345678ABC12345, 0x23456789DEF01234,
0x34567890AAABBB12, 0x4567890FABCDDD34,
0x1111111122222222, 0x3333333344444444,
0x5555555566666666, 0x7777777788888888,
0x12345678ABC12345, 0x23456789DEF01234,
0x34567890AAABBB12, 0x4567890FABCDDD34
};
// Flash Page Start Address
uint32_t pageAddr = 0x081FE000;
// Fill Erase Init Structure
EraseInitStruct.TypeErase = FLASH_TYPEERASE_PAGES;
EraseInitStruct.Banks = FLASH_BANK_1;
EraseInitStruct.Page = 255;
EraseInitStruct.NbPages = 1;
// Unlocking the FLASH Control Register
HAL_FLASH_Unlock();
// Clear OPTVERR Bit Set on Virgin Samples
__HAL_FLASH_CLEAR_FLAG(FLASH_FLAG_OPTVERR);
// Erasing the Flash Page
HAL_FLASHEx_Erase(&EraseInitStruct, &eraseStatus);
#if 0
// Writing a Double Word to Flash. pDataBuf[0] is the 64-bit Word
HAL_FLASH_Program(FLASH_TYPEPROGRAM_DOUBLEWORD, pageAddr, pDataBuf[0]);
#else
// Writing 32 Double Words. pDataBuf is the Starting Address of the 64-bit Array
// (cast needed: in fast mode the HAL takes the buffer address as the 64-bit Data argument)
HAL_FLASH_Program(FLASH_TYPEPROGRAM_FAST_AND_LAST, pageAddr, (uint64_t)(uint32_t)pDataBuf);
#endif
// Locking the FLASH Control Register
HAL_FLASH_Lock();
Am I doing anything wrong?
Thank you,
Ivan
Document RM0932, the reference manual for the STM32L4+ series, covers reading and writing flash in its FLASH section, for both single-bank and dual-bank configurations and the different MCU models of this line. Most of the differences seem to concern reading from flash (64-bit for dual bank, 128-bit for single bank). The rules for writing are given on page 128.
Flash is very picky about data width, and it seems every STM32 has a different data width for its flash. Very recently I stumbled upon one which accepted only 16-bit reads and writes. This one likes double words. There is no universal function to read and write flash on every STM32, so it seems one of your calls doesn't respect this MCU's flash data-width rules. You can check whether any error flags are raised, as described in the reference manual, although it doesn't say anything about attempts to write a 32-bit piece of data. I would expect such a write to fail, but we can't draw any conclusions about error flags from the screenshot provided. If you're curious enough, you can look at what data width each of your modes/functions uses and see what happens. 64-bit writes have to work.
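If fast programming keeps failing, a practical fallback (a minimal sketch reusing the question's pageAddr and pDataBuf; the error handling is only indicated) is to program the same 256 bytes as 32 individual double words and check the error code after each write:
uint32_t addr = pageAddr;   // page erased above
for (int i = 0; i < 32; i++)
{
    if (HAL_FLASH_Program(FLASH_TYPEPROGRAM_DOUBLEWORD, addr, pDataBuf[i]) != HAL_OK)
    {
        uint32_t flashError = HAL_FLASH_GetError();   // bit mask, see the FLASH error codes in the HAL / reference manual
        // log or handle flashError here
        break;
    }
    addr += 8U;   // advance by one 64-bit word
}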

SPI Extra Clock Cycle over communication between STM32 Nucleo L432KC and MAX31865 ADC

The setup that I'm working with is a Nucleo L432KC connected to 8 different MAX31865 ADCs to get temperature readings from RTDs (resistive thermal devices). Each of the 8 chip selects is connected to its own pin, but the SDI/SDO of all chips are connected to the same bus, since I only read from one at a time (only 1 chip select is enabled at a time). For now, I am using a 100 ohm base resistor in the Kelvin connection, not an RTD, just to ensure an accurate resistance reading. The read from an RTD comes from calling the function rtd.read_all(). When I read from one and only one RTD, I get an accurate reading and an accurate SPI waveform (pasted below):
correct SPI reading for 1 ADC
(yellow is chip enable, green is clock, blue is miso, purple is mosi)
However, when I read from 2 or more sequentially, the SPI clock for some reason gains an additional unwanted cycle at the start of the read that throws off the transmitted values. It's been having the effect of shifting the clock to the right and bit-shifting my resistance values to the left by 1.
Logic analyzer reading of SPI; clock has additional cycle at start
What could be causing this extra clock cycle? I'm programming in C++ using mbed. I can access the SPI.h file but I can't see the implementation so I'm not sure what might be causing this extra clock cycle at the start. If I need to add the code too, let me know and I'll edit/comment.
rtd.read_all() function:
uint8_t MAX31865_RTD::read_all( )
{
    uint16_t combined_bytes;

    //SPI clock polarity/phase (CPOL & CPHA) is set to 11 in spi.format (bit 1 = polarity, bit 0 = phase, see SPI.h)
    //polarity of 1 means the clock idles high (default setting is 1; polarity of 0 means it idles low)
    //phase of 1 means data is sampled on the second (trailing) clock edge, as opposed to phase 0,
    //which samples data on the first (leading) edge
    //see https://deardevices.com/2020/05/31/spi-cpol-cpha/ to understand CPOL/CPHA
    //chip select is negative logic, idles at 1
    //When chip select is set to 0, the chip then waits for a value to be written over SPI
    //That value represents the first register that it reads from
    //registers to read from are addresses 00h to 07h (h = hex, so 0x01, 0x02, etc)
    //00 = configuration register, 01 = MSBs of resistance value, 02 = LSBs of resistance value
    //Registers available on datasheet at https://datasheets.maximintegrated.com/en/ds/MAX31865.pdf
    //The chip then automatically increments to read from the next register

    /* Start the read operation. */
    nss = 0; //tell the MAX31865 we want to start reading, waiting for starting address to be written
    /* Tell the MAX31865 that we want to read, starting at register 0. */
    spi.write( 0x00 ); //start reading values starting at register 00h

    /* Read the MAX31865 registers in the following order:
         Configuration (00)
         RTD (01 = MSBs, 02 = LSBs)
         High Fault Threshold (03 = MSBs, 04 = LSBs)
         Low Fault Threshold (05 = MSBs, 06 = LSBs)
         Fault Status (07) */
    this->measured_resistance = 0;
    this->measured_configuration = spi.write( 0x00 ); //read from register 00

    //automatic increment to register 01
    combined_bytes = spi.write( 0x00 ) << 8; //8 bit value from register 01, bit shifted 8 left
    //automatic increment to register 02, OR with previous bit shifted value to get complete 16 bit value
    combined_bytes |= spi.write( 0x00 );
    //bit 0 of LSB is a fault bit, DOES NOT REPRESENT RESISTANCE VALUE
    //bit shift 16-bit value 1 right to remove fault bit and get complete 15 bit raw resistance reading
    this->measured_resistance = combined_bytes >> 1;

    //high fault threshold
    combined_bytes = spi.write( 0x00 ) << 8;
    combined_bytes |= spi.write( 0x00 );
    this->measured_high_threshold = combined_bytes >> 1;

    //low fault threshold
    combined_bytes = spi.write( 0x00 ) << 8;
    combined_bytes |= spi.write( 0x00 );
    this->measured_low_threshold = combined_bytes >> 1;

    //fault status
    this->measured_status = spi.write( 0x00 );

    //set chip select to 1; chip stops incrementing registers when chip select is high; ends read cycle
    nss = 1;

    /* Reset the configuration if the measured resistance is
       zero or a fault occurred. */
    if( ( this->measured_resistance == 0 )
        || ( this->measured_status != 0 ) )
    {
        //reconfigure( );
        // the extra clock cycle causes measured_status to be non-zero, so the chip would reconfigure
        // even though it doesn't need to. reconfigure commented out for now.
    }

    return( status( ) );
}
Background:
I took a look at the entire implementation of the MAX31865_RTD class and the thing I find "troubling" is that a MAX31865_RTD instance creates its own SPI instance on construction. If you create multiple instances of this MAX31865_RTD class then there will be a separate SPI instance created and initialized for each of these.
If you have 8 of these chips and you create 8 separate MAX31865_RTD instances to provide one for each of your chips then this also creates 8 SPI instances that all point to the same physical SPI device of the microcontroller.
The problem:
When you call the read_all function on your MAX31865_RTD instance, it in turn calls the SPI write functions (as seen in the code you provided). But digging deeper in the call chain you will eventually find that the code of the SPI write method (and others as well) is written on the assumption that there can be multiple SPI instances using the same SPI hardware with different parameters (frequency, word length, etc.). In order to actually use the SPI hardware, an SPI class instance must first take ownership of it if it does not have it yet. To do this it "acquires" the hardware, which basically means that it reconfigures the SPI peripheral to the frequency, word length and mode that this particular SPI instance was set to. This happens regardless of the fact that every instance is set to the same parameters: the instances don't know about each other, they just see that they have lost ownership, so they re-acquire it and automatically restore their settings. This frequency (i.e. clock) reinitialization is the reason your clock shows a weird artefact/glitch. Each time you call read_all on a different MAX31865_RTD instance, that instance's SPI object has to do an acquire (because the instances steal ownership from each other on every read_all call), and that makes the clock misbehave.
Why it works if you only have one device:
When you have one and only one MAX31865_RTD instance, it has only one SPI instance, which is the sole "owner" of the SPI hardware. Nothing steals the ownership on each turn, so it does not have to re-acquire the hardware on every read_all call. In that case the SPI hardware is not reinitialized every time, so you don't get the weird clock pulse and everything works as intended.
My proposed solution #1:
I propose that you change the implementation of the read_all method.
If the version of the SPI class that you use has the select method, then add the
spi.select();
line just before pulling the chip select (nss) low. Basically add the line above this block:
/* Start the read operation. */
nss = 0;
If there is no select function, then just add a
spi.write(0x00);
line in the same place instead of the line with the select.
In essence both of the proposed lines just force the acquire (and the accompanying clock glitch) before the chip select line is asserted. So by the time the chip select is pulled low and the actual data is being written the SPI instance already has ownership and the write methods will not trigger an acquire (nor the clock glitch).
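Concretely, only the first lines of read_all need to change; a sketch of what that could look like (assuming your mbed SPI class has select(), otherwise use the dummy write):
uint8_t MAX31865_RTD::read_all( )
{
    uint16_t combined_bytes;

    // Force this SPI instance to (re)acquire the bus while chip select is still high,
    // so the reconfiguration glitch happens before the MAX31865 is listening.
    spi.select();          // or, if select() is not available:  spi.write( 0x00 );

    /* Start the read operation. */
    nss = 0;
    /* Tell the MAX31865 that we want to read, starting at register 0. */
    spi.write( 0x00 );

    // ... the rest of read_all stays exactly as before ...
}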
My proposed solution #2:
Another solution is to modify the MAX31865_RTD class to use an SPI instance reference and provide that reference through its constructor. That way you can create one SPI instance explicitly and provide this same SPI instance to all your MAX31865_RTD instances at construction. Now since all of your MAX31865_RTD instances are using a reference to the same and only SPI instance, the SPI hardware ownership never changes since there is only one SPI class instance that is using it. Thus the hardware is never reconfigured and the glitch never happens. I would prefer this solution since it is less of a workaround.
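A rough sketch of that change (the member names and board pin names are illustrative, not the library's actual code):
class MAX31865_RTD
{
public:
    // Take a reference to an SPI bus that is created once, outside this class.
    MAX31865_RTD( SPI& bus, PinName cs_pin ) : spi( bus ), nss( cs_pin, 1 ) { }
    uint8_t read_all( );   // unchanged, still uses spi and nss

private:
    SPI&       spi;   // shared bus: only one SPI object ever owns the hardware
    DigitalOut nss;   // per-chip select line, idles high
};

// Usage: one SPI object, eight MAX31865_RTD objects sharing it.
SPI rtd_bus( SPI_MOSI, SPI_MISO, SPI_SCK );   // pin names depend on your target
MAX31865_RTD rtd0( rtd_bus, PA_4 );
MAX31865_RTD rtd1( rtd_bus, PA_3 );
// ...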
My proposed solution #3:
You could also modify the MAX31865_RTD class to have a "setter" for the nss (= chip select) pin. That way, you could have only one MAX31865_RTD instance for all your 8 devices and only change the nss pin before addressing the next device. Since there is only one MAX31865_RTD instance then there is only one SPI instance which also solves the ownership issue and since no re-acquisition has to be made then no glitch will be triggered.
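And a minimal sketch of that single-instance variant (again illustrative, not the library's real interface):
class MAX31865_RTD
{
public:
    // Select which chip the next read_all() call talks to.
    void set_chip_select( DigitalOut* cs ) { nss = cs; }
    // read_all() then uses  (*nss) = 0;  ...  (*nss) = 1;  around the transfer.

private:
    SPI         spi{ SPI_MOSI, SPI_MISO, SPI_SCK };  // the one and only SPI instance
    DigitalOut* nss = nullptr;                       // currently selected chip
};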
And of course, once the cause is known, there are any number of other ways to fix this.
Hope this helps in solving your issue.

How can I flash the NXP S32K148 using J-Link?

I'm working on a S32K148 with VS Code and J-Link. This is part of NXP's S32Kxxx series of 32-bit ARM Cortex M4-based MCUs targeted for high-reliability automobile and industrial applications.
I want to flash the chip using JFlash (with J-Link), but it seems that flashing has been disabled.
My research suggests that I need to have a LinkScript file for the S32Kxxx device, but I cannot find such a file anywhere.
Is my assumption correct that a LinkScript file is needed? If so, where can I find this file?
First, you need to ensure that the MCU's reset pin is connected to the reset pin of the J-Link. Without this done properly, you will not be able to connect via J-Link.
Next, it must be noted that the design of the S32Kxxx series makes it impossible to attach to a debugging session via J-Link out of the box. The Segger wiki contains a more detailed description:
ECC protected internal RAM
The device series provide ECC protected internal RAM. By default, J-Link resets the MCU on connect and initializes the RAM contents to 0x00. This is done for the following reasons:
If a memory window in the debugger is open during the debug session and points to a non-initialized RAM region, on the next single step etc. a non-maskable ECC error interrupt would be thrown
J-Link temporarily uses some portions of the RAM during flash programming and accesses to non-initialized RAM areas would throw non maskable ECC error interrupts
Attach to debug session
For the reasons mentioned [above, in the section "ECC protected internal RAM"], no out of the box attach is possible. This is because there is no way to distinguish between an attach and a connect.
To attach to the device the following J-Link script file can be used:
File:NXP_Kinetis_S32_Attach.JLinkScript
Note:
This script file is only supposed to be used in case of an attach, as it skips the device specific connect.
So, your research is correct: in order to attach to the device, you must use a J-Link script, which can be downloaded here from Segger. The contents are reprinted below:
/*********************************************************************
*            (c) SEGGER Microcontroller GmbH & Co. KG                *
*                      The Embedded Experts                          *
*                        www.segger.com                              *
**********************************************************************
-------------------------- END-OF-HEADER -----------------------------
File      : NXP_Kinetis_S32_Attach.JLinkScript
Purpose   : Script file to skip the device specific connect for the
            NXP Kinetis S32 devices, to make an attach possible.
Literature:
  [1] J-Link User Guide
*/
/*********************************************************************
*
* Constants (similar to defines)
*
**********************************************************************
*/
/*********************************************************************
*
* Global variables
*
**********************************************************************
*/
/*********************************************************************
*
* Local functions
*
**********************************************************************
*/
/*********************************************************************
*
* Global functions
*
**********************************************************************
*/
/*********************************************************************
*
* InitTarget()
*
* Function description
* If present, called right before performing generic connect sequence.
* Usually used for targets which need a special connect sequence.
* E.g.: TI devices with ICEPick TAP on them where core TAP needs to be enabled via specific ICEPick sequences first
*
* Return value
* >= 0: O.K.
* < 0: Error
*
* Notes
* (1) Must not use high-level API functions like JLINK_MEM_ etc.
* (2) For target interface JTAG, this device has to setup the JTAG chain + JTAG TAP Ids.
*/
int InitTarget(void) {
  //
  // As this script file is only used for attach purposes, no special handling is required.
  // Therefore, we override the default connect handled by the J-Link DLL for this device,
  // as it would trigger a reset.
  //
  return 0;
}
/*************************** end of file ****************************/
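To actually use the script, it is passed to the J-Link software when the session is started. With J-Link Commander that looks roughly like the line below (the device name, interface and speed are assumptions for this board; adjust as needed):
JLink.exe -device S32K148 -if SWD -speed 4000 -autoconnect 1 -JLinkScriptFile NXP_Kinetis_S32_Attach.JLinkScript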

Using memcpy on mmap'ed region crashes, a for loop does not

I have an NVIDIA Tegra TK1 processor module on a carrier board with a PCI-e slot connecting to it. In that PCIe slot is an FPGA board which exposes some registers and a 64K memory area via PCIe.
On the ARM CPU of the Tegra board, a minimal Linux installation is running.
I am using /dev/mem and the mmap function to obtain user-space pointers to the register structs and the 64K memory area.
The distinct register files and the memory block are all assigned addresses which are aligned and do not overlap with regards to 4KB memory pages.
I explicitly map whole pages with mmap, using the result of getpagesize(), which also is 4096.
I can read/write from/to those exposed registers just fine.
I can read from the memory area (64KB), doing uint32 word-by-word reads in a for loop, just fine. I.e. read contents are correct.
But if I use std::memcpy on the same address range, the Tegra CPU always freezes. I do not see any error message; if GDB is attached, I also don't see anything in Eclipse when trying to step over the memcpy line, it just stops hard. And I have to reset the CPU using the hardware reset button, as the remote console is frozen.
This is a debug build with no optimization (-O0), using gcc-linaro-6.3.1-2017.05-i686-mingw32_arm-linux-gnueabihf. I was told the 64K region is accessible byte-wise; I did not try that explicitly.
Is there an actual (potential) problem that I need to worry about, or is there a specific reason why memcpy does not work and maybe should not be used in the first place in this scenario - and I can just carry on using my for loops and think nothing of it?
EDIT: Another effect has been observed: the code snippet as originally posted was missing a "vital" printf in the copying for loop, placed before the memory read. With that printf removed, I don't get back valid data. I have now updated the snippet to do an extra read from the same address instead of the printf, which also yields correct data. The confusion intensifies.
Here are the (I think) important excerpts of what's going on, with minor modifications so they make sense in this "de-fluffed" form.
// void* physicalAddr: PCIe "BAR0" address as reported by dmesg, added to the physical address offset of FPGA memory region
// long size: size of the physical region to be mapped

//--------------------------------
// doing the memory mapping
//
const uint32_t pageSize = getpagesize();
assert( IsPowerOfTwo( pageSize ) );

const uint32_t physAddrNum = (uint32_t) physicalAddr;
const uint32_t offsetInPage = physAddrNum & (pageSize - 1);
const uint32_t firstMappedPageIdx = physAddrNum / pageSize;
const uint32_t lastMappedPageIdx = (physAddrNum + size - 1) / pageSize;
const uint32_t mappedPagesCount = 1 + lastMappedPageIdx - firstMappedPageIdx;
const uint32_t mappedSize = mappedPagesCount * pageSize;
const off_t targetOffset = physAddrNum & ~(off_t)(pageSize - 1);

m_fileID = open( "/dev/mem", O_RDWR | O_SYNC );
// addr passed as null means: the kernel chooses where to map. Supplying a non-null addr would mean Linux takes it as a "hint" where to place the mapping.
void* mapAtPageStart = mmap( 0, mappedSize, PROT_READ | PROT_WRITE, MAP_SHARED, m_fileID, targetOffset );
if (MAP_FAILED != mapAtPageStart)
{
    m_userSpaceMappedAddr = (volatile void*) ( uint32_t(mapAtPageStart) + offsetInPage );
}

//--------------------------------
// Accessing the mapped memory
//
//void* m_rawData: <== m_userSpaceMappedAddr
//uint32_t* destination: points to a stack object
//int length: size in 32bit words of the stack object (a struct with only U32's in it)

// this crashes:
std::memcpy( destination, m_rawData, length * sizeof(uint32_t) );

// this does not, AND does yield correct memory contents - but only with a preceding extra read
for (int i = 0; i < length; ++i)
{
    // This extra read makes the data gotten in the 2nd read below valid.
    // Commented out, the data read into destination will not be valid.
    uint32_t tmp = ((const volatile uint32_t*)m_rawData)[i];
    (void)tmp; //pacify compiler
    destination[i] = ((const volatile uint32_t*)m_rawData)[i];
}
Based on the description, it looks like your FPGA logic is not responding correctly to load instructions that read from locations on your FPGA, and that is causing the CPU to lock up. It's not crashing; it is permanently stalled, hence the need for the hard reset. I had this problem too when debugging my PCIE logic on an FPGA.
Another indication that your logic is not responding correctly is that you need an extra read in order to get the right responses.
Your loop is doing 32-bit loads but memcpy is doing at least 64-bit loads, which changes how your logic responds. For example, it may need to respond with two TLPs, with 32 bits of the response in the first 128 bits of the completion and the next 32 bits in a second 128-bit TLP of the completion.
What I found super-useful was to add logic to log all the PCIE transactions into an SRAM and to be able to dump the SRAM out to see how the logic was behaving or misbehaving. We have a nifty utility, pcieflat, that prints one PCIE TLP per line. It even has documentation.
When the PCIE interface is not working well enough, I stream the log to a UART in hex which can be decoded by pcieflat.
This tool is also useful for debugging performance problems -- you can look at how well your DMA reads and writes are pipelined.
Alternatively, if you have integrated logic analyzer or similar on the FPGA, you can trace the activity that way. But it's nicer to have the TLPs parsed according to PCIE protocol.
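If the FPGA logic cannot be changed right away, one way to keep the access width fixed on the software side (a sketch based on the working for loop in the question, not a general replacement for memcpy, and it does not explain the extra-read effect) is a small copy helper that only ever issues 32-bit loads:
#include <cstdint>
#include <cstddef>

// Copy 'words' 32-bit values from a memory-mapped device region,
// guaranteeing one 32-bit load per word: the volatile source keeps the
// compiler from merging the reads into wider accesses.
static void copy_from_device_u32( uint32_t* dst, const volatile uint32_t* src, size_t words )
{
    for (size_t i = 0; i < words; ++i)
        dst[i] = src[i];
}

// Usage with the names from the question:
// copy_from_device_u32( destination, (const volatile uint32_t*)m_rawData, length );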

IRQ 8 isn't working... HW or SW?

First, I program for vintage computer groups. What I write is specifically for MS-DOS, not Windows, because that's what people are running. My current program is for later systems, not the 8086 line, so the plan was to use IRQ 8. This lets me set the interrupt rate to power-of-two values from 2/second to 8192/second (2, 4, 8, 16, etc.).
Only, for some reason, on the newer old systems (ok, that sounds weird), it doesn't seem to be working. In emulation, and on the 386 system I have access to, it works just fine, but on the P3 system I have (GA-6BXC motherboard with a P3 800 CPU) it just doesn't work.
The code
setting up the interrupt
disable();
oldrtc = getvect(0x70); //Reads the vector for IRQ 8
setvect(0x70,countdown); //Sets the vector for IRQ 8 to our handler
outportb(0x70,0x8a);
y = inportb(0x71) & 0xf0;
outportb(0x70,0x8a);
outportb(0x71,y | _MRATE_); //Adjustable value, set for 64 interrupts per second
outportb(0x70,0x8b);
y = inportb(0x71);
outportb(0x70,0x8b);
outportb(0x71,y | 0x40);
enable();
at the end of the interrupt
outportb(0x70,0x0c);
inportb(0x71); //Reading the C register resets the interrupt
outportb(0xa0,0x20); //Resets the PIC (turns interrupts back on)
outportb(0x20,0x20); //There are 2 PICs on AT machines and later
When closing program down
disable();
outportb(0x70,0x8b);
y = inportb(0x71);
outportb(0x70,0x8b);
outportb(0x71,y & 0xbf);
setvect(0x70,oldrtc);
enable();
I don't see anything in the code that can be causing the problem. But it just doesn't seem to make sense. While I don't completely trust the information, MSD "does" report IRQ 8 as the RTC Counter and says it is present and working just fine. Is it possible that later systems have moved the vector? Everything I find says that IRQ 8 is vector 0x70, but the interrupt never triggers on my Pentium III system. Is there some way to find if the Vectors have been changed?
It's been a LONG time since I've done any MS-DOS code and I don't think I ever worked with this particular interrupt. (I'm pretty sure you can just read a memory location to fetch the time too, and IRQ 0 can also be used to trigger you at an interval, so maybe that's better.) Anyway, given my rustiness, forgive me for kind of link-dumping.
http://wiki.osdev.org/Real_Time_Clock the bottom of that page has someone saying they've had problem on some machines too. RBIL suggests it might be a BIOS thing: http://www.ctyme.com/intr/rb-7797.htm
Without DOS, I'd just capture IRQ 0 itself, remap all of the IRQs to my own interrupt numbers, and change the timing as needed. I've done that somewhat recently! I think that's a bad idea on DOS, though; this looks more recommended for that: http://www.ctyme.com/intr/rb-2443.htm
Anyway though, I betcha it has to do with the BIOS thing:
"Notes: Many BIOSes turn off the periodic interrupt in the INT 70h handler unless in an event wait (see INT 15/AH=83h,INT 15/AH=86h).. May be masked by setting bit 0 on I/O port A1h "