Assembly for MSP430 doesn't work

I'm currently learning assembly with MSP430.
This simple code should trigger an interrupt for TimerA.
#include "msp430.h" ; #define controlled include file
RSEG CSTACK
RSEG CODE
Reset:
mov.w #WDTPW|WDTHOLD, &WDTCTL ; stop watchdog
mov.w #MC_0|TACLR, &TACTL; stop timer
mov.w #7D0h, &TACCR0; count to 2000
mov.w #CCIE ,&TACCTL0
mov.w #MC_1|TAIE|TASSEL_2, &TACTL; RESET timerA, start it and set clock to VCO
mov.b #BIT0, &P2DIR; output
mov.b #BIT0, &P2OUT;
Main:
mov.w #GIE, SR;
jmp Main;
INTER:
xor.b #BIT0, &P1OUT ; toggle LED
reti
RSEG RESET
DW Reset;
ORG 0xFFF2; address of interrupt
DW INTER
END
But I get the following error after trying to load the code onto my MSP:
Error[e104]: Failed to fit all segments into specified ranges. Problem
discovered in segment CODE. Unable to place 1 block(s) (0xfff4 byte(s) total)
in 0x7e0 byte(s) of memory.
The problem occurred while processing the segment placement command
"-P(CODE)CODE=F800-FFDF", where at the moment of placement the
available memory ranges were "CODE:f800-ffdf"
Error while running Linker
I am sure it is some noob error, but can you explain what it is?
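The two sizes in the linker message can be checked by hand. The following is a minimal C sketch of that arithmetic, assuming (this is an interpretation, not something the linker output states) that `ORG 0xFFF2` was applied as an offset inside a relocatable segment, so the block ends at offset 0xFFF2 plus the two bytes of `DW INTER`. The function names are hypothetical.

```c
#include <stdint.h>

/* Size in bytes of an inclusive address range, e.g. F800-FFDF
 * from the "-P(CODE)CODE=F800-FFDF" placement command. */
uint32_t range_size(uint32_t first, uint32_t last)
{
    return last - first + 1u;
}

/* Size of a block that places `nbytes` of data at `org_offset`.
 * Assumption: ORG is taken relative to the start of a relocatable
 * segment, so everything below the offset counts toward the block. */
uint32_t block_size_after_org(uint32_t org_offset, uint32_t nbytes)
{
    return org_offset + nbytes;
}
```

`range_size(0xF800, 0xFFDF)` gives 0x7E0 and `block_size_after_org(0xFFF2, 2)` gives 0xFFF4, which are the two sizes quoted in the error.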

Related

Can not run correctly on NXP MIMXRT1061 CVL5A

Describe the bug
I am currently trying to run Zephyr7.0 on a development board equipped with MCU: NXP MIMXRT1061 CVL5A.
I simply compiled a sample: \samples\basic\blinky, but it couldn't run correctly.
At first I thought it was a problem with the XIP format that caused Zephyr to not be booted correctly, but then I used SWD to debug it and found that it was booted correctly.
But when it calls the z_bss_zero() function from /zephyr/arch/arm/core/aarch32/prep_c.c, Zephyr goes wrong.
Compilation process
The board I use is mimxrt1060_evk, theoretically there is no problem, because 1061 is based on 1060
west build -p auto -b mimxrt1060_evk .\samples\basic\blinky\
west flash
Debugging process
The first debugging was broken at z_interrupt_stacks
z_arm_reset () at C:/Users/zhihao3x/work/zephyrproject/zephyr/arch/arm/core/aarch32/cortex_m\reset.S:105
105 msr BASEPRI, r0
(gdb) n
134 ldr r0, =z_interrupt_stacks
(gdb) n
135 ldr r1, =CONFIG_ISR_STACK_SIZE + MPU_GUARD_ALIGN_AND_SIZE
(gdb) n
136 adds r0, r0, r1
(gdb) n
137 msr PSP, r0
(gdb)
138 mrs r0, CONTROL
(gdb)
139 movs r1, #2
(gdb)
140 orrs r0, r1 /* CONTROL_SPSEL_Msk */
(gdb)
141 msr CONTROL, r0
(gdb)
147 isb
(gdb)
154 bl z_arm_prep_c
(gdb)
134 ldr r0, =z_interrupt_stacks
(gdb)
Program received signal SIGTRAP, Trace/breakpoint trap.
(gdb)
Later, I used single-step debugging and found that Zephyr reached z_bss_zero
(gdb)
z_arm_floating_point_init ()
at C:/Users/zhihao3x/work/zephyrproject/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h:1003
1003 __ASM volatile ("MSR control, %0" : : "r" (control) : "memory");
(gdb)
163 __set_CONTROL(__get_CONTROL() & (~(CONTROL_FPCA_Msk)));
(gdb)
0x6000305c in __set_CONTROL (control=3758157056)
at C:/Users/zhihao3x/work/zephyrproject/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h:1003
1003 __ASM volatile ("MSR control, %0" : : "r" (control) : "memory");
(gdb)
1004 __ISB();
(gdb)
__ISB () at C:/Users/zhihao3x/work/zephyrproject/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h:260
260 __ASM volatile ("isb 0xF":::"memory");
(gdb)
z_arm_prep_c () at C:/Users/zhihao3x/work/zephyrproject/zephyr/arch/arm/core/aarch32/prep_c.c:183
183 z_bss_zero();
(gdb) n
Program received signal SIGTRAP, Trace/breakpoint trap.
0x62652dc0 in ?? ()
Eventually I located the problem in the memset function: when this function is called, it falls into an infinite loop and then my program crashes.
z_arm_prep_c () at C:/Users/zhihao3x/work/zephyrproject/zephyr/arch/arm/core/aarch32/prep_c.c:183
183 z_bss_zero();
(gdb)
z_bss_zero () at C:/Users/zhihao3x/work/zephyrproject/zephyr/kernel/init.c:89
89 (void)memset(__bss_start, 0, __bss_end - __bss_start);
(gdb)
memset (buf=0x80000030 <z_idle_threads>, c=c@entry=0, n=428)
at C:/Users/zhihao3x/work/zephyrproject/zephyr/lib/libc/minimal/source/string/string.c:355
(gdb)
Program received signal SIGTRAP, Trace/breakpoint trap.
0x63f5fbf8 in ?? ()
Later I tried to debug the memset function and found that the dest address did not change during the loop
This is very strange, and I cannot tell whether it is a gdb problem or a Zephyr problem. The address is 0x80000030 before the increment and 0x80000031 after the increment, and it changes back to 0x80000030 on the next iteration.
357 unsigned char c_byte = (unsigned char)c;
(gdb)
389 while (n > 0) {
(gdb)
390 *(d_byte++) = c_byte;
392 n--;
(gdb) p d_byte
$1 = (unsigned char *) 0x80000030 <z_idle_threads> "\b"
I tried to change the z_bss_zero function
(void)memset(__bss_start, 0, __bss_end - __bss_start);
Changed the size length
(void)memset(__bss_start, 0, 3);
But when I was debugging with gdb, I found that the while loop inside memset ran more than three times without stopping, and it entered an endless loop.
I suspect the problem lies in the increment and decrement performed by the following two lines, because when I use the gdb print command to print the variables after each iteration, their values do not change
*(d_byte++) = c_byte;
n--;
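For comparison, here is a host-side C sketch of a byte-wise fill loop of the same shape. The name `demo_fill` is hypothetical and this is not Zephyr's memset; it only illustrates the expected behaviour when writes to the destination take effect and the loop variables are updated normally: the loop body runs exactly n times.

```c
#include <stddef.h>

/* Byte-wise fill in the style of the loop quoted above.
 * Returns the number of loop iterations so the count can be checked. */
size_t demo_fill(void *buf, int c, size_t n)
{
    unsigned char *d_byte = (unsigned char *)buf;
    unsigned char c_byte = (unsigned char)c;
    size_t iterations = 0;

    while (n > 0) {
        *(d_byte++) = c_byte;   /* store one byte, then advance the pointer */
        n--;                    /* one fewer byte left to write */
        iterations++;
    }
    return iterations;
}
```

Called with n = 3 on a five-byte buffer, it writes the first three bytes and leaves the rest untouched.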
Another odd thing is that while debugging I find there is a pointer statement that gdb will not stop on; it skips over that code while still executing it. I suspect the compiler is responsible. Is the code being optimized?
More details
When I use next to execute z_bss_zero, gdb will break to an indeterminate address at 0xdeadbeee. I guess it may be that Zephyr destroyed the stack when initializing the memory.
(gdb)
z_arm_prep_c () at C:/Users/zhihao3x/work/zephyrproject/zephyr/arch/arm/core/aarch32/prep_c.c:183
183 z_bss_zero();
(gdb) n
Program received signal SIGTRAP, Trace/breakpoint trap.
0xdeadbeee in ?? ()
At the same time, the Jlink debugger also output an error log
ERROR: Cannot read register 15 (R15) while CPU is running
Reading all registers
ERROR: Cannot read register 0 (R0) while CPU is running
ERROR: Cannot read register 1 (R1) while CPU is running
ERROR: Cannot read register 2 (R2) while CPU is running
ERROR: Cannot read register 3 (R3) while CPU is running
ERROR: Cannot read register 4 (R4) while CPU is running
ERROR: Cannot read register 5 (R5) while CPU is running
ERROR: Cannot read register 6 (R6) while CPU is running
ERROR: Cannot read register 7 (R7) while CPU is running
ERROR: Cannot read register 8 (R8) while CPU is running
ERROR: Cannot read register 9 (R9) while CPU is running
ERROR: Cannot read register 10 (R10) while CPU is running
ERROR: Cannot read register 11 (R11) while CPU is running
ERROR: Cannot read register 12 (R12) while CPU is running
ERROR: Cannot read register 13 (R13) while CPU is running
ERROR: Cannot read register 14 (R14) while CPU is running
ERROR: Cannot read register 15 (R15) while CPU is running
ERROR: Cannot read register 16 (XPSR) while CPU is running
ERROR: Cannot read register 17 (MSP) while CPU is running
ERROR: Cannot read register 18 (PSP) while CPU is running
ERROR: Cannot read register 24 (PRIMASK) while CPU is running
ERROR: Cannot read register 25 (BASEPRI) while CPU is running
ERROR: Cannot read register 26 (FAULTMASK) while CPU is running
ERROR: Cannot read register 27 (CONTROL) while CPU is running
ERROR: Cannot read register 32 (FPSCR) while CPU is running
ERROR: Cannot read register 33 (FPS0) while CPU is running
ERROR: Cannot read register 34 (FPS1) while CPU is running
ERROR: Cannot read register 35 (FPS2) while CPU is running
ERROR: Cannot read register 36 (FPS3) while CPU is running
ERROR: Cannot read register 37 (FPS4) while CPU is running
ERROR: Cannot read register 38 (FPS5) while CPU is running
ERROR: Cannot read register 39 (FPS6) while CPU is running
ERROR: Cannot read register 40 (FPS7) while CPU is running
ERROR: Cannot read register 41 (FPS8) while CPU is running
ERROR: Cannot read register 42 (FPS9) while CPU is running
ERROR: Cannot read register 43 (FPS10) while CPU is running
ERROR: Cannot read register 44 (FPS11) while CPU is running
ERROR: Cannot read register 45 (FPS12) while CPU is running
ERROR: Cannot read register 46 (FPS13) while CPU is running
ERROR: Cannot read register 47 (FPS14) while CPU is running
ERROR: Cannot read register 48 (FPS15) while CPU is running
ERROR: Cannot read register 49 (FPS16) while CPU is running
ERROR: Cannot read register 50 (FPS17) while CPU is running
ERROR: Cannot read register 51 (FPS18) while CPU is running
ERROR: Cannot read register 52 (FPS19) while CPU is running
ERROR: Cannot read register 53 (FPS20) while CPU is running
ERROR: Cannot read register 54 (FPS21) while CPU is running
ERROR: Cannot read register 55 (FPS22) while CPU is running
ERROR: Cannot read register 56 (FPS23) while CPU is running
ERROR: Cannot read register 57 (FPS24) while CPU is running
ERROR: Cannot read register 58 (FPS25) while CPU is running
ERROR: Cannot read register 59 (FPS26) while CPU is running
ERROR: Cannot read register 60 (FPS27) while CPU is running
ERROR: Cannot read register 61 (FPS28) while CPU is running
ERROR: Cannot read register 62 (FPS29) while CPU is running
ERROR: Cannot read register 63 (FPS30) while CPU is running
ERROR: Cannot read register 64 (FPS31) while CPU is running
ERROR: Cannot read register 33 (FPS0) while CPU is running
ERROR: Cannot read register 34 (FPS1) while CPU is running
ERROR: Cannot read register 35 (FPS2) while CPU is running
ERROR: Cannot read register 36 (FPS3) while CPU is running
ERROR: Cannot read register 37 (FPS4) while CPU is running
ERROR: Cannot read register 38 (FPS5) while CPU is running
ERROR: Cannot read register 39 (FPS6) while CPU is running
ERROR: Cannot read register 40 (FPS7) while CPU is running
ERROR: Cannot read register 41 (FPS8) while CPU is running
ERROR: Cannot read register 42 (FPS9) while CPU is running
ERROR: Cannot read register 43 (FPS10) while CPU is running
ERROR: Cannot read register 44 (FPS11) while CPU is running
ERROR: Cannot read register 45 (FPS12) while CPU is running
ERROR: Cannot read register 46 (FPS13) while CPU is running
ERROR: Cannot read register 47 (FPS14) while CPU is running
ERROR: Cannot read register 48 (FPS15) while CPU is running
ERROR: Cannot read register 49 (FPS16) while CPU is running
ERROR: Cannot read register 50 (FPS17) while CPU is running
ERROR: Cannot read register 51 (FPS18) while CPU is running
ERROR: Cannot read register 52 (FPS19) while CPU is running
ERROR: Cannot read register 53 (FPS20) while CPU is running
ERROR: Cannot read register 54 (FPS21) while CPU is running
ERROR: Cannot read register 55 (FPS22) while CPU is running
ERROR: Cannot read register 56 (FPS23) while CPU is running
ERROR: Cannot read register 57 (FPS24) while CPU is running
ERROR: Cannot read register 58 (FPS25) while CPU is running
ERROR: Cannot read register 59 (FPS26) while CPU is running
ERROR: Cannot read register 60 (FPS27) while CPU is running
ERROR: Cannot read register 61 (FPS28) while CPU is running
ERROR: Cannot read register 62 (FPS29) while CPU is running
ERROR: Cannot read register 63 (FPS30) while CPU is running
ERROR: Cannot read register 64 (FPS31) while CPU is running
Removing breakpoint @ address 0x60003068, Size = 2
WARNING: Failed to read memory @ address 0xDEADBEEE
Jlink output
The following is the MCU information output by Jlink, I am not sure if this version is supported by Zephyr
-----GDB Server start settings-----
GDBInit file: none
GDB Server Listening port: 2331
SWO raw output listening port: 2332
Terminal I/O port: 2333
Accept remote connection: localhost only
Generate logfile: off
Verify download: off
Init regs on start: off
Silent mode: on
Single run mode: on
Target connection timeout: 5000 ms
------J-Link related settings------
J-Link Host interface: USB
J-Link script: none
J-Link settings file: none
------Target related settings------
Target device: MIMXRT1062xxx6A
Target interface: SWD
Target interface speed: auto
Target endian: little
I have tried hard for seven days and still can’t solve this problem. I tried to switch the version of Zephyr and also used platform IO to generate elf and bin files, but they were all ineffective. Please help me. Thank you very much.

How to debug segmentation fault happening on 'stp' instruction in arm binary?

My application crashes randomly and rarely with a segmentation fault signal.
When the core dump is opened in GDB, the following can be seen:
The ARM instruction leading to the crash is:
0x7f8ea08130 fd 7b b7 a9 stp x29, x30, [sp,#-144]!
When the code of the crashed frame is browsed in GDB, the breakpoint stops at the opening curly brace of a function:
void SomeClass::someMethod(const std::string& s, int i)
>{
...
}
Examining the 'sp' register gives the following output:
x $sp
>~"0x7fc761a070:\t0xc761a270\n"
x $sp-144
>~"0x7fc7619fe0:\t"
>&"Cannot access memory at address 0x7fc7619fe0\n"
>169^error,msg="Cannot access memory at address 0x7fc7619fe0"
The stack trace seems fine and not corrupted.
There are roughly 300 frames in the stack, and the stack size limit is set to 8192K.
UPD: the page size in the system is 4k:
>grep -i pagesize /proc/1/smaps
KernelPageSize: 4 kB
MMUPageSize: 4 kB
What else can I check to debug this issue?
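The addresses in the GDB output can be related to the 4 kB page size with a little arithmetic. The following C sketch uses hypothetical helper names and only restates numbers from the output above.

```c
#include <stdint.h>

/* Start address of the page containing addr.
 * page_size must be a power of two (4096 here). */
uint64_t page_base(uint64_t addr, uint64_t page_size)
{
    return addr & ~(page_size - 1u);
}

/* Lowest address written by a pre-indexed store such as
 * stp x29, x30, [sp,#-144]!  (the new value of sp). */
uint64_t preindexed_store_addr(uint64_t sp, uint64_t frame_bytes)
{
    return sp - frame_bytes;
}
```

With sp = 0x7fc761a070 the store targets 0x7fc7619fe0. The current sp lies in the page starting at 0x7fc761a000, while the store lands in the page starting at 0x7fc7619000, one page lower, which is the address GDB reports as inaccessible. That is consistent with the store touching a page just below the mapped stack, although the dump alone does not prove why that page is unmapped.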

STM32F767ZI External Interrupt Handling

I'm attempting to create a proper SPI slave interface for an AD7768-4 ADC. The ADC has a SPI interface, but it doesn't output the conversions via SPI. Instead, there are data outputs that are clocked out on individual GPIO pins. So I basically need to bit-bang data, and output to SPI to get a proper slave SPI interface. Please don't ask why I'm doing it this way, it was assigned to me.
The issue I'm having is with the interrupts. I'm using the STM32F767ZI processor - it runs at 216 MHz, and my ADC data MUST BE clocked out at 20MHz. I've set up my NMIs but what I'm not seeing is where the system calls or points to the interrupt handler.
I used the STMCubeMX software to assign pins and generate the setup code, and in the stm32F7xx.c file, it shows the NMI_Handler() function, but I don't see a pointer to it anywhere in the system files. I also found void HAL_GPIO_EXTI_IRQHandler() function in STM32F7xx_hal_gpio.c, which appears to check if the pin is asserted, and clears any pending bits, but it doesn't reset the interrupt flag, or check it, and again, I see no pointer to this function.
To complicate things further, I have 10 clock cycles to determine which flag is set (one of two at a time), reset it, increment a variable, and move data from the GPIO registers. I believe this is possible, but again, I'm uncertain of what the system is doing as soon as the interrupt is tripped.
Does anyone have any experience in working with external interrupts on this processor that could shed some light on how this particular system handles things? Again - 10 clock cycles to do what I need to... moving data should only take me 1-2 clock cycles, leaving me 8 to handle interrupts...
EDIT:
We changed the DCLK speed to 5.12 MHz (20.48 MHz MCLK/4) because at 2.56 MHz we had exactly 12.5 microseconds to pipe data out and set up for the next DRDY pulse, and 80 kHz speed gives us exactly zero margin. At 5.12 MHz, I have 41 clock cycles to run the interrupt routine, which I can reduce slightly if I skip checking the second flag and just handle incoming data. But I feel I must use the DRDY flag check at least, and use the routine to enable the second interrupt otherwise I'll be constantly interrupting because DCLK on the ADC is always running. This allows me 6.12 microseconds to read in the data, and 6.25 microseconds to shuffle it out before the next DRDY pulse. I should be able to do that at 32 MHz SPI clock (slave) but will most likely do it at 50MHz. This is my current interrupt code:
void NMI_Handler(void)
{
if(__HAL_GPIO_EXTI_GET_IT(GPIO_PIN_0) != RESET)
{
count = 0;
__HAL_GPIO_EXTI_CLEAR_IT(GPIO_PIN_0);
HAL_GPIO_EXTI_Callback(GPIO_PIN_0);
// __HAL_GPIO_EXTI_CLEAR_FLAG(GPIO_PIN_0);
HAL_NVIC_EnableIRQ(GPIO_PIN_1);
}
else
{
if(__HAL_GPIO_EXTI_GET_IT(GPIO_PIN_1) != RESET)
{
data_pad[count] = GPIOF->IDR;
count++;
if (count == 31)
{
data_send = !data_send;
HAL_NVIC_DisableIRQ(GPIO_PIN_1);
}
__HAL_GPIO_EXTI_CLEAR_IT(GPIO_PIN_1);
HAL_GPIO_EXTI_Callback(GPIO_PIN_1);
// __HAL_GPIO_EXTI_CLEAR_FLAG(GPIO_PIN_0);
}
}
}
I am still concerned about clock cycles, and I believe I can get away with only checking the DRDY flag if I operate on the presumption that the only other EXTI flag that will trip is for the clock pin. Although I question how this will work if SYS_TICK is running in the background... I'll have to find out.
We're investigating a faster processor to handle the bit-banging, but right now, it looks like the PI3 won't be able to handle it if it's running Linux, and I'm unaware of too many faster processors that run either a very small reliable RTOS, or can be bare metal programmed in a pinch...

10 clock cycles to do what I need to... moving data should only take me 1-2 clock cycles, leaving me 8 to handle interrupts...
No way. Interrupt entry (pushing registers, fetching the vector and filling the pipeline) takes 10-12 cycles even on a Cortex-M7. Then consider a very simple interrupt handler, just moving the input data bits to a buffer and clearing the interrupt flag:
uint32_t *p;
void handler(void) {
*p++ = GPIOA->IDR;
EXTI->PR = 0x10;
}
it gets translated to something like this
handler:
ldr r0, .addr_of_idr // load &GPIOA->IDR
ldr r1, [r0] // load GPIOA->IDR
ldr r2, .addr_of_p // load &p
ldr r3, [r2] // load p
str r1, [r3] // store the value from IDR to *p
adds r3, r3, #4 // increment p
str r3, [r2] // store p
ldr r0, .addr_of_pr // load &EXTI->PR
movs r1, #0x10
str r1, [r0] // store 0x10 to EXTI->PR
bx lr
.addr_of_p:
.word p
.addr_of_idr:
.word 0x40020010
.addr_of_pr:
.word 0x40013C14
So it's 11 instructions, each taking at least one cycle, after interrupt entry. That's assuming the code, vector table, and the stack are all in the fastest RAM region. I'm not sure whether literal pools work in ITCM at all, using immediate literals would add 3 more cycles. Forget it.
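The cycle budget can be written out as a small C sketch. The function names are hypothetical; the inputs are the 216 MHz core clock and 20 MHz data clock from the question, the 10-12 cycle interrupt entry, and the 11-instruction handler above, each instruction counted as one cycle at best.

```c
#include <stdint.h>

/* Whole CPU cycles available per period of an external clock. */
uint32_t cycles_per_period(uint32_t cpu_hz, uint32_t ext_hz)
{
    return cpu_hz / ext_hz;
}

/* Lower bound on the cost of servicing one interrupt:
 * entry latency plus at least one cycle per handler instruction. */
uint32_t min_handler_cycles(uint32_t entry_cycles, uint32_t instructions)
{
    return entry_cycles + instructions;
}
```

`cycles_per_period(216000000, 20000000)` is 10 (216 / 20 = 10.8, rounded down), while `min_handler_cycles(10, 11)` is 21 even with the most optimistic entry latency, so the handler needs roughly twice the available budget.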
This has to be solved with hardware.
The controller has 6 SPI interfaces, pick 4 of them. Connect DRDY to all four NSS pins, DCLK to all SCK pins, and each DOUT pin to one MISO pin. Now each SPI interface handles a single channel, and can collect up to 32 bits in its internal FIFO.
Then I'd set an interrupt on a rising edge on one of the NSS pins (EXTI still works even if the pin is in alternate function mode), and read all data at once.
EDIT
It turns out that the STM32 SPI requires an inordinate amount of delay between NSS falling and SCK rising, which the AD7768 does not provide, so it will not work.
Sigma-Delta interface
The STM32F767 has a DFSDM peripheral, designed to receive data from external ADCs. It can receive up to 8 channels of serial data with 20 MHz, and it can even do some preprocessing that your application might need.
The problem is that the DFSDM has no DRDY input, so I don't know exactly how the data transfer could be synchronized. It might work by asserting the START# signal to reset the communication.
If that doesn't work, then you can try starting the DFSDM channels using a timer and DMA. Connect DRDY to the external trigger of TIM1 or TIM8 (other timers won't work, because they are connected to the slower APB1 bus and the other DMA controller), start it on the rising edge of ETR, and let it generate a DMA request after ~20 ns. Then let the DMA write the value needed to start the channel to the DFSDM channel configuration register. Repeat for the other three channels.

There's a startup file generated before compile: startup_stm32f767xx.s - which contains all the pointers to functions.
Under the label g_pfnVectors: is .word NMI_Handler, pointing to a function for handling the non-maskable interrupt, and two other pointers, .word EXTI0_IRQHandler and .word EXTI1_IRQHandler, as vectors to the external interrupt handlers. Further down in the same file are the following assembler directives:
.weak NMI_Handler
.thumb_set NMI_Handler,Default_Handler
.weak EXTI0_IRQHandler
.thumb_set EXTI0_IRQHandler,Default_Handler
.weak EXTI1_IRQHandler
.thumb_set EXTI1_IRQHandler,Default_Handler
This was the info I was looking for to be able to control my interrupts with more precision and fewer clock cycles.
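The same weak-alias mechanism can be expressed in C. The sketch below assumes GCC or Clang on an ELF target; every name in it is hypothetical and chosen so that it does not clash with the real startup symbols.

```c
/* Assumes GCC or Clang on an ELF target; all names are hypothetical. */
static int default_calls;

/* Fallback handler, playing the role of Default_Handler. */
void Demo_Default_Handler(void)
{
    default_calls++;
}

/* Weak alias: resolves to Demo_Default_Handler unless another
 * object file provides a strong definition with the same name. */
void Demo_EXTI0_IRQHandler(void)
    __attribute__((weak, alias("Demo_Default_Handler")));

/* Table of handler addresses, playing the role of g_pfnVectors. */
void (*const demo_vectors[])(void) = {
    Demo_Default_Handler,
    Demo_EXTI0_IRQHandler,
};

/* Accessor so the number of fallback calls can be inspected. */
int demo_default_calls(void)
{
    return default_calls;
}
```

Calling `demo_vectors[1]()` reaches the default handler here because no strong `Demo_EXTI0_IRQHandler` exists. Defining a function with that name in another source file would make the table entry point to it instead, which is how a user-written handler replaces the default one.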

I read the AD7768 datasheet more carefully and found that it can send the data of all four channels on one DOUT pin. So I am talking about the serial audio interface (SAI) again.
If you can lower the DCLK frequency to 2.5 MHz, the sample rate drops by a ratio of 1:8 (the ratio of 2.5 MHz to 20 MHz) relative to the sample rate at the full ADC clock.
If you route all 4 channels to the single output DOUT0, the sample rate drops only by a ratio of 1:4.
AD7768-4 DS
page 53
On the AD7768, the interface can be configured to output conversion
data on one, two, or eight of the DOUTx pins. The DOUTx configuration
for the AD7768 is selected using the FORMATx pins (see Table 33).
page 66 table 34: (for AD7768-4)
page 67 figure 98:
FORMAT0 = 1 All channels output on the DOUT0 pin, in TDM output. Only DOUT0 is in use.
You can use SAI with FS = DRDY and four slots, 32 bits/slot
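The frame arithmetic for four slots of 32 bits can be sketched in C as follows. The function names are hypothetical, and the clock figures used in the note are the 20.48 MHz and 5.12 MHz values mentioned in the question, taken here as the bit clock purely for illustration.

```c
#include <stdint.h>

/* Number of bit-clock periods in one TDM frame. */
uint32_t bits_per_frame(uint32_t slots, uint32_t bits_per_slot)
{
    return slots * bits_per_slot;
}

/* Highest frame (sample) rate a given bit clock can carry. */
uint32_t frames_per_second(uint32_t bit_clock_hz, uint32_t frame_bits)
{
    return bit_clock_hz / frame_bits;
}
```

Four slots of 32 bits give 128 bits per frame, so a 20.48 MHz bit clock carries at most 160000 frames per second and a 5.12 MHz bit clock at most 40000.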

ORG alternative for C++

In assembly we use the org instruction to set the location counter to a specific location in the memory. This is particularly helpful in making Operating Systems. Here's an example boot loader (From wikibooks):
org 7C00h
jmp short Start ;Jump over the data (the 'short' keyword makes the jmp instruction smaller)
Msg: db "Hello World! "
EndMsg:
Start: mov bx, 000Fh ;Page 0, colour attribute 15 (white) for the int 10 calls below
mov cx, 1 ;We will want to write 1 character
xor dx, dx ;Start at top left corner
mov ds, dx ;Ensure ds = 0 (to let us load the message)
cld ;Ensure direction flag is cleared (for LODSB)
Print: mov si, Msg ;Loads the address of the first byte of the message, 7C02h in this case
;PC BIOS Interrupt 10 Subfunction 2 - Set cursor position
;AH = 2
Char: mov ah, 2 ;BH = page, DH = row, DL = column
int 10h
lodsb ;Load a byte of the message into AL.
;Remember that DS is 0 and SI holds the
;offset of one of the bytes of the message.
;PC BIOS Interrupt 10 Subfunction 9 - Write character and colour
;AH = 9
mov ah, 9 ;BH = page, AL = character, BL = attribute, CX = character count
int 10h
inc dl ;Advance cursor
cmp dl, 80 ;Wrap around edge of screen if necessary
jne Skip
xor dl, dl
inc dh
cmp dh, 25 ;Wrap around bottom of screen if necessary
jne Skip
xor dh, dh
Skip: cmp si, EndMsg ;If we're not at end of message,
jne Char ;continue loading characters
jmp Print ;otherwise restart from the beginning of the message
times 0200h - 2 - ($ - $$) db 0 ;Zerofill up to 510 bytes
dw 0AA55h ;Boot Sector signature
;OPTIONAL:
;To zerofill up to the size of a standard 1.44MB, 3.5" floppy disk
;times 1474560 - ($ - $$) db 0
Is it possible to accomplish this task in C++? Is there any command, function, etc. like org with which I can change the location of the program?

No it's not possible to do in any C compiler that I know of. You can however create your own linker script that places the code/data/bss segments at specific addresses.
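On the source side, the usual companion to a custom linker script is a named section. The sketch below relies on a GCC/Clang extension, and both the section name `.boot_text` and the function name are hypothetical.

```c
/* GCC/Clang extension; ".boot_text" is a hypothetical section name.
 * The attribute only names the section. The address it ends up at is
 * decided by the linker script, not by the compiler. */
__attribute__((section(".boot_text")))
int boot_entry_demo(void)
{
    return 42;
}
```

A GNU ld script could then pin that section with a rule such as `.boot_text 0x7C00 : { *(.boot_text) }` inside its `SECTIONS` block. Without such a rule the linker places the section wherever its defaults dictate, and the function still behaves normally.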

Just for clarity, the org directive does not load the code at the specified address, it merely informs the assembler that the code will be loaded at that address. The code shown appears to be for Nasm (or similar). In AT&T syntax, the .org directive does something different: it pads the code to that address, similar to the times line in the Nasm code. Nasm can do this because in -f bin mode, it "acts as its own linker".
The important thing for the code to know is the address where Msg can be found. The jmps and jnes (and call and ret which your example doesn't have, but a compiler may generate) are relative addressing mode. We code jmp target but the bytes that are actually emitted say jmp distance_to_target (plus or minus) so the address doesn't matter.
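The relative encoding can be checked with a short C sketch. The function name is hypothetical. The addresses assume the boot sector above: the 2-byte `jmp short Start` sits at 7C00h, Msg starts at 7C02h, and the 13-byte message puts Start at 7C0Fh.

```c
#include <stdint.h>

/* Displacement byte of a 2-byte x86 short jump (opcode EB, rel8):
 * the target address minus the address of the instruction that
 * follows the jump. */
int8_t short_jump_disp(uint16_t jmp_addr, uint16_t target_addr)
{
    return (int8_t)(target_addr - (uint16_t)(jmp_addr + 2u));
}
```

`short_jump_disp(0x7C00, 0x7C0F)` and `short_jump_disp(0x0000, 0x000F)` both give 13, so the emitted bytes are the same whatever origin is chosen. Only absolute references such as `mov si, Msg` depend on the org value.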
Gas doesn't do this, it emits a linkable object file. To use ld without a linker script the command line looks something like:
ld -o boot.bin boot.o --oformat binary -Ttext=0x7C00
(don't quote me on that exact syntax but "something like that") If you can get a linkable object file from your (16-bit capable!) C++ compiler, you might be able to do the same.
In the case of a bootsector, the code is loaded by the BIOS (or fake BIOS) at 0x7C00 - one of the few things we can assume about the bootsector. The sane thing for a bootsector to do is not fiddle-faddle around printing a message, but to load something else. You'll need to know how to find the something else on the disk and where you want to load it to (perhaps where your C++ compiler wants to put it by default) - and jmp there. This jmp will want to be a far jmp, which does need to know the address.
I'm guessing it's going to be some butt-ugly C++!

Why would cortex-m3 reset to address 0 in gdb?

I am building a cross-compile toolchain for the Stellaris LM3S8962 cortex-m3 chip. The test c++ application I have written will execute for some time then fault. The fault will occur when I try to access a memory-mapped hardware device. At the moment my working hypothesis is that I am missing some essential chip initialization in my startup sequence.
What I would like to understand is why would the execution in gdb get halted and the program counter be set to 0? I have the vector table at 0x0, but the first value is the stack pointer. Shouldn't I end up in one of the fault handlers I specify in the vector table?
(gdb)
187 UARTSend((unsigned char *)secret, 2);
(gdb) cont
Continuing.
lm3s.cpu -- clearing lockup after double fault
Program received signal SIGINT, Interrupt.
0x00000000 in g_pfnVectors ()
(gdb) info registers
r0 0x1 1
r1 0x32 50
r2 0xffffffff 4294967295
r3 0x0 0
r4 0x74518808 1951500296
r5 0xc24c0551 3259762001
r6 0x42052dac 1107635628
r7 0x20007230 536900144
r8 0xf85444a9 4166272169
r9 0xc450591b 3293600027
r10 0xd8812546 3632342342
r11 0xb8420815 3091335189
r12 0x3 3
sp 0x200071f0 0x200071f0
lr 0xfffffff1 4294967281
pc 0x1 0x1 <g_pfnVectors+1>
fps 0x0 0
cpsr 0x60000023 1610612771
The toolchain is based on gcc, gdb, openocd.

GDB happily gave you some clue:
clearing lockup after double fault
Your CPU was in locked state. That means it could not run its "Hard Fault" Interrupt Handler (maybe there is a 0 in its Vector).
I usually get these when I forget to "power" the peripheral; the resulting Bus Error escalates first to "Hard Fault" and then to the locked state. This should be mentioned in the manual of your MCU, by the way.
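The lr value in the register dump is one of the Cortex-M3 EXC_RETURN codes, and decoding it shows which mode the core would return to. The following C sketch uses hypothetical names. Note that the dump was taken after the debugger cleared the lockup, so the registers may not reflect the exact state at the time of the fault.

```c
#include <stdint.h>

/* What a value in lr means on exception entry (Cortex-M3, no FPU). */
enum exc_return_kind {
    EXC_NOT_EXC_RETURN,  /* ordinary return address */
    EXC_HANDLER_MSP,     /* return to handler mode, main stack */
    EXC_THREAD_MSP,      /* return to thread mode, main stack */
    EXC_THREAD_PSP       /* return to thread mode, process stack */
};

enum exc_return_kind decode_exc_return(uint32_t lr)
{
    switch (lr) {
    case 0xFFFFFFF1u: return EXC_HANDLER_MSP;
    case 0xFFFFFFF9u: return EXC_THREAD_MSP;
    case 0xFFFFFFFDu: return EXC_THREAD_PSP;
    default:          return EXC_NOT_EXC_RETURN;
    }
}
```

The dump shows lr = 0xfffffff1, which decodes to a return into handler mode. That would fit a fault taken while another exception handler was already running, in line with the double fault reported in the log.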
