I want to build Linux for my Altera DE2-115 board with a console on the serial port. Now I see a kernel panic on the serial console when I try to run it:
[ 0.000000] Linux version 4.8.0+ (developer@developer-Latitude-E7450) (gcc version 6.2.0 (Sourcery CodeBench Lite 207
[ 0.000000] bootconsole [early0] enabled
[ 0.000000] early_console initialized at 0xe8001400
[ 0.000000] On node 0 totalpages: 32768
[ 0.000000] free_area_init_node: node 0, pgdat c05d88f0, node_mem_map c0699740
[ 0.000000] Normal zone: 288 pages used for memmap
[ 0.000000] Normal zone: 0 pages reserved
[ 0.000000] Normal zone: 32768 pages, LIFO batch:7
[ 0.000000] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
[ 0.000000] pcpu-alloc: [0] 0
[ 0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 32480
[ 0.000000] Kernel command line: debug console=ttyAL0,115200
[ 0.000000] PID hash table entries: 512 (order: -1, 2048 bytes)
[ 0.000000] Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
[ 0.000000] Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
[ 0.000000] Sorting __ex_table...
[ 0.000000] allocated 131072 bytes of page_ext
[ 0.000000] Memory: 122900K/131072K available (4215K kernel code, 166K rwdata, 1456K rodata, 164K init, 740K bss, 81)
[ 0.000000] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[ 0.000000] NR_IRQS:64 nr_irqs:64 0
[ 0.000000] Kernel panic - not syncing: 1 timer is found, it needs 2 timers in system
[ 0.000000]
[ 0.000000] ---[ end Kernel panic - not syncing: 1 timer is found, it needs 2 timers in system
[ 0.000000]
[ 0.000000] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 0.000000] ea = c0003de4, ra = c028ac84, cause = 15
[ 0.000000] Kernel panic - not syncing: Oops
[ 0.000000] ---[ end Kernel panic - not syncing: Oops
[ 0.000000] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 0.000000] ea = c0003de4, ra = c028ac84, cause = 15
[ 0.000000] Kernel panic - not syncing: Oops
[ 0.000000] ---[ end Kernel panic - not syncing: Oops
[ 0.000000] ... (the same four Oops/panic lines repeat many more times) ...
There were many steps. I will try to rebuild with two timers, but can you tell me how I should do it? What more information is needed? I changed a lot in make menuconfig, and I have now confirmed that it starts and can be seen with minicom through the serial port. I used a .sof from a friend who had a very stripped-down and old version of uClinux. It only had root access, and I am trying to make one where you can log in. The dts file is as follows:
/*
* This devicetree is generated by sopc2dts version unknown on Mon Mar 13 18:52:55 CET 2017
* Sopc2dts is written by Walter Goossens <waltergoossens#home.nl>
* in cooperation with the nios2 community <nios2-dev#lists.rocketboards.org>
*/
/dts-v1/;
/ {
model = "ALTR,qsys";
compatible = "ALTR,qsys";
#address-cells = <1>;
#size-cells = <1>;
cpus {
#address-cells = <1>;
#size-cells = <0>;
cpu: cpu@0x0 {
device_type = "cpu";
compatible = "altr,nios2-16.1", "altr,nios2-1.1";
reg = <0x00000000>;
interrupt-controller;
#interrupt-cells = <1>;
altr,exception-addr = <3221225504>; /* embeddedsw.dts.params.altr,exception-addr type NUMBER */
altr,fast-tlb-miss-addr = <3355447296>; /* embeddedsw.dts.params.altr,fast-tlb-miss-addr type NUMBER */
altr,has-initda = <1>; /* embeddedsw.dts.params.altr,has-initda type NUMBER */
altr,has-mmu = <1>; /* embeddedsw.dts.params.altr,has-mmu type NUMBER */
altr,has-mul = <1>; /* embeddedsw.dts.params.altr,has-mul type NUMBER */
altr,implementation = "fast"; /* embeddedsw.dts.params.altr,implementation type STRING */
altr,pid-num-bits = <8>; /* embeddedsw.dts.params.altr,pid-num-bits type NUMBER */
altr,reset-addr = <3221225472>; /* embeddedsw.dts.params.altr,reset-addr type NUMBER */
altr,tlb-num-entries = <256>; /* embeddedsw.dts.params.altr,tlb-num-entries type NUMBER */
altr,tlb-num-ways = <16>; /* embeddedsw.dts.params.altr,tlb-num-ways type NUMBER */
altr,tlb-ptr-sz = <8>; /* embeddedsw.dts.params.altr,tlb-ptr-sz type NUMBER */
clock-frequency = <50000000>; /* embeddedsw.dts.params.clock-frequency type NUMBER */
dcache-line-size = <32>; /* embeddedsw.dts.params.dcache-line-size type NUMBER */
dcache-size = <2048>; /* embeddedsw.dts.params.dcache-size type NUMBER */
icache-line-size = <32>; /* embeddedsw.dts.params.icache-line-size type NUMBER */
icache-size = <4096>; /* embeddedsw.dts.params.icache-size type NUMBER */
}; //end cpu@0x0 (cpu)
}; //end cpus
memory {
device_type = "memory";
reg = <0x08001000 0x00000400>,
<0x00000000 0x08000000>;
}; //end memory
sopc0: sopc@0 {
device_type = "soc";
ranges;
#address-cells = <1>;
#size-cells = <1>;
compatible = "ALTR,avalon", "simple-bus";
bus-frequency = <50000000>;
jtag: serial@0x8001440 {
compatible = "altr,juart-16.1", "altr,juart-1.0";
reg = <0x08001440 0x00000008>;
interrupt-parent = <&cpu>;
interrupts = <1>;
}; //end serial@0x8001440 (jtag)
timer: timer@0x8001420 {
compatible = "altr,timer-16.1", "altr,timer-1.0";
reg = <0x08001420 0x00000020>;
interrupt-parent = <&cpu>;
interrupts = <0>;
clock-frequency = <50000000>; /* embeddedsw.dts.params.clock-frequency type NUMBER */
}; //end timer@0x8001420 (timer)
uart: serial@0x8001400 {
compatible = "altr,uart-16.1", "altr,uart-1.0";
reg = <0x08001400 0x00000020>;
interrupt-parent = <&cpu>;
interrupts = <2>;
clock-frequency = <50000000>; /* embeddedsw.dts.params.clock-frequency type NUMBER */
current-speed = <115200>; /* embeddedsw.dts.params.current-speed type NUMBER */
}; //end serial@0x8001400 (uart)
}; //end sopc@0 (sopc0)
chosen {
bootargs = "debug console=ttyAL0,115200";
}; //end chosen
}; //end /
You don't run Linux on the "Altera DE2-115" board itself; you run it on the hardware implemented in the FPGA (the Cyclone IV E chip "EP4CE115" used on the board is not an SoC and has no hard cores such as an ARM Cortex-A9).
It is impossible to answer the question fully without details of the hardware you have implemented, as FPGAs let you implement anything (that fits into the FPGA used).
Your kernel and your Nios II implementation are not compatible. Can you regenerate the FPGA bitstream? Do you have the Quartus/Qsys project with the Nios II and its settings?
Check how many timers are implemented in your Nios II system: your kernel requires two of them, as the first panic shows:
[ 0.000000] Kernel panic - not syncing: 1 timer is found, it needs 2 timers in system
This requirement is specific to the nios2 architecture and comes from the timer initialization code: http://lxr.free-electrons.com/source/arch/nios2/kernel/time.c?v=4.8#L341
void __init time_init(void)
{
    struct device_node *np;
    int count = 0;

    for_each_compatible_node(np, NULL, ALTR_TIMER_COMPATIBLE)
        count++;

    if (count < 2)
        panic("%d timer is found, it needs 2 timers in system\n", count);
    /* ... */
}
Rebuilding Linux kernel 4.8 will not help, as this code is always compiled in unconditionally (http://lxr.free-electrons.com/source/arch/nios2/kernel/Makefile?v=4.8):
obj-y += time.o
Your kernel build itself is correct, or almost correct, since it boots this far.
What you should do: provide the needed timers in your soft core by reconfiguring the Qsys project (add a second timer and connect it like the first one, but at a different address and with a new interrupt ID; the other parameters should be as required by Altera's Linux). Then rebuild the bitstream with Quartus (this takes a long time). Also register the new timer in the device tree with the correct offset and IRQ ID, relink the kernel with the device tree, and re-upload both the FPGA bitstream and the Linux image to the FPGA/flash.
The two-timer requirement for 3.19+ kernels is listed at https://rocketboards.org/foswiki/view/Documentation/NiosIILinuxUserManual (possibly introduced by the commit at http://lkml.iu.edu/hypermail/linux/kernel/1507.0/01501.html):
Kernel v3.19 and above
A few things need to be noted if using kernel 3.19 and above:
Toolchain: Sourcery CodeBench Lite 2014.05-47 and above
Hardware: 2 timers in nios2 system
DTS: Lowercase altr prefix in the dts file, e.g.: altr,has-mmu; compatible = "altr,juart-1.0";
Alternatively, you can try a kernel older than 3.19, which may work with a single timer.
I am trying to use Address Sanitizer, but the kernel keeps killing my process due to excessive memory usage. Without Address Sanitizer the process runs just fine.
The program is compiled for arm-v7a using gcc-8.2.1 with
-fno-omit-frame-pointer
-fsanitize=address
-fsanitize-recover=all
-fdata-sections
-ffunction-sections
-fPIC
I am starting the process as follows:
ASAN_OPTIONS=debug=1:verbosity=0:detect_leaks=0:abort_on_error=0:halt_on_error=0:check_initialization_order=1:allocator_may_return_null=1 ./Launcher
Is there a way to reduce the memory footprint of the Address Sanitizer? Unfortunately, enabling swap is not an option.
This is the kernel log as printed by dmesg:
[512792.413376] Launcher invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=0, oom_score_adj=0
[512792.424695] CPU: 3 PID: 7786 Comm: Launcher Tainted: G W 5.4.1 #1
[512792.432821] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[512792.439455] Backtrace:
[512792.442039] [<8010eb1c>] (dump_backtrace) from [<8010eee0>] (show_stack+0x20/0x24)
[512792.449721] r7:811d32ec r6:00000000 r5:60070113 r4:811d32ec
[512792.455500] [<8010eec0>] (show_stack) from [<80ba06e8>] (dump_stack+0xbc/0xe8)
[512792.462840] [<80ba062c>] (dump_stack) from [<80257360>] (dump_header+0x64/0x440)
[512792.470343] r10:00000a24 r9:a9a4ce00 r8:00016f9c r7:80e82aac r6:a749fce0 r5:a9a4ce00
[512792.478275] r4:a749fce0 r3:6f25b167
[512792.481958] [<802572fc>] (dump_header) from [<80256364>] (oom_kill_process+0x494/0x4ac)
[512792.490066] r10:00000a24 r9:a9a4c100 r8:00016f9c r7:80e82aac r6:a749fce0 r5:a9a4ce00
[512792.497996] r4:a9a4d264
[512792.500636] [<80255ed0>] (oom_kill_process) from [<80256e8c>] (out_of_memory+0xf8/0x4ec)
[512792.508830] r10:00000a24 r9:a9a4c100 r8:00016f9c r7:8110b640 r6:8110b640 r5:811d8860
[512792.516760] r4:a749fce0
[512792.519405] [<80256d94>] (out_of_memory) from [<802a0910>] (__alloc_pages_nodemask+0xf7c/0x13a4)
[512792.528295] r9:00000000 r8:81107d30 r7:811d5588 r6:0000233c r5:00000000 r4:00000000
[512792.536153] [<8029f994>] (__alloc_pages_nodemask) from [<80285d10>] (__pte_alloc+0x34/0x1ac)
[512792.544697] r10:74b94000 r9:00000000 r8:00000000 r7:a8b9e580 r6:a8b9e580 r5:a7445d28
[512792.552628] r4:a7445d28
[512792.555271] [<80285cdc>] (__pte_alloc) from [<802869c8>] (copy_page_range+0x4ec/0x650)
[512792.563295] r9:00000000 r8:00000000 r7:a8b9e580 r6:a7174f4c r5:a8b9e580 r4:a7445d28
[512792.571148] [<802864dc>] (copy_page_range) from [<801241b8>] (dup_mm+0x470/0x4e0)
[512792.578736] r10:a7174f14 r9:a7174f10 r8:a8b9d680 r7:a7c36420 r6:a7174f4c r5:a8b9e580
[512792.586667] r4:a7835d20
[512792.589307] [<80123d48>] (dup_mm) from [<801255e0>] (copy_process+0x10bc/0x1888)
[512792.596807] r10:a749ff60 r9:ffffffff r8:00000000 r7:a749e000 r6:9d283400 r5:a825c300
[512792.604738] r4:00100000
[512792.607378] [<80124524>] (copy_process) from [<80125fb8>] (_do_fork+0x90/0x750)
[512792.614792] r10:00100000 r9:a749e000 r8:801011c4 r7:a749e000 r6:a749ff60 r5:6f25b167
[512792.622722] r4:00000001
[512792.625362] [<80125f28>] (_do_fork) from [<80126954>] (sys_clone+0x80/0x9c)
[512792.632428] r10:00000078 r9:a749e000 r8:801011c4 r7:00000078 r6:7649e000 r5:6f25b167
[512792.640358] r4:a749e000
[512792.643001] [<801268d4>] (sys_clone) from [<80101000>] (ret_fast_syscall+0x0/0x28)
[512792.650671] Exception stack(0xa749ffa8 to 0xa749fff0)
[512792.655828] ffa0: 54ad00fc 76ffe964 00100011 00000000 54ad00fc 00000000
[512792.664112] ffc0: 54ad00fc 76ffe964 7649e000 00000078 54ad0100 54ad0120 00000001 54ad0280
[512792.672391] ffe0: 00000078 54ad00e8 763d590b 763bf746
[512792.677546] r5:76ffe964 r4:54ad00fc
[512792.681484] Mem-Info:
[512792.683936] active_anon:158884 inactive_anon:15315 isolated_anon:0
active_file:1041 inactive_file:1140 isolated_file:0
unevictable:2224 dirty:8 writeback:1 unstable:0
slab_reclaimable:4553 slab_unreclaimable:4490
mapped:5064 shmem:17635 pagetables:1579 bounce:0
free:56987 free_pcp:173 free_cma:53962
[512792.718450] Node 0 active_anon:635536kB inactive_anon:61260kB active_file:4264kB inactive_file:5460kB unevictable:8896kB isolated(anon):0kB isolated(file):0kB mapped:21056kB dirty:32kB writeback:4kB shmem:70540kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[512792.742142] Normal free:226708kB min:3312kB low:4140kB high:4968kB active_anon:635436kB inactive_anon:61260kB active_file:4584kB inactive_file:5652kB unevictable:8896kB writepending:36kB present:1048576kB managed:1015668kB mlocked:0kB kernel_stack:1216kB pagetables:6316kB bounce:0kB free_pcp:192kB local_pcp:0kB free_cma:215848kB
[512792.771461] lowmem_reserve[]: 0 0 0
[512792.775161] Normal: 1651*4kB (UMEC) 839*8kB (UMEC) 495*16kB (UMEC) 221*32kB (UMEC) 78*64kB (UEC) 29*128kB (MC) 1*256kB (U) 40*512kB (C) 35*1024kB (C) 21*2048kB (C) 10*4096kB (C) 2*8192kB (C) 0*16384kB 1*32768kB (C) = 226708kB
[512792.795442] 20243 total pagecache pages
[512792.799391] 0 pages in swap cache
[512792.802816] Swap cache stats: add 0, delete 0, find 0/0
[512792.808232] Free swap = 0kB
[512792.811225] Total swap = 0kB
[512792.814296] 262144 pages RAM
[512792.817288] 0 pages HighMem/MovableOnly
[512792.821232] 8227 pages reserved
[512792.824558] 81920 pages cma reserved
[512792.828247] Tasks state (memory values in pages):
[512792.833057] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[512792.841890] [ 211] 0 211 9965 1608 67584 0 0 systemd-journal
[512792.851149] [ 224] 0 224 3848 249 16384 0 -1000 systemd-udevd
[512792.860222] [ 317] 0 317 1559 339 12288 0 0 dhclient
[512792.868867] [ 316] 0 316 1559 348 14336 0 0 dhclient
[512792.877508] [ 333] 0 333 1810 856 14336 0 0 haveged
[512792.886061] [ 334] 101 334 4985 261 22528 0 0 systemd-timesyn
[512792.895309] [ 336] 104 336 1342 167 12288 0 0 rpcbind
[512792.903866] [ 368] 106 368 1333 218 12288 0 -900 dbus-daemon
[512792.912684] [ 369] 0 369 6193 356 22528 0 0 rsyslogd
[512792.921327] [ 370] 0 370 2681 178 18432 0 0 systemd-logind
[512792.930490] [ 372] 0 372 1625 158 14336 0 0 cron
[512792.938784] [ 431] 0 431 428 122 10240 0 0 motion_sensor
[512792.947870] [ 560] 0 560 8756 207 18432 0 0 automount
[512792.956597] [ 564] 0 564 1190 172 12288 0 0 login
[512792.964988] [ 566] 0 566 1338 98 12288 0 0 agetty
[512792.973372] [ 572] 0 572 2218 276 16384 0 -1000 sshd
[512792.981664] [ 574] 0 574 946 33 12288 0 0 inputattach
[512792.990569] [ 637] 0 637 3017 379 18432 0 0 systemd
[512792.999122] [ 640] 0 640 3504 402 20480 0 0 (sd-pam)
[512793.007768] [ 653] 0 653 1760 329 12288 0 0 bash
[512793.016057] [ 671] 0 671 2599 1116 18432 0 0 Server.
[512793.025310] [ 732] 0 732 1300 132 12288 0 0 dbus-daemon
[512793.034212] [ 31836] 0 31836 3173 980 22528 0 0 sshd
[512793.042428] [ 31847] 0 31847 422 154 8192 0 0 sftp-server
[512793.051332] [ 5350] 0 5350 2555 351 16384 0 0 sshd
[512793.059631] [ 5452] 0 5452 1793 379 16384 0 0 bash
[512793.067924] [ 5823] 0 5823 2555 350 16384 0 0 sshd
[512793.076216] [ 5833] 0 5833 1760 326 14336 0 0 bash
[512793.084509] [ 6822] 0 6822 792 31 10240 0 0 xinit
[512793.092813] [ 6823] 0 6823 29526 5386 112640 0 0 Xorg
[512793.101103] [ 6827] 0 6827 3655 866 22528 0 0 xterm
[512793.109488] [ 6829] 0 6829 1620 114 14336 0 0 bash
[512793.117784] [ 7256] 0 7256 1549 322 12288 0 0 watch
[512793.126169] [ 7363] 0 7363 127832 56725 520192 0 0 gdb
[512793.134370] [ 7368] 0 7368 281561 93707 1046528 0 0 Launcher
[512793.143613] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),task=Launcher,pid=7368,uid=0
[512793.152974] Out of memory: Killed process 7368 (Launcher) total-vm:1126244kB, anon-rss:365128kB, file-rss:5700kB, shmem-rss:4000kB, UID:0 pgtables:1046528kB oom_score_adj:0
[512793.387824] oom_reaper: reaped process 7368 (Launcher), now anon-rss:0kB, file-rss:0kB, shmem-rss:4000kB
You could disable some ASan features (or enable them one at a time in separate runs):
# Disable use-after-return and use-after-scope detection (reduces code and heap size)
CFLAGS+='-fsanitize-address-use-after-return=never -fno-sanitize-address-use-after-scope'
export ASAN_OPTIONS="$ASAN_OPTIONS:detect_stack_use_after_return=0"
# Disable inline instrumentation (slower but saves code size)
CFLAGS+='-fsanitize-address-outline-instrumentation'
# Reduce heap quarantine (reduces heap consumption but also lowers chance of UAF detection)
export ASAN_OPTIONS="$ASAN_OPTIONS:quarantine_size_mb=16"
# Do not keep full backtrace of malloc origin (slightly complicates debugging but reduces heap size)
export ASAN_OPTIONS="$ASAN_OPTIONS:malloc_context_size=5"
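These compiler options are for Clang, but GCC has similar switches (for example --param asan-use-after-return=0 and --param asan-instrumentation-with-call-threshold=0).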
As for swap: we have had good experience with enabling compressed swap in RAM.
The test setup is: pktgen sends packets to the vhost-user1 port, OVS forwards them to vhost-user2, and testpmd receives them from vhost-user2.
The problem: pktgen cannot send any packets, and testpmd receives none either. I don't know what the problem is.
I need some help, thanks in advance!
OVS: 2.9.0
DPDK: 17.11.6
pktgen: 3.4.4
OVS setup:
export DB_SOCK=/usr/local/var/run/openvswitch/db.sock
export PATH=$PATH:/usr/local/share/openvswitch/scripts
rm /usr/local/etc/openvswitch/conf.db
ovsdb-tool create /usr/local/etc/openvswitch/conf.db /usr/local/share/openvswitch/vswitch.ovsschema
ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock --remote=db:Open_vSwitch,Open_vSwitch,manager_options --pidfile --detach
ovs-vsctl --no-wait init
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true other_config:dpdk-lcore=0x2 other_config:dpdk-socket-mem="1024,0"
ovs-vswitchd unix:/usr/local/var/run/openvswitch/db.sock --pidfile --detach
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x8
ovs-vsctl add-br ovs-br0 -- set bridge ovs-br0 datapath_type=netdev
ovs-vsctl add-port ovs-br0 vhost-user0 -- set Interface vhost-user0 type=dpdkvhostuser
ovs-vsctl add-port ovs-br0 vhost-user1 -- set Interface vhost-user1 type=dpdkvhostuser
ovs-vsctl add-port ovs-br0 vhost-user2 -- set Interface vhost-user2 type=dpdkvhostuser
ovs-vsctl add-port ovs-br0 vhost-user3 -- set Interface vhost-user3 type=dpdkvhostuser
sudo ovs-ofctl del-flows ovs-br0
sudo ovs-ofctl add-flow ovs-br0 in_port=2,dl_type=0x800,idle_timeout=0,action=output:3
sudo ovs-ofctl add-flow ovs-br0 in_port=3,dl_type=0x800,idle_timeout=0,action=output:2
sudo ovs-ofctl add-flow ovs-br0 in_port=1,dl_type=0x800,idle_timeout=0,action=output:4
sudo ovs-ofctl add-flow ovs-br0 in_port=4,dl_type=0x800,idle_timeout=0,action=output:1
run pktgen:
root@k8s:/home/haosp/OVS_DPDK/pktgen-3.4.4# pktgen -c 0xf --master-lcore 0 -n 1 --socket-mem 512,0 --file-prefix pktgen --no-pci \
> --vdev 'net_virtio_user0,mac=00:00:00:00:00:05,path=/usr/local/var/run/openvswitch/vhost-user0' \
> --vdev 'net_virtio_user1,mac=00:00:00:00:00:01,path=/usr/local/var/run/openvswitch/vhost-user1' \
> -- -P -m "1.[0-1]"
Copyright (c) <2010-2017>, Intel Corporation. All rights reserved. Powered by DPDK
EAL: Detected 4 lcore(s)
EAL: Probing VFIO support...
EAL: VFIO support initialized
Lua 5.3.4 Copyright (C) 1994-2017 Lua.org, PUC-Rio
Copyright (c) <2010-2017>, Intel Corporation. All rights reserved.
Pktgen created by: Keith Wiles -- >>> Powered by DPDK <<<
>>> Packet Burst 64, RX Desc 1024, TX Desc 2048, mbufs/port 16384, mbuf cache 2048
=== port to lcore mapping table (# lcores 4) ===
lcore: 0 1 2 3 Total
port 0: ( D: T) ( 1: 1) ( 0: 0) ( 0: 0) = ( 1: 1)
port 1: ( D: T) ( 1: 1) ( 0: 0) ( 0: 0) = ( 1: 1)
Total : ( 0: 0) ( 2: 2) ( 0: 0) ( 0: 0)
Display and Timer on lcore 0, rx:tx counts per port/lcore
Configuring 2 ports, MBUF Size 2176, MBUF Cache Size 2048
Lcore:
1, RX-TX
RX_cnt( 2): (pid= 0:qid= 0) (pid= 1:qid= 0)
TX_cnt( 2): (pid= 0:qid= 0) (pid= 1:qid= 0)
Port :
0, nb_lcores 1, private 0x5635a661d3a0, lcores: 1
1, nb_lcores 1, private 0x5635a661ff70, lcores: 1
** Default Info (net_virtio_user0, if_index:0) **
max_rx_queues : 1, max_tx_queues : 1
max_mac_addrs : 64, max_hash_mac_addrs: 0, max_vmdq_pools: 0
rx_offload_capa: 28, tx_offload_capa : 0, reta_size : 0, flow_type_rss_offloads:0000000000000000
vmdq_queue_base: 0, vmdq_queue_num : 0, vmdq_pool_base: 0
** RX Conf **
pthresh : 0, hthresh : 0, wthresh : 0
Free Thresh : 0, Drop Enable : 0, Deferred Start : 0
** TX Conf **
pthresh : 0, hthresh : 0, wthresh : 0
Free Thresh : 0, RS Thresh : 0, Deferred Start : 0, TXQ Flags:00000f00
Create: Default RX 0:0 - Memory used (MBUFs 16384 x (size 2176 + Hdr 128)) + 192 = 36865 KB headroom 128 2176
Set RX queue stats mapping pid 0, q 0, lcore 1
Create: Default TX 0:0 - Memory used (MBUFs 16384 x (size 2176 + Hdr 128)) + 192 = 36865 KB headroom 128 2176
Create: Range TX 0:0 - Memory used (MBUFs 16384 x (size 2176 + Hdr 128)) + 192 = 36865 KB headroom 128 2176
Create: Sequence TX 0:0 - Memory used (MBUFs 16384 x (size 2176 + Hdr 128)) + 192 = 36865 KB headroom 128 2176
Create: Special TX 0:0 - Memory used (MBUFs 64 x (size 2176 + Hdr 128)) + 192 = 145 KB headroom 128 2176
Port memory used = 147601 KB
Initialize Port 0 -- TxQ 1, RxQ 1, Src MAC 00:00:00:00:00:05
** Default Info (net_virtio_user1, if_index:0) **
max_rx_queues : 1, max_tx_queues : 1
max_mac_addrs : 64, max_hash_mac_addrs: 0, max_vmdq_pools: 0
rx_offload_capa: 28, tx_offload_capa : 0, reta_size : 0, flow_type_rss_offloads:0000000000000000
vmdq_queue_base: 0, vmdq_queue_num : 0, vmdq_pool_base: 0
** RX Conf **
pthresh : 0, hthresh : 0, wthresh : 0
Free Thresh : 0, Drop Enable : 0, Deferred Start : 0
** TX Conf **
pthresh : 0, hthresh : 0, wthresh : 0
Free Thresh : 0, RS Thresh : 0, Deferred Start : 0, TXQ Flags:00000f00
Create: Default RX 1:0 - Memory used (MBUFs 16384 x (size 2176 + Hdr 128)) + 192 = 36865 KB headroom 128 2176
Set RX queue stats mapping pid 1, q 0, lcore 1
Create: Default TX 1:0 - Memory used (MBUFs 16384 x (size 2176 + Hdr 128)) + 192 = 36865 KB headroom 128 2176
Create: Range TX 1:0 - Memory used (MBUFs 16384 x (size 2176 + Hdr 128)) + 192 = 36865 KB headroom 128 2176
Create: Sequence TX 1:0 - Memory used (MBUFs 16384 x (size 2176 + Hdr 128)) + 192 = 36865 KB headroom 128 2176
Create: Special TX 1:0 - Memory used (MBUFs 64 x (size 2176 + Hdr 128)) + 192 = 145 KB headroom 128 2176
Port memory used = 147601 KB
Initialize Port 1 -- TxQ 1, RxQ 1, Src MAC 00:00:00:00:00:01
Total memory used = 295202 KB
Port 0: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>
!ERROR!: Could not read enough random data for PRNG seed
Port 1: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>
!ERROR!: Could not read enough random data for PRNG seed
=== Display processing on lcore 0
WARNING: Nothing to do on lcore 2: exiting
WARNING: Nothing to do on lcore 3: exiting
RX/TX processing lcore: 1 rx: 2 tx: 2
For RX found 2 port(s) for lcore 1
For TX found 2 port(s) for lcore 1
Pktgen:/>set 0 dst mac 00:00:00:00:00:03
Pktgen:/>set all rate 10
Pktgen:/>set 0 count 10000
Pktgen:/>set 1 count 20000
Pktgen:/>str
| Flags:Port : P--------------:0 P--------------:1 0/0
Link State : P--------------:0 P--------------:1 ----TotalRate----
Pkts/s Max/Rx : <UP-10000-FD> <UP-10000-FD> 0/0
Max/Tx : 0/0 0/0 0/0
MBits/s Rx/Tx : 256/0 256/0 512/0
Broadcast : 0/0 0/0 0/0
Multicast : 0 0
64 Bytes : 0 0
65-127 : 0 0
128-255 : 0 0
256-511 : 0 0
512-1023 : 0 0
1024-1518 : 0 0
Runts/Jumbos : 0 0
Errors Rx/Tx : 0/0 0/0
Total Rx Pkts : 0/0 0/0
Tx Pkts : 0 0
Rx MBs : 256 256
Tx MBs : 0 0
ARP/ICMP Pkts : 0 0
Tx Count/% Rate : 0/0 0/0
Pattern Type : abcd... abcd...
Tx Count/% Rate : 10000 /10% 20000 /10%--------------------
PktSize/Tx Burst : 64 / 64 64 / 64
Src/Dest Port : 1234 / 5678 1234 / 5678--------------------
Pkt Type:VLAN ID : IPv4 / TCP:0001 IPv4 / TCP:0001
802.1p CoS : 0 0--------------------
ToS Value: : 0 0
- DSCP value : 0 0--------------------
- IPP value : 0 0
Dst IP Address : 192.168.1.1 192.168.0.1--------------------
Src IP Address : 192.168.0.1/24 192.168.1.1/24
Dst MAC Address : 00:00:00:00:00:03 00:00:00:00:00:05--------------------
Src MAC Address : 00:00:00:00:00:05 00:00:00:00:00:01
VendID/PCI Addr : 0000:0000/00:00.0 0000:0000/00:00.0--------------------
Pktgen:/> str
-- Pktgen Ver: 3.4.4 (DPDK 17.11.6) Powered by DPDK --------------------------
Pktgen:/>
run testpmd:
./testpmd -c 0xf -n 1 --socket-mem 512,0 --file-prefix testpmd --no-pci \
--vdev 'net_virtio_user2,mac=00:00:00:00:00:02,path=/usr/local/var/run/openvswitch/vhost-user2' \
--vdev 'net_virtio_user3,mac=00:00:00:00:00:03,path=/usr/local/var/run/openvswitch/vhost-user3' \
-- -i -a --burst=64 --txd=2048 --rxd=2048 --coremask=0x4
EAL: Detected 4 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: 1 hugepages of size 1073741824 reserved, but no mounted hugetlbfs found for that size
EAL: Probing VFIO support...
EAL: VFIO support initialized
update_memory_region(): Too many memory regions
update_memory_region(): Too many memory regions
Interactive-mode selected
Auto-start selected
Warning: NUMA should be configured manually by using --port-numa-config and --ring-numa-config parameters along with --numa.
testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=171456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
update_memory_region(): Too many memory regions
update_memory_region(): Too many memory regions
update_memory_region(): Too many memory regions
update_memory_region(): Too many memory regions
Configuring Port 0 (socket 0)
Port 0: 00:00:00:00:00:02
Configuring Port 1 (socket 0)
Port 1: 00:00:00:00:00:03
Checking link statuses...
Done
Start automatic packet forwarding
io packet forwarding - ports=2 - cores=1 - streams=2 - NUMA support enabled, MP allocation mode: native
Logical Core 2 (socket 0) forwards packets on 2 streams:
RX P=0/Q=0 (socket 0) -> TX P=1/Q=0 (socket 0) peer=02:00:00:00:00:01
RX P=1/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
io packet forwarding packets/burst=64
nb forwarding cores=1 - nb forwarding ports=2
port 0: RX queue number: 1 Tx queue number: 1
Rx offloads=0x0 Tx offloads=0x0
RX queue: 0
RX desc=2048 - RX free threshold=0
RX threshold registers: pthresh=0 hthresh=0 wthresh=0
RX Offloads=0x0
TX queue: 0
TX desc=2048 - TX free threshold=0
TX threshold registers: pthresh=0 hthresh=0 wthresh=0
TX offloads=0x0 - TX RS bit threshold=0
port 1: RX queue number: 1 Tx queue number: 1
Rx offloads=0x0 Tx offloads=0x0
RX queue: 0
RX desc=2048 - RX free threshold=0
RX threshold registers: pthresh=0 hthresh=0 wthresh=0
RX Offloads=0x0
TX queue: 0
TX desc=2048 - TX free threshold=0
TX threshold registers: pthresh=0 hthresh=0 wthresh=0
TX offloads=0x0 - TX RS bit threshold=0
testpmd> show port info
Bad arguments
testpmd> show port stats all
######################## NIC statistics for port 0 ########################
RX-packets: 0 RX-missed: 0 RX-bytes: 0
RX-errors: 0
RX-nombuf: 0
TX-packets: 0 TX-errors: 0 TX-bytes: 0
Throughput (since last show)
Rx-pps: 0
Tx-pps: 0
############################################################################
######################## NIC statistics for port 1 ########################
RX-packets: 0 RX-missed: 0 RX-bytes: 0
RX-errors: 0
RX-nombuf: 0
TX-packets: 0 TX-errors: 0 TX-bytes: 0
Throughput (since last show)
Rx-pps: 0
Tx-pps: 0
############################################################################
OVS dump-flows output:
root@k8s:/home/haosp# ovs-ofctl dump-flows ovs-br0
cookie=0x0, duration=77519.972s, table=0, n_packets=0, n_bytes=0, ip,in_port="vhost-user1" actions=output:"vhost-user2"
cookie=0x0, duration=77519.965s, table=0, n_packets=0, n_bytes=0, ip,in_port="vhost-user2" actions=output:"vhost-user1"
cookie=0x0, duration=77519.959s, table=0, n_packets=0, n_bytes=0, ip,in_port="vhost-user0" actions=output:"vhost-user3"
cookie=0x0, duration=77518.955s, table=0, n_packets=0, n_bytes=0, ip,in_port="vhost-user3" actions=output:"vhost-user0"
ovs-ofctl dump-ports output:
root@k8s:/home/haosp# ovs-ofctl dump-ports ovs-br0
OFPST_PORT reply (xid=0x2): 5 ports
port "vhost-user3": rx pkts=0, bytes=0, drop=0, errs=0, frame=?, over=?, crc=?
tx pkts=0, bytes=0, drop=6, errs=?, coll=?
port "vhost-user1": rx pkts=0, bytes=0, drop=0, errs=0, frame=?, over=?, crc=?
tx pkts=0, bytes=0, drop=8, errs=?, coll=?
port "vhost-user0": rx pkts=0, bytes=0, drop=0, errs=0, frame=?, over=?, crc=?
tx pkts=0, bytes=0, drop=8, errs=?, coll=?
port "vhost-user2": rx pkts=0, bytes=0, drop=0, errs=0, frame=?, over=?, crc=?
tx pkts=0, bytes=0, drop=8, errs=?, coll=?
port LOCAL: rx pkts=50, bytes=3732, drop=0, errs=0, frame=0, over=0, crc=0
tx pkts=0, bytes=0, drop=0, errs=0, coll=0
ovs-ofctl show ovs-br0
root@k8s:/home/haosp# ovs-ofctl show ovs-br0
OFPT_FEATURES_REPLY (xid=0x2): dpid:0000ca4f2b8e6b4b
n_tables:254, n_buffers:0
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
1(vhost-user0): addr:00:00:00:00:00:00
config: 0
state: LINK_DOWN
speed: 0 Mbps now, 0 Mbps max
2(vhost-user1): addr:00:00:00:00:00:00
config: 0
state: LINK_DOWN
speed: 0 Mbps now, 0 Mbps max
3(vhost-user2): addr:00:00:00:00:00:00
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
4(vhost-user3): addr:00:00:00:00:00:00
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
LOCAL(ovs-br0): addr:ca:4f:2b:8e:6b:4b
config: 0
state: 0
current: 10MB-FD COPPER
speed: 10 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
ovs-vsctl show
root@k8s:/home/haosp# ovs-vsctl show
635ba448-91a0-4c8c-b6ca-4b9513064d7f
Bridge "ovs-br0"
Port "vhost-user2"
Interface "vhost-user2"
type: dpdkvhostuser
Port "ovs-br0"
Interface "ovs-br0"
type: internal
Port "vhost-user0"
Interface "vhost-user0"
type: dpdkvhostuser
Port "vhost-user3"
Interface "vhost-user3"
type: dpdkvhostuser
Port "vhost-user1"
Interface "vhost-user1"
type: dpdkvhostuser
It seems that pktgen cannot send packets, and the OVS statistics show no packets received either.
I have no idea why yet; it confuses me.
If the goal is to transfer packets between Pktgen and testpmd connected through OVS-DPDK, you have to use a net_vhost and virtio_user pair:
DPDK Pktgen (net_vhost) <==> OVS-DPDK port-1 (virtio_user) {Rule to forward} OVS-DPDK port-2 (virtio_user) <==> DPDK testpmd (net_vhost)
In the current setup, you will have to make the following changes:
1. Start DPDK pktgen with --vdev net_vhost0,iface=/usr/local/var/run/openvswitch/vhost-user0 instead of --vdev net_virtio_user0,mac=00:00:00:00:00:05,path=/usr/local/var/run/openvswitch/vhost-user0
2. Start DPDK testpmd with --vdev 'net_vhost0,iface=/usr/local/var/run/openvswitch/vhost-user2' instead of --vdev 'net_virtio_user2,mac=00:00:00:00:00:02,path=/usr/local/var/run/openvswitch/vhost-user2'
3. Then start OVS-DPDK with --vdev=virtio_user0,path=/usr/local/var/run/openvswitch/vhost-user0 and --vdev=virtio_user1,path=/usr/local/var/run/openvswitch/vhost-user2
4. Add rules to allow port-to-port forwarding between pktgen and testpmd.
Note:
Update the command lines accordingly if you use multiple ports.
A screenshot of the pktgen and l2fwd setup was shared below.
I'm writing a CUDA program for image processing. Same kernel "processOneChannel" will be launched for RGB channels.
Below, I assign a separate stream to each of the three kernel launches so they can run concurrently, but nvprof shows they still run one after another...
There are two other kernels, one before and one after these three, and I don't want those to run concurrently with them.
Basically I want the following:
separateChannels --> processOneChannel(x3) --> recombineChannels
Please advise what I did wrong.
void kernelLauncher(const ushort4 * const h_inputImageRGBA, ushort4 * const d_inputImageRGBA,
                    ushort4* const d_outputImageRGBA, const size_t numRows, const size_t numCols,
                    unsigned short *d_redProcessed,
                    unsigned short *d_greenProcessed,
                    unsigned short *d_blueProcessed,
                    unsigned short *d_prand)
{
    int MAXTHREADSx = 512;
    int MAXTHREADSy = 1;
    int nBlockX = numCols / MAXTHREADSx + 1;
    int nBlockY = numRows / MAXTHREADSy + 1;
    const dim3 blockSize(MAXTHREADSx,MAXTHREADSy,1);
    const dim3 gridSize(nBlockX,nBlockY,1);
    // cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
    // d_red, d_green, d_blue and d_filter are device buffers declared elsewhere (not shown)
    int nstreams = 5;
    cudaStream_t *streams = (cudaStream_t *) malloc(nstreams * sizeof(cudaStream_t));
    for (int i = 0; i < nstreams; i++)
    {
        checkCudaErrors(cudaStreamCreateWithFlags(&(streams[i]),cudaStreamNonBlocking));
    }
    separateChannels<<<gridSize,blockSize>>>(d_inputImageRGBA,
            (int)numRows,
            (int)numCols,
            d_red,
            d_green,
            d_blue);
    cudaDeviceSynchronize();
    checkCudaErrors(cudaGetLastError());
    processOneChannel<<<gridSize,blockSize,0,streams[0]>>>(d_red,
            d_redProcessed,
            (int)numRows,(int)numCols,
            d_filter,d_prand);
    processOneChannel<<<gridSize,blockSize,0,streams[1]>>>(d_green,
            d_greenProcessed,
            (int)numRows,(int)numCols,
            d_filter,d_prand);
    processOneChannel<<<gridSize,blockSize,0,streams[2]>>>(d_blue,
            d_blueProcessed,
            (int)numRows,(int)numCols,
            d_filter,d_prand);
    cudaDeviceSynchronize();
    checkCudaErrors(cudaGetLastError());
    recombineChannels<<<gridSize, blockSize>>>(d_redProcessed,
            d_greenProcessed,
            d_blueProcessed,
            d_outputImageRGBA,
            numRows,
            numCols);
    for (int i = 0; i < nstreams; i++)
    {
        cudaStreamDestroy(streams[i]);
    }
    free(streams);
    cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
}
Here's the nvprof GPU trace output. Note that the memcpys before the kernel launches pass filter data for the processing, so they cannot run concurrently with the kernel launches.
==10001== Profiling result:
Start Duration Grid Size Block Size Regs* SSMem* DSMem* Size Throughput Device Context Stream Name
1.02428s 2.2400us - - - - - 28.125MB 1e+04GB/s GeForce GT 750M 1 13 [CUDA memset]
1.02855s 18.501ms - - - - - 28.125MB 1.4846GB/s GeForce GT 750M 1 13 [CUDA memcpy HtoD]
1.21959s 1.1371ms - - - - - 1.7580MB 1.5098GB/s GeForce GT 750M 1 13 [CUDA memcpy HtoD]
1.22083s 1.3440us - - - - - 7.0313MB 5e+03GB/s GeForce GT 750M 1 13 [CUDA memset]
1.22164s 1.3440us - - - - - 7.0313MB 5e+03GB/s GeForce GT 750M 1 13 [CUDA memset]
1.22243s 3.6480us - - - - - 7.0313MB 2e+03GB/s GeForce GT 750M 1 13 [CUDA memset]
1.22349s 10.240us - - - - - 8.0000KB 762.94MB/s GeForce GT 750M 1 13 [CUDA memcpy HtoD]
1.22351s 6.6021ms (6 1441 1) (512 1 1) 12 0B 0B - - GeForce GT 750M 1 13 separateChannels(...) [123]
1.23019s 10.661ms (6 1441 1) (512 1 1) 36 192B 0B - - GeForce GT 750M 1 14 processOneChannel(...) [133]
1.24085s 10.518ms (6 1441 1) (512 1 1) 36 192B 0B - - GeForce GT 750M 1 15 processOneChannel(...) [141]
1.25137s 10.779ms (6 1441 1) (512 1 1) 36 192B 0B - - GeForce GT 750M 1 16 processOneChannel(...) [149]
1.26372s 5.7810ms (6 1441 1) (512 1 1) 15 0B 0B - - GeForce GT 750M 1 13 recombineChannels(...) [159]
1.26970s 19.859ms - - - - - 28.125MB 1.3831GB/s GeForce GT 750M 1 13 [CUDA memcpy DtoH]
Here's the CMakeLists.txt, where I pass -default-stream per-thread to nvcc:
cmake_minimum_required(VERSION 2.6 FATAL_ERROR)
find_package(OpenCV REQUIRED)
find_package(CUDA REQUIRED)
set(
CUDA_NVCC_FLAGS
${CUDA_NVCC_FLAGS};
-default-stream per-thread
)
file( GLOB hdr *.hpp *.h )
file( GLOB cu *.cu)
SET (My_files main.cpp)
# Project Executable
CUDA_ADD_EXECUTABLE(My ${My_files} ${hdr} ${cu})
target_link_libraries(My ${OpenCV_LIBS})
Each kernel launches 6*1441 = 8646 blocks (well over 8000) of 512 threads each. That fills the machine and prevents blocks from subsequent kernel launches from executing.
The machine has a capacity. The maximum instantaneous capacity in blocks is equal to the number of SMs in your GPU multiplied by the maximum number of blocks per SM, both of which are specifications that you can retrieve with the deviceQuery app. When you fill it up, it cannot process more blocks until some of the already running blocks have retired. This process will continue for the first kernel launch until most of the blocks have retired. Then the second kernel will start executing.
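To put numbers on that, here is a small C sketch of the capacity arithmetic. The function names are hypothetical, and it is a simplification: besides the blocks-per-SM limit it also applies the threads-per-SM limit, but it ignores registers (36 per thread for processOneChannel, per the nvprof trace) and shared memory, which can lower residency further. The values in the usage note are assumptions (Kepler-class limits); read the real ones for your GPU from deviceQuery.

```c
/* Simplified number of blocks that can be resident (executing) at the same
 * instant. Models only the blocks-per-SM and threads-per-SM limits;
 * registers and shared memory are ignored and may reduce it further. */
static int resident_capacity(int num_sms, int max_blocks_per_sm,
                             int max_threads_per_sm, int threads_per_block)
{
    int by_threads = max_threads_per_sm / threads_per_block;
    int per_sm = by_threads < max_blocks_per_sm ? by_threads : max_blocks_per_sm;
    return num_sms * per_sm;
}

/* Number of "waves" needed to drain a grid: ceil(total_blocks / capacity). */
static long waves_needed(long total_blocks, int capacity)
{
    return (total_blocks + capacity - 1) / capacity;
}
```

For example, assuming 2 SMs, 16 blocks/SM and 2048 threads/SM, resident_capacity(2, 16, 2048, 512) gives 8 resident blocks, and waves_needed(6 * 1441, 8) gives 1081 waves. The first processOneChannel grid therefore keeps the GPU full for almost its entire run, and only in the short tail of each kernel can blocks of the next one start, which matches the back-to-back timestamps in the trace. Overlap would only become visible if each grid were small enough to leave SMs idle.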
I want to create my own volume ID from the drive serial + partition offset + partition size, but I need to know how to get the partition information on OS X. I have (unsuccessfully) tried the following:
int fd;
if ((fd = open("/dev/disk0s1", O_RDONLY|O_NONBLOCK)) >= 0) {
    struct hd_geometry geom;
    if (ioctl(fd, 0x0301, &geom) == 0) { // 0x0301 is HDIO_GETGEO
        printf("Index = %lu\n", geom.start); // start is an unsigned long
    }
    close(fd);
}
But even if that succeeded, it would be a flawed solution since, as noted here, hd_geometry.start is an unsigned long and "will not contain a meaningful value for disks over 219 Gb in size." Furthermore, I believe it requires administrative rights, which is also bad. Is there any other way of doing this?
Okay, last point first. Requiring admin rights is necessary because you are trying to read a raw disk; without that restriction you could, for example, seek to a block where a private crypto key is written and read it as an unprivileged user, and then where would we be?
Second, /dev/disk0s1 is just a partition, and it is also the block-device node. You need to read the character-device version of the whole disk, which would be /dev/rdisk0.
Third, HDIO_GETGEO is a Linux kernel ioctl (note its 0x0301 value), so you are not going to get far on Darwin with it. Have a look at <sys/disk.h> for the related disk ioctls; I think DKIOCGETFEATURES, DKIOCGETPHYSICALBLOCKSIZE, etc. should get you going.
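A minimal sketch of the <sys/disk.h> approach, assuming you open the raw device (which, as discussed, needs root). DKIOCGETBLOCKSIZE and DKIOCGETBLOCKCOUNT are the two ioctls that fdisk turns out to use in the dtruss output further down; they return the logical block size and the number of blocks. The function name query_disk_geometry is hypothetical, and the #ifdef only exists so the sketch also compiles (and fails cleanly) on non-Darwin systems.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>
#ifdef __APPLE__
#include <sys/disk.h>
#include <sys/ioctl.h>
#endif

/* Query the logical block size and block count of a raw disk device such as
 * /dev/rdisk0. Returns 0 on success, -1 on failure (errno is set).
 * Only implemented on Darwin; elsewhere it fails with ENOTSUP. */
static int query_disk_geometry(const char *path, uint32_t *block_size,
                               uint64_t *block_count)
{
#ifdef __APPLE__
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (ioctl(fd, DKIOCGETBLOCKSIZE, block_size) < 0 ||
        ioctl(fd, DKIOCGETBLOCKCOUNT, block_count) < 0) {
        int saved = errno;          /* close() may clobber errno */
        close(fd);
        errno = saved;
        return -1;
    }
    close(fd);
    return 0;
#else
    (void)path;
    (void)block_size;
    (void)block_count;
    errno = ENOTSUP;
    return -1;
#endif
}
```

Multiplying the two results gives the disk size in bytes, and the block size is what you should use instead of a hard-coded 512 when turning LBAs into byte offsets.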
If you have trouble with these concepts, I HIGHLY recommend doing this development in a virtual machine that you can clobber, because you do NOT want to accidentally use an ioctl that will screw up your disks.
Addendum (possibly the answer)
GUID Partition Table
So you are working on Mac OS X / Darwin; we'll assume a GUID Partition Table (GPT).
LBA = Logical Block Addressing; 1 block = 512 bytes here.
LBA 0 - (Protective) Master Boot Record (also contains the legacy partition table)
LBA 1 - GPT header (standard for OS X)
LBA 2 - first 4 partition entries (128 bytes each)
LBA 3 - 33 - next 124 entries, for a total of 128 entries
LBA 34 - first usable block (Partition 1 starts here or later)
You can read the second block (LBA 1, the GPT header) and follow the information from there.
Have a read of http://en.wikipedia.org/wiki/GUID_Partition_Table
It's quite well defined. GPT uses little-endian byte order for integer values (see the examples at the bottom of the Wikipedia page).
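A minimal C sketch of decoding the GPT header and one partition entry, assuming LBA 1 and the entry array have already been read into memory. The offsets follow the on-disk layout in the Wikipedia article (header: "EFI PART" signature at 0, first usable LBA at 40, last usable LBA at 48, entry-array LBA at 72, entry count at 80, entry size at 84; entry: type GUID at 0, first/last LBA at 32/40). The struct and function names are hypothetical, and only the fields needed for an offset + size volume ID are kept.

```c
#include <stdint.h>
#include <string.h>

/* GPT integers are little-endian; decode byte by byte so the code works
 * regardless of host endianness. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint64_t le64(const uint8_t *p)
{
    return (uint64_t)le32(p) | (uint64_t)le32(p + 4) << 32;
}

/* Selected fields of the GPT header stored in LBA 1. */
struct gpt_header {
    uint64_t first_usable_lba;
    uint64_t last_usable_lba;
    uint64_t entries_lba;      /* usually 2 */
    uint32_t num_entries;      /* usually 128 */
    uint32_t entry_size;       /* usually 128 bytes */
};

/* Selected fields of one partition entry. */
struct gpt_entry {
    uint8_t  type_guid[16];    /* all zeros means an unused slot */
    uint64_t first_lba;
    uint64_t last_lba;         /* inclusive */
};

/* Parse the header block; returns 0 if the "EFI PART" signature matches. */
static int gpt_parse_header(const uint8_t *blk, struct gpt_header *h)
{
    if (memcmp(blk, "EFI PART", 8) != 0)
        return -1;
    h->first_usable_lba = le64(blk + 40);
    h->last_usable_lba  = le64(blk + 48);
    h->entries_lba      = le64(blk + 72);
    h->num_entries      = le32(blk + 80);
    h->entry_size       = le32(blk + 84);
    return 0;
}

/* Parse one partition entry (entry_size bytes, normally 128). */
static void gpt_parse_entry(const uint8_t *e, struct gpt_entry *out)
{
    memcpy(out->type_guid, e, 16);
    out->first_lba = le64(e + 32);
    out->last_lba  = le64(e + 40);
}

/* Partition offset and size in bytes, given the device block size. */
static uint64_t gpt_offset_bytes(const struct gpt_entry *e, uint32_t bs)
{
    return e->first_lba * bs;
}

static uint64_t gpt_size_bytes(const struct gpt_entry *e, uint32_t bs)
{
    return (e->last_lba - e->first_lba + 1) * bs;
}
```

Entry i lives at byte entries_lba * block_size + i * entry_size; gpt_offset_bytes and gpt_size_bytes then give exactly the partition offset and size the question wants to combine with the drive serial.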
Suggestion for testing
Make a copy so that you are not screwing with the actual disks:
dd if=/dev/rdisk0 of=fakedisk count=34
This creates a copy of the first 34 blocks of the disk (LBA 0 to 33), i.e. everything up to the end of the partition entry array.
Use fakedisk to test your program.
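A minimal POSIX sketch of reading one block by LBA, which works the same on the fakedisk image and on /dev/rdisk0, assuming you pass the real block size. The function names are hypothetical. Offset and length are both multiples of the block size, which raw character devices require.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Read block number `lba` (block_size bytes) from an open descriptor.
 * Returns 0 on success, -1 on error; a short read also counts as an error. */
static int read_lba(int fd, uint64_t lba, uint32_t block_size, void *buf)
{
    ssize_t n = pread(fd, buf, block_size, (off_t)(lba * block_size));
    return n == (ssize_t)block_size ? 0 : -1;
}

/* Convenience wrapper: open a disk image (e.g. the "fakedisk" made by dd)
 * read-only, read one block, close it. Returns 0 on success, -1 on error. */
static int read_lba_from_path(const char *path, uint64_t lba,
                              uint32_t block_size, void *buf)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    int rc = read_lba(fd, lba, block_size, buf);
    close(fd);
    return rc;
}
```

Reading LBA 1 of fakedisk into a 512-byte buffer gives you the GPT header to parse.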
MBR
If your disk uses MBR, apply the same concepts as for GPT;
http://en.wikipedia.org/wiki/Master_Boot_Record
has an excellent description of the sectors.
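For reference, a minimal C sketch of decoding the four primary MBR entries, using the layout from the Wikipedia article: the table starts at byte 446, each entry is 16 bytes (type at +4, first LBA at +8, sector count at +12, little-endian), and the sector ends with the signature 0x55 0xAA. Names are hypothetical. On a GPT disk you should see a single protective entry of type 0xEE, which is what the first line of the fdisk dump below (1,625142447,0xEE,...) shows.

```c
#include <stdint.h>

/* One of the four primary partition entries in an MBR (sector 0). */
struct mbr_entry {
    uint8_t  status;       /* 0x80 = bootable */
    uint8_t  type;         /* 0xEE = protective MBR of a GPT disk */
    uint32_t first_lba;
    uint32_t num_sectors;
};

static uint32_t mbr_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Parse the partition table of a 512-byte MBR sector.
 * Returns 0 on success, -1 if the 0x55AA boot signature is missing. */
static int mbr_parse(const uint8_t sector[512], struct mbr_entry out[4])
{
    if (sector[510] != 0x55 || sector[511] != 0xAA)
        return -1;
    for (int i = 0; i < 4; i++) {
        const uint8_t *e = sector + 446 + 16 * i;
        out[i].status      = e[0];
        out[i].type        = e[4];
        out[i].first_lba   = mbr_le32(e + 8);
        out[i].num_sectors = mbr_le32(e + 12);
    }
    return 0;
}
```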
Using dtruss fdisk -d /dev/rdisk0 dump to get hints
Running fdisk -d under dtruss shows that fdisk uses the approach described above.
dtruss fdisk -d /dev/rdisk0
SYSCALL(args) = return
open("/dev/dtracehelper\0", 0x2, 0x7FFF5CFDD5C0) = 3 0
__sysctl(0x7FFF5CFDD084, 0x2, 0x7FFF5CFDD070) = 0 0
bsdthread_register(0x7FFF8BCA41D4, 0x7FFF8BCA41C4, 0x2000) = 0 0
[[ .... content edited ... ]]
open("/dev/rdisk0\0", 0x0, 0x7FFF5CFDDD7A) = 3 0
fstat64(0x3, 0x7FFF5CFDDA10, 0x0) = 0 0
fstat64(0x3, 0x7FFF5CFDDAC8, 0x0) = 0 0
ioctl(0x3, 0x40086419, 0x7FFF5CFDDB60) = 0 0
ioctl(0x3, 0x40046418, 0x7FFF5CFDDB5C) = 0 0
close(0x3) = 0 0
open("/dev/rdisk0\0", 0x0, 0x0) = 3 0
fstat64(0x3, 0x7FFF5CFDDAD0, 0x0) = 0 0
open("/dev/rdisk0\0", 0x0, 0x0) = 4 0
fstat64(0x4, 0x7FFF5CFDDA80, 0x0) = 0 0
lseek(0x4, 0x0, 0x0) = 0 0
issetugid(0x102C22000, 0x3, 0x7FFF5CFDDC00) = 0 0
geteuid(0x102C22000, 0x3, 0x0) = 0 0
[[ tracing data suppressed ]]
read(0x4, "\0", 0x200) = 512 0
close(0x4) = 0 0
getrlimit(0x1008, 0x7FFF5CFDCFA8, 0x7FFF8BD0D470) = 0 0
fstat64(0x1, 0x7FFF5CFDCEF8, 0x7FFF5CFDCFBC) = 0 0
ioctl(0x1, 0x4004667A, 0x7FFF5CFDCF94) = 0 0
write_nocancel(0x1, "1,625142447,0xEE,-,1023,254,63,1023,254,63\n\0", 0x2B) = 43 0
write_nocancel(0x1, "0,0,0x00,-,0,0,0,0,0,0\n\0", 0x17) = 23 0
write_nocancel(0x1, "0,0,0x00,-,0,0,0,0,0,0\n\0", 0x17) = 23 0
write_nocancel(0x1, "0,0,0x00,-,0,0,0,0,0,0\n\0", 0x17) = 23 0
close(0x3) = 0 0
deciphering ioctls
How did I figure out which ioctls are used?
The dtruss dump shows:
ioctl(0x3, 0x40086419, 0x7FFF5CFDDB60) = 0 0
ioctl(0x3, 0x40046418, 0x7FFF5CFDDB5C) = 0 0
0x40046418 corresponds to DKIOCGETBLOCKSIZE (and 0x40086419 to DKIOCGETBLOCKCOUNT).
This is gleaned by tracing back through disk.h (noting that _IOR expands to _IOC in ioccom.h): the last 8 bits correspond to the second argument in the ioctl constant's #define.
#define DKIOCGETBLOCKSIZE _IOR('d', 24, uint32_t)
In 0x40046418, the trailing 0x18 (hex) == 24 (dec).
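The same decoding can be checked mechanically. A small C sketch that re-implements the BSD/Darwin _IOR layout for illustration; the DEMO_ macro and the function name are hypothetical, chosen so they do not clash with the real <sys/ioccom.h>. Encoding ('d', 24, uint32_t) gives 0x40046418 and ('d', 25, uint64_t) gives 0x40086419, i.e. the two ioctls in the dtruss dump.

```c
#include <stdint.h>

/* Illustrative copy of the BSD/Darwin ioctl number layout:
 *   bits 29-31: direction (0x40000000 = IOC_OUT, data copied out to user)
 *   bits 16-28: size of the argument in bytes (mask 0x1fff)
 *   bits  8-15: group character, e.g. 'd' for disk ioctls
 *   bits  0-7 : command number within the group */
#define DEMO_IOC_OUT       0x40000000u
#define DEMO_IOCPARM_MASK  0x1fffu
#define DEMO_IOR(g, n, t)  (DEMO_IOC_OUT | \
    (((uint32_t)sizeof(t) & DEMO_IOCPARM_MASK) << 16) | \
    ((uint32_t)(g) << 8) | (uint32_t)(n))

struct ioctl_fields {
    uint32_t dir;    /* top 3 bits, e.g. 0x40000000 */
    uint32_t size;   /* argument size in bytes */
    char     group;  /* e.g. 'd' */
    uint32_t num;    /* command number */
};

static struct ioctl_fields decode_ioctl(uint32_t cmd)
{
    struct ioctl_fields f;
    f.dir   = cmd & 0xe0000000u;
    f.size  = (cmd >> 16) & DEMO_IOCPARM_MASK;
    f.group = (char)((cmd >> 8) & 0xffu);
    f.num   = cmd & 0xffu;
    return f;
}
```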
So fdisk performs DKIOCGETBLOCKCOUNT and DKIOCGETBLOCKSIZE to get the physical extents, and technically you should use the results of those calls to compute LBA offsets.
The actual read in fdisk
This is how fdisk does it:
open("/dev/rdisk0\0", 0x0, 0x0) = 4 0
read(0x4, "\0", 0x200) = 512 0
close(0x4) = 0 0
You can follow suit; just make sure you replace the 0x200 with the actual block size.
Also, if you're going to use the dd command above to make a copy, use the block size reported here.
Have you tried the DKIOCGETPHYSICALEXTENT ioctl? It fills in a dk_physical_extent_t structure that includes a 64-bit offset and a 64-bit length.
Using:
#include <stdint.h>

static inline uint64_t rdtsc(void)
{
    uint32_t cycles_high;
    uint32_t cycles_low;
    // CPUID serializes the pipeline so RDTSC is not executed early;
    // RDTSC leaves the counter in EDX:EAX.
    asm volatile ("CPUID\n\t"
                  "RDTSC\n\t"
                  "mov %%edx, %0\n\t"
                  "mov %%eax, %1\n\t"
                  : "=r" (cycles_high), "=r" (cycles_low)
                  :
                  : "%rax", "%rbx", "%rcx", "%rdx");
    return ((uint64_t)cycles_high << 32) | cycles_low;
}
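Thread 1 runs:
// Assumes globalIndex is declared volatile (or read atomically); a plain
// int spin loop may be optimized so that it never sees the other thread's update.
while (globalIndex < COUNT)
{
    // spin while it is thread 2's turn (even index)
    while (globalIndex % 2 == 0 && globalIndex < COUNT)
        ;
    cycles[globalIndex][0] = rdtsc();
    cycles[globalIndex][1] = cpuToBindTo;
    __sync_add_and_fetch(&globalIndex, 1);
}
Thread 2 runs:
while (globalIndex < COUNT)
{
    // spin while it is thread 1's turn (odd index)
    while (globalIndex % 2 == 1 && globalIndex < COUNT)
        ;
    cycles[globalIndex][0] = rdtsc();
    cycles[globalIndex][1] = cpuToBindTo;
    __sync_add_and_fetch(&globalIndex, 1);
}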
I am seeing:
CPU rdtsc() t1-t0
11 = 5023231563212740 990
03 = 5023231563213730 310
11 = 5023231563214040 990
03 = 5023231563215030 310
11 = 5023231563215340 990
03 = 5023231563216330 310
11 = 5023231563216640 990
03 = 5023231563217630 310
11 = 5023231563217940 990
03 = 5023231563218930 310
11 = 5023231563219240 990
03 = 5023231563220230 310
11 = 5023231563220540 990
03 = 5023231563221530 310
11 = 5023231563221840 990
03 = 5023231563222830 310
11 = 5023231563223140 990
03 = 5023231563224130 310
11 = 5023231563224440 990
03 = 5023231563225430 310
11 = 5023231563225740 990
03 = 5023231561739842 310
11 = 5023231561740152 990
03 = 5023231561741142 310
11 = 5023231561741452 12458
03 = 5023231561753910 458
11 = 5023231561754368 1154
03 = 5023231561755522 318
11 = 5023231561755840 982
03 = 5023231561756822 310
11 = 5023231561757132 990
03 = 5023231561758122 310
11 = 5023231561758432 990
03 = 5023231561759422 310
I'm not sure how I got a pong of 12458, but I was wondering why I'm seeing 310-990-310 instead of 650-650-650. I thought the TSC was supposed to be synchronized across cores. My constant_tsc CPU flag is on.
What are you running this code on? TSC synchronization is supposed to be done in the OS/kernel and is hardware dependent. For instance, you might pass a flag like powernow-k8.tscsync=1 to the kernel boot parameters via your bootloader.
You need to search for the correct TSC synchronization method for your combination of OS and hardware. By and large, this is all automated, so I wouldn't be surprised if you're running a custom kernel or non-i686 hardware.
If you search on Google with the correct terms, you'll find a lot of resources, such as mailing-list discussions on this topic. For instance, here's one algorithm being discussed (though apparently it's not a good one). However, it's not something that userland developers should have to worry about; this is arcane sorcery that only kernel devs need to trouble themselves with.
Basically, it's the OS's job, at boot time, to synchronize the TSC counters between all the different processors and/or cores on an SMP machine, within a certain margin of error. If you're seeing numbers that are that wildly off, there's something wrong with the TSC sync, and your time would be better spent finding out why your OS hasn't synced the TSCs correctly rather than trying to implement your own TSC sync algorithm.
Do you have a NUMA memory architecture? The global counter could be located in RAM that is a couple hops away for one of the CPUs and local for the other. You can test this by fixing your threads to cores on the same NUMA node.
EDIT: I was guessing this because the performance was CPU-specific.
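To run that test, each thread can pin itself to a chosen core before the ping-pong starts. A Linux-only sketch using sched_setaffinity from glibc (which needs _GNU_SOURCE); the function names are hypothetical. Which CPU numbers share a NUMA node has to be looked up on your machine, for example with lscpu or numactl --hardware.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Linux-only: lowest-numbered CPU this process may run on, or -1 on error. */
static int first_allowed_cpu(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof set, &set) != 0)
        return -1;
    for (int c = 0; c < CPU_SETSIZE; c++)
        if (CPU_ISSET(c, &set))
            return c;
    return -1;
}

/* Linux-only: restrict the calling thread (pid 0) to a single CPU.
 * Returns 0 on success, -1 on error (errno set). */
static int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof set, &set);
}
```

Each thread would call pin_to_cpu(cpuToBindTo) itself at startup, with the two CPU numbers taken from the same node.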
EDIT: As to synchronizing the TSC, I am not aware of an easy way, which is not to say that there isn't one! What would happen if you took core 1 as the reference clock and then compared it to core 2? If you did that comparison many times and took the minimum, you might have a good approximation. This should handle the case where you get preempted in the middle of a comparison.
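The min-of-many idea can be sketched in C on the question's own ping-pong numbers. Names are hypothetical, and the key assumption (not established by the source) is that the one-way latency is the same in both directions; the same numbers are equally consistent with asymmetric latency, which is what the NUMA hypothesis above suggests, and this arithmetic cannot tell the two apart.

```c
#include <stddef.h>
#include <stdint.h>

/* Estimate the TSC offset between two cores from ping-pong deltas.
 *   ab[i] = tsc_B(receive) - tsc_A(send) = latency + offset
 *   ba[i] = tsc_A(receive) - tsc_B(send) = latency - offset
 * Assuming equal latency in both directions, the minima (least disturbed
 * by preemption or interrupts) give
 *   offset  = (min_ab - min_ba) / 2
 *   latency = (min_ab + min_ba) / 2
 * A positive offset means core B's counter reads ahead of core A's. */
struct tsc_estimate {
    int64_t offset;
    int64_t latency;
};

/* Minimum of n >= 1 values. */
static int64_t min_of(const int64_t *v, size_t n)
{
    int64_t m = v[0];
    for (size_t i = 1; i < n; i++)
        if (v[i] < m)
            m = v[i];
    return m;
}

/* Both arrays must contain at least one sample. */
static struct tsc_estimate estimate_tsc_offset(const int64_t *ab, size_t nab,
                                               const int64_t *ba, size_t nba)
{
    int64_t mab = min_of(ab, nab);
    int64_t mba = min_of(ba, nba);
    struct tsc_estimate e = { (mab - mba) / 2, (mab + mba) / 2 };
    return e;
}
```

With the question's data (CPU 11 to CPU 03 deltas such as 990, 982, 1154 and 12458; CPU 03 to CPU 11 deltas such as 310, 318 and 458), the minima 982 and 310 give an offset of 336 cycles and a latency of 646 cycles. Under the equal-latency assumption, that reads as the expected ~650-cycle handoff plus a constant skew of roughly 340 cycles between the two cores' counters.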