boards/raspberrypi-4b: Implement SMP support#18861

Draft
linguini1 wants to merge 2 commits into
apache:masterfrom
linguini1:rpi4b-smp

@linguini1
Contributor

Summary

Implements SMP support for the Raspberry Pi 4B on all four cores using the SMP
spin tables on the BCM2711.

Impact

Users can leverage all four cores on NuttX now! 4x more powerful.

Closes #16954.

Part of GSoC #18507!

Testing

OSTest and the SMP test run successfully on the RPi4B. However, there are artifacts in the early boot logging information on startup. I suspect this is because all cores are starting from _start and competing to print the early output before spinlocks are used to protect the console.
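A minimal sketch of the kind of console protection described above, assuming a C11 toolchain. Every name here (`early_puts_locked`, `early_console_trylock`, `g_console`, and so on) is hypothetical and not a NuttX API, and the `g_console` buffer only stands in for the UART. On real hardware, atomic exclusive accesses may not be usable before the MMU and caches are enabled, so this may not apply as-is to the very first boot messages.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the UART: characters are collected in a buffer. */

static char g_console[256];
static size_t g_console_len;

/* One lock shared by every core that prints during early boot. */

static atomic_flag g_early_console_lock = ATOMIC_FLAG_INIT;

static void early_putc(char c)
{
  if (g_console_len < sizeof(g_console) - 1)
    {
      g_console[g_console_len++] = c;
    }
}

/* Returns true if the lock was free and is now held by the caller. */

static bool early_console_trylock(void)
{
  return !atomic_flag_test_and_set_explicit(&g_early_console_lock,
                                            memory_order_acquire);
}

static void early_console_unlock(void)
{
  atomic_flag_clear_explicit(&g_early_console_lock, memory_order_release);
}

/* Emit one whole line while holding the lock, so that lines from
 * different cores cannot interleave character by character.
 */

static void early_puts_locked(const char *line)
{
  assert(line != NULL);

  while (!early_console_trylock())
    {
      /* Spin until the current owner releases the console. */
    }

  for (; *line != '\0'; line++)
    {
      early_putc(*line);
    }

  early_putc('\n');
  early_console_unlock();
}
```

The lock is taken per line rather than per character, which is what keeps messages such as "- Boot from EL2" intact when several cores print at once.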

logs coming soon

Any advice on a better jump point to start the cores is appreciated! Not sure if there are any other examples of spin-table SMP devices in the NuttX tree (or at least, I haven't found them).
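For readers unfamiliar with spin tables, here is a small host-side model of the mechanism, written as a sketch under stated assumptions rather than as the BCM2711 implementation. The array `g_spintbl` and the helpers `spintbl_release` and `spintbl_poll` are hypothetical names; on the real board the slots are fixed memory locations that the firmware stub polls, and the address 0x80000 used in the usage note is only an example value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NCPUS 4

/* Hypothetical stand-in for the spin table: one 64-bit release slot per
 * core. A slot holding 0 means "stay parked".
 */

static _Atomic uint64_t g_spintbl[NCPUS];

/* Primary core side: publish the entry address for one secondary core. */

static void spintbl_release(unsigned int cpu, uint64_t entry)
{
  assert(cpu > 0 && cpu < NCPUS && entry != 0);
  atomic_store_explicit(&g_spintbl[cpu], entry, memory_order_release);

  /* On real AArch64 hardware a wake-up event (typically `sev`) would
   * usually follow, because parked cores tend to wait in `wfe`.
   */
}

/* Secondary core side: one iteration of the parking loop. Returns 0 while
 * the core should stay parked, otherwise the address to jump to.
 */

static uint64_t spintbl_poll(unsigned int cpu)
{
  assert(cpu < NCPUS);
  return atomic_load_explicit(&g_spintbl[cpu], memory_order_acquire);
}
```

Calling `spintbl_release(1, 0x80000)` makes a later `spintbl_poll(1)` return 0x80000 while the other slots still read 0. Releasing cores one at a time, instead of writing all slots in one loop, is one way a port could control when each core starts printing.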

@github-actions github-actions Bot added the Area: Documentation, Arch: arm64, Size: M, and Board: arm64 labels May 10, 2026

  for (uint8_t cpu = 0; cpu < CONFIG_SMP_NCPUS; cpu++)
    {
      putreg64((uint64_t)_start, BCM_SPINTBL_CPU(cpu));
    }
Contributor Author

@xiaoxiang781216 has Xiaomi ever used a device with spin tables for SMP? Could you inform me if this is a good approach, or if there is a better starting location to jump to after CPU0 has initialized memory and subsystems?

Contributor

@simbit18 @pussuw @mzanders @tmedicci @fdcavalcanti @raiden00pl @acassis could you please check whether the SMP init is in the right place? :-)

Contributor

@linguini1 please take a look at my somewhat lengthy search in the PR comments. It looks like there is instrumentation in place already; do we just need to set CONFIG_SMP_NCPUS in the configuration? If that does not work, then this arch-specific code should probably go into arch/arm64/src/bcm2711/bcm2711_cpustart.c? :-)

Also, CONFIG_SCHED_INSTRUMENTATION may be our friend here :-)

Contributor Author

Just setting NCPUs did not work before, but I will try with custom stuff in cpustart.c!

acassis previously approved these changes May 10, 2026
@cederom
Contributor

cederom commented May 10, 2026

Thank you @linguini1, I will take a look and test on hardware in a few hours :-)

I also found and ordered an ESP32-P4-Nano board for ~25EUR, so we will be able to test this hardware too :-)

@linguini1
Contributor Author

Awesome news! Thank you!

@cederom cederom added this to the RPi4B milestone May 10, 2026
linguini1 added 2 commits May 10, 2026 16:56
Adds SMP support and an `smp` configuration for the RPi4B.

Signed-off-by: Matteo Golin <matteo.golin@gmail.com>
Documents the new SMP configuration.

Signed-off-by: Matteo Golin <matteo.golin@gmail.com>
@cederom
Contributor

cederom commented May 11, 2026

Okay, here goes the testing :-) Everything seems to work well :-)

Great work @linguini1 =)

We may want to put some more benchmarks into this config to see what the performance gain with SMP is :-)

I need to get some sleep; tomorrow I will try to read more about this putreg64((uint64_t)_start, BCM_SPINTBL_CPU(cpu)); approach and how other archs implement it :-)

% uname -a
FreeBSD hexagon 14.4-RELEASE-p3 FreeBSD 14.4-RELEASE-p3 GENERIC amd64

% git reflog -1
195077a7b6c (HEAD, linguini1/rpi4b-smp) HEAD@{0}: checkout: moving from master to linguini1/rpi4b-smp

% ./tools/configure.sh -B raspberrypi-4b:smp
  Copy files
  Select CONFIG_HOST_BSD=y
  Refreshing...
(..)
#
# configuration written to .config
#
hexagon% /usr/bin/time -h gmake -j
Create version.h
LN: platform/board to /XXX/nuttx/pr/nuttx-apps.git/platform/dummy
Register: getprime
Register: ostest
Register: smp
Register: dd
Register: nsh
Register: sh
Register: hello
LD: nuttx                                                                                                                                          
Memory region         Used Size  Region Size  %age Used
CP: nuttx.hex
CP: nuttx.bin
Generating config.txt
	10,85s real		40,41s user		15,95s sys

% gmake flash
LD: nuttx                                                                                                                                          
Memory region         Used Size  Region Size  %age Used
CP: nuttx.hex
CP: nuttx.bin
Generating config.txt

% cp nuttx.bin rpi4b-nuttx
% cp config.txt rpi4b-nuttx

% ./rpiboot -v -m 1000 -d rpi4b-nuttx
RPIBOOT: build-date 2025/11/10 pkg-version local fe4a6288

Please fit the EMMC_DISABLE / nRPIBOOT jumper before connecting the power and USB cables to the target device.
If the device fails to connect then please see https://rpltd.co/rpiboot for debugging tips.

Boot directory 'rpi4b-nuttx'
Loading: rpi4b-nuttx/bootcode4.bin
Waiting for BCM2835/6/7/2711/2712...

Device located successfully
Loading: rpi4b-nuttx/bootcode4.bin
Initialised device correctly
Found serial number 4
last_serial -1 serial 4
Second stage boot server
Received message GetFileSize: config.txt
Loading: rpi4b-nuttx/config.txt
File size = 47 bytes
Received message ReadFile: config.txt
File read: config.txt
libusb_bulk_transfer sent 47 bytes; returned 0
Received message GetFileSize: pieeprom.sig
libusb_bulk_transfer sent 0 bytes; returned 0
Cannot open file pieeprom.sig
Received message GetFileSize: recover4.elf
libusb_bulk_transfer sent 0 bytes; returned 0
Cannot open file recover4.elf
Received message GetFileSize: recovery.elf
libusb_bulk_transfer sent 0 bytes; returned 0
Cannot open file recovery.elf
Received message GetFileSize: start4.elf
Loading: rpi4b-nuttx/start4.elf
File size = 2299008 bytes
Received message ReadFile: start4.elf
File read: start4.elf
libusb_bulk_transfer sent 2299008 bytes; returned 0
Received message GetFileSize: fixup4.dat
Loading: rpi4b-nuttx/fixup4.dat
File size = 5496 bytes
Received message ReadFile: fixup4.dat
File read: fixup4.dat
libusb_bulk_transfer sent 5496 bytes; returned 0
Second stage boot server done


% ./rpiboot -v -m 1000 -d rpi4b-nuttx
RPIBOOT: build-date 2025/11/10 pkg-version local fe4a6288

Please fit the EMMC_DISABLE / nRPIBOOT jumper before connecting the power and USB cables to the target device.
If the device fails to connect then please see https://rpltd.co/rpiboot for debugging tips.

Boot directory 'rpi4b-nuttx'
Loading: rpi4b-nuttx/bootcode4.bin
Waiting for BCM2835/6/7/2711/2712...

Device located successfully
Loading embedded: bootcode.bin
Initialised device correctly
Found serial number 1
last_serial -1 serial 1
Second stage boot server
Received message GetFileSize: recovery.elf
libusb_bulk_transfer sent 0 bytes; returned 0
Cannot open file recovery.elf
Received message GetFileSize: config.txt
Loading: rpi4b-nuttx/config.txt
File size = 47 bytes
Received message ReadFile: config.txt
File read: config.txt
libusb_bulk_transfer sent 47 bytes; returned 0
Received message GetFileSize: dt-blob.bin
libusb_bulk_transfer sent 0 bytes; returned 0
Cannot open file dt-blob.bin
Received message GetFileSize: recovery.elf
libusb_bulk_transfer sent 0 bytes; returned 0
Cannot open file recovery.elf
Received message GetFileSize: config.txt
Loading: rpi4b-nuttx/config.txt
File size = 47 bytes
Received message ReadFile: config.txt
File read: config.txt
libusb_bulk_transfer sent 47 bytes; returned 0
Received message GetFileSize: bootcfg.txt
libusb_bulk_transfer sent 0 bytes; returned 0
Cannot open file bootcfg.txt
Received message GetFileSize: nuttx.bin
Loading: rpi4b-nuttx/nuttx.bin
File size = 393216 bytes
Received message GetFileSize: bcm2711-rpi-4-b.dtb
Loading: rpi4b-nuttx/bcm2711-rpi-4-b.dtb
File size = 56289 bytes
Received message ReadFile: bcm2711-rpi-4-b.dtb
File read: bcm2711-rpi-4-b.dtb
libusb_bulk_transfer sent 56289 bytes; returned 0
Received message GetFileSize: overlays/overlay_map.dtb
libusb_bulk_transfer sent 0 bytes; returned 0
Cannot open file overlays/overlay_map.dtb
Received message GetFileSize: config.txt
Loading: rpi4b-nuttx/config.txt
File size = 47 bytes
Received message ReadFile: config.txt
File read: config.txt
libusb_bulk_transfer sent 47 bytes; returned 0
Received message GetFileSize: cmdline.txt
libusb_bulk_transfer sent 0 bytes; returned 0
Cannot open file cmdline.txt
Received message GetFileSize: armstub8-gic.bin
libusb_bulk_transfer sent 0 bytes; returned 0
Cannot open file armstub8-gic.bin
Received message GetFileSize: nuttx.bin
Loading: rpi4b-nuttx/nuttx.bin
File size = 393216 bytes
Received message ReadFile: nuttx.bin
File read: nuttx.bin
libusb_bulk_transfer sent 393216 bytes; returned 0
Received message Done: nuttx.bin
CMD exit
Second stage boot server done


% minicom /dev/cuaU0

Welcome to minicom 2.10

OPTIONS: I18n
Compiled on Mar  2 2026, 17:08:02.
Port /dev/cuaU0, 02:50:56 [U]

Press CTRL-A Z for help on special keys

- Ready to Boot Primary CPU
- Boot from EL2
- Boot from EL1
- Boot to C runtime for OS Initialize

NuttShe- Ready to Boot Second CPU
- Boot from EL2
ll (NSH) NuttX-10.4.0
nsh> ---   BBBoot from EL1
- B--o  ot ot CorCnrunt mo for OS tiitializ


nsh> uname -a
NuttX 10.4.0 195077a7b6c May 11 2026 02:49:21 arm64 raspberrypi-4b

nsh> ?
help usage:  help [-v] [<cmd>]

    .           cp          expr        mkrd        set         truncate
    [           cmp         false       mount       kill        uname
    ?           dirname     fdinfo      mv          pkill       umount
    alias       df          free        pidof       sleep       unset
    unalias     dmesg       memdump     printf      usleep      uptime
    basename    echo        help        ps          source      watch
    break       env         hexdump     pwd         test        xd
    cat         exec        ls          rm          time        wait
    cd          exit        mkdir       rmdir       true

Builtin Apps:
    dd          hello       ostest      smp
    getprime    nsh         sh


nsh> smp
  Main[0]: Running on CPU1
  Main[0]: Initializing barrier
  Main[0]: Thread 1 created
Thread[1]: Started
Thread[1]: Running on CPU0
  Main[0]: Thread 2 created
Thread[2]: Started
Thread[2]: Running on CPU2
  Main[0]: Thread 3 created
Thread[3]: Started
  Main[0]: Thread 4 created
Thread[4]: Started
Thread[3]: Running on CPU1
  Main[0]: Thread 5 created
Thread[5]: Started
Thread[4]: Running on CPU1
  Main[0]: Now running on CPU3
Thread[5]: Running on CPU3
Thread[2]: Now running on CPU0
  Main[0]: Thread 6 created
Thread[6]: Started
Thread[3]: Now running on CPU3
  Main[0]: Thread 7 created
Thread[4]: Now running on CPU3
Thread[6]: Running on CPU3
Thread[7]: Started
  Main[0]: Now running on CPU0
Thread[5]: Now running on CPU0
Thread[7]: Running on CPU3
Thread[2]: Now running on CPU3
  Main[0]: Thread 8 created
Thread[8]: Started
Thread[1]: Now running on CPU1
Thread[4]: Now running on CPU1
  Main[0]: Now running on CPU2
Thread[3]: Now running on CPU2
Thread[8]: Running on CPU2
Thread[5]: Now running on CPU2
Thread[7]: Now running on CPU1
Thread[2]: Now running on CPU1
Thread[6]: Now running on CPU0
Thread[1]: Now running on CPU3
  Main[0]: Now running on CPU3
Thread[8]: Now running on CPU3
Thread[5]: Now running on CPU1
Thread[7]: Now running on CPU3
Thread[4]: Now running on CPU3
Thread[2]: Now running on CPU2
Thread[3]: Now running on CPU0
Thread[6]: Now running on CPU1
Thread[7]: Now running on CPU1
Thread[8]: Now running on CPU1
Thread[4]: Now running on CPU2
Thread[2]: Now running on CPU0
Thread[1]: Now running on CPU0
Thread[5]: Now running on CPU3
Thread[3]: Now running on CPU3
Thread[6]: Now running on CPU0
Thread[7]: Now running on CPU3
Thread[4]: Now running on CPU1
Thread[2]: Now running on CPU1
Thread[3]: Now running on CPU2
Thread[5]: Now running on CPU2
Thread[8]: Now running on CPU2
Thread[4]: Now running on CPU0
Thread[5]: Now running on CPU0
Thread[6]: Now running on CPU1
Thread[3]: Now running on CPU3
Thread[7]: Now running on CPU2
Thread[8]: Now running on CPU1
Thread[6]: Now running on CPU3
Thread[2]: Now running on CPU2
Thread[3]: Now running on CPU0
Thread[4]: Now running on CPU1
Thread[5]: Now running on CPU3
Thread[7]: Now running on CPU0
Thread[1]: Now running on CPU1
Thread[1]: Calling pthread_barrier_wait()
Thread[3]: Calling pthread_barrier_wait()
Thread[4]: Now running on CPU0
Thread[5]: Now running on CPU2
Thread[7]: Now running on CPU3
Thread[2]: Now running on CPU0
Thread[2]: Calling pthread_barrier_wait()
Thread[6]: Now running on CPU0
Thread[4]: Now running on CPU2
Thread[4]: Calling pthread_barrier_wait()
Thread[5]: Calling pthread_barrier_wait()
Thread[6]: Calling pthread_barrier_wait()
Thread[7]: Calling pthread_barrier_wait()
Thread[8]: Calling pthread_barrier_wait()
Thread[1]: Back with ret=0 (I am not special)
Thread[3]: Back with ret=0 (I am not special)
Thread[2]: Back with ret=0 (I am not special)
Thread[4]: Back with ret=0 (I am not special)
Thread[5]: Back with ret=0 (I am not special)
Thread[8]: Back with ret=PTHREAD_BARRIER_SERIAL_THREAD (I AM SPECIAL)
Thread[6]: Back with ret=0 (I am not special)
Thread[7]: Back with ret=0 (I am not special)
Thread[1]: Now running on CPU0
Thread[3]: Now running on CPU1
Thread[4]: Now running on CPU1
Thread[5]: Now running on CPU1
Thread[6]: Now running on CPU2
Thread[8]: Now running on CPU3
Thread[1]: Now running on CPU3
Thread[3]: Now running on CPU0
Thread[7]: Now running on CPU0
Thread[4]: Now running on CPU3
Thread[5]: Now running on CPU3
Thread[6]: Now running on CPU0
Thread[8]: Now running on CPU2
Thread[1]: Now running on CPU0
Thread[3]: Now running on CPU3
Thread[7]: Now running on CPU2
Thread[2]: Now running on CPU2
Thread[4]: Now running on CPU0
Thread[8]: Now running on CPU0
Thread[1]: Now running on CPU1
Thread[5]: Now running on CPU0
Thread[7]: Now running on CPU0
Thread[2]: Now running on CPU0
Thread[6]: Now running on CPU1
Thread[4]: Now running on CPU3
Thread[3]: Now running on CPU1
Thread[8]: Now running on CPU3
Thread[1]: Now running on CPU3
Thread[5]: Now running on CPU3
Thread[7]: Now running on CPU2
Thread[6]: Now running on CPU2
Thread[3]: Now running on CPU2
Thread[8]: Now running on CPU1
Thread[4]: Now running on CPU1
Thread[1]: Now running on CPU0
Thread[2]: Now running on CPU3
Thread[5]: Now running on CPU2
Thread[6]: Now running on CPU0
Thread[7]: Now running on CPU0
Thread[3]: Now running on CPU3
Thread[8]: Now running on CPU0
Thread[1]: Now running on CPU2
Thread[2]: Now running on CPU2
Thread[5]: Now running on CPU1
Thread[7]: Now running on CPU2
Thread[4]: Now running on CPU2
Thread[3]: Now running on CPU2
Thread[8]: Now running on CPU1
Thread[1]: Now running on CPU0
Thread[6]: Now running on CPU3
Thread[5]: Now running on CPU3
Thread[7]: Now running on CPU0
Thread[3]: Now running on CPU1
Thread[2]: Now running on CPU0
Thread[8]: Now running on CPU0
Thread[1]: Now running on CPU2
Thread[2]: Done
Thread[6]: Now running on CPU2
Thread[4]: Now running on CPU1
Thread[7]: Now running on CPU3
Thread[3]: Now running on CPU2
Thread[8]: Now running on CPU1
Thread[2]: Now running on CPU1
Thread[1]: Now running on CPU1
Thread[5]: Now running on CPU0
Thread[7]: Now running on CPU0
Thread[3]: Now running on CPU1
Thread[7]: Done
Thread[8]: Now running on CPU2
Thread[3]: Done
Thread[4]: Now running on CPU2
Thread[6]: Done
Thread[8]: Done
Thread[4]: Done
Thread[7]: Now running on CPU2
Thread[3]: Now running on CPU2
Thread[4]: Now running on CPU1
Thread[1]: Now running on CPU3
Thread[1]: Done
  Main[0]: Thread 1 completed with result=0
  Main[0]: Now running on CPU1
  Main[0]: Thread 2 completed with result=0
Thread[5]: Done
  Main[0]: Now running on CPU0
  Main[0]: Thread 3 completed with result=0
Thread[5]: Now running on CPU1
  Main[0]: Thread 4 completed with result=0
  Main[0]: Thread 5 completed with result=0
  Main[0]: Thread 6 completed with result=0
  Main[0]: Thread 7 completed with result=0
  Main[0]: Thread 8 completed with result=0


nsh> ostest
stdio_test: write fd=1
stdio_test: Standard I/O Check: printf
stdio_test: write fd=2
stdio_test: Standard I/O Check: fprintf to stderr
ostest_main: putenv(Variable1=BadValue3)
ostest_main: setenv(Variable1, GoodValue1, TRUE)
ostest_main: setenv(Variable2, BadValue1, FALSE)
ostest_main: setenv(Variable2, GoodValue2, TRUE)
ostest_main: setenv(Variable3, GoodValue3, FALSE)
ostest_main: setenv(Variable3, BadValue2, FALSE)
show_variable: Variable=Variable1 has value=GoodValue1
show_variable: Variable=Variable2 has value=GoodValue2
show_variable: Variable=Variable3 has value=GoodValue3
ostest_main: Started user_main at PID=23
(..)
End of test memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena    fbb20000 fbb20000
ordblks        15       15
mxordblk fbaf6c98 fbaf6c98
uordblks    10480    10480
fordblks fbb0fb80 fbb0fb80

user_main: barrier test
barrier_test: Initializing barrier
barrier_test: Thread 0 created
barrier_func: Thread 0 started
barrier_test: Thread 1 created
barrier_func: Thread 1 started
barrier_test: Thread 2 created
barrier_func: Thread 2 started
barrier_test: Thread 3 created
barrier_func: Thread 3 started
barrier_test: Thread 4 created
barrier_func: Thread 4 started
barrier_test: Thread 5 created
barrier_func: Thread 5 started
barrier_test: Thread 6 created
barrier_func: Thread 6 started
barrier_test: Thread 7 created
barrier_func: Thread 7 started
barrier_func: Thread 0 calling pthread_barrier_wait()
barrier_func: Thread 1 calling pthread_barrier_wait()
barrier_func: Thread 2 calling pthread_barrier_wait()
barrier_func: Thread 3 calling pthread_barrier_wait()
barrier_func: Thread 4 calling pthread_barrier_wait()
barrier_func: Thread 5 calling pthread_barrier_wait()
barrier_func: Thread 6 calling pthread_barrier_wait()
barrier_func: Thread 7 calling pthread_barrier_wait()
barrier_func: Thread 7, back with status=PTHREAD_BARRIER_SERIAL_THREAD (I AM SPECIAL)
barrier_func: Thread 0, back with status=0 (I am not special)
barrier_func: Thread 1, back with status=0 (I am not special)
barrier_func: Thread 2, back with status=0 (I am not special)
barrier_func: Thread 3, back with status=0 (I am not special)
barrier_func: Thread 4, back with status=0 (I am not special)
barrier_func: Thread 5, back with status=0 (I am not special)
barrier_func: Thread 6, back with status=0 (I am not special)
barrier_func: Thread 7 done
barrier_func: Thread 0 done
barrier_func: Thread 1 done
barrier_test: Thread 0 completed with result=0
barrier_test: Thread 1 completed with result=0
barrier_func: Thread 2 done
barrier_test: Thread 2 completed with result=0
barrier_func: Thread 3 done
barrier_test: Thread 3 completed with result=0
barrier_func: Thread 4 done
barrier_test: Thread 4 completed with result=0
barrier_func: Thread 5 done
barrier_test: Thread 5 completed with result=0
barrier_func: Thread 6 done
barrier_test: Thread 6 completed with result=0
barrier_test: Thread 7 completed with result=0

End of test memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena    fbb20000 fbb20000
ordblks        15       15
mxordblk fbaf6c98 fbaf6c98
uordblks    10480    10480
fordblks fbb0fb80 fbb0fb80

user_main: scheduler lock test
sched_lock: Starting lowpri_thread at 97
sched_lock: Set lowpri_thread priority to 97
sched_lock: Starting highpri_thread at 98
sched_lock: Set highpri_thread priority to 98
sched_lock: Waiting...
sched_lock: PASSED No pre-emption occurred while scheduler was locked.
sched_lock: Starting lowpri_thread at 97
sched_lock: Set lowpri_thread priority to 97
sched_lock: Starting highpri_thread at 98
sched_lock: Set highpri_thread priority to 98
sched_lock: Waiting...
sched_lock: PASSED No pre-emption occurred while scheduler was locked.
sched_lock: Finished

End of test memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena    fbb20000 fbb20000
ordblks        15       15
mxordblk fbaf6c98 fbaf6c98
uordblks    10480    10480
fordblks fbb0fb80 fbb0fb80

user_main: vfork() test
vfork_test: Child 455 ran successfully

user_main: smp call test
smp_call_test: Test start
smp_call_test: Call cpu 0, nowait
smp_call_test: Call cpu 0, wait
smp_call_test: Call cpu 1, nowait
smp_call_test: Call cpu 1, wait
smp_call_test: Call cpu 2, nowait
smp_call_test: Call cpu 2, wait
smp_call_test: Call cpu 3, nowait
smp_call_test: Call cpu 3, wait
smp_call_test: Call multi cpu, nowait
smp_call_test: Call in interrupt, wait
smp_call_test: Call multi cpu, wait
smp_call_test: Test success

Final memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena    fbb20000 fbb20000
ordblks         5       15
mxordblk fbb10fd0 fbaf6c98
uordblks     e128    10480
fordblks fbb11ed8 fbb0fb80
user_main: Exiting
ostest_main: Exiting with status 0


nsh> getprime
Set thread priority to 10
Set thread policy to SCHED_RR
Start thread #0
thread #0 started, looking for primes < 10000, doing 10 run(s)
thread #0 finished, found 1230 primes, last one was 9973
Done
getprime took 341 msec

@cederom
Contributor

cederom commented May 11, 2026

With this approach, it looks like all cores will execute NuttX in parallel?

  for (uint8_t cpu = 0; cpu < CONFIG_SMP_NCPUS; cpu++)
    {
      putreg64((uint64_t)_start, BCM_SPINTBL_CPU(cpu));
    }

Shouldn't CPU0 launch the OS/RTOS and other CPUs stay idle and wait for tasking? :-)

reading more.. will update here with more details :-)

@linguini1
Contributor Author

Pretty much what happens @cederom . However, CPU0 does all the initial boot and the remaining 3 cores then run in parallel after the early init function is called. This is why we see the garbled console output right at the beginning of the console launch.

From reading the code, my understanding is that the remaining 3 CPUs hit a point in the init process where, based on their core number, they run the idle thread instead of the actual OS. Only CPU0 continues into the application. That is why I'm wondering whether it would be better to make the other cores jump straight to the idle thread's starting point, or where exactly they should jump to when booting (as opposed to _start, which works but isn't optimal afaik).
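The behaviour described above can be sketched as a small decision function. This is a simplified model under assumptions, not the code in arm64_head.S: `boot_step`, `primary_release_secondaries`, and the flag `g_secondary_go` are hypothetical names, and the real gating logic in the tree may work differently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NCPUS 4

typedef enum
{
  BOOT_WAIT,     /* Secondary core: keep spinning, CPU0 is not ready yet */
  BOOT_OS_INIT,  /* CPU0: run full OS initialization */
  BOOT_IDLE      /* Secondary core: enter its own idle thread */
} boot_step_t;

/* Hypothetical flag that CPU0 sets once memory and subsystems are ready. */

static _Atomic uint32_t g_secondary_go;

/* Decide what a core entering the common entry point should do next,
 * based only on its core number and the release flag.
 */

static boot_step_t boot_step(unsigned int cpu)
{
  assert(cpu < NCPUS);

  if (cpu == 0)
    {
      return BOOT_OS_INIT;
    }

  if (atomic_load_explicit(&g_secondary_go, memory_order_acquire) == 0)
    {
      return BOOT_WAIT;
    }

  return BOOT_IDLE;
}

/* Called by CPU0 after early initialization is complete. */

static void primary_release_secondaries(void)
{
  atomic_store_explicit(&g_secondary_go, 1, memory_order_release);
}
```

In this model every core can share one entry point, and only the core number plus one flag decide whether it initializes the OS, waits, or goes idle.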

@cederom
Contributor

cederom commented May 11, 2026

I have never implemented SMP on NuttX, so we are both learning here :D

Maybe _start is needed for the other CPUs' initialization.

What I found in the docs:

This wiki document, https://cwiki.apache.org/confluence/display/NUTTX/SMP, describes the SMP implementation pretty well:

CONFIG_SMP_IDLETHREAD_STACKSIZE - Each CPU will have its own IDLE task. System initialization occurs on CPU0 and uses CONFIG_IDLETHREAD_STACKSIZE. This setting provides the stack size for the IDLE task on CPUS 1 through (CONFIG_SMP_NCPUS-1).
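As an illustration of the per-CPU IDLE stacks described in that quote, here is a sketch of how stacks for CPUs 1 through (NCPUS-1) could be laid out. The macro values and the names `g_idle_stacks` and `idle_stack_top` are assumptions made for this example; they are not taken from the NuttX sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NCPUS               4     /* Stands in for CONFIG_SMP_NCPUS */
#define IDLE_STACKSIZE      8192  /* Stands in for CONFIG_SMP_IDLETHREAD_STACKSIZE */
#define STACK_ALIGNMENT     16    /* AArch64 requires a 16-byte aligned SP */
#define STACK_ALIGN_UP(n)   (((n) + STACK_ALIGNMENT - 1) & ~(STACK_ALIGNMENT - 1))
#define SECONDARY_STACKSIZE STACK_ALIGN_UP(IDLE_STACKSIZE)

/* One IDLE stack for each secondary CPU. CPU0 is excluded because it
 * uses the regular idle thread stack set up during system initialization.
 */

static _Alignas(STACK_ALIGNMENT)
uint8_t g_idle_stacks[NCPUS - 1][SECONDARY_STACKSIZE];

/* Stacks grow downwards, so the initial stack pointer for a CPU is the
 * address just past the end of its slice.
 */

static uint8_t *idle_stack_top(unsigned int cpu)
{
  assert(cpu > 0 && cpu < NCPUS);
  return g_idle_stacks[cpu - 1] + SECONDARY_STACKSIZE;
}
```

With this layout, consecutive stack tops are exactly `SECONDARY_STACKSIZE` bytes apart, which is why a boot path can derive a core's stack from its core number with a single multiply-and-add.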

Let's take a look at how other architectures implement it :-)

  1. arch/arm64/fvp-v8r:
  2. arch/arm64/imx8:
  • does not have smp init in the boot.c
    void arm64_chip_boot(void)
  • does not seem to have smp specific code, but uses ARCH_HAVE_MULTICPU:
    % pwd
    /XXX/nuttx.git/arch/arm64/src/imx8
    
    % grep -ri smp *
    % grep -ri cpu *
    Kconfig:	select ARCH_HAVE_MULTICPU
    
    % pwd
    /XXX/nuttx.git/boards/arm64/imx8
    
    % grep -ri smp *
    % grep -ri cpu *
    imx8qm-mek/scripts/Make.defs:CFLAGS := $(ARCHCFLAGS) $(ARCHOPTIMIZATION) $(ARCHCPUFLAGS) $(ARCHINCLUDES) $(ARCHDEFINES) $(EXTRAFLAGS)
    imx8qm-mek/scripts/Make.defs:CXXFLAGS := $(ARCHCXXFLAGS) $(ARCHOPTIMIZATION) $(ARCHCPUFLAGS) $(ARCHXXINCLUDES) $(ARCHDEFINES) $(EXTRAFLAGS)
    
  3. arch/arm64/imx9:
  • does not init cpus in the boot
    void arm64_chip_boot(void)
  • let's search for smp and multicpu:
    % pwd
    /XXX/nuttx.git/arch/arm64/src/imx9
    
    % grep -ri smp *
    /imx9_flexcan.h:#define CAN_CTRL1_SMP              (1 << 7)  /* Bit 7:  CAN Bit Sampling */
    hardware/imx9_usdhc.h:#define USDHC_AC12ERR_SMP_CLK_SEL        (1 << 23)    /* Bit 23: Sample clock sel */
    hardware/imx9_usdhc.h:#define USDHC_MC_SMP_CLK_SEL             (1 << 23)    /* Bit 23: SMP Clock Sel */
    imx9_boot.c:#ifdef CONFIG_SMP
    imx9_boot.c:#endif /* CONFIG_SMP */
    imx9_usbdev.c:#ifdef CONFIG_SMP
    imx9_usbdev.c:#ifdef CONFIG_SMP
    imx9_usbdev.c:#ifdef CONFIG_SMP
    
    % grep -ri multicpu *
    Kconfig:	select ARCH_HAVE_MULTICPU
    Kconfig:	select ARCH_HAVE_MULTICPU
    
    % pwd
    /XXX/nuttx.git/boards/arm64/imx9
    
    % grep -ri smp *
    % grep -ri multicpu *
    

Hmm, it looks like the SMP-related code is somewhere else. Let's take a look at arch/arm64/src/common:

% pwd
/XXX/nuttx.git/arch/arm64/src/common

% grep -ri multicpu *

% grep -ri smp *
arm64_arch.h:#ifdef CONFIG_SMP
arm64_arch.h:#ifdef CONFIG_SMP
arm64_arch.h:#ifdef CONFIG_SMP
arm64_arch.h:#endif /* CONFIG_SMP */
arm64_arch.h:#ifdef CONFIG_SMP
arm64_backtrace.c: *   1. Tcb have to be self or not-running.  In SMP case, the running task
arm64_cpuidlestack.c:#include "arm64_smp.h"
arm64_cpuidlestack.c: *                  CONFIG_SMP_STACK_SIZE.
arm64_cpuidlestack.c:#if CONFIG_SMP_NCPUS > 1
arm64_cpuidlestack.c:  DEBUGASSERT(cpu > 0 && cpu < CONFIG_SMP_NCPUS && tcb != NULL &&
arm64_cpuidlestack.c:              stack_size <= SMP_STACK_SIZE);
arm64_cpuidlestack.c:  tcb->adj_stack_size  = SMP_STACK_SIZE;
arm64_cpustart.c:#include "arm64_smp.h"
arm64_cpustart.c:uint64_t *const g_cpu_int_stacktop[CONFIG_SMP_NCPUS] =
arm64_cpustart.c:#if CONFIG_SMP_NCPUS > 1
arm64_cpustart.c:#if CONFIG_SMP_NCPUS > 2
arm64_cpustart.c:#if CONFIG_SMP_NCPUS > 3
arm64_cpustart.c:#if CONFIG_SMP_NCPUS > 4
arm64_cpustart.c:#if CONFIG_SMP_NCPUS > 5
arm64_cpustart.c:#  error This logic needs to extended for CONFIG_SMP_NCPUS > 5
arm64_cpustart.c:#endif /* CONFIG_SMP_NCPUS > 5 */
arm64_cpustart.c:#endif /* CONFIG_SMP_NCPUS > 4 */
arm64_cpustart.c:#endif /* CONFIG_SMP_NCPUS > 3 */
arm64_cpustart.c:#endif /* CONFIG_SMP_NCPUS > 2 */
arm64_cpustart.c:#endif /* CONFIG_SMP_NCPUS > 1 */
arm64_cpustart.c:uint32_t g_smp_busy_wait_flag;
arm64_cpustart.c:uint64_t *const g_cpu_int_fiq_stacktop[CONFIG_SMP_NCPUS] =
arm64_cpustart.c:#if CONFIG_SMP_NCPUS > 1
arm64_cpustart.c:#if CONFIG_SMP_NCPUS > 2
arm64_cpustart.c:#if CONFIG_SMP_NCPUS > 3
arm64_cpustart.c:#if CONFIG_SMP_NCPUS > 4
arm64_cpustart.c:#if CONFIG_SMP_NCPUS > 5
arm64_cpustart.c:#  error This logic needs to extended for CONFIG_SMP_NCPUS > 5
arm64_cpustart.c:#endif /* CONFIG_SMP_NCPUS > 5 */
arm64_cpustart.c:#endif /* CONFIG_SMP_NCPUS > 4 */
arm64_cpustart.c:#endif /* CONFIG_SMP_NCPUS > 3 */
arm64_cpustart.c:#endif /* CONFIG_SMP_NCPUS > 2 */
arm64_cpustart.c:#endif /* CONFIG_SMP_NCPUS > 1 */
arm64_cpustart.c:static void arm64_smp_init_top(void)
arm64_cpustart.c: *   In an SMP configuration, only one CPU is initially active (CPU 0).
arm64_cpustart.c: *   CPU has been started, 1 through (CONFIG_SMP_NCPUS-1).
arm64_cpustart.c: *         value in the range of one to (CONFIG_SMP_NCPUS-1).
arm64_cpustart.c:  DEBUGASSERT(cpu >= 0 && cpu < CONFIG_SMP_NCPUS && cpu != this_cpu());
arm64_cpustart.c:#ifdef CONFIG_SMP
arm64_cpustart.c:  uint32_t *address = &g_smp_busy_wait_flag;
arm64_cpustart.c:  arm64_smp_init_top();
arm64_doirq.c:#ifdef CONFIG_SMP
arm64_doirq.c:  for (cpu = 0; cpu < CONFIG_SMP_NCPUS; cpu++)
arm64_fpu.c:#define FPU_PROC_LINELEN    (64 * CONFIG_SMP_NCPUS)
arm64_fpu.c:static struct arm64_cpu_fpu_context g_cpu_fpu_ctx[CONFIG_SMP_NCPUS];
arm64_fpu.c:  for (i = 0; i < CONFIG_SMP_NCPUS; i++)
arm64_getintstack.c:#if !defined(CONFIG_SMP) && CONFIG_ARCH_INTERRUPTSTACK > 7
arm64_gic.h:#  define GIC_SMP_SCHED             GIC_IRQ_SGI9
arm64_gic.h:#  define GIC_SMP_CPUSTART          GIC_IRQ_SGI10
arm64_gic.h:#  define GIC_SMP_CALL              GIC_IRQ_SGI11
arm64_gic.h:#  define GIC_SMP_SCHED             GIC_IRQ_SGI1
arm64_gic.h:#  define GIC_SMP_CPUSTART          GIC_IRQ_SGI2
arm64_gic.h:#  define GIC_SMP_CALL              GIC_IRQ_SGI3
arm64_gic.h:#ifdef CONFIG_SMP
arm64_gic.h: * Name: arm64_smp_sched_handler
arm64_gic.h:int arm64_smp_sched_handler(int irq, void *context, void *arg);
arm64_gicv2.c: * NOTE: If CONFIG_SMP is enabled then SGI1 and SGI2 are used for inter-CPU
arm64_gicv2.c: *   Perform a Software Generated Interrupt (SGI).  If CONFIG_SMP is
arm64_gicv2.c: *   If CONFIG_SMP is not selected, the cpuset is ignored and SGI is sent
arm64_gicv2.c:#ifdef CONFIG_SMP
arm64_gicv2.c:       * is not mandatory in the GICv2 specification, but for SMP scenarios,
arm64_gicv2.c:       * SMP scenario.
arm64_gicv2.c:#ifdef CONFIG_SMP
arm64_gicv2.c:  DEBUGVERIFY(irq_attach(GIC_SMP_SCHED, arm64_smp_sched_handler, NULL));
arm64_gicv2.c:  DEBUGVERIFY(irq_attach(GIC_SMP_CALL, nxsched_smp_call_handler, NULL));
arm64_gicv2.c: *   Perform a Software Generated Interrupt (SGI).  If CONFIG_SMP is
arm64_gicv2.c: *   If CONFIG_SMP is not selected, the cpuset is ignored and SGI is sent
arm64_gicv2.c:#ifdef CONFIG_SMP
arm64_gicv2.c:#endif /* CONFIG_SMP */
arm64_gicv3.c:static unsigned long g_gic_rdists[CONFIG_SMP_NCPUS];
arm64_gicv3.c:#ifdef CONFIG_SMP
arm64_gicv3.c:  DEBUGVERIFY(irq_attach(GIC_SMP_SCHED, arm64_smp_sched_handler, NULL));
arm64_gicv3.c:  DEBUGVERIFY(irq_attach(GIC_SMP_CALL, nxsched_smp_call_handler, NULL));
arm64_gicv3.c: *   Perform a Software Generated Interrupt (SGI).  If CONFIG_SMP is
arm64_gicv3.c: *   If CONFIG_SMP is not selected, the cpuset is ignored and SGI is sent
arm64_gicv3.c:#ifdef CONFIG_SMP
arm64_gicv3.c:  up_enable_irq(GIC_SMP_SCHED);
arm64_gicv3.c:  up_enable_irq(GIC_SMP_CALL);
arm64_gicv3.c:#ifdef CONFIG_SMP
arm64_head.S:#ifdef CONFIG_SMP
arm64_head.S:    ldr    x2, =g_smp_busy_wait_flag
arm64_head.S:    add    x24, x24, #(SMP_STACK_SIZE)
arm64_head.S:    ldr    w1, =SMP_STACK_WORDS
arm64_head.S:    add    x24, x24, #(SMP_STACK_SIZE)
arm64_head.S:    /* In some case, we need to boot one core in a SMP system,
arm64_head.S:#endif /* CONFIG_SMP */
arm64_hwdebug.c:static struct arm64_debug_s g_arm64_debug[CONFIG_SMP_NCPUS];
arm64_initialize.c:#ifdef CONFIG_SMP
arm64_initialize.c:INIT_STACK_ARRAY_DEFINE(g_cpu_idlestackalloc, CONFIG_SMP_NCPUS,
arm64_initialize.c:                          SMP_STACK_SIZE);
arm64_initialize.c:INIT_STACK_ARRAY_DEFINE(g_interrupt_stacks, CONFIG_SMP_NCPUS,
arm64_initialize.c:INIT_STACK_ARRAY_DEFINE(g_interrupt_fiq_stacks, CONFIG_SMP_NCPUS,
arm64_initialize.c:#ifdef CONFIG_SMP
arm64_initialstate.c:#ifdef CONFIG_SMP
arm64_internal.h:#ifdef CONFIG_SMP
arm64_internal.h:#  define SMP_STACK_SIZE    STACKFRAME_ALIGN_UP(CONFIG_IDLETHREAD_STACKSIZE)
arm64_internal.h:#  define SMP_STACK_WORDS   (SMP_STACK_SIZE >> 2)
arm64_internal.h:#ifdef CONFIG_SMP
arm64_internal.h:INIT_STACK_ARRAY_DEFINE_EXTERN(g_cpu_idlestackalloc, CONFIG_SMP_NCPUS,
arm64_internal.h:                          SMP_STACK_SIZE);
arm64_internal.h:INIT_STACK_ARRAY_DEFINE_EXTERN(g_interrupt_stacks, CONFIG_SMP_NCPUS,
arm64_internal.h:INIT_STACK_ARRAY_DEFINE_EXTERN(g_interrupt_fiq_stacks, CONFIG_SMP_NCPUS,
arm64_mpu.c:static unsigned int g_mpu_region[CONFIG_SMP_NCPUS];
arm64_mpu.h:#ifdef CONFIG_SMP
arm64_sigdeliver.c:#ifdef CONFIG_SMP
arm64_sigdeliver.c:  /* In the SMP case, we must terminate the critical section while the signal
arm64_sigdeliver.c:#ifdef CONFIG_SMP
arm64_sigdeliver.c:  /* In the SMP case, up_schedule_sigaction(0) will have incremented
arm64_sigdeliver.c:#endif /* CONFIG_SMP */
arm64_sigdeliver.c:#ifdef CONFIG_SMP
arm64_sigdeliver.c:#ifdef CONFIG_SMP
arm64_sigdeliver.c:#ifdef CONFIG_SMP
arm64_smp.h: * arch/arm64/src/common/arm64_smp.h
arm64_smp.h:#ifndef __ARCH_ARM64_SRC_COMMON_ARM64_SMP_H
arm64_smp.h:#define __ARCH_ARM64_SRC_COMMON_ARM64_SMP_H
arm64_smp.h:#ifdef CONFIG_SMP
arm64_smp.h: * Name: arm64_enable_smp
arm64_smp.h:void arm64_enable_smp(int cpu);
arm64_smp.h:#endif /* CONFIG_SMP */
arm64_smp.h:#endif /* __ARCH_ARM64_SRC_COMMON_ARM64_SMP_H */
arm64_smpcall.c: * arch/arm64/src/common/arm64_smpcall.c
arm64_smpcall.c: * Name: arm64_smp_sched_handler
arm64_smpcall.c:int arm64_smp_sched_handler(int irq, void *context, void *arg)
arm64_smpcall.c: * Name: up_send_smp_sched
arm64_smpcall.c:int up_send_smp_sched(int cpu)
arm64_smpcall.c:  arm64_gic_raise_sgi(GIC_SMP_SCHED, (1 << cpu));
arm64_smpcall.c: * Name: up_send_smp_call
arm64_smpcall.c: *   Send smp call to target cpu.
arm64_smpcall.c:void up_send_smp_call(cpu_set_t cpuset)
arm64_smpcall.c:  up_trigger_irq(GIC_SMP_CALL, cpuset);
arm64_vectors.S:#ifdef CONFIG_SMP
arm64_vectors.S:#ifdef CONFIG_SMP
arm64_vectors.S:#ifdef CONFIG_SMP
CMakeLists.txt:if(CONFIG_SMP)
CMakeLists.txt:  list(APPEND SRCS arm64_smpcall.c)
Make.defs:ifeq ($(CONFIG_SMP),y)
Make.defs:CMN_CSRCS += arm64_smpcall.c

There it is, let's take a look :-)

ifeq ($(CONFIG_SMP),y)
CMN_CSRCS += arm64_cpuidlestack.c arm64_cpustart.c
CMN_CSRCS += arm64_smpcall.c
endif

If we look at

/****************************************************************************
* Public Functions
****************************************************************************/
/****************************************************************************
* Name: up_cpu_start
*
* Description:
* In an SMP configuration, only one CPU is initially active (CPU 0).
* System initialization occurs on that single thread. At the completion of
* the initialization of the OS, just before beginning normal multitasking,
* the additional CPUs would be started by calling this function.
*
* Each CPU is provided the entry point to its IDLE task when started. A
* TCB for each CPU's IDLE task has been initialized and placed in the
* CPU's g_assignedtasks[cpu] list. No stack has been allocated or
* initialized.
*
* The OS initialization logic calls this function repeatedly until each
* CPU has been started, 1 through (CONFIG_SMP_NCPUS-1).
*
* Input Parameters:
* cpu - The index of the CPU being started. This will be a numeric
* value in the range of one to (CONFIG_SMP_NCPUS-1).
* (CPU 0 is already active)
*
* Returned Value:
* Zero on success; a negated errno value on failure.
*
****************************************************************************/
int up_cpu_start(int cpu)
{
  DEBUGASSERT(cpu >= 0 && cpu < CONFIG_SMP_NCPUS && cpu != this_cpu());

#ifdef CONFIG_SCHED_INSTRUMENTATION
  /* Notify of the start event */

  sched_note_cpu_start(this_task(), cpu);
#endif

#ifdef CONFIG_SMP
  uint32_t *address = &g_smp_busy_wait_flag;
  *address = 1;
  up_flush_dcache((uintptr_t)address, (uintptr_t)address + sizeof(address));
#endif

  arm64_start_cpu(cpu);

  return 0;
}

/* the C entry of secondary cores */

void arm64_boot_secondary_c_routine(void)
{
#ifdef CONFIG_ARCH_HAVE_MPU
  arm64_mpu_init(false);
#endif

#ifdef CONFIG_ARCH_HAVE_MMU
  arm64_mmu_init(false);
#endif

  /* We need to confirm that current_task has been initialized. */

  while (!current_task(this_cpu()));

  /* Init idle task to percpu reg */

  up_update_task(current_task(this_cpu()));

  arm64_gic_secondary_init();
  arm64_timer_secondary_init();
  arm64_smp_init_top();
}

and search for who is using up_cpu_start(int cpu), we have:

  • arch/sim documents that up_cpu_idlestack() is always called before up_cpu_start(), but this is probably already handled in the code

    /****************************************************************************
    * Name: up_cpu_idlestack
    *
    * Description:
    * Allocate a stack for the CPU[n] IDLE task (n > 0) if appropriate and
    * set up stack-related information in the IDLE task's TCB. This
    * function is always called before up_cpu_start(). This function is
    * only called for the CPU's initial IDLE task; up_create_task is used for
    * all normal tasks, pthreads, and kernel threads for all CPUs.
    *
    * The initial IDLE task is a special case because the CPUs can be started
    * in different ways in different environments:
    *
    * 1. The CPU may already have been started and waiting in a low power
    * state for up_cpu_start(). In this case, the IDLE thread's stack
    * has already been allocated and is already in use. Here
    * up_cpu_idlestack() only has to provide information about the
    * already allocated stack.
    *
    * 2. The CPU may be disabled but started when up_cpu_start() is called.
    * In this case, a new stack will need to be created for the IDLE
    * thread and this function is then equivalent to:
    *
    * return up_create_stack(tcb, stack_size, TCB_FLAG_TTYPE_KERNEL);
    *
    * The following TCB fields must be initialized by this function:
    *
    * - adj_stack_size: Stack size after adjustment for hardware, processor,
    * etc. This value is retained only for debug purposes.
    * - stack_alloc_ptr: Pointer to allocated stack
    * - stack_base_ptr: Adjusted stack base pointer after the TLS Data and
    * Arguments has been removed from the stack allocation.
    *
    * Input Parameters:
    * - cpu: CPU index that indicates which CPU the IDLE task is
    * being created for.
    * - tcb: The TCB of new CPU IDLE task
    * - stack_size: The requested stack size for the IDLE task. At least
    * this much must be allocated. This should be
    * CONFIG_IDLETHREAD_STACKSIZE.
    *
    ****************************************************************************/
    int up_cpu_idlestack(int cpu, struct tcb_s *tcb, size_t stack_size)
    {
      return up_create_stack(tcb, stack_size, TCB_FLAG_TTYPE_KERNEL);
    }

  • So let's take a look at how arch/armv7-m does it

    /****************************************************************************
    * Name: up_cpu_start
    *
    * Description:
    * In an SMP configuration, only one CPU is initially active (CPU 0).
    * System initialization occurs on that single thread. At the completion of
    * the initialization of the OS, just before beginning normal multitasking,
    * the additional CPUs would be started by calling this function.
    *
    * Each CPU is provided the entry point to its IDLE task when started. A
    * TCB for each CPU's IDLE task has been initialized and placed in the
    * CPU's g_assignedtasks[cpu] list. No stack has been allocated or
    * initialized.
    *
    * The OS initialization logic calls this function repeatedly until each
    * CPU has been started, 1 through (CONFIG_SMP_NCPUS-1).
    *
    * Input Parameters:
    * cpu - The index of the CPU being started. This will be a numeric
    * value in the range of one to (CONFIG_SMP_NCPUS-1).
    * (CPU 0 is already active)
    *
    * Returned Value:
    * Zero on success; a negated errno value on failure.
    *
    ****************************************************************************/
    int weak_function up_cpu_start(int cpu)
    {
      sinfo("Starting CPU%d\n", cpu);

      DEBUGASSERT(cpu >= 0 && cpu < CONFIG_SMP_NCPUS && cpu != this_cpu());

    #ifdef CONFIG_SCHED_INSTRUMENTATION
      /* Notify of the start event */

      sched_note_cpu_start(this_task(), cpu);
    #endif

      /* Execute SGI1 */

      arm_cpu_sgi(GIC_SMP_CPUSTART, (1 << cpu));
      return OK;
    }
    #endif /* CONFIG_SMP */

  • Let's take a look at how arch/risc-v does it

    /****************************************************************************
    * Name: up_cpu_start
    *
    * Description:
    * In an SMP configuration, only one CPU is initially active (CPU 0).
    * System initialization occurs on that single thread. At the completion of
    * the initialization of the OS, just before beginning normal multitasking,
    * the additional CPUs would be started by calling this function.
    *
    * Each CPU is provided the entry point to its IDLE task when started. A
    * TCB for each CPU's IDLE task has been initialized and placed in the
    * CPU's g_assignedtasks[cpu] list. No stack has been allocated or
    * initialized.
    *
    * The OS initialization logic calls this function repeatedly until each
    * CPU has been started, 1 through (CONFIG_SMP_NCPUS-1).
    *
    * Input Parameters:
    * cpu - The index of the CPU being started. This will be a numeric
    * value in the range of one to (CONFIG_SMP_NCPUS-1).
    * (CPU 0 is already active)
    *
    * Returned Value:
    * Zero on success; a negated errno value on failure.
    *
    ****************************************************************************/
    int up_cpu_start(int cpu)
    {
      sinfo("CPU=%d\n", cpu);

    #ifdef CONFIG_SCHED_INSTRUMENTATION
      /* Notify of the start event */

      sched_note_cpu_start(this_task(), cpu);
    #endif

      /* Send IPI to CPU(cpu) */

      riscv_ipi_send(cpu);

      return 0;
    }

  • Let's take a look at arch/xtensa/esp32s3, where the chip-specific code is

    /****************************************************************************
    * Name: up_cpu_start
    *
    * Description:
    * In an SMP configuration, only one CPU is initially active (CPU 0).
    * System initialization occurs on that single thread. At the completion of
    * the initialization of the OS, just before beginning normal multitasking,
    * the additional CPUs would be started by calling this function.
    *
    * Each CPU is provided the entry point to its IDLE task when started. A
    * TCB for each CPU's IDLE task has been initialized and placed in the
    * CPU's g_assignedtasks[cpu] list. No stack has been allocated or
    * initialized.
    *
    * The OS initialization logic calls this function repeatedly until each
    * CPU has been started, 1 through (CONFIG_SMP_NCPUS-1).
    *
    * Input Parameters:
    * cpu - The index of the CPU being started. This will be a numeric
    * value in the range of one to (CONFIG_SMP_NCPUS-1).
    * (CPU 0 is already active)
    *
    * Returned Value:
    * Zero on success; a negated errno value on failure.
    *
    ****************************************************************************/
    int up_cpu_start(int cpu)
    {
      uint32_t regval;

      DEBUGASSERT(cpu >= 0 && cpu < CONFIG_SMP_NCPUS && cpu != this_cpu());

      /* Start CPU1 */

      sinfo("Starting CPU%d\n", cpu);

    #ifdef CONFIG_SCHED_INSTRUMENTATION
      /* Notify of the start event */

      sched_note_cpu_start(this_task(), cpu);
    #endif

      /* OpenOCD might have already enabled clock gating and taken APP CPU
       * out of reset. Don't reset the APP CPU if that's the case as this
       * will clear the breakpoints that may have already been set.
       */

      regval = getreg32(SYSTEM_CORE_1_CONTROL_0_REG);
      if ((regval & SYSTEM_CONTROL_CORE_1_CLKGATE_EN) == 0)
        {
          regval  = getreg32(RTC_CNTL_SW_CPU_STALL_REG);
          regval &= ~RTC_CNTL_SW_STALL_APPCPU_C1_M;
          putreg32(regval, RTC_CNTL_SW_CPU_STALL_REG);

          regval  = getreg32(RTC_CNTL_OPTIONS0_REG);
          regval &= ~RTC_CNTL_SW_STALL_APPCPU_C0_M;
          putreg32(regval, RTC_CNTL_OPTIONS0_REG);

          /* Enable clock gating for the APP CPU */

          regval  = getreg32(SYSTEM_CORE_1_CONTROL_0_REG);
          regval |= SYSTEM_CONTROL_CORE_1_CLKGATE_EN;
          putreg32(regval, SYSTEM_CORE_1_CONTROL_0_REG);

          regval  = getreg32(SYSTEM_CORE_1_CONTROL_0_REG);
          regval &= ~SYSTEM_CONTROL_CORE_1_RUNSTALL;
          putreg32(regval, SYSTEM_CORE_1_CONTROL_0_REG);

          /* Reset the APP CPU */

          regval  = getreg32(SYSTEM_CORE_1_CONTROL_0_REG);
          regval |= SYSTEM_CONTROL_CORE_1_RESETING;
          putreg32(regval, SYSTEM_CORE_1_CONTROL_0_REG);

          regval  = getreg32(SYSTEM_CORE_1_CONTROL_0_REG);
          regval &= ~SYSTEM_CONTROL_CORE_1_RESETING;
          putreg32(regval, SYSTEM_CORE_1_CONTROL_0_REG);
        }

      /* Set the CPU1 start address */

      ets_set_appcpu_boot_addr((uint32_t)xtensa_appcpu_start);

      /* And wait until the APP CPU starts */

      while (!g_appcpu_started);

      /* prev cpu boot done */

      return OK;
    }

Okay, so long story short: SMP startup seems to be handled around arch/arm64/src/common/arm64_cpustart.c when CONFIG_SMP is defined. CPU0 becomes the "main" application CPU, and the other CPUs, up to CONFIG_SMP_NCPUS, are initialized as "secondary" and put to idle.
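
To make that ordering concrete, here is a minimal, self-contained sketch of it. Nothing in it is NuttX code: `NCPUS`, `g_order`, and the `sketch_*` functions are hypothetical stand-ins for `CONFIG_SMP_NCPUS`, `up_cpu_idlestack()` and `up_cpu_start()`. The only facts taken from the header comments quoted above are that the idle stack is set up before the CPU is started and that only CPUs 1 through N-1 are started; interleaving the two calls per CPU is an assumption.

```c
#include <assert.h>

#define NCPUS 4 /* Stands in for CONFIG_SMP_NCPUS */

/* Hypothetical call log: 'S' = idle stack set up, 'C' = CPU started.
 * Zero-initialized, so it always reads as a NUL-terminated string.
 */

static char g_order[2 * NCPUS + 1];
static int g_norder;

/* Stand-in for up_cpu_idlestack(): only records that it ran */

static int sketch_cpu_idlestack(int cpu)
{
  assert(cpu > 0 && cpu < NCPUS);
  g_order[g_norder++] = 'S';
  return 0;
}

/* Stand-in for up_cpu_start(): only records that it ran */

static int sketch_cpu_start(int cpu)
{
  assert(cpu > 0 && cpu < NCPUS);
  g_order[g_norder++] = 'C';
  return 0;
}

/* Runs on CPU 0 once its own initialization is complete */

int sketch_start_secondaries(void)
{
  int cpu;
  int ret;

  g_norder = 0;

  /* CPU 0 is already running, so start at 1 */

  for (cpu = 1; cpu < NCPUS; cpu++)
    {
      ret = sketch_cpu_idlestack(cpu);
      if (ret < 0)
        {
          return ret;
        }

      ret = sketch_cpu_start(cpu);
      if (ret < 0)
        {
          return ret;
        }
    }

  return 0;
}
```

Calling `sketch_start_secondaries()` once fills `g_order` with alternating stack/start markers for CPUs 1 to 3, which is the order a board-specific implementation would be expected to preserve.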

If that does not work out of the box, then we probably need some custom pieces in arch/arm64/src/bcm2711/bcm2711_cpustart.c that follow the existing order (i.e. register in the scheduler, allocate resources) but also work with the specific underlying hardware?
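
As a rough idea of what such a custom piece could look like, below is a simulated sketch that releases one core at a time from its own start call, instead of writing every spin-table slot in a single loop. Everything here is an assumption for illustration: `g_spintbl` is ordinary memory standing in for the real per-core release addresses (the PR uses `BCM_SPINTBL_CPU(cpu)`), and `sketch_secondary_entry` and `sketch_spintbl_cpu_start` are hypothetical names, not existing NuttX functions.

```c
#include <assert.h>
#include <stdint.h>

#define NCPUS 4 /* Stands in for CONFIG_SMP_NCPUS */

/* Simulated spin table: one 64-bit release slot per core. A parked core
 * polls its slot and jumps to the stored address once it is non-zero.
 */

static volatile uint64_t g_spintbl[NCPUS];

/* Hypothetical entry point for a released secondary core */

static void sketch_secondary_entry(void)
{
}

/* Release exactly one secondary core. Cores that have not been started
 * yet keep a zero slot and therefore stay parked.
 */

int sketch_spintbl_cpu_start(int cpu)
{
  assert(cpu > 0 && cpu < NCPUS);

  /* Publish the jump address in this core's slot only */

  g_spintbl[cpu] = (uint64_t)(uintptr_t)sketch_secondary_entry;

  /* On real hardware the slot would typically have to be cleaned to
   * memory and followed by a barrier and an event (e.g. DSB + SEV) so
   * that a core parked in a wait-for-event loop re-reads it. That part
   * is hardware-specific and deliberately left out of this simulation.
   */

  return 0;
}
```

With this shape, each secondary only leaves its parking loop when it is started explicitly, after CPU0 has finished its own initialization, which might also help with the interleaved early boot output described in the PR summary.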

Also CONFIG_SCHED_INSTRUMENTATION may be our friend here :-)

Does that make sense? :-P

@linguini1
Contributor Author

Makes sense! I did try just letting the existing arm64 logic handle it, but it didn't work properly. I'll take a look at the rest of the implementations you linked and try to follow those instead, with the scheduling profiler enabled to see what's going on.

Contributor

@hartmannathan hartmannathan left a comment

Thanks for working on this. I can't wait to try it!

@cederom
Contributor

cederom commented May 12, 2026

Some hints also here https://cwiki.apache.org/confluence/display/NUTTX/NuttX+Initialization+Sequence :-)

@hartmannathan
Contributor

Some hints also here https://cwiki.apache.org/confluence/display/NUTTX/NuttX+Initialization+Sequence :-)

That's funny, I don't see that in the Documentation in the repository. Was this one never migrated from the CWIKI to Documentation?

(It also contains a little bit of obsolete info, like board_app_initialize() which was recently replaced with board_late_initialize().)

@cederom
Contributor

cederom commented May 13, 2026

Yeah, we just need to copy-paste the missing stuff and then update the whole thing.. in a "free moment".. but you know.. new board.. some project.. etc etc.. there aren't many "free moments" when you get old :-P

Labels

Arch: arm64 Issues related to ARM64 (64-bit) architecture Area: Documentation Improvements or additions to documentation Board: arm64 Size: M The size of the change in this PR is medium

Development

Successfully merging this pull request may close these issues.

[FEATURE] Implement RPi 4B SMP

5 participants