boards/raspberrypi-4b: Implement SMP support #18861
    for (uint8_t cpu = 0; cpu < CONFIG_SMP_NCPUS; cpu++)
      {
        putreg64((uint64_t)_start, BCM_SPINTBL_CPU(cpu));
@xiaoxiang781216 has Xiaomi ever used a device with spin tables for SMP? Could you inform me if this is a good approach, or if there is a better starting location to jump to after CPU0 has initialized memory and subsystems?
@simbit18 @pussuw @mzanders @tmedicci @fdcavalcanti @raiden00pl @acassis could you please take a look if smp init is in the right place? :-)
@linguini1 please take a look at the somewhat lengthy search in the PR comments. It looks like the instrumentation is already there, so maybe we only need to set CONFIG_SMP_NCPUS in the configuration? If that does not work, then this arch-specific code would probably go to arch/arm64/bcm2711/bcm2711_cpustart.c? :-)
Also, CONFIG_SCHED_INSTRUMENTATION may be our friend here :-)
Just setting NCPUs did not work before, but I will try with custom stuff in cpustart.c!

Thank you @linguini1, I will take a look and test on hardware in a few hours :-) I also found and ordered an ESP32-P4-Nano board for ~25 EUR, so we will be able to test this hardware too :-)

Awesome news! Thank you!
Adds SMP support and an `smp` configuration for the RPi4B. Signed-off-by: Matteo Golin <matteo.golin@gmail.com>
Documents the new SMP configuration. Signed-off-by: Matteo Golin <matteo.golin@gmail.com>

Okay, here goes the testing :-) All seems to work well :-) Great work @linguini1 =) We may want to put some more benchmarks into this config to see what the performance gain with SMP is? :-) I need to get some sleep, tomorrow I will try to read more about this.

With this approach, it looks like all cores will execute NuttX in parallel?

    for (uint8_t cpu = 0; cpu < CONFIG_SMP_NCPUS; cpu++)
      {
        putreg64((uint64_t)_start, BCM_SPINTBL_CPU(cpu));
      }

Shouldn't CPU0 launch the OS/RTOS while the other CPUs stay idle and wait for tasking? :-) Reading more, will update here with more details :-)
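
To make the question above concrete, here is a minimal host-side sketch of how a spin-table release generally works: each parked core polls its own slot and jumps once a non-NULL entry point appears. Everything prefixed `sim_` or `SIM_` is hypothetical and only models the idea. It does not touch the real `BCM_SPINTBL_CPU()` registers, and unlike the loop in this PR it releases only CPUs 1 to 3, which is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_NCPUS 4 /* Hypothetical stand-in for CONFIG_SMP_NCPUS */

typedef void (*sim_entry_t)(unsigned int cpu);

/* Simulated spin table: one release slot per core. NULL means "stay parked".
 * The real table holds 64-bit addresses at BCM_SPINTBL_CPU(cpu); this array
 * is only a host-side model of it.
 */

static sim_entry_t volatile g_sim_spintbl[SIM_NCPUS];

/* Bitmask of simulated cores that have run their entry point */

static unsigned int g_sim_started;

/* Hypothetical secondary entry point: just records that the core started */

static void sim_secondary_entry(unsigned int cpu)
{
  g_sim_started |= 1u << cpu;
}

/* Primary side: publish an entry point in one core's slot. On hardware, a
 * barrier and an event signal (SEV) would typically follow the store.
 */

static void sim_release_cpu(unsigned int cpu, sim_entry_t entry)
{
  assert(cpu < SIM_NCPUS);
  g_sim_spintbl[cpu] = entry;
}

/* Primary side: release only the secondaries; CPU0 is already running */

static void sim_release_secondaries(sim_entry_t entry)
{
  for (unsigned int cpu = 1; cpu < SIM_NCPUS; cpu++)
    {
      sim_release_cpu(cpu, entry);
    }
}

/* Secondary side: one pass of the parked core's polling loop. Returns the
 * published entry point, or NULL if the core should keep waiting.
 */

static sim_entry_t sim_poll_once(unsigned int cpu)
{
  assert(cpu < SIM_NCPUS);
  return g_sim_spintbl[cpu];
}
```

Calling `sim_release_secondaries(sim_secondary_entry)` and then invoking whatever `sim_poll_once()` returns for CPUs 1 to 3 mimics the hand-off. Which address the real cores should be given (`_start` or something later) is exactly the open question in this thread.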

Pretty much what happens, @cederom. However, CPU0 does all the initial boot, and the remaining 3 cores then run in parallel after the early init function is called. This is why we see the garbled console output right at the beginning of the console launch. From reading the code, my understanding is that the remaining 3 CPUs will hit a point in the init process where, based on their core number, they run the idle thread instead of the actual OS. Only CPU0 continues into the application. This is why I'm wondering whether it's better to make the other cores jump to the idle thread starting point instead, or where exactly they should jump to when booting (as opposed to

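
The "branch on core number" behaviour described above can be sketched as a small decision function. This is only an illustration, not the actual arm64 boot code: the `sim_`/`SIM_` names are hypothetical, and it assumes the core number is the Aff0 field (bits 7:0) of `MPIDR_EL1`.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_BOOT_NCPUS      4                /* Hypothetical core count */
#define SIM_MPIDR_AFF0_MASK UINT64_C(0xff)   /* Aff0: bits [7:0] of MPIDR_EL1 */

typedef enum
{
  SIM_BOOT_PRIMARY,        /* Continues through full OS initialization */
  SIM_BOOT_SECONDARY_IDLE  /* Ends up running its idle thread */
} sim_boot_path_t;

/* Extract the core number from an MPIDR_EL1 value (assumes Aff0 holds it) */

static unsigned int sim_core_id(uint64_t mpidr)
{
  return (unsigned int)(mpidr & SIM_MPIDR_AFF0_MASK);
}

/* Decide which boot path a core takes, based only on its core number */

static sim_boot_path_t sim_boot_path(uint64_t mpidr)
{
  unsigned int cpu = sim_core_id(mpidr);

  assert(cpu < SIM_BOOT_NCPUS);
  return cpu == 0 ? SIM_BOOT_PRIMARY : SIM_BOOT_SECONDARY_IDLE;
}
```

For example, an `MPIDR_EL1` value of `0x80000003` maps to core 3 and therefore to the idle path in this sketch.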
Never implemented SMP on NuttX so we both learn here :D Maybe What I found in the docs:
This wiki document https://cwiki.apache.org/confluence/display/NUTTX/SMP mentions SMP implementation pretty well:
Let's take a look at other architectures implementation :-)
Hmm, looks like the SMP-related code is somewhere else, let's take a look at arch/arm64/src/common. There it is, let's take a look :-)
nuttx/arch/arm64/src/common/Make.defs, lines 98 to 101 in b17e448
If we look at nuttx/arch/arm64/src/common/arm64_cpustart.c, lines 161 to 239 in b17e448, and search who is using
Okay, so long story short, it seems that SMP is done somewhere around If that does not work out of the box, then we probably need some custom pieces in Also Does that make sense? :-P

Makes sense! I did try just letting the existing arm64 logic handle it, but it didn't work properly. I'll take a look at the rest of the implementations you linked and try to follow those instead, with the scheduling profiler enabled to see what's going on.

hartmannathan left a comment
Thanks for working on this. I can't wait to try it!

Some hints also here https://cwiki.apache.org/confluence/display/NUTTX/NuttX+Initialization+Sequence :-)
That's funny, I don't see that in the Documentation in the repository. Was this one never migrated from the CWIKI to Documentation? (It also contains a little bit of obsolete info, like board_app_initialize() which was recently replaced with board_late_initialize().)

Yeah, we just need to copy-paste the missing stuff and then update the whole thing.. in a "free moment".. but you know.. new board.. some project.. etc etc.. there are not many "free moments" when you get old :-P

Summary
Implements SMP support for the Raspberry Pi 4B on all four cores using the SMP
spin tables on the BCM2711.
Impact
Users can leverage all four cores on NuttX now! 4x more powerful.
Closes #16954.
Part of GSoC #18507!
Testing
OSTest and the SMP test run successfully on the RPi4B. However, there are artifacts in the early boot logging information on startup. I suspect this is because all cores are starting from `_start` and competing to print the early output before spinlocks are used to protect the console.
logs coming soon
Any advice on a better jump point to start the cores is appreciated! Not sure if there are any other examples of spin-table SMP devices in the NuttX tree (or at least, I haven't found them).
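
The garbled early output described under Testing is what one would expect when several cores write to a shared console without mutual exclusion. As a rough illustration (not NuttX's actual console or spinlock code), the sketch below guards a shared buffer with a C11 `atomic_flag` spinlock so that each message is appended whole. All names here are hypothetical, and the in-memory buffer stands in for the UART.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical console model: a lock plus a buffer standing in for the UART */

static atomic_flag g_sim_console_lock = ATOMIC_FLAG_INIT;
static char g_sim_console[256];
static size_t g_sim_console_len;

/* Try to take the lock once; returns true if this caller now owns it */

static bool sim_console_trylock(void)
{
  return !atomic_flag_test_and_set_explicit(&g_sim_console_lock,
                                            memory_order_acquire);
}

/* Spin until the lock is acquired */

static void sim_console_lock(void)
{
  while (!sim_console_trylock())
    {
      /* Busy-wait; a real core might use WFE or a pause hint here */
    }
}

static void sim_console_unlock(void)
{
  atomic_flag_clear_explicit(&g_sim_console_lock, memory_order_release);
}

/* Append one whole message while holding the lock, so output from different
 * cores cannot interleave mid-message. Messages that do not fit are dropped.
 */

static void sim_console_puts(const char *msg)
{
  size_t len;

  assert(msg != NULL);
  len = strlen(msg);

  sim_console_lock();
  if (g_sim_console_len + len < sizeof(g_sim_console))
    {
      memcpy(&g_sim_console[g_sim_console_len], msg, len + 1);
      g_sim_console_len += len;
    }

  sim_console_unlock();
}
```

The point of the sketch is ordering: any output emitted before such a lock exists, or before the secondary cores honour it, can still interleave, which matches the artifacts seen right after the cores are released.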