12 - Translation and repost of part10-multicore

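All text that is not in black was added by me.
All figures belong to the original article.
The original text is at the end of this article.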
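Using multiple CPU cores
I suggested that, rather than using a background DMA transfer, we might use a second CPU core to play the audio while our main core carries on. I also said this would be hard on the Raspberry Pi 4, and it is.
I wrote this code with reference to Sergey Matyukevich's work (https://github.com/s-matyukevich/raspberry-pi-os/tree/master/src/lesson02), for which I am very grateful. It needed some modification to make sure the secondary cores are woken at the right time. This code isn't particularly "safe" yet, but it is enough to prove the concept in principle.
You'll need to modify the config.txt file on your SD card to include these lines: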
kernel_old=1
disable_commandline_tags=1
arm_64bit=1
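Perhaps the most important one here is the kernel_old=1 directive. It tells the bootloader to expect the kernel at offset 0x00000 instead of 0x80000. Because of that, we need to remove this line from link.ld: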
. = 0x80000; /* Kernel load address for AArch64 */
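It also means the secondary cores are not locked for us on boot, so we can still access them (more on this later).
Setting up the main timer
There is one other important piece of setup that we now need to take care of ourselves - establishing the main timer. We add the following #define block to the top of boot.S: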
#define LOCAL_CONTROL 0xff800000
#define LOCAL_PRESCALER 0xff800008
#define OSC_FREQ 54000000
#define MAIN_STACK 0x400000
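LOCAL_CONTROL is the address of the ARM_CONTROL register. At the top of our _start section we set it to zero, which effectively tells the ARM main timer to use the crystal clock as its source and sets the increment value to 1: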
ldr x0, =LOCAL_CONTROL // Sort out the timer
str wzr, [x0]
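We go on to set the prescaler - think of it as another clock divisor. Setting it to this value effectively makes the divisor 1 (i.e. it has no effect):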
mov w1, 0x80000000
str w1, [x0, #(LOCAL_PRESCALER - LOCAL_CONTROL)]
You should remember from part9 that the expected oscillator frequency is 54 MHz. We set it with the following lines:
ldr x0, =OSC_FREQ
msr cntfrq_el0, x0
msr cntvoff_el2, xzr
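Our timer is now set up as we need it.
Booting the main core
We go on to check the processor ID as usual. If it is zero, we are on the main core and we jump forward to label 2. This time we have to set the stack pointer slightly differently. We can't place the stack below our code, because the code now sits at 0x00000! Instead, we use the MAIN_STACK address defined at the top: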
// Set stack to start somewhere safe
mov sp, #MAIN_STACK
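We then clear the BSS as usual and jump to the main() function in our C code. If it ever returns, we branch back to label 1 and halt the core.
Setting up the secondary cores
Previously, we halted the other cores outright by spinning them in an infinite loop at label 1. Now, instead, each core watches a value at its own designated memory address. These values are initialised to zero at the bottom of boot.S and are named spin_cpu0 to spin_cpu3. If a core's value becomes non-zero, that is its signal to wake up and jump to the address stored there, executing whatever code it finds. Once that code returns, we start looping and watching again.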
adr x5, spin_cpu0 // Base watch address
1: wfe
ldr x4, [x5, x1, lsl #3] // Add (8 * core_number) to the base address and load what's there into x4
cbz x4, 1b // Loop if zero, otherwise continue
ldr x2, =__stack_start // Get ourselves a fresh stack - location depends on CPU core asking
lsl x1, x1, #9 // Multiply core_number by 512
add x3, x2, x1 // Add to the address
mov sp, x3
mov x0, #0 // Zero registers x0-x3, just in case
mov x1, #0
mov x2, #0
mov x3, #0
br x4 // Run the code at the address in x4
b 1b
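You'll notice that we point the stack pointer somewhere else, and that each core gets its own designated stack address. This keeps it from conflicting with activity on the other cores. We establish the necessary pointers to a safe memory area by adding the following to link.ld: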
.cpu1Stack :
{
. = ALIGN(16); /* 16-byte aligned */
__stack_start = .; /* Pointer to the start */
. = . + 512; /* 512 bytes long */
__cpu1_stack = .; /* Pointer to the end (stack grows down) */
}
.cpu2Stack :
{
. = . + 512;
__cpu2_stack = .;
}
.cpu3Stack :
{
. = . + 512;
__cpu3_stack = .;
}
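Phew! That's it for the bootloader code. If you use this new bootloader with your existing code, the Raspberry Pi should boot and run as before. We now need to implement the signalling required to execute code on these secondary cores, which are now at our disposal.
Waking the secondary cores from C
Take a look at multicore.c.
Here we essentially duplicate two functions for each core: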
void start_core1(void (*func)(void))
{
store32((unsigned long)&spin_cpu1, (unsigned long)func);
asm volatile ("sev");
}
void clear_core1(void)
{
store32((unsigned long)&spin_cpu1, 0);
}
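The first, start_core1(), uses the store32() function (also in multicore.c) to write an address to our predefined spin_cpu1 memory location. This makes the value non-zero and tells core 1 where to jump when it wakes. Since we put the core to sleep with wfe (Wait For Event), we use sev (Send Event) to wake it again.
The second, clear_core1(), can be called by the executing function to reset spin_cpu1 to zero, so that the core won't jump again when the code returns.
More main()'s please!
Finally, we look at kernel.c, where we still have a single main(), but also:
core0_main() - increments a progress bar every second (roughly)
core1_main() - has a two-step progress bar, playing an audio sample using the CPU at 50% and jumping straight to 100% when done
core2_main() - sets up a DMA audio transfer, then increments a progress bar every half second (roughly), jumping to 100% as playback finishes
core3_main() - increments a progress bar every quarter second (roughly)
main() is core 0's entry point and ultimately falls through to core0_main(), but not before it kicks off core3_main() and core1_main() by passing them to their respective start functions. When core1_main() finishes, it kicks off core2_main().
When you run this, you'll see these functions running in parallel on their respective cores. Welcome to symmetric multi-processing!
If all you see on boot is the rainbow screen, first try updating your firmware using rpi-update from the official Raspberry Pi OS.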

Coming up in part 11, we'll put all of this together to build a multi-core version of the Breakout game.
The original text follows

Using multiple CPU cores
Instead of a background DMA transfer, I suggested that we might use a second CPU core to play the audio whilst our main core continues on. I also said it would be hard on the Raspberry Pi 4... and it is.
I wrote this code as I referenced [Sergey Matyukevich's work](https://github.com/s-matyukevich/raspberry-pi-os/tree/master/src/lesson02), for which I am very grateful. It did need some modification to ensure the secondary cores are woken up when the time is right. This code isn't particularly "safe" yet, but it's good enough to prove the concept in principle.
You'll need to modify your _config.txt_ file on your SD card to include the following lines:
```c
kernel_old=1
disable_commandline_tags=1
arm_64bit=1
```
Perhaps the most important here is the `kernel_old=1` directive. This tells the bootloader to expect the kernel at offset `0x00000` instead of `0x80000`. As such, we'll need to remove this line from our _link.ld_:
```c
. = 0x80000; /* Kernel load address for AArch64 */
```
It also won't lock the secondary cores for us on boot, so we will still be able to access them (more on this later).
Setting up the main timer
There is one other important piece of setup that we'll need to take care of ourselves now - establishing the main timer. We add the following `#define` block to the top of _boot.S_:
```c
#define LOCAL_CONTROL 0xff800000
#define LOCAL_PRESCALER 0xff800008
#define OSC_FREQ 54000000
#define MAIN_STACK 0x400000
```
`LOCAL_CONTROL` is the address of the ARM_CONTROL register. At the top of our `_start:` section we'll set this to zero, effectively telling the ARM main timer to use the crystal clock as a source and set the increment value to 1:
```c
ldr x0, =LOCAL_CONTROL // Sort out the timer
str wzr, [x0]
```
We go on to set the prescaler - think of this as another clock divisor equivalent. Setting it thus will effectively make this divisor 1 (i.e. it will have no effect):
```c
mov w1, 0x80000000
str w1, [x0, #(LOCAL_PRESCALER - LOCAL_CONTROL)]
```
You should remember the expected oscillator frequency of 54 MHz from part9. We set this with the following lines:
```c
ldr x0, =OSC_FREQ
msr cntfrq_el0, x0
msr cntvoff_el2, xzr
```
Our timer is now as we need it.
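To make the 54 MHz figure a little more concrete, here is a small host-side sketch of the arithmetic that follows from it: with the counter frequency set to `OSC_FREQ`, the counter advances 54 ticks per microsecond. The helper names `us_to_ticks()` and `ticks_to_us()` are made up for this illustration and are not part of the tutorial's code.

```c
#include <assert.h>
#include <stdint.h>

#define OSC_FREQ 54000000u   /* same value as in the #define block above */

/* Number of counter ticks that elapse in `us` microseconds. */
static uint64_t us_to_ticks(uint64_t us)
{
    return (us * OSC_FREQ) / 1000000u;
}

/* Number of whole microseconds represented by `ticks` counter ticks. */
static uint64_t ticks_to_us(uint64_t ticks)
{
    assert(OSC_FREQ != 0);
    return (ticks * 1000000u) / OSC_FREQ;
}
```

A half-second delay, for instance, would correspond to `us_to_ticks(500000)` ticks of the counter.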
Booting the main core
We go on to check the processor ID as we always have. If it's zero then we're on the main core and we jump forward to label `2:`. This time, we have to set our stack pointer slightly differently. We can't set it below our code, because it's at 0x00000 now! Instead, we use the address we defined earlier as `MAIN_STACK` at the top:
```c
// Set stack to start somewhere safe
mov sp, #MAIN_STACK
```
We then continue to clear the BSS as always, and jump to our `main()` function in C code. If it does happen to return, we branch back to `1:` to halt the core.
Setting up the secondary cores
Previously, we've unequivocally halted the other cores by spinning them in an infinite loop at label `1:`. Instead, each core will now watch a value at its own designated memory address, initialised to zero at the bottom of _boot.S_, and named as `spin_cpu0-3`. If this value goes non-zero, then that's a signal to wake up and jump to that memory location, executing whatever code is there. Once that code returns, we start looping and watching all over again.
```c
adr x5, spin_cpu0 // Base watch address
1: wfe
ldr x4, [x5, x1, lsl #3] // Add (8 * core_number) to the base address and load what's there into x4
cbz x4, 1b // Loop if zero, otherwise continue
ldr x2, =__stack_start // Get ourselves a fresh stack - location depends on CPU core asking
lsl x1, x1, #9 // Multiply core_number by 512
add x3, x2, x1 // Add to the address
mov sp, x3
mov x0, #0 // Zero registers x0-x3, just in case
mov x1, #0
mov x2, #0
mov x3, #0
br x4 // Run the code at the address in x4
b 1b
```
You'll notice that we've set our stack pointer elsewhere, and each core has its own designated stack address. This is to avoid it conflicting with activity on the other cores. We establish the necessary pointers to a safe memory area by adding the following to our _link.ld_:
```c
.cpu1Stack :
{
. = ALIGN(16); /* 16-byte aligned */
__stack_start = .; /* Pointer to the start */
. = . + 512; /* 512 bytes long */
__cpu1_stack = .; /* Pointer to the end (stack grows down) */
}
.cpu2Stack :
{
. = . + 512;
__cpu2_stack = .;
}
.cpu3Stack :
{
. = . + 512;
__cpu3_stack = .;
}
```
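The two shifts in the assembly loop (`lsl #3` and `lsl #9`) are just address arithmetic, and it may help to see them written out in C. The sketch below runs on a host machine with made-up base addresses; the helper names are hypothetical, and it assumes the three stack sections end up laid out back to back as the linker script suggests.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CORES   4
#define SLOT_BYTES  8     /* lsl #3: one 64-bit wake-up slot per core */
#define STACK_BYTES 512   /* lsl #9: one 512-byte stack per core */

/* Address of the wake-up slot that `core` polls (spin_cpu0 + 8 * core). */
static uintptr_t spin_slot_addr(uintptr_t spin_base, unsigned core)
{
    assert(core < NUM_CORES);
    return spin_base + (uintptr_t)core * SLOT_BYTES;
}

/* Initial stack pointer for a secondary core. Stacks grow down, so
   core 1 starts at __stack_start + 512, which is __cpu1_stack. */
static uintptr_t stack_top_for_core(uintptr_t stack_start, unsigned core)
{
    assert(core >= 1 && core < NUM_CORES);
    return stack_start + (uintptr_t)core * STACK_BYTES;
}
```

With a pretend `__stack_start` of `0x2000`, core 1 would get `0x2200` and core 3 would get `0x2600`, so each core's stack occupies its own 512-byte window.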
Phew! That's it for the bootloader code. If you use this new bootloader with your existing code, the RPi4 should boot and run as before. We now need to go on to implement the signalling required to execute code on these secondary cores which are now at our disposal.
Waking the secondary cores from C
Check out _multicore.c_.
Here we essentially duplicate two functions for each core:
```c
void start_core1(void (*func)(void))
{
store32((unsigned long)&spin_cpu1, (unsigned long)func);
asm volatile ("sev");
}
void clear_core1(void)
{
store32((unsigned long)&spin_cpu1, 0);
}
```
The first, `start_core1()`, uses the `store32()` function (also in _multicore.c_) to write an address to our predefined `spin_cpu1` memory location. This takes it non-zero, telling core 1 where to jump to when it wakes. Since we put it to sleep with a `wfe` (Wait For Event) instruction, we use a `sev` (Send Event) instruction to wake it again.
The second, `clear_core1()`, can be used by an executing function to reset `spin_cpu1` to zero, so the core won't jump again when the executing code returns.
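The start/clear handshake can be tried out on a host machine without any hardware. The sketch below is only a simulation under stated assumptions: `spin_slots` stands in for `spin_cpu0-3`, `poll_core_once()` imitates one pass of the assembly loop, and a generic `start_core(core, func)` replaces the per-core duplicates. None of these names come from _multicore.c_, and there is no real `wfe`/`sev` here.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CORES 4

typedef void (*core_fn)(void);

/* Stand-in for spin_cpu0..spin_cpu3; on hardware these live in boot.S. */
static core_fn volatile spin_slots[NUM_CORES];

/* Publish an entry point for `core`. On hardware a `sev` would follow. */
static void start_core(unsigned core, core_fn func)
{
    assert(core < NUM_CORES && func != NULL);
    spin_slots[core] = func;
}

/* Reset the slot so the core goes back to waiting. */
static void clear_core(unsigned core)
{
    assert(core < NUM_CORES);
    spin_slots[core] = NULL;
}

/* One pass of the assembly loop: run the published function, if any.
   Returns 1 if something ran, 0 if the slot was still empty. */
static int poll_core_once(unsigned core)
{
    assert(core < NUM_CORES);
    core_fn func = spin_slots[core];
    if (func == NULL)
        return 0;
    func();
    return 1;
}

static int demo_runs;   /* counts how often the demo task has run */

/* Demo task: clears its own slot first, then does its "work". */
static void demo_task(void)
{
    clear_core(1);
    demo_runs++;
}
```

Because `demo_task()` clears its own slot, a second poll after it has run finds nothing to do, which mirrors why `clear_core1()` exists in the first place.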
More main()'s please!
Finally, we look at _kernel.c_, where we now have a single `main()`, but also:
* `core0_main()` - increments a progress bar every 1 second (roughly)
* `core1_main()` - has a two-step progress bar, playing an audio sample using the CPU at 50%, jumping straight to 100% when done
* `core2_main()` - sets a DMA audio transfer, then increments a progress bar every half second (roughly), jumping to 100% as playback finishes
* ... and `core3_main()` - increments a progress bar every quarter second (roughly)
`main()` is core 0's entry point, which ultimately falls through to `core0_main()`, but not before it kicks off `core3_main()` and `core1_main()` by passing them to their respective start functions. When `core1_main()` finishes, it kicks off `core2_main()`.
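The launch order described here can be sketched roughly as follows. This is a host-side outline and not the contents of _kernel.c_: `start_core()` is a hypothetical stub that only records which core was started, `main_sketch()` stands in for `main()`, and the function bodies are placeholders.

```c
#include <assert.h>

typedef void (*core_fn)(void);

static int launch_order[4];   /* core numbers in the order they were started */
static int launches;

/* Host stub: records the launch. On hardware this would write the
   core's spin slot and issue `sev`. */
static void start_core(int core, core_fn func)
{
    assert(launches < 4 && func != 0);
    launch_order[launches++] = core;
}

static void core0_main(void) { /* progress bar, roughly 1 s steps */ }
static void core2_main(void) { /* DMA audio plus progress bar */ }
static void core3_main(void) { /* progress bar, roughly 0.25 s steps */ }

static void core1_main(void)
{
    /* ... play the audio sample using the CPU ... */
    start_core(2, core2_main);   /* hand over to core 2 when done */
}

static void main_sketch(void)
{
    start_core(3, core3_main);
    start_core(1, core1_main);
    core0_main();                /* core 0 falls through to its own work */
}
```

In this outline core 2 is only started from inside `core1_main()`, which matches the description of it being kicked off once core 1 has finished.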
_As you run this, you'll see that these functions run in parallel on their respective cores. Welcome to symmetric multi-processing!_
**If all you see on boot is the rainbow screen, try first updating your firmware using** `rpi-update` **from Raspbian.**

Coming up in part 11, we'll put all of this work together for a multi-core version of our Breakout game.