To fix Chuck Rock, Cannon Fodder and a few other games, codelets need to either be reentrant or execute with IRQ's disabled.
The core problem is that most codelets require working registers, and to preserve the existing register contents they need to be stored somewhere. They can't be stacked, as a stack may not exist, so they're currently stored locally within the codelet. This presents a problem if an IRQ occurs whilst a codelet is executing and the IRQ handler subsequently calls the codelet that was interrupted. As execution is likely to be in User mode, IRQs can't easily be disabled, which leaves few options.
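To illustrate the hazard, here's a toy Python model (not ADFFS code; the class and register names are made up for the example) of a codelet that keeps its saved working register in a single local slot, and what happens when an IRQ re-enters it:

```python
# Toy model only: a codelet preserves R0 in its own local slot, not on a stack.
# All names here are illustrative assumptions, not ADFFS internals.

class Codelet:
    def __init__(self):
        self.saved_r0 = None            # codelet-local save slot

    def run(self, regs, interrupt=None):
        self.saved_r0 = regs["r0"]      # preserve the caller's R0
        regs["r0"] = 0xDEAD             # use R0 as a working register
        if interrupt is not None:
            interrupt()                 # an IRQ arrives mid-codelet
        regs["r0"] = self.saved_r0      # restore the caller's R0


codelet = Codelet()

# Uninterrupted run: R0 survives.
foreground = {"r0": 1}
codelet.run(foreground)
clean_r0 = foreground["r0"]

# Interrupted run: the IRQ handler executes the same codelet with its own R0,
# overwriting the single save slot before the foreground restores from it.
foreground = {"r0": 1}
irq_regs = {"r0": 2}
codelet.run(foreground, interrupt=lambda: codelet.run(irq_regs))
corrupt_r0 = foreground["r0"]

print(clean_r0, corrupt_r0)
```

In this model the interrupted foreground ends up with the IRQ's R0 rather than its own, which is the kind of corruption described above.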
Options are:
1. Use an undefined instruction to enable / disable IRQ's
2. Use an undefined instruction to stack / unstack working registers
3. Have the codelet check to see if it's already stored working registers and use a secondary working area
4. Intercept all IRQ's and allow any executing codelets to complete before serving the IRQ
5. Hypervise the IRQ and FIQ hardware vectors
(1) is the obvious choice, as it mirrors what the ARM3 would do anyway. The drawback however is that IRQ's could end up potentially disabled most of the time - something we wouldn't want when you're being prompted to insert a floppy.
(2) seems the better option, using a private stack within the JIT. At most four registers are going to be stacked at any one time, two from the codelet that was interrupted and two for any subsequent codelets.
The implication however is that the majority of codelets will cause two undefined instructions, so performance will take a big hit. This won't be noticeable on the Pi, but will impact StrongARM quite heavily. Each undefined instruction will trigger a jump to the Und vector, two CPU mode changes and X instructions to both decode and execute the instruction.
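As a rough sketch of option 2, the Python model below stands in for a JIT-private stack with four slots, and counts one "trap" per stack / unstack to mirror the two undefined instructions per codelet. The class, the slot count as a parameter and the trap counter are assumptions made for the illustration, not the real implementation:

```python
class PrivateStack:
    """Fixed-size stack standing in for a JIT-private save area (hypothetical)."""

    def __init__(self, capacity=4):
        self.slots = []
        self.capacity = capacity
        self.traps = 0      # each stack/unstack models one undefined-instruction trap

    def stack(self, *values):
        self.traps += 1
        if len(self.slots) + len(values) > self.capacity:
            raise OverflowError("private stack full")
        self.slots.extend(values)

    def unstack(self, count):
        self.traps += 1
        values = self.slots[-count:]
        del self.slots[-count:]
        return values


def codelet(stack, regs, interrupt=None):
    stack.stack(regs["r0"], regs["r1"])         # preserve two working registers
    regs["r0"], regs["r1"] = 0xAAAA, 0xBBBB     # scratch use
    if interrupt is not None:
        interrupt()                             # IRQ re-enters a codelet
    regs["r0"], regs["r1"] = stack.unstack(2)   # restore in LIFO order


stack = PrivateStack()
fg = {"r0": 1, "r1": 2}
irq = {"r0": 3, "r1": 4}
codelet(stack, fg, interrupt=lambda: codelet(stack, irq))
print(fg, irq, stack.traps)
```

With one level of interruption, the four slots are enough and both register sets come back intact, at a cost of four traps for the two codelet executions.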
(3) although this sounds simple, it vastly complicates the codelets. One potential workaround to simplify things, would be to duplicate each codelet and have the first handoff to the second if it's already executing.
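For comparison, a minimal Python sketch of option 3, assuming a codelet with a primary and a secondary save area and a depth counter to pick between them. The names are hypothetical, and the sketch ignores the fact that the check-and-claim would itself need to be safe against an IRQ on real hardware:

```python
class Codelet:
    """Toy codelet with a primary and a secondary working area (illustrative)."""

    def __init__(self):
        self.areas = [None, None]   # primary and secondary save slots
        self.depth = 0              # number of activations currently in flight

    def run(self, regs, interrupt=None):
        if self.depth >= len(self.areas):
            raise RuntimeError("no free working area")
        slot = self.depth           # first entry uses area 0, re-entry uses area 1
        self.depth += 1
        self.areas[slot] = regs["r0"]
        regs["r0"] = 0xDEAD         # scratch use of the working register
        if interrupt is not None:
            interrupt()
        regs["r0"] = self.areas[slot]
        self.depth -= 1


codelet = Codelet()
fg, irq = {"r0": 1}, {"r0": 2}
codelet.run(fg, interrupt=lambda: codelet.run(irq))
print(fg, irq)
```

One level of re-entry is handled, but a third nested entry would exhaust the two areas in this sketch.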
EDIT: Added option 4. I attempted to code option 2 and soon realised it won't work for some codelets. LDM/STM, for example, have to repeatedly stack and unstack working registers to preserve all registers around the original instruction.
EDIT2: Added option 5. To implement, claim the IRQ/FIQ hardware vectors and check R14 on entry, if it's within JIT codelet space, stack the codelets local variables, let the IRQ/FIQ proceed but return to us, unstack the codelets local variables and then exit as normal. If R14 is outside of JIT codelet space pass the call on. I'm not certain if FIQ would need to be hypervised, I suspect not as events that trigger codelet reentrancy tend to be VSync, T1, SSBC, EventV which all hang off IRQ.
This will require some method of 1) ascertaining if the codelet has local variables and 2) knowing where they are in the codelet.
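The flow for option 5 could be modelled roughly as below. This is a Python sketch under assumptions: the address range, function names and the way locals are represented are all invented for the example, and it ignores the PC offset and any flag bits that a real R14 would carry on IRQ entry.

```python
JIT_BASE, JIT_LIMIT = 0x02000000, 0x02100000    # hypothetical codelet address range


def in_codelet_space(r14):
    return JIT_BASE <= r14 < JIT_LIMIT


def irq_veneer(r14, codelet_locals, original_handler, save_stack):
    """Model of a claimed IRQ vector: protect codelet locals around the handler."""
    if not in_codelet_space(r14):
        original_handler()                      # not in a codelet, pass the call on
        return "passed on"
    save_stack.append(list(codelet_locals))     # stack the interrupted codelet's locals
    original_handler()                          # handler may re-enter the codelet
    codelet_locals[:] = save_stack.pop()        # unstack before resuming the codelet
    return "hypervised"


save_stack = []
codelet_locals = [1, 2]


def handler():
    codelet_locals[:] = [9, 9]                  # stands in for a re-entrant codelet call


inside = irq_veneer(JIT_BASE + 0x40, codelet_locals, handler, save_stack)
locals_after_inside = list(codelet_locals)

outside = irq_veneer(0x8000, codelet_locals, handler, save_stack)
locals_after_outside = list(codelet_locals)

print(inside, locals_after_inside, outside, locals_after_outside)
```

The sketch takes the locals as a ready-made list, which is exactly the part that needs solving: knowing whether the interrupted codelet has locals and where they live.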
EDIT3: A variant of 5. Claim IRQ/FIQ as above, but only store R14 and immediately pass the call on. At all interrupt driven entry points (OS_Claim, OS_ClaimDeviceDriver, OS_CallBack, OS_CallAfter, IRQ/IRQv, IOC VSync/T1/SSBC) cache the variables of the codelet being interrupted.
Suggest altering the codelet header structure to include 4 variable slots and then walking the codelet tree to find the codelet interrupted. This could be sped up by walking up/down depending on the R14 address being below or above the halfway mark of allocated codelet space.
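A sketch of that lookup in Python, assuming the codelets can be treated as a flat list sorted by address (the real structure is a tree, and the header fields here are guesses for illustration):

```python
from dataclasses import dataclass, field


@dataclass
class CodeletHeader:
    start: int
    size: int
    slots: list = field(default_factory=lambda: [0, 0, 0, 0])  # four variable slots


def find_codelet(codelets, r14):
    """Walk from whichever end of codelet space is nearer to r14.

    codelets must be sorted by start address. Returns (header or None, steps).
    """
    if not codelets:
        return None, 0
    low = codelets[0].start
    high = codelets[-1].start + codelets[-1].size
    midpoint = low + (high - low) // 2
    order = codelets if r14 < midpoint else reversed(codelets)
    steps = 0
    for header in order:
        steps += 1
        if header.start <= r14 < header.start + header.size:
            return header, steps
    return None, steps


# Eight hypothetical codelets of &40 bytes each, starting at &1000.
codelets = [CodeletHeader(0x1000 + i * 0x40, 0x40) for i in range(8)]

near_end, steps_end = find_codelet(codelets, 0x11C4)      # found walking down from the top
near_start, steps_start = find_codelet(codelets, 0x10C4)  # found walking up from the bottom
missing, steps_missing = find_codelet(codelets, 0x2000)   # not in codelet space
print(steps_end, steps_start, steps_missing)
```

Picking the walk direction from the midpoint roughly halves the worst case for a hit, although an address outside codelet space still costs a full walk in this sketch.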
Codelet reentrancy
Re: Codelet reentrancy
It's looking like (1) is the only workable option. Although (3) will work, there is a slim possibility that an IRQ will occur during an IRQ if they're reenabled by an IRQ handler. Acorn's original advice on IRQ handlers is to switch to SVC and reenable IRQs as soon as possible if they're going to take time to execute.
(4) IRQs are already intercepted, as they're all veneered. How one determines if PC was previously in a codelet is a different matter, as RO will more than likely stack registers and obfuscate the interrupted PC from us. Even if we can determine if the PC was in a codelet, there's no easy way to let the codelet execute to completion without using CPU debug features.
Re: Codelet reentrancy
Having coded up option 5 in ADFFS 2.47, Chuck Rock, Fire & Ice etc crash still in the same place. Codelets are being interrupted, but not very often so reentrancy isn't the big issue I originally perceived it to be. In fact out of the games tested to date, BlowPipe is the only one that I can definitely say has reentrancy issues and they're fixed by the boot script.
After a lot of code tracing of Chuck Rock, I've pinned the problem down to the Abort handler being entered whilst the JIT is running. For example, code writing to VIDC under the JIT will eventually crash if called often enough.
Chuck Rock's issue is within its EventV handler, which writes to VIDC each VSync to set the screen geometry. Comment out the writes and it works without issue. What's strange is the issue itself, the EventV handler calls a subroutine to write to VIDC, when this exits it sometimes returns to R14+8.
I've ruled out the Abort handlers themselves as a source of the problem (MEMC, IOC, VIDC1, VIDC20). The issue appears to be with swapping to the aborting CPU mode and then back to Abort32 whilst the JIT is running. The Abort handler works fine without the JIT running, so the code itself is okay. It's not re-entrancy of the Abort handler, as it checks for it and will report the issue. It's not IRQs, as IRQs are disabled whilst the Abort handler is working.
To double check, I've tried rewriting the Abort handlers to use the stack instead of static variable locations and still see the problem, so I'm a bit stumped at the minute.
At the point of the crash, the STMFD sp!,{pc} at 88B8 did store the correct address of 88C4; the LDMFD sp!,{pc} at 9348, however, sets PC to 88CC.
The code that's failing in Chuck Rock is:
Code:
000088B8 : E92D8000 : STMFD sp!,{pc} ;stacks 88C4 correctly
000088BC : F1A00000 : MOVNV a1,a1
000088C0 : EA000284 : B &000092D8
000088C4 : E59C0024 : LDR a1,[ip,#&024]
000088C8 : EB000003 : BL &000088DC
000088CC : E59C0038 : LDR a1,[ip,#&038]
000088D0 : E2800001 : ADD a1,a1,#1
000088D4 : E58C0038 : STR a1,[ip,#&038]
000088D8 : E8BD9FFF : LDMFD sp!,{a1-ip,pc}
000088DC : E92D4000 : STMFD sp!,{lr}
000088E0 : E5CF0025 : STRB a1,&0000890D
000088E4 : E1A00420 : MOV a1,a1,LSR #8
000088E8 : E5CF001E : STRB a1,&0000890E
000088EC : E1A00420 : MOV a1,a1,LSR #8
000088F0 : E5CF0017 : STRB a1,&0000890F
000088F4 : E1A00420 : MOV a1,a1,LSR #8
000088F8 : E5CF0010 : STRB a1,&00008910
000088FC : E3A00016 : MOV a1,#&16
00008900 : E28F1004 : ADR a2,&0000890C
00008904 : EF000007 : SWI OS_Word
00008908 : E8BD8000 : LDMFD sp!,{pc}
...
000092D8 : E3A0050D : MOV a1,#&03400000
000092DC : E3A0132A : MOV a2,#&A8000000
000092E0 : E281191D : ADD a2,a2,#&00074000
000092E4 : E5801000 : STR a2,[a1,#0]
000092E8 : E3A0132B : MOV a2,#&AC000000
000092EC : E281180D : ADD a2,a2,#&000D0000
000092F0 : E5801000 : STR a2,[a1,#0]
000092F4 : E3A0120B : MOV a2,#&B0000000
000092F8 : E281184A : ADD a2,a2,#&004A0000
000092FC : E2811803 : ADD a2,a2,#&00030000
00009300 : E5801000 : STR a2,[a1,#0]
00009304 : E3A0132D : MOV a2,#&B4000000
00009308 : E281183F : ADD a2,a2,#&003F0000
0000930C : E5801000 : STR a2,[a1,#0]
00009310 : E3A01322 : MOV a2,#&88000000
00009314 : E281180D : ADD a2,a2,#&000D0000
00009318 : E5801000 : STR a2,[a1,#0]
0000931C : E3A01325 : MOV a2,#&94000000
00009320 : E28119E3 : ADD a2,a2,#&0038C000
00009324 : E5801000 : STR a2,[a1,#0]
00009328 : E3A01323 : MOV a2,#&8C000000
0000932C : E3A02040 : MOV a3,#&40
00009330 : E0811702 : ADD a2,a2,a3,LSL #14
00009334 : E5801000 : STR a2,[a1,#0]
00009338 : E3A01209 : MOV a2,#&90000000
0000933C : E28220A0 : ADD a3,a3,#&A0
00009340 : E0811702 : ADD a2,a2,a3,LSL #14
00009344 : E5801000 : STR a2,[a1,#0]
00009348 : E8BD8000 : LDMFD sp!,{pc} ;stacked PC is 88C4
Although the subroutine at 88DC conveniently stores 88CC on the stack, that's not the cause. If I NOP the branch to it, the code just fails somewhere else.
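For anyone wanting to check the values in the listing, the Python sketch below decodes the standard ARM data-processing immediate (an 8-bit value rotated right by twice the 4-bit rotate field) for a few of the opcodes above, and works out the offsets between the stacked and observed return addresses. The helper names are just for this example, and the +12 offset is taken from the listing's own comment rather than assumed for every ARM core.

```python
def ror32(value, amount):
    """Rotate a 32-bit value right by amount bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def decode_immediate(instruction):
    """Operand 2 immediate of an ARM data-processing instruction: imm8 ROR (2 * rot)."""
    imm8 = instruction & 0xFF
    rot = (instruction >> 8) & 0xF
    return ror32(imm8, 2 * rot)


# 92D8: MOV a1,#&03400000 is the address the following STRs write to
vidc_address = decode_immediate(0xE3A0050D)

# 92DC / 92E0: MOV a2,#&A8000000 then ADD a2,a2,#&00074000 build the first value written
first_write = decode_immediate(0xE3A0132A) + decode_immediate(0xE281191D)

# 88B8: STMFD sp!,{pc} stacked 88C4, i.e. the instruction address plus 12
stacked_pc = 0x88B8 + 12

# The failing return landed on 88CC, which is 8 bytes beyond the stacked address
observed_pc = 0x88CC
offset = observed_pc - stacked_pc

print(hex(vidc_address), hex(first_write), hex(stacked_pc), offset)
```

The 8 byte difference matches the "returns to R14+8" behaviour seen in the EventV handler.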
Re: Codelet reentrancy
Finally found the cause. As the abort handlers exit, the last two instructions switch to the aborting CPU mode and then load R0-PC. If an IRQ occurred between these two instructions, the registers became corrupt. The fix was simply to let the CPU switch the CPU mode as it loads PC via LDM ^.
Testing under StrongARM emulation, both Chuck Rock and Fire & Ice are now working.
On the Pi, Chuck Rock and bouncer now work although it's near impossible to swap disks on Chuck Rock - I need to investigate why the Pi doesn't service the keyboard in the same way the RPC does.
Ironically the codelet reentrancy is now causing the Pi to crash. It looks like the IRQ stack is being reset as RISCOS' IRQ handler exits, I need to investigate further.
Re: Codelet reentrancy
JonAbbott wrote: Ironically the codelet reentrancy is now causing the Pi to crash. It looks like the IRQ stack is being reset as RISCOS' IRQ handler exits, I need to investigate further.
This turned out to be an
I've worked around the issue by implementing a private IRQ stack for ADFFS.
On the positive side, RTSupport may resolve some other issues.