Codelet reentrancy

JonAbbott · Post by **JonAbbott** » Sun Apr 13, 2014 10:55 pm

To fix Chuck Rock, Cannon Fodder and a few other games, codelets need to either be reentrant or execute with IRQs disabled.

The core problem is that most codelets require working registers, and to preserve the existing register contents they need to be stored somewhere. They can't be stacked as a stack may not exist, so they're currently stored locally within the codelet. This presents a problem if an IRQ occurs whilst a codelet is executing and the IRQ handler subsequently calls the interrupted codelet. As execution is likely to be in User mode, IRQs can't easily be disabled, which leaves few options.
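As a rough illustration of the problem, here is a toy Python model (not ADFFS code). The `Codelet` class, its single `saved_r0` slot and the register values are all hypothetical; the only point is that a save slot held inside the codelet is overwritten when the same codelet is re-entered from an interrupt.

```python
# Toy model: a codelet keeps its working-register save slot inside itself.
class Codelet:
    def __init__(self):
        self.saved_r0 = None  # single local save slot (hypothetical layout)

    def run(self, regs, irq=None):
        self.saved_r0 = regs["r0"]   # preserve the caller's R0 locally
        regs["r0"] = 0xDEAD          # use R0 as a working register
        if irq is not None:
            irq()                    # an IRQ lands mid-codelet
        regs["r0"] = self.saved_r0   # restore R0 from the local slot


codelet = Codelet()

# No interrupt: R0 survives the codelet.
regs = {"r0": 0x1234}
codelet.run(regs)
uninterrupted_r0 = regs["r0"]


# The interrupt handler re-enters the same codelet with its own registers.
def irq_handler():
    irq_regs = {"r0": 0x9999}
    codelet.run(irq_regs)            # overwrites codelet.saved_r0


regs = {"r0": 0x1234}
codelet.run(regs, irq=irq_handler)
interrupted_r0 = regs["r0"]

print(hex(uninterrupted_r0), hex(interrupted_r0))
```

In the interrupted run the foreground code gets back the handler's R0 value rather than its own, which is the corruption the options below try to avoid.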

Options are:

1. Use an undefined instruction to enable / disable IRQs
2. Use an undefined instruction to stack / unstack working registers
3. Have the codelet check to see if it's already stored working registers and use a secondary working area
4. Intercept all IRQs and allow any executing codelets to complete before servicing the IRQ
5. Hypervise the IRQ and FIQ hardware vectors

(1) is the obvious choice, as it mirrors what the ARM3 would do anyway. The drawback, however, is that IRQs could end up disabled most of the time, something we wouldn't want when you're being prompted to insert a floppy.

(2) seems the better option, using a private stack within the JIT. At most four registers are going to be stacked at any one time, two from the codelet that was interrupted and two for any subsequent codelets.

The implication however is that the majority of codelets will cause two undefined instructions, so performance will take a big hit. This won't be noticeable on the Pi, but will impact StrongARM quite heavily. Each undefined instruction will trigger a jump to the Und vector, two CPU mode changes and X instructions to both decode and execute the instruction.

(3) Although this sounds simple, it vastly complicates the codelets. One potential workaround to simplify things would be to duplicate each codelet and have the first hand off to the second if it's already executing.

EDIT: Added option 4. I attempted to code option 2 and soon realised it won't work for some codelets. LDM/STM, for example, have to repeatedly stack and unstack working registers to preserve all registers around the original instruction.

EDIT2: Added option 5. To implement, claim the IRQ/FIQ hardware vectors and check R14 on entry. If it's within JIT codelet space, stack the codelet's local variables, let the IRQ/FIQ proceed but return to us, unstack the codelet's local variables and then exit as normal. If R14 is outside of JIT codelet space, pass the call on. I'm not certain if FIQ would need to be hypervised; I suspect not, as events that trigger codelet reentrancy tend to be VSync, T1, SSBC and EventV, which all hang off IRQ.

This will require some method of 1) ascertaining if the codelet has local variables and 2) knowing where they are in the codelet.
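The control flow of option 5 can be sketched as a toy Python model. Everything named here is an assumption for illustration: the `CODELET_BASE` / `CODELET_END` bounds, the `private_stack` list and the `irq_veneer` function are hypothetical, and R14 is treated as a plain address (any PSR bits or pipeline offset are ignored).

```python
CODELET_BASE = 0x100000   # hypothetical start of JIT codelet space
CODELET_END = 0x180000    # hypothetical end of codelet space (exclusive)

private_stack = []        # stand-in for a private stack inside the JIT


def in_codelet_space(r14):
    return CODELET_BASE <= r14 < CODELET_END


def irq_veneer(r14, codelet_locals, original_handler):
    """Preserve the interrupted codelet's locals around the real handler."""
    if not in_codelet_space(r14):
        original_handler()                       # not ours: pass the call on
        return False
    private_stack.append(list(codelet_locals))   # stack the codelet's locals
    original_handler()                           # may re-enter the codelet
    codelet_locals[:] = private_stack.pop()      # unstack before exiting
    return True


locals_a = [0x1234, 0x5678]


def clobbering_handler():
    locals_a[:] = [0, 0]      # simulates a re-entrant call reusing the slots


saved_inside = irq_veneer(0x100040, locals_a, clobbering_handler)
inside_result = list(locals_a)

# Outside codelet space nothing is saved; shown only to prove the pass-through.
saved_outside = irq_veneer(0x8000, locals_a, clobbering_handler)
outside_result = list(locals_a)
```

The model makes the two requirements above concrete: the veneer needs to know whether the codelet has locals at all, and where they live, before it can copy them.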

EDIT3: A variant of 5. Claim IRQ/FIQ as above, but only store R14 and immediately pass the call on. At all interrupt-driven entry points (OS_Claim, OS_ClaimDeviceVector, OS_CallBack, OS_CallAfter, IRQ/IRQv, IOC VSync/T1/SSBC), cache the variables of the codelet being interrupted.

Suggest altering the codelet header structure to include 4 variable slots and then walking the codelet tree to find the codelet interrupted. This could be sped up by walking up/down depending on whether the R14 address is below or above the halfway mark of allocated codelet space.
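A minimal sketch of that lookup, assuming the codelets can be visited in address order. The `CodeletHeader` fields and the `find_codelet` helper are hypothetical names, and the real codelet tree may be organised differently; the sketch only shows the "start from the nearer end" idea.

```python
from dataclasses import dataclass, field


@dataclass
class CodeletHeader:
    start: int                 # address of the codelet's first instruction
    length: int                # size of the codelet in bytes
    slots: list = field(default_factory=lambda: [0, 0, 0, 0])  # 4 variable slots


def find_codelet(codelets, r14, base, limit):
    """Return the codelet containing r14, or None.

    `codelets` must be sorted by start address. The walk begins at the
    low end if r14 is below the halfway mark, otherwise at the high end.
    """
    midpoint = base + (limit - base) // 2
    order = codelets if r14 < midpoint else reversed(codelets)
    for codelet in order:
        if codelet.start <= r14 < codelet.start + codelet.length:
            return codelet
    return None


codelets = [
    CodeletHeader(0x1000, 0x40),
    CodeletHeader(0x1040, 0x20),
    CodeletHeader(0x1060, 0x80),
]
low_hit = find_codelet(codelets, 0x1044, 0x1000, 0x10E0)
high_hit = find_codelet(codelets, 0x10D0, 0x1000, 0x10E0)
miss = find_codelet(codelets, 0x2000, 0x1000, 0x10E0)
```

Once the codelet is found, its four `slots` are what would be cached at the interrupt-driven entry points.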

JonAbbott · Post by **JonAbbott** » Fri Apr 18, 2014 6:50 pm

It's looking like (1) is the only workable option. Although (3) will work, there is a slim possibility that an IRQ will occur during an IRQ if they're re-enabled by an IRQ handler. Acorn's original advice on IRQ handlers is to switch to SVC and re-enable IRQs as soon as possible if they're going to take time to execute.

(4) IRQs are already intercepted, as they're all veneered. How one determines if PC was previously in a codelet is a different matter, as RISC OS will more than likely stack registers and obfuscate the interrupted PC from us. Even if we can determine if the PC was in a codelet, there's no easy way to let the codelet execute to completion without using CPU debug features.

JonAbbott · Post by **JonAbbott** » Tue Dec 30, 2014 10:32 pm

Having coded up option 5 in ADFFS 2.47, Chuck Rock, Fire & Ice etc still crash in the same place. Codelets are being interrupted, but not very often, so reentrancy isn't the big issue I originally perceived it to be. In fact, out of the games tested to date, BlowPipe is the only one that I can definitely say has reentrancy issues, and they're fixed by the boot script.

After a lot of code tracing of Chuck Rock, I've pinned the problem down to the Abort handler being entered whilst the JIT is running. For example, code writing to VIDC under the JIT will eventually crash if called often enough.

Chuck Rock's issue is within its EventV handler, which writes to VIDC each VSync to set the screen geometry. Comment out the writes and it works without issue. What's strange is the issue itself: the EventV handler calls a subroutine to write to VIDC, and when this exits it sometimes returns to R14+8.

I've ruled out the Abort handlers themselves as a source of the problem (MEMC, IOC, VIDC1, VIDC20). The issue appears to be with swapping to the aborting CPU mode and then back to Abort32 whilst the JIT is running. The Abort handler works fine without the JIT running, so the code itself is okay. It's not re-entrancy of the Abort handler, as it checks for it and will report the issue. It's not IRQs, as IRQs are disabled whilst the Abort handler is working.

To double check, I've tried rewriting the Abort handlers to use the stack instead of static variable locations and still see the problem, so I'm a bit stumped at the minute.

The code that's failing in Chuck Rock is:

000088B8 : E92D8000 : STMFD   sp!,{pc}             ;stacks 88C4 correctly
000088BC : F1A00000 : MOVNV   a1,a1
000088C0 : EA000284 : B       &000092D8
000088C4 : E59C0024 : LDR     a1,[ip,#&024]
000088C8 : EB000003 : BL      &000088DC
000088CC : E59C0038 : LDR     a1,[ip,#&038]
000088D0 : E2800001 : ADD     a1,a1,#1
000088D4 : E58C0038 : STR     a1,[ip,#&038]
000088D8 : E8BD9FFF : LDMFD   sp!,{a1-ip,pc}

000088DC : E92D4000 : STMFD   sp!,{lr}
000088E0 : E5CF0025 : STRB    a1,&0000890D
000088E4 : E1A00420 : MOV     a1,a1,LSR #8
000088E8 : E5CF001E : STRB    a1,&0000890E
000088EC : E1A00420 : MOV     a1,a1,LSR #8
000088F0 : E5CF0017 : STRB    a1,&0000890F
000088F4 : E1A00420 : MOV     a1,a1,LSR #8
000088F8 : E5CF0010 : STRB    a1,&00008910
000088FC : E3A00016 : MOV     a1,#&16
00008900 : E28F1004 : ADR     a2,&0000890C
00008904 : EF000007 : SWI     OS_Word
00008908 : E8BD8000 : LDMFD   sp!,{pc}
...
000092D8 : E3A0050D : MOV     a1,#&03400000
000092DC : E3A0132A : MOV     a2,#&A8000000
000092E0 : E281191D : ADD     a2,a2,#&00074000
000092E4 : E5801000 : STR     a2,[a1,#0]
000092E8 : E3A0132B : MOV     a2,#&AC000000
000092EC : E281180D : ADD     a2,a2,#&000D0000
000092F0 : E5801000 : STR     a2,[a1,#0]
000092F4 : E3A0120B : MOV     a2,#&B0000000
000092F8 : E281184A : ADD     a2,a2,#&004A0000
000092FC : E2811803 : ADD     a2,a2,#&00030000
00009300 : E5801000 : STR     a2,[a1,#0]
00009304 : E3A0132D : MOV     a2,#&B4000000
00009308 : E281183F : ADD     a2,a2,#&003F0000
0000930C : E5801000 : STR     a2,[a1,#0]
00009310 : E3A01322 : MOV     a2,#&88000000
00009314 : E281180D : ADD     a2,a2,#&000D0000
00009318 : E5801000 : STR     a2,[a1,#0]
0000931C : E3A01325 : MOV     a2,#&94000000
00009320 : E28119E3 : ADD     a2,a2,#&0038C000
00009324 : E5801000 : STR     a2,[a1,#0]
00009328 : E3A01323 : MOV     a2,#&8C000000
0000932C : E3A02040 : MOV     a3,#&40
00009330 : E0811702 : ADD     a2,a2,a3,LSL #14
00009334 : E5801000 : STR     a2,[a1,#0]
00009338 : E3A01209 : MOV     a2,#&90000000
0000933C : E28220A0 : ADD     a3,a3,#&A0
00009340 : E0811702 : ADD     a2,a2,a3,LSL #14
00009344 : E5801000 : STR     a2,[a1,#0]
00009348 : E8BD8000 : LDMFD   sp!,{pc}              ;stacked PC is 88C4

At the point of the crash, the STMFD sp!,{pc} at 88B8 did store the correct address of 88C4; the LDMFD sp!,{pc} at 9348, however, sets PC to 88CC.

Although the subroutine at 88DC conveniently stores 88CC on the stack, that's not the cause. If I NOP the branch to it, the code just fails somewhere else.
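The addresses in the listing can be cross-checked with a little arithmetic. The sketch below decodes the two branch words using the standard ARM B/BL encoding (24-bit signed word offset relative to the instruction address plus 8). The +12 used for the stored PC is simply what the listing's own numbers imply (88B8 to 88C4), not something measured here, and `branch_target` is a helper name made up for this example.

```python
def branch_target(address, encoding):
    """Decode the target of an ARM B/BL instruction word."""
    offset = encoding & 0x00FFFFFF
    if offset & 0x00800000:            # sign-extend the 24-bit offset field
        offset -= 0x01000000
    return (address + 8 + (offset << 2)) & 0xFFFFFFFF


b_target = branch_target(0x88C0, 0xEA000284)    # B at 88C0
bl_target = branch_target(0x88C8, 0xEB000003)   # BL at 88C8

stored_pc = 0x88B8 + 12    # value the STMFD sp!,{pc} at 88B8 stacks
bl_return = 0x88C8 + 4     # LR set by the BL, later stacked at 88DC

print(hex(b_target), hex(bl_target), hex(stored_pc), hex(bl_return))
```

The bad return address 88CC is both the stacked PC plus 8 and the return address of the BL at 88C8, which is why the subroutine at 88DC looked like a suspect even though removing the call did not fix the crash.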

JonAbbott · Post by **JonAbbott** » Wed Dec 31, 2014 4:01 pm

Finally found the cause. As the abort handlers exit, the last two instructions switch to the aborting CPU mode and then load R0-PC. If an IRQ occurred between these two instructions, the registers became corrupt. The fix was simply to let the CPU switch the CPU mode as it loads PC via LDM ^.

Testing under StrongARM emulation, both Chuck Rock and Fire & Ice are now working.

On the Pi, Chuck Rock and Bouncer now work, although it's near impossible to swap disks on Chuck Rock. I need to investigate why the Pi doesn't service the keyboard in the same way the RPC does.

Ironically, the codelet reentrancy is now causing the Pi to crash. It looks like the IRQ stack is being reset as RISC OS's IRQ handler exits; I need to investigate further.

JonAbbott · Post by **JonAbbott** » Fri Jan 09, 2015 11:26 am

JonAbbott wrote: Ironically the codelet reentrancy is now causing the Pi to crash. It looks like the IRQ stack is being reset as RISCOS' IRQ handler exits, I need to investigate further.

This turned out to be an ~~issue~~ design decision made in the RTSupport module, which resets the IRQ stack when it exits. It would appear it does this without any form of check to see if there's an IRQ claimant other than itself on the IRQ vector.

I've worked around the issue by implementing a private IRQ stack for ADFFS.

On the positive side, RTSupport may resolve some other issues.

forums.jaspp.org.uk
