JIT Phase 2

Discuss development specific to the Pi version of ADFFS
JonAbbott
Posts: 2956
Joined: Thu Apr 11, 2013 12:13 pm
Location: Essex

JIT Phase 2

Post by JonAbbott »

Add additional requirements for StrongARM:

JIT read ahead (coded)
Cache flushing (coded)
Self-modifying code support (coded)

Re: JIT Phase 2

Post by JonAbbott »

I now have the JIT running on StrongARM and the Pi, having added read-ahead and cache flushing; screenshots of both are below. The Pi was too quick to capture with a camera: self-play was completing each level in less than 2 seconds.

The numbers across the top of each screenshot are laid out as follows. FPS, I, C and R are all decimal values.

<fps> <last instruction> <last codelet> I:<total instructions> C:<cache flushes> R:<re-interpreted instructions>

Screenshot (zarch_arm3jitSA1.png): Zarch on StrongARM under ARM3 JIT (109 fps)
Screenshot (zarch_arm3jitPI1.png): Zarch on Pi under ARM3 JIT (424 fps)
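
For anyone transcribing the status line from a screenshot, here is a minimal Python sketch of a parser for the layout given above. The class name, function name and the sample line are all hypothetical; the post only states that FPS, I, C and R are decimal, so the last instruction and last codelet fields are kept as opaque strings.

```python
import re
from typing import NamedTuple


class JitStatus(NamedTuple):
    fps: int
    last_instruction: str   # kept as text; its number base is not stated in the post
    last_codelet: str       # kept as text for the same reason
    instructions: int       # I: total instructions
    cache_flushes: int      # C: cache flushes
    reinterpreted: int      # R: re-interpreted instructions


# <fps> <last instruction> <last codelet> I:<n> C:<n> R:<n>
_STATUS = re.compile(r"^(\d+)\s+(\S+)\s+(\S+)\s+I:(\d+)\s+C:(\d+)\s+R:(\d+)$")


def parse_status(line: str) -> JitStatus:
    """Split one status line into its six fields."""
    match = _STATUS.match(line.strip())
    if match is None:
        raise ValueError(f"not a status line: {line!r}")
    fps, instr, codelet, total, flushes, reint = match.groups()
    return JitStatus(int(fps), instr, codelet, int(total), int(flushes), int(reint))


# Made-up sample values, purely to show the shape of the line.
sample = "109 E1A00000 0001F3A4 I:1523 C:12 R:7"
status = parse_status(sample)
```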

Re: JIT Phase 2

Post by JonAbbott »

Self-modifying code support has now been added. It's working under emulation with Zarch and Pacmania, but not on physical hardware - I'm not sure why, as I'm flushing the D/I-caches on every write.

That aside, I'm considering adding code to only flush the cache if either an instruction was overwritten, or the write falls within the same cache line as the running instruction. That should reduce the impact on data writes.
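
As a rough illustration of that test, here is a Python sketch of the decision only (the real thing would be a couple of ARM instructions). The 32-byte cache line, the function names and the `code_words` set of already-translated instruction addresses are assumptions made for the example, not details from the JIT itself.

```python
CACHE_LINE_BYTES = 32  # assumption: 32-byte cache lines


def cache_line(addr: int) -> int:
    """Index of the cache line that holds addr."""
    return addr // CACHE_LINE_BYTES


def needs_flush(write_addr: int, pc: int, code_words: set) -> bool:
    """Return True when a proxied write should be followed by a cache flush.

    code_words is a hypothetical set of word-aligned addresses that the JIT
    has already treated as instructions.
    """
    word = write_addr & ~3                      # word that the write touches
    if word in code_words:                      # an instruction was overwritten
        return True
    return cache_line(word) == cache_line(pc)   # same line as the running instruction


code_words = {0x8000, 0x8004, 0x8008}
overwrites_code = needs_flush(0x8004, 0x9000, code_words)
near_pc = needs_flush(0x9010, 0x9004, code_words)
plain_data = needs_flush(0xA000, 0x9004, code_words)
```

With these made-up addresses, only the third write (plain data, away from the running instruction) would skip the flush.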

Currently a write to a page that's marked as containing code takes a massive hit:
  1. A page permission Abort is triggered
  2. 20 (may go to 28) instructions - the Abort handler establishes that the write was triggered by the JIT, passing it on to the OS otherwise (for lazy page swapping)
  3. 12 instructions for User mode, 9 for IRQ/FIQ/SVC - save R0-R14 (this could be improved by optimizing the code)
  4. 7 instructions for SWP, 9 for STR/STM - the instruction is loaded and decoded
    • STM takes 16 + ((reglist / 2) * 3) instructions to decode and a further 18 + (reglist * 6) to proxy the write and adjust Rn
    • STR takes 9 instructions to proxy the write and up to a further 38 instructions to adjust Rn. If the processor is in Late Abort mode and write-back is used, it takes 2 instructions to establish that Rn doesn't need correcting. A further 3 instructions determine whether the cache needs flushing, including the flush itself
    • SWP - yet to be coded
  5. 9 instructions to exit back to the caller, reloading all 15 registers

Total for every write that lands in a page containing instructions:

STR 65 to 101 instructions, including an I-cache flush if required
STM 94 to 199 instructions, with no I-cache flush (currently)
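
One reading of the steps above that reproduces the STR range, written out as a small Python sum. It assumes the Abort itself counts as one step, the User mode register save (12) and the 20-instruction handler check; those choices are a guess at how the totals were added up, not something stated in the post. STM is left out because the register counts behind its range aren't given.

```python
# Per-step instruction counts taken from the list above.
ABORT_ENTRY = 1        # assumption: taking the Abort is counted as one step
HANDLER_CHECK = 20     # Abort handler confirms the JIT caused the write
SAVE_REGS_USER = 12    # save R0-R14 from User mode
LOAD_DECODE_STR = 9    # load and decode the STR
PROXY_WRITE = 9        # proxy the write
ADJUST_RN_MIN = 2      # Late Abort + write-back: Rn needs no correction
ADJUST_RN_MAX = 38     # worst case for correcting Rn
FLUSH_CHECK = 3        # decide whether to flush, including the flush
EXIT_TO_CALLER = 9     # reload registers and return


def str_cost(adjust_rn: int) -> int:
    """Total instructions for one proxied STR under the assumptions above."""
    return (ABORT_ENTRY + HANDLER_CHECK + SAVE_REGS_USER + LOAD_DECODE_STR
            + PROXY_WRITE + adjust_rn + FLUSH_CHECK + EXIT_TO_CALLER)


best = str_cost(ADJUST_RN_MIN)    # 65
worst = str_cost(ADJUST_RN_MAX)   # 101
```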

Re: JIT Phase 2

Post by JonAbbott »

JonAbbott wrote: Self-modifying code support has now been added. It's working under emulation with Zarch and Pacmania, but not on physical hardware - I'm not sure why, as I'm flushing the D/I-caches on every write.
It's a recurrence of the problems I was seeing when flushing individual D-cache entries and isn't related to the self-modifying code. The problem is caused by the JIT instruction decoder when it exits. Until I can track it down, I've reinstated the code that reads 32 KB of RAM to force the D-cache to flush.

Zarch is running 25% slower at 74 fps on StrongARM. I need to recode the DA2 support following some RISC OS bug fixes, so I can't confirm the hit on the Pi just yet; that's my next task.

Re: JIT Phase 2

Post by JonAbbott »

Whilst testing I've noticed emulation support of page access permission Aborts is a bit hit and miss:

Virtual RPC crashes immediately an Abort is generated.

RPCEmu 0.8.11 Interpreter generates 1 Abort and then no more. The Recompiler hangs instantly when loading ADFFS on SA and refuses to power up an ARM710 machine.

Testing Zarch on a physical StrongARM, I'm seeing 10,000 Aborts a second whilst it's running - all due to writes to pages with intermixed code/data. Considering how many there are, I'm surprised it has only slowed it down by 25%.

Re: JIT Phase 2

Post by JonAbbott »

I've done some testing on the Pi: Zarch is getting 20,000 Aborts a second and 145 fps. That's quite a drop, so I'll see what effect only flushing the cache when necessary has.

I can't get the protected version of Zarch to load though, it crashes - it's a bug in the STR/STM/SWP write proxy code.

Re: JIT Phase 2

Post by JonAbbott »

JonAbbott wrote:Zarch is getting 20,000 Aborts a second and 145 fps. Quite a drop, I'll see what effect only flushing the cache if necessary has.
With selective cache flushing, the figures are:

StrongARM: 96 fps (was 109 fps without self-modifying code support - 12% hit on performance)
Pi: 352 fps (was 424 fps without self-modifying code support - 17% hit on performance)

Not so much of a performance hit now and only requires 2 additional instructions to detect if a cache flush is required :D
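
The percentages quoted above follow directly from the fps figures; a quick Python check (the function name is just for illustration):

```python
def percent_hit(before_fps: float, after_fps: float) -> float:
    """Performance drop as a percentage of the original frame rate."""
    return (before_fps - after_fps) / before_fps * 100


strongarm_hit = round(percent_hit(109, 96))   # 12
pi_hit = round(percent_hit(424, 352))         # 17
```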

Re: JIT Phase 2

Post by JonAbbott »

Speed comparison of Zarch, with your ship left on the launch pad and 1 ship buzzing around:

ARM710 40 MHz
Native: 96 fps
JIT without self-modifying code support: 24 fps
JIT with self-modifying code support: 21 fps (seeing ~5000 Aborts a second)

StrongARM 200 MHz
Native: 125 fps
JIT without self-modifying code support: 109 fps
JIT with self-modifying code support: 96 fps (seeing ~10000 Aborts a second)

ARM11 700 MHz (Pi)
Native: 714 fps
JIT without self-modifying code support: 424 fps
JIT with self-modifying code support: 352 fps


In terms of their ARM3 equivalent speed:

ARM710
Native: 40 MHz
JIT without self-modifying code support: 10 MHz
JIT with self-modifying code support: 8 MHz

StrongARM
Native: 50 MHz
JIT without self-modifying code support: 44 MHz
JIT with self-modifying code support: 38 MHz

ARM11
Native: 285 MHz
JIT without self-modifying code support: 170 MHz
JIT with self-modifying code support: 140 MHz
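
The conversion behind these equivalents isn't stated. The figures look consistent with roughly 2.5 fps per ARM3 MHz for this Zarch scene, so the Python sketch below uses that inferred constant; treat it as a guess that lands within about 2 MHz of every figure in the list, not as the method actually used.

```python
FPS_PER_ARM3_MHZ = 2.5  # inferred from the figures above, not stated in the post


def arm3_equivalent_mhz(fps: float) -> float:
    """Approximate ARM3-equivalent clock for a measured Zarch frame rate."""
    return fps / FPS_PER_ARM3_MHZ


# (measured fps, ARM3-equivalent MHz as posted)
posted = [
    (96, 40), (24, 10), (21, 8),        # ARM710
    (125, 50), (109, 44), (96, 38),     # StrongARM
    (714, 285), (424, 170), (352, 140), # ARM11
]

# Largest gap between the estimate and the posted figure, in MHz.
worst_gap = max(abs(arm3_equivalent_mhz(fps) - mhz) for fps, mhz in posted)
```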

Re: JIT Phase 2

Post by JonAbbott »

Self-modifying code support is now tested and working on both StrongARM and ARM11. Outstanding issues to resolve are:
  1. Cache flushing issues. Flushing a single D-cache entry isn't working as advertised. To work around it, it's currently reading 16 KB of RAM on StrongARM and using flush D-cache on the ARM11. That would imply there's something wrong with my cache flush code, but it's identical to the RISC OS code for OS_SynchroniseCodeAreas (which I've also tried using and which, incidentally, has the same issue). As it works under emulation, I'm beginning to think there's a StrongARM erratum.
  2. Check that STM works across page boundaries, where the start of the write is in a read/write page and it overlaps a read-only page. This will affect the StrongARM due to its Late Abort mode. It's fairly easy to identify this type of Abort - the bottom 12 bits of the aborting address will be 0. The tricky part is working out whether anything was written to the page below, or whether the write began at the start of the page. I suspect I may have to codelet every STM on StrongARM to resolve this issue.
  3. Zarch is consistently crashing on StrongARM, emulation and the Pi at the end of the 2nd auto-play when it completes wave 4. It's possibly the code that increases gravity.
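
To make the ambiguity in item 2 concrete, here is a Python sketch of the address arithmetic only. It assumes 4 KB pages (which is what "the bottom 12 bits" implies), a word-aligned ascending store (STMIA) and hypothetical function names; it doesn't model the other STM addressing modes or the actual Abort handler.

```python
PAGE_SIZE = 4096  # 4 KB pages, so a page start has its bottom 12 bits clear


def is_page_start(addr: int) -> bool:
    """True when addr is the first byte of a page."""
    return (addr & (PAGE_SIZE - 1)) == 0


def split_stmia(base: int, reg_count: int) -> tuple:
    """Split an ascending store of reg_count words starting at base.

    Returns (words in the page containing base, words in the next page).
    """
    end = base + 4 * reg_count
    boundary = (base | (PAGE_SIZE - 1)) + 1   # first address of the next page
    if end <= boundary:
        return reg_count, 0
    first = (boundary - base) // 4
    return first, reg_count - first


# Suppose the read-only page starts at 0x9000 and the Abort address is 0x9000.
straddling = split_stmia(0x8FF8, 4)   # two words already written below the boundary
from_start = split_stmia(0x9000, 4)   # store began in the read-only page itself
```

Both stores would report the same page-aligned aborting address, which is why the address alone can't tell the handler whether part of the write already went through.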

Re: JIT Phase 2

Post by JonAbbott »

JonAbbott wrote:
  • Zarch is consistently crashing on StrongARM, emulation and the Pi at the end of the 2nd auto-play when it completes wave 4. It's possibly the code that increases gravity.
This was caused by a fix I'd put in for issue 1. With that corrected, I'm now leaving Zarch to play itself on several machines to check there are no more crashes.

I've noticed another issue with RO4/SA which I need to fix - the screen caching isn't being passed onto the OS when it's enabled.