Update on next release

Discuss ADFFS development and download test releases
JonAbbott
Posts: 2938
Joined: Thu Apr 11, 2013 12:13 pm
Location: Essex

Update on next release

Post by JonAbbott »

Despite putting in between 12 and 20 hours a day since the start of Feb, it's taking a lot longer than I thought to get the next release ready.

There have been many substantial changes under the hood to provide both Iyonix and IOMD 32-bit support, which have tripled the amount of testing I have to do, so a simple test run now takes three days. On top of that, when I find something that breaks on one platform I have to go back and retest all the others due to the interdependencies in the codebase.

Currently I have two show-stopping issues to resolve:
  • Cache flushing - this is proving particularly tricky on the Iyonix, where it's corrupting the cache instead of cleaning it. Over the past two weeks I've been rewriting the cache handling to reduce cache invalidation and instead clear only I and D entries as required. It's still in progress, so I'm not sure it actually works yet. It's meant building up a large library of cache functions, as every ARM chip requires a different method to clean the cache
  • Branch walking - a new addition in the next release where the JIT walks code branches to reduce the number of JIT entries. This is closely tied in with the cache flushing, so is being developed in parallel. The first release will simply walk the first unconditional B/BL it hits; in a later update BLs will be queued for checking
Lotus Turbo Challenge 2 and Nebulus will be released in parallel with the next release of ADFFS. With the addition of page zero read support, any games that read/write the IRQ vectors directly now work without manual fixups. The knock-on effect of adding this, however, is that it exposes the number of games that have bugs in them and mistakenly read from page zero.

ADFFS could let all page zero reads that it doesn't know how to handle succeed (I'll probably add an option for this for public release), but instead I've taken the route of fixing the bugs in the games themselves (ADFFS has the ability to dynamically change code when seen). I've fixed 14 bugs so far across 12 games and have 3 more bugs to resolve in Battle Chess, Caverns and Heimdall. I'm sure this is the tip of the iceberg considering I've only been testing around 20 games in the core testing.

In general illegal page zero reads can be classified into three categories. In order of prevalence, they are:
  1. Undefined pointers (value is 0) - fairly consistent across all C games, so I suspect it was a bug in the original compiler. A special case can be made for a read from 0 itself, as it's unlikely a game would ever need the actual value there, so ADFFS can pass back the value it would have got on the OS environment it's currently providing. This may have a knock-on effect if it's a memory pointer though, so as yet I've not added or tested this
  2. Sound handlers that don't initialise the Sound Channel Control Block (SCCB) during initialisation - fairly consistent across all games that have internal Voices. For the most part, very few have initialisation routines; they rely on RISCOS saving the values to the SCCB after the first call
  3. Programming errors - these are inevitable although thankfully low in number. These aren't so easy to fix as the previous two issues, as you have to understand what the code is trying to do and can't simply skip the instruction if it's reading from page zero
New to the next release is support for different RISCOS version environments. You can now specify a particular version of RISCOS and SWIs etc. will behave as they did on that version. These differences aren't always documented, however, so I'm building this up as they come up in games. Examples so far include the sound sample rate/length, the value returned by OS_Byte 129, and OS_Word 21, 0 (define mouse pointer), which pre-RISCOS 3.5 delayed by 1 VSync before returning.
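To make the version-environment idea concrete, here is a minimal sketch of how one such difference could be keyed on the selected OS version. It only encodes the OS_Word 21, 0 behaviour mentioned above; the function name and the (major, minor) tuple representation are assumptions for illustration, not how ADFFS stores it.

```python
from typing import Tuple

Version = Tuple[int, int]  # hypothetical encoding, e.g. (3, 10) for 3.10, (3, 50) for 3.5


def pointer_define_delay_vsyncs(version: Version) -> int:
    """VSyncs to wait before OS_Word 21, 0 (define mouse pointer) returns.

    Per the post, versions before 3.5 delayed by one VSync before returning.
    """
    return 1 if version < (3, 50) else 0


for v in [(3, 10), (3, 11), (3, 50), (3, 70)]:
    print(v, pointer_define_delay_vsyncs(v))
```

Other documented differences (sound sample rate/length, OS_Byte 129) would need their own entries; the values for those aren't given in the post, so they are left out here.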

The blitter has had an overhaul to resolve some issues that came up with Gribbly's Day Out and Rockfall; it now correctly handles overlapping borders and centres the screen where it can. There is one outstanding issue with Caverns, where the blitter is dropping 2 pixels; I'll resolve this in a later release when I add the hardware cursor support that Caverns requires.
Vanfanel
Posts: 576
Joined: Mon Sep 16, 2013 12:01 am

Re: Update on next release

Post by Vanfanel »

I can't wait for the next release!!
I come here every day to check the news and progress :D

Post by JonAbbott »

Milestone event achieved today: Branch walking is finally working. ADFFS hasn't run a single game for two weeks, so I was overjoyed when Zarch and Battle Chess fired up and passed an hour-long soak test. I was pleasantly surprised to see the number of JIT entries drop substantially, meaning it's now spending a far larger portion of CPU time in the actual code instead of the JIT.

Here are some screenshots showing the efficiency improvement Branch walking has on Zarch after it's run for a few minutes. The top image is before Branch walking, the lower after. The figures to look at are:
Before    After     Description
>6830     >864      Number of times the JIT was entered, down by 87%
I:10275   I:4463    Number of Instructions the JIT has had to look at, down by 57%
R:832     R:866     Number of Instructions the JIT has re-encoded, up 5%
L:6199    L:342     Number of LDR's it's had to check for page zero reads, down by 94%
C:1008    C:881     Number of full Cache flushes required, down 13%
#:468     #:477     Number of codelets created, up 2%
[Screenshot: Before Branch Walker_sm.png]
[Screenshot: After Branch Walker_sm.png]
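For anyone wanting to recompute the percentages from the raw counters, a quick sketch follows. The counter values are copied from the Zarch figures above; the helper name `pct_change` is just for illustration.

```python
def pct_change(before: int, after: int) -> float:
    """Percentage change from before to after (negative means a reduction)."""
    return (after - before) / before * 100.0


# (before, after) pairs copied from the Zarch table above.
zarch = {
    "JIT entries": (6830, 864),
    "Instructions looked at": (10275, 4463),
    "Instructions re-encoded": (832, 866),
    "LDRs checked": (6199, 342),
    "Full cache flushes": (1008, 881),
    "Codelets created": (468, 477),
}

for name, (before, after) in zarch.items():
    change = pct_change(before, after)
    print(f"{name:24s} {before:6d} -> {after:6d}  {change:+.1f}%")
```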
Cache flushing is completely broken at the moment, so ADFFS is working under emulation only. I've had a few ideas today and believe I've come up with a far more efficient way of dealing with the problem, so I'm making a start on that tomorrow.

The two key problems with cache flushing are:
  • Optimization - At what point do you give up cleaning individual I/D cache entries and invalidate both caches?
  • CPU specifics - Every ARM requires a different method to flush/clean the cache
There's no consistency between ARM revisions, and there are vendor-specific requirements as well. StrongARM, for example, requires you to read a 16KB block of reserved memory and then invalidate the cache. The Iyonix, on the other hand, requires you to preload the cache with a 32KB block of reserved memory and also has a 2KB small cache which requires cleaning, followed by a pause for the flush to complete. The Pi adds additional instructions that allow you to specify ranges to clean; I've yet to test these properly, but they look to be far more efficient.
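One way to picture the "library of cache functions" is a dispatch table keyed on CPU type. The sketch below is purely illustrative: the step strings simply restate the descriptions above, the function names and CPU keys are made up, and nothing here touches a real cache.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Range = Tuple[int, int]  # (start, end) addresses, hypothetical representation


def clean_strongarm(ranges: Sequence[Range]) -> List[str]:
    # Whole-cache method: the ranges are ignored.
    return ["read 16KB reserved block", "invalidate cache"]


def clean_iyonix(ranges: Sequence[Range]) -> List[str]:
    # Whole-cache method: the ranges are ignored.
    return [
        "preload cache from 32KB reserved block",
        "clean 2KB small cache",
        "pause until flush completes",
    ]


def clean_pi(ranges: Sequence[Range]) -> List[str]:
    # Range-based method: one step per requested range.
    return [f"clean range {start:#x}-{end:#x}" for start, end in ranges]


CLEANERS: Dict[str, Callable[[Sequence[Range]], List[str]]] = {
    "StrongARM": clean_strongarm,
    "Iyonix": clean_iyonix,
    "Pi": clean_pi,
}


def clean_cache(cpu: str, ranges: Sequence[Range]) -> List[str]:
    """Return the ordered steps for the given CPU, or raise for an unknown one."""
    if cpu not in CLEANERS:
        raise ValueError(f"no cache routine for {cpu}")
    return CLEANERS[cpu](ranges)


print(clean_cache("Pi", [(0x8000, 0x8020)]))
```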

Trying to support all three in the ADFFS codebase is complex at best... however, I believe I can unify it all, simplify the code and improve its efficiency. I'll know in a few days. The biggest problem is you don't know if it's going to work until it's completely coded. It's not something you can test a step at a time, as the cache is either consistent or inconsistent, and when it does appear to be working it requires soak testing to cover all scenarios. It could, for example, be cleaning a single D line, a single I line and/or combinations of both, many cache lines or the complete cache. Also add into the mix the requirement to force the write buffers to actually write the cache data back to memory.

Frustratingly complicated is the polite term for it.

Post by JonAbbott »

Progress today on the cache flushing issue. Zarch is now working on the Pi and amazingly, without performing a single complete cache flush:
[Screenshot: New cache handling_sm.png]
Figures to take note of:
Before    After     Description
>864      >944      Number of times the JIT was entered, up by 9%
I:4463    I:4450    Number of Instructions the JIT has had to look at, down by 1%
R:866     R:863     Number of Instructions the JIT has re-encoded, down by 1%
L:342     L:328     Number of LDR's it's had to check for page zero reads, down by 5%
C:881     C:0       Number of full Cache flushes required, down 100%
#:477     #:470     Number of codelets created, down by 2%
Although the JIT entries have increased by 9%, that's preferable to 881 full cache flushes. The lower codelet figure I suspect is down to a very minor random element to Zarch, although I'm not 100% certain. I can't explain it otherwise, as the codelet code hasn't changed.

What you can't tell from the image above is the spinning icon next to C:0 - it rotates as cache lines are flushed. For Zarch the rotation is very slow initially, and after a few minutes the game is running under its own steam without the JIT being involved.

Zarch, however, is the only game that's running; all the others I've tried are crashing. I suspect I've introduced a bug whilst implementing the cache code, as some crash at the same place under emulation.

Post by JonAbbott »

JonAbbott wrote:Zarch however is the only game that's running, all the others I've tried are crashing. I suspect I've introduced a bug whilst implementing the cache code as some crash at the same place under emulation.
After further testing last night, it would appear I was unlucky and managed to randomly pick all the games that don't work. It looks like it's the branch walker that's breaking them, which is fortuitous as I had a brainwave at 3am on how to reduce the number of ADFFS entries by implementing branch prediction.

The complication with branch prediction is self-modifying code. It's not a major problem, but in some situations ADFFS could happily predict ahead only to find that the next instruction self-modifies and it's encoded 128 instructions for no reason. There's also the issue of hitting code that isn't legal, where for example it's not been written to memory yet by the game, or in the code sample below:

Code:

8000 BL &10000
8004 DCD 1234    <- not an instruction and would be hit when the branch walker finishes with the code at 10000 and resumes here
8008 LDR R0, [R1]
...
10000 LDR R0, [R14]
10004 SWI OS_Write0
10008 ADD PC, R14, #4
In this scenario it would stop at the first undefined/illegal instruction and hand back to the game.
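The behaviour described for this listing can be modelled with a toy walker. This is a sketch only, not the ADFFS branch walker: the memory dictionary, the "DCD" marker for a non-instruction word and the `walk` function are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Toy image of the listing above: address -> (mnemonic, branch target or None).
# "DCD" stands in for any word that does not decode as a legal instruction.
MEMORY: Dict[int, Tuple[str, Optional[int]]] = {
    0x8000: ("BL", 0x10000),
    0x8004: ("DCD", None),
    0x8008: ("LDR", None),
    0x10000: ("LDR", None),
    0x10004: ("SWI", None),
    0x10008: ("ADD PC", None),  # computed return; the walker cannot follow it
}


def walk(start: int, limit: int = 128) -> Tuple[List[int], Optional[int]]:
    """Return (addresses translated, address of the illegal word that stopped
    the walk, or None if the walk ended without hitting one)."""
    translated: List[int] = []
    pending = [start]  # addresses still to be walked
    while pending and len(translated) < limit:
        pc = pending.pop()
        while len(translated) < limit:
            entry = MEMORY.get(pc)
            if entry is None or entry[0] == "DCD":
                return translated, pc  # stop and hand back to the game
            op, target = entry
            translated.append(pc)
            if op == "BL":
                pending.append(pc + 4)  # resume after the call later
                pc = target
            elif op == "ADD PC":
                break  # end of this path
            else:
                pc += 4
    return translated, None


addresses, stopped_at = walk(0x8000)
print([hex(a) for a in addresses], hex(stopped_at))
```

The walker follows the BL to 10000, runs out of path at the ADD PC, resumes at 8004 and stops there, which is the case the listing is pointing out.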

The benefits however are that for 99% of cases, ADFFS would translate more instructions on each entry and get nearer its instruction limit. I'm still trying to find a balance on what that is exactly, but ideally it needs to remain below the cache size so it's not having to invalidate both Data and Instruction caches. That said however, when it does hit the Data cache limit the ARM will start auto-flushing the oldest code so it's swings and roundabouts.

If I'm honest, I don't think it will make much difference, as the JIT is already very efficient even without this kind of performance tuning. Where I do need to improve things is around the number of Aborts that occur due to the self-modifying code support. It's a massive hit, with Zarch generating 40,000 Aborts/sec as an example. If I can build some kind of intelligence into it, it may be possible to determine the difference between self-modifying instructions and intermixed code/data write instructions.

The Aborts are being generated by writes to variables stored within code blocks. Once ADFFS translates an instruction, it marks the memory page as read only. Any subsequent writes trigger the Abort handler and self-modifying code checks. 99.9% of the time these writes are perfectly legal; it's just the odd bit of C code and game protection that actually requires self-modifying code support.

An example of how this might work may be to track the number of Aborts in each page, resetting the counter every time an instruction is overwritten. When it hits a certain threshold, switch the page back to R/W so no more Aborts are generated. We have to be careful here though: going back to the Zarch example, it does self-modify at certain points and, due to the high number of Aborts it generates, would almost certainly hit the trigger point to switch the page back to R/W.
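As a rough illustration of the counting scheme (a sketch only, not ADFFS code; the page size, the threshold and every name below are assumptions):

```python
from collections import defaultdict
from typing import DefaultDict, Set

PAGE_SIZE = 4096         # assumed page size
ABORT_THRESHOLD = 1000   # hypothetical trigger point


class PageAbortTracker:
    """Counts write Aborts per page and decides when to give up protecting it."""

    def __init__(self, threshold: int = ABORT_THRESHOLD) -> None:
        self.threshold = threshold
        self.aborts: DefaultDict[int, int] = defaultdict(int)
        self.read_write_pages: Set[int] = set()

    def on_write_abort(self, address: int, overwrites_instruction: bool) -> bool:
        """Record one Abort. Returns True when the page is switched back to R/W."""
        page = address // PAGE_SIZE
        if overwrites_instruction:
            # Genuine self-modification: reset the counter for this page.
            self.aborts[page] = 0
            return False
        self.aborts[page] += 1
        if self.aborts[page] >= self.threshold:
            self.read_write_pages.add(page)
            return True
        return False


tracker = PageAbortTracker(threshold=3)
for _ in range(3):
    switched = tracker.on_write_abort(0x8010, overwrites_instruction=False)
print(switched, tracker.read_write_pages)
```

The Zarch caveat shows up directly in this model: a page that self-modifies only occasionally but takes a very high rate of data writes in between will still reach the threshold, so a plain counter is not enough on its own.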

This improvement is one for another day though, so I'll gather my thoughts and define how it's going to work before attempting to code it.

Getting back to the cache changes in the next release, the way this now works is that it builds up a list of memory ranges and which type of clean/invalidation they require. Once ADFFS hits its limit or an instruction it can't process, it looks at the list and determines if it's more efficient to invalidate the whole instruction cache or memory ranges. In 99.9% of cases it's currently opting for memory range invalidation.

Another optimization, which I've partially added, is to pre-cache instructions it's about to execute whilst it's working through the list. In that way, by the time it comes to execute them, the delay spent cleaning the Data cache is offset by the fact the Instruction cache now contains the new instructions - essentially cleaning the cache for free. This does need careful implementation though, as the Data cache must have flushed and sufficient time passed for the pre-cache to execute to ensure the pre-cache buffer isn't saturated - it only has four slots, so can only handle 32 instructions at a time.
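The range-versus-whole-cache decision could be sketched like this. The cost model is an assumption made for illustration (one unit per cache line touched, compared against the number of lines in the instruction cache); the line size, cache size and function names are hypothetical, and the real heuristic isn't given in the post.

```python
from typing import Iterable, Set, Tuple

CACHE_LINE = 32           # bytes per cache line (assumption)
ICACHE_SIZE = 16 * 1024   # instruction cache size in bytes (assumption)


def lines_touched(ranges: Iterable[Tuple[int, int]]) -> int:
    """Count distinct cache lines covered by (start, end) ranges, end exclusive."""
    lines: Set[int] = set()
    for start, end in ranges:
        if end <= start:
            continue  # ignore empty ranges
        first = start // CACHE_LINE
        last = (end - 1) // CACHE_LINE
        lines.update(range(first, last + 1))
    return len(lines)


def choose_strategy(ranges: Iterable[Tuple[int, int]]) -> str:
    """Pick per-range invalidation unless it would cost as much as a full one."""
    full_cost = ICACHE_SIZE // CACHE_LINE
    return "ranges" if lines_touched(ranges) < full_cost else "full"


print(choose_strategy([(0x8000, 0x8040), (0x8020, 0x8060)]))
```

Overlapping ranges are merged at line granularity, so queuing the same code twice doesn't push the decision towards a full invalidation.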

Post by JonAbbott »

JonAbbott wrote:After further testing last night, it would appear I was unlucky and managed to randomly pick all the games that don't work. It looks like it's the branch walker that's breaking them, which is fortuitous as I had a brainwave at 3am on how to reduce the number of ADFFS entries by implementing branch prediction.

The complication with branch prediction is self-modifying code. It's not a major problem, but in some situations ADFFS could happily predict ahead only to find that the next instruction self-modifies and it's encoded 128 instructions for no reason. There's also the issue of hitting code that isn't legal, where for example it's not been written to memory yet by the game
And it turns out it's code that's yet to be written that's causing the games to crash. The particular code sequence is:

1. Load file at X
2. Branch to X

When the branch walker gets to the branch at point 2, it hits an illegal instruction and diligently reports it.

Post by JonAbbott »

JonAbbott wrote:When the branch walker gets to the branch at point 2, it hits an illegal instruction and diligently reports it.
Been thinking about the best solution for this whilst pottering around yesterday and today. I don't want to turn off all the code checking as it's there to not only prevent the inevitable lock up, but assist in resolving bugs in games and highlight areas I've missed in the JIT.

The solution I'm leaning towards is a circular branch prediction buffer of four entries, with B/BL adding to the list as they're hit and immediately exiting. The JIT exit handler can then check the list and loop back to the predicted branch address if there are still predicted branches to follow and it's below its instruction limit.

To resolve the illegal instruction issue, instead of reporting them immediately, check the address of the hypercall that triggered the JIT. If it's different to the illegal instruction, ignore it and hand off to the JIT exit handler. If, however, the JIT was entered on the illegal instruction, then report as normal.
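Both ideas can be sketched together. This is a model under stated assumptions, not the ADFFS implementation: the class and function names are invented, `translate` is a hypothetical callback that returns how many instructions it encoded, and overwriting the oldest entry when the buffer is full is my guess at the behaviour, as the post doesn't say.

```python
from typing import Callable, List, Optional


class BranchBuffer:
    """Fixed-size circular queue of predicted branch targets."""

    def __init__(self, size: int = 4) -> None:
        self.slots: List[Optional[int]] = [None] * size
        self.head = 0    # index of the oldest queued target
        self.count = 0   # number of targets currently queued

    def push(self, target: int) -> None:
        # Assumption: when full, the oldest prediction is overwritten.
        tail = (self.head + self.count) % len(self.slots)
        self.slots[tail] = target
        if self.count == len(self.slots):
            self.head = (self.head + 1) % len(self.slots)
        else:
            self.count += 1

    def pop(self) -> Optional[int]:
        if self.count == 0:
            return None
        target = self.slots[self.head]
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1
        return target


def jit_exit(buffer: BranchBuffer,
             translate: Callable[[int, int], int],
             limit: int = 128) -> int:
    """Follow queued branches until the buffer is empty or the limit is hit."""
    done = 0
    while done < limit:
        target = buffer.pop()
        if target is None:
            break
        done += translate(target, limit - done)  # (address, remaining budget)
    return done


def on_illegal_instruction(illegal_addr: int, hypercall_addr: int) -> str:
    """Report only if the JIT was entered on the illegal instruction itself."""
    return "report" if illegal_addr == hypercall_addr else "exit"


buf = BranchBuffer()
for addr in (0x8000, 0x9000, 0xA000):
    buf.push(addr)
print(jit_exit(buf, lambda addr, budget: min(10, budget), limit=25))
print(on_illegal_instruction(0x8004, 0x8000))
```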

This neatly increases the JIT's efficiency and leaves all the fail-safe checks in place. I'll need to code it before I'll know what kind of improvement it will make, but I suspect it will make a big difference in some cases. In the screenshot above the ratio of JIT entries to instructions being encoded is 864:4463 or ~1:5 which is pretty low and has room for improvement.

From a quick check of a few other games, the ratio is on average 1:4

Post by JonAbbott »

Branch prediction is now implemented. I've played around with the number of branches it buffers to follow and the maximum number of instructions it will process at each hypercall, and ended up at 8 branches and 128 instructions.

Zarch possibly isn't a good test, as changing these radically didn't make a lot of difference overall. The efficiency in all cases doubled to 1:10
Before    After     Description
>944      >473      Number of times the JIT was entered, down by 50%
I:4450    I:4915    Number of Instructions the JIT has had to look at, up by 10%
R:863     R:1042    Number of Instructions the JIT has re-encoded, up by 20%
L:328     L:342     Number of LDR's it's had to check for page zero reads, up by 4%
C:0       C:85      Number of full Cache flushes required
#:470     #:525     Number of codelets created, up by 11%
[Screenshot: New branch prediction 4 linear, 8, 128_sm.png]
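The efficiency ratio quoted in these posts is simply instructions looked at per JIT entry. A small sketch using the Zarch counters reported in this thread (the function name is just for illustration):

```python
def instructions_per_entry(entries: int, instructions: int) -> float:
    """Average number of instructions the JIT looks at per entry."""
    return instructions / entries


# (JIT entries, instructions looked at) from the three Zarch runs in this thread.
runs = {
    "branch walking": (864, 4463),
    "new cache handling": (944, 4450),
    "branch prediction": (473, 4915),
}

for name, (entries, instructions) in runs.items():
    ratio = instructions_per_entry(entries, instructions)
    print(f"{name:20s} 1:{ratio:.1f}")
```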

Post by JonAbbott »

ADFFS 2.49 is now feature complete; I just need to debug and resolve some game hangs on the Pi.

@Vanfanel / @ringdings - it's on the dev site (/development/adffs/adffs249X.zip). Update the ADFFS module with the daily build (/development/32bit/cpu/adffs500,ffa), and grab the debug module version to assist with locating issues (/development/32bit/cpu/adffs500db,ffa)

Known issues:
  1. 2067BC
    • crashes after loader screen on the non-debug build (resolved)
  2. Axis
  3. Ballarena
    • ADFFS crashes flushing JIT space - whilst loading a file (Pi only) (resolved)
  4. Battle Chess
    • crashes on loader screen (Pi only) (resolved)
  5. BlowPipe
    • unhandled read/write from page zero on the input screen (Pi only) (resolved)
    • unhandled read from page zero after swapping discs (resolved)
    • crashes starting a level (resolved)
  6. Bouncer
    • crashes on loading (resolved)
    • sprite detection not working (resolved)
  7. Cannon Fodder
    • doesn't get past "Cannon Fodder Loading. Please wait....." (resolved)
    • crashes at prompt for disk 2 (Pi only) (resolved)
  8. Caverns (working but unplayable)
    • screen width is out by two pixels
    • No ship (requires VIDC cursor support adding)
  9. Chuck Rock
    • hangs if disk 2 isn't inserted when you press FIRE (Pi only) (resolved)
    • corruption on title page after insert disk 2 prompt is displayed (Pi only) (resolved)
  10. Conqueror
    • randomly crashes (Pi only) (resolved)
    • crashes when entering the map for a second time (resolved)
  11. Ego: Repton 4
    • doesn't get past the logo screen (Pi only) (resolved)
  12. Fire & Ice
    • hangs after 10+ mins in demo mode (Pi only) (resolved)
    • random hangs (Pi only) (resolved)
    • protection failing, no baddies on level 3 (Pi only) (resolved)
  13. Fish!
    • ADFFS crashes flushing JIT space - whilst loading a file (Pi only) (resolved)
  14. Freddy's Folly
  15. GODS
    • crashes starting a level (resolved)
  16. Gribbly's Day Out
    • hangs on the 2nd demo loop (resolved)
  17. Heimdall
    • crashes swapping discs (Pi only) (resolved)
  18. Hoverbod (working but unplayable)
  19. Ibix The Viking (working but unplayable)
  20. Jahangir Khan World Championship Squash
    • crashes loading (Pi only) (resolved)
  21. James Pond
  22. Jet Fighter
  23. The Legend of the Lost Temple
    • crashes on loading at the highscore screen (resolved)
  24. Lemmings
    • crashes loading (physical only) (resolved)
    • cursor isn't clipped correctly at the left and right of the screen (resolved)
  25. Lotus Turbo Challenge 2
    • hangs when you start the game (resolved)
  26. Mad Professor Mariarti
    • randomly hangs (resolved)
    • randomly returns to the main menu mid-game (resolved)
  27. Maddingly Hall
  28. Magic Pockets
    • crashes after a few demo loops (resolved)
  29. Missile Control
    • Missile Control logo sprite is too large and covers your score
  30. Mr Doo
  31. Nebulus
    • Slow decoding the game whilst at the Krisalis logo screen (resolved)
    • randomly hangs (Pi only) (resolved)
  32. No Excuses
  33. Orion
  34. Pac-mania
    • Music drops notes (resolved)
    • randomly hangs (Pi only) (resolved)
    • hangs starting the game (Pi only) (resolved)
  35. Paradroid 2000
  36. Poizone
  37. Populous
    • ADFFS crashes flushing JIT space - whilst loading a file (Pi only) (resolved)
  38. Quest for Gold
    • quits early (it uses CLib fread which I'm currently recoding and testing) (resolved - rolled back code)
  39. Quest for Gold [Learning Curve version]
  40. Revelation! [BUZZ version]
    • hangs on title page (resolved)
  41. Revolver
  42. Rockfall
  43. Rotor
  44. Slappit
  45. Sporting Triangles (working but unplayable)
  46. SWIV
    • hangs on title page (Pi only) (resolved)
  47. SWIV [BUZZ version]
    • Slow decoding the game whilst at the Krisalis logo screen (resolved)
    • hangs on title page (Pi only) (resolved)
  48. Tactic (UCS)
  49. Terramex
  50. Krisalis Collection, The: Terramex
    • Hangs on the option page when it issues SYS "Hourglass_On"
  51. The Arc/A3000 Christmas Box: Zap the Red Weirdos from Mars
  52. Thundermonk
    • ADFFS crashes flushing JIT space - whilst loading a file (Pi only) (resolved)
    • When you fire, the sprite doesn't appear (Pi only)
  53. World Class Leaderboard (working but unplayable)
  54. Xenon 2: Megablast
  55. Zarch

Post by JonAbbott »

JonAbbott wrote:I suspect the issues above are minor as 2067BC for example shouldn't be any different with the debug build.
I've spotted some code leakage between the builds, which is down to a compiler bug. I'll spend today going through the whole code base to resolve this and see if it's the cause of the problem.

All the games noted above were working last week, so it's either this compiler issue or something I've introduced with the branch prediction.