Really awesome update, I'll be much more likely to play SNES stuff on the Zero thanks to this
Being nitpicky here, but would it be possible to bring over the bilinear scaler from ReGBA? It should look more "even" compared to the one from SNES9x4D (lifebars from Mega Man games seem to be an easy way to compare):
I can understand if there's not enough CPU power left over for that type of scaling in this case (until the IPU driver is done anyways).
ReGBA can scale evenly, but PocketSNES cannot.
GBA: 320x213.33333 <-- 240x160, so the
bilinear interpolants are 0.75x, 0.75y (*3/4 in integer math)
SNES: 320x240 <-- 256x224, so the bilinear interpolants are 0.8x, 0.9333333y (*4/5 and *14/15 in integer math)
Because integer divisions by powers of 2 are much faster than integer divisions by arbitrary integers -- 1 cycle instead of up to 31, according to the MIPS XBurst manual --, one cannot scale in real-time using the software bilinear scaler from ReGBA if the linear interpolants cannot be represented as fractions with a power-of-2 denominator.
Currently the scaling is done by a Bresenham half-pixel algorithm vertically and quarter-pixel horizontally. It can run in 63 cycles per group of 8 output pixels, or 604800 cycles per frame (3.62% CPU time).
A software bilinear scaler would have to spend time doing a division by 15 and a division by 5 per color component per input pixel, and it would have to look at the 4 neighbors of all output pixels amongst the input image, so it would run in 744 cycles per single output pixel, or 57139200 cycles per frame (342% CPU time). As happened in ReGBA before I noticed the interpolants were optimisable, the frameskipper decision algorithm would decide that it can only render 1 frame every 3 or 4, so every game starts at 20 or 15 FPS.
tl;dr: I'll wait for the IPU.