Author Topic: GCW-Zero IPU screen scaling implementation  (Read 21326 times)

slaanesh (OP)

  • Posts: 569
    • Slaanesh Dev
GCW-Zero IPU screen scaling implementation
« on: September 25, 2014, 02:12:10 am »
I'm close to releasing an updated version of MAME for the GCW-Zero.

As an option, I want to support hardware screen scaling using the IPU. This would be very useful for MAME given the multitude of different resolutions that various games use.

Anyway, how exactly is the IPU screen scaling implemented?

From http://www.gcw-zero.com/updates:

Quote
Applications can be modified to request resolutions smaller than 320x240, which will then be scaled up in hardware. This can allow for example some emulators to run a bit faster. By default, scaling will preserve the aspect ratio, but if you prefer to have no black borders even if that means distorting the image, you can switch modes using Power+A.

Applications can also set the key X-OD-NeedsDownscaling=true in their OPK metadata to request the use of resolutions higher than 320x240 which are then downscaled to 320x240. This can help in porting PC applications that need for example 640x480 output resolution. For applications that can render in either low (320x240 or below) or high resolution, we suggest to not set this key and render in low resolution, since outputting fewer pixels is better for performance and battery life.

This is a nice description and exactly what I would like to use but I can't find any implementation details.

Are we talking SDL function calls here?
ie.
Code: [Select]
surface = SDL_SetVideoMode(width, height, depth, mode_flag);
Or kernel function calls? Is there example code using the scaler?

                                                                               

EDIT: Isn't it always the way, you find the info a few minutes after posting a question?

http://boards.dingoonity.org/gcw-releases/opendingux-update-2014-08-20/msg111587/#msg111587

Now if only the IPU could do rotations as well, that would be awesome! But scaling is the harder and more time-consuming part, so this is a very nice addition.
« Last Edit: September 25, 2014, 02:19:05 am by slaanesh »

hi-ban

  • Posts: 886
Re: GCW-Zero IPU screen scaling implementation
« Reply #1 on: September 25, 2014, 02:41:41 am »
Exactly. The IPU automatically scales whatever screen resolution you throw at it. Of course, if you set the screen resolution to 320x240, it won't scale anything, since that is already the panel's native resolution.
Let's say you want to scale a 256x224 screen. Then you simply do:

Code: [Select]
screen = SDL_SetVideoMode(256, 224, 16, SDL_HWSURFACE | SDL_DOUBLEBUF);
or, if you want to use triple buffering (which is actually recommended):

Code: [Select]
screen = SDL_SetVideoMode(256, 224, 16, SDL_HWSURFACE | SDL_TRIPLEBUF);
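
For a feel of what the default aspect-preserving mode does with a 256x224 request, here is a rough sketch of the arithmetic. It assumes a plain proportional fit into the 320x240 panel with rounding down. The function name fit_keep_aspect is made up for illustration, and the actual driver may round or align differently.

```c
/* Largest size that fits inside dst_w x dst_h while keeping the source
 * aspect ratio. Integer math, rounding down. This is an assumption about
 * how a proportional fit works, not the driver's actual code. */
void fit_keep_aspect(int src_w, int src_h, int dst_w, int dst_h,
                     int *out_w, int *out_h)
{
    /* Compare dst_w/src_w with dst_h/src_h without dividing. */
    if ((long)dst_w * src_h <= (long)dst_h * src_w) {
        /* Width is the limiting dimension. */
        *out_w = dst_w;
        *out_h = (int)((long)src_h * dst_w / src_w);
    } else {
        /* Height is the limiting dimension. */
        *out_h = dst_h;
        *out_w = (int)((long)src_w * dst_h / src_h);
    }
}
```

Under these assumptions, fit_keep_aspect(256, 224, 320, 240, &w, &h) gives 274x240, which leaves about 23 pixels of black border on each side.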
Additionally, you can force the IPU either to keep the aspect ratio when scaling or to stretch to full screen.
By default it keeps the aspect ratio. You can switch modes by pressing Power+A, but I found it painful to do that every time you start a new game, so you can make two different scalers (an "aspect ratio" one and a "fullscreen" one) selectable from the emulator menu:

Code: [Select]
/* Keep the aspect ratio (black borders where needed). */
if (aspect_ratio_scaling)
{
    FILE* aspect_ratio_file = fopen("/sys/devices/platform/jz-lcd.0/keep_aspect_ratio", "w");
    if (aspect_ratio_file)
    {
        fwrite("1", 1, 1, aspect_ratio_file);
        fclose(aspect_ratio_file);
    }
}
/* Stretch to the full screen, ignoring the aspect ratio. */
if (fullscreen_scaling)
{
    FILE* aspect_ratio_file = fopen("/sys/devices/platform/jz-lcd.0/keep_aspect_ratio", "w");
    if (aspect_ratio_file)
    {
        fwrite("0", 1, 1, aspect_ratio_file);
        fclose(aspect_ratio_file);
    }
}
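
Since both branches do the same thing with a different character, they can be folded into one helper. This is only a sketch: write_sysfs_flag and set_keep_aspect_ratio are names invented here, and the sysfs path is the one from this post, which is specific to this firmware.

```c
#include <stdio.h>

/* Write a single '0' or '1' to a sysfs-style attribute file.
 * Returns 0 on success, -1 if the file could not be opened or written. */
int write_sysfs_flag(const char *path, int enabled)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = (fputc(enabled ? '1' : '0', f) != EOF);
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Path as given in this thread; it may change in later firmware. */
#define KEEP_ASPECT_PATH "/sys/devices/platform/jz-lcd.0/keep_aspect_ratio"

/* keep_aspect != 0: preserve the aspect ratio; 0: stretch to full screen. */
int set_keep_aspect_ratio(int keep_aspect)
{
    return write_sysfs_flag(KEEP_ASPECT_PATH, keep_aspect);
}
```

The emulator menu would then call set_keep_aspect_ratio(1) or set_keep_aspect_ratio(0) and could ignore a -1 return when running on a system without that file.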

Also, if you need the IPU to downscale at any point (in the case of MAME, you will need it at least for CPS games), you must declare that in the .desktop file packed into the OPK by adding the following line:

Code: [Select]
X-OD-NeedsDownscaling=true
« Last Edit: September 25, 2014, 03:01:30 am by hi-ban »

slaanesh (OP)

  • Posts: 569
    • Slaanesh Dev
Re: GCW-Zero IPU screen scaling implementation
« Reply #2 on: September 25, 2014, 03:16:19 am »
Thanks for the quick and comprehensive answer.

Code: [Select]
screen = SDL_SetVideoMode(256, 224, 16, SDL_HWSURFACE | SDL_DOUBLEBUF);

For a single buffered video mode, can I simply write to screen->pixels or do I need to write to the screen structure using other SDL functions?

Code: [Select]
X-OD-NeedsDownscaling=true

What exactly do these settings do? Set environment variables? Most of the time during development I am not running from an .OPK file; how else can this hint be set?

mth

  • Posts: 317
Re: GCW-Zero IPU screen scaling implementation
« Reply #3 on: September 25, 2014, 03:43:27 am »
Quote
Thanks for the quick and comprehensive answer.

Code: [Select]
screen = SDL_SetVideoMode(256, 224, 16, SDL_HWSURFACE | SDL_DOUBLEBUF);

For a single buffered video mode, can I simply write to screen->pixels or do I need to write to the screen structure using other SDL functions?

Don't use single buffering, it will suck in one way or another:
  • SDL_SWSURFACE is slower than SDL_HWSURFACE and will cause tearing.
  • SDL_HWSURFACE without SDL_DOUBLEBUF will tear and will flicker on overpainting. Also it will do unnecessary cache flushes if the frame rate is below 60 fps.

If your application has a frame rate that regularly drops below 60 fps, double buffering will give you a constant 30 fps. If that is not what you want, you can use triple buffering. It will increase latency a bit, but it can output glitch-free at for example 50 fps. See this patch for a code example.
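
The 30 fps figure follows from the flip waiting for the next screen refresh. Below is a simplified model of that, assuming a 60 Hz panel and a constant render time per frame; the function names are made up and real frame times vary, so treat it as an illustration only.

```c
/* Refresh period of a 60 Hz panel in microseconds, rounded up. */
#define REFRESH_US 16667

/* Double buffering with vsync: the next frame cannot start until the flip
 * has happened, so each frame occupies a whole number of refresh intervals. */
int double_buffered_fps(int render_us)
{
    int intervals = (render_us + REFRESH_US - 1) / REFRESH_US;
    if (intervals < 1)
        intervals = 1;
    return 60 / intervals;
}

/* Triple buffering: rendering continues into the spare buffer, so the
 * rate is limited only by the render time and the 60 Hz refresh. */
int triple_buffered_fps(int render_us)
{
    int fps;
    if (render_us <= 0)
        return 60;
    fps = 1000000 / render_us;
    return fps > 60 ? 60 : fps;
}
```

In this model a frame that takes 20 ms to render gives 30 fps with double buffering and 50 fps with triple buffering.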

In any mode, you can push pixels via screen->pixels. To make the code portable, lock the surface before accessing the pixels pointer and unlock it when you're done, but on the Zero the locking doesn't actually do anything. Make sure you don't cache the screen->pixels pointer between frames, since the pointer changes every frame when double/triple buffering. When your frame is done, call SDL_Flip(), which does the right thing in every mode.
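
As a minimal sketch of pushing pixels that way, the routine below fills a 16-bit frame and steps through rows by pitch rather than by width. fill_rgb565 is a hypothetical helper; it takes the pointer and pitch as plain arguments so it does not depend on SDL itself.

```c
#include <stdint.h>

/* Fill a 16-bit (RGB565) frame with one color.
 * pixels and pitch correspond to screen->pixels and screen->pitch in SDL 1.2.
 * pitch is in bytes and may be larger than width * 2. */
void fill_rgb565(void *pixels, int width, int height, int pitch, uint16_t color)
{
    uint8_t *row = (uint8_t *)pixels;
    for (int y = 0; y < height; y++) {
        uint16_t *dst = (uint16_t *)row;
        for (int x = 0; x < width; x++)
            dst[x] = color;
        row += pitch;   /* advance by pitch, not by width */
    }
}
```

Per frame, the sequence would be roughly SDL_LockSurface(screen), then fill_rgb565(screen->pixels, screen->w, screen->h, screen->pitch, color), then SDL_UnlockSurface(screen) and SDL_Flip(screen), reading screen->pixels again on the next frame.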

Quote
Code: [Select]
X-OD-NeedsDownscaling=true

What exactly do these settings do? Set environment variables? Most of the time during development I am not running from an .OPK file; how else can this hint be set?

You can set it via sysfs:

Code: [Select]
echo 1 > /sys/devices/platform/jz-lcd.0/allow_downscaling
Use this only during development, not in production, as this path could change in the future.

slaanesh (OP)

  • Posts: 569
    • Slaanesh Dev
Re: GCW-Zero IPU screen scaling implementation
« Reply #4 on: September 25, 2014, 03:49:26 am »
Thanks to mth, hi-ban and Nebuleon (from a PM).

Excellent information and my questions are all answered.

This kind of stuff should be in the GCW-Zero Development Wiki.

Slightly off-topic: if HDMI output could be realized soon, the GCW-Zero would certainly give the Raspberry Pi a run for its money. The reason I mention this is that I've been experimenting with HDMI output on the Pi and the DispmanX scaler/rotator using gles2.

I prefer the GCW-Zero, but I do love the easy connectivity of the Raspberry Pi with its standard HDMI, 4x USB and Ethernet.
« Last Edit: September 25, 2014, 03:55:04 am by slaanesh »

David Knight

  • Posts: 577
Re: GCW-Zero IPU screen scaling implementation
« Reply #5 on: September 25, 2014, 08:29:57 pm »

Code: [Select]
echo 1 > /sys/devices/platform/jz-lcd.0/allow_downscaling
Thanks for this, I had the same issue whilst testing 640x480 screens.

Are there any plans to allow 800x600 downscaling? I noticed whilst testing it doesn't seem to downscale properly. It would be useful when HDMI is supported.

Nebuleon

  • Guest
Re: GCW-Zero IPU screen scaling implementation
« Reply #6 on: September 25, 2014, 10:55:33 pm »
<1 level of quote nesting omitted ~Nebuleon>

Thanks for this, I had the same issue whilst testing 640x480 screens.

-> Are there any plans to allow 800x600 downscaling? I noticed whilst testing it doesn't seem to downscale properly. It would be useful when HDMI is supported.
That is a current limitation of the framebuffer. Its maximum resolution is 640x480 in 2014-08-20 firmware.

mth

  • Posts: 317
Re: GCW-Zero IPU screen scaling implementation
« Reply #7 on: September 26, 2014, 05:56:32 pm »
Quote
Are there any plans to allow 800x600 downscaling? I noticed whilst testing it doesn't seem to downscale properly. It would be useful when HDMI is supported.

For now, the max resolution is set to 640x480, like Nebuleon said. We are planning to migrate the video driver from the old Linux framebuffer interface to the modern KMS/DRM interface. After that migration, we might add more resolutions.

Note that in the Linux kernel, DRM means Direct Rendering Manager; it has nothing to do with restricting your access ;)

Nebuleon

  • Guest
Re: GCW-Zero IPU screen scaling implementation
« Reply #8 on: September 27, 2014, 09:17:55 am »
We've gone ahead and documented hardware scaling here: http://wiki.gcw-zero.com/Hardware_Scaling

There's also a page about triple buffering.

slaanesh (OP)

  • Posts: 569
    • Slaanesh Dev
Re: GCW-Zero IPU screen scaling implementation
« Reply #9 on: October 09, 2014, 07:13:14 am »
By the way, to all those involved with the IPU scaling code, it's pretty awesome.

I've started implementing it in my project and the results are brilliant.

Just from a technical point of view, does the GCW-Zero need to convert this to something like YUV420 or is it doing RGB scaling? The A320 has IPU scaling but required the source video be in YUV which isn't super awesome for most applications.

Nebuleon

  • Guest
Re: GCW-Zero IPU screen scaling implementation
« Reply #10 on: October 09, 2014, 07:40:32 am »
It's RGB to RGB scaling. The JZ4770 also inherits YUV to RGB scaling from the JZ4740 of course, but it's not implemented on the GCW Zero.

pcercuei

  • Posts: 1676
    • My devblog
Re: GCW-Zero IPU screen scaling implementation
« Reply #11 on: October 09, 2014, 08:04:06 am »
Quote
By the way, to all those involved with the IPU scaling code, it's pretty awesome.
I'm glad you like it :D It took 6 months of work.

Basically the hardware can convert everything to everything  ;D
We only support RGB565 / RGB888 for now, but we will eventually support more (YUV, RBG/BRG/BGR/GRB/GBR and RGB555 to RGB).
Also, currently we only have bilinear filtering implemented, while the hardware can do bicubic (bilinear is better for downscaling, bicubic is better for upscaling).

Note that the IPU uploads the converted image directly to the LCD converter: the converted image is not written to RAM, which means that scaling is absolutely free performance-wise.

slaanesh (OP)

  • Posts: 569
    • Slaanesh Dev
Re: GCW-Zero IPU screen scaling implementation
« Reply #12 on: October 10, 2014, 01:11:35 am »
Well great job, I'm impressed as it is working very well AND is very fast with good quality results.

For me RGB565 and RGB888 support is all my application needs at the moment so I'm covered. I was only curious about YUV as it was the only way the jz4740 did scaling, which like I already said isn't very useful most of the time.

One more question, can the IPU do 90/180/270 degree rotations?
« Last Edit: October 10, 2014, 01:13:28 am by slaanesh »

Nebuleon

  • Guest
Re: GCW-Zero IPU screen scaling implementation
« Reply #13 on: October 10, 2014, 01:39:08 am »
I don't think it does, at least as a transformation you can request inline with resizing. And you couldn't implement it with hacks either, because the IPU's memory accesses are pretty dumb.

The framebuffer is stored a certain way (all pixels in a line, stored packed; repeat for all rows); the IPU reads from the memory in that way (32-byte bursts read from RAM, which reads 16 RGB 565 or 8 RGB 888 pixels from the same row); the IPU also stores that same way to the LCD (await the next horizontal refresh, write a line, repeat for all lines).

So rotation is done in software, and it's rather efficient, I'd say!

Simply read rows (from your intermediate buffer) one at a time and write the pixels as a column (to the framebuffer). Reading a source row in RGB 888 (240x320) will be 960 bytes, so 30 cache lines filled for one row. Writing the pixels as a column (320x240) will load a cache line from each target row for the purpose of modifying them with one pixel each, so 240 cache lines filled for the next 8 columns.

That turns out to be 270 * 32 = 8640 cache bytes used concurrently at any point, out of 16384 for the L1 data cache, so you don't thrash the L1 cache or access L2 while rotating (OK).
Reading each row will be 1800 cycles RAM access, then writing the columns will be 14400 cycles RAM access for the first of every 8 columns, then 240 cycles for each of the next 7, because those 7 columns sit in the same cache lines that were already loaded.
With 320 source rows and 40 groups of 8 columns, that is 320 * 1800 + 40 * (14400 + 7 * 240) = 1,219,200 cycles per frame, which at 60 fps is about 7% CPU usage (73 MHz).

[Divide by 2 if you want RGB 565, so about 3.7% CPU usage (37 MHz).]
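
A minimal C sketch of that row-to-column loop for 16-bit pixels follows. The function name and the clockwise direction are choices made for this example, and nothing here is tuned for the cache behavior discussed above.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate a 16-bit image 90 degrees clockwise.
 * src is src_w x src_h, dst is src_h x src_w; pitches are in bytes.
 * Each source row is read sequentially and written out as one
 * destination column. */
void rotate90_cw_rgb565(const void *src, int src_w, int src_h, int src_pitch,
                        void *dst, int dst_pitch)
{
    for (int y = 0; y < src_h; y++) {
        const uint16_t *s =
            (const uint16_t *)((const uint8_t *)src + (size_t)y * src_pitch);
        /* Source row y lands in destination column (src_h - 1 - y). */
        uint8_t *d = (uint8_t *)dst + (size_t)(src_h - 1 - y) * sizeof(uint16_t);
        for (int x = 0; x < src_w; x++) {
            *(uint16_t *)d = s[x];
            d += dst_pitch;     /* next destination row, same column */
        }
    }
}
```

For a 240x320 intermediate buffer and a 320x240 framebuffer, the call would pass src_w = 240, src_h = 320 and the framebuffer's pitch as dst_pitch.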

slaanesh (OP)

  • Posts: 569
    • Slaanesh Dev
Re: GCW-Zero IPU screen scaling implementation
« Reply #14 on: October 10, 2014, 01:59:21 am »
Thanks Nebuleon, so it should be fairly quick.
I have a basic "C" based blitter that will do 90/180/270 degree blits already but an optimized assembler blitter would be beneficial here. I see good use of the MIPS32r2 INS/EXT instructions here!

One more IPU question: can it do palette conversions?
i.e. an 8-bit paletted framebuffer (each pixel represented by a byte) converted to RGB565 or RGB888?

By the way where are you guys getting your technical information in regards to the IPU?

Nebuleon

  • Guest
Re: GCW-Zero IPU screen scaling implementation
« Reply #15 on: October 10, 2014, 02:33:07 am »
The IPU might do palette conversions, but I don't know if it can read 8-bit input and use lookup tables to map them to colors.

As for the technical information, it's under NDA, so I'm not sure if I can discuss it more than what I've observed of its behavior that was not in documentation. Of course, you can look at the source for our Linux repository ;)

I'm not sure what INS/EXT would be useful for, that ANDI/SRL wouldn't also be useful for...
Code: [Select]
; register assignments:
; $4: input pixel pointer
; $5: output pixel pointer
; $6: input line pitch in bytes
; $7: output line pitch in bytes
; $8-$15: free to use by the algorithm
  LW    $8,  0($4)        ; read two pixels at once, first pixel = LSB because little-endian
  ANDI  $9,  $8,  0xFFFF  ; extract LSB pixel
  SRL   $10, $8,  16      ; extract MSB pixel
  SH    $9,  0($5)        ; write LSB pixel to the current row
  ADDU  $5,  $5,  $7
  SH    $10, 0($5)        ; write MSB pixel to the next row

But I was just thinking straight LH/SH (or LW/SW for RGB 888), maybe with some overlapping to read 8 pixels into registers then write the 8 registers to 8 rows.
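
For comparison, here is the same two-pixel split written in C. It is a sketch that assumes a little-endian CPU like the assembly above, and split_pixel_pair is a name invented for this example. A compiler will likely turn it into a similar load, mask, shift and two stores.

```c
#include <stdint.h>
#include <string.h>

/* Read two horizontally adjacent 16-bit pixels with one 32-bit load and
 * write them to the same column of two consecutive destination rows.
 * Assumes little-endian: the first pixel is in the low half of the word. */
void split_pixel_pair(const void *in, void *out, int out_pitch)
{
    uint32_t pair;
    uint16_t first, second;
    memcpy(&pair, in, sizeof pair);              /* one 32-bit read (LW) */
    first = (uint16_t)(pair & 0xFFFF);           /* low half (ANDI) */
    second = (uint16_t)(pair >> 16);             /* high half (SRL) */
    memcpy(out, &first, sizeof first);           /* current row (SH) */
    memcpy((uint8_t *)out + out_pitch, &second, sizeof second); /* next row (SH) */
}
```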

slaanesh (OP)

  • Posts: 569
    • Slaanesh Dev
Re: GCW-Zero IPU screen scaling implementation
« Reply #16 on: October 10, 2014, 02:58:35 am »
Here is a list of what my application's blitter needs to handle:

  • Resolution of the source is not fixed. It can be anything from 256x192 to 640x480 and almost anything in between.
  • In pretty much all cases palette conversion is required from either 8-bit or 16-bit indices, i.e. a lookup in a table of 256 or up to 65,536 color entries.
  • In most cases, no rotation is required.
  • In most cases scaling would be nice or is required.

So you end up with a blitter that is anything but straightforward, as there are many cases it needs to handle.
A generic blitter that handles all cases all the time is too slow.

So to optimize, I have various blitters that do some or all of the transformations required. The application selects the blitter which is most suitable for the job.
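
To illustrate the idea of picking a specialized blitter, here is a rough sketch with two palette-lookup blitters chosen through a function pointer. All names and the signature are hypothetical and only cover the no-rotation, no-scaling case with RGB565 output.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical blitter signature; pitches are in bytes. */
typedef void (*blit_fn)(const void *src, int src_pitch,
                        uint16_t *dst, int dst_pitch,
                        int w, int h, const uint16_t *palette);

/* 8-bit indexed source to RGB565 through a 256-entry lookup table. */
void blit_pal8(const void *src, int src_pitch, uint16_t *dst, int dst_pitch,
               int w, int h, const uint16_t *palette)
{
    const uint8_t *s = (const uint8_t *)src;
    uint8_t *d = (uint8_t *)dst;
    for (int y = 0; y < h; y++) {
        uint16_t *out = (uint16_t *)d;
        for (int x = 0; x < w; x++)
            out[x] = palette[s[x]];
        s += src_pitch;
        d += dst_pitch;
    }
}

/* 16-bit indexed source to RGB565 through a table of up to 65,536 entries. */
void blit_pal16(const void *src, int src_pitch, uint16_t *dst, int dst_pitch,
                int w, int h, const uint16_t *palette)
{
    const uint8_t *s = (const uint8_t *)src;
    uint8_t *d = (uint8_t *)dst;
    for (int y = 0; y < h; y++) {
        const uint16_t *in = (const uint16_t *)s;
        uint16_t *out = (uint16_t *)d;
        for (int x = 0; x < w; x++)
            out[x] = palette[in[x]];
        s += src_pitch;
        d += dst_pitch;
    }
}

/* Pick the blitter once per game from the source index depth.
 * Returns NULL for depths this sketch does not handle. */
blit_fn select_blitter(int src_bits)
{
    switch (src_bits) {
    case 8:  return blit_pal8;
    case 16: return blit_pal16;
    default: return NULL;
    }
}
```

The selection happens once when the game's video mode is known, so the per-frame loop contains no branching on the source format.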

To have the IPU handle scaling is a huge benefit as it's probably the most CPU intensive of all the aspects of the blitter.
Rotation is probably the next biggest, followed by the palette lookups.

Most of the time I am dealing with 16-bit pixels, so I thought I could read 32 bits (two pixels) and write each pixel with INS, which saves one read operation. This is ignoring palette lookups though.

Anyway I'll sort something out.



Nebuleon

  • Guest
Re: GCW-Zero IPU screen scaling implementation
« Reply #17 on: October 10, 2014, 03:02:30 am »
Yeah, so you'll need to do rotation and palette lookups in software, but you don't need to deal with including a generic bilinear scaler among all that.

I'm glad we could help :)

pcercuei

  • Posts: 1676
    • My devblog
Re: GCW-Zero IPU screen scaling implementation
« Reply #18 on: October 10, 2014, 07:38:16 am »
The IPU handles neither rotation nor palette lookup. The 2D core of the GPU might... But we have no driver for it.

Slaanesh, writing assembly routines is an ARM thing. The MIPS architecture is simpler, and GCC generally does a very good job at optimizing.

Technical information is available on Ingenic's FTP, IIRC.

pcercuei

  • Posts: 1676
    • My devblog
Re: GCW-Zero IPU screen scaling implementation
« Reply #19 on: October 10, 2014, 07:40:21 am »
Oh, you could also use a shader for the rotation and palette lookup.

 
