Your Render_draw_frame() function already converts an 8-bit paletted buffer (Render_rgb) to a 16-bit one (Render_screen_pixels). If you add a separate pass to downscale, every pixel has to be read and written a second time, which is going to eat up a lot of CPU time.
Instead, you can convert from 8-bit paletted to 16-bit and downscale in a single pass.
Take a 3x3 block of source pixels:
[NW] [N] [NE]
[W] [C] [E]
[SW] [S] [SE]
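You want to convert it to a 2x2 block like this:
[avg(pal(NW), pal(N), pal(W), pal(C))] [avg(pal(N), pal(NE), pal(C), pal(E))]
[avg(pal(W), pal(C), pal(SW), pal(S))] [avg(pal(C), pal(E), pal(S), pal(SE))]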
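where pal() is the 16-bit color stored in the palette at the index given by the source pixel's value, and avg() is the average of the 4 colors. The average has to be computed separately for each color channel: adding the packed 16-bit values directly would let the bits of one channel carry into the next.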
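Here is a minimal sketch of that single pass in C. It assumes the 16-bit format is RGB565 (the question only says "16-bit"), that both buffers are tightly packed (row pitch equals width), and that the source width and height are multiples of 3; leftover rows or columns are simply skipped. The names avg4_rgb565 and downscale_3x3_to_2x2 are hypothetical; you would pass Render_rgb as src and Render_screen_pixels as dst.

```c
#include <stddef.h>
#include <stdint.h>

/* Average four RGB565 colors channel by channel (R: 5 bits, G: 6 bits, B: 5 bits)
   so that sums in one channel never spill into its neighbour. */
static uint16_t avg4_rgb565(uint16_t a, uint16_t b, uint16_t c, uint16_t d)
{
    unsigned r  = ((a >> 11) + (b >> 11) + (c >> 11) + (d >> 11)) >> 2;
    unsigned g  = (((a >> 5) & 0x3F) + ((b >> 5) & 0x3F) +
                   ((c >> 5) & 0x3F) + ((d >> 5) & 0x3F)) >> 2;
    unsigned bl = ((a & 0x1F) + (b & 0x1F) + (c & 0x1F) + (d & 0x1F)) >> 2;
    return (uint16_t)((r << 11) | (g << 5) | bl);
}

/* Convert an 8-bit paletted image to RGB565 while shrinking it to 2/3 size:
   every 3x3 source block becomes a 2x2 destination block, and each output
   pixel is the average of one of the four overlapping 2x2 quads of the block. */
void downscale_3x3_to_2x2(const uint8_t *src, int src_w, int src_h,
                          const uint16_t palette[256], uint16_t *dst)
{
    int dst_w = src_w / 3 * 2;

    for (int by = 0; by + 2 < src_h; by += 3) {
        const uint8_t *r0 = src + (size_t)by * src_w; /* NW N NE row */
        const uint8_t *r1 = r0 + src_w;               /* W  C E  row */
        const uint8_t *r2 = r1 + src_w;               /* SW S SE row */
        uint16_t *d0 = dst + (size_t)(by / 3 * 2) * dst_w;
        uint16_t *d1 = d0 + dst_w;

        for (int bx = 0; bx + 2 < src_w; bx += 3) {
            /* Look up each of the 9 source pixels in the palette exactly once. */
            uint16_t nw = palette[r0[bx]], n = palette[r0[bx + 1]], ne = palette[r0[bx + 2]];
            uint16_t w  = palette[r1[bx]], c = palette[r1[bx + 1]], e  = palette[r1[bx + 2]];
            uint16_t sw = palette[r2[bx]], s = palette[r2[bx + 1]], se = palette[r2[bx + 2]];
            int dx = bx / 3 * 2;

            d0[dx]     = avg4_rgb565(nw, n, w, c);
            d0[dx + 1] = avg4_rgb565(n, ne, c, e);
            d1[dx]     = avg4_rgb565(w, c, sw, s);
            d1[dx + 1] = avg4_rgb565(c, e, s, se);
        }
    }
}
```

Each source pixel goes through the palette only once per block, and the destination is written exactly once, so there is no intermediate full-size 16-bit buffer. If your screen uses a different 16-bit layout (e.g. RGB555), only the masks and shifts in avg4_rgb565 need to change.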