The Folded Polynomial - N64 Optimization

227,484
0
Published 2023-09-09
It seems so obvious in hindsight, but somehow only one person thought of this...
Patreon: www.patreon.com/Kazestuff
Discord: discord.com/invite/B3vG7G3
Streams: youtube.com/@KazeClips
🐦 twitter.com/KazeEmanuar
MERCH: kazemerch.myspreadshop.com/all

0:00 Intro
0:56 Chapter 1: Refresher
2:49 Chapter 2: Numerics
3:20 Chapter 3: The Square Root...
4:42 Chapter 4: The Folded Polynomial
(I quickly scrubbed the video several times and there was no chapter 5, I think he just saw the length between chapters 4 and 6 and assumed it was there)
10:29 Chapter 6: Other Ideas
12:23 Conclusio

All Comments (21)
  • @KazeN64
    Getting a lot of comments about making the code branchless - so let me explain why that's a bad idea: A branch takes a single cycle on N64 and we have no branch prediction. Doing bit manipulations on floats requires us to move the float from a float register to a general purpose register first, so that will always be a penalty of 2 cycles. This means that just doing the conditions is WAY faster than doing bit manipulation on floats. Lets compare a branchless version to a branchfull version: if (shifter & 0x8000) { cosx = -cosx; } Compiles to: andi t0, a0, $8000 beq t0, r0, DontInvert neg.s f0, f0 Dontinvert: (3 cycles) cosx = cosx ^ ((shifter&0x8000)<<16) Compiles to: mfc1 t0, f0 andi a0, a0, $8000 sll a0, a0, $16 xor t0, a0, t0 mtc1 t0, f0 (6 cycles)
  • @GameDevYal
    "We run the computations first and THEN figure out which one we computed" You know you're pushing against the limits of what's possible when your code starts implementing quantum mechanics
  • @undefined06855
    kaze on his way to save literally 0.000096 microseconds on a console thats 27 years old
  • @FairyKid64
    I really appreciate how open minded you are and how you give credit where credit is due and don't try to make people with "worse" ideas look bad. Keep up the good work!
  • @realvaporcry
    Imagine being this math genius, code genius, retro gamer, nintendo enthusiast and also being buff. WTF with this dude, he is a demigod.
  • @Armameteus
    In short: - not actually faster - way more accurate I'd say that's a decent trade-off.
  • @fders938
    This stuff reminds me of when I was learning x86 programming and attempted to use my new-found powers to beat libc's sin/cos. After hand-writing an asm implementation of a 7th order taylor polynomial it was...2x slower and less accurate than libc's version. These videos might help in the future when I get into DS programming.
  • @GamerOverThere
    Kaze is slowly recreating the shipoftheseus problem in SM64 😂
  • @SpringDavid
    Kaze when he accidentally creates a movement of optimizing old games to the point they cannot lag:
  • @CynicPlacebo
    Oh, I remember! I'm so glad to hear a programmer that still cares about performance. Too often I've had coworkers do something in a disgustingly inefficient way because they just don't even think about it. I'll rewrite a query, or a function, or just flatten some nested loops, and suddenly it's 3 to 100 orders of magnitude faster (usually when someone is 1,000x slower they start reaching out for help, but that's about it)
  • @Nicoya
    Optimizing trig functions is great, but the fastest trig function is the one you never call. Have you taken the time to step back and see how many places where you can avoid entering degree/polar space, and instead simply stay in linear (vector/matrix/quaternion) space?
  • @jfa4771
    imagine if Nintendo discovered your rom hack in 1996, how shocked the devs would be
  • @Ragesauce
    You have no idea how excited I am to play the original SM64 when you remake it with all the improvements. I have held off playing it for years all for this moment. I cannot wait!
  • @LucidTyrant
    In an age where a modern basketball video game has over 100 gb of data you give me haven knowing people out there are developing formulas for specific hardware in order to save possibly a few milliseconds just for the sake of optimization. You're work is both amazing and humbling.
  • @SailorCheryl
    Schoolkid: "Nobody needs this math stuff in real life!" Kaze: Entertainment is math, enjoy!
  • @supersmily5811
    I want this rom hack so badly. The levels look so huge, and clean! You have ziplines! And workplace accidents!
  • @lexacutable
    I'm enjoying imagining an alternate universe in which commercial n64 games were this efficient
  • @rebmcr
    Do you plan to release a version of your optimised engine at some point, which can run the original Super Mario 64 ROM? It would make for a very interesting comparison.
  • @unique_two
    I think you could use a double angle identity of cos here: cos(2x) = 2cos(x)^2 - 1. Compute cos in the interval [0, pi/4] via quadratic polynomial, then use the identity to expand to the interval [0, pi/2]. From there you get all values of cos via the usual symmetries. This might get rid of the square root, but I don't understand the details of the implementation.