Trying to optimize a rotation shader

  • I've been working on a tough project that calls for rotating the entire screen at runtime: a 512x512 texture, viewed through a 362x362 box inscribed inside it. The single continuing problem that's been holding us back is slowdown on some people's machines; I can't get it to run at a uniform 50 FPS on everyone's computers. Over the past week, simple optimizations to the pixel shader have brought me up from 49 FPS to 56 FPS on my own system (it's an edited copy of the one sphax posted, which converts to polar coordinates and back, as seen here):


    But some people still get around 25-30 FPS on their systems, which would render my game unplayable. So what I've been searching for is an alternative way of doing the rotation calculations. Carrying out 512x512x50 square roots and arccosines per second is a serious load in the first place, and the inefficiency of HLSL compounds that. I've tried the simple things, like using distance(a, b) instead of the explicit sqrt(...) expression, but performance comes out the same.

    Does anyone know a better way to do this? I know that D3DX has matrix rotation vertex shaders built into it, which should be many times more efficient (assembly code and optimized), but I don't think there's any way for MMF2 to support vertex shaders in the first place, is there? Otherwise I'm at a loss for the moment. If there is, please do tell!


    Anyway, for anyone to whom this came off as a bunch of gibberish, you can actually help me too: if you've got the time, just take a peek at my project and tell me what FPS you get on your machine / graphics card:
    Please login to see this link.


    And many thanks to sphax, of course.

  • Try doing something like this:

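    float angle = 0.0; // whatever angle you need, in radians
    // Shift the origin to the texture center (renamed from "point", which HLSL reserves)
    float2 p = float2(In.Texture.x - 0.5, In.Texture.y - 0.5);
    float2x2 rotationMatrix = float2x2(cos(angle), -sin(angle), sin(angle), cos(angle));

    // mul() does the matrix/vector product; "*" would multiply component by component
    float2 newPoint = mul(rotationMatrix, p) + float2(0.5, 0.5);
    Out.Color = tex2D(Tex0, newPoint);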

    Not sure if some of this even compiles, can't remember HLSL syntax from just writing a single shader :)

    The idea behind it explained here:
    Please login to see this link.


  • Oh lord, that was so simple I can't believe I didn't think of it. I mean, not exactly what you said, but something a bit different. You can't really do the texture matrices in pixel shaders, because you'd have to be operating on the entire matrix of textures; that's a vertex shader's job. But what I *could* do is just multiply out the rotation by hand:

    Texture.X = X * Cos(A) - Y * Sin(A)
    Texture.Y = Y * Cos(A) + X * Sin(A)

    I can't believe that didn't occur to me.
    I rewrote the body of the shader and it looks like this:

    Bam, it was really that simple, and suddenly my game is cranking away at an amazing 100 FPS (the same as it gets with no pixel shaders at all, where only the CPU is the limit). There's literally no slowdown at all. Thanks a ton!
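
    For anyone following along, a minimal sketch of a pixel shader built on the two formulas above might look like the following. This is not the poster's actual shader. The parameter name fAngle, the ps_2_0 target and the struct layout are assumptions based on common Direct3D 9 effect-file conventions; Tex0, In.Texture and Out.Color follow the snippet earlier in the thread.

    ```hlsl
    // Sketch only: rotates the sampled texture about its center.
    sampler2D Tex0;   // source texture
    float fAngle;     // hypothetical parameter: rotation angle in radians

    struct PS_INPUT
    {
        float2 Texture : TEXCOORD0;
    };

    struct PS_OUTPUT
    {
        float4 Color : COLOR0;
    };

    PS_OUTPUT ps_main(in PS_INPUT In)
    {
        PS_OUTPUT Out;

        // Shift so that (0, 0) is the texture center
        float x = In.Texture.x - 0.5;
        float y = In.Texture.y - 0.5;

        // Texture.X = X * Cos(A) - Y * Sin(A)
        // Texture.Y = Y * Cos(A) + X * Sin(A)
        float2 uv;
        uv.x = x * cos(fAngle) - y * sin(fAngle) + 0.5;
        uv.y = y * cos(fAngle) + x * sin(fAngle) + 0.5;

        Out.Color = tex2D(Tex0, uv);
        return Out;
    }

    technique tech_main
    {
        pass P0
        {
            VertexShader = NULL;
            PixelShader  = compile ps_2_0 ps_main();
        }
    }
    ```

    Compared with the polar-coordinate approach, the per-pixel work here is a handful of multiplies and adds, with no square root or arccosine.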

  • Glad I could help :)

    Though I think you mixed something up about matrices. They are just simple arrays of numbers. They aren't associated with vertex or pixel shaders as such. I think they can be used in both for whatever use you have for them.

    If matrices are allowed in pixel shaders (which I don't see why they shouldn't be) my version with the matrix would actually be faster than what you are doing :)
    The GPU has special instructions for doing matrix/vector multiplications that are faster than doing them value by value.
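
    To illustrate the point about matrices, here is a small sketch showing that the matrix form and a hand-written form compute the same thing. The function names are made up for illustration, and s and c are assumed to hold sin(angle) and cos(angle). Whether one form actually runs faster depends on the compiler and the hardware, so treat any speed difference as something to measure rather than assume.

    ```hlsl
    // Matrix form: mul() treats p as a column vector.
    float2 rotate_mul(float2 p, float s, float c)
    {
        float2x2 m = float2x2(c, -s,
                              s,  c);
        return mul(m, p);
    }

    // Same result written as one dot product per matrix row,
    // which is the kind of operation GPUs have instructions for.
    float2 rotate_dot(float2 p, float s, float c)
    {
        return float2(dot(float2(c, -s), p),
                      dot(float2(s,  c), p));
    }
    ```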


  • It should be faster still if you only calculate cos(angle) and sin(angle) once at the very start of the function. Partly because sin and cos are slow, so doing them once each instead of twice each is much faster, and partly because of a little-known optimization in the shader compiler:
    The DirectX pixel shader compiler takes anything at the start of the pixel shader function that doesn't use the input from the vertex shader and sets it aside, so that it is only executed once per draw call instead of once for every pixel.
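
    A rough sketch of that layout, assuming a hypothetical fAngle parameter in radians and a sampler named Tex0 as earlier in the thread: sin and cos are taken once, up front, from a value that is the same for every pixel. In the D3DX effect framework this kind of hoisting is called a preshader and applies to expressions that depend only on shader parameters; whether a given host application enables it is an assumption here.

    ```hlsl
    sampler2D Tex0;
    float fAngle;   // hypothetical parameter: rotation angle in radians

    float4 ps_main(float2 uv : TEXCOORD0) : COLOR0
    {
        // These depend only on a shader parameter, not on per-pixel input,
        // so the compiler is free to evaluate them once per draw call.
        float s = sin(fAngle);
        float c = cos(fAngle);

        // Per-pixel work: rotate about the texture center
        float2 p = uv - 0.5;
        float2 rotated = float2(p.x * c - p.y * s,
                                p.y * c + p.x * s) + 0.5;
        return tex2D(Tex0, rotated);
    }
    ```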


  • Hmm, neat trick there, I'll add that in, thanks. Interesting little optimization, but I suppose if the pixel shader can't alter the inputs then it should work, shouldn't it? (Although sin and cos are very, very fast compared to sqrt, since unless I'm mistaken it uses a preset table of values for whole numbers, or maybe that's just my calculator.)

  • Yes very neat trick :)
    I don't know if the GPU has a fast lookup table but I doubt it. You could easily make one yourself and store it in a texture.

    Buuuuuuut Dynasoft's method is pretty much the best solution ;)


  • I'd be surprised if it doesn't; I'd guess that at least 50% of all sin/cos calls to the GPU are for integers. Then again, that would depend on the GPU, wouldn't it? But yeah, I'm just doing that store-the-cos trick. Remember it's a tradeoff between processing and storage, though I doubt 2 extra floats would make a difference.

    Edit: actually, scratch that. My research is showing that they tend to just eschew lookup tables, simply because sin/cos can be done trivially fast, so there's no advantage.

  • Quote from Pixelthief

    I'd guess that at least 50% of all sin/cos calls to the GPU are for integers.


    Their parameter is in radians, I would think nearly no calls are going to be integers.

    Very few shaders use sin/cos because it's more flexible to calculate a transform matrix CPU-side and pass that to the shader, so I wouldn't be surprised if sin/cos weren't very optimised.
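
    As a sketch of that idea, the shader below takes the rotation ready-made instead of an angle. The parameter name vRot is hypothetical, and the 2x2 rotation matrix is reduced to its two distinct entries (cosine in x, sine in y), which the application is assumed to compute and upload whenever the angle changes. Whether a particular host such as MMF2 can pass a vector parameter like this is also an assumption.

    ```hlsl
    sampler2D Tex0;
    float2 vRot;   // hypothetical parameter: (cos(angle), sin(angle)), set CPU-side

    float4 ps_main(float2 uv : TEXCOORD0) : COLOR0
    {
        // No trig in the shader at all: only multiplies and adds per pixel
        float2 p = uv - 0.5;
        float2 rotated = float2(p.x * vRot.x - p.y * vRot.y,
                                p.y * vRot.x + p.x * vRot.y) + 0.5;
        return tex2D(Tex0, rotated);
    }
    ```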


  • Sorry to jump in the middle, but it is true that sin and cos are not very efficient functions.

    Here is a link with some very useful asm functions:

    ftp://Please login to see this link.

    They give you a very good idea of how these can be implemented in asm.


    Regards,


    Fernando Vivolo

    ... new things are coming ...
