The Clickteam Fusion 2.5 optimisation & performance Hard Data thread

  • So, you want to optimise your Fusion game. You google "clickteam optimisation" and quickly find a lot of threads and articles with many suggestions. But many aren't backed up with actual proven tests or are from many years ago, and so the real-world, present-day implications of optimisation remain somewhat vague. Where should you invest your optimisation efforts most? Does "fine collision" really hurt performance? How much quicker is testing for collisions vs testing for overlap? Will putting all your fastloops in groups be a goldmine of increased framerate, or a waste of time? Often, we just don't know these answers. Or some of us do, but haven't shared our findings with the rest.

    So let's have a thread where we can gather in one spot various benchmarks and tests that we've done, to measure the performance of various aspects of Fusion 2.5. My hope is that as many people as possible will contribute. I'd like to suggest only two (loose) rules:

    1. Contribute any findings you like, about any aspect of Fusion. But please, no hearsay, theories, gut feelings, speculation, wisdom passed down through the generations, or something you're kinda pretty sure was true in MMF 1.5....without some kind of data backing it up. How you test is up to you. You can make a rigorous benchmark and upload results and MFAs. You can use the inbuilt A/B tester in VACCiNE*. Or you can just try out something informally a few times in your own game and write down the FPS. As long as there's some kind of observable result that you're basing your findings on.

    2. Try to provide enough data that the question "will this make any noticeable impact in the real-world" can be answered. For example, just saying "I compared X and Y, and X was 10% faster!" isn't enough. Did it require a brute-force test with 5000 rapidly moving objects and a fastloop with 1 million loops for this 10% difference to show itself? (if so, then real-world optimisation implications are likely to be nil). Or was the test somewhat less extreme (and hence more likely to have real-world implications)? Generally, giving a brief description of the test scenario should be enough, though try to provide: what you tested, the resulting measure (eg. milliseconds to complete, Fraps fps), and how excessively you needed to push the test to get noticeable results (eg. how many fastloops per frame, how many objects on screen, etc.)

    I'm starting this thread because I've been doing some tests for my own use, and I thought I'd share them. I've also occasionally seen other people post benchmarks that I found fascinating, and I'd love to see more! Hopefully this thread can eventually become a central spot for people to learn about what are the best practices for making optimised games, which tips are valuable, which aren't worth the hassle, and hopefully discover some unexpected surprises. Here are some things that I do encourage you to do:


    - Dispute the results....if you can provide countering evidence (screenshot, mfa, detailed description of test, etc.). The more data-points the better!
    - Provide complementary results to what others have posted, perhaps testing a slightly different way, on a different exporter or OS, or even just repeating the same test to see if your results match.
    - Request tests you'd like to see done, if you don't feel confident (or haven't the time) to test them accurately yourself.

    I invite you to contribute any way you can, so that we can all learn more about how to optimise our games :)


    * the VACCiNE A/B testing panel:

    Please login to see this picture.

    Please login to see this link.
    My Fusion Tools: Please login to see this link. | Please login to see this link. | Please login to see this link.

    Edited 2 times, last by Volnaiskra (April 30, 2017 at 12:10 PM).

  • Here are some tests that I've done. I did these using the inbuilt A/B testing mechanism in VACCiNE. To my knowledge, it's accurate (but I built it, so it might be rubbish). All of the following tests were done in the Windows runtime (running from within Fusion, not building an actual EXE). My specs: Windows 10, Core i7 4770k @ stock, GTX 980ti, 32GB DDR3-1333 RAM. Fusion version R288.3 Steam (Direct3D 9 mode)

    SIGNIFICANT IMPACT. (maybe)
    Pay attention to these areas. Even if you have to rework existing code, it may be worth the effort.

    Testing for Collisions vs Testing for Overlap
    It's well known that polling for collisions is faster than polling for overlap. I tested this by having 4000 Actives flying around, polling multiple times per frame for collision/overlap. Polling for collision was 797% faster! (184ms @ 1000 loops per frame). Note that this speed advantage only exists if the "on collision" condition is the topmost condition (ie. it's green). Placing it lower appears to make it function like an "on overlap" condition.


    Checking "antialiasing" in display options of an Active Object - This one is an eyebrow-raiser!
    I tested the performance impact of the "Anti-aliasing" checkbox under "effects" in the "Display Properties" of Active Objects. To test, I had a few thousand actives flying around, then repositioned them randomly 1500 times per frame in a fastloop.
    Checking "antialiasing" results in a whopping 64% speed increase (211ms to complete 1500 fastloops). These three numbers (relatively high speed increase, relatively high milliseconds, relatively low number of fastloops), combined with how plentiful Active Objects are in almost any game - make this one of the most impactful optimisation factors I have ever seen tested.

    This result is remarkable, for a number of reasons. Firstly, let me repeat in case you missed it: turning "anti-aliasing" on gives you the big performance increase. Secondly, as far as I can tell with the naked eye, this setting doesn't seem to do anything - I couldn't see any difference in visual quality with the setting on or off - even after screengrabbing and zooming in Photoshop. Thirdly, from what I could gather after googling old threads, Yves says that the setting is actually misnamed, since in DirectX mode (which I'm assuming almost every Clicker uses by now), it doesn't turn anything on but rather turns Windows' system antialiasing off (though this appears to only make a visible difference to text, as in the String object). Finally, when you create an Active Object, this setting is unchecked by default.

    So, unless I'm missing something, this is the situation: The "antialiasing" option does nothing at all except substantially impede performance, yet is set to do this by default on every new object you create. It's misnamed in such a way that users desperate for more performance will be inclined to uncheck it, inadvertently worsening their performance. And, as if to make things as unhelpful as possible, the in-editor description of this option offers this advice: "..."

    I'd be very interested for others to chime in about this one (with their own tests and/or info about what this setting actually does). Perhaps I've made a stupid mistake somewhere and missed something. Or perhaps it works very differently on other exporters. Or maybe it really is as terrible as it seems, and we should all be religiously turning it off (ie. checking the box) every time we create an Active.

    UPDATE 9 May: This one has proved to be highly elusive. A couple of people tried tests on their PCs and saw no difference between AA on and AA off. I myself spent a few hours trying to recreate the test and result (I stupidly didn't save the MFA the first time), and I can no longer find any performance impact either. I'm putting this down to a nvidia driver change I made a few days ago. At least one other person (happygreenfrog) has reported a sizeable difference when testing this setting. So, it seems that this setting can make a difference, but only in particular cases, perhaps with particular hardware/driver configurations. My recommendation would still be to check antialiasing, because there's a chance it might have a benefit for some of your players. And I've not seen any evidence (either in reports on this thread or in my own tests) that checking antialiasing hurts performance. Instead, it seems to either do nothing in some cases while improving performance in other cases.


    Control X object vs regular keyboard/mouse object
    I tested 3 events in a fastloop: "upon pressing space", "while pressing C" and "if any key pressed". The result: Control X was 818% faster! (20ms per frame @ 200,000 loops per frame)
    Given that you're probably polling for key states many times per frame, on every single frame of your game, this one seems a no-brainer - use Control X! (it's why I've switched VACCiNE to use it almost exclusively). Note: the speed advantage is only notable when using Control X's "select by value" options.


    Fine Detection vs no Fine Detection
    I did a few different tests here. For example, I had a few thousand actives flying around, and tested repeatedly for collisions. Or I tested repeatedly for overlaps in a fastloop. I tried round shapes, and wonky irregular shapes. In each case, the results were highly unpredictable. Sometimes they would show a substantial win for "fine detection", while other times it would be the exact opposite. After probably about 20+ tests, the results seemed to slightly favour "no fine detection", on average (maybe about 2%?) So I'm putting it in the "significant" section, though I find the wildly fluctuating results puzzling.


    Hiding unused fastloops in closed groups VS leaving them open
    I've read a few times that you should put fastloops in groups, and activate/deactivate those groups only as needed in runtime. Every fastloop call must search through all fastloops, so hiding unused ones in inactivated groups speeds this process up. I tested this by running a small handful of fastloops, 2000 times per frame. In test B, I repeated this while also inactivating the vast majority of my game's fastloops (500+ fastloop events referencing 60+ individual fastloops) by closing their groups at the beginning of frame, and reactivated them at the end of each frame. The results were impressive: hiding the fastloops gave me a 179% faster result (30ms).

    Performance gains in benchmarks like this almost always become less noticeable when you shift from testing conditions to less extreme, more real-world levels. This is because performance discrepancies are exposed and amplified by the stress-testing of a benchmark environment, but are more likely to be camouflaged by everything else going on in the game when brought down to regular levels. However, when I lowered this test from 2000 loops per frame to 200 loops per frame, the result was still a 50% measured speed increase, and when I lowered it further to just 50 loops per frame, I still got an 11% increase! These are solid performance gains at real-world levels. Leave unused fastloops open at your own peril!
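
    To make the reasoning concrete, here is a toy Python model of the behaviour described above. It is not Fusion's actual implementation: it simply assumes, as the post does, that every fastloop call scans all fastloop events in active groups, and that an inactive group is skipped as a whole. The names `call_fastloop`, `hot` and `cold` are made up for this sketch.

```python
def call_fastloop(groups, loop_name):
    """Toy model: scan every fastloop event in every ACTIVE group.

    Returns (events_scanned, events_run). Skipping an inactive group
    as a whole is an assumption, not documented Fusion behaviour.
    """
    scanned = 0
    ran = 0
    for group in groups:
        if not group["active"]:
            continue  # deactivated group: none of its events are looked at
        for event_loop_name in group["loops"]:
            scanned += 1
            if event_loop_name == loop_name:
                ran += 1  # this event belongs to the loop being called
    return scanned, ran


# A handful of fastloop events that are actually needed this frame...
hot = {"active": True, "loops": ["move_player"] * 5}
# ...and 500 events for 60 other loops that are not needed right now.
cold = {"active": True, "loops": ["loop_%d" % (i % 60) for i in range(500)]}

scanned_open, ran_open = call_fastloop([hot, cold], "move_player")
cold["active"] = False  # "close" the group holding the unused loops
scanned_closed, ran_closed = call_fastloop([hot, cold], "move_player")

print(scanned_open, ran_open)      # 505 5
print(scanned_closed, ran_closed)  # 5 5
```

    In this model the same 5 events run either way, but each call looks at 505 events with everything open and only 5 with the unused group closed. Whether Fusion's real cost scales like this is exactly what the benchmark above is probing.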


    Large single Active vs many little actives, pasted into background
    Someone asked in another thread whether using a single large active as a background would be better or worse than splitting up the same image into lots of tiny actives that would then be pasted into the background. I tested this. Test A used a single 2048x2048 Active. Test B used 4096 little 32x32 Actives (which cover the same area) pasted in the background. In both tests, I rapidly moved the camera around many times per loop in a fastloop, to force Fusion to frequently have to redraw everything. According to the results, the single Active was 35% faster (114ms @ 150000 loops per frame).


    Always+Condition VS Condition+Condition
    Say you want to set the value "jump" to 1 when the user is holding space, and to 0 when the user is not holding space. There are two basic approaches to achieving this.

    Code
    Approach #1: 
    CONDITION: (x) if user pressing space (negated)
    ----ACTION: set "jump" to 0
    
    
    CONDITION: if user pressing space
    ----ACTION: set "jump" to 1

    Code
    Approach #2: 
    CONDITION: Always 
    ----ACTION: set "jump" to 0
    
    
    CONDITION: if user pressing space
    ----ACTION: set "jump" to 1


    Both approaches will produce identical results, but Approach #2 is faster (89% faster according to my test). What approach 2 wastes by always executing the first action (even when it will be immediately overwritten by the 2nd action) it more than makes up for by not having to execute two conditions each time (forcing Fusion to poll the keyboard twice per frame instead of only once).

    This is a very simple and valuable technique to use in your games. And because you're bound to have plenty of opportunities to use it, it may stack up into some tangible performance savings. However, the performance impact of this technique will vary widely depending on the circumstances. In the above example, polling the keyboard (using the default mouse/keyboard object) is expensive, so Approach #2 wins easily by polling fewer times. But if your condition does something less expensive, like checking an alterable value, your savings won't be as large (38% according to my test). Furthermore, the actions will affect the result too. Our example uses a very simple action (set "jump" to a number), so it doesn't matter much that it's sometimes executed unnecessarily. But if we had 10 actions, some of them dealing with complex equations or expensive extensions? Then the cost of sometimes unnecessarily executing those actions might well outweigh the savings of executing one fewer condition.

    So my advice is to use this always+condition technique frequently, but not unthinkingly.
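
    For readers who think more easily in a text language, the two approaches above can be sketched in Python like this. It is only an analogy for the event logic: the `Keyboard` class is a made-up stand-in that counts how often it is polled, and it says nothing about how expensive a poll really is in Fusion.

```python
class Keyboard:
    """Hypothetical input device that counts how often it is polled."""

    def __init__(self, space_down):
        self.space_down = space_down
        self.polls = 0

    def is_space_down(self):
        self.polls += 1
        return self.space_down


def approach_1(keyboard):
    """Condition + negated condition: the key state is polled twice."""
    jump = None
    if not keyboard.is_space_down():
        jump = 0
    if keyboard.is_space_down():
        jump = 1
    return jump


def approach_2(keyboard):
    """Always + condition: one unconditional action, one poll."""
    jump = 0  # the "Always" event
    if keyboard.is_space_down():
        jump = 1
    return jump


for space_down in (False, True):
    kb1, kb2 = Keyboard(space_down), Keyboard(space_down)
    print(approach_1(kb1), kb1.polls, approach_2(kb2), kb2.polls)
# prints "0 2 0 1" and then "1 2 1 1"
```

    Both functions return the same value for either key state; the only difference is one poll instead of two, paid for with one extra assignment.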


    Mixing & Matching Active Objects VS homogenous Active Objects
    I tested a condition and an action that contained equations referencing the same Active Object several times (eg. Alterable value ("Fred") * 2.5 * Alterable Value B("Fred") / Alterable Value C("Fred")....). I compared this to an otherwise identical event that referenced multiple Active Objects (eg. Alterable value ("Fred") * 2.5 * Alterable Value B("Barney") / Alterable Value C("Wilma").....). The homogenous Active event was 10% faster (144 @ 1 million loops per frame).

    This performance increase is small but could potentially accumulate into something significant over a whole project. It's a good argument for storing your variables in 'storage' Active Objects that you created for that purpose, and trying to group those variables contextually, to minimise mixing and matching of Actives (eg. put all your movement-related values in one Active, all your enemy-AI-related values in another, etc.). Moreover, it's also an argument for using global values - especially considering globals' inherent speed advantage (see next post)....though I personally still prefer the neatness and easy accessibility of Actives.


    MEASURABLE IMPACT BUT UNLIKELY TO MATTER
    If you do all of these things, and do them religiously, they might just amount to a tiny noticeable performance increase...but quite possibly not even then. My advice: you may want to change some small habits to accommodate some of these things, but don't waste any significant energy worrying about them.

    division vs multiplication
    It is said that multiplying is quicker for the CPU than division. And my tests suggest that that's true, but barely. Multiplying tested 2% faster (114ms @ 1 million loops per frame). My advice is to opt for multiplication where convenient, but don't go out of your way.


    compare to 0 vs compare to 2
    Computers are said to be quicker at comparing things to 0 than to other numbers. In my simple test (test A: is blabla("Active") = 0 | test B: is blabla("Active") = 2) this appears to be the case. But the difference is minimal (comparing to zero is 2% faster: 34ms @ 1.5million loops per frame).


    sin vs cos
    Sin seems 2% faster (45ms @ 800000 loops per frame). If all you want is any old curvy wave, then you might as well choose sin over cos.


    scale quality: 0 vs 1
    Scaling an active using "quality = 0" was 2% faster (100ms @ 300,000 loops per frame)


    flags vs alterable values
    My tests revealed flags (conditions and actions) to be 18% faster. But the numbers are so minuscule (76ms VS 93ms @ 4 million loops) that I doubt you'd see any difference in anything but the most extreme bottlenecks. I'd personally stick with Alterable Values for their numerous other advantages, except for special use cases where flags really work for you.
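
    A note on reading figures like "18% faster": the post doesn't spell out how the percentages were derived, and there are two common conventions. The Python sketch below applies both to the only pair of raw timings given here (76ms vs 93ms). The function names are made up for illustration.

```python
def time_saved_pct(fast_ms, slow_ms):
    """Percentage of the slower time that the faster method saves."""
    return (slow_ms - fast_ms) / slow_ms * 100


def speedup_pct(fast_ms, slow_ms):
    """How much more work the faster method gets done per unit of time."""
    return (slow_ms / fast_ms - 1) * 100


print(round(time_saved_pct(76, 93), 1))  # 18.3
print(round(speedup_pct(76, 93), 1))     # 22.4
```

    The quoted 18% matches the "time saved" convention for this pair of timings. Note that "time saved" can never exceed 100%, so any figure above 100% has to be using some other convention. When posting your own results, it helps to include both raw timings so readers can work out either number.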


    XOR 1 vs multiply by -1
    There are at least two ways to easily 'toggle' an alterable value. One method is to initially set an Alterable Value to 1, then set Alterable Value("MyActive") to Alterable Value("MyActive") * -1 (the result will alternate between 1 and -1). Another is to initially set an Alterable Value to 0, then set Alterable Value("MyActive") to Alterable Value("MyActive") XOR 1 (the result will alternate between 0 and 1). My tests showed the XOR method to be 8% faster (183ms @ 1500000 loops per frame). So that could be a good alternative to flags - you get some of the speed increase of flags, with none of their downsides.
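
    The two toggles behave like this, sketched in Python rather than Fusion expressions (Python's `^` is bitwise XOR; the function names are just for illustration):

```python
def toggle_by_sign(value):
    """Start at 1; alternates between 1 and -1."""
    return value * -1


def toggle_by_xor(value):
    """Start at 0; alternates between 0 and 1."""
    return value ^ 1


sign_states = [1]
xor_states = [0]
for _ in range(4):
    sign_states.append(toggle_by_sign(sign_states[-1]))
    xor_states.append(toggle_by_xor(xor_states[-1]))

print(sign_states)  # [1, -1, 1, -1, 1]
print(xor_states)   # [0, 1, 0, 1, 0]
```

    One practical difference besides speed: the XOR version gives you 0/1, which can be used directly as an on/off value, whereas the sign version gives 1/-1, which is handy for flipping a direction.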


    call fastloop vs open/close group
    There are two commonly used methods of creating 'pseudo-functions' (basically, running a section of code only when you need it): putting your 'function' in a fastloop, or in a closed (inactive) group that you activate when necessary. My testing showed calling a fastloop to be moderately faster than opening/closing a group (7% faster, 55ms @ 1 million loops per frame). However, this is pretty much a moot point since, as shown earlier, fastloops should be combined with opened/closed groups anyway. In this case, the delay introduced by opening/closing a group once per frame is likely to be far outweighed by the benefit of the fastloop inside not needing to be searched every single time any other fastloop in your game is called.


    Every nth Frame: mod vs TimeX
    There are a couple of easy ways to tell an event to only execute, say, every 7th frame (or 2nd, 4th, or 100th...). One is to make a counter and always add 1 to it and then make a condition that says if counter mod 7 = 0. Another is to use the TimeX object. The mod method tested 4% faster (46ms @1 million loops per frame).
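
    The counter-and-mod method can be sketched in Python as follows. This only mirrors the logic of the two events described above (always add 1, then test `counter mod 7 = 0`); the function name is hypothetical.

```python
def frames_that_fire(total_frames, n):
    """Return the 0-based frame indices on which the 'every nth' event runs."""
    counter = 0
    fired = []
    for frame in range(total_frames):
        counter += 1              # "Always: add 1 to counter"
        if counter % n == 0:      # "counter mod n = 0"
            fired.append(frame)
    return fired


print(frames_that_fire(21, 7))  # [6, 13, 20]
```

    The event runs on every 7th frame, starting with the 7th. Comparing against a different remainder (eg. `counter % 7 == 3`) would shift which frames it lands on without changing how often it runs.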


    checking alterable vs checking fixed
    I tested comparing to an alterable value VS comparing to a fixed value. The alterable value was nominally faster (2% faster, 69ms @ 1700000 loops per frame)


    NO MEASURED IMPACT
    These things appear to make no difference at all - or such a microscopic one that even a stress-test couldn't expose it.

    > VS =
    I compared the speed of testing whether an Alterable is equal to something, or greater than something (I set it up so that the answer would be "no" in all cases). I measured no difference (100ms @ 1 million loops per frame)


    floats vs integers
    They say floating point numbers are quicker for the CPU to deal with than integers. If so, then the difference is too minuscule to show up in my tests. I tested a number of different events that manipulated floats/integers in a few different ways. There was no discernible speed difference (100ms @ 1 million loops per frame)


    using an alpha coefficient (translucency) VS not using it
    You might think (I did) that adding an alpha coefficient would have a big impact on performance, since the GPU would now need to mathematically combine the color of each of the object's pixels with those from overlapping objects, and those of the background, to create a translucent effect. But not according to my test. I had very many (a few thousand) objects on screen, being repositioned many times in a fastloop. Whether the objects were opaque (alpha coeff = 0) or had translucency (alpha coeff = 150) made no difference (in each case, 90ms to complete a frame @ 1500 loops per frame)


    PNG8 vs PNG24 vs PNG32
    I did a similar test to the one above (lots of active objects on screen, rapidly being repositioned in a fastloop). I tested it with objects whose graphics were comprised of PNGs that I imported in the graphics editor. Whether I imported 8 bit PNGs (256 colors), 24 bit PNGs (16M colors) or 32bit PNGs (16M colors + alpha channel), it made no noticeable impact on the test result. In each case, the test took 100ms to complete 1500 fastloops per frame. Keep in mind that the RAM consumption of these different PNGs would almost certainly have been different. But memory and speed are two very separate issues, and what I was measuring here was speed. As far as speed was concerned, there was no difference.


    UNCERTAIN
    I need more data to be able to say anything sensible about this

    fastloop character movement VS forEach loop character movement
    They say forEach loops are faster than fastloops, so I wondered whether using a custom fastloop player movement would be faster using a forEach loop (even though there's only one player object, which goes against the traditional use case for a forEach loop). I converted the fastloop Y movement in my game to a forEach loop, and executed it thousands of times per frame (using a fastloop to trigger the forEach loop thousands of times). It was much faster than doing it using my regular fastloop movement - up to 500% faster in one test!

    Then, to make sure, I created a new custom fastloop movement in a new MFA. I kept it very simple: on each loop, it would move 1 pixel, backtrack 1 pixel if overlapping a backdrop, and then save the X position to an alterable value. I tested this movement using both a fastloop and a forEach loop. This time, the fastloop was much quicker - up to about 180%. I thought that perhaps the problem was that this new MFA had very few other Active Objects in it, whereas the fastloops in my game had to search through hundreds of other objects each time they ran. So I created a couple thousand new actives (about half cloned, half duplicated), but that didn't change anything (other than very slightly slowing down both the forEach loop test and the fastloop test).

    So, I don't know what to think. Looking through the code, and trying a few variations, I couldn't see why fastloops would be much faster in one scenario and much slower in another. I hope that some more people test fastloops VS forEach loops so we can shed some more light on this.

    Edited 14 times, last by Volnaiskra (May 9, 2017 at 11:47 AM).

  • 2 years ago, tompa uploaded a fantastic MFA that compared a whole bunch of different methods of reading/writing variables. Below are my results using this MFA in Windows 10, on the latest version of Fusion (R288.3 steam*). To see tompa's original screenshot, and the mfa so you can test yourself, visit Please login to see this link.

    There's plenty of juicy data here. Some of the key takeaways are:

    - Global values are the fastest things around
    - Alterable Values (and Counters) are the 2nd-fastest
    - Strings are routinely slower than numbers
    - Certain other types of variable storage have the power to tank your performance big-time (eg. Static Text object is over 1000 times slower than String object. Ouch!!). Be careful with them!

    Please login to see this picture.

    *disregard the "283.5" in the screenshot. The test used 288.3

    Edited 3 times, last by Volnaiskra (April 30, 2017 at 12:48 PM).

  • Volnaiskra you're a gem for this community. Always helping fellow Clickers out, creating additional stuff (control, camera) and then putting together testing like this. Hats off!

    +1

    Currently working on Please login to see this link.
    Released games: Please login to see this link. and Please login to see this link.

  • Super useful thread!

    Tompa's awesome test missed String Tokenizer (which is significantly faster than parser)
    and Associative Array (which is significantly faster than Named Variable object)
    I should have those added in the test somewhere.

    I've tested/collected information mostly on data storage elements (and how to optimize their use),
    on circumstances in which software mode performed better than DX mode (mainly when pasting active pictures to the backdrop),
    and on math expressions (builtin functions usually win against expressions written in the expression editor, BUT some only give integer precision, so there's some weighing up to do),
    plus some other things I'm failing to remember now XD
    I have to go now, but I'll come back and edit this post with those findings and detailed data/proof.

    I have to test turning anti-aliasing on in actives, never heard about that oddity,
    could be very very interesting :O

    thanks Volnaiskra!


    ************ ok, here's some of the findings I think worth sharing **************


    Distance and angle functions

    ODistance is 1.5x faster than Distance
    and 2.5/3x faster than Expression [ Sqr(( X( 'A' ) - X( 'B' ) ) pow 2 + ( Y( 'A' ) - Y( 'B' ) ) pow 2) ]

    You don't get a floating point value BUT you very rarely need floating point precision in a distance
    (you get a difference of +/- 1 pixel!),
    so it's generally better to go with ODistance / Distance

    OAngle 10 % faster than Expression [ Atan2 (atan2(Y( 'A' )-Y( 'B' ),X( 'B' )-X( 'A' ))) ]
    BUT
    when working with angles you'll generally benefit from floating point precision,
    a difference of 0.2/0.5 degrees can change a lot
    i.e., when you are aiming a projectile towards a far direction,
    or rotating a stretched object (say, a line) towards a far coordinate on screen.
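
    As a rough illustration of the precision trade-off, here are the two expressions above written out in Python. This is only a sketch of the maths: it assumes screen coordinates with Y growing downwards and an angle in degrees, and it uses `int()` to stand in for an integer-only result (the post only says the builtin functions are integer and within about a pixel, not how they round).

```python
import math


def distance_expr(ax, ay, bx, by):
    """Same maths as the Sqr(... pow 2 + ... pow 2) expression."""
    return math.sqrt((ax - bx) ** 2 + (ay - by) ** 2)


def angle_expr(ax, ay, bx, by):
    """Same maths as atan2(Y(A) - Y(B), X(B) - X(A)), in degrees."""
    return math.degrees(math.atan2(ay - by, bx - ax))


print(distance_expr(0, 0, 3, 4))        # 5.0
print(int(distance_expr(0, 0, 1, 1)))   # 1 (the exact value is about 1.414)
print(angle_expr(0, 0, 10, 0))          # 0.0 (B is directly to the right of A)

# How far off target a shot lands if the aim is 0.5 degrees out,
# for a target 1000 pixels away:
miss_pixels = 1000 * math.tan(math.radians(0.5))
print(round(miss_pixels, 1))
```

    The last figure comes out at several pixels, which is why a fraction of a degree matters for long shots, while losing a fraction of a pixel in a distance usually doesn't.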


    String splitting

    String Tokenizer is about 2x/3x faster than String Parser
    of course, String Parser has lots of additional features,
    I often end up using both in same project: tokenizer for most of splitting needs,
    and parser for various string functions it provides (and wildcard matching)


    Some considerations on the standard Array

    probably one of the most useful, flexible and performant builtin objects,
    there are some things worth considering when using arrays:

    A) number array is 2x/3x faster than string array both in writing and reading,
    plus it has a (simple, but not undesirable :D) sort of encryption when saved.
    If you want to save more than one information in each cell of a number array,
    consider "nesting" multiple values in a single integer.
    Number arrays work with 32 bit integers so you have up to 9 full digits:
    (2) 1 4 7 4 8 3 6 4 7

    you can use this formula to retrieve a number nested inside a bigger one:

    smaller number=
    (big number/ 100..) [count zeroes and put the 1 in the first digit position from the left you need]
    mod 100.. [where '0's are number of digits you want to retrieve after the first position above]

    say you have: 123456789
    and you need to retrieve 456
    you can do (123456789 / 1000 ) mod 1000


    B) array is generally very fast at retrieving values/strings, but when it comes to writing lots of
    data it is VITAL that the array is expanded to its maximum foreseeable dimension at the start of the process.
    Expanding it step by step results in a dramatic performance loss.
    Say you have 50000 values to write in the array,
    and you are setting them one by one within a loop from index 0 to 49999.
    If your array has X dimension 1 at the start,
    running the loop takes about 250/300 times longer than if the array had X dimension 50000 at the start!
    This is because in the first scenario the array is expanded 50000 times vs none.
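
    The digit-nesting formula from point A can be sketched in Python like this. The helper names are made up, and the packing function assumes three fields of at most 3 digits each, so the result stays within 9 digits and fits a 32 bit integer. Python's `//` is integer division, matching the integer maths of a number array.

```python
def extract_digits(big_number, digits_to_skip, digits_to_keep):
    """Drop `digits_to_skip` digits from the right, then keep the next
    `digits_to_keep` digits: (big / 10^skip) mod 10^keep."""
    return (big_number // 10 ** digits_to_skip) % 10 ** digits_to_keep


def pack_three(a, b, c):
    """Pack three values (each 0..999) into one 9-digit integer."""
    return a * 1_000_000 + b * 1_000 + c


packed = pack_three(123, 456, 789)
print(packed)                        # 123456789
print(extract_digits(packed, 3, 3))  # 456, ie. (123456789 / 1000) mod 1000
print(extract_digits(packed, 6, 3))  # 123
print(extract_digits(packed, 0, 3))  # 789
```

    Keep in mind that leading zeroes are not stored: a field of 7 comes back as 7, not 007, which is fine for numbers but matters if you were treating the digits as text.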


    Some considerations on the List object

    List objects are awesome, and super useful in most of your projects.
    Lists are about 25 times slower than a string array, but they can be the preferred choice for a couple of
    reasons, perhaps the main one being that a list is the closest thing to a database table that you can partially
    "query": with "findstring" you can quickly and quite efficiently find an element inside a list by
    making sure the leftmost part of each string contains the unique (or multiple if needed) ID you want to
    match.
    This is enormously faster than running a loop through an array and checking whether each cell contains that
    id - like thousands of times faster!

    Using a list object as a database takes huge benefit from unchecking "vertical scrollbar"
    (and partially from "hide on start", but the bigger bottleneck is "vertical scrollbar")
    a list with vertical scrollbar enabled is 25 times slower when adding a line!


    Some general considerations

    Lowering the framerate (particularly in mobile ports) can be a game-changer.
    Consider whether your game really needs to perform all those operations in the event sheet 60 times per second.
    Unless you're going for juicy-sweet animations (as lots of people rightfully do of course, but lots don't really need to),
    the human eye will be tricked more than enough from 30 frames per second on.

    On a related note: it's easy to code something that happens "always" (with an "always" event, or simply with a condition that always returns true), and so runs as many times per second as your framerate.
    A little trick I often benefit from is making those operations happen only at specific moments, splitting tasks across a reasonable number of frames.

    E.g. you can make a constantly increasing value and wrap it so that it cycles every few frames, say 5
    (of course you can use "mod" for this if you prefer):

    always >> add 1 to "frame_count"
    frame_count > 4 >> set frame_count to 0

    and then:

    frame_count = 0 >>> perform this very heavy task
    frame_count = 2 >>> perform this other quite heavy task
    frame_count = 3 >>> perform still another heavy task
    ...

    In this way you have split the heavy tasks across several frames,
    and you can still get a smooth feel by carefully choosing how these tasks are paced
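
    Here is the same frame-staggering idea as a small Python simulation, following the events above in the same order (add 1, wrap above 4, then check the counter). The task labels and the function name are placeholders.

```python
def simulate(total_frames):
    """Return (frame, task) pairs showing which heavy task ran on which frame."""
    frame_count = 0
    log = []
    for frame in range(total_frames):
        frame_count += 1                 # always >> add 1 to "frame_count"
        if frame_count > 4:              # frame_count > 4 >> set to 0
            frame_count = 0
        if frame_count == 0:
            log.append((frame, "very heavy task"))
        if frame_count == 2:
            log.append((frame, "other quite heavy task"))
        if frame_count == 3:
            log.append((frame, "another heavy task"))
    return log


for entry in simulate(10):
    print(entry)
```

    Over 10 frames each task runs twice and no two heavy tasks ever share a frame, so the worst-case frame only pays for one of them. The trade-off is that each task now reacts up to 5 frames late, which is why the pacing needs to be chosen with care.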

    a selection of my Fusion examples can be found Please login to see this link.

    Edited once, last by schrodinger (April 30, 2017 at 7:32 PM).

  • And the same goes for schrodinger! :)
    Awesome info, thank you very much for posting!

  • Wow, that anti-aliasing thing improved performance immensely in War for Robovania! I'm going to have to try this in some of my other games...

    Also, this sounds like something Clickteam should consider fixing... Volnaiskra, how about putting up a report in the bug box explaining why the anti-aliasing setting's behavior should change so it's easier to get this super-awesome performance?

    Edited once, last by happygreenfrog (April 30, 2017 at 9:48 PM).

  • Just checking- for the "Anti-aliasing" setting, keeping it OFF has a positive impact on string display, but no discernible impact on active display?
    And the opposite- turning it ON has a positive impact on active performance, but no discernible impact on string performance?

    Therefore, it is recommended to turn it ON for actives and OFF for strings?

  • J3sseM: thank you for your very kind words :)

    @hgf: I'm so pleased! It's incredible isn't it? I am tempted to report it to the bugbox, but I'd like to learn more about this setting. The whole 'big performance increase with zero drawbacks' aspect has a 'too good to be true' feeling about it which makes me slightly uneasy. Like maybe there is some hidden cost lurking somewhere.

    advaith: to my knowledge, everything you wrote is correct, except for the bit about it having no performance impact on strings. I haven't actually measured the performance impact on strings, though I assume there may be some.

    But as for the rest, yes: leaving anti aliasing OFF hurts performance in actives, has no visual impact (that I can see) on actives, but does have visual impact on strings (it actually enables anti aliasing, paradoxically).

    And conversely, turning anti aliasing ON improves performance in actives, has no visual impact on actives, but has visual impact on strings (makes them jaggy, paradoxically).

    Though all this may be different on other runtimes, in non-HWA (non-DirectX) mode, or when other factors are involved. Hopefully others can weigh in with more info on this.

    Edited once, last by Volnaiskra (May 1, 2017 at 12:54 PM).

  • and I assume the stuff about the anti-aliasing for actives is also true for backdrop objects?

  • Fantastic thread, Vol. Lots of stuff I didn't think to test!


    XOR 1 vs multiply by -1
    There are at least two ways to easily 'toggle' an alterable value. One method is to initially set an Alterable Value to 1, then set Alterable Value("MyActive") to Alterable Value("MyActive") * -1 (the result will alternate between 1 and -1). Another is to initially set an Alterable Value to 0, then set Alterable Value("MyActive") to Alterable Value("MyActive") XOR 1 (the result will alternate between 0 and 1). My tests showed the XOR method to be 8% faster (183ms @ 1500000 loops per frame). So that could be a good alternative to flags: you get some of the speed increase of flags, with none of their downsides.

    There's a third way I like to use, that works because of Mod:

    set Alterable Value("MyActive") to (Alterable Value("MyActive")+1) Mod 2

    This will alternate between 0 and 1 as well.
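
    For anyone who wants to check the three toggles side by side, here is a small Python sketch (not Fusion code; the function names are made up for illustration). It only confirms the sequences each expression produces. Timings in Python would say nothing about how the three compare inside Fusion.

```python
def toggle_multiply(value):
    """Multiply by -1: alternates between 1 and -1 (start at 1)."""
    return value * -1


def toggle_xor(value):
    """XOR 1: alternates between 0 and 1 (start at 0)."""
    return value ^ 1


def toggle_mod(value):
    """(value + 1) Mod 2: alternates between 0 and 1 (start at 0)."""
    return (value + 1) % 2


def sequence(toggle, start, steps):
    """Apply a toggle repeatedly and return every value seen."""
    values = [start]
    for _ in range(steps):
        values.append(toggle(values[-1]))
    return values


if __name__ == "__main__":
    print(sequence(toggle_multiply, 1, 4))
    print(sequence(toggle_xor, 0, 4))
    print(sequence(toggle_mod, 0, 4))
```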

    Best person at writing incomprehensible posts. Edits are a regularity.

  • -Loop optimization for moving objects-

    Normally, whenever you make multiple objects that require precise movement, you loop through each one with a foreach and then start a different set of actual movement fastloops for each object. The process looks something like this:

    -objects that need to move are looped through with a foreach loop
    -the ID of each object is recorded on this foreach,
    -movement fastloops are run for each individual object and its ID is compared, to move it pixel-by-pixel according to its velocities, also checking for collision

    Let's assume that you have 5 objects moving at 5 pixels per frame on the X axis. Using the above method, that is 25 fastloops required for movement each frame. As your object count grows, so does the number of fastloops, until eventually the sheer number of loops bogs your application down. My personal limit on my (old and very weak) laptop is about 170 moving objects with collision before the FPS drops low enough to become unplayable. That adds up to at worst 850 fastloops fired. I could probably manage more objects just moving around if they didn't require collision, but in action games collision is usually a necessity.


    Naturally, the limitations of the traditional method with fastloops got to me, and I sat around brainstorming for an hour or so. The following method is the result of that, and astoundingly simple:

    -objects that need to move are looped through with a foreach loop
    -the highest absolute velocity of the objects is recorded (which would be 5, like earlier)
    -movement fastloops are run only 5 times
    -each object decides to move by checking whether its absolute velocity is >= to the move loop index

    Using this method, you don't need to specify which object is currently moving, because Fusion does it for you: any object with an absolute velocity higher than the loop index is scoped. To remove an object from the loop, you would set its absolute velocity to 0, which means it wouldn't be scoped whenever the loop fires. Thanks to the comparison to the loopindex, you can also have objects that move at independent speeds. If an object only has a velocity of 1, it will remove itself from the scope after one loop, whereas another object would keep going through iterations until it reaches its cap.

    Now let's go back to our earlier scenario, of 5 objects moving at 5 pixels per frame. With the new method, that amounts to only 5 fastloops vs the old 25! Starting to look like an improvement, but let's check what my new object limit is using the new method: 2000

    :O Now there is a significant improvement! The second method yields a total object gain of 1830! All moving simultaneously, all with collisions working! And there are still only 5 loops running each frame. The old method would be using 10,000 loops each frame, and probably crash the application on even the strongest computer!
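
    To make the loop counts concrete, here is a rough Python model of the two approaches (not Fusion code; the function names are hypothetical). It assumes a 0-based loop index and a strict "velocity > loop index" test, and it only counts fastloop iterations. The inner `for` in the second function stands in for Fusion scoping the matching objects within a single event, so the speed-up itself is something this sketch cannot show.

```python
def move_per_object(velocities):
    """Traditional approach: a separate set of movement loops per object."""
    positions = [0] * len(velocities)
    iterations = 0
    for i, v in enumerate(velocities):          # foreach object
        step = 1 if v >= 0 else -1
        for _ in range(abs(v)):                 # this object's movement fastloop
            positions[i] += step                # a collision check would go here
            iterations += 1
    return positions, iterations


def move_shared_loop(velocities):
    """Shared approach: one movement loop, run max(|velocity|) times."""
    positions = [0] * len(velocities)
    max_speed = max((abs(v) for v in velocities), default=0)
    iterations = 0
    for loop_index in range(max_speed):         # 0-based loop index (assumption)
        iterations += 1
        for i, v in enumerate(velocities):
            if abs(v) > loop_index:             # object is still in scope
                positions[i] += 1 if v > 0 else -1
    return positions, iterations


if __name__ == "__main__":
    print(move_per_object([5] * 5))
    print(move_shared_loop([5] * 5))
```

    For 5 objects at velocity 5 this gives 25 iterations against 5, with identical final positions. Objects with lower speeds simply stop matching once the loop index reaches their velocity.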

    You don't need to worry about objects having independent movements, like an enemy that uses a certain behavior or the player's standard movement. All you need to do is handle physics separately, before the movement loops happen. I added a couple different behaviors in the following example for you to play around with:

    Please login to see this attachment.

    (I threw the "un-optimized" examples together quickly, so there are leftover values from the normal examples in there and the debug string isn't accurate. Ignore them. Fine detection is also OFF by default. Turning it on doesn't change much, though. My FPS dropped from 45 on average to 30-40 on average in the optimized bench.)


    I don't think I need to explain the "Real world application" benefits for this method, do I? Practically everyone will benefit, especially mobile users where loops completely wipe their performance. :)

    Best person at writing incomprehensible posts. Edits are a regularity.

    Edited 8 times, last by casleziro (May 1, 2017 at 2:26 PM).

  • That's a clever way to handle lots of instances with fastloops,
    never thought of the problem this way, thanks for sharing! :D

    [bookmarks this thread for future reference]


    Tried the anti-aliasing trick with very resource-taxing situations in P3D,
    when lots of actives are onscreen,
    but sadly it didn't make any difference (neither plus nor minus),
    wonder which other concurrent setting/circumstance makes it unleash that extra-power :o

  • Could you try in a new mfa, and/or upload a screenshot of your various application, frame and active object settings?

    Quote

    Say you have 50000 values to write in the array,
    and you are setting them one by one within a loop from index 0 to 49999
    if your array has X dimension 1 at start
    running the loop would take about 250/300 times more than if the array had 50000 X dimension at start!
    This is because in first scenario the array is expanded 50000 times vs none.

    Must this (pre-emptively setting larger dimensions) be done also for arrays that are loaded from file?

    Edited once, last by Volnaiskra (May 2, 2017 at 4:37 AM).

  • Great thread!

    Posts like these and the new library website are really building the Fusion knowledge bank.

  • @ Volnaiskra:
    will do further testing later when back home,
    I wonder if this is because P3D also "scales" those objects, which might force some specific setup for antialiasing? (tried with both quality settings)
    can't do without scaling, but will make a blank mfa to test scaling and the overall thing, which sounds very interesting
    tried removing the shader also but that didn't help either (same performance with antialiasing tick on/off without shader applied)

    I would suspect an array loaded from file is directly expanded to full size only once (optimized way),
    but never tested, so that would be something worth testing too :)
    (just going to save my 50000 entries and try to load with preset index 1 or 50000)

    btw, forgot to say that this can be done dynamically:
    if the largest dimension is unknown while working in frame editor,
    but known within the runtime,
    you can "write" a dummy value/string to the largest known index before the loop takes place
    (or write values backwards from largest to smaller index)

    i.e.

    ...
    >>> set array_maxindex to (say, 50000)
    >>> write value/text "whatever" to array_maxindex
    >>> start loop "fill array" array_maxindex times

    on loop "fill array"
    >>> ....

    this will expand just once before the loop takes place
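
    As a rough illustration of why this helps, here is a toy model in Python (not Fusion code; the `GrowOnWriteArray` class and both fill functions are hypothetical). It assumes that every write past the end of the array triggers one expansion that copies the existing contents. That is only a guess at the mechanism, not a description of how Fusion's Array object works internally. Indices are 0-based here, so the dummy write goes to `count - 1`.

```python
class GrowOnWriteArray:
    """Toy 1-D array that grows to fit any index written past its end."""

    def __init__(self, size=1):
        self.data = [0] * size
        self.expansions = 0      # how many times the array had to grow
        self.cells_copied = 0    # existing cells copied during growth

    def write(self, index, value):
        if index >= len(self.data):
            # Grow to exactly fit the new index, copying the old contents
            self.expansions += 1
            self.cells_copied += len(self.data)
            self.data = self.data + [0] * (index + 1 - len(self.data))
        self.data[index] = value


def fill(array, count):
    """Write values one by one from index 0 upwards."""
    for i in range(count):
        array.write(i, i)
    return array


def fill_presized(array, count):
    """Write a dummy value to the largest index first, then fill."""
    array.write(count - 1, 0)
    return fill(array, count)


if __name__ == "__main__":
    naive = fill(GrowOnWriteArray(), 1000)
    presized = fill_presized(GrowOnWriteArray(), 1000)
    print(naive.expansions, presized.expansions)
```

    In this model the one-by-one fill expands 999 times, while the pre-sized fill expands once and ends up with the same contents.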

    Edited once, last by schrodinger (May 2, 2017 at 10:44 AM).

  • Great thread.

    I think it should be seriously highlighted somewhere that these optimisation 'facts' were measured on the D3D9 PC runtime, and that some of them would in no way reflect a mobile device. I just think we need a caveat there, as some users may/will expect these to hold across all the runtimes; it should be established that these are base (PC) runtime results and not per-exporter results.

    For example, your PNG8/24/32 is marked as 'has no effect', yet this has a substantial effect on mobile exporters such as Android and iOS. You have to remember that the PC runtime is usually dealing with a PC with 3+ GHz of processing power, 4+ GB of RAM and 2+ GB of GPU RAM, whereas some mobile devices still average around half of this power and are nowhere near the profile of a PC's GPU. Only the high-end ones come close, and you cannot assume that all your mobile users have high-end devices; otherwise you'd be cutting off a wide chunk of them, or distributing an app that looks like it is struggling because you didn't profile correctly.

    Danny // Clickteam

    I agree, Danny. That's why I gave a very detailed summary of my testing environment at the beginning of my results, and mentioned not just once but several times that I encourage people to test on different runtimes and setups and add their results here.

    You may well be right about PNG and mobile. But there's only one way to find out for sure ;)

  • @ Volnaiskra: I'm failing to reproduce the antialiasing boost,
    this super-simple file gives same results both with antialiasing turned on and off:

    Please login to see this attachment.

    (just wait a few seconds for the framerate to drop; on my average desktop machine, at least, it goes down to around 20)
    (p.s. tried with moving bouncing ball objects too, same result)

    Re-reading your original post,
    is it required for the object to be moved in a fastloop for this increase to show? :o
    Do you have a simple test file working on your side,
    so I can try and see if it happens here too, or there might be something machine-specific?
    Happygreenfrog confirmed that too, so wonder what might be missing here, I want faster games with just a tick too (XDXD)


    For the array loading, I did a quick test, and it seems the size of the array before loading a file makes no difference,
    so I suppose this optimization applies only when writing at runtime, and makes no difference when loading an array file.
