Here are some tests that I've done. I ran these using the inbuilt A/B testing mechanism in VACCiNE. To my knowledge, it's accurate (but I built it, so it might be rubbish). All of the following tests were done in the Windows runtime (running from within Fusion, not building an actual EXE). My specs: Windows 10, Core i7 4770k @ stock, GTX 980ti, 32GB DDR3-1333 RAM. Fusion version R288.3 Steam (Direct3D 9 mode).
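For anyone who wants to reproduce this style of measurement outside Fusion, here is a minimal Python sketch of an A/B timing harness. It is not how VACCiNE works internally (that isn't described here); the function names are made up for illustration. The post doesn't state which formula sits behind the "% faster" figures, so the sketch shows two common conventions. The flags result later in the post (76ms vs 93ms, reported as 18% faster) matches the first one.

```python
import time

def time_variant(fn, loops):
    """Run fn `loops` times and return the elapsed time in milliseconds."""
    start = time.perf_counter()
    for _ in range(loops):
        fn()
    return (time.perf_counter() - start) * 1000.0

def ab_test(variant_a, variant_b, loops=100_000, rounds=5):
    """Alternate A and B over several rounds and keep the best time of each."""
    best_a = best_b = float("inf")
    for _ in range(rounds):
        best_a = min(best_a, time_variant(variant_a, loops))
        best_b = min(best_b, time_variant(variant_b, loops))
    return best_a, best_b

def percent_time_saved(slow_ms, fast_ms):
    """Convention 1: time saved, relative to the slower variant."""
    return (slow_ms - fast_ms) / slow_ms * 100.0

def percent_more_throughput(slow_ms, fast_ms):
    """Convention 2: extra work done per unit of time by the faster variant."""
    return (slow_ms / fast_ms - 1.0) * 100.0
```

Alternating the two variants and keeping the best time of several rounds is one simple way to reduce noise from whatever else the machine is doing. It won't remove the kind of driver-dependent variation described further down.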
SIGNIFICANT IMPACT. (maybe)
Pay attention to these areas. Even if you have to rework existing code, it may be worth the effort.
Testing for Collisions vs Testing for Overlap
It's well known that polling for collisions is faster than polling for overlap. I tested this by having 4000 Actives flying around, polling multiple times per frame for collision/overlap. Polling for collision was 797% faster! (184ms @ 1000 loops per frame). Note that this speed advantage only exists if the "on collision" condition is the topmost condition (ie. it's green). Placing it lower appears to make it function like an "on overlap" condition.
Checking "antialiasing" in display options of an Active Object - This one is an eyebrow-raiser!
I tested the performance impact of the "Anti-aliasing" checkbox under "effects" in the "Display Properties" of Active Objects. To test, I had a few thousand actives flying around, then repositioned them randomly 1500 times per frame in a fastloop.
Checking "antialiasing" results in a whopping 64% speed increase (211ms to complete 1500 fastloops). These three numbers (relatively high speed increase, relatively high milliseconds, relatively low number of fastloops), combined with how plentiful Active Objects are in almost any game - make this one of the most impactful optimisation factors I have ever seen tested.
This result is remarkable, for a number of reasons. Firstly, let me repeat in case you missed it: turning "anti-aliasing" on gives you the big performance increase. Secondly, as far as I can tell with the naked eye, this setting doesn't seem to do anything - I couldn't see any difference in visual quality with the setting on or off - even after screengrabbing and zooming in Photoshop. Thirdly, from what I could gather after googling old threads, Yves says that the setting is actually misnamed, since in DirectX mode (which I'm assuming almost every Clicker uses by now), it doesn't turn anything on but rather turns Windows' system antialiasing off (though this appears to only make a visible difference to text, as in the String object). Finally, when you create an Active Object, this setting is unchecked by default.
So, unless I'm missing something, this is the situation: The "antialiasing" option does nothing at all except substantially impede performance, yet is set to do this by default on every new object you create. It's misnamed in such a way that users desperate for more performance will be inclined to uncheck it, inadvertently worsening their performance. And, as if to make things as unhelpful as possible, the in-editor description of this option offers this advice: "..."
I'd be very interested for others to chime in about this one (with their own tests and/or info about what this setting actually does). Perhaps I've made a stupid mistake somewhere and missed something. Or perhaps it works very differently on other exporters. Or maybe it really is as terrible as it seems, and we should all be religiously turning it off (ie. checking the box) every time we create an Active.
UPDATE 9 May: This one has proved to be highly elusive. A couple of people tried tests on their PCs and saw no difference between AA on and AA off. I myself spent a few hours trying to recreate the test and result (I stupidly didn't save the MFA the first time), and I can no longer find any performance impact either. I'm putting this down to an Nvidia driver change I made a few days ago. At least one other person (happygreenfrog) has reported a sizeable difference when testing this setting. So, it seems that this setting can make a difference, but only in particular cases, perhaps with particular hardware/driver configurations. My recommendation would still be to check antialiasing, because there's a chance it might have a benefit for some of your players. And I've not seen any evidence (either in reports on this thread or in my own tests) that checking antialiasing hurts performance. Instead, it seems to do nothing in some cases and improve performance in others.
Control X object vs regular keyboard/mouse object
I tested 3 events in a fast loop: "upon pressing space", "while pressing C" and "if any key pressed". The result: Control X was 818% faster! (20ms per frame @ 200,000 loops per frame)
Given that you're probably polling for key states many times per frame, on every single frame of your game, this one seems a no-brainer - use Control X! (it's why I've switched VACCiNE to use it almost exclusively). Note: the speed advantage is only notable when using Control X's "select by value" options.
Fine Detection vs no Fine Detection
I did a few different tests here. For example, I had a few thousand actives flying around, and tested repeatedly for collisions. Or I tested repeatedly for overlaps in a fastloop. I tried round shapes, and wonky irregular shapes. In each case, the results were highly unpredictable. Sometimes they would show a substantial win for "fine detection", while other times it would be the exact opposite. After probably about 20+ tests, the results seemed to slightly favour "no fine detection", on average (maybe about 2%?) So I'm putting it in the "significant" section, though I find the wildly fluctuating results puzzling.
Hiding unused fastloops in closed groups VS leaving them open
I've read a few times that you should put fastloops in groups, and activate/deactivate those groups only as needed at runtime. Every fastloop call must search through all fastloops, so hiding unused ones in deactivated groups speeds this process up. I tested this by running a small handful of fastloops, 2000 times per frame. In test B, I repeated this while also deactivating the vast majority of my game's fastloops (500+ fastloop events referencing 60+ individual fastloops) by closing their groups at the beginning of each frame, and reactivating them at the end of each frame. The results were impressive: hiding the fastloops gave me a 179% faster result (30ms).
Performance gains in benchmarks like this almost always become less noticeable when you shift from testing conditions to less extreme, more real-world levels. This is because performance discrepancies are exposed and amplified by the stress-testing of a benchmark environment, but are more likely to be camouflaged by everything else going on in the game when brought down to regular levels. However, when I lowered this test from 2000 loops per frame to 200 loops per frame, the result was still a 50% measured speed increase, and when I lowered it further to just 50 loops per frame, I still got an 11% increase! These are solid performance gains at real-world levels. Leave unused fastloops open at your own peril!
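To picture why closed groups help, here is a toy Python model of the behaviour described above: each fastloop call scans every "On loop" event that sits in an active group. This is only a sketch of that description, not Fusion's actual internals, and the names (LoopEvent, call_fastloop, the group names) are hypothetical. The event counts loosely mirror the test (500 events spread over 60 loops).

```python
from dataclasses import dataclass

@dataclass
class LoopEvent:
    loop_name: str  # which fastloop this "On loop" event belongs to
    group: str      # the group of events that contains it

def call_fastloop(name, events, active_groups):
    """Return (events scanned, events run) for a single loop iteration."""
    scanned = run = 0
    for ev in events:
        if ev.group not in active_groups:
            continue  # events in deactivated groups are skipped entirely
        scanned += 1
        if ev.loop_name == name:
            run += 1
    return scanned, run

# 5 events for the loop we care about, plus 500 events for 60 other loops
events = [LoopEvent("move_player", "core")] * 5
events += [LoopEvent(f"loop_{i % 60}", "rest_of_game") for i in range(500)]

all_open = call_fastloop("move_player", events, {"core", "rest_of_game"})
rest_closed = call_fastloop("move_player", events, {"core"})
```

In this model the same 5 events run either way, but with everything open each iteration scans 505 events instead of 5. Multiply that by thousands of iterations per frame and the gap in the test results becomes plausible.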
Large single Active vs many little actives, pasted into background
Someone asked in another thread whether using a single large active as a background would be better or worse than splitting up the same image into lots of tiny actives that would then be pasted into the background. I tested this. Test A used a single 2048x2048 Active. Test B used 4096 little 32x32 Actives (which cover the same area) pasted in the background. In both tests, I rapidly moved the camera around many times per loop in a fastloop, to force Fusion to frequently have to redraw everything. According to the results, the single Active was 35% faster (114ms @ 150000 loops per frame).
Always+Condition VS Condition+Condition
Say you want to set the value "jump" to 1 when the user is holding space, and to 0 when the user is not holding space. There are two basic approaches to achieving this.
Approach #1 (Condition+Condition):
CONDITION: (x) if user pressing space (negated)
----ACTION: set "jump" to 0
CONDITION: if user pressing space
----ACTION: set "jump" to 1
Approach #2 (Always+Condition):
CONDITION: always
----ACTION: set "jump" to 0
CONDITION: if user pressing space
----ACTION: set "jump" to 1
Both approaches will produce identical results, but Approach #2 is faster (89% faster according to my test). What Approach #2 wastes by always executing the first action (even when it will be immediately overwritten by the second action), it more than makes up for by not having to execute two conditions each time (which forces Fusion to poll the keyboard twice per frame instead of only once).
This is a very simple and valuable technique to use in your games. And because you're bound to have plenty of opportunities to use it, it may stack up into some tangible performance savings. However, the performance impact of this technique will vary widely depending on the circumstances. In the above example, polling the keyboard (using the default mouse/keyboard object) is expensive, so Approach #2 wins easily by polling fewer times. But if your condition does something less expensive, like checking an alterable value, your savings won't be as large (38% according to my test). Furthermore, the actions will affect the result too. Our example uses a very simple action (set "jump" to a number), so it doesn't matter much that it's sometimes executed unnecessarily. But if we had 10 actions, some of them dealing with complex equations or expensive extensions? Then the cost of sometimes unnecessarily executing those actions might well outweigh the savings of executing one fewer condition.
So my advice is to use this always+condition technique frequently, but not unthinkingly.
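The two approaches can be sketched in Python to show where the saving comes from. This is an illustration only, not Fusion code: the Keyboard class and its poll counter are hypothetical stand-ins for the keyboard condition.

```python
class Keyboard:
    """Stand-in for the keyboard object; counts how often it is polled."""
    def __init__(self):
        self.space_down = False
        self.polls = 0

    def is_space_down(self):
        self.polls += 1
        return self.space_down

def condition_plus_condition(kb, state):
    if not kb.is_space_down():  # negated condition: first poll
        state["jump"] = 0
    if kb.is_space_down():      # second poll
        state["jump"] = 1

def always_plus_condition(kb, state):
    state["jump"] = 0           # "always" action, sometimes overwritten below
    if kb.is_space_down():      # the only poll
        state["jump"] = 1

def run_frames(approach, space_held_per_frame):
    """Run one approach over a sequence of frames; return (jump values, polls)."""
    kb = Keyboard()
    state = {"jump": 0}
    jumps = []
    for held in space_held_per_frame:
        kb.space_down = held
        approach(kb, state)
        jumps.append(state["jump"])
    return jumps, kb.polls
```

Over the same three frames (space held, released, held), both approaches give the same "jump" values, but the first polls the keyboard 6 times and the second only 3. The extra write to "jump" in the second approach is the cost you pay for that.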
Mixing & Matching Active Objects VS homogeneous Active Objects
I tested a condition and an action that contained equations referencing the same Active Object several times (eg. Alterable value ("Fred") * 2.5 * Alterable Value B("Fred") / Alterable Value C("Fred")....). I compared this to an otherwise identical event that referenced multiple Active Objects (eg. Alterable value ("Fred") * 2.5 * Alterable Value B("Barney") / Alterable Value C("Wilma").....). The homogeneous Active event was 10% faster (144ms @ 1 million loops per frame).
This performance increase is small but could potentially accumulate into something significant over a whole project. It's a good argument for storing your variables in 'storage' Active Objects that you created for that purpose, and trying to group those variables contextually, to minimise mixing and matching of Actives (eg. put all your movement-related values in one Active, all your enemy-AI-related values in another, etc.). Moreover, it's also an argument for using global values - especially considering globals' inherent speed advantage (see next post)....though I personally still prefer the neatness and easy accessibility of Actives.
MEASURABLE IMPACT BUT UNLIKELY TO MATTER
If you do all of these things, and do them religiously, they might just amount to a tiny noticeable performance increase...but quite possibly not even then. My advice: you may want to change some small habits to accommodate some of these things, but don't waste any significant energy worrying about them.
division vs multiplication
It is said that multiplying is quicker for the CPU than division. And my tests suggest that that's true, but barely. Multiplying tested 2% faster (114ms @ 1 million loops per frame). My advice is to opt for multiplication where convenient, but don't go out of your way.
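Swapping a division for a multiplication means multiplying by the reciprocal of the divisor. A small Python sketch (using Python floats, which are IEEE doubles; Fusion's expression evaluator may round differently) shows one thing to watch for: the swap is exact for powers of two, but can differ in the last digit for other divisors.

```python
import math

def halve_by_division(x):
    return x / 2.0

def halve_by_multiplication(x):
    return x * 0.5   # 0.5 is exactly representable, so this matches x / 2.0

def tenth_by_division(x):
    return x / 10

def tenth_by_multiplication(x):
    return x * 0.1   # 0.1 is not exactly representable in binary floating point

same_for_half = halve_by_division(7.0) == halve_by_multiplication(7.0)
same_for_tenth = tenth_by_division(3) == tenth_by_multiplication(3)
close_for_tenth = math.isclose(tenth_by_division(3), tenth_by_multiplication(3))
```

For movement and scaling maths the tiny difference won't matter, but be careful if the result later feeds an exact "equals" comparison.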
compare to 0 vs compare to 2
Computers are said to be quicker at comparing things to 0 than to other numbers. In my simple test (test A: is blabla("Active") = 0 | test B: is blabla("Active") = 2) this appears to be the case. But the difference is minimal (comparing to zero is 2% faster: 34ms @ 1.5million loops per frame).
sin vs cos
Sin seems 2% faster (45ms @ 800000 loops per frame). If all you want is any old curvy wave, then you might as well choose sin over cos.
scale quality: 0 vs 1
Scaling an active using "quality = 0" was 2% faster (100ms @ 300,000 loops per frame)
flags vs alterable values
My tests revealed flags (conditions and actions) to be 18% faster. But the numbers are so minuscule (76ms VS 93ms @ 4 million loops) that I doubt you'd see any difference in anything but the most extreme bottlenecks. I'd personally stick with Alterable Values for their numerous other advantages, except for special use cases where flags really work for you.
XOR 1 vs multiply by -1
There are at least two ways to easily 'toggle' an alterable value. One method is to initially set an Alterable Value to 1, then set Alterable Value("MyActive") to Alterable Value("MyActive") * -1 (the result will alternate between 1 and -1). Another is to initially set an Alterable Value to 0, then set Alterable Value("MyActive") to Alterable Value("MyActive") XOR 1 (the result will alternate between 0 and 1). My tests showed the XOR method to be 8% faster (183ms @ 1500000 loops per frame). So that could be a good alternative to flags - you get some of the speed increase of flags, with none of their downsides.
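The two toggles behave like this when written out in Python (illustration only; in Fusion these would be "set alterable value" actions):

```python
def toggle_sign(value):
    """Start at 1; alternates 1, -1, 1, ..."""
    return value * -1

def toggle_xor(value):
    """Start at 0; alternates 0, 1, 0, ..."""
    return value ^ 1

def toggle_sequence(toggle, start, steps):
    """Apply a toggle `steps` times and collect each resulting value."""
    values = []
    current = start
    for _ in range(steps):
        current = toggle(current)
        values.append(current)
    return values

sign_values = toggle_sequence(toggle_sign, 1, 4)
xor_values = toggle_sequence(toggle_xor, 0, 4)
```

Note that the two methods produce different value pairs (1/-1 versus 0/1), so any condition that reads the value has to compare against the right pair.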
call fastloop vs open/close group
There are two commonly used methods of making 'pseudo-functions' (basically, running a section of code only when you need it): putting your 'function' in a fastloop, or in a closed (inactive) group that you activate when necessary. My testing showed calling a fastloop to be moderately faster than opening/closing a group (7% faster, 55ms @ 1 million loops per frame). However, this is pretty much a moot point since, as shown earlier, fastloops should be combined with opened/closed groups anyway. In this case, the delay introduced by opening/closing a group once per frame is likely to be far outweighed by the benefit of the fastloop inside not needing to be searched every single time any other fastloop in your game is called.
Every nth Frame: mod vs TimeX
There are a couple of easy ways to tell an event to only execute, say, every 7th frame (or 2nd, 4th, or 100th...). One is to make a counter and always add 1 to it and then make a condition that says if counter mod 7 = 0. Another is to use the TimeX object. The mod method tested 4% faster (46ms @1 million loops per frame).
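The counter-and-mod method looks like this as a Python sketch (the function name is made up; in Fusion the counter would be an alterable or global value):

```python
def frames_that_fire(total_frames, n):
    """Return the frame numbers on which an 'every nth frame' event runs."""
    counter = 0
    fired = []
    for frame in range(1, total_frames + 1):
        counter += 1          # "always: add 1 to counter"
        if counter % n == 0:  # "counter mod n = 0"
            fired.append(frame)
    return fired

every_seventh = frames_that_fire(21, 7)
```

With n = 7 the event runs on frames 7, 14 and 21. If you'd rather it ran on the very first frame as well, test the counter before adding 1 instead of after.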
checking alterable vs checking fixed
I tested comparing to an alterable value VS comparing to a fixed value. The alterable value was nominally faster (2% faster, 69ms @ 1700000 loops per frame)
NO MEASURED IMPACT
These things appear to make no difference at all - or such a microscopic one that even a stress-test couldn't expose it.
> VS =
I compared the speed of testing whether an Alterable is equal to something, or greater than something (I set it up so that the answer would be "no" in all cases). I measured no difference (100ms @ 1 million loops per frame)
floats vs integers
They say floating point numbers are quicker for the CPU to deal with than integers. If so, then the difference is too minuscule to show up in my tests. I tested a number of different events that manipulated floats/integers in a few different ways. There was no discernible speed difference (100ms @ 1 million loops per frame).
using an alpha coefficient (translucency) VS not using it
You might think (I did) that adding an alpha coefficient would have a big impact on performance, since the GPU would now need to mathematically combine the color of each of the object's pixels with those from overlapping objects, and those of the background, to create a translucent effect. But not according to my test. I had very many (a few thousand) objects on screen, being repositioned many times in a fastloop. Whether the objects were opaque (alpha coeff = 0) or had translucency (alpha coeff = 150) made no difference (in each case, 90ms to complete a frame @ 1500 loops per frame)
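For reference, the per-pixel maths that a translucency effect implies is roughly the following. This is a generic linear blend in Python, not Fusion's actual renderer, and it assumes the alpha coefficient runs from 0 (opaque) to 255 (fully transparent); the test above only confirms that 0 means opaque.

```python
def blend_channel(src, dst, alpha_coeff):
    """Blend one 0-255 colour channel; alpha_coeff 0 = opaque, 255 = invisible."""
    return (src * (255 - alpha_coeff) + dst * alpha_coeff + 127) // 255

def blend_pixel(src_rgb, dst_rgb, alpha_coeff):
    """Blend an object's pixel over whatever is already behind it."""
    return tuple(
        blend_channel(s, d, alpha_coeff) for s, d in zip(src_rgb, dst_rgb)
    )

red_over_blue = blend_pixel((255, 0, 0), (0, 0, 255), 150)
```

With a coefficient of 150, a pure red pixel over a pure blue background comes out as (105, 0, 150). A GPU does this sort of arithmetic for huge numbers of pixels in parallel, which may be why it didn't register in the timings.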
PNG8 vs PNG24 vs PNG32
I did a similar test to the one above (lots of active objects on screen, rapidly being repositioned in a fastloop). I tested it with objects whose graphics were comprised of PNGs that I imported in the graphics editor. Whether I imported 8 bit PNGs (256 colors), 24 bit PNGs (16M colors) or 32bit PNGs (16M colors + alpha channel), it made no noticeable impact on the test result. In each case, the test took 100ms to complete 1500 fastloops per frame. Keep in mind that the RAM consumption of these different PNGs would almost certainly have been different. But memory and speed are two very separate issues, and what I was measuring here was speed. As far as speed was concerned, there was no difference.
I NEED MORE DATA TO BE ABLE TO SAY ANYTHING SENSIBLE ABOUT THIS
fastloop character movement VS forEach loop character movement
They say forEach loops are faster than fastloops, so I wondered whether using a custom fastloop player movement would be faster using a forEach loop (even though there's only one player object, which goes against the traditional use case for a forEach loop). I converted the fastloop Y movement in my game to a forEach loop, and executed it thousands of times per frame (using a fastloop to trigger the forEach loop thousands of times). It was much faster than doing it using my regular fastloop movement - up to 500% faster in one test!
Then, to make sure, I created a new custom fastloop movement in a new MFA. I kept it very simple: on each loop, it would move 1 pixel, backtrack 1 pixel if overlapping a backdrop, and then save the X position to an alterable value. I tested this movement using both a fastloop and a forEach loop. This time, the fastloop was much quicker - up to about 180% faster. I thought that perhaps the problem was that this new MFA had very few other Active Objects in it, whereas the fastloops in my game had to search through hundreds of other objects each time they ran. So I created a couple thousand new actives (about half cloned, half duplicated), but that didn't change anything (other than very slightly slowing down both the forEach loop test and the fastloop test).
So, I don't know what to think. Looking through the code, and trying a few variations, I couldn't see why fastloops would be much faster in one scenario and much slower in another. I hope that some more people test fastloops VS forEach loops so we can shed some more light on this.