Don't preoptimize

Hi

I just wanted to share the result of a quick test I ran, which suggests that you should not spend time optimizing code where you merely "think" you may get better performance. Basically, don't try to micro-optimize everything.

So the test I ran was to see how efficient the Manhattan distance approximation is. For those of you who are unfamiliar with it, it is an approximation of the distance (magnitude) of a vector. Since the square root is normally an expensive operation, we approximate the distance of a vector with components x and y as
distance = Abs(x) + Abs(y)


instead of

distance = Sqrt(x*x + y*y);


This gives you a proxy value for distance. It is not quite the distance, but it is useful for comparisons. Now, how much performance can you gain from this versus using distance with C# in Unity? Here's the test code:

Vector2[] vectors = new Vector2[8000000];
float x = 0;
float y = 0;
for (int i = 0; i < vectors.Length; i++) {
	vectors[i] = new Vector2(x, y);
	x++;
	y = x * (y * 0.001f + 0.1f);
}

float total = 0;
float t1 = Time.realtimeSinceStartup;
foreach (Vector2 vec in vectors) {
	total += vec.magnitude;
}
float t2 = Time.realtimeSinceStartup;
Debug.Log("TEST1: " + (t2 - t1));

total = 0;
float t3 = Time.realtimeSinceStartup;
foreach (Vector2 vec in vectors) {
	total += Mathf.Abs(vec.x) + Mathf.Abs(vec.y);
}
float t4 = Time.realtimeSinceStartup;
Debug.Log("TEST2: " + (t4 - t3));


The results on my machine are:
TEST1: 0.218....
TEST2: 0.2519....


In other words, the performance is WORSE for a result that is less accurate. This just shows how well optimized certain functions are in Unity (and the same applies to other packages you may be using). My guess is that the custom Manhattan distance code runs "higher up" on the virtual machine, as two managed Mathf.Abs calls per vector, whereas Unity's magnitude perhaps maps more directly to the hardware, where a square root is typically a single instruction.

On a side note (slightly unrelated): if you are trying to optimize distance comparisons, use the squared magnitude. Just skip taking the square root on both sides of the comparison; if one side is a constant, simply square it once to get the same comparison, but faster. The result of doing this in a test similar to the one above:
TEST3: 0.0907...


Much faster :)
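
To make the trick concrete, here is a minimal sketch (the InRange helper and its range parameter are illustrative names, not from the test above; sqrMagnitude is Unity's built-in squared-magnitude property):

static bool InRange(Vector2 player, Vector2 enemy, float range) {
	// Square the constant side once instead of square-rooting the other side.
	float rangeSq = range * range;
	// sqrMagnitude computes dx*dx + dy*dy with no square root; since both
	// sides of the comparison are non-negative, the result is unchanged.
	return (player - enemy).sqrMagnitude < rangeSq;
}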

Comments

  • Yeah, "premature optimization is the root of all evil", I think the saying goes.

    There are a couple of things muddying the results of your test, though: the timer checks and the Debug.Log hits weigh in against the later tests (a fairer timing harness is sketched below).
    Still, I'm surprised to see that magnitude squared is so much faster when, presumably, it is just a*a + b*b in VM code.
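
    For what it's worth, here's a minimal sketch of what I mean, assuming System.Diagnostics.Stopwatch is available (it is in Unity's Mono) and that a warm-up pass keeps JIT and first-touch costs out of whichever test runs first:

    // (add "using System.Diagnostics;" at the top of the file)

    // Warm-up pass: touch every element once so JIT compilation and
    // first-touch page faults aren't charged to the first test.
    float warm = 0;
    foreach (Vector2 vec in vectors) warm += vec.x;

    Stopwatch sw = Stopwatch.StartNew();
    float total = 0;
    foreach (Vector2 vec in vectors) {
    	total += vec.magnitude;
    }
    sw.Stop();
    // Log only after the timer has stopped, so the Log hit can't pollute it.
    // Printing total also keeps the compiler from optimizing the loop away.
    Debug.Log("TEST1: " + sw.ElapsedMilliseconds + " ms (total " + total + ")");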

    Maybe I'll try out your test in native code... I've also wondered how mag squared really compares to sqrt...

    UPDATE:
    float vsize( const Vec2d& V )		{ return sqrtf(V.X * V.X + V.Y * V.Y); }
    float vsize_sq( const Vec2d& V )	{ return V.X * V.X + V.Y * V.Y; }
    float vsize_mt( const Vec2d& V )	{ return abs(V.X) + abs(V.Y); }

    	unsigned int ListSize = 8000000;
    	std::vector<Vec2d> Vectors(ListSize);
    	float x = 0, y = 0;
    	for( auto& V : Vectors ) {
    		V = Vec2d(x,y);
    		x += 0.5f;
    		y = x * (y * 0.1f + 0.33f);
    	}
    
    	float TotalA = 0, TotalB = 0, TotalC = 0;
    	sf::Int32 time_vsize, time_vsize_sq, time_vsize_mt;
    	sf::Clock Clock;
    
    	for( auto& V : Vectors )
    		TotalA += vsize( V );
    	time_vsize = Clock.getElapsedTime().asMilliseconds();
    
    	Clock.restart();
    	for( auto& V : Vectors )
    		TotalB += vsize_sq( V );
    	time_vsize_sq = Clock.getElapsedTime().asMilliseconds();
    
    	Clock.restart();
    	for( auto& V : Vectors )
    		TotalC += vsize_mt( V );
    	time_vsize_mt = Clock.getElapsedTime().asMilliseconds();
    
    	std::cout << "vsize() time : " << time_vsize << "\n";
    	std::cout << "vsize_sq() time : " << time_vsize_sq << "\n";
    	std::cout << "vsize_mt() time : " << time_vsize_mt << "\n";
    	std::cout << "Total values : " << TotalA << ", " << TotalB << ", " << TotalC << "\n";


    The results (in milliseconds):
    vsize() time : 33
    vsize_sq() time : 18
    vsize_mt() time : 18

    There was about 1 ms of variation between runs on each of those times.

    That's 8 million of the good magnitude tests in 33 ms. Bumping it up to 64 million, the times all scaled perfectly linearly.

    I tried using basic "new Vec2d[]" instead of "std::vector<Vec2d>" to see if it made any difference.
    The only difference:
    vsize_sq() time : 6

    ...What? Something must be wrong, but I can't find anything. I'm too lazy to go over the assembly right now; perhaps the compiler found something to optimize in this particular situation.

    I guess the conclusion is very much in line with the topic of the thread and, indeed, sanity - Don't Preoptimize ;)
  • It also depends on what you're optimizing. Crunching numbers is what CPUs excel at, so a point of diminishing returns is usually reached very quickly. What CPUs suck at though is retrieving data from system memory (or rather, system memory sucks at sending data to the CPU because it's so much slower than the CPU).

    Whether or not you should pre-optimize also depends on how leaky your abstractions are. Optimizing for cache-coherency (AKA data-oriented program design) can have profound effects on your public API, meaning if you wait too long to optimize, doing so could have a ripple effect on the rest of your codebase, as well as your clients' codebases.

    Premature optimization is definitely something to be taken seriously, but postmature optimization is a demon in its own right.

    Bob Nystrom's Data Locality pattern (from his book Game Programming Patterns) has the most concise explanation of cache-coherency issues and their potential solutions that I've seen.
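
    To make the layout point concrete, here's a hypothetical sketch (the Particle fields are made up for illustration) of the array-of-structs versus struct-of-arrays choice that data-oriented design usually comes down to:

    // Array-of-structs: fields are interleaved in memory, so a loop that
    // only reads positions still drags every health value through the cache.
    struct Particle {
    	public Vector2 position;
    	public float health;
    }
    Particle[] particles = new Particle[8000000];

    // Struct-of-arrays: each field is contiguous, so the cache lines the
    // CPU fetches during a position-only loop contain nothing but positions.
    Vector2[] positions = new Vector2[8000000];
    float[] healths = new float[8000000];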
  • @ChristopherM That's also true. In terms of structure, planning is very important. I suppose what I meant to say is: don't do micro-optimizations prematurely, i.e. things that you could quite easily do later if necessary, especially if there is a chance the compiler can do them better.
  • IMO, whether to put effort into optimization depends on only one thing: whether it's already too slow.

    Worrying about APIs and clients down the line is a concern for general software development, not really an issue for a small game developer in the current situation.

    Optimizing the overall process of making a game is the real optimization.

    (This post is a good example of wasteful optimization: it was longer, and I took time to make it shorter, all the while doing nothing but making myself seem more like a mean old nazi...) ;)