Ok, my apologies in advance if you expected artificial intelligence agents playing with each other in this post; I couldn’t help myself when I wrote the title. This post compares the speeds of two significantly different implementations of the same neural network algorithm.
- In the left corner:
  - Old C++ code from uni, 15 years ago,
  - Written in Borland C++ Builder 6,
  - Fixed-size containers for the network parameters,
  - Implemented with arrays and lots of raw pointers,
  - For loops with if/else conditions in the matrix algorithms,
  - A mental amount of copy-constructor calls when returning temporary matrices by value,
  - Simple, but it should be fast.
- In the right corner:
  - A C++11 implementation,
  - Dynamic container sizes,
  - Lots of smart pointers,
  - Lots of vtable accesses,
  - Move-constructor semantics,
  - STL algorithms for almost everything except the matrix product,
  - A transpose_iterator implementation that fakes a transposed matrix without actually shuffling the data – transposing a matrix is now just flipping a boolean,
  - Wrapped in C++/CLI,
  - Displayed in C# WinForms.
Both are release builds, started within 20 iterations of each other.
Update 2: I have optimised the loops in the matrix multiplication algorithm:
- pre-fetching const values before the loops
- pre-fetching a function pointer to the right getter function (transposed vs. normal), which removes an extra if/else condition from inside the loop.
With these changes, the speed difference between the two applications jumped to 35.8% – which is a lot more pleasing to see. The speed gain also shows up in the profiler: the CPU spends around 16% less time in the matrix multiplication operator than in the previous build.
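A sketch of what those two optimisations might look like in the multiplication loop. Again, the class and getter names here are my own assumptions, not the post’s actual code: the loop bounds are hoisted into `const` locals, and a member-function pointer selects the right getter once, before the loops, instead of testing the transposed flag on every element access.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical Matrix with separate getters for normal and transposed access.
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    double getNormal(std::size_t r, std::size_t c) const {
        return data_[r * cols_ + c];
    }
    double getTransposed(std::size_t r, std::size_t c) const {
        return data_[c * cols_ + r];
    }
    void set(std::size_t r, std::size_t c, double v) { data_[r * cols_ + c] = v; }

    std::size_t rows() const { return transposed_ ? cols_ : rows_; }
    std::size_t cols() const { return transposed_ ? rows_ : cols_; }
    bool isTransposed() const { return transposed_; }
    void transpose() { transposed_ = !transposed_; }

private:
    std::size_t rows_, cols_;
    std::vector<double> data_;
    bool transposed_ = false;
};

Matrix multiply(const Matrix& a, const Matrix& b) {
    // Pre-fetch loop-invariant values once, outside the loops.
    const std::size_t rows  = a.rows();
    const std::size_t cols  = b.cols();
    const std::size_t inner = a.cols();

    // Pick the right getter once via a member-function pointer, so the
    // innermost loop has no if/else on the transposed flag.
    using Getter = double (Matrix::*)(std::size_t, std::size_t) const;
    const Getter getA = a.isTransposed() ? &Matrix::getTransposed : &Matrix::getNormal;
    const Getter getB = b.isTransposed() ? &Matrix::getTransposed : &Matrix::getNormal;

    Matrix result(rows, cols);
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < inner; ++k)
                sum += (a.*getA)(i, k) * (b.*getB)(k, j);
            result.set(i, j, sum);
        }
    return result;
}
```

Whether the indirect call through the member-function pointer actually beats a well-predicted branch depends on the compiler and CPU, so this is the kind of change worth confirming in the profiler, as above.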