Recently I’ve been reading quite a lot about activation functions and neural networks in general, and I think I have found a good answer to a question that has been bugging me (and others who know what I am working on) ever since I started working on my own deep learning framework. I’ve had conversations with some recent graduates who learned about neural networks at university in a probably much more modern way than I was taught in the early 2000s, and it was interesting to hear their views on my work. Most of the time, the first question they ask is:
What is the point? Why would I want to implement my own algorithms and Machine Learning framework if I could just use e.g. TensorFlow off the shelf?
It is a very valid question, although sometimes it comes across a bit rudely. For example, I once had a Skype chat with some PhD researchers who wanted to use neural networks for quite an interesting project – they ended the conversation the moment I told them I had written my own algorithm code, and I have not heard from them since. I guess they immediately assumed that I am unable to create anything as good as the researchers at Google – which would be a very fair assumption, to be honest, and in itself it does not offend me at all. But since I am not working for Google, and at the same time I am still very interested in neural networks, my only possible course of action is to implement my own framework. It is as simple as that.
Machine Learning, and especially Neural Networks, is far from an exact science; it is still very much a work in progress. Even though companies like Google have set a couple of de facto standards by releasing their internal software and protocols, these are still at the research stage and will probably remain there for a long time. The buzzwords circulating on LinkedIn and in other professional circles (some recruiters love posting about things they have no clue about!) could easily make someone think that there are well-established methods and dogmas one can learn off the shelf and be done with for life. Of course, there are some good off-the-shelf solutions that fit a wide variety of problems, but with more experience one will be able to select solutions more carefully.
One should not think of neural networks as a very well-established field. Nobody in the world understands them well enough to always tell you the best possible network and configuration for any given problem. However, this is exactly what makes them interesting for a guy like myself – I am convinced there is still a lot to discover about them.
Also, if I want to do my own research in this field, I need a code base that I fully understand – one that I created myself from the bottom up. Only that gives me the flexibility to quickly implement new ideas, whether I read about them elsewhere or come up with them myself. Often the problem with new ideas is that it would take too much work to test them – and I would like to be in a position where the amount of work required does not block me from trying new things.
If there is one thing I regret about DeepTrainer, it is that I did not start working on it earlier – say back in 2004, well before the big Machine Learning renaissance and the emergence of Big Data and cloud computing. I would be much further along by now. But I don’t think I am late, either.