Entropy, IQ, and overfitting

Why young minds often come up with paradigm-shifting ideas

I am going to make some statements without much explanation. They are explained in other essays I have written, so I won't rebuild them from first principles here. I'll eventually add the hyperlinks.

Entropy is a fundamental principle of the universe and the only known quantity in physics that distinguishes the past from the future. The second law of thermodynamics states that the entropy of an isolated system can never decrease. This simple rule has embedded in it the mechanism for the emergence of all the complexity we experience, including the complexity of intelligent life, arguably the most fantastic byproduct of the universe.

Mathematically, entropy measures how many microscopic configurations (microstates) can produce a given overall outcome (macrostate), and therefore how probable that outcome is; Boltzmann's formula S = k ln W says the entropy grows with the number of microstates W. High entropy means the macrostate is likely to occur, while low entropy means it is unlikely. For example, if you toss 100 coins simultaneously, the macrostate 50-Heads, 50-Tails can be realized in the largest number of ways and so has the highest entropy and probability, whereas 100-Heads can be realized in exactly one way and has the lowest. A low-entropy sequence is the same principle applied to events ordered in time.
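To make the coin example concrete, here is a small Python sketch that counts the microstates and probabilities for a couple of macrostates; the coin count and the specific macrostates printed are just illustrative choices.

```python
from math import comb

N = 100  # coins tossed simultaneously

def microstates(k):
    """Number of distinct head/tail orderings (microstates) with exactly k heads."""
    return comb(N, k)

def probability(k):
    """Probability of the macrostate 'exactly k heads' over all 2**N equally likely outcomes."""
    return comb(N, k) / 2**N

# The 50-50 macrostate is realized in ~1e29 ways; the all-heads macrostate in exactly one.
for k in (50, 100):
    print(f"{k} heads: {microstates(k):.3e} microstates, p = {probability(k):.3e}")
```

The gap between roughly 10^29 microstates for the even split and a single microstate for all heads is the entire content of the entropy difference between them.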

(Image credit: Hyperphysics)

Matter, life, fuel, chemicals, energy, and almost everything else we interact with span a wide range of probabilities, but the most interesting, valuable, and critical things are inherently low entropy. The wood you burn to make a fire starts as an ordered configuration of carbon-based molecules mixed with other atoms and loses most of that structure, leaving behind ash. The food we eat is a low-entropy configuration too.

Entropy is so fundamental to the universe that the brain as an organ evolved to predict it. We don't need a precise definition of life; observing something at our scale is enough to distinguish animate from inanimate objects. Thus the brain, over the long evolutionary run, acquired the ability to pick out low-entropy events and sequences and to store mental representations of them as memory.

Because of the complexity of our brain and the number of neurons our species has been endowed with, there are abilities shared across all of us: using tools, learning and speaking language, making sense of what we are looking at, understanding speech, recognizing faces, and many more. These subconscious abilities were programmed through billions of years of evolution and form a repository of fundamental low-entropy events and sequences. The g-factor, or intelligence, is a measure of the brain's ability to sift through this low-entropy memory space and see similarities (what we call pattern recognition), and to build up the repository with new low-entropy sequences by learning and understanding new concepts. Learning and understanding amount to associating a new event the brain has judged to be low entropy with others already in the repository.

The g-factor, then, is the ability to take this repository of low-entropy memories and apply it to an event we haven't encountered yet: the repository helps us predict its entropy and see its similarity to something already in memory.

This brings us to fitting. In artificial intelligence, overfitting is when a neural network becomes so good at capturing the idiosyncrasies of its training data that its performance on the training set is very high, yet show the same network an example it has not seen and it paradoxically struggles. The network is unable to generalize. Mild underfitting is a better place to be: the network's performance on the training data is only reasonably high, but it has learned the general characteristics of that data, so it can extract the same characteristics from a sample it has never encountered and offer a better prediction.
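Here is a minimal sketch of the contrast, using polynomial regression as a stand-in for a neural network; the sine-wave data, the noise level, and the particular degrees are just illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of an underlying low-entropy pattern: y = sin(pi * x) plus noise.
x_train = np.sort(rng.uniform(-1, 1, 20))
y_train = np.sin(np.pi * x_train) + rng.normal(0, 0.2, x_train.size)
x_test = np.sort(rng.uniform(-1, 1, 200))
y_test = np.sin(np.pi * x_test) + rng.normal(0, 0.2, x_test.size)

def fit_and_score(degree):
    """Fit a polynomial of the given degree to the training set and return
    the mean squared error on the training and test sets."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_err, test_err

for degree in (1, 4, 15):
    train_err, test_err = fit_and_score(degree)
    print(f"degree {degree:2d}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```

Typically the degree-1 line misses the structure entirely, the degree-4 fit captures the general shape and carries over to the test points, and the degree-15 fit posts the lowest training error but a worse test error because it has memorized the noise in the 20 training points. That last pattern, excellent on what it has seen and worse on what it hasn't, is the numerical signature of overfitting.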

Under- and overfitting find an analogy in knowledge work like science and entrepreneurship. Many groundbreaking ideas come from young minds who have a high g-factor but are learning the subject for the first time and hence aren't bogged down by the biases and idiosyncrasies of the past. They come at it with a "fresh pair of eyes." I think of it as older professors overfitting to the data because of how much time they have spent with it, while younger minds underfit and can therefore generalize, picking the most interesting threads to pull that may unravel the problem.

Notice I said young minds and not youngsters. By this, I mean those who are learning a subject anew and constantly adding to their repository. As long as you keep adding to your repository, you are young in your mind. When you decide you already know most of what you need to know, you start tending towards overfitting. The brain is so complex and magnificent that even with an overfit model you can still do amazing work, but the paradigm-shifting ideas will likely stay out of reach.