What does Non-Linearity mean in Neural Networks?
A neural network is made of layers, and layers are made of nodes (also called neurons or units). These nodes are what process the input data. Each neuron receives inputs: raw data if it sits in the first layer, or the outputs of the layer below if it sits in a later layer. The node multiplies each input by a weight, adds a bias term, applies a nonlinear activation function (such as ReLU or sigmoid), and passes the resulting value on to the next layer as its input. The key word here is “nonlinear”, because in real life things are rarely linear.
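As a minimal sketch of that computation (the variable names and toy numbers below are illustrative, not taken from any particular network), this is all a single neuron does to its inputs:

```python
import numpy as np

def relu(z):
    """ReLU activation: passes positive values through, zeroes out the rest."""
    return np.maximum(0.0, z)

# Toy inputs coming from the previous layer (or raw data in the first layer)
x = np.array([0.5, -1.2, 3.0])

# Weights and bias of a single neuron (illustrative values)
w = np.array([0.8, 0.1, -0.4])
b = 0.2

z = np.dot(w, x) + b   # weighted sum of the inputs, plus the bias
a = relu(z)            # nonlinear activation; `a` is what the next layer receives
print(z, a)            # -0.72 0.0
```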
Almost everything in real life depends on many variables that do not relate to each other linearly. Let’s explain this through an analogy from structural engineering, which might be easier to visualize.

First, think of a spring. The relationship between the force applied to a spring and its deformation can be written as a linear relationship: F = kx. Here F is the force, k is the spring constant, and x is the amount of elongation or shortening of the spring. The more you push or pull the spring, the more it deforms, in direct proportion determined by the single constant k. So far, simple, right?

But in real life things are not this simple; even this easy example is really a theoretical one, because a truly linear relationship does not exist in nature. Some relationships, like this one, are simple enough that we can get away with assuming linearity and still be accurate enough in practice. The F = kx relationship suffices for all practical purposes as long as we stay within certain limits. For a certain spring, we might assume it holds up to a force of, say, 1000 newtons. Up to that point, we say the relationship is linear. Beyond 1000 newtons, the spring may begin to deform excessively and the spring constant may change: the relationship ceases to be linear and becomes nonlinear. In that case we need a nonlinear mathematical function to describe the new relationship between force and deformation, so that we can keep calculating deformations from force even though the relationship is now more complex.
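To make the linear-then-nonlinear behavior concrete, here is a toy piecewise model in Python. The 1000 N limit comes from the example above, but the spring constant and the post-yield formula are invented purely for illustration:

```python
def spring_deformation(force, k=2000.0, yield_force=1000.0):
    """Deformation of a toy spring: linear (x = F/k) up to `yield_force`,
    then a made-up softening curve beyond it. The constants and the
    post-yield formula are illustrative only."""
    if force <= yield_force:
        return force / k                      # Hooke's law: F = kx  =>  x = F/k
    # Beyond the limit the spring softens: each extra newton produces
    # disproportionately more deformation (modeled quadratically here).
    x_yield = yield_force / k
    excess = force - yield_force
    return x_yield + excess / k + (excess / k) ** 2

for f in (500, 1000, 1500, 2000):
    print(f, spring_deformation(f))   # deformation grows linearly, then faster
```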
Let’s scale up and take an example from a building this time. When an earthquake happens, it shakes the building laterally, back and forth. Up to a point, we can assume the force-deflection relationship is linear, and calculating the building’s deflections is relatively easy. But if the earthquake is strong enough, greater ground accelerations subject the building to greater forces. Now the deformations are no longer easy to calculate: members may fail, deform permanently, or buckle in various places within the structure, producing a much more complex relation between the force on the building and its overall deflection. The relationship has become nonlinear; it can no longer be captured by a simple straight line.
With that background, and hoping the concept of nonlinearity is now easier to visualize, let’s return to our main subject: nonlinearity in neural networks.

Just as linear relations can go only so far in structural systems, neural networks must also use nonlinear functions to model more complex behaviors.
A neural network can be thought of as a machine that takes in information, passes it through layers of weighted connections, and produces outputs in the form of predictions or classifications. If the relationship between input and output were purely linear, then any change in the input would change the output in the same proportion. But that is not the case: there is no straight line between input and output. In the real world of recognizing images or understanding language, things are far more complex than anything a straight line can describe. A change in the input can make the output vary in patterns of different shapes and complexities, following curves and bends of varying degree rather than a straight path.

This nonlinearity is introduced into a neural network through activation functions, such as ReLU, tanh, and sigmoid. They act like switches and filters applied after each layer’s weighted sums, letting the network fit real-life data with varying curves instead of straight lines. Without nonlinearity, computer vision, voice recognition, and natural language processing would not be possible. An image contains textures, edges, and curves; financial data contains intricate interdependencies; language has syntax and semantics. To process all of these, we need relationships far more complex than anything that can be defined by a line such as y = kx.
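A short numpy sketch (the layer shapes and random values are arbitrary) shows why the activation function matters: without one, stacking layers still yields a single linear map, no matter how many layers you add:

```python
import numpy as np

def relu(z):
    """ReLU: the simplest common nonlinearity."""
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)   # layer 1: 3 -> 4
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)   # layer 2: 4 -> 2
x = rng.normal(size=3)                                  # an arbitrary input

# Two layers WITHOUT activations collapse into one linear map:
# W2 @ (W1 @ x + b1) + b2  ==  (W2 @ W1) @ x + (W2 @ b1 + b2)
no_activation = W2 @ (W1 @ x + b1) + b2
collapsed     = (W2 @ W1) @ x + (W2 @ b1 + b2)
print(np.allclose(no_activation, collapsed))  # True: still just a straight line

# Inserting a nonlinearity between the layers breaks that equivalence,
# letting the network bend its input-output mapping:
with_activation = W2 @ relu(W1 @ x + b1) + b2
```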
Post By: A. Tuter