Talking about Deep Learning: Core Concepts
Posted by
Md Ashikquer Rahman
Machine Learning
In machine learning, we (1) read the data, (2) train a model, and (3) use the model to make predictions on new data. Training can be seen as a step-by-step learning process in which the model is repeatedly shown new data. At each step, the model makes a prediction and receives feedback on its accuracy. The feedback takes the form of an error under some measure (such as the distance from the correct answer), which is then used to correct the prediction error.
Learning is an iterative process in parameter space: when you adjust the parameters to correct one prediction, the model may start getting previously correct predictions wrong. It takes many iterations before the model acquires good predictive ability. This "predict and correct" process continues until the model has no further room for improvement.
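To make this loop concrete, here is a minimal sketch of the predict-and-correct cycle described above, assuming NumPy and an illustrative one-parameter linear model (neither of which is prescribed by the text):

```python
import numpy as np

# Illustrative data with the true relationship y = 2x.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])

w = 0.0                  # the model's single parameter, initially uninformed
learning_rate = 0.01

for step in range(200):
    prediction = w * x                        # (1) predict on the data
    error = prediction - y                    # (2) feedback: distance from the correct answer
    w -= learning_rate * (error * x).mean()   # (3) correct the parameter using the error

print(round(w, 3))       # approaches 2.0 after many iterations
```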
Feature Engineering
Feature engineering extracts useful patterns from data to make them easier for machine learning models to classify. For example, the amount of green or blue pixel area could be used as an indicator of whether a photo shows a land animal or an aquatic animal. Such a feature is very helpful to a machine learning model because it limits the number of categories that need to be considered.
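As a rough illustration of such a hand-crafted feature (the blue-dominance ratio, the random image, and the 0.5 threshold below are purely illustrative assumptions, not from the original), one might compute:

```python
import numpy as np

def blue_fraction(image):
    """image: HxWx3 array of RGB values in [0, 1].
    Returns the fraction of pixels dominated by their blue channel."""
    r, g, b = image[..., 0], image[..., 1], image[..., 2]
    blueish = (b > r) & (b > g)
    return blueish.mean()

image = np.random.rand(64, 64, 3)        # stand-in for a real photo
feature = blue_fraction(image)
is_probably_aquatic = feature > 0.5      # a simple hand-tuned decision rule
print(feature, is_probably_aquatic)
```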
In most prediction tasks, feature engineering is a necessary skill for achieving good results. However, because different data sets call for different feature engineering, it is hard to distill general rules rather than general experience, which makes feature engineering more of an art than a science. A feature that is extremely important in one data set may be useless in another (for example, if the next data set contains only plants). Precisely because feature engineering is so difficult, scientists have worked on algorithms that extract features automatically.
While feature extraction for many tasks (such as object recognition and speech recognition) can already be automated, feature engineering remains the most effective technique for complex tasks (such as most tasks in Kaggle machine learning competitions).
Feature Learning
Feature learning algorithms look for patterns shared within a class and extract them automatically for classification or regression. Feature learning is feature engineering done automatically by an algorithm. In deep learning, convolutional layers are exceptionally good at finding features in images and mapping them to the next layer, forming a hierarchy of nonlinear features that grow in complexity (for example: edges, circles -> nose, eyes, cheeks). The last layer uses all the generated features for classification or regression (the last layer of a convolutional network is essentially multinomial logistic regression).
Figure 1 shows features generated by a deep learning algorithm. It is rare for features to be this clear; most features are hard to interpret, especially in recurrent neural networks and LSTMs or very deep convolutional networks.
Deep Learning
In hierarchical feature learning, we extract several layers of nonlinear features and pass them to a classifier, which combines all the features to make a prediction. We deliberately stack these deep nonlinear features because with only a few layers, complex features cannot be learned. It can be shown mathematically that the best features a single-layer neural network can learn are edges and circles, because they carry the most information that a single nonlinear transformation can extract. To generate features that contain more information, we cannot operate on the inputs directly; instead, we keep transforming the first batch of features (edges and circles) to obtain more complex features.
Studies suggest that the human brain works the same way: the first layer of neurons that receives visual information is sensitive to edges and circles, while deeper regions of the cerebral cortex respond to more complex structures, such as human faces.
Hierarchical feature learning predates deep learning, but its architectures faced serious problems, such as vanishing gradients: the gradients become too small at very deep layers to provide any learning signal. This made hierarchical architectures inferior to some traditional machine learning algorithms (such as support vector machines).
To solve the vanishing gradient problem, so that we can train dozens of layers of nonlinear features, many new methods and strategies emerged, and the term "deep learning" grew out of this work. In the early 2010s, researchers found that with the help of GPUs, activation functions could be given enough gradient flow to train deep architectures. Since then, deep learning has developed steadily.
Deep learning is not always tied to deep nonlinear hierarchical features; sometimes it refers to long-range nonlinear time dependencies in sequence data. For sequence data, most other algorithms only have a memory of roughly the last 10 time steps, whereas the LSTM recurrent neural network (invented by Sepp Hochreiter and Jürgen Schmidhuber in 1997) lets the network trace activity hundreds of time steps back to make correct predictions. Although LSTMs were largely overlooked for nearly 10 years, their use has grown rapidly since they were combined with convolutional networks in 2013.
Basic Concepts
Logistic Regression
Regression analysis estimates the relationship between statistical input variables in order to predict an output variable. Logistic regression uses input variables to produce an output that takes one of a limited set of categories, such as "cancer" / "no cancer", or an image type such as "bird" / "car" / "dog" / "cat" / "horse".
Logistic regression applies the logistic sigmoid function (see Figure 2) to weighted input values to produce a two-class prediction (in multinomial logistic regression, a multi-class prediction).
Logistic regression is very similar to a nonlinear perceptron or a neural network without hidden layers. The main difference is that, as long as the input variables satisfy certain statistical properties, logistic regression is easy to interpret and reliable. If these statistical properties hold, a very stable model can be produced from very little input data. Logistic regression is therefore extremely valuable in many sparse-data applications; for example, in medicine and the social sciences it is used to analyze and interpret experimental results. Because logistic regression is simple and fast, it also handles large data sets comfortably.
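As a small sketch of how a fitted logistic regression produces a prediction (NumPy, the weights, and the 0.5 threshold below are illustrative assumptions, not values from the text):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.8, -1.2, 0.3])   # one weight per input variable (assumed already fitted)
b = 0.1

x = np.array([0.5, 0.2, 1.0])    # one example with three input variables
p = sigmoid(x @ w + b)           # probability of the positive class
label = "cancer" if p > 0.5 else "no cancer"
print(p, label)
```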
In deep learning, the final layers of a neural network used for classification are generally logistic regression. In this series, a deep learning algorithm is viewed as a number of feature learning stages whose features are then passed to a logistic regression that performs the classification.
Artificial Neural Network
An artificial neural network (1) reads the input data, (2) transforms it by computing a weighted sum, and (3) applies a nonlinear function to the result to compute an intermediate state. These three steps together are called a "layer", and the transformation function is called a "unit". The intermediate state, that is, the features, becomes the input to another layer.
By repeating these steps, the artificial neural network learns many layers of nonlinear features and finally combines them to make a prediction. The learning process consists of generating error signals (the difference between the network's prediction and the expected value) and using these errors to adjust the weights (or other parameters) so that the predictions become more accurate.
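A minimal sketch of these three steps for two stacked layers, assuming NumPy and ReLU units; the shapes and random weights are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=(1, 4))               # one example with 4 input features
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)

hidden = np.maximum(0.0, x @ W1 + b1)     # layer 1: weighted sum + nonlinearity -> intermediate features
output = hidden @ W2 + b2                 # layer 2 reads those features as its input
print(hidden.shape, output.shape)         # (1, 8) (1, 3)
```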
Kaiser: What follows is an analysis of several terms, including how usage has shifted in recent years. I don't think we need to go into it too deeply.
Unit
A unit sometimes refers to the activation functions in a layer through which the inputs are transformed, such as the logistic sigmoid function. Usually a unit has several incoming and outgoing connections, but units can be more complicated. For example, a long short-term memory (LSTM) unit contains multiple activation functions connected to nonlinear functions or maxout units in a special arrangement. An LSTM computes its final output through a series of nonlinear transformations of the input. Pooling, convolution, and other input-transforming functions are generally not called units.
Artificial Neuron
Artificial neurons, or simply neurons, are a synonym for "units", used to suggest a close connection to neurobiology and the human brain. In fact, deep learning has little to do with the brain; for example, it is now believed that a biological neuron behaves more like an entire multilayer perceptron than like a single artificial neuron. The term "neuron" was coined during the last AI winter to distinguish the more successful neural networks from the failed perceptrons. However, since the great successes of deep learning beginning in 2012, the media has often invoked "neurons" and described deep learning as mimicking the human brain, which is misleading and potentially dangerous for the field. Today the term "neuron" is discouraged in favor of the more precise "unit".
Activation Function
The activation function takes weighted data (the input data multiplied by the weights) and outputs a nonlinear transformation of it. For example, output = max(0, weighted_data) is the rectified linear activation function (ReLU), which essentially sets negative values to zero. The difference between a "unit" and an "activation function" is that a unit can be more complex; for example, a unit may contain several activation functions (as in LSTM units) or a more complex structure (such as maxout units).
Kaiser: This example in the original text overcomplicates a simple point; I suggest skipping it.
The difference between linear and nonlinear activation functions can be illustrated by the relationships among a set of weighted values: imagine four points A1, A2, B1, B2. The points within each pair (A1/A2 and B1/B2) are very close to each other, but A1 is far from B1 and B2, and the same is true of A2.
A linear transformation may change the relationships between the points; for example, A1 and A2 might move far apart, but then B1 and B2 also move far apart. The relationship between the pairs (A1/A2) and (B1/B2) stays essentially the same.
A nonlinear transformation is different: we can increase the distance between A1 and A2 while reducing the distance between B1 and B2, creating a new, more complex relationship. In deep learning, the nonlinear activation function in each layer creates increasingly complex features.
A purely linear transformation, even stacked 1000 layers deep, can be represented by a single layer (because a chain of matrix multiplications collapses into a single matrix multiplication). This is why nonlinear activation functions are so important in deep learning.
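A quick numerical check of this collapse, assuming NumPy; the shapes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 10))
W1 = rng.normal(size=(10, 20))
W2 = rng.normal(size=(20, 3))

two_linear_layers = (x @ W1) @ W2
one_equivalent_layer = x @ (W1 @ W2)     # a single matrix does the same job
print(np.allclose(two_linear_layers, one_equivalent_layer))   # True
```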
Layer
The layer is the highest-level building block of deep learning. A layer is a container that receives weighted input, transforms it with a nonlinear activation function, and passes the result to the next layer as output. A layer is usually uniform, containing only one type of activation function, pooling, convolution, etc., so that it can be easily compared with other parts of the network. The first and last layers are called the "input layer" and "output layer" respectively, and the layers in between are called "hidden layers".
Convolutional Deep Learning
Convolution
Convolution is a mathematical operation that describes a rule for mixing two functions or two pieces of information: (1) a feature map (or the input data) is mixed with (2) a convolution kernel to form (3) a transformed feature map. Convolution is also often interpreted as filtering: the kernel filters the feature map for certain kinds of information. For example, a kernel may pick out only edges and discard everything else.
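A minimal sketch of such a filtering operation, assuming NumPy; the vertical-edge kernel and the 8x8 image are illustrative. Note that, as in most deep learning libraries, no kernel flip is applied, so this is technically the cross-correlation discussed further below:

```python
import numpy as np

def conv2d(feature_map, kernel):
    """Slide `kernel` over `feature_map` and sum the element-wise products
    at every position (valid positions only, no padding)."""
    h, w = feature_map.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(feature_map[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge detector: responds only where brightness changes from left to right.
edge_kernel = np.array([[1., 0., -1.],
                        [1., 0., -1.],
                        [1., 0., -1.]])

image = np.zeros((8, 8))
image[:, 4:] = 1.0                 # left half dark, right half bright
print(conv2d(image, edge_kernel))  # nonzero only where the window straddles the boundary
```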
Convolution is very important in physics and mathematics because it builds a bridge between the spatial/time domain (e.g., position (0, 30), pixel value 147) and the frequency domain (e.g., amplitude 0.3, frequency 30 Hz, phase 60 degrees). This bridge is the Fourier transform: when you take the Fourier transform of both the convolution kernel and the feature map, the convolution operation is greatly simplified (the integral becomes a multiplication).
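A quick numerical check of this convolution theorem in one dimension, assuming NumPy; the signal and kernel lengths are arbitrary:

```python
import numpy as np

signal = np.random.rand(64)
kernel = np.random.rand(8)

direct = np.convolve(signal, kernel)     # direct convolution, length 64 + 8 - 1 = 71

# Multiply the Fourier transforms (zero-padded to the full output length), then invert.
n = len(signal) + len(kernel) - 1
via_fft = np.fft.irfft(np.fft.rfft(signal, n) * np.fft.rfft(kernel, n), n)

print(np.allclose(direct, via_fft))      # True, up to floating-point error
```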
Convolution can describe how information spreads. For example, when you pour milk into coffee without stirring, that is diffusion, and it can be described accurately by convolution (pixels diffusing toward the contours of an image). In quantum mechanics, convolution describes the probability of a particle being found in a particular place when its position is measured (the probability is highest that the pixel lies on a contour). In probability theory, it describes cross-correlation, the degree of similarity between two sequences (the similarity is high if the pixels of a feature, such as a nose, overlap with those of an image, such as a face). In statistics, convolution describes a weighted moving average over a normalized input sequence (high weights on the contour, low weights elsewhere). There are many other interpretations from other perspectives.
Kaiser: The "edge" here is different from the earlier edges. The original word is contour, which refers to the decision boundary; for face recognition, for example, the contour of the face is what recognition focuses on.
For deep learning, however, we do not know which interpretation of convolution is the right one. The cross-correlation interpretation is currently the most useful: a convolution filter is a feature detector; the input (a feature map) is filtered for a certain feature (the kernel), and the output is high where the feature is detected. This is exactly how cross-correlation is interpreted in image processing.
Pooling/Sub-Sampling
Pooling reads the input from a given area and compresses it into a single value (downsampling). In a convolutional neural network, this concentration of information is a very useful property, because the outgoing connections then usually receive similar information (the information is funneled into the right place in the input feature map of the next convolutional layer). This provides basic rotation/translation invariance. For example, if a face is not in the center of the picture but slightly offset, recognition should not be affected, because pooling funnels the information into the right place, so the convolution filters can still detect the face.
The larger the pooling area, the more the information is compressed, leading to a "slimmer" network that fits more easily into GPU memory. But if the pooling area is too large, too much information is lost and predictive power drops.
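A minimal sketch of 2x2 max pooling, assuming NumPy and even input dimensions:

```python
import numpy as np

def max_pool2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 block,
    halving the height and width (assumes even dimensions)."""
    h, w = feature_map.shape
    return feature_map.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool2x2(x))   # [[ 5.  7.] [13. 15.]]
```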
Kaiser: The following part puts together the concepts listed above and is the essence of the article.
Convolutional Neural Network (CNN)
Convolutional neural networks, or convolutional networks, use convolutional layers to filter the input for useful information. The parameters of a convolutional layer are learned automatically, so the filters adjust themselves to extract the most useful information (feature learning). For example, in a general object recognition task the shape of the object may be most useful, while in a bird recognition task color may be most useful (because most birds are similar in shape but vary in color).
Generally, we use multiple convolutional layers to filter an image, and the information obtained after each layer becomes more and more abstract (hierarchical features).
Convolutional networks use pooling layers to obtain limited translation/rotation invariance (the target can still be detected even when it is not in its usual position). Pooling also reduces memory consumption, allowing more convolutional layers to be used.
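A minimal sketch of such a stack of convolution and pooling layers, assuming PyTorch and 32x32 RGB inputs (both illustrative choices, not from the text), ending in a final linear classification stage as described earlier:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # filter the image for low-level features
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 32x32 -> 16x16, some translation tolerance
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # more abstract features on the pooled maps
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 16x16 -> 8x8
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),                    # final classification stage
)

logits = model(torch.randn(1, 3, 32, 32))
print(logits.shape)                               # torch.Size([1, 10])
```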
Recent convolutional networks use inception modules, which employ 1x1 convolution kernels to further reduce memory consumption and speed up computation and training.
Inception
In convolutional networks, the original intention of the inception module was to allow deeper and larger convolutional layers at higher computational efficiency. Inception uses 1x1 convolution kernels to generate smaller feature maps; for example, 192 28x28 feature maps can be compressed into 64 28x28 feature maps by 64 1x1 convolutions. Because the size is reduced, these 1x1 convolutions can be followed by larger convolutions, such as 3x3 and 5x5. Besides 1x1 convolution, max pooling is also used inside the module for dimensionality reduction.
At the output of the inception module, the outputs of all the branches are concatenated into one large feature map, which is passed to the next layer (or the next inception module).
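A simplified inception-style block, assuming PyTorch; it matches the 192-to-64 reduction mentioned above, but the other branch widths are illustrative (the original GoogLeNet uses several variants):

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.branch1x1 = nn.Conv2d(192, 64, kernel_size=1)
        self.branch3x3 = nn.Sequential(
            nn.Conv2d(192, 64, kernel_size=1),             # cheap channel reduction first
            nn.Conv2d(64, 128, kernel_size=3, padding=1),  # larger convolution on the smaller input
        )
        self.branch5x5 = nn.Sequential(
            nn.Conv2d(192, 32, kernel_size=1),
            nn.Conv2d(32, 64, kernel_size=5, padding=2),
        )
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(192, 32, kernel_size=1),
        )

    def forward(self, x):
        branches = [self.branch1x1(x), self.branch3x3(x),
                    self.branch5x5(x), self.branch_pool(x)]
        return torch.cat(branches, dim=1)   # splice the branch outputs channel-wise

x = torch.randn(1, 192, 28, 28)
print(InceptionBlock()(x).shape)            # torch.Size([1, 288, 28, 28]) = 64+128+64+32 channels
```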