

Backpropagation, also called "backward propagation of errors," is an algorithm commonly used when training deep neural networks: the error at the output is propagated backward through the network to compute the gradient of the loss with respect to each weight, and the weights are then updated to reduce the error [128].
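
A minimal sketch of the idea on a hypothetical one-neuron network (the example values, loss, and learning rate are invented for illustration, not taken from the cited source):

    # Train one sigmoid neuron y = sigmoid(w*x + b) on a single toy
    # example, using a hand-derived backward pass (chain rule).
    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    x, target = 1.5, 0.0      # toy training example (assumed values)
    w, b, lr = 0.8, 0.1, 0.5  # initial weights and learning rate

    for step in range(100):
        # Forward pass
        z = w * x + b
        y = sigmoid(z)
        loss = 0.5 * (y - target) ** 2
        # Backward pass: dL/dz = (y - t) * y * (1 - y) by the chain rule
        dz = (y - target) * y * (1.0 - y)
        w -= lr * dz * x      # gradient step on the weight (dz/dw = x)
        b -= lr * dz          # gradient step on the bias (dz/db = 1)

    print(round(loss, 6))     # the loss shrinks as errors propagate backward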


Backward Chaining, also called goal-driven inference, is an inference approach that reasons backward from the goal to the conditions needed to establish that goal. Backward-chaining inference is applied in many different fields, including game theory, automated theorem proving, and artificial intelligence [129].
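
A minimal sketch of goal-driven inference over a hypothetical rule base (the rules, facts, and goal below are invented for illustration):

    # Backward chaining: start from the goal and recursively try to
    # prove the premises of any rule that concludes it.
    rules = {                      # conclusion -> list of premises
        "mortal": ["human"],
        "human": ["greek"],
    }
    facts = {"greek"}              # known facts (assumed)

    def prove(goal):
        if goal in facts:          # the goal is already a known fact
            return True
        premises = rules.get(goal) # find a rule that concludes the goal
        if premises is None:
            return False
        return all(prove(p) for p in premises)

    print(prove("mortal"))         # True: mortal <- human <- greek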


Bag-of-words model in computer vision. In computer vision, the bag-of-words model (BoW model) can be applied to image classification by treating image features as words. In document classification, a bag of words is a sparse vector of occurrence counts of words; that is, a sparse histogram over the vocabulary. In computer vision, a bag of visual words is a vector of occurrence counts of a vocabulary of local image features [130].


Bag-of-words model is a simplifying representation used in natural language processing and information retrieval (IR). In this model, a text (such as a sentence or a document) is represented as the bag (multiset) of its words, disregarding grammar and even word order but keeping multiplicity. The bag-of-words model has also been used for computer vision. The bag-of-words model is commonly used in methods of document classification where the (frequency of) occurrence of each word is used as a feature for training a classifier [131].
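
A short sketch of the representation on a toy corpus (the documents are invented for illustration):

    # Bag-of-words: represent each text by word-occurrence counts,
    # ignoring grammar and word order but keeping multiplicity.
    from collections import Counter

    docs = ["the cat sat on the mat", "the dog sat"]   # toy corpus
    vocab = sorted({w for d in docs for w in d.split()})

    def bow_vector(text):
        counts = Counter(text.split())
        # Dense view of the sparse histogram over the vocabulary
        return [counts[w] for w in vocab]

    for d in docs:
        print(bow_vector(d))   # counts aligned to the sorted vocabulary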


Baldwin effect – the hypothesis that skills acquired by organisms through learning during their lifetime can, after a certain number of generations, become encoded in the genome [132].


Baseline is a model used as a reference point for comparing how well another model (typically a more complex one) is performing. For example, a logistic regression model might serve as a good baseline for a deep model. For a particular problem, the baseline helps model developers quantify the minimal performance that a new model must achieve to be useful [133].
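
A sketch of an even simpler reference point, a hypothetical majority-class baseline (labels are invented for illustration):

    # Majority-class baseline: always predict the most common training
    # label. Any useful model must beat this accuracy.
    from collections import Counter

    train_labels = [0, 0, 0, 1, 1]          # toy data (assumed)
    test_labels  = [0, 1, 0, 0, 1]

    majority = Counter(train_labels).most_common(1)[0][0]
    accuracy = sum(y == majority for y in test_labels) / len(test_labels)
    print(majority, accuracy)               # baseline accuracy to beat: 0.6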


Batch – the set of examples used in one gradient update of model training [134].


Batch Normalization is a normalization step applied to a layer's inputs over each mini-batch: the activations are centered around zero and their standard deviation is set to unity, typically followed by a learnable scale and shift [135].
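
A minimal sketch of the forward computation, assuming scalar gamma/beta parameters (the batch values are invented for illustration):

    # Batch normalization over a mini-batch: center to zero mean,
    # scale to unit variance per feature, then apply the learnable
    # parameters gamma (scale) and beta (shift).
    import numpy as np

    def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
        mean = x.mean(axis=0)                 # per-feature batch mean
        var = x.var(axis=0)                   # per-feature batch variance
        x_hat = (x - mean) / np.sqrt(var + eps)
        return gamma * x_hat + beta

    batch = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])
    print(batch_norm(batch))   # columns now have ~zero mean, unit variance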


Batch size – the number of examples in a batch. For example, the batch size of (strictly stochastic) SGD is 1, while mini-batch sizes usually range between 10 and 1000. Batch size is usually fixed during training and inference; however, TensorFlow does permit dynamic batch sizes [136, 137].
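
A toy sketch of how a fixed batch size partitions a dataset (values assumed for illustration):

    # Slicing a dataset into mini-batches of a fixed batch size.
    data = list(range(10))       # toy dataset of 10 examples
    batch_size = 4               # assumed; SGD proper would use 1

    batches = [data[i:i + batch_size]
               for i in range(0, len(data), batch_size)]
    print(batches)               # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]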


Bayes’s Theorem is a famous theorem used by statisticians to describe the probability of an event based on prior knowledge of conditions that might be related to that event [138].
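
In symbols, for events A and B with P(B) > 0: P(A | B) = P(B | A) · P(A) / P(B). A worked toy example, with all the numbers assumed for illustration:

    # Bayes's theorem: P(A|B) = P(B|A) * P(A) / P(B).
    # Hypothetical medical test: P(pos|disease) = 0.99, base rate
    # P(disease) = 0.01, false-positive rate P(pos|healthy) = 0.05.
    p_pos_given_d = 0.99
    p_d = 0.01
    p_pos = p_pos_given_d * p_d + 0.05 * (1 - p_d)   # law of total probability
    p_d_given_pos = p_pos_given_d * p_d / p_pos
    print(round(p_d_given_pos, 3))   # ~0.167: a positive test is far from certain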


Bayesian classifier in machine learning is a family of simple probabilistic classifiers based on Bayes' theorem and the "naive" assumption that the features of the objects being classified are independent of one another [139].


Bayesian Filter is a program using Bayesian logic. It is used to evaluate the header and content of email messages and determine whether or not they constitute spam – unsolicited email, the electronic equivalent of hard-copy bulk mail or junk mail. A Bayesian filter works with the probabilities of specific words appearing in the header or content of an email. Certain words, such as "Viagra" and "refinance," indicate a high probability that the email is spam [140].
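
A sketch of a naive Bayes spam score combining per-word probabilities (the word likelihoods and priors below are invented for illustration):

    # Naive Bayes spam scoring: multiply per-word likelihoods under
    # each class, then normalize with Bayes's theorem.
    p_word_given_spam = {"viagra": 0.30, "refinance": 0.20, "meeting": 0.01}
    p_word_given_ham  = {"viagra": 0.001, "refinance": 0.005, "meeting": 0.10}
    p_spam, p_ham = 0.5, 0.5            # assumed class priors

    def spam_posterior(words):
        ps, ph = p_spam, p_ham
        for w in words:
            ps *= p_word_given_spam.get(w, 0.01)   # default for unseen words
            ph *= p_word_given_ham.get(w, 0.01)
        return ps / (ps + ph)           # P(spam | words)

    print(spam_posterior(["viagra", "refinance"]))  # close to 1 -> likely spam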


Bayesian Network, also called Bayes Network, belief network, or probabilistic directed acyclic graphical model, is a probabilistic graphical model (a statistical model) that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG).
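
A tiny sketch of inference in such a network, using the classic rain/sprinkler/wet-grass structure (all probabilities are illustrative assumptions):

    # Bayesian network: Rain -> WetGrass, Sprinkler -> WetGrass.
    p_rain = 0.2
    p_sprinkler = 0.1
    # Conditional probability table for WetGrass given (rain, sprinkler)
    p_wet = {(True, True): 0.99, (True, False): 0.9,
             (False, True): 0.9, (False, False): 0.0}

    # Marginal P(WetGrass) by summing over the parents' joint states
    total = 0.0
    for rain in (True, False):
        for spr in (True, False):
            p_parents = ((p_rain if rain else 1 - p_rain)
                         * (p_sprinkler if spr else 1 - p_sprinkler))
            total += p_parents * p_wet[(rain, spr)]
    print(round(total, 4))   # P(grass is wet) = 0.2538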