* Tesla P100: 1GPU 16GB HBM2 $ 1.46 / hour

Let's try it. Open colab.research.google.com and click the "New notebook" button. A blank notebook opens. Enter an expression in the first cell:

10 ** 3/2 + 3

and click the play button: the result is 503.0. To plot a parabola, click the "+ Code" button and enter the following code in the new cell:

import numpy as np
import matplotlib.pyplot as plt

def F(x):
    return x * x

x = np.linspace(-5, 5, 100)  # 100 evenly spaced points from -5 to 5
y = list(map(F, x))

plt.plot(x, y)
plt.ylabel("Y")
plt.xlabel("X")

You can also display an image:

!wget https://www.python.org/static/img/python-logo.png

from PIL import Image

img = Image.open("python-logo.png")
img

Popular frameworks:

* Caffe, Caffe2, CNTK, Kaldi, DL4J, Keras: a set of modules for designing networks;

* TensorFlow, Theano, MXNet: programming the computation graph;

* Torch and PyTorch: you define the main parameters, and the graph is built automatically.

We will use the PyTorch library (NumPy + CUDA + Autograd) because of its simplicity. Let's look at operations on tensors, that is, multidimensional arrays. Import the library and declare two tensors: click "+ Code", enter the code in the cell, and run it:

import torch

a = torch.FloatTensor([[1, 2, 3], [5, 6, 7], [8, 9, 10]])
b = torch.FloatTensor([[-1, -2, -3], [-10, -20, -30], [-100, -200, -300]])

Element-wise operations such as "+", "-", "*", "/" on two matrices of the same shape apply the operation to each pair of corresponding elements:

a + b

tensor([[0., 0., 0.],
        [-5., -14., -23.],
        [-92., -191., -290.]])

Another kind of element-wise operation applies the same operation to every element individually, for example multiplying by -1 or applying a function:

a

tensor([[1., 2., 3.],
        [5., 6., 7.],
        [8., 9., 10.]])

a * -1

tensor([[-1., -2., -3.],
        [-5., -6., -7.],
        [-8., -9., -10.]])

a.abs()

tensor([[1., 2., 3.],
        [5., 6., 7.],
        [8., 9., 10.]])

There are also reduction operations, such as sum, min, and max, which return the sum of all elements, or the smallest or largest element of the matrix:

a.sum()

tensor(51.)

a.min()

tensor(1.)

a.max()

tensor(10.)

However, we will be more interested in column-wise operations, where the operation is performed on each column separately (the argument 0 selects the dimension that is reduced):

a.sum(0)

tensor([14., 17., 20.])

a.min(0)

torch.return_types.min(values=tensor([1., 2., 3.]), indices=tensor([0, 0, 0]))

a.max(0)

torch.return_types.max(values=tensor([8., 9., 10.]), indices=tensor([2, 2, 2]))
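To see what the argument 0 means, the column-wise results above can be reproduced with plain Python lists. This is only an illustrative sketch, not how PyTorch computes them internally: the names columns, col_sums, col_mins, col_argmins and row_sums are ours, and the matrix holds the same values as the tensor a.

```python
# The same 3x3 matrix as the tensor `a`, written as a list of rows.
a = [[1.0, 2.0, 3.0],
     [5.0, 6.0, 7.0],
     [8.0, 9.0, 10.0]]

# zip(*a) regroups the rows into columns: (1, 5, 8), (2, 6, 9), (3, 7, 10).
columns = list(zip(*a))

# Reducing along dimension 0 collapses the rows, leaving one value per column.
col_sums = [sum(col) for col in columns]
col_mins = [min(col) for col in columns]

# Row index at which each column reaches its minimum (the "indices" part).
col_argmins = [col.index(min(col)) for col in columns]

# The other direction, one value per row, corresponds to a.sum(1).
row_sums = [sum(row) for row in a]

print(col_sums)     # [14.0, 17.0, 20.0]
print(col_mins)     # [1.0, 2.0, 3.0]
print(col_argmins)  # [0, 0, 0]
print(row_sums)     # [6.0, 18.0, 27.0]
```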

As we remember, a neural network consists of layers, a layer consists of neurons, and each neuron has input connections whose weights are plain numbers. Since each weight is an ordinary number, the incoming connections of a neuron can be described by a sequence of numbers: a vector (a one-dimensional array or list) whose length is the number of connections. Since the network is fully connected, every neuron of a layer is connected to all neurons of the previous layer, so the vectors representing the neurons all have the same length, and a list of vectors of equal length is a matrix. This is a convenient and compact representation of a layer, well suited to processing on a computer. At the output of each neuron there is an activation function (sigmoid, or ReLU for deep and ultra-deep networks), which determines whether the neuron outputs a value or not. It has to be applied to each neuron, that is, to each column, and we have already seen operations on columns.
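The layer-as-matrix idea can be sketched in a few lines. The example below deliberately uses plain Python lists instead of PyTorch so that the arithmetic stays visible. The function dense_layer, the sample inputs and the weight values are hypothetical and chosen only for illustration, and we assume that column j of the matrix holds the incoming weights of neuron j.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Pass positive values through, replace negative ones with 0."""
    return max(0.0, z)

def dense_layer(inputs, weights, activation):
    """Forward pass of one fully connected layer.

    weights[i][j] is the weight of the connection from input i to neuron j,
    so column j contains all incoming weights of neuron j.
    """
    n_neurons = len(weights[0])
    outputs = []
    for j in range(n_neurons):
        # Weighted sum of the inputs for neuron j (one column of the matrix).
        z = sum(inputs[i] * weights[i][j] for i in range(len(inputs)))
        outputs.append(activation(z))
    return outputs

inputs = [1.0, 2.0]            # two input values
weights = [[0.5, -1.0, 0.0],   # weights from input 0 to neurons 0, 1, 2
           [0.25, 0.5, -2.0]]  # weights from input 1 to neurons 0, 1, 2

relu_out = dense_layer(inputs, weights, relu)
sigmoid_out = dense_layer(inputs, weights, sigmoid)

print(relu_out)     # [1.0, 0.0, 0.0]
print(sigmoid_out)  # three values between 0 and 1; the middle one is 0.5
```

With ReLU, the second and third neurons output 0 because their weighted sums (0.0 and -4.0) are not positive, which is the sense in which the activation function decides whether a neuron "fires".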

Accelerating learning

These operations are used for convolutions, which take over 99% of the computation time, so there are specialized tools for optimizing them. The calculations themselves are performed not in Python but in C: Python only calls the API of low-level math libraries. Since such computations are easily parallelized, processors designed for parallel image processing (GPUs) are used instead of general-purpose processors (CPUs). A PC processor has 2 to 8 cores and a server processor 10 to 20, whereas a GPU has hundreds or thousands of cores highly specialized for processing matrices and vectors. The most popular standard for the group of drivers providing access to NVIDIA GPUs is CUDA (Compute Unified Device Architecture); you can check whether an NVIDIA device is present with "lspci | grep -i nvidia". The alternative is OpenCL, promoted by AMD for its GPUs, but its development and support in frameworks is rudimentary. For further optimization, processors for ML provide special instructions that are used by specialized libraries. For example, Intel Xeon Scalable processors support eight-bit numbers and special pipelines that are activated when using OpenVINO, which gives a speedup of up to 3.7 times for PyTorch. For classic ML (classification), XGBoost gives a speedup of up to 15 times. For now, a low-power CPU is enough for us.
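Whether a CUDA-capable GPU is actually visible can also be checked from Python. The following is a minimal sketch, assuming PyTorch's torch.cuda.is_available() call. The helper pick_device is a hypothetical name of ours, and it simply falls back to "cpu" when PyTorch is not installed or no GPU is found.

```python
import importlib.util

def pick_device():
    """Return "cuda" if PyTorch is installed and sees a CUDA GPU, else "cpu"."""
    # find_spec returns None when the torch package is not installed.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch
    return "cuda" if torch.cuda.is_available() else "cpu"

device = pick_device()
print(device)  # "cuda" on a machine with a usable NVIDIA GPU, otherwise "cpu"
```

A tensor such as a can then be moved to the chosen device with a.to(device); on a CPU-only machine this call leaves the tensor where it is.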