c++ - Deep neural network precision for image recognition: float or double?


Neural networks for image recognition can be big. There can be thousands of input/hidden neurons and millions of connections, which can take a lot of computer resources.

While float is commonly 32-bit and double 64-bit in C++, there is little difference between them in speed, yet using floats can save memory.

For a neural network using a sigmoid activation function, which of the variables in the neural network can be float instead of double, to save memory without making the neural network unable to perform?

The inputs and outputs of the training/test data can be floats, because they do not require double precision: colors in an image are in the range 0-255, and when normalized to a 0.0-1.0 scale, the unit value is 1 / 255 ≈ 0.0039.

1. What about the hidden neurons' output precision, is it safe to make them float too?

A hidden neuron's output is computed by summing each previous-layer neuron's output multiplied by its connection weight to the neuron being calculated, and passing that sum through the activation function (currently sigmoid) to get the new output. The sum variable should be double, since it can become a large number when the network is big.
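The scheme above can be sketched as follows (a minimal illustration, not code from the question; the function name and use of `std::vector` are our assumptions): weights and inputs stored as float to save memory, with only the running sum kept in double.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch: inputs and weights are float (memory-friendly), but the
// accumulator is double so a long sum of products keeps its precision
// before the sigmoid squashes it into (0, 1).
double neuron_output(const std::vector<float>& inputs,
                     const std::vector<float>& weights) {
    double sum = 0.0;  // double accumulator for the weighted sum
    for (std::size_t i = 0; i < inputs.size(); ++i)
        sum += static_cast<double>(inputs[i]) * weights[i];
    return 1.0 / (1.0 + std::exp(-sum));  // sigmoid activation
}
```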


2. What about the connection weights, can they be floats?

While the inputs and the neurons' outputs are in the range 0-1.0 because of the sigmoid, the weights are allowed to be bigger than that.


Stochastic gradient descent backpropagation suffers from the vanishing gradient problem because of the activation function's derivative, so I decided not to ask what precision the gradient variable should be, feeling that float is simply not precise enough, especially when the network is deep.

  1. What about the hidden neurons' output precision, is it safe to make them float too?

Using float32 everywhere is a safe first choice for most neural network applications. GPUs currently support only float32, so many practitioners stick to float32 everywhere. For many applications, even 16-bit floating point values would be sufficient. There are extreme examples showing that high-accuracy networks can be trained with as little as 2 bits per weight (https://arxiv.org/abs/1610.00324).

The complexity of deep networks is usually limited not by computation time, but by the amount of RAM on a single GPU and the throughput of the memory bus. Even if you're working on a CPU, using a smaller data type still helps to use the cache more efficiently. You are rarely limited by machine datatype precision.

  since colors in an image are in the range of 0-255,

You're doing this wrong. You force the network to learn the scale of the input data, when it is already known (unless you're using a custom weight-initialization procedure). Better results are achieved when the input data is normalized to the range (-1, 1) or (0, 1), and the weights are initialized so that the average output of the layer is at the same scale. A popular initialization technique: http://andyljones.tumblr.com/post/110998971763/an-explanation-of-xavier-initialization

If the inputs are in the range [0, 255], with the average input being ~100 and the weights being ~1, then the activation potential (the argument of the activation function) is going to be ~100×n, where n is the number of layer inputs, which is far away in the "flat" part of the sigmoid. So you either initialize the weights to be ~1/(100×n), or you scale the data and use a popular initialization method. Otherwise the network will have to spend a lot of training time just bringing the weights to scale.
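A minimal sketch of the Xavier/Glorot scheme the link above describes (the function name and the fixed seed are our choices): each weight is drawn from a zero-mean uniform distribution whose spread shrinks with the layer's fan-in and fan-out, so the activation potential stays near the steep part of the sigmoid regardless of layer width.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Xavier/Glorot uniform initialization: weights drawn from
// U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)),
// keeping the variance of layer outputs roughly equal to the
// variance of layer inputs.
std::vector<float> xavier_init(std::size_t fan_in, std::size_t fan_out) {
    std::mt19937 rng(42);  // fixed seed for reproducibility (assumption)
    const float limit = std::sqrt(6.0f / static_cast<float>(fan_in + fan_out));
    std::uniform_real_distribution<float> dist(-limit, limit);

    std::vector<float> weights(fan_in * fan_out);
    for (float& w : weights) w = dist(rng);
    return weights;
}
```

With, say, fan_in = fan_out = 1000, the limit is about 0.055, so every weight is small enough that a sum over 1000 normalized inputs lands near the sigmoid's sensitive region instead of its flat tails.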

  Stochastic gradient descent backpropagation suffers from the vanishing gradient problem because of the activation function's derivative, so I decided not to ask what precision the gradient variable should be, feeling that float is simply not precise enough, especially when the network is deep.

It's less a matter of machine arithmetic precision than of the scale of the outputs of each of the layers. In practice:

  • preprocess the input data (normalize it to the (-1, 1) range)
  • if you have more than 2 layers, don't use sigmoids; use rectified linear units instead
  • initialize the weights carefully
  • use batch normalization
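The second bullet can be illustrated with the derivatives involved (our illustration, not code from the answer): the sigmoid's derivative is at most 0.25 and decays toward 0 for large |x|, so gradients shrink multiplicatively with every layer, while the ReLU's derivative is exactly 1 for any positive input.

```cpp
#include <cmath>

// Derivative of the sigmoid s(x) = 1 / (1 + exp(-x)),
// which equals s(x) * (1 - s(x)) and peaks at 0.25 when x == 0.
float sigmoid_grad(float x) {
    const float s = 1.0f / (1.0f + std::exp(-x));
    return s * (1.0f - s);
}

// Derivative of the rectified linear unit max(0, x):
// exactly 1 for positive inputs, so deep chains of ReLU layers
// do not shrink the gradient the way chained sigmoids do.
float relu_grad(float x) { return x > 0.0f ? 1.0f : 0.0f; }
```

Chaining ten sigmoid layers multiplies the gradient by at most 0.25 ten times (less than one millionth), while ten ReLU layers on active units leave it untouched; that, rather than float vs. double, is what makes deep gradients vanish.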

This video should be helpful for learning these concepts if you're not familiar with them.
