c++ - Deep neural network precision for image recognition: float or double?


Neural networks for image recognition can be big. There can be thousands of input/hidden neurons and millions of connections, which can take a lot of computer resources.

While float is commonly 32-bit and double 64-bit in C++, there is little difference between them in speed, yet using floats can save memory.

For a neural network using a sigmoid activation function, which of the variables in the neural network can be float instead of double, to save memory without making the neural network unable to perform?

The inputs and outputs of the training/test data can be floats, because they do not require double precision: colors in an image are in the range 0-255, and when normalized to a 0.0-1.0 scale, the unit value is 1 / 255 ≈ 0.0039.

1. What about the hidden neurons' output precision, is it safe to make them float too?

A hidden neuron's output is computed by summing each previous-layer neuron's output multiplied by its connection weight to the neuron being calculated, and passing that sum through the activation function (currently sigmoid) to get the new output. The sum variable should be double, since it can become a large number when the network is big.
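The scheme above can be sketched as follows (a minimal illustration, not code from the question; the function name and use of `std::vector` are our assumptions): weights and inputs stored as float to save memory, with only the running sum kept in double.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch: inputs and weights are float (memory-friendly), but the
// accumulator is double so a long sum of products keeps its precision
// before the sigmoid squashes it into (0, 1).
double neuron_output(const std::vector<float>& inputs,
                     const std::vector<float>& weights) {
    double sum = 0.0;  // double accumulator for the weighted sum
    for (std::size_t i = 0; i < inputs.size(); ++i)
        sum += static_cast<double>(inputs[i]) * weights[i];
    return 1.0 / (1.0 + std::exp(-sum));  // sigmoid activation
}
```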


2. What about the connection weights, can they be floats?

While the inputs and the neurons' outputs are in the range 0-1.0 because of the sigmoid, the weights are allowed to be bigger than that.


Stochastic gradient descent backpropagation suffers from the vanishing gradient problem because of the activation function's derivative, so I decided not to ask what precision the gradient variable should be, feeling that float is simply not precise enough, especially when the network is deep.

  1. What about the hidden neurons' output precision, is it safe to make them float too?

Using float32 everywhere is a safe first choice for most neural network applications. GPUs currently support only float32, so many practitioners stick to float32 everywhere. For many applications, even 16-bit floating point values would be sufficient. There are extreme examples showing that high-accuracy networks can be trained with as little as 2 bits per weight (https://arxiv.org/abs/1610.00324).

The complexity of deep networks is usually limited not by computation time, but by the amount of RAM on a single GPU and the throughput of the memory bus. Even if you're working on a CPU, using a smaller data type still helps to use the cache more efficiently. You are rarely limited by machine datatype precision.

  since colors in an image are in the range of 0-255,

You're doing this wrong. You force the network to learn the scale of the input data, when it is already known (unless you're using a custom weight-initialization procedure). Better results are achieved when the input data is normalized to the range (-1, 1) or (0, 1), and the weights are initialized so that the average output of the layer is at the same scale. A popular initialization technique: http://andyljones.tumblr.com/post/110998971763/an-explanation-of-xavier-initialization

If the inputs are in the range [0, 255], with the average input being ~100 and the weights being ~1, then the activation potential (the argument of the activation function) is going to be ~100×n, where n is the number of layer inputs, which is far away in the "flat" part of the sigmoid. So you either initialize the weights to be ~1/(100×n), or you scale the data and use a popular initialization method. Otherwise the network will have to spend a lot of training time just bringing the weights to scale.
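A minimal sketch of the Xavier/Glorot scheme the link above describes (the function name and the fixed seed are our choices): each weight is drawn from a zero-mean uniform distribution whose spread shrinks with the layer's fan-in and fan-out, so the activation potential stays near the steep part of the sigmoid regardless of layer width.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Xavier/Glorot uniform initialization: weights drawn from
// U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)),
// keeping the variance of layer outputs roughly equal to the
// variance of layer inputs.
std::vector<float> xavier_init(std::size_t fan_in, std::size_t fan_out) {
    std::mt19937 rng(42);  // fixed seed for reproducibility (assumption)
    const float limit = std::sqrt(6.0f / static_cast<float>(fan_in + fan_out));
    std::uniform_real_distribution<float> dist(-limit, limit);

    std::vector<float> weights(fan_in * fan_out);
    for (float& w : weights) w = dist(rng);
    return weights;
}
```

With, say, fan_in = fan_out = 1000, the limit is about 0.055, so every weight is small enough that a sum over 1000 normalized inputs lands near the sigmoid's sensitive region instead of its flat tails.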

  Stochastic gradient descent backpropagation suffers from the vanishing gradient problem because of the activation function's derivative, so I decided not to ask what precision the gradient variable should be, feeling that float is simply not precise enough, especially when the network is deep.

It's less a matter of machine arithmetic precision than of the scale of the outputs of each of the layers. In practice:

  • preprocess the input data (normalize it to the (-1, 1) range)
  • if you have more than 2 layers, don't use sigmoids; use rectified linear units instead
  • initialize the weights carefully
  • use batch normalization
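The second bullet can be illustrated with the derivatives involved (our illustration, not code from the answer): the sigmoid's derivative is at most 0.25 and decays toward 0 for large |x|, so gradients shrink multiplicatively with every layer, while the ReLU's derivative is exactly 1 for any positive input.

```cpp
#include <cmath>

// Derivative of the sigmoid s(x) = 1 / (1 + exp(-x)),
// which equals s(x) * (1 - s(x)) and peaks at 0.25 when x == 0.
float sigmoid_grad(float x) {
    const float s = 1.0f / (1.0f + std::exp(-x));
    return s * (1.0f - s);
}

// Derivative of the rectified linear unit max(0, x):
// exactly 1 for positive inputs, so deep chains of ReLU layers
// do not shrink the gradient the way chained sigmoids do.
float relu_grad(float x) { return x > 0.0f ? 1.0f : 0.0f; }
```

Chaining ten sigmoid layers multiplies the gradient by at most 0.25 ten times (less than one millionth), while ten ReLU layers on active units leave it untouched; that, rather than float vs. double, is what makes deep gradients vanish.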

This video should be helpful for learning these concepts if you're not familiar with them.
