nvidia - CUDA bandwidth in single precision and bandwidth in double precision


I am trying to determine the bandwidth of my PC with CUDA. I have a 750M board; its theoretical bandwidth is 90 GB/s, while the specification mentions 80 GB/s. I have tried the simple algorithm from the NVIDIA website https://devblogs.nvidia.com/parallelforall/how-implement-performance-metrics-cuda-cc/. I changed the code for single precision and for double precision, and I get these results:

single precision (float): 30 GB/s

double precision: 26 GB/s
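For reference, the measurement described above can be sketched as a SAXPY kernel templated on the element type and timed with CUDA events, in the style of the linked NVIDIA post. This is a minimal sketch, not the post's exact code: the names `axpy`, `timeAxpyMs` and `gridSizeFor`, the block size of 256 and the zero-initialized inputs are illustrative assumptions, and CUDA error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>

// Number of blocks needed to cover n elements with `threads` threads each.
inline int gridSizeFor(int n, int threads) {
    return (n + threads - 1) / threads;
}

// SAXPY-style kernel templated on the element type, so the same code
// can be timed in single (float) and double precision.
template <typename T>
__global__ void axpy(int n, T a, const T* x, T* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];  // 2 reads + 1 write per element
}

// Times one kernel launch with CUDA events and returns elapsed milliseconds.
template <typename T>
float timeAxpyMs(int n) {
    T *x, *y;
    cudaMalloc(&x, n * sizeof(T));
    cudaMalloc(&y, n * sizeof(T));
    cudaMemset(x, 0, n * sizeof(T));
    cudaMemset(y, 0, n * sizeof(T));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    axpy<T><<<gridSizeFor(n, 256), 256>>>(n, T(2), x, y);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until the kernel has finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(x);
    cudaFree(y);
    return ms;
}
```

A typical call would be `float ms = timeAxpyMs<double>(1 << 20);` followed by the bandwidth printf. Running one untimed launch before the timed one is a common precaution so that one-off startup costs do not distort the measurement.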

The bandwidth in single precision is calculated like this:

    printf("effective bandwidth (gb/s): %f\n", n*4*3/milliseconds/1e6);

And if I try to do the same for double precision (8 bytes):

    printf("effective bandwidth (gb/s): %f\n", n*8*3/milliseconds/1e6);
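Both printf lines are instances of one formula: bytes moved = n × element size × 3 (SAXPY reads x, reads y and writes y), divided by the elapsed time; dividing bytes by milliseconds and then by 1e6 gives GB/s. A small hypothetical helper, `effectiveBandwidthGBs`, that takes the element size from `sizeof(T)` keeps the byte count in step with the type actually being benchmarked:

```cuda
#include <cstddef>

// Effective bandwidth in GB/s for a kernel that touches each of n elements
// `accessesPerElement` times (SAXPY: read x, read y, write y -> 3).
// bytes / (ms * 1e-3 s) / 1e9 B per GB  ==  bytes / ms / 1e6
template <typename T>
double effectiveBandwidthGBs(std::size_t n, int accessesPerElement, float milliseconds) {
    double bytes = static_cast<double>(n) * sizeof(T) * accessesPerElement;
    return bytes / milliseconds / 1e6;
}
```

For example, `effectiveBandwidthGBs<double>(n, 3, milliseconds)` reproduces the 8-byte formula above. For the same elapsed time, the double-precision figure is exactly twice the single-precision one, because each element moves twice as many bytes.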

The result for double precision is bigger than for single precision:

single precision (float): 30 GB/s

double precision: 45 GB/s

With this approach, you are making 32-bit loads. The card's memory management unit does not have the resources it needs (requests in flight: see Little's law; on memory latency, this technical report seems to cover it) for the bandwidth to reach full performance.
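Little's law makes this concrete: the amount of data that must be in flight equals bandwidth × latency. The sketch below only does that arithmetic. The 500 ns latency is a hypothetical round number chosen for illustration, not a measured value for the 750M; the 80 GB/s is the figure from the question; the function names are illustrative.

```cuda
// Little's law: data in flight = throughput x latency.
// GB/s x ns = (1e9 B/s) x (1e-9 s) = bytes, so the units cancel neatly.
constexpr double bytesInFlight(double bandwidthGBs, double latencyNs) {
    return bandwidthGBs * latencyNs;
}

// Outstanding loads of `loadBytes` each needed to keep that many bytes in flight.
constexpr double loadsInFlight(double bandwidthGBs, double latencyNs, int loadBytes) {
    return bytesInFlight(bandwidthGBs, latencyNs) / loadBytes;
}
```

With these assumed numbers, about 40,000 bytes must be in flight: that is 10,000 outstanding 4-byte loads, but only 2,500 outstanding 16-byte (float4) loads, which is far easier for the hardware to sustain.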

You want to use 128-bit loads (using float4, for example) to maximize bandwidth, or at least float2, and to perform several operations in each thread.
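A minimal sketch of that advice, assuming the same SAXPY workload as in the question: each thread issues float4 loads and stores and walks a grid-stride loop, so a capped grid gives every thread several float4 elements. The names `saxpyVec4` and `vec4GridSize`, the cap of 1024 blocks and the scalar tail handling are illustrative choices, not code from the linked post.

```cuda
#include <cuda_runtime.h>

// SAXPY with 128-bit (float4) loads/stores and a grid-stride loop.
// Assumes x and y come from cudaMalloc, so they are 16-byte aligned.
__global__ void saxpyVec4(int n, float a, const float* x, float* y) {
    const float4* x4 = reinterpret_cast<const float4*>(x);
    float4* y4 = reinterpret_cast<float4*>(y);
    int n4 = n / 4;                        // number of whole float4 elements
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = blockDim.x * gridDim.x;

    for (int i = idx; i < n4; i += stride) {
        float4 xv = x4[i];                 // one 128-bit load
        float4 yv = y4[i];                 // one 128-bit load
        yv.x = a * xv.x + yv.x;
        yv.y = a * xv.y + yv.y;
        yv.z = a * xv.z + yv.z;
        yv.w = a * xv.w + yv.w;
        y4[i] = yv;                        // one 128-bit store
    }

    // Tail: the last n % 4 elements (at most 3) are processed one float at a time.
    int tail = n4 * 4 + idx;
    if (tail < n) y[tail] = a * x[tail] + y[tail];
}

// Grid size for saxpyVec4, capped so each thread loops over several float4s.
inline int vec4GridSize(int n, int threadsPerBlock, int maxBlocks) {
    int n4 = n / 4;
    int blocks = (n4 + threadsPerBlock - 1) / threadsPerBlock;
    if (blocks < 1) blocks = 1;            // still need threads for the tail
    return blocks < maxBlocks ? blocks : maxBlocks;
}
```

A launch would look like `saxpyVec4<<<vec4GridSize(n, 256, 1024), 256>>>(n, 2.0f, d_x, d_y);`. The cap is a tuning knob; it is often set to a small multiple of the number of multiprocessors on the card.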

