nvidia - CUDA bandwidth in single precision and bandwidth in double precision
I am trying to determine my PC's memory bandwidth with CUDA. I have a 750M board; its theoretical bandwidth is 90 GB/s, while the specification mentions 80 GB/s. I tried the simple algorithm from the NVIDIA website: https://devblogs.nvidia.com/parallelforall/how-implement-performance-metrics-cuda-cc/. I changed the code from single precision to double precision and got these results:
single precision (float): 30 GB/s
double precision: 26 GB/s
The bandwidth for single precision is calculated like this:
printf("effective bandwidth (GB/s): %f\n", n*4*3/milliseconds/1e6);
and if I change it for double precision (8 bytes):
printf("effective bandwidth (GB/s): %f\n", n*8*3/milliseconds/1e6);
the result for double precision is bigger than for single precision:
single precision (float): 30 GB/s
double precision: 45 GB/s
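As a minimal sketch of the calculation above: the factor 3 counts two reads and one write per element, and the 4 or 8 is the element size in bytes. Leaving 4 in place while timing a double kernel would under-report the figure by a factor of two. The helper name `effectiveBandwidthGBs` is hypothetical and not part of the NVIDIA sample; it only replaces the hard-coded size with `sizeof(T)`.

```cuda
#include <cstddef>

// Hypothetical helper (not from the NVIDIA sample).
// Effective bandwidth in GB/s for a kernel such as y[i] = a*x[i] + y[i]:
// each element costs two reads (x, y) and one write (y), hence the factor 3.
// T is the element type, so sizeof(T) stands in for the hard-coded 4 or 8.
template <typename T>
double effectiveBandwidthGBs(std::size_t n, double milliseconds) {
    const double bytes = static_cast<double>(n) * sizeof(T) * 3.0;
    // bytes per millisecond divided by 1e6 gives GB/s (1 GB = 1e9 bytes).
    return bytes / milliseconds / 1e6;
}
```

It would be called as `effectiveBandwidthGBs<float>(n, milliseconds)` or `effectiveBandwidthGBs<double>(n, milliseconds)`, so the byte count always matches the type the kernel actually moves.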
Using your approach, you are making 32-bit loads. The memory management unit of your card will not have the necessary resources (requests in flight; see Little's law and memory latency, which this technical report seems to cover) for the bandwidth to reach full performance.
You will want to use 128-bit loads (using float4, for example) to maximize the bandwidth, or at least float2, and perform several operations in each thread.
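The suggestion above can be sketched roughly as follows. This assumes the benchmark is the SAXPY kernel from the linked article; the names `saxpy4`, `runSaxpy4` and `perThread` are hypothetical. It also assumes the element count is a positive multiple of 4 and that the buffers come from `cudaMalloc`, which is aligned well enough for `float4` access. Error checking and timing are left out for brevity, and no particular speed-up is claimed.

```cuda
#include <cuda_runtime.h>
#include <cmath>
#include <vector>

// Each iteration does one 128-bit load from x, one from y and one 128-bit
// store to y, instead of three 32-bit accesses.
__global__ void saxpy4(int n4, float a, const float4* x, float4* y) {
    // Grid-stride loop: a thread processes several float4 elements when the
    // grid is smaller than n4.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n4;
         i += blockDim.x * gridDim.x) {
        float4 xv = x[i];
        float4 yv = y[i];
        yv.x = a * xv.x + yv.x;
        yv.y = a * xv.y + yv.y;
        yv.z = a * xv.z + yv.z;
        yv.w = a * xv.w + yv.w;
        y[i] = yv;
    }
}

// Hypothetical host driver: n must be a positive multiple of 4.
// Returns the maximum absolute error against the expected value 4.0f.
float runSaxpy4(int n) {
    std::vector<float> hx(n, 1.0f), hy(n, 2.0f);
    float *dx = nullptr, *dy = nullptr;
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&dy, n * sizeof(float));
    cudaMemcpy(dx, hx.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    const int n4 = n / 4;        // number of float4 elements
    const int threads = 256;
    const int perThread = 4;     // assumed amount of work per thread
    const int blocks = (n4 + threads * perThread - 1) / (threads * perThread);
    saxpy4<<<blocks, threads>>>(n4, 2.0f,
                                reinterpret_cast<const float4*>(dx),
                                reinterpret_cast<float4*>(dy));

    // This copy waits for the kernel to finish before reading the results.
    cudaMemcpy(hy.data(), dy, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dx);
    cudaFree(dy);

    float maxErr = 0.0f;
    for (int i = 0; i < n; ++i) maxErr = std::fmax(maxErr, std::fabs(hy[i] - 4.0f));
    return maxErr;
}
```

For the double precision version, `double2` is the 128-bit equivalent. The kernel launch can be wrapped in the same `cudaEvent` timing the article uses to compare it with the scalar kernel.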