Cutlass
CUDA Templates for Linear Algebra Subroutines and Solvers
#include <gemm_global_stream.h>
Contains private storage in shared memory needed by the objects within this class. Note that this is NOT the shared memory allocation for the GEMM threadblock tile. That allocation necessarily exists outside this class, because it is also needed by the warp-level shared-memory-to-register-file (shared=>RF) stream.
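The ownership split described above can be illustrated with a minimal host-side C++ sketch. All names here (GlobalStreamSketch, SharedStreamSketch, KernelSharedStorageSketch, the scratch member) are hypothetical and are not taken from gemm_global_stream.h. The sketch only assumes that the stream's private storage and the threadblock tile are sibling members of a kernel-level structure.

```cpp
#include <cstddef>

// Hypothetical stand-in for the global-memory stream. Illustrative only;
// not the actual class declared in gemm_global_stream.h.
struct GlobalStreamSketch {
  // Private storage used only by objects inside this class.
  // It does not contain the GEMM threadblock tile.
  struct SharedStorage {
    int scratch[4];  // assumed placeholder contents
  };
};

// Hypothetical stand-in for the warp-level shared=>RF stream.
struct SharedStreamSketch {
  // Reads the threadblock tile, which is why the tile cannot be
  // private to the global stream.
  static float first_element(const float* tile) { return tile[0]; }
};

// Assumed kernel-level layout: the tile sits next to the stream's
// private storage, so both streams can be handed a pointer to it.
template <std::size_t kTileElements>
struct KernelSharedStorageSketch {
  float threadblock_tile[kTileElements];
  GlobalStreamSketch::SharedStorage global_stream_storage;
};
```

In a real kernel, a structure of this kind would presumably be placed in `__shared__` memory; the sketch uses ordinary host memory so that the layout can be inspected without a GPU.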