typename ThreadGemmShape_,
int kScalarsPerLdgA_ = 1,
int kScalarsPerLdgB_ = 1>
ThreadMultiplyAdd<ThreadGemmShape_, Shape<1, 4, 8>, double, double, double>,
int kScalarsPerLdgA_ = 1,
int kScalarsPerLdgB_ = 1,
typename Index_ = int,
typename GemmConfig_ =
typename GemmEpilogueTraits_ =
GemmEpilogue<GemmEpilogueTraits_>,
Defines iterators for efficiently loading and storing to global memory.
Defines structural properties of complete GEMM computation.
Kind
Enumeration defining fundamental contiguous layouts.
Definition: matrix_traits.h:159
Template implementing matrix multiply-add operations on fragments.
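As a rough illustration of what a multiply-add on fragments amounts to, the sketch below accumulates the outer product of two small per-thread fragments into a tile of accumulators. It is not the library's ThreadMultiplyAdd: the name ThreadOuterProduct, the use of std::array as a fragment, the row-major accumulator layout, and the extents M and N are all assumptions made for this example.

```cpp
#include <array>
#include <cstddef>

// Hypothetical sketch, not the library class. M and N are the assumed
// per-thread tile extents; fragments are modeled as fixed-size arrays.
template <typename Scalar, std::size_t M, std::size_t N>
struct ThreadOuterProduct {
  using FragmentA = std::array<Scalar, M>;         // M values taken from A
  using FragmentB = std::array<Scalar, N>;         // N values taken from B
  using Accumulators = std::array<Scalar, M * N>;  // row-major M x N tile

  // Computes d = outer(a, b) + c, one multiply-add per accumulator.
  void multiply_add(const FragmentA& a, const FragmentB& b,
                    const Accumulators& c, Accumulators& d) const {
    for (std::size_t i = 0; i < M; ++i) {
      for (std::size_t j = 0; j < N; ++j) {
        d[i * N + j] = a[i] * b[j] + c[i * N + j];
      }
    }
  }
};
```

Each thread owns M * N accumulators but reads only M + N operands per step, which is the reason the operation is expressed on fragments rather than on individual scalars.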
Implements the epilogue phase of the GEMM kernel that efficiently updates global memory with the comp...
Defines iterators for efficiently loading and storing tiles to and from shared memory.
Definition: gemm_config.h:76
Definition: dgemm_traits.h:119
Definition: dgemm_traits.h:52
A Shape implementing Layout Concept describing the dimensions of a cube.
Definition: shape.h:64
Definition: gemm_epilogue_traits.h:340
Functor to compute linear combination of fragments.
Definition: linear_scaling.h:51
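A linear combination of fragments can be sketched as an element-wise alpha * accum + beta * source, which is the usual GEMM epilogue scaling. The code below is a minimal stand-in under stated assumptions, not the functor defined in linear_scaling.h: the names LinearCombination and Fragment, the members alpha and beta, and the std::array representation are hypothetical.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a register-resident fragment.
template <typename Scalar, std::size_t N>
using Fragment = std::array<Scalar, N>;

// Hypothetical functor (assumed interface, not the library's):
// returns alpha * accum + beta * source, element by element.
template <typename Scalar>
struct LinearCombination {
  Scalar alpha;  // scales the accumulated product
  Scalar beta;   // scales the existing output values

  template <std::size_t N>
  Fragment<Scalar, N> operator()(const Fragment<Scalar, N>& accum,
                                 const Fragment<Scalar, N>& source) const {
    Fragment<Scalar, N> out{};
    for (std::size_t i = 0; i < N; ++i) {
      out[i] = alpha * accum[i] + beta * source[i];
    }
    return out;
  }
};
```

With alpha = 2 and beta = 0.5, an accumulator value of 1 and a source value of 4 combine to 2 * 1 + 0.5 * 4 = 4.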
Implements a software-pipelined efficient GEMM.
Defines structural properties of the GEMM epilogue.
Definition: gemm_traits.h:773