typename ThreadGemmShape_,
int kScalarsPerLdgA_ = 1,
int kScalarsPerLdgB_ = 1,
bool kLaunchBounds = true>
ThreadMultiplyAdd<ThreadGemmShape_, Shape<1, 4, 8>, float, float, float>,
int kScalarsPerLdgA_ = 1,
int kScalarsPerLdgB_ = 1,
typename Index_ = int,
typename GemmConfig_ =
typename GemmEpilogueTraits_ =
GemmEpilogue<GemmEpilogueTraits_>,
int kScalarsPerLdgA_ = 1,
int kScalarsPerLdgB_ = 1,
typename Index_ = int,
typename GemmConfig_ =
typename GemmEpilogueTraits_ =
GemmEpilogue<GemmEpilogueTraits_>,
Defines iterators for efficiently loading and storing to global memory.
Defines structural properties of complete GEMM computation.
Kind
Enumeration defining fundamental contiguous layouts.
Definition: matrix_traits.h:159
Definition: sgemm_traits.h:54
Template implementing matrix multiply-add operations on fragments.
Helper to define SGEMM traits using Launch Bounds.
Definition: sgemm_traits.h:157
Implements the epilogue phase of the GEMM kernel that efficiently updates global memory with the comp...
Defines iterators for efficiently loading and storing tiles to and from shared memory.
Definition: gemm_config.h:76
A Shape implementing Layout Concept describing the dimensions of a cube.
Definition: shape.h:64
Definition: gemm_epilogue_traits.h:340
Definition: sgemm_traits.h:119
Functor to compute linear combination of fragments.
Definition: linear_scaling.h:51
Implements a software-pipelined efficient GEMM.
Defines structural properties of the GEMM epilogue.
Definition: gemm_traits.h:773
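The descriptions above mention a multiply-add on fragments and a functor that computes a linear combination of fragments. The following is a minimal host-side sketch of those two steps, not the library's implementation. The names `multiply_add_sketch` and `LinearScalingSketch` are hypothetical, fragments are modeled as plain `std::array<float, N>` values, and the outer-product accumulation and the `alpha * accum + beta * source` form are assumptions about what such components typically compute.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical per-thread multiply-add (not the library's ThreadMultiplyAdd):
// accumulates the outer product of an M-element fragment of A and an
// N-element fragment of B into a row-major M x N accumulator tile.
template <std::size_t M, std::size_t N>
void multiply_add_sketch(const std::array<float, M>& a,
                         const std::array<float, N>& b,
                         std::array<float, M * N>& accum) {
  for (std::size_t i = 0; i < M; ++i) {
    for (std::size_t j = 0; j < N; ++j) {
      accum[i * N + j] += a[i] * b[j];
    }
  }
}

// Hypothetical linear-combination functor (assumed form):
// out[i] = alpha * accum[i] + beta * source[i].
struct LinearScalingSketch {
  float alpha;
  float beta;

  template <std::size_t N>
  std::array<float, N> operator()(const std::array<float, N>& accum,
                                  const std::array<float, N>& source) const {
    std::array<float, N> out{};
    for (std::size_t i = 0; i < N; ++i) {
      out[i] = alpha * accum[i] + beta * source[i];
    }
    return out;
  }
};
```

In this sketch the multiply-add would run once per step along the K dimension, and the functor would be applied once to the finished accumulators before the result is written out, which mirrors the main-loop and epilogue split described above.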