Cutlass
CUDA Templates for Linear Algebra Subroutines and Solvers
cutlass::gemm::SharedStreamPair< StreamA_, StreamB_ > Struct Template Reference
Collect the shared-memory load streams for the A and B multiplicands.
#include <gemm_stream_pair.h>
Classes
  struct Params
      Parameters object passed to load iterators.
Public Types
  typedef StreamA_ StreamA
      Stream for A multiplicand.
  typedef StreamB_ StreamB
      Stream for B multiplicand.
  typedef ZipTensorRef< typename StreamA::TensorRef, typename StreamB::TensorRef > ThreadblockTileRef
      Shared memory allocation for threadblock-scoped GEMM tile.
Public Member Functions
  CUTLASS_DEVICE SharedStreamPair (Params const &params, ThreadblockTileRef const &threadblock_tile_ref)
      Construct with the composable structure.
  CUTLASS_DEVICE void copy (int step)
      Trigger the copies from shared memory to registers.
  CUTLASS_DEVICE void commit (int step)
      Commit the data fetched for the given step.
  CUTLASS_DEVICE StreamA::TransformedFragment const & fragment_a (int step) const
      Return the transformed fragment of A for the given step.
  CUTLASS_DEVICE StreamB::TransformedFragment const & fragment_b (int step) const
      Return the transformed fragment of B for the given step.
  CUTLASS_DEVICE void inc_stage ()
      Increment the stage.
Public Attributes
  StreamA stream_a
      The stream for A.
  StreamB stream_b
      The stream for B.
Member Typedef Documentation
  typedef StreamA_ cutlass::gemm::SharedStreamPair< StreamA_, StreamB_ >::StreamA
  typedef StreamB_ cutlass::gemm::SharedStreamPair< StreamA_, StreamB_ >::StreamB
  typedef ZipTensorRef< typename StreamA::TensorRef, typename StreamB::TensorRef > cutlass::gemm::SharedStreamPair< StreamA_, StreamB_ >::ThreadblockTileRef
Member Data Documentation
  StreamA cutlass::gemm::SharedStreamPair< StreamA_, StreamB_ >::stream_a
  StreamB cutlass::gemm::SharedStreamPair< StreamA_, StreamB_ >::stream_b