Suppose I define some arrays which are visible to the GPU:
double* doubleArr = new double[fieldLen];
float* floatArr = new float[fieldLen];
char* charArr = new char[fieldLen];
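(A side note on "visible to the GPU": memory from plain `new` is host-only, so a kernel cannot dereference these pointers directly. A minimal sketch of one way to make the arrays actually device-visible, using the CUDA runtime's managed memory; the function name is just for illustration:

```cuda
#include <cuda_runtime.h>
#include <cstddef>

void allocateGpuVisible(std::size_t fieldLen,
                        double** dArr, float** fArr, char** cArr) {
    // cudaMallocManaged returns memory accessible from both host and
    // device; cudaMalloc would give device-only memory instead.
    cudaMallocManaged(dArr, fieldLen * sizeof(double));
    cudaMallocManaged(fArr, fieldLen * sizeof(float));
    cudaMallocManaged(cArr, fieldLen * sizeof(char));
}
```

The coalescing question below is unaffected by which allocator is used, as long as the arrays live in device-accessible global memory.)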
Now, I have the following CUDA kernel, executed by each thread:
__global__ void kernel() {
int o = getOffset(...);
double d = doubleArr[threadIdx.x + o];
float f = floatArr[threadIdx.x + o];
char c = charArr[threadIdx.x + o];
}
I'm not quite sure whether I'm interpreting the documentation correctly, and it's critical for my design: will the memory accesses for double, float, and char be nicely coalesced? (My guess: yes, each load will fit into sizeof(type) * blockDim.x / (transaction size) transactions, plus maybe one extra transaction at the upper and lower boundary.)
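The transaction count in that guess can be sketched as host-side arithmetic. This is a hedged back-of-the-envelope model (not the hardware's actual coalescing logic, and the 128-byte transaction size is an assumption): a warp of contiguous threads touches `warpSize * elemSize` consecutive bytes, starting `offsetBytes` past a transaction-aligned address, and the hardware services whole aligned segments.

```cpp
#include <cassert>
#include <cstddef>

// Number of aligned memory transactions a warp needs when each of
// `warpSize` threads loads one `elemSize`-byte element contiguously,
// starting `offsetBytes` past a transaction-aligned address.
std::size_t warpTransactions(std::size_t elemSize,
                             std::size_t offsetBytes,
                             std::size_t warpSize = 32,
                             std::size_t txSize   = 128) {
    std::size_t lead  = offsetBytes % txSize;     // misalignment into the first segment
    std::size_t bytes = elemSize * warpSize;      // total bytes the warp touches
    return (lead + bytes + txSize - 1) / txSize;  // ceiling division over segments
}
```

With an aligned base, a warp of doubles needs 2 transactions (256 bytes), floats need 1, and chars need 1; a misaligned offset adds the one extra boundary transaction the guess anticipates, e.g. doubles at an 8-byte offset need 3.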
Furthermore, suppose I also have a struct:
struct char3 {
    char a;
    char b;
    char c;
};
char3* char3Arr = new char3[fieldLen];
I guess this will be padded and aligned to 32 bits, so it will consume fieldLen * 4 bytes of memory and coalesce the same way as a float?
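You can check that guess on the host. Note that a plain struct of three chars has alignment 1, so compilers add no padding and an array of them is packed; to get the 4-byte size and alignment the guess assumes, you must request it explicitly (standard C++ `alignas(4)`, or CUDA's `__align__(4)`). Also note that CUDA already defines a built-in `char3` (size 3, alignment 1) in its vector types, so the sketch uses different names to avoid the clash:

```cpp
#include <cassert>

// Plain struct of three chars: alignment 1, no implicit padding.
struct PackedChar3 {
    char a, b, c;
};

// Explicitly padded/aligned variant matching the question's guess.
struct alignas(4) AlignedChar3 {
    char a, b, c;  // one byte of tail padding brings sizeof to 4
};

static_assert(sizeof(PackedChar3) == 3, "no implicit padding");
static_assert(alignof(PackedChar3) == 1, "char alignment is 1");
static_assert(sizeof(AlignedChar3) == 4, "padded to a multiple of 4");
```

Only the aligned variant consumes fieldLen * 4 bytes and loads like a 32-bit word; the packed one stays at fieldLen * 3 bytes.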