Wednesday, July 1, 2015

Double-checking understanding of memory coalescing in CUDA

Suppose I define some arrays which are visible to the GPU:

// (assume these allocations are made device-visible, e.g. via cudaMallocManaged)
double* doubleArr = new double[fieldLen];
float* floatArr = new float[fieldLen];
char* charArr = new char[fieldLen];

Now, each CUDA thread executes the following kernel code:

__global__ void thread(){
  int o = getOffset(...);  // some per-block offset
  double d = doubleArr[threadIdx.x + o];
  float f = floatArr[threadIdx.x + o];
  char c = charArr[threadIdx.x + o];
}

I'm not quite sure whether I'm interpreting the documentation correctly, and it's critical for my design: will the memory accesses for double, float and char be nicely coalesced? (My guess: yes, each load should fit into sizeof(type) * blockDim.x / (transaction size) transactions, plus perhaps one extra transaction at the upper and lower boundary if the base address is misaligned.)

Furthermore, suppose I also have a struct:

struct char3{
  char a;
  char b;
  char c;
};

char3* char3Arr = new char3[fieldLen];

I guess this will be padded and aligned to 32 bits, so that it consumes fieldLen * 4 bytes in memory and coalesces the same way as a float array?
