Features and Limitations

Now that we have discussed the core language features of OKL, this is a good time to cover some features and limitations that might not be apparent at first.

Multiple Outer Loops

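A kernel can contain multiple outer loops, each with a different number of iterations. In the example below, the first outer loop steps through the data in groups of 64 and the second in groups of 32: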

@kernel void addVectors(int N,
                        int *a,
                        int *b,
                        int *ab) {
  // Update 'ab'
  for (int group = 0; group < N; group += 64; outer) {
    for (int i = group; i < (group + 64); ++i; inner) {
      if (i < N) {
        ab[i] = a[i] + b[i];
      }
    }
  }
  // Copy over 'ab' to 'b' shifted by 1
  for (int group = 0; group < N; group += 32; outer) {
    for (int i = group; i < (group + 32); ++i; inner) {
      if (0 < i && i < N) {
        b[i - 1] = ab[i];
      }
    }
  }
}
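As a way to check the kernel's results, the following is a minimal serial sketch in plain C of what addVectors computes. It keeps the same group sizes and bounds checks, and it assumes the second outer loop starts only after the first one has finished. The name add_vectors_reference is hypothetical and not part of OCCA.

```c
/* Serial reference for the addVectors kernel (hypothetical helper, not an OCCA API).
   Assumption: the second outer loop runs only after the first has completed. */
void add_vectors_reference(int N, const int *a, int *b, int *ab) {
  /* First outer loop: update 'ab' in groups of 64 */
  for (int group = 0; group < N; group += 64) {
    for (int i = group; i < (group + 64); ++i) {
      if (i < N) {
        ab[i] = a[i] + b[i];
      }
    }
  }
  /* Second outer loop: copy 'ab' to 'b' shifted by 1, in groups of 32 */
  for (int group = 0; group < N; group += 32) {
    for (int i = group; i < (group + 32); ++i) {
      if (0 < i && i < N) {
        b[i - 1] = ab[i];
      }
    }
  }
}
```

Because the group sizes do not need to divide N, the bounds checks inside the inner loops are what keep the last group from reading or writing past the end of the arrays.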

Multiple Inner Loops

An outer loop can contain multiple inner loops, with the restriction that every inner loop runs the same number of iterations.

The following code is valid because both inner loops run 32 iterations:

for (int group = 0; group < N; group += 32; outer) {
  for (int i = group; i < (group + 32); ++i; inner) {
    // Work
  }
  for (int i = 0; i < 32; ++i; inner) {
    // Work
  }
}

The following code is invalid because the first inner loop runs 32 iterations while the second runs only 16:

for (int group = 0; group < N; group += 32; outer) {
  for (int i = group; i < (group + 32); ++i; inner) {
    // Work
  }
  for (int i = group; i < (group + 32); i += 2; inner) {
    // Work
  }
}
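The iteration counts quoted above follow from simple arithmetic on the loop bounds and stride. The sketch below, in plain C, computes the count for a loop of the form for (i = start; i < end; i += step), assuming a positive step. The helper names trip_count and same_trip_count are hypothetical and only illustrate the check; they are not part of OKL.

```c
/* Number of iterations of: for (i = start; i < end; i += step)
   Assumes step > 0. Hypothetical helper for illustration only. */
int trip_count(int start, int end, int step) {
  if (end <= start) {
    return 0;
  }
  /* Round up so a partial final stride still counts as one iteration */
  return (end - start + step - 1) / step;
}

/* Returns 1 when two inner loops would run the same number of iterations */
int same_trip_count(int start_a, int end_a, int step_a,
                    int start_b, int end_b, int step_b) {
  return trip_count(start_a, end_a, step_a) ==
         trip_count(start_b, end_b, step_b);
}
```

For a group starting at 64, the loop from 64 to 96 with stride 1 and the loop from 0 to 32 with stride 1 both give 32, so they may share an outer loop. Changing the stride to 2 gives 16, which is why the second example is rejected.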

Variable Declarations

We introduced shared and exclusive variables because inner-loop iterations do not guarantee any execution order. Variable declarations are subject to the following additional restrictions:

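  • Variables defined inside a kernel but outside of outer loops must be constant.
@kernel void foo() {
  // Allowed: a constant declared outside of the outer loops
  const int X = 20;
  for (int o = 0; o < X; ++o; outer) {
    for (int i = 0; i < 32; ++i; inner) {
      // Work
    }
  }
}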
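  • Variables defined inside an outer loop but outside of inner loops must be constant, shared, or exclusive.
@kernel void foo() {
  for (int o = 0; o < 10; ++o; outer) {
    const int X = 10;  // constant
    shared int Y[5];   // shared by the inner-loop iterations
    exclusive int Z;   // one value per inner-loop iteration
    for (int i = 0; i < 5; ++i; inner) {
      // Work
    }
  }
}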
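One way to picture why these declarations are restricted is to write out a single outer-loop iteration serially. The plain C sketch below is only a mental model, under the assumption that a shared array can be pictured as one array visible to all inner iterations and an exclusive variable as one slot per inner iteration. The names INNER and foo_outer_iteration are hypothetical, and this is not a description of how OCCA implements either qualifier.

```c
#define INNER 4  /* hypothetical inner-loop size */

/* Serial picture of one outer-loop iteration (illustration only) */
void foo_outer_iteration(int *out) {
  const int X = 10;  /* constant: the same value for every inner iteration */
  int Y[INNER];      /* stands in for 'shared int Y[INNER]' */
  int Z[INNER];      /* stands in for 'exclusive int Z': one slot per iteration */

  /* First inner loop: each iteration sets its own Z and one entry of Y */
  for (int i = 0; i < INNER; ++i) {
    Z[i] = i;
    Y[i] = X * Z[i];
  }

  /* Second inner loop: each iteration reads a neighbor's Y entry and its own Z */
  for (int i = 0; i < INNER; ++i) {
    out[i] = Y[(i + 1) % INNER] + Z[i];
  }
}
```

An ordinary non-constant variable declared between the outer and inner loops would have no such well-defined meaning, since the inner iterations may run in any order.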