GSoC 2021 3rd Entry
This is the 3rd entry on my CuPy backend for QuTiP project. There have been some developments since the last entry: some paths have bifurcated, others have been pruned, and most of them have lengthened as the project gathers pace.
CuPyDense class
There are currently 2 open PRs involving this class. The first one covers the base implementation of CuPyDense, that is, the methods required by QuTiP’s Data class. The second one covers the methods inherent to the Dense class and some handy constructors (zeros matrix, diagonal matrix, etc.).
Precision
After some minor benchmarking, as well as some reading, calculation precision seems to play a major role in run times. The latest generations of Nvidia GPUs have introduced Tensor Cores, which benefit float32 and float16 operations but not float64; you can read more here.
As of late, due to the success of Automatic Mixed Precision training in deep learning (which works out of the box in the two major DL libraries), there have been more studies on how to leverage Tensor Cores for scientific calculations with half-precision arithmetic [1]. Some of these [2–3] deal with how to improve calculations on complex types, which are of particular importance to us, and there are surely more to come.
Adding the precision as an attribute of our CuPyDense class will let us take advantage of these developments.
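To give a feel for what a precision attribute trades off, here is a minimal CPU-only sketch using just the standard library. `TinyDense` and `_TYPECODES` are hypothetical names invented for this illustration; this is not the CuPyDense implementation, and the standard `array` module stands in for a GPU array.

```python
from array import array

# Hypothetical mapping from a precision name to an `array` typecode.
_TYPECODES = {"float32": "f", "float64": "d"}


class TinyDense:
    """Toy container whose storage width is chosen by a precision attribute."""

    def __init__(self, values, precision="float64"):
        if precision not in _TYPECODES:
            raise ValueError(f"unsupported precision: {precision!r}")
        self.precision = precision
        # Values are rounded to the chosen width when stored.
        self._buf = array(_TYPECODES[precision], values)

    @property
    def nbytes(self):
        # Total storage in bytes: item width times number of items.
        return self._buf.itemsize * len(self._buf)

    def __getitem__(self, index):
        return self._buf[index]


single = TinyDense([0.1, 0.2, 0.3], precision="float32")
double = TinyDense([0.1, 0.2, 0.3], precision="float64")
```

The single-precision copy uses half the memory, at the price of a small rounding error on values such as 0.1 that double precision stores more accurately.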
Avoiding CuPy calls on internal methods
During benchmarking I also realized that the aggressive calls to cupy.array I was making at initialization, inside the CuPyDense internal methods, were slowing down the code. I needed to bypass some parts of the initialization.
In other programming languages the class initializer or constructor can be overloaded naturally, by writing different versions that take differently typed arguments as inputs. One would have a constructor that, on being called on an instance of class CuPyDense, would just bypass all the checking and the conversion with cupy.array. I grappled with 2 different ways of achieving a similar result in Python.
With a specialized constructor, one just calls __new__ to create the instance, then manually assigns the attributes and calls super to initialize the parent classes. The programmer needs to be confident that what is being passed to this constructor complies with the requirements of the CuPyDense class.
@classmethod
def _no_checks_constructor(cls, data):
    # This constructor should only be called when we are sure the argument
    # is a CuPy 2-dimensional array.
    out = cls.__new__(cls)
    super(cls, out).__init__(data.shape)
    out._cp = data
    out.dtype = data.dtype
    return out

def conj(self):
    return CuPyDense._no_checks_constructor(self._cp.conj())
One issue with this approach is that, if the inheritance graph is complex, properly passing arguments to each parent class becomes cumbersome and difficult to track. Another is that one should make an effort to hide the new constructor from the user in order to avoid undesired results; here we declared it as a classmethod and used a leading underscore.
Another way to get the same result as overloading constructors in a strongly typed language is either to call the isinstance function and do manual type checking, or to use metaclasses, just like here.
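The specialized-constructor pattern can be tried without a GPU. The sketch below is a toy under stated assumptions: `Data` is a minimal stand-in for QuTiP's base class (assumed here to only store a shape), `ToyDense` and `init_calls` are hypothetical names, and a nested list replaces the CuPy array.

```python
class Data:
    """Minimal stand-in for the parent class; assumed to only store a shape."""

    def __init__(self, shape):
        self.shape = shape


class ToyDense(Data):
    """Toy version of the pattern; a nested list replaces the CuPy array."""

    init_calls = 0  # counts how often the checked __init__ path runs

    def __init__(self, rows):
        # Public path: convert and validate the input (the "expensive" part).
        ToyDense.init_calls += 1
        rows = [[complex(z) for z in row] for row in rows]
        if not rows or any(len(row) != len(rows[0]) for row in rows):
            raise ValueError("expected a non-empty rectangular 2-D input")
        super().__init__((len(rows), len(rows[0])))
        self._rows = rows

    @classmethod
    def _no_checks_constructor(cls, rows):
        # Trusted path: the caller guarantees `rows` is already valid.
        out = cls.__new__(cls)  # allocate without running __init__
        super(ToyDense, out).__init__((len(rows), len(rows[0])))
        out._rows = rows
        return out

    def conj(self):
        conjugated = [[z.conjugate() for z in row] for row in self._rows]
        return ToyDense._no_checks_constructor(conjugated)


m = ToyDense([[1 + 2j, 3], [4, 5 - 1j]])
c = m.conj()
```

After running this, `ToyDense.init_calls` is still 1: the result of `conj` is a fully formed `ToyDense`, but the checked initializer ran only for `m`.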
class MetaInit(type):
    def __call__(cls, *args, **kwargs):
        # With arguments: normal construction through type.__call__.
        if args or kwargs:
            return super().__call__(*args, **kwargs)
        # Without arguments: return an uninitialized instance.
        return cls.__new__(cls)

# Base classes must come before the metaclass keyword argument.
class CuPyDense(data.Data, metaclass=MetaInit):
    def __init__(self, data, shape=None, copy=True, dtype=cp.complex128):
        ................

    ...............

    def conj(self):
        out = CuPyDense()
        out._cp = self._cp.conj()
        super(CuPyDense, out).__init__(out._cp.shape)
        out.dtype = out._cp.dtype
        return out
Notice that calling CuPyDense(...) gets dispatched to the metaclass __call__ method first, and if there are no arguments no initialization takes place; just a new uninitialized object is returned. If some arguments were passed, construction is delegated to type.__call__, which calls CuPyDense.__new__ and then CuPyDense.__init__. Understanding this properly requires knowing how the whole instantiation process of a class takes place. As you can see, using metaclasses is a much more obscure way of getting the same result, and one that is vehemently advised against, so we are going with the custom initializer option.
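The call order described above can be checked with a small trace. This is only an illustrative sketch: `Toy` and the `calls` list are hypothetical names, and `Toy` has no parent other than `object`.

```python
calls = []  # records which hooks run, in order


class MetaInit(type):
    def __call__(cls, *args, **kwargs):
        calls.append("MetaInit.__call__")
        if args or kwargs:
            # type.__call__ runs cls.__new__ and then cls.__init__.
            return super().__call__(*args, **kwargs)
        # No arguments: allocate only, skip __init__ entirely.
        return cls.__new__(cls)


class Toy(metaclass=MetaInit):
    def __new__(cls, *args, **kwargs):
        calls.append("Toy.__new__")
        return super().__new__(cls)

    def __init__(self, value):
        calls.append("Toy.__init__")
        self.value = value


full = Toy(42)  # goes through __new__ and __init__
with_args = list(calls)

calls.clear()
bare = Toy()  # goes through __new__ only
without_args = list(calls)
```

Here `with_args` lists all three hooks, while `without_args` stops after `Toy.__new__`, so `bare` has no `value` attribute until someone sets it by hand. That silent difference is part of what makes the metaclass route harder to follow.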
GPU and CI
As you may remember from my last entry, there were some issues making tests run on GitHub Actions, mostly because we were trying to install CuPy without a GPU, plus some issues setting up a cudatoolkit environment. After many tries, and with the advice of CuPy core developer Leo Fang, we decided to add a self-hosted GPU to our GitHub Actions environment. The task was carried out mostly by my mentor, Simon Cross, who got us an AWS instance and the necessary connections working. There was one minor hiccup with the original AWS AMI: it so happens that Ubuntu 18.04 comes with a pre-installed git version that is not compatible with some of the magic of GitHub Actions, but switching to an AMI with Ubuntu 20.04 and Nvidia drivers solved the problem.
Till the next entry.
References
[1] A. Abdelfattah et al., “A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic,” 2020. https://arxiv.org/pdf/2007.06674.pdf
[2] A. Abdelfattah, S. Tomov, and J. Dongarra, “Towards half-precision computation for complex matrices: A case study for mixed-precision solvers on GPUs,” in 2019 IEEE/ACM 10th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA), 2019, pp. 17–24. https://doi.org/10.1109/ScalA49573.2019.00008
[3] A. Abdelfattah, S. Tomov, and J. Dongarra, “Matrix multiplication on batches of small matrices in half and half-complex precisions,” J. Parallel Distrib. Comput., vol. 145, 2020, pp. 188–201.