---
title: Writing a Convolutional Neural Network library with CUDA Support
draft: true
---

"Just use cuBLAS, it'll be easier. You don't have to implement custom CUDA kernels," they said. Actually, no one said that. I just assumed it because I didn't do enough research.

Why not combine multiple challenging things into one project (C++, CMake, CUDA, CNNs)?

I quickly discovered that without writing custom kernels, you can't really make progress.

- cuBLAS column-major layout, macro
- CMake woes (FindCUDA)
- Google Test
- padding kernel
- column-major / row-major headache
- removing cuBLAS -> just row-major representation
- naive conv2d
- learning 3D memory representation
- optimizing conv2d
- softmax sum reduce
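
The column-major / row-major headache in the outline above mostly comes down to index arithmetic: cuBLAS assumes matrices are stored column by column, while plain C++ arrays are usually laid out row by row. Here is a minimal sketch of the two index formulas and a conversion between them. The function names (`row_major_index`, `col_major_index`, `to_col_major`) are hypothetical and chosen for illustration, not taken from the library itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Offset of element (row, col) in a matrix stored row by row.
// Hypothetical helper: only the number of columns is needed.
std::size_t row_major_index(std::size_t row, std::size_t col, std::size_t cols) {
    return row * cols + col;
}

// Offset of the same element when the matrix is stored column by column,
// which is the layout cuBLAS expects. Only the number of rows is needed.
std::size_t col_major_index(std::size_t row, std::size_t col, std::size_t rows) {
    return col * rows + row;
}

// Copy a row-major buffer into a new column-major buffer.
std::vector<float> to_col_major(const std::vector<float>& src,
                                std::size_t rows, std::size_t cols) {
    assert(src.size() == rows * cols);  // buffer must hold exactly rows * cols values
    std::vector<float> dst(src.size());
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            dst[col_major_index(r, c, rows)] = src[row_major_index(r, c, cols)];
        }
    }
    return dst;
}
```

For the 2 x 3 matrix `[[1, 2, 3], [4, 5, 6]]`, stored row-major as `1 2 3 4 5 6`, `to_col_major(a, 2, 3)` returns `1 4 2 5 3 6`. Keeping everything row-major, as the outline ends up doing after removing cuBLAS, avoids this conversion entirely.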