---
title: Writing a Convolutional Neural Network library with CUDA Support
draft: true
---

Straightforward project, learned a lot more than I expected.

"Just use cuBLAS, it'll be easier. You don't have to implement custom CUDA kernels," they said. Actually, no one said that. I just thought that because I didn't do enough research.

Why not combine multiple challenging things into one (C++, CMake, CUDA, CNN)?

I quickly discovered that without writing custom kernels, you can't really make progress.

- cuBLAS column major layout, macro
- CMake woes (FindCUDA)
- GoogleTest
- padding kernel
- column major / row major headache
- removing cuBLAS -> just row major representation
- naive conv2d
- learning 3D memory representation
- optimizing conv2d
- softmax sum reduce
- softmax numerical stability
- max reduce
- custom binary weights file
- (safetensors - json parser vs csv) values overwritten by header
- tests passing -> implement AlexNet
- AlexNet CMake, OpenCV
- AlexNet crashing -> add CUDA error checking to tests -> test crashing
- compute-sanitizer memcheck
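
The softmax items in the outline (sum reduce, numerical stability, max reduce) fit together in a way that is easier to see in code. Below is a minimal CPU-side sketch in plain C++, not the library's actual kernel. The name `softmax_reference` is hypothetical and made up for illustration. It assumes the usual stabilization trick: subtract the maximum before exponentiating, so `exp` never receives a large positive argument.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for a numerically stable softmax.
// Illustrative only: this is not the library's API or its CUDA kernel.
std::vector<float> softmax_reference(const std::vector<float>& logits) {
    std::vector<float> out(logits.size());
    if (logits.empty()) {
        return out;
    }

    // Step 1 (max reduce): find the largest input value.
    const float max_val = *std::max_element(logits.begin(), logits.end());

    // Step 2 (sum reduce): exponentiate the shifted values and accumulate.
    // Every exponent is <= 0, so exp() stays in the range (0, 1].
    float sum = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        out[i] = std::exp(logits[i] - max_val);
        sum += out[i];
    }

    // Step 3: normalize so the outputs add up to 1.
    for (float& v : out) {
        v /= sum;
    }
    return out;
}
```

Without the shift, inputs around 1000 would overflow `exp` in single precision and produce `inf / inf`. On the GPU, steps 1 and 2 are the two reductions; a reference like this can serve as the expected value in a unit test for the kernel.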