---
title: Writing a Convolutional Neural Network library with CUDA Support
draft: true
---

Straightforward project, learned a lot more than I expected.

"Just use cuBLAS, it'll be easier. You don't have to implement custom CUDA kernels.", they said. Actually, noone said that. I just thought that because I didn't do enough research.

Why not combine multiple challenging things into one (C++, CMake, CUDA, CNNs)?

I quickly discovered that without writing custom kernels, you can't really make progress:

- cuBLAS column-major layout, macro (sketch after this list)
- CMake woes (FindCUDA)
- Google Test
- padding kernel (sketch below)
- column-major / row-major headache
- removing cuBLAS -> just a row-major representation (indexing sketch below)
- naive conv2d (sketch below)
- learning 3D memory representation
- optimizing conv2d
- softmax sum reduce
- softmax numerical stability - max reduce (softmax sketch below)
- custom binary weights file - (safetensors - JSON parser vs CSV) - values overwritten by the header
- tests passing -> implement AlexNet
- AlexNet CMake, OpenCV
- AlexNet crashing -> add CUDA error checking to tests (macro below) -> tests crashing
- compute-sanitizer memcheck
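
A few of those bullets are easier to picture with a small sketch. First, the cuBLAS column-major layout: cuBLAS assumes Fortran-style column-major storage, so a C++ codebase usually hides the transposed indexing behind a macro. This is the shape of the IDX2C macro from NVIDIA's cuBLAS examples; whether my code kept that exact name doesn't matter.

```cuda
// Column-major offset of element (row, col) in a matrix whose leading
// dimension is ld (the number of rows for a packed column-major matrix).
// Mirrors the IDX2C macro used in NVIDIA's cuBLAS example code.
#define IDX2C(row, col, ld) (((col) * (ld)) + (row))
```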
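
The padding kernel is conceptually simple: write zeros everywhere, copy the input into the interior. A minimal single-channel sketch (names and layout are illustrative, not the library's actual API):

```cuda
// Zero-pad an h x w map into an (h + 2*pad) x (w + 2*pad) buffer,
// one thread per element of the padded output.
__global__ void pad2d(const float* in, float* out, int h, int w, int pad) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int out_w = w + 2 * pad;
    int out_h = h + 2 * pad;
    if (x >= out_w || y >= out_h) return;

    int in_x = x - pad;
    int in_y = y - pad;
    bool inside = in_x >= 0 && in_x < w && in_y >= 0 && in_y < h;
    out[y * out_w + x] = inside ? in[in_y * w + in_x] : 0.0f;
}
```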
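
Dropping cuBLAS meant everything could live in one plain row-major layout, and the 3D memory representation reduces to a single offset computation like this (an illustrative helper, not necessarily what the code calls it):

```cuda
// Row-major offset of element (c, y, x) in a contiguous C x H x W tensor.
__host__ __device__ inline int idx3(int c, int y, int x, int H, int W) {
    return (c * H + y) * W + x;
}
```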
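
The naive conv2d is the classic one-thread-per-output-pixel kernel. A single-channel, stride-1, no-padding sketch under those assumptions:

```cuda
// Naive 2D convolution (really cross-correlation, as in most CNN code):
// one thread per output element, single channel, stride 1, no padding.
// Output size is (in_h - k + 1) x (in_w - k + 1).
__global__ void conv2d_naive(const float* in, const float* weights, float* out,
                             int in_w, int k, int out_h, int out_w) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= out_w || y >= out_h) return;

    float acc = 0.0f;
    for (int ky = 0; ky < k; ++ky)
        for (int kx = 0; kx < k; ++kx)
            acc += in[(y + ky) * in_w + (x + kx)] * weights[ky * k + kx];
    out[y * out_w + x] = acc;
}
```

Every thread re-reads overlapping input, which is exactly why the "optimizing conv2d" bullet exists: shared-memory tiling and better coalescing are the usual next steps.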
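
The two softmax bullets go together: a sum reduction gives you the denominator, and a max reduction keeps expf from overflowing, since softmax(x) equals softmax(x - max(x)). A sketch with one block per row and a shared-memory reduction (assumes blockDim.x is a power of two and shared memory sized to blockDim.x floats):

```cuda
#include <math.h>  // INFINITY; expf/fmaxf are CUDA device built-ins

// Numerically stable softmax, one block per row:
// 1) reduce to find the row max, 2) sum exp(x - max), 3) normalize.
__global__ void softmax_rows(const float* in, float* out, int row_len) {
    extern __shared__ float shm[];
    const float* row_in = in + blockIdx.x * row_len;
    float* row_out = out + blockIdx.x * row_len;
    int tid = threadIdx.x;

    // Max reduce.
    float m = -INFINITY;
    for (int i = tid; i < row_len; i += blockDim.x)
        m = fmaxf(m, row_in[i]);
    shm[tid] = m;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) shm[tid] = fmaxf(shm[tid], shm[tid + s]);
        __syncthreads();
    }
    float row_max = shm[0];
    __syncthreads();

    // Sum reduce of the shifted exponentials.
    float sum = 0.0f;
    for (int i = tid; i < row_len; i += blockDim.x)
        sum += expf(row_in[i] - row_max);
    shm[tid] = sum;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) shm[tid] += shm[tid + s];
        __syncthreads();
    }
    float row_sum = shm[0];

    // Normalize.
    for (int i = tid; i < row_len; i += blockDim.x)
        row_out[i] = expf(row_in[i] - row_max) / row_sum;
}
```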
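
The last bullets are about debugging: the AlexNet port crashed, and adding CUDA error checking to the tests revealed they were failing too. A common form of such a macro (its exact name in my library is beside the point):

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Report a failing CUDA runtime call at the call site instead of letting
// the error surface later as an unrelated crash.
#define CUDA_CHECK(call)                                                      \
    do {                                                                      \
        cudaError_t err_ = (call);                                            \
        if (err_ != cudaSuccess) {                                            \
            std::fprintf(stderr, "CUDA error '%s' at %s:%d\n",                \
                         cudaGetErrorString(err_), __FILE__, __LINE__);       \
            std::exit(EXIT_FAILURE);                                          \
        }                                                                     \
    } while (0)

// Usage: CUDA_CHECK(cudaMemcpy(dst, src, bytes, cudaMemcpyHostToDevice));
// Kernel launches return nothing, so check them separately:
//   kernel<<<grid, block>>>(...);
//   CUDA_CHECK(cudaGetLastError());
```

From there, compute-sanitizer's memcheck tool is the standard way to pin down the out-of-bounds access behind that kind of crash.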