---
title: Writing a Convolutional Neural Network library with CUDA Support
draft: true
---

"Just use cuBLAS, it'll be easier. You don't have to implement custom CUDA kernels," they said. Actually, no one said that. I just assumed it because I didn't do enough research.

Why not combine multiple challenging things into one project (C++, CMake, CUDA, CNNs)?

I quickly discovered that without writing custom kernels, you can't really make progress.

  • cuBLAS column-major layout, macro
  • CMake woes (FindCUDA)
  • GoogleTest
  • padding kernel
  • column-major / row-major headache
  • removing cuBLAS -> just row-major representation
  • naive conv2d
  • learning 3D memory representation
  • optimizing conv2d
  • softmax sum reduce
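
The column-major / row-major headache from the list above is easier to see in code. Here is a minimal CPU-side sketch, not the library's actual implementation: the function names `idx_row_major`, `idx_col_major`, and `to_col_major` are made up for illustration. The column-major formula is the same idea as the `IDX2C(i, j, ld)` macro shown in the cuBLAS documentation, though whether the macro mentioned in the list looked like this is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major: the elements of one row sit next to each other in memory
// (the usual C/C++ convention).
inline std::size_t idx_row_major(std::size_t row, std::size_t col,
                                 std::size_t num_cols) {
    return row * num_cols + col;
}

// Column-major: the elements of one column sit next to each other in memory
// (the layout cuBLAS expects). Equivalent in spirit to IDX2C(i, j, ld)
// from the cuBLAS docs, with ld == num_rows.
inline std::size_t idx_col_major(std::size_t row, std::size_t col,
                                 std::size_t num_rows) {
    return col * num_rows + row;
}

// Hypothetical helper: copy a row-major matrix into column-major storage.
std::vector<float> to_col_major(const std::vector<float>& src,
                                std::size_t rows, std::size_t cols) {
    assert(src.size() == rows * cols);  // caller must pass a full matrix
    std::vector<float> dst(src.size());
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            dst[idx_col_major(r, c, rows)] = src[idx_row_major(r, c, cols)];
        }
    }
    return dst;
}
```

For a 2x3 matrix stored row-major as `{1, 2, 3, 4, 5, 6}`, `to_col_major(m, 2, 3)` stores the same matrix as `{1, 4, 2, 5, 3, 6}`. Having to keep both conventions in your head for every kernel is the kind of friction that makes a single row-major representation attractive.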