namesny-com/cuda_net.md at b8a160bac80ef941337942ff84e8817af85b4c30

705 B

| title | draft |
| --- | --- |
| Writing a Convolutional Neural Network library with CUDA Support | true |

"Just use cuBLAS, it'll be easier. You don't have to implement custom CUDA kernels," they said. Actually, no one said that. I just assumed it because I hadn't done enough research.

Why not combine multiple challenging things into one project (C++, CMake, CUDA, CNNs)?

I quickly discovered that without writing custom kernels, you can't really make progress.

cuBLAS column major layout, macro
CMake woes (FindCUDA)
GoogleTest
padding kernel
column major / row major headache
removing cuBLAS -> just row major representation
naive conv2d
learning 3D memory representation
optimizing conv2d
softmax sum reduce
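
The column-major / row-major headache in the notes above comes down to a single index formula. Here is a minimal sketch in plain C++ (no GPU needed), assuming the macro in question looks like the `IDX2C` helper shown in the cuBLAS documentation. `IDX2R` and `to_column_major` are hypothetical names made up for illustration; they are not part of cuBLAS and not necessarily what this project ended up using.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Column-major index in the style of the IDX2C macro from the cuBLAS docs:
// element (row i, column j), where ld is the leading dimension (number of rows).
#define IDX2C(i, j, ld) (((j) * (ld)) + (i))

// Row-major counterpart (hypothetical name, not a cuBLAS macro):
// here ld is the number of columns.
#define IDX2R(i, j, ld) (((i) * (ld)) + (j))

// Copy a rows x cols matrix from row-major storage into column-major storage,
// which is the layout cuBLAS routines expect.
std::vector<float> to_column_major(const std::vector<float>& row_major,
                                   std::size_t rows, std::size_t cols) {
    assert(row_major.size() == rows * cols);
    std::vector<float> col_major(rows * cols);
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            col_major[IDX2C(i, j, rows)] = row_major[IDX2R(i, j, cols)];
        }
    }
    return col_major;
}
```

For a 2x3 matrix stored row by row as `{1, 2, 3, 4, 5, 6}`, the column-major copy walks down each column first. Keeping a single layout everywhere (the "just row major" item above) makes this kind of conversion unnecessary.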
