---
title: Writing a Convolutional Neural Network library with CUDA Support
draft: true
---

Straightforward project, learned a lot more than I expected.

"Just use cuBLAS, it'll be easier. You don't have to implement custom CUDA kernels.", they said. Actually, noone said that. I just thought that because I didn't do enough research.

Why not combine multiple challenging things into one (C++, CMake, CUDA, CNNs)?

I quickly discovered that without writing custom kernels, you can't really make progress:

- cuBLAS column-major layout, macro (sketch after this list)
- CMake woes (FindCUDA)
- Google Test
- padding kernel (sketch below)
- column-major / row-major headache
- removing cuBLAS -> just a row-major representation (indexing sketch below)
- naive conv2d (sketch below)
- learning 3D memory representation
- optimizing conv2d
- softmax sum reduce
- softmax numerical stability - max reduce (softmax sketch below)
- custom binary weights file - (safetensors - JSON parser vs CSV) - values overwritten by the header
- tests passing -> implement AlexNet
- AlexNet CMake, OpenCV
- AlexNet crashing -> add CUDA error checking to tests (macro below) -> tests crashing
- compute-sanitizer memcheck
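
A few of those bullets are easier to picture with a small sketch. First, the cuBLAS column-major layout: cuBLAS assumes Fortran-style column-major storage, so a C++ codebase usually hides the transposed indexing behind a macro. This is the shape of the IDX2C macro from NVIDIA's cuBLAS examples; whether my code kept that exact name doesn't matter.

```cuda
// Column-major offset of element (row, col) in a matrix whose leading
// dimension is ld (the number of rows for a packed column-major matrix).
// Mirrors the IDX2C macro used in NVIDIA's cuBLAS example code.
#define IDX2C(row, col, ld) (((col) * (ld)) + (row))
```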
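
The padding kernel is conceptually simple: write zeros everywhere, copy the input into the interior. A minimal single-channel sketch (names and layout are illustrative, not the library's actual API):

```cuda
// Zero-pad an h x w map into an (h + 2*pad) x (w + 2*pad) buffer,
// one thread per element of the padded output.
__global__ void pad2d(const float* in, float* out, int h, int w, int pad) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int out_w = w + 2 * pad;
    int out_h = h + 2 * pad;
    if (x >= out_w || y >= out_h) return;

    int in_x = x - pad;
    int in_y = y - pad;
    bool inside = in_x >= 0 && in_x < w && in_y >= 0 && in_y < h;
    out[y * out_w + x] = inside ? in[in_y * w + in_x] : 0.0f;
}
```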
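
Dropping cuBLAS meant everything could live in one plain row-major layout, and the 3D memory representation reduces to a single offset computation like this (an illustrative helper, not necessarily what the code calls it):

```cuda
// Row-major offset of element (c, y, x) in a contiguous C x H x W tensor.
__host__ __device__ inline int idx3(int c, int y, int x, int H, int W) {
    return (c * H + y) * W + x;
}
```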
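
The naive conv2d is the classic one-thread-per-output-pixel kernel. A single-channel, stride-1, no-padding sketch under those assumptions:

```cuda
// Naive 2D convolution (really cross-correlation, as in most CNN code):
// one thread per output element, single channel, stride 1, no padding.
// Output size is (in_h - k + 1) x (in_w - k + 1).
__global__ void conv2d_naive(const float* in, const float* weights, float* out,
                             int in_w, int k, int out_h, int out_w) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= out_w || y >= out_h) return;

    float acc = 0.0f;
    for (int ky = 0; ky < k; ++ky)
        for (int kx = 0; kx < k; ++kx)
            acc += in[(y + ky) * in_w + (x + kx)] * weights[ky * k + kx];
    out[y * out_w + x] = acc;
}
```

Every thread re-reads overlapping input, which is exactly why the "optimizing conv2d" bullet exists: shared-memory tiling and better coalescing are the usual next steps.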
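
The two softmax bullets go together: a sum reduction gives you the denominator, and a max reduction keeps expf from overflowing, since softmax(x) equals softmax(x - max(x)). A sketch with one block per row and a shared-memory reduction (assumes blockDim.x is a power of two and shared memory sized to blockDim.x floats):

```cuda
#include <math.h>  // INFINITY; expf/fmaxf are CUDA device built-ins

// Numerically stable softmax, one block per row:
// 1) reduce to find the row max, 2) sum exp(x - max), 3) normalize.
__global__ void softmax_rows(const float* in, float* out, int row_len) {
    extern __shared__ float shm[];
    const float* row_in = in + blockIdx.x * row_len;
    float* row_out = out + blockIdx.x * row_len;
    int tid = threadIdx.x;

    // Max reduce.
    float m = -INFINITY;
    for (int i = tid; i < row_len; i += blockDim.x)
        m = fmaxf(m, row_in[i]);
    shm[tid] = m;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) shm[tid] = fmaxf(shm[tid], shm[tid + s]);
        __syncthreads();
    }
    float row_max = shm[0];
    __syncthreads();

    // Sum reduce of the shifted exponentials.
    float sum = 0.0f;
    for (int i = tid; i < row_len; i += blockDim.x)
        sum += expf(row_in[i] - row_max);
    shm[tid] = sum;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) shm[tid] += shm[tid + s];
        __syncthreads();
    }
    float row_sum = shm[0];

    // Normalize.
    for (int i = tid; i < row_len; i += blockDim.x)
        row_out[i] = expf(row_in[i] - row_max) / row_sum;
}
```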
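
The last bullets are about debugging: the AlexNet port crashed, and adding CUDA error checking to the tests revealed they were failing too. A common form of such a macro (its exact name in my library is beside the point):

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Report a failing CUDA runtime call at the call site instead of letting
// the error surface later as an unrelated crash.
#define CUDA_CHECK(call)                                                      \
    do {                                                                      \
        cudaError_t err_ = (call);                                            \
        if (err_ != cudaSuccess) {                                            \
            std::fprintf(stderr, "CUDA error '%s' at %s:%d\n",                \
                         cudaGetErrorString(err_), __FILE__, __LINE__);       \
            std::exit(EXIT_FAILURE);                                          \
        }                                                                     \
    } while (0)

// Usage: CUDA_CHECK(cudaMemcpy(dst, src, bytes, cudaMemcpyHostToDevice));
// Kernel launches return nothing, so check them separately:
//   kernel<<<grid, block>>>(...);
//   CUDA_CHECK(cudaGetLastError());
```

From there, compute-sanitizer's memcheck tool is the standard way to pin down the out-of-bounds access behind that kind of crash.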