Completely off track...
MATLAB has a Deep Learning toolbox as well as a (required, at least for GPU training) Parallel Computing toolbox, but even for students they cost money. Until I get a better graphics card, I will set it aside...
What's the point? Well, you can model your deep learning project by stacking various CNN layers in series and learn things like digit recognition. The real point is this: the toolboxes will solve the math on a CPU or on NVIDIA graphics cards with CUDA cores. The fancier the graphics card, the more CUDA cores it has; a high-end card may have as many as 8192 CUDA cores.
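If you're curious what MATLAB actually detects, the Parallel Computing Toolbox can query the card. A minimal sketch (note that gpuDevice reports multiprocessors rather than CUDA cores directly; cores per multiprocessor depend on the GPU generation):
[font=courier]
% Check whether MATLAB sees a CUDA-capable GPU (Parallel Computing Toolbox)
if gpuDeviceCount > 0
    g = gpuDevice; % select/query the default GPU
    fprintf('GPU: %s, compute capability %s, %d multiprocessors\n', ...
        g.Name, g.ComputeCapability, g.MultiprocessorCount);
else
    disp('No CUDA GPU visible; training falls back to the CPU.');
end
[/font]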
In the simple Digit Recognition example, on my PC with a very limited graphics card, the problem is solved in 25 seconds on the CPU and 15 seconds on the graphics card.
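For what it's worth, the comparison is just a matter of toggling one training option and timing the call. A rough sketch, assuming the layers and datastores from the full listing below:
[font=courier]
% Time the same training run on CPU and GPU (assumes 'layers' and
% 'TrainImages' from the listing below)
optsCPU = trainingOptions('sgdm','ExecutionEnvironment','cpu','MaxEpochs',4,'Verbose',false);
tic; trainNetwork(TrainImages,layers,optsCPU); tCPU = toc;
optsGPU = trainingOptions('sgdm','ExecutionEnvironment','gpu','MaxEpochs',4,'Verbose',false);
tic; trainNetwork(TrainImages,layers,optsGPU); tGPU = toc;
fprintf('CPU: %.1f s   GPU: %.1f s\n', tCPU, tGPU);
[/font]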
I suppose there is a reason that NVIDIA provides C++ and Fortran compilers for their Jetson products (and the others, I imagine). Note that they include Fortran in their SDK! There's something still alive in the Fortran universe, and call-by-reference still rules. I haven't used Python for any of my AI-type projects (limited as they are); why would I want to slow down the computation with an interpreted language? ETA: That's not quite right; I have done a few examples in Python using various books. It's just that the MATLAB approach is more appealing.
The code for the Digit Recognition example shows how easy it is to stack layers:
[font=courier]
% Program to recognize digits using a deep CNN
% Path of the dataset folder
digitDatasetPath = 'c:/Digits';
% Read the digit images from the image database folder
digitimages = imageDatastore(digitDatasetPath,'IncludeSubfolders',true,'LabelSource','foldernames');
% Split the images into training and testing sets
numTrainFiles = 750; % or pass a fraction instead, e.g. 0.75 for a 75% split
[TrainImages,TestImages] = splitEachLabel(digitimages,numTrainFiles,'randomized');
layers = [
    imageInputLayer([28,28,1],'Name','Input')
    convolution2dLayer(3,8,'Padding','same','Name','Conv_1')
    batchNormalizationLayer('Name','BN_1')
    reluLayer('Name','Relu_1')
    maxPooling2dLayer(2,'Stride',2,'Name','MaxPool_1')
    convolution2dLayer(3,16,'Padding','same','Name','Conv_2')
    batchNormalizationLayer('Name','BN_2')
    reluLayer('Name','Relu_2')
    maxPooling2dLayer(2,'Stride',2,'Name','MaxPool_2')
    convolution2dLayer(3,32,'Padding','same','Name','Conv_3')
    batchNormalizationLayer('Name','BN_3')
    reluLayer('Name','Relu_3')
    maxPooling2dLayer(2,'Stride',2,'Name','MaxPool_3')
    convolution2dLayer(3,64,'Padding','same','Name','Conv_4')
    batchNormalizationLayer('Name','BN_4')
    reluLayer('Name','Relu_4')
    fullyConnectedLayer(10,'Name','FullyConnected')
    softmaxLayer('Name','SoftMax')
    classificationLayer('Name','OutputClassification')
    ];
lgraph = layerGraph(layers);
plot(lgraph); %Plotting Network Structure
%------ Training Options ------
options = trainingOptions('sgdm', ...
    'ExecutionEnvironment','auto', ... % 'auto', 'gpu', or 'cpu'
    'InitialLearnRate',0.01, ...
    'MaxEpochs',4, ...
    'Shuffle','every-epoch', ...
    'ValidationData',TestImages, ...
    'ValidationFrequency',30, ...
    'Verbose',false, ...
    'Plots','training-progress');
net = trainNetwork(TrainImages,layers,options); %Network Training
% analyzeNetwork(net)
YPred = classify(net,TestImages); %Recognizing digits
YValidation = TestImages.Labels; % Ground-truth labels
accuracy = sum(YPred == YValidation)/numel(YValidation); % Compute accuracy
fprintf('Accuracy: %g%%\n',100*accuracy)
[/font]
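Once trained, the net can be pointed at a single image just as easily. A quick sketch, assuming the variables from the listing above:
[font=courier]
% Classify one image from the test set with the trained network
img = readimage(TestImages,1);   % first test image (28x28 grayscale)
label = classify(net,img);       % predicted digit
imshow(img); title(string(label));
[/font]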