Disclaimer: I was not able to use the entire dataset, so I used as much of it as I could in order to build a reasonable model (MAE below 50 on the training set).

DATASET:
The dataset includes 10000 overhead pictures of Accra, each 480x480 pixels. The pictures are named *.jpeg, where * is a number between 1 and 10000 identifying the picture. A separate csv file contains the population in each image. The goal of this project is to build and train a model that, given an image, predicts the population in that image from the features present in it. The dataset was split into 9000 training images and 1000 testing images. Since this task works directly on images, the CNN should be expected to perform better than the DNN.
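As a rough sketch of the split described above, the image IDs could be shuffled and divided as follows. The seeded random shuffle and the function name `split_image_ids` are assumptions for illustration; this report does not state how the 9000/1000 split was actually drawn.

```python
import random


def split_image_ids(n_images=10000, n_test=1000, seed=0):
    """Split image IDs 1..n_images into train and test filename lists."""
    ids = list(range(1, n_images + 1))
    random.Random(seed).shuffle(ids)  # assumed: a seeded random split
    test_ids, train_ids = ids[:n_test], ids[n_test:]
    # Files are named <id>.jpeg, matching the naming scheme of the dataset.
    train_files = [f"{i}.jpeg" for i in sorted(train_ids)]
    test_files = [f"{i}.jpeg" for i in sorted(test_ids)]
    return train_files, test_files


train_files, test_files = split_image_ids()
print(len(train_files), len(test_files))
```

Fixing the seed keeps the testing images the same from run to run, so the DNN and CNN would be compared on identical data.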

BUILDING THE MODEL:
I built two models: a DNN and a CNN. The DNN was designed with a Flatten layer followed by three Dense layers with 128, 64 and 1 neurons respectively. The DNN was compiled with the RMSprop optimizer and the MSE loss function, and tracked both MAE and MSE as metrics. I trained the DNN with 1000 images in the training set and 1000 images in the testing set. I used 25 epochs with 100 steps per epoch and a batch size of 10.
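A note on the two metrics: MSE averages the squared errors, which in general is not the same as squaring the MAE. The small sketch below shows the difference; the populations and predictions in it are made up purely for illustration and are not taken from the dataset.

```python
def mae(y_true, y_pred):
    """Mean absolute error: average of |true - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error: average of (true - predicted)^2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical values: two perfect predictions and one that is off by 30.
y_true = [100.0, 200.0, 300.0]
y_pred = [100.0, 200.0, 330.0]

print("MAE:", mae(y_true, y_pred))
print("MSE:", mse(y_true, y_pred))
print("MAE squared:", mae(y_true, y_pred) ** 2)
```

Because MSE squares each error before averaging, a single large miss raises it much more than it raises MAE. This is why tracking both metrics is useful when the loss is MSE.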

The CNN was built off the DNN by adding 3 convolutional layers, each followed by a max pooling layer, in front of it. The number of filters (convolutions) doubled with each layer: 16, 32 and 64 respectively. The CNN was compiled with the RMSprop optimizer and the MSE loss function, and tracked both MAE and MSE as metrics. I trained the CNN with 250 images in the training set and 1000 images in the testing set. I used 25 epochs with 10 steps per epoch and a batch size of 25.
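To get a feel for the size of the two architectures, the sketch below counts their trainable parameters. It assumes RGB input (3 channels), 3x3 kernels without padding, and 2x2 max pooling; none of these are stated in this report, so the numbers only hold under those assumptions. The function names are hypothetical helpers.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def conv_params(kernel, c_in, c_out):
    """Weights plus biases of a 2D convolution with square kernels."""
    return kernel * kernel * c_in * c_out + c_out


def dense_stack_params(n_in, units=(128, 64, 1)):
    """Parameters of the 128/64/1 Dense stack applied to n_in inputs."""
    total = 0
    for n_out in units:
        total += dense_params(n_in, n_out)
        n_in = n_out
    return total


def dnn_param_count(size=480, channels=3):
    # Flatten feeds every pixel value straight into the first Dense layer.
    return dense_stack_params(size * size * channels)


def cnn_param_count(size=480, channels=3, filters=(16, 32, 64),
                    kernel=3, pool=2):
    total = 0
    c_in = channels
    for c_out in filters:
        total += conv_params(kernel, c_in, c_out)
        # Assumed: unpadded convolution, then pooling that halves the size.
        size = (size - kernel + 1) // pool
        c_in = c_out
    return total + dense_stack_params(size * size * c_in)


print("DNN parameters:", dnn_param_count())
print("CNN parameters:", cnn_param_count())
```

Under these assumptions almost all parameters sit in the first Dense layer of either model. The pooling layers shrink the image before it is flattened, so the CNN ends up with far fewer weights than the DNN despite having more layers.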

RESULTS:
I was unable to produce a graph, so I will include my results here. A cursory inspection returns the expected: both the DNN and the CNN did very well on the training data they were given (last-epoch MAE of 14.4495 and 14.0713 respectively). The CNN gives a slightly lower MAE than the DNN, but it took about 4 times longer to train (about 130s compared to about 34s). The real difference between the DNN and the CNN comes with the testing set, which incorporated all 1000 testing images. The DNN scored an MAE of 625 on the testing set, compared to 14.4495 on its training set. The CNN scored an MAE of 13.1401 on the testing set, compared to 14.0713 on its training set. The DNN was severely overfit, while the CNN was, if anything, slightly underfit.
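The comparison above can be summarized as the gap between testing and training MAE. The sketch below uses only the numbers reported in this section; the helper name `generalization_gap` and the simple "positive gap means overfit" reading are assumptions for illustration.

```python
# MAE values reported in this section (training is the last epoch).
results = {
    "DNN": {"train_mae": 14.4495, "test_mae": 625.0},
    "CNN": {"train_mae": 14.0713, "test_mae": 13.1401},
}


def generalization_gap(train_mae, test_mae):
    """Testing MAE minus training MAE; large positive values suggest overfitting."""
    return test_mae - train_mae


gaps = {name: generalization_gap(**r) for name, r in results.items()}
for name, gap in gaps.items():
    print(f"{name}: gap = {gap:.4f}")
```

A large positive gap, as for the DNN, means the model does much worse on unseen images than on the ones it trained on. A gap near zero or slightly negative, as for the CNN, means the two sets are handled about equally well.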

SOME PROBLEMS AND SOLUTIONS:
As stated in the disclaimer, I was unable to incorporate the entire 10000 image dataset. At one point, I was able to feed about 4700 training images and 1000 testing images into my DNN. The results were an MAE of about 21 on the training set and 86 on the testing set. This run used the same DNN architecture, but with 50 epochs, 47 steps per epoch and a batch size of 100. It was the most successful of my DNN runs.

In order to improve these models, I would first like to be able to incorporate the entire 10000 image set. For the DNN, I would like to use 100 steps per epoch with a batch size of 90 (a 100/90 split), which covers all 9000 training images in each epoch. I have noticed that leaning too heavily into either steps per epoch or batch size results in wild MAE fluctuations (for example, in one run the MAE jumped from 22 to 9000 on the next epoch). Increasing the steps per epoch seemed to give the model greater stability during training, which is why I would use a 100/90 split instead of a 90/100 split. I believe that the model was slightly overfit, so I would maybe try training it for fewer epochs (around 30).
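The relationship between steps per epoch, batch size and the number of images seen per epoch is a simple product. The sketch below checks it for the configurations mentioned in this report, assuming each step consumes exactly one batch and no image is repeated within an epoch. The helper name `images_per_epoch` is hypothetical.

```python
def images_per_epoch(steps_per_epoch, batch_size):
    """Number of images seen in one epoch, assuming one batch per step."""
    return steps_per_epoch * batch_size


# (steps per epoch, batch size) for each run described in this report.
configs = {
    "DNN, 1000 training images": (100, 10),
    "CNN, 250 training images": (10, 25),
    "DNN, about 4700 training images": (47, 100),
    "proposed 100/90 split": (100, 90),
    "rejected 90/100 split": (90, 100),
}

for name, (steps, batch) in configs.items():
    print(f"{name}: {images_per_epoch(steps, batch)} images per epoch")
```

Both the 100/90 and the 90/100 split cover the same 9000 images. They differ only in how many weight updates happen per epoch (100 versus 90) and in how many images are averaged into each update.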

For the CNN, I would use the same 100/90 split between the steps per epoch and batch size. However, I noticed, surprisingly, that the model was actually underfitting. At the same time, the MAE had been fluctuating at around 11 since about epoch 12. I would maybe add one more Dense layer with 64 neurons. I believe that a model that is underfit while also showing a fluctuating MAE, which is usually an indication of overfitting, may simply not be strong enough to capture all the features inherent in this data set. I would probably also play around with the number of epochs to see if anything changes.
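One way to "play around with the epochs" more systematically is to check when the MAE stops improving. The sketch below is a minimal patience-based check, not something used in this project. The function name, the `patience` and `min_delta` defaults, and the example history are all assumptions made up for illustration.

```python
def first_plateau_epoch(mae_history, patience=5, min_delta=0.5):
    """Return the 1-based epoch of the last real improvement once the MAE
    has failed to improve by more than min_delta for `patience` epochs.
    Returns None if no such plateau is found."""
    best = float("inf")
    best_epoch = 0
    for epoch, value in enumerate(mae_history, start=1):
        if value < best - min_delta:
            # A meaningful improvement: remember it and keep going.
            best, best_epoch = value, epoch
        elif epoch - best_epoch >= patience:
            # No meaningful improvement for `patience` epochs in a row.
            return best_epoch
    return None


# Hypothetical per-epoch MAE values, NOT results from this project.
example_history = [50, 30, 20, 15, 12, 11, 11.2, 10.9, 11.3, 11.1, 10.8, 11.0]
plateau = first_plateau_epoch(example_history)
print("Last meaningful improvement at epoch:", plateau)
```

If a check like this fires early in training, extra epochs are unlikely to help on their own, which would support changing the model (for example, adding the extra Dense layer) rather than only training for longer.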