Color Vision and Computers
Humans have an enormous variety of ways to interact with the world around them: touch, smell, taste, hearing, and, most important in my mind, vision. Computers are presently far more limited in their interactions with the world. They get to crunch numbers and post a Facebook status that none of your friends want to read, but they ultimately don't get to touch, taste, smell, or see the world around them.
One thing that I find fascinating is object recognition using neural networks. I am currently taking a class focused on pattern recognition and machine learning, where we have to complete a project of our choosing. My classmate and I are comparing the performance of traditional classification and deep learning for object detection (in our case, an orange hockey ball). I was extremely happy when our prof approved the use of deep learning despite the course focusing more on traditional classification techniques. We split the work so that my classmate is developing the traditional classification and tracking method while I am building its deep learning counterpart.
[Image: Orange hockey ball used for tracking]
Conceptually, training a neural network is really easy: have enough inputs to take in every pixel of your image and enough outputs for your classes (in our case, Ball and No Ball). I expected this to be an afternoon of work for an accurate system, until I realized how much data was needed for training AND how long training could take.
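To make that concrete, here is a minimal sketch of what such a layer stack can look like in MATLAB's Deep Learning Toolbox. The layer sizes and filter counts are illustrative assumptions, not the exact network from the project:

    % Illustrative layer stack: full-resolution RGB input, two output classes.
    layers = [
        imageInputLayer([1080 1920 3])           % one input per pixel and channel
        convolution2dLayer(5, 16, 'Padding', 'same')
        reluLayer
        maxPooling2dLayer(2, 'Stride', 2)
        fullyConnectedLayer(2)                   % two classes: Ball, No Ball
        softmaxLayer
        classificationLayer];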
We created the first dataset with my Canon PowerShot G5X. It can shoot 60 FPS at 1080p, so we were collecting a huge number of high-quality samples. We had around 2000 good images to work with after a few minutes of filming. Ideally we would have tens of thousands of images to train a deep learning system, but this was for preliminary testing so we weren't too worried.
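For anyone curious how a few minutes of video turns into thousands of training images, the frame extraction can be done in MATLAB along these lines. The file and folder names below are placeholders, not the ones we actually used:

    % Pull individual frames out of a recorded video and save them as images.
    v = VideoReader('ball_run.mp4');             % placeholder file name
    idx = 0;
    while hasFrame(v)
        frame = readFrame(v);
        idx = idx + 1;
        imwrite(frame, sprintf('frames/frame_%05d.png', idx));
    end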
My classmate started working on his model while I grabbed some sample networks from MATLAB's tutorials and started trying to understand what exactly goes on in the system. Once I had a rough idea, I decided to train my first neural network, and man was it a bad idea. I set up my inputs as 1080x1920x3 (RGB images) and an output layer with the two classes described earlier. I set the training to run on my mid-range gaming laptop's quad-core CPU (kicking myself for not getting the NVIDIA GPU) and after about an hour it was hardly even through the first of many epochs.
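The training call itself looked roughly like the MATLAB tutorial examples. This is a sketch under the assumption that the images are sorted into 'Ball' and 'NoBall' folders and that the layers are defined as above; the exact options in my script differed:

    % Label images by folder name and train on the CPU using the layers above.
    imds = imageDatastore('dataset', 'IncludeSubfolders', true, ...
        'LabelSource', 'foldernames');
    options = trainingOptions('sgdm', ...
        'MaxEpochs', 20, ...                     % illustrative value
        'ExecutionEnvironment', 'cpu', ...       % no usable GPU on the laptop
        'Plots', 'training-progress');
    net = trainNetwork(imds, layers, options);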
[Image: Terrible results from an early network despite good validation results]
I needed a beefier computer that I could offload the training to. Fortunately, thanks to another project I am working on, I was able to keep access to the lab I had worked in over the summer. My workstation there was a pretty powerful computer that could run 24/7 and that I could remote desktop into to check on how the training was going. It took 24 hours to train the first network.
After the painfully long wait I decided to test the network. I opened my testing script, loaded in some images, and ALL of them failed. Not a single positive result; every image returned 'No Ball'. I knew it would take too long to retrain with just a few tweaks, so I decided to try training again with grayscale images (1080x1920x1), which cuts the input data to a third. It still took the better part of a day to train.
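The testing script itself is not much more than loading images and calling classify on the trained network; a minimal sketch of that loop, with placeholder paths, would be:

    % Run the trained network over a folder of test images and count detections.
    % Test images must match the network's input size.
    testImds = imageDatastore('test_images');
    preds = classify(net, testImds);             % returns Ball or NoBall per image
    numBall = sum(preds == 'Ball');
    fprintf('%d of %d images classified as Ball\n', numBall, numel(preds));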
[Image: Ball rolling across the lab bench]
Our dataset was simply too big to train on in a reasonable amount of time. The solution was to downscale all of the images to 120x160x3 (RGB). This let us keep color images while training much faster, and I was surprised by how much information is still retained when dropping the resolution that low.
[Image: Decreased resolution of an original training image]
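The downscaling step is only a couple of lines per image; something along these lines (with placeholder folder names) was enough to rebuild the whole dataset:

    % Shrink every training image to 120x160 while keeping all three color channels.
    files = dir('frames/*.png');
    for k = 1:numel(files)
        img = imread(fullfile('frames', files(k).name));
        small = imresize(img, [120 160]);        % [rows cols], result is 120x160x3
        imwrite(small, fullfile('frames_small', files(k).name));
    end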
Training is currently ongoing but is nearing completion. Performance in real-time situations is roughly 70% and improving as I work on increasing the size of the dataset. Below is the training plot from the best network trained to date.
In your own experience, how has training networks gone for you? Any performance suggestions or unwritten rules?