Data Processing: Matlab, Python or Octave?



The internet is a sea of servers, data and people. Some of those people have managed to scrape out a living doing data analysis on just about anything they can get their hands on. Data scientists are being hired left, right and center to improve performance and user experience. By looking at trends they can determine things like which students in a university course are likely to fail, based on their current test scores in the class. This would allow teachers to help those students before it is too late and they fail the course.

The possibilities of data analysis are nearly endless. It could be loading data into neural networks for AIs to analyze, optimizing a website layout to increase how long users stay, or anything else you can imagine.

With large amounts of data there come many different ways to process it. You could try it in Excel (and die before you can fill everything in) or you could look to more powerful tools. In my experience with data processing and analysis I have used three of the more popular options: Matlab, Python and Octave.

Each language (set of software?) has its own pros and cons, so we will look at each option in detail. Though this isn't a complete list, I haven't encountered many other data processing tools as commonly used as these. Feel free to comment with any that I left out that you would like me to cover and I will update the article.

Matlab

Matlab is pretty much the industry standard (for electrical engineering). Though it may be expensive ($2150 USD) for a personal copy, it does have many useful toolboxes. The trick here is that Matlab is the EA (Electronic Arts) of the computing world. Almost all of the good toolboxes cost extra, and many of the more advanced toolboxes need several other more basic ones first. A toolbox that I would like is the HDL Coder toolbox. Sneaky Matlab makes it so I have to buy three toolboxes to get it working.

Besides the potential problems with toolbox requirements, Matlab is excellent to work with. When describing it to friends I often just call it a calculator on steroids. You can do control systems with Matlab, but my experience with that has been through Simulink, which I would consider a separate tool.

The language component of Matlab is very simple to use. If at any time a user needs help, they can look up commands either on the MathWorks site or by typing help "command" into the Matlab command line. All in all the software has great community support, but that comes at the price of having to buy Matlab.

Expensive, but it gets the job done. It is not my first choice, but it is a standard piece of software for EE.


Python

I am definitely biased towards Python. Based on my experience with Python to date, I will definitely be using more of it in the future. As a data processing tool Python is exceptional. It has many libraries that assist with the number crunching. Using what is called Anaconda, you can install and manage all of the Python libraries that you could ever need. The best part is that all of the Python libraries are completely free.

Writing code in Python is very similar to writing it in Matlab. Many of the command names and much of the syntax are nearly identical. Python is slightly less user-friendly (if you don't use Anaconda) but it is still manageable for new users.
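To give a feel for that similarity, here is a minimal sketch that puts a few Matlab commands in comments next to their Python counterparts. It assumes the NumPy library (bundled with Anaconda) is the one doing the number crunching; the variable names are just for illustration.

```python
import numpy as np

# Matlab: t = linspace(0, 1, 5)
t = np.linspace(0, 1, 5)

# Matlab: A = [1 2; 3 4]
A = np.array([[1, 2], [3, 4]])

# Matlab: B = eye(2)
B = np.eye(2)

# Matlab: C = A * B    (matrix product)
C = A @ B

# Matlab: D = A .* A   (element-wise product)
D = A * A

# Matlab: m = mean(A(:))   (mean over all elements)
m = A.mean()

print(t)
print(C)
print(D)
print(m)
```

The main things to watch for when moving over are that `*` is element-wise in NumPy (the matrix product is `@`) and that Python indexes from 0 while Matlab indexes from 1.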

One of my favorite things about Python is how versatile it is. It is really easy to run C code and code from a variety of other languages from within Python using wrappers. I do know that Matlab can do this as well, but it can be really frustrating, especially if it involves Mex files.
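As a rough illustration of calling C from Python, the sketch below uses the standard library's ctypes module to call `strlen` from the system C library. It assumes a POSIX-like system (Linux or macOS) where that library can be located; wrapping your own C code works the same way once it is compiled into a shared library.

```python
import ctypes
import ctypes.util

# Locate the system C library. The file name differs per platform,
# so let find_library do the lookup (assumption: POSIX-like system).
libc_path = ctypes.util.find_library("c")
libc = ctypes.CDLL(libc_path)

# Declare the C signature: size_t strlen(const char *s)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

# Call the C function with a bytes object, which maps to const char *
length = libc.strlen(b"octave")
print(length)
```

Declaring `argtypes` and `restype` up front is optional for simple cases, but it lets ctypes catch mismatched arguments instead of passing garbage to the C side.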

Besides being great for number crunching, Python is a complete programming language that works well for object-oriented programming and for beginners. In recent years, many universities have started replacing Matlab and other data analysis tools in their teaching with Python. Why not kill many birds with one stone? Free, open-source and easily learned. If I could go back and start again, I would have tried to avoid Matlab and used Python instead.


GNU Octave



Octave, oh Octave. I haven't used Octave much recently. The last time I used it was when it was just starting to offer a GUI. I can definitely say I haven't spent enough time with it, because from what I have heard it is a great piece of software, but my personal experience was very underwhelming.

Octave looks and feels like Matlab, and the code is written almost exactly the same way, which isn't surprising since it is designed as an open-source equivalent. Like Python and Matlab, it lets you load code from C/C++, Fortran and other languages and run it from inside Octave. I have never done this with Octave, but I am assuming it can be as frustrating as Mex files in Matlab.


Performance

Somebody new to data analysis and trying to select what is best for them will ask: which one should I pick? For me this wouldn't be a particularly difficult question, since Python is free, fast and does more than just data analysis.

If you are worried about execution speed, I wouldn't worry too much about which language you choose. In my experience using both Python and Matlab for signal processing, I found Python to have a slight edge. If performance is that big of an issue, one can simply call C/C++ code from any of the above options to improve performance dramatically.
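Before reaching for C/C++, it is worth measuring where the time actually goes. The following is a small sketch using Python's standard timeit module to compare two ways of computing the same sum of squares; the function names and the problem size are made up for illustration, and the timings will vary from machine to machine.

```python
import timeit


def sum_squares_loop(n):
    """Sum of squares of 0..n-1 using an explicit Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def sum_squares_builtin(n):
    """Same result using the built-in sum over a generator expression."""
    return sum(i * i for i in range(n))


n = 10_000  # hypothetical problem size

# Run each version 20 times and record the total elapsed seconds
loop_time = timeit.timeit(lambda: sum_squares_loop(n), number=20)
builtin_time = timeit.timeit(lambda: sum_squares_builtin(n), number=20)

print(f"loop:    {loop_time:.4f} s")
print(f"builtin: {builtin_time:.4f} s")
```

The same pattern works for comparing a pure Python routine against a wrapped C/C++ version, which tells you whether the rewrite was worth the effort.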


Alternatives

In recent years I have heard of some great alternatives for data processing. People keep mentioning R in particular. I have no experience with R but plan on investigating it more in the future.

- Sage Math
- Julia
- R
- Excel (starts bleeding internally)


What is your weapon of choice for data processing?
