8 Software Packages You Should Learn for Data Science

1. Excel

Before everyone starts booing me, Excel is a highly under rated tool by many. I blame it on over familiarity. We have all seen our computer illiterate co-worker using Excel to make up a phone list for their kid’s little league. It is hard to imagine that same tool can be a powerhouse when it comes to data.

Take sometime to learn data cleaning and manipulation methods in Excel. Download the following free add-ons: PowerQuery, PowerPivot, and Solver. These tools offer a lot of the same functionality as a $30K license to SAS, admittedly on a small scale.

2. R

R is a statistical programming package. While far from user friendly – you are going to have to learn to program here – you will be hard pressed to find something that R cannot do. Even better, it is free. As part of the open source community, R is constantly being updated and libraries are being created to cover almost anything you can imagine.

3. Python

Python is another free programming language that is quite popular in the data science world. Unlike R, Python is not specifically designed for data and statistics. It is a full fledged Object Oriented Programming language that has plenty of uses in the computer world.

If you spend even a little time around Data Science forums, you will see the battle of R vs Python play out over and over again. My advice – take a little time to learn the fundamentals of both.

4. MS SQL Server

I chose MS SQL Server over Oracle for 2 reasons. 1) MS SQL Server is 100 times easier to install and configure for most computer users. 2) While Oracle’s PL\SQL is definitely a more robust language, Microsoft’s T-SQL is easier to pick up.

My one caveat here would be if your desire is to become a database design or database engineer. In that case, I would suggest learning Oracle first, as all the fundamental you develop there, will be easily translated to MS SQL Server.

Another great advantage of MS SQL Server is the developer’s edition. For around $70, you can purchase a developer’s license which gives you the whole suite of tools including: SSIS and SSRS.

5. SSIS

SSIS (SQL Server Integration Services) is a robust ETL (Extract Transform Load) tool build around the Microsoft Visual Studios platform. It comes included with the developer’s edition and provides a graphical interface for building ETL processes.

6. Tableau

Until further notice, Tableau is the reigning king of data visualizations. Just download a copy of free Tableau Public and you will wonder why you spent all that effort fighting with Excel charts for all these years.

While Tableau’s analytics tools leave a lot to be desired, the time it will save you in data exploration will have you singing its praises.

7. Qlik

Admittedly, Qlik is focused more as an end user BI tool, but it provides robust data modeling capabilities. Also, Qlik’s interactive dashboards are not only easy to construct, but leave the competition in the dust when it comes to ease of use.

8. Hadoop

It’s Big Data’s world, we just live here. If you want to be a data professional in this century, you are going to need to become familiar with Big Data platforms. My suggestion is to download the Hortonworks Sandbox edition of Hadoop. It is free and they provide hours worth of tutorials. Time spent learning Pig Script and Hive Script (Big Data query languages) will be well worth your effort.

What about SAS, SSPS, Cognos, blah, blah, blah…

While these packages do have a dominant position in the world of analytics, they are not very nice about offering free or low cost versions for people wanting to get into the profession to learn. I wanted to fill my list with software that could be obtain for little or no cost.

If you are currently a college student, you have a better position. Check with your college for software partnerships. Also, check with software vendors to see if they have student editions. As of this writing, I know Tableau, SAS, and Statistica offer student versions that you may want to look into.

2 thoughts on “8 Software Packages You Should Learn for Data Science

  1. I was a reluctant user of Excel, but I’ve really come to like it. It’s surprisingly powerful and if you’re working with people who aren’t experienced analysts, it’s much easier to work with Excel than with something like R.

Leave a Reply