Data Jobs: What do Data Scientists do?

While generalizing any profession is difficult since so many factors can come into play: (academic versus corporate — large company versus start-up), I find the role of Data Scientist can truly only be described by breaking it apart into three very different roles. For lack of better terms I will name these roles: Academic/Research Data Scientist, Applied Data Scientist, and finally Data Analyst/Scientist. These roles require different skill sets and different mindset that I will discuss below.

First off, I would like to put my bias out front, I am an Applied Data Scientist. Now, that does not mean I feel that my current role is superior to any other data scientist role out there. Also, I work for a very large company with lots of resources, so I also want to point out that the roles I will describe below can overlap depending on your working environment.

Academic/Research Data Scientist

These are the people who design new machine learning algorithms and push the boundaries of data science and AI. When you see advances in self driving cars and computer vision, these are the people behind those advances. These individuals work either in a university setting or as part of a research team for companies like Google, Facebook, Tesla. Most either hold a PhD or are currently working on one. These data scientists actively develop and conduct research experiments that are written up and published in scientific journals. These are the pure scientists among the data science world.

To be amongst this crowd of data scientists, you need to be at the top of your game with advanced mathematics and programming skills. If you really love diving deep into the field and can handle the often glacial pace research moves at, this could be the job for you. The biggest drawback is that these jobs are limited. There are only so many academic or industry research positions out there. Most data scientists today instead fall into the other two categories.

Applied Data Scientist

Applied Data Scientists are a bit more pragmatic. They work for a company that drives their goals, instead of being funded by research grants. They do not have the luxury of time often afforded those working on grants, so the solutions they build need to go into production sooner rather than later. (Now to be fair to researchers, applied data scientists also don’t have to deal with the headaches surrounding grant proposals).

Typically, data scientists working in industry (not part of a research team) are not out developing new algorithms or trying to push the limits of machine learning. Instead, they use tools created by others to explore and derive meaning from data that can be acted upon. When the boss wants actionable data and they need it now. Most applied data scientists keep a few algorithms on hand that they know work for certain scenarios and spend most of their time gathering, cleaning, and prepping the data to build out the models.

I kind of like this position. I view myself more of an applied scientist. Even while going through my PhD, I always leaned towards applied versus theoretical.

To work as an applied data scientist, a candidate should have a master’s degree or at least 6 years industry experience. They should be inquisitive and honestly interested in the domain in which they work in. I work in cyber security, I have spent a lot of time researching and studying the field so I can identify opportunities to provide a data driven solution. Candidates should also have a wide ranging skill set beyond just ML. An applied data scientist should be well versed in tools such as dashboard development, optimization modeling, forecasting, and simulation modeling.

Data Analyst/Scientist

This is probably the most common data science position right now. What this position is calling for is basically a top level analyst familiar with data science tools. Now I am not denigrating this positon. What these data scientists do is every bit as challenging and important as the other two I described above. They are expected to operate as a data scientist while also handling analyst or business intelligence duties as well.

People in this position are not less than or incapable of doing the other roles described above, they are instead part of an organization that either does not have a fully developed data science program or they are working outside the data science organization. These data scientists are often embedded in a department providing data management and analysis expertise. Truly jacks of all trades, these data scientist often stand-up and manage data warehouses or data marts for their department. They can be expected to handle reporting duties as well as machine learning model development.

Conclusion

To sum this all up, data science is a broad and still evolving field. Solid industry definitions are not common place and titles often do not represent actual job duties. However, most data scientists can be generalized under one of the three roles I discussed above: Academic/Research Data Scientist, Applied Data Scientist, or Data Analysts/Scientists. Neither role is inherently better or more important, but the differences in the rows can definitely attract different individuals.

Data Jobs: What does a Data Analyst Do?

Data Analysts get a bad wrap. With the advent of the Data Scientist, Data Analysts are often viewed as Data Scientists lite, however I feel that is not the honest case. Truth is, there is a lot of overlap between the two fields. I will dive deeper into what a Data Scientist is in a future article, but just know my opinion is the definition of Data Scientist as a job is still a bit fuzzy and I think the job title may eventually be broken into a few different titles to better define the differences.

Data Analyst

So what does a Data Analyst do?

A lot actually. You could put 10 data analysts into a room and you would get ten different answers to this question. So the best I can do here is make sweeping generalities. As the old saying goes “Your results may vary”

In general, data analysts perform statistical analysis, create reporting, run ad-hoc queries from data warehouses, create data visualizations, create and maintain dashboards, perform data mining, and create machine learning models (yes, ML is not only for data scientists). Their assignments are business driven. A data analysts is either embedded with a business unit (financial planning, fraud, risk management, cyber security, etc.) or working in a centralized reporting/analytics team. They use their skills to provide reporting and analytics for the business.

Tools used by Data Analysts

  • SQL – MySql, SQL Server, Oracle, Teradata, Postgres – whether simply querying a data warehouse or creating and managing a local data mart, data analysts need to be advanced SQL programmers
  • Visualization tools – Tableau, Qlik, Power BI, Excel, analysts use these tools to create visualizations and dashboards
  • Python/R – Data analysts should be familiar with languages like Python or R to help manage data and perform statistical analysis or build machine learning models
  • Spreadsheets – Excel, Google Sheets, Smart Sheets are used to create reports, and pivot tables used to analyze the data
  • ETL tools – SSIS, Alteryx, Talend, Knime, these tools are design to move data to and from databases, CSV files, and spreadsheets. Until the data is in a usable format, analysis cannot be performed.

Educational Requirements

Typically a data analyst position will ask for a bachelors degrees, preferably in computer science, statistics, database management or even business. While the barrier to entry for a data analyst job is generally not as high as a data scientist, that does not mean you cannot make a meaningful and well paid career as a data analyst. Also, the demand for data professionals seems to keep going up and up and it most likely will for the foreseeable future.

How do I get a job in Data Science?

This has to be the most common question on data science I am asked, and honestly it is a hard one to answer. For everyone out there trying to get your foot in the door on your first data job, believe me, I feel for you. Multiple interviews without any offers, or even not getting any interviews at all can be beyond frustrating. Now unfortunately, I do not have any magic trick to get your into the data field, but I can share how I did it.

So, how did I get into the data science field…

Honestly, I “Made” my first job. My first career out of the Army was as a biomedical equipment technician. I fixed medical equipment like patient monitors, ultrasounds, and x-ray machines.

We had a ticketing system called MediMizer where all the repairs and routine maintenance jobs were recorded. My bosses would run monthly reports out the system. I read some of the reports and just felt like we could do better.

I started with just Excel. I downloaded some data, created some pivot charts and made some basic visualizations. I asked new questions from the data. I looked at angles that weren’t covered in the existing reporting.

I showed these to my bosses, my co-workers, other department managers, basically anyone who would listen to me. Then I learned about Tableau, and using its free version I was able to create some more professional looking visualizations.

I learned how to make a dashboard, I started analyzing data sets from other departments, and I began feeding them my reports. I went back to school to get a degree and used what I was learning in school to improve my reporting skills.

While my job title didn’t change, I was now able to put data analysis skills on my resume. I was lucky enough to have very supportive management who saw the value in what I was doing, and allowed me to dedicate some of my time to it.

But most importantly, I was now a data professional (even if not in title). I was using data to solve real world problems. I put together a portfolio of some of the reporting I was doing. This allowed me to show my future employer that not only was I able to create reporting, but more importantly I was able to identify real world business problems and use data to help solve them.

The take away is don’t let your job title hold you back. Look around, what kind of problems do you see? Can you find a data-driven solution to help fix the problem? If you do this, you are a now a data professional (even if not in title). A portfolio made from real world examples can be more impressive than generic tutorial or Kaggle projects.

Remember, when trying to break into a new field, sometimes you need to make your own luck.

Data Science Interview Questions: Parts of a SQL Query

A common question I have seen good people get tripped up on. Remember KISS – Keep it Simple Stupid.

Don’t let the question trick you into overthinking. For a SQL query you need a SELECT command (what do we want), a FROM (what table is it in) and if you want to go for the extra credit point, throw in a WHERE (its our filter)

Return to questions page

Real Data Science Interview Questions

Below is a list (in no particular order) of real interview questions I have either asked, been asked, or saw asked in a real interview. While everyone has their own take on interview advice, mine is pretty clear cut. Answer the question, nothing more nothing less. Don’t get caught up in the trap of trying to add too much detail. Unless specifically asked for more detail, the interviewer more often than not just wants to make sure you have a grasp of the concepts.

1. Explain the difference between Regression and Classfiers:

Click here for answer

2. What does ETL stand for?

Click here for answer

3. What is the difference between a data warehouse and a transactional database

Answer is coming

4. Name 3 ETL tools

Answer is coming

5. Explain Probability vs Odds

Click here for answer

6. What are the basic elements or parts for writing SQL queries?

Click here for answer

7. Explain supervised versus unsupervised machine learning

Answer is coming

8. Explain the difference between a left join and an inner join

Answer is coming

9. What is ensemble modeling?

Answer is coming

10. What is a confusion matrix?

Answer is coming

11. What is DDL vs DML

Click Here for Answer

Real Data Science Interview Questions: Question 1

Explain the difference between Regression and a Classifier:

Link to video

While regression and classifiers are both popular machine learning model types, the difference sits in the results they return:

Regression returns distinct values: think height of a person, price of a house, weight of truck

Classifiers return categories: tall / short, expensive/mid-ranged/cheap, heavy/light

Return to interview questions

Data Jobs: What does a Data Architect do?

If experience has taught me anything, it is that while companies and organizations have gotten much better at collecting data, most of this data is still just stored away in unstructured data stores. So while the data is “technically” there, it is not a whole lot of use to anyone trying to build a report or create a machine learning model. In order to get actually use out of all the data being stored, it needs to organized into some sort usable structure: Enter the Data Architect.

Data Architect is a job title I have held on multiple occasions throughout my career. My best description when explaining the job to other people is that I was kind of like a data janitor. You see, everyone takes their data and dumps it into the storage closet. So you find some billing data on the second shelf, tucked away behind employee HR records. The floor is cluttered with server logs and six years worth of expense reports stored as PDFs with completely useless names like er123.pdf.

As a data architect, it was my job to try to organize this mess and put the data into some sort of structure that lends itself to reporting or modeling. So data architects have to be well versed in data modeling, data storage, and data governance.

Data Modeling

ERD diagram showing an HR database design

Data modeling is basically taking raw data dumps and organizing them into structure that fit the needs of company. It could involve creating an HR database like above or creating a series of aggregated tables designed for reporting or dashboarding. It is the job of the data architect to best fit business needs to a data platform: be it a transactional database, a data warehouse, or perhaps a data lake.

Data Storage

Data architects also need to address data storage. While I often defer to the server crew in the IT department, as a data architect I do advise on to how and where to store data. Cloud infrastructure and cheaper faster disk storage has made a lot of data storage decisions easier, but it is good to have a working understanding of storage platforms.

Data Governance

Data governance is all about how the data is managed from a regulatory and security standpoint. It is the practice of deciding who can have access to what data. Can some data be “outside facing” versus should some data sit behind multiple firewalls in a DMZ zone.

You will often work hand in hand with Legal and Security departments when figuring out how data governance practices will be implemented.

The AI Journey – Career advice for aspiring data professionals

A past co-worker of mine by the name of Antonio Ivanovski (AI) has written a free e-book giving some great advice on how to get your foot in the door as a data professional and how to make some smart moves to really improve your job satisfaction as well as earning potential. He not only talks the talk, but walks the walk. He has managed to quadruple his salary in a little under 5 years moving from UPS to Verizon, and now (as of this writing) he is working as a Senior Data Analyst for Google.

If you are looking to get into the field or looking for advice on how to move up, his free e-book is worth a read.

You can find it hear: AI with AI

You can also find him on LinkedIn, send him a connection request and tell him you want to know all about his Macedonian Battle Llama