While this list is not exhaustive, the languages listed below are commonly used by data professionals like analysts, data scientists, and data engineers.
SQL (Structured Query Language):
SQL is a language used to manage and manipulate relational databases, which are commonly used to store large amounts of structured data. It is considered essential for data analysts as it allows them to extract insights and information from these databases.
Python is a high-level programming language that is widely used in data analysis and data science. It has a large ecosystem of libraries and frameworks such as Pandas, NumPy, and Scikit-learn that are specifically designed for data manipulation, analysis, and modeling.
R is another programming language that is designed for statistical computing and graphics. It has a large library of packages for data manipulation, visualization, and analysis, making it an essential tool for data analysts.
SAS is a software suite that provides a range of tools for data analysis and business intelligence. It is commonly used in industries such as healthcare, finance, and retail, and is known for its ability to handle large datasets.
Java is a popular programming language that is widely used in enterprise-level applications and big data processing. Its ability to handle large volumes of data makes it an essential language for data analysts.
MATLAB is a programming language used primarily for numerical computing and visualization. It is commonly used in scientific research and engineering, but is also used in data analysis and machine learning.
Scala is a programming language that is designed to be scalable and efficient, making it an ideal language for big data processing. It is often used in conjunction with Apache Spark, a distributed computing framework for processing large datasets.
It’s worth noting that the specific languages used by data analysts can vary depending on the industry, the type of data being analyzed, and the specific job requirements. However, a strong foundation in SQL and at least one of the programming languages mentioned above is generally considered essential for data analysts.
Data analysis has become an integral part of business operations in the digital age. As companies collect and store vast amounts of data, they need skilled professionals to extract insights and make data-driven decisions. Data analysts play a crucial role in this process, using their expertise to analyze data and draw insights that inform business decisions. Here are the top skills a data analyst should have to excel in this field.
1. Strong quantitative skills:
Data analysts need to be comfortable with numbers and statistics. They should have a solid foundation in mathematics and be proficient in tools such as Excel and statistical programming languages like R or Python. Understanding mathematical concepts such as probability, regression analysis, and hypothesis testing is essential to be able to analyze data accurately and draw meaningful conclusions.
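As a small illustration of the regression analysis mentioned above, here is a minimal sketch of fitting a straight line by ordinary least squares using only the Python standard library. The data (`ad_spend`, `units_sold`) and the helper name `fit_line` are made up for this example; in practice an analyst would more likely reach for a statistics package.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope is the co-variation of x and y divided by the variation of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data: monthly ad spend (in $1,000s) and units sold.
ad_spend = [1, 2, 3, 4, 5]
units_sold = [12, 14, 16, 18, 20]

slope, intercept = fit_line(ad_spend, units_sold)
print(f"units_sold = {intercept:.1f} + {slope:.1f} * ad_spend")
```

The slope is the expected change in units sold for each additional $1,000 of ad spend, which is the kind of conclusion a stakeholder can act on.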
2. Data visualization skills:
Data analysts must be able to communicate their insights effectively. They should be proficient in creating data visualizations, such as charts and graphs, to help others understand complex data sets. Knowledge of data visualization tools like Tableau, Power BI, or QlikView can help in creating interactive dashboards and presentations to communicate insights.
3. Strong problem-solving skills:
Data analysis is all about solving problems. Data analysts must have a strong analytical mind to identify patterns and insights in data that can help businesses solve problems. They should have the ability to think creatively, identify gaps in data, and come up with strategies to fill those gaps.
4. Attention to detail:
Data analysis requires meticulous attention to detail. Data analysts must be able to identify and correct errors in data to ensure accuracy in their analysis. They must be skilled in data cleaning and data preparation, and be able to ensure data consistency across multiple sources.
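To make the data cleaning step concrete, here is a minimal sketch in plain Python. The records, field names, and the `clean_records` helper are hypothetical. It trims stray whitespace, normalizes name casing, drops rows with a missing amount, and removes duplicates that only differed in formatting.

```python
def clean_records(records):
    """Trim whitespace, normalize case, drop rows missing an amount, and de-duplicate."""
    seen = set()
    cleaned = []
    for rec in records:
        if rec["amount"] is None:
            continue  # cannot analyze a row with no amount
        name = rec["customer"].strip().title()
        amount = float(rec["amount"])
        key = (name, amount)
        if key in seen:
            continue  # same customer and amount already kept
        seen.add(key)
        cleaned.append({"customer": name, "amount": amount})
    return cleaned


# Hypothetical raw rows, as they might arrive from two different sources.
raw = [
    {"customer": "  alice smith", "amount": "100.50"},
    {"customer": "Alice Smith ", "amount": "100.50"},
    {"customer": "BOB JONES", "amount": None},
    {"customer": "carol lee", "amount": "75"},
]

cleaned = clean_records(raw)
for row in cleaned:
    print(row)
```

Whether a missing value should be dropped, filled in, or flagged depends on the analysis; dropping is just the simplest choice for this sketch.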
5. Business acumen:
Data analysts should have a strong understanding of the business they are working for. They must be able to connect the insights drawn from data analysis to the larger business objectives and strategy. They must also be able to communicate their findings in a way that is understandable to non-technical stakeholders.
6. Communication skills:
Data analysts must be able to effectively communicate their insights to stakeholders. They must be skilled in creating clear and concise reports, presentations, and visualizations that convey the insights drawn from data analysis. They should also be able to work collaboratively with others, including non-technical stakeholders, to identify business problems and develop solutions.
7. Continuous learning:
The field of data analysis is constantly evolving, and data analysts must be willing to continuously learn and adapt to new technologies and techniques. They should be passionate about exploring new tools and techniques, and be willing to experiment with new approaches to problem-solving.
In conclusion, data analysis is a complex field that requires a combination of technical skills, business acumen, and problem-solving abilities. Data analysts must be comfortable with numbers and statistics, have strong analytical skills, and be able to communicate their insights effectively. They must be able to work collaboratively with others, including non-technical stakeholders, to identify business problems and develop solutions. Finally, they must be willing to continuously learn and adapt to new technologies and techniques to stay ahead in the field.
A database administrator, commonly referred to as a DBA, is a professional who is responsible for managing, maintaining, and optimizing a database. Databases are an essential component of most organizations as they store critical information that is required for daily operations.
The role of a DBA is to ensure that the database is running smoothly and efficiently, while also being available to users at all times. This requires a combination of technical and interpersonal skills, as well as a deep understanding of database management systems.
Here are some of the key responsibilities of a DBA:
- Installing, configuring, and upgrading database management systems: A DBA is responsible for setting up new database systems and ensuring that they are configured properly. This also includes upgrading existing systems to the latest version.
- Database security: A DBA is responsible for implementing security measures to protect the database from unauthorized access, theft, or corruption. This includes setting up user accounts, defining access privileges, and implementing encryption and backup systems.
- Performance tuning: A DBA is responsible for optimizing the performance of the database by analyzing queries and indexes, as well as adjusting parameters to ensure that the database is running as efficiently as possible.
- Data backup and recovery: A DBA is responsible for backing up the database regularly and testing the recovery process to ensure that critical data can be restored in the event of a failure.
- Monitoring database activity: A DBA is responsible for monitoring the database for performance and usage trends, as well as identifying any potential problems or issues that may arise.
- Data migration: A DBA is responsible for moving data from one database to another, either for the purpose of upgrading systems or for consolidating multiple databases into one.
- Troubleshooting: A DBA is responsible for solving problems that arise with the database, whether it is a performance issue or a software bug. This requires a deep understanding of the database management system and the ability to work quickly and effectively to resolve the problem.
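The backup and recovery responsibility in the list above can be sketched with Python's built-in `sqlite3` module. This is only an illustration under simple assumptions: the `patients` table and its rows are made up, both databases live in memory, and a production DBA would use the backup tooling of whatever database system is actually deployed.

```python
import sqlite3

# Hypothetical "production" database, kept in memory for this sketch.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT)")
source.executemany("INSERT INTO patients (name) VALUES (?)", [("Ann",), ("Raj",)])
source.commit()

# Back up: copy the contents of the source database into a second database.
backup = sqlite3.connect(":memory:")
source.backup(backup)

# Simulate a failure by losing the original connection.
source.close()

# Recovery test: confirm the backup holds the rows we expect.
restored_rows = backup.execute(
    "SELECT id, name FROM patients ORDER BY id"
).fetchall()
print(restored_rows)
```

The point of the last step is that a backup is only as good as the last time someone verified it could be read back.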
In conclusion, a DBA is a critical member of any organization that relies on a database for its operations. The role requires a combination of technical expertise, interpersonal skills, and a deep understanding of database management systems. A DBA is responsible for ensuring that the database is running smoothly, efficiently, and securely, and is available to users at all times.
In today’s data-driven world, data engineers play a crucial role in ensuring that data is collected, stored, processed, and made available for analysis and decision making. Data engineers are responsible for the design, construction, and maintenance of the infrastructure that supports the collection and processing of vast amounts of data. They are the foundation of the data ecosystem, making it possible for data scientists, analysts, and business users to derive insights and make informed decisions.
The role of a data engineer is to build and maintain the pipelines that transfer data from various sources to a centralized repository. They are responsible for ensuring that the data is cleaned, transformed, and made ready for analysis. This requires a deep understanding of data management, data warehousing, and the use of big data technologies such as Hadoop, Spark, and NoSQL databases.
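A pipeline of this kind is often described as extract, transform, load. The sketch below shows the three stages at toy scale using only the Python standard library. The CSV content, column names, and `orders` table are hypothetical, and an in-memory SQLite database stands in for the centralized repository.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this might be a file, an API, or another database.
RAW_CSV = """order_id,region,amount
1, east ,100.0
2,WEST,250.5
3,east,
"""


def extract(text):
    """Read the raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Drop rows with no amount, tidy the region, and convert types."""
    out = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue
        out.append((int(row["order_id"]), row["region"].strip().lower(), float(amount)))
    return out


def load(rows, conn):
    """Write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), warehouse)

loaded = warehouse.execute(
    "SELECT order_id, region, amount FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

Keeping the three stages as separate functions makes each one easier to test and to swap out when a source or destination changes.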
Data engineers work closely with data scientists and analysts to understand the data requirements and design systems that meet those needs. They also ensure that the data is properly secured, backed up, and protected from unauthorized access. Data engineers must be able to write efficient and scalable code, debug and resolve issues, and optimize systems for performance and scalability.
One of the most important responsibilities of data engineers is to ensure that data is easily accessible and usable. This requires the development of data APIs and integration with other systems such as data visualization tools and business intelligence platforms. Data engineers must also be able to monitor the data pipelines and systems to ensure they are functioning as expected and make adjustments as needed.
Data engineering is a rapidly evolving field, and data engineers must be able to stay up-to-date with the latest technologies and trends. This requires continuous learning and a willingness to experiment with new tools and approaches. Additionally, data engineers must have excellent communication skills, as they often work in cross-functional teams and must be able to effectively communicate technical concepts to non-technical stakeholders.
In conclusion, data engineers play a vital role in the data ecosystem by building and maintaining the infrastructure that supports data collection and processing. They ensure that data is accessible and usable, and they work closely with data scientists and analysts to support data-driven decision making. If you are interested in pursuing a career in data engineering, you should have a strong foundation in computer science and programming, as well as experience with big data technologies and data management.
Generalizing any profession is difficult because so many factors come into play (academic versus corporate, large company versus start-up), but I find the role of Data Scientist can only truly be described by breaking it into three very different roles. For lack of better terms, I will name these roles Academic/Research Data Scientist, Applied Data Scientist, and Data Analyst/Scientist. These roles require different skill sets and different mindsets, which I will discuss below.
First off, I would like to put my bias out front: I am an Applied Data Scientist. That does not mean I feel my current role is superior to any other data scientist role out there. I also work for a very large company with lots of resources, so I want to point out that the roles I describe below can overlap depending on your working environment.
Academic/Research Data Scientist
These are the people who design new machine learning algorithms and push the boundaries of data science and AI. When you see advances in self-driving cars and computer vision, these are the people behind those advances. These individuals work either in a university setting or as part of a research team for companies like Google, Facebook, and Tesla. Most either hold a PhD or are currently working on one. These data scientists actively develop and conduct research experiments that are written up and published in scientific journals. These are the pure scientists of the data science world.
To be amongst this crowd of data scientists, you need to be at the top of your game with advanced mathematics and programming skills. If you really love diving deep into the field and can handle the often glacial pace research moves at, this could be the job for you. The biggest drawback is that these jobs are limited. There are only so many academic or industry research positions out there. Most data scientists today instead fall into the other two categories.
Applied Data Scientist
Applied Data Scientists are a bit more pragmatic. They work for a company that drives their goals, instead of being funded by research grants. They do not have the luxury of time often afforded those working on grants, so the solutions they build need to go into production sooner rather than later. (Now, to be fair to researchers, applied data scientists also don’t have to deal with the headaches surrounding grant proposals.)
Typically, data scientists working in industry (not part of a research team) are not out developing new algorithms or trying to push the limits of machine learning. Instead, they use tools created by others to explore and derive meaning from data that can be acted upon, usually because the boss wants actionable data and needs it now. Most applied data scientists keep a few algorithms on hand that they know work for certain scenarios and spend most of their time gathering, cleaning, and prepping the data to build out the models.
I kind of like this position. I view myself more as an applied scientist. Even while going through my PhD, I always leaned towards applied versus theoretical.
To work as an applied data scientist, a candidate should have a master’s degree or at least 6 years of industry experience. They should be inquisitive and honestly interested in the domain in which they work. I work in cyber security, and I have spent a lot of time researching and studying the field so I can identify opportunities to provide a data-driven solution. Candidates should also have a wide-ranging skill set beyond just ML. An applied data scientist should be well versed in areas such as dashboard development, optimization modeling, forecasting, and simulation modeling.
Data Analyst/Scientist
The Data Analyst/Scientist is probably the most common data science position right now. What this position calls for is basically a top-level analyst familiar with data science tools. Now, I am not denigrating this position. What these data scientists do is every bit as challenging and important as the other two roles I described above. They are expected to operate as a data scientist while also handling analyst or business intelligence duties.
People in this position are not lesser than, or incapable of doing, the other roles described above. Instead, they are part of an organization that does not have a fully developed data science program, or they work outside the data science organization. These data scientists are often embedded in a department, providing data management and analysis expertise. Truly jacks of all trades, these data scientists often stand up and manage data warehouses or data marts for their department. They can be expected to handle reporting duties as well as machine learning model development.
To sum this all up, data science is a broad and still evolving field. Solid industry definitions are not commonplace, and titles often do not represent actual job duties. However, most data scientists can be generalized under one of the three roles I discussed above: Academic/Research Data Scientist, Applied Data Scientist, or Data Analyst/Scientist. No role is inherently better or more important than the others, but the differences in the roles can definitely attract different individuals.
Data Analysts get a bad rap. With the advent of the Data Scientist, Data Analysts are often viewed as Data Scientists lite; however, I feel that is not the case. Truth is, there is a lot of overlap between the two fields. I will dive deeper into what a Data Scientist is in a future article, but just know that, in my opinion, the definition of Data Scientist as a job is still a bit fuzzy, and I think the job title may eventually be broken into a few different titles to better define the differences.
So what does a Data Analyst do?
A lot, actually. You could put 10 data analysts into a room and you would get 10 different answers to this question. So the best I can do here is make sweeping generalities. As the old saying goes, “Your results may vary.”
In general, data analysts perform statistical analysis, create reports, run ad-hoc queries against data warehouses, create data visualizations, create and maintain dashboards, perform data mining, and create machine learning models (yes, ML is not only for data scientists). Their assignments are business driven. A data analyst is either embedded with a business unit (financial planning, fraud, risk management, cyber security, etc.) or working in a centralized reporting/analytics team. They use their skills to provide reporting and analytics for the business.
Tools used by Data Analysts
- SQL – MySQL, SQL Server, Oracle, Teradata, Postgres – whether simply querying a data warehouse or creating and managing a local data mart, data analysts need to be advanced SQL programmers
- Visualization tools – Tableau, Qlik, Power BI, Excel – analysts use these tools to create visualizations and dashboards
- Python/R – Data analysts should be familiar with languages like Python or R to help manage data and perform statistical analysis or build machine learning models
- Spreadsheets – Excel, Google Sheets, Smartsheet – used to create reports and to build pivot tables for analyzing the data
- ETL tools – SSIS, Alteryx, Talend, KNIME – these tools are designed to move data to and from databases, CSV files, and spreadsheets. Until the data is in a usable format, analysis cannot be performed.
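For readers who have only ever built a pivot table by dragging fields in a spreadsheet, the sketch below shows roughly what one computes, written in plain Python. The `sales` rows and the `pivot_sum` helper are hypothetical and are not taken from any of the tools listed above.

```python
from collections import defaultdict

# Hypothetical sales rows, one per transaction.
sales = [
    {"region": "East", "quarter": "Q1", "amount": 100},
    {"region": "East", "quarter": "Q2", "amount": 150},
    {"region": "West", "quarter": "Q1", "amount": 200},
    {"region": "East", "quarter": "Q1", "amount": 50},
]


def pivot_sum(rows, row_key, col_key, value_key):
    """Sum value_key for every (row_key, col_key) combination, like a pivot table."""
    table = defaultdict(lambda: defaultdict(float))
    for r in rows:
        table[r[row_key]][r[col_key]] += r[value_key]
    # Convert the nested defaultdicts to plain dicts for easier printing.
    return {k: dict(v) for k, v in table.items()}


pivot = pivot_sum(sales, "region", "quarter", "amount")
for region, by_quarter in pivot.items():
    print(region, by_quarter)
```

Regions become the rows, quarters become the columns, and each cell holds the summed amount for that combination.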
Typically, a data analyst position will ask for a bachelor’s degree, preferably in computer science, statistics, database management, or even business. While the barrier to entry for a data analyst job is generally not as high as for a data scientist job, that does not mean you cannot make a meaningful and well-paid career as a data analyst. Also, the demand for data professionals keeps going up, and it most likely will for the foreseeable future.
This has to be the most common question on data science I am asked, and honestly it is a hard one to answer. For everyone out there trying to get your foot in the door on your first data job, believe me, I feel for you. Multiple interviews without any offers, or not getting any interviews at all, can be beyond frustrating. Now, unfortunately, I do not have any magic trick to get you into the data field, but I can share how I did it.
So, how did I get into the data science field…
Honestly, I “Made” my first job. My first career out of the Army was as a biomedical equipment technician. I fixed medical equipment like patient monitors, ultrasounds, and x-ray machines.
We had a ticketing system called MediMizer where all the repairs and routine maintenance jobs were recorded. My bosses would run monthly reports out of the system. I read some of the reports and just felt like we could do better.
I started with just Excel. I downloaded some data, created some pivot charts and made some basic visualizations. I asked new questions from the data. I looked at angles that weren’t covered in the existing reporting.
I showed these to my bosses, my co-workers, other department managers, basically anyone who would listen to me. Then I learned about Tableau, and using its free version I was able to create some more professional looking visualizations.
I learned how to make a dashboard, I started analyzing data sets from other departments, and I began feeding them my reports. I went back to school to get a degree and used what I was learning in school to improve my reporting skills.
While my job title didn’t change, I was now able to put data analysis skills on my resume. I was lucky enough to have very supportive management who saw the value in what I was doing, and allowed me to dedicate some of my time to it.
But most importantly, I was now a data professional (even if not in title). I was using data to solve real world problems. I put together a portfolio of some of the reporting I was doing. This allowed me to show my future employer that not only was I able to create reporting, but more importantly I was able to identify real world business problems and use data to help solve them.
The takeaway is don’t let your job title hold you back. Look around: what kind of problems do you see? Can you find a data-driven solution to help fix the problem? If you do this, you are now a data professional (even if not in title). A portfolio made from real world examples can be more impressive than generic tutorial or Kaggle projects.
Remember, when trying to break into a new field, sometimes you need to make your own luck.
A common question I have seen good people get tripped up on. Remember KISS: Keep It Simple, Stupid.
Don’t let the question trick you into overthinking. For a SQL query you need a SELECT command (what do we want), a FROM (what table is it in), and, if you want to go for the extra credit point, a WHERE (it’s our filter).
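To see the three parts together, here is a minimal sketch that runs one query against an in-memory SQLite database using Python's built-in `sqlite3` module. The `employees` table and its rows are made up for illustration.

```python
import sqlite3

# Hypothetical table, created in memory so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", "Finance", 70000), ("Raj", "IT", 85000), ("Lee", "IT", 60000)],
)

query = """
SELECT name, salary        -- what do we want
FROM employees             -- what table is it in
WHERE department = 'IT'    -- our filter
"""

rows = conn.execute(query).fetchall()
for name, salary in rows:
    print(name, salary)
```

Only the two IT employees come back; the Finance row is filtered out by the WHERE clause.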
Below is a list (in no particular order) of real interview questions I have either asked, been asked, or seen asked in a real interview. While everyone has their own take on interview advice, mine is pretty clear cut. Answer the question, nothing more, nothing less. Don’t get caught up in the trap of trying to add too much detail. Unless specifically asked for more detail, the interviewer more often than not just wants to make sure you have a grasp of the concepts.
1. Explain the difference between Regression and Classifiers:
2. What does ETL stand for?
3. What is the difference between a data warehouse and a transactional database?
Answer is coming
4. Name 3 ETL tools
Answer is coming
5. Explain Probability vs Odds
6. What are the basic elements or parts for writing SQL queries?
7. Explain supervised versus unsupervised machine learning
Answer is coming
8. Explain the difference between a left join and an inner join
Answer is coming
9. What is ensemble modeling?
Answer is coming
10. What is a confusion matrix?
Answer is coming
11. What is DDL vs DML?