Python: Convert Datetime to Date using Pandas

To convert a datetime column in a pandas DataFrame to a date, use the dt.date accessor:

df['column'] = pd.to_datetime(df['column']).dt.date

To demonstrate, first let’s build a dataframe

import pandas as pd
df = pd.DataFrame({'Job_Start': ['Demolition','Construction', 'Cleanup'],
                   'time': ['2022-05-20 08:07:22', '2022-05-27 07:34:01', 
                   '2022-06-01 09:12:11']})

Now let’s convert the “time” column to date instead of datetime:

df['time'] = pd.to_datetime(df['time']).dt.date
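
One thing worth knowing about the result: dt.date gives you plain Python date objects, so the column's dtype becomes object rather than datetime64. The sketch below rebuilds the same dataframe to show this. It also shows dt.normalize() as an alternative if you would rather keep a datetime64 column with the time set to midnight. The variable names converted, dates, and midnight are my own, chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Job_Start': ['Demolition', 'Construction', 'Cleanup'],
                   'time': ['2022-05-20 08:07:22', '2022-05-27 07:34:01',
                            '2022-06-01 09:12:11']})

# Parse the strings into real datetimes first
converted = pd.to_datetime(df['time'])

# Option 1: plain Python date objects (the column dtype becomes object)
dates = converted.dt.date

# Option 2: keep the datetime64 dtype, but reset the time part to 00:00:00
midnight = converted.dt.normalize()

print(dates.iloc[0])      # first value as a datetime.date
print(dates.dtype)        # dtype of the date column
print(midnight.iloc[0])   # first value as a Timestamp at midnight
```

Option 2 can be handy if you still want to use the .dt accessor or date arithmetic on the column afterwards.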

Python: Read all files in a folder

The os.listdir() function will easily give you a list of everything in a folder (files and any subfolders).

So for this exercise I created a folder and threw a few files in it.

Using the following code, you can iterate through the file list:

import os
for files in os.listdir("C:/Users/blars/Documents/test_folder"):
    print(files)

Now if I wanted to read the files, I could use the Pandas command pd.read_excel to read each file in the loop

***Note: I made this folder with only Excel files on purpose for ease of demonstration. You could do this with multiple file types in a folder, but it would require some conditional logic to handle the different file types.

To read all the Excel files in the folder:

import pandas as pd
import os

folder = 'C:/Users/blars/Documents/test_folder'
for file_name in os.listdir(folder):
    print(file_name)
    # Build the full path so the script works from any working directory
    data = pd.read_excel(os.path.join(folder, file_name))
    print(data)
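
For a folder with mixed file types, the conditional logic mentioned in the note above could look something like this. This is only a sketch: the READERS mapping and the read_folder function are hypothetical names I made up, and it assumes you only care about Excel and CSV files and want to skip everything else, including subfolders.

```python
import os
import pandas as pd

# Hypothetical mapping from file extension to the pandas reader to use
READERS = {
    '.xlsx': pd.read_excel,
    '.xls': pd.read_excel,
    '.csv': pd.read_csv,
}

def read_folder(folder):
    """Return {file name: DataFrame} for every file with a known extension."""
    frames = {}
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        extension = os.path.splitext(name)[1].lower()
        # Skip subfolders and any file type we have no reader for
        if os.path.isfile(path) and extension in READERS:
            frames[name] = READERS[extension](path)
    return frames
```

Calling read_folder('C:/Users/blars/Documents/test_folder') would then give back one dataframe per readable file, keyed by file name.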

Python Web Scraping: Get http code easily with the Requests module

The Requests module for Python makes capturing and working with HTML code from a website easy.

Requests comes installed in many Python distributions. You can test whether it is installed on your machine by running the command: import requests

If that command fails, then you’ll need to install the module using Conda or Pip.

import requests
t = requests.get('http://aiwithai.com')
print(t.text)

As you can see, using just 3 lines of code you can return the HTML from any website

You can see that all the text found on the web page is found in the HTML code, so parsing through the text can allow you to scrape the information off of a website

Requests has plenty more features. Here are a couple I commonly use:

t.status_code returns the status of your GET request. If all goes well, it will return 200; otherwise you will get an error code such as 404.

t.headers returns the response headers as a dictionary-like object.

If the response body is JSON (for example, from an API), you can also parse it into Python objects:

t.json()
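
Putting those pieces together, here is a small sketch of how I might wrap a request so that bad status codes do not slip by unnoticed. The function name fetch_page and the 10 second timeout are my own assumptions, not anything Requests requires.

```python
import requests

def fetch_page(url, timeout=10):
    """Fetch a URL and return (status_code, content_type, text)."""
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of carrying on
    response.raise_for_status()
    # Header lookups on response.headers are case-insensitive
    content_type = response.headers.get('Content-Type', '')
    return response.status_code, content_type, response.text

# Example call (needs network access):
# status, content_type, html = fetch_page('http://aiwithai.com')
```

Setting a timeout is a design choice worth keeping: without one, a request to a server that never answers can hang your script indefinitely.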

Data Science Interview Questions: Parts of a SQL Query

This is a common question I have seen good people get tripped up on. Remember KISS: Keep It Simple, Stupid.

Don’t let the question trick you into overthinking. For a SQL query you need a SELECT command (what do we want), a FROM (what table is it in), and, if you want to go for the extra credit point, throw in a WHERE (it’s our filter).
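
If it helps to see the three parts in action, here is a minimal sketch using Python's built-in sqlite3 module. The jobs table and its rows are made up purely for illustration.

```python
import sqlite3

# Throwaway in-memory database with a made-up table
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE jobs (name TEXT, crew_size INTEGER)')
conn.executemany('INSERT INTO jobs VALUES (?, ?)',
                 [('Demolition', 4), ('Construction', 12), ('Cleanup', 3)])

query = """
SELECT name, crew_size   -- what do we want
FROM jobs                -- what table is it in
WHERE crew_size > 3      -- our filter
"""

# Sort in Python so the printed order does not depend on the database
rows = sorted(conn.execute(query).fetchall())
print(rows)
conn.close()
```

Only the Demolition and Construction rows come back, because Cleanup fails the WHERE filter.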

Real Data Science Interview Questions

Below is a list (in no particular order) of real interview questions I have either asked, been asked, or seen asked in a real interview. While everyone has their own take on interview advice, mine is pretty clear cut: answer the question, nothing more, nothing less. Don’t get caught up in the trap of trying to add too much detail. Unless specifically asked for more detail, the interviewer more often than not just wants to make sure you have a grasp of the concepts.

1. Explain the difference between Regression and Classifiers:

Click here for answer

2. What does ETL stand for?

Click here for answer

3. What is the difference between a data warehouse and a transactional database?

Answer is coming

4. Name 3 ETL tools

Answer is coming

5. Explain Probability vs Odds

Click here for answer

6. What are the basic elements or parts for writing SQL queries?

Click here for answer

7. Explain supervised versus unsupervised machine learning

Answer is coming

8. Explain the difference between a left join and an inner join

Answer is coming

9. What is ensemble modeling?

Answer is coming

10. What is a confusion matrix?

Answer is coming

11. What is DDL vs DML?

Click Here for Answer

Real Data Science Interview Questions: Question 1

Explain the difference between Regression and a Classifier:

While regression and classifiers are both popular machine learning model types, the difference sits in the results they return:

Regression returns continuous numeric values: think height of a person, price of a house, weight of a truck

Classifiers return categories: tall/short, expensive/mid-ranged/cheap, heavy/light
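
To make the contrast concrete, here is a toy sketch in plain Python. The house sizes, prices, and labels are invented for illustration, and the two models (a one-feature least-squares line and a nearest-average classifier) are just the simplest stand-ins I could pick, not a recommendation.

```python
# Made-up training data: house size in square metres, price in thousands
sizes = [50, 80, 100, 120]
prices = [100, 160, 200, 240]
labels = ['cheap', 'cheap', 'expensive', 'expensive']

def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def predict_price(size):
    """Regression: returns a number on a continuous scale."""
    slope, intercept = fit_line(sizes, prices)
    return slope * size + intercept

def predict_label(size):
    """Classifier: returns the category whose average size is closest."""
    centroids = {}
    for label in set(labels):
        members = [s for s, l in zip(sizes, labels) if l == label]
        centroids[label] = sum(members) / len(members)
    return min(centroids, key=lambda label: abs(size - centroids[label]))

print(predict_price(90))   # a number
print(predict_label(90))   # a category
```

Same input, two kinds of answer: the regression gives back a price, while the classifier gives back one of the labels it was trained on.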

Python: Selenium – Setting Chrome browser size

There are times when running automation on a web browser that you will want to adjust the window size of the browser. The most obvious reason I can think of is that some websites (mine included) display differently based on the window size.

For example: Full Size

Reduced size

Notice that in the reduced window, my menu list is replaced by an accordion button. For the purposes of automation and web scraping, the accordion button is actually easier to navigate than my multi-layered menu.

To open the browser with a maximized window, pass the --start-maximized argument to Chrome.

To open the window at a smaller size, pass --window-size=width,height instead. Play around with the values to get one that works for your screen.

Data Jobs: What does a Data Architect do?

If experience has taught me anything, it is that while companies and organizations have gotten much better at collecting data, most of this data is still just stored away in unstructured data stores. So while the data is “technically” there, it is not a whole lot of use to anyone trying to build a report or create a machine learning model. In order to get actual use out of all the data being stored, it needs to be organized into some sort of usable structure: enter the Data Architect.

Data Architect is a job title I have held on multiple occasions throughout my career. My best description when explaining the job to other people is that I was kind of like a data janitor. You see, everyone takes their data and dumps it into the storage closet. So you find some billing data on the second shelf, tucked away behind employee HR records. The floor is cluttered with server logs and six years’ worth of expense reports stored as PDFs with completely useless names like er123.pdf.

As a data architect, it was my job to try to organize this mess and put the data into some sort of structure that lends itself to reporting or modeling. So data architects have to be well versed in data modeling, data storage, and data governance.

Data Modeling

ERD diagram showing an HR database design

Data modeling is basically taking raw data dumps and organizing them into structures that fit the needs of the company. It could involve creating an HR database like above or creating a series of aggregated tables designed for reporting or dashboarding. It is the job of the data architect to best fit business needs to a data platform: be it a transactional database, a data warehouse, or perhaps a data lake.

Data Storage

Data architects also need to address data storage. While I often defer to the server crew in the IT department, as a data architect I do advise on how and where to store data. Cloud infrastructure and cheaper, faster disk storage have made a lot of data storage decisions easier, but it is good to have a working understanding of storage platforms.

Data Governance

Data governance is all about how the data is managed from a regulatory and security standpoint. It is the practice of deciding who can have access to what data. Which data can be “outside facing” in a DMZ, and which should sit behind multiple firewalls?

You will often work hand in hand with Legal and Security departments when figuring out how data governance practices will be implemented.

The AI Journey – Career advice for aspiring data professionals

A past co-worker of mine by the name of Antonio Ivanovski (AI) has written a free e-book giving some great advice on how to get your foot in the door as a data professional and how to make some smart moves to really improve your job satisfaction as well as earning potential. He not only talks the talk, but walks the walk. He has managed to quadruple his salary in a little under 5 years moving from UPS to Verizon, and now (as of this writing) he is working as a Senior Data Analyst for Google.

If you are looking to get into the field or looking for advice on how to move up, his free e-book is worth a read.

You can find it here: AI with AI

You can also find him on LinkedIn. Send him a connection request and tell him you want to know all about his Macedonian Battle Llama.