Python Web Scraping / Automation: Connecting to Microsoft Edge with Selenium

Selenium is a Python package that allows you to control web browsers through Python. In this tutorial, we will be connecting to Microsoft Edge; Selenium works with other browsers as well.

First you will need to install Selenium. Use one of the following commands, depending on your Python distribution:

c:\> pip install selenium

c:\> conda install selenium

If you are on a work computer or dealing with a restrictive VPN, the offline install option may help you: Selenium_Install_Offline

Next, you need to download the driver that lets you manage Microsoft Edge through Python.

Start by determining what version of Edge you have on your computer:

Click the three dots in the upper right corner > Help and feedback > About Microsoft Edge

Search for msedgedriver and download the file that matches your Edge version. (Note: you will need to do this every time Edge is updated, so get used to it.)

Open the zip file you downloaded. You will find a file called msedgedriver.exe.

Put it somewhere you can find it, then use the following code to let Python know where it is.

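from selenium import webdriver
from selenium.webdriver.edge.service import Service

opts = webdriver.EdgeOptions()  # browser options (the defaults are fine for now)
# Selenium 4 syntax: the driver path goes in a Service object
dr = webdriver.Edge(service=Service('C:/Users/larsobe/Desktop/msedgedriver.exe'), options=opts)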

Now, to see if this works, use the following line (you can try another website if you choose):
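A minimal sketch of that step, assuming the dr driver object created above. The open_site helper name and the https://www.example.com address are placeholders made up for illustration:

```python
def open_site(driver, url="https://www.example.com"):
    """Load url in the browser that driver controls and return the page title."""
    driver.get(url)        # tell the browser to navigate to the page
    return driver.title    # title of the page that was loaded


# With the Edge driver from above you would call:
# print(open_site(dr))
```

Swap in any address you like; get() waits for the page to load before returning.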

Note the message "Edge is being controlled by automated test software."

You are now running a web browser via Python.

Python for Data Science

Welcome to Python for Data Science, my free course that will take you from complete beginner to being able to build a machine learning model. The course will consist of 4 modules. The first one, Fundamentals, is available now. Each lesson contains a write-up as well as a video. There is over an hour of video for the Fundamentals module.

Each module will also contain some projects: some are simple challenges, while the later projects will require you to build an ML model and test it.

The code for each lesson and the project solutions are available to download below each module.

If you are working from your phone/tablet or work computer where you can’t install Python, I have a browser based Python console you can work with here:

Python testing shell


Fundamentals

  1. Python: Install Python and Hello World
  2. Python: Arithmetic Operations and Variables
  3. Python: Print Variables and User Input
  4. Python: Printing with .format()
  5. Python: Lists and Dictionaries
  6. Python: Working with Lists
  7. Python: Working with Dictionaries
    1. Project 1: Lists and Dictionaries
  8. Python: Tuples and Sets
  9. Python Conditional Logic
  10. Python Loops
  11. Python Functions
    1. Project 2: Loops and Conditional Logic
  12. Python: Enumerate() and Sort
  13. Python: Error handling
  14. Python lambda, map(), reduce(), filter()
  15. Python zip and unpack
  16. Python list comprehensions
  17. Python: Generators
  18. Python: Regular Expressions
  19. Python: **Kwargs and *Args
    1. Project 3: Cash Register
  20. Python: Closures (Bonus)
  21. Python: Decorators (Bonus)

Download Fundamental lesson Notebooks and Project solutions:

Data Handling / Management

  1. Create, Import, and Use Modules
  2. Numpy
  3. Numpy Part II
  4. Pandas (Series)
  5. Pandas (DataFrame)
  6. Pandas: Working with DataFrames
  7. Pandas: Renaming a Column
  8. Pandas: Working with Rows in DataFrames
  9. Working with CSV Files
  10. Working with Excel and CSV files using Pandas
  11. Pivot tables with Pandas
  12. An Interesting Problem with Pandas
  13. Connect to a MySQL database
    1. Connect to a SQL Server Database

coming soon

Statistics and Visualizations

coming soon

Machine Learning

coming soon

Python Project 1: Lists and Dictionaries

For your first project in the course, I am giving you the code below. You will notice I am placing 2 dictionaries in a list:

d = {'Name': 'Ben', 'Age': 35}
e = {'Name': 'Christine', 'Age': 30}
a = []
a.append(d)
a.append(e)
print(a)

Your challenge, should you choose to accept it, will be to run this code on your machine or in the browser-based Python console (click on the blue arrow below).

Try your Python code in the free console

I then want you to

1) Print out the second dictionary from the list

2) Print out the name from the first dictionary

3) Print out both ages

Note: I have not shown how to work with dictionaries inside a list. So consider this a stretch project. Don’t be afraid to use Google to help find an answer.


Project Answer Notebook Download Available at Course Page below:

Back to Python Course: Course

Solution Video

Python Web Scraping / Automation: Connecting to Firefox with Selenium

Selenium is a Python package that allows you to control web browsers through Python. In this tutorial, we will be connecting to Mozilla Firefox; Selenium works with other browsers as well.

First you will need to install Selenium. Use one of the following commands, depending on your Python distribution:

c:\> pip install selenium

c:\> conda install selenium

If you are on a work computer or dealing with a restrictive VPN, the offline install option may help you: Selenium_Install_Offline

Next, you need to download the driver that lets you manage Firefox through Python.

Start by determining what version of Firefox you have on your computer:

Click the three horizontal lines in the upper right corner > Help > About Firefox

Search for geckodriver and download the file that matches your Firefox version. (Note: you will need to do this every time Firefox is updated, so get used to it.)

Open the zip file you downloaded. You will find a file called geckodriver.exe.

Put it somewhere you can find it, then use the following code to let Python know where it is.

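from selenium import webdriver
from selenium.webdriver.firefox.service import Service

opts = webdriver.FirefoxOptions()  # browser options (the defaults are fine for now)
# Selenium 4 syntax: the driver path goes in a Service object
dr = webdriver.Firefox(service=Service('C:/Users/larsobe/Desktop/geckodriver.exe'), options=opts)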

Now, to see if this works, use the following line (you can try another website if you choose):
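A minimal sketch of that step, assuming the dr driver object created above. The visit_and_quit helper name and the https://www.example.com address are placeholders made up for illustration; the helper also closes the browser when it is done, which you may not want while experimenting:

```python
def visit_and_quit(driver, url="https://www.example.com"):
    """Open url, return the page title, then shut the browser down."""
    try:
        driver.get(url)       # navigate to the page
        return driver.title   # title of the page that was loaded
    finally:
        driver.quit()         # always close the browser, even if loading failed


# With the Firefox driver from above you would call:
# print(visit_and_quit(dr))
```

If you want the window to stay open so you can look at it, call dr.get(...) directly and skip the quit() step.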

Note the message "Firefox is being controlled by automated test software."

You are now running a web browser via Python.

How do I get a job in Data Science?

This has to be the most common question about data science I am asked, and honestly it is a hard one to answer. For everyone out there trying to get a foot in the door with a first data job, believe me, I feel for you. Multiple interviews without any offers, or not getting any interviews at all, can be beyond frustrating. Unfortunately, I do not have any magic trick to get you into the data field, but I can share how I did it.

So, how did I get into the data science field…

Honestly, I “Made” my first job. My first career out of the Army was as a biomedical equipment technician. I fixed medical equipment like patient monitors, ultrasounds, and x-ray machines.

We had a ticketing system called MediMizer where all the repairs and routine maintenance jobs were recorded. My bosses would run monthly reports out of the system. I read some of the reports and just felt like we could do better.

I started with just Excel. I downloaded some data, created some pivot charts and made some basic visualizations. I asked new questions from the data. I looked at angles that weren’t covered in the existing reporting.

I showed these to my bosses, my co-workers, other department managers, basically anyone who would listen to me. Then I learned about Tableau, and using its free version I was able to create some more professional looking visualizations.

I learned how to make a dashboard, I started analyzing data sets from other departments, and I began feeding them my reports. I went back to school to get a degree and used what I was learning in school to improve my reporting skills.

While my job title didn’t change, I was now able to put data analysis skills on my resume. I was lucky enough to have very supportive management who saw the value in what I was doing, and allowed me to dedicate some of my time to it.

But most importantly, I was now a data professional (even if not in title). I was using data to solve real world problems. I put together a portfolio of some of the reporting I was doing. This allowed me to show my future employer that not only was I able to create reporting, but more importantly I was able to identify real world business problems and use data to help solve them.

The takeaway is: don’t let your job title hold you back. Look around. What kind of problems do you see? Can you find a data-driven solution to help fix them? If you do this, you are now a data professional (even if not in title). A portfolio made from real world examples can be more impressive than generic tutorial or Kaggle projects.

Remember, when trying to break into a new field, sometimes you need to make your own luck.

Python: Working with Dictionaries

A dictionary is an advanced data structure in Python. It holds data in a Key – Value combination.

Basic Syntax for a dictionary:

dictionary = { <key> : <value>,
               <key> : <value>,
               <key> : <value> }

I think the key/value relationship can be explained by thinking about a car. A car has a few key values that describe it. Think: make, model, color, year.

If I want to make a dictionary to describe a car, I could build it like this:

car = { 'Make': 'Ford',
        'Model': 'Mustang',
        'Color': 'Candy Apple Red',
        'Year': 1965 }

You can also use the dict() function to create a dictionary:

car = dict([ ('Make', 'Ford'), 
             ('Model', 'Mustang'),
             ('Color', 'Candy Apple Red'),
             ('Year', 1965)])

Now, how do we access the values in a dictionary? In lists you can just refer to the index, but in dictionaries it doesn’t work that way.

Instead, we refer to dictionaries by their key:
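A minimal sketch of a key lookup, reusing the car dictionary from above (the variable name make is introduced here just for illustration):

```python
car = {'Make': 'Ford', 'Model': 'Mustang', 'Color': 'Candy Apple Red', 'Year': 1965}

make = car['Make']    # put the key in square brackets to get its value
print(make)           # Ford
print(car['Year'])    # 1965
```

Asking for a key that does not exist, such as car['Price'], raises a KeyError.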

You can add a key/value pair:
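A minimal sketch, reusing the car dictionary from above. The 'Engine' key and its value are made up for this example:

```python
car = {'Make': 'Ford', 'Model': 'Mustang', 'Color': 'Candy Apple Red', 'Year': 1965}

car['Engine'] = 'V8'   # assigning to a key that does not exist yet adds a new pair
print(car)
```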

You can change a dictionary value:
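A minimal sketch, reusing the car dictionary from above. The new color is made up for this example:

```python
car = {'Make': 'Ford', 'Model': 'Mustang', 'Color': 'Candy Apple Red', 'Year': 1965}

car['Color'] = 'Midnight Blue'   # assigning to an existing key overwrites its value
print(car['Color'])              # Midnight Blue
```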

To remove a key and its value from a dictionary, use the del statement:
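A minimal sketch, reusing the car dictionary from above:

```python
car = {'Make': 'Ford', 'Model': 'Mustang', 'Color': 'Candy Apple Red', 'Year': 1965}

del car['Color']   # removes the 'Color' key together with its value
print(car)         # {'Make': 'Ford', 'Model': 'Mustang', 'Year': 1965}
```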

Previous Lesson: Working with lists

Next Lesson: Tuples and Sets

Back to Python Course: Course

Python Testing Shell

You can use the shell below to try the Python code you learn in the lessons.

To run Python scripts, type them out in the code editor and press the run button to execute:

To run console commands (run each line from a command prompt), click the down arrow and then choose >_ Console

Return to main python page: Python

SQL: Rollback and Commit – undo mistakes in SQL

If you ever want to experience a heart attack, may I advise accidentally deleting a production table from a database. And even if you survive the heart attack, your job may not.

This is, IMHO, the single most important piece of SQL code you will ever learn. It is called Rollback and Commit.

What Rollback does is reverse any changes you have made to the database since the transaction started, allowing you to undelete the data you accidentally sent to the data afterworld.

Here is how it works. You start by letting the system know that you are making changes you may want to reverse. The syntax is as follows:

Start transaction;

delete from <table> where <column> = <Value>;

Now comes the life-saving part. If you realize you made a mistake, simply type Rollback;

If, on the other hand, you like the results, then type Commit;

The results will be saved and the transaction instance will be closed out.
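To try the pattern without touching a real server, here is a minimal sketch using Python's built-in sqlite3 module. This is an assumption on my part, not the setup used in this post: the staff table and its rows are made up, and SQLite spells the opening statement BEGIN rather than Start transaction. Rollback and Commit play the same roles.

```python
import sqlite3

# isolation_level=None lets us issue BEGIN / ROLLBACK / COMMIT ourselves
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE staff (name TEXT)")
conn.execute("INSERT INTO staff VALUES ('Ben'), ('Christine')")

def count_rows():
    """Return how many rows are currently in the staff table."""
    return conn.execute("SELECT COUNT(*) FROM staff").fetchone()[0]

conn.execute("BEGIN")                                  # start the transaction
conn.execute("DELETE FROM staff WHERE name = 'Ben'")
after_delete = count_rows()                            # the row is gone for now
conn.execute("ROLLBACK")                               # undo the delete
after_rollback = count_rows()                          # the row is back

conn.execute("BEGIN")
conn.execute("DELETE FROM staff WHERE name = 'Ben'")
conn.execute("COMMIT")                                 # make the delete permanent
after_commit = count_rows()

print(after_delete, after_rollback, after_commit)      # 1 2 1
```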

Now let's try adding a new row, using Start transaction.

If that is not what you want, just type Rollback; and your mistake is gone.

On the other hand, if you want to save the results, simply type Commit; and the new record will stay.
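A minimal sketch of that insert-then-decide flow, under stated assumptions: it uses Python's built-in sqlite3 module with a made-up staff table, and SQLite writes BEGIN where the text above writes Start transaction.

```python
import sqlite3

# isolation_level=None means we manage transactions by hand
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE staff (name TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO staff VALUES ('Ben')")
conn.execute("ROLLBACK")     # changed our mind: the new row disappears
names_after_rollback = [row[0] for row in conn.execute("SELECT name FROM staff")]

conn.execute("BEGIN")
conn.execute("INSERT INTO staff VALUES ('Ben')")
conn.execute("COMMIT")       # keep the new row
names_after_commit = [row[0] for row in conn.execute("SELECT name FROM staff")]

print(names_after_rollback, names_after_commit)   # [] ['Ben']
```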

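Important note: Rollback only works if you first run Start Transaction. It is two words that will save your job. Start every instance of adding or deleting data or objects with Start Transaction; trust me on this one. (Be aware that some databases, MySQL among them, automatically commit statements that create or drop objects such as tables, so those cannot be rolled back.)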

SQL: What is SQL and its 5 Subgroups DQL, DML, TCL, DDL, DCL?

SQL, short for Structured Query Language (often pronounced "sequel"), is a programming language designed to work with data stored in an RDBMS (relational database management system). The data managed by SQL is tabular, meaning it is formatted in rows and columns, very much like an Excel spreadsheet.

SQL is broken down into 5 sets of command groups:

DQL = Data Query Language

Querying (SELECT, FROM, etc)

DML = Data Manipulation Language

Data manipulation (INSERT, DELETE, etc)

TCL = Transaction Control Language

Transaction mgt. (COMMIT, ROLLBACK, etc)

DDL = Data Definition Language

Data definition (CREATE, DROP, etc)

DCL = Data Control Language

Data control (GRANT, REVOKE, etc)
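To see the groups side by side, here is a minimal sketch that runs one statement from each of four groups through Python's built-in sqlite3 module. The cars table is made up for illustration, and SQLite is my choice only because it needs no server:

```python
import sqlite3

# isolation_level=None lets us write the transaction statements ourselves
conn = sqlite3.connect(":memory:", isolation_level=None)

conn.execute("CREATE TABLE cars (make TEXT, year INTEGER)")    # DDL: define a table
conn.execute("BEGIN")                                           # TCL: open a transaction
conn.execute("INSERT INTO cars VALUES ('Ford', 1965)")          # DML: change the data
conn.execute("COMMIT")                                          # TCL: save the change
rows = conn.execute("SELECT make, year FROM cars").fetchall()   # DQL: read the data
print(rows)   # [('Ford', 1965)]

# DCL (GRANT, REVOKE) is not shown: SQLite has no user permissions to manage.
```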

Python: Installing Python, Hello World and learning to use Jupyter Notebooks

I want to start with the obvious: there are many different versions of Python and many platforms you can use to work with it, from the command line/terminal, to IDEs like Spyder or IDLE, to notebooks like Jupyter. For this course on Python, I have chosen to start with Jupyter Notebooks, because I feel this is a great tool for learning Python.

Installing Python

First things first. You need to get Python on your machine. In this series of lessons, I am going to use the Anaconda Distribution. I am choosing this distribution for one simple reason: it is built for data science, and most of the tools you will need for data analysis and machine learning come pre-loaded with Anaconda.

To install this distribution, go to the Anaconda website and install Python: Anaconda

Jupyter Notebook

Once installed, you will find Jupyter Notebook in your program list.

Jupyter will open in your default browser and should look something like this

Click on the New button and then Python 3

You will now have a new notebook that looks like this

Let's start with some very basic code. We will do the traditional Hello World exercise.

The code is as follows:

print("Hello World")

Use print() with the text you wish to print in quotes inside the parentheses.

After you type it into a notebook command box, click the arrow icon to run your code (or hit Shift+Enter)

Your results will appear at the bottom.

Click here for lesson 2: Arithmetic Operation, Comparators, and Variables

Back to Python Course: Course