Skip to content

Interfacing Python and MySQL

Why this matters

In the previous lesson, SQL and Pandas were two ways to ask questions about tabular data. This lesson adds the missing operational piece:

How does Python actually talk to a real database server?

MySQL is a relational database management system. Unlike a local DataFrame, it usually runs as a separate service. Python connects to it, sends SQL statements, receives results, and then either works with those results directly or turns them into Pandas DataFrames.

The source notebook uses classroom Docker containers for MySQL, including the Sakila and Employees sample databases. This guide focuses on the transferable workflow rather than the exact Docker setup.

Mental model

Think of Python as a database client.

Python code -> connection -> cursor/query -> MySQL server -> result rows

The main objects are:

  • connection: the live link to a database server
  • cursor: the object that sends SQL and fetches results
  • query result: rows returned by a SQL statement
  • DataFrame result: query rows loaded directly into Pandas

The practical question is:

Do I want low-level control over SQL execution, or do I want the query result as a DataFrame?

Use a cursor when you need explicit database actions. Use pandas.read_sql when your next step is analysis in Python.

Core ideas

  • MySQL is a database server; Python needs a driver to connect to it.
  • mysql-connector-python is one driver that lets Python talk to MySQL.
  • A connection stores host, user, password, port, and optionally the database name.
  • A cursor executes SQL statements and fetches result rows.
  • fetchall() consumes the result from the last executed query.
  • Data-changing statements should be committed if you want them persisted.
  • pandas.read_sql runs a query and returns the result as a DataFrame.
  • Close database connections when the work is done.
  • Avoid hardcoding real credentials in notebooks or source code.

Walkthrough

From SQLite to MySQL

SQLite is often just a local file. It is convenient for small projects, embedded apps, exercises, and simple local storage.

MySQL is a server-based database. That matters because you now care about:

  • where the server is running
  • which port it listens on
  • which user is allowed to connect
  • which database/schema you want
  • whether credentials and connections are handled safely

In the course notebook, MySQL runs in Docker containers such as sakila-mysql and employees-mysql. In another setup, the host might be localhost, a VM, or a remote database server.

Install a MySQL driver

Python does not speak MySQL by itself. It needs a driver.

python3 -m pip install mysql-connector-python

Then Python can import the connector:

import mysql.connector as mysql

Connect to the server

A minimal connection looks like this:

connection = mysql.connect(
    host="localhost",
    user="your_user",
    password="your_password",
    port=3306,
)

For the course Docker setup, the host and credentials are classroom-specific. In real projects, do not hardcode real passwords in notebooks. Use environment variables, a local config file that is not committed, or a secret manager.

Execute SQL with a cursor

The connection creates a cursor. The cursor sends SQL statements.

cursor = connection.cursor()
cursor.execute("SHOW DATABASES")

databases = cursor.fetchall()
print(databases)

execute() sends the SQL. fetchall() retrieves all rows from that query. After you fetch the rows, that result is consumed; if you need it again, run the query again.

Select a database

You can connect directly to a specific database:

db = mysql.connect(
    host="localhost",
    user="your_user",
    password="your_password",
    port=3306,
    database="dbpy",
)

After that, SQL statements operate inside dbpy unless they explicitly name another database.

Create a table and insert rows

A small teaching example:

cursor = db.cursor()

cursor.execute(
    "CREATE TABLE cats (owner VARCHAR(20), name VARCHAR(20), type SMALLINT)"
)

cursor.execute("INSERT INTO cats VALUES ('juan', 'darcy', 1)")
cursor.execute("INSERT INTO cats VALUES ('bob', 'lucky', 2)")
cursor.execute("INSERT INTO cats VALUES ('bob', 'oreo', 1)")

db.commit()

commit() matters. Many database drivers use transactions. Without a commit, your changes may not persist after the connection closes.

Query rows manually

cursor.execute("SELECT owner, name FROM cats")

for owner, name in cursor.fetchall():
    print(owner, name)

This style is useful when you want to control execution closely or when the result is small and simple.

Query into Pandas

If the next step is analysis, go directly to a DataFrame:

import pandas as pd

cat_counts = pd.read_sql(
    """
    SELECT owner, COUNT(*) AS number_of_cats
    FROM cats
    GROUP BY owner
    ORDER BY number_of_cats
    """,
    db,
)

Now cat_counts is a DataFrame. You can inspect, plot, join with other data, or continue analysis in Python.

Use SQL first for large databases

The Employees sample database in the notebook has millions of records. That is exactly where SQL should do the first reduction.

Good pattern:

hired_women = pd.read_sql(
    """
    SELECT *
    FROM employees
    WHERE gender = 'F'
      AND hire_date BETWEEN '1980-01-01' AND '1985-12-31'
    """,
    employees_connection,
)

The database filters first. Pandas receives only the relevant result.

Bad pattern:

all_employees = pd.read_sql("SELECT * FROM employees", employees_connection)

That may be fine for a tiny table, but it is the wrong reflex for large databases. Pull what you need, not everything that exists.

Join normalized data in SQL

The Sakila sample database models a DVD rental store. It is normalized, so useful questions often require joins across several tables.

Example question:

Which actor appears in the most films?

SELECT CONCAT(first_name, ' ', last_name) AS full_name,
       COUNT(*) AS film_count
FROM actor
LEFT JOIN film_actor USING (actor_id)
GROUP BY actor_id
ORDER BY film_count DESC
LIMIT 1;

This query is better done in SQL because the relationships live in the database schema. Let MySQL join and aggregate; then bring the compact result into Python.

Explained code examples

Safer connection helper

import os
import mysql.connector as mysql


def connect_to_mysql(database=None):
    return mysql.connect(
        host=os.environ.get("MYSQL_HOST", "localhost"),
        user=os.environ["MYSQL_USER"],
        password=os.environ["MYSQL_PASSWORD"],
        port=int(os.environ.get("MYSQL_PORT", "3306")),
        database=database,
    )

What this teaches:

  • credentials come from environment variables, not source code
  • database is optional, so the same helper can connect before or after choosing a schema
  • MYSQL_PORT defaults to 3306, the usual MySQL port

For a classroom Docker setup, environment variables can still hold the provided demo credentials.

Parameterized query

Avoid building SQL by string concatenation when values come from users or variables.

cursor = db.cursor()

cursor.execute(
    "SELECT owner, name FROM cats WHERE owner = %s",
    ("bob",),
)

rows = cursor.fetchall()

The %s placeholder is filled by the driver. This helps avoid SQL injection and quoting mistakes.

Close connections

cursor.close()
db.close()

Open connections consume resources. Closing them is part of the workflow, not cleanup trivia.

For larger scripts, prefer context managers or try / finally patterns so connections close even when an error happens.

Common traps

A connection and a cursor are the same thing.

The connection represents the live database session. The cursor executes SQL and fetches results inside that session.

If an insert works, it is automatically saved.

Not always. Use commit() for data-changing statements unless autocommit is deliberately enabled.

I should load the whole database into Pandas.

Usually no. Use SQL to filter, join, and aggregate first. Load the smaller result into Pandas.

Hardcoded notebook passwords are fine because they are local.

Demo credentials are okay for disposable classroom containers. Real credentials should not be committed.

fetchall() is harmless for any query.

fetchall() loads all returned rows into memory. For huge results, fetch in chunks or reduce the query first.

SQL strings are safe if they look simple.

Use parameterized queries for variable values. Do not concatenate user input into SQL.

Pandas replaces the cursor.

read_sql is convenient for analysis results, but cursors are still useful for database administration, DDL, inserts, updates, and explicit control.

Check yourself

What does a MySQL connector/driver do?

It lets Python open a connection to a MySQL server, send SQL statements, and receive results.

What is the role of a cursor?

A cursor executes SQL statements and fetches result rows from the database session.

When should you use pandas.read_sql?

When you want a SQL query result as a Pandas DataFrame for analysis.

Why should SQL filter a large database before Pandas sees it?

The database can reduce the data near where it is stored, avoiding unnecessary memory and transfer cost in Python.

Why is commit() important after inserts?

It persists data-changing operations in the database transaction.

What is wrong with hardcoding real database passwords in a notebook?

The credentials can leak through source control, screenshots, shared files, or logs. Use environment variables or another secret mechanism.

Why are parameterized queries safer than string concatenation?

The driver handles quoting and values separately from SQL structure, reducing SQL injection and formatting mistakes.

Source anchors

  • Source file: notebooks/Module2/02b-Interfacing Python and MySQL.ipynb
  • Source databases: classroom MySQL Docker containers, dbpy, Employees sample database, Sakila sample database
  • Key source concepts: MySQL as server DBMS, mysql-connector-python, connections, cursors, execute, fetchall, creating databases and tables, inserting rows, pandas.read_sql, Employees queries, Sakila joins, closing connections
  • Source image: study-guide/docs/assets/extracted/MySQL.png