Skip to content

Association Rule Mining

Why this matters

Association rule mining finds patterns of the form:

if a transaction contains X, it often also contains Y

The classic example is market basket analysis: customers who buy one product often buy another. But the same idea can apply to playlists, website clicks, medical co-occurrence, or any data where each record is a set of items.

The practical skill is not just running Apriori. The practical skill is interpreting support, confidence, and lift without fooling yourself.

Mental model

Imagine each shopping basket as a set:

transaction 1: beer, potato chips
transaction 2: wine, cheese
transaction 3: beer, wine, cheese

Association rule mining asks:

Which item combinations appear often enough, and which "if X then Y" rules are stronger than chance?

A rule is directional:

wine -> cheese

means:

Among transactions containing wine, cheese appears often.

It does not automatically mean wine causes cheese purchases.

Core ideas

  • A transaction is one basket, session, user profile, or event containing multiple items.
  • An itemset is a set of one or more items.
  • A frequent itemset appears in enough transactions to pass a support threshold.
  • An association rule has an antecedent and a consequent: antecedent -> consequent.
  • Support measures how common the itemset is overall.
  • Confidence measures how often the consequent appears when the antecedent appears.
  • Lift compares the rule against the baseline frequency of the consequent.
  • Apriori finds frequent itemsets by growing smaller frequent itemsets into larger candidates.
  • FP-growth avoids some Apriori cost by using a compressed frequent-pattern tree.
  • Thresholds are modeling choices; different support/confidence/lift values produce different rules.

Walkthrough

A tiny night-store example

Suppose we have 20 transactions from a small store:

transactions = [
    ("beer", "wine", "cheese"),
    ("beer", "potato chips"),
    ("eggs", "flour", "butter", "cheese"),
    ("eggs", "flour", "butter", "beer", "potato chips"),
    ("wine", "cheese"),
    ("potato chips",),
    ("eggs", "flour", "butter", "wine", "cheese"),
    ("eggs", "flour", "butter", "beer", "potato chips"),
    ("wine", "beer"),
    ("beer", "potato chips"),
    ("butter", "eggs"),
    ("beer", "potato chips"),
    ("flour", "eggs"),
    ("beer", "potato chips"),
    ("eggs", "flour", "butter", "wine", "cheese"),
    ("beer", "wine", "potato chips", "cheese"),
    ("wine", "cheese"),
    ("beer", "potato chips"),
    ("wine", "cheese"),
    ("beer", "potato chips"),
]

The visible patterns are already suggestive:

  • beer often appears with potato chips
  • wine often appears with cheese
  • eggs, flour, and butter often appear together

Association rule mining makes those patterns measurable.

Support: how common is the pattern?

Support asks:

In what fraction of all transactions does this itemset appear?

If wine and cheese appear together in 7 out of 20 transactions:

support(wine and cheese) = 7 / 20 = 0.35

High support means the pattern is common. Low support means the pattern may be too rare to care about, even if it looks strong.

Confidence: how reliable is the rule?

Confidence asks:

When the antecedent appears, how often does the consequent also appear?

For:

wine -> cheese

confidence is:

transactions with both wine and cheese / transactions with wine

If wine appears in 8 transactions and wine-with-cheese appears in 7:

confidence(wine -> cheese) = 7 / 8 = 0.875

That says: when wine appears, cheese appears 87.5% of the time.

Lift: is it stronger than the baseline?

Confidence alone can mislead. If cheese appears in almost every basket, then wine -> cheese might have high confidence simply because cheese is common.

Lift compares the rule confidence with the baseline support of the consequent:

lift(A -> B) = confidence(A -> B) / support(B)

Interpretation:

  • lift near 1: A and B appear together about as often as expected by chance
  • lift greater than 1: B is more likely when A appears
  • lift less than 1: B is less likely when A appears

Lift is often the most useful first check for whether a rule is interesting.

Why thresholds matter

Association rule mining can generate many rules. Thresholds decide what survives.

Common thresholds:

  • minimum support: remove rare patterns
  • minimum confidence: remove unreliable rules
  • minimum lift: remove rules that are not stronger than baseline
  • minimum rule length: require at least a certain number of items

Changing thresholds changes the story. A threshold is not just a technical parameter; it is part of the analysis.

Apriori: grow frequent itemsets

Apriori is based on a simple property:

If an itemset is frequent, all of its smaller subsets must also be frequent.

Example:

If {beer, potato chips} is frequent, then {beer} and {potato chips} must each be frequent too.

Apriori uses this to reduce the search space:

  1. Find frequent single items.
  2. Combine them into candidate pairs.
  3. Keep frequent pairs.
  4. Combine frequent pairs into larger candidates.
  5. Generate rules from frequent itemsets.

Apriori example

Why Apriori can be expensive

Apriori has two main costs:

  • it can generate many candidate itemsets
  • it repeatedly scans the transaction database to count support

This becomes expensive when there are many items, many transactions, or long baskets.

That is why the notebook introduces FP-growth as an alternative.

FP-growth: compress repeated patterns

FP-growth stands for Frequent Pattern Growth.

Instead of generating many candidates like Apriori, FP-growth builds a compressed structure called an FP-tree. The tree stores shared prefixes of transactions, which can reduce repeated scanning.

The study-level distinction:

Apriori: candidate generation and repeated counting
FP-growth: compressed pattern tree and recursive mining

Both are used to find frequent itemsets. FP-growth is often faster for larger or denser transaction data.

From baskets to recommendations

The LastFM exercise in the notebook asks how association rules could recommend artists.

Model each user as a transaction:

user 1: Radiohead, The National, Interpol
user 2: Radiohead, Arcade Fire, The National
user 3: Beyoncé, Rihanna, SZA

Then a rule like:

Radiohead and The National -> Interpol

could suggest Interpol to users who listen to Radiohead and The National but not Interpol.

This is not the only recommendation method, but it is an intuitive one.

Explained code examples

Run Apriori on transaction tuples

from efficient_apriori import apriori

transactions = [
    ("beer", "wine", "cheese"),
    ("beer", "potato chips"),
    ("wine", "cheese"),
    ("beer", "potato chips"),
]

itemsets, rules = apriori(
    transactions,
    min_support=0.25,
    min_confidence=0.5,
)

for rule in rules:
    print(rule)

What this teaches:

  • the input is a list of transactions
  • each transaction is a tuple or list of items
  • support and confidence thresholds filter the output
  • the result contains frequent itemsets and generated rules

Convert a basket DataFrame to transactions

Many basket datasets are stored as rows where each row contains purchased items and missing values.

transactions = []

for row in store_data.values:
    basket = [str(item) for item in row if str(item) != "nan"]
    transactions.append(basket)

What this teaches:

  • each row becomes one basket
  • missing cells are ignored
  • algorithms usually expect lists of item names, not a raw rectangular DataFrame

Read rule metrics

If a rule prints like:

{wine} -> {cheese} (conf: 0.875, supp: 0.350, lift: 2.188)

Read it as:

  • support 0.350: wine and cheese appear together in 35% of transactions
  • confidence 0.875: when wine appears, cheese appears 87.5% of the time
  • lift 2.188: cheese is more than twice as likely when wine appears than in a random basket

That is an actionable rule. A store might place wine and cheese near each other, bundle them, or test a promotion.

Common traps

High confidence means the rule is interesting.

Not always. If the consequent is already very common, confidence can be high without the antecedent adding much information. Check lift.

Association means causation.

A rule shows co-occurrence, not that one item causes the other.

Rare but perfect rules are always useful.

A rule with 100% confidence but very low support may be too rare to matter.

Thresholds are objective.

Support, confidence, and lift thresholds are analysis choices. Tune them based on the business question and dataset size.

Apriori output is the final answer.

Rules need interpretation, filtering, and domain judgment. Some rules are obvious, trivial, or operationally useless.

Only retail baskets count.

Any set-like record can work: listened artists per user, visited pages per session, diagnoses per patient, installed apps per device.

FP-growth finds different truths.

FP-growth is mainly a more efficient way to mine frequent patterns. The interpretation of support, confidence, and lift remains the same.

Check yourself

What is a transaction in association rule mining?

One record containing a set of items, such as a shopping basket, user listening history, or web session.

What does support measure?

How often an itemset appears across all transactions.

What does confidence measure?

How often the consequent appears among transactions that contain the antecedent.

Why is lift useful?

It compares the rule against the baseline frequency of the consequent, helping reveal whether the association is stronger than chance.

What does a lift near 1 mean?

The antecedent does not change the likelihood of the consequent much compared with baseline.

Why can Apriori be expensive?

It can generate many candidate itemsets and repeatedly scan the transaction database to count support.

What is the core advantage of FP-growth?

It mines frequent patterns using a compressed tree structure, reducing candidate generation and repeated scans.

How could association rules recommend music?

Treat each user's listened artists as a transaction, then recommend consequents from strong rules whose antecedents match the user's current artists.

Source anchors

  • Source file: notebooks/Module2/06a-Association Rule Mining.ipynb
  • Source datasets: notebooks/Module2/Data/store_data.csv, notebooks/Module2/Data/lastfm.csv
  • Key source concepts: market basket analysis, frequent itemsets, association rules, support, confidence, lift, Apriori, apyori, efficient-apriori, FP-growth, LastFM recommendation exercise
  • Source images: study-guide/docs/assets/extracted/associationrule.png, study-guide/docs/assets/extracted/exampleapriori.png, study-guide/docs/assets/extracted/aprioridata.jpg