Our Multi-Touch Attribution Was Wrong for 18 Months (DoWhy Showed Us Why)
Traditional attribution confuses correlation with causation. Here’s how to use DoWhy’s causal inference to find which channels actually drive conversions.
Last month, I watched a colleague at a Fortune 500 company confidently present their marketing attribution results to leadership. They had a beautiful dashboard showing that email campaigns drove 40% of conversions. The CMO nodded approvingly. Two weeks later, they paused email marketing for an unrelated reason, and conversions barely moved. The attribution model had been lying to them for months.
I’ve seen this story play out dozens of times. Traditional attribution models, whether last-touch, first-touch, or even fancy multi-touch approaches, share a fatal flaw: they confuse correlation with causation. And in a world where cookies are vanishing at an alarming rate, we need methods that actually work.
Most marketing teams are measuring shadows on the wall and calling them reality. Last-click attribution says paid search drove a conversion, but what if that user was already planning to buy after seeing your TV ad? Multi-touch models spread credit across channels, but they’re essentially guessing at the weights.
This DoWhy Python marketing attribution tutorial will show you a different path. We’re going to build a causal inference pipeline that measures true marketing channel effectiveness using DoWhy, Microsoft’s causal inference library. By the end, you’ll have code that runs refutation tests to validate your findings, not just cross-validation metrics that make you feel good while hiding systematic bias.
I’ll walk through a realistic e-commerce scenario where we need to measure true marketing channel effectiveness using causal methods that can actually handle the complexity. No cookies required.
The Attribution Problem: Confounders Hiding in Your Marketing Data
You’re an e-commerce company running campaigns across paid social, paid search, email, and display. Your data shows that users exposed to display ads convert at higher rates. Victory, right?
Not so fast. Users who see display ads are often those who visited your site before, because that’s how retargeting works. They were already warmer leads. The display ad didn’t cause the conversion; the prior intent did. That prior intent is what we call a confounder.
Here’s a simplified diagram of what’s really going on:
              ┌─────────────────┐
              │  Prior Intent   │
              │  (Confounder)   │
              └────────┬────────┘
                       │
       ┌───────────────┼───────────────┐
       ▼               ▼               ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│   Display   │ │   Search    │ │ Conversion  │
│  Exposure   │ │   Clicks    │ │             │
└──────┬──────┘ └──────┬──────┘ └──────▲──────┘
       │               │               │
       └───────────────┴───────────────┘
Traditional attribution can’t see the confounder. Causal inference methods can, and they adjust for it.
Other confounders lurking in marketing data include seasonality (you run more ads during high-intent periods), geographic targeting (you advertise more in high-income areas), and platform algorithms (ad networks show your ads to users likely to convert anyway, then take credit for it).
Building Your Causal DAG: Setting Up Marketing Attribution in Python
A causal DAG (Directed Acyclic Graph) is your map of how variables actually relate to each other. Getting this right is the single most important step. Honestly, it’s where domain knowledge matters more than fancy algorithms.
For our e-commerce case study, let’s define our variables:
- Treatment: Paid social ad exposure (we’ll focus on one channel first)
- Outcome: Revenue per user
- Observed confounders: User’s prior site visits, device type, day of week, geographic region
- Unobserved confounders: Brand awareness, competitor activity
Setting up a causal DAG for marketing attribution in Python looks like this:
from dowhy import CausalModel
import pandas as pd
import numpy as np
# Simulate realistic marketing data
np.random.seed(42)
n_users = 50000
# Confounder: prior engagement score (affects both treatment and outcome)
prior_engagement = np.random.beta(2, 5, n_users)
# Treatment: paid social exposure (more likely if prior engagement is high)
paid_social_prob = 0.2 + 0.4 * prior_engagement
paid_social_exposed = np.random.binomial(1, paid_social_prob)
# True causal effect: $15 per exposed user
true_effect = 15

# Outcome: revenue (affected by prior engagement AND treatment)
base_revenue = 50 + 100 * prior_engagement # confounded relationship
treatment_effect = true_effect * paid_social_exposed
noise = np.random.normal(0, 20, n_users)
revenue = base_revenue + treatment_effect + noise
# Create the analysis DataFrame
# (cast the treatment to bool: DoWhy's propensity-score estimators expect
# a binary treatment, and the bool dtype avoids version-specific quirks)
df = pd.DataFrame({
    'prior_engagement': prior_engagement,
    'paid_social': paid_social_exposed.astype(bool),
    'revenue': revenue,
    'device': np.random.choice(['mobile', 'desktop', 'tablet'], n_users),
    'day_of_week': np.random.randint(0, 7, n_users)
})
Now let’s define our causal graph:
# Define the causal DAG in DOT format
causal_graph = """
digraph {
    prior_engagement -> paid_social;
    prior_engagement -> revenue;
    paid_social -> revenue;
    device -> revenue;
    day_of_week -> paid_social;
}
"""

model = CausalModel(
    data=df,
    treatment='paid_social',
    outcome='revenue',
    graph=causal_graph
)
# Visualize the model
model.view_model()
Notice that day_of_week affects treatment (we might run more ads on weekends), but doesn’t directly cause revenue in this simplified model. In reality, your graph will be messier. That’s fine. Start with what you know and iterate.
DoWhy Implementation: From Raw Campaign Data to Causal Effects
Now for the core of this step-by-step DoWhy causal inference guide for marketers. DoWhy follows a four-step process: model, identify, estimate, refute.
We’ve already built the model. Let’s identify and estimate:
# Step 2: Identify causal effect
identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)
print(identified_estimand)
# Step 3: Estimate the causal effect using different methods

# Method 1: Propensity Score Matching
estimate_psm = model.estimate_effect(
    identified_estimand,
    method_name="backdoor.propensity_score_matching",
    target_units="ate"  # Average Treatment Effect
)
print(f"PSM Estimate: {estimate_psm.value:.2f}")

# Method 2: Linear Regression with controls
estimate_lr = model.estimate_effect(
    identified_estimand,
    method_name="backdoor.linear_regression"
)
print(f"Linear Regression Estimate: {estimate_lr.value:.2f}")

# Method 3: Inverse Propensity Weighting
estimate_ipw = model.estimate_effect(
    identified_estimand,
    method_name="backdoor.propensity_score_weighting"
)
print(f"IPW Estimate: {estimate_ipw.value:.2f}")
In my simulated data, these methods should recover something close to our true effect of $15. With real data, you won’t know the true effect. That’s exactly why the next section matters so much.
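Before moving on, it’s worth seeing what a naive comparison would have said about the same data. Here’s a quick sketch, using the simulated df from above, that computes the raw difference in mean revenue between exposed and unexposed users. Because prior engagement drives both exposure and revenue, the naive number comes out well above the $15 we baked in; that gap is exactly the bias traditional attribution reports as “lift.”
# Naive "attribution": raw difference in means, no confounder adjustment
naive_lift = (
    df.loc[df['paid_social'] == 1, 'revenue'].mean()
    - df.loc[df['paid_social'] == 0, 'revenue'].mean()
)
print(f"Naive difference in means: ${naive_lift:.2f}")  # inflated by confounding
print("True simulated effect:     $15.00")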
Refutation Tests: Validating Your Marketing Attribution Findings

Here’s where DoWhy really separates itself from the pack. Last year, I worked with a retail brand that had built an elaborate attribution system. Beautiful estimates, impressive dashboards. But when we ran refutation tests, their “causal effects” evaporated completely. They’d been optimizing against noise for six months.
Refutation tests are how you do causal inference for marketing attribution responsibly. They stress-test your estimates by asking: “Would my conclusions hold up under different assumptions?”
Let’s walk through the key tests:
# Refutation 1: Placebo Treatment
# Replace the real treatment with random noise - the effect should disappear
placebo_refute = model.refute_estimate(
    identified_estimand,
    estimate_psm,
    method_name="placebo_treatment_refuter",
    placebo_type="permute"
)
print(placebo_refute)

# Refutation 2: Random Common Cause
# Add a random confounder - the estimate should be stable
random_cause_refute = model.refute_estimate(
    identified_estimand,
    estimate_psm,
    method_name="random_common_cause"
)
print(random_cause_refute)

# Refutation 3: Data Subset Validation
# Re-estimate on random subsets - results should be consistent
subset_refute = model.refute_estimate(
    identified_estimand,
    estimate_psm,
    method_name="data_subset_refuter",
    subset_fraction=0.8,
    num_simulations=10
)
print(subset_refute)
What should you look for?
- Placebo treatment: The estimated effect should drop to near zero. If it doesn’t, your model is picking up spurious correlations.
- Random common cause: Adding random noise as a confounder shouldn’t change your estimate much. Big changes suggest sensitivity to unmeasured confounders.
- Data subset: Estimates should be stable across different random samples. High variance means you might need more data.
Teams that skip refutation often publish causal estimates that completely fall apart under scrutiny. Run these tests every time. No exceptions.
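If you’re going to run these tests every time, it helps to turn “what to look for” into code. Below is a minimal sketch of an automated check. It assumes the refuter results expose estimated_effect and new_effect attributes, which is how recent DoWhy versions report them, so verify against the version you’re running; the 25% tolerance is my own rule of thumb, not anything official.
def refutation_passes(refutation, placebo=False, tolerance=0.25):
    """Rough pass/fail check for a DoWhy refutation result.

    placebo=True  -> the new (placebo) effect should be near zero
    placebo=False -> the new effect should stay close to the original
    """
    original = refutation.estimated_effect
    new = refutation.new_effect
    if placebo:
        # Placebo effect should shrink to a small fraction of the original
        return abs(new) < tolerance * abs(original)
    # Otherwise the estimate should be stable within the tolerance
    return abs(new - original) < tolerance * abs(original)

print("Placebo OK:      ", refutation_passes(placebo_refute, placebo=True))
print("Random cause OK: ", refutation_passes(random_cause_refute))
print("Subset OK:       ", refutation_passes(subset_refute))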
Incremental Lift Calculation: Converting Causal Estimates to Dollars
Here’s the thing about CFOs: they don’t care about Average Treatment Effects. They care about dollars. So how do you translate causal estimates into an incremental lift measurement that finance will actually use?
def calculate_channel_roi(estimate, df, treatment_col, cost_per_exposure):
    """Convert a causal estimate to incremental revenue and ROI."""
    # Causal effect per exposed user
    incremental_revenue_per_user = estimate.value
    # Total exposed users
    exposed_users = df[treatment_col].sum()
    # Total incremental revenue attributable to the channel
    total_incremental_revenue = incremental_revenue_per_user * exposed_users
    # Total spend (assuming we know the cost per exposure)
    total_spend = cost_per_exposure * exposed_users
    # ROI calculation
    roi = (total_incremental_revenue - total_spend) / total_spend
    return {
        'incremental_revenue_per_user': incremental_revenue_per_user,
        'exposed_users': exposed_users,
        'total_incremental_revenue': total_incremental_revenue,
        'total_spend': total_spend,
        'roi': roi,
        'roas': total_incremental_revenue / total_spend
    }
# Example calculation
results = calculate_channel_roi(
    estimate=estimate_psm,
    df=df,
    treatment_col='paid_social',
    cost_per_exposure=2.50  # assumed cost of $2.50 per exposed user
)
print(f"Incremental Revenue per User: ${results['incremental_revenue_per_user']:.2f}")
print(f"Total Incremental Revenue: ${results['total_incremental_revenue']:,.2f}")
print(f"ROAS: {results['roas']:.1f}x")
Now you’ve got a way to measure incremental revenue by channel using DoWhy in a format that actually drives budget decisions.
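Extending this to every channel is mostly a loop. Here’s a rough sketch, assuming your DataFrame has one binary exposure column per channel plus a per-channel cost estimate; the email and display columns and the costs below are hypothetical, since our simulation only generated paid_social.
# Hypothetical channel columns and per-user costs, for illustration only
channels = {'paid_social': 2.50, 'email': 0.10, 'display': 1.20}

channel_results = {}
for channel, cost in channels.items():
    # One causal model per channel, sharing the same confounder structure
    channel_model = CausalModel(
        data=df,
        treatment=channel,
        outcome='revenue',
        common_causes=['prior_engagement']  # or pass a per-channel graph
    )
    estimand = channel_model.identify_effect(proceed_when_unidentifiable=True)
    estimate = channel_model.estimate_effect(
        estimand, method_name="backdoor.propensity_score_matching"
    )
    channel_results[channel] = calculate_channel_roi(
        estimate=estimate, df=df, treatment_col=channel, cost_per_exposure=cost
    )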
DoWhy vs. CausalImpact: Choosing Your Tool
People constantly ask me when to use DoWhy versus Google’s CausalImpact (or its Python port). Here’s my take:
Use DoWhy when:
- You have cross-sectional or panel data at the user level
- You can identify confounders and specify a causal graph
- You want to measure the effects of ongoing campaigns
- You need multiple estimation methods and refutation tests
Use CausalImpact when:
- You have time series data at the aggregate level
- You’re measuring the impact of a discrete event (campaign launch, market entry)
- You have good pre-period data for building a synthetic control
- You don’t have a user-level treatment assignment
In practice, many analysts use both: CausalImpact for “Did this campaign launch move the needle overall?” and DoWhy for “What’s the causal effect of exposure at the user level?” They answer different questions.
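To make the contrast concrete, here’s roughly what the CausalImpact side looks like using the Python port (pip install pycausalimpact). This is a sketch on simulated daily data, and the API can differ slightly between ports, so check the one you install.
import numpy as np
import pandas as pd
from causalimpact import CausalImpact

# Simulate 90 days of aggregate data: a control market plus an outcome
# that tracks it, with a +10/day lift after the (hypothetical) launch
rng = np.random.default_rng(0)
control = 100 + rng.normal(0, 5, 90)
outcome = 0.8 * control + rng.normal(0, 3, 90)
outcome[45:] += 10  # campaign launches on day 45

ts = pd.DataFrame(
    {'conversions': outcome, 'control_market': control},
    index=pd.date_range('2024-01-01', periods=90)
)

pre_period = ['2024-01-01', '2024-02-14']   # days 0-44, before launch
post_period = ['2024-02-15', '2024-03-30']  # days 45-89, after launch

ci = CausalImpact(ts, pre_period, post_period)
print(ci.summary())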
For marketing attribution without cookies, causal methods like these are becoming the only reliable option. You’re measuring actual causal relationships, not relying on tracking that’s increasingly blocked or deprecated.
Putting It All Together
Let’s recap what we’ve built in this DoWhy Python marketing attribution tutorial:
- A causal DAG that maps how your marketing channels actually affect revenue
- Multiple estimation methods that control for confounders
- Refutation tests that validate your findings aren’t spurious
- A framework for converting causal estimates to incremental revenue
To deploy this in production, here’s what I recommend:
Start small. Pick one channel and one outcome metric. Get stakeholder buy-in on the causal graph before running any code. Too many projects fail because the data scientist built something brilliant that marketing didn’t trust.
Run continuously. Causal effects drift over time. What worked last quarter might not work now. Build pipelines that re-estimate effects monthly.
Communicate uncertainty. Always report confidence intervals, not just point estimates. And explain the refutation test results to stakeholders, even if they glaze over slightly.
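In DoWhy, one way to get those intervals is to ask for them at estimation time. A minimal sketch, assuming a recent DoWhy version where estimate_effect accepts confidence_intervals=True and the returned estimate exposes get_confidence_intervals():
# Re-estimate with confidence intervals (bootstrap-based for most methods)
estimate_with_ci = model.estimate_effect(
    identified_estimand,
    method_name="backdoor.linear_regression",
    confidence_intervals=True
)
print(f"Point estimate: ${estimate_with_ci.value:.2f}")
print("95% CI:", estimate_with_ci.get_confidence_intervals())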
Iterate on your DAG. Your first causal graph will be wrong. That’s fine. As you learn more about your business, update the graph and re-run the analysis.
What I love about doing causal inference for marketing in Python is that it forces you to be explicit about your assumptions. Traditional attribution hides assumptions in black boxes. Causal inference puts them front and center, where they can be debated, tested, and improved.
Here’s my challenge to you: pick one marketing channel this week and build a simple causal model. Run the refutation tests. Compare what you find to what your current attribution says. I’m betting you’ll discover that at least one of your “top performing” channels isn’t performing at all.