
Maximizing Conversions and CTRs through A/B Testing and Chi-Squared Analysis


Chapter 1: Introduction to A/B Testing

A/B testing stands as one of the most applicable concepts in data science within professional settings. Despite its relevance, many still find it challenging to grasp fully due to its complexities.

For instance, numerous experimentalists rely on the t-test to assess whether significant differences exist between two options. But what happens when the distribution isn't presumed to follow a Gaussian shape? How do we proceed if the standard deviations of the two samples differ? What if the distribution remains entirely unknown?

In this article, I will explore a specific A/B testing technique that effectively compares click-through rates and conversion metrics.

Section 1.1: Defining A/B Testing

In its most straightforward form, A/B testing involves conducting an experiment with two variants to evaluate which one performs better based on a defined metric. Typically, two groups of consumers are exposed to two different versions of an item to determine if there are notable differences in metrics such as session counts, click-through rates, or conversion rates.

For example, we might randomly divide our customer base into two segments: a control group and a variant group. The variant group might see a red banner on our website, and we would analyze whether this change leads to a significant uptick in conversions. It’s crucial to ensure that all other factors remain constant during the A/B test.
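The random split described above can be sketched in a few lines. This is only an illustration: the `customers` DataFrame, its `user_id` column, the 50/50 split, and the seed are all assumptions, not part of the original experiment.

```python
import numpy as np
import pandas as pd

# Hypothetical customer base; real IDs would come from your own user table.
customers = pd.DataFrame({"user_id": np.arange(1, 1001)})

# A fixed seed keeps the assignment reproducible.
rng = np.random.default_rng(seed=42)

# Assign each customer to one of the two groups with equal probability.
customers["group"] = rng.choice(["control", "variant"], size=len(customers))

print(customers["group"].value_counts())
```

Because the assignment is random, the two groups should be similar in every respect other than the change being tested.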

From a more technical perspective, A/B testing is a form of two-sample hypothesis testing: a statistical method that compares two samples to determine whether the observed difference between them is statistically significant or could plausibly be due to chance.

Section 1.2: The Role of the Chi-Squared Test

What unites click-through rates and conversion rates? Both can be modeled using a Bernoulli Distribution, which is a discrete probability distribution that accounts for outcomes of 1 (success) and 0 (failure). In the case of click-through rates, a user either clicks (1) or does not click (0). Similarly, for conversions, a user either converts (1) or does not convert (0).
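To make the Bernoulli framing concrete, here is a minimal simulation sketch. The click probability `true_ctr`, the number of impressions, and the seed are made-up values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical true click-through rate.
true_ctr = 0.10

# Each impression is one Bernoulli trial: 1 = click, 0 = no click.
clicks = rng.binomial(n=1, p=true_ctr, size=10_000)

# The observed click-through rate is simply the mean of the 0/1 outcomes.
observed_ctr = clicks.mean()
print(f"observed CTR: {observed_ctr:.3f}")
```

The same representation works for conversions: replace "click" with "converted" and the mean becomes the conversion rate.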

As we are conducting an A/B test on conversions, a categorical variable following a Bernoulli distribution, the Chi-Squared Test becomes our tool of choice. The steps involved in performing a chi-squared test include:

  1. Calculating the chi-squared test statistic
  2. Determining the p-value
  3. Comparing the p-value to the significance level

This methodology will become clearer as we progress through the project example.
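As a rough sketch of steps 2 and 3, assuming the test statistic from step 1 is already known: the helper name `chi_squared_decision` and the example statistic of 3.2 are hypothetical, and a 2x2 table is assumed, which gives one degree of freedom.

```python
from scipy import stats


def chi_squared_decision(statistic, dof, alpha=0.05):
    """Return the p-value and whether the null hypothesis is rejected."""
    # Step 2: the p-value is the upper-tail probability of the chi-squared distribution.
    p_value = stats.chi2.sf(statistic, dof)
    # Step 3: reject the null hypothesis only if the p-value is below alpha.
    return p_value, bool(p_value < alpha)


# Hypothetical statistic for a 2x2 table (degrees of freedom = 1).
p_value, reject = chi_squared_decision(statistic=3.2, dof=1)
print(f"p-value: {p_value:.4f}, reject null: {reject}")
```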

Section 1.3: Calculating the Chi-Squared Test Statistic

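To compute the chi-squared test statistic, we sum (observed - expected)^2 / expected over every cell of the contingency table, where "observed" is the count actually recorded in a cell and "expected" is the count we would expect if both variants performed identically.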

Let's illustrate this with a scenario where we test two advertisements, A and B, to see if users clicked or didn't click on either ad. At the conclusion of the test, we gather the following data:

To analyze the results, we would perform four calculations and sum them:

  • Advertisement A, Click
  • Advertisement A, No Click
  • Advertisement B, Click
  • Advertisement B, No Click

For the first case (advertisement A, click), we find the observed value is 360. The expected value is calculated by multiplying the total number of times ad A was displayed by the overall click probability (total clicks across both ads divided by total impressions), resulting in an expected value of 31.429. After repeating this calculation for the other three cases, we plug the observed and expected values into the chi-squared formula and sum the four terms to arrive at our test statistic.
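The four calculations can be sketched by hand and checked against SciPy. The counts in `observed` are hypothetical and are not the figures from the advertisement example above; they only demonstrate the mechanics.

```python
import numpy as np
from scipy import stats

# Hypothetical counts: rows = ads A and B, columns = [click, no click].
observed = np.array([[120, 880],
                     [150, 850]])

# Expected count per cell = row total * column total / grand total.
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
grand_total = observed.sum()
expected = row_totals * col_totals / grand_total

# Sum (observed - expected)^2 / expected over the four cells.
chi2_manual = ((observed - expected) ** 2 / expected).sum()

# Cross-check against SciPy, without Yates' continuity correction.
chi2_scipy, p_value, dof, _ = stats.chi2_contingency(observed, correction=False)

print(f"manual: {chi2_manual:.4f}, scipy: {chi2_scipy:.4f}, p-value: {p_value:.4f}")
```

Note that the expected counts are on the same scale as the observed counts, since each is a row total multiplied by a proportion.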

Section 1.4: Understanding the Dataset

The dataset utilized for this A/B test is sourced from Kaggle (link to the dataset). It encompasses the results of an A/B test where two groups—the control group and the treatment group—were directed to an old webpage and a new webpage, respectively. Each row indicates a unique user and records whether they were part of the control or treatment group and whether they converted.

Chapter 2: A/B Testing Project Walkthrough


Importing Libraries and Data

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

# Load the A/B test results
df = pd.read_csv('../input/ab-testing/ab_data.csv')

As a first step, I imported the necessary libraries along with the dataset.

Section 2.1: Data Cleaning

Before carrying out the chi-squared test, I wanted to ensure the data quality. The first task was to verify whether any users in the control group had been shown the new webpage, and vice versa.

df.groupby(['group', 'landing_page']).count()

The output indicated that some control group members had viewed the new page and some in the treatment group had seen the old page. To address this, I decided to exclude these cases.
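One way to make such a check explicit is a cross-tabulation plus a direct count of mismatched rows. The sketch below uses a tiny made-up DataFrame with the same column names as the dataset; the values are hypothetical.

```python
import pandas as pd

# Tiny hypothetical sample that mimics the dataset's columns.
sample = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 5, 6],
    "group": ["control", "control", "treatment", "treatment", "control", "treatment"],
    "landing_page": ["old_page", "new_page", "new_page", "old_page", "old_page", "new_page"],
    "converted": [0, 1, 0, 0, 1, 1],
})

# Counts of every group / landing page combination.
counts = pd.crosstab(sample["group"], sample["landing_page"])

# Rows where the page shown does not match the page the group should see.
expected_page = sample["group"].map({"control": "old_page", "treatment": "new_page"})
mismatched = sample[sample["landing_page"] != expected_page]

print(counts)
print(f"mismatched rows: {len(mismatched)}")
```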

# Keep only rows where each group saw the page it was assigned
df_cleaned = df.loc[
    ((df['group'] == 'control') & (df['landing_page'] == 'old_page'))
    | ((df['group'] == 'treatment') & (df['landing_page'] == 'new_page'))
].copy()

After cleaning the dataset, I confirmed that the control group was limited to the old page and the treatment group to the new page. Next, I checked for duplicate user IDs.

df_cleaned['user_id'].duplicated().sum()
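If that count turns out to be non-zero, one possible follow-up is to keep a single row per user. This is a sketch on made-up data; keeping the first occurrence is an assumption, not necessarily what was done in this analysis.

```python
import pandas as pd

# Hypothetical sample in which user 2 appears twice.
sample = pd.DataFrame({
    "user_id": [1, 2, 2, 3],
    "group": ["control", "treatment", "treatment", "control"],
    "converted": [0, 1, 1, 0],
})

# Number of rows whose user_id has already appeared earlier.
n_dupes = sample["user_id"].duplicated().sum()

# Keep only the first row for each user.
deduped = sample.drop_duplicates(subset="user_id", keep="first")

print(f"duplicates found: {n_dupes}, rows after dedup: {len(deduped)}")
```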

Section 2.2: Exploratory Data Analysis

Once the dataset was cleaned, I sought to gain insights through visualizations. I plotted a bar chart to compare the number of converted and non-converted users in each group.

groups = df_cleaned.groupby(['group', 'landing_page', 'converted']).size()

groups.plot.bar()

Additionally, I used a pie chart to evaluate the distribution of users across both groups.

df_cleaned['landing_page'].value_counts().plot.pie()

Section 2.3: Preparing Data for the Chi-Squared Test

With the data cleaned and understood, I prepared it for the chi-squared analysis. The following code organizes the data into a 2x2 format suitable for the test.

# 1) Split the groups into two separate DataFrames
a = df_cleaned[df_cleaned['group'] == 'control']
b = df_cleaned[df_cleaned['group'] == 'treatment']

# 2) Count conversions ("click") and non-conversions ("noclick") per group
a_click = a['converted'].sum()
a_noclick = a['converted'].size - a_click
b_click = b['converted'].sum()
b_noclick = b['converted'].size - b_click

# 3) Build the 2x2 table: rows = groups, columns = [converted, not converted]
T = np.array([[a_click, a_noclick], [b_click, b_noclick]])
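As an alternative sketch, the same 2x2 layout can be produced with `pd.crosstab`. The data here is a small made-up sample, and `T_alt` is a hypothetical name used to avoid confusion with `T` above.

```python
import pandas as pd

# Hypothetical sample with the same columns used in the analysis.
sample = pd.DataFrame({
    "group": ["control"] * 4 + ["treatment"] * 4,
    "converted": [1, 0, 0, 0, 1, 1, 0, 0],
})

# Rows = groups, columns = converted flag (0 or 1).
table = pd.crosstab(sample["group"], sample["converted"])

# Reorder the columns to [converted, not converted] to match the layout of T.
table = table[[1, 0]]

T_alt = table.to_numpy()
print(T_alt)
```

The column order does not affect the chi-squared result, but keeping it consistent makes the table easier to read.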

Section 2.4: Conducting the Chi-Squared Test

Once the data was structured properly, I could conduct the chi-squared test using the SciPy library.

from scipy import stats

# correction=False disables Yates' continuity correction; index 1 of the result is the p-value
print(stats.chi2_contingency(T, correction=False)[1])

The calculated p-value was 19%. Since this is above the 5% significance level, we fail to reject the null hypothesis: the data show no statistically significant difference in conversions between the old and new webpages.


Conclusion

To sanity-check the result, I calculated the conversion rates for both groups. The difference between them was minimal, which is consistent with the outcome of the chi-squared test.
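The per-group rate comparison mentioned above could be computed along these lines. The sample values are made up for illustration and do not reproduce the dataset's actual rates.

```python
import pandas as pd

# Hypothetical sample: 5 users per group.
sample = pd.DataFrame({
    "group": ["control"] * 5 + ["treatment"] * 5,
    "converted": [1, 0, 0, 0, 0, 1, 1, 0, 0, 0],
})

# The mean of a 0/1 column is the conversion rate.
rates = sample.groupby("group")["converted"].mean()

# Absolute difference between the treatment and control rates.
difference = rates["treatment"] - rates["control"]

print(rates)
print(f"difference: {difference:.3f}")
```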

Thank you for reading! I hope this tutorial has provided valuable insights into A/B testing and its application in maximizing conversions.

-- Terence Shin

Founder of ShinTwin | Connect with me on LinkedIn | Explore my Project Portfolio
