How are effects of online A/B tests distributed? How often are they not significant? Does achieving significance guarantee meaningful business impact?
We answer these questions in our new paper, “False Discovery in A/B Testing”, recently out in Management Science >>
The paper is co-authored with Christophe Van den Bulte and analyzes over 2,700 online A/B tests that were run on the
@Optimizely
platform by more than 1,300 experimenters.
Link to paper:
Non paywalled:
>>
Generally, firms should try to test more radical variations in order to have a larger impact on consumer behavior (we call this “swing for the fences”). Most variations will probably not be very impactful, but once in a while an innovation will prove to be very lucrative.
(Fin)
@Jeffely
Indeed the number of adults fully vaccinated in Israel has been quoted at 85% by officials.
This is the age distribution of those who tested positive in the last month. The majority are from the last two weeks. (Green: women, blue: men; the y-axis is age group.)
Our analysis shows that in our data, hypothesis tests conducted with alpha=0.05 yield an FDR of 18%-25%.
Much higher than 5%.
That is, about 20% of significant effects chosen for implementation will not generate the business impact that was observed in the experiment.
>>
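A quick Monte Carlo sketch of why significant results can be false discoveries far more often than 5% of the time. Everything here is a simplifying assumption rather than the paper's model: each test is a two-sided z-test, 70% of effects are true nulls (the share reported in this thread), and every non-null effect has the same hypothetical expected z-statistic `effect_z`.

```python
import random
from statistics import NormalDist


def simulate_fdr(n_tests=20000, null_share=0.7, effect_z=2.0, alpha=0.05, seed=0):
    """Share of significant results that come from true nulls (simulated)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    significant = 0
    false_discoveries = 0
    for _ in range(n_tests):
        is_null = rng.random() < null_share
        # Hypothetical: non-null effects all share one expected z-statistic.
        mean = 0.0 if is_null else effect_z
        z = rng.gauss(mean, 1.0)
        if abs(z) > z_crit:
            significant += 1
            false_discoveries += is_null  # True counts as 1
    return false_discoveries / significant


fdr = simulate_fdr()
print(f"Simulated FDR at alpha=0.05: {fdr:.3f}")
```

The exact number depends entirely on the assumed `null_share` and `effect_z`; the point is only that the simulated FDR lands well above alpha.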
@mike_luca
@garjoh_canuck
@ronnyk
Yes. Just pick the outcome with the highest mean (Test & Roll) and you can use Latent Stratification if you really care about significance. Sometimes it helps.
On T&R:
On LS:
We then classify effects into “null” (zero) and “non-null” (pos or neg), to understand how many experiments, on average, have an underlying zero effect.
The answer is about 70%.
That is, 70% of effects will not show any impact on Engagement compared to a baseline.
>>
Traveling with a group of Penn Faculty on a Mission to Israel to build bridges with the Israeli academic community and bear witness to the impact of the Hamas attacks.
First visit since Oct 7.
Deep thoughts going through my mind rn.
#penn
#upenn
#pennfaculty
#mission
#israel
Been teaching digital marketing
@Wharton
for over 7 years.
Have had multiple students reach hundreds of thousands of social media followers in projects.
Still can't describe the (happy childish) feeling of this junior professor who reached 2,000 followers today.
Hi All! >>
First, we analyze the effects of all the A/B tests in our data. They are quite small. The median (and average) webpage variation has roughly zero effect on webpage Engagement.
But the distribution is quite long-tailed with some variations showing big effects.
>>
🚨Pre-doc Announcement🚨
@HummySong
and I are hiring a research associate to start in Summer 2022.
Interested in marketing analytics, causal inference, health economics and healthcare operations mgmt?
Come work with us
@Wharton
!
Details:
Please RT!
First, I coincidentally grab the table that a Nobel prize winner was eyeing at a coffeeshop.
Then, another Nobel prize winner shows up to my seminar talk.
Talk about a stressful day.
#stanford
Cue TWFE DiD, synthetic controls and whatnot.
Special quirk - our retailers have different time trends and potentially endogenous adoption timing.
Solution was a combination of
@jmwooldridge
’s POLS regression,
@ArkhangelskyD
et al. SynthDiD, and an IV.
First on the agenda - a meeting with the president of Israel, Mr. Isaac Herzog.
"You will meet a nation agonizing. An atomic bomb of emotions" the president told us.
#penn
#upenn
#pennfaculty
#mission
#israel
A big draw of the paper is that
@Optimizely
have graciously allowed us to publish the data we used in the analysis. We hope this will be valuable to other researchers as well.
>>
@shimrizzz
told us: "As a student your obligation is to always ask questions and not take anything for granted. Don't believe even me, ask questions and you'll see what evil has happened here and should never happen again."
Read more about this quote below
>>
alpha is the false positive rate (FPR), or
Pr(effect is significant | effect is null).
We care about the opposite,
Pr(effect is null | effect is significant),
which is called the false discovery rate (FDR).
>>
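The gap between the two conditional probabilities follows from Bayes' rule. A minimal sketch: the function name and the 50% power figure are hypothetical illustrations, while the 70% null share is the figure quoted in this thread.

```python
def false_discovery_rate(alpha: float, power: float, null_share: float) -> float:
    """Pr(effect is null | effect is significant), via Bayes' rule.

    alpha      = Pr(significant | null), the false positive rate
    power      = Pr(significant | non-null)
    null_share = Pr(null), the share of true-null effects
    """
    false_positives = alpha * null_share          # significant and null
    true_positives = power * (1.0 - null_share)   # significant and non-null
    return false_positives / (false_positives + true_positives)


# Hypothetical inputs: 70% true nulls and 50% power (power is an assumption).
fdr = false_discovery_rate(alpha=0.05, power=0.5, null_share=0.7)
print(f"FDR = {fdr:.3f}")  # prints "FDR = 0.189"
```

Holding alpha fixed, the FDR rises as the share of true nulls grows or as power falls, which is why alpha alone says little about how many "wins" are real.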
We use multiple methods to estimate the rate of true nulls and the FDR, but one was particularly fun to learn about, as it was developed to estimate false discoveries in genomewide studies. You can read about it here: .
>>
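The thread does not name the method. One well-known estimator from the genome-wide testing literature is Storey's estimate of the share of true nulls, sketched below as an assumption about the kind of approach meant, not as the paper's implementation. The idea: p-values from true nulls are uniform on [0, 1], so p-values above a cutoff `lam` come mostly from nulls. The synthetic data and all parameter values are hypothetical.

```python
import random
from statistics import NormalDist


def storey_pi0(p_values, lam=0.5):
    """Storey-type estimate of the true-null share: #{p > lam} / (m * (1 - lam))."""
    m = len(p_values)
    above = sum(p > lam for p in p_values)
    return min(1.0, above / (m * (1.0 - lam)))


# Synthetic two-sided p-values: 70% true nulls, the rest with a strong effect.
rng = random.Random(1)
nd = NormalDist()
p_values = []
for _ in range(20000):
    mean = 0.0 if rng.random() < 0.7 else 3.0  # hypothetical expected z-statistic
    z = rng.gauss(mean, 1.0)
    p_values.append(2 * (1 - nd.cdf(abs(z))))

pi0_hat = storey_pi0(p_values)
print(f"Estimated share of true nulls: {pi0_hat:.3f}")
```

The estimate is conservative: any non-null p-values that land above `lam` push it upward, so it tends to overstate the null share slightly.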
I know we always complain about reviewers, but dear authors, if you submit a paper which is over 45 pages and less than 1.5 spacing (on initial submission!) you are not helping the paper...
@HughJassole96
@JewishWonk
"did not happen in a vacuum" is like saying "but I can understand it", as in "I denounce the Holocaust, but it did not happen in a vacuum".
It's ok to say some things are just not ok. Ever.
Have to hand it to
#Stanford
.
Among university expense reimbursement processes, they are by far the easiest and most modern.
Not only do they transfer the money using Zelle (and not, e.g., send a check), but they also collect feedback afterwards!
#niche
I learned to make empanadas from my mom when I was young, but we always bought the dough.
So I thought it's really hard to make, but my sister insisted it's easy and sent recipes.
So today's weekend experiment was a success!
#argentino
#vamoscarajo
Our paper estimates the business costs of these false discoveries, and discusses and tests possible solutions that firms can implement. The details are a bit beyond the scope of this thread, but we hope that the paper with the accompanying data and code will prove useful.
>>
Luckily, all these methods showed the same converging results, which are that adopting the descriptive dashboard yields an increase in revenues, diversity of products sold, number of repeat customers and number of transactions.
As usual, causality disclaimers apply.
>>
By comparing the results of dashboard users to non-users, we see that only users reap the benefit from the dashboard.
This allows us to rule out improved performance due to an unobserved and unrelated mechanism.
>>
Often I find it hard to track conference deadlines and plan academic year travel.
Here's a live list of CfPs relevant to quant marketers.
Did I miss anything? Let me know...
#MarketingAcad
h/t:
@eleafeit
@dade_us
@AnikoOery
Why are descriptive analytics so popular then?
Although they often leave users to generate their own insights, they provide a simple way to assess different decisions, enabling managers to extend the range of actions they can take and to integrate new technologies.
>>
You can learn a lot about consumer preferences in these times.
Clearly Trader Joe's customers didn't yet realize that Bamba is superior to anything else, especially in defeating plagues.
Also, why not smoked fish for a quarantine? Delish!
@TaylorLorenz
I teach a pretty serious (and popular) course on digital marketing
@Wharton
and your reporting is tremendously helpful in analyzing case studies and learning how phenomena spread.
Please ignore the haters and keep up the great work...
Time to celebrate! I just learned that the great
@eleafeit
recently got tenure at
@DrexelUniv
@LeBow
. She's a world-class scholar and I'm so happy for her!
@haneenshib
I have a friend living in the US who, whenever he calls customer service for something (say, an airline or a bank), simply hangs up (without saying goodbye) and tries again if his issue isn't resolved within a few minutes.
He found that it's easier to land at random on someone who wants to help than to explain yourself to an American who doesn't understand anything...
Four in the morning. Sitting at the airport in Singapore on the way back to Philadelphia.
On TV, CNN is broadcasting images from Tel Aviv of a country whose government is tearing it apart.
And I feel sad, with a sense of suffocation and helplessness.
We used data from over 1,500 small and medium ecommerce global sellers (with mostly Shopify stores) with average monthly revenues of ~$60K.
Every retailer adopted an analytics dashboard that displayed KPIs such as weekly sales, avg basket size, conversion rate etc.
>>
Visiting communities in southern Israel to bear witness to the aftermath of the Oct 7 Hamas attacks.
This might be our most difficult and emotional visit on this trip.
Last time I was here was about 20 years ago. It also required bullet proof vests...
My dream academic job: freelance scientific “closer.” You hire me to do skilled, unemotional revision work when you’ve completely lost the will to continue with a project, and I never have to start one.
The concept of a true null is somewhat subtle (as we explain in the paper), since we often assume that true-nulls don’t exist.
However, as long as we test for a null hypothesis of a true null using significance testing, there is no reason to assume the null cannot be true.
>>
Are social media algorithms to blame for filter bubbles, online polarization and the shallowness of online content?
The answer is maybe, but probably not. All based on a recent paper with
@zsoltkatona
: “Curation Algorithms and Filter Bubbles in Social Networks”.
[Thread]
On New Year's Day, 2024, as war waged in Israel and Gaza, a group of professors from the University of Pennsylvania boarded an El Al flight to Tel Aviv. Representing fields as diverse as statistics, film, orthopedics, and law, they brought messages of friendship and support.
>>
Why are so many university leaders hesitating to condemn these atrocities? Because they are scared of the very extreme "social justice" warriors who lack any sense of critical thinking. Those who claim that responsibility lies on "both sides" and who support Hamas actions
Academics are one of the biggest groups using the
#TwitterAPI
to research what’s happening. Their work helps make the world (& Twitter) a better place, and now more than ever, we must enable more of it.
Introducing 🥁 the Academic Research product track!
Updating and expanding the list of Calls for Papers for quant marketing related conferences.
Any CS conferences that you think are a good fit?
@gzervas
@dade_us
@_ajbc
37 behavioral scientists designed a
23 condition megastudy testing different sets of
1-2 text messages to boost vaccinations among
689,693
@Walmart
pharmacy customers
430 forecasters tried to predict what worked
NOW our results are out in
@PNASNews
... 🧵
Instead, descriptive analytics serve to help retailers monitor additional marketing technologies (martech) and amplify their value.
Most retailers adopt additional technologies, but only the retailers that use the dashboard are able to benefit from them.
>>
@deaneckles
@causalinf
@eleafeit
and I have, in R, with both MLE and Stan models. If you can assume monotonicity (no defiers), you get upper and lower bounds with simple averages. For more identification, more assumptions are needed. Good reference, IMO:
A big logistical challenge with the Pfizer vaccine is apparently the shipment size (975 per tray). Once defrosted, they go bad quickly.
In Israel that has been the largest hurdle with distribution to small non-dense locations.
When the retailer adopted the dashboard, the dashboard’s provider also collected historical data, so we see performance before and after the adoption.
Because retailers adopted the dashboard at different times, the adoption is staggered, and you know what that means… 🙀🙀🙀
>>
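For readers new to staggered adoption, here is a minimal sketch of a two-way fixed effects (TWFE) difference-in-differences estimate on a balanced synthetic panel. It is not the estimator used in the paper. The data, the function name `twfe_did` and all parameter values are hypothetical, and the simulation assumes a constant treatment effect and a common time trend. TWFE is known to be unreliable under staggered adoption when effects vary across units or over time, which is what the 🙀 is about.

```python
import random


def twfe_did(y, d):
    """TWFE DiD slope for a balanced panel.

    y[i][t] is the outcome and d[i][t] the 0/1 adoption indicator for
    unit i in period t. Both are demeaned by unit and by period, then
    the slope of demeaned y on demeaned d is computed by OLS.
    """
    n, T = len(y), len(y[0])

    def demean(x):
        unit = [sum(row) / T for row in x]
        period = [sum(x[i][t] for i in range(n)) / n for t in range(T)]
        grand = sum(unit) / n
        return [[x[i][t] - unit[i] - period[t] + grand for t in range(T)]
                for i in range(n)]

    yd, dd = demean(y), demean(d)
    num = sum(yd[i][t] * dd[i][t] for i in range(n) for t in range(T))
    den = sum(dd[i][t] ** 2 for i in range(n) for t in range(T))
    return num / den


# Synthetic panel: 200 units, 12 periods, adoption dates spread over time.
rng = random.Random(0)
n, T, true_effect = 200, 12, 1.5
y, d = [], []
for i in range(n):
    adopt = rng.randint(3, 10)        # first treated period for unit i
    unit_fe = rng.gauss(0, 1)         # unit-specific level
    d.append([1 if t >= adopt else 0 for t in range(T)])
    y.append([unit_fe + 0.2 * t + true_effect * d[i][t] + rng.gauss(0, 0.5)
              for t in range(T)])

estimate = twfe_did(y, d)
print(f"TWFE estimate: {estimate:.2f} (true effect: {true_effect})")
```

With unit-specific trends or endogenous adoption timing, as described in this thread, this simple estimator would no longer be enough.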
A very interesting analysis on the troubling news from Israel that the vaccine might seem ineffective against Delta.
Once the variant started spreading more evenly (unfortunately) we're starting to see how great the vaccine is.
Vaccine impact in Israel: changing trends. Analysis thread with
@AArgoetti
.
From about a month and a half ago we are experiencing a new “Delta” burst. Cases are accumulating, and more and more severe patients are in the hospitals.
A statistically significant result generated by a true null effect is called a false discovery. Although we often think of the significance threshold we set for hypothesis testing (e.g., alpha=0.05) as the rate of false discoveries, this is not actually what we get.
>>
Rightmost column - age group. Leftmost column - percent of age group vaccinated in Israel (first shot).
At current rate, the 60+ age group should be covered within 2-3 weeks.