If you are curious about when neural nets can perform causal inferences, & more fundamentally, how neural & causal models are related, check out: “The Causal-Neural Connection: Expressiveness, Learnability, and Inference”: (with K Xia, K Lee, Y Bengio)
Interested in Causal Inference & Reinforcement Learning? Consider attending my
@icmlconf
tutorial on the basic principles & tools of Causal Reinforcement Learning (CRL). I’ll discuss many new & pervasive learning challenges/opportunities within CRL. Link:
As promised, here is the video of my talk last week
@MIT
on causal inference, generalizability, & fusion: . The lecture includes a summary of recent advances in the field. Thanks,
@VC31415
et al. for the intellectually stimulating discussions.
@mitidss
The WHY-21 workshop "Causal Inference & Machine Learning: Why now?" is currently accepting submissions, . Our goal is to bring CI & ML researchers together to discuss next-generation AI! (joint w/
@yudapearl
, Y. Bengio, T. Sejnowski,
@bschoelkopf
)
@NeurIPSConf
1/5 If you are interested in causal inference & machine learning, I am excited to share some of the latest work of the CausalAI Lab that will appear
@ICMLconf
this year. We welcome you to stop by and chat during the sessions or at some point during the conference…
#ICML2022
The WHY'21 workshop "Causal Inference & Machine Learning: Why now?" will take place this Monday at
#NeurIPS2021
. Our goal is to bring CI & ML researchers together to discuss next-generation AI! Program (joint w/
@yudapearl
,
@bschoelkopf
, Y Bengio, T Sejnowski)
1/2 Thanks to the 5k new followers in this past year! To celebrate, two announcements -- First, we are beta-testing a tool called ‘Fusion’, which offers an easy-to-use way of doing causal inference from 1st principles (see
#bookofWHY
). Subscribe here: .
1/6 If you are interested in causal inference & machine learning, I am excited to share some of the latest work of the causal artificial intelligence group that will appear
@NeurIPSConf
this year. We welcome you to stop by and chat with us at the following times…
#NeurIPS2021
1/n Dear friends, I am pleased to announce that after four productive years here at Purdue, I have decided to move to Columbia University, starting this July 1st. I am excited about the new possibilities and adventures that we’ll have in causality-land in the coming years, both in
Our work on general identifiability was just selected for the UAI-19 Best Paper Award (1 out of 450)! The paper provides the conditions necessary to reason with experimental distributions; e.g., one can combine two distributions, do(X1) & do(X2), to answer questions about the joint effect, do(X1, X2).
UAI 2019 best paper award to research on Causal Inference: "General Identifiability with Arbitrary Surrogate Experiments" by Sanghack Lee, Juan D. Correa and Elias Bareinboim (
@eliasbareinboim
).
#UAI2019
.
Hi Judea, thank you for your kindness and for sharing your wisdom & friendship during this long journey.
I feel humbled and blessed to have the opportunity to study & research the foundations of causal inference and AI with you.
Congratulations to
@eliasbareinboim
for getting tenure at Columbia University. Simultaneously, congratulations go to Columbia University for securing its leadership in next-generation AI.
This advancement strengthens causal inference research with an important academic
1/2
@jonathanrlarkin
Hey Jonathan, cool to see you interested in causality! :)
I would say that the starting point for causal researchers is that we are NOT useful in tasks that reside in layer 1 (L1) of Pearl’s hierarchy, i.e., purely predictive tasks (). I believe the
1/3 Just made available the video & slides for my
#ICML2020
“Causal Reinforcement Learning” (CRL) tutorial: . CRL combines the strengths of Causal Inference(CI) & RL to solve novel/practical decision-making problems that neither approach can tackle alone!
1/6 If you are interested in causal inference & machine learning, I am excited to share some of the latest work of the causal artificial intelligence group that will appear
@NeurIPSConf
this year. We welcome you to stop by and chat with us at the following times…
#NeurIPS2020
The call for papers of our causal inference and machine learning symposium is out -- "Beyond Curve Fitting: Causation, Counterfactuals, and Imagination-based AI," which will happen this spring at Stanford. Details:
@yudapearl
@bschoelkopf
@CsabaSzepesvari
If you are in Boston and curious about the relationship between causal inference & reinforcement learning, which I have been calling "causal reinforcement learning", I'll be talking tomorrow morning at MIT Samberg center, 6th Floor. For more details, see
The slides of Judea's talk - "The Foundations of Causal Inference, with Reflections on ML and AI" - this week at the
#WHY19
are available, see ; more coming soon.
Just made available the video of my 2020 MSR's Frontiers of Machine Learning talk - "On the Causal Foundations of AI", link: . To understand some of the results (incl. PCH & CHT), see joint chapter w/ J. Correa, D. Ibeling, T. Icard:
The WHY-19 will be happening from Mar/25-27 @ Stanford. The theme this year is "Beyond Curve Fitting: Causation, Counterfactuals, Imagination-based AI". We have great speakers, including J. Pearl, Y. Bengio, K. Imai, J. Ioannidis. Don't miss!!
#bookofwhy
As promised, here is the link to my keynote at the interactive causal learning conf. last week, including my talk + a nice Q & A session: . I summarized the current state of the art in causal AI, and tried to demystify & contextualize recent progress & real challenges.
2/2 Due to server constraints, be patient, access will be 'raffle' style. Second, we created a Youtube channel w/ causal inference videos. So far we have
@yudapearl
's keynote @ WHY-19 & my lecture on causal data science
@Columbia
. Watch, subscribe & share:
Neural causal models were introduced in + the 1st general algorithm to perform causal inference entirely *inside* a neural net! There is indeed a tradeoff between symbolic & optimization-based methods for ID, elaborated in App C4, p 43; see also FAQ-Q11
@ylecun
@GaryMarcus
1/6 If you are curious about how causality & imitation learning are related, check our latest
#NeurIPS2020
paper (w/
@junzhez
@danielkumor
): “Causal Imitation Learning with Unobserved Confounders,” & stop by our oral presentation later today (09:15 EST).
The call for papers for the Journal of Causal Inference's special issue on the data-fusion challenge -- combining observational and experimental data -- is out! I will be co-editing it with
@mark_vdlaan
. Please, consider submitting your best work:
Our paper "Causal Inference and Data-Fusion in Econometrics” is finally out, . We discuss recent results in Causal AI in the context of Econometrics. It has been a pleasure to work w/
@PHuenermund
, who wrote a nice tweet-thread explaining the paper (below).
We are developing a tool called ‘Fusion’ that is fully compatible w/ the
#BookOfWHY
, following the discussion in Ch. 10. Fusion offers an easy-to-use way of doing causal inference & fusion from 1st principles, incl. do-calc. If interested, subscribe here .
@yudapearl
@wiredmau5
@yudapearl
What do you think about DoWhy and CausalImpact? Are there any tools that you would recommend for when you are ready to put what you have read into practice? Thanks!
#Bookofwhy
I will be talking today in the MSR's Frontiers in Machine Learning 2020 -- "On the Causal Foundations of Artificial Intelligence (Explainability & Decision-Making)" -- . If you are attending the event, I would be glad to chat more.
@amt_shrma
@emrek
I'll be talking about causal inference & data-fusion next week at Harvard, see the link below. If you are around, consider stopping by. I'll be covering topics such as transportability & generalizability as discussed in this PNAS paper . See you!
If you are around Palo Alto, I’ll be speaking at the Stanford Business School (
@StanfordGSB
) about “Causal Data Science”, i.e., how to make transparent & principled causal inferences from data. When: Mon Oct/7, 1:10 - 2:30pm; Where: Rm G101, Gunn Bldg.
@StanfordMed
@StanfordEng
If you are in Boston, I’ll be speaking at the
@MIT
graphical models workshop on “Causal Data Science”. When: Today 9:30 am; Where: MIT Bldg 2 (Math). I'll be around for a few hours after the talk, drop me a line if you are interested in talking more about CDS.
@mitidss
@MIT_CSAIL
Regardless of RL, which has interesting connections w/ CI (), Deep & Causal modes of reasoning are connected in a fundamental way, as recently discovered: . I believe this starts addressing Judea's concerns, as quoted in footnote 4.
Agree, my take on generalization related to causal knowledge: . The foundations are pretty stable but there have been some newer results since I wrote this around 2014. Thanks
@yudapearl
for the inspiration & partnership. Summary:
🔔Virtually every important question in AI revolves around the scope and limits of generalization—yet there is little consensus about what generalization even means or how to test it.
Check out
@_dieuwke
et al’s new synthesis,
@genbench
, to get some much-needed clarity.
I'll be talking about causal inference & data-fusion at MIT next week; see details . If you are around, consider stopping by. I'll be covering topics such as generalizability & transportability, following our PNAS paper … See you!
How can we handle the biases that emerge when piecing together multiple datasets collected under heterogeneous conditions? Join us next Monday (May 2) for an IDSS Distinguished Seminar with
@eliasbareinboim
on Causal Inference and Data Fusion.
Nice blog post () by
@an1lam
discussing some common misconceptions about Causal Inference & summarizing some of what he learned in my course this Spring. It was nice having you around, Stephen, thanks!
If you are at ICML, check our paper (w/
@YonghanJung
& Jin Tian) on the first family of Double-ML estimators for any identifiable effect computable from observational data (i.e., equiv. class of DAGs):
When: July/23, 12-2 am (EST),
1/ The words “model” & “diagram” seem to be overloaded & shouldn't be conflated, which we clarified in our new chapter (). i) Model (top left, Fig 1.2) = A structural causal model (SCM) is a collection of mechanisms underlying a system & it’s always there.
@yudapearl
One thing I missed in "The Book of Why" was on how causal models come to be. Most of the discussion in the book already assumed some kind of model/diagram. Many "curve-fitting" ML practitioners I talked to struggle especially on this part and are therefore skeptical about it.
One way of attaching causal meaning in regression is thr Instrumental Variables (Wright, 1928). In a new paper w/
@analisereal
@danielkumor
, we unify the literature on linear identification & develop a method for finding Instrumental Cutsets (general. IV)
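The core idea behind instrumental variables, mentioned in the tweet above, can be sketched in a few lines of Python. This is an illustrative simulation (not from the paper): a synthetic linear model with an unobserved confounder U, where naive regression of Y on X is biased but the simple Wald/IV ratio recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Structural model: Z -> X -> Y, with an unobserved confounder U -> X, U -> Y.
u = rng.normal(size=n)              # unobserved confounder
z = rng.normal(size=n)              # instrument, independent of U
x = 1.0 * z + 1.0 * u + rng.normal(size=n)
y = 2.0 * x + 1.5 * u + rng.normal(size=n)   # true causal effect of X on Y is 2.0

# Naive OLS slope of Y on X is biased upward by the confounder U.
beta_ols = np.cov(x, y)[0, 1] / np.var(x)

# IV (Wald) estimator: Cov(Z, Y) / Cov(Z, X) recovers the causal effect,
# because Z affects Y only through X and is independent of U.
beta_iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

print(f"OLS: {beta_ols:.2f}  IV: {beta_iv:.2f}")
```

With these coefficients the OLS slope converges to ~2.5 while the IV estimate converges to the true 2.0, which is exactly the "attaching causal meaning to regression" point the tweet makes.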
Thanks for sharing your thoughts, Amit. Recall, just to add some clarity in terms of context, my comment regarding
@ylecun
&
@yudapearl
's posts is neither about generative nor about deep learning versus causal; those are pacified issues in the literature. In other words, we now
So from this perspective, I see the value of both
@ylecun
and
@eliasbareinboim
viewpoints.
True that we cannot learn causal agents simply from observing the world, since observed data is usually confounded.
Carlos Cinelli (
@analisereaal
) just gave a wonderful talk about external validity, transportability, and generalizability in causal inference, which is at the core of what CI is all about. The slides are available here: .
For those who are curious, this is the paper the cute baby is learning about, "Estimating Causal Effects Using Weighting-Based Estimators" (). The task is to allow the estimation of identifiable expressions that go beyond the usual backdoor/IPW cases.
Thanks for writing the piece,
@erichorvitz
! I think "causal inference" should be put front and center since, as part of the AI community, we could help provide sound foundations for many challenges (including safety, equity, robustness, transparency, and understanding). Happy to
Thanks for sharing this. Somewhat curious perspective by on causality. Apparently, he believes that causality is just one among "hundreds of different things that deep learning doesn't do well" & his example of another is "explainability"! Thoughts?
I bring this up, not as indictment of ML leaders but as an enticement of their students. These "goodies" are essentially solved using causal modeling, a fact that should jolt every curious student into asking: "HOW?" and into exploring the Laws and tools of Causal Inference.
Insightful note by investor
@RayDalio
on the necessity of understanding cause-effect relations for robust decision-making & the insufficiency of machine learning, (~3 mins): . We do have a language & tools to encode this understanding today!
@yudapearl
1/3 This seems to be a nice extension of the work on the causal-neural connection () for GNNs, but I haven't fully read it (congrats,
@kerstingAIML
& team!). Still, we answered precisely this question in Q11 in the FAQ (p. 52), also attached for your convenience.
2/ The question is: "What is gained by this translation?" Namely, why not conduct the analysis in the SCM framework and let the GNN take over the estimation part only, as is done here
An explicit characterization of the gain would be helpful.
@ylecun
@yudapearl
> "You don't necessarily need RL to train a causal world model." (...) "Only SSL from off-line observation data and planning."
If I understand what you are saying, in addition to problems of scale that we can leave aside for now, your proposal assumes that the way the offline
I believe what you are looking for is called "causal inference" (CI) - every problem I found in RL is solved from 1st principles by tools developed within the CI framework. For a few cataloged examples, see , ,
I don't have any beautiful new framework to replace RL with. But I hope that during my lifetime someone will come up with one. My guess is that it will not come out of current formalisms, such as the POMDP framework, and that most of them will be pointless after the revolution.
Thanks for your interest,
@haschyle
, I just posted the slides here: . The talk was recorded and I expect the video should be available sometime in the near future, stay tuned.
@StanfordGSB
Nice rebuttal to Sutton's note by
@wellingmax
. The folks doing causal inference may be excited to read the note since we learn case after case that there's no such thing as model-blind causality. Certainly worth reading to see how causality fits the larger conversation in ML.
8/8 If anyone is in the NY area & wants to grab a coffee, drop me a line. We should have a friendly & intellectually stimulating environment to receive scholars interested in learning more about CI, doing research in the field, or using CI methods in their own research. See you!
@eliasbareinboim
will present "Causal Data Science: A general framework for data fusion and causal inference" as part of the Distinguished Lecture Series on Monday, April 1 at 11:40am in CSB 451.
#DataScience
#causalinference
1/2 I don't disagree but my feeling from interactions at NeurIPS this year is that there's a deeper phenomenon going on. Folks want to do CI since they feel it's important, incl. because of your new book, but they haven't spent the time & energy learning what CI is really about.
@ngutten
@jsusskin
Interesting viewpoint. Trouble is, regression is such a small part of CI that to say "To do CI with regression requires an extra step" is almost like saying: "To do CI with algebra requires an extra step". It's better to say "Do CI first, add NN if needed"
@ylecun
#Bookofwhy
@ylecun
@yudapearl
Yann, what is the difference between what you call "causal prediction" and computing next-state probabilities in a typical reinforcement learning (RL) setting?
As we elaborated through our work on Causal RL (), there are exciting things coming from RL, but
Hi Judea, I just learned about Danny from your tweet. His work is really groundbreaking and he is a unique human being, very generous, humble, and curious. I have some memories and will share one. We first met in person when I was interviewing here at Columbia about 5 years ago.
1/4 Broadly, because estimating a cond. prob. P(Y | X) is easier than a joint one, P(Y, X, Z), when X & Z are high-dimensional. Indeed, this observation led to (non-causal) graphical models in the 1980s, including Bayesian nets (ie, non-causal DAGs), Markov random fields & so on.
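The parameter-count argument in the tweet above can be made concrete with a toy calculation (illustrative, not from the thread): a full joint table over n binary variables needs exponentially many parameters, while a chain-structured Bayesian network X1 -> X2 -> ... -> Xn needs only a handful of conditional tables.

```python
# Parameter counts for n binary variables.

def joint_params(n: int) -> int:
    """Full joint table: one free probability per configuration."""
    return 2 ** n - 1

def chain_bn_params(n: int) -> int:
    """Chain DAG: P(X1) needs 1 parameter; each P(Xi | Xi-1) needs 2
    (one conditional probability per parent value)."""
    return 1 + 2 * (n - 1)

for n in (5, 10, 20):
    print(n, joint_params(n), chain_bn_params(n))
```

At n = 20 the joint needs over a million parameters while the chain needs 39 — the reason conditional factorizations (Bayes nets, Markov random fields) took off in the 1980s.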
2/6 Tue 11:30 am (EST) “The Causal-Neural Connection: Expressiveness, Learnability, and Inference”, with Kevin Xia, Kai-Zhan Lee, and Yoshua Bengio. Link: .
No, but my UAI tutorial this year, which is a longer version of this talk, was recorded and should be available sometime soon... Also, our CRL survey should be made available sometime soon, I hope. Thank you for your interest, I'll keep you guys posted.
For the folks following the thread, the disentanglement of the assumptions and conclusions is essential in data science. After months of discussions & going around, Ivan’s clean & crisp analysis sheds light on a pervasive issue of transportability. Thanks for sharing this, Ivan.
In this post and multiple others,
@f2harrell
says that a generally correct way to analyze RCTs with binary outcomes is to fit a logistic regression. One of the main claims is that this leads to "one-number" effect measures that are "highly transportable".
1/12 Broadly speaking, assuming one can infer the effect of an intervention when experimental data matching exactly this intervention is available is not surprising, causally-speaking. Why? There's no cross-rung inference, but a typical function approximation exercise. This is Judea's main...
@yudapearl
@GaryMarcus
@moultano
What precisely do you mean by "mathematical impossibility"? A causal graph can be inferred if observations are sampled using different random interventions. Such inference can be done using DL . Then CI can be done on top using another DL model
We have an example written to answer this issue; see p. 28, example 10 (), and summarized in the diagram below. We can have different Bayes nets generating the same obs. data (P(V) ) & that naively would entail different do-distributions P(Y | do(X)).
@julianschuess
But people do this all of the time. They fit a statistical model to the data (a Markov net over a DAG), and do "causal" calculations. Is there an interpretation, especially given that there is fundamentally no way of verifying from the data if the model is causal or not?
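The phenomenon discussed in this exchange — different causal models generating the same observational data but different do-distributions — can be sketched with the classic two-variable example (an illustrative simulation, not the one from the referenced paper):

```python
import random

random.seed(0)
N = 100_000

# Model A (X -> Y): X is a fair coin, Y copies X.
# Model B (Y -> X): Y is a fair coin, X copies Y.
def sample_A(do_x=None):
    x = random.randint(0, 1) if do_x is None else do_x
    return x, x                        # mechanism Y := X

def sample_B(do_x=None):
    y = random.randint(0, 1)
    x = y if do_x is None else do_x    # mechanism X := Y, cut by do(X)
    return x, y

# Layer 1 (observational): both models give the same joint, P(X=1, Y=1) = 0.5.
obs_A = sum(sample_A() == (1, 1) for _ in range(N)) / N
obs_B = sum(sample_B() == (1, 1) for _ in range(N)) / N

# Layer 2 (interventional): P(Y=1 | do(X=1)) differs — 1.0 in A vs 0.5 in B,
# so the do-distribution is NOT determined by the observational data alone.
do_A = sum(sample_A(do_x=1)[1] for _ in range(N)) / N
do_B = sum(sample_B(do_x=1)[1] for _ in range(N)) / N
print(obs_A, obs_B, do_A, do_B)
```

Both models are observationally indistinguishable, yet naively treating either one as causal yields contradictory answers for do(X=1) — the point made in the reply above.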
What is the Purpose of Statistical Modelling? (by David Hand) -- . Nice piece by David Hand that called my attention since, while only implicitly, it seems to suggest a connection of mainstream Stats with the Ladder of Causation.
Speaking of advertising for a job or a postdoc, here is one from
@eliasbareinboim
I do not know what could be more "innovative" than research on Causal AI. If I were qualified, I would go for it.
Thanks! Languages L1 (association), L2 (intervention), L3 (counterfactual) are increasingly complex, & each higher-order language includes the lower ones. Sec 1.3 formalizes the expressiveness issue; Theorem 1 (p. 22) shows the containment is strict; for intuition, see examples 7-9 (~p. 27).
@PHuenermund
@yudapearl
@eliasbareinboim
Great paper. It is not clear to me why L3 questions are separated from L2. It seems that L2 is a special instance of L3. L2 is done using do() notation, but this is abandoned when doing L3.
Everyone wants data-driven something, sure, but Judea's perspective is technical, rooted in the Causal Hierarchy Theorem (CHT) (p. 22, ). The CHT says that for almost any causal model, the layers of the hierarchy do not collapse (examples 7-9 are illustrative).
@yudapearl
It’s unlikely that data-driven ML will be replaced. Rather it will be augmented by causal modeling. And I bet that even causal modeling will eventually become data-driven itself in the form of causal generative models.
Following
@causalinf
request, here are some of
@yudapearl
personal memories from his early days, which were taken out of the
#bookofwhy
: . It's still unpublished material but authorized for your enjoyment. More to come.
#bookofwhy
The goal shouldn't be "good in-distribution generalization" but extracting the proper invariances to work in the real world, whose distribution rarely matches the sample's (changing conditions). In causality, this task is known as statistical transportability,
@yudapearl
If I were to think of a reason, it’d be that a good chunk of ML is about getting good in-distribution generalization (yes, “curve-fitting”). The lack of a sound causal treatment doesn’t bite that hard in this setting.
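The statistical transportability mentioned in this exchange has a simple special case — covariate shift — that can be sketched in a few lines (an illustrative toy example with made-up numbers, not from the referenced work): the source average of Y is wrong for a target population with a different covariate distribution, but reweighting source samples by the target/source covariate ratio recovers the target mean.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Source domain: Z ~ Bernoulli(0.3); target domain: Z ~ Bernoulli(0.7).
# The mechanism Y | Z is invariant across domains: E[Y|Z=1]=5, E[Y|Z=0]=0.
z = rng.binomial(1, 0.3, size=n)
y = 5.0 * z + rng.normal(size=n)

# Naive source average misses the target mean E*[Y] = 0.7 * 5 = 3.5.
naive = y.mean()                      # converges to 0.3 * 5 = 1.5

# Transport formula: E*[Y] = sum_z E[Y|z] P*(z), i.e., reweight source
# samples by P*(z) / P(z).
w = np.where(z == 1, 0.7 / 0.3, 0.3 / 0.7)
transported = np.mean(w * y)          # converges to 3.5
print(naive, transported)
```

The invariance being exploited is exactly the "proper invariance" point: P(Y | Z) carries over across domains even though P(Z), and hence P(Y), does not.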
@tdietterich
@hardmaru
I would just add that even though not all causal invariances are learnable from interactions, some of them are. I think the most promising setting is reinforcement learning, where interventional capabilities are available (e.g., , ).
1/5 Happy to share some of our latest work that will be presented this week at NeurIPS in New Orleans!
The authors would be delighted to see you at the poster session and talk more about our current work and future challenges!
Tue 6:15 pm (Poster Session 2)
"Causal discovery
@ccaballeroh10
@yudapearl
@ylecun
@sirbayes
The terminology game is indeed complicated and usually arises when we attempt to make interdisciplinary claims, but point well taken.
To focus on the substance, my suggestion for those interested in learning more is to first:
(1) understand what control theory has accomplished,
Thanks for sharing your thoughts, K. The universality of NNs does NOT help causal reasoning. In general, it's not that a task would be "harder," but impossible to solve. I am sharing below a recent paper that investigates this issue from first principles,
Some random musings on the whole "AI scaling" debate. It is clear that neural nets can model any predictive distribution p(x_future|x_past) given enough parameters and data, since they are universal approximators;
@ildiazm
It's surprising to hear
@f2harrell
saying that, it seems the opposite of how things evolved. I am attaching an intro written in Stat Sci ~10yr ago on threats vs. assumptions in the context of RCTs' external validity, where transportability theory started.
For those interested in causal inference & around NYC, I'll be talking about causal data science & modeling today (Nov 21) at CUIMC. Time: 1:00 pm - 2:30 pm. Location: Hammer Health Sciences Building, 701 West 168th Street, Room # LL-106
@ColumbiaMSPH
@ColumbiaMed
@DSI_Columbia
The wisdom in this line is quite remarkable & largely unknown by most folks outside the field (& some reviewers inside :)). My 2 cents: this is one of the first instances where the requirement of having a model was relaxed after understanding it seriously:
2/2
Along a similar vein, I was asked to retweet the last line of my slides in the Why-19 symposium . Gladly; it reads: "Only by taking models seriously can we learn when they are not needed." And I still vouch for it.
#Bookofwhy
Hi Damien, thanks for raising the issue. I am not sure I fully buy into the "camps" comparison, with all due respect, and I would add that most folks I know who are doing causal inference research are not focused on the worst-case scenario but on systematic "understanding," which
My meta impression on "ML optimists" & "causal reasoning" camps:
- one tends to focus on empirical results about what can be done (with observational data, scaling...)
- the other on principled reasons about what can't be done (worst case scenarios)
We need both points of view.🙏
Scientific progress happens but at a slower pace than it should by any reasonable standard (see, eg, ). I volunteer to help, perhaps talking w/ some of the agencies leaders? Reviewers are extra conservative, perhaps need enlightened leadership?
@tdietterich
Our well-intentioned funding agencies are partly responsible for this mis-balanced mis-investment. They all want & fund "explainable AI" "robust AI" "life-long learning" etc etc. forgetting that, once projects are run by traditionally-trained PI's they end up "more of the same."
1/3 From Hume (footnote 2, ): “Nature has kept us at a great distance from all her secrets, & has afforded only the knowledge of a few superficial qualities of objects; while she conceals from us those powers & principles, on which...
I think an AI system's ability to perform tasks like causal inference and symbolic processing will eventually be learned or evolved as an emergent property that just happens to be useful for its environment, not something that is formally defined and hand-engineered by humans.
@yudapearl
's talk at CIFAR Machines & Brains workshop -- "Data versus Science: Contesting the Soul of Data-Science": . (Due to some Zoom issues, the synchronization of the video & slides was not perfect, apologies!)
@bschoelkopf
@MILAMontreal
@CIFAR_News
5/5 Thr 11:45 am (Poster Session 5)
"A Causal Framework for Decomposing Spurious Variations"
(joint work with Drago Plecko)
In this paper, we study confounded variations, which are a fundamental aspect of constructing explanations and may help to answer questions ranging from
The parameters of the SCM are almost never identifiable from obs. data, check the Causal Hierarchy Thm (p. 22) & examples 7-9 (p. 24-26, ). Still, estimating effects using DAGs as proxies for the SCMs makes total sense, e.g., see .
@eliasbareinboim
@deaneckles
@pablogerbas
I missed the scope of this statement "so estimating/fitting the parameters makes no sense" -- is this specific a paper or do you mean it never makes sense to estimate parameters using a DAG?
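The point in the reply above — that effects can be estimated from a DAG even when the SCM's parameters are not identifiable — is usually illustrated with backdoor adjustment. Here is an illustrative simulation (synthetic numbers, not from the referenced papers) where the naive difference of means is confounded but the adjustment formula recovers the true effect:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Observed confounder Z -> X and Z -> Y; binary treatment X -> Y.
z = rng.binomial(1, 0.5, size=n)
x = rng.binomial(1, 0.2 + 0.6 * z)           # treatment more likely when Z=1
y = 3.0 * x + 2.0 * z + rng.normal(size=n)   # true effect of X on Y is 3.0

# Naive difference of means is inflated by the confounder Z.
naive = y[x == 1].mean() - y[x == 0].mean()  # converges to ~4.2

# Backdoor adjustment: E[Y | do(X=x)] = sum_z E[Y | x, z] P(z).
def backdoor(xv):
    return sum(y[(x == xv) & (z == zv)].mean() * np.mean(z == zv)
               for zv in (0, 1))

ate = backdoor(1) - backdoor(0)              # converges to ~3.0
print(naive, ate)
```

Note that only the DAG (which variables block the backdoor path) is used here; none of the SCM's structural coefficients need to be identified for the adjustment to work.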
1/2 I would phrase it more precisely - NCM is a special class of SCMs used as proxies of the true SCM & where the functions are neural nets. IF one wants to solve an identification task, eg, yes, a search over the NCM space would be entailed. App. C4 (p 42) discusses this point.
Hey
@andrewgwils
, thanks for sharing your thoughts. I am not a radiology expert, but to address a broader, related point -- isn't the success of these methods in this setting due to the huge amount of annotated data available, coming from real, qualified radiologists? As far as I
@MelMitchell1
@geoffreyhinton
The whole point is he wasn't trying to be precise about the exact timescale. It's not 10 vs 5. It was more: "we should start changing the way we train radiologists, and those in professions where we can collect ample data and be greatly assisted by machine learning".
Hi
@ylecun
, thank you for bringing attention to this concerning picture of the current situation.
From my observation, many academics, myself included, might not be so closely monitoring the issue, and as a result, might be unaware that certain individuals or companies are
@tegmark
@RishiSunak
@vonderleyen
Altman, Hassabis, and Amodei are the ones doing massive corporate lobbying at the moment.
They are the ones who are attempting to perform a regulatory capture of the AI industry.
You, Geoff, and Yoshua are giving ammunition to those who are lobbying for a ban on open AI R&D.
If
2/5 Tue 11:45 am (Poster Session 1)
"Estimating Causal Effects Identifiable from Combination of Observations and Experiments"
(joint work with
@YonghanJung
,
@ildiazm
, and Jin Tian)
The task investigated in this paper is to develop a family of estimators for identifiable