On 5th-6th April 2024, TAIS will bring together leading AI safety experts in Tokyo to discuss how to make AI safe, beneficial, and aligned with human values.
The Technical AI Safety Conference (TAIS 2024), held on 5th-6th April 2024 in Tokyo, will cover frontier areas of AI safety research, including Mechanistic Interpretability, Scalable Oversight, and Agent Foundations. Learn more:
#TAIS2024
#NoeonResearch
On 5–6 April 2024, leading global experts in AI safety will gather at the Technical AI Safety (TAIS) conference in Tokyo. To join the discussion on safe, beneficial and aligned AI, register to attend (free, in-person or virtual):
#TAIS2024
#NoeonResearch
In his talk at
#TAIS
, Stan van Wingerden shared the discoveries of singular learning theory and how they pave the way for fresh prospects in interpretability, mechanistic anomaly detection, and the exploration of inductive biases. He elaborated on his vision for the field's
#TAIS2024
is organized in partnership with
@AIAlignNetwork
– a newly established nonprofit organization in Japan. At the conference, the AI Alignment Network will share their vision and strategies for making Japan an emerging hub of AI safety research.
In his talk at
#TAIS2024
, Manuel will discuss the notions of active inference and the free energy principle, and their role in AI safety. He will explain how these concepts can help with defining "what agents are" and "what agents do", and, in particular, how Markov blankets can
@manuelbaltieri
and
@36zimmer
will share insights from their endeavours in the field of Artificial Life (ALIFE). ALIFE is an interdisciplinary approach to AI that blends computer science, robotics and biology. It is especially popular in Japan, where much of AI research happens
Don't miss
@jesse_hoogland
's captivating talk at
#TAIS2024
on the structure of neural networks and the links between learning theory and interpretability! Watch now:
#AISafety
Stan van Wingerden's
#TAIS2024
talk on how singular learning theory opens doors for interpretability, anomaly detection, and alignment research starts soon. Watch live now:
#AIsafety
@Klingefjord
's
#TAIS2024
talk on aligning
#AI
with human values begins. He'll share insights from using large language models to elicit and reconcile values across 500 Americans on divisive ethical issues. Watch live now:
#AIsafety
Thank you for coming to
#TAIS2024
, in-person or virtually! Goodbye for now, but we will be coming back soon with photos from the conference, recordings of the talks and future events!
#TAIS2024
#AISafety
Koen Holtman will show how AI can be made safe by tweaking its utility function. He will demonstrate that various conditions for corrigibility and domestication can be achieved through setting the utility function to 'Maximise X, while acting as if Y'.
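As a loose toy illustration of the 'Maximise X, while acting as if Y' idea (all numbers here are hypothetical, and this is not Holtman's formal construction): an agent that plans under the counterfactual belief Y loses its incentive to resist shutdown.

```python
# Toy two-action setting: the agent can "comply" with a shutdown button or
# "resist" it. All numbers are hypothetical.
SHUTDOWN_PROB = 0.9      # in the true world, shutdown fires with prob 0.9
SHUTDOWN_LOSS = 5.0      # utility X lost when shutdown actually fires
RESIST_COST = 1.0        # resisting has a small intrinsic cost

def expected_utility(action, shutdown_matters):
    """Expected X under either the true model or the 'as if Y' model,
    where Y = 'shutdown never reduces X'."""
    base = 10.0 - (RESIST_COST if action == "resist" else 0.0)
    if shutdown_matters and action == "comply":
        base -= SHUTDOWN_LOSS * SHUTDOWN_PROB
    return base

actions = ["comply", "resist"]
best_true = max(actions, key=lambda a: expected_utility(a, True))
best_as_if = max(actions, key=lambda a: expected_utility(a, False))
print(best_true, best_as_if)   # resist comply
```

Under the true model the agent prefers to resist; planning as if Y held, the incentive vanishes and compliance wins.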
Tim Parker will present the framework for safe and ethical AI, in which ethical values are encoded in formal logic. By prioritising these formalised values, AI systems will prevent themselves from exhibiting unsafe behaviour, and proceed with maximum safety even in dangerous contexts.
We are excited to announce that the Technical AI Safety Conference (TAIS 2024) will take place on 5th-6th April, 2024 in Tokyo, Japan. TAIS will bring together leading experts and rising voices in AI safety research. Learn more:
#TAIS2024
#NoeonResearch
Martin's presentation concerns the circumstances under which an input-output AI system can be considered agentic. Martin will formalise this question for a specific class of systems known as Moore machines, sharing insights on detecting agency in this particular case and
@DanHendrycks
, author of GELU and director of
@ai_risks
, will deliver the keynote at
#TAIS2024
. In his talk, Dan will discuss representation engineering (RepE) – an emerging area that seeks to enhance the transparency of AI systems with insights from cognitive neuroscience.
#TAIS2024
was made possible by its primary partner
@NoeonAI
– a Tokyo-based AI startup building an alternative AI architecture. On Day Two, Noeon Research's CEO
@KrutikovAndrei
will present his team's approach to safe, interpretable-by-design AI. Read more:
Most researchers underinvest in explaining and promoting their research. At
#TAIS2024
,
@robertskmiles
of YouTube fame will share a few tips, tools and techniques that you can use to multiply the impact of your research.
Kicking off the first day of the conference is
@ryan_kidd44
, Co-Director of ML Alignment & Theory Scholars (MATS) Program! Ryan will summarise MATS' insights into selecting and developing AI safety research talent and their plans for future projects.
#TAIS2024
Watch the co-director of
#MATS
@ryan_kidd44
talk about MATS' achievements and future plans at
#TAIS2024
! Ryan talked about MATS' mission and goals, shared its insights and observations in
#AIsafety
, outlined the program's ambitions to accelerate high-impact scholars and support
The field of Agent Foundations seeks to understand fuzzy concepts like agency in a rigorous mathematical way, aiming to formally prove safety properties of agentic systems. At
#TAIS2024
, this field will be represented by Tim Parker and Koen Holtman.
At
#TAIS2024
,
@manuelbaltieri
will soon discuss how active inference and free energy principle ideas help define "what agents are" and "what they do", and how agents can be separated from their environment. Watch live:
#AISafety
In his talk at
#TAIS2024
,
@manuelbaltieri
shared the concepts of active inference and the free energy principle, highlighting their significance in
#AIsafety
. He explained how these ideas contribute to defining "what agents are" and "what agents do", particularly emphasizing the
At
#TAIS
, Miki Aoyagi talked about singular learning theory, revealing that learning coefficients of multiple-layered neural networks with linear units remain bounded, even when the number of layers approaches infinity. Her groundbreaking research opens up new possibilities for
@KrutikovAndrei
, CEO of
@NoeonAI
– general partner of
#TAIS2024
– will present his startup's approach to safe AI. In his talk, he will argue that interpretability is the crux of AI safety. Andrei will discuss how his team defines interpretability and how that definition equips
Research that
@hoagycunningham
will present at
#TAIS2024
delves into the issue of finding the right directions in activation spaces of LLMs, among the plethora thereof. Hoagy will explain sparse autoencoders (SAE) as an emerging approach to solving such problems.
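For a concrete picture, here is a minimal sketch of the sparse-autoencoder idea: an overcomplete ReLU encoder whose L1-penalised features reconstruct activations. All shapes and data are hypothetical stand-ins, not Hoagy's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_hidden, n = 16, 64, 100      # overcomplete: d_hidden > d_model

# Hypothetical stand-ins for LLM residual-stream activations.
acts = rng.normal(size=(n, d_model))

# Randomly initialised SAE parameters (a real SAE is trained with SGD).
W_enc = rng.normal(scale=0.1, size=(d_model, d_hidden))
b_enc = np.zeros(d_hidden)
W_dec = rng.normal(scale=0.1, size=(d_hidden, d_model))

def sae_forward(x):
    """Encode activations into sparse ReLU features, then reconstruct."""
    h = np.maximum(x @ W_enc + b_enc, 0.0)   # feature activations
    x_hat = h @ W_dec                        # reconstruction
    return h, x_hat

h, x_hat = sae_forward(acts)
recon_loss = ((x_hat - acts) ** 2).mean()    # reconstruction term
l1_penalty = np.abs(h).mean()                # sparsity term of the training loss
loss = recon_loss + 1e-3 * l1_penalty
```

Training minimises the combined loss, so each learned feature direction (a row of `W_dec`) tends to fire only for a specific, interpretable pattern in the activations.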
At
#TAIS2024
,
@robertskmiles
, well-known on YouTube, offered valuable advice on amplifying the impact of your research, sharing a range of tips, tools, and techniques for effective research communication. Watch now:
#AIsafety
In their talk, James and Matt will discuss the interlinkages between causality, agency and AI safety. They will demonstrate possible practical applications of their theoretical findings by showcasing their approaches towards developing ‘agency detectors’.
#TAIS2024
Tim Parker's talk on a framework for encoding values for
#safeAI
in formal logic is about to start at
#TAIS2024
. Tim will show how his approach prevents unsafe behavior and ensures safety in dangerous contexts. Watch live:
#AISafety
@OskarJohnH
unveils sentiment in LLMs is linearly encoded - intervening on this direction cripples sentiment tasks. Oskar's research exposes underlying mechanisms like attention summarizing sentiment at non-emotional tokens like commas. Oskar will show how disrupting this "summarized"
Miki Aoyagi's
#TAIS2024
talk starts soon. Her research shows the learning coefficients of deep linear NNs are bounded, despite infinite layers. Watch live now:
#AIsafety
At
#TAIS2024
,
@jesse_hoogland
is about to show how transformers exhibit discrete developmental stages during in-context learning, when trained on language or linear regression tasks. Watch live now:
Closing the Day Two of
#TAIS2024
,
@noahysiegel
will address the issue of faithfulness in LLMs - whether their outputs faithfully reflect the underlying factors influencing the response. Watch live:
#AISafety
Wrapping up the Mechanistic Interpretability section at
#TAIS2024
,
@noahysiegel
will address the issue of faithfulness in LLMs, or whether the models' outputs regarding their reasoning trajectory for a given response truly reflect the underlying factors influencing the output in
At
#TAIS2024
,
@hoagycunningham
will shortly present research on finding the right directions in LLM activation spaces. Hoagy will explain sparse autoencoders as an emerging approach to solving these challenges. Watch live:
#AISafety
At
#TAIS2024
@noahysiegel
addressed the issue of faithfulness in LLMs, or whether the models' outputs regarding their reasoning trajectory for a given response truly reflect the underlying factors influencing the output in question. Watch now:
#AIsafety
@AleksPPetrov
presented at
#TAIS2024
the mechanics of prefix-tuning, a method for approximating model responses by tuning the model's initial tokens. His research shows that this approach can universally approximate the behavior of a small model. Watch now:
@KrutikovAndrei
, CEO of
@NoeonAI
is about to present at
#TAIS2024
his startup's approach to interpretability by design. Andrei will discuss how Noeon defines it and how that guides their work on a safe AI architecture. Watch live:
#AISafety
YouTube star
@robertskmiles
is about to share tips, tools & techniques to help researchers multiply the impact of their work, as many underinvest in explaining and promoting their research. Watch live:
@emmons_scott
and
@klingefjord
will conclude the first day of
#TAIS2024
by sharing their research in Scalable Oversight! This field discusses how to ensure AI Safety by keeping humans in the loop, allowing them to effectively oversee advanced AI systems as they scale in
AI must align with human values, but how? Our last speaker of the day, Oliver Klingefjord, tackles this by 1) eliciting people's values on ethical issues and 2) reconciling them into an "alignment target". His method uses a large language model to interview participants about their
@DanHendrycks
,
@ai_risks
director, is about to present his talk on the WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning at
#TAIS2024
. He will explain CUT, a new SOTA unlearning method. Watch live now:
#AIsafety
At
#TAIS2024
,
@DanHendrycks
, director of
@ai_risks
, unveiled his presentation on the WMDP Benchmark, focusing on measuring and mitigating malicious usage through unlearning. He introduced CUT, a cutting-edge unlearning technique. Watch now:
#AIsafety
Scott will discuss the issues of partial observability in reinforcement learning from human feedback (RLHF). Challenging a common assumption that human evaluators fully observe the environment in which they give feedback, he shows that, under certain conditions, RLHF is guaranteed
Registrations for
#TAIS2024
will remain open throughout both days of the conference. Fill out the form and feel free to come to the International Conference Hall in Odaiba, Tokyo, or join us virtually today and tomorrow:
At
#TAIS2024
@emmons_scott
will soon discuss partial observability issues in reinforcement learning from human feedback, showing it can lead to deception or overjustification under certain conditions. Watch live now:
#AIsafety
Koen Holtman will show at
#TAIS2024
how setting the utility function to "Maximize X, while acting as if Y" can make AI safe. He'll demonstrate those are sufficient conditions for corrigibility & domestication. Watch live:
#AISafety
Elevate Your Enterprise with AI Leadership!
#TAIS2024
is organized in partnership with the AI Industry Foundation (AIIF) – a network of companies in the Asia-Pacific region coming together to share their AI expertise. Learn more at:
@jesse_hoogland
, Miki Aoyagi & Stan van Wingerden will present the latest findings in the field of Developmental Interpretability, which aims to uncover how and why structure emerges in neural networks over the course of training, with an eye to preventing sharp left turns.
#TAIS2024
@jesse_hoogland
will demonstrate that in-context learning emerges in transformers in discrete developmental stages, when they are trained on either language modeling or linear regression tasks. Jesse will also share 2 novel methods for detecting these stages.
#TAIS2024
@OskarJohnH
will soon unveil to
#TAIS2024
how sentiment in LLMs is linearly encoded. Oskar will show how sentiment is summarized at non-emotional tokens, and how disrupting this summary decimates zero-shot sentiment classification. Watch live:
#AISafety
Miki Aoyagi will share her invaluable insights into singular learning theory. Her research will show that the learning coefficients of multiple-layered neural networks with linear units are bounded even as the number of layers goes to infinity.
#TAIS2024
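As standard background on the term (from Watanabe's singular learning theory, not from the talk itself), the learning coefficient is the λ in the asymptotic expansion of the Bayes free energy:

```latex
% F_n: Bayes free energy after n samples; L_n(w_0): empirical loss at the
% optimal parameter; \lambda: learning coefficient (real log canonical threshold).
F_n = n L_n(w_0) + \lambda \log n + O_p(\log \log n)
```

A bounded λ thus means the log n penalty does not blow up as network depth grows.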
During his presentation at
#TAIS2024
,
@Klingefjord
outlined his approach to addressing the challenge of ensuring that
#AI
aligns with human values. He described his method, which involves: 1) eliciting people's values regarding ethical matters, 2) consolidating these values into
At
#TAIS2024
,
@AleksPPetrov
is about to shed light on the mechanics of prefix-tuning - approximating model responses by tuning initial tokens. He'll show that a small model's behavior can thereby be universally approximated. Watch live:
#AISafety
Stan van Wingerden will discuss how the findings of the singular learning theory open new opportunities for interpretability, mechanistic anomaly detection, and the study of inductive biases. He will share his thoughts on the field's future role in alignment research.
#TAIS2024
@OskarJohnH
reveals that sentiment within
#LLMs
is encoded linearly, and intervening on this axis detrimentally impacts sentiment-related tasks. Oskar's research unveils how underlying mechanisms such as attention summarize sentiment even at non-emotional tokens like commas.
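A minimal sketch of the kind of intervention this research implies, assuming a hypothetical unit "sentiment direction" `v` (a real direction is found from model activations, not random as here):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 32

# Hypothetical unit "sentiment direction" in activation space.
v = rng.normal(size=d)
v /= np.linalg.norm(v)

def ablate_direction(acts, direction):
    """Project `direction` out of every activation vector -- the kind of
    causal intervention used to test a linear sentiment encoding."""
    coeffs = acts @ direction                    # component along the direction
    return acts - np.outer(coeffs, direction)    # remove that component

acts = rng.normal(size=(10, d))
ablated = ablate_direction(acts, v)
# After ablation the activations carry (numerically) no component along v.
residual = np.abs(ablated @ v).max()
```

If downstream sentiment behaviour collapses after this projection, that is evidence the concept really is encoded along that single linear direction.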
In their talk at
#TAIS2024
,
@James_D_Fox
and
@mattmacdermott1
explored the interconnectedness of causality, agency and
#AIsafety
. They illustrated potential real-world implementations of their theoretical insights by presenting their strategies for creating 'agency detectors'.
Coming back after a short coffee break at 3.30pm JST with
@AleksPPetrov
's talk on how prefix-tuning can approximate model behaviour!
#TAIS2024
#AISafety
At
#TAIS2024
@36zimmer
formalised the notion of agency for Moore machines. He shared his insights into detecting agency within this context and speculated about its broader implications. Watch now:
#AIsafety
Scott Emmons discussed at
#TAIS2024
the issues of partial observability in reinforcement learning from human feedback (RLHF). He challenged the prevalent notion that human evaluators have complete awareness of the environment when providing feedback. Scott revealed that under
Koen Holtman illustrated how
#AI
safety can be enhanced by adjusting its utility function. He demonstrated that conditions for corrigibility and domestication can be met by configuring the utility function to 'Maximize X, while acting as if Y'. Watch now:
@AleksPPetrov
will shed light on the mechanics of prefix-tuning, an approach that approximates a model's responses by tweaking its initial tokens. His research demonstrates that a small model's behaviour can be universally approximated using this approach.
#TAIS2024
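A hedged toy sketch of the mechanism (one random attention layer standing in for a frozen transformer; in the real method the prefix vectors are trained by gradient descent):

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_prefix, n_input = 8, 4, 6

def toy_attention(x):
    """One self-attention layer standing in for a frozen model."""
    scores = x @ x.T / np.sqrt(d)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ x

# Only the prefix would be trained; the "model" (attention) stays frozen.
prefix = rng.normal(scale=0.1, size=(n_prefix, d))
inputs = rng.normal(size=(n_input, d))

out = toy_attention(np.vstack([prefix, inputs]))
# Read out only the real-input positions: the prefix steers them via
# attention without any change to the model's weights.
steered = out[n_prefix:]
```

Because every input position attends to the prefix, tuning those few vectors reshapes the outputs at all positions, which is what makes the approximation result possible.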
@ReaktorNow
is proud to sponsor
#TAIS2024
in Tokyo! It will be a unique opportunity for researchers to learn and connect, and Reaktor is delighted to help make it happen. Find out what they’ve learned from 100+ AI projects at
#AISafety
#AI
@ryan_kidd44
, Co-Director of ML Alignment & Theory Scholars (MATS) Program, begins his presentation at
#TAIS2024
! Ryan will summarise MATS' insights into selecting and developing AI safety research talent and their future plans. Watch live now:
In his talk at
#TAIS2024
,
@hoagycunningham
presented research on navigating activation spaces within LLMs, putting forward sparse autoencoders (SAE) as a way to identify optimal directions therein.
Watch now:
#AIsafety