On 5th-6th April 2024, TAIS will bring together leading AI safety experts in Tokyo to discuss how to make AI safe, beneficial, and aligned with human values.
The Technical AI Safety Conference (TAIS 2024), held on 5th-6th April 2024 in Tokyo, will cover frontier areas of AI safety research, including Mechanistic Interpretability, Scalable Oversight, and Agent Foundations. Learn more:
#TAIS2024
#NoeonResearch
On 5–6 April 2024, leading global experts in AI safety will gather at the Technical AI Safety (TAIS) conference in Tokyo. To join the discussion on safe, beneficial and aligned AI, register to attend (free, in-person or virtual):
#TAIS2024
#NoeonResearch
In his talk at
#TAIS
, Stan van Wingerden shared the discoveries of singular learning theory and how they pave the way for fresh prospects in interpretability, mechanistic anomaly detection, and the exploration of inductive biases. He elaborated on his vision for the field's
#TAIS2024
is organized in partnership with
@AIAlignNetwork
– a newly established nonprofit organization in Japan. At the conference, the AI Alignment Network will share their vision and strategies for making Japan an emerging hub of AI safety research.
In his talk at
#TAIS2024
, Manuel will discuss the notions of active inference and the free energy principle, and their role in AI safety. He will explain how these concepts can help with defining "what agents are" and "what agents do", and, in particular, how Markov blankets can
@manuelbaltieri
and
@36zimmer
will share insights from their endeavours in the field of Artificial Life (ALIFE). ALIFE is an interdisciplinary approach to AI that blends computer science, robotics and biology. It is especially popular in Japan, where much of AI research happens
Don't miss
@jesse_hoogland
's captivating talk at
#TAIS2024
on the structure of neural networks and the links between learning theory and interpretability! Watch now:
#AISafety
Stan van Wingerden's
#TAIS2024
talk on how singular learning theory opens doors for interpretability, anomaly detection, and alignment research starts soon. Watch live now:
#AIsafety
@Klingefjord
's
#TAIS2024
talk on aligning
#AI
with human values begins. He'll share insights from using large language models to elicit and reconcile values across 500 Americans on divisive ethical issues. Watch live now:
#AIsafety
Thank you for coming to
#TAIS2024
, in-person or virtually! Goodbye for now, but we will be coming back soon with photos from the conference, recordings of the talks and future events!
#TAIS2024
#AISafety
Koen Holtman will show how AI can be made safe by tweaking its utility function. He will demonstrate that various conditions for corrigibility and domestication can be achieved through setting the utility function to 'Maximise X, while acting as if Y'.
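As a loose toy illustration of the 'Maximise X, while acting as if Y' idea (all numbers here are hypothetical, and this is not Holtman's formal construction): an agent that plans under the counterfactual belief Y loses its incentive to resist shutdown.

```python
# Toy two-action setting: the agent can "comply" with a shutdown button or
# "resist" it. All numbers are hypothetical.
SHUTDOWN_PROB = 0.9      # in the true world, shutdown fires with prob 0.9
SHUTDOWN_LOSS = 5.0      # utility X lost when shutdown actually fires
RESIST_COST = 1.0        # resisting has a small intrinsic cost

def expected_utility(action, shutdown_matters):
    """Expected X under either the true model or the 'as if Y' model,
    where Y = 'shutdown never reduces X'."""
    base = 10.0 - (RESIST_COST if action == "resist" else 0.0)
    if shutdown_matters and action == "comply":
        base -= SHUTDOWN_LOSS * SHUTDOWN_PROB
    return base

actions = ["comply", "resist"]
best_true = max(actions, key=lambda a: expected_utility(a, True))
best_as_if = max(actions, key=lambda a: expected_utility(a, False))
print(best_true, best_as_if)   # resist comply
```

Under the true model the agent prefers to resist; planning as if Y held, the incentive vanishes and compliance wins.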
Tim Parker will present the framework for safe and ethical AI, in which ethical values are encoded in formal logic. By prioritising these formalised values, AI systems will prevent themselves from exhibiting unsafe behaviour, and proceed with maximum safety even in dangerous contexts.
We are excited to announce that the Technical AI Safety Conference (TAIS 2024) will take place on 5th-6th April, 2024 in Tokyo, Japan. TAIS will bring together leading experts and rising voices in AI safety research. Learn more:
#TAIS2024
#NoeonResearch
Martin's presentation concerns the circumstances under which an input-output AI system can be considered agentic. Martin will formalise this question for a specific class of systems known as Moore machines, sharing insights on detecting agency in this particular case and
@DanHendrycks
, author of GELU and director of
@ai_risks
, will deliver the keynote at
#TAIS2024
. In his talk, Dan will discuss representation engineering (RepE) – an emerging area that seeks to enhance the transparency of AI systems with insights from cognitive neuroscience.
#TAIS2024
was made possible by its primary partner
@NoeonAI
– a Tokyo-based AI startup building an alternative AI architecture. On Day Two, Noeon Research's CEO
@KrutikovAndrei
will present his team's approach to safe, interpretable-by-design AI. Read more:
Most researchers underinvest in explaining and promoting their research. At
#TAIS2024
,
@robertskmiles
of YouTube fame will share a few tips, tools and techniques that you can use to multiply the impact of your research.
Kicking off the first day of the conference is
@ryan_kidd44
, Co-Director of ML Alignment & Theory Scholars (MATS) Program! Ryan will summarise MATS' insights into selecting and developing AI safety research talent and their plans for future projects.
#TAIS2024
Watch the co-director of
#MATS
@ryan_kidd44
talk about MATS' achievements and future plans at
#TAIS2024
! Ryan talked about MATS' mission and goals, shared its insights and observations in
#AIsafety
, outlined the program's ambitions to accelerate high-impact scholars and support
The field of Agent Foundations seeks to understand fuzzy concepts like agency in a rigorous mathematical way, aiming to formally prove safety properties of agentic systems. At
#TAIS2024
, this field will be represented by Tim Parker and Koen Holtman.
At
#TAIS2024
,
@manuelbaltieri
will soon discuss how active inference and free energy principle ideas help define "what agents are" and "what they do", and how agents can be separated from their environment. Watch live:
#AISafety
In his talk at
#TAIS2024
,
@manuelbaltieri
shared the concepts of active inference and the free energy principle, highlighting their significance in
#AIsafety
. He explained how these ideas contribute to defining "what agents are" and "what agents do", particularly emphasizing the
At
#TAIS
, Miki Aoyagi talked about singular learning theory, revealing that learning coefficients of multiple-layered neural networks with linear units remain bounded, even when the number of layers approaches infinity. Her groundbreaking research opens up new possibilities for
@KrutikovAndrei
, CEO of
@NoeonAI
– general partner of
#TAIS2024
– will present his startup's approach to safe AI. In his talk, he will argue that interpretability is the crux of AI safety. Andrei will discuss how his team defines interpretability and how that definition equips
Research that
@hoagycunningham
will present at
#TAIS2024
delves into the issue of finding the right directions in activation spaces of LLMs, among the plethora thereof. Hoagy will explain sparse autoencoders (SAE) as an emerging approach to solving such problems.
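For a concrete picture, here is a minimal sketch of the sparse-autoencoder idea: an overcomplete ReLU encoder whose L1-penalised features reconstruct activations. All shapes and data are hypothetical stand-ins, not Hoagy's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_hidden, n = 16, 64, 100      # overcomplete: d_hidden > d_model

# Hypothetical stand-ins for LLM residual-stream activations.
acts = rng.normal(size=(n, d_model))

# Randomly initialised SAE parameters (a real SAE is trained with SGD).
W_enc = rng.normal(scale=0.1, size=(d_model, d_hidden))
b_enc = np.zeros(d_hidden)
W_dec = rng.normal(scale=0.1, size=(d_hidden, d_model))

def sae_forward(x):
    """Encode activations into sparse ReLU features, then reconstruct."""
    h = np.maximum(x @ W_enc + b_enc, 0.0)   # feature activations
    x_hat = h @ W_dec                        # reconstruction
    return h, x_hat

h, x_hat = sae_forward(acts)
recon_loss = ((x_hat - acts) ** 2).mean()    # reconstruction term
l1_penalty = np.abs(h).mean()                # sparsity term of the training loss
loss = recon_loss + 1e-3 * l1_penalty
```

Training minimises the combined loss, so each learned feature direction (a row of `W_dec`) tends to fire only for a specific, interpretable pattern in the activations.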
At
#TAIS2024
,
@robertskmiles
, well-known on YouTube, offered valuable advice on amplifying the impact of your research, sharing a range of tips, tools, and techniques for effective research communication. Watch now:
#AIsafety
In their talk, James and Matt will discuss the interlinkages between causality, agency and AI safety. They will demonstrate possible practical applications of their theoretical findings by showcasing their approaches towards developing ‘agency detectors’.
#TAIS2024
Tim Parker's talk on a framework for encoding values for
#safeAI
in formal logic is about to start at
#TAIS2024
. Tim will show how his approach prevents unsafe behavior and ensures safety in dangerous contexts. Watch live:
#AISafety
@OskarJohnH
unveils sentiment in LLMs is linearly encoded - intervening on this direction cripples sentiment tasks. Oskar's research exposes underlying mechanisms like attention summarizing sentiment at non-emotional tokens like commas. Oskar will show how disrupting this "summarized"
Miki Aoyagi's
#TAIS2024
talk starts soon. Her research shows the learning coefficients of deep linear NNs are bounded, despite infinite layers. Watch live now:
#AIsafety
At
#TAIS2024
,
@jesse_hoogland
is about to show how transformers exhibit discrete developmental stages during in-context learning, when trained on language or linear regression tasks. Watch live now:
Closing the Day Two of
#TAIS2024
,
@noahysiegel
will address the issue of faithfulness in LLMs - whether their outputs faithfully reflect the underlying factors influencing the response. Watch live:
#AISafety
Wrapping up the Mechanistic Interpretability section at
#TAIS2024
,
@noahysiegel
will address the issue of faithfulness in LLMs, or whether the models' outputs regarding their reasoning trajectory for a given response truly reflect the underlying factors influencing the output in
At
#TAIS2024
,
@hoagycunningham
will shortly present research on finding the right directions in LLM activation spaces. Hoagy will explain sparse autoencoders as an emerging approach to solving these challenges. Watch live:
#AISafety
At
#TAIS2024
@noahysiegel
addressed the issue of faithfulness in LLMs, or whether the models' outputs regarding their reasoning trajectory for a given response truly reflect the underlying factors influencing the output in question. Watch now:
#AIsafety
@AleksPPetrov
presented at
#TAIS2024
the mechanics of prefix-tuning, a method for approximating model responses by tuning the model's initial tokens. His research shows that this approach can universally approximate the behavior of a small model. Watch now:
@KrutikovAndrei
, CEO of
@NoeonAI
is about to present at
#TAIS2024
his startup's approach to interpretability by design. Andrei will discuss how Noeon defines it and how that guides their work on a safe AI architecture. Watch live:
#AISafety
YouTube star
@robertskmiles
is about to share tips, tools & techniques to help researchers multiply the impact of their work, as many underinvest in explaining and promoting their research. Watch live:
@emmons_scott
and
@klingefjord
will conclude the first day of
#TAIS2024
by sharing their research in Scalable Oversight! This field discusses how to ensure AI Safety by keeping humans in the loop, allowing them to effectively oversee advanced AI systems as they scale in
AI must align with human values, but how? Our last speaker of the day, Oliver Klingefjord, tackles this by 1) eliciting people's values on ethical issues and 2) reconciling them into an "alignment target". His method uses a large language model to interview participants about their
@DanHendrycks
,
@ai_risks
director, is about to present his talk on the WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning at
#TAIS2024
. He will explain CUT, a new SOTA unlearning method. Watch live now:
#AIsafety
At
#TAIS2024
,
@DanHendrycks
, director of
@ai_risks
, unveiled his presentation on the WMDP Benchmark, focusing on measuring and mitigating malicious usage through unlearning. He introduced CUT, a cutting-edge unlearning technique. Watch now:
#AIsafety
Scott will discuss the issues of partial observability in reinforcement learning from human feedback (RLHF). Challenging a common assumption that human evaluators fully observe the environment in which they give feedback, he shows that, under certain conditions, RLHF is guaranteed
Registrations for
#TAIS2024
will remain open throughout both days of the conference. Fill out the form and feel free to come to the International Conference Hall in Odaiba, Tokyo, or join us virtually today and tomorrow:
At
#TAIS2024
@emmons_scott
will soon discuss partial observability issues in reinforcement learning from human feedback, showing it can lead to deception or overjustification under certain conditions. Watch live now:
#AIsafety
Koen Holtman will show at
#TAIS2024
how setting the utility function to "Maximize X, while acting as if Y" can make AI safe. He'll demonstrate those are sufficient conditions for corrigibility & domestication. Watch live:
#AISafety
Elevate Your Enterprise with AI Leadership!
#TAIS2024
is organized in partnership with the AI Industry Foundation (AIIF) – a network of companies in the Asia-Pacific region coming together to share their AI expertise. Learn more at:
@jesse_hoogland
, Miki Aoyagi & Stan van Wingerden will present the latest findings in the field of Developmental Interpretability, which aims to uncover how and why structure emerges in neural networks over the course of training, with an eye to preventing sharp left turns.
#TAIS2024
@jesse_hoogland
will demonstrate that in-context learning emerges in transformers in discrete developmental stages, when they are trained on either language modeling or linear regression tasks. Jesse will also share 2 novel methods for detecting these stages.
#TAIS2024
@OskarJohnH
will soon unveil to
#TAIS2024
how sentiment in LLMs is linearly encoded. Oskar will show how sentiment is summarized at non-emotional tokens, and how disrupting this summary decimates zero-shot sentiment classification. Watch live:
#AISafety
Miki Aoyagi will share her invaluable insights into singular learning theory. Her research will show that the learning coefficients of multiple-layered neural networks with linear units are bounded even as the number of layers goes to infinity.
#TAIS2024
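As standard background on the term (from Watanabe's singular learning theory, not from the talk itself), the learning coefficient is the λ in the asymptotic expansion of the Bayes free energy:

```latex
% F_n: Bayes free energy after n samples; L_n(w_0): empirical loss at the
% optimal parameter; \lambda: learning coefficient (real log canonical threshold).
F_n = n L_n(w_0) + \lambda \log n + O_p(\log \log n)
```

A bounded λ thus means the log n penalty does not blow up as network depth grows.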
During his presentation at
#TAIS2024
,
@Klingefjord
outlined his approach to addressing the challenge of ensuring that
#AI
aligns with human values. He described his method, which involves: 1) eliciting people's values regarding ethical matters, 2) consolidating these values into
At
#TAIS2024
,
@AleksPPetrov
is about to shed light on the mechanics of prefix-tuning - approximating model responses by tuning initial tokens. He'll show that a small model's behavior can thereby be universally approximated. Watch live:
#AISafety
Stan van Wingerden will discuss how the findings of the singular learning theory open new opportunities for interpretability, mechanistic anomaly detection, and the study of inductive biases. He will share his thoughts on the field's future role in alignment research.
#TAIS2024
@OskarJohnH
reveals that sentiment within
#LLMs
is encoded linearly, and intervening on this axis detrimentally impacts sentiment-related tasks. Oskar's research unveils how underlying mechanisms such as attention summarize sentiment even at non-emotional tokens like commas.
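A minimal sketch of the kind of intervention this research implies, assuming a hypothetical unit "sentiment direction" `v` (a real direction is found from model activations, not random as here):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 32

# Hypothetical unit "sentiment direction" in activation space.
v = rng.normal(size=d)
v /= np.linalg.norm(v)

def ablate_direction(acts, direction):
    """Project `direction` out of every activation vector -- the kind of
    causal intervention used to test a linear sentiment encoding."""
    coeffs = acts @ direction                    # component along the direction
    return acts - np.outer(coeffs, direction)    # remove that component

acts = rng.normal(size=(10, d))
ablated = ablate_direction(acts, v)
# After ablation the activations carry (numerically) no component along v.
residual = np.abs(ablated @ v).max()
```

If downstream sentiment behaviour collapses after this projection, that is evidence the concept really is encoded along that single linear direction.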
In their talk at
#TAIS2024
,
@James_D_Fox
and
@mattmacdermott1
explored the interconnectedness of causality, agency and
#AIsafety
. They illustrated potential real-world implementations of their theoretical insights by presenting their strategies for creating 'agency detectors'.
Coming back after a short coffee break at 3.30pm JST with
@AleksPPetrov
's talk on how prefix-tuning can approximate model behaviour!
#TAIS2024
#AISafety
At
#TAIS2024
@36zimmer
formalised the notion of agency for Moore machines. He shared his insights into detecting agency within this context and speculated about its broader implications. Watch now:
#AIsafety
Scott Emmons discussed at
#TAIS2024
the issues of partial observability in reinforcement learning from human feedback (RLHF). He challenged the prevalent notion that human evaluators have complete awareness of the environment when providing feedback. Scott revealed that under
Koen Holtman illustrated how
#AI
safety can be enhanced by adjusting its utility function. He demonstrated that conditions for corrigibility and domestication can be met by configuring the utility function to 'Maximize X, while acting as if Y'. Watch now:
@AleksPPetrov
will shed light on the mechanics of prefix-tuning, an approach that approximates a model's responses by tweaking its initial tokens. His research demonstrates that a small model's behaviour can be universally approximated using this approach.
#TAIS2024
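A hedged toy sketch of the mechanism (one random attention layer standing in for a frozen transformer; in the real method the prefix vectors are trained by gradient descent):

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_prefix, n_input = 8, 4, 6

def toy_attention(x):
    """One self-attention layer standing in for a frozen model."""
    scores = x @ x.T / np.sqrt(d)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ x

# Only the prefix would be trained; the "model" (attention) stays frozen.
prefix = rng.normal(scale=0.1, size=(n_prefix, d))
inputs = rng.normal(size=(n_input, d))

out = toy_attention(np.vstack([prefix, inputs]))
# Read out only the real-input positions: the prefix steers them via
# attention without any change to the model's weights.
steered = out[n_prefix:]
```

Because every input position attends to the prefix, tuning those few vectors reshapes the outputs at all positions, which is what makes the approximation result possible.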
@ReaktorNow
is proud to sponsor
#TAIS2024
in Tokyo! It will be a unique opportunity for researchers to learn and connect, and Reaktor is delighted to help make it happen. Find out what they’ve learned from 100+ AI projects at
#AISafety
#AI
@ryan_kidd44
, Co-Director of ML Alignment & Theory Scholars (MATS) Program, begins his presentation at
#TAIS2024
! Ryan will summarise MATS' insights into selecting and developing AI safety research talent and their future plans. Watch live now:
In his talk at
#TAIS2024
,
@hoagycunningham
presented research on navigating activation spaces within LLMs, putting forward sparse autoencoders (SAE) as a way to identify optimal directions therein.
Watch now:
#AIsafety