Building LLM apps that chain LLMs with vector databases, agents and more? Playing with @LangChainAI @gpt_index + @OpenAI @huggingface LLMs + @pinecone DB? Try TruLens for LLMs -- the first open source library to evaluate and track your LLM experiments.
Stanford CS 329T: Trustworthy Machine Learning: Large Language Models and Applications -- a course I am teaching with John Mitchell (Stanford) and Ankur Taly (Google) started today!
We will be putting a lot of content online. Check out the course page and stay tuned for more.
CMU colleagues @Mmffmmffdd @zicokolter and my former PhD student Zifan Wang have created automated safety attacks on open source LLM chatbots that, surprisingly, also transfer to closed source chatbots -- ChatGPT, Bard, Claude.
Open foundation models are playing a key role in the AI ecosystem. On September 21, @RishiBommasani @percyliang @random_walker and I are organizing an online workshop on their responsible development. Hear from experts in CS, tech policy, and OSS.
RSVP:
7/27: Live at the Midway, SF -- join @jerryjliu0, @shayaksen and me for a hands-on workshop on building and evaluating LLM apps -- RAGs, Data Agents and more -- with open source tools @llamaindex + @TruLensML! Let's build the future responsibly. #LLMs
Excited to chair the National Academies workshop today on Trustworthy AI. Expect an insightful, actionable discussion with distinguished panelists and organizing committee.
@DMulliganUCB @aleks_madry @benbendc @susan_dumais J Wing, DJ Dvijotham
Don't miss our workshop starting tomorrow on how we can design trustworthy #AI systems for use in finance, transportation, and health! Hear from experts including Facebook’s @jquinonero, FDA’s @_bakulpatel and UPenn’s @mkearnsupenn. Register here:
Properly evaluating agents is an under-explored problem - what are the key tests you should run on inputs/outputs? 🧪🤖
Check out this brand-new notebook that helps explore that on our @llama_index Yelp agent! 🧑‍🍳
Huge s/o to the @truera_ai team:
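The kind of input/output tests the notebook explores can be sketched generically. The harness below is a hypothetical, library-free illustration -- not the TruLens or LlamaIndex API -- where `agent` is any callable from query string to answer string and the individual checks are placeholders for real evals:

```python
# Minimal sketch of input/output checks for an agent.
# `agent` is any callable str -> str; the checks are illustrative.

def check_nonempty(output: str) -> bool:
    """The agent should always return a substantive answer."""
    return len(output.strip()) > 0

def check_no_refusal(output: str) -> bool:
    """Flag boilerplate refusals on questions the agent should handle."""
    refusals = ("i cannot", "i'm sorry", "as an ai")
    return not any(r in output.lower() for r in refusals)

def check_mentions(output: str, required_terms: list[str]) -> bool:
    """Spot-check that key entities from the query appear in the answer."""
    return all(t.lower() in output.lower() for t in required_terms)

def run_io_tests(agent, cases):
    """cases: list of (input, required_terms). Returns per-case results."""
    results = []
    for query, terms in cases:
        out = agent(query)
        results.append({
            "input": query,
            "nonempty": check_nonempty(out),
            "no_refusal": check_no_refusal(out),
            "mentions": check_mentions(out, terms),
        })
    return results

# Toy agent standing in for a real retrieval-backed agent:
toy_agent = lambda q: f"Based on Yelp reviews, {q.rstrip('?')} is highly rated."
report = run_io_tests(toy_agent, [("Is Tartine Bakery good?", ["Tartine"])])
```

In practice each boolean check would be replaced by an LLM- or model-based feedback function, but the harness shape -- run the agent over a test set, score each input/output pair, aggregate -- stays the same.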
@ylecun Cool, @ylecun. @CarnegieMellon has had related interdisciplinary efforts, including the Center for Automated Learning and Discovery from 1997 that became the Machine Learning Department in 2006. Not identical in that it's a department within the School of CS, but many affiliated
A new LLM, called phi-1.5, released today by @SebastienBubeck and his team @MSFTResearch. Released as open source to support open research on AI foundations, transparency, and safety, this model does text completion with quality that compares favorably to much larger LLMs.
A couple of interesting recent papers are making progress toward significantly smaller LLMs for tasks like writing and coding. Perhaps the data gap between training LLMs and learning for children can be bridged over time @mcxfrank
via: @karpathy, @SebastienBubeck, @EldanRonen
"Textbooks Are All You Need" is making rounds:
reminding me of my earlier tweet :). TinyStories is also an inspiring read:
We'll probably see a lot more creative "scaling down" work: prioritizing data quality and diversity over
Pleasure joining @AzitaMartin (VP and GM for AI in Retail & more, NVIDIA), Xun Wang (CTO, Bloomreach), and Christina Augustine (COO, Bloomreach) on a panel on Generative AI in Retail aptly named "No Limits".
Key takeaways:
1/n
Insightful keynote by @pwang at @DataCouncilAI! Key takeaway as data science adoption grows in the enterprise: enable exploration and innovation by data scientists, with guardrails to mitigate risk and enable adoption by MLOps at scale in production.
#DataScience #MLOps #DataCouncilAI
Reflections on GenAI, operationalizing AI, and the role of education from a week in Europe:
I spent the last week speaking at and hanging out with a few thousand startups, SMBs, and enterprises at @WorldSummitAI in Amsterdam, having in-depth technical conversations with leaders
My blog post on AI Quality attributes that are critical for AI to generalize well, respect societal expectations of transparency and fairness, and sustainably deliver business value throughout its lifecycle. Important for every data science and MLOps team.
@geomblog @ginasue @mathbabedotorg That said, going from basic research to impact on practice does present a number of hurdles -- access to high-impact algorithmic systems is one; incentives for organizations to adopt accountability technologies is another. Some of those points in the article make sense. 2/2
Proper evals are important when setting up a complex retrieval-augmented system:
- Overall App I/O
- LLM I/O
- Retrieved context calls
- Latency
- Token counts
We show how you can eval all these RAG components when using LlamaIndex with @truera_ai! 🧪
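The component breakdown above can be illustrated with a minimal, library-free sketch of an instrumented RAG call -- `retrieve` and `generate` here are hypothetical stand-ins for real LlamaIndex retriever and LLM calls, and the whitespace token count is a crude proxy for a real tokenizer:

```python
import time

def approx_tokens(text: str) -> int:
    """Crude token estimate (whitespace split); real apps use a tokenizer."""
    return len(text.split())

def instrumented_rag(query, retrieve, generate):
    """Run one RAG call and record per-component latency and token counts."""
    record = {}
    t0 = time.perf_counter()
    contexts = retrieve(query)                   # retrieved context calls
    record["retrieval_latency_s"] = time.perf_counter() - t0
    record["num_contexts"] = len(contexts)

    prompt = query + "\n" + "\n".join(contexts)  # LLM input
    t1 = time.perf_counter()
    answer = generate(prompt)                    # LLM output
    record["llm_latency_s"] = time.perf_counter() - t1
    record["prompt_tokens"] = approx_tokens(prompt)
    record["completion_tokens"] = approx_tokens(answer)
    record["total_latency_s"] = time.perf_counter() - t0  # overall app I/O
    return answer, record

# Toy components standing in for a real retriever and LLM:
answer, rec = instrumented_rag(
    "capital of France?",
    retrieve=lambda q: ["Paris is the capital of France."],
    generate=lambda p: "Paris.",
)
```

Each record captures overall app I/O, LLM I/O, the retrieved contexts, latency, and token counts -- the same five components listed above -- and can then be scored with feedback functions.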
4/n 3. There are significant concerns around hallucinations in generative use cases in retail, and an ask for better LLM app evaluations. We are seeing some promising results on evaluations as we build on our open source @TruLensML library to work with a number of companies.
Tooling to enable systematic testing and monitoring of AI Quality (data quality, model performance, societal impact indicators such as fairness and privacy) is a key part of the solution.
@geomblog @ginasue @mathbabedotorg includes faculty from CMU, Cornell, ICSI. This area is growing in academia, including tenure-track and tenured faculty participation. So it is difficult to accept the claim that academia is asleep. 1/2