![Parul Pandey Profile](https://pbs.twimg.com/profile_images/1795676874066268160/JR23v7bt.jpg)
Parul Pandey
@pandeyparul
Followers
7K
Following
4K
Media
469
Statuses
2K
Experimenting with AI tools to make education more accessible for kids| Author | @kaggle Grandmaster(Notebooks) | Mom
India
Joined December 2010
Professor Ashwin Rao of Stanford University has crafted a compelling @GoogleColab notebook that delves into the high-level reasons behind the failure of Silicon Valley Bank (#SVB). In this resource, Ashwin maintains a straightforward approach, employing only high-school-level
6
200
997
Extremely excited to share a project that we have been working on during the past months. A book on #MachineLearning for High-Risk Applications. You can read the pre-release here: #ResponsibleAI
14
157
855
This paper covers student misconceptions about #overfitting in #MachineLearning, solutions to overfitting, and implementation mistakes commonly confused with overfitting issues. 🔗:
7
166
782
Awesome. Google's Gemini Pro and Gemini Pro Vision are now available on @kaggle at no cost. There is also a notebook to help you get started:
6
120
584
#DataWrangler is a code-centric data cleaning tool 🧹that is integrated into #vscode and VS Code Jupyter Notebooks. With it, you can seamlessly clean and explore your data in VS Code ✨. 🐼 It automatically generates Pandas code and shows insightful column statistics and
7
101
462
One of the perks of being an O'Reilly author is getting early access to all their upcoming books. Today, I spent some time exploring the new book on language models by @JayAlammar and @MaartenGr. I'm thoroughly enjoying it, particularly the clear diagrams that make the learning
3
49
330
🤔 After having worked with #pandas library for so long, it's only recently that I came to know about the #read_clipboard functionality. 💡 You can create a dataframe directly from data copied to the clipboard using this method.🔗 Here is a link to read:
3
56
265
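A minimal sketch of the idea in the post above, under stated assumptions: `pd.read_clipboard` hands the clipboard text to `read_csv`, so the example parses a hypothetical tab-separated sample (standing in for cells copied from a spreadsheet) the same way. The helper name `frame_from_copied_text` and the sample data are illustrative and not from the original post.

```python
import io

import pandas as pd


def frame_from_copied_text(text: str) -> pd.DataFrame:
    """Parse tab-separated text, the shape a copied spreadsheet range usually has."""
    return pd.read_csv(io.StringIO(text), sep="\t")


# Hypothetical sample standing in for a copied spreadsheet selection.
sample = "name\tscore\nasha\t91\nravi\t78\n"
df = frame_from_copied_text(sample)
print(df)

# With real data on the system clipboard, the equivalent call would be:
# df = pd.read_clipboard(sep="\t")
```

The clipboard call is left commented out because it needs a desktop session with a clipboard available.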
🔥 PyTorch Conference 2023 Highlights! 🔥 The videos from the #PyTorchConference2023 are now up on their official YouTube channel. While I'm still diving into the content, here are some standout sessions that caught my eye and are definitely worth a watch: 1️⃣ What's New for
2
68
219
🎊 Career Update: After a successful stint at I'm excited to be joining the awesome folks at @weights_biases as a #MachineLearning Engineer. Really excited about this job role and looking forward to contributing to the product as well as the community.
10
6
207
Thank you, @Thom_Wolf for sharing your slides from the recent lecture at ELLIS Winter School. Despite its modest title, "A Little Guide to Building Large Language Models in 2024," the presentation is anything but 'little'—offering a deep dive into the intricacies of the workflow
1
42
204
The YouTube channel @Socratica is highly underrated and deserves more support IMO. Not only do they have great educational content and coding videos, but the delivery is also exceptionally good. The #Python and #SQL videos by @ulkaM are superb.
9
47
189
There is now a direct integration between arXiv and @huggingface. This makes it easy to find related papers, models, and datasets instantly on the Hugging Face website. Nice work by the team.
4
48
194
My notebook won an award in the ongoing @Kaggle 2019 Survey Challenge. What makes me even happier is that the topic I explored in the survey was related to women's representation in ML and DS and people could relate to it. @wimlds @h2oai @WiMLDS_HYD.
The 2019 Kaggle Survey Challenge is underway! The first notebook to win a prize explored women's representation in machine learning and data science:
14
21
175
The long-awaited video on Transformers by @3blue1brown is finally here! Even for those well-versed in Transformers, the visualizations in this video are simply outstanding.
1
22
141
@AndrewYNg #AIpun. Q: What is the brain's favourite cable television channel? A: The Neural Network.
0
34
135
Help @SETIInstitute build a student-run meteor observatory in #India! This will help them track long-period comets, small asteroids, Comet, & streams of meteoroids hurtling around our solar system. Contribute and help spread the word. Fundraiser:
1
38
119
Great presentation by one of our @kaggle GrandMasters, Guanshuo Xu, Principal Data Scientist at In a detailed walk-through, Guanshuo presents the strategy behind securing the second-place finish in the Kaggle competition whose objective was to discern
0
27
122
Just published - Different ways of getting #datasets for your #DataScience tasks. It's basically a compilation of 8 different articles that I wrote on the same topic, where each touches on a different technique.
2
34
116
Was trying to make this work and @OpenAI o3-mini nails it perfectly! I’m building an educational app to help kids visualize 3D shapes easily. The goal was to create a 3D cube that responds to touch, rotates when clicked, and unfolds into a net when double-clicked. Will replicate
8
24
132
Great paper on how UX can contribute throughout the AI lifecycle, from stakeholder research and ideation to creating design interventions and evaluation. Authors: @mihaela_v @QVeraLiao @HariSubramonyam and @lgw_. Link:
0
14
111
Introducing H2ODanube3 from @h2oai, a series of small language models consisting of: 📊 H2O-Danube3-4B: Trained on 6T tokens. 📊 H2O-Danube3-500M: Trained on 4T tokens. 📚 These models were pre-trained on high-quality web data in three stages with different data mixes. After
3
25
109
Ever wonder how well Large Language Models (#LLMs) represent global perspectives? The recent @AnthropicAI study sheds light on some fascinating findings. • It turns out that LLMs seem to echo the views of specific groups, particularly those from the USA and certain European
5
22
109
New to Retrieval-Augmented Generation (RAG) and don't know where to start? @helloiamleonie has got you covered with her excellent collection of tutorials on this topic. In this collection, you'll find my articles related to: ✨ RAG Paradigms: Naive RAG, Advanced RAG, Modular
6
11
104
A very informative playlist of short bite-sized videos on #ethics in #machinelearning by @math_rachel. They have been curated from her excellent lecture on ethics in the @fastdotai series. 📺Playlist: 📰Blog:
0
33
102
A compilation of some of the advanced, or rather interesting, plots in @matplotlib that could take our analysis up a notch. #dataviz
1
16
94
I have interviewed a lot of Kagglers in the past. It feels nice to be on the other side of the fence for a change. Thanks, @wendykan and @kaggle, for getting my story out. Happy Mother's Day. 💞
A Mother's Day interview between Kaggle Master and career-changing mama @pandeyparul, and Kaggle's own Head of Data Analytics, @wendykan. 💞#womenindatascience #WiDS . [READ]
4
9
89
Samvaad is the latest open-source initiative from @SarvamAI showcasing a rich collection of datasets specifically tailored for India. This release features 100,000 high-quality, multi-turn conversations, amassing over 700,000 turns, available in English, Hindi, and Hinglish.
2
12
92
Render interactive plots with #Matplotlib via different backends 📊📉 #dataviz. 1⃣ The 'nbagg' backend can be enabled via the '%matplotlib notebook' command in the #Jupyter notebook. The interactive features include pan, zoom, and auto-updating the figures from other cells.
2
11
85
For people following @fastdotai's #Fastbook sessions, and for everyone else too, the following articles by @waydegilliam are a great supplement 👌.
2
15
87
📣 Excited to announce that our book on #MachineLearning for High-risk applications has finally reached the finish line 🏁 and it is going into production! It was a mammoth task, but the final draft is now complete. Pre-release on @OReillyMedia
6
14
78
I am currently reading ‘Practical Guide to Applied Conformal Prediction in Python,’ by @predict_addict, and while I’m in the initial chapters, I was drawn to the last chapter on Handling Imbalanced Data — a critical aspect of machine learning. 🎯 The chapter provides some great
5
15
83
Efforts are already underway to replicate #Devin through the power of the open-source community, aiming not only to emulate but also to enhance and innovate upon the original project. #opensource .
Today we're excited to introduce Devin, the first AI software engineer. Devin is the new state-of-the-art on the SWE-Bench coding benchmark, has successfully passed practical engineering interviews from leading AI companies, and has even completed real jobs on Upwork. Devin is
2
10
79
Yesterday, I gave a presentation in @Kaggle_Days #Hyderabad. Just converted my presentation into a blog post - This article is a compilation of my learnings when it comes to writing effective @kaggle #notebooks. @MeganRisdal.
2
18
75
The short course Large Language Models with Semantic Search, offered by @DeepLearningAI and taught by @JayAlammar and @SerranoAcademy, is an excellent way to learn how to leverage LLMs for search. This course, which doesn't take more than an hour to complete, is just
2
10
73
This "Zero to MVP" is a great course not only on how to use #Weaviate but, in general, a great 101 on vector databases and searches as well. I really like the flow and the short, bite-sized videos. Link:
4
22
64
The second (and final) article in the 'Advanced Plots in @matplotlib' series. This version covers the following: ✳️ Event plots. ✳️ Timeline plots. ✳️ Bar of Pie. ✳️ Cyberpunk style 🤘. ✳️ Automatic label placement. #dataviz
2
13
63
Really like the new features for dataset search in the @huggingface Dataset Hub! For instance, you can now filter by: • Modality • Size • Format • Library. The best part is that all these new features can be combined with existing filters, making it so much easier to find
0
11
58
#IndicVoices - a dataset of natural and spontaneous speech containing a total of 7348 hours of read (9%), extempore (74%), and conversational (17%) audio from 16,237 contributors. ✨This dataset spans 145 districts across India 🇮🇳and encompasses 22 languages. Link:
1
11
60
Gemini 2.0 Flash has a 1M token context window. Since 1M tokens can be hard to visualize, @DynamicWebPaige nicely puts it into perspective with examples of what 1-10M tokens can represent. This is from the @weights_biases latest course on LLM evals.
2
15
62
has recently launched #TheLearningCenter, where you can sign up for #AI and #machinelearning courses for free. Also, don't forget to share your accomplishments with the community via the H2O badges. Link to the learning Centre:
2
25
59
Having a blast creating games with Gemini Coder using @_akhaliq 's AnyChat @huggingface space! Sharing a few here 😊 They turned out surprisingly good with very little prompt engineering—great for a first iteration and can be polished further. 1. Fractions Feast - to help kids
2
11
62
Tried out ai-gradio, a Python library from @_akhaliq that makes deploying AI apps easy using the Gradio interface. It integrates seamlessly with major model providers and works within @GoogleColab too. Below is a demo of using DeepSeek-V3 for chat via @hyperbolic_labs
5
12
53
Glad to become 3x @kaggle Grandmaster today with the latest one being in @KaggleDatasets. It was a very good learning experience collecting, curating and maintaining datasets. Thank you all!
3
3
50
In the rapidly evolving fields of #GenerativeAI and Copyright law, it’s increasingly important to have a grasp of both domains. As these disciplines become ever more intertwined, understanding the key concepts in each is crucial to navigating their intersection successfully.
1
7
48
— is a #WebScraping Sandbox website created for practicing web scraping. There are two subdomains where you can practice either scraping a fictional #bookstore or a site that lists quotes from famous people
0
12
44
See you all today. Details below. Also, I'll be open to questions on my journey in data science and #kaggle.
DON'T MISS THIS TODAY! Parul is the first Indian woman to achieve the title of Kaggle Notebooks grandmaster and in this very special episode of Talks, she will talk about Data Science, Diversity & Kaggle. LIVE: Google Calendar:
1
4
46
A round-up of 20 exciting LLM-related papers by @seb_ruder. Sebastian has done an incredible job in sifting through 3586 papers to bring us a curated selection of 20 standout #NLP papers from #NeurIPS2023. Here's a quick glimpse into the main trends that are defining the future
0
11
44
There’s an interesting new course on @DeepLearningAI on building Long-Context AI Apps with Jamba, by @AI21Labs. If you’re curious about the Jamba architecture, I highly recommend this course.
4
8
41
#FACTSCORE - a novel evaluation method that breaks down a generation into atomic facts, calculating the percentage of facts supported by a reliable knowledge source allowing a more fine-grained evaluation of factual precision in #LLMs.
2
13
38
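The scoring step described in the post above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the atomic facts are assumed to be already extracted, and exact membership in a small hypothetical set stands in for the check against a reliable knowledge source. The function name `factual_precision` and the sample facts are invented for the example.

```python
def factual_precision(atomic_facts, knowledge_source):
    """Return the fraction of atomic facts that the knowledge source supports."""
    if not atomic_facts:
        raise ValueError("need at least one atomic fact")
    # Assumption: exact membership stands in for a real support check.
    supported = sum(1 for fact in atomic_facts if fact in knowledge_source)
    return supported / len(atomic_facts)


# Hypothetical knowledge source and a generation already split into atomic facts.
knowledge = {"Paris is in France", "The Seine flows through Paris"}
facts = [
    "Paris is in France",
    "The Seine flows through Paris",
    "Paris is the capital of Spain",
    "Paris has 40 million residents",
]

score = factual_precision(facts, knowledge)
print(f"{score:.0%} of atomic facts supported")
```

Scoring per atomic fact rather than per response is what makes the evaluation fine-grained: a long answer with one wrong claim is penalized in proportion, not marked wholly wrong.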
Zero-Shot Tokenizer Transfer (ZeTT). Authors: @bminixhofer @PontiEdoardo @licwu. TLDR: ZeTT allows language models to swap tokenizers on the fly; by training a hypernetwork to predict embeddings for new tokenizers, ZeTT maintains performance with minimal training. This approach
0
10
41
Explore the world of Large Language Models (#LLMs) with ease. Check out my latest blog on @TDataScience featuring visual tools and articles, simplifying complex LLM concepts. The guides include some well-known resources and also some new ones. Hope you find it useful. · 1.
2
10
38
I'll be posting a series of #ShortArticles on finding datasets for data analysis. This series is targeted towards #beginners looking to create an end to end project starting with #Data acquisition.
0
7
40
I will be speaking at @ODSC's next #Hyderabad meetup. It'll be a fun and interactive session where we shall discuss some interesting tips and libraries that can aid in the #dataanalysis and EDA process. Signup here:
2
4
38
The hackathon being organised as part of our H2O Open Source #GenAIWorld event is now live on @kaggle. 🎯 Objective: Can you detect which of 7 potential #LLM models produced a specific output? Each model comes with its distinct features. Challenge yourself to identify the
1
15
39
This is pretty cool. Here is a personal blog site with a Netflix-style layout, made using Gemini coder.
Gemini coder can now accept image uploads in anychat. example shows adding snow to google main page. using the ai-gradio integration, its only a few lines of code for developers to setup their own gemini coder apps. try it out
1
3
38
An interesting paper that reveals the latest trends in LLM research on arXiv. The authors analyzed 388K papers from CS and Stat arXivs, focusing on shifts in publication patterns between 2023 and 2018–2022. Lots of great insights. Authors: @rajivmovva, @sidhikab1, @kennylpeng
0
11
38
Converted my submission to the 2019 @kaggle ML & DS survey challenge into a blogpost to make the insights available to a wider audience @TDataScience @h2oai.
0
18
38
Towards AI Accountability Infrastructure: Gaps and Opportunities in AI Audit Tooling. Authors: @OjewaleV, @ryanbsteed, @brianavecchione, @Abebab, @rajiinio. TLDR: Despite many tools designed for setting standards and evaluating AI systems, these often fall short in practice for
0
9
37
Expanding on my previous post, I wanted to share another valuable resource by @hima_lakkaraju, @nazneenrajani & @kkenthapadi titled "Generative AI meets Responsible AI." This informative talk was presented during #FAccT2023 and features an impressive 133 slides that delve into
1
9
36
I wrote a pretty basic notebook on analysing #TimeSeries ⌛️ data with #pandas on @kaggle. Nothing too advanced, but people who are just getting started with time series may find it useful. 😀
1
6
36
This course by @pranavrajpurkar came out last year, and while techniques can age, what really stands out is its focus on rarely taught skills—like reading research papers, generating ideas, presenting them effectively in slides, project management and team communication.
1
8
36
Create hassle-free subplots in @matplotlib using the subplot_mosaic function, which provides a convenient interface for visually laying out your axes - via @TDataScience
Simplifying subplots creation in Matplotlib by @pandeyparul
0
4
34
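A short sketch of the layout interface mentioned above, assuming Matplotlib 3.3 or later, where the function is exposed as `plt.subplot_mosaic`. The panel labels ("wide", "left", "right") and the plotted data are made up for illustration; the non-interactive `Agg` backend is chosen only so the example runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# Each inner list is one row; repeating a label makes that axes span cells.
layout = [
    ["wide", "wide"],
    ["left", "right"],
]
fig, axes = plt.subplot_mosaic(layout, figsize=(6, 4))

# `axes` is a dict keyed by the labels used in the layout.
axes["wide"].plot([0, 1, 2], [0, 1, 4])
axes["left"].bar(["a", "b"], [3, 5])
axes["right"].scatter([1, 2, 3], [2, 1, 3])

for label, ax in axes.items():
    ax.set_title(label)

fig.savefig("mosaic_demo.png")
```

Addressing axes by name rather than by grid index keeps the plotting code readable when the layout is later rearranged.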
It’s heartening to see the remarkable work being done on advancing Indian languages in the AI space. The release of Navarasa-2.0, a Gemma 7B/2B instruction-tuned model, is another such stellar example. Kudos to @ravithejads and @ramsri_goutham for this initiative uplifting our
2
1
34
I used #Python's #Folium library to analyze the regions that have been affected by the #CoronavirusOutbreak (till 30th Jan). Link to the @Kaggle notebook. Hope to add more info in the coming days.
4
7
33
In my new article, I have compiled some new features and enhancements that have been added to Google's #Colaboratory Notebooks. 1. Smart Data Pasting from Google Sheets: when users paste data into an empty code cell, Colab automatically generates code to create a Pandas DataFrame.
1
7
31
Great short course by @DeepLearningAI that covers everything about working with the @AnthropicAI API, culminating in a computer-use agent demonstration at the end.
2
10
32
Brilliant Overview of Vector Search by @victorialslocum - clear, concise, and to the point. Also has some good use cases for vector search.
2
4
31
I will be joining @amaarora for the next #FastBook session this Thursday at 1 pm AEST (8:30 am IST) for a guest lecture. Here is a post I wrote to summarize what I intend to cover 👇
Here's a wonderful article by @pandeyparul - "Building a compelling Data Science Portfolio with writing"! Having a good data science portfolio not only helps the people around you but also is great for your own personal growth! 1/
0
6
32
Based on my preliminary look, the chapters are very well written, and the content looks solid. The inclusion of exercises and challenges at the end of each chapter is a nice touch to test understanding and reinforce learning 🎯. So looking forward to the official release.
Excited to share the latest early release of our book - Hands-On Generative AI with Transformers and Diffusion Models 🔥. This release includes new chapters on ML for Audio, fine-tuning LLMs, and fine-tuning SD. Enjoy!
1
2
32
Here is a great course on Explainable Artificial Intelligence (#XAI)! It is great for people who want to understand the following: 🔍 Discover the power of transparency in complex machine-learning models. ⚖️ Explore the difference between interpretable models and post-hoc
0
9
31
Shared @KaggleDatasets of #COVID related clinical studies being conducted worldwide : Dataset curated from . Has some great insights to offer wrt the current phases of trials.
0
6
30
A wonderful resource on building models responsibly, in comic-book format: “The Hitchhiker’s Guide to #Responsible Machine Learning” 📚✨. 🔢 Divided into 10 sections, the book provides step-by-step instructions for creating and testing models with a strong emphasis on
1
4
28