Jay Chia - getdaft.io @JayChia5 profile

Followers: 305

Following: 143

Statuses: 364

Cofounder @ Eventual. Works on Daft (https://t.co/i5vV81AuTj) the Distributed Python Dataframe. LESS OOM MORE ZOOM

San Francisco, CA

Joined August 2022

Jay Chia - getdaft.io

@JayChia5

11 months

Late night rant: Spark is an awesome piece of software. But a horrible developer experience.
What happened to OSS that was simply `apt install` and 🚀? Why should software be excused for slow local performance because it was built for "production scale"?
So much of "big data" JVM-based tooling was hacked together on the giant datacenters of tech giants. The world has changed, and so too must our big data tooling.
⭐️ Rust: self-contained compiled native binaries that have no dependencies. Hello, clean installs, my old friend.
🐍 Python: the undeniable winner of iterative plumbing for data/ML. Build with a Python API in mind. Using the JVM through a Py4J gateway should be an automatic disqualification.
☁️ Cloud: Build cloud-first, lightweight, ephemeral software. Cattle vs Pets. S3, not NFS/HDFS. Spot instances, not machines on a rack.
🤓 Dev UX: build for the single developer, on their laptop, then think about scaling. A docker-compose local dev story is lazy bundling of overly complicated software.
☀️ Open Formats: let software TALK to each other, so devs can choose the right tool for the right job, and so devs can keep building better tooling. This is why JSON is awesome. Arrow is awesome. Iceberg is awesome. Parquet and CSV are (I begrudgingly admit) somewhat awesome. And please build flexible SDKs for these formats, in C++ or Rust, not just for the JVM.

0

1

17

Jay Chia - getdaft.io

@JayChia5

1 day

@criccomini @continuedev @cursor_ai @daft_dataframe Yes! Was shocked at how good it was at GitHub Actions specifically. I guess there’s a ton of training data out there that looks really similar since there’s just a finite set of ways to configure actions.

1

0

4

Jay Chia - getdaft.io

@JayChia5

7 days

@haro_ca_ Plenty! Unstructured data clustering, GPU model batch inference, running Python UDFs (efficiently), dataset curation on unstructured data, video ingestion/indexing... Hint hint: Daft can do all that, and we're going to get a SICK data warehouse with Daft as the engine :)

2

0

4

Jay Chia - getdaft.io

@JayChia5

7 days

@Ubunta Why do you think tools like DuckDB are only discussed over lunch, but organizations are still using the big expensive data warehouses?

2

0

1

Jay Chia - getdaft.io

@JayChia5

8 days

@mim_djo @daft_dataframe @duckdb Disk cache -- for repeated access of the same version of an iceberg/delta table?

1

0

2

Jay Chia - getdaft.io

@JayChia5

9 days

Beautiful dev documentation examples: OpenAI's docs: FastAPI: And of course, Daft:

0

1

Jay Chia - getdaft.io

@JayChia5

21 days

Daft cooks. We ate. Get yourself a CEO like @Sammy_Sidhu who can cook for the entire team??

0

2

11

Jay Chia - getdaft.io

@JayChia5

1 month

New year 2025 and Spark still makes me sad.
Had a 30m back-and-forth conversation with Claude to figure out wtf is s3:// vs s3a:// vs s3n://
Also all the magical `--confs` that need to be added to get this stuff working.
Jeez.

1

0

2

Jay Chia - getdaft.io

@JayChia5

2 months

We’ll be building some blogposts in the new year to talk about benchmarks and use-cases! Some good ones off the top of my head:
- streaming ML dataloading (URL downloads into image decoding/tensors)
- interactive data science: .show() is significantly snappier
- ETL: much better memory usage/stability

0

Jay Chia - getdaft.io

@JayChia5

2 months

We expect some instabilities initially (a BIG thank you to our beta testers!) but by and large you should see a big improvement in both the user experience and memory! HAPPY HOLIDAYS all, from the Daft team 🤗🤗🤗

0

2

Jay Chia - getdaft.io

@JayChia5

2 months

@mim_djo Cool. How would something like Daft be able to interact with PowerBI? Is this JDBC/ADBC or something similar?

1

0

1