Yingjun Wu | Building Data Infra @YingjunWu profile

Yingjun Wu | Building Data Infra

@YingjunWu

Followers: 3,280

Following: 901

Media: 243

Statuses: 1,496

Founder @RisingWaveLabs . Database, stream processing, event-driven architecture. Previously @awscloud Redshift, @IBMResearch Almaden. PhD @NUSingapore @CMUDB .

https://t.co/QuVRonhK2F

San Francisco, CA

Joined February 2011

Pinned Tweet

Yingjun Wu | Building Data Infra

@YingjunWu

2 months

I’ll probably launch a new product later this year. Build in public. To developers. Still data infra. But with AI. Incredibly exciting time to run startups. Follow me. stay tuned.

3

0

15

Yingjun Wu | Building Data Infra

@YingjunWu

11 months

New data engineering trend (?): Several companies have told me that they are moving away from Kafka to S3 for message queuing use cases. The reason is that they think Kafka is too expensive, and it's not worth running Kafka instances just for system decoupling or connection.

28

40

314

Yingjun Wu | Building Data Infra

@YingjunWu

2 years

We are officially a series-A startup now! 🎆🎆🎆 With $36M from the series-A round, we will accelerate the development of our flagship product, RisingWave Cloud, which is now open for private preview! Press release:

RisingWave Labs Secures $36M in Series A Funding for its Stream Processing Platform

Former AWS and IBM database engineer builds a company to deliver a simple, affordable cloud-native streaming database for real-time application development...

www.globenewswire.com

8

172

Yingjun Wu | Building Data Infra

@YingjunWu

2 years

It was my paper published in VLDB 2017. The original title was "This is the Best Paper Ever on In-Memory Multi-Version Concurrency Control." We changed the title 3 times as the chair threatened to desk reject our paper 🙃

Murat Demirbas (Distributolog)

@muratdemirbas

2 years

[new blog post] An Empirical Evaluation of In-Memory MultiVersion Concurrency Control (VLDB'17)

0

6

29

2

14

172

Yingjun Wu | Building Data Infra

@YingjunWu

1 month

Postgres, Kafka, Iceberg. Build everything around them. That’s what I’m betting on heavily.

4

11

167

Yingjun Wu | Building Data Infra

@YingjunWu

2 years

I recently wrote a new blog () to share my thoughts about stream processing. While having been working on stream processing for 10+ years, I still feel I am pretty new to this domain. I am still learning, and any comments are greatly appreciated!

0

8

106

Yingjun Wu | Building Data Infra

@YingjunWu

2 years

It's already 2023, and RisingWave will enter its 3rd year. RisingWave has grown from a personal project to one with 3.7K stars and 100+ contributors. I summarized RisingWave's 2022 here in a blog: . Any comment is welcome, and happy new year!

1

9

89

Yingjun Wu | Building Data Infra

@YingjunWu

1 year

The landscape of vector databases. Did I miss anything? @trychroma @qdrant_engine @milvusio @zilliz_universe @weaviate_io @pinecone @elastic @Redisinc @RocksetCloud @SingleStoreDB @PostgreSQL #vectordatabase

22

23

108

Yingjun Wu | Building Data Infra

@YingjunWu

8 months

Who uses @duckdb for real? Very interesting discussion. Seems that DuckDB is gaining widespread popularity in the data science domain. Can we simply use SQL (instead of Python, like Pandas) to do data science???

From the dataengineering community on Reddit

Explore this post and more from the dataengineering community

www.reddit.com

4

9

104

Yingjun Wu | Building Data Infra

@YingjunWu

1 year

Stream processing + OLAP = true real-time analytics 🚀🚀🚀 @RisingWaveLabs @ClickHouseDB

2

5

98

Yingjun Wu | Building Data Infra

@YingjunWu

1 year

The stream processing landscape. I hope I've included all the major systems 🙂

13

23

90

Yingjun Wu | Building Data Infra

@YingjunWu

2 years

Excited to meet great people from @Decodableco @immerokcom @DeltaStreamInc @VervericaData @SingularityData @streamnativeio @redpandadata at @FlinkForward conference! The streaming market is booming💥and full of opportunities! Hope all the streaming companies grow and thrive!

1

2

79

Yingjun Wu | Building Data Infra

@YingjunWu

3 years

Do you read job descriptions carefully when applying for jobs?

Yes: 172

No: 50

Sometimes: 63

0

2

74

Yingjun Wu | Building Data Infra

@YingjunWu

2 years

My "RisingWave vs Flink" blog is out: ! Our marketing manager told me she has edited my blog for better SEO, but I replied to her that people will read my blog if we use the meme below 👇👇👇 @SingularityData @ApacheFlink #streamprocessing #OpenSource #Rust

3

2

76

Yingjun Wu | Building Data Infra

@YingjunWu

1 year

The landscape of the vector database market👇Pick your favourite vector DB from here! @trychroma @marqo_ai @vespaengine @qdrant_engine @lancedb @milvusio @OpenSearchProj @ClickHouseDB @PostgreSQL @cassandra @weaviate_io @pinecone @elastic @Redisinc @RocksetCloud @SingleStoreDB

7

11

76

Yingjun Wu | Building Data Infra

@YingjunWu

2 years

You know why I am here @SIGMODConf . Talk to me 😝 #SIGMOD22 #sigmod2022

1

3

76

Yingjun Wu | Building Data Infra

@YingjunWu

1 year

People are talking about data lakes and lake houses these days. At @RisingWaveLabs , we've put lots of effort into integrating with data lakes. Here's an exciting project we are working on: . All projects written in Rust will soon enjoy better integration

GitHub - apache/iceberg-rust: Apache Iceberg

Apache Iceberg. Contribute to apache/iceberg-rust development by creating an account on GitHub.

github.com

3

15

71

Yingjun Wu | Building Data Infra

@YingjunWu

10 months

Register here -> Topics covered: ✅ BYOC vs. managed cloud; ✅ Open data lake format; ✅ S3 Express and S3 as the primary storage; ✅ Transition from batch to streaming; ✅ many others! See you on this Thursday at 9am PST!

4

0

52

Yingjun Wu | Building Data Infra

@YingjunWu

1 year

Any idea about the in-production use cases of DuckDB? Disclaimer: I'm also a fan of @duckdb 🙂

From the dataengineering community on Reddit

Explore this post and more from the dataengineering community

www.reddit.com

7

10

61

Yingjun Wu | Building Data Infra

@YingjunWu

8 months

Self-hosted coding copilot: . Well, open source is eating software!

GitHub - TabbyML/tabby: Self-hosted AI coding assistant

Self-hosted AI coding assistant. Contribute to TabbyML/tabby development by creating an account on GitHub.

github.com

1

4

60

Yingjun Wu | Building Data Infra

@YingjunWu

3 years

RisingLight () is open-sourced! It is an OLAP database built by a group of talented students (initiated by @Cat99Vegetable , our company's previous intern, now PhD at @UMassAmherst ) with the aim of helping people learn OLAP database internals using Rust! ...

GitHub - risinglightdb/risinglight: An educational OLAP database system.

An educational OLAP database system. Contribute to risinglightdb/risinglight development by creating an account on GitHub.

github.com

2

17

49

Yingjun Wu | Building Data Infra

@YingjunWu

2 years

I caught @andy_pavlo and tried to convince him to rewrite @OtterTuneAI using @rustlang . The new @CMUDB course should also use Rust to rewrite BusTub (). IMHO every single C++ database should be rewritten in Rust, just as what we did @SingularityData 😝

2

1

51

Yingjun Wu | Building Data Infra

@YingjunWu

7 months

If you are new to stream processing, or want to understand how to use SQL to write streaming applications, then you may be interested in this repo: . No BS - just runnable code. No cluster required - works on laptop 😀😀

GitHub - risingwavelabs/awesome-stream-processing: A collection of demonstrations showcasing how...

A collection of demonstrations showcasing how stream processing can be used to solve real-world problems. - risingwavelabs/awesome-stream-processing

github.com

0

9

47

Yingjun Wu | Building Data Infra

@YingjunWu

2 months

No matter how the data infra world evolves, three things will always remain constant, and I always bet on them: * Postgres * Kafka * Iceberg How are they connected? They are all open standards and essential building blocks for data persistence. Any system should be designed to

4

1

48

Yingjun Wu | Building Data Infra

@YingjunWu

1 year

S3 is the universal storage layer for modern data systems, and RisingWave () is the #1 stream processing system built on top of S3. I won't change my mind.

GitHub - risingwavelabs/risingwave: Best-in-class stream processing, analytics, and management....

Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streami...

github.com

Davis Treybig

@TreybigDavis

1 year

S3 is increasingly becoming the default storage layer for cloud infrastructure. I wrote notes on this trend, its benefits, its challenges, its early adopters, and the opportunity it presents for new startups to disrupt large infrastructure categories

16

34

199

1

7

47

Yingjun Wu | Building Data Infra

@YingjunWu

1 year

. @ClickHouseDB is one of RisingWave’s best friends in the OLAP domain. There’s a cool project called chDB () that embeds ClickHouse to applications. Small data is the new trend, and I believe more and more cool projects will emerge in this space. Excited!

GitHub - chdb-io/chdb: chDB is an in-process OLAP SQL Engine 🚀 powered by ClickHouse

chDB is an in-process OLAP SQL Engine 🚀 powered by ClickHouse - GitHub - chdb-io/chdb: chDB is an in-process OLAP SQL Engine 🚀 powered by ClickHouse

github.com

0

11

45

Yingjun Wu | Building Data Infra

@YingjunWu

1 year

Just talked to our engineers and decided to change our slogan in GitHub: 🚀 SQL stream processing with @PostgreSQL -like experience. 🪄 10X faster and more cost-efficient than @ApacheFlink . Announcements and reports coming soon. GitHub: .

GitHub - risingwavelabs/risingwave: Best-in-class stream processing, analytics, and management....

Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streami...

github.com

2

7

46

Yingjun Wu | Building Data Infra

@YingjunWu

1 year

Just came back from Kafka Summit London 2023. It was a great event that brought together thousands of data enthusiasts. I wrote a blog describing my takeaways from #kafkasummit : . TLDR: * Cost efficiency is becoming the key thing. @redpandadata and

0

5

40

Yingjun Wu | Building Data Infra

@YingjunWu

9 months

The purpose of distributed systems has changed drastically over the last 2 decades. When MapReduce first emerged, the need for dist. systems was to get better perf - a single node wasn't powerful enough. But now, in the cloud era, we can easily rent machines with big DRAM from

4

2

40

Yingjun Wu | Building Data Infra

@YingjunWu

2 months

I actually don't like the idea of Continuous Queries ( @googlecloud BigQuery), Dynamic Tables ( @SnowflakeDB ), and Delta Live Tables ( @databricks ). It's not because the technology (which is stream processing) is wrong, but because stream processing in most cases still belongs to

4

3

38

Yingjun Wu | Building Data Infra

@YingjunWu

1 year

I keep wondering why @duckdb has suddenly become so popular. While everyone is advocating for big data, how is it that DuckDB, a single node OLAP database, is gaining so much traction in 2020s? Can anyone explain? cc @motherduck @PuffinDB @BoilingData @RillData

11

4

37

Yingjun Wu | Building Data Infra

@YingjunWu

11 months

Data infra trend in 2024: - all Kafka vendors want to replace data lake; - all data lake vendors want to replace Kafka 😜

2

36

Yingjun Wu | Building Data Infra

@YingjunWu

1 year

Seems @ClickHouseDB now natively supports vector search:

Vector Search with ClickHouse - Part 1

Read about how Vector Search is supported in ClickHouse with the first part of this series introducing key concepts and applications

clickhouse.com

0

4

33

Yingjun Wu | Building Data Infra

@YingjunWu

4 months

. @RocksetCloud was one of the few companies I interviewed with just after graduating from grad school. At that time, Rockset was a tiny 10-person company, and that's why I knew the Rockset founding team pretty well. I could never have imagined two things: 1️⃣ It would

1

5

33

Yingjun Wu | Building Data Infra

@YingjunWu

7 months

Data infra trend: - @confluentinc : managed Kafka ➡️ event streaming platform - @databricks : managed Spark➡️ data lakehouse + AI platform - @elastic : managed ElasticSearch ➡️ search + analytics platform Data infra vendors are moving to the SaaS layer. Data product is the future.

1

8

32

Yingjun Wu | Building Data Infra

@YingjunWu

1 year

Best ad today: @databricks @SnowflakeDB

3

4

31

Yingjun Wu | Building Data Infra

@YingjunWu

3 years

I am working on the "RisingWave vs Flink" blog today, and surprised to see RisingWave is the #1 trending Rust project this week! If you haven't heard of #RisingWave , you should definitely check it out: ! @SingularityData @RustTrending @rustlang #OpenSource

2

1

31

Yingjun Wu | Building Data Infra

@YingjunWu

2 years

I swear you don't want to miss AIDB workshop @VLDBconf 2022 this year. Keynote speakers this year: @tim_kraska @matei_zaharia @feifei_initiald @Hippotas . They'll let you know how ML is changing the database system designs in real world! More info: .

0

2

32

Yingjun Wu | Building Data Infra

@YingjunWu

3 years

The fun part of working at a startup is that we can always learn sth new from our daily work. From a candidate I got to know that #Rust 's energy consumption is 75X lower than Python, actually even lower than C++! We are essentially building an energy-efficient database! #rustlang

2

6

30

Yingjun Wu | Building Data Infra

@YingjunWu

3 years

Finally! We open sourced RisingWave, the streaming database designed for the cloud! A big milestone for @SingularityData ! #RisingWave 's goal is to democratize stream processing: stream processing must be made simple, affordable, and accessible, for everyone!

RisingWave

@RisingWaveLabs

3 years

💥💥💥We are excited to announce the open-sourcing of #RisingWave , a cloud-native streaming database that uses SQL as the interactive interface, at @github today! 💥💥💥 #RisingWave #Database #StreamProcessing #CloudNative #opensource #opensourcecommunity #opensourcesoftware

1

17

87

1

2

29

Yingjun Wu | Building Data Infra

@YingjunWu

2 years

🎉One year ago today, on April 8th, 2022, we open sourced RisingWave, a distributed SQL streaming database. It's been a journey filled with challenges, lessons, and accomplishments. As we celebrate RisingWave's 1st birthday, let's reflect on some milestones: 🚀 First production

2

1

29

Yingjun Wu | Building Data Infra

@YingjunWu

1 year

That's a SUPER COOL feature!!! Let me try to understand: so @redpandadata 's tiered storage data will be written and read using Iceberg format, correct? That essentially means, Redpanda will be evolving to a streaming lake house, correct? @emaxerrno

Yaroslav Tkachenko

@sap1ens

1 year

1/4 There was no big announcement, but @emaxerrno mentioned something big during a Twitter Space yesterday. @redpandadata will add support for reading their Tiered Storage data with Iceberg format 🤯. This is HUGE.

3

12

48

3

4

29

Yingjun Wu | Building Data Infra

@YingjunWu

2 years

An interesting move by IBM: . Two things worth noting: 1) IBM, known for its heavy investment in Apache Spark (they even had Spark Technology Center in San Francisco), has now acquired Ahana, a SaaS for Presto; 2) the investment seems to have been

IBM joins the Presto Foundation through acquisition of Ahana

Today we’re thrilled to share that IBM has acquired Ahana, the venture-backed SaaS for Presto startup company, and we want to write more about our belief

prestodb.io

2

28

Yingjun Wu | Building Data Infra

@YingjunWu

1 year

🌐 Weekend Dive: Discover the power of deterministic simulation for testing your distributed systems! 🚀 Unlock insights here:

Deterministic Simulation: A New Era of Distributed System Testing (Part 1 of 2) - RisingWave:...

This article discusses the background and principles of deterministic simulation, introduce our deterministic testing framework Madsim, and share our experience applying deterministic testing to...

risingwave.com

2

5

27

Yingjun Wu | Building Data Infra

@YingjunWu

3 years

We are hosting the 4th Int'l Workshop on Applied AI for Database Systems and Applications (AIDB 2022) in this year's VLDB. If you are a database/AI person, please do consider submitting a paper here! website: . @vldb2022 #Database

1

8

27

Yingjun Wu | Building Data Infra

@YingjunWu

9 months

some fun topics I'd love to discuss: - is big data really dead? @duckdb - vector search as a plugin or a database? - scale out vs scale up? ... Next Thursday!

RisingWave

@RisingWaveLabs

9 months

Join @nikitabase from @neondatabase , @ryguyrg from @motherduck and @YingjunWu as they discuss key database trends to look out for in 2024. ▶ What's the future of vector databases? ▶ Is Postgres becoming the database lingua franca? and more... Sign up:

0

6

21

1

5

27

Yingjun Wu | Building Data Infra

@YingjunWu

2 years

One of the most important things I learned from @CMUDB is that a great DBMS must support storing emoji 👻 I bore that in mind when designing #RisingWave (). Now everyone can use it to do stream processing over their favorite emojis with low latency! 💩😊💩

2

0

26

Yingjun Wu | Building Data Infra

@YingjunWu

2 years

David Maier () was the person who led me into the stream processing world. He advised me on stream processing research during my PhD. Many years later, I founded my own startup @SingularityData focusing on stream processing. Thank you, Dave! #sigmod2022

1

27

Yingjun Wu | Building Data Infra

@YingjunWu

1 year

In the field of stream processing, the performance and usability of @ApacheFlink have always been a widely discussed topic 🔥🔥🔥! That's why @AlibabaGroup , the world's largest investor in the Flink community, re-implemented Flink internally using cloud-native architecture,

6

14

26

Yingjun Wu | Building Data Infra

@YingjunWu

2 years

Our Singapore 🇸🇬 meetup was a huge success! Kudos to the teams at @RisingWaveLabs @Alluxio @Onehousehq @ShopeeSG , and really eager to see how open source will eat the world! #Singapore #OpenSource

0

3

25

Yingjun Wu | Building Data Infra

@YingjunWu

2 months

S3 is the biggest opportunity in today’s data infra world.

2

3

26

Yingjun Wu | Building Data Infra

@YingjunWu

2 years

Fun fact: the co-authors of the paper have launched three startups: Ran Xian founded Metabit Trading in 2019; @andy_pavlo founded @OtterTuneAI in 2020; I founded @SingularityData in 2021. @CMUDB is truly amazing!!!

Yingjun Wu | Building Data Infra

@YingjunWu

2 years

It was my paper published in VLDB 2017. The original title was "This is the Best Paper Ever on In-Memory Multi-Version Concurrency Control." We changed the title 3 times as the chair threatened to desk reject our paper 🙃

2

14

172

0

2

25

Yingjun Wu | Building Data Infra

@YingjunWu

9 months

Stream processing will be made **ultimately** easy. All you need is just a single node: . TL;DR: "DuckDB for stream processing" coming soon! @RisingWaveLabs #risingwave #streamprocessing #database

Add RisingWave single_node mode · Issue #14895 · risingwavelabs/risingwave

Motivation Make it easier for users to setup and operate a RW cluster. Download Redirect this url to an install script in the risingwave repository. curl https://risingwave.com/sh | sh The script w...

github.com

1

3

26

Yingjun Wu | Building Data Infra

@YingjunWu

1 year

Busy day for RisingWave! #DataAISummit

0

25

Yingjun Wu | Building Data Infra

@YingjunWu

3 years

To PhD candidates: TREAT YOUR PHD THESIS SERIOUSLY! I talked to 5 candidates yesterday and 3 of them asked me about my thesis! Candidates don't give a s**t about your startup if you don't treat your own business seriously! Thanks to everyone who pushed me forward during my PhD 😊

0

2

25

Yingjun Wu | Building Data Infra

@YingjunWu

11 months

If a database vendor claims that they can beat Oracle because they have better technology, I am pretty sure it will be a failure. Oracle is the dominator not because they have the best tech, but because they have the best customer service, best channel, and the best ecosystem.

Sai Srirampur

@saisrirampur

11 months

@sv_techie @YingjunWu @CockroachDB @Yugabyte @PingCAP @PlanetScale A few that come to my mind: 1. Postgres is not yet ready for tier 0 and tier 1 enterprise grade workloads, where even 1 second of downtime is costly. More foundational features such as faster failovers, reliable and simple to setup/manage active-active (across region) etc. are

2

1

16

2

24

Yingjun Wu | Building Data Infra

@YingjunWu

1 year

Morning Berlin! w/ @Al_Grigor

1

25

Yingjun Wu | Building Data Infra

@YingjunWu

11 months

While @ApacheIceberg and @apachehudi are in a hot war, @ApacheFlink and @RisingWaveLabs are enjoying love and peace ☮️☮️☮️😇😇😇

1

24

Yingjun Wu | Building Data Infra

@YingjunWu

2 years

They asked me for my fav emoji 👀

2

0

23

Yingjun Wu | Building Data Infra

@YingjunWu

10 months

I’m a database guy. I will die for databases.

Kaivalya Apte - The Geek Narrator

@thegeeknarrator

10 months

Ten things to understand about your database: 1) High level Architecture 2) How writes work? (Replication, data distribution, internal organisation etc) 3) How reads work? (Consistency guarantees, tuning options, etc) 4) CAP theorem, ex. CP or AP 5) Transactions and Concurrency

11

89

460

1

24

Yingjun Wu | Building Data Infra

@YingjunWu

2 months

𝐇𝐨𝐰 𝐌𝐮𝐜𝐡 𝐃𝐨 𝐘𝐨𝐮 𝐏𝐚𝐲 𝐟𝐨𝐫 𝐘𝐨𝐮𝐫 𝐂𝐥𝐨𝐮𝐝 𝐃𝐚𝐭𝐚𝐛𝐚𝐬𝐞 𝐒𝐞𝐫𝐯𝐢𝐜𝐞𝐬? While many cloud databases market themselves as low-cost options, a closer examination of their configurations and pricing reveals that prices across different vendors tend to fall

2

3

23

Yingjun Wu | Building Data Infra

@YingjunWu

1 year

I wrote a blog on why Kafka is the new data lake: . I received lots of criticism after publishing it, and people argue that @apachekafka is just for streaming, and it’s at most a “data river”. I won’t change my view. I believe streaming platform vendors

Why Kafka Is the New Data Lake?

There is increasing evidence to suggest that Kafka is evolving into a new form of data lake.This article will explain why this evolution is occurring.

risingwave.com

2

5

23

Yingjun Wu | Building Data Infra

@YingjunWu

9 days

I’ve been working on RisingWave, the stream processing system, for over 3.5 years. During this time, we built everything from scratch, went through countless failed PoCs, and now have thousands of users processing event streaming data with RisingWave. But how are people using

1

4

24

Yingjun Wu | Building Data Infra

@YingjunWu

1 year

Yesterday, we saw the launch of @warpstream_labs , a Kafka-compatible platform built on S3. No matter how good/bad their product is, here're some implications for the data infra space: * Innovation must still comply with established protocols (e.g., Kafka protocol); * Everyone is

3

2

22

Yingjun Wu | Building Data Infra

@YingjunWu

9 months

. @rustlang doesn’t just bring better performance; it empowers engineers to be 10x more productive when working on a complex, collaborative project. That’s the real reason we use Rust to build RisingWave for enterprise-grade stream processing. #risingwave @RisingWaveLabs

0

3

21

Yingjun Wu | Building Data Infra

@YingjunWu

9 months

We @RisingWaveLabs are pushing hard to enable developers to build stream processing solutions with high productivity and low cost. My TODO list for H1 2024: ⏹️Python interface ⏹️Standalone mode ⏹️Adaptive scaling ⏹️Unified stream&batch processing Let's see if we can ship them!

1

4

20

Yingjun Wu | Building Data Infra

@YingjunWu

7 months

Some exciting projects are coming out pretty soon! 🤘🤘🤘 #risingwave #warpstream

0

4

22

Yingjun Wu | Building Data Infra

@YingjunWu

9 months

Lessons I Learned After running a startup for Three Full Years: People do not need a “perfect” product. Instead, they require a product that satisfies all the following three criteria: ✅ Addresses customers' pain points; ✅ Fits into customers' existing environments; ✅ Is

1

3

21

Yingjun Wu | Building Data Infra

@YingjunWu

2 years

My friends in Singapore 🇸🇬: we are going to host the first physical meetup in our SG office (near Lau Pa Sat) on August 11. Our engineer will talk about the evolution of stream processing. Please join us if you are interested! link: . Dinner is served! 🇸🇬🇸🇬

1

3

21

Yingjun Wu | Building Data Infra

@YingjunWu

7 months

When I was at AWS Redshift (C/C++ codebase), I spent 2 months developing a feature, followed by 3 months testing and debugging it. SIG11 is always the nightmare. Modern C++ does have cool features like smart pointers, but when you work in a big team, you still have to suffer -

0

3

21

Yingjun Wu | Building Data Infra

@YingjunWu

10 months

RisingWave will soon make its metadata service pluggable 🔌🔌🔌. Right now, we use @etcdio to store our metadata. But unfortunately, it's hard for us to make things right when supporting large workloads. We have to find a solution. Instead of using classic services like

2

3

20

Yingjun Wu | Building Data Infra

@YingjunWu

3 years

#RisingWave was born in early 2021. I still remember the day I resigned from @awscloud Redshift and started coding alone at my home office. I am super lucky to have dozens of engineers join us in reinventing stream processing. Now it's time to open source the project! #startup

RisingWave

@RisingWaveLabs

3 years

🎉Such a big day for us at @Singularity Data! Today we are happy to share that we will open source #RisingWave on Friday, April 8th. 🎉 To learn more, please visit: #RisingWave #Database #StreamProcessing #CloudNative #SQL

0

17

171

0

2

21

Yingjun Wu | Building Data Infra

@YingjunWu

3 years

Stream processing technology is not black magic. I have no idea why companies pay so much hiring talented engineers simply for maintaining hard-to-use streaming systems. Modern streaming systems must be simple, affordable, and accessible! #RisingWave #Database #StreamProcessing

RisingWave

@RisingWaveLabs

3 years

Over the past few years, the amount of streaming data has grown rapidly and now constitutes an increasing share of the data tech teams have to deal with daily. To make their life easier, our talented engineers developed RisingWave. Stay tuned to learn more! #RisingWave #Database

0

1

19

0

19

Yingjun Wu | Building Data Infra

@YingjunWu

16 days

Glad to see more folks talking about @RisingWaveLabs , the stream processing system we've been working on for the last 4 years... Build, ship, repeat.. and with some luck, people will know your name one day 😀 (and no, we didn't hire any fake reviewers🙂)

From the apachekafka community on Reddit

Explore this post and more from the apachekafka community

www.reddit.com

1

2

20

Yingjun Wu | Building Data Infra

@YingjunWu

9 months

When we first began building RisingWave, we used Calcite. But it turned out to be unsuitable in terms of compatibility and flexibility, and for several other reasons. Now we are using our home-made optimizer to optimize streaming queries. I am not trying to persuade anyone that

Nikita Shamgunov

@nikitabase

9 months

Every new database engine requires a query optimizer. And it's just a TON of work. We couldn't get one off the shelf in the past - Apache Calcite was an interesting attempt, but it didn't translate to OLTP or native languages. Hopeful for this effort!

9

70

1

2

19

Yingjun Wu | Building Data Infra

@YingjunWu

2 years

Came back from #Current22 . Takeaway: the stream processing area is booming 💥💥💥 2000 in-person + 5000 online attendees! Will write a blog post on what’s happening at Current22!

0

1

19

Yingjun Wu | Building Data Infra

@YingjunWu

11 months

We should tag @CockroachDB @Yugabyte @PingCAP @PlanetScale here. I agree with Nikita technology-wise, but distributed OLTP may not be that “niche” from the TAM perspective 🙂

Nikita Shamgunov

@nikitabase

11 months

Database architecture thread. Technical. There have been several startups building operational relational databases focused on OLTP with a shared nothing architecture. @neondatabase is using a different approach - shared storage. What's the difference?

12

68

430

2

4

20

Yingjun Wu | Building Data Infra

@YingjunWu

9 months

. @neondatabase is one of the most popular serverless PostgreSQL providers in the data world. I'd like to learn from their CEO @nikitabase about: 1) their views on vector databases; 2) PostgreSQL's position; 3) how to scale out PostgreSQL; and 4) many more!

Neon - Serverless Postgres

@neondatabase

9 months

Save the date for this Thursday! Join in to hear Neon's CEO @nikitabase discuss key database trends to look out for in 2024 along with @ryguyrg and @YingjunWu .

0

1

10

0

2

20

Yingjun Wu | Building Data Infra

@YingjunWu

1 year

@RisingWaveLabs X @MaterializeInc !

0

19

Yingjun Wu | Building Data Infra

@YingjunWu

7 months

90% of RisingWave's business is b2b. We see the great value of community and decide to double down on it. RisingWave's free tier is coming next week. Developers can use stream processing technology for free.

Sam Lambert

@isamlambert

7 months

@ThePrimeagen 90% of our business is b2b. The avg customer pays us $271,000. The free tier is not our funnel. We’ve shed an immense amount of spending and gained infinite runway. Hundreds of millions MAU run on top of our tech. Can any of the database startups throwing shade claim anything

42

2

120

1

2

20

Yingjun Wu | Building Data Infra

@YingjunWu

3 years

Showan is presenting his EuroSys 2022 paper! This is one of my favorite papers in the stream processing area!

RisingWave

@RisingWaveLabs

3 years

@easyAbi (Boston University) will give a talk on how to build persistent KV stores tailored for stream processing systems. Zoom talk is open to public at 8:00am ET, Mar 24, 2022. Details:

0

3

8

1

5

20

Yingjun Wu | Building Data Infra

@YingjunWu

1 year

“The world is becoming more transactional” - @jorandirkgreef is speaking at @QCon SF! @TigerBeetleDB #infoq #qconsf

0

3

20

Yingjun Wu | Building Data Infra

@YingjunWu

2 years

Thanks @streamnativeio for hosting Pulsar Summit! Great talks! @PulsarSummit

0

18

Yingjun Wu | Building Data Infra

@YingjunWu

3 years

We are building an open and collaborative community for #RisingWave - everyone is welcome, and we are eager to partner with other communities! Right now, we are actively working with @redpandadata to unleash productivity in building real-time apps. Blog post to be released soon!

RisingWave

@RisingWaveLabs

3 years

And here's how you can get involved: ➡️Start by checking out the source code on GitHub; ➡️By following us on Twitter, you will stay updated on the most recent developments! #RisingWave #github #opensource #opensourcecommunity #opensourcesoftware

0

4

17

2

8

19

Yingjun Wu | Building Data Infra

@YingjunWu

9 months

Hot discussion on @confluentinc vs @redpandadata vs @aiven_io 's @apachekafka services. I just shared my own opinions. I do business with all these vendors, and I swear I didn't get paid by them 🙂 #apachekafka

From the apachekafka community on Reddit

Explore this post and more from the apachekafka community

www.reddit.com

0

2

19

Yingjun Wu | Building Data Infra

@YingjunWu

1 year

How many people are using RisingWave, the open-source streaming database? See chart below. The daily Kubernetes deployment has increased by 10X 🚀🚀🚀 over the last 2 months!!! Start building your real-time apps with @PostgreSQL SQL today!

2

18

Yingjun Wu | Building Data Infra

@YingjunWu

9 months

I am always skeptical of any technology that *(self-)claims* to be the "gold standard." Technologies come and go; only protocols can last forever. @PostgreSQL protocol is the gold standard; @apachekafka protocol is the gold standard. A specific technology, however, is not.

0

4

18

Yingjun Wu | Building Data Infra

@YingjunWu

1 year

@eatonphil I'm the top contributor to Peloton DB, and proud to see my name mentioned in Andy's blog. Very few databases have been built from scratch since 2017, as there are already so many successful DBs. RisingWave is one of the very few that started after 2020.

GitHub - risingwavelabs/risingwave: Best-in-class stream processing, analytics, and management....

Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streami...

github.com

1

18

Yingjun Wu | Building Data Infra

@YingjunWu

1 year

Just watched @martinkl 's keynote at #Kafka Summit 2018 again. My assertion: Kafka/Redpanda/Pulsar is the new data lake. Agreed? @apachekafka @redpandadata @apache_pulsar

Keynote: Is Kafka a Database? - Confluent

Jay Kreps | Kafka Summit 2018 Keynote (Apache Kafka and Event-Oriented Architecture | Overcome by Events)

www.confluent.io

3

2

17

Yingjun Wu | Building Data Infra

@YingjunWu

1 year

Well, I know I should post some photos on social media 👻 #Current23 @RisingWaveLabs

0

17

Yingjun Wu | Building Data Infra

@YingjunWu

3 months

The Kafka ecosystem has reached a pivotal moment, and several significant changes are either underway or have already occurred: 💸 Running Kafka will be 10X cheaper than it was a few years ago. For those using Kafka primarily for

0

5

16

Yingjun Wu | Building Data Infra

@YingjunWu

1 year

okay based on the poll results, I'll include all the streaming databases I'm familiar with in my @QCon SF talk 😀 @RisingWaveLabs @ksqlDB @MaterializeInc @timeplusdata @DeltaStreamInc

Yingjun Wu | Building Data Infra

@YingjunWu

1 year

In a professional technical presentation, should I include the names of all my competitors in my slide deck? It's NOT a sponsored talk; it's all about technology.

0

5

17

Yingjun Wu | Building Data Infra

@YingjunWu

1 year

Interesting blog on @Medium: "You can also adopt streaming database like @RisingWaveLabs which can joining/aggregating streaming data with SQL syntax." Yes, we do see a trend of using RisingWave in ML infra.

From Data Platform to ML Platform

How Data/ML platforms evolve and support complex MLOps practices

towardsdatascience.com

0

3

17

Yingjun Wu | Building Data Infra

@YingjunWu

1 year

Well, just reached 5,000 GitHub stars 🌟🌟🌟 @RisingWaveLabs

Rust Trending

@RustTrending

1 year

risingwavelabs / risingwave: The distributed streaming database: SQL stream processing with Postgres-like experience 🪄. 10X faster and more cost-efficient than Apache Flink 🚀. ★4968

1

5

26

0

1

17

Yingjun Wu | Building Data Infra

@YingjunWu

11 months

i love the risingwave team so much

1

0

17

Yingjun Wu | Building Data Infra

@YingjunWu

9 months

Wow, I'm so happy to see that more and more people are adopting RisingWave and Redpanda for high-performance streaming data processing! @RisingWaveLabs @redpandadata #risingwave #redpanda

Kai Jellinghaus

@KJellinghaus

9 months

I've been playing with @RisingWaveLabs It's absolutely insane. The performance is out of this world. I haven't touched this thing and am just running the all-in-one version locally. - 120k+ rows/s ingestion from Kafka (redpanda - 2G memory) - Joined to output 240k+ rows/s!!

2

0

6

0

2

17

Yingjun Wu | Building Data Infra

@YingjunWu

11 months

Can I call Berlin 🇩🇪 the Bay Area of Europe? So many startups, data folks, and fun events!

4

0

17

Yingjun Wu | Building Data Infra

@YingjunWu

1 year

I’ll be talking about building stream processing applications with RisingWave and Apache Pulsar at #PulsarSummit Europe! Check out the schedule!

2

5

16

Yingjun Wu | Building Data Infra

@YingjunWu

4 months

Today's news about @ApacheIceberg tells us that in the ever-changing data market, the key to success is finding consensus. Iceberg is the consensus for data lakes, and whether it's @SnowflakeDB , @databricks , or @awscloud , they all need to adhere to this consensus. Clearly,

0

2

16

Yingjun Wu | Building Data Infra

@YingjunWu

3 months

Multi-versioning is a cornerstone technology in database systems. Even though it's not new, it's still used in almost every data product out there. At @RisingWaveLabs , we use multi-versioning to create a feature called "time-traveling," which lets users go back to a specific

Peter Kraft

@petereliaskraft

3 months

Want to learn how database concurrency control really works? Check out this paper from @YingjunWu and @andy_pavlo ! It dives deep into the most widely used type of concurrency control today: multi-version concurrency control (MVCC). The basic idea behind MVCC is for a database

1

50

297

0

1

16

Yingjun Wu | Building Data Infra

@YingjunWu

2 years

Nice talk from Christian Williams describing how they use Rust to save huge amounts of money💰(10X cheaper!!!) for their streaming jobs! @databricks @Scribd @DeltaLakeOSS @rustlang #rustlang #DataAISummit

0

4

16