Zach Wilson @EcZachly profile

Zach Wilson

@EcZachly

Followers

32,708

Following

971

Media

276

Statuses

3,976

Founder @ $1m ARR | ADHD | 800k+ followers on all platforms | 10 yrs DE experience | ex @facebook, @netflix, and @airbnb

https://t.co/5R0nm9WnEr

A free 75k+ DE newsletter 👉

Joined July 2014


Zach Wilson

@EcZachly

2 years

Getting into #dataengineering is actually pretty easy:
- learn SQL
- learn Python
- learn Snowflake/BigQuery/Databricks
- learn data modeling
- learn data pipelines with Airflow
If you learn these 5 things, you’ll be interview-ready for a junior position for sure

89

724

4K

Zach Wilson

@EcZachly

1 year

You’re a great engineer if you know the definition of:
- idempotent
- monoid
- decoupled
- dependency injection
- unit
- functional programming
- asynchronous vs parallel programming
- thread locking
- eventual consistency
- exactly-once semantics
- lambda vs kappa

237

455

3K

Zach Wilson

@EcZachly

11 months

I created a public Github repo with all the resources, books, companies, and social media accounts you should be following to stay current on data engineering topics. I'm accepting PRs so we can crowdsource this effort! #dataengineering

GitHub - DataExpert-io/data-engineer-handbook: This is a repo with links to everything you'd ever...

This is a repo with links to everything you'd ever want to learn about data engineering - DataExpert-io/data-engineer-handbook

github.com

22

468

2K

Zach Wilson

@EcZachly

3 months

If I had to start learning #dataengineering all over again, I’d follow this plan, mostly in order:
- Learn SQL
  - Aggregations with GROUP BY
  - Joins (INNER, LEFT, FULL OUTER)
  - Window functions
  - Common table expressions
- Learn about data modeling
  - read about data

16

339

2K

Zach Wilson

@EcZachly

1 year

I worked 2 years each at Meta, Airbnb and Netflix. Their engineering stacks are different and cultures have pros and cons.
- Meta
  Stack I used: Hive, Spark, HDFS, Dataswarm, Unidash, Deltoid
  Pros:
  Tons of motivated people willing to help you
  Great social events to make

35

209

2K

Zach Wilson

@EcZachly

8 months

The data engineer interview has 4-5 pieces:
- the SQL interview
  Make sure you know: Window functions, self-joins, common table expressions and SQL fundamentals
- the data modeling interview
  Make sure you know: Fact data modeling, dimensional data modeling, aggregate tables

13

327

2K

Zach Wilson

@EcZachly

1 year

I know data engineers who know just Python and SQL who make $500k at Netflix. You don’t need to know the high performance languages to make a killing as a data engineer!

36

155

2K

Zach Wilson

@EcZachly

10 months

The best tech for each task:
- batch pipeline: Apache Spark
- data visualization: Apache Superset
- web api: NextJS (spring boot close second)
- SQL database: Postgres
- NoSQL database: DynamoDB
- Graph database: Neo4j
- front end web: React
- front end mobile: React

45

257

1K

Zach Wilson

@EcZachly

1 year

When I was at Airbnb, I reduced the pricing and availability data sets to 3% of their original size! This removed a few petabytes from the cloud and made Jeff Bezos cry. How did I do this? 1. I recognized that listing and listing night information should be in one table not

28

104

1K

Zach Wilson

@EcZachly

11 months

Seven months ago, I decided to leave my big tech job to build something on my own. I was inspired by @thejustinwelsh 's solopreneur content and believed I could attain a similar life! I was making $600k/year at my data engineering job at Airbnb. I made $600k in seven months as an

39

104

1K

Zach Wilson

@EcZachly

2 years

Data engineering is like you take all the frustrating parts of being a data analyst and combine them with all the frustrating parts of being a software engineer

29

196

1K

Zach Wilson

@EcZachly

7 months

Every SQL concept you should know to ace data engineering interviews:
- Basics
  SELECT, FROM, WHERE, GROUP BY, ORDER BY and HAVING
- Window functions
  Know the difference between RANK vs DENSE_RANK vs ROW_NUMBER
  Know how PARTITION BY and ORDER BY work in the OVER clause

17

232

1K

Zach Wilson

@EcZachly

6 months

AI isn't the cause of the tech hiring slowdown! There was a law that went into effect in 2022 that updated Section 174 of the tax code. Here are two scenarios to illustrate this:
- In 2021, you could found a startup and hire an engineer and pay them $100,000. Say your company

32

253

1K

Zach Wilson

@EcZachly

2 months

The data engineer interview has 4-5 pieces:
- the SQL interview
  Make sure you know: Window functions, self-joins, common table expressions and SQL fundamentals
- the data modeling interview
  Make sure you know: Fact data modeling, dimensional data modeling, aggregate tables

9

195

1K

Zach Wilson

@EcZachly

6 months

I migrated my data engineer handbook to: This repo has over 7300 stars and all the resources you'd ever need to become an amazing data engineer! #dataengineering

GitHub - DataExpert-io/data-engineer-handbook: This is a repo with links to everything you'd ever...

This is a repo with links to everything you'd ever want to learn about data engineering - DataExpert-io/data-engineer-handbook

github.com

10

218

1K

Zach Wilson

@EcZachly

1 year

SQL interviews are common in data engineering. They’re even more common in big tech. I wrote an article today revealing everything I know about them in my nine years of data engineering experience! Link in my bio since Elon would bury it otherwise! #dataengineering

7

192

1K

Zach Wilson

@EcZachly

1 year

Breaking into data engineering can be 100% free and 100% project-based! Here are the steps:
- find a REST API you like as a data source. Maybe stocks, sports games, Pokémon, etc.
- learn Python to build a short script that reads that REST API and initially dumps to a CSV

13

173

1K
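The first two steps in the post above (pick a REST API, then dump it to a CSV with a short Python script) might look like the sketch below. The endpoint `https://example.com/api/items`, the function names, and the assumption that the API returns a JSON array of flat objects are illustrative and not taken from the post.

```python
import csv
import json
from urllib.request import urlopen

# Hypothetical endpoint; swap in the API you picked.
API_URL = "https://example.com/api/items"


def fetch_records(url):
    """Download a JSON array of flat objects (assumed response shape)."""
    with urlopen(url) as response:
        return json.load(response)


def write_csv(records, path):
    """Write a list of dicts to `path` and return the number of data rows."""
    # Use the union of keys so records with missing fields still fit the header.
    fieldnames = sorted({key for record in records for key in record})
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
    return len(records)


# Offline stand-in for fetch_records(API_URL), so the sketch runs without a network.
sample_records = [
    {"name": "pikachu", "type": "electric"},
    {"name": "bulbasaur", "type": "grass", "height": 7},
]
```

Once `API_URL` points at a real endpoint, `write_csv(fetch_records(API_URL), "items.csv")` would produce the first CSV dump; fields missing from a record are written as empty cells.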

Zach Wilson

@EcZachly

10 months

Please never use COUNT(*) in your SQL. It’s bad and unnecessarily selects all the columns. Use COUNT(1) for a basic row count. Or COUNT(column) for the count of a specific column. #dataengineering

36

115

967
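For reference, what the three forms actually count can be checked on a toy table. This is a minimal sketch using Python's built-in `sqlite3`; the `events` table and its rows are hypothetical. In SQLite, `COUNT(*)` and `COUNT(1)` both return the number of rows, while `COUNT(column)` skips NULLs. Whether one form is slower than another depends on the engine and is not measured here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "US"), (2, None), (3, "DE"), (4, None)],  # two rows have a NULL country
)

# One query, three counting forms side by side.
count_star, count_one, count_country = conn.execute(
    "SELECT COUNT(*), COUNT(1), COUNT(country) FROM events"
).fetchone()

print(count_star, count_one, count_country)  # prints: 4 4 2
```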

Zach Wilson

@EcZachly

11 months

Breaking into data engineering can be very confusing! Should I learn Spark or Snowflake? Python or Scala? Airflow or Argo? Flink or Spark Streaming? AWS or GCP? Superset or Tableau? Fundamentals are more important than technologies: - understanding distributed

22

156

961

Zach Wilson

@EcZachly

11 months

For the next week only, I’m removing the paywall on my data engineering interview articles. I wrote four in-depth articles on passing the following four big tech interviews:
- data structures and algos
- data modeling
- data architecture
- SQL
Link in bio since Elon would

9

127

948

Zach Wilson

@EcZachly

3 months

My favorite stack to build a data analytics product
- Apache Spark (for processing)
- Amazon S3 (for storage)
- Apache Iceberg (for metadata)
- Apache Airflow (for scheduling)
- Apache Superset (for visualization)
- Great Expectations (for data quality)
#dataengineering

15

127

963

Zach Wilson

@EcZachly

1 month

Breaking into data engineering can be 100% free and 100% project-based! Here are the steps:
- find a REST API you like as a data source. Maybe stocks, sports games, Pokémon, etc.
- learn Python to build a short script that reads that REST API and initially dumps to a CSV

12

154

944

Zach Wilson

@EcZachly

11 months

SQL is deceptively complex. The order in which things apply isn’t that intuitive and can be frustrating when debugging queries. Let’s talk about the ordering of a query and when each step is executed. Here’s the query we’ll deconstruct. SELECT city, SUM(weed_smoked) as

19

139

930
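The post's own query is cut off above, so the sketch below uses a hypothetical `orders` table instead. It illustrates one consequence of SQL's logical evaluation order (FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY): `WHERE` drops rows before they are grouped, while `HAVING` drops whole groups after aggregation. Table, column names, and values are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (city TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("Austin", 5), ("Austin", 58), ("Boston", 40), ("Boston", 45), ("Denver", 200)],
)

# WHERE runs first: Austin's 5 is removed before SUM, so Austin totals 58
# and then fails the HAVING check that runs after grouping.
with_where = conn.execute(
    """
    SELECT city, SUM(amount) AS total
    FROM orders
    WHERE amount >= 10
    GROUP BY city
    HAVING SUM(amount) > 60
    ORDER BY total DESC
    """
).fetchall()

# Without WHERE, Austin totals 63 and survives HAVING.
without_where = conn.execute(
    """
    SELECT city, SUM(amount) AS total
    FROM orders
    GROUP BY city
    HAVING SUM(amount) > 60
    ORDER BY total DESC
    """
).fetchall()

print(with_where)     # [('Denver', 200), ('Boston', 85)]
print(without_where)  # [('Denver', 200), ('Boston', 85), ('Austin', 63)]
```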

Zach Wilson

@EcZachly

9 months

Happy new year everybody! Here’s a 2024 learning data engineering roadmap.
1. The basics:
- learn SQL
  - SELECT, FROM, WHERE, GROUP BY, JOIN, HAVING, etc
- learn Python
  - data structures: objects, arrays, tuples, namedtuples
  - algorithms: recursion, loops
2. Intermediate

15

182

871

Zach Wilson

@EcZachly

1 year

I wrote a new article on passing data engineering data structures and algorithms interviews! I cover:
- how to prepare for the interviews
- what to do on the day of the interview
- the exact leetcode questions I’ve seen in my career
and more! Check out the link in my bio

8

145

860

Zach Wilson

@EcZachly

6 months

When I worked at Netflix, I built pipelines that processed over 2000 terabytes per day. Data pipelines play by different rules when you get to this scale. I go into more detail here in this 2 min YouTube video you should check out! #dataengineering

I built data pipelines at Netflix that ran 2000 TBs per day, here’s...

Check out my academy at https://www.DataExpert.io where you can learn all this in much more detail! You can use code ZACH15 to get 15% off! #dataengineerin...

www.youtube.com

8

110

861

Zach Wilson

@EcZachly

2 years

Don’t stop at SQL and Python when learning #dataengineering
Add:
- distributed computation
- data modeling
- bash/docker/dev ops
- a statically typed language like Java
You’ll make a lot more money if you do this

24

160

853

Zach Wilson

@EcZachly

7 months

Breaking into data engineering can be 100% free and 100% project-based! Here are the steps:
- find a REST API you like as a data source. Maybe stocks, sports games, Pokémon, etc.
- learn Python to build a short script that reads that REST API and initially dumps to a CSV

12

151

833

Zach Wilson

@EcZachly

7 months

Data engineers come in a few levels:
- level 1
  Knows Python and SQL. Can move data from point A to point B so long as it’s not too big
- level 2
  Knows distributed compute basics like BigQuery and Spark. Can move data around on the order of single terabytes
- level 3

12

132

810

Zach Wilson

@EcZachly

2 months

Data engineering is like you take all the frustrating parts of being a data analyst and combine them with all the frustrating parts of being a software engineer

21

110

774

Zach Wilson

@EcZachly

2 years

How I went from junior data engineer (L3) at Facebook to staff data engineer (L6) at Airbnb in 4 years. - I got hired at Facebook in 2016 as a junior data engineer. I had 2 years of experience and I realized that I probably got hired at the wrong level. (1/13)

19

85

735

Zach Wilson

@EcZachly

1 year

Being a data engineer is 50% building pipelines and 50% thinking about becoming a data scientist or software engineer

25

56

737

Zach Wilson

@EcZachly

7 months

Python, SQL and Airflow will get you to $125k as a data engineer. If you want more, you’ll need to adopt a software engineering mindset.
- how do you make these pipelines scalable to arbitrary sizes of data?
- how do you make data sets that are adaptable to inevitable

6

84

616

Zach Wilson

@EcZachly

11 months

The S tier data engineering stack is:
- S3 and Apache Iceberg for storage
- Spark and Flink for compute
- Airflow or Mage or Prefect for orchestration
- Great Expectations for data quality
- Druid for fast columnar storage for dashboards
- AWS as the cloud platform
What’s

46

86

722

Zach Wilson

@EcZachly

2 years

Quick guide to go from 0 to #dataengineering hero:
- learn SQL
  Data Lemur is a great resource here
- learn Python
  Do like… 30-40 leetcode easy and medium questions
- distributed compute
  Get a trial of Databricks or Snowflake and find a training to learn about it
1/3

10

130

703

Zach Wilson

@EcZachly

2 years

Starting out in the data field can be overwhelming. Should you be a data scientist? A data engineer? A data analyst? An ML engineer? The number of role options is overwhelming! Here's some high-level guidance on how to pick between some of these roles. 1/5

16

163

687

Zach Wilson

@EcZachly

3 months

If you’re a data engineer that knows about:
- data lakes
- file formats
- compression techniques
- distributed compute
You’re crushing it!

13

57

695

Zach Wilson

@EcZachly

5 months

You should pick SQL over Python for all pipelines that can use it! Here’s why:
- SQL pipelines are going to be closer to the database and more likely to be optimized by default
- SQL is the common denominator language of data professionals allowing analysts to more easily

21

90

660

Zach Wilson

@EcZachly

2 months

Fundamental concepts every data engineer should know because they don’t really change
- ANSI SQL
- distributed compute
- OLTP vs OLAP
- CAP theorem
- slowly-changing dimensional modeling
- fact data modeling
- logging best practices
- AVRO / Thrift schemas
- idempotent

4

101

631

Zach Wilson

@EcZachly

6 months

By changing the sort order of one of my Parquet tables at Airbnb, I was able to reduce its size from 35 GBs to 1 GB! Since there are 365 partitions of this data, it goes from being 12.2 TBs of data to 0.3 TBs. Remember when sorting your Parquet data that you should start with

18

85

627

Zach Wilson

@EcZachly

10 months

Here is a picture of how my resume transformed between 2014 and 2023. You'll see I didn't even list SQL or Python on my 2014 resume! You're allowed to change your mind on the trajectory and direction of your technical career! I realized I didn't like mobile app development

13

69

609

Zach Wilson

@EcZachly

2 years

It’s wild how many SQL-killers SQL has withstood in the last 50 years

27

64

614

Zach Wilson

@EcZachly

8 months

3 months ago, I created a public GitHub repo with all the resources, books, companies, and social media accounts you should be following to stay current on data engineering topics. This repo has ~6k stars now! I'm still accepting PRs so we can crowdsource this effort and make

7

93

589

Zach Wilson

@EcZachly

6 months

Data engineers often become bored of data engineering! After a while of SQL + Python + Airflow, you start thinking all pipelines are the same and it’s copy and paste work. Some strategies to help with this:
- become more end-to-end
  Maybe that means building a dashboard. Maybe

13

74

586

Zach Wilson

@EcZachly

2 months

Data products are what is going to elevate data engineering into the stratosphere! They power everything you could imagine in the big tech companies!
- At Airbnb, I worked on a data product that helped detect "bad hosts" to increase guest satisfaction
- At Netflix, I worked

4

79

590

Zach Wilson

@EcZachly

2 years

Data engineering without SQL is like pizza without cheese. Sure you can do it but it’ll be weird! #dataengineering

12

69

555

Zach Wilson

@EcZachly

2 years

Most companies need the following data roles:
- Data engineer for master data management
- Data scientist for model development and experimentation
- Analytics engineer for KPI development and visualization
- Machine learning engineer for model development, deployment, monitor

13

117

555

Zach Wilson

@EcZachly

9 months

Top 4 reasons why data engineering is the best data profession: 1. highest pay for the least education Machine learning engineers and data scientists make 10-15% more but spent 30% more time in college. Data analysts make less than data engineers but require less schooling.

11

71

506

Zach Wilson

@EcZachly

8 months

Picking the right storage technology depends on a lot of factors! Picking the wrong one will always result in pain and migrations down the line! These constraints are around: - latency Low latency is dominated by queues and caches. Data access in those data structures is

8

110

546

Zach Wilson

@EcZachly

9 months

Data educators who know their stuff are hard to come by! Here’s a list of a few that inspire:
- @v_vashishta - teaches AI strategy
- @Alex_TheAnalyst - teaches data analytics
- @andreaskayy - teaches data engineering
- @NickSinghTech - teaches SQL
- @alexxubyte - teaches

11

123

538

Zach Wilson

@EcZachly

1 year

Window functions are critical in SQL interviews. Here's every piece dissected. An example query for the question "Give me the rolling 30-day sum of revenue by department" SELECT SUM(revenue) OVER (PARTITION BY department ORDER BY date ROWS BETWEEN 30 PRECEDING AND CURRENT ROW)

12

77

537
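The frame clause in the post above can be tried on a small table. This sketch uses Python's built-in `sqlite3` (window functions need SQLite 3.25 or newer) and a hypothetical `daily_revenue` table with one row per department per day, with a 3-row window so the output is easy to check by hand. Two details are worth noting: `ROWS BETWEEN n PRECEDING AND CURRENT ROW` spans n + 1 rows, so `30 PRECEDING` covers 31 rows, and a `ROWS` frame counts rows rather than calendar days, so it only matches a day-based window when no dates are missing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_revenue (department TEXT, date TEXT, revenue INTEGER)"
)
# Five consecutive days for one department, 10 in revenue each day.
conn.executemany(
    "INSERT INTO daily_revenue VALUES (?, ?, ?)",
    [("toys", f"2024-01-0{day}", 10) for day in range(1, 6)],
)

rolling = conn.execute(
    """
    SELECT date,
           SUM(revenue) OVER (
               PARTITION BY department
               ORDER BY date
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW  -- 3 rows in total
           ) AS rolling_sum
    FROM daily_revenue
    ORDER BY date
    """
).fetchall()

print([total for _, total in rolling])  # [10, 20, 30, 30, 30]
```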

Zach Wilson

@EcZachly

10 months

Data engineering has many "this or that" questions
- Python or Scala?
  If you don't know either, start with Python. If you want to transition to the software/data engineer archetype, pick up Scala later.
- Streaming or Batch?
  A vast majority of data engineering jobs are batch

15

90

521

Zach Wilson

@EcZachly

2 years

My favorite stack to build a data analytics product
- Apache Spark (for processing)
- Amazon S3 (for storage)
- Apache Iceberg (for metadata)
- Apache Airflow (for scheduling)
- Apache Superset (for visualization)
- Great Expectations (for data quality)
#dataengineering

12

85

525

Zach Wilson

@EcZachly

1 year

The data engineer journey has a few levels:
- level 1
  Am I an analyst or a data engineer?
  At this level you’re probably doing a mixture of pipeline work and reporting. You like pipeline work more.
- level 2
  Why are pipelines so complicated?
  Here you learn about

3

92

513

Zach Wilson

@EcZachly

9 months

Breaking into data engineering can feel overwhelming! Here’s a path forward that takes 6-9 months to truly complete! #dataengineering

3

125

517

Zach Wilson

@EcZachly

17 days

When I worked at Netflix, I built pipelines that processed over 2000 terabytes per day. Data pipelines play by different rules when you get to this scale. I go into detail here in this 2 min YouTube video you should check out!

I built data pipelines at Netflix that ran 2000 TBs per day, here’s...

Check out my academy at https://www.DataExpert.io where you can learn all this in much more detail! You can use code ZACH15 to get 15% off! #dataengineerin...

www.youtube.com

2

83

514

Zach Wilson

@EcZachly

1 month

Do you want to get better at data engineering? Here are some free YouTube videos you should watch: Data Modeling 100TBs to 5 TBs: Data Lake fundamentals (Iceberg and Parquet): Dimensional Data Modeling:

Dimensional data modeling and idempotent pipelines in 78 minutes with...

We'll be covering:
- Idempotent pipelines
- Why non-idempotent pipelines are problematic
- Things that make your pipelines not idempotent
- Slowly changing dim...

www.youtube.com

1

87

513
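As a rough illustration of the idempotent pipelines topic named in the video description above: one common way to make a load idempotent is to replace a whole date partition inside a single transaction instead of appending to it, so a rerun leaves the table unchanged. This is a generic sketch, not necessarily the approach the video teaches; the `daily_sales` table, the `load_partition` helper, and the sample rows are hypothetical.

```python
import sqlite3


def load_partition(conn, ds, rows):
    """Replace partition `ds` atomically so reruns produce the same table."""
    with conn:  # commits on success, rolls back if anything raises
        conn.execute("DELETE FROM daily_sales WHERE ds = ?", (ds,))
        conn.executemany(
            "INSERT INTO daily_sales (ds, store, amount) VALUES (?, ?, ?)",
            [(ds, store, amount) for store, amount in rows],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (ds TEXT, store TEXT, amount INTEGER)")

batch = [("north", 120), ("south", 80)]
load_partition(conn, "2024-01-01", batch)
load_partition(conn, "2024-01-01", batch)  # simulated retry or backfill

stored = conn.execute(
    "SELECT ds, store, amount FROM daily_sales ORDER BY store"
).fetchall()
print(len(stored))  # 2, not 4: the rerun did not duplicate rows
```

A plain `INSERT` without the `DELETE` would leave four rows after the retry, which is the kind of non-idempotent behavior the description calls problematic.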

Zach Wilson

@EcZachly

9 months

Data engineers spend weeks of their lives grinding out data pipelines just for a data analyst to display it with a pie chart! #dataengineering

29

65

494

Zach Wilson

@EcZachly

9 months

Level 1 data engineers: I use SQL
Level 2 data engineers: SQL is hard to test, you need TDD in your pipelines, data frames only!
Level 3 data engineers: I use SQL and dbt
#dataengineering

10

59

491

Zach Wilson

@EcZachly

6 months

Data analysts don’t need to learn that much more SQL to become data engineers! Data analysts have a mastery of the SELECT query! This is 80% of data engineering SQL tools! Adding in a few other SQL commands will make it much easier to go from data analyst to data engineer! -

6

101

488

Zach Wilson

@EcZachly

10 months

I nearly tripled my salary in a year by transitioning from data analyst to data engineer! I started my career as a data analyst in 2014 making $30k. I decided I needed to upskill more. I learned Linux, Hadoop fundamentals, Java MapReduce, and got more depth in my software

22

49

485

Zach Wilson

@EcZachly

1 year

Python, SQL and Airflow will get you to $125k as a data engineer. If you want more, you’ll need to adopt a software engineering mindset.
- how do you make these pipelines scalable to arbitrary sizes of data?
- how do you make data sets that are adaptable to inevitable

8

81

484

Zach Wilson

@EcZachly

7 months

When I was in my early 20s, I believed that making $250k was going to be my "late career" earnings. This belief changed in 2017 after working at Facebook for a year. Working for a year with people whose parents paid $250k+ for their college made me realize that either:

12

56

478

Zach Wilson

@EcZachly

4 months

Data engineering != data science != software engineering So many companies have data engineers writing REST APIs, data scientists building pipelines and software engineers building models. Hire your specialists for their special skills. Don’t push them into inefficient

12

80

473

Zach Wilson

@EcZachly

3 months

Some people have been asking for sample lectures from the boot camp content. Here's the very first data modeling lecture at full length to give you an idea if the boot camp is for you or not! I hope you enjoy the 48 minutes of data engineering bliss!

Data modeling a 100 TB data lake into 5 TBs with STRUCT and Array -...

This is the first lecture of my 40+ hour boot camp materials. This is connected to this lesson here: https://dataexpert.io/lesson/dimensional-data-modeling-d...

www.youtube.com

5

85

448

Zach Wilson

@EcZachly

9 months

Job requirements are mostly wishlists. I applied to a staff data engineer role at Airbnb that required 10+ years of experience when I had 6 years of experience. I got the job though! Apply to jobs you don’t think you’re ready for! You might surprise yourself!

8

44

438

Zach Wilson

@EcZachly

5 months

Every SQL keyword and its corresponding cloud cost:
- SELECT: EC2 compute cost
- FROM: S3 egress cost
- JOIN: S3 egress cost, EC2 compute cost, shuffle and restart costs
- ORDER BY/GROUP BY: EC2 compute cost, shuffle and restart costs
- HAVING / WHERE: EC2 compute cost,

5

66

441

Zach Wilson

@EcZachly

5 months

In 2024, my favorite technologies to learn are:
- NextJS
- Apache Spark
- Snowflake
- BigQuery
- Apache Iceberg
- Apache Airflow
- Spring Boot
Any that I’m missing? #softwareengineering #dataengineering

32

46

439

Zach Wilson

@EcZachly

1 year

The data architecture interview is often the thing that stands between you and a fancy senior+ data engineering role in big tech! I wrote a newsletter article covering the pieces that you need to remember to excel in these interviews! Link in the bio since Elon would downrank

2

56

431

Zach Wilson

@EcZachly

7 months

Slow ETL slaps data engineers on a daily basis! If you want to speed up your ETL 10x, try these things out:
1. Cumulatively build your dimensions
   Facebook keeps track of 30 days of user activity in an array. This makes calculating monthly active users much easier! You no

11

68

424
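The cumulative idea in the post above (a 30-day activity array per user, carried forward each day) can be sketched as follows. The post is cut off before giving details, so the array layout (index 0 is today, index 29 is 29 days ago), the function names, and the in-memory dict standing in for a table are all assumptions for illustration.

```python
from typing import Dict, List, Set

WINDOW_DAYS = 30


def roll_forward(
    history: Dict[str, List[int]], active_today: Set[str]
) -> Dict[str, List[int]]:
    """Build today's state from yesterday's state plus today's active users."""
    updated = {}
    for user in set(history) | active_today:
        previous = history.get(user, [0] * WINDOW_DAYS)
        today_flag = 1 if user in active_today else 0
        # Prepend today's flag and drop the oldest day to keep 30 entries.
        updated[user] = [today_flag] + previous[: WINDOW_DAYS - 1]
    return updated


def monthly_active_users(history: Dict[str, List[int]]) -> int:
    """A user is monthly-active if any of the last 30 daily flags is set."""
    return sum(1 for flags in history.values() if any(flags))


state: Dict[str, List[int]] = {}
state = roll_forward(state, {"u1", "u2"})  # day 1
state = roll_forward(state, {"u1"})        # day 2
mau_after_day_2 = monthly_active_users(state)
```

Each daily run reads only yesterday's state and today's events, so the monthly count does not need to rescan 30 days of raw activity.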

Zach Wilson

@EcZachly

8 months

Please stop using sub queries in your pipelines! #dataengineering

13

49

413

Zach Wilson

@EcZachly

2 months

What people think breaking into data engineering looks like:
- processing hundreds of terabytes at scale
- mastering Spark, Iceberg, Airflow
- knowing everything about data lakes and data architecture
- burning thousands of dollars on AWS compute just to get a job
What breaking

6

61

415

Zach Wilson

@EcZachly

6 months

Here's what the average data engineering interview looks like in 2024:
- 1 hour algorithms in Python
  Here you will be asked irrelevant questions about dynamic programming, linked lists, and inverting trees
- 1 hour SQL
  Here you will be asked niche questions about recursive CTEs

10

65

405

Zach Wilson

@EcZachly

2 years

Data engineers with strong software engineering skills will be in very high demand for the next 5 years! Building end-to-end data products and not just data pipelines will unlock outsized value for companies! Data products are full stack so DEs should upskill here: 1/2

9

67

414

Zach Wilson

@EcZachly

9 months

After you’ve been in data for a while you realize tooling doesn’t matter that much!
- whether it’s Snowflake vs BigQuery vs Spark
  It’s all distributed compute underneath the hood.
- whether it’s Airflow vs Prefect vs Mage vs Dagster
  It’s all CRON underneath the hood
-

8

72

404

Zach Wilson

@EcZachly

9 months

My bold 5 year predictions about #dataengineering
- Streaming data eng jobs account for 15-20% of all data eng jobs, but pay the most
- Rust becomes a mainstream data engineering infrastructure language like Scala
- Spark starts looking like Hive does now
- Data engineers

15

52

400

Zach Wilson

@EcZachly

10 months

When I was 17, I ran away from home and ultimately got tackled by my 300 pound step dad. He screamed at me, “Zach you’re a drug addict!” My journey since then has been kind of crazy. I spent 17-22 lost. Going in and out of rehabs and feeling dejected and anxious. My one

20

19

398

Zach Wilson

@EcZachly

3 months

When I worked at Netflix, I built a graph database that had over 40 different vertex types and 50 different edge types! This extreme variety of data needs to be handled with care! I wrote a detailed blog post about everything you should consider here:

4

53

406

Zach Wilson

@EcZachly

1 year

Distributed SQL is not the same as regular SQL!
These keywords cause shuffling in distributed environments:
- GROUP BY
- JOIN
- ORDER BY
- PARTITION BY
These keywords behave mostly the same everywhere:
- WHERE
- HAVING
- FROM
- SELECT
You’ll notice the word “BY”

5

57

389

Zach Wilson

@EcZachly

2 years

Refusing to grow beyond SQL and Python will limit your career growth as a data engineer! Growing in the following areas will get you more money: - data modeling Knowing when to use cumulative table design to model your dimensions is critical. Knowing how to efficiently model

16

65

391

Zach Wilson

@EcZachly

1 year

Mid-level engineers often fall into the trap that doing more gets you promoted faster! This bias sounds correct though. Senior engineers write more code that’s why they’re senior right? I remember at Facebook I fell into this trap. I became the main DE owning notifications,

11

41

395

Zach Wilson

@EcZachly

16 days

Getting a big tech data engineer job in 2016:
- do you know SQL?
- yes
- here’s $500k
Getting a big tech data engineer job in 2024:
- do you know Spark, Kafka, Iceberg?
- yes
- did you shake hands with Bill Inmon when he invented the data warehouse?
- no
- rejected and

6

47

393

Zach Wilson

@EcZachly

6 months

Data modeling has evolved beyond Kimball’s book
Here’s why:
- Kimball modeling didn’t think about distributed compute environments or large scale data
- Splitting everything up into tables that can’t be broadcast JOIN’d in Spark is expensive.
- Doing JOINs with extremely

10

64

392

Zach Wilson

@EcZachly

1 year

Top five skills to break into data engineering:
- data modeling
  Dimensional data modeling - what analysts use
  Relational data modeling - what software engineers use
  One Big Table data modeling - a new cutting edge way that is appropriate sometimes
- distributed compute
  The

5

85

385

Zach Wilson

@EcZachly

7 months

Data engineering SQL interviews always have a silly RANK question. Should you use RANK, DENSE_RANK, or ROW_NUMBER? Here’s a refresher! For more free data engineering interview content, subscribe to my blog: #dataengineering

3

60

389
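The refresher image from the post above is not preserved on this page. As a stand-in, here is a minimal sketch of how the three functions treat ties, using Python's built-in `sqlite3` (window functions need SQLite 3.25 or newer). The `scores` table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (player TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("a", 100), ("b", 90), ("c", 90), ("d", 80)],  # b and c tie on 90
)

ranked = conn.execute(
    """
    SELECT player,
           RANK()       OVER (ORDER BY score DESC)         AS rnk,
           DENSE_RANK() OVER (ORDER BY score DESC)         AS dense_rnk,
           ROW_NUMBER() OVER (ORDER BY score DESC, player) AS row_num
    FROM scores
    ORDER BY score DESC, player
    """
).fetchall()

for row in ranked:
    print(row)
# ('a', 1, 1, 1)
# ('b', 2, 2, 2)
# ('c', 2, 2, 3)
# ('d', 4, 3, 4)
```

`RANK` leaves a gap after the tie (2, 2, 4), `DENSE_RANK` does not (2, 2, 3), and `ROW_NUMBER` never repeats a value. The extra `player` sort key is there only to make the `ROW_NUMBER` order among tied rows deterministic.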

Zach Wilson

@EcZachly

6 months

Every engineer has one of two tech stacks:
- stack one: MacBook, Discord, AWS, JavaScript, React, Jenkins, GitHub, FaceTime
- stack two: Windows, Slack, Azure, Python, Vue, GitHub actions, GitHub, Zoom
Which stack are you?

71

35

378

Zach Wilson

@EcZachly

6 months

Data analytics is going to become more "Kafka-first" for a variety of reasons
- Relying on a data engineer to ETL the data is a bottleneck that a lot of companies don't want to worry about
- Technologies like Apache Pinot sit on top of Kafka and enable real-time analytics

11

69

384

Zach Wilson

@EcZachly

6 months

Breaking into data engineering can feel complicated and overwhelming! You need to learn the languages of the trade: SQL and Python. You need to learn the tools of the trade: Spark, BigQuery, Airflow, Databricks, etc. Then you need to show that you actually know this stuff! I go

2

72

385

Zach Wilson

@EcZachly

7 months

Once you’ve been in analytics long enough you realize there’s only like… 6 patterns
- Aggregation
  Count things by other things
- Experimentation / Segmentation
  Split people into groups and test product changes
- Accumulation vs Derivative
  Think rolling sum or YoY

5

32

381

Zach Wilson

@EcZachly

1 year

If you use Excel for data analytics, you’re a data analyst. You don’t have to know SQL and Python. Don’t belittle others for using tools that are different from yours! It’s very impressive how far business can go with just Excel.

11

89

357

Zach Wilson

@EcZachly

10 months

Data engineering interviews are frustrating because:
- some treat DE like software eng and give you ridiculous data structures and algorithms questions
- some treat DE like analytics eng and expect extremely in depth knowledge of dbt and metrics
- some treat DE like being a

7

42

366

Zach Wilson

@EcZachly

6 months

The perfect data engineering portfolio project has the following things:
- a data modeling diagram
  This shows you know how to build usable data tables.
- a live visualization people can view from the web
  This is probably the thing people will look at and share. Without this

4

59

367

Zach Wilson

@EcZachly

7 months

I intentionally don’t monetize my long form YouTube videos so y’all can have the best learning experience even if you can’t afford YouTube Premium! Here are my best ad-free hits: Data Lakes, Apache Iceberg and parquet compression in 60 minutes:

Data Lake Fundamentals, Apache Iceberg and Parquet in 60 minutes on...

We'll be covering data lakes, parquet file format, data compression and shuffle! Make sure to have a https://www.DataExpert.io account here so you can get the...

www.youtube.com

6

65

366

Zach Wilson

@EcZachly

8 months

I turn 29 + 1 today at 9:02 PM Pacific. As I desperately cling to my 20s, here are 29 + 1 things I’ve learned during my time on this planet that have led to success 1. Always ask questions! The stupider the question the better! 2. Don’t ask what’s the least I can do. Ask

39

50

357

Zach Wilson

@EcZachly

11 months

Linear regression is still more important than LLMs for 95%+ of data science jobs!

14

46

348

Zach Wilson

@EcZachly

9 months

For the holidays, I'm offering ten full-ride scholarships to V4 boot camp. If you get selected, you'll get immediate access to V3 material and get a free seat in the V4 boot camp in the spring! Here's the link to apply for the scholarship:

49

163

344

Zach Wilson

@EcZachly

3 months

Do you want to get better at data engineering? Here are some free YouTube videos you should watch: Data Modeling 100TBs to 5 TBs: Data Lake fundamentals (Iceberg and Parquet): Dimensional Data Modeling:

4

79

351

Zach Wilson

@EcZachly

1 year

The data modeling round in big tech interviews weeds out the DEs who can't solve vague business problems! I wrote a free article about everything you need to know to pass these interviews! Link in my bio since Elon would downrank otherwise! #dataengineering

1

55

348

Zach Wilson

@EcZachly

2 years

Data engineering compensation can get kind of crazy as you climb the ladder in big tech!
- junior DEs usually make $180-200k
- mid-level makes $250-275k
- senior makes $300-350k
- staff makes $500-600k
Climbing the ladder is definitely worth it! #dataengineering

21

36

346

Zach Wilson

@EcZachly

11 months

Follow these accounts to level up your data skills!
- @SeattleDataGuy - talks about data engineering and consulting
- @andreaskayy - talks about data engineering
- @startdataeng - talks about data engineering and architecture
- @NickSinghTech - talks about SQL
- @DalianaLiu

4

72

338