Mark Kurtz
@markurtz_

Followers: 343 · Following: 107 · Statuses: 174

CTO @neuralmagic; making deep learning faster, smaller, and more accessible

Boston, MA
Joined April 2017

Mark Kurtz (@markurtz_) · 2 days
I’m a bit late in sharing this, but at the start of the year, I embarked on an exciting new journey—joining @RedHat following its acquisition of @neuralmagic, where I had the privilege of serving as CTO.

While waiting for the deal to finalize during a holiday trip, I saw this fedora-shaped iceberg (it looked a bit more like it from the front!) floating by and ran outside to admire it. It was a perfect reflection point—not just on the incredible work we accomplished at Neural Magic, but on how much more lies beneath the surface, waiting to unfold.

Fittingly, the Fedora is an iconic symbol for Red Hat, representing the power of open-source communities, collaboration, and innovation. The AI space follows the same principles. The industry often spotlights curated, surface-level demos, but the real breakthroughs happen below in product enablement, fine-tuning, optimization, and scaling, where innovation transforms AI from research into real-world impact across industries and communities. What you see is just the tip; what truly matters is the foundation beneath.

At Red Hat, I'm thrilled to scale our commitment to making AI open, efficient, and accessible, all within the incredible open-source communities that power Red Hat and the future of technology. Exciting things ahead!

Read more about the acquisition and Red Hat's AI vision:

#opensourcedevelopment #ai #machinelearning #mlops #community #startups
[Image attached]
0 · 0 · 4

Mark Kurtz (@markurtz_) · 1 month
RT @jamieagoldstein: In 2018, we raised a glass to toast the start of our partnership with @neuralmagic. Nearly seven years later, we raise…
0 · 8 · 0

Mark Kurtz (@markurtz_) · 3 months
Incredibly excited to continue our mission of open, efficient AI!
Quoting Neural Magic (Acquired by Red Hat) (@neuralmagic) · 3 months
Introducing 2:4 Sparse Llama: The first sparse foundation model built on Llama 3.1 8B. With 98% recovery on Open LLM Leaderboard v1 and full recovery on fine-tuning tasks (math, coding, chat), it’s more efficient and open source! Get from Hugging Face:
[Image attached]
0 · 1 · 5
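
As a rough sketch of what pulling such a 2:4 sparse checkpoint from Hugging Face and serving it in vLLM might look like; the model ID below is an assumption for illustration (the tweet's link was not preserved in this archive), and sparse-kernel acceleration depends on the vLLM version and hardware:

# Minimal sketch: serving a 2:4 sparse Llama checkpoint with vLLM.
# The repo name is a hypothetical placeholder, not taken from the tweet.
from vllm import LLM, SamplingParams

llm = LLM(model="neuralmagic/Sparse-Llama-3.1-8B-2of4")  # assumed model ID
params = SamplingParams(temperature=0.0, max_tokens=128)

outputs = llm.generate(
    ["Summarize the benefits of 2:4 structured sparsity for inference."],
    params,
)
print(outputs[0].outputs[0].text)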
Mark Kurtz (@markurtz_) · 3 months
RT @addvin: I’m thrilled to announce that Neural Magic has signed a definitive agreement to join forces with Red Hat, Inc. At Neural Magic…
0 · 36 · 0

Mark Kurtz (@markurtz_) · 3 months
@nopainkiller @neuralmagic We're actively working on including it and hope to start reporting on it in our next eval round. Our ultimate goal, ideally with the next round, is to combine the best features across the algorithms into easy-to-use recipes that are robust.
0 · 0 · 1
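
To illustrate what such a combined recipe could look like, here is a minimal sketch assuming the llmcompressor library's modifier-stack style (SmoothQuant followed by GPTQ at W8A8); the import paths, model ID, dataset, and calibration settings are illustrative assumptions and vary across library versions:

# Minimal sketch of a combined compression recipe with llmcompressor.
# Model ID, dataset, and calibration settings are illustrative assumptions.
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot

# Stack two algorithms: SmoothQuant eases activation quantization,
# then GPTQ quantizes weights and activations to INT8 (W8A8).
recipe = [
    SmoothQuantModifier(smoothing_strength=0.8),
    GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"]),
]

oneshot(
    model="meta-llama/Llama-3.1-8B-Instruct",  # assumed example model
    dataset="open_platypus",                   # assumed calibration dataset
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
    output_dir="Llama-3.1-8B-Instruct-W8A8",
)

The idea is that one recipe list bundles the best-performing algorithms so it can be reused across models without per-model custom code.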
Mark Kurtz (@markurtz_) · 3 months
@gm8xx8 Thanks for sharing! Excited that we were able to get this out for others to hopefully build on
0 · 0 · 1

Mark Kurtz (@markurtz_) · 3 months
@TheXeophon @gm8xx8 For this paper, we focused on real-world usage in the chat, instruct, and code-generation domains, specifically because of complaints about performance on the Arena Hard leaderboard. We'll have a follow-up covering multi-language, long-context, and multi-modal evaluations a bit later.
0 · 0 · 3

Mark Kurtz (@markurtz_) · 4 months
@NaanLeCun @zjasper666 Getting good performance out of the W4A8 setup is surprisingly hard given current hardware limitations. We're actively working on it, but we've seen the most consistent benefits with W8A8.
1 · 0 · 1

Mark Kurtz (@markurtz_) · 4 months
@TheXeophon @_philschmid @neuralmagic @AIatMeta For this work we focused on the main usage pathways we've seen so far for LLMs. It definitely does have higher sensitivity in the long-context regime, and we're working on a follow-up on that now!
0 · 0 · 2

Mark Kurtz (@markurtz_) · 4 months
@BramVanroy @_philschmid @neuralmagic @AIatMeta Training needs to explore a large optimization space to converge on a precise solution over millions of gradient updates. At inference, we've already converged, so we can remove the unused pathways and excess precision.
0 · 0 · 3

Mark Kurtz (@markurtz_) · 4 months
It was a privilege to sit down with Chris Brandt and explore the state of AI today. We covered everything from the promise of smaller, specialized models to the real risks enterprises face when adopting AI.

Key points in the podcast:
1️⃣ Why AI is replacing creative fields rather than mundane tasks
2️⃣ The pitfalls of larger, general models and the importance of scaling smaller models
3️⃣ The power of open-source AI
4️⃣ Smarter algorithms for the future

Watch the full talk on YouTube:
Or listen to the podcast:

What’s your take on these trends? Are enterprises ready for the shift to AI? How will creativity and automation coexist?
0 · 3 · 7

Mark Kurtz (@markurtz_) · 5 months
@EddyLeeKhane @_EldarKurtic @svpino In other words, it centralizes the latest SOTA methods into a single, easy-to-use repo that works across nearly any model, all focused on vLLM compatibility to ensure real performance gains rather than theoretical compression. Finally, it adds some novel techniques, such as activation quantization methods.
0 · 0 · 0

Mark Kurtz (@markurtz_) · 5 months
@J33P4 @svpino You definitely could. The difference is that LLM Compressor focuses on compressing your own models, so you can tune the compression-vs.-accuracy tradeoffs for your use cases, compress models that aren't available in popular repos, and apply the latest research to improve compression and recovery levels.
0 · 0 · 1

Mark Kurtz (@markurtz_) · 5 months
@Bakaburg1 @svpino Overall, LLM Compressor enables compression of your models with simple APIs for SOTA algorithms and methods targeted at maximum vLLM performance. These include activation quantization, sparsity, and more! For off-the-shelf SOTA compressed models, check these out:
0 · 0 · 0
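
For a sense of the "simple APIs" mentioned above, here is a minimal sketch assuming llmcompressor's oneshot entry point with an FP8 dynamic activation-quantization scheme; the model ID and output directory are illustrative assumptions:

# Minimal sketch: one-shot weight + activation quantization to FP8 with
# llmcompressor, producing a checkpoint intended for vLLM. The model ID
# and output directory are illustrative assumptions.
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot

oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # assumed example model
    recipe=QuantizationModifier(
        targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"]
    ),
    output_dir="TinyLlama-1.1B-Chat-FP8-Dynamic",
)

The resulting directory is meant to be passed directly as the model path when launching vLLM, though exact support depends on the vLLM and compressed-tensors versions in use.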
Mark Kurtz (@markurtz_) · 5 months
@EonWeaveLabs @_EldarKurtic @svpino Expanding on this, here's an example that walks through LLM Compressor for AMD GPUs:
0 · 0 · 1

Mark Kurtz (@markurtz_) · 5 months
RT @svpino: You can now optimize and make any open-source LLM faster: 1. pip install llmcompressor 2. apply quantization with 1 line of co…
0 · 125 · 0