🎩 Optimizing inference with math: TheStage AI and its framework
The inference market has grown so quickly that inefficiencies have emerged between inference revenue and inference costs.
@TheStageAI
’s solution, which uses our infrastructure, is designed to
What’s the role of Transformers in new GPUs?
What's better for Transformer training and inference, H100 or A100?
Our Senior Technical Project Manager, Igor Ofitserov, has written an in-depth review answering these questions:
📣 Several weeks ago, our Product Director Narek Tatevosyan gave a talk at the
#TheAISummit
London. He shared an insider’s perspective on the key steps, essential tools and challenges involved in training a
#genAI
model from scratch. Watch the talk here:
Recraft
@recraftai
, recently funded in a round led by Khosla Ventures and former
@github
CEO Nat Friedman, is the first
#genAI
model built for designers. Featuring 20B parameters, the model was trained from scratch on Nebius AI. Here’s how:
💬 Join our next hands-on webinar on deploying a knowledge-based chatbot with
#RAG
in production! This implementation leverages
#opensource
tools and is powered by
@nvidia
H100 GPUs.
When?
May 16, 17:00 (GMT+2).
Register:
🔥 This fall, we’re adding
@NVIDIA
H200 SXM GPUs
Currently,
#H200
is the most powerful
#GPU
for your AI. Its memory and data access speed ensure up to 2x the LLM inference performance over H100.
Prices start at $2.50 per GPU-hour. Reserve your cluster today:
🧩Preserving knowledge in compact models:
#opensource
contributions of Unum x Nebius
To date,
@unum_cloud
has trained and open-sourced 4 models in partnership with us, all available on
@huggingface
for everyone to experiment with. Here’s the story so far:
🎬 Watch now: Slurm vs K8s for large model training
At the webinar a couple of weeks ago, our CSA Panu Koskela explored
#Slurm
and
#K8s
, covered their architecture, original purposes, adaptations for ML, and key considerations for choosing between them.
🎬 We’ve filmed this video in Finland, the home of our
#datacenter
. Here, we built ISEG, the world's 16th most powerful
#supercomputer
— since then, we’ve also built an 8,000-GPU supercluster. Take a look at our DC and learn more about hardware R&D:
Join our first meetup with
@mlopscommunity
on Thursday, April 11, in Amsterdam!
Filipp will cover infrastructure resilience of multi-node
#LLM
#training
, and Luka will discuss real-time standby energy-waste prediction. Spaces are limited — register soon:
🌄 Unlocking the power of
#opensource
LLMs
Our expert team will unveil our
#inference
infra and share insights on navigating the token-as-a-service market. Register for the webinar for
#GenAI
builders, CTOs, PMs, data scientists and related roles:
🔥 Price drop:
@NVIDIA
H100
One
#GPU
host with H100 is now priced at $3.50 per hour when paying as you go, down from the previous $4.85.
The prices for hosts reserved for 3, 6 or 12 months are also down. Check out our pricing for details:
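As a quick sanity check on the new rate, here is the arithmetic for a month of continuous pay-as-you-go use of a single GPU (illustrative; assumes a 30-day month):

```python
old_rate, new_rate = 4.85, 3.50   # USD per GPU-hour, pay-as-you-go
hours = 24 * 30                   # one 30-day month of continuous use

old_cost = old_rate * hours       # ~3492 USD at the previous rate
new_cost = new_rate * hours       # ~2520 USD at the new rate
savings = old_cost - new_cost     # ~972 USD saved per GPU per month
```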
🧩 Building
#RAG
-based solutions in Nebius AI: all you need to know
With our platform, you can easily manage and control RAG solutions. We’ve gathered all the related info on one page: . Learn more about architecture, resources and expert support options.
🎛️ Fundamentals of LoRA and low‑rank fine-tuning
In the next installment of our series of deep technical articles on AI research, let’s switch our attention to the famous LoRA, a low-rank adaptation technique:
#LoRA
#lowrank
#finetuning
#research
#paper
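As a primer before the article: LoRA freezes the pretrained weight matrix $W_0$ and learns only a low-rank update, the standard formulation being

```latex
h = W_0 x + \Delta W x = W_0 x + B A x, \qquad
B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)
```

so only $B$ and $A$ are trained, roughly $r(d+k)$ parameters instead of $dk$.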
Special offer for those training
#largemodels
: fight for the best price on your
#H100
cluster of 128+ GPUs! Until May 31, name your price for a training run of at least 14 days and get training-optimized infrastructure, plus free dedicated architect support as a bonus:
Just recently, we added
#Kubeflow
, a popular open-source platform providing an ML stack for
#K8s
. It integrates
@TensorFlow
,
@ProjectJupyter
notebooks, and other tools. You can deploy Kubeflow in your Managed K8s clusters on Nebius AI using this product:
🔥 Nebius AI is now open to everyone. Whether you are a company or an individual engineer, access the
#GPUcloud
console straight away and start running your
#ML
experiments.
Get started by logging in with your Google or GitHub account:
🔄 Announcing Managed Service for MLflow in public preview
#MLflow
is a renowned industry tool that streamlines
#workflows
in the model dev cycle. We made MLflow more accessible to a broad audience of ML enthusiasts by providing it as a
#managed
solution:
Checkpoints of large ML models can weigh hundreds of gigabytes. Let’s explore how to handle them. We’ll discuss strategies like async checkpointing, choosing a storage and format, adapting your code to the network, and scheduling with possible retries in mind:
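The async checkpointing pattern mentioned above can be sketched in plain Python: snapshot the state cheaply on the main thread, then push the slow write to a background thread so training keeps running. This is a minimal stdlib sketch of the idea, not a production recipe (real frameworks serialize tensors, not JSON):

```python
import copy
import json
import tempfile
import threading

def async_checkpoint(state, path):
    """Snapshot the training state on the main thread (fast),
    then persist it in a background thread (slow I/O off the hot path)."""
    snapshot = copy.deepcopy(state)      # cheap in-memory copy; blocks only briefly

    def _write():
        with open(path, "w") as f:
            json.dump(snapshot, f)       # slow write happens concurrently with training

    writer = threading.Thread(target=_write)
    writer.start()
    return writer                        # join() it before starting the next save

# Usage: training continues while the previous state is being persisted.
state = {"step": 100, "weights": [0.1, 0.2]}
path = tempfile.NamedTemporaryFile(suffix=".json", delete=False).name
writer = async_checkpoint(state, path)
state["step"] = 101                      # mutate freely; the snapshot is isolated
writer.join()
```

The deep copy is what makes the mutation on the last line safe: the writer thread sees the state as of step 100, not the live dictionary.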
🎩 Levon Sarkisyan, our Solution Architect Team Lead, wrote a piece reflecting on the technical challenges his team (and many others at Nebius) have faced: . Instead of making a flashy announcement, we encourage you to simply check out Levon’s short article
🔬 How large models can abstract rules: research by LIMS
How well can LLMs abstract problem-solving rules? A
#research
by the London Institute for Mathematical Sciences, conducted using our infrastructure, tackles a weakness shared by every modern
#LLM
:
🧪
#ML
experiments help you discover the optimal model version for your specific use case. Read this article on our blog to learn about different types of
#experiments
and what you need to watch out for when conducting them:
How Dubformer performs AI dubbing on Nebius infrastructure ▶️
Dubformer is an end-to-end AI dubbing and localisation solution supporting broadcasts in over 70 languages. The company runs its two most resource-intensive tasks on Nebius AI.
Learn more:
🏙️ We prepared a solution for deploying large-scale, customizable training environments.
#Slurm
-based Clusters feature
@nvidia
stack, high-performance shared storage and advanced scheduling: We partner directly with
@SchedMD
to deliver exceptional support.
🚀 Nebius AI Studio is our new product designed to simplify the process of creating
#GenAI
apps and using foundation models. The first release within Studio, Inference Service, provides endpoints for today’s most popular
#models
. Learn more on our blog:
🔥 Choosing
#storage
for
#deeplearning
: a comprehensive guide
Drawing from Nebius’ and our clients’ extensive experience, today’s guide by our own Igor Ofitserov aims to help engineers choose the most fitting storage solutions for deep learning:
No matter how ambitious your AI journey is - whether it’s a new
#LLM
, product based on
#Generative_AI
, or
#Computer_vision
- Nebius AI is here to help at every stage.
We are excited to have you join us!
🦾
@NVIDIA
L40S: now available in Nebius AI. Hosted in our own data center, L40S
#GPUs
support BF16, FP8, INT8 and INT4 formats and provide 48 GB of memory each, making them ideal for inference on models under 8B parameters.
Start building solutions on top of L40S right away:
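The sub-8B guidance follows from simple arithmetic: model weights alone at N billion parameters need roughly N × bytes-per-parameter gigabytes, and the KV cache and activations come on top. A rough, weights-only estimate (illustrative numbers):

```python
def weight_memory_gb(params_billion, bytes_per_param):
    # parameters * bytes per parameter, expressed in decimal gigabytes;
    # KV cache and activations need additional headroom on top of this
    return params_billion * 1e9 * bytes_per_param / 1e9

bf16_gb = weight_memory_gb(8, 2)   # an 8B model in BF16: ~16 GB of weights
fp8_gb = weight_memory_gb(8, 1)    # the same model in FP8: ~8 GB of weights
```

Either way, an 8B model's weights fit comfortably within 48 GB, leaving room for the KV cache at useful batch sizes.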
⭐️ Introducing Managed Service for Apache Spark
Learn more and request access if you’d like to process large-scale
#datasets
using
#Spark
in the Nebius infrastructure: . The service is currently in Preview and provided free of charge.
Supporting CUDAMODE IRL Hackathon was thrilling. Our CBO
@RomanChernin
said a few words before we began — thanks to everyone for the follow-up discussions. Every researcher we spoke with was well-informed and deeply involved in the field. It was nice seeing you, Andrej
@Karpathy
!
🔥 We’re launching a new Nebius platform built from the ground up
We believe it will serve the AI explorers’ needs even better.
It features:
- Faster storage
- Support for bleeding-edge new GPUs
- Better observability
- More intuitive UI
Learn more:
🏋️ Our new Inference Service provides endpoints for popular open-source AI models (Llama, Mistral, Qwen...). First users get $100 worth of free tokens. That's ~40,000 interactions! From quick experiments to big projects, your journey starts here:
Compute Secured for the CUDAMODE IRL Hackathon!
We’ve secured an incredible $300K+ in cloud credits, a 10-node GH200 cluster and a 4-node 8×H100 cluster!
Thanks to our amazing sponsors; we’re working with them to extend the credits beyond the event. (1/2)
🟢 Register among the first 50 to get your free
@TechCrunch
Disrupt ticket!
We’re sponsoring and giving away tickets. Just fill out this form among the first 50: . Note that the ticket will only be valid for the person whose name you indicate on the form.
🔥 In-house LLM R&D: Nebius AI’s secret ingredient for truly AI‑centric cloud
To build a full-fledged ML platform, we realized it’s necessary to perform large-scale distributed training in-house. That’s why we formed the LLM R&D team, leveraging our compute capacities to let us
🤝 Nebius AI and the DVC team announce a technological partnership
We are thrilled to collaborate with a company that develops such a widely used and significant open-source tool as DVC
@dvcorg
. Learn more on our blog:
With a keen eye on power usage effectiveness, Nebius is excited to be among the first cloud providers adopting and offering NVIDIA Blackwell GPUs🔥 Read more:
Watch the
#NVIDIA
#GTC24
keynote to learn more about the NVIDIA B200:
Throwback to this past weekend, when we supported the HACK UK Hackathon hosted by
@a16z
,
@MistralAI
and
@cerebral_valley
. We provided teams with infra powered by H100 Tensor Core GPUs. Additionally, Boris Yangel, head of our LLM R&D team, participated as a judge.
✋ We’re at
@Ray_Summit_Live
in SF with our booth and a talk. Let’s meet up! Also, join us at the
#RaySummit
Lightning Stage today at 12:00 PM for a talk by our Senior Product Manager Aleksandr Patrushev. He will be discussing the work of our LLM R&D team.
🇫🇷 Nebius launches its first GPU cluster in France
It is a colocation based at
@Equinix
’s PA10 campus — and the first data center equipped solely with Nebius-designed servers from day 1. Learn more and see the rooftop farm warmed by our servers:
In the last few weeks, we held our first webinar featuring Recraft, updated docs with some useful guides, shared Dubformer’s AI dubbing story, tackled the topic of AI research, and expanded the portfolio of ML-related products on Marketplace.
Learn more:
🔀 Recent research has exposed deep connections between different architectural options: transformers,
#RNNs
,
#SSMs
and matrix mixers.
In the next installment of our AI research series, we’ll mainly follow papers like “Transformers are RNNs” and Mamba 2, getting elbows deep in
🎨 We’re at the famous
@joinstationf
, a venue for
@xyz_paris
, sponsoring and participating with an inspiring talk by our own Rashid Ivaev. The French
#AI
community has always welcomed us warmly, which is especially important considering the opening of our GPU cluster in
#Paris
.
🍸 Mixture of Experts has become popular as an efficiency-boosting architectural component for
#LLMs
. The latest article in our AI research series explores the steps researchers have taken on the road toward the perfect
#MoE
:
🔥 This Saturday, we’ll be supporting CUDA Hackathon in SF, providing H100 to each hacker:
It‘ll be great to have
@clattner_llvm
with us, the creator of LLVM, the Clang compiler and Swift.
@ashvardanian
will also speak at the event, joined by co-hosts
📝 Check out our new article on data preparation for large models
We're exploring methods and technologies for maximizing efficiency in data collection and preparation for training LLMs, outlining the pipeline in detail and discussing our own chosen workload for dataprep:
🏅 We are
#17
in TechRound’s AITech35 winners!
In the rating, the startup and tech magazine
@techrounduk
celebrates the most innovative
#AI
companies across the UK and Europe. We’re proud to rank
#17
, alongside great companies like
@Kayrros
and
@Databricks
. Kudos to all the winners!
🗓️ Nebius monthly digest, August 2024
We recently reduced the prices of
@NVIDIA
#H100
, invited everyone interested to reserve
#H200
, launched a public preview of Managed
#MLflow
and released a new talk about deploying generative AI models in production:
Here’s how to run
@Meta
Llama 405B with Nebius AI Studio API:
Our Studio allows
#GenAI
builders to use top
#opensource
models without facing the usual difficulties. The platform provides an API to run such models. The rest you can find out in the guide.
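As a minimal sketch of what such a call can look like — assuming an OpenAI-style chat-completions API; the endpoint URL and model identifier below are purely illustrative, so check the Studio guide for the real values and authentication details:

```python
import json

# Illustrative values -- consult the Nebius AI Studio docs for the actual
# endpoint URL and model identifier.
BASE_URL = "https://api.studio.example/v1/chat/completions"

def build_request(prompt, model="llama-3.1-405b-instruct"):
    """Builds an OpenAI-style chat-completion payload (assumed API shape)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }

# Serialize the payload; POST it to BASE_URL with your API key to get a reply.
payload = json.dumps(build_request("Summarize RAG in one sentence."))
```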
Yesterday, Top 500 updated the ranking of supercomputers.
We are happy to share that Nebius AI’s ISEG is in 16th place worldwide.
You can now use part of Nebius AI supercomputer for your AI projects!
Check out more at
Our booth can be cozier or more spacious from conference to conference, sometimes with flowers, sometimes without. One feature remains the same, though: here, you can always have a meaningful conversation about optimizing your AI infrastructure. Up until this Friday, we’re at
🏙️ In a rare public talk by our hardware R&D team, Igor Znamenskiy and Oleg Fedorov will discuss Nebius’ own server design and much more. We're especially proud to present this talk at
@OpenComputePrj
Summit. To attend, come to room 220C at Concourse Level at 1:30 PM on Oct 17.
🌌 Upcoming
@llmopsSpace
webinar: Taming AI or How we build the alignment pipeline
Speaking on July 11 will be Maksim Nekrashevich, our ML & LLM Engineer, accompanied by
@PhilipTannor
, CEO and Co-Founder at
@deepchecks
. Don't forget to register:
Great news! We have just opened access to our AI-centric cloud platform.
Nebius AI is ready for intensive ML workloads thanks to thousands of NVIDIA® H100 Tensor Core GPUs available right away.
Check out our website and contact us for an offer tailored for your needs.
🎛️ We’re regularly expanding the feature set of our AI-centric cloud platform. What you see here is the
#ML
#workflow
we consider most effective for users as of now. Our cloud architects will gladly help you build a pipeline of this complexity. Learn more:
👁️ In the field of computer vision, selecting hardware can be tricky due to the variety of usable ML models and their significantly different architectures. Our latest blog post explores the criteria for selecting the best
#GPU
for
#CV
:
🏙️ Access up to 8
#H100
,
#V100
or
#L40S
GPUs right away
Here’s what we offer via self-service:
- Up to 8
@NVIDIA
GPUs on demand in 5 minutes
- Flexibility in scaling
- Only $25 to start
- 24/7 support
- Step-by-step guides
Learn more and see the video:
🦆 Our quAIck-quAIck is wishing the best of luck to all hackers at
@MistralAI
Paris Hackathon!
We equipped 45 teams with
@nvidia
#H100
#GPUs
, and our ML engineer Sergei Polezhaev will share useful tips on applying RLHF without labelers. Shout out to
@cerebral_valley
for hosting!
🔥
@icmlconf
has kicked off: come meet us!
ICML is underway — take a short break from discussing posters and come to our booth 202. The many ideas circulating here require resilient infrastructure; that’s why we are here.
You can meet Boris Yangel, our Head of NLP, Sergey Polezhaev, ML
The latest episode of
@mlopscommunity
Podcast is live, featuring our ML Engineer Simon Karasik, who provided an intro to
#LLM
checkpoints, shared tips and tricks for handling them, and discussed choosing storage for them.
Audio:
Video:
✋ Greetings from
@LDNTechWeek
! It was wonderful to discuss integrating AI into established products with such great companies as
@Grammarly
and
@canva
! If you’re also on site, let’s chat about partnerships and
#GPU
infrastructure.
🗓️ Nebius monthly digest, September 2024
In September, we announced the opening of our French region, launched the world’s first open-source
#K8s
operator for
#Slurm
as well as Nebius AI Studio, a product for
#GenAI
builders — and there’s more:
💶 This new video will help you learn how to decode GPU pricing:
The
#GPUcloud
market is flourishing, bringing a variety of
#pricing
models. Our CFO Danila Pavlov provided some tips on interpreting price lists, identifying hidden costs and asking your
🔥 Introducing Soperator, the world’s first fully featured
#Kubernetes
operator for
#Slurm
in open source
In the in-depth article about it, you’ll find out how the community has tackled this before, plus the architecture details of our open solution:
Today, we’re in 🇬🇧London at Fully Connected 2024, organized by
@weights_biases
. At our booth near the VIP lounge, you can meet our Product Director Narek Tatevosyan and Engagement Manager Dmitry Levner, who'd be happy to discuss
#GPUcloud
solutions dedicated to your ML pipelines.
🎰 Today and tomorrow, we’re at the famous MGM Grand in Vegas, the home of one of the
@Ai4Conferences
, joining more than 5K attendees! If you’re one of them, get in touch with our team to discuss your bottlenecks in building
#AI
#training
and/or
#inference
workloads:
🏋️ Weights & Biases Launch agent, an important tool for managing ML experiments, is now available on our Marketplace: . This marks the beginning of a broader tech partnership between Nebius AI and
@weights_biases
, with more outcomes on the way.
👨👩👧👦 Hello again,
@mlopscommunity
!
We’re once again supporting MLOps Community Meetup, while also participating with a talk. This time we’re at
@Techspace
Kreuzberg in Berlin, sharing the stage with
@JoannaStoff
and
@katjawittfoth
. Our Senior ML Engineer Sergey Polezhaev just
🌁 During the
#RaySummit
in SF, our Senior Product Manager Aleksandr Patrushev will give a talk on why every AI cloud provider needs an in-house
#LLM
team. Come to the Lightning Stage on Oct 2 at 12:00 PM.
Also, throughout the summit, you can visit our booth. See you there!
Hardcore CUDA Hackathon: the winners are in! 🏆🏆🏆
That's a wrap for the hackathon at AGI House in San Francisco, which we supported with our H100s!
Our congrats go to the winning team:
David Heineman —
Evan Rusmisel —
Evan
🚜 Learn more about TractoAI, our end-to-end solution for data preparation, exploration and distributed
#training
, powered by proven
#opensource
technology. In the image, you can see our product landscape. Take a detailed look at the implementation here:
🗓️ Read our monthly digest — in June, we updated our self-service offering to up to 8 GPUs, decoded GPU pricing and added L40S to our portfolio. We also detailed our Terraform provider, discussed LLM dataprep,
@DVCorg
partnership and introduced feature requests:
🗓️ Nebius AI April digest — read on our blog:
Our main news of the past month is that Nebius AI is now open to everyone! We also participated in the
#MLOps
podcast and published practical
#videos
and stories about how our clients are building ML models.
We're at
#ICLR
, one of the most prestigious AI research conferences. Meet our ML engineers, ready to share LLM training techniques.
Many experts at ICLR are seeking new methods, as well as the resources to implement them. We’re here to help. Come discuss your workflows at our booth 07.
📝 Read about Krisp’s experience with our platform
With Nebius AI,
@krispHQ
adopts
#Accent
#Localization
, a real-time technology that removes accents from call center agents’ speech, producing US-native-sounding output.
Learn more on our blog:
🇬🇧 Today and tomorrow, we are at
#TheAISummit
London. Come say hi at booth 105! At 16:00 today, we will give a demo on building production-ready RAG chatbots.
Also, tomorrow at 11:55, our Product Director Narek Tatevosyan will give a talk on training a genAI model from scratch.
Last call: join our webinar on Slurm 🆚 K8s for large model training. Panu Koskela, our CSA, will cover their architecture, design, original purposes, and adaptations for ML.
🕐 When: March 28, Thursday, 16:00 (GMT+1)
📍 Where: Zoom
Register:
If you're attending
@EmergeConfHQ
in Yerevan, don't miss the talk by our CMO Anastasia Zemskova
@elsegreen_
. It will show you how conversations with some of the brightest minds in AI helped us build a platform that’s flexible, adaptable, and ready for whatever comes next.
🎬 Our cloud solutions architect Boris Popov is back, this time exploring retrieval-augmented generation in a hands-on video: . The
#RAG
-based chatbot he built leverages open-source technologies and is powered by
#H100
#GPUs
.
We're continuing our exploration of our platform with our tech expert 😎
In this video, Khamzet Shogenov, a cloud solution architect, will discuss networking basics within the Nebius AI cloud platform.
#Network
#VPC
#NebiusAI
#machinelearning
#ai
🇩🇪 Today, the
@THWS_Presse
is hosting a hackathon focused on solving LLM-related tasks presented by
@DeutscheBank
. We provided participants with
@nvidia
#H100
#GPUs
and cloud solution architect expertise. Hope that the teams' quick experiments will lead to impressive results!
🗓️ Nebius AI in July: what’s new?
In July, we introduced our LLM R&D team and shared more of the team’s experience. We also launched a GPU auction and published diverse materials, including a guest post about the London Institute’s research based on our infra.
There were other
We are releasing a new tiny VLM 🎉
The model is smaller and much more accurate than our previous
@unum_cloud
"uform-gen" model, downloaded 100K times a month.
The new decoder is only 0.5 billion parameters and the model is already available on
@huggingface
🤗
🧵
Meet our own Idan Belisha, a Cloud Solution Architect, explaining to the participants of the
#CUDA
hackathon how to start using
#H100
GPUs. We take pride in our architect team, and it’s great to have Idan with us here in the Bay Area.
🎛️ So why is Nebius AI the best for model training? Well, we’ve put lots of effort into understanding what ML engineers need from a GPU cloud. Here, we outlined everything we learned from our LLM R&D team,
@recraftai
,
@higgsfield_ai
and other clients:
Slurm 🆚 K8s: a comprehensive blog post
Our Cloud Solution Architect Panu Koskela's article compares the most popular options today for orchestration in model
#training
— Slurm and
@kubernetesio
, covering their origins,
#ML
adaptations and other factors:
Nebius AI monthly digest: March🌿
March was a busy month for us: we opened access to
#Managed
databases, hosted a webinar on
#Slurm
vs
#K8s
, published new guides in our documentation and several ML-focused articles.
Find out the details on our blog:
🌕 JupyterHub: just released on Marketplace
#JupyterHub
is a multi-user server for Jupyter notebooks, providing an environment for data science, ML and scientific computing.
You can deploy the server with PyTorch and CUDA in your K8s clusters right away:
💰 Recently, our CFO Danila Pavlov wrote a piece on solving the
#GPU
#pricing
puzzle for
@TheStartupsMag
. He outlined the key tactics startups can use when selecting a GPU provider for training and resource-intensive inference. Read the article here:
⚖️ At the frontiers of physics and maths: exploring London Institute’s projects based on Nebius infra
Today’s guest post on our blog is written by
@Ananyo
Bhattacharya, Chief Science Writer at the London Institute for Mathematical Sciences
@London_Inst
. We are honored to host
Join our webinar with Recraft's CEO Anna Veronika Dorogush for insights on heavy model training via Nebius AI, and learn about our platform's features and capabilities from our experts Andrew and Levon.
➡️ More information and registration:
We have a tradition at Nebius AI: Friday Cats, when colleagues share pics of their pets. But our editor doesn’t have any pets, so she asked AI to draw a cute cat.
Which one would you choose to share with your colleagues? 😺
@recraftai
@veedstudio
🎙️ We talk a lot about using our platform for model training, but don’t be mistaken — it is equally powerful
#inference
-wise. We're proud that
@higgsfield_ai
infers such impressive models with
#L40S
GPUs on Nebius AI. Appreciate your kind words,
@alexmashrabov
!
💸 The
#GPUcloud
market is flourishing, bringing a variety of
#pricing
models. During the webinar on Jun 27, our own Danila Pavlov will provide tips on interpreting price lists, identifying hidden costs and asking your provider the right questions.
Register:
What makes GPUs and AI so closely related? What obstacles emerge when creating hardware tailored for machine learning, and how does this equipment fit into an ML engineer’s day-to-day work? For answers, read our recent blog post.
🐰 Yesterday was April 1st — check out some facts we shared 🐰
April Fools' Day is an unexpected day to reflect on what has been going on at Nebius AI, right? Still, we thought, why not share a few facts on our blog — with a funny twist.
Take a look:
🇦🇹 Nebius AI at ICML: who to meet and where to find us
Here are the members of our team who will be at
@icmlconf
:
- Boris Yangel, Head of NLP
- Sergey Polezhaev, ML Engineer
- Levon Sarkisian, Cloud Solutions Architect Team Leader
- Aleksandr Patrushev, Senior Product Manager
Congrats to all four companies chosen by the EU to get supercomputing hours! And thanks to
@NomadicNarnian
at
@thenextweb
for covering the competition:
By the way, Nebius AI offers everyone — not just the EU's favored few — reasonable rates to access