Malte Ostendorff

@XYOU

Followers
767
Following
2K
Statuses
618

@occiglot @openlegaldata @deutschetelekom

Berlin, Germany
Joined June 2009
@XYOU
Malte Ostendorff
5 days
10/ Even basic things like language identification work poorly for those languages, which decreases the amount of data already at the stage of Web crawls.
1
0
0
@XYOU
Malte Ostendorff
6 days
@abacaj Better use Qwen as the base model
0
0
0
@XYOU
Malte Ostendorff
3 months
RT @occiglot: For anybody still at #EMNLP, we will be presenting community-Oscar at the MRL poster session at 11am. See you there.
0
5
0
@XYOU
Malte Ostendorff
4 months
The very first edition of the Conference on Language Modeling is kicking off 🚀 #COLM2024 @COLM_conf
1
1
16
@XYOU
Malte Ostendorff
4 months
@BSC_CNS just released a series of European LLMs: 2b, 7b and soon 40b 🚀
@MartaVillegasM
Marta Villegas
5 months
We are launching Salamandra 2B & 7B multilingual LLMs trained at @BSC_CNS from scratch with nearly 8 trillion tokens in 35 EU languages+code. Spanish languages have been carefully curated, with Romance languages comprising >30% of the training dataset. 👇
0
0
2
@XYOU
Malte Ostendorff
6 months
We released a new multilingual dataset in cooperation with the OSCAR project. 🚀🚀🚀
@occiglot
OcciGlot
6 months
📣 Announcing Community-OSCAR: A collaboration between Occiglot and the OSCAR project for creating multilingual Web-crawled datasets. Blog: HF:
0
0
3
@XYOU
Malte Ostendorff
7 months
@hu_yifei Did you already try Grobid?
0
0
1
@XYOU
Malte Ostendorff
7 months
@MalteLandwehr @HamelHusain The assumption is that you can simply put all new data into the prompt as soon as the context length becomes really big.
0
0
1
@XYOU
Malte Ostendorff
8 months
There are even more Axolotl configs being shared on @huggingface
@TheZachMueller
Zach Mueller
8 months
One of the most underrated things about @winglian's Axolotl is the database of config yamls that exist showcasing how to train so, so many models. I hope some day soon we can emulate a tenth of this into Accelerate to help people understand DDP configs more
0
0
1
@XYOU
Malte Ostendorff
8 months
All three videos look so similar. Do they all overfit to the same training data?
@CuriousRefuge
Curious Refuge
8 months
We are LOVING the generations from @LumaLabsAI Dream Machine. Now it’s time to answer the question, how does it compare to these other big name video generators. For most of these shots, Luma wins. #lumadreammachine #aifilmmaking #aivideo #filmmaking
0
0
1
@XYOU
Malte Ostendorff
8 months
@gui_penedo @pjox13 That’s even better. I will share the data with you as soon as it’s ready!
0
0
2