conceptofmind
@concept_of_mind
Followers 2K · Following 411 · Media 71 · Statuses 1K
Joined November 2019
Releasing Yarn-Llama-2-13b-128k, a Llama-2 model, trained for 128k context length using YaRN scaling. The model was trained in collaboration with u/bloc97 and @theemozilla of @NousResearch and @Void13950782 of @AiEleuther.
28
167
771
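For reference, a minimal sketch of loading this checkpoint from the Hugging Face Hub; the repo id and the trust_remote_code requirement are assumptions, so check the model card for the exact loading instructions.

```python
# Minimal sketch, assuming the repo id below; verify against the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NousResearch/Yarn-Llama-2-13b-128k"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,  # the YaRN rotary embeddings shipped as custom modeling code
)

prompt = "A summary of the document:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```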
Releasing LLongMA-2 16k, a suite of Llama-2 models trained at 16k context length using linear positional interpolation scaling. The models were trained in collaboration with @theemozilla of @NousResearch and @kaiokendev1.
14
91
384
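A minimal sketch of what linear positional interpolation means in practice: positions are compressed by a fixed scale so the extended window maps back into the range seen during pre-training. The function and variable names below are illustrative, not taken from the released training code.

```python
import torch

def rope_tables(dim: int, positions: torch.Tensor, base: float = 10000.0,
                scale: float = 1.0) -> tuple[torch.Tensor, torch.Tensor]:
    """Cos/sin tables for rotary embeddings with linearly interpolated positions."""
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    # Linear interpolation: shrink positions instead of extrapolating past the trained range.
    angles = torch.outer(positions.to(torch.float32) * scale, inv_freq)
    return angles.cos(), angles.sin()

trained_ctx, extended_ctx = 4096, 16384
scale = trained_ctx / extended_ctx           # 0.25 for a 4k -> 16k extension
positions = torch.arange(extended_ctx)
cos, sin = rope_tables(dim=128, positions=positions, scale=scale)
print(cos.shape)                             # torch.Size([16384, 64])
```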
Releasing LLongMA-2, a suite of Llama-2 models trained at 8k context length using linear positional interpolation scaling. The models were trained in collaboration with @theemozilla of @NousResearch and @kaiokendev1.
6
86
368
Releasing LLongMA-2 13b, a Llama-2 model, trained at 8k context length using linear positional interpolation scaling. The model was trained in collaboration with @theemozilla of @NousResearch and @kaiokendev1.
6
53
264
Releasing Hermes-Falcon-7b-8k, a Falcon model fine-tuned at 8k context length on the @NousResearch Hermes instruction dataset.
16
38
222
With Reddit and many other sites shutting down access to their APIs, it is now more important than ever to release quality open-source conversational data. I worked with @ShayneRedford to generate ~80GB of labeled FLAN dialog data.
2
40
205
Releasing Hermes-Open-Llama-7b-8k, an OpenLLaMA model fine-tuned at 8k context length on the @NousResearch Hermes instruction dataset.
4
40
177
Releasing a new PaLM 2.1b model trained at a context length of 8k on C4. This model release is a continuation of the previously released 150m, 410m, and 1b models.
Introducing three new open-source PaLM models trained at a context length of 8k on C4. Open-sourcing LLMs is a necessity for the fair and equitable democratization of AI. The models of sizes 150m, 410m, and 1b are available to download and use here:
3
21
137
Releasing Hermes-LLongMA-2 8k, a series of Llama-2 models, trained at 8k context length using linear positional interpolation scaling. The models were trained in collaboration with @Teknium1 and @theemozilla of @NousResearch, and @kaiokendev1.
3
32
132
Introducing LLongMA, a series of OpenLLaMA models trained at 8k context length using linear positional interpolation scaling. The models were trained in collaboration with @theemozilla of @NousResearch and Kaiokendev.
3
26
124
@TeraflopAI is excited to help support @caselawaccess and @HarvardLIL in the release of over 6.6 million state and federal court decisions published throughout U.S. history.
3
36
93
Introducing LLongMA 13b, an OpenLLaMA model trained at 8k context length using linear positional interpolation scaling. The model was trained in collaboration with @theemozilla of @NousResearch and @kaiokendev1.
1
17
80
Towards clean and open-source text data. A deduplicated version of wikitext-103-v1 is available on @Huggingface datasets. The dataset was deduplicated with MinHash LSH and a Jaccard similarity of 0.80. #machinelearning #deeplearning #datascience.
2
9
59
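A minimal sketch of MinHash LSH deduplication at a 0.80 Jaccard threshold using the datasketch library; the shingling choice (word 5-grams) and permutation count are assumptions rather than the exact recipe used for the released dataset.

```python
from datasketch import MinHash, MinHashLSH

def minhash(text: str, num_perm: int = 128, ngram: int = 5) -> MinHash:
    tokens = text.lower().split()
    m = MinHash(num_perm=num_perm)
    for i in range(max(len(tokens) - ngram + 1, 1)):
        m.update(" ".join(tokens[i:i + ngram]).encode("utf-8"))
    return m

docs = {
    "a": "the cat sat on the mat and looked out of the window",
    "b": "the cat sat on the mat and looked out of the window today",
    "c": "an entirely different document about rotary embeddings",
}

lsh = MinHashLSH(threshold=0.80, num_perm=128)
kept = []
for key, text in docs.items():
    m = minhash(text)
    if not lsh.query(m):          # no near-duplicate already kept at Jaccard >= 0.80
        lsh.insert(key, m)
        kept.append(key)
print(kept)                       # typically ["a", "c"]
```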
We are releasing all of the code, open-source, to fully reproduce the results of the paper. The repository containing u/bloc97 and @theemozilla’s implementation of YaRN rotary embeddings can be found here:
2
6
54
Happy to be a core contributor to @ShayneRedford's Data Provenance Initiative. It is now more important than ever to verify the commercial licensing of available datasets in order to help ensure the integrity of the open-source community.
📢Announcing the🌟Data Provenance Initiative🌟. 🧭A rigorous public audit of 1800+ instruct/align datasets. 🔍Explore/filter sources, creators & license conditions. ⚠️We see a rising divide between commercially open v closed licensed data. 🌐: 1/
0
15
52
Happy to have played a part in the release of Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers. This SOTA work was done by @RiversHaveWings, @Birchlabs, @StefanABaumann, @iScienceLuvr, and @DanielZKaplan.
3
10
52
We are releasing trillions of high-quality, copyright-free, permissively licensed tokens and multimodal data. Be sure to follow our releases @TeraflopAI.
1
9
44
It is important to democratize fair access to data for the public, the legal community, and researchers. You can find a processed and cleaned version of the data available on @huggingface here:
2
8
38
We worked directly with @kaiokendev1 to extend the context length of the Llama-2 7b model through fine-tuning. The model passes all our evaluations and maintains the same perplexity at 8k extrapolation, surpassing the performance of other recent methodologies.
1
1
39
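A rough sketch of the kind of long-context perplexity check described above: score a long document at the extended 8k window and compare against the base model. The repo id is assumed, and this is not the exact evaluation harness that was used.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "conceptofmind/LLongMA-2-7b"        # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

text = open("long_document.txt").read()        # any document longer than 8k tokens
input_ids = tokenizer(text, return_tensors="pt").input_ids[:, :8192].to(model.device)

with torch.no_grad():
    loss = model(input_ids, labels=input_ids).loss   # mean next-token NLL over the window
print("perplexity@8k:", torch.exp(loss).item())
```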
Happy to announce our paper, Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers, has been accepted to #ICML2024. A huge congratulations to @RiversHaveWings, @StefanABaumann, and @Birchlabs. @icmlconf #ICML2024.
Happy to have played a part in the release of Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers. This SOTA work was done by @RiversHaveWings, @Birchlabs, @StefanABaumann, @iScienceLuvr, and @DanielZKaplan.
2
7
36
I would also recommend checking out the phenomenal research by @OfirPress on ALiBi which laid the foundation for many of these scaling techniques:
1
3
29
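For context, a minimal sketch of the ALiBi idea: no positional embeddings, just a per-head linear bias on attention scores proportional to the query-key distance. The slope formula below assumes a power-of-two head count, as in the paper.

```python
import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    """(num_heads, seq_len, seq_len) additive attention bias; assumes power-of-two head count."""
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])
    pos = torch.arange(seq_len)
    distance = (pos[:, None] - pos[None, :]).clamp(min=0)   # i - j for keys at or before the query
    return -slopes[:, None, None] * distance

bias = alibi_bias(num_heads=8, seq_len=16)
# scores = q @ k.transpose(-2, -1) / head_dim**0.5 + bias   (then causal mask and softmax)
print(bias.shape)   # torch.Size([8, 16, 16])
```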
YaRN: Efficient Context Window Extension of Large Language Models was accepted to ICLR 2024. @bloc97_ @theemozilla @Void13950782. @iclr_conf #ICLR2024.
Releasing Yarn-Llama-2-13b-128k, a Llama-2 model, trained for 128k context length using YaRN scaling. The model was trained in collaboration with u/bloc97 and @theemozilla of @NousResearch and @Void13950782 of @AiEleuther.
2
2
24
The data used during fine-tuning was extensively decontaminated and cleaned of any potential benchmarks it was evaluated against by @dmayhem93.
Odds of everyone starting to train on benchmarks? 🤔. Llama2 only briefly mentions this in Appendix A.6, but only published numbers they deemed "significant" (vs Table C.1 in the GPT-3 paper which shows actual contamination metrics across all benchmarks).
2
3
24
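The exact decontamination pipeline is not spelled out in this thread; a common approach it likely resembles is n-gram overlap filtering against the evaluation sets, sketched below with an assumed 13-gram window.

```python
def ngrams(text: str, n: int = 13) -> set[tuple[str, ...]]:
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def decontaminate(train_texts: list[str], benchmark_texts: list[str], n: int = 13) -> list[str]:
    # Drop any training example that shares an n-gram with any benchmark example.
    contaminated: set[tuple[str, ...]] = set()
    for b in benchmark_texts:
        contaminated |= ngrams(b, n)
    return [t for t in train_texts if not (ngrams(t, n) & contaminated)]
```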
I have been working on an open-source replication of WebGPT using @LangChainAI. LangChain by @hwchase17 is by far the best library for building comprehensive language applications.
🔎 More detailed search results. @EnricoShippole added a method to the search classes to return more detailed search info: title, snippet, link. 👀 WebGPT?. Google Search: Bing Search:
3
0
22
@OfirPress Reddit data is extremely low quality and should be filtered from almost all pre-training, so this won't make any difference regardless.
6
1
18
You can find the weights on @huggingface if you prefer to download the @PyTorch .pt files from there instead:
2
1
18
@wightmanr Currently working on this in collab with Lucid and a few members from Carper/EAI. @ShayneRedford has been helping me to open-source the FLAN dataset so we can instruction fine-tune models from the Pythia suite, as well as train a flan-PaLM model.
1
3
17
The models used @tri_dao's flash attention 2 and part of @togethercompute's codebase. You can find out more about Flash Attention 2 here:
3
2
17
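A minimal sketch of calling FlashAttention-2 directly from the flash-attn package (requires a CUDA GPU and fp16/bf16 tensors); shapes and sizes here are illustrative.

```python
import torch
from flash_attn import flash_attn_func   # pip install flash-attn (CUDA only)

batch, seq_len, num_heads, head_dim = 1, 8192, 32, 128
q = torch.randn(batch, seq_len, num_heads, head_dim, dtype=torch.float16, device="cuda")
k = torch.randn(batch, seq_len, num_heads, head_dim, dtype=torch.float16, device="cuda")
v = torch.randn(batch, seq_len, num_heads, head_dim, dtype=torch.float16, device="cuda")

out = flash_attn_func(q, k, v, causal=True)   # fused, memory-efficient exact attention
print(out.shape)                              # torch.Size([1, 8192, 32, 128])
```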
@Teknium1 @suchenzang Pretty big difference in price for dedicated vs. spot. Also depends on which A100s. You should use SkyPilot; it is what I am integrating into my trainer:
0
1
15
Our work on Toolformer, PaLM, and related projects is all thanks to the generous sponsorship by @carperai and @StabilityAI. A big thank you to @dmayhem93, @jonbtow, Aman, and @zach_nussbaum as well for providing input on the @huggingface library.
1
0
16
All of the C4 data has been pre-tokenized with the GPT-NeoX tokenizer and blocked at sequence lengths of 8192. This will save you the large cost of preprocessing the data. The datasets are available on @huggingface. An example chunk can be found here:
2
1
15
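A hedged sketch of streaming one of these pre-tokenized chunks; the repo id and column name below are placeholders, so check the dataset card for the exact identifiers.

```python
import torch
from datasets import load_dataset

# Hypothetical repo id and column name; verify against the dataset card.
ds = load_dataset("conceptofmind/c4-pretokenized-8k-chunk-0", split="train", streaming=True)

example = next(iter(ds))
block = torch.tensor(example["input_ids"])    # one pre-tokenized, fixed-length block
print(block.shape)                            # expected: torch.Size([8192])
```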
Data is what makes the model. We at @TeraflopAI are working hard to provide the open-source community with permissively licensed, commercially usable datasets for training. Congrats to @arankomatsuzaki, @lintangsutawika, and @colinraffel. And thanks to @ShayneRedford for his work on FLAN.
0
1
14
Glad to see StableLM-2-12B by @jonbtow, @dmayhem93, and @StabilityAI using our permissively licensed data to push the cutting edge of language modeling. Data quality is more important than ever. @arankomatsuzaki and I are working to solve this challenge at scale at @TeraflopAI.
Has anyone tried this yet? They seem to have perfected training small models (1.6B and 3B). If they were able to keep that while scaling up, this should be amazing.
0
2
14
Additionally, you can find a Hermes-Falcon-7b-4k model fine-tuned at a context length of 4k on @huggingface here:
2
1
11
Further instruction-tuning will be done on the new FLAN datasets we have released. A big thank you to @ShayneRedford for helping!
1
0
13
The @NousResearch Hermes dataset consists of over 300,000 instruction data points. Thank you to @karan4d and @Teknium1 for providing the data to train these models.
1
1
10
An open-source implementation of the ScalableViT: Rethinking the Context-oriented Generalization of Vision Transformer research paper in Google's #JAX and #Flax. @paperswithcode. #machinelearning #python #code #programming #tech #deeplearning #ai.
1
3
9
A big thank you to @joespeez @Meta for mentioning our previous research, YaRN, at the @weights_biases Fully Connected conference. We have some exciting long-context releases coming up soon.
2
0
12
The repository containing @theemozilla’s implementation of scaled rotary embeddings can be found here:
1
0
11
A distributed training script is provided so that you may train or fine-tune your own PaLM models using @huggingface accelerate. More information and experiments about the training will be detailed in the repository:
1
1
11
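A minimal sketch of the shape such an Accelerate training loop takes (launched with `accelerate launch train.py`); the toy model and random data below are placeholders, not the released script.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator(gradient_accumulation_steps=8)

# Toy stand-ins so the sketch runs end to end; swap in the real PaLM model
# and the pre-tokenized dataset in practice.
model = nn.Sequential(nn.Embedding(32000, 256), nn.Flatten(), nn.Linear(256 * 128, 32000))
dataset = TensorDataset(torch.randint(0, 32000, (64, 128)), torch.randint(0, 32000, (64,)))
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
loader = DataLoader(dataset, batch_size=4, shuffle=True)

model, optimizer, loader = accelerator.prepare(model, optimizer, loader)
loss_fn = nn.CrossEntropyLoss()

model.train()
for inputs, targets in loader:
    with accelerator.accumulate(model):      # gradient accumulation across ranks
        logits = model(inputs)
        loss = loss_fn(logits, targets)
        accelerator.backward(loss)           # replaces loss.backward() under DDP/mixed precision
        optimizer.step()
        optimizer.zero_grad()
```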
Update 6: Added The Pile by #EleutherAI as the default dataset for an #opensource pre-training implementation of the #LaMDA research paper and #ai in #PyTorch with @huggingface streaming datasets. #MachineLearning #python #code #programming #tech.
0
2
8
@Yampeleg This is a common practice that has been used for quite a few years. You can find an example of packing the text and appending an EOS/EOT token with Huggingface datasets and tokenizers here:
0
1
10
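A minimal sketch of that packing recipe: tokenize each document, append the EOS token, concatenate everything, and split into fixed-length blocks so no padding is wasted. The dataset, tokenizer, and block size below are illustrative choices.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

block_size = 2048                                             # illustrative choice
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
ds = load_dataset("wikitext", "wikitext-103-v1", split="train")

def tokenize_and_append_eos(batch):
    ids = tokenizer(batch["text"])["input_ids"]
    return {"input_ids": [x + [tokenizer.eos_token_id] for x in ids]}

def group_into_blocks(batch):
    # Concatenate every document and cut into fixed-length, padding-free blocks.
    stream = [tok for ids in batch["input_ids"] for tok in ids]
    total = (len(stream) // block_size) * block_size
    return {"input_ids": [stream[i:i + block_size] for i in range(0, total, block_size)]}

tokenized = ds.map(tokenize_and_append_eos, batched=True, remove_columns=ds.column_names)
packed = tokenized.map(group_into_blocks, batched=True, remove_columns=tokenized.column_names)
print(len(packed[0]["input_ids"]))   # 2048
```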
An open-source implementation of the Better plain ViT baselines for ImageNet-1k research paper in Google's #JAX and #Flax. @paperswithcode. #MachineLearning #python #code #programming #tech #deeplearning #ai.
1
5
5
If you would like to preprocess your own dataset for training there is a dataset builder script provided. This uses @huggingface datasets to efficiently map, tokenize, and block the data:
1
0
9
We worked directly with @kaiokendev1 to extend the context length of the Llama-2 13b and 7b models through fine-tuning. The models pass all our evaluations and maintain perplexity at 16k extrapolation, surpassing the performance of other recent methodologies.
2
0
10
The dialog data is available on @huggingface to download. It was processed at an extended context length of 8192. It contains relevant metadata such as Inputs, Targets, Task Source, and Task Name.
1
0
10
I have more copyright-free and commercially viable data than I could possibly know what to do with. We are always actively looking for organizations to partner with to train on this data and serve it to the community.
🚨More AI copyright lawsuits!🚨. 1. Artists sue Google for Imagen (. 2. More newspapers sue MSFT/OpenAI (. The newspaper litigation has far more compelling examples and arguments than prior cases. One to watch.
0
1
9
The repository containing @theemozilla's implementation of scaled rotary embeddings can be found here:
1
1
8
If you would like to learn more about scaling rotary embeddings, I would strongly recommend reading @kaiokendev1's blog posts on his findings:
1
1
8
An additional FLAN Dialog submix dataset, preprocessed for causal language modeling with various encoding issues fixed, is also available on @huggingface to download.
1
0
9
The @NousResearch Hermes dataset consists of over 300,000 instruction data points. Thank you to @karan4d and @Teknium1 for providing the data to train these models.
2
1
9
I have had the pleasure of working with @hwchase17 to expand the @LangChainAI ecosystem by adding support for numerous different #opensource models, such as those by @AiEleuther, and providers. It is a necessary step in ensuring the democratization of artificial intelligence.
We need more options for integrating open source models (like those from @AiEleuther) into @LangChainAI . 🏆Thanks to @EnricoShippole we have exactly that. 🚀First class support for @gooseai_NLP @cerebriumai @ForefrontAI and Petals . 📃Docs:
0
2
9
Disseminating artificial intelligence through clean, open-source user interfaces and experiences bridges an absolutely necessary gap between the research community and app developers. We must start furthering collaboration with front-end communities.
2
0
8
Additionally, you can find a Hermes-Open-Llama-7b-4k model fine-tuned at a context length of 4k on @huggingface here:
1
0
9
Adding support for numerous different #opensource models and providers to @LangChainAI is an imperative step in establishing an ecosystem that is mutually beneficial to all. The work done by @hwchase17 will help lead to both fair and equal distribution of artificial intelligence.
🦜🔗 v0.0.86. 📂Lots more open source model integrations! @EnricoShippole .🪵PromptLayer (@imjaredz) and Helicone (@justinstorre) integrations. And lots of other docs and bug fixes!. 🧵.
1
1
9
A PR to add scaled rotary embeddings to @huggingface transformers has been added by @joao_gante and merged:
1
1
7
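With that PR merged, loading a Llama checkpoint with linearly interpolated rotary embeddings looks roughly like the sketch below; the rope_scaling format is from transformers >= 4.31 as I recall, so verify against the current docs.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    rope_scaling={"type": "linear", "factor": 2.0},   # 4k trained context -> 8k positions
    torch_dtype=torch.float16,
    device_map="auto",
)
```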
We worked directly with @kaiokendev1 to extend the context length of the Llama-2 13b model through fine-tuning. The model passes all our evaluations and maintains the same perplexity at 8k extrapolation, surpassing the performance of other recent methodologies.
1
0
8
I would also recommend checking out the phenomenal research by @OfirPress on ALiBi which laid the foundation for many of these scaling techniques:
1
0
8
@jeremyphoward Something that is quite often not discussed is sequence parallelism as well. It is supported in Nvidia Apex and enables 4D parallelism. I am using it in my larger 8k-context transformers:
2
1
8
@kamyrov You can also do this completely free with Stable Diffusion (what Lensa uses) on Google Colab. Here is the link to the open-source notebook:
0
1
8
@andersonbcdefg @typedfemale I would recommend a general understanding of CUDA and GPU programming. Start with something like the Oak Ridge National Laboratory CUDA training series:
0
2
8