conceptofmind

@concept_of_mind

Followers: 2K · Following: 411 · Media: 71 · Statuses: 1K

Joined November 2019
@concept_of_mind
conceptofmind
1 year
Releasing Yarn-Llama-2-13b-128k, a Llama-2 model, trained for 128k context length using YaRN scaling. The model was trained in collaboration with u/bloc97 and @theemozilla of @NousResearch and @Void13950782 of @AiEleuther.
28
167
771
@concept_of_mind
conceptofmind
2 years
Introducing three new open-source PaLM models trained at a context length of 8k on C4. Open-sourcing LLMs is a necessity for the fair and equitable democratization of AI. The models, in sizes of 150m, 410m, and 1b parameters, are available to download and use here:
7
104
522
@concept_of_mind
conceptofmind
2 years
Releasing LLongMA-2 16k, a suite of Llama-2 models, trained at 16k context length using linear positional interpolation scaling. The model was trained in collaboration with @theemozilla of @NousResearch and @kaiokendev1.
14
91
384
@concept_of_mind
conceptofmind
2 years
Releasing LLongMA-2, a suite of Llama-2 models, trained at 8k context length using linear positional interpolation scaling. The model was trained in collaboration with @theemozilla of @NousResearch and @kaiokendev1.
6
86
368
@concept_of_mind
conceptofmind
2 years
Releasing LLongMA-2 13b, a Llama-2 model, trained at 8k context length using linear positional interpolation scaling. The model was trained in collaboration with @theemozilla of @NousResearch and @kaiokendev1.
6
53
264
@concept_of_mind
conceptofmind
2 years
Releasing Hermes-Falcon-7b-8k, a Falcon model fine-tuned at 8k context length on the @NousResearch Hermes instruction dataset.
16
38
222
@concept_of_mind
conceptofmind
2 years
Releasing Flan-Open-Llama-7b, an OpenLLaMA model fine-tuned on the FLAN instruction dataset.
4
38
204
@concept_of_mind
conceptofmind
2 years
With Reddit and many other sites shutting down access to their APIs, it is now more important than ever to release quality open-source conversational data. I worked with @ShayneRedford to generate ~80GB of labeled FLAN dialog data.
2
40
205
@concept_of_mind
conceptofmind
10 months
We publicly released a cleaned open-source version of the case law data. You can train your own similar legal models with this dataset. We plan to release numerous other legal datasets consisting of billions of tokens in the upcoming weeks.
@ClementDelangue
clem 🤗
10 months
Even OAI is telling you that specialized models are better!
21
27
189
@concept_of_mind
conceptofmind
2 years
Releasing Hermes-Open-Llama-7b-8k, an OpenLLaMA model fine-tuned at 8k context length on the @NousResearch Hermes instruction dataset.
4
40
177
@concept_of_mind
conceptofmind
2 years
Introducing an open-source reproduction of the FLAN V2 dataset.
3
32
170
@concept_of_mind
conceptofmind
2 years
Releasing Flan-Open-Llama-13b, an OpenLLaMA model fine-tuned on the FLAN instruction dataset.
3
29
147
@concept_of_mind
conceptofmind
2 years
Releasing a new PaLM 2.1b model trained at a context length of 8k on C4. This model release is a continuation of the previously released 150m, 410m, and 1b models.
@concept_of_mind
conceptofmind
2 years
Introducing three new open-source PaLM models trained at a context length of 8k on C4. Open-sourcing LLMs is a necessity for the fair and equitable democratization of AI. The models, in sizes of 150m, 410m, and 1b parameters, are available to download and use here:
3
21
137
@concept_of_mind
conceptofmind
1 year
The model can be found on @huggingface here:
4
21
135
@concept_of_mind
conceptofmind
2 years
Releasing Hermes-LLongMA-2 8k, a series of Llama-2 models, trained at 8k context length using linear positional interpolation scaling. The models were trained in collaboration with @Teknium1 and @theemozilla of @NousResearch, and @kaiokendev1.
3
32
132
@concept_of_mind
conceptofmind
2 years
Introducing LLongMA, a series of OpenLLaMA models, trained at 8k context length using linear positional interpolation scaling. The model was trained in collaboration with @theemozilla of @NousResearch and Kaiokendev.
3
26
124
@concept_of_mind
conceptofmind
2 years
Releasing Tasksource-Open-Llama-13b, an OpenLLaMA model fine-tuned on the Tasksource instruction dataset.
2
23
101
@concept_of_mind
conceptofmind
11 months
@TeraflopAI is excited to help support @caselawaccess and @HarvardLIL in the release of over 6.6 million state and federal court decisions published throughout U.S. history.
3
36
93
@concept_of_mind
conceptofmind
2 years
Introducing LLongMA 13b, an OpenLLaMA model trained at 8k context length using linear positional interpolation scaling. The model was trained in collaboration with @theemozilla of @NousResearch and @kaiokendev1.
1
17
80
@concept_of_mind
conceptofmind
2 years
Releasing Flan-Open-Llama-3b, an OpenLLaMA model fine-tuned on the FLAN instruction dataset.
1
18
69
@concept_of_mind
conceptofmind
2 years
Towards clean and open-source text data. A deduplicated version of wikitext-103-v1 is available on @Huggingface datasets. The dataset was deduplicated with MinHash LSH at a Jaccard similarity of 0.80. #machinelearning #deeplearning #datascience
2
9
59
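For reference, a minimal sketch of the MinHash-LSH deduplication described above, at a 0.80 Jaccard threshold. This is not the exact pipeline used for the release; the datasketch library, word-level shingling, and num_perm=128 are assumptions.

```python
# Minimal MinHash-LSH dedup sketch (not the exact pipeline used for wikitext-103-v1).
from datasketch import MinHash, MinHashLSH

def minhash(text, num_perm=128):
    m = MinHash(num_perm=num_perm)
    for token in set(text.split()):          # word-level shingles; the scheme is an assumption
        m.update(token.encode("utf-8"))
    return m

def deduplicate(docs, threshold=0.80, num_perm=128):
    lsh = MinHashLSH(threshold=threshold, num_perm=num_perm)
    kept = []
    for i, doc in enumerate(docs):
        m = minhash(doc, num_perm)
        if lsh.query(m):                     # a near-duplicate is already indexed
            continue
        lsh.insert(f"doc-{i}", m)
        kept.append(doc)
    return kept
```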
@concept_of_mind
conceptofmind
1 year
We are releasing all of the code, open-source, to fully reproduce the results of the paper. The repository containing u/bloc97 and @theemozilla’s implementation of YaRN rotary embeddings can be found here:
2
6
54
@concept_of_mind
conceptofmind
1 year
Happy to be a core contributor to @ShayneRedford's Data Provenance Initiative. It is now more important than ever to verify the commercial licensing of available datasets in order to help ensure the integrity of the open-source community.
@ShayneRedford
Shayne Longpre
1 year
📢Announcing the🌟Data Provenance Initiative🌟. 🧭A rigorous public audit of 1800+ instruct/align datasets. 🔍Explore/filter sources, creators & license conditions. ⚠️We see a rising divide between commercially open v closed licensed data. 🌐: 1/
0
15
52
@concept_of_mind
conceptofmind
1 year
Happy to have played a part in the release of Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers. This SOTA work was done by @RiversHaveWings, @Birchlabs, @StefanABaumann, @iScienceLuvr, and @DanielZKaplan.
3
10
52
@concept_of_mind
conceptofmind
1 year
We worked to extend the context length of the Llama-2 13b and 7b models through fine-tuning. The models pass all of our evaluations and maintain the same perplexity at 128k extrapolation, surpassing the performance of our other recent methodology, NTK-part scaling.
2
12
45
@concept_of_mind
conceptofmind
10 months
We are releasing trillions of high-quality, copyright-free, permissively licensed tokens and multimodal data. Be sure to follow our releases @TeraflopAI.
1
9
44
@concept_of_mind
conceptofmind
11 months
It is important to democratize fair access to data for the public, the legal community, and researchers. You can find a processed and cleaned version of the data available on @huggingface here:
2
8
38
@concept_of_mind
conceptofmind
2 years
We worked directly with @kaiokendev1 to extend the context length of the Llama-2 7b model through fine-tuning. The model passes all of our evaluations and maintains the same perplexity at 8k extrapolation, surpassing the performance of other recent methodologies.
1
1
39
@concept_of_mind
conceptofmind
2 years
We worked directly with Kaiokendev to extend the context length of the OpenLLaMA 7b and 3b models through fine-tuning. The fine-tuned models maintain the same perplexity at 8k extrapolation, surpassing the performance of other recent methodologies.
1
6
37
@concept_of_mind
conceptofmind
9 months
Happy to announce our paper, Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers, has been accepted to #ICML2024. A huge congratulations to @RiversHaveWings, @StefanABaumann, and @Birchlabs. @icmlconf #ICML.
@concept_of_mind
conceptofmind
1 year
Happy to have played a part in the release of Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers. This SOTA work was done by @RiversHaveWings, @Birchlabs, @StefanABaumann, @iScienceLuvr, and @DanielZKaplan.
2
7
36
@concept_of_mind
conceptofmind
1 year
A Yarn-Llama-2-7b model trained for 128k context length is available on @huggingface here:
1
6
33
@concept_of_mind
conceptofmind
1 year
I would also recommend checking out the phenomenal research by @OfirPress on ALiBi which laid the foundation for many of these scaling techniques:
1
3
29
@concept_of_mind
conceptofmind
2 years
The model can be found on @huggingface here:
1
3
25
@concept_of_mind
conceptofmind
2 years
@zhangir_azerbay Oak Ridge National Laboratory has a CUDA training series from 2021:
0
1
24
@concept_of_mind
conceptofmind
1 year
YaRN: Efficient Context Window Extension of Large Language Models was accepted to ICLR 2024. @bloc97_ @theemozilla @Void13950782. @iclr_conf #ICLR2024.
@concept_of_mind
conceptofmind
1 year
Releasing Yarn-Llama-2-13b-128k, a Llama-2 model, trained for 128k context length using YaRN scaling. The model was trained in collaboration with u/bloc97 and @theemozilla of @NousResearch and @Void13950782 of @AiEleuther.
2
2
24
@concept_of_mind
conceptofmind
2 years
The data used during fine-tuning was extensively decontaminated and cleaned of any potential benchmarks it was evaluated against by @dmayhem93.
@suchenzang
Susan Zhang
2 years
Odds of everyone starting to train on benchmarks? 🤔. Llama2 only briefly mentions this in Appendix A.6, but only published numbers they deemed "significant" (vs Table C.1 in the GPT-3 paper which shows actual contamination metrics across all benchmarks).
2
3
24
@concept_of_mind
conceptofmind
2 years
The models are also compatible with many of lucidrains' popular repositories such as Toolformer-pytorch, PaLM-rlhf-pytorch, and PaLM-pytorch. Please be sure to sponsor and help support Phil's great open-source work:
1
0
22
@concept_of_mind
conceptofmind
2 years
I have been working on an open-source replication of WebGPT using @LangChainAI. LangChain by @hwchase17 is by far the best library for building comprehensive language applications.
@LangChainAI
LangChain
2 years
🔎 More detailed search results. @EnricoShippole added a method to the search classes to return more detailed search info: title, snippet, link. 👀 WebGPT? Google Search: Bing Search:
3
0
22
@concept_of_mind
conceptofmind
2 years
@OfirPress Reddit data is extremely low quality and should be filtered from almost all pre-training so this won't make any difference regardless.
6
1
18
@concept_of_mind
conceptofmind
2 years
You can find the weights on @huggingface if you prefer to download the @PyTorch .pt files from there instead:
2
1
18
@concept_of_mind
conceptofmind
2 years
The models were trained with Flash Attention, xPos rotary embeddings for better length extrapolation, and multi-query single-key-value attention for more efficient decoding.
1
0
17
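For illustration, a minimal PyTorch sketch of multi-query attention, where all query heads share a single key/value head to shrink the decode-time KV cache. This is a generic sketch, not the training code used for these models.

```python
# Generic multi-query attention sketch: many query heads, one shared key/value head.
import torch
import torch.nn.functional as F
from torch import nn

class MultiQueryAttention(nn.Module):
    def __init__(self, dim, n_heads):
        super().__init__()
        assert dim % n_heads == 0
        self.n_heads = n_heads
        self.head_dim = dim // n_heads
        self.to_q = nn.Linear(dim, dim, bias=False)
        # single key/value head -> a much smaller KV cache at decode time
        self.to_kv = nn.Linear(dim, 2 * self.head_dim, bias=False)
        self.to_out = nn.Linear(dim, dim, bias=False)

    def forward(self, x):
        b, n, _ = x.shape
        q = self.to_q(x).view(b, n, self.n_heads, self.head_dim).transpose(1, 2)
        k, v = self.to_kv(x).chunk(2, dim=-1)                      # (b, n, head_dim)
        k = k.unsqueeze(1).expand(b, self.n_heads, n, self.head_dim)  # share across heads
        v = v.unsqueeze(1).expand(b, self.n_heads, n, self.head_dim)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.to_out(out.transpose(1, 2).reshape(b, n, -1))
```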
@concept_of_mind
conceptofmind
1 year
It is also worth reviewing the paper, A Length-Extrapolatable Transformer, and xPos technique which also applies scaling to rotary embeddings:
1
4
16
@concept_of_mind
conceptofmind
1 year
The models have similar performance to the base LLaMA 2 models on the Open LLM benchmarks while scaling context length directly to 128k.
1
2
16
@concept_of_mind
conceptofmind
2 years
@wightmanr Currently working on this in collab with Lucid and a few members from Carper/EAI. @ShayneRedford has been helping me to open-source the FLAN dataset so we can instruct fine-tune models from the Pythia suite. As well as things such as training a flan-PaLM model.
1
3
17
@concept_of_mind
conceptofmind
1 year
The models used @tri_dao's Flash Attention 2 and part of @togethercompute's codebase. You can find out more about Flash Attention 2 here:
3
2
17
@concept_of_mind
conceptofmind
2 years
A Llama-2 7b model trained at 16k context length will release soon on @huggingface here:
1
4
17
@concept_of_mind
conceptofmind
2 years
A Llama-2 13b model trained at 8k will release soon on @huggingface here:
1
1
13
@concept_of_mind
conceptofmind
1 year
We also trained a set of models at 64k context length. You can find the Yarn-Llama-2-13b-64k model here:
1
2
16
@concept_of_mind
conceptofmind
2 years
A LLongMA-13b model trained at 8k context length will be released soon, as well as a suite of LLongMA models trained at 16k and 32k context lengths.
1
2
16
@concept_of_mind
conceptofmind
1 year
@Teknium1 @suchenzang Pretty big difference in price for dedicated vs. spot. Also depends on which A100s. You should use SkyPilot; it is what I am integrating into my trainer:
0
1
15
@concept_of_mind
conceptofmind
2 years
Our work on Toolformer, PaLM, and related projects is all thanks to the generous sponsorship by @carperai and @StabilityAI. A big thank you to @dmayhem93, @jonbtow, Aman, and @zach_nussbaum as well for providing input on the @huggingface library.
1
0
16
@concept_of_mind
conceptofmind
2 years
All of the C4 data has been pre-tokenized with the GPT-NeoX tokenizer and blocked at sequence lengths of 8192. This saves you the large cost of preprocessing the data yourself. The datasets are available on @huggingface. An example chunk can be found here:
2
1
15
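A hedged sketch of that kind of preprocessing with Hugging Face datasets and the GPT-NeoX tokenizer. The streaming setup and column names follow the public allenai/c4 config; this is not the released preprocessing script.

```python
# Sketch of pre-tokenizing C4 and blocking it into fixed 8192-token sequences.
from datasets import load_dataset
from transformers import AutoTokenizer

BLOCK_SIZE = 8192
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

def tokenize(batch):
    return tokenizer(batch["text"])

def block(batch):
    # Concatenate all token ids, then split into BLOCK_SIZE chunks, dropping the remainder.
    ids = [tok for seq in batch["input_ids"] for tok in seq]
    total = (len(ids) // BLOCK_SIZE) * BLOCK_SIZE
    return {"input_ids": [ids[i : i + BLOCK_SIZE] for i in range(0, total, BLOCK_SIZE)]}

dataset = load_dataset("allenai/c4", "en", split="train", streaming=True)
dataset = dataset.map(tokenize, batched=True, remove_columns=["text", "timestamp", "url"])
dataset = dataset.map(block, batched=True, remove_columns=["attention_mask"])
```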
@concept_of_mind
conceptofmind
10 months
Not only that. We additionally built a search index over all of the data for RAG applications:
1
1
14
@concept_of_mind
conceptofmind
2 years
@aicrumb Worth checking out as well:
1
3
15
@concept_of_mind
conceptofmind
10 months
Data is what makes the model. We at @TeraflopAI are working hard to provide the open-source community with permissively licensed, commercially usable datasets for training. Congrats to @arankomatsuzaki, @lintangsutawika, and @colinraffel. And thanks to @ShayneRedford for his work on FLAN.
0
1
14
@concept_of_mind
conceptofmind
10 months
Glad to see StableLM-2-12B by @jonbtow, @dmayhem93, and @StabilityAI using our permissively licensed data to push the cutting edge of language modeling. Data quality is more important than ever. @arankomatsuzaki and I are working to solve this challenge at scale at @TeraflopAI.
@Euclaise_
Jade
10 months
Has anyone tried this yet? They seem to have perfected training small models (1.6B and 3B). If they were able to keep that while scaling up, this should be amazing.
0
2
14
@concept_of_mind
conceptofmind
2 years
Additionally, you can find a Hermes-Falcon-7b-4k model fine-tuned at a context length of 4k on @huggingface here:
2
1
11
@concept_of_mind
conceptofmind
1 year
As well as the Yarn-Llama-2-7b-64k model here:
1
2
13
@concept_of_mind
conceptofmind
2 years
Further instruction-tuning will be done on the new FLAN datasets we have released. A big thank you to @ShayneRedford for helping!
1
0
13
@concept_of_mind
conceptofmind
2 years
The @NousResearch Hermes dataset consists of over 300,000 instruction data points. Thank you to @karan4d and @Teknium1 for providing the data to train these models.
1
1
10
@concept_of_mind
conceptofmind
2 years
The model has similar performance to LLaMA 2 under 4k context length, its performance scales directly to 8k, and it works out-of-the-box with the new version of transformers (4.31) or with `trust_remote_code` for <= 4.30.
1
1
12
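A minimal loading sketch under those assumptions: the repo id below is a placeholder for the one on the actual model card, and the version check decides whether the remote code bundled with the checkpoint is needed, per the compatibility note in the tweet above.

```python
# Loading sketch; the repo id is a placeholder, check the model card for the real one.
import torch
import transformers
from packaging import version
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "conceptofmind/LLongMA-2-7b"  # placeholder repo id
# transformers >= 4.31 understands rope_scaling natively; older versions fall back to
# the scaled-rotary-embedding code shipped with the checkpoint.
needs_remote_code = version.parse(transformers.__version__) < version.parse("4.31.0")

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",               # requires accelerate
    trust_remote_code=needs_remote_code,
)
```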
@concept_of_mind
conceptofmind
3 years
An open-source implementation of the ScalableViT: Rethinking the Context-oriented Generalization of Vision Transformer research paper in Google's #JAX and #Flax. @paperswithcode. #machinelearning #python #code #programming #tech #deeplearning #ai.
1
3
9
@concept_of_mind
conceptofmind
2 years
Applying the method to the rotary position embedding requires only slight changes to the model's code by dividing the positional index, t, by a scaling factor.
1
1
11
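Concretely, a minimal sketch of that change on a Llama-style rotary embedding cache (generic code, not the exact patch): the position index t is divided by the scale factor, e.g. 2.0 when stretching a 4k-trained model to 8k.

```python
# Linear positional interpolation for rotary embeddings (Llama-style cos/sin cache).
import torch

def rope_cache(seq_len, dim, base=10000.0, scale=1.0, device="cpu"):
    """Return cos/sin tables; scale > 1 interpolates positions for longer contexts."""
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, device=device).float() / dim))
    t = torch.arange(seq_len, device=device).float() / scale  # the only change: divide t
    freqs = torch.outer(t, inv_freq)                          # (seq_len, dim / 2)
    emb = torch.cat((freqs, freqs), dim=-1)
    return emb.cos(), emb.sin()

# A 4k-trained model stretched to an 8k context: scale = 8192 / 4096 = 2.0
cos, sin = rope_cache(seq_len=8192, dim=128, scale=2.0)
```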
@concept_of_mind
conceptofmind
1 year
You can find out more about the @NousResearch organization here:
0
2
12
@concept_of_mind
conceptofmind
9 months
A big thank you to @joespeez @Meta for mentioning our previous research, YaRN, at the @weights_biases Fully Connected conference. We have some exciting long-context releases coming up soon.
2
0
12
@concept_of_mind
conceptofmind
2 years
Different inference optimizations such as Flash Attention, Hidet, and torch.compile are used. You can read more about the Hidet compiler and project here:
1
0
12
@concept_of_mind
conceptofmind
2 years
This is not an official Google or StabilityAI product. If you have any questions about the models or training be sure to reach out and ask! I will try to respond promptly.
2
0
12
@concept_of_mind
conceptofmind
2 years
The repository containing @theemozilla’s implementation of scaled rotary embeddings can be found here:
1
0
11
@concept_of_mind
conceptofmind
2 years
A distributed training script is provided so that you may train or fine-tune your own PaLM models using @huggingface accelerate. More information and experiments about the training will be detailed in the repository:
1
1
11
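Not the provided script, but a bare-bones skeleton of what a training loop with @huggingface accelerate looks like; the model, dataloader, and hyperparameters are placeholders.

```python
# Skeleton of a distributed training loop with Hugging Face accelerate (placeholder model/data).
import torch
from accelerate import Accelerator

def train(model, dataloader, lr=3e-4, epochs=1):
    accelerator = Accelerator()                       # handles DDP/FSDP/mixed precision via config
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

    model.train()
    for _ in range(epochs):
        for batch in dataloader:
            outputs = model(**batch)                  # assumes the model returns a .loss
            accelerator.backward(outputs.loss)        # replaces loss.backward()
            optimizer.step()
            optimizer.zero_grad()

# Launch with: accelerate launch train.py
```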
@concept_of_mind
conceptofmind
3 years
Update 6: Added The Pile by #EleutherAI as the default dataset for an #opensource pre-training implementation of the #LaMDA research paper and #ai in #PyTorch with @huggingface streaming datasets. #MachineLearning #python #code #programming #tech.
0
2
8
@concept_of_mind
conceptofmind
2 years
@EMostaque @alexgraveley @joao_gante Llama-2 8k is releasing tomorrow if all goes smoothly.
0
2
11
@concept_of_mind
conceptofmind
2 years
@Yampeleg This is a common practice that has been used for quite a few years. You can find an example of packing the text and appending an EOS/EOT token with Huggingface datasets and tokenizers here:
0
1
10
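A minimal sketch of the packing practice described above with Hugging Face datasets and tokenizers: append an EOS token to each document, concatenate, and cut into fixed-length blocks. The tokenizer, dataset, and block size are placeholders, not the linked example.

```python
# Sketch: append EOS to each document, concatenate, and cut into fixed-length blocks.
from datasets import load_dataset
from transformers import AutoTokenizer

BLOCK_SIZE = 2048                                                     # placeholder
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")  # placeholder

def pack(batch):
    ids = []
    for text in batch["text"]:
        ids.extend(tokenizer(text)["input_ids"] + [tokenizer.eos_token_id])
    usable = (len(ids) // BLOCK_SIZE) * BLOCK_SIZE                    # drop the ragged tail
    return {"input_ids": [ids[i : i + BLOCK_SIZE] for i in range(0, usable, BLOCK_SIZE)]}

dataset = load_dataset("wikitext", "wikitext-103-v1", split="train")
packed = dataset.map(pack, batched=True, remove_columns=dataset.column_names)
```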
@concept_of_mind
conceptofmind
3 years
An open-source implementation of the Better plain ViT baselines for ImageNet-1k research paper in Google's #JAX and #Flax. @paperswithcode. #MachineLearning #python #code #programming #tech #deeplearning #ai.
1
5
5
@concept_of_mind
conceptofmind
1 year
All of the models can be found on Huggingface:
1
2
10
@concept_of_mind
conceptofmind
2 years
If you would like to preprocess your own dataset for training, a dataset builder script is provided. It uses @huggingface datasets to efficiently map, tokenize, and block the data:
1
0
9
@concept_of_mind
conceptofmind
2 years
We worked directly with @kaiokendev1 to extend the context length of the Llama-2 13b and 7b models through fine-tuning. The models pass all of our evaluations and maintain perplexity at 16k extrapolation, surpassing the performance of other recent methodologies.
2
0
10
@concept_of_mind
conceptofmind
2 years
Working on an open-source version of DeepMind's Sparrow: a conversational agent utilizing Google's web search API for factual grounding and RLHF. I am going to be taking the learnings from our previous implementation of Toolformer.
1
0
10
@concept_of_mind
conceptofmind
2 years
The 7b model can be found on @huggingface here:
1
0
10
@concept_of_mind
conceptofmind
2 years
A basic inference script is provided in the repository, which you can play around with. You may want to experiment with the hyperparameters to get generations of varying quality; changing a variable such as temperature matters a lot.
1
0
10
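Not the provided inference script, but a generic example of the sampling hyperparameters worth sweeping; a small stand-in checkpoint is used here rather than the released PaLM weights, which have their own loading path.

```python
# Generic sampling sketch; swap in the released checkpoint and its loading code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/pythia-410m"   # stand-in model for illustration only
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

prompt = "The key idea behind rotary position embeddings is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.8,          # lower = more deterministic, higher = more varied
    top_p=0.95,
    repetition_penalty=1.1,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```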
@concept_of_mind
conceptofmind
2 years
The LLongMA 7b model is available on @huggingface to use:
1
0
10
@concept_of_mind
conceptofmind
2 years
The dialog data is available on @huggingface to download. It was processed at an extended context length of 8192. It contains relevant metadata such as Inputs, Targets, Task Source, and Task Name.
1
0
10
@concept_of_mind
conceptofmind
9 months
I have more copyright-free and commercially viable data than I know what to do with. We are always actively looking for organizations to partner with to train on and serve this data to the community.
@PeterHndrsn
Peter Henderson
9 months
🚨More AI copyright lawsuits!🚨 1. Artists sue Google for Imagen. 2. More newspapers sue MSFT/OpenAI. The newspaper litigation has far more compelling examples and arguments than prior cases. One to watch.
0
1
9
@concept_of_mind
conceptofmind
2 years
The repository containing @theemozilla's implementation of scaled rotary embeddings can be found here:
1
1
8
@concept_of_mind
conceptofmind
2 years
If you would like to learn more about scaling rotary embeddings, I would strongly recommend reading @kaiokendev1's blog posts on his findings:
1
1
8
@concept_of_mind
conceptofmind
2 years
An additional FLAN Dialog submix dataset, with various encoding issues fixed, was also preprocessed for causal language modeling and is available on @huggingface to download.
1
0
9
@concept_of_mind
conceptofmind
2 years
The @NousResearch Hermes dataset consists of over 300,000 instruction data points. Thank you to @karan4d and @Teknium1 for providing the data to train these models.
2
1
9
@concept_of_mind
conceptofmind
2 years
"Helped" add the new @PyTorch 2.0 Flash Attention to Lucidrain's PaLM-rlhf-pytorch repository. The repository uses RLHF to build models similar to #ChatGPT and #GPT4. Be sure to support/donate to his great open-source work.
0
2
7
@concept_of_mind
conceptofmind
2 years
I have had the pleasure of working with @hwchase17 to expand the @LangChainAI ecosystem by adding support for numerous different #opensource models, such as those by @AiEleuther, and providers. It is a necessary step in ensuring the democratization of artificial intelligence.
@hwchase17
Harrison Chase
2 years
We need more options for integrating open source models (like those from @AiEleuther) into @LangChainAI. 🏆Thanks to @EnricoShippole we have exactly that. 🚀First class support for @gooseai_NLP @cerebriumai @ForefrontAI and Petals. 📃Docs:
0
2
9
@concept_of_mind
conceptofmind
1 year
Disseminating artificial intelligence through clean, open-source user interfaces and experiences bridges a gap between the research community and app developers that absolutely needs to be closed. We must start furthering collaboration with front-end communities.
@rohanpaul_ai
Rohan Paul
1 year
Ollama iOS mobile app (open source). It works with all models served with Ollama.
2
0
8
@concept_of_mind
conceptofmind
2 years
Additionally, you can find a Hermes-Open-Llama-7b-4k model fine-tuned at a context length of 4k on @huggingface here:
1
0
9
@concept_of_mind
conceptofmind
2 years
This is an earlier extension of his work to publicly release the FLAN collection:
@concept_of_mind
conceptofmind
2 years
Introducing an open-source reproduction of the FLAN V2 dataset.
1
0
9
@concept_of_mind
conceptofmind
2 years
Adding support for numerous different #opensource models and providers to @LangChainAI is an imperative step in establishing an ecosystem that is mutually beneficial to all. The work done by @hwchase17 will help lead to both fair and equal distribution of artificial intelligence.
@LangChainAI
LangChain
2 years
🦜🔗 v0.0.86. 📂Lots more open source model integrations! @EnricoShippole. 🪵PromptLayer (@imjaredz) and Helicone (@justinstorre) integrations. And lots of other docs and bug fixes! 🧵
1
1
9
@concept_of_mind
conceptofmind
2 years
A PR adding scaled rotary embeddings to @huggingface transformers has been opened by @joao_gante and merged:
1
1
7
@concept_of_mind
conceptofmind
2 years
We worked directly with @kaiokendev1 to extend the context length of the Llama-2 13b model through fine-tuning. The model passes all of our evaluations and maintains the same perplexity at 8k extrapolation, surpassing the performance of other recent methodologies.
1
0
8
@concept_of_mind
conceptofmind
2 years
@nisten @NousResearch Standard fine-tune at 8k. No landmark. No LoRA.
0
0
6
@concept_of_mind
conceptofmind
2 years
I would also recommend checking out the phenomenal research by @OfirPress on ALiBi which laid the foundation for many of these scaling techniques:
1
0
8
@concept_of_mind
conceptofmind
2 years
@jeremyphoward Something that is quite often not discussed is sequence parallelism. It is supported in NVIDIA Apex and enables 4D parallelism. I am using it in my larger 8k-context transformers:
2
1
8
@concept_of_mind
conceptofmind
2 years
@kamyrov You can do this completely free as well with Stable Diffusion (what Lensa uses) using Google Colab. Here is the link to the open-source notebook:
0
1
8
@concept_of_mind
conceptofmind
2 years
@andersonbcdefg @typedfemale I would recommend a general understanding of CUDA and GPU programming. Start with something like the Oak Ridge National Laboratory CUDA training series:
0
2
8