conceptofmind
@concept_of_mind
Followers 2K · Following 411 · Media 71 · Statuses 1K
Joined November 2019
Releasing Yarn-Llama-2-13b-128k, a Llama-2 model, trained for 128k context length using YaRN scaling. The model was trained in collaboration with u/bloc97 and @theemozilla of @NousResearch and @Void13950782 of @AiEleuther.
28
167
771
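For reference, a minimal sketch of loading this checkpoint from the Hugging Face Hub; the repo id and the trust_remote_code requirement are assumptions, so check the model card for the exact loading instructions.

```python
# Minimal sketch, assuming the repo id below; verify against the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NousResearch/Yarn-Llama-2-13b-128k"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,  # the YaRN rotary embeddings shipped as custom modeling code
)

prompt = "A summary of the document:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```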
Releasing LLongMA-2 16k, a suite of Llama-2 models trained at 16k context length using linear positional interpolation scaling. The models were trained in collaboration with @theemozilla of @NousResearch and @kaiokendev1.
14
91
384
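A minimal sketch of what linear positional interpolation means in practice: positions are compressed by a fixed scale so the extended window maps back into the range seen during pre-training. The function and variable names below are illustrative, not taken from the released training code.

```python
import torch

def rope_tables(dim: int, positions: torch.Tensor, base: float = 10000.0,
                scale: float = 1.0) -> tuple[torch.Tensor, torch.Tensor]:
    """Cos/sin tables for rotary embeddings with linearly interpolated positions."""
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    # Linear interpolation: shrink positions instead of extrapolating past the trained range.
    angles = torch.outer(positions.to(torch.float32) * scale, inv_freq)
    return angles.cos(), angles.sin()

trained_ctx, extended_ctx = 4096, 16384
scale = trained_ctx / extended_ctx           # 0.25 for a 4k -> 16k extension
positions = torch.arange(extended_ctx)
cos, sin = rope_tables(dim=128, positions=positions, scale=scale)
print(cos.shape)                             # torch.Size([16384, 64])
```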
Releasing LLongMA-2, a suite of Llama-2 models trained at 8k context length using linear positional interpolation scaling. The models were trained in collaboration with @theemozilla of @NousResearch and @kaiokendev1.
6
86
368
Releasing LLongMA-2 13b, a Llama-2 model, trained at 8k context length using linear positional interpolation scaling. The model was trained in collaboration with @theemozilla of @NousResearch and @kaiokendev1.
6
53
264
Releasing Hermes-Falcon-7b-8k, a Falcon model fine-tuned at 8k context length on the @NousResearch Hermes instruction dataset.
16
38
222
With Reddit and many other sites shutting down access to their APIs, it is now more important than ever to release quality open-source conversational data. I worked with @ShayneRedford to generate ~80GB of labeled FLAN dialog data.
2
40
205
Releasing Hermes-Open-Llama-7b-8k, an OpenLLaMA model fine-tuned at 8k context length on the @NousResearch Hermes instruction dataset.
4
40
177
Releasing a new PaLM 2.1b model trained at a context length of 8k on C4. This model release is a continuation of the previously released 150m, 410m, and 1b models.
Introducing three new open-source PaLM models trained at a context length of 8k on C4. Open-sourcing LLMs is a necessity for the fair and equitable democratization of AI. The models of sizes 150m, 410m, and 1b are available to download and use here:
3
21
137
Releasing Hermes-LLongMA-2 8k, a series of Llama-2 models, trained at 8k context length using linear positional interpolation scaling. The models were trained in collaboration with @Teknium1 and @theemozilla of @NousResearch, and @kaiokendev1.
3
32
132
Introducing LLongMA, a series of OpenLLaMA models trained at 8k context length using linear positional interpolation scaling. The models were trained in collaboration with @theemozilla of @NousResearch and Kaiokendev.
3
26
124
@TeraflopAI is excited to help support @caselawaccess and @HarvardLIL in the release of over 6.6 million state and federal court decisions published throughout U.S. history.
3
36
93
Introducing LLongMA 13b, an OpenLLaMA model trained at 8k context length using linear positional interpolation scaling. The model was trained in collaboration with @theemozilla of @NousResearch and @kaiokendev1.
1
17
80
Towards clean and open-source text data. A deduplicated version of wikitext-103-v1 is available on @Huggingface datasets. The dataset was deduplicated with MinHash LSH and a Jaccard similarity of 0.80. #machinelearning #deeplearning #datascience.
2
9
59
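A minimal sketch of MinHash LSH deduplication at a 0.80 Jaccard threshold using the datasketch library; the shingling choice (word 5-grams) and permutation count are assumptions rather than the exact recipe used for the released dataset.

```python
from datasketch import MinHash, MinHashLSH

def minhash(text: str, num_perm: int = 128, ngram: int = 5) -> MinHash:
    tokens = text.lower().split()
    m = MinHash(num_perm=num_perm)
    for i in range(max(len(tokens) - ngram + 1, 1)):
        m.update(" ".join(tokens[i:i + ngram]).encode("utf-8"))
    return m

docs = {
    "a": "the cat sat on the mat and looked out of the window",
    "b": "the cat sat on the mat and looked out of the window today",
    "c": "an entirely different document about rotary embeddings",
}

lsh = MinHashLSH(threshold=0.80, num_perm=128)
kept = []
for key, text in docs.items():
    m = minhash(text)
    if not lsh.query(m):          # no near-duplicate already kept at Jaccard >= 0.80
        lsh.insert(key, m)
        kept.append(key)
print(kept)                       # typically ["a", "c"]
```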
We are releasing all of the code, open-source, to fully reproduce the results of the paper. The repository containing u/bloc97 and @theemozilla’s implementation of YaRN rotary embeddings can be found here:
2
6
54
Happy to be a core contributor to @ShayneRedford's Data Provenance Initiative. It is now more important than ever to verify the commercial licensing of available datasets in order to help ensure the integrity of the open-source community.
📢Announcing the🌟Data Provenance Initiative🌟. 🧭A rigorous public audit of 1800+ instruct/align datasets. 🔍Explore/filter sources, creators & license conditions. ⚠️We see a rising divide between commercially open v closed licensed data. 🌐: 1/
0
15
52
Happy to have played a part in the release of Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers. This SOTA work was done by @RiversHaveWings, @Birchlabs, @StefanABaumann, @iScienceLuvr, and @DanielZKaplan.
3
10
52
We are releasing trillions of high-quality, copyright-free, permissively licensed tokens and multimodal data. Be sure to follow our releases @TeraflopAI.
1
9
44
It is important to democratize fair access to data for the public, the legal community, and researchers. You can find a processed and cleaned version of the data available on @huggingface here:
2
8
38
We worked directly with @kaiokendev1 to extend the context length of the Llama-2 7b model through fine-tuning. The model passes all our evaluations and maintains the same perplexity at 8k extrapolation, surpassing the performance of other recent methodologies.
1
1
39
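A rough sketch of the kind of long-context perplexity check described above: score a long document at the extended 8k window and compare against the base model. The repo id is assumed, and this is not the exact evaluation harness that was used.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "conceptofmind/LLongMA-2-7b"        # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

text = open("long_document.txt").read()        # any document longer than 8k tokens
input_ids = tokenizer(text, return_tensors="pt").input_ids[:, :8192].to(model.device)

with torch.no_grad():
    loss = model(input_ids, labels=input_ids).loss   # mean next-token NLL over the window
print("perplexity@8k:", torch.exp(loss).item())
```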
Happy to announce our paper, Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers, has been accepted to #ICML2024. A huge congratulations to @RiversHaveWings, @StefanABaumann, and @Birchlabs. @icmlconf #ICML2024.
Happy to have played a part in the release of Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers. This SOTA work was done by @RiversHaveWings, @Birchlabs, @StefanABaumann, @iScienceLuvr, and @DanielZKaplan.
2
7
36
I would also recommend checking out the phenomenal research by @OfirPress on ALiBi which laid the foundation for many of these scaling techniques:
1
3
29
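For context, a minimal sketch of the ALiBi idea: no positional embeddings, just a per-head linear bias on attention scores proportional to the query-key distance. The slope formula below assumes a power-of-two head count, as in the paper.

```python
import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    """(num_heads, seq_len, seq_len) additive attention bias; assumes power-of-two head count."""
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])
    pos = torch.arange(seq_len)
    distance = (pos[:, None] - pos[None, :]).clamp(min=0)   # i - j for keys at or before the query
    return -slopes[:, None, None] * distance

bias = alibi_bias(num_heads=8, seq_len=16)
# scores = q @ k.transpose(-2, -1) / head_dim**0.5 + bias   (then causal mask and softmax)
print(bias.shape)   # torch.Size([8, 16, 16])
```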
YaRN: Efficient Context Window Extension of Large Language Models was accepted to ICLR 2024. @bloc97_ @theemozilla @Void13950782. @iclr_conf #ICLR2024.
Releasing Yarn-Llama-2-13b-128k, a Llama-2 model, trained for 128k context length using YaRN scaling. The model was trained in collaboration with u/bloc97 and @theemozilla of @NousResearch and @Void13950782 of @AiEleuther.
2
2
24
The data used during fine-tuning was extensively decontaminated and cleaned of any potential benchmarks it was evaluated against by @dmayhem93.
Odds of everyone starting to train on benchmarks? 🤔. Llama2 only briefly mentions this in Appendix A.6, but only published numbers they deemed "significant" (vs Table C.1 in the GPT-3 paper which shows actual contamination metrics across all benchmarks).
2
3
24
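The exact decontamination pipeline is not spelled out in this thread; a common approach it likely resembles is n-gram overlap filtering against the evaluation sets, sketched below with an assumed 13-gram window.

```python
def ngrams(text: str, n: int = 13) -> set[tuple[str, ...]]:
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def decontaminate(train_texts: list[str], benchmark_texts: list[str], n: int = 13) -> list[str]:
    # Drop any training example that shares an n-gram with any benchmark example.
    contaminated: set[tuple[str, ...]] = set()
    for b in benchmark_texts:
        contaminated |= ngrams(b, n)
    return [t for t in train_texts if not (ngrams(t, n) & contaminated)]
```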
I have been working on an open-source replication of WebGPT using @LangChainAI. LangChain by @hwchase17 is by far the best library for building comprehensive language applications.
🔎 More detailed search results. @EnricoShippole added a method to the search classes to return more detailed search info: title, snippet, link. 👀 WebGPT?. Google Search: Bing Search:
3
0
22
@OfirPress Reddit data is extremely low quality and should be filtered from almost all pre-training, so this won't make any difference regardless.
6
1
18
You can find the weights on @huggingface if you prefer to download the @PyTorch .pt files from there instead:
2
1
18
@wightmanr Currently working on this in collab with Lucid and a few members from Carper/EAI. @ShayneRedford has been helping me to open-source the FLAN dataset so we can instruction fine-tune models from the Pythia suite, as well as train a flan-PaLM model.
1
3
17
The models used @tri_dao's flash attention 2 and part of @togethercompute's codebase. You can find out more about Flash Attention 2 here:
3
2
17
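A minimal sketch of calling FlashAttention-2 directly from the flash-attn package (requires a CUDA GPU and fp16/bf16 tensors); shapes and sizes here are illustrative.

```python
import torch
from flash_attn import flash_attn_func   # pip install flash-attn (CUDA only)

batch, seq_len, num_heads, head_dim = 1, 8192, 32, 128
q = torch.randn(batch, seq_len, num_heads, head_dim, dtype=torch.float16, device="cuda")
k = torch.randn(batch, seq_len, num_heads, head_dim, dtype=torch.float16, device="cuda")
v = torch.randn(batch, seq_len, num_heads, head_dim, dtype=torch.float16, device="cuda")

out = flash_attn_func(q, k, v, causal=True)   # fused, memory-efficient exact attention
print(out.shape)                              # torch.Size([1, 8192, 32, 128])
```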
@Teknium1 @suchenzang Pretty big difference in price for dedicated vs. spot. Also depends on which A100s. You should use SkyPilot; it is what I am integrating into my trainer:
0
1
15
Our work on Toolformer, PaLM, and related projects is all thanks to the generous sponsorship by @carperai and @StabilityAI. A big thank you to @dmayhem93, @jonbtow, Aman, and @zach_nussbaum as well for providing input on the @huggingface library.
1
0
16
All of the C4 data has been pre-tokenized with the GPT-NeoX tokenizer and blocked at sequence lengths of 8192. This will save you the large cost of preprocessing the data. The datasets are available on @huggingface. An example chunk can be found here:
2
1
15
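A hedged sketch of streaming one of these pre-tokenized chunks; the repo id and column name below are placeholders, so check the dataset card for the exact identifiers.

```python
import torch
from datasets import load_dataset

# Hypothetical repo id and column name; verify against the dataset card.
ds = load_dataset("conceptofmind/c4-pretokenized-8k-chunk-0", split="train", streaming=True)

example = next(iter(ds))
block = torch.tensor(example["input_ids"])    # one pre-tokenized, fixed-length block
print(block.shape)                            # expected: torch.Size([8192])
```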
Data is what makes the model. We at @TeraflopAI are working hard to provide the open-source community with permissively licensed, commercially usable datasets for training. Congrats to @arankomatsuzaki, @lintangsutawika, and @colinraffel. And thanks to @ShayneRedford for his work on FLAN.
0
1
14
Glad to see StableLM-2-12B by @jonbtow, @dmayhem93, and @StabilityAI using our permissively licensed data to push the cutting edge of language modeling. Data quality is more important than ever. @arankomatsuzaki and I are working to solve this challenge at scale at @TeraflopAI.
Has anyone tried this yet? They seem to have perfected training small models (1.6B and 3B). If they were able to keep that while scaling up, this should be amazing.
0
2
14
Additionally, you can find a Hermes-Falcon-7b-4k model fine-tuned at a context length of 4k on @huggingface here:
2
1
11
Further instruction-tuning will be done on the new FLAN datasets we have released. A big thank you to @ShayneRedford for helping!
1
0
13
The @NousResearch Hermes dataset consists of over 300,000 instruction data points. Thank you to @karan4d and @Teknium1 for providing the data to train these models.
1
1
10
An open-source implementation of the ScalableViT: Rethinking the Context-oriented Generalization of Vision Transformer research paper in Google's #JAX and #Flax. @paperswithcode. #machinelearning #python #code #programming #tech #deeplearning #ai.
1
3
9
A big thank you to @joespeez @Meta for mentioning our previous research, YaRN, at the @weights_biases Fully Connected conference. We have some exciting long-context releases coming up soon.
2
0
12
The repository containing @theemozilla’s implementation of scaled rotary embeddings can be found here:
1
0
11
A distributed training script is provided so that you may train or fine-tune your own PaLM models using @huggingface accelerate. More information and experiments about the training will be detailed in the repository:
1
1
11
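A minimal sketch of the shape such an Accelerate training loop takes (launched with `accelerate launch train.py`); the toy model and random data below are placeholders, not the released script.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator(gradient_accumulation_steps=8)

# Toy stand-ins so the sketch runs end to end; swap in the real PaLM model
# and the pre-tokenized dataset in practice.
model = nn.Sequential(nn.Embedding(32000, 256), nn.Flatten(), nn.Linear(256 * 128, 32000))
dataset = TensorDataset(torch.randint(0, 32000, (64, 128)), torch.randint(0, 32000, (64,)))
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
loader = DataLoader(dataset, batch_size=4, shuffle=True)

model, optimizer, loader = accelerator.prepare(model, optimizer, loader)
loss_fn = nn.CrossEntropyLoss()

model.train()
for inputs, targets in loader:
    with accelerator.accumulate(model):      # gradient accumulation across ranks
        logits = model(inputs)
        loss = loss_fn(logits, targets)
        accelerator.backward(loss)           # replaces loss.backward() under DDP/mixed precision
        optimizer.step()
        optimizer.zero_grad()
```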
Update 6: Added The Pile by #EleutherAI as the default dataset for an #opensource pre-training implementation of the #LaMDA research paper and #ai in #PyTorch with @huggingface streaming datasets. #MachineLearning #python #code #programming #tech.
0
2
8
@Yampeleg This is a common practice that has been used for quite a few years. You can find an example of packing the text and appending an EOS/EOT token with Huggingface datasets and tokenizers here:
0
1
10
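A minimal sketch of that packing recipe: tokenize each document, append the EOS token, concatenate everything, and split into fixed-length blocks so no padding is wasted. The dataset, tokenizer, and block size below are illustrative choices.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

block_size = 2048                                             # illustrative choice
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
ds = load_dataset("wikitext", "wikitext-103-v1", split="train")

def tokenize_and_append_eos(batch):
    ids = tokenizer(batch["text"])["input_ids"]
    return {"input_ids": [x + [tokenizer.eos_token_id] for x in ids]}

def group_into_blocks(batch):
    # Concatenate every document and cut into fixed-length, padding-free blocks.
    stream = [tok for ids in batch["input_ids"] for tok in ids]
    total = (len(stream) // block_size) * block_size
    return {"input_ids": [stream[i:i + block_size] for i in range(0, total, block_size)]}

tokenized = ds.map(tokenize_and_append_eos, batched=True, remove_columns=ds.column_names)
packed = tokenized.map(group_into_blocks, batched=True, remove_columns=tokenized.column_names)
print(len(packed[0]["input_ids"]))   # 2048
```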
An open-source implementation of the Better plain ViT baselines for ImageNet-1k research paper in Google's #JAX and #Flax. @paperswithcode. #MachineLearning #python #code #programming #tech #deeplearning #ai.
1
5
5
If you would like to preprocess your own dataset for training there is a dataset builder script provided. This uses @huggingface datasets to efficiently map, tokenize, and block the data:
1
0
9
We worked directly with @kaiokendev1 to extend the context length of the Llama-2 13b and 7b models through fine-tuning. The models pass all our evaluations and maintain perplexity at 16k extrapolation, surpassing the performance of other recent methodologies.
2
0
10
The dialog data is available on @huggingface to download. It was processed at an extended context length of 8192. It contains relevant metadata such as Inputs, Targets, Task Source, and Task Name.
1
0
10
I have more copyright-free and commercially viable data than I could possibly know what to do with. We are always actively looking for organizations to partner with to train on this data and serve it to the community.
🚨More AI copyright lawsuits!🚨. 1. Artists sue Google for Imagen (. 2. More newspapers sue MSFT/OpenAI (. The newspaper litigation has far more compelling examples and arguments than prior cases. One to watch.
0
1
9
The repository containing @theemozilla's implementation of scaled rotary embeddings can be found here:
1
1
8
If you would like to learn more about scaling rotary embeddings, I would strongly recommend reading @kaiokendev1's blog posts on his findings:
1
1
8
An additional FLAN Dialog submix dataset, preprocessed for causal language modeling with various encoding issues fixed, is also available on @huggingface to download.
1
0
9
The @NousResearch Hermes dataset consists of over 300,000 instruction data points. Thank you to @karan4d and @Teknium1 for providing the data to train these models.
2
1
9
I have had the pleasure of working with @hwchase17 to expand the @LangChainAI ecosystem by adding support for numerous different #opensource models, such as those by @AiEleuther, and providers. It is a necessary step in ensuring the democratization of artificial intelligence.
We need more options for integrating open source models (like those from @AiEleuther) into @LangChainAI . 🏆Thanks to @EnricoShippole we have exactly that. 🚀First class support for @gooseai_NLP @cerebriumai @ForefrontAI and Petals . 📃Docs:
0
2
9
Disseminating artificial intelligence through clean, open-source user interfaces and experiences bridges an absolutely necessary gap between the research community and app developers. We must start furthering collaboration with front-end communities.
2
0
8
Additionally, you can find a Hermes-Open-Llama-7b-4k model fine-tuned at a context length of 4k on @huggingface here:
1
0
9
Adding support for numerous different #opensource models and providers to @LangChainAI is an imperative step in establishing an ecosystem that is mutually beneficial to all. The work done by @hwchase17 will help lead to both fair and equal distribution of artificial intelligence.
🦜🔗 v0.0.86. 📂Lots more open source model integrations! @EnricoShippole .🪵PromptLayer (@imjaredz) and Helicone (@justinstorre) integrations. And lots of other docs and bug fixes!. 🧵.
1
1
9
A PR to add scaled rotary embeddings to @huggingface transformers has been added by @joao_gante and merged:
1
1
7
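With that PR merged, loading a Llama checkpoint with linearly interpolated rotary embeddings looks roughly like the sketch below; the rope_scaling format is from transformers >= 4.31 as I recall, so verify against the current docs.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    rope_scaling={"type": "linear", "factor": 2.0},   # 4k trained context -> 8k positions
    torch_dtype=torch.float16,
    device_map="auto",
)
```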
We worked directly with @kaiokendev1 to extend the context length of the Llama-2 13b model through fine-tuning. The model passes all our evaluations and maintains the same perplexity at 8k extrapolation, surpassing the performance of other recent methodologies.
1
0
8
I would also recommend checking out the phenomenal research by @OfirPress on ALiBi which laid the foundation for many of these scaling techniques:
1
0
8
@jeremyphoward Something that is quite often not discussed is sequence parallelism as well. It is supported in Nvidia Apex and enables 4D parallelism. I am using it in my larger 8k-context transformers:
2
1
8
@kamyrov You can also do this completely free with Stable Diffusion (what Lensa uses) on Google Colab. Here is the link to the open-source notebook:
0
1
8
@andersonbcdefg @typedfemale I would recommend a general understanding of CUDA and GPU programming. Start with something like the Oak Ridge National Laboratory CUDA training series:
0
2
8