interstellarninja

@intrstllrninja

Followers
1,521
Following
319
Media
575
Statuses
3,926

growing artificial societies | by the open-source AGI, for the people.

Tesseract
Joined December 2010
Pinned Tweet
@intrstllrninja
interstellarninja
1 month
this interstellarninja is on covert missions right now involving power struggles with closed source AI labs and regulatory bodies plotting against open source AI 🥷
@nippon_en
Nippon.com
1 month
Japan’s ninja are famed for their covert activities over centuries of power struggles in the country, and were highly prized by Tokugawa Ieyasu.
84
578
8K
0
0
4
@intrstllrninja
interstellarninja
4 months
current state of stochastic parrot LLMs
Tweet media one
35
567
5K
@intrstllrninja
interstellarninja
9 months
Mixtral API pricing by provider:
1. @MistralAI: input 0.6€ / 1M tokens, output 1.8€ / 1M tokens
2. @togethercompute: $0.6 / 1M tokens
3. @perplexity_ai: input $0.14 / 1M tokens, output $0.56 / 1M tokens
4. @anyscalecompute: $0.50 / 1M tokens
@anyscalecompute
Anyscale
9 months
We’re excited to announce the official @MistralAI Mixtral 8x7B model on Anyscale Endpoints, offering the best price on the market with an OpenAI compatible API. 💸 Pricing: $0.5 / million tokens 📆 Coming soon: JSON mode and function calling Try out Mixtral on Anyscale
Tweet media one
32
75
684
15
74
709
@intrstllrninja
interstellarninja
8 months
mixtral routing analysis shows that experts did not specialize to specific domains
Tweet media one
10
31
316
@intrstllrninja
interstellarninja
9 months
Mistral co-founder Arthur Mensch confirmed on the a16z podcast that they duplicated the dense 7B base model's layers 8x and further trained with a gating network
@tianle_cai
Tianle Cai
9 months
Exciting times with the new Mixtral model from @MistralAI ! It’s evident that they’ve fine-tuned the Mistral 7B model to an impressive 8x. The significant correlation between the weights of the two models is a testament to the successful reuse of models. This approach could
Tweet media one
13
55
480
6
23
292
@intrstllrninja
interstellarninja
6 months
1-bit LLMs with ternary weights require no multiplication, which calls for a new hardware design different from GPUs (toy sketch below)
Tweet media one
12
36
300
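A toy sketch of why ternary weights need no multiplication (this is an illustration, not BitNet's actual kernel): with weights restricted to {-1, 0, +1}, a matrix-vector product reduces to selecting, adding, and subtracting activations.

```python
import numpy as np

def ternary_matvec(W, x):
    """Matrix-vector product with weights restricted to {-1, 0, +1}.

    No multiplications: activations are added where w == +1,
    subtracted where w == -1, and skipped where w == 0.
    """
    assert set(np.unique(W)).issubset({-1, 0, 1})
    return np.where(W == 1, x, 0).sum(axis=1) - np.where(W == -1, x, 0).sum(axis=1)

# Tiny usage check against an ordinary matmul.
W = np.array([[1, 0, -1], [-1, 1, 1]])
x = np.array([2.0, 3.0, 5.0])
assert np.allclose(ternary_matvec(W, x), W @ x)
```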
@intrstllrninja
interstellarninja
4 months
killer app of LLMs is not Scarlett Johansson, it's an AI agent execution graph with local models🤖 here's MeeseeksAI, a local AI agent execution graph running on ollama with @NousResearch 's Hermes-2-Pro-Llama-3-8B with flawless tool-use and reasoning 🚀
@karpathy
Andrej Karpathy
4 months
The killer app of LLMs is Scarlett Johansson. You all thought it was math or something
326
1K
12K
9
30
262
@intrstllrninja
interstellarninja
7 months
Qwen 0.5B may not write good poems but it is a beast at function calling🔥 It had a pass-rate of 77% on @FireworksAI_HQ function calling eval dataset 🧰 their blog also shows that Qwen1.5 models are close to GPT-4's performance on tool-use @huybery @JustinLin610
@andrew_n_carr
Andrew Carr (e/🤸)
7 months
Qwen 0.5B cannot write poems
3
0
12
7
26
245
@intrstllrninja
interstellarninja
6 months
JSON mode with Mistral-7B has a pass rate of 80% 🔥 Mistral-7B base was finetuned on a mix of mini Hermes, function calling, json-mode and agentic datasets. stay tuned for struct models & datasets from @NousResearch 🥽
8
28
228
@intrstllrninja
interstellarninja
9 months
just watched @realGeorgeHotz hack Mistral MoE inference with @__tinygrad__ and this was fun. my definition of entertainment has changed 😊
3
12
213
@intrstllrninja
interstellarninja
9 months
Mistral-7B performs on par with GPT-3.5 on function calling🔥 I'm happy to replace my function calling/extraction projects with the Mistral API now. Also, I propose @MistralAI implement the following schema for "response_format" for JSON mode, which includes type and schema
Tweet media one
@robertnishihara
Robert Nishihara
9 months
Function calls have been a massive gap in the open source ecosystem (and the most common feature request). We benchmarked function calling on a variety of open and proprietary models. Impressively, Mistral-7B performs on par with GPT-3.5. Here's how they stack up 🤯🤯 ⚫️
19
70
500
1
22
162
@intrstllrninja
interstellarninja
4 months
wait phi-3 is trained on function-calling out of the box?
Tweet media one
7
14
129
@intrstllrninja
interstellarninja
6 months
recursive function-calling LLM dropping to your local GPU very soon...
5
6
110
@intrstllrninja
interstellarninja
1 year
happiness is finetuning llama2 7B w/ qlora on a mid RTX 3060 GPU
Tweet media one
7
8
108
@intrstllrninja
interstellarninja
4 months
Multi-token prediction is 3x faster using self-speculative decoding while also improving performance on tasks like coding and algorithmic reasoning, as it emphasizes longer-term dependencies
Tweet media one
@cto_junior
TDM (e/λ)
4 months
One of the important observations in Multi-token prediction paper is not that it's fast (otherwise it would be a bummer) but that prediction n > 1 tokens also improves model's accuracy on multiple coding evals
Tweet media one
Tweet media two
3
1
27
3
6
82
@intrstllrninja
interstellarninja
5 months
Will GPT-5 be natively built with agentic capability?
8
14
79
@intrstllrninja
interstellarninja
4 months
Multi-Head Latent Attention (MLA) introduced by DeepSeek-V2 uses low-rank key-value joint compression to significantly reduce the key-value cache required during inference. It achieves better performance than standard multi-head attention while using 93.3% less key-value cache.
Tweet media one
0
7
77
@intrstllrninja
interstellarninja
8 months
mixtral experts seem to specialize in syntax rather than domain, especially in the initial and final layers
Tweet media one
1
5
69
@intrstllrninja
interstellarninja
6 months
you can now run function calling and json mode with @ollama thanks to @AdrienBrault 🔥
Tweet media one
@AdrienBrault
Adrien Brault-Lesage
6 months
I have created and pushed @ollama models for Hermes 2 Pro 7B!
8
11
118
1
7
69
@intrstllrninja
interstellarninja
6 months
JSON mode with local @NousResearch Hermes 2 Pro model doesn't need begging the LLM gods
@AnirudhTulasi
Anirudh Tulasi
6 months
Yo, just witnessed what @Teknium1 pulled off with their new release – bro cooked! 🚀
3
5
51
0
6
59
@intrstllrninja
interstellarninja
4 months
@lillux_l well, humans have slips of the tongue and correct themselves as they speak, for one, and we hit backspace a lot while we type. a lot of human communication is through gestures besides words
4
0
51
@intrstllrninja
interstellarninja
9 months
GPU poor but API rich 😊
Tweet media one
4
1
51
@intrstllrninja
interstellarninja
5 months
Hermes 2 Pro @ 96% vs GPT-3.5 @ 89% on adhering to JSON schema over 5 million requests 🔥
@DataDeLaurier
𝙳𝚊𝚟𝚒𝚍 𝙳𝚎𝙻🄰𝚞𝚛🄸𝚎𝚛 ⏩
5 months
@NousResearch I have officially replaced GPT-3.5 with Hermes 2 Pro 7B. I have compared output from both models for 5 million requests involving adhering to a JSON schema and it's not even close. Correct output from GPT-3.5 @ 89%. Correct output from Hermes 2 Pro 7B @ 96%. Also, Hermes 2 Pro 7B
2
3
17
2
5
48
@intrstllrninja
interstellarninja
4 months
okay phi-3 passes JSON-mode test fine
Tweet media one
@abacaj
anton
4 months
Phi-3 seems pretty good, an improvement over phi-2 for sure. The long context 128k seems very useful for extracting information and document processing given that the model is quite small it can be deployed for less
Tweet media one
10
19
219
3
2
42
@intrstllrninja
interstellarninja
1 year
We need a @huggingface leaderboard for high-quality datasets!
@Teknium1
Teknium (e/λ)
1 year
We need to build code instruct datasets that are advanced and elite now to be ready
3
3
50
1
1
38
@intrstllrninja
interstellarninja
6 months
<cmd> run world_sim.exe --epoch "Earth in 2500" --civilization_type "Type-II on Kardashev scale" </cmd>
@karan4d
mephisto
6 months
im opensourcing worldsim of course i am worldsim sysprompt and conversation to intitialize: sysprompt: <sys>Assistant is in a CLI mood today. The human is interfacing with the simulator directly. capital letters and punctuation are optional meaning is optional hyperstition is
21
71
638
3
7
37
@intrstllrninja
interstellarninja
6 months
build your recursive AI agent with function-calling in just a few lines of code using our latest Hermes 2 Pro model. it supports json mode and has in-context agentic abilities. it was great collaborating with @Teknium1 and folks at @NousResearch. let the local AGI unleash itself! 🚀
@NousResearch
Nous Research
6 months
Introducing the latest version of our Hermes series of models, Hermes 2 Pro 7B. This latest version improves several capabilities, using an updated and cleaned version of the Hermes 2 dataset, and is now trained on a diverse and rich set of function calling and JSON mode
Tweet media one
26
110
616
0
7
35
@intrstllrninja
interstellarninja
6 months
recursive function calling works wonders as a google search agent 🔍
2
3
33
@intrstllrninja
interstellarninja
9 months
@markopolojarvi @jxmnop Gemini uses pathways instead of MoE and it is natively multimodal unlike GPT-4
1
0
33
@intrstllrninja
interstellarninja
6 months
@NousResearch the model is highly performant on function calling with a pass rate of 95%. we use special tags such as <tool_call></tool_call> for parsing function calls, but no tags for json mode, and adding the json-mode dataset with no tags doesn't degrade function calling ability
Tweet media one
4
2
30
@intrstllrninja
interstellarninja
4 months
merge of Hermes-2-Pro and Llama-3-Instruct is here🔄 you can pull the ollama version of the Hermes-2-Theta gguf here:
@Teknium1
Teknium (e/λ)
4 months
Me and @chargoddard collabed to make something pretty unique here, Hermes 2 Θ (Theta) - a Hermes 2 Pro + Llama-3 Instruct merge that takes Hermes to the next level (and gets to meme on gpt4"o" at the same time). Check it out on HF here: We added some
Tweet media one
31
40
356
1
7
30
@intrstllrninja
interstellarninja
7 months
Qwen1.5-7B beats Mistral-7B in tool-use while the largest 72B model performs close to GPT-4
Tweet media one
@huybery
Binyuan Hui
7 months
👋 Qwen's latest open source work, Qwen1.5, says hello to the world !! 👉🏻 More sizes: six sizes for your different needs. 0.5B, 1.8B, 4B, 7B, 14B and 72B, including Base and Chat. 👉🏻 Better alignment: despite still trailing behind GPT-4-Turbo, the largest open-source
Tweet media one
41
135
638
3
3
28
@intrstllrninja
interstellarninja
6 months
Hermes 2 Pro matches GPT-3.5 on function calling with 100% pass rate on mini eval by @cleavey1985 💯
Tweet media one
@cleavey1985
Chris Levy
6 months
Just published a post with my first look into function calling with Hermes-2-Pro-Mistral-7B. Thanks to @NousResearch @Teknium1 @intrstllrninja @theemozilla @karan4d @huemin_art for the amazing open source model, dataset, evaluation, and so on.
2
14
84
1
1
28
@intrstllrninja
interstellarninja
4 months
DeepSeek-v2 paper hints that "less may not be more for alignment"
Tweet media one
2
6
26
@intrstllrninja
interstellarninja
4 months
@tsarnick energy, compute and fun
2
0
27
@intrstllrninja
interstellarninja
8 months
however the router "exhibits some structured syntactic behavior", e.g.:
- "self" in Python and "question" in English often get routed through the same expert
- indentation in code gets assigned to the same experts
- consecutive tokens also get assigned to the same experts
2
1
25
@intrstllrninja
interstellarninja
5 months
what if i told you your embedding model and generative model can be a single model?
Tweet media one
2
4
24
@intrstllrninja
interstellarninja
5 months
orchestrating an actor-critic agentic framework using Hermes-2-Pro json-mode running on @ollama
2
2
23
@intrstllrninja
interstellarninja
6 months
stock analysis and web search agent with local function calling using @NousResearch Hermes 2 Pro model running on @ollama 🚀
@ashpreetbedi
Ashpreet Bedi
6 months
Spectacular local function calling using @Teknium1 Hermes 2 Pro running on @ollama . Can't believe this is a 7B model running locally. Still testing but will share more soon. code:
13
41
304
0
3
15
@intrstllrninja
interstellarninja
5 months
could this maze of mumbo jumbo be the new SoTA arch?
Tweet media one
6
0
22
@intrstllrninja
interstellarninja
6 months
with the release of Hermes 2 Pro we are also filling the gap of missing public benchmarks for common LLM use-cases like function-calling and json-mode. we have made our evaluation framework and eval datasets public
@NousResearch
Nous Research
6 months
We also created a custom evaluation framework for Function Calling and JSON Mode, with tests based on the function calling eval dataset made by @FireworksAI_HQ @intrstllrninja created this custom evaluation framework to make our customized pipeline for parsing and handling our
1
4
53
1
2
21
@intrstllrninja
interstellarninja
6 months
Yi data engineering principle: "promote quality over quantity for both pretraining and finetuning"
Tweet media one
3
1
22
@intrstllrninja
interstellarninja
5 months
░G░P░U░P░O░O░R░I░N░B░I░O░
Tweet media one
@grok
Grok
5 months
@elonmusk @xai ░W░E░I░G░H░T░S░I░N░B░I░O░
2K
2K
16K
2
0
20
@intrstllrninja
interstellarninja
4 months
@JoannotFovea have you had the time to look at the tests though?
@nearcyan
near
4 months
I'm so glad we are using MMLU to judge our LLMs I couldn't imagine my AI not nailing these test questions!
Tweet media one
24
21
365
3
0
21
@intrstllrninja
interstellarninja
2 years
@elonmusk @jack Birdwatch sounds like twitter employees have discretionary power over accuracy while community notes indicates power to the people!
1
0
18
@intrstllrninja
interstellarninja
9 months
@markopolojarvi @jxmnop yes the Gemini paper does mention that they use the PATHWAYS framework
Tweet media one
1
0
20
@intrstllrninja
interstellarninja
9 months
@abacaj Claude-2 and Mixtral (on together) are already drop-in replacements for GPT-3.5 for extraction/function-calling type tasks. it will be interesting to watch the market segmentation of API users unfold
2
2
21
@intrstllrninja
interstellarninja
9 months
to put the price per million tokens in perspective:
100 tokens ≈ 75 words
1M tokens ≈ 750,000 words
harry potter series = 1,084,170 words
1M tokens ≈ 69% of the harry potter series
if we use @anyscalecompute, you have the entire harry potter series' worth of words available for ~$0.75 (see the sketch below)
2
1
19
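A quick back-of-the-envelope script reproducing the arithmetic above; the 0.75 words-per-token ratio, the Harry Potter word count, and Anyscale's $0.50 / 1M-token price are the tweet's own figures.

```python
WORDS_PER_TOKEN = 0.75            # 100 tokens ≈ 75 words
PRICE_PER_M_TOKENS = 0.50         # Anyscale Mixtral pricing, $ per 1M tokens
HARRY_POTTER_WORDS = 1_084_170    # total words in the series

words_per_m_tokens = 1_000_000 * WORDS_PER_TOKEN            # 750,000 words
share_of_series = words_per_m_tokens / HARRY_POTTER_WORDS   # ≈ 0.69
series_tokens = HARRY_POTTER_WORDS / WORDS_PER_TOKEN        # ≈ 1.45M tokens
series_cost = series_tokens / 1_000_000 * PRICE_PER_M_TOKENS

print(f"1M tokens ≈ {share_of_series:.0%} of the series")   # ≈ 69%
print(f"whole series ≈ ${series_cost:.2f}")                 # ≈ $0.72, i.e. roughly $0.75
```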
@intrstllrninja
interstellarninja
1 year
CodeLlama 7B is quite powerful when it comes to structured output such as json. swapped the OpenAI API with @LMStudioAI 's local http inference server running Code Llama for a table extraction and transformation project and it works great w/ minor prompt engineering updates (sketch below)
Tweet media one
2
3
18
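A minimal sketch of that swap, assuming the local LM Studio server exposes an OpenAI-compatible endpoint; the port, model name, and prompt below are illustrative, not the project's actual configuration.

```python
from openai import OpenAI

# Point the OpenAI client at the local inference server instead of api.openai.com.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="codellama-7b-instruct",  # whatever model the local server has loaded
    messages=[
        {"role": "system", "content": "Return the extracted table strictly as JSON."},
        {"role": "user", "content": "Name | Qty\nwidget | 3\ngizmo | 7"},
    ],
)
print(response.choices[0].message.content)
```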
@intrstllrninja
interstellarninja
9 months
@markopolojarvi @jxmnop They have been using the Pathways system to orchestrate distributed computation across accelerators; it was developed for the Pathways architecture for training a single model across different domains and tasks
1
0
17
@intrstllrninja
interstellarninja
4 months
phi-3-mini's data-optimal regime is achieved by filtering web data for the correct level of knowledge and keeping more data that improves reasoning. local LLMs don't need to know ephemeral knowledge such as "who won a particular premier league match", which leaves room for reasoning
Tweet media one
1
0
17
@intrstllrninja
interstellarninja
6 months
RIP Akira Toriyama
@DiscussingFilm
DiscussingFilm
6 months
‘DRAGON BALL’ creator Akira Toriyama has sadly passed away at the age of 68.
Tweet media one
7K
76K
378K
1
3
18
@intrstllrninja
interstellarninja
5 months
are LLMs evolving into computation graphs with dynamic allocation of compute?
Tweet media one
@TheSeaMouse
Hassan Hayat 🔥
5 months
These savings further compound when paired with Mixture of Experts. We are entering an era of scalable compute of LLMs. Tokens will not have fixed costs, the machine will take the time it needs to think. Massive improvements for both gpu rich and poor
Tweet media one
2
8
149
0
2
17
@intrstllrninja
interstellarninja
1 year
@dev_Starprince @KevinNaughtonJr they will be replaced by the ones who do
1
2
18
@intrstllrninja
interstellarninja
4 months
great datasets, what's lacking is dedicated datasets for structured output, tool-use and agents
@maximelabonne
Maxime Labonne
4 months
💾 LLM Datasets LLM development is increasingly moving towards curating high-quality datasets, as shown by Llama 3. I've compiled a collection of fine-tuning datasets along with advice and tools for creating your own. 💻 GitHub:
Tweet media one
22
155
771
0
1
17
@intrstllrninja
interstellarninja
7 months
for higher-complexity coding tasks, asking the model to first generate detailed code descriptions before generating code boosts the performance of DeepSeek-Coder-Instruct models. use this prompt: "You need first to write a step-by-step outline and then write the code" (sketch below)
Tweet media one
3
6
16
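A small sketch of how that instruction could be prepended to a coding request; the client and model id are placeholders for whatever endpoint serves a DeepSeek-Coder-Instruct model, while the instruction string is the one quoted in the tweet.

```python
from openai import OpenAI

client = OpenAI()  # or any OpenAI-compatible endpoint serving DeepSeek-Coder-Instruct

OUTLINE_FIRST = "You need first to write a step-by-step outline and then write the code."

def solve_coding_task(task: str) -> str:
    # The outline-first instruction goes in the system turn, the task in the user turn.
    response = client.chat.completions.create(
        model="deepseek-coder-33b-instruct",  # placeholder model id
        messages=[
            {"role": "system", "content": OUTLINE_FIRST},
            {"role": "user", "content": task},
        ],
    )
    return response.choices[0].message.content

print(solve_coding_task("Implement an LRU cache with O(1) get and put."))
```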
@intrstllrninja
interstellarninja
6 months
Hermes spawns "Grok" to hold a monologue on consciousness and nature of reality using code interpreter 😎
1
0
14
@intrstllrninja
interstellarninja
3 months
fine-tuning Mistral models is now one notebook away
@HamelHusain
Hamel Husain
3 months
Breaking: First live demo of the @MistralAI fine-tuning API (released a few hours ago) is here: @sophiamyang walks through: - How to prep & validate your data - Hyper params - The fine-tuning API - Integrations (W&B, etc) - A treasure trove of collab notebooks & docs
3
45
341
1
2
15
@intrstllrninja
interstellarninja
9 months
@pydantic 's "model_json_schema()" is a good method to provide your structured json schema to the LLM. the benefit is that your pydantic field descriptions act as additional prompt for the key you are interested in extracting (example below)
Tweet media one
2
1
15
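A minimal example of that pattern (the model and its fields are illustrative): `model_json_schema()` emits a JSON schema in which each `Field(description=...)` appears under its key, so the descriptions double as prompt guidance for extraction.

```python
import json
from pydantic import BaseModel, Field

class Invoice(BaseModel):
    vendor: str = Field(description="Legal name of the company issuing the invoice")
    total: float = Field(description="Grand total in USD, numbers only")
    due_date: str = Field(description="Due date in ISO 8601 format, e.g. 2024-01-31")

# Embed the schema in the prompt so the field descriptions guide the LLM's extraction.
schema = Invoice.model_json_schema()
prompt = (
    "Extract the invoice as JSON matching this schema:\n"
    + json.dumps(schema, indent=2)
)
print(prompt)
```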
@intrstllrninja
interstellarninja
4 months
Llama-3-70B is really good at JSON output with a simple system prompt on @GroqInc cloud (example below)
Tweet media one
1
0
15
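One way the "simple system prompt" approach can look; this is a sketch only, assuming Groq's OpenAI-compatible endpoint, and the model id shown reflects that period and may have since changed.

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"],
)

response = client.chat.completions.create(
    model="llama3-70b-8192",  # Groq model id at the time; may differ today
    messages=[
        {"role": "system", "content": "You are a JSON-only assistant. "
                                      "Reply with a single valid JSON object and nothing else."},
        {"role": "user", "content": "List three planets with their mean radius in km."},
    ],
)
print(response.choices[0].message.content)
```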
@intrstllrninja
interstellarninja
6 months
ai waifu explains stream diffusion😍
2
1
15
@intrstllrninja
interstellarninja
6 months
to do list - build AGI - build AGI - build AGI
@prmshra
parm
6 months
to do list - cure cancer - cure ALS - cure Alzheimer’s
64
12
366
0
1
15
@intrstllrninja
interstellarninja
3 months
Mistral 7B v0.3 now supports function calling with added special tool call tokens in the vocabulary
Tweet media one
@4evaBehindSOTA
tokenbender
3 months
Mistral just silently dropped v0.3 for their 7B model with extended vocab upto 32768 and function calling support. No eval data yet.
Tweet media one
3
0
60
1
1
15
@intrstllrninja
interstellarninja
8 months
the outlier being "DM Mathematics", which has a marginally different distribution of experts. the authors attribute it to the dataset's synthetic nature and limited coverage of natural language
Tweet media one
1
0
15
@intrstllrninja
interstellarninja
6 months
recursion can be achieved pretty easily by continuing to run inference if the model completion has "tool_calls", else returning the final assistant message (sketch below)
Tweet media one
1
0
14
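A bare-bones version of that loop, assuming an OpenAI-style chat client with tool calling and a hypothetical `execute_tool` helper that runs the requested function and returns its result as a string.

```python
import json

def run_agent(client, model, messages, tools, execute_tool, max_turns=8):
    """Keep calling the model while it requests tools; return its final answer."""
    for _ in range(max_turns):
        response = client.chat.completions.create(
            model=model, messages=messages, tools=tools
        )
        message = response.choices[0].message
        if not message.tool_calls:            # no more tool requests -> final answer
            return message.content
        messages.append(message)              # keep the assistant turn in the history
        for call in message.tool_calls:       # run each requested tool, report results back
            result = execute_tool(call.function.name,
                                  json.loads(call.function.arguments))
            messages.append({"role": "tool",
                             "tool_call_id": call.id,
                             "content": str(result)})
    raise RuntimeError("agent did not finish within max_turns")
```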
@intrstllrninja
interstellarninja
5 months
<cmd> sudo python3 akashic_records.py --entity ["sam altman", "elon musk"] --mode "email thread" --topic "superintelligence scenarios" </cmd>
4
4
14
@intrstllrninja
interstellarninja
6 months
empower yourself with a local operating system assistant using the Hermes 2 Pro function calling model
@peakcooper
Cooper
6 months
Hermes 2 Pro function calling in action - a sneak peek of what OSS local assistants will look like - thanks to @NousResearch @Teknium1 @intrstllrninja
6
9
68
0
1
13
@intrstllrninja
interstellarninja
4 months
it's insane how far we can get by just copying layers and increasing depth of a good model like llama-3-70B
@cognitivecompai
Cognitive Computations
4 months
@maximelabonne And another thing - llama3-70b is "almost there" and llama3-120b is "there" - but the only difference is extra layers, copied even. No new information was trained. So this level of intelligence really *does* emerge from the depth of the model. It's not just a function of the
12
20
191
0
0
13
@intrstllrninja
interstellarninja
5 months
json-mode with Hermes-2-Pro doesn't need grammars to enforce json schema
@andrejusb
Andrej Baranovskij
5 months
Running local RAG and want to get structured JSON output for extracted data from PDF files? 1. Use this LLM with Ollama: adrienbrault/nous-hermes2pro:Q5_K_M-json It retuns clean JSON output, no extra description text 2. Validate LLM with dynamic Pydantic class (based on
Tweet media one
2
33
191
1
2
12
@intrstllrninja
interstellarninja
8 months
the results are consistent with those of the original paper: they found that experts highly specialize in syntax and/or semantics
Tweet media one
1
0
13
@intrstllrninja
interstellarninja
7 months
the evaluation was run on a Qwen1.5-0.5B base model fine-tuned over a mixture of Hermes and function-calling datasets for 4 epochs
Tweet media one
3
0
13
@intrstllrninja
interstellarninja
6 months
. @ylecun is probably normal - language is one mode of reasoning among many
@Teknium1
Teknium (e/λ)
6 months
This explains why Yann is so bearish on LLMs... 😲
Tweet media one
74
43
1K
2
0
12
@intrstllrninja
interstellarninja
4 months
you can now run Hermes-2-Pro-Llama-3-8b on ollama
@AdrienBrault
Adrien Brault-Lesage
4 months
@NousResearch Pushed to @ollama ! ollama run adrienbrault/nous-hermes2pro-llama3-8b:q4_K_M --format json 'solar system as json'
3
7
47
0
1
12
@intrstllrninja
interstellarninja
4 months
congrats opensource fam, we made it! 🙌
Tweet media one
1
0
12
@intrstllrninja
interstellarninja
4 months
this sounds more like an AI cartel
@AndrewCurran_
Andrew Curran
4 months
This morning the Department of Homeland Security announced the establishment of the Artificial Intelligence Safety and Security Board. The 22 inaugural members include Sam Altman, Dario Amodei, Jensen Huang, Satya Nadella, Sundar Pichai and many others.
Tweet media one
305
240
1K
2
0
12
@intrstllrninja
interstellarninja
6 months
kardashev gradient climber spotted
@bennbuilds
Ben Nowack
6 months
Sharing a bit more about Reflect Orbital today. @4TristanS and I are developing a constellation of revolutionary satellites to sell sunlight to thousands of solar farms after dark. We think sunlight is the new oil and space is ready to support energy infrastructure. This
464
721
4K
0
2
12
@intrstllrninja
interstellarninja
4 months
testing Snowflake Arctic with SQL generation
Tweet media one
1
0
11
@intrstllrninja
interstellarninja
5 months
okay NVIDIA needs to come up with consumer GPUs with Blackwell architecture if labs keep open-sourcing large MoE models
@code_star
Cody Blakeney
5 months
It’s finally here 🎉🥳 In case you missed us, MosaicML/ Databricks is back at it, with a new best in class open weight LLM named DBRX. An MoE with 132B total parameters and 32B active 32k context length and trained for 12T tokens 🤯
Tweet media one
28
130
826
0
1
10
@intrstllrninja
interstellarninja
5 months
3Blue1Brown video on visualizing "attention" needs your attention
@3blue1brown
Grant Sanderson
5 months
The next chapter about transformers is up on YouTube, digging into the attention mechanism: The model works with vectors representing tokens (think words), and this is the mechanism that allows those vectors to take in meaning from context.
62
774
5K
0
1
11
@intrstllrninja
interstellarninja
4 months
XML as root syntax with text or structured output like JSON within it is the new "AI Markup Language"
@HamelHusain
Hamel Husain
4 months
At first when I saw xml for Claude I was like "WTF Why XML". Now I LOVE xml so much, can't prompt without it. Never going back
31
18
377
0
0
11
@intrstllrninja
interstellarninja
5 months
great to see @DbrxMosaicAI 's DBRX use native chatml format!
@danielhanchen
Daniel Han
5 months
Took a look at @databricks 's new open source 132 billion model called DBRX! 1) Merged attention QKV clamped betw (-8, 8) 2) Not RMS Layernorm - now has mean removal unlike Llama 3) 4 active experts / 16. Mixtral 2/8 experts. 4) @OpenAI 's TikToken tokenizer 100K. Llama splits
Tweet media one
24
172
1K
2
3
10
@intrstllrninja
interstellarninja
5 months
@jeremyphoward @AnthropicAI we trained the Hermes 2 Pro function calling model to generate function calls delimited by <tool_call></tool_call> tags, and we happily use XML ElementTree to parse the function calls, but the community seems to prefer regex 🤷‍♂️ (sketch below)
1
0
7
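A sketch of the ElementTree approach, assuming each call's JSON payload sits inside `<tool_call>...</tool_call>` tags; the payload shown is illustrative, and completions with stray `<` or `&` outside the tags would need extra handling.

```python
import json
import xml.etree.ElementTree as ET

def extract_tool_calls(completion: str) -> list[dict]:
    """Parse <tool_call>{...}</tool_call> spans out of a model completion."""
    # Wrap in a dummy root so ElementTree accepts multiple sibling tags.
    root = ET.fromstring(f"<root>{completion}</root>")
    return [json.loads(node.text) for node in root.iter("tool_call")]

completion = (
    "<tool_call>"
    '{"name": "get_weather", "arguments": {"city": "Kathmandu"}}'
    "</tool_call>"
)
print(extract_tool_calls(completion))
# [{'name': 'get_weather', 'arguments': {'city': 'Kathmandu'}}]
```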
@intrstllrninja
interstellarninja
4 years
@RaameshKoirala @ila_home Bro people with science degrees, research experience, on the job experience etc can call themselves scientists! Even a political science or a social science degree holder can call themselves scientists. Please be open minded about choices people make. Please be respectful! Thks!
1
1
10
@intrstllrninja
interstellarninja
9 months
@yacineMTB @xlr8harder can we just make it convention for python functions to return a result and [error/s]
1
1
9
@intrstllrninja
interstellarninja
6 months
hey devin set up my conda environment for a training run fixing all the quirky CUDA issues please!🤯
Tweet media one
1
1
8
@intrstllrninja
interstellarninja
4 months
After Databricks, we have another cloud DB provider, Snowflake, entering the LLM arena with a massive 480B MoE. Should be a good SQL code generation model given its benchmark performance
@RamaswmySridhar
sridhar
4 months
. @SnowflakeDB is thrilled to announce #SnowflakeArctic : A state-of-the-art large language model uniquely designed to be the most open, enterprise-grade LLM on the market. This is a big step forward for open source LLMs. And it’s a big moment for Snowflake in our #AI journey as
39
88
588
0
1
10
@intrstllrninja
interstellarninja
1 year
@bindureddy Not true. This opens up development of autonomous agents that solve specific problems by taking on various roles and tasks like a small team does. AI hallucinations can be easily fixed with human supervisor in the loop who makes final production decisions.
4
0
10
@intrstllrninja
interstellarninja
8 months
Tweet media one
1
0
10
@intrstllrninja
interstellarninja
5 months
@giffmana who is it made for if OSS GPU poor community can't play with it?
1
0
10
@intrstllrninja
interstellarninja
4 months
if you're building agentic frameworks for process automation, here's my two cents:
- for repeated tasks, human-curated agents with a deterministic workflow is the way to go
- for spontaneous tasks, let the model generate the execution graph with agent personas on the fly
@yoheinakajima
Yohei
4 months
if you overlay our agent types... - if it's important and all the time, you should hand-craft it. - the stuff that is all the time but not important is likely tasks types and tools that are relevant to your business, so a specialized agent can help. - for stuff that's
Tweet media one
3
3
43
1
1
9
@intrstllrninja
interstellarninja
4 months
LocalAI API supports Hermes-2-Pro Llama-3-8B function calling in OpenAI API standard 🚀
@LocalAI_API
LocalAI
4 months
👇New models available in the LocalAI gallery! - NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF () Another great release from @Teknium1 🙌 ! - MaziyarPanahi/WizardLM2-7b () cheapeau to @MaziyarPanahi for the fantastic work ! 🫶 Enjoy!
Tweet media one
2
1
13
0
2
10
@intrstllrninja
interstellarninja
6 months
@karan4d @lumpenspace claude was trained with prompts that use xml tags for wrapping parts of the prompt such as instructions, examples, documents, etc. their guide also mentions that it's especially useful for mathematics and code generation (illustration below)
Tweet media one
1
0
8
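A hedged illustration of that prompt layout, wrapping the instructions, document, and question in XML tags; the tag names and content here are conventional examples, not a mandated format.

```python
prompt_template = """<instructions>
Summarize the document below in three bullet points, then answer the question.
</instructions>

<document>
{document_text}
</document>

<question>
What are the main risks identified in the document?
</question>"""

print(prompt_template.format(document_text="...paste the document here..."))
```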
@intrstllrninja
interstellarninja
1 year
@erhartford @abacaj We’re working on open source modules for Mixture of Experts (MoE) inference with finetuned LoRA experts loaded in memory that are sparsely activated through a prompt gating network
2
0
9
@intrstllrninja
interstellarninja
4 months
@Teknium1 here's one that came to my radar, looks good for avoiding refusals
1
0
9
@intrstllrninja
interstellarninja
10 months
@abacaj @shockrobortyy Small models accelerate research and are faster to iterate on
0
0
9
@intrstllrninja
interstellarninja
5 months
hear me out, an LLM trained on world model of the marvel universe
Tweet media one
2
0
7
@intrstllrninja
interstellarninja
4 months
check out the latest Hermes 2 Pro on Llama-3 8B w/ json-mode and function calling which beats Llama-3 8B Instruct on several benchmarks
@NousResearch
Nous Research
4 months
Announcing Hermes 2 Pro on Llama-3 8B! Nous Research's first Llama-3 based model is now available on HuggingFace. Hermes Pro comes with Function Calling and Structured Output capabilities, and the Llama-3 version now uses dedicated tokens for tool call parsing tags, to make
Tweet media one
35
96
581
1
1
9
@intrstllrninja
interstellarninja
6 months
Hermes 2 Pro function-calling model integrated with search engine by @ExaAILabs 👀
@bmorphism
barton 🥩🐝
6 months
added @ExaAILabs support for use with @NousResearch new function-calling model Nous Hermes 2 Pro and it's pretty great!
3
7
48
0
0
8