许哲 Profile
许哲

@xuzhe42

Followers
733
Following
22
Statuses
45

Joined February 2023
@xuzhe42
许哲
7 months
RT @elonmusk:
Tweet media one
0
410K
0
@xuzhe42
许哲
7 months
RT @stevenmarkryan: @elonmusk Warren Buffet proposed a great solution. You just pass a law that says that any time there's a deficit of mo…
0
7K
0
@xuzhe42
许哲
8 months
RT @RealAlexJones: Shocking ending to the Trump vs Biden debate, what most people didn't see. This says it all.
0
33K
0
@xuzhe42
许哲
9 months
Hahahahaha
@0xgaut
gaut
9 months
we've officially reached AGI
Tweet media one
0
0
14
@xuzhe42
许哲
9 months
RT @nntaleb: Complexity artists claim to make the unpredictable predictable. I prefer to work on making the unpredictable irrelevant.
0
255
0
@xuzhe42
许哲
10 months
so true
@AndrewYNg
Andrew Ng
10 months
Tool use, in which an LLM is given functions it can request to call for gathering information, taking action, or manipulating data, is a key design pattern of AI agentic workflows. You may be familiar with LLM-based systems that can perform a web search or execute code. Some of the large, consumer-facing LLMs already incorporate these features. But tool use goes well beyond these examples.

If you prompt an online LLM-based chat system, “What is the best coffee maker according to reviewers?”, it might decide to carry out a web search and download one or more web pages to gain context. Early on, LLM developers realized that relying only on a pre-trained transformer to generate output tokens is limiting, and that giving an LLM a tool for web search lets it do much more. With such a tool, an LLM is either fine-tuned or prompted (perhaps with few-shot prompting) to generate a special string like {tool: web-search, query: "coffee maker reviews"} to request calling a search engine. (The exact format of the string depends on the implementation.) A post-processing step then looks for strings like these, calls the web search function with the relevant parameters when it finds one, and passes the result back to the LLM as additional input context for further processing.

Similarly, if you ask, “If I invest $100 at compound 7% interest for 12 years, what do I have at the end?”, rather than trying to generate the answer directly using a transformer network (which is unlikely to produce the right answer), the LLM might use a code execution tool to run the Python expression 100 * (1+0.07)**12. To request this, the LLM might generate a string like {tool: python-interpreter, code: "100 * (1+0.07)**12"}.
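As a minimal sketch of that parse-and-dispatch loop: the JSON-style call format, tool names, and stub functions below are illustrative assumptions, not any particular system's API.

```python
import json
import re
from typing import Optional

# Stub tools for illustration; a real system would call a search API and a
# sandboxed interpreter. Names and call format here are made up.
def web_search(query: str) -> str:
    return f"(search results for: {query})"  # placeholder result

def python_interpreter(code: str) -> str:
    # WARNING: eval() on model-generated code is unsafe; real systems sandbox it.
    return str(eval(code))

TOOLS = {"web-search": web_search, "python-interpreter": python_interpreter}

# Matches a JSON-style request such as:
# {"tool": "python-interpreter", "code": "100 * (1+0.07)**12"}
TOOL_CALL = re.compile(r'\{[^{}]*"tool"[^{}]*\}')

def run_tool_calls(llm_output: str) -> Optional[str]:
    """Post-process LLM output: if it contains a tool request, dispatch
    the call and return the result to feed back as extra context."""
    match = TOOL_CALL.search(llm_output)
    if match is None:
        return None  # ordinary text; no tool requested
    request = json.loads(match.group())
    name = request.pop("tool")
    (argument,) = request.values()  # single-argument tools in this sketch
    return TOOLS[name](argument)

# The compound-interest example from the post: 100 * 1.07**12 ≈ 225.22
print(run_tool_calls('{"tool": "python-interpreter", "code": "100 * (1+0.07)**12"}'))
```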
But tool use in agentic workflows now goes much further. Developers are using functions to search different sources (web, Wikipedia, arXiv, etc.), to interface with productivity tools (send email, read/write calendar entries, etc.), to generate or interpret images, and much more. We can prompt an LLM using context that gives detailed descriptions of many functions. These descriptions might include a text description of what the function does plus details of what arguments the function expects. And we’d expect the LLM to automatically choose the right function to call to do a job.

Further, systems are being built in which the LLM has access to hundreds of tools. In such settings, there might be too many functions at your disposal to put all of them into the LLM context, so you might use heuristics to pick the most relevant subset to include in the LLM context at the current step of processing. This technique, which is described in the Gorilla paper cited below, is reminiscent of how, if there is too much text to include as context, retrieval augmented generation (RAG) systems offer heuristics for picking a subset of the text to include. (A toy sketch of this selection step follows the reading list below.)

Early in the history of LLMs, before the widespread availability of large multimodal models (LMMs) like LLaVA, GPT-4V, and Gemini, LLMs could not process images directly, so a lot of work on tool use was carried out by the computer vision community. At that time, the only way for an LLM-based system to manipulate an image was by calling a function to, say, carry out object recognition or another image operation on it. Since then, practices for tool use have exploded. GPT-4’s function calling capability, released in the middle of last year, was a significant step toward general-purpose tool use, and more and more LLMs are now being developed to be similarly facile with tool use.

If you’re interested in learning more about tool use, I recommend:
- Gorilla: Large Language Model Connected with Massive APIs, Patil et al. (2023)
- MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action, Yang et al. (2023)
- Efficient Tool Use with Chain-of-Abstraction Reasoning, Gao et al. (2024)

Both Tool Use and Reflection, which I posted about last week, are design patterns that I can get to work fairly reliably in my applications; both are capabilities well worth learning about. In the future, I’ll describe the Planning and Multi-agent collaboration design patterns. They allow AI agents to do much more, but they are less mature and less predictable, albeit very exciting, technologies.
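Here is that toy sketch of the tool-selection heuristic: when the registry of tools is too large for the context window, score each tool's description against the current request and keep only the top few. The registry contents and the word-overlap scoring are made-up stand-ins; Gorilla-style systems use learned retrievers over API documentation.

```python
# Hypothetical registry: tool name -> one-line description.
TOOL_DESCRIPTIONS = {
    "web-search": "search the web and return relevant pages for a query",
    "send-email": "send an email message to a recipient",
    "calendar-read": "read calendar entries for a given date range",
    "python-interpreter": "execute a Python expression and return the result",
    "image-caption": "generate a text caption describing an input image",
}

def overlap_score(request: str, description: str) -> int:
    """Count shared lowercase words between the request and a description.
    A stand-in for embedding similarity so this sketch runs without dependencies."""
    return len(set(request.lower().split()) & set(description.lower().split()))

def select_tools(request: str, k: int = 2) -> list:
    """Pick the k tool names whose descriptions best match the request;
    only these tools' descriptions would be placed in the LLM context."""
    ranked = sorted(
        TOOL_DESCRIPTIONS,
        key=lambda name: overlap_score(request, TOOL_DESCRIPTIONS[name]),
        reverse=True,
    )
    return ranked[:k]

print(select_tools("read my calendar entries for next week"))
# -> ['calendar-read', 'web-search']
```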
0
0
1
@xuzhe42
许哲
1 year
RT @anammostarac: Microsoft’s board right now
0
2K
0
@xuzhe42
许哲
1 year
RT @OpenAI: OpenAI announces leadership transition
0
4K
0
@xuzhe42
许哲
2 years
RT @sama: “a much faster rate of change” is maybe my single highest-confidence prediction about what a world with AGI in it will be like.
0
98
0
@xuzhe42
许哲
2 years
RT @Easy2Mine: [E2M Research AMA, Issue 0602] The future is unpredictable vs. investing requires prediction: how should investors decide? This Friday at 8 PM, join us to discuss the underlying logic of decision-making in the face of an uncertain future. Guests: Odyssey @OdysseysEth Zhen Dong…
0
5
0
@xuzhe42
许哲
2 years
Requesting faucet funds into 0x92f71B1a1161e88588b16363b3bC242aF14ED95d on the #Goerli #Ethereum test network via
3
1
3
@xuzhe42
许哲
2 years
@takit888 Yes, for now we still have to do a lot of things ourselves, but the future looks promising
0
0
0
@xuzhe42
许哲
2 years
All the gifts of fate come secretly marked with a price. The measure of an economist is how firmly they refuse to believe that anything in the world comes free.
@nntaleb
Nassim Nicholas Taleb
2 years
Those who thought it was free money are now discovering that they have to pay for it retrospectively.
0
0
9
@xuzhe42
许哲
2 years
so interesting. we are being forced to ask what thinking even is right now
@d_feldman
Daniel
2 years
On the left is GPT-3.5. On the right is GPT-4. If you think the answer on the left indicates that GPT-3.5 does not have a world-model… then you have to agree that the answer on the right indicates GPT-4 does.
Tweet media one
Tweet media two
0
0
0
@xuzhe42
许哲
2 years
@maoyingyi It's a choice
0
0
0
@xuzhe42
许哲
2 years
RT @nntaleb: They are all libertarians until they are hit by higher interest rates.
0
2K
0
@xuzhe42
许哲
2 years
@maoyingyi The mechanism is somewhat complex, perhaps requiring a whitepaper.
1
0
0