![Shuai Kyle Zheng Profile](https://pbs.twimg.com/profile_images/573944591080165376/-L_SBZlY_x96.jpeg)
Shuai Kyle Zheng
@bittnt
Followers
1K
Following
15K
Statuses
483
Researcher at @cruise. Interests: computer vision and machine learning. All opinions are my own.
California, USA
Joined May 2012
That's actually not true. It uses a JSON representation for all of the data:
```
messages=[
    {"role": "user", "content": [
        {"type": "text", "text": prompt},
        {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{base64_image}"}},
    ]}
],
```
The user has to encode the raw image bytes as base64 and then decode the resulting base64 bytes into a UTF-8 string, e.g.
```
import base64

def encode_image(image_path):
    # Read the image in binary mode and return its base64 text form
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")
```
I think the image can later be decoded again and read using OpenCV.
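A minimal sketch of the round trip described in the tweet above: raw image bytes are base64-encoded into a data URL inside the JSON message, and the receiving side can decode the URL back into the original bytes. The helper names `build_messages` and `decode_data_url`, the `mime` parameter, and the placeholder image bytes are hypothetical and used only for illustration; only the message layout comes from the tweet.

```python
import base64
import json


def encode_image_bytes(image_bytes: bytes) -> str:
    """Base64-encode raw bytes and return the result as a text string."""
    return base64.b64encode(image_bytes).decode("utf-8")


def build_messages(prompt: str, image_bytes: bytes, mime: str = "image/png") -> list:
    """Build a message list in the layout quoted in the tweet (hypothetical helper)."""
    b64 = encode_image_bytes(image_bytes)
    return [
        {"role": "user", "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ]}
    ]


def decode_data_url(url: str) -> bytes:
    """Recover the raw bytes from a base64 data URL (hypothetical helper)."""
    header, _, payload = url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URL")
    return base64.b64decode(payload)


# Placeholder bytes: the PNG file signature plus padding, not a real image.
fake_png = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8

messages = build_messages("Describe this image.", fake_png)
payload_json = json.dumps(messages)  # the whole payload is plain JSON text

image_url = messages[0]["content"][1]["image_url"]["url"]
recovered = decode_data_url(image_url)
```

With a real image, the recovered bytes could then be handed to OpenCV, for example via `cv2.imdecode(np.frombuffer(recovered, np.uint8), cv2.IMREAD_COLOR)`, assuming `cv2` and `numpy` are installed.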
0
0
0
@chriswolfvision Following your analogy, switching to Camtasia is like hopping onto an electric bike: straightforward, smooth, and still powerful enough to get you where you need to go without breaking a sweat. No need for pilot training. Just start pedaling and enjoy the ride!
1
0
1
@YiMaTweets I want to agree. But how can you prove that claim? One way it could go wrong is that the mathematical languages humans have invented so far may not be sufficient to explain the progress made in AI through "engineering hacks".
0
0
0
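@mtrainier2020 I feel the main reason is that fine horses are common, but a Bo Le who can recognize them is rare. Perhaps the Bo Le here is the engineer. One of the hottest directions in AI right now, the diffusion model, is based on a paper about solving a differential equation, B.D.O. Anderson 1982. For decades that paper had only a few dozen citations and nobody paid attention to it; after image generation and Sora came out, its citations started to soar. Maybe working hard at calculus can one day be used to make Netflix movies too.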
0
1
7
@ducha_aiki Definitely "A Discriminatively Trained, Multiscale, Deformable Part Model", CVPR 2008.
0
1
3
@eric_brachmann "Whoever fights monsters should see to it that he does not become a monster in the process. And if you gaze long into an abyss, the abyss also gazes into you." Lol
1
0
3
@ftm_guney Very cool work! A bit disappointed that there is no discussion of unknown unknowns. It would be useful to have an uncertainty measure for that.
0
0
0
Yes, at 8 USD per month, @elonmusk Twitter needs to have at least Markdown/LaTeX support, so that we can tweet code, equations, and more.
0
0
1
Yu Cheng from @MSFTResearch presents the talk titled "Towards Data-Efficient Vision-Language (VL) Models". It covers methods such as FewVLM and Grounded-FewVLM.
0
0
1