xinlu zhang Profile
xinlu zhang

@XZ1023_

Followers
72
Following
14
Statuses
24

Ph.D. @ucsbcs, Bachelor & Master @IUBloomington. Research interests: Instruction tuning, LLM evaluation, LLM prompting (Looking for full-time/internship).

Joined April 2022
@XZ1023_
xinlu zhang
4 months
RT @ZhiyuChen4: 🤖 Can LLMs effectively assist in cognitive behavior therapy (CBT)? 🔗 New paper: We present the fir…
0
6
0
@XZ1023_
xinlu zhang
7 months
Many thanks to my wonderful collaborators: Zhiyu (@ZhiyuChen4), Xi (@xiye_nlp), Xianjun (@Qnolan4), Lichang (@LichangChen2), William Wang (@WilliamWangNLP), and Linda Ruth Petzold.
0
0
2
@XZ1023_
xinlu zhang
7 months
4/🧩 Further analysis revealed that coding data generally provides similar task-specific benefits across model families. While most optimal proportions of coding data are consistent across families, no single proportion enhances all task-specific reasoning abilities.
0
0
1
@XZ1023_
xinlu zhang
7 months
3/🔍 Diving into each domain, we found that coding data uniquely impacts different reasoning abilities. Consistent trends within each domain across model backbones and sizes suggest that the benefits of coding data transfer effectively during the IFT stage.
0
0
1
@XZ1023_
xinlu zhang
7 months
2/ ✨ Overall, we observed a consistent and gradual enhancement in the LLMs' reasoning performance as the proportion of coding data used for fine-tuning increased.
0
0
1
@XZ1023_
xinlu zhang
7 months
1/📊 We created IFT datasets with increasing coding data proportions, fine-tuned six LLM backbones, evaluated performance across twelve tasks in three reasoning domains, and analyzed outcomes from overall, domain-level, and task-specific perspectives.
0
0
1
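A minimal sketch of the kind of data mixing described in this thread. This is illustrative only: the file names, record fields, dataset size, and proportion grid are assumptions, not the paper's actual pipeline.

```python
# Illustrative sketch (not the paper's code) of building IFT mixtures with
# increasing coding-data proportions. File names and record fields are
# hypothetical assumptions.
import json
import random

def build_ift_mixture(general_examples, coding_examples, coding_ratio, total_size, seed=0):
    """Sample a fine-tuning set in which `coding_ratio` of the examples are code-related."""
    rng = random.Random(seed)
    n_code = int(total_size * coding_ratio)
    n_general = total_size - n_code
    mixture = rng.sample(coding_examples, n_code) + rng.sample(general_examples, n_general)
    rng.shuffle(mixture)
    return mixture

if __name__ == "__main__":
    # Assumed JSONL files of {"instruction": ..., "input": ..., "output": ...} records.
    general = [json.loads(line) for line in open("general_ift.jsonl")]
    coding = [json.loads(line) for line in open("coding_ift.jsonl")]
    for ratio in (0.0, 0.25, 0.5, 0.75, 1.0):  # increasing coding proportions
        mix = build_ift_mixture(general, coding, ratio, total_size=10_000)
        with open(f"ift_mix_code{int(ratio * 100)}.jsonl", "w") as f:
            for example in mix:
                f.write(json.dumps(example) + "\n")
```

Each resulting mixture would then be used to instruction-tune a backbone and evaluated on the reasoning tasks, holding everything but the coding proportion fixed.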
@XZ1023_
xinlu zhang
1 year
Check out our medical instruction dataset, MedInstruct-52k, the clinician-crafted evaluation test set, MedInstruct-test, and our models AlpaCare-7B/13B at the GitHub link:
0
0
2
@XZ1023_
xinlu zhang
1 year
@Qnolan4 @LichangChen2 @ZekunLi0323 6/🔔 Takeaway: Our study shows the power of tuning LLMs with diverse & domain-specific instructions. High-quality, diverse data with domain knowledge boosts domain-specific capacity & generalization, even in small quantities.
0
0
1
@XZ1023_
xinlu zhang
1 year
@Qnolan4 @LichangChen2 @ZekunLi0323 5/ 🏥 In-depth comparisons of AlpaCare-13B vs. its 13B instruction-tuned LLM counterparts reveal a consistent edge in medical proficiency and adaptability.
0
0
1
@XZ1023_
xinlu zhang
1 year
@Qnolan4 @LichangChen2 @ZekunLi0323 4/🔍 Evaluation on AlpacaFarm reveals AlpaCare's robust generalization abilities in both medical and general domains. Training with a diverse, domain-specific instruction dataset also enhances generalizability.
0
0
1
@XZ1023_
xinlu zhang
1 year
3/🦙 AlpaCare: leveraging a 7B LLaMA with our 52k med dataset. Evaluations reveal AlpaCare's superior medical prowess, outperforming other 7B instruction-tuned models, even those trained on much larger datasets.
0
0
2
@XZ1023_
xinlu zhang
1 year
@Qnolan4 @LichangChen2 @ZekunLi0323 2/🔍 How do we create a diverse med dataset? Begin with expert-crafted tasks spanning medical topics, types & levels. Use #GPT4 to generate diverse tasks, ensuring depth! Tasks are then fed into #ChatGPT sequentially for detailed outputs. See our seed example:
0
0
1
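A hedged sketch of a self-instruct-style pipeline like the one outlined in the tweet above: seed tasks, GPT-4 task generation, then ChatGPT responses. The prompts, model names, and seed tasks below are hypothetical assumptions for illustration; this is not the released AlpaCare generation code.

```python
# Illustrative sketch: clinician-written seed tasks -> GPT-4 drafts new tasks
# -> ChatGPT writes detailed responses. Prompts and seeds are hypothetical.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def propose_tasks(seed_tasks, n_new=5):
    """Ask GPT-4 to write new, diverse medical tasks from a few seed examples."""
    prompt = (
        "Here are example clinician-written medical tasks:\n"
        + "\n".join(f"- {t}" for t in seed_tasks)
        + f"\nWrite {n_new} new, diverse medical tasks, one per line."
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    lines = resp.choices[0].message.content.splitlines()
    return [line.lstrip("-• ").strip() for line in lines if line.strip()]

def answer_task(task):
    """Ask ChatGPT (gpt-3.5-turbo) for a detailed response to one task."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": task}],
    )
    return resp.choices[0].message.content

if __name__ == "__main__":
    seeds = [
        "Explain when an HbA1c test is indicated.",       # hypothetical seed task
        "List common contraindications for ibuprofen.",   # hypothetical seed task
    ]
    dataset = [{"instruction": t, "output": answer_task(t)} for t in propose_tasks(seeds)]
    print(json.dumps(dataset[:1], indent=2))
```

In this style of pipeline, the expert-written seeds control topic coverage while the generator model supplies scale and surface diversity.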
@XZ1023_
xinlu zhang
1 year
1/📊 Many medical LLMs utilize vast amounts of data, but diversity is key! To fill this gap, we created a dataset of 52k unique med instructions. See our language diversity compared to a baseline below: the left is ours, and the right is one of the baselines.
0
0
1
@XZ1023_
xinlu zhang
2 years
🧵6/6 A big shoutout to our fantastic team of authors: @ShiyangLi6, @Qnolan4, Chenxin Tian, @YaoQinUCSD, and Linda Ruth Petzold! Don't miss out on more insightful results in our paper! 📚🔍
0
0
1
@XZ1023_
xinlu zhang
2 years
🧵 5/6 It also demonstrates impressive generalizability and broad applicability through OOD testing and experiments on two general-domain datasets. 💡
0
0
0
@XZ1023_
xinlu zhang
2 years
🧵4/6 Our method achieves up to a 22.57% increase in absolute accuracy compared to SLM fine-tuning w/o context and sets SOTA results on two medical tasks within privacy-restricted scenarios. 🚀🩺
0
0
0