Multimodal LLM

Motivation:

A VQA dataset is created, and then it is synthesized.

When the model is trained directly, training is split into separate steps.

LLaVA is used for medical instruction tuning, but in the end the dataset is what matters.