LLaVA Visual Question Answering Model
visual-question-answering

Model Description

🌋 LLaVA: Large Language and Vision Assistant

Visual instruction tuning towards large language and vision models with GPT-4 level capabilities.

[Project Page] [Paper]

Visual Instruction Tuning

Haotian Liu*, Chunyuan Li*, Qingyang Wu, Yong Jae Lee (*Equal Contribution)


(Figure: logo image generated by GLIGEN from the prompt "a cute lava llama with glasses" and a box prompt.)

Operating Environment

Install

  1. Clone the original repository
git clone https://github.com/haotian-liu/LLaVA.git
cd LLaVA
  2. Install the package (a quick sanity-check sketch follows these steps)
conda create -n llava python=3.10 -y
conda activate llava
# change into the folder that contains pyproject.toml
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
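
After installation, a quick check can confirm that the package imports and that a GPU is visible. This is only a sketch of a typical sanity check, not part of the official instructions; the llava package name comes from the repository's pyproject.toml.

import torch
import llava  # installed by `pip install -e .` from the LLaVA repository

print("llava package location:", llava.__file__)
print("CUDA available:", torch.cuda.is_available())  # the 13B weights need a GPU with enough memory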

LLaVA Weights

The complete 13B weights required to run the model are provided here; their use must comply with the LLaMA model license.
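
The pipeline in the code example below should download these weights on first use. To pre-fetch them into a local directory instead, the standard ModelScope snapshot_download API can be used; the sketch below assumes the same model_id and revision as the code example, and the cache directory is an arbitrary example.

from modelscope.hub.snapshot_download import snapshot_download

# Pre-download the full 13B weights; cache_dir is an arbitrary example path
local_dir = snapshot_download('xingzi/llava_visual-question-answering',
                              revision='v1.1.0',
                              cache_dir='./llava_weights')
print('Weights downloaded to:', local_dir)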

Code Example

The current implementation only supports a single-turn Q&A session; the interactive CLI is WIP.
This also serves as an example for users to build customized inference scripts.
(To run it, copy the ms_wrapper.py file into the cloned repository directory and execute the code below.)

from modelscope.models import Model
from modelscope.pipelines import pipeline

# Build the 'llava-task' pipeline from the hosted model
model_id = 'xingzi/llava_visual-question-answering'
inference = pipeline('llava-task', model=model_id, model_revision='v1.1.0')

# A single-turn question about an image
image_file = "https://llava-vl.github.io/static/images/view.jpg"
query = "What are the things I should be cautious about when I visit here?"
conv_mode = None  # use the default conversation template
inputs = {'image_file': image_file, 'query': query}

output = inference(inputs, conv_mode=conv_mode)

print(output)

# Note: loading the model may take a few minutes.
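
Because each call is an independent single-turn exchange, the same pipeline object can be reused for further questions. Whether a local file path works in place of a URL depends on how ms_wrapper.py reads the image, so treat the path below as an assumption.

# Reuse the pipeline for another independent single-turn question
# (the local image path is an assumed capability of ms_wrapper.py)
inputs2 = {'image_file': './my_photo.jpg', 'query': 'Describe this image in one sentence.'}
print(inference(inputs2, conv_mode=None))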

Citation

If you find LLaVA useful for your research and applications, please cite using this BibTeX:

@misc{liu2023llava,
      title={Visual Instruction Tuning}, 
      author={Liu, Haotian and Li, Chunyuan and Wu, Qingyang and Lee, Yong Jae},
      publisher={arXiv:2304.08485},
      year={2023},
}

Acknowledgement

  • Vicuna: the codebase we built upon, and our base model Vicuna-13B that has the amazing language capabilities!