
BuboGPT:

Enabling Visual Grounding in Multi-Modal LLMs


Bytedance Inc.   *Equal Contribution   +Project Lead

BuboGPT is an advanced Large Language Model (LLM) that incorporates multi-modal inputs including text, image and audio, with a unique ability to ground its responses to visual objects. It demonstrates remarkable chat abilities for arbitrary image-audio data understanding, whether aligned or unaligned.

Bubo owls are well known for having strong vision and hearing abilities that help them thrive.

Abstract

LLMs have demonstrated remarkable abilities at interacting with humans through language, especially with the use of instruction-following data. Recent advancements in LLMs, such as MiniGPT-4, LLaVA, and X-LLM, further extend these abilities by incorporating multi-modal inputs, including image, video, and speech. Although they are effective at generating precise and detailed descriptions of a given modality signal, these LLMs give up the ability to ground specific parts of the input and thus construct only a coarse-grained mapping. However, explicit and informative correspondence between text and other modalities would not only improve the user experience but also help expand the application scenarios of multi-modal LLMs.

  1. BuboGPT Architecture. We build a multi-modal LLM, BuboGPT, for multi-modal understanding of image, audio and text by learning a common semantic space, and further explore the fine-grained relation between different visual objects and different modalities.
  2. Multimodal Instruct Data. We construct a high-quality multi-modal instruction-tuning dataset that includes fine-grained audio descriptions and cross-modal sound localization, and introduce both positive and negative image-audio pairs for semantic matching to facilitate cross-modal understanding.
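
The positive and negative image-audio pairs mentioned in item 2 can be illustrated with a minimal sketch. This is not the authors' data pipeline: the `Sample` fields, the `build_pairs` helper, and the rule of drawing negatives from samples with a different label are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    audio_id: str  # hypothetical identifier of an audio clip
    image_id: str  # hypothetical identifier of the image paired with that clip
    label: str     # hypothetical sound-source label, e.g. "dog"


def build_pairs(samples, seed=0):
    """Return (audio_id, image_id, is_match) tuples.

    Each sample yields one positive pair (its own audio and image) and,
    when possible, one negative pair (its audio with an image taken from
    a sample that carries a different label).
    """
    rng = random.Random(seed)
    pairs = []
    for s in samples:
        pairs.append((s.audio_id, s.image_id, True))
        # Assumption: an image with a different label is a valid mismatch.
        candidates = [o.image_id for o in samples if o.label != s.label]
        if candidates:
            pairs.append((s.audio_id, rng.choice(candidates), False))
    return pairs


samples = [
    Sample("a1", "i1", "dog"),
    Sample("a2", "i2", "piano"),
    Sample("a3", "i3", "dog"),
]
pairs = build_pairs(samples)
```

Filtering negatives by label avoids pairing a dog bark with a second dog image, which would be labeled unmatched while being semantically matched.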

BuboGPT Architecture

As the figure shows, BuboGPT performs joint multi-modal understanding and chatting for text, vision and audio by learning a shared representation space that aligns well with the pre-trained Vicuna. We also build an off-the-shelf visual grounding pipeline to explore the fine-grained relation between different visual objects and modalities.

The framework of BuboGPT.
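
To make the idea of grounding a response to visual objects concrete, here is a minimal sketch of the final linking step only. It assumes an upstream detector has already produced labeled boxes; the `Region` type, the `ground_response` helper, and the word-matching rule are hypothetical and are not taken from the BuboGPT pipeline.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    label: str  # hypothetical tag produced by an upstream detector
    box: tuple  # (x0, y0, x1, y1) in pixels


def ground_response(response, regions):
    """Link region labels that occur in the response to their boxes.

    Returns a sorted list of (start, end, label, box), where start and end
    are character offsets of the matched word in the response.
    """
    links = []
    for region in regions:
        # Whole-word, case-insensitive match that tolerates a plural "s".
        pattern = r"\b" + re.escape(region.label) + r"s?\b"
        for m in re.finditer(pattern, response, flags=re.IGNORECASE):
            links.append((m.start(), m.end(), region.label, region.box))
    return sorted(links)


regions = [
    Region("dog", (10, 20, 110, 220)),
    Region("frisbee", (150, 40, 200, 80)),
    Region("car", (0, 0, 5, 5)),
]
text = "A dog jumps to catch a red frisbee."
links = ground_response(text, regions)
```

Plain string matching is the simplest possible linking rule. A real system would also need to handle synonyms and multi-word phrases.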

BuboGPT: Training Procedure

BuboGPT connects a Q-Former for each modality to the pre-trained large language model Vicuna through a simple projection matrix. We consider a two-stage instruction-tuning procedure:

  • Stage 1: Single-modal Pre-training. We train the corresponding modality Q-Former and linear projection layer on a large amount of modality-text paired data.
  • Stage 2: Multi-Modal Instruct Tuning. We curate a high-quality multi-modal instruction-following dataset to fine-tune only the linear projection layer:
    • Image-Text: We employ two previously published datasets from MiniGPT-4 and LLaVA for visual instruct tuning.
    • Audio-Text: We build a series of expressive and descriptive data based on the Clotho dataset to facilitate this process.
    • Audio-Image-Text: We build <audio, image, text> triples based on the VGGSS dataset to act as a triple-modality instruction-tuning dataset, and further introduce a negative set to enhance our model.
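
The two stages differ mainly in which components are updated. The sketch below records that schedule as plain data. The component and stage names are hypothetical, and treating the modality encoders and Vicuna as frozen in both stages is an assumption that the text above does not state explicitly.

```python
# Components of the model, in data-flow order (names are hypothetical).
COMPONENTS = ("modality_encoder", "q_former", "linear_projection", "vicuna_llm")

# Components that receive gradient updates in each stage, per the list above.
# Assumption: anything not listed for a stage stays frozen.
TRAINABLE = {
    "stage1_single_modal_pretraining": {"q_former", "linear_projection"},
    "stage2_multimodal_instruct_tuning": {"linear_projection"},
}


def requires_grad(stage):
    """Map every component to True (updated) or False (frozen) for a stage."""
    if stage not in TRAINABLE:
        raise ValueError(f"unknown stage: {stage!r}")
    return {name: name in TRAINABLE[stage] for name in COMPONENTS}


for stage in TRAINABLE:
    print(stage, requires_grad(stage))
```

In a deep-learning framework, the same mapping would typically be applied by toggling the gradient flag on each component's parameters before building the optimizer.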


Examples on Fine-grained Visual Understanding

We first consider using a single image as input for fine-grained visual understanding with grounding. As the examples show, the model can accurately associate textual words or phrases with image regions in scenarios of varying complexity.


Examples on Audio Understanding

When a single audio clip is provided for audio understanding, BuboGPT gives informative descriptions covering nearly all acoustic parts of the clip, even when some audio fragments are too short for humans to notice. See the examples for details.


Examples on Aligned Audio-Image Understanding

We show that BuboGPT can perform sound localization when a matched audio-image pair is provided, which is a clear example of aligned audio-image understanding. See the examples for details.


Examples on Arbitrary Audio-Image Understanding

BuboGPT can also tell whether the image and audio are relevant to each other and generate a high-quality response for arbitrary audio-image understanding. See the examples for details.

BibTeX


  @article{zhao2023bubogpt,
    author      = {Yang Zhao and Zhijie Lin and Daquan Zhou and Zilong Huang and Jiashi Feng and Bingyi Kang},
    title       = {BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs},
    publisher   = {arXiv:2307.08581},
    year        = {2023}
  }
  