This article discusses the multimodal large model Cambrian-1. The authors' team has representative work in computer vision pre-training, and the paper focuses on whether visual-side features are learned well.
This is a discussion topic separated from the original topic at https://juejin.cn/post/7399578293953380391