@_akhaliq@x.good.news

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding

We introduce InternVideo2, a new video foundation model (ViFM) that achieves state-of-the-art performance in action recognition, video-text tasks, and video-centric dialogue. Our approach ...
