New-generation general vision technology system "Scholar" officially released

Shanghai Artificial Intelligence Laboratory, SenseTime, the Chinese University of Hong Kong, and Shanghai Jiao Tong University have jointly released "Scholar" (INTERN), a new-generation general vision technology system. It aims to systematically address a series of bottlenecks in today's AI vision field, including task generalization, scene generalization, and data efficiency. The technical report, "INTERN: A New Learning Paradigm Towards General Vision", is now available on arXiv. OpenGVLab, an open-source general vision platform built on "Scholar", is scheduled to be officially open-sourced at the beginning of next year, making public its usage paradigm, data system, evaluation benchmarks, and more.

According to the technical report, a single "Scholar" base model can cover all four core vision tasks: classification, object detection, semantic segmentation, and depth estimation.

The Shanghai Artificial Intelligence Laboratory stated that, compared with the strongest open-source model currently available (OpenAI's CLIP, released in 2021), "Scholar" achieves significant improvements in both accuracy and data efficiency. Specifically, given the same downstream scene data, "Scholar" reduces the average error rate across 26 datasets spanning the four tasks of classification, object detection, semantic segmentation, and depth estimation by 40.2%, 47.3%, 34.8%, and 9.4%, respectively.

The general vision technology system "Scholar" (INTERN) consists of seven modules: three infrastructure modules (a general vision data system, a general vision network architecture, and a general vision evaluation benchmark) and four training-stage modules that distinguish between upstream and downstream training.
