Chinese artificial intelligence pioneer SenseTime unveiled what it called the "largest multimodal open-source large-language model" on Tuesday, amid the latest AI wave triggered by ChatGPT.
The model is the latest push by SenseTime, and by China at large, to upgrade AI technology so that it can be applied in more sectors.
Named Intern 2.5, the model was jointly developed by SenseTime, Shanghai Artificial Intelligence Laboratory, Tsinghua University, the Chinese University of Hong Kong and Shanghai Jiao Tong University.
Boasting 3 billion parameters, Intern 2.5 is the largest and most accurate on ImageNet among the world's open-source models, and it is the only model to exceed 65.0 mAP (mean average precision) on the object detection benchmark dataset COCO, SenseTime said.
The ImageNet project is a large visual database designed for use in visual object recognition software research.
The model's cross-modal open-task processing ability can provide efficient and accurate perception and understanding support for general scenarios such as autonomous driving and robots, SenseTime added.
Intern 2.5, a higher-level visual system with universal scene perception and complex problem-solving capabilities, achieves this by defining tasks through text, so that the task requirements of different scenarios can be specified flexibly.
Given visual images and task prompts, it can return instructions or answers, giving it advanced perception and complex problem-solving abilities in general scenarios such as image description, visual question-answering, visual reasoning and text recognition, the company added.