China Telecom will open source the TeleChat-12B star semantic large model

FTT World - April 16, 2024

China Telecom has open sourced TeleChat-12B, the 12-billion-parameter version of its Star semantic large model. The release is part of a broader initiative: the company plans to open source a model with hundreds of billions of parameters within the year.

Compared with the 7B version open sourced in January, TeleChat-12B is reported to be about 30% better overall across content, performance, and applications. Multi-round reasoning and security show the largest gains, at more than 40%.

TeleChat-12B doubles the training data of the 7B version, from 1.5T tokens to 3T tokens. Together with optimized data cleaning and annotation strategies, this has greatly improved data quality. The team also continues to build task-specific SFT (supervised fine-tuning) data and to refine its data construction specifications.

The model structure has also changed. Small-scale models were used to try out combinations of several candidate structures, and the best-performing one was selected. TeleChat-12B adopts a structure in which the word embedding layer and the output layer are decoupled (they no longer share weights), which improves training stability and convergence.
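To make the difference between shared and decoupled layers concrete, here is a minimal sketch in plain Python. The class name TinyLM, its sizes, and its initialization are hypothetical and chosen only for illustration; this is not TeleChat code. With tied weights the output layer reuses the embedding matrix, while with decoupled weights it gets its own matrix, which doubles the parameter count of these two layers.

```python
import random


class TinyLM:
    """Toy model with only an embedding table and an output projection (illustrative)."""

    def __init__(self, vocab_size, hidden_size, tie_weights, seed=0):
        rng = random.Random(seed)
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        # Input embedding: one vector of length hidden_size per token.
        self.embed = [
            [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
            for _ in range(vocab_size)
        ]
        if tie_weights:
            # Tied: the output layer is the very same matrix object.
            self.out = self.embed
        else:
            # Decoupled: the output layer has its own, independently trained matrix.
            self.out = [
                [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
                for _ in range(vocab_size)
            ]

    def logits(self, token_id):
        """Score every vocabulary entry against the embedding of token_id."""
        hidden = self.embed[token_id]
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.out]

    def num_parameters(self):
        per_matrix = self.vocab_size * self.hidden_size
        return per_matrix if self.out is self.embed else 2 * per_matrix


tied = TinyLM(vocab_size=10, hidden_size=4, tie_weights=True)
untied = TinyLM(vocab_size=10, hidden_size=4, tie_weights=False)
print(tied.num_parameters(), untied.num_parameters())
```

In the decoupled case, a gradient update to the output matrix leaves the input embeddings untouched, which is the property the article refers to.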

The training data for TeleChat-12B covers a wide range of Chinese and English sources, including books, encyclopedias, news, government affairs, law, medicine, patents, papers, mathematics, and code. The optimized cleaning strategy has greatly improved the data's cleanliness, lack of bias, content validity, and format consistency.

For training, China Telecom uses data mixing (choosing the proportion of each data set) together with curriculum learning. Small models are first trained on several different data mixtures to obtain prior estimates of how difficult each data set is to learn. During training, the loss on every data set and the generation quality on the evaluation set are assessed automatically, and the weight of the data sets that are harder to learn is increased dynamically.
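The article does not give the actual weighting rule, so the following is only a sketch of one way such dynamic re-weighting could work. The function name update_mixture_weights, the exponential update, and the step_size parameter are assumptions made for illustration, not China Telecom's method. Data sets whose current loss is above the average receive a larger share of the sampling weight, and the weights are renormalized to sum to 1.

```python
import math


def update_mixture_weights(weights, losses, step_size=0.5):
    """Shift sampling weight toward data sets with higher current loss.

    weights: dict mapping data set name -> current sampling weight (sums to 1)
    losses:  dict mapping data set name -> latest evaluation loss
    The exponential rule used here is an illustrative assumption.
    """
    mean_loss = sum(losses.values()) / len(losses)
    scaled = {
        name: weights[name] * math.exp(step_size * (losses[name] - mean_loss))
        for name in weights
    }
    total = sum(scaled.values())
    return {name: value / total for name, value in scaled.items()}


# Hypothetical data sets and losses, for illustration only.
weights = {"news": 0.5, "code": 0.5}
losses = {"news": 1.0, "code": 3.0}
new_weights = update_mixture_weights(weights, losses)
print(new_weights)
```

Here "code" has the higher loss, so its weight rises above 0.5 and the weight of "news" falls accordingly.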

The open-source release includes the base model and dialogue models built on the corresponding versions. It supports traditional full-parameter updates as well as efficient fine-tuning methods such as LoRA, which update only a small number of parameters. It also supports DeepSpeed fine-tuning, int8 and int4 quantization, and domestic (Chinese-made) chips, which helps advance the localization of large models.
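As a rough illustration of why LoRA updates only a small number of parameters, the sketch below implements a single LoRA-adapted linear layer in plain Python. The class LoRALinear and helper matvec are hypothetical names written for this example and do not come from the TeleChat repository or any library. The frozen weight W is left unchanged, and the trainable part is a low-rank product B·A scaled by alpha / rank.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Linear layer y = W x + (alpha / rank) * B (A x), with W frozen (illustrative)."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen pretrained weight, shape (d_out, d_in)
        # A starts with small random values, B starts at zero, so the
        # adapted layer initially behaves exactly like the base layer.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.b, matvec(self.a, x))
        return [y + self.scale * u for y, u in zip(base, update)]

    def trainable_parameters(self):
        # Only A and B are trained: rank * (d_in + d_out) values in total.
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])


weight = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
layer = LoRALinear(weight, rank=1)
print(layer.forward([1.0, 1.0, 1.0]), layer.trainable_parameters())
```

For this toy 2x3 matrix the saving is negligible, but the trainable count grows as rank * (d_in + d_out) rather than d_in * d_out, which is what makes the approach economical for large layers.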

With TeleChat-12B released and a larger model planned, China Telecom is positioning itself as a significant contributor to open-source semantic large models, with gains in performance and a broader range of applications.
