China’s new censorship system leaked

by nativetechdoctor

A recently leaked dataset indicates that the Chinese government is developing an advanced censorship system built on large language models (LLMs). The system targets not only traditionally sensitive topics but also a broader range of issues, including rural poverty, police corruption, and matters related to the country's leadership.

Discovered by security researcher NetAskari, the leaked dataset is approximately 300GB in size and shows how China’s LLM classifies various types of information. It was found in an unsecured Elasticsearch database operated by tech company Baidu, with the most recent entries dating from December 2024.

The dataset comprises around 133,000 entries and includes references to terms such as “eb35” and “eb_speedpro,” indicating it may serve as a training dataset for Baidu’s AI chatbot, Ernie Bot. NetAskari suggests that this dataset is instrumental in developing an “advanced AI system” intended to automatically flag sensitive content for the Chinese government.

The topics marked for censorship in the dataset include complaints about rural poverty, news about corruption, and posts about the coercion of businesses. Political, social, and military matters are prioritized for flagging. The term “Taiwan” alone appears more than 15,000 times, underscoring China’s close attention to political developments concerning Taiwan.

Xiao Zhang, a security researcher at the University of California, Berkeley, said the dataset provides compelling evidence that the Chinese government intends to use LLM technology to strengthen its content censorship. Earlier methods relied mainly on basic algorithms that blocked specific banned words, whereas LLMs can identify nuanced criticism, which makes censorship more effective.
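To illustrate the limitation of the older approach described above, here is a minimal sketch of a banned-word filter. It is not taken from the leaked dataset or from any real system: the function name `keyword_flag`, the blocklist terms, and the sample sentences are all hypothetical, chosen only to show why word matching misses indirect criticism.

```python
# Hypothetical blocklist; these terms are illustrative, not from the dataset.
BLOCKLIST = {"corruption", "bribe"}


def keyword_flag(text: str, blocklist: set[str] = BLOCKLIST) -> bool:
    """Return True if any blocklisted term appears as a whole word in text."""
    # Split on whitespace, strip surrounding punctuation, and lowercase.
    words = {w.strip(".,!?;:\"'").lower() for w in text.split()}
    return not words.isdisjoint(blocklist)


# A post that names the issue directly contains a blocklisted word.
direct = "The local police chief took a bribe."

# A post that implies the same thing contains no blocklisted word.
indirect = "Funny how the chief's new villa appeared right after the road contract."

print(keyword_flag(direct))    # the direct wording is caught
print(keyword_flag(indirect))  # the indirect wording is not
```

The second post makes the same accusation without using any listed word, so a word-matching filter lets it through. That gap is what a model that reads for meaning rather than vocabulary is meant to close.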
