Legal regulation and enhancement path for mitigating risks in training
Xing Luyuan1, Shen Xinyi2, Wang Jiayi3
1 School of Law, Nanjing University, Nanjing 210046, China; 2 School of Law, London School of Economics and Political Science, London WC2A 2AE, England; 3 School of Arts and Sciences, Northeast Agricultural University, Harbin 150030, China
Abstract: This article discusses the legal risks and regulatory issues that arise from the training data of generative artificial intelligence such as ChatGPT. It first analyzes problems relating to data sources, discriminatory tendencies, data quality, and security risks in generative AI. It then compares the Chinese and European legal systems and proposes clearly defining governance principles and developing comprehensive pathways for data compliance. Finally, it offers specific, practice-oriented recommendations for improving China's current legal regulation. These suggestions are intended as a reference for the healthy development and legal regulation of generative artificial intelligence.
Keywords: generative AI; Artificial Intelligence Act; training data risks; data compliance
Training Data Risks in Generative Artificial Intelligence
Unlike earlier models that could only classify, predict, or perform a specific function, Large Generative AI Models (LGAIMs) are trained to generate new content such as text, images, or audio, and they exhibit strong emergent properties and generalization ability [1]. The training data is represented as a probability distribution, so LGAIMs can learn the patterns and relationships in that data on their own and can generate content that lies outside the training dataset [2]. In addition, the data produced through human-computer interaction between LGAIMs and their users is fed back into the iterative training of the models. Developers of LGAIMs therefore typically rely on publicly available internet data and on user interaction data as training data, and both kinds of data can carry numerous compliance risks, such as data source risks, discrimination risks, and quality risks.