abusesaffiliationarrow-downarrow-leftarrow-rightarrow-upattack-typeburgerchevron-downchevron-leftchevron-rightchevron-upClock iconclosedeletedevelopment-povertydiscriminationdollardownloademailenvironmentexternal-linkfacebookfiltergenderglobegroupshealthC4067174-3DD9-4B9E-AD64-284FDAAE6338@1xinformation-outlineinformationinstagraminvestment-trade-globalisationissueslabourlanguagesShapeCombined Shapeline, chart, up, arrow, graphLinkedInlocationmap-pinminusnewsorganisationotheroverviewpluspreviewArtboard 185profilerefreshIconnewssearchsecurityPathStock downStock steadyStock uptagticktooltiptwitteruniversalityweb

這頁面沒有繁體中文版本,現以English顯示

文章

9 四月 2024

作者:
Christo Buschek and Jer Thorp, A Knowing Machines Project

"A Knowing Machines Project" unpacks layers of concerns with AI datasets, including the prevalence of Child Sexual Abuse Material

"Models all the way down"

If you want to make a really big AI model — the kind that can generate images or do your homework, or build this website, or fake a moon landing — you start by finding a really big training set. Images and words, harvested by the billions from the internet, material to build the world that your AI model will reflect back to you...

...In December, researchers from Stanford's Internet Observatory identified more than 1,000 images categorized as Child Sexual Abuse Material (CSAM) in one of the most influential AI training sets of the moment: LAION-5B...

...LAION-5B is a really big, open-source dataset of images and text captions scraped from the internet, designed for large AI models...

...The stated goal of the project to create LAION-5B was to conduct basic research into dataset curation. Specifically, its authors wanted to create an image training set with purely automated methods - with no humans in the mix. The resulting "hands-off" dataset has been used in hundreds of academic projects. The paper announcing LAION-5B has been cited 1,331 times...

...Midjourney and Stable Diffusion, two large models for which some of the data sources are known, are both trained in part on LAION-5B. It’s likely that many other commercial models - perhaps hundreds - have been trained on the set. Models that power chat bots and image generators and have hundreds of thousands of users...

...LAION-5B has, since the CSAM findings in December, been unavailable for download. The developers say they are working on remediating it.

隱私資訊

本網站使用 cookie 和其他網絡存儲技術。您可以在下方設置您的隱私選項。您所作的更改將立即生效。

有關我們使用網絡儲存技術的更多資訊,請參閱我們的 數據使用和 Cookie 政策

Strictly necessary storage

ON
OFF

Necessary storage enables core site functionality. This site cannot function without it, so it can only be disabled by changing settings in your browser.

分析cookie

ON
OFF

您瀏覽本網頁時我們將以Google Analytics收集信息。接受此cookie將有助我們理解您的瀏覽資訊,並協助我們改善呈現資訊的方法。所有分析資訊都以匿名方式收集,我們並不能用相關資訊得到您的個人信息。谷歌在所有主要瀏覽器中都提供退出Google Analytics的添加應用程式。

市場營銷cookies

ON
OFF

我們從第三方網站獲得企業責任資訊,當中包括社交媒體和搜尋引擎。這些cookie協助我們理解相關瀏覽數據。

您在此網站上的隱私選項

本網站使用 cookie 和其他網絡儲存技術來增強您在必要核心功能之外的體驗。