Article

9 April 2024

Author:
Christo Buschek and Jer Thorp, A Knowing Machines Project

"A Knowing Machines Project" unpacks layers of concerns with AI datasets, including the prevalence of Child Sexual Abuse Material

"Models all the way down"

If you want to make a really big AI model — the kind that can generate images or do your homework, or build this website, or fake a moon landing — you start by finding a really big training set. Images and words, harvested by the billions from the internet, material to build the world that your AI model will reflect back to you...

...In December, researchers from Stanford's Internet Observatory identified more than 1,000 images categorized as Child Sexual Abuse Material (CSAM) in one of the most influential AI training sets of the moment: LAION-5B...

...LAION-5B is a really big, open-source dataset of images and text captions scraped from the internet, designed for large AI models...

...The stated goal of the project to create LAION-5B was to conduct basic research into dataset curation. Specifically, its authors wanted to create an image training set with purely automated methods - with no humans in the mix. The resulting "hands-off" dataset has been used in hundreds of academic projects. The paper announcing LAION-5B has been cited 1,331 times...

...Midjourney and Stable Diffusion, two large models for which some of the data sources are known, are both trained in part on LAION-5B. It’s likely that many other commercial models - perhaps hundreds - have been trained on the set. Models that power chat bots and image generators and have hundreds of thousands of users...

...LAION-5B has, since the CSAM findings in December, been unavailable for download. The developers say they are working on remediating it.
