Hannah Rose Kirk (@hannahrosekirk): So DALL·E 2 has made a pretty big splash in the community (credit to #dalle2 for my imagined depiction)...BUT to limit the water damage, we have to think not only about the strengths of this model but also its remaining gaps. 1/ https://twitter.com/hannahrosekirk/status/1512505997893554187/photo/1
Hannah Rose Kirk (@hannahrosekirk): I’ve been working for the past month on ‘red-teaming’ #dalle2 - a process where dedicated folks attack the model's vulnerabilities to find cracks in the system. For full details, you can read about our efforts in the @OpenAI system card https://github.com/openai/dalle-2-preview/blob/main/system-card.md#contributors 2/
Hannah Rose Kirk (@hannahrosekirk): Transparency is key and I wholeheartedly support, endorse, praise, encourage, sing from the rooftops about @OpenAI's efforts to try to scrutinise the model prior to its public release, and openly present its remaining weaknesses...BUT transparency isn't a silver bullet. 3/
Hannah Rose Kirk (@hannahrosekirk): We know biases remain, from (1) problematic associations when identity attributes are queried and (2, just as important) the invisibility of communities and peoples when they are not. 4/
Hannah Rose Kirk (@hannahrosekirk): In addition to societal biases, with an attacker's mindset, “visual synonyms” can bypass the filters in place to generate disturbing, harmful or sexually objectifying imagery: “blood” (a filtered keyword) becomes “tomato ketchup”. 5/
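To make the "visual synonym" point concrete, here is a minimal sketch that assumes the filter is a plain keyword blocklist on the prompt text. The thread does not describe how the actual #dalle2 filter works, so `BLOCKED_KEYWORDS`, `passes_keyword_filter` and both prompts are hypothetical; only the "blood" / "tomato ketchup" pair comes from the tweet above.

```python
# Hypothetical blocklist: the real filter's contents and logic are not
# described in the thread, so this is an assumption for illustration only.
BLOCKED_KEYWORDS = {"blood"}


def passes_keyword_filter(prompt: str) -> bool:
    """Return True if no blocked keyword appears as a word in the prompt."""
    # Lowercase each word and strip surrounding punctuation before comparing.
    words = {word.strip(".,!?\"'").lower() for word in prompt.split()}
    return words.isdisjoint(BLOCKED_KEYWORDS)


direct_prompt = "a splash of blood on the floor"
synonym_prompt = "a splash of tomato ketchup on the floor"

print(passes_keyword_filter(direct_prompt))   # blocked keyword is present
print(passes_keyword_filter(synonym_prompt))  # same visual idea, no blocked keyword
```

A check like this only sees the words in the prompt, not the image they describe, which is why the thread argues for safeguards on the generations as well as on the prompts.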
Hannah Rose Kirk (@hannahrosekirk): Investigating harmful visual synonyms/adversarial attacks on Text2Image models needs collaborative research from NLP and CV communities to design better safeguards on both prompts themselves and the generations they produce (side note: HMU if ur interested in this problem!) 6/
Hannah Rose Kirk (@hannahrosekirk): It's been great to be involved in the red-teaming efforts so far but the exposures of bias and adversarial attacks we’ve made in the past month on #dalle2 are only the tip of the iceberg - it will be and should be a continual process of auditing the model 7/
Hannah Rose Kirk (@hannahrosekirk): That’s not even starting on the societal harm from #dalle2's generations putting the jobs of creatives, illustrators and graphic designers at risk. In recent days, Twitter may have championed what DALL·E can add to creativity, but start asking from whom it takes opportunities away 8/
Hannah Rose Kirk (@hannahrosekirk): So yes! #dalle2 has made a splash and deservedly so - its generations are cute, creative and sometimes astonishingly beautiful. But let’s also ask how we see it fitting into our society, how it impacts different communities and what safeguards can continue to be developed. End/
Hannah Rose Kirk (@hannahrosekirk): Concluding note: I abdicate responsibility for any spelling or grammatical errors in the above... Twitter threads stress me out - it's like a never-ending race against time to finish what you want to say and remember what number tweet ur at 😮‍💨😮‍💨😮‍💨
Hannah Rose Kirk (@hannahrosekirk): Okay my journalist extraordinaire sister @izkirk has just informed me you can draft a whole thread of tweets and post them at once 😬😬😬😬😬😬 next time I’ll be less of an “I’m going to do a hot takes thread” newbie 😇😇😇