Measuring Success Without Undermining Others: Reflections on AI Content Creation Communities
In the realm of AI content creation, collaborative growth and mutual respect are key to fostering innovation and shared success. However, recent interactions on a Discord server dedicated to AI development, particularly surrounding the release of Stable Diffusion 3 (SD3), have highlighted a concerning trend: the tendency to disparage those with differing opinions.
The Discord server, whose owner was significantly involved in the development of SD3, has seen an uptick in divisive rhetoric. This is especially disheartening considering the owner's initial reputation for kindness and humility in the AI content creation community. While the owner did not directly engage in name-calling, a vocal segment of the community began labeling critics of SD3 as "Pony Cultists," drawing lines that echo past divides seen with the "NAI LEAK" incident.
SD3's Promises and Reality
Stable Diffusion 3, touted as Stability AI’s most advanced text-to-image model yet, has faced criticism for its inconsistencies, particularly in generating human anatomy. Despite claims of being a "MidJourney Killer," users have reported issues like distorted faces and unrealistic features. These imperfections are not unusual for new AI releases and should be addressed constructively rather than used as a basis for community discord.
Key Takeaways from Stability AI
SD3 Medium is designed for consumer PCs and laptops as well as enterprise-tier GPUs.
Weights are available under an open non-commercial license and a low-cost Creator License.
For large-scale commercial use, licensing details are available upon request.
Stability AI emphasizes SD3’s capabilities in typography and prompt adherence, using a new Multimodal Diffusion Transformer (MMDiT) architecture.
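To make these claims concrete, here is a minimal sketch of loading SD3 Medium through Hugging Face's diffusers library and testing the advertised typography and prompt adherence. The model ID, step count, and guidance value are assumptions based on the public diffusers integration rather than anything prescribed in this article, and the gated repository requires accepting Stability's non-commercial license before downloading.

```python
# Minimal sketch: generate an image with SD3 Medium via Hugging Face diffusers.
# Assumes diffusers >= 0.29, a CUDA-capable GPU, and that the gated model
# repository's non-commercial license has already been accepted.
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype=torch.float16,  # half precision to reduce VRAM usage
).to("cuda")

# A typography-heavy prompt to exercise the MMDiT model's claimed strength
# at rendering legible text and following detailed prompts.
image = pipe(
    prompt='A storefront sign that reads "OPEN LATE", neon letters, night photo',
    num_inference_steps=28,
    guidance_scale=7.0,
).images[0]
image.save("sd3_medium_sign.png")
```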
However, the release of SD3 Medium has been ridiculed online, particularly for its failures in rendering human anatomy. A Reddit thread titled "Is this release supposed to be a joke? [SD3-2B]" details the model's spectacular failures at rendering humans, especially limbs. Another thread, "Why is SD3 so bad at generating girls lying on the grass?" shows similar issues with entire human bodies.
Community Concerns and Comparisons
The issues with SD3 are reminiscent of past Stability AI releases, such as SD 2.0, which also suffered from problems depicting humans accurately. AI researchers have found that censoring adult content from training datasets can severely hamper a model's ability to generate accurate human anatomy. Although Stability AI addressed some of these issues with subsequent releases like SD 2.1 and SDXL, similar problems have resurfaced with SD3.
A Reddit user summed up the frustration: "It wasn't too long ago that StableDiffusion was competing with Midjourney, now it just looks like a joke in comparison. At least our datasets are safe and ethical!" This reflects a broader sentiment that while Stability AI focuses on ethical dataset curation, it compromises on the technical performance of its models.
The Reality of Running SD3
Stability AI claims that SD3 Medium requires only 5GB of GPU VRAM, making it accessible to a wide range of consumer PCs. In practice, this is misleading: users generally need an NVIDIA GPU with CUDA support, and at least 12GB of system RAM to run a typical user interface comfortably. The promise of accessibility contrasts sharply with the actual requirements, leading to frustration among users who cannot run the model as easily as advertised.
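The gap between the advertised figure and real-world behavior can be narrowed somewhat with the memory options the diffusers library documents for SD3, though the exact savings depend on your setup. The sketch below is illustrative only: dropping the T5 text encoder and offloading sub-models to system RAM are documented library features, not a recipe Stability endorses, and the prompt is just a test case.

```python
# Illustrative sketch of reducing SD3 Medium's VRAM footprint with diffusers.
# Actual memory use depends on resolution, precision, and which text encoders
# are loaded; a CUDA-capable NVIDIA GPU is still assumed.
import torch
from diffusers import StableDiffusion3Pipeline

assert torch.cuda.is_available(), "SD3 Medium effectively requires a CUDA GPU"

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype=torch.float16,
    text_encoder_3=None,  # drop the large T5 text encoder to cut VRAM,
    tokenizer_3=None,     # at the cost of some prompt adherence
)
# Stream sub-models between system RAM and the GPU instead of keeping the
# whole pipeline resident in VRAM; this is where ample system RAM matters.
pipe.enable_model_cpu_offload()

image = pipe("photo of a woman lying on the grass", num_inference_steps=28).images[0]
image.save("sd3_grass_test.png")
```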
Addressing Community Disparagement
My approach to content creation has always been rooted in collective success. Learning from and supporting each other, regardless of personal preferences or affiliations, is crucial. The recent disparagement directed at the creators and users of models like Pony Diffusion XL V6—an innovative and versatile model capable of producing a wide range of artistic content—undermines this collaborative spirit. It's important to recognize that each model has its strengths and limitations, and the contributions of diverse creators enrich the entire AI community.
The echo chamber created by uncritical praise and the absence of updates from some creators does a disservice to the field. Success should not be measured solely by association with a particular project or figure. True success lies in continuous improvement, openness to feedback, and the willingness to learn from all members of the community.
Additional Insights: Stability AI’s Technical and Organizational Challenges
SD3's Release and Community Reaction
On Wednesday, June 12, 2024, Stability AI released the weights for Stable Diffusion 3 Medium, an AI image-synthesis model that turns text prompts into AI-generated images. Its arrival has been ridiculed online, however, because it generates images of humans in a way that seems like a step backward from other state-of-the-art image-synthesis models like MidJourney or DALL-E 3. As a result, it can churn out wild, anatomically incorrect visual abominations with ease.
A thread on Reddit, titled, "Is this release supposed to be a joke? [SD3-2B]," details the spectacular failures of SD3 Medium at rendering humans, especially human limbs like hands and feet. Another thread, titled, "Why is SD3 so bad at generating girls lying on the grass?" shows similar issues, but for entire human bodies.
Hands have traditionally been a challenge for AI image generators due to a lack of good examples in early training datasets, but more recently, several image-synthesis models seem to have overcome the issue. In that sense, SD3 appears to be a huge step backward for the image-synthesis enthusiasts who gather on Reddit, especially compared to recent Stability releases like SDXL Turbo in November 2023.
"It wasn't too long ago that StableDiffusion was competing with Midjourney, now it just looks like a joke in comparison. At least our datasets are safe and ethical!" wrote one Reddit user.
AI image fans are so far blaming Stable Diffusion 3's anatomy failures on Stability's insistence on filtering out adult content (often called "NSFW" content) from the SD3 training data that teaches the model how to generate images. "Believe it or not, heavily censoring a model also gets rid of human anatomy, so... that's what happened," wrote one Reddit user in the thread.
Basically, any time a user prompt homes in on a concept that isn't represented well in the AI model's training dataset, the image-synthesis model will confabulate its best interpretation of what the user is asking for. And sometimes that can be completely terrifying.
Historical Context and Technical Issues
The release of Stable Diffusion 2.0 in 2022 suffered from similar problems in depicting humans well, and AI researchers soon discovered that censoring adult content containing nudity can severely hamper an AI model's ability to generate accurate human anatomy. Stability AI later reversed course with SD 2.1 and SDXL, regaining some of the abilities lost to aggressive NSFW filtering.
Another issue that can occur during model pre-training is that sometimes the NSFW filter researchers use to remove adult images from the dataset is too picky, accidentally removing images that might not be offensive and depriving the model of depictions of humans in certain situations. "[SD3] works fine as long as there are no humans in the picture, I think their improved nsfw filter for filtering training data decided anything humanoid is nsfw," wrote one Redditor on the topic.
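Stability's actual data-curation code is not public, so the following is a purely hypothetical sketch of the failure mode described above: a safety classifier (the `nsfw_score` field here is a stand-in, not a real model) combined with an aggressive threshold quietly removes benign photos of people along with the explicit ones, leaving the model with too few examples of ordinary human poses.

```python
# Hypothetical illustration of over-aggressive dataset filtering; none of the
# names here correspond to Stability AI's real pipeline.
from dataclasses import dataclass

@dataclass
class ImageRecord:
    path: str
    nsfw_score: float     # 0.0 = clearly safe, 1.0 = clearly explicit (stand-in)
    contains_person: bool

def filter_dataset(records, threshold):
    """Keep only images whose safety score falls below `threshold`."""
    return [r for r in records if r.nsfw_score < threshold]

# Safety classifiers tend to score any picture of a person higher than a
# landscape, even fully clothed portraits or someone lying on the grass.
dataset = [
    ImageRecord("mountain.jpg",         nsfw_score=0.02, contains_person=False),
    ImageRecord("clothed_portrait.jpg", nsfw_score=0.35, contains_person=True),
    ImageRecord("woman_on_grass.jpg",   nsfw_score=0.45, contains_person=True),
    ImageRecord("explicit.jpg",         nsfw_score=0.95, contains_person=True),
]

for threshold in (0.9, 0.3):
    kept = filter_dataset(dataset, threshold)
    people = sum(r.contains_person for r in kept)
    print(f"threshold={threshold}: kept {len(kept)} images, {people} with people")
# At 0.9 the benign human photos survive; at 0.3 only the landscape remains,
# so the resulting model never sees enough ordinary human anatomy.
```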
Using a free online demo of SD3 on Hugging Face, we ran prompts and saw similar results to those being reported by others. For example, the prompt "a man showing his hands" returned an image of a man holding up two giant-sized backward hands, although each hand at least had five fingers.
Stability's Organizational Challenges
Stability announced Stable Diffusion 3 in February and plans to make it available in various model sizes. The June release covers the "Medium" version, a 2-billion-parameter model. In addition to being downloadable from Hugging Face, the weights are available for experimentation through the company's Stability Platform, and they are free to download and use under a non-commercial license only.
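For readers who want the raw checkpoint files rather than the diffusers pipeline shown earlier, a small sketch follows; the repository name is an assumption based on Stability's published Hugging Face listing, and the gated download still requires accepting the non-commercial license and logging in with an access token.

```python
# Small sketch: fetch the raw SD3 Medium checkpoint files from Hugging Face.
# Requires `pip install huggingface_hub`, license acceptance on the model page,
# and prior authentication via `huggingface-cli login`.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="stabilityai/stable-diffusion-3-medium",  # assumed raw-checkpoint repo
    allow_patterns=["*.safetensors", "*.json"],       # skip example images and extras
)
print("SD3 Medium files downloaded to:", local_dir)
```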
Soon after the February announcement, delays in releasing the SD3 model weights inspired rumors that the release was being held back by technical issues or mismanagement. Stability AI itself fell into a tailspin this year with the resignation of its founder and CEO, Emad Mostaque, in March, followed by a series of layoffs. Just prior to that, three key engineers, Robin Rombach, Andreas Blattmann, and Dominik Lorenz, left the company. The company's troubles go back even further, with news of its dire financial position circulating since 2023.
To some Stable Diffusion fans, the failures with Stable Diffusion 3 Medium are a visual manifestation of the company's mismanagement—and an obvious sign of things falling apart. Although the company has not filed for bankruptcy, some users made dark jokes about the possibility after seeing SD3 Medium: "I guess now they can go bankrupt in a safe and ethically [sic] way, after all."
Contentious Community Dynamics
In the broader AI content creation community, there has been considerable anger directed at a particular developer over their involvement with SD3. The developer had previously merged another project's work into their own model, released the result as a new model, and claimed to have trained everything themselves. It was quickly discovered that the other project's work had in fact been integrated into the new model, and the revelation has fueled further tension and mistrust within the community.
Crowdfunding and Future Models
One Reddit user suggested that the community could crowdfund a new base model. They reasoned that if video games can raise millions of dollars through crowdfunding, a few hundred thousand dollars could plausibly fund an uncensored equivalent of SD3. With over 500,000 members in the Stable Diffusion subreddit, an average contribution of just $1 per person could support the development of a high-quality model, possibly even multiple models per year. The user proposed that a reputable platform such as Civitai could host the effort, much as Patreon or Kickstarter do for other creative projects.
Conclusion: Measuring Success Without Undermining Others
The AI content creation community thrives on diversity and shared knowledge. Disparaging others for their preferences or criticisms undermines the collective growth we should all strive for. Let's measure success not by tearing others down, but by lifting each other up, learning from our differences, and working together to advance the field. This, my friends, is how you begin measuring success without undermining others.
References
Analytics Insight. (n.d.). Stable Diffusion NSFW and its alternatives. https://www.analyticsinsight.net/artificial-intelligence/stable-diffusion-nsfw-and-its-alternatives
Ars Technica. (2024, June 13). Ridiculed Stable Diffusion 3 release excels at AI-generated body horror. https://arstechnica.com/information-technology/2024/06/ridiculed-stable-diffusion-3-release-excels-at-ai-generated-body-horror/
Creative Bloq. (2024). Stable Diffusion 3 Medium. https://www.creativebloq.com/ai/ai-art/stable-diffusion-3-medium
Civitai Education. (2024). Stable Diffusion 3 pre-release overview. https://education.civitai.com/stable-diffusion-3-pre-release-overview/
Hugging Face. (n.d.). RunwayML: Stable Diffusion v1.5. https://huggingface.co/runwayml/stable-diffusion-v1-5/discussions/1
Reddit. (2024). The censorship argument makes little sense to me. https://www.reddit.com/r/StableDiffusion/comments/1deyw1m/the_censorship_argument_makes_little_sense_to_me/
Reddit. (2022). Women Stable Diffusion 2.0. https://www.reddit.com/r/StableDiffusion/comments/z4mdcv/women_stable_diffusion_20/
Reddit. (2024). Are we sure it's not a bug in adopting the weights? https://www.reddit.com/r/StableDiffusion/comments/1depcxv/are_we_sure_its_not_a_bug_in_adopting_the_weights/
Reddit. (2022). Stability has sent a takedown request of Runway. https://www.reddit.com/r/StableDiffusion/comments/y9706v/stability_has_sent_a_takedown_request_of_runway/
Reddit. (2022). Stable Diffusion v1.5. https://www.reddit.com/r/StableDiffusion/comments/y91pp7/stable_diffusion_v15/
Reddit. (2024). Is SD3 a breakthrough or just broken? https://www.reddit.com/r/StableDiffusion/comments/1demlr9/is_sd3_a_breakthrough_or_just_broken/
Stability AI. (n.d.). License. https://stability.ai/license#select_membership
Stability AI. (2024). Stable Diffusion 3 Medium release notes. https://stability.ai/news/stable-diffusion-3-medium
VentureBeat. (2024). Stability AI brings new size to image generation with Stable Diffusion Medium. https://venturebeat.com/ai/stability-ai-brings-new-size-to-image-generation-with-stable-diffusion-medium/
X (formerly Twitter). (2022). RunwayML status update. https://x.com/runwayml/status/1583109275643105280
Disclaimer
The thoughts and opinions expressed in this article are my own, informed by views from the AI content creation community as gathered from various discussions and sources. They do not represent the views of any professional entity, including Civitai and its staff. This article is intended for informational and discussion purposes only, and its content should not be taken as professional advice or used as the basis for any legal action.