Generative art with language+diffusion models
September 16, 2022 — March 9, 2025
Generative art using modern diffusion-backed image generators. The name-brand systems (DALL-E 2, Stable Diffusion, Midjourney, etc.) pair a diffusion model for the image generation with a transformer-based text encoder for the text-to-image conditioning.
I’m interested in this in general. I am especially interested in models that
- work on macOS
- run on my local machine (i.e. use my local GPU)
A method that allows me to use or train my own model is especially interesting. I like using community-trained models for specialisation or jailbreaking. As with many other parts of AI, the community is incredible.
For audio stuff, see music diffusion.
1 Community Model Ecosystems
1.1 Hugging Face
- Role: A research-first platform hosting 300,000+ models, including Stable Diffusion variants, ControlNet, and LoRAs.
- Key Features:
  - Model Hub: Central repository for downloading/uploading models (e.g., `CompVis/stable-diffusion-v1-4`).
  - Ethical AI: Focus on documentation (model cards) and bias mitigation.
  - Interoperability: Supports PyTorch/TensorFlow and tools like `diffusers` for pipeline customization.
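The `diffusers` library makes these Hub checkpoints scriptable. A minimal text-to-image sketch, assuming `torch` and `diffusers` are installed; the first call downloads several GB of weights, and the CPU fallback is for non-Apple machines:

```python
def generate(prompt, model_id="CompVis/stable-diffusion-v1-4", steps=25):
    """Text-to-image with a Hub checkpoint, preferring Apple's MPS backend."""
    # Imports are deferred so the function is cheap to define.
    import torch
    from diffusers import StableDiffusionPipeline

    device = "mps" if torch.backends.mps.is_available() else "cpu"
    pipe = StableDiffusionPipeline.from_pretrained(model_id).to(device)
    # Returns a PIL.Image; persist it with .save("out.png").
    return pipe(prompt, num_inference_steps=steps).images[0]
```

Swapping `model_id` for any other Hub repository in the diffusers format (an SDXL variant, say) is the whole point of the Model Hub.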
1.2 CivitAI
- Role: Community-driven hub specializing in artistic models (anime, photorealistic, 3D) and fine-tuned LoRAs.
- Key Features:
- Visual Discovery: Instant previews of model outputs.
- Social Features: Model ratings, comments, and collaborative training.
- Growth: 25M+ monthly visits, 500+ new models daily, with NSFW filters for moderation.
Relationship to GUI Clients:
- Most macOS tools (e.g., DiffusionBee, Mochi Diffusion) support importing models from both ecosystems.
- CivitAI’s LoRAs/styles are popular for artistic workflows, while Hugging Face provides foundational models like SDXL.
2 GUIs
I was using DiffusionBee and some other Hugging Face models, which I have now forgotten. Since then, new clients have appeared. I got Perplexity to generate a features matrix of the promising ones.
Tool | Cost | Open Source | BYO Models | Apple Silicon Support | Ease of Use | Generation Types |
---|---|---|---|---|---|---|
DiffusionBee | Free | Yes | Yes (TensorFlow) | Native (M1/M2/M3) | ⭐⭐⭐⭐ | Text/Image-Conditional |
Mochi Diffusion | Free | Yes | Yes (CoreML required) | CoreML Optimized | ⭐⭐⭐ | Text/Image-Conditional |
ComfyUI+MLX | Free | Yes | Yes (PyTorch/GGUF) | MLX Accelerated | ⭐⭐ | Text/Image/Video |
InvokeAI | Free | Yes | Yes | Optimized | ⭐⭐⭐ | Text/Image-Conditional |
Draw Things | Free | No | Yes (CivitAI import) | Metal Acceleration | ⭐⭐⭐⭐ | Text/Image-Conditional |
2.1 DiffusionBee
- Description: User-friendly desktop app optimized for Apple Silicon. Offers offline generation, video tools, and FLUX model support.
- Key Features:
- Simplified installation (drag-and-drop DMG)
- Direct CivitAI/Hugging Face model imports
- Image-to-image transformations and inpainting
- Apple Silicon: Native M-series support via TensorFlow/Metal
- Ease: ⭐⭐⭐⭐ (Beginner-friendly)
2.2 Mochi Diffusion
- Native SwiftUI app using Apple’s Core ML framework for maximum hardware efficiency.
- Key Features:
- ~150MB RAM usage with Neural Engine
- EXIF metadata preservation
- ControlNet and RealESRGAN upscaling
- Apple Silicon: CoreML-optimized (3-4GB VRAM usage)
- Ease: ⭐⭐⭐ (Moderate technical skill)
2.3 ComfyUI+MLX
- Visual-programming node-based workflow system enabling granular control over generation pipelines.
- Key Features:
- MLX acceleration for Apple Neural Engine
- First access to new models (SD 3, VideoCrafter)
- 8K upscaling via custom nodes
- Apple Silicon: Requires MLX setup
- Ease: ⭐⭐ (Advanced users)
2.4 InvokeAI
- Dual CLI/WebUI interfaces.
- Key Features:
- Canvas-based iterative editing
- Multi-model blending
- Outpainting with context awareness
- Apple Silicon: Optimized via Metal Performance Shaders
- Ease: ⭐⭐⭐ (Web UI accessible)
InvokeAI is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The solution offers an industry-leading WebUI, supports terminal use through a CLI, and serves as the foundation for multiple commercial products.
2.5 Draw Things
- Description: App Store-native tool with AR previews and live generation tuning.
- Key Features:
- One-click CivitAI model imports
- Real-time diffusion process visualization
- Model mixing via sliders
- Apple Silicon: Metal API acceleration (5-7GB VRAM)
- Ease: ⭐⭐⭐⭐ (TouchBar support)
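The model-mixing sliders in apps like Draw Things usually implement a simple weighted-sum merge: each tensor of the output checkpoint is a linear interpolation of the corresponding tensors in the two inputs. A toy sketch over flat lists of floats (real tools do the same arithmetic on torch tensors):

```python
def merge_checkpoints(a, b, alpha=0.5):
    """Weighted-sum merge of two state dicts with identical keys and shapes.

    alpha=0.0 returns model a unchanged; alpha=1.0 returns model b.
    """
    if a.keys() != b.keys():
        raise ValueError("checkpoints have different tensor names")
    return {
        name: [(1 - alpha) * x + alpha * y for x, y in zip(a[name], b[name])]
        for name in a
    }

# merge_checkpoints({"w": [0.0, 1.0]}, {"w": [1.0, 0.0]}, alpha=0.5)
# → {"w": [0.5, 0.5]}
```

That a 50/50 blend of two fine-tunes often produces a usable style in between is an empirical curiosity of these models, and it is what makes the slider UI worthwhile.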
2.6 Misc
- A browser interface for Stable Diffusion based on the Gradio library.
  - Description: Feature-rich browser interface via Gradio.
  - Apple Silicon: Requires manual PyTorch/MPS setup
  - Best For: Users familiar with Linux-centric workflows
- A handy GUI to run Stable Diffusion, a machine learning toolkit to generate images from text, locally on your own hardware. Per its author, it is completely uncensored and unfiltered, and no data is shared with or collected by the author or any third party.
If I use the Hugging Face tooling, building a local UI is easy, since it integrates with Gradio.
Fancier UIs are possible, e.g. ComfyUI, which has a confusing profusion of entities involved in it.
3 Integrating Ecosystems with GUI Tools
- Hugging Face:
  - Download models via `git lfs` and place them in `~/Documents/DiffusionBee/models`.
  - Use `diffusers` pipelines for custom workflows in ComfyUI.
  - https://huggingface.co/docs/diffusers/en/using-diffusers/other-formats
- CivitAI:
  - Directly import `.safetensors`/LoRAs into Draw Things or DiffusionBee.
  - Filter models by “macOS-optimized” tags for CoreML/MLX compatibility.
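Those `.safetensors` files are easy to inspect before committing to a multi-gigabyte load: the format begins with an 8-byte little-endian length prefix, followed by a JSON header describing every tensor. A small sketch for listing a checkpoint’s tensor names and shapes without reading any weight data:

```python
import json
import struct

def safetensors_header(path):
    """Return the JSON header of a .safetensors file: tensor names, dtypes,
    shapes, and byte offsets, without loading the weights themselves."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # u64 header length
        return json.loads(f.read(header_len))
```

Handy for checking whether a CivitAI download is a full checkpoint or just a LoRA before importing it into a GUI client.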
4 Optimizing for Apple Silicon
4.1 Model Conversion
- CoreML Tools: Convert PyTorch models to CoreML for Mochi Diffusion using Apple’s `coremltools`.
- MLX Framework: For ComfyUI, use MLX nodes to enable Neural Engine acceleration (30–70% speed gains).
4.2 Metal Acceleration
- Enable `METAL_PERFORMANCE_SHADERS` in InvokeAI or use Draw Things’ native Metal API for faster inference.
4.3 Model customisation and fine-tuning
4.4 Model suppliers
Hugging Face is the heavy-hitter. See also Civitai:
Civitai is a labour of love from a small team. After being inspired daily by the incredible progress of the Stable Diffusion community and the explosion of custom fine-tuned models, textual inversions, and more, we wanted to see if we could create something that would continue to help the community grow and thrive.
After seeing a gap around sharing the custom models that were being made by the community, we decided to try our hand at putting together a tool that would make it easy for anyone to share, find, and review models. While there were existing services like HuggingFace that allowed users to expose their models as repositories, we felt that it was missing a few key features that would really allow it to serve as a home for the growing community and use case:
- A way for creators to tag models with things that make sense to the SD community
- A good way for people interested in the model to review and share their creations
- A simpler upload and download interface (how many of us are really familiar with code repos)
- An indexed and visual browsing experience of all the models available
- An API that can be used by SD tools to tap into the growing library of models, embeds, aesthetic gradients, and hyper networks available
5 Hosted models
Just go to a website, give someone money, and get images back. Trade privacy (and money) for convenience.
5.1 Runway.ml
a platform for creators of all kinds to use machine learning tools in intuitive ways without any coding experience. Find resources here to start creating with RunwayML quickly.
In particular, it plugs into Blender and Photoshop and allows you to use those programs as a UI for ML-backed algorithms. Nice.
5.2 Midjourney
Midjourney produces high-quality images from text prompts. Addictive in that you can get better at it, which feels like mastering a real skill.
5.3 Nightcafe
Stable Diffusion, DALL-E 2, CLIP-Guided Diffusion, VQGAN+CLIP and Neural Style Transfer are all available on NightCafe.
5.4 Playgroundai
6 Punditry
7 Theory
8 Folk history of Stability
The story of modern AI image tools begins with Stable Diffusion—a 2022 open-source breakthrough developed by Stability AI, CompVis (LMU Munich), and RunwayML. Its release democratized high-quality image generation, letting users run models locally and fine-tune them freely. But by 2024 key researchers behind Stable Diffusion left Stability AI to form Black Forest Labs, citing disagreements over open-source commitments and commercialization strategies. Their exit birthed Flux, a transformer-based model family praised for its precision but criticized for its hefty hardware demands (think 24GB VRAM for full features).
The landscape now is fractured
- Corporate models (Flux, DALL-E 3, Midjourney) offer polish and ease but often lock advanced features behind APIs or subscriptions.
- Community-driven tools (SDXL, CivitAI LoRAs) prioritize customization and local control, albeit with steeper learning curves.
Some interesting new contenders have arrived:
- Ideogram (ex-Google Imagen team): Masters text-in-images and typography.
- PixArt-Σ (Huawei Noah’s Ark Lab): Balances speed and photorealism for commercial workflows.
- CogView-3 (Zhipu AI/Tsinghua): Favored for industrial design prototyping.
Corporate models shine for plug-and-play reliability; community forks thrive for niche artistry and ethical transparency. Black Forest’s Flux straddles both worlds: its “dev” version is open-weights but non-commercial, while “pro” targets enterprises. The takeaway? Diversify your toolbox: use corporate tools for client work, community models for experimentation, and keep an eye on Hugging Face’s `diffusers` library to stay ahead of the curve.
9 Incoming
- Reddit for AI-generated and manipulated content