Text to image tutorial


(05 May 2024, 06:13 )PurpleVibes Wrote: I propose a 101 introduction to generative AI. I don't understand how to use the available tools, how to train models, or what AI is available that allows the creation of adult content like bdsm, fetish, etc.

Or was it ideas like in topics to prompt an AI? In that case, backseam stockings and garter belts.


My recommendation is InvokeAI. https://github.com/invoke-ai/InvokeAI

(because that's the tool I'm using, and I found it more user- and beginner-friendly than the other popular tools, namely Automatic1111 (https://github.com/AUTOMATIC1111/stable-diffusion-webui) and ComfyUI)

Step 1:
Have an NVIDIA GPU, RTX 2000 series or higher (highly recommended; AMD should work by now, but I can't attest to speed and capabilities and don't know which generation of cards is the minimum requirement). You could run it on your CPU, but generating a single image will take minutes instead of seconds, which will be very frustrating, especially when you're still trying to figure things out by trial and error.
Forget about training your own models unless you own an NVIDIA GPU with at least 16 GB of VRAM - and even that might get dicey.
It will be interesting to see what the next generation of consumer cards will bring in terms of RAM and AI capabilities.
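
If you're not sure what your card has, here is a minimal Python sketch for checking it. It assumes the NVIDIA driver is installed, so that the nvidia-smi command-line tool is available. The 16 GB cut-off is just the rule of thumb from this step, not a hard limit, and the function names are made up for illustration.

```python
import subprocess

# Rule of thumb from this step: roughly 16 GB of VRAM before training is worth trying.
TRAINING_MIN_GIB = 16


def parse_gpu_list(csv_text):
    """Parse 'name, memory_in_MiB' lines as printed by nvidia-smi in CSV mode."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        name, mem_mib = line.rsplit(",", 1)
        gpus.append((name.strip(), int(mem_mib.strip())))
    return gpus


def describe(gpus):
    """Return one readable line per GPU with a rough verdict."""
    lines = []
    for name, mem_mib in gpus:
        gib = mem_mib / 1024
        # Cards often report slightly less than their nominal size, so round first.
        if round(gib) >= TRAINING_MIN_GIB:
            verdict = "training may be feasible"
        else:
            verdict = "generation only"
        lines.append(f"{name}: {gib:.1f} GiB VRAM ({verdict})")
    return lines


def query_nvidia_smi():
    """Ask nvidia-smi for GPU names and total memory; return '' if that fails."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return ""
    return result.stdout


if __name__ == "__main__":
    report = describe(parse_gpu_list(query_nvidia_smi()))
    print("\n".join(report) if report else "No NVIDIA GPU detected via nvidia-smi.")
```

Run it from a terminal. If it prints nothing useful, the driver is probably missing or the card is not an NVIDIA one.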

Step 2:
Read up on the installation requirements for InvokeAI here.
Aside from the GPU requirement above, you'll need at least 12 GB of RAM and installations of Python, NVIDIA CUDA drivers and possibly some MS Visual C++ components. It's all listed in detail. All of these are basically one-click installations.
Just follow the instructions carefully and you should be good.
Once you have checked those boxes, you can download the Invoke installer, unzip it, run the .bat file and follow the instructions.
If it's confusing, I can provide some screenshots with explanations.

Step 3:
Download some checkpoint models. There are thousands. The two go-to repositories are Hugging Face and CivitAI.
If you want to do kinky stuff, I highly recommend the PonyXL checkpoint model, because it's really capable right out of the box and there are also some very good LoRAs. LoRAs are basically "style/content add-ons" that guide the checkpoint towards a certain outcome.

CivitAI has lots of example images for each checkpoint/LoRA, often with the corresponding prompt and settings, so you can gather inspiration from there. Word of warning: CivitAI hosts A LOT of NSFW content (which should be filtered by default and needs to be unlocked in your profile settings), so be prepared to find a lot of... interesting imagery. You will want to create a profile to "like" and keep track of your favourite models and LoRAs, though.

PonyXL
Joschek's LoRAs for PonyXL (very good bondage-related LoRAs, probably the best around currently)
Vixion's Styles for PonyXL (a variety of art style LoRAs, from cartoon to hyperrealism. You can mix these (in moderation) for interesting combinations)

Step 4:
Familiarize yourself with the InvokeAI GUI.
They have their own YouTube channel with short tutorial videos explaining both the basics and more advanced topics.
For starters, you will want to know how to install your checkpoint models via the built-in Model Manager, but you will install some basic ones while running the InvokeAI setup procedure, so you are good to go for now. Those selectable checkpoint models will not be as receptive to kinky prompts as PonyXL, though.

Step 5:
Experiment, learn and have fun. Enter some phrases into the "Positive Prompt" text box, select the number of images you want to create and hit "Invoke". See what you get. Correct unwanted elements by adding them to the "Negative Prompt" text box. Use plain language and don't overthink it. Phrases at the start of the prompt will be weighted more heavily in the outcome than those at the end of the prompt. Use synonyms to get your point across.
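
Since phrases near the start count for more, it can help to keep your phrases in a list with a rough priority and let a small script put them in order. This is only a sketch of that idea in Python. The function name, the priority numbers and the example phrases are all made up for illustration, and it is not something InvokeAI provides. You would paste the resulting strings into the two prompt boxes yourself.

```python
def build_prompt(phrases):
    """Join (text, priority) pairs into one prompt, highest priority first.

    Duplicate phrases (ignoring case) are dropped so the same idea
    is not repeated later in the prompt.
    """
    # sorted() is stable, so phrases with equal priority keep their original order.
    ordered = sorted(phrases, key=lambda item: item[1], reverse=True)
    seen = set()
    parts = []
    for text, _priority in ordered:
        cleaned = text.strip()
        key = cleaned.lower()
        if cleaned and key not in seen:
            seen.add(key)
            parts.append(cleaned)
    return ", ".join(parts)


# Hypothetical example phrases; the numbers only express relative importance.
positive = build_prompt([
    ("garter belt", 1),
    ("backseam stockings", 3),
    ("studio lighting", 2),
])
negative = build_prompt([
    ("blurry", 2),
    ("extra fingers", 1),
])

print("Positive Prompt:", positive)
print("Negative Prompt:", negative)
```

Raising or lowering a phrase's number and regenerating is a quick way to see how much its position changes the result.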

Well, that's cutting it short a bit, but that's enough to get you started. There's obviously a lot of nuance to it both from the technical standpoint and the "art" of prompt engineering (that sounds so pretentious, but, ah well...). There is of course a learning curve that can feel steep at times, but once you're over the initial bump, it gets a lot easier.

If there are any questions, I'll do my best to assist.
(This post was last modified: 05 May 2024, 22:37 by Like Ra.)
I moved the post to a new thread and tweaked it a bit.
(05 May 2024, 15:15 )Bound Whore Wrote: Forget about training your own models unless you own an NVIDIA GPU with at least 16 GB of VRAM - and even that might get dicey.
The process takes a bit more than 6 GB. So, 8 GB should be enough (you have to stop all other programs that use the graphics card). I use a 3060 with 12 GB of VRAM - unbeatable for the price!

E.g. AutismSDXLPony model + 6 LoRAs:
ai/invokeai/.venv/bin/python 5938MiB <- less than 6GB VRAM
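
To get a figure like that for your own setup, a small Python sketch along these lines may help. It assumes you feed it the output of nvidia-smi's per-process query (the command is in the comment below). The sample line reuses the process name and number from above, while the PID 4242 and the function names are made up.

```python
# Assumed source of the text, run while the model is loaded:
#   nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader,nounits


def parse_process_usage(csv_text):
    """Parse 'pid, process_name, used_MiB' lines into a list of dicts."""
    usage = []
    for line in csv_text.strip().splitlines():
        if line.count(",") < 2:
            continue  # skip blank or unexpected lines
        pid, rest = line.split(",", 1)
        name, mib = rest.rsplit(",", 1)
        usage.append({"pid": int(pid), "name": name.strip(), "mib": int(mib)})
    return usage


def headroom(usage, card_mib):
    """Return (total MiB in use, MiB left) for a card with card_mib of VRAM."""
    total = sum(proc["mib"] for proc in usage)
    return total, card_mib - total


# Sample line built from the figure quoted above; the PID is hypothetical.
sample = "4242, ai/invokeai/.venv/bin/python, 5938"
used, left = headroom(parse_process_usage(sample), card_mib=8 * 1024)
print(f"In use: {used} MiB, left on an 8 GB card: {left} MiB")
```

If other programs show up in the list, their memory counts against the same card, which is why closing them first matters on an 8 GB card.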

(05 May 2024, 15:15 )Bound Whore Wrote: Aside from the GPU requirement above, you'll need at least 12 GB of RAM
I think it depends on the number of LoRAs. I have 64 GB of RAM with memory compression (zram-config), and at one point I noticed that all swap space was in use.
(This post was last modified: 05 May 2024, 23:56 by Like Ra.)
When uploading images, please convert them to JPG or WEBP!!!! PNG takes way too much space.
For image generation I use an offline AI called Stable Diffusion - Forge.

Here is the link to the download page:
https://github.com/lllyasviel/stable-dif...ebui-forge

You can find installation instructions on this page:
https://stable-diffusion-art.com/sd-forg...OMATIC1111

The checkpoint I'm currently using is:
https://civitai.com/models/161068?modelVersionId=979329

I also use some different LoRAs.

Hope this helps you further.
If you need help, just let me know and I'll try to help you.
(12 Dec 2024, 15:03 )theo Wrote: For image generation I use an offline AI called Stable Diffusion - Forge.

Here is the link to the download page:
https://github.com/lllyasviel/stable-dif...ebui-forge

I'm also using Stable Diffusion - Forge, mostly with the Flux-based models these days.
I use a prompt randomizer extension, sd-dynamic-prompts, to add some variety, and
then I use another extension, sd-webui-reactor, to overlay a custom face onto the generated face.
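
For anyone wondering what a prompt randomizer does, here is a rough Python sketch of the basic idea. It assumes the {a|b|c} variant syntax that sd-dynamic-prompts uses and handles only flat, un-nested groups. It is not the extension's own code, and the function name and example prompt are made up.

```python
import random
import re

# Matches one flat {option|option|...} group; nested groups are not handled.
VARIANT = re.compile(r"\{([^{}]+)\}")


def expand_variants(template, rng=random):
    """Replace every {a|b|c} group in template with one randomly picked option."""
    def pick(match):
        options = [option.strip() for option in match.group(1).split("|")]
        return rng.choice(options)

    return VARIANT.sub(pick, template)


# Hypothetical template; each call can produce a different prompt.
template = "portrait, {red|black|white} dress, {studio|outdoor} lighting"
for _ in range(3):
    print(expand_variants(template))
```

Passing a seeded random.Random(...) as rng makes the picks repeatable, which is handy when you want to get the same prompt again.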

I've been using an automatic img2txt tool called 'taggui' (running an algorithm called 'joycaption') to generate prompts
from existing pictures. I can then swipe the details of a picture and put them back into SD-F to generate more similar images.

oop.. got into the nerd weeds there 😁  .

    -lopbunny , first post!
