I've been taking a look at AI again recently and the technology is very impressive.
Seems most decent images these days come from an SDXL model called Pony Diffusion.
https://civitai.com/models/257749/pony-diffusion-v6-xl
Yes, it has furry shit, but you can turn it all off since it comes with built-in tags for entire parts of the dataset that you can drop into the negative prompt.
Despite being named after ponies, 50% of the dataset was anime, and it's apparently the best or one of the best local models for generating it right now.
It intuitively knows how to do sex, blowjobs, footjobs, etc without any loras.
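If you want to see what "turning off" part of the dataset looks like, here's a rough sketch. The source_* tag names are what I remember from the model page, so treat them as an assumption and check there before relying on them. build_prompts is just a made-up helper, not part of any tool.

```python
# Dataset tags as I remember them from the model page (assumption, verify there).
SOURCE_TAGS = ["source_pony", "source_furry", "source_cartoon", "source_anime"]


def build_prompts(subject, keep="source_anime"):
    """Return (positive, negative) prompts that keep one slice of the dataset.

    The kept tag goes in front of the positive prompt; every other
    source tag goes into the negative prompt.
    """
    if keep not in SOURCE_TAGS:
        raise ValueError(f"unknown source tag: {keep}")
    positive = f"{keep}, {subject}"
    negative = ", ".join(tag for tag in SOURCE_TAGS if tag != keep)
    return positive, negative


positive, negative = build_prompts("1girl, reading a book, library")
print("Prompt:         ", positive)
print("Negative prompt:", negative)
```

Paste the two strings into the prompt and negative prompt boxes. Swap keep for whichever slice you actually want.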
>What is a lora?
Just think of it as a small addon trained for a specific base model that adds extra features to it, like a character, a style, or a concept. (You can't use loras that aren't compatible with your model.)
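In Automatic1111 you call a lora from the prompt with a tag of the form <lora:filename:weight>. A tiny sketch of building that tag, where lora_tag and my_style_lora are made-up names for illustration:

```python
def lora_tag(name, weight=1.0):
    """Format a lora reference in the <lora:name:weight> prompt syntax.

    `name` is the lora's filename without the extension; `weight` is how
    strongly it is applied.
    """
    return f"<lora:{name}:{weight:g}>"


# "my_style_lora" is a hypothetical filename, use one you actually downloaded.
prompt = "1girl, smiling, " + lora_tag("my_style_lora", 0.8)
print(prompt)
```

Lower the weight if the lora takes over the whole image.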
I use Automatic1111, which is buggy as shit but has a lot of good extensions, and I've put far too much time into it.
https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-NVidia-GPUs
Honestly, there are a few extensions that basically force me to stay now, and let me tell you about them:
https://github.com/zixaphir/Stable-Diffusion-Webui-Civitai-Helper
This labels everything you download from CivitAI, lets you load the "best prompt" provided with a lora, inserts the model's trigger words, and gives each of your loras a display image.
https://github.com/Mikubill/sd-webui-controlnet
Lets you use an image as a direct reference for generating, so you could maybe even get tricky stuff like tail pussies or slime girls to work if you do it right.
I'm not quite familiar with it yet but it's already performed really well. I had the best experience with anime canny and lineart models. You need to download separate models for it and put them in your a1111 folder's models/ControlNet folder.
For SDXL, you can find those here:
https://civitai.com/models/136070?modelVersionId=267516
And here:
https://huggingface.co/kohya-ss/controlnet-lllite/tree/main
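If ControlNet doesn't show your models, the first thing to check is whether they ended up in the right folder. A quick sketch, assuming your install lives in a folder called stable-diffusion-webui (change the path to yours) and that the models use the usual file extensions:

```python
from pathlib import Path

# Common model file extensions (assumption, adjust if yours differ).
MODEL_SUFFIXES = {".safetensors", ".pth", ".ckpt"}


def list_controlnet_models(webui_root):
    """Return the sorted names of model files in <webui_root>/models/ControlNet."""
    folder = Path(webui_root) / "models" / "ControlNet"
    if not folder.is_dir():
        return []  # folder missing: nothing installed there yet
    return sorted(
        p.name
        for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in MODEL_SUFFIXES
    )


# "stable-diffusion-webui" is an assumed install path.
models = list_controlnet_models("stable-diffusion-webui")
print(models if models else "No ControlNet models found")
```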
Finally, by far the most important one, ADetailer.
https://github.com/Bing-su/adetailer
It automatically detects faces in your picture and inpaints them to keep them from looking terrible. You can give it its own prompts to fine-tune how the faces come out.
Things to note: I recommend a minimum of 6GB of VRAM and an RTX 20 series or newer GPU.
If you have below 9GB of VRAM, you should probably use xformers. Look at the Automatic1111 wiki for how to do that.
If it runs too slowly, you should also check out
https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Optimizations
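The VRAM rules of thumb above boil down to something like this. It's only a sketch of my own recommendations, not anything official. The function names are made up, and the output line assumes the Windows launcher (webui-user.bat), which reads its flags from COMMANDLINE_ARGS.

```python
RECOMMENDED_MIN_VRAM_GB = 6   # my suggested minimum
XFORMERS_BELOW_GB = 9         # below this, xformers is probably worth enabling


def meets_recommendation(vram_gb):
    """True if the card has at least the suggested minimum VRAM."""
    return vram_gb >= RECOMMENDED_MIN_VRAM_GB


def suggest_flags(vram_gb):
    """Return launch flags suggested by the rule of thumb above."""
    flags = []
    if vram_gb < XFORMERS_BELOW_GB:
        flags.append("--xformers")
    return flags


def commandline_args_line(vram_gb):
    """Build the COMMANDLINE_ARGS line for webui-user.bat (Windows launcher)."""
    return "set COMMANDLINE_ARGS=" + " ".join(suggest_flags(vram_gb))


vram = 8  # put your card's VRAM in GB here
if not meets_recommendation(vram):
    print("Below the suggested 6GB, expect trouble with SDXL")
print(commandline_args_line(vram))
```

The Optimizations page lists more flags for when this isn't enough.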