Image analysis

5 Replies, 389 Views

I'm experimenting with Hermes (https://www.likera.com/forum/mybb/showth...p?tid=4606), and for some of my projects I need a detailed image analysis AND proper tagging.

It turned out, that it's VERY difficult to find an LLM which indeed understands what's pictured.

Post your experience, ideas and solutions here.
(This post was last modified: 26 May 2026, 17:18 by Like Ra.)
To my great surprise, only Claude and Gemini were able to determine what the girl is wearing on this image: https://www.likera.com/forum/mybb/Thread...1#pid86451

Gemini Wrote:Upper Body / Arms: She is wearing a black, long-sleeved, skin-tight top that resembles a sleek leather or latex material. A notable feature is that both of her arms are enclosed together behind her back within a single, continuous sleeve or heavy mitten-like glove that secures at the wrists with a pink band. The top has a visible zipper running down the back.

Claude Wrote:A black latex/leather armbinder (single-glove) — her arms are bound behind her back in a single sleeve, which is a bondage restraint device

Neither ChatGPT, nor Grok, nor Gemma, nor Qwen, nor Mistral were able to understand the concept of a "single-glove"!

From the small local models, only Qwen was close. From small local models, only Qwen can describe bondage scenes quite detailed.

So far my recommendation for local "image description" models - https://ollama.com/lukey03/qwen3.5-9b-ab...ted-vision (it's uncensored)
(This post was last modified: Yesterday, 21:17 by Like Ra.)
(23 May 2026, 01:04 )Like Ra Wrote: So far my recommendation for local "image description" models - https://ollama.com/lukey03/qwen3.5-9b-ab...ted-vision (it's uncensored)

I've been using https://ollama.com/sorc/qwen3.5-instruct-heretic for image description and Stable-Diffusion img->prompt generation . I'll give lukey's a try. Thanks!
It's still a 9b Qwen3.5 version, so should be very similar. I also tried https://ollama.com/huihui_ai/Qwen3.6-abliterated:27b with only 30 layers offloaded to the GPU.
While slow, it's only a bit better, than Qwen3.5 9B, yet can go completely crazy.

https://ollama.com/huihui_ai/qwen3-vl-abliterated 8b is only 6.1GB, and should understand videos, but it's a bit less precise for images.
(This post was last modified: 26 May 2026, 00:12 by Like Ra.)
A hack: ask the model to check what it wrote about the image with the image itself. THAT result is much more precise.
(23 May 2026, 01:04 )Like Ra Wrote: To my great surprise, only Claude and Gemini were able to determine what the girl is wearing on this image: https://www.likera.com/forum/mybb/Thread...1#pid86451

Neither ChatGPT, nor Grok, nor Gemma, nor Qwen, nor Mistral were able to understand the concept of a "single-glove"!

From the small local models, only Qwen was close. From small local models, only Qwen can  describe bondage scenes quite detailed.

So far my recommendation for local "image description" models - https://ollama.com/lukey03/qwen3.5-9b-ab...ted-vision (it's uncensored)

The latest Qwen3.7 Plus can not do that.
Deepseek is only able to discover text.
But Kimi can!!!

Kimi Wrote:Arm Restraint
Her arms are secured behind her back inside a black single-glove armbinder. This is a single sleeve that encases both arms together from the shoulders down to the hands, keeping them pinned behind her. The binder appears to have a pink strap or buckle near the top, just below the shoulder line, and the material looks like leather or a shiny synthetic fabric.

So, we have 3 models so far with the understanding of single-gloves.

Possibly Related Threads…
Thread Author Replies Views Last Post
  Is this [image] AI generated? FireDesire 17 3,020 16 Jan 2026, 13:08
Last Post: krypton85
  Text to image tutorial Bound Whore 6 2,353 23 Feb 2025, 20:18
Last Post: theo
  General Image AI thread Like Ra 10 3,059 15 May 2024, 14:41
Last Post: RedCattyLatex