I'm experimenting with Hermes ( https://www.likera.com/forum/mybb/showth...p?tid=4606), and for some of my projects I need a detailed image analysis AND proper tagging.
It turned out, that it's VERY difficult to find an LLM which indeed understands what's pictured.
Post your experience, ideas and solutions here.
(This post was last modified: 26 May 2026, 17:18 by Like Ra.)
To my great surprise, only Claude and Gemini were able to determine what the girl is wearing on this image: https://www.likera.com/forum/mybb/Thread...1#pid86451
Gemini Wrote:Upper Body / Arms: She is wearing a black, long-sleeved, skin-tight top that resembles a sleek leather or latex material. A notable feature is that both of her arms are enclosed together behind her back within a single, continuous sleeve or heavy mitten-like glove that secures at the wrists with a pink band. The top has a visible zipper running down the back.
Claude Wrote:A black latex/leather armbinder (single-glove) — her arms are bound behind her back in a single sleeve, which is a bondage restraint device
Neither ChatGPT, nor Grok, nor Gemma, nor Qwen, nor Mistral were able to understand the concept of a "single-glove"!
From the small local models, only Qwen was close. From small local models, only Qwen can describe bondage scenes quite detailed.
So far my recommendation for local "image description" models - https://ollama.com/lukey03/qwen3.5-9b-ab...ted-vision (it's uncensored)
(This post was last modified: 11 Jun 2026, 21:17 by Like Ra.)
(23 May 2026, 01:04 )Like Ra Wrote: So far my recommendation for local "image description" models - https://ollama.com/lukey03/qwen3.5-9b-ab...ted-vision (it's uncensored)
I've been using https://ollama.com/sorc/qwen3.5-instruct-heretic for image description and Stable-Diffusion img->prompt generation . I'll give lukey's a try. Thanks!
It's still a 9b Qwen3.5 version, so should be very similar. I also tried https://ollama.com/huihui_ai/Qwen3.6-abliterated:27b with only 30 layers offloaded to the GPU.
While slow, it's only a bit better, than Qwen3.5 9B, yet can go completely crazy.
https://ollama.com/huihui_ai/qwen3-vl-abliterated 8b is only 6.1GB, and should understand videos, but it's a bit less precise for images.
(This post was last modified: 26 May 2026, 00:12 by Like Ra.)
A hack: ask the model to check what it wrote about the image with the image itself. THAT result is much more precise.
(23 May 2026, 01:04 )Like Ra Wrote: To my great surprise, only Claude and Gemini were able to determine what the girl is wearing on this image: https://www.likera.com/forum/mybb/Thread...1#pid86451
Neither ChatGPT, nor Grok, nor Gemma, nor Qwen, nor Mistral were able to understand the concept of a "single-glove"!
From the small local models, only Qwen was close. From small local models, only Qwen can describe bondage scenes quite detailed.
So far my recommendation for local "image description" models - https://ollama.com/lukey03/qwen3.5-9b-ab...ted-vision (it's uncensored)
The latest Qwen3.7 Plus can not do that.
Deepseek is only able to discover text.
But Kimi can!!!
Kimi Wrote:Arm Restraint
Her arms are secured behind her back inside a black single-glove armbinder. This is a single sleeve that encases both arms together from the shoulders down to the hands, keeping them pinned behind her. The binder appears to have a pink strap or buckle near the top, just below the shoulder line, and the material looks like leather or a shiny synthetic fabric.
So, we have 3 models so far with the understanding of single-gloves.
GLM-5.2 from z.ai can also do that:
GLM-5.2 Wrote:3. The "Single-Glove" Element (Armbinder) As explicitly indicated by the "single-glove" keyword in the filename, her outfit features a very specific and prominent accessory:- She is wearing a single-glove armbinder. This is a bondage-style garment that binds her arms closely together behind her back.
- It is a conical, tapered sleeve that likely matches the black and pink aesthetic of her diving suit. Her arms are pulled back and encased entirely within this single sleeve, secured tightly from the fingertips up to the upper arms or shoulders.
4. Bondage Aesthetic & Strapping Reflecting the "bondage" keyword in the image title, the diving suit is not a standard sportswear piece; it incorporates heavy BDSM-inspired design elements:- Harnesses and Straps: The wetsuit likely features integrated thick leather or rubber-like straps wrapping across her torso, chest, and shoulders.
- Hardware: These straps are likely adorned with metallic buckles, O-rings, or D-rings, reinforcing the restrictive, fetish-inspired aesthetic of the "bondage" diving suit.
- The zippers of the wetsuit may also be stylized to act as restrictive or teasing elements within the outfit's design.
5. Overall Silhouette The overall silhouette is a fascinating juxtaposition: the sporty, athletic baseline of a scuba diving suit merged with the restrictive, submissive posture forced by the single-glove armbinder and bondage harnesses. Standing on a bright, sunny morning beach, her dark, kink-inspired wetsuit stands out dramatically against the natural, idyllic background of the blue sea and golden sand.
So far, GLM-5.2 is the winner.
(This post was last modified: 29 Jun 2026, 22:35 by Like Ra.)
|