This reminds me of my experience trying to generate a reference photo for a 3D model.
I told Nano Banana to generate an image of the character with his feet shoulder width apart. It ended up generating him with his feet pressed together, so I told Nano Banana to widen his stance slightly.
It gave me an image of the man with his feet spread far enough apart to straddle a horse. I asked for a slightly narrower stance, and his feet were once again pressed together.
This went back and forth unsuccessfully for a while until I asked, "I'm asking you to make his feet shoulder-width apart. Why are you ignoring me?" Nano Banana confidently asserted that they were shoulder-width apart, and that I must be wrong.
Ultimately, I ended up telling the model to render the same character pinching a cantaloupe between his ankles, and then to remove the cantaloupe. It worked, but why do I have to trick Google's SOTA image generator into doing something this basic?