Abstract
Prompt engineering is still the primary way for users of
generative text-to-image models to manipulate generated
images in a targeted way. By treating the model as a
continuous function and passing gradients between the
image space and the prompt embedding space, we propose and
analyze a new method to directly manipulate the embedding of
a prompt instead of the prompt text. We then derive three
practical interaction tools to support users with image
generation: (1) Optimization of a metric defined in the image
space that measures, for example, the image style. (2)
Supporting users in creative tasks by allowing them to
navigate the image space along a selection of directions
toward ``near'' prompt embeddings. (3) Changing the embedding of
the prompt to include information that a user has seen in a
particular seed but has difficulty describing in the prompt.
Compared to prompt engineering, user-driven prompt embedding
manipulation enables more fine-grained, targeted control
that integrates a user's intentions. Our user study shows
that our methods are considered less tedious and that the
resulting images are often preferred.
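
To illustrate the core idea behind tool (1), the following is a minimal
sketch of gradient-based prompt-embedding optimization. It assumes a
differentiable generator and a differentiable image-space metric; the names
`generate` and `style_metric`, the random projection used as a stand-in
renderer, and the (77, 768) embedding shape are illustrative assumptions,
not the paper's implementation.

```python
import torch

# Fixed random "decoder" standing in for a diffusion pipeline in this demo.
proj = torch.randn(77 * 768, 3 * 8 * 8)

def generate(embedding: torch.Tensor) -> torch.Tensor:
    # Stand-in for a text-to-image model run with a fixed seed: maps the
    # prompt embedding to an image tensor while staying differentiable.
    return torch.tanh(embedding.flatten() @ proj).view(3, 8, 8)

def style_metric(image: torch.Tensor) -> torch.Tensor:
    # Stand-in for a differentiable metric defined in the image space
    # (e.g. a style score); here simply the mean pixel value.
    return image.mean()

# Start from the embedding of the user's prompt (a random placeholder here)
# and optimize it directly, leaving the prompt text untouched.
embedding = torch.randn(77, 768, requires_grad=True)
optimizer = torch.optim.Adam([embedding], lr=1e-2)

for step in range(100):
    optimizer.zero_grad()
    image = generate(embedding)   # forward pass: embedding -> image
    loss = -style_metric(image)   # maximize the metric
    loss.backward()               # gradients flow back into the embedding
    optimizer.step()
```

The same loop structure would carry over to the other two tools: only the
objective changes, e.g. a similarity term toward images produced by nearby
prompt embeddings or by a particular seed.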