Date: 2022-10-19

Prompts are how you control the output of diffusion models. A diffusion model takes two primary inputs, a seed integer and a text prompt, and translates them into a fixed point in the model’s latent space. The seed integer is generally generated automatically, and the user provides the text prompt. Continuous experimentation through prompt engineering is critical to getting the results you want. We explored DALL-E 2 and Stable Diffusion and have consolidated our best tips and tricks for getting the most out of your prompts, including prompt length, artistic style, and key terms to help you sculpt the images you want to generate.

How to prompt

In general, there are three main components to a prompt, plus an optional seed:

Frame + Subject + Style + an optional Seed.

1. Frame - The frame of an image is the type of image to be generated. This is combined with the Style later in the prompt to provide an overall look and feel of the image. Examples of Frames include photograph, digital illustration, oil painting, pencil drawing, one-line drawing, and matte painting.

The following examples are modified versions of the base prompt "Painting of a person in a Grocery Store," in the frame of an oil painting, a digital illustration, a realistic photo, and a 3D cartoon.

Diffusion models typically default to a “picture” frame if not specified, though this is dependent on the subject matter. By specifying a frame of an image, you control the output directly.

By modifying the frame to “Polaroid” you can mimic the output of a Polaroid camera, complete with large white borders.

Pencil drawings can be produced as well.

And as already covered, different painting techniques can be applied.

Frames provide a rough guide for the output type the diffusion model should generate. But in order to create remarkable images, a good subject and refined style should also be added to your prompts. We will cover subjects next and then detail tips and tricks for combining frames, subjects, and styles to fine-tune your images.
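As a minimal sketch of how a frame can be swapped in and out of a prompt, the helper below prefixes a subject with each frame listed in this section. The function name `apply_frame` and the "frame of subject" wording are assumptions made for illustration; they are not part of any diffusion tool.

```python
from typing import Optional

# Frames named in this section of the guide.
FRAMES = [
    "photograph",
    "digital illustration",
    "oil painting",
    "pencil drawing",
    "one-line drawing",
    "matte painting",
]


def apply_frame(subject: str, frame: Optional[str] = None) -> str:
    """Prefix the subject with a frame; return it unchanged if no frame is given."""
    if not frame:
        # With no frame, the model falls back to its own default.
        return subject
    return f"{frame} of {subject}"


if __name__ == "__main__":
    for frame in FRAMES:
        print(apply_frame("a person in a grocery store", frame))
```

Running the same subject through every frame is a quick way to compare how much the frame alone changes the output.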

2. Subject - The main subject for generated images can be anything you can dream up.

Diffusion models are built largely from publicly available internet data and are able to produce highly accurate images of objects that exist in the real world.

However, diffusion models often struggle with compositionality, so it is best to limit your prompts to one or two subjects.

Sticking to one or two subjects generally produces good results, for example "Chef Chopping Carrots on a cutting board."

Even though there is some confusion here with a knife chopping another knife, there are chopped carrots in the scene, which is generally close to the original prompt.

However, expanding to more than two subjects can produce unreliable and sometimes humorous results:

Diffusion models tend to fuse two subjects into a single subject if the subjects are less common. For example, the prompt “a giraffe and an elephant” yields a giraffe-elephant hybrid rather than a single giraffe and a single elephant. Interestingly, there are often two animals in the scene, but each is typically a hybrid.

Some attempts to prevent this, including adding in a preposition like “beside,” have mixed results but are closer to the original intent of the prompt.

This issue appears subject-dependent, as a more popular pair of animals, such as “a dog and a cat,” generates distinct animals without a problem.
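The subject guidance above can be sketched in code. This hypothetical helper joins subjects with a connecting word such as "and" or "beside" and flags prompts that name more than two subjects. The names `join_subjects` and `too_many_subjects` and the threshold constant are illustrative assumptions, and the check cannot predict whether a given pair will fuse.

```python
from typing import List

# The guide suggests one or two subjects per prompt.
MAX_RELIABLE_SUBJECTS = 2


def join_subjects(subjects: List[str], relation: str = "and") -> str:
    """Join subjects with a connecting word such as "and" or "beside"."""
    if not subjects:
        raise ValueError("at least one subject is required")
    return f" {relation} ".join(subjects)


def too_many_subjects(subjects: List[str]) -> bool:
    """Return True when more subjects are named than the guide recommends."""
    return len(subjects) > MAX_RELIABLE_SUBJECTS


if __name__ == "__main__":
    pair = ["a giraffe", "an elephant"]
    print(join_subjects(pair))
    print(join_subjects(pair, relation="beside"))
    print(too_many_subjects(pair + ["a zebra"]))
```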

3. Style - The style of an image has several facets, key ones being the lighting, the theme, the art influence, or the period.

Details such as “Beautifully lit,” “Modern Cinema,” or “Surrealist” will all influence the final output of the image.

Referring back to the prompt of "chefs chopping carrots," we can influence this simple image by applying new styles. Here we see a “modern film look” applied to the frames of “Oil Painting” and “Picture.”

The tone of the images can be shaped by a style. Here we see “spooky lighting.”

You can fine-tune the look of the resulting images by slightly modifying the style. We start with a blank slate of “a house in a suburban neighborhood.”

By adding “beautifully lit surrealist art” we get much more dynamic and intense images.

Tweaking this, we can give the images a spooky theme by replacing “beautifully lit” with the phrase “spooky scary.”

Apply this to a different frame to get the desired output. Here we see the same prompt with the frame of an oil painting.

We can then alter the tone to “happy light” and see the dramatic difference in the output.

You can change the art style to further refine the images, in this case by switching from “surrealist art” to “art nouveau.”

As another demonstration of how the frame influences the output, here we switch to “watercolor” with the same style.

Different seasons can be applied to images to influence the setting and tone of the image.

There is a near-infinite variety of combinations of frames and styles and we only scratch the surface here.
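One way to explore these combinations systematically is to enumerate every frame and style pairing for a fixed subject. The sketch below uses Python's `itertools.product`. The function name `prompt_grid` and the "frame of subject, style" layout are assumptions for illustration, and the frames and styles are taken from the examples in this section.

```python
from itertools import product
from typing import List


def prompt_grid(subject: str, frames: List[str], styles: List[str]) -> List[str]:
    """Build one prompt for every (frame, style) pairing."""
    return [
        f"{frame} of {subject}, {style}"
        for frame, style in product(frames, styles)
    ]


if __name__ == "__main__":
    prompts = prompt_grid(
        "a house in a suburban neighborhood",
        frames=["oil painting", "watercolor"],
        styles=[
            "beautifully lit surrealist art",
            "spooky scary surrealist art",
            "art nouveau",
        ],
    )
    for prompt in prompts:
        print(prompt)
```

Two frames and three styles give six prompts, so the number of images to review grows quickly as more options are added.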

Artists can be used to fine-tune your prompts as well. The following are versions of the same prompt, "person shopping at a grocery store," styled to look like works of art from famous historic painters.

Start with a base prompt of “painting of a human cyborg in a city {artist} 8K highly detailed.”

While the subject is a bit unorthodox for this group, each painting fits the expected style profile of each artist.
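The `{artist}` placeholder in the base prompt maps directly onto Python string formatting. The following is a sketch only: the guide does not list the painters it used, so the names below are illustrative picks, and the "by " prefix is an assumption about how the artist was phrased.

```python
from typing import List

# Base prompt quoted from the guide, with its {artist} placeholder.
TEMPLATE = "painting of a human cyborg in a city {artist} 8K highly detailed"


def artist_prompts(artists: List[str], prefix: str = "by ") -> List[str]:
    """Fill the {artist} slot once per artist (the "by " prefix is an assumption)."""
    return [TEMPLATE.format(artist=f"{prefix}{name}") for name in artists]


if __name__ == "__main__":
    # Illustrative names only; they are not taken from the guide.
    for prompt in artist_prompts(["Vincent van Gogh", "Claude Monet", "Rembrandt"]):
        print(prompt)
```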

We can alter the style by modifying the tone, in this case, to “muted tones”:

You can further alter the output by modifying both the frame and the tone to get unique results, in this case, a frame of a “3D model painting” with neon tones.

Adding the qualifier “the most beautiful image you’ve ever seen” yields eye-catching results.

And depictions such as “3D model paintings” yield unique, novel works of art.

By modifying the frame and style of the image, you can produce some amazing and novel results. Try different combinations of style modifiers, such as “dramatic lighting” or “washed colors,” in addition to the examples that we provided, to fine-tune your concepts further.

We hardly scratched the surface in this guide, and look forward to amazing new creations from the community.

4. Seed

The same seed, the same prompt, and the same version of Stable Diffusion, with all other generation settings held constant, will result in the same image.

If you are getting different images for the same prompt, it is likely caused by using a random seed instead of a fixed seed. For example, "Bright orange tennis shoes, realistic lighting e-commerce website" can be varied by modifying the value of the random seed.
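The role of the seed can be illustrated with a toy stand-in. The sketch below does not call a diffusion model. It only shows that a seeded pseudo-random generator reproduces the same starting noise every time, which is the property that makes a fixed seed repeatable. The function names and the vector size are assumptions for illustration.

```python
import random
from typing import List


def initial_noise(seed: int, size: int = 4) -> List[float]:
    """Toy stand-in for the starting noise a diffusion model draws from a seed."""
    rng = random.Random(seed)  # private generator, seeded explicitly
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


def random_seed() -> int:
    """Pick a fresh 32-bit seed, as a tool might when the user supplies none."""
    return random.SystemRandom().randrange(2**32)


if __name__ == "__main__":
    fixed = 3732591490  # the seed used as an example in this guide
    print(initial_noise(fixed) == initial_noise(fixed))  # same seed, same noise
    print(initial_noise(random_seed()))  # a fresh seed gives different noise
```

Logging the seed alongside each generated image makes it possible to reproduce that image later.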

Changing the seed, the prompt, or the model version will result in a different image. You can hold the prompt or seed in place and traverse the latent space by changing the other variable. This method provides a deterministic way to find similar images and vary the images slightly.

Varying the prompt to "bright blue suede dress shoes, realistic lighting e-commerce website" and holding the seed in place at 3732591490 produces results with similar compositions but matching the desired prompt. And again, holding that prompt in place and traversing the latent space by changing the seed produces different variations:
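Holding one input fixed while varying the other can be planned as a list of (prompt, seed) jobs. This is a sketch under assumptions: the helper names are hypothetical, the jobs would still need to be passed to whichever image generator you use, and every seed other than 3732591490 is made up for the example.

```python
from typing import List, Tuple

Job = Tuple[str, int]  # (prompt, seed)


def vary_prompt(prompts: List[str], seed: int) -> List[Job]:
    """Hold the seed fixed and change the prompt."""
    return [(prompt, seed) for prompt in prompts]


def vary_seed(prompt: str, seeds: List[int]) -> List[Job]:
    """Hold the prompt fixed and change the seed."""
    return [(prompt, seed) for seed in seeds]


if __name__ == "__main__":
    orange = "Bright orange tennis shoes, realistic lighting e-commerce website"
    blue = "bright blue suede dress shoes, realistic lighting e-commerce website"

    # Similar composition, different subject: same seed for both prompts.
    for job in vary_prompt([orange, blue], seed=3732591490):
        print(job)

    # Different variations of one prompt: the extra seeds are arbitrary.
    for job in vary_seed(blue, seeds=[3732591490, 3732591491, 3732591492]):
        print(job)
```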

To summarize, a good way to structure your prompts is to include the elements “[frame] [main subject] [style type] [modifiers]” or “A [frame type] of a [main subject], [style example],” plus an optional seed. The order of these phrases may alter your outcome, so if you are looking for a particular result, it is best to experiment with all of these values until you are satisfied.
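The structure summarized above can be captured in a small data class. This is a minimal sketch, not a standard API: the class name `Prompt`, its field names, and the comma-separated layout are assumptions chosen to mirror the “[frame] of [main subject], [style], [modifiers]” pattern.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Prompt:
    frame: str
    subject: str
    style: str = ""
    modifiers: List[str] = field(default_factory=list)
    seed: Optional[int] = None  # None means "let the tool pick a seed"

    def text(self) -> str:
        """Render the prompt as: frame of subject, style, modifier, ..."""
        parts = [f"{self.frame} of {self.subject}"]
        if self.style:
            parts.append(self.style)
        parts.extend(self.modifiers)
        return ", ".join(parts)


if __name__ == "__main__":
    prompt = Prompt(
        frame="oil painting",
        subject="a house in a suburban neighborhood",
        style="surrealist art",
        modifiers=["beautifully lit"],
        seed=3732591490,
    )
    print(prompt.text(), "| seed:", prompt.seed)
```

Keeping the parts separate makes it easy to change one element at a time and compare the results.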

5. Prompt Length

Generally, prompts should be just as verbose as you need them to be to get the desired result. It is best to start with a simple prompt to experiment with the results returned and then refine your prompts, extending the length as needed.

However, many fine-tuned prompts already exist that can be reused or modified.

Modifiers such as "ultra-realistic," "octane render," and "unreal engine" tend to help refine the quality of images, as you can see in some of the examples below.

“A female daytrader with glasses in a clean home office at her computer working looking out the window, ultra realistic, concept art, intricate details, serious, highly detailed, photorealistic, octane render, 8 k, unreal engine”

“portrait photo of a man staring serious eyes with green, purple and pink facepaint, 50mm portrait photography, hard rim lighting photography --beta --ar 2:3 --beta --upbeta”

“Extremely detailed wide angle photograph, atmospheric, night, reflections, award winning contemporary modern interior design apartment living room, cozy and calm, fabrics and textiles, geometric wood carvings, colorful accents, reflective brass and copper decorations, reading nook, many light sources, lamps, oiled hardwood floors, color sorted book shelves, couch, tv, desk, plants”

“Hyperrealistic and heavy detailed fashion week runway show in the year 2050, leica sl2 50mm, vivid color, high quality, high textured, real life”

“Full-body cyberpunk style sculpture of a young handsome colombian prince half android with a chest opening exposing circuitry and electric sparks, glowing pink eyes, crown of blue flowers, flowing salmon-colored silk, fabric, raptors. baroque elements. full-length view. baroque element. intricate artwork by caravaggio. many many birds birds on background. trending on artstation, octane render, cinematic lighting from the right, hyper realism, octane render, 8k, depth of field, 3d”

“Architectural illustration of an awesome sunny day environment concept art on a cliff, architecture by kengo kuma with village, residential area, mixed development, high - rise made up staircases, balconies, full of clear glass facades, cgsociety, fantastic realism, artstation hq, cinematic, volumetric lighting, vray”
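When extending a simple prompt step by step, it can help to append quality modifiers without adding one that is already present. The helper below is a hypothetical sketch that treats the prompt as a comma-separated list. It makes no claim about how a model weighs repeated terms.

```python
from typing import Iterable, List


def add_modifiers(prompt: str, modifiers: Iterable[str]) -> str:
    """Append comma-separated modifiers, skipping any the prompt already has."""
    parts: List[str] = [p.strip() for p in prompt.split(",") if p.strip()]
    seen = {p.lower() for p in parts}
    for modifier in modifiers:
        key = modifier.strip().lower()
        if key and key not in seen:
            parts.append(modifier.strip())
            seen.add(key)
    return ", ".join(parts)


if __name__ == "__main__":
    base = "a female daytrader in a clean home office, octane render"
    print(add_modifiers(base, ["ultra realistic", "octane render", "unreal engine"]))
```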

6. Additional Tips

A few additional items are worth mentioning.

Placing the primary subject of the image closer to the beginning of the prompt tends to ensure that the subject is included in the image. For instance, consider a prompt and a rearranged version of it.

"A city street with a black velvet couch" at times will miss the intent of the prompt entirely and the resulting image will not include a couch.

By rearranging the prompt to have the keyword "couch" closer to the beginning of the prompt, the resulting images will almost always contain a couch.
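The rearrangement can be sketched as follows. The guide does not quote the reordered prompt, so the wording produced here ("A black velvet couch on a city street") and both helper names are assumptions for illustration.

```python
def subject_first(subject: str, setting: str, preposition: str = "on") -> str:
    """Build a prompt that names the subject before the setting."""
    return f"{subject} {preposition} {setting}"


def keyword_position(prompt: str, keyword: str) -> int:
    """Return the zero-based word index of the keyword, or -1 if it is absent."""
    words = [w.strip(".,;:!?\"'").lower() for w in prompt.split()]
    try:
        return words.index(keyword.lower())
    except ValueError:
        return -1


if __name__ == "__main__":
    original = "A city street with a black velvet couch"
    rearranged = subject_first("A black velvet couch", "a city street")
    print(original, "->", keyword_position(original, "couch"))
    print(rearranged, "->", keyword_position(rearranged, "couch"))
```

A lower index means the keyword appears earlier in the prompt, which is the property this tip recommends.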

There are combinations of subject and location that tend to yield poor results. For instance, "A black velvet couch on the surface of the moon" yields uneven results, with inconsistent backgrounds and, in some images, no couch at all. However, a similar prompt, "A black velvet couch in a desert," tends to reflect the intent of the prompt, capturing the velvet material, the black color, and the characteristics of the scene more accurately. Presumably, the training data contains more desert images than moon images, making the model better at creating coherent desert scenes.

Prompt engineering is an ever-evolving topic, with new tips and tricks being uncovered daily. As more businesses discover the power of diffusion models to help solve their problems, it is likely that a new type of career, "Prompt Engineer," will emerge.
