AI art generation has been evolving at a wild pace, and Google just threw another big contender into the mix with Gemini 2.0 Flash. You can play with the new image creation tool in Google’s AI Studio.
Gemini Flash is, as the name suggests, very fast, notably faster than DALL-E 3 and other image creators. You might expect that speed to mean lower-quality images, but that’s not the case here, thanks to all of the changes and upgrades to the model’s image production ability. Still, if you want really good results, you need to know how to talk to the AI. After plenty of trial and error, I’ve put together five tips for getting the absolute best art out of Gemini 2.0 Flash. Some of these may seem similar to advice about other AI art creators, because they are, but that doesn’t make them less useful in this context.
Tell a story
The most interesting new feature of Gemini Flash’s image creation is that it isn’t just good for one-off illustrations. It can actually help you create a visual story by generating a series of related images with consistent style, settings, and moods.
To get started, you just have to ask it to tell you a story and say how often you want an illustration to go with the action. The result will include those images alongside the text.
For my project, I asked the AI to “Generate a story of a heroic baby dragon who protected a fairy queen from an evil wizard in a 3d cartoon animation style. For each scene, generate an image.” The illustrated story above then started to appear. And if there’s an issue, you can rewrite any part of the story and the model will regenerate the image accordingly.
Be super specific
If you tell Gemini to make “a dog in a park,” you might get a blurry golden retriever sitting somewhere vaguely green. But if you say, “A fluffy golden retriever sitting on a wooden bench in Central Park during autumn, with red and orange leaves scattered on the ground,” you’re far more likely to get exactly what you’re picturing.
AI models thrive on detail. The more you provide, the closer the image will be to what you have in mind. So for the image above, instead of just asking for a futuristic-looking city, I requested “A retro-futuristic cityscape at sunset, with neon signs glowing in pink and blue, flying cars in the sky, and people walking in retro-future style outfits.” Seven seconds later, the result came in.
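If you like to keep your prompts organized, here’s a minimal Python sketch of the same idea: start with a subject, then stack on the details. The function name build_prompt and its parameters are my own invention for illustration, not part of any Google tool, and the code only assembles the text you would paste into AI Studio.

```python
def build_prompt(subject, details=()):
    """Join a subject and a list of descriptive details into one prompt.

    Hypothetical helper for illustration only; it just builds a string.
    """
    parts = [subject.strip()]
    # Skip empty entries so stray blanks don't leave double commas.
    parts.extend(d.strip() for d in details if d.strip())
    return ", ".join(parts) + "."


prompt = build_prompt(
    "A retro-futuristic cityscape at sunset",
    [
        "with neon signs glowing in pink and blue",
        "flying cars in the sky",
        "and people walking in retro-future style outfits",
    ],
)
print(prompt)
```

Keeping the details in a list makes it easy to swap one out, say the color of the neon signs, without retyping the whole prompt.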
Get conversational
One of my favorite things about the new Gemini Flash is that you can get conversational with it without losing much of the speed. That means you don’t have to get everything right in one go. After generating an image, you can literally chat with the AI to make edits. Want to change the colors? Add a character? Make the lighting moodier? Just ask.
In the image set above, I started by asking for “A cozy reading nook with a fireplace, bookshelves filled with novels, and a big comfy armchair.” I then refined it by asking it to “Make it nighttime with soft, warm lighting,” followed up by asking it to “Add a sleeping cat on the armchair,” and finished by requesting that the AI “Give the room a vintage, Victorian aesthetic.” The final result on the left looks almost exactly like what I imagined. It makes Gemini feel like an art assistant, one capable of adjusting to what I want without starting over from scratch every time.
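For anyone who wants to plan a round of edits ahead of time, here’s a rough Python sketch that lines up the starting prompt and each follow-up in order. The RefinementSession class is a made-up name for illustration and doesn’t call Gemini at all. It just keeps the messages in the order you would send them in the chat, one at a time.

```python
from dataclasses import dataclass, field


@dataclass
class RefinementSession:
    """Hypothetical container for a starting prompt plus follow-up edits."""

    initial_prompt: str
    edits: list = field(default_factory=list)

    def refine(self, instruction):
        # Record one follow-up request; return self so calls can be chained.
        self.edits.append(instruction)
        return self

    def turns(self):
        # The messages in the order they would be sent in the chat.
        return [self.initial_prompt, *self.edits]


session = RefinementSession(
    "A cozy reading nook with a fireplace, bookshelves filled with novels, "
    "and a big comfy armchair."
)
session.refine("Make it nighttime with soft, warm lighting")
session.refine("Add a sleeping cat on the armchair")
session.refine("Give the room a vintage, Victorian aesthetic")

for number, message in enumerate(session.turns(), start=1):
    print(f"{number}. {message}")
```

Each follow-up is a small, single change, which mirrors how I made the edits in the chat.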
Use real-world knowledge
Google has boasted that Gemini is full of real-world knowledge, which means you can get historical accuracy, realistic cultural details, and true-to-life imagery if you ask for it. Of course, that requires being specific. For example, if you prompt it for “a Viking warrior,” you might get something that looks more like a Game of Thrones character. But if you say, “A historically accurate Viking warrior from the 9th century, wearing detailed chainmail armor, a round wooden shield, and a traditional Norse helmet,” you’ll get something much more precise.
As a test, I asked the AI to make “An ancient Mayan city at sunrise, with towering stone pyramids, lush jungle surroundings, and people dressed in traditional Mayan garments.” It’s not perfect, but it looks a lot more like the real thing than the output of previous versions, which would sometimes come back with something closer to an Egyptian pyramid.
Write fast
Most AI image models have long struggled with rendering text, turning words into illegible scribbles. Even the better models that can manage it today take a while, and getting it right can take a few tries. Gemini Flash, though, is shockingly good at integrating text into images quickly and legibly. Being very specific still helps.
That’s how I generated the image above: by asking the AI to “Make a vintage-style travel poster that says ‘Visit London’ in bold, retro typography, featuring a stylized illustration of the city.”