ACE++: Instruction-Based Image Creation and Editing via Context-Aware Content Filling

Abstract

We report ACE++, an instruction-based diffusion framework that tackles a variety of image generation and editing tasks. Inspired by the input format for the inpainting task proposed by FLUX.1-Fill-dev, we improve the Long-context Condition Unit (LCU) introduced in ACE and extend this input paradigm to arbitrary editing and generation tasks. To take full advantage of image generative priors, we develop a two-stage training scheme that minimizes the effort of fine-tuning powerful text-to-image diffusion models such as FLUX.1-dev. In the first stage, we start from the text-to-image model and pre-train it on task data for the 0-ref tasks. Many community models obtained by post-training text-to-image foundation models already fit this first-stage paradigm. For example, FLUX.1-Fill-dev deals primarily with inpainting tasks and can be used as an initialization to accelerate training. In the second stage, we fine-tune this model to support general instructions using all tasks defined in ACE. To promote the widespread application of ACE++ in different scenarios, we provide a comprehensive set of models covering both full fine-tuning and lightweight fine-tuning, targeting both general use and vertical (domain-specific) scenarios. Qualitative analysis showcases the superiority of ACE++ in terms of generated image quality and prompt-following ability.
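The abstract does not spell out the input layout, so the following is only a rough sketch of what an inpainting-style "content filling" input can look like. It assumes a channel-wise concatenation of the masked image, the mask, and a noise tensor, and it reads "0-ref" as a task with no input image, so the whole canvas is to be filled. The function name `build_fill_input`, the channel order, and the pixel-space toy shapes are all illustrative assumptions, not the ACE++ or FLUX.1-Fill-dev implementation, which operate on latent representations.

```python
import numpy as np


def build_fill_input(image, mask, rng):
    """Assemble an inpainting-style model input (hypothetical layout).

    image: float array (H, W, C), the image to edit (all zeros if none).
    mask:  float array (H, W), 1 where content is generated, 0 where kept.
    Returns an array (H, W, 2*C + 1) laid out as [masked image | mask | noise].
    """
    if image.shape[:2] != mask.shape:
        raise ValueError("image and mask must share height and width")
    mask3 = mask[..., None]                    # (H, W, 1) so it broadcasts over C
    masked_image = image * (1.0 - mask3)       # hide the region to be filled
    noise = rng.standard_normal(image.shape)   # stand-in for the noisy target
    return np.concatenate([masked_image, mask3, noise], axis=-1)


rng = np.random.default_rng(0)
h, w, c = 4, 4, 3

# Local editing: regenerate only the top-left 2x2 patch of an existing image.
image = np.ones((h, w, c))
mask = np.zeros((h, w))
mask[:2, :2] = 1.0
edit_input = build_fill_input(image, mask, rng)

# Generation without an input image: empty canvas, mask covers everything.
gen_input = build_fill_input(np.zeros((h, w, c)), np.ones((h, w)), rng)
```

Under this assumed layout, editing and generation differ only in the image and mask that are passed in, which is one way to read the claim that a single input paradigm can cover both kinds of task.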

Subject-Driven Generation

The duck is walking on the road with many shops on both sides, anime style.

The duck sit on beach chairs drinking drinks, with coconut trees and the sea behind, anime style.

The duck sits on the sofa in the living room with many indoor decorations.

The duck is riding a skateboard in a skate park filled with colorful graffiti, anime style.

Show the logo printed elegantly on the front of a premium product box, nestled among natural elements like leaves and wood to reflect an eco-friendly brand.

Incorporate the logo into a stylish name badge hanging from a lanyard, worn by participants at a professional conference with a bright and engaging background.

Render the logo in dynamic animation on a digital billboard, showcasing it in a colorful urban landscape as commuters pass by during evening rush hour.

Display the logo in a minimalist style printed in white on a matte black ceramic coffee mug, alongside a steaming cup of coffee on a cozy cafe table.

The girl's pattern is printed on the computer screen and the packing box on the table, with rich anime elements.

The girl’s style acrylic standing plaque is placed in front of the computer on the desk.

Print the girl's design on a handsome off-road vehicle, with the streets of the city behind it.

The casing of a desktop computer from ASUS features a design of this cartoon girl. The main unit is placed on the desk, with a close-up showcasing the design details of the case. The girl's pattern blends seamlessly with the exterior of the unit. The transparent case design allows the internal hardware to be clearly visible, with blinking LED lights highlighting the details of the graphics card, motherboard, and other components.

Attach this icon to the top right corner of a pair of jeans, product image.

Place this icon in the center of a sleek black leather wallet on a wooden table.

Stick this icon to the right sleeve of a stylish red sports jacket hanging in a vibrant urban alley.

Stick this icon on the tag hanging on the fashionable red jacket.

the furry duck toy wears a blue sweater, is placed on a bookshelf nearby a television.

Plush ducks are placed on Spider Man's head and shuttle through the city with him.

Plush ducks are drifting on the sea surface in a small boat.

Plush duck toy wearing sunglasses, placed next to vinyl record, rock style.

The fox is standing with Snow White in front of a cartoon castle gate.

Make this fox become a twin fox. They are in the office, with one fox looking at his watch.

There is a fox on the pedestrian street of the city, surrounded by people coming and going.

The fox pendant is hanging next to the backpack, with a close-up of the fox pendant and a blurred background.

Portrait-Consistency Generation

Dress the character in the image with elf ears and a wizard's robe, transforming them into a mage character from a fantasy world.

Make the character in the image embody the goddess Artemis, adorned in ancient Greek-style clothing, showcasing elegance and strength.

(Seed: xxx) Maintain the facial features, A girl is wearing a neat police uniform with "HOPPS" labeled on it and sporting a badge with a cute Disney cartoon style like Officer Judy Hopps from "Zootopia". She has large, bright purple eyes, smiling with a friendly and confident demeanor. She stands in front of a microphone, seemingly giving a speech or presentation, with lively gestures that convey positive emotions. The background is blurred, featuring a cartoon logo of "Police Department".

(Seed: yyy) Maintain the facial features, A girl is wearing a neat police uniform with "HOPPS" labeled on it and sporting a badge with a cute Disney cartoon style like Officer Judy Hopps from "Zootopia". She has large, bright purple eyes, smiling with a friendly and confident demeanor. She stands in front of a microphone, seemingly giving a speech or presentation, with lively gestures that convey positive emotions. The background is blurred, featuring a cartoon logo of "Police Department".

Keep the characters in the picture unchanged and switch the background to Chinese architecture.

Transform the character into a modern athlete, wearing shiny sportswear.

Replace the character's clothing with a Chinese traditional clothing, with an ancient teahouse in the background.

Insert the character into Grant Wood’s "American Gothic" scene, filled with a rural atmosphere.

Replace the character's outfit with an Indian sari and place her in a colorful market.

the man like muscular superhero character standing confidently in an underwater setting. He is dressed in a striking metallic costume featuring green and gold hues, designed with intricate patterns that evoke a sense of aquatic origins. The character holds a powerful trident in one hand and a shield in the other, emphasizing his role as a protector of the seas. Water splashes around him, enhancing the dynamic and heroic atmosphere of the scene. His flowing hair adds to the dramatic effect, highlighting his fierce demeanor as he prepares for action in the depths of the ocean.

the person seated at a chessboard, holding a knight piece in their hand. The chess setup includes a mix of wooden and light-colored pieces, creating a contrast against a polished wooden table. Surrounding the chessboard are various small bottles, likely containing different liquids, suggesting an assortment of beverages or condiments. The person is dressed in a simple black shirt with a white collar, exuding a vintage or classic aesthetic, and has short, styled hair. The backdrop is a muted gray.

Replace the character's clothing with classic Western cowboy attire, holding a guitar by the campfire.

Dress the character in the image with elf ears and a wizard's robe, transforming them into a mage character from a fantasy world.

The man, with his white hair billowing in the gentle breeze, glides down the sunlit street on his sleek white electric scooter, a picture of carefree joy. He’s dressed in a light blue shirt and comfortable jeans, exuding a relaxed charm as he makes his way to the local market to pick up groceries. The pleasant weather adds to his contentment, with birds chirping and the sun casting a warm glow. Behind him, a colorful roadside vendor displays vibrant fruits and fresh flowers, creating a lively scene. The man smiles at the vendor, appreciating the community spirit around him, as he feels the excitement of a simple errand that brightens his day.

Dress the character in old-fashioned pirate attire, standing inside a ship's cabin.

Dress the character in the image in Harry Potter's wizarding robes, holding a wand as if casting a spell.

Transform the character in the image into Superman, wearing a blue jumpsuit, a red cape, and featuring the Superman logo on the chest.

the man dressed in traditional attire inspired by historical Asian clothing. He has a decorative hair accessory. Her outfit consists of layered garments featuring vibrant colors, notably orange and blue, complemented by a patterned sash around his waist. He carries a long, ornate sword with a decorative hilt, resting against her arm. Additionally, he has a straw backpack slung over one shoulder and various pouches attached to her belt, suggesting readiness for travel or adventure. The backdrop is a lush, green landscape, enhancing the adventurous theme of the image.

Dress the character in medieval knight armor, standing in front of a castle.

Make the character in the image resemble Peter Pan, dressed in a green outfit and evoking a sense of flying.

Dress the character in the image in the combat gear of Desert Fox, displaying a brave and adventurous demeanor.

Dress the character in the image with elf ears and a wizard's robe, transforming them into a mage character from a fantasy world.

the man standing confidently in traditional Middle Eastern attire. He is wearing a light, flowing thobe with intricate golden detailing along the neckline. A white dishdasha is worn underneath, complementing the overall elegant look. The attire is completed with a black agal and a white ghutrah draped gracefully. The background is plain, emphasizing the cultural significance of the garment. The background is a magnificent palace.

Change the character's attire to traditional Russian lace headscarf and long skirt, with a winter snow landscape around her.

Flexible Instructions or Descriptions

A man in a blue jacket is standing on a street, facing the camera.

Move the location to a desert.

Let this car in {image} drive on Mars, dust kicked up.

A stuffed bunny is holding an apple and standing on the table.

a couple getting their picture taken in front of the skyscraper.

Please let her sitting on a wooden boat.

Make the man standing on road and holding a skateboard.

Let the robot dance.

Change the cloth color from orange to pink.

A photo of a park bench. The bench is made of wood and painted orange and dark green. Behind the bench is a row of green shrubs. There is a husky lying on a bench.

A young woman with long red hair and fair skin stands in what appears to be a warehouse. She is wearing a simple red long-sleeved t-shirt and carrying a dark blue tote bag.

A photo of a small boat moored on calm water, with a black cormorant standing on the stern, spreading its wings. The boat is white, with a Yamaha outboard motor on the stern. The port side of the boat reads 'ME 14XSR'. A man sitting on the boat.

Add a Santa Claus to the sky in {image}.

A vintage car parked on a forest path, the car is dark green, the hood covered with a blue satin, looks mysterious and elegant. The car is surrounded by fallen leaves, flanked by towering trees, the light through the leaves sprinkled down, creating a quiet, retro atmosphere.

Let the boat run in the Arctic.

Let her touch her hair.

The ocean wave gets stronger and the man is closer.

Create a futuristic cityscape with a neon-lit monkey in a cybernetic suit, sitting on a skyscraper at sunset.

Local Editing

Two nearly identical dark-colored vases with light blue designs sit on a light gray surface. Each vase features a light blue female figure in a flowing gown, positioned slightly differently on each vase. The necks and bases of the vases have decorative light blue bands with intricate patterns. The background behind the vases is a lighter gray than the surface they rest on. A subtle shadow is cast by each vase onto the surface.

A driver's perspective from inside a car traveling on a highway. A green highway sign indicates Comerica Park, Ford Field, and exit 46. A speed limit sign of 55 mph is visible on the right side of the highway. Construction barrels and traffic cones suggest ongoing roadwork in the median. An overpass crosses above the highway in the background.

A chestnut-colored horse with a prominent white blaze on its face stands in the foreground of the image. It looks directly at the camera, and its ears are perked forward. A small, dark brown miniature horse or donkey is visible in the background to the left of the larger horse. A red barn is partially visible behind the small equine. The background consists of a grassy field and a line of trees.

By referencing the mask, restore a partial image from the doodle {image} that aligns with the textual explanation: "1 white old owl".

In line with the details in "A small teddy bear tucked in to the pocket of a suit case", restore the high-quality definition of the mask part of the {image}.

Generate the {image} mask region to real image: A plate is full of broccoli and some type of meat.

Please colorize the selected area mask in {image} without changing anything else.

Restore the real image using {image}, mask and description: "A bus is driving down a road with cars behind it."

Add a bench is in front brick wall in mask of {image}.

Add a dog following her into {image} within the mask.

I need penknife removed from {image}, according to the mask derived from mask.

Remove the apple in {image} mask.

Add text to {image} mask region, text is 'SPOR'.

Detach all texts in the mask zone of {image}.

Local Reference Editing

The man is facing the camera and is serious/smiling.

The man is facing the camera.

The logo is printed on the headphones/bottle.

The woman dresses this skirt.

The item is put on the table/ground.

The item is put on the table.

BibTeX

@article{mao2025ace++,
  title={ACE++: Instruction-Based Image Creation and Editing via Context-Aware Content Filling},
  author={Mao, Chaojie and Zhang, Jingfeng and Pan, Yulin and Jiang, Zeyinzi and Han, Zhen and Liu, Yu and Zhou, Jingren},
  journal={arXiv preprint arXiv:2501.02487},
  year={2025}
}

@article{han2024ace,
  title={ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer},
  author={Han, Zhen and Jiang, Zeyinzi and Pan, Yulin and Zhang, Jingfeng and Mao, Chaojie and Xie, Chenwei and Liu, Yu and Zhou, Jingren},
  journal={arXiv preprint arXiv:2410.00086},
  year={2024}
}