In-Context LoRA for Diffusion Transformers
Lianghua Huang
Wei Wang
Zhi-Fan Wu
Yupeng Shi
Huanzhang Dou
Chen Liang
Yutong Feng
Yu Liu
Jingren Zhou
Tongyi Lab
[Paper]
[BibTeX]
[Code]
Prompt:
“This set of four images illustrates a young artist's creative process
in a bright and inspiring studio;
[IMAGE1] she stands before a large canvas, brush in hand, adding
vibrant colors to a partially completed painting,
[IMAGE2] she sits at a cluttered wooden table, sketching ideas in a
notebook with various art supplies scattered around,
[IMAGE3] she takes a moment to step back and observe her work, and
[IMAGE4] she experiments with different textures by mixing paints
directly on the palette, her focused expression showcasing her dedication to her craft.”
In-Context LoRA fine-tunes text-to-image models to generate image sets with customizable
intrinsic relationships, optionally conditioned on another set, enabling adaptation to a wide
range of tasks.
Abstract
Recent research [Huang et al., 2024] has explored
the use of diffusion transformers (DiTs) for task-agnostic image generation by simply
concatenating attention tokens across images. However, despite substantial computational resources, the fidelity
of the generated images remains suboptimal. In this study, we reevaluate and streamline this framework by
hypothesizing that text-to-image DiTs inherently possess in-context generation
capabilities, requiring only minimal tuning to activate them. Through diverse task experiments, we
qualitatively demonstrate that existing text-to-image DiTs can effectively perform in-context generation without
any tuning. Building on this insight, we propose a remarkably simple pipeline to leverage the in-context abilities
of DiTs: (1) concatenate images instead of tokens, (2) perform joint captioning of multiple images, and (3) apply
task-specific LoRA tuning using small datasets (e.g., 20 to 100 samples) instead of full-parameter tuning
with large datasets. We name our models In-Context LoRA (IC-LoRA). This approach requires no modifications to the
original DiT models, only changes to the training data. Remarkably, our pipeline generates high-fidelity image
sets that better adhere to prompts. While task-specific in terms of tuning data, our framework remains
task-agnostic in architecture and pipeline, offering a powerful tool for the community and providing valuable
insights for further research on product-level task-agnostic generation systems. We release our code, data, and
models (see the [Code] link above).
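Steps (1) and (2) of the pipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not the released training code: the helper names `concat_side_by_side` and `build_joint_caption` are hypothetical, images are represented as nested lists of pixel values to avoid any library dependency, and the `[IMAGE1]`-style markers follow the prompt format shown on this page.

```python
from typing import List, Sequence

# Hypothetical helpers illustrating steps (1) and (2); not the authors' code.
Image = List[List[int]]  # rows of pixel values, a stand-in for a real image array


def concat_side_by_side(images: Sequence[Image]) -> Image:
    """Step (1): join same-height images into one wide image, left to right."""
    if not images:
        raise ValueError("need at least one image")
    height = len(images[0])
    if any(len(img) != height for img in images):
        raise ValueError("all images must share the same height")
    # Row r of the composite is row r of every panel, concatenated in order.
    return [[px for img in images for px in img[r]] for r in range(height)]


def build_joint_caption(overview: str, panel_captions: Sequence[str]) -> str:
    """Step (2): one caption for the whole set, with an [IMAGEk] marker per panel."""
    parts = [f"[IMAGE{i}] {cap}" for i, cap in enumerate(panel_captions, start=1)]
    return f"{overview}; " + ", ".join(parts) + "."


# Toy usage: two 2x2 "images" and a two-panel caption.
panels = [[[1, 1], [1, 1]], [[2, 2], [2, 2]]]
composite = concat_side_by_side(panels)
caption = build_joint_caption(
    "This pair of images shows a logo and its use",
    ["the logo on a teal background", "the logo printed on a mug"],
)
```

Each training sample is then one composite image paired with one joint caption, which is why the base model itself needs no modification. Step (3), the LoRA tuning, is not sketched here because it depends on the chosen training framework.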
Film Storyboard Generation
Each three-image sequence is generated simultaneously using In-Context LoRA. A placeholder character name uniquely
references the character’s identity across the images.
Prompt:
“In this adventurous three-image sequence, [IMAGE1] Ethan, an intrepid
archaeologist with a rugged appearance, uncovers an ancient map in a sunlit desert dig site, his excitement
palpable as he brushes away the sand, [IMAGE2] transitioning to a bustling marketplace in a vibrant foreign city
where Ethan negotiates with local merchants and gathers essential
supplies for his quest, [IMAGE3] and finally, Ethan treks through a
dense, mist-covered jungle, the towering trees and exotic wildlife emphasizing the challenges and mysteries that
lie ahead on his journey.”
Prompt:
“In a vibrant festival, [IMAGE1] we find Leo, a shy boy, standing at
the edge of a bustling carnival, eyes wide with awe at the colorful rides and laughter, [IMAGE2] transitioning
to him reluctantly trying a daring game, his friends cheering him on, [IMAGE3] culminating in a triumphant
moment as he wins a giant stuffed bear, his face beaming with pride as he holds it up for all to see.”
Prompt:
“In a captivating tale of resilience, [IMAGE1] we see Lena, a
determined girl, planting seeds in a barren field, her face set with resolve, [IMAGE2] transitioning to her
nurturing the plants, watering them daily, her efforts slowly yielding results, [IMAGE3] culminating in a lush
garden bursting with life, Lena standing proudly amidst her creation,
symbolizing growth and perseverance.”
Prompt:
“In a warm portrayal of family dynamics, [IMAGE1] shows Liam assisting
his little sister Sophie with her homework at the dining table, their
expressions serious yet playful, [IMAGE2] shifting to the living room, where Sophie triumphantly holds up her completed project, her eyes sparkling with
pride while Liam shares in her joy, [IMAGE3] concluding with both
siblings snuggled on the couch, engrossed in a movie, their laughter echoing through the cozy space.”
Prompt:
“In a tender exploration of first love, [IMAGE1] we see Jamie nervously
arranging flowers in a park, glancing around as if waiting for someone special, [IMAGE2] transitioning to the
moment arrives, their eyes locking in a shy smile that speaks volumes, [IMAGE3] finally showing them
seated on a bench, sharing stories and laughter, surrounded by blooming blossoms, embodying the magic of young
romance.”
Prompt:
“In a heartwarming depiction of a community gathering, [IMAGE1] captures Ella preparing colorful decorations for a local festival, her excitement
palpable, [IMAGE2] then shifts to her helping Tom set up a booth, their
teamwork highlighted by laughter and shared smiles, [IMAGE3] culminating with the festival in full swing,
Ella and Tom surrounded by
friends, their joy radiating against the festive backdrop.”
Portrait Photography
Each set of four images is generated concurrently with In-Context LoRA, aiming to maintain consistent subject
identities across images within each set.
Prompt:
“This set of four images showcases a teenage girl with curly black hair wearing a stylish denim jacket, each
image highlighting her dynamic personality in urban settings; [IMAGE1] she is skateboarding down a
graffiti-covered alley, a confident smile on her face as she maneuvers around obstacles; [IMAGE2] she is seated
at a trendy café, typing on her laptop with focused determination, the bustling city life visible through the
large windows behind her; [IMAGE3] she stands on a rooftop at sunset, her hair blowing in the breeze as she
gazes thoughtfully over the city skyline; and [IMAGE4] she is laughing with friends at a vibrant street market,
colorful lights and stalls creating a lively atmosphere around her.”
Prompt:
“The set of four images highlights the playful energy of a young boy in a city playground. [IMAGE1] He climbs up
a jungle gym with a look of determination, his hands gripping the bars as he pulls himself up; [IMAGE2] he
swings high on a set of swings, his head thrown back in laughter as his feet touch the sky; [IMAGE3] a close-up
captures him mid-slide, his eyes wide with excitement as he descends down a bright yellow slide; [IMAGE4] he
races down a pathway lined with trees, his arms pumping with energy as he chases after a soccer ball, his face
alight with joy.”
Prompt:
“The set of four images showcases a young girl exploring a cozy kitchen setting with her mother, filled with
warmth and affection. [IMAGE1] She stands on a stool, her hands reaching into a bowl of cookie dough as her
mother smiles beside her; [IMAGE2] she’s caught mid-laugh, flour dusted across her cheeks as she playfully
tosses a bit of dough in the air; [IMAGE3] the scene focuses on her concentration as she carefully uses cookie
cutters, her tiny hands pressing down on the dough; [IMAGE4] she proudly holds up a finished tray of cookies,
her face beaming with joy and accomplishment.”
Prompt:
“This set of four images captures the serene moments of an elderly woman tending to her garden. [IMAGE1] She
kneels beside a bed of blooming flowers, her hands gently pruning a rose bush, the soft morning light
illuminating her silver hair; [IMAGE2] she stands with a watering can, her face calm and peaceful as she
nurtures her plants; [IMAGE3] a close-up reveals her content smile as she examines a budding flower in her hand,
a sense of pride and joy evident; [IMAGE4] she sits on a small bench, sipping tea with her garden behind her,
surrounded by the vibrant colors of her hard work.”
Prompt:
“This set of four images captures a lively day spent at a beach between a mother and her son, highlighting their
playful connection and shared joy; [IMAGE1] the boy runs towards the water, his arms wide open, with the mother
following behind, smiling as she watches him; [IMAGE2] they are knee-deep in the ocean, laughing as they splash
each other, the sunlight reflecting off the water; [IMAGE3] they sit on the sand, the boy intently building a
sandcastle while the mother assists, both focused and relaxed; [IMAGE4] the final image shows the two walking
along the shore at sunset, the mother’s arm draped protectively around her son’s shoulders, their footprints
trailing behind them in the sand.”
Font Design
Each set of four images is generated concurrently with In-Context LoRA, aiming to achieve a consistent font style
across images within each set.
Prompt:
“The set of four images features a minimalist handwriting font for casual use. [IMAGE1] shows "Everyday" on a
coffee cup; [IMAGE2] displays "Notes" on a small journal; [IMAGE3] has "Live Simply" on a white pillow; [IMAGE4]
shows "Good Vibes" on a cozy blanket, perfect for lifestyle and home decor branding.”
Prompt:
“The set of four images displays a tech-inspired sans serif font in minimalist designs. [IMAGE1] features "Tech
Flow" in silver on a circuit board; [IMAGE2] shows "Future World" in neon on a digital background; [IMAGE3] has
"Virtual Space" in blue on a sleek black setting; [IMAGE4] displays "AI Vision" in holographic font, ideal for
technology branding.”
Prompt:
“The set of four images presents a stylized font for travel themes. [IMAGE1] displays "Wanderlust" over a
mountain scene; [IMAGE2] features "Explore" on a beach background; [IMAGE3] shows "Adventure" with a compass
illustration; [IMAGE4] has "Journey" on a vintage suitcase, perfect for travel branding.”
Prompt:
“The set of four images highlights a serif font with Victorian-style details. [IMAGE1] displays "Vintage Charm"
on an old book cover; [IMAGE2] shows "Elegance" on a dark lace background; [IMAGE3] features "Old Times" on a
vintage clock; [IMAGE4] presents "Antique" on an ornate mirror, perfect for historical themes.”
Prompt:
“The set of four images showcases a playful bubble font in a vibrant pop-art style. [IMAGE1] displays "Pop
Candy" in bright pink with a polka dot background; [IMAGE2] shows "Sweet Treat" in purple, surrounded by candy
illustrations; [IMAGE3] has "Yum!" in a mix of bright colors; [IMAGE4] shows "Delicious" against a striped
background, perfect for fun, kid-friendly products.”
Home Decoration
Each set of four images is generated concurrently using In-Context LoRA, aiming to maintain a consistent
decorative style across images within each set.
Prompt:
“This set of four images captures a colorful, nature-inspired living space with touches of green and earthy
textures; [IMAGE1] features a cozy nook with a woven chair draped in green blankets, surrounded by potted plants
and botanical prints on the wall; [IMAGE2] highlights a rustic wooden shelf adorned with small planters,
candles, and woven baskets; [IMAGE3] displays a serene bedroom with a bed made up in white linens, a natural
wood nightstand, and a forest-themed mural; [IMAGE4] shows a close-up of a large plant pot with unique textures
beside a patterned area rug.”
Prompt:
“This vibrant set of four images captures a lively home decor scene filled with color and eclectic charm;
[IMAGE1] the first image showcases a cozy living area with pastel-colored walls, a soft blue sofa, wooden
storage units displaying colorful accents, and a unique layered pendant light, [IMAGE2] the second image
features a kitchen setup with open shelves holding assorted kitchenware, a wire grid for organizing mugs above a
white sink, and warm sunlight streaming onto the countertop, [IMAGE3] the third image highlights a bold art wall
with an array of colorful, abstract paintings above a sage green sofa adorned with bright cushions, and [IMAGE4]
the fourth image shows a cheerful dining nook with a blue table, vividly striped cushions, framed artwork on the
sunny yellow wall, and a distinctive green pendant lamp casting a soft glow over the space.”
Prompt:
“This set of four images showcases a rustic living room with warm wood tones and cozy decor elements; [IMAGE1]
features a large stone fireplace with wooden shelves filled with books and candles; [IMAGE2] shows a vintage
leather sofa draped in plaid blankets, complemented by a mix of textured cushions; [IMAGE3] displays a corner
with a wooden armchair beside a side table holding a steaming mug and a classic book; [IMAGE4] captures a cozy
reading nook with a window seat, a soft fur throw, and decorative logs stacked neatly.”
Prompt:
“This set of four images showcases a vibrant and cozy kitchen with eclectic decor and warm tones; [IMAGE1]
reveals a colorful countertop with an assortment of spices in glass jars, a vintage kettle, and potted herbs;
[IMAGE2] displays a kitchen island with high chairs, bright red cabinets, and a hanging pot rack; [IMAGE3] shows
an inviting breakfast nook with a patterned bench, floral cushions, and a small round table; [IMAGE4] highlights
a section of open shelving with eclectic dinnerware, vibrant mugs, and unique artwork, creating a warm and
lively ambiance.”
PowerPoint Template Design
Each set of four images is generated concurrently with In-Context LoRA, aiming to create a cohesive and unified
presentation style across slides within each set.
Prompt:
“This set of four images showcases a rustic-themed PowerPoint template for a culinary workshop; [IMAGE1]
introduces "Farm to Table Cooking" in warm, earthy tones; [IMAGE2] organizes workshop sections like
"Ingredients," "Preparation," and "Serving"; [IMAGE3] displays ingredient lists for seasonal produce; [IMAGE4]
includes chef profiles with short bios.”
Prompt:
“The set of four images presents a PowerPoint template designed for a charity fundraiser; [IMAGE1] introduces
"Help Make a Difference" in large, bold text over a background of hands reaching out; [IMAGE2] lists causes like
“Education,” “Healthcare,” and “Water Access” with heart icons; [IMAGE3] displays donation statistics; [IMAGE4]
includes a call-to-action slide with links to donate and volunteer.”
Prompt:
“This set of four images presents a PowerPoint template for an art history class on surrealism; [IMAGE1] shows
“Exploring Surrealism” over a Dali-inspired background; [IMAGE2] lists iconic surrealist artists like “Dali,”
“Magritte,” and “Ernst”; [IMAGE3] includes a timeline of the surrealist movement; [IMAGE4] showcases famous
artworks with short interpretations.”
Prompt:
“This set of four images depicts a colorful and engaging PowerPoint template for a “Food Science” educational
presentation; [IMAGE1] features a cover slide with “Understanding Nutrition” in bold typography and vegetable
illustrations; [IMAGE2] presents topics like “Macronutrients,” “Vitamins,” and “Minerals”; [IMAGE3] includes a
pie chart displaying daily nutrient intake recommendations; [IMAGE4] shows recipe ideas with images and
nutritional benefits.”
Prompt:
“The set of four images displays a vibrant template for a fashion branding presentation; [IMAGE1] introduces the
title “New Collection 2024” with a runway-inspired background; [IMAGE2] lists fashion sections like
“Streetwear,” “Formal,” and “Accessories” with icons; [IMAGE3] includes a color palette guide for the season;
[IMAGE4] presents a trend forecast with illustrated outfit ideas.”
Couple Profile Generation
Each image pair is generated concurrently with In-Context LoRA, aiming to maintain a consistent style and identity
features across both images in each set.
Prompt:
“This pair of images features a couple as cartoon characters in medieval attire; [IMAGE1] shows a knight
with a plumed helmet and a determined look, holding a small shield, while [IMAGE2] displays a character
dressed as a princess with a crown, smiling as they hold a flower, both against a castle background.”
Prompt:
“The pair of images captures a whimsical depiction of a couple in cartoon dragon costumes; [IMAGE1] a
character in a green dragon onesie with pointed ears and a toothy smile peeks towards the right, while
[IMAGE2] shows a character in a purple dragon suit with matching horns, displaying a playful wink, both set
against a cloudy sky background.”
Prompt:
“This pair of images portrays a couple of cartoon cats in detective attire; [IMAGE1] a black cat in a
trench coat and fedora holds a magnifying glass and peers to the right, while [IMAGE2] a white cat with a
bow tie and matching hat raises an eyebrow in curiosity, creating a fun, noir-inspired scene against a dimly
lit background.”
Prompt:
“The pair of images depicts cartoon characters enjoying music together; [IMAGE1] features a character with
a spiky mohawk and wide headphones, bobbing their head with closed eyes, while [IMAGE2] presents a character
with a ponytail, holding a guitar and also wearing headphones, both set against a dark blue background with
musical notes scattered around.”
Prompt:
“The pair of images depicts a couple in a cartoon-style grocery shopping scene; [IMAGE1] one character
reaches for a snack on a high shelf with a playful grin, while [IMAGE2] the other character with wide eyes
and a towering cart of food holds a grocery list, all set in a colorful grocery aisle.”
Prompt:
“This pair of images captures a couple in a pillow fight; [IMAGE1] a character with tousled hair and a
mischievous grin winds up to swing a fluffy pillow, while [IMAGE2] another character, already hit with
feathers flying around them, has a playful look of shock, both in a cozy bedroom with fluffy bedding.”
Visual Identity Design
Each image pair is generated concurrently with In-Context LoRA, aiming to achieve a cohesive and consistent visual
identity across both images in each pair.
Prompt:
“The pair of images highlights a logo and its real-world use for a rustic coffee brand; [IMAGE1] a
striking teal background showcases a logo with a stylized, perched bird in black and white, titled “Bluebird
Roast” in an elegant serif font, with a leafy branch detail underneath; [IMAGE2] this logo is applied to a
coffee mug sitting atop a woven coaster on a dark mahogany table, with a blurred background that emphasizes
the warm tones and classic aesthetic of the branding in a cozy setting.”
Prompt:
“The pair of images showcases the joyful identity of a produce brand, [IMAGE1] showing a smiling pineapple
graphic and the brand name “Fresh Tropic” in a fun, casual font on a light aqua background; while [IMAGE2]
translates the design onto a reusable shopping tote with the pineapple logo in black, held by a person in a
market setting, emphasizing the brand’s approachable and eco-friendly vibe.”
Prompt:
“This pair of images presents an artisan soap brand inspired by botanical elements. [IMAGE1] On a rich
sage green background, delicate gold-foil leaves and flower motifs intertwine around the brand name “Herbal
Haven” in an elegant, serif font, conveying a sophisticated, earthy aesthetic. [IMAGE2] The design is
applied to a set of organic soaps wrapped in handmade paper and twine, placed with real herbs and flowers on
a wooden board, radiating the brand’s commitment to natural beauty and luxury through a warm, inviting
setting.”
Prompt:
“This pair of images introduces a sophisticated confectionery brand identity blending elegance and whimsy.
[IMAGE1] The first image presents a whimsical, Art Nouveau-inspired design, featuring a pattern of golden
leaves intertwined with pastel-colored candy shapes on a deep plum background. The brand name "Golden
Garden" appears in a flowing, decorative font, surrounded by delicate floral filigree. [IMAGE2] The design
is applied to a set of artisanal chocolate boxes, displayed with gold-foil accents and delicate paper
flowers, conveying the brand’s high-end and enchanting quality through luxurious textures and intricate
details.”
Prompt:
“In this set of two images, a bold animal-themed logo is introduced and adapted to a lifestyle product;
[IMAGE1] a simplistic black logo featuring a bear face and the brand name “Bear Lane” on a sky blue
background; [IMAGE2] the design is printed on a gray gym bag and water bottle, with both items positioned on
a wooden gym bench.”
Prompt:
“In this set of two images, a modern mystical brand identity comes to life. [IMAGE1] Against a deep navy
background, intricate star and moon motifs in metallic silver and soft blush pink shimmer in various sizes,
creating a cosmic, dreamlike atmosphere. The brand name “Celestial Glow” is displayed in a sleek, geometric
font that radiates a mystical yet minimalist vibe. [IMAGE2] The design is adapted onto a glowing glass
misting bottle and a crystal-infused body lotion bottle, arranged on a soft, cloud-like velvet fabric with
crystals and candles, showing the brand’s ethereal charm in self-care products.”
Portrait Illustration
Each pair of images is generated with In-Context LoRA, aiming to maintain consistent identity, clothing,
expression, similar pose, and atmosphere between the ‘before’ and ‘after’ illustration versions. Instead of
directly replicating the original photo, the illustration enhances key features with added expressive emphasis.
Prompt:
“This image pair presents a transformation from a realistic portrait to a playful illustration, capturing
both detail and artistic flair; [IMAGE1] the photograph shows a woman standing in a bustling marketplace,
wearing a wide-brimmed hat, a flowing bohemian dress, and a leather crossbody bag; [IMAGE2] the illustration
version exaggerates her accessories and features, with the bohemian dress depicted in vibrant patterns and
bold colors, while the background is simplified into abstract market stalls, giving the scene an animated
and lively feel.”
Prompt:
“The image pair highlights a transformation from a high-fashion portrait to an artistic interpretation,
capturing elegance in both styles; [IMAGE1] the photo shows a woman wearing a sleek black dress with lace
details, posing against a white studio backdrop, her hair styled in an intricate updo; [IMAGE2] the
illustration reimagines her as a stylized figure, with the lace details transformed into bold, intricate
patterns and her hair exaggerated into voluminous curls, while the background is simplified into a gradient
of soft, muted colors, enhancing the contrast between her formal attire and the artistic rendering.”
Prompt:
“The image pair showcases the transformation from reality to a stylized interpretation; [IMAGE1] the photo
shows a person with a topknot, wearing a cozy yellow sweater and plaid scarf, standing in front of a shop
window, while [IMAGE2] the illustrated version highlights the warm tones, adding playful, oversized shapes
and bright hues, creating an animated feel with a soft, inviting background.”
Prompt:
“The image pair illustrates a transformation from a candid photograph to a dynamic illustration, each
capturing distinct artistic qualities; [IMAGE1] the original photo features a man with a beard, wearing a
denim jacket over a graphic tee and black jeans, seated on a staircase with a skateboard beside him, while
[IMAGE2] the illustrated version amplifies his outfit with bold colors, adding stylized graffiti on the
steps and vibrant motion lines around the skateboard.”
Prompt:
“This image pair captures a transformation from a street-style photograph to a dynamic digital
illustration; [IMAGE1] the photo shows a person wearing a colorful windbreaker jacket, ripped jeans, and
white sneakers, walking along a busy city street with a skateboard tucked under their arm; [IMAGE2] the
illustration simplifies the background into bold, abstract shapes, while the figure’s outfit is brightened
with more vibrant colors and their pose is exaggerated, giving the image a sense of movement and energy that
contrasts with the stillness of the photograph.”
Prompt:
“The image pair contrasts a photographic portrait with its illustrated counterpart, showcasing an artistic
reinterpretation; [IMAGE1] the initial photo shows a woman with a high bun, dressed in a classic black
trench coat, holding a bright yellow umbrella, standing on a rainy street, while [IMAGE2] the illustration
accentuates her pose with exaggerated features, making the umbrella the focal point with vivid yellows and
reds, transforming the rain into playful, curving lines.”
Sandstorm Visual Effect
Each image pair is generated using In-Context LoRA, aiming to demonstrate strong consistency between the ‘before’
and ‘after’ sandstorm effect images.
Prompt:
“This image pair showcases the transformation of a cyclist through a sandstorm visual effect; [IMAGE1]
features a cyclist in vibrant gear pedaling steadily on a clear, open road with a serene sky in the background,
highlighting focus and determination, [IMAGE2] transforms the scene as the cyclist becomes enveloped in a fierce
sandstorm, with sand particles swirling intensely around the bike and rider against a stormy, darkened backdrop,
emphasizing chaos and power.”
Prompt:
“The image pair illustrates the metamorphosis of a musician enhanced by a sandstorm effect; [IMAGE1] the first
image depicts a guitarist playing calmly on a minimalist stage with soft lighting, capturing the essence of
tranquility and artistry, [IMAGE2] the second image erupts into a dynamic sandstorm with sand and debris
swirling around the musician and instrument, set against a tumultuous background, conveying an intense and
electrifying performance.”
Prompt:
“This pair of images highlights a stunning transformation with a sandstorm visual effect, balancing calm and
intensity; [IMAGE1] features a man in a meditative pose, seated cross-legged in a black outfit against a white
backdrop, eyes closed, [IMAGE2] shows the man shrouded in a fierce explosion of swirling sand particles mixed
with streaks of electric light, against a deeper background, creating a captivating display of serenity
overtaken by chaos.”
Image-Conditional Generation
Examples of image-conditional generation using In-Context LoRA across multiple tasks with training-free SDEdit.
Portrait Identity Transfer.
Font Style Transfer.
Application of Visual Identity.
Portrait to Illustration.
Failure case of Sandstorm Visual Effect Application.
Failure cases of Portrait Identity Transfer.
We observe that SDEdit for In-Context LoRA tends to be unstable, often failing to preserve identity. Addressing this issue is left for future work.
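The general idea behind mask-based, training-free conditioning can be sketched as follows. This page does not spell out the exact procedure, so the code below is an assumption-laden toy: it treats one row of the concatenated image as a list of floats, marks the columns of the condition panel as known, and at each sampling step overwrites those columns with a noised copy of the reference. The names `panel_mask`, `add_noise`, and `blend_known_region` are hypothetical, and the additive-noise rule is a simplification; the real noising rule depends on the model's sampler.

```python
import random
from typing import List, Set


def panel_mask(panel_width: int, n_panels: int, known: Set[int]) -> List[int]:
    """Return 1 for pixel columns inside a known (condition) panel, else 0."""
    total = panel_width * n_panels
    return [1 if (x // panel_width) in known else 0 for x in range(total)]


def add_noise(row: List[float], sigma: float, rng: random.Random) -> List[float]:
    """Toy noising: add Gaussian noise of scale sigma (an assumed, simplified rule)."""
    return [v + sigma * rng.gauss(0.0, 1.0) for v in row]


def blend_known_region(current: List[float], reference: List[float],
                       mask: List[int], sigma: float,
                       rng: random.Random) -> List[float]:
    """Keep the sampler's values where mask == 0; elsewhere use the noised reference."""
    noised_ref = add_noise(reference, sigma, rng)
    return [n if m else c for c, n, m in zip(current, noised_ref, mask)]


# Toy usage: two panels of width 2, panel 0 is the condition image.
rng = random.Random(0)
mask = panel_mask(panel_width=2, n_panels=2, known={0})
blended = blend_known_region([9.0, 9.0, 9.0, 9.0], [1.0, 2.0, 3.0, 4.0],
                             mask, sigma=0.0, rng=rng)
```

In a full sampler this blend would run once per denoising step with a decreasing noise scale, so only the unmasked panels are actually generated. Nothing here forces the generated panel to copy identity details from the condition panel, which is consistent with the instability noted above.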
BibTeX
@article{lhhuang2024iclora,
title={In-Context LoRA for Diffusion Transformers},
author={Huang, Lianghua and Wang, Wei and Wu, Zhi-Fan and Shi, Yupeng and Dou, Huanzhang and Liang, Chen and Feng, Yutong and Liu, Yu and Zhou, Jingren},
journal={arXiv preprint arXiv:2410.23775},
year={2024}
}