TY - GEN
T1 - Scene Graph Driven Text-Prompt Generation for Image Inpainting
AU - Shukla, Tripti
AU - Maheshwari, Paridhi
AU - Singh, Rajhans
AU - Shukla, Ankita
AU - Kulkarni, Kuldeep
AU - Turaga, Pavan
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Scene editing methods are undergoing a revolution, driven by text-to-image synthesis. Applications in media content generation have benefited from carefully engineered text prompts that artists have arrived at through trial and error. There is a growing need to better model prompt generation so that it is useful for a broad range of consumer-grade applications. We propose a novel method for text prompt generation for the explicit purpose of consumer-grade image inpainting, i.e., the insertion of new objects into missing regions of an image. Our approach leverages existing inter-object relationships to generate plausible textual descriptions for the missing object, which can then be used with any text-to-image generator. Given an image and a location where a new object is to be inserted, our approach first converts the image into an intermediate scene graph. Then, we use graph convolutional networks to 'expand' the scene graph by predicting the identity and relationships of the new object with respect to the existing objects in the scene. The expanded scene graph is cast into a textual description, which is then processed by a text-to-image generator, conditioned on the given image, to produce the final inpainted image. We conduct extensive experiments on the Visual Genome dataset and show, through qualitative and quantitative metrics, that our method is superior to other methods.
AB - Scene editing methods are undergoing a revolution, driven by text-to-image synthesis. Applications in media content generation have benefited from carefully engineered text prompts that artists have arrived at through trial and error. There is a growing need to better model prompt generation so that it is useful for a broad range of consumer-grade applications. We propose a novel method for text prompt generation for the explicit purpose of consumer-grade image inpainting, i.e., the insertion of new objects into missing regions of an image. Our approach leverages existing inter-object relationships to generate plausible textual descriptions for the missing object, which can then be used with any text-to-image generator. Given an image and a location where a new object is to be inserted, our approach first converts the image into an intermediate scene graph. Then, we use graph convolutional networks to 'expand' the scene graph by predicting the identity and relationships of the new object with respect to the existing objects in the scene. The expanded scene graph is cast into a textual description, which is then processed by a text-to-image generator, conditioned on the given image, to produce the final inpainted image. We conduct extensive experiments on the Visual Genome dataset and show, through qualitative and quantitative metrics, that our method is superior to other methods.
UR - http://www.scopus.com/inward/record.url?scp=85170821274&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85170821274&partnerID=8YFLogxK
U2 - 10.1109/CVPRW59228.2023.00083
DO - 10.1109/CVPRW59228.2023.00083
M3 - Conference contribution
AN - SCOPUS:85170821274
T3 - IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
SP - 759
EP - 768
BT - Proceedings - 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2023
PB - IEEE Computer Society
T2 - 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2023
Y2 - 18 June 2023 through 22 June 2023
ER -