TY - GEN
T1 - Scene Graph Driven Text-Prompt Generation for Image Inpainting
AU - Shukla, Tripti
AU - Maheshwari, Paridhi
AU - Singh, Rajhans
AU - Shukla, Ankita
AU - Kulkarni, Kuldeep
AU - Turaga, Pavan
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Scene editing methods are undergoing a revolution, driven by text-to-image synthesis. Applications in media content generation have benefited from carefully engineered text prompts that artists have arrived at through trial and error. There is a growing need to better model prompt generation so that it is useful for a broad range of consumer-grade applications. We propose a novel method for text prompt generation for the explicit purpose of consumer-grade image inpainting, i.e., the insertion of new objects into missing regions of an image. Our approach leverages existing inter-object relationships to generate plausible textual descriptions for the missing object, which can then be used with any text-to-image generator. Given an image and a location where a new object is to be inserted, our approach first converts the image into an intermediate scene graph. Then, we use graph convolutional networks to 'expand' the scene graph by predicting the identity and relationships of the new object with respect to the existing objects in the scene. The expanded scene graph is cast into a textual description, which is then processed by a text-to-image generator, conditioned on the given image, to produce the final inpainted image. We conduct extensive experiments on the Visual Genome dataset and show, through qualitative and quantitative metrics, that our method is superior to other methods.
AB - Scene editing methods are undergoing a revolution, driven by text-to-image synthesis. Applications in media content generation have benefited from carefully engineered text prompts that artists have arrived at through trial and error. There is a growing need to better model prompt generation so that it is useful for a broad range of consumer-grade applications. We propose a novel method for text prompt generation for the explicit purpose of consumer-grade image inpainting, i.e., the insertion of new objects into missing regions of an image. Our approach leverages existing inter-object relationships to generate plausible textual descriptions for the missing object, which can then be used with any text-to-image generator. Given an image and a location where a new object is to be inserted, our approach first converts the image into an intermediate scene graph. Then, we use graph convolutional networks to 'expand' the scene graph by predicting the identity and relationships of the new object with respect to the existing objects in the scene. The expanded scene graph is cast into a textual description, which is then processed by a text-to-image generator, conditioned on the given image, to produce the final inpainted image. We conduct extensive experiments on the Visual Genome dataset and show, through qualitative and quantitative metrics, that our method is superior to other methods.
UR - http://www.scopus.com/inward/record.url?scp=85170821274&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85170821274&partnerID=8YFLogxK
U2 - 10.1109/CVPRW59228.2023.00083
DO - 10.1109/CVPRW59228.2023.00083
M3 - Conference contribution
AN - SCOPUS:85170821274
T3 - IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
SP - 759
EP - 768
BT - Proceedings - 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2023
PB - IEEE Computer Society
T2 - 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2023
Y2 - 18 June 2023 through 22 June 2023
ER -