CONCEPTBED: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models

Maitreya Patel; Tejas Gokhale; Chitta Baral; Yezhou Yang

doi:10.1609/aaai.v38i13.29371

Engineering, Ira A. Fulton Schools of (IAFSE)

Research output: Contribution to journal › Conference article › peer-review

Abstract

The ability to understand visual concepts and replicate and compose these concepts from images is a central goal for computer vision. Recent advances in text-to-image (T2I) models have led to high definition and realistic image quality generation by learning from large databases of images and their descriptions. However, the evaluation of T2I models has focused on photorealism and limited qualitative measures of visual understanding. To quantify the ability of T2I models in learning and synthesizing novel visual concepts (a.k.a. personalized T2I), we introduce CONCEPTBED, a large-scale dataset that consists of 284 unique visual concepts, and 33K composite text prompts. Along with the dataset, we propose an evaluation metric, Concept Confidence Deviation (CCD), that uses the confidence of oracle concept classifiers to measure the alignment between concepts generated by T2I generators and concepts contained in target images. We evaluate visual concepts that are either objects, attributes, or styles, and also evaluate four dimensions of compositionality: counting, attributes, relations, and actions. Our human study shows that CCD is highly correlated with human understanding of concepts. Our results point to a trade-off between learning the concepts and preserving the compositionality which existing approaches struggle to overcome. The data, code, and interactive demo are available at: https://conceptbed.github.io/
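The abstract describes CCD only at a high level, and this record does not give its formula. As a rough illustration, the sketch below assumes one plausible reading: the score is the gap between an oracle classifier's mean confidence on the target images and its mean confidence on the generated images. The function name, its arguments, and the sample confidences are all hypothetical, and the paper's actual definition may differ.

```python
from statistics import mean


def confidence_deviation(target_conf, generated_conf):
    """Hypothetical deviation score (an assumption, not the paper's formula).

    target_conf:    oracle confidences for the concept on target images
    generated_conf: oracle confidences for the concept on generated images
    Returns mean(target_conf) - mean(generated_conf); under this reading,
    a value near zero means the generated images carry the concept about
    as strongly as the target images do.
    """
    if not target_conf or not generated_conf:
        raise ValueError("both confidence lists must be non-empty")
    return mean(target_conf) - mean(generated_conf)


# Made-up confidences, for illustration only.
target = [0.9, 0.8, 1.0]
generated = [0.6, 0.7, 0.5]
score = confidence_deviation(target, generated)
print(f"deviation: {score:.2f}")
```

Under this assumed definition, a larger positive value would indicate that the generator reproduces the concept less faithfully than the target images show it.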

Original language	English (US)
Pages (from-to)	14554-14562
Number of pages	9
Journal	Proceedings of the AAAI Conference on Artificial Intelligence
Volume	38
Issue number	13
DOIs	https://doi.org/10.1609/aaai.v38i13.29371
State	Published - Mar 25 2024
Event	38th AAAI Conference on Artificial Intelligence, AAAI 2024 - Vancouver, Canada Duration: Feb 20 2024 → Feb 27 2024

ASJC Scopus subject areas

Artificial Intelligence

Cite this

@article{14b4530de6a645e7b172f117fe0ea3ab,

title = "CONCEPTBED: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models",

abstract = "The ability to understand visual concepts and replicate and compose these concepts from images is a central goal for computer vision. Recent advances in text-to-image (T2I) models have led to high definition and realistic image quality generation by learning from large databases of images and their descriptions. However, the evaluation of T2I models has focused on photorealism and limited qualitative measures of visual understanding. To quantify the ability of T2I models in learning and synthesizing novel visual concepts (a.k.a. personalized T2I), we introduce CONCEPTBED, a large-scale dataset that consists of 284 unique visual concepts, and 33K composite text prompts. Along with the dataset, we propose an evaluation metric, Concept Confidence Deviation (CCD), that uses the confidence of oracle concept classifiers to measure the alignment between concepts generated by T2I generators and concepts contained in target images. We evaluate visual concepts that are either objects, attributes, or styles, and also evaluate four dimensions of compositionality: counting, attributes, relations, and actions. Our human study shows that CCD is highly correlated with human understanding of concepts. Our results point to a trade-off between learning the concepts and preserving the compositionality which existing approaches struggle to overcome. The data, code, and interactive demo are available at: https://conceptbed.github.io/",

author = "Maitreya Patel and Tejas Gokhale and Chitta Baral and Yezhou Yang",

note = "Publisher Copyright: Copyright {\textcopyright} 2024, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.; 38th AAAI Conference on Artificial Intelligence, AAAI 2024 ; Conference date: 20-02-2024 Through 27-02-2024",

year = "2024",

month = mar,

day = "25",

doi = "10.1609/aaai.v38i13.29371",

language = "English (US)",

volume = "38",

pages = "14554--14562",

journal = "Proceedings of the AAAI Conference on Artificial Intelligence",

issn = "2159-5399",

number = "13",

}

TY - JOUR

T1 - CONCEPTBED: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models

T2 - 38th AAAI Conference on Artificial Intelligence, AAAI 2024

AU - Patel, Maitreya

AU - Gokhale, Tejas

AU - Baral, Chitta

AU - Yang, Yezhou

PY - 2024/3/25

Y1 - 2024/3/25

N2 - The ability to understand visual concepts and replicate and compose these concepts from images is a central goal for computer vision. Recent advances in text-to-image (T2I) models have led to high definition and realistic image quality generation by learning from large databases of images and their descriptions. However, the evaluation of T2I models has focused on photorealism and limited qualitative measures of visual understanding. To quantify the ability of T2I models in learning and synthesizing novel visual concepts (a.k.a. personalized T2I), we introduce CONCEPTBED, a large-scale dataset that consists of 284 unique visual concepts, and 33K composite text prompts. Along with the dataset, we propose an evaluation metric, Concept Confidence Deviation (CCD), that uses the confidence of oracle concept classifiers to measure the alignment between concepts generated by T2I generators and concepts contained in target images. We evaluate visual concepts that are either objects, attributes, or styles, and also evaluate four dimensions of compositionality: counting, attributes, relations, and actions. Our human study shows that CCD is highly correlated with human understanding of concepts. Our results point to a trade-off between learning the concepts and preserving the compositionality which existing approaches struggle to overcome. The data, code, and interactive demo are available at: https://conceptbed.github.io/

AB - The ability to understand visual concepts and replicate and compose these concepts from images is a central goal for computer vision. Recent advances in text-to-image (T2I) models have led to high definition and realistic image quality generation by learning from large databases of images and their descriptions. However, the evaluation of T2I models has focused on photorealism and limited qualitative measures of visual understanding. To quantify the ability of T2I models in learning and synthesizing novel visual concepts (a.k.a. personalized T2I), we introduce CONCEPTBED, a large-scale dataset that consists of 284 unique visual concepts, and 33K composite text prompts. Along with the dataset, we propose an evaluation metric, Concept Confidence Deviation (CCD), that uses the confidence of oracle concept classifiers to measure the alignment between concepts generated by T2I generators and concepts contained in target images. We evaluate visual concepts that are either objects, attributes, or styles, and also evaluate four dimensions of compositionality: counting, attributes, relations, and actions. Our human study shows that CCD is highly correlated with human understanding of concepts. Our results point to a trade-off between learning the concepts and preserving the compositionality which existing approaches struggle to overcome. The data, code, and interactive demo are available at: https://conceptbed.github.io/

UR - http://www.scopus.com/inward/record.url?scp=85189635727&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85189635727&partnerID=8YFLogxK

U2 - 10.1609/aaai.v38i13.29371

DO - 10.1609/aaai.v38i13.29371

M3 - Conference article

AN - SCOPUS:85189635727

SN - 2159-5399

VL - 38

SP - 14554

EP - 14562

JO - Proceedings of the AAAI Conference on Artificial Intelligence

JF - Proceedings of the AAAI Conference on Artificial Intelligence

IS - 13

Y2 - 20 February 2024 through 27 February 2024

ER -
