In practice the isomorphism cannot be achieved: the same sentence may correspond to many images. This is why we need DALL-E 2: the two categories are not purely isomorphic. When we type "a photo of dog", there exist millions of different matching images. Therefore, DALL-E 2 modifies the definition of the image category, so that each object is a probability distribution over images. From this perspective, the diffusion model is a generator of images, while CLIP learns the functor from the category of texts to the category of probability distributions over images.
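As a rough sketch, the change of target category can be written as follows. The symbols $\mathcal{T}$ (texts), $\mathcal{I}$ (images), $\mathcal{P}(\mathcal{I})$ (probability distributions over images) and $F$ are notation assumed here for illustration and do not come from the notes themselves.

```latex
% Assumed notation (not from the original notes):
%   \mathcal{T}               texts
%   \mathcal{I}               images
%   \mathcal{P}(\mathcal{I})  probability distributions over images
\[
  F : \mathcal{T} \to \mathcal{P}(\mathcal{I}),
  \qquad t \mapsto p(\,\cdot \mid t).
\]
% Generating an image for a prompt t means sampling from its distribution:
\[
  x \sim p(\,\cdot \mid t), \qquad x \in \mathcal{I}.
\]
% A one-to-one text-image correspondence would force a point mass,
\[
  p(\,\cdot \mid t) = \delta_{x_t},
\]
% which fails for a prompt such as "a photo of dog", since many
% different images match it.
```

Under this reading, the one-to-many problem shows up as the fact that $p(\cdot \mid t)$ is not a point mass for typical prompts.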
Could the evolution from CLIP to DALL-E 2 motivate new work?
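A question that runs through this whole text: isomorphism is mathematically elegant, but given that the isomorphism map keeps changing, is it really that easy to learn?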