BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Asia/Seoul
X-LIC-LOCATION:Asia/Seoul
BEGIN:STANDARD
TZOFFSETFROM:+0900
TZOFFSETTO:+0900
TZNAME:KST
DTSTART:18871231T000000
DTSTART:19881009T020000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20230103T035307Z
LOCATION:Auditorium\, Level 5\, West Wing
DTSTART;TZID=Asia/Seoul:20221206T100000
DTEND;TZID=Asia/Seoul:20221206T120000
UID:siggraphasia_SIGGRAPH Asia 2022_sess153_papers_277@linklings.com
SUMMARY:Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation
in Transformers
DESCRIPTION:Technical Papers\n\nMasked Lip-Sync Prediction by Audio-Visual
Contextual Exploitation in Transformers\n\nSun, Zhou, Wang, Wu, Hong...\n
\nPrevious studies have explored generating accurately lip-synced talking
faces for arbitrary targets given audio conditions. However, most of them
deform or generate the whole facial area, leading to unrealistic results
. In this work, we study a formulation that alters only the mouth s
hapes of the target person. This requires masking a large percentage of th
e original image and seamlessly inpainting it with the aid of audio and re
ference frames. To this end, we propose the Audio-Visual Context-Aware Tra
nsformer (AV-CAT) framework, which produces accurate lip-sync with photo-r
ealistic quality by predicting the masked mouth shapes. Our key insight is
to thoroughly exploit the contextual information provided by the audio an
d visual modalities with carefully designed Transformers. Specifically, w
e propose a convolution-Transformer hybrid backbone and design an attentio
n-based fusion strategy for filling the masked parts. It uniformly attends
to the textural information on the unmasked regions and the reference fra
me. Semantic audio information is then incorporated to enhance the self-
attention computation. Additionally, a refinement network with audio injec
tion improves both image and lip-sync quality. Extensive experiments valid
ate that our model can generate high-fidelity lip-synced results for arbit
rary subjects.\n\nRegistration Category: FULL ACCESS, EXPERIENCE PLUS ACCE
SS, EXPERIENCE ACCESS, TRADE EXHIBITOR\n\nLanguage: ENGLISH\n\nFormat: IN-
PERSON
URL:https://sa2022.siggraph.org/en/full-program/?id=papers_277&sess=sess15
3
END:VEVENT
END:VCALENDAR