BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Asia/Seoul
X-LIC-LOCATION:Asia/Seoul
BEGIN:STANDARD
TZOFFSETFROM:+0900
TZOFFSETTO:+0900
TZNAME:KST
DTSTART:18871231T000000
DTSTART:19881009T020000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20230103T035309Z
LOCATION:Room 325-AB\, Level 3\, West Wing
DTSTART;TZID=Asia/Seoul:20221207T140000
DTEND;TZID=Asia/Seoul:20221207T153000
UID:siggraphasia_SIGGRAPH Asia 2022_sess162_papers_277@linklings.com
SUMMARY:Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers
DESCRIPTION:Technical Communications, Technical Papers\n\nMasked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers\n\nSun, Zhou, Wang, Wu, Hong...\n\nPrevious studies have explored generating accurately lip-synced talking faces for arbitrary targets given audio conditions. However, most of them deform or generate the whole facial area, leading to non-realistic results. In this work, we delve into the formulation of altering only the mouth shapes of the target person. This requires masking a large percentage of the original image and seamlessly inpainting it with the aid of audio and reference frames. To this end, we propose the Audio-Visual Context-Aware Transformer (AV-CAT) framework, which produces accurate lip-sync with photo-realistic quality by predicting the masked mouth shapes. Our key insight is to exploit desired contextual information provided in audio and visual modalities thoroughly with delicately designed Transformers. Specifically, we propose a convolution-Transformer hybrid backbone and design an attention-based fusion strategy for filling the masked parts. It uniformly attends to the textural information on the unmasked regions and the reference frame. Then the semantic audio information is involved in enhancing the self-attention computation. Additionally, a refinement network with audio injection improves both image and lip-sync quality. Extensive experiments validate that our model can generate high-fidelity lip-synced results for arbitrary subjects.\n\nRegistration Category: FULL ACCESS, ON-DEMAND ACCESS\n\nLanguage: ENGLISH\n\nFormat: IN-PERSON, ON-DEMAND
URL:https://sa2022.siggraph.org/en/full-program/?id=papers_277&sess=sess162
END:VEVENT
END:VCALENDAR