BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Asia/Seoul
X-LIC-LOCATION:Asia/Seoul
BEGIN:STANDARD
TZOFFSETFROM:+0900
TZOFFSETTO:+0900
TZNAME:KST
DTSTART:18871231T000000
DTSTART:19881009T020000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20230103T035307Z
LOCATION:Auditorium\, Level 5\, West Wing
DTSTART;TZID=Asia/Seoul:20221206T100000
DTEND;TZID=Asia/Seoul:20221206T120000
UID:siggraphasia_SIGGRAPH Asia 2022_sess153_papers_277@linklings.com
SUMMARY:Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers
DESCRIPTION:Technical Papers\n\nMasked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers\n\nSun, Zhou, Wang, Wu, Hong...\n\nPrevious studies have explored generating accurately lip-synced talking faces for arbitrary targets given audio conditions. However, most of them deform or generate the whole facial area, leading to non-realistic results. In this work, we delve into the formulation of altering only the mouth shapes of the target person. This requires masking a large percentage of the original image and seamlessly inpainting it with the aid of audio and reference frames. To this end, we propose the Audio-Visual Context-Aware Transformer (AV-CAT) framework, which produces accurate lip-sync with photo-realistic quality by predicting the masked mouth shapes. Our key insight is to exploit desired contextual information provided in audio and visual modalities thoroughly with delicately designed Transformers. Specifically, we propose a convolution-Transformer hybrid backbone and design an attention-based fusion strategy for filling the masked parts. It uniformly attends to the textural information on the unmasked regions and the reference frame. Then the semantic audio information is involved in enhancing the self-attention computation. Additionally, a refinement network with audio injection improves both image and lip-sync quality. Extensive experiments validate that our model can generate high-fidelity lip-synced results for arbitrary subjects.\n\nRegistration Category: FULL ACCESS, EXPERIENCE PLUS ACCESS, EXPERIENCE ACCESS, TRADE EXHIBITOR\n\nLanguage: ENGLISH\n\nFormat: IN-PERSON
URL:https://sa2022.siggraph.org/en/full-program/?id=papers_277&sess=sess153
END:VEVENT
END:VCALENDAR