2025 International Conference on Generative Artificial Intelligence and Digital Media (GADM 2025)
Keynote Speakers






Prof. Qing Li

The Hong Kong Polytechnic University, Hong Kong, China 

IEEE Fellow

Qing Li is a Chair Professor and Head of the Department of Computing, The Hong Kong Polytechnic University. He received his B.Eng. from Hunan University (Changsha), and M.Sc. and Ph.D. degrees from the University of Southern California (Los Angeles), all in computer science. His research interests include multi-modal data management, conceptual data modeling, social media, Web services, and e-learning systems. He has authored/co-authored over 500 publications in these areas, with over 45,000 citations and an H-index of 87 (source: Google Scholar). He is actively involved in the research community and has served as Editor-in-Chief of Computers & Education: X Reality (CEXR) by Elsevier, and as an associate editor of IEEE Transactions on Artificial Intelligence (TAI), IEEE Transactions on Cognitive and Developmental Systems (TCDS), IEEE Transactions on Knowledge and Data Engineering (TKDE), ACM Transactions on Internet Technology (TOIT), Data Science and Engineering (DSE), and the World Wide Web (WWW) Journal, in addition to serving as Conference and Program Chair/Co-Chair of numerous major international conferences. He also sits/sat on the Steering Committees of DASFAA, ER, ACM RecSys, IEEE U-MEDIA, and ICWL. Prof. Li is a Fellow of IEEE, AAIA, and IET/IEE.

Title: A Multi-level Querying Method for an Indoor Robot Smart Space

Abstract: 

Smart Spaces are dynamic, adaptive environments enhanced with robotics and AI technologies; examples include smart homes, offices, and cafes. By leveraging and integrating Computer Vision, Natural Language Processing, AIoT, Data Mining, Recommender Systems, and Sympathetic Computing, a Smart Space can improve efficiency, personalization, and user satisfaction through seamless interactions. In this talk, we introduce PolyRAG, a knowledge QA framework supporting multi-level querying for an indoor robot application system. Building on top of a naive RAG layer, we construct a knowledge pyramid by adding a knowledge graph layer and an ontology schema, so as to obtain a good balance of recall and precision when applied to a specific domain such as coffee robot interactions. An experimental coffee robot prototype has been implemented, and preliminary empirical studies show the effectiveness of PolyRAG in supporting top-down querying from ontology to KG to RAG.
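The top-down querying idea can be sketched roughly as follows. This is a minimal illustration only: the layer contents, function names, and fallback logic are hypothetical, not the actual PolyRAG interfaces.

```python
# Sketch of top-down multi-level querying (ontology -> KG -> RAG fallback).
# All layer data and function names below are hypothetical illustrations.

def ontology_lookup(query):
    # Most precise layer: schema-level definitions.
    schema = {"coffee": "Beverage prepared from roasted coffee beans"}
    return schema.get(query)

def kg_lookup(query):
    # Middle layer: entity/relation facts from a knowledge graph.
    triples = {("latte", "contains"): "espresso and steamed milk"}
    return triples.get((query, "contains"))

def rag_lookup(query):
    # Bottom layer: retrieve free-text passages (highest recall, lowest precision).
    corpus = ["A latte is typically served in a tall glass."]
    return corpus[0] if "latte" in query else None

def answer(query):
    """Try the most precise layer first; fall back down the pyramid."""
    for layer in (ontology_lookup, kg_lookup, rag_lookup):
        result = layer(query)
        if result is not None:
            return result
    return "no answer found"

print(answer("coffee"))  # answered at the ontology layer
print(answer("latte"))   # falls through to the KG layer
```

The key design point is that each query is answered at the most precise layer able to serve it, and only falls back to broader retrieval when the upper layers have no match.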











Prof. James Tin Yau KWOK

The Hong Kong University of Science and Technology, Hong Kong, China 

IEEE Fellow

Prof. Kwok received his B.Sc. degree in Electrical and Electronic Engineering from the University of Hong Kong and his Ph.D. degree in computer science from the Hong Kong University of Science and Technology. He then joined the Department of Computer Science, Hong Kong Baptist University as an Assistant Professor. He later returned to the Hong Kong University of Science and Technology, where he is now a Professor in the Department of Computer Science and Engineering. He serves or has served as an Associate Editor for the IEEE Transactions on Neural Networks and Learning Systems, Neural Networks, Neurocomputing, Artificial Intelligence Journal, and the International Journal of Data Science and Analytics, and on the Editorial Board of Machine Learning. He also serves as a Senior Area Chair of major machine learning / AI conferences including NeurIPS, ICML, ICLR, and IJCAI, and as an Area Chair of conferences including AAAI and ECML. He is on the IJCAI Board of Trustees.

Title: Vision-Language Models: Pre-Training, Fine-Tuning and Trustworthiness

Abstract: 

Vision-language models (VLMs) are now widely used in a variety of vision-language tasks. However, a number of challenges remain. First, cross-modal masked language modeling is often used to learn vision-language associations, but existing masking strategies are insufficient in that the masked tokens can sometimes be recovered from the language information alone, ignoring the visual inputs. Second, during fine-tuning, multiple models with different hyperparameter configurations are often created, yet typically only one of them is actually used in the downstream task. Third, vision-language models are more vulnerable to jailbreak attacks than their LLM predecessors.

 

To address the first issue, we use a masking strategy based on the saliency of language tokens to the image. For the second issue, we consider the learned soup, which combines all fine-tuned models with learned weighting coefficients. While this can significantly enhance performance, it is also computationally expensive. We propose to mitigate this by formulating the learned soup as a computationally efficient hyperplane optimization problem and employing block coordinate gradient descent to learn the mixing coefficients. Finally, to construct robust VLMs, we propose a training-free protection approach that exploits the inherent safety awareness of LLMs, generating safer responses by adaptively transforming unsafe images into text so as to activate the intrinsic safety mechanism of the pre-aligned LLMs within VLMs.
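The core "soup" operation of mixing fine-tuned weights can be sketched in a few lines. This toy version, with made-up two-parameter "models", only illustrates the softmax-normalized convex combination of parameters; it does not reproduce the hyperplane formulation or the block coordinate descent procedure described in the talk.

```python
import math

def softmax(logits):
    # Normalize raw mixing logits into coefficients that sum to 1.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def soup(param_sets, logits):
    """Weighted average of fine-tuned parameter sets with softmax(logits) coefficients."""
    coeffs = softmax(logits)
    n_params = len(param_sets[0])
    return [
        sum(c * params[i] for c, params in zip(coeffs, param_sets))
        for i in range(n_params)
    ]

# Two hypothetical fine-tuned models with equal logits -> a uniform average.
models = [[1.0, 2.0], [3.0, 4.0]]
print(soup(models, [0.0, 0.0]))  # [2.0, 3.0]
```

In the learned-soup setting, the logits would be optimized on validation data rather than fixed, so the soup can weight better-performing fine-tuned models more heavily instead of averaging them uniformly.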












Prof. Xiaojiang Peng

Shenzhen Technology University, China

IEEE Senior Member

Xiaojiang Peng (IEEE Senior Member) is a full professor at the College of Big Data and Internet, Shenzhen Technology University, where he serves as dean of the Artificial Intelligence Department. He received his Ph.D. degree from Southwest Jiaotong University. He was an associate professor at Shenzhen Technology University from 2020 to 2023, and an associate professor at the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences from 2017 to 2020. Previously, he was a postdoctoral researcher at Idiap, Switzerland and at Inria LEAR/THOTH, France from 2015 to 2017. He has published more than 100 top journal/conference papers (e.g., TIP, CVPR, ICCV, ECCV, NeurIPS, AAAI), garnering more than 6,700 citations on Google Scholar. His research interests include computer vision, affective computing, and generative AI applications.


Title: Emotional AI: From Facial Expression Neural Networks to Multimodal Large Language Models

Abstract: 

In an increasingly digital world, understanding human emotions is more important than ever. Affective computing bridges the gap between technology and human experience, enabling machines to recognize, interpret, and respond to our emotions. Nevertheless, analysing human emotions is challenging due to ambiguous annotations and data scarcity. This talk will trace the evolution of affective computing, starting with foundational work on facial expression recognition, which provided initial insights into how machines can decode visual cues of emotion. We will explore the limitations of early approaches and the subsequent shift towards more comprehensive multimodal emotion understanding, which integrates data from various sources, including image context, voice tone, and natural language. Moreover, we will examine recent multimodal LLMs and LLMs in the field of affective computing, and highlight how these technologies enhance our ability to create emotionally intelligent systems.








Assoc. Prof. Ata Jahangir Moshayedi

Jiangxi University of Science and Technology, China

IEEE Senior Member

Ata Jahangir Moshayedi received the Ph.D. degree in electronic science from Savitribai Phule Pune University, India. He is currently an Associate Professor with Jiangxi University of Science and Technology, China. He is a member of the editorial teams of various conferences, has published numerous journal articles and two books, and holds two patents. His research interests include robotics and automation, sensor modeling, bio-inspired robots, mobile robot olfaction/plume tracking, embedded systems, machine vision-based systems, virtual reality, and artificial intelligence. He is a member of several scientific societies, including IEEE, ACM, and the Instrument Society of India (life member), and is a lifetime member of the Speed Society, India.

Title: Service Robotics as Assistance for Alzheimer's Disease

Abstract: 

Service robots are increasingly being integrated into various fields to assist humans in their daily lives. One of the most promising applications is in the care of individuals with Alzheimer's disease (AD). This keynote explores the role of service robotics in assisting those suffering from AD, examining its potential to enhance quality of life, ease the burden on caregivers, and improve overall care delivery, while addressing challenges such as memory loss and caregiver burnout. It surveys the various types of robots involved, from companion robots to assistive and therapy robots, and showcases case studies highlighting their effectiveness.

The integration of service robotics into Alzheimer's care presents a promising frontier for enhancing patient well-being and supporting caregivers. While challenges remain in terms of technology adaptation, cost, and ethical considerations, the potential benefits are substantial. Future advancements in robotics, AI, and machine learning will likely lead to more sophisticated, user-friendly, and effective solutions tailored to the unique needs of Alzheimer's patients. Embracing these innovations can significantly contribute to improved care, greater independence for patients, and a better quality of life for both patients and their caregivers.