Publications

* equal contribution; † corresponding author.

2025

  1. Last-vit.jpg
    Vision Transformer Needs More Than Register
    In submission, 2025
  2. LoopTrans
    Closed-Loop Transfer for Weakly-supervised Affordance Grounding
    In submission, 2025
  3. SuperChat
    SuperChat: Introducing Super-Image Representation for Video LLMs
    In submission, 2025
  4. EyesWideOpen
    Eyes Wide Open: Ego Proactive Video-LLM for Streaming Video
    In submission, 2025
  5. HalTrapper
    Why LVLMs Are More Prone to Hallucinations in Longer Responses: The Role of Context
    In submission, 2025
  6. SCHall
    Discovering Compositional Hallucination in LVLMs
    In submission, 2025
  7. TTA.jpg
    Free on the Fly: Enhancing Flexibility in Test-Time Adaptation with Online EM
    Accepted by CVPR, 2025
  8. GCD.png
    Adaptive Part Learning for Fine-Grained Generalized Category Discovery: A Plug-and-Play Enhancement
    Accepted by CVPR, 2025
  9. Rethinking.png
    Rethinking Query-based Transformer for Continual Image Segmentation
    Accepted by CVPR, 2025
  10. SegAfford.png
    SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model
    Chunlin Yu*, Hanqing Wang*, Ye Shi, Haoyang Luo, Sibei Yang, Jingyi Yu, Jingya Wang
    Accepted by CVPR, 2025
  11. vton.png
    VTON 360: High-fidelity virtual try-on from any viewing direction
    Zijian He, Yuwei Ning, Yipeng Qin, Guangrun Wang, Sibei Yang, Liang Lin, Guanbin Li
    Accepted by CVPR, 2025
  12. dissecting.png
    Dissecting and Mitigating Diffusion Bias via Mechanistic Interpretability
    Yingdong Shi, Changming Li, Yifan Wang, Yongxiang Zhao, Anqi Pang, Sibei Yang, Jingyi Yu, Kan Ren
    Accepted by CVPR, 2025
  13. delman.jpg
    DELMAN: Dynamic Defense Against Large Language Model Jailbreaking with Model Editing
    Yi Wang, Fenghua Weng, Sibei Yang, Zhan Qin, Minlie Huang, Wenjie Wang
    Accepted by ACL, 2025
  14. DSN.png
    Don’t Say No: Jailbreaking LLM by Suppressing Refusal
    Yukai Zhou, Jian Lou, Zhijie Huang, Zhan Qin, Sibei Yang, Wenjie Wang
    Accepted by ACL, 2025
  15. mvtokenflow.gif
    MVTokenFlow: High-quality 4D Content Generation using Multiview Token Flow
    Hanzhuo Huang*, Yuan Liu*, Ge Zheng, Jiepeng Wang, Zhiyang Dou, Sibei Yang†
    Accepted by ICLR, 2025
  16. cityanchor.png
    CityAnchor: City-scale 3D Visual Grounding with Multi-modality LLMs
    Jinpeng Li*, Haiping Wang*, Jiabin Chen, Yuan Liu, Zhiyang Dou, Yuexin Ma, Sibei Yang, Yuan Li, Wenping Wang, Zhen Dong, Bisheng Yang
    Accepted by ICLR, 2025
  17. ICLR2025
    Discovering Influential Neuron Path in Vision Transformers
    Yifan Wang, Yifei Liu, Yingdong Shi, Changming Li, Anqi Pang, Sibei Yang, Jingyi Yu, Kan Ren
    Accepted by ICLR, 2025

2024

  1. TPAMI2024
    A Survey on Graph Neural Networks and Graph Transformers in Computer Vision: A Task-oriented Perspective
    Chaoqi Chen*, Yushuang Wu*, Qiyuan Dai*, Hong-Yu Zhou*, Mutian Xu, Sibei Yang†, Xiaoguang Han†, Yizhou Yu†
    Accepted by TPAMI, 2024
  2. Part2Object.png
    Part2Object: Hierarchical Unsupervised 3D Instance Segmentation
    Accepted by ECCV, 2024
  3. Plain-D.png
    Plain-DNet: A Plain Multi-Dataset Object Detector
    Accepted by ECCV, 2024
  4. wildrefer.png
    WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language
    Zhenxiang Lin, Xidong Peng, Peishan Cong, Ge Zheng, Yujing Sun, Yuenan Hou, Xinge Zhu, Sibei Yang, Yuexin Ma
    Accepted by ECCV, 2024
  5. CVPR2024
    Curriculum Point Prompting for Weakly-Supervised Referring Image Segmentation
    Accepted by CVPR, 2024
  6. devil.png
    The Devil is in the Object Boundary: Towards Annotation-free Instance Segmentation Using Foundation Models
    Accepted by ICLR, 2024
  7. omg.png
    OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers
    Han Liang, Jiacheng Bao, Ruichi Zhang, Sihan Ren, Yuecheng Xu, Sibei Yang, Xin Chen, Jingyi Yu, Lan Xu
    Accepted by CVPR, 2024
  8. RealDex.png
    RealDex: Towards Human-like Grasping for Robotic Dexterous Hand
    Yumeng Liu*, Yaxun Yang*, Youzhuo Wang*, Xiaofei Wu, Jiamin Wang, Yichen Yao, Sören Schwertfeger, Sibei Yang, Wenping Wang, Jingyi Yu, Xuming He, Yuexin Ma
    Accepted by IJCAI, 2024

2023

  1. ddcot.jpg
    DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models
    Accepted by NeurIPS, 2023
  2. flower_bloom.gif
    Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator
    Hanzhuo Huang*, Yufan Feng*, Cheng Shi, Lan Xu, Jingyi Yu, Sibei Yang†
    Accepted by NeurIPS, 2023
  3. ICCV2023
  4. ICCV2023
    EdaDet: Open-Vocabulary Object Detection Using Early Dense Alignment
    Accepted by ICCV, 2023
  5. ICCV2023
    CoTDet: Affordance Knowledge Prompting for Task Driven Object Detection
    Accepted by ICCV, 2023
  6. ICCV2023
    Temporal Collection and Distribution for Referring Video Object Segmentation
    Accepted by ICCV, 2023
  7. ICCV2023
    Grounded Image Text Matching with Mismatched Relation Reasoning
    Yu Wu*, Yana Wei*, Haozhe Wang, Yongfei Liu, Sibei Yang, Xuming He†
    Accepted by ICCV, 2023
  8. CVPR2023
    Contrastive Grouping with Transformer for Referring Image Segmentation
    Accepted by CVPR, 2023
  9. AAAI2023
    CCQ: Cross-Class Query Network for Partially Labeled Organ Segmentation
    Xuyang Liu, Bingbing Wen, and Sibei Yang†
    Accepted by AAAI, 2023
  10. SIGGRAPH2023
    DreamFace: Progressive Generation of Animatable 3D Faces under Text Guidance
    Longwen Zhang*, Qiwei Qiu*, Hongyang Lin*, Qixuan Zhang, Cheng Shi, Wei Yang, Ye Shi, Sibei Yang†, Lan Xu†, Jingyi Yu†
    Accepted by SIGGRAPH, 2023
  11. TPAMI2023
    A Unified Visual Information Preservation Framework for Self-supervised Pre-training in Medical Image Analysis
    Hong-Yu Zhou*, Chixiang Lu*, Chaoqi Chen, Sibei Yang, Yizhou Yu†
    Accepted by TPAMI, 2023
  12. WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language
    Zhenxiang Lin, Xidong Peng, Peishan Cong, Yuenan Hou, Xinge Zhu, Sibei Yang, Yuexin Ma
    arXiv preprint arXiv:2304.05645, 2023

2022

  1. ECCV2022
    Spatial and Visual Perspective-Taking via View Rotation and Relation Reasoning for Embodied Reference Understanding
    Accepted by ECCV, 2022

2021

  1. TMM2021
    Structured attention network for referring image segmentation
    Liang Lin, Pengxiang Yan, Xiaoqian Xu, Sibei Yang, Kun Zeng, Guanbin Li
    Accepted by Transactions on Multimedia, 2021
  2. CVPR2021
    Bottom-up shift and reasoning for referring image segmentation
    Sibei Yang, Meng Xia, Guanbin Li, Hong-Yu Zhou, Yizhou Yu
    Accepted by CVPR, 2021
  3. ICCV2021
    Convnets vs. transformers: Whose visual representations are more transferable?
    Hong-Yu Zhou, Chixiang Lu, Sibei Yang, Yizhou Yu
    Accepted by ICCV, 2021
  4. ICCV2021
    Preservational learning improves self-supervised medical image models by reconstructing diverse contexts
    Hong-Yu Zhou, Chixiang Lu, Sibei Yang, Xiaoguang Han, Yizhou Yu
    Accepted by ICCV, 2021

2020

  1. TPAMI2020
    Relationship-embedded representation learning for grounding referring expressions
    Sibei Yang, Guanbin Li, and Yizhou Yu
    Accepted by TPAMI, 2020
  2. CVPR2020
    Graph-structured referring expression reasoning in the wild
    Sibei Yang, Guanbin Li, and Yizhou Yu
    Accepted by CVPR, 2020
  3. ECCV2020
    Propagating over phrase relations for one-stage visual grounding
    Sibei Yang, Guanbin Li, and Yizhou Yu
    Accepted by ECCV, 2020

2019

  1. AAAI2019
    Non-Local Context Encoder: Robust Biomedical Image Segmentation against Adversarial Attacks
    Xiang He, Sibei Yang, Guanbin Li, Haofeng Li, HuiYou Chang, Yizhou Yu
    Accepted by AAAI, 2019
  2. Dynamic Graph Attention for Referring Expression Comprehension
    Sibei Yang, Guanbin Li, and Yizhou Yu
    Accepted by ICCV, Oct 2019
  3. CVPR2019
    Cross-Modal Relationship Inference for Grounding Referring Expressions
    Sibei Yang, Guanbin Li, and Yizhou Yu
    Accepted by CVPR, Jun 2019

2018

  1. CVPR2018
    Multi-Evidence Filtering and Fusion for Multi-Label Classification, Object Detection and Semantic Segmentation Based on Weakly Supervised Learning
    Weifeng Ge, Sibei Yang, and Yizhou Yu
    Accepted by CVPR, Jun 2018