
Awesome Egocentric

A curated list of papers bridging exocentric (third-person) and egocentric (first-person) vision.

Survey

  1. Bridging Perspectives: A Survey on Cross-view Collaborative Intelligence with Egocentric-Exocentric Vision

Datasets

| Main Category | Sub-category | Title | Publish | Comment | Code Available | Computing Resource |
| --- | --- | --- | --- | --- | --- | --- |
| Egocentric Video Understanding | Egocentric Video Grounding | RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos | ECCV 2024 | | has code | 4 NVIDIA RTX A6000 |
| | | Grounded Question-Answering in Long Egocentric Videos | CVPR 2024 | | has code | 4 NVIDIA A100 (80GB) |
| | | SnAG: Scalable and Accurate Video Grounding | CVPR 2024 | | has code | |
| | | Object-Shot Enhanced Grounding Network for Egocentric Video | CVPR 2025 | | has code | |
| | Egocentric Video Captioning | Retrieval-Augmented Egocentric Video Captioning | CVPR 2024 | | has code | |
| | Egocentric Video Retrieval | EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval | ECCV 2024 | | has code | |
| | | Sound Bridge: Associating Egocentric and Exocentric Videos via Audio Cues | CVPR 2025 | | has code | 8 NVIDIA V100 |
| | Egocentric Video Mistake Detection | Gazing Into Missteps: Leveraging Eye-Gaze for Unsupervised Mistake Detection in Egocentric Videos of Skilled Human Activities | CVPR 2025 | | no code | |
| | Egocentric Environment Understanding | DIV-FF: Dynamic Image-Video Feature Fields for Environment Understanding in Egocentric Videos | CVPR 2025 | Dynamic object segmentation, affordance segmentation | has code | 1 NVIDIA RTX 4090 |
| | Egocentric Motion and Pose Estimation | EgoPoseFormer: A Simple Baseline for Stereo Egocentric 3D Human Pose Estimation | ECCV 2024 | | has code | |
| | | Egocentric Whole-Body Motion Capture with FisheyeViT and Diffusion-Based Motion Refinement | CVPR 2024 | | has code | |
| | | REWIND: Real-Time Egocentric Whole-Body Motion Diffusion with Exemplar-Based Identity Conditioning | CVPR 2025 | | has code | NVIDIA RTX 8000 |
| | Natural Language-Based Egocentric Task Verification | PHGC: Procedural Heterogeneous Graph Completion for Natural Language Task Verification in Egocentric Videos | CVPR 2025 | Agents verify whether the operation flows of procedural tasks in egocentric videos are correct | has code | |
| | Open-World 3D Segmentation | EgoLifter: Open-World 3D Segmentation for Egocentric Perception | ECCV 2024 | | has code | 1 NVIDIA A100 (40GB) |
| | Audio-Visual Localization | Spherical World-Locking for Audio-Visual Localization in Egocentric Videos | ECCV 2024 | | no code | |
| | Social Role Understanding | Ex2Eg-MAE: A Framework for Adaptation of Exocentric Video Masked Autoencoders for Egocentric Social Role Understanding | ECCV 2024 | | no code | |
| | Egocentric Gaze Anticipation | Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation | ECCV 2024 | | | |
| | Egocentric Visual Grounding | Visual Intention Grounding for Egocentric Assistants | ICCV 2025 | | | |
| Egocentric Motion | Action Recognition | Masked Video and Body-Worn IMU Autoencoder for Egocentric Action Recognition | ECCV 2024 | | has code | |
| | | Multimodal Cross-Domain Few-Shot Learning for Egocentric Action Recognition | ECCV 2024 | | has code | |
| | Temporal Action Segmentation | Synchronization is All You Need: Exocentric-to-Egocentric Transfer for Temporal Action Segmentation with Unlabeled Synchronized Video Pairs | ECCV 2024 | | has code | |
| | Egocentric Activity Recognition | ProbRes: Probabilistic Jump Diffusion for Open-World Egocentric Activity Recognition | ICCV 2025 | | | |
| | Hand Pose Estimation & Reconstruction | 3D Hand Pose Estimation in Everyday Egocentric Images | ECCV 2024 | | has code | 2 NVIDIA A40 |
| | | HaWoR: World-Space Hand Motion Reconstruction from Egocentric Videos | CVPR 2025 | | has code | 4 NVIDIA A800 |
| | Motion Segmentation | Layered Motion Fusion: Lifting Motion Segmentation to 3D in Egocentric Videos | CVPR 2025 | | no code | 1 NVIDIA RTX A4000 |
| | Motion Language Model | EgoLM: Multi-Modal Language Model of Egocentric Motions | CVPR 2025 | | no code | |
| | Human Motion Capture | Ego4o: Egocentric Human Motion Capture and Understanding from Multi-Modal Input | CVPR 2025 | | no code | |
| | Object Manipulation Trajectory | Generating 6DoF Object Manipulation Trajectories from Action Description in Egocentric Vision | CVPR 2025 | | has code | same as PointLLM |
| Others | Egocentric Interaction Reasoning and Pixel Grounding (Ego-IRG) | ANNEXE: Unified Analyzing, Answering, and Pixel Grounding for Egocentric Interaction | CVPR 2025 | Referring Image Segmentation (RIS), Egocentric Hand-Object Interaction detection (EHOI), and Egocentric Question Answering (EgoVQA) | no code | 4 NVIDIA RTX 6000 Ada |
| | Generation of Action Sounds | Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos | ECCV 2024 | | has code | |
| | 3D Human Mesh Recovery | Fish2Mesh Transformer: 3D Human Mesh Recovery from Egocentric Vision | ICCV 2025 | | | |
| Egocentric Datasets | | Nymeria: A Massive Collection of Multimodal Egocentric Daily Motion in the Wild | ECCV 2024 | Motion tasks, multimodal spatial reasoning and video understanding | | |
| | Composed Video Retrieval | EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval | ECCV 2024 | | | |
| | Synthetic Hand-Object Interaction Data | Are Synthetic Data Useful for Egocentric Hand-Object Interaction Detection? | ECCV 2024 | | | |
| | Body Tracking | EgoBody3M: Egocentric Body Tracking on a VR Headset Using a Diverse Dataset | ECCV 2024 | | | |
| | 3D Hand and Object Tracking | HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos | CVPR 2025 | | | |
| | | HD-EPIC: A Highly-Detailed Egocentric Video Dataset | CVPR 2025 | VQA | | |
| | | Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives | CVPR 2024 | | | |
| | | EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision | CVPR 2025 | | | |
| | | Omnia de EgoTempo: Benchmarking Temporal Understanding of Multi-Modal LLMs in Egocentric Videos | CVPR 2025 | | | |
| | | EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering | CVPR 2025 | | | |
| | | RoboSense: Large-Scale Dataset and Benchmark for Egocentric Robot Perception and Navigation in Crowded and Unstructured Environments | CVPR 2025 | | | |
| | EgoLifeQA | EgoLife: Towards Egocentric Life Assistant | CVPR 2025 | A suite of long-context, life-oriented question-answering tasks | | |
| | Generation of Action Sounds | Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos | ECCV 2024 | | | |
| | Egocentric Video Grounding | Fine-Grained Spatiotemporal Grounding on Egocentric Videos | ICCV 2025 | | | |


Words and Sentences

- spur — to encourage, stimulate; to promote
- cross-modal heterogeneity and hierarchical misalignment challenges
- dub — to call (something) by a particular name; to nickname
- "Literature on xxx is rich"