
Awesome Egocentric

A curated list of papers bridging exocentric (third-person) and egocentric (first-person) vision.

Survey

  1. Bridging Perspectives: A Survey on Cross-view Collaborative Intelligence with Egocentric-Exocentric Vision

Datasets

| Main Category | Sub-category | Title | Publish | Comment | Code Available | Computing Resource |
| --- | --- | --- | --- | --- | --- | --- |
| Egocentric Video Understanding | Egocentric Video Grounding | RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos | ECCV 2024 | | has code | 4 NVIDIA RTX A6000 |
| | | Grounded Question-Answering in Long Egocentric Videos | CVPR 2024 | | has code | 4 NVIDIA A100 (80GB) |
| | | SnAG: Scalable and Accurate Video Grounding | CVPR 2024 | | has code | |
| | | Object-Shot Enhanced Grounding Network for Egocentric Video | CVPR 2025 | | has code | |
| | Egocentric Video Captioning | Retrieval-Augmented Egocentric Video Captioning | CVPR 2024 | | has code | |
| | Egocentric Video Retrieval | EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval | ECCV 2024 | | has code | |
| | | Sound Bridge: Associating Egocentric and Exocentric Videos via Audio Cues | CVPR 2025 | | has code | 8 NVIDIA V100 |
| | Egocentric Video Mistake Detection | Gazing Into Missteps: Leveraging Eye-Gaze for Unsupervised Mistake Detection in Egocentric Videos of Skilled Human Activities | CVPR 2025 | | no code | |
| | Egocentric Environment Understanding | DIV-FF: Dynamic Image-Video Feature Fields for Environment Understanding in Egocentric Videos | CVPR 2025 | Dynamic object segmentation, affordance segmentation | has code | 1 NVIDIA RTX 4090 |
| | Egocentric Motion and Pose Estimation | EgoPoseFormer: A Simple Baseline for Stereo Egocentric 3D Human Pose Estimation | ECCV 2024 | | has code | |
| | | Egocentric Whole-Body Motion Capture with FisheyeViT and Diffusion-Based Motion Refinement | CVPR 2024 | | has code | |
| | | REWIND: Real-Time Egocentric Whole-Body Motion Diffusion with Exemplar-Based Identity Conditioning | CVPR 2025 | | has code | NVIDIA RTX 8000 |
| | Natural Language-Based Egocentric Task Verification | PHGC: Procedural Heterogeneous Graph Completion for Natural Language Task Verification in Egocentric Videos | CVPR 2025 | Agents verify whether the operation flows of procedural tasks in egocentric videos are correct | has code | |
| | Open-World 3D Segmentation | EgoLifter: Open-World 3D Segmentation for Egocentric Perception | ECCV 2024 | | has code | 1 NVIDIA A100 (40GB) |
| | Audio-Visual Localization | Spherical World-Locking for Audio-Visual Localization in Egocentric Videos | ECCV 2024 | | no code | |
| | Social Role Understanding | Ex2Eg-MAE: A Framework for Adaptation of Exocentric Video Masked Autoencoders for Egocentric Social Role Understanding | ECCV 2024 | | no code | |
| | Egocentric Gaze Anticipation | Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation | ECCV 2024 | | | |
| | Egocentric Visual Grounding | Visual Intention Grounding for Egocentric Assistants | ICCV 2025 | | | |
| Egocentric Motion | Action Recognition | Masked Video and Body-Worn IMU Autoencoder for Egocentric Action Recognition | ECCV 2024 | | has code | |
| | | Multimodal Cross-Domain Few-Shot Learning for Egocentric Action Recognition | ECCV 2024 | | has code | |
| | Temporal Action Segmentation | Synchronization is All You Need: Exocentric-to-Egocentric Transfer for Temporal Action Segmentation with Unlabeled Synchronized Video Pairs | ECCV 2024 | | has code | |
| | Egocentric Activity Recognition | ProbRes: Probabilistic Jump Diffusion for Open-World Egocentric Activity Recognition | ICCV 2025 | | | |
| | Hand Pose Estimation & Reconstruction | 3D Hand Pose Estimation in Everyday Egocentric Images | ECCV 2024 | | has code | 2 NVIDIA A40 |
| | | HaWoR: World-Space Hand Motion Reconstruction from Egocentric Videos | CVPR 2025 | | has code | 4 NVIDIA A800 |
| | Motion Segmentation | Layered Motion Fusion: Lifting Motion Segmentation to 3D in Egocentric Videos | CVPR 2025 | | no code | 1 NVIDIA RTX A4000 |
| | Motion Language Model | EgoLM: Multi-Modal Language Model of Egocentric Motions | CVPR 2025 | | no code | |
| | Human Motion Capture | Ego4o: Egocentric Human Motion Capture and Understanding from Multi-Modal Input | CVPR 2025 | | no code | |
| | Object Manipulation Trajectory | Generating 6DoF Object Manipulation Trajectories from Action Description in Egocentric Vision | CVPR 2025 | | has code | same as PointLLM |
| Others | Egocentric Interaction Reasoning and Pixel Grounding (Ego-IRG) | ANNEXE: Unified Analyzing, Answering, and Pixel Grounding for Egocentric Interaction | CVPR 2025 | Referring Image Segmentation (RIS), Egocentric Hand-Object Interaction detection (EHOI), and Egocentric Question Answering (EgoVQA) | no code | 4 NVIDIA RTX 6000 Ada |
| | Generation of Action Sounds | Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos | ECCV 2024 | | has code | |
| | 3D Human Mesh Recovery | Fish2Mesh Transformer: 3D Human Mesh Recovery from Egocentric Vision | ICCV 2025 | | | |
| Egocentric Datasets | | Nymeria: A Massive Collection of Multimodal Egocentric Daily Motion in the Wild | ECCV 2024 | Motion tasks, multimodal spatial reasoning and video understanding | | |
| | Composed Video Retrieval | EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval | ECCV 2024 | | | |
| | Synthetic Hand-Object Interaction Data | Are Synthetic Data Useful for Egocentric Hand-Object Interaction Detection? | ECCV 2024 | | | |
| | Body Tracking | EgoBody3M: Egocentric Body Tracking on a VR Headset Using a Diverse Dataset | ECCV 2024 | | | |
| | 3D Hand and Object Tracking | HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos | CVPR 2025 | | | |
| | | HD-EPIC: A Highly-Detailed Egocentric Video Dataset | CVPR 2025 | VQA | | |
| | | Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives | CVPR 2024 | | | |
| | | EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision | CVPR 2025 | | | |
| | | Omnia de EgoTempo: Benchmarking Temporal Understanding of Multi-Modal LLMs in Egocentric Videos | CVPR 2025 | | | |
| | | EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering | CVPR 2025 | | | |
| | | RoboSense: Large-Scale Dataset and Benchmark for Egocentric Robot Perception and Navigation in Crowded and Unstructured Environments | CVPR 2025 | | | |
| | EgoLifeQA | EgoLife: Towards Egocentric Life Assistant | CVPR 2025 | A suite of long-context, life-oriented question-answering tasks | | |
| | Generation of Action Sounds | Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos | ECCV 2024 | | | |
| | Egocentric Video Grounding | Fine-Grained Spatiotemporal Grounding on Egocentric Videos | ICCV 2025 | | | |


Words and Sentences

- spur — to encourage, stimulate; to promote
- cross-modal heterogeneity and hierarchical misalignment challenges
- dub — to call (something) by a particular name; to nickname
- "Literature on xxx is rich"