WebEyeTrack: Scalable Eye-Tracking for the Browser via On-Device Few-S…


With advancements in AI, new gaze estimation methods are exceeding state-of-the-art (SOTA) benchmarks, but their real-world application reveals a gap with commercial eye-tracking solutions. Factors such as model size, inference time, and privacy often go unaddressed. Meanwhile, webcam-based eye-tracking methods lack sufficient accuracy, particularly due to head movement. To address these issues, we introduce WebEyeTrack, a framework that integrates lightweight SOTA gaze estimation models directly in the browser. Eye-tracking has been a transformative tool for investigating human-computer interaction, as it uncovers subtle shifts in visual attention (Jacob and Karn 2003). However, its reliance on costly specialized hardware, such as the EyeLink 1000 and Tobii Pro Fusion, has confined most gaze-tracking research to controlled laboratory environments (Heck, Becker, and Deutscher 2023). Similarly, virtual reality solutions like the Apple Vision Pro remain financially out of reach for widespread use. These limitations have hindered the scalability and practical utility of gaze-enhanced technologies and feedback systems. To reduce reliance on specialized hardware, researchers have actively pursued webcam-based eye-tracking solutions that use the built-in cameras on consumer devices.



Two key research directions in this space are appearance-based gaze estimation and webcam-based eye-tracking, both of which have made significant advances using standard monocular cameras (Cheng et al. 2021). For instance, recent appearance-based methods have shown improved accuracy on commonly used gaze estimation datasets such as MPIIGaze (Zhang et al. 2015), MPIIFaceGaze (Zhang et al. 2016), and EyeDiap (Alberto Funes Mora, Monay, and Odobez 2014). However, many of these AI models primarily aim to achieve state-of-the-art (SOTA) performance without considering practical deployment constraints. These constraints include varying screen sizes, computational efficiency, model size, ease of calibration, and the ability to generalize to new users. While some efforts have successfully integrated gaze estimation models into comprehensive eye-tracking solutions (Heck, Becker, and Deutscher 2023), achieving real-time, fully functional eye-tracking systems remains a substantial technical challenge. Retrofitting existing models that do not address these design concerns often requires extensive optimization and may still fail to meet practical requirements.



As a result, state-of-the-art gaze estimation methods have not yet been widely deployed, primarily because of the difficulty of running these AI models on resource-constrained devices. At the same time, webcam-based eye-tracking systems have taken a practical approach, addressing real-world deployment challenges (Heck, Becker, and Deutscher 2023). These solutions are often tied to specific software ecosystems and toolkits, hindering portability to platforms such as mobile devices or web browsers. As web applications gain popularity for their scalability, ease of deployment, and cloud integration (Shukla et al. 2023), tools like WebGazer (Papoutsaki et al. 2016) have emerged to support eye-tracking directly in the browser. However, many browser-friendly approaches rely on simple statistical or classical machine learning models (Heck, Becker, and Deutscher 2023), such as ridge regression (Xu et al. 2015) or support vector regression (Papoutsaki et al. 2016), and avoid 3D gaze reasoning to reduce computational load. While these techniques improve accessibility, they typically compromise accuracy and robustness under natural head movement.
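To make that trade-off concrete, the sketch below shows the kind of lightweight calibration mapping classical browser eye-trackers rely on: a ridge regression from a per-frame feature vector (e.g., eye-patch pixels or facial landmarks) to on-screen coordinates. This is a minimal illustration, not WebGazer's or WebEyeTrack's actual code; the feature dimensions and variable names are assumptions. The point is that such a purely 2D mapping carries no explicit head-pose model, which is why its accuracy drifts once the head moves after calibration.

```python
# Minimal sketch (illustrative, not WebGazer's implementation): calibrate a
# ridge-regression mapping from per-frame gaze features to screen coordinates.
import numpy as np
from sklearn.linear_model import Ridge

def calibrate(features: np.ndarray, targets: np.ndarray) -> Ridge:
    """features: (n_samples, n_dims) eye/landmark features gathered while the
    user fixates known calibration points; targets: (n_samples, 2) screen (x, y)."""
    model = Ridge(alpha=1.0)        # L2 regularization keeps the small fit stable
    model.fit(features, targets)    # learns a direct feature -> screen mapping
    return model

def predict_gaze(model: Ridge, feature: np.ndarray) -> np.ndarray:
    """Predict an on-screen gaze point for a single frame's feature vector."""
    return model.predict(feature.reshape(1, -1))[0]

# Synthetic example: 9 calibration samples with 128-D features.
rng = np.random.default_rng(0)
X = rng.normal(size=(9, 128))
y = rng.uniform(0.0, 1.0, size=(9, 2))   # normalized screen coordinates
gaze_model = calibrate(X, y)
print(predict_gaze(gaze_model, X[0]))
```

Because the learned map conflates eye rotation with head translation, any post-calibration head movement shifts the predictions; this is the limitation the head-pose-aware design described next is meant to address.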



To bridge the gap between high-accuracy appearance-based gaze estimation methods and scalable webcam-based eye-tracking solutions, we introduce WebEyeTrack, a few-shot, head-pose-aware gaze estimation solution for the browser (Fig. 2). WebEyeTrack combines model-based head-pose estimation (via 3D face reconstruction and radial procrustes analysis) with BlazeGaze, a lightweight CNN optimized for real-time inference. We provide both Python and client-side JavaScript implementations to support model development and seamless integration into research and deployment pipelines. In evaluations on standard gaze datasets, WebEyeTrack achieves comparable SOTA performance and demonstrates real-time performance on mobile phones, tablets, and laptops. Our contributions are: (1) WebEyeTrack, an open-source, browser-friendly framework that performs few-shot gaze estimation with privacy-preserving on-device personalization and inference; (2) a novel model-based metric head-pose estimation via face mesh reconstruction and radial procrustes analysis; and (3) BlazeGaze, a novel 670KB CNN model based on BlazeBlocks that achieves real-time inference on mobile CPUs and GPUs. Classical gaze estimation relied on model-based approaches for (1) 3D gaze estimation (predicting gaze direction as a unit vector) and (2) 2D gaze estimation (predicting the gaze target on a screen).
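The "radial procrustes analysis" step is not detailed in this excerpt, but the core of any procrustes-style head-pose estimate is a rigid alignment between a canonical 3D face mesh and the per-frame reconstructed landmarks. The sketch below uses the standard orthogonal procrustes (Kabsch) solution as a stand-in under that assumption; it is not WebEyeTrack's exact algorithm, and the "radial" refinement is not reproduced.

```python
# Minimal sketch (assumed stand-in, not WebEyeTrack's exact "radial procrustes"):
# recover the head rotation R and translation t that rigidly align a canonical
# 3D face mesh to per-frame reconstructed landmarks, both given as (N, 3)
# arrays in metric units, via the Kabsch / orthogonal-procrustes solution.
import numpy as np

def head_pose_procrustes(canonical: np.ndarray, observed: np.ndarray):
    mu_c = canonical.mean(axis=0)
    mu_o = observed.mean(axis=0)
    A = canonical - mu_c                # centered canonical landmarks
    B = observed - mu_o                 # centered observed landmarks
    H = A.T @ B                         # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])          # guard against reflection solutions
    R = Vt.T @ D @ U.T                  # rotation: canonical -> camera frame
    t = mu_o - R @ mu_c                 # translation in the camera frame
    return R, t

# Usage: R, t = head_pose_procrustes(canonical_mesh, frame_landmarks)
# The recovered pose can then condition the gaze model so that predictions
# remain stable under head movement.
```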



These methods used predefined eyeball models and extensive calibration procedures (Dongheng Li, Winfield, and Parkhurst 2005; Wood and Bulling 2014; Brousseau, Rose, and Eizenman 2018; Wang and Ji 2017). In contrast, modern appearance-based methods require minimal setup and leverage deep learning for improved robustness (Cheng et al. 2021). The emergence of CNNs and of datasets such as MPIIGaze (Zhang et al. 2015), GazeCapture (Krafka et al. 2016), and EyeDiap (Alberto Funes Mora, Monay, and Odobez 2014) has led to 2D and 3D gaze estimation techniques capable of achieving errors of 6-8 degrees and 3-7 centimeters (Zhang et al. 2015). Key techniques contributing to this progress include multimodal inputs (Krafka et al. 2016), multitask learning (Yu, Liu, and Odobez 2019), self-supervised learning (Cheng, Lu, and Zhang 2018), data normalization (Zhang, Sugano, and Bulling 2018), and domain adaptation (Li, Zhan, and Yang 2020). More recently, Vision Transformers have further improved accuracy, reducing error to 4.0 degrees and 3.6 centimeters (Cheng and Lu 2022). Despite strong within-dataset performance, generalization to unseen users remains poor (Cheng et al. 2021).
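For reference, the degree and centimeter figures above correspond to the two standard evaluation metrics in this literature: 3D methods report the angle between predicted and ground-truth gaze direction vectors, while 2D methods report the Euclidean distance between predicted and true on-screen points. The sketch below computes both; it is a generic illustration of these metrics, not the paper's evaluation code.

```python
# Minimal sketch of the two standard gaze-estimation metrics (not the paper's
# evaluation code): angular error in degrees for 3D gaze vectors, and
# Euclidean error in centimeters for 2D on-screen gaze points.
import numpy as np

def angular_error_deg(pred: np.ndarray, true: np.ndarray) -> float:
    """pred, true: 3D gaze direction vectors (need not be unit length)."""
    cos = np.dot(pred, true) / (np.linalg.norm(pred) * np.linalg.norm(true))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def point_error_cm(pred_xy: np.ndarray, true_xy: np.ndarray) -> float:
    """pred_xy, true_xy: on-screen gaze points in centimeters."""
    return float(np.linalg.norm(pred_xy - true_xy))

# Example: roughly a 6-degree angular error and a 3.6 cm point error.
print(angular_error_deg(np.array([0.0, 0.105, -1.0]), np.array([0.0, 0.0, -1.0])))
print(point_error_cm(np.array([10.0, 5.0]), np.array([12.0, 8.0])))
```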
