MonkeyTrail: A scalable video-based method for tracking macaque movement trajectory in daily living cages
Abstract: Behavioral analysis of macaques provides important experimental evidence in the field of neuroscience. In recent years, video-based automatic animal behavior analysis has received widespread attention. However, methods capable of extracting and analyzing the movement trajectories of macaques in their daily living cages remain underdeveloped, as previous approaches usually require specific environments to reduce interference from occlusion or environmental change. Here, we introduce a novel method, called MonkeyTrail, which meets this need by frequently generating virtual empty backgrounds and using background subtraction to accurately obtain the foreground containing the moving animal. The empty background is generated by combining the frame difference method (FDM) with a deep learning-based object detection model (YOLOv5). The entire setup runs on low-cost hardware and can be applied to the daily living environments of individually caged macaques. To test MonkeyTrail performance, we labeled a dataset containing >8,000 video frames with the bounding boxes of macaques under various conditions as ground truth. Results showed that the tracking accuracy and stability of MonkeyTrail exceeded those of two deep learning-based methods (YOLOv5 and Single-Shot MultiBox Detector, SSD), the traditional frame difference method, and the naïve background subtraction method. Using MonkeyTrail to analyze long-term surveillance video recordings, we successfully detected changes in animal behavior in terms of movement amount and spatial preference. These findings demonstrate that MonkeyTrail enables low-cost, large-scale daily behavioral analysis of macaques.
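The abstract does not spell out how the virtual empty background is assembled, so the following is only a minimal sketch of one plausible reading: pixels outside the region where the animal was detected are refreshed from the current frame, while pixels inside that region keep their older background values. The function names, the box convention, and the use of plain nested lists for grayscale images are all assumptions made for illustration and are not taken from the MonkeyTrail implementation.

```python
from typing import List, Tuple

Image = List[List[int]]          # grayscale image as rows of 0-255 intensities
Box = Tuple[int, int, int, int]  # hypothetical convention: (x_min, y_min, x_max, y_max), max exclusive


def update_empty_background(background: Image, frame: Image, animal_box: Box) -> Image:
    """Refresh the background from the current frame everywhere except inside the animal's box.

    Assumption: animal_box comes from some detector (for example an object
    detection model or frame differencing); how it is obtained is out of scope here.
    """
    x0, y0, x1, y1 = animal_box
    updated = []
    for y, (bg_row, fr_row) in enumerate(zip(background, frame)):
        row = []
        for x, (bg_px, fr_px) in enumerate(zip(bg_row, fr_row)):
            inside_box = x0 <= x < x1 and y0 <= y < y1
            # Inside the box the frame shows the animal, so keep the old background.
            row.append(bg_px if inside_box else fr_px)
        updated.append(row)
    return updated


def subtract_background(frame: Image, background: Image) -> Image:
    """Absolute per-pixel difference between a frame and an empty background."""
    return [[abs(f - b) for f, b in zip(fr_row, bg_row)]
            for fr_row, bg_row in zip(frame, background)]
```

With a stale background, a change in lighting shows up as a nonzero difference across the whole image. After the refresh, only the pixels covered by the animal differ from the background, which is the effect that frequent background generation is meant to achieve.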
Figure 1. Overall recording environment and camera setup
A: One frame of recorded video, showing arrangement of monkey cages. For each recording, two cages in upper and middle positions with better visibility (marked by yellow box) were analyzed by the proposed method. Position of camera in A is marked by red box. B: Diagram showing setup of recording cameras mounted on the other side of the room above cage height. Yellow and red boxes in B correspond to those in A.
Figure 3. Influence of environmental changes on efficacy of background subtraction
A: Relationships among B–H. (C, F), (D, G), and (E, H) each pair an empty background obtained at a certain time with the corresponding background subtraction result. Empty backgrounds C, D, and E were obtained at 1 h intervals. Real-time frame B, used for subtraction with each of C–E, is a video frame recorded near the time of E.
Figure 5. Background subtraction process with generated empty background
A: One video frame showing typical situation in daily living cage. B: Background subtraction between A and virtual empty background generated temporally close to A, thus highlighting foreground containing animal. C: Image processing result of B, after spatial median filtering, binarizing, eroding, and dilating. A and B are redrawn from Figure 3B and H, respectively.
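The post-processing chain named in panel C (spatial median filtering, binarizing, eroding, dilating) can be sketched as follows. This is an illustrative pure-Python version operating on a difference image stored as nested lists; the 3x3 window, the threshold of 30, the border handling, and all function names are assumptions, not parameters reported for MonkeyTrail. In practice an image processing library would normally be used for these steps.

```python
from statistics import median_low
from typing import Callable, List

Image = List[List[int]]


def _window(img: Image, y: int, x: int) -> List[int]:
    """Values in the 3x3 window around (y, x), clipped at the image border."""
    height, width = len(img), len(img[0])
    return [img[j][i]
            for j in range(max(0, y - 1), min(height, y + 2))
            for i in range(max(0, x - 1), min(width, x + 2))]


def _apply(img: Image, reducer: Callable[[List[int]], int]) -> Image:
    """Replace every pixel by reducer(3x3 window around it)."""
    return [[reducer(_window(img, y, x)) for x in range(len(img[0]))]
            for y in range(len(img))]


def median_filter(img: Image) -> Image:
    """Suppress isolated noisy pixels."""
    return _apply(img, median_low)


def binarize(img: Image, threshold: int) -> Image:
    """Mark pixels whose difference exceeds the threshold as foreground (1)."""
    return [[1 if px > threshold else 0 for px in row] for row in img]


def erode(mask: Image) -> Image:
    """Keep a pixel only if its whole window is foreground."""
    return _apply(mask, min)


def dilate(mask: Image) -> Image:
    """Set a pixel if any pixel in its window is foreground."""
    return _apply(mask, max)


def clean_foreground(diff: Image, threshold: int = 30) -> Image:
    """Median filter, binarize, erode, then dilate, in the order listed in the caption."""
    return dilate(erode(binarize(median_filter(diff), threshold)))
```

Applying erosion before dilation removes small specks that survive thresholding while restoring most of the area of larger blobs, so the remaining foreground region tends to correspond to the animal rather than to noise.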
Figure 6. Representative tracking results for macaque during daytime and nighttime
Green box and blue line represent bounding box and trajectory, respectively. Sequence of frames is from left to right, then top to bottom. Time interval between each frame is >10 s. These examples include different motions and various levels of occlusion.
Figure 7. Visualization of performance in generating bounding boxes by different methods
Results of several trajectory tracking methods were compared with results of manual annotation to calculate accuracy. Intersection over union (IoU), which measures accuracy of bounding box, was plotted for individual frames concatenated in time. A–E: Results of MonkeyTrail, SSD, YOLOv5, background subtraction method (BSM), and FDM, with IoU shown in different colors. Green dashed line indicates mean value of IoU for MonkeyTrail, and red dashed lines represent mean values of IoU for corresponding methods. F: Amount of motion (calculated by length of trajectory movement) was plotted with the same time frame as in A–E. Gray box represents time when macaque is occluded by parts of cage. Data comprise 8,130 frames from three monkeys.
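For readers unfamiliar with the two quantities plotted here, a minimal sketch of how they can be computed is given below. IoU follows its standard definition (area of overlap divided by area of union of the predicted and annotated boxes). The amount of motion is computed as the summed distance between consecutive bounding box centers; using the box center as the trajectory point is an assumption, as are the box convention and function names.

```python
from math import hypot
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed convention: (x_min, y_min, x_max, y_max)


def iou(box_a: Box, box_b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def mean_iou(predicted: Sequence[Box], annotated: Sequence[Box]) -> float:
    """Average IoU over paired frames (the dashed lines in panels A-E)."""
    scores = [iou(p, a) for p, a in zip(predicted, annotated)]
    return sum(scores) / len(scores) if scores else 0.0


def box_center(box: Box) -> Tuple[float, float]:
    """Center point of a box, used here as the trajectory point (an assumption)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def trajectory_length(boxes: Sequence[Box]) -> float:
    """Sum of distances between consecutive box centers."""
    centers = [box_center(b) for b in boxes]
    return sum(hypot(x1 - x0, y1 - y0)
               for (x0, y0), (x1, y1) in zip(centers, centers[1:]))
```

An IoU of 1 means the predicted box coincides with the manual annotation, and an IoU of 0 means the two boxes do not overlap at all.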
Figure 10. Spatial preference of macaques extracted by MonkeyTrail
A, B: Results for monkey A obtained in 2019 and 2020, respectively. C, D: Results for monkey B obtained in 2019 and 2020, respectively. Horizontal and vertical axes of heat map represent X and Y coordinates of cage, respectively. Each heat map region represents number of times macaque’s trajectory passed through this space, normalized by maximum number found in one region (color-coded). Each heat map was obtained by averaging trajectory data of five days.
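The normalization described in this caption can be illustrated with a short sketch. It bins trajectory points into a grid over the cage and divides each count by the largest count in any cell. Counting sampled trajectory points per cell is used here as a simple stand-in for "number of times the trajectory passed through this space"; the grid size, the coordinate units, and the function name are assumptions, and the five-day averaging is not reproduced.

```python
from typing import List, Sequence, Tuple


def occupancy_heatmap(points: Sequence[Tuple[float, float]],
                      cage_width: float, cage_height: float,
                      n_cols: int, n_rows: int) -> List[List[float]]:
    """Count trajectory points per grid cell and normalize by the maximum count.

    Assumes every point lies inside the cage, i.e. 0 <= x <= cage_width
    and 0 <= y <= cage_height.
    """
    counts = [[0] * n_cols for _ in range(n_rows)]
    for x, y in points:
        # Points exactly on the far edge are assigned to the last cell.
        col = min(int(x / cage_width * n_cols), n_cols - 1)
        row = min(int(y / cage_height * n_rows), n_rows - 1)
        counts[row][col] += 1
    peak = max(max(row) for row in counts)
    if peak == 0:
        return [[0.0] * n_cols for _ in range(n_rows)]
    return [[c / peak for c in row] for row in counts]
```

Because each map is scaled by its own maximum, the most visited region always has value 1, so maps from different recording periods show relative spatial preference rather than absolute activity.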