Volume 44 Issue 5
Sep. 2023
Chuxi Li, Zifan Xiao, Yerong Li, Zhinan Chen, Xun Ji, Yiqun Liu, Shufei Feng, Zhen Zhang, Kaiming Zhang, Jianfeng Feng, Trevor W. Robbins, Shisheng Xiong, Yongchang Chen, Xiao Xiao. Deep learning-based activity recognition and fine motor identification using 2D skeletons of cynomolgus monkeys. Zoological Research, 2023, 44(5): 967-980. doi: 10.24272/j.issn.2095-8137.2022.449

Deep learning-based activity recognition and fine motor identification using 2D skeletons of cynomolgus monkeys

doi: 10.24272/j.issn.2095-8137.2022.449
The datasets and MonKit generated and/or analyzed in the current study are available from the corresponding author on reasonable request. Our model is provided and maintained on our GitHub repository (https://github.com/MonKitFudan/MonKit).
Supplementary data to this article can be found online.
The authors declare that they have no competing interests.
C.L., Z.X., and X.X. conceived the research and experiments. C.L., Z.X., and Y.L. designed the network structure and performed data collection and analysis. Y.L., Z.C., and X.J. participated in data collection and analysis. S.F., Z.Z., Y.C., and K.Z. provided video data and suggestions for the design of the dataset. J.F. and T.W.R. provided advice on the research and experiments. X.X., S.X., and Y.C. provided funding to support the research. X.X. supervised the study and led the writing of the manuscript. All authors read and approved the final version of the manuscript.
#Authors contributed equally to this work.
Funds: This work was supported by the National Key R&D Program of China (2021ZD0202805, 2019YFA0709504, 2021ZD0200900), National Defense Science and Technology Innovation Special Zone Spark Project (20-163-00-TS-009-152-01), National Natural Science Foundation of China (31900719, U20A20227, 82125008), Innovative Research Team of High-level Local Universities in Shanghai, Science and Technology Committee Rising-Star Program (19QA1401400), 111 Project (B18015), Shanghai Municipal Science and Technology Major Project (2018SHZDZX01), and Shanghai Center for Brain Science and Brain-Inspired Technology.
  • Video-based action recognition is becoming a vital tool in clinical research and neuroscience for detecting and predicting disorders. However, the action recognition currently used in non-human primate (NHP) research relies heavily on intensive manual labor and lacks standardized assessment. In this work, we established two standard benchmark datasets of NHPs in the laboratory: MonkeyinLab (MiL), which includes 13 categories of actions and postures, and MiL2D, which includes sequences of two-dimensional (2D) skeleton features. Furthermore, building on recent methodological advances in deep learning and skeleton visualization, we introduced the MonkeyMonitorKit (MonKit) toolbox for automatic action recognition, posture estimation, and identification of fine motor activity in monkeys. Using the datasets and MonKit, we evaluated the daily behaviors of wild-type cynomolgus monkeys in their home cages and experimental environments and compared them with the behaviors of MECP2-mutant cynomolgus monkeys, a disease model of Rett syndrome (RTT). MonKit was used to assess motor function, stereotyped behaviors, and depressive phenotypes, and its outcomes were compared with manual detection by human observers. MonKit established consistent criteria for identifying NHP behaviors with high accuracy and efficiency, thus providing a novel and comprehensive tool for assessing behavioral phenotypes in monkeys.

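The abstract describes MiL2D only as sequences of 2D skeleton features. As a minimal sketch of how such a sequence can be turned into a simple motion measure, the Python below assumes a hypothetical layout (a list of frames, each a list of (x, y) pixel coordinates, one per keypoint) and computes each keypoint's mean frame-to-frame displacement. The layout and the function name `joint_speeds` are assumptions for illustration; they are not the MiL2D file format, MonKit's API, or the paper's method.

```python
import math
from typing import List, Tuple

# Hypothetical layout (assumption, not the MiL2D format):
# one frame = list of (x, y) pixel coordinates, one entry per keypoint.
Frame = List[Tuple[float, float]]


def joint_speeds(sequence: List[Frame]) -> List[float]:
    """Return the mean Euclidean displacement (pixels/frame) of each keypoint.

    Illustrative sketch only: a keypoint that barely moves scores near 0,
    while a keypoint that moves a lot between frames scores high.
    """
    if len(sequence) < 2:
        raise ValueError("need at least two frames to measure motion")
    n_joints = len(sequence[0])
    totals = [0.0] * n_joints
    # Walk over consecutive frame pairs and accumulate per-keypoint displacement.
    for prev, curr in zip(sequence, sequence[1:]):
        if len(prev) != n_joints or len(curr) != n_joints:
            raise ValueError("all frames must have the same number of keypoints")
        for j, ((x0, y0), (x1, y1)) in enumerate(zip(prev, curr)):
            totals[j] += math.hypot(x1 - x0, y1 - y0)
    steps = len(sequence) - 1
    # Average over the number of frame-to-frame steps.
    return [t / steps for t in totals]


if __name__ == "__main__":
    # Keypoint 0 stays still; keypoint 1 moves (0,0) -> (3,4) -> (6,8).
    seq = [[(0, 0), (0, 0)], [(0, 0), (3, 4)], [(0, 0), (6, 8)]]
    print(joint_speeds(seq))
```

For example, a keypoint that moves from (0, 0) to (3, 4) to (6, 8) across three frames has a mean speed of 5.0 pixels per frame, while a stationary keypoint scores 0.0. Measuring each keypoint separately, rather than the skeleton as a whole, keeps a small localized movement such as a hand motion from being averaged away by the rest of the body.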

    Figures (9) / Tables (2)
