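Liu Tingting, Li Yupeng, Zhang Liang (Tianjin Key Laboratory of Intelligent Signal and Image Processing, Civil Aviation University of China, Tianjin 300300)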
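Objective Human action recognition methods based on the motion history point cloud (MHPC) incur high computational complexity during feature extraction because of the large volume of point cloud data. Methods based on depth motion maps (DMMs) extract features simply, but the maps contain incomplete action information, which limits the upper bound of recognition accuracy. To address these problems, a human action recognition algorithm based on multi-perspective depth motion maps is proposed. Method First, an MHPC is generated from the depth map sequence to represent the action, and the MHPC is then rotated by specific angles to supplement action information from more perspectives. Next, the original and rotated MHPCs are projected onto Cartesian coordinate planes to generate multi-perspective DMMs, from which histograms of oriented gradients are extracted and concatenated into a feature vector. Finally, a support vector machine classifies the feature vectors, and the algorithm is validated on MSR Action3D and a self-built database. Result The MSR Action3D database has two experimental settings. Under setting 1, the algorithm achieves a recognition rate of 96.8%, which is 2.5% higher than that of the APS_PHOG (axonometric projections and PHOG feature) algorithm, 1.9% higher than that of the DMM algorithm, and 1.1% higher than that of the DMM_CRC (depth motion maps and collaborative representation classifier) algorithm. Under setting 2, the recognition rate is 93.82%, which is 5.09% higher than that of the DMM algorithm and 4.93% higher than that of the HON4D (histogram of oriented 4D surface normal) algorithm. On the self-built database, the recognition rate reaches 97.98%, which is 3.98% higher than that of the MHPC algorithm. Conclusion The experimental results show that multi-perspective DMMs not only resolve the complexity of feature extraction from MHPC but also enable DMMs to include action information from more perspectives, effectively improving the accuracy of human action recognition.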
Human action recognition based on multi-perspective depth motion maps
Liu Tingting, Li Yupeng, Zhang Liang (Key Laboratory of Advanced Signal and Image Processing, Civil Aviation University of China, Tianjin 300300, China)
Objective Action recognition based on depth data has attracted increasing attention because depth data are insensitive to illumination. Two main approaches are used: one operates on point clouds converted from depth maps, and the other on depth motion maps (DMMs) generated by projecting depth maps. The motion history point cloud (MHPC) has been proposed to represent actions, but the large number of points in an MHPC makes feature extraction computationally expensive. DMMs are generated by stacking the motion energy of a depth map sequence projected onto three orthogonal Cartesian planes. Projecting the depth maps onto a specific plane provides additional body shape and motion information. Although extracting features from DMMs is simple, DMMs contain inadequate motion information, which caps the achievable recognition accuracy. In other words, an action is represented by DMMs from only three views, so action information from other perspectives is missing. Multi-perspective DMMs for human action recognition are proposed to solve these problems. Method In the algorithm, an MHPC is first generated from a depth map sequence to represent an action. Motion information from different perspectives is supplemented by rotating the MHPC around the Y axis by a certain angle. The original MHPC is then projected onto the three orthogonal Cartesian planes, and the rotated MHPC is projected onto the XOY plane. The multi-perspective DMMs are generated from these projected MHPCs. After projection, the points lie in a plane, where many points overlap at the same coordinates. These overlapping points may come from the same depth frame or from different frames. We use them to generate DMMs and capture the spatial energy distribution of motion.
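The rotation and projection steps described above can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the point layout `(x, y, z, frame)`, the function names `rotate_about_y` and `project_xoy`, the toy cloud, and the angles 0, 45, and 90 degrees are all assumptions, since the abstract states only that the MHPC is rotated around the Y axis "at a certain angle".

```python
import math

def rotate_about_y(points, angle_deg):
    """Rotate (x, y, z, frame) points about the Y axis by angle_deg degrees."""
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    rotated = []
    for x, y, z, frame in points:
        # Standard right-handed rotation about Y; the y coordinate is unchanged.
        rotated.append((c * x + s * z, y, -s * x + c * z, frame))
    return rotated

def project_xoy(points):
    """Project onto the XOY plane: (x, y) locates the point, z is kept as its value."""
    return [((x, y), z, frame) for x, y, z, frame in points]

# Hypothetical toy cloud: two points from frame 0 and one from frame 1.
mhpc = [(1.0, 2.0, 0.0, 0), (0.0, 1.0, 3.0, 0), (1.0, 2.0, 0.5, 1)]

# One XOY projection per assumed viewing angle (the angles are illustrative only).
views = {angle: project_xoy(rotate_about_y(mhpc, angle)) for angle in (0, 45, 90)}
```

Each entry of `views` holds the same cloud seen from a different angle, which is the input from which one additional DMM per view would be built.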
For example, each pixel of the DMM generated from the MHPC projected onto the XOY plane is the sum of the absolute differences in z between adjacent overlapping points that belong to different frames. DMMs for the MHPC projected onto the YOZ and XOZ planes are generated in the same way, with z replaced by x and y, respectively. The MHPC is projected onto the three orthogonal Cartesian planes to generate DMMs from the front, side, and top views. The rotated MHPC is projected onto the XOY plane to generate DMMs under additional views. Multi-perspective DMMs, which encode the 4D information of an action into 2D maps, are used to represent the action. Thus, action information from many more perspectives is supplemented. The x, y, and z values of points in the projected MHPC are normalized to fixed values and used as multi-perspective DMM image coordinates, which reduces the intraclass variability caused by different performers. Empirically, this study normalizes the values of x and z to 511 and those of y to 1023. A histogram of oriented gradients (HOG) is extracted from each DMM, and the histograms are concatenated into the feature vector of an action. Finally, a support vector machine (SVM) classifier is trained to recognize the action. Experiments are performed on the MSR Action3D dataset and our own dataset. Result The proposed algorithm improves performance on both the MSR Action3D database and our dataset. Two experimental settings are considered for MSR Action3D. In the first setting, the proposed algorithm achieves a recognition rate of 96.8%, which is higher than that of most compared algorithms. This rate is 2.5% higher than that of the APS_PHOG (axonometric projections and PHOG feature) algorithm, 1.9% higher than that of the DMM algorithm, and 1.1% higher than that of the DMM_CRC (DMMs and collaborative representation classifier) algorithm.
In the second experimental setting, the recognition rate of the proposed algorithm reaches 93.82%, which is 5.09% higher than that of the DMM algorithm, 4.93% higher than that of the HON4D (histogram of oriented 4D surface normal) algorithm, 2.18% higher than that of the HOPC algorithm, and 1.92% higher than that of the DMM_LBP feature fusion. On our database, the recognition rate of the proposed algorithm is 97.98%, which is 3.98% higher than that of the MHPC algorithm. Conclusion MHPC is used to represent actions, and rotating it by certain angles supplements the action information from different perspectives. Multi-perspective DMMs are generated by computing the distribution of overlapping points in the projected MHPC, which captures the spatial distribution of the absolute motion energy. Coordinate normalization reduces the intraclass variability. Experimental results show that multi-perspective DMMs not only resolve the difficulty of extracting features from MHPCs but also supplement the motion information of traditional DMMs. Human action recognition based on multi-perspective DMMs outperforms some existing methods. The new approach combines the point cloud method with the depth motion map method, exploiting the advantages of both and mitigating their disadvantages.
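The coordinate normalization and the XOY pixel rule from the Method part can be illustrated with a short sketch. It is one possible reading under stated assumptions, not the published implementation: min-max scaling to integer pixel indices, the sparse dictionary output, sorting overlapping points by frame, and the names `normalize`, `dmm_xoy`, and `toy_cloud` are all hypothetical. Only the targets 511 and 1023 and the rule "sum of absolute z differences between adjacent overlapping points from different frames" come from the abstract.

```python
from collections import defaultdict

def normalize(values, max_index):
    """Min-max scale values to integer pixel indices in [0, max_index] (assumed scheme)."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    return [round((v - lo) / span * max_index) for v in values]

def dmm_xoy(points, x_max=511, y_max=1023):
    """Build a sparse XOY DMM from (x, y, z, frame) points as {(col, row): energy}."""
    cols = normalize([p[0] for p in points], x_max)
    rows = normalize([p[1] for p in points], y_max)

    # Group the points that overlap at the same pixel after projection.
    buckets = defaultdict(list)
    for col, row, (_, _, z, frame) in zip(cols, rows, points):
        buckets[(col, row)].append((frame, z))

    dmm = {}
    for pixel, hits in buckets.items():
        hits.sort()  # order overlapping points by frame index
        # Sum |dz| over adjacent overlapping points that come from different frames.
        dmm[pixel] = sum(
            abs(z2 - z1)
            for (f1, z1), (f2, z2) in zip(hits, hits[1:])
            if f1 != f2
        )
    return dmm

# Hypothetical toy cloud: three frames hit one pixel, a single point hits another.
toy_cloud = [
    (0.0, 0.0, 1.0, 0),
    (0.0, 0.0, 1.5, 1),
    (0.0, 0.0, 1.2, 2),
    (1.0, 1.0, 2.0, 0),
]
dmm = dmm_xoy(toy_cloud)
```

In this toy case the pixel at (0, 0) accumulates |1.5 - 1.0| + |1.2 - 1.5|, while the pixel hit by a single point carries no motion energy. The YOZ and XOZ maps would follow the same pattern with x or y taking the role of z.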