江少锋,赵鹏,杨素华,陈震,张聪炫(南昌航空大学测试与光电工程学院, 南昌 330063;南昌航空大学信息工程学院, 南昌 330063)
目的 基于马尔可夫随机场（MRF）的变分光流计算是一种较为鲁棒的光流计算方法，但是计算效率很低。置信传播算法（BP） 是一种针对MRF较为高效的全局优化算法。本文提出一种MRF变分光流计算模型并采用并行BP方法实现，极大提高计算效率。方法 提出的MRF变分光流计算模型中的数据项采用了Horn等人根据灰度守恒假设得到的光流基本约束方程，并采用非平方惩罚函数进行调整以平滑边界影响。为在CUDA平台上实现高效并行处理，本文提出了一种优化的基于置信传播的MRF并行光流计算方法。该优化方法在采用置信传播最小化MRF光流能量函数时，采用了一种4层的3维网络结构进行并行计算，每层对应MRF4邻域模型中的一个方向的信息传播，同时在每层中为每个像素分配多个线程采用并行降维法计算所要传递的信息，大大降低单线程计算负荷，大幅度提高计算效率。结果 采用旋转小球图像序列进行实验，计算效率提高314倍；采用旋转小球、Yosemite山谷和RubberWhale 3种不同图像序列，与Horn算法、Weickert算法、Hossen并行Lucas算法、Grauer-Gray并行MRF算法进行对比实验，本文方法得到最低的平均端点误差（AEE），分别为0.13、0.55和0.34。结论 本文提出了一种新的MRF光流计算模型，并在CUDA平台上实现了并行优化计算。实验结果表明，本文提出的并行计算方法在保持计算精度的同时极大提高了计算效率。本文方法对内存需求巨大，在处理高分辨率图像时，限制了采样点数，难以计算大位移。
Parallel computing of variational optical flow based on the Markov random field model
Jiang Shaofeng,Zhao Peng,Yang Suhua,Chen Zhen,Zhang Congxuan(School of Measuring and Optical Engineering, Nanchang Hangkong University, Nanchang 330063, China;School of Information Engineering, Nanchang Hangkong University, Nanchang 330063, China)
Objective The Markov random field (MRF) model for variational optical flow computing is a robust one. The MRF variational energy function includes the data and smooth terms. Brief propagation (BP) is a highly effective global method that minimizes the MRF energy function by iteratively transmitting messages from one pixel to the next in four directions. However, its computational cost remains high. The computational complexity of the BP method with a 3-level data pyramid is O(N×L2×(10.5T + 2)), where N is the number of pixels in the image, T is the iterative time and L is the width of spatial sampling. We proposed a new MRF model to calculate optical flow and a parallel method to minimize the MRF energy function, which can dramatically reduce computational time. Method The data term in the proposed MRF variational energy function for optical flow computing is derived from the optical flow constraint equation from the brightness constancy assumption proposed by Horn. The term is smoothed by a non-square penalty function to reduce the boundary effect. To realize parallel computing on the CUDA platform, we proposed an optimal parallel BP method to minimize the MRF energy function. This optimal method used a 3D grid with four levels to perform the parallel method. At varying levels, the message was transmitted along different directions in the neighborhood. At each level, we assigned L threads to one pixel and used the parallel dimension reduction method to calculate the message to be transmitted. Thus, we dramatically reduced the computational load of one thread and computational time. The dimension reduction method transforms the 2D energy function into two independent 1D energy functions and uses a low parabola envelope-based method to calculate the 1D energy function, whose computational complexity is O(4L2). The parallel dimension reduction method simultaneously calculates the MRF energy functions of the different rows (or columns) in the label region of optical flow, whose size is L×L by allowing each thread to correspond to a row (or column) in the region. Thus, the computational complexity of the proposed parallel dimension reduction method is reduced to O(4L). All parallel processing steps were completed on the CUDA platform for general parallel computing on GPU. The first step is calculating the data term by assigning N threads and letting each thread correspond to a pixel in the image. The corresponding computational complexity is O(L2). The second step is the use of the proposed parallel dimension reduction method to calculate the message and transmit it to up, down, left, and right for T times from the first to the third level in the data pyramid with three levels by assigning 4L threads to a pixel in the image. The corresponding computational complexity is O(3×4×L×T). The last step is outputting the optical flow corresponding to the minimal message in each pixel by assigning N threads and letting each thread correspond to a pixel in the image. The corresponding computational complexity is O(L2). Thus, the entire computational complexity of this method is only O(2L2 + 12L×T) with three-level data pyramid. Result In the experiment for rotational sphere image sequence, the computational time was only 0.27 s, and corresponding computational time of the conventional method was 85 s. The computational efficiency of the proposed method was 314 times more than that of the conventional method. Our proposed method performed as rapidly as the Horn method, and faster than the Weickert (6 s) and Grauer-Gray (0.68 s) methods but slower than the Hossen method (0.01 s). In comparative experiments on errors with the Horn, Weickert, Hossen, and Grauer-Gray methods by using three image sequences (sphere image sequence with rotational moving, the Yosemite Valley image sequence with high change of brightness, and RubberWhale image sequence with many moving objects), the proposed method achieved the lowest average endpoint error (AEE) (0.13, 0.55, and 0.34, respectively) on all testing image sequences. Conclusion This paper proposed a novel parallel computing strategy of variational optical flow based on the MRF model by using a 3D grid with four levels to simultaneously transmit the message in the BP method along four directions, where multi-threads were assigned to one pixel for calculating the message to be transmitted. The results indicate that our proposed model has high computational efficiency and can obtain accurate optical flow from image sequences with rotational moving, high change of brightness and many moving objects. However, our proposed method requires a substantial amount of memory, which limits the width of spatial sampling. Therefore, our proposed method is unsuitable for calculating optical flows with large displacement from image sequences with high resolution.