# 1987–2017 ICCV Best Papers (Marr Prize Papers): A Complete List

This post collects all **Marr Prize papers** from ICCV 1987–2017, with authors and paper links (click a title to jump). A download link for the full collection is given at the end.

ICCV is the IEEE International Conference on Computer Vision. Organized by the IEEE, it is regarded, together with the Conference on Computer Vision and Pattern Recognition (CVPR) and the European Conference on Computer Vision (ECCV), as one of the three top conferences in computer vision.

The first ICCV was held in London in 1987, and the conference has since been held roughly every two years. ICCV 2019 will take place in Seoul, South Korea.


Topics covered by the conference include: early vision and perception; color, illumination, and texture; segmentation and grouping; motion and tracking; stereo and structure from motion; image-based modeling; physics-based modeling; statistical learning in vision; video surveillance; recognition of objects, events, and scenes; vision-based graphics; image and video acquisition; performance evaluation; and applications.

**2017** (Venue: Venice, Italy)

Mask R-CNN **Authors:** Kaiming He, Facebook AI Research;

Georgia Gkioxari, Facebook AI Research;

Piotr Dollar, Facebook AI Research;

Ross Girshick, Facebook AI Research

**Abstract:** We present a conceptually simple, flexible, and general framework for object instance segmentation. Our method efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding-box recognition. Mask R-CNN is simple to train and adds only a small overhead to Faster R-CNN, running at 5 fps. Moreover, Mask R-CNN is easy to generalize to other tasks, e.g., estimating human poses within the same framework.

We show top results in all three tracks of the COCO challenge, including instance segmentation, bounding-box object detection, and person keypoint detection. Without any extra tricks, Mask R-CNN outperforms all existing single-model entries on every task, including the COCO 2016 challenge winner. We hope this simple and effective approach will serve as a solid baseline and ease future research in instance-level recognition.
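The paper's core architectural idea, a mask branch running in parallel with the box branch over shared per-RoI features, can be sketched schematically. The NumPy sketch below uses made-up layer shapes (256-channel 14x14 RoI features, 81 classes for COCO plus background) and random weights; it illustrates the parallel-branch structure only, not the paper's actual network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared RoI feature map for one region proposal: 256 channels, 14x14.
roi_feat = rng.standard_normal((256, 14, 14))

# Box branch (schematic): global pool + linear layers -> class scores, box deltas.
W_cls = rng.standard_normal((81, 256)) * 0.01   # 80 classes + background
W_box = rng.standard_normal((4, 256)) * 0.01
pooled = roi_feat.mean(axis=(1, 2))             # (256,)
class_scores = W_cls @ pooled                   # (81,)
box_deltas = W_box @ pooled                     # (4,)

# Mask branch (schematic): a 1x1 "conv" predicting one 14x14 mask per class,
# computed in parallel with (not after) the box branch.
W_mask = rng.standard_normal((81, 256)) * 0.01
mask_logits = np.tensordot(W_mask, roi_feat, axes=([1], [0]))  # (81, 14, 14)
masks = 1.0 / (1.0 + np.exp(-mask_logits))      # per-pixel, per-class sigmoid

print(class_scores.shape, box_deltas.shape, masks.shape)
```

The per-pixel sigmoid (rather than a softmax across classes) mirrors the paper's decoupling of mask and class prediction: each class gets its own mask, and the box branch decides which one is reported.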

Further reading: Kaiming He and a brief history of Mask R-CNN

Further reading: a roundup of ICCV 2017 paper reviews

**2015** (Venue: Santiago, Chile)

Deep Neural Decision Forests **Authors:** Peter Kontschieder, Microsoft Research;

Madalina Fiterau, Carnegie Mellon University;

Antonio Criminisi, Microsoft Research;

Samuel Rota Bulò, Microsoft Research

**Abstract:** We present Deep Neural Decision Forests – a novel approach that unifies classification trees with the representation learning functionality known from deep convolutional networks, by training them in an end-to-end manner. To combine these two worlds, we introduce a stochastic and differentiable decision tree model, which steers the representation learning usually conducted in the initial layers of a (deep) convolutional network. Our model differs from conventional deep networks because a decision forest provides the final predictions, and it differs from conventional decision forests since we propose a principled, joint and global optimization of split and leaf node parameters. We show experimental results on benchmark machine learning datasets like MNIST and ImageNet and find on-par or superior results when compared to state-of-the-art deep models. Most remarkably, we obtain Top5-Errors of only 7.84%/6.38% on ImageNet validation data when integrating our forests in a single-crop, single/seven model GoogLeNet architecture, respectively. Thus, even without any form of training data set augmentation we are improving on the 6.67% error obtained by the best GoogLeNet architecture (7 models, 144 crops).
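The stochastic, differentiable routing that makes the trees trainable end-to-end can be illustrated for a single depth-2 tree. This is a minimal NumPy sketch with made-up sizes and random weights; in the paper the split functions are driven by the outputs of a deep network, and everything below is differentiable in those outputs:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)

# Input features (in the paper, activations of a deep network; here random).
x = rng.standard_normal(8)

# A depth-2 tree: 3 split nodes (root, left child, right child), 4 leaves.
W = rng.standard_normal((3, 8))          # one linear split function per node
d = sigmoid(W @ x)                       # probability of routing LEFT at each node

# Probability of reaching each leaf = product of decisions along its path.
leaf_probs = np.array([
    d[0] * d[1],                         # left, left
    d[0] * (1 - d[1]),                   # left, right
    (1 - d[0]) * d[2],                   # right, left
    (1 - d[0]) * (1 - d[2]),             # right, right
])

# Each leaf holds a class distribution pi; the tree's prediction is the
# leaf-probability-weighted mixture of those distributions.
pi = rng.dirichlet(np.ones(3), size=4)   # 4 leaves, 3 classes
prediction = leaf_probs @ pi
print(prediction)
```

Because the hard left/right decision is replaced by a sigmoid probability, gradients flow through the routing into both the split parameters and the upstream network, which is what permits the joint optimization the abstract describes.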

**2013** (Venue: Sydney, Australia)

From Large Scale Image Categorization to Entry-Level Categories **Authors:** Vicente Ordonez, University of North Carolina at Chapel Hill;

Jia Deng, Stanford University;

Yejin Choi, Stony Brook University;

Alexander Berg, University of North Carolina at Chapel Hill;

Tamara Berg, University of North Carolina at Chapel Hill

**Abstract:** Entry level categories – the labels people will use to name an object – were originally defined and studied by psychologists in the 1980s. In this paper we study entry-level categories at a large scale and learn the first models for predicting entry-level categories for images. Our models combine visual recognition predictions with proxies for word "naturalness" mined from the enormous amounts of text on the web. We demonstrate the usefulness of our models for predicting nouns (entry-level words) associated with images by people. We also learn mappings between concepts predicted by existing visual recognition systems and entry-level concepts that could be useful for improving human-focused applications such as natural language image description or retrieval.

**2011** (Venue: Barcelona, Spain)

Relative Attributes **Authors:** Devi Parikh, Toyota Technological Institute at Chicago;

Kristen Grauman, University of Texas at Austin

**Abstract:** Human-nameable visual "attributes" can benefit various recognition tasks. However, existing techniques restrict these properties to categorical labels (for example, a person is 'smiling' or not, a scene is 'dry' or not), and thus fail to capture more general semantic relationships. We propose to model relative attributes. Given training data stating how object/scene categories relate according to different attributes, we learn a ranking function per attribute. The learned ranking functions predict the relative strength of each property in novel images. We then build a generative model over the joint space of attribute ranking outputs, and propose a novel form of zero-shot learning in which the supervisor relates the unseen object category to previously seen objects via attributes (for example, 'bears are furrier than giraffes'). We further show how the proposed relative attributes enable richer textual descriptions for new images, which in practice are more precise for human interpretation. We demonstrate the approach on datasets of faces and natural scenes, and show its clear advantages over traditional binary attribute prediction for these new tasks.
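The per-attribute ranking idea can be sketched with a toy learner. The paper uses a RankSVM-style formulation; the margin-driven perceptron updates below are a simplified stand-in, run on synthetic data where the true attribute strength is known so the result can be checked:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: a 1-D "true" attribute strength embedded in 5-D noisy features.
n = 60
strength = rng.uniform(0, 1, n)
X = np.outer(strength, np.ones(5)) + 0.05 * rng.standard_normal((n, 5))

# Ordered pairs (i, j) meaning "image i shows MORE of the attribute than j".
pairs = [(i, j) for i in range(n) for j in range(n)
         if strength[i] > strength[j] + 0.2]

# Enforce the ranking constraints w.(x_i - x_j) >= 1 with simple
# margin-violating updates (a stand-in for the paper's RankSVM training).
w = np.zeros(5)
for _ in range(20):
    for i, j in pairs:
        diff = X[i] - X[j]
        if w @ diff < 1.0:
            w += 0.01 * diff

# The learned ranking function scores relative attribute strength.
scores = X @ w
agreement = np.mean([scores[i] > scores[j] for i, j in pairs])
print(f"ordered pairs respected: {agreement:.2f}")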

论文解读：https://www.cc.gatech.edu/~parikh/relative.html

论文笔记：论文笔记 | Relative Attributes

**2009** (举办地：日本京都）

Discriminative models for multi-class object layout**作者：**Chaitanya Desai, University of California Irvine；

Deva Ramanan, University of California Irvine；

Charless Fowlkes, University of California Irvine

**摘要：**Many state-of-the-art approaches for object rec-ognition reduce the problem to a 0-1 classification task. This allows one to leverage sophisticated machine learning tech-niques for training classifiers from labeled examples. How-ever, these models are typically trained independently for each class using positive and negative examples cropped from images. At test-time, various post-processing heuris-tics such as non-maxima suppression (NMS) are required to reconcile multiple detections within and between differ-ent classes for each image. Though crucial to good perfor-mance on benchmarks, this post-processing is usually de-fined heuristically.We introduce a unified model for multi-class object recognition that casts the problem as a structured prediction task. Rather than predicting a binary label for each image window independently, our model simultaneously predicts astructured labeling of the entire image (Fig. 1). Our model learns statistics that capture the spatial arrangements of var-ious object classes in real images, both in terms of which arrangements to suppress through NMS and which arrange-ments to favor through spatial co-occurrence statistics.We formulate parameter estimation in our model as a max-margin learning problem. Given training images with ground-truth object locations, we show how to formu-

late learning as a convex optimization problem. We em-ploy the cutting plane algorithm of Joachims et al. (Mach.Learn. 2009) to efficiently learn a model from thousands of training images. We show state-of-the-art results on the PASCAL VOC benchmark that indicate the benefits of learning a global model encapsulating the spatial layout of multiple object classes (a preliminary version of this work appeared in ICCV 2009, Desai et al., IEEE international conference on computer vision, 2009).

**2007** （举办地：巴西里约热内卢）

Population Shape Regression From Random Design Data**作者：**Bradley Davis, University of North Carolina at Chapel Hill；

P. Thomas Fletcher, University of Utah；

Elizabeth Bullitt, University of North Carolina at Chapel Hill；

Sarang Joshi, University of Utah

**摘要：**Regression analysis is a powerful tool for the study of changes in a dependent variable as a function of an in-dependent regressor variable, and in particular it is ap-plicable to the study of anatomical growth and shape change. When the underlying process can be modeled by parameters in a Euclidean space, classical regression tech-niques [13, 34] are applicable and have been studied ex-tensively. However, recent work suggests that attempts to describe anatomical shapes using flat Euclidean spaces un-dermines our ability to represent natural biological vari-ability [9, 11].In this paper we develop a method for regression analy-sis ofgeneral, manifold-valueddata. Specifically, we extend Nadaraya-Watson kernel regression by recasting the regres-sion problem in terms of Fréchet expectation. Although this method is quite general, our driving problem is the study anatomical shape change as a function of age from random design image data.We demonstrate our method by analyzing shape change in the brain from a random design dataset of MR images of 89 healthy adult ranging in age from 22 to 79 years.To study the small scale changes in anatomy, we use the infinite dimensional manifold of diffeomorphic transforma-tions, with an associated metric. We regress a representa-tive anatomical shape, as a function of age, from this popu-lation.

**2005** （举办地：中国北京）

Globally Optimal Estimates for Geometric Reconstruction Problems**作者：**Fredrik Kahl, Lund University；Didier Henrion, LAAS-CNRS

**摘要：**We introduce a framework for computing statistically opti-mal estimates of geometric reconstruction problems. While traditional algorithms often suffer from either local minima or non-optimality - or a combination of both - we pursue the

goal of achieving global solutions of the statistically optimal cost-function.Our approach is based on a hierarchy of convex relax-ations to solve non-convex optimization problems with poly-nomials. These convex relaxations generate a monotone se-quence of lower bounds and we show how one can detect whether the globaloptimum is attainedat a given relaxation.

Thetechniqueis appliedtoanumberofclassicalvisionprob-lems: triangulation, camera pose, homography estimation andlast, but notleast, epipolargeometry estimation. Experi-mentalvalidationonbothsyntheticandrealdatais provided.In practice, only a few relaxations are needed for attaining the global optimum.

**2003** （举办地：法国尼斯）

Detecting Pedestrians using Patterns of Motion and Appearance**作者：**Paul Viola, Microsoft Research；

Michael J. Jones, Mitsubishi Electric Research Laboratories；

Daniel Snow, Mitsubishi Electric Research Laboratories

**摘要：**This paper describes a pedestrian detection system that in-tegrates image intensity information with motion informa-tion. We use a detection style algorithm that scans a detec-tor over two consecutive frames of a video sequence. The detector is trained (using AdaBoost) to take advantage of both motion and appearance information to detect a walk-

ing person. Past approaches have built detectors based on motion information or detectors based on appearance in-

formation, but ours is the first to combine both sources of information in a single detector. The implementation de-scribed runs at about 4 frames/second, detects pedestrians at very small scales (as small as 20x15 pixels), and has a very low false positive rate.Our approach builds on the detection work of Viola and Jones. Novel contributions of this paper include: i) devel-opment of a representation of image motion which is ex-tremely efficient, and ii) implementation of a state of the art pedestrian detection system which operates on low res-olution images under difficult conditions (such as rain and snow).

Image Parsing: Unifying Segmentation, Detection and Recognition**作者：**Zhuowen Tu, University of California Los Angeles；

Xiangrong Chen, University of California Los Angeles；

Alan L. Yuille, University of California Los Angeles；

Song-Chun Zhu, University of California Los Angeles

**摘要：**In this paper we present a Bayesian framework for parsing images into their constituent visual patterns. The parsing algorithm optimizes the posterior probability and outputs a scene representation in a “parsing graph”, in a spirit similar to parsing sentences in speech and natural language. The algorithm constructs the parsing graph and re-configures it dy-namically using a set of reversible Markov chain jumps. This computational framework integrates two popular inference approaches – generative (top-down) methods and discrim-inative (bottom-up) methods. The former formulates the posterior probability in terms of generative models for images defined by likelihood functions and priors. The latter com-putes discriminative probabilities based on a sequence (cascade) of bottom-up tests/filters.In our Markov chain design, the posterior probability, defined by the generative models, isthe invariant (target) probability for the Markov chain, and the discriminative probabilitiesare used to construct proposal probabilities to drive the Markov chain. Intuitively, the bottom-up discriminative probabilities activate top-down generative models. In this paper,we focus on two types of visual patterns – generic visual patterns, such as texture and shad-ing, and object patterns including human faces and text. These types of patterns compete and cooperate to explain the image and so image parsing unifies image segmentation, object detection, and recognition (if we use generic visual patterns only then image parsing will correspond to image segmentation [46].). We illustrate our algorithm on natural images of complex city scenes and show examples where image segmentation can be improved by allowing object specific knowledge to disambiguate low-level segmentation cues, and con-versely object detection can be improved by using generic visual patterns to explain away shadows and occlusions.

Image-based Rendering using Image-based Priors**作者：**Andrew Fitzgibbon, University of Oxford；

Yonatan Wexler, Weizmann Institute of Science；

Andrew Zisserman, University of Oxford

**摘要：**Given a set of images acquired from known viewpoints, we describe a method for synthesizing the image which would be seen from a new viewpoint. In contrast to existing tech-niques, which explicitly reconstruct the 3D geometry of the

scene, we transform the problem to the reconstruction of colour rather than depth. This retains the benefits of geo-metric constraints, but projects out the ambiguities in depth estimation which occur in textureless regions.On the other hand, regularization is still needed in or-der to generate high-quality images. The paper’s second contribution is to constrain the generated views to lie in the space of images whose texture statistics are those of the in-put images. This amounts to a image-based prior on the reconstruction which regularizes the solution, yielding re-alistic synthetic views. Examples are given of new view generation for cameras interpolated between the acquisi-tion viewpoints—which enables synthetic steadicam stabi-lization of a sequence with a high level of realism.

**2001** （举办地：加拿大温哥华）

Probabilistic Tracking with Exemplars in a Metric Space**作者：**Kentaro Toyama & Andrew Blake, Microsoft Research

**摘要：**A new, exemplar-based, probabilistic paradigm for visual tracking is presented. Probabilistic mecha-nisms are attractive because they handle fusion of information, especially temporal fusion, in a principled manner. Exemplarsareselectedrepresentativesofrawtrainingdata,usedheretorepresentprobabilisticmixturedistributions of object configurations. Their use avoids tedious hand-construction of object models, and problems with changes of topology. Usingexemplarsinplaceofaparameterizedmodelposesseveralchallenges,addressedherewithwhatwecallthe“Metric Mixture” (M 2 ) approach, which has a number of attractions. Principally, it provides alternatives to standard learning algorithms by allowing the use of metrics that are not embedded in a vector space. Secondly, it uses a noise model that is learned from training data. Lastly, it eliminates any need for an assumption of probabilistic pixelwise independence.Experiments demonstrate the effectiveness of the M 2 model in two domains: tracking walking people using “chamfer” distances on binary edge images, and tracking mouth movements by means of a shuffle distance.

The Space of All Stereo Images**作者：**Steven Seitz, University of Washington

**摘要：**A theory of stereo image formation is presented that enables a complete classification of all possible stereo views, including non-perspective varieties. Towards this end, the notion of epipolar geometry is generalized to apply to multiperspective images. It is shown that any stereo pair must consist of rays lying on one of three varieties of quadric surfaces. A unified representation is developed to model all classes of stereo views, based on the concept of a quadric view. The benefits include aunified treatment of projection and triangulation operations for all stereo views. The framework is applied to derive new types of stereo image representations with unusual and useful properties. Experimental examples of these images are constructed and used to obtain 3D binocular object reconstructions.

**1999** （举办地：希腊克基拉岛）

Euclidean Reconstruction and Reprojection up to Subgroups**作者：**Yi Ma, University of California Berkeley；

Stefano Soatto, Washington University in St. Louis；

Jana Kosecka, University of California Berkeley；

Shankar Sastry, University of California Berkeley

**摘要：**The necessary and sufficient conditionsfor being able to estimate scene structure, motion and camera calibration

from a sequence of images are very rarely satisfied in practice. What exactly can be estimated in sequences of

practical importance, when such conditions are not satisfied? In this paper we give a complete answer to this

question. For every camera motion that fails to meet the conditions, we give explicit formulas for the ambiguities

in the reconstructed scene, motion and calibration. Such a characterization is crucial both for designing robust

estimation algorithms (that do not try to recover parameters that cannot be recovered), and for generating novel

views of the scene by controlling the vantage point. To this end, we characterize explicitly all the vantage pointsthat

give rise to a valid Euclidean reprojection regardless of the ambiguity in the reconstruction. We also characterize

vantage points that generate views that are altogether invariant to the ambiguity. All the results are presented

using simple notation that involves no tensors nor complex projective geometry, and should be accessible with

basic background in linear algebra.

A Theory of Shape by Space Carving**作者：**Kiriakos Kutulakos, University of Rochester；

Steven Seitz, Carnegie Mellon University

**摘要：** In this paper we consider the problem of computing the 3D shape of an unknown, arbitrarily-shaped

scene from multiple photographs taken at known but arbitrarily-distributed viewpoints. By studying the equivalence

class of all 3D shapes that reproduce the input photographs, we prove the existence of a special member of this class,

the photo hull, that (1) can be computed directly from photographs of the scene, and (2) subsumes all other members

of this class. We then give a provably-correct algorithm, called Space Carving, for computing this shape and present

experimental results on complex real-world scenes. The approach is designed to (1) capture photorealistic shapes

that accurately model scene appearance from a wide range of viewpoints, and (2) account for the complex interactions

between occlusion, parallax, shading, and their view-dependent effects on scene-appearance.

**1998** （举办地：印度孟买）

Self-Calibration and Metric Reconstruction Inspite of Varying and Unknown Intrinsic Camera Parameters**作者：**Marc Pollefeys, Katholieke Universiteit Leuven；

Reinhard Koch, Katholieke Universiteit Leuven；

Luc Van Gool, Katholieke Universiteit Leuven

**摘要：**In this paper the theoretical and practical feasibility of self-calibration in the presence of varying intrinsic camera parameters is under investigation. The paper’s main contribution is to propose a self-calibration

method which efficiently deals with all kinds of constraints on the intrinsic camera parameters. Within this framework a practical method is proposed which can retrieve metric reconstruction from image sequences obtained with uncalibrated zooming/focusing cameras. The feasibility of the approach is illustrated on real and synthetic examples. Besides this a theoretical proof is given which shows that the absence of skew in the image plane is sufficient to allow forself-calibration. A counting argument is developed which –depending on the set of constraints–givesthe minimumsequence length forself-calibration and a method to detect critical motion sequencesis proposed

The Problem of Degeneracy in Structure and Motion Recovery from Uncalibrated Image Sequences**作者：**Phil Torr, Microsoft Research；

Andrew Fitzgibbon, University of Oxford；

Andrew Zisserman, University of Oxford

**摘要：**

**1995** （举办地：美国波士顿）

A Theory of Specular Surface Geometry**作者：**Michael Oren and Shree Nayar, Columbia University

**摘要：**

Shape from Shading with Interreflections under a Proximal Light Source: Distortion-Free Copying of an Unfolded Book**作者：**Toshikazu Wada, Hiroyuki Ukida, and Takashi Matsuyama

**摘要：**We address the problem of recovering the 3D shape of an unfolded book surface from the shading information in a scanner image. This shape-from-shading problem in a real world environment is made difficult by aproximal, movinglightsource, interreflections, specularreflections, andanonuniformalbedodistribution. Taking all these factors into account, we formulate the problem as an iterative, non-linear optimization problem. Piecewise polynomial models of the 3D shape and albedo distribution are introduced to efficiently and stably compute the shape in practice. Finally, we propose a method to restore the distorted scanner image based on the reconstructed 3D shape. The image restoration experiments for real book surfaces demonstrate that much of the geometric and photometric distortions are removed by our method.

**1993** （举办地：德国柏林）

Extracting Projective Structure from Single Perspective Views of 3D Point Sets**作者：**Charles A. Rothwell, David A. Forsyth, Andrew Zisserman, and Joseph L. Mundy

**摘要：**

**1990** （举办地：日本大阪）

Shape from Interreflections**作者：**Shree Nayar, Katsushi Ikeuchi, and Takeo Kanade

**摘要：**AI1 shape-from-intensity methods assume that points in a scene are only illuminated by sources of light. Most scenes consist of concave surfaces and/or concavities that result from multiple objects in the scene. In such cases， points in the scene reflect light between themselves. In the presence of these interreflections, shape-from-intensity methods produce erroneous (pseudo) estimates of shape and reflectance. The pseudo shape and reflectance estimates, however, are shown to carry information about the actual shape and Eflectance of the surface. An iterative algorithm is presented that simultaneously recovers the actual shape and the actual reflectance from the pseudo estimates.The recovery algorithm works on Lambertian surfaces of arbitrary shape with possibly varying and unknown reflectance. The general behavior of the algorithm and its convergence properties are discussed. Both simulation as well as experimental results are included to demonstrate the accuracy and stability of the algorithm.

**1988** （举办地：美国弗罗里达）

Color from Black and White**作者：**Brian Funt and Jian Ho,

**1987** （举办地：英国伦敦）

Optical Flow using Spatiotemporal Filters**作者：**David Heeger

**1990~2017年 ICCV 最佳论文（Marr Prize Paper）汇总**

下载链接：https://pan.baidu.com/s/1E6c-K4HCPEl7VAOgvXomzQ

提取码：提示：此内容登录后可查看

**推荐阅读：**

2000 ~2019 年历届 CVPR 最佳论文汇总

1996 ~2018 年历届 AAAI 最佳论文汇总

△ 扫一扫关注 **极市平台**

每天推送最新CV干货

微信公众号: 极市平台（ID: extrememart ）

每天推送最新CV干货

自由转载-非商用-非衍生-保持署名（创意共享3.0许可证）