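ICCV Best Papers (Marr Prize Papers), 1987 to 2017: A Roundup
This post collects every ICCV Marr Prize paper from 1987 through 2017, with authors and paper links (click a paper title to jump to it). A download link for the full collection of best papers is at the end of the post.
ICCV stands for IEEE International Conference on Computer Vision. It is organized by IEEE and, together with the Conference on Computer Vision and Pattern Recognition (CVPR) and the European Conference on Computer Vision (ECCV), is regarded as one of the three top conferences in computer vision.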
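Mask R-CNN
Authors: Kaiming He, Facebook AI Research;
Georgia Gkioxari, Facebook AI Research;
Piotr Dollar, Facebook AI Research;
Ross Girshick, Facebook AI Research
Abstract: We present a conceptually simple, flexible, and general framework for object instance segmentation. Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. Mask R-CNN is simple to train and adds only a small overhead to Faster R-CNN, running at 5 fps. Moreover, Mask R-CNN is easy to generalize to other tasks; for example, it can estimate human poses in the same framework.
We show top results in all three tasks of the COCO challenge: instance segmentation, bounding-box object detection, and person keypoint detection. Without additional tricks, Mask R-CNN outperforms all existing single-model entries on every task, including the COCO 2016 challenge winners. We hope this simple and effective approach will serve as a solid baseline and help future research in instance-level recognition.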
Deep Neural Decision Forests
Authors: Peter Kontschieder, Microsoft Research;
Madalina Fiterau, Carnegie Mellon University;
Antonio Criminisi, Microsoft Research;
Samuel Rota Bulò, Microsoft Research
Abstract: We present Deep Neural Decision Forests, a novel approach that unifies classification trees with the representation learning functionality known from deep convolutional networks, by training them in an end-to-end manner. To combine these two worlds, we introduce a stochastic and differentiable decision tree model, which steers the representation learning usually conducted in the initial layers of a (deep) convolutional network. Our model differs from conventional deep networks because a decision forest provides the final predictions and it differs from conventional decision forests since we propose a principled, joint and global optimization of split and leaf node parameters. We show experimental results on benchmark machine learning datasets like MNIST and ImageNet and find on-par or superior results when compared to state-of-the-art deep models. Most remarkably, we obtain Top5-Errors of only 7.84%/6.38% on ImageNet validation data when integrating our forests in a single-crop, single/seven model GoogLeNet architecture, respectively. Thus, even without any form of training data set augmentation we are improving on the 6.67% error obtained by the best GoogLeNet architecture (7 models, 144 crops).
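The "stochastic and differentiable decision tree" in this abstract can be illustrated with a small sketch. It assumes, for illustration only and not as the authors' implementation, that each internal node sends a sample left with probability given by a sigmoid of a score (which a network would normally produce), and that the prediction is the average of the leaf class distributions weighted by the probability of reaching each leaf. The names `soft_tree_predict`, `split_scores`, and `leaf_dists` are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def soft_tree_predict(split_scores, leaf_dists):
    """Class probabilities from a complete binary tree with soft routing.

    split_scores: one real score per internal node, in breadth-first order
        (node i has children 2*i + 1 and 2*i + 2). Assumed to come from
        some upstream model; here they are plain numbers.
    leaf_dists: one class distribution per leaf, ordered left to right.
    """
    n_internal = len(split_scores)
    n_leaves = len(leaf_dists)
    assert n_leaves == n_internal + 1, "expects a complete binary tree"

    # reach[k] is the probability that a sample arrives at node k.
    reach = [0.0] * (n_internal + n_leaves)
    reach[0] = 1.0  # every sample starts at the root
    for i in range(n_internal):
        go_left = sigmoid(split_scores[i])
        reach[2 * i + 1] += reach[i] * go_left
        reach[2 * i + 2] += reach[i] * (1.0 - go_left)

    # Leaves occupy the last n_leaves slots of the breadth-first layout.
    leaf_reach = reach[n_internal:]
    n_classes = len(leaf_dists[0])
    return [
        sum(leaf_reach[l] * leaf_dists[l][c] for l in range(n_leaves))
        for c in range(n_classes)
    ]
```

Because every step is a product or sum of smooth functions, the output is differentiable with respect to the split scores, which is what allows this kind of tree to be trained jointly with a network.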
From Large Scale Image Categorization to Entry-Level Categories
Authors: Vicente Ordonez, University of North Carolina at Chapel Hill;
Jia Deng, Stanford University;
Yejin Choi, Stony Brook University;
Alexander Berg, University of North Carolina at Chapel Hill;
Tamara Berg, University of North Carolina at Chapel Hill
Abstract: Entry-level categories, the labels people will use to name an object, were originally defined and studied by psychologists in the 1980s. In this paper we study entry-level categories at a large scale and learn the first models for predicting entry-level categories for images. Our models combine visual recognition predictions with proxies for word “naturalness” mined from the enormous amounts of text on the web. We demonstrate the usefulness of our models for predicting nouns (entry-level words) associated with images by people. We also learn mappings between concepts predicted by existing visual recognition systems and entry-level concepts that could be useful for improving human-focused applications such as natural language image description or retrieval.
Relative Attributes
Authors: Devi Parikh, Toyota Technological Institute at Chicago;
Kristen Grauman, University of Texas at Austin
Abstract: Human-nameable visual "attributes" can benefit various recognition tasks. However, existing techniques restrict these properties to categorical labels (for example, a person is 'smiling' or not, a scene is 'dry' or not), and thus fail to capture more general semantic relationships. We propose to model relative attributes. Given training data stating how object/scene categories relate according to different attributes, we learn a ranking function per attribute. The learned ranking functions predict the relative strength of each property in novel images. We then build a generative model over the joint space of attribute ranking outputs, and propose a novel form of zero-shot learning in which the supervisor relates the unseen object category to previously seen objects via attributes (for example, 'bears are furrier than giraffes'). We further show how the proposed relative attributes enable richer textual descriptions for new images, which in practice are more precise for human interpretation. We demonstrate the approach on datasets of faces and natural scenes, and show its clear advantages over traditional binary attribute prediction for these new tasks.
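The idea of learning one ranking function per attribute can be sketched as follows. This is a simplified stand-in that assumes a linear scorer trained with a pairwise, hinge-style update. It is not the paper's optimizer, and the names `train_ranker` and `rank_score`, the learning rate, and the margin are all hypothetical.

```python
def train_ranker(pairs, dim, epochs=100, lr=0.1, margin=1.0):
    """Learn a linear scorer w from ordered pairs.

    pairs: list of (stronger, weaker) feature vectors, meaning the first
        item shows the attribute more strongly than the second.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for stronger, weaker in pairs:
            diff = [a - b for a, b in zip(stronger, weaker)]
            score_gap = sum(wi * di for wi, di in zip(w, diff))
            # Update only when the pair is not yet separated by the margin.
            if score_gap < margin:
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


def rank_score(w, x):
    """Relative attribute strength of x under the learned scorer."""
    return sum(wi * xi for wi, xi in zip(w, x))


# Usage: the first feature grows with the attribute, the second is noise.
pairs = [([2.0, 0.0], [1.0, 0.0]), ([3.0, 1.0], [2.0, 1.0])]
w = train_ranker(pairs, dim=2)
```

The learned score is only meaningful in comparison: a higher `rank_score` means "more of the attribute", which is what lets a statement such as 'bears are furrier than giraffes' be expressed as an ordering constraint.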
Discriminative models for multi-class object layout
Authors: Chaitanya Desai, University of California Irvine;
Deva Ramanan, University of California Irvine;
Charless Fowlkes, University of California Irvine
Abstract: Many state-of-the-art approaches for object recognition reduce the problem to a 0-1 classification task. This allows one to leverage sophisticated machine learning techniques for training classifiers from labeled examples. However, these models are typically trained independently for each class using positive and negative examples cropped from images. At test-time, various post-processing heuristics such as non-maxima suppression (NMS) are required to reconcile multiple detections within and between different classes for each image. Though crucial to good performance on benchmarks, this post-processing is usually defined heuristically. We introduce a unified model for multi-class object recognition that casts the problem as a structured prediction task. Rather than predicting a binary label for each image window independently, our model simultaneously predicts a structured labeling of the entire image (Fig. 1). Our model learns statistics that capture the spatial arrangements of various object classes in real images, both in terms of which arrangements to suppress through NMS and which arrangements to favor through spatial co-occurrence statistics. We formulate parameter estimation in our model as a max-margin learning problem. Given training images with ground-truth object locations, we show how to formulate learning as a convex optimization problem. We employ the cutting plane algorithm of Joachims et al. (Mach. Learn. 2009) to efficiently learn a model from thousands of training images. We show state-of-the-art results on the PASCAL VOC benchmark that indicate the benefits of learning a global model encapsulating the spatial layout of multiple object classes (a preliminary version of this work appeared in ICCV 2009, Desai et al., IEEE international conference on computer vision, 2009).
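For reference, the greedy non-maxima suppression heuristic that this abstract contrasts itself with can be sketched as below. This is the common post-processing baseline, not the structured model proposed in the paper. The box format `(x1, y1, x2, y2)`, the 0.5 overlap threshold, and the function names are assumptions made for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first.

    A box is dropped when it overlaps an already kept, higher-scoring box
    by more than iou_threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Usage: the second box heavily overlaps the first and scores lower.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
```

Note that the rule looks only at pairwise overlap and score order. It has no notion of which classes tend to co-occur or how they are arranged, which is the gap the abstract's learned layout model is meant to address.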
Population Shape Regression From Random Design Data
Authors: Bradley Davis, University of North Carolina at Chapel Hill;
P. Thomas Fletcher, University of Utah;
Elizabeth Bullitt, University of North Carolina at Chapel Hill;
Sarang Joshi, University of Utah
Abstract: Regression analysis is a powerful tool for the study of changes in a dependent variable as a function of an independent regressor variable, and in particular it is applicable to the study of anatomical growth and shape change. When the underlying process can be modeled by parameters in a Euclidean space, classical regression techniques [13, 34] are applicable and have been studied extensively. However, recent work suggests that attempts to describe anatomical shapes using flat Euclidean spaces undermine our ability to represent natural biological variability [9, 11]. In this paper we develop a method for regression analysis of general, manifold-valued data. Specifically, we extend Nadaraya-Watson kernel regression by recasting the regression problem in terms of Fréchet expectation. Although this method is quite general, our driving problem is the study of anatomical shape change as a function of age from random design image data. We demonstrate our method by analyzing shape change in the brain from a random design dataset of MR images of 89 healthy adults ranging in age from 22 to 79 years. To study the small scale changes in anatomy, we use the infinite dimensional manifold of diffeomorphic transformations, with an associated metric. We regress a representative anatomical shape, as a function of age, from this population.
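For intuition, the classical Euclidean Nadaraya-Watson estimator that this paper generalizes is a kernel-weighted average of the observed responses. The sketch below assumes scalar inputs and outputs and a Gaussian kernel. The manifold-valued version described in the abstract would replace the weighted average with a weighted Fréchet mean, which is not implemented here. The function name and the bandwidth value are illustrative.

```python
import math


def nadaraya_watson(x_query, xs, ys, bandwidth):
    """Kernel-weighted average of ys, with weights centered at x_query.

    xs, ys: observed regressor values (e.g. ages) and scalar responses.
    bandwidth: width of the Gaussian kernel; larger means smoother.
    """
    weights = [
        math.exp(-0.5 * ((x_query - x) / bandwidth) ** 2) for x in xs
    ]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, ys)) / total


# Usage: estimate the response at age 50 from scattered observations.
ages = [22.0, 35.0, 48.0, 61.0, 79.0]
values = [1.0, 1.4, 1.9, 2.6, 3.5]
estimate = nadaraya_watson(50.0, ages, values, bandwidth=10.0)
```

In Euclidean space this weighted average is also the point that minimizes the weighted sum of squared distances to the observations. That minimization view is the one that carries over to curved spaces, where averaging coordinates directly is not meaningful.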
Globally Optimal Estimates for Geometric Reconstruction Problems
Authors: Fredrik Kahl, Lund University; Didier Henrion, LAAS-CNRS
Abstract: We introduce a framework for computing statistically optimal estimates of geometric reconstruction problems. While traditional algorithms often suffer from either local minima or non-optimality, or a combination of both, we pursue the goal of achieving global solutions of the statistically optimal cost-function. Our approach is based on a hierarchy of convex relaxations to solve non-convex optimization problems with polynomials. These convex relaxations generate a monotone sequence of lower bounds and we show how one can detect whether the global optimum is attained at a given relaxation. The technique is applied to a number of classical vision problems: triangulation, camera pose, homography estimation and last, but not least, epipolar geometry estimation. Experimental validation on both synthetic and real data is provided. In practice, only a few relaxations are needed for attaining the global optimum.
Detecting Pedestrians using Patterns of Motion and Appearance
Authors: Paul Viola, Microsoft Research;
Michael J. Jones, Mitsubishi Electric Research Laboratories;
Daniel Snow, Mitsubishi Electric Research Laboratories
Abstract: This paper describes a pedestrian detection system that integrates image intensity information with motion information. We use a detection style algorithm that scans a detector over two consecutive frames of a video sequence. The detector is trained (using AdaBoost) to take advantage of both motion and appearance information to detect a walking person. Past approaches have built detectors based on motion information or detectors based on appearance information, but ours is the first to combine both sources of information in a single detector. The implementation described runs at about 4 frames/second, detects pedestrians at very small scales (as small as 20x15 pixels), and has a very low false positive rate. Our approach builds on the detection work of Viola and Jones. Novel contributions of this paper include: i) development of a representation of image motion which is extremely efficient, and ii) implementation of a state of the art pedestrian detection system which operates on low resolution images under difficult conditions (such as rain and snow).
Image Parsing: Unifying Segmentation, Detection and Recognition
Authors: Zhuowen Tu, University of California Los Angeles;
Xiangrong Chen, University of California Los Angeles;
Alan L. Yuille, University of California Los Angeles;
Song-Chun Zhu, University of California Los Angeles
Abstract: In this paper we present a Bayesian framework for parsing images into their constituent visual patterns. The parsing algorithm optimizes the posterior probability and outputs a scene representation in a “parsing graph”, in a spirit similar to parsing sentences in speech and natural language. The algorithm constructs the parsing graph and re-configures it dynamically using a set of reversible Markov chain jumps. This computational framework integrates two popular inference approaches: generative (top-down) methods and discriminative (bottom-up) methods. The former formulates the posterior probability in terms of generative models for images defined by likelihood functions and priors. The latter computes discriminative probabilities based on a sequence (cascade) of bottom-up tests/filters. In our Markov chain design, the posterior probability, defined by the generative models, is the invariant (target) probability for the Markov chain, and the discriminative probabilities are used to construct proposal probabilities to drive the Markov chain. Intuitively, the bottom-up discriminative probabilities activate top-down generative models. In this paper, we focus on two types of visual patterns: generic visual patterns, such as texture and shading, and object patterns including human faces and text. These types of patterns compete and cooperate to explain the image and so image parsing unifies image segmentation, object detection, and recognition (if we use generic visual patterns only then image parsing will correspond to image segmentation). We illustrate our algorithm on natural images of complex city scenes and show examples where image segmentation can be improved by allowing object specific knowledge to disambiguate low-level segmentation cues, and conversely object detection can be improved by using generic visual patterns to explain away shadows and occlusions.
Image-based Rendering using Image-based Priors
Authors: Andrew Fitzgibbon, University of Oxford;
Yonatan Wexler, Weizmann Institute of Science;
Andrew Zisserman, University of Oxford
Abstract: Given a set of images acquired from known viewpoints, we describe a method for synthesizing the image which would be seen from a new viewpoint. In contrast to existing techniques, which explicitly reconstruct the 3D geometry of the scene, we transform the problem to the reconstruction of colour rather than depth. This retains the benefits of geometric constraints, but projects out the ambiguities in depth estimation which occur in textureless regions. On the other hand, regularization is still needed in order to generate high-quality images. The paper’s second contribution is to constrain the generated views to lie in the space of images whose texture statistics are those of the input images. This amounts to an image-based prior on the reconstruction which regularizes the solution, yielding realistic synthetic views. Examples are given of new view generation for cameras interpolated between the acquisition viewpoints, which enables synthetic steadicam stabilization of a sequence with a high level of realism.
Probabilistic Tracking with Exemplars in a Metric Space
Authors: Kentaro Toyama and Andrew Blake, Microsoft Research
Abstract: A new, exemplar-based, probabilistic paradigm for visual tracking is presented. Probabilistic mechanisms are attractive because they handle fusion of information, especially temporal fusion, in a principled manner. Exemplars are selected representatives of raw training data, used here to represent probabilistic mixture distributions of object configurations. Their use avoids tedious hand-construction of object models, and problems with changes of topology. Using exemplars in place of a parameterized model poses several challenges, addressed here with what we call the “Metric Mixture” (M²) approach, which has a number of attractions. Principally, it provides alternatives to standard learning algorithms by allowing the use of metrics that are not embedded in a vector space. Secondly, it uses a noise model that is learned from training data. Lastly, it eliminates any need for an assumption of probabilistic pixelwise independence. Experiments demonstrate the effectiveness of the M² model in two domains: tracking walking people using “chamfer” distances on binary edge images, and tracking mouth movements by means of a shuffle distance.
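To make the "chamfer" distance mentioned in this abstract concrete, here is a minimal sketch of one common directed form: the mean distance from each exemplar edge point to its nearest edge point in the image. The exact variant used in the paper is not stated in the abstract, so this form, the point-list representation, and the name `chamfer_distance` are assumptions.

```python
import math


def chamfer_distance(template_pts, image_pts):
    """Mean nearest-neighbor distance from template points to image points.

    Both arguments are non-empty lists of (x, y) edge-pixel coordinates.
    The measure is directed: swapping the arguments can change the result.
    """
    total = 0.0
    for tx, ty in template_pts:
        total += min(math.hypot(tx - ix, ty - iy) for ix, iy in image_pts)
    return total / len(template_pts)


# Usage: a two-point exemplar compared against edges shifted up by one pixel.
exemplar = [(0, 0), (1, 0)]
edges = [(0, 1), (1, 1)]
d = chamfer_distance(exemplar, edges)
```

A distance like this compares two edge maps directly and does not require placing them in a vector space, which fits the abstract's point about metrics that are not embedded in one.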
The Space of All Stereo Images
Authors: Steven Seitz, University of Washington
Abstract: A theory of stereo image formation is presented that enables a complete classification of all possible stereo views, including non-perspective varieties. Towards this end, the notion of epipolar geometry is generalized to apply to multiperspective images. It is shown that any stereo pair must consist of rays lying on one of three varieties of quadric surfaces. A unified representation is developed to model all classes of stereo views, based on the concept of a quadric view. The benefits include a unified treatment of projection and triangulation operations for all stereo views. The framework is applied to derive new types of stereo image representations with unusual and useful properties. Experimental examples of these images are constructed and used to obtain 3D binocular object reconstructions.
Euclidean Reconstruction and Reprojection up to Subgroups
Authors: Yi Ma, University of California Berkeley;
Stefano Soatto, Washington University in St. Louis;
Jana Kosecka, University of California Berkeley;
Shankar Sastry, University of California Berkeley
Abstract: The necessary and sufficient conditions for being able to estimate scene structure, motion and camera calibration from a sequence of images are very rarely satisfied in practice. What exactly can be estimated in sequences of practical importance, when such conditions are not satisfied? In this paper we give a complete answer to this question. For every camera motion that fails to meet the conditions, we give explicit formulas for the ambiguities in the reconstructed scene, motion and calibration. Such a characterization is crucial both for designing robust estimation algorithms (that do not try to recover parameters that cannot be recovered), and for generating novel views of the scene by controlling the vantage point. To this end, we characterize explicitly all the vantage points that give rise to a valid Euclidean reprojection regardless of the ambiguity in the reconstruction. We also characterize vantage points that generate views that are altogether invariant to the ambiguity. All the results are presented using simple notation that involves no tensors nor complex projective geometry, and should be accessible with basic background in linear algebra.
A Theory of Shape by Space Carving
Authors: Kiriakos Kutulakos, University of Rochester;
Steven Seitz, Carnegie Mellon University
Abstract: In this paper we consider the problem of computing the 3D shape of an unknown, arbitrarily-shaped scene from multiple photographs taken at known but arbitrarily-distributed viewpoints. By studying the equivalence class of all 3D shapes that reproduce the input photographs, we prove the existence of a special member of this class, the photo hull, that (1) can be computed directly from photographs of the scene, and (2) subsumes all other members of this class. We then give a provably-correct algorithm, called Space Carving, for computing this shape and present experimental results on complex real-world scenes. The approach is designed to (1) capture photorealistic shapes that accurately model scene appearance from a wide range of viewpoints, and (2) account for the complex interactions between occlusion, parallax, shading, and their view-dependent effects on scene appearance.
Self-Calibration and Metric Reconstruction Inspite of Varying and Unknown Intrinsic Camera Parameters
Authors: Marc Pollefeys, Katholieke Universiteit Leuven;
Reinhard Koch, Katholieke Universiteit Leuven;
Luc Van Gool, Katholieke Universiteit Leuven
Abstract: In this paper the theoretical and practical feasibility of self-calibration in the presence of varying intrinsic camera parameters is under investigation. The paper’s main contribution is to propose a self-calibration method which efficiently deals with all kinds of constraints on the intrinsic camera parameters. Within this framework a practical method is proposed which can retrieve metric reconstruction from image sequences obtained with uncalibrated zooming/focusing cameras. The feasibility of the approach is illustrated on real and synthetic examples. Besides this, a theoretical proof is given which shows that the absence of skew in the image plane is sufficient to allow for self-calibration. A counting argument is developed which, depending on the set of constraints, gives the minimum sequence length for self-calibration, and a method to detect critical motion sequences is proposed.
The Problem of Degeneracy in Structure and Motion Recovery from Uncalibrated Image Sequences
Authors: Phil Torr, Microsoft Research;
Andrew Fitzgibbon, University of Oxford;
Andrew Zisserman, University of Oxford
A Theory of Specular Surface Geometry
Authors: Michael Oren and Shree Nayar, Columbia University
Shape from Shading with Interreflections under a Proximal Light Source: Distortion-Free Copying of an Unfolded Book
Authors: Toshikazu Wada, Hiroyuki Ukida, and Takashi Matsuyama
Abstract: We address the problem of recovering the 3D shape of an unfolded book surface from the shading information in a scanner image. This shape-from-shading problem in a real world environment is made difficult by a proximal, moving light source, interreflections, specular reflections, and a nonuniform albedo distribution. Taking all these factors into account, we formulate the problem as an iterative, non-linear optimization problem. Piecewise polynomial models of the 3D shape and albedo distribution are introduced to efficiently and stably compute the shape in practice. Finally, we propose a method to restore the distorted scanner image based on the reconstructed 3D shape. The image restoration experiments for real book surfaces demonstrate that much of the geometric and photometric distortions are removed by our method.
Extracting Projective Structure from Single Perspective Views of 3D Point Sets
Authors: Charles A. Rothwell, David A. Forsyth, Andrew Zisserman, and Joseph L. Mundy
Shape from Interreflections
Authors: Shree Nayar, Katsushi Ikeuchi, and Takeo Kanade
Abstract: All shape-from-intensity methods assume that points in a scene are only illuminated by sources of light. Most scenes consist of concave surfaces and/or concavities that result from multiple objects in the scene. In such cases, points in the scene reflect light between themselves. In the presence of these interreflections, shape-from-intensity methods produce erroneous (pseudo) estimates of shape and reflectance. The pseudo shape and reflectance estimates, however, are shown to carry information about the actual shape and reflectance of the surface. An iterative algorithm is presented that simultaneously recovers the actual shape and the actual reflectance from the pseudo estimates. The recovery algorithm works on Lambertian surfaces of arbitrary shape with possibly varying and unknown reflectance. The general behavior of the algorithm and its convergence properties are discussed. Both simulation as well as experimental results are included to demonstrate the accuracy and stability of the algorithm.
Color from Black and White
Authors: Brian Funt and Jian Ho
Optical Flow using Spatiotemporal Filters