This easy-to-follow textbook is the second of three volumes which provide a modern, algorithmic introduction to digital image processing, designed to be used both by learners desiring a firm foundation on which to build, and practitioners in search of critical analysis and concrete implementations of the most important techniques. This volume extends the introductory material presented in the first volume (Fundamental Techniques) with additional techniques that form part of the standard image-processing toolbox.

This thorough, reader-friendly text will equip undergraduates with a deeper understanding of the topic and will be invaluable for further developing knowledge via self-study.

Wilhelm Burger, Ph.D., is the director of the Digital Media degree programs at the Upper Austria University of Applied Sciences at Hagenberg.

Mark J. Burge, Ph.D., is a senior principal in the Center for National Security and Intelligence at Noblis in Washington, D.C.

Undergraduate Topics in Computer Science

For further volumes: http://www.springer.com/series/7592

Undergraduate Topics in Computer Science (UTiCS) delivers high-quality instructional content for undergraduates studying in all areas of computing and information science. From core foundational and theoretical material to final-year topics and applications, UTiCS books take a fresh, concise, and modern approach and are ideal for self-study or for a one- or two-semester course. The texts are all authored by established experts in their fields, reviewed by an international advisory board, and contain numerous examples and problems. Many include fully worked solutions.

Wilhelm Burger • Mark J. Burge

Principles of Digital Image Processing: Core Algorithms


Wilhelm Burger University of Applied Sciences Hagenberg Austria [emailprotected]

Mark J. Burge noblis.org Washington, D.C. [emailprotected]

Series editor
Ian Mackie, École Polytechnique, France and University of Sussex, UK

Advisory board
Samson Abramsky, University of Oxford, UK
Chris Hankin, Imperial College London, UK
Dexter Kozen, Cornell University, USA
Andrew Pitts, University of Cambridge, UK
Hanne Riis Nielson, Technical University of Denmark, Denmark
Steven Skiena, Stony Brook University, USA
Iain Stewart, University of Durham, UK
David Zhang, The Hong Kong Polytechnic University, Hong Kong

Undergraduate Topics in Computer Science ISSN 1863-7310
ISBN 978-1-84800-194-7
e-ISBN 978-1-84800-195-4
DOI 10.1007/978-1-84800-195-4

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Control Number: 2008942518

© Springer-Verlag London Limited 2009

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms of licences issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers.

The use of registered names, trademarks, etc., in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant laws and regulations and therefore free for general use.

The publisher makes no representation, express or implied, with regard to the accuracy of the information contained in this book and cannot accept any legal responsibility or liability for any errors or omissions that may be made.

Printed on acid-free paper

Springer Science+Business Media
springer.com

Preface

This is the second volume of a book series that provides a modern, algorithmic introduction to digital image processing. It is designed to be used both by learners desiring a firm foundation on which to build and practitioners in search of critical analysis and modern implementations of the most important techniques. This updated and enhanced paperback edition of our comprehensive textbook Digital Image Processing: An Algorithmic Approach Using Java packages the original material into a series of compact volumes, thereby supporting a flexible sequence of courses in digital image processing. Tailoring the contents to the scope of individual semester courses is also an attempt to provide affordable (and “backpack-compatible”) textbooks without compromising the quality and depth of content. This second volume, titled Core Algorithms, extends the introductory material presented in the first volume (Fundamental Techniques) with additional techniques that are, nevertheless, part of the standard image processing toolbox. A forthcoming third volume (Advanced Techniques) will extend this series and add important material beyond the elementary level, suitable for an advanced undergraduate or even graduate course.

Math, Algorithms, and “Real” Code

It has been our experience in teaching in this field that mastering the core takes more than just reading about the techniques; it requires active construction and experimentation with the algorithms to acquire a feeling for how to use these methods in the real world. Internet search engines have made finding someone’s code for almost any imaging problem as simple as coming up with a succinct enough set of keywords. However, the problem is not finding a solution but developing one’s own and understanding how it works, or why it eventually does not. Although we feel that the real value of this series lies not in its code but in the critical selection of algorithms, illustrated explanations, and concise mathematical derivations, we continue to augment our algorithms with complete implementations, because even the best description of a method often omits some essential element necessary for the actual implementation, which only the unambiguous semantics of a real programming language can provide.

Online Resources

The authors maintain a website for this text that provides supplementary materials, including the complete Java source code for the examples, the test images used in the examples, and corrections. Visit this site at www.imagingbook.com

Additional materials are available for educators, including a complete set of figures, tables, and mathematical elements shown in the text, in a format suitable for easy inclusion in presentations and course notes. Comments, questions, and corrections are welcome and should be addressed to [emailprotected]

Acknowledgements

As with its predecessors, this book would not have been possible without the understanding and steady support of our families. Thanks go to Wayne Rasband (NIH) for developing and refining ImageJ and for his truly outstanding support of the community. We appreciate the contributions of the many careful readers who have contacted us to suggest new topics, recommend alternative solutions, or point out corrections. Finally, we are grateful to Wayne Wheeler for initiating this book series and to Catherine Brett and her colleagues at Springer’s UK and New York offices for their professional support.

Hagenberg, Austria / Washington DC, USA June 2008

Contents

Preface

1. Introduction
  1.1 Programming with Images
  1.2 Image Analysis

2. Regions in Binary Images
  2.1 Finding Image Regions
    2.1.1 Region Labeling with Flood Filling
    2.1.2 Sequential Region Labeling
    2.1.3 Region Labeling—Summary
  2.2 Region Contours
    2.2.1 External and Internal Contours
    2.2.2 Combining Region Labeling and Contour Finding
    2.2.3 Implementation
    2.2.4 Example
  2.3 Representing Image Regions
    2.3.1 Matrix Representation
    2.3.2 Run Length Encoding
    2.3.3 Chain Codes
  2.4 Properties of Binary Regions
    2.4.1 Shape Features
    2.4.2 Geometric Features
    2.4.3 Statistical Shape Properties
    2.4.4 Moment-Based Geometrical Properties
    2.4.5 Projections
    2.4.6 Topological Properties
  2.5 Exercises

3. Detecting Simple Curves
  3.1 Salient Structures
  3.2 Hough Transform
    3.2.1 Parameter Space
    3.2.2 Accumulator Array
    3.2.3 A Better Line Representation
  3.3 Implementing the Hough Transform
    3.3.1 Filling the Accumulator Array
    3.3.2 Analyzing the Accumulator Array
    3.3.3 Hough Transform Extensions
  3.4 Hough Transform for Circles and Ellipses
    3.4.1 Circles and Arcs
    3.4.2 Ellipses
  3.5 Exercises

4. Corner Detection
  4.1 Points of Interest
  4.2 Harris Corner Detector
    4.2.1 Local Structure Matrix
    4.2.2 Corner Response Function (CRF)
    4.2.3 Determining Corner Points
    4.2.4 Example
  4.3 Implementation
    4.3.1 Step 1: Computing the Corner Response Function
    4.3.2 Step 2: Selecting “Good” Corner Points
    4.3.3 Displaying the Corner Points
    4.3.4 Summary
  4.4 Exercises

5. Color Quantization
  5.1 Scalar Color Quantization
  5.2 Vector Quantization
    5.2.1 Populosity algorithm
    5.2.2 Median-cut algorithm
    5.2.3 Octree algorithm
    5.2.4 Other methods for vector quantization
  5.3 Exercises

6. Colorimetric Color Spaces
  6.1 CIE Color Spaces
    6.1.1 CIE XYZ color space
    6.1.2 CIE x, y chromaticity
    6.1.3 Standard illuminants
    6.1.4 Gamut
    6.1.5 Variants of the CIE color space
  6.2 CIE L*a*b*
    6.2.1 Transformation CIE XYZ → L*a*b*
    6.2.2 Transformation L*a*b* → CIE XYZ
    6.2.3 Measuring color differences
  6.3 sRGB
    6.3.1 Linear vs. nonlinear color components
    6.3.2 Transformation CIE XYZ → sRGB
    6.3.3 Transformation sRGB → CIE XYZ
    6.3.4 Calculating with sRGB values
  6.4 Adobe RGB
  6.5 Chromatic Adaptation
    6.5.1 XYZ scaling
    6.5.2 Bradford adaptation
  6.6 Colorimetric Support in Java
    6.6.1 sRGB colors in Java
    6.6.2 Profile connection space (PCS)
    6.6.3 Color-related Java classes
    6.6.4 A L*a*b* color space implementation
    6.6.5 ICC profiles
  6.7 Exercises

7. Introduction to Spectral Techniques
  7.1 The Fourier Transform
    7.1.1 Sine and Cosine Functions
    7.1.2 Fourier Series of Periodic Functions
    7.1.3 Fourier Integral
    7.1.4 Fourier Spectrum and Transformation
    7.1.5 Fourier Transform Pairs
    7.1.6 Important Properties of the Fourier Transform
  7.2 Working with Discrete Signals
    7.2.1 Sampling
    7.2.2 Discrete and Periodic Functions
  7.3 The Discrete Fourier Transform (DFT)
    7.3.1 Definition of the DFT
    7.3.2 Discrete Basis Functions
    7.3.3 Aliasing Again!
    7.3.4 Units in Signal and Frequency Space
    7.3.5 Power Spectrum
  7.4 Implementing the DFT
    7.4.1 Direct Implementation
    7.4.2 Fast Fourier Transform (FFT)
  7.5 Exercises

8. The Discrete Fourier Transform in 2D
  8.1 Definition of the 2D DFT
    8.1.1 2D Basis Functions
    8.1.2 Implementing the Two-Dimensional DFT
  8.2 Visualizing the 2D Fourier Transform
    8.2.1 Range of Spectral Values
    8.2.2 Centered Representation
  8.3 Frequencies and Orientation in 2D
    8.3.1 Effective Frequency
    8.3.2 Frequency Limits and Aliasing in 2D
    8.3.3 Orientation
    8.3.4 Normalizing the 2D Spectrum
    8.3.5 Effects of Periodicity
    8.3.6 Windowing
    8.3.7 Windowing Functions
  8.4 2D Fourier Transform Examples
  8.5 Applications of the DFT
    8.5.1 Linear Filter Operations in Frequency Space
    8.5.2 Linear Convolution versus Correlation
    8.5.3 Inverse Filters
  8.6 Exercises

9. The Discrete Cosine Transform (DCT)
  9.1 One-Dimensional DCT
    9.1.1 DCT Basis Functions
    9.1.2 Implementing the One-Dimensional DCT
  9.2 Two-Dimensional DCT
    9.2.1 Separability
    9.2.2 Examples
  9.3 Other Spectral Transforms
  9.4 Exercises

10. Geometric Operations
  10.1 2D Mapping Function
    10.1.1 Simple Mappings
    10.1.2 Homogeneous Coordinates
    10.1.3 Affine (Three-Point) Mapping
    10.1.4 Projective (Four-Point) Mapping
    10.1.5 Bilinear Mapping
    10.1.6 Other Nonlinear Image Transformations
    10.1.7 Local Image Transformations
  10.2 Resampling the Image
    10.2.1 Source-to-Target Mapping
    10.2.2 Target-to-Source Mapping
  10.3 Interpolation
    10.3.1 Simple Interpolation Methods
    10.3.2 Ideal Interpolation
    10.3.3 Interpolation by Convolution
    10.3.4 Cubic Interpolation
    10.3.5 Spline Interpolation
    10.3.6 Lanczos Interpolation
    10.3.7 Interpolation in 2D
    10.3.8 Aliasing
  10.4 Java Implementation
    10.4.1 Geometric Transformations
    10.4.2 Pixel Interpolation
    10.4.3 Sample Applications
  10.5 Exercises

11. Comparing Images
  11.1 Template Matching in Intensity Images
    11.1.1 Distance between Image Patterns
    11.1.2 Implementation
    11.1.3 Matching under Rotation and Scaling
  11.2 Matching Binary Images
    11.2.1 Direct Comparison
    11.2.2 The Distance Transform
    11.2.3 Chamfer Matching
  11.3 Exercises

A. Mathematical Notation
  A.1 Symbols
  A.2 Set Operators
  A.3 Complex Numbers

B. Source Code
  B.1 Combined Region Labeling and Contour Tracing
    B.1.1 Contour_Tracing_Plugin (Class)
    B.1.2 Contour (Class)
    B.1.3 BinaryRegion (Class)
    B.1.4 ContourTracer (Class)
    B.1.5 ContourOverlay (Class)
  B.2 Harris Corner Detector
    B.2.1 Harris_Corner_Plugin (Class)
    B.2.2 File Corner (Class)
    B.2.3 File HarrisCornerDetector (Class)
  B.3 Median-Cut Color Quantization
    B.3.1 ColorQuantizer (Interface)
    B.3.2 MedianCutQuantizer (Class)
    B.3.3 ColorHistogram (Class)
    B.3.4 Median_Cut_Quantization (Class)

Bibliography

Index

1

Introduction

Today, IT professionals must be more than simply familiar with digital image processing. They are expected to be able to knowledgeably manipulate images and related digital media and, in the same way, software engineers and computer scientists are increasingly confronted with developing programs, databases, and related systems that must correctly deal with digital images. The simple lack of practical experience with this type of material, combined with an often unclear understanding of its basic foundations and a tendency to underestimate its difficulties, frequently leads to inefficient solutions, costly errors, and personal frustration. In fact, it often appears at first glance that a given image processing task will have a simple solution, especially when it is something that is easily accomplished by our own visual system. Yet, in practice, it turns out that developing reliable, robust, and timely solutions is difficult or simply impossible. This is especially true when the problem involves image analysis; that is, where the ultimate goal is not to enhance or otherwise alter the appearance of an image but instead to extract meaningful information about its contents. Be it distinguishing an object from its background, following a street on a map, or finding the bar code on a milk carton, tasks such as these often turn out to be much more difficult to accomplish than we would anticipate at first.

We expect technology to improve on what we as humans can do by ourselves. Be it as simple as a lever to lift more weight or binoculars to see farther, or as complex as an airplane to move us across continents, science has created so much that improves on, sometimes by unbelievable factors, what our biological systems are able to perform. So, it is perhaps humbling to discover that today’s technology is nowhere near as capable, when it comes to image analysis, as our own visual system. Although it is possible that this will always remain true, we should not be discouraged, but instead consider this a creative engineering challenge.

On the other hand, image processing technology has become a reliable and indispensable element in many everyday applications. As in every engineering discipline, sound knowledge of elementary concepts, careful design, and professional implementation are the essential keys to success.

W. Burger, M.J. Burge, Principles of Digital Image Processing, Undergraduate Topics in Computer Science, DOI 10.1007/978-1-84800-195-4_1, © Springer-Verlag London Limited, 2009

1.1 Programming with Images

Even though the term “image processing” is often used interchangeably with “image editing”, we introduce the following more precise definitions. Digital image editing, or, as it is sometimes called, digital imaging, is the manipulation of digital images using an existing software application such as Adobe Photoshop or Corel Paint. Digital image processing, on the other hand, is the conception, design, development, and enhancement of digital imaging programs.

Modern programming environments, with their extensive APIs (application programming interfaces), make practically every aspect of computing, be it networking, databases, graphics, sound, or imaging, easily available to nonspecialists. The possibility of developing a program that can reach into an image and manipulate the individual elements at its very core is fascinating and seductive. You will discover that, with the right knowledge, an image ultimately becomes no more than a simple array of values that, with the right tools, you can manipulate in any way imaginable.

Computer graphics, in contrast to digital image processing, concentrates on the synthesis of digital images from geometrical descriptions such as three-dimensional object models [22, 27, 77]. Although graphics professionals today tend to be interested in topics such as realism and, especially in terms of computer games, rendering speed, the field does draw on a number of methods that originate in image processing, such as image transformation (morphing), reconstruction of 3D models from image data, and specialized techniques such as image-based and nonphotorealistic rendering [57, 78]. Similarly, image processing makes use of a number of ideas that have their origin in computational geometry and computer graphics, such as volumetric (voxel) models in medical image processing. The two fields perhaps work closest when it comes to digital postproduction of film and video and the creation of special effects [79].

This book provides a thorough grounding in the effective processing of not only images but also sequences of images, that is, videos.
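The earlier remark that an image is ultimately a simple array of values can be illustrated with a minimal Java sketch. It assumes an 8-bit grayscale image held as a plain `int[][]` of rows with values in 0..255; the class name `ImageAsArraySketch` and the method `invert` are hypothetical names chosen for this illustration and are not part of the book's code or of ImageJ.

```java
import java.util.Arrays;

/** Hypothetical illustration (not from the book): an image as a plain array. */
class ImageAsArraySketch {

    /**
     * Returns the inverted copy of an 8-bit grayscale image.
     * Assumption: pixels[v][u] holds the value (0..255) at row v, column u.
     */
    static int[][] invert(int[][] pixels) {
        int[][] result = new int[pixels.length][];
        for (int v = 0; v < pixels.length; v++) {
            result[v] = new int[pixels[v].length];
            for (int u = 0; u < pixels[v].length; u++) {
                result[v][u] = 255 - pixels[v][u];   // dark becomes bright and vice versa
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] image = { {0, 64, 128}, {192, 255, 10} };
        // prints [[255, 191, 127], [63, 0, 245]]
        System.out.println(Arrays.deepToString(invert(image)));
    }
}
```

Returning a new array rather than modifying the input is only one possible design choice; it keeps the original pixel data available for comparison.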


1.2 Image Analysis Although image analysis is not the central theme of this book, most methods described here exhibit a certain “analytical ﬂavor” that adds to the elementary “pixel crunching” techniques described in the preceding volume [14]. This intersection becomes evident in tasks like segmenting image regions (Ch. 2), detecting simple curves and corners (Chs. 3–4), or comparing images (Ch. 11) at the pixel level. All these methods work directly on the pixel data in a bottomup way without recourse to any domain-speciﬁc or “semantic” knowledge. In some sense, one could describe all these methods as “dumb and blind”, which diﬀerentiates them from the approach pursued in pattern recognition and computer vision. Although these two disciplines are ﬁrmly grounded in, and rely heavily on, image processing, their ultimate goals are much loftier. Pattern recognition is primarily a mathematical discipline and has been responsible for techniques such as probabilistic modeling, clustering, decision trees, or principal component analysis (PCA), which are used to discover patterns in data and signals. Methods from pattern recognition have been applied extensively to problems arising in computer vision and image analysis. A good example of their successful application is optical character recognition (OCR), where robust, highly accurate turnkey solutions are available for recognizing scanned text. Pattern recognition methods are truly universal and have been successfully applied not only to images but also speech and audio signals, text documents, stock trades, and for ﬁnding trends in large databases, where it is often called “data mining”. Dimensionality reduction, statistical, and syntactical methods play important roles in pattern recognition (see, for example, [21, 55, 72]). Computer vision tackles the problem of engineering artiﬁcial visual systems capable of somehow comprehending and interpreting our real, threedimensional world. 
Popular topics in this ﬁeld include scene understanding, object recognition, motion interpretation (tracking), autonomous navigation, and the robotic manipulation of objects in a scene. Since computer vision has its roots in artiﬁcial intelligence (AI), many AI methods were originally developed to either tackle or represent a problem in computer vision (see, for example, [19, Ch. 13]). The ﬁelds still have much in common today, especially in terms of adaptive methods and machine learning. Further literature on computer vision includes [2, 24, 35, 65, 69, 73]. Ultimately, you will ﬁnd image processing to be both intellectually challenging and professionally rewarding, as the ﬁeld is ripe with problems that were originally thought to be relatively simple to solve but have, to this day, refused to give up their secrets. With the background and techniques presented in this text, you will not only be able to develop complete image processing solutions


but will also have the prerequisite knowledge to tackle unsolved problems and the real possibility of expanding the horizons of science.

2

Regions in Binary Images

In binary images, a pixel can take on exactly one of two values. These values are often thought of as representing the “foreground” and “background” in the image, even though these concepts often are not applicable to natural scenes. In this chapter we focus on connected regions in images and how to isolate and describe such structures.

Let us assume that our task is to devise a procedure for finding the number and type of objects contained in a figure like Fig. 2.1. As long as we continue to consider each pixel in isolation, we will not be able to determine how many objects there are overall in the image, where they are located, and which pixels belong to which objects. Therefore our first step is to find each object by grouping together all the pixels that belong to it. In the simplest case, an object is a group of touching foreground pixels; that is, a connected binary region.

Figure 2.1 Binary image with nine objects. Each object corresponds to a connected region of related foreground pixels.

W. Burger, M.J. Burge, Principles of Digital Image Processing, Undergraduate Topics in Computer Science, DOI 10.1007/978-1-84800-195-4_2, © Springer-Verlag London Limited, 2009

2.1 Finding Image Regions

In the search for binary regions, the most important tasks are to find out which pixels belong to which regions, how many regions are in the image, and where these regions are located. These steps usually take place as part of a process called region labeling or region coloring. During this process, neighboring pixels are pieced together in a stepwise manner to build regions, and all pixels within a region are assigned a unique number (“label”) for identification. In the following sections, we describe two variations on this idea. In the first method, region marking through flood filling, a region is filled in all directions starting from a single point or “seed” within the region. In the second method, sequential region marking, the image is traversed from top to bottom, marking regions as they are encountered. In Sec. 2.2.2, we describe a third method that combines two useful processes, region labeling and contour finding, in a single algorithm.

Independent of which of the methods above we use, we must first settle on either the 4- or 8-connected definition of neighboring (see Vol. 1 [14, Fig. 7.5]) for determining when two pixels are “connected” to each other, since under each definition we can end up with different results.

In the following region-marking algorithms, we use the following convention: the original binary image I(u, v) contains the values 0 and 1 to mark the background and foreground, respectively; any other value is used for numbering (labeling) the regions, i.e., the pixel values are

\[
I(u, v) =
\begin{cases}
0 & \text{a background pixel} \\
1 & \text{a foreground pixel} \\
2, 3, \ldots & \text{a region label.}
\end{cases}
\]
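The choice between the 4- and 8-connected neighborhood can be made concrete with a small Java sketch. The class name Neighborhoods, the offset tables N4 and N8, and the helper areNeighbors() are hypothetical names introduced only for this illustration; they do not appear in the book's code.

```java
/** Illustrative sketch (hypothetical names): pixel convention and neighborhood definitions. */
final class Neighborhoods {
    // Pixel value convention from the text: 0 = background, 1 = foreground, >= 2 = region label.
    static final int BACKGROUND = 0, FOREGROUND = 1;

    // Offsets (du, dv) of the 4-connected and the 8-connected neighborhood.
    static final int[][] N4 = {{1, 0}, {0, 1}, {-1, 0}, {0, -1}};
    static final int[][] N8 = {{1, 0}, {1, 1}, {0, 1}, {-1, 1}, {-1, 0}, {-1, -1}, {0, -1}, {1, -1}};

    /** True if (u2, v2) is a neighbor of (u1, v1) under the given offset table. */
    static boolean areNeighbors(int u1, int v1, int u2, int v2, int[][] offsets) {
        for (int[] d : offsets)
            if (u1 + d[0] == u2 && v1 + d[1] == v2) return true;
        return false;
    }

    public static void main(String[] args) {
        // Two diagonally touching pixels are connected only under the 8-neighborhood.
        System.out.println(areNeighbors(3, 3, 4, 4, N4)); // false
        System.out.println(areNeighbors(3, 3, 4, 4, N8)); // true
    }
}
```

Two foreground pixels that touch only at a corner therefore end up in one region under the 8-connected definition but in two separate regions under the 4-connected definition.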

2.1.1 Region Labeling with Flood Filling

The underlying algorithm for region marking by flood filling is simple: search for an unmarked foreground pixel and then fill (visit and mark) all the rest of the neighboring pixels in its region (Alg. 2.1). This operation is called a “flood fill” because it is as if a flood of water erupts at the start pixel and flows out across a flat region. There are various methods for carrying out the fill operation that


Algorithm 2.1 Region marking using flood filling (Part 1). The binary input image I uses the value 0 for background pixels and 1 for foreground pixels. Unmarked foreground pixels are searched for, and then the region to which they belong is filled. The actual FloodFill() procedure is described in Alg. 2.2.

1: RegionLabeling(I)
     I: binary image; I(u, v) = 0: background, I(u, v) = 1: foreground
     The image I is labeled (destructively modified) and returned.
2:   Let m ← 2                          ▷ value of the next label to be assigned
3:   for all image coordinates (u, v) do
4:     if I(u, v) = 1 then
5:       FloodFill(I, u, v, m)          ▷ use any version from Alg. 2.2
6:       m ← m + 1.
7:   return the labeled image I.

ultimately differ in how to select the coordinates of the next pixel to be visited during the fill. We present three different ways of performing the FloodFill() procedure: a recursive version, an iterative depth-first version, and an iterative breadth-first version (see Alg. 2.2):

(A) Recursive Flood Filling: The recursive version (Alg. 2.2, lines 1–8) does not make use of explicit data structures to keep track of the image coordinates but uses the local variables that are implicitly allocated by recursive procedure calls.¹ Within each region, a tree structure, rooted at the starting point, is defined by the neighborhood relation between pixels. The recursive step corresponds to a depth-first traversal [20] of this tree and results in very short and elegant program code. Unfortunately, the maximum depth of the recursion, and thus the size of the required stack memory, is proportional to the size of the region, so stack memory is quickly exhausted. Therefore this method is risky and really only practical for very small images.

(B) Iterative Flood Filling (depth-first): Every recursive algorithm can also be reformulated as an iterative algorithm (Alg. 2.2, lines 9–20) by implementing and managing its own stack. In this case, the stack records the “open” (that is, the adjacent but not yet visited) elements. As in the recursive version (A), the corresponding tree of pixels is traversed in depth-first order. By making use of its own dedicated stack (which is created in the much larger heap memory), the depth of the tree is no longer limited

¹ In Java, and similar imperative programming languages such as C and C++, local variables are automatically stored on the call stack at each procedure call and restored from the stack when the procedure returns.


Algorithm 2.2 Region marking using flood filling (Part 2). Three variations of the FloodFill() procedure: recursive, depth-first, and breadth-first.

 1: FloodFill(I, u, v, label)                          ▷ Recursive Version
 2:   if (u, v) is inside the image and I(u, v) = 1 then
 3:     Set I(u, v) ← label
 4:     FloodFill(I, u+1, v, label)
 5:     FloodFill(I, u, v+1, label)
 6:     FloodFill(I, u, v−1, label)
 7:     FloodFill(I, u−1, v, label)
 8:   return.

 9: FloodFill(I, u, v, label)                          ▷ Depth-First Version
10:   Create an empty stack S
11:   Put the seed coordinate (u, v) onto the stack: Push(S, (u, v))
12:   while S is not empty do
13:     Get the next coordinate from the top of the stack: (x, y) ← Pop(S)
14:     if (x, y) is inside the image and I(x, y) = 1 then
15:       Set I(x, y) ← label
16:       Push(S, (x+1, y))
17:       Push(S, (x, y+1))
18:       Push(S, (x, y−1))
19:       Push(S, (x−1, y))
20:   return.

21: FloodFill(I, u, v, label)                          ▷ Breadth-First Version
22:   Create an empty queue Q
23:   Insert the seed coordinate (u, v) into the queue: Enqueue(Q, (u, v))
24:   while Q is not empty do
25:     Get the next coordinate from the front of the queue: (x, y) ← Dequeue(Q)
26:     if (x, y) is inside the image and I(x, y) = 1 then
27:       Set I(x, y) ← label
28:       Enqueue(Q, (x+1, y))
29:       Enqueue(Q, (x, y+1))
30:       Enqueue(Q, (x, y−1))
31:       Enqueue(Q, (x−1, y))
32:   return.
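As a rough illustration, Alg. 2.1 and the two iterative versions of Alg. 2.2 can be sketched in self-contained Java. This is not the book's Program 2.1: it works on a plain int[][] array indexed as img[v][u] instead of an ImageJ image, and it uses a single java.util.ArrayDeque both as stack and as queue. The class name FloodFillLabeling and the flag breadthFirst are assumptions made for this sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Illustrative sketch (hypothetical names) of Alg. 2.1 with the iterative fills of Alg. 2.2. */
final class FloodFillLabeling {

    /** Alg. 2.1: labels all regions of img in place (values 0/1 on entry); returns the region count. */
    static int regionLabeling(int[][] img, boolean breadthFirst) {
        int m = 2;                                   // value of the next label to be assigned
        for (int v = 0; v < img.length; v++)
            for (int u = 0; u < img[v].length; u++)
                if (img[v][u] == 1) {                // unmarked foreground pixel found
                    floodFill(img, u, v, m, breadthFirst);
                    m++;
                }
        return m - 2;
    }

    /** Alg. 2.2, versions B (stack) and C (queue), using the 4-connected neighborhood. */
    static void floodFill(int[][] img, int u, int v, int label, boolean breadthFirst) {
        Deque<int[]> open = new ArrayDeque<>();      // "open" coordinates {x, y}
        open.addLast(new int[] {u, v});
        while (!open.isEmpty()) {
            // queue: take the oldest entry; stack: take the newest entry
            int[] p = breadthFirst ? open.removeFirst() : open.removeLast();
            int x = p[0], y = p[1];
            if (y >= 0 && y < img.length && x >= 0 && x < img[y].length && img[y][x] == 1) {
                img[y][x] = label;
                open.addLast(new int[] {x + 1, y});
                open.addLast(new int[] {x, y + 1});
                open.addLast(new int[] {x, y - 1});
                open.addLast(new int[] {x - 1, y});
            }
        }
    }
}
```

Both variants produce the same labeling; they differ only in the order in which the pixels of a region are visited.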


to the size of the call stack.

(C) Iterative Flood Filling (breadth-first): In this version, pixels are traversed in a way that resembles an expanding wave front propagating out from the starting point (Alg. 2.2, lines 21–32). The data structure used to hold the as yet unvisited pixel coordinates is in this case a queue instead of a stack, but otherwise it is identical to version B.

Java implementation

The recursive version (A) of the algorithm corresponds practically 1:1 to its Java implementation. However, a normal Java runtime environment does not support more than about 10,000 recursive calls of the FloodFill() procedure (Alg. 2.2, line 1) before the memory allocated for the call stack is exhausted. This is only sufficient for relatively small images with fewer than approximately 200 × 200 pixels.

Program 2.1 gives the complete Java implementation for both variants of the iterative FloodFill() procedure. In implementing the stack (S) in the iterative depth-first version (B), we use the stack data structure provided by the Java class Stack (Prog. 2.1, line 1), which serves as a container for generic Java objects. For the queue data structure (Q) in the breadth-first variant (C), we use the Java class LinkedList² with the methods addFirst(), removeLast(), and isEmpty() (Prog. 2.1, line 18). We have specified <Point> as a type parameter for both generic container classes so they can only contain objects of type Point.³

Figure 2.2 illustrates the progress of the region marking in both variants within an example region, where the start point (i.e., seed point), which would normally lie on a contour edge, has been placed arbitrarily within the region in order to better illustrate the process. It is clearly visible that the depth-first method first explores one direction (in this case horizontally to the left) completely (that is, until it reaches the edge of the region) and only then examines the remaining directions. In contrast, the markings of the breadth-first method proceed outward, layer by layer, equally in all directions.

Due to the way exploration takes place, the memory requirement of the breadth-first variant of the flood-fill procedure is generally much lower than that of the depth-first variant. For example, when flood filling the region in Fig. 2.2 (using the implementation given in Prog. 2.1), the stack in the depth-first variant [...]

² The class LinkedList is part of the Java Collections Framework (see also Vol. 1 [14, Appendix B.2]).
³ Generic types and templates (i.e., the ability to specify a parameterization for a container) have only been available since Java 5 (1.5).
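The difference in memory use between the two iterative variants can be observed with a small instrumented sketch. It records the largest number of open coordinates held at any time while filling a test region. The class name FloodFillMemory, the method maxOpen(), and the square test image are assumptions made for this illustration; the region of Fig. 2.2 is not reproduced here, and the actual numbers depend on the shape of the region and the position of the seed.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

/** Illustrative sketch (hypothetical names): peak container size of the two iterative flood fills. */
final class FloodFillMemory {

    /** Fills the 4-connected region at (u, v); returns the largest number of open coordinates held. */
    static int maxOpen(int[][] img, int u, int v, int label, boolean breadthFirst) {
        Deque<int[]> open = new ArrayDeque<>();
        open.addLast(new int[] {u, v});
        int max = 1;
        while (!open.isEmpty()) {
            int[] p = breadthFirst ? open.removeFirst() : open.removeLast();
            int x = p[0], y = p[1];
            if (y >= 0 && y < img.length && x >= 0 && x < img[y].length && img[y][x] == 1) {
                img[y][x] = label;
                open.addLast(new int[] {x + 1, y});
                open.addLast(new int[] {x, y + 1});
                open.addLast(new int[] {x, y - 1});
                open.addLast(new int[] {x - 1, y});
                max = Math.max(max, open.size());    // container only grows after a push
            }
        }
        return max;
    }

    /** An n x n image consisting of foreground pixels only (test data, not taken from the text). */
    static int[][] filledSquare(int n) {
        int[][] img = new int[n][n];
        for (int[] row : img) Arrays.fill(row, 1);
        return img;
    }

    public static void main(String[] args) {
        int n = 50;
        System.out.println("depth-first max:   " + maxOpen(filledSquare(n), 0, 0, 2, false));
        System.out.println("breadth-first max: " + maxOpen(filledSquare(n), 0, 0, 2, true));
    }
}
```

For this square region the depth-first stack keeps accumulating stale entries along its long path through the region, while the breadth-first queue only holds entries near the current wave front.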


Depth-first variant (using a stack):

 1  void floodFill(int x, int y, int label) {
 2    Stack<Point> s = new Stack<Point>(); // stack
 3    s.push(new Point(x,y));
 4    while (!s.isEmpty()){
 5      Point n = s.pop();
 6      int u = n.x;
 7      int v = n.y;
 8      if ((u>=0) && (u [...]

[...]

        [...] nk > 1 then
          set I(u, v) ← nk
        else if several neighbors of (u, v) have label values nj > 1 then
          Select one of them as the new label: I(u, v) ← k ∈ {nj}.
          for all other neighbors of (u, v) with label values ni > 1 and ni ≠ k do
            Create a new label collision: ci = ⟨ni, k⟩.
            Record the collision: C ← C ∪ {ci}
    Remark: The image I now contains label values 0, 2, ..., m − 1.
    return (m, C).

continued in Alg. 2.4


Algorithm 2.4 Sequential region labeling (Part 2).

 1: ResolveLabelCollisions(m, C)
      Resolves the label collisions contained in the set C. Returns R, a vector of sets that represents a partitioning of the complete label set into equivalent labels.
 2:   Let L = {2, 3, ..., m − 1} be the set of preliminary region labels.
 3:   Create a partitioning of L as a vector of sets, one set for each label value:
        R ← [R2, R3, ..., Rm−1] = [{2}, {3}, {4}, ..., {m − 1}], so Ri = {i} for all i ∈ L.
 4:   for all collisions ⟨a, b⟩ ∈ C do
 5:     Find in R the sets Ra, Rb:
          Ra ← the set that currently contains label a
          Rb ← the set that currently contains label b
 6:     if Ra ≠ Rb (a and b are contained in different sets) then
 7:       Merge sets Ra and Rb by moving all elements of Rb to Ra:
            Ra ← Ra ∪ Rb, Rb ← {}
      Remark: All equivalent label values (i.e., all labels of pixels in the same region) are now contained in the same set Ri within R.
 8:   return R.

 9: RelabelImage(I, R)
      Relabels the image I using the label partitioning in R. The image I is modified.
10:   for all image locations (u, v) do
11:     if I(u, v) > 1 then               ▷ I(u, v) contains a region label
12:       Find the set Ri in R that contains the label I(u, v)
13:       Choose one unique representative element k from the set Ri, e.g., the minimum value: k = min(Ri)
14:       Replace the image label: I(u, v) ← k
15:   return.

Label collisions. In the case where two or more neighbors have labels belonging to diﬀerent regions, then a label collision has occurred; that is, pixels within a single connected region have diﬀerent labels. For example, in a U-shaped region, the pixels in the left and right arms are at ﬁrst assigned diﬀerent labels


(a)
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 2 2 0 0 3 3 0 4 0
0 5 5 5 2 2 2 0 0 3 0 0 4 0
0 0 0 0 2 0 2 0 0 0 0 0 4 0
0 6 6 2 2 2 2 2 2 2 2 2 2 0
0 0 0 0 2 2 2 2 2 2 2 2 2 0
0 7 7 0 0 0 2 0 2 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0

(b) [graph with the nodes 2, 3, 4, 5, 6, 7]

Figure 2.4 Sequential region labeling: intermediate result after Step 1. Label collisions indicated by circles (a); the nodes of the undirected graph (b) correspond to the labels, and its edges correspond to the collisions.

since it is not immediately apparent that they are actually part of a single region. The two labels will propagate down independently from each other until they eventually collide in the lower part of the “U” (Fig. 2.3 (d)).

When two labels a, b collide, then we know that they are actually “equivalent”; i.e., they are contained in the same image region. These collisions are registered but otherwise not dealt with during the first step. Once all collisions have been registered, they are then resolved in the second step of the algorithm.

The number of collisions depends on the content of the image. There can be only a few or very many collisions, and the exact number is only known at the end of the first step, once the whole image has been traversed. For this reason, collision management must make use of dynamic data structures such as lists or hash tables.

Upon the completion of the first step, all the original foreground pixels have been provisionally marked, and all the collisions between labels within the same regions have been registered for subsequent processing. The example in Fig. 2.4 illustrates the state upon completion of Step 1: all foreground pixels have been assigned preliminary labels (Fig. 2.4 (a)), and the collisions (depicted by circles) between the labels ⟨2, 4⟩, ⟨2, 5⟩, and ⟨2, 6⟩ have been registered. The labels L = {2, 3, 4, 5, 6, 7} and collisions C = {⟨2, 4⟩, ⟨2, 5⟩, ⟨2, 6⟩} correspond to the nodes and edges of an undirected graph (Fig. 2.4 (b)).

Step 2: Resolving collisions

The task in the second step is to resolve the label collisions that arose in the first step in order to merge the corresponding “partial” regions. This process is nontrivial since it is possible for two regions with different labels to be connected transitively (e.g., ⟨a, b⟩ ∧ ⟨b, c⟩ ⇒ ⟨a, c⟩) through a third region or, more generally, through a series of regions. In fact, this problem is identical to the problem of finding the connected components of a graph [20], where the labels L determined in Step 1 constitute the “nodes” of the graph and the registered


collisions C make up its “edges” (Fig. 2.4 (b)).

Step 3: Relabeling the image

Once all the distinct labels within a single region have been collected, the labels of all the pixels in the region are updated so they carry the same label (for example, choosing the smallest label number in the region), as shown in Fig. 2.5.

0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 2 2 0 0 3 3 0 2 0
0 2 2 2 2 2 2 0 0 3 0 0 2 0
0 0 0 0 2 0 2 0 0 0 0 0 2 0
0 2 2 2 2 2 2 2 2 2 2 2 2 0
0 0 0 0 2 2 2 2 2 2 2 2 2 0
0 7 7 0 0 0 2 0 2 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0

Figure 2.5 Sequential region labeling: final result after Step 3. All equivalent labels have been replaced by the smallest label within that region.
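Steps 2 and 3 can be sketched compactly in Java. Note that the sketch below deliberately swaps in a different data structure: instead of the vector of sets R used in Alg. 2.4, it keeps a union-find (disjoint-set) array in which the root of each set is always its smallest label. The class name LabelCollisions, the method names, and the representation of a collision as an int pair {a, b} are assumptions made for this illustration.

```java
/** Illustrative sketch (hypothetical names): resolving label collisions with a union-find array. */
final class LabelCollisions {

    /**
     * Step 2. Labels are 2 .. m-1; each collision is a pair {a, b}.
     * Returns rep[], where rep[k] is the smallest label equivalent to k.
     */
    static int[] resolve(int m, int[][] collisions) {
        int[] parent = new int[m];
        for (int i = 0; i < m; i++) parent[i] = i;   // every label starts in its own set
        for (int[] c : collisions) {
            int ra = find(parent, c[0]), rb = find(parent, c[1]);
            if (ra != rb)                            // a and b are in different sets: merge them,
                parent[Math.max(ra, rb)] = Math.min(ra, rb);  // keeping the smaller root on top
        }
        int[] rep = new int[m];
        for (int i = 0; i < m; i++) rep[i] = find(parent, i);
        return rep;
    }

    /** Finds the root of the set containing i, shortening the path along the way. */
    static int find(int[] parent, int i) {
        while (parent[i] != i) {
            parent[i] = parent[parent[i]];           // path halving
            i = parent[i];
        }
        return i;
    }

    /** Step 3. Replaces every label > 1 in img by its representative. */
    static void relabel(int[][] img, int[] rep) {
        for (int[] row : img)
            for (int u = 0; u < row.length; u++)
                if (row[u] > 1) row[u] = rep[row[u]];
    }
}
```

With the collisions of Fig. 2.4, resolve(8, new int[][] {{2, 4}, {2, 5}, {2, 6}}) maps the labels 4, 5, and 6 to 2 and leaves 3 and 7 unchanged, which corresponds to the relabeled image in Fig. 2.5.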

2.1.3 Region Labeling: Summary

In this section, we described a selection of algorithms for finding and labeling connected regions in images. We discovered that the elegant idea of labeling individual regions using a simple recursive flood-filling method (Sec. 2.1.1) was not useful because of practical limitations on the depth of recursion and the high memory costs associated with it. We also saw that classical sequential region labeling (Sec. 2.1.2) is relatively complex and offers no real advantage over iterative implementations of the depth-first and breadth-first methods. In practice, the iterative breadth-first method is generally the best choice for large and complex images.

2.2 Region Contours

Once the regions in a binary image have been found, the next step is often to find the contours (that is, the outlines) of the regions. Like so many other tasks in image processing, at first glance this appears to be an easy one: simply follow along the edge of the region. We will see that, in actuality, describing this apparently simple process algorithmically requires careful thought, which has made contour finding one of the classic problems in image analysis.


Figure 2.6 Example of a complete region labeling. The pixels within each region have been colored according to the consecutive label values 2, 3, ..., 10 they were assigned. The corresponding region statistics are shown in the table below (total image size is 1212 × 836).

Label | Area (pixels) | Bounding Box (left, top, right, bottom) | Center (xc, yc)
    2 |         14978 | (887,  21, 1144, 399)                   | (1049.7, 242.8)
    3 |         36156 | ( 40,  37,  438, 419)                   | ( 261.9, 209.5)
    4 |         25904 | (464, 126,  841, 382)                   | ( 680.6, 240.6)
    5 |          2024 | (387, 281,  442, 341)                   | ( 414.2, 310.6)
    6 |          2293 | (244, 367,  342, 506)                   | ( 294.4, 439.0)
    7 |          4394 | (406, 400,  507, 512)                   | ( 454.1, 457.3)
    8 |         29777 | (510, 416,  883, 765)                   | ( 704.9, 583.9)
    9 |         20724 | (833, 497, 1168, 759)                   | (1016.0, 624.1)
   10 |         16566 | ( 82, 558,  411, 821)                   | ( 208.7, 661.6)
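Statistics of the kind listed for Fig. 2.6 can be collected in a single pass over a labeled image. The following Java sketch is an illustration under stated assumptions: the class RegionStats and its members are hypothetical names, the bounding box is taken to include its border coordinates, and the center is computed as the arithmetic mean of the pixel coordinates, which may differ in detail from how the values in the table were obtained.

```java
/** Illustrative sketch (hypothetical names): area, bounding box, and center of labeled regions. */
final class RegionStats {
    int area;                                        // number of pixels in the region
    int left = Integer.MAX_VALUE, top = Integer.MAX_VALUE;
    int right = Integer.MIN_VALUE, bottom = Integer.MIN_VALUE;
    long sumU, sumV;                                 // coordinate sums for the center

    void add(int u, int v) {
        area++;
        left = Math.min(left, u);
        right = Math.max(right, u);
        top = Math.min(top, v);
        bottom = Math.max(bottom, v);
        sumU += u;
        sumV += v;
    }

    double xc() { return (double) sumU / area; }     // assumed: mean of the u coordinates
    double yc() { return (double) sumV / area; }     // assumed: mean of the v coordinates

    /** Returns stats[k] for every label k in 2..maxLabel that occurs in labels[v][u]; other entries stay null. */
    static RegionStats[] measure(int[][] labels, int maxLabel) {
        RegionStats[] stats = new RegionStats[maxLabel + 1];
        for (int v = 0; v < labels.length; v++)
            for (int u = 0; u < labels[v].length; u++) {
                int k = labels[v][u];
                if (k < 2) continue;                 // skip background (0) and unlabeled foreground (1)
                if (stats[k] == null) stats[k] = new RegionStats();
                stats[k].add(u, v);
            }
        return stats;
    }
}
```

The method expects maxLabel to be at least as large as the largest label in the image, for example m − 1 after the labeling in Alg. 2.1.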

2.2.1 External and Internal Contours

As we discussed in Vol. 1 [14, Sec. 7.2.7], the pixels along the edge of a binary region (that is, its border) can be identified using simple morphological operations and difference images. It must be stressed, however, that this process only marks the pixels along the contour, which is useful, for instance, for display purposes. In this section, we will go one step further and develop an algorithm for obtaining an ordered sequence of border pixel coordinates for describing a region’s contour. Note that connected image regions contain exactly one outer contour, yet, due to holes, they can contain arbitrarily many inner contours. Within such


Figure 2.7 Binary image with outer and inner contours. The outer contour lies along the outside of the foreground region (dark). The inner contour surrounds the space within the region, which may contain further regions (holes), and so on.

holes, smaller regions may be found, which will again have their own outer contours, and in turn these regions may themselves contain further holes with even smaller regions, and so on in a recursive manner (Fig. 2.7).

An additional complication arises when regions are connected by parts that taper down to the width of a single pixel. In such cases, the contour can run through the same pixel more than once and from different directions (Fig. 2.8). Therefore, when tracing a contour from a start point xS, returning to the start point is not a sufficient condition for terminating the contour tracing process. Other factors, such as the current direction along which contour points are being traversed, must be taken into account.

One apparently simple way of determining a contour is to proceed in analogy to the two-stage process presented in the previous section (Sec. 2.1); that is, to first identify the connected regions in the image and second, for each region, proceed around it, starting from a pixel selected from its border. In the same way, an internal contour can be found by starting at a border pixel of a region’s hole. A wide range of algorithms based on first finding the regions and then following along their contours have been published, including [61], [57, pp. 142–148], and [65, p. 296]. However, while the idea of contour tracing is simple in essence, the implementation requires careful record-keeping and is complicated by special cases such as the single-pixel bridges described above.

As a modern alternative, we present the following combined algorithm that, in contrast to the classical methods above, combines contour finding and region labeling in a single process.


Figure 2.8 The path along a contour as an ordered sequence of pixel coordinates with a given start point xS. Individual pixels may occur (be visited) more than once within the path, and a region consisting of a single isolated pixel will also have a contour (bottom right).

2.2.2 Combining Region Labeling and Contour Finding

This method, based on [18], combines the concepts of sequential region labeling (Sec. 2.1.2) and traditional contour tracing into a single algorithm able to perform both tasks simultaneously during a single pass through the image. It identifies and labels regions and at the same time traces both their inner and outer contours. The algorithm does not require any complicated data structures and is very efficient when compared with other methods with similar capabilities.

The key steps of this method are described below and illustrated in Fig. 2.9:

1. As in the sequential region labeling (Alg. 2.3), the binary image I is traversed from the top left to the bottom right. Such a traversal ensures that all pixels in the image are eventually examined and assigned an appropriate label.

2. At a given position in the image, the following cases may occur:

   Case A: The transition from a background pixel to a previously unmarked foreground pixel (A in Fig. 2.9 (a)) means that this pixel lies on the outer edge of a new region. A new label is assigned and the associated outer contour is traversed and marked by calling the method TraceContour() (see Fig. 2.9 (a) and Alg. 2.5 (line 19)). Furthermore, all background pixels directly bordering the region are marked with the special label −1.

   Case B: The transition from a foreground pixel (B in Fig. 2.9 (b)) to an

Figure 2.9 Combined region labeling and contour following (after [18]). The image is traversed from the top left to the lower right a row at a time. In (a), the first point A on the outer edge of the region is found. Starting from point A, the pixels on the edge along the outer contour are visited and labeled until A is reached again. In (b), the first point B on an inner contour is found. The pixels along the inner contour are visited and labeled until arriving back at B (c). In (d), an already labeled point C on an inner contour is found. Its label is propagated along the image row within the region.

unmarked background pixel means that this pixel lies on an inner contour. Starting from B, the inner contour is traversed and its pixels are marked with labels from the surrounding region (Fig. 2.9 (c)). Also, all bordering background pixels are again assigned the special label value −1. Case C: When a foreground pixel does not lie on a contour, then the neighboring pixel to the left has already been labeled (Fig. 2.9 (d)) and this label is propagated to the current pixel.


In Algorithms 2.5 and 2.6, the entire procedure is presented again and explained precisely. The method CombinedContourLabeling() traverses the image line-by-line and calls the method TraceContour() whenever a new inner or outer contour must be traced. The labels of the image elements along the contour, as well as the neighboring foreground pixels, are stored in the “label map” L (a rectangular array of the same size as the image) by the method FindNextPoint() in Alg. 2.6.

2.2.3 Implementation

While the main idea of the algorithm can be sketched out in a few simple steps, the actual implementation requires attention to a number of details, so we have provided the complete Java source for an ImageJ plugin implementation in Appendix B (pp. 283–293). The implementation closely follows the description in Algs. 2.5 and 2.6 but illustrates several additional details:⁴

– The task is performed by methods of the class ContourTracer. First the image I (pixelArray) and the associated label map L (labelArray) are enlarged by padding one layer of elements around their borders. The new pixels are marked as background (0) in the image I. This simplifies contour following and eliminates the need to handle a number of special situations.

– As contours are found they are turned into objects of class Contour and collected in two separate lists: outerContours and innerContours. Every contour consists of an ordered sequence of coordinate points of the standard class Point (defined in java.awt). The Java container class ArrayList (templated on the type Point) is used as a dynamic data structure for storing the point sequences of the outer and inner contours.

– The method traceContour() (see p. 289) traverses an outer or inner contour, beginning from the starting point xS (xS, yS). It calls the method findNextPoint() to determine the next contour point xT (xT, yT) following xS:

  – In the case that no following point is found, then xS = xT and the region (contour) consists of a single isolated pixel. The method traceContour() is finished.

  – In the other case the remaining contour points are found by repeatedly calling findNextPoint(), and for every successive pair of points the current point xc (xC, yC) and the previous point xp (xP, yP) are recorded. Only when both points correspond to the original starting

⁴ In the following description the names in parentheses after the algorithmic symbols denote the corresponding identifiers used in the Java implementation.


Algorithm 2.5 Combined contour tracing and region labeling (Part 1). Given a binary image I, the method CombinedContourLabeling() returns a set of contours and an array containing region labels for all pixels in the image. When a new point on either an outer or inner contour is found, then an ordered list of the contour’s points is constructed by calling the method TraceContour() (line 19 and line 26). TraceContour() itself is described in Alg. 2.6.

 1: CombinedContourLabeling(I)
      I: binary image.
      Returns the sets of outer and inner contours and a label map.
 2:   Couter ← {}, Cinner ← {}                     ▷ create two empty sets of contours
 3:   Create a label map L of the same size as I and initialize:
 4:   for all image locations (u, v) do
 5:     L(u, v) ← 0                                ▷ label map L
 6:   R ← 0                                        ▷ region counter R

      Scan the image from left to right and top to bottom:
 7:   for v ← 0 ... N−1 do
 8:     l ← 0                                      ▷ set the current label l to “none”
 9:     for u ← 0 ... M−1 do
10:       if I(u, v) is a foreground pixel then
11:         if (l ≠ 0) then                        ▷ continue inside region
12:           L(u, v) ← l
13:         else
14:           l ← L(u, v)
15:           if (l = 0) then                      ▷ hit a new outer contour
16:             R ← R + 1
17:             l ← R
18:             xS ← (u, v)
19:             c ← TraceContour(xS, 0, l, I, L)
20:             Couter ← Couter ∪ {c}              ▷ collect outer contour
21:             L(u, v) ← l
22:       else                                     ▷ I(u, v) is a background pixel
23:         if (l ≠ 0) then
24:           if (L(u, v) = 0) then                ▷ hit new inner contour
25:             xS ← (u−1, v)
26:             c ← TraceContour(xS, 1, l, I, L)
27:             Cinner ← Cinner ∪ {c}              ▷ collect inner contour
28:           l ← 0
29:   return (Couter, Cinner, L).                  ▷ return the contour sets and label map

continued in Alg. 2.6


Algorithm 2.6 Combined contour finding and region labeling (Part 2, continued from Alg. 2.5). Starting from xS, the procedure TraceContour traces along the contour in the direction dS = 0 for outer contours or dS = 1 for inner contours. During this process, all contour points as well as neighboring background points are marked in the label array L. Given a point xc, TraceContour uses FindNextPoint() to determine the next point along the contour (line 10). The function Delta() returns the next coordinate in the sequence, taking into account the search direction d.

 1: TraceContour(xS, dS, l, I, L)
      xS: start position, dS: initial search direction (0 for outer, 1 for inner contours), l: label for this contour, I: original image, L: label map.
      Traces and returns the contour starting at xS.
 2:   (xT, dnext) ← FindNextPoint(xS, dS, I, L)
 3:   c ← [xT]                                     ▷ create a contour starting with xT
 4:   xp ← xS                                      ▷ previous position xp = (up, vp)
 5:   xc ← xT                                      ▷ current position xc = (uc, vc)
 6:   done ← (xS ≡ xT)                             ▷ isolated pixel?
 7:   while (¬done) do
 8:     L(uc, vc) ← l
 9:     dsearch ← (dnext + 6) mod 8
10:     (xn, dnext) ← FindNextPoint(xc, dsearch, I, L)
11:     xp ← xc
12:     xc ← xn
13:     done ← (xp ≡ xS ∧ xc ≡ xT)                 ▷ back at start point?
14:     if (¬done) then
15:       Append(c, xn)                            ▷ add point xn to contour c
16:   return c.                                    ▷ return this contour

17: FindNextPoint(xc, d, I, L)
      xc: start point, d: search direction, I: original image, L: label map.
18:   for i ← 0 ... 6 do                           ▷ search in 7 directions
19:     x′ ← xc + Delta(d)                         ▷ x′ = (u′, v′)
20:     if I(u′, v′) is a background pixel then
21:       L(u′, v′) ← −1                           ▷ mark background as visited (−1)
22:       d ← (d + 1) mod 8
23:     else                                       ▷ found a nonbackground pixel at x′
24:       return (x′, d)
25:   return (xc, d).                              ▷ found no next point, return start point

26: Delta(d) = (Δx, Δy), with

      d  |  0   1   2   3   4   5   6   7
      Δx |  1   1   0  −1  −1  −1   0   1
      Δy |  0   1   1   1   0  −1  −1  −1
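To make the interplay of Algs. 2.5 and 2.6 easier to follow, both can be sketched together in self-contained Java. This is not the ContourTracer class from Appendix B: it works on plain int[][] arrays indexed as img[v][u], stores contour points as int pairs {u, v}, and uses hypothetical names (ContourLabelingSketch, labelAt(), DU, DV). As in the implementation described in Sec. 2.2.3, the image is padded with one layer of background pixels.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch (hypothetical names) of Algs. 2.5 and 2.6 on plain arrays. */
final class ContourLabelingSketch {
    // Delta(d): d = 0 points east; directions advance clockwise (the v axis points down).
    static final int[] DU = {1, 1, 0, -1, -1, -1, 0, 1};
    static final int[] DV = {0, 1, 1, 1, 0, -1, -1, -1};

    final int[][] img;   // padded copy of the image, 0 = background
    final int[][] lab;   // padded label map L: 0 = unlabeled, -1 = visited background
    final List<List<int[]>> outer = new ArrayList<>();
    final List<List<int[]>> inner = new ArrayList<>();

    ContourLabelingSketch(int[][] image) {
        int h = image.length, w = image[0].length;
        img = new int[h + 2][w + 2];                 // one layer of background around the border
        lab = new int[h + 2][w + 2];
        for (int v = 0; v < h; v++)
            System.arraycopy(image[v], 0, img[v + 1], 1, w);
        label();
    }

    /** Region label of the original (unpadded) pixel (u, v). */
    int labelAt(int u, int v) { return lab[v + 1][u + 1]; }

    // Alg. 2.5: raster scan that carries the current label l along each row
    private void label() {
        int regions = 0;
        for (int v = 0; v < img.length; v++) {
            int l = 0;
            for (int u = 0; u < img[v].length; u++) {
                if (img[v][u] != 0) {
                    if (l != 0) {
                        lab[v][u] = l;               // continue inside region
                    } else {
                        l = lab[v][u];
                        if (l == 0) {                // hit a new outer contour
                            l = ++regions;
                            outer.add(traceContour(u, v, 0, l));
                            lab[v][u] = l;
                        }
                    }
                } else if (l != 0) {
                    if (lab[v][u] == 0)              // hit a new inner contour
                        inner.add(traceContour(u - 1, v, 1, l));
                    l = 0;
                }
            }
        }
    }

    // Alg. 2.6, TraceContour; points are stored in unpadded coordinates {u, v}
    private List<int[]> traceContour(int uS, int vS, int dS, int l) {
        int[] t = findNextPoint(uS, vS, dS);
        int uT = t[0], vT = t[1], dNext = t[2];
        List<int[]> c = new ArrayList<>();
        c.add(new int[] {uT - 1, vT - 1});
        int uC = uT, vC = vT;
        boolean done = (uS == uT && vS == vT);       // isolated pixel?
        while (!done) {
            lab[vC][uC] = l;
            int[] n = findNextPoint(uC, vC, (dNext + 6) % 8);
            int uP = uC, vP = vC;
            uC = n[0]; vC = n[1]; dNext = n[2];
            done = (uP == uS && vP == vS && uC == uT && vC == vT);  // back at start?
            if (!done) c.add(new int[] {uC - 1, vC - 1});
        }
        return c;
    }

    // Alg. 2.6, FindNextPoint: returns {u', v', d}; returns the given point if none is found
    private int[] findNextPoint(int u, int v, int d) {
        for (int i = 0; i < 7; i++) {                // search in 7 directions
            int un = u + DU[d], vn = v + DV[d];
            if (img[vn][un] == 0) {
                lab[vn][un] = -1;                    // mark background as visited
                d = (d + 1) % 8;
            } else {
                return new int[] {un, vn, d};
            }
        }
        return new int[] {u, v, d};
    }
}
```

For a 3 × 3 ring of foreground pixels with a single background pixel in the middle, the sketch yields one outer contour with 8 points and one inner contour with 4 points, since the inner contour only passes through the four pixels that border the hole horizontally or vertically.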

    import java.util.List;
    ...
    public class Trace_Contours implements PlugInFilter {
      public void run(ImageProcessor ip) {
        ContourTracer tracer = new ContourTracer(ip);
        // extract contours and regions
        List<Contour> outerContours = tracer.getOuterContours();
        List<Contour> innerContours = tracer.getInnerContours();
        List regions = tracer.getRegions();
        ...
      }
    }

Program 2.2 Example of using the class ContourTracer. See Appendix B.1 for a listing of the complete implementation.

points on the contour, xp = xS and xc = xT, we know that the contour has been completely traversed.

– The method findNextPoint() (see p. 290) determines which point on the contour follows the current point xc (xC, yC) by searching in the direction d (dir), depending upon the position of the previous contour point. Starting in the first search direction, up to seven neighboring pixels (all neighbors except the previous contour point) are searched in clockwise direction until the next contour point is found. At the same time, all background pixels in the label map L (labelArray) are marked with the value −1 to prevent them from being searched again. If no valid contour point is found among the seven possible neighbors, then findNextPoint() returns the original point xc (xC, yC).

In this implementation the core of the algorithm is contained in the class ContourTracer (pp. 287–292). Program 2.2 provides an example of its usage within the run() method of an ImageJ plugin. An interesting detail is the class ContourOverlay (pp. 292–293) that is used to display the resulting contours by a vector graphics overlay. In this way graphic structures that are smaller and thinner than image pixels can be visualized on top of ImageJ’s raster images at arbitrary magnification (zooming).

2.2.4 Example

This combined algorithm for region marking and contour following is particularly well suited for processing large binary images since it is efficient and has only modest memory requirements. Figure 2.10 shows a synthetic test image that illustrates a number of special situations, such as isolated pixels and thin sections, which the algorithm must deal with correctly when following the contours. In the resulting plot, outer contours are shown as black polygon lines running through the centers of the contour pixels, and inner contours are drawn white. Contours of single-pixel regions are marked by small circles filled with the corresponding color. Figure 2.11 shows the results for a larger section taken from a real image (Vol. 1 [14, Fig. 7.12]).

Figure 2.10 Combined contour and region marking: original image in gray (a), located contours (b) with black lines for outer and white lines for inner contours. Contours consisting of single isolated pixels (for example, in the upper right of (b)) are marked by a single circle in the appropriate color.

Figure 2.11 Example of a complex contour (in a section cut from Fig. 7.12 in Vol. 1 [14]). Outer contours are marked in black and inner contours in white.

2.3 Representing Image Regions

2.3.1 Matrix Representation

A natural representation for images is a matrix (that is, a two-dimensional array) in which elements represent the intensity or the color at a corresponding position in the image. This representation maps simply and elegantly onto the two-dimensional arrays of most programming languages, which makes it a very natural way to work with raster images. One possible disadvantage of this representation is that it does not depend on the content of the image. In other words, it makes no difference whether the image contains only a pair of lines or a complex scene, because the amount of memory required is constant and depends only on the dimensions of the image.

Regions in an image can be represented using a logical mask in which the area within the region is assigned the value true and the area outside it the value false (Fig. 2.12). Since Boolean values can be represented by a single bit, such a matrix is often referred to as a “bitmap”.⁵

⁵ In Java, values of the type boolean are handled internally by the Java virtual machine (JVM) as 32-bit ints, and the elements of a boolean array typically occupy one byte each. The language has no built-in packed bit-array type, so genuine bitmaps must be emulated, for example by packing bits into integers or by using java.util.BitSet.

Figure 2.12 Use of a binary mask to specify a region of an image: original image (a), logical (bit) mask (b), and masked image (c).

2.3.2 Run Length Encoding

In run length encoding (RLE), sequences of adjacent foreground pixels are represented compactly as “runs”. A run, or contiguous block, is a maximal-length sequence of adjacent pixels of the same type within either a row or a column. Runs of arbitrary length can be encoded compactly using three integers,

$$ \mathrm{Run}_i = \langle \mathit{row}_i, \mathit{column}_i, \mathit{length}_i \rangle, $$

two to represent the starting pixel (row, column) and a third for the length of the run, as illustrated in Fig. 2.13. When representing a sequence of runs within the same row, the number of the row is redundant and can be left out. Also, in some applications, it is more useful to record the coordinate of the end column instead of the length of the run.

    Bitmap                        RLE ⟨row, column, length⟩
         0 1 2 3 4 5 6 7 8
      0  . . . . . . . . .
      1  . . × × × × × × .        ⟨1, 2, 6⟩
      2  . . . . . . . . .
      3  . . . . × × × × .        ⟨3, 4, 4⟩
      4  . × × × . × × × .        ⟨4, 1, 3⟩, ⟨4, 5, 3⟩
      5  × × × × × × × × ×        ⟨5, 0, 9⟩
      6  . . . . . . . . .

Figure 2.13 Run length encoding in row direction. A run of pixels can be represented by its starting point (1, 2) and its length (6).

Since the RLE representation can be easily implemented and efficiently computed, it has long been used as a simple lossless compression method. It forms the foundation for fax transmission and can be found in a number of other important codecs, including TIFF, GIF, and JPEG. In addition, RLE provides precomputed information about the image that can be used directly when computing certain properties of the image (for example, statistical moments; see Sec. 2.4.3).
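The row-wise encoding described above can be sketched in a few lines of Java. This is only an illustration under simple assumptions: the region is given as a boolean mask indexed as mask[row][column], and the class and method names (RunLengthSketch, encode) are hypothetical and not part of the book's code.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch: row-wise run length encoding of a boolean mask. */
class RunLengthSketch {

    /** Returns one int[] {row, column, length} per maximal horizontal run of foreground pixels. */
    static List<int[]> encode(boolean[][] mask) {
        List<int[]> runs = new ArrayList<>();
        for (int row = 0; row < mask.length; row++) {
            int width = mask[row].length;
            int col = 0;
            while (col < width) {
                if (!mask[row][col]) {      // skip background pixels
                    col++;
                    continue;
                }
                int start = col;            // first pixel of a new run
                while (col < width && mask[row][col]) {
                    col++;                  // extend the run as far as possible
                }
                runs.add(new int[] {row, start, col - start});
            }
        }
        return runs;
    }
}
```

Applied to a mask with the layout of Fig. 2.13, encode() yields the runs row by row, in increasing column order within each row.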

2.3.3 Chain Codes

Regions can be represented not only using their interiors but also by their contours. Chain codes, which are often referred to as Freeman codes [25], are a classical method of contour encoding. In this encoding, the contour beginning at a given start point $x_S$ is represented by the sequence of directional changes it describes on the discrete image raster (Fig. 2.14).

Figure 2.14 Chain codes with 4- and 8-connected neighborhoods. To compute a chain code, begin traversing the contour from a given starting point $x_S$. Encode the relative position between adjacent contour points using the directional code for either 4-connected (left) or 8-connected (right) neighborhoods. The length of the resulting path, calculated as the sum of the individual segments, can be used to approximate the true length of the contour. In the example, the 4-chain code 3223222322303303…111 has length 28, and the 8-chain code 54544546767…222 has length $18 + 5\sqrt{2} \approx 25$.

Absolute chain code

For a closed contour of a region R, described by the sequence of points $c_R = [x_0, x_1, \ldots, x_{M-1}]$ with $x_i = (u_i, v_i)$, we create the elements of its chain code sequence $c'_R = [c'_0, c'_1, \ldots, c'_{M-1}]$ by

$$ c'_i = \mathrm{Code}(\Delta u_i, \Delta v_i), \quad\text{where}\quad (\Delta u_i, \Delta v_i) = \begin{cases} (u_{i+1} - u_i,\; v_{i+1} - v_i) & \text{for } 0 \le i < M-1, \\ (u_0 - u_i,\; v_0 - v_i) & \text{for } i = M-1, \end{cases} \tag{2.1} $$

and Code(Δu, Δv) being defined by the following table:⁶

    Δu              1    1    0   −1   −1   −1    0    1
    Δv              0    1    1    1    0   −1   −1   −1
    Code(Δu, Δv)    0    1    2    3    4    5    6    7

⁶ Assuming an 8-connected neighborhood.

Chain codes are compact since instead of storing the absolute coordinates for every point on the contour, only that of the starting point is recorded. Each remaining point is encoded relative to its predecessor by indicating in which of the eight possible directions the next point lies. Since only 3 bits are required to encode these eight directions, the values can be stored using a smaller numeric type.

Differential chain code

Directly comparing two regions represented using chain codes is difficult since the description depends on the selected starting point $x_S$, and, for instance, simply rotating the region by 90° results in a completely different chain code. When using a differential chain code, the situation improves slightly. Instead of encoding the difference in the position of the next contour point, the change in the direction along the discrete contour is encoded. A given absolute chain code $c'_R = [c'_0, c'_1, \ldots, c'_{M-1}]$ can be converted element by element to a differential chain code $c''_R = [c''_0, c''_1, \ldots, c''_{M-1}]$, with

$$ c''_i = \begin{cases} (c'_{i+1} - c'_i) \bmod 8 & \text{for } 0 \le i < M-1, \\ (c'_0 - c'_i) \bmod 8 & \text{for } i = M-1, \end{cases} \tag{2.2} $$

again under the assumption of an 8-connected neighborhood.⁷ The element $c''_i$ thus describes the change in direction (curvature) of the contour between two successive segments $c'_i$ and $c'_{i+1}$ of the original chain code $c'_R$. For the contour in Fig. 2.14 (b), the results are

    c′_R  = [5, 4, 5, 4, 4, 5, 4, 6, 7, 6, 7, … 2, 2, 2],
    c″_R = [7, 1, 7, 0, 1, 7, 2, 1, 7, 1, 1, … 0, 0, 3].

Given the starting point $x_S$ and the (absolute) initial direction $c'_0$, the original contour can be unambiguously reconstructed from the differential chain code.

⁷ See Vol. 1 [14, Appendix B.1.2] for implementing the mod operator used in Eqn. (2.2).

Shape numbers

While the differential chain code remains the same when a region is rotated by 90°, the encoding is still dependent on the selected starting point. If we want to determine the similarity of two contours of the same length M using their differential chain codes $c''_1$, $c''_2$, we must first ensure that the same start point was used when computing the codes. A method that is often used [2, 28] is to interpret the elements $c''_i$ in the differential chain code as the digits of a number to the base b (b = 8 for an 8-connected contour or b = 4 for a 4-connected contour) with the numeric value

$$ \mathrm{Val}(c''_R) = c''_0 \cdot b^0 + c''_1 \cdot b^1 + \ldots + c''_{M-1} \cdot b^{M-1} = \sum_{i=0}^{M-1} c''_i \cdot b^i. \tag{2.3} $$

Then the sequence $c''_R$ is shifted cyclically until the numeric value of the corresponding number reaches a maximum. We use the expression $c''_R \triangleright k$ to denote the sequence $c''_R$ being cyclically shifted by k positions to the right,⁸ such as (for k = 2)

    c″_R     = [0, 1, 3, 2, … 9, 3, 7, 4],
    c″_R ▷ 2 = [7, 4, 0, 1, 3, 2, … 9, 3],

and

$$ k_{\max} = \mathop{\mathrm{arg\,max}}_{0 \le k < M} \mathrm{Val}(c''_R \triangleright k) \tag{2.4} $$

denotes the shift that maximizes this value. The resulting code sequence or shape number,

$$ s_R = c''_R \triangleright k_{\max}, \tag{2.5} $$

is normalized with respect to the starting point and can thus be compared element by element with other normalized code sequences. Since the values of Val() are in general too large to be computed directly, in practice the relation $\mathrm{Val}(c''_1) > \mathrm{Val}(c''_2)$ is determined by comparing the lexicographic ordering between the sequences $c''_1$ and $c''_2$ so that the arithmetic values need not be computed at all.

⁸ $(c''_R \triangleright k)[i] = c''_R[(i - k) \bmod M]$.

Unfortunately, comparisons based on chain codes are generally not very useful for determining the similarity between regions, simply because rotations at arbitrary angles (that is, other than multiples of 90°) have too great an impact on a region’s code. In addition, chain codes are not capable of handling changes in size (scaling) or other distortions. Section 2.4 presents a number of tools that are more appropriate in these types of cases.
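The chain code conversions of this section can be sketched in Java as follows. This is a minimal illustration, not the book's implementation: the class ChainCodeSketch and its method names are hypothetical, contour points are assumed to be given as int pairs {u, v} forming a closed 8-connected contour, and the starting-point normalization compares the shifted sequences digit by digit (most significant digit last, as implied by the weights in Eqn. (2.3)) instead of evaluating Val().

```java
/** Illustrative sketch of absolute and differential 8-chain codes and start-point normalization. */
class ChainCodeSketch {

    // Code(Δu, Δv) for an 8-neighborhood, indexed as CODE[Δv + 1][Δu + 1];
    // -1 marks the invalid step (0, 0).
    private static final int[][] CODE = {
        {5, 6, 7},    // Δv = -1
        {4, -1, 0},   // Δv =  0
        {3, 2, 1}     // Δv = +1
    };

    /** Absolute chain code (Eqn. 2.1) of a closed contour given as points {u, v}. */
    static int[] absolute(int[][] pts) {
        int m = pts.length;
        int[] c = new int[m];
        for (int i = 0; i < m; i++) {
            int[] p = pts[i];
            int[] q = pts[(i + 1) % m];   // the last point connects back to the first
            c[i] = CODE[q[1] - p[1] + 1][q[0] - p[0] + 1];
        }
        return c;
    }

    /** Differential chain code (Eqn. 2.2) of an absolute 8-chain code. */
    static int[] differential(int[] c) {
        int m = c.length;
        int[] d = new int[m];
        for (int i = 0; i < m; i++) {
            d[i] = Math.floorMod(c[(i + 1) % m] - c[i], 8);
        }
        return d;
    }

    /** Cyclic shift to the right by k positions: result[i] = s[(i - k) mod M]. */
    static int[] shiftRight(int[] s, int k) {
        int m = s.length;
        int[] r = new int[m];
        for (int i = 0; i < m; i++) {
            r[i] = s[Math.floorMod(i - k, m)];
        }
        return r;
    }

    /** Returns the cyclic shift of d with maximal Val(), without computing Val() itself. */
    static int[] normalize(int[] d) {
        int[] best = d.clone();
        for (int k = 1; k < d.length; k++) {
            int[] candidate = shiftRight(d, k);
            if (greater(candidate, best)) {
                best = candidate;
            }
        }
        return best;
    }

    // True if Val(a) > Val(b); element i carries weight b^i, so the last element is most significant.
    private static boolean greater(int[] a, int[] b) {
        for (int i = a.length - 1; i >= 0; i--) {
            if (a[i] != b[i]) {
                return a[i] > b[i];
            }
        }
        return false;
    }
}
```

Trying all M shifts with an O(M) comparison each costs O(M²) time, which is acceptable for a sketch; two contours of equal length can then be compared by applying normalize() to both differential codes and testing the results for equality.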

Fourier descriptors

An elegant approach to describing contours is the use of so-called Fourier descriptors, which interpret the two-dimensional contour $c_R = [x_0, x_1, \ldots, x_{M-1}]$ with $x_i = (u_i, v_i)$ as a sequence of values $[z_0, z_1, \ldots, z_{M-1}]$ in the complex plane, where

$$ z_i = (u_i + \mathrm{i} \cdot v_i) \in \mathbb{C}. \tag{2.6} $$

From this sequence, one obtains (using a suitable method of interpolation in the case of an 8-connected contour) a discrete, one-dimensional periodic function $f(s) \in \mathbb{C}$ with a constant sampling interval over s, the path length around the contour. The coefficients of the one-dimensional Fourier spectrum (see Sec. 7.3) of this function f(s) provide a shape description of the contour in frequency space, where the lower spectral coefficients deliver a gross description of the shape. The details of this classical method can be found, for example, in [28, 30, 46, 47, 69].
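The core idea, reading the contour points as complex numbers and taking their discrete Fourier transform, can be illustrated with a short Java sketch. Several simplifying assumptions apply: the contour points are used exactly as given (the resampling to a constant path-length interval mentioned above is omitted), the transform is a naive, unnormalized O(M²) DFT, and the class and method names are hypothetical.

```java
/** Illustrative sketch: naive DFT of a contour interpreted as complex values z = u + i*v. */
class FourierDescriptorSketch {

    /**
     * Returns {re[], im[]}, the real and imaginary parts of the unnormalized DFT
     * F[k] = sum_j z_j * exp(-i * 2*pi * k * j / M) of the contour points {u, v}.
     */
    static double[][] dft(int[][] contour) {
        int m = contour.length;
        double[] re = new double[m];
        double[] im = new double[m];
        for (int k = 0; k < m; k++) {
            for (int j = 0; j < m; j++) {
                double phi = -2.0 * Math.PI * k * j / m;
                double c = Math.cos(phi);
                double s = Math.sin(phi);
                double u = contour[j][0];
                double v = contour[j][1];
                re[k] += u * c - v * s;   // real part of (u + i*v) * (c + i*s)
                im[k] += u * s + v * c;   // imaginary part of the same product
            }
        }
        return new double[][] {re, im};
    }
}
```

In this formulation the coefficient F[0] is the sum of all contour points, so F[0]/M is their mean position; coefficients with small |k| capture the coarse outline of the shape.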

2.4 Properties of Binary Regions

Imagine that you have to describe the contents of a digital image to another person over the telephone. One possibility would be to call out the value of each pixel in some agreed-upon order. A much simpler way, of course, would be to describe the image on the basis of its properties (for example, “a red rectangle on a blue background”) or at an even higher level such as “a sunset at the beach with two dogs playing in the sand”. While using such a description is simple and natural for us, it is not (yet) possible for a computer to generate these types of descriptions without human intervention. For computers, it is of course simpler to calculate the mathematical properties of an image or region and to use these as the basis for further classification. Using features to classify, be they images or other items, is a fundamental part of the field of pattern recognition, a research area with many applications in image processing and computer vision [21, 55, 72].

2.4.1 Shape Features

The comparison and classification of binary regions is widely used, for example, in optical character recognition (OCR) and for automating processes ranging from blood cell counting to quality control inspection of manufactured products on assembly lines. The analysis of binary regions turns out to be one of the simpler tasks for which many efficient algorithms have been developed and used to implement reliable applications that are in use every day.

By a feature of a region, we mean a specific numerical or qualitative measure that is computable from the values and coordinates of the pixels that make up the region. As an example, one of the simplest features is its size or area, that is, the number of pixels that make up a region. In order to describe a region in a compact form, different features are often combined into a feature vector. This vector is then used as a sort of “signature” for the region that can be used for classification or comparison with other regions. The best features are those that are simple to calculate and robust, that is, not easily influenced by irrelevant changes, particularly translation, rotation, and scaling.

2.4.2 Geometric Features

A region R of a binary image can be interpreted as a two-dimensional distribution of foreground points $x_i = (u_i, v_i)$ on the discrete plane $\mathbb{Z}^2$,

$$ R = \{x_0, x_1, \ldots, x_{N-1}\} = \{(u_0, v_0), (u_1, v_1), \ldots, (u_{N-1}, v_{N-1})\}. $$

Most geometric properties are defined in such a way that a region is considered to be a set of pixels that, in contrast to the definition in Sec. 2.1, does not necessarily have to be connected.

Perimeter

The perimeter (or circumference) of a region R is defined as the length of its outer contour, where R must be connected. As illustrated in Fig. 2.14, the type of neighborhood relation must be taken into account for this calculation. When using a 4-neighborhood, the measured length of the contour (except when that length is 1) will be larger than its actual length. In the case of 8-neighborhoods, a good approximation is reached by weighting the horizontal and vertical segments with 1 and diagonal segments with $\sqrt{2}$. Given an 8-connected chain code $c'_R = [c'_0, c'_1, \ldots, c'_{M-1}]$, the perimeter of the region is arrived at by

$$ \mathrm{Perimeter}(R) = \sum_{i=0}^{M-1} \mathrm{length}(c'_i), \quad\text{with}\quad \mathrm{length}(c) = \begin{cases} 1 & \text{for } c = 0, 2, 4, 6, \\ \sqrt{2} & \text{for } c = 1, 3, 5, 7. \end{cases} \tag{2.7} $$

However, with this conventional method of calculation, the real perimeter P(R) is systematically overestimated. As a simple remedy, an empirical correction factor of 0.95 works satisfactorily even for relatively small regions:

$$ P(R) \approx \mathrm{Perimeter}_{\mathrm{corr}}(R) = 0.95 \cdot \mathrm{Perimeter}(R). \tag{2.8} $$
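Equations (2.7) and (2.8) translate almost literally into code. The following Java sketch assumes that the contour is already available as an 8-connected chain code with values 0 to 7; the class and method names are hypothetical.

```java
/** Illustrative sketch: perimeter estimate from an 8-connected chain code (Eqns. 2.7 and 2.8). */
class PerimeterSketch {

    /** Sums 1 for horizontal/vertical steps (even codes) and sqrt(2) for diagonal steps (odd codes). */
    static double perimeter(int[] chainCode) {
        double sum = 0.0;
        for (int c : chainCode) {
            sum += (c % 2 == 0) ? 1.0 : Math.sqrt(2.0);
        }
        return sum;
    }

    /** Applies the empirical correction factor 0.95 of Eqn. (2.8). */
    static double correctedPerimeter(int[] chainCode) {
        return 0.95 * perimeter(chainCode);
    }
}
```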

Area

The area of a binary region R can be found by simply counting the image pixels that make up the region,

$$ A(R) = |R| = N. \tag{2.9} $$

The area of a connected region without holes can also be approximated from its closed contour, defined by M coordinate points $(x_0, x_1, \ldots, x_{M-1})$, where $x_i = (u_i, v_i)$, using the Gaussian area formula for polygons:

$$ A(R) \approx \frac{1}{2} \cdot \left| \sum_{i=0}^{M-1} \left( u_i \cdot v_{(i+1) \bmod M} - u_{(i+1) \bmod M} \cdot v_i \right) \right|. \tag{2.10} $$

When the contour is already encoded as a chain code $c'_R = [c'_0, c'_1, \ldots, c'_{M-1}]$, then the region’s area can be computed using Eqn. (2.10) by expanding $c'_R$ into a sequence of contour points, using an arbitrary starting point (e.g., (0, 0)).

While simple region properties such as area and perimeter are not influenced (except for quantization errors) by translation and rotation of the region, they are definitely affected by changes in size; for example, when the object to which the region corresponds is imaged from different distances. However, as described below, it is possible to specify combined features that are invariant to translation, rotation, and scaling as well.

Compactness and roundness

Compactness is understood as the relation between a region’s area and its perimeter. We can use the fact that a region’s perimeter P increases linearly with the enlargement factor while the area A increases quadratically to see that, for a particular shape, the ratio $A/P^2$ should be the same at any scale. This ratio can thus be used as a feature that is invariant under translation, rotation, and scaling. When applied to a circular region of any diameter, this ratio has a value of $\frac{1}{4\pi}$, so by normalizing it against a filled circle, we create a feature that is sensitive to the roundness or circularity of a region,

$$ \mathrm{Circularity}(R) = 4\pi \cdot \frac{A(R)}{P^2(R)}, \tag{2.11} $$

which results in a maximum value of 1 for a perfectly round region R and a value in the range [0, 1) for all other shapes (Fig. 2.15). If an absolute value for a region’s roundness is required, the corrected perimeter estimate (Eqn. (2.8)) should be employed:

$$ \mathrm{Circularity}(R) \approx 4\pi \cdot \frac{A(R)}{\mathrm{Perimeter}^2_{\mathrm{corr}}(R)}. \tag{2.12} $$

Figure 2.15 shows the circularity values of different regions as computed with the formulation in Eqn. (2.12).
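As a small illustration of the polygon area formula (Eqn. (2.10)) and the circularity measure (Eqn. (2.11)), consider the following Java sketch. It assumes that contour points are given as int pairs {u, v} in traversal order; the absolute value makes the result independent of the traversal direction. The names AreaSketch, polygonArea, and circularity are hypothetical.

```java
/** Illustrative sketch: polygon area from a closed contour and the circularity feature. */
class AreaSketch {

    /** Gaussian area formula for a closed polygon with vertices {u, v}. */
    static double polygonArea(int[][] pts) {
        int m = pts.length;
        long sum = 0;
        for (int i = 0; i < m; i++) {
            int[] p = pts[i];
            int[] q = pts[(i + 1) % m];   // wraps around to close the contour
            sum += (long) p[0] * q[1] - (long) q[0] * p[1];
        }
        return 0.5 * Math.abs(sum);
    }

    /** Circularity 4*pi*A / P^2 for a given area and perimeter estimate. */
    static double circularity(double area, double perimeter) {
        return 4.0 * Math.PI * area / (perimeter * perimeter);
    }
}
```

Note that a polygon running through the pixel centers encloses slightly less area than the pixel count of Eqn. (2.9), which is why the contour-based value is only an approximation.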

Figure 2.15 Circularity values for different shapes. Shown are the corresponding estimates for Circularity(R) as defined in Eqn. (2.12); the three example shapes yield the values 1.001, 0.672, and 0.086.

Bounding box

The bounding box of a region R is the minimal axis-parallel rectangle that encloses all points of R,

$$ \mathrm{BoundingBox}(R) = \langle u_{\min}, u_{\max}, v_{\min}, v_{\max} \rangle, \tag{2.13} $$

where $u_{\min}, u_{\max}$ and $v_{\min}, v_{\max}$ are the minimal and maximal coordinate values of all points $(u_i, v_i) \in R$ in the x and y directions, respectively (Fig. 2.16 (a)).

Convex hull

The convex hull is the smallest convex polygon that contains all points of the region R. A physical analogy is a board in which nails stick out in correspondence to each of the points in the region. If you were to place an elastic band around all the nails, then, when you release it, it will contract into a convex hull around the nails (Fig. 2.16 (b)). The convex hull can be computed for N contour points in time O(N log V), where V is the number of vertices in the polygon of the resulting convex hull [3].⁹

The convex hull is useful, for example, for determining the convexity or the density of a region. The convexity is defined as the relationship between the length of the convex hull and the original perimeter of the region. Density is then defined as the ratio between the area of the region and the area of its convex hull. The diameter, on the other hand, is the maximal distance between any two nodes on the convex hull.

⁹ For O() complexity notation, see Vol. 1 [14, Appendix A.3].

Figure 2.16 Example bounding box (a) and convex hull (b) of a binary image region.

2.4.3 Statistical Shape Properties

When computing statistical shape properties, we consider a region R to be a collection of coordinate points distributed within a two-dimensional space. Since statistical properties can be computed for point distributions that do not form a connected region, they can be applied before segmentation. An important concept in this context is that of the central moments of the region’s point distribution, which measure characteristic properties with respect to its midpoint or centroid.

Centroid

The centroid or center of gravity of a connected region can be easily visualized. Imagine drawing the region on a piece of cardboard or tin and then cutting it out and attempting to balance it on the tip of your finger. The location on the region where you must place your finger in order for the region to balance is the centroid of the region.¹⁰

The centroid $\bar{\mathbf{x}} = (\bar{x}, \bar{y})$ of a binary (not necessarily connected) region is the arithmetic mean of the coordinates in the x and y directions,

$$ \bar{x} = \frac{1}{|R|} \cdot \sum_{(u,v) \in R} u \quad\text{and}\quad \bar{y} = \frac{1}{|R|} \cdot \sum_{(u,v) \in R} v. \tag{2.14} $$

¹⁰ Assuming you did not imagine a region where the centroid lies outside of the region or within a hole in the region, which is of course possible.

Moments

The formulation of the region’s centroid in Eqn. (2.14) is only a special case of the more general statistical concept of a moment. Specifically, the expression

$$ m_{pq} = \sum_{(u,v) \in R} I(u, v) \cdot u^p v^q \tag{2.15} $$

describes the (ordinary) moment of the order p, q for a discrete (image) function $I(u, v) \in \mathbb{R}$; for example, a grayscale image. All the following definitions are also generally applicable to regions in grayscale images. The moments of connected binary regions can also be computed directly from the coordinates of the contour points [64, p. 148].

In the special case of a binary image $I(u, v) \in \{0, 1\}$, only the foreground pixels with I(u, v) = 1 in the region R need to be considered, and therefore Eqn. (2.15) can be simplified to

$$ m_{pq} = \sum_{(u,v) \in R} u^p v^q. \tag{2.16} $$

In this way, the area of a binary region can be expressed as its zero-order moment,

$$ A(R) = |R| = \sum_{(u,v) \in R} 1 = \sum_{(u,v) \in R} u^0 v^0 = m_{00}(R), \tag{2.17} $$

and similarly the centroid $\bar{\mathbf{x}}$ (Eqn. (2.14)) as

$$ \bar{x} = \frac{1}{|R|} \cdot \sum_{(u,v) \in R} u^1 v^0 = \frac{m_{10}(R)}{m_{00}(R)}, \tag{2.18} $$

$$ \bar{y} = \frac{1}{|R|} \cdot \sum_{(u,v) \in R} u^0 v^1 = \frac{m_{01}(R)}{m_{00}(R)}. \tag{2.19} $$

These moments thus represent concrete physical properties of a region. Specifically, the area $m_{00}$ is in practice an important basis for characterizing regions, and the centroid $(\bar{x}, \bar{y})$ permits the reliable and (within a fraction of a pixel) exact specification of a region’s position.

Central moments

To compute position-independent (translation-invariant) region features, the region’s centroid, which can be determined precisely in any situation, can be used as a reference point. In other words, we can shift the origin of the coordinate system to the region’s centroid $\bar{\mathbf{x}} = (\bar{x}, \bar{y})$ to obtain the central moments of order p, q:

$$ \mu_{pq}(R) = \sum_{(u,v) \in R} I(u, v) \cdot (u - \bar{x})^p \cdot (v - \bar{y})^q. \tag{2.20} $$

For a binary image (with I(u, v) = 1 within the region R), Eqn. (2.20) can be simplified to

$$ \mu_{pq}(R) = \sum_{(u,v) \in R} (u - \bar{x})^p \cdot (v - \bar{y})^q. \tag{2.21} $$

Normalized central moments

Central moment values of course depend on the absolute size of the region since the value depends directly on the distance of all region points to its centroid. So, if a 2D shape is scaled uniformly by some factor $s \in \mathbb{R}$, its central moments multiply by the factor

$$ s^{(p+q+2)}. \tag{2.22} $$

Thus size-invariant “normalized” moments are obtained by scaling with the reciprocal of the area $\mu_{00} = m_{00}$ raised to the required power in the form

$$ \bar{\mu}_{pq}(R) = \mu_{pq}(R) \cdot \frac{1}{\mu_{00}(R)^{(p+q+2)/2}} \tag{2.23} $$

for (p + q) ≥ 2 [46, p. 529]. Program 2.3 gives a direct (brute force) Java implementation for computing the ordinary, central, and normalized central moments for binary images (BACKGROUND = 0). This implementation is only meant to clarify the computation, and naturally much more efficient implementations are possible (see, for example, [48]).

import ij.process.ImageProcessor;

public class Moments {
  static final int BACKGROUND = 0;

  static double moment(ImageProcessor ip, int p, int q) {
    double Mpq = 0.0;
    for (int v = 0; v < ip.getHeight(); v++) {
      for (int u = 0; u < ip.getWidth(); u++) {
        if (ip.getPixel(u, v) != BACKGROUND) {
          Mpq += Math.pow(u, p) * Math.pow(v, q);
        }
      }
    }
    return Mpq;
  }

  static double centralMoment(ImageProcessor ip, int p, int q) {
    double m00 = moment(ip, 0, 0); // region area
    double xCtr = moment(ip, 1, 0) / m00;
    double yCtr = moment(ip, 0, 1) / m00;
    double cMpq = 0.0;
    for (int v = 0; v < ip.getHeight(); v++) {
      for (int u = 0; u < ip.getWidth(); u++) {
        if (ip.getPixel(u, v) != BACKGROUND) {
          cMpq += Math.pow(u - xCtr, p) * Math.pow(v - yCtr, q);
        }
      }
    }
    return cMpq;
  }

  static double normalCentralMoment(ImageProcessor ip, int p, int q) {
    double m00 = moment(ip, 0, 0);
    double norm = Math.pow(m00, (double) (p + q + 2) / 2);
    return centralMoment(ip, p, q) / norm;
  }
} // end of class Moments

Program 2.3 Example of directly computing moments in Java. The methods moment(), centralMoment(), and normalCentralMoment() compute for a binary image the moments $m_{pq}$, $\mu_{pq}$, and $\bar{\mu}_{pq}$ (Eqns. (2.16), (2.21), and (2.23)).

2.4.4 Moment-Based Geometrical Properties

While normalized moments can be directly applied for classifying regions, further interesting and geometrically relevant features can be elegantly derived from moments.

Orientation

Orientation describes the direction of the major axis, that is, the axis that runs through the centroid and along the widest part of the region (Fig. 2.18 (a)). Since rotating the region around the major axis requires less effort (smaller moment of inertia) than spinning it around any other axis, it is sometimes referred to as the major axis of rotation. As an example, when you hold a pencil between your hands and twist it around its major axis (that is, around the lead), the pencil exhibits the least mass inertia (Fig. 2.17).

Figure 2.17 Major axis of a region. Rotating an elongated region R, interpreted as a physical body, around its major axis requires less effort (least moment of inertia) than rotating it around any other axis.

As long as a region exhibits an orientation at all ($\mu_{20}(R) \neq \mu_{02}(R)$), the direction $\theta_R$ of the major axis can be found directly from the central moments $\mu_{pq}$ as

$$ \tan(2\theta_R) = \frac{2 \cdot \mu_{11}(R)}{\mu_{20}(R) - \mu_{02}(R)} \tag{2.24} $$

and therefore

$$ \theta_R = \frac{1}{2} \tan^{-1}\!\left( \frac{2 \cdot \mu_{11}(R)}{\mu_{20}(R) - \mu_{02}(R)} \right) \tag{2.25} $$

$$ \phantom{\theta_R} = \frac{\mathrm{Arctan}\bigl( 2 \cdot \mu_{11}(R),\; \mu_{20}(R) - \mu_{02}(R) \bigr)}{2}. \tag{2.26} $$

The resulting angle $\theta_R$ is in the range $[-\frac{\pi}{2}, \frac{\pi}{2}]$.¹¹ Orientation measurements based on region moments are very accurate in general.

¹¹ See Appendix A.1 for the computation of angles with the Arctan() (inverse tangent) function and Vol. 1 [14, Appendix B.1.6] for the corresponding Java method Math.atan2().

Computing orientation vectors. When visualizing region properties, a frequent task is to plot the region’s orientation as a line or arrow, usually anchored at the center of gravity $\bar{\mathbf{x}} = (\bar{x}, \bar{y})$; for example, by a parametric line of the form

$$ \mathbf{x} = \bar{\mathbf{x}} + \lambda \cdot \mathbf{x}_d = \begin{pmatrix} \bar{x} \\ \bar{y} \end{pmatrix} + \lambda \cdot \begin{pmatrix} \cos(\theta_R) \\ \sin(\theta_R) \end{pmatrix}, \tag{2.27} $$

for some length λ > 0. To find the unit orientation vector $\mathbf{x}_d = (\cos\theta, \sin\theta)^T$, we could first compute the inverse tangent to get 2θ (Eqn. (2.25)) and then compute the cosine and sine of θ. However, the vector $\mathbf{x}_d$ can also be obtained without using trigonometric functions as follows. Rewriting Eqn. (2.24) as

$$ \tan(2\theta_R) = \frac{2 \cdot \mu_{11}(R)}{\mu_{20}(R) - \mu_{02}(R)} = \frac{A}{B} = \frac{\sin(2\theta_R)}{\cos(2\theta_R)}, \tag{2.28} $$

we get (by Pythagoras’ theorem)

$$ \sin(2\theta_R) = \frac{A}{\sqrt{A^2 + B^2}} \quad\text{and}\quad \cos(2\theta_R) = \frac{B}{\sqrt{A^2 + B^2}}, $$

where $A = 2\mu_{11}(R)$ and $B = \mu_{20}(R) - \mu_{02}(R)$. Using the relations $\cos^2\alpha = \frac{1}{2}[1 + \cos(2\alpha)]$ and $\sin^2\alpha = \frac{1}{2}[1 - \cos(2\alpha)]$, we can compute the region’s orientation vector $\mathbf{x}_d = (x_d, y_d)^T$ as

$$ x_d = \cos(\theta_R) = \begin{cases} 0 & \text{for } A = B = 0, \\ \left[ \frac{1}{2} \left( 1 + \frac{B}{\sqrt{A^2 + B^2}} \right) \right]^{\frac{1}{2}} & \text{otherwise,} \end{cases} \tag{2.29} $$

$$ y_d = \sin(\theta_R) = \begin{cases} 0 & \text{for } A = B = 0, \\ \left[ \frac{1}{2} \left( 1 - \frac{B}{\sqrt{A^2 + B^2}} \right) \right]^{\frac{1}{2}} & \text{for } A \ge 0, \\ -\left[ \frac{1}{2} \left( 1 - \frac{B}{\sqrt{A^2 + B^2}} \right) \right]^{\frac{1}{2}} & \text{for } A < 0, \end{cases} \tag{2.30} $$

straight from the central region moments $\mu_{11}(R)$, $\mu_{20}(R)$, and $\mu_{02}(R)$, as defined in Eqn. (2.28). The horizontal component ($x_d$) in Eqn. (2.29) is never negative, while the case clause in Eqn. (2.30) corrects the sign of the vertical component ($y_d$) to map to the same angular range $[-\frac{\pi}{2}, +\frac{\pi}{2}]$ as Eqn. (2.25). The resulting vector $\mathbf{x}_d$ is normalized (i.e., $\|(x_d, y_d)\| = 1$) and could be scaled arbitrarily for display purposes by a suitable length λ, for example, using the region’s eccentricity value described below.

Figure 2.18 Region orientation and eccentricity. The major axis of the region extends through its center of gravity $\bar{\mathbf{x}}$ at the orientation θ. Note that angles are in the range $[-\frac{\pi}{2}, +\frac{\pi}{2}]$ and increment in the clockwise direction because the y axis of the image coordinate system points downward (in this example, θ ≈ −0.759 ≈ −43.5°). The eccentricity of the region is defined as the ratio between the lengths of the major axis ($r_a$) and the minor axis ($r_b$) of the “equivalent” ellipse.

Eccentricity

Similar to the region orientation, moments can also be used to determine the “elongatedness” or eccentricity of a region. A naive approach for computing the eccentricity could be to rotate the region until we can fit a bounding box (or enclosing ellipse) with a maximum aspect ratio. Of course this process would be computationally intensive simply because of the many rotations required. If we know the orientation of the region (Eqn. (2.25)), then we may fit a bounding box that is parallel to the region’s major axis. In general, the proportions of the region’s bounding box are not a good eccentricity measure anyway because they do not consider the distribution of pixels inside the box.

Based on region moments, highly accurate and stable measures can be obtained without any iterative search or optimization. Also, moment-based methods do not require knowledge of the boundary length (as required for computing the circularity feature in Sec. 2.4.2), and they can also handle nonconnected regions or point clouds. Several different formulations of region eccentricity can be found in the literature [2, 46, 47] (see also Exercise 2.11). We adopt the following definition because of its simple geometrical interpretation:

$$ \mathrm{Ecc}(R) = \frac{a_1}{a_2} = \frac{\mu_{20} + \mu_{02} + \sqrt{(\mu_{20} - \mu_{02})^2 + 4 \cdot \mu_{11}^2}}{\mu_{20} + \mu_{02} - \sqrt{(\mu_{20} - \mu_{02})^2 + 4 \cdot \mu_{11}^2}}, \tag{2.31} $$

where $a_1 = 2\lambda_1$, $a_2 = 2\lambda_2$ are multiples of the eigenvalues $\lambda_1, \lambda_2$ of the symmetric 2 × 2 matrix

$$ \mathbf{A} = \begin{pmatrix} \mu_{20} & \mu_{11} \\ \mu_{11} & \mu_{02} \end{pmatrix} $$

formed by the central moments $\mu_{pq}$ of the region R. The values of Ecc are in the range [1, ∞), where Ecc = 1 corresponds to a circular disk and elongated regions have values > 1. Ecc itself is invariant to the region’s orientation and size. However, the values $a_1, a_2$ contain information about the spatial extent of the region. Geometrically, the eigenvalues $\lambda_1, \lambda_2$ (and thus $a_1, a_2$) directly relate to the proportions of the “equivalent” ellipse, positioned at the region’s center of gravity $(\bar{x}, \bar{y})$ and oriented at $\theta = \theta_R$ (Eqn. (2.25)). The lengths of the ellipse’s major and minor axes, $r_a$ and $r_b$, are

$$ r_a = 2 \cdot \left( \frac{\lambda_1}{|R|} \right)^{\frac{1}{2}} = \left( \frac{2 a_1}{|R|} \right)^{\frac{1}{2}}, \tag{2.32} $$

$$ r_b = 2 \cdot \left( \frac{\lambda_2}{|R|} \right)^{\frac{1}{2}} = \left( \frac{2 a_2}{|R|} \right)^{\frac{1}{2}}, \tag{2.33} $$

respectively, with $a_1, a_2$ as defined in Eqn. (2.31) and |R| being the number of pixels in the region. The resulting parametric equation of the equivalent ellipse is

$$ \begin{pmatrix} x(t) \\ y(t) \end{pmatrix} = \begin{pmatrix} \bar{x} \\ \bar{y} \end{pmatrix} + \begin{pmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{pmatrix} \cdot \begin{pmatrix} r_a \cdot \cos(t) \\ r_b \cdot \sin(t) \end{pmatrix} = \begin{pmatrix} \bar{x} + \cos(\theta) \cdot r_a \cdot \cos(t) - \sin(\theta) \cdot r_b \cdot \sin(t) \\ \bar{y} + \sin(\theta) \cdot r_a \cdot \cos(t) + \cos(\theta) \cdot r_b \cdot \sin(t) \end{pmatrix} \tag{2.34} $$

for 0 ≤ t < 2π. If entirely filled, the region described by this ellipse would have the same (first and second order) central moments as the original region R. Figure 2.19 shows a set of regions with overlaid orientation and eccentricity results.

Figure 2.19 Orientation and eccentricity examples. The orientation θ (Eqn. (2.25)) is displayed for each connected region as a vector with the length proportional to the region’s eccentricity value Ecc(R) (Eqn. (2.31)). Also shown are the ellipses (Eqns. (2.32) and (2.33)) corresponding to the orientation and eccentricity parameters.

Invariant moments

Normalized central moments are not affected by the translation or uniform scaling of a region (i.e., the values are invariant), but in general rotating the image will change these values. A classical solution to this problem is a clever combination of simpler features known as “Hu’s Moments” [37]:¹²

$$ \begin{aligned} H_1 &= \bar{\mu}_{20} + \bar{\mu}_{02}, \\ H_2 &= (\bar{\mu}_{20} - \bar{\mu}_{02})^2 + 4\,\bar{\mu}_{11}^2, \\ H_3 &= (\bar{\mu}_{30} - 3\,\bar{\mu}_{12})^2 + (3\,\bar{\mu}_{21} - \bar{\mu}_{03})^2, \\ H_4 &= (\bar{\mu}_{30} + \bar{\mu}_{12})^2 + (\bar{\mu}_{21} + \bar{\mu}_{03})^2, \\ H_5 &= (\bar{\mu}_{30} - 3\,\bar{\mu}_{12}) \cdot (\bar{\mu}_{30} + \bar{\mu}_{12}) \cdot \left[ (\bar{\mu}_{30} + \bar{\mu}_{12})^2 - 3\,(\bar{\mu}_{21} + \bar{\mu}_{03})^2 \right] \\ &\quad + (3\,\bar{\mu}_{21} - \bar{\mu}_{03}) \cdot (\bar{\mu}_{21} + \bar{\mu}_{03}) \cdot \left[ 3\,(\bar{\mu}_{30} + \bar{\mu}_{12})^2 - (\bar{\mu}_{21} + \bar{\mu}_{03})^2 \right], \\ H_6 &= (\bar{\mu}_{20} - \bar{\mu}_{02}) \cdot \left[ (\bar{\mu}_{30} + \bar{\mu}_{12})^2 - (\bar{\mu}_{21} + \bar{\mu}_{03})^2 \right] \\ &\quad + 4\,\bar{\mu}_{11} \cdot (\bar{\mu}_{30} + \bar{\mu}_{12}) \cdot (\bar{\mu}_{21} + \bar{\mu}_{03}), \\ H_7 &= (3\,\bar{\mu}_{21} - \bar{\mu}_{03}) \cdot (\bar{\mu}_{30} + \bar{\mu}_{12}) \cdot \left[ (\bar{\mu}_{30} + \bar{\mu}_{12})^2 - 3\,(\bar{\mu}_{21} + \bar{\mu}_{03})^2 \right] \\ &\quad + (3\,\bar{\mu}_{12} - \bar{\mu}_{30}) \cdot (\bar{\mu}_{21} + \bar{\mu}_{03}) \cdot \left[ 3\,(\bar{\mu}_{30} + \bar{\mu}_{12})^2 - (\bar{\mu}_{21} + \bar{\mu}_{03})^2 \right]. \end{aligned} \tag{2.35} $$

In practice, the logarithm of the results (that is, $\log(H_k)$) is used since the raw values can have a very large range. These features are also known as moment invariants since they are invariant under translation, rotation, and scaling. While defined here for binary images, they are also applicable to grayscale images; for further information, see [28, p. 517].
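The orientation vector of Eqns. (2.29) and (2.30), the eccentricity of Eqn. (2.31), and the ellipse radii of Eqns. (2.32) and (2.33) all depend only on the three second-order central moments, so they can be evaluated together. The following Java sketch takes these moments as plain arguments (they could be obtained, for example, with centralMoment() from Program 2.3); the class and method names are hypothetical, and degenerate regions are handled only in the minimal way noted in the comments.

```java
/** Illustrative sketch: orientation vector, eccentricity, and equivalent-ellipse radii from central moments. */
class MomentFeaturesSketch {

    /** Unit orientation vector {xd, yd} (Eqns. 2.29 and 2.30); {0, 0} if A = B = 0. */
    static double[] orientationVector(double mu11, double mu20, double mu02) {
        double a = 2.0 * mu11;
        double b = mu20 - mu02;
        double len = Math.sqrt(a * a + b * b);
        if (len == 0.0) {
            return new double[] {0.0, 0.0};   // no orientation defined
        }
        // Math.max guards against tiny negative arguments caused by rounding
        double xd = Math.sqrt(Math.max(0.0, 0.5 * (1.0 + b / len)));
        double yd = Math.sqrt(Math.max(0.0, 0.5 * (1.0 - b / len)));
        return new double[] {xd, (a >= 0.0) ? yd : -yd};
    }

    /** Eccentricity a1/a2 (Eqn. 2.31); infinite if a2 = 0, NaN if a1 = a2 = 0. */
    static double eccentricity(double mu11, double mu20, double mu02) {
        double root = Math.sqrt((mu20 - mu02) * (mu20 - mu02) + 4.0 * mu11 * mu11);
        double a1 = mu20 + mu02 + root;
        double a2 = mu20 + mu02 - root;
        return a1 / a2;
    }

    /** Radii {ra, rb} of the equivalent ellipse (Eqns. 2.32 and 2.33) for a region of n pixels. */
    static double[] ellipseRadii(double mu11, double mu20, double mu02, int n) {
        double root = Math.sqrt((mu20 - mu02) * (mu20 - mu02) + 4.0 * mu11 * mu11);
        double a1 = mu20 + mu02 + root;
        double a2 = Math.max(0.0, mu20 + mu02 - root);   // clamp rounding noise
        return new double[] {Math.sqrt(2.0 * a1 / n), Math.sqrt(2.0 * a2 / n)};
    }
}
```

The orientation vector obtained this way agrees with (cos θ, sin θ) for θ = 0.5 · Math.atan2(A, B), but needs only square roots; this matters mainly when many regions are processed.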

2.4.5 Projections

Image projections are one-dimensional representations of the image contents, usually computed parallel to the coordinate axes. In this case, the horizontal and vertical projections of an image I(u, v), with 0 ≤ u < M, 0 ≤ v < N, are defined as

\[ P_{\mathrm{hor}}(v_0) = \sum_{u=0}^{M-1} I(u, v_0) \quad \text{for } 0 \le v_0 < N, \tag{2.36} \]

\[ P_{\mathrm{ver}}(u_0) = \sum_{v=0}^{N-1} I(u_0, v) \quad \text{for } 0 \le u_0 < M. \tag{2.37} \]

The horizontal projection Phor(v0) (Eqn. (2.36)) is the sum of the pixel values in the image row v0 and has length N, corresponding to the height of the image. On the other hand, a vertical projection Pver of length M is the sum of all the values in the image column u0 (Eqn. (2.37)). In the case of a binary image with I(u, v) ∈ {0, 1}, the projection contains the count of the foreground pixels in the corresponding image row or column.

¹² In order to improve the legibility of Eqn. (2.35), the argument for the region (R) has been dropped; as an example, with the region argument, the first line would read H1(R) = μ̄20(R) + μ̄02(R), and so on.

    public void run(ImageProcessor ip) {
        int M = ip.getWidth();          // image width
        int N = ip.getHeight();         // image height
        int[] horProj = new int[N];     // one entry per image row
        int[] verProj = new int[M];     // one entry per image column
        for (int v = 0; v < N; v++) {
            for (int u = 0; u < M; u++) {
                int p = ip.getPixel(u, v);
                horProj[v] += p;        // accumulate along row v
                verProj[u] += p;        // accumulate along column u
            }
        }
        // use projections horProj, verProj now
        // ...
    }

Program 2.4 Computation of horizontal and vertical projections. The run() method for an ImageJ plugin (ip is of type ByteProcessor or ShortProcessor) computes the projections in the x and y directions simultaneously in a single traversal of the image. The projections are represented by the one-dimensional arrays horProj and verProj with elements of type int.

Prog. 2.4 gives a direct implementation of the projection calculations as the run() method for an ImageJ plugin, where projections in both directions are computed during a single traversal of the image. Projections in the direction of the coordinate axes are often utilized to quickly analyze the structure of an image and isolate its component parts; for example, in document images they are used to separate graphic elements from text blocks as well as to isolate individual lines (see the example in Fig. 2.20). In practice, especially to account for document skew, projections are often computed along the major axis of an image region (Eqn. (2.25)). When the projection vectors of a region are computed in reference to the centroid of the region along the major axis, the result is a rotation-invariant vector description (often referred to as a “signature”) of the region.
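As an illustration of how a projection can isolate individual text lines, the following sketch scans a horizontal projection for runs of rows whose foreground count exceeds a threshold. It works on a plain int[][] image (indexed img[v][u]) rather than an ImageJ processor so that it is self-contained; the class name ProjectionSegmentsSketch, the method names, and the threshold parameter minCount are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: locating row ranges (e.g., text lines) from a horizontal projection. */
final class ProjectionSegmentsSketch {

    /** Horizontal projection: sum of the pixel values in each row of img[v][u]. */
    static int[] horizontalProjection(int[][] img) {
        int[] horProj = new int[img.length];
        for (int v = 0; v < img.length; v++) {
            for (int u = 0; u < img[v].length; u++) {
                horProj[v] += img[v][u];
            }
        }
        return horProj;
    }

    /** Returns inclusive {start, end} index pairs of runs where proj[i] > minCount. */
    static List<int[]> findRuns(int[] proj, int minCount) {
        List<int[]> runs = new ArrayList<>();
        int start = -1;                                   // -1 means "not inside a run"
        for (int i = 0; i < proj.length; i++) {
            boolean occupied = proj[i] > minCount;
            if (occupied && start < 0) {
                start = i;                                // a run begins
            } else if (!occupied && start >= 0) {
                runs.add(new int[] {start, i - 1});       // the run ended on the previous row
                start = -1;
            }
        }
        if (start >= 0) {
            runs.add(new int[] {start, proj.length - 1}); // run reaches the last row
        }
        return runs;
    }
}
```

The same findRuns() method can be applied to a vertical projection to separate columns or characters. A threshold of 0 splits at completely empty rows; a small positive value tolerates isolated noise pixels.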

2.4.6 Topological Properties

Topological features do not describe the shape of a region in continuous terms; instead, they capture its structural properties. They are typically invariant even under extreme image transformations. Two simple and robust topological features are the number of regions NR(R) and the number of holes NH(R) in those regions. NH(R) can be easily computed while finding the inner contours of a region, as described in Sec. 2.2.2. A feature that can be derived from the number of holes is the so-called Euler number NE, which is the difference between the number of connected


Figure 2.20 Example of the horizontal projection Phor (v) (right) and vertical projection Pver (u) (bottom) of a binary image.

regions NR and the number of their holes NH,

\[ N_E(R) = N_R(R) - N_H(R). \tag{2.38} \]

For a single connected region, the above formula simpliﬁes to 1 − NH , so, for example, for an image of the number “8”, NE = 1 − 2 = −1, while for an image of the letter “D”, NE = 1 − 1 = 0. Topological features are often used in combination with numerical features for classiﬁcation, for example in optical character recognition (OCR) [12].

2.5 Exercises

Exercise 2.1 Trace, by hand, the execution of both variations (depth-first and breadth-first) of the flood-fill algorithm using the image shown in Fig. 2.21 and starting at coordinates (5, 1).

    0 0 0 0 0 0 0 0
    0 0 1 0 1 0 1 0
    0 0 1 0 1 0 1 0
    0 0 1 0 1 0 0 0
    0 0 1 1 1 1 0 0
    0 1 1 0 1 1 0 0
    0 1 1 1 1 1 1 0
    0 0 0 0 1 1 0 0
    0 0 0 0 1 1 1 0
    0 1 1 0 1 1 0 0
    0 1 0 0 1 1 0 0
    0 0 0 0 1 1 0 0
    0 1 1 1 1 1 0 0
    0 0 0 0 0 0 0 0

    0 = Background, 1 = Foreground

Figure 2.21 Binary image for Exercise 2.1.

Exercise 2.2 The implementation of the flood-fill algorithm in Prog. 2.1 places all the neighboring pixels of each visited pixel into either the stack or the queue without ensuring they are foreground pixels and that they lie within the image boundaries. The number of items in the stack or the queue can be reduced by ignoring (not inserting) those neighboring pixels that do not meet the two conditions given above. Modify the depth-first and breadth-first variants given in Prog. 2.1 accordingly and compare the new running times.

Exercise 2.3 Implement an ImageJ plugin that encodes a grayscale image using run length encoding (Sec. 2.3.2) and stores it in a file. Develop a second plugin that reads the file and reconstructs the image.

Exercise 2.4 Calculate the amount of memory required to represent a contour with 1000 points in the following ways: (a) as a sequence of coordinate points stored as pairs of int values; (b) as an 8-chain code using Java byte elements, and (c) as an 8-chain code using only 3 bits per element.

Exercise 2.5 Implement a Java class for describing a binary image region using chain codes. It is up to you whether you want to use an absolute or differential chain code. The implementation should be able to encode closed contours as chain codes and also reconstruct the contours given a chain code.

Exercise 2.6 While computing the convex hull of a region, the maximal diameter (maximum distance between two arbitrary points) can also be simply found. Devise an alternative method for computing this feature without using the convex hull. Determine the running time of your algorithm in terms of the number of points in the region.

Exercise 2.7 Implement an algorithm for comparing contours using their shape numbers (Eqn. (2.3)). For this purpose, develop a metric for measuring the distance between two normalized chain codes. Describe if, and under which conditions, the results will be reliable.


Exercise 2.8 Using Eqn. (2.10) as the basis, develop and implement an algorithm that computes the area of a region from its 8-chain code encoded contour. What type of discrepancy from the region’s actual area (the number of pixels it contains) do you expect?

Exercise 2.9 Sketch an example binary region where the centroid lies outside of the region.

Exercise 2.10 Implement the moment features developed by Hu (Eqn. (2.35)) and show that they are invariant under scaling and rotation for both binary and grayscale images.

Exercise 2.11 There are alternative definitions for the eccentricity of a region (Eqn. (2.31)); for example,

\[ \mathrm{Ecc}_2(R) = \frac{(\mu_{20} - \mu_{02})^2 + 4 \cdot \mu_{11}^2}{(\mu_{20} + \mu_{02})^2} \qquad \text{[47, p. 394]}, \]

\[ \mathrm{Ecc}_3(R) = \frac{(\mu_{20} - \mu_{02})^2 + 4 \cdot \mu_{11}}{m_{00}} \qquad \text{[46, p. 531]}, \]

\[ \mathrm{Ecc}_4(R) = \frac{\sqrt{\mu_{20} - \mu_{02} + 4 \cdot \mu_{11}}}{m_{00}} \qquad \text{[2, p. 255]}. \]

Implement all four variations (including the one in Eqn. (2.31)) and contrast the results using suitably designed regions. Determine how these measures work and what their range of values is, and propose a geometrical interpretation for each.

Exercise 2.12 Write an ImageJ plugin that (a) finds (labels) all regions in a binary image, (b) computes the orientation and eccentricity for each region, and (c) shows the results as a direction vector and the equivalent ellipse on top of each region (as exemplified in Fig. 2.19). Hint: Use Eqn. (2.34) to develop a method for drawing ellipses at arbitrary orientations (not available in ImageJ).

Exercise 2.13 The Java method in Prog. 2.4 computes an image’s horizontal and vertical projections. For document image processing, projections in the diagonal directions are also useful. Implement these projections and consider what role they play in document image analysis.

3 Detecting Simple Curves

In Volume 1 we demonstrated how to use appropriately designed filters to detect edges in images [14, Chap. 6]. These filters compute both the edge strength and orientation at every position in the image. In the following sections, we explain how to decide (for example, by using a threshold operation on the edge strength) if a curve is actually present at a given image location. The result of this process is generally represented as a binary edge map. Edge maps are considered preliminary results since with an edge filter’s limited (“myopic”) view it is not possible to accurately ascertain if a point belongs to a true edge. Edge maps created using simple threshold operations contain many edge points that do not belong to true edges (false positives), and, on the other hand, many edge points are not detected and so are missing from the map (false negatives).¹ In general, edge maps contain many irrelevant structures, while at the same time many important structures are completely missing. The theme of this chapter is how, given a binary edge map, one can find relevant and possibly significant structures based on their forms.

3.1 Salient Structures

An intuitive approach to locating large image structures is to first select an arbitrary edge point, systematically examine its neighboring pixels and add them if they belong to the object’s contour, and repeat. In principle, such an approach could be applied to either a continuous edge map consisting of edge strengths and orientations or a simple binary edge map. Unfortunately, with either input, such an approach is likely to fail due to image noise and ambiguities that arise when trying to follow the contours. Additional constraints and information about the type of object sought are needed in order to handle pixel-level problems such as branching, as well as interruptions. This type of local sequential contour tracing makes for an interesting optimization problem [47] (see also Sec. 2.2).

A completely different approach is to search for globally apparent structures that consist of certain simple shape features. As an example, Fig. 3.1 shows that certain structures are readily apparent to the human visual system, even when they overlap in noisy images. The biological basis for why the human visual system spontaneously recognizes four lines or three circles in Fig. 3.1 instead of a larger number of disjoint segments and arcs is not completely known. At the cognitive level, theories such as “Gestalt” grouping have been proposed to address this behavior. The next sections explore one technique, the Hough transform, that provides an algorithmic solution to this problem.

Figure 3.1 The human visual system is capable of instantly recognizing prominent image structures even under difficult conditions.

¹ Typically thresholding is performed at a level that decreases false negatives at the expense of introducing false positives, the reasoning being that it is much simpler to remove false positives during higher-level processing than it is to, in essence, fill in the missing elements eliminated during low-level processing.

W. Burger, M.J. Burge, Principles of Digital Image Processing, Undergraduate Topics in Computer Science, DOI 10.1007/978-1-84800-195-4_3, © Springer-Verlag London Limited, 2009

3.2 Hough Transform

The method from Paul Hough, originally published as a US Patent [36] and often referred to as the “Hough transform” (HT), is a general approach to localizing any shape that can be defined parametrically within a distribution of points [21, 39]. For example, many geometrical shapes, such as lines, circles, and ellipses, can be readily described using simple equations with only a few parameters. Since simple geometric forms often occur as part of man-made objects, they are especially useful features for analysis of these types of images (Fig. 3.2).


Figure 3.2 Simple geometrical forms such as sections of lines, circles, and ellipses are often found in man-made objects.

The Hough transform is perhaps most often used for detecting line segments in edge maps. A line segment in 2D can be described with two real-valued parameters using the classic slope-intercept form

\[ y = kx + d, \tag{3.1} \]

where k is the slope and d the intercept, that is, the height at which the line would intercept the y axis (Fig. 3.3). A line segment that passes through two given edge points p1 = (x1, y1) and p2 = (x2, y2) must satisfy the conditions

\[ y_1 = k x_1 + d \quad \text{and} \quad y_2 = k x_2 + d \tag{3.2} \]

for k, d ∈ ℝ. The goal is to find values of k and d such that as many edge points as possible lie on the line they describe; in other words, the line that fits the most edge points. But how can you determine the number of edge points that lie on a given line segment? One possibility is to exhaustively “draw” every possible line segment into the image while counting the number of points that lie exactly on each of these. Even though the discrete nature of pixel images (with only a finite number of different lines) makes this approach possible in theory, generating such a large number of lines is infeasible in practice.

3.2.1 Parameter Space

The Hough transform approaches the problem from another direction. It examines all the possible line segments that run through a single given point in the image. Every line Lj = ⟨kj, dj⟩ that runs through a point p0 = (x0, y0) must satisfy the condition

\[ L_j : \; y_0 = k_j x_0 + d_j \tag{3.3} \]

for some suitable pair of values kj, dj. Equation (3.3) is underdetermined and the possible solutions for kj, dj correspond to an infinite set of lines passing through the given point p0 (Fig. 3.4).

Figure 3.3 Two points, p1 and p2, lie on the same line when y1 = kx1 + d and y2 = kx2 + d for a particular pair of parameters k and d.

Figure 3.4 Set of lines passing through an image point. For all possible lines Lj passing through the point p0 = (x0, y0), the equation y0 = kj x0 + dj holds for appropriate values of the parameters kj, dj.

Note that for a given kj, the solution for dj in Eqn. (3.3) is

\[ d_j = -x_0 k_j + y_0, \tag{3.4} \]

which is another equation for a line, where now kj, dj are the variables and x0, y0 are the constant parameters of the equation. The solution set {(kj, dj)} of Eqn. (3.4) describes the parameters of all possible lines Lj passing through the image point p0 = (x0, y0). For an arbitrary image point pi = (xi, yi), Eqn. (3.4) describes the line

\[ M_i : \; d = -x_i k + y_i \tag{3.5} \]

with the parameters −xi, yi in the so-called parameter or Hough space, spanned by the coordinates k, d. The relationship between (x, y) image space and (k, d) parameter space can be summarized as follows:

    Image Space (x, y)                Parameter Space (k, d)
    Point   pi = (xi, yi)             Line    Mi : d = −xi k + yi
    Line    Lj : y = kj x + dj        Point   qj = (kj, dj)

Figure 3.5 Relationship between image space and parameter space: (a) x/y image space, (b) k/d parameter space. The parameter values for all possible lines passing through the image point pi = (xi, yi) in image space (a) lie on a single line Mi in parameter space (b). This means that each point qj = (kj, dj) in parameter space corresponds to a single line Lj in image space. The intersection of the two lines M1, M2 at the point q12 = (k12, d12) in parameter space indicates that a line L12 through the two points p1 and p2 exists in the image space.

Each image point pi and its associated line bundle correspond to exactly one line Mi in parameter space. Therefore we are interested in those places in the parameter space where lines intersect. The example in Fig. 3.5 illustrates how the lines M1 and M2 intersect at the position q12 = (k12, d12) in the parameter space, which means (k12, d12) are the parameters of the line in the image space that runs through both image points p1 and p2. The more lines Mi that intersect at a single point in the parameter space, the more image space points lie on the corresponding line in the image. In general, we can state:

If N lines intersect at position (k′, d′) in parameter space, then N image points lie on the corresponding line y = k′x + d′ in image space.
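As a small worked example of this correspondence, the sketch below intersects the two parameter-space lines M1: d = −x1·k + y1 and M2: d = −x2·k + y2. Setting both right-hand sides equal gives k12 = (y2 − y1)/(x2 − x1) and then d12 = y1 − x1·k12. The class and method names are hypothetical and chosen only for this illustration.

```java
/** Sketch: intersection q12 = (k12, d12) of two parameter-space lines M1 and M2. */
final class ParameterSpaceSketch {

    /**
     * Intersects M1: d = -x1*k + y1 with M2: d = -x2*k + y2.
     * Returns {k12, d12}, the slope and intercept of the image-space line through both points.
     */
    static double[] intersect(double x1, double y1, double x2, double y2) {
        if (x1 == x2) {
            // M1 and M2 are parallel in k/d space: the image-space line is vertical
            throw new IllegalArgumentException("no finite slope for x1 == x2");
        }
        double k = (y2 - y1) / (x2 - x1);   // from -x1*k + y1 = -x2*k + y2
        double d = y1 - x1 * k;             // substitute k back into M1
        return new double[] {k, d};
    }
}
```

For p1 = (1, 3) and p2 = (3, 7), this yields k12 = 2 and d12 = 1, i.e., the line y = 2x + 1. The exception for x1 = x2 reflects the fact that vertical lines cannot be represented in the k/d parameter space.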


Figure 3.6 Main idea of the Hough transform: (a) image space (x, y), (b) accumulator array (k, d). The accumulator array is a discrete representation of the parameter space (k, d). For each image point found (a), a discrete line in the parameter space (b) is drawn. This operation is performed additively so that the values of the array through which the line passes are incremented by 1. The value at each cell of the accumulator array is the number of parameter space lines that intersect it (in this case 2).

3.2.2 Accumulator Array

Finding the dominant lines in the image can now be reformulated as finding all the locations in parameter space where a significant number of lines intersect. This is basically the goal of the HT. In order to compute the HT, we must first decide on a discrete representation of the continuous parameter space by selecting an appropriate step size for the k and d axes. Once we have selected step sizes for the coordinates, we can represent the space naturally using a two-dimensional array. Since the array will be used to keep track of the number of times parameter space lines intersect, it is called an “accumulator” array. Each parameter space line is painted into the accumulator array and the cells through which it passes are incremented, so that ultimately each cell accumulates the total number of lines that intersect at that cell (Fig. 3.6).

3.2.3 A Better Line Representation

The line representation in Eqn. (3.1) is not used in practice because for vertical lines the slope is infinite, i.e., k = ∞. A more practical representation is the so-called Hessian normal form (HNF, [11, p. 195]) for representing lines,

\[ x \cdot \cos(\theta) + y \cdot \sin(\theta) = r, \tag{3.6} \]

which does not exhibit such singularities and also provides a natural linear quantization for its parameters, the angle θ and the radius r (Fig. 3.7).

Figure 3.7 Representation of lines in 2D. In the normal k, d representation (a), vertical lines pose a problem because k = ∞. The Hessian normal form (b) avoids this problem by representing a line by its angle θ and distance r from the origin.

With the HNF² representation, the parameter space is defined by the coordinates θ, r, and a point p = (x, y) in image space corresponds to the function

\[ r_{x,y}(\theta) = x \cdot \cos(\theta) + y \cdot \sin(\theta) \tag{3.7} \]

for angles in the range 0 ≤ θ < π (Fig. 3.8). If we use the center of the image as the reference point for the x/y image space, then it is possible to limit the range of the radius to half the diagonal of the image,

\[ -r_{\max} \le r_{x,y}(\theta) \le r_{\max}, \quad \text{where} \quad r_{\max} = \tfrac{1}{2} \sqrt{M^2 + N^2}, \tag{3.8} \]

for an image of width M and height N.
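Equations (3.7) and (3.8) translate directly into code. The following sketch (class and method names are assumptions made for this example) evaluates the radius function for a point given relative to the image center and computes the bound rmax.

```java
/** Sketch: radius function and radius bound for the Hessian normal form. */
final class HnfSketch {

    /** r_{x,y}(theta) = x*cos(theta) + y*sin(theta), Eqn. (3.7); may be negative. */
    static double radius(double x, double y, double theta) {
        return x * Math.cos(theta) + y * Math.sin(theta);
    }

    /** Half the image diagonal, Eqn. (3.8), for an image of width M and height N. */
    static double rMax(int M, int N) {
        return 0.5 * Math.sqrt((double) M * M + (double) N * N);
    }

    public static void main(String[] args) {
        // For a 360 x 240 image the bound is roughly 216.3
        System.out.println(rMax(360, 240));
        // All points with x = 5 map to r = 5 at theta = 0 (a vertical line)
        System.out.println(radius(5, -17, 0.0));
    }
}
```

Because the coordinates are taken relative to the image center, no image point can produce a radius whose magnitude exceeds rMax(M, N), which is what allows the r axis of the accumulator to be bounded in advance.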

3.3 Implementing the Hough Transform

The fundamental Hough algorithm using the HNF line representation (Eqn. (3.6)) is given in Alg. 3.1. Starting with a binary image I(u, v) where the edge pixels have been assigned a value of 1, the first stage creates a two-dimensional accumulator array and then iterates over the image to fill it. In the second stage, the accumulator array is searched (FindMaxLines()) for maximum values, and a list of parameter pairs for the K strongest lines MaxLines = [⟨θ1, r1⟩, ⟨θ2, r2⟩, …, ⟨θK, rK⟩] is computed. The next sections explain these two stages in detail.

² The Hessian normal form is a constrained variant of the general line equation ax + by + c = 0, with a = cos(θ), b = sin(θ), and c = −r (see [11, p. 194]).

Figure 3.8 Image space and parameter space using the HNF representation: (a) x/y space, (b) θ/r parameter space with θ ranging from 0 to π.

3.3.1 Filling the Accumulator Array

A direct implementation of the first phase of Alg. 3.1 is given in the Java class LinearHT (Prog. 3.1).³ The accumulator array (houghArray) is defined as a two-dimensional int array. The HT is computed from the original image ip by creating a new instance of the class LinearHT, for example,

    LinearHT ht = new LinearHT(ip, 256, 256);

The binary image is passed as an ImageProcessor (ip), wherein any value greater than 0 is interpreted as an edge pixel. The other two parameters, nAng (256) and nRad (256), specify the number of discrete steps to use for the angle (Nθ steps for θi = 0 to π) and the radius (Nr steps for ri = −rmax to rmax). The resulting increments for the angle and radius are thus

\[ \Delta_\theta = \frac{\pi}{N_\theta} \quad \text{and} \quad \Delta_r = \frac{2 \cdot r_{\max}}{N_r} \]

(see lines 17 and 21 in Prog. 3.1, respectively). The output of this program for a very noisy edge image is given in Fig. 3.9.

3.3.2 Analyzing the Accumulator Array

The second phase is localizing the maximum values in the accumulator array Acc(iθ, ir). As can readily be seen in Fig. 3.9 (b), even in the case where

³ The complete implementation of the Hough transform for straight lines can be found in the source code section of this book’s Website.

Algorithm 3.1 Simple Hough algorithm for detecting straight lines. It returns a list containing the parameters ⟨θ, r⟩ of the K strongest lines in the binary edge image I.

     1: HoughLines(I, Nθ, Nr, K)
          Computes the Hough transform to detect straight lines in the binary
          image I (of size M × N), using Nθ, Nr discrete steps for the angle
          and radius, respectively. Returns the list of parameter pairs
          ⟨θi, ri⟩ for the K strongest lines found.

     2:   (uc, vc) ← (M/2, N/2)                    ▷ image center
     3:   rmax ← sqrt(uc² + vc²)                   ▷ max. radius is half the image diagonal
     4:   Δθ ← π/Nθ                                ▷ angular increment
     5:   Δr ← 2·rmax/Nr                           ▷ radial increment

     6:   Create the accumulator array Acc(iθ, ir) of size Nθ × Nr
     7:   for all accumulator cells (iθ, ir) do
     8:       Acc(iθ, ir) ← 0                      ▷ initialize the accumulator array

     9:   for all image coordinates (u, v) do      ▷ scan the image
    10:       if I(u, v) is an edge point then
    11:           (x, y) ← (u−uc, v−vc)            ▷ coordinate relative to center
    12:           for iθ ← 0 … Nθ−1 do             ▷ angular index iθ
    13:               θ ← Δθ · iθ                  ▷ real angle, 0 ≤ θ < π
    14:               r ← x·cos(θ) + y·sin(θ)      ▷ real radius (pos./neg.)
    15:               ir ← Nr/2 + round(r/Δr)      ▷ radial index ir
    16:               Acc(iθ, ir) ← Acc(iθ, ir) + 1  ▷ increment Acc(iθ, ir)

          Find the parameter pairs ⟨θj, rj⟩ for the K strongest lines:
    17:   MaxLines ← FindMaxLines(Acc, K)
    18:   return MaxLines.

the lines in the image are geometrically “straight”, the parameter space curves associated with them do not intersect at exactly one point in the accumulator array; rather, their intersection points are distributed within a small area. This is primarily caused by the rounding errors introduced by the discrete coordinate grid used in the accumulator array. Since the maximum points are really maximum areas in the accumulator array, simply traversing the array and returning its K largest values is not sufficient. Since this is a critical step in the algorithm, we will examine two different approaches (Fig. 3.10) in the following.

     1  class LinearHT {
     2    ImageProcessor ip;   // reference to the original image I
     3    int xCtr, yCtr;      // x/y-coordinates of image center (uc, vc)
     4    int nAng;            // Nθ steps for the angle (θ = 0 ... π)
     5    int nRad;            // Nr steps for the radius (r = −rmax ... rmax)
     6    int cRad;            // center of radius axis (r = 0)
     7    double dAng;         // increment of angle Δθ
     8    double dRad;         // increment of radius Δr
     9    int[][] houghArray;  // Hough accumulator Acc(iθ, ir)
    10
    11    // constructor method:
    12    LinearHT(ImageProcessor ip, int nAng, int nRad) {
    13      this.ip = ip;
    14      this.xCtr = ip.getWidth()/2;
    15      this.yCtr = ip.getHeight()/2;
    16      this.nAng = nAng;
    17      this.dAng = Math.PI / nAng;
    18      this.nRad = nRad;
    19      this.cRad = nRad / 2;
    20      double rMax = Math.sqrt(xCtr * xCtr + yCtr * yCtr);
    21      this.dRad = (2.0 * rMax) / nRad;
    22      this.houghArray = new int[nAng][nRad];
    23      fillHoughAccumulator();
    24    }
    25
    26    void fillHoughAccumulator() {
    27      int h = ip.getHeight();
    28      int w = ip.getWidth();
    29      for (int v = 0; v < h; v++) {
    30        for (int u = 0; u < w; u++) {
    31          if (ip.get(u, v) > 0) {
    32            doPixel(u, v);
    33          }
    34        }
    35      }
    36    }
    37
    38    void doPixel(int u, int v) {
    39      int x = u - xCtr, y = v - yCtr;
    40      for (int ia = 0; ia < nAng; ia++) {
    41        double theta = dAng * ia;
    42        int ir = cRad + (int) Math.rint
    43            ((x*Math.cos(theta) + y*Math.sin(theta)) / dRad);
    44        if (ir >= 0 && ir < nRad) {
    45          houghArray[ia][ir]++;
    46        }
    47      }
    48    }
    49
    50  } // end of class LinearHT

Program 3.1 Hough transform for localizing straight lines (partial implementation). The complete Java implementation can be found in the source code section of the book’s Website.

Figure 3.9 Hough transform for straight lines. The dimensions of the original image (a) are 360 × 240 pixels, so the maximal radius (measured from the image center (uc, vc)) is rmax ≈ 216. For the parameter space (b), 256 steps are used for both the angle θ = 0 … π (horizontal axis) and the radius r = −rmax … rmax (vertical axis). The four darkest spots in (b) mark the maximum values in the accumulator array, and their parameters correspond to the four lines in the original image. In (b), intensities have been inverted to improve legibility.

Approach A: Thresholding

First the accumulator is thresholded to the value of ta by setting all accumulator values Acc(iθ, ir) < ta to 0. The resulting scattered points, or point clouds, are first coalesced into regions (Fig. 3.10 (b)) using a technique such as a morphological closing operation (see Vol. 1 [14, Sec. 7.3.2]). Next the remaining regions must be localized, for instance using the region-finding technique from Sec. 2.1, and then each region’s centroid (see Sec. 2.4.3) can be utilized as the (noninteger) coordinates for the potential image space line. Often the sum of the accumulator’s values within a region is used as a measure of the strength (number of image points) of the line it represents.

Approach B: Nonmaximum suppression

In this method, local maxima in the accumulator array are found by suppressing nonmaximal values.⁴ This is carried out by determining for every cell in Acc(θ, r) whether the value is higher than the value of all of its neighboring cells. If this is the case, then the value remains the same; otherwise it is set to 0 (Fig. 3.10 (c)). The (integer) coordinates of the remaining peaks are potential line parameters, and their respective heights correlate with the strength of the image space line they represent. This method can be used in conjunction with a threshold operation to reduce the number of candidate points that must be considered. The result for Fig. 3.9 (a) is shown in Fig. 3.10 (d).

⁴ Nonmaximum suppression is also used in Sec. 4.2.3 for isolating corner points.

Figure 3.10 Determining the local maximum values in the accumulator array. Original distribution of the values in the Hough accumulator (a). Variant A: Threshold operation using 50% of the maximum value (b). The remaining regions represent the four dominant lines in the image, and the coordinates of their centroids are a good approximation to the line parameters. Variant B: Using nonmaximum suppression results in a large number of local maxima (c) that must then be reduced using a threshold operation (d).
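Approach B, combined with a threshold, might be sketched as follows. This is not the FindMaxLines() implementation from the book’s source code; the class name, the method name findPeaks, and the parameter minCount are assumptions made for this example. Cells outside the array are simply ignored, so the sketch does not account for the fact that the θ axis wraps around at π.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: nonmaximum suppression with thresholding on a Hough accumulator. */
final class AccumulatorPeaksSketch {

    /**
     * Returns {iTheta, iR, count} for every cell whose count is at least minCount
     * and strictly greater than the counts of all of its (up to 8) neighbors.
     */
    static List<int[]> findPeaks(int[][] acc, int minCount) {
        List<int[]> peaks = new ArrayList<>();
        for (int ia = 0; ia < acc.length; ia++) {
            for (int ir = 0; ir < acc[ia].length; ir++) {
                int c = acc[ia][ir];
                if (c >= minCount && isLocalMax(acc, ia, ir)) {
                    peaks.add(new int[] {ia, ir, c});
                }
            }
        }
        return peaks;
    }

    /** True if acc[ia][ir] is strictly greater than every existing neighbor cell. */
    private static boolean isLocalMax(int[][] acc, int ia, int ir) {
        for (int da = -1; da <= 1; da++) {
            for (int dr = -1; dr <= 1; dr++) {
                if (da == 0 && dr == 0) continue;           // skip the cell itself
                int a = ia + da, r = ir + dr;
                if (a < 0 || a >= acc.length || r < 0 || r >= acc[a].length) continue;
                if (acc[a][r] >= acc[ia][ir]) return false; // neighbor is as high or higher
            }
        }
        return true;
    }
}
```

With the strict comparison used here, plateaus of equal values are suppressed entirely, which follows the description above literally; a practical implementation may prefer to keep one representative cell per plateau. Sorting the returned list by count and keeping the first K entries would give the K strongest candidates.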

3.3.3 Hough Transform Extensions

So far, we have presented the Hough transform only in its most basic formulation. The following is a list of some of the more common methods of improving and refining the algorithm.


Modified accumulator updating

The purpose of the accumulator array is to find the intersections of two-dimensional curves. Due to the discrete nature of the image and accumulator coordinates, rounding errors usually cause the parameter curves for multiple image points on the same line not to intersect in a single accumulator cell. A common remedy is, for a given angle θ = iθ · Δθ (Alg. 3.1), to increment not only the corresponding accumulator cell Acc(iθ, ir) but also the neighboring cells Acc(iθ, ir−1) and Acc(iθ, ir+1). This makes the Hough transform more tolerant against inaccurate point coordinates and rounding errors.

Bias problem

Since the value of a cell in the Hough accumulator represents the number of image points falling on a line, longer lines naturally have higher values than shorter lines. This may seem like an obvious point to make, but consider when the image only contains a small section of a “long” line. For instance, if a line only passes through the corner of an image then the cells representing it in the accumulator array will naturally have lower values than a “shorter” line that lies entirely within the image (Fig. 3.11).

Figure 3.11 Bias problem. When an image represents only a finite section of an object, then those lines nearer the center (smaller r values) will have higher values than those farther away (larger r values). As an example, the maximum value of the accumulator for line a will be higher than that of line b.

It follows then that if we only search the accumulator array for maximal values, it is likely that we will completely miss short line segments. One way to compensate for this inherent bias is to compute for each accumulator entry Acc(iθ, ir) the maximum number of image points MaxHits(iθ, ir) possible for a line with the parameters θ, r and then normalize

\[ \mathrm{Acc}'(i_\theta, i_r) \leftarrow \frac{\mathrm{Acc}(i_\theta, i_r)}{\mathrm{MaxHits}(i_\theta, i_r)} \tag{3.9} \]


for MaxHits(iθ, ir) > 0. The normalization term MaxHits(iθ, ir) can be determined, for example, by computing the Hough transform of an image with the same dimensions in which all pixels are edge pixels or by using a random image in which the pixels are uniformly distributed.

Line endpoints

Our simple version of the Hough transform determines the parameters of the lines in the image but not their endpoints. These could be found in a subsequent step by determining which image points belong to any detected line (e.g., by applying a threshold to the perpendicular distance between the ideal line, defined by its parameters, and the actual image points). An alternative solution is to calculate the extreme points of the line during the computation of the accumulator array. For this, every cell of the accumulator array is supplemented with two additional coordinate pairs x_s = (xs, ys), x_e = (xe, ye), i.e.,

    Acc(iθ, ir) = ⟨count, x_s, x_e⟩.

Now the coordinates for the endpoints (x_s, x_e) of every line can be stored while filling in the accumulator array so that by the end of the process each cell contains the two endpoints that lie farthest from each other on the line it represents. When finding the maximum values in the second stage, care should be taken so that the merged cells contain the correct endpoints.

Line intersections

It may be useful in certain applications not to find the lines themselves but their intersections, e.g., for precisely locating the corner points of a polygon-shaped object. The Hough transform delivers the parameters of the recovered lines in Hessian normal form (i.e., as pairs Li = ⟨θi, ri⟩). To compute the point of intersection x_0 = (x0, y0)ᵀ for two lines L1 = ⟨θ1, r1⟩ and L2 = ⟨θ2, r2⟩, we need to solve the system of linear equations

\[ x_0 \cdot \cos(\theta_1) + y_0 \cdot \sin(\theta_1) = r_1, \tag{3.10} \]

\[ x_0 \cdot \cos(\theta_2) + y_0 \cdot \sin(\theta_2) = r_2, \tag{3.11} \]

for the unknowns x0, y0. The solution is

\[
\begin{pmatrix} x_0 \\ y_0 \end{pmatrix}
= \frac{1}{\cos(\theta_1)\sin(\theta_2) - \cos(\theta_2)\sin(\theta_1)} \cdot
  \begin{pmatrix} r_1 \sin(\theta_2) - r_2 \sin(\theta_1) \\ r_2 \cos(\theta_1) - r_1 \cos(\theta_2) \end{pmatrix}
= \frac{1}{\sin(\theta_2 - \theta_1)} \cdot
  \begin{pmatrix} r_1 \sin(\theta_2) - r_2 \sin(\theta_1) \\ r_2 \cos(\theta_1) - r_1 \cos(\theta_2) \end{pmatrix}
\tag{3.12}
\]


for sin(θ2 − θ1) ≠ 0. Obviously x0 is undefined (no intersection point exists) if the lines L1, L2 are parallel to each other (i.e., if θ1 ≡ θ2).

Considering edge strength and orientation

Until now, the raw data for the Hough transform was typically an edge map that was interpreted as a binary image with ones at potential edge points. Yet edge maps contain additional information, such as the edge strength E(u, v) and local edge orientation Φ(u, v) (see Vol. 1 [14, Sec. 6.3]), which can be used to improve the results of the HT.

The edge strength E(u, v) is especially easy to take into consideration. Instead of incrementing visited accumulator cells by 1, add the strength of the respective edge:

Acc(i_θ, i_r) ← Acc(i_θ, i_r) + E(u, v).

In this way, strong edge points will contribute more to the accumulated value than weak points.

The local edge orientation Φ(u, v) is also useful for limiting the range of possible orientation angles for the line at (u, v). The angle Φ(u, v) can be used to increase the efficiency of the algorithm by reducing the number of accumulator cells to be considered along the θ axis. Since this also reduces the number of irrelevant "votes" in the accumulator, it increases the overall sensitivity of the Hough transform (see, for example, [45, p. 483]).

Hierarchical Hough transform

The accuracy of the results increases with the size of the parameter space used; for example, using 256 steps along the θ axis is equivalent to searching for lines at every π/256 ≈ 0.7°. While increasing the number of accumulator cells provides a finer result, bear in mind that it also increases the computation time and especially the amount of memory required.

Instead of increasing the resolution of the entire parameter space, the idea of the hierarchical HT is to gradually "zoom" in and refine the parameter space. First, the regions containing the most important lines are found using a relatively low-resolution parameter space, and then the parameter spaces of those regions are recursively passed to the HT and examined at a higher resolution. In this way, a relatively exact determination of the parameters can be found using a limited (in comparison) parameter space.
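The weighted accumulator update described under "Considering edge strength and orientation" can be sketched as follows. This is a minimal illustration, not the book's implementation: the class and method names, the array layout E[v][u], the choice of the image center as reference point, and the mapping of r to a cell index are all assumptions made for this example.

```java
final class WeightedHoughLines {

    /**
     * Accumulates votes for lines r = x*cos(theta) + y*sin(theta),
     * with (x, y) measured relative to the image center (an assumption).
     * E[v][u] holds the edge strength at column u, row v.
     */
    static float[][] accumulate(float[][] E, int nTheta, int nR) {
        int h = E.length, w = E[0].length;
        double xc = w / 2.0, yc = h / 2.0;            // assumed reference point
        double rMax = Math.sqrt(xc * xc + yc * yc);   // |r| never exceeds this
        double dR = 2 * rMax / nR;                    // radial size of one cell
        float[][] acc = new float[nTheta][nR];
        for (int v = 0; v < h; v++) {
            for (int u = 0; u < w; u++) {
                float e = E[v][u];
                if (e <= 0) continue;                 // not an edge point
                double x = u - xc, y = v - yc;
                for (int it = 0; it < nTheta; it++) {
                    double theta = Math.PI * it / nTheta;
                    double r = x * Math.cos(theta) + y * Math.sin(theta);
                    int ir = Math.min(nR - 1, (int) ((r + rMax) / dR));
                    acc[it][ir] += e;                 // weighted vote instead of +1
                }
            }
        }
        return acc;
    }
}
```

Each edge point contributes exactly one vote per θ step, so the sum of every accumulator row equals the total edge strength in the image; strong edge points simply carry more weight in that sum.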

3.4 Hough Transform for Circles and Ellipses


3.4.1 Circles and Arcs

Since lines in 2D have two degrees of freedom, they could be completely specified using two real-valued parameters. In a similar fashion, representing a circle in 2D requires three parameters, for example

Circle = ⟨x̄, ȳ, ρ⟩,

where x̄, ȳ are the coordinates of the center and ρ is the radius of the circle (Fig. 3.12). A point p = (x, y) lies on this circle when the relation

\[ (x - \bar{x})^2 + (y - \bar{y})^2 = \rho^2 \tag{3.13} \]

holds. Therefore the Hough transform requires a three-dimensional parameter space Acc(x̄, ȳ, ρ) to find the position and radius of circles (and circular arcs) in an image. Unlike the HT for lines, there does not exist a simple functional dependency between the coordinates in parameter space, so how can we find every parameter combination (x̄, ȳ, ρ) that satisfies Eqn. (3.13) for a given image point (u, v)? One solution is to apply a "brute force" method such as described in Alg. 3.2 that exhaustively tests each cell in the parameter space to see if the relation in Eqn. (3.13) holds.

Figure 3.12 Representation of circles and ellipses in 2D.

If we examine Fig. 3.13, we can see that a better idea might be to make use of the fact that the coordinates of the center points also form a circle in Hough space. It is not necessary therefore to search the entire three-dimensional parameter space for each image point p = (u, v). Instead we need only increase the cell values along the edge of the appropriate circle on each ρ plane of the accumulator array. To do this, we can adapt any of the standard algorithms for generating circles. In this case, the integer math version of the well-known Bresenham algorithm [9] is particularly well-suited. Figure 3.14 shows the spatial structure of the three-dimensional parameter space for circles. For a given image point pk = (uk , vk ), at each plane along


Algorithm 3.2 Exhaustive Hough algorithm for localizing circles.

1: HoughCircles(I)
     Returns the list of parameters ⟨x̄_i, ȳ_i, ρ_i⟩ corresponding to the strongest circles found in the binary image I.
2:   Set up a three-dimensional array Acc(x̄, ȳ, ρ) and initialize to 0
3:   for all image coordinates (u, v) do
4:     if I(u, v) is an edge point then
5:       for all (x̄_i, ȳ_i, ρ_i) in the accumulator space do
6:         if (u − x̄_i)² + (v − ȳ_i)² = ρ_i² then
7:           Increment Acc(x̄_i, ȳ_i, ρ_i)
8:   MaxCircles ← FindMaxCircles(Acc)    ▷ a list of tuples ⟨x̄_j, ȳ_j, ρ_j⟩
9:   return MaxCircles.

Figure 3.13 Hough transform for circles. The illustration depicts a slice of the three-dimensional accumulator array Acc(x̄, ȳ, ρ) at a given circle radius ρ = ρi. The center points of all the circles running through a given image point p1 = (u1, v1) form a circle C1 with a radius of ρi centered around p1, just as the center points of the circles that pass through p2 and p3 lie on the circles C2, C3. The cells along the edges of the three circles C1, C2, C3 of radius ρi are traversed and their values in the accumulator array incremented. The cell in the accumulator array contains a value of three where the circles intersect at the true center of the image circle C.

the ρ axis (for ρi = ρmin . . . ρmax ), a circle centered at (uk , vk ) with the radius ρi is traversed, ultimately creating a three-dimensional cone-shaped surface in the parameter space. The coordinates of the dominant circles can be found by searching the accumulator space for the cells with the highest values; that is, the cells where the most cones intersect. Just as in the linear HT, the bias problem (see Sec. 3.3.3) also occurs in

Figure 3.14 Three-dimensional parameter space for circles. For each image point pk = (uk, vk), the cells lying along a cone in the three-dimensional accumulator array Acc(x̄, ȳ, ρ) are incremented. In the example shown, the parameter space covers x̄, ȳ = 0 . . . 100 and ρ = 10 . . . 30, and the image points are p1 = (30, 50), p2 = (50, 50), p3 = (40, 40), p4 = (80, 20).

the circle HT. Sections of circles (i. e., arcs) can be found in a similar way, in which case the maximum value possible for a given cell is proportional to the arc length.
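The idea of voting only along the circle of possible centers on each ρ plane can be sketched as below. Note that this sketch does not use the Bresenham circle generator mentioned in the text; for simplicity it marks every cell in the bounding box whose distance from the image point lies within half a pixel of the radius. The class name, the accumulator layout acc[ir][y][x], and the helper strongest() are assumptions made for this illustration.

```java
final class HoughCircleSketch {

    /** Returns acc[ir][yc][xc], where plane ir belongs to radius rMin + ir. */
    static int[][][] accumulate(boolean[][] edge, int rMin, int rMax) {
        int h = edge.length, w = edge[0].length;
        int[][][] acc = new int[rMax - rMin + 1][h][w];
        for (int v = 0; v < h; v++)
            for (int u = 0; u < w; u++)
                if (edge[v][u])
                    for (int r = rMin; r <= rMax; r++)
                        voteRing(acc[r - rMin], u, v, r);
        return acc;
    }

    /** Increments each cell whose distance from (u, v) is within 0.5 of r. */
    static void voteRing(int[][] plane, int u, int v, int r) {
        int h = plane.length, w = plane[0].length;
        for (int y = Math.max(0, v - r); y <= Math.min(h - 1, v + r); y++) {
            for (int x = Math.max(0, u - r); x <= Math.min(w - 1, u + r); x++) {
                double d = Math.hypot(x - u, y - v);
                if (Math.abs(d - r) < 0.5) plane[y][x]++;   // possible center
            }
        }
    }

    /** Returns {xc, yc, radius} of the first cell holding the maximum count. */
    static int[] strongest(int[][][] acc, int rMin) {
        int best = -1;
        int[] result = {0, 0, rMin};
        for (int ir = 0; ir < acc.length; ir++)
            for (int y = 0; y < acc[ir].length; y++)
                for (int x = 0; x < acc[ir][y].length; x++)
                    if (acc[ir][y][x] > best) {
                        best = acc[ir][y][x];
                        result = new int[] {x, y, rMin + ir};
                    }
        return result;
    }
}
```

For each edge point the work is proportional to the area of the bounding boxes over all radii rather than to the size of the whole three-dimensional accumulator, which is the point of the optimization; a true circle generator would reduce this further to the circumference.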

3.4.2 Ellipses

In a perspective image, most circular objects originating in our real, three-dimensional world will actually appear in 2D images as ellipses, except in the case where the object lies on the optical axis and is observed from the front. For this reason, perfectly circular structures seldom occur in photographs. While the Hough transform can still be used to find ellipses, the larger parameter space required makes it substantially more expensive.

A general ellipse in 2D has five degrees of freedom and therefore requires five parameters to represent it,

Ellipse = ⟨x̄, ȳ, r_a, r_b, α⟩,

where (x̄, ȳ) are the coordinates of the center point, (r_a, r_b) are the two radii,


and α is the orientation of the principal axis (Fig. 3.12).⁵ In order to find ellipses of any size, position, and orientation using the Hough transform, a five-dimensional parameter space with a suitable resolution in each dimension is required. A simple calculation illustrates the enormous expense of representing this space: using a resolution of only 128 = 2^7 steps in every dimension results in 2^35 accumulator cells, and implementing these using 4-byte int values thus requires 2^37 bytes (128 gigabytes) of memory.

An interesting alternative in this case is the generalized Hough transform, which in principle can be used for detecting any arbitrary two-dimensional shape [2, 39]. Using the generalized Hough transform, the shape of the sought-after contour is first encoded point by point in a table and then the associated parameter space is related to the position (xc, yc), scale S, and orientation θ of the shape. This requires a four-dimensional space, which is smaller than that of the Hough method for ellipses described above.
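The memory estimate for the five-dimensional ellipse accumulator can be written out step by step. The second line is an added comparison for the four-dimensional generalized Hough space and assumes, purely for illustration, the same resolution of 128 steps per dimension and 4-byte cells; the text does not state a resolution for that case.

```latex
\begin{align*}
\text{5D (ellipses):}\quad & 128^5 = (2^7)^5 = 2^{35}\ \text{cells}, &
  4 \cdot 2^{35} &= 2^{37}\ \text{bytes} = \frac{2^{37}}{2^{30}}\ \text{GB} = 128\ \text{GB},\\
\text{4D (assumed resolution):}\quad & 128^4 = (2^7)^4 = 2^{28}\ \text{cells}, &
  4 \cdot 2^{28} &= 2^{30}\ \text{bytes} = 1\ \text{GB}.
\end{align*}
```

Under these assumptions, dropping one dimension reduces the memory requirement by a factor of 128.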

3.5 Exercises

Exercise 3.1. Implement a version of the Hough transform for straight lines that incorporates the modified accumulator update, as suggested in Sec. 3.3.3. Analyze the extent to which the method improves the robustness with respect to inaccurate or noisy point positions.

Exercise 3.2. Implement a version of the Hough transform for finding lines that takes into account line endpoints as described in Sec. 3.3.3.

Exercise 3.3. Implement a hierarchical Hough transform for straight lines (see p. 63) capable of accurately determining line parameters.

Exercise 3.4. Implement the Hough transform for finding circles and circular arcs with varying radii. Make use of a fast algorithm for generating circles, such as described in Sec. 3.4, in the accumulator array.

⁵ See Eqn. (2.34) on p. 43 for a parametric equation of this ellipse.

4 Corner Detection

Corners are prominent structural elements in an image and are therefore useful in a wide variety of applications, including following objects across related images (tracking), determining the correspondence between stereo images, serving as reference points for precise geometrical measurements, and calibrating camera systems for machine vision applications. Thus corner points are important not only in human vision but they are also “robust” in the sense that they do not arise accidentally in 3D scenes and furthermore can be located quite reliably under a wide range of viewing angles and lighting conditions.

4.1 Points of Interest

Despite being easily recognized by our visual system, accurately and precisely detecting corners automatically is not a trivial task. A good corner detector must satisfy a number of criteria: it should distinguish between true and accidental corners, reliably detect corners in the presence of realistic image noise, and determine the locations of corners precisely and accurately. Finally, it should be possible to implement the detector efficiently enough so that it can be utilized in real-time applications such as video tracking.

Numerous methods for finding corners or similar interest points have been proposed and most of them take advantage of the following basic principle. While an edge is usually defined as a location in the image at which the gradient is especially high in one direction and low in the direction normal to it, a corner point is defined as a location that exhibits a strong gradient value in multiple directions at the same time.

W. Burger, M.J. Burge, Principles of Digital Image Processing, Undergraduate Topics in Computer Science, DOI 10.1007/978-1-84800-195-4_4, © Springer-Verlag London Limited, 2009


Most methods take advantage of this observation by examining the ﬁrst or second derivative of the image in the x and y directions to ﬁnd corners (e. g., [23, 31, 49, 51]). In the next section, we describe in detail the Harris detector, also known as the “Plessey feature point detector” [31], since it turns out that even though more eﬃcient detectors are known (see, for example, [63, 68]), the Harris detector, and other detectors based on it, are the most widely used in practice.

4.2 Harris Corner Detector

This operator, developed by Harris and Stephens [31], is one of a group of related methods based on the same premise: a corner point exists where the gradient of the image is especially strong in more than one direction at the same time. In addition, locations along edges, where the gradient is strong in only one direction, should not be considered as corners, and the detector should be isotropic, i.e., independent of the orientation of the local gradients.

4.2.1 Local Structure Matrix

Computations based on the first partial derivatives of the image function I(u, v) in the horizontal and vertical directions are the foundation of the Harris detector:

\[ I_x(u,v) = \frac{\partial I}{\partial x}(u,v) \quad\text{and}\quad I_y(u,v) = \frac{\partial I}{\partial y}(u,v). \tag{4.1} \]

For each image position (u, v), we first compute the three values A(u, v), B(u, v), and C(u, v),

\begin{align}
A(u,v) &= I_x^2(u,v), \tag{4.2}\\
B(u,v) &= I_y^2(u,v), \tag{4.3}\\
C(u,v) &= I_x(u,v) \cdot I_y(u,v), \tag{4.4}
\end{align}

which will be interpreted as elements of the local structure matrix M(u, v):¹

\[ M = \begin{pmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{pmatrix} = \begin{pmatrix} A & C \\ C & B \end{pmatrix}. \tag{4.5} \]

Next, each of the three functions A(u, v), B(u, v), C(u, v) is individually smoothed by convolution with a linear Gaussian filter H^{G,σ} (see Vol. 1 [14, Sec. 5.2.7]),

\[ \bar{M} = \begin{pmatrix} A * H^{G,\sigma} & C * H^{G,\sigma} \\ C * H^{G,\sigma} & B * H^{G,\sigma} \end{pmatrix} = \begin{pmatrix} \bar{A} & \bar{C} \\ \bar{C} & \bar{B} \end{pmatrix}. \tag{4.6} \]

Since the matrix M̄ is symmetric, it can be diagonalized to

\[ \bar{M}' = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}, \tag{4.7} \]

where λ1 and λ2 are the eigenvalues of the matrix M̄, defined as²

\[ \lambda_{1,2} = \frac{\operatorname{trace}(\bar{M})}{2} \pm \sqrt{\left(\frac{\operatorname{trace}(\bar{M})}{2}\right)^2 - \det(\bar{M})}
 = \frac{1}{2}\left(\bar{A} + \bar{B} \pm \sqrt{\bar{A}^2 - 2\bar{A}\bar{B} + \bar{B}^2 + 4\bar{C}^2}\right). \tag{4.8} \]

¹ For improved legibility, we simplify the notation used in the following by omitting the function coordinates (u, v); e.g., the function Ix(u, v) is abbreviated as Ix or A(u, v) is simply denoted A etc.

These eigenvalues, which are real and non-negative, contain essential information about the local image structure. Within an image region that is uniform (that is, appears flat), M̄ = 0 and therefore λ1 = λ2 = 0. On an ideal ramp, however, the eigenvalues are λ1 > 0 and λ2 = 0, independent of the orientation of the edge. The eigenvalues thus encode an edge's strength, and their associated eigenvectors represent the edge's orientation.

A corner should have a strong edge in the main direction (corresponding to the larger of the two eigenvalues), another edge normal to the first (corresponding to the smaller eigenvalue), and both eigenvalues must be significant. Since Ā, B̄ ≥ 0, we can assume that trace(M̄) > 0 and thus |λ1| ≥ |λ2|. Therefore only the smaller of the two eigenvalues, λ2 = trace(M̄)/2 − √(. . .), is relevant when determining a corner.
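The closed-form eigenvalues of the smoothed structure matrix can be evaluated directly from its three entries. The following is a small sketch under the assumption that Ā, B̄, C̄ are available as plain double values; the class and method names are hypothetical and not part of the HarrisCornerDetector class.

```java
final class StructureEigenvalues {

    /**
     * Eigenvalues of the symmetric 2x2 matrix [[a, c], [c, b]],
     * returned as {lambda1, lambda2} with lambda1 >= lambda2.
     */
    static double[] eigenvalues(double a, double b, double c) {
        double halfTrace = (a + b) / 2;
        double det = a * b - c * c;
        // The radicand equals ((a - b)/2)^2 + c^2 and is never negative;
        // Math.max only guards against rounding error.
        double root = Math.sqrt(Math.max(0, halfTrace * halfTrace - det));
        return new double[] {halfTrace + root, halfTrace - root};
    }

    public static void main(String[] args) {
        double[] flat = eigenvalues(0, 0, 0);      // uniform region
        double[] edge = eigenvalues(1, 1, 1);      // diagonal edge, Ix = Iy = 1
        double[] corner = eigenvalues(3, 2, 0);    // two significant directions
        System.out.println(flat[0] + " " + flat[1]);
        System.out.println(edge[0] + " " + edge[1]);
        System.out.println(corner[0] + " " + corner[1]);
    }
}
```

The diagonal edge yields one positive and one zero eigenvalue, just like an axis-aligned edge would, which illustrates the orientation independence noted above.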

² Where det(M̄) denotes the determinant and trace(M̄) denotes the trace of the matrix M̄ (see, for example, [11, pp. 252 and 259]).

4.2.2 Corner Response Function (CRF)

As we can see from Eqn. (4.8), the difference between the two eigenvalues is

\[ \lambda_1 - \lambda_2 = 2 \cdot \sqrt{\tfrac{1}{4} \cdot \left(\operatorname{trace}(\bar{M})\right)^2 - \det(\bar{M})}, \]

where in every case 0.25 · (trace(M̄))² > det(M̄) holds. At a corner, this expression should be as small as possible, and therefore the Harris detector defines the function

\begin{align}
Q(u,v) &= \det(\bar{M}) - \alpha \cdot \left(\operatorname{trace}(\bar{M})\right)^2 \notag\\
       &= (\bar{A}\bar{B} - \bar{C}^2) - \alpha \cdot (\bar{A} + \bar{B})^2 \tag{4.9}
\end{align}

as a measure of "corner strength", where the parameter α determines the sensitivity of the detector. Q(u, v) is called the "corner response function" and returns maximum values at isolated corners. In practice, α is assigned a fixed value in the range of 0.04 to 0.06 (max. 0.25 = 1/4). The larger the value of α, the less sensitive the detector is and the fewer corners detected.
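To see how Eqn. (4.9) separates flat regions, edges, and corners, the response can be evaluated for a few hand-picked values of Ā, B̄, C̄. This is only an illustrative sketch with a hypothetical class name; it operates on single values and is not the makeCrf() method of the HarrisCornerDetector class.

```java
final class CornerResponseSketch {

    /** Q = det(M) - alpha * trace(M)^2 for the smoothed entries a, b, c. */
    static double q(double a, double b, double c, double alpha) {
        double det = a * b - c * c;
        double trace = a + b;
        return det - alpha * trace * trace;
    }

    public static void main(String[] args) {
        double alpha = 0.05;                          // default from Table 4.1
        System.out.println(q(0, 0, 0, alpha));        // flat region: zero
        System.out.println(q(100, 0, 0, alpha));      // edge: negative
        System.out.println(q(100, 100, 0, alpha));    // corner-like: positive
    }
}
```

With α = 0.05, the edge example gives a negative response (about −500) and the corner-like example a clearly positive one (about 8000), so a positive threshold on Q rejects both flat regions and pure edges.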

4.2.3 Determining Corner Points

An image location (u, v) is selected as a candidate for a corner point when

Q(u, v) > tH,

where the threshold tH is selected based on image content and typically lies within the range of 10,000 to 1,000,000. Once selected, the corners ci = ⟨ui, vi, qi⟩ are inserted into the vector

Corners = [c1, c2, . . . cN],

which is then sorted in descending order (i.e., qi ≥ qi+1) according to corner strength qi = Q(ui, vi), as defined in Eqn. (4.9). To suppress the false corners that tend to arise in densely packed groups around true corners, all except the strongest corner in a specified vicinity are eliminated. To accomplish this, the list Corners is traversed from the front to the back, and the weaker corners toward the end of the list, which lie in the surrounding neighborhood of a stronger corner, are deleted.

The complete algorithm for the Harris detector is summarized again in Alg. 4.1, and the associated parameters are explained in Table 4.1.

4.2.4 Example

Figure 4.1 uses a simple synthetic image to illustrate the most important steps in corner detection using the Harris detector. The figure shows the result of the gradient computation, the three components A, B, C of the structure matrix M(u, v), and the values of the corner response function Q(u, v) for each image position (u, v). This example utilizes the standard settings as given in Table 4.1.

The second example (Fig. 4.2) illustrates the detection of corner points in a grayscale representation of a natural scene. It demonstrates how weak corners are eliminated in favor of the strongest corner in a region.

4.3 Implementation

Since the Harris detector algorithm is more complex than the algorithms we presented earlier, in the following sections, we explain its implementation in


Figure 4.1 Harris corner detector, Example 1. Starting with the original image I(u, v), the first derivative is computed, and then from it the components of the structure matrix M(u, v), with A(u, v) = Ix²(u, v), B(u, v) = Iy²(u, v), C(u, v) = Ix(u, v) · Iy(u, v). A(u, v) and B(u, v) represent, respectively, the strength of the horizontal and vertical edges. In C(u, v), the values are strongly positive (white) or strongly negative (black) only where the edges are strong in both directions (null values are shown in gray). The corner response function, Q(u, v), exhibits noticeable positive peaks at the corner positions.


Figure 4.2 Harris corner detector, Example 2. A complete result with the final corner points marked (a). After selecting the strongest corner points within a 10-pixel radius, only 335 of the original 615 candidate corners remain. Details before (b, c) and after selection (d, e).


Algorithm 4.1 Harris corner detector (Part 1). This algorithm takes an intensity image I and creates a sorted list of detected corner points. ∗ is the convolution operator used for linear filter operations. Details for the parameters Hp, Hdx, Hdy, Hb, α, and tH can be found in Table 4.1.

 1: HarrisCorners(I)
      Returns a list of the strongest corners found in the image I.

      Step 1 (compute the corner response function):
 2:   I ← I ∗ Hp                    ▷ prefilter (smooth) the image
 3:   Ix ← I ∗ Hdx                  ▷ horizontal derivative
 4:   Iy ← I ∗ Hdy                  ▷ vertical derivative
 5:   for all image coordinates (u, v) do
        Compute elements of the local structure matrix M = (A C; C B):
 6:     A(u, v) ← Ix²(u, v)
 7:     B(u, v) ← Iy²(u, v)
 8:     C(u, v) ← Ix(u, v) · Iy(u, v)
      Blur each component of the structure matrix, M̄ = (Ā C̄; C̄ B̄):
 9:   Ā ← A ∗ Hb
10:   B̄ ← B ∗ Hb
11:   C̄ ← C ∗ Hb
      Compute the corner response function:
12:   Q(u, v) ← Ā(u, v) · B̄(u, v) − C̄²(u, v) − α · (Ā(u, v) + B̄(u, v))²

      Step 2 (collect the corner points):
13:   Create an empty list: Corners ← [ ]
14:   for all image coordinates (u, v) do
15:     if Q(u, v) > tH and IsLocalMax(Q, u, v) then
16:       Create a new corner ci: ci ← ⟨ui, vi, qi⟩ = ⟨u, v, Q(u, v)⟩
17:       Add ci to Corners
18:   Sort Corners by qi in descending order (strongest corners first)
19:   GoodCorners ← CleanUpNeighbors(Corners)
20:   return GoodCorners.

greater detail. While reading the following sections you may wish to refer to the complete source code for the class HarrisCornerDetector, which can be found in Appendix B (pp. 294–300).


Algorithm 4.2 Harris corner detector (Part 2). Procedures for finding local maxima in the corner response function and cleaning up the list of detected corner points. Details for the parameter dmin can be found in Table 4.1.

 1: IsLocalMax(Q, u, v)             ▷ determine if Q(u, v) is a local maximum
 2:   Let qc ← Q(u, v) (center pixel)
 3:   Let N ← Neighbors(Q, u, v)    ▷ values of all neighboring pixels
 4:   if qc ≥ qi for all qi ∈ N then
 5:     return true
 6:   else
 7:     return false.

 8: CleanUpNeighbors(Corners)       ▷ Corners is sorted by descending q
 9:   Create an empty list: GoodCorners ← [ ]
10:   while Corners is not empty do
11:     ci ← RemoveFirst(Corners)
12:     Add ci to GoodCorners
13:     for all cj in Corners do
14:       if Dist(ci, cj) < dmin then
15:         Delete cj from Corners
16:   return GoodCorners.

4.3.1 Step 1: Computing the Corner Response Function

In order to handle the range of the positive and negative values generated by the filters used in this step, we will need to use floating-point images to store the intermediate results, which also assures sufficient range and precision for small values. The kernels of the required filters, i.e., the presmoothing filter Hp, the gradient filters Hdx, Hdy, and the smoothing filter for the structure matrix Hb, are stored as one-dimensional float arrays:

    float[] pfilt = {0.223755f,0.552490f,0.223755f};   // Hp
    float[] dfilt = {0.453014f,0.0f,-0.453014f};       // Hdx, Hdy
    float[] bfilt = {0.01563f,0.09375f,0.234375f,0.3125f,
                     0.234375f,0.09375f,0.01563f};     // Hb

From the original 8-bit image (of type ByteProcessor), we first create two copies, Ix and Iy, of type FloatProcessor:

    FloatProcessor Ix = (FloatProcessor) ip.convertToFloat();
    FloatProcessor Iy = (FloatProcessor) ip.convertToFloat();

The ﬁrst processing step is a presmoothing with the ﬁlter Hp (Alg. 4.1, line 2). Subsequently the gradient ﬁlters Hdx and Hdy are used to compute the horizontal and vertical derivatives (Alg. 4.1, line 4). Since one-dimensional


Table 4.1 Harris corner detector: actual parameter values.

Prefilter (Alg. 4.1, line 2): Smoothing with a small xy-separable filter Hp = Hpx ∗ Hpy, where

\[ H_{px} = \frac{1}{9} \begin{bmatrix} 2 & 5 & 2 \end{bmatrix} \quad\text{and}\quad H_{py} = H_{px}^{T} = \frac{1}{9} \begin{bmatrix} 2 \\ 5 \\ 2 \end{bmatrix}. \]

Gradient filter (Alg. 4.1, line 4): Computing the first partial derivative in the x and y directions with

\[ H_{dx} = \begin{bmatrix} -0.453014 & 0 & 0.453014 \end{bmatrix} \quad\text{and}\quad H_{dy} = H_{dx}^{T} = \begin{bmatrix} -0.453014 \\ 0 \\ 0.453014 \end{bmatrix}. \]

Blur filter (Alg. 4.1, line 11): Smoothing the individual components of the structure matrix M with separable Gaussian filters Hb = Hbx ∗ Hby with

\[ H_{bx} = \frac{1}{64} \begin{bmatrix} 1 & 6 & 15 & 20 & 15 & 6 & 1 \end{bmatrix}, \qquad H_{by} = H_{bx}^{T} = \frac{1}{64} \begin{bmatrix} 1 \\ 6 \\ 15 \\ 20 \\ 15 \\ 6 \\ 1 \end{bmatrix}. \]

Steering parameter (Alg. 4.1, line 12): α = 0.04 to 0.06 (default 0.05)

Response threshold (Alg. 4.1, line 15): tH = 10,000 to 1,000,000 (default 25,000)

Neighborhood radius (Alg. 4.2, line 15): dmin = 10 pixels

filters of the same direction are applied in each step, presmoothing and gradient computation can be combined in a single step:

    Ix = convolve1h(convolve1h(Ix,pfilt),dfilt);
    Iy = convolve1v(convolve1v(Iy,pfilt),dfilt);

The methods convolve1h(I , h) and convolve1v(I , h) above perform onedimensional ﬁlter operations h on the image I in the horizontal and vertical directions, respectively (see “ﬁlter methods” below). Now the components A,


B, C of the structure matrix M are computed and then smoothed using the separable 2D filter Hb (bfilt):

    A = sqr ((FloatProcessor) Ix.duplicate());
    B = sqr ((FloatProcessor) Iy.duplicate());
    C = mult((FloatProcessor) Ix.duplicate(),Iy);

    // convolve with Hb
    A = convolve2(A,bfilt);
    B = convolve2(B,bfilt);
    C = convolve2(C,bfilt);

The variables A, B, C of type FloatProcessor are declared in the class HarrisCornerDetector. The method convolve2(I, h) performs a separable 2D convolution of the image I using the 1D filter kernel h. mult() and sqr() are auxiliary methods for multiplying two images and squaring an image, respectively (see Appendix B, p. 299 for the complete source code). Finally, the corner response function (Alg. 4.1, line 12) is computed using the method makeCrf(), and a new image of type FloatProcessor is created:

    void makeCrf() { // defined in class HarrisCornerDetector
      int w = ipOrig.getWidth();
      int h = ipOrig.getHeight();
      Q = new FloatProcessor(w,h);
      float[] Apix = (float[]) A.getPixels();
      float[] Bpix = (float[]) B.getPixels();
      float[] Cpix = (float[]) C.getPixels();
      float[] Qpix = (float[]) Q.getPixels();
      for (int v=0; v [...]

    [...] c2.q) return -1;
      if (this.q < c2.q) return 1;
      else return 0;
    }

Cleaning up

The final step is to remove the weakest corners in a limited area where the size of this area is specified by the radius dmin (Alg. 4.2, lines 8–16). This process is outlined in Fig. 4.3 and implemented in the method cleanupCorners() below. The Vector corners, which was already sorted according to q, is now converted into an ordinary array (line 89) and then iterated through from beginning to end:

     85  List<Corner> cleanupCorners(List<Corner> corners) {
     86    // corners is assumed to be sorted by descending q
     87    double dmin2 = dmin*dmin; // squared dmin (dmin is an object variable)
     88    Corner[] cornerArray = new Corner[corners.size()];
     89    cornerArray = corners.toArray(cornerArray);
     90    List<Corner> goodCorners = new Vector<Corner>(corners.size());
     91    for (int i = 0; i < cornerArray.length; i++) {
     92      if (cornerArray[i] != null) {
     93        // select the next "good" corner c1
     97        Corner c1 = cornerArray[i];
     98        goodCorners.add(c1);
     99        // remove all remaining corners too close to c1
    100        for (int j = i+1; j < cornerArray.length; j++) {
    101          if (cornerArray[j] != null) {
    102            Corner c2 = cornerArray[j];
    103            if (c1.dist2(c2) < dmin2) // compare squared distances
    104              cornerArray[j] = null;  // remove corner c2
    105          }
    106        }
    107      }
    108    }
    109    return goodCorners;
    110  }

Figure 4.3 Selecting the strongest corners within a given spatial distance. The original list of corners (corners) is sorted by "corner strength" in descending order; i.e., c0 is the strongest corner. First, corner c0 is added to a new list goodCorners, while the weaker corners c4 and c8 (which are both within distance dmin from c0) are removed from the original corners list. The following corners c1, c2, . . . are treated similarly until no more elements remain in corners. None of the corners in the resulting list goodCorners is closer to another corner than dmin.

At this point, weak corner points within the neighborhood of a stronger corner point, where the neighborhood is deﬁned by the dmin radius, are deleted (line 104), and only those corner points that remain (that is, the strongest ones) are copied into the new list goodCorners (which is also implemented as a Vector). The method call c1.dist2(c2) in line 103 computes the squared Euclidean distance d2 (c1 , c2 ) = (u1 −u2 )2 +(v1 −v2 )2 between the corner points c1 and c2 . Since the square of the distance suﬃces for the comparison, we do not need to compute the actual distance, and consequently we avoid calling the expensive square root function. This is a common trick when comparing distances.


4.3.3 Displaying the Corner Points

In order to visualize the locations of the corner points finally selected, we now place markers at the corresponding positions in the original image. The method showCornerPoints() below (defined in the class HarrisCornerDetector) first creates a copy of the original image ip and increases, with the help of a lookup table, the overall brightness of the intensity range 128 to 255, and at the same time reduces the contrast by half (lines 114–118). Then the list corners is iterated through, and each Corner object "draws itself" onto the display image ipResult by calling its draw() method (line 121):

    111  ImageProcessor showCornerPoints(ImageProcessor ip) {
    112    ByteProcessor ipResult = (ByteProcessor) ip.duplicate();
    113    // change background image contrast and brightness
    114    int[] lookupTable = new int[256];
    115    for (int i=0; i [...]

    [...] >> 16;
    int G = (C & 0x0000FF00) >> 8;
    int B = (C & 0x000000FF);
    byte RGB = (byte) // 8-bit color pixel (3:3:2-packed)
        (R & 0xE0 | (G & 0xE0) >> 3 | (B & 0xC0) >> 6);

Program 5.1 3:3:2 quantization of a 24-bit RGB color pixel using bit operations (see also Fig. 5.1 (b) and Exercise 5.1).
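Since only part of the listing survives here, the following self-contained sketch shows one way the 3:3:2 packing can be written as a complete method. The class and method names are hypothetical, and the extraction of the red component (mask 0x00FF0000, shift by 16) is an assumption inferred from the pattern of the green and blue lines rather than taken from the original program.

```java
final class Rgb332 {

    /**
     * Packs a 24-bit RGB value (0xRRGGBB) into 8 bits:
     * the top 3 bits of red, top 3 bits of green, top 2 bits of blue.
     * The result is returned as an int in the range 0..255.
     */
    static int quantize332(int c) {
        int r = (c & 0x00FF0000) >> 16;   // assumed, by analogy with g and b
        int g = (c & 0x0000FF00) >> 8;
        int b = (c & 0x000000FF);
        return (r & 0xE0) | ((g & 0xE0) >> 3) | ((b & 0xC0) >> 6);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(quantize332(0xFFFFFF))); // white
        System.out.println(Integer.toHexString(quantize332(0xFF0000))); // pure red
        System.out.println(Integer.toHexString(quantize332(0x0000FF))); // pure blue
    }
}
```

Returning an int instead of a byte avoids Java's signed byte range when the packed value is later used as an index into a 256-entry color table.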

Figure 5.2 Color distribution after a scalar 3:3:2 quantization. Original color image (a). Distribution of the original 226,321 colors (b) and the remaining 8 × 8 × 4 = 256 colors after 3:3:2 quantization (c) in the RGB color cube.

quantization is an optimal solution only if the image colors are uniformly distributed within the RGB cube. However, the typical color distribution in natural images is anything but uniform, with some regions of the color space being densely populated and many colors entirely missing. In this case, scalar quantization is not optimal because the interesting colors may not be sampled with sufficient density while at the same time colors are represented that do not appear in the image at all.

5.2 Vector Quantization

Vector quantization does not treat the individual color components separately as does scalar quantization, but each color vector Ci = (ri, gi, bi) or pixel in the image is treated as a single entity. Starting from a set of original color tuples C = {C1, C2, . . . Cm}, the task of vector quantization is

(a) to find a set of n representative color vectors C′ = {C′1, C′2, . . . C′n} and
(b) to replace each original color Ci by one of the new color vectors C′j ∈ C′,

where n is usually predetermined (n < m) and the resulting deviation from the original image shall be minimal. This is a combinatorial optimization problem in a rather large search space, which usually makes it impossible to determine a global optimum in adequate time. Thus all of the following methods only compute a "local" optimum at best.

5.2.1 Populosity algorithm

The populosity algorithm¹ [32] selects the n most frequent colors in the image as the representative set of color vectors C′. Being very easy to implement, this procedure is quite popular. The method described in Vol. 1 [14, Sec. 8.3.1], based on sorting the image pixels, can be used to determine the n most frequent image colors. Each original pixel Ci is then replaced by the closest representative color vector in C′; i.e., the quantized color vector with the smallest distance in the 3D color space.

The algorithm performs adequately only as long as the original image colors are not widely scattered through the color space. Some improvement is possible by grouping similar colors into larger cells first (by scalar quantization). However, a less frequent (but possibly important) color may get lost whenever it is not sufficiently similar to any of the n most frequent colors.
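The two steps of the populosity algorithm (pick the n most frequent colors, then map every pixel to its nearest representative) can be sketched as follows. This is an illustration under stated assumptions: pixels are packed 0xRRGGBB int values, the color counts are collected with a hash map rather than the sorting-based method of Vol. 1, ties in frequency are broken by color value, and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PopulositySketch {

    /** Returns up to n of the most frequent 24-bit colors, most frequent first. */
    static int[] mostFrequentColors(int[] pixels, int n) {
        Map<Integer, Integer> hist = new HashMap<>();
        for (int p : pixels) hist.merge(p & 0xFFFFFF, 1, Integer::sum);
        List<Map.Entry<Integer, Integer>> entries = new ArrayList<>(hist.entrySet());
        entries.sort((e1, e2) -> {
            int byCount = Integer.compare(e2.getValue(), e1.getValue());
            return byCount != 0 ? byCount : Integer.compare(e1.getKey(), e2.getKey());
        });
        int[] palette = new int[Math.min(n, entries.size())];
        for (int i = 0; i < palette.length; i++) palette[i] = entries.get(i).getKey();
        return palette;
    }

    /** Squared Euclidean distance between two colors in RGB space. */
    static int dist2(int c1, int c2) {
        int dr = ((c1 >> 16) & 0xFF) - ((c2 >> 16) & 0xFF);
        int dg = ((c1 >> 8) & 0xFF) - ((c2 >> 8) & 0xFF);
        int db = (c1 & 0xFF) - (c2 & 0xFF);
        return dr * dr + dg * dg + db * db;
    }

    /** Replaces each pixel by the closest palette color (palette must be non-empty). */
    static int[] quantize(int[] pixels, int[] palette) {
        int[] out = new int[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            int best = palette[0];
            for (int c : palette)
                if (dist2(pixels[i], c) < dist2(pixels[i], best)) best = c;
            out[i] = best;
        }
        return out;
    }
}
```

In a small image with three red pixels, two blue pixels, and one slightly darker red pixel, a palette of size two keeps red and blue, and the darker red is mapped to red. A rare color far from both, however, would be mapped to whichever of the two happens to be closer, which is the weakness described above.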

¹ Sometimes also called the "popularity" algorithm.

5.2.2 Median-cut algorithm

The median-cut algorithm [32] is considered a classical method for color quantization that is implemented in many applications (including ImageJ). As in the populosity method, a color histogram is first computed for the original image, traditionally with a reduced number of histogram cells (such as 32 × 32 × 32) for efficiency reasons.² The initial histogram volume is then recursively split into smaller boxes until the desired number of representative colors is reached. In each recursive step, the color box representing the largest number of pixels is selected for splitting. A box is always split across the longest of its three axes at the median point, such that half of the contained pixels remain in each of the resulting subboxes (Fig. 5.3).

The result of this recursive splitting process is a partitioning of the color space into a set of disjoint boxes, with each box ideally containing the same number of image pixels. In the last step, a representative color vector (e.g., the mean vector of the contained colors) is computed for each color cube, and all the image pixels it contains are replaced by that color.

The advantage of this method is that color regions of high pixel density are split into many smaller cells, thus reducing the overall quantization error. In color regions of low density, however, relatively large cubes and thus large color deviations may occur for individual pixels.

The median-cut method is described in detail in Algorithms 5.1–5.3 and a corresponding Java implementation can be found in the source code section of this book's Website.

Figure 5.3 Median-cut algorithm. The RGB color space is recursively split into smaller cubes along one of the color axes.
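The central splitting step (cut a box across its longest axis so that roughly half of the pixels fall on each side) can be sketched as below. This is not the book's Java implementation: colors are represented here as int arrays {red, grn, blu, cnt}, a box is simply a list of such arrays, at least two distinct colors are assumed per box, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MedianCutSplitSketch {

    /**
     * Splits a box of colors {red, grn, blu, cnt} across its longest axis
     * at the median pixel count. Returns the lower and upper sub-box.
     * Assumes the box holds at least two colors.
     */
    static List<List<int[]>> splitBox(List<int[]> colors) {
        // Find the color axis (0 = red, 1 = green, 2 = blue) with the largest extent.
        int axis = 0, longest = -1;
        for (int d = 0; d < 3; d++) {
            int min = Integer.MAX_VALUE, max = Integer.MIN_VALUE;
            for (int[] c : colors) {
                min = Math.min(min, c[d]);
                max = Math.max(max, c[d]);
            }
            if (max - min > longest) { longest = max - min; axis = d; }
        }
        final int ax = axis;
        List<int[]> sorted = new ArrayList<>(colors);
        sorted.sort((c1, c2) -> Integer.compare(c1[ax], c2[ax]));

        long total = 0;
        for (int[] c : sorted) total += c[3];

        // Advance the cut until the lower part holds at least half of the
        // pixels, always leaving at least one color in the upper part.
        long below = 0;
        int cut = 0;
        while (cut < sorted.size() - 1) {
            below += sorted.get(cut)[3];
            cut++;
            if (2 * below >= total) break;
        }
        List<int[]> lower = new ArrayList<>(sorted.subList(0, cut));
        List<int[]> upper = new ArrayList<>(sorted.subList(cut, sorted.size()));
        return Arrays.asList(lower, upper);
    }
}
```

Because the cut is placed by pixel count rather than by position on the axis, densely populated color regions end up in smaller boxes, which is the property the method relies on to reduce the overall quantization error.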

5.2.3 Octree algorithm

Similar to the median-cut algorithm, this method is also based on partitioning the three-dimensional color space into cells of varying size. The octree algorithm [26] utilizes a hierarchical structure, where each cube in color space may contain eight subcubes. This partitioning is represented by a tree structure (octree) with a cube at each node that may again link to up to eight further nodes. Thus each node corresponds to a subrange of the color space that re

² This corresponds to a scalar prequantization on the color components, which leads to additional quantization errors and thus produces suboptimal results. This step seems unnecessary on modern computers and should be avoided.


Algorithm 5.1 Median-cut color quantization (Part 1). The input image I is quantized to up to Kmax representative colors and a new, quantized image is returned. The main work is done in procedure FindRepresentativeColors(), which iteratively partitions the color space into increasingly smaller boxes. It returns a set of representative colors (CR) that are subsequently used by procedure QuantizeImage() to quantize the original image I. Note that (unlike in most common implementations) no prequantization is applied to the original image colors.

 1: MedianCut(I, Kmax)
      I: color image, Kmax: max. number of quantized colors
      Returns a new quantized image with at most Kmax colors.
 2:   CR ← FindRepresentativeColors(I, Kmax)
 3:   return QuantizeImage(I, CR)                 ▷ see Alg. 5.3

 4: FindRepresentativeColors(I, Kmax)
      Returns a set of up to Kmax representative colors for the image I.
 5:   Let C = {c1, c2, . . . cK} be the set of distinct colors in I. Each of the K color elements in C is a tuple ci = ⟨redi, grni, blui, cnti⟩ consisting of the RGB color components (red, grn, blu) and the number of pixels (cnt) in I with that particular color.
 6:   if |C| ≤ Kmax then
 7:     return C.
 8:   else
        Create a color box b0 at level 0 that contains all image colors C and make it the initial element in the set of color boxes B:
 9:     Let b0 ← CreateColorBox(C, 0)             ▷ see Alg. 5.2
10:     Let B ← {b0}                              ▷ initial set of color boxes
11:     Let k ← 1
12:     Let done ← false
13:     while k < Kmax and not done do
14:       b ← FindBoxToSplit(B)                   ▷ see Alg. 5.2
15:       if b ≠ nil then
16:         (b1, b2) ← SplitBox(b)                ▷ see Alg. 5.2
17:         B ← B − {b}                           ▷ remove b from B
18:         B ← B ∪ {b1, b2}                      ▷ insert b1, b2 into B
19:         k ← k + 1
20:       else                                    ▷ no more boxes to split
21:         done ← true
        Determine the average color inside each color box in set B:
22:     Let CR ← {AverageColors(bj) | bj ∈ B}     ▷ see Alg. 5.3
23:     return CR.


Algorithm 5.2 Median-cut color quantization (Part 2).

CreateColorBox(C, m)
    Creates and returns a new color box containing the colors C. A color box b is a tuple ⟨colors, level, rmin, rmax, gmin, gmax, bmin, bmax⟩, where colors is the vector of image colors represented by the box, level denotes the split-level, and rmin, ..., bmax describe the color boundaries of the box in RGB space.
    Find the RGB extrema of all colors c ∈ C in this box:
        Let rmin ← min red(c)
        Let rmax ← max red(c)
        Let gmin ← min grn(c)
        Let gmax ← max grn(c)
        Let bmin ← min blu(c)
        Let bmax ← max blu(c)
    Create a new color box: b ← ⟨C, m, rmin, rmax, gmin, gmax, bmin, bmax⟩
    return b.

FindBoxToSplit(B)
    Searches the set of boxes B for a box to split and returns this box, or nil if no splittable box can be found.
    Let Bs be the set of all color boxes that can be split (i.e., contain at least 2 different colors):
        Bs ← { b | b ∈ B ∧ |colors(b)| ≥ 2 }
    if Bs = {} then                                  ▷ no splittable box was found
        return nil.
    else
        Select a box bx ∈ Bs, such that level(bx) is a minimum.
        return bx.

SplitBox(b)
    Splits the color box b at the median plane perpendicular to its longest dimension and returns a pair of new color boxes.
    Let m ← level(b)
    Let d ← FindMaxBoxDimension(b)                   ▷ see Alg. 5.3
    Let C ← colors(b)                                ▷ the set of colors in box b
    From all color samples in C determine xmed as the median of the color distribution along dimension d.
    Partition the set C into two disjoint sets C1 and C2 by splitting at xmed along dimension d.
    Let b1 ← CreateColorBox(C1, m + 1)
    Let b2 ← CreateColorBox(C2, m + 1)
    return (b1, b2).


Algorithm 5.3 Median-cut color quantization (Part 3).

AverageColors(b)
    Returns the average color cavg for the pixels represented by the color box b.
    Let C ← colors(b)                                ▷ the set of colors in box b
    Let n ← 0, rsum ← 0, gsum ← 0, bsum ← 0
    for all c ∈ C do
        Let k ← cnt(c)
        Let n ← n + k
        Let rsum ← rsum + k · red(c)
        Let gsum ← gsum + k · grn(c)
        Let bsum ← bsum + k · blu(c)
    Let ravg ← (1/n) · rsum, gavg ← (1/n) · gsum, bavg ← (1/n) · bsum
    Let cavg ← (ravg, gavg, bavg)
    return cavg.

FindMaxBoxDimension(b)
    Returns the largest dimension of the color box b (Red, Green, or Blue).
    Let sizer = rmax(b) − rmin(b)
    Let sizeg = gmax(b) − gmin(b)
    Let sizeb = bmax(b) − bmin(b)
    Let sizemax = max(sizer, sizeg, sizeb)
    if sizemax = sizer then
        return Red.
    else if sizemax = sizeg then
        return Green.
    else
        return Blue.

QuantizeImage(I, CR)
    Returns a new image with color pixels from I replaced by their closest representative colors in CR.
    Create a new image I′ the same size as I.
    for all image coordinates (u, v) do
        Let c be the color in CR that is “closest” to I(u, v) (e.g., using the Euclidean distance in RGB space).
        I′(u, v) ← c
    return I′.
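The splitting step of Algs. 5.2 and 5.3 can be sketched in Java as follows. This is only a minimal illustration, not the implementation from the book's Website: the class name `MedianSplitSketch`, the representation of each color as an `int[4]` array {red, grn, blu, cnt}, and the rule for placing the cut (the first position at which at least half of the pixel count lies in the lower part, with both parts kept non-empty) are assumptions made for this sketch. The input is assumed to hold at least two distinct colors, as guaranteed by FindBoxToSplit().

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class MedianSplitSketch {

    // Each color is an int[4] = {red, grn, blu, cnt} (hypothetical layout).

    /** Returns 0, 1, or 2 for the longest box axis (red, green, blue). */
    public static int findMaxBoxDimension(List<int[]> colors) {
        int best = 0;
        int bestSize = -1;
        for (int d = 0; d < 3; d++) {
            int min = Integer.MAX_VALUE;
            int max = Integer.MIN_VALUE;
            for (int[] c : colors) {
                min = Math.min(min, c[d]);
                max = Math.max(max, c[d]);
            }
            // strict ">" prefers red over green over blue on ties (as in Alg. 5.3)
            if (max - min > bestSize) {
                bestSize = max - min;
                best = d;
            }
        }
        return best;
    }

    /** Splits the colors along the longest axis near the pixel-count median. */
    public static List<List<int[]>> splitBox(List<int[]> colors) {
        final int d = findMaxBoxDimension(colors);
        List<int[]> sorted = new ArrayList<int[]>(colors);
        sorted.sort(Comparator.comparingInt((int[] c) -> c[d]));

        long total = 0;
        for (int[] c : sorted) {
            total += c[3];
        }

        // advance the cut until the lower part holds at least half of the pixels;
        // the loop bound keeps at least one color in the upper part
        long acc = 0;
        int cut = 1;
        for (int i = 0; i < sorted.size() - 1; i++) {
            acc += sorted.get(i)[3];
            cut = i + 1;
            if (2 * acc >= total) {
                break;
            }
        }

        List<List<int[]>> halves = new ArrayList<List<int[]>>();
        halves.add(new ArrayList<int[]>(sorted.subList(0, cut)));
        halves.add(new ArrayList<int[]>(sorted.subList(cut, sorted.size())));
        return halves;
    }
}
```

For three colors with red values 10, 20, 200 and pixel counts 1, 1, 2, the sketch places the first two colors in the lower part and the third in the upper part, so that each part represents two pixels.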


Figure 5.4 Color distribution after application of the median-cut (a) and octree (b) algorithms. In both cases, the set of 226,321 colors in the original image (Fig. 5.2 (a)) was reduced to 256 representative colors.

duces to a single color point at a certain tree depth d (e.g., d = 8 for a 3 × 8-bit RGB color image). When an image is processed, the corresponding quantization tree, which is initially empty, is created dynamically by evaluating all pixels in a sequence. Each pixel’s color tuple is inserted into the quantization tree, while at the same time the number of nodes is limited to a predefined value K (typically 256). When a new color tuple Ci is inserted and the tree does not contain this color, one of the following situations can occur:

1. If the number of nodes is less than K, a new node is created for Ci.
2. Otherwise (i.e., if the number of nodes is K), the existing nodes at the maximum tree depth (which represent similar colors) are merged into a common node.

A key advantage of the iterative octree method is that the number of color nodes remains limited to K in any step and thus the amount of required storage is small. The final replacement of the image pixels by the quantized color vectors can also be performed easily and efficiently with the octree structure because only up to eight comparisons (one at each tree layer) are necessary to locate the best-matching color for each pixel.

Figure 5.4 shows the resulting color distributions in RGB space after applying the median-cut and octree algorithms. In both cases, the original image (Fig. 5.2 (a)) is quantized to 256 colors. Notice in particular the dense placement of quantized colors in certain regions of the green hues.

For both algorithms and the (scalar) 3:3:2 quantization, the resulting distances between the original pixels and the quantized colors are shown in Fig. 5.5. The greatest error naturally results from 3:3:2 quantization, because this method does not consider the contents of the image at all. Compared with the median-cut method, the overall error for the octree algorithm is smaller, although the latter creates several large deviations, particularly inside the colored foreground regions and the forest region in the background. In general, however, the octree algorithm does not offer significant advantages in terms of the resulting image quality over the simpler median-cut algorithm.

Figure 5.5 Quantization errors. Original image (a), distance between original and quantized color pixels for scalar 3:3:2 quantization (b), median-cut (c), and octree (d) algorithms.
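The descent through the octree (one decision per tree layer, eight layers for 3 × 8-bit RGB) can be illustrated with a short Java sketch. It assumes the bit-slicing convention that is common in octree quantizers but is not spelled out in the text above: at tree level l (0 for the root), the child index is formed from bit 7 − l of the red, green, and blue components. The class and method names are hypothetical.

```java
class OctreeIndexSketch {

    /**
     * Child index (0..7) selected at the given tree level (0 = root, 7 = deepest)
     * for 8-bit components r, g, b. Assumed convention: most significant bit first,
     * red contributes bit 2, green bit 1, blue bit 0 of the index.
     */
    public static int childIndex(int r, int g, int b, int level) {
        int shift = 7 - level;
        int rBit = (r >> shift) & 1;
        int gBit = (g >> shift) & 1;
        int bBit = (b >> shift) & 1;
        return (rBit << 2) | (gBit << 1) | bBit;
    }

    /** The complete root-to-leaf path (eight child indices) for one RGB color. */
    public static int[] path(int r, int g, int b) {
        int[] p = new int[8];
        for (int level = 0; level < 8; level++) {
            p[level] = childIndex(r, g, b, level);
        }
        return p;
    }
}
```

With this convention, two colors share a node at depth l exactly when their components agree in the l most significant bits, which is why nodes at the maximum depth represent similar colors and are natural candidates for merging.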

5.2.4 Other methods for vector quantization

A suitable set of representative color vectors can usually be determined without inspecting all pixels in the original image. It is often sufficient to use only 10% of randomly selected pixels to obtain a high probability that none of the important colors is lost. In addition to the color quantization methods described above, several other procedures and refined algorithms have been proposed. This includes statistical and clustering methods, such as the classical k-means algorithm, but also the use of neural networks and genetic algorithms. A good overview can be found in [67].

5.3 Exercises

Exercise 5.1 Simplify the 3:3:2 quantization given in Prog. 5.1 such that only a single bit mask/shift step is performed for each color component.

Exercise 5.2 The median-cut algorithm for color quantization (Sec. 5.2.2) is implemented in the Independent JPEG Group’s³ libjpeg open source software with the following modification: the choice of the cube to be split next depends alternately on (a) the number of contained image pixels and (b) the cube’s geometric volume. Consider the possible motives and discuss examples where this approach may offer an improvement over the original algorithm.

³ www.ijg.org.

6 Colorimetric Color Spaces

In any application that requires precise, reproducible, and device-independent presentation of colors, the use of calibrated color systems is an absolute necessity. For example, color calibration is routinely used throughout the digital print work ﬂow but also in digital ﬁlm production, professional photography, image databases, etc. One may have experienced how diﬃcult it is, for example, to render a good photograph on a color laser printer, and even the color reproduction on monitors largely depends on the particular manufacturer and computer system. All the color spaces described in Vol. 1 [14, Sec. 8.2] somehow relate to the physical properties of some media device, such as the speciﬁc colors of the phosphor coatings inside a CRT tube or the colors of the inks used for printing. To make colors appear similar or even identical on diﬀerent media modalities, we need a representation that is independent of how a particular device reproduces these colors. Color systems that describe colors in a measurable, device-independent fashion are called colorimetric or calibrated, and the ﬁeld of color science is traditionally concerned with the properties and application of these color systems (see, e. g., [80] or [66] for an overview). While several colorimetric standards exist, we focus on the most widely used CIE systems in the remaining part of this section.

W. Burger, M.J. Burge, Principles of Digital Image Processing, Undergraduate Topics in Computer Science, DOI 10.1007/978-1-84800-195-4_6, © Springer-Verlag London Limited, 2009


Figure 6.1 CIE XYZ color space. The XYZ color space is deﬁned by the three imaginary primary colors X, Y , Z, where the Y dimension corresponds to the perceived luminance. All visible colors are contained inside an open, cone-shaped volume that originates at the black point S (a), where E denotes the axis of neutral (gray) colors. The RGB color space maps to the XYZ space as a linearly distorted cube (b).

6.1 CIE Color Spaces

The XYZ color system, developed by the CIE (Commission Internationale de l’Éclairage)¹ in the 1920s and standardized in 1931, is the foundation of most colorimetric color systems that are in use today [60, p. 22].

¹ International Commission on Illumination (www.cie.co.at).

6.1.1 CIE XYZ color space

The CIE XYZ color scheme was developed after extensive measurements of human visual perception under controlled conditions. It is based on three imaginary primary colors X, Y, Z, which are chosen such that all visible colors can be described as a summation of positive-only components, where the Y component corresponds to the perceived lightness or luminosity of a color. All visible colors lie inside a three-dimensional cone-shaped region (Fig. 6.1 (a)), which interestingly enough does not include the primary colors themselves.

Some common color spaces, and the RGB color space in particular, conveniently relate to XYZ space by a linear coordinate transformation, as described in Sec. 6.3. Thus, as shown in Fig. 6.1 (b), the RGB color space is embedded in the XYZ space as a distorted cube, and therefore straight lines in RGB space map to straight lines in XYZ again. The CIE XYZ scheme is (similar to the RGB color space) nonlinear with respect to human visual perception, that is, a particular fixed distance in XYZ is not perceived as a uniform color change throughout the entire color space. The XYZ coordinates of the RGB color cube (based on the primary colors defined by ITU-R BT.709) are listed in Table 6.1.

Table 6.1 Coordinates of the RGB color cube in CIE XYZ space. The X, Y, Z values refer to standard (ITU-R BT.709) primaries and white point D65 (see Table 6.2), x, y denote the corresponding CIE chromaticity coordinates.

Pt.  Color     R     G     B     X       Y       Z       x       y
S    black     0.00  0.00  0.00  0.0000  0.0000  0.0000  0.3127  0.3290
R    red       1.00  0.00  0.00  0.4125  0.2127  0.0193  0.6400  0.3300
Y    yellow    1.00  1.00  0.00  0.7700  0.9278  0.1385  0.4193  0.5052
G    green     0.00  1.00  0.00  0.3576  0.7152  0.1192  0.3000  0.6000
C    cyan      0.00  1.00  1.00  0.5380  0.7873  1.0694  0.2247  0.3288
B    blue      0.00  0.00  1.00  0.1804  0.0722  0.9502  0.1500  0.0600
M    magenta   1.00  0.00  1.00  0.5929  0.2848  0.9696  0.3209  0.1542
W    white     1.00  1.00  1.00  0.9505  1.0000  1.0888  0.3127  0.3290

6.1.2 CIE x, y chromaticity

As mentioned, the luminance in XYZ color space increases along the Y axis, starting at the black point S located at the coordinate origin (X = Y = Z = 0). The color hue is independent of the luminance and thus independent of the Y value. To describe the corresponding “pure” color hues and saturation in a convenient manner, the CIE system also defines the three chromaticity values

$$ x = \frac{X}{X+Y+Z}, \qquad y = \frac{Y}{X+Y+Z}, \qquad z = \frac{Z}{X+Y+Z}, \tag{6.1} $$

where (obviously) x + y + z = 1 and thus one of the three values (e.g., z) is redundant. Equation (6.1) describes a central projection from X, Y, Z coordinates onto the three-dimensional plane X + Y + Z = 1, with the origin S as the projection center (Fig. 6.2). Thus, for an arbitrary XYZ color point A = (Xa, Ya, Za), the corresponding chromaticity coordinates a = (xa, ya, za) are found by intersecting the line SA with the X + Y + Z = 1 plane (Fig. 6.2 (a)). The final x, y coordinates are the result of projecting these intersection points onto the X/Y-plane (Fig. 6.2 (b)) by simply dropping the Z component za.

The result is the well-known horseshoe-shaped CIE x, y chromaticity diagram, which is shown in Fig. 6.2 (c). Any x, y point in this diagram defines the

Figure 6.2 CIE x, y chromaticity diagram. For an arbitrary XYZ color point A = (Xa, Ya, Za), the chromaticity values a = (xa, ya, za) are obtained by a central projection onto the 3D plane X + Y + Z = 1 (a). The corner points of the RGB cube map to a triangle, and its white point W maps to the (colorless) neutral point E. The intersection points are then projected onto the X/Y plane (b) by simply dropping the Z component, which produces the familiar CIE chromaticity diagram shown in (c). The CIE diagram contains all visible color tones (hues and saturations) but no luminance information, with wavelengths in the range 380–780 nanometers. A particular color space is specified by at least three primary colors (tristimulus values; e.g., R, G, B), which define a triangle (linear hull) containing all representable colors.

hue and saturation of a particular color, but only the colors inside the horseshoe curve are potentially visible. Obviously an infinite number of X, Y, Z colors (with different luminance values) project to the same x, y, z chromaticity values, and the XYZ color coordinates thus cannot be uniquely reconstructed from given chromaticity values. Additional information is required. For example, it is common to specify the visible colors of the CIE system in the form Yxy, where Y is the original luminance component of the XYZ color. Given a pair of chromaticity values x, y (with y > 0) and an arbitrary Y value, the missing X, Z coordinates are obtained (using the definitions in Eqn. (6.1)) as

$$ X = x \cdot \frac{Y}{y}, \qquad Z = z \cdot \frac{Y}{y} = (1 - x - y) \cdot \frac{Y}{y}. \tag{6.2} $$

The CIE diagram not only yields an intuitive layout of color hues but exhibits some remarkable formal properties. The xy values along the outer horseshoe boundary correspond to monochromatic (“spectrally pure”), maximally saturated colors with wavelengths ranging from below 400 nm (purple) up to 780 nm (red). Thus, the position of any color inside the xy diagram can be specified with respect to any of the primary colors at the boundary, except for the points on the connecting line (“purple line”) between 380 and 780 nm, whose purple hues do not correspond to primary colors but can only be generated by mixing other colors.

The saturation of colors falls off continuously toward the “neutral point” (E) at the center of the horseshoe, with x = y = 1/3 (or X = Y = Z = 1, respectively) and zero saturation. All other colorless (i.e., gray) values also map to the neutral point, just as any set of colors with the same hue but different brightness corresponds to a single x, y point. All possible composite colors lie inside the convex hull specified by the coordinates of the primary colors of the CIE diagram and, in particular, complementary colors are located on straight lines that run diagonally through the white point.
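The projection of Eqn. (6.1) and the reconstruction of Eqn. (6.2) can be sketched in a few lines of Java. The class and method names are hypothetical and chosen for this illustration only; the sketch assumes X + Y + Z > 0 for the forward direction and y > 0 for the reverse direction, as required by the equations.

```java
class ChromaticitySketch {

    /** Chromaticity (x, y) of an XYZ color, Eqn. (6.1); requires X + Y + Z > 0. */
    public static double[] xyFromXyz(double X, double Y, double Z) {
        double s = X + Y + Z;
        return new double[] { X / s, Y / s };
    }

    /** XYZ coordinates from luminance Y and chromaticity (x, y), Eqn. (6.2); requires y > 0. */
    public static double[] xyzFromYxy(double Y, double x, double y) {
        double X = x * Y / y;
        double Z = (1.0 - x - y) * Y / y;
        return new double[] { X, Y, Z };
    }
}
```

Applied to the D65 white point (X, Y, Z) = (0.950456, 1.000000, 1.088754), the forward function yields x ≈ 0.3127 and y ≈ 0.3290, and feeding these values back together with Y = 1 recovers the original X and Z.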

6.1.3 Standard illuminants

A central goal of colorimetry is the quantitative measurement of colors in physical reality, which strongly depends on the color properties of the illumination. The CIE system specifies a number of standard illuminants for a variety of real and hypothetical light sources, each specified by a spectral radiant power distribution and the “correlated color temperature” (expressed in kelvin) [80, Sec. 3.3.3]. The following daylight (D) illuminants are particularly important for the design of digital color spaces (Table 6.2):

D50 emulates the spectrum of natural (direct) sunlight with an equivalent color temperature of approximately 5000 K. D50 is the recommended illuminant for viewing reflective images, such as paper prints. In practice, D50 lighting is commonly implemented with fluorescent lamps using multiple phosphors to approximate the specified color spectrum.

D65 has a correlated color temperature of approximately 6500 K and is designed to emulate the average (indirect) daylight observed under an overcast sky on the northern hemisphere. D65 is also used as the reference white for emissive devices, such as display screens.

The standard illuminants serve to specify the ambient viewing light but also to define the reference white points in various color spaces in the CIE color system. For example, the sRGB standard (see Sec. 6.3) refers to D65 as the media white point and D50 as the ambient viewing illuminant. In addition, the CIE system also specifies the range of admissible viewing angles (commonly at ±2°).

Table 6.2 CIE color parameters for the standard illuminants D50 and D65. E denotes the absolute neutral point in CIE XYZ space.

Pt.   Temp.    X         Y         Z         x       y
D50   5000 K   0.964296  1.000000  0.825105  0.3457  0.3585
D65   6500 K   0.950456  1.000000  1.088754  0.3127  0.3290
E     5400 K   1         1         1         1/3     1/3

6.1.4 Gamut

The set of all colors that can be handled by a certain media device or can be represented by a particular color space is called “gamut”. This is usually a contiguous region in the three-dimensional CIE XYZ color space or, reduced to the representable color hues and ignoring the luminance component, a convex region in the two-dimensional CIE chromaticity diagram.

Figure 6.3 illustrates some typical gamut regions inside the CIE diagram. The gamut of an output device mainly depends on the technology employed. For example, ordinary color monitors are typically not capable of displaying all colors of the gamut covered by the corresponding color space (usually sRGB). Conversely, it is also possible that devices would reproduce certain colors that cannot be represented in the utilized color space. Significant deviations exist, for example, between the RGB color space and the gamuts associated with CMYK-based printers. Also, media devices with very large gamuts exist, as demonstrated by the laser display system in Fig. 6.3. Representing such large gamuts and, in particular, transforming between different color representations requires adequately sized color spaces, such as the Adobe-RGB color space or L∗a∗b∗ (described below), which covers the entire visible portion of the CIE diagram.

Figure 6.3 Gamut regions for different color spaces and output devices inside the CIE diagram (axes x, y; regions shown: CIE L∗a∗b∗, laser display, Adobe RGB, sRGB, CMYK; white point D65).

6.1.5 Variants of the CIE color space

The original CIE XYZ color space and the derived xy chromaticity diagram have the disadvantage that color differences are not perceived equally in different regions of the color space. For example, large color changes are perceived in the magenta region for a given shift in XYZ while the change is relatively small in the green region for the same coordinate distance. Several variants of the CIE color space have been developed for different purposes, primarily with the goal of creating perceptually uniform color representations without sacrificing the formal qualities of the CIE reference system. Popular CIE-derived color spaces include CIE YUV, YU′V′, L∗u∗v∗, YCbCr, and particularly L∗a∗b∗, which is described below. In addition, CIE-compliant specifications exist for most common color spaces (see Vol. 1 [14, Sec. 8.2]), which allow more or less dependable conversions between almost any pair of color spaces.

6.2 CIE L∗a∗b∗

The L∗a∗b∗ color model (specified by CIE in 1976) was developed with the goal of linearizing the representation with respect to human color perception and at the same time creating a more intuitive color system. Since then, L∗a∗b∗ has become a popular and widely used color model, particularly for high-quality photographic applications.² It is used, for example, inside Adobe Photoshop as the standard model for converting between different color spaces. The dimensions in this color space are the luminosity L∗ and the two color components a∗, b∗, which specify the color hue and saturation along the green-red and blue-yellow axes, respectively. All three components are relative values and refer to the specified reference white point Cref = (Xref, Yref, Zref). In addition, a nonlinear correction function (similar to the modified gamma correction described in Vol. 1 [14, Sec. 4.7.6]) is applied to all three components, as detailed below.

6.2.1 Transformation CIE XYZ → L∗a∗b∗

Several specifications for converting to and from L∗a∗b∗ space exist that, however, differ marginally and for very small L values only. The current specification for converting between CIE XYZ and L∗a∗b∗ colors is defined by ISO Standard 13655 [42] as follows:

$$ L^* = 116 \cdot Y' - 16, \qquad a^* = 500 \cdot (X' - Y'), \qquad b^* = 200 \cdot (Y' - Z'), \tag{6.3} $$

where

$$ X' = f_1\!\left(\frac{X}{X_{\mathrm{ref}}}\right), \qquad Y' = f_1\!\left(\frac{Y}{Y_{\mathrm{ref}}}\right), \qquad Z' = f_1\!\left(\frac{Z}{Z_{\mathrm{ref}}}\right), $$

and

$$ f_1(c) = \begin{cases} c^{1/3} & \text{for } c > 0.008856, \\ 7.787 \cdot c + \frac{16}{116} & \text{for } c \le 0.008856. \end{cases} $$

Usually D65 is specified as the reference white point (Xref, Yref, Zref) (see Table 6.2). The L∗ values are positive and usually within the range [0, 100] (often scaled to [0, 255]), but may theoretically be greater. The possible values for a∗ and b∗ are in the range [−127, +127].

² Often L∗a∗b∗ is simply referred to as the “Lab” color space.

Table 6.3 CIE L∗a∗b∗ coordinates for selected RGB color points. The X65, Y65, Z65 values relate to the standard (ITU-R BT.709) primaries and white point D65 (see Tables 6.1 and 6.2).

Pt.  Color     R     G     B     X65     Y65     Z65     L∗      a∗      b∗
S    black     0.00  0.00  0.00  0.0000  0.0000  0.0000    0.00    0.00     0.00
R    red       1.00  0.00  0.00  0.4125  0.2127  0.0193   53.24   80.09    67.20
Y    yellow    1.00  1.00  0.00  0.7700  0.9278  0.1385   97.14  −21.55    94.48
G    green     0.00  1.00  0.00  0.3576  0.7152  0.1192   87.74  −86.18    83.18
C    cyan      0.00  1.00  1.00  0.5380  0.7873  1.0694   91.11  −48.09   −14.13
B    blue      0.00  0.00  1.00  0.1804  0.0722  0.9502   32.30   79.19  −107.86
M    magenta   1.00  0.00  1.00  0.5929  0.2848  0.9696   60.32   98.23   −60.83
W    white     1.00  1.00  1.00  0.9505  1.0000  1.0888  100.00    0.00     0.00
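The forward transformation of Eqn. (6.3) can be sketched in Java as follows. This is a minimal, self-contained illustration and not the ColorSpace implementation referred to elsewhere in this chapter; the class name and the hard-wired D65 reference white are assumptions of the sketch.

```java
class XyzToLabSketch {

    // D65 reference white point (Table 6.2); fixed here for simplicity
    static final double XREF = 0.950456;
    static final double YREF = 1.000000;
    static final double ZREF = 1.088754;

    /** Nonlinear correction function f1 of Eqn. (6.3). */
    static double f1(double c) {
        if (c > 0.008856) {
            return Math.cbrt(c);
        }
        return 7.787 * c + 16.0 / 116.0;
    }

    /** Converts CIE XYZ to {L*, a*, b*} relative to D65. */
    public static double[] xyzToLab(double X, double Y, double Z) {
        double xx = f1(X / XREF);
        double yy = f1(Y / YREF);
        double zz = f1(Z / ZREF);
        double L = 116.0 * yy - 16.0;
        double a = 500.0 * (xx - yy);
        double b = 200.0 * (yy - zz);
        return new double[] { L, a, b };
    }
}
```

For the red primary, (X, Y, Z) = (0.412453, 0.212671, 0.019334), the sketch reproduces the corresponding row of Table 6.3 to two decimals, and the D65 white point maps to (100, 0, 0).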

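6.2.2 Transformation L∗a∗b∗ → CIE XYZ

The reverse transformation from L∗a∗b∗ space to XYZ coordinates is defined as follows:

$$ X = X_{\mathrm{ref}} \cdot f_2\!\left(\frac{a^*}{500} + Y'\right), \qquad Y = Y_{\mathrm{ref}} \cdot f_2(Y'), \qquad Z = Z_{\mathrm{ref}} \cdot f_2\!\left(Y' - \frac{b^*}{200}\right), \tag{6.4} $$

where

$$ Y' = \frac{L^* + 16}{116} \qquad \text{and} \qquad f_2(c) = \begin{cases} c^3 & \text{for } c^3 > 0.008856, \\ \dfrac{c - 16/116}{7.787} & \text{for } c^3 \le 0.008856. \end{cases} $$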

The complete Java code for the L∗ a∗ b∗ /XYZ conversion and the implementation of the associated ColorSpace class can be found in Progs. 6.1 and 6.2 (pp. 121– 122). Table 6.3 lists the relation between L∗ a∗ b∗ and XYZ coordinates for selected RGB colors. Figure 6.4 shows the separation of a color image into the corresponding L∗ a∗ b∗ components.
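As a quick, self-contained check of Eqn. (6.4), independent of the ColorSpace implementation mentioned above, the reverse transformation can be sketched as follows. The class name and the fixed D65 reference white are assumptions made for this illustration.

```java
class LabToXyzSketch {

    // D65 reference white point (Table 6.2); fixed here for simplicity
    static final double XREF = 0.950456;
    static final double YREF = 1.000000;
    static final double ZREF = 1.088754;

    /** Inverse correction function f2 of Eqn. (6.4). */
    static double f2(double c) {
        double c3 = c * c * c;
        if (c3 > 0.008856) {
            return c3;
        }
        return (c - 16.0 / 116.0) / 7.787;
    }

    /** Converts {L*, a*, b*} (relative to D65) to CIE XYZ. */
    public static double[] labToXyz(double L, double a, double b) {
        double yy = (L + 16.0) / 116.0;
        double X = XREF * f2(a / 500.0 + yy);
        double Y = YREF * f2(yy);
        double Z = ZREF * f2(yy - b / 200.0);
        return new double[] { X, Y, Z };
    }
}
```

Feeding in the (rounded) L∗a∗b∗ values of the red row in Table 6.3, (53.24, 80.09, 67.20), gives XYZ coordinates that agree with the tabulated (0.4125, 0.2127, 0.0193) to about three decimals; the small residual stems from the rounding of the table entries.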

6.2.3 Measuring color differences

Due to its high uniformity with respect to human color perception, the L∗a∗b∗ color space is a particularly good choice for determining the difference between colors (the same holds for the L∗u∗v∗ space) [29, p. 57]. The difference between two color points C1 and C2 can be found by simply measuring the Euclidean distance in L∗a∗b∗ space,

$$ \mathrm{ColorDist}_{\mathrm{Lab}}(C_1, C_2) = \lVert C_1 - C_2 \rVert = \sqrt{(L_1^* - L_2^*)^2 + (a_1^* - a_2^*)^2 + (b_1^* - b_2^*)^2}, \tag{6.5} $$

where C1 = (L∗1, a∗1, b∗1) and C2 = (L∗2, a∗2, b∗2).

Figure 6.4 L∗a∗b∗ components (L∗, a∗, b∗) shown as grayscale images. The contrast of the a∗ and b∗ images has been increased by 40% for better viewing.
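Equation (6.5) translates directly into code. The following Java sketch uses a hypothetical class and method name and represents each color as a three-element array {L∗, a∗, b∗}, which is an assumption of this illustration.

```java
class LabDistanceSketch {

    /** Euclidean distance between two L*a*b* colors, Eqn. (6.5). */
    public static double colorDistLab(double[] c1, double[] c2) {
        double dL = c1[0] - c2[0];
        double da = c1[1] - c2[1];
        double db = c1[2] - c2[2];
        return Math.sqrt(dL * dL + da * da + db * db);
    }
}
```

For example, the distance between white (100, 0, 0) and black (0, 0, 0) is 100, i.e., the full extent of the L∗ axis.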

6.3 sRGB

CIE-based color spaces such as L∗a∗b∗ (and L∗u∗v∗) are device-independent and have a gamut sufficiently large to represent virtually all visible colors in the CIE XYZ system. However, in many computer-based, display-oriented applications, such as computer graphics or multimedia, the direct use of CIE-based color spaces may be too cumbersome or inefficient.

sRGB (“standard RGB” [41]) was developed (jointly by Hewlett-Packard and Microsoft) with the goal of creating a precisely specified color space for these applications, based on standardized mappings with respect to the colorimetric CIE XYZ color space. This includes precise specifications of the three primary colors, the white reference point, ambient lighting conditions, and gamma values. Interestingly, the sRGB color specification is the same as the one specified many years before for the European PAL/SECAM television standards. Compared to L∗a∗b∗, sRGB exhibits a relatively small gamut (see Fig. 6.3), which, however, includes most colors that can be reproduced by current computer and video monitors. Although sRGB was not designed as a universal color space, its CIE-based specification at least permits more or less exact conversions to and from other color spaces.

Table 6.4 sRGB tristimulus values R, G, B with reference to the white point D65 (W). R, G, B denote the linearized component values (which at 0 and 1 are identical to the nonlinear R′, G′, B′ values).

Pt.  R    G    B    X65       Y65       Z65       x65     y65
R    1.0  0.0  0.0  0.412453  0.212671  0.019334  0.6400  0.3300
G    0.0  1.0  0.0  0.357580  0.715160  0.119193  0.3000  0.6000
B    0.0  0.0  1.0  0.180423  0.072169  0.950227  0.1500  0.0600
W    1.0  1.0  1.0  0.950456  1.000000  1.088754  0.3127  0.3290

Several standard image formats, including EXIF (JPEG) and PNG are based on sRGB color data, which makes sRGB the de facto standard for digital still cameras, color printers, and other imaging devices at the consumer level [34]. sRGB is used as a relatively dependable archive format for digital images, particularly in less demanding applications that do not require (or allow) explicit color management [71]. In particular, sRGB was deﬁned as the standard default color space for Internet/Web applications by the W3C consortium as part of the HTML 4 speciﬁcation [70]. Thus, in practice, working with any RGB color data almost always means dealing with sRGB. It is thus no coincidence that sRGB is also the common color scheme in Java and is extensively supported by the Java standard API (see Sec. 6.6 below). Table 6.4 lists the key parameters of the sRGB color space (i. e., the XYZ coordinates for the primary colors R, G, B and the white point W (D65)), which are deﬁned according to ITU-R BT.709 [44] (see Tables 6.1 and 6.2). Together, these values permit the unambiguous mapping of all other colors in the CIE diagram.

6.3.1 Linear vs. nonlinear color components

sRGB is a nonlinear color space with respect to the XYZ coordinate system, and it is important to carefully distinguish between the linear and nonlinear RGB component values. The nonlinear values (denoted R′, G′, B′) represent the actual color tuples, the data values read from an image file or received from a digital camera. These values are precorrected with a fixed Gamma (≈ 2.2) such that they can be easily viewed on a common color monitor without any additional conversion. The corresponding linear components (denoted R, G, B) relate to the CIE XYZ color space by a linear mapping and can thus be computed from X, Y, Z coordinates and vice versa by simple matrix multiplication,

$$ \begin{pmatrix} R \\ G \\ B \end{pmatrix} = \boldsymbol{M}_{\mathrm{RGB}} \cdot \begin{pmatrix} X \\ Y \\ Z \end{pmatrix} \qquad \text{and} \qquad \begin{pmatrix} X \\ Y \\ Z \end{pmatrix} = \boldsymbol{M}_{\mathrm{RGB}}^{-1} \cdot \begin{pmatrix} R \\ G \\ B \end{pmatrix}, \tag{6.6} $$

respectively, with

$$ \boldsymbol{M}_{\mathrm{RGB}} = \begin{pmatrix} 3.240479 & -1.537150 & -0.498535 \\ -0.969256 & 1.875992 & 0.041556 \\ 0.055648 & -0.204043 & 1.057311 \end{pmatrix}, \tag{6.7} $$

$$ \boldsymbol{M}_{\mathrm{RGB}}^{-1} = \begin{pmatrix} 0.412453 & 0.357580 & 0.180423 \\ 0.212671 & 0.715160 & 0.072169 \\ 0.019334 & 0.119193 & 0.950227 \end{pmatrix}. \tag{6.8} $$

Notice that the three column vectors of M_RGB^{-1} (Eqn. (6.8)) are the coordinates of the primary colors R, G, B (tristimulus values) in XYZ space (cf. Table 6.4) and thus

$$ \boldsymbol{R} = \boldsymbol{M}_{\mathrm{RGB}}^{-1} \cdot \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \qquad \boldsymbol{G} = \boldsymbol{M}_{\mathrm{RGB}}^{-1} \cdot \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \qquad \boldsymbol{B} = \boldsymbol{M}_{\mathrm{RGB}}^{-1} \cdot \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}. \tag{6.9} $$
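The linear mappings of Eqns. (6.6)–(6.9) amount to a 3 × 3 matrix–vector product. The following Java sketch illustrates this with the coefficients listed above; the class and method names are hypothetical, and colors are passed as three-element arrays for brevity.

```java
class LinearRgbXyzSketch {

    // M_RGB of Eqn. (6.7): XYZ -> linear RGB
    static final double[][] M_RGB = {
        { 3.240479, -1.537150, -0.498535 },
        { -0.969256, 1.875992, 0.041556 },
        { 0.055648, -0.204043, 1.057311 }
    };

    // Inverse matrix of Eqn. (6.8): linear RGB -> XYZ
    static final double[][] M_RGB_INV = {
        { 0.412453, 0.357580, 0.180423 },
        { 0.212671, 0.715160, 0.072169 },
        { 0.019334, 0.119193, 0.950227 }
    };

    /** Returns the product m * v for a 3x3 matrix m and a 3-vector v. */
    static double[] multiply(double[][] m, double[] v) {
        double[] result = new double[3];
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) {
                result[i] += m[i][j] * v[j];
            }
        }
        return result;
    }

    public static double[] xyzToLinearRgb(double[] xyz) {
        return multiply(M_RGB, xyz);
    }

    public static double[] linearRgbToXyz(double[] rgb) {
        return multiply(M_RGB_INV, rgb);
    }
}
```

Mapping the linear RGB vector (1, 0, 0) to XYZ returns the first column of the inverse matrix, as stated in Eqn. (6.9), and linear white (1, 1, 1) maps to the D65 white point. Because the coefficients are rounded to six decimals, a round trip through both matrices reproduces the input only up to an error of about 10⁻⁶.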

6.3.2 Transformation CIE XYZ → sRGB

To transform a given XYZ color to sRGB (Fig. 6.5), we first compute the linear R, G, B values by multiplying the (X, Y, Z) coordinate vector with the matrix M_RGB (Eqn. (6.7)),

$$ \begin{pmatrix} R \\ G \\ B \end{pmatrix} = \boldsymbol{M}_{\mathrm{RGB}} \cdot \begin{pmatrix} X \\ Y \\ Z \end{pmatrix}. \tag{6.10} $$

Subsequently, a modified gamma correction (see Vol. 1 [14, Sec. 4.7.6]) with γ = 2.4 (which corresponds to an effective gamma value of ca. 2.2) is applied to the linear R, G, B values,

$$ R' = f_\gamma(R), \qquad G' = f_\gamma(G), \qquad B' = f_\gamma(B), $$

with

$$ f_\gamma(c) = \begin{cases} 1.055 \cdot c^{1/2.4} - 0.055 & \text{for } c > 0.0031308, \\ 12.92 \cdot c & \text{for } c \le 0.0031308. \end{cases} \tag{6.11} $$

The resulting nonlinear sRGB components R′, G′, B′ are limited to the interval [0, 1]. To obtain discrete numbers, the R′, G′, B′ values are finally scaled linearly to the 8-bit integer range [0, 255].
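The gamma correction of Eqn. (6.11) and the final scaling to 8 bits can be sketched in Java as follows. The class and method names are hypothetical; clamping out-of-range values to [0, 1] and rounding to the nearest integer are assumptions of this sketch, since the text above only states that the values are limited and scaled linearly.

```java
class SrgbGammaSketch {

    /** Modified gamma correction f_gamma of Eqn. (6.11): linear -> nonlinear component. */
    public static double gammaFwd(double c) {
        if (c > 0.0031308) {
            return 1.055 * Math.pow(c, 1.0 / 2.4) - 0.055;
        }
        return 12.92 * c;
    }

    /** Limits a nonlinear component to [0, 1] and scales it to the 8-bit range [0, 255]. */
    public static int toByte(double cNonlinear) {
        double clamped = Math.max(0.0, Math.min(1.0, cNonlinear));
        return (int) Math.round(255.0 * clamped);  // rounding is an assumption
    }
}
```

A linear component value of 0.2140 maps to a nonlinear value of approximately 0.5, which illustrates how strongly intermediate values are shifted by the gamma correction, while 0 and 1 remain unchanged.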

6.3.3 Transformation sRGB → CIE XYZ

To compute the reverse transformation from sRGB to XYZ, the given (nonlinear) R′, G′, B′ values (in the range [0, 1]) are first linearized by inverting the gamma correction³ (Eqn. (6.11)),

$$ R = f_\gamma^{-1}(R'), \qquad G = f_\gamma^{-1}(G'), \qquad B = f_\gamma^{-1}(B'), \tag{6.12} $$

with

$$ f_\gamma^{-1}(c') = \begin{cases} \left(\dfrac{c' + 0.055}{1.055}\right)^{2.4} & \text{for } c' > 0.04045, \\ \dfrac{c'}{12.92} & \text{for } c' \le 0.04045. \end{cases} \tag{6.13} $$

The threshold 0.04045 is the image of the threshold in Eqn. (6.11), that is, 12.92 · 0.0031308 ≈ 0.04045, so that the two functions are exact inverses of each other.

Subsequently, the linearized (R, G, B) vector is transformed to XYZ coordinates by multiplication with the inverse of the matrix M_RGB (Eqn. (6.8)); i.e.,

$$ \begin{pmatrix} X \\ Y \\ Z \end{pmatrix} = \boldsymbol{M}_{\mathrm{RGB}}^{-1} \cdot \begin{pmatrix} R \\ G \\ B \end{pmatrix}. \tag{6.14} $$

Table 6.5 lists the nonlinear and the linear RGB component values for selected color points. Note that component values of 0 and 1 are not affected by the gamma correction because these values map to themselves. The coordinates of the extremal points of the RGB color cube are therefore identical in nonlinear and linear RGB spaces. However, intermediate values are strongly affected by the gamma correction, as illustrated by the coordinates for the color points K ... P, which emphasizes the importance of differentiating between linear and nonlinear color coordinates.

Figure 6.5 Color transformation from CIE XYZ to sRGB: (X, Y, Z) → linear mapping M_RGB → (R, G, B) → gamma correction fγ() → (R′, G′, B′).
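The two steps of the reverse transformation (inverse gamma correction followed by the linear mapping of Eqn. (6.14)) can be sketched in Java as follows. The class and method names are hypothetical. Note one deliberate assumption: the sketch derives the branch threshold of the inverse gamma function as 12.92 · 0.0031308 ≈ 0.04045 from the constants of Eqn. (6.11), so that it is the exact inverse of the forward function; the numerical effect of the precise threshold is negligible because both branches nearly coincide in this region.

```java
class SrgbToXyzSketch {

    // Threshold of the forward gamma function (Eqn. (6.11)) and its nonlinear image
    static final double T_LINEAR = 0.0031308;
    static final double T_NONLINEAR = 12.92 * T_LINEAR;  // approx. 0.04045 (assumption, see text)

    // Inverse matrix of Eqn. (6.8): linear RGB -> XYZ
    static final double[][] M_RGB_INV = {
        { 0.412453, 0.357580, 0.180423 },
        { 0.212671, 0.715160, 0.072169 },
        { 0.019334, 0.119193, 0.950227 }
    };

    /** Inverse gamma correction: nonlinear component in [0, 1] -> linear component. */
    public static double gammaInv(double cNonlinear) {
        if (cNonlinear > T_NONLINEAR) {
            return Math.pow((cNonlinear + 0.055) / 1.055, 2.4);
        }
        return cNonlinear / 12.92;
    }

    /** Converts nonlinear sRGB components (each in [0, 1]) to CIE XYZ (D65). */
    public static double[] srgbToXyz(double rNonlinear, double gNonlinear, double bNonlinear) {
        double[] lin = { gammaInv(rNonlinear), gammaInv(gNonlinear), gammaInv(bNonlinear) };
        double[] xyz = new double[3];
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) {
                xyz[i] += M_RGB_INV[i][j] * lin[j];
            }
        }
        return xyz;
    }
}
```

For the color “pink” with nonlinear components (1.0, 0.5, 0.5), the sketch yields linear components of about (1.0, 0.2140, 0.2140) and XYZ coordinates of about (0.5276, 0.3812, 0.2482), in agreement with Table 6.5.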

³ See Eqn. (4.35) in Vol. 1 [14, p. 86] for a general formulation of the inverse modified gamma function.

6.3.4 Calculating with sRGB values

Due to the wide use of sRGB in digital photography, graphics, multimedia, Internet imaging, etc., there is a high probability that a given image is encoded in sRGB colors. If, for example, a JPEG image is opened with ImageJ or Java, the pixel values in the resulting data array are media-oriented (i.e., nonlinear R′, G′, B′ components of the sRGB color space). Unfortunately, this fact is often overlooked by programmers, with the consequence that colors are incorrectly manipulated and reproduced. As a general rule, any arithmetic operation on color values should always be performed on the linearized R, G, B components, which are obtained from the nonlinear R′, G′, B′ values through the inverse gamma function fγ⁻¹ (Eqn. (6.13)) and converted back again with fγ (Eqn. (6.11)).

Table 6.5 CIE XYZ coordinates for selected sRGB colors. The table lists the nonlinear R′, G′, and B′ components, the linearized R, G, and B values, and the corresponding X, Y, and Z coordinates (for white point D65). The linear and nonlinear RGB values are identical for the extremal points of the RGB color cube S … W (top rows) because the gamma correction does not affect 0 and 1 component values. However, intermediate colors (K … P, bottom rows) may exhibit large differences between the nonlinear and linear components (e.g., compare the R′ and R values for R25).

                    sRGB nonlinear      sRGB linearized           CIE XYZ
Pt.   Color         R′     G′    B′     R       G       B         X65     Y65     Z65
S     black         0.00   0.0   0.0    0.0000  0.0000  0.0000    0.0000  0.0000  0.0000
R     red           1.00   0.0   0.0    1.0000  0.0000  0.0000    0.4125  0.2127  0.0193
Y     yellow        1.00   1.0   0.0    1.0000  1.0000  0.0000    0.7700  0.9278  0.1385
G     green         0.00   1.0   0.0    0.0000  1.0000  0.0000    0.3576  0.7152  0.1192
C     cyan          0.00   1.0   1.0    0.0000  1.0000  1.0000    0.5380  0.7873  1.0694
B     blue          0.00   0.0   1.0    0.0000  0.0000  1.0000    0.1804  0.0722  0.9502
M     magenta       1.00   0.0   1.0    1.0000  0.0000  1.0000    0.5929  0.2848  0.9696
W     white         1.00   1.0   1.0    1.0000  1.0000  1.0000    0.9505  1.0000  1.0888
K     50% gray      0.50   0.5   0.5    0.2140  0.2140  0.2140    0.2034  0.2140  0.2330
R75   75% red       0.75   0.0   0.0    0.5225  0.0000  0.0000    0.2155  0.1111  0.0101
R50   50% red       0.50   0.0   0.0    0.2140  0.0000  0.0000    0.0883  0.0455  0.0041
R25   25% red       0.25   0.0   0.0    0.0509  0.0000  0.0000    0.0210  0.0108  0.0010
P     pink          1.00   0.5   0.5    1.0000  0.2140  0.2140    0.5276  0.3812  0.2482

Example: color to grayscale conversion

The principle of converting RGB colors to grayscale values by computing a weighted sum of the color components was described already in Vol. 1 [14, Sec. 8.2.1], where we had simply ignored the issue of possible nonlinearities. As one may have guessed, however, the variables R, G, B, and Y in Eqn. (8.7) of Vol. 1 (p. 203),

$$Y = 0.2125 \cdot R + 0.7154 \cdot G + 0.0721 \cdot B, \tag{6.15}$$

implicitly refer to linear color and gray values, respectively, and not the raw sRGB values! Based on Eqn. (6.15), the correct grayscale conversion from raw (nonlinear) sRGB components R′, G′, B′ is

$$Y' = f_\gamma\bigl(0.2125 \cdot f_\gamma^{-1}(R') + 0.7154 \cdot f_\gamma^{-1}(G') + 0.0721 \cdot f_\gamma^{-1}(B')\bigr), \tag{6.16}$$

with f_γ() and f_γ^{-1}() as defined in Eqns. (6.11) and (6.13). The result (Y′) is again a nonlinear, sRGB-compatible gray value; i.e., the sRGB color tuple (Y′, Y′, Y′) should have the same perceived luminance as the original color (R′, G′, B′).
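The exact conversion of Eqn. (6.16) may be sketched as follows. The names are our own choice. Since the forward gamma function of Eqn. (6.11) is defined outside this section, its form and the threshold 0.0031308 used below are assumptions taken from the sRGB standard, not from the text above.

```java
/** Sketch of the exact sRGB-to-gray conversion of Eqn. (6.16). */
class SrgbGray {

    /** Inverse gamma correction, Eqn. (6.13). */
    static double gammaInv(double c) {
        return (c > 0.03928) ? Math.pow((c + 0.055) / 1.055, 2.4) : c / 12.92;
    }

    /**
     * Forward gamma correction (cf. Eqn. (6.11)). Assumption: standard sRGB
     * form with threshold 0.0031308; Eqn. (6.11) is not reproduced here.
     */
    static double gammaFwd(double c) {
        return (c > 0.0031308) ? 1.055 * Math.pow(c, 1.0 / 2.4) - 0.055 : 12.92 * c;
    }

    /** Takes nonlinear R', G', B' in [0, 1] and returns the nonlinear gray value Y'. */
    static double exactGray(double r, double g, double b) {
        // weighted sum on the linearized components (Eqn. (6.15)) ...
        double y = 0.2125 * gammaInv(r) + 0.7154 * gammaInv(g) + 0.0721 * gammaInv(b);
        // ... and back to a nonlinear, sRGB-compatible value
        return gammaFwd(y);
    }

    public static void main(String[] args) {
        System.out.printf("gray of pure red:  %.4f%n", exactGray(1.0, 0.0, 0.0));
        System.out.printf("gray of 50%% gray: %.4f%n", exactGray(0.5, 0.5, 0.5));
    }
}
```

Because the three weights sum to 1, a color with three identical components is mapped to the same gray value, which is a convenient sanity check for any implementation.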

Note that setting the components of an sRGB color pixel to three arbitrary but identical values Y′, (R′, G′, B′) → (Y′, Y′, Y′), always creates a gray (colorless) pixel, despite the nonlinearities of the sRGB space. This is due to the fact that the gamma correction (Eqns. (6.11) and (6.13)) applies evenly to all three color components and thus any three identical values map to a (linearized) color on the straight gray line between the black point S and the white point W in XYZ space (cf. Fig. 6.1 (b)).

For many applications, however, the following approximation to the exact grayscale conversion in Eqn. (6.16) is sufficient. It works without converting the sRGB values (i.e., directly on the nonlinear R′, G′, B′ components) by computing a linear combination

$$Y' \approx w'_R \cdot R' + w'_G \cdot G' + w'_B \cdot B' \tag{6.17}$$

with a modified set of weights; e.g., w′_R = 0.309, w′_G = 0.609, w′_B = 0.082, as proposed in [58].
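The approximation of Eqn. (6.17) amounts to a single weighted sum on the raw component values. The sketch below uses hypothetical names and, as an additional assumption, shows a variant for 8-bit pixels packed as 0xRRGGBB integers.

```java
/** Sketch of the approximate gray conversion of Eqn. (6.17). */
class SrgbGrayApprox {

    // modified weights for nonlinear components, as quoted in the text
    static final double WR = 0.309, WG = 0.609, WB = 0.082;

    /** Operates directly on nonlinear R', G', B' in [0, 1]. */
    static double approxGray(double r, double g, double b) {
        return WR * r + WG * g + WB * b;
    }

    /** Same computation for an 8-bit pixel packed as 0xRRGGBB (assumed layout). */
    static int approxGray8(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return (int) Math.round(WR * r + WG * g + WB * b);
    }

    public static void main(String[] args) {
        System.out.println(approxGray(1.0, 0.5, 0.5)); // pink (point P)
        System.out.println(approxGray8(0xFF8080));
    }
}
```

No gamma function is evaluated here, which is the reason why this variant is attractive when speed matters more than accuracy.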

6.4 Adobe RGB

A distinct weakness of sRGB is its relatively small gamut, which is limited to the range of colors reproducible by ordinary color monitors. This causes problems, for example, in printing, where larger gamuts are needed, particularly in the green regions. The “Adobe RGB (1998)” [1] color space, developed by Adobe as their own standard, is based on the same general concept as sRGB but exhibits a significantly larger gamut (Fig. 6.3), which extends its use particularly to print applications. Figure 6.6 shows the noted difference between the sRGB and Adobe RGB gamuts in three-dimensional CIE XYZ color space.

The white point of Adobe RGB corresponds to the D65 standard (with x = 0.3127, y = 0.3290), and the gamma value is 2.199 (compared with 2.4 for sRGB) for the forward correction and 1/2.199 for the inverse correction, respectively. The associated file specification provides for a number of different codings (8 to 16-bit integer and 32-bit floating point) for the color components. Adobe RGB is frequently used in professional photography as an alternative to the L∗a∗b∗ color space and for picture archive applications.

Figure 6.6 Gamuts for sRGB (a) and Adobe RGB (b) in the three-dimensional CIE XYZ color space.

6.5 Chromatic Adaptation

The human eye has the capability to interpret colors as being constant under varying viewing conditions and illumination in particular. A white sheet of paper appears white to us in bright daylight as well as under fluorescent lighting, although the spectral composition of the light that enters the eye is completely different in both situations. The CIE color system takes into account the color temperature of the ambient lighting because the exact interpretation of XYZ color values also requires knowledge of the corresponding reference white point. For example, a color value (X, Y, Z) specified with respect to the D50 reference white point is generally perceived differently when reproduced by a D65-based media device, although the absolute (i.e., measured) color is the same. Thus the actual meaning of XYZ values cannot be known without knowing the corresponding white point. This is known as relative colorimetry.

If colors are specified with respect to different white points, for example W1 = (X_W1, Y_W1, Z_W1) and W2 = (X_W2, Y_W2, Z_W2), they can be related by first applying a so-called chromatic adaptation transformation (CAT) [38, Ch. 34] in the XYZ color space. This transformation determines for given color coordinates (X1, Y1, Z1) and the associated white point W1 the new color coordinates (X2, Y2, Z2) relative to the alternate white point W2.

6.5.1 XYZ scaling

The simplest chromatic adaptation method is XYZ scaling, where the individual color coordinates are multiplied by the ratios of the corresponding white point coordinates:

$$X_2 = X_1 \cdot \frac{X_{W2}}{X_{W1}}, \qquad Y_2 = Y_1 \cdot \frac{Y_{W2}}{Y_{W1}}, \qquad Z_2 = Z_1 \cdot \frac{Z_{W2}}{Z_{W1}}. \tag{6.18}$$

For example, to convert colors from a system based on the white point W1 = D65 to a system relative to W2 = D50 (see Table 6.2), the resulting transformation is

$$\begin{aligned}
X_{50} &= X_{65} \cdot \frac{X_{D50}}{X_{D65}} = X_{65} \cdot \frac{0.964296}{0.950456} = X_{65} \cdot 1.01456, \\
Y_{50} &= Y_{65} \cdot \frac{Y_{D50}}{Y_{D65}} = Y_{65} \cdot \frac{1.000000}{1.000000} = Y_{65}, \\
Z_{50} &= Z_{65} \cdot \frac{Z_{D50}}{Z_{D65}} = Z_{65} \cdot \frac{0.825105}{1.088754} = Z_{65} \cdot 0.757843.
\end{aligned} \tag{6.19}$$

This form of scaling color coordinates in XYZ space is usually not considered a good color adaptation model and is not recommended for high-quality applications.
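XYZ scaling as in Eqn. (6.18) is easily expressed in code. The following sketch uses class and method names of our own; the white point coordinates are those appearing in Eqn. (6.19).

```java
/** Sketch of chromatic adaptation by XYZ scaling, Eqn. (6.18). */
class XyzScaling {

    // white point coordinates as used in Eqn. (6.19)
    static final double[] D65 = {0.950456, 1.000000, 1.088754};
    static final double[] D50 = {0.964296, 1.000000, 0.825105};

    /** Adapts XYZ coordinates given relative to white w1 to the new white w2. */
    static double[] adapt(double[] xyz, double[] w1, double[] w2) {
        double[] out = new double[3];
        for (int i = 0; i < 3; i++) {
            out[i] = xyz[i] * w2[i] / w1[i];
        }
        return out;
    }

    public static void main(String[] args) {
        // sRGB red primary relative to D65 (Table 6.5), adapted to D50
        double[] red50 = adapt(new double[] {0.4125, 0.2127, 0.0193}, D65, D50);
        System.out.printf("%.4f %.4f %.4f%n", red50[0], red50[1], red50[2]);
    }
}
```

By construction, the white point W1 itself is mapped exactly to W2; all other colors are merely stretched along the three coordinate axes.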

6.5.2 Bradford adaptation

The most common chromatic adaptation models are based on scaling the color coordinates not directly in XYZ but in a “virtual” R∗G∗B∗ color space obtained from the XYZ values by a linear transformation

$$\begin{pmatrix} R^* \\ G^* \\ B^* \end{pmatrix} = M_{CAT} \cdot \begin{pmatrix} X \\ Y \\ Z \end{pmatrix}, \tag{6.20}$$

where M_CAT is a 3 × 3 transformation matrix (defined below). After appropriate scaling, the R∗G∗B∗ coordinates are transformed back to XYZ, so the complete adaptation transform from color coordinates X1, Y1, Z1 (w.r.t. white point W1) to the new color coordinates X2, Y2, Z2 (w.r.t. white point W2) takes the form

$$\begin{pmatrix} X_2 \\ Y_2 \\ Z_2 \end{pmatrix} = M_{CAT}^{-1} \cdot \begin{pmatrix} \frac{R^*_{W2}}{R^*_{W1}} & 0 & 0 \\ 0 & \frac{G^*_{W2}}{G^*_{W1}} & 0 \\ 0 & 0 & \frac{B^*_{W2}}{B^*_{W1}} \end{pmatrix} \cdot M_{CAT} \cdot \begin{pmatrix} X_1 \\ Y_1 \\ Z_1 \end{pmatrix}, \tag{6.21}$$

where $R^*_{W2}/R^*_{W1}$, $G^*_{W2}/G^*_{W1}$, $B^*_{W2}/B^*_{W1}$ are the (constant) ratios of the R∗G∗B∗ values of the white points W2, W1, respectively; i.e.,

$$\begin{pmatrix} R^*_{W1} \\ G^*_{W1} \\ B^*_{W1} \end{pmatrix} = M_{CAT} \cdot \begin{pmatrix} X_{W1} \\ Y_{W1} \\ Z_{W1} \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} R^*_{W2} \\ G^*_{W2} \\ B^*_{W2} \end{pmatrix} = M_{CAT} \cdot \begin{pmatrix} X_{W2} \\ Y_{W2} \\ Z_{W2} \end{pmatrix}.$$

The popular “Bradford” model [38, p. 590] for chromatic adaptation specifies the transformation matrix

$$M_{CAT} = \begin{pmatrix} 0.8951 & 0.2664 & -0.1614 \\ -0.7502 & 1.7135 & 0.0367 \\ 0.0389 & -0.0685 & 1.0296 \end{pmatrix}. \tag{6.22}$$

Inserting this particular M_CAT matrix in Eqn. (6.21) gives the complete chromatic adaptation. For example, the resulting transformation for converting from D65-based to D50-based colors (i.e., W1 = D65, W2 = D50, as listed in Table 6.2) is

$$\begin{pmatrix} X_{50} \\ Y_{50} \\ Z_{50} \end{pmatrix} = M_{50|65} \cdot \begin{pmatrix} X_{65} \\ Y_{65} \\ Z_{65} \end{pmatrix} = \begin{pmatrix} 1.047884 & 0.022928 & -0.050149 \\ 0.029603 & 0.990437 & -0.017059 \\ -0.009235 & 0.015042 & 0.752085 \end{pmatrix} \cdot \begin{pmatrix} X_{65} \\ Y_{65} \\ Z_{65} \end{pmatrix}, \tag{6.23}$$

and conversely from D50-based to D65-based colors (i.e., W1 = D50, W2 = D65),

$$\begin{pmatrix} X_{65} \\ Y_{65} \\ Z_{65} \end{pmatrix} = M_{65|50} \cdot \begin{pmatrix} X_{50} \\ Y_{50} \\ Z_{50} \end{pmatrix} = M_{50|65}^{-1} \cdot \begin{pmatrix} X_{50} \\ Y_{50} \\ Z_{50} \end{pmatrix} = \begin{pmatrix} 0.955513 & -0.023079 & 0.063190 \\ -0.028348 & 1.009992 & 0.021019 \\ 0.012300 & -0.020484 & 1.329993 \end{pmatrix} \cdot \begin{pmatrix} X_{50} \\ Y_{50} \\ Z_{50} \end{pmatrix}. \tag{6.24}$$

Fig. 6.7 illustrates the effects of adaptation from the D65 white point to D50 in the CIE x, y chromaticity diagram. A short list of corresponding color coordinates is given in Table 6.6. The Bradford model is a widely used chromatic adaptation scheme but several similar procedures have been proposed (see also Exercise 6.1). Generally speaking, chromatic adaptation and related problems have a long history in color engineering and are still active fields of scientific research [80, Sec. 5.12].
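The construction of a complete adaptation matrix from Eqns. (6.21) and (6.22) can be sketched as shown below. This is an illustrative sketch with names of our own (it is not the BradfordAdaptation class from the book's source code); the 3 × 3 inverse is computed by the adjugate formula, and the white points are those of Eqn. (6.19).

```java
/** Sketch: composing the Bradford adaptation matrix of Eqn. (6.21). */
class BradfordSketch {

    static final double[][] M_CAT = {          // Eqn. (6.22)
        { 0.8951,  0.2664, -0.1614},
        {-0.7502,  1.7135,  0.0367},
        { 0.0389, -0.0685,  1.0296}
    };
    static final double[] D65 = {0.950456, 1.000000, 1.088754};
    static final double[] D50 = {0.964296, 1.000000, 0.825105};

    static double[] apply(double[][] m, double[] v) {
        double[] r = new double[3];
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++)
                r[i] += m[i][j] * v[j];
        return r;
    }

    static double[][] multiply(double[][] a, double[][] b) {
        double[][] r = new double[3][3];
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++)
                for (int k = 0; k < 3; k++)
                    r[i][j] += a[i][k] * b[k][j];
        return r;
    }

    /** Inverse of a 3x3 matrix via its adjugate and determinant. */
    static double[][] invert(double[][] m) {
        double a = m[0][0], b = m[0][1], c = m[0][2];
        double d = m[1][0], e = m[1][1], f = m[1][2];
        double g = m[2][0], h = m[2][1], i = m[2][2];
        double det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g);
        return new double[][] {
            {(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det},
            {(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det},
            {(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det}
        };
    }

    /** Adaptation matrix from white point w1 to white point w2, Eqn. (6.21). */
    static double[][] adaptationMatrix(double[] w1, double[] w2) {
        double[] rgb1 = apply(M_CAT, w1);   // R*, G*, B* of white point W1
        double[] rgb2 = apply(M_CAT, w2);   // R*, G*, B* of white point W2
        double[][] scale = new double[3][3];
        for (int k = 0; k < 3; k++) {
            scale[k][k] = rgb2[k] / rgb1[k];
        }
        return multiply(invert(M_CAT), multiply(scale, M_CAT));
    }

    public static void main(String[] args) {
        for (double[] row : adaptationMatrix(D65, D50)) {
            System.out.printf("%9.6f %9.6f %9.6f%n", row[0], row[1], row[2]);
        }
    }
}
```

Called with (D65, D50), the method should reproduce the matrix M_50|65 of Eqn. (6.23) up to small rounding differences; swapping the arguments yields M_65|50. The same code can be reused with other M_CAT matrices.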

6.6 Colorimetric Support in Java

6.6.1 sRGB colors in Java

sRGB is the standard color space in Java; i.e., the components of color objects and RGB color images are gamma-corrected, nonlinear R′, G′, B′ values (see Fig. 6.5). The nonlinear R′, G′, B′ values are related to the linear R, G, B values by a modified gamma correction, as specified by the sRGB standard (Eqns. (6.11) and (6.13)).

Table 6.6 Bradford chromatic adaptation from white point D65 to D50 for selected sRGB colors. The XYZ coordinates X65, Y65, Z65 relate to the original white point D65 (W1). X50, Y50, Z50 are the corresponding coordinates for the new white point D50 (W2), obtained with the Bradford adaptation according to Eqn. (6.23).

                    sRGB                XYZ (D65)                 XYZ (D50)
Pt.   Color         R′     G′    B′     X65     Y65     Z65       X50     Y50     Z50
S     black         0.00   0.0   0.0    0.0000  0.0000  0.0000    0.0000  0.0000  0.0000
R     red           1.00   0.0   0.0    0.4125  0.2127  0.0193    0.4361  0.2225  0.0139
Y     yellow        1.00   1.0   0.0    0.7700  0.9278  0.1385    0.8212  0.9394  0.1110
G     green         0.00   1.0   0.0    0.3576  0.7152  0.1192    0.3851  0.7169  0.0971
C     cyan          0.00   1.0   1.0    0.5380  0.7873  1.0694    0.5282  0.7775  0.8112
B     blue          0.00   0.0   1.0    0.1804  0.0722  0.9502    0.1431  0.0606  0.7141
M     magenta       1.00   0.0   1.0    0.5929  0.2848  0.9696    0.5792  0.2831  0.7280
W     white         1.00   1.0   1.0    0.9505  1.0000  1.0888    0.9643  1.0000  0.8251
K     50% gray      0.50   0.5   0.5    0.2034  0.2140  0.2330    0.2064  0.2140  0.1766
R75   75% red       0.75   0.0   0.0    0.2155  0.1111  0.0101    0.2279  0.1163  0.0073
R50   50% red       0.50   0.0   0.0    0.0883  0.0455  0.0041    0.0933  0.0476  0.0030
R25   25% red       0.25   0.0   0.0    0.0210  0.0108  0.0010    0.0222  0.0113  0.0007
P     pink          1.00   0.5   0.5    0.5276  0.3812  0.2482    0.5492  0.3889  0.1876

Figure 6.7 Bradford chromatic adaptation from white point D65 to D50. [CIE x, y chromaticity diagram with both axes ranging from 0.0 to 1.0, showing the primaries R, G, B and the white points D50 and D65.] The solid triangle represents the original RGB gamut for white point D65, with the primaries (R, G, B) located at the corner points. The dashed triangle is the corresponding gamut after chromatic adaptation to white point D50.

6.6.2 Profile connection space (PCS)

The Java API (AWT) provides classes for representing color objects and color spaces, together with a rich set of corresponding methods. Java’s color system is designed after the ICC⁴ “color management architecture”, which uses a CIE XYZ-based device-independent color space called the “profile connection space” (PCS) [40, 43]. The PCS color space is used as the intermediate reference for converting colors between different color spaces. The ICC standard defines device profiles (see Sec. 6.6.5) that specify the transforms to convert between a device’s color space and the PCS. The advantage of this approach is that for any given device only a single color transformation (profile) must be specified to convert between device-specific colors and the unified, colorimetric profile connection space.

Every ColorSpace class (or subclass) provides the methods fromCIEXYZ() and toCIEXYZ() to convert between device color values and XYZ coordinates in the standardized PCS. Figure 6.8 illustrates the principal application of ColorSpace objects for converting colors between different color spaces in Java using the XYZ space as a common “hub”.

Different from the sRGB specification, the ICC specifies D50 (and not D65) as the illuminant white point for its default PCS color space (see Table 6.2). The reason is that the ICC standard was developed primarily for color management in photography, graphics, and printing, where D50 is normally used as the reflective media white point. The Java methods fromCIEXYZ() and toCIEXYZ() thus take and return X, Y, Z color coordinates that are relative to the D50 white point. The resulting coordinates for the primary colors (listed in Table 6.7) are different from the ones given for white point D65 (see Table 6.4)! This is a frequent cause of confusion since the sRGB component values are D65-based (as specified by the sRGB standard) but Java’s XYZ values are relative to D50.

Chromatic adaptation (see Sec. 6.5) is used to convert between XYZ color coordinates that are measured with respect to different white points. The ICC specification [40] recommends a linear chromatic adaptation based on the Bradford model to convert between the D65-related XYZ coordinates (X65, Y65, Z65) and D50-related values (X50, Y50, Z50). This is also implemented by the Java API.

Table 6.7 Color coordinates for sRGB primaries and the white point in Java’s default XYZ color space. The white point W is equal to D50.

Pt.   R     G     B     X50       Y50       Z50       x50     y50
R     1.0   0.0   0.0   0.436108  0.222517  0.013931  0.6484  0.3309
G     0.0   1.0   0.0   0.385120  0.716873  0.097099  0.3212  0.5978
B     0.0   0.0   1.0   0.143064  0.060610  0.714075  0.1559  0.0660
W     1.0   1.0   1.0   0.964296  1.000000  0.825106  0.3457  0.3585

⁴ International Color Consortium (ICC, www.color.org).

Figure 6.8 XYZ-based color conversion in Java. [Diagram: the ColorSpace objects CS_sRGB (nonlinear sRGB, R′G′B′, D65), CS_LINEAR_RGB (linear sRGB, RGB, D65), and Lab_ColorSpace (L∗a∗b∗, D65) are each connected to the profile connection space XYZ (D50) by a pair of conversions labeled toXYZ() and fromXYZ().] ColorSpace objects implement the methods fromCIEXYZ() and toCIEXYZ() to convert color vectors from and to the CIE XYZ color space, respectively. Colorimetric transformations between color spaces can be accomplished as a two-step process via the XYZ space. For example, to convert from sRGB to L∗a∗b∗, the sRGB color is first converted to XYZ and subsequently from XYZ to L∗a∗b∗. Notice that Java’s standard XYZ color space is based on the D50 white point, while most common color spaces refer to D65.

The complete mapping between the linearized sRGB color values (R, G, B) and the D50-based (X50, Y50, Z50) coordinates can be expressed as a linear transformation composed of the RGB→XYZ65 transformation by the inverse of the matrix M_RGB (Eqns. (6.7) and (6.8)) and the chromatic adaptation transformation XYZ65→XYZ50 defined by the matrix M_50|65 (Eqn. (6.23)),

$$\begin{aligned}
\begin{pmatrix} X_{50} \\ Y_{50} \\ Z_{50} \end{pmatrix}
&= M_{50|65} \cdot M_{RGB}^{-1} \cdot \begin{pmatrix} R \\ G \\ B \end{pmatrix}
 = \bigl(M_{RGB} \cdot M_{65|50}\bigr)^{-1} \cdot \begin{pmatrix} R \\ G \\ B \end{pmatrix} \\
&= \begin{pmatrix} 0.436131 & 0.385147 & 0.143033 \\ 0.222527 & 0.716878 & 0.060600 \\ 0.013926 & 0.097080 & 0.713871 \end{pmatrix} \cdot \begin{pmatrix} R \\ G \\ B \end{pmatrix},
\end{aligned} \tag{6.25}$$

and, in the reverse direction,

$$\begin{pmatrix} R \\ G \\ B \end{pmatrix} = M_{RGB} \cdot M_{65|50} \cdot \begin{pmatrix} X_{50} \\ Y_{50} \\ Z_{50} \end{pmatrix} = \begin{pmatrix} 3.133660 & -1.617140 & -0.490588 \\ -0.978808 & 1.916280 & 0.033444 \\ 0.071979 & -0.229051 & 1.405840 \end{pmatrix} \cdot \begin{pmatrix} X_{50} \\ Y_{50} \\ Z_{50} \end{pmatrix}. \tag{6.26}$$

Equations (6.25) and (6.26) are the transformations implemented by the methods toCIEXYZ() and fromCIEXYZ(), respectively, for Java’s default sRGB ColorSpace class. Of course, these methods must also perform the necessary gamma correction between the linear R, G, B components and the actual (nonlinear) sRGB values R′, G′, B′. Figure 6.9 illustrates the complete transformation from D50-based PCS coordinates to nonlinear sRGB values.

Figure 6.9 Transformation from D50-based PCS coordinates (X50, Y50, Z50) to nonlinear sRGB values (R′, G′, B′). [Diagram: PCS (X50, Y50, Z50) → chromatic adaptation M_65|50 → (X65, Y65, Z65) → XYZ to linear RGB by M_RGB → (R, G, B) → f_γ applied to each component → sRGB (R′, G′, B′).]
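The linear part of this mapping can be tried out without the AWT classes by using the two matrices of Eqns. (6.25) and (6.26) directly. The sketch below (with names of our own) works on linear R, G, B values only; the gamma correction step is deliberately left out.

```java
/** Sketch: linear sRGB <-> D50-based PCS coordinates, Eqns. (6.25) and (6.26). */
class SrgbPcsSketch {

    static final double[][] RGB_TO_XYZ50 = {   // Eqn. (6.25)
        {0.436131, 0.385147, 0.143033},
        {0.222527, 0.716878, 0.060600},
        {0.013926, 0.097080, 0.713871}
    };
    static final double[][] XYZ50_TO_RGB = {   // Eqn. (6.26)
        { 3.133660, -1.617140, -0.490588},
        {-0.978808,  1.916280,  0.033444},
        { 0.071979, -0.229051,  1.405840}
    };

    static double[] apply(double[][] m, double[] v) {
        double[] r = new double[3];
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++)
                r[i] += m[i][j] * v[j];
        return r;
    }

    /** Linear (R, G, B) to (X50, Y50, Z50). */
    static double[] toPcs(double[] linearRgb) {
        return apply(RGB_TO_XYZ50, linearRgb);
    }

    /** (X50, Y50, Z50) to linear (R, G, B). */
    static double[] fromPcs(double[] xyz50) {
        return apply(XYZ50_TO_RGB, xyz50);
    }

    public static void main(String[] args) {
        double[] white = toPcs(new double[] {1.0, 1.0, 1.0});
        System.out.printf("%.4f %.4f %.4f%n", white[0], white[1], white[2]);
    }
}
```

Linear white (1, 1, 1) should map to a point very close to the D50 white point of Table 6.7, and applying fromPcs() to the result of toPcs() should return the original components up to the rounding of the matrix entries.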

6.6.3 Color-related Java classes

The Java standard API offers extensive support for working with colors and color images. The most important classes contained in the Java AWT package are:

– Color: defines individual color objects.
– ColorSpace: specifies the properties of entire color spaces.
– ColorModel: describes the structure of color images; e.g., full-color images or indexed-color images, as used in Vol. 1 [14, Sec. 8.1.2] (see Prog. 8.3 on p. 196).

Class Color (java.awt.Color)

An object of class Color describes a particular color in the associated color space, which defines the number and type of the color components. Color objects are primarily used for graphic operations, such as to specify the color for drawing or filling graphic objects. Unless the color space is explicitly specified, new Color objects are created as sRGB colors. The arguments passed to the Color constructor methods may be either float components in the range [0, 1] or integers in the range [0, 255], as demonstrated by the following example:

    Color pink = new Color(1.0f, 0.5f, 0.5f);
    Color blue = new Color(0, 0, 255);

Note that in both cases the arguments are interpreted as nonlinear sRGB values (R′, G′, B′). Other constructor methods exist for class Color that also accept alpha (transparency) values. In addition, the Color class offers two useful static methods, RGBtoHSB() and HSBtoRGB(), for converting between sRGB and HSV⁵ colors (see Sec. 8.2.3 of Vol. 1 [14, p. 209]).

⁵ The HSV color space is referred to as “HSB” (hue, saturation, brightness) in the Java API.

Class ColorSpace (java.awt.color.ColorSpace)

An object of type ColorSpace represents an entire color space, such as sRGB or CMYK. Every subclass of ColorSpace (which itself is an abstract class) provides methods for converting its native colors to the CIE XYZ and sRGB color space and vice versa, such that conversions between arbitrary color spaces can easily be performed (through Java’s XYZ-based profile connection space). In the following example, we first create an instance of the default sRGB color space by invoking the static method ColorSpace.getInstance() and subsequently convert an sRGB color object (pink) to the corresponding (X50, Y50, Z50) coordinates in Java’s (D50-based) CIE XYZ profile connection space:

    // create an sRGB color space object:
    ColorSpace sRGBcsp =
        ColorSpace.getInstance(ColorSpace.CS_sRGB);
    float[] pink_RGB = new float[] {1.0f, 0.5f, 0.5f};
    // convert from sRGB to XYZ:
    float[] pink_XYZ = sRGBcsp.toCIEXYZ(pink_RGB);

Notice that color vectors are represented as float[] arrays for color conversions with ColorSpace objects. If required, the method getComponents() can be used to convert Color objects to float[] arrays. In summary, the types of color spaces that can be created with the ColorSpace.getInstance() method include:

– CS_sRGB: the standard (D65-based) RGB color space with nonlinear R′, G′, B′ components, as specified in [41],
– CS_LINEAR_RGB: color space with linear R, G, B components (i.e., no gamma correction applied),
– CS_GRAY: single-component color space with linear grayscale values,
– CS_PYCC: Kodak’s Photo YCC color space,
– CS_CIEXYZ: the default XYZ profile connection space (based on the D50 white point).

The color space objects returned by getInstance() are all instances of ICC_ColorSpace, which is the only implementation of (the abstract class) ColorSpace provided by the Java standard API. Other color spaces can be implemented by creating additional implementations (subclasses) of ColorSpace, as demonstrated for L∗a∗b∗ in Sec. 6.6.4.
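As a small illustration of the list above, the following sketch queries some basic properties of the predefined color spaces. It relies only on methods of java.awt.color.ColorSpace (getInstance(), getNumComponents(), getType(), isCS_sRGB()); the class name and helper method are our own, and CS_PYCC is left out here merely to keep the example short.

```java
import java.awt.color.ColorSpace;

/** Sketch: inspecting some of Java's predefined color spaces. */
class StandardColorSpaces {

    /** Returns the number of components of the predefined color space 'id'. */
    static int components(int id) {
        return ColorSpace.getInstance(id).getNumComponents();
    }

    public static void main(String[] args) {
        int[] ids = {ColorSpace.CS_sRGB, ColorSpace.CS_LINEAR_RGB,
                     ColorSpace.CS_GRAY, ColorSpace.CS_CIEXYZ};
        String[] names = {"CS_sRGB", "CS_LINEAR_RGB", "CS_GRAY", "CS_CIEXYZ"};
        for (int k = 0; k < ids.length; k++) {
            ColorSpace cs = ColorSpace.getInstance(ids[k]);
            System.out.println(names[k]
                + ": components = " + cs.getNumComponents()
                + ", is sRGB = " + cs.isCS_sRGB());
        }
    }
}
```

Note that the color space type returned by getType() (e.g., TYPE_RGB, TYPE_GRAY, TYPE_XYZ) identifies only the family of the color space and says nothing about gamma correction or the white point.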

6.6.4 An L∗a∗b∗ color space implementation

In the following, we show a complete implementation of the L∗a∗b∗ color space, which is not available in the current Java API, based on the specification given in Sec. 6.2. For this purpose, we define a subclass of ColorSpace (defined in the package java.awt.color) named LabColorSpace, which implements the required methods toCIEXYZ(), fromCIEXYZ() for converting to and from Java’s default profile connection space, respectively, and toRGB(), fromRGB() for converting between L∗a∗b∗ and sRGB (Progs. 6.1 and 6.2). These conversions are performed in two steps via XYZ coordinates, where care must be taken regarding the right choice of the associated white point (L∗a∗b∗ is based on D65 and Java XYZ on D50). The following examples demonstrate the principal use of the new LabColorSpace class:

    ColorSpace LABcsp = new LabColorSpace();
    float[] cyan_sRGB = {0.0f, 1.0f, 1.0f};

    // sRGB→L*a*b*:
    float[] cyan_LAB = LABcsp.fromRGB(cyan_sRGB);

    // L*a*b*→XYZ:
    float[] cyan_XYZ = LABcsp.toCIEXYZ(cyan_LAB);

     1  public class LabColorSpace extends ColorSpace {
     2
     3    // D65 reference white point
     4    static final double Xref = Illuminant.D65.X; // 0.950456;
     5    static final double Yref = Illuminant.D65.Y; // 1.000000;
     6    static final double Zref = Illuminant.D65.Z; // 1.088754;
     7
     8    // create two chromatic adaptation objects
     9    ChromaticAdaptation catD65toD50 =
    10      new BradfordAdaptation(Illuminant.D65, Illuminant.D50);
    11    ChromaticAdaptation catD50toD65 =
    12      new BradfordAdaptation(Illuminant.D50, Illuminant.D65);
    13
    14    // sRGB color space for methods toRGB() and fromRGB()
    15    static final ColorSpace sRGBcs
    16      = ColorSpace.getInstance(CS_sRGB);
    17
    18    // constructor method:
    19    public LabColorSpace(){
    20      super(TYPE_Lab,3);
    21    }
    22
    23    // XYZ→CIELab: returns D65-related L*a*b values
    24    // from D50-related XYZ values:
    25    public float[] fromCIEXYZ(float[] XYZ50) {
    26      float[] XYZ65 = catD50toD65.apply(XYZ50);
    27      double xx = f1(XYZ65[0] / Xref);
    28      double yy = f1(XYZ65[1] / Yref);
    29      double zz = f1(XYZ65[2] / Zref);
    30
    31      float L = (float)(116 * yy - 16);
    32      float a = (float)(500 * (xx - yy));
    33      float b = (float)(200 * (yy - zz));
    34      return new float[] {L, a, b};
    35    }
    36
    37    // continued...

Program 6.1 Java implementation of the L∗a∗b∗ color space (Part 1). LabColorSpace is a subclass of the standard AWT class ColorSpace. The conversion from the profile connection space (XYZ) to L∗a∗b∗ (Eqn. (6.3)) is implemented by the method fromCIEXYZ(), where a chromatic adaptation from D50 to D65 is applied first (line 26). The auxiliary method f1() (defined in Prog. 6.2, line 52) performs the required gamma correction (lines 27–29). The definitions of the classes Illuminant, ChromaticAdaptation, and BradfordAdaptation can be found in the source code section of the book’s Website.
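The arithmetic at the core of this class can also be examined in isolation. The following sketch is a simplification under stated assumptions: it works directly on D65-related XYZ values (no chromatic adaptation and no ColorSpace subclass), uses names of our own, and takes the constants 0.008856, 7.787, and 16/116 from the listings of Progs. 6.1 and 6.2.

```java
/** Sketch: D65-related XYZ <-> L*a*b* without chromatic adaptation. */
class LabSketch {

    static final double[] D65 = {0.950456, 1.000000, 1.088754};

    /** Forward correction, as in method f1() of the listing. */
    static double f1(double c) {
        return (c > 0.008856) ? Math.cbrt(c) : 7.787 * c + 16.0 / 116;
    }

    /** Inverse correction, as in method f2() of the listing. */
    static double f2(double c) {
        double c3 = c * c * c;
        return (c3 > 0.008856) ? c3 : (c - 16.0 / 116) / 7.787;
    }

    static double[] xyz65ToLab(double[] xyz) {
        double xx = f1(xyz[0] / D65[0]);
        double yy = f1(xyz[1] / D65[1]);
        double zz = f1(xyz[2] / D65[2]);
        return new double[] {116 * yy - 16, 500 * (xx - yy), 200 * (yy - zz)};
    }

    static double[] labToXyz65(double[] lab) {
        double yy = (lab[0] + 16) / 116;
        return new double[] {
            D65[0] * f2(lab[1] / 500 + yy),
            D65[1] * f2(yy),
            D65[2] * f2(yy - lab[2] / 200)
        };
    }

    public static void main(String[] args) {
        double[] lab = xyz65ToLab(new double[] {0.2034, 0.2140, 0.2330}); // 50% gray
        System.out.printf("L=%.2f a=%.2f b=%.2f%n", lab[0], lab[1], lab[2]);
    }
}
```

The reference white itself maps to L∗ = 100 and a∗ = b∗ = 0, and a conversion to L∗a∗b∗ and back should return the original XYZ values; both properties are useful checks when modifying the code.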

    38    // class LabColorSpace (continued)
    39
    40    // CIELab→XYZ: returns D50-related XYZ values
    41    // from D65-related L*a*b* values:
    42    public float[] toCIEXYZ(float[] Lab) {
    43      double yy = ( Lab[0] + 16 ) / 116;
    44      float X65 = (float) (Xref * f2(Lab[1] / 500 + yy));
    45      float Y65 = (float) (Yref * f2(yy));
    46      float Z65 = (float) (Zref * f2(yy - Lab[2] / 200));
    47      float[] XYZ65 = new float[] {X65, Y65, Z65};
    48      return catD65toD50.apply(XYZ65);
    49    }
    50
    51    // Gamma correction (forward)
    52    double f1 (double c) {
    53      if (c > 0.008856)
    54        return Math.pow(c, 1.0 / 3);
    55      else
    56        return (7.787 * c) + (16.0 / 116);
    57    }
    58
    59    // Gamma correction (inverse)
    60    double f2 (double c) {
    61      double c3 = Math.pow(c, 3.0);
    62      if (c3 > 0.008856)
    63        return c3;
    64      else
    65        return (c - 16.0 / 116) / 7.787;
    66    }
    67
    68    // sRGB→CIELab
    69    public float[] fromRGB(float[] sRGB) {
    70      float[] XYZ50 = sRGBcs.toCIEXYZ(sRGB);
    71      return this.fromCIEXYZ(XYZ50);
    72    }
    73
    74    // CIELab→sRGB
    75    public float[] toRGB(float[] Lab) {
    76      float[] XYZ50 = this.toCIEXYZ(Lab);
    77      return sRGBcs.fromCIEXYZ(XYZ50);
    78    }
    79
    80  } // end of class LabColorSpace

Program 6.2 Java implementation of the L∗a∗b∗ color space (Part 2). The method toCIEXYZ() implements the reverse transformation from L∗a∗b∗ to XYZ, where the method f2() (defined in line 60) does the inverse gamma correction (lines 44–46), followed by the chromatic adaptation from D65 to D50 in line 48. Auxiliary methods f1() and f2() implement the forward and inverse gamma corrections, respectively (as defined in Eqns. (6.3) and (6.4)). The methods toRGB() and fromRGB() perform the conversions to and from sRGB in two steps via XYZ coordinates.

6.6.5 ICC profiles

Even with the most precise specification, a standard color space may not be sufficient to accurately describe the transfer characteristics of some input or output device. ICC profiles are standardized descriptions of individual device transfer properties that ensure that an image or graphic can be reproduced


accurately on different media. The contents and the format of ICC profile files are specified in [40], which is identical to ISO standard 15076 [43]. Profiles are thus a key element in the process of digital color management [76]. The standard Java API supports the use of ICC profiles mainly through the classes ICC_ColorSpace and ICC_Profile, which allow application designers to create various standard profiles and read ICC profiles from data files.⁶

Assume, for example, that an image was recorded with a calibrated scanner and shall be displayed accurately on a monitor. For this purpose, we need the ICC profiles for the scanner and the monitor, which are often supplied by the manufacturers as .icc data files.⁷ For standard color spaces, the associated ICC profiles are often available as part of the computer installation, such as CIERGB.icc or NTSC1953.icc. With these profiles, a color space object can be specified that converts the image data produced by the scanner into corresponding CIE XYZ or sRGB values, as illustrated by the following example:

    // load the scanner's ICC profile (getInstance() may throw an IOException)
    ICC_ColorSpace scannerCS =
        new ICC_ColorSpace(ICC_Profile.getInstance("scanner.icc"));
    // scannerColor: float[] holding one color in the scanner's device color space
    // convert to RGB color
    float[] RGBColor = scannerCS.toRGB(scannerColor);
    // convert to XYZ color
    float[] XYZColor = scannerCS.toCIEXYZ(scannerColor);

Similarly, we can compute the accurate color values to be sent to the monitor by creating a suitable color space object from this device’s ICC proﬁle.

⁶ In the Java API, the transformations for all standard color space types are specified through corresponding ICC profiles, which are part of the standard Java distribution (files sRGB.pf, etc., usually contained in jdk.../jre/lib/cmm). However, up to the current Java release (1.6.0), the methods toCIEXYZ() and fromCIEXYZ() do not properly invert; i.e., col ≠ csp.fromCIEXYZ(csp.toCIEXYZ(col)) for a color space object csp. (This has been a documented Java problem for some time.) A “clean” implementation of the sRGB color space can be found in the source code section of this book’s Website.

⁷ ICC profile files may also come with extensions .icm or .pf (as in the Java distribution).


6.7 Exercises

Exercise 6.1
For chromatic adaptation (defined in Eqn. (6.21)), transformation matrices other than the Bradford model (Eqn. (6.22)) have been proposed; e.g. [71],

$$M_{Sharp} = \begin{pmatrix} 1.2694 & -0.0988 & -0.1706 \\ -0.8364 & 1.8006 & 0.0357 \\ 0.0297 & -0.0315 & 1.0018 \end{pmatrix} \quad\text{and}\quad M_{CMC} = \begin{pmatrix} 0.7982 & 0.3389 & -0.1371 \\ -0.5918 & 1.5512 & 0.0406 \\ 0.0008 & -0.0239 & 0.9753 \end{pmatrix}.$$

Derive the complete chromatic adaptation transformations M_50|65 and M_65|50 for converting between D65 and D50 colors, analogous to Eqns. (6.23) and (6.24), for each of the transformation matrices above.

Exercise 6.2
Implement the conversion of an sRGB color image to a colorless (grayscale) sRGB image using the three methods in Eqns. (6.15) (incorrectly applying standard weights to nonlinear R′G′B′ components), (6.16) (exact computation), and (6.17) (approximation using nonlinear components and modified weights). Compare the results by computing difference images, and also determine the total errors.

Exercise 6.3
Write a program to evaluate the errors that are introduced by using nonlinear instead of linear color components for grayscale conversion. To do this, compute the difference between the Y′ values obtained with the linear variant (Eqn. (6.16)) and the nonlinear variant (Eqn. (6.17) with w′_R = 0.309, w′_G = 0.609, w′_B = 0.082) for all possible 2^24 RGB colors. Let your program return the maximum gray value difference and the sum of the absolute differences for all colors.

Exercise 6.4
Explain why, in contrast to Fig. 6.1 (b), the edges of the 3D gamut volume for sRGB and Adobe RGB in Fig. 6.6 are not straight.

7 Introduction to Spectral Techniques

The following three chapters deal with the representation and analysis of images in the frequency domain, based on the decomposition of image signals into sine and cosine functions (which are also known as harmonic functions) using the well-known Fourier transform. Students often consider this a difficult topic, mainly because of its mathematical flavor and because its practical applications are not immediately obvious. Indeed, most common operations and methods in digital image processing can be sufficiently described in the original signal or image space without even mentioning spectral techniques. This is the reason why we pick up this topic relatively late in this text.

While spectral techniques were often used to improve the efficiency of image-processing operations, this has become increasingly less important due to the high power of modern computers. There exist, however, some important effects, concepts, and techniques in digital image processing that are considerably easier to describe in the frequency domain or cannot otherwise be understood at all. The topic should therefore not be avoided altogether. Fourier analysis not only possesses a very elegant (perhaps not always sufficiently appreciated) mathematical theory but interestingly enough also complements some important concepts we have seen earlier, in particular linear filters and linear convolution (see Vol. 1 [14, Sec. 5.2]). Equally important are applications of spectral techniques in many popular methods for image and video compression, and they provide valuable insight into the mechanisms of sampling (discretization) of continuous signals as well as the reconstruction and interpolation of discrete signals.

In the following, we first give a basic introduction to the concepts of frequency and spectral decomposition that tries to be minimally formal and thus should be easily “digestible” even for readers without previous exposure to this topic. We start with the representation of one-dimensional signals and will then extend the discussion to two-dimensional signals (images) in the next chapter. Subsequently, Ch. 9 briefly explains the discrete cosine transform, a popular variant of the discrete Fourier transform that is frequently used in image compression.

W. Burger, M.J. Burge, Principles of Digital Image Processing, Undergraduate Topics in Computer Science, DOI 10.1007/978-1-84800-195-4_7, © Springer-Verlag London Limited, 2009

7.1 The Fourier Transform The concept of frequency and the decomposition of waveforms into elementary “harmonic” functions ﬁrst arose in the context of music and sound. The idea of describing acoustic events in terms of “pure” sinusoidal functions does not seem unreasonable, considering that sine waves appear naturally in every form of oscillation (e. g., on a free-swinging pendulum).

7.1.1 Sine and Cosine Functions

The well-known cosine function

$$f(x) = \cos(x) \tag{7.1}$$

has the value 1 at the origin (cos(0) = 1) and performs exactly one full cycle between the origin and the point x = 2π (Fig. 7.1 (a)). We say that the function is periodic with a cycle length (period) T = 2π; i.e.,

$$\cos(x) = \cos(x + 2\pi) = \cos(x + 4\pi) = \cdots = \cos(x + k \cdot 2\pi) \tag{7.2}$$

for any k ∈ Z. The same is true for the corresponding sine function, except that its value is zero at the origin (sin(0) = 0).

[Figure 7.1: plots of f(x) over x; panel (a) shows sin(x) and cos(x), panel (b) shows sin(3x) and cos(3x).]

Figure 7.1 Cosine and sine functions. The expression cos(ωx) describes a cosine function with angular frequency ω at position x. The angular frequency ω of this periodic function corresponds to a cycle length (period) T = 2π/ω. For ω = 1, the period is T1 = 2π (a), and for ω = 3 it is T3 = 2π/3 ≈ 2.0944 (b). The same holds for the sine function sin(ωx).
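The relation between angular frequency and period stated in the caption of Fig. 7.1 can be checked numerically. The following is a minimal sketch in Python; the helper names `period` and `common_frequency` are our own and only serve this illustration.

```python
import math

def period(omega: float) -> float:
    """Cycle length T = 2*pi/omega of cos(omega*x), valid for omega > 0."""
    if omega <= 0:
        raise ValueError("omega must be positive")
    return 2 * math.pi / omega

def common_frequency(omega: float) -> float:
    """Common frequency f = omega/(2*pi), in cycles per length or time unit."""
    return omega / (2 * math.pi)

omega = 3.0
T = period(omega)
x = 0.7  # arbitrary test position
# Shifting x by one full period must leave the function value unchanged.
same = math.isclose(math.cos(omega * x), math.cos(omega * (x + T)), abs_tol=1e-12)
print(f"T = {T:.4f}, f = {common_frequency(omega):.4f}, periodic: {same}")
```

For ω = 3 the computed period agrees with the value T3 ≈ 2.0944 given in the caption.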


Frequency and amplitude

The number of oscillations of cos(x) over the distance T = 2π is one, and thus the value of the angular frequency is

$$\omega = \frac{2\pi}{T} = 1. \tag{7.3}$$

If we modify the function to

$$f(x) = \cos(3x), \tag{7.4}$$

we obtain a compressed cosine wave that oscillates three times faster than the original function cos(x) (Fig. 7.1 (b)). The function cos(3x) performs three full cycles over a distance of 2π and thus has the angular frequency ω = 3 and a period T = 2π/3. In general, the period T relates to the angular frequency ω as

$$T = \frac{2\pi}{\omega} \tag{7.5}$$

for ω > 0. A sine or cosine function oscillates between peak values +1 and −1, and its amplitude is 1. Multiplying by a constant a ∈ R changes the peak values of the function to ±a and its amplitude to |a|. In general, the expressions

$$a \cdot \cos(\omega x) \quad \text{and} \quad a \cdot \sin(\omega x)$$

denote a cosine or sine function with amplitude a and angular frequency ω, evaluated at position (or point in time) x. The relation between the angular frequency ω and the "common" frequency f is given by

$$f = \frac{1}{T} = \frac{\omega}{2\pi} \quad \text{or} \quad \omega = 2\pi f, \tag{7.6}$$

where f is measured in cycles per length or time unit.¹ In the following, we use either ω or f as appropriate, and the meaning should always be clear from the symbol used.

¹ For example, a temporal oscillation with frequency f = 1000 cycles/s (Hertz) has the period T = 1/1000 s and therefore the angular frequency ω = 2000π. The latter is a unitless magnitude.

Phase

Shifting a cosine function along the x axis by a distance ϕ,

$$\cos(x) \rightarrow \cos(x - \varphi),$$

changes the phase of the cosine wave, and ϕ denotes the phase angle of the resulting function. Thus a sine function is really just a cosine function shifted to the right² by a quarter period (ϕ = 2π/4 = π/2), so

$$\sin(\omega x) = \cos\bigl(\omega x - \tfrac{\pi}{2}\bigr). \tag{7.7}$$

If we take the cosine function as the reference with phase ϕcos = 0, then the phase angle of the corresponding sine function is ϕsin = π/2 = 90°. Cosine and sine functions are "orthogonal" in a sense, and we can use this fact to create new "sinusoidal" functions with arbitrary frequency, phase, and amplitude. In particular, adding a cosine and a sine function with identical frequency ω and arbitrary amplitudes A and B, respectively, creates another sinusoid:

$$A \cdot \cos(\omega x) + B \cdot \sin(\omega x) = C \cdot \cos(\omega x - \varphi). \tag{7.8}$$

The resulting amplitude C and the phase angle ϕ are defined only by the two original amplitudes A and B as

$$C = \sqrt{A^2 + B^2} \quad \text{and} \quad \varphi = \tan^{-1}\Bigl(\frac{B}{A}\Bigr). \tag{7.9}$$

Figure 7.2 (a) shows an example with amplitudes A = B = 0.5 and a resulting phase angle ϕ = 45°.

Complex-valued sine functions: Euler's notation

Figure 7.2 (b) depicts the contributing cosine and sine components of the new function as a pair of orthogonal vectors in 2-space whose lengths correspond to the amplitudes A and B. Not coincidentally, this reminds us of the representation of real and imaginary components of complex numbers

$$z = a + i\,b$$

in the two-dimensional plane C, where i is the imaginary unit (i² = −1). This association becomes even stronger if we look at Euler's famous notation of complex numbers along the unit circle,

$$z = e^{i\theta} = \cos(\theta) + i \cdot \sin(\theta), \tag{7.10}$$

where e ≈ 2.71828 is the Euler number. If we take the expression e^(iθ) as a function of the angle θ rotating around the unit circle, we obtain a "complex-valued sinusoid" whose real and imaginary parts correspond to a cosine and a sine function, respectively,

$$\mathrm{Re}\{e^{i\theta}\} = \cos(\theta), \qquad \mathrm{Im}\{e^{i\theta}\} = \sin(\theta). \tag{7.11}$$

² In general, the function f(x−d) is the original function f(x) shifted to the right by a distance d.

[Figure 7.2: panel (a) plots A cos(ωx) + B sin(ωx) over x; panel (b) shows the amplitudes A and B as orthogonal vectors along the cos(ωx) and sin(ωx) axes, with the resulting vector C at angle ϕ.]

Figure 7.2 Adding cosine and sine functions with identical frequencies, A · cos(ωx) + B · sin(ωx), with ω = 3 and A = B = 0.5. The result is a phase-shifted cosine function (dotted curve) with amplitude C = √(0.5² + 0.5²) ≈ 0.707 and phase angle ϕ = 45° (a). If the cosine and sine components are treated as orthogonal vectors (A, B) in 2-space, the amplitude and phase of the resulting sinusoid (C) can be easily determined by vector summation (b).
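The example of Fig. 7.2 is easy to reproduce. The sketch below computes C and ϕ from A and B and verifies Eqn. (7.8) at a number of sample positions. Note that it uses `math.atan2(B, A)` in place of tan⁻¹(B/A); this is our own choice, made because the two agree for A > 0 while `atan2` also covers A ≤ 0. The function name `combine` is hypothetical.

```python
import math

def combine(A: float, B: float) -> tuple[float, float]:
    """Amplitude C and phase phi such that
    A*cos(wx) + B*sin(wx) = C*cos(wx - phi)."""
    C = math.hypot(A, B)      # sqrt(A^2 + B^2)
    phi = math.atan2(B, A)    # equals atan(B/A) for A > 0
    return C, phi

A, B, omega = 0.5, 0.5, 3.0
C, phi = combine(A, B)

# Compare both sides of the identity on a grid of positions x in [-3, 3].
xs = [k * 0.1 for k in range(-30, 31)]
max_err = max(
    abs(A * math.cos(omega * x) + B * math.sin(omega * x)
        - C * math.cos(omega * x - phi))
    for x in xs
)
print(f"C = {C:.3f}, phi = {math.degrees(phi):.1f} deg, max error = {max_err:.1e}")
```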

Since z = e^(iθ) is placed on the unit circle, the amplitude of the complex-valued sinusoid is |z| = r = 1. We can easily modify the amplitude of this function by multiplying it by some real value a ≥ 0,

$$|a \cdot e^{i\theta}| = a \cdot |e^{i\theta}| = a. \tag{7.12}$$

Similarly, we can alter the phase of a complex-valued sinusoid by adding a phase angle ϕ in the function's exponent or, equivalently, by multiplying it by a complex-valued constant c = e^(iϕ),

$$e^{i(\theta + \varphi)} = e^{i\theta} \cdot e^{i\varphi}. \tag{7.13}$$

In summary, multiplying by some real value aﬀects only the amplitude of a sinusoid, while multiplying by some complex value c (with unit amplitude |c| = 1) modiﬁes only the function’s phase (without changing its amplitude). In general, of course, multiplying by some arbitrary complex value changes both the amplitude and the phase of the function (also see Appendix A.3).
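These two effects can be observed directly with Python's built-in complex numbers. The values chosen for θ, a, and ϕ below are arbitrary and serve only as an illustration.

```python
import cmath
import math

theta = 0.8                     # angle of the point on the unit circle
z = cmath.exp(1j * theta)       # z = cos(theta) + i*sin(theta)

a = 2.5                         # real factor: changes the amplitude only
phi = math.pi / 3               # phase angle
c = cmath.exp(1j * phi)         # unit-magnitude complex factor: changes the phase only

scaled = a * z
shifted = c * z

print(f"|z| = {abs(z):.3f}, arg z = {cmath.phase(z):.3f}")
print(f"|a*z| = {abs(scaled):.3f}, arg(a*z) = {cmath.phase(scaled):.3f}")
print(f"|c*z| = {abs(shifted):.3f}, arg(c*z) = {cmath.phase(shifted):.3f}")
```

The magnitude of `scaled` is a while its angle remains θ; the magnitude of `shifted` remains 1 while its angle becomes θ + ϕ.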


The complex notation makes it easy to combine orthogonal pairs of sine functions cos(ωx) and sin(ωx) with identical frequencies ω into a single functional expression

$$e^{i\theta} = e^{i\omega x} = \cos(\omega x) + i \cdot \sin(\omega x). \tag{7.14}$$

We will make more use of this notation later in Sec. 7.1.4 to explain the Fourier transform.

7.1.2 Fourier Series of Periodic Functions

As we demonstrated in Eqn. (7.8), sinusoidal functions of arbitrary frequency, amplitude, and phase can be described as the sum of suitably weighted cosine and sine functions. One may wonder if non-sinusoidal functions can also be decomposed into a sum of cosine and sine functions. The answer is yes, of course. It was Fourier³ who first extended this idea to arbitrary functions and showed that (almost) any periodic function g(x) with a fundamental frequency ω0 can be described as a (possibly infinite) sum of "harmonic" sinusoids; i.e.,

$$g(x) = \sum_{k=0}^{\infty} \bigl[ A_k \cos(k \omega_0 x) + B_k \sin(k \omega_0 x) \bigr]. \tag{7.15}$$

This is called a Fourier series, and the constant factors Ak , Bk are the Fourier coeﬃcients of the function g(x). Notice that in Eqn. (7.15) the frequencies of the sine and cosine functions contributing to the Fourier series are integral multiples (“harmonics”) of the fundamental frequency ω0 , including the zero frequency for k = 0. The corresponding coeﬃcients Ak and Bk , which are initially unknown, can be uniquely derived from the original function g(x). This process is commonly referred to as Fourier analysis.
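To make Fourier analysis more tangible, the following sketch estimates the first few coefficients of a square wave and evaluates the corresponding partial sum of the series. The formulas used for the coefficients (the mean value for k = 0 and 2/T times the integral of g(x)·cos(kω0x) or g(x)·sin(kω0x) over one period otherwise) are the standard textbook expressions and are not derived in this chapter. The test function, the function names, and the simple midpoint integration are our own assumptions.

```python
import math

def square_wave(x: float) -> float:
    """Square wave with period 2*pi: +1 where sin(x) >= 0, otherwise -1."""
    return 1.0 if math.sin(x) >= 0 else -1.0

def fourier_coefficients(g, T: float, k_max: int, n: int = 2000):
    """Approximate A_k and B_k (k = 0..k_max) of a T-periodic function g
    by midpoint integration over one period."""
    w0 = 2 * math.pi / T
    h = T / n
    xs = [(j + 0.5) * h for j in range(n)]
    A, B = [], []
    for k in range(k_max + 1):
        scale = 1.0 / T if k == 0 else 2.0 / T   # k = 0 yields the mean value
        A.append(scale * h * sum(g(x) * math.cos(k * w0 * x) for x in xs))
        B.append(scale * h * sum(g(x) * math.sin(k * w0 * x) for x in xs))
    return A, B

def partial_sum(A, B, w0: float, x: float) -> float:
    """Evaluate the truncated Fourier series at position x."""
    return sum(a * math.cos(k * w0 * x) + b * math.sin(k * w0 * x)
               for k, (a, b) in enumerate(zip(A, B)))

A, B = fourier_coefficients(square_wave, 2 * math.pi, k_max=9)
approx = partial_sum(A, B, 1.0, math.pi / 2)
print("B_1..B_5 =", [round(b, 4) for b in B[1:6]])
print("partial sum at x = pi/2:", round(approx, 4))
```

For this square wave all A_k vanish, and only the odd sine terms contribute, with B_k ≈ 4/(πk). With harmonics up to k = 9 the partial sum at x = π/2 is already within about 0.1 of the true value 1.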

7.1.3 Fourier Integral

Fourier did not want to limit this concept to periodic functions and postulated that nonperiodic functions, too, could be described as sums of sine and cosine functions. While this proved to be true in principle, it generally requires, beyond multiples of the fundamental frequency (kω0), infinitely many, densely spaced frequencies. The resulting decomposition

$$g(x) = \int_{0}^{\infty} \bigl[ A_\omega \cos(\omega x) + B_\omega \sin(\omega x) \bigr] \, d\omega \tag{7.16}$$

is called a Fourier integral, and the coefficients Aω, Bω are again the weights for the corresponding cosine and sine functions with the (continuous) frequency ω. The Fourier integral is the basis of the Fourier spectrum and the Fourier transform, as described below (for details, see, e.g., [11, Sec. 15.3]).

³ Jean Baptiste Joseph de Fourier (1768–1830).

In Eqn. (7.16), every coefficient Aω and Bω specifies the amplitude of the corresponding cosine or sine function, respectively. The coefficients thus define "how much of each frequency" contributes to a given function or signal g(x). But what are the proper values of these coefficients for a given function g(x), and can they be determined uniquely? The answer is yes again, and the "recipe" for computing the coefficients is amazingly simple:

$$A_\omega = A(\omega) = \frac{1}{\pi} \int_{-\infty}^{\infty} g(x) \cdot \cos(\omega x) \, dx, \tag{7.17}$$

$$B_\omega = B(\omega) = \frac{1}{\pi} \int_{-\infty}^{\infty} g(x) \cdot \sin(\omega x) \, dx. \tag{7.18}$$

Since this representation of the function g(x) involves infinitely many densely spaced frequency values ω, the corresponding coefficients A(ω) and B(ω) are indeed continuous functions as well. They hold the continuous distribution of frequency components contained in the original signal, which is called a "spectrum".

Thus the Fourier integral in Eqn. (7.16) describes the original function g(x) as a sum of infinitely many cosine and sine functions, with the corresponding Fourier coefficients contained in the functions A(ω) and B(ω). In addition, a signal g(x) is uniquely and fully represented by the corresponding coefficient functions A(ω) and B(ω). We know from Eqns. (7.17) and (7.18) how to compute the spectrum for a given function g(x), and Eqn. (7.16) explains how to reconstruct the original function from its spectrum if it is ever needed.
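As a small numerical illustration of Eqns. (7.17) and (7.18), the sketch below approximates A(ω) and B(ω) for a rectangular pulse that is 1 for |x| ≤ 1 and 0 elsewhere. The choice of this test signal, the function names, and the midpoint integration rule are our own assumptions. Since the pulse vanishes outside [−1, 1], the infinite integration range can be replaced by this finite interval.

```python
import math

def rect_pulse(x: float) -> float:
    """Rectangular pulse: 1 for |x| <= 1, 0 otherwise."""
    return 1.0 if abs(x) <= 1.0 else 0.0

def spectrum_coefficients(g, omega: float, lo: float, hi: float, n: int = 4000):
    """Approximate A(omega) and B(omega) by midpoint integration over [lo, hi],
    assuming that g is zero outside this interval."""
    h = (hi - lo) / n
    xs = [lo + (j + 0.5) * h for j in range(n)]
    a = h / math.pi * sum(g(x) * math.cos(omega * x) for x in xs)
    b = h / math.pi * sum(g(x) * math.sin(omega * x) for x in xs)
    return a, b

omega = 2.0
a, b = spectrum_coefficients(rect_pulse, omega, -1.0, 1.0)
# Closed-form value of the cosine integral for this particular pulse:
expected_a = 2 * math.sin(omega) / (math.pi * omega)
print(f"A({omega}) = {a:.6f} (closed form {expected_a:.6f}), B({omega}) = {b:.1e}")
```

Because the pulse is symmetric about the origin, only cosine components contribute and B(ω) vanishes.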

7.1.4 Fourier Spectrum and Transformation

There is now only a small remaining step from the decomposition of a function g(x), as shown in Eqns. (7.17) and (7.18), to the "real" Fourier transform. In contrast to the Fourier integral, the Fourier transform treats both the original signal and the corresponding spectrum as complex-valued functions, which considerably simplifies the resulting notation. Based on the functions A(ω) and B(ω) defined in the Fourier integral (Eqns. (7.17) and (7.18)), the Fourier spectrum G(ω) of a function g(x) is given as

$$G(\omega) = \sqrt{\tfrac{\pi}{2}} \, \bigl[ A(\omega) - i \cdot B(\omega) \bigr]$$

$$\phantom{G(\omega)} = \sqrt{\tfrac{\pi}{2}} \, \Bigl[ \frac{1}{\pi} \int_{-\infty}^{\infty} g(x) \cdot \cos(\omega x) \, dx - i \cdot \frac{1}{\pi} \int_{-\infty}^{\infty} g(x) \cdot \sin(\omega x) \, dx \Bigr]$$

$$\phantom{G(\omega)} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g(x) \cdot \bigl[ \cos(\omega x) - i \cdot \sin(\omega x) \bigr] \, dx, \tag{7.19}$$

with g(x), G(ω) ∈ C. Using Euler's notation of complex values (Eqn. (7.14)) yields the continuous Fourier spectrum from Eqn. (7.19) in its most popular form:

$$G(\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g(x) \cdot \bigl[ \cos(\omega x) - i \cdot \sin(\omega x) \bigr] \, dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g(x) \cdot e^{-i\omega x} \, dx. \tag{7.20}$$

The transition from the function g(x) to its Fourier spectrum G(ω) is called the Fourier transform⁴ (F). Conversely, the original function g(x) can be reconstructed completely from its Fourier spectrum G(ω) using the inverse Fourier transform⁵ (F⁻¹), defined as

$$g(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} G(\omega) \cdot \bigl[ \cos(\omega x) + i \cdot \sin(\omega x) \bigr] \, d\omega = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} G(\omega) \cdot e^{i\omega x} \, d\omega. \tag{7.21}$$

In general, even if one of the involved functions (g(x) or G(ω)) is real-valued (which is usually the case for physical signals g(x)), the other function is complex-valued. One may also note that the forward transformation F (Eqn. (7.20)) and the inverse transformation F⁻¹ (Eqn. (7.21)) are almost completely symmetrical, the sign of the exponent being the only difference.⁶ The spectrum produced by the Fourier transform is a new representation of the signal in a space of frequencies. Evidently, this "frequency space" and the original "signal space" are dual and interchangeable mathematical representations.

7.1.5 Fourier Transform Pairs

The relationship between a function g(x) and its Fourier spectrum G(ω) is unique in both directions: the Fourier spectrum is uniquely defined for a given function, and for any Fourier spectrum there is only one matching signal. The two functions g(x) and G(ω) constitute a "transform pair",

$$g(x) \;\longleftrightarrow\; G(\omega).$$

Table 7.1 lists the transform pairs for some selected analytical functions, which are also shown graphically in Figs. 7.3 and 7.4.

⁴ Also called the "direct" or "forward" transformation.
⁵ Also called "backward" transformation.
⁶ Various definitions of the Fourier transform are in common use. They are contrasted mainly by the constant factors outside the integral and the signs of the exponents in the forward and inverse transforms, but all versions are equivalent in principle. The symmetric variant shown here uses the same factor (1/√(2π)) in the forward and inverse transforms.

Table 7.1 Fourier transforms of selected analytical functions; δ() denotes the "impulse" or Dirac function (see Sec. 7.2.1).

Function                             g(x)                                    G(ω)                                  Figure
Cosine function with frequency ω0    cos(ω0 x)                               √(π/2) · [δ(ω+ω0) + δ(ω−ω0)]          7.3 (a, c)
Sine function with frequency ω0      sin(ω0 x)                               i · √(π/2) · [δ(ω+ω0) − δ(ω−ω0)]      7.3 (b, d)
Gaussian function of width σ         (1/σ) · e^(−x²/(2σ²))                   e^(−σ²ω²/2)                           7.4 (a, b)
Rectangular pulse of width 2b        Πb(x) = 1 for |x| ≤ b, 0 otherwise      2 sin(bω) / (√(2π) · ω)               7.4 (c, d)

The Fourier spectrum of a cosine function cos(ω0 x), for example, consists of two separate thin pulses arranged symmetrically at a distance ω0 from the origin (Fig. 7.3 (a, c)). Intuitively, this corresponds to our physical understanding of a spectrum (e.g., if we think of a pure monophonic sound in acoustics or the thin line produced by some extremely pure color in the optical spectrum). Increasing the frequency ω0 would move the corresponding pulses in the spectrum away from the origin. Notice that the spectrum of the cosine function is real-valued, the imaginary part being zero. Of course, the same relation holds for the sine function (Fig. 7.3 (b, d)), with the only difference being that the pulses have different polarities and appear in the imaginary part of the spectrum. In this case, the real part of the spectrum G(ω) is zero.

The Gaussian function is particularly interesting because its Fourier spectrum is also a Gaussian function (Fig. 7.4 (a, b)). It is one of the few examples where the function type in frequency space is the same as in signal space. With the Gaussian function, it is also easy to see that stretching a function in signal space corresponds to shortening its spectrum and vice versa.

The Fourier transform of a rectangular pulse (Fig. 7.4 (c, d)) is the "Sinc" function of type sin(x)/x. With increasing frequencies, this function drops off quite slowly, which shows that the components contained in the original rectangular signal are spread out over a large frequency range. Thus a rectangular pulse function exhibits a very wide spectrum in general.
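The Gaussian transform pair can be checked by evaluating the integral of Eqn. (7.20) numerically. This is only a sketch under stated assumptions: the integral is truncated to ±12σ (where the Gaussian is negligibly small) and approximated by a simple midpoint rule, and the function names are hypothetical.

```python
import cmath
import math

def fourier_transform(g, omega: float, lo: float, hi: float, n: int = 4000) -> complex:
    """Approximate G(omega) = 1/sqrt(2*pi) * integral of g(x)*exp(-i*omega*x) dx
    over [lo, hi] with the midpoint rule."""
    h = (hi - lo) / n
    total = 0j
    for j in range(n):
        x = lo + (j + 0.5) * h
        total += g(x) * cmath.exp(-1j * omega * x)
    return total * h / math.sqrt(2 * math.pi)

def gaussian(sigma: float):
    """Gaussian of width sigma, g(x) = (1/sigma) * exp(-x^2 / (2*sigma^2))."""
    return lambda x: math.exp(-x * x / (2 * sigma * sigma)) / sigma

results = {}
for sigma in (1.0, 3.0):
    for omega in (0.0, 0.5, 1.0):
        G = fourier_transform(gaussian(sigma), omega, -12 * sigma, 12 * sigma)
        expected = math.exp(-(sigma * omega) ** 2 / 2)
        results[(sigma, omega)] = (G, expected)
        print(f"sigma={sigma}, omega={omega}: G={G.real:.6f}, expected={expected:.6f}")
```

The wider Gaussian (σ = 3) produces a spectrum that drops off three times faster along the ω axis than the spectrum for σ = 1, in line with the stretching and shortening behavior described above.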

[Figure 7.3: four signal/spectrum pairs, each plotting g(x) over x next to the real part GRe(ω) or imaginary part GIm(ω) of the spectrum over ω (the δ-pulses are drawn for illustration only):
(a) cosine (ω0 = 3): g(x) = cos(3x), G(ω) = √(π/2) · [δ(ω+3) + δ(ω−3)], shown as GRe(ω);
(b) sine (ω0 = 3): g(x) = sin(3x), G(ω) = i · √(π/2) · [δ(ω+3) − δ(ω−3)], shown as GIm(ω);
(c) cosine (ω0 = 5): g(x) = cos(5x), G(ω) = √(π/2) · [δ(ω+5) + δ(ω−5)], shown as GRe(ω);
(d) sine (ω0 = 5): g(x) = sin(5x), G(ω) = i · √(π/2) · [δ(ω+5) − δ(ω−5)], shown as GIm(ω).]

Figure 7.3 Fourier transform pairs: cosine and sine functions.

[Figure 7.4: four signal/spectrum pairs, each plotting g(x) over x next to the real part GRe(ω) of the spectrum over ω:
(a) Gaussian (σ = 1): g(x) = e^(−x²/2), G(ω) = e^(−ω²/2);
(b) Gaussian (σ = 3): g(x) = (1/3) · e^(−x²/(2·9)), G(ω) = e^(−9ω²/2);
(c) rectangular pulse (b = 1): g(x) = Π1(x), with a spectrum of the type sin(ω)/ω;
(d) rectangular pulse (b = 2): g(x) = Π2(x), with a spectrum of the type sin(2ω)/ω.]

Figure 7.4 Fourier transform pairs: Gaussian functions and rectangular pulses.


7.1.6 Important Properties of the Fourier Transform

Symmetry. The Fourier spectrum extends over positive and negative frequencies and could, in principle, be an arbitrary complex-valued function. However, in many situations, the spectrum is symmetric about its origin (see, e.g., [15, p. 178]). In particular, the Fourier transform of a real-valued signal g(x) ∈ R is a so-called Hermitian function with the property

$$G(\omega) = G^{*}(-\omega), \tag{7.22}$$

where G* denotes the complex conjugate of G (see also Appendix A.3).

Linearity. The Fourier transform is also a linear operation such that multiplying the signal by a constant value c ∈ C scales the corresponding spectrum by the same amount,

$$c \cdot g(x) \;\longleftrightarrow\; c \cdot G(\omega). \tag{7.23}$$

Linearity also means that the transform of the sum of two signals g(x) = g1(x) + g2(x) is identical to the sum of their individual transforms G1(ω) and G2(ω) and thus

$$g_1(x) + g_2(x) \;\longleftrightarrow\; G_1(\omega) + G_2(\omega). \tag{7.24}$$

Similarity. If the original function g(x) is scaled in space or time, the opposite effect appears in the corresponding Fourier spectrum. In particular, as observed on the Gaussian function in Fig. 7.4, scaling the coordinate of a signal by a factor s (i.e., g(x) → g(sx)) scales the coordinate of its spectrum by 1/s, so that a signal stretched with 0 < |s| < 1 has a shortened Fourier spectrum:

$$g(sx) \;\longleftrightarrow\; \frac{1}{|s|} \cdot G\Bigl(\frac{\omega}{s}\Bigr). \tag{7.25}$$

Similarly, the signal is shortened if the corresponding spectrum is stretched.

Shift property. If the original function g(x) is shifted by a distance d along its coordinate axis (i.e., g(x) → g(x−d)), then the Fourier spectrum is multiplied by the complex value e^(−iωd), which depends on ω:

$$g(x - d) \;\longleftrightarrow\; e^{-i\omega d} \cdot G(\omega). \tag{7.26}$$

Since e^(−iωd) lies on the unit circle, the multiplication causes a phase shift on the spectral values (i.e., a redistribution between the real and imaginary components) without altering the magnitude |G(ω)|. Obviously, the amount (angle) of phase shift (ωd) is proportional to the angular frequency ω.
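The shift property can be verified numerically on a simple test signal. The sketch below assumes the Gaussian g(x) = e^(−x²/2), a shift distance d = 1.5, and a midpoint approximation of the Fourier integral truncated to [−15, 15]; all of these choices and the function names are ours.

```python
import cmath
import math

def fourier_transform(g, omega: float, lo: float, hi: float, n: int = 4000) -> complex:
    """Midpoint approximation of 1/sqrt(2*pi) * integral of g(x)*exp(-i*omega*x) dx."""
    h = (hi - lo) / n
    total = 0j
    for j in range(n):
        x = lo + (j + 0.5) * h
        total += g(x) * cmath.exp(-1j * omega * x)
    return total * h / math.sqrt(2 * math.pi)

def g(x: float) -> float:
    return math.exp(-x * x / 2)

d = 1.5                                   # shift distance

def g_shifted(x: float) -> float:         # g(x - d): g moved to the right by d
    return g(x - d)

omega = 2.0
G = fourier_transform(g, omega, -15.0, 15.0)
G_shifted = fourier_transform(g_shifted, omega, -15.0, 15.0)
predicted = cmath.exp(-1j * omega * d) * G    # right-hand side of the shift property

print(f"|G| = {abs(G):.6f}, |G_shifted| = {abs(G_shifted):.6f}")
print(f"deviation from prediction: {abs(G_shifted - predicted):.1e}")
```

The magnitudes of the two spectral values agree, while their phases differ by the angle ωd.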


Convolution property. From the image-processing point of view, the most interesting property of the Fourier transform is its relation to linear convolution, which we described in Vol. 1 [14, Sec. 5.3.1]. Let us assume that we have two functions g(x) and h(x) and their corresponding Fourier spectra G(ω) and H(ω), respectively. If the original functions are subject to linear convolution (i.e., g(x) ∗ h(x)), then the Fourier transform of the result equals the (pointwise) product of the individual Fourier transforms G(ω) and H(ω):

$$g(x) * h(x) \;\longleftrightarrow\; G(\omega) \cdot H(\omega). \tag{7.27}$$

Due to the duality of signal space and frequency space, the same also holds in the opposite direction; i.e., a pointwise multiplication of two signals is equivalent to convolving the corresponding spectra:

$$g(x) \cdot h(x) \;\longleftrightarrow\; G(\omega) * H(\omega). \tag{7.28}$$

A multiplication of the functions in one space (signal or frequency space) thus corresponds to a linear convolution of the Fourier spectra in the opposite space.
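For finite sequences, the convolution property can be tried out with a few lines of code. The following sketch is a discrete analogue only and rests on several assumptions of our own: it uses a naive, unnormalized discrete Fourier transform (so that no additional constant factor appears in the product), and it zero-pads both sequences so that the cyclic convolution implied by the discrete transform coincides with the linear convolution. The sequences g and h are arbitrary examples.

```python
import cmath

def dft(v, inverse: bool = False):
    """Naive discrete Fourier transform without normalization in the
    forward direction; the inverse divides by the vector length."""
    M = len(v)
    sign = 1 if inverse else -1
    out = [
        sum(v[u] * cmath.exp(sign * 2j * cmath.pi * m * u / M) for u in range(M))
        for m in range(M)
    ]
    return [x / M for x in out] if inverse else out

def linear_convolution(g, h):
    """Direct linear convolution of two finite sequences."""
    out = [0.0] * (len(g) + len(h) - 1)
    for i, gi in enumerate(g):
        for j, hj in enumerate(h):
            out[i + j] += gi * hj
    return out

g = [1.0, 2.0, 3.0, 4.0]
h = [0.25, 0.5, 0.25]

# Zero-pad both sequences to the length of the linear convolution result.
n = len(g) + len(h) - 1
G = dft(g + [0.0] * (n - len(g)))
H = dft(h + [0.0] * (n - len(h)))

# Pointwise product in frequency space, then back to signal space.
via_spectrum = [x.real for x in dft([a * b for a, b in zip(G, H)], inverse=True)]
direct = linear_convolution(g, h)

print("direct:      ", direct)
print("via spectrum:", [round(x, 6) for x in via_spectrum])
```

Both routes yield the same sequence up to floating-point rounding.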

7.2 Working with Discrete Signals

The definition of the continuous Fourier transform above is of little use for numerical computation on a computer: arbitrary continuous (and possibly infinite) functions cannot be represented in practice, nor can the required integrals be computed. In reality, we must always deal with discrete signals, and we therefore need a new version of the Fourier transform that treats signals and spectra as finite data vectors, namely the "discrete" Fourier transform. Before continuing with this issue, we want to use what we know so far to take a closer look at the process of discretizing signals in general.

7.2.1 Sampling

We first consider the question of how a continuous function can be converted to a discrete signal in the first place. This process is usually called "sampling" (i.e., taking samples of the continuous function at certain points in time or in space, usually spaced at regular distances). To describe this step in a simple but formal way, we require an inconspicuous but nevertheless important piece from the mathematician's toolbox.

The impulse function δ(x)

We casually encountered the impulse function (also called the delta or Dirac function) earlier when we looked at the impulse response of linear filters (see Vol. 1 [14, Sec. 5.3.4]) and in the Fourier transforms of the cosine and sine functions (Fig. 7.3). This function, which models a continuous "ideal" impulse, is unusual in several respects: its value is zero everywhere except at the origin, where it is nonzero (though undefined), but its integral is one; i.e.,

$$\delta(x) = 0 \;\text{ for } x \neq 0 \quad \text{and} \quad \int_{-\infty}^{\infty} \delta(x) \, dx = 1. \tag{7.29}$$

One could imagine δ(x) as a single pulse at position x = 0 that is infinitesimally narrow but still contains finite energy (1). Also remarkable is the impulse function's behavior under scaling along the time (or space) axis (i.e., δ(x) → δ(sx)), with

$$\delta(sx) = \frac{1}{|s|} \cdot \delta(x) \quad \text{for } s \neq 0. \tag{7.30}$$

Despite the fact that δ(x) does not exist in physical reality and cannot be plotted (the corresponding plots in Fig. 7.3 are for illustration only), this function is a useful mathematical tool for describing the sampling process, as shown below.

Sampling with the impulse function

Using the concept of the ideal impulse, the sampling process can be described in a straightforward and intuitive way.⁷ If a continuous function g(x) is multiplied with the impulse function δ(x), we obtain a new function

$$\bar{g}(x) = g(x) \cdot \delta(x) = \begin{cases} g(0) & \text{for } x = 0 \\ 0 & \text{otherwise.} \end{cases} \tag{7.31}$$

The resulting function ḡ(x) consists of a single pulse at position 0 whose height corresponds to the original function value g(0) (at position 0). Thus, by multiplying the function g(x) by the impulse function, we obtain a single discrete sample value of g(x) at position x = 0. If the impulse function δ(x) is shifted by a distance x0, we can sample g(x) at an arbitrary position x = x0,

$$\bar{g}(x) = g(x) \cdot \delta(x - x_0) = \begin{cases} g(x_0) & \text{for } x = x_0 \\ 0 & \text{otherwise.} \end{cases} \tag{7.32}$$

Here δ(x−x0) is the impulse function shifted by x0, and the resulting function ḡ(x) is zero except at position x0, where it contains the original function value g(x0). This relationship is illustrated in Fig. 7.5 for the sampling position x0 = 3.

⁷ The following description is intentionally casual and superficial in a mathematical sense. See, e.g., [15, 47] for more precise coverage of these topics.

[Figure 7.5: three plots over x showing the continuous signal g(x), the shifted impulse δ(x−3), and the resulting function ḡ(x) with a single pulse at x = 3.]

Figure 7.5 Sampling with the impulse function. The continuous signal g(x) is sampled at position x0 = 3 by multiplying g(x) by a shifted impulse function δ(x−3).

To sample the function g(x) at more than one position simultaneously (e.g., at positions x1 and x2), we use two separately shifted versions of the impulse function, multiply g(x) by both of them, and simply add the resulting function values. In this particular case, we get

$$\bar{g}(x) = g(x) \cdot \delta(x - x_1) + g(x) \cdot \delta(x - x_2) \tag{7.33}$$

$$\phantom{\bar{g}(x)} = g(x) \cdot \bigl[ \delta(x - x_1) + \delta(x - x_2) \bigr] \tag{7.34}$$

$$\phantom{\bar{g}(x)} = \begin{cases} g(x_1) & \text{for } x = x_1 \\ g(x_2) & \text{for } x = x_2 \\ 0 & \text{otherwise.} \end{cases} \tag{7.35}$$

From Eqn. (7.34), sampling a continuous function g(x) at N positions xi = 1, 2, ..., N can thus be described as the sum of the N individual samples,

$$\bar{g}(x) = g(x) \cdot \bigl[ \delta(x - 1) + \delta(x - 2) + \ldots + \delta(x - N) \bigr] = g(x) \cdot \sum_{i=1}^{N} \delta(x - i). \tag{7.36}$$

The comb function

The sum of shifted impulses Σ δ(x−i), for i = 1, ..., N, in Eqn. (7.36) is called a pulse sequence or pulse train. Extending this sequence to infinity in both directions, we obtain the "comb" or "Shah" function

$$\mathrm{III}(x) = \sum_{i=-\infty}^{\infty} \delta(x - i). \tag{7.37}$$

The process of discretizing a continuous function by taking samples at regular integral intervals can thus be written simply as

$$\bar{g}(x) = g(x) \cdot \mathrm{III}(x), \tag{7.38}$$

[Figure 7.6: three plots over x showing the continuous signal g(x), the comb function III(x) with pulses at the integral positions, and the sampled function ḡ(x).]

Figure 7.6 Sampling with the comb function. The original continuous signal g(x) is multiplied by the comb function III(x). The function value g(x) is transferred to the resulting function ḡ(x) only at integral positions x = xi ∈ Z and ignored at all nonintegral positions.

i.e., as a pointwise multiplication of the original signal g(x) with the comb function III(x). As Fig. 7.6 illustrates, the function values of g(x) at integral positions xi ∈ Z are transferred to the discrete function ḡ(xi) and ignored at all nonintegral positions. Of course, the sampling interval (i.e., the distance between adjacent samples) is not restricted to 1. To take samples at regular but arbitrary intervals τ, the sampling function III(x) is simply scaled along the time or space axis; i.e.,

$$\bar{g}(x) = g(x) \cdot \mathrm{III}\Bigl(\frac{x}{\tau}\Bigr) \quad \text{for } \tau > 0. \tag{7.39}$$

Effects of sampling in frequency space

Despite the elegant formulation made possible by the use of the comb function, one may still wonder why all this math is necessary to describe a process that appears intuitively to be so simple anyway. The Fourier spectrum gives one answer to this question. Sampling a continuous function has massive (though predictable) effects upon the frequency spectrum of the resulting (discrete) signal. Using the comb function as a formal model for the sampling process makes it relatively easy to estimate and interpret those spectral effects. Similar to the Gaussian (see Sec. 7.1.5), the comb function features the rare property that its Fourier transform

$$\mathrm{III}(x) \;\longleftrightarrow\; \mathrm{III}\Bigl(\frac{1}{2\pi}\,\omega\Bigr) \tag{7.40}$$

is again a comb function (i.e., the same type of function). In general, the Fourier transform of a comb function scaled to an arbitrary sampling interval τ is

$$\mathrm{III}\Bigl(\frac{x}{\tau}\Bigr) \;\longleftrightarrow\; \tau \cdot \mathrm{III}\Bigl(\frac{\tau}{2\pi}\,\omega\Bigr) \tag{7.41}$$

due to the similarity property of the Fourier transform (Eqn. (7.25)). Figure 7.7 shows two examples of the comb function IIIτ(x) with sampling intervals τ = 1 and τ = 3 and the corresponding Fourier transforms.

Now, what happens to the Fourier spectrum during discretization; i.e., when we multiply a function in signal space by the comb function III(x/τ)? We get the answer by recalling the convolution property of the Fourier transform (Eqn. (7.28)): the product of two functions in one space (signal or frequency space) corresponds to the linear convolution of the transformed functions in the opposite space, and thus

$$g(x) \cdot \mathrm{III}\Bigl(\frac{x}{\tau}\Bigr) \;\longleftrightarrow\; G(\omega) * \tau \cdot \mathrm{III}\Bigl(\frac{\tau}{2\pi}\,\omega\Bigr). \tag{7.42}$$

We already know that the Fourier spectrum of the sampling function is a comb function again and therefore consists of a sequence of regularly spaced pulses (Fig. 7.7). In addition, we know that convolving an arbitrary function with the impulse δ(x) returns the original function; i.e., f(x) ∗ δ(x) = f(x) (see Vol. 1 [14, Sec. 5.3.4]). Convolving with a shifted pulse δ(x−d) also reproduces the original function f(x), though shifted by the same distance d; i.e.,

$$f(x) * \delta(x - d) = f(x - d). \tag{7.43}$$

As a consequence, the spectrum G(ω) of the original continuous signal becomes replicated in the Fourier spectrum Ḡ(ω) of a sampled signal at every pulse of the sampling function's spectrum; i.e., infinitely many times (see Fig. 7.8 (a, b)). Thus the resulting Fourier spectrum is repetitive with a period 2π/τ, which corresponds to the sampling frequency ωs.

Aliasing and the sampling theorem

As long as the spectral replicas in Ḡ(ω) created by the sampling process do not overlap, the original spectrum G(ω), and thus the original continuous function, can be reconstructed without loss from any isolated replica of G(ω) in the periodic spectrum Ḡ(ω). As we can see in Fig. 7.8, this requires that

[Figure 7.7: four plots of pulse sequences:
(a) comb function III1(x) = III(x) over x, with pulses spaced at τ = 1;
(b) its Fourier transform III(ω/(2π)) over ω, with pulses spaced at ω0 = 2π;
(c) comb function III3(x) = III(x/3) over x, with pulses spaced at τ = 3;
(d) its Fourier transform 3 · III(3ω/(2π)) over ω, with pulses spaced at ω0 = 2π/3.]

Figure 7.7 Comb function and its Fourier transform. Comb function IIIτ(x) for the sampling interval τ = 1 (a) and its Fourier transform (b). Comb function for τ = 3 (c) and its Fourier transform (d). Note that the actual height of the δ-pulses is undefined and shown only for illustration.

the frequencies contained in the original signal g(x) be within some upper limit ωmax; i.e., the signal contains no components with frequencies greater than ωmax. The maximum allowed signal frequency ωmax depends upon the sampling frequency ωs used to discretize the signal, with the requirement

$$\omega_{\max} \le \tfrac{1}{2}\,\omega_s \quad \text{or} \quad \omega_s \ge 2\,\omega_{\max}. \tag{7.44}$$

Discretizing a continuous signal g(x) with frequency components in the range 0 ≤ ω ≤ ωmax thus requires a sampling frequency ωs of at least twice the maximum signal frequency ωmax . If this condition is not met, the replicas in the spectrum of the sampled signal overlap (Fig. 7.8 (c)) and the spectrum becomes corrupted. Consequently, the original signal cannot be recovered ﬂawlessly from the sampled signal’s spectrum. This eﬀect is commonly called “aliasing”.
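Aliasing is easy to provoke in a small experiment. In the sketch below, a cosine whose frequency lies above half the sampling frequency produces exactly the same sample values as a cosine of lower frequency, so the two signals cannot be told apart after sampling. The sampling interval τ = 1, the chosen frequencies, and the function names are assumptions made for this illustration.

```python
import math

def sampling_frequency(tau: float) -> float:
    """Angular sampling frequency omega_s = 2*pi/tau for sampling interval tau."""
    return 2 * math.pi / tau

def satisfies_sampling_condition(omega_max: float, tau: float) -> bool:
    """True if omega_max <= omega_s / 2 (the requirement of Eqn. (7.44))."""
    return omega_max <= 0.5 * sampling_frequency(tau)

tau = 1.0
omega_s = sampling_frequency(tau)
omega_high = 0.75 * omega_s            # violates the condition
omega_alias = omega_s - omega_high     # 0.25 * omega_s, satisfies it

positions = [i * tau for i in range(32)]
samples_high = [math.cos(omega_high * x) for x in positions]
samples_alias = [math.cos(omega_alias * x) for x in positions]
max_diff = max(abs(p - q) for p, q in zip(samples_high, samples_alias))

print("condition met for omega_high: ", satisfies_sampling_condition(omega_high, tau))
print("condition met for omega_alias:", satisfies_sampling_condition(omega_alias, tau))
print(f"largest difference between the two sample sequences: {max_diff:.1e}")
```

The high-frequency component thus reappears in the sampled data under the "alias" ωs − ω.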

[Figure 7.8: three spectra plotted over ω: (a) the original spectrum G(ω), limited to ±ωmax; (b) the spectrum Ḡ1(ω) after sampling with frequency ω1, showing separate replicas; (c) the spectrum Ḡ2(ω) after sampling with frequency ω2, with overlapping replicas in the regions marked "aliasing".]

Figure 7.8 Spectral effects of sampling. The spectrum G(ω) of the original continuous signal is assumed to be band-limited within the range ±ωmax (a). Sampling the signal at a rate (sampling frequency) ωs = ω1 causes the signal's spectrum G(ω) to be replicated at multiples of ω1 along the frequency (ω) axis (b). Obviously, the replicas in the spectrum do not overlap as long as ωs > 2ωmax. In (c), the sampling frequency ωs = ω2 is less than 2ωmax, so there is overlap between the replicas in the spectrum, and frequency components are mirrored at half the sampling frequency (ω2/2) and superimpose the original spectrum. This effect is called "aliasing"; the original spectrum (and thus the original signal) cannot be reproduced from such a corrupted spectrum.

What we just said in simple terms is nothing but the essence of the famous "sampling theorem" formulated by Shannon and Nyquist (see, e.g., [15, p. 256]). It actually states that the sampling frequency must be at least twice the bandwidth⁸ of the continuous signal to avoid aliasing effects. However, if we assume that a signal's frequency range starts at zero, then bandwidth and maximum frequency are the same anyway.

⁸ This may be surprising at first because it allows a signal with high frequency (but low bandwidth) to be sampled (and correctly reconstructed) at a relatively low sampling frequency, even well below the maximum signal frequency. This is possible because one can also use a filter with suitably low bandwidth for reconstructing the original signal. For example, it may be sufficient to strike (i.e., "sample") a church bell (a low-bandwidth oscillatory system with small internal damping) to uniquely generate a sound wave of relatively high frequency.


7.2.2 Discrete and Periodic Functions

Assume that we are given a continuous signal g(x) that is periodic with a period of length T. In this case, the corresponding Fourier spectrum G(ω) is a sequence of thin spectral lines equally spaced at a distance ω0 = 2π/T. As discussed in Sec. 7.1.2, the Fourier spectrum of a periodic function can be represented as a Fourier series and is therefore discrete. Conversely, if a continuous signal g(x) is sampled at regular intervals τ, then the corresponding Fourier spectrum becomes periodic with a period of length ωs = 2π/τ. Sampling in signal space thus leads to periodicity in frequency space and vice versa. Figure 7.9 illustrates this relationship and the transition from a continuous nonperiodic signal to a discrete periodic function, which can be represented as a finite vector of numbers and thus easily processed on a computer.

Thus, in general, the Fourier spectrum of a continuous, nonperiodic signal g(x) is also continuous and nonperiodic (Fig. 7.9 (a, b)). However, if the signal g(x) is periodic, then the corresponding spectrum is discrete (Fig. 7.9 (c, d)). Conversely, a discrete (but not necessarily periodic) signal leads to a periodic spectrum (Fig. 7.9 (e, f)). Finally, if a signal is discrete and periodic with M samples per period, then its spectrum is also discrete and periodic with M values (Fig. 7.9 (g, h)). Note that the particular signals and spectra in Fig. 7.9 were chosen for illustration only and do not really correspond with each other.

7.3 The Discrete Fourier Transform (DFT)

In the case of a discrete periodic signal, only a finite sequence of M sample values is required to completely represent either the signal g(u) itself or its Fourier spectrum G(m).⁹ This representation as finite vectors makes it straightforward to store and process signals and spectra on a computer. What we still need is a version of the Fourier transform applicable to discrete signals.

7.3.1 Definition of the DFT

The discrete Fourier transform has, just like its continuous counterpart, the same form in both directions (only the sign of the exponent differs). For a discrete signal g(u) of length M (u = 0 . . . M−1), the

⁹ Notation: we use g(x), G(ω) for a continuous signal or spectrum, respectively, and g(u), G(m) for the discrete versions.

Figure 7.9 Transition from continuous to discrete periodic functions. Left column: signal g(x); right column: spectrum G(ω). (a) Continuous nonperiodic signal. (b) Continuous nonperiodic spectrum. (c) Continuous periodic signal with period t0. (d) Discrete nonperiodic spectrum with values spaced at ω0 = 2π/t0. (e) Discrete nonperiodic signal with samples spaced at ts. (f) Continuous periodic spectrum with period ωs = 2π/ts. (g) Discrete periodic signal with samples spaced at ts and period t0 = ts·M. (h) Discrete periodic spectrum with values spaced at ω0 = 2π/t0 and period ωs = 2π/ts = ω0·M.

                g(u)                                G(m)
  u        Re        Im              m         Re         Im
  0      1.0000    0.0000            0      14.2302     0.0000
  1      3.0000    0.0000            1      −5.6745    −2.9198
  2      5.0000    0.0000            2      ∗0.0000    ∗0.0000
  3      7.0000    0.0000            3      −0.0176    −0.6893
  4      9.0000    0.0000            4      ∗0.0000    ∗0.0000
  5      8.0000    0.0000            5       0.3162     0.0000
  6      6.0000    0.0000            6      ∗0.0000    ∗0.0000
  7      4.0000    0.0000            7      −0.0176     0.6893
  8      2.0000    0.0000            8      ∗0.0000    ∗0.0000
  9      0.0000    0.0000            9      −5.6745     2.9198

Figure 7.10 Complex-valued vectors (example). In the discrete Fourier transform (DFT), both the original signal g(u) and its spectrum G(m) are complex-valued vectors of length M (M = 10 in this example). The DFT maps g(u) to G(m) and the inverse transform DFT⁻¹ maps G(m) back to g(u); ∗ indicates values with |G(m)| < 10⁻¹⁵.

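forward transform (DFT) is defined as

$$G(m) = \frac{1}{\sqrt{M}} \sum_{u=0}^{M-1} g(u) \cdot \Big[ \cos\Big(2\pi \frac{mu}{M}\Big) - i \cdot \sin\Big(2\pi \frac{mu}{M}\Big) \Big] = \frac{1}{\sqrt{M}} \sum_{u=0}^{M-1} g(u) \cdot e^{-i 2\pi \frac{mu}{M}} \tag{7.45}$$

for 0 ≤ m < M, and the inverse transform (DFT⁻¹) as

$$g(u) = \frac{1}{\sqrt{M}} \sum_{m=0}^{M-1} G(m) \cdot \Big[ \cos\Big(2\pi \frac{mu}{M}\Big) + i \cdot \sin\Big(2\pi \frac{mu}{M}\Big) \Big] = \frac{1}{\sqrt{M}} \sum_{m=0}^{M-1} G(m) \cdot e^{i 2\pi \frac{mu}{M}} \tag{7.46}$$

for 0 ≤ u < M.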

(Compare these definitions with the corresponding expressions for the continuous forward and inverse Fourier transforms in Eqns. (7.20) and (7.21), respectively.) Both the signal g(u) and the discrete spectrum G(m) are complex-valued vectors of length M,

$$g(u) = g_{\mathrm{Re}}(u) + i \cdot g_{\mathrm{Im}}(u), \qquad G(m) = G_{\mathrm{Re}}(m) + i \cdot G_{\mathrm{Im}}(m), \tag{7.47}$$

for u, m = 0 . . . M−1 (Fig. 7.10). Expanding the first line of Eqn. (7.45), we obtain the complex values of the Fourier spectrum in component notation as

$$G(m) = \frac{1}{\sqrt{M}} \sum_{u=0}^{M-1} \underbrace{\big[ g_{\mathrm{Re}}(u) + i \cdot g_{\mathrm{Im}}(u) \big]}_{g(u)} \cdot \Big[ \underbrace{\cos\Big(2\pi \frac{mu}{M}\Big)}_{C^M_m(u)} - i \cdot \underbrace{\sin\Big(2\pi \frac{mu}{M}\Big)}_{S^M_m(u)} \Big], \tag{7.48}$$

where we denote as $C^M_m$ and $S^M_m$ the discrete (cosine and sine) basis functions, as described in the next section. Applying the usual complex multiplication, we obtain the real and imaginary parts of the discrete Fourier spectrum as

$$G_{\mathrm{Re}}(m) = \frac{1}{\sqrt{M}} \sum_{u=0}^{M-1} g_{\mathrm{Re}}(u) \cdot C^M_m(u) + g_{\mathrm{Im}}(u) \cdot S^M_m(u), \tag{7.49}$$

$$G_{\mathrm{Im}}(m) = \frac{1}{\sqrt{M}} \sum_{u=0}^{M-1} g_{\mathrm{Im}}(u) \cdot C^M_m(u) - g_{\mathrm{Re}}(u) \cdot S^M_m(u), \tag{7.50}$$

for m = 0 . . . M−1. Analogously, the inverse DFT in Eqn. (7.46) expands to

$$g(u) = \frac{1}{\sqrt{M}} \sum_{m=0}^{M-1} \underbrace{\big[ G_{\mathrm{Re}}(m) + i \cdot G_{\mathrm{Im}}(m) \big]}_{G(m)} \cdot \Big[ \underbrace{\cos\Big(2\pi \frac{mu}{M}\Big)}_{C^M_m(u)} + i \cdot \underbrace{\sin\Big(2\pi \frac{mu}{M}\Big)}_{S^M_m(u)} \Big], \tag{7.51}$$

and thus the real and imaginary parts of the reconstructed signal are

$$g_{\mathrm{Re}}(u) = \frac{1}{\sqrt{M}} \sum_{m=0}^{M-1} G_{\mathrm{Re}}(m) \cdot C^M_m(u) - G_{\mathrm{Im}}(m) \cdot S^M_m(u), \tag{7.52}$$

$$g_{\mathrm{Im}}(u) = \frac{1}{\sqrt{M}} \sum_{m=0}^{M-1} G_{\mathrm{Im}}(m) \cdot C^M_m(u) + G_{\mathrm{Re}}(m) \cdot S^M_m(u), \tag{7.53}$$

for u = 0 . . . M−1.
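The component form of the forward and inverse transforms translates almost literally into two nested loops. The following is a minimal sketch, not the book's own program listing; the class name `DirectDft`, the method `transform`, and the `{real, imaginary}` return layout are assumptions made for illustration. It uses the signal values listed in Fig. 7.10 as sample input.

```java
/** Direct O(M^2) DFT sketch following Eqns. (7.49)-(7.50) and (7.52)-(7.53). */
final class DirectDft {

    /**
     * Transforms the complex vector (re, im) of length M.
     * forward = true evaluates Eqns. (7.49)-(7.50); false evaluates (7.52)-(7.53).
     * Returns {resultRe, resultIm} (an illustrative layout, not from the book).
     */
    static double[][] transform(double[] re, double[] im, boolean forward) {
        int M = re.length;
        double scale = 1.0 / Math.sqrt(M);       // same factor in both directions
        double sign = forward ? 1.0 : -1.0;      // flips the sign of the sine terms
        double[] outRe = new double[M];
        double[] outIm = new double[M];
        for (int m = 0; m < M; m++) {
            double sumRe = 0.0;
            double sumIm = 0.0;
            for (int u = 0; u < M; u++) {
                double phi = 2.0 * Math.PI * m * u / M;
                double c = Math.cos(phi);         // cosine basis function
                double s = sign * Math.sin(phi);  // sine basis function (signed)
                sumRe += re[u] * c + im[u] * s;
                sumIm += im[u] * c - re[u] * s;
            }
            outRe[m] = scale * sumRe;
            outIm[m] = scale * sumIm;
        }
        return new double[][] {outRe, outIm};
    }

    public static void main(String[] args) {
        double[] re = {1, 3, 5, 7, 9, 8, 6, 4, 2, 0};  // g(u) from Fig. 7.10
        double[] im = new double[re.length];            // real-valued signal
        double[][] spectrum = transform(re, im, true);
        for (int m = 0; m < re.length; m++) {
            System.out.printf("G(%d) = %.4f %+.4fi%n", m, spectrum[0][m], spectrum[1][m]);
        }
    }
}
```

Because the normalization factor is identical in both directions, a single routine with a sign flag covers the forward and the inverse transform; applying it twice (once with each flag) should reproduce the input up to rounding error.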

7.3.2 Discrete Basis Functions

Equation (7.51) describes the decomposition of the discrete function g(u) into a finite sum of M discrete cosine and sine functions ($C^M_m$, $S^M_m$) whose weights (or "amplitudes") are determined by the DFT coefficients in G(m). Each of these one-dimensional basis functions,

$$C^M_m(u) = C^M_u(m) = \cos\Big(2\pi \frac{mu}{M}\Big) = \cos(\omega_m u), \tag{7.54}$$

$$S^M_m(u) = S^M_u(m) = \sin\Big(2\pi \frac{mu}{M}\Big) = \sin(\omega_m u), \tag{7.55}$$

is periodic with M and has a discrete frequency (wave number) m, which corresponds to the angular frequency

$$\omega_m = 2\pi \frac{m}{M}.$$


As an example, Figs. 7.11 and 7.12 show the discrete basis functions (at integer positions u ∈ Z) for the DFT of length M = 8 as well as their continuous counterparts (at positions x ∈ R). For wave number m = 0, the cosine function $C^M_0(u)$ (Eqn. (7.54)) has the constant value 1. The corresponding DFT coefficient G_Re(0), the real part of G(0), thus specifies the constant part of the signal g(u) in Eqn. (7.52); it is proportional to the average value of the signal. In contrast, the zero-frequency sine function $S^M_0(u)$ is zero for any value of u and thus cannot contribute anything to the signal. The terms that multiply $S^M_0(u)$, namely G_Im(0) in Eqn. (7.52) and G_Re(0) in Eqn. (7.53), therefore vanish. For a real-valued signal (i.e., g_Im(u) = 0 for all u), the coefficient G_Im(0) in the corresponding Fourier spectrum must also be zero. As shown in Fig. 7.11, the wave number m = 1 relates to a cosine or sine function that performs exactly one full cycle over the signal length M = 8. Similarly, the wave numbers m = 2 . . . 7 correspond to 2 . . . 7 complete cycles over the signal length M (Figs. 7.11 and 7.12).

7.3.3 Aliasing Again!

A closer look at Figs. 7.11 and 7.12 reveals an interesting fact: the sampled (discrete) cosine functions for m = 3 and m = 5 are identical, and the sampled sine functions agree up to their sign, although their continuous counterparts are different! The same is true for the frequency pairs m = 2, 6 and m = 1, 7. What we see here is another manifestation of the sampling theorem, which we had originally encountered in frequency space (Sec. 7.2.1), now in signal space. Obviously, m = 4 is the maximum frequency component that can be represented by a discrete signal of length M = 8. Any discrete function with a higher frequency (m = 5 . . . 7 in this case) has a counterpart with a lower wave number that takes the same sample values (up to sign) and thus cannot be reconstructed from the sampled signal!

If a continuous signal is sampled at a regular distance τ, the corresponding Fourier spectrum is repeated at multiples of ωs = 2π/τ, as we have shown earlier (Fig. 7.8). In the discrete case, the spectrum is periodic with length M. Since the Fourier spectrum of a real-valued signal is symmetric about the origin (Eqn. (7.22)), there is for every coefficient with wave number m an equal-sized duplicate with wave number −m. Thus the spectral components appear pairwise and mirrored at multiples of M; i.e.,

$$|G(m)| = |G(M-m)| = |G(M+m)| = |G(2M-m)| = |G(2M+m)| = \ldots = |G(kM-m)| = |G(kM+m)| \tag{7.56}$$

Figure 7.11 Discrete basis functions $C^M_m(u)$ and $S^M_m(u)$ for the signal length M = 8 and wave numbers m = 0 . . . 3, with $C^8_m(u) = \cos(2\pi m u / 8)$ (left column) and $S^8_m(u) = \sin(2\pi m u / 8)$ (right column), plotted over u = 0 . . . 8. Each plot shows both the discrete function (round dots) and the corresponding continuous function.

Figure 7.12 Discrete basis functions (continued), with $C^8_m(u) = \cos(2\pi m u / 8)$ (left column) and $S^8_m(u) = \sin(2\pi m u / 8)$ (right column), plotted over u = 0 . . . 8. Signal length M = 8 and wave numbers m = 4 . . . 7. Notice that, for example, the discrete functions for m = 5 and m = 3 (Fig. 7.11) are identical because m = 4 is the maximum wave number that can be represented in a discrete spectrum of length M = 8.

Figure 7.13 Aliasing in signal space, with $C^8_m(u) = \cos(2\pi m u / 8)$ (left column) and $S^8_m(u) = \sin(2\pi m u / 8)$ (right column), plotted over u = 0 . . . 8 for m = 1, 9, 17. For the signal length M = 8, the discrete cosine and sine basis functions for the wave numbers m = 1, 9, 17, . . . (round dots) are all identical. The sampling frequency itself corresponds to the wave number m = 8.

for all k ∈ Z. If the original continuous signal contains “energy” with the frequencies ωm > ωM/2 (i. e., signal components with wave numbers m > M/2), then, according to the sampling theorem, the overlapping parts of the spectra are superimposed in the resulting periodic spectrum of the discrete signal.
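Aliasing in signal space is easy to check numerically by comparing sampled basis functions for different wave numbers. The sketch below is illustrative only; the class `AliasingDemo` and its method names are assumptions, not part of the book's code. It measures the largest difference between two sampled cosine (or sine) basis functions over one period of length M.

```java
/** Compares sampled DFT basis functions for different wave numbers (illustrative sketch). */
final class AliasingDemo {

    static double cosBasis(int M, int m, int u) {
        return Math.cos(2.0 * Math.PI * m * u / M);
    }

    static double sinBasis(int M, int m, int u) {
        return Math.sin(2.0 * Math.PI * m * u / M);
    }

    /** Largest absolute difference between the sampled cosines for m1 and m2. */
    static double maxCosDifference(int M, int m1, int m2) {
        double max = 0.0;
        for (int u = 0; u < M; u++) {
            max = Math.max(max, Math.abs(cosBasis(M, m1, u) - cosBasis(M, m2, u)));
        }
        return max;
    }

    /** Largest absolute difference between the sampled sines for m1 and m2. */
    static double maxSinDifference(int M, int m1, int m2) {
        double max = 0.0;
        for (int u = 0; u < M; u++) {
            max = Math.max(max, Math.abs(sinBasis(M, m1, u) - sinBasis(M, m2, u)));
        }
        return max;
    }

    public static void main(String[] args) {
        int M = 8;
        // Wave numbers that differ by a multiple of M give the same samples.
        System.out.println("cos, m = 1 vs 9:  " + maxCosDifference(M, 1, 9));
        System.out.println("sin, m = 1 vs 17: " + maxSinDifference(M, 1, 17));
        // Mirrored wave numbers m and M - m: equal cosines, sines of opposite sign.
        System.out.println("cos, m = 3 vs 5:  " + maxCosDifference(M, 3, 5));
        System.out.println("sin, m = 3 vs 5:  " + maxSinDifference(M, 3, 5));
    }
}
```

For wave numbers m and m + kM the differences are at rounding-error level for both functions. For a mirrored pair such as m = 3 and m = 5 the cosines coincide while the sines differ in sign, which leaves the magnitudes in Eqn. (7.56) unaffected.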


7.3.4 Units in Signal and Frequency Space

The relationship between the units in signal and frequency space and the interpretation of wave numbers m is a common cause of confusion. While the discrete signal and its spectrum are simple numerical vectors and units of measurement are irrelevant for computing the DFT itself, it is nevertheless important to understand how the coordinates in the spectrum relate to physical dimensions in the real world. Clearly, every complex-valued spectral coefficient G(m) corresponds to one pair of cosine and sine functions with a particular frequency in signal space. Assume a continuous signal is sampled at M consecutive positions spaced at τ (an interval in time or distance in space). The wave number m = 1 then corresponds to the fundamental period of the discrete signal (which is now assumed to be periodic) with a period of length Mτ; i.e., to the frequency

$$f_1 = \frac{1}{M\tau}. \tag{7.57}$$

In general, the wave number m of a discrete spectrum relates to the physical frequency as

$$f_m = m \frac{1}{M\tau} = m \cdot f_1 \tag{7.58}$$

for 0 ≤ m < M, which is equivalent to the angular frequency

$$\omega_m = 2\pi f_m = m \frac{2\pi}{M\tau} = m \cdot \omega_1. \tag{7.59}$$

Obviously then, the sampling frequency fs = 1/τ = M · f1 corresponds to the wave number ms = M. As expected, the maximum nonaliased wave number in the spectrum is

$$m_{\max} = \frac{M}{2} = \frac{m_s}{2}, \tag{7.60}$$

half the wave number of the sampling frequency ms.

Example 1: Time-domain signal

We assume for this example that g(u) is a signal in the time domain (e.g., a discrete sound signal) that contains M = 500 sample values taken at regular intervals τ = 1 ms = 10⁻³ s. Thus the sampling frequency is fs = 1/τ = 1000 Hertz (cycles per second) and the total duration (fundamental period) of the signal is Mτ = 0.5 s. The signal is implicitly periodic, and from Eqn. (7.57) we obtain its fundamental frequency as f1 = 1/(500 · 10⁻³) = 1/0.5 = 2 Hertz. The wave number m = 2 in this case corresponds to a real frequency f2 = 2·f1 = 4 Hertz, f3 = 6 Hertz, etc. The maximum frequency that can be represented by this discrete signal without aliasing is fmax = (M/2) · f1 = 1/(2τ) = 500 Hertz, exactly half the sampling frequency fs.


Example 2: Space-domain signal

Assume we have a one-dimensional print pattern with a resolution (i.e., spatial sampling frequency) of 120 dots per cm, which equals approximately 300 dots per inch (dpi), and a total signal length of M = 1800 samples. This corresponds to a spatial sampling interval of τ = 1/120 cm ≈ 83 μm and a physical signal length of (1800/120) cm = 15 cm. The fundamental frequency of this signal (again implicitly assumed to be periodic) is f1 = 1/15, expressed in cycles per cm. The sampling frequency is fs = 120 cycles per cm and thus the maximum signal frequency is fmax = fs/2 = 60 cycles per cm. The maximum signal frequency specifies the finest structure (1/60 cm) that can be reproduced by this print raster.
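The unit conversions used in both examples reduce to Eqns. (7.57)–(7.60). As a small sketch (the class `WaveNumberUnits` and its method names are hypothetical helpers, not from the book), the two examples can be recomputed as follows.

```java
/** Converts DFT wave numbers to physical frequencies, Eqns. (7.57)-(7.60) (illustrative). */
final class WaveNumberUnits {

    /** Physical frequency f_m = m / (M * tau) for wave number m, Eqn. (7.58). */
    static double frequency(int m, int M, double tau) {
        return m / (M * tau);
    }

    /** Maximum frequency representable without aliasing: half the sampling frequency 1/tau. */
    static double maxFrequency(double tau) {
        return 1.0 / (2.0 * tau);
    }

    /** Maximum nonaliased wave number, Eqn. (7.60) (integer division for odd M). */
    static int maxWaveNumber(int M) {
        return M / 2;
    }

    public static void main(String[] args) {
        // Example 1: M = 500 samples at tau = 1 ms (frequencies in Hertz).
        System.out.println("f1   = " + frequency(1, 500, 1e-3));
        System.out.println("f2   = " + frequency(2, 500, 1e-3));
        System.out.println("fmax = " + maxFrequency(1e-3));
        // Example 2: M = 1800 samples at tau = 1/120 cm (frequencies in cycles per cm).
        System.out.println("f1   = " + frequency(1, 1800, 1.0 / 120.0));
        System.out.println("fmax = " + maxFrequency(1.0 / 120.0));
    }
}
```

The units of the returned frequencies are simply the reciprocal of whatever unit τ is given in, so the same helper serves time-domain and space-domain signals alike.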

7.3.5 Power Spectrum

The magnitude of the complex-valued Fourier spectrum

$$|G(m)| = \sqrt{G_{\mathrm{Re}}^2(m) + G_{\mathrm{Im}}^2(m)} \tag{7.61}$$

is commonly called the "power spectrum" of a signal. It specifies the energy that individual frequency components in the spectrum contribute to the signal. The power spectrum is real-valued and nonnegative and thus often used for graphically displaying the results of Fourier transforms (see also Sec. 8.2). Since all phase information is lost in the power spectrum, the original signal cannot be reconstructed from the power spectrum alone. However, because of the missing phase information, the power spectrum is insensitive to shifts of the original signal and can thus be efficiently used for comparing signals. To be more precise, the power spectrum of a cyclically shifted signal is identical to the power spectrum of the original signal. Thus, given a discrete periodic signal g1(u) of length M and a second signal g2(u) shifted by some offset d, such that

$$g_2(u) = g_1(u - d), \tag{7.62}$$

the corresponding power spectra are the same,

$$|G_2(m)| = |G_1(m)|, \tag{7.63}$$

although in general the complex-valued spectra G1(m) and G2(m) are different. Furthermore, from the symmetry property of the Fourier spectrum, it follows that

$$|G(m)| = |G(-m)| \quad \text{for real-valued signals } g(u) \in \mathbb{R}. \tag{7.64}$$
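The shift invariance in Eqns. (7.62)–(7.63) can be verified with a few lines of code. The following sketch assumes a real-valued input signal and computes the magnitudes directly from the DFT definition; the class `PowerSpectrumShift` and its methods are illustrative names only.

```java
/** Checks that |G(m)| is unchanged by a cyclic shift of a real signal (illustrative sketch). */
final class PowerSpectrumShift {

    /** Magnitude spectrum |G(m)| of a real-valued signal, Eqns. (7.45) and (7.61). */
    static double[] magnitudeSpectrum(double[] g) {
        int M = g.length;
        double[] mag = new double[M];
        for (int m = 0; m < M; m++) {
            double sumRe = 0.0;
            double sumIm = 0.0;
            for (int u = 0; u < M; u++) {
                double phi = 2.0 * Math.PI * m * u / M;
                sumRe += g[u] * Math.cos(phi);
                sumIm -= g[u] * Math.sin(phi);
            }
            mag[m] = Math.hypot(sumRe, sumIm) / Math.sqrt(M);
        }
        return mag;
    }

    /** Cyclically shifted copy with g2(u) = g1(u - d), indices taken modulo M. */
    static double[] cyclicShift(double[] g, int d) {
        int M = g.length;
        double[] shifted = new double[M];
        for (int u = 0; u < M; u++) {
            shifted[u] = g[Math.floorMod(u - d, M)];
        }
        return shifted;
    }

    public static void main(String[] args) {
        double[] g1 = {1, 3, 5, 7, 9, 8, 6, 4, 2, 0};
        double[] a = magnitudeSpectrum(g1);
        double[] b = magnitudeSpectrum(cyclicShift(g1, 3));
        for (int m = 0; m < g1.length; m++) {
            System.out.printf("m=%d  |G1|=%.4f  |G2|=%.4f%n", m, a[m], b[m]);
        }
    }
}
```

`Math.floorMod` is used instead of the `%` operator so that negative indices u − d wrap around to the end of the signal.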


7.4 Implementing the DFT

7.4.1 Direct Implementation

Based on the definitions in Eqns. (7.49)–(7.50), the DFT can be directly implemented, as shown in Prog. 7.1. The main method DFT() transforms a signal vector of arbitrary length M (not necessarily a power of 2). It requires roughly M² operations (multiplications and additions); i.e., the time complexity of this DFT algorithm is O(M²). One way to improve the efficiency of the DFT algorithm is to use lookup tables for the sin and cos functions (which are relatively "expensive" to compute) since only function values for a set of M different angles φm are ever needed. The angles φm = 2π·m/M corresponding to m = 0 . . . M−1 are evenly distributed over the full 360° circle. Any integral multiple φm · u (for u ∈ Z) can only fall onto one of these angles again because

$$\varphi_m \cdot u = 2\pi \frac{mu}{M} \equiv \frac{2\pi}{M} \cdot \underbrace{(mu \bmod M)}_{0 \le k < M}$$

$$T_x^{-1}:\; x = \begin{cases} x_c + r \cdot \cos(\beta) & \text{for } r \le r_{\max} \\ x' & \text{for } r > r_{\max}, \end{cases} \tag{10.42}$$

$$T_y^{-1}:\; y = \begin{cases} y_c + r \cdot \sin(\beta) & \text{for } r \le r_{\max} \\ y' & \text{for } r > r_{\max}, \end{cases} \tag{10.43}$$

with

$$d_x = x' - x_c, \qquad d_y = y' - y_c, \qquad r = \sqrt{d_x^2 + d_y^2}, \qquad \beta = \mathrm{Arctan}(d_y, d_x) + \alpha \cdot \Big( \frac{r_{\max} - r}{r_{\max}} \Big).$$


10. Geometric Operations

Figure 10.7 Various nonlinear image deformations: twirl (a, d), ripple (b, e), and sphere (c, f) transformations. The original (source) images are shown in Fig. 10.6 (a) and Fig. 10.1 (a), respectively.

Figure 10.7 (a, d) shows a twirl mapping with the anchor point xc placed at the image center. The limiting radius rmax is half the length of the image diagonal, and the rotation angle is α = 43° at the center. A Java implementation of this transformation is shown in the class TwirlMapping on page 247.

"Ripple" transformation

The ripple transformation causes a local wavelike displacement of the image along both the x and y directions. The parameters of this mapping function are the period lengths τx, τy ≠ 0 (in pixels) and the corresponding amplitude values ax, ay for the displacement in both directions:

$$T_x^{-1}:\; x = x' + a_x \cdot \sin\Big( \frac{2\pi \cdot y'}{\tau_x} \Big), \tag{10.44}$$

$$T_y^{-1}:\; y = y' + a_y \cdot \sin\Big( \frac{2\pi \cdot x'}{\tau_y} \Big). \tag{10.45}$$

An example for the ripple mapping with τx = 120, τy = 250, ax = 10, and ay = 15 is shown in Fig. 10.7 (b, e).
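As an illustration of Eqns. (10.44)–(10.45), the inverse ripple mapping can be written as a single function that takes a target position and returns the corresponding source position. This is a sketch under assumed names (`RippleMapping`, `inverse`, and the parameter order are hypothetical) and is not the book's own mapping class.

```java
/** Inverse ripple mapping after Eqns. (10.44)-(10.45) (illustrative sketch). */
final class RippleMapping {

    /**
     * Maps a target position (xT, yT) to the source position {x, y}.
     * tauX, tauY are the period lengths (nonzero, in pixels); aX, aY the amplitudes.
     */
    static double[] inverse(double xT, double yT,
                            double tauX, double tauY, double aX, double aY) {
        double x = xT + aX * Math.sin(2.0 * Math.PI * yT / tauX);
        double y = yT + aY * Math.sin(2.0 * Math.PI * xT / tauY);
        return new double[] {x, y};
    }

    public static void main(String[] args) {
        // Parameters of the example shown in Fig. 10.7 (b, e).
        double[] p = inverse(0.0, 30.0, 120.0, 250.0, 10.0, 15.0);
        System.out.println("source position: (" + p[0] + ", " + p[1] + ")");
    }
}
```

Note that the horizontal displacement depends on the vertical target coordinate and vice versa, which is what produces the wavelike appearance along both axes.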

10.1 2D Mapping Function


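Spherical transformation

The spherical deformation imitates the effect of viewing the image through a transparent hemisphere or lens placed on top of the image. The parameters of this transformation are the position xc = (xc, yc) of the lens center, the radius of the lens rmax, and its refraction index ρ. The corresponding mapping functions are defined as

$$T_x^{-1}:\; x = x' - \begin{cases} z \cdot \tan(\beta_x) & \text{for } r \le r_{\max} \\ 0 & \text{for } r > r_{\max}, \end{cases} \tag{10.46}$$

$$T_y^{-1}:\; y = y' - \begin{cases} z \cdot \tan(\beta_y) & \text{for } r \le r_{\max} \\ 0 & \text{for } r > r_{\max}, \end{cases} \tag{10.47}$$

with

$$d_x = x' - x_c, \qquad d_y = y' - y_c, \qquad r = \sqrt{d_x^2 + d_y^2}, \qquad z = \sqrt{r_{\max}^2 - r^2},$$

$$\beta_x = \Big( 1 - \frac{1}{\rho} \Big) \cdot \sin^{-1}\Big( \frac{d_x}{\sqrt{d_x^2 + z^2}} \Big), \qquad \beta_y = \Big( 1 - \frac{1}{\rho} \Big) \cdot \sin^{-1}\Big( \frac{d_y}{\sqrt{d_y^2 + z^2}} \Big).$$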

Figure 10.7 (c, f) shows a spherical transformation with the lens positioned at the image center. The lens radius rmax is set to half of the image width, and the refraction index is ρ = 1.8.
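The spherical mapping of Eqns. (10.46)–(10.47) can be sketched in the same style. The class and method names below are hypothetical. One deliberate simplification is that points with r ≥ rmax (rather than r > rmax) are left unchanged: at r = rmax the height z is zero, so the displacement vanishes anyway, and the test avoids a 0/0 division.

```java
/** Inverse spherical (lens) mapping after Eqns. (10.46)-(10.47) (illustrative sketch). */
final class SphereMapping {

    /**
     * Maps a target position (xT, yT) to the source position {x, y} for a lens
     * centered at (xc, yc) with radius rMax and refraction index rho.
     */
    static double[] inverse(double xT, double yT,
                            double xc, double yc, double rMax, double rho) {
        double dx = xT - xc;
        double dy = yT - yc;
        double r = Math.hypot(dx, dy);
        if (r >= rMax) {
            return new double[] {xT, yT};   // outside the lens: identity mapping
        }
        double z = Math.sqrt(rMax * rMax - r * r);   // height of the hemisphere, z > 0 here
        double k = 1.0 - 1.0 / rho;
        double betaX = k * Math.asin(dx / Math.sqrt(dx * dx + z * z));
        double betaY = k * Math.asin(dy / Math.sqrt(dy * dy + z * z));
        return new double[] {xT - z * Math.tan(betaX), yT - z * Math.tan(betaY)};
    }

    public static void main(String[] args) {
        // Lens of radius 100 centered at (100, 100), refraction index 1.8.
        double[] p = inverse(150.0, 100.0, 100.0, 100.0, 100.0, 1.8);
        System.out.println("source position: (" + p[0] + ", " + p[1] + ")");
    }
}
```

For ρ > 1 a target point inside the lens is fetched from a source position closer to the lens center, which magnifies the central region of the image.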

10.1.7 Local Image Transformations

All the geometric transformations discussed so far are global (i.e., the same mapping function is applied to all pixels in the given image). It is often necessary to deform an image such that a larger number of n original image points x1 . . . xn are precisely mapped onto a given set of target points x′1 . . . x′n. For n = 3, this problem can be solved with an affine mapping (see Sec. 10.1.3), and for n = 4 we could use a projective or bilinear mapping (see Secs. 10.1.4 and 10.1.5). A precise global mapping of n > 4 points requires a more complicated function T(x) (e.g., a two-dimensional nth-order polynomial or a spline function). An alternative is to use local or piecewise transformations, where the image is partitioned into disjoint patches that are transformed separately, applying an individual mapping function to each patch. In practice, it is common to partition the image into a mesh of triangles or quadrilaterals, as illustrated in Fig. 10.8. For a triangular mesh partitioning (Fig. 10.8 (a)), the transformation between each pair of triangles Di → D′i could be accomplished with an affine mapping, whose parameters must be computed individually for every patch. Similarly, the projective transformation would be suitable for mapping each

Figure 10.8 Mesh partitioning. Almost arbitrary image deformations can be implemented by partitioning the image plane into nonoverlapping triangles Di, D′i (a) or quadrilaterals Qi, Q′i (b) and applying simple local transformations. Every patch in the resulting mesh is transformed separately with the required transformation parameters derived from the corresponding three or four corner points, respectively.

patch in a mesh partitioning composed of quadrilaterals Qi (Fig. 10.8 (b)). Since both the affine and the projective transformations preserve the straightness of lines, we can be certain that no holes or overlaps will arise and the deformation will appear continuous between adjacent mesh patches. Local transformations of this type are frequently used, for example, to register aerial and satellite images or to undistort images for panoramic stitching. In computer graphics, similar techniques are used to map texture images onto polygonal 3D surfaces in the rendered 2D image. Another popular application of this technique is "morphing" [79], which performs a stepwise geometric transformation from one image to another while simultaneously blending their intensity (or color) values.²

10.2 Resampling the Image

In the discussion of geometric transformations, we have so far considered the 2D image coordinates as being continuous (i.e., real-valued). In reality, the picture elements in digital images reside at discrete (i.e., integer-valued) coordinates, and thus transferring a discrete image into another discrete image without introducing significant losses in quality is a nontrivial subproblem in the implementation of geometric transformations. Based on the original image I(u, v) and some (continuous) geometric transformation T(x, y), the aim is to create a transformed image I′(u′, v′) where all coordinates are discrete (i.e., u, v ∈ Z and u′, v′ ∈ Z).³ This can be accomplished in one of two ways, which differ by the mapping direction and are commonly referred to as source-to-target or target-to-source mapping, respectively.

10.2.1 Source-to-Target Mapping

In this approach, which appears quite natural at first sight, we compute for every pixel (u, v) of the original (source) image I the corresponding transformed position (x′, y′) = T(u, v) in the target image I′. In general, the result will not coincide with any of the raster points, as illustrated in Fig. 10.9. Subsequently, we would have to decide in which pixel in the target image I′ the original intensity or color value from I(u, v) should be stored. We could perhaps even think of somehow distributing this value onto all adjacent pixels. The problem with the source-to-target method is that, depending on the geometric transformation T, some elements in the target image I′ may never be "hit" at all (i.e., never receive a source pixel value)! This happens, for example, when the image is enlarged (even slightly) by the geometric transformation. The resulting holes in the target image would be difficult to close in a subsequent processing step. Conversely, one would have to consider (e.g., when the image is shrunk) that a single element in the target image I′ may be hit by multiple source pixels and thus image content may get lost. In the light

² Image morphing has also been implemented in ImageJ as a plugin (iMorph) by Hajime Hirase (http://rsb.info.nih.gov/ij/plugins/morph.html).

³ Remark on notation: We use (u, v) or (u′, v′) to denote discrete (integer) coordinates and (x, y) or (x′, y′) for continuous (real-valued) coordinates.

Figure 10.9 Source-to-target mapping. For each discrete pixel position (u, v) in the source image I, the corresponding (continuous) target position (x′, y′) is found by applying the geometric transformation T(u, v). In general, the target position (x′, y′) does not coincide with any discrete raster point. The source pixel value I(u, v) is subsequently transferred to one of the adjacent target pixels.

of all these complications, source-to-target mapping is not really the method of choice.

10.2.2 Target-to-Source Mapping

This method avoids most difficulties encountered in the source-to-target mapping by simply reversing the image generation process. For every discrete pixel position (u′, v′) in the target image, we compute the corresponding (continuous) point (x, y) = T⁻¹(u′, v′) in the source image plane using the inverse geometric transformation T⁻¹. Of course, the coordinate (x, y) again does not fall onto a raster point in general and thus we have to decide from which of the neighboring source pixels to extract the resulting target pixel value. This problem of interpolating among intensity values will be discussed in detail in Sec. 10.3. The major advantage of the target-to-source method is that all pixels in the target image I′ (and only these) are computed and filled exactly once such that no holes or multiple hits can occur. This, however, requires the inverse geometric transformation T⁻¹ to be available, which is no disadvantage in most cases since the forward transformation T itself is never really needed. Due to its simplicity, which is also demonstrated in Alg. 10.1, target-to-source mapping is the common method for geometrically transforming 2D images.

10.3 Interpolation

Interpolation is the process of estimating the intermediate values of a sampled function or signal at continuous positions or the attempt to reconstruct

Figure 10.10 Target-to-source mapping. For each discrete pixel position (u′, v′) in the target image I′, the corresponding continuous source position (x, y) is found by applying the inverse mapping function T⁻¹(u′, v′). The new pixel value I′(u′, v′) is determined by interpolating the pixel values in the source image within some neighborhood of (x, y).

Algorithm 10.1 Geometric image transformation using target-to-source mapping. Given are the original (source) image I and the continuous coordinate transformation T. getInterpolatedValue(I, x, y) returns the interpolated value of the source image I at the continuous position (x, y).

TransformImage(I, T)
    I: source image
    T: continuous coordinate transform function (R² → R²)
    Returns the transformed image.

    Create the target image I′.
    for all target image coordinates (u′, v′) do
        Let (x, y) ← T⁻¹(u′, v′)
        I′(u′, v′) ← getInterpolatedValue(I, x, y)
    return I′.
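The loop structure of Alg. 10.1 can be sketched in Java as follows. Everything beyond the algorithm itself is an assumption made for illustration: the image is stored as a plain `double[][]` indexed as `[v][u]`, the target image has the same size as the source, `InverseMapping` is a hypothetical functional interface standing in for T⁻¹, and `getInterpolatedValue` uses nearest-neighbor rounding with the value 0 outside the source image.

```java
/** Target-to-source mapping after Alg. 10.1 (illustrative sketch with assumed types). */
final class TargetToSource {

    /** Inverse coordinate transform: target position -> source position {x, y}. */
    interface InverseMapping {
        double[] apply(double xT, double yT);
    }

    /** Nearest-neighbor placeholder; returns 0 for positions outside the image. */
    static double getInterpolatedValue(double[][] I, double x, double y) {
        int u = (int) Math.floor(x + 0.5);
        int v = (int) Math.floor(y + 0.5);
        if (v < 0 || v >= I.length || u < 0 || u >= I[0].length) {
            return 0.0;
        }
        return I[v][u];
    }

    /** Fills every target pixel exactly once by looking up its source position. */
    static double[][] transformImage(double[][] I, InverseMapping inverse) {
        int height = I.length;
        int width = I[0].length;
        double[][] target = new double[height][width];
        for (int v = 0; v < height; v++) {
            for (int u = 0; u < width; u++) {
                double[] p = inverse.apply(u, v);
                target[v][u] = getInterpolatedValue(I, p[0], p[1]);
            }
        }
        return target;
    }

    public static void main(String[] args) {
        double[][] source = {{1, 2, 3}, {4, 5, 6}};
        // Shift the image one pixel to the right: the inverse mapping moves left.
        double[][] shifted = transformImage(source, (x, y) -> new double[] {x - 1, y});
        System.out.println(java.util.Arrays.deepToString(shifted));
    }
}
```

Since the loop runs over target coordinates, no target pixel is left empty or written twice, regardless of the transformation; the quality of the result then depends only on the interpolation method plugged into `getInterpolatedValue`.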

the original continuous function from a set of discrete samples. In the context of geometric operations this task arises from the fact that discrete pixel positions in one image are generally not mapped to discrete raster positions in the other image under some continuous geometric transformation T (or T −1 , respectively). The concrete goal is to obtain an optimal estimate for the value of the two-dimensional image function I(x, y) at any continuous position (x, y) ∈ R2 . In practice, the interpolated function should preserve as much detail (i. e., sharpness) as possible without causing visible artifacts such as ringing or moiré patterns.

10.3.1 Simple Interpolation Methods

To illustrate the problem, we first attend to the one-dimensional case (Fig. 10.11). Several simple methods exist for interpolating the values of a

Figure 10.11 Interpolating a discrete function in 1D. Given the discrete function values g(u) (a), the goal is to estimate the original function f(x) at arbitrary continuous positions x ∈ R (b).

Figure 10.12 Simple interpolation methods. The nearest-neighbor interpolation (a) simply selects the discrete sample g(u) closest to the given continuous coordinate x as the interpolating value ĝ(x). Under linear interpolation (b), the result is a piecewise linear function connecting adjacent samples g(u) and g(u + 1).

discrete function g(u), with u ∈ Z, at arbitrary continuous positions x ∈ R. While these ad hoc methods are easy to implement, they lack a theoretical justification and usually give poor results.

Nearest-neighbor interpolation

The simplest of all interpolation methods is to round the continuous coordinate x to the closest integer u0 and use the sample g(u0) as the estimated function value ĝ(x),

$$\hat{g}(x) = g(u_0), \tag{10.48}$$

$$\text{where } u_0 = \mathrm{round}(x) = \lfloor x + 0.5 \rfloor. \tag{10.49}$$

A typical result of this so-called nearest-neighbor interpolation is shown in Fig. 10.12 (a).


Linear interpolation

Another simple method is linear interpolation. Here the estimated value is the weighted sum of the two closest samples g(u0) and g(u0 + 1), with u0 = ⌊x⌋. The weight of each sample is proportional to its closeness to the continuous position x,

$$\hat{g}(x) = g(u_0) + (x - u_0) \cdot \big( g(u_0 + 1) - g(u_0) \big) = g(u_0) \cdot \big( 1 - (x - u_0) \big) + g(u_0 + 1) \cdot (x - u_0). \tag{10.50}$$

As shown in Fig. 10.12 (b), the result is a piecewise linear function made up of straight line segments between consecutive sample values.
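Both simple methods take only a few lines of code. The sketch below implements Eqns. (10.48)–(10.50) for a one-dimensional sample array; the class `SimpleInterpolation` is a hypothetical name, and clamping indices to the valid range at the signal borders is an assumption made here, since the text does not specify any border handling.

```java
/** Nearest-neighbor and linear interpolation in 1D, Eqns. (10.48)-(10.50) (sketch). */
final class SimpleInterpolation {

    /** Clamps an index to the valid range of g (assumed border handling). */
    private static int clamp(int u, int length) {
        return Math.max(0, Math.min(length - 1, u));
    }

    /** Nearest-neighbor estimate with u0 = floor(x + 0.5). */
    static double nearest(double[] g, double x) {
        int u0 = (int) Math.floor(x + 0.5);
        return g[clamp(u0, g.length)];
    }

    /** Linear estimate from g(u0) and g(u0 + 1) with u0 = floor(x). */
    static double linear(double[] g, double x) {
        int u0 = (int) Math.floor(x);
        double a = x - u0;   // distance from the left sample, 0 <= a < 1
        return g[clamp(u0, g.length)] * (1.0 - a) + g[clamp(u0 + 1, g.length)] * a;
    }

    public static void main(String[] args) {
        double[] g = {1, 3, 5, 7};
        System.out.println("nearest(1.4) = " + nearest(g, 1.4));
        System.out.println("linear(1.5)  = " + linear(g, 1.5));
    }
}
```

At integer positions both methods return the sample value itself; they differ only in how the gap between two samples is filled.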

10.3.2 Ideal Interpolation

Obviously the results of these simple interpolation methods do not approximate the original continuous function well (Fig. 10.11). But how can we obtain a better approximation from the discrete samples alone when the original function is unknown? This may appear hopeless at first, because the discrete samples g(u) could possibly originate from any continuous function f(x) with identical values at the discrete sample positions. We find an intuitive answer to this question (once again) by looking at the functions in the spectral domain. If the original function f(x) was discretized in accordance with the sampling theorem (see Sec. 7.2.1), then f(x) must have been "band limited": it could not contain any signal components with frequencies higher than half the sampling frequency ωs. This means that the reconstructed signal can only contain a limited set of frequencies and thus its trajectory between the discrete sample values is not arbitrary but naturally constrained. In this context, absolute units of measure are of no concern since in a digital signal all frequencies relate to the sampling frequency. In particular, if we take τs = 1 as the (unitless) sampling interval, the resulting sampling frequency is ωs = 2π and thus the maximum signal frequency is ωmax = ωs/2 = π. To isolate the frequency range −ωmax . . . ωmax in the corresponding (periodic) Fourier spectrum, we multiply the spectrum G(ω) by a square windowing function Ππ(ω) of width ±ωmax = ±π,

$$\hat{G}(\omega) = G(\omega) \cdot \Pi_\pi(\omega) = G(\omega) \cdot \begin{cases} 1 & \text{for } -\pi \le \omega \le \pi \\ 0 & \text{otherwise.} \end{cases}$$

This is called an ideal low-pass filter, which cuts off all signal components with frequencies greater than π and keeps all lower-frequency components unchanged. In the signal domain, this multiplication corresponds (see

Figure 10.13 Sinc function in 1D. The function Sinc(x) has the value 1 at the origin and zero values at all other integer positions. The dashed line plots the amplitude |1/(πx)| of the underlying sine function.

Eqn. (7.28)) to a linear convolution with the inverse Fourier transform of the windowing function Ππ(ω), which is the Sinc function, defined as

$$\mathrm{Sinc}(x) = \begin{cases} 1 & \text{for } |x| = 0 \\ \dfrac{\sin(\pi x)}{\pi x} & \text{for } |x| > 0 \end{cases} \tag{10.51}$$

and shown in Fig. 10.13 (see also Table 7.1). This correspondence, which was already discussed in Sec. 7.1.6, between convolution in the signal domain and simple multiplication in the frequency domain is summarized in Fig. 10.14. So theoretically Sinc(x) is the ideal interpolation function for reconstructing a frequency-limited continuous signal. To compute the interpolated value for the discrete function g(u) at an arbitrary position x0, the Sinc function is shifted to x0 (such that its origin lies at x0), multiplied with all sample values g(u), with u ∈ Z, and the results are summed; i.e., g(u) and Sinc(x) are convolved. The reconstructed value of the continuous function at position x0 is thus

$$\hat{g}(x_0) = [\mathrm{Sinc} * g](x_0) = \sum_{u=-\infty}^{\infty} \mathrm{Sinc}(x_0 - u) \cdot g(u), \tag{10.52}$$

where ∗ is the linear convolution operator (see Vol. 1 [14, Sec. 5.3.1]). If the discrete signal g(u) is finite with length N (as is usually the case), it is assumed to be periodic (i.e., g(u) = g(u + kN) for all k ∈ Z).⁴ In this case, Eqn. (10.52) modifies to

$$\hat{g}(x_0) = \sum_{u=-\infty}^{\infty} \mathrm{Sinc}(x_0 - u) \cdot g(u \bmod N). \tag{10.53}$$

It may be surprising that the ideal interpolation of a discrete function g(u) at a position x0 apparently involves not only a few neighboring sample points but

⁴ This assumption is explained by the fact that a discrete Fourier spectrum implicitly corresponds to a periodic signal (also see Sec. 7.2.2).

Figure 10.14 Interpolation of a discrete signal: relation between signal and frequency space. The discrete signal g(u) in signal space (left) corresponds to the periodic Fourier spectrum G(ω) in frequency space (right). The spectrum Ĝ(ω) of the continuous signal is isolated from G(ω) by pointwise multiplication (×) with the square function Ππ(ω), which constitutes an ideal low-pass filter (right), i.e., Ĝ(ω) = G(ω) · Ππ(ω). In signal space (left), this operation corresponds to a linear convolution (∗) with the function Sinc(x), i.e., ĝ(x) = [Sinc ∗ g](x).

in general infinitely many values of g(u) whose weights decrease continuously with their distance from the given interpolation point x0 (at the rate |1/(π(x0 − u))|). Figure 10.15 shows two examples for interpolating the function g(u) at positions x0 = 4.4 and x0 = 5. If the function is interpolated at some integral position, such as x0 = 5, the sample g(u) at u = x0 receives the weight 1, while all other samples coincide with the zero positions of the Sinc function and are thus ignored. Consequently, the resulting interpolation values ĝ(x) are identical to the sample values g(u) at all integral positions x = u. If a continuous signal is properly frequency limited (by half the sampling frequency, ωs/2), it can be exactly reconstructed from the discrete signal by interpolation with the Sinc function, as Fig. 10.16 (a) demonstrates. Problems occur, however, around local high-frequency signal events, such as rapid transitions or pulses, as shown in Fig. 10.16 (b, c). In those situations, the Sinc interpolation causes strong overshooting or "ringing" artifacts, which are perceived as visually disturbing. For practical applications, the Sinc function is therefore not suitable as an interpolation kernel, and not only because of its infinite extent (and the resulting noncomputability).
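The infinite sum in Eqn. (10.53) cannot be evaluated directly, but a truncated version is enough to observe the behavior described above. In the following sketch, the class `SincInterpolation` and the truncation parameter `K` are assumptions introduced for illustration: only samples within K positions of x0 are included, so the result approximates the ideal interpolation rather than reproducing it exactly.

```java
/** Truncated Sinc interpolation of a periodic signal, after Eqn. (10.53) (sketch). */
final class SincInterpolation {

    /** Sinc function as defined in Eqn. (10.51). */
    static double sinc(double x) {
        if (x == 0.0) {
            return 1.0;
        }
        return Math.sin(Math.PI * x) / (Math.PI * x);
    }

    /**
     * Approximates Eqn. (10.53) at position x0 using only the samples
     * u = floor(x0) - K ... floor(x0) + K; g is treated as periodic with length N.
     */
    static double interpolate(double[] g, double x0, int K) {
        int N = g.length;
        int center = (int) Math.floor(x0);
        double sum = 0.0;
        for (int u = center - K; u <= center + K; u++) {
            sum += sinc(x0 - u) * g[Math.floorMod(u, N)];   // g(u mod N)
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] g = {1, 3, 5, 7, 9, 8, 6, 4, 2, 0};
        System.out.println("at x0 = 5.0: " + interpolate(g, 5.0, 50));
        System.out.println("at x0 = 4.4: " + interpolate(g, 4.4, 50));
    }
}
```

At integer positions all Sinc weights except one are zero up to rounding error, so the interpolated value coincides with the sample itself. Between samples the weights decay only as 1/|x0 − u|, so the truncated sum converges slowly as K grows.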


10. Geometric Operations

[Figure 10.15: plots of the shifted kernels Sinc(x − 4.4) for x0 = 4.4 (a) and Sinc(x − 5) for x0 = 5 (b), drawn over 0 ≤ x ≤ 10 with a vertical range of −0.4 to 1.]

Figure 10.15 Interpolation by convolving with the Sinc function. The Sinc function is shifted by aligning its origin with the interpolation points x0 = 4.4 (a) and x0 = 5 (b). The values of the shifted Sinc function (dashed curve) at the integral positions are the weights (coeﬃcients) for the corresponding sample values g(u).
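The weighting scheme described above can be tried out with a few lines of Java. The following is only a minimal sketch: the class and method names are hypothetical, and the infinite sum is truncated by assuming that all samples outside the given array are zero.

```java
final class SincWeights {

    /** Normalized Sinc function sin(pi x) / (pi x), with Sinc(0) = 1. */
    static double sinc(double x) {
        if (x == 0.0) {
            return 1.0;
        }
        return Math.sin(Math.PI * x) / (Math.PI * x);
    }

    /**
     * Sinc interpolation of the samples g[0..n-1] at the continuous position x0.
     * Assumption of this sketch: samples outside the array are zero, so the
     * infinite sum reduces to the available samples.
     */
    static double interpolate(double[] g, double x0) {
        double sum = 0.0;
        for (int u = 0; u < g.length; u++) {
            sum += sinc(x0 - u) * g[u];   // the weight of sample u is Sinc(x0 - u)
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] g = {2, 3, 5, 4, 1, 6, 7, 3, 2, 4, 5};   // arbitrary sample values
        // At an integral position only g[5] contributes (up to rounding error).
        System.out.println(interpolate(g, 5.0));
        // At a non-integral position all samples contribute.
        System.out.println(interpolate(g, 4.4));
    }
}
```

In floating-point arithmetic, the weights at the other integral positions are not exactly zero but on the order of 1e-17, so the value at x0 = 5 equals g(5) only up to rounding error.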

[Figure 10.16: reconstructed signals ĝ1(x) (a), ĝ2(x) (b), and ĝ3(x) (c), plotted for 1 ≤ x ≤ 10.]

Figure 10.16 Sinc interpolation on various signal types. The reconstructed function in (a) is identical to the continuous, band-limited original. The results for the step function (b) and the pulse function (c) show the strong ringing caused by Sinc (ideal low-pass) interpolation.

A useful interpolation function implements a low-pass filter that, on the one hand, introduces minimal blurring by preserving as much of the signal bandwidth as possible and, on the other hand, delivers a good reconstruction at rapid signal transitions. In this regard, the Sinc function is an extreme choice: it implements an ideal low-pass filter and thus preserves a maximum bandwidth and signal continuity but gives inferior results at signal transitions. At the opposite extreme, nearest-neighbor interpolation (Fig. 10.12) can perfectly handle steps and pulses but generally fails to produce a continuous signal reconstruction between sample points. The design of an interpolation function thus always involves a trade-off, and the quality of the results often depends on the particular application and subjective judgment. In the following, we discuss some common interpolation functions that come close to this goal and are therefore frequently used in practice.

[Figure 10.17: plots of the kernels wnn(x) (a) and wlin(x) (b) for −3 ≤ x ≤ 3.]

Figure 10.17 Convolution kernels for the nearest-neighbor interpolation wnn (x) and the linear interpolation wlin(x).

10.3.3 Interpolation by Convolution

As we saw earlier in the context of Sinc interpolation (Eqn. (10.51)), the reconstruction of a continuous signal can be described as a linear convolution operation. In general, we can express interpolation as a convolution of the given discrete function g(u) with some continuous interpolation kernel w(x) as

$$\hat{g}(x_0) = [w * g](x_0) = \sum_{u=-\infty}^{\infty} w(x_0 - u) \cdot g(u). \tag{10.54}$$

The Sinc interpolation in Eqn. (10.52) is only a special case with w(x) = Sinc(x). Similarly, the one-dimensional nearest-neighbor interpolation (Eqn. (10.49), Fig. 10.12 (a)) can be expressed as a linear convolution with the kernel

$$w_{\mathrm{nn}}(x) = \begin{cases} 1 & \text{for } -0.5 \le x < 0.5 \\ 0 & \text{otherwise} \end{cases} \tag{10.55}$$

and the linear interpolation (Eqn. (10.50), Fig. 10.12 (b)) with the kernel

$$w_{\mathrm{lin}}(x) = \begin{cases} 1 - |x| & \text{for } |x| < 1 \\ 0 & \text{for } |x| \ge 1. \end{cases} \tag{10.56}$$

The interpolation kernels wnn(x) and wlin(x) are both shown in Fig. 10.17, and sample results for various function types are plotted in Fig. 10.18.
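To make the convolution view concrete, here is a small Java sketch of Eqn. (10.54) with the two kernels of Eqns. (10.55) and (10.56). The class and method names are hypothetical, and samples outside the array are assumed to be zero.

```java
import java.util.function.DoubleUnaryOperator;

final class ConvolutionInterpolation1D {

    /** Nearest-neighbor kernel, Eqn. (10.55). */
    static double wNn(double x) {
        return (-0.5 <= x && x < 0.5) ? 1.0 : 0.0;
    }

    /** Linear kernel, Eqn. (10.56). */
    static double wLin(double x) {
        double r = Math.abs(x);
        return (r < 1.0) ? 1.0 - r : 0.0;
    }

    /**
     * Evaluates Eqn. (10.54) at position x0 for an arbitrary kernel w.
     * Assumption of this sketch: samples outside the array are zero.
     */
    static double interpolate(double[] g, double x0, DoubleUnaryOperator w) {
        double sum = 0.0;
        for (int u = 0; u < g.length; u++) {
            sum += w.applyAsDouble(x0 - u) * g[u];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] g = {1, 3, 7};
        // Linear: 0.75 * g[1] + 0.25 * g[2] = 4.0
        System.out.println(interpolate(g, 1.25, ConvolutionInterpolation1D::wLin));
        // Nearest neighbor: picks g[1] = 3.0
        System.out.println(interpolate(g, 1.25, ConvolutionInterpolation1D::wNn));
    }
}
```

Looping over all samples keeps the sketch close to the formula. Since both kernels vanish outside a small interval, a practical version would only visit the one or two samples near x0.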

[Figure 10.18: reconstructed signals ĝ1(x), ĝ2(x), ĝ3(x) in panels (a–f), plotted for 1 ≤ x ≤ 10.]

Figure 10.18 Interpolation examples: nearest-neighbor interpolation (a–c), linear interpolation (d–f).

10.3.4 Cubic Interpolation

Because of its infinite extent and the ringing artifacts caused by its slowly decaying oscillations, the Sinc function is not a useful interpolation kernel in practice. Therefore, several interpolation methods employ a truncated version of the Sinc function or an approximation of it, thereby making the convolution kernel more compact and reducing the ringing. A frequently used approximation of a truncated Sinc function is the so-called cubic interpolation, whose convolution kernel is defined as the piecewise cubic polynomial

$$w_{\mathrm{cub}}(x, a) = \begin{cases} (-a + 2) \cdot |x|^3 + (a - 3) \cdot |x|^2 + 1 & \text{for } 0 \le |x| < 1 \\ -a \cdot |x|^3 + 5a \cdot |x|^2 - 8a \cdot |x| + 4a & \text{for } 1 \le |x| < 2 \\ 0 & \text{for } |x| \ge 2. \end{cases} \tag{10.57}$$

The single control parameter a can be used to adjust the slope of this spline⁵ function (Fig. 10.19), which affects the amount of overshoot and thus the perceived “sharpness” of the interpolated signal. For a = 1, which is often recommended as a standard setting, Eqn. (10.57) simplifies to

$$w_{\mathrm{cub}}(x) = \begin{cases} |x|^3 - 2 \cdot |x|^2 + 1 & \text{for } 0 \le |x| < 1 \\ -|x|^3 + 5 \cdot |x|^2 - 8 \cdot |x| + 4 & \text{for } 1 \le |x| < 2 \\ 0 & \text{for } |x| \ge 2. \end{cases} \tag{10.58}$$

⁵ The family of functions described by Eqn. (10.57) is commonly referred to as cardinal splines [6] (see also Sec. 10.3.5).

[Figure 10.19: plots of wcub(x, a) for −2 ≤ x ≤ 2 (a) and of wcub(x) together with Sinc(x) for −6 ≤ x ≤ 6 (b).]

Figure 10.19 Cubic interpolation kernel. Function wcub(x, a) with control parameter a set to a = 0.25 (dashed curve), a = 1 (continuous curve), and a = 1.75 (dotted curve) (a). Cubic function wcub(x) and Sinc function compared (b).

Figure 10.20 shows the results of cubic interpolation with different settings of the control parameter a. Notice that the cubic reconstruction obtained with the popular standard setting (a = 1) exhibits substantial overshooting at edges as well as strong ripple effects in the continuous parts of the signal (Fig. 10.20 (d)). With a = 0.5, the expression in Eqn. (10.57) corresponds to a Catmull-Rom spline [16] (see also Sec. 10.3.5), which produces significantly better results than the standard setup (with a = 1), particularly in smooth signal regions (see Fig. 10.22 (a–c)).

In contrast to the Sinc function, the cubic interpolation kernel wcub(x) has a very small extent and is therefore efficient to compute (Fig. 10.19 (b)). Since wcub(x, a) = 0 for |x| ≥ 2, only four discrete values g(u) need to be accounted for in the convolution operation (Eqn. (10.54)) at any continuous position x0 ∈ R,

    g(u0 − 1), g(u0), g(u0 + 1), g(u0 + 2), where u0 = ⌊x0⌋.

This reduces the one-dimensional cubic interpolation to the expression

$$\hat{g}(x_0) = \sum_{u=\lfloor x_0 \rfloor - 1}^{\lfloor x_0 \rfloor + 2} w_{\mathrm{cub}}(x_0 - u, a) \cdot g(u). \tag{10.59}$$
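A possible Java rendering of the cubic kernel (Eqn. (10.57)) and of the four-sample sum in Eqn. (10.59) is sketched below. The class and method names are hypothetical, and clamping the index at the array borders is an assumption of this sketch, since the text does not specify how borders are handled.

```java
final class CubicInterpolation1D {

    /** Cubic kernel w_cub(x, a), Eqn. (10.57). */
    static double wCub(double x, double a) {
        double r = Math.abs(x);
        if (r < 1.0) {
            return (-a + 2.0) * r * r * r + (a - 3.0) * r * r + 1.0;
        } else if (r < 2.0) {
            return -a * r * r * r + 5.0 * a * r * r - 8.0 * a * r + 4.0 * a;
        }
        return 0.0;
    }

    /**
     * One-dimensional cubic interpolation, Eqn. (10.59).
     * Assumption of this sketch: indices outside the array are clamped
     * to the nearest border sample.
     */
    static double interpolate(double[] g, double x0, double a) {
        int u0 = (int) Math.floor(x0);
        double sum = 0.0;
        for (int u = u0 - 1; u <= u0 + 2; u++) {
            int uc = Math.max(0, Math.min(g.length - 1, u));   // clamped index
            sum += wCub(x0 - u, a) * g[uc];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] step = {0, 0, 1, 1, 1};
        // Half a sample past the step edge, a = 1 overshoots the step height:
        System.out.println(interpolate(step, 2.5, 1.0));   // 1.125
        // At an integral position the sample value itself is returned:
        System.out.println(interpolate(step, 2.0, 1.0));   // 1.0
    }
}
```

The first output illustrates the overshoot discussed above: the four weights at offset 0.5 are −0.125, 0.625, 0.625, and −0.125, so the value next to a unit step rises to 1.125.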

[Figure 10.20: reconstructed signals ĝ1(x), ĝ2(x), ĝ3(x) in panels (a–i), plotted for 1 ≤ x ≤ 10.]

Figure 10.20 Cubic interpolation examples. Parameter a in Eqn. (10.57) controls the amount of signal overshoot or perceived sharpness: a = 0.25 (a–c), standard setting a = 1 (d–f), a = 1.75 (g–i). Notice in (d) the ripple effects incurred by interpolating with the standard settings in smooth signal regions.

10.3.5 Spline Interpolation

The cubic interpolation kernel (Eqn. (10.57)) described in the previous section is a piecewise cubic polynomial function, also known as a cubic spline in computer graphics. In its general form, this function takes not only one but two control parameters (a, b) [54],⁶

$$w_{\mathrm{cs}}(x, a, b) = \frac{1}{6} \cdot \begin{cases} (-6a - 9b + 12) \cdot |x|^3 + (6a + 12b - 18) \cdot |x|^2 - 2b + 6 & \text{for } 0 \le |x| < 1 \\ (-6a - b) \cdot |x|^3 + (30a + 6b) \cdot |x|^2 + (-48a - 12b) \cdot |x| + 24a + 8b & \text{for } 1 \le |x| < 2 \\ 0 & \text{for } |x| \ge 2. \end{cases} \tag{10.60}$$

⁶ In [54], the parameters a and b were originally named C and B, respectively, with B ≡ b and C ≡ a.

Equation (10.60) describes a family of C1-continuous functions; i. e., the functions and their first derivatives are continuous everywhere, and thus their trajectories exhibit no discontinuities or corners. (The second derivative is continuous as well only for the cubic B-spline, with a = 0 and b = 1.) For b = 0, the function wcs(x, a, b) specifies a one-parameter family of so-called cardinal splines equivalent to the cubic interpolation function wcub(x, a) in Eqn. (10.57),

    wcs(x, a, 0) = wcub(x, a),

and for the standard setting a = 1 (Eqn. (10.58)) in particular

    wcs(x, 1, 0) = wcub(x, 1) = wcub(x).

Figure 10.21 shows three additional examples of this function type that are important in the context of interpolation: Catmull-Rom splines, cubic B-splines, and the Mitchell-Netravali function. All three functions are briefly described below. The actual computation of the interpolated signal follows exactly the same scheme as used for the cubic interpolation described in Eqn. (10.59).

[Figure 10.21: plot of three kernels wcs(x, a, b) for −2 ≤ x ≤ 2.]

Figure 10.21 Examples of cardinal spline functions wcs(x, a, b) as specified by Eqn. (10.60): Catmull-Rom spline wcs(x, 0.5, 0) (dotted line), cubic B-spline wcs(x, 0, 1) (dashed line), and Mitchell-Netravali function wcs(x, 1/3, 1/3) (solid line).

Catmull-Rom interpolation

With the control parameters set to a = 0.5 and b = 0, the function in Eqn. (10.60) is a Catmull-Rom spline [16], as already mentioned in Sec. 10.3.4:

$$w_{\mathrm{crm}}(x) = w_{\mathrm{cs}}(x, 0.5, 0) = \frac{1}{2} \cdot \begin{cases} 3 \cdot |x|^3 - 5 \cdot |x|^2 + 2 & \text{for } 0 \le |x| < 1 \\ -|x|^3 + 5 \cdot |x|^2 - 8 \cdot |x| + 4 & \text{for } 1 \le |x| < 2 \\ 0 & \text{for } |x| \ge 2. \end{cases} \tag{10.61}$$

Examples of signals interpolated with this kernel are shown in Fig. 10.22 (a–c). The results are similar to ones produced by cubic interpolation (with a = 1, see Fig. 10.20) with regard to sharpness, but the Catmull-Rom reconstruction is clearly superior in smooth signal regions (compare, e. g., Fig. 10.20 (d) vs. Fig. 10.22 (a)).


Cubic B-spline approximation

With parameters set to a = 0 and b = 1, Eqn. (10.60) corresponds to a cubic B-spline function [6] of the form

$$w_{\mathrm{cbs}}(x) = w_{\mathrm{cs}}(x, 0, 1) = \frac{1}{6} \cdot \begin{cases} 3 \cdot |x|^3 - 6 \cdot |x|^2 + 4 & \text{for } 0 \le |x| < 1 \\ -|x|^3 + 6 \cdot |x|^2 - 12 \cdot |x| + 8 & \text{for } 1 \le |x| < 2 \\ 0 & \text{for } |x| \ge 2. \end{cases} \tag{10.62}$$

This function is nonnegative everywhere and, when used as an interpolation kernel, causes a pure smoothing effect similar to a Gaussian smoothing filter (see Fig. 10.22 (d–f)). Notice also that, in contrast to all previously described interpolation methods, the reconstructed function does not pass through all discrete sample points. Thus, to be precise, the reconstruction with cubic B-splines is not called an interpolation but an approximation of the signal.

Mitchell-Netravali approximation

The design of an optimal interpolation kernel is always a trade-off between high bandwidth (sharpness) and good transient response (low ringing). Catmull-Rom interpolation, for example, emphasizes high sharpness, whereas cubic B-spline interpolation blurs but creates no ringing. Based on empirical tests, Mitchell and Netravali [54] proposed a cubic interpolation kernel as described in Eqn. (10.60) with parameter settings a = 1/3 and b = 1/3, and the resulting interpolation function

$$w_{\mathrm{mn}}(x) = w_{\mathrm{cs}}\bigl(x, \tfrac{1}{3}, \tfrac{1}{3}\bigr) = \frac{1}{18} \cdot \begin{cases} 21 \cdot |x|^3 - 36 \cdot |x|^2 + 16 & \text{for } 0 \le |x| < 1 \\ -7 \cdot |x|^3 + 36 \cdot |x|^2 - 60 \cdot |x| + 32 & \text{for } 1 \le |x| < 2 \\ 0 & \text{for } |x| \ge 2. \end{cases} \tag{10.63}$$

This function is the weighted sum of a Catmull-Rom spline (Eqn. (10.61)) and a cubic B-spline (Eqn. (10.62)), as is apparent in Fig. 10.21.⁷ The examples in Fig. 10.22 (g–i) show that this method is a good compromise, producing little overshoot, high edge sharpness, and good signal continuity in smooth regions. Since the resulting function does not pass through the original sample points, the Mitchell-Netravali method is again an approximation and not an interpolation.

⁷ See also Exercise 10.5.
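The two-parameter family of Eqn. (10.60) is easy to express as a single Java function from which the three named kernels follow by fixing (a, b). The sketch below uses hypothetical class and method names. The weights 2/3 and 1/3 in the `main` method are not taken from the text; they follow from the fact that Eqn. (10.60) depends linearly on a and b, so that (1/3, 1/3) = 2/3 · (0.5, 0) + 1/3 · (0, 1).

```java
final class CubicSplineKernels {

    /** General cubic spline kernel w_cs(x, a, b), Eqn. (10.60). */
    static double wCs(double x, double a, double b) {
        double r = Math.abs(x);
        double w;
        if (r < 1.0) {
            w = (-6 * a - 9 * b + 12) * r * r * r
              + (6 * a + 12 * b - 18) * r * r
              - 2 * b + 6;
        } else if (r < 2.0) {
            w = (-6 * a - b) * r * r * r
              + (30 * a + 6 * b) * r * r
              + (-48 * a - 12 * b) * r
              + 24 * a + 8 * b;
        } else {
            w = 0.0;
        }
        return w / 6.0;
    }

    /** Catmull-Rom kernel, Eqn. (10.61). */
    static double wCatmullRom(double x) {
        return wCs(x, 0.5, 0.0);
    }

    /** Cubic B-spline kernel, Eqn. (10.62). */
    static double wBSpline(double x) {
        return wCs(x, 0.0, 1.0);
    }

    /** Mitchell-Netravali kernel, Eqn. (10.63). */
    static double wMitchellNetravali(double x) {
        return wCs(x, 1.0 / 3.0, 1.0 / 3.0);
    }

    public static void main(String[] args) {
        for (double x = 0.0; x <= 2.0; x += 0.5) {
            double mix = 2.0 / 3.0 * wCatmullRom(x) + 1.0 / 3.0 * wBSpline(x);
            // Both columns agree up to rounding error.
            System.out.println(x + ": " + wMitchellNetravali(x) + " vs " + mix);
        }
    }
}
```

Evaluating the kernels at the integers also shows why only Catmull-Rom interpolates: its kernel is 1 at x = 0 and 0 at x = ±1, whereas the cubic B-spline yields 2/3 and 1/6 there, so neighboring samples always leak into the result.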

[Figure 10.22: reconstructed signals ĝ1(x), ĝ2(x), ĝ3(x) in panels (a–i), plotted for 1 ≤ x ≤ 10.]

Figure 10.22 Cardinal spline reconstruction examples: Catmull-Rom interpolation (a–c), cubic B-spline approximation (d–f), and Mitchell-Netravali approximation (g–i).

10.3.6 Lanczos Interpolation

The Lanczos⁸ interpolation belongs to the family of “windowed Sinc” methods. In contrast to the methods described in the previous sections, these do not use a polynomial (or other) approximation of the Sinc function but the Sinc function itself combined with a suitable window function ψ(x); i. e., an interpolation kernel of the form

$$w(x) = \psi(x) \cdot \mathrm{Sinc}(x). \tag{10.64}$$

The particular window functions for the Lanczos interpolation are defined as

$$\psi_{L_n}(x) = \begin{cases} 1 & \text{for } |x| = 0 \\ \dfrac{\sin(\pi \frac{x}{n})}{\pi \frac{x}{n}} & \text{for } 0 < |x| < n \\ 0 & \text{for } |x| \ge n, \end{cases} \tag{10.65}$$

where n ∈ N denotes the order of the filter [56, 74]. Notice that the window function is again a truncated Sinc function! For the Lanczos filters of order n = 2, 3, which are the most commonly used in image processing, the corresponding window functions are

$$\psi_{L_2}(x) = \begin{cases} 1 & \text{for } |x| = 0 \\ \dfrac{\sin(\pi \frac{x}{2})}{\pi \frac{x}{2}} & \text{for } 0 < |x| < 2 \\ 0 & \text{for } |x| \ge 2, \end{cases} \tag{10.66}$$

$$\psi_{L_3}(x) = \begin{cases} 1 & \text{for } |x| = 0 \\ \dfrac{\sin(\pi \frac{x}{3})}{\pi \frac{x}{3}} & \text{for } 0 < |x| < 3 \\ 0 & \text{for } |x| \ge 3. \end{cases} \tag{10.67}$$

⁸ Cornelius Lanczos (1893–1974).

Both window functions are shown in Fig. 10.23 (a, b). From Eqns. (10.64) and (10.65), the general one-dimensional Lanczos interpolation kernel of order n ≥ 1 is then defined as

$$w_{L_n}(x) = \psi_{L_n}(x) \cdot \mathrm{Sinc}(x) = \begin{cases} 1 & \text{for } |x| = 0 \\ n \cdot \dfrac{\sin(\pi \frac{x}{n}) \cdot \sin(\pi x)}{\pi^2 x^2} & \text{for } 0 < |x| < n \\ 0 & \text{for } |x| \ge n, \end{cases} \tag{10.68}$$

and thus the 1D Lanczos kernels of orders n = 2 and n = 3 are

$$w_{L_2}(x) = \begin{cases} 1 & \text{for } |x| = 0 \\ 2 \cdot \dfrac{\sin(\pi \frac{x}{2}) \cdot \sin(\pi x)}{\pi^2 x^2} & \text{for } 0 < |x| < 2 \\ 0 & \text{for } |x| \ge 2 \end{cases} \tag{10.69}$$

and

$$w_{L_3}(x) = \begin{cases} 1 & \text{for } |x| = 0 \\ 3 \cdot \dfrac{\sin(\pi \frac{x}{3}) \cdot \sin(\pi x)}{\pi^2 x^2} & \text{for } 0 < |x| < 3 \\ 0 & \text{for } |x| \ge 3. \end{cases} \tag{10.70}$$

[Figure 10.23: plots of ψL2 (a), ψL3 (b), wL2(x) with Sinc(x) (c), and wL3(x) with Sinc(x) (d) for −3 ≤ x ≤ 3.]

Figure 10.23 One-dimensional Lanczos interpolation kernels. Lanczos window functions ψL2 (a), ψL3 (b), and the corresponding interpolation kernels wL2 (c), wL3 (d). The original Sinc function (dotted curve) is shown for comparison.

Figure 10.23 (c, d) shows the resulting interpolation kernels together with the original Sinc function. The function wL2(x) is quite similar to the Catmull-Rom kernel wcrm(x) (Eqn. (10.61), Fig. 10.21), so the results can be expected to be similar as well, as shown in Fig. 10.24 (a–c) (cf. Fig. 10.22 (a–c)). Notice, however, the relatively poor reconstruction in the smooth signal regions (Fig. 10.24 (a)) and the strong ringing introduced in the constant high-amplitude regions (Fig. 10.24 (b)). The “3-tap” kernel wL3(x) reduces these artifacts and produces steeper edges, at the cost of increased overshoot (Fig. 10.24 (d–f)).

In summary, although Lanczos interpolators have seen revived interest and popularity in recent years, they do not seem to offer much (if any) advantage over other established methods, particularly the cubic, Catmull-Rom, or Mitchell-Netravali interpolations. While these are based on efficiently computable polynomial functions, Lanczos interpolation requires trigonometric functions, which are relatively costly to compute unless some form of tabulation is used.
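For reference, the general Lanczos kernel of Eqn. (10.68) can be written as a short Java function. This is a minimal sketch with hypothetical names; it evaluates the two sine terms directly and does not include the tabulation mentioned above.

```java
final class LanczosKernel {

    /** One-dimensional Lanczos kernel w_Ln(x) of order n, Eqn. (10.68). */
    static double wLanczos(double x, int n) {
        double r = Math.abs(x);
        if (r == 0.0) {
            return 1.0;
        }
        if (r >= n) {
            return 0.0;
        }
        double px = Math.PI * r;
        // n * sin(pi x / n) * sin(pi x) / (pi^2 x^2); the product of the two
        // sines is even in x, so using |x| does not change the result.
        return n * Math.sin(px / n) * Math.sin(px) / (px * px);
    }

    public static void main(String[] args) {
        for (double x = -3.0; x <= 3.0; x += 0.5) {
            System.out.println(x + ": L2 = " + wLanczos(x, 2) + ", L3 = " + wLanczos(x, 3));
        }
    }
}
```

As with the Sinc function, the kernel values at nonzero integral positions inside the support are zero only up to rounding error, because Math.sin(Math.PI) is not exactly 0 in floating-point arithmetic.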

10.3.7 Interpolation in 2D

So far we have only looked at interpolating (or reconstructing) one-dimensional signals from discrete samples. Images are two-dimensional signals but, as we shall see in this section, the techniques for interpolating images are very similar and can be derived from the one-dimensional approach. In particular, “ideal” (low-pass filter) interpolation requires a two-dimensional Sinc function defined as

$$\mathrm{Sinc}(x, y) = \mathrm{Sinc}(x) \cdot \mathrm{Sinc}(y) = \frac{\sin(\pi x)}{\pi x} \cdot \frac{\sin(\pi y)}{\pi y}, \tag{10.71}$$

which is shown in Fig. 10.25 (a). Just as in 1D, the 2D Sinc function is not a practical interpolation function for various reasons. In the following, we look at some common interpolation methods for images, particularly the nearest-neighbor, bilinear, bicubic, and Lanczos interpolations, whose 1D versions were described in the previous sections.

[Figure 10.24: reconstructed signals ĝ1(x), ĝ2(x), ĝ3(x) in panels (a–f), plotted for 1 ≤ x ≤ 10.]

Figure 10.24 Lanczos interpolation examples: Lanczos-2 (a–c), Lanczos-3 (d–f). Note the ringing in the ﬂat (constant) regions caused by Lanczos-2 interpolation in the left part of (b). The Lanczos-3 interpolator shows less ringing (e) but produces steeper edges at the cost of increased overshoot (e, f).


Figure 10.25 Interpolation kernels in 2D: Sinc kernel Sinc(x, y) (a) and nearest-neighbor kernel Wnn (x, y) (b) for −3 ≤ x, y ≤ 3.

Nearest-neighbor interpolation in 2D

The pixel closest to a given continuous point (x0, y0) is found by rounding the x and y coordinates independently to integral values,

$$\hat{I}(x_0, y_0) = I(u_0, v_0), \quad \text{with} \quad u_0 = \mathrm{round}(x_0) = \lfloor x_0 + 0.5 \rfloor, \quad v_0 = \mathrm{round}(y_0) = \lfloor y_0 + 0.5 \rfloor. \tag{10.72}$$


Figure 10.26 Image enlargement (8×): original (a), nearest-neighbor interpolation (b), and bilinear interpolation (c).

As in the 1D case, the interpolation in 2D can be described as a linear convolution (linear filter). The 2D kernel for the nearest-neighbor interpolation is, analogous to Eqn. (10.55), defined as

$$W_{\mathrm{nn}}(x, y) = \begin{cases} 1 & \text{for } -0.5 \le x, y < 0.5 \\ 0 & \text{otherwise.} \end{cases} \tag{10.73}$$

This function is shown in Fig. 10.25 (b). Nearest-neighbor interpolation is known for its strong blocking effects (Fig. 10.26 (b)) and thus is rarely used for geometric image operations. However, in some situations, this effect may be intended; for example, if an image is to be enlarged by replicating each pixel without any smoothing.

Bilinear interpolation

The 2D counterpart to the linear interpolation (Sec. 10.3.1) is the so-called bilinear interpolation,⁹ whose operation is illustrated in Fig. 10.27. For the given interpolation point (x0, y0), we first find the four closest (surrounding) pixels A, B, C, D in the image I with

$$A = I(u_0, v_0), \quad B = I(u_0 + 1, v_0), \quad C = I(u_0, v_0 + 1), \quad D = I(u_0 + 1, v_0 + 1), \tag{10.74}$$

where u0 = ⌊x0⌋ and v0 = ⌊y0⌋. Then the pixel values A, B, C, D are interpolated in horizontal and subsequently in vertical direction.

⁹ Not to be confused with the bilinear mapping (transformation) described in Sec. 10.1.5.

[Figure 10.27: the four pixels A, B, C, D at columns u0, u0 + 1 and rows v0, v0 + 1, the intermediate values E, F, the result G at (x0, y0), and the distances a, 1 − a, b, 1 − b (a); interpolated surface between four adjacent pixels (b).]

Figure 10.27 Bilinear interpolation. For a given position (x0, y0), the interpolated value is computed from the values A, B, C, D of the four closest pixels in two steps (a). First the intermediate values E and F are computed by linear interpolation in the horizontal direction between A, B and C, D, respectively, where a = x0 − u0 is the distance to the nearest pixel to the left of x0. Subsequently, the intermediate values E, F are interpolated in the vertical direction, where b = y0 − v0 is the distance to the nearest pixel below y0. An example for the resulting surface between four adjacent pixels is shown in (b).

The intermediate values E, F are determined by the distance a = x0 − u0 between the interpolation point (x0, y0) and the horizontal raster coordinate u0 as

$$E = A + (x_0 - u_0) \cdot (B - A) = A + a \cdot (B - A),$$
$$F = C + (x_0 - u_0) \cdot (D - C) = C + a \cdot (D - C),$$

and the final interpolation value G is computed from the vertical distance b = y0 − v0 as

$$\hat{I}(x_0, y_0) = G = E + (y_0 - v_0) \cdot (F - E) = E + b \cdot (F - E) = (1 - a)(1 - b) \, A + a (1 - b) \, B + (1 - a) \, b \, C + a \, b \, D. \tag{10.75}$$

Expressed as a linear convolution filter, the corresponding 2D kernel Wbil(x, y) is the product of the two one-dimensional kernels wlin(x) and wlin(y) (Eqn. (10.56)),

$$W_{\mathrm{bil}}(x, y) = w_{\mathrm{lin}}(x) \cdot w_{\mathrm{lin}}(y) = \begin{cases} 1 - |x| - |y| + |x| \cdot |y| & \text{for } 0 \le |x|, |y| < 1 \\ 0 & \text{otherwise.} \end{cases} \tag{10.76}$$

In this function (plotted in Fig. 10.28 (a)), we can recognize the bilinear term that gives this method its name.
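The two-step scheme of Eqn. (10.75) translates almost literally into Java. In the following sketch the image is assumed to be stored as a plain array indexed as I[v][u], and coordinates outside the image are clamped to the border; both choices, as well as the names, are assumptions made for illustration only.

```java
final class BilinearInterpolation {

    /** Bilinear interpolation of image I (indexed as I[v][u]) at (x0, y0), Eqn. (10.75). */
    static double interpolate(double[][] I, double x0, double y0) {
        int u0 = (int) Math.floor(x0);
        int v0 = (int) Math.floor(y0);
        double a = x0 - u0;              // horizontal distance to column u0
        double b = y0 - v0;              // vertical distance to row v0
        double A = get(I, u0, v0);
        double B = get(I, u0 + 1, v0);
        double C = get(I, u0, v0 + 1);
        double D = get(I, u0 + 1, v0 + 1);
        double E = A + a * (B - A);      // horizontal interpolation in row v0
        double F = C + a * (D - C);      // horizontal interpolation in row v0 + 1
        return E + b * (F - E);          // vertical interpolation between E and F
    }

    /** Pixel access with clamping at the image borders (assumption of this sketch). */
    static double get(double[][] I, int u, int v) {
        int vc = Math.max(0, Math.min(I.length - 1, v));
        int uc = Math.max(0, Math.min(I[0].length - 1, u));
        return I[vc][uc];
    }

    public static void main(String[] args) {
        double[][] I = {
            {0, 10},
            {20, 30}
        };
        System.out.println(interpolate(I, 0.5, 0.5));     // 15.0
        System.out.println(interpolate(I, 0.25, 0.75));   // 17.5
    }
}
```

Since the four weights (1 − a)(1 − b), a(1 − b), (1 − a)b, and ab are nonnegative and sum to 1, the result always lies between the smallest and the largest of the four pixel values.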


Figure 10.28 2D interpolation kernels: bilinear kernel Wbil (x, y) (a) and bicubic kernel Wbic (x, y) (b) for −3 ≤ x, y ≤ 3.

Bicubic and spline interpolation

The convolution kernel for the two-dimensional cubic interpolation is also defined as the product of the corresponding one-dimensional kernels (Eqn. (10.58)),

$$W_{\mathrm{bic}}(x, y) = w_{\mathrm{cub}}(x) \cdot w_{\mathrm{cub}}(y). \tag{10.77}$$

The resulting kernel is plotted in Fig. 10.28 (b). Due to the decomposition into one-dimensional kernels (Eqn. (10.77)), the computation of the bicubic interpolation is separable in x, y and can thus be expressed as

$$\hat{I}(x_0, y_0) = \sum_{v=\lfloor y_0 \rfloor - 1}^{\lfloor y_0 \rfloor + 2} \; \sum_{u=\lfloor x_0 \rfloor - 1}^{\lfloor x_0 \rfloor + 2} I(u, v) \cdot W_{\mathrm{bic}}(x_0 - u, y_0 - v) = \sum_{j=0}^{3} \Bigl[ w_{\mathrm{cub}}(y_0 - v_j) \cdot \underbrace{\sum_{i=0}^{3} I(u_i, v_j) \cdot w_{\mathrm{cub}}(x_0 - u_i)}_{p_j} \Bigr], \tag{10.78}$$

with ui = ⌊x0⌋ − 1 + i and vj = ⌊y0⌋ − 1 + j. The value pj denotes the intermediate result of the cubic interpolation in the x direction in line j, as illustrated in Fig. 10.29. Equation (10.78) describes a simple and efficient procedure for computing the bicubic interpolation using only a one-dimensional kernel wcub(x). The interpolation is based on a 4 × 4 neighborhood of pixels and requires a total of 16 + 4 = 20 additions and multiplications.

[Figure 10.29: source pixels I(u0, v0) around the position (x0, y0), with the intermediate results p0 … p3 of step 1 (left) and the final result Î(x0, y0) of step 2 (right).]

Figure 10.29 Bicubic interpolation in two steps. The discrete image I is to be interpolated at some continuous position (x0, y0). In step 1 (left), a one-dimensional interpolation is performed in the horizontal direction with wcub(x) over four pixels I(ui, vj) in four lines. One intermediate result pj is computed for each line j. In step 2 (right), the result Î(x0, y0) is computed by a single cubic interpolation in the vertical direction over the intermediate results p0 … p3.

This method, which is summarized in Alg. 10.2, can be used to implement any x/y-separable 2D interpolation kernel of size 4 × 4, such as the two-dimensional Catmull-Rom interpolation (Eqn. (10.61)) with

$$W_{\mathrm{crm}}(x, y) = w_{\mathrm{crm}}(x) \cdot w_{\mathrm{crm}}(y) \tag{10.79}$$

or the Mitchell-Netravali interpolation (Eqn. (10.63)) with

$$W_{\mathrm{mn}}(x, y) = w_{\mathrm{mn}}(x) \cdot w_{\mathrm{mn}}(y). \tag{10.80}$$

The corresponding 2D kernels are shown in Fig. 10.30. For interpolation with separable kernels of larger size, see the general procedure in Alg. 10.3.

Figure 10.30 Two-dimensional spline interpolation kernels: Catmull-Rom kernel Wcrm(x, y) (a), Mitchell-Netravali kernel Wmn(x, y) (b), for −3 ≤ x, y ≤ 3.

Algorithm 10.2 Bicubic interpolation of image I at position (x0, y0). The one-dimensional cubic function wcub(·) (Eqn. (10.57)) is used for the separate interpolation in the x and y directions based on a neighborhood of 4 × 4 pixels.

BicubicInterpolation(I, x0, y0), with (x0, y0) ∈ R²
Returns the interpolated value of the image I at the continuous position (x0, y0).

    Let q ← 0
    for j ← 0 … 3 do                          (iterate over 4 lines)
        Let v ← ⌊y0⌋ + j − 1
        Let p ← 0
        for i ← 0 … 3 do                      (iterate over 4 columns)
            Let u ← ⌊x0⌋ + i − 1
            Let p ← p + I(u, v) · wcub(x0 − u)
        q ← q + p · wcub(y0 − v)
    return q.

Lanczos interpolation

The kernels for the 2D Lanczos interpolation are also x/y-separable into one-dimensional kernels (Eqns. (10.69) and (10.70), respectively),

$$W_{L_n}(x, y) = w_{L_n}(x) \cdot w_{L_n}(y). \tag{10.81}$$

The resulting kernels for orders n = 2 and n = 3 are shown in Fig. 10.31. Because of the separability, the 2D Lanczos interpolation can be computed, similar to the bicubic interpolation, separately in the x and y directions. Like the bicubic kernel, the 2-tap Lanczos kernel WL2 (Eqn. (10.69)) is zero outside the interval −2 ≤ x, y ≤ 2, and thus the procedure described in Eqn. (10.78) and Alg. 10.2 can be used with only a small modification (replace wcub by wL2). The 3-tap Lanczos kernel WL3 (Eqn. (10.70)) requires two additional rows and columns, and therefore the 2D interpolation changes to

$$\hat{I}(x_0, y_0) = \sum_{v=\lfloor y_0 \rfloor - 2}^{\lfloor y_0 \rfloor + 3} \; \sum_{u=\lfloor x_0 \rfloor - 2}^{\lfloor x_0 \rfloor + 3} I(u, v) \cdot W_{L_3}(x_0 - u, y_0 - v) = \sum_{j=0}^{5} \Bigl[ w_{L_3}(y_0 - v_j) \cdot \sum_{i=0}^{5} I(u_i, v_j) \cdot w_{L_3}(x_0 - u_i) \Bigr], \tag{10.82}$$

with ui = ⌊x0⌋ + i − 2 and vj = ⌊y0⌋ + j − 2.

Thus, the L3 Lanczos interpolation in 2D uses a support region of 6 × 6 = 36 pixels from the original image, 20 pixels more than the bicubic interpolation.


Figure 10.31 Two-dimensional Lanczos kernels for n = 2 and n = 3: kernels WL2 (x, y) (a) and WL3 (x, y) (b), with −3 ≤ x, y ≤ 3.

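In general, the expression for a 2D Lanczos interpolator Ln of arbitrary order n ≥ 1 is

$$\hat{I}(x_0, y_0) = \sum_{v=\lfloor y_0 \rfloor - n + 1}^{\lfloor y_0 \rfloor + n} \; \sum_{u=\lfloor x_0 \rfloor - n + 1}^{\lfloor x_0 \rfloor + n} I(u, v) \cdot W_{L_n}(x_0 - u, y_0 - v) = \sum_{j=0}^{2n-1} \Bigl[ w_{L_n}(y_0 - v_j) \cdot \sum_{i=0}^{2n-1} I(u_i, v_j) \cdot w_{L_n}(x_0 - u_i) \Bigr], \tag{10.83}$$

with ui = ⌊x0⌋ + i − n + 1 and vj = ⌊y0⌋ + j − n + 1.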

The size of this interpolator's support region is 2n × 2n pixels. Algorithm 10.3 shows how the expression in Eqn. (10.83) can be computed; it actually describes a general interpolation procedure that can be used with any separable interpolation kernel W(x, y) = wn(x) · wn(y) of extent ±n.

Examples and discussion

Figures 10.32 and 10.33 compare the interpolation methods described above: nearest-neighbor, bilinear, bicubic, Catmull-Rom, cubic B-spline, Mitchell-Netravali, and Lanczos interpolation. In both figures, the original images are rotated counter-clockwise by 15°. A gray background is used to visualize the edge overshoot produced by some of the interpolators.

Nearest-neighbor interpolation (Fig. 10.32 (b)) creates no new pixel values but forms, as expected, coarse blocks of pixels with the same intensity. The effect of the bilinear interpolation (Fig. 10.32 (c)) is local smoothing over four neighboring pixels. The weights for these four pixels are nonnegative and sum to 1, and thus no result can be smaller than the smallest neighboring pixel value or greater than the greatest neighboring pixel value. In other words, bilinear interpolation cannot create any over- or undershoot at edges. This is not the case for the bicubic interpolation (Fig. 10.32 (d)): some of the coefficients in the bicubic interpolation kernel are negative, which makes pixels near edges clearly brighter or darker, respectively, thus increasing the perceived sharpness. In general, bicubic interpolation produces clearly better results than the bilinear method at comparable computing cost, and it is thus widely accepted as the standard technique and used in most image manipulation programs. By adjusting the control parameter a (Eqn. (10.57)), the bicubic kernel can be easily tuned to fit the needs of particular applications. For example, the Catmull-Rom method (Fig. 10.32 (e)) can be implemented with the bicubic interpolation by setting a = 0.5 (Eqns. (10.61) and (10.79)).

Results from the 2D Lanczos interpolation (Fig. 10.32 (h)) using the 2-tap kernel WL2 cannot be much better than from the bicubic interpolation, which can be adjusted to give similar results without causing any ringing in flat regions, as visible in Fig. 10.24. The 3-tap Lanczos kernel WL3 (Fig. 10.32 (i)), on the other hand, should produce slightly sharper edges at the cost of increased overshoot (see also Exercise 10.7).

In summary, for high-quality applications one should consider the Catmull-Rom (Eqns. (10.61) and (10.79)) or the Mitchell-Netravali (Eqns. (10.63) and (10.80)) methods, which offer good reconstruction at the same computational cost as the bicubic interpolation.

Algorithm 10.3 General interpolation with a separable interpolation kernel W(x, y) = wn(x) · wn(y) of extent ±n (i. e., the 1D kernel wn(x) is zero for x < −n and x > n, with n ∈ N). Note that the BicubicInterpolation procedure in Alg. 10.2 is a special instance of this algorithm (with n = 2).

SeparableInterpolation(I, x0, y0, wn), with (x0, y0) ∈ R²
Returns the interpolated value of the image I at the continuous position (x0, y0), using the interpolation kernel W(x, y) = wn(x) · wn(y).

    Let n be the extent of the kernel wn (n ≥ 1)
    Let q ← 0
    for j ← 0 … 2n−1 do                       (iterate over 2n lines)
        Let v ← vj = ⌊y0⌋ + j − n + 1
        Let p ← 0
        for i ← 0 … 2n−1 do                   (iterate over 2n columns)
            Let u ← ui = ⌊x0⌋ + i − n + 1
            Let p ← p + I(u, v) · wn(x0 − u)
        q ← q + p · wn(y0 − v)
    return q.
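The general procedure of Alg. 10.3 could be sketched in Java as follows. The kernel is passed as a function together with its extent n, so the same method covers the bicubic and Catmull-Rom cases (n = 2) as well as Lanczos-3 (n = 3). The array layout I[v][u], the border clamping, and all names are assumptions of this sketch and not part of the algorithm as stated.

```java
import java.util.function.DoubleUnaryOperator;

final class SeparableInterpolation {

    /** Interpolates I (indexed as I[v][u]) at (x0, y0) with a separable kernel w of extent n. */
    static double interpolate(double[][] I, double x0, double y0,
                              DoubleUnaryOperator w, int n) {
        int xf = (int) Math.floor(x0);
        int yf = (int) Math.floor(y0);
        double q = 0.0;
        for (int j = 0; j <= 2 * n - 1; j++) {        // iterate over 2n lines
            int v = yf + j - n + 1;
            double p = 0.0;
            for (int i = 0; i <= 2 * n - 1; i++) {    // iterate over 2n columns
                int u = xf + i - n + 1;
                p += get(I, u, v) * w.applyAsDouble(x0 - u);
            }
            q += p * w.applyAsDouble(y0 - v);
        }
        return q;
    }

    /** Pixel access with clamping at the image borders (assumption of this sketch). */
    static double get(double[][] I, int u, int v) {
        int vc = Math.max(0, Math.min(I.length - 1, v));
        int uc = Math.max(0, Math.min(I[0].length - 1, u));
        return I[vc][uc];
    }

    /** Catmull-Rom kernel, Eqn. (10.61), extent n = 2. */
    static double wCatmullRom(double x) {
        double r = Math.abs(x);
        if (r < 1.0) {
            return 0.5 * (3 * r * r * r - 5 * r * r + 2);
        } else if (r < 2.0) {
            return 0.5 * (-r * r * r + 5 * r * r - 8 * r + 4);
        }
        return 0.0;
    }

    /** Lanczos kernel of order 3, Eqn. (10.70), extent n = 3. */
    static double wLanczos3(double x) {
        double r = Math.abs(x);
        if (r == 0.0) return 1.0;
        if (r >= 3.0) return 0.0;
        double px = Math.PI * r;
        return 3.0 * Math.sin(px / 3.0) * Math.sin(px) / (px * px);
    }

    public static void main(String[] args) {
        double[][] I = {
            {1, 2, 3, 4},
            {5, 6, 7, 8},
            {9, 10, 11, 12},
            {13, 14, 15, 16}
        };
        System.out.println(interpolate(I, 1.5, 1.5, SeparableInterpolation::wCatmullRom, 2));
        System.out.println(interpolate(I, 1.5, 1.5, SeparableInterpolation::wLanczos3, 3));
    }
}
```

The test image is a linear ramp, I(u, v) = 1 + u + 4v, which the Catmull-Rom kernel reproduces exactly in the interior, giving 8.5 at (1.5, 1.5) up to rounding error. The Lanczos-3 result differs from this value, partly because its 6 × 6 support reaches beyond the 4 × 4 image and relies on the clamped border pixels.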

Figure 10.32 Image interpolation methods compared (line art): part of the original image (a), which is subsequently rotated by 15°. Nearest-neighbor (b), bilinear (c), bicubic (d), Catmull-Rom (e), cubic B-spline (f), Mitchell-Netravali (g), Lanczos-2 (h), Lanczos-3 (i) interpolation. The image was given a gray background to visualize overshooting, particularly noticeable with bicubic (d) and Lanczos-3 (i) interpolation. Notice the ringing in the flat image regions produced by the Lanczos-2 interpolation (h).

Figure 10.33 Image interpolation methods compared (text image): original image (a), which is subsequently rotated by 15°. Nearest-neighbor (b), bilinear (c), bicubic (d), Catmull-Rom (e), cubic B-spline (f), Mitchell-Netravali (g), Lanczos-2 (h), Lanczos-3 (i) interpolation.

10.3.8 Aliasing

As we described in the previous parts of this chapter, the usual approach for implementing geometric image transformations can be summarized by the following three steps (Fig. 10.34):

1. Each discrete image point (u0′, v0′) of the target image is projected by the inverse geometric transformation T⁻¹ to the continuous coordinate (x0, y0) in the source image.
2. The continuous image function Î(x, y) is reconstructed from the discrete source image I(u, v) by interpolation (using one of the methods described above).
3. The interpolated function is sampled at position (x0, y0), and the sample value Î(x0, y0) is transferred to the target pixel I′(u0′, v0′).
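The three steps can be illustrated with a small target-to-source loop. The sketch below rotates an image about its center and uses nearest-neighbor interpolation for the reconstruction step; the choice of transformation, the array layout I[v][u], the background value for positions outside the source image, and all names are assumptions made for this example.

```java
final class TargetToSourceMapping {

    /**
     * Rotates image I (indexed as I[v][u]) by the angle phi (in radians) about
     * its center. Target pixels that map outside the source image receive the
     * given background value.
     */
    static double[][] rotate(double[][] I, double phi, double background) {
        int h = I.length;
        int w = I[0].length;
        double xc = 0.5 * (w - 1);
        double yc = 0.5 * (h - 1);
        double cos = Math.cos(phi);
        double sin = Math.sin(phi);
        double[][] target = new double[h][w];
        for (int v1 = 0; v1 < h; v1++) {
            for (int u1 = 0; u1 < w; u1++) {
                // Step 1: project the target pixel (u1, v1) back into the
                // source image by the inverse transformation (rotation by -phi).
                double dx = u1 - xc;
                double dy = v1 - yc;
                double x0 = xc + cos * dx + sin * dy;
                double y0 = yc - sin * dx + cos * dy;
                // Steps 2 and 3: reconstruct the source image at (x0, y0),
                // here by nearest-neighbor interpolation, and take the sample.
                int u0 = (int) Math.floor(x0 + 0.5);
                int v0 = (int) Math.floor(y0 + 0.5);
                boolean inside = u0 >= 0 && u0 < w && v0 >= 0 && v0 < h;
                target[v1][u1] = inside ? I[v0][u0] : background;
            }
        }
        return target;
    }

    public static void main(String[] args) {
        double[][] I = {
            {1, 2, 3},
            {4, 5, 6},
            {7, 8, 9}
        };
        double[][] r = rotate(I, Math.PI / 2, 0.0);
        for (double[] row : r) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```

Because the loop runs over the target pixels, every target pixel receives exactly one value and no holes can occur. Any other interpolation method could be substituted in steps 2 and 3 without changing the structure of the loop.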

[Figure 10.34: a target pixel (u0′, v0′) of I′(u′, v′) is mapped by T⁻¹ to the continuous position (x0, y0) in the source image I(u, v).]

Figure 10.34 Sampling errors in geometric operations. If the geometric transformation T leads to a local contraction of the image (which corresponds to a local enlargement by T⁻¹), the distance between adjacent sample points in I is increased. This reduces the local sampling frequency and thus the maximum signal frequency allowed in the source image, which eventually leads to aliasing.

Sampling the interpolated image

One problem not considered so far concerns the process of sampling the reconstructed, continuous image function in step 3 above. The problem occurs when the geometric transformation T causes parts of the image to be contracted. In this case, the distance between adjacent sample points on the source image is locally increased by the corresponding inverse transformation T⁻¹. Now, widening the sampling distance reduces the spatial sampling rate and thus the maximum permissible frequencies in the reconstructed image function Î(x, y). Eventually this leads to a violation of the sampling criterion and causes visible aliasing in the transformed image. The problem does not occur when the image is enlarged by the geometric transformation because in this case the sampling interval on the source image is shortened (corresponding to a higher sampling frequency) and no aliasing can occur.

Notice that this effect is largely unrelated to the interpolation method, as demonstrated by the examples in Fig. 10.35. The effect is most noticeable under nearest-neighbor interpolation in Fig. 10.35 (b), where the thin lines are simply not “hit” by the widened sampling raster and thus disappear in some places. Important image information is thereby lost. The bilinear and bicubic interpolation methods in Fig. 10.35 (c, d) have wider interpolation kernels but still cannot avoid the aliasing effect. The problem of course gets worse with increasing reduction factors.

Figure 10.35 Aliasing caused by local image contraction. Aliasing is caused by a violation of the sampling criterion and is largely unaffected by the interpolation method used: complete transformed image (a), detail using nearest-neighbor interpolation (b), bilinear interpolation (c), and bicubic interpolation (d).

Low-pass filtering

One solution to the aliasing problem is to make sure that the interpolated image function is properly frequency-limited before it gets resampled. This can be accomplished with a suitable low-pass filter, as illustrated in Fig. 10.36. The cutoff frequency of the low-pass filter is determined by the amount of local scale change, which may, depending upon the type of transformation, be different in various parts of the image. In the simplest case, the amount of scale change is the same throughout the image (e.g., under global scaling or affine transformations, where the same filter can be used everywhere in the image).

Figure 10.36 Low-pass filtering to avoid aliasing in geometric operations. After interpolation (step 1), the reconstructed image function is subjected to low-pass filtering (step 2) before being resampled (step 3).

In general, however, the low-pass filter is space-variant or nonhomogeneous, and the local filter parameters are determined by the transformation $T$ and the current image position. If convolution filters are used for both interpolation and low-pass filtering, they could be combined into a common, space-variant reconstruction filter. Unfortunately, space-variant filtering is computationally expensive and thus is often avoided, even in professional applications (e.g., Adobe Photoshop). The technique is nevertheless used in certain applications, such as high-quality texture mapping in computer graphics [22, 33, 79].
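For the simplest case of a global, integer reduction factor, the effect of low-pass filtering before resampling can be sketched in one dimension. The example assumes a plain box (averaging) filter whose width equals the reduction factor; this particular filter is an illustrative choice and is not prescribed by the text, and the class `PrefilteredShrinkSketch` is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical 1D demonstration of low-pass filtering before resampling. */
final class PrefilteredShrinkSketch {

    /**
     * Shrinks a 1D signal by an integer factor. Each target sample is the mean
     * of the corresponding block of source samples, which is equivalent to a
     * box filter of width 'factor' followed by sampling.
     */
    static double[] shrinkWithBoxFilter(int[] src, int factor) {
        int n = src.length / factor;
        double[] dst = new double[n];
        for (int u = 0; u < n; u++) {
            double sum = 0;
            for (int k = 0; k < factor; k++) {
                sum += src[u * factor + k];
            }
            dst[u] = sum / factor;
        }
        return dst;
    }

    public static void main(String[] args) {
        int[] row = new int[16];
        row[5] = 255;   // one-pixel-wide line between the coarse sample positions
        row[8] = 255;   // one-pixel-wide line on a coarse sample position
        System.out.println(Arrays.toString(shrinkWithBoxFilter(row, 4)));
        // prints [0.0, 63.75, 63.75, 0.0]
    }
}
```

Both thin lines now leave a trace of equal (reduced) intensity in the result, regardless of where they fall relative to the coarse sampling raster. A box filter is only a crude low-pass filter; it is used here because it keeps the sketch short.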

10.4 Java Implementation

In ImageJ, only a few simple geometric operations are currently implemented as methods in the ImageProcessor class, such as rotation and flipping. Additional operations, including affine transformations, are available as plugin classes as part of the optional TransformJ package [53]. In the following, we develop a rudimentary Java implementation for a set of geometric operations with the class structure summarized in Fig. 10.37. The Java classes form two groups: the first group implements the geometric transformations discussed in Sec. 10.1,¹⁰ while the second group implements the most important interpolation methods described in Sec. 10.3. Finally, we show sample ImageJ plugins to demonstrate the use of this implementation.

10.4.1 Geometric Transformations

The following Java classes represent geometric transformations in 2D and provide methods for computing the transformation parameters from corresponding point pairs.

¹⁰ The standard Java API currently only implements the affine transformation (in class java.awt.geom.AffineTransform).

Figure 10.37 Package and class structure for the Java implementation of geometric operations. The class Mapping and its subclasses implement the geometric transformations, and PixelInterpolator implements various interpolation methods. The standard (abstract) Java AWT class Point2D (and its concrete subclasses Point and Point2D.Double) are used to represent individual points in 2D. Classes shown in the figure: package mappings (Mapping, LinearMapping, AffineMapping, Translation, Scaling, Shear, Rotation, ProjectiveMapping, BilinearMapping, TwirlMapping, RippleMapping, SphereMapping); package interpolators (PixelInterpolator, NearestNeighborInterpolator, BilinearInterpolator, BicubicInterpolator); standard classes Point2D, Point, and Point2D.Double.

Class Point2D

Two-dimensional coordinates are represented by the (abstract) class Point2D, defined in the standard Java package java.awt.geom. Its subclasses Point and Point2D.Double are used to specify coordinate points with integer and floating-point coordinates, respectively.

Class Mapping

The abstract class Mapping is the superclass for all subsequent transformations. All subclasses of Mapping are required to implement the method

    Point2D applyTo(Point2D pnt)

which applies the corresponding transformation to a given coordinate point pnt and returns the transformed point. The method

    void applyTo(ImageProcessor ip, PixelInterpolator intPol)

on the other hand applies this geometric mapping to a whole image (ip) using a specified pixel interpolator (intPol). This method is implemented by the class Mapping itself and is not supposed to be overridden by subclasses (see the method applyTo(ImageProcessor, PixelInterpolator) in the code segment below). The actual image transformation is performed using the target-to-source method (Sec. 10.2.2) and thus requires the inverse coordinate transform $T^{-1}$, which can be obtained via the method getInverse() (see its definition and its call inside applyTo() below). The inverse mapping is computed and returned unless the particular mapping is already an inverse mapping (isInverse is true). Note that the inversion is only implemented for linear transformations (class LinearMapping and derived subclasses). In all other cases, an inverse mapping is created immediately when the Mapping object is instantiated, so no inversion is ever needed.
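The target-to-source scheme can also be illustrated in isolation. The following is a minimal, self-contained sketch under several assumptions: it uses the standard class java.awt.geom.AffineTransform in place of the Mapping classes, plain `int[][]` arrays (indexed as `img[v][u]`) in place of an ImageProcessor, and simple rounding (nearest neighbor) in place of a PixelInterpolator. The class `TargetToSourceSketch` is a hypothetical name and is not the implementation listed in this section.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.NoninvertibleTransformException;
import java.awt.geom.Point2D;

/** Hypothetical stand-alone illustration of target-to-source mapping. */
final class TargetToSourceSketch {

    /**
     * Applies the forward transformation T to the image src by visiting every
     * target pixel and fetching its value from the source position given by
     * the inverse transformation. Target pixels without a source stay 0.
     */
    static int[][] transform(int[][] src, AffineTransform T) {
        AffineTransform inv;
        try {
            inv = T.createInverse();
        } catch (NoninvertibleTransformException e) {
            throw new IllegalArgumentException("cannot invert mapping", e);
        }
        int h = src.length;
        int w = src[0].length;
        int[][] dst = new int[h][w];
        Point2D.Double pt = new Point2D.Double();
        for (int v = 0; v < h; v++) {
            for (int u = 0; u < w; u++) {
                pt.setLocation(u, v);
                inv.transform(pt, pt);              // target (u, v) -> source (x0, y0)
                int x = (int) Math.round(pt.x);     // nearest-neighbor "interpolation"
                int y = (int) Math.round(pt.y);
                if (x >= 0 && x < w && y >= 0 && y < h) {
                    dst[v][u] = src[y][x];
                }
            }
        }
        return dst;
    }

    public static void main(String[] args) {
        int[][] src = { {1, 2, 3}, {4, 5, 6} };
        int[][] dst = transform(src, AffineTransform.getTranslateInstance(1, 0));
        System.out.println(java.util.Arrays.deepToString(dst));   // prints [[0, 1, 2], [0, 4, 5]]
    }
}
```

Because every target pixel is visited exactly once, the result contains no holes, which is the main reason for preferring the target-to-source direction.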

    // file Mapping.java

    package mappings;
    import ij.process.ImageProcessor;
    import interpolators.PixelInterpolator;
    import java.awt.geom.Point2D;

    public abstract class Mapping implements Cloneable {
        boolean isInverse = false;

        // subclasses must implement this method:
        abstract Point2D applyTo(Point2D pnt);

        Mapping invert() {
            throw new IllegalArgumentException("cannot invert mapping");
        }

        Mapping getInverse() {
            if (isInverse)
                return this;
            else
                return this.invert(); // only linear mappings invert
        }

        // transforms the image ip using this geometric mapping
        // and the specified pixel interpolator intPol
        public void applyTo(ImageProcessor ip, PixelInterpolator intPol) {
            ImageProcessor targetIp = ip;
            // make a temporary copy of the image:
            ImageProcessor sourceIp = ip.duplicate();
            Mapping invMap = this.getInverse(); // get inverse mapping
            intPol.setImageProcessor(sourceIp);

            int w = targetIp.getWidth();
            int h = targetIp.getHeight();

            Point2D pt = new Point2D.Double();
            for (int v=0; v …

B. Source Code

    … c2.q) return -1;
            if (this.q < c2.q) return 1;
            else return 0;
        }

        double dist2 (Corner c2) {
            // returns the squared distance between this corner and corner c2
            int dx = this.u - c2.u;
            int dy = this.v - c2.v;
            return (dx*dx)+(dy*dy);
        }

        void draw(ImageProcessor ip) {
            // draw this corner as a black cross in ip
            int paintvalue = 0; // black
            int size = 2;
            ip.setValue(paintvalue);
            ip.drawLine(u-size,v,u+size,v);
            ip.drawLine(u,v-size,u,v+size);
        }
    } // end of class Corner

B.2.3 File HarrisCornerDetector (Class)

    package harris;
    import ij.IJ;
    import ij.ImagePlus;
    import ij.plugin.filter.Convolver;
    import ij.process.Blitter;
    import ij.process.ByteProcessor;
    import ij.process.FloatProcessor;
    import ij.process.ImageProcessor;
    import java.util.Arrays;
    import java.util.Collections;
    import java.util.List;
    import java.util.Vector;

    public class HarrisCornerDetector {

        public static final float DEFAULT_ALPHA = 0.050f;
        public static final int DEFAULT_THRESHOLD = 20000;

        float alpha = DEFAULT_ALPHA;
        int threshold = DEFAULT_THRESHOLD;
        double dmin = 10;

        final int border = 20;

        // filter kernels (1D part of separable 2D filters)
        final float[] pfilt = {0.223755f,0.552490f,0.223755f};
        final float[] dfilt = {0.453014f,0.0f,-0.453014f};
        final float[] bfilt = {0.01563f,0.09375f,0.234375f,0.3125f,
                               0.234375f,0.09375f,0.01563f};
                            // = [1, 6, 15, 20, 15, 6, 1]/64

        ImageProcessor ipOrig;
        FloatProcessor A;
        FloatProcessor B;
        FloatProcessor C;
        FloatProcessor Q;
        List<Corner> corners;

        HarrisCornerDetector(ImageProcessor ip) {
            this.ipOrig = ip;
        }

        public HarrisCornerDetector(ImageProcessor ip,
                float alpha, int threshold) {
            this.ipOrig = ip;
            this.alpha = alpha;
            this.threshold = threshold;
        }

        public void findCorners() {
            makeDerivatives();
            makeCrf(); // corner response function (CRF)
            corners = collectCorners(border);
            corners = cleanupCorners(corners);
        }

        void makeDerivatives() {
            FloatProcessor Ix = (FloatProcessor) ipOrig.convertToFloat();
            FloatProcessor Iy = (FloatProcessor) ipOrig.convertToFloat();

            Ix = convolve1h(convolve1h(Ix,pfilt),dfilt);
            Iy = convolve1v(convolve1v(Iy,pfilt),dfilt);

            A = sqr((FloatProcessor) Ix.duplicate());
            A = convolve2(A,bfilt);

            B = sqr((FloatProcessor) Iy.duplicate());
            B = convolve2(B,bfilt);

            C = mult((FloatProcessor)Ix.duplicate(),Iy);
            C = convolve2(C,bfilt);
        }

        void makeCrf() { // corner response function (CRF)
            int w = ipOrig.getWidth();
            int h = ipOrig.getHeight();
            Q = new FloatProcessor(w,h);
            float[] Apix = (float[]) A.getPixels();
            float[] Bpix = (float[]) B.getPixels();
            float[] Cpix = (float[]) C.getPixels();
            float[] Qpix = (float[]) Q.getPixels();
            for (int v=0; v …

    … pix[i2] && cp > pix[i2+1] ;
            }
        }
    } // end of class HarrisCornerDetector

B.3 Median-Cut Color Quantization

This is an implementation of Heckbert's median-cut color quantization algorithm [32], as described in Sec. 5.2 (Alg. 5.1–5.3). Unlike in the original algorithm, no initial uniform (scalar) quantization is used for reducing the number of image colors. Instead, all colors contained in the original image are considered in the quantization process. After the set of representative colors has been found, each image color is mapped to the closest representative in RGB color space using the Euclidean distance.
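The final mapping step can be sketched independently of the quantizer. The example below assumes that colors are packed as 0xRRGGBB integers and that the representative colors are given as a simple `int[]` palette; the class `ClosestColorSketch` and the method `closestIndex` are hypothetical names. Squared distances are compared, which yields the same nearest color as the Euclidean distance itself.

```java
/** Hypothetical stand-alone illustration of nearest-color lookup in RGB space. */
final class ClosestColorSketch {

    /** Returns the index of the palette color closest to rgb (Euclidean distance in RGB). */
    static int closestIndex(int rgb, int[] palette) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        int minIdx = 0;
        int minDist2 = Integer.MAX_VALUE;
        for (int i = 0; i < palette.length; i++) {
            int dr = ((palette[i] >> 16) & 0xFF) - r;
            int dg = ((palette[i] >> 8) & 0xFF) - g;
            int db = (palette[i] & 0xFF) - b;
            int d2 = dr * dr + dg * dg + db * db;   // squared Euclidean distance
            if (d2 < minDist2) {
                minDist2 = d2;
                minIdx = i;
            }
        }
        return minIdx;
    }

    public static void main(String[] args) {
        int[] palette = {0x000000, 0xFFFFFF, 0xFF0000};   // black, white, red
        System.out.println(closestIndex(0xF01010, palette));   // prints 2
    }
}
```

The linear search over all representative colors is adequate for palettes of up to 256 entries, which is the limit for indexed images.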

B.3.1 ColorQuantizer (Interface)

This is a general interface for all color quantizers.

    package color;
    import ij.process.ByteProcessor;
    import ij.process.ColorProcessor;

    public interface ColorQuantizer {
        public abstract ByteProcessor quantizeImage(ColorProcessor cp);
        public abstract int[] quantizeImage(int[] origPixels);
        public abstract int countQuantizedColors();
    }

B.3.2 MedianCutQuantizer (Class)

This class contains the main functionality of the median-cut quantizer. Figure B.1 illustrates the key data structures involved and their relationships. The classes ColorNode and ColorBox are implemented as nested classes inside MedianCutQuantizer. Also, notice the use of the nested enumeration class ColorDimension for implementing the constants RED, GREEN, and BLUE and the associated comparator methods.
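The idea of an enumeration whose constants carry their own comparators can be sketched in isolation. The following example is an assumption-laden illustration, not the listing of this section: the names `MedianSplitSketch`, `Node`, and `Dim` are hypothetical stand-ins for the quantizer, ColorNode, and ColorDimension, and the method `findMedian` computes the pixel total of the subrange itself instead of reading it from a box object.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical illustration of sorting a subrange of colors along one color dimension. */
final class MedianSplitSketch {

    /** Stand-in for a histogram entry: one color and its pixel count. */
    static final class Node {
        final int red, grn, blu, cnt;
        Node(int red, int grn, int blu, int cnt) {
            this.red = red; this.grn = grn; this.blu = blu; this.cnt = cnt;
        }
    }

    /** Color dimensions, each carrying the comparator used as the sorting key. */
    enum Dim {
        RED(Comparator.comparingInt((Node n) -> n.red)),
        GREEN(Comparator.comparingInt((Node n) -> n.grn)),
        BLUE(Comparator.comparingInt((Node n) -> n.blu));

        final Comparator<Node> comparator;
        Dim(Comparator<Node> comparator) { this.comparator = comparator; }
    }

    /**
     * Sorts colors[lower..upper] along dim and returns the index at which
     * the accumulated pixel count first reaches half of the subrange total.
     * The result is always less than upper, so both parts are nonempty.
     */
    static int findMedian(Node[] colors, int lower, int upper, Dim dim) {
        Arrays.sort(colors, lower, upper + 1, dim.comparator);
        int total = 0;
        for (int i = lower; i <= upper; i++) {
            total += colors[i].cnt;
        }
        int half = total / 2;
        int nPixels = 0;
        int median;
        for (median = lower; median < upper; median++) {
            nPixels += colors[median].cnt;
            if (nPixels >= half) break;
        }
        return median;
    }

    public static void main(String[] args) {
        Node[] colors = {
            new Node(200, 0, 0, 1), new Node(10, 0, 0, 5),
            new Node(100, 0, 0, 2), new Node(50, 0, 0, 2)
        };
        System.out.println(findMedian(colors, 0, 3, Dim.RED));   // prints 0
    }
}
```

In the example, the darkest red accounts for 5 of the 10 pixels, so the split point is placed directly after it: the first part holds one color with 5 pixels and the second part the remaining three colors with 5 pixels in total.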

Figure B.1 Median-cut data structures. Initially, a new ColorHistogram is computed for the original color image (ip of type ColorProcessor). The resulting array imageColors of size $K$ corresponds to the unique colors ($C = \{c_1, c_2, \ldots, c_K\}$ in Alg. 5.1) contained in the original RGB image. Each cell of imageColors refers to a ColorNode object ($c_i$) that holds the associated color (red, green, blue) and its frequency (cnt) in the image. Each ColorBox object (corresponding to a color box $b$ in Alg. 5.1) selects a contiguous range of image colors, bounded by the indices lower and upper. The ranges of elements in imageColors, indexed by different ColorBox objects, never overlap. Each element in imageColors is contained in exactly one ColorBox; i.e., the color boxes held in colorSet ($B$ in Alg. 5.1) form a partitioning of imageColors (colorSet is implemented as a list of ColorBox objects). To split a particular ColorBox along a color dimension $d$ = Red, Green, or Blue, the corresponding subrange of elements in imageColors is sorted with the property red, green, or blue, respectively, as the sorting key. In Java, this is quite easy to implement using the standard Arrays.sort() utility method and a dedicated Comparator object for each color dimension. Finally, the method quantizeImage() replaces each pixel in ip by the closest color in colorSet.

    package color;
    import ij.process.ByteProcessor;
    import ij.process.ColorProcessor;

    import java.awt.image.IndexColorModel;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Comparator;
    import java.util.List;

    public class MedianCutQuantizer implements ColorQuantizer {

        private ColorNode[] imageColors = null; // original image colors
        private ColorNode[] quantColors = null; // quantized colors

        public MedianCutQuantizer(ColorProcessor ip, int Kmax) {
            this((int[]) ip.getPixels(), Kmax);
        }

        public MedianCutQuantizer(int[] pixels, int Kmax) {
            quantColors = findRepresentativeColors(pixels, Kmax);
        }

        public int countQuantizedColors() {
            return quantColors.length;
        }

        public ColorNode[] getQuantizedColors() {
            return quantColors;
        }

        ColorNode[] findRepresentativeColors(int[] pixels, int Kmax) {
            ColorHistogram colorHist = new ColorHistogram(pixels);
            int K = colorHist.getNumberOfColors();
            ColorNode[] rCols = null;

            imageColors = new ColorNode[K];
            for (int i = 0; i < K; i++) {
                int rgb = colorHist.getColor(i);
                int cnt = colorHist.getCount(i);
                imageColors[i] = new ColorNode(rgb, cnt);
            }

            if (K …

    … 256)
                throw new Error("cannot index to more than 256 colors");
            int w = cp.getWidth();
            int h = cp.getHeight();
            int[] origPixels = (int[]) cp.getPixels();
            byte[] idxPixels = new byte[origPixels.length];

            for (int i = 0; i < origPixels.length; i++) {
                idxPixels[i] = (byte) findClosestColorIndex(origPixels[i]);
            }

            IndexColorModel idxCm = makeIndexColorModel();
            return new ByteProcessor(w, h, idxPixels, idxCm);
        }

        IndexColorModel makeIndexColorModel() {
            int nColors = countQuantizedColors();
            byte[] rMap = new byte[nColors];
            byte[] gMap = new byte[nColors];
            byte[] bMap = new byte[nColors];
            for (int i=0; i …

    … > 16);
            int grn = ((rgb & 0xFF00) >> 8);
            int blu = (rgb & 0xFF);
            int minIdx = 0;
            int minDistance = Integer.MAX_VALUE;
            for (int i=0; i …

    … = 2) { // box can be split
                    if (box.level < minLevel) {
                        boxToSplit = box;
                        minLevel = box.level;
                    }
                }
            }
            return boxToSplit;
        }

        private ColorNode[] averageColors(List<ColorBox> colorBoxes) {
            int n = colorBoxes.size();
            ColorNode[] avgColors = new ColorNode[n];
            int i = 0;
            for (ColorBox box : colorBoxes) {
                avgColors[i] = box.getAverageColor();
                i = i + 1;
            }
            return avgColors;
        }

        // -------------- class ColorNode --------------

        class ColorNode {
            private int rgb;
            private int red, grn, blu;
            private int cnt;

            ColorNode (int rgb, int cnt) {
                this.rgb = (rgb & 0xFFFFFF);
                this.red = (rgb & 0xFF0000) >> 16;
                this.grn = (rgb & 0xFF00) >> 8;
                this.blu = (rgb & 0xFF);
                this.cnt = cnt;
            }

            ColorNode (int red, int grn, int blu, int cnt) {
                this.rgb = ((red & 0xff) …

    … bmax) bmax = b;
                    if (b < bmin) bmin = b;
                }
            }

            // Split this color box at the median point along its
            // longest color dimension
            ColorBox splitBox() {
                if (this.colorCount() < 2) // this box cannot be split
                    return null;
                else {
                    // find longest dimension of this box:
                    ColorDimension dim = getLongestColorDimension();

                    // find median along dim
                    int med = findMedian(dim);

                    // now split this box at the median and
                    // return the resulting new box.
                    int nextLevel = level + 1;
                    ColorBox newBox = new ColorBox(med + 1, upper, nextLevel);
                    this.upper = med;
                    this.level = nextLevel;
                    this.trim();
                    return newBox;
                }
            }

            // Find longest dimension of this color box (RED, GREEN, or BLUE)
            ColorDimension getLongestColorDimension() {
                int rLength = rmax - rmin;
                int gLength = gmax - gmin;
                int bLength = bmax - bmin;
                if (bLength >= rLength && bLength >= gLength)
                    return ColorDimension.BLUE;
                else if (gLength >= rLength && gLength >= bLength)
                    return ColorDimension.GREEN;
                else
                    return ColorDimension.RED;
            }

            // Find the position of the median in RGB space along
            // the red, green or blue dimension, respectively.
            int findMedian(ColorDimension dim) {
                // sort color in this box along dimension dim:
                Arrays.sort(imageColors, lower, upper+1, dim.comparator);
                // find the median point:
                int half = count / 2;
                int nPixels, median;
                for (median = lower, nPixels = 0; median < upper; median++) {
                    nPixels = nPixels + imageColors[median].cnt;
                    if (nPixels >= half)
                        break;
                }
                return median;
            }

            ColorNode getAverageColor() {
                int rSum = 0;
                int gSum = 0;
                int bSum = 0;
                int n = 0;
                for (int i = lower; i …


About the Authors

Wilhelm Burger received a Master's degree in Computer Science from the University of Utah (Salt Lake City) and a doctorate in Systems Science from Johannes Kepler University in Linz, Austria. As a post-graduate researcher at the Honeywell Systems & Research Center in Minneapolis and the University of California at Riverside, he worked mainly in the areas of visual motion analysis and autonomous navigation. In the Austrian research initiative on digital imaging, he was engaged in projects on generic object recognition and biometric identification. Since 1996, he has been the director of the Digital Media degree programs at the Upper Austria University of Applied Sciences at Hagenberg. Personally, the author appreciates large-engine vehicles and (occasionally) a glass of dry “Veltliner”.

Mark J. Burge received a BA degree from Ohio Wesleyan University, an MSc in Computer Science from the Ohio State University, and a doctorate from Johannes Kepler University in Linz, Austria. He spent several years as a researcher in Zürich, Switzerland at the Swiss Federal Institute of Technology (ETH), where he worked in computer vision and pattern recognition. As a post-graduate researcher at the Ohio State University, he was involved in the “Image Understanding and Interpretation Project” sponsored by the NASA Commercial Space Center. He earned tenure within the University System of Georgia as an associate professor in computer science and served as a Program Director at the National Science Foundation. Currently he is a Principal at Noblis (Mitretek) in Washington D.C. Personally, he is an expert on classic Italian espresso machines.

About this Book Series

The complete manuscript for this book was prepared by the authors “camera-ready” in LaTeX using Donald Knuth's Computer Modern fonts. The additional packages algorithmicx (by Szász János) for presenting algorithms, listings (by Carsten Heinz) for listing program code, and psfrag (by Michael C. Grant and David Carlisle) for replacing text in graphics were particularly helpful in this task. Most illustrations were produced with Macromedia Freehand (now part of Adobe), function plots with Mathematica, and images with ImageJ or Adobe Photoshop. All book figures, test images in color and full resolution, as well as the Java source code for all examples are available at the book's support site: www.imagingbook.com.