An Introduction to Computer Vision

November 8, 2017 Author: rajesh

Computer vision is the science and technology of machines that see. Here, “seeing” means that the machine can extract from an image the information it needs to solve some task. As a scientific discipline, computer vision is concerned with the theory behind artificial systems that extract information from images. The image data can take many forms, such as video sequences, views from multiple cameras, or multi-dimensional data from a medical scanner. As a technological discipline, computer vision seeks to apply its theories and models to the construction of computer vision systems.

Computer Vision: Overview

The human ability to interact with other people rests on the ability to recognize them. This innate ability to effortlessly identify faces and objects, even when they are distorted or modified, has motivated research into how the human brain processes such images. The skill is quite reliable despite changes due to viewing conditions, emotional expressions, ageing, added artifacts, or even circumstances that permit seeing only a fraction of the face. Furthermore, humans are able to recognize thousands of individuals during their lifetime. Understanding the human mechanism, including its cognitive aspects, would help in building a system for the automatic identification of faces by a machine. However, face recognition is still an area of active research, since no completely successful approach or model has yet been proposed to solve the problem. Automated face recognition is a very popular field nowadays and can be used in a multitude of commercial and law enforcement applications. For example, a security system could capture an image of a person and establish the individual's identity by matching that image against those stored in the system database.
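The matching step in the security-system example can be sketched as a nearest-neighbour lookup. This is only a minimal illustration under stated assumptions: each face is assumed to have already been reduced to a short list of numbers (a feature vector), and the names `identify`, `database`, and `max_distance`, the toy vectors, and the threshold value are all hypothetical. How useful features are extracted from a face image is the hard part of the problem and is not shown here.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify(probe, database, max_distance=0.5):
    """Return the name of the closest enrolled face, or None if none is close enough.

    probe        -- feature vector of the captured image (assumed to be given)
    database     -- dict mapping a person's name to an enrolled feature vector
    max_distance -- illustrative rejection threshold, not a recommended value
    """
    best_name, best_dist = None, float("inf")
    for name, enrolled in database.items():
        dist = euclidean_distance(probe, enrolled)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= max_distance else None

# Toy three-number "features"; a real system would use far longer vectors.
database = {
    "alice": [0.10, 0.80, 0.30],
    "bob": [0.90, 0.20, 0.50],
}

print(identify([0.12, 0.79, 0.33], database))  # very close to alice's enrolled vector
print(identify([0.50, 0.50, 0.95], database))  # far from both entries, so rejected
```

The rejection threshold matters as much as the nearest match: without it, every unknown visitor would be assigned the identity of whichever enrolled person happens to be closest.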

Typical tasks of computer vision are:

  • Recognition
  • Motion analysis
  • Scene reconstruction
  • Image restoration

The Difficulty with Computer Vision

At present, a computing machine is not able to actually understand what it sees. This level of comprehension is still a distant goal for computers, because understanding an image takes more than collecting some pixels. The ability to identify an object reliably, as humans do, is truly remarkable.

A computer “sees” only a grid of numbers from a camera or from a disk, and that is as far as it goes. Those numbers have a rather large noise component, so the amount of useful information is quite small in the end. Many computer vision problems are difficult to specify, especially because information is lost in the transformation from the 3D world to a 2D image. Furthermore, given a two-dimensional view of a 3D world, there is no unique way to reconstruct the 3D scene. Noise in computer vision is typically dealt with using statistical methods. Other techniques, however, account for noise or distortions by building explicit models learned directly from the available data.
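Both difficulties can be illustrated with a few lines of Python. The sketch below is an assumption-laden toy, not a description of any particular system: it uses an idealized pinhole camera model for the 3D-to-2D projection, a tiny hand-written grid of gray levels as the "image", Gaussian noise with an arbitrary strength, and simple averaging over repeated frames as one example of a statistical way to suppress noise. All function names and values are hypothetical.

```python
import random
import statistics

def project(point, focal_length=1.0):
    """Idealized pinhole projection of a 3D point (X, Y, Z) onto the image plane."""
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)

# Two different 3D points on the same viewing ray land on the same 2D position,
# so depth cannot be recovered from a single projected point.
near = (1.0, 2.0, 4.0)
far = (2.0, 4.0, 8.0)
print(project(near), project(far))  # (0.25, 0.5) (0.25, 0.5)

# To the computer, an image is just a grid of numbers (here, gray levels).
true_image = [
    [10, 10, 200],
    [10, 200, 200],
]

def noisy_capture(image, rng, sigma=20.0):
    """Simulate one camera frame by adding Gaussian noise to every pixel."""
    return [[pixel + rng.gauss(0.0, sigma) for pixel in row] for row in image]

def average_frames(frames):
    """Per-pixel mean over several frames of the same static scene."""
    rows, cols = len(frames[0]), len(frames[0][0])
    return [
        [statistics.fmean([f[r][c] for f in frames]) for c in range(cols)]
        for r in range(rows)
    ]

def mean_abs_error(image, reference):
    """Average absolute difference between an image and the noise-free reference."""
    diffs = [
        abs(p - q)
        for row, ref_row in zip(image, reference)
        for p, q in zip(row, ref_row)
    ]
    return statistics.fmean(diffs)

rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
frames = [noisy_capture(true_image, rng) for _ in range(400)]
single_error = mean_abs_error(frames[0], true_image)
averaged_error = mean_abs_error(average_frames(frames), true_image)
print(single_error, averaged_error)  # the averaged frames are much closer to the truth
```

Averaging only works here because the scene is static and the noise is independent from frame to frame; moving scenes need more elaborate statistical models.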

Fields of Computer Vision

Figure 1: Fields of Computer Vision

Figure 1 shows various fields related to computer vision, including pattern recognition and image processing. These fields are related in that advances in one can lead to advances in the others. Developing a successful face recognition system requires combined knowledge from all of these fields.

Computer Vision: Applications

Computer vision is used today in a wide variety of real-world applications, including:

  • Optical character recognition (OCR): reading handwritten postal codes on letters and automatic number plate recognition (ANPR);
  • Machine inspection: rapid parts inspection for quality assurance, using stereo vision with specialized illumination to measure tolerances on aircraft wings or auto body parts, or looking for defects in steel castings using X-ray vision;
  • Retail: object recognition for automated checkout lanes;
  • 3D model building (photogrammetry): fully automated construction of 3D models from aerial photographs, as used in systems such as Bing Maps;
  • Medical imaging: registering pre-operative and intra-operative imagery or performing long-term studies of people’s brain morphology as they age;
  • Automotive safety: detecting unexpected obstacles such as pedestrians on the street, under conditions where active vision techniques such as radar or lidar do not work well;
  • Match move: merging computer-generated imagery (CGI) with live action footage by tracking feature points in the source video to estimate the 3D camera motion and the shape of the environment. Such techniques are widely used in film production; they also require precise matting to insert new elements between foreground and background elements;
  • Motion capture (MOCAP): using retro-reflective markers viewed from multiple cameras or other vision-based techniques to capture actors for computer animation;
  • Surveillance: monitoring for intruders, analyzing highway traffic, and monitoring pools for drowning victims;
  • Fingerprint recognition and biometrics: for automatic access authentication as well as forensic applications.
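As a concrete taste of one item on this list, intruder monitoring can be approximated in its simplest form by frame differencing: compare two consecutive frames and report motion when enough pixels change. The sketch below is a deliberately naive illustration rather than how production surveillance systems work; the frames are tiny hand-written grids of gray levels, and the names `changed_pixels` and `motion_detected` and both threshold values are hypothetical.

```python
def changed_pixels(previous, current, threshold=30):
    """Count pixels whose gray level changed by more than `threshold`."""
    return sum(
        1
        for prev_row, cur_row in zip(previous, current)
        for p, c in zip(prev_row, cur_row)
        if abs(p - c) > threshold
    )

def motion_detected(previous, current, threshold=30, min_changed=2):
    """Report motion when at least `min_changed` pixels changed noticeably."""
    return changed_pixels(previous, current, threshold) >= min_changed

# Three frames of the same 3x4 scene, as grids of gray levels.
empty_room = [
    [50, 50, 50, 50],
    [50, 50, 50, 50],
    [50, 50, 50, 50],
]
sensor_flicker = [
    [52, 49, 50, 51],
    [50, 48, 53, 50],
    [51, 50, 50, 49],
]
intruder = [
    [50, 50, 50, 50],
    [50, 180, 180, 50],
    [50, 180, 180, 50],
]

print(motion_detected(empty_room, sensor_flicker))  # False: only small noise-level changes
print(motion_detected(empty_room, intruder))        # True: four pixels changed strongly
```

The per-pixel threshold keeps ordinary sensor noise from triggering an alarm, while the minimum pixel count guards against a single faulty pixel.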


