What is computer vision?





In this article we briefly discuss one of the most important branches of artificial intelligence, one that has attracted many researchers in recent years: computer vision.

Humans have eyes that capture images of their surroundings (physiological vision). Those images travel to the brain through cells and neural connections, where they are processed immediately so that the person can identify what the eyes see.




 


Similarly, computer vision is the process of enabling a computer to view its surroundings (using any camera connected to it, such as a webcam), then analyze and process what it sees in order to recognize it, using algorithms designed for this purpose.





In other words, computer vision is a field of computer science that aims to build intelligent applications able to understand the content of images the way humans do.
It draws on concepts, techniques and ideas from several areas, including digital image processing, pattern recognition, artificial intelligence, and computer graphics.

There is no single precise definition of computer vision, but it can be described simply as sitting between image processing and artificial intelligence (including machine learning).



How does the computer see its surroundings?

It receives digital images from the camera connected to it, either as separate still images (single snapshots), in applications that work this way, or as a sequence of successive images (a video recording).

A video is a large set of successive images captured continuously at high speed. Played back in order, these images form the moving picture we see.
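To make the idea of a video as a sequence of images concrete, here is a minimal sketch in plain Python. It does not read from a real camera: the helper names `make_frame` and `frame_count`, the frame rate of 30 frames per second and the tiny 4x3 frame size are all assumptions chosen for illustration.

```python
# A video treated as an ordered list of frames (plain Python, no camera needed).
# Each frame is a small grayscale image: a list of rows of 0-255 intensities.

def make_frame(width, height, value):
    """Build a width x height grayscale frame filled with one intensity."""
    return [[value for _ in range(width)] for _ in range(height)]


def frame_count(duration_seconds, fps):
    """Number of frames captured in a clip of the given length."""
    return int(duration_seconds * fps)


# Hypothetical clip: 2 seconds at 30 frames per second, 4x3 pixels per frame.
fps = 30
frames = [make_frame(4, 3, i % 256) for i in range(frame_count(2, fps))]

print(len(frames))                        # number of frames in the clip
print(len(frames[0]), len(frames[0][0]))  # rows (height), columns (width)
```

In a real application the frames would come from a camera or video file through a library such as OpenCV, but the structure is the same: one image after another, in order.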

 


How does the computer see images?

An image is basically a matrix of pixels arranged along the x and y axes, so each pixel has a position (x, y).
Each pixel carries a set of values (such as its color) that determine its appearance and role in the image.

When a person looks at an image, for example a picture of a cat, they see it as a recognizable scene.


The computer, by contrast, sees the same image as an array of numeric values, one or more for each pixel.
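The following sketch shows what such an array can look like. It uses plain Python lists so that it runs without extra libraries; the pixel values and the helper names `get_pixel` and `split_channels` are made up for illustration. Note the assumption that rows come first, so the pixel at position (x, y) is stored at `image[y][x]`.

```python
# A tiny 3x3 grayscale image: each number is one pixel (0 = black, 255 = white).
gray = [
    [0, 128, 255],
    [64, 200, 32],
    [255, 0, 16],
]

# A tiny 2x2 color image: each pixel is an (R, G, B) tuple.
color = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]


def get_pixel(image, x, y):
    """Return the pixel at column x, row y (rows are stored first)."""
    return image[y][x]


def split_channels(image):
    """Separate an RGB image into three single-channel images."""
    return tuple([[px[c] for px in row] for row in image] for c in range(3))


red, green, blue = split_channels(color)

print(get_pixel(gray, 2, 0))  # 255
print(red)                    # [[255, 0], [0, 255]]
```

Libraries such as NumPy and OpenCV store images in the same row-first layout, only in a more compact and much faster form.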
 

For a deeper understanding of this concept, see my previous articles.


 

1/ Recognition
Recognition is the most common task. It means determining whether the image contains a certain object or activity. This seems very simple to a person, who makes no effort when the eye falls on an object they have already identified and dealt with before, but it is much harder for computers than it seems.

2/ Detection, Localization, Segmentation
This process is more complex than recognition. It first applies recognition algorithms and methods, then follows them with further algorithms that accurately locate the objects in the image.
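The difference between the two tasks can be sketched with a toy example. This is not how real systems work (they use trained models); here the object is assumed to be already marked with 1s in a small binary mask, and the function names `contains_object` and `bounding_box` are hypothetical. Recognition answers yes or no, while localization also answers where.

```python
def contains_object(mask):
    """Recognition-style answer: is any object pixel present?"""
    return any(any(row) for row in mask)


def bounding_box(mask):
    """Localization-style answer: (x_min, y_min, x_max, y_max), or None."""
    coords = [
        (x, y)
        for y, row in enumerate(mask)
        for x, value in enumerate(row)
        if value
    ]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


# 1 marks a pixel that belongs to the object, 0 marks background.
mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

print(contains_object(mask))  # True
print(bounding_box(mask))     # (2, 1, 3, 2)
```

The mask itself plays the role of a segmentation result: it labels every pixel as object or background, which is more detailed than the box around it.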


 



Computer Vision Applications

* Robotics
* Surveillance systems and face recognition
      A common technique in banks, airports, train stations and other sensitive areas.
* Medical imaging such as radiology, ultrasound and endoscopy
      (X-ray, MRI, etc.), used in surgical operations or for early identification of cancer cells in captured images.
* Self-driving cars and trains
      Adoption of this technology is growing quickly, and many countries already use it.
* Motion detection in a given area.
* Recognition of handwriting or any other writing and its conversion to digital text.
* Baggage screening.
* Inspection during the manufacturing process as well as product quality control.
* Tracking
      Following the movement of objects across successive images (video), such as tracking people or cars.
Video tutorial (https://www.youtube.com/watch?v=9KmrZ_M--zA&t=4s)
* Object identification and localization (detection and segmentation)
Video tutorial (https://www.youtube.com/watch?v=CeKq9l3WEl0&t=73s)
* Recognizing people by distinguishing features such as fingerprint, face, or iris, known as biometric systems.
* Military applications, such as identifying enemy vehicles and missile guidance.
* Traffic systems, such as license plate identification.
* Aircraft navigation systems.
* Monitoring drivers' eyelid movement to warn them when they start to fall asleep while driving.
* High-security systems based on iris recognition.
* Translating the sign language of people who cannot speak or hear into written text or spoken words.
* Agriculture and mechanical harvesting.
Research, studies and applications in this area keep growing. It is a very active research field that shows no sign of slowing down.
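As an illustration of the motion detection item in the list above, one simple technique is frame differencing: compare two successive frames and report the pixels whose intensity changed noticeably. The sketch below is a toy version in plain Python; the function names, the threshold of 25 and the 3x3 frames are assumptions for the example, not values from any particular system.

```python
def changed_pixels(prev_frame, next_frame, threshold=25):
    """Return (x, y) positions whose intensity changed by more than threshold."""
    return [
        (x, y)
        for y, (row_a, row_b) in enumerate(zip(prev_frame, next_frame))
        for x, (a, b) in enumerate(zip(row_a, row_b))
        if abs(a - b) > threshold
    ]


def motion_detected(prev_frame, next_frame, threshold=25, min_pixels=1):
    """Report motion when enough pixels changed between the two frames."""
    return len(changed_pixels(prev_frame, next_frame, threshold)) >= min_pixels


# A bright spot moves one pixel to the right between the two frames.
frame1 = [[10, 10, 10], [10, 200, 10], [10, 10, 10]]
frame2 = [[10, 10, 10], [10, 10, 200], [10, 10, 10]]

print(changed_pixels(frame1, frame2))   # [(1, 1), (2, 1)]
print(motion_detected(frame1, frame2))  # True
```

Real systems add steps such as noise filtering and background modeling, but the core idea of comparing successive frames is the same.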


How do you create a computer vision system capable of performing a task (for example, one of the previous applications)?

First, you need a solid understanding of images, pixels and their structure, color depth and the other basic concepts that give you a real feel for the data you will handle and classify all the time.
You should also learn how to open, save, create, arrange and crop images and how to separate color channels, so that you can work with images programmatically.
Secondly, you need a good command of Python, the main programming language in this field.

Thirdly, study and understand artificial neural networks.
Fourthly, gain a good understanding of a type of artificial neural network called the convolutional neural network (CNN), because for image analysis and processing it makes more efficient use of time and memory than fully connected networks.

Fifthly, use a suitable working environment for training and testing your application. I recommend
(Anaconda, Spyder and Jupyter).
Sixthly, use suitable libraries. There are many of them, but I recommend
(Keras, TensorFlow, OpenCV, imutils and scikit-learn).
Seventhly, it is preferable to use a computer with strong processing capabilities and a graphics processing unit (GPU),
because training a large neural network is a hard, slow process that consumes a lot of resources.
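The memory advantage of convolutional networks mentioned in the fourth step can be checked with simple arithmetic. The sketch below counts the trainable parameters of one fully connected layer and one convolutional layer. The input size (a 224x224 RGB image), the 64 units, the 64 filters and the 3x3 kernel are assumed values for illustration, and the two layers are not equivalent in what they compute; the comparison only shows the effect of sharing a small kernel across the whole image.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit in a fully connected layer."""
    return n_inputs * n_units + n_units


def conv_params(kernel_h, kernel_w, in_channels, n_filters):
    """Weights plus one bias per filter in a 2D convolutional layer."""
    return kernel_h * kernel_w * in_channels * n_filters + n_filters


# Hypothetical input: a 224x224 RGB image flattened to one long vector.
n_inputs = 224 * 224 * 3

print(dense_params(n_inputs, 64))  # 9633856
print(conv_params(3, 3, 3, 64))    # 1792
```

The fully connected layer needs a separate weight for every pixel and every unit, while the convolutional layer reuses the same small set of weights at every position in the image.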






References
https://ar.wikipedia.org/wiki
https://towardsengineeringvision.blogspot.com/2018/12/pixels-with-coordinate-system-and_45.html
https://stackoverflow.com/questions/42299587/python-transform-list-of-x-y-and-z-to-matrix-table
https://www.oreilly.com/library/view/deep-learning-for/9781788295628/4fe36c40-7612-44b8-8846-43c0c4e64157.xhtml
https://www.google.com/url?sa=i&rct=j&q=&esrc=s&source=images&cd=&cad=rja&uact=8&ved=2ahUKEwiiqtTAs73fAhUwyoUKHYsqBzYQjRx6BAgBEAU&url=https%3A%2F%2Fdepositphotos.com%2F174867248%2Fstock-photo-futuristic-eye-vision-backdrop.html&psig=AOvVaw04-UFDAyN-GWk5TjaWmEjk&ust=1545910421646644


    




 
















