
Computer vision problems and technologies on this site

What is Semantic Segmentation?

How can an object be identified and delineated in an image or a video frame? The first idea that comes to mind is to store a description of the object in computer memory and teach the computer to match it against different portions of the image. This approach is feasible when the object to be found is known beforehand, but it becomes unrealistically laborious as the number of objects increases. The question therefore arises whether a frame can be split (segmented) into objects prior to their recognition, much as we can make out an unknown, odd-colored fish against a strange-looking sea floor. Segmentation of an image into objects based on their generic properties (features) is called semantic segmentation. One of the most important properties of objects is that they can occlude (screen, hide) other objects. Segmentation without recognition is essential for the content-based video encoding promoted by the MPEG-4 standard.

The goal of semantic segmentation is basically to find the occluding edges in the image. One can safely assume that an object has been found when a region is located in the image whose boundary is an occluding edge, meaning that the 3D scene points projected onto its opposite sides lie at different depths from the viewer. The occluding edge can therefore be defined as a chain of image points corresponding to sudden changes in the distance to the viewed surfaces in the scene. However, locating an occluding edge is not sufficient: one also needs to know which side of it belongs to the occluded object and which side belongs to the occluder.
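The definition above can be illustrated with a minimal Python sketch. It is not the Laboratory's implementation: it assumes that a per-pixel depth map is already available (smaller values are closer to the viewer), and the function name `occluding_edges` and the fixed `threshold` are hypothetical choices made only for this example. For every pair of neighbouring pixels whose depths differ sharply, it records the pair and marks the nearer pixel as the occluder side.

```python
def occluding_edges(depth, threshold):
    """Find depth discontinuities between 4-connected neighbouring pixels.

    depth     -- list of rows of distances to the viewer (assumed input)
    threshold -- minimum depth jump treated as an occlusion (assumed value)

    Returns a list of (pixel_a, pixel_b, occluder), where occluder is
    whichever of the two pixels is closer to the viewer.
    """
    rows, cols = len(depth), len(depth[0])
    edges = []
    for r in range(rows):
        for c in range(cols):
            # Look only right and down so each neighbouring pair is visited once.
            for dr, dc in ((0, 1), (1, 0)):
                r2, c2 = r + dr, c + dc
                if r2 >= rows or c2 >= cols:
                    continue
                d1, d2 = depth[r][c], depth[r2][c2]
                if abs(d1 - d2) > threshold:
                    occluder = (r, c) if d1 < d2 else (r2, c2)
                    edges.append(((r, c), (r2, c2), occluder))
    return edges


# A near square (depth 2) in front of a far background (depth 9).
depth = [
    [9.0, 9.0, 9.0, 9.0],
    [9.0, 2.0, 2.0, 9.0],
    [9.0, 2.0, 2.0, 9.0],
    [9.0, 9.0, 9.0, 9.0],
]
edges = occluding_edges(depth, threshold=1.0)
for pixel_a, pixel_b, occluder in edges:
    print(pixel_a, pixel_b, "occluder side:", occluder)
```

In this toy scene every detected edge lies on the outline of the near square, and the occluder side is always a pixel of the square. In real footage depth is rarely measured directly, which is why the indirect cues listed below matter.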

Our approach is first to locate all edges in the image and then to identify those that are occluding boundaries. The initial edges for the analysis are supplied by the color segmentation procedure; because they are closed, such edges are especially suitable. The occurrence of an occluding edge can be inferred from various measurements and computations:

(a) direct measurement of the distance to the surface points for each pixel in the image (3D-imaging);

(b) measuring local motions, including the optic flow on both sides of the boundary and the motion of three boundaries meeting in a junction (motion analysis);

(c) measuring global motions. If all regions belonging to one object can be accurately mapped onto the next frame by a color- (intensity-) preserving multiparametric transformation (e.g., an affine one), this can be of great help in locating occluding boundaries (motion analysis);

(d) junction analysis. Edges in the image can result from widely different events in the 3D scene: occlusions, abrupt variations of surface color or intensity, abrupt changes in surface orientation (surface edges) and, therefore, in illumination, etc. The points where three or more contours meet are commonly called junctions. Several types of junctions can be identified depending on the geometry of the intersecting edges, for example, T-junctions and Y-junctions. Analyzing junctions in a single image and tracing junction types from junction to junction can significantly reduce the number of possible assignments of occlusion labels to region contours (junction analysis).
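To make item (d) more concrete, here is a minimal Python sketch of how a three-edge junction might be classified from the directions of its edges. It is only an illustration under stated assumptions, not the junction analysis used in our projects: the function name `classify_junction`, the angle representation (edge directions in degrees, measured at the junction point), and the tolerance `tol` are all hypothetical. The rule is purely geometric: if two edges are roughly collinear, the junction is treated as a T-junction; if every angular gap between neighbouring edges is clearly below 180 degrees, it is treated as a Y-junction.

```python
def classify_junction(angles_deg, tol=10.0):
    """Classify a junction of three edges by the angular gaps between them.

    angles_deg -- directions of the three edges leaving the junction, in degrees
    tol        -- how far from 180 degrees a gap may be and still count
                  as "collinear" (assumed value)

    Returns "T", "Y", or "other".
    """
    if len(angles_deg) != 3:
        raise ValueError("this sketch handles exactly three edges")
    a = sorted(x % 360.0 for x in angles_deg)
    # Gaps between consecutive edge directions, going once around the junction.
    gaps = [a[1] - a[0], a[2] - a[1], 360.0 - a[2] + a[0]]
    largest = max(gaps)
    if abs(largest - 180.0) <= tol:
        return "T"      # two edges form a straight bar, the third is the stem
    if largest < 180.0:
        return "Y"      # all three gaps are below 180 degrees
    return "other"      # one gap exceeds 180 degrees


print(classify_junction([0.0, 180.0, 90.0]))    # bar along the x-axis, stem upward
print(classify_junction([90.0, 210.0, 330.0]))  # three edges 120 degrees apart
print(classify_junction([0.0, 45.0, 90.0]))     # all edges within a quarter turn
```

A T-junction is a commonly used hint that the straight bar belongs to an occluding contour that interrupts the stem, but this is only a cue: as noted above, edges also arise from color and orientation changes, so the label has to be checked against neighbouring junctions.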

The knowledge that some contour segment is an occluding boundary can often reduce or even eliminate the ambiguity in the interpretation of other boundaries in the same frame or in other frames. Suppose, for example, that a moving object was reliably identified and followed over a number of frames. Then, by tracking, the same object can also be regarded as an occluder in those frames where it is no longer moving. This type of tracking was implemented in Project GM3. Boundary tracing within one frame was used in Project SI1.

Thanks for reading!


Laboratory

The Laboratory was organized in 2000 by a team of six researchers from three Russian research institutions in order to accumulate ideas and to develop and promote new technologies in the field of semantic segmentation and content-based video encoding.

We strongly believe that the content-based video coding envisaged by the MPEG-4 standard cannot be accomplished unless several fundamentally different computer vision methods are employed and integrated within a single computational process. This calls for close cooperation among the groups developing the technologies presented on this site.

Our research does not cover all key issues in the development of content-based video encoding. Hence, the purpose of this site is to establish cooperation with other researchers, teams, and companies interested in such technologies.

Under both previous contracts and the regulations currently in force, our team has the right to use, publish, and transfer to other parties any algorithms and program source code presented on this site.

Major Results

These were obtained in the course of three completed projects: SI1, GM3, and CS2.

We believe that an important contribution was to state the problem of finding moving objects as a problem of identifying occluding boundaries (see "What is Semantic Segmentation?"). This is a physically sound approach.

Physically based principles of color segmentation were formulated and implemented.

The problem of semantic segmentation of a single image was exhaustively explored under various assumptions about the scene geometry and coloration (junction analysis).

Nearly exhaustive results were obtained for the problem of separating an object from the background and inferring their motions from the motion of region edges in the case when all these motions can be approximated by an affine mapping (Project GM3). New methods of motion analysis were developed that are especially applicable to low-contrast objects.
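As a hedged illustration of what "approximated by an affine mapping" means in practice, the Python sketch below fits an affine transformation to point correspondences between two frames by ordinary least squares and then measures how well each point obeys it. This is a generic textbook technique, not the Project GM3 algorithm; the function names `solve3`, `fit_affine`, and `residuals`, as well as the sample coordinates, are hypothetical. Points that follow the fitted motion have small residuals, while points moving differently (for example, background points) stand out.

```python
import math


def solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration (e.g. collinear points)")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for k in range(col, n + 1):
                a[r][k] -= f * a[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = a[r][n] - sum(a[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / a[r][r]
    return x


def fit_affine(src, dst):
    """Least-squares fit of u = a*x + b*y + tx, v = c*x + d*y + ty.

    Returns ([a, b, tx], [c, d, ty]).
    """
    m = [[0.0] * 3 for _ in range(3)]
    bu = [0.0] * 3
    bv = [0.0] * 3
    for (x, y), (u, v) in zip(src, dst):
        p = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                m[i][j] += p[i] * p[j]   # normal-equation matrix
            bu[i] += p[i] * u
            bv[i] += p[i] * v
    return solve3(m, bu), solve3(m, bv)


def residuals(src, dst, params):
    """Distance between each observed point and its affine prediction."""
    (a, b, tx), (c, d, ty) = params
    return [math.hypot(a * x + b * y + tx - u, c * x + d * y + ty - v)
            for (x, y), (u, v) in zip(src, dst)]


# Edge points of one region in frame 1 and frame 2 (shifted by (2, 1)).
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
dst = [(2.0, 1.0), (3.0, 1.0), (2.0, 2.0), (3.0, 2.0)]
params = fit_affine(src, dst)

# A static background point does not follow the object's motion.
background_residual = residuals([(5.0, 5.0)], [(5.0, 5.0)], params)[0]
print("object residuals:", residuals(src, dst, params))
print("background residual:", background_residual)
```

At least three non-collinear correspondences are needed for the fit; with noisy real data a robust estimator would normally replace plain least squares.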

The proposed approaches to color segmentation and motion analysis are implemented in the tools under development for intraframe editing and postproduction (Project AEP4).