now you can use the different available detectors and descriptors for main_slam.py too
luigifreda committed Mar 24, 2019
1 parent 64b5d78 commit 08ad990
Showing 20 changed files with 406 additions and 173 deletions.
36 changes: 30 additions & 6 deletions README.md
@@ -5,7 +5,7 @@ Author: [Luigi Freda](https://www.luigifreda.com)
**pySLAM** is a *'toy'* implementation of a monocular *Visual Odometry (VO)* pipeline in Python. I released it for **educational purposes**, for a [computer vision class](https://as-ai.org/visual-perception-and-spatial-computing/) I taught. I started developing it for fun, as a Python programming exercise, during my free time. I took inspiration from some Python repos available on the web.

Main Scripts:
* `main_vo.py` combines the simplest VO ingredients without performing any image point triangulation or windowed bundle adjustment. At each step $k$, `main_vo.py` estimates the current camera pose $C_k$ with respect to the previous one $C_{k-1}$. The inter frame pose estimation returns $[R_{k-1,k},t_{k-1,k}]$ with $||t_{k-1,k}||=1$. With this very basic computation, you need to use a ground truth in order to recover a correct inter-frame scale $s$ and estimate a meaningful trajectory by composing $C_k = C_{k-1} * [R_{k-1,k}, s t_{k-1,k}]$. This script is a first start to understand the basics of inter frame feature tracking and camera pose estimation.
* `main_vo.py` combines the simplest VO ingredients without performing any image point triangulation or windowed bundle adjustment. At each step $k$, `main_vo.py` estimates the current camera pose $C_k$ with respect to the previous one $C_{k-1}$. The inter-frame pose estimation returns $[R_{k-1,k},t_{k-1,k}]$ with $||t_{k-1,k}||=1$. With this very basic approach, you need to use a ground truth in order to recover a correct inter-frame scale $s$ and estimate a valid trajectory by composing $C_k = C_{k-1} * [R_{k-1,k}, s t_{k-1,k}]$. This script is a good starting point for understanding the basics of inter-frame feature tracking and camera pose estimation.

* `main_slam.py` adds feature tracking along multiple frames, point triangulation and bundle adjustment in order to estimate the camera trajectory up to scale and build a local map. It's still a VO pipeline, but it shows some basic blocks which are necessary to develop a real visual SLAM pipeline.

@@ -71,9 +71,9 @@ $ python3 -O main_vo.py
```
This will process a [KITTI](http://www.cvlibs.net/datasets/kitti/eval_odometry.php) video (available in the folder `videos`) by using its corresponding camera calibration file (available in the folder `settings`) and its groundtruth (available in the video folder).

**N.B.**: remind, the simple script `main_vo.py` **strictly requires a ground truth**, since - with the used approach - the relative motion between two adjacent camera frames can be only estimated up to scale with a monocular camera (i.e. the implemented inter frame pose estimation returns $[R_{k-1,k},t_{k-1,k}]$ with $||t_{k-1,k}||=1$).
**N.B.**: as explained above, the script `main_vo.py` **strictly requires a ground truth**.
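To make the composition step above concrete, here is a minimal sketch of the scale recovery and pose update (illustrative names, not the actual `main_vo.py` code):

```
import numpy as np

def absolute_scale(p_gt_prev, p_gt_curr):
    # ground-truth scale s = distance between two consecutive ground-truth positions
    return np.linalg.norm(np.asarray(p_gt_curr) - np.asarray(p_gt_prev))

def compose_pose(C_prev, R, t, s):
    # C_k = C_{k-1} * [R_{k-1,k}, s * t_{k-1,k}] with 4x4 homogeneous matrices
    T = np.eye(4)
    T[:3, :3] = R       # inter-frame rotation
    T[:3, 3] = s * t    # unit-norm translation rescaled by the ground-truth scale
    return C_prev @ T
```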

In order to process a different dataset, you need to set the file `config.ini`:
In order to process a different **dataset**, you need to configure the file `config.ini`:
* select your dataset `type` in the section `[DATASET]` (see the section *Datasets* below for further details)
* set the camera settings file accordingly (see the section *Camera Settings* below)
* set the groundtruth file accordingly (see the section *Camera Settings* below)
@@ -83,6 +83,8 @@ If you want to test the script `main_slam.py`, you can run:
$ python3 -O main_slam.py
```

You can choose any detector/descriptor among *ORB*, *SIFT*, *SURF*, *BRISK*, *AKAZE* (see below for further information).

**WARNING**: the available **KITTI videos** (due to information loss in video compression) make `main_slam` tracking perform worse than with the original KITTI *image sequences*. The available videos are intended to be used for a first quick test. Please download and use the original KITTI image sequences, as explained below. For instance, on the original KITTI sequence 06, `main_slam` successfully completes the round; at present, this does not happen with the compressed video.

---
@@ -140,6 +142,29 @@ In order to calibrate your camera, you can use the scripts in the folder `calibr
1. use the script `grab_chessboard_images.py` to collect a sequence of images where the chessboard can be detected (set the chessboard size there)
2. use the script `calibrate.py` to process the collected images and compute the calibration parameters (set the chessboard size there)
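The following is a condensed sketch of the kind of processing `calibrate.py` performs, using the standard OpenCV chessboard pipeline (the image folder and the 9x6 board size are assumptions; set your own values):

```
import glob
import cv2
import numpy as np

board_size = (9, 6)  # inner corners per row/column; set your chessboard size here
objp = np.zeros((board_size[0] * board_size[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:board_size[0], 0:board_size[1]].T.reshape(-1, 2)

obj_points, img_points, img_size = [], [], None
for fname in glob.glob('calib_images/*.png'):  # assumed folder of grabbed images
    gray = cv2.cvtColor(cv2.imread(fname), cv2.COLOR_BGR2GRAY)
    img_size = gray.shape[::-1]
    found, corners = cv2.findChessboardCorners(gray, board_size)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# K is the 3x3 camera matrix, dist the distortion coefficients
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(obj_points, img_points, img_size, None, None)
print('camera matrix:\n', K, '\ndistortion: ', dist.ravel())
```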

---
## Detectors/Descriptors

At present, the following feature **detectors** are supported:
* *FAST*
* *Good features to track* [[ShiTo94]](https://ieeexplore.ieee.org/document/323794)
* *ORB*
* *SIFT*
* *SURF*
* *AKAZE*
* *BRISK*

You can take a look at the file `feature_detector.py`.
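For a quick standalone test, the supported detectors can be instantiated directly with their OpenCV constructors (SIFT and SURF require the `xfeatures2d` contrib module in OpenCV 3.x); a minimal sketch:

```
import cv2

orb   = cv2.ORB_create(nfeatures=2000)
brisk = cv2.BRISK_create()
akaze = cv2.AKAZE_create()
fast  = cv2.FastFeatureDetector_create(threshold=25, nonmaxSuppression=True)
sift  = cv2.xfeatures2d.SIFT_create()   # contrib module required
surf  = cv2.xfeatures2d.SURF_create()   # contrib module required

img = cv2.imread('image.png', cv2.IMREAD_GRAYSCALE)  # any test image
kps = orb.detect(img, None)
print('ORB keypoints: ', len(kps))
```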

The following feature **descriptors** are supported:
* *ORB*
* *SIFT*
* *SURF*
* *AKAZE*
* *BRISK*

In both the scripts `main_vo.py` and `main_slam.py`, you can set which detector/descriptor to use by means of the function *feature_tracker_factory()*. This function can be found in the file `feature_tracker.py`.
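A hypothetical usage sketch follows; the exact signature of *feature_tracker_factory()* may differ, and the parameter names here are assumptions based on the enums introduced in this commit:

```
# hypothetical sketch - check feature_tracker.py for the actual signature
from feature_tracker import feature_tracker_factory
from feature_detector import FeatureDetectorTypes, FeatureDescriptorTypes

tracker = feature_tracker_factory(min_num_features=2000,                        # assumed parameter
                                  detector_type=FeatureDetectorTypes.BRISK,     # any supported detector
                                  descriptor_type=FeatureDescriptorTypes.BRISK) # any supported descriptor
```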

---
## References

@@ -161,10 +186,9 @@ Tons of things are still missing to attain a real SLAM pipeline:

* keyframe generation and management
* tracking w.r.t. previous keyframe
* proper local map generation and management
* proper local map generation and management (covisibility)
* loop closure
* general relocalization
* in main_slam, tracking by using all kind of features (not only ORB)


---
@@ -173,4 +197,4 @@ Tons of things are still missing to attain a real SLAM pipeline:
* [twitchslam](https://github.com/geohot/twitchslam)
* [monoVO](https://github.com/uoip/monoVO-python)
* [pangolin](https://github.com/stevenlovegrove/Pangolin)
* [g2opy](https://github.com/uoip/g2opy)
12 changes: 6 additions & 6 deletions config.ini
@@ -20,11 +20,11 @@ type=VIDEO_DATASET
type=kitti
base_path=/home/luigi/Work/rgbd_datasets/kitti/dataset
;
name=00
cam_settings=settings/KITTI00-02.yaml
#name=00
#cam_settings=settings/KITTI00-02.yaml
;
#name=06
#cam_settings=settings/KITTI04-12.yaml
name=06
cam_settings=settings/KITTI04-12.yaml
;
groundtruth_file=auto

@@ -44,8 +44,8 @@ type=video
base_path=./videos/kitti00
cam_settings=settings/KITTI00-02.yaml
;
#base_path=./videos/kitti06
#cam_settings=settings/KITTI04-12.yaml
;base_path=./videos/kitti06
;cam_settings=settings/KITTI04-12.yaml
;
name=video.mp4
groundtruth_file=groundtruth.txt
5 changes: 5 additions & 0 deletions config.py
@@ -38,6 +38,8 @@ def __init__(self):
self.cam_settings = None
self.dataset_settings = None
self.dataset_type = None
self.current_path = os.getcwd()
#print('current path: ', self.current_path)

self.set_lib_paths()
self.get_dataset_settings()
@@ -56,6 +58,9 @@ def set_lib_paths(self):
def get_dataset_settings(self):
self.dataset_type = self.config_parser['DATASET']['type']
self.dataset_settings = self.config_parser[self.dataset_type]

self.dataset_path = self.dataset_settings['base_path']
self.dataset_settings['base_path'] = os.path.join(__location__, self.dataset_path)
#print('dataset_settings: ', self.dataset_settings)

# get camera settings
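For reference, `__location__` used above is presumably defined with the usual pattern for resolving paths relative to the script file rather than the current working directory; a sketch under that assumption:

```
import os

# assumed definition: the directory containing the script,
# independent of where the interpreter was launched from
__location__ = os.path.realpath(os.path.join(os.getcwd(), os.path.dirname(__file__)))
```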
3 changes: 2 additions & 1 deletion dataset.py
@@ -91,9 +91,10 @@ class VideoDataset(Dataset):
def __init__(self, path, name, associations=None, type=DatasetType.VIDEO):
super().__init__(path, name, associations, type)
self.filename = path + '/' + name
#print('video: ', self.filename)
self.cap = cv2.VideoCapture(self.filename)
if not self.cap.isOpened():
raise IOError('Cannot open movie file')
raise IOError('Cannot open movie file: ' + self.filename)
else:
print('Processing Video Input')
self.num_frames = int(self.cap.get(cv2.CAP_PROP_FRAME_COUNT))
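For context, a minimal sketch of how such a `cv2.VideoCapture` source is typically consumed frame by frame (illustrative, not the actual `dataset.py` code):

```
import cv2

cap = cv2.VideoCapture('videos/kitti00/video.mp4')
if not cap.isOpened():
    raise IOError('Cannot open movie file: videos/kitti00/video.mp4')
while True:
    ret, frame = cap.read()  # ret becomes False once the stream is exhausted
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # VO works on grayscale frames
cap.release()
```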
68 changes: 47 additions & 21 deletions feature_detector.py
@@ -17,10 +17,11 @@
* along with PYSLAM. If not, see <http://www.gnu.org/licenses/>.
"""
import sys
import math
import numpy as np
import cv2
from enum import Enum
from geom_helpers import imgBlocks
from geom_helpers import imgBlocks, unpackSiftOctaveKps

kVerbose = True

@@ -32,6 +33,7 @@
kNumLevels = 4
kNumLevelsInitSigma = 12
kScaleFactor = 1.2
kSigmaLevel0 = 1.

kDrawOriginalExtractedFeatures = False # for debugging

@@ -44,6 +46,7 @@ class FeatureDetectorTypes(Enum):
ORB = 5
BRISK = 6
AKAZE = 7
FREAK = 8 # DOES NOT WORK!


class FeatureDescriptorTypes(Enum):
@@ -53,6 +56,7 @@ class FeatureDescriptorTypes(Enum):
ORB = 3
BRISK = 4
AKAZE = 5
FREAK = 6 # DOES NOT WORK!


def feature_detector_factory(min_num_features=kMinNumFeatureDefault,
@@ -181,7 +185,8 @@ def __init__(self, min_num_features=kMinNumFeatureDefault,
self.descriptor_type = descriptor_type

self.num_levels = num_levels
self.scale_factor = kScaleFactor
self.scale_factor = kScaleFactor # scale factor between two octaves
self.sigma_level0 = kSigmaLevel0 # sigma on first octave
self.initSigmaLevels()

self.min_num_features = min_num_features
@@ -191,20 +196,26 @@ def __init__(self, min_num_features=kMinNumFeatureDefault,
self.use_pyramid_adaptor = False
self.pyramid_adaptor = None

print("using opencv ", cv2.__version__)
# check opencv version in order to use the right modules
if cv2.__version__.split('.')[0] == '3':
from cv2.xfeatures2d import SIFT_create, SURF_create
from cv2.xfeatures2d import SIFT_create, SURF_create, FREAK_create
from cv2 import ORB_create, BRISK_create, AKAZE_create
else:
SIFT_create = cv2.SIFT
SURF_create = cv2.SURF
ORB_create = cv2.ORB
BRISK_create = cv2.BRISK
AKAZE_create = cv2.AKAZE
FREAK_create = cv2.FREAK # TODO: to be checked

self.FAST_create = cv2.FastFeatureDetector_create
self.SIFT_create = SIFT_create
self.SURF_create = SURF_create
self.ORB_create = ORB_create
self.BRISK_create = BRISK_create
self.AKAZE_create = AKAZE_create
self.FREAK_create = FREAK_create # DOES NOT WORK!

self.orb_params = dict(nfeatures=min_num_features,
scaleFactor=self.scale_factor,
Expand All @@ -219,25 +230,33 @@ def __init__(self, min_num_features=kMinNumFeatureDefault,

# init detector
if self.detector_type == FeatureDetectorTypes.SIFT:
self._feature_detector = SIFT_create()
self._feature_detector = self.SIFT_create() # N.B.: The number of octaves is computed automatically from the image resolution
# from https://docs.opencv.org/3.4/d5/d3c/classcv_1_1xfeatures2d_1_1SIFT.html
self.scale_factor = 2 # from https://docs.opencv.org/3.1.0/da/df5/tutorial_py_sift_intro.html
# self.layer_scale_factor = math.sqrt(2) # with SIFT, 3 layers per octave are generated with a intra-layer scale factor = sqrt(2)
self.sigma_level0 = 1.6
self.initSigmaLevels()
self.detector_name = 'SIFT'
elif self.detector_type == FeatureDetectorTypes.SURF:
self._feature_detector = SURF_create()
self._feature_detector = self.SURF_create(nOctaveLayers=self.num_levels)
self.detector_name = 'SURF'
elif self.detector_type == FeatureDetectorTypes.ORB:
self._feature_detector = ORB_create(**self.orb_params)
self._feature_detector = self.ORB_create(**self.orb_params)
self.detector_name = 'ORB'
self.use_bock_adaptor = True
elif self.detector_type == FeatureDetectorTypes.BRISK:
self._feature_detector = BRISK_create(octaves=self.num_levels)
self._feature_detector = self.BRISK_create(octaves=self.num_levels)
self.detector_name = 'BRISK'
self.scale_factor = 1.3 # from the BRISK opencv code, this seems to be the scale factor used between intra-octave frames
self.initSigmaLevels()
elif self.detector_type == FeatureDetectorTypes.AKAZE:
self._feature_detector = AKAZE_create(nOctaves=self.num_levels)
self.detector_name = 'AKAZE'
self._feature_detector = self.AKAZE_create(nOctaves=self.num_levels)
self.detector_name = 'AKAZE'
elif self.detector_type == FeatureDetectorTypes.FREAK:
self._feature_detector = self.FREAK_create(nOctaves=self.num_levels)
self.detector_name = 'FREAK'
elif self.detector_type == FeatureDetectorTypes.FAST:
self._feature_detector = cv2.FastFeatureDetector_create(threshold=25, nonmaxSuppression=True)
self._feature_detector = self.FAST_create(threshold=25, nonmaxSuppression=True)
self.detector_name = 'FAST'
self.use_bock_adaptor = True
self.use_pyramid_adaptor = self.num_levels > 1
@@ -257,20 +276,23 @@ def __init__(self, min_num_features=kMinNumFeatureDefault,

# init descriptor
if self.descriptor_type == FeatureDescriptorTypes.SIFT:
self._feature_descriptor = SIFT_create()
self._feature_descriptor = self.SIFT_create()
self.decriptor_name = 'SIFT'
elif self.descriptor_type == FeatureDescriptorTypes.SURF:
self._feature_descriptor = SURF_create()
self._feature_descriptor = self.SURF_create(nOctaveLayers=self.num_levels)
self.decriptor_name = 'SURF'
elif self.descriptor_type == FeatureDescriptorTypes.ORB:
self._feature_descriptor = ORB_create(**self.orb_params)
self._feature_descriptor = self.ORB_create(**self.orb_params)
self.decriptor_name = 'ORB'
elif self.descriptor_type == FeatureDescriptorTypes.BRISK:
self._feature_descriptor = BRISK_create(octaves=self.num_levels)
self._feature_descriptor = self.BRISK_create(octaves=self.num_levels)
self.decriptor_name = 'BRISK'
elif self.descriptor_type == FeatureDescriptorTypes.AKAZE:
self._feature_descriptor = AKAZE_create(nOctaves=self.num_levels)
self.decriptor_name = 'AKAZE'
self._feature_descriptor = self.AKAZE_create(nOctaves=self.num_levels)
self.decriptor_name = 'AKAZE'
elif self.descriptor_type == FeatureDescriptorTypes.FREAK:
self._feature_descriptor = self.FREAK_create(nOctaves=self.num_levels)
self.decriptor_name = 'FREAK'
elif self.descriptor_type == FeatureDescriptorTypes.NONE:
self._feature_descriptor = None
self.decriptor_name = 'None'
@@ -284,11 +306,13 @@ def initSigmaLevels(self):
self.inv_scale_factors = np.zeros(num_levels)
self.inv_level_sigmas2 = np.zeros(num_levels)

# TODO: in the SIFT case, this sigma management could be refined.
# SIFT method has layers with intra-layer scale factor = math.sqrt(2)
self.scale_factors[0]=1.0
self.level_sigmas2[0]=1.0
self.level_sigmas2[0]=self.sigma_level0*self.sigma_level0
for i in range(1,num_levels):
self.scale_factors[i]=self.scale_factors[i-1]*self.scale_factor
self.level_sigmas2[i]=self.scale_factors[i]*self.scale_factors[i]
self.level_sigmas2[i]=self.scale_factors[i]*self.scale_factors[i]*self.level_sigmas2[0] # sigma_i^2 = (scale_i * sigma_0)^2
#print('self.scale_factors: ', self.scale_factors)
for i in range(num_levels):
self.inv_scale_factors[i]=1.0/self.scale_factors[i]
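With the fix above, the per-level sigmas follow the usual pyramid convention sigma_i^2 = (scale^i * sigma_0)^2; a standalone numeric sketch with the default constants `kScaleFactor=1.2` and `kSigmaLevel0=1`:

```
import numpy as np

num_levels, scale_factor, sigma_level0 = 4, 1.2, 1.0
scale_factors = scale_factor ** np.arange(num_levels)   # s_i = 1.2^i
level_sigmas2 = (scale_factors * sigma_level0) ** 2     # sigma_i^2 = (s_i * sigma_0)^2
inv_scale_factors = 1.0 / scale_factors
inv_level_sigmas2 = 1.0 / level_sigmas2
print(level_sigmas2)  # [1. 1.44 2.0736 2.985984]
```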
@@ -306,7 +330,7 @@ def detect(self, frame, mask=None):
kps = self.block_adaptor.detect(frame, mask)
else:
kps = self._feature_detector.detect(frame, mask)
kps = self.satNumberOfFeatures(kps)
if kDrawOriginalExtractedFeatures: # draw the original features
imgDraw = cv2.drawKeypoints(frame, kps, None, color=(0,255,0), flags=0)
cv2.imshow('detected keypoints',imgDraw)
@@ -321,6 +345,8 @@ def detectAndCompute(self, frame, mask=None):
frame = cv2.cvtColor(frame,cv2.COLOR_RGB2GRAY)
kps = self.detect(frame, mask)
kps, des = self._feature_descriptor.compute(frame, kps)
if self.detector_type == FeatureDetectorTypes.SIFT:
unpackSiftOctaveKps(kps)
if kVerbose:
#print('detector: ', self.detector_name, ', #features: ', len(kps))
print('descriptor: ', self.decriptor_name, ', #features: ', len(kps))
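OpenCV's SIFT packs octave, layer and scale into the `kp.octave` field; `unpackSiftOctaveKps` presumably decodes it in place. A common decoding based on the well-known OpenCV convention (a sketch, not necessarily the exact `geom_helpers` implementation):

```
def unpack_sift_octave(kp):
    # decode the octave/layer packed by OpenCV SIFT into kp.octave
    octave = kp.octave & 255
    layer = (kp.octave >> 8) & 255
    if octave >= 128:
        octave = octave - 256  # negative octaves are stored as unsigned bytes
    scale = 1.0 / (1 << octave) if octave >= 0 else float(1 << -octave)
    return octave, layer, scale
```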
8 changes: 4 additions & 4 deletions feature_matcher.py
@@ -128,7 +128,7 @@ def __init__(self, norm_type=cv2.NORM_HAMMING, cross_check = False, type = Featu

def match(self, des1, des2):
if kVerbose:
print('BfFeatureMatcher')
print('BfFeatureMatcher, norm ', self.norm_type)
matches = self.bf.knnMatch(des1, des2, k=2) #knnMatch(queryDescriptors,trainDescriptors)
return self.goodMatches(matches, des1, des2)

@@ -140,9 +140,9 @@ def __init__(self, norm_type=cv2.NORM_HAMMING, cross_check = False, type = Featu
# FLANN parameters for binary descriptors
FLANN_INDEX_LSH = 6
self.index_params= dict(algorithm = FLANN_INDEX_LSH,
table_number = 12, # 12
key_size = 20, # 20
multi_probe_level = 2) # 2
table_number = 6, # 12
key_size = 12, # 20
multi_probe_level = 1) # 2
if norm_type == cv2.NORM_L2:
# FLANN parameters for float descriptors
FLANN_INDEX_KDTREE = 1
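For context, a self-contained sketch of a FLANN matcher configured with the new LSH parameters for binary descriptors such as ORB (`checks` is an assumed value):

```
import cv2

FLANN_INDEX_LSH = 6
index_params = dict(algorithm=FLANN_INDEX_LSH,
                    table_number=6,      # fewer tables: faster matching, slightly less accurate
                    key_size=12,
                    multi_probe_level=1)
search_params = dict(checks=32)          # assumed value; higher means a more exhaustive search
flann = cv2.FlannBasedMatcher(index_params, search_params)
# des1, des2 would be uint8 binary descriptors, e.g. from cv2.ORB_create():
# matches = flann.knnMatch(des1, des2, k=2)
```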