Improving RANSAC for Efficient and Precise Model Fitting with Statistical Analysis

RANSAC (random sample consensus) has been widely used as a benchmark algorithm for model fitting in the presence of outliers for more than thirty years. It is robust for outlier removal and rough model fitting, but neither reliable nor efficient enough for many applications where precision and time are critical. Many other algorithms have been proposed to improve RANSAC. However, not much effort has been made to systematically tackle its limitations in model fitting repeatability, quality indication, iteration termination, and multi-model fitting. A new paradigm, named SASAC (statistical analysis for sample consensus), is introduced in this paper to overcome these limitations. Unlike RANSAC, which does not consider sampling noise even though such noise is present in most sampling cases, SASAC defines a term named the σ rate. It is used both as an indicator of the quality of model fitting and as a criterion for terminating the iterative model search. Iterative least squares is integrated in SASAC for optimal model estimation, and a strategy is proposed to handle the multi-model situation. Experimental results for linear and quadratic function model fitting demonstrate that SASAC can significantly improve the quality and reliability of model fitting and largely reduce the number of iterations for model searching. Using the σ rate as an indicator of the quality of model fitting can effectively avoid wrongly estimated models. In addition, SASAC works very well on multi-model datasets and can provide reliable estimates of all the models. SASAC can be combined with RANSAC and its variants to dramatically improve their performance.


I. INTRODUCTION
RANSAC (random sample consensus) is a robust regression algorithm well suited to data contaminated by a large fraction of outliers [2]. It is based on the assumption that a subset of data uncontaminated by outliers will construct a correct model. It can be summarized as a hypothesize-and-verify framework: a subset of samples is randomly selected from the given dataset and used to fit a model hypothesis, which is then evaluated by computing the distance of all the other samples to this model and constructing an inlier subset with a threshold. This hypothesize-and-verify loop is repeated until the number of iterations reaches the value required for a predefined success rate of finding a model constructed from an uncontaminated subset [1].
Many algorithms have been developed as variants of RANSAC [3][4][5][6][7][8][9][10][11][12][13][14]. MLESAC (maximum likelihood estimation sample consensus) by Torr and Zisserman [8] adopts the same sampling strategy as RANSAC to generate putative solutions, but chooses the solution that maximizes the likelihood rather than just the number of inliers. Chum and Matas proposed locally optimized RANSAC (LO-RANSAC) and optimal randomized RANSAC to improve the hypothesis generation step, using just the inlier set of the best current model [3,11,16]. Nister proposed preemptive RANSAC for real-time applications: a fixed number of hypotheses are generated and half of them are eliminated in each iteration, so the iteration terminates in a fixed time [4]. Capel developed a statistical bail-out test for RANSAC that permits the scoring process to be terminated early, saving computation cost [10]. Wang and Luo introduced PURSAC (purposive sample consensus) to avoid random searching in RANSAC by using discriminative information from a dataset and analysing the noise-model relationship [12]. Uncertainty RANSAC incorporates the uncertainty of samples as outliers to reduce the number of iterations [13]. A deterministic RANSAC approach has also been introduced that estimates the probability of a match being correct [14].
All these methods developed on the basis of RANSAC, however, share some common weaknesses. As the quality of model fitting cannot be assessed properly, the condition for terminating the iterative model search lacks a reliable criterion. Their modelling solutions are imprecise and may be far from the optimal one unless a huge number of iterations is used. Generally they are unable to handle the multi-model situation. This is mainly because most of them focus on speeding up model searching rather than on these weaknesses.
For most model fitting tasks, two types of measurement errors should be considered: small errors (noise) and blunders (outliers) [15]. RANSAC and its variants mainly handle the outliers but disregard the sampling noise. The research in this paper considers both outliers and noise through statistical analysis for sample consensus (SASAC), which is able to tackle all the weaknesses listed above. As sampling noise generally follows a normal distribution, a σ rate is defined as an indicator of model fitting quality. It is also used in SASAC to speed up model searching by terminating the process in a timely manner. Iterative least squares is integrated into a model searching scheme for efficient and reliable model estimation [17]. A general approach for multi-model fitting is also proposed. SASAC can be combined with RANSAC and its variants, and dramatically improves the performance of the original algorithms.

Yunliang Zhang a, Hengyuan Tian b, Yanzi Deng a and Jianguo Wang b*
a Xi'an University, China
b University of Technology Sydney (UTS), Australia

The rest of the paper is organized as follows: Section II reviews the principle of RANSAC with statistical analysis. Section III introduces the scheme of SASAC and describes its performance for single model fitting. Section IV explains the approach in SASAC for multi-model fitting. Experiments with different outlier rates and different inlier thresholds for linear and quadratic functions are presented in Section V to demonstrate the effectiveness and robustness of the proposed algorithm. Conclusions and discussion appear in the last section.

II. STATISTICAL ANALYSIS OF RANSAC
RANSAC is a successful algorithm for model fitting with outlier-contaminated data. It follows a simple assumption: a small set of data without outliers will construct a correct model with the most inliers, and such a set can be found among a number of randomly selected hypotheses [2]. RANSAC has four major steps: 1) randomly select a subset of samples from all the samples to fit a model hypothesis; 2) compute the distance of all other samples to this model; 3) construct an inlier set with a predetermined inlier threshold; 4) compare the number of inliers to the highest one so far and store the better one. Steps 1) to 4) are repeated until a preset number of iterations is reached. Then the model with the maximum number of inliers is selected and its inliers are used for model estimation. Thus RANSAC is a stochastic algorithm without deterministic guarantees of finding the global maximum of the likelihood [8].
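The four steps can be sketched for line fitting as follows. This is a minimal illustration rather than the implementation used in the experiments: the function name `ransac_line`, the slope-intercept parameterization, and the use of the vertical residual as the point-to-model distance are assumptions made for brevity.

```python
import numpy as np

def ransac_line(points, tau, n_iter, rng):
    """Basic RANSAC for a line y = a*x + b (hypothetical helper, vertical residuals)."""
    x, y = points[:, 0], points[:, 1]
    best_model = None
    best_mask = np.zeros(len(points), dtype=bool)
    for _ in range(n_iter):
        # 1) randomly select a minimal subset: two points define a line
        i, j = rng.choice(len(points), size=2, replace=False)
        if x[i] == x[j]:
            continue  # a vertical pair cannot be written as y = a*x + b
        a = (y[j] - y[i]) / (x[j] - x[i])
        b = y[i] - a * x[i]
        # 2) distance of the samples to the hypothesis (the two seeds have distance 0)
        dist = np.abs(y - (a * x + b))
        # 3) inlier set under the predetermined threshold tau
        mask = dist <= tau
        # 4) keep the hypothesis with the most inliers so far
        if mask.sum() > best_mask.sum():
            best_model, best_mask = (a, b), mask
    return best_model, best_mask
```

The returned mask is the inlier set that would then be used for the final model estimation; the result depends on the random draws, which is the non-repeatability discussed in this section.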
Assuming all the samples have the same outlier ratio ε, and ignoring the impact of sampling noise, RANSAC follows a random sampling paradigm. The success rate p is the level of confidence of finding a consensus subset, which is a function of ε, the number of iterations to be conducted N, and the number of samples in a subset s [2]. The success rate in Equation (1) simply refers to the event that a subset of samples selected by RANSAC consists entirely of inliers. As observed, RANSAC is a probabilistic and nondeterministic method, as it just selects the best fit among N randomly selected hypotheses, which differ on each run.
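For reference, the standard form of this relation in the RANSAC literature is given below. It is assumed here to be the Equation (1) referred to in the text, since it depends on exactly the quantities p, ε, N and s named above.

```latex
p = 1 - \left(1 - (1-\varepsilon)^{s}\right)^{N}
\quad\Longleftrightarrow\quad
N = \frac{\log(1-p)}{\log\left(1 - (1-\varepsilon)^{s}\right)} \tag{1}
```

Here $(1-\varepsilon)^{s}$ is the probability that a single subset of $s$ samples contains only inliers.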
For the sake of robustness, in practical implementations N is usually multiplied by a factor of ten, which greatly increases the computational cost. As ε is generally not known in advance, it is estimated adaptively during the iterations of RANSAC.
By the nature of RANSAC, the lower part (denominator) of Equation (1) is fixed by the outlier ratio ε of a dataset (usually unknown) and the number of samples in a subset s (decided by the model to fit). If one wants a high success rate p of selecting a subset of samples that are all inliers, the number of iterations N increases steeply, in particular exponentially with the subset size s.
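The growth of N can be checked numerically. The sketch below assumes that Equation (1) is the standard RANSAC relation N = log(1 − p) / log(1 − (1 − ε)^s); the function name `ransac_iterations` is hypothetical.

```python
import math

def ransac_iterations(p, eps, s):
    """Iterations N needed so that, with confidence p, at least one subset of
    s samples is outlier-free (assumed standard form; valid for 0 < eps < 1, p < 1)."""
    w = (1.0 - eps) ** s  # probability that one subset contains only inliers
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - w))

# Example: N for a line (s = 2) at p = 99.9% as the outlier ratio grows
for eps in (0.1, 0.5, 0.9):
    print(eps, ransac_iterations(0.999, eps, 2))
```

Multiplying the result by ten, as is common in practice, makes the cost at high outlier ratios correspondingly larger.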
In practice, however, sampling always has noise, which is assumed to follow a normal distribution as in Equation (2). The density of the normal distribution ρ is determined by the mean μ and the standard deviation σ. The analysis of sampling noise against model hypotheses in PURSAC shows that even if a consensus subset consists entirely of inliers, the model hypothesis may, due to sampling noise and degenerate configurations, be different and far from the optimal one [12]. A semi-purposive subset selection is proposed in PURSAC to reduce the effect of measurement noise on model fitting.
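The normal density referred to as Equation (2) has the following standard form, written with the symbols used above (the argument $x$ denotes the sampled value):

```latex
\rho(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right) \tag{2}
```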
The search result of RANSAC is the model hypothesis with the largest number of inliers among all the hypotheses tried in the iterations conducted. This result cannot be relied on to reach the optimal sample consensus, even if the number of iterations exceeds the N calculated with Equation (1). A proper indicator is needed to tell the quality of a model fitted with RANSAC.
RANSAC requires some prior knowledge about a dataset for setting its parameters, such as the outlier ratio ε and the inlier threshold τ used to select inliers. In practice, however, this information is generally not available, and wrongly estimated parameters in RANSAC will either lead to unreliable model estimation or increase the computation cost unnecessarily.
Let us investigate the line fitting example in the original RANSAC paper [2]. As shown in Figure 1, two types of measurement errors (noise and outliers) exist in the sample points. By randomly selecting a subset of samples (two points for line fitting) and counting inliers with a proper threshold τ, RANSAC can find a line model with the highest number of inliers after a certain number of iterations. However, this line model is unlikely to be close to the correct one, and can be very different in another attempt, unless the number of iterations is so large that it covers all the possible subsets of samples. This inherent property of RANSAC makes the model fitting inaccurate and unrepeatable. Due to measurement noise, model hypotheses selected by RANSAC with a limited number of iterations usually cannot fit a model precisely, as illustrated in Figure 1. It is safe to conclude that RANSAC is effective in removing measurement outliers but inadequate for handling measurement noise.
Figure 1. Line fitting example. RANSAC and its variants MLESAC and PURSAC find their optimal models after a number of iterations; these models are diverse and far from the real line model, while SASAC is able to find a much better result that is very close to the real model.

III. SASAC SCHEME
The limitations of RANSAC mentioned above are handled in this paper by in-depth statistical analysis. When a dataset has both noisy inliers and outliers, the inliers roughly follow a normal distribution, and the outliers are generally distributed as white noise. Figure 2 shows an example of the distance distribution of all the sample points to a line.
where N_τ and N_2τ are the numbers of inliers counted with the preset inlier threshold τ and with the doubled threshold 2τ, respectively. The σ rate in the two cases in Figure 2 is so different that it can be used to distinguish model fitting results easily. The samples around the red rectangles follow a normal distribution, which corresponds to correct model fitting. The corresponding σ rate value is positive, generally greater than a high threshold, and always greater than a low threshold. The orange rectangles are around a white noise peak, which could also be picked by RANSAC. The corresponding σ rate can be either positive or negative; it is always less than the high threshold and generally less than the low threshold. In this way the σ rate value can be used to assess the quality of model fitting.
A model with a σ rate greater than the high threshold must be a correct one, while one with a σ rate less than the low threshold must be an incorrect one. Only when the σ rate value falls between the two thresholds is additional processing needed to classify the model fitting result. The values of the two σ rate thresholds are largely determined by the pattern of the real sample distribution and its relationship to the inlier threshold, which will be investigated by purposely designed experiments in Section V.B.
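The two-threshold test can be illustrated with a small sketch. The exact expression of Equation (3) is not relied on here: the code uses an assumed stand-in that compares the inlier count within τ with half the count within 2τ and normalizes by that half count, so that uniformly spread (white noise) residuals give a value near zero and normally distributed residuals give a clearly positive value. The function names and the threshold values in the usage lines are hypothetical.

```python
import numpy as np

def sigma_rate(dist, tau):
    """ASSUMED stand-in for the sigma rate, not Equation (3) itself:
    (N_tau - N_2tau / 2) / (N_2tau / 2), from counts within tau and 2*tau."""
    n_tau = np.count_nonzero(dist <= tau)
    n_2tau = np.count_nonzero(dist <= 2 * tau)
    if n_2tau == 0:
        return 0.0
    return (n_tau - n_2tau / 2) / (n_2tau / 2)

def classify(rate, low, high):
    """Two-threshold decision described in the text."""
    if rate > high:
        return "correct"
    if rate < low:
        return "incorrect"
    return "undecided"  # between the thresholds: additional processing needed

rng = np.random.default_rng(1)
normal_dist = np.abs(rng.normal(0.0, 1.0, 10000))      # residuals of a correct model
uniform_dist = np.abs(rng.uniform(-10.0, 10.0, 10000))  # residuals on a white noise peak
print(classify(sigma_rate(normal_dist, 1.0), low=0.1, high=0.3))
print(classify(sigma_rate(uniform_dist, 1.0), low=0.1, high=0.3))
```

The low and high values above are placeholders; the text determines suitable thresholds experimentally.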

A. SASAC strategies
Unlike RANSAC, which merely attempts to find the model with the most inliers among a number of tested model hypotheses, a model that may or may not be close to the global optimum, the proposed SASAC consists of a set of steps to overcome the limitations of RANSAC mentioned above:
Step 1) Regardless of the outlier ratio ε of a dataset, run RANSAC or one of its variants for a small number of iterations (ten or so). The model with the maximum number of inliers is selected as the initial estimate.
Step 2) Use the selected inliers for model estimation with iterative least squares. The iteration stops when the number of inliers no longer increases, which indicates either an optimal model consensus or just a white noise peak.
Step 3) Evaluate the quality of model fitting with the σ rate calculated by Equation (3), and compare it to the two thresholds. If the σ rate is greater than the high σ rate threshold, SASAC stops and the result provides an optimal model estimation.
Step 4) Otherwise, repeat Step 1) to Step 3) with an increased number of iterations in Step 1), until the σ rate reaches the high threshold, or stops increasing and exceeds the low threshold.
Step 5) If needed, conduct multi-model fitting by removing all the inliers of the previous model and searching for the next model. The inliers of other models appear as outliers or noise to the present model, so removing them should not affect the current model estimation.
As only a very small number of iterations are conducted initially in Step 1), there is no need to know the outlier ratio ε, which is used to decide the number of iterations in RANSAC and most of its variants. If an initially selected model is similar to the correct one, even when it is constructed from a subset containing both inliers and outliers, its inliers tend to follow the normal distribution. The following steps in SASAC are able to find the optimal model using iterative least squares and to calculate the σ rate as a confidence indicator.
RANSAC and its variants repeatedly perform two simple steps, hypothesis generation and evaluation, and select the best hypothesis from all the tries. They lack a process of searching for the global optimal model and have no indicator of the quality of model fitting. Through statistical analysis for sample consensus, SASAC provides an effective strategy to verify model fitting quality and to search for the global optimum.
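Steps 1) to 4) can be sketched for line fitting as below. This is an illustrative outline under stated assumptions, not the authors' implementation: the σ rate uses an assumed stand-in formula (count within τ versus half the count within 2τ) rather than Equation (3), the number of Step 1) iterations is simply doubled on each retry, and all function names, defaults and thresholds are hypothetical.

```python
import numpy as np

def residuals(x, y, model):
    a, b = model
    return np.abs(y - (a * x + b))  # vertical distance to the line y = a*x + b

def ransac_init(x, y, tau, n_iter, rng):
    """Step 1: a few RANSAC iterations; returns the hypothesis with most inliers."""
    best, best_count = None, -1
    for _ in range(n_iter):
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue
        a = (y[j] - y[i]) / (x[j] - x[i])
        model = (a, y[i] - a * x[i])
        count = np.count_nonzero(residuals(x, y, model) <= tau)
        if count > best_count:
            best, best_count = model, count
    return best

def refine(x, y, model, tau, max_rounds=50):
    """Step 2: iterative least squares; stop when the inlier count stops increasing."""
    mask = residuals(x, y, model) <= tau
    for _ in range(max_rounds):
        if mask.sum() < 2:
            break
        a, b = np.polyfit(x[mask], y[mask], 1)  # least-squares line on current inliers
        new_mask = residuals(x, y, (a, b)) <= tau
        if new_mask.sum() <= mask.sum():
            break
        model, mask = (a, b), new_mask
    return model, mask

def sigma_rate(x, y, model, tau):
    """ASSUMED stand-in for the sigma rate, not Equation (3) itself."""
    d = residuals(x, y, model)
    n_tau = np.count_nonzero(d <= tau)
    n_2tau = np.count_nonzero(d <= 2 * tau)
    return 0.0 if n_2tau == 0 else (n_tau - n_2tau / 2) / (n_2tau / 2)

def sasac_line(x, y, tau, low, high, rng, n_init=10, max_outer=20):
    best = None  # (rate, model, mask) of the best attempt so far
    n_iter = n_init
    for _ in range(max_outer):
        init = ransac_init(x, y, tau, n_iter, rng)      # Step 1
        if init is not None:
            model, mask = refine(x, y, init, tau)       # Step 2
            rate = sigma_rate(x, y, model, tau)         # Step 3
            if rate > high:
                return model, mask, rate
            if best is None or rate > best[0]:
                best = (rate, model, mask)
            elif best[0] > low:
                break  # Step 4: rate stopped increasing and exceeds the low threshold
        n_iter *= 2    # Step 4: retry with more iterations in Step 1
    return (best[1], best[2], best[0]) if best else (None, None, 0.0)
```

A call such as `sasac_line(x, y, tau=0.2, low=0.1, high=0.5, rng=np.random.default_rng(0))` returns the refined line, its inlier mask, and the rate that triggered termination. Step 5) is left out of this sketch.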

B. SASAC for single model fitting
The performance of SASAC is evaluated and compared to three other methods by conducting a Monte Carlo test. A dataset with 70% outliers is tested for 1,000 runs with each algorithm. The results in Table I show that SASAC has a much higher line similarity rate than the other algorithms, and that its standard deviation (STD) is much smaller than theirs. These results demonstrate that SASAC can achieve precise model fitting with high reliability. The test results also show that the σ rate is highly correlated with the line similarity rate, indicating that it is a reliable indicator of the quality of model fitting.
The same test has been conducted for quadratic function model fitting, which needs three points for each model hypothesis. The similarity rate for this function is the ratio of the number of common inliers of the real model and the final model to the number of inliers of the real model. The dataset for quadratic function model fitting has 8,000 points in total, and the outlier ratio is about 77%. Figure 3 shows the test results of quadratic function model fitting for ten runs. The green curves are the initial models estimated by PURSAC, and the red ones are estimated by SASAC based on these initial models. The green curves in the figure are diverse and far from the true model, indicating that models estimated by RANSAC with a limited number of iterations are generally neither precise nor reliable. The red curves are much more consistent and close to the real model, represented by the black curve. The test results show that SASAC can save computation cost by reducing the number of iterations needed for RANSAC and its variants to get an initial model fitting, and at the same time improve the quality of model fitting by applying the strategies for optimal model estimation.
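The similarity rate for the quadratic case, as defined above, can be computed as in the following sketch. The function names are hypothetical, polynomial coefficients are given highest degree first, and the vertical residual is assumed as the point-to-curve distance.

```python
import numpy as np

def inlier_mask(x, y, coeffs, tau):
    """Points whose vertical distance to the polynomial is within tau."""
    return np.abs(y - np.polyval(coeffs, x)) <= tau

def similarity_rate(x, y, true_coeffs, est_coeffs, tau):
    """Common inliers of the real and the estimated model, divided by the
    number of inliers of the real model."""
    true_in = inlier_mask(x, y, true_coeffs, tau)
    est_in = inlier_mask(x, y, est_coeffs, tau)
    n_true = np.count_nonzero(true_in)
    if n_true == 0:
        return 0.0
    return np.count_nonzero(true_in & est_in) / n_true
```

A value of 1 means the estimated curve recovers every inlier of the real model; a value of 0 means the two inlier sets do not overlap.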
The key idea behind SASAC is to quantitatively measure the quality of a model fitting and to optimize the model searching procedure with this measurement. SASAC accounts for both the outliers and the measurement noise that naturally exist in real datasets, and uses their statistical properties to evaluate the model fitting result and to accelerate the process. It is worth mentioning that the principle of SASAC is based on the statistical analysis of samples for model fitting, and it works very well for large datasets.

IV. SASAC FOR MULTI MODEL FITTING
Finding all the models in a dataset that contains multiple models is a challenging task for RANSAC and its variants. The last step in the SASAC algorithm is designed to conduct multi-model search. This section introduces how SASAC works for multi-model fitting and applies it to both line fitting and quadratic function model fitting.
For a dataset with multiple models, each model should have a certain number of inliers. As the inliers of the other models in the dataset appear as outliers or noise to the current model, removing them should not affect the current model estimation. Based on this fact, applying SASAC to the dataset will find one of the models. The model with the largest number of inliers is the most likely to be found first. Then all the inliers of that model are removed from the dataset and the search for the other models is conducted. This process continues until all the models have been found.
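The find-remove-repeat procedure can be outlined as below. The single-model step is passed in as a callable so that any fitter can be used; `toy_fit_one` is only a stand-in restricted to horizontal lines y = c, included to keep the sketch self-contained, and `min_inliers` and `max_models` are hypothetical stopping parameters that the text does not specify.

```python
import numpy as np

def fit_models_sequentially(points, fit_one, min_inliers, max_models=10):
    """Find one model, remove its inliers, then search the remainder for the next."""
    remaining = np.asarray(points, dtype=float)
    models = []
    while len(remaining) >= min_inliers and len(models) < max_models:
        result = fit_one(remaining)
        if result is None:
            break
        model, mask = result
        if np.count_nonzero(mask) < min_inliers:
            break  # too little support: treat what is left as outliers
        models.append(model)
        remaining = remaining[~mask]  # inliers of a found model are removed
    return models, remaining

def toy_fit_one(points, tau=0.5):
    """Stand-in single-model fitter (horizontal lines y = c only), for illustration."""
    y = points[:, 1]
    if len(y) == 0:
        return None
    # each observed y value is a candidate level; keep the one with most support
    support, c = max((np.count_nonzero(np.abs(y - c) <= tau), c) for c in y)
    return float(c), np.abs(y - c) <= tau
```

In this outline the model with the most support is found first, and the loop ends when the remaining points can no longer support another model.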

A. Multi model line fitting
First let us apply SASAC to multi-model line fitting. A dataset with two line models is generated. Among 1,000 points in total, there are about 200 inliers for each line, which means the outlier rate for each line is 80%. Figure 4 shows the multi-model fitting results with RANSAC and SASAC for ten runs. The red and green points in the figure indicate the inliers of the two different line models respectively. The red lines estimated by SASAC are clearly much more consistent and closer to the real model than the blue lines estimated by RANSAC.
Table III gives the Monte Carlo test results of multi-line model fitting for 1,000 runs. The results clearly show that SASAC performs much better than RANSAC and its variants. The similarity rate of SASAC is very close to 1, which means a perfect model fitting. The STD of the SASAC data is also much smaller than that of RANSAC and its variants, indicating that the results of SASAC are much more precise, consistent and reliable. The multi-model fitting strategy in SASAC Step 5) works very well for both multi-line fitting and multi-curve fitting. As the experimental results in Tables I, II, III and IV show, SASAC can achieve multi-model fitting as precise and reliable as single model fitting for both lines and curves. This performance is achieved with a very small number of iterations, which means a very low computation cost that is critical for many real-time applications.
More comprehensive experiments are introduced in the next section, aiming to evaluate the performance of SASAC with different datasets and to investigate its characteristics under different parameter settings.

V. EXPERIMENTAL RESULTS
Several experiments have been conducted to evaluate the properties and performance of the proposed SASAC algorithm from different aspects. Subsection A introduces test results for a set of datasets with different outlier rates, and evaluates the performance of SASAC and the corresponding computation cost. Subsection B then analyses, with purposely designed experiments, the setting of the inlier threshold τ and its impact on the performance of SASAC. Subsection C focuses on tests with different multi-model datasets.

A. Computation cost for different outlier rate
Equation (1) gives the relation between the outlier ratio ε and the required number of iterations N for a given success rate p and number of samples in a subset s for RANSAC. The N calculated here is just the number of iterations required for RANSAC to have, with success rate p, a chance that all the s samples in a subset are inliers. However, due to measurement noise, a model generated from an all-inlier subset may be quite different from the real model. In addition, even a success rate p of 99.9% may still not be good enough to meet the requirements of many applications.
Table V lists the values of N calculated using Equation (1) with different s and ε, and with p fixed at 99.9%. The data in the table show that N increases drastically as the outlier ratio ε increases, especially when s is four or more. By applying SASAC, however, the relation between N and ε no longer follows Equation (1). N in each run is much less than the one calculated in Table V, and mainly depends on the quality of the initial model fitting result, which is decided by the initial estimation in SASAC Step 1). Table VI gives the results of a 100-run Monte Carlo test using the combination of SASAC and RANSAC for line fitting, with the outlier ratio ε ranging from 10% to 90%.
The results show that precise model estimation can always be achieved by SASAC, without any parameter adjustment over a large range of outlier ratios, using a much smaller number of iterations than the one in Table V. As the outlier ratio increases, the required similarity rate of the initial model estimated by RANSAC decreases, with a small increase in the number of iterations N. At the same time, the number of iterations n needed by SASAC for precise model fitting increases. This indicates that as the outlier ratio increases, SASAC can still achieve precise model estimation using an initial model, estimated by RANSAC or its variants, with a lower similarity rate φ. The cost is a slight increase in n and N, the numbers of iterations for SASAC and RANSAC.

B. Different inlier threshold
The inlier threshold τ selected in RANSAC, SASAC and other similar model fitting algorithms is a statistical parameter that reflects the distribution pattern of a model's inliers, and is basically decided by the measurement noise. The experiment designed in this subsection aims to investigate how sensitive each algorithm's performance is to the selection of its inlier threshold.
A dataset with fixed sample distribution and outlier ratio is used in the experiment. The inliers follow a normal distribution with a known STD. As shown in Table VII, the performance of RANSAC and SASAC is examined by setting the inlier threshold to 1/4, 1/2, 1, 2, 4 and 8 times the inliers' STD. The results in the table are presented as the average/STD of a multi-run Monte Carlo test for line fitting. It can be seen in Table VII that when τ ≤ STD both N and n tend to be large, which means more computation cost, and the σ rate is relatively small. When τ ≥ STD, n and the σ rate tend to be insensitive to changes in τ; N and the required similarity rate φ of the initial model estimation tend to decrease as τ increases. This can be explained as follows: a large τ results in a large searching step in SASAC, so only a rougher initial model estimation is needed. However, if τ ≥ 8×STD, the similarity rate φ declines and a precise model estimation is no longer achieved.
The facts mentioned above reflect the statistical properties of the normal distribution caused by sampling noise, and the model searching strategy applied in the proposed SASAC algorithm. Considering both model fitting precision and the corresponding computation cost, the best selection of τ is STD ≤ τ ≤ 4×STD. In practical applications, sampling noise is usually measurable or can be estimated, so the inlier threshold can be properly selected.
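Under the normal noise assumption used here, the fraction of a model's true inliers that fall within a threshold of k×STD follows directly from the error function. The short sketch below (the function name is hypothetical) evaluates it for the threshold multiples tested in this subsection, which may help in reading the trends described above: small multiples capture only a minority of the inliers, while multiples of 2 and above capture nearly all of them.

```python
import math

def inlier_fraction(k):
    """Fraction of normally distributed residuals within k standard deviations."""
    return math.erf(k / math.sqrt(2.0))

for k in (0.25, 0.5, 1, 2, 4, 8):
    print(f"tau = {k} x STD: {inlier_fraction(k):.4f}")
```

This only describes the inlier side of the distribution; the effect of outliers inside the band, which grows with τ, is not modelled.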

C. Different multi model dataset
A set of multi-model datasets with different outlier ratios is generated to evaluate the performance of SASAC for multi-model fitting. Table VIII gives the test results for these datasets. SASAC is combined with RANSAC, and for all the datasets τ is set to around 2×STD. The results are averaged over 100 runs and presented as average/STD. Similar to the results for single model datasets, SASAC can obtain very precise and reliable model estimates with a small number of iterations.

VI. CONCLUSION
This paper introduces a precise and efficient model fitting algorithm named SASAC, which is based on statistical analysis for sample consensus. As samples with noise follow a normal distribution in most cases, an indicator named the σ rate is defined. It is able to check the quality of a model fitting during the model searching process. As the core of SASAC, a search scheme based on the σ rate and iterative least squares is proposed for optimal model estimation. SASAC can use a rough model estimate from any algorithm to find a precise model fitting. It works very well over the whole range of outlier ratios. It is especially advantageous for datasets with a large outlier ratio, in that the increase in the number of iterations is much smaller than that needed by other model fitting algorithms. This means that precise and timely model estimation can always be achieved with SASAC, no matter how heavily the data is contaminated. As the last step in SASAC, a general approach is proposed to handle the multi-model situation, which can provide reliable estimates of all the models. SASAC can be combined with RANSAC and its variants to dramatically improve their model fitting performance. Extensive experimental results for line fitting and curve fitting demonstrate that SASAC can significantly improve model fitting precision and reliability, and reduce computation cost at the same time. SASAC has an indicator of the quality of a model fitting result, which helps to avoid wrongly estimated models. Further investigation will be conducted to apply SASAC to feature-based visual odometry. It is expected to improve the accuracy and robustness of outlier removal, resulting in a more precise and efficient model fitting solution.

Figure 2. Inliers' normal distribution and outliers as white noise

It is found that only the noisy inliers of a sample consensus will follow a normal distribution. This property is used to define a σ rate with Equation (3), and to develop the iterative model fitting algorithm SASAC for efficient, precise and reliable model estimation.

σ rate =   − 2 /2

the distance between the two endpoints of the real line model. The item ∑  F STU is the sum of the vertical distances between corresponding endpoints of the two lines. The line similarity rate φ ranges from 1 (exactly the same) to 0 (totally different). As long as the real line model is not a vertical line, Equation (4) can properly indicate the similarity of two lines.

Figure 5 shows the results of ten runs of multi-model curve fitting with RANSAC and SASAC. The two black curves in the figure are the real models; green and red curves indicate the models estimated by MLESAC and SASAC respectively. The red curves are clearly much more consistent and closer to the real models than the other estimated curves.

Figure 5. Multi curve fitting with SASAC

Table I shows line fitting test results for SASAC combined with RANSAC, MLESAC and PURSAC respectively. The σ rate in Table I is calculated by Equation (3), and the similarity of the line fitting to the real model is calculated by Equation (4).

TABLE I. MONTE CARLO TEST RESULTS FOR LINE FITTING

Table II gives the Monte Carlo test results for quadratic function model fitting. The test results show that the strategies proposed in SASAC work very well not only for line fitting but also for quadratic function model fitting. It can be seen that SASAC shows more advantage in curve fitting than in line fitting, in terms of efficiency. This indicates that SASAC can provide more benefit in complicated model fitting. Its performance could be further improved if a better strategy for global optimal model estimation could be applied.

TABLE II. MONTE CARLO TEST RESULTS FOR CURVE FITTING

TABLE III. MONTE CARLO TEST RESULTS FOR MULTI-LINE FITTING

The dataset contains 8,000 points and the outlier ratio for each model is about 80%. SASAC with the multi-model searching function is applied for model fitting. Table IV gives the Monte Carlo test results of multi-curve model fitting. Both the σ rate and the similarity rate φ for SASAC are larger and more consistent than those of RANSAC.

TABLE IV. MONTE CARLO TEST RESULTS FOR MULTI-CURVE FITTING

TABLE V
Table VI shows that SASAC can provide precise and reliable model estimation for the whole range of outlier ratios, and that the number of iterations required for SASAC to achieve this is much smaller than that required by other model fitting algorithms. Compared with RANSAC and other previously proposed algorithms, SASAC can achieve much better model estimation with a much smaller number of iterations. SASAC offers a large improvement in the precision, reliability and efficiency of model estimation, which are critical for most applications.

TABLE VI. MONTE CARLO TEST FOR DIFFERENT OUTLIER RATIO

TABLE VII. TEST RESULTS FOR DIFFERENT INLIER THRESHOLD

66/0.01 1.00/0.00 4xSTD 10.50/2.19 0.85/0.15
The results in Table VII indicate that the similarity rate φ for both RANSAC and SASAC is not very sensitive to the selection of the inlier threshold τ. Both can reach a reasonably high similarity rate over a large range of τ. For SASAC, φ declines only when τ ≤ STD/4 or τ ≥ 8×STD.

TABLE VIII. TEST RESULTS FOR DIFFERENT MULTI MODEL DATASET