Duplicate Elimination


Latest revision as of 09:10, 28 April 2023

Description

SQL does not eliminate duplicates implicitly. A table accepts duplicate values in any column that is not part of a candidate key, and accepts fully duplicate rows when no key has been specified at all. To eliminate duplicate records from a query result, the user has to use the DISTINCT keyword in the query.

Databases, therefore, can have duplicate entries. The problem deals with identifying and removing duplicates from a database.
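
The behaviour described above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table name visits, its columns, and the sample rows are hypothetical and chosen only for illustration; the table is declared without any key, so the database accepts the repeated row, and only the query with DISTINCT collapses it.

```python
import sqlite3


def distinct_demo():
    """Return (all_rows, distinct_rows) read from a small keyless table."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical table with no key declared, so duplicate rows are accepted.
        conn.execute("CREATE TABLE visits (name TEXT, city TEXT)")
        conn.executemany(
            "INSERT INTO visits VALUES (?, ?)",
            [("Ada", "London"), ("Ada", "London"), ("Bob", "Paris")],
        )
        # A plain SELECT returns every stored row, duplicates included.
        all_rows = conn.execute(
            "SELECT name, city FROM visits ORDER BY name"
        ).fetchall()
        # SELECT DISTINCT returns each (name, city) combination once.
        distinct_rows = conn.execute(
            "SELECT DISTINCT name, city FROM visits ORDER BY name"
        ).fetchall()
        return all_rows, distinct_rows
    finally:
        conn.close()
```

Note that DISTINCT only affects the query result; the duplicate row remains stored in the table.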

Parameters

$n$: number of records

Table of Algorithms

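{| class="wikitable sortable" style="text-align:center;" width="100%"
! Name !! Year !! Time !! Space !! Approximation Factor !! Model !! Reference
|-
| [[Sorting based (Merge Sort) (Duplicate Elimination Duplicate Elimination)|Sorting based (Merge Sort)]] || 1964 || $O(n \log n)$ || $O(n)$ || Exact || Deterministic ||
|-
| [[Sorting based (Merge Sort) + real-time elimination (Duplicate Elimination Duplicate Elimination)|Sorting based (Merge Sort) + real-time elimination]] || 1964 || $O(n \log n)$ ||  || Exact || Deterministic ||
|-
| [[BST Algorithm (Duplicate Elimination Duplicate Elimination)|BST Algorithm]] || 1999 || $O(n \log n)$ || $O(\log n)$ || Exact || Deterministic ||
|-
| [[Priority Queue Algorithm (Duplicate Elimination Duplicate Elimination)|Priority Queue Algorithm]] || 1976 || $O(n^{2})$ || $O(n)$ || Exact || Deterministic ||
|-
| [[Sorted Neighborhood Algorithm (SNA) (Duplicate Elimination Duplicate Elimination)|Sorted Neighborhood Algorithm (SNA)]] || 1998 || $O(n^{2})$ || $O(n)$ || Exact || Deterministic || [https://link.springer.com/article/10.1023/A:1009761603038 Time]
|-
| [[Duplicate Elimination Sorted Neighborhood Algorithm (DE-SNA) (Duplicate Elimination Duplicate Elimination)|Duplicate Elimination Sorted Neighborhood Algorithm (DE-SNA)]] || 2002 || $O(n \log n)$ ||  || Exact || Deterministic ||
|-
| [[Adaptive Duplicate Detection Algorithm (ADD) (Duplicate Elimination Duplicate Elimination)|Adaptive Duplicate Detection Algorithm (ADD)]] || 2003 || $O(n^{3})$ || $O(1)$ || Exact || Deterministic || [https://dl.acm.org/doi/10.1145/956750.956759 Time]
|-
|}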
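
As a rough illustration of the sorting-based entries in the table, the sketch below sorts the records and then keeps a record only when it differs from the previously kept one. It is a simplified sketch under stated assumptions, not the method of any cited paper: the function name dedup_by_sorting is hypothetical, records are assumed to be mutually comparable values such as tuples, and Python's built-in sorted (a merge-sort-based hybrid) stands in for a plain merge sort. The sort dominates the cost at $O(n \log n)$ time with $O(n)$ extra space.

```python
def dedup_by_sorting(records):
    """Return the distinct records in sorted order.

    Sorting places equal records next to each other, so one linear
    pass that compares each record with the last kept record is
    enough to drop the duplicates.
    """
    ordered = sorted(records)  # O(n log n) time, O(n) extra space
    unique = []
    for rec in ordered:
        # Keep the record only if it differs from the last one kept.
        if not unique or rec != unique[-1]:
            unique.append(rec)
    return unique
```

For example, dedup_by_sorting([("Ada", "London"), ("Bob", "Paris"), ("Ada", "London")]) keeps one copy of each tuple. The original input order is not preserved, which is the usual trade-off of the sorting approach.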

Time Complexity Graph

Duplicate Elimination - Time.png