Lossless join decomposition

In database design, a lossless join decomposition is a decomposition of a relation r {\displaystyle r} into relations r 1 , r 2 {\displaystyle r_{1},r_{2}} such that a natural join of the two smaller relations yields back the original relation. This is central in removing redundancy safely from databases while preserving the original data.[1] Lossless join can also be called non-additive.[2]

Definition

A relation r {\displaystyle r} on schema R {\displaystyle R} decomposes losslessly onto schemas R 1 {\displaystyle R_{1}} and R 2 {\displaystyle R_{2}} if π R 1 ( r ) π R 2 ( r ) = r {\displaystyle \pi _{R_{1}}(r)\bowtie \pi _{R_{2}}(r)=r} , that is r {\displaystyle r} is the natural join of its projections onto the smaller schemas. A pair ( R 1 , R 2 ) {\displaystyle (R_{1},R_{2})} is a lossless-join decomposition of R {\displaystyle R} or said to have a lossless join with respect to a set of functional dependencies F {\displaystyle F} if any relation r ( R ) {\displaystyle r(R)} that satisfies F {\displaystyle F} decomposes losslessly onto R 1 {\displaystyle R_{1}} and R 2 {\displaystyle R_{2}} .[3]

Decompositions into more than two schemas can be defined in the same way.[4]

Criteria

A decomposition R = R 1 R 2 {\displaystyle R=R_{1}\cup R_{2}} has a lossless join with respect to F {\displaystyle F} if and only if the closure of R 1 R 2 {\displaystyle R_{1}\cap R_{2}} includes R 1 R 2 {\displaystyle R_{1}\setminus R_{2}} or R 2 R 1 {\displaystyle R_{2}\setminus R_{1}} . In other words, one of the following must hold:[4]

  • ( R 1 R 2 ) ( R 1 R 2 ) F + {\displaystyle (R_{1}\cap R_{2})\to (R_{1}\setminus R_{2})\in F^{+}}
  • ( R 1 R 2 ) ( R 2 R 1 ) F + {\displaystyle (R_{1}\cap R_{2})\to (R_{2}\setminus R_{1})\in F^{+}}

Criteria for multiple sub-schemas

Multiple sub-schemas R 1 , R 2 , . . . , R n {\displaystyle R_{1},R_{2},...,R_{n}} have a lossless join if there is some way in which we can repeatedly perform lossless joins until all the schemas have been joined into a single schema. Once we have a new sub-schema made from a lossless join, we are not allowed to use any of its isolated sub-schema to join with any of the other schemas. For example, if we can do a lossless join on a pair of schemas R i , R j {\displaystyle R_{i},R_{j}} to form a new schema R i , j {\displaystyle R_{i,j}} , we use this new schema (rather than R i {\displaystyle R_{i}} or R j {\displaystyle R_{j}} ) to form a lossless join with another schema R k {\displaystyle R_{k}} (which may already be joined (e.g., R k , l {\displaystyle R_{k,l}} )).[vague]

Example

  • Let R = { A , B , C , D } {\displaystyle R=\{A,B,C,D\}} be the relation schema, with attributes A, B, C and D.
  • Let F = { A B C } {\displaystyle F=\{A\rightarrow BC\}} be the set of functional dependencies.
  • Decomposition into R 1 = { A , B , C } {\displaystyle R_{1}=\{A,B,C\}} and R 2 = { A , D } {\displaystyle R_{2}=\{A,D\}} is lossless under F because R 1 R 2 = A {\displaystyle R_{1}\cap R_{2}=A} and we have a functional dependency A B C {\displaystyle A\rightarrow BC} . In other words, we have proven that ( R 1 R 2 R 1 R 2 ) F + {\displaystyle (R_{1}\cap R_{2}\rightarrow R_{1}\setminus R_{2})\in F^{+}} .[5][6]

References

  1. ^ Pohler, K (2015). "Lossless-Join Decomposition: applications in quantitative computing metrics". International Journal of Applied Computer Science. 21 (4): 190–212.
  2. ^ Elmasri, Ramez (2016). Fundamentals of database systems (Seventh ed.). Hoboken, NJ: Pearson. p. 461. ISBN 978-0133970777.
  3. ^ Maier, David (1983). The theory of relational databases (PDF). Computer Science Press. p. 101. ISBN 0-914894-42-0. Retrieved 16 August 2024.
  4. ^ a b Ullman, Jeffrey D. (1988). Principles of Database and Knowledge-base Systems (PDF) (1 ed.). Computer Science Press. p. 397. ISBN 0-88175188-X. Retrieved 16 August 2024.
  5. ^ "Lossless-Join Decomposition". Cs.sfu.ca. Retrieved 2016-02-07.
  6. ^ "www.data-e-education.com - Lossless Join Decomposition". Archived from the original on 2014-02-21. Retrieved 2014-02-12.