Data Cleaning Model for XML Datasets using Conditional Dependencies
Data cleaning is an essential phase for enhancing overall data quality and has been applied for decades across different data models, with most work targeting relational datasets, the most dominant data model. However, alongside the relational model, the XML data model is among the most commonly used for storing, retrieving, and querying valuable data. In this paper, we introduce a model for detecting and repairing XML data inconsistencies using a set of conditional dependencies. Inconsistencies are detected by joining the existing data source with a set of pattern tableaux that encode the conditional dependencies; the violating values are then updated to match the proper patterns using a set of SQL statements. This research is the final phase of a cleaning model for XML datasets that first maps the XML document to a set of related tables, then discovers a set of conditional dependencies (functional and inclusion), and finally applies the algorithms presented here as a closing step of quality enhancement.
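The detect-then-repair idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes the XML document has already been mapped to a single relational table, and the table `customer`, the pattern tableau `tableau`, their columns, the sample rows, and the `_` wildcard convention are all hypothetical names chosen for illustration. The sketch handles only a conditional functional dependency with constant right-hand-side patterns, using Python's standard `sqlite3` module.

```python
import sqlite3

# Hypothetical data: a table assumed to come from an XML-to-relational mapping,
# plus a pattern tableau for the dependency (cc, ac) -> city.
# '_' is used here as an assumed wildcard meaning "any value".
SETUP_SQL = """
CREATE TABLE customer (id INTEGER PRIMARY KEY, cc TEXT, ac TEXT, city TEXT);
INSERT INTO customer VALUES
    (1, '44', '131', 'Edinburgh'),
    (2, '44', '131', 'London'),
    (3, '01', '908', 'Murray Hill'),
    (4, '01', '212', 'Boston');
CREATE TABLE tableau (cc TEXT, ac TEXT, city TEXT);
INSERT INTO tableau VALUES
    ('44', '131', 'Edinburgh'),
    ('01', '212', 'New York');
"""

# Detection: join the data with the tableau; a row violates a pattern when its
# left-hand side matches but its right-hand side differs from the constant.
DETECT_SQL = """
SELECT DISTINCT c.id
FROM customer AS c
JOIN tableau AS t
  ON (t.cc = '_' OR t.cc = c.cc)
 AND (t.ac = '_' OR t.ac = c.ac)
WHERE t.city <> '_' AND c.city <> t.city
ORDER BY c.id
"""

# Repair: overwrite the right-hand-side value with the constant from the
# matching pattern. LIMIT 1 guards against overlapping patterns.
REPAIR_SQL = """
UPDATE customer
SET city = (SELECT t.city FROM tableau AS t
            WHERE (t.cc = '_' OR t.cc = customer.cc)
              AND (t.ac = '_' OR t.ac = customer.ac)
              AND t.city <> '_'
            LIMIT 1)
WHERE EXISTS (SELECT 1 FROM tableau AS t
              WHERE (t.cc = '_' OR t.cc = customer.cc)
                AND (t.ac = '_' OR t.ac = customer.ac)
                AND t.city <> '_'
                AND t.city <> customer.city)
"""


def build_demo_db():
    """Create an in-memory database holding the hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP_SQL)
    return conn


def detect_violations(conn):
    """Return the ids of rows that violate at least one tableau pattern."""
    return [row[0] for row in conn.execute(DETECT_SQL)]


def repair_violations(conn):
    """Update violating rows so they match the tableau patterns."""
    conn.execute(REPAIR_SQL)
    conn.commit()


if __name__ == "__main__":
    db = build_demo_db()
    print("violations before repair:", detect_violations(db))
    repair_violations(db)
    print("violations after repair:", detect_violations(db))
```

Overwriting the right-hand-side value is only one possible repair strategy; a real cleaning model would also need to handle variable patterns, conditional inclusion dependencies, and the mapping of repaired rows back to the XML document.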