
Data cleaning is an essential phase for enhancing overall data quality. It has been applied for decades across different data models, with most work handling relational datasets, the dominant data model. However, alongside the relational model, the XML data model is one of the most commonly used models for storing, retrieving, and querying valuable data. In this paper, we introduce a model for detecting and repairing XML data inconsistencies using a set of conditional dependencies. Inconsistencies are detected by joining the existing data source with a set of pattern tableaux that encode the conditional dependencies; the violating values are then updated to match the proper patterns using a set of SQL statements. This research constitutes the final phase of a cleaning model introduced for XML datasets, which first maps the XML document to a set of related tables, then discovers a set of conditional dependencies (functional and inclusion), and finally applies the algorithms presented here as the closing step of quality enhancement.
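
The join-then-update idea can be sketched for a single constant conditional functional dependency (CFD). This is a minimal illustration under stated assumptions, not the authors' implementation: the XML document, the flat `customer` table it is mapped to, the dependency [cc, ac] -> [city], the `cfd_tableau` table, and the `_` wildcard convention are all hypothetical, and the tableau is assumed to be consistent (no tuple matches two patterns with different `city` constants). Detection joins the data with the tableau; repair is one `UPDATE` that copies the pattern constant into each violating tuple.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML input: each <customer> element becomes one table row.
XML_DOC = """
<customers>
  <customer><cc>44</cc><ac>131</ac><city>EDI</city></customer>
  <customer><cc>44</cc><ac>131</ac><city>NYC</city></customer>
  <customer><cc>01</cc><ac>908</ac><city>MH</city></customer>
  <customer><cc>01</cc><ac>908</ac><city>NYC</city></customer>
</customers>
""".strip()

# Pattern tableau for the hypothetical constant CFD [cc, ac] -> [city].
# A '_' in an LHS column would act as a wildcard matching any value.
TABLEAU = [("44", "131", "EDI"), ("01", "908", "MH")]

# A tuple matches a pattern row when each LHS attribute equals the
# pattern constant or the pattern holds the wildcard.
MATCH = "(p.cc = '_' OR p.cc = customer.cc) AND (p.ac = '_' OR p.ac = customer.ac)"


def load(conn: sqlite3.Connection) -> None:
    """Map the XML document to a flat table and store the tableau."""
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, cc TEXT, ac TEXT, city TEXT)"
    )
    for node in ET.fromstring(XML_DOC).iter("customer"):
        conn.execute(
            "INSERT INTO customer (cc, ac, city) VALUES (?, ?, ?)",
            (node.findtext("cc"), node.findtext("ac"), node.findtext("city")),
        )
    conn.execute("CREATE TABLE cfd_tableau (cc TEXT, ac TEXT, city TEXT)")
    conn.executemany("INSERT INTO cfd_tableau VALUES (?, ?, ?)", TABLEAU)


def detect(conn: sqlite3.Connection) -> list:
    """Join the data with the tableau; return (id, current city, expected city)."""
    return conn.execute(
        "SELECT customer.id, customer.city, p.city "
        f"FROM customer JOIN cfd_tableau p ON {MATCH} "
        "WHERE customer.city <> p.city ORDER BY customer.id"
    ).fetchall()


def repair(conn: sqlite3.Connection) -> None:
    """Overwrite each violating RHS value with the matching pattern's constant."""
    conn.execute(
        "UPDATE customer SET city = "
        f"(SELECT p.city FROM cfd_tableau p WHERE {MATCH} LIMIT 1) "
        f"WHERE EXISTS (SELECT 1 FROM cfd_tableau p WHERE {MATCH} "
        "AND customer.city <> p.city)"
    )


conn = sqlite3.connect(":memory:")
load(conn)
print(detect(conn))  # [(2, 'NYC', 'EDI'), (4, 'NYC', 'MH')]
repair(conn)
print(detect(conn))  # []
```

Only constant patterns are handled here. A CFD with a wildcard on the right-hand side is violated by pairs of tuples rather than by a single tuple and a constant, so detecting it would need a self-join on the left-hand-side attributes instead of this tuple-versus-pattern join.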

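Conditional inclusion dependencies (CINDs) can be checked in the same join-based style, using an anti-join (`NOT EXISTS`) against the referenced table. In this sketch the `orders` and `item` tables, the dependency orders[title; type] ⊆ item[title; category], and the `cind_tableau` rows are hypothetical. The abstract does not spell out how CIND violations are repaired, so `repair` here assumes one option, inserting the missing referenced tuples; modifying the dependent tuple instead would be an equally valid choice.

```python
import sqlite3

# Hypothetical relations produced by the XML-to-table mapping step.
ORDERS = [
    (1, "Dune", "book"),
    (2, "Kind of Blue", "cd"),
    (3, "Emma", "book"),
    (4, "Pens", "stationery"),  # no pattern row applies, so never checked
    (5, "Abbey Road", "cd"),
]
ITEMS = [("Dune", "book"), ("Kind of Blue", "music"), ("Emma", "music")]

# Pattern tableau for the hypothetical CIND orders[title; type] ⊆ item[title; category].
# Row (t, c): every order of type t needs an item with the same title and category c.
TABLEAU = [("book", "book"), ("cd", "music")]

# Anti-join: orders covered by a pattern row that have no matching item.
VIOLATIONS = """
    SELECT o.id AS id, o.title AS title, p.category AS category
    FROM orders o JOIN cind_tableau p ON o.type = p.type
    WHERE NOT EXISTS (
        SELECT 1 FROM item i WHERE i.title = o.title AND i.category = p.category)
"""


def load(conn: sqlite3.Connection) -> None:
    """Create the tables and fill them with the sample rows."""
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, title TEXT, type TEXT)")
    conn.execute("CREATE TABLE item (title TEXT, category TEXT)")
    conn.execute("CREATE TABLE cind_tableau (type TEXT, category TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ORDERS)
    conn.executemany("INSERT INTO item VALUES (?, ?)", ITEMS)
    conn.executemany("INSERT INTO cind_tableau VALUES (?, ?)", TABLEAU)


def detect(conn: sqlite3.Connection) -> list:
    """Return (order id, title, required category) for each violation."""
    return conn.execute(VIOLATIONS + " ORDER BY id").fetchall()


def repair(conn: sqlite3.Connection) -> None:
    """Assumed repair strategy: insert the missing referenced item tuples."""
    conn.execute(
        "INSERT INTO item (title, category) "
        f"SELECT DISTINCT title, category FROM ({VIOLATIONS}) AS v"
    )


conn = sqlite3.connect(":memory:")
load(conn)
print(detect(conn))  # [(3, 'Emma', 'book'), (5, 'Abbey Road', 'music')]
repair(conn)
print(detect(conn))  # []
```

Order 3 shows why the condition matters: "Emma" already exists in `item`, but under the wrong category, so a plain (unconditional) inclusion check on `title` alone would miss it.
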


