
An Unsupervised Header Independent Approach Towards Subject Column Detection in Tables


Affiliations
1 Department of Computer Science and Engineering, Sri Sai Ram Engineering College, India
     



Subject columns are the columns that help infer the correct subject matter of a table. The main challenge is detecting the appropriate subject columns in tables that contain more than one. Existing approaches are restricted to identifying only one subject column even in tables with several, which makes it impossible to infer the table's correct subject matter. Existing approaches also require table information such as table headers, additional evidence about the table from the surrounding web pages, and prior training on a labeled set of tables. To address these issues, this paper proposes a simple, header-independent, semantic-based Concept-Voting Subject Column Detection (CVSCD) algorithm. The proposed algorithm identifies the possible subject columns in a table with more than one subject column, which provides a way to infer the table's correct subject matter. Moreover, CVSCD is unsupervised and works for tables without any table information such as captions or headers. Experimental results show that our approach achieves better accuracy than existing approaches on a corpus of tables extracted from the web.
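The abstract does not spell out how CVSCD works, so the following is only an illustrative sketch of what header-independent concept voting over cell values might look like, not the authors' algorithm. The toy taxonomy `TOY_CONCEPTS`, the function names, and the `min_support` threshold are all assumptions made for this example; a real system would presumably query a large concept knowledge base instead.

```python
from collections import Counter

# Hypothetical toy taxonomy standing in for a real knowledge base (assumption).
TOY_CONCEPTS = {
    "india": ["country"],
    "france": ["country"],
    "japan": ["country"],
    "new delhi": ["city"],
    "paris": ["city"],
    "tokyo": ["city"],
}


def column_vote(cells, concepts=TOY_CONCEPTS):
    """Let each cell vote for its concepts; return (top concept, support)."""
    votes = Counter()
    for cell in cells:
        for concept in concepts.get(cell.strip().lower(), []):
            votes[concept] += 1
    if not votes or not cells:
        return None, 0.0
    concept, count = votes.most_common(1)[0]
    # Support is the share of cells in the column that voted for the winner.
    return concept, count / len(cells)


def candidate_subject_columns(rows, min_support=0.6):
    """Return (column index, concept, support) for every column whose cells
    largely agree on one concept. No header row or training data is used."""
    result = []
    for index, cells in enumerate(zip(*rows)):
        concept, support = column_vote(cells)
        if concept is not None and support >= min_support:
            result.append((index, concept, support))
    return result


# A headerless table with two plausible subject columns and one numeric column.
rows = [
    ["India", "New Delhi", "1380"],
    ["France", "Paris", "67"],
    ["Japan", "Tokyo", "126"],
]
candidates = candidate_subject_columns(rows)
```

In this sketch the first two columns are both returned as candidates, while the numeric column receives no votes and is skipped, which mirrors the abstract's goal of reporting more than one subject column without relying on headers.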

Keywords

Concept-Voting Subject Column Detection (CVSCD), Subject Column, Subject Matter, Table Headers.



Authors

K. Karpaga Priyaa
Department of Computer Science and Engineering, Sri Sai Ram Engineering College, India
A. Meena Kabilan
Department of Computer Science and Engineering, Sri Sai Ram Engineering College, India
C. Saranya
Department of Computer Science and Engineering, Sri Sai Ram Engineering College, India

