Identifying Stylometric Characteristics of Domain Specific Texts Using Classification Algorithms: A Study of Library Science Articles Published in 2020

Mousumi Saha; Saptarshi Ghosh

doi:10.17821/srels/2023/v60i3/171027

Vol 60, No 3 (2023)
Pages: 157-165
Published: 2023-06-01

Affiliations
Department of Library and Information Science, University of North Bengal, Raja Rammohunpur – 734013, West Bengal, India

Academic writing plays an essential role in communicating the cognitive aspects of the human mind, and Natural Language Processing (NLP) tools make it possible to examine the linguistic knowledge it encodes. However, writing patterns and the linguistic characteristics that apply to them differ geographically. The primary purpose of this study is to understand the global writing patterns and linguistic diversity of research articles in the Library and Information Science (LIS) domain. The corpus was drawn from four Scopus-indexed, open-access library and information science journals, published in India and outside India, and covers articles published in 2020. Syntactic complexity in 147 text documents was measured using the Tool for the Automatic Analysis of Syntactic Sophistication and Complexity (TAASSC). The corpus was further examined using a Structural Equation Model (SEM) to determine the causal relationships among independent variables such as syntactic features and readability scores. The results show differences in the patterning of syntactic features at both the global and national levels. The study also shows how linguistic diversity is underplayed in research writing and helps to explain writing patterns through cross-country comparisons. Finally, the paper employs model-based reasoning to identify global and national latent variables.
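The abstract does not say which readability formula the study used. As a minimal sketch of what a "readability score" can look like, the example below assumes the widely used Flesch Reading Ease formula, 206.835 - 1.015 × (words / sentences) - 84.6 × (syllables / words). The function names are hypothetical, and the syllable counter is a rough vowel-group heuristic, not the method used by TAASSC or by the authors.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as runs of consecutive vowels (minimum 1).

    This is a crude heuristic chosen for illustration only.
    """
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease; higher values indicate easier text."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    simple = "The cat sat. The dog ran."
    dense = "Syntactic complexity influences readability considerably."
    print(round(flesch_reading_ease(simple), 2))
    print(round(flesch_reading_ease(dense), 2))
```

Under this formula, short sentences with short words score higher than long sentences with polysyllabic words, which is why scores of this kind are often examined alongside syntactic features.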

Keywords

Corpus Linguistics, Noun Phrase Complexity, Readability, Structural Equation Model, Syntactic Sophistication.

Journal of Information and Knowledge (Formerly SRELS Journal of Information Management)
