Bibliotechnyi visnyk | bv.nbuv.gov.ua
Kuznetsov O., Zaika V. - UDC code determination of new electronic receipts for the formation of an electronic library by means of software (2023)

Content (2023, Issue 3)

Kuznetsov Oleksandr, Zaika Victor

UDC code determination of new electronic receipts for the formation of an electronic library by means of software

Section: Libraries in the digital environment

Abstract: The purpose of the article is to propose a technique for validating the UDC index of electronic documents newly acquired by a library, and to demonstrate its use on five electronic documents on economic topics (UDC index 331) with the developed software tool "Text Analysis". Research methodology. A quantitative method of document content analysis is applied. To find documents (files) similar in content, the cosine similarity measure was used, and coefficients of the thematic direction were calculated for each document. Text files were vectorized, that is, represented as vectors in a multidimensional space. For this purpose, different word forms were reduced to a single lexeme, and the number (or frequency) of occurrences of each lexeme in each document was counted. Lexemes are interpreted as coordinates, and the frequency of use as the value of the corresponding coordinate. After vectorization of the texts, the apparatus of analytic geometry was applied, and a numerical value, the coefficient of the thematic direction, was assigned to the topic of each text document. Scientific novelty. For the first time, methods of content analysis, namely quantitative analysis, were used to assess the reliability of a document's UDC index, and a software tool was created that helps the systematizer confirm or refute the UDC index of a doubtful document without reading it. Conclusions. The authors' software tool and the proposed UDC correction technique can be used when creating repositories of electronic texts and will help improve the quality of information search and content selection. Once a certain number of electronic documents has been accumulated, the developed methodology allows the UDC of a new text (accession) to be determined automatically from the coefficient of the thematic direction between the new text and the corresponding corpus, when that coefficient is close to one.
The vector of coefficients of the thematic direction of the studied texts, sorted in ascending order of the coefficients, made it possible to identify a cluster, that is, a group of texts with the same content. A reliable criterion is the coefficient of the variable in the linear approximation of this distribution; the ideal case is a horizontal shelf on the graph of the distribution of the coefficients of the thematic direction, where the coefficient is equal to one. The number of thematic areas is determined by the number of clusters.
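
The vectorization and cosine-measure steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' "Text Analysis" tool. It assumes that the coefficient of the thematic direction is the cosine similarity between a document's lexeme-frequency vector and the summed frequency vector of a reference corpus, and it uses simple lowercasing in place of real lemmatization. The function names and the sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Map a text to a sparse frequency vector {token: count}.

    Assumption: lowercased word tokens stand in for lexemes; the article
    reduces word forms to lexemes, which this sketch does not do.
    """
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse frequency vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction
    return dot / (norm_a * norm_b)


def thematic_coefficient(new_text, corpus_texts):
    """Assumed definition: cosine between a text and the summed corpus vector."""
    corpus_vector = Counter()
    for text in corpus_texts:
        corpus_vector.update(vectorize(text))
    return cosine_similarity(vectorize(new_text), corpus_vector)


# Hypothetical miniature corpus on a labour-economics topic.
corpus = [
    "labour market wages employment",
    "employment wages labour productivity",
]
on_topic = thematic_coefficient("wages and employment in the labour market", corpus)
off_topic = thematic_coefficient("crystal lattice diffraction patterns", corpus)
print(on_topic, off_topic)
```

In this sketch a text that shares vocabulary with the corpus scores closer to one, while a text with no shared tokens scores zero. A practical tool would need proper lemmatization and stop-word handling before the coefficients could be compared across documents.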

Keywords: computer text analysis systems, content analysis, cosine similarity measure, UDC index, cluster, electronic library, frequency array, coefficient of thematic direction, software packages for content analysis.



Cite:
Kuznetsov Oleksandr (2023). UDC code determination of new electronic receipts for the formation of an electronic library by means of software. Bibliotechnyi visnyk, (3), 3-16. (In Ukrainian). doi: https://doi.org/10.15407/bv2023.03.003

