-
Song, R.; Luo, Z.; Nie, J.-Y.; Yu, Y.; Hon, H.-W.: Identification of ambiguous queries in web search (2009)
0.05
0.05183304 = product of:
0.20733216 = sum of:
0.20733216 = weight(_text_:java in 3441) [ClassicSimilarity], result of:
0.20733216 = score(doc=3441,freq=2.0), product of:
0.4437873 = queryWeight, product of:
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.06297082 = queryNorm
0.46718815 = fieldWeight in 3441, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.046875 = fieldNorm(doc=3441)
0.25 = coord(1/4)
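The explain block above (and the analogous blocks below) follows Lucene's ClassicSimilarity TF-IDF scoring. As a minimal illustration, the sketch below re-derives the listed figures from the reported freq, docFreq, maxDocs, queryNorm, fieldNorm and coord values; it assumes the standard ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))) and is an annotation added for clarity, not part of the indexed record.

```java
// Minimal sketch re-deriving the scores in the explain block above, assuming
// Lucene's ClassicSimilarity (TF-IDF). All inputs are copied from the block.
public class ClassicSimilaritySketch {
    public static void main(String[] args) {
        double freq = 2.0;                                   // termFreq of "java" in doc 3441
        double tf = Math.sqrt(freq);                         // 1.4142135
        double idf = 1 + Math.log(44421.0 / (104 + 1));      // 7.0475073 = idf(docFreq=104, maxDocs=44421)
        double queryNorm = 0.06297082;
        double fieldNorm = 0.046875;                         // length normalisation for this field
        double coord = 1.0 / 4.0;                            // 1 of 4 query clauses matched

        double queryWeight = idf * queryNorm;                // ~0.4437873
        double fieldWeight = tf * idf * fieldNorm;           // ~0.46718815
        double termScore = queryWeight * fieldWeight;        // ~0.20733216
        double finalScore = termScore * coord;               // ~0.05183304

        System.out.printf("queryWeight=%.7f fieldWeight=%.8f%n", queryWeight, fieldWeight);
        System.out.printf("termScore=%.8f finalScore=%.8f%n", termScore, finalScore);
    }
}
```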
- Abstract
- It is widely believed that many queries submitted to search engines are inherently ambiguous (e.g., java and apple). However, few studies have tried to classify queries based on ambiguity and to answer "what the proportion of ambiguous queries is". This paper deals with these issues. First, we clarify the definition of ambiguous queries by constructing the taxonomy of queries from being ambiguous to specific. Second, we ask human annotators to manually classify queries. From manually labeled results, we observe that query ambiguity is to some extent predictable. Third, we propose a supervised learning approach to automatically identify ambiguous queries. Experimental results show that we can correctly identify 87% of labeled queries with the approach. Finally, by using our approach, we estimate that about 16% of queries in a real search log are ambiguous.
-
Croft, W.B.; Metzler, D.; Strohman, T.: Search engines : information retrieval in practice (2010)
0.05
0.05183304 = product of:
0.20733216 = sum of:
0.20733216 = weight(_text_:java in 3605) [ClassicSimilarity], result of:
0.20733216 = score(doc=3605,freq=2.0), product of:
0.4437873 = queryWeight, product of:
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.06297082 = queryNorm
0.46718815 = fieldWeight in 3605, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.046875 = fieldNorm(doc=3605)
0.25 = coord(1/4)
- Abstract
- For introductory information retrieval courses at the undergraduate and graduate level in computer science, information science and computer engineering departments. Written by a leader in the field of information retrieval, Search Engines: Information Retrieval in Practice is designed to give undergraduate students the understanding and tools they need to evaluate, compare and modify search engines. Coverage of the underlying IR and mathematical models reinforces key concepts. The book's numerous programming exercises make extensive use of Galago, a Java-based open source search engine. SUPPLEMENTS / Extensive lecture slides (in PDF and PPT format) / Solutions to selected end-of-chapter problems (instructors only) / Test collections for exercises / Galago search engine
-
Tang, X.-B.; Wei, W.; Liu, G.-C.; Zhu, J.: ¬An inference model of medical insurance fraud detection : based on ontology and SWRL (2017)
0.05
0.05183304 = product of:
0.20733216 = sum of:
0.20733216 = weight(_text_:java in 4615) [ClassicSimilarity], result of:
0.20733216 = score(doc=4615,freq=2.0), product of:
0.4437873 = queryWeight, product of:
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.06297082 = queryNorm
0.46718815 = fieldWeight in 4615, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.046875 = fieldNorm(doc=4615)
0.25 = coord(1/4)
- Abstract
- Medical insurance fraud is common in many countries' medical insurance systems and represents a serious threat to the insurance funds and the benefits of patients. In this paper, we present an inference model of medical insurance fraud detection, based on a medical detection domain ontology that incorporates the knowledge base provided by the Medical Terminology, NKIMed, and Chinese Library Classification systems. Through analyzing the behaviors of irregular and fraudulent medical services, we defined the scope of the medical domain ontology relevant to the task and built the ontology about medical sciences and medical service behaviors. The ontology then utilizes Semantic Web Rule Language (SWRL) and Java Expert System Shell (JESS) to detect medical irregularities and mine implicit knowledge. The system can be used to improve the management of medical insurance risks.
-
XML in libraries (2002)
0.05
0.04563646 = product of:
0.09127292 = sum of:
0.0068447553 = weight(_text_:und in 4100) [ClassicSimilarity], result of:
0.0068447553 = score(doc=4100,freq=2.0), product of:
0.13966292 = queryWeight, product of:
2.217899 = idf(docFreq=13141, maxDocs=44421)
0.06297082 = queryNorm
0.049009107 = fieldWeight in 4100, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
2.217899 = idf(docFreq=13141, maxDocs=44421)
0.015625 = fieldNorm(doc=4100)
0.08442816 = weight(_text_:handled in 4100) [ClassicSimilarity], result of:
0.08442816 = score(doc=4100,freq=2.0), product of:
0.4905077 = queryWeight, product of:
7.7894444 = idf(docFreq=49, maxDocs=44421)
0.06297082 = queryNorm
0.17212403 = fieldWeight in 4100, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.7894444 = idf(docFreq=49, maxDocs=44421)
0.015625 = fieldNorm(doc=4100)
0.5 = coord(2/4)
- Content
- Combined review of: (1) The ABCs of XML: The Librarian's Guide to the eXtensible Markup Language. Norman Desmarais. Houston, TX: New Technology Press, 2000. 206 pp. $28.00. (ISBN: 0-9675942-0-0) and (2) Learning XML. Erik T. Ray. Sebastopol, CA: O'Reilly & Associates, 2003. 400 pp. $34.95. (ISBN: 0-596-00420-6)
- Footnote
- Review in: JASIST 55(2004) no.14, pp.1304-1305 (Z. Holbrooks):"The eXtensible Markup Language (XML) and its family of enabling technologies (XPath, XPointer, XLink, XSLT, et al.) were the new "new thing" only a couple of years ago. Happily, XML is now a W3C standard, and its enabling technologies are rapidly proliferating and maturing. Together, they are changing the way data is handled on the Web and how legacy data is accessed and leveraged in corporate archives, and they are offering the Semantic Web community a powerful toolset. Library and information professionals need a basic understanding of what XML is, and what its impacts will be on the library community as content vendors and publishers convert to the new standards. Norman Desmarais aims to provide librarians with an overview of XML and some potential library applications. The ABCs of XML contains the useful basic information that most general XML works cover. It is addressed to librarians, as evidenced by the occasional reference to periodical vendors, MARC, and OPACs. However, librarians without SGML, HTML, database, or programming experience may find the work daunting. The snippets of code - most incomplete and unaccompanied by screenshots to illustrate the result of the code's execution - obscure more often than they enlighten. A single code sample (p. 91, a book purchase order) is immediately recognizable and sensible. There are no figures, illustrations, or screenshots. Subsection headings are used conservatively. Readers are confronted with page after page of unbroken technical text, and occasionally oddly formatted text (in some of the code samples). The author concentrates on commercial products and projects. Library and agency initiatives - for example, the National Institutes of Health HL-7 and U.S. Department of Education's GEM project - are notable for their absence. The Library of Congress USMARC to SGML effort is discussed in chapter 1, which covers the relationship of XML to its parent SGML, the XML processor, and data type definitions, using MARC as its illustrative example. Chapter 3 addresses the stylesheet options for XML, including DSSSL, CSS, and XSL. The Document Style Semantics and Specification Language (DSSSL) was created for use with SGML, and pruned into DSSSL-Lite and further (DSSSL-online). Cascading Style Sheets (CSS) were created for use with HTML. Extensible Stylesheet Language (XSL) is a further revision (and extension) of DSSSL-online specifically for use with XML. Discussion of aural stylesheets and Synchronized Multimedia Integration Language (SMIL) rounds out the chapter.
-
Chen, H.; Chung, Y.-M.; Ramsey, M.; Yang, C.C.: ¬A smart itsy bitsy spider for the Web (1998)
0.04
0.043194205 = product of:
0.17277682 = sum of:
0.17277682 = weight(_text_:java in 1871) [ClassicSimilarity], result of:
0.17277682 = score(doc=1871,freq=2.0), product of:
0.4437873 = queryWeight, product of:
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.06297082 = queryNorm
0.38932347 = fieldWeight in 1871, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.0390625 = fieldNorm(doc=1871)
0.25 = coord(1/4)
- Abstract
- As part of the ongoing Illinois Digital Library Initiative project, this research proposes an intelligent agent approach to Web searching. In this experiment, we developed 2 Web personal spiders based on best first search and genetic algorithm techniques, respectively. These personal spiders can dynamically take a user's selected starting homepages and search for the most closely related homepages in the Web, based on the links and keyword indexing. A graphical, dynamic, Java-based interface was developed and is available for Web access. A system architecture for implementing such an agent-spider is presented, followed by detailed discussions of benchmark testing and user evaluation results. In benchmark testing, although the genetic algorithm spider did not outperform the best first search spider, we found both results to be comparable and complementary. In user evaluation, the genetic algorithm spider obtained a significantly higher recall value than the best first search spider. However, their precision values were not statistically different. The mutation process introduced in genetic algorithms allows users to find other potentially relevant homepages that cannot be explored via a conventional local search process. In addition, we found the Java-based interface to be a necessary component for design of a truly interactive and dynamic Web agent.
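As a rough, hypothetical sketch of the best-first strategy the abstract describes (not the Illinois DLI implementation), the toy crawler below keeps a priority queue of pages ordered by keyword overlap with the user's terms and always expands the most promising page next; the in-memory link graph and the relevance function are simplified placeholders.

```java
import java.util.*;

// Toy best-first "personal spider": pages live in an in-memory link graph
// (a stand-in for real Web fetching), and the frontier is a priority queue
// ordered by a simple keyword-overlap score, so the most promising page is
// always expanded next. Illustrative only; not the system from the article.
public class BestFirstSpider {
    public static List<String> crawl(Map<String, String> pageText,
                                     Map<String, List<String>> links,
                                     Set<String> queryTerms,
                                     List<String> startPages, int limit) {
        Map<String, Double> score = new HashMap<>();
        PriorityQueue<String> frontier = new PriorityQueue<>(
                (a, b) -> Double.compare(score.get(b), score.get(a)));
        Set<String> visited = new HashSet<>();
        List<String> result = new ArrayList<>();

        for (String p : startPages) {
            score.put(p, relevance(pageText.get(p), queryTerms));
            frontier.add(p);
        }
        while (!frontier.isEmpty() && result.size() < limit) {
            String page = frontier.poll();
            if (!visited.add(page)) continue;          // already expanded
            result.add(page);
            for (String out : links.getOrDefault(page, List.of())) {
                if (!visited.contains(out) && pageText.containsKey(out)) {
                    score.put(out, relevance(pageText.get(out), queryTerms));
                    frontier.add(out);
                }
            }
        }
        return result;   // pages in the order the spider judged them most promising
    }

    // Crude relevance: fraction of query terms occurring in the page text.
    static double relevance(String text, Set<String> queryTerms) {
        if (text == null || queryTerms.isEmpty()) return 0.0;
        String lower = text.toLowerCase();
        long hits = queryTerms.stream().filter(t -> lower.contains(t.toLowerCase())).count();
        return (double) hits / queryTerms.size();
    }
}
```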
-
Chen, C.: CiteSpace II : detecting and visualizing emerging trends and transient patterns in scientific literature (2006)
0.04
0.043194205 = product of:
0.17277682 = sum of:
0.17277682 = weight(_text_:java in 272) [ClassicSimilarity], result of:
0.17277682 = score(doc=272,freq=2.0), product of:
0.4437873 = queryWeight, product of:
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.06297082 = queryNorm
0.38932347 = fieldWeight in 272, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.0390625 = fieldNorm(doc=272)
0.25 = coord(1/4)
- Abstract
- This article describes the latest development of a generic approach to detecting and visualizing emerging trends and transient patterns in scientific literature. The work makes substantial theoretical and methodological contributions to progressive knowledge domain visualization. A specialty is conceptualized and visualized as a time-variant duality between two fundamental concepts in information science: research fronts and intellectual bases. A research front is defined as an emergent and transient grouping of concepts and underlying research issues. The intellectual base of a research front is its citation and co-citation footprint in scientific literature - an evolving network of scientific publications cited by research-front concepts. Kleinberg's (2002) burst-detection algorithm is adapted to identify emergent research-front concepts. Freeman's (1979) betweenness centrality metric is used to highlight potential pivotal points of paradigm shift over time. Two complementary visualization views are designed and implemented: cluster views and time-zone views. The contributions of the approach are that (a) the nature of an intellectual base is algorithmically and temporally identified by emergent research-front terms, (b) the value of a co-citation cluster is explicitly interpreted in terms of research-front concepts, and (c) visually prominent and algorithmically detected pivotal points substantially reduce the complexity of a visualized network. The modeling and visualization process is implemented in CiteSpace II, a Java application, and applied to the analysis of two research fields: mass extinction (1981-2004) and terrorism (1990-2003). Prominent trends and pivotal points in visualized networks were verified in collaboration with domain experts, who are the authors of pivotal-point articles. Practical implications of the work are discussed. A number of challenges and opportunities for future studies are identified.
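The pivotal-point step above uses Freeman's (1979) betweenness centrality. The sketch below computes that metric on an unweighted, undirected co-citation network using Brandes' (2001) accumulation scheme, a later, faster algorithm substituted here purely for illustration; it is not CiteSpace II code.

```java
import java.util.*;

// Sketch: Freeman betweenness centrality on an unweighted, undirected network,
// computed with Brandes' accumulation scheme. Nodes are 0..n-1 and adj.get(v)
// lists the neighbours of v. Generic illustration, not the CiteSpace II code.
public class Betweenness {
    public static double[] compute(List<List<Integer>> adj) {
        int n = adj.size();
        double[] cb = new double[n];
        for (int s = 0; s < n; s++) {
            Deque<Integer> stack = new ArrayDeque<>();
            List<List<Integer>> pred = new ArrayList<>();
            for (int i = 0; i < n; i++) pred.add(new ArrayList<>());
            double[] sigma = new double[n]; sigma[s] = 1.0;   // # shortest paths from s
            int[] dist = new int[n]; Arrays.fill(dist, -1); dist[s] = 0;
            Deque<Integer> queue = new ArrayDeque<>(); queue.add(s);

            while (!queue.isEmpty()) {                         // BFS from s
                int v = queue.poll(); stack.push(v);
                for (int w : adj.get(v)) {
                    if (dist[w] < 0) { dist[w] = dist[v] + 1; queue.add(w); }
                    if (dist[w] == dist[v] + 1) { sigma[w] += sigma[v]; pred.get(w).add(v); }
                }
            }
            double[] delta = new double[n];
            while (!stack.isEmpty()) {                         // back-propagate dependencies
                int w = stack.pop();
                for (int v : pred.get(w)) delta[v] += (sigma[v] / sigma[w]) * (1 + delta[w]);
                if (w != s) cb[w] += delta[w];
            }
        }
        for (int v = 0; v < n; v++) cb[v] /= 2.0;              // undirected: each pair counted twice
        return cb;
    }
}
```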
-
Eddings, J.: How the Internet works (1994)
0.04
0.043194205 = product of:
0.17277682 = sum of:
0.17277682 = weight(_text_:java in 2514) [ClassicSimilarity], result of:
0.17277682 = score(doc=2514,freq=2.0), product of:
0.4437873 = queryWeight, product of:
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.06297082 = queryNorm
0.38932347 = fieldWeight in 2514, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.0390625 = fieldNorm(doc=2514)
0.25 = coord(1/4)
- Abstract
- How the Internet Works promises "an exciting visual journey down the highways and byways of the Internet," and it delivers. The book's high-quality graphics and simple, succinct text make it the ideal book for beginners; however, it still has much to offer Net vets. This book is jam-packed with cool ways to visualize how the Net works. The first section visually explores how TCP/IP, Winsock, and other Net connectivity mysteries work. This section also helps you understand how e-mail addresses and domains work, what file types mean, and how information travels across the Net. Part 2 unravels the Net's underlying architecture, including good information on how routers work and what is meant by client/server architecture. The third section covers your own connection to the Net through an Internet Service Provider (ISP), and how ISDN, cable modems, and Web TV work. Part 4 discusses e-mail, spam, newsgroups, Internet Relay Chat (IRC), and Net phone calls. In part 5, you'll find out how other Net tools, such as gopher, telnet, WAIS, and FTP, can enhance your Net experience. The sixth section takes on the World Wide Web, including everything from how HTML works to image maps and forms. Part 7 looks at other Web features such as push technology, Java, ActiveX, and CGI scripting, while part 8 deals with multimedia on the Net. Part 9 shows you what intranets are and covers groupware, and shopping and searching the Net. The book wraps up with part 10, a chapter on Net security that covers firewalls, viruses, cookies, and other Web tracking devices, plus cryptography and parental controls.
-
Wu, D.; Shi, J.: Classical music recording ontology used in a library catalog (2016)
0.04
0.043194205 = product of:
0.17277682 = sum of:
0.17277682 = weight(_text_:java in 4179) [ClassicSimilarity], result of:
0.17277682 = score(doc=4179,freq=2.0), product of:
0.4437873 = queryWeight, product of:
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.06297082 = queryNorm
0.38932347 = fieldWeight in 4179, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.0390625 = fieldNorm(doc=4179)
0.25 = coord(1/4)
- Abstract
- In order to improve the organization of classical music information resources, we constructed a classical music recording ontology, on top of which we then designed an online classical music catalog. Our construction of the classical music recording ontology consisted of three steps: identifying the purpose, analyzing the ontology, and encoding the ontology. We identified the main classes and properties of the domain by investigating classical music recording resources and users' information needs. We implemented the ontology in the Web Ontology Language (OWL) using five steps: transforming the properties, encoding the transformed properties, defining ranges of the properties, constructing individuals, and standardizing the ontology. In constructing the online catalog, we first designed the structure and functions of the catalog based on investigations into users' information needs and information-seeking behaviors. Then we extracted classes and properties of the ontology using the Apache Jena application programming interface (API), and constructed a catalog in the Java environment. The catalog provides a hierarchical main page (built using the Functional Requirements for Bibliographic Records (FRBR) model), a classical music information network and integrated information service; this combination of features greatly eases the task of finding classical music recordings and more information about classical music.
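A hedged sketch of the kind of Jena-based extraction step described above: it loads an OWL file and lists the named classes with their declared properties. The file name is hypothetical, and the paper's actual code is not published.

```java
// Hedged sketch of extracting classes and properties with the Apache Jena API,
// as the abstract describes. The ontology file name is a hypothetical placeholder.
import org.apache.jena.ontology.OntClass;
import org.apache.jena.ontology.OntModel;
import org.apache.jena.ontology.OntModelSpec;
import org.apache.jena.ontology.OntProperty;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.util.iterator.ExtendedIterator;

public class RecordingOntologyDump {
    public static void main(String[] args) {
        OntModel model = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM);
        model.read("classical-music-recording.owl");   // hypothetical path

        ExtendedIterator<OntClass> classes = model.listClasses();
        while (classes.hasNext()) {
            OntClass cls = classes.next();
            if (cls.isAnon()) continue;                // skip anonymous/restriction classes
            System.out.println("Class: " + cls.getLocalName());
            ExtendedIterator<OntProperty> props = cls.listDeclaredProperties(true);
            while (props.hasNext()) {
                System.out.println("  property: " + props.next().getLocalName());
            }
        }
    }
}
```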
-
Strong, R.W.: Undergraduates' information differentiation behaviors in a research process : a grounded theory approach (2005)
0.04
0.04221408 = product of:
0.16885632 = sum of:
0.16885632 = weight(_text_:handled in 985) [ClassicSimilarity], result of:
0.16885632 = score(doc=985,freq=2.0), product of:
0.4905077 = queryWeight, product of:
7.7894444 = idf(docFreq=49, maxDocs=44421)
0.06297082 = queryNorm
0.34424806 = fieldWeight in 985, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.7894444 = idf(docFreq=49, maxDocs=44421)
0.03125 = fieldNorm(doc=985)
0.25 = coord(1/4)
- Abstract
- This research explores, using a Grounded Theory approach, the question of how a particular group of undergraduate university students differentiates the values of retrieved information in a contemporary research process. Specifically it attempts to isolate and label those specific techniques, processes, formulae - both objective and subjective - that the students use to identify, prioritize, and successfully incorporate the most useful and valuable information into their research project. The research reviews the relevant literature covering the areas of: epistemology, knowledge acquisition, and cognitive learning theory; early relevance research; the movement from relevance models to information seeking in context; and the proximate recent research. A research methodology is articulated using a Grounded Theory approach, and the research process and research participants are fully explained and described. The findings of the research are set forth using three Thematic Sets - Traditional Relevance Measures; Structural Frames; and Metaphors: General and Ecological - using the actual discourse of the study participants, and a theoretical construct is advanced. Based on that construct, it can be theorized that identification and analysis of the metaphorical language that the particular students in this study used, both by way of general and ecological metaphors - their stories - about how they found, handled, and evaluated information, can be a very useful tool in understanding how the students identified, prioritized, and successfully incorporated the most useful and relevant information into their research projects. It also is argued that this type of metaphorical analysis could be useful in providing a bridging mechanism for a broader understanding of the relationships between traditional user relevance studies and the concepts of frame theory and sense-making. Finally, a corollary to Whitmire's original epistemological hypothesis is posited: Students who were more adept at using metaphors - either general or ecological - appeared more comfortable with handling contradictory information sources, and better able to articulate their valuing decisions. The research concludes with a discussion of the implications for both future research in the Library and Information Science field, and for the practice of both Library professionals and classroom instructors involved in assisting students involved in information valuing decision-making in a research process.
-
Quirin, A.; Cordón, O.; Santamaría, J.; Vargas-Quesada, B.; Moya-Anegón, F.: ¬A new variant of the Pathfinder algorithm to generate large visual science maps in cubic time (2008)
0.04
0.04221408 = product of:
0.16885632 = sum of:
0.16885632 = weight(_text_:handled in 3112) [ClassicSimilarity], result of:
0.16885632 = score(doc=3112,freq=2.0), product of:
0.4905077 = queryWeight, product of:
7.7894444 = idf(docFreq=49, maxDocs=44421)
0.06297082 = queryNorm
0.34424806 = fieldWeight in 3112, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.7894444 = idf(docFreq=49, maxDocs=44421)
0.03125 = fieldNorm(doc=3112)
0.25 = coord(1/4)
- Abstract
- In the last few years, there has been increasing interest in generating visual representations of very large scientific domains. A methodology based on the combined use of ISI-JCR category cocitation and social networks analysis through the use of the Pathfinder algorithm has demonstrated its ability to achieve high quality, schematic visualizations for these kinds of domains. Now, the next step would be to generate these scientograms in an on-line fashion. To do so, there is a need to significantly decrease the run time of the latter pruning technique when working with category cocitation matrices of a large dimension like the ones handled in these large domains (Pathfinder has a time complexity order of O(n^4), with n being the number of categories in the cocitation matrix, i.e., the number of nodes in the network). Although a previous improvement called Binary Pathfinder has already been proposed to speed up the original algorithm, its significant time complexity reduction is not enough for that aim. In this paper, we make use of a different shortest path computation from classical approaches in computer science graph theory to propose a new variant of the Pathfinder algorithm which allows us to reduce its time complexity by one order of magnitude, to O(n^3), and thus to significantly decrease the run time of the implementation when applied to large scientific domains considering the parameter q = n - 1. In addition, the new algorithm has a much simpler structure than the Binary Pathfinder and saves a significant amount of memory with respect to the original Pathfinder by reducing the space complexity to that of storing just two matrices. An experimental comparison will be developed using large networks from real-world domains to show the good performance of the new proposal.
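For orientation, the sketch below shows the cubic-time idea the paper builds on, under the assumptions r = infinity and q = n - 1: the weight of a path is then its largest link, all minimax distances can be computed with a Floyd-Warshall-style pass in O(n^3), and a link survives pruning only if no alternative path offers a smaller maximum. This is a generic illustration of that technique, not the authors' implementation.

```java
// Hedged sketch of the cubic-time Pathfinder pruning idea for q = n-1 and
// r = infinity: a Floyd-Warshall-style pass computes minimax path distances,
// and a link is kept only if no other path beats its weight. Illustration only.
public final class PathfinderSketch {
    /** dist[i][j] = cocitation-derived distance; Double.POSITIVE_INFINITY if no direct link. */
    public static boolean[][] prune(double[][] dist) {
        int n = dist.length;
        double[][] minimax = new double[n][n];
        for (int i = 0; i < n; i++) minimax[i] = dist[i].clone();

        for (int k = 0; k < n; k++)                // O(n^3) minimax relaxation
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++) {
                    double viaK = Math.max(minimax[i][k], minimax[k][j]);
                    if (viaK < minimax[i][j]) minimax[i][j] = viaK;
                }

        boolean[][] keep = new boolean[n][n];      // surviving links of the PFNET
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                keep[i][j] = i != j
                          && dist[i][j] < Double.POSITIVE_INFINITY
                          && dist[i][j] <= minimax[i][j];
        return keep;
    }
}
```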
-
Duckett, R.J.; Walker, P.; Donnelly, C.: Know it all, find it fast : an A-Z source guide for the enquiry desk (2008)
0.04
0.04221408 = product of:
0.16885632 = sum of:
0.16885632 = weight(_text_:handled in 3786) [ClassicSimilarity], result of:
0.16885632 = score(doc=3786,freq=2.0), product of:
0.4905077 = queryWeight, product of:
7.7894444 = idf(docFreq=49, maxDocs=44421)
0.06297082 = queryNorm
0.34424806 = fieldWeight in 3786, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.7894444 = idf(docFreq=49, maxDocs=44421)
0.03125 = fieldNorm(doc=3786)
0.25 = coord(1/4)
- Abstract
- 'I wish that I had been able to obtain such a guide when I started dealing with enquiries' - "Managing Information". 'By the time I got to it 3 staff had noticed it on the desk and written a note saying really good and can we have a copy' - "BBOB News". 'This is certainly a comforting and very useful guide for the information worker, particularly inexperienced or unqualified, staffing a general enquiry desk' - "New Library World". There is a queue, the phone is ringing, the photocopier has jammed and your enquirer is waiting for a response. You are stressed and you can feel the panic rising. Where do you go to find the information you need to answer the question promptly and accurately? Answering queries from users is one of the most important services undertaken by library and information staff. Yet it is also one of the most difficult, least understood subjects. There are still very few materials available to help frontline staff - often paraprofessional - develop their reader enquiry skills. This award-winning sourcebook is an essential guide to where to look to find the answers quickly. It is designed as a first point of reference for library and information practitioners, to be depended upon if they are unfamiliar with the subject of an enquiry - or wish to find out more. It is arranged in an easily searchable, fully cross-referenced A-Z list of around 150 of the subject areas most frequently handled at enquiry desks. Each subject entry lists the most important information sources and where to locate them, including printed and electronic sources, relevant websites and useful contacts for referral purposes. The authors use their extensive experience in reference work to offer useful tips, warn of potential pitfalls, and spotlight typical queries and how to tackle them. This new edition has been brought right up to date with all sources checked for currency and many new ones added. The searchability is enhanced by a comprehensive index to make those essential sources even easier to find - saving you valuable minutes! Offering quick and easy pointers to a multitude of information sources, this is an invaluable reference deskbook for all library and information staff in need of a speedy answer, in reference libraries, subject departments and other information units.
-
Styltsvig, H.B.: Ontology-based information retrieval (2006)
0.04
0.04221408 = product of:
0.16885632 = sum of:
0.16885632 = weight(_text_:handled in 2154) [ClassicSimilarity], result of:
0.16885632 = score(doc=2154,freq=2.0), product of:
0.4905077 = queryWeight, product of:
7.7894444 = idf(docFreq=49, maxDocs=44421)
0.06297082 = queryNorm
0.34424806 = fieldWeight in 2154, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.7894444 = idf(docFreq=49, maxDocs=44421)
0.03125 = fieldNorm(doc=2154)
0.25 = coord(1/4)
- Abstract
- In this thesis, we will present methods for introducing ontologies in information retrieval. The main hypothesis is that the inclusion of conceptual knowledge such as ontologies in the information retrieval process can contribute to the solution of major problems currently found in information retrieval. This utilization of ontologies has a number of challenges. Our focus is on the use of similarity measures derived from the knowledge about relations between concepts in ontologies, the recognition of semantic information in texts and the mapping of this knowledge into the ontologies in use, as well as how to fuse together the ideas of ontological similarity and ontological indexing into a realistic information retrieval scenario. To achieve the recognition of semantic knowledge in a text, shallow natural language processing is used during indexing that reveals knowledge to the level of noun phrases. Furthermore, we briefly cover the identification of semantic relations inside and between noun phrases, as well as discuss which kinds of problems are caused by an increase in compoundness with respect to the structure of concepts in the evaluation of queries. Measuring similarity between concepts based on distances in the structure of the ontology is discussed. In addition, a shared nodes measure is introduced and, based on a set of intuitive similarity properties, compared to a number of different measures. In this comparison the shared nodes measure appears to be superior, though more computationally complex. Some of the major problems of shared nodes, which relate to the way relations differ in the degree to which they bring the concepts they connect closer together, are discussed. A generalized measure called weighted shared nodes is introduced to deal with these problems. Finally, the utilization of concept similarity in query evaluation is discussed. A semantic expansion approach that incorporates concept similarity is introduced and a generalized fuzzy set retrieval model that applies expansion during query evaluation is presented. While not commonly used in present information retrieval systems, it appears that the fuzzy set model provides the flexibility needed when generalizing to an ontology-based retrieval model and, with the introduction of a hierarchical fuzzy aggregation principle, compound concepts can be handled in a straightforward and natural manner.
-
Das, S.; Naskar, D.; Roy, S.: Reorganizing educational institutional domain using faceted ontological principles (2022)
0.04
0.04221408 = product of:
0.16885632 = sum of:
0.16885632 = weight(_text_:handled in 2100) [ClassicSimilarity], result of:
0.16885632 = score(doc=2100,freq=2.0), product of:
0.4905077 = queryWeight, product of:
7.7894444 = idf(docFreq=49, maxDocs=44421)
0.06297082 = queryNorm
0.34424806 = fieldWeight in 2100, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.7894444 = idf(docFreq=49, maxDocs=44421)
0.03125 = fieldNorm(doc=2100)
0.25 = coord(1/4)
- Abstract
- The purpose of this work is to find out how different library classification systems and linguistic ontologies arrange a particular domain of interest and what the limitations for information retrieval are. We use knowledge representation techniques and languages for the construction of a domain-specific ontology. This ontology would not only help in problem solving but also demonstrate the ease with which complex queries can be handled using principles of domain ontology, thereby facilitating better information retrieval. Facet-based methodology has been used for ontology formalization for quite some time. Ontology formalization involves different steps such as Identification of the terminology, Analysis, Synthesis, Standardization and Ordering. Firstly, for purposes of conceptualization, OntoUML has been used, which is a well-founded and established language for ontology-driven conceptual modelling. A transformation of the same model has subsequently been obtained in OWL-DL using the Protégé software. The final OWL ontology contains a total of around 232 axioms. These axioms comprise 148 logical axioms, 76 declaration axioms and 43 classes. These axioms glue together classes, properties and data types as well as a constraint. Such data clustering cannot be achieved through general use of simple classification schemes. Hence it has been observed and established that a domain ontology using faceted principles provides better information retrieval with enhanced precision. This ontology should be seen not only as an alternative to the existing classification system but as a Knowledge Base (KB) system which can handle complex queries well, which is the ultimate purpose of any classification system or indexing system. In this paper, we try to understand how ontology-based information retrieval systems can prove their utility as a useful tool in the field of library science with a particular focus on the education domain.
-
Noerr, P.: ¬The Digital Library Tool Kit (2001)
0.03
0.03455536 = product of:
0.13822144 = sum of:
0.13822144 = weight(_text_:java in 774) [ClassicSimilarity], result of:
0.13822144 = score(doc=774,freq=2.0), product of:
0.4437873 = queryWeight, product of:
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.06297082 = queryNorm
0.31145877 = fieldWeight in 774, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.03125 = fieldNorm(doc=774)
0.25 = coord(1/4)
- Footnote
- This Digital Library Tool Kit was sponsored by Sun Microsystems, Inc. to address some of the leading questions that academic institutions, public libraries, government agencies, and museums face in trying to develop, manage, and distribute digital content. The evolution of Java programming, digital object standards, Internet access, electronic commerce, and digital media management models is causing educators, CIOs, and librarians to rethink many of their traditional goals and modes of operation. New audiences, continuous access to collections, and enhanced services to user communities are enabled. As one of the leading technology providers to education and library communities, Sun is pleased to present this comprehensive introduction to digital libraries
-
Herrero-Solana, V.; Moya Anegón, F. de: Graphical Table of Contents (GTOC) for library collections : the application of UDC codes for the subject maps (2003)
0.03
0.03455536 = product of:
0.13822144 = sum of:
0.13822144 = weight(_text_:java in 3758) [ClassicSimilarity], result of:
0.13822144 = score(doc=3758,freq=2.0), product of:
0.4437873 = queryWeight, product of:
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.06297082 = queryNorm
0.31145877 = fieldWeight in 3758, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.03125 = fieldNorm(doc=3758)
0.25 = coord(1/4)
- Abstract
- The representation of information content by graphical maps is a well-established and ongoing research topic. In this paper we introduce the application of UDC codes for the development of subject maps. We use the following graphic representation methodologies: 1) Multidimensional scaling (MDS), 2) Cluster analysis, 3) Neural networks (Self Organizing Map - SOM). Finally, we draw conclusions about the viability of applying each kind of map. 1. Introduction Advanced techniques for Information Retrieval (IR) currently make up one of the most active areas for research in the field of library and information science. New models representing document content are replacing the classic systems in which the search terms supplied by the user were compared against the indexing terms existing in the inverted files of a database. One of the topics most often studied in the last years is bibliographic browsing, a good complement to querying strategies. Since the 80's, many authors have treated this topic. For example, Ellis establishes that browsing is based on three different types of tasks: identification, familiarization and differentiation (Ellis, 1989). On the other hand, Cove indicates three different browsing types: searching browsing, general purpose browsing and serendipity browsing (Cove, 1988). Marcia Bates presents six different types (Bates, 1989), although the classification of Bawden is the one that really interests us: 1) similarity comparison, 2) structure driven, 3) global vision (Bawden, 1993). Global vision browsing implies the use of graphic representations, which we will call map displays, that allow the user to get a global idea of the nature and structure of the information in the database. In the 90's, several authors worked on this line of research, developing different types of maps. One of the most active was Xia Lin, who introduced the concept of the Graphical Table of Contents (GTOC), comparing the maps to true tables of contents based on graphic representations (Lin 1996). Lin applies the SOM algorithm to his own personal bibliography, analyzed as a function of the words in the title and abstract fields and represented in a two-dimensional map (Lin 1997). Later on, Lin applied this type of map to create website GTOCs through a Java application.
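As a generic illustration of the SOM step mentioned above (not the authors' configuration), the sketch below trains a small two-dimensional grid of weight vectors on document term vectors: for each input the best-matching unit is found and it, together with a shrinking Gaussian neighbourhood, is pulled towards the input.

```java
import java.util.Random;

// Minimal self-organizing map (SOM) sketch: documents are term vectors, the map
// is a small 2-D grid of weight vectors, and training pulls the best-matching
// unit and its shrinking neighbourhood towards each input. Illustration only.
public class SomSketch {
    final int rows, cols, dim;
    final double[][][] w;                       // w[r][c] = weight vector of grid cell (r, c)
    final Random rnd = new Random(42);

    SomSketch(int rows, int cols, int dim) {
        this.rows = rows; this.cols = cols; this.dim = dim;
        w = new double[rows][cols][dim];
        for (double[][] row : w)
            for (double[] cell : row)
                for (int k = 0; k < dim; k++) cell[k] = rnd.nextDouble();
    }

    void train(double[][] docs, int epochs) {
        for (int t = 0; t < epochs; t++) {
            double alpha = 0.5 * (1.0 - (double) t / epochs);              // decaying learning rate
            double radius = Math.max(1.0, (rows / 2.0) * (1.0 - (double) t / epochs));
            for (double[] x : docs) {
                int[] bmu = bestMatchingUnit(x);
                for (int r = 0; r < rows; r++)
                    for (int c = 0; c < cols; c++) {
                        double d = Math.hypot(r - bmu[0], c - bmu[1]);
                        if (d > radius) continue;
                        double h = Math.exp(-(d * d) / (2 * radius * radius)); // neighbourhood weight
                        for (int k = 0; k < dim; k++)
                            w[r][c][k] += alpha * h * (x[k] - w[r][c][k]);
                    }
            }
        }
    }

    int[] bestMatchingUnit(double[] x) {
        int[] best = {0, 0}; double bestDist = Double.MAX_VALUE;
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++) {
                double dist = 0;
                for (int k = 0; k < dim; k++) dist += (x[k] - w[r][c][k]) * (x[k] - w[r][c][k]);
                if (dist < bestDist) { bestDist = dist; best = new int[]{r, c}; }
            }
        return best;
    }
}
```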
-
Vlachidis, A.; Binding, C.; Tudhope, D.; May, K.: Excavating grey literature : a case study on the rich indexing of archaeological documents via natural language-processing techniques and knowledge-based resources (2010)
0.03
0.03455536 = product of:
0.13822144 = sum of:
0.13822144 = weight(_text_:java in 935) [ClassicSimilarity], result of:
0.13822144 = score(doc=935,freq=2.0), product of:
0.4437873 = queryWeight, product of:
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.06297082 = queryNorm
0.31145877 = fieldWeight in 935, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.03125 = fieldNorm(doc=935)
0.25 = coord(1/4)
- Abstract
- Purpose - This paper sets out to discuss the use of information extraction (IE), a natural language-processing (NLP) technique to assist "rich" semantic indexing of diverse archaeological text resources. The focus of the research is to direct a semantic-aware "rich" indexing of diverse natural language resources with properties capable of satisfying information retrieval from online publications and datasets associated with the Semantic Technologies for Archaeological Resources (STAR) project. Design/methodology/approach - The paper proposes use of the English Heritage extension (CRM-EH) of the standard core ontology in cultural heritage, CIDOC CRM, and exploitation of domain thesauri resources for driving and enhancing an Ontology-Oriented Information Extraction process. The process of semantic indexing is based on a rule-based Information Extraction technique, which is facilitated by the General Architecture of Text Engineering (GATE) toolkit and expressed by Java Annotation Pattern Engine (JAPE) rules. Findings - Initial results suggest that the combination of information extraction with knowledge resources and standard conceptual models is capable of supporting semantic-aware term indexing. Additional efforts are required for further exploitation of the technique and adoption of formal evaluation methods for assessing the performance of the method in measurable terms. Originality/value - The value of the paper lies in the semantic indexing of 535 unpublished online documents often referred to as "Grey Literature", from the Archaeological Data Service OASIS corpus (Online AccesS to the Index of archaeological investigationS), with respect to the CRM ontological concepts E49.Time Appellation and P19.Physical Object.
-
Radhakrishnan, A.: Swoogle : an engine for the Semantic Web (2007)
0.03
0.03455536 = product of:
0.13822144 = sum of:
0.13822144 = weight(_text_:java in 709) [ClassicSimilarity], result of:
0.13822144 = score(doc=709,freq=2.0), product of:
0.4437873 = queryWeight, product of:
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.06297082 = queryNorm
0.31145877 = fieldWeight in 709, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.03125 = fieldNorm(doc=709)
0.25 = coord(1/4)
- Content
- "Swoogle, the Semantic web search engine, is a research project carried out by the ebiquity research group in the Computer Science and Electrical Engineering Department at the University of Maryland. It's an engine tailored towards finding documents on the semantic web. The whole research paper is available here. Semantic web is touted as the next generation of online content representation where the web documents are represented in a language that is not only easy for humans but is machine readable (easing the integration of data as never thought possible) as well. And the main elements of the semantic web include data model description formats such as Resource Description Framework (RDF), a variety of data interchange formats (e.g. RDF/XML, Turtle, N-Triples), and notations such as RDF Schema (RDFS), the Web Ontology Language (OWL), all of which are intended to provide a formal description of concepts, terms, and relationships within a given knowledge domain (Wikipedia). And Swoogle is an attempt to mine and index this new set of web documents. The engine performs crawling of semantic documents like most web search engines and the search is available as web service too. The engine is primarily written in Java with the PHP used for the front-end and MySQL for database. Swoogle is capable of searching over 10,000 ontologies and indexes more that 1.3 million web documents. It also computes the importance of a Semantic Web document. The techniques used for indexing are the more google-type page ranking and also mining the documents for inter-relationships that are the basis for the semantic web. For more information on how the RDF framework can be used to relate documents, read the link here. Being a research project, and with a non-commercial motive, there is not much hype around Swoogle. However, the approach to indexing of Semantic web documents is an approach that most engines will have to take at some point of time. When the Internet debuted, there were no specific engines available for indexing or searching. The Search domain only picked up as more and more content became available. One fundamental question that I've always wondered about it is - provided that the search engines return very relevant results for a query - how to ascertain that the documents are indeed the most relevant ones available. There is always an inherent delay in indexing of document. Its here that the new semantic documents search engines can close delay. Experimenting with the concept of Search in the semantic web can only bore well for the future of search technology."
-
Piros, A.: Automatic interpretation of complex UDC numbers : towards support for library systems (2015)
0.03
0.03455536 = product of:
0.13822144 = sum of:
0.13822144 = weight(_text_:java in 3301) [ClassicSimilarity], result of:
0.13822144 = score(doc=3301,freq=2.0), product of:
0.4437873 = queryWeight, product of:
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.06297082 = queryNorm
0.31145877 = fieldWeight in 3301, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.03125 = fieldNorm(doc=3301)
0.25 = coord(1/4)
- Abstract
- Analytico-synthetic and faceted classifications, such as Universal Decimal Classification (UDC) express content of documents with complex, pre-combined classification codes. Without classification authority control that would help manage and access structured notations, the use of UDC codes in searching and browsing is limited. Existing UDC parsing solutions are usually created for a particular database system or a specific task and are not widely applicable. The approach described in this paper provides a solution by which the analysis and interpretation of UDC notations would be stored into an intermediate format (in this case, in XML) by automatic means without any data or information loss. Due to its richness, the output file can be converted into different formats, such as standard mark-up and data exchange formats or simple lists of the recommended entry points of a UDC number. The program can also be used to create authority records containing complex UDC numbers which can be comprehensively analysed in order to be retrieved effectively. The Java program, as well as the corresponding schema definition it employs, is under continuous development. The current version of the interpreter software is now available online for testing purposes at the following web site: http://interpreter-eto.rhcloud.com. The future plan is to implement conversion methods for standard formats and to create standard online interfaces in order to make it possible to use the features of software as a service. This would result in the algorithm being able to be employed both in existing and future library systems to analyse UDC numbers without any significant programming effort.
-
Rosenfeld, L.; Morville, P.: Information architecture for the World Wide Web : designing large-scale Web sites (1998)
0.03
0.03023594 = product of:
0.12094376 = sum of:
0.12094376 = weight(_text_:java in 1493) [ClassicSimilarity], result of:
0.12094376 = score(doc=1493,freq=2.0), product of:
0.4437873 = queryWeight, product of:
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.06297082 = queryNorm
0.2725264 = fieldWeight in 1493, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.02734375 = fieldNorm(doc=1493)
0.25 = coord(1/4)
- Abstract
- Some web sites "work" and some don't. Good web site consultants know that you can't just jump in and start writing HTML, the same way you can't build a house by just pouring a foundation and putting up some walls. You need to know who will be using the site, and what they'll be using it for. You need some idea of what you'd like to draw their attention to during their visit. Overall, you need a strong, cohesive vision for the site that makes it both distinctive and usable. Information Architecture for the World Wide Web is about applying the principles of architecture and library science to web site design. Each web site is like a public building, available for tourists and regulars alike to breeze through at their leisure. The job of the architect is to set up the framework for the site to make it comfortable and inviting for people to visit, relax in, and perhaps even return to someday. Most books on web development concentrate either on the aesthetics or the mechanics of the site. This book is about the framework that holds the two together. With this book, you learn how to design web sites and intranets that support growth, management, and ease of use. Special attention is given to: * The process behind architecting a large, complex site * Web site hierarchy design and organization Information Architecture for the World Wide Web is for webmasters, designers, and anyone else involved in building a web site. It's for novice web designers who, from the start, want to avoid the traps that result in poorly designed sites. It's for experienced web designers who have already created sites but realize that something "is missing" from their sites and want to improve them. It's for programmers and administrators who are comfortable with HTML, CGI, and Java but want to understand how to organize their web pages into a cohesive site. The authors are two of the principals of Argus Associates, a web consulting firm. At Argus, they have created information architectures for web sites and intranets of some of the largest companies in the United States, including Chrysler Corporation, Barron's, and Dow Chemical.
-
Haravu, L.J.: Lectures on knowledge management : paradigms, challenges and opportunities (2002)
0.03
0.0263838 = product of:
0.1055352 = sum of:
0.1055352 = weight(_text_:handled in 3048) [ClassicSimilarity], result of:
0.1055352 = score(doc=3048,freq=2.0), product of:
0.4905077 = queryWeight, product of:
7.7894444 = idf(docFreq=49, maxDocs=44421)
0.06297082 = queryNorm
0.21515504 = fieldWeight in 3048, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.7894444 = idf(docFreq=49, maxDocs=44421)
0.01953125 = fieldNorm(doc=3048)
0.25 = coord(1/4)
- Footnote
- Review in: Knowledge organization 30(2003) no.1, pp.42-44 (D. Mercier): "This work is a collection of lecture notes following the 22nd Sarada Ranganathan Endowment Lectures which took place in Bangalore, India, from 4-6 December 2000. This compilation has been divided into four sections: a historical introduction, a compilation of several definitions about knowledge and its management, impacts of knowledge management (KM) on information professionals, and a review of information technologies as tools for knowledge management. The aim of this book is to provide "a succinct overview of various aspects of knowledge management, particularly in companies" (p. v). Each chapter focuses on a dominant text in a specific area. Most of the quoted authors are known consultants in KM. Each chapter is similarly handled: a review of a dominant book, some subject matter from a few other consultants and, last but not least, comments on a few broadly cited cases. Each chapter is uneven with regard to the level of detail provided, and ending summaries, which would have been useful, are missing. The book is structured in two parts containing five chapters each. The first part is theoretical, the second deals with knowledge workers and technologies. Haravu begins the first chapter with a historical overview of information and knowledge management (IKM), essentially based on the review previously made by Drucker (1999). Haravu emphasises the major facts and events of the discipline from the industrial revolution up to the advent of the knowledge economy. On the whole, this book is largely technology-oriented. The lecturer presents micro-economic factors contributing to the economic perspective of knowledge management, focusing on existing explicit knowledge. This is Haravu's prevailing perspective. He then offers a compilation of definitions from Allee (1997) and Sveiby (1997), both known for their contributions in the area of knowledge evaluation. Like many others, Haravu confirms his assumption regarding the distinction between information and knowledge, and the knowledge categories - explicit and tacit - both action-oriented and supported by rules (p. 43). The SECI model (Nonaka & Takeuchi, 1995), also known as the "knowledge conversion spiral", is described briefly, and the theoretical relational dimension between individuals and collectivities is explained. Three SECI-linked concepts appear to be missing: contexts in movement, intellectual assets and leadership.