-
Lubetzky, S.: Development of cataloging rules (1953)
0.05
0.04634181 = product of:
0.18536724 = sum of:
0.18536724 = weight(_text_:handle in 3626) [ClassicSimilarity], result of:
0.18536724 = score(doc=3626,freq=2.0), product of:
0.42740422 = queryWeight, product of:
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.06532823 = queryNorm
0.43370473 = fieldWeight in 3626, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.046875 = fieldNorm(doc=3626)
0.25 = coord(1/4)
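The indented blocks throughout this listing are Lucene explain() output for the ClassicSimilarity (TF-IDF) model: the document score is queryWeight * fieldWeight * coord, where queryWeight = idf * queryNorm and fieldWeight = tf * idf * fieldNorm. A minimal Python sketch reproducing the figures above (the decomposition is standard Lucene behaviour, not anything specific to this catalogue):

import math

# Figures copied from the explain tree for doc 3626 and the query term "handle".
doc_freq, max_docs = 173, 44421
idf = 1 + math.log(max_docs / (doc_freq + 1))   # ~6.5424123
query_norm = 0.06532823                          # queryNorm (depends on the whole query)
tf = math.sqrt(2.0)                              # tf(freq=2.0) = sqrt(termFreq) = 1.4142135
field_norm = 0.046875                            # fieldNorm(doc=3626)
coord = 0.25                                     # coord(1/4): 1 of 4 query clauses matched

query_weight = idf * query_norm                  # ~0.42740422
field_weight = tf * idf * field_norm             # ~0.43370473
print(query_weight * field_weight * coord)       # ~0.04634181, the listed score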
- Content
- Cf.: https://www.ideals.illinois.edu/bitstream/handle/2142/5511/librarytrendsv2i2c_opt.pdf.
-
Thornton, K.: Powerful structure : inspecting infrastructures of information organization in Wikimedia Foundation projects (2016)
0.05
0.04634181 = product of:
0.18536724 = sum of:
0.18536724 = weight(_text_:handle in 4288) [ClassicSimilarity], result of:
0.18536724 = score(doc=4288,freq=2.0), product of:
0.42740422 = queryWeight, product of:
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.06532823 = queryNorm
0.43370473 = fieldWeight in 4288, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.046875 = fieldNorm(doc=4288)
0.25 = coord(1/4)
- Content
- Cf. also: https://digital.lib.washington.edu/researchworks/bitstream/handle/1773/38160/Thornton_washington_0250E_16572.pdf?sequence=1&isAllowed=y.
-
Chen, H.; Chung, Y.-M.; Ramsey, M.; Yang, C.C.: ¬A smart itsy bitsy spider for the Web (1998)
0.04
0.04481125 = product of:
0.179245 = sum of:
0.179245 = weight(_text_:java in 1871) [ClassicSimilarity], result of:
0.179245 = score(doc=1871,freq=2.0), product of:
0.4604012 = queryWeight, product of:
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.06532823 = queryNorm
0.38932347 = fieldWeight in 1871, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.0390625 = fieldNorm(doc=1871)
0.25 = coord(1/4)
- Abstract
- As part of the ongoing Illinois Digital Library Initiative project, this research proposes an intelligent agent approach to Web searching. In this experiment, we developed 2 Web personal spiders based on best first search and genetic algorithm techniques, respectively. These personal spiders can dynamically take a user's selected starting homepages and search for the most closely related homepages in the Web, based on the links and keyword indexing. A graphical, dynamic, Java-based interface was developed and is available for Web access. A system architecture for implementing such an agent-spider is presented, followed by detailed discussions of benchmark testing and user evaluation results. In benchmark testing, although the genetic algorithm spider did not outperform the best first search spider, we found both results to be comparable and complementary. In user evaluation, the genetic algorithm spider obtained significantly higher recall value than that of the best first search spider. However, their precision values were not statistically different. The mutation process introduced in genetic algorithms allows users to find other potentially relevant homepages that cannot be explored via a conventional local search process. In addition, we found the Java-based interface to be a necessary component for design of a truly interactive and dynamic Web agent
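A rough sketch of the best-first strategy behind one of the two spiders: keep a priority queue of candidate pages ordered by an estimated relevance score and always expand the most promising one next. The fetch and score callables, and the use of a plain keyword estimate for unvisited links, are placeholders rather than the authors' implementation, which also exploits link structure and keyword indexing:

import heapq

def best_first_spider(start_urls, fetch, score, max_pages=50):
    """Best-first search spider sketch. fetch(url) returns (text, out_links);
    score(url, text) estimates how closely a page matches the user's interests."""
    frontier = [(-score(url, ""), url) for url in start_urls]
    heapq.heapify(frontier)
    visited, results = set(), []
    while frontier and len(results) < max_pages:
        _, url = heapq.heappop(frontier)          # most promising page first
        if url in visited:
            continue
        visited.add(url)
        text, out_links = fetch(url)
        results.append((url, score(url, text)))
        for link in out_links:
            if link not in visited:
                heapq.heappush(frontier, (-score(link, ""), link))
    return sorted(results, key=lambda r: -r[1])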
-
Chen, C.: CiteSpace II : detecting and visualizing emerging trends and transient patterns in scientific literature (2006)
0.04
0.04481125 = product of:
0.179245 = sum of:
0.179245 = weight(_text_:java in 272) [ClassicSimilarity], result of:
0.179245 = score(doc=272,freq=2.0), product of:
0.4604012 = queryWeight, product of:
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.06532823 = queryNorm
0.38932347 = fieldWeight in 272, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.0390625 = fieldNorm(doc=272)
0.25 = coord(1/4)
- Abstract
- This article describes the latest development of a generic approach to detecting and visualizing emerging trends and transient patterns in scientific literature. The work makes substantial theoretical and methodological contributions to progressive knowledge domain visualization. A specialty is conceptualized and visualized as a time-variant duality between two fundamental concepts in information science: research fronts and intellectual bases. A research front is defined as an emergent and transient grouping of concepts and underlying research issues. The intellectual base of a research front is its citation and co-citation footprint in scientific literature - an evolving network of scientific publications cited by research-front concepts. Kleinberg's (2002) burst-detection algorithm is adapted to identify emergent research-front concepts. Freeman's (1979) betweenness centrality metric is used to highlight potential pivotal points of paradigm shift over time. Two complementary visualization views are designed and implemented: cluster views and time-zone views. The contributions of the approach are that (a) the nature of an intellectual base is algorithmically and temporally identified by emergent research-front terms, (b) the value of a co-citation cluster is explicitly interpreted in terms of research-front concepts, and (c) visually prominent and algorithmically detected pivotal points substantially reduce the complexity of a visualized network. The modeling and visualization process is implemented in CiteSpace II, a Java application, and applied to the analysis of two research fields: mass extinction (1981-2004) and terrorism (1990-2003). Prominent trends and pivotal points in visualized networks were verified in collaboration with domain experts, who are the authors of pivotal-point articles. Practical implications of the work are discussed. A number of challenges and opportunities for future studies are identified.
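Freeman's betweenness centrality, used above to flag potential pivotal points, is easy to compute on a co-citation network; a small sketch with networkx on a toy graph (illustrative data, not CiteSpace II's actual pipeline, which also adds burst detection and time slicing):

import networkx as nx

# Toy co-citation network; edge weights would be co-citation counts (invented here).
G = nx.Graph()
G.add_weighted_edges_from([
    ("A", "B", 12), ("B", "C", 8), ("C", "D", 5),
    ("B", "E", 7), ("E", "F", 9), ("C", "F", 3),
])

# Nodes lying on many shortest paths between clusters are candidate pivotal points.
centrality = nx.betweenness_centrality(G, normalized=True)
for node, c in sorted(centrality.items(), key=lambda x: -x[1]):
    print(node, round(c, 3))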
-
Eddings, J.: How the Internet works (1994)
0.04
0.04481125 = product of:
0.179245 = sum of:
0.179245 = weight(_text_:java in 2514) [ClassicSimilarity], result of:
0.179245 = score(doc=2514,freq=2.0), product of:
0.4604012 = queryWeight, product of:
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.06532823 = queryNorm
0.38932347 = fieldWeight in 2514, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.0390625 = fieldNorm(doc=2514)
0.25 = coord(1/4)
- Abstract
- How the Internet Works promises "an exciting visual journey down the highways and byways of the Internet," and it delivers. The book's high quality graphics and simple, succinct text make it the ideal book for beginners; however, it still has much to offer for Net vets. This book is jam-packed with cool ways to visualize how the Net works. The first section visually explores how TCP/IP, Winsock, and other Net connectivity mysteries work. This section also helps you understand how e-mail addresses and domains work, what file types mean, and how information travels across the Net. Part 2 unravels the Net's underlying architecture, including good information on how routers work and what is meant by client/server architecture. The third section covers your own connection to the Net through an Internet Service Provider (ISP), and how ISDN, cable modems, and Web TV work. Part 4 discusses e-mail, spam, newsgroups, Internet Relay Chat (IRC), and Net phone calls. In part 5, you'll find out how other Net tools, such as gopher, telnet, WAIS, and FTP, can enhance your Net experience. The sixth section takes on the World Wide Web, including everything from how HTML works to image maps and forms. Part 7 looks at other Web features such as push technology, Java, ActiveX, and CGI scripting, while part 8 deals with multimedia on the Net. Part 9 shows you what intranets are and covers groupware, and shopping and searching the Net. The book wraps up with part 10, a chapter on Net security that covers firewalls, viruses, cookies, and other Web tracking devices, plus cryptography and parental controls.
-
Wu, D.; Shi, J.: Classical music recording ontology used in a library catalog (2016)
0.04
0.04481125 = product of:
0.179245 = sum of:
0.179245 = weight(_text_:java in 4179) [ClassicSimilarity], result of:
0.179245 = score(doc=4179,freq=2.0), product of:
0.4604012 = queryWeight, product of:
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.06532823 = queryNorm
0.38932347 = fieldWeight in 4179, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.0390625 = fieldNorm(doc=4179)
0.25 = coord(1/4)
- Abstract
- In order to improve the organization of classical music information resources, we constructed a classical music recording ontology, on top of which we then designed an online classical music catalog. Our construction of the classical music recording ontology consisted of three steps: identifying the purpose, analyzing the ontology, and encoding the ontology. We identified the main classes and properties of the domain by investigating classical music recording resources and users' information needs. We implemented the ontology in the Web Ontology Language (OWL) using five steps: transforming the properties, encoding the transformed properties, defining ranges of the properties, constructing individuals, and standardizing the ontology. In constructing the online catalog, we first designed the structure and functions of the catalog based on investigations into users' information needs and information-seeking behaviors. Then we extracted classes and properties of the ontology using the Apache Jena application programming interface (API), and constructed a catalog in the Java environment. The catalog provides a hierarchical main page (built using the Functional Requirements for Bibliographic Records (FRBR) model), a classical music information network and integrated information service; this combination of features greatly eases the task of finding classical music recordings and more information about classical music.
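The paper extracts the ontology's classes and properties through the Apache Jena API in Java; as a language-neutral illustration of that step, the sketch below does the equivalent with rdflib in Python (the ontology file name is hypothetical):

from rdflib import Graph
from rdflib.namespace import OWL, RDF, RDFS

g = Graph()
g.parse("classical_music_recording.owl")  # hypothetical OWL file for the ontology

# List the classes and object properties declared in the ontology.
for cls in g.subjects(RDF.type, OWL.Class):
    print("class:", cls, "label:", g.value(cls, RDFS.label))

for prop in g.subjects(RDF.type, OWL.ObjectProperty):
    print("property:", prop,
          "domain:", g.value(prop, RDFS.domain),
          "range:", g.value(prop, RDFS.range))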
-
Keane, D.: ¬The information behaviour of senior executives (1999)
0.04
0.04369148 = product of:
0.17476591 = sum of:
0.17476591 = weight(_text_:handle in 1278) [ClassicSimilarity], result of:
0.17476591 = score(doc=1278,freq=4.0), product of:
0.42740422 = queryWeight, product of:
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.06532823 = queryNorm
0.40890077 = fieldWeight in 1278, product of:
2.0 = tf(freq=4.0), with freq of:
4.0 = termFreq=4.0
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.03125 = fieldNorm(doc=1278)
0.25 = coord(1/4)
- Abstract
- For senior executives, the ability to work with large quantities of information - sorting the wheat from the chaff - has long been recognised as a key determinant of achievement. What an executive believes to be important information can have a significant influence on what they think and how they think about it. Senior executives, because of their critical leadership role, are challenged in their daily lives to develop effective ways of acquiring, using and sharing important information. Some executives are undoubtedly better than others in how they handle such information and there is a high level of interest in identifying those information behavior characteristics that lead to executive excellence (Davenport & Prusak, 1998). Because of their position within organizations, CEOs - those senior executives who have overall responsibility for the management of the organization or business unit - are particularly concerned with enhancing their information behavior. CEOs have the task of managing the organization so that it achieves its strategic goals and objectives. And a critical part of this task is becoming highly effective in managing a wide range of information and in developing skills of influence and decision making. It is therefore important for us to understand how senior executives handle information on a day-to-day basis. What information do they consider important? And why? Several studies have sought to address these questions with varying degrees of success. Some have set out to better understand what type of information senior executives need (McLeod & Jones, 1987) while other studies have attempted to provide a comprehensive theoretical base for executive work (Mintzberg, 1968; 1973; 1975). Yet other work has tried to devise various tools and methodologies for eliciting the unique information requirements of individual executives (Rockart, 1979).
-
Taniguchi, S.: ¬A system for analyzing cataloguing rules : a feasibility study (1996)
0.04
0.038618177 = product of:
0.15447271 = sum of:
0.15447271 = weight(_text_:handle in 4266) [ClassicSimilarity], result of:
0.15447271 = score(doc=4266,freq=2.0), product of:
0.42740422 = queryWeight, product of:
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.06532823 = queryNorm
0.36142063 = fieldWeight in 4266, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.0390625 = fieldNorm(doc=4266)
0.25 = coord(1/4)
- Abstract
- The quality control of cataloging standards is as important as the quality control of bibliographic records. In order to aid the quality control of cataloging standards, a prototype system to analyze the ambiguity and complexity of cataloging rules was developed. Before developing the system, a standard rule unit was defined and a simple, function-like format was devised to indicate the syntactic structure of each unit rule. The AACR2 chapter 1 rules were then manually transformed into this function-like, unit-rule format. The system reads the manually transformed unit rules and puts them into their basic forms based on their syntactic components. The system then applies rule-templates, which are skeletal schemata for specific types of cataloging rules, to the converted rules. As a result of this rule-template application, the internal structure of each unit rule is determined. The system is also used to explore inter-rule relationships. That is, the system determines whether two rules have an exclusive, parallel, complementary, or non-relationship. These relationships are based on the analysis of the structural parts described above in terms of the given rule-template. To assist in this process, the system applies external knowledge represented in the same fashion as the rule units themselves. Although the prototype system can handle only a restricted range of rules, the proposed approach is positively validated and shown to be useful. However, it is possibly impractical to build a complete rule-analyzing system of this type at this stage
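As a loose illustration only (the paper's actual unit-rule format and rule-templates are not reproduced here), a unit rule can be held as a small function-like structure and checked against a skeletal template; the slot names and example rules below are invented:

# Hypothetical function-like unit-rule format: action, object, condition, source.
rule_1a = {"action": "transcribe", "object": "title proper",
           "condition": None, "source": "chief source of information"}
rule_1b = {"action": "transcribe", "object": "title proper",
           "condition": "if no collective title", "source": "chief source of information"}

# A skeletal rule-template: which slots a well-formed transcription rule must fill.
transcription_template = {"action", "object", "source"}

def matches_template(rule, template):
    """A unit rule instantiates a template if every required slot is present."""
    return all(rule.get(slot) is not None for slot in template)

def relationship(r1, r2):
    """Crude inter-rule test: same structural parts apart from the condition
    suggests a complementary pair; identical parts suggest a parallel pair."""
    same_core = all(r1[k] == r2[k] for k in ("action", "object", "source"))
    if same_core and r1["condition"] != r2["condition"]:
        return "complementary"
    return "parallel" if same_core else "non-relationship"

print(matches_template(rule_1a, transcription_template))  # True
print(relationship(rule_1a, rule_1b))                     # complementary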
-
Rijsbergen, C.J. van; Lalmas, M.: Information calculus for information retrieval (1996)
0.04
0.038618177 = product of:
0.15447271 = sum of:
0.15447271 = weight(_text_:handle in 4269) [ClassicSimilarity], result of:
0.15447271 = score(doc=4269,freq=2.0), product of:
0.42740422 = queryWeight, product of:
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.06532823 = queryNorm
0.36142063 = fieldWeight in 4269, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.0390625 = fieldNorm(doc=4269)
0.25 = coord(1/4)
- Abstract
- Information is and always has been an elusive concept; nevertheless many philosophers, mathematicians, logicians and computer scientists have felt that it is fundamental. Many attempts have been made to come up with some sensible and intuitively acceptable definition of information; up to now, none of these have succeeded. This work is based on the approach followed by Dretske, Barwise, and Devlin, who claimed that the notion of information starts from the position that, given an ontology of objects individuated by a cognitive agent, it makes sense to speak of the information an object (e.g., a text, an image, a video) contains about another object (e.g. the query). This phenomenon is captured by the flow of information between objects. Its exploitation is the task of an information retrieval system. These authors propose a theory of information that provides an analysis of the concept of information (any type, from any media) and the manner in which intelligent organisms (referred to as cognitive agents) handle and respond to the information picked up from their environment. They defined the nature of information flow and the mechanisms that give rise to such a flow. The theory, which is based on Situation Theory, is expressed with a calculus defined on channels. The calculus was defined so that it satisfies properties that are attributed to information and its flows. This paper demonstrates the connection between this calculus and information retrieval, and proposes a model of an information retrieval system based on this calculus
-
Carpineto, C.; Romano, G.: Order-theoretical ranking (2000)
0.04
0.038618177 = product of:
0.15447271 = sum of:
0.15447271 = weight(_text_:handle in 5766) [ClassicSimilarity], result of:
0.15447271 = score(doc=5766,freq=2.0), product of:
0.42740422 = queryWeight, product of:
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.06532823 = queryNorm
0.36142063 = fieldWeight in 5766, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.0390625 = fieldNorm(doc=5766)
0.25 = coord(1/4)
- Abstract
- Current best-match ranking (BMR) systems perform well but cannot handle word mismatch between a query and a document. The best known alternative ranking method, hierarchical clustering-based ranking (HCR), seems to be more robust than BMR with respect to this problem, but it is hampered by theoretical and practical limitations. We present an approach to document ranking that explicitly addresses the word mismatch problem by exploiting interdocument similarity information in a novel way. Document ranking is seen as a query-document transformation driven by a conceptual representation of the whole document collection, into which the query is merged. Our approach is based on the theory of concept (or Galois) lattices, which, we argue, provides a powerful, well-founded, and computationally tractable framework to model the space in which documents and query are represented and to compute such a transformation. We compared information retrieval using concept lattice-based ranking (CLR) to BMR and HCR. The results showed that HCR was outperformed by CLR as well as BMR, and suggested that, of the two best methods, BMR achieved better performance than CLR on the whole document set, whereas CLR compared more favorably when only the first retrieved documents were used for evaluation. We also evaluated the three methods' specific ability to rank documents that did not match the query, in which case the superiority of CLR over BMR and HCR was apparent
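The Galois connection behind concept lattices can be stated concretely: for a binary document-term relation, one derivation operator maps a set of documents to the terms they all share, and the other maps a set of terms back to the documents containing them all. A tiny sketch on toy data (CLR itself adds the query merging and ranking described above):

docs = {
    "d1": {"ranking", "retrieval", "lattice"},
    "d2": {"ranking", "clustering"},
    "d3": {"retrieval", "lattice", "query"},
}

def intent(doc_set):
    """Terms shared by every document in the set."""
    return set.intersection(*(docs[d] for d in doc_set)) if doc_set else set()

def extent(term_set):
    """Documents containing every term in the set."""
    return {d for d, terms in docs.items() if term_set <= terms}

# A formal concept is a pair (A, B) with extent(B) == A and intent(A) == B.
A = extent({"lattice"})
B = intent(A)
print(A, B)  # {'d1', 'd3'} and {'retrieval', 'lattice'}: a concept of this toy context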
-
Gatzemeier, F.H.: Patterns, schemata, and types : author support through formalized experience (2000)
0.04
0.038618177 = product of:
0.15447271 = sum of:
0.15447271 = weight(_text_:handle in 6069) [ClassicSimilarity], result of:
0.15447271 = score(doc=6069,freq=2.0), product of:
0.42740422 = queryWeight, product of:
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.06532823 = queryNorm
0.36142063 = fieldWeight in 6069, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.0390625 = fieldNorm(doc=6069)
0.25 = coord(1/4)
- Abstract
- Conceptual authoring support provides tools to help authors construct and organize their document on the conceptual level. As computer-based tools are purely formal entities, they cannot handle natural language itself. Instead, they provide the author with directions and examples that (if adopted) remain linked to the text. This paper discusses several levels of such directions: A Pattern describes a solution for a common problem, here a combination of audience and topic. It may point to several Schemata, which may be expanded in the document structure graph, leaving the author with more specific graph structures to expand and text gaps to fill in. A Type Definition is finally a restriction on the possible document structures the author is allowed to build. Several examples of such patterns, schemata and types are presented. These levels of support are being implemented in an authoring support environment called CHASID. It extends conventional authoring applications, currently ToolBook. The graph transformation aspects are implemented as an executable PROGRES specification
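A very rough sketch of what expanding a schema in a document structure graph can look like, using networkx as the graph library; the node names and the schema are invented, and CHASID's actual PROGRES-based graph transformations are considerably richer:

import networkx as nx

# Document structure graph under construction (illustrative).
doc = nx.DiGraph()
doc.add_edge("article", "introduction")

# A schema: a reusable fragment of structure that a pattern may point to.
problem_solution_schema = [
    ("body", "problem statement"),
    ("body", "proposed solution"),
    ("body", "evaluation"),
]

def expand_schema(graph, anchor, schema):
    """Splice a schema fragment into the document graph below an anchor node,
    leaving the author with more specific nodes (text gaps) to fill in."""
    graph.add_edge(anchor, schema[0][0])
    graph.add_edges_from(schema)

expand_schema(doc, "article", problem_solution_schema)
print(list(doc.edges()))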
-
Drabenstott, K.M.; Weller, M.S.: Handling spelling errors in online catalog searches (1996)
0.04
0.038618177 = product of:
0.15447271 = sum of:
0.15447271 = weight(_text_:handle in 6973) [ClassicSimilarity], result of:
0.15447271 = score(doc=6973,freq=2.0), product of:
0.42740422 = queryWeight, product of:
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.06532823 = queryNorm
0.36142063 = fieldWeight in 6973, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.0390625 = fieldNorm(doc=6973)
0.25 = coord(1/4)
- Abstract
- Reports results of 2 separate but related projects to study the influence of spelling errors (misspellings), made by searchers, on the subject searching of online catalogues and to suggest ways of improving error detection systems to handle the errors that they detect. This involved the categorization of user queries for subjects that were extracted from the online catalogue transaction logs of 4 USA university libraries. The research questions considered: the prevalence of misspellings in user queries for subjects; and how users respond to online catalogues that detect possible spelling errors in their subject queries. Less than 6% of user queries that match the catalogue's controlled and free text terms were found to contain spelling errors. While the majority of users corrected misspelled query words, a sizable proportion made an action that was even more detrimental than the original misspelling. Concludes with 3 recommended improvements: online catalogues should be equipped with search trees to place the burden of selecting a subject on the system instead of on the user; systems should be equipped with automatic spelling checking routines that inform users of possibly misspelled words; and online catalogues should be enhanced with tools and techniques to distinguish between queries that fail due to misspellings and correction failures. Cautions that, although spelling errors are not frequent, they can seriously hinder a routine subject search when they do occur.
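In the spirit of the second recommendation, a catalogue can flag a possibly misspelled query word by comparing it against its controlled and free-text terms; a sketch with Python's difflib, where the term list is a toy stand-in for a real subject authority file:

import difflib

# Toy controlled vocabulary standing in for the catalogue's subject terms.
vocabulary = ["cataloging", "classification", "bibliography",
              "metadata", "information retrieval"]

def suggest(query_word, terms, cutoff=0.8):
    """Return close matches so the system, not the user, carries the burden
    of spotting a misspelling and proposing corrections."""
    return difflib.get_close_matches(query_word.lower(), terms, n=3, cutoff=cutoff)

print(suggest("catalogeing", vocabulary))  # ['cataloging']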
-
Sharretts, C.W.; Shieh, J.; French, J.C.: Electronic theses and dissertations at the University of Virginia (1999)
0.04
0.038618177 = product of:
0.15447271 = sum of:
0.15447271 = weight(_text_:handle in 702) [ClassicSimilarity], result of:
0.15447271 = score(doc=702,freq=2.0), product of:
0.42740422 = queryWeight, product of:
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.06532823 = queryNorm
0.36142063 = fieldWeight in 702, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.0390625 = fieldNorm(doc=702)
0.25 = coord(1/4)
- Abstract
- Although technology has made life easier in many ways, one constant complaint has been the time it takes to learn it. This is why simplicity was the main concern of the University of Virginia (UVa) when implementing Electronic Theses and Dissertations (ETDs). ETDs are not a new concept. The uniqueness of the Virginia ETD lies in the fact that the whole process was assimilated through the technical skills and intellectual efforts of faculty and students. The ETD creates no extra network load and is fully automatic, from the submission of data to the conversion into MARC and subsequent loading into the Library's online catalog, VIRGO. This paper describes the trajectory of an ETD upon submission. The system is designed to be easy and self-explanatory. Submission instructions guide the student step by step. Screen messages, such as errors, are generated automatically when appropriate, while e-mail messages regarding the status of the process are automatically posted to students, advisors, catalogers, and school officials. The paradigms and methodologies will help to push forward the ETD project at the University. Planned enhancements are: indexing the data for searching and retrieval using Dienst for the Web interface, to synchronize the searching experience in both VIRGO and the Web; securing the authorship of the data; automating the upload and indexing of bibliographic data in VIRGO; employing Uniform Resource Names (URN) using the Corporation for National Research Initiatives (CNRI) Handle architecture scheme; and adding Standard Generalized Markup Language (SGML) to the list of formats acceptable for archiving ETDs.
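The automatic conversion of submitted ETD metadata into a MARC record can be sketched with pymarc (assuming pymarc 5.x); the field mapping and the metadata dictionary below are simplified illustrations, not UVa's actual workflow, and the Handle URL is a placeholder:

from pymarc import Field, Record, Subfield

submission = {  # hypothetical data captured by the ETD submission form
    "author": "Doe, Jane",
    "title": "A study of electronic theses",
    "year": "1999",
    "handle": "http://hdl.handle.net/xxxx/yyyy",  # placeholder CNRI Handle
}

record = Record()
record.add_field(
    Field(tag="100", indicators=["1", " "],
          subfields=[Subfield("a", submission["author"])]),
    Field(tag="245", indicators=["1", "0"],
          subfields=[Subfield("a", submission["title"])]),
    Field(tag="260", indicators=[" ", " "],
          subfields=[Subfield("c", submission["year"])]),
    Field(tag="856", indicators=["4", "0"],
          subfields=[Subfield("u", submission["handle"])]),
)
print(record)  # ready for loading into the online catalog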
-
McIlwaine, I.C.: ¬The Universal Decimal Classification : a guide to its use (2000)
0.04
0.038618177 = product of:
0.15447271 = sum of:
0.15447271 = weight(_text_:handle in 1161) [ClassicSimilarity], result of:
0.15447271 = score(doc=1161,freq=2.0), product of:
0.42740422 = queryWeight, product of:
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.06532823 = queryNorm
0.36142063 = fieldWeight in 1161, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.0390625 = fieldNorm(doc=1161)
0.25 = coord(1/4)
- Abstract
- This book is an extension and total revision of the author's earlier Guide to the use of UDC. The original was written in 1993 and in the intervening years much has happened with the classification. In particular, a much more rigorous approach has been undertaken in revision to ensure that the scheme is able to handle the requirements of a networked world. The book outlines the history and development of the Universal Decimal Classification, provides practical hints on its application and works through all the auxiliary and main tables highlighting aspects that need to be noted in applying the scheme. It also provides guidance on the use of the Master Reference File and discusses the ways in which the classification is used in the 21st century and its suitability as an aid to subject description in tagging metadata and consequently for application on the Internet. It is intended as a source for information about the scheme, for practical usage by classifiers in their daily work and as a guide to the student learning how to apply the classification. It is amply provided with examples to illustrate the many ways in which the scheme can be applied and will be a useful source for a wide range of information workers
-
Chinenyanga, T.T.; Kushmerick, N.: ¬An expressive and efficient language for XML information retrieval (2002)
0.04
0.038618177 = product of:
0.15447271 = sum of:
0.15447271 = weight(_text_:handle in 1462) [ClassicSimilarity], result of:
0.15447271 = score(doc=1462,freq=2.0), product of:
0.42740422 = queryWeight, product of:
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.06532823 = queryNorm
0.36142063 = fieldWeight in 1462, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.0390625 = fieldNorm(doc=1462)
0.25 = coord(1/4)
- Abstract
- Several languages for querying and transforming XML, including XML-QL, Quilt, and XQL, have been proposed. However, these languages do not support ranked queries based on textual similarity, in the spirit of traditional IR. Several extensions to these XML query languages to support keyword search have been made, but the resulting languages cannot express IR-style queries such as "find books and CDs with similar titles." In some of these languages keywords are used merely as boolean filters without support for true ranked retrieval; others permit similarity calculations only between a data value and a constant, and thus cannot express the above query. WHIRL avoids both problems, but assumes relational data. We propose ELIXIR, an expressive and efficient language for XML information retrieval that extends XML-QL with a textual similarity operator that can be used for similarity joins, so ELIXIR is sufficiently expressive to handle the sample query above. ELIXIR thus qualifies as a general-purpose XML IR query language. Our central contribution is an efficient algorithm for answering ELIXIR queries that rewrites the original ELIXIR query into a series of XML-QL queries to generate intermediate relational data, and uses WHIRL to efficiently evaluate the similarity operators on this intermediate data, yielding an XML document with nodes ranked by similarity. Our experiments demonstrate that our prototype scales well with the size of the query and the XML data.
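The sample query "find books and CDs with similar titles" is a similarity join; the ranking side of such a join can be sketched with TF-IDF cosine similarity using scikit-learn (toy titles, and a stand-in for WHIRL's similarity operator rather than its implementation):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

books = ["The Art of Computer Programming", "A Brief History of Time"]
cds = ["A Brief History of Time Travel", "Programming the Night Away"]

vec = TfidfVectorizer().fit(books + cds)
sims = cosine_similarity(vec.transform(books), vec.transform(cds))

# Rank (book, cd) pairs by textual similarity of their titles.
pairs = sorted(((sims[i, j], b, c) for i, b in enumerate(books)
                for j, c in enumerate(cds)), reverse=True)
for s, b, c in pairs:
    print(round(s, 2), "|", b, "<->", c)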
-
Dominich, S.; Góth, J.; Kiezer, T.; Szlávik, Z.: ¬An entropy-based interpretation of retrieval status value-based retrieval, and its application to the computation of term and query discrimination value (2004)
0.04
0.038618177 = product of:
0.15447271 = sum of:
0.15447271 = weight(_text_:handle in 3237) [ClassicSimilarity], result of:
0.15447271 = score(doc=3237,freq=2.0), product of:
0.42740422 = queryWeight, product of:
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.06532823 = queryNorm
0.36142063 = fieldWeight in 3237, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.0390625 = fieldNorm(doc=3237)
0.25 = coord(1/4)
- Abstract
- The concepts of Shannon information and entropy have been applied to a number of information retrieval tasks such as to formalize the probabilistic model, to design practical retrieval systems, to cluster documents, and to model texture in image retrieval. In this report, the concept of entropy is used for a different purpose. It is shown that any positive Retrieval Status Value (RSV)-based retrieval system may be conceived as a special probability space in which the amount of the associated Shannon information is being reduced; in this view, the retrieval system is referred to as an Uncertainty Decreasing Operation (UDO). The concept of UDO is then proposed as a theoretical background for term and query discrimination power, and it is applied to the computation of term and query discrimination values in the vector space retrieval model. Experimental evidence is given as regards such computation; the results obtained compare well to those obtained using vector-based calculation of term discrimination values. The UDO-based computation, however, presents advantages over the vector-based calculation: It is faster, easier to assess and handle in practice, and its application is not restricted to the vector space model. Based on the ADI test collection, it is shown that the UDO-based Term Discrimination Value (TDV) weighting scheme yields better retrieval effectiveness than using the vector-based TDV weighting scheme. Also, experimental evidence is given to the intuition that the choice of an appropriate weighting scheme and similarity measure depends on collection properties, and thus the UDO approach may be used as a theoretical basis for this intuition.
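A small sketch of the UDO view: treat normalized positive RSVs as a probability distribution over documents and compare its Shannon entropy with the uniform prior; the numbers are invented, and the paper's TDV computation builds further on this idea:

import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Before retrieval: no evidence, all N documents equally likely to be relevant.
N = 8
prior = [1 / N] * N
# After retrieval: positive RSVs concentrate probability on a few documents.
rsv = [5.0, 3.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0]
total = sum(rsv)
posterior = [s / total for s in rsv]

print(entropy(prior))      # 3.0 bits
print(entropy(posterior))  # lower: the retrieval run has decreased uncertainty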
-
Broughton, V.; Lane, H.: Classification schemes revisited : applications to Web indexing and searching (2000)
0.04
0.038618177 = product of:
0.15447271 = sum of:
0.15447271 = weight(_text_:handle in 3476) [ClassicSimilarity], result of:
0.15447271 = score(doc=3476,freq=2.0), product of:
0.42740422 = queryWeight, product of:
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.06532823 = queryNorm
0.36142063 = fieldWeight in 3476, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.0390625 = fieldNorm(doc=3476)
0.25 = coord(1/4)
- Content
- A short discussion of using classification systems to organize the Web, one of many such discussions. The authors are both involved with BC2 and naturally think it is the best system for organizing information online. They list reasons why faceted classifications are best (e.g. no theoretical limits to specificity or exhaustivity; easier to handle complex subjects; flexible enough to accommodate different user needs) and take a brief look at how BC2 works. They conclude with a discussion of how and why it should be applied to online resources, and a plea for recognition of the importance of classification and subject analysis skills, even when full-text searching is available and databases respond instantly.
-
Shachak, A.: Diffusion pattern of the use of genomic databases and analysis of biological sequences from 1970-2003 : bibliographic record analysis of 12 journals (2006)
0.04
0.038618177 = product of:
0.15447271 = sum of:
0.15447271 = weight(_text_:handle in 5906) [ClassicSimilarity], result of:
0.15447271 = score(doc=5906,freq=2.0), product of:
0.42740422 = queryWeight, product of:
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.06532823 = queryNorm
0.36142063 = fieldWeight in 5906, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.0390625 = fieldNorm(doc=5906)
0.25 = coord(1/4)
- Abstract
- In recent years there has been an explosion of biological data stored in large central databases, tools to handle the data, and educational programs to train scientists in using bioinformatics resources. Still, the diffusion of bioinformatics within the biological community has yet to be extensively studied. In this study, the diffusion of two bioinformatics-related practices - using genomic databases and analyzing DNA and protein sequences - was investigated by analyzing MEDLINE records of 12 journals, representing various fields of biology. The diffusion of these practices between 1970 and 2003 follows an S-shaped curve typical of many innovations, beginning with slow growth, followed by a period of rapid linear growth, and finally reaching saturation. Similar diffusion patterns were found for both the use of genomic databases and biological sequence analysis, indicating the strong relationship between these practices. This study presents the surge in the use of genomic databases and analysis of biological sequences and proposes that these practices are fully diffused within the biological community. Extrapolating from these results, it suggests that taking a diffusion of innovations approach may be useful for researchers as well as for providers of bioinformatics applications and support services.
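The S-shaped diffusion curve can be modelled by fitting a logistic function to yearly counts of records that mention a practice; a sketch with scipy on synthetic counts (invented data, not the study's MEDLINE records):

import numpy as np
from scipy.optimize import curve_fit

def logistic(t, K, r, t0):
    """Classic S-curve: K = saturation level, r = growth rate, t0 = midpoint year."""
    return K / (1 + np.exp(-r * (t - t0)))

years = np.arange(1970, 2004)
counts = logistic(years, 120, 0.35, 1995) + np.random.default_rng(0).normal(0, 3, years.size)

params, _ = curve_fit(logistic, years, counts, p0=[100, 0.3, 1990])
K, r, t0 = params
print(f"saturation ~{K:.0f} records/year, midpoint ~{t0:.0f}")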
-
Khoo, C.S.G.; Dai, D.; Loh, T.E.: Using statistical and contextual information to identify two- and three-character words in Chinese text (2002)
0.04
0.038618177 = product of:
0.15447271 = sum of:
0.15447271 = weight(_text_:handle in 206) [ClassicSimilarity], result of:
0.15447271 = score(doc=206,freq=2.0), product of:
0.42740422 = queryWeight, product of:
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.06532823 = queryNorm
0.36142063 = fieldWeight in 206, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.0390625 = fieldNorm(doc=206)
0.25 = coord(1/4)
- Abstract
- Khoo, Dai, and Loh examine new statistical methods for the identification of two- and three-character words in Chinese text. Some meaningful Chinese words are simple (independent units of one or more characters in a sentence that have independent meaning) but others are compounds of two or more simple words. In their segmentation they utilize the Modern Chinese Word Segmentation for Application of Information Processing, with some modifications to focus on meaningful words, to do manual segmentation. About 37% of meaningful words are longer than 2 characters, indicating a need to handle three- and four-character words. Four hundred sentences from news articles were manually broken into overlapping bi-grams and tri-grams. Using logistic regression, the log of the odds that such bi/tri-grams were meaningful words was calculated. Variables like relative frequency, document frequency, local frequency, and contextual and positional information were incorporated in the model only if the concordance measure improved by at least 2% with their addition. For two- and three-character words, relative frequency of adjacent characters and document frequency of overlapping bi-grams were found to be significant. Using measures of recall and precision, where correct automatic segmentation is normalized either by manual segmentation or by automatic segmentation, the contextual information formula for two-character words provides significantly better results than previous formulations, and using both the two- and three-character formulations in combination significantly improves the two-character results.
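The core modelling step, logistic regression over features of candidate bi-grams, can be sketched with scikit-learn; the feature values and labels below are invented stand-ins for the relative-frequency, document-frequency and contextual variables the authors describe:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: [relative frequency of adjacent characters, document frequency of the
# overlapping bi-gram, local frequency]; label 1 = the bi-gram is a meaningful word.
X = np.array([[0.80, 0.30, 5], [0.10, 0.02, 1], [0.65, 0.25, 4],
              [0.05, 0.01, 1], [0.90, 0.40, 7], [0.20, 0.05, 2]])
y = np.array([1, 0, 1, 0, 1, 0])

model = LogisticRegression().fit(X, y)

# Log-odds that a new candidate bi-gram is a meaningful two-character word.
candidate = np.array([[0.7, 0.2, 3]])
log_odds = model.decision_function(candidate)[0]
print(round(log_odds, 2), model.predict(candidate)[0])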
-
Ziadie, A.M.: Classification in libraries and networks abroad : a report of a panel discussion (1995)
0.04
0.038618177 = product of:
0.15447271 = sum of:
0.15447271 = weight(_text_:handle in 569) [ClassicSimilarity], result of:
0.15447271 = score(doc=569,freq=2.0), product of:
0.42740422 = queryWeight, product of:
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.06532823 = queryNorm
0.36142063 = fieldWeight in 569, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.0390625 = fieldNorm(doc=569)
0.25 = coord(1/4)
- Abstract
- Ia McIlwaine discussed the importance of addressing the issue of lack of user-friendly access to systems for users located in many parts of the world. The diversity of the European classification systems is a case in point. A good example of how to handle this diversity, in her opinion, is the system at the Federal Technical University in Zurich. It has an especially user-friendly French and German interface which, along with UDC numbers, provides captions helpful for the average user. Having examined the problems associated with transnational copy cataloging, she emphasized the consideration of cultural constructs in transnational cataloging. For example, the Islamic countries tend to adapt translations quite well in their classification schemes due to the fact that they possess greater literary warrant in Islam. China appears to have solved difficulties concerning transnational copy cataloging by incorporating Chinese materials into specialized classification schemes while utilizing MARC records in the national library for cataloging Western materials. Philip Bryant called for the balance of "utopian vision" with practicality. He stressed that existing bibliographic notations must be pushed to the limit in an attempt to function with the network. He applauded the continuous work of Stephen Walker, Stephen Robertson and Jill Venner for developing an online catalog (OKAPI) which allows the average user to obtain help existing in the database by using the classification system already established in the data. He emphasized the significance of the BUBL project at the University of Strathclyde, where UDC subject divisions are employed as a means of dividing subjects into fairly large groupings.