Search (1405 results, page 10 of 71)

  • Active filter: language_ss:"e"
  1. Eddings, J.: How the Internet works (1994) 0.05
    0.04514617 = product of:
      0.18058468 = sum of:
        0.18058468 = weight(_text_:java in 2514) [ClassicSimilarity], result of:
          0.18058468 = score(doc=2514,freq=2.0), product of:
            0.46384227 = queryWeight, product of:
              7.0475073 = idf(docFreq=104, maxDocs=44421)
              0.0658165 = queryNorm
            0.38932347 = fieldWeight in 2514, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              7.0475073 = idf(docFreq=104, maxDocs=44421)
              0.0390625 = fieldNorm(doc=2514)
      0.25 = coord(1/4)
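    The breakdown above is Lucene's ClassicSimilarity (TF-IDF) explain output: the term score is queryWeight × fieldWeight, where queryWeight = idf × queryNorm and fieldWeight = tf × idf × fieldNorm, and the document score multiplies in the coordination factor coord(1/4) because only one of the four query clauses matched. A minimal sketch reproducing the arithmetic from the values shown (plain Java, not a call into Lucene):

    public class ClassicSimilarityCheck {
        public static void main(String[] args) {
            double idf = 7.0475073;        // idf(docFreq=104, maxDocs=44421)
            double queryNorm = 0.0658165;
            double tf = Math.sqrt(2.0);    // tf(freq=2.0) = sqrt(2) = 1.4142135
            double fieldNorm = 0.0390625;  // per-field length normalisation
            double coord = 0.25;           // coord(1/4): 1 of 4 query clauses matched

            double queryWeight = idf * queryNorm;           // ~0.46384227
            double fieldWeight = tf * idf * fieldNorm;      // ~0.38932347
            double termScore   = queryWeight * fieldWeight; // ~0.18058468
            double docScore    = coord * termScore;         // ~0.04514617
            System.out.printf("term=%.8f doc=%.8f%n", termScore, docScore);
        }
    }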
    
    Abstract
    How the Internet Works promises "an exciting visual journey down the highways and byways of the Internet," and it delivers. The book's high quality graphics and simple, succinct text make it the ideal book for beginners; however, it still has much to offer for Net vets. This book is jam-packed with cool ways to visualize how the Net works. The first section visually explores how TCP/IP, Winsock, and other Net connectivity mysteries work. This section also helps you understand how e-mail addresses and domains work, what file types mean, and how information travels across the Net. Part 2 unravels the Net's underlying architecture, including good information on how routers work and what is meant by client/server architecture. The third section covers your own connection to the Net through an Internet Service Provider (ISP), and how ISDN, cable modems, and Web TV work. Part 4 discusses e-mail, spam, newsgroups, Internet Relay Chat (IRC), and Net phone calls. In part 5, you'll find out how other Net tools, such as gopher, telnet, WAIS, and FTP, can enhance your Net experience. The sixth section takes on the World Wide Web, including everything from how HTML works to image maps and forms. Part 7 looks at other Web features such as push technology, Java, ActiveX, and CGI scripting, while part 8 deals with multimedia on the Net. Part 9 shows you what intranets are and covers groupware, and shopping and searching the Net. The book wraps up with part 10, a chapter on Net security that covers firewalls, viruses, cookies, and other Web tracking devices, plus cryptography and parental controls.
  2. Wu, D.; Shi, J.: Classical music recording ontology used in a library catalog (2016) 0.05
    0.04514617 = product of:
      0.18058468 = sum of:
        0.18058468 = weight(_text_:java in 4179) [ClassicSimilarity], result of:
          0.18058468 = score(doc=4179,freq=2.0), product of:
            0.46384227 = queryWeight, product of:
              7.0475073 = idf(docFreq=104, maxDocs=44421)
              0.0658165 = queryNorm
            0.38932347 = fieldWeight in 4179, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              7.0475073 = idf(docFreq=104, maxDocs=44421)
              0.0390625 = fieldNorm(doc=4179)
      0.25 = coord(1/4)
    
    Abstract
    In order to improve the organization of classical music information resources, we constructed a classical music recording ontology, on top of which we then designed an online classical music catalog. Our construction of the classical music recording ontology consisted of three steps: identifying the purpose, analyzing the ontology, and encoding the ontology. We identified the main classes and properties of the domain by investigating classical music recording resources and users' information needs. We implemented the ontology in the Web Ontology Language (OWL) using five steps: transforming the properties, encoding the transformed properties, defining ranges of the properties, constructing individuals, and standardizing the ontology. In constructing the online catalog, we first designed the structure and functions of the catalog based on investigations into users' information needs and information-seeking behaviors. Then we extracted classes and properties of the ontology using the Apache Jena application programming interface (API), and constructed a catalog in the Java environment. The catalog provides a hierarchical main page (built using the Functional Requirements for Bibliographic Records (FRBR) model), a classical music information network and integrated information service; this combination of features greatly eases the task of finding classical music recordings and more information about classical music.
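    The catalog construction described above extracts the ontology's classes and properties through the Apache Jena API. A minimal sketch of that extraction step, assuming only a local OWL file whose name is a placeholder rather than the authors' actual resource:

    import org.apache.jena.ontology.OntClass;
    import org.apache.jena.ontology.OntModel;
    import org.apache.jena.ontology.OntProperty;
    import org.apache.jena.rdf.model.ModelFactory;

    public class OntologyExtractor {
        public static void main(String[] args) {
            // Load the OWL ontology into an in-memory ontology model.
            OntModel model = ModelFactory.createOntologyModel();
            model.read("file:classical-music-recording.owl"); // placeholder file name

            // List the named classes defined in the ontology.
            model.listNamedClasses().forEachRemaining(
                    (OntClass c) -> System.out.println("Class: " + c.getLocalName()));

            // List the properties together with their declared ranges.
            model.listAllOntProperties().forEachRemaining((OntProperty p) ->
                    System.out.println("Property: " + p.getLocalName() + " range=" + p.getRange()));
        }
    }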
  3. Barker, P.: ¬An examination of the use of the OSI Directory for accessing bibliographic information : project ABDUX (1993) 0.04
    0.04291015 = product of:
      0.1716406 = sum of:
        0.1716406 = weight(_text_:handling in 7309) [ClassicSimilarity], result of:
          0.1716406 = score(doc=7309,freq=2.0), product of:
            0.4128091 = queryWeight, product of:
              6.272122 = idf(docFreq=227, maxDocs=44421)
              0.0658165 = queryNorm
            0.41578686 = fieldWeight in 7309, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.272122 = idf(docFreq=227, maxDocs=44421)
              0.046875 = fieldNorm(doc=7309)
      0.25 = coord(1/4)
    
    Abstract
    Describes the work of the ABDUX project, containing a brief description of the rationale for using X.500 for access to bibliographic information. Outlines the project's design work and a demonstration system. Reviews the standards applicable to bibliographic data and library OPACs. Highlights difficulties found when handling bibliographic data in library systems. Discusses the service requirements of OPACs for accessing bibliographic information, and how X.500 Directory services may be used to meet them. Suggests the DIT structures that could be used for storing both bibliographic information and descriptions of information resources in general in the directory. Describes the way in which the model of bibliographic data is presented. Outlines the syntax of ASN.1 and how records and fields may be described in terms of X.500 object classes and attribute types. Details the mapping of MARC format into an X.500 compatible form. Provides the schema information for representing research notes and archives, not covered by MARC definitions. Examines the success in implementing the designs and looks ahead to future possibilities.
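    As a hedged illustration of the kind of directory access discussed above, the sketch below issues a generic JNDI search against an LDAP server (LDAP being the lightweight descendant of X.500); the server address, search base and attribute names are purely hypothetical and do not reflect the DIT structures or schema designed by ABDUX:

    import java.util.Hashtable;
    import javax.naming.Context;
    import javax.naming.NamingEnumeration;
    import javax.naming.directory.DirContext;
    import javax.naming.directory.InitialDirContext;
    import javax.naming.directory.SearchControls;
    import javax.naming.directory.SearchResult;

    public class DirectoryBibSearch {
        public static void main(String[] args) throws Exception {
            Hashtable<String, String> env = new Hashtable<>();
            env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
            env.put(Context.PROVIDER_URL, "ldap://directory.example.org:389"); // hypothetical server

            DirContext ctx = new InitialDirContext(env);
            SearchControls controls = new SearchControls();
            controls.setSearchScope(SearchControls.SUBTREE_SCOPE);

            // Hypothetical bibliographic subtree and attribute: search by title keyword.
            NamingEnumeration<SearchResult> results =
                    ctx.search("ou=bibliography,o=example", "(title=*internet*)", controls);
            while (results.hasMore()) {
                System.out.println(results.next().getAttributes());
            }
            ctx.close();
        }
    }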
  4. Fisher, S.; Rowley, J.: Management information and library management systems : an overview (1994) 0.04
    0.04291015 = product of:
      0.1716406 = sum of:
        0.1716406 = weight(_text_:handling in 7442) [ClassicSimilarity], result of:
          0.1716406 = score(doc=7442,freq=2.0), product of:
            0.4128091 = queryWeight, product of:
              6.272122 = idf(docFreq=227, maxDocs=44421)
              0.0658165 = queryNorm
            0.41578686 = fieldWeight in 7442, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.272122 = idf(docFreq=227, maxDocs=44421)
              0.046875 = fieldNorm(doc=7442)
      0.25 = coord(1/4)
    
    Abstract
    Management information facilities transform the library management system into a much more effective management tool. Three levels of management can be identified - operational, tactical and strategic - and each of these has its own unique management information needs. Earlier work on the use of management information in libraries and the development of management information systems demonstrates that progress in these areas has been slow. Management information systems comprise three components: facilities for handling ad hoc enquiries; facilities for standard report generation; and management information modules, or report generators, that support the production of user-defined reports. A list of standard reports covering acquisitions, cataloguing, circulation control, serials and inter-library loans is provided. The functions of report generators are explored and the nature of enquiry facilities reviewed. Management information tools available in library management systems form a valuable aid in decision making. These should be further exploited and further developed.
  5. Beynon-Davies, P.: ¬A semantic database approach to knowledge-based hypermedia systems (1994) 0.04
    0.04291015 = product of:
      0.1716406 = sum of:
        0.1716406 = weight(_text_:handling in 830) [ClassicSimilarity], result of:
          0.1716406 = score(doc=830,freq=2.0), product of:
            0.4128091 = queryWeight, product of:
              6.272122 = idf(docFreq=227, maxDocs=44421)
              0.0658165 = queryNorm
            0.41578686 = fieldWeight in 830, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.272122 = idf(docFreq=227, maxDocs=44421)
              0.046875 = fieldNorm(doc=830)
      0.25 = coord(1/4)
    
    Abstract
    Discusses an architecture for knowledge-based hypermedia systems based on work from semantic databases. Its power derives from its use of a single, uniform data structure which can be used to store both the intensional and extensional information needed to generate hypermedia systems. The architecture is also sufficiently powerful to accommodate the representation of a reasonable amount of knowledge within a hypermedia system. Work has been conducted in building a number of prototypes on a small information base of digital image data. The prototypes serve as demonstrators of systems for managing the large amount of information held by museums about their artifacts. The aim of this work is to demonstrate the flexibility of the architecture in serving the needs of a number of distinct user groups. The first prototype has demonstrated that the virtual architecture is capable of supporting some of the main hypermedia access methods. The current demonstrator is being used to investigate the potential of the approach for handling multiple classifications of hypermedia material. The research is particularly directed at the incorporation of evolving temporal and spatial knowledge.
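    As a rough illustration of the "single, uniform data structure" idea, the sketch below keeps schema-level (intensional) and instance-level (extensional) facts in one generic triple-style store; the record shape and the museum examples are invented for the sketch and are not the authors' semantic database design:

    import java.util.ArrayList;
    import java.util.List;

    public class UniformStore {
        // One record shape for every kind of fact.
        record Fact(String subject, String relation, String object) {}

        public static void main(String[] args) {
            List<Fact> store = new ArrayList<>();
            // Intensional (schema) facts describe the types themselves.
            store.add(new Fact("Artifact", "subclassOf", "MuseumObject"));
            store.add(new Fact("Artifact", "hasAttribute", "image"));
            // Extensional (instance) facts describe individual items.
            store.add(new Fact("vase-042", "instanceOf", "Artifact"));
            store.add(new Fact("vase-042", "image", "vase-042.img"));

            // Hypermedia nodes and links can be generated by walking the same structure.
            store.stream()
                 .filter(f -> f.subject().equals("vase-042"))
                 .forEach(f -> System.out.println(f.subject() + " --" + f.relation() + "--> " + f.object()));
        }
    }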
  6. Ohsuga, S.: ¬A way of designing knowledge based systems (1995) 0.04
    0.04291015 = product of:
      0.1716406 = sum of:
        0.1716406 = weight(_text_:handling in 3278) [ClassicSimilarity], result of:
          0.1716406 = score(doc=3278,freq=2.0), product of:
            0.4128091 = queryWeight, product of:
              6.272122 = idf(docFreq=227, maxDocs=44421)
              0.0658165 = queryNorm
            0.41578686 = fieldWeight in 3278, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.272122 = idf(docFreq=227, maxDocs=44421)
              0.046875 = fieldNorm(doc=3278)
      0.25 = coord(1/4)
    
    Abstract
    Discusses the design of intelligent knowledge based systems. Discusses the kinds of systems that are capable of handling diverse problems arising in the real world, and solving them autonomously. A new approach is necessary for designing such systems. Analyzes human activities and describes a way of representing each activity as a compound of basic intelligent functions. Some functions are represented as the compounds of other functions. Thus, a hierarchy of the functions is constructed to form the software architecture of an intelligent system, where the human interface appears on top of this structure. Intelligent systems need to be provided with considerable knowledge. However, it is very wasteful to let every person collect and structure large amounts of knowledge. It is desirable that there should be large knowledge bases which can supply each intelligent knowledge system as necessary. Discusses a network system consisting of many intelligent systems and one or more large commonly accessible knowledge bases.
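    A small sketch of the compositional idea described above, assuming nothing about Ohsuga's actual formalism: each intelligent function is either basic or a compound of other functions, which yields the hierarchical software architecture the abstract mentions.

    import java.util.List;
    import java.util.function.UnaryOperator;

    public class FunctionHierarchy {
        // A basic intelligent function transforms a problem state (a String placeholder here).
        interface IntelligentFunction extends UnaryOperator<String> {}

        // A compound function applies its component functions in sequence.
        static IntelligentFunction compound(List<IntelligentFunction> parts) {
            return input -> {
                String state = input;
                for (IntelligentFunction f : parts) state = f.apply(state);
                return state;
            };
        }

        public static void main(String[] args) {
            IntelligentFunction retrieveKnowledge = s -> s + " +knowledge";
            IntelligentFunction reason            = s -> s + " +inference";
            IntelligentFunction explain           = s -> s + " +explanation";

            // A higher-level function is itself a compound of lower-level ones.
            IntelligentFunction solveProblem = compound(List.of(retrieveKnowledge, reason, explain));
            System.out.println(solveProblem.apply("problem"));
        }
    }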
  7. Joss, M.W.; Wszola, S.: ¬The engines that can : text search and retrieval software, their strategies, and vendors (1996) 0.04
    0.04291015 = product of:
      0.1716406 = sum of:
        0.1716406 = weight(_text_:handling in 5191) [ClassicSimilarity], result of:
          0.1716406 = score(doc=5191,freq=2.0), product of:
            0.4128091 = queryWeight, product of:
              6.272122 = idf(docFreq=227, maxDocs=44421)
              0.0658165 = queryNorm
            0.41578686 = fieldWeight in 5191, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.272122 = idf(docFreq=227, maxDocs=44421)
              0.046875 = fieldNorm(doc=5191)
      0.25 = coord(1/4)
    
    Abstract
    Traces the development of text searching and retrieval software designed to cope with the increasing demands made by the storage and handling of large amounts of data, recorded on high-capacity data storage media, from CD-ROM to multi-gigabyte storage media and online information services, with particular reference to the need to cope with graphics as well as conventional ASCII text. Includes details of: Boolean searching, fuzzy searching and matching; relevance ranking; proximity searching and improved strategies for dealing with text searching in very large databases. Concludes that the best searching tools for CD-ROM publishers are those optimized for searching and retrieval on CD-ROM. CD-ROM drives have relatively slow random seek times compared with hard discs, and so the software most appropriate to the medium is that which can effectively arrange the indexes and text on the CD-ROM to avoid continuous random access searching. Lists and reviews a selection of software packages designed to achieve the sort of results required for rapid CD-ROM searching.
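    Of the strategies listed above, fuzzy matching is the easiest to make concrete: a generic edit-distance check (not taken from any of the reviewed packages) tolerates misspelled query terms by accepting index terms within a small distance of the query term.

    public class FuzzyMatch {
        // Classic Levenshtein edit distance between two terms.
        static int editDistance(String a, String b) {
            int[][] d = new int[a.length() + 1][b.length() + 1];
            for (int i = 0; i <= a.length(); i++) d[i][0] = i;
            for (int j = 0; j <= b.length(); j++) d[0][j] = j;
            for (int i = 1; i <= a.length(); i++)
                for (int j = 1; j <= b.length(); j++) {
                    int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                    d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                       d[i - 1][j - 1] + cost);
                }
            return d[a.length()][b.length()];
        }

        public static void main(String[] args) {
            // "retreival" is within edit distance 2 of the index term "retrieval".
            System.out.println(editDistance("retreival", "retrieval")); // prints 2
        }
    }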
  8. Fattahi, R.: ¬A uniform approach to the indexing of cataloguing data in online library systems (1997) 0.04
    0.04291015 = product of:
      0.1716406 = sum of:
        0.1716406 = weight(_text_:handling in 1131) [ClassicSimilarity], result of:
          0.1716406 = score(doc=1131,freq=2.0), product of:
            0.4128091 = queryWeight, product of:
              6.272122 = idf(docFreq=227, maxDocs=44421)
              0.0658165 = queryNorm
            0.41578686 = fieldWeight in 1131, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.272122 = idf(docFreq=227, maxDocs=44421)
              0.046875 = fieldNorm(doc=1131)
      0.25 = coord(1/4)
    
    Abstract
    Argues that in library cataloguing, and for optimal functionality of bibliographic records, the indexing of fields and subfields should follow a uniform approach. This would maintain effectiveness in searching, retrieval and display of bibliographic information both within systems and between systems. However, a review of different postings to the AUTOCAT and USMARC discussion lists indicates that the indexing and tagging of cataloguing data do not, at present, follow a consistent approach in online library systems. If the rationale of cataloguing principles is to bring uniformity to bibliographic description and effectiveness to access, they should also address the question of uniform approaches to the indexing of cataloguing data. In this context, and in terms of the identification and handling of data elements, cataloguing standards (codes, MARC formats and the Z39.50 standard) should be brought closer together, in that they should provide guidelines for the designation of data elements for machine-readable records.
  9. Oppenheim, C.: Managers' use and handling of information (1997) 0.04
    0.04291015 = product of:
      0.1716406 = sum of:
        0.1716406 = weight(_text_:handling in 1357) [ClassicSimilarity], result of:
          0.1716406 = score(doc=1357,freq=2.0), product of:
            0.4128091 = queryWeight, product of:
              6.272122 = idf(docFreq=227, maxDocs=44421)
              0.0658165 = queryNorm
            0.41578686 = fieldWeight in 1357, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.272122 = idf(docFreq=227, maxDocs=44421)
              0.046875 = fieldNorm(doc=1357)
      0.25 = coord(1/4)
    
  10. Taylor, M.J.; Mortimer, A.M.; Addison, M.A.; Turner, M.C.R.: 'NESS-plants' : an interactive multi-media information system for botanic gardens (1994) 0.04
    0.04291015 = product of:
      0.1716406 = sum of:
        0.1716406 = weight(_text_:handling in 3733) [ClassicSimilarity], result of:
          0.1716406 = score(doc=3733,freq=2.0), product of:
            0.4128091 = queryWeight, product of:
              6.272122 = idf(docFreq=227, maxDocs=44421)
              0.0658165 = queryNorm
            0.41578686 = fieldWeight in 3733, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.272122 = idf(docFreq=227, maxDocs=44421)
              0.046875 = fieldNorm(doc=3733)
      0.25 = coord(1/4)
    
    Abstract
    Multimedia techniques facilitate simplified access to information databases containing diverse types of data, ranging from simple text to audio and pictorial information. The design of such systems focuses on the user interface, with particular emphasis on navigation through the information base. The paper describes the enhancement of an interactive multimedia information system developed at the University of Liverpool for use by the general public in the University's Botanic Gardens at Ness. The original system consists of a plant record management system for handling textual and graphical information, and a library of pictures; a task oriented user interface providing flexible interrogation of the held information; and independent access facilities for casual visitors (i.e. the general public) and professional curators (i.e. the garden staff). A novel feature of the general public's interaction is the ability to compose complex queries visually, using multimedia techniques. These locate individual plant records in the context of an on-screen map which represents the geographic layout of the garden.
  11. Cousins, S.A.: Duplicate detection and record consolidation in large bibliographic databases : the COPAC database experience (1998) 0.04
    0.04291015 = product of:
      0.1716406 = sum of:
        0.1716406 = weight(_text_:handling in 3833) [ClassicSimilarity], result of:
          0.1716406 = score(doc=3833,freq=2.0), product of:
            0.4128091 = queryWeight, product of:
              6.272122 = idf(docFreq=227, maxDocs=44421)
              0.0658165 = queryNorm
            0.41578686 = fieldWeight in 3833, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.272122 = idf(docFreq=227, maxDocs=44421)
              0.046875 = fieldNorm(doc=3833)
      0.25 = coord(1/4)
    
    Abstract
    COPAC (CURL OPAC) is a union catalogue, based on records supplied by members of the Consortium of University Libraries (CURL), giving access to the online catalogue records of some of the largest academic research libraries in the UK and Ireland. Like all union catalogues, COPAC is supplied with multiple copies of records representing the same document in the contributing library catalogues. To reduce the level of duplication visible to the COPAC user, duplicate detection and record consolidation procedures have been developed. These result in the production of a single record for each document, representing the holdings of several libraries. Discusses the ways in which both the duplicate detection and record consolidation procedures are carried out, and problem areas encountered. Describes the general structure of these procedures, providing a model of the duplicate record handling mechanisms used in COPAC
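    As a toy illustration of the general approach (and only that; COPAC's actual matching rules are not reproduced here), duplicate candidates can be grouped on a normalised match key built from fields that tend to agree across contributing catalogues, and each group consolidated into one record listing all holdings:

    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    public class DuplicateGrouping {
        record BibRecord(String library, String title, String author, String year) {}

        // Crude match key from fields that tend to agree across catalogues
        // (a hypothetical rule, not COPAC's actual consolidation logic).
        static String matchKey(BibRecord r) {
            return (r.title() + "|" + r.author() + "|" + r.year())
                    .toLowerCase().replaceAll("[^a-z0-9|]", "");
        }

        public static void main(String[] args) {
            List<BibRecord> records = List.of(
                    new BibRecord("LibA", "How the Internet Works", "Eddings, J.", "1994"),
                    new BibRecord("LibB", "How the Internet works.", "Eddings, J", "1994"),
                    new BibRecord("LibC", "Rule induction with extension matrices", "Wu, X.", "1998"));

            // Records sharing a key are consolidated into one entry listing every holding library.
            Map<String, List<String>> consolidated = records.stream()
                    .collect(Collectors.groupingBy(DuplicateGrouping::matchKey,
                            Collectors.mapping(BibRecord::library, Collectors.toList())));
            consolidated.forEach((key, libs) -> System.out.println(key + " -> " + libs));
        }
    }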
  12. Wu, X.: Rule induction with extension matrices (1998) 0.04
    0.04291015 = product of:
      0.1716406 = sum of:
        0.1716406 = weight(_text_:handling in 3912) [ClassicSimilarity], result of:
          0.1716406 = score(doc=3912,freq=2.0), product of:
            0.4128091 = queryWeight, product of:
              6.272122 = idf(docFreq=227, maxDocs=44421)
              0.0658165 = queryNorm
            0.41578686 = fieldWeight in 3912, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.272122 = idf(docFreq=227, maxDocs=44421)
              0.046875 = fieldNorm(doc=3912)
      0.25 = coord(1/4)
    
    Abstract
    Presents a heuristic, attribute-based, noise-tolerant data mining program, HCV (Version 2.0), based on the newly-developed extension matrix approach. Gives a simple example of attribute-based induction to show the difference between the rules in variable-valued logic produced by HCV, the decision tree generated by C4.5, and the rules decompiled from that tree by C4.5rules. Outlines the extension matrix approach for data mining. Describes the HCV algorithm in detail. Outlines techniques developed and implemented in the HCV program for noise handling and discretization of continuous domains respectively. Follows these with a performance comparison of HCV with well-known ID3-like algorithms, including C4.5 and C4.5rules, on a collection of standard databases including the famous MONK's problems.
  13. Schupbach, W.: ¬The Iconographic Collections Videodisc at the Wellcome Institute for the History of Medicine, London (1994) 0.04
    0.04291015 = product of:
      0.1716406 = sum of:
        0.1716406 = weight(_text_:handling in 6363) [ClassicSimilarity], result of:
          0.1716406 = score(doc=6363,freq=2.0), product of:
            0.4128091 = queryWeight, product of:
              6.272122 = idf(docFreq=227, maxDocs=44421)
              0.0658165 = queryNorm
            0.41578686 = fieldWeight in 6363, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.272122 = idf(docFreq=227, maxDocs=44421)
              0.046875 = fieldNorm(doc=6363)
      0.25 = coord(1/4)
    
    Abstract
    Many libraries, museums, art galleries, and heritage centres are thinking of using electronic media to help them to achieve their aims, and some have started to put these plans into action. One such project is the Iconographic Collections Videodisc, produced by the Wellcome Institute for the History of Medicine Library, London, UK, and available free of charge to the public at the Wellcome Institute building since Jun 93. The Iconographic Collections consist of large collections of prints, drawings, paintings and photographs, covering a range of subjects, including the history of medicine as the central theme. The aims of the project are: to preserve the collections from the damage caused by avoidable exposure to light and handling; and to make available to users all the items in the library. The videodisc performs the function of a massive, illustrated catalogue. Includes examples of the images stored on the videodisc and the catalogue records held on the system.
  14. Ioannides, D.: XML schema languages : beyond DTD (2000) 0.04
    0.04291015 = product of:
      0.1716406 = sum of:
        0.1716406 = weight(_text_:handling in 1720) [ClassicSimilarity], result of:
          0.1716406 = score(doc=1720,freq=2.0), product of:
            0.4128091 = queryWeight, product of:
              6.272122 = idf(docFreq=227, maxDocs=44421)
              0.0658165 = queryNorm
            0.41578686 = fieldWeight in 1720, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.272122 = idf(docFreq=227, maxDocs=44421)
              0.046875 = fieldNorm(doc=1720)
      0.25 = coord(1/4)
    
    Abstract
    The flexibility and extensibility of XML have largely contributed to its wide acceptance beyond the traditional realm of SGML. Yet, there is still one more obstacle to be overcome before XML can become the evangelized universal data/document format. The obstacle is posed by the limitations of the legacy standard for constraining the contents of an XML document. The traditionally used DTD (document type definition) format does not lend itself to the wide variety of applications XML is capable of handling. The World Wide Web Consortium (W3C) has charged the XML Schema working group with the task of developing a schema language to replace DTD. This XML schema language is evolving based on early drafts of XML schema languages. Each one of these early efforts adopted a slightly different approach, but all of them were moving in the same direction.
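    The schema language that eventually emerged from that work, W3C XML Schema (XSD), fills the validation role DTDs played; a minimal sketch using the JDK's standard validation API, with purely illustrative file names, looks like this:

    import java.io.File;
    import javax.xml.XMLConstants;
    import javax.xml.transform.stream.StreamSource;
    import javax.xml.validation.Schema;
    import javax.xml.validation.SchemaFactory;
    import javax.xml.validation.Validator;

    public class SchemaValidation {
        public static void main(String[] args) throws Exception {
            // Load an XML Schema (XSD) rather than a DTD to constrain document contents.
            SchemaFactory factory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new File("records.xsd"));    // illustrative name

            // Validate an instance document against the schema; throws on violations.
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new File("records.xml"))); // illustrative name
            System.out.println("records.xml is valid against records.xsd");
        }
    }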
  15. L'Homme, D.; L'Homme, M.-C.; Lemay, C.: Benchmarking the performance of two Part-of-Speech (POS) taggers for terminological purposes (2002) 0.04
    0.04291015 = product of:
      0.1716406 = sum of:
        0.1716406 = weight(_text_:handling in 2855) [ClassicSimilarity], result of:
          0.1716406 = score(doc=2855,freq=2.0), product of:
            0.4128091 = queryWeight, product of:
              6.272122 = idf(docFreq=227, maxDocs=44421)
              0.0658165 = queryNorm
            0.41578686 = fieldWeight in 2855, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.272122 = idf(docFreq=227, maxDocs=44421)
              0.046875 = fieldNorm(doc=2855)
      0.25 = coord(1/4)
    
    Abstract
    Part-of-Speech (POS) taggers are used in an increasing number of terminology applications. However, terminologists do not know exactly how they perform on specialized texts, since most POS taggers have been trained on "general" corpora, that is, corpora containing all sorts of undifferentiated texts. In this article, we evaluate the performance of two POS taggers on French and English medical texts. The taggers are TnT (a statistical tagger developed at Saarland University (Brants 2000)) and WinBrill (the Windows version of the tagger initially developed by Eric Brill (1992)). Ten extracts from medical texts were submitted to the taggers and the outputs scanned manually. Results pertain to the accuracy of tagging in terms of correctly and incorrectly tagged words. We also study the handling of unknown words from different viewpoints.
  16. Setting the record straight : understanding the MARC format (1993) 0.04
    0.04291015 = product of:
      0.1716406 = sum of:
        0.1716406 = weight(_text_:handling in 3327) [ClassicSimilarity], result of:
          0.1716406 = score(doc=3327,freq=2.0), product of:
            0.4128091 = queryWeight, product of:
              6.272122 = idf(docFreq=227, maxDocs=44421)
              0.0658165 = queryNorm
            0.41578686 = fieldWeight in 3327, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.272122 = idf(docFreq=227, maxDocs=44421)
              0.046875 = fieldNorm(doc=3327)
      0.25 = coord(1/4)
    
    Abstract
    MARC is an acronym for Machine Readable Catalogue or Cataloguing. This general description, however, is rather misleading, as MARC is neither a kind of catalogue nor a method of cataloguing. In fact, MARC is a standard format for representing bibliographic information for handling by computer. While the MARC format was primarily designed to serve the needs of libraries, the concept has since been embraced by the wider information community as a convenient way of storing and exchanging bibliographic data. The original MARC format was developed at the Library of Congress in 1965-6, leading to a pilot project, known as MARC I, which had the aim of investigating the feasibility of producing machine-readable catalogue data. Similar work was in progress in the United Kingdom, where the Council of the British National Bibliography had set up the BNB MARC Project with the remit of examining the use of machine-readable data in producing the printed British National Bibliography (BNB). These parallel developments led to Anglo-American co-operation on the MARC II project, which was initiated in 1968. MARC II was to prove instrumental in defining the concept of MARC as a communications format.
  17. Wu, K.J.; Chen, M.-C.; Sun, Y.: Automatic topics discovery from hyperlinked documents (2004) 0.04
    0.04291015 = product of:
      0.1716406 = sum of:
        0.1716406 = weight(_text_:handling in 3563) [ClassicSimilarity], result of:
          0.1716406 = score(doc=3563,freq=2.0), product of:
            0.4128091 = queryWeight, product of:
              6.272122 = idf(docFreq=227, maxDocs=44421)
              0.0658165 = queryNorm
            0.41578686 = fieldWeight in 3563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.272122 = idf(docFreq=227, maxDocs=44421)
              0.046875 = fieldNorm(doc=3563)
      0.25 = coord(1/4)
    
    Abstract
    Topic discovery is an important means for marketing, e-Business and social science studies. As well, it can be applied to various purposes, such as identifying a group with certain properties and observing the emergence and diminishment of a certain cyber community. Previous topic discovery work (J.M. Kleinberg, Proceedings of the 9th Annual ACM-SIAM Symposium on Discrete Algorithms, San Francisco, California, p. 668) requires manual judgment of usefulness of outcomes and is thus incapable of handling the explosive growth of the Internet. In this paper, we propose the Automatic Topic Discovery (ATD) method, which combines a method of base set construction, a clustering algorithm and an iterative principal eigenvector computation method to discover the topics relevant to a given query without using manual examination. Given a query, ATD returns with topics associated with the query and top representative pages for each topic. Our experiments show that the ATD method performs better than the traditional eigenvector method in terms of computation time and topic discovery quality.
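    The "iterative principal eigenvector computation" mentioned above is, in the HITS tradition the paper builds on, typically a power iteration over the link matrix; the generic sketch below uses a made-up four-page adjacency matrix, not data from the paper:

    public class PowerIteration {
        public static void main(String[] args) {
            // Hypothetical link matrix: m[i][j] = 1 if page i links to page j.
            double[][] m = {
                    {0, 1, 1, 0},
                    {1, 0, 1, 0},
                    {1, 1, 0, 1},
                    {0, 0, 1, 0}};
            int n = m.length;
            double[] v = new double[n];
            java.util.Arrays.fill(v, 1.0);

            // Repeatedly apply A^T A (the HITS authority update) and renormalise;
            // v converges to the principal eigenvector.
            for (int iter = 0; iter < 50; iter++) {
                double[] hub = multiply(m, v);               // hub  = A * authority
                double[] auth = multiplyTransposed(m, hub);  // auth = A^T * hub
                double norm = 0;
                for (double x : auth) norm += x * x;
                norm = Math.sqrt(norm);
                for (int i = 0; i < n; i++) v[i] = auth[i] / norm;
            }
            System.out.println(java.util.Arrays.toString(v));
        }

        static double[] multiply(double[][] a, double[] x) {
            double[] y = new double[x.length];
            for (int i = 0; i < a.length; i++)
                for (int j = 0; j < x.length; j++) y[i] += a[i][j] * x[j];
            return y;
        }

        static double[] multiplyTransposed(double[][] a, double[] x) {
            double[] y = new double[x.length];
            for (int i = 0; i < a.length; i++)
                for (int j = 0; j < x.length; j++) y[j] += a[i][j] * x[i];
            return y;
        }
    }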
  18. McCallum, S.H.: Preservation metadata standards for digital resources : what we have and what we need (2005) 0.04
    0.04291015 = product of:
      0.1716406 = sum of:
        0.1716406 = weight(_text_:handling in 5353) [ClassicSimilarity], result of:
          0.1716406 = score(doc=5353,freq=2.0), product of:
            0.4128091 = queryWeight, product of:
              6.272122 = idf(docFreq=227, maxDocs=44421)
              0.0658165 = queryNorm
            0.41578686 = fieldWeight in 5353, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.272122 = idf(docFreq=227, maxDocs=44421)
              0.046875 = fieldNorm(doc=5353)
      0.25 = coord(1/4)
    
    Abstract
    A key component for the successful preservation of digital resources is going to be the metadata that enables automated preservation processes to take place. The number of digital items will preclude human handling and the fact that these resources are electronic makes them logical for computer driven preservation activities. Over the last decade there have been a number of digital repository experiments that took different approaches, developed and used different data models, and generally moved our understanding forward. This paper reports on a recent initiative, PREMIS, that builds upon concepts and experience to date. It merits careful testing to see if the metadata identified can be used generally and become a foundation for more detailed metadata. And how much more will be needed for preservation activities? Initiatives for additional technical metadata and document format registries are also discussed.
  19. Pirkola, A.: Morphological typology of languages for IR (2001) 0.04
    0.04291015 = product of:
      0.1716406 = sum of:
        0.1716406 = weight(_text_:handling in 5476) [ClassicSimilarity], result of:
          0.1716406 = score(doc=5476,freq=2.0), product of:
            0.4128091 = queryWeight, product of:
              6.272122 = idf(docFreq=227, maxDocs=44421)
              0.0658165 = queryNorm
            0.41578686 = fieldWeight in 5476, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.272122 = idf(docFreq=227, maxDocs=44421)
              0.046875 = fieldNorm(doc=5476)
      0.25 = coord(1/4)
    
    Abstract
    This paper presents a morphological classification of languages from the IR perspective. Linguistic typology research has shown that the morphological complexity of every language in the world can be described by two variables, index of synthesis and index of fusion. These variables provide a theoretical basis for IR research handling morphological issues. A common theoretical framework is needed in particular because of the increasing significance of cross-language retrieval research and CLIR systems processing different languages. The paper elaborates the linguistic morphological typology for the purposes of IR research. It studies how the indexes of synthesis and fusion could be used as practical tools in mono- and cross-lingual IR research. The need for semantic and syntactic typologies is discussed. The paper also reviews studies made in different languages on the effects of morphology and stemming in IR.
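    To make the typological variables concrete: the index of synthesis is conventionally the average number of morphemes per word in a text sample (assuming Greenberg's formulation), so the computation, sketched here with a hand-segmented toy sample rather than data from the paper, is simply:

    import java.util.List;

    public class SynthesisIndex {
        public static void main(String[] args) {
            // Toy sample: each inner list is one word split into its morphemes by hand.
            List<List<String>> words = List.of(
                    List.of("cat", "s"),             // cats = 2 morphemes
                    List.of("run"),                  // run  = 1 morpheme
                    List.of("un", "believ", "able")  // unbelievable = 3 morphemes
            );
            int morphemes = words.stream().mapToInt(List::size).sum();
            double indexOfSynthesis = (double) morphemes / words.size(); // 6 / 3 = 2.0
            System.out.println("Index of synthesis: " + indexOfSynthesis);
        }
    }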
  20. Conrad, J.G.; Schriber, C.P.: Managing déjà vu : collection building for the identification of nonidentical duplicate documents (2006) 0.04
    0.04291015 = product of:
      0.1716406 = sum of:
        0.1716406 = weight(_text_:handling in 59) [ClassicSimilarity], result of:
          0.1716406 = score(doc=59,freq=2.0), product of:
            0.4128091 = queryWeight, product of:
              6.272122 = idf(docFreq=227, maxDocs=44421)
              0.0658165 = queryNorm
            0.41578686 = fieldWeight in 59, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.272122 = idf(docFreq=227, maxDocs=44421)
              0.046875 = fieldNorm(doc=59)
      0.25 = coord(1/4)
    
    Abstract
    As online document collections continue to expand, both on the Web and in proprietary environments, the need for duplicate detection becomes more critical. Few users wish to retrieve search results consisting of sets of duplicate documents, whether identical duplicates or close variants. The goal of this work is to facilitate (a) investigations into the phenomenon of near duplicates and (b) algorithmic approaches to minimizing its deleterious effect on search results. Harnessing the expertise of both client-users and professional searchers, we establish principled methods to generate a test collection for identifying and handling nonidentical duplicate documents. We subsequently examine a flexible method of characterizing and comparing documents to permit the identification of near duplicates. This method has produced promising results following an extensive evaluation using a production-based test collection created by domain experts.
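    One common way to characterise and compare documents for near-duplicate detection is shingling with Jaccard resemblance; the sketch below illustrates that generic technique only and is not the authors' specific method:

    import java.util.HashSet;
    import java.util.Set;

    public class NearDuplicate {
        // Character n-gram "shingles" as a crude document fingerprint.
        static Set<String> shingles(String text, int n) {
            Set<String> out = new HashSet<>();
            String t = text.toLowerCase().replaceAll("\\s+", " ");
            for (int i = 0; i + n <= t.length(); i++) out.add(t.substring(i, i + n));
            return out;
        }

        // Jaccard resemblance: |A intersect B| / |A union B|.
        static double jaccard(Set<String> a, Set<String> b) {
            Set<String> inter = new HashSet<>(a); inter.retainAll(b);
            Set<String> union = new HashSet<>(a); union.addAll(b);
            return union.isEmpty() ? 0.0 : (double) inter.size() / union.size();
        }

        public static void main(String[] args) {
            String d1 = "Duplicate detection becomes more critical as collections expand.";
            String d2 = "Duplicate detection becomes ever more critical as collections expand.";
            double sim = jaccard(shingles(d1, 5), shingles(d2, 5));
            System.out.println("Resemblance: " + sim); // close to 1.0 for near duplicates
        }
    }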

Languages

  • d 32
  • m 3
  • nl 1

Types

  • a 940
  • m 323
  • el 107
  • s 104
  • i 22
  • n 17
  • r 15
  • x 14
  • b 7
  • ? 1
  • v 1
