Search (1260 results, page 5 of 63)

  • Filter: language_ss:"e"
  1. Maislin, S.: Ripping out the pages (2000) 0.05
    0.04711922 = product of:
      0.18847688 = sum of:
        0.18847688 = weight(_text_:hyperlinks in 1220) [ClassicSimilarity], result of:
          0.18847688 = score(doc=1220,freq=2.0), product of:
            0.46692044 = queryWeight, product of:
              7.3070183 = idf(docFreq=80, maxDocs=44421)
              0.06390027 = queryNorm
            0.40365952 = fieldWeight in 1220, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              7.3070183 = idf(docFreq=80, maxDocs=44421)
              0.0390625 = fieldNorm(doc=1220)
      0.25 = coord(1/4)
    
    Abstract
    When the Web was invented, it was touted as a novel nonlinear medium for the written word. No longer would we be constrained by linear presentations! Hyperlinks would allow us to jump haphazardly from page to page, chapter to chapter, idea to idea! Texts would no longer need to run from beginning to end! This is misleading. A printed book is also multidimensional and potentially nonlinear. We can open it to any page, from any other page, for any reason. We can open several books at once. In fact, what makes a book special is its combination of linear structure (the order of the words) and nonlinear physicality (the bound papers). This linear/nonlinear duality is enhanced further by the index, which maps linearly sequenced pages in a nonlinear, informationally ordered structure (architecture). In truth, the online environment is crippled by an absence of linear structure. Imagine selecting a hard cover book, tearing off the covers, ripping pages into small pieces, and throwing them in a box. That box is like a computer file system, and the paper scraps are Web documents. Only one scrap can be retrieved from the box at a time, and it must be replaced before another can be accessed. Page numbers are meaningless. Global context is destroyed. And without page numbers or context, what happens to the index?
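    The relevance values shown throughout this list follow the Lucene ClassicSimilarity (TF-IDF) breakdowns printed under each entry. As a minimal sketch (illustrative, not Lucene's own code), the arithmetic of the first entry can be reproduced from the listed values; the helper function name is ours:

    ```python
    # Minimal sketch of the ClassicSimilarity arithmetic in the score explanations above.
    # idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq) reproduce the listed
    # 7.3070183 (docFreq=80, maxDocs=44421) and 1.4142135 (freq=2.0).
    import math

    def classic_similarity(freq, doc_freq, max_docs, query_norm, field_norm, coord):
        tf = math.sqrt(freq)                              # 1.4142135
        idf = 1.0 + math.log(max_docs / (doc_freq + 1))   # 7.3070183
        query_weight = idf * query_norm                   # 0.46692044
        field_weight = tf * idf * field_norm              # 0.40365952
        return coord * query_weight * field_weight        # 0.25 * 0.18847688

    score = classic_similarity(freq=2.0, doc_freq=80, max_docs=44421,
                               query_norm=0.06390027, field_norm=0.0390625, coord=0.25)
    print(round(score, 8))   # ~0.04711922, the score shown for result 1
    ```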
  2. Amitay, E.; Carmel, D.; Herscovici, M.; Lempel, R.; Soffer, A.: Trend detection through temporal link analysis (2004) 0.05
    0.04711922 = product of:
      0.18847688 = sum of:
        0.18847688 = weight(_text_:hyperlinks in 4092) [ClassicSimilarity], result of:
          0.18847688 = score(doc=4092,freq=2.0), product of:
            0.46692044 = queryWeight, product of:
              7.3070183 = idf(docFreq=80, maxDocs=44421)
              0.06390027 = queryNorm
            0.40365952 = fieldWeight in 4092, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              7.3070183 = idf(docFreq=80, maxDocs=44421)
              0.0390625 = fieldNorm(doc=4092)
      0.25 = coord(1/4)
    
    Abstract
    Although time has been recognized as an important dimension in the co-citation literature, to date it has not been incorporated into the analogous process of link analysis on the Web. In this paper, we discuss several aspects and uses of the time dimension in the context of Web information retrieval. We describe the ideal case where search engines track and store temporal data for each of the pages in their repository, assigning timestamps to the hyperlinks embedded within the pages. We introduce several applications which benefit from the availability of such timestamps. To demonstrate our claims, we use a somewhat simplistic approach, which dates links by approximating the age of the page's content. We show that by using this crude measure alone it is possible to detect and expose significant events and trends. We predict that by using more robust methods for tracking modifications in the content of pages, search engines will be able to provide results that are more timely and better reflect current real-life trends than those they provide today.
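    A minimal sketch of the kind of timestamped-link aggregation the abstract describes; the URLs, dates, and bucketing choice are invented for illustration, not the authors' data or code:

    ```python
    # Hypothetical sketch: count inlinks to each target page per time bucket, so a
    # sudden jump between buckets can surface an emerging trend.
    from collections import Counter, defaultdict
    from datetime import date

    # (source_url, target_url, approximate_link_date) - all values made up
    links = [
        ("a.example/p1", "t.example/topic", date(2003, 1, 10)),
        ("b.example/p2", "t.example/topic", date(2003, 11, 3)),
        ("c.example/p3", "t.example/topic", date(2003, 11, 20)),
        ("d.example/p4", "t.example/topic", date(2003, 12, 1)),
    ]

    inlinks_per_quarter = defaultdict(Counter)
    for _, target, when in links:
        quarter = (when.year, (when.month - 1) // 3 + 1)
        inlinks_per_quarter[target][quarter] += 1

    for target, counts in inlinks_per_quarter.items():
        print(target, sorted(counts.items()))   # growth across quarters hints at a trend
    ```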
  3. Daizadeh, I.: ¬An example of information management in biology : qualitative data economizing theory applied to the Human Genome Project databases (2006) 0.05
    0.04711922 = product of:
      0.18847688 = sum of:
        0.18847688 = weight(_text_:hyperlinks in 5924) [ClassicSimilarity], result of:
          0.18847688 = score(doc=5924,freq=2.0), product of:
            0.46692044 = queryWeight, product of:
              7.3070183 = idf(docFreq=80, maxDocs=44421)
              0.06390027 = queryNorm
            0.40365952 = fieldWeight in 5924, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              7.3070183 = idf(docFreq=80, maxDocs=44421)
              0.0390625 = fieldNorm(doc=5924)
      0.25 = coord(1/4)
    
    Abstract
    Ironically, although much work has been done on elucidating algorithms for enabling scientists to efficiently retrieve relevant information from the glut of data derived from the efforts of the Human Genome Project and other similar projects, little has been done on optimizing the levels of data economy across databases. One technique to qualify the degree of data economization is that constructed by Boisot. Boisot's Information Space (I-Space) takes into account the degree to which data are written (codification), the degree to which the data can be understood (abstraction), and the degree to which the data are effectively communicated to an audience (diffusion). A data system is said to be more data economical if it is relatively high in these dimensions. Application of the approach to entries in two popular, publicly available biological data repositories, the Protein DataBank (PDB) and GenBank, leads to the recommendation that PDB increase its level of abstraction through establishing a larger set of detailed keywords, diffusion through constructing hyperlinks to other databases, and codification through constructing additional subsections. With these recommendations in place, PDB would achieve the greater data economies currently enjoyed by GenBank. A discussion of the limitations of the approach is presented.
  4. Egghe, L.; Rousseau, R.: ¬A measure for the cohesion of weighted networks (2003) 0.05
    0.04711922 = product of:
      0.18847688 = sum of:
        0.18847688 = weight(_text_:hyperlinks in 157) [ClassicSimilarity], result of:
          0.18847688 = score(doc=157,freq=2.0), product of:
            0.46692044 = queryWeight, product of:
              7.3070183 = idf(docFreq=80, maxDocs=44421)
              0.06390027 = queryNorm
            0.40365952 = fieldWeight in 157, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              7.3070183 = idf(docFreq=80, maxDocs=44421)
              0.0390625 = fieldNorm(doc=157)
      0.25 = coord(1/4)
    
    Abstract
    Measurement of the degree of interconnectedness in graph-like networks of hyperlinks or citations can indicate the existence of research fields and assist in comparative evaluation of research efforts. In this issue we begin with Egghe and Rousseau, who review compactness measures and investigate the compactness of a network as a weighted graph with dissimilarity values characterizing the arcs between nodes. They make use of a generalization of the Botafogo, Rivlin, and Shneiderman (BRS) compactness measure, which treats the distance between unreachable nodes not as infinity but rather as the number of nodes in the network. The dissimilarity values are determined by summing the reciprocals of the weights of the arcs in the shortest chain between two nodes, where no weight is smaller than one. The BRS measure is then the maximum value for the sum of the dissimilarity measures less the actual sum, divided by the difference between the maximum and minimum. The Wiener index, the sum of all elements in the dissimilarity matrix divided by two, is then computed for Small's particle physics co-citation data, as well as the BRS measure, the dissimilarity values, and shortest paths. The compactness measure for the weighted network is smaller than for the unweighted one. When the bibliographic coupling network is utilized, it is shown to be less compact than the co-citation network, which indicates that the new measure produces results that conform to an obvious case.
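    An illustrative sketch of a BRS-style compactness for a weighted network, under stated assumptions: arc dissimilarity is the reciprocal of the arc weight, shortest chains come from Floyd-Warshall, unreachable pairs count as n (the number of nodes), and Max/Min follow the unweighted BRS convention. The toy network, weights, and function name are invented; the original paper's generalization may differ in detail:

    ```python
    # Sketch of a BRS-style compactness and Wiener index for a weighted digraph.
    def brs_compactness(weights):
        nodes = sorted({u for u, _ in weights} | {v for _, v in weights})
        n = len(nodes)
        idx = {v: i for i, v in enumerate(nodes)}
        inf = float("inf")
        d = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]
        for (u, v), w in weights.items():
            d[idx[u]][idx[v]] = min(d[idx[u]][idx[v]], 1.0 / w)   # reciprocal of arc weight
        for k in range(n):                                        # Floyd-Warshall shortest chains
            for i in range(n):
                for j in range(n):
                    if d[i][k] + d[k][j] < d[i][j]:
                        d[i][j] = d[i][k] + d[k][j]
        total = sum(n if d[i][j] == inf else d[i][j]              # unreachable pairs count as n
                    for i in range(n) for j in range(n) if i != j)
        wiener = total / 2.0                                      # sum of the matrix over two
        max_sum, min_sum = n * (n - 1) * n, n * (n - 1)
        # with dissimilarities below one this simplified Min can yield values above 1
        return (max_sum - total) / (max_sum - min_sum), wiener

    # toy directed network with arc weights >= 1 (values are made up)
    example = {("a", "b"): 2.0, ("b", "c"): 1.0, ("c", "a"): 4.0}
    print(brs_compactness(example))
    ```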
  5. Vaughan, L.; Shaw, D.: Bibliographic and Web citations : what is the difference? (2003) 0.05
    0.04711922 = product of:
      0.18847688 = sum of:
        0.18847688 = weight(_text_:hyperlinks in 176) [ClassicSimilarity], result of:
          0.18847688 = score(doc=176,freq=2.0), product of:
            0.46692044 = queryWeight, product of:
              7.3070183 = idf(docFreq=80, maxDocs=44421)
              0.06390027 = queryNorm
            0.40365952 = fieldWeight in 176, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              7.3070183 = idf(docFreq=80, maxDocs=44421)
              0.0390625 = fieldNorm(doc=176)
      0.25 = coord(1/4)
    
    Abstract
    Vaughan and Shaw look at the relationship between traditional citation and Web citation (not hyperlinks but rather textual mentions of published papers). Using English-language research journals in ISI's 2000 Journal Citation Report - Information and Library Science category - 1,209 full-length papers published in 1997 in 46 journals were identified. Each was searched in the Social Science Citation Index and on the Web using a Google phrase search by entering the title in quotation marks, followed where necessary for disambiguation by subtitles, authors' names, and journal title words. After removing obvious false drops, the number of web sites was recorded for comparison with the SSCI counts. A second sample from 1992 was also collected for examination. There were a total of 16,371 web citations to the selected papers. The top- and bottom-ranked four journals were then examined, and every third citation to every third paper was selected and classified as to source type, domain, and country of origin. Web counts are much higher than ISI citation counts. Of the 46 journals from 1997, 26 demonstrated a significant correlation between Web and traditional citation counts, and 11 of the 15 in the 1992 sample also showed significant correlation. Journal impact factor in 1998 and 1999 correlated significantly with average Web citations per journal in the 1997 data, but at a low level. Thirty percent of web citations come from other papers posted on the web, and 30 percent from listings of web-based bibliographic services, while 12 percent come from class reading lists. High-web-citation journals often have web-accessible tables of contents.
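    A minimal sketch of the kind of correlation test reported above; the citation counts below are invented, and statistics.correlation (Pearson's r) requires Python 3.10 or later:

    ```python
    # Hypothetical sketch: correlate Web citation counts with SSCI citation counts
    # for a set of papers. The numbers are invented, not the study's data.
    from statistics import correlation   # Pearson's r; Python 3.10+

    ssci_counts = [3, 0, 12, 7, 1, 25, 4, 9]
    web_counts  = [18, 2, 60, 30, 5, 140, 11, 44]
    print(round(correlation(ssci_counts, web_counts), 3))
    ```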
  6. Barjak, F.; Thelwall, M.: ¬A statistical analysis of the web presences of European life sciences research teams (2008) 0.05
    0.04711922 = product of:
      0.18847688 = sum of:
        0.18847688 = weight(_text_:hyperlinks in 2383) [ClassicSimilarity], result of:
          0.18847688 = score(doc=2383,freq=2.0), product of:
            0.46692044 = queryWeight, product of:
              7.3070183 = idf(docFreq=80, maxDocs=44421)
              0.06390027 = queryNorm
            0.40365952 = fieldWeight in 2383, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              7.3070183 = idf(docFreq=80, maxDocs=44421)
              0.0390625 = fieldNorm(doc=2383)
      0.25 = coord(1/4)
    
    Abstract
    Web links have been used for around ten years to explore the online impact of academic information and information producers. Nevertheless, few studies have attempted to relate link counts to relevant offline attributes of the owners of the targeted Web sites, with the exception of research productivity. This article reports the results of a study to relate site inlink counts to relevant owner characteristics for over 400 European life-science research group Web sites. The analysis confirmed that research-group size and Web-presence size were important for attracting Web links, although research productivity was not. Little evidence was found for significant influence of any of an array of factors, including research-group leader gender and industry connections. In addition, the choice of search engine for link data created a surprising international difference in the results, with Google perhaps giving unreliable results. Overall, the data collection, statistical analysis and results interpretation were all complex and it seems that we still need to know more about search engines, hyperlinks, and their function in science before we can draw conclusions on their usefulness and role in the canon of science and technology indicators.
  7. Breslin, J.G.: Social semantic information spaces (2009) 0.05
    0.04711922 = product of:
      0.18847688 = sum of:
        0.18847688 = weight(_text_:hyperlinks in 364) [ClassicSimilarity], result of:
          0.18847688 = score(doc=364,freq=2.0), product of:
            0.46692044 = queryWeight, product of:
              7.3070183 = idf(docFreq=80, maxDocs=44421)
              0.06390027 = queryNorm
            0.40365952 = fieldWeight in 364, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              7.3070183 = idf(docFreq=80, maxDocs=44421)
              0.0390625 = fieldNorm(doc=364)
      0.25 = coord(1/4)
    
    Abstract
    The structural and syntactic web put in place in the early 90s is still much the same as what we use today: resources (web pages, files, etc.) connected by untyped hyperlinks. By untyped, we mean that there is no easy way for a computer to figure out what a link between two pages means - for example, on the W3C website, there are hundreds of links to the various organisations that are registered members of the association, but there is nothing explicitly saying that the link is to an organisation that is a "member of" the W3C or what type of organisation is represented by the link. On John's work page, he links to many papers he has written, but it does not explicitly say that he is the author of those papers or that he wrote such-and-such when he was working at a particular university. In fact, the Web was envisaged to be much more, as one can see from the image in Fig. 1, which is taken from Tim Berners-Lee's original outline for the Web in 1989, entitled "Information Management: A Proposal". In this, all the resources are connected by links describing the type of relationships, e.g. "wrote", "describes", "refers to", etc. This is a precursor to the Semantic Web, which we will come back to later.
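    A minimal sketch of the contrast the abstract draws between untyped hyperlinks and typed links, using plain Python tuples; the URLs and relation names are illustrative only:

    ```python
    # Untyped hyperlink: all software can see is "this page points to that page".
    untyped_link = ("https://www.w3.org/", "https://example.org/some-member")

    # Typed links: the relationship itself is named, so software can reason about it.
    typed_links = [
        ("https://example.org/some-member", "memberOf", "https://www.w3.org/"),
        ("https://example.org/john",        "authorOf", "https://example.org/papers/p1"),
    ]

    # e.g. find everything John is recorded as having authored
    print([obj for subj, rel, obj in typed_links
           if subj.endswith("/john") and rel == "authorOf"])
    ```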
  8. Kim, J.H.; Barnett, G.A.; Park, H.W.: ¬A hyperlink and issue network analysis of the United States Senate : a rediscovery of the Web as a relational and topical medium (2010) 0.05
    0.04711922 = product of:
      0.18847688 = sum of:
        0.18847688 = weight(_text_:hyperlinks in 689) [ClassicSimilarity], result of:
          0.18847688 = score(doc=689,freq=2.0), product of:
            0.46692044 = queryWeight, product of:
              7.3070183 = idf(docFreq=80, maxDocs=44421)
              0.06390027 = queryNorm
            0.40365952 = fieldWeight in 689, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              7.3070183 = idf(docFreq=80, maxDocs=44421)
              0.0390625 = fieldNorm(doc=689)
      0.25 = coord(1/4)
    
    Abstract
    Politicians' Web sites have been considered a medium for organizing, mobilizing, and agenda-setting, but extant literature lacks a systematic approach to interpreting the Web sites of senators - a new medium for political communication. This study classifies the role of political Web sites into relational (hyperlinking) and topical (shared-issues) aspects. The two aspects may be viewed from a social embeddedness perspective and three facets, as K. Foot and S. Schneider (2002) suggested. This study employed network analysis, a set of research procedures for identifying structures in social systems based on the relations among the system's components rather than on the attributes of individuals. Hyperlink and issue data were gathered from the United States Senate Web site and Yahoo. Major findings include: (a) The hyperlinks are more targeted at Democratic senators than at Republicans and are a means of communication for senators and users; (b) the issue network found from the Web is used for discussing public agendas and is more highly utilized by Republican senators; (c) the hyperlink and issue networks are correlated; and (d) social relationships and issue ecologies can be effectively detected by these two networks. The need for further research is addressed.
  9. Kipp, M.E.I.; Campbell, D.G.: Searching with tags : do tags help users find things? (2010) 0.05
    0.04711922 = product of:
      0.18847688 = sum of:
        0.18847688 = weight(_text_:hyperlinks in 64) [ClassicSimilarity], result of:
          0.18847688 = score(doc=64,freq=2.0), product of:
            0.46692044 = queryWeight, product of:
              7.3070183 = idf(docFreq=80, maxDocs=44421)
              0.06390027 = queryNorm
            0.40365952 = fieldWeight in 64, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              7.3070183 = idf(docFreq=80, maxDocs=44421)
              0.0390625 = fieldNorm(doc=64)
      0.25 = coord(1/4)
    
    Abstract
    The question of whether tags can be useful in the process of information retrieval was examined in this pilot study. Many tags are subject related and could work well as index terms or entry vocabulary; however, folksonomies also include relationships that are traditionally not included in controlled vocabularies, including affective or time- and task-related tags and the user name of the tagger. Participants searched a social bookmarking tool specialising in academic articles (CiteULike) and an online journal database (PubMed) for articles relevant to a given information request. Screen capture software was used to collect participant actions, and a semi-structured interview asked them to describe their search process. Preliminary results showed that participants did use tags in their search process, as a guide to searching and as hyperlinks to potentially useful articles. However, participants also used controlled vocabularies in the journal database to locate useful search terms and links to related articles supplied by PubMed. Additionally, participants reported using user names of taggers and group names to help select resources by relevance. The inclusion of subjective and social information from the taggers is very different from the traditional objectivity of indexing and was reported as an asset by a number of participants. This study suggests that while users value social and subjective factors when searching, they also find utility in objective factors such as subject headings. Most importantly, users are interested in the ability of systems to connect them with related articles, whether via subject access or other means.
  10. Lopes Martins, D.; Silva Lemos, D.L. da; Rosa de Oliveira, L.F.; Siqueira, J.; Carmo, D. do; Nunes Medeiros, V.: Information organization and representation in digital cultural heritage in Brazil : systematic mapping of information infrastructure in digital collections for data science applications (2023) 0.05
    0.04711922 = product of:
      0.18847688 = sum of:
        0.18847688 = weight(_text_:hyperlinks in 1970) [ClassicSimilarity], result of:
          0.18847688 = score(doc=1970,freq=2.0), product of:
            0.46692044 = queryWeight, product of:
              7.3070183 = idf(docFreq=80, maxDocs=44421)
              0.06390027 = queryNorm
            0.40365952 = fieldWeight in 1970, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              7.3070183 = idf(docFreq=80, maxDocs=44421)
              0.0390625 = fieldNorm(doc=1970)
      0.25 = coord(1/4)
    
    Abstract
    This paper focuses on data science in digital cultural heritage in Brazil, where there is a lack of systematized information and curated databases for the integrated organization of documentary knowledge. Thus, the aim was to systematically map the different forms of information organization and representation applied to objects from collections belonging to institutions affiliated with the federal government's Special Department of Culture. This diagnosis is then used to discuss the requirements for devising strategies that favor a better data science information infrastructure to reuse information on Brazil's cultural heritage. Content analysis was used to identify analytical categories and obtain a broader understanding of the documentary sources of these institutions in order to extract, analyze, and interpret the data involved. A total of 215 hyperlinks that can be considered cultural collections of the institutions studied were identified, representing 2,537,921 cultural heritage items. The results show that the online publication of Brazil's digital cultural heritage is limited in terms of technology, copyright licensing, and established information organization practices. This paper provides a conceptual and analytical view to discuss the requirements for formulating strategies aimed at building a data science information infrastructure of Brazilian digital cultural collections that can serve as a basis for future projects.
  11. Chen, H.; Chung, Y.-M.; Ramsey, M.; Yang, C.C.: ¬A smart itsy bitsy spider for the Web (1998) 0.04
    0.04383175 = product of:
      0.175327 = sum of:
        0.175327 = weight(_text_:java in 1871) [ClassicSimilarity], result of:
          0.175327 = score(doc=1871,freq=2.0), product of:
            0.45033762 = queryWeight, product of:
              7.0475073 = idf(docFreq=104, maxDocs=44421)
              0.06390027 = queryNorm
            0.38932347 = fieldWeight in 1871, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              7.0475073 = idf(docFreq=104, maxDocs=44421)
              0.0390625 = fieldNorm(doc=1871)
      0.25 = coord(1/4)
    
    Abstract
    As part of the ongoing Illinois Digital Library Initiative project, this research proposes an intelligent agent approach to Web searching. In this experiment, we developed 2 Web personal spiders based on best first search and genetic algorithm techniques, respectively. These personal spiders can dynamically take a user's selected starting homepages and search for the most closely related homepages on the Web, based on the links and keyword indexing. A graphical, dynamic, Java-based interface was developed and is available for Web access. A system architecture for implementing such an agent-spider is presented, followed by detailed discussions of benchmark testing and user evaluation results. In benchmark testing, although the genetic algorithm spider did not outperform the best first search spider, we found both results to be comparable and complementary. In user evaluation, the genetic algorithm spider obtained a significantly higher recall value than that of the best first search spider. However, their precision values were not statistically different. The mutation process introduced in genetic algorithms allows users to find other potential relevant homepages that cannot be explored via a conventional local search process. In addition, we found the Java-based interface to be a necessary component for design of a truly interactive and dynamic Web agent.
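    An illustrative best-first crawl in the spirit of the personal spider described above; a sketch, not the authors' implementation. fetch_links and relevance are placeholder functions, and the toy link graph and scores are invented:

    ```python
    # Best-first spider sketch: keep a priority queue of candidate URLs ordered by a
    # relevance score and always expand the most promising one next.
    import heapq

    def best_first_crawl(seeds, fetch_links, relevance, max_pages=50):
        frontier = [(-relevance(url), url) for url in seeds]
        heapq.heapify(frontier)
        visited, results = set(), []
        while frontier and len(results) < max_pages:
            neg_score, url = heapq.heappop(frontier)
            if url in visited:
                continue
            visited.add(url)
            results.append((url, -neg_score))
            for link in fetch_links(url):          # outlinks of the current page
                if link not in visited:
                    heapq.heappush(frontier, (-relevance(link), link))
        return results

    # toy usage with a hard-coded link graph and keyword-style scores
    graph = {"seed": ["a", "b"], "a": ["c"], "b": [], "c": []}
    score = {"seed": 0.9, "a": 0.8, "b": 0.2, "c": 0.7}
    print(best_first_crawl(["seed"], graph.get, score.get))
    ```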
  12. Chen, C.: CiteSpace II : detecting and visualizing emerging trends and transient patterns in scientific literature (2006) 0.04
    0.04383175 = product of:
      0.175327 = sum of:
        0.175327 = weight(_text_:java in 272) [ClassicSimilarity], result of:
          0.175327 = score(doc=272,freq=2.0), product of:
            0.45033762 = queryWeight, product of:
              7.0475073 = idf(docFreq=104, maxDocs=44421)
              0.06390027 = queryNorm
            0.38932347 = fieldWeight in 272, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              7.0475073 = idf(docFreq=104, maxDocs=44421)
              0.0390625 = fieldNorm(doc=272)
      0.25 = coord(1/4)
    
    Abstract
    This article describes the latest development of a generic approach to detecting and visualizing emerging trends and transient patterns in scientific literature. The work makes substantial theoretical and methodological contributions to progressive knowledge domain visualization. A specialty is conceptualized and visualized as a time-variant duality between two fundamental concepts in information science: research fronts and intellectual bases. A research front is defined as an emergent and transient grouping of concepts and underlying research issues. The intellectual base of a research front is its citation and co-citation footprint in scientific literature - an evolving network of scientific publications cited by research-front concepts. Kleinberg's (2002) burst-detection algorithm is adapted to identify emergent research-front concepts. Freeman's (1979) betweenness centrality metric is used to highlight potential pivotal points of paradigm shift over time. Two complementary visualization views are designed and implemented: cluster views and time-zone views. The contributions of the approach are that (a) the nature of an intellectual base is algorithmically and temporally identified by emergent research-front terms, (b) the value of a co-citation cluster is explicitly interpreted in terms of research-front concepts, and (c) visually prominent and algorithmically detected pivotal points substantially reduce the complexity of a visualized network. The modeling and visualization process is implemented in CiteSpace II, a Java application, and applied to the analysis of two research fields: mass extinction (1981-2004) and terrorism (1990-2003). Prominent trends and pivotal points in visualized networks were verified in collaboration with domain experts, who are the authors of pivotal-point articles. Practical implications of the work are discussed. A number of challenges and opportunities for future studies are identified.
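    A toy illustration of using Freeman's betweenness centrality to flag a potential pivotal point in a small co-citation network; networkx is used here only as a stand-in (CiteSpace II itself is a Java application), and the node names and edges are invented:

    ```python
    # Flag the node with the highest betweenness centrality in a toy co-citation graph.
    import networkx as nx

    g = nx.Graph()
    g.add_edges_from([
        ("p1", "p2"), ("p2", "p3"), ("p1", "p3"),   # one co-citation cluster
        ("p3", "p4"), ("p4", "p5"),                 # p4 bridges the two clusters
        ("p5", "p6"), ("p6", "p7"), ("p5", "p7"),   # another cluster
    ])
    centrality = nx.betweenness_centrality(g)
    pivot = max(centrality, key=centrality.get)
    print(pivot, round(centrality[pivot], 3))       # p4, the bridging node
    ```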
  13. Eddings, J.: How the Internet works (1994) 0.04
    0.04383175 = product of:
      0.175327 = sum of:
        0.175327 = weight(_text_:java in 2514) [ClassicSimilarity], result of:
          0.175327 = score(doc=2514,freq=2.0), product of:
            0.45033762 = queryWeight, product of:
              7.0475073 = idf(docFreq=104, maxDocs=44421)
              0.06390027 = queryNorm
            0.38932347 = fieldWeight in 2514, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              7.0475073 = idf(docFreq=104, maxDocs=44421)
              0.0390625 = fieldNorm(doc=2514)
      0.25 = coord(1/4)
    
    Abstract
    How the Internet Works promises "an exciting visual journey down the highways and byways of the Internet," and it delivers. The book's high-quality graphics and simple, succinct text make it the ideal book for beginners; however, it still has much to offer for Net vets. This book is jam-packed with cool ways to visualize how the Net works. The first section visually explores how TCP/IP, Winsock, and other Net connectivity mysteries work. This section also helps you understand how e-mail addresses and domains work, what file types mean, and how information travels across the Net. Part 2 unravels the Net's underlying architecture, including good information on how routers work and what is meant by client/server architecture. The third section covers your own connection to the Net through an Internet Service Provider (ISP), and how ISDN, cable modems, and Web TV work. Part 4 discusses e-mail, spam, newsgroups, Internet Relay Chat (IRC), and Net phone calls. In part 5, you'll find out how other Net tools, such as gopher, telnet, WAIS, and FTP, can enhance your Net experience. The sixth section takes on the World Wide Web, including everything from how HTML works to image maps and forms. Part 7 looks at other Web features such as push technology, Java, ActiveX, and CGI scripting, while part 8 deals with multimedia on the Net. Part 9 shows you what intranets are and covers groupware, and shopping and searching the Net. The book wraps up with part 10, a chapter on Net security that covers firewalls, viruses, cookies, and other Web tracking devices, plus cryptography and parental controls.
  14. Wu, D.; Shi, J.: Classical music recording ontology used in a library catalog (2016) 0.04
    0.04383175 = product of:
      0.175327 = sum of:
        0.175327 = weight(_text_:java in 4179) [ClassicSimilarity], result of:
          0.175327 = score(doc=4179,freq=2.0), product of:
            0.45033762 = queryWeight, product of:
              7.0475073 = idf(docFreq=104, maxDocs=44421)
              0.06390027 = queryNorm
            0.38932347 = fieldWeight in 4179, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              7.0475073 = idf(docFreq=104, maxDocs=44421)
              0.0390625 = fieldNorm(doc=4179)
      0.25 = coord(1/4)
    
    Abstract
    In order to improve the organization of classical music information resources, we constructed a classical music recording ontology, on top of which we then designed an online classical music catalog. Our construction of the classical music recording ontology consisted of three steps: identifying the purpose, analyzing the ontology, and encoding the ontology. We identified the main classes and properties of the domain by investigating classical music recording resources and users' information needs. We implemented the ontology in the Web Ontology Language (OWL) using five steps: transforming the properties, encoding the transformed properties, defining ranges of the properties, constructing individuals, and standardizing the ontology. In constructing the online catalog, we first designed the structure and functions of the catalog based on investigations into users' information needs and information-seeking behaviors. Then we extracted classes and properties of the ontology using the Apache Jena application programming interface (API), and constructed a catalog in the Java environment. The catalog provides a hierarchical main page (built using the Functional Requirements for Bibliographic Records (FRBR) model), a classical music information network and integrated information service; this combination of features greatly eases the task of finding classical music recordings and more information about classical music.
  15. Chakrabarti, S.: Mining the Web : discovering knowledge from hypertext data (2003) 0.04
    0.037695378 = product of:
      0.15078151 = sum of:
        0.15078151 = weight(_text_:hyperlinks in 3222) [ClassicSimilarity], result of:
          0.15078151 = score(doc=3222,freq=2.0), product of:
            0.46692044 = queryWeight, product of:
              7.3070183 = idf(docFreq=80, maxDocs=44421)
              0.06390027 = queryNorm
            0.32292762 = fieldWeight in 3222, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              7.3070183 = idf(docFreq=80, maxDocs=44421)
              0.03125 = fieldNorm(doc=3222)
      0.25 = coord(1/4)
    
    Footnote
    Review in: JASIST 55(2004) no.3, pp.275-276 (C. Chen): "This is a book about finding significant statistical patterns on the Web - in particular, patterns that are associated with hypertext documents, topics, hyperlinks, and queries. The term pattern in this book refers to dependencies among such items. On the one hand, the Web contains useful information on just about every topic under the sun. On the other hand, just like searching for a needle in a haystack, one would need powerful tools to locate useful information on the vast land of the Web. Soumen Chakrabarti's book focuses on a wide range of techniques for machine learning and data mining on the Web. The goal of the book is to provide both the technical background and the tools and tricks of the trade of Web content mining. Much of the technical content reflects the state of the art between 1995 and 2002. The targeted audience is researchers and innovative developers in this area, as well as newcomers who intend to enter this area. The book begins with an introduction chapter. The introduction chapter explains fundamental concepts such as crawling and indexing as well as clustering and classification. The remaining eight chapters are organized into three parts: (i) infrastructure, (ii) learning, and (iii) applications.
  16. Yang, K.: Information retrieval on the Web (2004) 0.04
    0.037695378 = product of:
      0.15078151 = sum of:
        0.15078151 = weight(_text_:hyperlinks in 5278) [ClassicSimilarity], result of:
          0.15078151 = score(doc=5278,freq=2.0), product of:
            0.46692044 = queryWeight, product of:
              7.3070183 = idf(docFreq=80, maxDocs=44421)
              0.06390027 = queryNorm
            0.32292762 = fieldWeight in 5278, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              7.3070183 = idf(docFreq=80, maxDocs=44421)
              0.03125 = fieldNorm(doc=5278)
      0.25 = coord(1/4)
    
    Abstract
    How do we find information on the Web? Although information on the Web is distributed and decentralized, the Web can be viewed as a single, virtual document collection. In that regard, the fundamental questions and approaches of traditional information retrieval (IR) research (e.g., term weighting, query expansion) are likely to be relevant in Web document retrieval. Findings from traditional IR research, however, may not always be applicable in a Web setting. The Web document collection - massive in size and diverse in content, format, purpose, and quality - challenges the validity of previous research findings that are based on relatively small and homogeneous test collections. Moreover, some traditional IR approaches, although applicable in theory, may be impossible or impractical to implement in a Web setting. For instance, the size, distribution, and dynamic nature of Web information make it extremely difficult to construct a complete and up-to-date data representation of the kind required for a model IR system. To further complicate matters, information seeking on the Web is diverse in character and unpredictable in nature. Web searchers come from all walks of life and are motivated by many kinds of information needs. The wide range of experience, knowledge, motivation, and purpose means that searchers can express diverse types of information needs in a wide variety of ways with differing criteria for satisfying those needs. Conventional evaluation measures, such as precision and recall, may no longer be appropriate for Web IR, where a representative test collection is all but impossible to construct. Finding information on the Web creates many new challenges for, and exacerbates some old problems in, IR research. At the same time, the Web is rich in new types of information not present in most IR test collections. Hyperlinks, usage statistics, document markup tags, and collections of topic hierarchies such as Yahoo! (http://www.yahoo.com) present an opportunity to leverage Web-specific document characteristics in novel ways that go beyond the term-based retrieval framework of traditional IR. Consequently, researchers in Web IR have reexamined the findings from traditional IR research.
  17. Kim, W.; Wilbur, W.J.: Corpus-based statistical screening for content-bearing terms (2001) 0.04
    0.037695378 = product of:
      0.15078151 = sum of:
        0.15078151 = weight(_text_:hyperlinks in 188) [ClassicSimilarity], result of:
          0.15078151 = score(doc=188,freq=2.0), product of:
            0.46692044 = queryWeight, product of:
              7.3070183 = idf(docFreq=80, maxDocs=44421)
              0.06390027 = queryNorm
            0.32292762 = fieldWeight in 188, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              7.3070183 = idf(docFreq=80, maxDocs=44421)
              0.03125 = fieldNorm(doc=188)
      0.25 = coord(1/4)
    
    Abstract
    Kim and Wilbur present three techniques for the algorithmic identification in text of content-bearing terms and phrases intended for human use as entry points or hyperlinks. Using a set of 1,075 terms from MEDLINE evaluated on a zero-to-four, stop word to definite content word scale, they evaluate the ranked lists of their three methods based on their placement of content words in the top ranks. Data consist of the natural language elements of 304,057 MEDLINE records from 1996, and 173,252 Wall Street Journal records from the TIPSTER collection. Phrases are extracted by breaking at punctuation marks and stop words, normalized by lower casing, replacement of nonalphanumerics with spaces, and the reduction of multiple spaces. In the "strength of context" approach each document is a vector of binary values for each word or word pair. The words or word pairs are removed from all documents, and the Robertson-Sparck Jones relevance weight for each term computed, negative weights replaced with zero, those below a randomness threshold ignored, and the remainder summed for each document, to yield a score for the document and finally to assign to the term the average document score for documents in which it occurred. The average of these word scores is assigned to the original phrase. The "frequency clumping" approach defines a random phrase as one whose distribution among documents is Poisson in character. A p-value, the probability that a phrase frequency of occurrence would be equal to, or less than, Poisson expectations, is computed, and a score assigned which is the negative log of that value. In the "database comparison" approach, if a phrase occurring in a document allows prediction that the document is in MEDLINE rather than in the Wall Street Journal, it is considered to be content bearing for MEDLINE. The score is computed by dividing the number of occurrences of the term in MEDLINE by occurrences in the Journal, and taking the product of all these values. The one hundred top- and bottom-ranked phrases that occurred in at least 500 documents were collected for each method. The union set had 476 phrases. A second selection was made of two-word phrases each occurring in only three documents, with a union of 599 phrases. A judge then ranked the two sets of terms as to subject specificity on a 0 to 4 scale. Precision was the average subject specificity of the first r ranks, and recall the fraction of the subject-specific phrases in the first r ranks; eleven-point average precision was used as a summary measure. The three methods all move content-bearing terms forward in the lists, as does the use of the sum of the logs of the three methods.
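    A hedged sketch of a "frequency clumping" style score, assuming a Poisson model in which a phrase with cf total occurrences spread over n documents is expected to appear in roughly n(1 - e^(-cf/n)) of them; the exact statistic used in the paper may differ, and the numbers below are invented:

    ```python
    # Score a phrase by how much more "clumped" it is across documents than a Poisson
    # model would predict; content-bearing phrases tend to concentrate in few documents.
    import math

    def clumping_score(cf, df, n_docs):
        lam = cf / n_docs                        # Poisson rate per document
        p_occurs = 1.0 - math.exp(-lam)          # P(a given document contains the phrase)
        # P(df or fewer documents contain it): binomial lower tail by direct summation
        p_value = sum(math.comb(n_docs, k) * p_occurs**k * (1 - p_occurs)**(n_docs - k)
                      for k in range(df + 1))
        return -math.log(max(p_value, 1e-300))   # clamp to avoid log(0)

    # toy numbers: same total frequency, different spread across 1,000 documents
    print(round(clumping_score(cf=50, df=10, n_docs=1000), 2))  # clumped -> high score
    print(round(clumping_score(cf=50, df=48, n_docs=1000), 2))  # spread out -> low score
    ```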
  18. Jung, S.; Herlocker, J.L.; Webster, J.: Click data as implicit relevance feedback in web search (2007) 0.04
    0.037695378 = product of:
      0.15078151 = sum of:
        0.15078151 = weight(_text_:hyperlinks in 1912) [ClassicSimilarity], result of:
          0.15078151 = score(doc=1912,freq=2.0), product of:
            0.46692044 = queryWeight, product of:
              7.3070183 = idf(docFreq=80, maxDocs=44421)
              0.06390027 = queryNorm
            0.32292762 = fieldWeight in 1912, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              7.3070183 = idf(docFreq=80, maxDocs=44421)
              0.03125 = fieldNorm(doc=1912)
      0.25 = coord(1/4)
    
    Abstract
    Search sessions consist of a person presenting a query to a search engine, followed by that person examining the search results, selecting some of those search results for further review, possibly following some series of hyperlinks, and perhaps backtracking to previously viewed pages in the session. The series of pages selected for viewing in a search session, sometimes called the click data, is intuitively a source of relevance feedback information to the search engine. We are interested in how that relevance feedback can be used to improve the search results quality for all users, not just the current user. For example, the search engine could learn which documents are frequently visited when certain search queries are given. In this article, we address three issues related to using click data as implicit relevance feedback: (1) How click data beyond the search results page might be more reliable than just the clicks from the search results page; (2) Whether we can further subselect from this click data to get even more reliable relevance feedback; and (3) How the reliability of click data for relevance feedback changes when the goal becomes finding one document for the user that completely meets their information needs (if possible). We refer to these documents as the ones that are strictly relevant to the query. Our conclusions are based on empirical data from a live website with manual assessment of relevance. We found that considering all of the click data in a search session as relevance feedback has the potential to increase both precision and recall of the feedback data. We further found that, when the goal is identifying strictly relevant documents, it could be useful to focus on the last visited documents rather than all documents visited in a search session.
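    A minimal sketch of turning session click logs into per-(query, document) feedback counts; the log format and field names below are invented for illustration, not the article's data model:

    ```python
    # Hypothetical sketch: aggregate clicks across sessions into implicit relevance
    # feedback counts that a ranker could consume.
    from collections import Counter

    sessions = [
        {"query": "web mining", "clicked": ["d1", "d3", "d3"]},
        {"query": "web mining", "clicked": ["d3"]},
        {"query": "link analysis", "clicked": ["d7"]},
    ]

    feedback = Counter()
    for s in sessions:
        # alternatively, keep only the last visited document per session, in line with
        # the article's point about finding the single strictly relevant document
        for doc in s["clicked"]:
            feedback[(s["query"], doc)] += 1

    print(feedback.most_common(3))
    ```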
  19. Spitkovsky, V.; Norvig, P.: From words to concepts and back : dictionaries for linking text, entities and ideas (2012) 0.04
    0.037695378 = product of:
      0.15078151 = sum of:
        0.15078151 = weight(_text_:hyperlinks in 1337) [ClassicSimilarity], result of:
          0.15078151 = score(doc=1337,freq=2.0), product of:
            0.46692044 = queryWeight, product of:
              7.3070183 = idf(docFreq=80, maxDocs=44421)
              0.06390027 = queryNorm
            0.32292762 = fieldWeight in 1337, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              7.3070183 = idf(docFreq=80, maxDocs=44421)
              0.03125 = fieldNorm(doc=1337)
      0.25 = coord(1/4)
    
    Abstract
    Human language is both rich and ambiguous. When we hear or read words, we resolve meanings to mental representations, for example recognizing and linking names to the intended persons, locations or organizations. Bridging words and meaning - from turning search queries into relevant results to suggesting targeted keywords for advertisers - is also Google's core competency, and important for many other tasks in information retrieval and natural language processing. We are happy to release a resource, spanning 7,560,141 concepts and 175,100,788 unique text strings, that we hope will help everyone working in these areas. How do we represent concepts? Our approach piggybacks on the unique titles of entries from an encyclopedia, which are mostly proper and common noun phrases. We consider each individual Wikipedia article as representing a concept (an entity or an idea), identified by its URL. Text strings that refer to concepts were collected using the publicly available hypertext of anchors (the text you click on in a web link) that point to each Wikipedia page, thus drawing on the vast link structure of the web. For every English article we harvested the strings associated with its incoming hyperlinks from the rest of Wikipedia, the greater web, and also anchors of parallel, non-English Wikipedia pages. Our dictionaries are cross-lingual, and any concept deemed too fine can be broadened to a desired level of generality using Wikipedia's groupings of articles into hierarchical categories. The data set contains triples, each consisting of (i) text, a short, raw natural language string; (ii) url, a related concept, represented by an English Wikipedia article's canonical location; and (iii) count, an integer indicating the number of times text has been observed connected with the concept's url. Our database thus includes weights that measure degrees of association. For example, the top two entries for football indicate that it is an ambiguous term, which is almost twice as likely to refer to what we in the US call soccer. See also: Spitkovsky, V.I., A.X. Chang: A cross-lingual dictionary for English Wikipedia concepts. In: http://nlp.stanford.edu/pubs/crosswikis.pdf.
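    A minimal sketch of how (text, url, count) triples of this kind can back a string-to-concept dictionary with association weights; the triples and counts shown are illustrative, not entries from the released data set:

    ```python
    # Build a string -> concept dictionary from (text, url, count) triples and rank
    # candidate concepts by conditional probability. Counts below are made up.
    from collections import defaultdict

    triples = [
        ("football", "https://en.wikipedia.org/wiki/Association_football", 1200),
        ("football", "https://en.wikipedia.org/wiki/American_football", 650),
        ("football", "https://en.wikipedia.org/wiki/Football_(ball)", 40),
    ]

    dictionary = defaultdict(dict)
    for text, url, count in triples:
        dictionary[text][url] = count

    def concepts_for(text):
        candidates = dictionary.get(text, {})
        total = sum(candidates.values()) or 1
        return sorted(((url, count / total) for url, count in candidates.items()),
                      key=lambda pair: -pair[1])

    for url, prob in concepts_for("football"):
        print(f"{prob:.2f}  {url}")
    ```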
  20. Liu, B.: Web data mining : exploring hyperlinks, contents, and usage data (2011) 0.04
    0.037695378 = product of:
      0.15078151 = sum of:
        0.15078151 = weight(_text_:hyperlinks in 1354) [ClassicSimilarity], result of:
          0.15078151 = score(doc=1354,freq=2.0), product of:
            0.46692044 = queryWeight, product of:
              7.3070183 = idf(docFreq=80, maxDocs=44421)
              0.06390027 = queryNorm
            0.32292762 = fieldWeight in 1354, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              7.3070183 = idf(docFreq=80, maxDocs=44421)
              0.03125 = fieldNorm(doc=1354)
      0.25 = coord(1/4)
    

Languages

  • d 32
  • m 3
  • nl 1

Types

  • a 821
  • m 311
  • el 107
  • s 92
  • i 21
  • n 17
  • x 13
  • r 10
  • b 7
  • ? 1
  • v 1
