Search (91 results, page 1 of 5)

  • × theme_ss:"Retrievalalgorithmen"
  1. Qi, Q.; Hessen, D.J.; Heijden, P.G.M. van der: Improving information retrieval through correspondence analysis instead of latent semantic analysis (2023) 0.03
    0.027976645 = product of:
      0.11190658 = sum of:
        0.11190658 = weight(_text_:however in 2047) [ClassicSimilarity], result of:
          0.11190658 = score(doc=2047,freq=4.0), product of:
            0.28742972 = queryWeight, product of:
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.06921162 = queryNorm
            0.38933545 = fieldWeight in 2047, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.046875 = fieldNorm(doc=2047)
      0.25 = coord(1/4)
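
The breakdown above is a standard Lucene ClassicSimilarity explanation: tf is the square root of the raw term frequency, queryWeight = idf x queryNorm, fieldWeight = tf x idf x fieldNorm, and the final score multiplies queryWeight, fieldWeight and the coordination factor. A minimal Python sketch (an illustration built from the constants listed above, not the search engine's own code) reproduces the 0.0280 shown for this result:

```python
import math

# Constants copied from the ClassicSimilarity explanation for result 1 (doc 2047).
freq       = 4.0          # occurrences of "however" in the matched field
idf        = 4.1529117    # idf(docFreq=1897, maxDocs=44421) = 1 + ln(44421 / (1897 + 1))
query_norm = 0.06921162
field_norm = 0.046875
coord      = 0.25         # 1 of 4 query clauses matched

tf           = math.sqrt(freq)          # 2.0
query_weight = idf * query_norm         # 0.28742972
field_weight = tf * idf * field_norm    # 0.38933545
score        = coord * query_weight * field_weight
print(f"{score:.9f}")                   # ~0.027976645
```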
    
    Abstract
    The initial dimensions extracted by latent semantic analysis (LSA) of a document-term matrix have been shown to mainly display marginal effects, which are irrelevant for information retrieval. To improve the performance of LSA, usually the elements of the raw document-term matrix are weighted and the weighting exponent of singular values can be adjusted. An alternative information retrieval technique that ignores the marginal effects is correspondence analysis (CA). In this paper, the information retrieval performance of LSA and CA is empirically compared. Moreover, it is explored whether the two weightings also improve the performance of CA. The results for four empirical datasets show that CA always performs better than LSA. Weighting the elements of the raw data matrix can improve CA; however, it is data dependent and the improvement is small. Adjusting the singular value weighting exponent often improves the performance of CA; however, the extent of the improvement depends on the dataset and the number of dimensions.
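
As a companion to the abstract above, here is a minimal numpy sketch of the two decompositions it contrasts: LSA takes a truncated SVD of the (optionally weighted) document-term matrix, while CA removes the row and column margins first and decomposes the standardized residuals. The toy matrix and function names are illustrative assumptions, not the paper's data or weighting schemes.

```python
import numpy as np

def lsa(X, k):
    """LSA: truncated SVD of the (optionally weighted) document-term matrix."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]                      # document coordinates

def ca(X, k):
    """CA: SVD of standardized residuals, so the marginal (rank-1) structure
    that tends to dominate the first LSA dimensions is removed beforehand."""
    P = X / X.sum()
    r = P.sum(axis=1, keepdims=True)             # row (document) masses
    c = P.sum(axis=0, keepdims=True)             # column (term) masses
    S = (P - r @ c) / np.sqrt(r @ c)             # standardized residuals
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    return (U[:, :k] * s[:k]) / np.sqrt(r)       # principal document coordinates

X = np.array([[3., 0., 1., 0.],                  # toy document-term counts
              [2., 1., 0., 0.],
              [0., 4., 2., 1.],
              [0., 1., 3., 2.]])
print(lsa(X, 2))
print(ca(X, 2))
```
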
  2. Chang, C.-H.; Hsu, C.-C.: Integrating query expansion and conceptual relevance feedback for personalized Web information retrieval (1998) 0.02
    0.023079555 = product of:
      0.09231822 = sum of:
        0.09231822 = weight(_text_:however in 2319) [ClassicSimilarity], result of:
          0.09231822 = score(doc=2319,freq=2.0), product of:
            0.28742972 = queryWeight, product of:
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.06921162 = queryNorm
            0.32118538 = fieldWeight in 2319, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.0546875 = fieldNorm(doc=2319)
      0.25 = coord(1/4)
    
    Abstract
    Keyword-based querying has been an immediate and efficient way to specify and retrieve the related information that a user requires. However, conventional document ranking based on an automatic assessment of document relevance to the query may not be the best approach when little information is given. Proposes an idea to integrate two existing techniques, query expansion and relevance feedback, to achieve a concept-based information search for the Web.
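
The abstract does not spell out how the "conceptual relevance feedback" works, so as a generic illustration of combining relevance feedback with query expansion, here is a Rocchio-style sketch (a stand-in technique, not the authors' method; all names and parameters are assumptions):

```python
import numpy as np

def rocchio_expand(query_vec, relevant, nonrelevant,
                   alpha=1.0, beta=0.75, gamma=0.15, top_k=5):
    """Generic Rocchio-style feedback: move the query vector toward judged-relevant
    documents, away from non-relevant ones, and keep the strongest terms as expansions."""
    q = alpha * np.asarray(query_vec, dtype=float)
    if len(relevant):
        q += beta * np.mean(relevant, axis=0)
    if len(nonrelevant):
        q -= gamma * np.mean(nonrelevant, axis=0)
    q = np.clip(q, 0.0, None)                    # negative weights are usually dropped
    expansion_terms = np.argsort(-q)[:top_k]     # indices of the highest-weighted terms
    return q, expansion_terms

# Toy example: 6-term vocabulary, one query term, two judged-relevant documents.
q, terms = rocchio_expand([0, 1, 0, 0, 0, 0],
                          relevant=[[0, 1, 1, 0, 1, 0], [0, 1, 1, 0, 0, 1]],
                          nonrelevant=[])
print(q, terms)
```
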
  3. Lee, C.; Lee, G.G.: Probabilistic information retrieval model for a dependence structured indexing system (2005) 0.02
    0.023079555 = product of:
      0.09231822 = sum of:
        0.09231822 = weight(_text_:however in 2004) [ClassicSimilarity], result of:
          0.09231822 = score(doc=2004,freq=2.0), product of:
            0.28742972 = queryWeight, product of:
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.06921162 = queryNorm
            0.32118538 = fieldWeight in 2004, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.0546875 = fieldNorm(doc=2004)
      0.25 = coord(1/4)
    
    Abstract
    Most previous information retrieval (IR) models assume that the terms of queries and documents are statistically independent of each other. However, this conditional independence assumption is widely understood to be wrong, so we present a new method of incorporating term dependence into a probabilistic retrieval model by adapting a dependency-structured indexing system, using a dependency parse tree and the Chow Expansion to compensate for the weakness of the assumption. In this paper, we describe a theoretical process for applying the Chow Expansion to general probabilistic models and to the state-of-the-art 2-Poisson model. Through experiments on document collections in English and Korean, we demonstrate that incorporating term dependences using the Chow Expansion improves the performance of probabilistic IR systems.
  4. Sánchez-de-Madariaga, R.; Fernández-del-Castillo, J.R.: ¬The bootstrapping of the Yarowsky algorithm in real corpora (2009) 0.02
    0.023079555 = product of:
      0.09231822 = sum of:
        0.09231822 = weight(_text_:however in 3451) [ClassicSimilarity], result of:
          0.09231822 = score(doc=3451,freq=2.0), product of:
            0.28742972 = queryWeight, product of:
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.06921162 = queryNorm
            0.32118538 = fieldWeight in 3451, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.0546875 = fieldNorm(doc=3451)
      0.25 = coord(1/4)
    
    Abstract
    The Yarowsky bootstrapping algorithm resolves the homograph-level word sense disambiguation (WSD) problem, which is the sense granularity level required for real natural language processing (NLP) applications. At the same time, it resolves the knowledge acquisition bottleneck affecting most WSD algorithms and can easily be applied to foreign-language corpora. However, this paper shows that the Yarowsky algorithm is significantly less accurate when applied to domain-fluctuating, real corpora. This paper also introduces a new bootstrapping methodology that performs much better when applied to these corpora. The accuracy achieved in non-domain-fluctuating corpora is not reached, due to inherent domain-fluctuation ambiguities.
  5. Abdelkareem, M.A.A.: In terms of publication index, what indicator is the best for researchers indexing, Google Scholar, Scopus, Clarivate or others? (2018) 0.02
    0.023079555 = product of:
      0.09231822 = sum of:
        0.09231822 = weight(_text_:however in 548) [ClassicSimilarity], result of:
          0.09231822 = score(doc=548,freq=2.0), product of:
            0.28742972 = queryWeight, product of:
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.06921162 = queryNorm
            0.32118538 = fieldWeight in 548, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.0546875 = fieldNorm(doc=548)
      0.25 = coord(1/4)
    
    Abstract
    I believe that Google Scholar is the most popular academic indexing service for researchers and citations. However, some other indexing institutions may be more professional than Google Scholar, though not as popular. Other indexing websites, like Scopus and Clarivate, provide more statistical figures for scholars, institutions or even journals. With regard to publication citations, Google Scholar always shows higher citation counts for a paper than other indexing websites, since Google Scholar considers most publication platforms and can therefore easily count the citations, while other databases only consider citations that come from journals already indexed in their database.
  6. Moffat, A.; Bell, T.A.H.: In situ generation of compressed inverted files (1995) 0.02
    0.019782476 = product of:
      0.079129905 = sum of:
        0.079129905 = weight(_text_:however in 2716) [ClassicSimilarity], result of:
          0.079129905 = score(doc=2716,freq=2.0), product of:
            0.28742972 = queryWeight, product of:
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.06921162 = queryNorm
            0.27530175 = fieldWeight in 2716, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.046875 = fieldNorm(doc=2716)
      0.25 = coord(1/4)
    
    Abstract
    An inverted index stores, for each term that appears in a collection of documents, a list of document numbers containing that term. Such an index is indispensable when Boolean or informal ranked queries are to be answered. Construction of the index is, however, a non-trivial task. Simple methods using in-memory data structures cannot be used for large collections because they require too much random access storage, and traditional disc-based methods require large amounts of temporary file space. Describes a new indexing algorithm designed to create large compressed inverted indexes in situ. It makes use of simple compression codes for the positive integers and an in-place external multi-way merge sort. The new technique has been used to invert a 2-gigabyte text collection in under 4 hours, using less than 40 megabytes of temporary disc space and less than 20 megabytes of main memory.
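
To make the abstract's starting point concrete, the sketch below builds a tiny inverted index and gap-encodes one postings list, the step that makes compression with small-integer codes possible. It is a toy illustration, not the paper's in-situ, multi-way merge construction algorithm.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document numbers containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def gap_encode(postings):
    """Store differences between successive doc numbers; small gaps compress well."""
    return [postings[0]] + [b - a for a, b in zip(postings, postings[1:])]

docs = ["boolean queries need an index",
        "ranked queries need an index too",
        "index construction is the hard part"]
index = build_inverted_index(docs)
print(index["index"], gap_encode(index["index"]))
```
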
  7. Zhu, B.; Chen, H.: Validating a geographical image retrieval system (2000) 0.02
    0.019782476 = product of:
      0.079129905 = sum of:
        0.079129905 = weight(_text_:however in 5769) [ClassicSimilarity], result of:
          0.079129905 = score(doc=5769,freq=2.0), product of:
            0.28742972 = queryWeight, product of:
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.06921162 = queryNorm
            0.27530175 = fieldWeight in 5769, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.046875 = fieldNorm(doc=5769)
      0.25 = coord(1/4)
    
    Abstract
    This paper summarizes a prototype geographical image retrieval system that demonstrates how to integrate image processing and information analysis techniques to support large-scale content-based image retrieval. By using an image as its interface, the prototype system addresses a troublesome aspect of traditional retrieval models, which require users to have complete knowledge of the low-level features of an image. In addition, we describe an experiment to validate the system's performance against that of human subjects, in an effort to address the scarcity of research evaluating the performance of algorithms against that of human beings. The results of the experiment indicate that the system could do as well as human subjects in accomplishing the tasks of similarity analysis and image categorization. We also found that under some circumstances the texture features of an image are insufficient to represent a geographic image. We believe, however, that our image retrieval system provides a promising approach to integrating image processing techniques and information retrieval algorithms.
  8. Nie, J.-Y.: Query expansion and query translation as logical inference (2003) 0.02
    0.019782476 = product of:
      0.079129905 = sum of:
        0.079129905 = weight(_text_:however in 2425) [ClassicSimilarity], result of:
          0.079129905 = score(doc=2425,freq=2.0), product of:
            0.28742972 = queryWeight, product of:
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.06921162 = queryNorm
            0.27530175 = fieldWeight in 2425, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.046875 = fieldNorm(doc=2425)
      0.25 = coord(1/4)
    
    Abstract
    A number of studies have examined the problems of query expansion in monolingual Information Retrieval (IR) and query translation for cross-language IR. However, no link has been made between them. This article first shows that query translation is a special case of query expansion. There is also another set of studies on inferential IR. Again, there is no relationship established with query translation or query expansion. The second claim of this article is that logical inference is a general form that covers query expansion and query translation. This analysis provides a unified view of different subareas of IR. We further develop the inferential IR approach in two particular contexts: using fuzzy logic and probability theory. The evaluation formulas obtained are shown to strongly correspond to those used in other IR models. This indicates that inference is indeed the core of advanced IR.
  9. Fan, W.; Fox, E.A.; Pathak, P.; Wu, H.: ¬The effects of fitness functions on genetic programming-based ranking discovery for Web search (2004) 0.02
    0.019782476 = product of:
      0.079129905 = sum of:
        0.079129905 = weight(_text_:however in 3239) [ClassicSimilarity], result of:
          0.079129905 = score(doc=3239,freq=2.0), product of:
            0.28742972 = queryWeight, product of:
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.06921162 = queryNorm
            0.27530175 = fieldWeight in 3239, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.046875 = fieldNorm(doc=3239)
      0.25 = coord(1/4)
    
    Abstract
    Genetic-based evolutionary learning algorithms, such as genetic algorithms (GAs) and genetic programming (GP), have been applied to information retrieval (IR) since the 1980s. Recently, GP has been applied to a new IR task, the discovery of ranking functions for Web search, and has achieved very promising results. However, in our prior research, only one fitness function has been used for GP-based learning. It is unclear how other fitness functions may affect ranking function discovery for Web search, especially since it is well known that choosing a proper fitness function is very important for the effectiveness and efficiency of evolutionary algorithms. In this article, we report our experience in contrasting different fitness function designs on GP-based learning using a very large Web corpus. Our results indicate that the design of fitness functions is instrumental in performance improvement. We also give recommendations on the design of fitness functions for genetic-based information retrieval experiments.
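
The abstract's central variable is the fitness function used to score candidate ranking functions during GP learning. As one hedged example (an assumption for illustration, not one of the designs compared in the article), mean average precision over a set of training queries can serve as such a fitness:

```python
def average_precision(ranking, relevant):
    """Average precision of one ranked list of document ids."""
    hits, total = 0, 0.0
    for i, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / i
    return total / max(len(relevant), 1)

def fitness(ranking_fn, training_queries):
    """Mean average precision of a candidate ranking function over training queries;
    GP would evolve ranking_fn and keep the fittest candidates."""
    return sum(average_precision(ranking_fn(q), rel)
               for q, rel in training_queries) / len(training_queries)

# Toy usage: a 'ranking function' that ignores the query and returns a fixed order.
queries = [("q1", {1, 3}), ("q2", {2})]
print(fitness(lambda q: [3, 1, 2, 4], queries))
```
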
  10. Furner, J.: ¬A unifying model of document relatedness for hybrid search engines (2003) 0.02
    0.019782476 = product of:
      0.079129905 = sum of:
        0.079129905 = weight(_text_:however in 3717) [ClassicSimilarity], result of:
          0.079129905 = score(doc=3717,freq=2.0), product of:
            0.28742972 = queryWeight, product of:
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.06921162 = queryNorm
            0.27530175 = fieldWeight in 3717, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.046875 = fieldNorm(doc=3717)
      0.25 = coord(1/4)
    
    Abstract
    Previous work on search-engine design has indicated that information-seekers may benefit from being given the opportunity to exploit multiple sources of evidence of document relatedness. Few existing systems, however, give users more than minimal control over the selections that may be made among methods of exploitation. By applying the methods of "document network analysis" (DNA), a unifying, graph-theoretic model of content-, collaboration-, and context-based systems (CCC) may be developed in which the nature of the similarities between types of document relatedness and document ranking are clarified. The usefulness of the approach to system design suggested by this model may be tested by constructing and evaluating a prototype system (UCXtra) that allows searchers to maintain control over the multiple ways in which document collections may be ranked and re-ranked.
  11. Thelwall, M.: Can Google's PageRank be used to find the most important academic Web pages? (2003) 0.02
    0.019782476 = product of:
      0.079129905 = sum of:
        0.079129905 = weight(_text_:however in 5457) [ClassicSimilarity], result of:
          0.079129905 = score(doc=5457,freq=2.0), product of:
            0.28742972 = queryWeight, product of:
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.06921162 = queryNorm
            0.27530175 = fieldWeight in 5457, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.046875 = fieldNorm(doc=5457)
      0.25 = coord(1/4)
    
    Abstract
    Google's PageRank is an influential algorithm that uses a model of Web use that is dominated by its link structure in order to rank pages by their estimated value to the Web community. This paper reports on the outcome of applying the algorithm to the Web sites of three national university systems in order to test whether it is capable of identifying the most important Web pages. The results are also compared with simple inlink counts. It was discovered that the highest inlinked pages do not always have the highest PageRank, indicating that the two metrics are genuinely different, even for the top pages. More significantly, however, internal links dominated external links for the high ranks in either method and superficial reasons accounted for high scores in both cases. It is concluded that PageRank is not useful for identifying the top pages in a site and that it must be combined with powerful text-matching techniques in order to get the quality of information retrieval results provided by Google.
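
For reference, the algorithm under test is the standard PageRank recurrence; a minimal power-iteration sketch on a toy graph is shown below (a generic implementation, not Google's production system or the paper's crawl data).

```python
import numpy as np

def pagerank(adj, d=0.85, iters=100):
    """Power iteration on the row-normalized link matrix; dangling pages
    distribute their rank evenly over all pages."""
    n = adj.shape[0]
    out = adj.sum(axis=1)
    M = np.where(out[:, None] > 0, adj / np.maximum(out, 1)[:, None], 1.0 / n)
    pr = np.full(n, 1.0 / n)
    for _ in range(iters):
        pr = (1 - d) / n + d * (M.T @ pr)
    return pr

# Toy graph: page 0 links to 1 and 2, page 1 links to 2, page 2 links back to 0.
adj = np.array([[0, 1, 1],
                [0, 0, 1],
                [1, 0, 0]], dtype=float)
print(pagerank(adj))
```
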
  12. Thelwall, M.; Vaughan, L.: New versions of PageRank employing alternative Web document models (2004) 0.02
    0.019782476 = product of:
      0.079129905 = sum of:
        0.079129905 = weight(_text_:however in 799) [ClassicSimilarity], result of:
          0.079129905 = score(doc=799,freq=2.0), product of:
            0.28742972 = queryWeight, product of:
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.06921162 = queryNorm
            0.27530175 = fieldWeight in 799, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.046875 = fieldNorm(doc=799)
      0.25 = coord(1/4)
    
    Abstract
    Introduces several new versions of PageRank (the link-based Web page ranking algorithm), based on an information science perspective on the concept of the Web document. Although the Web page is the typical indivisible unit of information in search engine results and most Web information retrieval algorithms, other research has suggested that aggregating pages based on directories and domains gives promising alternatives, particularly when Web links are the object of study. The new algorithms introduced based on these alternatives were used to rank four sets of Web pages. The ranking results were compared with human subjects' rankings. The results of the tests were somewhat inconclusive: the new approach worked well for the set that includes pages from different Web sites; however, it does not work well in ranking pages that are from the same site. It seems that the new algorithms may be effective for some tasks but not for others, especially when only low numbers of links are involved or the pages to be ranked are from the same site or directory.
  13. Witschel, H.F.: Global term weights in distributed environments (2008) 0.02
    0.019782476 = product of:
      0.079129905 = sum of:
        0.079129905 = weight(_text_:however in 3096) [ClassicSimilarity], result of:
          0.079129905 = score(doc=3096,freq=2.0), product of:
            0.28742972 = queryWeight, product of:
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.06921162 = queryNorm
            0.27530175 = fieldWeight in 3096, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.046875 = fieldNorm(doc=3096)
      0.25 = coord(1/4)
    
    Abstract
    This paper examines the estimation of global term weights (such as IDF) in information retrieval scenarios where a global view on the collection is not available. In particular, the two options of either sampling documents or of using a reference corpus independent of the target retrieval collection are compared using standard IR test collections. In addition, the possibility of pruning term lists based on frequency is evaluated. The results show that very good retrieval performance can be reached when just the most frequent terms of a collection - an "extended stop word list" - are known and all terms which are not in that list are treated equally. However, the list cannot always be fully estimated from a general-purpose reference corpus, but some "domain-specific stop words" need to be added. A good solution for achieving this is to mix estimates from small samples of the target retrieval collection with ones derived from a reference corpus.
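
A small sketch of the idea the paper evaluates: estimate IDF-like global weights from whatever documents are visible (a sample or a reference corpus), keep only the most frequent terms as an "extended stop word list", and treat every other term equally. The function names and the toy sample are assumptions for illustration, not the paper's test collections or pruning thresholds.

```python
import math
from collections import Counter

def idf_from_sample(sample_docs):
    """Estimate IDF from document frequencies observed in a sample of the collection."""
    n = len(sample_docs)
    df = Counter(term for doc in sample_docs for term in set(doc.lower().split()))
    return {term: math.log(n / d) for term, d in df.items()}

def pruned_weights(idf, top_n=3):
    """Keep only the most frequent (lowest-IDF) terms -- an 'extended stop word list' --
    and give every other term the same, maximal weight."""
    stop = dict(sorted(idf.items(), key=lambda kv: kv[1])[:top_n])
    default = max(idf.values())
    return lambda term: stop.get(term, default)

sample = ["the cat sat on the mat", "the dog chased the cat", "a cat and a dog"]
weight = pruned_weights(idf_from_sample(sample))
print(weight("the"), weight("cat"), weight("zebra"))
```
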
  14. Cecchini, R.L.; Lorenzetti, C.M.; Maguitman, A.G.; Brignole, N.B.: Multiobjective evolutionary algorithms for context-based search (2010) 0.02
    0.019782476 = product of:
      0.079129905 = sum of:
        0.079129905 = weight(_text_:however in 469) [ClassicSimilarity], result of:
          0.079129905 = score(doc=469,freq=2.0), product of:
            0.28742972 = queryWeight, product of:
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.06921162 = queryNorm
            0.27530175 = fieldWeight in 469, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.046875 = fieldNorm(doc=469)
      0.25 = coord(1/4)
    
    Abstract
    Formulating high-quality queries is a key aspect of context-based search. However, determining the effectiveness of a query is challenging because multiple objectives, such as high precision and high recall, are usually involved. In this work, we study techniques that can be applied to evolve contextualized queries when the criteria for determining query quality are based on multiple objectives. We report on the results of three different strategies for evolving queries: (a) single-objective, (b) multiobjective with Pareto-based ranking, and (c) multiobjective with aggregative ranking. After a comprehensive evaluation with a large set of topics, we discuss the limitations of the single-objective approach and observe that both the Pareto-based and aggregative strategies are highly effective for evolving topical queries. In particular, our experiments lead us to conclude that the multiobjective techniques are superior to a baseline as well as to well-known and ad hoc query reformulation techniques.
  15. Efron, M.: Linear time series models for term weighting in information retrieval (2010) 0.02
    0.019782476 = product of:
      0.079129905 = sum of:
        0.079129905 = weight(_text_:however in 675) [ClassicSimilarity], result of:
          0.079129905 = score(doc=675,freq=2.0), product of:
            0.28742972 = queryWeight, product of:
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.06921162 = queryNorm
            0.27530175 = fieldWeight in 675, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.046875 = fieldNorm(doc=675)
      0.25 = coord(1/4)
    
    Abstract
    Common measures of term importance in information retrieval (IR) rely on counts of term frequency; rare terms receive higher weight in document ranking than common terms receive. However, realistic scenarios yield additional information about terms in a collection. Of interest in this article is the temporal behavior of terms as a collection changes over time. We propose capturing each term's collection frequency at discrete time intervals over the lifespan of a corpus and analyzing the resulting time series. We hypothesize the collection frequency of a weakly discriminative term x at time t is predictable by a linear model of the term's prior observations. On the other hand, a linear time series model for a strong discriminator's collection frequency will yield a poor fit to the data. Operationalizing this hypothesis, we induce three time-based measures of term importance and test these against state-of-the-art term weighting models.
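
A sketch of the hypothesis in the abstract: fit a linear autoregressive model to a term's collection-frequency series and use the residual error as a signal of discriminativeness, since a predictable series suggests a weak discriminator. The AR order, the least-squares fit and the toy series are illustrative assumptions, not the article's exact models or measures.

```python
import numpy as np

def ar_fit_error(series, p=2):
    """Fit a linear AR(p) model to a term's collection-frequency series and return
    the mean squared residual; a poor fit hints at a strongly discriminative term."""
    y = np.asarray(series, dtype=float)
    X = np.column_stack([y[i:len(y) - p + i] for i in range(p)] + [np.ones(len(y) - p)])
    target = y[p:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    residuals = target - X @ coef
    return float(np.mean(residuals ** 2))

common_term = [40, 42, 41, 43, 44, 42, 43, 45]   # smooth, predictable frequencies
bursty_term = [1, 0, 15, 2, 0, 22, 1, 30]        # hard to predict linearly
print(ar_fit_error(common_term), ar_fit_error(bursty_term))
```
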
  16. Biskri, I.; Rompré, L.: Using association rules for query reformulation (2012) 0.02
    0.019782476 = product of:
      0.079129905 = sum of:
        0.079129905 = weight(_text_:however in 1092) [ClassicSimilarity], result of:
          0.079129905 = score(doc=1092,freq=2.0), product of:
            0.28742972 = queryWeight, product of:
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.06921162 = queryNorm
            0.27530175 = fieldWeight in 1092, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.046875 = fieldNorm(doc=1092)
      0.25 = coord(1/4)
    
    Abstract
    In this paper the authors will present research on the combination of two methods of data mining: text classification and maximal association rules. Text classification has been the focus of interest of many researchers for a long time. However, the results take the form of lists of words (classes) that people often do not know what to do with. The use of maximal association rules brings a number of advantages: (1) the detection of dependencies and correlations between the relevant units of information (words) of different classes, and (2) the extraction of hidden, often relevant, knowledge from a large volume of data. The authors will show how this combination can improve the process of information retrieval.
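
As a rough illustration of the kind of knowledge such rules can expose for query reformulation, the sketch below mines plain support/confidence rules over within-document co-occurrence (a simplification, not the maximal association rules used by the authors; corpus and thresholds are assumptions).

```python
from itertools import combinations
from collections import Counter

def association_rules(docs, min_support=2, min_confidence=0.6):
    """Mine simple term -> term rules from co-occurrence within documents; the
    consequents can be offered as reformulation terms for a query word."""
    term_count, pair_count = Counter(), Counter()
    for doc in docs:
        terms = set(doc.lower().split())
        term_count.update(terms)
        pair_count.update(combinations(sorted(terms), 2))
    rules = {}
    for (a, b), n in pair_count.items():
        if n < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = n / term_count[x]
            if confidence >= min_confidence:
                rules.setdefault(x, []).append((y, confidence))
    return rules

docs = ["neural information retrieval", "information retrieval evaluation",
        "neural ranking models", "probabilistic information retrieval"]
print(association_rules(docs).get("information"))
```
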
  17. Jindal, V.; Bawa, S.; Batra, S.: ¬A review of ranking approaches for semantic search on Web (2014) 0.02
    0.019782476 = product of:
      0.079129905 = sum of:
        0.079129905 = weight(_text_:however in 3799) [ClassicSimilarity], result of:
          0.079129905 = score(doc=3799,freq=2.0), product of:
            0.28742972 = queryWeight, product of:
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.06921162 = queryNorm
            0.27530175 = fieldWeight in 3799, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.046875 = fieldNorm(doc=3799)
      0.25 = coord(1/4)
    
    Abstract
    With ever-increasing amounts of information available to end users, search engines have become the most powerful tools for obtaining useful information scattered on the Web. However, it is very common that even the most renowned search engines return result sets containing pages of little use to the user. Research on semantic search aims to improve traditional information search and retrieval methods, where the basic relevance criteria rely primarily on the presence of query keywords within the returned pages. This work is an attempt to explore different relevancy ranking approaches based on semantics which are considered appropriate for the retrieval of relevant information. In this paper, various pilot projects and their corresponding outcomes have been investigated based on the methodologies adopted and their most distinctive characteristics towards ranking. An overview of selected approaches and their comparison by means of the classification criteria has been presented. With the help of this comparison, some common concepts and outstanding features have been identified.
  18. Li, H.; Wu, H.; Li, D.; Lin, S.; Su, Z.; Luo, X.: PSI: A probabilistic semantic interpretable framework for fine-grained image ranking (2018) 0.02
    0.019782476 = product of:
      0.079129905 = sum of:
        0.079129905 = weight(_text_:however in 577) [ClassicSimilarity], result of:
          0.079129905 = score(doc=577,freq=2.0), product of:
            0.28742972 = queryWeight, product of:
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.06921162 = queryNorm
            0.27530175 = fieldWeight in 577, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.046875 = fieldNorm(doc=577)
      0.25 = coord(1/4)
    
    Abstract
    Image ranking is one of the key problems in the information science research area. However, most current methods focus on increasing performance, leaving intact the semantic gap problem, i.e. the fact that the learned ranking models are hard to understand. Therefore, in this article, we aim at learning an interpretable ranking model to tackle the semantic gap in fine-grained image ranking. We propose to combine attribute-based representation and online passive-aggressive (PA) learning-based ranking models to achieve this goal. Besides, considering the highly localized instances in fine-grained image ranking, we introduce a supervised constrained clustering method to gather class-balanced training instances for local PA-based models, and incorporate the learned local models into a unified probabilistic framework. Extensive experiments on the benchmark demonstrate that the proposed framework outperforms state-of-the-art methods in terms of accuracy and speed.
  19. Henzinger, M.R.: Hyperlink analysis for the Web (2001) 0.02
    0.018651098 = product of:
      0.07460439 = sum of:
        0.07460439 = weight(_text_:however in 1008) [ClassicSimilarity], result of:
          0.07460439 = score(doc=1008,freq=4.0), product of:
            0.28742972 = queryWeight, product of:
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.06921162 = queryNorm
            0.25955698 = fieldWeight in 1008, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.03125 = fieldNorm(doc=1008)
      0.25 = coord(1/4)
    
    Content
    Information retrieval is a computer science subfield whose goal is to find all documents relevant to a user query in a given collection of documents. As such, information retrieval should really be called document retrieval. Before the advent of the Web, IR systems were typically installed in libraries for use mostly by reference librarians. The retrieval algorithm for these systems was usually based exclusively on analysis of the words in the document. The Web changed all this. Now each Web user has access to various search engines whose retrieval algorithms often use not only the words in the documents but also information like the hyperlink structure of the Web or markup language tags. How are hyperlinks useful? The hyperlink functionality alone - that is, the hyperlink to Web page B that is contained in Web page A - is not directly useful in information retrieval. However, the way Web page authors use hyperlinks can give them valuable information content. Authors usually create hyperlinks they think will be useful to readers. Some may be navigational aids that, for example, take the reader back to the site's home page; others provide access to documents that augment the content of the current page. The latter tend to point to high-quality pages that might be on the same topic as the page containing the hyperlink. Web information retrieval systems can exploit this information to refine searches for relevant documents. Hyperlink analysis significantly improves the relevance of the search results, so much so that all major Web search engines claim to use some type of hyperlink analysis. However, the search engines do not disclose details about the type of hyperlink analysis they perform - mostly to avoid manipulation of search results by Web-positioning companies. In this article, I discuss how hyperlink analysis can be applied to ranking algorithms, and survey other ways Web search engines can use this analysis.
  20. White, R.W.; Marchionini, G.: Examining the effectiveness of real-time query expansion (2007) 0.02
    0.018651098 = product of:
      0.07460439 = sum of:
        0.07460439 = weight(_text_:however in 1913) [ClassicSimilarity], result of:
          0.07460439 = score(doc=1913,freq=4.0), product of:
            0.28742972 = queryWeight, product of:
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.06921162 = queryNorm
            0.25955698 = fieldWeight in 1913, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.03125 = fieldNorm(doc=1913)
      0.25 = coord(1/4)
    
    Abstract
    Interactive query expansion (IQE) (cf. Efthimiadis, E.N. (1996). Query expansion. Annual Review of Information Science and Technology, 31, 121-187) is a potentially useful technique to help searchers formulate improved query statements, and ultimately retrieve better search results. However, IQE is seldom used in operational settings. Two possible explanations for this are that IQE is generally not integrated into searchers' established information-seeking behaviors (e.g., examining lists of documents), and it may not be offered at a time in the search when it is needed most (i.e., during the initial query formulation). These challenges can be addressed by coupling IQE more closely with familiar search activities, rather than as a separate functionality that searchers must learn. In this article we introduce and evaluate a variant of IQE known as Real-Time Query Expansion (RTQE). As a searcher enters their query in a text box at the interface, RTQE provides a list of suggested additional query terms, in effect offering query expansion options while the query is formulated. To investigate how the technique is used - and when it may be useful - we conducted a user study comparing three search interfaces: a baseline interface with no query expansion support; an interface that provides expansion options during query entry; and a third interface that provides options after queries have been submitted to a search system. The results show that offering RTQE leads to better quality initial queries, more engagement in the search, and an increase in the uptake of query expansion. However, the results also imply that care must be taken when implementing RTQE interactively. Our findings have broad implications for how IQE should be offered, and form part of our research on the development of techniques to support the increased use of query expansion.
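
A toy sketch of the interaction pattern studied (offering expansion terms while the query is being typed), based on simple co-occurrence counts. The class name, corpus and scoring are assumptions for illustration, not the interface or suggestion mechanism evaluated in the article.

```python
from collections import Counter, defaultdict

class RealTimeSuggester:
    """Toy RTQE-style suggester: offer expansion terms that co-occur with the
    words entered so far, recomputed as the query grows."""
    def __init__(self, docs):
        self.cooc = defaultdict(Counter)
        for doc in docs:
            terms = set(doc.lower().split())
            for t in terms:
                self.cooc[t].update(terms - {t})

    def suggest(self, partial_query, k=3):
        typed = set(partial_query.lower().split())
        scores = Counter()
        for word in typed:
            scores.update(self.cooc.get(word, Counter()))
        return [t for t, _ in scores.most_common() if t not in typed][:k]

docs = ["query expansion improves recall",
        "interactive query expansion interfaces",
        "real time query suggestion",
        "relevance feedback and query expansion"]
# Called again on every keystroke as the query is typed.
print(RealTimeSuggester(docs).suggest("query expansion"))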

Languages

  • e 54
  • d 36
  • m 1

Types

  • a 75
  • x 7
  • el 4
  • m 4
  • r 2
  • p 1
  • s 1