Document (#42595)

Author
Masanes, J.
Title
Web archiving methods and approaches : a comparative study
Source
Library trends. 54(2005) no.1, pp. 72-90
Year
2005
Abstract
The Web is a virtually infinite information space, and archiving its entirety, all its aspects, is a utopia. The volume of information presents a challenge, but it is neither the only nor the most limiting factor given the continuous drop in storage device costs. Significant challenges lie in the management and technical issues of the location and collection of Web sites. As a consequence of this, archiving the Web is a task that no single institution can carry out alone. This article will present various approaches undertaken today by different institutions; it will discuss their focuses, strengths, and limits, as well as a model for appraisal and identifying potential complementary aspects amongst them. A comparison for discovery accuracy is presented between the snapshot approach done by the Internet Archive (IA) and the event-based collection done by the Bibliothèque Nationale de France (BNF) in 2002 for the presidential and parliamentary elections. The balanced conclusion of this comparison allows for identification of future direction for improvement of the former approach.
Content
Cf.: DOI: 10.1353/lib.2006.0005.
Theme
Internet

Similar documents (content)

  1. Poole, A.H.: The information work of community archives : a systematic literature review (2020) 0.07
    0.07203726 = sum of:
      0.07203726 = product of:
        0.36018628 = sum of:
          0.014482299 = weight(abstract_txt:this in 840) [ClassicSimilarity], result of:
            0.014482299 = score(doc=840,freq=4.0), product of:
              0.048149247 = queryWeight, product of:
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.020010203 = queryNorm
              0.30077934 = fieldWeight in 840, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.0625 = fieldNorm(doc=840)
          0.01814327 = weight(abstract_txt:approach in 840) [ClassicSimilarity], result of:
            0.01814327 = score(doc=840,freq=1.0), product of:
              0.07759453 = queryWeight, product of:
                1.036514 = boost
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.020010203 = queryNorm
              0.2338215 = fieldWeight in 840, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.0625 = fieldNorm(doc=840)
          0.0967785 = weight(abstract_txt:appraisal in 840) [ClassicSimilarity], result of:
            0.0967785 = score(doc=840,freq=1.0), product of:
              0.1880168 = queryWeight, product of:
                1.1408879 = boost
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.020010203 = queryNorm
              0.51473325 = fieldWeight in 840, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.0625 = fieldNorm(doc=840)
          0.03482991 = weight(abstract_txt:collection in 840) [ClassicSimilarity], result of:
            0.03482991 = score(doc=840,freq=1.0), product of:
              0.11985486 = queryWeight, product of:
                1.2882123 = boost
                4.649612 = idf(docFreq=1154, maxDocs=44421)
                0.020010203 = queryNorm
              0.29060075 = fieldWeight in 840, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.649612 = idf(docFreq=1154, maxDocs=44421)
                0.0625 = fieldNorm(doc=840)
          0.19595228 = weight(abstract_txt:archiving in 840) [ClassicSimilarity], result of:
            0.19595228 = score(doc=840,freq=1.0), product of:
              0.43399498 = queryWeight, product of:
                3.0022552 = boost
                7.2241306 = idf(docFreq=87, maxDocs=44421)
                0.020010203 = queryNorm
              0.45150816 = fieldWeight in 840, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2241306 = idf(docFreq=87, maxDocs=44421)
                0.0625 = fieldNorm(doc=840)
        0.2 = coord(5/25)
    
  2. Käki, M.; Aula, A.: Controlling the complexity in comparing search user interfaces via user studies (2008) 0.07
    0.07184645 = sum of:
      0.07184645 = product of:
        0.35923225 = sum of:
          0.009051437 = weight(abstract_txt:this in 3024) [ClassicSimilarity], result of:
            0.009051437 = score(doc=3024,freq=1.0), product of:
              0.048149247 = queryWeight, product of:
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.020010203 = queryNorm
              0.18798709 = fieldWeight in 3024, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.078125 = fieldNorm(doc=3024)
          0.024940012 = weight(abstract_txt:will in 3024) [ClassicSimilarity], result of:
            0.024940012 = score(doc=3024,freq=1.0), product of:
              0.082669474 = queryWeight, product of:
                1.069873 = boost
                3.8615482 = idf(docFreq=2539, maxDocs=44421)
                0.020010203 = queryNorm
              0.30168346 = fieldWeight in 3024, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.8615482 = idf(docFreq=2539, maxDocs=44421)
                0.078125 = fieldNorm(doc=3024)
          0.1074756 = weight(abstract_txt:balanced in 3024) [ClassicSimilarity], result of:
            0.1074756 = score(doc=3024,freq=1.0), product of:
              0.17375767 = queryWeight, product of:
                1.0967727 = boost
                7.917278 = idf(docFreq=43, maxDocs=44421)
                0.020010203 = queryNorm
              0.6185373 = fieldWeight in 3024, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.917278 = idf(docFreq=43, maxDocs=44421)
                0.078125 = fieldNorm(doc=3024)
          0.10841458 = weight(abstract_txt:limiting in 3024) [ClassicSimilarity], result of:
            0.10841458 = score(doc=3024,freq=1.0), product of:
              0.17476824 = queryWeight, product of:
                1.0999575 = boost
                7.9402676 = idf(docFreq=42, maxDocs=44421)
                0.020010203 = queryNorm
              0.62033343 = fieldWeight in 3024, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.9402676 = idf(docFreq=42, maxDocs=44421)
                0.078125 = fieldNorm(doc=3024)
          0.10935064 = weight(abstract_txt:comparison in 3024) [ClassicSimilarity], result of:
            0.10935064 = score(doc=3024,freq=2.0), product of:
              0.17577279 = queryWeight, product of:
                1.5600389 = boost
                5.6307297 = idf(docFreq=432, maxDocs=44421)
                0.020010203 = queryNorm
              0.6221136 = fieldWeight in 3024, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.6307297 = idf(docFreq=432, maxDocs=44421)
                0.078125 = fieldNorm(doc=3024)
        0.2 = coord(5/25)
    
  3. Huang, T.; Nie, R.; Zhao, Y.: Archival knowledge in the field of personal archiving : an exploratory study based on grounded theory (2021) 0.07
    0.07058902 = sum of:
      0.07058902 = product of:
        0.3529451 = sum of:
          0.0072411494 = weight(abstract_txt:this in 1174) [ClassicSimilarity], result of:
            0.0072411494 = score(doc=1174,freq=1.0), product of:
              0.048149247 = queryWeight, product of:
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.020010203 = queryNorm
              0.15038967 = fieldWeight in 1174, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.0625 = fieldNorm(doc=1174)
          0.01814327 = weight(abstract_txt:approach in 1174) [ClassicSimilarity], result of:
            0.01814327 = score(doc=1174,freq=1.0), product of:
              0.07759453 = queryWeight, product of:
                1.036514 = boost
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.020010203 = queryNorm
              0.2338215 = fieldWeight in 1174, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.0625 = fieldNorm(doc=1174)
          0.0967785 = weight(abstract_txt:appraisal in 1174) [ClassicSimilarity], result of:
            0.0967785 = score(doc=1174,freq=1.0), product of:
              0.1880168 = queryWeight, product of:
                1.1408879 = boost
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.020010203 = queryNorm
              0.51473325 = fieldWeight in 1174, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.0625 = fieldNorm(doc=1174)
          0.03482991 = weight(abstract_txt:collection in 1174) [ClassicSimilarity], result of:
            0.03482991 = score(doc=1174,freq=1.0), product of:
              0.11985486 = queryWeight, product of:
                1.2882123 = boost
                4.649612 = idf(docFreq=1154, maxDocs=44421)
                0.020010203 = queryNorm
              0.29060075 = fieldWeight in 1174, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.649612 = idf(docFreq=1154, maxDocs=44421)
                0.0625 = fieldNorm(doc=1174)
          0.19595228 = weight(abstract_txt:archiving in 1174) [ClassicSimilarity], result of:
            0.19595228 = score(doc=1174,freq=1.0), product of:
              0.43399498 = queryWeight, product of:
                3.0022552 = boost
                7.2241306 = idf(docFreq=87, maxDocs=44421)
                0.020010203 = queryNorm
              0.45150816 = fieldWeight in 1174, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2241306 = idf(docFreq=87, maxDocs=44421)
                0.0625 = fieldNorm(doc=1174)
        0.2 = coord(5/25)
    
  4. Filipp, H.; Waudig, D.: Erfassung und Erschließung von Softwareinformationen (1991) 0.07
    0.06694362 = sum of:
      0.06694362 = product of:
        0.55786353 = sum of:
          0.018102873 = weight(abstract_txt:this in 4747) [ClassicSimilarity], result of:
            0.018102873 = score(doc=4747,freq=1.0), product of:
              0.048149247 = queryWeight, product of:
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.020010203 = queryNorm
              0.37597418 = fieldWeight in 4747, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.15625 = fieldNorm(doc=4747)
          0.049880024 = weight(abstract_txt:will in 4747) [ClassicSimilarity], result of:
            0.049880024 = score(doc=4747,freq=1.0), product of:
              0.082669474 = queryWeight, product of:
                1.069873 = boost
                3.8615482 = idf(docFreq=2539, maxDocs=44421)
                0.020010203 = queryNorm
              0.6033669 = fieldWeight in 4747, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.8615482 = idf(docFreq=2539, maxDocs=44421)
                0.15625 = fieldNorm(doc=4747)
          0.48988065 = weight(abstract_txt:archiving in 4747) [ClassicSimilarity], result of:
            0.48988065 = score(doc=4747,freq=1.0), product of:
              0.43399498 = queryWeight, product of:
                3.0022552 = boost
                7.2241306 = idf(docFreq=87, maxDocs=44421)
                0.020010203 = queryNorm
              1.1287704 = fieldWeight in 4747, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2241306 = idf(docFreq=87, maxDocs=44421)
                0.15625 = fieldNorm(doc=4747)
        0.12 = coord(3/25)
    
  5. Steenbakkers, J.F.: NEDLIB Guidelines for setting up a deposit system for electronic publications (2001) 0.07
    0.06615616 = sum of:
      0.06615616 = product of:
        0.413476 = sum of:
          0.012800664 = weight(abstract_txt:this in 4) [ClassicSimilarity], result of:
            0.012800664 = score(doc=4,freq=2.0), product of:
              0.048149247 = queryWeight, product of:
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.020010203 = queryNorm
              0.26585388 = fieldWeight in 4, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.078125 = fieldNorm(doc=4)
          0.109381266 = weight(abstract_txt:amongst in 4) [ClassicSimilarity], result of:
            0.109381266 = score(doc=4,freq=1.0), product of:
              0.1758056 = queryWeight, product of:
                1.1032171 = boost
                7.963798 = idf(docFreq=41, maxDocs=44421)
                0.020010203 = queryNorm
              0.6221717 = fieldWeight in 4, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.963798 = idf(docFreq=41, maxDocs=44421)
                0.078125 = fieldNorm(doc=4)
          0.046353754 = weight(abstract_txt:aspects in 4) [ClassicSimilarity], result of:
            0.046353754 = score(doc=4,freq=1.0), product of:
              0.12496949 = queryWeight, product of:
                1.3154114 = boost
                4.747783 = idf(docFreq=1046, maxDocs=44421)
                0.020010203 = queryNorm
              0.37092057 = fieldWeight in 4, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.747783 = idf(docFreq=1046, maxDocs=44421)
                0.078125 = fieldNorm(doc=4)
          0.24494033 = weight(abstract_txt:archiving in 4) [ClassicSimilarity], result of:
            0.24494033 = score(doc=4,freq=1.0), product of:
              0.43399498 = queryWeight, product of:
                3.0022552 = boost
                7.2241306 = idf(docFreq=87, maxDocs=44421)
                0.020010203 = queryNorm
              0.5643852 = fieldWeight in 4, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2241306 = idf(docFreq=87, maxDocs=44421)
                0.078125 = fieldNorm(doc=4)
        0.16 = coord(4/25)
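
The score trees above all follow the same arithmetic, which can be sketched in a few lines of Python. This is a minimal reconstruction from the numbers shown in the listing, not the search engine's own code: the function names (idf, term_score, document_score) are hypothetical, the boost and fieldNorm values are copied from the output rather than derived, and the idf formula 1 + ln(maxDocs / (docFreq + 1)) is the classic TF-IDF form that reproduces the idf values printed above. The example recomputes entry 1 (doc=840).

```python
import math

def idf(doc_freq, max_docs):
    # Classic TF-IDF inverse document frequency; reproduces e.g.
    # 2.4062347 for docFreq=10885, maxDocs=44421.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    # score = queryWeight * fieldWeight, as in each "weight(...)" subtree.
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm               # boost * idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm     # tf * idf * fieldNorm
    return query_weight * field_weight

def document_score(terms, total_query_terms, max_docs, query_norm, field_norm):
    # Sum the matching term scores, then scale by coord(matched/total).
    total = sum(
        term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost)
        for (_term, freq, doc_freq, boost) in terms
    )
    coord = len(terms) / total_query_terms
    return total * coord

# Values copied from the listing for entry 1 (doc=840).
MAX_DOCS = 44421
QUERY_NORM = 0.020010203
FIELD_NORM_840 = 0.0625

# (term, freq, docFreq, boost); "this" has no boost line, so 1.0 is assumed.
terms_840 = [
    ("this", 4.0, 10885, 1.0),
    ("approach", 1.0, 2864, 1.036514),
    ("appraisal", 1.0, 31, 1.1408879),
    ("collection", 1.0, 1154, 1.2882123),
    ("archiving", 1.0, 87, 3.0022552),
]

score_840 = document_score(terms_840, 25, MAX_DOCS, QUERY_NORM, FIELD_NORM_840)
print(round(score_840, 4))
```

Because the listing was produced with single-precision floats, the recomputed value agrees with the printed 0.07203726 only to about five or six decimal places. The same functions apply to the other entries once their freq, docFreq, boost, fieldNorm, and coord values are substituted.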