Document (#38041)

Author
Pabón, G.
Gutiérrez, C.
Fernández, J.D.
Martínez-Prieto, M.A.
Title
Linked Open Data technologies for publication of census microdata
Source
Journal of the American Society for Information Science and Technology. 64(2013) no.9, p.1802-1814
Year
2013
Abstract
Censuses are one of the most relevant types of statistical data, allowing analyses of the population in terms of demography, economy, sociology, and culture. For fine-grained analysis, census agencies publish census microdata that consist of a sample of individual records of the census containing detailed anonymous individual information. Working with microdata from different censuses and doing comparative studies are currently difficult tasks due to the diversity of formats and granularities. In this article, we show that novel data processing techniques can be applied to make census microdata interoperable and easy to access and combine. In fact, we demonstrate how Linked Open Data principles, a set of techniques to publish and make connections of (semi-)structured data on the web, can be fruitfully applied to census microdata. We present a step-by-step process to achieve this goal and we study, in theory and practice, two real case studies: the 2001 Spanish census and a general framework for Integrated Public Use Microdata Series (IPUMS-I).

Similar documents (author)

  1. Prieto-Díaz, R.: Applying faceted classification to domain analysis (1992) 1.20
    1.2024456 = sum of:
      1.2024456 = product of:
        3.6073368 = sum of:
          3.6073368 = weight(author_txt:prieto in 263) [ClassicSimilarity], result of:
            3.6073368 = score(doc=263,freq=1.0), product of:
              0.728041 = queryWeight, product of:
                1.2754395 = boost
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.05760168 = queryNorm
              4.954854 = fieldWeight in 263, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.5 = fieldNorm(doc=263)
        0.33333334 = coord(1/3)
    
  2. Prieto-Díaz, R.: Implementing faceted classification for software reuse (1991) 1.20
    1.2024456 = sum of:
      1.2024456 = product of:
        3.6073368 = sum of:
          3.6073368 = weight(author_txt:prieto in 547) [ClassicSimilarity], result of:
            3.6073368 = score(doc=547,freq=1.0), product of:
              0.728041 = queryWeight, product of:
                1.2754395 = boost
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.05760168 = queryNorm
              4.954854 = fieldWeight in 547, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.5 = fieldNorm(doc=547)
        0.33333334 = coord(1/3)
    
  3. Prieto-Díaz, R.: A faceted approach to building ontologies (2002) 1.20
    1.2024456 = sum of:
      1.2024456 = product of:
        3.6073368 = sum of:
          3.6073368 = weight(author_txt:prieto in 3259) [ClassicSimilarity], result of:
            3.6073368 = score(doc=3259,freq=1.0), product of:
              0.728041 = queryWeight, product of:
                1.2754395 = boost
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.05760168 = queryNorm
              4.954854 = fieldWeight in 3259, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.5 = fieldNorm(doc=3259)
        0.33333334 = coord(1/3)
    
  4. Moreno Fernández, L.M. -> Fernández, L.M.M.: 0.90
    0.89632916 = sum of:
      0.89632916 = product of:
        2.6889875 = sum of:
          2.6889875 = weight(author_txt:fernández in 5950) [ClassicSimilarity], result of:
            2.6889875 = score(doc=5950,freq=2.0), product of:
              0.5192883 = queryWeight, product of:
                1.0771748 = boost
                8.369263 = idf(docFreq=27, maxDocs=44421)
                0.05760168 = queryNorm
              5.178217 = fieldWeight in 5950, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.369263 = idf(docFreq=27, maxDocs=44421)
                0.4375 = fieldNorm(doc=5950)
        0.33333334 = coord(1/3)
    
  5. Hernández, S. Fernández- -> Fernández-Hernández, S.: 0.77
    0.7682822 = sum of:
      0.7682822 = product of:
        2.3048465 = sum of:
          2.3048465 = weight(author_txt:fernández in 1952) [ClassicSimilarity], result of:
            2.3048465 = score(doc=1952,freq=2.0), product of:
              0.5192883 = queryWeight, product of:
                1.0771748 = boost
                8.369263 = idf(docFreq=27, maxDocs=44421)
                0.05760168 = queryNorm
              4.438472 = fieldWeight in 1952, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.369263 = idf(docFreq=27, maxDocs=44421)
                0.375 = fieldNorm(doc=1952)
        0.33333334 = coord(1/3)
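
The score trees above can be checked by hand. The following is a minimal sketch, not the catalog's actual implementation: it assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which is the form that reproduces the idf values printed here (Lucene versions differ slightly in this formula). The function names are hypothetical; boost, queryNorm and fieldNorm are taken as given from the listing rather than derived.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Assumed idf form; it matches the idf values printed in the listing.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the explain tree.
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)
    query_weight = boost * idf * query_norm
    field_weight = tf * idf * field_norm
    return query_weight * field_weight


# Values copied from entry 1 (author_txt:prieto in doc 263).
term_score = classic_term_score(
    freq=1.0, doc_freq=5, max_docs=44421,
    boost=1.2754395, query_norm=0.05760168, field_norm=0.5,
)
# coord(1/3): one of three query terms matched.
final_score = term_score * (1 / 3)
print(term_score, final_score)
```

Up to float rounding, the two printed values should agree with the 3.6073368 and 1.2024456 shown for entry 1.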
    

Similar documents (content)

  1. Phenix, K.: Software for libraries : reviews of products for librarians and patrons (1993) 0.16
    0.15558162 = sum of:
      0.15558162 = product of:
        1.2965136 = sum of:
          0.027093252 = weight(abstract_txt:studies in 6754) [ClassicSimilarity], result of:
            0.027093252 = score(doc=6754,freq=1.0), product of:
              0.05820179 = queryWeight, product of:
                1.2858034 = boost
                4.25605 = idf(docFreq=1711, maxDocs=44421)
                0.010635429 = queryNorm
              0.46550548 = fieldWeight in 6754, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.25605 = idf(docFreq=1711, maxDocs=44421)
                0.109375 = fieldNorm(doc=6754)
          0.056203384 = weight(abstract_txt:data in 6754) [ClassicSimilarity], result of:
            0.056203384 = score(doc=6754,freq=3.0), product of:
              0.08908614 = queryWeight, product of:
                2.5152519 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.010635429 = queryNorm
              0.6308881 = fieldWeight in 6754, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.109375 = fieldNorm(doc=6754)
          1.2132169 = weight(abstract_txt:census in 6754) [ClassicSimilarity], result of:
            1.2132169 = score(doc=6754,freq=2.0), product of:
              0.8844377 = queryWeight, product of:
                9.377219 = boost
                8.868255 = idf(docFreq=16, maxDocs=44421)
                0.010635429 = queryNorm
              1.3717382 = fieldWeight in 6754, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.868255 = idf(docFreq=16, maxDocs=44421)
                0.109375 = fieldNorm(doc=6754)
        0.12 = coord(3/25)
    
  2. Lamb, I.; Larson, C.: Shining a light on scientific data : building a data catalog to foster data sharing and reuse (2016) 0.15
    0.14922546 = sum of:
      0.14922546 = product of:
        0.74612725 = sum of:
          0.036414076 = weight(abstract_txt:population in 4195) [ClassicSimilarity], result of:
            0.036414076 = score(doc=4195,freq=1.0), product of:
              0.07040721 = queryWeight, product of:
                6.6200633 = idf(docFreq=160, maxDocs=44421)
                0.010635429 = queryNorm
              0.5171924 = fieldWeight in 4195, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6200633 = idf(docFreq=160, maxDocs=44421)
                0.078125 = fieldNorm(doc=4195)
          0.019352322 = weight(abstract_txt:studies in 4195) [ClassicSimilarity], result of:
            0.019352322 = score(doc=4195,freq=1.0), product of:
              0.05820179 = queryWeight, product of:
                1.2858034 = boost
                4.25605 = idf(docFreq=1711, maxDocs=44421)
                0.010635429 = queryNorm
              0.3325039 = fieldWeight in 4195, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.25605 = idf(docFreq=1711, maxDocs=44421)
                0.078125 = fieldNorm(doc=4195)
          0.025766455 = weight(abstract_txt:make in 4195) [ClassicSimilarity], result of:
            0.025766455 = score(doc=4195,freq=1.0), product of:
              0.07043968 = queryWeight, product of:
                1.4145396 = boost
                4.682171 = idf(docFreq=1117, maxDocs=44421)
                0.010635429 = queryNorm
              0.3657946 = fieldWeight in 4195, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.682171 = idf(docFreq=1117, maxDocs=44421)
                0.078125 = fieldNorm(doc=4195)
          0.051827323 = weight(abstract_txt:data in 4195) [ClassicSimilarity], result of:
            0.051827323 = score(doc=4195,freq=5.0), product of:
              0.08908614 = queryWeight, product of:
                2.5152519 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.010635429 = queryNorm
              0.5817664 = fieldWeight in 4195, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.078125 = fieldNorm(doc=4195)
          0.6127671 = weight(abstract_txt:census in 4195) [ClassicSimilarity], result of:
            0.6127671 = score(doc=4195,freq=1.0), product of:
              0.8844377 = queryWeight, product of:
                9.377219 = boost
                8.868255 = idf(docFreq=16, maxDocs=44421)
                0.010635429 = queryNorm
              0.6928324 = fieldWeight in 4195, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.868255 = idf(docFreq=16, maxDocs=44421)
                0.078125 = fieldNorm(doc=4195)
        0.2 = coord(5/25)
    
  3. Mixter, J.; Childress, E.R.: FAST (Faceted Application of Subject Terminology) users : summary and case studies (2013) 0.09
    0.0892972 = sum of:
      0.0892972 = product of:
        0.5581075 = sum of:
          0.029131262 = weight(abstract_txt:agencies in 3011) [ClassicSimilarity], result of:
            0.029131262 = score(doc=3011,freq=1.0), product of:
              0.07040721 = queryWeight, product of:
                6.6200633 = idf(docFreq=160, maxDocs=44421)
                0.010635429 = queryNorm
              0.41375396 = fieldWeight in 3011, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6200633 = idf(docFreq=160, maxDocs=44421)
                0.0625 = fieldNorm(doc=3011)
          0.0154818585 = weight(abstract_txt:studies in 3011) [ClassicSimilarity], result of:
            0.0154818585 = score(doc=3011,freq=1.0), product of:
              0.05820179 = queryWeight, product of:
                1.2858034 = boost
                4.25605 = idf(docFreq=1711, maxDocs=44421)
                0.010635429 = queryNorm
              0.26600313 = fieldWeight in 3011, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.25605 = idf(docFreq=1711, maxDocs=44421)
                0.0625 = fieldNorm(doc=3011)
          0.023280699 = weight(abstract_txt:individual in 3011) [ClassicSimilarity], result of:
            0.023280699 = score(doc=3011,freq=1.0), product of:
              0.07639266 = queryWeight, product of:
                1.4731 = boost
                4.8760076 = idf(docFreq=920, maxDocs=44421)
                0.010635429 = queryNorm
              0.30475047 = fieldWeight in 3011, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8760076 = idf(docFreq=920, maxDocs=44421)
                0.0625 = fieldNorm(doc=3011)
          0.49021366 = weight(abstract_txt:census in 3011) [ClassicSimilarity], result of:
            0.49021366 = score(doc=3011,freq=1.0), product of:
              0.8844377 = queryWeight, product of:
                9.377219 = boost
                8.868255 = idf(docFreq=16, maxDocs=44421)
                0.010635429 = queryNorm
              0.5542659 = fieldWeight in 3011, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.868255 = idf(docFreq=16, maxDocs=44421)
                0.0625 = fieldNorm(doc=3011)
        0.16 = coord(4/25)
    
  4. Leoncini, C.; Servello, R.M.: The activities for authority control in EDIT16: authors, publishers/printers, devices, and places (2004) 0.08
    0.08140095 = sum of:
      0.08140095 = product of:
        1.017512 = sum of:
          0.037084617 = weight(abstract_txt:data in 5597) [ClassicSimilarity], result of:
            0.037084617 = score(doc=5597,freq=1.0), product of:
              0.08908614 = queryWeight, product of:
                2.5152519 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.010635429 = queryNorm
              0.41627818 = fieldWeight in 5597, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.125 = fieldNorm(doc=5597)
          0.9804273 = weight(abstract_txt:census in 5597) [ClassicSimilarity], result of:
            0.9804273 = score(doc=5597,freq=1.0), product of:
              0.8844377 = queryWeight, product of:
                9.377219 = boost
                8.868255 = idf(docFreq=16, maxDocs=44421)
                0.010635429 = queryNorm
              1.1085318 = fieldWeight in 5597, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.868255 = idf(docFreq=16, maxDocs=44421)
                0.125 = fieldNorm(doc=5597)
        0.08 = coord(2/25)
    
  5. Hernon, P.; Dugan, R.E.: GIS and privacy (1997) 0.08
    0.0798055 = sum of:
      0.0798055 = product of:
        0.66504586 = sum of:
          0.029100873 = weight(abstract_txt:individual in 2583) [ClassicSimilarity], result of:
            0.029100873 = score(doc=2583,freq=1.0), product of:
              0.07639266 = queryWeight, product of:
                1.4731 = boost
                4.8760076 = idf(docFreq=920, maxDocs=44421)
                0.010635429 = queryNorm
              0.38093808 = fieldWeight in 2583, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8760076 = idf(docFreq=920, maxDocs=44421)
                0.078125 = fieldNorm(doc=2583)
          0.023177885 = weight(abstract_txt:data in 2583) [ClassicSimilarity], result of:
            0.023177885 = score(doc=2583,freq=1.0), product of:
              0.08908614 = queryWeight, product of:
                2.5152519 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.010635429 = queryNorm
              0.26017386 = fieldWeight in 2583, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.078125 = fieldNorm(doc=2583)
          0.6127671 = weight(abstract_txt:census in 2583) [ClassicSimilarity], result of:
            0.6127671 = score(doc=2583,freq=1.0), product of:
              0.8844377 = queryWeight, product of:
                9.377219 = boost
                8.868255 = idf(docFreq=16, maxDocs=44421)
                0.010635429 = queryNorm
              0.6928324 = fieldWeight in 2583, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.868255 = idf(docFreq=16, maxDocs=44421)
                0.078125 = fieldNorm(doc=2583)
        0.12 = coord(3/25)
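
For the content-based matches, each tree sums the per-term scores and then scales the sum by coord, the fraction of the 25 query terms found in the document. The sketch below recomputes entry 4 (doc 5597) from the queryWeight and fieldWeight values printed above; the function name is hypothetical and the code only illustrates the arithmetic, not the search engine's internals.

```python
def combined_score(term_weights, total_query_terms):
    # term_weights: (queryWeight, fieldWeight) pairs, one per matching term.
    term_scores = [qw * fw for qw, fw in term_weights]
    # coord rewards documents that match more of the query terms.
    coord = len(term_scores) / total_query_terms
    return sum(term_scores) * coord


# Entry 4 (doc 5597): matches on "data" and "census", 2 of 25 query terms.
entry4_weights = [
    (0.08908614, 0.41627818),  # abstract_txt:data
    (0.8844377, 1.1085318),    # abstract_txt:census
]
entry4_score = combined_score(entry4_weights, total_query_terms=25)
print(entry4_score)
```

In this entry the census term contributes nearly all of the sum, because its idf and boost are both much larger than those of data.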