<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Data | Social Sustainability Data Observatory</title><link>https://social-dataobservatory-eu.netlify.app/data/</link><atom:link href="https://social-dataobservatory-eu.netlify.app/data/index.xml" rel="self" type="application/rss+xml"/><description>Data</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><lastBuildDate>Mon, 05 Jul 2021 08:00:00 +0000</lastBuildDate><image><url>https://social-dataobservatory-eu.netlify.app/media/icon_hufacc74e466c78853fac059e005cdc2c0_24983_512x512_fill_lanczos_center_3.png</url><title>Data</title><link>https://social-dataobservatory-eu.netlify.app/data/</link></image><item><title>Survey Harmonization</title><link>https://social-dataobservatory-eu.netlify.app/data/surveys/</link><pubDate>Mon, 05 Jul 2021 08:00:00 +0000</pubDate><guid>https://social-dataobservatory-eu.netlify.app/data/surveys/</guid><description>&lt;p>We provide retrospective (&lt;em>ex post&lt;/em>) and &lt;em>ex ante&lt;/em> survey harmonization to our partners.&lt;/p>
&lt;ol>
&lt;li>The aim of retrospective survey harmonization is to pool data from pre-existing surveys that were conducted with a similar methodology at different points in time and in different countries or territories. Ex post survey harmonization is, in a way, a passive form of pooling research funding, because you can use information from surveys that were conducted at somebody else’s expense.&lt;/li>
&lt;/ol>
&lt;figure id="figure-the-arab-barometer-surveys-do-not-have-a-consolidated-codebook-but-our-retroharmonize-software-created-one-and-put-together-data-from-three-years-and-collected-in-many-countries-about-various-public-policy-issues">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://social-dataobservatory-eu.netlify.app/media/img/surveys/arabb-comparison-select-country-chart.png" alt="The Arab Barometer surveys do not have a consolidated codebook, but our retroharmonize software created one, and put together data from three years and collected in many countries about various public policy issues." loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
The Arab Barometer surveys do not have a consolidated codebook, but our retroharmonize software created one and put together data from three years, collected in many countries, about various public policy issues.
&lt;/figcaption>&lt;/figure>
&lt;ol start="2">
&lt;li>The aim of ex ante survey harmonization is to maximize the value from future retrospective harmonization; in a way, it is an active form of pooling research funding, because you benefit from money spent on related open governmental and open science survey programs.&lt;/li>
&lt;/ol>
&lt;figure id="figure-in-this-example-we-designed-a-survey-representative-among-music-professionals-that-it-can-be-compared-with-large-sample-national-surveys-on-living-conditions-and-attitudes-and-with-occupational-groups--nationally-representative-surveys-do-not-question-enough-musicians-to-allow-such-specific-use-musician-only-surveys-do-not-allow-comparison">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://social-dataobservatory-eu.netlify.app/media/img/surveys/difficulty_bills_levels.jpg" alt="In this example we designed a survey representative among music professionals that it can be compared with large-sample, national surveys on living conditions and attitudes, and with occupational groups. Nationally representative surveys do not question enough musicians to allow such specific use; musician only surveys do not allow comparison." loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
In this example, we designed a survey that is representative of music professionals so that it can be compared with large-sample national surveys on living conditions and attitudes, and with occupational groups. Nationally representative surveys do not question enough musicians to allow such specific use; musician-only surveys do not allow comparison.
&lt;/figcaption>&lt;/figure>
&lt;p>&lt;a href="https://retroharmonize.dataobservatory.eu/" target="_blank" rel="noopener">retroharmonize&lt;/a> is peer-reviewed, scientific statistical software that allows the programmatic retrospective harmonization of surveys, such as the last 35 years of all Eurobarometer microdata, or all Afrobarometer microdata. Eurobarometer grew out of certain CEE member states’ need for comparable data about their music and audiovisual sectors. We commissioned surveys following ESSNet-Culture guidelines and combined our survey data with open access European microdata-level surveys.&lt;/p>
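&lt;p>As a rough illustration of what programmatic retrospective harmonization involves, the sketch below maps differently coded answers from two survey waves onto one shared label set. It is written in Python for brevity and is not the retroharmonize API; the wave names, the answer codes, and the &lt;code>harmonize&lt;/code> helper are all hypothetical.&lt;/p>

```python
# Hypothetical crosswalks: each wave coded the same question differently.
CROSSWALKS = {
    "wave_2016": {1: "agree", 2: "disagree", 9: "declined"},
    "wave_2019": {"Agree": "agree", "Disagree": "disagree", "DK": "declined"},
}


def harmonize(wave, answers):
    """Translate the raw answers of one wave into the shared labels.

    Codes that are missing from the crosswalk become None, so that
    unexpected values stay visible instead of being silently kept.
    """
    crosswalk = CROSSWALKS[wave]
    return [crosswalk.get(answer) for answer in answers]


# Pool the two waves once they share one label set.
pooled = harmonize("wave_2016", [1, 2, 9]) + harmonize("wave_2019", ["Agree", "DK", "?"])
print(pooled)
```

&lt;p>Returning &lt;code>None&lt;/code> for unmapped codes is a design choice of this sketch: a harmonized codebook is only trustworthy if every unexpected code is surfaced for review.&lt;/p>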
&lt;p>&lt;a href="https://regions.dataobservatory.eu/" target="_blank" rel="noopener">regions&lt;/a> solves the problems caused by Europe’s shifting regional boundaries, which have undergone changes in several thousand places over the last twenty years, meaning member states’ and Eurostat’s regional statistics are not comparable over more than two to three years. This software validates and, where possible, converts the regional coding between any NUTS versions from NUTS1999 to the not yet used NUTS2021, opening up vast, valuable, untapped data sources that can be used for longitudinal analysis or for panel analysis far more precise than what national data alone would allow. It was originally designed in a research project at IVIR at the University of Amsterdam to understand the geographical dynamics of book piracy. Because of the needs this software fills, it had 700 users in the first month after publication. It is particularly useful for re-coding old surveys, as regional boundaries in Europe change several hundred times in each decade.&lt;/p></description></item><item><title>Metadata</title><link>https://social-dataobservatory-eu.netlify.app/data/metadata/</link><pubDate>Tue, 01 Jun 2021 11:00:00 +0000</pubDate><guid>https://social-dataobservatory-eu.netlify.app/data/metadata/</guid><description>&lt;p>Our observatory has a new data API that provides access to our open data, which is refreshed daily. You can access the API via &lt;a href="http://api.greendeal.dataobservatory.eu/" target="_blank" rel="noopener">api.greendeal.dataobservatory.eu&lt;/a>.&lt;/p>
&lt;p>All the data and the metadata are available as open data, without database use restrictions, under the &lt;a href="https://opendatacommons.org/licenses/odbl/" target="_blank" rel="noopener">ODbL&lt;/a> license. However, the metadata contents are not finalized yet. We are currently working on a solution that applies the &lt;a href="http://www.nature.com/articles/sdata201618" target="_blank" rel="noopener">FAIR Guiding Principles for scientific data management and stewardship&lt;/a>, and fulfills the mandatory requirements of the Dublin Core metadata standards and, at the same time, the &lt;a href="https://support.datacite.org/docs/datacite-metadata-schema-v44-mandatory-properties" target="_blank" rel="noopener">mandatory requirements&lt;/a> and most of the &lt;a href="https://support.datacite.org/docs/datacite-metadata-schema-v44-recommended-and-optional-properties" target="_blank" rel="noopener">recommended requirements&lt;/a> of DataCite. These changes will be effective before 1 July 2021.&lt;/p>
&lt;p>The &lt;strong>Competition Data Observatory&lt;/strong> temporarily shares an API with the &lt;a href="https://economy.dataobservatory.eu/" target="_blank" rel="noopener">Economy Data Observatory&lt;/a>, which serves as an incubator for similar economy-oriented reproducible research resources.&lt;/p>
&lt;figure id="figure-apigreendealdataobservatoryeuhttpsapigreendealdataobservatoryeudatabasemetadata-descriptive-metadata">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://social-dataobservatory-eu.netlify.app/media/img/observatory_screenshots/GDO_API_metadata_table.png" alt="[api.greendeal.dataobservatory.eu](https://api.greendeal.dataobservatory.eu/database/metadata) descriptive metadata" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
&lt;a href="https://api.greendeal.dataobservatory.eu/database/metadata" target="_blank" rel="noopener">api.greendeal.dataobservatory.eu&lt;/a> descriptive metadata
&lt;/figcaption>&lt;/figure>
&lt;h2 id="descriptive-metadata">Descriptive Metadata&lt;/h2>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:left">&lt;/th>
&lt;th style="text-align:center">&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:left">Identifier&lt;/td>
&lt;td style="text-align:center">An unambiguous reference to the resource within a given context. (Dublin Core item.) Several identifiers are allowed, and we will use several of them.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">Creator&lt;/td>
&lt;td style="text-align:center">The main researchers involved in producing the data, or the authors of the publication, in priority order. To supply multiple creators, repeat this property. (Extends the Dublin Core with multiple authors, and legal persons, and adds affiliation data.)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">Title&lt;/td>
&lt;td style="text-align:center">A name given to the resource. Extends Dublin Core with alternative title, subtitle, translated Title, and other title(s).&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">Publisher&lt;/td>
&lt;td style="text-align:center">The name of the entity that holds, archives, publishes, prints, distributes, releases, issues, or produces the resource. This property will be used to formulate the citation, so consider the prominence of the role. For software, use Publisher for the code repository. (Dublin Core item.)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">Publication Year&lt;/td>
&lt;td style="text-align:center">The year when the data was or will be made publicly available.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">Resource Type&lt;/td>
&lt;td style="text-align:center">We publish Datasets, Images, Reports, and Data Papers. (Dublin Core item with controlled vocabulary.)&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
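&lt;p>A simple way to enforce the properties in the table above is to check every record before release. The following Python sketch assumes a plain dictionary as the record layout; the key spellings, the placeholder values, and the &lt;code>missing_mandatory&lt;/code> helper are hypothetical and do not describe the observatory's actual software.&lt;/p>

```python
# Property names follow the table above; the spelling of the keys is an assumption.
MANDATORY = (
    "Identifier",
    "Creator",
    "Title",
    "Publisher",
    "PublicationYear",
    "ResourceType",
)


def missing_mandatory(record):
    """Return the mandatory properties that are absent or empty in `record`."""
    return [name for name in MANDATORY if not record.get(name)]


example = {
    "Identifier": "doi:10.0000/example",  # placeholder, not a real DOI
    "Creator": ["Example Author"],
    "Title": "Example indicator",
    "Publisher": "Example Observatory",
    "PublicationYear": 2021,
    "ResourceType": "Dataset",
}
print(missing_mandatory(example))
```

&lt;p>An empty result means the record can be released; any returned name points the curator to the property that still needs input.&lt;/p>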
&lt;h3 id="recommended-for-discovery">Recommended for discovery&lt;/h3>
&lt;p>The &lt;strong>Recommended&lt;/strong> (R) properties are optional, but strongly recommended for interoperability.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:left">&lt;/th>
&lt;th style="text-align:center">&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:left">Subject&lt;/td>
&lt;td style="text-align:center">The topic of the resource. (Dublin Core item.)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">Contributor&lt;/td>
&lt;td style="text-align:center">The institution or person responsible for collecting, managing, distributing, or otherwise contributing to the development of the resource. (Extends the Dublin Core with multiple authors, and legal persons, and adds affiliation data.) When applicable, we add Distributor (of the datasets and images), Contact Person, Data Collector, Data Curator, Data Manager, Hosting Institution, Producer (for images), Project Manager, Researcher, Research Group, Rightsholder, Sponsor, Supervisor&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">Date&lt;/td>
&lt;td style="text-align:center">A point or period of time associated with an event in the lifecycle of the resource. Besides the Dublin Core minimum, we add Collected, Created, Issued, Updated, and, if necessary, Withdrawn dates to our datasets.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">Related Identifier&lt;/td>
&lt;td style="text-align:center">Identifiers of related resources.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">Rights&lt;/td>
&lt;td style="text-align:center">We give &lt;a href="https://spdx.org/licenses/" target="_blank" rel="noopener">SPDX License List&lt;/a> standards rights description with URLs to the actual license. (Dublin Core item: Rights Management)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">Description&lt;/td>
&lt;td style="text-align:center">Recommended for discovery. (Dublin Core item.)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">GeoLocation&lt;/td>
&lt;td style="text-align:center">Similar to Dublin Core item Coverage&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;ul>
&lt;li>The &lt;code>Subject&lt;/code> property: we need to set standard coding schemas for each observatory.&lt;/li>
&lt;li>&lt;code>Contributor&lt;/code> property:
&lt;ul>
&lt;li>&lt;code>DataCurator&lt;/code> the curator of the dataset, who sets the mandatory properties.&lt;/li>
&lt;li>&lt;code>DataManager&lt;/code> the person who keeps the dataset up-to-date.&lt;/li>
&lt;li>&lt;code>ContactPerson&lt;/code> the person who can be contacted for reuse requests or bug reports.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>The &lt;code>Date&lt;/code> property contains the following dates, which are set automatically by the &lt;a href="https://r.dataobservatory.eu/" target="_blank" rel="noopener">dataobservatory R package&lt;/a>:
&lt;ul>
&lt;li>&lt;code>Updated&lt;/code> when the dataset was updated;&lt;/li>
&lt;li>&lt;code>EarliestObservation&lt;/code>, which is the earliest observation that is not backcasted, estimated or imputed.&lt;/li>
&lt;li>&lt;code>LatestObservation&lt;/code>, which is the latest observation that is not backcasted, estimated or imputed.&lt;/li>
&lt;li>&lt;code>UpdatedatSource&lt;/code>, when the raw data source was last updated.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>The &lt;code>GeoLocation&lt;/code> is automatically created by the &lt;a href="https://r.dataobservatory.eu/" target="_blank" rel="noopener">dataobservatory R package&lt;/a>.&lt;/li>
&lt;li>The &lt;code>Description&lt;/code> property has optional elements, and we adopted them as follows for the observatories:
&lt;ul>
&lt;li>The &lt;code>Abstract&lt;/code> is a short, textual description; we try to automate its creation as much as possible, but some curatorial input is necessary.&lt;/li>
&lt;li>In the &lt;code>TechnicalInfo&lt;/code> sub-field, we automatically record the &lt;code>utils::sessionInfo()&lt;/code> for computational reproducibility. This is automatically created by the &lt;a href="https://r.dataobservatory.eu/" target="_blank" rel="noopener">dataobservatory R package&lt;/a>.&lt;/li>
&lt;li>In the &lt;code>Other&lt;/code> sub-field, we record the keywords for structuring the observatory.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
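&lt;p>The &lt;code>EarliestObservation&lt;/code> and &lt;code>LatestObservation&lt;/code> dates described above can be derived from the observations themselves. The Python sketch below only illustrates the idea and is not the dataobservatory R package; the tuple layout and the status labels (&lt;code>"actual"&lt;/code>, &lt;code>"backcasted"&lt;/code>, &lt;code>"estimated"&lt;/code>) are assumptions made for the example.&lt;/p>

```python
from datetime import date


def observation_range(observations):
    """Find the first and last dates that carry an actual observation.

    `observations` is an iterable of (date, value, status) tuples.
    Backcasted, estimated, and imputed points are ignored, as are
    empty values.
    """
    actual_dates = [
        when
        for when, value, status in observations
        if status == "actual" and value is not None
    ]
    if not actual_dates:
        return None
    return {
        "EarliestObservation": min(actual_dates),
        "LatestObservation": max(actual_dates),
    }


series = [
    (date(2018, 1, 1), 10.2, "backcasted"),
    (date(2019, 1, 1), 10.9, "actual"),
    (date(2020, 1, 1), 11.4, "actual"),
    (date(2021, 1, 1), 11.8, "estimated"),
]
print(observation_range(series))
```

&lt;p>In this example the reported range runs from 2019 to 2020, even though the series itself spans 2018 to 2021, because the first and last points were not actually observed.&lt;/p>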
&lt;h3 id="optional">Optional&lt;/h3>
&lt;p>The &lt;strong>Optional&lt;/strong> (O) properties are optional and provide richer description. For findability they are not so important, but to create a web service, they are essential. In the mandatory and recommended fields, we are following other metadata standards and codelists, but in the optional fields we have to build up our own system for the observatories.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:left">&lt;/th>
&lt;th style="text-align:center">&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:left">Language&lt;/td>
&lt;td style="text-align:center">A language of the resource. (Dublin Core item.)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">Alternative Identifier&lt;/td>
&lt;td style="text-align:center">An identifier or identifiers other than the primary Identifier applied to the resource being registered.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">Size&lt;/td>
&lt;td style="text-align:center">We give the CSV, downloadable dataset size in bytes.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">Format&lt;/td>
&lt;td style="text-align:center">We give file format information. We mainly use CSV and JSON, and occasionally rds and SPSS types. (Dublin Core item.)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">Version&lt;/td>
&lt;td style="text-align:center">The version number of the resource.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">Rights&lt;/td>
&lt;td style="text-align:center">We give &lt;a href="https://spdx.org/licenses/" target="_blank" rel="noopener">SPDX License List&lt;/a> standards rights description with URLs to the actual license. (Dublin Core item: Rights Management)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">Funding Reference&lt;/td>
&lt;td style="text-align:center">We provide the funding reference information when applicable. This is usually mandatory with public funds.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">Related Item&lt;/td>
&lt;td style="text-align:center">We give information about our observatory partners&amp;rsquo; related research products, awards, grants (also Dublin Core item as Relation.) We particularly include source information when the dataset is derived from another resource (which is a Dublin Core item.)&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;ul>
&lt;li>In the &lt;code>Language&lt;/code> we only use English (eng) at the moment.&lt;/li>
&lt;li>By default, we do not use the &lt;code>Alternative Identifier&lt;/code> property. We will use it when the same dataset is used in several observatories.&lt;/li>
&lt;li>The &lt;code>Size&lt;/code> property is measured in bytes for the CSV representation of the dataset. During creation, the software writes a temporary CSV file to check that the dataset can be written without problems, and measures the dataset size.&lt;/li>
&lt;li>The &lt;code>Version&lt;/code> property needs further work. For a daily refreshing API we need to find an applicable versioning system.&lt;/li>
&lt;li>The &lt;code>Funding reference&lt;/code> will contain information for donors, sponsors, and co-financing partners.&lt;/li>
&lt;li>Our default setting for &lt;code>Rights&lt;/code> is the &lt;a href="https://spdx.org/licenses/CC-BY-NC-SA-4.0.html" target="_blank" rel="noopener">CC-BY-NC-SA-4.0&lt;/a> license and we provide a URI for the license document.&lt;/li>
&lt;li>In the &lt;code>RelatedItem&lt;/code> we give information about:
&lt;ul>
&lt;li>The original (raw) data source.&lt;/li>
&lt;li>Methodological bibliographic reference, when needed.&lt;/li>
&lt;li>The open-source statistical software code that processed the data.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
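&lt;p>The &lt;code>Size&lt;/code> measurement described above, writing a temporary CSV file and measuring it, can be sketched as follows. This is a minimal Python illustration under our own assumptions (UTF-8 encoding, the default line endings of Python's &lt;code>csv&lt;/code> module, and a hypothetical &lt;code>csv_size_in_bytes&lt;/code> helper); the observatory's own software may write the file differently and report a different size.&lt;/p>

```python
import csv
import os
import tempfile


def csv_size_in_bytes(rows, fieldnames):
    """Write `rows` to a temporary CSV file and return its size in bytes.

    Writing the file doubles as a check that the dataset can be
    serialized at all; any write problem raises an exception here.
    """
    with tempfile.NamedTemporaryFile(
        "w", newline="", suffix=".csv", delete=False, encoding="utf-8"
    ) as tmp:
        writer = csv.DictWriter(tmp, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
        path = tmp.name
    try:
        return os.path.getsize(path)
    finally:
        os.remove(path)  # always clean up the temporary file


size = csv_size_in_bytes([{"geo": "NL", "value": 1}], ["geo", "value"])
print(size)
```

&lt;p>Because the size depends on encoding and line endings, it is worth recording these choices next to the &lt;code>Size&lt;/code> value.&lt;/p>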
&lt;h2 id="processing-metadata">Administrative (Processing) Metadata&lt;/h2>
&lt;p>As with diamonds, it is better to know the history of a dataset, too. Our administrative metadata contains codelists that follow the SDMX statistical metadata standards, and similarly structured information about the processing history of the dataset.&lt;/p>
&lt;figure id="figure-apigreendealdataobservatoryeuhttpsapigreendealdataobservatoryeudatabasecodebook-processing-metadata">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://social-dataobservatory-eu.netlify.app/media/img/observatory_screenshots/GDO_API_codebook_table.png" alt="[api.greendeal.dataobservatory.eu](https://api.greendeal.dataobservatory.eu/database/codebook) processing metadata" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
&lt;a href="https://api.greendeal.dataobservatory.eu/database/codebook" target="_blank" rel="noopener">api.greendeal.dataobservatory.eu&lt;/a> processing metadata
&lt;/figcaption>&lt;/figure>
&lt;p>See for further reference &lt;a href="https://r.dataobservatory.eu/articles/codebook.html" target="_blank" rel="noopener">The codebook Class&lt;/a>.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:left">&lt;/th>
&lt;th style="text-align:center">&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:left">Observation Status&lt;/td>
&lt;td style="text-align:center">SDMX Code list for &lt;a href="https://sdmx.org/?sdmx_news=new-version-of-code-list-for-observation-status-version-2-2" target="_blank" rel="noopener">Observation Status 2.2&lt;/a> (CL_OBS_STATUS), such as actual, missing, imputed, etc. values.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">Method&lt;/td>
&lt;td style="text-align:center">If the value is estimated, we provide modelling information.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">Unit&lt;/td>
&lt;td style="text-align:center">We provide the measurement unit of the data (when applicable.)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">Frequency&lt;/td>
&lt;td style="text-align:center">&lt;a href="https://sdmx.org/?page_id=3215/" target="_blank" rel="noopener">SDMX Code list for Frequency 2.1 (CL_FREQ)&lt;/a> frequency values&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">Codelist&lt;/td>
&lt;td style="text-align:center">Euro-SDMX Codelist entries for the observational units, such as sex, etc.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">Imputation&lt;/td>
&lt;td style="text-align:center">SDMX Code list for imputation methods (CL_IMPUT_METH) imputation values&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">Estimation&lt;/td>
&lt;td style="text-align:center">The estimation methodology of data that we calculated, together with citation information and URI to the actual processing code&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">Related Item&lt;/td>
&lt;td style="text-align:center">We give information about the software code that processed the data (both Dublin Core and DataCite compliant.)&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
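&lt;p>To make the table above more concrete, the following Python sketch attaches an observation status code to each data point. It is only an illustration: the single-letter codes are our reading of a small subset of CL_OBS_STATUS and should be verified against the official SDMX code list, and the record layout and the &lt;code>codebook_rows&lt;/code> helper are hypothetical.&lt;/p>

```python
# Assumed subset of CL_OBS_STATUS codes; verify against the official list.
NORMAL, ESTIMATED, IMPUTED, MISSING = "A", "E", "I", "O"


def codebook_rows(observations):
    """Derive one processing-metadata row per observation.

    Each observation is a dict with 'time', 'value', and an optional
    'method' key that is either 'estimated' or 'imputed'.
    """
    rows = []
    for obs in observations:
        method = obs.get("method")
        if obs["value"] is None:
            status = MISSING
        elif method == "imputed":
            status = IMPUTED
        elif method == "estimated":
            status = ESTIMATED
        else:
            status = NORMAL
        rows.append(
            {"time": obs["time"], "obs_status": status, "method": method or "actual"}
        )
    return rows


observations = [
    {"time": "2018", "value": 4.1},
    {"time": "2019", "value": None},
    {"time": "2020", "value": 4.4, "method": "imputed"},
]
for row in codebook_rows(observations):
    print(row)
```

&lt;p>Keeping the status next to every value lets a user filter out imputed or estimated points before analysis, which is the purpose of the Observation Status and Method fields.&lt;/p>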
&lt;p>See an example in &lt;a href="https://r.dataobservatory.eu/articles/codebook.html" target="_blank" rel="noopener">The codebook Class&lt;/a>, an article of the &lt;a href="https://r.dataobservatory.eu/" target="_blank" rel="noopener">dataobservatory R package&lt;/a>.&lt;/p></description></item><item><title>Data Sharing</title><link>https://social-dataobservatory-eu.netlify.app/data/data-sharing/</link><pubDate>Sun, 16 May 2021 00:00:00 +0000</pubDate><guid>https://social-dataobservatory-eu.netlify.app/data/data-sharing/</guid><description>&lt;p>We would like to actively encourage the sharing of data assets.&lt;/p></description></item><item><title>Open Data</title><link>https://social-dataobservatory-eu.netlify.app/data/open-data/</link><pubDate>Sun, 16 May 2021 00:00:00 +0000</pubDate><guid>https://social-dataobservatory-eu.netlify.app/data/open-data/</guid><description>&lt;p>Many countries in the world allow access to a vast array of information,
such as documents under freedom of information requests, statistics, and
datasets. In the European Union, most taxpayer-financed data in
government administration, transport, or meteorology, for example, can
usually be re-used. More and more scientific output is expected to be
reviewable and reproducible, which implies open access.&lt;/p>
&lt;table>
&lt;tbody>
&lt;tr class="odd">
&lt;td style="text-align: center;">
&lt;figure id="figure-whats-the-problem-with-open-datadataopen-govopen-data-problems">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://social-dataobservatory-eu.netlify.app/media/img/blogposts_2021/photo-1490004047268-5259045aa2b4.jpg" alt="[What’s the Problem with Open Data?](/data/open-gov/#open-data-problems)" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
&lt;a href="https://social-dataobservatory-eu.netlify.app/data/open-gov/#open-data-problems">What’s the Problem with Open Data?&lt;/a>
&lt;/figcaption>&lt;/figure>&lt;/td>
&lt;td style="text-align: center;">
&lt;figure id="figure-how-we-add-valuedataopen-govopen-data-value-added">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://social-dataobservatory-eu.netlify.app/media/img/blogposts_2021/photo-1590247813693-5541d1c609fd.jpg" alt="[How We Add Value?](/data/open-gov/#open-data-value-added)" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
&lt;a href="https://social-dataobservatory-eu.netlify.app/data/open-gov/#open-data-value-added">How We Add Value?&lt;/a>
&lt;/figcaption>&lt;/figure>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;table>
&lt;tbody>
&lt;tr class="even">
&lt;td style="text-align: center;">
&lt;figure id="figure-is-there-value-in-itdataopen-govis-there-value-left-in-open-data-if-its-money-on-the-street-why-nobodys-picking-it-up">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://social-dataobservatory-eu.netlify.app/media/img/blogposts_2021/photo-1533580909002-a2f298d005eb.jpg" alt="[Is There Value in It?](/data/open-gov/#is-there-value-left-in-open-data) If it’s money on the street, why nobody’s picking it up?" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
&lt;a href="https://social-dataobservatory-eu.netlify.app/data/open-gov/#is-there-value-left-in-open-data">Is There Value in It?&lt;/a> &lt;br>If it’s money on the street, why nobody’s picking it up?
&lt;/figcaption>&lt;/figure>&lt;/td>
&lt;td style="text-align: center;">
&lt;figure id="figure-datasets-should-work-together-to-give-informationdataopen-govdata-integrationdata-is-only-potential-information-raw-and-unprocessed">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://social-dataobservatory-eu.netlify.app/media/img/blogposts_2021/photo-1605143185650-77944b152643.jpg" alt="[Datasets Should Work Together to Give Information](/data/open-gov/#data-integration)Data is only potential information, raw and unprocessed." loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
&lt;a href="https://social-dataobservatory-eu.netlify.app/data/open-gov/#data-integration">Datasets Should Work Together to Give Information&lt;/a>&lt;br>Data is only potential information, raw and unprocessed.
&lt;/figcaption>&lt;/figure>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h2 id="open-data-problems">What’s the Problem with Open Data?&lt;/h2>
&lt;p>&lt;em>“Data is stuff. It is raw, unprocessed, possibly even untouched by human
hands, unviewed by human eyes, un-thought-about by human minds.”&lt;/em> [1]&lt;/p>
&lt;ul>
&lt;li>Most open data cannot be just &lt;a href="#open-data-faq">&amp;ldquo;downloaded.&amp;rdquo;&lt;/a>&lt;/li>
&lt;li>Often, you need to put more than $100 worth of &lt;a href="#is-there-value-left-in-open-data">work&lt;/a> into processing, validating, and documenting a dataset that is worth $100. But you can share this investment with our data observatories.&lt;/li>
&lt;li>Open data almost always lacks documentation and has no clear references for validating whether the data is reliable or corrupted. This is why we always &lt;a href="#open-data-value-added">start&lt;/a> with reprocessing and redocumenting.&lt;/li>
&lt;/ul>
&lt;figure id="figure-our-review-of-about-80-eu-un-and-oecd-data-observatories-reveals-that-most-of-them-do-not-use-these-organizationss-open-data---instead-they-use-various-and-often-not-well-processed-proprietary-sources">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://social-dataobservatory-eu.netlify.app/media/img/observatory_screenshots/observatory_collage_16x9_800.png" alt="Our review of about 80 EU, UN and OECD data observatories reveals that most of them do not use these organizations&amp;#39;s open data - instead they use various, and often not well processed proprietary sources." loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Our review of about 80 EU, UN and OECD data observatories reveals that most of them do not use these organizations&amp;rsquo; open data; instead, they use various, and often poorly processed, proprietary sources.
&lt;/figcaption>&lt;/figure>
&lt;p>Read more: &lt;a href="https://dataandlyrics.com/post/2021-06-18-gold-without-rush/" target="_blank" rel="noopener">Open Data - The New Gold Without the
Rush&lt;/a>&lt;/p>
&lt;h2 id="open-data-value-added">How We Add Value?&lt;/h2>
&lt;ul>
&lt;li>We believe that even such generally trusted data sources as Eurostat
often need to be reprocessed, because various legal and political
constraints do not allow the common European statistical services to
provide optimal quality data – for example, on the regional and city
levels.&lt;/li>
&lt;li>With
&lt;a href="https://greendeal.dataobservatory.eu/authors/ropengov/" target="_blank" rel="noopener">rOpenGov&lt;/a>
and other partners, we are creating open-source statistical software
in R to re-process these heterogeneous and low-quality data into tidy
statistical indicators, and to automatically validate and document them.&lt;/li>
&lt;li>Metadata is a potentially informative data record about a
potentially informative dataset. We are carefully documenting and
releasing administrative, processing, and descriptive metadata,
following international metadata standards, to make our data easy to
find and easy to use for data analysts.&lt;/li>
&lt;li>We are automatically creating depositions and authoritative copies
marked with an individual digital object identifier (DOI) to
maintain data integrity.&lt;/li>
&lt;/ul>
&lt;h2 id="is-there-value-left-in-open-data">Is There Value in Open Data?&lt;/h2>
&lt;p>&lt;em>A well-known story tells of a finance professor and a student who come across a $100 bill lying on the ground. As the student stops to pick it up, the professor says, “Don’t bother—if it were really a $100 bill, it wouldn’t be there.”&lt;/em>&lt;/p>
&lt;p>But this is not the case with open data. Often, you need to put more than $100 into processing, validating, and documenting a dataset that is worth $100.&lt;/p>
&lt;p>In the EU, open data is governed by the &lt;a href="https://eur-lex.europa.eu/legal-content/EN/TXT/?qid=1561563110433&amp;amp;uri=CELEX:32019L1024" target="_blank" rel="noopener">Directive on open data and the re-use of public sector information - in short: Open Data Directive (EU) 2019 / 1024&lt;/a>. It entered into force on 16 July 2019. It replaces the &lt;a href="https://eur-lex.europa.eu/legal-content/en/ALL/?uri=CELEX:32003L0098" target="_blank" rel="noopener">Public Sector Information Directive&lt;/a>, also known as the &lt;em>PSI Directive&lt;/em> which dated from 2003 and was subsequently amended in 2013.&lt;/p>
&lt;p>&lt;strong>Open Data&lt;/strong> is &lt;em>potentially&lt;/em> useful data that can &lt;em>potentially&lt;/em> replace costlier or hard to get data sources to build information. It is analogous to potential energy: work is required to release it. We build automated systems that reduce this work and increase the likelihood that open data will offer the &lt;em>best value for money&lt;/em>.&lt;/p>
&lt;ul>
&lt;li>Most open data is not publicly accessible and is only available upon request. Our real curatorial advantage is that we know where it is and how to get such a request processed.&lt;/li>
&lt;li>Most European open data comes from tax authorities, meteorological
offices, managers of transport infrastructure, and other
governmental bodies whose data needs are very different from yours.
Their data must be carefully evaluated, re-processed, and if
necessary, imputed to be usable for your scientific, business or
policy goals.&lt;/li>
&lt;li>The use of open science data is problematic in different ways:
usually understanding the data documentation requires
domain-specific specialist knowledge. &lt;a href="https://social-dataobservatory-eu.netlify.app/data/open-science/">Open science
data&lt;/a> is even more scattered and difficult to
access than technically open, but not public governmental data.&lt;/li>
&lt;/ul>
&lt;h2 id="data-integration">From Datasets to Data Integration, Data to Information&lt;/h2>
&lt;p>“Data is only potential information, raw and unprocessed, prior to
anyone actually being informed by it.” [2]&lt;/p>
&lt;ul>
&lt;li>We are building simple databases and supporting APIs that release
the data without restrictions, in a tidy format that is easy to join
with other data, or easy to join into databases, together with
standardized metadata.&lt;/li>
&lt;/ul>
&lt;figure id="figure-our-service-flow-and-value-chain">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://social-dataobservatory-eu.netlify.app/media/img/slides/automated_observatory_value_chain.jpg" alt="Our service flow and value chain" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Our service flow and value chain
&lt;/figcaption>&lt;/figure>
&lt;h2 id="open-data-faq">FAQ&lt;/h2>
&lt;h3 id="why-downloading-does-not-work">Why Downloading Does Not Work?&lt;/h3>
&lt;ul>
&lt;li>Most open data is not available on the internet.&lt;/li>
&lt;li>If it is available, it is not in a form that you can easily import into a spreadsheet application like Excel or OpenOffice, or into a statistical application like SPSS or STATA.&lt;/li>
&lt;li>Even the data quality of trusted web sources, like the Eurostat website, can be very low. Eurostat just publishes what it gets from governments, and often has no mandate to fix errors. The data is full of missing information, and in the case of regional statistics, faulty region codes and region names that make matching your data or placing them on a map impossible.&lt;/li>
&lt;li>Reconciling values reported in euros with values reported in millions of euros, or correctly converting dollars to euros and pounds to kilograms, requires plenty of work. This is a very error-prone process when done by humans.&lt;/li>
&lt;/ul>
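&lt;p>The unit problem mentioned in the list above is a good candidate for automation. The Python sketch below brings values reported in euros, thousands of euros, and millions of euros onto one scale; the unit codes, the multiplier table, and the &lt;code>to_euros&lt;/code> helper are assumptions for illustration, and currency conversion is left out because it additionally needs dated exchange rates.&lt;/p>

```python
# Assumed unit codes and multipliers, for illustration only.
UNIT_MULTIPLIER = {"EUR": 1, "THS_EUR": 1_000, "MIO_EUR": 1_000_000}


def to_euros(value, unit):
    """Express `value` in euros; unknown units raise an error instead of being guessed."""
    if unit not in UNIT_MULTIPLIER:
        raise ValueError(f"unknown unit: {unit}")
    return value * UNIT_MULTIPLIER[unit]


rows = [("AT", 2.5, "MIO_EUR"), ("BE", 1_800_000, "EUR")]
harmonized = [(geo, to_euros(value, unit), "EUR") for geo, value, unit in rows]
print(harmonized)
```

&lt;p>Failing loudly on an unknown unit is deliberate: silently passing a value through is exactly the kind of error that makes manual unit handling unreliable.&lt;/p>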
&lt;h3 id="can-open-data-be-used-in-machine-learning-and-ai">Can Open Data be Used in Machine Learning and AI?&lt;/h3>
&lt;ul>
&lt;li>Most public and open data sources have many missing observations; machine learning models usually cannot handle missingness. These points must be carefully imputed with approximations, which can be very challenging when the data has a geographical dimension.&lt;/li>
&lt;li>Removing missing values makes samples extremely biased and your model will learn from omissions, not information.&lt;/li>
&lt;/ul>
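&lt;p>One of the simplest approximations for a missing point in a time series is linear interpolation between its known neighbours. The Python sketch below shows this technique only as an example; it is not necessarily the method used in our observatories, the &lt;code>interpolate_gaps&lt;/code> helper is hypothetical, and data with a geographical dimension usually needs more careful models.&lt;/p>

```python
def interpolate_gaps(values):
    """Fill interior gaps (None) by linear interpolation.

    Returns the filled series and a parallel list of flags that mark
    which points were imputed. Gaps at the start or end of the series
    are left untouched, because they have only one known neighbour.
    """
    filled = list(values)
    imputed = [False] * len(values)
    known = [i for i, value in enumerate(values) if value is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
            imputed[i] = True
    return filled, imputed


filled, imputed = interpolate_gaps([1.0, None, 3.0, None])
print(filled)
print(imputed)
```

&lt;p>Returning the flags together with the values keeps every imputed point traceable, so a model can be trained with or without them and the difference can be inspected.&lt;/p>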
&lt;h2 id="photo-credits">Photo Credits&lt;/h2>
&lt;p>&lt;em>What&amp;rsquo;s the Problem with Open Data?&lt;/em> illustration is a photo by &lt;a href="https://unsplash.com/photos/8hJQKRIQZMY" target="_blank" rel="noopener">Cristina Gottardi&lt;/a>.
&lt;em>How We Add Value?&lt;/em> illustration is a photo by &lt;a href="https://unsplash.com/photos/IEiAmhXehwE" target="_blank" rel="noopener">Nana Smirnova&lt;/a>.
&lt;em>Is There Value in It?&lt;/em> illustration is a photo by &lt;a href="https://unsplash.com/photos/GcnPjvqRL18" target="_blank" rel="noopener">Imelda&lt;/a>.
&lt;em>Datasets Should Work Together to Give Information&lt;/em> illustration is a photo by &lt;a href="https://unsplash.com/photos/huRn8ECqADI" target="_blank" rel="noopener">Lucas Santos&lt;/a>.&lt;/p>
&lt;h2 id="footnote-references">Footnote References&lt;/h2>
&lt;p>[1] Pomerantz, Jeffrey. 2015. &lt;em>Metadata&lt;/em>. MIT Press Essential Knowledge
series. Cambridge, Massachusetts; London, England: The MIT Press.&lt;/p>
&lt;p>[2] Pomerantz, Jeffrey. 2015. &lt;em>Metadata&lt;/em>. MIT Press Essential Knowledge
series. Cambridge, Massachusetts; London, England: The MIT Press.&lt;/p></description></item></channel></rss>