Creating An Automated Data Observatory

Start our video (35 MB; it may take a moment to load)

We are building data ecosystems, so-called observatories, where scientific, business, policy and civic users can find factual information, data and evidence for their domain. Our open-source, open-data, open-collaboration approach allows us to connect various open and proprietary data sources, and our reproducible research workflows let us automate data collection, processing, publication, documentation and presentation.

Our scripts check data sources such as Eurostat’s Eurobase, Spotify’s API and other music industry sources every day for new information. They process any data corrections or new disclosures, interpolate, backcast or forecast missing values, and perform currency translations and unit conversions, as illustrated in an earlier post.
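As a minimal sketch of one such refresh step, the snippet below fills a gap in a monthly indicator by linear interpolation and converts it to euros. The exchange rate and the function name are illustrative assumptions; a real pipeline would pull the series from a source such as Eurostat and use dated exchange rates.

```python
import pandas as pd

# Assumed fixed rate for illustration only; production workflows
# would look up the rate valid on each observation date.
EUR_PER_USD = 0.9

def refresh_indicator(values_usd):
    """Interpolate missing observations and convert USD to EUR."""
    series = pd.Series(values_usd, dtype="float64")
    filled = series.interpolate(method="linear")  # fill internal gaps
    return (filled * EUR_PER_USD).round(2)

# Example: a monthly series with one missing observation.
print(refresh_indicator([100.0, None, 140.0]).tolist())
# → [90.0, 108.0, 126.0]
```

The same pattern extends naturally to backcasting and unit conversion: each transformation is a small, deterministic function, which is what makes the daily re-run reproducible.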


In the video we show how the creation of an observatory website, with well-formatted statistical data dissemination, a technical document in PDF and an ebook, can be automated. In our view, this technology is particularly useful in business and scientific research projects where it is important that the most timely and correct data are always analyzed, and remain automatically documented and cited. We are ready to deploy public, collaborative or private data observatories on short notice.

Data processing costs can account for as much as 80% of any in-house AI deployment project. We work mainly with organizations that do not have an in-house data science team and acquire their data from outside the organization anyway. In their case, this rate can be as high as 95%, meaning that getting and processing the data needed to deploy AI can be 20 times more expensive than the AI solution itself.
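The roughly 20x figure follows directly from the budget shares: if data takes 95% of the budget, the remaining 5% buys the AI solution itself, a ratio of about 19:1.

```python
# If data acquisition and processing take 95% of the budget,
# the remaining 5% covers the AI solution itself.
data_share = 0.95
ai_share = 1.0 - data_share
ratio = data_share / ai_share  # data cost relative to AI-solution cost
print(round(ratio))
# → 19, i.e. roughly 20x
```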

AI solutions require a large amount of standardized, well-processed data to learn from. We want to radically decrease the cost of data acquisition and processing for our users so that exploiting AI comes within their reach. This is particularly important in one of our target industries, the music industry, where most global sales are algorithmic and AI-driven. Artists, bands, small labels, publishers and even the national associations of small countries cannot remain competitive if they cannot participate in this technological revolution.

We started our operations on 1 September 2020 on the basis of CEEMID, a pan-European data observatory that created about 2,000 music and creative industry indicators for its users. In the coming days, we are gradually opening up about 50 music industry and 50 broader creative industry indicators in a fully reproducible workflow, with daily refreshed, re-processed, well-formatted and documented indicators for business and policy decisions.

We would like to validate this approach in one of the world’s most prestigious university-backed incubator programs, the Yes!Delft AI/Blockchain Validation Lab.

Video credits

  • Data acquisition and processing: Daniel Antal, CFA and Marta Kołczyńska, PhD (survey data).
  • Documentation automation: Sandor Budai.
  • Video art: Line Matson.
  • Music: Moon Moon Moon.