Traditionally, companies innovated using only internal resources, applying what is known as a closed innovation model. Today, Open Innovation models are gaining more and more traction, especially across large companies that understand the importance of leveraging the benefits of collaborating with highly technological and specialized startups to keep pace with fast-changing businesses.
Since its launch in 2020, CRIF InnovEcos has been connecting startups, colleagues and clients to foster innovation and growth. This article is a sneak peek into one of our Venture Lab success stories and how it led to the development of CRIF's first synthetic data evaluation framework, DAISYnt.
What is synthetic data and why is it important?
Synthetic data is artificially generated information that statistically mimics data collected from real-life events. Synthetization is at the leading edge of privacy-enhancing techniques: it preserves the quality of the original data while strongly reducing the risks of handling real-world records. Thanks to this unique mix of quality and privacy, synthetic data can be leveraged by data companies for many purposes: improving the training of AI models by generating additional samples, enriching sales operations with realistic yet artificial data, and testing new data-product features with an unlimited amount of realistic data.
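As a toy illustration of what "statistically mimicking" means, the sketch below fits a Gaussian to a hypothetical numeric column and samples artificial values from it. The data and function names are invented for this example, and production synthesizers use far richer models (copulas, deep generative networks); this only conveys the core idea of reproducing a distribution's shape rather than copying real records.

```python
import random
import statistics

def synthesize_column(real_values, n_samples, seed=0):
    """Toy synthesizer: fit a Gaussian to a real numeric column and
    sample artificial values from it. This illustrates the idea of
    mimicking a statistical distribution without reusing any record."""
    rng = random.Random(seed)
    mu = statistics.mean(real_values)
    sigma = statistics.stdev(real_values)
    return [rng.gauss(mu, sigma) for _ in range(n_samples)]

# Hypothetical "real" incomes: the synthetic column shares the
# distribution's shape but contains no original record.
real_incomes = [32_000, 41_500, 38_200, 55_000, 47_300, 36_800]
synthetic = synthesize_column(real_incomes, n_samples=1000)
print(round(statistics.mean(synthetic)))  # close to mean(real_incomes)
```

Even this crude sketch shows why evaluation matters: the synthetic sample will match the mean and spread of the original column, but whether it preserves utility and privacy requires dedicated testing.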
How did InnovEcos guide the innovation process that led to developing an algorithm dedicated to synthetic data evaluation within CRIF? Did you rely only on internal resources, or did you collaborate with startups and research centers?
The Venture Client program aims to find the right long-term partner to solve CRIF innovation challenges and meet specific business model integration criteria.
Despite its extensive experience in data science and analytics, the team wanted to find a best-in-class player in synthetic data, understand the status of the technology and boost the joint go-to-market.
InnovEcos supported the advanced analytics team in focusing on use cases and exploring available solutions in the startup scene through scouting and shortlisting activities, then enabled the unit to run more than 11 proofs of concept with different startups and scaleups, 5 of which reached the pilot phase.
Importantly, during the process the unit, which was already collaborating on synthetic data with Giorgio Visani, Ph.D. from the University of Bologna, had to develop an internal capability to evaluate synthetic datasets. The more experience the team gained by comparing startup and scaleup results, the more it felt the need to assess the quality of a synthetic dataset scientifically. After a few months of the project, DAISYnt was born!
Today the project is in the venture-building phase; we support it by facilitating the definition of a scalable business model among the selected solutions, collaborating with external research centers and innovation teams, and evangelizing the benefits of synthetic data internally.
What is DAISYnt? What are its applications?
DAISYnt (aDoption of Artificial Intelligence SYnthesis) is a framework designed to catalog the important traits of synthetic data and provide a specific methodology for testing them. The framework compares two data sources across four aspects: general applications, statistical distributions, data utility and privacy. The result is a set of all-round, reliable metrics intended to certify a synthetic dataset and to provide an objective function to maximize during the generation process.
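DAISYnt's internal test suite is not described in detail here, but the "statistical distributions" aspect can be illustrated with a generic two-sample metric. The sketch below implements the Kolmogorov-Smirnov statistic on hypothetical columns; it is a common stand-in for this kind of check, not the framework's actual methodology.

```python
import bisect

def ks_statistic(real, synthetic):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap between
    the empirical CDFs of the two samples. A value near 0 suggests the
    synthetic column reproduces the real distribution; near 1 it does not."""
    xs = sorted(set(real) | set(synthetic))
    n, m = len(real), len(synthetic)
    sr, ss = sorted(real), sorted(synthetic)
    return max(abs(bisect.bisect_right(sr, x) / n
                   - bisect.bisect_right(ss, x) / m) for x in xs)

# Hypothetical columns: one synthetic sample that mimics the real
# distribution, and one that clearly does not.
real = [1.0, 2.0, 2.5, 3.0, 4.0]
close = [1.1, 2.1, 2.4, 3.2, 3.9]
far = [10.0, 11.0, 12.0, 13.0, 14.0]

print(ks_statistic(real, close))  # about 0.2: distributions overlap well
print(ks_statistic(real, far))    # 1.0: no overlap at all
```

A framework like DAISYnt would aggregate many such metrics, per column and across columns, into an overall score, and that score can then double as the objective function mentioned above.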
How do you see synthetic data being used within the financial sector?
The financial sector is highly regulated. Consequently, state-of-the-art privacy-enhancing technologies like synthetic data play a dual role in the long run. On the one hand, synthetic data technology paves the way for improvements in model development, product testing and sales operations. On the other hand, the same technology is an excellent candidate for complying with regulations on the use of personal and sensitive information. For example, given that synthetization produces artificial data, a synthetic dataset used for AI model training could be stored for a longer period, allowing full auditing of the model years after the initial training.
What are your main takeaways after completing an innovative collaborative project like this one?
The first takeaway is that state-of-the-art technologies require experience. The fruitful collaboration with many startups allowed us to gain the essential expertise to run synthesis software and to better understand how it could fit with and improve existing products. The second takeaway is that the process matters: following a scientific approach to an open innovation problem enables all project stakeholders to focus on the outcome, testing and validation phases.
ABOUT THE AUTHORS
InnovEcos is CRIF's Global Open Innovation Hub. Our mission is to guide the discovery of future business models and technological trends to innovate services, products and processes through research, experimentation and collaboration with startups. In the last two years, we have scouted more than 550 startups and run over 16 PoCs and 6 pilots through our Venture Client Lab.
Emilio Tropea is the Open Innovation and Venture Manager @ InnovEcos, collaborating with startups through proactive scouting and the promotion of design-driven methodologies to create new product enhancements and develop business models based on digital ecosystems.
Enrico Bagli is the Data Science and Innovation Manager @ CRIF and Strands, leveraging data and artificial intelligence to develop solutions for open banking and finance management apps.
Giacomo Graffi is a Data Scientist @ CRIF, exploring machine learning and engineering to develop innovative and robust solutions.
The article is part of a series that aims to help startup founders and corporates to navigate and understand the value of open innovation and partnerships.