The Importance of Data Modelling in the Age of AI

Using data modelling to support your AI strategy

Tshepiso Mogoswane
5 min read · Feb 12, 2023

In the age of Artificial Intelligence (AI) and machine learning, data modelling is still a critical component of any successful AI strategy. In this blog post, we’ll explore the relationship between AI strategy and data strategy and then discuss what data modelling is and why it’s still important in the age of AI.

“Contextual information not only results in better performing AI systems but also in a clearer ethical perspective for those creating and shaping it” (Emil Eifrem)

What is an AI strategy in relation to a data strategy?

Before we dive into data modelling, let’s first define what we mean by AI strategy and data strategy. An AI strategy is a plan that outlines how an organisation will leverage AI and machine learning to achieve its goals. A data strategy, on the other hand, is a plan that outlines how an organisation will manage, store, and use its data to achieve its goals.

The two strategies are closely intertwined, because AI and machine learning rely on high-quality, accurate data to function effectively. Without a clear understanding of the underlying data structure, it’s easy for an AI system to misinterpret or misclassify data, resulting in inaccurate predictions and insights. This is where a data strategy comes in: by planning how data will be collected, managed, and analysed, organisations can make sure their AI systems work with the best data available.

In addition, a data strategy is critical for ongoing AI performance monitoring and optimisation. As AI models are deployed and begin processing new data, a good data strategy can help organisations identify and troubleshoot any issues that arise, allowing for continuous improvement and refinement of their AI models.

What is data modelling?

With the AI and data strategies defined, let’s dive into what data modelling is. According to Mike Sargo, Chief Data and Analytics Officer and Co-Founder of Data Ideology, “A data model enables you to make decisions based on facts instead of educated guesses”. In the simplest terms, data modelling is the process of creating a visual representation of a dataset: its entities, their attributes, and the relationships between them. This representation helps analysts better understand the relationships between data points and identify trends and patterns that would otherwise be difficult to spot. The result is a more complete, nuanced understanding of the data, which is essential for any successful AI implementation.

Why is data modelling still relevant in the age of AI?

So, why is data modelling still crucial in a world where machines can perform complex data analysis independently? As we discussed earlier, an AI system is only as effective as the data it’s trained on, and a poorly understood data structure leads to misclassified data and inaccurate predictions. By creating a visual representation of the data structure, analysts can identify and address anomalies, redundancies, and inconsistencies in the data before training, which improves the quality and accuracy of the data the AI model learns from.
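To make this concrete, here is a minimal sketch of how a data model can drive data-quality checks before training. The customers and orders, their field names, and the three rules (unique keys, valid references, allowed values) are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical sample data; the rules below come from an assumed data model:
# customer_id is unique, every order references an existing customer,
# amounts are non-negative, and country codes are upper-case.
customers = [
    {"customer_id": 1, "country": "ZA"},
    {"customer_id": 2, "country": "za"},   # inconsistency: lower-case code
    {"customer_id": 2, "country": "ZA"},   # redundancy: duplicate key
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 250.0},
    {"order_id": 11, "customer_id": 3, "amount": 99.0},   # anomaly: unknown customer
    {"order_id": 12, "customer_id": 1, "amount": -5.0},   # anomaly: negative amount
]


def duplicate_keys(rows, key):
    """Return key values that appear more than once."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def orphaned_rows(children, parents, key):
    """Return child rows whose key has no matching parent row."""
    parent_keys = {row[key] for row in parents}
    return [row for row in children if row[key] not in parent_keys]


def violations(rows, rule):
    """Return rows that fail a business rule expressed as a predicate."""
    return [row for row in rows if not rule(row)]


dupes = duplicate_keys(customers, "customer_id")
orphans = orphaned_rows(orders, customers, "customer_id")
bad_amounts = violations(orders, lambda r: r["amount"] >= 0)
bad_countries = violations(customers, lambda r: r["country"].isupper())

print("duplicate customer ids:", dupes)
print("orders without a customer:", [r["order_id"] for r in orphans])
print("orders with invalid amounts:", [r["order_id"] for r in bad_amounts])
print("customers with inconsistent country codes:", len(bad_countries))
```

Each check maps to one statement in the model, so the model doubles as a checklist for what "clean" means for this dataset.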

Now that we have examined why data modelling remains relevant in the age of AI, let’s take some time to explore the different types of available data models. There are three main types of data models: conceptual, logical, and physical.

  1. Conceptual data model: This type of data model represents the high-level relationships between entities and their attributes. It provides a conceptual view of the data, and is often used in the early stages of a project to help stakeholders understand the scope and requirements of the project.
  2. Logical data model: This type of data model provides a more detailed view of the data, showing the relationships between entities, their attributes, and the business rules that govern them. It’s used to establish the technical requirements of a project and to identify any data constraints or limitations.
  3. Physical data model: This type of data model provides a detailed view of the physical storage and access methods for the data. It ensures that the data is stored and accessed efficiently and securely.
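The three levels can be sketched for a single small domain. The example below is illustrative only: the Customer and Order entities, their attributes, the non-negative amount rule, and the SQLite table and index names are assumptions, not part of any particular methodology.

```python
import sqlite3
from dataclasses import dataclass

# Conceptual level: which entities exist and how they relate, nothing more.
CONCEPTUAL = {
    "entities": ["Customer", "Order"],
    "relationships": [("Customer", "places", "Order")],
}


# Logical level: attributes, types, keys and business rules,
# still independent of any particular database.
@dataclass(frozen=True)
class Customer:
    customer_id: int
    name: str


@dataclass(frozen=True)
class Order:
    order_id: int
    customer_id: int  # refers to Customer.customer_id
    amount: float

    def __post_init__(self):
        # Assumed business rule: an order amount cannot be negative.
        if self.amount < 0:
            raise ValueError("amount must be non-negative")


# Physical level: concrete tables, column types, constraints and an index
# for one specific storage engine (SQLite here).
PHYSICAL_DDL = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    amount      REAL NOT NULL CHECK (amount >= 0)
);
CREATE INDEX idx_order_customer ON customer_order(customer_id);
"""


def build_database():
    """Create an in-memory SQLite database from the physical model."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(PHYSICAL_DDL)
    return conn
```

The same rule (an order belongs to a customer and has a non-negative amount) appears at the logical level as a type and a check, and at the physical level as a foreign key and a CHECK constraint.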

Now that we’ve explored the different types of data models, let’s dive into some of the techniques that can be used for data modelling.

  1. Entity-Relationship (ER) modelling: ER modelling is a technique for representing the relationships between entities in a dataset. It involves creating a diagram that shows the different entities in the dataset and the relationships between them. This technique is often used in the early stages of a project to help stakeholders understand the relationships between different data points.
  2. Object-oriented modelling: Object-oriented modelling is a technique that involves creating a class hierarchy to represent the different entities in a dataset. Each class represents a different entity, and the relationships between entities are represented by the relationships between classes. This technique is often used in software development, where it’s essential to represent complex datasets in an easy-to-understand way.
  3. Dimensional modelling: Dimensional modelling is a commonly used technique in data warehousing. It involves organising the data into “dimensions” (e.g. time, location, product) and “facts” (e.g. sales, revenue, profit). This technique helps with analysing large datasets and with creating easy-to-understand reports and visualisations.
  4. Data Vault modelling: Data Vault modelling is a technique that’s designed to handle large, complex datasets. It involves breaking the data down into smaller, more manageable pieces (hubs for business keys, links for relationships, and satellites for descriptive attributes) and creating a flexible, scalable data model that can easily accommodate changes in the dataset. This technique is often used in big data applications, where the dataset is constantly changing and evolving.
  5. Graph data modelling: Graph data modelling is a commonly used technique to represent complex relationships between entities. It involves creating a graph showing the different entities and their relationships. This technique is often used in applications that involve recommendation engines, fraud detection, and social network analysis.
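Two of the techniques above, dimensional modelling and graph data modelling, can be sketched in a few lines of plain Python. All product names, customers, and figures are made up for illustration, and the recommendation rule (suggest what similar customers bought) is a deliberately naive assumption rather than a production approach.

```python
from collections import defaultdict

# --- Dimensional model: dimension tables describe context, the fact table holds measures.
dim_product = {
    1: {"name": "Laptop", "category": "Electronics"},
    2: {"name": "Phone", "category": "Electronics"},
    3: {"name": "Desk", "category": "Furniture"},
}
dim_date = {
    20230101: {"month": "2023-01"},
    20230201: {"month": "2023-02"},
}
fact_sales = [
    {"product_id": 1, "date_id": 20230101, "revenue": 1200.0},
    {"product_id": 2, "date_id": 20230101, "revenue": 800.0},
    {"product_id": 3, "date_id": 20230201, "revenue": 300.0},
    {"product_id": 1, "date_id": 20230201, "revenue": 1100.0},
]


def revenue_by(dimension, attribute, key):
    """Sum the revenue fact grouped by one attribute of a dimension."""
    totals = defaultdict(float)
    for row in fact_sales:
        totals[dimension[row[key]][attribute]] += row["revenue"]
    return dict(totals)


# --- Graph model: customers and products are nodes, purchases are edges.
purchases = [
    ("alice", "Laptop"), ("alice", "Phone"),
    ("bob", "Laptop"), ("bob", "Desk"),
    ("carol", "Phone"),
]
bought = defaultdict(set)
for customer, product in purchases:
    bought[customer].add(product)


def recommend(customer):
    """Suggest products bought by customers who share at least one purchase."""
    own = bought[customer]
    suggestions = set()
    for other, items in bought.items():
        if other != customer and own & items:
            suggestions |= items - own
    return sorted(suggestions)


print(revenue_by(dim_product, "category", "product_id"))
print(revenue_by(dim_date, "month", "date_id"))
print(recommend("alice"))
```

The dimensional layout makes "slice revenue by any attribute" a one-line question, while the graph layout makes "who is connected to whom through what" the natural question. That difference in the questions each structure answers easily is the main reason to pick one technique over another.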

Conclusion

In conclusion, while AI and machine learning are transforming how we analyse and use data, data modelling is still a crucial part of any successful AI strategy. By using the different types of data models and the various data modelling techniques, organisations can ensure that they’re providing high-quality, accurate data to their AI systems, leading to more effective predictions and insights.

References:

  1. The Entity-Relationship Model: Toward a Unified View of Data, ACM Transactions on Database Systems, Vol. 1 (1), 1976, Peter Pin-Shan Chen: https://www.dragon1.com/downloads/peter-chen-entity-relationhip-model.pdf
  2. Object-Oriented Modeling: A Roadmap, Gregor Engels, Luuk Groenewegen: Enterprise Strategy Group, 27 Feb. 2020, https://www.esg-global.com/blog/data-modeling-still-critical-for-machine-learning
  3. A Dimensional Modeling Manifesto, August 2, 1997, Ralph Kimball: https://www.kimballgroup.com/1997/08/a-dimensional-modeling-manifesto/
  4. Building a Scalable Data Warehouse with Data Vault 2.0, 1st Edition, September 15, 2015, Dan Linstedt, Michael Olschimke: https://www.google.co.uk/books/edition/Building_a_Scalable_Data_Warehouse_with/lgDJBAAAQBAJ?hl=en&gbpv=1&printsec=frontcover
  5. Toward AI Standards: Why Context Is Critical for Artificial Intelligence, Emil Eifrem: https://neo4j.com/emil/toward-ai-standards-why-context-is-critical-for-artificial-intelligence/

Tshepiso Mogoswane

Solution Architect (Data & AI) | Data Evangelist | Technology Enthusiast | Learn more about all things data by following me at mrmogoswane.medium.com