L3S Best Publication of the Quarter (Q3/2024)
Category: Knowledge Graphs and Bias
Employing Hybrid AI Systems to Trace and Document Bias in ML Pipelines
Authors: Mayra Russo, Yasharajsinh Chudasama, Disha Purohit, Sammy Sawischa, Maria-Esther Vidal
Published in IEEE.
The paper in a nutshell
Our paper addresses the challenge of capturing interpretable knowledge about measurable biases in data and AI pipelines in a form that is both human- and machine-readable.
To do so, we employ a hybrid AI system architecture: we combine components from symbolic AI and sub-symbolic AI to draw on the benefits attributed to each and thereby enhance the performance and explainability of AI systems.
Using a practical use case based on the fake news detection task, we present two different implementations of our hybrid AI architecture and demonstrate that it can trace the underlying AI pipeline and generate semantic metadata that elucidates how data biases across the pipeline impact the output.
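To make the idea of tracing a pipeline and emitting semantic metadata more concrete, here is a minimal sketch in Python. It is not the implementation from the paper: the `PipelineTracer` class, the `ex:` vocabulary, the step names, and the toy records are all assumptions made for illustration. Each pipeline step is recorded as subject-predicate-object triples, a shape that a person can read and a program can query.

```python
from collections import Counter

# Hypothetical vocabulary prefix; not taken from the paper.
EX = "ex:"


def label_distribution(rows, label_key="label"):
    """Return the share of each label in `rows` as a dict."""
    counts = Counter(row[label_key] for row in rows)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


class PipelineTracer:
    """Collects (subject, predicate, object) triples describing pipeline steps."""

    def __init__(self):
        self.triples = []

    def record_step(self, step_name, rows):
        """Document one step: its type, row count, and label shares."""
        step = EX + step_name
        self.triples.append((step, "rdf:type", EX + "PipelineStep"))
        self.triples.append((step, EX + "numRows", len(rows)))
        for label, share in sorted(label_distribution(rows).items()):
            self.triples.append((step, EX + "labelShare_" + label, round(share, 3)))
        return rows


# Toy records, invented for illustration only.
raw = [
    {"text": "a", "source": "blog", "label": "fake"},
    {"text": "b", "source": "blog", "label": "fake"},
    {"text": "c", "source": "news", "label": "real"},
    {"text": "d", "source": "news", "label": "real"},
    {"text": "", "source": "news", "label": "real"},
]

tracer = PipelineTracer()
ingested = tracer.record_step("ingestion", raw)
# A cleaning step that drops records with empty text.
cleaned = tracer.record_step("cleaning", [r for r in ingested if r["text"]])

for triple in tracer.triples:
    print(triple)
```

In this toy run, the recorded label shares differ between the ingestion and cleaning steps, which is the kind of change across the pipeline that such documentation is meant to surface.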
Which problem do you solve with your research?
In our research, we tackle the problem of bias in AI system pipelines. As the use of AI systems expands from seemingly trivial applications to higher-stakes ones involving consequential decision-making, it is important to remember that all of these systems can produce undesirable, biased results. For that reason, proactively accounting for bias during model development and deployment needs to become an essential task for AI researchers and developers.
Our work proposes a methodology to generate bias-centric end-to-end documentation artefacts of these AI pipelines.
What is the potential impact of your findings?
Our findings show that even under the premise of balanced datasets during the data ingestion phase, the AI model’s inner processes can offset an attribute-focused bias that significantly impacts the overall accuracy and effectiveness of the fake news detection system. Concretely, we report a stark skew in the distribution of input variables towards the Fake News label, uncover how a predictive variable leads to more constraints in the learning process, and highlight open challenges of training models with unbalanced datasets.
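As a rough illustration of how a dataset that is balanced overall can still yield skewed predictions for one attribute, the following Python sketch compares the share of the "fake" label per attribute value in the input data and in the model output. The attribute `source`, the helper `fake_share_by_attribute`, and the four records are hypothetical and are not results from the paper.

```python
from collections import defaultdict


def fake_share_by_attribute(rows, attribute, label_key):
    """For each value of `attribute`, return the fraction of rows labelled 'fake'."""
    totals = defaultdict(int)
    fakes = defaultdict(int)
    for row in rows:
        value = row[attribute]
        totals[value] += 1
        if row[label_key] == "fake":
            fakes[value] += 1
    return {value: fakes[value] / totals[value] for value in totals}


# Toy records invented for illustration; ground truth is balanced (2 fake, 2 real).
rows = [
    {"source": "blog", "label": "fake", "predicted": "fake"},
    {"source": "blog", "label": "real", "predicted": "fake"},
    {"source": "news", "label": "fake", "predicted": "fake"},
    {"source": "news", "label": "real", "predicted": "real"},
]

input_share = fake_share_by_attribute(rows, "source", "label")
predicted_share = fake_share_by_attribute(rows, "source", "predicted")

# Positive values mean the model predicts 'fake' more often than the input data warrants.
shift = {value: predicted_share[value] - input_share[value] for value in input_share}
print(shift)
```

A per-attribute shift like this is only one simple indicator; the measures used in the paper may differ.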
What is new about your research?
Our research proposes a novel documentation approach that relies on a hybrid AI architecture to trace AI systems and produce human- and machine-readable documentation.
Our hybrid AI architecture lends itself to versatile implementation.
In particular, we showcase two implementations of a hybrid AI system. One follows an integrated approach and performs fine-grained tracing and documentation of the whole AI model process. The other follows a principled approach and enables the documentation and comparison of bias in the input data and in the predictions generated by the model. For more information, a video summarizing our work is available online: https://youtu.be/v2GfIQPAy_4?si=BXtWOf97cLiZavyu.
Paper link: https://ieeexplore.ieee.org/document/10596297