To ensure the accuracy and reliability of the predictions that we provide to our customers, BlueConduit relies on a comprehensive inspection process.
Inspections play a pivotal role in BlueConduit's lead prediction system for the following reasons:
Building Confidence in the Model: Inspections build confidence in the predictive model by providing ground-truth data. Not all data are equally trustworthy: records can be outdated or unreliable. Ground truth from inspections serves as the gold standard. It confirms that service line materials are classified accurately, so customers can place greater trust in the model's predictions for their entire service area.
Reducing Bias: Inspections are essential for reducing bias in the training data. A model trained solely on historical records can inherit the biases present in those records, including under-coverage, participation, survivorship, and spatial bias, all of which can distort its predictions. Through inspections, BlueConduit systematically addresses these biases by collecting representative, accurate data rather than inheriting these distortions.
Under-coverage Bias: This bias occurs when certain segments of the population or specific geographic areas are inadequately represented in the data. For example, under-coverage bias could arise if inspections focus primarily on easily accessible or well-documented areas while neglecting less accessible or underrepresented neighborhoods.
Participation Bias: Participation bias arises when the willingness of individuals or property owners to cooperate with inspections is related to other factors, such as income, education, or social status. For example, if only certain groups consent to surveys or other efforts to classify their service line, the resulting data may not accurately reflect the true distribution of lead service lines in the community.
Survivorship Bias: Survivorship bias occurs when data include only observations that have "survived" or passed through a specific process, while excluding those that did not. In the context of lead service line inventories, survivorship bias could occur if records are updated only when service lines are replaced, leaving no data for properties where no replacement has occurred. Because a replacement installs a non-lead line, the records that do exist skew toward non-lead material. This can lead to an overestimation of non-lead service lines, even though the missing data from unreplaced properties may contain valuable information about the prevalence of lead lines.
Spatial Bias: Spatial bias relates to the geographic distribution of inspections. If inspections are concentrated in specific regions, in urban or suburban areas, or in neighborhoods with certain characteristics, the resulting data may not accurately represent the overall spatial distribution of lead service lines.
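The survivorship effect can be illustrated with a toy simulation. This is a minimal sketch under made-up assumptions, not BlueConduit data or methodology:

- A hypothetical share `p_lead` of lines were originally lead.
- A share `p_replace` of those lead lines were replaced with non-lead lines, and every replacement left a record.
- Every other property has a record only by chance, with probability `p_record`.

The sketch compares a lead-share estimate built only from existing records with one built from a simple random sample of inspected properties.

```python
import random
from dataclasses import dataclass


@dataclass
class Property:
    is_lead: bool     # ground truth: the line is lead today
    has_record: bool  # a material record exists for this property


def simulate_properties(n, p_lead, p_replace, p_record, rng):
    """Toy population in which every replacement leaves a record."""
    props = []
    for _ in range(n):
        originally_lead = rng.random() < p_lead
        # Only lead lines are replaced, and a replacement installs a non-lead line.
        replaced = originally_lead and rng.random() < p_replace
        # Replacements are always recorded; every other property has a record
        # only by chance, independent of its material.
        has_record = replaced or rng.random() < p_record
        props.append(Property(is_lead=originally_lead and not replaced,
                              has_record=has_record))
    return props


def lead_share(props):
    """Fraction of the given properties whose line is lead."""
    return sum(p.is_lead for p in props) / len(props)


def compare_estimates(n=20_000, p_lead=0.4, p_replace=0.5, p_record=0.2,
                      sample_size=2_000, seed=42):
    rng = random.Random(seed)
    props = simulate_properties(n, p_lead, p_replace, p_record, rng)
    true_share = lead_share(props)
    # Estimate 1: use only the properties that happen to have records.
    records_share = lead_share([p for p in props if p.has_record])
    # Estimate 2: inspect a simple random sample of all properties.
    inspection_share = lead_share(rng.sample(props, sample_size))
    return true_share, records_share, inspection_share


if __name__ == "__main__":
    true_share, records_share, inspection_share = compare_estimates()
    print(f"true lead share:       {true_share:.3f}")
    print(f"records-only estimate: {records_share:.3f}")
    print(f"random inspections:    {inspection_share:.3f}")
```

With these illustrative parameters, the records-only estimate is well below the true lead share: about 0.11 versus 0.20 in expectation. The recorded properties are dominated by replaced, non-lead lines. The random inspection sample lands close to the true value.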
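Stratified random sampling is a standard way to guard against under-coverage and spatial bias. Properties are grouped into strata (for example, neighborhoods), and inspection sites are drawn at random within each stratum so that no area is left out.

The function below is a hypothetical illustration of that general technique, not BlueConduit's selection procedure. The names `stratified_sample` and `min_per_stratum` are invented for this sketch. It allocates the inspection budget in proportion to each stratum's size and guarantees a minimum number of inspections per stratum. Because of rounding and that minimum, the realized total can differ slightly from the requested budget.

```python
import random
from collections import defaultdict


def stratified_sample(property_ids, strata, total, min_per_stratum=1, seed=0):
    """Draw roughly `total` properties, allocated across strata in proportion
    to their size, with at least `min_per_stratum` from every stratum
    (or the whole stratum, if it is smaller than that)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for pid, stratum in zip(property_ids, strata):
        groups[stratum].append(pid)

    n = len(property_ids)
    selected = []
    for stratum, members in sorted(groups.items()):
        # Proportional allocation, raised to the minimum so small strata
        # (e.g. hard-to-reach neighborhoods) are never skipped.
        k = max(min_per_stratum, round(total * len(members) / n))
        k = min(k, len(members))
        selected.extend(rng.sample(members, k))
    return selected


if __name__ == "__main__":
    ids = list(range(1_000))
    # Hypothetical neighborhoods: one large, well-documented area and two small ones.
    neighborhoods = ["A"] * 900 + ["B"] * 90 + ["C"] * 10
    chosen = stratified_sample(ids, neighborhoods, total=100, min_per_stratum=3)
    lookup = dict(zip(ids, neighborhoods))
    counts = defaultdict(int)
    for pid in chosen:
        counts[lookup[pid]] += 1
    print(dict(counts))  # {'A': 90, 'B': 9, 'C': 3}
```

In this example, proportional allocation alone would give neighborhood C a single inspection. The minimum raises it to three, at the cost of using 102 inspections instead of 100.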
For further details on the inspection process and its impact on lead prediction accuracy, please refer to this blog post.