
Data Driven HR: Discrimination by AI

This is part one of a five-part series on the privacy rights of employees in the context of data driven HR.

The golden age of data driven HR is upon us. We are rapidly becoming much better at understanding what happens between your ears. In my recently defended doctoral dissertation I contribute to the psychometric models that predict attitudes and behaviour on the work floor. Or should we say behind the work screen, as few of us share a work floor in 2022. As with any revolutionary new technology, there are pitfalls and dangers we have to learn to live with. The fundamental questions are “What are the privacy rights of employees?” and, as an HR department or HR consultant, “What do we have the right to measure?”. What follows is a redacted extract of the ethical considerations in my dissertation, which dives deeply into these questions. This blog post is part 1: discrimination by AI.


Data driven HR and Discrimination by AI

In 1991, at a high-water mark of pop culture and a lasting memory for many in my age cohort, Arnold Schwarzenegger said “hasta la vista, baby” to the evil AI bot before terminating it in suitably dramatic fashion. Rogue AIs set on destroying us are long-time Hollywood favourites, from The Matrix to 2001: A Space Odyssey and many more. But the catastrophic effects of AI that are just around the corner are much more subtle, and they require neither a sentient computer nor any kind of ill intention or rebellion.

In 2015 Amazon said “hasta la vista, baby” to its CV-vetting algorithm (Dastin, 2018) after it insisted on discriminating against women in the selection process, even after manual interventions to hide gender and correct the machine’s bias. This AI did not like women and would downgrade a CV whenever secondary clues suggested that the candidate was likely female.

COMPAS stands for Correctional Offender Management Profiling for Alternative Sanctions. It is a case-management and decision-support tool used by US courts to assess the likelihood of a defendant becoming a recidivist. Basically, the AI tells judges whether it thinks the individual in question would get in trouble again if the judge lets him go early or offers an alternative sanction (Brennan et al., 2009). Sounds great, but… investigative journalists at ProPublica analysed the outcomes of the engine and found widespread Machine Bias against minorities, especially against black people (Allen & Masters, 2021; ProPublica, 2020). If you are black, the AI evaluates your chances of recidivism as higher. Basically, the judge’s computer is racist.

So how is this possible? How do neutral algorithms and data driven HR become engines of discrimination? It is because the machine looks for correlation, not causality. Causality is complicated to establish statistically; correlation is very straightforward. In machine learning we need massive amounts of data for the machine to find patterns and develop a predictive model. Typically, and in both Amazon’s CV-selection AI and COMPAS, historical data was used. But that history is tainted by inequality and discrimination. If we have mostly hired wealthy white males in the past, and wealthy white males have better education, better connections and more career opportunities, the AI will learn that being a wealthy white male is the proxy for success. So it will start looking for clues about who the wealthy white males are. Our next batch of hires will then contain even more wealthy white males, and the problem compounds.

You may think the solution is as easy as hiding race, gender or anything else that could be discriminatory from the AI, but here the AI will outsmart you. It will find patterns in other clues by which to discriminate. For example, an HR recruitment engine was found to use Chicago postal codes to estimate effectiveness at a certain job. Chicago is a very segregated city, with different neighbourhoods representing different ethnicities and socio-economic classes. Obviously those from less fortunate neighbourhoods have had fewer opportunities for education and career development than those from the wealthy ones. So the AI, in the name of the numbers, will avoid those neighbourhoods. Which of course isn’t fair: if I score well on objective criteria but come from a disadvantaged neighbourhood, I should have at least the same likelihood of doing well at the job as someone from a more pampered background.
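
To make the proxy effect concrete, here is a minimal, hypothetical sketch (the dataset, feature names and model choice are all invented for illustration): even when the protected attribute is dropped before training, a model can rediscover it through a correlated feature such as a postal code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000

# Hypothetical historical hiring data: a protected attribute we never show
# the model, plus a postal-code feature that is strongly correlated with it.
protected = rng.integers(0, 2, n)                                # 0 = group A, 1 = group B
postal_code = (protected ^ (rng.random(n) < 0.1)).astype(float)  # ~90% correlated proxy
skill = rng.normal(0, 1, n)                                      # genuinely job-relevant signal

# Biased historical labels: past hiring favoured group A regardless of skill.
hired = ((skill + 1.5 * (1 - protected) + rng.normal(0, 1, n)) > 1.0).astype(int)

# Train WITHOUT the protected attribute -- only skill and postal code.
X = np.column_stack([skill, postal_code])
model = LogisticRegression().fit(X, hired)

# The model still treats the two groups very differently, via the proxy.
for group in (0, 1):
    rate = model.predict(X[protected == group]).mean()
    print(f"predicted hiring rate for group {group}: {rate:.2f}")
```

The exact numbers do not matter; the mechanism does. Hiding the protected column does not hide the pattern, because the proxy carries it into the model anyway.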


Now I hear you thinking, “well, we should obviously not allow the AI to judge based on the address”, but again, it is not that straightforward. What if, in our research, by creating a taxonomy of experienced utility, we are building weapons of mass discrimination for AI? Some of our affective preferences are cultural and may be related to ethnicity or social class, and this would be reflected in a survey such as ours. We are psychometrically looking into the mind; once that door is open, computers and machine learning will be used to optimise outcomes, and that is unavoidable. The AI will then develop psychometric patterns or fingerprints of the profiles that have been successful in the past for whatever purpose the AI is employed. This means it will discriminate on the basis of ethnicity, race, culture, social class, religion and even philosophical convictions.

So how do we mitigate this problem? Well, the jury is still out, as Kochling & Wehner (2020) point out in their meta-analysis of 36 papers on discrimination by algorithms. There is no silver bullet, but the answer builds on three pillars: transparency, interpretability and explainability. We want to avoid any “black box” and create a “glass box”, as Roscher et al. (2020) illustrate. Transparency, interpretability and explainability are really about keeping a human at the helm. But will that remain realistic as automation and economic pressures push people away? I wonder.
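
As a loose illustration of the “glass box” idea (a sketch only; the feature names below are invented), an interpretable model lets a human reviewer see which inputs drive the decisions and flag anything that smells like a proxy before deployment:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical candidate features; a glass-box review means a human inspects
# the learned weights and questions anything that looks like a proxy.
feature_names = ["years_experience", "test_score", "postal_code_cluster"]
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = (X[:, 0] + X[:, 2] + rng.normal(size=500) > 0).astype(int)

pipeline = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)

# Expose the model's reasoning so a human stays at the helm.
weights = pipeline.named_steps["logisticregression"].coef_[0]
for name, weight in sorted(zip(feature_names, weights), key=lambda p: -abs(p[1])):
    print(f"{name:>22}: {weight:+.2f}")
# A large weight on postal_code_cluster would be a red flag worth investigating.
```

This only covers the interpretability pillar, of course; transparency and explainability also mean documenting where the data comes from and being able to justify individual decisions to the people affected by them.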

Another possible approach to the problem is to look at the three computational steps and address each individually: input, processing and output. If the input is biased, the machine will exacerbate that bias, so we should try to feed it non-biased inputs. However, curated datasets are costlier and smaller. It would also mean the AI cannot continuously learn about its environment, because the inputs have to be curated first. For the processing, we should follow Roscher’s advice regarding transparency and keep an active role for humans in the process.
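
For the input step, a simple starting point (a sketch under assumed column names, not a complete de-biasing procedure) is to audit the historical data before any training happens: compare outcome rates across groups and check how strongly a candidate feature predicts a protected attribute.

```python
import pandas as pd

# Hypothetical historical hiring records; the column names are invented.
df = pd.DataFrame({
    "gender":      ["m", "m", "f", "m", "f", "f", "m", "f"],
    "postal_code": ["A", "A", "B", "A", "B", "A", "A", "B"],
    "hired":       [1,   1,   0,   1,   0,   1,   0,   0],
})

# 1. Outcome rates per protected group: a large gap signals biased labels.
print(df.groupby("gender")["hired"].mean())

# 2. Proxy check: how well does a feature predict the protected attribute?
#    A near-deterministic mapping is a red flag.
proxy_strength = (
    df.groupby("postal_code")["gender"]
      .agg(lambda g: g.value_counts(normalize=True).max())
)
print(proxy_strength)
```

Curating the data on the basis of such an audit is part of what makes curated datasets smaller and costlier, as noted above.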

And lastly, there may be a place for affirmative action in setting the objectives for the AI: maybe the AI should have quotas to fill for each characteristic that could be discriminated against. If the outputs are locked to certain quotas, the AI will adjust accordingly. However, is that fair? Both sides of the argument invoke Rawls’ Theory of Justice (Rawls, 1999). “Rawlsian Affirmative Action” (Taylor, 2009) refers to the interpretation of the modern liberal ideas of Rawls in the context of affirmative action. Rawls is a highly influential philosopher in the American political and ethical zeitgeist. Samuel Freeman reads his views as follows:

“So-called “affirmative action,” or giving preferential treatment for socially disadvantaged minorities, is not part of FEO [Fair Equality of Opportunity] for Rawls, and is perhaps incompatible with it. This does not mean that Rawls never regarded preferential treatment in hiring and education as appropriate. In lectures he indicated that it may be a proper corrective for remedying the present effects of past discrimination. But this assumes it is temporary. Under the ideal conditions of a “well-ordered society,” Rawls did not regard preferential treatment as compatible with fair equality of opportunity. It does not fit with the emphasis on individuals and individual rights, rather than groups or group rights, that is central to liberalism.”  (Freeman, 2007)

Suffice it to say we are not going to resolve the debate on affirmative action in this post. What is important to note is that all academics and professionals dealing with AI and predictive modelling of behaviour have to be aware of the prevalence of Machine Bias and be well versed in its dynamics and remedies, even as those remedies are still being cooked up. The coming decades, with the rise of data driven HR, will bring an ongoing battle against discrimination by algorithms, and we have to try not to make things worse with our work. Because if we let the machine loose on our minds, it will be “hasta la vista, baby” for any hope of a fair society.

What is SARA and how can she deliver your data driven HR methodologies?

SARA stands for Survey Analysis and Reporting Automation. It is a platform where HR consultants can implement their data driven methodologies and automate their workflows. It is used by top consultancy firms around the world to deliver team assessments, psychometric tests, 360-degree feedback, cultural analysis and other analytical HR tools. SARA is the AI you need to be at the cutting edge of HR-tech.


What else does Codific build with privacy by design principles?

Codific is a team of security software engineers who leverage privacy by design principles to build secure cloud solutions. We build applications in different verticals such as HR-tech, Ed-Tech and Med-Tech. Secure collaboration and secure sharing are at the core of our solutions.

Videolab is used by top universities, academies and hospitals to put the care in healthcare. Communication skills, empathy and other soft skills are trained by sharing recordings of patient interviews for feedback.

SAMMY is a Software Assurance Maturity Model management tool. It enables companies to formulate and implement a security assurance program tuned to the risks they are facing. That way we help other companies build a simple and safe digital future. Obviously, our own AppSec program, and SAMMY itself, are built on top of it.

We believe in collaboration and open innovation; we would love to hear about your projects and see how we can contribute to developing secure software and privacy by design architecture. Contact us.

July 2022