Every year, the Commandant of the Marine Corps publishes a reading list of books, many of which bear on warfighting only tangentially at best. The idea is that those entrusted with the lives of servicemembers should have an understanding of the world that goes beyond the profession of arms.
In much the same way, I have long been advising data scientists to read beyond the professional literature. Below are the five books I believe every data scientist should have read. As a profession, we are increasingly tackling morally complex issues. Speaking of nuclear weapons, General Omar Bradley warned of a world of 'nuclear giants and ethical infants'. In a world shaped by data, and by those who work with it, more than at any point in the history of humankind, may we take this chance to grow not only in skill but also in ethics, professionalism and humanity.
Alexander Solzhenitsyn: In the First Circle
Solzhenitsyn’s most famous books, such as One Day in the Life of Ivan Denisovich and The Gulag Archipelago, deal with the everyday privations of the Soviet prison camps. In the First Circle is different: its characters exist in a literal limbo (the ‘First Circle’ of Dante’s Inferno, reserved for the righteous but unbaptised), subject to repression by the regime on the one hand, yet valued for their scientific knowledge as workers in a sharashka, a ‘special engineering bureau’ staffed by prisoner scientists. In his Gulag books, Solzhenitsyn asks what it means to be human in a system of calculated, inhuman repression. In the First Circle, his question is more specific: what does it mean to be a scientist in an unjust regime, and which compromises should a scientist refuse to make, even at the cost of their freedom or their very survival? As AI and ML are increasingly used by tyrannical regimes as tools of political repression, Solzhenitsyn’s In the First Circle speaks as loudly to today’s data scientists as it did to the Soviet and Western scientists who first read it in samizdat copies in the late 1960s.
Thornton Wilder: The Bridge of San Luis Rey
Wilder’s The Bridge of San Luis Rey might well be the first novel to feature a data scientist (of sorts) as its protagonist – the best part of a century before data science as such existed. Brother Juniper, an Italian-born friar in Peru, witnesses the collapse of an Inca rope bridge, which kills five people. He devotes the following years to unraveling the mystery behind what at first seems random and senseless: why did these five people die, and why not others? What made them special? He goes about it much as we would approach any modern problem: by gathering information on the decedents and trying to find the common factors that set them apart. Wilder’s book is about many things – not least fate, randomness and our innate expectation of an ordered universe.
David Halberstam: The Best and the Brightest
Robert S. McNamara was nicknamed the Electric Brain for his almost preternatural grasp of quantitative analysis. McGeorge Bundy was a foreign-policy prodigy. JFK inspired Americans in a way few other Presidents have, before or since. The list goes on and on – JFK’s and later Lyndon B. Johnson’s cabinet was full of men of exceptional intelligence, knowledge, education and sophistication. And yet, in the face of a mounting crisis in Vietnam, they were worse than powerless: the worst shortcomings of their thinking compounded the very problem that eventually embroiled America in a hopeless conflict. Halberstam’s The Best and the Brightest is a story of good people making bad decisions, and of the psychological pitfalls of interpreting the world not as it is but as we wish it to be, to conform to our innermost prejudices. In the end, the ‘best and the brightest’ of America, armed with prodigious amounts of information and data, missed the opportunity to prevent the fall of South Vietnam to the Communists. The drama of Vietnam played out on the world stage, but the same cognitive biases Halberstam describes are at work in boardrooms, data science teams and decision-makers’ offices every single day.
Frank Herbert: The Dosadi Experiment
Herbert’s The Dosadi Experiment should be required reading in Responsible Conduct of Research courses. The book deals with a perennial question: is it ethical to allow an injustice against a small number of individuals to continue if it protects an entire populated universe from potentially disastrous upheaval? Because this is Frank Herbert, there’s a decent amount of trippy 1970s sci-fi, including ego sharing, tree-bark-like creatures that enable faster-than-light information transmission and dogs bred to be semi-sentient items of furniture. Taking those curves as they come, however, The Dosadi Experiment is a masterpiece: a weird-but-wonderful meditation on the rights of the many and the rights of the few, in a research ethics context.
Eric Schlosser: Command and Control
In 1980, a US Air Force Titan II intercontinental ballistic missile suffered a liquid-fuel explosion inside its silo near Damascus, Arkansas – with a nuclear warhead on top. Schlosser’s book reveals that incidents of this kind were, unsettlingly, far more frequent than one might be comfortable with. What happens when the literal survival of the planet depends on technology, and how comfortable are we replacing the human decision-maker – often enough, a twenty-something 1st Lieutenant in a missile silo’s command centre – with technology that may run away from us? Schlosser’s book is an impassioned plea for better design of critical systems, presenting the near-disasters of nuclear weapons – which ought to be the epitome of safety engineering! – as an indication of all that can, and does, go wrong.
What are your favourite non-data science books for data scientists? Let me know in the comments.