Development/Validation of a Moral Questionnaire and the Question whether it is Possible to “Fake Good”

Overall objectives: Morality is back in criminological research. We designed a moral dilemma questionnaire and studied to what extent the instrument differentiated socially well-adjusted persons from criminals. If it does, are criminals able to “fake good”? That would make the instrument useless except in a research context with anonymous participants. Method: The questionnaire included a set of short stories, each describing a moral dilemma, and a set of solutions to each dilemma. To each solution the subject responded “right” or “wrong”. 297 well-adjusted subjects working in governmental or private enterprises, 233 students at the Police Academy, and 321 prison inmates filled in questionnaire forms. Results: A factor analysis suggested a 3-factor solution. The factors were interpreted as Rule knowledge, Rule adherence, and Utilitarianism. Prisoners differed markedly from well-adjusted subjects: a discriminant analysis yielded 86% correct classifications. There were theoretically meaningful relations with a set of external validation parameters reflecting personality factors and disorders. Conclusion: The results suggest that the questionnaire approach was successful in a research perspective. Cheating study. Method: 46 prisoners filled in the moral dilemma questionnaire twice (anonymously), once honestly and once trying to fake good. The order was rotated. Results: The algorithm that correctly classified 86% of subjects as prisoner or socially well-adjusted was applied. None of the 46 participants was classified as well-adjusted when responding honestly. Scores changed when they faked good, but only five managed to merge into the well-adjusted group. Conclusion: Prisoners are not able to fake good with respect to moral statements. This opens the way for clinical use but is ethically problematic.


Introduction
Current theories about the origin of man stress the importance of social life for the development of the highly complex brain of primates. Gorillas, for instance, have a cognitive and problem-solving capacity that far exceeds what is necessary for any aspect of their survival as a species, such as territorial defence and food gathering. It is a reasonable assumption that the ancestors of man were still more dependent on finely tuned social skills for survival and reproductive success [1]. Man's frontal lobes, unique in size and complexity among animals (including the great apes), are believed to orchestrate our social behaviour and to have been a driving force in the development of our cognitive and problem-solving skills.
Man is not only a pack animal but also a troop animal: our social contexts involve thousands of individuals, most of whom we do not know by name or from previous contact. The only other troop animal is the baboon; such troops can number up to 1000 individuals or more. In baboon societies the troop is organized into a number of packs, and there is a "top monkey". The norm is to stick to one's pack (and its local leader), but rule violations are common (falling in love or establishing friendship across pack borders). Like man, baboons are able to keep pets: dogs. It is a reasonable assumption that the complexity of living in a troop increases the evolutionary pressure to develop CNS structures, specialized for social interactions, which have survival value in this context.
Neither wolves nor monkeys (which are highly social, pack-living and intelligent animals) are able to cheat in an egoistic manner. Baboons can, and to some extent do, by virtue of being troop animals, but chimpanzees are the masters. It seems that either troop experience or a large brain is needed for cheating.
Man is obviously very adept at cheating. Viewed from the perspective of the individual, cheating may carry some advantages if used sensibly. For complex social organisations (a local society, a factory, a sports team), cheating is almost always something negative. There is a lot of social pressure against cheating in most cultures, and man is taught rules and morals during his very long childhood, partly through reward and punishment but mainly through imitation and modelling (leadership). The process by which moral rules are integrated in an individual during growth and development has been the focus of much research and still more speculation, scientific (psychology, sociology, philosophy) as well as non-scientific (religious, political). Sources of knowledge and speculation have been normal as well as abnormal (pathological) behaviours.
Moral development has been most extensively investigated within a purely humanistic frame of reference, particularly within moral philosophy (ethics) and to some extent theology. Kant's view, although formulated more than 200 years ago, is of particular interest, partly because it represents 2200 years of thinking within deontological ethics, but mainly because of the intellectual sharpness of his theory. It fits well with the current upsurge of interest in socio-biology and neo-Darwinism. Viewed this way, moral competency is a high-level and unique cognitive faculty, a category that human beings cannot abstain from using in their everyday conception of the world and in their decisions to act. It has a strong survival value for our species. The human brain is predesigned for this kind of processing, much as it is predesigned for language production and comprehension, and for mathematics and music. For the faculty to develop in the individual, extensive training over many years (socialization) is necessary. Therefore, humans have an extended childhood phase, in which the brain is highly plastic and geared for learning.
Already in the 1920s, sociological role-taking models focusing on the development and integration of societal norms were formulated, partly influenced by then-current psychoanalytical concepts of ego development. Such models were the starting point for the construction of the Gough Delinquency Scale [2], which was later included, with reversed scoring and a new name (Socialization), in the California Psychological Inventory (CPI) and in the Karolinska Scales of Personality, KSP [3].
Another line of thinking is represented by Cleckley [4] in his influential book on psychopathy, "The Mask of Sanity". He based his views on his extensive clinical experience with personality-disordered patients. The bottom line of his theory is that psychopaths have a selective semantic defect in understanding moral statements: "They (the psychopaths) know the words but not the music". Similar lines of thinking were presented already in the 18th century (Prichard), later to become the Anglo-Saxon "moral insanity" tradition of psychopathy research. In this tradition, societal factors are regarded as the key to understanding such syndromes, in contrast to the continental constitutional and/or degenerative hypotheses on psychopathy [5,6]. Later research has demonstrated that a psychopathy rating scale based on Cleckley's set of psychopathy criteria (the Psychopathy Checklist-Revised, PCL-R, Hare [7]) is highly correlated with the CPI Socialization scale [8]. Obviously, traits that are now believed to be characteristic of psychopathy as conceptualized by Hare & Schalling [9], i.e., novelty seeking, impulsiveness, aggressiveness and lack of empathy, are highly conducive to cheating and conning, and to crime. Psychopaths, defined according to PCL-R criteria, make up less than one percent of the male population, constitute less than one third of the prison population, and commit more than half of the serious crimes in a society.
Still another theoretical approach is represented by Kohlberg [10,11] and Kohlberg et al. [12], who have worked on the issue of moral development and competence from a very different perspective: healthy children of various ages. Kohlberg and his coworkers have provided detailed schemes of moral development, and tried to describe and characterize each unique level. In his view, morality develops in the child in much the same way as our cognitive tools do according to Piaget, who actually formulated a moral development theory himself (Langford [13]). Whereas the scientific tools in the previously mentioned research traditions have been self-report questionnaires or clinical observations aimed at capturing personality traits and behavioural tendencies, the Kohlberg tradition has typically employed moral problem-solving per se as the area of interest. Subjects are presented with short stories that represent moral dilemmas and are asked to consider and respond to various solutions to the presented problem. Information is gained by semi-structured expert interviews.
It is an interesting question how these two approaches, the self-report personality trait approach and the moral dilemma approach, relate to each other, and how the relation changes with situational factors for various groups of individuals, for instance psychopaths. When nothing can be gained by responding in a particular way to a set of questionnaire items, psychopaths are usually very frank and honest, in contrast to highly socialized individuals. Accordingly, psychopaths have repeatedly been shown to score very low on Lie and Social desirability scales. In spite of their notable skill in manipulating other individuals for their own benefit, they appear to be quite idiosyncratic and impaired in perceiving and interpreting social signals and situations. According to social role-taking theories (GH Mead) and to Cleckley [4], they should have great difficulty in cheating if they have to analyze and respond to moral dilemmas of the kind that can be constructed on the basis of Kohlberg's [10-12] model, simply because they do not know how to do it. This opens up the interesting possibility of identifying an individual with psychopathic personality traits (which are strongly associated with cheating and conning), or with serious deficits of moral competency, regardless of whether s/he tries to dissimulate. If individuals are honest in responding to the items, psychopaths will score high on psychopathy and non-psychopaths will score low. If individuals want to make themselves look better, non-psychopaths will be able to do so (because they "know the words as well as the music"), whereas psychopaths, or individuals seriously disturbed in terms of moral competency, are presumed to face great problems in cheating "with credibility", i.e. in "faking good", because they simply lack the cognitive tools to do it when it comes to moral dilemmas.
It is an enigma that there are no modern studies whatsoever of moral competency in psychopaths, in spite of the fact that moral incompetency appears to be a key problem for these individuals. At an international conference on psychopathy in Copenhagen in 1996, attended by the top researchers in the field, no empirical data on this issue were presented or even known to the audience [16], and this has not changed. However, there is indeed reason to look more widely into the problem of moral competency among other groups of individuals as well. Psychopathic disorders represent an extreme in terms of moral impairment. Moral competency should be regarded as a continuum, and be studied as one aspect of social competence, albeit probably one of very high order and complexity. Obviously, a majority of war crimes, past and present, have been committed by non-psychopathic individuals.

Aims
The overall aim of the Main study (Development and validation of a moral questionnaire) was to construct and validate a questionnaire that combines items/scales from the questionnaire tradition and the "moral dilemma" tradition. One aim was to unravel which dimensions underlie our moral dilemma items, and another was to achieve a construct validation of these items against relevant questionnaire scales and against groups of subjects that can be expected to comprise a high number of individuals with psychopathic personality traits (i.e. prison inmates). A third aim was to estimate roughly how precise a differentiation could be obtained in terms of predictive vs error variance, and whether the moral dilemma items contributed unique variance to the discrimination. Finally, in the Cheating study, we were interested in whether prisoners were able to "fake good", i.e. to respond to the moral issues as well-adjusted persons do. In both studies, subjects were guaranteed full anonymity, i.e. nothing was gained by responding in a socially approved manner.

Subjects (Main study)
Subjects were recruited in the early 1990s from various groups to obtain a wide range of individuals. One large group of subjects were prison inmates in 20 local as well as regional prisons (N=305, 266 men and 39 women). In the local prisons the largest group of inmates consisted of individuals who were sentenced to a fairly short time in prison (a substantial number were convicted of drunken driving). Another large group consisted of individuals with long sentences who had been transferred to a local prison as a preparation for release. A group of gamblers (N=16, 15 men and one woman) was recruited from an institution treating obsessional gambling. Socially well-adjusted subjects were recruited from the Stockholm Police Academy (N=233), and from various companies and offices (Computing, Administrative staff, Finance, Car salesmen and Technicians, N=297), in total 338 men and 187 women. In most of the analyses subjects were pooled into a socially maladjusted group (prison inmates and addicts, N=321) and a socially well-adjusted group (N=530). There were significant differences in age between these two groups (age 20-30: 362 vs 157; age 30-45: 121 vs 129; age >45: 5 vs 27; χ²(2)=32.9, p<.001 for non-criminals vs prisoners), and a higher proportion of men among the prisoners (88% vs 64%). Subjects filled in the questionnaires anonymously. The completed sheets were collected by a contact person at each site, who mailed them back to the sender.

Questionnaire
A compound questionnaire comprising 156 items was compiled. Items were drawn from four sources: the Lie scale of the Eysenck Personality Questionnaire (21 items, response format Yes/No); the Social Desirability (10 items) and Socialization (20 items) scales from the Karolinska Scales of Personality (KSP, Schalling, 1993; response format with 4 alternatives); items from the SCID-screen Axis-II instrument judged to be relevant to the following personality disorders: Antisocial (6 items), Narcissistic (5 items), Borderline (7 items) and Passive-Aggressive (4 items); and 82 new items, inspired by a scale that was constructed by John B Miner and Philip Ash and is claimed to be useful for assessing the honesty of a subject. Moral dilemmas in everyday life should be distinguished from classical dilemmas in the philosophical tradition ("throw one overboard, else all die"). The Swedish version, preliminarily denoted the Key Test (KT), consists of a number of short descriptive stories involving a moral dilemma and, for each of these stories, five suggested ways to handle the dilemma. The alternatives are partly structured according to Kohlberg's scheme of moral development [11]. An example of a description of a moral dilemma, and five alternative solutions to the problem, is presented below.
Sven is working as a salesman in a clock shop. He has recently been informed that the shop owner has very cheaply bought a quantity of forged brand watches, which are going to be sold as full-priced originals.

Treatment of data
Because of the large N and the involvement of multiple comparisons, the general significance level was set at p<.01, so that clinical and statistical significance would match. The analyses proceeded from univariate analyses at item and scale level to multivariate analyses. In a first step we determined the factor structure of the new items, and the validity of the scales constructed according to these factors. Construct validity was assessed in two ways: correlations with conceptually related scales, and differences in means between socially maladjusted and well-adjusted groups of subjects. In the final phase, the precision of the discrimination between socially maladjusted and well-adjusted individuals at the individual level, using the new scales, was assessed, particularly in terms of false positives and negatives (sensitivity and specificity). On the basis of these analyses, the original questionnaire was modified by excluding some items that did not contribute to the discrimination.

Factor structure and Key test (KT) item analysis
One item had many missing responses. Evidently many subjects found it difficult to respond to, and it was therefore excluded from further calculations. The remaining items had various ratios of yes responses; none was judged to have too small a variance for inclusion. Several factor analyses were run on the remaining 81 KT items for 850 subjects, trying various numbers of factors and oblique vs orthogonal solutions. A 3-factor varimax-rotated orthogonal solution yielded a conceptually meaningful structure and accounted for 64 percent of the total variance. The solution was stable (almost identical results when run on two randomly selected split-half assemblies of subjects). On the basis of the communalities, items were allocated to one of these dimensions, leaving 14 items that had no clear association with any of the dimensions. Items within the three scales were fairly homogeneous (Cronbach's alpha .89, .84 and .79, respectively, for the three dimensions). A conceptual analysis of the item content within each dimension suggested that the first dimension might be labelled "Rule knowledge", quite close to the Aristotelian virtue ethics tradition. The second one was conceptually bipolar, with "Rule squaredness" at one endpoint and "I refuse to label anything as dishonesty, and requiring intervention from my side" at the other endpoint. This might, from a philosophical standpoint, be described as reflecting the degree of adherence to deontological ethical principles (Rule adherence). The third dimension reflected a kind of flexible and practical "everyday petty rule-breaking" with a certain amount of reciprocity (today I borrow a company stamp for a private letter, tomorrow I put a private stamp on a company letter), in the interest of overall efficiency. This fits nicely into the third great philosophical tradition of "consequence ethics", Utilitarianism.
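The homogeneity figures quoted above are Cronbach's alpha coefficients. As a minimal sketch of how such a coefficient is obtained from item-level data, the Python function below applies the standard alpha formula to a subjects-by-items matrix. The function name, the 0/1 coding of "wrong"/"right" responses and the toy data are assumptions made for illustration; they are not the study's data or software.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of per-subject rows of item scores.

    item_scores[i][j] is the score of subject i on item j
    (here assumed to be 0 = "wrong", 1 = "right").
    """
    k = len(item_scores[0])  # number of items in the scale
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each single item across subjects.
    item_vars = [pvariance([row[j] for row in item_scores]) for j in range(k)]
    # Variance of the summed (equally weighted) scale score across subjects.
    total_var = pvariance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("summed score has zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses of six subjects to a three-item scale.
toy_responses = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 0, 0],
    [1, 0, 1],
    [0, 0, 0],
    [1, 1, 1],
]
print(round(cronbach_alpha(toy_responses), 2))
```

Alpha approaches 1 when the items rise and fall together across subjects, which is the sense in which the three KT scales are described as fairly homogeneous.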
Items belonging to each of the three dimensions were then summed, with equal weights, into three scales denoted Rule knowledge, Rule adherence, and Utilitarianism. These scales were moderately intercorrelated (absolute values ranging from .61 to .69, i.e. roughly 37% to 48% shared variance). Rule knowledge was negatively correlated with the two other scales, which were positively intercorrelated. The distribution was markedly skewed for Rule knowledge, somewhat skewed for Rule adherence, and normal for Utilitarianism. Means, SDs and ranges are given in Table 1.

Validation of the scales: correlational analyses
Correlations between the scale values and the validation scales (Lie, Social desirability, Socialization, and the SCID-P Antisocial, Passive-Aggressive and Borderline scales) are shown in Table 2. The correlation pattern conforms to what one would expect.

Copyright © Sten Levander
Volume -3 Issue -3

Validation of the scales: differences among subject groups
Mean values ±SD in the KT scales and in the validation scales are shown in Table 3. The differences among the 9 groups were significant for all scales (one-way ANOVAs). Prisoners and Abusers differed markedly from the other groups on all scales except Lie and Social Desirability, but did not differ from each other. When these two groups were excluded, there were still significant overall differences among the socially well-adjusted groups (one-way ANOVAs).
Individual comparisons among the means (Scheffé) showed that the police academy students were most often different from the other socially well-adjusted groups (25 of 40 comparisons were significant, far more than for any other group). The differences were mainly in a prosocial direction (e.g., higher in Lie, Social Desirability, Socialization etc.), but they were also higher in the KT scale "Rule adherence". In a separate analysis it was shown that police academy freshmen were more "prosocial" than last-year students, suggesting that the curriculum and experiences provided by the education/training modified the idealistic and rule-adherent moral conceptions that characterized freshmen.
The issue of whether there were police academy students who appeared to be unsuited to become police officers was checked by plotting the distribution of police students against the background of the well-adjusted subjects (except police students) and the maladjusted subjects. Some of the well-adjusted subjects had scores in the maladjusted range, but none of the police students did. The concurrent selection process at the police school seemed to eliminate such applicants.
Mean values ±SD in the KT scales, and in the validation scales are shown for the two sexes in Table 4, and the three age categories in Table 5. Two-way ANOVAs were performed for factors of sex and social adjustment for the 10 scales. All main effects of well vs maladjusted groups were highly significant (p<<.001) except for two scales: Lie (NS) and Social desirability (p<.01). There were no significant interactions and a main effect of sex for only one scale: Borderline, women scoring higher as expected.

Validation of the KT test: multivariate analyses
A stepwise discriminant analysis was run with the 10 scales as predictors and social adjustment as criterion. The multiple canonical correlation was .73 (p<<.001). Seven of the ten scales were entered in the algorithm, and are listed in the order of univariate strength of association with the criterion (Table 6). Using the discriminant function, 86 percent of the subjects were classified correctly (Table 7). Most of the failures were false negatives among the prisoners.
In the last phase of the analysis, we aimed at finding rules that maximized the predictive validity by using a minimum of selected scales. In order to explore the usefulness of such an analysis for screening we chose a rather strict cut-off threshold, which would check 20% of the socially well-adjusted subjects. Based on the previous analyses we selected five scales for the final discrimination: the three KT scales and SD and So from KSP.
To start with, subjects who fell at an extreme and undesirable end in one of these scales were checked, so that 10% of the subjects were checked. Furthermore, a multiple regression equation was calculated that took all variables into account concurrently, and maximized the differentiation between prisoners and nonprisoners. Cut-off scores were then calculated so that another 10% of the subjects were checked by the multiple regression equation.
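The two-step procedure described above (a check on extreme single-scale scores, followed by a cut-off on a regression-based composite) can be sketched as a simple decision rule. Everything numeric in the sketch below is hypothetical: the cut-offs, the weights, the direction in which each scale counts as "undesirable" and the scale names are placeholders, since the actual regression equation and thresholds are not reported here.

```python
def is_checked(scores, extreme_cutoffs, weights, intercept, composite_cutoff):
    """Return True if a subject is checked by either step of the compound rule.

    scores:           dict mapping scale name -> the subject's score
    extreme_cutoffs:  dict mapping scale name -> (direction, cutoff), where
                      direction is "low" or "high" (the undesirable end)
    weights:          dict mapping scale name -> regression weight
    """
    # Step 1: an extreme, undesirable score on any single scale.
    for scale, (direction, cutoff) in extreme_cutoffs.items():
        value = scores[scale]
        if direction == "low" and value <= cutoff:
            return True
        if direction == "high" and value >= cutoff:
            return True
    # Step 2: a weighted composite of the scales, compared with a cut-off.
    composite = intercept + sum(w * scores[s] for s, w in weights.items())
    return composite >= composite_cutoff


# Hypothetical parameters, for illustration only.
EXTREME_CUTOFFS = {"rule_knowledge": ("low", 10), "socialization": ("low", 40)}
WEIGHTS = {"rule_knowledge": -1.0, "utilitarianism": 1.0}

subject = {"rule_knowledge": 12, "utilitarianism": 20, "socialization": 60}
print(is_checked(subject, EXTREME_CUTOFFS, WEIGHTS, 0.0, 5.0))
```

In such a scheme the two cut-offs would be tuned on the reference sample so that each step checks the intended share of subjects (10% per step in the text).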
The result of this compound procedure is presented in Table 8 for seven groups of subjects, and for the dichotomy socially well-adjusted vs maladjusted (prisoners+abusers). Overall, 79% were correctly classified, 75 of 851 were false negatives and 104 were false positives. Finally, an ROC curve was constructed using the multivariate prediction variable (Figure 1, Table 9). The AUC was 0.91.
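The classification figures above can be re-derived from the reported counts. The sketch below assumes that a "positive" is a socially maladjusted subject and takes the group sizes (321 maladjusted, 530 well-adjusted) from the Subjects section; the function names are hypothetical. The second function illustrates the rank-based definition of the area under the ROC curve on made-up scores, since the individual prediction scores are not available here.

```python
def screening_metrics(n_maladjusted, n_well_adjusted,
                      false_negatives, false_positives):
    """Accuracy, sensitivity and specificity from group sizes and error counts."""
    true_positives = n_maladjusted - false_negatives
    true_negatives = n_well_adjusted - false_positives
    total = n_maladjusted + n_well_adjusted
    return {
        "accuracy": (true_positives + true_negatives) / total,
        "sensitivity": true_positives / n_maladjusted,
        "specificity": true_negatives / n_well_adjusted,
    }


def rank_auc(positive_scores, negative_scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Counts reported in the text: 75 false negatives, 104 false positives.
metrics = screening_metrics(321, 530, 75, 104)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

# Made-up prediction scores, only to show how the AUC function is called.
print(rank_auc([0.9, 0.7, 0.6], [0.8, 0.3, 0.2]))
```

With the reported counts the overall accuracy is 672/851, about 79%, which matches the figure in the text; under the stated assumptions the implied sensitivity is about 77% and the specificity about 80%.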

Discussion (Main study)
The main finding of this study was that subjects' responses to a set of moral dilemma issues formed a meaningful pattern. The validity was unexpectedly high, as illustrated by the highly significant discrimination between prison inmates and others, and by the consistently meaningful correlation pattern with external validation scales. Between 85 and 91 percent of the 851 subjects could be correctly classified by the algorithms as being in prison or not. This finding also implies that prisoners have a markedly different conception of morality than non-prisoners. They do not seem to know the rules, nor to be able to fine-tune the balance between strict rules and means-ends moral reasoning. Citing Cleckley (1943) once more, 'They do not know the music'.
Another main finding was the three-factor structure of the KT items, which mirrored the three great philosophical traditions of virtue ethics (rule knowledge), absolute rule-oriented (deontological) ethics and consequence-oriented (utilitarian) ethics. Differences among subjects might be expected to reflect the innate skill in acquiring rule knowledge and social competence, the exposure to socialization, and socio-cultural factors. It is obvious that the current data cannot be extrapolated to other countries or cultures. The identification of the factors was probably possible because of the inclusion of the prisoners and thereby a larger variation in moral attitudes. This factor pattern did not appear when prisoners were excluded, nor in another randomly selected group of healthy volunteers (unpublished, data available from the author).
This is an attempt to design a questionnaire-type instrument to assess moral competency, and it yielded consistent and meaningful data. It appears that this is a promising way to look empirically into the complex field of human moral decision-making. The use of such questionnaires outside pure research projects, for instance for selection purposes, raises ethical issues. In this case, what will be measured on the individual level is simply a deviation from the average morality in a specific society at a specific point in time. Individuals who lack full moral competency, as well as professionals with particular insights into ethical issues and a special competency in the field, will probably fall outside the normal range. This limitation has to be recognized. Furthermore, knowing the rules and possessing the competency does not necessarily mean that acts are governed by moral considerations. Obviously, strong situational factors can override the rule of morality (citing Brecht: "Zuerst kommt das Fressen, dann die Moral"). However, applying some teleological reasoning founded in social-Darwinistic thinking, nature would be unwise to install such a complex system (moral competence) and then not use it. Our argument is then that if an individual has successfully attained a full and mature moral competence, s/he will also be strongly driven by this competence when acting, and will experience suffering (bad conscience, another complex faculty) if acting immorally.
Concurrently there is a revived interest in morality as a determinant of crime in criminological studies. One example is the Situational Action theory (Wikström, 2014), in which a moral subdimension (Wikström & Svensson, 2010) is added to the 'self-control' dimension of the Gottfredson & Hirschi 'integrated model' (1990). Morality was the strongest of many predictors of crime in a large sample of young people (Andersson, Levander, Svensson & Levander, 2012). Empirical data are currently coming out from ongoing studies like The Peterborough project (Wikström,
If the KT instrument is used outside a research setting, it will most likely be the target of criticism from some agents, but it will not be possible to claim that it differs in principle from a range of other scales that are actually used in selection contexts, even if it purports to measure such a highly charged concept as "moral competence". In that case the issue of whether it is possible to "fake good" becomes important. If it is possible, the instrument is useless for selection purposes.