Behind the scenes of Seeking Conviction
How Carolina Public Press analyzed NC court data on sexual assaults
Kate Martin and Frank Taylor, Carolina Public Press
Our series, Seeking Conviction, relies partly on an analysis of North Carolina criminal court records produced by the Administrative Office of the Courts. Court clerks in each of the state’s 100 counties enter criminal and infraction data into a central system, which is basically a large filing cabinet for cases.
People can look up basic information about cases at terminals in courthouses around the state.
Carolina Public Press analyzed data contained in more than 3 gigabytes of zipped text files from the court’s Automated Criminal/Infractions System. Once parsed, the data included hundreds of thousands of lines across several tables, some dozens of columns wide.
These records are public under North Carolina public records law. Anyone can buy the data from the agency for $7 after filing a records request, but parsing it requires knowledge of a scripting language (see below).
First, Carolina Public Press looked in the data for defendants who were charged on or after Jan. 1, 2014, and whose cases were resolved by June 30, 2018. Defendants whose cases remained open were counted separately.
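As a rough illustration, the date window above could be applied to each defendant like this. This is a minimal sketch, not Carolina Public Press' actual script; the function name is hypothetical, and treating any case without a final disposition by the cutoff as "open" is an assumption.

```python
from datetime import date

# Study window described above (dates taken from the text).
CHARGE_START = date(2014, 1, 1)
RESOLVED_BY = date(2018, 6, 30)


def classify_by_window(charged_on, resolved_on):
    """Place one defendant relative to the study window.

    charged_on:  date the defendant was charged.
    resolved_on: date of final disposition, or None if there is none yet.
    Returns "resolved", "open" or "out_of_window".
    """
    if charged_on < CHARGE_START or charged_on > RESOLVED_BY:
        return "out_of_window"
    if resolved_on is None or resolved_on > RESOLVED_BY:
        # Assumption: anything not resolved by the cutoff is tallied as open.
        return "open"
    return "resolved"


print(classify_by_window(date(2015, 3, 2), date(2016, 8, 19)))   # resolved
print(classify_by_window(date(2017, 11, 6), None))               # open
print(classify_by_window(date(2013, 12, 31), date(2015, 1, 5)))  # out_of_window
```

Only "resolved" defendants feed the conviction rate; "open" ones are the separate tally the paragraph mentions.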
The analysis examined defendants charged with the following six felonies, chosen to represent rapes that involved threats, force or intimidation, or that involved an incapacitated person:
- First-degree rape (recodified to first-degree forcible rape Dec. 1, 2015).
- First-degree forcible rape.
- Second-degree rape (recodified to second-degree forcible rape Dec. 1, 2015).
- Second-degree forcible rape.
- First-degree forcible sex offense.
- Second-degree forcible sex offense.
In addition, cases were included if the separate freeform description of the crime entered by court clerks named any of the charges above.
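A hedged sketch of the selection rule: a charge qualifies if either its coded offense or the clerk's freeform description names one of the six crimes. The plain-text labels stand in for the state's four-digit codes, and the function names are hypothetical. Exact matching after normalization is a simplification; real clerk entries use abbreviations that would need more variants.

```python
import re

# The six charges listed above, written as plain-text labels.
# Illustrative only: the state's data identifies crimes by four-digit codes.
TARGET_OFFENSES = {
    "first-degree rape",
    "first-degree forcible rape",
    "second-degree rape",
    "second-degree forcible rape",
    "first-degree forcible sex offense",
    "second-degree forcible sex offense",
}


def normalize(text):
    """Lowercase, turn hyphens into spaces and drop other punctuation, so
    'FIRST DEGREE FORCIBLE RAPE' and 'first-degree forcible rape' match."""
    text = text.lower().replace("-", " ")
    text = re.sub(r"[^a-z0-9 ]", "", text)
    return " ".join(text.split())


_NORMALIZED_TARGETS = {normalize(o) for o in TARGET_OFFENSES}


def is_target_charge(offense_label, clerk_description=""):
    """True if either the coded offense or the freeform description
    names one of the six charges."""
    return (
        normalize(offense_label) in _NORMALIZED_TARGETS
        or normalize(clerk_description) in _NORMALIZED_TARGETS
    )


print(is_target_charge("First-Degree Rape"))                            # True
print(is_target_charge("OTHER", "Second Degree Forcible Sex Offense"))  # True
print(is_target_charge("rape of a child"))                              # False
```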
In a handful of cases, defendants were charged with a different crime, such as the rape of a child or teenager. Despite the use of the word “rape” in the statute, these crimes are based on the victim’s age rather than on the element of force. Carolina Public Press’ analysis focused on forcible rapes and excluded these age-based charges, not because they are unimportant but because they are fundamentally different in the way the statutes work, the way cases are built and the way they are prosecuted.
However, if a defendant was charged with one of these crimes that do not include the element of force but was ultimately convicted of one of the six sexual assault crimes that do, the defendant was included in the analysis.
In the articles of this series, the crimes included in the analysis are generally described as “sexual assault,” a term that is in common usage but is not the title of any specific statutory charge in North Carolina. Where the word “rape” is used instead, it either appears in quoted comments or refers to cases that specifically involved rapes, as defined in the statutes.
Child or statutory rape charges were not included in this analysis. A child can never consent to sex acts under North Carolina law, and sex with a child is a crime. But a forcible rape or sex offense charge can be applied in a case involving a child, so child victims were not excluded if these charges were involved.
Defendants are often charged with multiple crimes in connection with the same event. Consider the following example: A defendant is charged with burglary, assault and first-degree rape. If the rape charge was dropped (either by the district attorney or the judge) but the defendant was prosecuted for one of the other two crimes, Carolina Public Press did not count that as a conviction, because there was no conviction for sexual assault. If the defendant pleaded guilty to any of the six sexual assault crimes, it was counted as a conviction, including a plea to a lesser sexual assault charge, such as second-degree rape from a first-degree rape charge.
If the rape charge was not dropped but the defendant pleaded guilty to a lesser crime, such as misdemeanor sexual battery or assault on a female, it counted as a conviction, though not as a rape conviction.
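The counting rules above can be summarized as a small decision function over all of one defendant's charges. This is a sketch under stated assumptions: the disposition labels and the `Charge` type are hypothetical stand-ins for the court data's own codes, and the rule that a conviction for one of the six crimes counts even on an originally age-based charge follows the exception described earlier in this article.

```python
from dataclasses import dataclass

# Hypothetical disposition labels; the court data uses its own codes.
GUILTY_TARGET = "guilty_target"  # guilty of one of the six crimes (as charged or a lesser one of the six)
GUILTY_LESSER = "guilty_lesser"  # guilty of a lesser crime outside the six, on a target charge
GUILTY_OTHER = "guilty_other"    # guilty on an unrelated charge, e.g. burglary
DISMISSED = "dismissed"


@dataclass
class Charge:
    is_target: bool   # charged as one of the six forcible crimes
    disposition: str  # one of the labels above


def defendant_outcome(charges):
    """Apply the counting rules to all of one defendant's charges."""
    # Any conviction for one of the six crimes counts, even when the
    # original charge was a different crime (e.g. an age-based one).
    if any(c.disposition == GUILTY_TARGET for c in charges):
        return "sexual_assault_conviction"
    # A target charge that stayed alive but ended in a plea to a lesser
    # crime such as sexual battery counts as a conviction, not for rape.
    if any(c.is_target and c.disposition == GUILTY_LESSER for c in charges):
        return "reduced_plea_conviction"
    # Convictions only on unrelated charges (burglary, assault) do not count.
    return "no_conviction"


# The burglary/assault/rape example: rape charge dropped, burglary conviction.
example = [Charge(True, DISMISSED), Charge(False, GUILTY_OTHER), Charge(False, DISMISSED)]
print(defendant_outcome(example))  # no_conviction
```

Both conviction outcomes feed the statewide rate; only the first is a sexual assault conviction.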
Once Carolina Public Press obtained the number of sexual assault defendants and the number of convictions statewide and in each jurisdiction, it was possible to calculate a conviction rate for this group of defendants.
Carolina Public Press found a statewide conviction rate of 24.2 percent, counting both sexual assault convictions and pleas to reduced charges. This figure is the basis of the “about 1 in 4” description used in these articles.
Based on this calculation, Carolina Public Press evaluated counties with conviction rates 8 or more percentage points below 24.2 percent as “Low” counties. Those below the state average by a lesser amount were simply “below average.” Counties with conviction rates that were 8 percentage points or more above the state average were evaluated as “High” counties. Those that were above the state average by a lesser amount were “above average.”
A few, mostly smaller, counties had very low numbers of defendants who matched the criteria for this analysis. If a county had fewer than four defendants, it was not given an evaluation based on its conviction rate, but instead categorized as having insufficient data.
This also helped avoid the problem of division by zero for counties that had no defendants.
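The evaluation bands and the minimum-defendant rule can be expressed in a few lines. In this sketch, Low means a rate at or below 16.2 percent and High means a rate at or above 32.2 percent. Exact fractions avoid floating-point rounding right at the 8-point boundaries. The function name is hypothetical, and because the text does not say how a rate exactly equal to the state average is labeled, the sketch groups it with "above average."

```python
from fractions import Fraction

STATE_RATE = Fraction(242, 10)  # 24.2 percent statewide
MARGIN = Fraction(8)            # percentage points
MIN_DEFENDANTS = 4              # fewer than this: insufficient data


def evaluate_county(defendants, convictions):
    """Return the evaluation label for one county's counts."""
    if defendants < MIN_DEFENDANTS:
        # Also covers counties with zero defendants, so no division by zero.
        return "Insufficient data"
    rate = Fraction(100 * convictions, defendants)
    diff = rate - STATE_RATE
    if diff <= -MARGIN:
        return "Low"
    if diff >= MARGIN:
        return "High"
    # Assumption: a rate exactly equal to 24.2 is grouped with above average.
    return "Below average" if diff < 0 else "Above average"


for n, c in [(3, 0), (10, 1), (10, 2), (10, 3), (10, 4)]:
    print(n, c, evaluate_county(n, c))
# 3 0 Insufficient data
# 10 1 Low            (10 percent)
# 10 2 Below average  (20 percent)
# 10 3 Above average  (30 percent)
# 10 4 High           (40 percent)
```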
Carolina Public Press also evaluated prosecutorial districts based on the same calculations. No prosecutorial districts had too few defendants to be evaluated. Even the counties that had too few defendants to be evaluated on their own were included in the analysis and evaluation of their prosecutorial districts and the statewide totals.
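A district's rate can be computed by pooling its counties' raw counts, which lets small counties contribute even when they are too small to evaluate alone. The sketch below assumes pooling counts rather than averaging county rates; the article does not spell out the arithmetic. The county and district names are hypothetical.

```python
from collections import defaultdict


def district_totals(county_counts, county_to_district):
    """Sum (defendants, convictions) over every county in each district.

    county_counts: {county: (defendants, convictions)}
    county_to_district: {county: district}
    Counties too small to evaluate on their own are still included.
    """
    totals = defaultdict(lambda: [0, 0])
    for county, (defs, convs) in county_counts.items():
        district = county_to_district[county]
        totals[district][0] += defs
        totals[district][1] += convs
    return {d: (t[0], t[1]) for d, t in totals.items()}


def conviction_rate(defendants, convictions):
    """Conviction rate in percent."""
    return 100.0 * convictions / defendants


# Hypothetical district X: County A has only 2 defendants, County B has 18.
counts = {"County A": (2, 1), "County B": (18, 4)}
districts = {"County A": "X", "County B": "X"}
totals = district_totals(counts, districts)
print(totals)                           # {'X': (20, 5)}
print(conviction_rate(*totals["X"]))    # 25.0
```

Pooling weights each defendant equally. Averaging the two county rates (50 percent and about 22.2 percent) would give roughly 36.1 percent and overweight the tiny county.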
From the data to the human story
Data is a starting point in a journalist’s investigation.
Rape is a crime unlike any other. It leaves indelible memories in the minds of the victims. Rape survivors may decide to pursue cases through the criminal justice system, only to drop out for a variety of reasons.
District attorneys’ offices often drop cases because a victim cannot continue. And sometimes judges dismiss charges, also for a variety of reasons.
In rare instances, people are charged with both a rape crime and a high-level felony related to child or statutory rape, and sometimes only the child rape results in a conviction.
This does not count as a rape conviction for our analysis, and these instances only account for a handful of cases statewide.
Analyzing information from the data
Carolina Public Press analyzed the data by defendant, using the name, date of birth and county in which cases were filed against each defendant.
Together, those three fields created a unique identifier for each defendant. Analyzing by defendant represents convictions more accurately, because defendants may be charged with multiple sex crimes across multiple cases.
For example, if a defendant were charged with four rape crimes and convicted of only one rape crime, analyzing by defendant produces a 100 percent conviction rate, whereas analyzing by case would show a 25 percent conviction rate.
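The difference between the two approaches can be reproduced with the article's own example. This sketch groups charges by the name, date of birth and county key described above, and contrasts that with counting each charge separately (one charge per case, as in the example). The function name and sample defendant are hypothetical. The sketch also assumes names and birth dates are spelled identically across records, which real data may need cleaning to achieve.

```python
from collections import defaultdict


def rates(charges):
    """charges: iterable of (name, dob, county, convicted) tuples,
    one per rape charge. Returns (per_defendant_pct, per_charge_pct)."""
    convicted_by_defendant = defaultdict(bool)
    total_charges = 0
    convicted_charges = 0
    for name, dob, county, convicted in charges:
        key = (name, dob, county)  # defendant identifier
        # A defendant counts as convicted if any of their charges was.
        convicted_by_defendant[key] = convicted_by_defendant[key] or convicted
        total_charges += 1
        convicted_charges += convicted
    per_charge = 100.0 * convicted_charges / total_charges
    per_defendant = 100.0 * sum(convicted_by_defendant.values()) / len(convicted_by_defendant)
    return per_defendant, per_charge


# One hypothetical defendant, four charges, one conviction.
sample = [("DOE, JOHN", "1980-01-01", "County A", c) for c in (True, False, False, False)]
print(rates(sample))  # (100.0, 25.0)
```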
The system from which the files were extracted was created more than 30 years ago to track cases across the state on a mainframe computer. Though modified many times over the years, the file structure is still in use today in all 100 counties of North Carolina.
Carolina Public Press used a Python script to parse a series of large text files into several tables. Then, queries written in structured query language, or SQL, isolated the specific rape crimes and eliminated duplicates.
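The two steps might look like the following sketch, which uses Python's built-in sqlite3 module for the SQL step. The fixed-width field positions in `LAYOUT` are invented for illustration: the real file layout is documented by the Administrative Office of the Courts and is far wider. Code 1103 (first-degree rape) comes from this article; the other qualifying codes would have to be supplied.

```python
import sqlite3

# Hypothetical fixed-width layout: (field name, start, end) slice positions.
# The real record layout differs; this only shows the parsing technique.
LAYOUT = [
    ("case_number", 0, 12),
    ("county_code", 12, 15),
    ("name", 15, 45),
    ("dob", 45, 53),
    ("offense_code", 53, 57),
]


def parse_line(line):
    """Slice one fixed-width line into a tuple of trimmed fields."""
    return tuple(line[start:end].strip() for _, start, end in LAYOUT)


def load_and_dedupe(lines, offense_codes):
    """Load parsed lines into SQLite, then select distinct rows whose
    offense code is in offense_codes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE charges (case_number TEXT, county_code TEXT, "
        "name TEXT, dob TEXT, offense_code TEXT)"
    )
    conn.executemany(
        "INSERT INTO charges VALUES (?, ?, ?, ?, ?)",
        (parse_line(line) for line in lines),
    )
    placeholders = ",".join("?" for _ in offense_codes)
    rows = conn.execute(
        "SELECT DISTINCT case_number, county_code, name, dob, offense_code "
        f"FROM charges WHERE offense_code IN ({placeholders}) "
        "ORDER BY case_number",
        list(offense_codes),
    ).fetchall()
    conn.close()
    return rows
```

Called as `load_and_dedupe(open(path), ["1103", ...])`, a record that appears twice in the extract comes back once.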
Within the data are hundreds of fields spanning multiple tables related to each case. Most crimes are referred to by a four-digit code in the state’s data.
We spent months interviewing sources familiar with the data and court procedure and reading user manuals and data guides to better understand field names and court practices. This research helped us ensure that the results accurately reflected the conviction rates for each prosecutorial district.
It also helped us identify several kinds of entries that would have unreasonably skewed conviction rates lower, which we eliminated:
- Superseding indictments: When a district attorney refiles a case for reasons that may include correcting an error in the original charging paperwork.
- Voluntary dismissal with leave: When the defendant fails to appear in court and cannot otherwise be located. Charges are dropped and may be filed again in the future. In cases like this, the prosecutor did not fail to pursue charges.
- Other procedural court actions that were not an ultimate disposition of a case.
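The effect of dropping these entries is easy to see with a toy example. In this sketch the disposition labels are hypothetical stand-ins for the codes in the state's data. A defendant who fled (voluntary dismissal with leave) is not a case the prosecutor failed to pursue, so leaving that entry in would pull the rate down.

```python
# Hypothetical labels for the procedural actions listed above; the state's
# data records these with its own disposition codes.
PROCEDURAL = {
    "superseding_indictment",
    "voluntary_dismissal_with_leave",
    # other non-final procedural actions would also go here
}


def conviction_rate(dispositions, exclude_procedural=True):
    """Percent of dispositions that are convictions, optionally ignoring
    procedural entries that do not end a case. Returns None if none remain."""
    kept = [
        d for d in dispositions
        if not (exclude_procedural and d in PROCEDURAL)
    ]
    if not kept:
        return None
    return 100.0 * sum(d == "convicted" for d in kept) / len(kept)


sample = ["convicted", "dismissed", "voluntary_dismissal_with_leave", "convicted"]
print(conviction_rate(sample, exclude_procedural=False))  # 50.0
print(round(conviction_rate(sample), 1))                  # 66.7
```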
Kate Martin of Carolina Public Press performed the bulk of the analysis for this project. Frank Taylor of Carolina Public Press worked with her on geographical analysis and spot checking. Data experts from several collaborative partners made significant contributions in checking and critiquing this analysis at various stages, including David Raynor of The (Raleigh) News & Observer, Tyler Dukes of WRAL-TV and Jason deBruyn of WUNC North Carolina Public Radio.
Because this information is entered into databases by people, and people make mistakes, it is reasonable to expect the data provided by the state will contain some errors. Whenever and wherever possible, those identified errors were corrected.
For example, each North Carolina county is represented by a three-digit code in the Administrative Office of the Courts’ data. In a few instances, district attorneys alerted Carolina Public Press to cases that were coded incorrectly, which caused them to be attributed to the wrong county.
Similarly, deputy clerks in any of the state’s 100 counties may enter a crime code incorrectly. Crime codes beginning with 11 are typically sex crimes; first-degree rape, for instance, is coded as 1103. Depending on how a clerk keys in a crime code, whether on a 10-key pad or a standard keyboard, a traffic infraction, whose code typically begins with 44, may instead be entered as a sex crime.
We modified the analysis to account for three such instances where traffic infractions were incorrectly entered as rape crimes, though there may be more.
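One way to surface candidates for this kind of error is to flag records whose code says "sex crime" but whose freeform description does not. This is a hypothetical review aid, not the method Carolina Public Press used: the record shape, the keyword list and the idea of suggesting the matching 44-prefixed code are all assumptions, and every flagged record would still need a human check. On a numeric keypad the 1 key sits directly below the 4 key, which is one plausible way such slips could happen.

```python
# Assumed record shape: {"offense_code": "1103", "description": "..."}.
SEX_CRIME_TERMS = ("rape", "sex")


def flag_possible_miskeys(records):
    """Return (record, suggested_code) pairs for records coded as sex
    crimes (codes starting with '11') whose description mentions none of
    the expected terms. The suggestion is only a lead for a reviewer."""
    flagged = []
    for rec in records:
        code = rec["offense_code"]
        desc = rec["description"].lower()
        if code.startswith("11") and not any(t in desc for t in SEX_CRIME_TERMS):
            # Assumption: a slipped keystroke turned a 44xx code into 11xx.
            flagged.append((rec, "44" + code[2:]))
    return flagged


records = [
    {"offense_code": "1103", "description": "FIRST DEGREE RAPE"},
    {"offense_code": "1120", "description": "SPEEDING 70 IN 55"},
]
for rec, suggestion in flag_possible_miskeys(records):
    print(rec["offense_code"], "->", suggestion, rec["description"])
# 1120 -> 4420 SPEEDING 70 IN 55
```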
In every instance where we discovered or were made aware of coding errors, we corrected the information in the final analysis. We do not know whether the state similarly misclassified any rape crimes as traffic infractions, though it is possible.
In addition to correcting a small number of errors that we identified, we conducted numerous spot checks, especially when something raised a red flag. These spot checks found no additional errors beyond those already corrected.
Contact Kate Martin, lead investigative reporter at Carolina Public Press, at firstname.lastname@example.org. Contact Frank Taylor, managing editor at Carolina Public Press, at email@example.com.
Illustration by Mariano Santillan of Carolina Public Press.
About the collaboration:
Eleven news organizations across North Carolina participated in the reporting of this series, which was coordinated by Carolina Public Press. The other participating organizations were The Fayetteville Observer, The (Durham) Herald-Sun, the Hickory Daily Record, The (Raleigh) News & Observer, the (Greensboro) News & Record, North Carolina Health News, the Winston-Salem Journal, WLOS News 13, WRAL-TV and WUNC North Carolina Public Radio.