The 2023 Kidney and Kidney Tumor Segmentation Challenge

The 2023 Kidney and Kidney Tumor Segmentation challenge (abbreviated KiTS23) is a competition in which teams compete to develop the best system for automatic semantic segmentation of kidneys, renal tumors, and renal cysts. It is the third iteration of the KiTS challenge after having taken place in 2019 and 2021.

Kidney cancer is diagnosed in more than 430,000 individuals each year, accounting for roughly 180,000 deaths [1]. Kidney tumors are found in an even greater number each year, and in most circumstances, it's not currently possible to radiographically determine whether a given tumor is malignant or benign [2]. Even among tumors presumed to be malignant, many appear to be slow-growing and indolent in nature, which has given rise to "active surveillance" as an increasingly popular management strategy for small renal masses [3]. That said, the progression to metastatic disease is a very serious concern, and there is a significant unmet need for systems that can objectively and reliably characterize kidney tumor images for stratifying risk and for predicting treatment outcomes.

For nearly half a decade, KiTS has maintained and expanded upon a publicly-available, multi-institutional cohort of hundreds of segmented CT scans showing kidney tumors, along with comprehensive anonymized clinical information about each case [4]. This dataset has not only served as a high-quality benchmark for 3D semantic segmentation methods [5,6], but also as a common resource for translational research in kidney tumor radiomics [7,8].

For the third time, KiTS is inviting the greater research community to participate in a competition to develop the best automatic semantic segmentation system for kidney tumors. This year's competition features an expanded training set (489 cases), a fresh never-before-used test set (110 cases), and the addition of cases in the nephrogenic contrast phase, whereas previously all cases were in the late arterial phase. We are excited to see how modern approaches can perform in this more diverse and challenging -- and also more broadly applicable -- context.

An ITK-SNAP 3D rendering of a segmented case

Fig. 1: The KiTS23 logo.

An axial slice of a segmented case

Fig. 2: An example of a segmented axial slice. The kidney region is shown in purple, the tumor is shown in green, and the cyst is shown in blue.

Timeline (all dates 2023)

April 14 Training Dataset Release
July 14 Deadline for Short Paper
July 21 - 28 Submissions Accepted
July 31 Results Announced
October 8 Satellite Event at MICCAI 2023

How to Participate

+ Download the data

We are using a GitHub repository to manage version control for the dataset. The segmentations are stored on GitHub directly, but the imaging must be fetched from separate servers by following the instructions on the repository's

The image and segmentation data are available under a CC BY-NC-SA 4.0 license. This license specifies noncommercial use (i.e., don't rebrand and sell our data), but we do not consider using the data to compete in this challenge to be a commercial use. We therefore allow (and encourage) teams from the industry to participate.

Access via GitHub

+ Create your model

The task is to develop a model which can predict high-quality segmentations for kidneys, kidney tumors, and kidney cysts. The training data consists of several hundred examples on which models can be trained and validated. An explanation for how teams will be evaluated and ranked can be found on the homepage.

Teams are allowed to use other data in addition to the official training set in order to construct their models; however, that data must have been publicly available as of April 14, 2023. The same applies to pretrained weights -- they must have been publicly available before the initial KiTS23 dataset was released on April 14. This is to prevent unfair advantages for teams that may have amassed large private datasets. All external data use must be described in detail in each team's accompanying paper (described in the following section).

Wondering where to start? Several teams that competed in previous KiTS Challenges have made the code for their submissions publicly available. A (not comprehensive) list is given below.

You might also benefit from reading the papers of teams that submitted to KiTS19 or KiTS21.

+ Describe your approach with a short paper

The primary goal of challenges like KiTS is to objectively assess the performance of competing methods. This is only possible if teams provide a complete description of the methods they use. Teams should follow the provided template [Overleaf, Google Docs] and provide satisfactory answers to every field. Papers should otherwise follow the MICCAI main conference guidelines for paper formatting. Drafts of these papers must be submitted by July 14, 2023.

Submit Here

+ Submit your predictions

After you submit your paper (above), you will be asked to fill out a form to choose a timepoint during the one-week submission window when you'd like to receive the test imaging download link. After receiving it, you must then return your predictions by replying with an attachment within 24 hours.

Predictions should be submitted as a tar file containing a single .nii.gz file for each test case received. The files should be named exactly the same as the test cases they correspond to. For example, if you receive a test case named case_00700.nii.gz, your submission should contain a file named case_00700.nii.gz. Each file must also have the same shape and affine matrix as its source image; otherwise, scoring will fail. Predictions should be encoded in the same way as the label data -- 0 for background, 1 for kidney, 2 for tumor, and 3 for cyst. If floating point values are provided, they will be rounded and converted to integers. Any value other than 1, 2, or 3 will be treated as background.
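Rather than relying on the organizers' rounding, it may be safer to write integer labels yourself. The following is a minimal sketch of the encoding rule described above, applied to a flat list of voxel values. The function names are hypothetical, and round-half-up is an assumption, since the exact rounding rule is not stated here.

```python
import math

VALID_LABELS = {1, 2, 3}  # 1 = kidney, 2 = tumor, 3 = cyst; 0 = background


def normalize_label(value):
    """Map one predicted voxel value onto the KiTS23 label encoding."""
    value = float(value)
    if not math.isfinite(value):
        return 0  # assumption: NaN and infinities are treated as background
    # Assumption: round half up; the organizers' rounding rule is not stated.
    rounded = int(math.floor(value + 0.5))
    # Anything other than 1, 2, or 3 becomes background.
    return rounded if rounded in VALID_LABELS else 0


def normalize_labels(values):
    """Apply normalize_label to every value in a flat sequence of voxels."""
    return [normalize_label(v) for v in values]
```

In practice the same rule would be applied to the full 3D prediction array before it is saved as a .nii.gz file.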

You can check your folder before tarring it up using the kits23/ script.

For example:

$ python3 kits23/ \
      --submission-folder /path/to/submission/folder \
      --images-folder /path/to/test/images/folder

Where /path/to/test/images/folder points to the untarred archive of test images you received. This script will print a list of cases that failed the check along with a failure reason. If all files look good, it will print Ready for submission :).
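Once the folder passes the check, it can be packed into a tar file. The sketch below uses Python's standard tarfile module; the function name is hypothetical, and the flat archive layout (files at the top level, no enclosing directory) is an assumption, since the expected layout is not specified here.

```python
import tarfile
from pathlib import Path


def build_submission_tar(submission_folder, tar_path):
    """Pack every case_*.nii.gz file in submission_folder into one tar file."""
    folder = Path(submission_folder)
    files = sorted(folder.glob("case_*.nii.gz"))
    if not files:
        raise FileNotFoundError(f"no case_*.nii.gz files found in {folder}")
    with tarfile.open(tar_path, "w") as archive:
        for path in files:
            # arcname drops the directory part so that the names inside the
            # archive match the test case file names exactly.
            archive.add(path, arcname=path.name)
    return [path.name for path in files]
```

For example, build_submission_tar("/path/to/submission/folder", "submission.tar") returns the list of file names that were archived, which can be compared against the test cases received.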

Submit Form

+ Submit your algorithm container (top-5 only)

The MICCAI community has recently seen a strong push towards challenges that require algorithm submission rather than prediction submission in order to ensure that participants are not manually intervening in their algorithm's predictions. We at KiTS take scientific integrity very seriously and fully support this effort to ensure the reproducibility of challenge results.

That said, we worry that requiring all teams to submit a container will discourage submissions that use highly novel or computationally expensive approaches, which might be difficult to containerize in such a way that they will run on our servers in a limited amount of time. Therefore, we have decided to require only the top-5 teams who will be coauthoring the challenge report to provide a containerized version of their algorithm for us to independently verify its performance. In doing so, these teams retain full ownership of the intellectual property associated with their approach, and unless other explicit permission is given, we will destroy their containers promptly after verification.

This process will be coordinated with teams individually after the results are announced.

+ Attend the satellite event at MICCAI 2023! (optional)

The KiTS23 challenge is run in conjunction with the 2023 MICCAI Conference in Vancouver, Canada. All participants who made a valid submission are welcome to present their work at this event.

Conference Website

MICCAI 2023 Leaderboard

The Data

The KiTS23 cohort includes patients who underwent cryoablation, partial nephrectomy, or radical nephrectomy for suspected renal malignancy between 2010 and 2022 at an M Health Fairview medical center. A retrospective review of these cases was conducted to identify all patients who had undergone a contrast-enhanced preoperative CT scan that includes the entirety of all kidneys.

Each case's most recent contrast-enhanced preoperative scan (in either corticomedullary or nephrogenic phase) was segmented for each instance of the following semantic classes.

  1. Kidney: Includes all parenchyma and the non-adipose tissue within the hilum.
  2. Tumor: Masses found on the kidney that were pre-operatively suspected of being malignant.
  3. Cyst: Kidney masses radiologically (or pathologically, if available) determined to be cysts.

The dataset is composed of 599 cases, with 489 allocated to the training set and 110 to the test set. Many of the cases in the training set were used in previous challenges:

  • KiTS19 (segmentation):
    • case_00000 - case_00209: Training set
    • case_00210 - case_00299: Test set
  • KiTS21 (segmentation):
    • case_00000 - case_00299: Training set
  • KNIGHT (classification):
    • case_00000 - case_00299: Training set
    • case_00400 - case_00499: Test set

The 110 cases in the KiTS23 test set, on the other hand, have not yet been used for any challenge.
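When mixing KiTS23 with results from earlier challenges, it can help to check how a given case was used before. The lookup below is a small sketch that encodes only the case ranges listed above; the function and table names are hypothetical.

```python
import re

# (challenge, split, first case number, last case number), ranges inclusive
PRIOR_USE = [
    ("KiTS19", "training", 0, 209),
    ("KiTS19", "test", 210, 299),
    ("KiTS21", "training", 0, 299),
    ("KNIGHT", "training", 0, 299),
    ("KNIGHT", "test", 400, 499),
]


def prior_use(case_id):
    """Return the (challenge, split) pairs in which a case was used before."""
    match = re.fullmatch(r"case_(\d{5})", case_id)
    if match is None:
        raise ValueError(f"unexpected case identifier: {case_id!r}")
    number = int(match.group(1))
    return [
        (challenge, split)
        for challenge, split, first, last in PRIOR_USE
        if first <= number <= last
    ]
```

An empty list means the case does not fall in any of the ranges listed above.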

A visualization of aggregated tumor shapes

Fig. 3: A visualization showing the largest axial slice of each tumor in the dataset overlaid onto one image.

A visualization of aggregated tumor shapes

Fig. 4: A visualization showing the boundary between the tumor and the kidney on the tumor's largest axial slice overlaid onto one image.

Data Annotation Process

Just as in KiTS21, we have decided to perform our image annotations on a web-based platform in full view of the public. We hope that this will encourage participants to scrutinize this process and dataset in greater detail, and perhaps reproduce something similar for their own work.

The process for KiTS23 is very similar to what was described for KiTS21. The primary difference is that we have decided to skip the "guidance" step and instead have the team of trainees directly perform the "delineation" step, so that each region of interest is segmented only once. We made this decision based on the KiTS21 finding that the three delineations per ROI offered only marginal performance boosts, if any at all, to methods that made use of them, and therefore do not seem to represent a wise allocation of scarce annotation resources.

If you choose to browse the annotation (and we hope you do!), you will see a card for each case that indicates its status in the annotation process. An example of this is shown in Fig. 5. The meaning of each symbol is as follows:

  • circle Awaiting upstream annotation. For example, delineations cannot be started until guidance is accepted.
  • circle Annotation in progress.
  • check Submitted, but waiting for downstream annotation to finish before queueing for review. For example, localization and guidance must be reviewed at the same time, so localizations must wait for guidance to be submitted before going out for review.
  • rules In need of review.
  • error Rejected by review, in need of revisions.
  • check_circle Accepted by review.

When you click on an icon, you will be taken to an instance of the ULabel Annotation Tool where you can see the raw annotations made by our annotation team. Only logged-in members of the annotation team can submit their changes to the server, but you may make edits and save them locally if you like. The annotation team's progress is synced with the KiTS23 GitHub repository about once per week.

It's important to note the distinction between what we call "annotations" and what we call "segmentations". We use "annotations" to refer to the raw vectorized interactions that the user generates during an annotation session. A "segmentation," on the other hand, refers to the rasterized output of a postprocessing script that uses "annotations" to define regions of interest.

We placed members of our annotation team into two categories:

  • Experts: Attending radiologists, urologic oncologists, and urologic oncology fellows
  • Trainees: Residents, medical students, and undergraduates planning to study medicine. All trainees received several hours of training from the experts

Broadly, our annotation process is as follows.

  1. Trainees place 3D bounding-boxes around each region of interest
  2. These same Trainees then place 2D contours around each region of interest. In case of any doubt, experts are consulted.
  3. Contour annotations are postprocessed to generate segmentations (shown in Fig. 2)
  4. Segmentations are reviewed by trainees and subjected to minor revisions as needed
  5. A baseline nnU-Net model is trained for this segmentation task, and cases with low agreement between the labels and cross-validation predictions are further reviewed by experts as needed

Our postprocessing script uses thresholds and fairly simple heuristic-based geometric algorithms. Its source code is available on the KiTS23 GitHub repository under /kits23/annotation/
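As an illustration of the kind of threshold step involved, the sketch below keeps only the voxels that lie inside a drawn contour and are denser than fat. This is not the KiTS23 script itself: the function name is hypothetical, and the cutoff of -30 HU is an assumed value chosen only because fat sits well below 0 HU.

```python
# Hypothetical cutoff in Hounsfield Units. The value that the actual KiTS23
# postprocessing script uses is not given in this text.
FAT_THRESHOLD_HU = -30


def trim_fat(inside_contour, hounsfield_units, threshold=FAT_THRESHOLD_HU):
    """Keep voxels that are inside the contour and denser than the threshold."""
    if len(inside_contour) != len(hounsfield_units):
        raise ValueError("mask and image must have the same number of voxels")
    return [
        bool(inside) and hu > threshold
        for inside, hu in zip(inside_contour, hounsfield_units)
    ]
```

A step like this would let a contour be drawn loosely around a region of interest, since low-density voxels caught inside it are dropped afterwards.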

An example of a card on the browse page

Fig. 5: An example of a case in progress as it's listed on the browse page.

An animation showing a kidney delineation

Fig. 6: An animation showing a kidney delineation of one axial slice. You'll notice that the contour does not keep perfectly snug to the ROI. This is because the postprocessing can easily remove the included perinephric fat using a Hounsfield Unit threshold.

How to Win

As in KiTS21 and KiTS19 before it, we will be evaluating "Hierarchical Evaluation Classes" (HECs) rather than each ROI alone. The HECs are as follows:

  • Kidney and Masses: Kidney + Tumor + Cyst
  • Kidney Mass: Tumor + Cyst
  • Tumor: Tumor only

For a more detailed discussion of why we use HECs, please see the explanation from KiTS21.
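The HEC definitions above can be sketched as a mapping from label values to binary masks. This is an illustrative sketch over a flat list of voxel labels, not the official implementation; the dictionary keys and function name are hypothetical.

```python
KIDNEY, TUMOR, CYST = 1, 2, 3  # label encoding used by the dataset

# Each HEC is the union of one or more label values.
HECS = {
    "kidney_and_masses": {KIDNEY, TUMOR, CYST},
    "kidney_mass": {TUMOR, CYST},
    "tumor": {TUMOR},
}


def hec_masks(labels):
    """Turn a flat sequence of label values into one boolean mask per HEC."""
    return {
        name: [label in members for label in labels]
        for name, members in HECS.items()
    }
```

Because the classes are nested, a voxel labeled as tumor counts toward all three HECs, while a voxel labeled as kidney counts only toward the first.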

We will also be using the same two metrics that we used in KiTS21:

  1. Sørensen-Dice
  2. Surface Dice, as described in [9]

These will be computed for every HEC of every case of the test set and the metrics will be averaged over each HEC. The teams will then be ranked based on both average metrics. The final leaderboard place will be determined by averaging the leaderboard places between the two rankings, sometimes called "rank-then-aggregate". In the case of any ties, average Sørensen-Dice value on the "Tumor" HEC will be used as a tiebreaker. An implementation of these metrics and ranking procedure can be found at /kits23/evaluation/.
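The sketch below illustrates Sørensen-Dice on boolean masks and the rank-then-aggregate step with the Tumor tiebreaker. It is not the official implementation; the function names are hypothetical, and two details are assumptions: that two empty masks score 1.0, and that teams tied on a single metric share the better place.

```python
def dice(pred, truth):
    """Sørensen-Dice between two equal-length boolean masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumption: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


def places(scores):
    """Place teams by descending score; tied scores share the better place."""
    ordered = sorted(scores.values(), reverse=True)
    return {team: ordered.index(score) + 1 for team, score in scores.items()}


def final_ranking(mean_dice, mean_surface_dice, tumor_dice):
    """Order teams by their mean place across the two metric rankings.

    Each argument maps a team name to a value already averaged over the test
    set. Ties in mean place are broken by the higher Tumor HEC Dice.
    """
    dice_place = places(mean_dice)
    surface_place = places(mean_surface_dice)
    mean_place = {
        team: (dice_place[team] + surface_place[team]) / 2
        for team in mean_dice
    }
    return sorted(
        mean_place,
        key=lambda team: (mean_place[team], -tumor_dice[team]),
    )
```

For instance, if team "a" places first on Dice and second on Surface Dice while team "b" places second and first, both have a mean place of 1.5, and the team with the higher Tumor Dice is listed first.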

The winning team will be awarded $5,000 USD after their containerized solution has been verified.


This project is a collaboration between the University of Minnesota Robotics Institute (MnRI), the Helmholtz Imaging at the German Cancer Research Center (DKFZ), and the Cleveland Clinic's Urologic Cancer Program.

Annotation Team

Logo for the University of Minnesota

Logo for Helmholtz Imaging at the German Cancer Research Center

Logo for the Cleveland Clinics


  1. Sung, Hyuna, et al. "Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries." CA: A Cancer Journal for Clinicians 71.3 (2021): 209-249.
  2. de Leon, Alberto Diaz, and Ivan Pedrosa. "Imaging and screening of kidney cancer." Radiologic Clinics 55.6 (2017): 1235-1250.
  3. Mir, Maria Carmen, et al. "Role of active surveillance for localized small renal masses." European Urology Oncology 1.3 (2018): 177-187.
  4. Heller, Nicholas, et al. "The KiTS19 challenge data: 300 kidney tumor cases with clinical context, CT semantic segmentations, and surgical outcomes." arXiv preprint arXiv:1904.00445 (2019).
  5. Heller, Nicholas, et al. "The state of the art in kidney and kidney tumor segmentation in contrast-enhanced CT imaging: Results of the KiTS19 challenge." Medical Image Analysis 67 (2021): 101821.
  6. Isensee, Fabian, et al. "nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation." Nature Methods 18.2 (2021): 203-211.
  7. Haarburger, Christoph, et al. "Radiomics feature reproducibility under inter-rater variability in segmentations of CT images." Scientific Reports 10.1 (2020): 1-10.
  8. Heller, Nicholas, et al. "Computer-generated RENAL nephrometry scores yield comparable predictive results to those of human-expert scores in predicting oncologic and perioperative outcomes." The Journal of Urology 207.5 (2022): 1105-1115.
  9. Nikolov, Stanislav, et al. "Deep learning to achieve clinically applicable segmentation of head and neck anatomy for radiotherapy." arXiv preprint arXiv:1809.04430 (2018).