Version 1.0 November 2019

AmsterdamUMC · Nov 25, 2019 · 50eb2a4
commit 50eb2a4

Showing 23 changed files with 16,516 additions and 0 deletions.
diff --git a/.gitattributes b/.gitattributes
@@ -0,0 +1,2 @@
+# Auto detect text files and perform LF normalization
+* text=auto
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,11 @@
+**/config.ini
+**/.ipynb_checkpoints
+**/__pycache__
+**/*.csv
+**/*.zip
+**/*.7z
+**/*.bak
+**/*.ps1
+**/*.parquet
+**/dask-worker-space
+
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2019 Patrick Thoral
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/README.md b/README.md
@@ -0,0 +1,30 @@
+<img src="img/logo_amds.png" alt="Logo" height="128px"/>
+
+# Welcome
+AmsterdamUMCdb is the first freely accessible European intensive care database. It is endorsed by the European Society of Intensive Care Medicine (ESICM) and its Data Science Section. It contains de-identified health data related to tens of thousands of intensive care unit admissions, including demographics, vital signs, laboratory tests and medications.
+
+# Version
+The current version of AmsterdamUMCdb is 1.0, released in November 2019. This version contains data related to 23,371 intensive care unit and high dependency unit admissions of adult patients from 2003-2016.
+
+# Requesting Access
+The database, although de-identified, still contains detailed information regarding the clinical care of patients, so it must be treated with appropriate care and respect and cannot be shared without permission. To request access, go to the [Amsterdam Medical Data Science](https://amsterdammedicaldatascience.nl/) website.
+
+# Facts and Figures
+The current database contains data from the clinical patient data management system of the department of Intensive Care, a mixed medical-surgical ICU, from Amsterdam University Medical Center. The clinical data contains 23,371 admissions from 20,169 patients admitted from 2003 to 2016, with a total of almost 1.0 billion clinical observations (vitals, clinical scoring systems, device data and lab results) and 5.0 million medication records.
+
+<img src="img/plot_admissions_year.png" alt="Admissions per year category" height="512px"/>
+<img src="img/plot_admissions_age.png" alt="Admission per age category" height="512px"/>
+
+
+# Available tables
+The table and field definitions are available from the [AmsterdamUMCdb wiki](https://github.com/AmsterdamUMC/AmsterdamUMCdb/wiki) and from Jupyter Notebooks in the [tables](tables/) folder.
+
+|Table name|Description|
+|:---|:---|
+|[admissions](https://github.com/AmsterdamUMC/AmsterdamUMCdb/wiki/admissions)|admissions and demographic data of the patients admitted to the ICU or MCU|
+|[drugitems](https://github.com/AmsterdamUMC/AmsterdamUMCdb/wiki/drugitems)|medication orders including fluids, (parenteral) feeding and blood transfusions during the stay on the ICU|
+|[freetextitems](https://github.com/AmsterdamUMC/AmsterdamUMCdb/wiki/freetextitems)|observations, including laboratory results, that are based on non-numeric (text) data|
+|[listitems](https://github.com/AmsterdamUMC/AmsterdamUMCdb/wiki/listitems)|categorical observations, e.g. based on a selection from a list, like type of heart rhythm, ventilatory mode, etc.|
+|[numericitems](https://github.com/AmsterdamUMC/AmsterdamUMCdb/wiki/numericitems)|numerical measurements and observations, including vital parameters, data from medical devices, lab results, outputs from drains and Foley catheters, scores etc.|
+|[procedureorderitems](https://github.com/AmsterdamUMC/AmsterdamUMCdb/wiki/procedureorderitems)|procedures and tasks, such as performing a chest X-ray, drawing blood and daily ICU nursing care and scoring|
+|[processitems](https://github.com/AmsterdamUMC/AmsterdamUMCdb/wiki/processitems)|catheters, drains, tubes, and continuous non-medication processes (e.g. renal replacement therapy, hypothermia induction, etc.)|
diff --git a/config.SAMPLE.ini b/config.SAMPLE.ini
@@ -0,0 +1,85 @@
+################################################################################
+# SAMPLE config.ini file for AmsterdamUMCdb
+# This configuration file contains settings for the amsterdamumcdb notebooks for
+# connecting to databases. Save the file as config.ini in the root of the 
+# repository
+################################################################################
+
+################################################################################
+# This section stores the settings for the csv containing the actual database
+################################################################################
+[files]
+datapath = ./data
+admissions = admissions.csv
+drugitems = drugitems.csv
+freetextitems = freetextitems.csv
+listitems = listitems.csv
+numericitems = numericitems.csv
+procedureorderitems = procedureorderitems.csv
+processitems = processitems.csv
+
+################################################################################
+# This section stores the settings for connecting to a postgreSQL server using
+# the psycopg2 module.
+################################################################################
+[psycopg2]
+database = postgres
+username = postgres
+password = postgres
+host = 127.0.0.1
+port = 5432
+
+################################################################################
+# This section stores the settings for connecting to (SQL) database servers
+# using the pyodbc package. The Amsterdam UMC AmsterdamUMCdb project uses
+# Microsoft SQL Server, which is the default (uncommented) connection string.
+# Uncomment one of the other connection strings depending on the database
+# server in use. See
+# [Connecting to databases](https://github.com/mkleehammer/pyodbc/wiki)
+# on the pyodbc GitHub wiki for more information on setting the connection
+# strings, including database and OS specific issues.
+#
+# Note: username/password are not required for Microsoft SQL Server when using 
+# Windows Authentication with Trusted_Connection
+################################################################################
+[pyodbc]
+hostname = myservername.mydomain.com
+database = mydatabase
+username = myusername
+password = mypassword 
+
+#Microsoft SQL Server Connection String using Windows Authentication
+connectionstring = (
+    'DRIVER={ODBC Driver 13 for SQL Server};' #ODBC driver to use
+    'SERVER='+hostname+';'
+    'DATABASE='+database+';'
+    'Trusted_Connection=yes'
+    )
+
+
+#Microsoft SQL Server Connection String using username/password
+# connectionstring = (
+#     'DRIVER={ODBC Driver 13 for SQL Server};' #ODBC driver to use
+#     'SERVER='+hostname+';'
+#     'DATABASE='+database+';'
+#     'UID='+username+';'
+#     'PWD='+password+';'
+#     )
+
+#MySQL
+# connectionstring = (
+#     'DRIVER={MySQL};'
+#     'SERVER='+hostname+';'
+#     'DATABASE='+database+';'
+#     'UID='+username+';'
+#     'PWD='+password+';'
+#     )
+
+#PostgreSQL
+# connectionstring = (
+#     'DRIVER={PostgreSQL Unicode(x64)};'
+#     'SERVER='+hostname+';'
+#     'DATABASE='+database+';'
+#     'UID='+username+';'
+#     'PWD='+password+';'
+#     )
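The `[files]` and `[psycopg2]` sections of this sample file can be read from Python with the standard-library `configparser` module. The following is a minimal sketch, not the notebooks' actual code: the helper names `load_settings`, `table_path` and `connection_kwargs` are hypothetical, and the inline `SAMPLE` string stands in for a saved `config.ini`. Interpolation is disabled on the assumption that a password may contain a `%` character.

```python
import configparser
import os

# Stand-in for the contents of config.ini (same keys as config.SAMPLE.ini).
SAMPLE = """
[files]
datapath = ./data
admissions = admissions.csv
numericitems = numericitems.csv

[psycopg2]
database = postgres
username = postgres
password = postgres
host = 127.0.0.1
port = 5432
"""


def load_settings(text):
    """Parse INI text; interpolation is off so '%' in passwords is kept as-is."""
    config = configparser.ConfigParser(interpolation=None)
    config.read_string(text)
    return config


def table_path(config, table):
    """Join the configured data folder with the csv file name of a table."""
    files = config["files"]
    return os.path.join(files["datapath"], files[table])


def connection_kwargs(config):
    """Map the [psycopg2] section to keyword arguments for psycopg2.connect."""
    pg = config["psycopg2"]
    return {
        "dbname": pg["database"],
        "user": pg["username"],
        "password": pg["password"],
        "host": pg["host"],
        "port": pg.getint("port"),
    }


config = load_settings(SAMPLE)
print(table_path(config, "admissions"))
print(connection_kwargs(config))
```

With psycopg2 installed, the resulting dictionary could be passed as `psycopg2.connect(**connection_kwargs(config))`. For a file on disk, `config.read("config.ini")` would replace `read_string`.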
diff --git a/data/README.md b/data/README.md
@@ -0,0 +1,8 @@
+<img src="../img/logo_amds.png" alt="Logo" height="128px"/>
+
+# AmsterdamUMCdb - Freely Accessible ICU Database
+version 1.0 November 2019  
+Copyright &copy; 2003-2019 Amsterdam UMC - Amsterdam Medical Data Science
+
+# Data folder
+This folder is a placeholder for the AmsterdamUMCdb csv files. Extract the files into this folder so the Jupyter Notebooks can find them without manually changing the paths. You are free to choose another location, but in that case make sure to modify the [`config.SAMPLE.ini`](../config.SAMPLE.ini) file in the root folder of this repository and save it as `config.ini`.
diff --git a/img/avatar_amds.png b/img/avatar_amds.png
diff --git a/img/avatar_amsterdam_umc.png b/img/avatar_amsterdam_umc.png
diff --git a/img/logo_amds.png b/img/logo_amds.png
diff --git a/img/logo_amsterdam_umc.png b/img/logo_amsterdam_umc.png
diff --git a/img/plot_admissions_age.png b/img/plot_admissions_age.png
diff --git a/img/plot_admissions_year.png b/img/plot_admissions_year.png
diff --git a/setup-amsterdamumcdb/README.md b/setup-amsterdamumcdb/README.md
@@ -0,0 +1,49 @@
+<img src="../img/logo_amds.png" alt="Logo" height="128px"/>
+
+# AmsterdamUMCdb - Freely Accessible ICU Database
+version 1.0 November 2019  
+Copyright &copy; 2003-2019 Amsterdam UMC - Amsterdam Medical Data Science
+
+# Setup AmsterdamUMCdb
+## Requirements
+- Access to the AmsterdamUMCdb csv files: request access from [Amsterdam Medical Data Science](https://www.amsterdammedicaldatascience.nl/).
+- Operating system: any OS capable of running Python and PostgreSQL, including Windows, macOS and Linux.
+- Internal memory: 8 GB should suffice for basic analysis and running the Jupyter notebooks. However, the recommended memory specification to run both PostgreSQL and the Jupyter Notebooks on the same machine is 16-32 GB.
+- Disk space: Downloading and extracting the database files will require 110 GB of hard disk space. In addition, creating the SQL database requires about 128 GB of hard disk space and an additional 144 GB for creating the indices to improve query performance.
+
+## 1. Install a Python distribution
+We **strongly recommend** installing Python using Anaconda, a popular distribution that includes many useful modules for data science out-of-the-box. Install the latest Python 3.7 version from [Anaconda's](https://www.anaconda.com/distribution) distribution page.
+
+## 2. Install PostgreSQL
+PostgreSQL is an open source database management system (DBMS), available for most operating systems, including Windows, macOS and Linux. We recommend installing the most recent version of PostgreSQL (version 12) from the PostgreSQL [download](https://www.postgresql.org/download/) page. Please note your password for the `postgres` superuser. If you did not choose `postgres` as the password, you need to modify these settings in the [`config.SAMPLE.ini`](https://github.com/AmsterdamUMC/AmsterdamUMCdb/tree/master/config.SAMPLE.ini) file in the root of the repository. Save the file as [`config.ini`](https://github.com/AmsterdamUMC/AmsterdamUMCdb/tree/master/config.ini).
+
+## 3. Install psycopg2 module
+To connect to your PostgreSQL server from Python, the [psycopg2](https://pypi.org/project/psycopg2/) package needs to be installed from the Anaconda Prompt/Shell using conda:
+
+> conda install -c anaconda psycopg2
+## 4. Clone the AmsterdamUMCdb GitHub repository
+Clone or download the [AmsterdamUMCdb](https://github.com/AmsterdamUMC/AmsterdamUMCdb) repository from GitHub. 
+Follow the instructions on GitHub's online step-by-step guide, if needed: https://help.github.com/en/github/creating-cloning-and-archiving-repositories/cloning-a-repository. 
+
+## 5. Download the database files
+Download the AmsterdamUMCdb zip file and extract its contents to the `data` folder of the cloned AmsterdamUMCdb repository.
+
+## 6. Create database tables
+Start the Jupyter Notebook server from the command line (using Command Prompt on Windows or Terminal on Mac/Linux) by running:
+
+> jupyter notebook
+
+From the Jupyter file browser, open the `setup-amsterdamumc.ipynb` file from the `setup-amsterdamumcdb` folder in the cloned repository. The code in the notebook assumes a default PostgreSQL installation with a database named `postgres` and a user `postgres` with password `postgres`. If your installation differs, change these settings in the [`config.SAMPLE.ini`](https://github.com/AmsterdamUMC/AmsterdamUMCdb/tree/master/config.SAMPLE.ini) file in the root of the repository and save the file as `config.ini`.
+To create the tables in the database, run this Jupyter notebook, either cell by cell (▶️ Run) to see what's happening, or use the ⏩ button to automatically perform all steps. An `amsterdamumc` [schema](https://www.postgresql.org/docs/12/ddl-schemas.html) will be created, and all tables will be added to this schema.
+
+## 7. Verify the database
+After the notebook has been run completely, the postgres database should contain all tables with the same number of records we released. The output should state `Verification: PASSED`.
+
+## 8. Create database table indices
+It's highly recommended to create some useful indices to improve performance for common queries on identifiers like admissionid, itemid and measured times. 
+
+## 9. Jupyter Notebooks
+While the indices are being created, the PostgreSQL database should be available for querying using the notebooks in the [`tables`](https://github.com/AmsterdamUMC/AmsterdamUMCdb/tree/master/tables) folder (with lower performance). We use plotly (version >4) for interactive plots in some notebooks. Plotly can be installed
+using conda:
+
+> conda install -c plotly plotly
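
For step 8 above, the index statements could be generated along these lines. This is only a sketch under assumptions: the setup notebook may create different indices, the helper `create_index_statements` and the `INDEX_COLUMNS` mapping are hypothetical, and the column name `measuredat` is an assumed name for the "measured times" mentioned in the text. Only the `amsterdamumc` schema name and the `admissionid` and `itemid` identifiers come from this README.

```python
# Hypothetical mapping of tables to the columns worth indexing.
# 'measuredat' is an assumed column name, not confirmed by the README.
INDEX_COLUMNS = {
    "admissions": ["admissionid"],
    "numericitems": ["admissionid", "itemid", "measuredat"],
}


def create_index_statements(schema, index_columns):
    """Build one CREATE INDEX statement per (table, column) pair."""
    statements = []
    for table, columns in index_columns.items():
        for column in columns:
            name = f"{table}_{column}_idx"
            statements.append(
                f"CREATE INDEX IF NOT EXISTS {name} "
                f"ON {schema}.{table} ({column});"
            )
    return statements


for sql in create_index_statements("amsterdamumc", INDEX_COLUMNS):
    print(sql)
```

Each statement could then be executed through a psycopg2 cursor. `IF NOT EXISTS` (available in PostgreSQL 9.5 and later) makes the step safe to re-run.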