edc-export

You can export clinicedc models using ModelsToFile. You need to load the django environment in order to use this class.

If you are running a notebook you can load the DJANGO environment likes this:

# %%capture
import os
import pandas as pd
from dj_notebook import activate
from pathlib import Path

env_file = os.environ["META_ENV"] # path to .env file

plus = activate(dotenv_file=env_file)
pd.set_option('future.no_silent_downcasting', True)
Note:

If you cannot load the Django environment try clinicedc_utils function export_raw_tables instead of ModelsToFile. Function export_raw_tables passes simple SELECT ALL statements for each model in a project to pandas read_sql to generate CSV and STATA files. However, export_raw_tables does not expand foreign keys and M2M columns. See the example below.

Exporting models from a clinicedc project

ModelsToFile creates dataframes for each model and exports to either CSV or STATA (118) using pandas to_csv or to_stata. Each dataframe is created using class ModelToDataframe. ModelToDataframe attempts to convert datatypes (datetime, int, float, string) and:

  • adds subject_identifier, gender, dob to every longitudinal model

  • expands M2M columns to a comma separated list of responses

  • expands FK relations for list models to the “name” field of the FK

To export all relevant models from a typical clinicedc project:

from django.apps import apps as django_apps
from edc_export.models_to_file import ModelsToFile
from django.contrib.auth.models import User
from edc_export.constants import CSV, STATA_14
from edc_sites.site import sites

#%%

# user's permissions are checked per model and site when exporting
user = User.objects.get(username="erikvw")

# grab a list of all sites, not just the current
site_ids = list(sites.all().keys())

#%%
study_prefix = "meta"
study_apps = [
    f"{study_prefix}_consent",
    f"{study_prefix}_lists",
    f"{study_prefix}_subject",
    f"{study_prefix}_ae",
    f"{study_prefix}_prn",
    f"{study_prefix}_screening",
]
all_model_names = [
    "edc_appointment.appointment",
    "edc_data_manager.datadictionary",
    "edc_metadata.crfmetadata",
    "edc_metadata.requisitionmetadata",
    "edc_registration.registeredsubject",
    "edc_visit_schedule.subjectschedulehistory",
]

excluded_model_names = []

#%%
# prepare a list of model names in label lower format
for app_config in django_apps.get_app_configs():
    if app_config.name.startswith(study_prefix):
        if app_config.name in study_apps:
            model_names = [
                model_cls._meta.label_lower
                for model_cls in app_config.get_models()
                if "historical" not in model_cls._meta.label_lower
                and not model_cls._meta.proxy
                and model_cls._meta.label_lower not in excluded_model_names
            ]
            if model_names:
                all_model_names.extend(model_names)


#%%
# export to dataframe then CSV; then add all CSV files to a ZIP archive
models_to_file = ModelsToFile(
    user=user,
    models=all_model_names,
    site_ids=site_ids,
    decrypt=False,
    archive_to_single_file=True,
    export_format=CSV
)

# archive (zip) will be in a temp folder. Copy the path and move the file to a more convenient location.
print(models_to_file.archive_filename)

Export management command

You can also use the management command export_data to export to either CSV or STATA.

To export the same tables as the example above use the --trial-prefix parameter. The --trial-prefix parameter assumes you have structured your project as a standard clinicedc project. In the case of the META Trial, the trial prefix is “meta”. That is, all trial apps start with meta_ – meta_ae, meta_prn, meta_subject, etc. A few models for edc apps are included as well.

uv run manage,py export_data -f stata -p ~/my/export/folder --trial-prefix meta

If you want better control of what is being exported, use the -a or -m parameters instead of --trial-prefix.

When you use -a, all models in the listed apps are exported

uv run manage,py export_data -f stata -p ~/my/export/folder -a meta_subject,meta_prn

Or you can list the models you want with the -m parameter. Models are listed in label_lower format.

uv run manage,py export_data -f stata -p ~/my/export/folder \
  -m edc_appointment.appointment,edc_metadata.crfmetadata,meta_subject.subjectvisit

or combine them

uv run manage,py export_data -f stata -p ~/my/export/folder \
  -a meta_subject \
  -m edc_appointment.appointment,edc_metadata.crfmetadata,meta_subject.subjectvisit

See the management command for other options

uv run manage,py help export_data

Exporting raw tables

See also remote_read_sql

The examples above require access to the Django application. If that is not an option, you can export raw tables with the utility export_raw_tables.

Foreign keys and M2M columns are not expanded in raw tables. You have to do the merging manually.

from pathlib import Path
from clinicedc_utils import export_raw_tables

# change as needed (see remote_read_sql)

# include ssh_config_path to export from a remote database
# using an ssh tunnel
ssh_config_path = Path("~/.remote_read_sql")

# mysql conf. If accessing a remote DB it is recommended
# to use a read_only/SELECT only mysql account
my_cnf_path = Path("~/.my.cnf")
my_cnf_connection_name = "my.database.server"

db_name = "meta3_production"

# all CSV files will be here
data_folder = Path("~/raw_tables/meta3/csv/").expanduser()

conn_opts = dict(
    ssh_config_path=ssh_config_path, # remove if not remote
    my_cnf_path=my_cnf_path,
    my_cnf_connection_name=my_cnf_connection_name,
    db_name=db_name,
)

problems = export_raw_tables(
    data_folder=data_folder,
    subject_visit_table="meta_subject_subjectvisit",
    subject_consent_table="meta_consent_subjectconsent",
    **conn_opts
)

print(problems)