edc-export¶
You can export clinicedc models using ModelsToFile. You need to load the django environment in order to use this class.
If you are running a notebook you can load the DJANGO environment likes this:
# %%capture
import os
import pandas as pd
from dj_notebook import activate
from pathlib import Path
env_file = os.environ["META_ENV"] # path to .env file
plus = activate(dotenv_file=env_file)
pd.set_option('future.no_silent_downcasting', True)
- Note:
If you cannot load the Django environment try
clinicedc_utilsfunctionexport_raw_tablesinstead ofModelsToFile. Functionexport_raw_tablespasses simple SELECT ALL statements for each model in a project to pandasread_sqlto generate CSV and STATA files. However,export_raw_tablesdoes not expand foreign keys and M2M columns. See the example below.
Exporting models from a clinicedc project¶
ModelsToFile creates dataframes for each model and exports to either CSV or STATA (118) using pandas to_csv or to_stata. Each dataframe is created using class ModelToDataframe. ModelToDataframe attempts to convert datatypes (datetime, int, float, string) and:
adds subject_identifier, gender, dob to every longitudinal model
expands M2M columns to a comma separated list of responses
expands FK relations for list models to the “name” field of the FK
To export all relevant models from a typical clinicedc project:
from django.apps import apps as django_apps
from edc_export.models_to_file import ModelsToFile
from django.contrib.auth.models import User
from edc_export.constants import CSV, STATA_14
from edc_sites.site import sites
#%%
# user's permissions are checked per model and site when exporting
user = User.objects.get(username="erikvw")
# grab a list of all sites, not just the current
site_ids = list(sites.all().keys())
#%%
study_prefix = "meta"
study_apps = [
f"{study_prefix}_consent",
f"{study_prefix}_lists",
f"{study_prefix}_subject",
f"{study_prefix}_ae",
f"{study_prefix}_prn",
f"{study_prefix}_screening",
]
all_model_names = [
"edc_appointment.appointment",
"edc_data_manager.datadictionary",
"edc_metadata.crfmetadata",
"edc_metadata.requisitionmetadata",
"edc_registration.registeredsubject",
"edc_visit_schedule.subjectschedulehistory",
]
excluded_model_names = []
#%%
# prepare a list of model names in label lower format
for app_config in django_apps.get_app_configs():
if app_config.name.startswith(study_prefix):
if app_config.name in study_apps:
model_names = [
model_cls._meta.label_lower
for model_cls in app_config.get_models()
if "historical" not in model_cls._meta.label_lower
and not model_cls._meta.proxy
and model_cls._meta.label_lower not in excluded_model_names
]
if model_names:
all_model_names.extend(model_names)
#%%
# export to dataframe then CSV; then add all CSV files to a ZIP archive
models_to_file = ModelsToFile(
user=user,
models=all_model_names,
site_ids=site_ids,
decrypt=False,
archive_to_single_file=True,
export_format=CSV
)
# archive (zip) will be in a temp folder. Copy the path and move the file to a more convenient location.
print(models_to_file.archive_filename)
Export management command¶
You can also use the management command export_data to export to either CSV or STATA.
To export the same tables as the example above use the --trial-prefix parameter. The --trial-prefix parameter assumes you have structured your project as a standard clinicedc project. In the case of the META Trial, the trial prefix is “meta”. That is, all trial apps start with meta_ – meta_ae, meta_prn, meta_subject, etc. A few models for edc apps are included as well.
uv run manage,py export_data -f stata -p ~/my/export/folder --trial-prefix meta
If you want better control of what is being exported, use the -a or -m parameters instead of --trial-prefix.
When you use -a, all models in the listed apps are exported
uv run manage,py export_data -f stata -p ~/my/export/folder -a meta_subject,meta_prn
Or you can list the models you want with the -m parameter. Models are listed in label_lower format.
uv run manage,py export_data -f stata -p ~/my/export/folder \
-m edc_appointment.appointment,edc_metadata.crfmetadata,meta_subject.subjectvisit
or combine them
uv run manage,py export_data -f stata -p ~/my/export/folder \
-a meta_subject \
-m edc_appointment.appointment,edc_metadata.crfmetadata,meta_subject.subjectvisit
See the management command for other options
uv run manage,py help export_data
Exporting raw tables¶
See also remote_read_sql
The examples above require access to the Django application. If that is not an option, you can export raw tables with the utility export_raw_tables.
Foreign keys and M2M columns are not expanded in raw tables. You have to do the merging manually.
from pathlib import Path
from clinicedc_utils import export_raw_tables
# change as needed (see remote_read_sql)
# include ssh_config_path to export from a remote database
# using an ssh tunnel
ssh_config_path = Path("~/.remote_read_sql")
# mysql conf. If accessing a remote DB it is recommended
# to use a read_only/SELECT only mysql account
my_cnf_path = Path("~/.my.cnf")
my_cnf_connection_name = "my.database.server"
db_name = "meta3_production"
# all CSV files will be here
data_folder = Path("~/raw_tables/meta3/csv/").expanduser()
conn_opts = dict(
ssh_config_path=ssh_config_path, # remove if not remote
my_cnf_path=my_cnf_path,
my_cnf_connection_name=my_cnf_connection_name,
db_name=db_name,
)
problems = export_raw_tables(
data_folder=data_folder,
subject_visit_table="meta_subject_subjectvisit",
subject_consent_table="meta_consent_subjectconsent",
**conn_opts
)
print(problems)