
Model Formatting Overview¶

This notebook was created to enable common formatting for the Early Release Science modeling initiatives. We will review:

Variable terminology
File naming schemes
Data formatting
Physical unit archiving

This is mainly for the bookkeeping of the following model types:

1D-3D climate (spatial- and/or altitude- dependent climate)
1D-3D chemistry (spatial- and/or altitude- dependent atmospheric composition)
1D-3D cloud (spatial- and/or altitude- dependent single scattering, asymmetries, cloud optical depth)
Spectroscopy (flux, transit depth as a function of wavelength)

However, it can be applied to other modeling products (e.g. mixing profiles).

Variable Terminology¶

All file names and metadata should conform to the following variable names. Note that these will not apply to all models, and this is just an initial list. To request additional parameters, please send a message by email (natasha.e.batalha@nasa.gov) or on the ERS Slack channel.

Planet parameters (`planet_params`)¶

rp: planet radius
mp: planet mass
tint: object internal temperature
heat_redis: heat redistribution (only relevant for irradiated objects)
p_reference: reference pressure (the pressure level that corresponds to the planet radius)
pteff: planetary effective temperature
mh : metallicity
cto : carbon to oxygen ratio
logkzz : log of the kzz eddy diffusion

Stellar parameters (`stellar_params`)¶

logg : gravity
feh : stellar metallicity
steff : stellar effective temperature
rs : stellar radius
ms : stellar mass

Orbital parameters (`orbit_params`)¶

sma : semi-major axis

Cloud parameters (`cld_params`)¶

opd : extinction optical depth
ssa : single scattering albedo
asy : asymmetry parameter
fsed : cloud sedimentation efficiency parameter
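
The compliance check at the end of this notebook expects each of these parameter groups to be stored in `attrs` as a JSON string. The following is a minimal sketch of that pattern using only the standard library. All values, the author name, the contact address, and the code name are hypothetical placeholders, and how units are attached to these parameters is not covered here.

```python
import json

# Hypothetical values, for illustration only.
planet_params = {"rp": 1.0, "mp": 1.0, "tint": 100, "mh": 1, "cto": 0.55}
stellar_params = {"logg": 4.5, "feh": 0.0, "steff": 5400}

# Each parameter group is serialized to a JSON string before it goes into attrs.
attrs = {
    "author": "A. Modeler",             # placeholder
    "contact": "modeler@example.com",   # placeholder
    "code": "my_model_code",            # placeholder
    "planet_params": json.dumps(planet_params),
    "stellar_params": json.dumps(stellar_params),
}

# Reading a group back gives an ordinary dictionary again.
recovered = json.loads(attrs["planet_params"])
print(recovered["tint"])
```

Keeping the groups as JSON strings means the same `json.loads` call works whether the attributes come from memory or from a file on disk.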

Model Gridding ( `coords`)¶

pressure: pressure grid
wavelength: wavelength grid
wno: wavenumber grid
lat: latitude grid
lon: longitude grid

Model Output ( `data_vars`)¶

There are SO many different model outputs users will want to pass. For the purposes of ERS, we will focus on these, but feel free to send recommendations for more. Note that in your xarray file there will not be a separation between these categories. They will all be lumped into data_vars. However, their coordinate systems will be different! The beauty of xarray!
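
As a rough sketch of what "lumped into data_vars with different coordinate systems" looks like, the example below builds one small `xarray.Dataset` holding a temperature profile and a water abundance on a pressure grid, plus a transit depth on a wavelength grid. The grids, values, and metadata are hypothetical, and the empty string used to mark unitless quantities is an assumption rather than an agreed ERS convention.

```python
import json
import numpy as np
import xarray as xr

# Hypothetical grids, for illustration only.
pressure = np.logspace(-6, 2, 10)
wavelength = np.linspace(1.0, 5.0, 20)

ds_example = xr.Dataset(
    data_vars={
        # climate output on the pressure grid
        "temperature": (["pressure"], np.linspace(500.0, 1500.0, 10), {"units": "K"}),
        # chemistry output on the same pressure grid (unitless mixing ratio)
        "H2O": (["pressure"], np.full(10, 1e-3), {"units": ""}),
        # spectrum on the wavelength grid (unitless depth)
        "transit_depth": (["wavelength"], np.full(20, 0.01), {"units": ""}),
    },
    coords={
        "pressure": (["pressure"], pressure, {"units": "bar"}),
        "wavelength": (["wavelength"], wavelength, {"units": "micron"}),
    },
    attrs={
        "author": "A. Modeler",            # placeholder
        "contact": "modeler@example.com",  # placeholder
        "code": "my_model_code",           # placeholder
        "planet_params": json.dumps({"rp": 1.0, "mp": 1.0}),
    },
)

# Each variable keeps track of its own dimension.
print(ds_example["temperature"].dims, ds_example["transit_depth"].dims)
```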

Spectrum¶

transit_depth : transmission spectrum reported as unitless depth (rp/rs)^2. This way it can be directly compared to data.
fpfs_emission : relative emission spectrum (unitless)
fpfs_reflection : relative reflected light spectrum (unitless)
flux_emission : thermal emission in raw flux units
albedo : albedo spectrum

Chemistry¶

Use case-sensitive molecule names (e.g. Na, H2O, TiO) for each chemical abundance (either 1d or 3d). For example, the mixing ratio profile for TiO should be named "TiO", not "TIO", and the chemical profile for sodium should be named "Na", not "NA".
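
One way to catch capitalization mistakes early is to split each name into element symbols and compare them against a list of known symbols. The helper below is a hypothetical sketch, not part of any ERS tooling. Its element list is a small illustrative subset, and it does not handle ions or other special species.

```python
import re

# Illustrative subset of element symbols (assumption: extend this for real use).
ELEMENTS = {"H", "He", "C", "N", "O", "Na", "K", "S", "P", "Mg", "Si", "Ti", "V", "Fe"}

# One capital letter, an optional lowercase letter, then an optional count.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")

def is_valid_molecule_name(name):
    """Return True if name splits cleanly into known element symbols with optional counts."""
    pos = 0
    while pos < len(name):
        match = TOKEN.match(name, pos)
        if match is None or match.group(1) not in ELEMENTS:
            return False
        pos = match.end()
    return len(name) > 0

for candidate in ["TiO", "TIO", "Na", "NA", "H2O"]:
    print(candidate, is_valid_molecule_name(candidate))
```

With this subset, "TIO" is rejected because "T" is not an element symbol, and "NA" is rejected because "A" is not one either.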

Climate¶

temperature: computed temperature profile either 1d or 3d

Cloud¶

opd : extinction optical depth
ssa : single scattering albedo
asy : asymmetry parameter

Retrieval output¶

Coming soon.

Specifying units¶

We should be able to convert all units to astropy.units. For unitless parameters (e.g. single scattering albedo, optical depth) unitless designation should be provided. See example:

[1]:

import astropy.units as u

[2]:

#examples of valid units

u.Unit('cm') #Valid
#u.Unit('CM') #NOT valid
u.Unit("R_jup")#Valid
u.Unit("R_jupiter")#Valid
#u.Unit("R_Jupiter")#NOT Valid

[2]:

$\mathrm{R_{\rm J}}$

[3]:

unit = 'cm'
#doing it this way enables easy conversions. for example:
(1*u.Unit('R_jup')).to('cm')

[3]:

$7.1492 \times 10^{9} \; \mathrm{cm}$

Storing `xarray` data¶

Filenaming¶

We usually rely on a long filename to give us information about the model. If we properly use attrs then filenaming does not matter. However, friendly filenames are always appreciated by people using your models. We suggest the following naming convention.

Given independent variables (x,y,z): tag_x{x}_y{y}_z{z}.nc

For example: jupiter_mh1_teff1000_tint100.nc
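
The naming convention above can be generated from a dictionary of independent variables. The function name `build_filename` and its arguments are hypothetical and only meant as a minimal sketch.

```python
def build_filename(tag, params, ext="nc"):
    """Join a tag and {name: value} pairs as tag_name{value}_name{value}.ext"""
    parts = [tag] + [f"{name}{value}" for name, value in params.items()]
    return "_".join(parts) + "." + ext

# Dictionaries keep insertion order, so the variables appear in the order given.
filename = build_filename("jupiter", {"mh": 1, "teff": 1000, "tint": 100})
print(filename)
```

This reproduces the example file name `jupiter_mh1_teff1000_tint100.nc`.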

Using `netcdf`¶

“The recommended way to store xarray data structures is netCDF, which is a binary file format for self-described datasets that originated in the geosciences. Xarray is based on the netCDF data model, so netCDF files on disk directly correspond to Dataset objects (more accurately, a group in a netCDF file directly corresponds to a Dataset object. See Groups for more.)” - Quoted from xarray website

[13]:

ds.to_netcdf("/data/picaso_dbs/fakeplanet_1000teq.nc")

Using `pickle`¶

[14]:

import pickle as pk
#the with block closes the file once the dump has finished
with open("/data/picaso_dbs/fakeplanet_1000teq.pk",'wb') as f:
    pk.dump(ds, f)

Checking your data is in compliance¶

TLDR: this function will check that your data can be properly interpreted

[22]:

import json

def data_check(usr_xa):
    """This function will check that all the requirements have been met"""

    #step 1: check that required attributes are present
    assert 'author' in usr_xa.attrs ,'No author information in attrs'
    assert 'contact' in usr_xa.attrs ,'No contact information in attrs'
    assert 'code' in usr_xa.attrs , 'Code used was not specified in attrs'

    #step 2: check that all coordinates have units
    for i in usr_xa.coords.keys():
        if 'units' not in usr_xa[i].attrs:
            print(f'Missing unit for {i} coords')

    #step 2 (continued): check that all data_vars have units
    for i in usr_xa.data_vars.keys():
        if 'units' not in usr_xa[i].attrs:
            print(f'Missing unit for {i} data_var')

    #step 3: check that the parameter attrs are JSON-encoded dictionaries
    try :
        for i in usr_xa.attrs:
            #these need to be dictionaries to be interpretable
            if i in ['planet_params','stellar_params','cld_params','orbit_params']:
                json.loads(usr_xa.attrs[i])
    except (ValueError, TypeError):
        #ValueError: the string is not valid JSON; TypeError: the attr is not a string at all
        print(f"Was not able to read attr for {i}. This means that you did not properly define it as a dictionary encoded with json."
              " For example: json.dumps({'mp':1,'rp':1})")

    #step 4: hurray if you have made it to here this is great
    #last thing is the least important -- to make sure that we agree on terminology
    for i in usr_xa.attrs:
        if i == 'planet_params':
            for model_key in json.loads(usr_xa.attrs[i]).keys():
                assert model_key in ['rp', 'mp', 'tint', 'heat_redis', 'p_reference','rainout','p_quench',
                'pteff', 'mh' , 'cto' , 'logkzz'], f'Could not find {model_key} in listed planet_params attr. This might be because we havent added it yet! Check your terms and contact us if this is the case'

        elif  i == 'stellar_params':
            for model_key in json.loads(usr_xa.attrs[i]).keys():
                assert model_key in ['logg', 'feh', 'steff', 'rs', 'ms',
                ], f'Could not find {model_key} in listed stellar_params attr. This might be because we havent added it yet! Check your terms and contact us if this is the case'

        elif  i == 'orbit_params':
            for model_key in json.loads(usr_xa.attrs[i]).keys():
                assert model_key in ['sma',
                ], f'Could not find {model_key} in listed orbit_params attr. This might be because we havent added it yet! Check your terms and contact us if this is the case'

        elif  i == 'cld_params':
            for model_key in json.loads(usr_xa.attrs[i]).keys():
                assert model_key  in ['opd','ssa','asy','fsed','p_cloud','haze_effec',
                ], f'Could not find {model_key} in listed cld_params attr. This might be because we havent added it yet! Check your terms and contact us if this is the case'

    print('SUCCESS!')

[23]:

import xarray as xr

ds_sm = xr.open_dataset("profile_eq_planet_300_grav_4.5_mh_+2.0_CO_2.0_sm_0.0486_v_0.5_.nc")
data_check(ds_sm)

SUCCESS!
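
Note that `data_check` only prints a message when a unit is missing and still finishes with SUCCESS!. If you would rather get the offending names back so that a script can act on them, a small helper along these lines may be useful. The name `missing_units` and the demo dataset are hypothetical.

```python
import numpy as np
import xarray as xr

def missing_units(usr_xa):
    """Return the names of all coords and data_vars that lack a 'units' attribute."""
    names = list(usr_xa.coords) + list(usr_xa.data_vars)
    return [name for name in names if "units" not in usr_xa[name].attrs]

# Hypothetical dataset: temperature has units, pressure does not.
ds_demo = xr.Dataset(
    data_vars={"temperature": (["pressure"], np.array([500.0, 1000.0]), {"units": "K"})},
    coords={"pressure": (["pressure"], np.array([1e-3, 1.0]))},
)

print(missing_units(ds_demo))
```

An empty list means every coordinate and data variable carries a `units` attribute.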
