{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Creating Custom Opacity Database\n",
    "\n",
    "In this notebook we will take users through how to compute an opacity database file using the full high resolution 1460 opacity grid. \n",
    "\n",
    "Roxana Lupu has posted a version of this 1460 grid online on Zenodo: https://zenodo.org/record/6600976#.YtcFguzMI6E\n",
    "\n",
    "Roxana's file at 1 $\\mu$m is R\\~100k. This makes it suitable for data up to R\\~10K at 1 $\\mu$m.\n",
    "\n",
    "Coming soon, we will also be posting original LBL calculations via the MAESTRO collaboration on a MAESTRO Zenodo community and NASA hosted website. \n",
    "\n",
    "In the meantime, please contact the developers for access to certain raw data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import picaso.opacity_factory as opa_fac#\n",
    "import time\n",
    "import os"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## First Define Database Specs (min, max wavelength, R)\n",
    "\n",
    "Opacities are computed \"line-by-line\" (LBL) grids that are R~1e6 accross a wide wavelength range (e.g. 0.3-300um). In order to make our calculations computationally feasible, we need to create databases that are a resampled subset of this full calculation.\n",
    "\n",
    "**CAUTIONS regarding resampling**: when you resample an opacity database you must resample to **at least 100x higher than your expected data**. For example: \n",
    "\n",
    "My data is at R=100...\n",
    "I will need to resample to R=10000\n",
    "\n",
    "My data is at R=3000...\n",
    "I will need to resample to R=300000\n",
    "\n",
    "If you want a native line-by-line calculations you don't need to resample. In this case you just make sure that `oldR`=`newR`.  \n",
    "\n",
    "The general procedure of the opacity factory is: \n",
    "\n",
    "1. Interpolate the given opacity bundle to a common wavelength solution that is comparable R (`oldR` below) to the original LBL calculation\n",
    "2. Resample the opacity accordingly to `newR`. For example, if `oldR`=1e6 and `newR`=1e4, then every 100th point is taken for the final opacity specturm\n",
    "3. Insert that resampled opacity to the databse\n",
    "\n",
    "### Things to consider when choosing minw, maxw, and R\n",
    "\n",
    "Here are some typical file sizes to guide your choosing: \n",
    "\n",
    "1. 43G : all_opacities_0.3_15_R50000.db\n",
    "2. 28G : all_opacities_0.3_3_R30000.db\n",
    "3. 12G : all_opacities_0.3_5.3_R10000.db\n",
    "4. 565G : all_opacities_0,3_5,3_R500k.db\n",
    "5. 4.1G : all_opacities_5_14_R10000.db"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "#this is where your opacity file should be located if you've set your environments correctly\n",
    "min_wavelength = 1\n",
    "max_wavelength = 3\n",
    "old_R=1e6 #this should be near the value of the LBL calculations \n",
    "#the routine works by first interpolating the LBL calculations uniformly to this opacity\n",
    "#then they are resampled to the new resolution requested by the user\n",
    "\n",
    "#what molecules would you like to include\n",
    "#Lupu et als set set includes \n",
    "#C2H2, C2H4, C2H6, CH4, CO, CO2, CrH, Fe, \n",
    "#FeH, H2, H3+, H2O, H2S, HCN, LiCl, \n",
    "#LiF, LiH, MgH, N2, NH3, OCS, PH3, SiO, \n",
    "#TiO, and VO, in addition to alkali metals (Li, Na, K, Rb, Cs)\n",
    "\n",
    "#let's choose a subset of these for purposes of demonstration (note I usually include \n",
    "#as many as are available)\n",
    "molecules_1460 = ['CH4', 'CO2' , 'H2O' ,  'CO'  ,'H2' ,'Na','K']\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are a three additional opacities files that are used to compute the spectral database: \n",
    "\n",
    "- Optical CH4: Cool optical CH4 is still missing from state of the art CH4 calculations. [Therefore we have to \"stitch in\" optical CH4 at cool temperatures from Karkoschka](https://www.sciencedirect.com/science/article/abs/pii/S0019103598959139)\n",
    "- Optical O3: similar to CH4, we have to stitch in the famous O3 band needed to compute Earth like planets \n",
    "- Continuum induced absorption (e.g. H2-H2, H2-He, H2-N2, H2-CH4) \n",
    "\n",
    "In addition to these continuum sources, the PICASO opacity factory also adds in: \n",
    "- [H2-](https://github.com/natashabatalha/picaso/blob/af8dfef83f507c27d947c93c6d09a8a87c040b98/picaso/opacity_factory.py#L253)\n",
    "- [H-bf](https://github.com/natashabatalha/picaso/blob/af8dfef83f507c27d947c93c6d09a8a87c040b98/picaso/opacity_factory.py#L292)\n",
    "- [H-ff](https://github.com/natashabatalha/picaso/blob/af8dfef83f507c27d947c93c6d09a8a87c040b98/picaso/opacity_factory.py#L321)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "#additional files, if needed\n",
    "#these are located in the picaso_refdata folder \n",
    "dir_extras = os.path.join(os.environ['picaso_refdata'],'opacities')\n",
    "\n",
    "original_continuum = os.path.join(dir_extras,'CIA_DS_aug_2015.dat')\n",
    "dir_kark = os.path.join(dir_extras,'KarkCH4TempDependent.csv')\n",
    "dir_o3 = os.path.join(dir_extras,'O3_visible.txt')\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data Formats\n",
    "\n",
    "There are only specific file formats that `opacity_factory` will read from. If you would like to be added to the list please contact the developers.  \n",
    "\n",
    "1. p_1, p_2, p_3 file formats from legacy Richard Freedman calculations \n",
    "2. [.npy file formats from Ehsan Gharib-Nezhad](https://zenodo.org/record/4458189#.Ytc6R-zMLvU)\n",
    "3. Roxana Lupu .txt file formats\n",
    "\n",
    "### Directory Assumption Requirement \n",
    "\n",
    "- folder name needs to be the same as the molecule name (e.g. H2O/ would have all the opacity defined above in the molecule_1460 variable name\n",
    "\n",
    "\n",
    "For example, if each individual opacity file is `p_1`: \n",
    "\n",
    "`/data/weighted_cxs_1460/`\n",
    "\n",
    "    |--> H2O/\n",
    "    |----|--> p_1\n",
    "    |----|--> p_2\n",
    "    |----|--> p_3\n",
    "    .....\n",
    "    |----|--> p_N\n",
    "    |--> CH4/\n",
    "    |----|--> p_1\n",
    "    |----|--> p_2\n",
    "    |----|--> p_3\n",
    "    .....\n",
    "    |----|--> p_N\n",
    "    |--> NO2/\n",
    "    \n",
    "### For alkalis \n",
    "\n",
    "Historically the alkalis have come in different formats. Possible inputs for `alkali_dir` below: \n",
    "\n",
    "- For Roxana's Files from Zenodo: set alkalis_dir to `individual_file`, which will use the  alkali name. E.g., will look for \"Na\" in the \"Na\" folder\n",
    "- For Natasha's processed alkalis file: use `alkali` or point to the directory of the alkali folder \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "#original deirectory of the 1460 grid \n",
    "#for Lupu files remember to unzip!! \n",
    "og_directory =\"/data/lupu\"\n",
    "#og_directory='/data/weighted_cxs_1460/'\n",
    "#alkalis_dir = '/data/weighted_cxs_1460/alkalis'#this is technically the default (a folder called alkalis in og_directory) but if your alkalis are located somewhere else you can specificy the full path and add it below\n",
    "alkalis_dir = 'individual_file'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 1: Build New Database \n",
    "\n",
    "Note: Do not run if you want to insert to existing database"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "opa_fac.build_skeleton('/data/picaso_dbs/test.db')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2: Insert Molecular opacity \n",
    "\n",
    "### Option 1: my data is low resolution R<100\n",
    "\n",
    "If your data is at low resolution then you can proceed with what is below by resamplig the Lupu files to a lower resolution (R=10k)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Inserting: CH4\n",
      "CH4 inserts finished in :1.8 minutes\n",
      "Inserting: CO2\n",
      "CO2 inserts finished in :1.2 minutes\n",
      "Inserting: H2O\n",
      "H2O inserts finished in :1.7 minutes\n",
      "Inserting: CO\n",
      "CO inserts finished in :1.3 minutes\n",
      "Inserting: H2\n",
      "H2 inserts finished in :1.3 minutes\n",
      "Inserting: Na\n",
      "Na inserts finished in :1.3 minutes\n",
      "Inserting: K\n",
      "K inserts finished in :1.3 minutes\n"
     ]
    }
   ],
   "source": [
    "newR=10000\n",
    "#new database name \n",
    "new_db = f'/data/picaso_dbs/lupu_{min_wavelength}_{max_wavelength}_R{newR}.db' \n",
    "opa_fac.build_skeleton(new_db)\n",
    "for molecule in molecules_1460:#molecules_1460:\n",
    "    start_time = time.time()\n",
    "    print('Inserting: '+molecule)\n",
    "    new_waveno_grid = opa_fac.insert_molecular_1460(molecule, min_wavelength, max_wavelength, og_directory, new_db,\n",
    "                                                    #SEE CHOICE HERE!!!!\n",
    "                                                    new_R=newR, #new_dwno=new_dwno,\n",
    "                                                    alkali_dir=alkalis_dir,\n",
    "            dir_kark_ch4=dir_kark, dir_optical_o3=dir_o3, old_R=old_R) #these two parameters are used to hack in extra cross sections into the db \n",
    "    print(molecule+ ' inserts finished in :' +str((time.time() - start_time)/60.0)[0:3]+' minutes')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Option 2: my data is at high resolution R=10k \n",
    "\n",
    "At high resolution you will need to use the Lupu files with their direct interpolated wavelength solution. This will not do any resampling and will just insert their opacity data into the picaso database. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Inserting: CH4\n",
      "CH4 inserts finished in :1.1 minutes\n",
      "Inserting: CO2\n",
      "CO2 inserts finished in :1.0 minutes\n",
      "Inserting: H2O\n",
      "H2O inserts finished in :1.0 minutes\n",
      "Inserting: CO\n",
      "CO inserts finished in :1.1 minutes\n",
      "Inserting: H2\n",
      "H2 inserts finished in :1.1 minutes\n",
      "Inserting: Na\n",
      "Na inserts finished in :1.1 minutes\n",
      "Inserting: K\n",
      "K inserts finished in :1.0 minutes\n"
     ]
    }
   ],
   "source": [
    "new_db = f'/data/picaso_dbs/lupu_{min_wavelength}_{max_wavelength}_OG_R.db' \n",
    "opa_fac.build_skeleton(new_db)\n",
    "for molecule in molecules_1460:#molecules_1460:\n",
    "    start_time = time.time()\n",
    "    print('Inserting: '+molecule)\n",
    "    new_waveno_grid = opa_fac.insert_molecular_1460(molecule, min_wavelength, max_wavelength, og_directory, new_db,\n",
    "                                                    #SEE CHOICE HERE!!!!\n",
    "                                                    insert_direct=True,\n",
    "                                                    alkali_dir=alkalis_dir,\n",
    "            dir_kark_ch4=dir_kark, dir_optical_o3=dir_o3) #these two parameters are used to hack in extra cross sections into the db \n",
    "    print(molecule+ ' inserts finished in :' +str((time.time() - start_time)/60.0)[0:3]+' minutes')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 3) Insert Continuum Opacity"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Continuum inserts finished in :0.1 minutes\n"
     ]
    }
   ],
   "source": [
    "start_time = time.time()\n",
    "new_waveno_grid = opa_fac.get_molecular(new_db,[molecules_1460[0]],[500],[1])['wavenumber']\n",
    "\n",
    "opa_fac.restruct_continuum(original_continuum,['wno','H2H2','H2He','H2H','H2CH4','H2N2']\n",
    "                               ,new_waveno_grid, overwrite=False,\n",
    "                               new_db = new_db)\n",
    "print('Continuum inserts finished in :' +str((time.time() - start_time)/60.0)[0:3]+' minutes')"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.8"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": true,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}