{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# How to Fit Grid Models to Data\n",
    "\n",
    "In this notebook, we show how to use the `PICASO`-formatted grid models to interpret data. We will use the results of the [JWST Transiting Exoplanet Community Early Release Science Team's](https://arxiv.org/pdf/2208.11692.pdf) first look analysis of WASP-39 b.\n",
    "\n",
    "**Helpful knowledge before running this notebook:**\n",
    "\n",
    "- [How to use xarray files](https://natashabatalha.github.io/picaso/notebooks/codehelp/data_uniformity_tutorial.html) \n",
    "- [Basic PICASO knowledge of how to compute transit spectra](https://natashabatalha.github.io/picaso/notebooks/workshops/ESO2021/ESO_Tutorial.html)\n",
    "\n",
    "**Need to do before running this notebook**\n",
    "\n",
    "In order to use this notebook you will have to: \n",
    "\n",
    "- [Download and unpack the zenodo grid models](https://doi.org/10.5281/zenodo.7236759)\n",
    "- [Download the final planet spectrum](https://zenodo.org/record/6959427#.Y1M0U-zMLvU) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import os\n",
    "\n",
    "import picaso.justdoit as jdi\n",
    "import picaso.justplotit as jpi\n",
    "import picaso.analyze as lyz\n",
    "jpi.output_notebook()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define Paths To Data and Models\n",
    "\n",
    "You should have four folders in your `model_dir`: \n",
    "\n",
    "1. `RCTE_cloud_free/`: 192 models \n",
    "2. `RCTE_cloudy/`: 3840 models \n",
    "3. `photochem_cloud_free/`: 116 models\n",
    "4. `photochem_cloudy/`: 580 models \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#should have sub folders similar to above\n",
    "#agnostic to where it is, just make sure you point to the right data file\n",
    "model_dir = \"/data2/models/WASP-39B/xarray/\"\n",
    "\n",
    "#downloaded and unzipped from Zenodo\n",
    "data_dir = '/data2/observations/WASP-39b/ZENODO/TRANSMISSION_SPECTRA_DATA/'\n",
    "#for this tutorial let's grab the firely reduction\n",
    "data_file = os.path.join(data_dir,\"FIREFLY_REDUCTION.txt\")\n",
    "\n",
    "wlgrid_center,rprs_data2,wlgrid_width, e_rprs2 = np.loadtxt(data_file,usecols=[0,1,2,3],unpack=True,skiprows=1)\n",
    "\n",
    "#for now, we are only going to fit 3-5 um\n",
    "wh =  np.where(wlgrid_center < 3.0)\n",
    "wlgrid_center = np.delete(wlgrid_center,wh[0])\n",
    "wlgrid_width = np.delete(wlgrid_width,wh[0])\n",
    "rprs_data2 = np.delete(rprs_data2,wh[0])\n",
    "e_rprs2 = np.delete(e_rprs2,wh[0])\n",
    "reduction_name = \"Firefly\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "f=jpi.plot_errorbar(wlgrid_center, rprs_data2,e_rprs2, plot_type='matplotlib',\n",
    "                   plot_kwargs={'ylabel':r'(R$_p$/R$_*$)$^2$'})#plot_type='bokeh' also available\n",
    "#jpi.show(f) #if using bokeh (note if using bokeh need those key words (e.g. y_axis_label instead of ylabel))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Add Available Grids to Test"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First step will be to load your first grid into the `GridFitter` class. You can do this easily by supplying the function a directory location, and grid name (`grid_name`). \n",
    "\n",
    "The only purpose of `grid_name` is in case you add more grids to your `GridFitter` function, it will be easy to keep track of what parameters go with what grid.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "grid_name = \"picaso_cld_free\"\n",
    "location = os.path.join(model_dir,\"RCTE_cloud_free\")\n",
    "fitter = lyz.GridFitter(grid_name,location, verbose=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This shows you what parameters the grid was created over"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fitter.grid_params['picaso_cld_free']['planet_params'].keys()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "location = os.path.join(model_dir,\"RCTE_cloudy\")\n",
    "fitter.add_grid('picaso_cldy', location)#"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Explore the parameters of the grid"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can see what grids you have loaded"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fitter.grids #what grids exist"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can also see what the top level information about your grid is"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fitter.overview['picaso_cld_free']#top level info from the attrs "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The full list of planet parameters can also be cross referenced against the full list of file names so you can easily plot of different models. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(fitter.grid_params['picaso_cld_free']['planet_params']['tint'][0], \n",
    "#this full list can be cross referened against the file list \n",
    "fitter.list_of_files['picaso_cld_free'][0])\n",
    "#in this case we can verify against the filename"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Add Datasets to Explore\n",
    "\n",
    "Though the models are interesting, what we are really after is which is most representative of the data. So now let's add some datasets to explore. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fitter.add_data('firefly',wlgrid_center, wlgrid_width, rprs_data2, e_rprs2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Compute $\\chi_{red}^2$/N and Retrieve Single Best Fit\n",
    "\n",
    "In this analysis we used the reduced chi sq per data point as a metric to fit the grid. This fitter function will go through your whole grid and compute cross reference the chi sq compared to your data. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fitter.fit_grid('picaso_cld_free','firefly')\n",
    "fitter.fit_grid('picaso_cldy','firefly')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we have accumulated results let's turn this into a dictionary to easily see what we've done"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "out = fitter.as_dict()#allows you to easily grab data\n",
    "out.keys()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We are most interested in the models with the best reduced chi sq. We can use our ranked order to get the models that best fit the data. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "### Use rank order to get the top best fit or other parameters\n",
    "#top 5 best fit models metallicities for the cloud free grid\n",
    "print(\"cld free\",np.array(out['grid_params']['picaso_cld_free']['planet_params']['mh']\n",
    "        )[out['rank_order']['picaso_cld_free']['firefly']][0:5])\n",
    "\n",
    "#top 5 best fit models metallicities for the cloudy grid\n",
    "print(\"cldy\",np.array(out['grid_params']['picaso_cldy']['planet_params']['mh']\n",
    "        )[out['rank_order']['picaso_cldy']['firefly']][0:5])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Interesting! We are already seeing interesting information. Without clouds our model predicts higher metallicity than when we add clouds. Let's look at the associated chi square values. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#top 5 best fit chi sqs for the cloud free grid\n",
    "print(\"cld free\", np.array(out['chi_sqs']['picaso_cld_free']['firefly']\n",
    "        )[out['rank_order']['picaso_cld_free']['firefly']][0:5])\n",
    "      \n",
    "#top 5 best fit chi sq for the cloudy grid\n",
    "print(\"cldy\", np.array(out['chi_sqs']['picaso_cldy']['firefly']\n",
    "        )[out['rank_order']['picaso_cldy']['firefly']][0:5])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cloudy grid is giving lower chi square giving us clues that this planet likely has clouds affecting the spectrum."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Analyze Single Best Fits\n",
    "\n",
    "Let's analyze the single best fits in order to compare the spectrum with the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig,ax = fitter.plot_best_fit(['picaso_cld_free','picaso_cldy'],'firefly')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By-eye, our cloudy grid is giving a much better representation of the data. Let's look at what physical parameters are associated with this. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "best_fit = fitter.print_best_fit('picaso_cldy','firefly')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can see these same parameters reported in original Nature paper: https://arxiv.org/pdf/2208.11692.pdf"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Estimated Posteriors \n",
    "\n",
    "It is also helpful to get an idea of what the probability is for each grid parameter in your model. This will give you a better representation of degeneracies that exist with your data and each of your physical parameters. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "posterior_chance_dict, fig = fitter.plot_chi_posteriors(['picaso_cldy', 'picaso_cld_free'], \n",
    "                                                        'firefly', max_row=3, max_col=2, input_parameters='all')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "posterior_chance_dict"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What can you take away from this plot? \n",
    "1. Cloudy models reduce the number of models that can be fit to the data with high metallicity\n",
    "2. Internal temperature cannot be constrained by the data \n",
    "3. C/O ratios greater than ~0.8 can be ruled out by the data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Interpret Best Fit\n",
    "\n",
    "Now that we are happy with the best-fitting model, we can load in that data and post process some plots in order to gain better understanding of our results.\n",
    "\n",
    "We can use `PICASO`'s `xarray` loader to quickly load in one of our models. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#grab top model \n",
    "top_model_file  = np.array(out['list_of_files']['picaso_cldy']\n",
    "        )[out['rank_order']['picaso_cldy']['firefly']][0]\n",
    "\n",
    "xr_usr = jdi.xr.load_dataset(top_model_file)\n",
    "#take a look at the Xarray file \n",
    "xr_usr"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "opa = jdi.opannection(wave_range=[3,5])\n",
    "case = jdi.input_xarray(xr_usr, opa)\n",
    "#if you need to rerun your spectrum \n",
    "#out = case.spectrum(opa,calculation='transmisson')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### See Contribution From Each Molecule\n",
    "\n",
    "One of the most common plots that was also used in the original paper is the \"leave one out\" method to see how each molecule is affecting our spectrum. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#copy atmo before modifying and rerunning picaso\n",
    "og_atmo = jdi.copy.deepcopy(case.inputs['atmosphere']['profile'])\n",
    "#atmo\n",
    "w,f,l =[],[],[]\n",
    "for iex in ['CH4','H2O','CO2',None]:\n",
    "    case.atmosphere(df = og_atmo,exclude_mol=iex, delim_whitespace=True)\n",
    "    df= case.spectrum(opa, full_output=True,calculation='transmission') #note the new last key \n",
    "    wno, rprs2  = df['wavenumber'] , df['transit_depth']\n",
    "    wno, rprs2 = jdi.mean_regrid(wno, rprs2, R=150)\n",
    "    w +=[wno]\n",
    "    f+=[rprs2]\n",
    "    if iex==None: \n",
    "        leg='all'\n",
    "    else: \n",
    "        leg = f'No {iex}'\n",
    "    l+=[leg]\n",
    "jpi.show(jpi.spectrum(w,f,legend=l))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Quantify Molecular Detection using Guassian Fitting\n",
    "\n",
    "\n",
    "For very gaussian shaped molecules (like CO2 in this case), we can use a simple Gaussian fitting technique to quantify the significance of our detection. Note this ONLY works in cases where the shape of the molecule is gaussian with a single peak and well-shaped wings. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#grab file to test \n",
    "top_model_file  = np.array(out['list_of_files']['picaso_cldy']\n",
    "        )[out['rank_order']['picaso_cldy']['firefly']][0]\n",
    "\n",
    "min_wave = 3 #min wave to search for gauss peak\n",
    "max_wave = 5 #max wave to search for gauss peak\n",
    "out = lyz.detection_test(fitter,'CO2',min_wave,max_wave,'picaso_cldy','firefly',\n",
    "                     top_model_file,\n",
    "                     #opa_kwargs={wave_range=[]}#this is where you input arguments for opannection\n",
    "                     plot=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By comparing the line fit to the single Gaussian fit, we can use the methodology of [Trotta 2008](https://ui.adsabs.harvard.edu/abs/2008ConPh..49...71T/abstract) to get out a sigma detection significance. In this case we can see that the single gaussian fit is preferred over the line model at 26 sigma.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "out['sigma_single_v_line']"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.5"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": true,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}