{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# What Resampling Do I Need for My Data??\n",
    "\n",
    "This notebook is purely used to help people understand what resampling they need for their specific dataset. It will show you: \n",
    "\n",
    "1. Given the precision, and resolution of your data, what resampling do I need?\n",
    "\n",
    "**Note:** This notebook relies on having a R=500,000 database and therefore is **not** executable based on public data. For context, a R=500,000 database for 1-5$\\mu$m is around 0.5 Tb. It is also not needed for most use cases. However, these are valuable tests for users to see in order to judge the required accuracy of a resampled opacity database "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import picaso.justdoit as jdi\n",
    "import picaso.justplotit as jpi\n",
    "import picaso.opacity_factory as opa_fac\n",
    "jpi.output_notebook()\n",
    "\n",
    "#tells us how long picaso will take to run \n",
    "import time\n",
    "import tracemalloc\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Test set up\n",
    "\n",
    "Now let's run a simple transmission model for each resampling: \n",
    "\n",
    "1. LBL: 500,000\n",
    "2. 100,000\n",
    "3. 60,000\n",
    "4. 20,000\n",
    "5. 10,000\n",
    "6. Lupu insert direct (option #2) in [this tutorial](https://natashabatalha.github.io/picaso/notebooks/10_CreatingOpacityDb.html)\n",
    "\n",
    "To determine what resolution to use for: \n",
    "\n",
    "1. 100\n",
    "2. 500\n",
    "3. 1000\n",
    "4. 3000\n",
    "5. 5000\n",
    "6. 10000"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#this is where your opacity file should be located if you've set your environments correctly\n",
    "db_filename = '/data2/picaso_dbs/R500k/all_opacities_0,3_5,3_R500k.db'\n",
    "R=500000\n",
    "opas = {}\n",
    "# I had a 500k table on hand so let's first add samplings of \n",
    "resampled_at = [500000,100000,20000,10000] \n",
    "for inewr in resampled_at:\n",
    "    isamp = int(R/inewr)\n",
    "    opas[inewr] = jdi.opannection(filename_db=db_filename,\n",
    "                                  resample=isamp,wave_range=[1,3])\n",
    "    \n",
    "#lets pull in this other one I ran for 1-5 um at 60,000 (since it wasnt a multiple of 500k :)\n",
    "opas[60000] = jdi.opannection(filename_db='/data2/picaso_dbs/R60000/all_opacities_0.6_6_R60000.db',\n",
    "                             wave_range=[1,3])\n",
    "#and lets test the lupu \"insert direct\" we explored in \n",
    "#the previous tutorial\n",
    "opas['lupu'] = jdi.opannection(filename_db='/data2/picaso_dbs/lupu_1_3_OG_R.db')\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note: picaso will really yell at you for resampling your data further than what it has already done. This is because as you will see below, it strongly affects the accuracy of your calculations.\n",
    "\n",
    "## Simple Transit Model \n",
    "\n",
    "- Mixture of H2, H2O, CH4 \n",
    "- Basic Hot Jupiter"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "log_g = 4.38933\n",
    "metallicity = -0.03\n",
    "t_eff =  5326.6 #K\n",
    "r_star =0.932#895 #rsun#josh=\n",
    "m_star = 0.934 #msun\n",
    "m_planet = 0.281 #mjup\n",
    "r_planet = 1.279 #rjup\n",
    "\n",
    "outs = {}\n",
    "\n",
    "for ikey in opas.keys():\n",
    "    tracemalloc.start()\n",
    "    start = time.time()\n",
    "    pl=jdi.inputs()\n",
    "\n",
    "    pl.star(opas[ikey], \n",
    "            t_eff,metallicity,log_g,radius=r_star, \n",
    "            radius_unit = jdi.u.Unit('R_sun') )\n",
    "\n",
    "    pl.gravity(mass=m_planet, mass_unit=jdi.u.Unit('M_jup'),\n",
    "              radius=r_planet, radius_unit=jdi.u.Unit('R_jup'))\n",
    "\n",
    "    pl.approx(p_reference=10)\n",
    "\n",
    "    df = {'pressure':np.logspace(-7,2,40)}\n",
    "    df['temperature'] = np.logspace(-7,2,40)*0 + 500\n",
    "    df['H2O'] = np.logspace(-7,2,40)*0 + 1e-4\n",
    "    df['CH4'] = np.logspace(-7,2,40)*0 + 1e-4\n",
    "    df['H2'] = np.logspace(-7,2,40)*0 + 1-2e-4\n",
    "\n",
    "    pl.atmosphere(df=jdi.pd.DataFrame(df))\n",
    "    outs[ikey] = pl.spectrum(opas[ikey], calculation='transmission')\n",
    "    mem = tracemalloc.get_traced_memory()\n",
    "    print(\"Resampling: \", ikey, 'Took (s):',(time.time()-start)/60,\n",
    "         'Peak Memory:', mem)\n",
    "    "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Regridding to Data Resolution \n",
    "\n",
    "Here we will regrid everything to various resolutions (100,500,1000,3000) so that the user can see how this shapes the spectra"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#medium to low resolution tests\n",
    "r_test_low = [100,500,1000,3000]\n",
    "rebin = {isamp:{} for isamp in opas.keys()}\n",
    "for isamp in opas.keys(): \n",
    "    for iR in r_test_low:\n",
    "        w,f = jdi.mean_regrid(outs[isamp]['wavenumber'],outs[isamp]['transit_depth']\n",
    "                              , R=iR)\n",
    "        rebin[isamp][iR] = [w,f]\n",
    "\n",
    "#high resolution resolution tests\n",
    "r_test_hi = [10000, 50000]\n",
    "for iR in r_test_hi:\n",
    "    for isamp in [500000, 100000,'lupu']:    \n",
    "        if isamp==500000:\n",
    "            w,f = jdi.mean_regrid(outs[isamp]['wavenumber'],outs[isamp]['transit_depth']\n",
    "                      , R=iR)\n",
    "        else: \n",
    "            w,f = jdi.mean_regrid(outs[isamp]['wavenumber'],outs[isamp]['transit_depth']\n",
    "                      , newx=w)\n",
    "        rebin[isamp][iR] = [w,f]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "for iR in r_test_low:\n",
    "    w,f,l=[],[],[]\n",
    "    w+=[ rebin[500000][iR][0]]\n",
    "    f+=[ 1e6*(rebin[500000][iR][1]-np.min(rebin[500000][iR][1]))]\n",
    "    l+=['Line-by-Line at 500k']\n",
    "    for isamp in [i for i in opas.keys() if i not in [500000]]:   \n",
    "        w+=[rebin[isamp][iR][0]]\n",
    "        f+=[1e6*(rebin[isamp][iR][1]-np.min(rebin[isamp][iR][1]))]\n",
    "        l+=['Resampled R='+str(isamp)]\n",
    "    jpi.show(jpi.spectrum(w,f,plot_width=600,legend=l,muted_alpha=0, \n",
    "                          title=f'Data is R={iR}'), y_axis_label='Spectrum(ppm)')\n",
    "    \n",
    "for iR in r_test_hi:\n",
    "    w,f,l=[],[],[]\n",
    "    w+=[ rebin[500000][iR][0]]\n",
    "    f+=[ 1e6*(rebin[500000][iR][1]-np.min(rebin[500000][iR][1]))]\n",
    "    l+=['Line-by-Line at 500k']\n",
    "    for isamp in [100000,'lupu']:   \n",
    "        w+=[rebin[isamp][iR][0]]\n",
    "        f+=[1e6*(rebin[isamp][iR][1]-np.min(rebin[isamp][iR][1]))]\n",
    "        l+=['Resampled R='+str(isamp)]\n",
    "    jpi.show(jpi.spectrum(w,f,plot_width=600,legend=l, muted_alpha=0,\n",
    "                          title=f'Data is R={iR}', y_axis_label='Spectrum(ppm)'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Plot differences"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "errs = {isamp:{} for isamp in list(opas.keys())[1:]}\n",
    "for iR in r_test_low:\n",
    "    w,f,l=[],[],[]\n",
    "    for isamp in list(opas.keys())[1:]: \n",
    "        w+=[rebin[isamp][iR][0]]\n",
    "        err = 1e6*(rebin[isamp][iR][1] - rebin[500000][iR][1])\n",
    "        f+=[err]\n",
    "        l+=['Resampled R='+str(isamp)]\n",
    "        errs[isamp][iR] = np.std(err)\n",
    "    jpi.show(jpi.spectrum(w,f,plot_width=600,legend=l, \n",
    "                          #y_range=[10,10],\n",
    "                          title=f'Data is R={iR}',\n",
    "                        y_axis_label='Delta LBL-Resampled(ppm)')\n",
    "            )\n",
    "for iR in r_test_hi:\n",
    "    w,f,l=[],[],[]\n",
    "    for isamp in [100000,'lupu']:  \n",
    "        w+=[rebin[isamp][iR][0]]\n",
    "        err = 1e6*(rebin[isamp][iR][1] - rebin[500000][iR][1])\n",
    "        f+=[err]\n",
    "        l+=['Resampled R='+str(isamp)]\n",
    "        errs[isamp][iR] = np.std(err)\n",
    "    jpi.show(jpi.spectrum(w,f,plot_width=600,legend=l, \n",
    "                          title=f'Data is R={iR}',\n",
    "                          y_axis_label='Delta LBL-Resampled(ppm)')\n",
    "                        )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Takeaways \n",
    "\n",
    "1. Largest \"error\" from resampling comes in the window regions of opacity \n",
    "2. Your resampling is really dependent on what level of data precision you expect\n",
    "3. Errors associated with under sampling can be seen in the spectra as \"jaggedness\". These should not be mistaken for features. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "err_fig = jpi.figure(x_axis_type='log', height=400, width=500,\n",
    "                    x_axis_label='Data Resolution',\n",
    "                    y_axis_label='1 Sigma Error (ppm)')\n",
    "for i,isamp in enumerate(errs.keys()): \n",
    "    x = list(errs[isamp].keys())\n",
    "    y = list(errs[isamp].values())\n",
    "    err_fig.line(x,y,line_width=2, color=jpi.Colorblind8[i],legend_label=f'Resampling={isamp}')\n",
    "jpi.show(err_fig)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.12"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": true,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}