Data Plan

Info on data aggregation and database design

Full Write Up

Brief Overview

Language: All databases will be built in MySQL, and all query software will use SQLAlchemy. SQLAlchemy supports several database connection drivers, which will let us compare them against each other. MySQLdb has been shown to be the fastest in most cases, but we will keep benchmarking to make sure it remains the best choice.
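A minimal sketch of how the driver comparison could be set up is shown here in Python. The helper names (`driver_urls`, `best_time`), the list of candidate drivers, and the example credentials are assumptions made for illustration, not part of the current code base. The URL prefixes follow SQLAlchemy's `dialect+driver` naming, and each driver would need to be installed separately.

```python
import timeit
from urllib.parse import quote_plus

# SQLAlchemy "dialect+driver" prefixes for MySQL. Which drivers we actually
# benchmark is an assumption; MySQLdb is the one named in this plan.
MYSQL_DRIVERS = {
    "MySQLdb": "mysql+mysqldb",
    "PyMySQL": "mysql+pymysql",
    "MySQL Connector/Python": "mysql+mysqlconnector",
}


def driver_urls(user, password, host, database):
    """Return one SQLAlchemy connection URL per candidate driver."""
    # quote_plus escapes characters such as '@' that would break the URL.
    return {
        name: f"{prefix}://{quote_plus(user)}:{quote_plus(password)}@{host}/{database}"
        for name, prefix in MYSQL_DRIVERS.items()
    }


def best_time(run_query, repeats=5):
    """Best wall-clock time, in seconds, over `repeats` single calls to run_query."""
    return min(timeit.repeat(run_query, number=1, repeat=repeats))


# Hypothetical credentials and database name, for illustration only.
urls = driver_urls("user", "p@ss", "localhost", "linelists")
for name, url in urls.items():
    print(name, url)
```

With SQLAlchemy installed, each URL could be passed to `sqlalchemy.create_engine(url)`, and `run_query` would wrap the same query on each engine so that the timings are directly comparable.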

Structure: Currently the MySQL database is split into three tables:

  1. particles: containing all particle information: molecule name, iso name, iso abundance, iso mass, default line source chosen by group, particle id
  2. partitions: containing all partition function information: temperature, partition, particle id, partition id, DOI
  3. transitions: this will be the largest table, containing standard information such as wavenumber, Einstein A, g, elower, etc. We will also try to include "completeness" levels for each database entry, among other data. A full Excel document of the metadata will be sent out soon.

A more formal design of the database, including query times, will be posted within the next week.
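
Until the formal design is posted, the three tables listed above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` with an in-memory database as a stand-in for MySQL, the column types are guesses, and the `transition_id` key, the `particle_id` link on transitions, and the index are assumptions that do not come from the plan. The partition column is named `partition_function` here because `PARTITION` is a reserved word in MySQL.

```python
import sqlite3

# Column names follow the table descriptions in this plan; types are guesses.
SCHEMA = """
CREATE TABLE particles (
    particle_id         INTEGER PRIMARY KEY,
    molecule_name       TEXT NOT NULL,
    iso_name            TEXT NOT NULL,
    iso_abundance       REAL,
    iso_mass            REAL,
    default_line_source TEXT
);

CREATE TABLE partitions (
    partition_id       INTEGER PRIMARY KEY,
    particle_id        INTEGER NOT NULL REFERENCES particles (particle_id),
    temperature        REAL NOT NULL,
    partition_function REAL NOT NULL,
    doi                TEXT
);

CREATE TABLE transitions (
    transition_id INTEGER PRIMARY KEY,  -- assumed surrogate key
    particle_id   INTEGER NOT NULL REFERENCES particles (particle_id),  -- assumed link
    wavenumber    REAL NOT NULL,
    einstein_a    REAL,
    g             REAL,
    elower        REAL
);

-- Assumed index: queries are expected to filter by particle and wavenumber.
CREATE INDEX idx_transitions_particle_wavenumber
    ON transitions (particle_id, wavenumber);
"""


def create_schema(conn):
    """Create the three tables on an open DB-API connection."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")  # stand-in for the MySQL server
create_schema(conn)
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)
```

Keeping `particle_id` on both partitions and transitions would let either table be joined back to particles. The "completeness" levels and other metadata are left out of the sketch because their format has not been decided yet.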