enviPy-bayer

Author	SHA1	Message	Date
liambrydon	e26d5a21e3	[Enhancement] Refactor Dataset (#184)
# Summary
I have introduced a new base `class Dataset` in `ml.py` which all datasets should subclass. It stores the dataset as a polars DataFrame with the column names and number of columns determined by the subclass. It implements generic methods such as `add_row`, `at`, `limit` and dataset saving. It also details abstract methods required by the subclasses. These include `X`, `y` and `generate_dataset`.
There are two subclasses that currently exist: `RuleBasedDataset` for the MLRR models and `EnviFormerDataset` for the enviFormer models.
# Old Dataset to New RuleBasedDataset Functionality Translation
- [x] `__init__`
  - self.columns and self.num_labels moved to base Dataset class
  - self.data moved to base class with name self.df along with initialising from list or from another DataFrame
  - struct_features, triggered and observed remain the same
- [x] `_block_indices`
  - function moved to base Dataset class
- [x] structure_id
  - stays in RuleBasedDataset, now requires an index for the row of interest
- [x] add_row
  - moved to base Dataset class, now calls add_rows so one or more rows can be added at a time
- [x] times_triggered
  - stays in RuleBasedDataset, now does a lookup using polars df.filter
- [x] struct_features (see init)
- [x] triggered (see init)
- [x] observed (see init)
- [x] at
  - removed in favour of indexing with getitem
- [x] limit
  - removed in favour of indexing with getitem
- [x] classification_dataset
  - stays in RuleBasedDataset, largely the same just with new dataset construction using add_rows
- [x] generate_dataset
  - stays in RuleBasedDataset, largely the same just with new dataset construction using add_rows
- [x] X
  - moved to base Dataset as `@abstractmethod`, RuleBasedDataset implementation functionally the same but uses polars
- [x] trig
  - stays in RuleBasedDataset, functionally the same but uses polars
- [x] y
  - moved to base Dataset as `@abstractmethod`, RuleBasedDataset implementation functionally the same but uses polars
- [x] `__getitem__`
  - moved to base Dataset, now passes item to the dataframe for polars to handle
- [x] to_arff
  - stays in RuleBasedDataset, functionally the same but uses polars
- [x] `__repr__`
  - moved to base Dataset
- [x] `__iter__`
  - moved to base Dataset, now uses polars iter_rows
# Base Dataset class Features
The following functions are available in the base Dataset class:
- init
  - Creates the dataset from a list of columns and data in list-of-lists format. It can also create a dataset from a polars DataFrame, which is essential for recreating itself during indexing. An empty dataset can be created by passing only column names.
- add_rows
  - Adds rows to the Dataset. We check that the new data length is the same, but it is presumed that the column order matches the existing dataframe.
- add_row
  - Adds one row, see add_rows
- block_indices
  - Returns the column indices that start with the given prefix
- columns
  - Property, returns dataframe.columns
- shape
  - Property, returns dataframe.shape
- X
  - Abstract method to be implemented by the subclasses, it should represent the input to an ML model
- y
  - Abstract method to be implemented by the subclasses, it should represent the target for an ML model
- generate_dataset
  - Abstract and static method to be implemented by the subclasses, should return an initialised subclass of Dataset
- iter
  - Returns the iterable from dataframe.iter_rows()
- getitem
  - Passes the item argument to the dataframe. If the result of indexing the dataframe is another dataframe, the new dataframe is packaged into a new Dataset of the same subclass. If the result of indexing is something else (int, float, polars Series), the result is returned as is.
- save
  - Pickles and saves the dataframe to the given path
- load
  - Static method to load the dataset from the given path
- to_numpy
  - Returns the dataframe as a numpy array. Required for compatibility with training of the ECC model
- repr
  - Returns a representation of the dataset
- len
  - Returns the length of the dataframe
- iter_rows
  - Returns dataframe.iter_rows with arguments passed through. Mainly used to get the named iterable, which returns rows of the dataframe as a dict of column names to column values instead of a tuple of column values.
- filter
  - Passes to dataframe.filter and recreates self with the result
- select
  - Passes to dataframe.select and recreates self with the result
- with_columns
  - Passes to dataframe.with_columns and recreates self with the result
- sort
  - Passes to dataframe.sort and recreates self with the result
- item
  - Passes to dataframe.item
- fill_nan
  - Fills the dataframe's NaNs with the given value
- height
  - Property, returns the height (number of rows) of the dataframe
- [x] App domain
- [x] MACCS alternatives
Co-authored-by: Liam Brydon <62733830+MyCreativityOutlet@users.noreply.github.com>
Reviewed-on: enviPath/enviPy#184
Reviewed-by: jebus <lorsbach@envipath.com>
Co-authored-by: liambrydon <lbry121@aucklanduni.ac.nz>
Co-committed-by: liambrydon <lbry121@aucklanduni.ac.nz>	2025-11-07 08:46:17 +13:00
t03i	36879c266b	[Feature] Documentation for development setup
## Summary
This PR improves the local development setup experience by adding Docker Compose and Makefile for streamlined setup.
## Changes
- Added `docker-compose.yml`: for one-command PostgreSQL database setup
- Added `Makefile`: Convenient shortcuts for common dev tasks (`make setup`, `make dev`, etc.)
- Updated `README.md`: Quick development setup instructions using Make
- Added: RDKit installation pain point documentation
- Fixed: Made Java feature properly dependent
## Why these changes?
The application uses PostgreSQL-specific features (`ArrayField`) and requires an anonymous user created by the bootstrap command. This PR makes the setup process trivial for new developers:
```bash
cp .env.local.example .env
make setup  # Starts DB, runs migrations, bootstraps data
make dev    # Starts development server
```
Java fix: Moved global Java import to inline to avoid everyone having to configure the Java path. Numerous changes to view and settings.
- Applied ruff-formatting
## Testing
Verified complete setup from scratch works with:
- PostgreSQL running in Docker
- All migrations applied
- Bootstrap data loaded successfully
- Anonymous user created
- The development server starts correctly.
Co-authored-by: Tobias O <tobias.olenyi@tum.de>
Co-authored-by: Tobias O <tobias.olenyi@envipath.com>
Co-authored-by: Liam <62733830+limmooo@users.noreply.github.com>
Reviewed-on: enviPath/enviPy#143
Reviewed-by: jebus <lorsbach@envipath.com>
Reviewed-by: liambrydon <lbry121@aucklanduni.ac.nz>
Co-authored-by: t03i <mail+envipath@t03i.net>
Co-committed-by: t03i <mail+envipath@t03i.net>	2025-10-08 18:51:50 +13:00
liambrydon	d2f4fdc58a	[Feature] Enviformer fine tuning and evaluation
## Changes
- I have finished the backend integration of EnviFormer (#19). This includes dataset building, model finetuning, model evaluation and model prediction with the finetuned model.
- `PackageBasedModel` has been adjusted to be more abstract. This includes making the `_save_model` method and making `compute_averages` a static class function.
- I had to bump the python-version in `pyproject.toml` from >=3.11 to >=3.12, otherwise uv failed to install EnviFormer.
- The default EnviFormer loading in `settings.py` has been removed.
## Future Fix
I noticed you have a little bit of code in `PackageBasedModel` -> `evaluate_model` for using the `eval_packages` during evaluation instead of train/test splits on `data_packages`. It doesn't seem finished. I presume we want this for all models, so I will take care of that in a new branch/pull request after this request is merged.
Also, I haven't done anything for a POST request to finetune the model. I'm not sure if that is something we want now.
Co-authored-by: Liam Brydon <62733830+MyCreativityOutlet@users.noreply.github.com>
Reviewed-on: enviPath/enviPy#141
Reviewed-by: jebus <lorsbach@envipath.com>
Co-authored-by: liambrydon <lbry121@aucklanduni.ac.nz>
Co-committed-by: liambrydon <lbry121@aucklanduni.ac.nz>	2025-10-07 21:14:10 +13:00
jebus	b757a07f91	[Misc] Performance improvements, SMIRKS Coverage, Minor Bugfixes (#132)
Bump Python Version to 3.12
Make use of "epauth" optional
Cache `srs` property of rules to speed up apply
Adjust view names for use of `reverse()`
Fix Views for Scenario Attachments
Added Simply Compare View/Template to identify differences between rdkit and ambit
Make migrations consistent with tests + compare
Fixes #76
Set default year for Scenario Modal
Fix html tags for package description
Added Tests for Pathway / Rule
Added remove stereo for apply
Co-authored-by: Tim Lorsbach <tim@lorsba.ch>
Reviewed-on: enviPath/enviPy#132	2025-09-26 19:33:03 +12:00
jebus	50db2fb372	[Feature] MultiGen Eval (Backend) (#117) Fixes #16 Co-authored-by: Tim Lorsbach <tim@lorsba.ch> Reviewed-on: enviPath/enviPy#117	2025-09-18 18:40:45 +12:00
jebus	762a6b7baf	[Feature] Package Export/Import (#116) Fixes #90 Fixes #91 Fixes #115 Fixes #104 Co-authored-by: Tim Lorsbach <tim@lorsba.ch> Reviewed-on: enviPath/enviPy#116	2025-09-16 02:41:10 +12:00
jebus	e82fe7e87e	[Feature] Initial Active Directory / Entra Login (#101) Co-authored-by: Tim Lorsbach <tim@lorsba.ch> Reviewed-on: enviPath/enviPy#101	2025-09-10 08:29:27 +12:00
jebus	3c8f0e80cb	[Feature] OAuth2 Provider (#84) Fixes #74 Co-authored-by: Tim Lorsbach <tim@lorsba.ch> Reviewed-on: enviPath/enviPy#84	2025-09-05 06:50:16 +12:00
jebus	2babe7f7e2	[Feature] Scenario Creation (#78) Co-authored-by: Tim Lorsbach <tim@lorsba.ch> Reviewed-on: enviPath/enviPy#78	2025-09-02 08:06:18 +12:00
jebus	49e02ed97d	feature/additional_information (#30) Fixes #12 Co-authored-by: Tim Lorsbach <tim@lorsba.ch> Reviewed-on: enviPath/enviPy#30	2025-07-19 08:10:40 +12:00
jebus	6eb1d1bd65	Added Sentry (#4) Co-authored-by: Tim Lorsbach <tim@lorsba.ch> Reviewed-on: enviPath/enviPy#4	2025-06-28 06:00:15 +12:00
Tim Lorsbach	acdb62c08f	added uv.lock to avoid dep issues	2025-06-24 09:07:32 +02:00

12 Commits
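
The base `Dataset` design described in the "Refactor Dataset" commit (#184) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `ml.py`: to stay dependency-free it keeps rows in a plain Python list where the commit describes a polars DataFrame, `__getitem__` only handles an int or a slice, and `ToyRuleDataset` with its column names (`structure_id`, `feat_*`, `obs_*`) is hypothetical. The static `generate_dataset` method and saving/loading are left out.

```python
from abc import ABC, abstractmethod


class Dataset(ABC):
    """Sketch of a base dataset; a list of rows stands in for the polars DataFrame."""

    def __init__(self, columns, data=None):
        self.columns = list(columns)
        self.rows = []
        if data:
            self.add_rows(data)

    def add_rows(self, rows):
        # Only the row length is checked; column order is presumed to match.
        for row in rows:
            if len(row) != len(self.columns):
                raise ValueError("row length does not match number of columns")
            self.rows.append(list(row))

    def add_row(self, row):
        self.add_rows([row])

    def block_indices(self, prefix):
        # Indices of the columns whose name starts with the given prefix.
        return [i for i, c in enumerate(self.columns) if c.startswith(prefix)]

    def __getitem__(self, item):
        result = self.rows[item]
        if isinstance(item, slice):
            # A multi-row result is re-wrapped in a new dataset of the same subclass.
            return type(self)(self.columns, result)
        return result

    def __len__(self):
        return len(self.rows)

    @abstractmethod
    def X(self):
        """Input to an ML model."""

    @abstractmethod
    def y(self):
        """Target for an ML model."""


class ToyRuleDataset(Dataset):
    """Hypothetical subclass; the 'feat_' and 'obs_' prefixes are assumptions."""

    def X(self):
        idx = self.block_indices("feat_")
        return [[row[i] for i in idx] for row in self.rows]

    def y(self):
        idx = self.block_indices("obs_")
        return [[row[i] for i in idx] for row in self.rows]


ds = ToyRuleDataset(["structure_id", "feat_0", "feat_1", "obs_0"])
ds.add_row(["s1", 1, 0, 1])
ds.add_rows([["s2", 0, 1, 0], ["s3", 1, 1, 1]])
subset = ds[:2]  # still a ToyRuleDataset, so X() and y() keep working
```

Re-wrapping with `type(self)` is what lets a sliced dataset keep its subclass-specific `X` and `y`; the commit message describes the same idea for results that are polars DataFrames.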