I added playwright for frontend testing and got a couple simple test cases working.
I have updated pyproject.toml but it can also be installed with `pip install pytest-playwright` followed by `playwright install`
With the django server running you can do `playwright codegen http://localhost:8000/` which will generate test code based on the actions you take on the webpage it opens. Be sure to change the target to pytest in the code pop up.
I will add more test cases but @jebus and @t03i feel free to add more. Especially once we are done with the full front-end redesign.
I have put the tests under `tests/frontend/` but I am not sure how to add them to the CI. They give steps for CI integration but maybe we want to somehow include them in our exisiting CI yaml? https://playwright.dev/python/docs/ci-intro
Reviewed-on: enviPath/enviPy#218
Reviewed-by: Tobias O <tobias.olenyi@envipath.com>
Co-authored-by: Liam Brydon <lbry121@aucklanduni.ac.nz>
Co-committed-by: Liam Brydon <lbry121@aucklanduni.ac.nz>
## Changes
- All text input fields are now cleaned with nh3 to remove html tags. We allow certain html tags under `settings.py/ALLOWED_HTML_TAGS` so we can easily update the tags we allow in the future.
- All names and descriptions now use the template tag `nh_safe` in all html files.
- Usernames and emails are a small exception and are not allowed any html tags
Co-authored-by: Liam Brydon <62733830+MyCreativityOutlet@users.noreply.github.com>
Co-authored-by: jebus <lorsbach@envipath.com>
Co-authored-by: Tim Lorsbach <tim@lorsba.ch>
Reviewed-on: enviPath/enviPy#171
Reviewed-by: jebus <lorsbach@envipath.com>
Co-authored-by: liambrydon <lbry121@aucklanduni.ac.nz>
Co-committed-by: liambrydon <lbry121@aucklanduni.ac.nz>
# Summary
I have introduced a new base `class Dataset` in `ml.py` which all datasets should subclass. It stores the dataset as a polars DataFrame with the column names and number of columns determined by the subclass. It implements generic methods such as `add_row`, `at`, `limit` and dataset saving. It also details abstract methods required by the subclasses. These include `X`, `y` and `generate_dataset`.
There are two subclasses that currently exist. `RuleBasedDataset` for the MLRR models and `EnviFormerDataset` for the enviFormer models.
# Old Dataset to New RuleBasedDataset Functionality Translation
- [x] \_\_init\_\_
- self.columns and self.num_labels moved to base Dataset class
- self.data moved to base class with name self.df along with initialising from list or from another DataFrame
- struct_features, triggered and observed remain the same
- [x] \_block\_indices
- function moved to base Dataset class
- [x] structure_id
- stays in RuleBasedDataset, now requires an index for the row of interest
- [x] add_row
- moved to base Dataset class, now calls add_rows so one or more rows can be added at a time
- [x] times_triggered
- stays in RuleBasedDataset, now does a look up using polars df.filter
- [x] struct_features (see init)
- [x] triggered (see init)
- [x] observed (see init)
- [x] at
- removed in favour of indexing with getitem
- [x] limit
- removed in favour of indexing with getitem
- [x] classification_dataset
- stays in RuleBasedDataset, largely the same just with new dataset construction using add_rows
- [x] generate_dataset
- stays in RuleBasedDataset, largely the same just with new dataset construction using add_rows
- [x] X
- moved to base Dataset as @abstract_method, RuleBasedDataset implementation functionally the same but uses polars
- [x] trig
- stays in RuleBasedDataset, functionally the same but uses polars
- [x] y
- moved to base Dataset as @abstract_method, RuleBasedDataset implementation functionally the same but uses polars
- [x] \_\_get_item\_\_
- moved to base dataset, now passes item to the dataframe for polars to handle
- [x] to_arff
- stays in RuleBasedDataset, functionally the same but uses polars
- [x] \_\_repr\_\_
- moved to base dataset
- [x] \_\_iter\_\_
- moved to base Dataset, now uses polars iter_rows
# Base Dataset class Features
The following functions are available in the base Dataset class
- init - Create the dataset from a list of columns and data in format list of list. Or can create a dataset from a polars Dataframe, this is essential for recreating itself during indexing. Can create an empty dataset by just passing column names.
- add_rows - Add rows to the Dataset, we check that the new data length is the same but it is presumed that the column order matches the existing dataframe
- add_row - Add one row, see add_rows
- block_indices - Returns the column indices that start with the given prefix
- columns - Property, returns dataframe.columns
- shape - Property, returns dataframe.shape
- X - Abstract method to be implemented by the subclasses, it should represent the input to a ML model
- y - Abstract method to be implemented by the subclasses, it should represent the target for a ML model
- generate_dataset - Abstract and static method to be implemented by the subclasses, should return an initialised subclass of Dataset
- iter - returns the iterable from dataframe.iter_rows()
- getitem - passes the item argument to the dataframe. If the result of indexing the dataframe is another dataframe, the new dataframe is packaged into a new Dataset of the same subclass. If the result of indexing is something else (int, float, polar Series) return the result.
- save - Pickle and save the dataframe to the given path
- load - Static method to load the dataset from the given path
- to_numpy - returns the dataframe as a numpy array. Required for compatibility with training of the ECC model
- repr - return a representation of the dataset
- len - return the length of the dataframe
- iter_rows - Return dataframe.iterrows with arguments passed through. Mainly used to get the named iterable which returns rows of the dataframe as dict of column names: column values instead of tuple of column values.
- filter - pass to dataframe.filter and recreates self with the result
- select - pass to dataframe.select and recreates self with the result
- with_columns - pass to dataframe.with_columns and recreates self with the result
- sort - pass to dataframe.sort and recreates self with the result
- item - pass to dataframe.item
- fill_nan - fill the dataframe nan's with value
- height - Property, returns the height (number of rows) of the dataframe
- [x] App domain
- [x] MACCS alternatives
Co-authored-by: Liam Brydon <62733830+MyCreativityOutlet@users.noreply.github.com>
Reviewed-on: enviPath/enviPy#184
Reviewed-by: jebus <lorsbach@envipath.com>
Co-authored-by: liambrydon <lbry121@aucklanduni.ac.nz>
Co-committed-by: liambrydon <lbry121@aucklanduni.ac.nz>
The caching is now finished. The cache is created in `settings.py` giving us the most flexibility for using it in the future.
The cache is currently updated/accessed by `tasks.py/get_ml_model` which can be called from whatever task needs to access ml models in this way (currently, `predict` and `predict_simple`).
This implementation currently caches all ml models including the relative reasoning. If we don't want this and only want to cache enviFormer, i can change it to that. However, I don't think there is a harm in having the other models be cached as well.
Co-authored-by: Liam Brydon <62733830+MyCreativityOutlet@users.noreply.github.com>
Reviewed-on: enviPath/enviPy#156
Co-authored-by: liambrydon <lbry121@aucklanduni.ac.nz>
Co-committed-by: liambrydon <lbry121@aucklanduni.ac.nz>
## Changes
- I have finished the backend integration of EnviFormer (#19), this includes, dataset building, model finetuning, model evaluation and model prediction with the finetuned model.
- `PackageBasedModel` has been adjusted to be more abstract, this includes making the `_save_model` method and making `compute_averages` a static class function.
- I had to bump the python-version in `pyproject.toml` to >=3.12 from >=3.11 otherwise uv failed to install EnviFormer.
- The default EnviFormer loading during `settings.py` has been removed.
## Future Fix
I noticed you have a little bit of code in `PackageBasedModel` -> `evaluate_model` for using the `eval_packages` during evaluation instead of train/test splits on `data_packages`. It doesn't seem finished, I presume we want this for all models, so I will take care of that in a new branch/pullrequest after this request is merged.
Also, I haven't done anything for a POST request to finetune the model, I'm not sure if that is something we want now.
Co-authored-by: Liam Brydon <62733830+MyCreativityOutlet@users.noreply.github.com>
Reviewed-on: enviPath/enviPy#141
Reviewed-by: jebus <lorsbach@envipath.com>
Co-authored-by: liambrydon <lbry121@aucklanduni.ac.nz>
Co-committed-by: liambrydon <lbry121@aucklanduni.ac.nz>
Bump Python Version to 3.12
Make use of "epauth" optional
Cache `srs` property of rules to speed up apply
Adjust view names for use of `reverse()`
Fix Views for Scenario Attachments
Added Simply Compare View/Template to identify differences between rdkit and ambit
Make migrations consistent with tests + compare
Fixes#76
Set default year for Scenario Modal
Fix html tags for package description
Added Tests for Pathway / Rule
Added remove stereo for apply
Co-authored-by: Tim Lorsbach <tim@lorsba.ch>
Reviewed-on: enviPath/enviPy#132