Commit Graph

158 Commits

Author SHA1 Message Date
97626337aa chore: add prettier formatting to html 2025-11-10 18:30:07 +13:00
2aded2ddd7 Merge remote-tracking branch 'origin/develop' into feature/frontend_update 2025-11-10 17:52:00 +13:00
e26d5a21e3 [Enhancement] Refactor Dataset (#184)
# Summary
I have introduced a new base `class Dataset` in `ml.py` which all datasets should subclass. It stores the dataset as a polars DataFrame, with the column names and number of columns determined by the subclass. It implements generic methods such as `add_row`, `at`, `limit` and dataset saving. It also declares the abstract methods that subclasses must implement: `X`, `y` and `generate_dataset`.

Two subclasses currently exist: `RuleBasedDataset` for the MLRR models and `EnviFormerDataset` for the enviFormer models.

# Old Dataset to New RuleBasedDataset Functionality Translation

- [x] \_\_init\_\_
    - self.columns and self.num_labels moved to base Dataset class
    - self.data moved to the base class as self.df, together with initialisation from a list or from another DataFrame
    - struct_features, triggered and observed remain the same
- [x] \_block\_indices
    - function moved to base Dataset class
- [x] structure_id
    - stays in RuleBasedDataset, now requires an index for the row of interest
- [x] add_row
    - moved to base Dataset class, now calls add_rows so one or more rows can be added at a time
- [x] times_triggered
    - stays in RuleBasedDataset, now performs a lookup using polars df.filter
- [x] struct_features (see init)
- [x] triggered (see init)
- [x] observed (see init)
- [x] at
    - removed in favour of indexing with getitem
- [x] limit
    - removed in favour of indexing with getitem
- [x] classification_dataset
    - stays in RuleBasedDataset, largely the same just with new dataset construction using add_rows
- [x] generate_dataset
    - stays in RuleBasedDataset, largely the same just with new dataset construction using add_rows
- [x] X
    - moved to base Dataset as @abstractmethod, RuleBasedDataset implementation functionally the same but uses polars
- [x] trig
    - stays in RuleBasedDataset, functionally the same but uses polars
- [x] y
    - moved to base Dataset as @abstractmethod, RuleBasedDataset implementation functionally the same but uses polars
- [x] \_\_getitem\_\_
    - moved to base Dataset, now passes item to the dataframe for polars to handle
- [x] to_arff
    - stays in RuleBasedDataset, functionally the same but uses polars
- [x] \_\_repr\_\_
    - moved to base Dataset
- [x] \_\_iter\_\_
    - moved to base Dataset, now uses polars iter_rows

# Base Dataset class Features
The following functions are available in the base Dataset class:

- init - Creates the dataset from a list of column names and data given as a list of lists, or from an existing polars DataFrame; the latter is essential for the dataset to recreate itself during indexing. An empty dataset can be created by passing only column names.
- add_rows - Adds rows to the Dataset. The length of the new data is checked, but the column order is presumed to match the existing dataframe.
- add_row - Adds one row, see add_rows
- block_indices - Returns the indices of the columns whose names start with the given prefix
- columns - Property, returns dataframe.columns
- shape - Property, returns dataframe.shape
- X - Abstract method to be implemented by the subclasses, it should represent the input to a ML model
- y - Abstract method to be implemented by the subclasses, it should represent the target for a ML model
- generate_dataset - Abstract and static method to be implemented by the subclasses, should return an initialised subclass of Dataset
- iter - returns the iterable from dataframe.iter_rows()
- getitem - Passes the item argument to the dataframe. If the result of indexing is another dataframe, it is packaged into a new Dataset of the same subclass. Any other result (int, float, polars Series) is returned as-is.
- save - Pickle and save the dataframe to the given path
- load - Static method to load the dataset from the given path
- to_numpy - returns the dataframe as a numpy array. Required for compatibility with training of the ECC model
- repr - return a representation of the dataset
- len - return the length of the dataframe
- iter_rows - Returns dataframe.iter_rows with arguments passed through. Mainly used to get the named iterable, which returns each row of the dataframe as a dict of column names to column values instead of a tuple of column values.
- filter - passes to dataframe.filter and recreates the dataset with the result
- select - passes to dataframe.select and recreates the dataset with the result
- with_columns - passes to dataframe.with_columns and recreates the dataset with the result
- sort - passes to dataframe.sort and recreates the dataset with the result
- item - passes to dataframe.item
- fill_nan - fills the NaN values in the dataframe with the given value
- height - Property, returns the height (number of rows) of the dataframe

- [x] App domain
- [x] MACCS alternatives

Co-authored-by: Liam Brydon <62733830+MyCreativityOutlet@users.noreply.github.com>
Reviewed-on: enviPath/enviPy#184
Reviewed-by: jebus <lorsbach@envipath.com>
Co-authored-by: liambrydon <lbry121@aucklanduni.ac.nz>
Co-committed-by: liambrydon <lbry121@aucklanduni.ac.nz>
2025-11-07 08:46:17 +13:00
f5133c1980 fix: remove obsolete page id 2025-11-06 10:36:04 +13:00
7fbc49afd3 chore: update citations 2025-11-05 17:50:56 +13:00
a087a518f6 chore: remove incorrect license header 2025-11-05 17:39:21 +13:00
881e0e6798 chore: fix typo 2025-11-05 17:38:52 +13:00
2eab66e9ee refactor: added meta.site_id for matomo 2025-11-05 17:37:44 +13:00
ab927b11a2 refactor: remove dependency-groups 2025-11-05 17:36:43 +13:00
fde60c3ad3 refactor: remove optional stubs 2025-11-05 17:35:45 +13:00
61a43da822 refactor: set enviformer to main 2025-11-05 17:34:31 +13:00
211ebfd19b refactor: remove enviformer loading in settings 2025-11-05 17:33:41 +13:00
06a6c23d05 fix: add tailwindcss/cli 2025-11-05 17:30:15 +13:00
3536a14e47 Merge remote-tracking branch 'origin/develop' into feature/frontend_update 2025-11-05 17:25:27 +13:00
98d62e1d1f [Feature] Make Matomo Site ID configurable via .env (#183)
Co-authored-by: Tim Lorsbach <tim@lorsba.ch>
Reviewed-on: enviPath/enviPy#183
2025-11-05 10:19:07 +13:00
7eb4029ac9 refactor: add public_mode for static pages to remove nav elements 2025-11-04 19:34:04 +13:00
7b38fc2e37 fix: remove jobs clash 2025-11-04 19:33:31 +13:00
4834348454 Merge remote-tracking branch 'origin/develop' into feature/frontend_update 2025-10-30 14:02:57 +13:00
13ed86a780 [Feature] Identify Missing Rules (#177)
Fixes #97
Co-authored-by: Tim Lorsbach <tim@lorsba.ch>
Reviewed-on: enviPath/enviPy#177
2025-10-30 00:47:45 +13:00
f1b4c5aadb [Feature] Adding list_display to various django admin sites (#180)
Co-authored-by: Tim Lorsbach <tim@lorsba.ch>
Reviewed-on: enviPath/enviPy#180
2025-10-29 22:26:28 +13:00
0a52b12f02 fix: handle line-clamp issue with news 2025-10-29 19:59:45 +13:00
14571d23a6 docs: add pnpm note 2025-10-29 18:23:28 +13:00
ea8475f0e2 docs: update README regarding dev command 2025-10-29 18:07:56 +13:00
442d139217 chore: remove obsolete doc 2025-10-29 18:06:21 +13:00
1ba511a31d chore: minimize fallback data 2025-10-29 18:02:30 +13:00
5d89341955 chore: delete obsolete runserver command 2025-10-29 18:01:21 +13:00
5f390ac2d2 fix: reenable modal showing 2025-10-29 17:52:10 +13:00
46d21e60d2 chore: add example input to search 2025-10-29 16:36:01 +13:00
13be240226 feat: working search redirect 2025-10-29 16:30:00 +13:00
167a72f5a3 fix: remove obsolete menu list 2025-10-29 16:01:13 +13:00
1736319bd7 style: update navbar and add browse back 2025-10-29 15:58:17 +13:00
e87aae6bf7 style: add legal footers on login 2025-10-29 12:16:42 +13:00
253523c81f feat: add mock legal (impressum page) 2025-10-29 12:16:24 +13:00
15809a4ccf style: update login pages 2025-10-29 12:01:35 +13:00
b7e1dac66a feat: add mockup for static pages 2025-10-29 11:13:31 +13:00
849ebbe7f8 style: update hero 2025-10-29 11:13:07 +13:00
c5dcb36452 fix: dev command working 2025-10-29 10:59:22 +13:00
37e0e18a28 [Fix] Fixed Incremental Prediction Typo (#176)
Co-authored-by: Tim Lorsbach <tim@lorsba.ch>
Reviewed-on: enviPath/enviPy#176
2025-10-28 23:29:08 +13:00
de44c22606 [Migration] Added missing Migration for JobLog (#175)
Co-authored-by: Tim Lorsbach <tim@lorsba.ch>
Reviewed-on: enviPath/enviPy#175
2025-10-27 22:41:16 +13:00
a952c08469 [Feature] Basic logging of Jobs, Model Evaluation (#169)
Co-authored-by: Tim Lorsbach <tim@lorsba.ch>
Reviewed-on: enviPath/enviPy#169
2025-10-27 22:34:05 +13:00
551cfc7768 [Enhancement] Create ML Models (#173)
## Changes

- Ability to change the threshold from a command line argument.
- Names of data packages included in model name
- Names of data, rule and eval packages included in the model description
- EnviFormer models are now viewable on the admin site
- Ignore CO2 for training and evaluating EnviFormer

Co-authored-by: Liam Brydon <62733830+MyCreativityOutlet@users.noreply.github.com>
Reviewed-on: enviPath/enviPy#173
Reviewed-by: jebus <lorsbach@envipath.com>
Co-authored-by: liambrydon <lbry121@aucklanduni.ac.nz>
Co-committed-by: liambrydon <lbry121@aucklanduni.ac.nz>
2025-10-23 06:20:22 +13:00
16a991220a Slim down Navbar 2025-10-22 12:10:46 +13:00
05c8e130b1 Add documentation link 2025-10-22 12:08:31 +13:00
4fd7856043 Remove fields from navbar 2025-10-22 12:07:55 +13:00
f5efaf1b3f Change to icon help button 2025-10-22 12:07:42 +13:00
4a2ef3a237 Add search icon mockup 2025-10-22 11:37:23 +13:00
d097013853 Add discourse API for better data retrieval 2025-10-22 11:37:13 +13:00
63cc7cf460 Change partners location 2025-10-22 11:36:25 +13:00
8fda2577ee [Feature] Dump/Restore of enviFormer Models (#170)
Dump:
`./manage.py dump_enviformer d544303c-a1ca-439d-b036-5e3413ce4a48 --output test.tar.gz`

Restore:
`./manage.py load_enviformer test.tar.gz 1062eb09-5ec7-4bdd-a8f2-ae0252eb4b06`

Co-authored-by: Tim Lorsbach <tim@lorsba.ch>
Reviewed-on: enviPath/enviPy#170
2025-10-22 10:39:22 +13:00
819a94aced [Fix] Catch Exception for Adding Structures / Show PubChem Substances (#168)
Fixes #163
Fixes #165

Co-authored-by: Tim Lorsbach <tim@lorsba.ch>
Reviewed-on: enviPath/enviPy#168
2025-10-22 01:13:06 +13:00