alvaModel is a software tool for creating Quantitative Structure-Activity/Property Relationship (QSAR/QSPR) models. These models can be used to predict the biological, physicochemical and environmental properties of chemicals.
Build and Deploy
Alvascience’s solution for building and deploying QSAR/QSPR regression models consists of two pieces of software: alvaModel and alvaRunner. alvaRunner lets you apply models created with alvaModel to a new set of molecules without needing any other software tool.
This solution separates the training of the models from their deployment. It therefore lets you share your models with other parties (e.g., to demonstrate their reproducibility) or use models created by others (e.g., to test a model described in a scientific paper).
Using alvaModel you can train QSAR/QSPR regression models using the descriptors and fingerprints previously calculated in alvaDesc. The target variable you want to predict can be imported from an external text file.
Graphical user interface
An easy-to-use GUI lets you create your models. A simple step-by-step procedure (wizard) guides you through selecting the descriptors and fingerprints you want to use.
Feature selection using Genetic Algorithms
Since the number of descriptors in an alvaDesc project can be large (up to 5000), alvaModel can perform a feature selection based on Genetic Algorithms to find the best models according to a defined score (e.g., R2, Q2, RMSE).
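To make the idea concrete, here is a minimal Python sketch of genetic-algorithm feature selection. It is not alvaModel's implementation: the operators (truncation selection, one-point crossover, bit-flip mutation, elitism), the parameter values, the penalised R2 score, and all function names are assumptions chosen for illustration, and the data is synthetic.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ols_r2(X, y, cols):
    """R2 of an OLS fit (with intercept) on the selected descriptor columns."""
    rows = [[1.0] + [x[c] for c in cols] for x in X]
    p = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    pred = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def fitness(mask, X, y, penalty=0.01):
    """Assumed score: training R2 minus a small penalty per selected descriptor."""
    cols = [i for i, bit in enumerate(mask) if bit]
    return ols_r2(X, y, cols) - penalty * len(cols)

def genetic_selection(X, y, pop_size=20, generations=30, p_mut=0.1, seed=0):
    rng = random.Random(seed)
    n = len(X[0])
    # Each individual is a 0/1 mask: 1 means "descriptor is in the model".
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, X, y), reverse=True)
        history.append(fitness(pop[0], X, y))
        parents = pop[: pop_size // 2]        # truncation selection
        children = [pop[0][:]]                # elitism: keep the best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)         # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
    best = max(pop, key=lambda m: fitness(m, X, y))
    return best, fitness(best, X, y), history

# Synthetic data: the target depends only on descriptors 0 and 3.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in range(6)] for _ in range(25)]
y = [2 * x[0] - 3 * x[3] + data_rng.gauss(0, 0.1) for x in X]
mask, score, history = genetic_selection(X, y)
print(mask, round(score, 3))
```

The per-descriptor penalty is there because training R2 never decreases when descriptors are added; a cross-validated score such as Q2 would serve the same purpose.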
Several feature reduction tools are available to cut down the number of descriptors used to train your model (e.g., filters on constant values, standard deviation, and pair correlation).
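As a rough illustration of what such filters do, the following Python sketch drops constant or near-constant descriptors and then drops any descriptor that is highly correlated with one already kept. The function names, the threshold values, and the descriptor names in the example are hypothetical; alvaModel's own filters and defaults may differ.

```python
from math import sqrt
from statistics import pstdev

def pearson(a, b):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)

def reduce_features(columns, min_std=1e-4, max_corr=0.95):
    """columns maps descriptor name -> values; returns the names that survive."""
    kept = []
    for name, values in columns.items():
        # Constant-value / standard-deviation filter.
        if pstdev(values) < min_std:
            continue
        # Pair-correlation filter: skip descriptors redundant with a kept one.
        if any(abs(pearson(values, columns[k])) >= max_corr for k in kept):
            continue
        kept.append(name)
    return kept

# Hypothetical descriptor table for four molecules.
columns = {
    "MW": [100.0, 150.0, 200.0, 250.0],
    "const": [1.0, 1.0, 1.0, 1.0],             # removed: constant
    "MW_copy": [200.0, 300.0, 400.0, 500.0],   # removed: correlated with MW
    "logP": [1.2, 0.3, 2.5, 0.9],
}
print(reduce_features(columns))  # ['MW', 'logP']
```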
Different regression model types are available:
- Ordinary Least Squares (OLS) model
- kNN regression model, specifically a weighted Nearest Neighbour Regression (wNNR) model
- Consensus model defined as the arithmetic mean of the values predicted by the selected models
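The three model types above can be sketched in a few lines of Python. This is an illustration only: the OLS fit is restricted to a single descriptor for brevity, the inverse-distance weighting in the nearest-neighbour model is an assumption (the weighting scheme alvaModel uses is not described here), and all function names are hypothetical. Only the consensus rule, the arithmetic mean of the individual predictions, comes directly from the description above.

```python
from math import dist

def ols_fit(xs, ys):
    """Single-descriptor OLS: returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx

def wnnr_predict(train_X, train_y, x, k=2):
    """k-nearest-neighbour regression with (assumed) inverse-distance weights."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], x))[:k]
    weights = [1.0 / (dist(xi, x) + 1e-9) for xi, _ in nearest]
    return sum(w * yi for w, (_, yi) in zip(weights, nearest)) / sum(weights)

def consensus_predict(predictions):
    """Arithmetic mean of the values predicted by the selected models."""
    return sum(predictions) / len(predictions)

# Toy training set: one descriptor per molecule, target y = x^2.
train_X = [(1.0,), (2.0,), (3.0,), (4.0,)]
train_y = [1.0, 4.0, 9.0, 16.0]
query = (2.5,)

slope, intercept = ols_fit([x[0] for x in train_X], train_y)
ols_pred = slope * query[0] + intercept
knn_pred = wnnr_predict(train_X, train_y, query)
print(ols_pred, knn_pred, consensus_predict([ols_pred, knn_pred]))
```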
The model’s Applicability Domain can be estimated by measuring the similarity between the training dataset and the given molecules. An in/out indication shows whether a molecule lies inside or outside the Applicability Domain. The Applicability Domain methods available are Distance-based (e.g., Average distance) and Leverage (which uses the so-called hat matrix).
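A leverage-based check can be sketched as follows. For a query molecule with descriptor vector x (extended with a constant 1 for the intercept), the leverage is h = x^T (X^T X)^-1 x, where X is the training descriptor matrix. The in/out threshold used below, 3p/n with p model parameters and n training molecules, is a convention often used in the QSAR literature and is an assumption here, as are the function names; it is not taken from alvaModel.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def leverage(train_X, x):
    """h = q^T (X^T X)^-1 q, with an intercept column added to X and q."""
    rows = [[1.0] + list(r) for r in train_X]
    q = [1.0] + list(x)
    p = len(q)
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    z = solve(XtX, q)  # z = (X^T X)^-1 q, without forming the inverse
    return sum(qi * zi for qi, zi in zip(q, z))

def in_domain(train_X, x):
    """Assumed rule: inside the domain if h <= 3 * p / n."""
    n, p = len(train_X), len(x) + 1
    return leverage(train_X, x) <= 3.0 * p / n

# Toy training set with two descriptors per molecule.
train_X = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0),
           (5.0, 6.0), (6.0, 5.0), (2.0, 3.0), (5.0, 4.0)]
print(in_domain(train_X, (3.5, 3.5)))    # query at the training centroid
print(in_domain(train_X, (20.0, -5.0)))  # query far from the training data
```

A query at the centroid of the training set has the smallest possible leverage (1/n), and the value grows as the query moves away from the training data, which is what makes it usable as an in/out indicator.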
The software is 64-bit and is available for Windows, Linux and macOS.
- A key input of the software is a project created using alvaDesc
- The models can be exported to a project that can be applied to a new set of molecules using alvaRunner