Date post: | 27-Jun-2015 |
Upload: | brian-bissett |
Advanced Excel Technologies in Early Development Applications
Brian Bissett
Molecular Properties Group
Pfizer Global Research & Development
Groton CT
Presentation Overview
• Advantages/Disadvantages of Excel
– The Good, the Inadequate, the Aborted.
• Example Applications
– ELogD Assay
– Solubility Assay
– Stability Assay
• Proven Techniques
– Developing Specifications
– Windowing
– 3 σ’s and you’re out.
– DDE, SendKeys, OLE
• Demos & Wrap Up
The Good
• Excel is Ubiquitous.
• Excel can read a wide variety of File Formats.
• Excel can write a wide variety of File Types (*.xls, *.csv, *.txt, etc.)
• Excel can communicate with other applications through both DDE (although no longer “officially” supported) and the more up-to-date OLE protocol.
• Excel has the VBA macro language built in and has the most comprehensive “toolkit” of properties and methods available in the commercial spreadsheet market.
The Inadequate
• The “Object” model has “holes” in it, especially with regard to the autocomplete feature.
• Higher Math is a problem.
• It Crashes (a lot).
• Misbehavior (worse than crashing): Properties and Methods which should be available suddenly no longer function. The Cure: Reboot Windows.
The Aborted
Memory Leaks in Excel necessitate numerous reboots of the Windows Operating System when doing intensive development work.
An example of one of the many instances when Excel self aborted its operation due to memory issues present in the application.
What Tasks should be Automated with Excel?
• Data
– Analysis
– Extraction
– Parsing
– Reporting
– Uploading
• Reports
• Laboratory Notebook Entries
Specifications – The Basics
• Inputs
– Files
  • Data Files from Machines
  • Files with Information from In-house.
– GUI
• Parameters to be Calculated
– What needs to be calculated.
– Where do I get the information required to calculate it?
• Outputs
– Reports
– Files
  • Database Upload
  • Laboratory Notebooks
  • Reports
ELogD Automated Analysis Macro - Requirements
• Load a comma delimited data file from an Agilent HPLC.
• Sort the data in the file.
• Determine the largest peak in a data series.
• Extract the retention time corresponding to the largest peak.
• Perform some calculations (regression, formulas).
• Prepare a Report.
• Prepare an Uploadable file for the corporate database.
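The first few requirements (load, sort, find the largest peak, extract its retention time) can be sketched as follows. This is a minimal illustration in Python rather than Excel VBA. The column names `peak`, `ret_time`, and `area` are hypothetical and not the actual Agilent export layout, and the sample values are borrowed from the retention-time example used later in this deck.

```python
import csv
import io


def largest_peak(csv_text):
    """Return (retention_time, area) for the peak with the largest area."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Convert the text fields to numbers and sort by retention time,
    # mirroring the "sort the data in the file" requirement.
    rows = sorted(
        ((float(r["ret_time"]), float(r["area"])) for r in reader),
        key=lambda row: row[0],
    )
    if not rows:
        raise ValueError("no peaks found in the data file")
    # The largest peak is taken to be the one with the greatest area.
    return max(rows, key=lambda row: row[1])


# Hypothetical comma delimited content standing in for an instrument file.
SAMPLE = """peak,ret_time,area
2,2.25,768
1,1.24,1024
3,4.45,202
"""

ret_time, area = largest_peak(SAMPLE)
print(ret_time, area)
```

Reading from a string keeps the sketch self-contained; a real macro would open the instrument file instead.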
Kinetic Solubility Macro - Requirements
• Load in several comma delimited data files from a Labsystems Platereader.
• Load an Excel Spreadsheet which contains information from the Candidate Enhancement Group about the Compounds to be Assayed.
• Average successive well readings and remove “outlier” values.
• Determine wells where light scattering indicates compound has come out of solution.
• Determine Corresponding Solubility.
• Prepare a Report.
• Prepare an Uploadable file for the corporate database.
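One way the averaging and solubility steps might look is sketched below in Python, for illustration only. The deck does not state the decision rule, so everything here is an assumption: wells are taken to be ordered by increasing concentration, a fixed scattering `threshold` marks precipitation, and the reported solubility is the highest concentration tested before the first well that exceeds it. All names and numbers are hypothetical.

```python
from statistics import mean


def average_replicates(readings_per_well):
    """Average the successive readings taken for each well."""
    return [mean(readings) for readings in readings_per_well]


def kinetic_solubility(concentrations, signals, threshold):
    """Return the highest concentration below the first scattering well.

    Returns None if even the lowest concentration scatters, and the top
    concentration if no well exceeds the threshold.
    """
    last_clear = None
    for concentration, signal in zip(concentrations, signals):
        if signal > threshold:
            return last_clear
        last_clear = concentration
    return last_clear


# Hypothetical plate data: concentrations and two readings per well.
concentrations = [1, 3, 10, 30, 100]
readings = [
    [0.010, 0.012],
    [0.011, 0.009],
    [0.012, 0.010],
    [0.150, 0.170],
    [0.400, 0.420],
]

signals = average_replicates(readings)
print(kinetic_solubility(concentrations, signals, threshold=0.05))
```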
Stability Assay Macro - Requirements
• Load in several comma delimited data files from an ESA CoulArray instrument.
• Load an Excel Spreadsheet which contains information from the Candidate Enhancement Group about the Compounds to be Assayed.
• Check for the presence of a UV signal for each sample.
• Find Dominant Potentials (DP’s) and Potential Dominant Potentials (PDP’s).
• Assign a Rank in terms of stability.
• Prepare a Report.
• Prepare an Uploadable file for the corporate database.
Developing Specifications
• Every Assay begins with an Idea.
• The Idea is tested to check its validity.
• If the Idea is feasible, it will go through a period of refinement until a process has been developed.
Common Pitfall
The macro meets the designed specification but it does not extract the parameters the user wanted.
The Basic Problem
“People by their nature tend to be flexible in interpreting data while algorithms tend to be very rigid (by design) in analyzing data.”
Example
Agreed upon specification:
Extract the area corresponding to a retention time within ± 1.0% of the given retention time.
Peak   Ret. Time   Area
1      1.24        1024
2      2.25        768
3      4.45        202
Further suppose the 2nd peak is the peak of interest, with an ideal retention time of 2.25 and an area of 768.
Rigid Window Limits
If the “given” retention time is above 2.2275 and below 2.2725, then the area of 768 will be extracted by the program.
Hence, a rigid window has been formed based on a value ± 1.0% of the ideal retention time.
Limit   Ret. Time
lower   2.2275
upper   2.2725

Peak   Ret. Time   Area
1      1.24        1024
2      2.25        768
3      4.45        202
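The rigid window arithmetic can be checked with a short Python sketch (used here for illustration instead of VBA). The function names are hypothetical. The window is the ideal retention time ± 1.0%, and a peak is extracted only if its retention time lies strictly inside those limits.

```python
def rigid_window(ideal_rt, tolerance=0.01):
    """Return the (lower, upper) limits of a fixed window around ideal_rt."""
    half_width = ideal_rt * tolerance
    return ideal_rt - half_width, ideal_rt + half_width


def extract_area(peaks, ideal_rt, tolerance=0.01):
    """Return the area of the first peak inside the window, else None."""
    lower, upper = rigid_window(ideal_rt, tolerance)
    for ret_time, area in peaks:
        if lower < ret_time < upper:
            return area
    return None


# Peak table from the slide as (retention time, area) pairs.
peaks = [(1.24, 1024), (2.25, 768), (4.45, 202)]

lower, upper = rigid_window(2.25)
print(round(lower, 4), round(upper, 4))
print(extract_area(peaks, 2.25))
```

With an ideal retention time of 2.25, the limits round to 2.2275 and 2.2725, and the area 768 is returned because the second peak sits inside the window.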
The Problem with Rigid Window Limits
But what happens when the given RT is 2.2265 or 2.2735?
Since the RT falls outside the Rigid Window Limits, no Area will be extracted.
While this meets the specification, a scientist will invariably respond with the complaint on the next slide.
The Classic Complaints
“Your Macro doesn’t work. I would have extracted the area corresponding to Retention Time X.”
To the scientist it doesn’t matter that Retention Time X falls outside of the specified window. If that is the value he or she would have chosen, that is the value he or she expects to see.
Algorithms, however, don’t care what you want to see. They merely report that which falls within the given specifications or parameters.
The Solution: Window Widening
• Rather than have a fixed window for a parameter to fall into, a field of ranges can be set up for a parameter to fall within. If the parameter falls within any of the ranges it will be picked up.
• An example of such ranges could be:
– Ideal
– High
– Max
• The program would first scan for the parameter to be extracted within the “Ideal” range.
The Solution: Window Widening
• If the parameter cannot be found within the “Ideal” range, the program then begins searching for an appropriate parameter by Widening the Window.
• A maximum window size must be set as well as a delta (or increment) for the window to be widened on each successive pass of the search.
• Recursive calls are made to the searching subroutine widening the window on each successive pass by the corresponding delta.
• The search is complete when a parameter is found to extract or the maximum window size is reached.
• The extracted parameter can be color coded in the report to reflect the Range from which it was extracted.
The Solution: Window Widening
Limit        Ret. Time
low max      2.0250
low flag     2.1375
low ideal    2.2275
high ideal   2.2725
high flag    2.3625
high max     2.4750
[Figure: Window Range diagram with nested windows marked i, h, and m]
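The recursive widening search described above might be sketched as follows, in Python for illustration rather than VBA. The function names, the 1% starting window, the 1% delta, and the 10% maximum are assumptions chosen to match the limits in the table (± 1%, ± 5%, and ± 10% of the ideal retention time of 2.25). Mapping the 5% band to the “High” range is likewise an assumption.

```python
def find_peak(peaks, ideal_rt, window_pct=1.0, delta_pct=1.0, max_pct=10.0):
    """Recursively widen the window until a peak is found or max_pct is hit.

    Returns (ret_time, area, window_pct) for the closest peak inside the
    current window, or None if nothing lies within the maximum window.
    """
    half_width = ideal_rt * window_pct / 100.0
    inside = [p for p in peaks if abs(p[0] - ideal_rt) <= half_width]
    if inside:
        ret_time, area = min(inside, key=lambda p: abs(p[0] - ideal_rt))
        return ret_time, area, window_pct
    if window_pct >= max_pct:
        return None  # maximum window size reached without a match
    wider = min(window_pct + delta_pct, max_pct)
    return find_peak(peaks, ideal_rt, wider, delta_pct, max_pct)


def classify(window_pct):
    """Name the range a result came from, e.g. for color coding a report."""
    if window_pct <= 1.0:
        return "ideal"
    if window_pct <= 5.0:
        return "high"
    return "max"


# The second peak has drifted just outside the rigid 1% window.
peaks = [(1.24, 1024), (2.2265, 768), (4.45, 202)]
result = find_peak(peaks, ideal_rt=2.25)
print(result, classify(result[2]))
```

Here the retention time 2.2265 misses the 1% window but is picked up on the second pass at 2%, so the area 768 is still extracted and can be flagged as coming from outside the ideal range.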
3 σ’s and You’re Out.
Outlier Removal
When analyzing multiple data series it is best to remove outlier values, those more than 3 σ’s (standard deviations) from the mean. This is an especially useful tool when analyzing solubility data (scattering).
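A minimal sketch of the 3 σ rule, written in Python for illustration, is shown below. The function name and the sample readings are hypothetical, and the sample standard deviation is assumed since the deck does not say which form is used.

```python
from statistics import mean, stdev


def remove_outliers(values, n_sigma=3.0):
    """Drop values lying more than n_sigma standard deviations from the mean."""
    if len(values) < 3:
        return list(values)  # too few points to judge an outlier
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return list(values)  # all values identical, nothing to remove
    return [v for v in values if abs(v - mu) <= n_sigma * sigma]


# Hypothetical scattering readings with one aberrant value.
readings = [9.0, 10.0, 11.0] * 6 + [10.0, 100.0]
cleaned = remove_outliers(readings)
print(len(readings), len(cleaned))
```

Note that with the sample standard deviation a single aberrant value among n readings can deviate by at most (n − 1)/√n standard deviations, so the 3 σ test cannot flag a lone outlier unless there are at least 11 readings in the series.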
For the tasks Excel doesn’t Excel At.
• Excel, like Lotus, evolved as a “bean counting” application, not an application for scientific development.
• As Excel began to be utilized for scientific development, more and more add-in (or third party) applications became available to enhance Excel’s limited capabilities.
• Some of the better third party products can be found at these links:
http://www.octavian.com/excel.html
http://www.add-ins.com/assistnt.htm
http://j-walk.com/ss/
Using DDE and OLE
• In addition to third party applications, it is also possible to control another application remotely from Excel if it supports DDE or OLE automation.
• Utilizing a third party application is a must for tasks such as:
– Curve Fitting
– Generating “Nice” Plots and Graphs
– “Higher” Math, FFT’s, Matrices, Complex Numbers
– Statistical Functions such as ANOVA
– Linear Programming
DDE Example
Here is an example in which Excel uses the program Origin to curve fit some sample data.
http://www.originlab.com/
More Information Available in My Textbook
http://www.crcpress.com/
http://www.pharmalabauto.com/