Grid-SAFE
• JISC-funded project to build a general-purpose accounting/monitoring solution.
– http://gridsafe.forge.nesc.ac.uk/
• Builds on accounting subsystem from SAFE user administration system used by UK national facilities HPCx/HECToR
Challenges
• Need to work with different HPC technologies
– Different batch systems
– Different middleware
• Need to work with wide variety of different local policies.
• Need to work with both grids and local HPC resources.
• One solution won’t fit all potential users
– Build kit of parts
– Pre-built solutions for common deployment scenarios.
• Key aims
– Modular design, individual functions can be deployed independently
– Behaviour can be customised using plug-ins to implement different service policies.
Overview
Data Formats
• System can consume accounting data in a variety of formats.
• Each format has a plug-in parser module
• New formats can be supported by writing additional parser plug-ins.
• Data is stored in an SQL database.
• Additional policy plug-ins can augment the parser to customise behaviour.
[Diagram labels: Raw Data, Parser, Policy (×3), DB]
Parser
• System can support multiple input formats at the same time.
• Currently supported parsers
– OGF-UR XML
– SGE accounting logfile
– PBS accounting logfile
– EGEE JobManager logfile
– Etc.
• New parsers easy to generate
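A minimal sketch of how one parser plug-in per format might be organised. The registry, the function names and the two-field `csv` format are hypothetical, and the `pbs` line is only a simplified PBS-style layout, not Grid-SAFE's actual parser.

```python
from typing import Callable, Dict

# Hypothetical registry: format name -> parser function returning a dict.
PARSERS: Dict[str, Callable[[str], dict]] = {}

def register_parser(fmt: str):
    """Decorator that adds a parser plug-in to the registry."""
    def wrap(func):
        PARSERS[fmt] = func
        return func
    return wrap

@register_parser("pbs")
def parse_pbs(line: str) -> dict:
    # Simplified PBS-style line: "timestamp;record_type;job_id;key=value ..."
    timestamp, record_type, job_id, message = line.split(";", 3)
    record = {"timestamp": timestamp, "type": record_type, "job_id": job_id}
    for pair in message.split():
        key, _, value = pair.partition("=")
        record[key] = value
    return record

@register_parser("csv")
def parse_csv(line: str) -> dict:
    # Invented three-column format, only to show a second plug-in.
    job_id, user, cpus = line.split(",")
    return {"job_id": job_id, "user": user, "cpus": cpus}

def parse(fmt: str, line: str) -> dict:
    """Dispatch a raw line to the plug-in registered for its format."""
    return PARSERS[fmt](line)

pbs_record = parse("pbs", "03/15/2010 12:00:00;E;42.host;user=alice group=staff")
csv_record = parse("csv", "43,bob,8")
```

Supporting a new format then means registering one more function; the dispatch code is unchanged.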
OGF-UR support
• OGF-UR XML is supported as an interchange format
– Parser plug-in to parse OGF-UR
– Export module to format internal data as OGF-UR
• Grids may want to use only this format for central accounting.
– Local instances could use raw data and generate UR for central processing.
• Various grid communities seem to interpret OGF-UR differently and/or make additional requirements beyond that in the schema
– Required fields
– Different charging models
– Different global username models
– OGF-UR spec allows extensions.
– Specification will also evolve over time.
• Parser/exporter highly configurable to support variations/extensions.
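One way a configurable exporter/parser pair could look, as a sketch only. The field map, function names and internal property names are hypothetical; the namespace and the three flat element names are assumed to match OGF-UR 1.0 and should be checked against the schema actually in use.

```python
import xml.etree.ElementTree as ET

# Namespace assumed to be the OGF-UR 1.0 one; verify against the schema in use.
UR_NS = "http://schema.ogf.org/urf/2003/09/urf"

def export_usage_record(record: dict, field_map: dict) -> str:
    """Format an internal record as a UsageRecord element.

    field_map maps internal property names to UR element names, so a
    community with different required fields only changes the mapping.
    """
    root = ET.Element(f"{{{UR_NS}}}UsageRecord")
    for prop, element_name in field_map.items():
        if prop in record:
            child = ET.SubElement(root, f"{{{UR_NS}}}{element_name}")
            child.text = str(record[prop])
    return ET.tostring(root, encoding="unicode")

def parse_usage_record(xml_text: str, field_map: dict) -> dict:
    """Inverse of export_usage_record for the same flat mapping."""
    root = ET.fromstring(xml_text)
    record = {}
    for prop, element_name in field_map.items():
        child = root.find(f"{{{UR_NS}}}{element_name}")
        if child is not None:
            record[prop] = child.text
    return record

# Hypothetical mapping from internal names to UR elements.
FIELD_MAP = {"start": "StartTime", "end": "EndTime", "machine": "MachineName"}
internal = {
    "start": "2010-03-15T12:00:00Z",
    "end": "2010-03-15T13:00:00Z",
    "machine": "hpc1",
}
xml_text = export_usage_record(internal, FIELD_MAP)
round_trip = parse_usage_record(xml_text, FIELD_MAP)
```

Keeping the mapping as data rather than code is one way to absorb the community-specific variations listed above.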
Use in the grid
[Diagram labels: Grid accounting, Site accounting, Independent UR Generator, XML (×2)]
Report generation module
• Reports can be generated on demand from web interface
• Grid-SAFE uses XML templates to define reports
– Can generate unified reports over multiple data tables containing different types of data
– Tables/charts
– Parameterised reports (e.g. to select user or project).
• Supports reports in multiple output formats
– PDF, HTML, CSV, XML
Report generation speed
• Performance of report generation a particular issue
• Number of database records key to this.
– Need to utilise database effectively. Not acceptable to read all records into memory.
• ~1,000,000 record database table not a problem.
– Current National HPC systems within this range.
– Throughput clusters often have significantly larger record counts due to large numbers of small short jobs.
• Old data can be moved to separate tables.
• Support for daily aggregates via policy plug-in
– Builds secondary accounting table combining similar records.
– For ECDF 51 million records -> 35 thousand aggregates
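To illustrate "utilise database effectively", the sketch below lets the database do the summing so that only one row per user reaches the reporting code. The table and column names are invented for the example, and SQLite stands in for whatever SQL database a deployment uses.

```python
import sqlite3

# Hypothetical schema; an in-memory SQLite database keeps the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE usage_record (user TEXT, day TEXT, cpu_seconds INTEGER)")
conn.executemany(
    "INSERT INTO usage_record VALUES (?, ?, ?)",
    [
        ("alice", "2010-03-15", 100),
        ("alice", "2010-03-15", 250),
        ("alice", "2010-03-16", 50),
        ("bob", "2010-03-15", 400),
    ],
)

def usage_by_user(connection, start_day, end_day):
    """Let the database sum usage; only one row per user reaches Python."""
    cursor = connection.execute(
        "SELECT user, SUM(cpu_seconds), COUNT(*) FROM usage_record "
        "WHERE day BETWEEN ? AND ? GROUP BY user ORDER BY user",
        (start_day, end_day),
    )
    return cursor.fetchall()

report_rows = usage_by_user(conn, "2010-03-15", "2010-03-16")
```

The same query run against a pre-built daily aggregate table would touch far fewer rows, which is the point of the aggregate plug-in.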
Policy plug-ins
• Allow behaviour to be customised to local requirements
• Generate new properties
– E.g. charge values
• Trigger additional processing
– Decrement charging allocations
– Generate aggregate records
– Etc.
• New policies can be written for specific requirements
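A sketch of what a policy plug-in chain could look like, covering the two roles listed above: generating a new property and triggering extra processing. The class names, the `apply` method and the record fields are hypothetical, not Grid-SAFE's actual interface.

```python
class Policy:
    """Hypothetical base class: a policy sees each record as it is loaded."""

    def apply(self, record: dict) -> None:
        raise NotImplementedError


class ChargePolicy(Policy):
    """Generates a new property: a charge from CPU time and a rate."""

    def __init__(self, rate_per_cpu_hour: float):
        self.rate = rate_per_cpu_hour

    def apply(self, record):
        record["charge"] = record["cpu_hours"] * self.rate


class AllocationPolicy(Policy):
    """Triggers additional processing: decrements a project allocation."""

    def __init__(self, allocations: dict):
        self.allocations = allocations

    def apply(self, record):
        self.allocations[record["project"]] -= record["charge"]


def load_record(record, policies):
    """Run every configured policy over a freshly parsed record, in order."""
    for policy in policies:
        policy.apply(record)
    return record


allocations = {"proj1": 100.0}
policies = [ChargePolicy(rate_per_cpu_hour=0.5), AllocationPolicy(allocations)]
loaded = load_record({"project": "proj1", "cpu_hours": 8.0}, policies)
```

Order matters in this sketch: the allocation policy relies on the charge property produced by the policy before it.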
Aggregation Policy
• Generates aggregated records
– Each time a new record is loaded
– Corresponding aggregate is located/created
– Aggregate values updated
• The raw data is also kept and can be used in reports if required.
• Aggregate data can be regenerated if required.
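The locate/create/update cycle described above can be sketched as follows. The grouping key (user, day) and the field names are assumptions for illustration; the real policy works against database tables rather than in-memory structures.

```python
from collections import defaultdict


class DailyAggregationPolicy:
    """Hypothetical in-memory stand-in for the aggregate table."""

    def __init__(self):
        # Key (user, day) -> running totals for that user on that day.
        self.aggregates = defaultdict(lambda: {"jobs": 0, "cpu_seconds": 0})
        self.raw = []  # raw records are kept alongside the aggregates

    def apply(self, record):
        self.raw.append(record)
        key = (record["user"], record["day"])
        aggregate = self.aggregates[key]  # located, or created on first use
        aggregate["jobs"] += 1
        aggregate["cpu_seconds"] += record["cpu_seconds"]

    def regenerate(self):
        """Rebuild every aggregate from the stored raw records."""
        raw, self.raw = self.raw, []
        self.aggregates.clear()
        for record in raw:
            self.apply(record)


policy = DailyAggregationPolicy()
for rec in [
    {"user": "alice", "day": "2010-03-15", "cpu_seconds": 100},
    {"user": "alice", "day": "2010-03-15", "cpu_seconds": 250},
    {"user": "bob", "day": "2010-03-15", "cpu_seconds": 400},
]:
    policy.apply(rec)
before = dict(policy.aggregates)
policy.regenerate()
```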
ClassificationPolicy
• Converts selected fields from raw accounting data into references to separate database table.
– Reduces data footprint.
– Augmenting information can be added to these tables.
• Example:
[Diagram labels: URRecord, DailyAggregate, User, UnixGroup, Site, Institution]
DerivedPolicy
• Defines new properties as expressions over existing properties
• E.g. (EndTime-StartTime)*CPUs
• These expressions can then be used in reports.
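As an illustration, the expression from the slide can be evaluated against a record with a small arithmetic evaluator. This is a sketch using Python's `ast` module, not the expression engine Grid-SAFE uses; the record values are made up, with times given in seconds.

```python
import ast
import operator

# Only basic arithmetic is allowed in this sketch.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expression: str, record: dict):
    """Evaluate an arithmetic expression whose names are record properties."""

    def walk(node):
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Name):
            return record[node.id]
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError(f"unsupported expression element: {ast.dump(node)}")

    return walk(ast.parse(expression, mode="eval").body)


record = {"StartTime": 1000, "EndTime": 4600, "CPUs": 4}
cpu_seconds = evaluate("(EndTime-StartTime)*CPUs", record)
```

Here one hour of wall time on 4 CPUs gives 3600 * 4 = 14400 CPU-seconds.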
LinkPolicy
• Merge data from different sources
– E.g. Batch system logs and middleware logs.
• Each data source is parsed to its own table.
– Primary table parsed first.
– LinkPolicy added to secondary data source.
– Locates corresponding primary record.
– Adds cross reference or copies additional properties to primary record.
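The linking step could be sketched like this, assuming the two sources share a local job identifier. All field names here (`local_job_id`, `grid_job_id`, `middleware_ref`, `vo`) are invented for the example.

```python
# Primary table (e.g. from batch system logs), keyed by local job id.
primary = {
    "42": {"job_id": "42", "user": "alice", "cpu_seconds": 350},
}


def link(secondary_record, primary_table, copy_fields=()):
    """Attach a secondary (e.g. middleware) record to its primary record.

    Returns True if a matching primary record was found.
    """
    match = primary_table.get(secondary_record["local_job_id"])
    if match is None:
        return False
    # Cross reference back to the secondary record.
    match["middleware_ref"] = secondary_record["grid_job_id"]
    # Optionally copy selected properties onto the primary record.
    for field in copy_fields:
        match[field] = secondary_record[field]
    return True


linked = link(
    {"local_job_id": "42", "grid_job_id": "grid-0001", "vo": "physics"},
    primary,
    copy_fields=("vo",),
)
unmatched = link({"local_job_id": "99", "grid_job_id": "grid-0002"}, primary)
```

Parsing the primary table first, as the slide notes, is what guarantees the lookup has something to find.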
Web Services
• RUPI
– Current proposal from OGF RUS-WG
– Web service for the upload of XML usage record.
– Grid-SAFE has an implementation of the current upload service (RUPI).
• RUQI
– Currently working on a proposal for a Query specification
– Aims
– Easy to implement in different code bases.
– Provide sufficient functionality for efficient report generation.
– Long term aim to provide reporting portal that can query any system that implements this interface.