1 DSARCH OVERVIEW Dataset Archiving Utility Overview By Zaihua Ji.

Date post: 04-Jan-2016

DSARCH OVERVIEW

Dataset Archiving Utility Overview

By Zaihua Ji

Outline

-- Definitions and how DSARCH fits into RDA functions
-- Purpose of DSARCH
-- Introduction to DSARCH usage
-- Procedures for using DSARCH to archive data
-- Converting existing datasets
-- Publishing MSS/RDA Server data filelists

RDA Components

Definition of Metadata

-- Metadata that summarizes a dataset, such as author, title, summary, etc.

-- Metadata that defines external properties of RDA files, such as file locations on MSS/RDA Server, sizes, file packaging (tar, COS block, etc.), dataset sub-groups of files, archive file type (P, B, W, …), etc.

-- Metadata that defines internal properties of RDA files, such as data format (GRIB, ASCII, etc.), variables, spatial and temporal ranges, internal metrics (e.g., number of grids or stations), etc.
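As an illustration only, the three kinds of metadata above can be modeled as separate record types. The class and field names below are hypothetical and do not reflect the actual RDADB schema; the sketch only shows which property belongs to which kind of metadata.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DatasetSummary:
    """Summarizes a dataset: author, title, summary, etc. (hypothetical fields)."""
    dsid: str
    title: str
    author: str
    summary: str = ""


@dataclass
class FileExternal:
    """External properties of an RDA file: where and how it is stored."""
    location: str      # path on the MSS or RDA server
    size_bytes: int
    packaging: str     # e.g. "TAR" or "BI.TAR" (COS-blocked, then tarred)
    group: str         # sub-dataset (group) the file belongs to
    archive_type: str  # archive file type, e.g. "P" or "B"


@dataclass
class FileInternal:
    """Internal properties of an RDA file: what the data inside look like."""
    data_format: str                                        # e.g. "GRIB", "ASCII"
    variables: list[str] = field(default_factory=list)
    time_range: tuple[str, str] | None = None               # (start, end) dates
    bbox: tuple[float, float, float, float] | None = None   # W, S, E, N
    metrics: dict[str, int] = field(default_factory=dict)   # e.g. {"grids": 120}
```

Keeping the tiers in separate types makes it explicit that a file's storage details (the kind of information DSARCH manages) can be recorded without knowing anything about the data inside it.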

Metadata Coverage

Current Data Archive Flow

DSARCH Work Flow for Existing Dataset

DSARCH Work Flow for New Dataset

What DSARCH Is For

-- Archives/retrieves data files to/from the MSS and RDA server
-- Records/retrieves dataset metadata (mainly VSN metadata) to/from RDADB
-- Organizes the archived data files of one dataset into sub-datasets, called groups

What DSARCH Does

-- Defines group and dataset information in RDADB
-- Automatically selects an MSS VSN by default; other options are available
-- Archives data files on the MSS and/or RDA server and saves transaction records into RDADB
-- Retrieves dataset/group/file information from RDADB
-- Modifies/corrects information in RDADB
-- Copies MSS files back to local computers (RDA server, bison, etc.)
-- Moves MSS/Web data files from one dataset/group to another
-- Removes files from the MSS or RDA server

RDADB Maintenance Functions

-- Backs up dataset/group/file information from RDADB into the CVS archive
-- Restores dataset/group/file information into RDADB from the CVS archive

Advantages of Using DSARCH

Function | DSARCH Method | Current Method
Minimal number of archive steps | 1 | 3
Auto-generate MSS file names | Yes | No (manually per cb)
Archive onto MSS | Yes | msrcp/mswrite
Archive onto RDA Server | Yes | cp/mv
Auto-record information of archived data files into RDADB | Yes (instant) | No (separate fill utilities, weekly)
Retrieve MSS data files | Yes (uses saved local file names) | msrcp/msread
Delete data files and record changes | Yes | Multiple steps
Support sub-datasets (groups) | Yes | Text file organization
Move data files from one dataset to another | Yes (auto change password) | Multiple steps
Auto-cache published file lists | Yes | No
Generate data download scripts | Yes | No
VSN metadata backup | CVS | SCCS
Concurrent processes | Yes | No

General DSARCH Usage

dsarch [[-DS] dsnnn.n] [Action Option] [Mode Options] [Information Options]

-- Square brackets [] indicate optional arguments
-- There are three option categories: Action, Mode, and Information (Info for short) options
   -- Action options specify which tasks the utility executes
   -- Mode options modify the behavior of the given actions
   -- Info options pass information (one or multiple values) to DSARCH
-- An option can be given by its short or long name, e.g. -DS or -Dataset
-- All Action and Mode options, as well as the Info option -IF (-InputFile), must be given on the command line; all other Info options can be given either on the command line or in one or more input files specified with -IF
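The option rules above can be sketched as follows. Only -DS/-Dataset and -IF/-InputFile come from the slides; treating -DS as an Info option and the placeholder entries -XA/-ExampleAction and -XM/-ExampleMode are assumptions for illustration, not real DSARCH options.

```python
from __future__ import annotations

# Hypothetical option table: short name -> (long name, category).
# Only DS/Dataset and IF/InputFile appear in the slides; the category of DS
# and the XA/XM placeholder entries are assumptions for illustration.
OPTIONS = {
    "DS": ("Dataset", "info"),
    "IF": ("InputFile", "info"),
    "XA": ("ExampleAction", "action"),
    "XM": ("ExampleMode", "mode"),
}


def canonical(option: str) -> tuple[str, str]:
    """Resolve a short or long name ('-DS' or '-Dataset') to (short, category)."""
    name = option.lstrip("-")
    for short, (long_name, category) in OPTIONS.items():
        if name in (short, long_name):
            return short, category
    raise ValueError(f"unknown option: {option}")


def allowed_in_input_file(option: str) -> bool:
    """Action and Mode options, and -IF itself, must be on the command line;
    any other Info option may also come from an -IF input file."""
    short, category = canonical(option)
    return category == "info" and short != "IF"
```

Resolving every name to its short form first means the placement rule is checked once per option, regardless of which spelling the user typed.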

Categories of Action Options

-- Dataset Actions: create, modify, and retrieve dataset information in RDADB
-- Group Actions: create, delete, modify, and retrieve dataset group information in RDADB
-- File Actions:
   -- archive files onto the MSS and RDA server
   -- move and delete files on the MSS and RDA server
   -- create, delete, modify, and retrieve information about data files in RDADB
-- File-Name Actions: generate, release, retrieve, and archive MSS file names (VSN, Volume Serial Number) per RDADB card bank
-- Info Actions: create, modify, and retrieve all information for datasets and groups in datasets, on the MSS and RDA Server
-- Backup Actions: archive, restore, and check CVS backup history information for datasets, groups, and MSS and RDA Server files

Procedure for Creating a New Dataset

-- Create the SCCS dataset archive file with ‘Search & Discovery Metadata’ only (for the dataset main webpage)
-- Create the initial dataset record in RDADB with the utility ‘filldataset’
-- Set the flag for using RDADB to ‘Y’ and modify the dataset info
-- Create group information if needed
-- Archive local files onto the MSS and/or RDA Server
-- Set the flag for using RDADB to ‘P’ or ‘I’, and publish the dataset file lists with the utility ‘publish_filelist’

Procedure to Convert Existing Dataset

-- Set the flag for using RDADB to ‘Y’
-- Edit the VSN section of the SCCS dataset document to create a file named ‘dsnnn.n.sccs’
-- Use the ‘myconvert’ utility to reformat ‘dsnnn.n.sccs’ into ‘dsnnn.n.mss’, an input file designed for RDADB
-- Use DSARCH to enter the input file ‘dsnnn.n.mss’ into RDADB
-- Set the flag for using RDADB to ‘P’ or ‘I’, and use the ‘publish_filelist’ utility to publish the dataset file lists

Convert SCCS Metadata for an Existing Dataset

myconvert dsnnn.n.sccs > dsnnn.n.mss

dsnnn.n.sccs - MSS file information taken from the VSN file section of the SCCS dataset metadata file and modified by inserting conversion control keys for information about the dataset, groups, and files (see the example on a later slide)

dsnnn.n.mss - DSARCH input file holding dataset, group, and/or MSS file information

Conversion Key Categories

-- Dataset Keys: mark public and/or internal MSS dataset notes
-- Group Keys: build up group information, such as group index, group name (ID), group titles, and public and/or internal MSS group notes
-- MSS File Keys: set up MSS file information, such as a format key specifying which information is collected from the description part of a file line, and keys for data and file formats
-- A special key, LB, turns on (e.g. LB<=><br>) or off (LB<=>) an HTML line-break symbol for multi-line notes

Dataset Keys

DM - public MSS dataset note; description lines following ‘DM<=>’ are collected, including empty lines

DI - internal MSS dataset note; description lines following ‘DI<=>’ are collected, including empty lines
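A minimal sketch of how DM and DI notes could be collected from a conversion file. It assumes that a key line has the form KEY<=>value, that a note runs until the next key line, and that the active LB symbol is placed at each line break inside a note. These details are assumptions; the slides do not define myconvert's exact behavior, and collect_notes is a hypothetical name.

```python
from __future__ import annotations

import re

# A key line looks like "KEY<=>value", e.g. "DM<=>" or "LB<=><br>".
KEY_LINE = re.compile(r"^([A-Z]{2,3})<=>(.*)$")


def collect_notes(lines: list[str]) -> dict[str, str]:
    """Collect public (DM) and internal (DI) dataset notes.

    Assumptions: lines carry no trailing newline, a note runs until the next
    key line, empty lines are kept, and the current LB symbol (off by
    default) is inserted before each line break in a note.
    """
    notes: dict[str, list[str]] = {}
    current = None  # note key being collected, or None
    lb = ""         # line-break symbol set by "LB<=>..."
    for line in lines:
        match = KEY_LINE.match(line)
        if match:
            key, value = match.groups()
            if key == "LB":
                lb = value  # "LB<=><br>" turns it on, "LB<=>" turns it off
            elif key in ("DM", "DI"):
                current = key
                notes.setdefault(key, [])
            else:
                current = None  # any other key ends the note
            continue
        if current is not None:
            parts = notes[current]
            if parts:  # not the first line: add the break before it
                parts.append(lb + "\n")
            parts.append(line)  # empty lines are collected too
    return {key: "".join(parts) for key, parts in notes.items()}
```

Storing the separator together with each line means that turning LB on or off part-way through a file only affects the line breaks collected after that point.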

Group Keys

GN - group name or group ID, up to 20 characters. It is given in the format GN<=>GroupID and is optional if GI is present, e.g. GN<=>List-A

GI - group index number. It is assigned automatically, in order, if GN is present. It is given in the format GI<=>GroupIndex, e.g. GI<=>5
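A sketch of the GN/GI rules, assuming that an automatically assigned index is one more than the highest index seen so far. The function name assign_groups and its input of (GN, GI) pairs are hypothetical; the slides state only that GI is assigned in order when GN is present and that GN is limited to 20 characters.

```python
from __future__ import annotations


def assign_groups(specs: list[tuple[str | None, int | None]]) -> list[tuple[int, str | None]]:
    """Turn (GN, GI) pairs, either of which may be missing, into (index, name).

    Assumption: a missing GI becomes one more than the highest index so far.
    """
    groups: list[tuple[int, str | None]] = []
    highest = 0
    for gn, gi in specs:
        if gn is None and gi is None:
            raise ValueError("a group needs GN or GI")
        if gn is not None and len(gn) > 20:
            raise ValueError(f"GN longer than 20 characters: {gn!r}")
        if gi is None:
            gi = highest + 1  # auto-assign the next index in order
        highest = max(highest, gi)
        groups.append((gi, gn))
    return groups
```

For example, two named groups followed by an explicit GI<=>5 and another named group would receive indexes 1, 2, 5, and 6 under this assumption.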

MSS File Keys

-- DE - file note or description
-- SD - shared note for multiple files; insert a line of 'SD<=>' to record the following description lines as a shared note
-- LF - local file name for an MSS VSN file name
-- FF - file format, up to 10 characters, e.g. 'FF<=>BI.TAR' means the following files are binary COS-blocked and then tarred
-- TF - data format, up to 10 characters, e.g. 'TF<=>ASC.IMMA' means the following files hold data in both ASCII and IMMA formats
-- RL - length of each record in a file
-- RN - number (count) of records in a file
-- FMT - format for the description part of file information lines, e.g. 'FMT<=>LF,,DE' means the description part of each line is split into three columns by the delimiter ','; the first column is the local file name, the second is ignored, and the third is the file note
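A sketch of how an FMT value could drive the split of a file line's description part. It assumes a comma delimiter (the only one shown on the slides) and lets the last column absorb any further commas, so a file note may itself contain commas; split_description is a hypothetical name.

```python
from __future__ import annotations


def split_description(fmt: str, description: str) -> dict[str, str]:
    """Split a file line's description part according to an FMT value.

    fmt is the text after 'FMT<=>', e.g. 'LF,,DE' or 'LF,'.
    Assumption: the delimiter is ','; an empty key means the column is ignored.
    """
    keys = fmt.split(",")
    # At most len(keys) columns; the last one keeps any remaining commas.
    columns = description.split(",", len(keys) - 1)
    record: dict[str, str] = {}
    for key, column in zip(keys, columns):
        if key:
            record[key] = column.strip()
    return record
```

Applied to a line from the ds540.1 example with 'FMT<=>LF,', the description "stdg3.1800.1809.tar, less than .5 MB" yields only the local file name, and the size text in the second column is ignored.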

Example of ds540.1.sccs

p-
TF<=>BINARY
FF<=>TAR
FMT<=>LF,
GN<=>MSGSTD2
p- MSG, Standard Statistics, 2x2, 1800-1997
py61279 stdg3.1800.1809.tar, less than .5 MB
py61281 stdg3.1810.1819.tar, less than .5 MB
….
FF<=>Z.TAR
GN<=>MSGSTD2_R2.2
p- MSG, Standard Statistics, 2x2, release 2.2, 1998-2004
p- Release 2.2 is exclusively new data for 1998-2004. For convenience, data from
p- Group ID <a href="#MSGSTD2">MSGSTD2</a> for 1990-1997 have been included in these files.
py88809 msg_2deg/stdg3.1990.1999.tar, 33.983 MB
by88810 b/u y88809
py88811 msg_2deg/stdg4.1990.1999.tar,34.098 MB
by88812 b/u y88811
….

Snapshots of Web Display

…..

…..

Publish MSS/Web Filelists

publish_filelist [-t] dsnnn.n

-- Set the flag for using RDADB to P or I before publishing a dataset (Y is sufficient when option -t is used to publish test filelists)
-- Generates the HTML file of the MSS public filelist in the root directory of the given dataset
-- Generates the HTML file of the MSS internal filelist in the internal dataset directory
-- Generates HTML index files of RDA Server data in the data and sub-data directories of the given dataset, unless manually created index HTML files already exist
-- All HTML filelist files are built dynamically and cached
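A sketch of two of the publishing rules above: the flag check before publishing, and leaving existing index files alone. The function names, the directory walk, and the index file name index.html are assumptions; the slides do not describe how publish_filelist is implemented.

```python
from __future__ import annotations

from pathlib import Path


def check_publish_flag(flag: str, test: bool = False) -> None:
    """Raise unless the dataset's 'using RDADB' flag allows publishing.

    P or I is required; Y is also accepted for test filelists (-t).
    """
    allowed = {"P", "I", "Y"} if test else {"P", "I"}
    if flag not in allowed:
        raise ValueError(f"flag {flag!r} does not allow publishing (test={test})")


def dirs_needing_index(data_root: Path) -> list[Path]:
    """Return data_root and its sub-directories that still lack an index file.

    Assumption: a manually created index is a file named 'index.html';
    directories that already have one are skipped, never overwritten.
    """
    dirs = [data_root] + sorted(p for p in data_root.rglob("*") if p.is_dir())
    return [d for d in dirs if not (d / "index.html").exists()]
```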

Update Dataset With New Data

-- Run DSARCH to archive new data files onto the MSS and/or RDA server
-- There is no need to republish filelists: filelists that include information on the newly archived data files are recached automatically when users access the filelist web pages
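One way to get this behavior is lazy invalidation: archiving a file marks the cached filelist as stale, and the page is rebuilt only on the next request. The class below is a hypothetical sketch of that idea, not the RDA implementation; the render function stands in for whatever builds the HTML filelist.

```python
from __future__ import annotations

from typing import Callable


class FilelistCache:
    """Hypothetical per-dataset filelist cache that re-renders lazily."""

    def __init__(self, render: Callable[[str, list[str]], str]) -> None:
        self._render = render                        # builds a page from file names
        self._files: dict[str, list[str]] = {}       # dsid -> archived file names
        self._version: dict[str, int] = {}           # dsid -> bumped on each archive
        self._cache: dict[str, tuple[int, str]] = {} # dsid -> (version, page)
        self.renders = 0                             # pages actually built

    def archive(self, dsid: str, filename: str) -> None:
        """Stand-in for archiving a file: record it and mark the list stale."""
        self._files.setdefault(dsid, []).append(filename)
        self._version[dsid] = self._version.get(dsid, 0) + 1

    def page(self, dsid: str) -> str:
        """Serve the filelist page, rebuilding it only if it is stale."""
        version = self._version.get(dsid, 0)
        cached = self._cache.get(dsid)
        if cached is None or cached[0] != version:
            page = self._render(dsid, list(self._files.get(dsid, [])))
            self.renders += 1
            self._cache[dsid] = (version, page)
        return self._cache[dsid][1]
```

Because archiving only bumps a counter, adding many files costs nothing until someone actually views the filelist, which matches the "recached when accessed" behavior described above.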

