Status of the Production and Nagios news ALICE TF Meeting 29/07/2010.

Date post: 04-Jan-2016
Upload: nicholas-hamilton
Status of the production

• Since yesterday (28/07/2010) ALICE has run out of MC production
– Raw data reconstruction: Currently running at CERN (LHC10e). Activity decreased during the week
– Analysis trains: Ongoing
– User analysis: Ongoing
– MC production: Finished for the moment. No new MC requests in the pipeline

Job profile this week

Decrease due to the stop of the MC production

Job profile per user

Production clearly dominated by the MC jobs this week

As usual, there was also significant user analysis activity this week

Raw data transfers and production

Low raw data transfer activity this week: 1.3TB of raw data transferred (compatible with the raw data taking regime this week)

Around 25TB of raw data recorded in CASTOR@CERN

Status of the sites

• T1 sites
– CNAF: The site has been running a very low number of ALICE jobs for more than a week
  • A GPFS migration caused this problem
  • The number of jobs is still low today although the operation is finished
  • The number of jobs should increase in the next hours
– RAL
  • ALICE is running over the number of assigned resources
  • Site proposed to put a cap on the number of ALICE jobs at 1250. This is about 25% of the farm, and is around 10 times ALICE's current fairshare allocation (ALICE's current usage is about 65%). This is necessary as the recent high volumes of ALICE work caused CMS to run a high priority workload elsewhere.
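
The RAL figures can be cross-checked with a little arithmetic. The sketch below only uses the numbers quoted above (cap of 1250 jobs, about 25% of the farm, around 10 times the fairshare, usage of about 65%). It assumes that the 65% usage is a share of the whole farm, which the slide does not state explicitly, so the derived job counts are rough estimates. The function and constant names are illustrative only.

```python
# Figures quoted on the slide for the proposed RAL cap.
CAP_JOBS = 1250           # proposed cap on concurrent ALICE jobs
CAP_SHARE_PERCENT = 25    # the cap is "about 25% of the farm"
CAP_OVER_FAIRSHARE = 10   # the cap is "around 10 times" the fairshare
USAGE_PERCENT = 65        # ASSUMPTION: usage is a share of the whole farm


def ral_cap_summary():
    """Return the approximate quantities implied by the quoted figures."""
    farm_slots = CAP_JOBS * 100 // CAP_SHARE_PERCENT    # total job slots
    fairshare_slots = CAP_JOBS // CAP_OVER_FAIRSHARE    # ALICE fairshare
    fairshare_percent = 100 * fairshare_slots / farm_slots
    usage_slots = farm_slots * USAGE_PERCENT // 100     # current ALICE jobs
    return {
        "farm_slots": farm_slots,
        "fairshare_slots": fairshare_slots,
        "fairshare_percent": fairshare_percent,
        "usage_slots": usage_slots,
        "usage_over_cap": usage_slots - CAP_JOBS,
    }


if __name__ == "__main__":
    for name, value in ral_cap_summary().items():
        print(f"{name}: {value}")
```

Under these assumptions the cap would still leave ALICE well above its nominal fairshare while freeing a large part of the farm for the other experiments.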

Status of the sites

• T2 sites
– Subatech will be down starting tomorrow, Friday, at 16:00 GMT+2 until Monday morning. Electrical maintenance
  • In addition, some French sites had cooling problems, already solved
– Grenoble: External network will be down on Saturday, July 31st from 5:30 am till 6:00 pm
– Poznan: SE failed during the week, already solved
– IPNL: CREAM1.6 migration completed
– Torino: CREAM1.6 migration completed
– Madrid: SE failing today. Migration activities ongoing. The CREAM system has already been migrated to CREAM1.6
– Trujillo: Out of production for a long time; in addition, SE failing
– LBL: SE failing today
– Small activities at some Russian sites (new host certificates of the voboxes)

Pending issues

• Issue reported last week:
– Large number of zombies or extremely long jobs running at the sites (over 46h)
  • Declared as pathological jobs which should be killed
  • Sites were encouraged to either kill those jobs or decrease the CPU time limit of the ALICE queues to 24h
– No news on this during this week
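
As an illustration of the selection criterion above, the following minimal sketch flags jobs that have been running for more than 46h. The `Job` record and its fields are hypothetical; a real site would take this information from its own batch system, and this is not the procedure used by ALICE or by any particular site.

```python
from dataclasses import dataclass

PATHOLOGICAL_HOURS = 46    # threshold quoted on the slide
PROPOSED_LIMIT_HOURS = 24  # CPU time limit suggested for the ALICE queues


@dataclass
class Job:
    """Hypothetical job record; field names are assumptions."""
    job_id: str
    site: str
    running_hours: float


def pathological_jobs(jobs, threshold_hours=PATHOLOGICAL_HOURS):
    """Return the jobs that have been running longer than the threshold."""
    return [job for job in jobs if job.running_hours > threshold_hours]


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    sample = [
        Job("job-1", "site-A", 3.5),
        Job("job-2", "site-A", 51.0),
        Job("job-3", "site-B", 46.0),
    ]
    for job in pathological_jobs(sample):
        print(f"{job.job_id} at {job.site}: {job.running_hours}h, candidate to be killed")
```

Lowering the queue limit to 24h would make the batch system remove such jobs automatically, so a scan like this would only be needed where the limit is left unchanged.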

Quattor recipe for the CREAM-CE migration

• Thanks to Jerome for these instructions
– Available at:
– http://alien2.cern.ch/index.php?option=com_content&view=article&id=46&Itemid=103

Status of Nagios

• SAM will be switched off in September

ALL VOBOXES MUST BE PINGABLE AND ACCESSIBLE FROM samnag014.cern.ch
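
A site admin may want a quick self-check of this requirement. The sketch below tests whether a host answers a ping and accepts a TCP connection. It is only an approximation: the requirement is about reachability from samnag014.cern.ch, so a check run from any other machine cannot prove it. The host name `vobox.example.org` and port 22 are placeholders (the slide does not say which ports Nagios probes), and the `ping` flags assume the Linux iputils version.

```python
import socket
import subprocess


def ping_command(host, count=1, timeout_s=2):
    """Build a Linux (iputils) ping command: -c packet count, -W reply timeout in seconds."""
    return ["ping", "-c", str(count), "-W", str(timeout_s), host]


def is_pingable(host):
    """Return True if the host answers an ICMP echo request."""
    try:
        result = subprocess.run(
            ping_command(host),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # The ping binary is missing or cannot be executed.
        return False
    return result.returncode == 0


def tcp_reachable(host, port, timeout_s=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    vobox = "vobox.example.org"  # placeholder host name
    port = 22                    # ASSUMPTION: port not given in the slides
    print("pingable:", is_pingable(vobox))
    print("tcp reachable:", tcp_reachable(vobox, port))
```

A firewall rule that allows traffic only from samnag014.cern.ch would make this check fail from elsewhere even though the requirement is met, so a negative result should be confirmed from the Nagios side.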
