COMP3122 Network Management Richard Henson April 2012.

Date post: 21-Jan-2016
Upload: annabel-neal
Transcript
Page 1: COMP3122 Network Management Richard Henson April 2012.

COMP3122 Network Management

Richard Henson

April 2012

Page 2: COMP3122 Network Management Richard Henson April 2012.

Week 11 – Troubleshooting & Optimisation

Learning Objectives:
– Explain the principles of troubleshooting as a means of mitigating failure
– Use the various tools available on a named operating system to identify potential faults and problems
– Take appropriate action to stop a fault becoming a failure

Page 3: COMP3122 Network Management Richard Henson April 2012.

“A stitch in time saves nine”

Page 4: COMP3122 Network Management Richard Henson April 2012.

Business - Worst Possible Scenario (1)

There is an interruption in the power supply
– UPS is invoked
– the interruption continues…
– servers all have to be shut down

Power supply restored…
– but main domain controller doesn’t reboot
– no other domain controllers therefore connect to it
– the domain tree fails

Page 5: COMP3122 Network Management Richard Henson April 2012.

Business - Worst Possible Scenario (2)

Organisation cannot do business with the network down…
– server can’t be persuaded to boot
– new main domain controller has to be commissioned
– whole directory tree has to be rebuilt!!!
– word spreads very rapidly…

Business loses so much custom, trust, and credibility that even when it starts doing business again customers choose to go elsewhere
– without a flourishing customer base… the business folds

Page 6: COMP3122 Network Management Richard Henson April 2012.

Analysis: This scenario shouldn’t have occurred…

Unlikely that the server would fail to boot without prior warning…
– warnings would have been presented…
– but were clearly not acted upon!

Disaster recovery plan!?!
– not formulated?
– not tested?
– not effective (in the event of a domain tree controller failure…)

Page 7: COMP3122 Network Management Richard Henson April 2012.

But it does…

Actual example (15th Feb 2010):
– root domain controller [on the network] had not been backed up for 10 months, when it crashed (well… at least it had been backed up at some time…)
– http://searchwindowsserver.techtarget.com/generic/0,295582,sid68_gci1381567,00.html

The consultant called in to fix it reported that:
– “I had never seen a case where the forest root domain had to be recovered -- and I couldn't find anyone who had.”

Page 8: COMP3122 Network Management Richard Henson April 2012.

Analysis: Who is to blame? (1)

In this example, the organisation said they were following Microsoft guidelines
– they set up an empty root domain
– the root domain controller had a RAID-5 disk configuration

This was true, to some extent
– Microsoft did espouse this as best practice… in the year 2000!
– guidelines had changed since then…

Page 9: COMP3122 Network Management Richard Henson April 2012.

Analysis: Who is to blame? (2)

The disaster that struck was:
– two RAID drives failed on the same day!
– unlucky? possible to prepare for this?

The recovery process took about three weeks
– most of the time was spent studying logs, doing the restore, etc.

In this case, the tree was still able to function without a root domain
– business was able to continue
– customer base wasn’t compromised…
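
Why two failed drives defeat a RAID-5 set can be sketched with XOR parity. This is a simplified illustration, not the controller's actual implementation: real RAID-5 rotates the parity block across the drives, and the block contents and the helper name `xor_blocks` are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe spread over three data drives (hypothetical contents).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)  # held on a fourth drive

# One drive lost: XOR of the surviving blocks and the parity rebuilds it.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])

# Two drives lost: the survivors only yield the XOR of the two missing
# blocks, which is not enough to recover either of them.
combined = xor_blocks([data[0], parity])
print(combined == xor_blocks([data[1], data[2]]))
```

With one drive gone the stripe is recoverable; with two gone the only remedy is a backup, which is the point of the example above.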

Page 10: COMP3122 Network Management Richard Henson April 2012.

Fault Tolerance and Risk Assessment

General “common sense” principle:
– always have a backup
– ESPECIALLY for the most important computer on the network…

Q:
– How can you tell what needs backing up?

A:
– Risk Assessment and Risk Management

Page 11: COMP3122 Network Management Richard Henson April 2012.

Why not Risk Management?

Time consuming!

However, without proper risk management…
– how does the organisation know what processes are most important to its functioning?
– how can an organisation provide resources to protect aspects of its network?

Page 12: COMP3122 Network Management Richard Henson April 2012.

Risk Management and Risk Assessment

Risk Assessment is an essential first step
– requires putting a “value” on assets
– more valuable… greater protection

Do information assets have value?
– organisations still failing to acknowledge that they do…
– categorisation of information assets therefore potentially problematic
– need to look at the consequence to the organisation of losing that asset…
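
One way to turn "value" and "consequence of loss" into a protection priority is a simple scoring table. The sketch below is an assumption for illustration only: the 1-5 scales, the `likelihood` factor, the asset names and the impact-times-likelihood formula are not from the slides, though that product is a common risk-assessment convention.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    loss_impact: int  # hypothetical 1-5 score: consequence of losing the asset
    likelihood: int   # hypothetical 1-5 score: chance of losing it


def rank_by_risk(assets):
    """Return assets sorted from highest to lowest score (impact x likelihood)."""
    return sorted(assets, key=lambda a: a.loss_impact * a.likelihood, reverse=True)


assets = [
    Asset("forest root domain controller", loss_impact=5, likelihood=2),
    Asset("print server", loss_impact=1, likelihood=3),
    Asset("customer database", loss_impact=5, likelihood=3),
]

for asset in rank_by_risk(assets):
    print(asset.name, asset.loss_impact * asset.likelihood)
```

The assets at the top of the list are the ones that most need a tested backup and recovery plan.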

Page 13: COMP3122 Network Management Richard Henson April 2012.

How do you back up a Domain Controller?

The Windows “Backup” program works, and can easily be scheduled
– but heavily criticised…
– even the 2008 server version…

Third Party products give more flexibility and protection e.g.:
– Recovery Manager
» http://www.quest.com/recovery-manager-for-active-directory
– Backup Exec
» http://www.symantec.com/business/products/family.jsp?familyid=backupexec

Page 14: COMP3122 Network Management Richard Henson April 2012.

Prevention is Better than Cure

A server shouldn’t crash unexpectedly!
– should be kept cool (environmental unit mustn’t break down!)
– monitoring should show that unexpected things are happening
– action can then (usually) be taken to take care of the unexpected

Many tools available to:
– Check/monitor the system on a regular basis
– Provide stats to administrators
» could also be used for security purposes
– Generate alerts if something is starting to go wrong…
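
The "check regularly, alert when something starts to go wrong" idea can be sketched as a threshold comparison. The metric names, limits and readings below are hypothetical; a real monitor would collect the readings from the operating system or environmental sensors.

```python
def check_thresholds(readings, thresholds):
    """Return a warning string for every reading that meets or exceeds its limit."""
    alerts = []
    for metric, value in readings.items():
        limit = thresholds.get(metric)
        if limit is not None and value >= limit:
            alerts.append(f"WARNING: {metric} at {value} (threshold {limit})")
    return alerts


# Hypothetical limits and one round of hypothetical readings.
thresholds = {"cpu_percent": 90, "disk_used_percent": 85, "inlet_temp_c": 30}
readings = {"cpu_percent": 42, "disk_used_percent": 91, "inlet_temp_c": 24}

for line in check_thresholds(readings, thresholds):
    print(line)
```

Run on a schedule, a check like this gives the administrator time to act before a fault becomes a failure.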

Page 15: COMP3122 Network Management Richard Henson April 2012.

Troubleshooting Tools for a Windows Server: Task Manager

Applications tab:
– shows which applications are running
– enables changing of process priority
» use view/update speed
– can be used to
» open new applications
» shut rogue applications down

Page 16: COMP3122 Network Management Richard Henson April 2012.

Task Manager (continued)

Processes tab:
– all system processes
– Memory usage of each
– % CPU time for each
– total CPU time since boot up
– also used to close a process down
» careful! (but you get a warning…)

Page 17: COMP3122 Network Management Richard Henson April 2012.

Task Manager (continued)

Performance tab:
– total no. of threads, processes, handles running
– Graph: % CPU usage
» User mode
» Kernel mode (optional: view menu)
» graph per CPU (optional: view menu)
– physical (Page File) memory available/usage
– virtual memory available/usage

Page 18: COMP3122 Network Management Richard Henson April 2012.

Event Viewer

Events recorded into “event log” files
– System log
– Auditing log (customisable)
– Application log
– customisable - additional files

New files recorded daily; old ones archived
– time before archiving also customisable

Page 19: COMP3122 Network Management Richard Henson April 2012.

Event Viewer

Three types of events recorded in log:
– Information
– Warning
– Error

More information on each event obtained by double-clicking
– make note of event code
– heed and take action if necessary
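
Picking out the events worth heeding can be sketched as a filter over exported log records. The record layout and the sample entries below are hypothetical stand-ins; they are not read from a real event log, and the field names are assumptions.

```python
from collections import Counter

# Hypothetical records; real entries would come from exported event logs.
events = [
    {"log": "System", "level": "Information", "code": 6005},
    {"log": "System", "level": "Warning", "code": 51},
    {"log": "System", "level": "Warning", "code": 51},
    {"log": "Application", "level": "Error", "code": 1000},
]


def needs_attention(events):
    """Count Warning and Error events per (log, level, code)."""
    return Counter(
        (e["log"], e["level"], e["code"])
        for e in events
        if e["level"] in ("Warning", "Error")
    )


for (log, level, code), count in needs_attention(events).most_common():
    print(f"{log} {level} event {code}: {count} occurrence(s)")
```

Grouping by event code makes a recurring warning stand out, and it is that code which should be noted and looked up.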

Page 20: COMP3122 Network Management Richard Henson April 2012.

Using Event Viewer

Wise to check all event logs regularly
– take time/trouble to find out what those messages really mean…

Take whatever action is needed
– sort out potential problems now
– Make sure they don’t become real ones later…

Page 21: COMP3122 Network Management Richard Henson April 2012.

Auditing Further Events

Any “object” can be audited

Objects to audit, and processes audited can be set through audit (group) policy
– Using MMC & relevant snap-in

Types of process audited:
– access
– attempt to access

Page 22: COMP3122 Network Management Richard Henson April 2012.

Security auditing

Same principles as general auditing

Refers to “restricted” objects

Events appear in separate security log

Page 23: COMP3122 Network Management Richard Henson April 2012.

Event Management software (SIEM)

Who’s going to look at all these log files?
– in practice, often no-one..

Solution – SIEM software to analyse and present information from:
– network and security devices
– identity & access management applications
– vulnerability management/policy compliance tools
– os, database & application logs
– external threat data

http://www.focus.com/briefs/how-select-security-information-and-event-management-siem

Page 24: COMP3122 Network Management Richard Henson April 2012.

Other Troubleshooting Resources

NT Diagnostics (winmsd.exe)
– hardware & operating system data from registry

Performance Monitor
– Can monitor many aspects of system performance
– Either display current data graphically, in real-time
– or log data at regular intervals to get a longer term picture
– Useful role in system optimisation
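
What a "longer term picture" adds over a live graph can be sketched by fitting a trend line to logged samples. The readings and the function name `slope_per_sample` are hypothetical; the calculation is an ordinary least-squares slope over evenly spaced samples, chosen here as one simple way to spot steady growth such as a memory leak.

```python
def slope_per_sample(samples):
    """Least-squares slope of evenly spaced samples (units per sampling interval)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    numerator = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


# Hypothetical hourly readings of memory in use, in MB.
memory_mb = [1000, 1010, 1020, 1030, 1040]
print(slope_per_sample(memory_mb))
```

A slope that stays positive over days of logged data is a hint worth investigating, even when each individual reading looks harmless.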

Page 25: COMP3122 Network Management Richard Henson April 2012.

Other Troubleshooting Resources

Network Monitor
– captures, filters, or analyses frames or packets sent over the network

Alerts
– notify administrator when a particular threshold value has been reached

System Recovery
– if a fatal error occurs:
» a dump of system memory is made, and can be used for identifying the cause of the problem
» alerts are sent to users
» system is restarted automatically

Page 26: COMP3122 Network Management Richard Henson April 2012.

Performance Monitor

Windows 2003 Server, but not available on disk

To obtain and download Performance Monitor Wizard (PerfWiz), visit the following Web site:
– http://www.microsoft.com/downloads/details.aspx?FamilyID=31fccd98-c3a1-4644-9622-faa046d69214&displaylang=en

Page 27: COMP3122 Network Management Richard Henson April 2012.

What if the machine doesn’t boot…

Tools available:
– The boot error itself
» blue screen? driver software
» constant reboot? motherboard
– Last Known Good…
» Gives machine a chance to go back to the previous (usually last but one) configuration

Page 28: COMP3122 Network Management Richard Henson April 2012.

What if the machine doesn’t boot… (continued)

Safe Mode
– includes VGA Mode or boot logging
– Debugging mode also available
» output difficult to decipher for non-experts

Recovery Console
– “DOS-type prompt” for performing minor repairs

Page 29: COMP3122 Network Management Richard Henson April 2012.

What if the machine doesn’t boot… (continued)

System Configuration Utility (Msconfig.exe)
– automates the routine troubleshooting steps relating to Windows configuration issues
– can be used to modify the system configuration and troubleshoot the problem using a process-of-elimination method
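
The process-of-elimination method can be sketched as repeatedly disabling half of the remaining startup items and seeing whether the problem goes away. This illustrates the manual procedure, not how Msconfig works internally; the item names are invented, and the sketch assumes a non-empty list with exactly one faulty item.

```python
def find_culprit(items, works_without):
    """Locate a single faulty item by disabling half of the candidates at a time.

    works_without(disabled) must return True when the system behaves
    correctly with the given items disabled.
    """
    candidates = list(items)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if works_without(half):
            candidates = half  # the culprit was among the disabled items
        else:
            candidates = candidates[len(half):]  # the culprit is still enabled
    return candidates[0]


# Hypothetical startup items; in practice each test is a reboot and a check.
startup_items = ["updater", "tray_tool", "bad_driver_helper", "sync_agent", "indexer"]
faulty = "bad_driver_helper"
print(find_culprit(startup_items, lambda disabled: faulty in disabled))
```

Halving the candidates each time keeps the number of reboots small: five items need at most three tests rather than five.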

Page 30: COMP3122 Network Management Richard Henson April 2012.

What if the machine doesn’t boot… (continued)

Emergency Repair Disk (ERD)
– reboot machine using different media
» e.g. floppy disk
– media should be generated BEFORE it needs to be used!
– option to create the ERD during the set up process…

Page 31: COMP3122 Network Management Richard Henson April 2012.

What if the machine doesn’t boot… (continued)

Full restore
– assumes a full backup has already been made
– still have to:
» reformat hard disk from scratch…
» and then restore the backup files using backup/restore option….
– but better than losing all your data!

Page 32: COMP3122 Network Management Richard Henson April 2012.

Network Troubleshooting Chart - 1

Identify the problematic network node
– Use commands such as PING & TraceRt
– URL: http://teamapproach.ca/trouble

Is there a problem with one of the network protocols?
– Isolate the problem to a protocol layer and fix it

Is there a memory problem?
– Is there a memory leak?
» Fix or eliminate the software with the memory leak
– Is there sufficient memory?
» Add more memory
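
The first step of the chart, finding the node that does not answer, can be sketched by wrapping the system `ping` command. The function names, the timeout and the sample address (taken from the 192.0.2.0/24 documentation range) are assumptions. The exit status is only a rough signal, since on some systems `ping` can exit with 0 even when the replies report an unreachable host.

```python
import platform
import subprocess


def ping_command(host, count=4):
    """Build the ping argument list; Windows uses -n for the count, Unix-like systems -c."""
    flag = "-n" if platform.system() == "Windows" else "-c"
    return ["ping", flag, str(count), host]


def is_reachable(host, count=4, timeout=15):
    """Return True if ping exits with status 0 within the timeout."""
    try:
        result = subprocess.run(
            ping_command(host, count),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


# Show the command that would be run for a placeholder address.
print(ping_command("192.0.2.10"))
```

Calling `is_reachable` for each node on a path, nearest first, narrows the fault down to the first node that fails to answer, which is the same information TraceRt gives in one pass.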

Page 33: COMP3122 Network Management Richard Henson April 2012.

Network Troubleshooting Chart - 2

Does the system freeze?
– Investigate priority and device driver problems

Is there high processor utilization?
– Is it caused by hardware or software?
» Provide adequate processor resources
– hardware: Can an upgraded device driver fix the problem?
» Upgrade your hardware to offload the processor

Page 34: COMP3122 Network Management Richard Henson April 2012.

Network Troubleshooting Chart – 3

Is there a disk problem?
– Is there sufficient file cache?
» Add more memory to ensure sufficient cache
– Use NTFS and do regular maintenance
» Use RAID
– Is there a boot record problem?
» Use FixBoot or FixMBR from the recovery console

Page 35: COMP3122 Network Management Richard Henson April 2012.

Network Troubleshooting Chart – 4

Is there a network problem?
– Use Network Monitor to identify top broadcasters
» Eliminate unnecessary broadcasts
– Use Network Monitor to identify top talkers
» Eliminate unnecessary network traffic
– Correct poor configuration
» Reorganize & upgrade network for more capacity
– Is there an address or name resolution problem?
» Examine ARP cache, WINS, DNS, and NBTstat
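
What "top talkers" and "top broadcasters" mean can be sketched by tallying a capture summary. The frame records and MAC addresses below are hypothetical, and the dictionary layout is an assumption rather than Network Monitor's export format.

```python
from collections import Counter

BROADCAST = "ff:ff:ff:ff:ff:ff"


def top_talkers(frames, n=3):
    """Return the n sources that sent the most bytes, as (source, bytes) pairs."""
    totals = Counter()
    for frame in frames:
        totals[frame["src"]] += frame["length"]
    return totals.most_common(n)


def top_broadcasters(frames, n=3):
    """Return the n sources that sent the most broadcast frames."""
    counts = Counter(f["src"] for f in frames if f["dst"] == BROADCAST)
    return counts.most_common(n)


# Hypothetical capture summary: one dict per frame.
frames = [
    {"src": "00:11:22:33:44:01", "dst": "00:11:22:33:44:02", "length": 1500},
    {"src": "00:11:22:33:44:03", "dst": BROADCAST, "length": 60},
    {"src": "00:11:22:33:44:03", "dst": BROADCAST, "length": 60},
    {"src": "00:11:22:33:44:01", "dst": "00:11:22:33:44:02", "length": 900},
]

print(top_talkers(frames, n=1))
print(top_broadcasters(frames, n=1))
```

The two lists often point at different machines: one host may dominate by volume while another floods the segment with small broadcast frames.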

Page 36: COMP3122 Network Management Richard Henson April 2012.

Optimisation…

All about improving the performance of system resources…

A network manager should never have “nothing to do…”

