The Future of XML at the IRS and Building Partnerships for Collaboration
Dynamic Data @ The Internal Revenue Service
John A. [email protected]
FTA Technology Conference
August 8 – 10, 2005
Agenda
• IRS Enterprise Architecture
• IRS XML Transition Strategy
• IRS XML Framework
• XML Registry
• XML Components
• Data Exchanges
Much has been accomplished...
• The IRS has deployed several high impact applications that make life easier for American Taxpayers
  – “Where’s My Refund” Internet Refund, Fact of Filing (IRFOF)
  – CADE version 1
    • 1040EZ filers moved to new system
  – e-Services
    • Self service applications for practitioners
  – Modernized e-File
    • Electronic filing of business returns
    • Web Services interface being introduced
• Use of the Internet to enable a new value proposition for taxpayers and software developers and meet rising expectations
…Much Remains To Be Done
• We need to…
  – Shorten our development cycles
  – Reduce our cycle times
  – Simplify our infrastructure and the relationships between systems
  – Modernize our data strategies
• So we can…
  – Enhance self service opportunities
  – Enable taxpayer account access
  – Publish XML based Application Programming Interfaces
  – Retire the Master File
It’s all About the Data
• The lack of ability to move data hobbles an organization’s flexibility
  – It’s not lack of data, but lack of ability to get the data where it’s needed
  – Data needs to be moved and combined to turn it into information and get it to authorized consumers
• The ultimate goal:
  – The “zero latency enterprise”
    • Latency is our enemy
  – Move from weekly cycles to daily, then instantaneous settlement
The IRS Service Based Architecture (IRS/SBA)
• The IRS/SBA is the IRS’ implementation of a Service Oriented Architecture
• Loosely coupled
• Places XML translators in front of existing IRS systems
  – Provides quicker time to market as software resources are reused
  – Shortens maintenance cycles by using XML as the wire and interface formats
  – XML will protect software developer investment
• Event driven
  – Publish and subscribe
• Provides data convergence
  – Data from multiple sources is combined on the fly to turn it into information
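As a rough illustration of placing an XML translator in front of an existing system, the sketch below converts a fixed-width legacy record to XML and back. The record layout, the field names, and the functions `legacy_to_xml` and `xml_to_legacy` are assumptions made for this example; they do not describe any actual IRS format or adapter.

```python
import xml.etree.ElementTree as ET

# Hypothetical fixed-width layout of a legacy record: (field name, start, end).
# These fields and widths are illustrative only, not an actual IRS format.
LEGACY_LAYOUT = [
    ("TaxpayerId", 0, 9),
    ("Name", 9, 29),
    ("City", 29, 44),
    ("State", 44, 46),
]


def legacy_to_xml(record: str) -> str:
    """Translate one fixed-width legacy record into an XML document."""
    root = ET.Element("Taxpayer")
    for name, start, end in LEGACY_LAYOUT:
        ET.SubElement(root, name).text = record[start:end].strip()
    return ET.tostring(root, encoding="unicode")


def xml_to_legacy(document: str) -> str:
    """Translate the XML document back into the fixed-width legacy format."""
    root = ET.fromstring(document)
    parts = []
    for name, start, end in LEGACY_LAYOUT:
        width = end - start
        value = root.findtext(name, default="")
        parts.append(value.ljust(width)[:width])  # pad or truncate to field width
    return "".join(parts)


# Made-up sample record: 9-char id, 20-char name, 15-char city, 2-char state.
legacy = "000000001" + "JANE DOE".ljust(20) + "SPRINGFIELD".ljust(15) + "VA"
document = legacy_to_xml(legacy)
print(document)
print(repr(xml_to_legacy(document)))
```

Callers on the XML side see only the element names, so the legacy layout could change behind the adapter without affecting them, which is the loose coupling the slide describes.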
IRS/SBA
[Diagram labels: Web Services Interface, Integration Broker, Orchestration, Adapters, Applications, XML, Native formats]
IRS/SBA
[Diagram labels: Web Services Interface; Integration Broker; IRS ELDM; Schema; Doc Types; Service Input/Output DocTypes (IRS XML XSD); Flow Services; Business Processes; Repository; Simple & Complex Elements (common components), e.g., TaxpayerAddressCreate, TaxpayerAddressUpdate, etc.]
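A minimal sketch of how a common component might be reused across document types such as TaxpayerAddressCreate and TaxpayerAddressUpdate. The Address fields, the TaxpayerId element, and both helper functions are hypothetical. The IRS doc types themselves are defined in XSD, which this Python sketch only imitates.

```python
import xml.etree.ElementTree as ET


def address_component(street: str, city: str, state: str, zip_code: str) -> ET.Element:
    """Build the shared Address component (hypothetical element names)."""
    address = ET.Element("Address")
    for tag, value in (("Street", street), ("City", city),
                       ("State", state), ("ZIPCode", zip_code)):
        ET.SubElement(address, tag).text = value
    return address


def build_doc(doc_type: str, taxpayer_id: str, address: ET.Element) -> str:
    """Wrap the shared component in a service-specific document type."""
    root = ET.Element(doc_type)
    ET.SubElement(root, "TaxpayerId").text = taxpayer_id
    root.append(address)
    return ET.tostring(root, encoding="unicode")


# Both document types reuse the same component definition.
create_doc = build_doc(
    "TaxpayerAddressCreate", "000000001",
    address_component("1 Main St", "Springfield", "VA", "22150"))
update_doc = build_doc(
    "TaxpayerAddressUpdate", "000000001",
    address_component("2 Oak Ave", "Springfield", "VA", "22150"))
print(create_doc)
print(update_doc)
```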
Benefits of the Service Based Architecture
• Services allow developers to focus on business logic and user friendly displays instead of where to get the data from and how to get it
• Hides the complexity of the infrastructure
  – One entry point into the IRS
  – External interfaces governed by standards
  – Each application or service has exactly one interface to everything else
• Serves up authoritative information
  – Developers deal with “business information” instead of “data”
• Reliable service delivery
  – Stable interfaces through XML
  – Faster time to market: less code to write
  – Each service can be upgraded and scaled independently (loose coupling)
Transition Strategy to Modernized Data Stores
[Diagram; axis labels: Technology and Standards; IRS Internal Data Architecture, running between Legacy Data Stores and Modernized Data Stores]
• Majority of IRS data centric systems and infrastructure components have not transitioned to the Modernized Architecture
  – Initial Modernized Data replicated in Legacy Format (e.g., CADE => IMF)
• Majority of IRS systems and infrastructure components have transitioned to the Modernized Architecture
  – Remaining Legacy Data replicated in Modernized Format
Transition Strategy to XML Data Exchange
[Diagram; axis labels: IRS External Data Exchange Drivers; Technology and Standards]
• Legacy Data Exchanges
  – XML Standards and Technology maturing, and/or
  – Tax Administration sector adoption of Standards and Common Vocabularies not yet mature
• Modernized Data Exchange
  – XML Standards and Technology stable (though still evolving), and
  – Tax Administration sector adoption of Standards and Common Vocabularies mature (and growing)
IRS Transition Strategy
[Diagram; axis labels: IRS External Data Exchange; IRS Internal Data Architecture. Other labels: Current State; Target State; Where we are Today; Optimal Path; Outpace Standards; Trail Standards; Legacy Data; Modernized Data; Legacy Format; Modernized Format]
• Use Adaptors to translate Legacy formats to Modernized format
• Use Adaptors to translate from Modernized format to Legacy
What This Means
• The vertical line, IRS modernization progression, is a driver for how the IRS manages data internally
• The horizontal line, maturity of XML technology and standards in the industry, is a driver for how the IRS will exchange data with our external partners
• This is a principle; some data exchanges may transition before others. Prototypes/pilots will be strategies for discussion.
Overview – IRS XML Framework
• Objective:
  – Identify the target state of XML technology implementation for the IRS
  – Assess and adopt appropriate standards, practices, and tools for the IRS and its trading partners
  – Strategically transition current systems and legacy formats to modernized XML data exchange formats
Adapted from a Department of Education, Federal Student Aid briefing, “XML: A Beginners Guide,” presented at the 2003 Electronic Access Conference in San Diego, CA.
Current IRS XML Standards and Governance Initiatives
• IRS XML Community of Practice (Chartered)
  – Organize internal activity
  – Collaborate on adoption of industry standards
  – Form and vet IRS responses to external communities of practice
• Naming and Design Rules (NDR)
  – Draft UBL based NDR circulating for comment
• XML Registry, Repository and Registration process
  – Concept of Operations and alternatives analysis
• Common component schema
  – Building blocks for composing IRS schema
  – Integral to IRS/SBA
  – Tax Filing vocabulary derived from existing MeF schema
XML Registry Concept
Common Component Building Blocks
The Common Component Approach is in a design phase. This is one concept, borrowed from Federal Student Aid.
IRS NDR, Common Components, and MeF
• IRS NDR is in a comment and review period
• Federal NDR is in an earlier comment phase
• UBL 2.0 is close to draft release
• IRS NDR will align with Federal NDR and UBL 2.0
• Approach to MeF
  – Capture vocabulary into common component schema from existing MeF schema
  – Prototype NDR conformant schema for MeF
  – Assess impact and feedback to Federal WG and UBL TC
  – Publish common component schema for Tax Filing
  – Target MeF Release 5.0 for NDR conformance
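One way to picture the first step, capturing vocabulary from an existing schema, is to walk an XSD and list its named elements with their declared types. The sketch below does this for a tiny stand-in schema. The element names, `SAMPLE_XSD`, and `collect_vocabulary` are invented for illustration and are not taken from the MeF schema or from any IRS tooling.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"  # XML Schema namespace

# A tiny stand-in schema; the element names are hypothetical, not from MeF.
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Return">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="TaxPeriodEndDate" type="xs:date"/>
        <xs:element name="Filer" type="FilerType"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:complexType name="FilerType">
    <xs:sequence>
      <xs:element name="Name" type="xs:string"/>
      <xs:element name="EIN" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""


def collect_vocabulary(xsd_text: str) -> dict:
    """Map each named element in the schema to its declared type.

    Elements that define their type inline are recorded as 'anonymous'.
    """
    root = ET.fromstring(xsd_text)
    vocabulary = {}
    for element in root.iter(XS + "element"):
        name = element.get("name")
        if name:
            vocabulary[name] = element.get("type", "anonymous")
    return vocabulary


for name, type_name in sorted(collect_vocabulary(SAMPLE_XSD).items()):
    print(f"{name}: {type_name}")
```

A list like this is only a starting point. Deciding which names become shared components, and renaming them to conform to an NDR, remains a design and governance task.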
Some xmlCoP & Working Group Issues
• XML Performance and Size
• Standards Interoperability
• Naming & Design Rules
  – Schema Versioning and Governance
• XML Registry
• Transition
• XBRL
• Many Others!
IRS Partners for XML Policy and Standards
• Federal CIO’s Council Working Groups
  – XML Community of Practice (xmlCoP)
  – Semantic Interoperability CoP
• Federation of Tax Administrators (FTA)
  – TAG
  – TIGERS
• OASIS
  – Tax XML TC
  – UBL TC
    • Universal Business Language (UBL 1.0)
  – ebXML Registry TC
Data Exchange Scope
Current IRS/State Exchanges
• No Format Changes Pending!
  – There are no immediate plans to change the format of existing files.
• The future state and how we get there will be designed with
  – close communication and
  – collaboration between the IRS and our external partners.
• New exchanges will be standards driven.
Beyond MeF
• The IRS has worked closely with the TIGERS on MeF.
• MeF is just one of many taxpayer data exchanges. As a new system, it was one of the first to implement XML on a large scale.
• As legacy systems transition and new data exchanges are created, the IRS will continue to partner with the FTA and TIGERS to plan and manage this change. For example: E-Services, GLDEP & EOAD.
• XML vocabularies, policy and standards will be harmonized with Federal, state and local, industry and International communities.
Contact
John Triplett, Enterprise Data Management Office (EDMO), [email protected]
Doug Peterson, Electronic Tax Administration (ETA), [email protected]
Sol Safran, Prime Enterprise Data Management (EDM), [email protected]
Back-Up Slides
• Some slides with examples of issues under discussion
IRS Governmental Liaison Data Exchange Program (GLDEP)
• Data Extracts from various IRS master files and databases
• Shared with state and local tax administrations
• Currently 15 GLDEP data extracts
• Agencies enroll annually and select which extracts they wish to receive
Current Extracts
• 1099 Misc
• BMF/BRTF
• Corporate Affiliations
• CP 2000
• Exam/Appeals
• FEIN (Federal Employer ID Number)
• IMF/IRTF
• IRMF
• Levy
• Military Combat Zone
• Non-Itemizer
• PTIN (Preparer Tax ID Number)
• TAR (Taxpayer Address Request)
Examination Operational Automation Database
• EOAD is a relational database using different “files” within one database for different purposes
• Data is separated first by type of return (Form 1040, 1120, 1065, and 1120S) and then by type/purpose of records (Entity, Issue, Other, and Partners)
Compressibility of XML
[Diagram: a txt file is translated to XML, and both files are compressed with WinZip]
Format   Original size       WinZip size
txt      674,062 bytes       148,294 bytes
XML      11,421,822 bytes    94,369 bytes
The compressed version of the XML document is smaller than the compressed version of the original document!
Ref: Dan Winkowski, MITRE, XML Intro for Managers
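The effect can be explored with a small experiment. The sketch below renders the same synthetic records as fixed-width text and as XML, then compresses both with zlib (DEFLATE, the same family of compression used in ZIP archives). The record fields and sizes are made up and the run does not reproduce the figures above. Whether the compressed XML ends up smaller than the compressed text depends on the data.

```python
import random
import zlib
import xml.etree.ElementTree as ET

random.seed(0)  # fixed seed so the run is repeatable

# Synthetic records with hypothetical fields: a 9-digit id and an amount.
records = [(f"{random.randrange(10**9):09d}", random.randrange(100000))
           for _ in range(5000)]

# Fixed-width text: 9-character id followed by an 8-character amount.
txt = "\n".join(f"{rid}{amount:08d}" for rid, amount in records).encode()

# The same records as XML, with repeated tags around every value.
root = ET.Element("Records")
for rid, amount in records:
    rec = ET.SubElement(root, "Record")
    ET.SubElement(rec, "TaxpayerId").text = rid
    ET.SubElement(rec, "Amount").text = str(amount)
xml = ET.tostring(root)


def report(label: str, data: bytes) -> None:
    """Print raw size, compressed size, and their ratio for one rendering."""
    packed = zlib.compress(data, 9)
    print(f"{label}: {len(data):>9,} bytes raw, {len(packed):>9,} bytes compressed "
          f"({len(packed) / len(data):.1%} of raw)")


report("txt", txt)
report("XML", xml)
```

The repeated tag names are highly redundant, so the XML rendering shrinks by a much larger factor than the text rendering, which narrows the size gap once both are compressed.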