Multithreaded XML Import (San Francisco Magento Meetup)

Date post: 11-May-2015
Upload: aoe
Author: Fabrizio Branca
Transcript
Page 1: Multithreaded XML Import (San Francisco Magento Meetup)

Multithreaded XML Import

…for Magento

San Francisco Magento Meetup Group - October 23, 2013

Page 2: Multithreaded XML Import (San Francisco Magento Meetup)

Fabrizio Branca Lead System Developer at

Page 3: Multithreaded XML Import (San Francisco Magento Meetup)

E-Commerce: Magento

CMS: TYPO3

Portals: ZF, FLOW,…

Mobile

Searchperience: SOLR

High Performance/Scale

Global Enterprise Projects

120 people in 7 offices world-wide

Page 4: Multithreaded XML Import (San Francisco Magento Meetup)

Aoe_Import github.com/AOEmedia/Aoe_Import

git clone --recursive …

Page 5: Multithreaded XML Import (San Francisco Magento Meetup)

Will Aoe_Import be the fastest product importer around?

YES, of course!

Well, maybe…

Actually, Aoe_Import is only an XML importer “framework”. It’s up to you to decide how to handle the XML snippets…

Page 6: Multithreaded XML Import (San Francisco Magento Meetup)

Aoe_Import

for large XML files

XML! Not CSV.

Stream processing (XMLReader)

“event” driven: Subscribe your “Processors” to xpaths

full flexibility in processor implementation

multi-thread support!
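
To make the "subscribe processors to paths" idea concrete, here is a minimal sketch of stream processing with XMLReader. It is not Aoe_Import's actual API: the function name `streamXml()`, the subscription array layout, and the sample XML are assumptions made for illustration. Paths are matched as plain strings such as `/catalog/product`, not as full XPath expressions.

```php
<?php
// Hypothetical sketch, NOT Aoe_Import's real API.
// Streams an XML document with XMLReader and calls every callback
// subscribed to the path of the element that was just opened.

function streamXml(string $xml, array $subscriptions): void
{
    $reader = new XMLReader();
    $reader->XML($xml);   // for large files, use $reader->open($filename) instead
    $path = [];           // stack of currently open element names

    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT) {
            $path[] = $reader->name;
            $current = '/' . implode('/', $path);
            $isEmpty = $reader->isEmptyElement;

            if (isset($subscriptions[$current])) {
                // Only the matched snippet is turned into an object tree.
                $snippet = new SimpleXMLElement($reader->readOuterXml());
                foreach ($subscriptions[$current] as $processor) {
                    $processor($snippet);
                }
            }
            if ($isEmpty) {
                array_pop($path);   // <foo/> produces no END_ELEMENT event
            }
        } elseif ($reader->nodeType === XMLReader::END_ELEMENT) {
            array_pop($path);
        }
    }
    $reader->close();
}

// Usage: collect the SKU of every product snippet.
$skus = [];
$sampleXml = '<catalog><product><sku>a-1</sku></product>'
    . '<product><sku>b-2</sku></product></catalog>';
streamXml($sampleXml, [
    '/catalog/product' => [
        function (SimpleXMLElement $product) use (&$skus) {
            $skus[] = (string) $product->sku;
        },
    ],
]);
```

Because the reader only moves forward and each snippet is built on demand, memory use depends on the size of one snippet rather than the size of the whole file.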

Page 7: Multithreaded XML Import (San Francisco Magento Meetup)

Problem

[Chart: memory over time, with the labels “Memory limit” and “single product”]

Page 8: Multithreaded XML Import (San Francisco Magento Meetup)

Trivial Solution

[Two charts: memory over time, each with a “Memory limit” line]

Page 9: Multithreaded XML Import (San Francisco Magento Meetup)

Beat the memory leak by forking

[Diagram labels: “Process import”, “Threading overhead”, “Waiting for other thread to terminate”]

Page 10: Multithreaded XML Import (San Francisco Magento Meetup)

Forking? In PHP?

$pid = pcntl_fork();
if ($pid === -1) {
    // fork failed, no child process was created
    die("could not fork\n");
} elseif ($pid) {
    // parent process runs what is here
    echo "parent\n";
} else {
    // child process runs what is here
    echo "child\n";
}

Page 11: Multithreaded XML Import (San Francisco Magento Meetup)

Threadi github.com/AOEmedia/Threadi

Page 12: Multithreaded XML Import (San Francisco Magento Meetup)

Threadi

Clean OOP interface for forking and process management in PHP

Page 13: Multithreaded XML Import (San Francisco Magento Meetup)

Batch Processor

Collect a bunch of imports…

…fork…

…and process them in a child process.
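
The collect, fork, process cycle can be sketched with plain pcntl calls. This is not Threadi or Aoe_Import code: `processInChild()` is a hypothetical helper, the batch size of 4 is arbitrary, and the list of integers stands in for real import items. It assumes PHP 7+ on the CLI with the pcntl extension loaded.

```php
<?php
// Minimal sketch, not Threadi or Aoe_Import code.
// processInChild() is a hypothetical helper name.

/**
 * Runs $work($batch) in a forked child and blocks until the child exits.
 * Returns the child's exit code, or -1 if it did not exit normally.
 */
function processInChild(array $batch, callable $work): int
{
    $pid = pcntl_fork();
    if ($pid === -1) {
        throw new RuntimeException('Could not fork');
    }
    if ($pid === 0) {
        // Child: memory allocated here is released when the child exits.
        try {
            $work($batch);
            exit(0);
        } catch (Throwable $e) {
            exit(1);
        }
    }
    // Parent: wait for the child so batches run one after another.
    pcntl_waitpid($pid, $status);
    return pcntl_wifexited($status) ? pcntl_wexitstatus($status) : -1;
}

$imports = range(1, 10);   // stand-in for the collected imports
$exitCodes = [];
foreach (array_chunk($imports, 4) as $batch) {
    $exitCodes[] = processInChild($batch, function (array $batch) {
        foreach ($batch as $import) {
            // import a single item here
        }
    });
}
```

The parent only collects items and waits, so whatever memory the import code leaks is confined to the short-lived child.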

Page 14: Multithreaded XML Import (San Francisco Magento Meetup)

No imports are processed in the main thread. So there’s no memory leak happening here.

Every fork starts with the low memory footprint of the main thread.

Find the number of imports that can be processed at a time without hitting the memory limit.

[Chart: memory over time with a “Memory limit” line; labels: “Main thread”, “Forks”, “Create process collection”, “Process imports in process collection”, “Threading overhead”, “Waiting for other thread to terminate”]

Page 15: Multithreaded XML Import (San Francisco Magento Meetup)

Multi-threading? Sure!

Number of items in a batch

Number of threads processed in parallel
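
The two tuning knobs, batch size and number of parallel workers, can be illustrated with a small process pool. This is a sketch under assumptions, not how Threadi implements it: `runBatchesInParallel()` and its parameter names are made up for this example, and the callback returns `true` or `false` to signal whether its batch succeeded. It needs the pcntl extension on the CLI.

```php
<?php
// Sketch only: runBatchesInParallel() is a hypothetical name.
// $batchSize  = number of items in a batch
// $maxWorkers = number of child processes allowed to run in parallel
// Returns the number of batches whose child did not exit with code 0.

function runBatchesInParallel(array $items, int $batchSize, int $maxWorkers, callable $work): int
{
    $running = [];   // pid => true for children that are still running
    $failed = 0;

    // Blocks until any one child exits and records whether it succeeded.
    $reapOne = function () use (&$running, &$failed) {
        $pid = pcntl_wait($status);
        if ($pid <= 0) {
            $running = [];   // no children left to wait for
            return;
        }
        unset($running[$pid]);
        if (!pcntl_wifexited($status) || pcntl_wexitstatus($status) !== 0) {
            $failed++;
        }
    };

    foreach (array_chunk($items, $batchSize) as $batch) {
        if (count($running) >= $maxWorkers) {
            $reapOne();      // pool is full: wait for a free slot
        }
        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException('Could not fork');
        }
        if ($pid === 0) {
            // Child: process one batch, report the result via the exit code.
            exit($work($batch) ? 0 : 1);
        }
        $running[$pid] = true;
    }
    while ($running) {
        $reapOne();          // drain the remaining children
    }
    return $failed;
}

// Usage: 10 items, 3 per batch, at most 2 children at a time.
$failedBatches = runBatchesInParallel(range(1, 10), 3, 2, function (array $batch) {
    return true;   // import the batch here
});
$failedWithError = runBatchesInParallel(range(1, 10), 3, 2, function (array $batch) {
    return !in_array(5, $batch, true);   // simulate one failing batch
});
```

A larger batch size lowers the forking overhead but raises peak memory per child; more workers raise throughput until the database or CPU becomes the bottleneck.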

Page 16: Multithreaded XML Import (San Francisco Magento Meetup)

Problems? Database Connection

Database connection doesn’t like to be cloned!

Mage::getSingleton('core/resource')
    ->getConnection('core_write')
    ->closeConnection();
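
Why closing the connection before forking helps can be shown without Magento. In the sketch below, `LazyConnection` is a hypothetical helper and the factory returns a plain object instead of a real database handle; both are assumptions for illustration. The point is only the ordering: close in the parent, fork, then let each process open its own connection on first use.

```php
<?php
// Sketch only. LazyConnection is a hypothetical helper, not part of Magento;
// the factory stands in for whatever really opens the database connection.

final class LazyConnection
{
    private $factory;
    private $connection = null;

    public function __construct(callable $factory)
    {
        $this->factory = $factory;
    }

    /** Opens the connection on first use, or on first use after close(). */
    public function get()
    {
        if ($this->connection === null) {
            $this->connection = ($this->factory)();
        }
        return $this->connection;
    }

    /** Drops the connection so the next get() opens a fresh one. */
    public function close(): void
    {
        $this->connection = null;
    }
}

$db = new LazyConnection(function () {
    // Stand-in for a real connection; records which process opened it.
    return (object) ['openedBy' => getmypid()];
});

$db->get();     // parent uses the connection while collecting imports
$db->close();   // close before forking so the child cannot inherit it

$pid = pcntl_fork();
if ($pid === -1) {
    throw new RuntimeException('Could not fork');
}
if ($pid === 0) {
    // Child: the first get() opens a connection owned by this process.
    exit($db->get()->openedBy === getmypid() ? 0 : 1);
}
pcntl_waitpid($pid, $status);
$childOk = pcntl_wifexited($status) && pcntl_wexitstatus($status) === 0;
```

With a real database handle, skipping the `close()` call would leave parent and child talking over the same socket, which is the cloning problem the slide refers to.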

Page 17: Multithreaded XML Import (San Francisco Magento Meetup)

Problems? Thread Safety

Page 18: Multithreaded XML Import (San Francisco Magento Meetup)

Problems? Thread Safety

--- a/app/code/core/Enterprise/Catalog/Model/Index/Action/Catalog/Category/Product/Refresh.php
+++ b/app/code/core/Enterprise/Catalog/Model/Index/Action/Catalog/Category/Product/Refresh.php
@@ -326,7 +326,7 @@ class Enterprise_Catalog_Model_Index_Action_Catalog_Category_Product_Refresh
             ->setComment('Catalog Category Product Index Tmp');
         $this->_connection->dropTable($this->_getMainTmpTable());
-        $this->_connection->createTable($table);
+        $this->_connection->createTemporaryTable($table);
     }
 
     /**

Page 19: Multithreaded XML Import (San Francisco Magento Meetup)

Other Use-Cases?

Queue processing

Scheduler

Indexes

Everything that’s batchable

Page 20: Multithreaded XML Import (San Francisco Magento Meetup)

Thank you! Any questions?

http://www.aoemedia.com

My blog: http://www.fabrizio-branca.de

Follow me on Twitter: @fbrnc

