Grammarly - example.celunwen.comexample.celunwen.com/grammarly ·...

Grammarly Report generated on Saturday, May 6, 2017, 7:40 AM (Page 1 of 35)

DOCUMENT SCORE: 93 of 100
ISSUES FOUND IN THIS TEXT: 54
PLAGIARISM: Checking disabled

Contextual Spelling: 5
    Confused Words: 3
    Mixed Dialects of English: 1
    Misspelled Words: 1
Grammar: 25
    Determiner Use (a/an/the/this, etc.): 11
    Faulty Subject-Verb Agreement: 5
    Incorrect Verb Forms: 4
    Incorrect Noun Number: 2
    Wrong or Missing Prepositions: 1
    Pronoun Use: 1
    Incorrect Phrasing: 1
Punctuation: 11
    Comma Misuse within Clauses: 6
    Punctuation in Compound/Complex Sentences: 5
Sentence Structure: 1
    Incomplete Sentences: 1
Style: 12
    Passive Voice Misuse: 4
    Wordy Sentences: 4
    Improper Formatting: 3
    Politically Incorrect or Offensive Language: 1
Vocabulary enhancement: No errors


Abstract

A growing number of datasets for Human Activity Recognition (HAR) have been made publicly available in recent years. HAR has gained attention because of its wide range of applications, from surveillance, medical and personal assistive tools, and robotics to human-machine interaction. With the recent success of deep learning, especially in image classification, researchers have increasingly shifted their focus from traditional processing to deep learning techniques. Although extracting the correct features for further processing is still a challenge, traditional techniques are still used in HAR to avoid the computational complexity that comes with deep learning methods. Understanding human behavior is a challenging problem in computer vision, and recent years have witnessed significant advances, with novel methods proposed for tracking, pose estimation, and movement recognition. This survey gives a succinct description of the existing techniques and methods applied in HAR, building on previous surveys and papers.

Keywords: human action recognition, activity recognition, feature extraction

1. Introduction

The study of human motion dates back to the fifteenth century, when Leonardo da Vinci studied human appearance to help his students draw human actions accurately, such as people climbing, going upstairs, or going downstairs [https://www.slideshare.net/zukun/cvml2011-human-action-recognition-ivan-laptev-9017571]. In this work, one of the well-documented early studies related to human action recognition, da Vinci insisted that a painter


should be fully aware of the structure of the body (the nervous system, the muscles, the bones, etc.) in order to understand its various motions.

Intelligent environments (intelligent homes, intelligent electronic devices) exploit data collected from users to anticipate the likely outcome of a situation, including the worst-case scenario. Such a system can gather information, interpret it, and then take or suggest an action. In this era of intelligent automated systems, common tasks such as walking, standing, running, and sleeping are being studied and interpreted by computer systems.

Identifying humans in video sources has attracted increasing attention in several application domains, such as content-based video annotation and retrieval, video surveillance, and other applications [1]–[3]. Giving semantic meaning to human actions or behavior is challenging, however, because it is not necessarily easy to understand what an action means. This complexity is a source of challenges from an academic point of view. In fact, there is no single best way to categorize the research because of this complexity, but, mainly following [4], we can group applications into three types: Surveillance, Control, and Analysis.

People counting and the analysis of crowd flux, flow, and congestion in public areas such as train stations, bus stations, or malls [5] can be grouped under Surveillance applications; human-computer interfaces [6] and virtual reality fall under Control applications; and the diagnosis of patients falls under Analysis applications of human action recognition in the computer vision field. The number of potential applications, the speed and price of current hardware (especially in low-income countries), and the focus on security issues have intensified work within the computer vision community on retrieving, collecting, and analyzing human behavior. Furthermore, the


rise of terrorism and security concerns has greatly increased research in the field, especially in security [7], that is, in surveillance.

Major applications of HAR are found in security, medicine, entertainment, and interaction. Building on previous studies, counterterrorism teams can detect and predict suspicious behavior from a number of patterns and techniques. In medicine, personal devices can provide a live and accurate picture of a patient's health status (in particular for elderly people) and so enable a direct and quick response from the doctor. In entertainment, HAR methods can help identify and even predict a player's next move. In interaction, HAR methods support robotic systems that come close to expressing, understanding, and reflecting human behavior. According to the complexity of the situation at hand, behavior can be categorized into actions, gestures, and interactions [8], as in Figure 1 below.

An action is a form of expression composed of different gestures; running and climbing are examples of common actions, and their duration is variable. A gesture is a non-vocal form of communication in which the actor expresses and exchanges information through one part of the body or a combination of parts, mostly the hands, feet, and head. A gesture usually does not last long. An interaction is an action during which actors (human or non-human) exchange information or interact, as in hugging or in scanning a QR code with one device held over another.

Because of the challenges surrounding Human Activity Recognition (intra-class variations, viewpoint variations, environmental complexity, occlusions, and more), current systems still do not produce accurate results. Although studies in HAR date back decades, researchers are still trying to come close to the human ability


to take a few series of items and categorize them (what will later be called a filter or training set) and then, from these filters, to classify any other element they encounter. Computer vision researchers are trying to match this human capability. We must acknowledge that significant advances have been made so far, even though computer vision still cannot match the human visual system.

HAR methods can be divided into manually designed feature approaches and data-driven approaches, which differ in how features are obtained for classification. Manually designed features include the Histogram of Oriented Gradients (HOG), Local Binary Pattern (LBP), Scale-Invariant Feature Transform (SIFT), Hessian3D, and Enhanced Speeded-Up Robust Features (ESURF). Data-driven approaches mostly use deep learning, where features are detected, interpreted, and processed automatically by the system, in contrast to the older approaches, where features are chosen by a human.

In general, traditional approaches apply a bottom-up methodology in three steps: foreground detection, feature extraction, and finally classification (Figure 2). As previously noted, multiple surveys and reviews have been published with different taxonomies and approaches to human action recognition. [8] classifies HAR into two categories, single-layered approaches and hierarchical approaches. Single-layered approaches focus on gestures and actions, in other words low-level human activities, whereas hierarchical approaches focus on more complex, high-level human activities (sometimes described in terms of sub-events). The single-layered category is subdivided into space-time and sequential approaches, and the hierarchical category into statistical, syntactic, and description-based approaches.

[9] presented the available resources, datasets, and libraries, as well as the challenges of HAR, and dealt with the problems of background subtraction: change detection and salient motion detection. Other researchers studied video-based representation; in particular, [10] categorized global and local feature extraction, using background construction-based methods and foreground extraction-based methods. [11] and [12], respectively, provide a review covering the stages of HAR, from low-level processing to high-level feature processing, with a focus on healthcare applications, and a review of object segmentation, image processing, and activity recognition that briefly covers sensor-based and vision-based methods, the Hidden Markov Model (HMM), and Principal Component Analysis (PCA).

Occlusion, variation in execution rate, anthropometry, camera motion, and background clutter are some of the challenges faced in HAR, as mentioned earlier and as noted in [13]. Mid-level feature representation using a sparse classifier for discriminative part selection was studied in [14]. Similarly, [15] studied a confidence-based approach to HAR, proposing a method for choosing between Dense Trajectories (DT) features and high-level pose features. A literature review of semantic-based HAR systems using semantic features is presented in [16].

Data acquisition is one of the most essential steps in computer vision, and data can be obtained from multiple sources. The overall functionality of a system therefore depends on the use of appropriate tools, and convincing improvements have been made toward this end [17][18]. Depending on their dimensionality and depth, the data obtained from these devices, and the tools themselves, are classified as 2D or 3D. When data are acquired in 2D form, information from one dimension is lost, because real-world data are usually three-dimensional. This also implies that systems using a 3D approach are more accurate than 2D systems.

Reviews and surveys of HAR already exist, but because of the field's growing popularity those documents quickly become outdated; writing a review in a field where improvements are so frequent is inherently challenging. In this paper we contribute a discussion and comparison of the methods applied in HAR. The rest of the survey is organized as follows: Section 2 discusses manually designed feature approaches, Section 3 discusses data-driven approaches (deep and non-deep learning), Section 4 offers some discussion, Section 5 introduces some existing datasets, and Section 6 concludes.

2. Manual design features

Manually designed feature approaches have achieved impressive results in HAR over the years. The approach uses a feature detector (global or local) to extract important features, that is, properties of a portion of the overall image or image sequence, at the low level, the middle level, or the high level. It then classifies them by training a classifier such as the Support Vector Machine (SVM) [19][20][21][22]. The approach includes space-time based techniques (space-time volumes, space-time trajectories, space-time features), appearance-based techniques (shape-based, motion-based, hybrid), local binary patterns, and fuzzy logic-based techniques, as shown in Figure 2, with an emphasis on low-level, mid-level, and high-level features [23] and on spatio-temporal features inspired by the data models of [24][25], among many others that have attained good results for action recognition.

The popularity of human action recognition, or human behavior recognition, has led to numerous published articles and papers [6], [26]–[31]. These articles focus on the different features and classifiers used in human behavior recognition. In practice, considerable hardware resources and vision algorithms are required to handle the data (acquiring, saving, and processing 2D and 3D, static and moving inputs). 3D data can be obtained mainly through three categories of components: marker-based motion capture systems, of which MoCap [http://mocap.cs.cmu.edu/] is a good illustration; stereo cameras; and range or depth sensors such as the Microsoft Kinect.

Although vision-based action recognition continues to grow, various challenges are still not completely resolved: the variety of actions, the mood of the actor, occlusion, camera position, background, etc. To address these issues, some researchers have used wearable inertial sensors, including accelerometers and gyroscopes (mostly in smartphones) [32]–[37]. Although there are many papers on human behavior recognition using either depth sensors or inertial sensors, the purpose of this survey is to report on the current state of their application in the computer vision field.

Acquiring 3D data requires tools. The basic one, affordable to almost everyone, is the Kinect (Microsoft Xbox), but the cheapest and easiest tool is the smartphone, with the latest statistics reporting 2.32 billion users worldwide (Figure 3) [https://www.statista.com/statistics/330695/number-of-smartphone-users-worldwide/ (accessed April 2017)]. The Kinect sensor includes a camera, an infrared depth sensor, a microphone, and an LED light, as shown in Figure 4 and Figure 5. It can capture 8-bit and 16-bit data per channel at resolutions of 320×240 and 640×480 pixels. Heterogeneous methods have been applied to process the data obtained from these tools [2], [3], [38]–[41].

Wearable inertial sensors are usually attached directly to or placed on the human body (smartphones and other sensor equipment) and, more rarely, connected or placed indirectly. They generate acceleration and rotation signals corresponding to the action performed by the actor (mostly a human). Figure 4 shows a captured image of a 3D skeleton data source. Acquiring 2D information, by contrast, requires only a simple and widely accessible tool, such as a mobile phone with a built-in camera. This shows that 2D data are easier to obtain than 3D data.

2.1. Appearance based approach

Shape-based, motion-based, and hybrid approaches are discussed in this part, where methods and techniques are applied to both 2D and 3D data. A shape-based objective is pursued in [42], whose authors propose a bag-of-words (BOW) framework to classify each frame of a video, and in [43]; a tensor shape descriptor with tensor dynamic time warping was used by [44]. Other articles have also applied appearance-based approaches, for example to gesture recognition [45] and blob analysis [46].

2.1.1. Shape based

In this approach, features are obtained from the shape of the silhouette. [47] obtained 3D data, which are converted to 2D data using the spatial distribution of gradients; the data are then processed with the R-transform. The technique is applied to the Weizmann, KTH, and Ballet datasets. In [17] the authors analyzed feature maps to separate the silhouette from a noisy background; the framework then performs tracking to follow the movement of the silhouette in the scene. The method creates a sequence of scenes from the human silhouette map representation and uses a hybrid classifier. In practice, a HAR method should be computationally lean. A similar method using k-nearest neighbors was proposed in [48].

[49] proposed a pose-based, view-invariant HAR method based on contour points, with a sequence of multi-view key poses. In [50] the authors employ the contour points of the human silhouette and a radial scheme, with an SVM as classifier. [51], [52] build a region-based descriptor by extracting features from the regions surrounding the silhouette in the image. [52] used pose information by first extracting scale-invariant features, then clustering them to build the key poses, and finally classifying with a weighted voting scheme.

2.1.2. Motion based

In this approach, features are obtained from motion and applied with a generic classifier. A motion descriptor was proposed in [53] for representing unconstrained videos. The descriptor is based on explicit motion modeling operating on codewords generated from dense local patch trajectories, and so does not need foreground-background separation. Another motion-based method was introduced by [54] using histograms of oriented gradients. In [55], an action recognition method was proposed based on a human-object interaction descriptor and pose estimation. Other authors applied kinematic spline curves [56], multiple key motion history images [57], motion trajectories [58], and joint motion similarity [59].
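To make the idea of a motion history image concrete, here is a minimal sketch using only the Python standard library. It assumes the classic formulation in which pixels that changed between two consecutive frames are set to a maximum value `tau` and all other pixels decay by one per frame; the function names, `tau`, and `threshold` are illustrative choices, and the cited works may use different variants.

```python
def update_mhi(mhi, prev_frame, frame, tau=10, threshold=30):
    """One update step: pixels that changed are set to tau, the rest decay by 1."""
    return [
        [tau if abs(cur - prev) > threshold else max(old - 1, 0)
         for old, prev, cur in zip(mhi_row, prev_row, cur_row)]
        for mhi_row, prev_row, cur_row in zip(mhi, prev_frame, frame)
    ]


def motion_history_image(frames, tau=10, threshold=30):
    """Accumulate a motion history image over a list of grayscale frames.

    Each frame is a list of rows of pixel intensities (assumed layout).
    """
    height, width = len(frames[0]), len(frames[0][0])
    mhi = [[0] * width for _ in range(height)]
    for prev_frame, frame in zip(frames, frames[1:]):
        mhi = update_mhi(mhi, prev_frame, frame, tau, threshold)
    return mhi


if __name__ == "__main__":
    # A single bright pixel moving from left to right along one row.
    frames = [[[255 if c == t else 0 for c in range(5)]] for t in range(4)]
    print(motion_history_image(frames))
```

Recent motion ends up with the largest values and older motion with smaller ones, so a single image summarizes both where and how recently movement occurred.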

2.1.3. Hybrid

Hybrid approaches combine shape-based and motion-based features. Map-level and silhouette-based shape features were used to separate noise from the actual silhouette in [54], followed by histograms of oriented gradients for better classification. Other methods based on the hybrid approach were proposed in [60][61]. BOW and a block-wise weighted kernel function matrix were used for multi-view recognition in [62], while [63] applied shape-motion prototype trees, representing an action as a sequence of prototypes and using a distance measure for sequence matching; the method was tested on five datasets. [64] proposed a key-pose method as a variant of motion energy and motion history images, with a simple nearest-neighbor classifier.

2.2. Space time based approach

These approaches recognize activities based on space-time features or on trajectory matching, and an activity is represented by a set of space-time features. The approach has four major components: the space-time interest point detector, with two sub-categories, dense detectors and sparse detectors; the feature descriptor, with local and global features as types; the vocabulary, comprising BOW and model-based variants; and finally the classifier, with supervised and unsupervised categories. Figure 6 shows an example of human actions with dense trajectories as applied in [65], and Figure 7 shows the major components available and applied in the space-time approach. Moreover, [66] employed motion features as input to hidden conditional random fields (HCRF) to handle a much broader range of complex hidden structures, whereas [67] proposed real-time classification and prediction of actions.

An action descriptor of HIP, relying on the work of [68], was proposed by [69], and [70] proposed incorporating information from human-object interactions, applied over several datasets.

2.2.1. Space time volumes

In [71], a HAR system was proposed using temporal-spatial semantics; instead of using space-time volumes (STV), the authors used templates composed of 2D observations. The approach was then extended by [72], where the motion history image, a foreground image approach, and HOG were combined, with SMILE-SVM used for the final classification. Applying space-time based approaches to different datasets has shown outstanding accuracy, as in [73], with an accuracy of 98.2% on the KTH dataset, and in [74], with 89.4% on the UCF (University of Central Florida) dataset using discriminative clustering, tree mining, tree clustering, and ranking to select discriminative tree patterns.

2.2.2. Space time trajectories

A human action can be seen as a set of spatio-temporal trajectories. These trajectories have different levels of abstraction, from low-level trajectories to high-level trajectories such as handwritten characters. However, all space-time trajectory approaches share a common property: time-structured patterns. Space-time trajectories are applied to joint positions (body joints) to differentiate actions. Based on this notion, many papers have been published and approaches proposed [75], [76].

Inspired by dense sampling in image classification, [65] introduced the concept of dense trajectories for video action recognition. Dense points are sampled from each frame and tracked using displacement information from a dense optical flow field. The approach is robust to irregular motion changes. [77] improved on the work of [65] by using SURF descriptors and dense optical flow to improve the estimation. However, when the approach is applied with a high density of trajectory features in the video, the computational cost increases. There have been attempts to reduce this cost: saliency maps were used to capture the salient regions within a frame, as in [78], [79], [80]. Applying a saliency map makes it possible to drop some dense trajectory features during processing without compromising the frame input.

In 2016 two major publications became available [81], [82], representing skeleton shapes as trajectories on Kendall's shape manifold. The method uses transported square-root vector fields (TSRVFs) of trajectories and the standard Euclidean norm to reduce the computational cost and increase computational efficiency. [83] used HOG, HOF, and MBH descriptors for trajectories, reporting high accuracy. [53] proposed an explicit motion modeling method to address the challenge of HAR in unconstrained videos.

2.2.3. Space time features

In general, space-time features are local properties that contain discriminative action characteristics. They can be divided into two categories: sparse and dense. Feature detectors based on interest point detectors, such as BOW [84] and 3D HOF [85], are grouped in the sparse category, while those based on optical flow are grouped in the dense category. Interest point detectors provide the foundation for most recently proposed methods (algorithms).

[86] built a feature descriptor framework and applied PCA-SVM for classification, and [87] used a comparison of Harris3D and multimodal decomposable models for classification. BOW is still the most popular representation method, with all its variations such as the bag of visual words (BOVW), which follows a feature extraction step, a codebook generation step, an encoding step, and a pooling step [88], [89], [90], [91]. The performance of the BOVW variant of BOW is due to effective dense-trajectory low-level features. To further improve space-time feature methods and provide better performance, some researchers have applied Fisher vectors and space-time occurrence.
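The codebook generation, encoding, and pooling steps of a bag-of-visual-words pipeline can be sketched as follows. This is a minimal illustration, assuming that local descriptors have already been extracted, that the codebook is built with plain k-means, that encoding is hard assignment to the nearest codeword, and that pooling is a normalized count histogram. All function names and parameters are hypothetical; the cited works may use other encodings.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(descriptor, codebook):
    """Encoding step: index of the closest codeword (hard assignment)."""
    return min(range(len(codebook)), key=lambda j: sq_dist(descriptor, codebook[j]))


def build_codebook(descriptors, k, iterations=20, seed=0):
    """Codebook generation step: plain k-means over training descriptors."""
    rng = random.Random(seed)
    codebook = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for d in descriptors:
            groups[nearest(d, codebook)].append(d)
        for j, members in enumerate(groups):
            if members:  # keep the old codeword if no descriptor chose it
                codebook[j] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def bovw_histogram(descriptors, codebook):
    """Pooling step: normalized histogram of codeword occurrences for one video."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest(d, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts]


if __name__ == "__main__":
    train = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    codebook = build_codebook(train, k=2)
    print(bovw_histogram([[0, 0], [1, 1], [0, 1], [10, 10]], codebook))
```

The resulting fixed-length histogram is what a classifier such as an SVM would consume, regardless of how many local descriptors each video produced.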

Space-time approaches that use global feature detectors have the disadvantage of being sensitive to noise and occlusion. The presence of multiple people in a scene also makes it hard for space-time approaches to recognize actions, because space-time features focus mainly on spatio-temporal information. Other limitations are that STV approaches lack the capacity to recognize multiple entities (people) in a multi-person frame, and that trajectory-based approaches lack precision in localizing joint positions. Space-time approaches, though suitable for simple datasets, require combinations of multiple features to handle complex datasets, which also increases computational complexity. To overcome these limitations, background subtraction, sliding windows, and other methods may be applied.

2.3. Other approaches

In contrast to the previous sections, there are other methods and techniques that can be categorized as traditional approaches but do not fit into the appearance-based or space-time approaches above. We have grouped them here as other approaches, such as the Local Binary Pattern and fuzzy logic-based approaches.


2.3.1. Local binary pattern

The Local Binary Pattern (LBP) is a type of visual descriptor, or Texture Spectrum model, used for classification in computer vision, introduced to the field in 1990 by [92] [https://en.wikipedia.org/wiki/Local_binary_patterns]. Since its introduction, LBP combined with HOG has shown considerable improvement in detection performance, and a full survey of the different LBP versions was published by [93] in 2016.

Several versions have been proposed for different classification tasks [94], [95]. A HAR face recognition method was proposed by [96] based on a nearest-neighbor interpolation classifier. It was applied to the Olivetti Research Laboratory dataset, achieving a recognition rate of 97.5%. Another human action recognition approach, using LBP with a Gaussian mixture, was presented in [97]; on top of the intensity-difference property of LBP, the authors' method extracts multiple features and applies error-correcting output codes with a support vector machine classifier.

The local binary pattern approach has also been applied to multi-view HAR, as in [98], which proposes a multi-view method based on contour-based pose features and uniform rotation-invariant LBP with a support vector machine classifier. The Motion Binary Pattern was introduced for multi-view HAR by [99], combining the Volume Local Binary Pattern and optical flow. It was tested on the INRIA Xmas Motion Acquisition Sequences dataset, with a reported accuracy of 80.5%.
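As an illustration of how a basic LBP descriptor is computed, the sketch below implements the common 3×3, 8-neighbor variant: each neighbor that is at least as bright as the center pixel contributes one bit to an 8-bit code, and the codes are pooled into a 256-bin histogram. The neighbor ordering, the `>=` comparison, and the function names are assumptions made for this example; the versions surveyed above (uniform, rotation-invariant, volume LBP) differ in these details.

```python
# Neighbour offsets (row, column), clockwise from the top-left pixel.
# The neighbour at position i contributes the bit 2**i to the code.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(gray, row, col):
    """8-bit LBP code of the pixel at (row, col); it must not lie on the border."""
    center = gray[row][col]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if gray[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(gray):
    """Normalized 256-bin histogram of LBP codes over all interior pixels."""
    counts = [0] * 256
    for row in range(1, len(gray) - 1):
        for col in range(1, len(gray[0]) - 1):
            counts[lbp_code(gray, row, col)] += 1
    total = sum(counts)
    return [c / total for c in counts]


if __name__ == "__main__":
    patch = [[5, 9, 1],
             [4, 6, 7],
             [2, 3, 8]]
    print(lbp_code(patch, 1, 1))
```

Because the code depends only on the sign of intensity differences, it is unchanged by any monotonic change in illumination, which is one reason the descriptor is popular for texture and face analysis.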

2.3.2. Fuzzy Logic

Traditional approaches employ spatial or temporal features with a generic classifier for representation and classification. However, it is challenging for them to handle the uncertainty and complexity involved in real-world applications. To address this issue, fuzzy logic was introduced, which treats truth as a real-valued degree in the range 0 to 1 rather than as a strictly binary value. The notion was first introduced in 1965, in fuzzy set theory, by Lotfi Zadeh [https://en.wikipedia.org/wiki/Lotfi_A._Zadeh].

To handle this uncertainty, fuzzy logic-based approaches have been applied, as in [100], which is based on Interval Type-2 Fuzzy Logic Systems with feature information optimized by the Big Bang-Big Crunch algorithm. The experiments, performed on the Weizmann human action dataset, showed that it outperformed the equivalent Type-1 Fuzzy Logic System and non-fuzzy methods in recognition accuracy and analysis performance. In [101] the authors used silhouette slice features and movement speed features, and employed the fuzzy c-means clustering technique to obtain the membership functions. In [102] a fuzzy logic-based classifier was used to recognize human intention, and [103] applied a fuzzy view estimation framework to predict the squat evolution of scenarios.
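The membership degrees produced by fuzzy c-means can be illustrated with a short sketch. It assumes the standard formulation, in which the membership of a point in cluster j is proportional to d_j^(-2/(m-1)), where d_j is the distance to center j and m > 1 is the fuzzifier; the function name and default values are illustrative, and this shows only the membership step, not the full clustering loop used in the cited work.

```python
import math


def fcm_memberships(point, centers, m=2.0, eps=1e-12):
    """Degrees of membership (each in [0, 1], summing to 1) of one point.

    point   -- a feature vector (sequence of floats)
    centers -- list of cluster centers with the same dimensionality
    m       -- fuzzifier; values closer to 1 give crisper memberships
    """
    # Clamp distances so a point lying exactly on a center does not divide by zero.
    distances = [max(math.dist(point, c), eps) for c in centers]
    inverse = [d ** (-2.0 / (m - 1.0)) for d in distances]
    total = sum(inverse)
    return [v / total for v in inverse]


if __name__ == "__main__":
    centers = [[0.0], [4.0]]
    print(fcm_memberships([1.0], centers))  # mostly in the first cluster
    print(fcm_memberships([2.0], centers))  # equally shared
```

Unlike hard clustering, a sample halfway between two clusters is not forced into either one, which is how the approach represents uncertainty in the features.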

Most HAR approaches are view-dependent and recognize an activity from a fixed viewpoint. In real-world applications, however, recognition must work from any viewpoint. This motivates the use of multiple cameras to collect data, but that solution is difficult in practice because of camera calibration. Along these lines, [104] proposed a view-invariant method using a single camera and a clustering algorithm, applied to the IXMAS dataset. In addition, approaches based on neuro-fuzzy systems have been proposed, for gesture recognition in particular [105] and for other behavior recognition [106], and have also been very successful.

3. Data-Driven Approach

As mentioned earlier, the performance of HAR depends on the methods used, on appropriately chosen features, and on an efficient representation of the data.


Unlike traditional approaches, where an action is represented by hand-picked feature detectors and descriptors, learning-based approaches can learn features automatically from raw data. This introduces the concept of end-to-end learning, meaning a direct mapping from the pixel level to action classification. These approaches are grouped into non-deep learning and deep learning approaches, as shown in Figure 8 below.

3.1. Non-Deep Learning-Based

One category, dictionary learning, is a type of representation learning that generally focuses on sparse representation. It has been used in many applications, such as image classification and action recognition [107]. The concept is similar to the BOVW methodology because it is based on a vector representation; these vectors are called codewords, or sometimes dictionary atoms. In [108], four datasets were studied, with the authors applying spatio-temporal motion features.

Genetic programming is an evolutionary technique inspired by the process of natural evolution. It may be used to solve problems without prior knowledge and to help maximize recognition performance. Along the way, feature descriptors are evolved using 3D operators such as the 3D Gabor filter and wavelets.

[109] proposed a discriminative Bayesian approach, evaluated on five datasets, to recognize actions and faces. [110] addressed the problem of cross-view action recognition by using transferable dictionary pairs. The authors learn view-specific dictionaries, where each dictionary corresponds to one camera view. Moreover, [111] extended the work of [110] with a common dictionary, which captures information from different views. A weakly supervised dictionary learning approach with trace lasso was proposed in [112]. The approach uses dictionaries and fully exploits visual attribute correlations rather than prior label information. In [111] the authors applied dictionary learning-based methods to cross-view action recognition. The method uses two dictionary learning approaches to learn sparse representations of videos regardless of the view, by enforcing correspondence between videos in a set. It was evaluated on three datasets and showed strong performance.

3.2. Deep Learning Based

Deep learning is a class of machine learning algorithms that use a cascade of layers of nonlinear processing units to extract features and transform the input into multiple levels of features. Each layer uses the output of the previous layer as its input. The algorithms may be supervised (for classification) or unsupervised (for pattern analysis) [https://en.wikipedia.org/wiki/Deep_learning#Definitions].

Previous studies on different datasets show that traditional approaches do not fully meet the needs of computer vision and action recognition. A HAR system that can automatically determine feature descriptors, learn, and evolve without human intervention will therefore be crucial for the advancement of action recognition. This is where deep learning comes in handy. Past studies have shown its importance in machine learning, where its aim is to learn multiple levels of representation and abstraction in order to make information meaningful. Deep learning has also shown higher accuracy and performance than traditional approaches, and it is applied to speech, images, video, and text for extraction, representation, and classification. As shown in Figure 8, deep learning can be grouped into two families: unsupervised approaches, such as Deep Belief Networks, Deep Boltzmann Machines, Restricted Boltzmann Machines, and regularized auto-encoders; and supervised approaches, such as Deep Neural Networks, Recurrent Neural Networks, and Convolutional Neural Networks.

However, because of the success of models such as the support vector machine and the lack of data on which to train the algorithms, deep learning approaches initially received little attention in the computer vision field, and in action recognition in particular.

3.2.1. Unsupervised deep learning model

Training this kind of model requires no class labels, so it is used when labelled data are unavailable. In 2006, the work of [113] triggered the notion of deep learning by proposing deep belief networks, which use an unsupervised algorithm to train a DNN one layer at a time. In the same year, [114] followed the same path, proposing a feature reduction technique for deep learning. Since the introduction of deep learning, there has been increasing interest in applying it to diverse applications, whether in image classification, human action recognition, speech recognition, healthcare systems, intelligent homes, object recognition, or others.

[115] proposed an unsupervised learning approach for video action recognition, in which the authors used a spatial appearance feature and incorporated it with a CNN. The proposed solution was applied to the ImageNet dataset. [116] proposed a DBN with Restricted Boltzmann Machines. Although unsupervised approaches perform better than the traditional approaches seen earlier, researchers still face a challenge, because processing unlabeled video data remains difficult.

To shed some light on this, [117] used an unsupervised approach in which data were collected from four different datasets and processed with hybrid feature models and active learning. Another study using Deep Belief Networks was proposed by [118], where the authors used skeleton coordinate features obtained from depth images. Even though unsupervised approaches have shown good performance in these applications, researchers are abandoning them in favor of supervised approaches, especially with the rise of Convolutional Neural Networks. However, [119] argues that in the future unsupervised approaches will be applied more than supervised ones: just as humans learn to recognize and identify objects by observation rather than by being told, future systems will be able to recognize elements without supervision.

3.2.2. Supervised deep learning model

There has been a significant increase in deep learning studies in recent years, whether applied to classification, texture modeling, regression, information retrieval, robotics, fault diagnosis, or many other tasks, using deep CNNs or RNNs. Many reasons can be given for this; here we name only access to data, access to hardware, and computational capability.

To date, the CNN is considered one of the most effective and powerful solutions for action recognition. It has shown strong performance in different applications and tasks, such as HAR, image classification, and even handwriting recognition [120], [121], [122], [123]. A Convolutional Neural Network passes the input through multiple layers, such as convolutional layers, Rectified Linear Units, pooling layers, and fully connected layers, although in theory only three categories are cited: convolution layers, subsampling layers, and fully connected layers, as in Figure 9.
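The layer types listed above can be sketched as a tiny forward pass in plain Python. This is only an illustration under simplifying assumptions: a single-channel image, one convolution kernel, stride 1 with no padding, non-overlapping max pooling as the subsampling step, and hand-picked weights rather than trained ones. All function names are hypothetical.

```python
def conv2d(image, kernel):
    """Convolution layer: slide the kernel over the image (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(fmap):
    """Rectified Linear Unit: negative activations become zero."""
    return [[max(v, 0.0) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Subsampling layer: maximum over non-overlapping size x size windows."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]


def fully_connected(fmap, weights, biases):
    """Fully connected layer: one weighted sum of all inputs per output unit."""
    flat = [v for row in fmap for v in row]
    return [sum(w * x for w, x in zip(w_row, flat)) + b
            for w_row, b in zip(weights, biases)]


def forward(image, kernel, weights, biases):
    """Chain the layers: convolution, ReLU, pooling, fully connected."""
    return fully_connected(max_pool(relu(conv2d(image, kernel))), weights, biases)


if __name__ == "__main__":
    image = [[5 * r + c for c in range(5)] for r in range(5)]
    kernel = [[-1, 0], [0, 1]]  # responds to diagonal intensity increase
    weights = [[1, 1, 1, 1], [0.5, 0, 0, 0]]
    print(forward(image, kernel, weights, biases=[0, 1]))
```

In a real network the kernels and fully connected weights are learned from labelled data, and many kernels and layers are stacked; the data flow between layer types is the same.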

[124] extended the work of [125] by applying the technique to video, using fixed scene frames as the input data matrix; unfortunately, the resulting performance was not useful. Later, [126] used a two-stream convolutional neural network combined with late fusion to resolve the issues faced by [124], and the method produced great results. However, due to its computational complexity, the two-stream technique is not recommended for real-time applications.
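To illustrate what late fusion means in a two-stream setting, the sketch below combines the per-class scores of two independently trained streams by a weighted average and picks the best class. The stream roles (appearance and motion), the class labels, the score values, and the equal weighting are assumptions made for the example; they are not details reported in [126].

```python
def late_fusion(spatial_scores, temporal_scores, temporal_weight=0.5):
    """Weighted average of the per-class scores produced by two streams."""
    if len(spatial_scores) != len(temporal_scores):
        raise ValueError("both streams must score the same classes")
    w = temporal_weight
    return [(1.0 - w) * s + w * t
            for s, t in zip(spatial_scores, temporal_scores)]


def predict(fused_scores, labels):
    """Return the label with the highest fused score."""
    best = max(range(len(fused_scores)), key=fused_scores.__getitem__)
    return labels[best]


labels = ["walk", "run", "jump"]   # hypothetical action classes
spatial = [0.50, 0.30, 0.20]       # assumed output of the appearance stream
temporal = [0.20, 0.60, 0.20]      # assumed output of the motion stream
fused = late_fusion(spatial, temporal)
decision = predict(fused, labels)
```

Because each stream runs a full network before the scores are merged, the cost is roughly that of two networks per input, which is consistent with the real-time concern raised above.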

In general, deep learning deals with information expressed in two dimensions, but some applications work with three-dimensional data and as such require a 3D convolutional neural network. [127] and [128] applied 3D CNNs: the first reached a high sensitivity of 93.16% with an average of 2.74 false positives for the detection and recognition of microbleeds in magnetic resonance images, and the second, inspired by VoxNet and 3D ShapeNets, applied a 3D CNN to the ModelNet dataset to recognize and classify the data.

There still exist issues, such as computational complexity or the amount of data required, in creating a 100% perfect system for HAR. Along this path, [129] proposed a variation of the CNN called the Factorized Spatiotemporal Convolutional Network. The approach factorizes the standard three-dimensional convolutional neural network model into two-dimensional spatial kernels to reduce the number of learned parameters and the complexity of the network. Another study by [129] still used a 3D CNN; it showed that the approach handles spatiotemporal information better than 2D data and, when combined with a linear classifier, exceeds state-of-the-art methods. Other researchers have mixed traditional approaches with CNNs, arguing that this improves performance [130]. Another variation of the convolutional neural network was introduced in [131] by adjusting a pre-trained convolutional neural network, extracting frame-level features, and applying PCA and SVM.
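The parameter saving behind the factorization idea described above can be checked with simple arithmetic. The sketch below compares the weight count of one full 3D convolution layer with that of a 2D spatial convolution followed by a 1D temporal convolution. The survey only mentions the 2D spatial kernels, so the 1D temporal step, the channel counts, and the kernel sizes are assumptions chosen for illustration.

```python
def conv3d_params(c_in, c_out, k_t, k_h, k_w):
    """Weights in one full 3D convolution layer (biases ignored)."""
    return c_in * c_out * k_t * k_h * k_w


def factorized_params(c_in, c_out, k_t, k_h, k_w):
    """Weights in a 2D spatial convolution plus a 1D temporal convolution."""
    spatial = c_in * c_out * k_h * k_w   # k_h x k_w kernels applied per frame
    temporal = c_out * c_out * k_t       # k_t-long kernels applied across frames
    return spatial + temporal


# Assumed layer: 64 input channels, 64 output channels, 3x3x3 kernels.
full = conv3d_params(64, 64, 3, 3, 3)
fact = factorized_params(64, 64, 3, 3, 3)
ratio = fact / full
```

With these assumed sizes the factorized layer uses 49,152 weights against 110,592 for the full 3D layer, a reduction of more than half.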

Another method in supervised deep learning relies on semantic features such as pose, which are also used in computer vision to describe an action [132], [133]. The descriptors of this method are based on motion and appearance information from the joints of human body parts. The approach was evaluated on the Berkeley MHAD, JHMDB, and MPII Cooking datasets, and the results showed better performance than some other techniques. [134] utilized contextual information and adapted the region-based CNN for classification, and [135] addressed the task of semantic image segmentation. [136] proposed a method to deal with multi-view data sources by learning from 2D dense trajectories and rendering a synthetic 3D model, showing once more that the deep learning approach performs far better than the traditional approach. Also, learning-based approaches have the advantage of learning features from raw or unlabeled data.

However, learning-based approaches have some limitations, such as the amount of data needed for training. To address the problem, [137] proposed in 2016 a dataset composed of 200 actions, or 849 hours of video, to help apply learning-based algorithms. In fact, recent statistics show increased interest in computer vision, in human action recognition, and in convolutional neural networks as a processing method, which means we will not be surprised if researchers find breakthrough algorithms for action recognition in the near future.

4. Discussion

A low-level feature is a portion of an image that simplifies the complexity of the image by capturing properties related only to a certain pattern. As such, the input may be an image with M values on the X axis, N values on the Y axis, and 3 values for the RGB color property, which leads us to the values of a low-level feature entity. Extracting such valuable information is one of the first tasks faced by a system, in computer vision in particular.
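As one concrete example of a low-level feature computed from such an M x N x 3 input, the sketch below builds a normalized per-channel color histogram. The choice of a color histogram, the number of bins, and the toy image are assumptions made for illustration; the survey does not prescribe a specific descriptor here.

```python
def color_histogram(image, bins=4):
    """Per-channel histogram of an RGB image with values in 0..255.

    `image` is a list of rows, each row a list of (R, G, B) tuples.
    Returns a flat list of 3 * bins normalized counts: R bins, G bins, B bins.
    """
    width = 256 // bins
    hist = [0] * (3 * bins)
    n_pixels = 0
    for row in image:
        for pixel in row:
            n_pixels += 1
            for channel, value in enumerate(pixel):
                # Clamp so the top values land in the last bin.
                hist[channel * bins + min(value // width, bins - 1)] += 1
    return [count / n_pixels for count in hist]


# Toy 2 x 2 image: two red pixels, one blue pixel, one green pixel.
image = [
    [(255, 0, 0), (255, 0, 0)],
    [(0, 0, 255), (0, 255, 0)],
]
feature = color_histogram(image)
```

The resulting vector has a fixed length regardless of M and N, which is what makes it usable as input to a classifier.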

As humans, from the day we are born we deal easily with images, and natural instinct always takes the lead in categorizing the environment surrounding us [138], so one may wonder whether a computer can also adapt and recognize entities from images. In past years the answer to this question would have been no, but recent research discoveries have made it possible for a computer to obtain, read, and understand an image and, as such, classify it. To reach this goal, the computer does not read the input as one single element; hence the importance of the motto “Divide to better reign”: the system reduces and divides the input into the smallest possible entities and treats each one as a single element. Commonly, three properties are used during the extraction process: color, shape, and texture [25], [27], [139]–[143]. The performance of the system depends on a good choice of feature and extraction method.
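The idea of dividing the input into small entities and treating each one separately can be sketched as follows: a grayscale grid is cut into non-overlapping patches and one simple value (the mean intensity) is computed per patch. The patch size, the mean-intensity descriptor, and the toy image are assumptions for the example only.

```python
def split_into_patches(image, patch_h, patch_w):
    """Cut a 2D grid (list of rows) into non-overlapping patches.

    Border rows or columns that do not fill a whole patch are dropped.
    """
    patches = []
    for top in range(0, len(image) - patch_h + 1, patch_h):
        for left in range(0, len(image[0]) - patch_w + 1, patch_w):
            patches.append([row[left:left + patch_w]
                            for row in image[top:top + patch_h]])
    return patches


def mean_intensity(patch):
    """Average pixel value of one patch."""
    values = [v for row in patch for v in row]
    return sum(values) / len(values)


image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [3, 3, 6, 6],
    [3, 3, 6, 6],
]
patches = split_into_patches(image, 2, 2)
descriptor = [mean_intensity(p) for p in patches]
```

In practice each patch would be described by color, shape, or texture features rather than a single mean, but the division step is the same.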

With regard to the previously noted properties, and given the importance of feature extraction in computer vision, predictive modeling, and probabilistic data mining, there are still challenges that need solutions before Human Activity or Human Behavior can be completely captured and classified, given the complexity of human action or reaction to reality, the environment, etc. Advances have been made in computer vision by applying different techniques and methods over multiple properties to overcome the challenge. Some researchers have applied face and speech features [144]–[146], text and speech features [147], [148], or a combination of multiple features (posture, speech, face, etc.) [149]–[151]. We acknowledge that using multiple features increases performance and accuracy but at the same time increases complexity.

Note that all approaches represent activities (actions, behaviors) as a sequence of frames in time and space, whether extracted from moving entities or from still images, and use different classification models. [70] proposed transferring from one dataset to another after incorporating information ranging from human interaction to object interaction. In [66], hidden conditional random fields take motion features as input, combining large-scale global features with local patch features to distinguish various actions. [152] and [153] used random forests for action representation: the first classifies and localizes human actions in video using a Hough transform voting framework, and the second uses a vocabulary of local appearance-motion features with fast approximate search in a large number of trees. A real-time algorithm to describe interactions was proposed in the early 2000s; it detects and tracks movements and creates a feature vector that is given as input to a Hidden Markov Model, which classifies the motion [154]. [155] proposed complex activity recognition as two sequential sub-tasks of increasing granularity, first applying human-to-object interaction techniques and then using context-based information to train a conditional random field model.
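The Hidden Markov Model classification step mentioned above can be sketched generically: one HMM is kept per action class, each model scores the observed sequence with the forward algorithm, and the class with the highest likelihood wins. The two-state models, their probabilities, and the quantized motion symbols below are invented for illustration and are not the models used in [154].

```python
def forward_likelihood(observations, start, trans, emit):
    """P(observations | model) computed with the forward algorithm.

    start[s]    : probability of starting in state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][o]  : probability that state s emits symbol o
    """
    n_states = len(start)
    alpha = [start[s] * emit[s][observations[0]] for s in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            emit[s][obs] * sum(alpha[p] * trans[p][s] for p in range(n_states))
            for s in range(n_states)
        ]
    return sum(alpha)


def classify(observations, models):
    """Pick the model name whose HMM gives the sequence the highest likelihood."""
    return max(models,
               key=lambda name: forward_likelihood(observations, *models[name]))


# Symbols: 0 = slow motion, 1 = fast motion (a hypothetical quantization
# of the tracked feature vector).
transitions = [[0.9, 0.1], [0.1, 0.9]]
models = {
    "walk": ([1.0, 0.0], transitions, [[0.8, 0.2], [0.6, 0.4]]),
    "run": ([1.0, 0.0], transitions, [[0.2, 0.8], [0.4, 0.6]]),
}
label = classify([1, 1, 1, 1], models)
```

For long sequences the products underflow, so practical implementations work with scaled or log probabilities; that refinement is omitted here for brevity.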

[156] used self-organizing maps to learn body posture with fuzzy distances for time-invariant action representation, with an algorithm based on multilayer perceptrons. [157] proposed a local occupancy pattern and an actionlet ensemble model, in which the authors first captured the human body parts and then captured intra-class variations to allow error handling with depth cameras. Other work exploited the interaction between activity and scene to recognize human activities, using a 3D skeletal representation and a geometric representation of the scene, as well as an appearance-to-pose mapping for the activity problem. [75] applied Gaussian processes as an online probabilistic feature, using a sparse representation to reduce computational complexity, and [158] used a sparse representation of skeletal data with a dissimilarity space to recognize behaviors or activities.

To describe an event, an action can be represented with multiple features containing meaningful information. As noted in the previous paragraphs, more and more papers are being published in the computer vision field, and most of them rely on feature fusion, which can be either early fusion or late fusion. Using one key element works, but using multiple key features dramatically increases the chance of better performance and accuracy in the outcome, and so improves recognition performance.
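To make the distinction concrete: early fusion merges the features of several modalities into one vector before any classifier sees them, whereas late fusion trains one classifier per modality and merges their outputs. A minimal sketch of early fusion is given below; the L2 normalization step and the toy visual and audio descriptors are assumptions added for the example.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so no modality dominates by magnitude."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)


def early_fusion(*modalities):
    """Concatenate per-modality feature vectors after L2 normalization."""
    fused = []
    for features in modalities:
        fused.extend(l2_normalize(features))
    return fused


visual = [3.0, 4.0]         # hypothetical visual descriptor
audio = [0.0, 0.0, 10.0]    # hypothetical audio descriptor
fused = early_fusion(visual, audio)
```

The fused vector grows with every added modality, which is one simple way to see why more features mean a more complex system.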

[150] proposed a novel method that applies Kernel Canonical Correlation Analysis and Multi-view Hidden Conditional Random Fields to Human Activity Recognition in order to detect and interpret agreement and disagreement from nonverbal audio-visual cues. However, the proposed method faces difficulty when classifying, and the audio sample sometimes gets lost in the procedure. On the other hand, [145] applied multiple hierarchical classification models based on the properties of neural networks (NN) to recognize audio as well as visual emotional features instead of labels.

[159] used the Hollywood Human Actions dataset and took advantage of video sequences to propose a HAR system; the researchers first extracted visual features, then audio features, and finally applied support vector machine classifiers. [160] used audio and visual cues and applied several classifiers to separate the information and categorize whether it is audio or visual content, using spatio-temporal features to extend the spatio-temporal bag of features with geometry and applying kernel-based learning techniques. Similarly, [161], after first using a multiple kernel learning algorithm for better estimation, applied fuzzy techniques to combine the outputs of support vector machine classifiers.

But human actions and activities are complex and are influenced by mood, emotions, interactions, etc. This explains the complexity of the computer vision field, and choosing the exact parameters, properties, or features that are useful for HAR becomes a key component in recognizing and predicting human behavior, as in [162]. Some researchers focus on audio data, such as [163], where the authors use canonical correlation analysis (CCA) to propose another way of interpreting human behavior, applied to lip features and speech synchronization, whereas [164] uses a canonical kernel and a Support Vector Machine (SVM) for learning and classifying images. Other researchers took advantage of facial expressions and the Facial Action Coding System (FACS) [165], which describes all possible behaviors as combinations of action units (AU), together with audio information to identify the emotions of actors, following this path with a real-time 3D system [166]. [167] applied conditional random fields to address, to some extent, the challenging task of recognizing and classifying human behavior into three main classes (friendly, aggressive, or neutral); the authors applied their method to a dataset obtained from speeches in the Greek parliament.

[168] applied a Hierarchical Dirichlet Process, which allows the creation of multiple hidden states, and used Markov chain Monte Carlo to sample the data, which made it possible to classify behavior into two types, agreement or disagreement, from non-verbal features and cues. [169] studied mimicry during human interactions, noting that these signals were first and foremost studied by psychologists before being used and classified by computer vision researchers; the authors were among the first in the field to apply computational techniques to this type of feature for the continuous detection of human behavioral mimicry. [170] coupled a notion from psychology [171] with a computational method to classify human activity by decomposing an activity and using each section of the action as a feature or input. Comparing the result with a Hidden Markov Model classifier, the authors found a significant improvement.

Whether in healthcare systems [172], in security [173], or in autonomous prediction [174], computer vision will keep attracting researchers, because the complexity of human behavior keeps a large gap between humans and machines. Although machine learning techniques for understanding human behavior are improving, it is still a challenge to understand accurately how humans behave or could behave. As such, selecting the exact key elements that are useful for interpreting human activity remains an issue. Even though [175] and [176] try to characterize or classify human action, this is still not sufficient; to date, only a combination of multiple different features comes close to describing human behavior. Nonetheless, high-level features lead to computationally complex classification, and as such there is not enough research applying these properties. Also, learning-based approaches have been categorized into dictionary learning, supervised approaches, genetic approaches, and unsupervised deep learning approaches. However, the categories may overlap, so the boundaries are not strict.

5. Examples of some public datasets

Many public-domain datasets have been made available to all; below is a non-exhaustive list of some of these data sources.

Common well-known public datasets

5.1. Berkeley MHAD dataset [http://tele-immersion.citris-uc.org/berkeley_mhad#about]

Generated as part of the NSF-funded project (#0941382), CDI-Type I: Collaborative Research: A Bio-Inspired Approach to Recognition of Human Movements and Movement Styles. The Berkeley Multimodal Human Action Database (MHAD) contains 11 actions performed by 7 male and 5 female subjects in the range of 23-30 years of age, except for one elderly subject, for a total of 660 actions such as jumping in place, jumping jacks, throwing, waving hands, clapping hands, sitting down, and standing up [177].


[178] applied a meta-cognitive radial basis function network and its projection-based learning algorithm to achieve over 97% recognition accuracy.

5.2. URFD dataset

Created by Michal Kępski from the Interdisciplinary Centre for Computational Modelling at the University of Rzeszow in December 2014. The dataset consists of 70 sequences (30 falls + 40 activities of daily living) recorded with 2 Microsoft Kinect cameras.

[179] and [180] both applied their methods to the URFD dataset: the first used a statistical control chart and a neural network for classification to improve the output of a HAR system, and the second proposed a strategy for fall event detection.

5.3. UTD-MHAD dataset

Collected as part of research on HAR using the fusion of depth and inertial sensor data, the dataset was created at the Department of Electrical Engineering, University of Texas at Dallas. It consists of 300 actions (wave, throw, catch, draw, etc.) performed by six actors (3 males and 3 females), with depth sequences of size 424 x 512 x number of frames.

The method in [181] applied Spatio-Temporal Interest Points to detect changes, then extracted appearance and motion features at the interest points using the HOG and Histogram of Optical Flow (HOF) descriptors, and finally classified with an SVM on a bag-of-words (BOW) representation of the space-time interest point descriptors. [182] encoded the spatio-temporal information of skeleton sequences with ConvNets.
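The bag-of-words step in such a pipeline can be sketched as follows: every local descriptor extracted from a video is assigned to its nearest codeword, and the video is then represented by the normalized histogram of assignments. The three-word codebook and the two-dimensional descriptors are toy assumptions; real HOG/HOF descriptors are much longer, and the codebook is typically learned by clustering training descriptors.

```python
def nearest_codeword(descriptor, codebook):
    """Index of the codeword closest to the descriptor (squared Euclidean)."""
    def dist(word):
        return sum((d - w) ** 2 for d, w in zip(descriptor, word))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def bag_of_words(descriptors, codebook):
    """Normalized histogram of codeword assignments for one video."""
    hist = [0.0] * len(codebook)
    for descriptor in descriptors:
        hist[nearest_codeword(descriptor, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]   # hypothetical vocabulary
descriptors = [[0.1, 0.0], [0.9, 1.1], [1.0, 0.8], [0.2, 0.9]]
histogram = bag_of_words(descriptors, codebook)
```

The fixed-length histogram is what would be passed to the SVM, regardless of how many interest points each video produced.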

5.4. Weizmann Human Action Dataset

Dataset introduced by the Weizmann Institute of Science in 2005. This dataset consists of 10 simple actions with a static background: walk, run, skip, jack, jump forward (jump), jump in place (pjump), gallop sideways (side), bend, wave1, and wave2. It consists of 90 videos with a resolution of 180 x 144, recorded with a static camera. The dataset has homogeneous outdoor backgrounds. It also provides irregular versions (with dog, occluded, with bag, etc.) for robustness experiments. Some research has shown an accuracy of one hundred percent on this dataset [52].

6. Conclusion.

This survey reviews different approaches used in Human Action Recognition (HAR) or Human Behavior Recognition, along with the techniques and methods applied, focusing on the categorization into traditional representation-based and learning-based approaches. Despite the enormous number of published papers and the methodologies and techniques employed to collect and process the data, there are still challenging problems, whether in the interpretation or in the labeling of actions. A human can sometimes make an action that does not mean exactly what it looks like but instead has a different meaning according to mood (e.g., putting both hands behind the neck) or for other reasons. As such, there is still a window for improvement in the computer vision field. That being said, accuracy and performance depend on the features used, but that also implies that the system becomes more complex if more features are extracted and more methods are applied to it.

The next step of this work will be to give even more details on the findings so far in computer vision and to help new researchers have a document that reflects everything that needs to be known before jumping into the field, so that they have a sound knowledge foundation. The research will facilitate better judgment about where the notion of Human Activity Recognition comes from, what its current state is, and finally how future researchers can improve on it and solve different challenges.


