Fig. 11. Sparse weights after basis projection [50].

research on exploring the impact of bitwidth on accuracy [51]. In fact, recent commercial hardware for DNNs reportedly supports 8-bit integer operations [52]. As bitwidths can vary by layer, hardware optimizations have been explored to exploit the reduced bitwidth for 2.56× energy savings [53] or a 2.24× increase in throughput [54] compared to a 16-bit fixed-point implementation. With more significant changes to the network, it is possible to reduce the bitwidth down to 1 bit for either weights [55] or both weights and activations [56, 57] at the cost of reduced accuracy. The impact of 1-bit weights on hardware is explored in [58].

B. Sparsity

For SVM classification, the weights can be projected onto a basis such that the resulting weights are sparse, for a 2× reduction in the number of multiplications [50] (Fig. 11). For feature extraction, the input image can be made sparse by pre-processing for a 24% reduction in power consumption [48].

For DNNs, the number of MACs and weights can be reduced by removing weights through a process called pruning. This was first explored in [59], where weights with minimal impact on the output were removed. In [60], pruning is applied to modern DNNs by removing small weights. However, removing weights does not necessarily lead to lower energy. Accordingly, in [61] weights are removed based on an energy model to directly minimize energy consumption. The tool used for energy modeling can be found at [62].

Specialized hardware has been proposed in [47, 50, 63, 64] to exploit sparse weights for increased speed or reduced energy consumption. In Eyeriss [47], the processing elements are designed to skip reads and MACs when the inputs are zero, resulting in a 45% energy reduction. In [50], by using specialized hardware to avoid sparse weights, the energy and storage cost are reduced by 43% and 34%, respectively.

C. Compression

Data movement and storage are important factors in both energy and cost. Feature extraction can result in sparse data (e.g., gradients in HOG and ReLU in DNNs), and the weights used in classification can also be made sparse by pruning. As a result, compression can be applied to exploit data statistics to reduce data movement and storage cost.
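One simple way to exploit such statistics is run-length coding of the zeros that dominate sparse activations (e.g., after ReLU). The sketch below is only an illustration of the idea; the encodings used in the cited hardware designs differ in detail (e.g., fixed-width run fields and on/off-chip framing).

```python
# Toy zero run-length coder for sparse activations (e.g., after ReLU).
# Nonzero values are emitted as (preceding_zero_run, value) pairs; a
# trailing zero count covers zeros at the end of the stream.

def rle_encode(activations):
    """Encode a sequence as (zero_run, value) pairs plus a trailing zero count."""
    pairs, run = [], 0
    for a in activations:
        if a == 0:
            run += 1
        else:
            pairs.append((run, a))
            run = 0
    return pairs, run

def rle_decode(pairs, trailing_zeros):
    """Invert rle_encode back to the raw activation sequence."""
    out = []
    for run, value in pairs:
        out.extend([0] * run)
        out.append(value)
    out.extend([0] * trailing_zeros)
    return out

acts = [0, 0, 3, 0, 0, 0, 5, 1, 0, 0]
pairs, tail = rle_encode(acts)
assert rle_decode(pairs, tail) == acts
# With ~70% zeros, the stream stores 3 pairs + 1 count instead of 10 raw values.
```

The compression ratio achieved this way depends directly on the activation sparsity, which is why the bandwidth savings reported in the literature are data-dependent.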
Various forms of lightweight compression have been explored to reduce data movement. Lossless compression can be used to reduce the transfer of data on and off chip [11, 53, 64]. Simple run-length coding of the activations in [65] provides up to a 1.9× bandwidth reduction, which is within 5-10% of the theoretical entropy limit. Lossy compression such as vector quantization can also be used on feature vectors [50] and weights [8, 12, 66] such that they can be stored on-chip at low cost. Generally, the cost of the compression/decompression is on the order of a few thousand kgates with minimal energy overhead. In the lossy compression case, it is also important to evaluate the impact on performance accuracy.

VII. OPPORTUNITIES IN MIXED-SIGNAL CIRCUITS

Most of the data movement is between the memory and the processing element (PE), and also between the sensor and the PE. In this section, we discuss how this is addressed using mixed-signal circuit design. However, circuit non-idealities should also be factored into the algorithm design; these circuits can benefit from the reduced-precision algorithms discussed in Section VI. In addition, since the training often occurs in the digital domain, the ADC and DAC conversion overhead should also be accounted for when evaluating the system.

While spatial architectures bring the memory closer to the computation (i.e., into the PE), there have also been efforts to integrate the computation into the memory itself. For instance, in [67] the classification is embedded in the SRAM. Specifically, the word line (WL) is driven by a 5-bit feature vector using a DAC, while the bit-cells store the binary weights ±1. The bit-cell current is effectively a product of the value of the feature vector and the value of the weight stored in the bit-cell; the currents from the column are added together to discharge the bit line (BL or BLB). A comparator is then used to compare the resulting dot product to a threshold, specifically sign thresholding of the differential bit lines. Due to the variations in the bit-cells, this is considered a weak classifier, and boosting is needed to combine the weak classifiers to form a strong classifier [68]. This approach gives a 12× energy savings over reading the 1-bit weights from the SRAM.
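The in-SRAM classification scheme can be modeled functionally in software. The sketch below is a behavioral model under stated assumptions, not the circuit of [67]: bit-cell variation is modeled as Gaussian multiplicative noise (the `sigma` parameter is an invented knob), and a simple weighted vote stands in for the boosting procedure of [68].

```python
import random

def weak_classify(features, weights, sigma=0.1, rng=random):
    """Model of one SRAM column: each bit-cell contributes a current
    proportional to (5-bit feature) x (+/-1 weight); bit-cell mismatch is
    modeled as multiplicative Gaussian noise; the comparator takes the sign."""
    total = 0.0
    for f, w in zip(features, weights):
        gain = 1.0 + rng.gauss(0.0, sigma)  # assumed mismatch model
        total += gain * f * w
    return 1 if total >= 0 else -1

def strong_classify(features, ensemble, sigma=0.1):
    """Combine noisy column classifiers by a weighted vote
    (a simple stand-in for the boosting of [68])."""
    score = sum(alpha * weak_classify(features, w, sigma)
                for w, alpha in ensemble)
    return 1 if score >= 0 else -1

x = [12, 3, 30, 7]  # 5-bit feature values (0..31)
columns = [([1, -1, 1, -1], 0.5), ([-1, 1, 1, -1], 0.3), ([1, 1, -1, 1], 0.2)]
label = strong_classify(x, columns)  # +1 or -1; noisy, so reruns may differ
```

Setting `sigma` to zero recovers an ideal sign classifier, which makes the model useful for checking how much mismatch the ensemble can absorb before accuracy degrades.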
Recent work has also explored the use of mixed-signal circuits to reduce the computation cost of the MAC. It was shown in [69] that performing the MAC using switched capacitors can be more energy-efficient than digital circuits despite ADC and DAC conversion overhead. Accordingly, the matrix multiplication can be integrated into the ADC, as demonstrated in [70], where the most significant bits of the multiplications for Adaboost classification are performed using switched capacitors in an 8-bit successive approximation format. This is extended in [71] to perform not only the multiplications but also the accumulation in the analog domain. It is assumed that 3 bits and 6 bits are sufficient to represent the weights and input vectors, respectively. This enables the computation to move closer to the sensor and reduces the number of ADC conversions by 21×.
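The 3-bit-weight / 6-bit-input precision assumption can be illustrated numerically. The sketch below uses a uniform symmetric quantizer, which is an assumption for illustration only (it does not reproduce the analog implementation of [71]); it shows that a 3-bit/6-bit dot product tracks its full-precision counterpart within quantization error.

```python
def quantize(x, bits, max_abs):
    """Uniformly quantize x to a signed `bits`-bit level and return the
    dequantized value (symmetric mid-tread quantizer, assumed scheme)."""
    levels = 2 ** (bits - 1) - 1          # e.g., 3-bit -> levels in [-3, 3]
    step = max_abs / levels
    q = max(-levels, min(levels, round(x / step)))
    return q * step

def quantized_mac(weights, inputs, w_bits=3, in_bits=6):
    """Dot product with 3-bit weights and 6-bit inputs (both assumed in [-1, 1])."""
    wq = [quantize(w, w_bits, 1.0) for w in weights]
    xq = [quantize(x, in_bits, 1.0) for x in inputs]
    return sum(w * x for w, x in zip(wq, xq))

w = [0.8, -0.3, 0.5]
x = [0.25, 0.9, -0.6]
exact = sum(a * b for a, b in zip(w, x))
approx = quantized_mac(w, x)
# approx deviates from exact by at most half a weight step per term,
# i.e., the coarse 3-bit weights dominate the error budget.
```

Because the weight step (1/3 at 3 bits) is much coarser than the input step (1/31 at 6 bits), the weight quantization dominates the error, consistent with the asymmetry between weight and input precision assumed in [71].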