Abstract: The multicore evolution has stimulated renewed interest in scaling up applications on shared-memory multiprocessors, significantly improving the scalability of many applications. But this scalability is limited to a single node; programmers therefore still have to redesign applications to scale out over multiple nodes. This paper revisits the design and implementation of distributed shared memory (DSM) as a way to scale out applications optimized for non-uniform memory access (NUMA) architectures over a well-connected cluster. This paper presents MAGI, an efficient DSM system that provides a transparent shared address space with scalable performance on a cluster with fast network interfaces. MAGI is unique in that it presents a NUMA abstraction to fully harness the multicore resources in each node through hierarchical synchronization and memory management. MAGI also exploits the memory access patterns of big-data applications and leverages a set of optimizations for remote direct memory access (RDMA) to reduce the number of page faults and the cost of the coherence protocol. MAGI has been implemented as a user-space library with pthread-compatible interfaces and can run existing multithreaded applications with minimal modifications. We deployed MAGI over an 8-node RDMA-enabled cluster. Experimental evaluation shows that MAGI achieves up to 9.25× speedup compared with an unoptimized implementation, leading to scalable performance for large-scale data-intensive applications.