NeuroBayes short-term systematic trading


BIG DATA IN LOGISTICS
A DHL perspective on how to move beyond the hype
December 2013
Powered by Solutions & Innovation: Trend Research
PUBLISHER: DHL Customer Solutions & Innovation, represented by Martin Wegner, Vice President Solutions & Innovation, 53844 Troisdorf, Germany
PROJECT DIRECTOR: Dr. Markus Kückelhaus, Solutions & Innovation, DHL
PROJECT MANAGEMENT AND EDITORIAL OFFICE: Katrin Zeiler, Solutions & Innovation, DHL
IN COOPERATION WITH (AUTHORS): Martin Jeske, Moritz Grüner, Frank Weiß
PREFACE
Big Data and logistics are made for each other, and today the logistics industry is positioning itself to put this mass of information to better use. The potential of Big Data in the logistics industry has already been highlighted in the acclaimed DHL Logistics Trend Radar. This overarching study is a dynamic, living document designed to help organizations derive new strategies and develop more powerful projects and innovations.
To sharpen the focus, the trend report you are now reading asks the big Big Data questions. Big Data is a relatively untapped asset that companies can exploit once they adopt a shift in mindset and apply the right drilling techniques. The report also goes far beyond buzzwords to offer real use cases, revealing what is happening now and what is likely to happen in the future. It begins with an introduction to the Big Data concept and its meaning, gives examples from many different industries, and then presents logistics use cases.
Big Data has much to offer the world of logistics. Sophisticated data analytics can consolidate this traditionally fragmented sector, and these new capabilities put logistics providers in pole position as search engines of the physical world. We can use information to improve operational efficiency and customer experience and to create useful new business models.
This report has been developed jointly with T-Systems and experts from Detecon Consulting. The research team has combined world-class experience from both the logistics and the information management domains. Together, we can move from a deep well of data to its deep exploitation.
We hope you find that Big Data in Logistics provides some powerful new perspectives and ideas. Thank you for choosing to join us on this Big Data journey; together we can all benefit from a new model of collaboration and cooperation within the logistics industry.
Yours sincerely,
Martin Wegner, Dr. Markus Kückelhaus
TABLE OF CONTENTS
Preface
1 Understanding Big Data
2 Big Data Best Practice Across Industries
2.1 Operational Efficiency
2.2 Customer Experience
2.3 New Business Models
3 Big Data in Logistics
3.1 Logistics as a Data-driven Business
3.2 Use Cases: Operational Efficiency
3.3 Use Cases: Customer Experience
3.4 Use Cases: New Business Models
3.5 Success Factors for Implementing Big Data Analytics
Outlook
1 UNDERSTANDING BIG DATA
The continued success of Internet powerhouses such as Amazon, Google, Facebook, and eBay provides evidence of a fourth production factor in today's hyperconnected world. Besides resources, labor, and capital, there is no doubt that information has become an essential element of competitive differentiation. Thanks to the growth of social media, ubiquitous network access, and the steadily increasing number of smart connected devices, today's digital universe is expanding at a rate that doubles the data volume every two years [2] (see Figure 1). Companies in all industries are striving to trade gut feeling for accurate, data-driven insight in order to make more effective decisions.
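The following Python sketch makes the growth rate above concrete by compounding a data volume that doubles every two years. The starting volume of 1.0 is an arbitrary, hypothetical unit. Only the two-year doubling period comes from the text.

```python
def projected_volume(v0: float, years: float, doubling_period: float = 2.0) -> float:
    """Volume after `years`, assuming it doubles every `doubling_period` years."""
    return v0 * 2 ** (years / doubling_period)


def equivalent_annual_growth(doubling_period: float = 2.0) -> float:
    """Compound annual growth rate that yields exactly one doubling per period."""
    return 2 ** (1 / doubling_period) - 1


if __name__ == "__main__":
    # Hypothetical starting volume of 1.0 (arbitrary unit) in 2010.
    for year in range(2010, 2021, 2):
        print(year, projected_volume(1.0, year - 2010))
    print(f"{equivalent_annual_growth():.1%}")  # 41.4%
```

Over a decade, this doubling rate compounds to a 32-fold increase. That is equivalent to roughly 41 percent growth per year.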
Whether the problem is determining expected sales volumes, customer product preferences, or optimized work schedules, it is data that now has the power to help companies succeed. Just as in the quest for oil, Big Data requires skilled drilling to reveal a well of valuable information.
Why is the search for meaningful information so complex? It is due to the enormous growth of available data inside companies and on the public Internet. By 2008, the number of available digital information pieces (bits) had already surpassed the number of stars in the universe [1]. Besides this exponential growth in volume, two further characteristics of data have changed substantially. First, data is pouring in. The massive use of connected devices such as cars, smartphones, RFID readers, webcams, and sensor networks adds a huge number of autonomous data sources. Devices like these continuously generate data streams without human intervention, which increases the velocity of data aggregation and processing. Second, data is extremely varied. The vast majority of newly created data stems from camera images, video and surveillance footage, blog posts, forum discussions, and e-commerce catalogs. All of these unstructured data sources contribute to a much greater variety of data types.
Figure 1: Exponential data growth between 2010 and 2020 (data volume in exabytes). Source: IDC Digital Universe Study, sponsored by EMC, December 2012
[1] The Diverse and Exploding Digital Universe, IDC, 2008
[2] The Digital Universe in 2020: Big Data, Bigger Digital Shadows, and Biggest Growth in the Far East, IDC, sponsored by EMC, December 2012
Volume, velocity, and variety (3V): is this Big Data?
In the literature, the 3Vs have been widely discussed as the defining characteristics of Big Data analytics. But there is much more to consider if companies want to exploit information as a production factor and strengthen their competitiveness. What is required is a shift in mindset and the application of the right drilling techniques.
While consumers are used to making information-driven decisions in daily life, such as purchases, route planning, or finding a place to eat, companies are lagging behind. To exploit their information assets, companies must above all change their attitude toward how data is used. In the past, data analysis was used to confirm decisions that had already been made. What is required is a cultural change: companies must move to a forward-looking analytical approach that generates new insight and better answers. This change in mindset also implies a new quality of experimentation, collaboration, and transparency across the company.
Becoming an information-driven business
When the global telecommunications company Telefonica began exploring information-driven business models, it could already process hundreds of millions of data records from its mobile network to count and bill phone calls and data services. Handling an enormous data volume at high velocity was therefore not the main problem. Instead, on the journey to eventually launching its Smart Steps service, Telefonica had to answer a different key question: What additional value does the existing mass of data carry, and how can we exploit it?
Along with this transition, another prerequisite for becoming an information-driven business is building a specific set of data science skills. This includes mastering a broad range of analytical procedures as well as having a comprehensive understanding of the business. Companies must also adopt new technology solutions to explore information in greater detail and at higher speed. Disruptive data processing paradigms, such as in-memory databases and eventually consistent computing models, promise to solve large-scale data analysis problems at an economically feasible cost. Every company already owns a lot of information.
Most of this data, however, must first be refined before it can be turned into business value. With Big Data analytics, companies can acquire the attitude, skills, and technology required to become a data refinery and create additional value from their information assets.
Logistics and Big Data are a perfect match
The logistics sector is ideally placed to benefit from the technological and methodological advances of Big Data. The word itself strongly hints that mastering data has always been crucial to the discipline: in its ancient Greek roots, logistics means "practical arithmetic" [3]. Today, logistics providers manage a massive flow of goods and at the same time create vast data sets. For millions of shipments every day, origin and destination, size, weight, content, and location are tracked across global delivery networks. But does this data tracking fully exploit its value? Probably not. Most likely there is huge untapped potential for improving operational efficiency and customer experience and for creating useful new business models. Consider, for example, that integrating supply chain data streams from multiple logistics providers could eliminate the current market fragmentation, enabling powerful new collaboration and services.
Many providers realize that Big Data is a game-changing trend for the logistics industry. In a recent study on supply chain trends, sixty percent of respondents stated that they plan to invest in Big Data analytics within the next five years [4] (see Figure 2 below). However, the quest for competitive advantage starts with identifying strong Big Data use cases. In this document, we first look at organizations that have successfully applied Big Data analytics within their own industries. We then present a number of use cases specific to the logistics sector.
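The following minimal Python sketch illustrates the idea of integrating supply chain data streams from several logistics providers. It interleaves per-provider tracking feeds into one chronological timeline per shipment. The `TrackingEvent` record and the `merge_streams` helper are hypothetical. The sketch also assumes three things the text does not state: each feed is already sorted by time, all providers timestamp events on a common clock, and they share a shipment ID.

```python
import heapq
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackingEvent:
    timestamp: int      # seconds since epoch; assumes providers share a clock
    shipment_id: str    # assumes the same shipment ID is used by all providers
    provider: str
    status: str


def merge_streams(*streams):
    """Merge per-provider event streams, each already sorted by timestamp,
    into one chronological timeline per shipment."""
    timelines = defaultdict(list)
    # heapq.merge lazily interleaves the sorted inputs without re-sorting them.
    for event in heapq.merge(*streams, key=lambda e: e.timestamp):
        timelines[event.shipment_id].append(event)
    return dict(timelines)


if __name__ == "__main__":
    # Hypothetical feeds from two providers handling the same shipment "S1".
    provider_a = [TrackingEvent(100, "S1", "A", "picked up"),
                  TrackingEvent(300, "S1", "A", "handed over")]
    provider_b = [TrackingEvent(350, "S1", "B", "in transit"),
                  TrackingEvent(500, "S1", "B", "delivered")]
    for e in merge_streams(provider_a, provider_b)["S1"]:
        print(e.timestamp, e.provider, e.status)
```

`heapq.merge` keeps the merge streaming-friendly. It consumes the sorted feeds lazily instead of concatenating and re-sorting them.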
Figure 2: Current and planned investment areas for Big Data technology, today and in five years (percent of respondents; areas shown include social networks (internal B2B), business analytics platforms as a service, network redesign software/systems, and product lifecycle management). Source: Trends and Strategies in Logistics and Supply Chain Management, p. 51, BVL International, 2013
[3] Definition and development, Logistik Baden-Württemberg, cf. logistik-bw.deDefinition.411M52087573ab0.0.html
[4] Trends and Strategies in Logistics and Supply Chain Management, BVL International, 2013
2 BIG DATA BEST PRACTICE ACROSS INDUSTRIES
Capitalizing on the value of information assets is a new strategic objective for most enterprises and organizations. Apart from the Internet powerhouses that have successfully established information-driven business models, companies in other sectors are typically at an early stage of exploring how to benefit from their growing pile of data and put it to good use. According to recent research [5], only 14 percent of European companies already address Big Data analytics as part of their strategic planning (see Figure 3). Yet almost half of these companies expect annual data growth within their organization of more than 25 percent.
Figure 3: Big Data as a strategic target in European enterprises. Survey question: "Has your company defined a Big Data strategy?" No: 63 percent; planned: 23 percent; yes: 14 percent (BARC study, N = 273). Source: Big Data Survey Europe, BARC, February 2013, p. 17
[5] Big Data Survey Europe, BARC Institute, February 2013
Big Data value dimensions
When companies adopt Big Data as part of their business strategy, the first question to surface is usually what type of value Big Data will drive. Will it contribute to the top line or the bottom line, or will there be a non-financial driver? From a value perspective, the application of Big Data analytics falls into one of three dimensions (see Figure 4).
The first and most obvious is operational efficiency. Here, data is used to make better decisions, optimize resource consumption, and improve process quality and performance. Automated data processing has always provided this, but Big Data brings an enhanced set of capabilities. The second dimension is customer experience. Typical aims are increasing customer loyalty, performing precise customer segmentation, and optimizing customer service. By including the vast data resources of the public Internet, Big Data propels CRM techniques to the next stage of evolution. The third dimension is new business models. Big Data enables business models that complement revenue streams from existing products and create additional revenue from entirely new (data) products.
For each of these Big Data value dimensions, there is a growing number of compelling applications. These showcase the business potential of monetizing data across a wide range of vertical markets. In the following sections, we present several use cases to illustrate how early movers have exploited data sources by innovative means and consequently created significant additional value.
Figure 4: Value dimensions for Big Data use cases. Operational efficiency: use data to increase the level of transparency, optimize resource consumption, and improve process quality and performance. Customer experience: exploit data to increase customer loyalty and retention, perform precise customer segmentation and targeting, and optimize customer interaction and service. New business models: capitalize on data by expanding revenue streams from existing products and creating new revenue streams from entirely new (data) products. Source: DPDHL / Detecon
2.1 Operational Efficiency
2.1.1 Using data to predict crime hotspots
For police departments, the task of tracking down criminals to preserve public safety can sometimes be tedious. With many siloed information repositories, casework often involves connecting many data points manually. This takes time and dramatically slows down case resolution. Furthermore, policing resources are deployed reactively, which makes it very difficult to catch criminals in the act. In most cases, these challenges cannot be solved by increasing police staff, because public budgets are limited.
One agency that exploits its various data sources is the New York Police Department (NYPD). By capturing and connecting pieces of crime-related information, it hopes to stay one step ahead of perpetrators [6]. Long before the term Big Data was coined, the NYPD made an effort to break up the departmental silos of its data intake (for example, data from 911 calls, investigation reports, and more). With a single view of all information relating to a particular crime, officers gain a more coherent, real-time picture of their cases. This shift has significantly accelerated retrospective analysis and allows the NYPD to take action earlier when tracking individual criminals.
The steadily decreasing rate of violent crime in New York [7] has been attributed not only to this more efficient streamlining of the many data records required for casework, but also to a fundamental change in policing [8]. By introducing statistical analysis and geographic mapping of crime scenes, the NYPD has been able to build a bigger picture to guide resource deployment and patrol activity.
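The text does not describe the NYPD's analytical tools in detail. The Python sketch below illustrates the general idea of mapping crime locations to find hotspots. It bins incident coordinates into a square latitude/longitude grid and ranks the cells by incident count. The cell size, the function names and the coordinates are hypothetical choices for illustration.

```python
import math
from collections import Counter


def grid_cell(lat: float, lon: float, cell_deg: float = 0.01) -> tuple:
    """Map a coordinate to an integer grid cell; 0.01 degrees is on the order of 1 km."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def hotspots(incidents, cell_deg: float = 0.01, top_n: int = 3):
    """Count incidents per grid cell and return the `top_n` busiest cells."""
    counts = Counter(grid_cell(lat, lon, cell_deg) for lat, lon in incidents)
    return counts.most_common(top_n)


if __name__ == "__main__":
    # Hypothetical incident coordinates (latitude, longitude).
    incidents = [(40.7128, -74.0060), (40.7130, -74.0055),
                 (40.7125, -74.0062), (40.8000, -73.9500)]
    for cell, count in hotspots(incidents):
        print(cell, count)
```

A fixed grid is the simplest possible spatial aggregation. It is used here only to show the counting-and-ranking step.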
The department can now identify crime patterns using computational analysis. The resulting insights enable each commander to proactively identify hotspots of criminal activity. This predictive perspective enables the NYPD to target the deployment of manpower and resources effectively. Combined with other measures, the systematic analysis of existing information has contributed to a continuously decreasing rate of violent crime (see Figure 5).
[6] NYPD changes the crime control equation by the way it uses information, IBM, cf. www-01.ibmsoftwaresuccesscssdb.nsfCSJSTS-6PFJAZ
[7] Index Crimes by Region, New York State Division of Criminal Justice Services, May 2013, cf. criminaljustice.ny.govcrimnetojsastats.htm
[8] Compstat and Organizational Change in the Lowell Police Department, Willis et al., Police Foundation, 2004, cf. policefoundation.orgcontentcompstat-and-organization-change-lowell-police-department
Over time, a number of U.S. municipalities have adopted the technique of using historical data for pattern recognition in order to predict crime locations. As more and more police departments make crime record information available to the public, third parties have also started to offer crime location prediction. They compile the data into nationwide views and even provide anonymous tipping functionality (see Figure 6) [9].
Figure 5: Development of violent crime (robbery and murder) in New York City, 2003 to 2012, based on index crimes reported to police by region. Source: New York State Division of Criminal Justice Services, cf. criminaljustice.ny.govcrimnetojsastats.htm
Figure 6: Screenshot of CrimeReports by Public Engines, cf. crimereport
[9] Cf. publicengines (example)
2.1.2 Optimal shift planning in retail stores
For retail managers, planning shifts to meet customer demand is a delicate task. Overstaffing a store creates unnecessary costs and lowers the site's profitability. Running the store with too few staff hurts both customer and employee satisfaction. Both are bad for business.
At DM drugstores, the store manager historically handled shift planning, based on simple extrapolations and personal experience. This process was good enough for regular working days. But with a growing number of exceptions, it became unsatisfactory, and both overstaffing and understaffing limited business development. So DM decided to support store managers effectively in their staff planning by finding ways to reliably anticipate demand at each point of sale [10].
The approach was to implement a long-term prediction of daily store revenues that takes a broad range of individual and local parameters into account. Input data for a new algorithm included historical revenue data, opening hours, and the arrival times of new goods from distribution centers. Beyond this, other data was brought in to achieve the highest level of precision. This included local circumstances such as market days, holidays in neighboring towns, road diversions, and future weather forecast data (since weather conditions substantially influence consumer behavior). DM evaluated various predictive algorithms. The chosen solution now provides forecasts accurate enough to have proven a powerful support for shift planning. Based on high-resolution forecasts of daily sales for each individual store, employees can now enter their personal preferences into shift schedules four to eight weeks in advance.
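The text names DM's inputs but not its chosen algorithm. The following Python sketch therefore uses a deliberately simple stand-in to show how a revenue forecast can feed shift planning. The forecast is a same-weekday average, scaled by multiplicative adjustment factors for local conditions and then converted into a staff count. The factor values, the revenue-per-staff-day figure and the function names are all hypothetical.

```python
import math
from statistics import mean


def forecast_daily_revenue(history, weekday, factors):
    """Baseline: mean revenue on past days with the same weekday.
    Each value in `factors` is a hypothetical multiplicative adjustment for a
    local condition (market day, holiday in a neighbouring town, weather, ...)."""
    forecast = mean(revenue for day, revenue in history if day == weekday)
    for adjustment in factors.values():
        forecast *= adjustment
    return forecast


def staff_needed(revenue, revenue_per_staff_day):
    """Round up so that the forecast revenue is always covered."""
    return math.ceil(revenue / revenue_per_staff_day)


if __name__ == "__main__":
    # Hypothetical store history: (weekday, revenue).
    history = [("Sat", 12000), ("Sat", 13000), ("Mon", 7000), ("Mon", 7400)]
    # Hypothetical adjustments: a market day lifts demand, rain dampens it.
    factors = {"market_day": 1.2, "rain_forecast": 0.9}
    revenue = forecast_daily_revenue(history, "Sat", factors)
    print(f"{revenue:.0f}", staff_needed(revenue, revenue_per_staff_day=1400))
```

A production system would learn such adjustments from data rather than fixing them by hand. The sketch only shows how a daily forecast can translate into a staffing figure.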
Once approved, these shifts are unlikely to change. Employees can rely on the long-term plan, and a last-minute change is an exceptional event. This shows how applying predictive analytics at DM increases operational efficiency in the store while also contributing to a better work-life balance for store staff.
[10] Business Intelligence Guide 2012/2013, isreport, isi Medien München; or cf. blue-yonderendm-drogerie-markt-en.html
2.2 Customer Experience
2.2.1 Social influence analysis for customer retention
To gain insight into customer satisfaction and future demand, companies use a number of different approaches. The conventional approach is to conduct market research on the customer base, but this produces a generalized view that does not focus on individual consumer needs and behavior.
Customer churn (the loss of customers over a period of time) is a challenge for telecommunications providers. To help reduce churn, organizations typically analyze the usage patterns of individual subscribers and their own service quality. They also offer specific benefits [11] to keep certain customers loyal, based on parameters such as customer spending, usage, and subscription length. In the past, these retention efforts based on individual customer value achieved some improvement in loyalty [12], but churn remains a problem for providers (see Figure 7).
To better predict customer behavior, T-Mobile USA has started to include social relationships between subscribers in its churn management model [13]. The organization uses a multi-graph technique, similar to the methods used in social network analysis, to identify so-called tribe leaders. These are people who have a strong influence within larger, connected groups. If a tribe leader switches to a competitor's service, a number of their friends and family members are likely to switch too, much like a domino effect. T-Mobile has also changed the way it calculates customer value. Its metric now covers not only a customer's lifetime subscription spending on mobile services, but also the size of his or her social network, or tribe (see Figure 8).
This entirely new perspective on customer needs allows T-Mobile to enrich its legacy data analysis, which was historically based on billing systems and communication networks. In addition, roughly one petabyte of raw data, including information from web clickstreams and social networks, is collected to help trace the sophisticated mechanisms behind customer churn. This highly innovative approach has already paid off for T-Mobile. After only the first quarter of using its new churn management model, the organization's churn rate fell by 50 percent compared with the same quarter of the previous year.
Figure 7: Development of subscriber churn rates (percent) for postpaid, prepaid, and blended subscriptions, with trend lines, Q2 2005 to Q2 2008. Source: Mobile Churn and Loyalty Strategies, Informa, p. 24
Figure 8: Identification of influencers within a mobile subscriber base
[11] Customer loyalty tracking, Informa, 2012
[12] Mobile Churn and Loyalty Strategies, 2nd edition, Informa, 2009
[13] T-Mobile challenges churn with data, Brett Sheppard, O'Reilly Strata, 2011, cf. strata.oreilly201108t-mobile-challenges-churn-with.html
2.2.2 Avoiding out-of-stock situations for customer satisfaction
Shoppers frequently go through a frustrating experience: having found the perfect item of clothing, they discover that their size is sold out. With increased competition in the textile and apparel segment, popular garments are now usually available only in limited quantities. This is due to the consolidation of brands and accelerated product cycles. In some cases, only three weeks pass between the first design of a garment and its purchase [14]. Vertically organized chains launch new collections frequently, which limits the purchasing of each item to a single batch. This creates risk for clothing chains and makes it more important than ever to accurately anticipate consumer demand for a particular item. The ability to predict demand correctly has become a key factor in running a profitable business.
The multichannel retailer Otto Group realized that its conventional methods of forecasting demand for online and mail-order catalog products were proving inadequate in an increasingly competitive environment. For 63 percent of items, the forecast deviated from actual sales volumes by more than about 20 percent [15]. The group assessed the business risk of both overproduction and shortage. Overproduction would hurt profitability and tie up too much capital, while shortages would annoy customers. To meet customer demand, and especially the high expectations of digital natives making an online purchase, the Otto Group took an innovative and disruptive approach to improving its delivery capability (see Figure 9).
Figure 9: Relative deviation of the forecast from actual sales volume (absolute frequency of items). Classic prediction: 63 percent of items with a forecast deviation above 20 percent; prediction with NeuroBayes: 11 percent. Source: Big Data & Predictive Analytics: the value of data for precise forecasts and procurement in the future, Otto Group, Michael Sinn, conference talk at Big Data Europe, Zurich, 28 August 2012
[14] Markets in the global fashion industry, Patrik Aspers, Yearbook 2007/2008, Max Planck Institute for the Study of Societies
[15] Otto rechnet mit künstlicher Intelligenz, Lebensmittel Zeitung, 21 August 2009
Figure 10: Percentage of catalog products whose actual sales deviate by more than 20 percent from the demand forecast. Conventional demand forecast: 63 percent; demand forecast with predictive analytics: 11 percent. Source: Perfektes Bestandsmanagement durch Predictive Analytics, Mathias Stüben, Otto Group, at the 29th German Logistics Congress, October 2012
After evaluating a range of solutions for producing stable sales volume forecasts, the Otto Group eventually succeeded in applying a method that originated in the field of high-energy physics. It used a multivariate analysis tool that combines the self-learning capabilities of neural network techniques with Bayesian statistics [16]. With this tool, the group set up an entirely new forecasting engine. The engine was trained with historical data from 16 previous seasons and is continuously fed with 300 million transaction records per week from the current season. This new system generates more than one billion individual forecasts per year and has already delivered convincing results.
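The Python sketch below computes the accuracy measure behind the 63 percent and 11 percent figures: the share of products whose actual sales deviate from the forecast by more than 20 percent. It measures the deviation relative to the forecast, which is an assumption based on the wording of the Figure 10 caption. The sample numbers are hypothetical.

```python
def share_missed(forecasts, actuals, threshold=0.20):
    """Fraction of products whose actual sales deviate from the forecast
    by more than `threshold`, measured relative to the forecast."""
    if len(forecasts) != len(actuals) or not forecasts:
        raise ValueError("need two non-empty sequences of equal length")
    misses = 0
    for forecast, actual in zip(forecasts, actuals):
        if forecast <= 0:
            raise ValueError("forecasts must be positive")
        if abs(actual - forecast) / forecast > threshold:
            misses += 1
    return misses / len(forecasts)


if __name__ == "__main__":
    # Hypothetical forecasts and realized sales for four products.
    print(share_missed([100, 200, 50, 80], [130, 190, 50, 60]))  # 0.5
```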
Now that only 11 percent of catalog products miss the sales forecast by more than 20 percent (see Figure 10), the Otto Group is better able to meet customer demand [17]. At the same time, this new predictive approach reduces stock levels, which improves profitability and increases the availability of funds.
[16] Cf. neurobayes.phi-t.deindex.phppublic-information
[17] Treffsichere Absatzprognose mit Predictive Analytics, Michael Sinn, conference talk at Big Data & Analytics Kongress, Cologne, 19 June 2012, cf. youtubewatchvhAE2Mui5lRA
2.3 New Business Models
2.3.1 Crowd analytics delivers retail and advertising insight
To provide effective mobile voice and data services, network operators must continuously capture a set of data on every subscriber. Besides recording the usage of mobile services (for accounting and billing), operators must also record each subscriber's location. This lets them route calls and data streams to the cell tower to which the subscriber's handset is connected. In this way, each subscriber creates a digital trail as they move around the provider's network. In most countries, only a small group of network operators serves most of the population. The combined digital trails of a subscriber base therefore provide a comprehensive reflection of society or, more precisely, of how society moves.
For example, it is possible to assess the attractiveness of a particular street for opening a new store. This assessment rests on a high-resolution analysis of how people move and dwell in that area, and of which opening hours are likely to generate maximum footfall (see Figure 11). In a wider context, analyzing changes in movement patterns also reveals the impact of events such as marketing campaigns or the opening of a competitor's store.
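The following minimal Python sketch illustrates the footfall analysis described above. It counts distinct subscribers per network cell and hour, and it releases only those aggregate counts. The record layout, the cell and hour granularity, and the minimum-count threshold that suppresses very small groups are assumptions for illustration. They are not details of Telefonica's or any other operator's service.

```python
from collections import defaultdict


def footfall_by_hour(pings, min_count=5):
    """pings: iterable of (subscriber_id, cell_id, hour) records.
    Returns {(cell_id, hour): number of distinct subscribers seen}.
    Subscriber IDs are used only for de-duplication inside this function;
    cells with fewer than `min_count` visitors (a hypothetical threshold)
    are left out of the result."""
    visitors = defaultdict(set)
    for subscriber_id, cell_id, hour in pings:
        visitors[(cell_id, hour)].add(subscriber_id)
    return {key: len(ids) for key, ids in visitors.items() if len(ids) >= min_count}


if __name__ == "__main__":
    # Hypothetical records: six subscribers in cell "C1" at 17:00
    # (one of them seen twice), two subscribers at 09:00.
    pings = [(f"sub{i}", "C1", 17) for i in range(6)] + [("sub0", "C1", 17)]
    pings += [("sub7", "C1", 9), ("sub8", "C1", 9)]
    print(footfall_by_hour(pings))  # {('C1', 17): 6}
```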
Including gender and age group breakdowns in the data, and adding geo-located data sets and social network activity, makes this segmentation even more valuable to retailers and advertisers.
In the past, organizations could only use location and user data from mobile networks internally. This is because privacy laws restrict the exploitation of individual subscriber information. But once subscriber identity has been separated from movement data, this anonymized crowd data still holds significant business value, as Telefonica has discovered. With the launch of its global business division Telefonica Digital, the network operator now drives business innovation beyond its core operations and brands. As part of Telefonica Digital, the Dynamic Insights initiative has commercialized the analysis of movement data, generating incremental revenue from retail, property, leisure, and media customers [18]. Other operators have developed similar offerings, such as Verizon's Precision Market Insights service [19].
In urban areas, digital traces are dense enough to correlate the collective behavior of the subscriber crowd with the characteristics of a particular location or area.
Figure 11: Analysis of customer footfall at a particular location based on mobile subscriber data, from blog.telefonicapress-releasetelefonica-dynamic-insights-launches-smart-steps-in-the-uk
[18] Cf. dynamicinsights.telefonica
[19] Cf. verizonenterpriseindustryretailprecision-market-insights
2.3.2 Creating new insurance products from geo-located data
Climate sensitivity is a characteristic of the agricultural industry, because local temperatures, hours of sunshine, and precipitation levels directly affect crop yields. Extreme weather conditions are becoming more frequent due to global warming, and climate variability has therefore become a significant risk for farmers [20]. To mitigate the effects of crop shortfalls, farmers take out insurance to cover their potential financial losses.
Insurance companies, in turn, face increasingly unpredictable local weather extremes. On the one hand, conventional risk models based on historical data are no longer suitable for anticipating future insured losses [21]. On the other hand, claims must be verified more precisely, because damage can vary across an affected region. For farmers, these two aspects combine to produce higher insurance premiums and slower settlement of damage claims.
In the United States, most private insurance companies viewed crop production as too risky to insure without federal subsidies [22]. In 2006, The Climate Corporation set out to create a new weather simulation model based on 2.5 million temperature and precipitation data points, combined with 150 million soil observations. The high resolution of its simulation grid allows the company to dynamically calculate the risk and pricing of weather insurance coverage on a per-field basis across the entire country (see Figure 12). Local growing conditions are tracked and crop shortfall is calculated in real time, so payouts to policyholders are executed automatically when bad weather conditions occur. This eliminates the need for sophisticated and time-consuming claims processes. Based on 10 trillion simulation data points [23], The Climate Corporation's new insurance business model is now successfully established. After only six years, the organization's insurance services have been approved in all 50 U.S. states.
Figure 12: Real-time tracking of weather conditions and yield impact per field, screenshot taken from climateproductsclimate-apps
[20] Managing the Risks of Extreme Events and Disasters to Advance Climate Change Adaptation, Chapter 4.3.4, Intergovernmental Panel on Climate Change (IPCC), 2012, cf. ipcc.chpdfspecial-reportssrexSREXFullReport.pdf
[21] Warming of the Oceans and Implications for the (Re-)Insurance Industry, The Geneva Association, June 2013
[22] Weather Insurance Reinvented, Linda H. Smith, DTN/The Progressive Farmer, November 2011, cf. dtnprogressivefarmer
[23] About us, The Climate Corporation, cf. climatecompanyabout
3 BIG DATA IN LOGISTICS
Companies are learning to turn large-scale quantities of data into competitive advantage. Their precise forecasting of market demand, radical customization of services, and entirely new business models show how they are exploiting previously untapped data. As today's best practices touch many vertical markets, it is reasonable to predict that Big Data will also become a disruptive trend in the logistics industry. However, the application of Big Data analytics is not immediately obvious in this sector. The particularities of the logistics business must first be examined thoroughly in order to discover valuable use cases.
3.1 Logistics as a Data-driven Business
A good starting point for discussing how to apply Big Data is to look at how information is created and consumed. In the logistics industry, Big Data analytics can provide competitive advantage because of five distinct properties. These five properties highlight where Big Data can be applied most effectively in the logistics industry, and they provide a roadmap to the well of unique information assets owned by every logistics provider. In the following sections, we identify specific use cases that exploit the value of this information. Each contributes to operational efficiency, a better customer experience, or the development of new business models.
1. Optimization to the core: Optimizing service properties such as delivery time, resource utilization, and geographical coverage is an inherent challenge of logistics. Large-scale logistics operations require data to run efficiently.
The earlier this information is available and the more precise it is, the better the optimization results will be. Advanced predictive techniques and real-time processing promise a new quality in capacity forecasting and resource control.

2. Tangible goods, tangible customers: The delivery of tangible goods requires direct customer interaction at pickup and delivery. On a global scale, millions of customer touch points a day create an opportunity for market intelligence, product feedback, or even demographics. Big Data concepts provide versatile analytic means to generate valuable insight into consumer sentiment and product quality.

3. In sync with customer business: Modern logistics solutions seamlessly integrate into production and distribution processes in various industries. The tight level of integration with customer operations lets logistics providers feel the heartbeat of individual businesses, vertical markets, or regions. Applying analytic methodology to this comprehensive knowledge reveals supply chain risks and provides resilience against disruptions.

4. A network of information: The transport and delivery network is a high-resolution data source. Apart from being used to optimize the network itself, network data may provide valuable insight into the global flow of goods. The power and diversity of Big Data analytics moves the level of observation to a microeconomic viewpoint.

5. Global coverage, local presence: Local presence and decentralized operations are a necessity for logistics services. A fleet of vehicles moving across the country can automatically collect local information along the transport routes. Processing this huge stream of data from a large delivery fleet creates a valuable zoom display for demographic, environmental, and traffic statistics.

[Figure: The Data-driven Logistics Provider. Eleven Big Data use cases, connected by flows of data and flows of physical goods: 1 Real-time Route Optimization, 2 Crowd-based Pickup and Delivery, 3 Strategic Network Planning, 4 Operational Capacity Planning, 5 Customer Loyalty Management, 6 Service Improvement and Product Innovation, 7 Risk Evaluation and Resilience Planning, 8 Market Intelligence for SME, 9 Financial Demand and Supply Chain Analytics, 10 Address Verification, 11 Environmental Intelligence. 2013 Detecon International]

3.2 Use Cases: Operational Efficiency

A straightforward way to apply Big Data analytics in a business environment is to increase the level of efficiency in operations. This is simply what IT has always been doing (accelerating business processes), but Big Data analytics effectively opens the throttle.

3.2.1 Last-mile optimization

A constraint in achieving high operational efficiency in a distribution network occurs at the last mile.24 This final hop in a supply chain is often the most expensive one, so optimizing last-mile delivery to drive down product cost is a promising application for Big Data techniques. Two fundamental approaches make data analysis a powerful tool for increasing last-mile efficiency. In a first and somewhat evolutionary step, a massive stream of information is processed to further maximize the performance of a conventional delivery fleet. This is mainly achieved by real-time optimization of delivery routes.
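To make this first, evolutionary approach more concrete, the sketch below re-sequences a vehicle's remaining stops whenever conditions change. It is a minimal illustration under stated assumptions: the travel-time matrix and stop names are hypothetical, traffic is modeled simply as updated travel times, recipient availability as a set of stops to skip, and the nearest-neighbor rule is a swapped-in textbook heuristic for the traveling salesman problem, not the algorithm of DHL SmartTruck or any other product.

```python
from typing import Dict, List, Set

# Hypothetical travel-time matrix in minutes; a real system would refresh it
# continuously from telematics and traffic feeds.
TravelTimes = Dict[str, Dict[str, float]]


def resequence(start: str, stops: List[str], travel: TravelTimes,
               unavailable: Set[str]) -> List[str]:
    """Rebuild the delivery sequence with a nearest-neighbor heuristic.

    Stops whose recipients reported they are not at home are skipped to
    avoid unsuccessful delivery attempts.
    """
    remaining = [s for s in stops if s not in unavailable]
    route: List[str] = []
    current = start
    while remaining:
        row = travel[current]
        # Greedy choice: the stop that is quickest to reach right now.
        nxt = min(remaining, key=lambda s: row[s])
        route.append(nxt)
        remaining.remove(nxt)
        current = nxt
    return route


def route_time(start: str, route: List[str], travel: TravelTimes) -> float:
    """Total driving time of a route that begins at `start`."""
    total, current = 0.0, start
    for stop in route:
        total += travel[current][stop]
        current = stop
    return total


if __name__ == "__main__":
    travel = {
        "depot": {"A": 10, "B": 4, "C": 7},
        "A": {"B": 5, "C": 3},
        "B": {"A": 5, "C": 6},
        "C": {"A": 3, "B": 6},
    }
    print(resequence("depot", ["A", "B", "C"], travel, set()))  # -> ['B', 'A', 'C']
    # A traffic jam now makes B slow to reach, and the recipient at C is out.
    travel["depot"]["B"] = 30
    print(resequence("depot", ["A", "B", "C"], travel, {"C"}))  # -> ['A', 'B']
```

Nearest-neighbor is used here only because it is short and fast; it can produce noticeably longer tours than exact or metaheuristic solvers, so a production router would more likely re-run a stronger optimizer each time traffic or recipient status changes.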
The second, more disruptive approach uses data processing to control an entirely new last-mile delivery model, in which the raw capacity of a huge crowd of randomly moving people replaces the effectiveness of a highly optimized workforce.

1 Real-time route optimization

The traveling salesman problem was formulated around eighty years ago, but it still defines the core challenge of last-mile delivery. Route optimization on the last mile aims to save time in the delivery process, and rapid processing of real-time information supports this goal in multiple ways. When the delivery vehicle is loaded and unloaded, the optimal delivery sequence is calculated dynamically from sensor-based detection of shipment items, which frees the staff from manual sequencing. On the road, telematics databases are tapped to change delivery routes automatically according to current traffic conditions. And routing intelligence considers the availability and location information posted by recipients in order to avoid unsuccessful delivery attempts. In summary, every delivery vehicle receives a continuously adapted delivery sequence that takes geographical factors, environmental factors, and recipient status into account.

What makes this a Big Data problem? It requires combinatorial optimization procedures, fed from correlated streams of real-time events, to re-route vehicles dynamically on the go. As a result, each driver receives instant driving direction updates from the onboard navigation system, guiding them to the next best point of delivery.

DHL SmartTruck: Daily optimized initial tour planning based on incoming shipment data; a dynamic routing system that recalculates the routes depending on the current order and traffic situation; cuts costs and improves CO2 efficiency, for example by reducing mileage.

24 The term "last mile" has its origin in telecommunications and describes the last segment in a communication network that actually reaches the customer.
In the logistics sector, the "last mile" is a metaphor for the final section of a supply chain, in which goods are handed over to the recipient. Source: The definition of the first and last miles, DHL Discover Logistics, cf. dhl-discoverlogisticscmsencoursetechnologiesreinforcementfirst.jsp

2 Crowd-based pickup and delivery

The wisdom and capacity of a crowd of people has become a strong lever for effectively solving business problems. Sourcing a workforce, funding a startup, or performing networked research are just a few examples of requisitioning resources from a crowd. Applied to a distribution network, a crowd-based approach may create substantial efficiency gains on the last mile. The idea is simple: commuters, taxi drivers, or students can be paid to take over last-mile delivery on routes they are traveling anyway. Scaling up the number of these affiliates to a large crowd of occasional carriers effectively takes load off the delivery fleet. Although crowd-based delivery has to be incentivized, it has the potential to cut last-mile delivery costs, especially in rural and sparsely populated areas.

On the downside, a crowd-based approach also poses a vital challenge: the automated control of a huge number of randomly moving delivery resources. This requires extensive data processing capabilities, provided by Big Data techniques such as complex event processing and geocorrelation. A real-time data stream is traced in order to assign shipments to available carriers based on their respective location and destination. Through a mobile application, crowd affiliates publish their current position and accept pre-selected delivery assignments.

DHL MyWays: Unique crowd-based delivery for B2C parcels; flexible delivery in time and location; uses the existing movement of city residents.

The two use cases above illustrate diametrically opposed approaches to optimizing last-mile delivery. In both cases, however, massive real-time information (originating from sensors, external databases, and mobile devices) is combined to operate delivery resources at maximum levels of efficiency. And both of these Big Data applications are enabled by the pervasiveness of mobile technologies.

3.2.2 Predictive network and capacity planning

Optimal utilization of resources is a key competitive advantage for logistics providers. Excess capacities lower profitability (which is critical for low-margin forwarding services), while capacity shortages impact service quality and put customer satisfaction at risk. Logistics providers must therefore perform thorough resource planning at both strategic and operational levels. Strategic-level planning considers the long-term configuration of the distribution network, while operational-level planning scales capacities up or down on a daily or monthly basis. For both perspectives, Big Data techniques improve the reliability and the level of detail of planning, enabling logistics providers to match demand and available resources closely.

3 Strategic network planning

At a strategic level, the topology and capacity of the distribution network are adapted to anticipated future demand. The results of this planning stage usually drive investments with long requisition and amortization cycles, such as warehouses, distribution centers, and custom-built vehicles. More precise capacity demand forecasts therefore increase efficiency and lower the risks of investing in storage and fleet capacity.

Big Data techniques support network planning and optimization by analyzing comprehensive historical capacity and utilization data of transit points and transportation routes.
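Analyzing historical utilization data to anticipate future demand can be sketched, in its simplest form, as an ordinary least-squares trend fitted to a series of past volumes and extrapolated a few periods ahead. The monthly volumes below are invented for illustration, and the plain linear trend is an assumption; it stands in for the richer regression and scenario models the text refers to, which would add seasonal and external economic factors.

```python
from typing import List, Tuple


def fit_trend(volumes: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of volume = intercept + slope * t, t = 0..n-1."""
    n = len(volumes)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    v_mean = sum(volumes) / n
    cov = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(volumes))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = v_mean - slope * t_mean
    return intercept, slope


def forecast(volumes: List[float], horizon: int) -> List[float]:
    """Extrapolate the fitted trend `horizon` periods beyond the history."""
    intercept, slope = fit_trend(volumes)
    n = len(volumes)
    return [intercept + slope * (n + h) for h in range(horizon)]


if __name__ == "__main__":
    # Hypothetical monthly throughput of one hub, in thousands of parcels.
    history = [100, 104, 108, 112, 116, 120]
    print(forecast(history, 3))  # -> [124.0, 128.0, 132.0]
```

The point is only that historical utilization becomes a forward-looking capacity estimate; seasonality and external growth forecasts would be layered on top in practice.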
In addition, Big Data techniques take seasonal factors and emerging freight flow trends into account through learning algorithms fed with extensive statistical series. External economic information (such as industry-specific and regional growth forecasts) is included for a more accurate prediction of specific transportation capacity demand. In summary, to substantially increase predictive value, advanced regression and scenario modeling techniques exploit a much higher volume and variety of information. The result is a new quality of planning with expanded forecast periods, which effectively reduces the risk of long-term infrastructure investments and contracted external capacities. Such planning can also expose any impending over-capacity and feed it back automatically to stimulate sales volume, either through dynamic pricing mechanisms or by transferring surplus capacity to spot-market trading.

4 Operational capacity planning

At the operational level, transit points and transportation routes must be managed efficiently on a day-to-day basis. This involves capacity planning for trucks, trains, and aircraft, as well as shift planning for personnel in distribution centers and warehouses. Operational planning tasks are often based on historical averages or even on personal experience, which typically results in resource inefficiency. With advanced analytics, by contrast, the dynamics within and outside the distribution network are modeled and their impact on capacity requirements is calculated in advance. Real-time information about shipments (items that are entering the distribution network, are in transit, or are stored) is aggregated to predict the allocation of resources for the next 48 hours. This data is sourced automatically from warehouse management systems and from sensor data along the transportation chain. In addition, ad-hoc changes in demand are detected from externally available customer information (e.g.
data on product releases, factory openings, or unexpected bankruptcies). Local incidents (e.g. regional disease outbreaks or natural disasters) are also detected, as these can skew demand figures for a particular region or product. This prediction of resource requirements helps operating personnel to scale capacity up or down at each location.

But there is more to it than that. A precise forecast also reveals upcoming congestion on routes or at transit points that cannot be addressed by local scaling. For example, a freight aircraft that is working to capacity must leave any further expedited shipments behind at the airport of origin. Simulation results give early warning of this type of congestion, enabling shipments to be reassigned to uncongested routes and mitigating the local shortfall. This is an excellent example of how Big Data analytics can turn the distribution network into a self-optimizing infrastructure.

Both of the above Big Data scenarios increase resource efficiency in the distribution network, but the style of data processing differs. The strategic optimization combines a high volume of data from a variety of sources to support investment and contracting decisions, while the operational optimization continuously forecasts network flows based on real-time data streams.

DHL Parcel Volume Prediction: An analytic tool to measure the influence of external factors on the expected volume of parcels; correlates external data with internal network data; results in a Big Data prediction model that significantly improves operational capacity planning; an ongoing research project by DHL Solutions & Innovation.

3.3 Use Cases: Customer Experience

The aspect of Big Data analytics that currently attracts the most attention is the acquisition of customer insight. For every business, it is vitally important to learn about customer demand and satisfaction.
But as organizations grow more successful, the individual customer can blur into a large and anonymous customer base. Big Data analytics help to win back individual customer insight and to create targeted customer value.

3.3.1 Customer value management

Clearly, data from the distribution network carries significant value for the analysis and management of customer relations. Processed with Big Data techniques and enriched by public Internet mining, this data can be used to minimize customer attrition and to understand customer demand.

5 Customer loyalty management

For most business models, the cost of winning a new customer is far higher than the cost of retaining an existing one. But it is increasingly difficult to trace and analyze individual customer satisfaction because there are more and more indirect customer touch points (e.g. portals, apps, and indirect sales channels). As a result, many businesses fail to establish effective customer retention programs. Smart use of data enables the identification of valuable customers who are about to leave for the competition. Big Data analytics allow a comprehensive assessment of customer satisfaction by merging multiple extensive data sources. For logistics providers, this means a combined evaluation of records from customer touch points, operational data on logistics service quality, and external data.

How do these pieces fit together? Imagine a logistics provider noticing that a customer is lowering its shipment volumes while concurrently publishing steady sales figures through a newswire. The provider then checks delivery records and realizes that this customer recently experienced delayed shipments. Looking at the bigger picture, this information suggests an urgent need for customer retention activity.
To achieve this insight not just for one customer but across the entire customer base, the logistics provider must tap multiple data sources and use Big Data analytics. Customer touch points include responses to sales and marketing activities, customer service inquiries, and complaint management details. This digital customer trail is correlated with data from the distribution network, comprising statistical series on shipping volume and received service quality levels. In addition, the Internet provides useful customer insight: publicly available information from news agencies, annual reports, and stock trackers, or even sentiment from social media sites, enriches the logistics provider's internal perspective on each customer. From this comprehensive information pool, the logistics provider can extract the attrition potential of every single customer by applying techniques such as semantic text analytics, natural-language processing, and pattern recognition. Automatically generated triggers then prompt the provider to initiate proactive countermeasures and customer loyalty programs.

Although business relationships in logistics usually concern the sender side, loyalty management must also target the recipient side. Recipients are even more affected by poor service quality, and their feedback influences which provider senders select for future shipments. Internet and catalog shopping are good examples: recurring customer complaints lead the vendor to consider switching logistics providers. But including recipients in loyalty management requires even more data to be processed, especially in B2C markets. Big Data analytics are essential here, producing an integrated view of customer interactions and operational performance and helping to ensure both sender and recipient satisfaction.
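The retention scenario described above (falling shipment volume despite steady published sales, combined with recent delivery delays) can be expressed as an automatically generated trigger. The sketch below is a hypothetical rule, not any provider's actual model: the field names, the 20% volume-drop threshold, and the 10% late-delivery threshold are assumptions chosen for illustration, and a production system would more likely learn such weightings with pattern-recognition techniques.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CustomerSignals:
    # All fields are hypothetical inputs merged from different sources.
    name: str
    volume_prev: int     # shipments in the previous period (internal data)
    volume_curr: int     # shipments in the current period (internal data)
    sales_trend: float   # published sales change, e.g. 0.02 = +2% (external data)
    late_share: float    # share of recent deliveries that were late (operational data)


def attrition_flags(customers: List[CustomerSignals],
                    volume_drop: float = 0.20,
                    late_limit: float = 0.10) -> List[str]:
    """Return names of customers whose combined signals suggest churn risk."""
    flagged = []
    for c in customers:
        if c.volume_prev == 0:
            continue  # no baseline to compare against
        change = (c.volume_curr - c.volume_prev) / c.volume_prev
        # Shipping less although the customer's own business is not shrinking ...
        shipping_elsewhere = change <= -volume_drop and c.sales_trend >= 0
        # ... after a period of poor service quality is a retention trigger.
        if shipping_elsewhere and c.late_share >= late_limit:
            flagged.append(c.name)
    return flagged


if __name__ == "__main__":
    customers = [
        CustomerSignals("Retailer A", 1000, 700, 0.03, 0.15),   # matches the scenario
        CustomerSignals("Retailer B", 1000, 700, -0.30, 0.15),  # its own sales fell
        CustomerSignals("Retailer C", 1000, 980, 0.05, 0.20),   # volume is stable
    ]
    print(attrition_flags(customers))  # -> ['Retailer A']
```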
6 Continuous service improvement and product innovation

Logistics providers collect customer feedback because it provides valuable insight into service quality and into customer expectations and demands. This feedback is a major source of information for continuous improvement in service quality, and it is also important input for the ideation of new service innovations. To get solid results from customer feedback evaluation, it is necessary to aggregate information from as many touch points as possible. In the past, the only sources of data were CRM systems and customer surveys. Today, however, Big Data solutions provide access to gargantuan volumes of useful data stored on public Internet sites. In social networks and on discussion forums, people openly and anonymously share their service experiences. But manually extracting relevant customer feedback from the natural-language content created by billions of Internet users is like looking for the proverbial needle in a haystack.

Sophisticated Big Data techniques such as text mining and semantic analytics allow the automated retrieval of customer sentiment from huge text and audio repositories. In addition, this unsolicited feedback on quality and demand can be broken down by region and time, which makes it possible to identify correlations with one-time incidents and to track the effect of any initiated action. In summary, a meticulous review of the entire public Internet brings unbiased customer feedback to the logistics provider. This empowers product and operational managers to design services capable of meeting customer demand.

3.3.2 Supply chain risk management

The uninterrupted direct supply of materials is essential to businesses operating global production chains. Lost, delayed, or damaged goods have an immediate negative impact on revenue streams. Whereas logistics providers are prepared to control their own operational risk in supply chain services, an increasing number of disruptions result from major events such as civil unrest, natural disasters, or sudden economic developments.25 To anticipate supply chain disruptions and mitigate the effect of unforeseen incidents, global enterprises seek to deploy business continuity management (BCM) measures.26

This demand for improved business continuity creates an opportunity for logistics providers to expand their customer value in outsourced supply chain operations. Rapid analysis of various information streams can be used to forecast events with a potentially significant or disastrous impact on customer business. When critical conditions arise, countermeasures can be initiated early to tackle the resulting business risks.

25 Are you ready for anything, DHL Supply Chain Matters, 2011, cf. dhlsupplychainmatters.dhlefficiencyarticle24are-you-ready-for-anything
26 Making the right risk decisions to strengthen operations performance, PricewaterhouseCoopers and MIT Forum for Supply Chain Innovation, 2013

7 Risk evaluation and resilience planning

Contract logistics providers know their customers' supply chains in great detail. To meet the customer need for predictive risk assessment, two things must be linked and continuously checked against each other: a model describing all elements of the supply chain topology, and monitoring of the forces that affect the performance of this supply chain. Data on local developments in politics, economy, nature, health, and more must be drawn from a plethora of sources (e.g. social media, blogs, weather forecasts, news sites, stock trackers, and many other publicly available sites), and then aggregated and analyzed. Most of this information stream is unstructured and continuously updated, so Big Data analytics power the retrieval of input that is meaningful for detecting supply chain risks.
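Linking a supply chain topology model to monitored events, as just described, can be sketched as a join between the two: each incoming event carries a region and a severity, and every supply chain element located in that region is flagged. The node names, regions, event types, and severity threshold below are hypothetical; a real system would first have to extract such structured events from unstructured text (semantic analytics) and correlate sequences of them over time (complex event processing), both of which this sketch leaves out.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Event:
    # A structured event, assumed to be already extracted from news,
    # weather feeds, social media, etc.
    kind: str
    region: str
    severity: int  # hypothetical scale: 1 (minor) .. 5 (critical)


@dataclass
class Alert:
    node: str
    event: Event


def evaluate_risks(topology: Dict[str, str], events: List[Event],
                   min_severity: int = 4) -> List[Alert]:
    """Flag supply chain elements located in a region hit by a severe event.

    `topology` maps each supply chain element (plant, hub, transshipment
    point) to the region where it is located.
    """
    by_region: Dict[str, List[str]] = {}
    for node, region in topology.items():
        by_region.setdefault(region, []).append(node)

    alerts = []
    for ev in events:
        if ev.severity < min_severity:
            continue  # below the threshold for notifying the customer
        for node in by_region.get(ev.region, []):
            alerts.append(Alert(node, ev))
    return alerts


if __name__ == "__main__":
    topology = {"Supplier Plant": "Region-1", "Transshipment Hub": "Region-2",
                "Regional DC": "Region-3"}
    events = [Event("tornado warning", "Region-2", 5),
              Event("local strike", "Region-3", 2)]
    for a in evaluate_risks(topology, events):
        print(f"{a.node}: {a.event.kind} ({a.event.region})")
    # -> Transshipment Hub: tornado warning (Region-2)
```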
Both semantic analytics and complex event processing techniques are required to detect patterns in this stream of interrelated pieces of information.27 The customer is notified when a pattern points to a critical condition arising for one of the supply chain elements (e.g. a tornado warning in the region where a transshipment point is located). This notification includes a report on the probability and impact of the risk and proposes suitable countermeasures to mitigate potential disruption. Equipped with this information, the customer can re-plan transport routes or ramp up supplies from other geographies.

Robust supply chains that are able to cope with unforeseen events are a vital business capability in today's rapidly changing world. In addition to a resilient and flexible supply chain infrastructure, businesses need highly accurate risk detection to keep running when disaster strikes. With Big Data tools and techniques, logistics providers can secure customer operations by performing predictive analytics on a global scale.

Coming Soon: A New Supply Chain Risk Management Solution by DHL. A unique consultancy and software solution that improves the resilience of your entire supply chain; designed to reduce emergency costs, maintain service levels, protect sales, and enable fast post-disruption recovery; protects your brand and market share, informs your inventory decisions, and creates competitive advantage.

27 The Power of Events: An Introduction to Complex Event Processing in Distributed Enterprise Systems, David C. Luckham, Addison-Wesley Longman, 2001

3.4 Use Cases: New Business Models

3.4.1 B2B demand and supply chain forecast

The logistics sector has long been a macroeconomic indicator, and the global transportation of goods often acts as a benchmark for future economic development. The type of goods and the shipped volumes indicate regional demand and supply levels.
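As a minimal illustration of how shipped volumes can serve as a regional demand indicator, the sketch below sums inbound quantities per destination region and period and reports the relative change between two periods. The log format, region names, and the reading of inbound volume growth as demand growth are assumptions for illustration; a real indicator would also segment by type of goods and account for shifts in the provider's own market share.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical shipment log entries: (destination region, period, quantity).
Entry = Tuple[str, int, int]


def regional_growth(entries: List[Entry], prev: int, curr: int) -> Dict[str, float]:
    """Relative change of inbound volume per region between two periods.

    Rising inbound volume is read as a proxy for rising regional demand.
    Regions without volume in the earlier period are skipped.
    """
    volume: Dict[Tuple[str, int], int] = defaultdict(int)
    for region, period, qty in entries:
        volume[(region, period)] += qty
    regions = {region for region, _, _ in entries}
    growth = {}
    for r in sorted(regions):
        before, after = volume[(r, prev)], volume[(r, curr)]
        if before:
            growth[r] = (after - before) / before
    return growth


if __name__ == "__main__":
    log = [("North", 1, 400), ("North", 2, 500), ("South", 1, 300),
           ("South", 2, 240), ("East", 2, 100)]
    print(regional_growth(log, prev=1, curr=2))
    # -> {'North': 0.25, 'South': -0.2}
```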
The predictive value of logistics data for the global economy is reflected in existing financial indices that measure the macroeconomic impact of the logistics sector. Examples are the Baltic Dry Index,28 a price index for shipping raw materials, and the Dow Jones Transportation Average,29 which reflects the economic stability of the 20 largest U.S. logistics providers. By applying the power of Big Data analytics, logistics providers have a unique opportunity to extract detailed microeconomic insights from the flow of goods through their distribution networks. They can exploit the huge digital asset that accumulates from millions of daily shipments by capturing demand and supply figures in various geographical and industry segments.

8 Market intelligence for small and medium-sized enterprises

The aggregation of shipment records (comprising origin, destination, type of goods, quantity, and value) is an extensive source of valuable market intelligence. As long as postal privacy is retained, logistics providers can refine this data in order to substantiate existing external market research. With regression analysis, the fine-grained information in a shipment database can significantly enhance the precision of conventional demand and supply forecasts.

The result has high predictive value, and this compound market intelligence is therefore a compelling service that can be offered to third parties. To serve a broad range of potential customers, the generated forecasts are segmented by industry, region, and product category. The primary target groups for advanced data services such as these are small and medium-sized enterprises that lack the capacity to conduct their own customized market research.

DHL Geovista: An online geomarketing tool for SMEs to analyze business potential; provides realistic sales forecasts and local competitor analysis based on a scientific model; a desired location can be evaluated using high-quality geodata. deutschepost.degeovista

28 Baltic Dry Index, Financial Times Lexicon, cf. lexicon.ftTermtermBaltic-Dry-Index
29 Dow Jones Transportation Average, S&P Dow Jones Indices, cf. djaveragesgotransportation-overview

9 Financial demand and supply chain analytics

Financial analysts depend on data to generate their growth perspectives and stock ratings. Sometimes analysts even perform manual checks on supply chains as the only available source for forecasting sales figures or market volumes. So for ratings agencies and advisory firms in the banking and insurance sector, access to the detailed information collected from a global distribution network is particularly valuable. One option for logistics providers is to create a commercial analytics platform that allows a broad range of users to slice and dice raw data according to their field of research, effectively creating new revenue streams from the huge amount of information that controls logistics operations.

In the above use cases, analytics techniques are applied to vast amounts of shipment data. This illustrates how logistics providers can implement new information-driven business models. In addition, monetizing data that already exists adds potentially highly profitable revenue to the logistics provider's top line.

3.4.2 Real-time local intelligence

Information-driven business models are frequently built upon existing amounts of data, but this is not a prerequisite. An established product or service can also be extended in order to generate new information assets. For logistics providers, the pickup and delivery of shipments offers a particular opportunity for a complementary new business model. No other industry can provide the equivalent blanket-coverage local presence of a fleet of vehicles that is constantly on the move and geographically distributed. Logistics providers can equip these vehicles with new devices (cameras, sensors, and mobile connectivity, enabled by the miniaturization that powers the Internet of Things) to collect rich sets of information on the go. This unique capability enables logistics providers to offer existing and new customers a completely new set of value-added data services.

10 Address verification

The verification of a customer's delivery address is a fundamental requirement for online commerce. Whereas address verification is broadly available in industrialized nations, the quality of address data in developing countries and remote areas is typically poor. This is partly due to the lack of structured naming schemes for streets and buildings in some locations. Logistics providers can use daily freight, express, and parcel delivery data to verify address data automatically, enabling, for example, optimized route planning with correct geocoding for retail, banking, and public sector entities.

DHL Address Management: Direct match of input data with reference data; incomplete or incorrect incoming data is returned with validated data from the database; significant increase in data quality for planning purposes (route planning).

11 Environmental intelligence

The accelerated growth of urban areas30 increases the importance of city planning activities and environmental monitoring. By using a variety of sensors attached to delivery vehicles, logistics providers can produce rich environmental statistics. Data sets may include measurements of ozone and fine dust pollution, temperature and humidity, as well as traffic density, noise, and parking spot utilization along urban roads. As all of this data can be collected en passant (in passing), it is relatively easy for logistics providers to offer a valuable data service to authorities, environment agencies, and real-estate developers, while earning complementary revenues that can subsidize, for example, the maintenance of a large delivery fleet.
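Turning readings collected en passant into fine-meshed statistics essentially means binning geotagged sensor readings into map cells and aggregating per cell. The sketch below assumes hypothetical (latitude, longitude, value) readings and a fixed grid of 0.01 degrees (roughly a kilometer in latitude); the cell size, the record layout, and the minimum-sample rule are illustrative choices, not part of any DHL service described in the text.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Reading = Tuple[float, float, float]  # (latitude, longitude, measured value)
Cell = Tuple[int, int]


def cell_of(lat: float, lon: float, size_deg: float) -> Cell:
    """Map a coordinate to the index of its grid cell."""
    return (math.floor(lat / size_deg), math.floor(lon / size_deg))


def grid_means(readings: List[Reading], size_deg: float = 0.01,
               min_samples: int = 2) -> Dict[Cell, float]:
    """Average readings per grid cell, dropping cells with too few samples."""
    sums: Dict[Cell, float] = defaultdict(float)
    counts: Dict[Cell, int] = defaultdict(int)
    for lat, lon, value in readings:
        c = cell_of(lat, lon, size_deg)
        sums[c] += value
        counts[c] += 1
    # Sparsely sampled cells are unreliable, so they are left out.
    return {c: sums[c] / counts[c] for c in sums if counts[c] >= min_samples}


if __name__ == "__main__":
    # Hypothetical fine-dust readings (µg/m³) logged by delivery vans.
    readings = [(50.7341, 7.0982, 40.0), (50.7349, 7.0987, 44.0),
                (50.7512, 7.1203, 30.0)]
    print(grid_means(readings))  # -> {(5073, 709): 42.0}
```

The minimum-sample rule is one simple way to avoid publishing statistics for cells a vehicle passed only once; the threshold would depend on the measured quantity.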
There are numerous other local intelligence use cases exploiting the ubiquity of a large delivery fleet. From road condition reports that steer plowing or road maintenance squads to surveys on the thermal insulation of public households, logistics providers are in pole position to become search engines of the physical world. Innovative services that provide all kinds of data in microscopic geographical detail are equally attractive to advertising agencies, construction companies, and public bodies such as police and fire departments. Big Data techniques that extract structured information from real-time footage and sensor data now form a technical backbone for deploying new data-driven business models.

3.5 Success Factors for Implementing Big Data Analytics

Our discussion of Big Data analytics has focused on the value of information assets and the ways in which logistics providers can leverage data for better business performance. This is a good start, as solid use cases are a fundamental requirement for adopting new information-driven business models. But more than a positive assessment of business value is needed: the following five success factors must also be in place.

3.5.1 Business and IT alignment

In the past, advancements in information management clearly targeted either a business problem or a technology problem. While trends such as CRM strongly affected the way sales and service people work, other trends such as cloud computing have caused headaches for IT teams attempting to operate dynamic IT resources across the Internet. Consequently, business units and the IT department may have different perspectives on which changes are worth adopting and managing. But for an organization to transform itself into an information-driven company (one that uses Big Data analytics for competitive advantage), both the business units and the IT department must accept and support substantial change.
It is therefore essential to demonstrate and align both a business case and an IT case for using Big Data (including objectives, benefits, and risks). To complete a Big Data implementation, there must be a mutual understanding of the challenges as well as a joint commitment of knowledge and talent.

[30] According to the United Nations, by 2050, 85.9% of the population in developed countries will live in urban areas. Taken from: "Open-air computers", The Economist, Oct. 27, 2012, cf. economistnewsspecial-report21564998-cities-are-turning-vast-data-factories-open-air-computers

3.5.2 Data transparency and governance
Big Data use cases often build upon a smart combination of individual data sources which jointly provide new perspectives and insights. But in many companies, three major challenges must be addressed to ensure successful implementation.
First, to locate data that is already available in the company, there must be full transparency of information assets and ownership.
Second, to prevent ambiguous data mapping, data attributes must be clearly structured and explicitly defined across multiple databases.
Third, strong governance of data quality must be maintained. The validity of mass query results is likely to be compromised unless effective cleansing procedures remove incomplete, obsolete, or duplicate data records.
High overall data quality in the individual data sources is of utmost importance. The boosted volume, variety, and velocity of Big Data make it more difficult to implement efficient validation and adjustment procedures later on.

3.5.3 Data privacy
In the conceptual phase of every Big Data project, it is essential to consider data protection and privacy issues. Personal data is often revealed when exploiting information assets, especially when attempting to gain customer insight.
Use cases are typically elusive in countries with strict data protection laws, yet legislation is not the only constraint. Even when a use case complies with prevailing laws, the large-scale collection and exploitation of data often stirs public debate. This can subsequently damage corporate reputation and brand value.

3.5.4 Data science skills
A key to successful Big Data implementation is mastery of the many data analysis and manipulation techniques that turn vast raw data into valuable information. The skillful application of computational mathematics makes or breaks reliable and meaningful insights. In most industries, the required mathematical and statistical skill set is scarce. In fact, a talent war is underway, as more and more companies recognize they must source missing data science skills externally. Very specialized knowledge is required to deploy the right techniques for each particular data processing problem. Organizations must therefore invest in new HR approaches in support of Big Data initiatives.

3.5.5 Appropriate technology usage
Many data processing problems currently hyped as Big Data challenges could, in fact, have been solved technically five years ago. But back then, the required technology investment would have shattered every business case. Now raw computing power has increased exponentially at a fraction of the cost, and advanced data processing concepts are available, enabling a new dimension of performance. The most prominent approaches are in-memory data storage and distributed computing frameworks. However, these new concepts require the adoption of entirely new technologies.

Implementing Big Data projects therefore requires IT departments to evaluate established and new technology components thoroughly. They must establish whether these components can support a particular use case, and whether existing investments can be scaled up for higher performance. For example, in-memory databases (such as the SAP HANA system) are very fast but have limited data storage volume. Distributed computing frameworks (such as the Apache Hadoop framework), by contrast, can scale out to a huge number of nodes, but at the cost of delayed data consistency across those nodes.

In summary, these are the five success factors that must be in place for organizations to leverage data for better business performance. Big Data is ready to be used.

Outlook
Looking ahead, there are admittedly numerous obstacles to overcome before Big Data has a pervasive influence on the logistics industry: data quality, privacy, and technical feasibility, to name just a few. But in the long run, these obstacles are of secondary importance because, first and foremost, Big Data is driven by entrepreneurial spirit. Several organizations have led the way: Google, Amazon, Facebook, and eBay, for example, have already succeeded in turning extensive information into business.

Now we are beginning to see first movers in the logistics sector. These are the entrepreneurial logistics providers that refuse to be left behind: opportunity-oriented organizations prepared to exploit data assets in pursuit of the applications described in this trend report. But apart from the leading logistics providers that implement specific Big Data opportunities, how will the entire logistics sector transform into a data-driven industry? What evolution can we anticipate in a world where virtually every single shipped item is connected to the Internet? We may not know all of the answers right now. But this trend report has shown there is plenty of headroom for valuable Big Data innovation.
Alongside resources, labor, and capital, information has clearly become the fourth production factor and is essential to competitive differentiation. It is time to tap the potential of Big Data to improve operational efficiency and customer experience, and to create useful new business models. It is time for a shift of mindset, a clear strategy, and the application of the right drilling techniques. Over the next decade, data will assume its rightful place as a key driver in the logistics sector, and every activity within DHL is bound to get smarter, faster, and more efficient.

FOR MORE INFORMATION
About Big Data in Logistics, contact:
Dr. Markus Kückelhaus, DHL Customer Solutions & Innovation, Junkersring 57, 53844 Troisdorf, Germany. Phone: +49 2241 1203 230, Mobile: +49 152 5797 0580, e-mail: markus. kueckelhausdhl
Katrin Zeiler, DHL Customer Solutions & Innovation, Junkersring 57, 53844 Troisdorf, Germany. Phone: +49 2241 1203 235, Mobile: +49 173 239 0335, e-mail: katrin. zeilerdhl

RECOMMENDED READING
LOGISTICS TREND RADAR: dhltrendradar
KEY LOGISTICS TRENDS IN LIFE SCIENCES 2020: dhllifesciences2020.

404 means the file is not found. If you have already uploaded the file, the name may be misspelled or the file may be in a different folder.

Other Possible Causes
You may get a 404 error for images because you have Hot Link Protection turned on and the domain is not on the list of authorized domains.

If you go to your temporary URL (ip/~username) and get this error, there may be a problem with the rule set stored in an .htaccess file. You can try renaming that file to .htaccess-backup and refreshing the site to see if that resolves the issue. It is also possible that you have inadvertently deleted your document root, or that your account needs to be recreated. Either way, please contact your web host immediately.

Are you using WordPress? See the section on 404 errors after clicking a link in WordPress.

Missing or Broken Files
When you get a 404 error, be sure to check the URL that you are attempting to use in your browser. This tells the server what resource it should attempt to request. In this example, the file must be in public_html/example/Example. Notice that the case is important in this example: on platforms that enforce case-sensitivity, example and Example are not the same locations.
For addon domains, the file must be in public_html/addondomain/example/Example, and the names are case-sensitive.

Broken Image
When an image is missing on your site, you may see a box on your page with a red X where the image should be. Right-click on the X and choose Properties. The properties will tell you the path and file name that cannot be found. This varies by browser. If you do not see a box with a red X, try right-clicking on the page, select View Page Info, and go to the Media tab. In this example, the image file must be in public_html/cgi-sys/images. Notice that the case is important in this example: on platforms that enforce case-sensitivity, PNG and png are not the same locations.

404 Errors After Clicking a Link in WordPress
When working with WordPress, 404 Page Not Found errors often occur when a new theme has been activated or when the rewrite rules in the .htaccess file have been altered. When you encounter a 404 error in WordPress, you have two options for correcting it.

Option 1: Correct the Permalinks
1. Log in to WordPress.
2. From the left-hand navigation menu in WordPress, click Settings > Permalinks. (Note the current setting. If you are using a custom structure, copy or save the custom structure somewhere.)
3. Select Default.
4. Click Save Settings.
5. Change the settings back to the previous configuration (before you selected Default). Put the custom structure back if you had one.
6. Click Save Settings.
This will reset the permalinks and fix the issue in many cases. If this doesn't work, you may need to edit your .htaccess file directly.

Option 2: Modify the .htaccess File
Add the following snippet of code to the top of your .htaccess file:

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress

Your blog may show the wrong domain name in links, redirect to another site, or be missing images and style. These symptoms usually have the same cause: the wrong domain name is configured in your WordPress blog.

How to Edit the .htaccess File
The .htaccess file contains directives (instructions) that tell the server how to behave in certain scenarios and directly affect how your website functions. Redirects and URL rewriting are two very common directives found in a .htaccess file. Many scripts, such as WordPress, Drupal, Joomla, and Magento, add directives to the .htaccess file so that they can function. You may need to edit the .htaccess file at some point, for various reasons. This section covers how to edit the file in cPanel, but not what may need to be changed. (You may need to consult other articles and resources for that information.)

There are many ways to edit a .htaccess file:
- Edit the file on your computer and upload it to the server via FTP
- Use an FTP program's Edit Mode
- Use SSH and a text editor
- Use the File Manager in cPanel
For most people, the easiest way to edit a .htaccess file is through the File Manager in cPanel.

How to Edit .htaccess Files in cPanel's File Manager
Before you do anything, back up your website so that you can revert to a previous version if something goes wrong.

Open the File Manager
1. Log into cPanel.
2. In the Files section, click on the File Manager icon.
3. Check the box for Document Root for and select the domain name you wish to access from the drop-down menu.
4. Make sure Show Hidden Files (dotfiles) is checked.
5. Click Go. The File Manager will open in a new tab or window.
6. Look for the .htaccess file in the list of files. You may need to scroll to find it.

To Edit the .htaccess File
1. Right-click on the .htaccess file and click Code Edit from the menu. Alternatively, click on the icon for the .htaccess file and then click on the Code Editor icon at the top of the page.
2. A dialogue box may appear asking you about encoding. Just click Edit to continue. The editor will open in a new window.
3. Edit the file as needed.
4. Click Save Changes in the upper right-hand corner when done. The changes will be saved.
5. Test your website to make sure your changes were successfully saved. If not, correct the error or revert to the previous version until your site works again.
6. Once complete, click Close to close the File Manager window.

Introduction: Lecture BigData Analytics
Julian M. Kunkel, University of Hamburg / German Climate Computing Center (DKRZ)

Outline
1 Introduction
2 BigData Challenges
3 Analytical Workflow
4 Use Cases
5 Programming
6 Summary
Julian M. Kunkel, Lecture BigData Analytics, 51

About DKRZ
German Climate Computing Center (DKRZ): Partner for Climate Research. Maximum Compute Performance. Sophisticated Data Management. Competent Service.

Scientific Computing
Research group of Prof. Ludwig at the University of Hamburg, embedded into DKRZ. Research topics:
- Analysis of parallel I/O
- Alternative I/O interfaces
- I/O & energy tracing tools
- Data reduction techniques
- Middleware optimization
- Cost & energy efficiency
Lecture
Concept of the lecture: the lecture focuses on applying technology and some theory.
- Theory: data models and processing concepts; algorithms and data structures; system architectures; statistics and machine learning
- Applying technology: learning about various state-of-the-art technologies; hands-on work for understanding the key concepts
- Languages: Java, Python, R
The domain of big data is overwhelming, especially in terms of technology. The lecture is a crash course for several topics, such as statistics and databases. It is not the goal to learn and understand every aspect in this lecture.

Lecture (2)
Slides: many openly accessible sources have been used. They are cited by a number, and the reference slide provides the link to the source (if available). For figures, a reference is indicated by "Source: Author [1]". An [ref] in the title means that this reference has been used for the slide; some text may be taken literally.
Exercise: weekly delivery, with an estimated processing time of about 8 hours per week. Teamwork of 2 or 3 people (groups are mandatory). Supported by: Hans Ole Hatzel.

Idea of BigData
Methods of obtaining knowledge (Erkenntnisprozess):
- Theory (model), hypothesis, experiment, analysis (repeat)
- Explorative: start the theory with observations of phenomena
- Constructivism: start with axioms and reason about their implications
The Fourth Paradigm: (Big) Data Analytics -> Insight (prediction of the future). For industry: insight -> business advantage and money.
Analytics: follow an explorative approach and study the data. To infer knowledge, use statistics and machine learning. Construct a theory (model) and validate it with the data.
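The last step above, constructing a model and validating it with the data, can be illustrated with a deliberately tiny Python sketch. Several parts are assumptions made for illustration:
- the observations are invented;
- the model is an ordinary least-squares line;
- holding out the last three points for validation is just one simple choice (cross-validation is common in practice).

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    b = cov / var
    return mean_y - b * mean_x, b


def mean_abs_error(model, xs, ys):
    """Average absolute deviation between model predictions and observations."""
    a, b = model
    return sum(abs(a + b * x - y) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical observations that roughly follow y = 2x + 1.
xs = list(range(10))
ys = [1.1, 2.9, 5.2, 6.8, 9.1, 11.0, 12.9, 15.2, 16.8, 19.1]

split = 7  # construct the model from the first 7 points ...
model = fit_line(xs[:split], ys[:split])
# ... and validate it on the 3 points it has not seen.
error = mean_abs_error(model, xs[split:], ys[split:])
print(f"model: y = {model[0]:.2f} + {model[1]:.2f}x, validation MAE = {error:.2f}")
```

A small validation error on held-out data supports the model. A large one sends you back to the "theory, hypothesis, experiment, analysis" loop.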
Example Models
Similarity is a (very) simplistic model and predictor for the world. Humans use this approach in their cognitive process, and it exploits the advantage of BigData.
- Weather prediction: you may develop and rely on complex models of physics, or use a simple model for a particular day, e.g. expect the weather to be similar to that day's weather over the last X years. Humans use this as a rule of thumb, e.g. farmers.
- Preferences of humans: identify a set of people who liked the items you like, then predict that you also like the other items those people like (items you haven't rated so far).

Relevance of Big Data
Big Data Analytics is emerging, and its relevance is increasing compared to supercomputing (figure: Google Search Trends, relative searches).

2 BigData Challenges: Volume, Velocity, Variety, Veracity, Value

BigData Challenges & Characteristics (figure; Source: MarianVesper [4])

Volume: The Size of the Data
- What is Big Data: terabytes to tens of petabytes
- What is not Big Data: a few gigabytes
- Examples: Wikipedia corpus with history ca. 10 TByte; Wikimedia Commons ca. 23 TByte; Google search index ca. 46 Gigawebpages [2]; YouTube per year 76 PByte [3]
[3] sumanrs. wordpress20120414youtube-yearly-costs-for-storagenetworking-estimate

Velocity: Data Volume per Time
- What is Big Data: 30 KiB to 30 GiB per second (902 GiB/year to 902 PiB/year)
- What is not Big Data: a never-changing data set
- Examples: LHC (CERN) with all experiments about 25 GB/s [4]; Square Kilometre Array 700 TB/s (in 2018) [5]; 50k Google searches per second [6]; Facebook: 30 billion content pieces shared per month (blog. kissmetricsfacebook-statistics)
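The velocity range on this slide follows from simple arithmetic, which the Python sketch below reproduces. It assumes a 365-day year and binary prefixes (KiB, GiB, and PiB as powers of 1024). Both bounds come out at about 902 because the two rates differ by a factor of 1024², the same factor that separates GiB from PiB.

```python
# Volume accumulated in one year at a constant data rate.
# Assumptions: a 365-day year and binary prefixes (1 KiB = 1024 bytes, etc.).
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 s
KIB, GIB, PIB = 1024, 1024**3, 1024**5


def bytes_per_year(bytes_per_second: int) -> int:
    """Total bytes accumulated in one year at a constant rate."""
    return bytes_per_second * SECONDS_PER_YEAR


low = bytes_per_year(30 * KIB) / GIB   # lower bound (30 KiB/s), in GiB/year
high = bytes_per_year(30 * GIB) / PIB  # upper bound (30 GiB/s), in PiB/year
print(f"{low:.2f} GiB/year to {high:.2f} PiB/year")  # 902.25 GiB/year to 902.25 PiB/year
```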
Data Sources
- Enterprise data: serves business objectives and is well defined, e.g. customer information and transactions such as purchases
- Experimental/Observational data (EOD): created by machines from sensors/devices, e.g. trading systems, satellites, microscopes, video streams, smart meters
- Social media: created by humans, e.g. messages, posts, blogs, wikis

Variety: Types of Data
- Structured data: like tables with fixed attributes; traditionally handled by relational databases
- Unstructured data: usually generated by humans, e.g. natural language, voice, Wikipedia, Twitter posts; must be processed into (semi-)structured data to gain value
- Semi-structured data: has some structure in tags, but the structure changes from document to document, e.g. HTML, XML, and JSON files, server logs
What is Big Data: using data from multiple sources and in multiple forms, including unstructured and semi-structured data.

Veracity: Trustworthiness of Data
What is Big Data: data that involves some uncertainty and ambiguity. Mistakes can be introduced by humans and machines, e.g. people sharing accounts, liking something today and disliking it tomorrow, or wrong system timestamps.
Data quality is vital: analytics and conclusions rely on good data quality.
- Garbage data + perfect model => garbage results
- Perfect data + garbage model => garbage results
- GIGO paradigm: Garbage In, Garbage Out

Value of Data
What is Big Data: the raw data of Big Data, for example single observations, is of low value. Analytics and theory about the data increase its value: analytics transform big data into smart data.
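The veracity point, that conclusions rely on good data quality, can be made concrete with a small cleansing pass that drops incomplete, obsolete, and duplicate records before analysis. The Python sketch below rests on several assumptions:
- the records, the set of required fields, and the cutoff date for "obsolete" entries are all hypothetical;
- duplicates are detected only as exact copies, whereas real pipelines often need fuzzy record matching.

```python
from datetime import date

# Hypothetical customer records merged from two source systems.
RECORDS = [
    {"id": "C1", "city": "Bonn", "zip": "53113", "updated": date(2013, 5, 2)},
    {"id": "C1", "city": "Bonn", "zip": "53113", "updated": date(2013, 5, 2)},       # duplicate
    {"id": "C2", "city": "Köln", "zip": None, "updated": date(2013, 6, 1)},          # incomplete
    {"id": "C3", "city": "Troisdorf", "zip": "53844", "updated": date(2009, 1, 15)}, # obsolete
    {"id": "C4", "city": "Berlin", "zip": "10115", "updated": date(2013, 9, 30)},
]

REQUIRED = ("id", "city", "zip")


def cleanse(records, cutoff):
    """Drop incomplete records, records last updated before `cutoff`,
    and exact duplicates; keep the first occurrence of each record."""
    seen = set()
    clean = []
    for rec in records:
        if any(rec.get(field) in (None, "") for field in REQUIRED):
            continue  # incomplete: a required field is missing
        if rec["updated"] < cutoff:
            continue  # obsolete: not updated since the cutoff date
        key = tuple(sorted(rec.items()))  # hashable fingerprint of the record
        if key in seen:
            continue  # exact duplicate of a record already kept
        seen.add(key)
        clean.append(rec)
    return clean


clean = cleanse(RECORDS, cutoff=date(2012, 1, 1))
```

Here only C1 (once) and C4 survive. Without the pass, the duplicate would double-count a customer, and the incomplete and obsolete rows would skew any downstream analysis: garbage in, garbage out.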
Types of Data Analytics and Value of Data
1. Descriptive analytics (Beschreiben): What happened?
2. Diagnostic analytics: Why did this happen, what went wrong?
3. Predictive analytics (Vorhersagen): What will happen?
4. Prescriptive analytics (Empfehlen): What should we do, and why?
The level of insight and the value of data increase from step 1 to step 4.

The Value of Data (alternative view): figure; Source: Dursun Delen, Haluk Demirkan [9]
The Value of Data (alternative view 2): figure; Source: Forrester report, Understanding The Business Intelligence Growth Opportunity

3 Analytical Workflow: Value Chain, Roles, Privacy

Big Data Analytics Value Chain
There are many visualizations of the processing and value chain [8] (figure; Source: Andrew Stein [8]).

Big Data Analytics Value Chain (2): figure; Source: Miller and Mork [7]

Roles in the Big Data Business
- Data scientist: data science is a systematic method dedicated to knowledge discovery via data analysis [1]. In business, it optimizes organizational processes for efficiency. In science, it analyzes experimental/observational data to derive results.
- Data engineer: data engineering is the domain that develops and provides systems for managing and analyzing big data. Data engineers build modular and scalable data platforms for data scientists and deploy big data solutions.

Typical Skills
- Data scientist: statistics (mathematics); computer science; programming, e.g. Java, Python, R (SAS, ...); machine learning; some domain knowledge for the problem to solve
- Data engineer: computer science; databases; software engineering; massively parallel processing; real-time processing; languages: C, Java, Python; understanding of the performance factors and limitations of systems

Data Science vs. Business Intelligence (BI)
Characteristics of BI:
- Provides pre-created dashboards for management
- Repeated visualization of well-known analysis steps
- Deals with structured data
- Data is typically generated within the organization
- Central data storage (vs. multiple data silos)
- Handled well by specialized database techniques
Typical types of insight:
- Customer service data: which business causes the longest customer wait times
- Sales and marketing data: which marketing is most effective
- Operational data: efficiency of the help desk
- Employee performance data: who is most/least productive

Privacy
Be aware of privacy issues if you deal with personal/private information. German privacy laws are stricter than those of other countries.
Goal of data protection:
- The right to informational self-determination
- Protection of the individual against impairment of their personality rights through the handling of their personal data [8]
- Special protection for data about health, ethnic origin, and religious, trade-union, or sexual orientation
Personal data (§ 3 BDSG [8]): individual details about the personal or factual circumstances of an identified or identifiable natural person.
Important Principles of the Law [10]
- Prohibition with reservation of permission (Verbotsprinzip mit Erlaubnisvorbehalt): the collection, processing, use, and disclosure of personal data are forbidden. Use is allowed only with a legal basis or with the consent of the person.
- Companies with 10 or more persons require a data protection officer.
- Procedures for automated processing must be checked by the data protection officer and are subject to notification.
- The seat of the responsible body is decisive: if there is a branch in Germany, the BDSG applies.
- Principles: data avoidance and data economy (Datenvermeidung, Datensparsamkeit).
- Protection against access, modification, and disclosure.
- Data subjects have the right to information, deletion, or blocking.
- Anonymization/pseudonymization: if attribution to individuals is (almost) impossible, the data may be processed.

4 Use Cases: Overview

(figure; Source: [21])

Use Cases for BigData Analytics
Increase the efficiency of processes and systems:
- Advertisement: optimize for the target audience
- Product: acceptance (like/dislike) by buyers, dynamic pricing
- Decrease financial risks: fraud detection, account takeover
- Insurance policies: modeling of catastrophes
- Recommendation engine: stimulate purchase/consumption
- Systems: fault prediction and anomaly detection
- Supply chain management
Science:
- Epidemiology research: Google searches indicate the spread of flu
- Personalized healthcare: recommend good treatment
- Physics: finding the Higgs boson, analyzing telescope data
- Enabler for social sciences: analyze people's mood

Big Data in Industry (figure; Source: [20])
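The anonymization/pseudonymization principle from the privacy slides can be sketched with a keyed hash from Python's standard library (hmac with SHA-256). This is an illustration under stated assumptions. The key and the purchase records are hypothetical. Whether a given pseudonymization makes attribution to individuals "(almost) impossible" in the legal sense depends on how the key and any auxiliary data are protected, and the code alone does not settle that.

```python
import hashlib
import hmac

# Hypothetical secret key, stored separately from the analytics data set.
# Without it, the pseudonyms cannot be recomputed from known identifiers.
SECRET_KEY = b"replace-with-a-randomly-generated-key"


def pseudonymize(identifier: str) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256, hex-encoded).

    The same input always yields the same pseudonym, so records can still be
    joined and counted, but the original identifier is not stored.
    """
    return hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical purchase records keyed by e-mail address.
purchases = [("alice@example.org", 12.50), ("bob@example.org", 3.20), ("alice@example.org", 7.30)]
pseudonymized = [(pseudonymize(customer), amount) for customer, amount in purchases]
```

A plain unkeyed hash would be weaker here. Anyone holding a list of candidate e-mail addresses could hash them and re-identify customers, which the secret key prevents.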
Example Use Case: Deutschland Card [2]
Goals: a customer bonus card which tracks purchases; increase scalability and flexibility (the previous solution was based on OLAP).
Big Data characteristics:
- Volume: O(10) TB
- Variety: mostly structured data; schemas are extended steadily
- Velocity: data growth rate O(100) GB/month
Results: much better scalability of the solution; a move from dashboards to ad-hoc analysis within minutes.

Example Use Case: DM [2]
Goals: predict the required employees per day and store; prevent staff changes at short notice.
Big Data characteristics:
- Input data: opening hours, incoming goods, employee preferences, holidays, weather
- Model: NeuroBayes (Bayesian neural networks)
- Predictions: sales, employee planning predictions per week
Results: daily updated sales per store; reliable predictions for staff planning; customer and employee satisfaction.

Example Use Case: OTTO [2]
Goals: optimize inventory and prevent out-of-stock situations.
Big Data characteristics:
- Input data: product characteristics, advertisement
- Volume/Velocity: 135 GB/week, 300 million records
- Model: NeuroBayes (Bayesian neural networks); 1 billion predictions per year
Results: better prognostics of product sales (up to 40%); real-time data analytics.

Example Use Case: Smarter Cities (by KTH) [2]
Goals: improve traffic management in Stockholm; prediction of alternative routes.
Big Data characteristics:
- Input data: traffic videos/sensors, weather, GPS
- Volume/Velocity: 250k GPS data/s plus other data sources
Results: 20% less traffic; 50% reduction in travel time; 20% less emissions.

Example Facebook Studies
Insight from [11] by exploring posts:
- Young narcissists are more likely to tweet.
- Middle-aged narcissists update their status.
- US students post more problematic information than German students.
- The US government checks tweets/Facebook messages for several reasons.
- The human communication graph has an average distance of 4.74 hops between users.
Manipulation of news feeds [13]: news feeds have been changed to analyze people's behavior in subsequent posts. Paper: "Experimental evidence of massive-scale emotional contagion through social networks".

From Big Data to the Data Lake [20]
With cheap storage, people promote the concept of the data lake. It combines data from many sources and of any type, and allows future analyses to be conducted without missing any opportunity.
Attributes of the data lake:
- Collect everything: all data, both raw sources over extended periods of time and any processed data
- Decide during analysis which data is important, e.g. no schema until read
- Dive in anywhere: enable users across multiple business units to refine, explore, and enrich data on their terms
- Flexible access: enable multiple data access patterns across a shared infrastructure: batch, interactive, online, search, and others

5 Programming: Java, Python, R

Programming BigData Analytics
- High-level concepts: SQL and derivatives; domain-specific languages (Cypher, PigLatin)
- Programming languages: Java interfaces are widely available but low-level; Python and R have connectors to popular BigData solutions
In the exercises, we'll learn and use the basics of those languages/interfaces.
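The data-lake attribute "no schema until read" can be sketched in Python. Raw, semi-structured events are stored unchanged, and each analysis applies its own schema (field selection, type conversion, defaults) only when it reads the data. The events, field names, and converter functions below are hypothetical; production data lakes do this with dedicated query engines rather than hand-written loops.

```python
import json
from datetime import datetime

# Raw events land in the lake unchanged, as JSON lines whose fields vary.
RAW_EVENTS = [
    '{"ts": "2016-01-04T10:00:00", "user": "u1", "action": "login"}',
    '{"ts": "2016-01-04T10:00:05", "user": "u1", "action": "search", "query": "big data"}',
    '{"ts": "2016-01-04T10:01:00", "user": "u2", "action": "logout", "client": {"os": "linux"}}',
]


def read_with_schema(lines, schema):
    """Apply a schema only at read time: pick and convert the fields this
    analysis needs; everything else stays untouched in the raw data."""
    for line in lines:
        record = json.loads(line)
        yield {name: convert(record.get(name)) for name, convert in schema.items()}


def parse_ts(value):
    """Convert an ISO timestamp string, or return None if it is missing."""
    return datetime.fromisoformat(value) if value else None


def as_str(value):
    """Default missing text fields to an empty string."""
    return value if value is not None else ""


# Analysis 1 only cares about who did what, and when.
activity_schema = {"ts": parse_ts, "user": as_str, "action": as_str}
rows = list(read_with_schema(RAW_EVENTS, activity_schema))

# Analysis 2 reads the same raw data with a different schema.
search_schema = {"user": as_str, "query": as_str}
queries = [r["query"] for r in read_with_schema(RAW_EVENTS, search_schema) if r["query"]]
```

Because no schema was imposed when the events were written, fields such as client.os remain available to a later analysis that nobody anticipated.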
Introduction to Java
- Developed by Sun Microsystems in 1995
- Object-oriented programming language; the OpenJDK implementation is open source
- Source code -> byte code -> just-in-time compiler; byte code is portable & platform independent; the virtual machine abstracts from systems
- Strong and static type system
- Popular language for enterprise & Big Data applications; most popular programming language (pos. 1 on the TIOBE index)
- Development tools: Eclipse
Specialties: good runtime and compile-time error reporting; generic data types (vs. templates of C++); introspection via reflection

Introduction to Python
- Open source; position 5 on the TIOBE index
- Interpreted language
- Dynamic type system (type errors only surface at runtime)
- Development tools: any editor, interactive shell
- Note: use and learn python3 explicitly
- Recommended plotting library: matplotlib [9]
Specialties: strong text processing; simple to use; support for object-oriented programming; indentation is relevant for code blocks

Example Python Program

#!/usr/bin/env python3
import re  # use the module re

# function reading a file
def readfile(filename):
    # the with statement closes the file automatically
    with open(filename, "r") as f:
        return f.readlines()

# the main function
if __name__ == "__main__":
    data = readfile("intro.py")
    # iterate over the array
    for x in data:
        # extract imports from a python file using a regex
        m = re.match(r"import\s+(?P<what>\w+)", x)
        if m:
            print(m.group("what"))
            # dictionary (key value pair)
            dic = m.groupdict()
            dic.update({"file": "intro.py"})  # add a new key
            # use format string with dictionary
            print("found import %(what)s in file %(file)s" % dic)

# Prints: re
#         found import re in file intro.py
Example Python Classes

from abc import abstractmethod

class Animal():
    # constructor; methods that take self are instance methods
    def __init__(self, weight):
        self.__weight = weight  # private variables start with __

    # decorator (only enforced if the class derives from abc.ABC)
    @abstractmethod
    def name(self):
        return self.__class__.__name__  # reflection-like

    def __str__(self):
        return "I'm a %s with weight %f" % (self.name(), self.__weight)

class Rabbit(Animal):
    def __init__(self):
        # super() without arguments is available with python 3
        super().__init__(2.5)

    def name(self):
        return "Small Rabbit"  # override name

if __name__ == "__main__":
    r = Rabbit()
    print(r)  # prints: I'm a Small Rabbit with weight 2.500000

Introduction to R
- Based on the S language for statisticians
- Open source; position 19 on the TIOBE index
- Interpreter with C modules (packages); easy installation of packages via CRAN [10]
- Popular language for data analytics
- Development tools: RStudio (or any editor), interactive shell
- Recommended plotting library: ggplot2 [11]
Specialties: vector/matrix operations (note: loops are slow, so avoid them); table data structure (data frames)
[10] Comprehensive R Archive Network

Course for Learning R Programming

# Run with "Rscript intro.r" or run "R" and copy&paste into the interactive shell
# Installing a new package is as easy as:
install.packages("swirl")
# Note: sometimes packages are not available on all mirrors
library(swirl) # load the package

help(swirl) # read help about the function swirl
swirl() # start an interactive course to learn R

# a simple for loop
for (x in 1:10) { ... }
Example R Program

# create an array
x = c(1, 2, 10:12)

# apply an operator on the full vector and output it
print( x^2 )
# slice arrays
print( x[3:5] ) # prints: [1] 10 11 12
print( x[c(1,4,8)] ) # prints: [1]  1 11 NA

r = runif(100, min=0, max=100) # create array with random numbers
m = matrix(r, ncol=4, byrow=TRUE) # create a matrix
# slice matrix rows "m[row(s), column(s)]"
print( m[10:12, ] ) # rows 10 to 12, all 4 columns
# slice rows & columns
print( m[10, c(1,4)] )
# subset the table based on a mask
set = m[(m[,1] < 20 & m[,2] > 2), ]

Summary
- Big data analytics: explore data and model causalities to gain knowledge & value
- Challenges: the 5 Vs (volume, velocity, variety, veracity, value)
- Data sources: enterprise, humans, experimental/observational data (EOD)
- Types of data: structured, unstructured, and semi-structured
- Levels of analytics: descriptive, predictive, and prescriptive
- Roles in the big data business: data scientist and data engineer
- Data science vs. business intelligence

Bibliography
[1] Book: Lillian Pierson. Data Science for Dummies. John Wiley & Sons.
[2] Report: Jürgen Urbanski et al. Big Data im Praxiseinsatz - Szenarien, Beispiele, Effekte. BITKOM.
[4] Forrester Big Data Webinar. Holger Kisker, Martha Bennet. Big Data: Gold Rush Or Illusion?
[7] Gilbert Miller, Peter Mork. From Data to Decisions: A Value Chain for Big Data.
[8] Andrew Stein. The Analytics Value Chain.
[9] Dursun Delen, Haluk Demirkan. Decision Support Systems, Data, information and analytics as services. j.mp11bl9b9
[10] Wikipedia.
[11] Kashmir Hill. 46 Things We've Learned From Facebook Studies. Forbes.
[12] Hortonworks.
