
Pioneering Open Source distributed enterprise framework powers US$166B Big Data ecosystem

Wakefield, MA —23 January 2019— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, today announced Apache® Hadoop® v3.2.0, the latest version of the Open Source software framework for reliable, scalable, distributed computing.
Now in its 11th year, Apache Hadoop underpins the US$166B Big Data ecosystem (source: IDC), enabling data applications to run and be managed on large hardware clusters in a distributed computing environment. “Apache Hadoop has been at the center of this big data transformation, providing an ecosystem with tools for businesses to store and process data on a scale that was unheard of several years ago,” according to Accenture Technology Labs.
“This latest release unlocks the powerful feature set the Apache Hadoop community has been working on for more than nine months,” said Vinod Kumar Vavilapalli, Vice President of Apache Hadoop. “It further diversifies the platform by building on the cloud connector enhancements from Apache Hadoop 3.0.0 and opening it up for deep learning use-cases and long-running apps.”
Apache Hadoop 3.2.0 highlights include:
  • ABFS Filesystem connector —supports the latest Azure Data Lake Storage Gen2;
  • Enhanced S3A connector —including better resilience to throttled AWS S3 and DynamoDB IO;
  • Node Attributes Support in YARN —tags nodes with multiple labels based on their attributes and supports placing containers based on expressions over these labels;
  • Storage Policy Satisfier —allows HDFS (Hadoop Distributed File System) applications to move blocks between storage types when they set storage policies on files/directories;
  • Hadoop Submarine —enables data engineers to easily develop, train, and deploy deep learning models (in TensorFlow) on the very same Hadoop YARN cluster;
  • C++ HDFS client —provides async IO to HDFS, which benefits downstream projects such as Apache ORC;
  • Upgrades for long-running services —supports seamless in-place upgrades of long-running containers via the YARN Native Service API (application program interface) and CLI (command-line interface).
“This is one of the biggest releases in Apache Hadoop 3.x line which brings many new features and over 1,000 changes,” said Sunil Govindan, Apache Hadoop 3.2.0 release manager. “We are pleased to announce that Apache Hadoop 3.2.0 is available to take your data management requirements to the next level. Thanks to all our contributors who helped to make this release happen.”
Apache Hadoop is deployed at enterprises and institutions worldwide, such as Adobe, Alibaba, Amazon Web Services, AOL, Apple, Capital One, Cloudera, Cornell University, eBay, ESA Calvalus satellite mission, Facebook, foursquare, Google, Hortonworks, HP, Huawei, Hulu, IBM, Intel, LinkedIn, Microsoft, Netflix, The New York Times, Rackspace, Rakuten, SAP, Tencent, Teradata, Tesla Motors, Twitter, Uber, and Yahoo. The project maintains a list of educational and production users, as well as companies that offer Hadoop-related services, at https://wiki.apache.org/hadoop/PoweredBy
Global Knowledge hails, “…the open-source Apache Hadoop platform changes the economics and dynamics of large-scale data analytics due to its scalability, cost effectiveness, flexibility, and built-in fault tolerance. It makes possible the massive parallel computing that today’s data analysis requires.”
Hadoop is proven at scale: Netflix captures 500B+ daily events using Apache Hadoop. Twitter uses Apache Hadoop to handle 5B+ sessions a day in real time. Twitter’s 10,000+ node cluster processes and analyzes more than a zettabyte of raw data through 200B+ tweets per year. Facebook’s cluster of 4,000+ machines stores 300+ petabytes and grows by 4 new petabytes of data each day. Microsoft uses Apache Hadoop YARN to run the internal Cosmos data lake, which operates over hundreds of thousands of nodes and manages billions of containers per day.
Transparency Market Research recently reported that the global Hadoop market is anticipated to grow at a 29% CAGR, reaching a market valuation of US$37.7B by the end of 2023.
Apache Hadoop remains one of the most active projects at the ASF: it ranks #1 for Apache project repositories by code commits, and is the #5 repository by size (3,881,797 lines of code).
“The Apache Hadoop community continues to go from strength to strength in further driving innovation in Big Data,” added Vavilapalli. “We hope that developers, operators and users leverage our latest release in fulfilling their data management needs.”
Catch Apache Hadoop in action at the Strata conference, 25-28 March 2019 in San Francisco, and dozens of Hadoop MeetUps held around the world, including on 30 January 2019 at LinkedIn in Sunnyvale, California.
Availability and Oversight
Apache Hadoop software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project’s day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Hadoop, visit http://hadoop.apache.org/ and https://twitter.com/hadoop
About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server –the world’s most popular Web server software. Through the ASF’s meritocratic process known as “The Apache Way,” more than 730 individual Members and 7,000 Committers across six continents successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation’s official global conference series. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Anonymous, ARM, Baidu, Bloomberg, Budget Direct, Capital One, Cerner, Cloudera, Comcast, Facebook, Google, Handshake, Hortonworks, Huawei, IBM, Indeed, Inspur, LeaseWeb, Microsoft, Oath, ODPi, Pineapple Fund, Pivotal, Private Internet Access, Red Hat, Target, Tencent, and Union Investment. For more information, visit http://apache.org/ and https://twitter.com/TheASF
© The Apache Software Foundation. “Apache”, “Hadoop”, “Apache Hadoop”, and “ApacheCon” are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.
# # #