
by Jim Jagielski and Sally Khudairi

As the world’s largest and one of the most influential Open Source foundations, The Apache Software Foundation (ASF) is home to more than 350 community-led projects and initiatives. The ASF’s 731 individual Members and more than 7,000 Committers are global and diverse, and often embody a spirit of collective humility. We’ve assembled a list of 20 ubiquitous and up-and-coming Apache projects to celebrate the ASF’s 20th Anniversary on 26 March 2019, applaud our all-volunteer community, and thank the billions of users who benefit from their Herculean efforts.
1. Apache HTTP Server
The most popular Open Source HTTP server on the planet shot to fame just 13 months after its inception in 1995. It remains popular today because it provides a secure, efficient, and extensible server that delivers HTTP services in line with the latest HTTP standards. Running on modern operating systems including UNIX, Microsoft Windows, and Mac OS X, the Apache HTTP Server played a key role in the initial growth of the World Wide Web. Its rapid adoption, which outpaced all other Web servers combined, was also instrumental to the wide proliferation of eCommerce sites and solutions. The Apache HTTP Server was the ASF’s flagship project at its launch. Its open, community-driven, merit-based development process, known as “The Apache Way”, served as the model that future Apache projects emulated.
2. Apache Incubator
The Apache Incubator is the ASF’s nexus for innovation, serving as the entry path for projects and codebases wishing to officially become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects go through the incubation process. This ensures that all donations are in accordance with the ASF’s legal standards, and it develops diverse communities that adhere to the ASF’s guiding principles. Incubation is required of newly accepted projects until their infrastructure, communications, and decision-making process have stabilized in a manner consistent with other successful ASF projects. Incubation status is not necessarily a reflection of the completeness or stability of the code, though it does indicate that the project has yet to be fully endorsed by the ASF. The Incubator’s rigorous process of mentoring projects and their communities according to “The Apache Way” has led to the graduation of nearly 200 projects in its 16-year history. Today 51 “podlings” are undergoing development in the Apache Incubator across an array of categories, including annotation, artificial intelligence, Big Data, cryptography, data science/storage/visualization, development environments, Edge and IoT, email, JavaEE, libraries, machine learning, serverless computing, and more.
3. Apache Kafka
The Apache footprint as the foundation of the Big Data ecosystem continues to grow, from Accumulo to Hadoop to ZooKeeper, with fifty active projects to date and two dozen more in the Apache Incubator. Apache Kafka’s high-performance, distributed, fault-tolerant, real-time publish-subscribe messaging platform powers Big Data solutions at Airbnb, LinkedIn, MailChimp, Netflix, The New York Times, Oracle, PayPal, Pinterest, Spotify, Twitter, Uber, Wikimedia Foundation, and countless other businesses.
4. Apache Maven
Build Management. http://maven.apache.org/
Spinning out of the Apache Turbine servlet framework project in 2004, Apache Maven has risen to become a hugely popular build automation tool that helps Java developers build and release software. Stable, flexible, and feature-rich, Maven streamlines continuous builds, integration, testing, and delivery processes with an impressive central repository and a robust plug-in ecosystem. This makes it the go-to choice for developers who want to easily manage a project’s build, reporting, and documentation.
5. Apache CloudStack
Super-quick to deploy, well-documented, and easy to run in production, Apache CloudStack owes much of its appeal to the fact that it “just works”. It powers some of the industry’s most visible Clouds, from global hosting providers to telcos to the Fortune 100 top 5% and more. The CloudStack community is cohesive, agile, and focused, leveraging 11 years of Cloud success to enable users to rapidly and affordably build fully featured clouds.
6. Apache cTAKES
Developed from real-world use at the Mayo Clinic in 2006, cTAKES was created by a team of physicians, computer scientists, and software engineers seeking a natural language processing system to extract information from the clinical free text of electronic medical records. Today Apache cTAKES is an integral part of the Mayo Clinic’s electronic medical records and has processed more than 80 million clinical notes. Apache cTAKES is a growing standard for clinical data management infrastructure across hospitals and academic institutions, including Boston Children’s Hospital, Cincinnati Children’s Hospital, Massachusetts Institute of Technology, University of Colorado Boulder, University of Pittsburgh, and University of California San Diego, as well as companies such as Wired Informatics.
7. Apache Ignite
Apache Ignite is used for transactional, analytical, and streaming workloads at petabyte scale by the likes of American Airlines, ING, Yahoo Japan, and countless others, whether on premises, on cloud platforms, or in hybrid environments. Apache Ignite’s in-memory data fabric provides an in-memory data grid, compute grid, streaming, and acceleration solutions across the Apache Big Data ecosystem, including Apache Cassandra, Apache Hadoop, Apache Spark, and more.
8. Apache CouchDB
Thousands of organizations such as the BBC, GrubHub, and the Large Hadron Collider use Apache CouchDB for seamless data flow between every imaginable computing environment, from globally-distributed server clusters to mobile devices to Web browsers. Its Couch Replication Protocol allows you to store, retrieve, and replicate data safely, on premises or in the Cloud, with high performance and reliability. Apache CouchDB does all the heavy lifting so you can sit back and relax.
9. Apache Edgent (incubating)
The boom of IoT (personal assistants, smart phones, smart homes, connected cars, Industry 4.0, and beyond) is producing an ever-growing amount of data streaming from millions of systems, sensors, equipment, vehicles, and more. The demand for reliable, efficient real-time data has driven the need for the “Empowered Edge”. There, data collection and analysis are optimized by moving away from centralized sources towards the edges of the networks, where much of the data originates. Companies like IBM and SAP are leveraging Apache Edgent to accelerate analytics at the edge across the IoT ecosystem. Apache Edgent can be used in conjunction with many Apache data analytics solutions such as Apache Flink, Apache Kafka, Apache Samza, Apache Spark, Apache Storm, and more.
10. Apache OFBiz
Enterprise Resource Planning (ERP). https://ofbiz.apache.org/
Whereas most ASF projects are about running or creating infrastructure, we also recognize the importance of running and managing a business. Apache OFBiz is a comprehensive suite of business applications, from accounting and CRM through warehousing and inventory control. Its Java-based framework provides the power and flexibility to serve as the core of one’s B2B and B2C business management, and is easily expandable and customizable. Apache OFBiz is a complete ERP solution: flexible, free, and fully Open Source, serving users from United Airlines to Cabi.
11. Apache SIS (Spatial Information System)
The US National Oceanic and Atmospheric Administration, Vietnamese National Space Center, numerous spatial agencies, governments, and others rely on Apache SIS to create their own intelligent, standards-based interoperable geospatial applications. The Apache SIS toolkit handles spatial data, location awareness, geospatial data representation, and provides a unified metadata model for file formats used for real-time smart city visualization, geospatial dataset discovery, state-of-the-art location-enabled emergency management, earth observation, as well as information modeling for extra-terrestrial bodies such as Mars and asteroids.
12. Apache Syncope
Identity Management. http://syncope.apache.org/
Apache Syncope manages digital identity data in enterprise applications and environments, handling user information such as username, password, first name, last name, email address, and so on. Identity management involves considering the user attributes, roles, resources, and entitlements that control who can access what data, when, how, and why. Apache Syncope users include the Italian Army, the University of Helsinki, the University of Milan, and the SWITCH Swiss university network.
13. Apache PLC4X (incubating)
Internet of Things (IoT). http://plc4x.incubator.apache.org/
Connectivity and integration across many Industrial IoT edge gateways is often impossible with closed-source, proprietary legacy systems that use incompatible protocols. Apache PLC4X provides a universal protocol adapter for creating Industrial IoT applications. It does this through a set of libraries that allow unified access to any type of industrial programmable logic controller (PLC), across a variety of protocols, with a shared API. In addition, the project is planning modular integrations with Apache IoT projects including Apache Brooklyn, Apache Camel, Apache Edgent, Apache Kafka, Apache Mynewt, and Apache NiFi.
14. Apache Commons
With 42%+ of Apache projects written in Java (that’s 62+ million lines of code), having a set of stable, reusable Open Source Java software components available to all Apache projects and external users is both helpful and necessary. Apache Commons provides a suite of dozens of stable, reusable, easily deployed Java components, and a workspace for Commons contributors to collaborate on the development of new components.
15. Apache Spark
Machine Learning. http://spark.apache.org/
Big Data is growing exponentially each year, accelerated by industries such as agriculture, big business, FinTech, healthcare, IoT, manufacturing, mobile advertising, and more. Apache Spark’s unified analytics engine for large-scale data processing helps data scientists apply machine learning insights and an array of libraries to improve responsiveness and deliver more accurate results. Apache Spark can run workloads up to 100x faster on Apache Hadoop, Apache Mesos, or Kubernetes, whether standalone or in the cloud. It can also access diverse data sources, including Apache Cassandra, Apache Hadoop HDFS, Apache HBase, Apache Hive, and hundreds of others.
16. Apache Cordova
Apache Cordova is a popular developer tool used to easily build cross-platform, cross-device mobile apps with a Write-Once-Run-Anywhere approach, enabling developers to create a single app that appears the same across multiple mobile device platforms. Apache Cordova acts as an extensible container and serves as the base upon which most mobile application development tools and frameworks are built. These include mobile development platforms and commercial software products by Blackberry, Google, IBM, Intel, Microsoft, Oracle, Salesforce, and many others.
17. Apache Tomcat
Starting off as the Apache JServ project, designed to allow Java “servlets” to run in a Web environment, Tomcat grew into a full-fledged, comprehensive Java Application server and was the de facto reference implementation for the Java specifications. Since 2005, Apache Tomcat has formed the foundation of numerous Java-based Web infrastructures such as eBay, E*Trade, WalMart, and The Weather Channel, and it continues to do so today.
18. Apache Lucene/Solr
Adobe, AOL, Apple, AT&T, Bank of America, Bloomberg, Cisco, Disney, eTrade, Ford, The Guardian, Homeland Security, Instagram, MTV Networks, NASA Planetary Data System, Netflix, SourceForge, Verizon, Walmart, whitehouse.gov, Zappos, and countless others turn to Apache Lucene/Solr to quickly and reliably index and search multiple sites and enterprise data such as documents and email. Popular features include near real-time indexing, automated failover and recovery, rich document parsing and indexing, user-extensible caching, a design built for high-volume traffic, and much more.
19. Apache Wicket
The Apache Wicket component-based Web application framework is prized by many followers for its “Plain Old Java Object” (POJO) data model and for a separation of markup and logic not found in most frameworks. Developers have been using Apache Wicket since 2004 to quickly create powerful, reusable components using an object-oriented methodology with Java and HTML. Wicket powers thousands of applications and sites for governments, stores, universities, cities, banks, email providers, and more, including Apress, DHL, SAP, Vodafone, and Xbox.com.
20. Apache Daffodil (incubating)
Governments handle massive amounts of complex and legacy data across security boundaries every day. Before such data can be consumed, it must be inspected for correctness and sanitized of malicious content. Traditional inspection methods are often proprietary, incomplete, and poorly maintained. Apache Daffodil streamlines the process with an Open Source implementation of the Data Format Description Language (DFDL) specification, which fully describes a wide array of complex and legacy file formats down to the bit level. Daffodil can parse data to XML or JSON to allow for validation, sanitization, and transformation. It can also serialize, or “unparse”, the data back to its original file format, effectively mitigating a large variety of common vulnerabilities.

The Apache Software Foundation is a leader in community-driven open source software and continues to innovate with dozens of new projects and their communities. Apache projects are managing exabytes of data, executing teraflops of operations, and storing billions of objects in virtually every industry. Apache software is an integral part of nearly every end user computing device, from laptops to tablets to phones. The commercially-friendly and permissive Apache License v2.0 has become an open source industry standard. As the demand for quality open source software continues to grow, the collective Apache community will continue to rise to the challenge of solving current problems and ideate tomorrow’s opportunities through The Apache Way of open development. Learn more at http://apache.org/

# # # 

Powerful Open Source Customer Data Platform in use at Al-Monitor, Altola, Jahia, and Yupiik, among others. 
Wakefield, MA —21 March 2019— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today Apache® Unomi™ as a Top-Level Project (TLP).
Apache Unomi is a standards-based Customer Data Platform (CDP) that manages online customer, lead, and visitor information. It uses this information to provide personalized experiences that adhere to visitor privacy rules such as GDPR and “Do Not Track” preferences. The project was originally developed at Jahia, and was submitted to the Apache Incubator in October 2015.

“I am truly thankful to our community, especially our mentors, who have helped us achieve this milestone,” said Serge Huber, Vice President of Apache Unomi. “The original vision behind Unomi was to ensure true privacy by making the technologies handling customer data completely Open Source and independent. Since it was submitted to the Apache Incubator, developing Unomi using the Apache Way will ensure the project grows its community to be more diverse and welcome new users and developers.”

Apache Unomi is versatile, and features privacy management, user/event/goal tracking, reporting, visitor profile management, segmentation, personas, A/B testing, and more. It can be used as:

  • a personalization service for a Web CMS;
  • an analytics service for native mobile applications;
  • a centralized profile management system with segmentation capabilities; and
  • a consent management hub.

Apache Unomi is the industry’s first reference implementation of the upcoming OASIS CDP specification (established by the OASIS CXS Technical Committee, which sets standards as a core technology for enabling the delivery of personalized user experiences). As a reference implementation, Apache Unomi serves as a real-world example demonstrating that the standard will be stable. It is quickly gaining traction among those interested in truly open and transparent customer data privacy. Apache Unomi is in use at organizations such as Al-Monitor, Altola, Jahia, Yupiik, and many others to create and deliver consistent personalized experiences across channels, markets, and systems.

“When Serge and I announced the launch of the Apache Unomi project at the 2015 ApacheCon Budapest, Apache Unomi, at that time, was the first proposal among the rising Customer Data Platform industry’s segment, positioned as an ‘ethical data-driven marketing’ product that would respect the privacy of customers while leveraging the power of unified customers data,” said Elie Auvray, Head of Business Development at Jahia. “Jahia’s digital experience management solutions are based on Apache Unomi, and we can’t wait to see how the project will now evolve with its growing community. Seeing today Apache Unomi becoming a Top-Level Project is a great reward for us as Open Source software believers. We are proud of this milestone, grateful to the Apache Software Foundation and our mentors, and we know it’s only the beginning of a new –hopefully long and successful– journey.”

“Under development at OASIS, the Customer Data Platform specification –for which Apache Unomi aims to be the reference implementation– lies at the crossroads of many solutions providers needs such as WCM, CRM, Big Data Platforms, Machine Learning, IoT and Digital Marketing,” said Laurent Liscia, CEO of OASIS. “At a time when client data interoperability and built-in data privacy are mandatory foundations for legal, consistent, and personalized experiences across channel markets and systems, the CDP specification, together with Apache Unomi, is a clear and welcome answer to end-user concerns.”

“Apache Unomi is the perfect solution to implement a user profile platform,” said Jean-Baptiste Onofré, Fellow at Talend. “It fully addresses the user trust and privacy needs, allowing to easily create user profile and Web marketing features. As Unomi is powered by Apache Karaf, it’s also a great platform for several use cases, such as digital marketing in Web applications, managing user profiles on IoT devices, and more.”

“Apache Unomi enables Al-Monitor readers to be driven towards additional personalized content that corresponds, via content tags profiling and related automated segmentations, to what they have already accessed,” said Valerie Voci, Head of Digital Strategy and Marketing at Al-Monitor. “This data follows our customers where they go, so it’s a consistent experience whether they are getting these recommendations in their inbox or on the Website or both. And if a change takes place on one, that change is immediately reflected on the other. It helps us create a very cohesive marketing message and a great overall digital experience.”

“As we were developing a progressive web app (PWA) for a client, we were looking for a Customer Data Platform (CDP) to store customer insights, such as behavioral and explicit customer data,” said Lars Petersen, Co-Founder at Altola. “Privacy was table stakes for us, along with the flexibility to customize data schema and open API. We selected Apache Unomi based on these parameters, we had it up and running on AWS in less than 30 min. and are very impressed with the maturity of the platform, its privacy by design and how easy it was to work with.”

“In a digital world, customer data is very important to offer a better experience to users. However, data privacy and trust is not an option for users,” said François Papon, CTO at Yupiik. “Apache Unomi is the best solution for our clients because it’s an Open Source project managed by an independent foundation, there is no vendor lock-in. It’s also based on other solutions like Apache Karaf that made it ready for modularity, scalability, cloud, devops, and more.” 
“Apache Unomi is poised to disrupt the Customer Data Platform market,” said Thomas Sigdestad, CTO at Enonic, and co-chair, with Serge Huber, of the CDP standards work at OASIS Open. “The CDP marketplace is lacking a standard way of exchanging data, and the vendor space is over-represented by closed source and proprietary cloud offerings. This effectively limits the potential and adoption of CDP in general. Apache Unomi is not merely Open Source, but also the reference implementation of the imminent CDP standard from OASIS. Companies using Unomi will benefit from faster and simpler integrations without locking their customer data into yet another proprietary silo.” 
“Graduating as an Apache Top-Level Project is only the beginning,” added Huber. “Unomi has a lot of potential that is still to be developed, and is a perfect opportunity for those interested in Customer Data Privacy to participate through our mailing lists and Slack channel, and to learn more about the project on our Website and presentations.”
Catch Apache Unomi in action at ApacheCon North America (9-12 September 2019 in Las Vegas, Nevada) and ApacheCon Europe (22-24 October 2019 in Berlin, Germany). For details, visit http://apachecon.com/
Availability and Oversight
Apache Unomi software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project’s day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Unomi, visit http://unomi.apache.org/
About the Apache Incubator
The Apache Incubator is the entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects seeking to join the ASF enter through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/
About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server –the world’s most popular Web server software. Through the ASF’s meritocratic process known as “The Apache Way,” more than 730 individual Members and 7,000 Committers across six continents successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation’s official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Anonymous, ARM, Baidu, Bloomberg, Budget Direct, Capital One, Cerner, Cloudera, Comcast, Facebook, Google, Handshake, Hortonworks, Huawei, IBM, Indeed, Inspur, Leaseweb, Microsoft, ODPi, Pineapple Fund, Pivotal, Private Internet Access, Red Hat, Target, Tencent, Union Investment, Workday, and Verizon Media. For more information, visit http://apache.org/ and https://twitter.com/TheASF
© The Apache Software Foundation. “Apache”, “Unomi”, “Apache Unomi”, and “ApacheCon” are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.
As Open Source software continues to grow in importance, it seems appropriate to reflect upon the ongoing success of The Apache Software Foundation (ASF) as it approaches its 20th anniversary. The Apache Way of community-driven development continues to gain momentum despite the compounding challenges of building software in the greater Open Source ecosystem.
This approach, The Apache Way, was defined over 24 years ago by the original Apache Group, prior to the establishment of the Foundation. It has led to our success as a foundation and we believe it has been fundamental to the triumph of Open Source as a whole.
While The Apache Way has been refined over the years, it remains true to the original goals of transparent, community-driven collaboration in a vendor-neutral environment that is accessible to all.

The Apache Way defines Open Source in terms of both a legal and a social framework for collaboration. It helps others understand what makes Open Source powerful and how participants are expected to behave. In this post we will examine The Apache Way in the context of the Foundation’s mission:

“The mission of the Apache Software Foundation (ASF) is to provide software for the public good. We do this by providing services and support for many like-minded software project communities consisting of individuals who choose to participate in ASF activities.” 

Let’s dissect this mission statement. 

“Provide Software for the Public Good”

Key points in this section: 

  • We produce software that is non-excludable and non-rivalrous
  • Use of the software in any context does not reduce its availability to others
  • Users and contributors have no committed responsibility to the foundation, our projects or our communities
  • Use of a license that conforms to the Open Source Definition is necessary but not sufficient to deliver on our mission 

Investopedia defines a public good as “a product that one individual can consume without reducing its availability to another individual, and from which no one is excluded.” On the surface, this is a good definition for our use of the term. However, there is a nuance in our use. Our mission is not to produce “public goods” but to “provide software for the public good”. 

To understand why this is important, one needs to think about what motivates the ASF to produce software that is a public good.

Open Source software can be digitally copied and reused in an unlimited number of ways. Every user can modify it for their specific needs. They can combine it with other software. They can design innovative new products and services using it and can make a living from the proceeds. This is all possible without impacting other people’s use of the software. As such, the ASF produces software that can be used for the public good in many different ways.

To allow us to deliver on this part of the mission, it is critical that we adopt a license that uses the law to protect the software curated here at the Foundation. For us that license is the Apache License, Version 2.0. In addition, we adopt an inbound licensing policy that defines which licenses are allowable for third-party software reused within Apache projects. This policy can be summarized as: 

  • The license must meet the Open Source Definition (OSD).
  • The license, as applied in practice, must not impose significant restrictions beyond those imposed by the Apache License 2.0.

This means that you can be assured that software curated by projects within The Apache Software Foundation is both a public good and for the public good. You can use Apache software for any purpose and you have no responsibility to the Foundation or the project to contribute back (though as addressed in the next section, it is often in your interests to do so). 

It is important to recognize that there are software projects out there that adopt our license but do not adopt our inbound licensing policy. Such projects may bring restrictions that are not covered by our license; therefore, it is important to carefully examine the licensing policies of these projects. Using the Apache License alone may not provide you with the same options a Foundation project provides. 

Apache projects are successful, in large part, because of our diligence with respect to clearly-defined licensing policies. Such diligence makes it much easier for downstream users to understand what they can and cannot do with Apache software. The Apache License is deliberately permissive to ensure that everyone has an opportunity to participate in Open Source within the ASF or elsewhere. Modifications of our license are allowed, but modified licenses are neither the Apache License nor affiliated with or endorsed by The Apache Software Foundation. No modified license may be represented as the Apache License. Modified licenses that use the Apache name are strictly disallowed, as they confuse users and undermine the Apache brand.

While we recognize that there are many ways to license software, whether Open Source or otherwise, we assert that only projects that use both our license (unmodified) and our inbound licensing policy truly follow and adhere to The Apache Way. 

While an OSD-approved license and associated policies are necessary for successful Open Source production, they are not sufficient. They provide a legal framework for the production of Open Source, but they do not provide a social framework, which brings us to the second sentence of our mission:

“The mission of the Apache Software Foundation is to provide software for the public good. We do this by providing services and support for many like-minded software project communities of individuals who choose to contribute to Apache projects.”

“Like-Minded Software Project Communities of Individuals”

Key points in this section: 

  • The Apache Way provides a governance model designed to create a social framework for collaboration
  • The Apache Software Foundation develops communities, and those communities develop software
  • ASF project communities develop and reuse software components that in turn may be reused in products
  • Users of ASF software often build products and services using our software components
  • Our model, and others like it, have produced some of the largest and longest-lived Open Source projects that have literally revolutionized the industry 

There is a lot packed into these few words. Understanding them is what makes the difference between software that is merely under an Open Source license and software that reaches sustainability through The Apache Way. These words underscore the fact that the Foundation does not directly produce software. That’s right: The Apache Software Foundation, with upwards of $8Bn worth of software code, does not directly produce software. Rather than focusing on software, we focus on creating and supporting collaborative communities; the software is an intentional by-product. 

Our like-minded project communities come together because they share common problems that can be addressed in software. As the saying goes, “a problem shared is a problem halved”. By bringing together individuals with their unique ideas and skills, we break down barriers to collaboration. 

The Apache Way is carefully crafted to create a social structure for collaboration, which complements the legal framework discussed above. Where the legal framework ensures an equal right to use the software, The Apache Way ensures an equal ability to contribute to the software. This is critically important to the long-term sustainability of Open Source software projects. This social structure for collaboration is missing from many non-Apache projects, yet a robust social structure is invariably a key component of successful long-term projects outside the ASF.

The Apache Way is fully inclusive, open, transparent and consensus-based. It promotes vendor neutrality to prevent undue influence (or control) from a single company. It ensures that any individual with a valuable contribution is empowered, and it seeks to assure that a project remains sustainable despite inevitable changes in community membership over time.

Apache projects typically produce software components that can be combined with other software (of any license) in different ways to solve different problems. This provides plenty of opportunity for participants to collaborate within a given software project independent of their relationship outside the Foundation. This is very different from the idea of licensing your product as a whole under an Open Source license. Our model offers more opportunities for reuse which, in turn, increase the pool of individuals likely to contribute to the project.

In addition, our merit-based system seeks to ensure that as people come and go, for whatever reason, there is always someone to take their place. As a result, some ubiquitous Apache projects have existed for over 20 years and helped commercialize the World Wide Web, while dozens of newer projects have defined industry segments such as Big Data and IoT (Internet of Things). 

A core tenet of The Apache Way is “Community Over Code”, which encapsulates our deep belief that a healthy community is a far higher priority than good code. A strong community can always rectify a problem with the code, whereas an unhealthy community will likely struggle to maintain a codebase in a sustainable manner. Healthy communities ensure the Foundation has the stability to thrive for the next 20 years and beyond. Apache projects do not have the problem of scaling that others, who focus only on the legal frameworks of Open Source, suffer from. If you look around at projects that have grown up alongside the Apache projects, you will see a similar focus on scaling the governance model. This is no accident. 

Why this is Important

Software is a critical part of any modern economy. It touches every part of every life in the developed world, and is increasingly transforming everyday life, from womb to grave, everywhere.

At The Apache Software Foundation, we believe that every developer has their personal motivations for building software. We celebrate their right to choose when and how they build their software, including their right to use a non-open license. 

We will not dictate what is best for developers or for the software industry.

We care about the provision of software that enables our users, our contributors, and the general public to decide what is best for them.

We welcome you to use our software and contribute to our projects — or not. It’s up to you. 

We ask that you leave commercial interests at the door.

Countless organizations are proving that their team members who collaborate in a vendor-neutral environment often apply Open Innovation processes (such as The Apache Way) to their work. This helps create internal efficiencies and lays the groundwork for new external opportunities that may provide additional benefits.

Bringing only your intention of contributing what best serves the greater Apache community reinforces trust in the people and projects behind the Apache brand, and helps us realize our mission of providing software for the public good. 

We learn together and work together to deliver the best software we can. 

Apache software is available for all.

The freedom to choose is what makes the Foundation and Apache projects so strong.

Summary

The software industry has changed and continues to change. The ways software is delivered to end users have changed. Some of the leaders in our industry have retired and new leaders have emerged. But some things have not changed. Our model of collaborative software development, through a combination of a licensing and social framework, remains one of the most successful models of software production.

Increasing the number of users, even those who do not contribute code, should be seen as a benefit, not a problem, in Open Source. More users present an opportunity. At Apache, more users means more success since they are our future contributors.

As a US 501(c)(3) public charitable organization, The Apache Software Foundation helps individuals and organizations understand how Open Source at scale works in a highly competitive market. For more than two decades our focus has not been on producing software, but rather mentoring communities who produce software. The Apache Way advances sustainable Open Source communities: everything we do is Open Source so all kinds of users can benefit from our experience. Apache is for everyone.

# # #

Open Source Big Data in-memory columnar layer adopted by dozens of Open Source and commercial technologies; exceeded 1,000,000 monthly downloads within first three years as an Apache Top-Level Project
Wakefield, MA —19 February 2019— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, today announced momentum with Apache® Arrow™, the Open Source Big Data in-memory columnar layer.
Since the founding of the project in January 2016, Apache Arrow has quickly become the de facto standard for representing and processing analytical data in memory, accelerating analytical processing and interchange by more than 100x.
“When we became a Top-Level Project, we projected that the majority of the world’s data will be processed through Arrow within the next decade,” said Jacques Nadeau, Vice President of Apache Arrow. “In just three years’ time, we are proud to see Arrow’s substantial industry adoption and increased value across a wide range of analytical, machine learning, and artificial intelligence workloads.”
Highlights of Apache Arrow’s success include:
Industry Adoption —more than 20 major technologies adopted Arrow to accelerate in-memory analytics, including Apache Spark, NVIDIA RAPIDS, pandas, and Dremio, among others. A list of known Open Source and commercial implementations can be found at https://arrow.apache.org/powered_by/
Millions of Downloads —leveraging and integrating Apache Arrow into many other technologies has bolstered downloads to more than 1,000,000 each month.
New Language Support —as a cross-language development platform, supporting multiple programming languages is paramount. Apache Arrow has grown from supporting one language to eleven different languages today; they include C++, Java, Python, R, C#, JavaScript, and Ruby, among others.
Seamless Data Format Support —Arrow supports different data types, both simple and nested, located in arbitrary memory such as regular system RAM, memory-mapped files or on-GPU memory. In addition, it can ingest data from popular storage formats such as Apache Parquet, CSV files, Apache ORC, JSON, and more.
Major Code Donations —Apache Arrow’s new features and expanded functionality are due in part to code and component donations that include:
  • C# Library
  • Gandiva LLVM-based Expression Compiler
  • Go Library
  • JavaScript Library
  • Plasma Shared Memory Object Store
  • Ruby Libraries (Apache Arrow and Apache Parquet)
  • Rust Libraries (Parquet and DataFusion Query Engine)
Community and Contributor Growth —over the past 12 months, nearly 300 individuals have submitted more than 3,000 contributions that have grown the Apache Arrow code base by 300,000 lines of code. The Arrow community is welcoming approximately 10 new contributors each month.
In January the project announced its most recent release, Apache Arrow 0.12.0, which reflects more than 600 enhancements developed during Q4 2018. The Apache Arrow community is actively working on a number of impactful new initiatives that include solving high performance analytical problems and allowing for more efficient data distribution across entire clusters.
“Apache Arrow’s rapid industry adoption and developer community growth supports our original thesis of the importance of a language-independent open standard for columnar data,” said Wes McKinney, member of the Apache Arrow Project Management Committee, and creator of Python’s pandas project. “Additionally, we are seeing productive collaborations take place not only between programming languages but also between the database systems and data science worlds. We look forward to welcoming more data system developers into our community.”
About Apache Arrow
Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication. Languages currently supported include C, C++, C#, Go, Java, JavaScript, MATLAB, Python, R, Ruby, and Rust.
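To make the idea of a columnar memory format more concrete, here is a minimal conceptual sketch in plain Python. It is not the Arrow library or its API: the helper names to_validity_bitmap and string_column are hypothetical, and the sketch omits Arrow's buffer alignment and padding rules. It only illustrates the general approach Arrow's format is known for, in which row-oriented records become one buffer per column, nulls are tracked in a validity bitmap (one bit per value, least-significant bit first), and variable-length strings are stored as an offsets array plus one contiguous data buffer.

```python
from typing import Any, List, Optional, Sequence, Tuple


def to_validity_bitmap(values: Sequence[Optional[Any]]) -> bytearray:
    """Pack one bit per value (1 = valid, 0 = null), least-significant bit first."""
    bitmap = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v is not None:
            bitmap[i // 8] |= 1 << (i % 8)
    return bitmap


def string_column(values: Sequence[Optional[str]]) -> Tuple[bytearray, List[int], bytes]:
    """Lay out a nullable string column as validity bitmap + offsets + contiguous data.

    Value i occupies data[offsets[i]:offsets[i + 1]]; a null has a zero-length slot.
    """
    offsets = [0]
    data = bytearray()
    for v in values:
        if v is not None:
            data.extend(v.encode("utf-8"))
        offsets.append(len(data))
    return to_validity_bitmap(values), offsets, bytes(data)


# Hypothetical row-oriented input records.
rows = [
    {"name": "alice", "score": 10},
    {"name": None, "score": 7},
    {"name": "bob", "score": None},
]

# Variable-length column: bitmap, offsets, and one contiguous byte buffer.
names = [r["name"] for r in rows]
validity, offsets, data = string_column(names)

# Fixed-width column: bitmap plus a dense value array. Null slots are filled
# with 0 here for simplicity; the bitmap, not the slot value, marks them as null.
scores = [r["score"] for r in rows]
score_validity = to_validity_bitmap(scores)
score_values = [s if s is not None else 0 for s in scores]
```

Because each column's values sit next to each other in memory, an analytic operation such as summing score_values can scan a single dense buffer instead of hopping between records. This locality is the property the format is designed around.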
Availability and Oversight
Apache Arrow software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project’s day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Arrow, visit http://arrow.apache.org/
About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server –the world’s most popular Web server software. Through the ASF’s meritocratic process known as “The Apache Way,” more than 730 individual Members and 7,000 Committers across six continents successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation’s official global conference series. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Anonymous, ARM, Baidu, Bloomberg, Budget Direct, Capital One, Cerner, Cloudera, Comcast, Facebook, Google, Handshake, Hortonworks, Huawei, IBM, Indeed, Inspur, LeaseWeb, Microsoft, Oath, ODPi, Pineapple Fund, Pivotal, Private Internet Access, Red Hat, Target, Tencent, Union Investment, and Workday. For more information, visit http://apache.org/ and https://twitter.com/TheASF
© The Apache Software Foundation. “Apache”, “Arrow”, “Apache Arrow”, and “ApacheCon” are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.
# # #
Pioneering Open Source distributed enterprise framework powers US$166B Big Data ecosystem

Wakefield, MA —23 January 2019— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, today announced Apache® Hadoop® v3.2.0, the latest version of the Open Source software framework for reliable, scalable, distributed computing.
Now in its 11th year, Apache Hadoop is the foundation of the US$166B Big Data ecosystem (source: IDC) by enabling data applications to run and be managed on large hardware clusters in a distributed computing environment. “Apache Hadoop has been at the center of this big data transformation, providing an ecosystem with tools for businesses to store and process data on a scale that was unheard of several years ago,” according to Accenture Technology Labs.
“This latest release unlocks the powerful feature set the Apache Hadoop community has been working on for more than nine months,” said Vinod Kumar Vavilapalli, Vice President of Apache Hadoop. “It further diversifies the platform by building on the cloud connector enhancements from Apache Hadoop 3.0.0 and opening it up for deep learning use-cases and long-running apps.”
Apache Hadoop 3.2.0 highlights include:
  • ABFS Filesystem connector —supports the latest Azure Data Lake Storage Gen2;
  • Enhanced S3A connector —including better resilience to throttled AWS S3 and DynamoDB IO;
  • Node Attributes Support in YARN —allows nodes to be tagged with multiple labels based on their attributes, and supports placing containers based on expressions over these labels;
  • Storage Policy Satisfier —enables HDFS (Hadoop Distributed File System) applications to move blocks between storage types as they set storage policies on files/directories;
  • Hadoop Submarine —enables data engineers to easily develop, train and deploy deep learning models (in TensorFlow) on the very same Hadoop YARN cluster;
  • C++ HDFS client —enables asynchronous IO to HDFS, which benefits downstream projects such as Apache ORC;
  • Upgrades for long running services —supports in-place seamless upgrades of long running containers via YARN Native Service API (application program interface) and CLI (command-line interface).
“This is one of the biggest releases in Apache Hadoop 3.x line which brings many new features and over 1,000 changes,” said Sunil Govindan, Apache Hadoop 3.2.0 release manager. “We are pleased to announce that Apache Hadoop 3.2.0 is available to take your data management requirements to the next level. Thanks to all our contributors who helped to make this release happen.”
Apache Hadoop is widely deployed at numerous enterprises and institutions worldwide, such as Adobe, Alibaba, Amazon Web Services, AOL, Apple, Capital One, Cloudera, Cornell University, eBay, ESA Calvalus satellite mission, Facebook, foursquare, Google, Hortonworks, HP, Huawei, Hulu, IBM, Intel, LinkedIn, Microsoft, Netflix, The New York Times, Rackspace, Rakuten, SAP, Tencent, Teradata, Tesla Motors, Twitter, Uber, and Yahoo. The project maintains a list of educational and production users, as well as companies that offer Hadoop-related services at https://wiki.apache.org/hadoop/PoweredBy
Global Knowledge hails, “…the open-source Apache Hadoop platform changes the economics and dynamics of large-scale data analytics due to its scalability, cost effectiveness, flexibility, and built-in fault tolerance. It makes possible the massive parallel computing that today’s data analysis requires.”
Hadoop is proven at scale: Netflix captures 500+B daily events using Apache Hadoop. Twitter uses Apache Hadoop to handle 5B+ sessions a day in real time. Twitter’s 10,000+ node cluster processes and analyzes more than a zettabyte of raw data through 200B+ tweets per year. Facebook’s cluster of 4,000+ machines that store 300+ petabytes is augmented by 4 new petabytes of data generated each day. Microsoft uses Apache Hadoop YARN to run the internal Cosmos data lake, which operates over hundreds of thousands of nodes and manages billions of containers per day.
Transparency Market Research recently reported that the global Hadoop market is anticipated to rise at a staggering 29% CAGR with a market valuation of US$37.7B by the end of 2023.
Apache Hadoop remains one of the most active projects at the ASF: it ranks #1 for Apache project repositories by code commits, and is the #5 repository by size (3,881,797 lines of code).
“The Apache Hadoop community continues to go from strength to strength in further driving innovation in Big Data,” added Vavilapalli. “We hope that developers, operators and users leverage our latest release in fulfilling their data management needs.”
Catch Apache Hadoop in action at the Strata conference, 25-28 March 2019 in San Francisco, and dozens of Hadoop MeetUps held around the world, including on 30 January 2019 at LinkedIn in Sunnyvale, California.
Availability and Oversight
Apache Hadoop software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project’s day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Hadoop, visit http://hadoop.apache.org/ and https://twitter.com/hadoop
About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server –the world’s most popular Web server software. Through the ASF’s meritocratic process known as “The Apache Way,” more than 730 individual Members and 7,000 Committers across six continents successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation’s official global conference series. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Anonymous, ARM, Baidu, Bloomberg, Budget Direct, Capital One, Cerner, Cloudera, Comcast, Facebook, Google, Handshake, Hortonworks, Huawei, IBM, Indeed, Inspur, LeaseWeb, Microsoft, Oath, ODPi, Pineapple Fund, Pivotal, Private Internet Access, Red Hat, Target, Tencent, and Union Investment. For more information, visit http://apache.org/ and https://twitter.com/TheASF
© The Apache Software Foundation. “Apache”, “Hadoop”, “Apache Hadoop”, and “ApacheCon” are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.
# # #

Open Source Big Data workflow management system in use at Adobe, Airbnb, Etsy, Google, ING, Lyft, PayPal, Reddit, Square, Twitter, and United Airlines, among others.

Wakefield, MA —8 January 2019— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today Apache® Airflow™ as a Top-Level Project (TLP).

Apache Airflow is a flexible, scalable workflow automation and scheduling system for authoring and managing Big Data processing pipelines of hundreds of petabytes. Graduation from the Apache Incubator as a Top-Level Project signifies that the Apache Airflow community and products have been well-governed under the ASF’s meritocratic process and principles.

“Since its inception, Apache Airflow has quickly become the de-facto standard for workflow orchestration,” said Bolke de Bruin, Vice President of Apache Airflow. “Airflow has gained adoption among developers and data scientists alike thanks to its focus on configuration-as-code. That has gained us a community during incubation at the ASF that not only uses Apache Airflow but also contributes back. This reflects Airflow’s ease of use, scalability, and power of our diverse community; that it is embraced by enterprises and start-ups alike, allows us to now graduate to a Top-Level Project.”

Apache Airflow is used to easily orchestrate complex computational workflows. Through smart scheduling, database and dependency management, error handling and logging, Airflow automates resource management, from single servers to large-scale clusters. Written in Python, the project is highly extensible and able to run tasks written in other languages, allowing integration with commonly used architectures and projects such as AWS S3, Docker, Apache Hadoop HDFS, Apache Hive, Kubernetes, MySQL, Postgres, Apache Zeppelin, and more. Airflow originated at Airbnb in 2014 and was submitted to the Apache Incubator in March 2016.

Apache Airflow is in use at more than 200 organizations, including Adobe, Airbnb, Astronomer, Etsy, Google, ING, Lyft, NYC City Planning, PayPal, Polidea, Qubole, Quizlet, Reddit, Reply, Solita, Square, Twitter, and United Airlines, among others. A list of known users can be found at https://github.com/apache/incubator-airflow#who-uses-apache-airflow

“Adobe Experience Platform is built on cloud infrastructure leveraging open source technologies such as Apache Spark, Kafka, Hadoop, Storm, and more,” said Hitesh Shah, Principal Architect of Adobe Experience Platform. “Apache Airflow is a great new addition to the ecosystem of orchestration engines for Big Data processing pipelines. We have been leveraging Airflow for various use cases in Adobe Experience Cloud and will soon be looking to share the results of our experiments of running Airflow on Kubernetes.” 

“Our clients just love Apache Airflow. Airflow has been a part of all our Data pipelines created in past 2 years acting as the ring-master and taming our Machine Learning and ETL Pipelines,” said Kaxil Naik, Data Engineer at Data Reply. “It has helped us create a Single View for our client’s entire data ecosystem. Airflow’s Data-aware scheduling and error-handling helped automate entire report generation process reliably without any human-intervention. It easily integrates with Google Cloud (and other major cloud providers) as well and allows non-technical personnel to use it without a steep learning curve because of Airflow’s configuration-as-a-code paradigm.”

“With over 250 PB of data under management, PayPal relies on workflow schedulers such as Apache Airflow to manage its data movement needs reliably,” said Sid Anand, Chief Data Engineer at PayPal. “Additionally, Airflow is used for a range of system orchestration needs across many of our distributed systems: needs include self-healing, autoscaling, and reliable [re-]provisioning.”

“Since our offering of Apache Airflow as a service in Sept 2016, a lot of big and small enterprises have successfully shifted all of their workflow needs to Airflow,” said Sumit Maheshwari, Engineering Manager at Qubole. “At Qubole, not only are we a provider, but also a big consumer of Airflow as well. For example, our whole Insight and Recommendations platform is built around Airflow only, where we process billions of events every month from hundreds of enterprises and generate insights for them on big data solutions like Apache Hadoop, Apache Spark, and Presto. We are very impressed by the simplicity of Airflow and ease at which it can be integrated with other solutions like clouds, monitoring systems or various data sources.”

“At ING, we use Apache Airflow to orchestrate our core processes, transforming billions of records from across the globe each day,” said Rob Keevil, Data Analytics Platform Lead at ING WB Advanced Analytics. “Its feature set, Open Source heritage and extensibility make it well suited to coordinate the wide variety of batch processes we operate, including ETL workflows, model training, integration scripting, data integrity testing, and alerting. We have played an active role in Airflow development from the onset, having submitted hundreds of pull requests to ensure that the community benefits from the Airflow improvements created at ING.  We are delighted to see Airflow graduate from the Apache Incubator, and look forward to see where this exciting project will be taken in future!”

“We saw immediately the value of Apache Airflow as an orchestrator when we started contributing and using it,” said Jarek Potiuk, Principal Software Engineer at Polidea. “Being able to develop and maintain the whole workflow by engineers is usually a challenge when you have a huge configuration to maintain. Airflow allows your DevOps to have a lot of fun and still use the standard coding tools to evolve your infrastructure. This is ‘infrastructure as a code’ at its best.”

“Workflow orchestration is essential to the (big) data era that we live in,” added de Bruin. “The field is evolving quite fast and the new data thinking is just starting to make an impact. Apache Airflow is a child of the data era and therefore very well positioned, and is also young so a lot of development can still happen. Airflow can use bright minds from scientific computing, enterprises, and start-ups to further improve it. Join the community, it is easy to hop on!”

Availability and Oversight
Apache Airflow software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project’s day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Airflow, visit http://airflow.apache.org/ and https://twitter.com/ApacheAirflow

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server –the world’s most popular Web server software. Through the ASF’s meritocratic process known as “The Apache Way,” more than 730 individual Members and 7,000 Committers across six continents successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation’s official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Anonymous, ARM, Baidu, Bloomberg, Budget Direct, Capital One, Cerner, Cloudera, Comcast, Facebook, Google, Handshake, Hortonworks, Huawei, IBM, Indeed, Inspur, LeaseWeb, Microsoft, Oath, ODPi, Pineapple Fund, Pivotal, Private Internet Access, Red Hat, Target, Tencent, and Union Investment. For more information, visit http://apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. “Apache”, “Airflow”, “Apache Airflow”, and “ApacheCon” are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #


For this edition of FIC 2019, YesWeHack is organizing, for the first time in the history of FIC, a special event dedicated to Bug Bounty.

The International Cybersecurity Forum: the European reference event bringing together all stakeholders in digit…
