Are managers seduced by the sweet elixir of open source?

IT managers see the potential for cost-cutting from transitioning application development to open source software (OSS). Today, companies can hire recent college graduates with skills in open source development and avail themselves of the free software. But is all that glitters really gold? User groups and more formal workgroups are experiencing an avalanche of newly available routines. And users are now sharing programs along with their own cutting-edge ideas, whether based on R, Python, TensorFlow, or another programming language or library.

Business leaders can’t help but be influenced by this trend. They see the potential hiding in plain sight as more and more open source code is adopted, even for mission-critical processes. And leaders are asking IT managers to explore the possibilities due to the ever-present pressure to cut costs. However, a comprehensive risk-benefit analysis is often further down on the to-do list.


The frequent and changing nature of big data downloads

Is it really true that hiring recently graduated open source enthusiasts and downloading all this free software helps lower the total cost of ownership over three to five years? In this case, the total cost of ownership refers to the life cycle costs related to development, rollout, reasonable risk premiums, ongoing operations and maintenance. The exact composition of the costs in relation to business processes can vary from company to company.
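To make the definition above concrete, here is a minimal sketch of a total cost of ownership calculation. The function name, the cost categories as parameters, and the choice to apply the risk premium as a flat percentage on top of all other costs are assumptions made for illustration, and every figure in the example is hypothetical. A real model would follow the company's own cost composition.

```python
def total_cost_of_ownership(development, rollout, annual_operations,
                            annual_maintenance, risk_premium_rate, years):
    """Sum life cycle costs over a planning horizon.

    Assumption: the risk premium is a flat rate applied to the sum of
    one-off and recurring costs. Other conventions are equally valid.
    """
    one_off = development + rollout
    recurring = (annual_operations + annual_maintenance) * years
    return (one_off + recurring) * (1 + risk_premium_rate)


# Hypothetical figures, for illustration only.
tco_three_years = total_cost_of_ownership(
    development=100_000,
    rollout=20_000,
    annual_operations=30_000,
    annual_maintenance=10_000,
    risk_premium_rate=0.10,
    years=3,
)
print(f"Three-year TCO: {tco_three_years:,.0f}")
```

Running the same function for the open source and the commercial alternative, with each option's own estimates, gives a like-for-like comparison over the chosen horizon.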

To find an answer to these questions, you must take a closer look at the company’s business model, risk remediation requirements, and dependence on vendors. This kind of discussion has become a critical component in collaborations between vendors and customers. Hopefully, the overview below can help you make a more objective evaluation when it comes to such strategic decisions.

Most open source tools, including R and Python, are primarily based on developers’ use of routines that load all of the data into the PC’s memory in one fell swoop. This can present a challenge when you’re working with big data. To solve this quandary, data scientists need to supplement their routines with other packages or write their own code as a work-around. This further complicates the already-demanding process of achieving optimal workflow between data entry, adaptation and execution.
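As a rough illustration of the kind of work-around described above, the sketch below computes a column mean by reading a CSV file in fixed-size chunks rather than loading it all at once. It uses only the Python standard library; the function names, the column name and the chunk size are hypothetical, and a small in-memory string stands in for a large file on disk.

```python
import csv
import io
from itertools import islice


def iter_chunks(reader, chunk_size):
    """Yield lists of at most chunk_size rows from a csv reader."""
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


def streaming_mean(file_obj, column, chunk_size=1000):
    """Mean of a numeric column, holding only one chunk in memory at a time."""
    reader = csv.DictReader(file_obj)
    total = 0.0
    count = 0
    for chunk in iter_chunks(reader, chunk_size):
        total += sum(float(row[column]) for row in chunk)
        count += len(chunk)
    return total / count if count else float("nan")


# Small in-memory stand-in for a large file on disk.
sample = io.StringIO("amount\n10\n20\n30\n40\n")
result = streaming_mean(sample, "amount", chunk_size=2)
print(result)
```

A running sum is easy to stream. Many statistical routines are not, which is part of why this kind of hand-written code adds to the workflow burden.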

Freedom of choice

In most cases, users prefer distributed processing of big data tables. By adapting data tables in this way, other projects and users can access the same set of data that is resident in the memory. Many users highlight the importance of being able to utilise distributed processing of big data while accessing data processing functions that are available via open APIs during open source development.

Data scientists want the freedom to code in their own favourite language. For recent graduates, this is typically the programming language, methods and functions they learned during their studies. For example, data analysts often use Python to invoke powerful external processing and analysis engines. R developers may be more concerned about connecting with external multithreaded processing engines, since the base R interpreter is essentially single-threaded.

In such cases, it’s important for analysts to be able to access APIs that are as similar as possible (syntactically uniform) while still simple to learn and use. Moreover, there must be a straightforward way to handle classification variables and missing data values when the analysts model the data.
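To show what handling classification variables and missing values involves at its simplest, here is a minimal sketch in plain Python. It assumes mean imputation for missing numeric values and indicator (one-hot) coding for a classification variable; these are two common choices among many, not something the tools discussed here prescribe. The function names and sample data are hypothetical.

```python
def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def one_hot(labels):
    """Encode labels as indicator columns, one per distinct level (sorted)."""
    levels = sorted(set(labels))
    rows = [[1 if label == level else 0 for level in levels] for label in labels]
    return levels, rows


# Hypothetical sample data.
ages = [30, None, 50]
segments = ["retail", "corporate", "retail"]

filled = impute_mean(ages)
levels, encoded = one_hot(segments)
print(filled)    # [30, 40.0, 50]
print(levels)    # ['corporate', 'retail']
print(encoded)   # [[0, 1], [1, 0], [0, 1]]
```

When every analyst writes helpers like these by hand, in a different language and with different conventions, models become harder to compare and maintain, which is the case for uniform APIs.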

Comprehensive analytical life cycle

Some companies still have reservations, since open source tools largely lack the concepts, tools and methods to implement a comprehensive analytical life cycle. This life cycle encompasses all facets of the evolution of data from its acquisition to its development and subsequent maturation into actionable, operative information.

The importance of a complete analytical life cycle becomes readily apparent when analysts are maintaining or adapting models to suit the varying needs of test, development and production environments. A limited number of models can, of course, be manually recoded, but not hundreds of models integrated into a mission-critical production run. Thus, it’s probably no surprise that data scientists are often more content to describe their own concepts, research and modelling than their existing models. This ignores the fact that existing models are the nerve centre of the production environment and often a source of hidden costs. This is where managers need to actively take charge to ensure their data scientists are able to see the big picture.

Information security and personal data protection

We hear from many users that new security problems often pop up around open source code. At the same time, many projects don’t have the mechanisms in place to troubleshoot and remedy the issues. Likewise, there is no accepted standard way in which to document the security of open source code projects. In an international survey of managers, respondents pointed to security as the biggest challenge for open source software.

The survey also revealed that only 5% of the half-million largest official repositories on GitHub provided their own security documentation. Perhaps the most daunting challenge is that even if this issue were solved, it isn’t always possible to identify and warn people using the old code, since open source user groups often don’t maintain records of who is using a particular component or version.

Is open source really free?

Open source is here to stay. And just as universities are certain to churn out graduates with open source coding skills, open source vendors will continue to crawl out of the woodwork to cater to this booming market. The field is getting crowded, and some vendors will fall by the wayside. We’ll probably see a fair amount of consolidation and new clusters in frameworks such as Spark.

What should management ask of IT in order to make the right decisions? At a minimum, managers should ask for documentation that includes cost and earnings estimates, risk assessments, and analyses that forecast the impact on productivity growth and user friendliness. This should be a part of the decision-making process when management hashes out future strategies or analyses the need for specific investments.

Open source considerations for IT

When it comes to the IT department, a number of strategic and practical considerations need to be addressed:

Direct and indirect costs in relation to infrastructure.
Costs related to the life cycle environment: development, rollout, lifespan and maintenance.
The opportunity for reuse and innovation, the benefits and potential impact on customer interaction.
Boosting revenue, potential cost overruns and the impact on work processes. What needs to be changed? What exactly is Plan B?
New opportunities created by suggestions for changes in the technology, including the value of being able to handle larger amounts of data, improve customer dialogue, access more processing power, and expand computing capacity.
Interaction with other data systems, standardisation and the ability to retrain the system.
Complete freedom for users in the choice of a cloud-based solution and the freedom to choose from the various cloud services available on the market.
Uniform, intuitive and graphical user interfaces and effective training procedures.

Extensive calculation models that produce these kinds of overviews while also taking into account qualitative factors are not readily available. What complicates these calculations further is the changing nature of technology and the difficulty in finding workers who possess new skills. These two factors act as an X-factor and raise the level of uncertainty in the calculations. If we take a closer look at these types of challenges on the basis of strategic analyses that can help predict the future, we see accelerating changes within several product areas.

Hitting the sweet spot

These accelerating changes make it increasingly difficult for companies to hit the sweet spot of the life cycle and avoid what Gartner called “adopting too early, giving up too soon, adopting too late, or hanging on too long.” If we look at, for example, Gartner’s Hype Cycle for Emerging Technologies, we gain a better understanding of what to expect from new solutions and the challenges these present to architecture and system development. Groundbreaking systems such as brain-computer interfaces, knowledge graphs and autonomous mobile robots will cross our paths over the course of the next five years.

These are just three examples in the concept stage. Each will inevitably present overwhelming demands for accuracy on our risk remediation and decision-making processes. In order for open source software to take its seat at the table, documentation standards must significantly improve. There must be better version control routines for interface development and production, as well as structured service routines for bug fixing and data cleansing. Last, but not least, vendors need to take more responsibility for any defects or bugs that may arise.

Are you the manager of a company? Then ask your IT department for documentation and make sure users are closely involved every step of the way as you develop your IT plans for the future. After all, whether you’re a private company or a government agency, it’s all about safeguarding the viability and future growth of your business. IT is the tool to implement this in a cost-effective, secure manner.

Want to know about SAS and the open ecosystem? Read our white paper SAS in the Open Ecosystem or check out our e-book Out in the Open With Analytics.
