SciencePAD Roadmap

The main SciencePAD motivations, concepts and expectations have been described in an overview document published in March 2012. Since then the agreed plan has been steadily carried forward by a number of EMI partners and collaborating projects. The plan foresees four major phases:

  • Alpha Phase (March to June 2012): definition of priorities and identification of volunteer SciencePAD maintainers
  • Design Phase (July to December 2012): progressive implementation of a prototype community portal, dissemination, engagement of scientific communities in trying the functionality and providing feedback
  • Concept Phase (January to April 2013): start of the regular activities, further requirements and implementation cycles. Until this date the community is run as a proof-of-concept service collecting information on the community interest
  • Operation Phase (May 2013 onward): regular operations, fund raising for continuing activities based on the success of the initiative. Phase down and discontinuation if no interest has emerged.

The current phase (Design Phase) is focused on designing the expected functionality of the SciencePAD system in more detail and on regularly releasing prototypes of increasing functionality, so that users can provide feedback, ideas and comments and actively participate in the development of the SciencePAD concept.

 

Major functions

During the initial discussion phase a number of features were defined as the initial set of functionality of the SciencePAD initiative. The features can be grouped as follows; each of them has an associated forum in the SciencePAD portal to allow discussion and exchange of ideas:

 

Web site

The web site or web portal is the main interaction point for the SciencePAD community. It provides access to all SciencePAD features and resources. The portal is currently developed using Drupal, to take advantage of the functionality that Drupal and its thousands of modules already provide for creating and managing content and social network interactions. It is currently accessible from the generic address http://SciencePAD.org, which redirects to the CERN-hosted Drupal site at http://SciencePAD.web.cern.ch (see below for the current plans about site hosting).

Associated forum: General discussion

 

Data model

The data model is the first step in the design and implementation. The elements that compose the typical software development environment from an organizational point of view have to be captured and described with enough information to allow useful data mining, comparisons, etc. As of today four main entities have been defined: People, Organizations, Collaborations and Software. These entities have been briefly described in a document published in the forum and a prototype implementation is available in the SciencePAD portal.
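
As a rough illustration of how the four entities could relate to each other, the sketch below models them as plain Python classes. It is not the data model described in the forum document: every field name (affiliations, members, maintainers, developed_by) and the helper software_by_collaboration are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Person:
    name: str
    affiliations: List[str] = field(default_factory=list)  # Organization names (assumed field)


@dataclass
class Organization:
    name: str
    country: str = ""  # assumed field


@dataclass
class Collaboration:
    name: str
    members: List[str] = field(default_factory=list)  # Organization names (assumed field)


@dataclass
class Software:
    name: str
    maintainers: List[str] = field(default_factory=list)   # Person names (assumed field)
    developed_by: List[str] = field(default_factory=list)  # Collaboration names (assumed field)


def software_by_collaboration(catalogue, collaboration_name):
    """Return the sorted names of Software entries linked to a Collaboration."""
    return sorted(s.name for s in catalogue if collaboration_name in s.developed_by)
```

Keeping the links between entities as simple name references is only one possible choice; it is enough to show the kind of cross-entity query (software per collaboration) that the data model is meant to support.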

Associated forum: Catalogues and data model

 

Data discovery and automation

During the definition of the data model all the information for the objects in SciencePAD has to be inserted manually by the users. However, once the model has been defined and agreed together with the necessary encoding methods, more efficient and automated ways of collecting, discovering and associating information have to be defined. Information can be imported or linked from external sources, or can be used to create new objects and associations (for example, the contributors of a software product can be discovered from the commits in the source code management system, high-level product dependencies can be inferred from package-level dependencies in package repositories, etc.). The relationships can be created automatically or suggested to the users when they log in and browse the catalogues. The goal is to ask users for the minimum amount of information and discover the rest from the base data.
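
The contributor example can be sketched in a few lines of Python. The sketch assumes that the commit history has already been exported as text with one "Name <email>" line per commit (for instance with git log --format='%an <%ae>'); the function names and the idea of matching on email addresses are illustrative assumptions, not an agreed design.

```python
from collections import Counter


def contributors_from_log(log_text):
    """Count commits per author from lines shaped like 'Name <email>'."""
    counts = Counter()
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, rest = line.partition("<")
        email = rest.rstrip(">").strip().lower()
        counts[(name.strip(), email)] += 1
    return counts


def suggest_contributors(log_text, already_registered):
    """Return (name, email, commits) for authors not yet linked, most active first."""
    counts = contributors_from_log(log_text)
    return [
        (name, email, n)
        for (name, email), n in counts.most_common()
        if email not in already_registered
    ]
```

The result is a list of suggestions rather than an automatic update, which matches the option of proposing relationships to users when they browse the catalogues.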

Associated forum: Data discovery and automation

 

Comment and rating system

One of the major goals of the SciencePAD initiative is to offer users the possibility of expressing their opinion on the software they use in an open and transparent way. The Comment and Rating system is the core of this functionality. Comments on the software can, for example, be expressed using the forum topics associated with each registered software product or service, or directly on the official software page in the SciencePAD portal. Ratings can be expressed using a standard rating system (for example the 5-star system). Ratings and comments have the double function of collecting user opinions and helping developers assess the user perception of their products so that they can be improved if necessary. Ratings and comments can also help other users understand whether the software is right for them and what the experience of existing users is.
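
To make the 5-star example concrete, the following sketch shows one way a list of ratings could be summarized for display on a software page. The function name and the shape of the returned summary are assumptions for illustration; the portal itself is expected to rely on stock Drupal modules for this.

```python
from collections import Counter


def summarize_ratings(stars):
    """Summarize 1-5 star ratings: count, mean and per-star histogram."""
    for s in stars:
        if s not in (1, 2, 3, 4, 5):
            raise ValueError(f"rating out of range: {s!r}")
    if not stars:
        # No ratings yet: report an empty histogram and no mean.
        return {"count": 0, "mean": None, "histogram": {n: 0 for n in range(1, 6)}}
    hist = Counter(stars)
    return {
        "count": len(stars),
        "mean": round(sum(stars) / len(stars), 2),
        "histogram": {n: hist.get(n, 0) for n in range(1, 6)},
    }
```

Reporting the histogram next to the mean lets developers see whether an average score hides a split between very satisfied and very dissatisfied users.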

Associated forum: Comment and rating system

 

Subscriptions, followers, groups

Once users, organizations, collaborations and software are registered, we naturally want to be kept informed of updates and announcements, to get in touch with users or developers with similar interests, and to know of new releases of the software we are using. In short, we want to exploit the advantages of having a network of relationships with people, projects and software products. This is the typical feature of a social network, and SciencePAD is indeed a specialized social network across a very diverse community of software developers and users within larger scientific research activities. It must be possible for users to create relationships based on their own interests, follow other users, and create or join dedicated interest groups focused on specialized topics. We don't presume to be inventing anything here; rather, we look at what other successful social networks are doing and provide similar functionality on top of the special information we handle in SciencePAD. Collaboration and integration with existing social networks are essential.
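
The follow and notify relationship described above can be pictured with a very small in-memory structure. This is only a sketch: the FollowGraph class and its method names are hypothetical, and the actual implementation is planned with stock Drupal modules.

```python
from collections import defaultdict


class FollowGraph:
    """Who follows what: targets may be people, groups or software entries."""

    def __init__(self):
        self._followers = defaultdict(set)  # target -> set of follower names

    def follow(self, follower, target):
        self._followers[target].add(follower)

    def unfollow(self, follower, target):
        self._followers[target].discard(follower)

    def followers_of(self, target):
        return sorted(self._followers.get(target, set()))

    def notify(self, target, message):
        """Return (recipient, message) pairs for an update on `target`."""
        return [(user, message) for user in self.followers_of(target)]
```

For example, after two users follow the same software entry, a call to notify for a new release of that entry yields one message per follower.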

Associated forum: Subscriptions, followers, groups

 

Data processing and reporting

The whole point of collecting and storing information is to be able to process it and extract meaningful conclusions, trends and correlations, generate useful reports and reuse the information as part of other documents, presentations, articles, etc. It should be possible from the SciencePAD portal to search for people, organizations, collaborations and software using pre-defined or user-defined criteria, display the results of searches in various formats (lists, tables, maps, etc.), export the information in formats compatible with external spreadsheets or charting programs, and so on. Users should be able to define their own reports and save them in their profile.
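
A search followed by a spreadsheet-compatible export could look like the sketch below. It assumes that catalogue entries are available as plain dictionaries and that CSV is an acceptable export format; the function names and the example fields (name, language, license) are illustrative only.

```python
import csv
import io


def search(records, **criteria):
    """Return the records (dicts) whose fields equal every given criterion."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


def to_csv(records, columns):
    """Render records as CSV text with a header row, for spreadsheet import."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

A user-defined report is then little more than a saved pair of search criteria and column list, which is one way the reports mentioned above could be stored in a user profile.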

Associated forum: Data processing and reporting

 

Marketplace

The SciencePAD Marketplace is where users and developers can match requirements and solutions. It can be a place to discuss and advertise functionality, expertise, services or consultancy; a place to combine products in community-specific sets and create integrated reusable solutions to be shared across the community. The concept of a marketplace is powerful, but it must be clearly defined in scope and functionality to become truly useful and intuitive.

Associated forum: Marketplace

 

Unique Software IDs

During the initial discussions about possible objectives for SciencePAD, one topic raised particular interest: making software products uniquely identifiable and citable in scientific publications in the very same way as standard publications are identifiable and citable. Work is already in progress to extend the concept of Digital Object Identifiers (DOI) from papers to datasets, and a further extension to software product releases seems a rather logical next step. This discussion needs a wide collaboration among software developers, publishing experts, librarians, etc. The possibility of minting and assigning software DOIs in a reliable and standard way would greatly benefit software developers by increasing their deserved academic recognition, and would also make published scientific research fully reproducible through the combination of uniquely identifiable theories, processes, datasets and software applications.

Associated forum: Software products unique identifiers

 

Planned tasks

Each of the above-described features must undergo a phase of design and prototype implementation during the period July-December 2012. As we proceed with the discussion we will add more and more details to each feature. As of September 2012 the design and implementation plan is as follows (✓ = completed tasks):

July 2012

  • ✓ Initial set up of the Drupal web site hosted at CERN, choice of theme, design of the overall layout and graphical aspect

August 2012

  • ✓ Initial draft of the data model, prototype implementations of the Person, Organization and Collaboration classes
  • ✓ Basic search and display of existing Persons, Organizations and Collaborations
  • ✓ All code and scripts developed for the portal are stored on GitHub

September 2012

  • ✓ Iteration on the data model, prototype implementations of the Software classes
  • ✓ Basic search and display of existing Software products and services
  • ✓ By mid-September it should be possible to register users (using the standard CERN account system), Organizations, Collaborations and Software. Information is collected using plain forms that users have to fill in.
  • Initial design of data discovery and automation methods

October 2012

  • Introduction of OpenID-based authentication
  • Implementation of data discovery and automation
  • Discussions with existing sources of scientific software data about collaboration and data exchange
  • Implementation of comment and rating system (use stock Drupal modules)
  • Subscriptions and followers implementation (use stock Drupal modules)

November 2012

  • Progressive simplification of registration forms (reduce the amount of information asked, take advantage of automated data discovery)
  • Automatic creation of software forums upon software registration
  • Initial design and prototype of the Marketplace
  • Groups implementation (use stock Drupal modules)

December 2012 

  • Iteration on Marketplace design and implementation
  • Set of pre-defined reports, data exports
  • Consolidate developed code in proper Drupal modules
  • Officially move the portal to the SciencePAD.org domain

 

Future Tasks

As the design and prototype phase concludes in December 2012 we plan to have a functional SciencePAD portal that allows users to:

  • register organizations, collaborations and software with minimal input
  • use a (semi-)automated data discovery system with context-aware recommendations to users about missing information, potentially valuable relationships, relevant news and announcements
  • rate the software based on personal experience and provide comments and feedback to the software developers
  • be part of a basic social network with bilateral follower subscriptions and personalized dynamic user home pages
  • create and manage interest groups around software-related scientific or technical topics
  • experiment with a first implementation of the Marketplace
  • search for software information and generate reports on software usage according to meaningful criteria

In addition to the above features, the discussion on software identifiers will be started between October and December 2012 and will be continued during the first months of 2013. Any relevant agreement or decision will be implemented in SciencePAD.

During the first months of 2013 the SciencePAD portal will be operated officially as a production service. Its usage and any positive or negative trends will be monitored to assess whether the initiative is successful. If operations are successful, the governance and management mechanisms needed to make it a regular community-driven service will be discussed and put in place.

This live roadmap document will be kept updated as the discussion goes on and features are implemented, discarded or added.

Your comments and feedback on the SciencePAD roadmap and your active participation in the design and implementation of its features are required to make it successful. We encourage discussions in the dedicated forums and welcome any offer to participate in or lead any of the described features.