5001.001.002.003 Development of a Pedigree Informatic System for America Makes Materials

Ideal data infrastructure adopting the Materials Data Curation System (MDCS) and a Representational State Transfer Application Programming Interface (RESTful API) to archive project deliverables and connect to other databases.

 

Development of a human-readable, searchable, and extendable XML-based database to systematically archive various types of data from America Makes projects.

Problem

America Makes project deliverables contain large volumes of complex data, which makes data management tools increasingly critical. An interoperable information system is therefore essential. A well-designed database and visualization software are the key tools of such a system for understanding present knowledge and identifying gaps. An ideal database contains schemas that label data with detailed descriptions (metadata), which specify the differences among datasets for subsequent visualization and analysis. Because additive manufacturing (AM) is a developing technology, an extensible database is preferred so that it can grow as the technology matures. The need for schemas and a database for AM development corresponds to one of the strategic goals of the Materials Genome Initiative (MGI).

 

Objective

The objective of this project was to develop a human-readable, searchable, and extendable XML-based database to systematically archive various types of data from America Makes projects. The project enabled the America Makes membership and government stakeholders to cross-search the project deliverables in order to evaluate project performance and identify technology gaps. This data infrastructure could also be integrated with the America Makes Digital Storefront through API functions for general America Makes members.

Technical Approach

The project approach leveraged the data management tools developed by Southwest Research Institute (SwRI) and the open-source data infrastructure from the National Institute of Standards and Technology (NIST) to improve the searchability and interoperability of the America Makes Digital Storefront. A preliminary data management tool was developed to automatically parse different types of raw data into XML format. A pedigree chart of XML schemas was created to curate the project data and organize the deliverables into a hierarchical structure of materials, processing strategies, structures, and property measurements. The schemas were developed based on the success of the NIST AM-Bench program. (AM-Bench provides a continuing series of controlled benchmark measurements, in conjunction with a conference series, enabling modelers to test their simulations against rigorous, highly controlled AM benchmark test data.) During the project period, the sample XML files were archived in an MDCS instance inside the SwRI firewall for additional cyber protection. Through its built-in functions and the RESTful API, MDCS enabled instances to be interconnected for data federation among the allied institutes. This framework provides a consistent mechanism to curate and share data and requires little maintenance for future projects.
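The hierarchical organization described above (materials, processing strategies, structures, and property measurements) can be illustrated with a minimal sketch using Python's standard library. The element names (`AMRecord`, `Material`, `Processing`, `Structure`, `PropertyMeasurements`, `Measurement`), the `build_record` helper, and all values are hypothetical placeholders chosen for illustration; they are not the project's or NIST's actual schema.

```python
import xml.etree.ElementTree as ET


def build_record(project_id, material, process, structure, properties):
    """Assemble one pedigree-style record (illustrative element names only)."""
    root = ET.Element("AMRecord", attrib={"projectId": project_id})
    ET.SubElement(root, "Material").text = material
    ET.SubElement(root, "Processing").text = process
    ET.SubElement(root, "Structure").text = structure
    # Property measurements are nested so that each value carries its own metadata.
    props = ET.SubElement(root, "PropertyMeasurements")
    for name, (value, unit) in properties.items():
        measurement = ET.SubElement(
            props, "Measurement", attrib={"name": name, "unit": unit}
        )
        measurement.text = str(value)
    return root


# Placeholder values, not data from any America Makes deliverable.
record = build_record(
    "example-project",
    "Ti-6Al-4V powder",
    "laser powder bed fusion",
    "as-built microstructure",
    {"yield_strength": (900.0, "MPa")},
)
xml_text = ET.tostring(record, encoding="unicode")
```

Keeping the unit and measurement name as attributes is one way to make each value self-describing, which is the role the text assigns to metadata.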

Accomplishments

This project adopted and refined the public XML schemas that were developed for AM benchmark and AM material database projects at NIST. These schemas accommodated data including the project background, feedstock information, AM and post-build schedules, microstructure characterization, property measurements, and simulation codes. To achieve the project objective, a comprehensive data management workflow was developed to parse information, enter data, and validate the results against the schemas. To identify the project features and enable searching, a Python tool was implemented to parse the project deliverables into reusable formats. Because these deliverables contained files in different formats such as PDF, DOCX, and PPTX, the Python tool employed several libraries including PyMuPDF, tabula, docx2python, and pptx.
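One plausible way to handle deliverables in several file formats is to dispatch on the file extension, as in the sketch below. This is not the project's actual tool: the function names (`extract_pdf_text`, `extract_docx_text`, `extract_pptx_text`, `pick_parser`, `extract_text`) are hypothetical, and only plain-text extraction is shown (table extraction with tabula is omitted). The third-party libraries are imported lazily inside each function, so they are needed only when a file of that type is actually parsed.

```python
from pathlib import Path


def extract_pdf_text(path):
    import fitz  # PyMuPDF

    with fitz.open(path) as doc:
        return "\n".join(page.get_text() for page in doc)


def extract_docx_text(path):
    from docx2python import docx2python

    return docx2python(path).text


def extract_pptx_text(path):
    from pptx import Presentation  # provided by the python-pptx package

    presentation = Presentation(path)
    return "\n".join(
        shape.text_frame.text
        for slide in presentation.slides
        for shape in slide.shapes
        if shape.has_text_frame
    )


# Map each supported extension to its extractor (assumed set of formats).
PARSERS = {
    ".pdf": extract_pdf_text,
    ".docx": extract_docx_text,
    ".pptx": extract_pptx_text,
}


def pick_parser(path):
    """Return the extractor for a file, based on its lowercase extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in PARSERS:
        raise ValueError(f"unsupported file type: {suffix!r}")
    return PARSERS[suffix]


def extract_text(path):
    """Extract plain text from one deliverable file."""
    return pick_parser(path)(str(path))
```

A call such as `extract_text("final_report.pdf")` (a hypothetical file name) would return the document text as one string, ready to be mapped into XML fields.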

This project developed an XML-based database, archived the selected project deliverables, and demonstrated the Configurable Data Curation System (CDCS), open-source software that provides a data federation function. The CDCS instance for this project was then connected to other databases to demonstrate the mechanism of data federation. Data federation allows America Makes members to cross-search project deliverables using scientific terminology in future applications.
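The idea of cross-searching federated instances can be sketched conceptually as follows. This is a local, in-memory stand-in and does not use the CDCS REST API: each "instance" is simply a list of XML strings, and the record layout, instance names, and function names (`search_instance`, `federated_search`) are hypothetical.

```python
import xml.etree.ElementTree as ET


def search_instance(xml_documents, term):
    """Return (project id, element tag, text) for every element containing the term."""
    term = term.lower()
    hits = []
    for xml_text in xml_documents:
        root = ET.fromstring(xml_text)
        for element in root.iter():
            if element.text and term in element.text.lower():
                hits.append((root.get("projectId"), element.tag, element.text.strip()))
    return hits


def federated_search(instances, term):
    """Run the same keyword search against every instance and group hits by instance."""
    return {name: search_instance(docs, term) for name, docs in instances.items()}


# Placeholder records standing in for two connected databases.
instances = {
    "local": [
        '<AMRecord projectId="A"><Material>Ti-6Al-4V powder</Material></AMRecord>',
    ],
    "partner": [
        '<AMRecord projectId="B"><Material>Inconel 718 powder</Material>'
        "<Processing>Ti-6Al-4V build plate</Processing></AMRecord>",
    ],
}
results = federated_search(instances, "ti-6al-4v")
```

In a real deployment each instance would answer the query over its own RESTful API; the sketch only shows how one search term fans out to several sources and how the hits stay attributed to their origin.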

Project Participants

Project Principal

Other Project Participants

  • Southwest Research Institute

Public Participants

  • U.S. Department of Defense
