Marco Pimentel is the Post-Doctoral Research Assistant working on the HAVEN project. He studied biomedical engineering at the New University of Lisbon, Portugal, and joined the Oxford Centre for Doctoral Training in Healthcare Innovation in 2010. He completed his DPhil in Engineering in 2015, for which he focused on multivariate time-series modelling using Gaussian processes to detect deterioration in vital-sign data acquired from post-operative patients. He has been working with the Critical Care Research Group since 2014, and his talk to the group was quite an achievement: a full explanation of using patient and hospital data for research purposes, delivered without any maths or equations!

Translating medical research into clinical practice is far from trivial. With the significant growth in the uptake of electronic health record (EHR) systems, and the availability of patient information collected electronically in hospitals, there has been a surge in studies that analyse large data sets to help identify deteriorating patients in hospitals. How clinicians or healthcare organisations adapt the results of such studies into practice, however, is not straightforward.

Historically, the healthcare industry has generated large amounts of data, driven by compliance and regulatory requirements, record keeping and patient care. While most data are stored in hard copy form, the current trend is toward rapid digitisation of these large amounts of data. The close link between the University of Oxford (namely, the Institute of Biomedical Engineering) and the Oxford University Hospitals NHS Foundation Trust enabled the transition from paper charts to electronic documentation of vital signs (SEND). This system is now live in every adult ward across the Trust. Given the mandatory requirements and the potential to improve the quality of healthcare delivery while reducing costs, these ever-growing quantities of data (often referred to as ‘big data’) hold the promise of supporting a wide range of medical and healthcare functions, including, but not limited to, clinical decision support, treatment management, and disease surveillance.

The term “big data” continues to generate a lot of hype in every industry, including healthcare. Conceptually, “big data” includes data sets that are too large to be managed by traditional data management tools (such as those currently used in hospitals) or to be interpreted by humans without the help of computerised data processing and/or analytics. We do not set out to review definitions of big data here; suffice it to say that no single definition of big data is universally accepted, though certain definitions do stand out. Big data, for example, is commonly defined by the “3 Vs” (volume, velocity, variety), as well as other variants: there is the “4 Vs” definition, which captures data volume, velocity, variety, and veracity, and the “5 Vs” definition, which also considers value.

Regardless of the exact definition of “big data”, it is undeniable that digital data are proliferating in diverse forms within the healthcare field, not only because of the adoption of EHRs, but also because of the growing use of wireless technologies for ambulatory monitoring (example studies using wireless technologies include PICRAM and CALMS-2). In healthcare, in fact, data are overwhelming not only, or not so much, because of their volume, but because of the diversity of data types (“variety”) and the speed (“velocity”) at which the data must be managed. They include clinical data from clinical decision support systems, patient data in electronic patient records, machine-generated or sensor data (such as from vital-sign monitoring), physicians’ written notes and prescriptions, medical imaging, laboratory, pharmacy, and other administrative data. With HAVEN, one of our current projects, we are aiming to create a system in which all of these sources of patient data and information are gathered and interlinked, and all healthcare staff are interconnected.

For the data scientist and researcher, this vast amount of data holds many opportunities. One use case of healthcare data in medicine is the application of machine learning techniques in alerting systems, or to predict the likelihood of certain events based on continuous data streams. This is, among others, one use case of the HAVEN project. A different task could be the identification of subpopulations of similar patients, based on the collected data, that can guide treatment decisions for a given individual and move towards what is frequently called “personalised medicine”, which essentially describes an approach to medicine that aims to take into account patient-specific factors [*]. Such an approach is not new; individual factors are taken into account when treating patients, and many subtypes of disease have already been identified that are associated with differing treatments (e.g., type-1 and type-2 diabetes). However, by discovering associations and understanding patterns and trends within the data, healthcare providers and other stakeholders in the healthcare delivery system can develop more thorough and insightful diagnoses and treatments, resulting, one would expect, in higher quality care at lower costs and in better outcomes overall.

Because of their level of granularity and resolution, healthcare data are a challenge to traditional statistical techniques, and they call for novel causal inference methodologies, able to model time-varying exposures and covariates, for the data scientist to explore. Woven through these issues are those of continuous data acquisition, different standards, different levels of quality assurance, and data cleansing. Health data are rarely standardised, often fragmented, and/or generated in legacy IT systems with incompatible formats, and hence require a number of extra processing steps to decide whether the data are “useful” or not. Equally important, and challenging for producing a “result” that can be generalised, are the different conditions and scenarios in which data are captured, which may involve a range of different protocols and managerial processes that are not appropriately recorded electronically in current clinical information systems. Other considerations regarding ownership, governance, privacy, and security must also not be overlooked, and they are regular topics of discussion in health tech industry news.
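To make the idea of extra processing steps a little more concrete, here is a minimal sketch in Python of two such steps: converting a value recorded in a different unit to a common one, and discarding observations that are missing or implausible. It is only an illustration and not a description of how SEND or HAVEN process data. The record layout, the source names and the plausibility limits are all assumptions made up for this example, and the limits are not clinical guidance.

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative plausibility limits, assumed for this sketch only (not clinical guidance).
PLAUSIBLE_RANGES = {
    "heart_rate": (20.0, 300.0),     # beats per minute
    "temperature_c": (25.0, 45.0),   # degrees Celsius
}


@dataclass
class Observation:
    source: str             # hypothetical name of the originating system
    kind: str               # e.g. "heart_rate", "temperature_c", "temperature_f"
    value: Optional[float]  # None when the value was not recorded


def harmonise(obs: Observation) -> Observation:
    """Convert source-specific units to a common one (Fahrenheit to Celsius here)."""
    if obs.kind == "temperature_f" and obs.value is not None:
        celsius = (obs.value - 32.0) * 5.0 / 9.0
        return Observation(obs.source, "temperature_c", round(celsius, 1))
    return obs


def is_usable(obs: Observation) -> bool:
    """Keep an observation only if it has a value, a known kind, and a plausible value."""
    if obs.value is None or obs.kind not in PLAUSIBLE_RANGES:
        return False
    low, high = PLAUSIBLE_RANGES[obs.kind]
    return low <= obs.value <= high


def clean(records: List[Observation]) -> List[Observation]:
    """Harmonise units first, then drop anything that fails the usability check."""
    harmonised = [harmonise(r) for r in records]
    return [r for r in harmonised if is_usable(r)]


records = [
    Observation("ward_system", "heart_rate", 72.0),
    Observation("legacy_system", "temperature_f", 98.6),
    Observation("ward_system", "heart_rate", None),     # missing value
    Observation("legacy_system", "heart_rate", 900.0),  # implausible, likely an entry error
]
cleaned = clean(records)
for obs in cleaned:
    print(obs)
```

In this toy example, the heart rate of 72 and the temperature (converted to Celsius) are kept, while the missing and implausible heart rates are dropped. Real pipelines need far more than this, but the principle of harmonising first and filtering second carries over.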

These are some of the great challenges that the cross-disciplinary collaboration between the University of Oxford and the Oxford University Hospitals NHS Foundation Trust continues to tackle. There is an urgent need to address them, so that we can see the rapid, widespread implementation and use of health data analytics across healthcare organisations and the healthcare industry. The HAVEN project, which is currently underway, is no doubt a significant step towards that goal.


[*] D. Clifton, K. Niehaus, P. Charlton, and G. Colopy, “Health informatics via machine learning for the clinical management of patients,” Yearbook of Medical Informatics, vol. 10, no. 1, p. 38, 2015.



Useful links:

The HAVEN project

The University of Oxford Department of Engineering

The Institute of Biomedical Engineering

Computational Health Informatics Laboratory

Marco Pimentel, post-doctoral Research Assistant