

Yes

be_estimate_use, estimate_use, be_estimate

_start_, association, between, _start__association_between, _start__association, association_between, be, estimate, use,
toxicity
Yes

be_estimate_use, estimate_use, be_estimate

Table 3

Number of abstracts and GSK126 phrases for each feature representation.

Columns: Rep | Training Set (Abs, Seed terms, Avg/Abs) | Test Set 1 (Abs, Seed terms, Avg/Abs) | Test Set 2 (Abs, Seed terms, Avg/Abs)

Table 4

Model-representation performance using the initial seed terms as the gold standard for the Support Vector Machine (SVM) and General Linear Model (GLM). Bold depicts the best precision (P), recall (R), F1-score (F1) and accuracy (A).

Columns: Model | Training Set (P, R, F1, A) | Test Set 1 (P, R, F1, A) | Test Set 2 (P, R, F1, A)

Table 5

Evaluation of test set 1 using manual annotations as the gold standard for the machine learning classifiers (Support Vector Machine (SVM) and General Linear Model (GLM) with each representation) and the list model. Bold depicts the best precision (P), recall (R), F1-score (F1) and accuracy (A).

Columns: P | R | F1 | A

the outcomes reported from treatment strategies that differed with respect to their underlying biological processes. The silver standard was used to identify outcomes reported anywhere in any of the abstracts in the collection.
Table 7 shows the most frequent verified outcomes from anywhere in the abstract for chemotherapy (Doxorubicin and Docetaxel), hormone (Tamoxifen, Raloxifene and Bazedoxifene) and targeted (Trastuzumab) therapies. Where multiple treatments were used in a study, the abstract was assigned to all applicable treatment types. The outcome measures have been unified: abbreviations are converted to the long form, and modifiers are removed so that response rate, partial response and complete response are collapsed into the outcome response.

Similarly, time-frame and arithmetic qualifiers are removed, so that 1-year overall survival, five year median OS, and 3 week overall survival rate are all unified to overall survival. Clearly some of these details (particularly the time frame) would be important to discern in subsequent analysis, but here we focus the discussion on surrogate and clinically relevant outcomes reported with respect to the treatment strategy.
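The unification step described above can be illustrated with a minimal Python sketch. It is not the procedure used in the study: the abbreviation map, the regular expressions and the function name `unify_outcome` are all assumptions introduced here for illustration, and they cover only the example phrases quoted in the text.

```python
import re

# Hypothetical abbreviation map; the text only mentions OS explicitly.
ABBREVIATIONS = {"os": "overall survival"}

# Hypothetical qualifier patterns, chosen to cover the examples in the text.
TIME_FRAME = re.compile(
    r"\b(?:\d+|one|two|three|four|five)[\s-]*(?:year|month|week|day)s?\b"
)
ARITHMETIC = re.compile(r"\b(?:median|mean|average|rate)\b")
RESPONSE_MODIFIERS = re.compile(r"\b(?:partial|complete)\b")


def unify_outcome(phrase: str) -> str:
    """Collapse an outcome phrase to its unified long form."""
    text = phrase.lower()
    # Convert abbreviations to the long form, token by token.
    text = " ".join(ABBREVIATIONS.get(token, token) for token in text.split())
    # Remove time-frame qualifiers, arithmetic qualifiers and response modifiers.
    for pattern in (TIME_FRAME, ARITHMETIC, RESPONSE_MODIFIERS):
        text = pattern.sub(" ", text)
    # Normalise the whitespace left behind by the removals.
    return " ".join(text.split())


if __name__ == "__main__":
    for example in ("1-year overall survival", "five year median OS",
                    "3 week overall survival rate", "partial response"):
        print(example, "->", unify_outcome(example))
```

A real implementation would need a much larger abbreviation dictionary and would probably keep the removed time frame as a separate field, since the text notes that this detail matters for subsequent analysis.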

The expressions listed in Table 7 show what was measured but provide only indirect insight into the effectiveness of a treatment strategy, because negation and the preceding verb have not been considered. For example, an abstract might state that no adverse events were recorded or that there was a limited response to the given treatment. However, the outcomes alone mirror differences in how these breast cancer treatment strategies work and how they are used when treating breast cancer.

The STEEP standard states that overall survival (marked * in Table 7) is “recognized as the least ambiguous and most clinically relevant clinical end point in clinical trials of cancer therapy” [18]. It is therefore not surprising that OS is frequently reported for all three treatment types; however, the more general term survival was used more frequently. Interestingly, the most frequent outcome across all three treatment strategies was response, which was not in the initial set of survivorship seed terms. Other new outcomes related to both health effectiveness (e.g. resistance, progression, and incidence) and adverse effects (e.g. toxicity, death, and safety).

STEEP states that “it takes decades before improvements in overall survival can reliably be confirmed” [18, p. 2128], so surrogate measures such as disease-free survival and progression-free survival are used to reduce delays in reporting new treatment results. Table 7 shows that the

Table 6

Summary of model performance using the overall outcome estimate as the gold standard for the classifiers (Support Vector Machine (SVM) and General Linear Model (GLM)) with each representation and the list model. The highest precision (P), recall (R), F1-score (F1) and accuracy (A) are shown in bold.