======= Header metadata ======= Evaluation on 1943 random PDF files out of 1943 PDF (ratio 1.0). ======= Strict Matching ======= (exact matches) ===== Field-level results ===== label accuracy precision recall f1 title 95.01 72.17 68.35 70.21 authors 93.42 61.41 58.22 59.77 first_author 97.73 90.57 85.57 88 abstract 86.73 16.34 14.91 15.6 keywords 94.51 67.3 51.74 58.5 all fields 93.48 61.52 56.14 58.71 (micro average) 93.48 61.56 55.76 58.42 (macro average) ======== Soft Matching ======== (ignoring punctuation, case and space characters mismatches) ===== Field-level results ===== label accuracy precision recall f1 title 96.39 83.53 79.1 81.26 authors 94.08 69.24 65.64 67.39 first_author 98.07 94 88.82 91.34 abstract 90.54 48.97 44.69 46.73 keywords 94.62 73.89 56.81 64.24 all fields 94.74 74.2 67.72 70.81 (micro average) 94.74 73.93 67.01 70.19 (macro average) ==== Levenshtein Matching ===== (Minimum Levenshtein distance at 0.8) ===== Field-level results ===== label accuracy precision recall f1 title 97.19 89.84 85.07 87.39 authors 95.47 80.11 75.94 77.97 first_author 97.65 92.64 87.53 90.01 abstract 94.94 80.1 73.1 76.44 keywords 95.27 85.01 65.36 73.9 all fields 96.1 85.65 78.16 81.73 (micro average) 96.1 85.54 77.4 81.14 (macro average) = Ratcliff/Obershelp Matching = (Minimum Ratcliff/Obershelp similarity at 0.95) ===== Field-level results ===== label accuracy precision recall f1 title 96.45 84.84 80.34 82.53 authors 93.89 69.73 66.1 67.87 first_author 97.41 90.62 85.63 88.05 abstract 94.38 75.34 68.76 71.9 keywords 95.01 80.4 61.81 69.89 all fields 95.43 80.21 73.2 76.55 (micro average) 95.43 80.19 72.53 76.05 (macro average) ===== Instance-level results ===== Total expected instances: 1943 Total correct instances: 144 (strict) Total correct instances: 435 (soft) Total correct instances: 821 (Levenshtein) Total correct instances: 637 (ObservedRatcliffObershelp) Instance-level recall: 7.41 (strict) Instance-level recall: 22.39 (soft) Instance-level recall: 42.25 (Levenshtein) Instance-level recall: 32.78 (RatcliffObershelp) ======= Citation metadata ======= Evaluation on 1943 random PDF files out of 1943 PDF (ratio 1.0). ======= Strict Matching ======= (exact matches) ===== Field-level results ===== label accuracy precision recall f1 title 97.57 77 66.37 71.29 authors 97.3 75.55 62.3 68.29 first_author 98.39 86.2 70.94 77.83 date 99 91.31 74.55 82.08 inTitle 96.75 69.92 63.64 66.63 volume 99.09 93.27 78.53 85.27 issue 99.46 81.14 75.02 77.96 page 98.64 89.97 75.11 81.87 all fields 98.28 83.08 70.31 76.16 (micro average) 98.28 83.05 70.81 76.4 (macro average) ======== Soft Matching ======== (ignoring punctuation, case and space characters mismatches) ===== Field-level results ===== label accuracy precision recall f1 title 98.69 88.39 76.18 81.83 authors 97.8 81.06 66.83 73.26 first_author 98.52 87.87 72.32 79.34 date 98.96 91.31 74.55 82.08 inTitle 97.87 80.86 73.6 77.06 volume 99.06 93.27 78.53 85.27 issue 99.44 81.14 75.02 77.96 page 98.6 89.97 75.11 81.87 all fields 98.62 87.25 73.83 79.98 (micro average) 98.62 86.73 74.02 79.83 (macro average) ==== Levenshtein Matching ===== (Minimum Levenshtein distance at 0.8) ===== Field-level results ===== label accuracy precision recall f1 title 98.93 90.79 78.25 84.06 authors 98.4 86.79 71.56 78.44 first_author 98.51 87.93 72.36 79.39 date 98.95 91.31 74.55 82.08 inTitle 97.97 81.95 74.59 78.1 volume 99.05 93.27 78.53 85.27 issue 99.43 81.14 75.02 77.96 page 98.59 89.97 75.11 81.87 all fields 98.73 88.54 74.93 81.17 (micro average) 98.73 87.89 75 80.9 (macro average) = Ratcliff/Obershelp Matching = (Minimum Ratcliff/Obershelp similarity at 0.95) ===== Field-level results ===== label accuracy precision recall f1 title 98.85 89.86 77.45 83.2 authors 97.92 82.11 67.7 74.21 first_author 98.35 86.22 70.96 77.85 date 98.96 91.31 74.55 82.08 inTitle 97.71 79.42 72.29 75.69 volume 99.06 93.27 78.53 85.27 issue 99.44 81.14 75.02 77.96 page 98.6 89.97 75.11 81.87 all fields 98.61 87.15 73.76 79.9 (micro average) 98.61 86.66 73.95 79.77 (macro average) ===== Instance-level results ===== Total expected instances: 90125 Total extracted instances: 86169 Total correct instances: 31263 (strict) Total correct instances: 43303 (soft) Total correct instances: 46874 (Levenshtein) Total correct instances: 43004 (RatcliffObershelp) Instance-level precision: 36.28 (strict) Instance-level precision: 50.25 (soft) Instance-level precision: 54.4 (Levenshtein) Instance-level precision: 49.91 (RatcliffObershelp) Instance-level recall: 34.69 (strict) Instance-level recall: 48.05 (soft) Instance-level recall: 52.01 (Levenshtein) Instance-level recall: 47.72 (RatcliffObershelp) Instance-level f-score: 35.47 (strict) Instance-level f-score: 49.13 (soft) Instance-level f-score: 53.18 (Levenshtein) Instance-level f-score: 48.79 (RatcliffObershelp) Matching 1 : 59947 Matching 2 : 3653 Matching 3 : 2810 Matching 4 : 677 Total matches : 67087