Turkish Language Annotation of an Internet Pathology Image Archive.
G. William Moore, MD, PhD [1,2,3].
Enver Vardar, MD [4].
Yener S. Erozan, MD [3].
Fatih Durmusoglu, MD [5].
From:
Pathology and Laboratory Medicine Service,
Veterans Affairs Maryland Health Care System, Baltimore, Maryland [1].
Department of Pathology, University of Maryland School of Medicine,
Baltimore, Maryland [2].
Department of Pathology,
The Johns Hopkins Medical Institutions, Baltimore,
Maryland [3].
Department of Pathology, Sosyal Sigortalar Kurumu (Social Insurance Institution), Izmir, Turkey [4].
Department of Obstetrics and Gynecology,
Marmara University School of Medicine, Istanbul, Turkey [5].
TABLE OF CONTENTS.
1. ABSTRACT.
2. INTRODUCTION.
3. ORGANIZATION OF WRITTEN TURKISH.
4. INSTRUCTIONS FOR TURKISH INPUT.
5. NEW WORD FORMATION FOR TURKISH WORDS.
6. MATERIALS.
7. UNIFIED MEDICAL LANGUAGE SYSTEM.
8. BARRIER WORD METHOD.
9. BARRIER WORD METHOD: TURKISH TRANSLATION.
10. SAMPLE QUERY: ENTER TURKISH.
11. SAMPLE QUERY: SELECT ENGLISH TRANSLATION.
12. SAMPLE QUERY: SELECT UMLS TERM.
13. SAMPLE QUERY: SELECT AFIP LEGEND TITLE.
14. SAMPLE QUERY: VIEW TURKISH ANNOTATIONS.
15. RESULTS.
16. CONCLUSION.
17. REFERENCES.
18. ZIPF DISTRIBUTION: TURKISH TRANSLATION.
1. ABSTRACT.
Background:
Anatomic pathology images in a large archive must be recoverable
both by pathologic diagnosis and by descriptive content.
The Image Archive of The Johns Hopkins Autopsy Resource
website (JHAR-IA), at URL:
http://www.netautopsy.org
consists of over five thousand uncopyrighted anatomic pathology images
from the Armed Forces Institute of Pathology Electronic Fascicles (AFIP-EF).
The images have been computer-indexed in the Unified Medical Language System
(UMLS), based upon corresponding English-language image-legends.
For Turkish speakers who use English as a second language,
it is helpful to annotate these image-legends in Turkish,
so that images may be recalled by Turkish keywords.
Design:
All words and UMLS concepts in the pathology image-legends
of the AFIP-EF posted on the JHAR-IA were mapped to Turkish words
or phrases. Turkish is an Altaic language,
linguistically unrelated to English, but displayed in the Roman alphabet
with six special characters.
Simple noun-phrases, e.g., CARCINOMA OF KIDNEY,
were translated into Turkish with appropriate word-rearrangements
and noun-inflections, according to the rules of Turkish syntax.
Indexing software was written in M-language (formerly, MUMPS),
and display software was written
in the Practical Extraction and Reporting Language (PERL).
Results:
There were 5,465 pathology images posted on the JHAR-IA,
with image-legends containing 5,364 distinct words
and pointing to 3,016 distinct UMLS concepts,
ranging in frequency from 5,465 occurrences each of two UMLS terms
to one occurrence apiece of 875 UMLS terms.
Each word in the image-legends was given a Turkish translation
as an annotation.
There were 1,992 UMLS terms (66%) that were noun-phrases,
prepositional phrases, or other elementary grammatical constructions
that could be computer-translated into grammatically correct Turkish.
Conclusion:
English is the dominant language of the Internet,
but non-native English speakers may be assisted
in finding images based upon non-English keywords.
The Johns Hopkins Autopsy Resource Image Archive website
may be queried on the Internet
with either English or Turkish query-words, and it displays bilingual annotations.
2. INTRODUCTION.
ANATOMIC PATHOLOGY IMAGES RECOVERABLE BY DIAGNOSIS AND DESCRIPTIVE CONTENT.
IMAGE ARCHIVE OF JOHNS HOPKINS AUTOPSY RESOURCE WEBSITE (JHAR-IA):
http://www.netautopsy.org
Click on: 5000 IMAGES. Click on: TURKISH.
OVER FIVE THOUSAND UNCOPYRIGHTED ANATOMIC PATHOLOGY IMAGES
FROM ARMED FORCES INSTITUTE OF PATHOLOGY ELECTRONIC FASCICLES (AFIP-EF).
IMAGES COMPUTER-INDEXED IN UNIFIED MEDICAL LANGUAGE SYSTEM (UMLS).
WORDS AND UMLS CONCEPTS POINTED TO TURKISH WORDS OR PHRASES.
INDEXING SOFTWARE IN M-LANGUAGE (FORMERLY, MUMPS).
DISPLAY SOFTWARE IN PRACTICAL EXTRACTION AND REPORTING LANGUAGE (PERL).
GOAL: ANNOTATE TEXT IN TURKISH.
3. ORGANIZATION OF WRITTEN TURKISH.
ALTAIC LANGUAGE, LINGUISTICALLY UNRELATED TO ENGLISH.
DISPLAYED IN THE ROMAN ALPHABET WITH SIX SPECIAL CHARACTERS:
ç ğ ı ö ş ü
VOWEL HARMONY:
FRONT VOWELS: e i ö ü
BACK VOWELS: a ı o u
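To illustrate two-way vowel harmony, the following Python sketch picks the plural suffix -ler or -lar from the last vowel of a Turkish word, using the front and back vowel sets listed above. It is a simplified illustration, not the authors' M-language code: it assumes lowercase input (Python's default case mapping does not handle the Turkish dotted/dotless I pair) and it ignores irregular loanwords.

```python
# Turkish vowel classes as listed above (lowercase, with the dotless ı).
FRONT_VOWELS = set("eiöü")
BACK_VOWELS = set("aıou")


def last_vowel(word: str) -> str:
    """Return the last vowel of a lowercase Turkish word."""
    for ch in reversed(word):
        if ch in FRONT_VOWELS or ch in BACK_VOWELS:
            return ch
    raise ValueError(f"no vowel in {word!r}")


def plural(word: str) -> str:
    """Attach -ler after a front vowel or -lar after a back vowel."""
    return word + ("ler" if last_vowel(word) in FRONT_VOWELS else "lar")
```

For example, plural("lezyon") gives "lezyonlar" and plural("hücre") gives "hücreler", the forms that appear in the annotated legend of section 14.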
SIMPLE NOUN-PHRASES WITH WORD-REARRANGEMENTS AND NOUN-INFLECTIONS.
EXAMPLE: CARCINOMA OF KIDNEY.
WORD-FOR-WORD: KARSINOMA SI BÖBREK
REARRANGED: BÖBREK KARSINOMA SI
INFLECTED: BÖBREK KARSINOMASI
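The three steps above can be sketched in Python for the single pattern X OF Y. This is a hypothetical illustration, not the JHAR-IA implementation: the two-word lexicon is invented for the example, and the possessive rule covers only four-way vowel harmony with the buffer consonant s. It does not apply consonant softening (the possessive of böbrek is böbreği, which this sketch would get wrong).

```python
import re

FRONT_VOWELS = set("eiöü")
BACK_VOWELS = set("aıou")
VOWELS = FRONT_VOWELS | BACK_VOWELS

# Four-way harmony for the 3rd-person possessive: last vowel -> suffix vowel.
POSSESSIVE_VOWEL = {"a": "ı", "ı": "ı", "e": "i", "i": "i",
                    "o": "u", "u": "u", "ö": "ü", "ü": "ü"}

# Hypothetical mini-lexicon (English -> Turkish), lowercase.
LEXICON = {"carcinoma": "karsinoma", "kidney": "böbrek"}


def possessive(noun: str) -> str:
    """Add the possessive suffix -(s)ı / -(s)i / -(s)u / -(s)ü."""
    vowel = next(ch for ch in reversed(noun) if ch in VOWELS)
    suffix = POSSESSIVE_VOWEL[vowel]
    # The buffer consonant "s" is inserted when the noun ends in a vowel.
    return noun + ("s" + suffix if noun[-1] in VOWELS else suffix)


def translate_of_phrase(phrase: str) -> str:
    """Translate 'HEAD OF MODIFIER' into Turkish 'modifier head-POSS'."""
    match = re.fullmatch(r"\s*(\w+)\s+of\s+(\w+)\s*", phrase, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a simple 'X OF Y' phrase: {phrase!r}")
    head, modifier = (LEXICON[w.lower()] for w in match.groups())
    # Word-for-word order is head, modifier; Turkish puts the modifier
    # first and marks the head noun with the possessive suffix.
    return f"{modifier} {possessive(head)}"
```

With this lexicon, translate_of_phrase("CARCINOMA OF KIDNEY") returns "böbrek karsinoması".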
4. INSTRUCTIONS FOR TURKISH INPUT.
Attention!
To enter a Turkish search-word,
type the special letters of the Turkish language
without diacritical marks, as follows:
c for ç ; g for ğ ; i for ı ;
o for ö ; s for ş ; u for ü .
For example, please enter BOBREK, not BÖBREK,
to retrieve image-legends containing
the English word KIDNEY.
Turkish annotations by:
Prof. Yener S. Erozan, M.D., Enver Vardar, M.D., and
Fatih Durmusoglu, M.D.
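A minimal sketch, assuming the archive stores Turkish annotations with their special letters and compares them with ASCII query words: fold both sides to plain letters before matching. The mapping follows the substitution list above; the function names are hypothetical, and the site's actual matching code is not described here.

```python
# Map each Turkish special letter (both cases) to the plain letter users type.
TURKISH_TO_ASCII = str.maketrans({
    "ç": "c", "ğ": "g", "ı": "i", "ö": "o", "ş": "s", "ü": "u",
    "Ç": "C", "Ğ": "G", "İ": "I", "Ö": "O", "Ş": "S", "Ü": "U",
})


def fold_turkish(word: str) -> str:
    """Replace Turkish diacritic letters with their unaccented ASCII letters."""
    return word.translate(TURKISH_TO_ASCII)


def search_key(word: str) -> str:
    """Hypothetical index key: ASCII-folded, then upper-cased."""
    return fold_turkish(word).upper()
```

Folding before upper-casing matters: search_key("böbrek") and search_key("BOBREK") both give "BOBREK", so a stored annotation and a typed query meet on the same key.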
5. NEW WORD FORMATION FOR TURKISH WORDS.
SAY IT IN FRENCH.
TRANSCRIBE PHONETICALLY AS TURKISH.
EXAMPLE: PATHOLOGIE => PATOLOJI
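The French-pronunciation rule can be approximated with a few ordered string rewrites. The rule table below is hypothetical and deliberately tiny; it reproduces PATHOLOGIE => PATOLOJI and a few similar terms, but it is not the authors' procedure, which relies on how the word is actually pronounced.

```python
# Hypothetical, ordered rewrite rules (French spelling -> Turkish spelling).
# They cover only a few common patterns; the order of application matters.
RULES = [
    ("GIE", "JI"),  # -logie -> -loji
    ("PH", "F"),
    ("TH", "T"),
    ("Y", "I"),
]


def french_to_turkish(word: str) -> str:
    """Approximate a Turkish phonetic spelling of a French medical word."""
    result = word.upper()
    for source, target in RULES:
        result = result.replace(source, target)
    # A final mute "E" is not pronounced in French, so drop it.
    if result.endswith("E"):
        result = result[:-1]
    return result
```

For example, french_to_turkish("THYROIDE") gives "TIROID", the spelling tiroid used in the Zipf table of section 18.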
6. MATERIALS.
6,241 LEGEND-TEXTS FROM THE ELECTRONIC FASCICLES OF THE AFIP.
5,465 NON-COPYRIGHTED IMAGES, COMPRESSED 1:10 AS JPEG FILES.
IMAGES LOADED INTO THE INTERNET AUTOPSY DATABASE IMAGE ARCHIVE.
www.netautopsy.org
7. UNIFIED MEDICAL LANGUAGE SYSTEM (UMLS).
UNIFIED MEDICAL LANGUAGE SYSTEM (UMLS):
DEVELOPED BY U.S. NATIONAL LIBRARY OF MEDICINE (USNLM) IN 1986.
PURPOSE: AID DEVELOPMENT OF SYSTEMS TO RETRIEVE ELECTRONIC BIOMEDICAL INFORMATION.
URL:
http://www.nlm.nih.gov/research/umls/
LAST UPDATED: March 19, 1999.
SIZE: 96,412,092 BYTES.
CONCEPT UNIQUE IDENTIFIERS (CUIs): 625,530, MAX=C0700344.
SYNONYMS: 1,362,823.
LANGUAGE: PRIMARILY ENGLISH.
PARTIAL TRANSLATIONS: GERMAN, FRENCH,
SPANISH, ITALIAN, RUSSIAN,
DUTCH, PORTUGUESE, HUNGARIAN,
FINNISH, SWEDISH, NORWEGIAN, DANISH.
NO TURKISH.
OVER 50 SOURCE-VOCABULARIES.
8. BARRIER WORD METHOD.
NATURAL-LANGUAGE MEDICAL TEXT: SEQUENCE OF MEDICAL CONCEPTS
SEPARATED BY GRAMMATICAL OBJECTS.
THE GRAMMATICAL OBJECTS, OR BARRIER WORDS:
NUMERALS, PUNCTUATION, SINGLE LETTERS, ARTICLES, PREPOSITIONS,
AND COMMON VERBS AND MODIFIERS.
MEDICAL CONCEPTS, OR KEYWORDS:
ONE-WORD OR MULTIPLE-WORD TERMS,
CONSISTING OF MEDICALLY SIGNIFICANT WORDS.
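The barrier word method can be sketched as a tokenizer that closes the current keyword phrase whenever it meets a barrier: a numeral, a punctuation mark, a single letter, or a listed word. The barrier-word list here is a small hypothetical sample chosen to fit the sample text of section 9; a practical list would need many more articles, prepositions, verbs, and modifiers.

```python
import re

# Hypothetical, deliberately tiny barrier-word list.
BARRIER_WORDS = {
    "a", "an", "the", "this", "that", "these", "is", "are", "has", "have",
    "of", "from", "into", "in", "with", "because", "elsewhere",
}

# Words, runs of digits, or any single non-space symbol.
TOKEN_RE = re.compile(r"[A-Za-z]+|\d+|[^\sA-Za-z\d]")


def is_barrier(token: str) -> bool:
    """Numerals, punctuation, single letters and listed words are barriers."""
    if not token.isalpha():  # numerals and punctuation
        return True
    if len(token) == 1:      # single letters, e.g. figure panel "c"
        return True
    return token.lower() in BARRIER_WORDS


def barrier_split(text: str) -> list[str]:
    """Return the keyword phrases that lie between barriers, upper-cased."""
    phrases: list[str] = []
    current: list[str] = []
    for token in TOKEN_RE.findall(text):
        if is_barrier(token):
            if current:
                phrases.append(" ".join(current).upper())
                current = []
        else:
            current.append(token)
    if current:
        phrases.append(" ".join(current).upper())
    return phrases
```

For example, barrier_split("because a NEST has MIGRATED from the EPIDERMIS into the DERMIS") returns ["NEST", "MIGRATED", "EPIDERMIS", "DERMIS"]. Multi-word phrases such as LENTIGINOUS COMPOUND NEVUS survive intact because no barrier separates their words.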
9. BARRIER WORD METHOD: TURKISH TRANSLATION.
BARRIER WORD METHOD: SAMPLE TEXT.
LENTIGINOUS COMPOUND NEVUS . this LESION is an EARLY COMPOUND NEVUS ,
because a NEST has MIGRATED from the EPIDERMIS into the DERMIS
( lower right of c ) . elsewhere , the HISTOLOGY
is that of a SIMPLE LENTIGO .
barrier words displayed in lower case.
KEYWORDS DISPLAYED IN UPPER CASE.
LEGEND NAME | UMLS CODE | UMLS NAME | TURKISH
LENTIGINOUS | C0023321 | Lentigo | lentigo
COMPOUND NEVUS | C0259781 | Compound Nevus | bileşim nevüs
LESION | C0012634 | Lesion | lezyon
EARLY | C0205085 | Early | erken
COMPOUND NEVUS | C0259781 | Compound Nevus | bileşim nevüs
NEST | C0205234 | Focal | fokal
MIGRATED | C0232902 | Migration | göç
EPIDERMIS | C0014520 | Epidermis | epidermis
DERMIS | C0011646 | Dermis | dermis
LOWER | C0205104 | Inferior | enferiör
RIGHT | C0205090 | Right | sağ
HISTOLOGY | C0019638 | Histologic | histolojik
SIMPLE LENTIGO | C0302255 | Lentigo Simplex | lentigo simpleks
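Each keyword phrase must then be divided into entries of the concept table. One simple way, assumed here rather than taken from the source, is greedy longest match from left to right; it splits LENTIGINOUS COMPOUND NEVUS into the two rows shown in the sample table above. The lexicon holds only the rows of that table, with the Turkish letters written in full.

```python
# Rows of the sample table: legend words -> (UMLS code, Turkish annotation).
LEXICON = {
    "LENTIGINOUS": ("C0023321", "lentigo"),
    "COMPOUND NEVUS": ("C0259781", "bileşim nevüs"),
    "LESION": ("C0012634", "lezyon"),
    "EARLY": ("C0205085", "erken"),
    "NEST": ("C0205234", "fokal"),
    "MIGRATED": ("C0232902", "göç"),
    "EPIDERMIS": ("C0014520", "epidermis"),
    "DERMIS": ("C0011646", "dermis"),
    "LOWER": ("C0205104", "enferiör"),
    "RIGHT": ("C0205090", "sağ"),
    "HISTOLOGY": ("C0019638", "histolojik"),
    "SIMPLE LENTIGO": ("C0302255", "lentigo simpleks"),
}
# Length, in words, of the longest lexicon entry.
MAX_WORDS = max(len(key.split()) for key in LEXICON)


def annotate(phrase: str) -> list[tuple[str, str, str]]:
    """Greedy longest match of a keyword phrase against the lexicon.

    Returns (legend words, UMLS code, Turkish) triples; unknown words are skipped.
    """
    words = phrase.upper().split()
    matches = []
    i = 0
    while i < len(words):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(MAX_WORDS, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in LEXICON:
                code, turkish = LEXICON[candidate]
                matches.append((candidate, code, turkish))
                i += n
                break
        else:
            i += 1  # no lexicon entry starts with this word
    return matches
```

A left-to-right greedy match is easy to reason about but can miss a better split when a long entry overlaps the start of another; the sample phrases here have no such overlaps.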
10. SAMPLE QUERY: ENTER TURKISH.
Enter a Turkish search-word without diacritical marks, e.g., BOBREK,
and click on SUBMIT.
The search engine will return a clickable listing of relevant images.
11. SAMPLE QUERY: SELECT ENGLISH TRANSLATION.
-------------------------------------------------------------------
Search Requested at: Sun Oct 10 12:38:38 1999, Greenwich Mean Time.
Search String Requested: BOBREK
-------------------------------------------------------------------
To begin a search, make a selection,
then click on SUBMIT:
12. SAMPLE QUERY: SELECT UMLS TERM.
-------------------------------------------------------------------
Search Requested at: Sun Oct 10 12:41:41 1999, Greenwich Mean Time.
Search String Requested: KIDNEY böbrek
-------------------------------------------------------------------
Please select the desired UMLS CONCEPT,
and click on the SUBMIT button:
13. SAMPLE QUERY: SELECT AFIP LEGEND TITLE.
-------------------------------------------------------------------
Search Requested at: Sun Oct 10 12:44:44 1999, Greenwich Mean Time.
Search String Requested: C0007134 KIDNEY böbrek
-------------------------------------------------------------------
Please select the desired LEGEND TITLES,
and click on the SUBMIT button:
14. SAMPLE QUERY: VIEW TURKISH ANNOTATIONS.
-------------------------------------------------------------------
Search Requested at: Sun Oct 10 12:47:47 1999, Greenwich Mean Time.
Search String Requested: ###000446 C0007134 KIDNEY böbrek
-------------------------------------------------------------------
###446
METASTATIC RENAL CELL CARCINOMA.
metastazik renal hücre karsinoma .
In this figure, well-defined lobules of tumor are present
içinde bu şekil , iyi - tanımlanmış lobüller tümör olmak var
within the superficial and deep dermis (A).
içinde yüzeyel ve derinn dermis ( bir ) .
These are composed of malignant cells with clear,
bunlar olmak oluşturmak malign hücreler ile temiz ,
highly glycogenated cytoplasm and prominent associated vascularity
yüksekçe glikojenize sitoplazma ve belirgin birleştirilmiş vaskülerite
and hemorrhage (B).
ve hemoraji ( B ) .
Lesions of this type must be differentiated from malignant nodular
lezyonlar bu tip mecbur olmak farklılaşmış 'den malign nodüler
(clear cell) hidradenomas.
( temiz hücre ) hidradenomalar .
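The four sample screens follow a drill-down in which each selection is prepended to the search string. The sketch below reproduces that sequence of search strings with hypothetical in-memory indexes that hold only the values visible on the screens; the real indexes, written in M-language, are not described here, and where the site lists several choices the sketch simply takes the first.

```python
# Hypothetical indexes holding only the values shown on the sample screens.
TURKISH_TO_ENGLISH = {"BOBREK": [("KIDNEY", "böbrek")]}
ENGLISH_TO_CONCEPTS = {"KIDNEY": ["C0007134"]}
CONCEPT_TO_LEGENDS = {"C0007134": ["###000446"]}


def drill_down(query: str) -> list[str]:
    """Return the search string at each stage, always taking the first choice."""
    search = query.upper()
    strings = [search]
    # Stage 1: ASCII Turkish query word -> English word plus Turkish spelling.
    english, turkish = TURKISH_TO_ENGLISH[search][0]
    search = f"{english} {turkish}"
    strings.append(search)
    # Stage 2: English word -> UMLS concept, prepended to the search string.
    concept = ENGLISH_TO_CONCEPTS[english][0]
    search = f"{concept} {search}"
    strings.append(search)
    # Stage 3: UMLS concept -> legend number, prepended again.
    legend = CONCEPT_TO_LEGENDS[concept][0]
    search = f"{legend} {search}"
    strings.append(search)
    return strings
```

Carrying the whole selection history in the search string lets each page be generated statelessly from its request, which matches the accumulating "Search String Requested" lines on the screens.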
15. RESULTS.
UMLS-CODES ASSIGNED TO 5,465 AFIP IMAGE-LEGEND TEXTS.
5,364 DISTINCT WORDS.
3,016 DISTINCT UMLS CONCEPTS.
5,465 OCCURRENCES EACH OF TWO UMLS CONCEPTS.
ONE OCCURRENCE APIECE OF 875 UMLS CONCEPTS.
OTHER UMLS CONCEPTS ASSIGNED TO MULTIPLE IMAGE-LEGENDS.
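The counts above can be reproduced from a table of UMLS codes per legend. This sketch assumes each concept is counted at most once per legend-text, which is consistent with the two most frequent concepts occurring exactly 5,465 times, once in each of the 5,465 legend-texts; the three-legend sample data are hypothetical.

```python
from collections import Counter


def concept_statistics(legend_concepts: dict[str, list[str]]) -> dict[str, int]:
    """Summarize UMLS concept frequencies across image-legends.

    Each concept is counted at most once per legend.
    """
    counts = Counter()
    for concepts in legend_concepts.values():
        counts.update(set(concepts))  # set() drops repeats within one legend
    top = max(counts.values())
    return {
        "distinct_concepts": len(counts),
        "max_frequency": top,
        "concepts_at_max": sum(1 for n in counts.values() if n == top),
        "singletons": sum(1 for n in counts.values() if n == 1),
    }


# Hypothetical three-legend sample.
SAMPLE = {
    "###000001": ["C0030664", "C0441468", "C0022646"],
    "###000002": ["C0030664", "C0441468", "C0007634", "C0007634"],
    "###000003": ["C0030664", "C0441468", "C0022646"],
}
```

On the full archive, the same four fields would correspond to 3,016 distinct concepts, a maximum of 5,465, two concepts at that maximum, and 875 singletons.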
16. CONCLUSION.
ENGLISH IS THE DOMINANT LANGUAGE OF THE INTERNET.
NON-NATIVE ENGLISH SPEAKERS MAY NEED ASSISTANCE.
IMAGE ARCHIVE WEBSITE QUERIED WITH ENGLISH OR TURKISH QUERY-WORDS.
BILINGUAL ANNOTATIONS.
17. REFERENCES.
1. UMLS Knowledge Sources. 9th edition. Documentation. 1998.
National Institutes of Health, National Library of Medicine.
Bethesda, Maryland 20894.
2. College of American Pathologists. Systematized Nomenclature
of Human and Veterinary Medicine (SNOMED International).
College of American Pathologists, Northfield, IL, 1993.
3. Berman JJ, Moore GW.
SNOMED-encoded surgical pathology databases:
A tool for epidemiologic investigation.
Mod Pathol. 1996 Sep;9(9):944-950.
4. Silverberg SG.
SNOMED-encoded surgical pathology databases:
it's no big deal - or is it?
Mod Pathol. 1996 Sep;9(9):953-954.
5. Moore GW, Berman JJ.
Automatic SNOMED coding.
Proc Annu Symp Comput Appl Med Care. 1994;18:225-229.
6. Moore GW, Berman JJ.
Performance analysis of manual and automated
systematized nomenclature of medicine (SNOMED) coding.
Am J Clin Pathol. 1994 Mar;101(3):253-256.
7. Berman JJ, Moore GW.
Object-oriented controlled-vocabulary translator
using TRANSOFT + HyperPAD.
Proc Annu Symp Comput Appl Med Care. 1991;15:973-975.
8. Berman JJ, Moore GW, Donnelly WH, Massey JK, Craig B.
A SNOMED analysis of three years accessioned cases
(40,124) of a surgical pathology department:
implications for pathology-based demographic studies.
Proc Annu Symp Comput Appl Med Care. 1994;18:188-192.
9. Moore GW, Berman JJ, Hanzlick RL, Buchino JJ, Hutchins GM.
A prototype Internet autopsy database.
1625 consecutive fetal and neonatal
autopsy facesheets spanning 20 years.
Arch Pathol Lab Med. 1996 Aug;120(8):782-785.
10. Berman JJ, Moore GW, Hutchins GM.
Internet autopsy database.
Hum Pathol. 1997 Apr;28(4):393-394.
11. Moore GW, Miller RE, Hutchins GM. Indexing by MeSH titles
of natural language pathology phrases identified on first encounter
using the barrier word method. In: Scherrer JR,
Côté RA, Mandil SH, eds.
Computerized Natural Medical Language Processing for Knowledge
Representation. Amsterdam: North-Holland; 1989:29-39.
12. Murphy GF, Elder DA. Armed Forces Institute of Pathology Atlas
of Tumor Pathology. Non-Melanocytic Tumors of the Skin,
Electronic Fascicle version 2.0. Washington, D.C.
Armed Forces Institute of Pathology.
13. Elder DA, Murphy GF. Armed Forces Institute of Pathology Atlas
of Tumor Pathology. Melanocytic Tumors of the Skin,
Electronic Fascicle version 2.0. Washington, D.C.
Armed Forces Institute of Pathology.
14. Murphy WM, Beckwith JB, Farrow GM. Armed Forces Institute
of Pathology Atlas of Tumor Pathology. Tumors of the Kidney,
Bladder and Related Urinary Structures, Electronic Fascicle
version 2.0. Washington, D.C. Armed Forces Institute of Pathology.
15. Rosai J, Carcangiu ML, DeLellis RA. Armed Forces Institute
of Pathology Atlas of Tumor Pathology. Tumors of the Thyroid Gland,
Electronic Fascicle version 2.0. Washington, D.C. Armed Forces Institute
of Pathology.
16. DeLellis RA. Armed Forces Institute of Pathology Atlas
of Tumor Pathology. Tumors of the Parathyroid Gland,
Electronic Fascicle version 2.0. Washington, D.C.
Armed Forces Institute of Pathology.
17. Kurman RJ, Norris HJ, Wilkinson EJ. Armed Forces Institute
of Pathology Atlas of Tumor Pathology. Tumors of the Cervix,
Vagina, and Vulva, Electronic Fascicle version 2.0. Washington, D.C.
Armed Forces Institute of Pathology.
18. Silverberg SG, Kurman RJ. Armed Forces Institute of Pathology
Atlas of Tumor Pathology. Tumors of the Uterine Corpus
and Gestational Trophoblastic Disease, Electronic Fascicle
version 2.0. Washington, D.C. Armed Forces Institute of Pathology.
19. Rosen PP, Oberman HA. Armed Forces Institute of Pathology
Atlas of Tumor Pathology. Tumors of the Mammary Gland,
Electronic Fascicle version 2.0. Washington, D.C.
Armed Forces Institute of Pathology.
20. Burger PC, Scheithauer BW. Armed Forces Institute of Pathology
Atlas of Tumor Pathology. Tumors of the Central Nervous System,
Electronic Fascicle version 2.0. Washington, D.C.
Armed Forces Institute of Pathology.
21. McLean EW, Burnier MN, Zimmerman LE, Jakobiec FA.
Armed Forces Institute of Pathology Atlas of Tumor Pathology.
Tumors of the Eye and Ocular Adnexa, Electronic Fascicle version 2.0.
Washington, D.C. Armed Forces Institute of Pathology.
22. Colby TV, Koss MN, Travis WD. Armed Forces Institute of Pathology
Atlas of Tumor Pathology. Tumors of the Lower Respiratory Tract,
Electronic Fascicle version 2.0. Washington, D.C.
Armed Forces Institute of Pathology.
23. Brunning RD, McKenna RW. Armed Forces Institute of Pathology
Atlas of Tumor Pathology. Tumors of the Bone Marrow,
Electronic Fascicle version 2.0. Washington, D.C.
Armed Forces Institute of Pathology.
24. Fechner RE, Mills SE. Armed Forces Institute of Pathology
Atlas of Tumor Pathology. Tumors of the Bones and Joints,
Electronic Fascicle version 2.0. Washington, D.C.
Armed Forces Institute of Pathology.
18. ZIPF DISTRIBUTION: TURKISH TRANSLATION.
RANK | FREQUENCY | UMLS CODE | UMLS NAME | TURKISH
1 | 5465 | C0030664 | PATHOLOGY | patoloji
2 | 5465 | C0441468 | PHOTOGRAPH | fotoğraf
3 | 2016 | C0007634 | CELL | hücre
4 | 1812 | C0441469 | PICTURE | resim
5 | 1140 | C0027651 | NEOPLASM | neoplazm
6 | 1102 | C0024109 | LUNG | akciğer
7 | 644 | C0012634 | DISEASE | hastalık
8 | 617 | C0030705 | PATIENT | hasta
9 | 581 | C0205165 | SMALL | küçük
10 | 569 | C0205234 | FOCAL | fokal
11 | 549 | C0205164 | LARGE | büyük
12 | 528 | C0233426 | APPEAR | gözüken
13 | 522 | C0006141 | BREAST | meme
14 | 487 | C0205397 | OBSERVE | gözlemek
15 | 466 | C0038128 | STAIN | boya
16 | 458 | C0150312 | PRESENT | var
17 | 421 | C0015392 | EYE | göz
18 | 413 | C0445247 | SAME | aynı
19 | 408 | C0010834 | CYTOPLASM | sitoplazma
20 | 407 | C0205392 | SOME | bazı
21 | 401 | C0205182 | ATYPICAL | atipik
22 | 387 | C0022646 | KIDNEY | böbrek
23 | 375 | C0205402 | PROMINENT | belirgin
24 | 365 | C0449774 | PATTERN | dağılım
25 | 347 | C0205091 | LEFT | sol
26 | 341 | C0040132 | THYROID | tiroid
27 | 336 | C0205160 | NEGATIVE | negatif
28 | 324 | C0205090 | RIGHT | sağ
29 | 323 | C0042149 | UTERUS | uterus
30 | 320 | C0449470 | TYPE | tip
31 | 316 | C0007097 | CANCER | kanser
32 | 300 | C0205172 | MANY | çok
33 | 300 | C0370003 | SPECIMEN | materyal
34 | 294 | C0014609 | EPITHELIUM | epitel
35 | 292 | C0262950 | BONE | kemik
36 | 285 | C0332285 | ARISING FROM | 'dan kaynaklanan
37 | 285 | C0444186 | SMEAR | yayma
38 | 279 | C0005953 | BONE MARROW | kemik iliği
39 | 275 | C0017542 | GIEMSA STAIN | Giemsa boyası
40 | 272 | C0018964 | HEMATOXYLIN | hematoksilin
41 | 270 | C0205428 | AFFECTING | etkileyen
42 | 269 | C0007874 | CERVIX | serviks
43 | 268 | C0431085 | TUMOR CELLS | tümör hücreleri
44 | 262 | C0042591 | VESSEL | damar
45 | 258 | C0014448 | EOSIN | eozin
46 | 253 | C0205250 | ELEVATED | yükseltilmiş
47 | 249 | C0024264 | LYMPHOCYTE | lenfosit
48 | 247 | C0205308 | OLD | eski
49 | 236 | C0439508 | YEAR | yıl
50 | 234 | C0392746 | WELL | iyi
FREQUENCY DISTRIBUTION OF 50 MOST FREQUENT UMLS CONCEPTS IN AFIP LEGEND-TEXTS.
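Zipf's law predicts that frequency falls roughly as a power of rank, f(r) ∝ r^(-s), with s near 1 for word frequencies in natural text. A least-squares line through log(frequency) against log(rank) estimates -s; the sketch below applies it to the 50 frequencies in the table above. This is an illustrative check, not an analysis reported by the authors, and a fit over only the top 50 ranks is sensitive to the two tied concepts at ranks 1 and 2.

```python
import math

# Frequencies of the 50 most frequent UMLS concepts, ranks 1..50 (table above).
FREQUENCIES = [
    5465, 5465, 2016, 1812, 1140, 1102, 644, 617, 581, 569,
    549, 528, 522, 487, 466, 458, 421, 413, 408, 407,
    401, 387, 375, 365, 347, 341, 336, 324, 323, 320,
    316, 300, 300, 294, 292, 285, 285, 279, 275, 272,
    270, 269, 268, 262, 258, 253, 249, 247, 236, 234,
]


def zipf_slope(freqs: list[float]) -> float:
    """Least-squares slope of log(frequency) against log(rank), ranks from 1."""
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx
```

A slope near -1 would match the classical Zipf form; a shallower slope means the frequencies of the mid-ranked concepts fall off more slowly than an ideal Zipf distribution.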