Article Abstract
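Huang Zheng, Zhang Xuefu. A Research on Open Access Journal Resource Acquisition Method Based on Web Information Extraction[J]. Digital Library Forum (数字图书馆论坛), 2017, (5): 25-32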
A Research on Open Access Journal Resource Acquisition Method Based on Web Information Extraction
DOI:
Keywords: Open Access Journal; Open Access Journal Resource Acquisition; Web Information Acquisition; Open Access Journal Resource Acquisition System
Funding:
Author  Affiliation
Huang Zheng  Chinese Academy of Agricultural Sciences
Zhang Xuefu  Chinese Academy of Agricultural Sciences
Abstract views: 1990
Full-text downloads: 1494
Abstract:
      Open access (OA) journals have significant academic value, but some of them do not support the OAI-PMH protocol, so their resources cannot be harvested through it. For such journals, this paper proposes a resource acquisition strategy based on web information extraction. It first summarizes and classifies the characteristics of OA journal resources from the perspective of how they are described on web pages. Based on the applicability of web information extraction to OA journal resource acquisition, it then proposes an acquisition method that extracts metadata from OA journal web pages, and designs an acquisition system on top of this method. An empirical harvest of the websites of 10 Chinese and international OA journals that do not follow the OAI-PMH protocol yielded metadata for 45,785 papers, showing that the method can be applied effectively to this kind of resource. The study enriches the available ways of acquiring OA journal resources and offers a methodological reference for acquiring resources from OA journals that do not follow the OAI-PMH protocol.