Karwan Jacksi

Published Journal Articles

2025

KSTRV1: A scene text recognition dataset for central Kurdish in (Arabic-Based) script

2025-05

Data in Brief (Issue : 5) (Volume : 60)

Scene Text Recognition (STR) has advanced significantly in recent years, yet languages utilizing Arabic-based scripts, such as Kurdish, remain underrepresented in existing datasets. This paper introduces KSTRV1, the first large-scale dataset designed for Kurdish Scene Text Recognition (KSTR), addressing the lack of resources for non-Latin scripts. The dataset comprises 1,420 natural scene images and 19,872 cropped word samples, covering Kurdish (Sorani and Badini dialects), Arabic, and English. Additionally, 20,000 synthetic text instances have been generated to enhance the dataset’s diversity, quantity, and quality by incorporating varied fonts, orientations, distortions, and background complexities. KSTRV1 captures the multilingual landscape of the Kurdistan Region while addressing real-world challenges like occlusion, lighting variations, and script complexity. The dataset includes detailed annotations with bounding boxes, language identification, and text orientation labels, ensuring comprehensive support for training and evaluating STR models. By providing both natural and synthetic data, KSTRV1 enables the development of robust text recognition models, particularly for Central Kurdish, a low-resource language. The KSTRV1 dataset is publicly available at https://doi.org/10.5281/zenodo.15038953 and is expected to significantly contribute to research in multilingual STR, document analysis, and optical character recognition (OCR), facilitating more inclusive and accurate text recognition systems.

2023

A hybrid part-of-speech tagger with annotated Kurdish corpus: advancements in POS tagging

2023-10

Digital Scholarship in the Humanities (Issue : 1) (Volume : 9)

With the rapid growth of online content written in the Kurdish language, there is an increasing need to make it machine-readable and processable. Part of speech (POS) tagging is a critical aspect of natural language processing (NLP), playing a significant role in applications such as speech recognition, natural language parsing, information retrieval, and multiword term extraction. This study details the creation of the DASTAN corpus, the first POS-annotated corpus for the Sorani Kurdish dialect. The corpus, containing 74,258 words and thirty-eight tags, employs a hybrid approach utilizing the bigram hidden Markov model in combination with the Kurdish rule-based approach to POS tagging. This approach addresses two key problems that arise with rule-based approaches, namely misclassified words and ambiguity-related unanalyzed words. The proposed approach’s accuracy was assessed by training and testing it on the DASTAN corpus, yielding a 96% accuracy rate. Overall, this study’s findings demonstrate the effectiveness of the proposed hybrid approach and its potential to enhance NLP applications for Sorani Kurdish.
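The statistical half of the hybrid tagger described above is a bigram hidden Markov model. The following is a minimal, generic sketch of bigram HMM training plus Viterbi decoding, assuming add-one smoothing and a tiny invented English training set. It is not the DASTAN tagger: the thirty-eight-tag Sorani tag set and the Kurdish rule-based component are not reproduced here.

```python
import math
from collections import defaultdict

def train_bigram_hmm(tagged_sentences):
    """Count tag-bigram transitions and word emissions from [(word, tag), ...] sentences."""
    trans = defaultdict(lambda: defaultdict(int))  # trans[prev_tag][tag]
    emit = defaultdict(lambda: defaultdict(int))   # emit[tag][word]
    tag_counts = defaultdict(int)
    vocab = set()
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            tag_counts[tag] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, tag_counts, vocab

def viterbi(words, trans, emit, tag_counts, vocab):
    """Return the most probable tag sequence for `words` under the bigram HMM."""
    if not words:
        return []
    tags = list(tag_counts)
    vocab_size = len(vocab) + 1  # one extra slot for unseen words

    def log_trans(prev, tag):
        # Add-one smoothed P(tag | prev).
        total = sum(trans[prev].values())
        return math.log((trans[prev][tag] + 1) / (total + len(tags)))

    def log_emit(tag, word):
        # Add-one smoothed P(word | tag); unseen words keep a small non-zero probability.
        return math.log((emit[tag][word] + 1) / (tag_counts[tag] + vocab_size))

    # best[i][tag] = (log score of the best path ending in `tag` at position i, previous tag)
    best = [{t: (log_trans("<s>", t) + log_emit(t, words[0]), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            score, prev = max((best[i - 1][p][0] + log_trans(p, t), p) for p in tags)
            column[t] = (score + log_emit(t, words[i]), prev)
        best.append(column)

    # Follow back-pointers from the highest-scoring final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]

train = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("runs", "VERB")],
]
model = train_bigram_hmm(train)
print(viterbi(["a", "dog", "sleeps"], *model))  # ['DET', 'NOUN', 'VERB']
```

In a hybrid design of the kind the abstract describes, a rule-based layer would handle or override words the statistical model misclassifies. That layer is language-specific and is deliberately left out of this sketch.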

An Intelligent and Advance Kurdish Information Retrieval Approach with Ontologies: A Critical Analysis

2023-09

International Journal of Intelligent Systems and Applications in Engineering (Issue : 11) (Volume : 11)

Today, there are numerous ways of finding information: radio, TV, and the Internet all provide answers. The Internet stands out as particularly helpful, because users can search by typing questions about any subject they wish. Results appear as links to various documents on the Internet, some of which may not be relevant given the vast amount of material. Search engines that rely solely on keywords cannot make sense of raw data, which makes extracting the critical pieces from an immense collection of web pages time-consuming and costly. These deficiencies gave rise to concepts such as the Semantic Web (SW) and ontologies. The SW serves as an excellent gateway for retrieving key information through various Information Retrieval (IR) techniques, whereas traditional IR algorithms are too simplistic to extract the semantic content of texts. IR, the SW, and ontologies are closely related and are often discussed together. The SW can be achieved through IR, and indexing can support its creation on the web. The SW is also built through ontologies: ontologies can be combined with intelligent approaches to produce web content, which is then marked up as SW documents. Because ontologies serve as the backbone of such software, the SW becomes simpler to comprehend. Ontology development is the process of creating and refining an ontology over time. This paper investigates various approaches, methodologies, and datasets used to address challenges in information retrieval, including corpus preparation, annotation techniques, query expansion, semantic reasoning, content alignment, and ontology-based retrieval systems.

Web Solution for Processing and Visualizing Mass-Spectrometry Data and Protein Peptides Identified in Cancer Patients

2023-07

International Journal of Intelligent Systems and Applications in Engineering (Issue : 11) (Volume : 3)

This paper addresses the critical problem of processing and visualizing mass spectrometry data and protein peptides identified in cancer patients. The growing volume of data produced by advanced technologies, such as mass spectrometry, has necessitated the development of computer systems capable of effectively storing, analyzing, and presenting this data. In response to this challenge, a web-based solution is presented that empowers researchers and clinicians to gain valuable insights through network visualization of peptides and their associated data points across various cancer types and patient cohorts. By leveraging Laravel on PHP 8, the system provides a robust foundation for efficient data processing and management. In addition, an API enables seamless communication with a TypeScript and React-based front end, resulting in an engaging and interactive user experience. The platform's ability to present the complex relationships between protein peptides and cancer-specific data as a network visualization offers researchers and clinicians a powerful tool for exploring and interpreting the data effectively. This web-based solution contributes to the advancement of proteomics research and holds great potential for improving cancer treatment outcomes. By facilitating the exploration and analysis of mass spectrometry data and protein peptides, the system enables researchers to uncover valuable patterns and insights that can inform the development of more effective treatments for cancer patients. Through this work, we aim to make a meaningful contribution to cancer research and to provide a valuable resource for the scientific community.

A Semantics-Based Clustering Approach for Online Laboratories Using K-Means and HAC Algorithms

2023-01

Mathematics (Issue : 3) (Volume : 11)

Due to the availability of a vast amount of unstructured data in various forms (e.g., the web, social networks), the clustering of text documents has become increasingly important. Traditional clustering algorithms have not been able to solve this problem because they do not capture the semantic relationships between words and therefore cannot accurately represent the meaning of the documents. Semantic document clustering has thus been used extensively to enhance the quality of text clustering. It is an unsupervised learning method that groups documents based on their meaning rather than on common keywords. This paper introduces a new method that groups documents from online laboratory repositories based on a semantic similarity approach. First, the dataset is collected by crawling the short real-time descriptions of online laboratory repositories from the Web. A vector space is then created using term frequency-inverse document frequency (TF-IDF), and clustering is performed using the K-Means and Hierarchical Agglomerative Clustering (HAC) algorithms with different linkages. Three scenarios are considered: without preprocessing (WoPP), preprocessing with stemming (PPwS), and preprocessing without stemming (PPWoS). Experiments were conducted on five datasets (online labs, 20 NewsGroups, Txt_sentoken, NLTK_Brown, and NLTK_Reuters) and evaluated with several metrics: Silhouette average, purity, V-measure, F1-measure, accuracy score, homogeneity score, completeness, and NMI score. Finally, the results of the proposed work are compared and visualized on an interactive webpage.
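The TF-IDF plus K-Means pipeline described above can be sketched in plain Python. This is an illustrative sketch, not the paper's implementation: it assumes whitespace tokenization, raw term counts multiplied by log(N/df), L2-normalized vectors, and deterministic maximin seeding; it omits stemming, HAC, and the evaluation metrics; and the four lab descriptions are invented.

```python
import math
from collections import Counter, defaultdict

def tfidf_vectors(docs):
    """L2-normalized TF-IDF vectors as sparse dicts (term -> weight)."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(docs)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vec = {t: count * math.log(n / df[t]) for t, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in a.keys() | b.keys())

def kmeans(vectors, k, iterations=20):
    # Deterministic maximin seeding: start from the first document, then repeatedly
    # add the document farthest from every centroid chosen so far.
    centroids = [dict(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(dict(farthest))

    labels = None
    for _ in range(iterations):
        # Assignment step: each document joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its member documents.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                total = defaultdict(float)
                for v in members:
                    for t, w in v.items():
                        total[t] += w
                centroids[j] = {t: w / len(members) for t, w in total.items()}
    return labels

docs = [
    "virtual physics lab pendulum experiment",
    "remote physics lab optics experiment",
    "online chemistry lab titration simulation",
    "virtual chemistry lab reaction simulation",
]
print(kmeans(tfidf_vectors(docs), k=2))  # [0, 0, 1, 1]
```

Note that "lab" occurs in every description, so its IDF (and weight) is zero; the grouping is driven by the rarer, topic-bearing terms. Maximin seeding is used here only to keep the example deterministic; randomized seeding such as k-means++ is more common in practice.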

2021

Task Scheduling Algorithms in Cloud Computing: A Review

2021-04

Turkish Journal of Computer and Mathematics Education (Issue : 4) (Volume : 12)

Cloud computing is an on-demand model, driven by client requirements, that provides many resources and shares them as services over the Internet. For optimal use, cloud computing resources such as storage, applications, and other services must be managed and scheduled. The principal aim of scheduling is to minimize lost time and workload and to maximize throughput, so task scheduling is essential for completing tasks accurately and correctly. This paper gives an overview of various task scheduling algorithms that researchers have used in the cloud computing environment. Finally, it notes that many authors evaluate their systems using parameters such as completion time, throughput, and cost.
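Completion time, one of the evaluation parameters mentioned above, is easiest to see with a concrete heuristic. The sketch below implements Min-Min, a classic list-scheduling heuristic chosen purely for illustration; this abstract does not say which algorithms the review covers. It assumes a known matrix of expected execution times `etc[task][vm]` (values invented) and reports the makespan, i.e. the time at which the last virtual machine finishes.

```python
def min_min(etc):
    """Min-Min scheduling.

    etc[i][j] is the expected time to compute task i on virtual machine j.
    Returns (assignment, makespan), where assignment maps task -> machine.
    """
    n_machines = len(etc[0])
    ready = [0.0] * n_machines          # time at which each machine becomes free
    unscheduled = set(range(len(etc)))
    assignment = {}
    while unscheduled:
        # For every unscheduled task, consider its earliest completion time on any
        # machine, then commit the (task, machine) pair with the smallest one overall.
        best_ct, task, machine = min(
            (ready[j] + etc[i][j], i, j)
            for i in unscheduled
            for j in range(n_machines)
        )
        assignment[task] = machine
        ready[machine] = best_ct
        unscheduled.remove(task)
    return assignment, max(ready)

etc = [
    [3, 5],  # task 0
    [2, 4],  # task 1
    [6, 3],  # task 2
]
assignment, makespan = min_min(etc)
print(assignment, makespan)  # {1: 0, 2: 1, 0: 0} 5.0
```

Min-Min favors short tasks, which tends to keep the makespan low but can delay long tasks; throughput and cost, the other parameters named above, would need additional inputs such as per-VM prices.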

State of Art for Semantic Analysis of Natural Language Processing

2021-03

Qubahan Academic Journal (Issue : 2) (Volume : 1)

Semantic analysis is an essential part of NLP. It captures, in an appropriate format, the context of a sentence or paragraph. Semantics is the study of meaning in language: the vocabulary used conveys the meaning of a subject through the interrelationships between linguistic classes. This article reviews semantic interpretation in the area of Natural Language Processing. The findings suggest that, among the reviewed papers, those relying on the sentiment analysis approach achieved the best accuracy, with minimal prediction error.

An Automated Early Alert System for Natural Disaster Risk Reduction: A Review

2021-03

QALAAI ZANIST JOURNAL (Issue : 1) (Volume : 6)

According to research published over the last decades, many people have died because of natural disasters. Researchers have therefore tried to find methods and solutions to reduce these disasters and their risks. Unfortunately, the country has no effective warning system for certain dangerous disasters. Such a system would help in diagnosing this kind of problem, although every country follows different tactics. Having examined the various sources of natural weather monitoring across the country's heterogeneous regions, this review found no solution that warns the community in real time. The review aims to identify the weaknesses of the current situation in light of today's technological growth. Today, new mobile application technology supports early alert systems for natural disaster risk reduction (DRR), which authorities employ in several ways to reduce natural disaster risks.

A state-of-the-art survey on semantic similarity for document clustering using GloVe and density-based algorithms

2021-02

Indonesian Journal of Electrical Engineering and Computer Science (Issue : 1) (Volume : 22)

Semantic similarity is the process of identifying semantically relevant data. The traditional way of identifying document similarity relies on synonymous keywords and syntax. In contrast, semantic similarity finds similar data using the meaning of words. Clustering groups objects that share the same features and properties into a cluster, separating them from objects with different features and properties. In semantic document clustering, documents are clustered using semantic similarity techniques with similarity measurements. One common technique for clustering documents is density-based clustering, which uses the density of data points as its main strategy for measuring the similarity between them. In this paper, a state-of-the-art survey is presented to analyze density-based algorithms for clustering documents. Furthermore, the similarity and evaluation measures used with the selected algorithms are investigated to identify the most common ones. The review revealed that the most used density-based algorithms in document clustering are DBSCAN and DPC. The most effective similarity measure used with density-based algorithms, specifically DBSCAN and DPC, is cosine similarity, with the F-measure used to evaluate performance and accuracy.
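To make the pairing of DBSCAN with cosine similarity concrete, here is a minimal, self-contained DBSCAN over dense document vectors using cosine distance (1 minus cosine similarity). The `eps` and `min_pts` values and the toy three-term vectors are assumptions chosen for illustration. Real document clustering would start from TF-IDF or GloVe vectors, and DPC is not shown.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    NOISE, UNVISITED = -1, None
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # The eps-neighbourhood includes the point itself, as in the original DBSCAN.
        return [j for j in range(len(points)) if cosine_distance(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE               # may later be relabelled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster         # border point: density-reachable, not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point: keep expanding through it
        cluster += 1
    return labels

points = [
    [1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [1.0, 0.0, 0.1],  # documents about topic A
    [0.0, 1.0, 0.9], [0.1, 0.9, 1.0], [0.0, 1.0, 1.0],  # documents about topic B
    [1.0, 1.0, 0.0],                                    # a document matching neither
]
print(dbscan(points, eps=0.05, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

Cosine distance ignores vector length, so a long and a short document on the same topic still count as close, which is one reason it pairs well with density-based clustering of text.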

The Importance of E-Learning in the Teaching Processor Secondary Schools /Review Article

2021-01

Academic Journal of Nawroz University (Issue : 1) (Volume : 10)

This study explores the usefulness of e-learning for teaching in secondary institutions. The use of new information and communication technology for teaching and learning is highly relevant to secondary education institutions. Students can also handle the latest technologies well. In addition, the school must play an important role in providing training courses for teachers to build their skills in using modern technologies and in encouraging them to download the e-curriculum from the ministry's website. However, obstacles to its application remain. First, the content of the curricula is not compatible with e-learning. Second, the technical infrastructure needed to establish an e-learning system in general secondary schools is weak. Third, students and teachers have little awareness of the importance of e-learning, and principals and teachers lack sufficient competence, with teachers finding it difficult to accept this kind of education. This paper examines the concept and description of e-learning as presented by different researchers, the role that e-learning plays in the teaching and learning processes of secondary education institutions, and the advantages and disadvantages of adopting and implementing it.

2020

AN HRM SYSTEM FOR SMALL AND MEDIUM ENTERPRISES (SME)S BASED ON CLOUD COMPUTING TECHNOLOGY

2020-08

International Journal of Research -GRANTHAALAYAH (Issue : 8) (Volume : 8)

Technology has changed our lives and the way we work, including many working methods in Small and Medium Enterprises (SMEs). Human Resources (HR) is one of the core components of a business, and most businesses now use technology for their daily tasks, although this is not yet the case everywhere. In the Kurdistan Region of Iraq (KRI), most SMEs still work the old way and follow paper-based methods for their daily tasks. According to a survey, more than seventy percent of SMEs in Kurdistan do not use software to manage human resource tasks. Some large companies do use a human resource management system (HRMS), but even they make little use of cloud technology. In this study, an Enterprise Human Resource Management System (EHRMS) model is proposed and implemented to solve HR problems in this area using cloud technology. The proposed system consists of sixteen standard modules commonly found in well-known HRM systems. It was developed with several technologies, using CodeIgniter as the software framework, and is launched and deployed on Amazon Web Services (AWS) Elastic Compute Cloud (EC2).

Football Ontology Construction using Oriented Programming

2020-03

Journal of Applied Science and Technology Trends (Issue : 1) (Volume : 1)

According to the W3C, the semantic web is the future of the World Wide Web. Data based on the semantic web can be understood by machines and devices. The main component of the semantic web is the ontology, which is known as its backbone. Many tools are used to create and edit ontologies; however, few studies construct an ontology using oriented programming. SPARQL and the OWL API are used to access and edit ontologies, but they do not use oriented programming. The main objective of this paper is to build an ontology using oriented programming that allows access to OWL entities. The Owlready module is used effectively to build a sports ontology for football covering 11 European leagues.

State of the art document clustering algorithms based on semantic similarity

2020-02

Jurnal Informatika (Issue : 2) (Volume : 14)

The continued success of the Internet has greatly increased the number of text documents in electronic form, and techniques for grouping these documents into meaningful clusters have become critical. Traditional clustering methods were based on statistical features and grouped documents syntactically rather than semantically. However, because of polysemy and synonymy, these techniques placed dissimilar data in the same group. An important solution to this issue is document clustering based on semantic similarity, in which documents are grouped according to their meaning rather than their keywords. In this research, eighty papers that use semantic similarity in different fields have been reviewed; forty of them, which apply semantic similarity to document clustering and were published over seven recent years (2014 to 2020), were selected for in-depth study. A comprehensive literature review of all the selected papers is presented, together with a detailed comparison of their clustering algorithms, tools, and evaluation methods. This helps in implementing and evaluating document clustering, and the reviewed research guided the preparation of the proposed research in the same direction. Finally, an intensive discussion comparing the works is presented, and the results of the research are shown in figures.

2018

A State of Art Survey for OS Performance Improvement

2018-09

Science Journal of University of Zakho (Issue : 3) (Volume : 6)

With the huge growth of computing-intensive applications that require a high level of performance, interest in monitoring operating system (OS) performance has also grown widely. Over the past several years, as OS performance has become a critical issue, many studies have investigated and evaluated the stability of OS performance. This paper presents a survey of the most important state-of-the-art approaches and models used for performance measurement and evaluation. Furthermore, the research highlights the performance-improvement capabilities of different operating systems using multiple metrics. The choice of metrics for monitoring performance depends on the monitoring goals and performance requirements. Many previous works on this subject are discussed in detail and compared to highlight the most important features to rely on when selecting the best approach.

Student Attendance Management System

2018-02

International Journal of Engineering and Technology (Issue : 2) (Volume : 6)

Attendance management is important to every organization; it can determine whether an organization, such as an educational institution or a public- or private-sector body, will be successful in the future. Organizations have to keep track of the people within them, such as employees and students, to maximize their performance. Managing student attendance during lecture periods has become a difficult challenge. Computing the attendance percentage is a major task, because manual computation is error-prone and wastes a lot of time. For this reason, an efficient web-based attendance management system is designed to track students' activity in class. The application takes attendance electronically and stores the attendance records in a database. The system is designed using the Model-View-Controller (MVC) architecture and implemented with the Laravel framework. JavaScript is added to improve the system's usability, and MySQL is used for the application database. The system is designed to differentiate between the hours of theoretical and practical lessons, since they are rated differently when calculating students' absence percentages. Data can be inserted, deleted, and changed straightforwardly through the GUI without interacting with the tables, and the system offers different presentations of information. Testing showed that the system works very well and is ready to manage student attendance in any department of the university.
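The abstract notes that theoretical and practical hours count differently when computing absence percentages, but it does not give the weighting. The sketch below therefore assumes a hypothetical rule in which one practical hour weighs twice one theoretical hour (`PRACTICAL_WEIGHT = 2.0`). The `Lesson` record layout and the `absence_percentage` function are likewise illustrative, written in Python rather than the system's actual Laravel/MySQL code.

```python
from dataclasses import dataclass

# Hypothetical weights (assumption): the abstract says theoretical and practical
# hours are rated differently but does not give the actual ratio.
THEORY_WEIGHT = 1.0
PRACTICAL_WEIGHT = 2.0

@dataclass
class Lesson:
    kind: str       # "theory" or "practical"
    hours: int
    attended: bool

def absence_percentage(lessons):
    """Weighted percentage of scheduled lesson hours that the student missed."""
    weight = {"theory": THEORY_WEIGHT, "practical": PRACTICAL_WEIGHT}
    total = sum(weight[lesson.kind] * lesson.hours for lesson in lessons)
    missed = sum(weight[lesson.kind] * lesson.hours for lesson in lessons if not lesson.attended)
    return 100.0 * missed / total if total else 0.0

record = [
    Lesson("theory", 2, attended=True),
    Lesson("theory", 2, attended=False),
    Lesson("practical", 2, attended=True),
    Lesson("practical", 2, attended=True),
]
print(round(absence_percentage(record), 2))  # 16.67
```

With equal weights, the same record (2 of 8 hours missed) would give 25%. The weighting makes a missed theory hour cost less than a missed practical hour, which is the kind of distinction the abstract says the system must draw.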

LOD explorer: Presenting the Web of data

2018-01

International Journal of Advanced Computer Science and Applications (Issue : 1) (Volume : 9)

The quantity of data published on the Web according to Linked Data principles is increasing rapidly. However, its use is still largely limited to domain professionals and users who understand Linked Data technologies. It is therefore essential to develop tools that make Linked Data intuitive for lay users. The features of Linked Data pose various challenges for easy-to-use data presentation. In this paper, Semantic Web and Linked Data technologies are overviewed, the challenges of presenting Linked Data are stated, and LOD Explorer is presented. LOD Explorer aims to deliver a simple application for discovering triplestore resources, hiding the technical challenges behind Linked Data and providing both specialist and non-specialist users with an interactive and effective way to explore RDF resources.

2016

State of the Art Exploration Systems for Linked Data: A Review

2016-11

International Journal of Advanced Computer Science and Applications (Issue : 11) (Volume : 7)

The ever-increasing amount of data available on the web results from the simplicity of sharing data over the current Web. Retrieving relevant information efficiently from this huge dataspace requires sophisticated search technology, a task further complicated by the variety of data formats in use. Semantic Web (SW) technology plays a prominent role in helping search engines alleviate this issue by providing a way to understand the contextual meaning of data, so that relevant, high-quality results can be retrieved. An Exploratory Search System (ESS) is a data-seeking and search approach that helps searchers learn about and explore unclear topics and information-seeking goals through a set of actions. Linked Open Data (LOD) is the optimal choice for obtaining high-quality results in ESSs. In this paper, SW technology is reviewed, an overview of search strategies is provided, and a survey of state-of-the-art Linked Data Browsers (LDBs) and ESSs based on LOD follows. Finally, the LDBs and ESSs are compared with respect to several features, such as algorithms, data presentations, and explanations.

2015

Design And Implementation Of Online Submission and Peer Review System A Case Study Of E-Journal Of University Of Zakho

2015-08

International Journal of Scientific & Technology Research (Issue : 4) (Volume : 8)

With the aim of designing and implementing a web-based article submission management system for academic research papers, several international models, such as the Elsevier Editorial System and ICOCI (International Conference on Computing and Informatics), are studied and analyzed. Based on this analysis, an open-access web-based article submission and peer review system is implemented for the Journal of University of Zakho (JUOZ). Such a system not only solves issues such as complex manuscript management, delays in the review process, and the loss of manuscripts that often occur in offline submission and review processes, but also lays the foundation for e-journal publication. As a result, an active and rapid medium for scholarly communication can be created. The implementation and deployment of this system can improve the university's ranking as well as the reputation and globalization of its science and technology research journals.

Database Teaching in Different Universities: A Phenomenographic Research

2015-05

International Journal of Emerging Technologies in Computational and Applied Sciences (IJETCAS) (Issue : 4) (Volume : 3)

This research discusses the different teaching methodologies practiced in the basic database course at different universities. The paper is based on research conducted through a questionnaire about university students at three different universities. The study followed a phenomenographic research approach among university staff who graduated from the University of Duhok, Nawroz University, and the University of Mosul. It investigates how, and how well, they learned the basic database course during their bachelor's degree.

Effects of Processes Forcing on CPU and Total Execution-Time Using Multiprocessor Shared Memory System

2015-04

INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING IN RESEARCH TRENDS (Issue : 4) (Volume : 2)

This paper presents applications of shared memory systems to implementing the parallel processing approach. Such systems can handle multiple tasks by applying the principles of shared memory parallel processing programming, through a program called Application-Program. The effects of forcing processes among the processes of a shared memory system based on parallel processing principles are presented. These effects concern total and CPU execution times. CPU usage is also measured, together with how it changes depending on the load size and the number of participating CPUs.

General method for data indexing using clustering methods

2015-03

International Journal of Scientific and Engineering Research (Issue : 3) (Volume : 6)

Indexing data plays a key role in data retrieval and search, and new indexing techniques are frequently proposed to improve search performance. Some data clustering methods have previously been used for data indexing in data warehouses. In this paper, we discuss general concepts of data indexing and representative-based clustering methods. We then present a general scheme for indexing using clustering methods. There are two main processing schemes in databases: Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP). The proposed method is specific to stationary data, as in OLAP. Using this general indexing scheme, different clustering methods are compared. Here we study three representative-based clustering methods: standard K-Means, Self-Organizing Map (SOM), and Growing Neural Gas (GNG). Our study shows that, in this context, GNG outperforms K-Means and SOM.
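The general idea of indexing with representative-based clustering can be sketched as follows: partition stationary data once, store each record under its nearest representative, and answer a query by scanning only the closest partition. This sketch assumes standard K-Means as the clustering method, 2-D points, a nearest-neighbour query, and invented data. The `ClusterIndex` class is hypothetical; SOM or GNG would simply supply different representatives.

```python
import math

def kmeans(points, k, iterations=50):
    """Standard K-Means; returns the k centroids used as cluster representatives."""
    # Deterministic maximin seeding: first point, then the point farthest from
    # all representatives chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda j: math.dist(p, centroids[j]))].append(p)
        new = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
        if new == centroids:
            break  # representatives no longer move
        centroids = new
    return centroids

class ClusterIndex:
    """Static (OLAP-style) index: each record is stored under its nearest representative."""

    def __init__(self, points, k):
        self.points = list(points)
        self.centroids = kmeans(self.points, k)
        self.buckets = {j: [] for j in range(k)}
        for p in self.points:
            self.buckets[self._nearest_centroid(p)].append(p)

    def _nearest_centroid(self, p):
        return min(range(len(self.centroids)), key=lambda j: math.dist(p, self.centroids[j]))

    def nearest(self, query):
        """Approximate nearest neighbour: scan only the bucket of the closest representative."""
        bucket = self.buckets[self._nearest_centroid(query)] or self.points  # full scan if empty
        return min(bucket, key=lambda p: math.dist(query, p))

data = [(1, 1), (1, 2), (2, 1), (2, 2),
        (8, 8), (8, 9), (9, 8), (9, 9),
        (1, 9), (2, 9), (1, 8)]
index = ClusterIndex(data, k=3)
print(index.nearest((2.2, 1.9)))  # (2, 2)
```

Searching a single bucket is what makes the index fast, but it is approximate: for a query near a cluster boundary, the true nearest neighbour can sit in an adjacent bucket. Because the data is stationary, as in OLAP, the clustering cost is paid once rather than on every update.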