<?xml version="1.0" encoding="UTF-8"?>
<itemContainer xmlns="http://omeka.org/schemas/omeka-xml/v5" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://omeka.org/schemas/omeka-xml/v5 http://omeka.org/schemas/omeka-xml/v5/omeka-xml-5-0.xsd" uri="https://eprints.ibu.edu.ba/items/browse?collection=3&amp;output=omeka-xml&amp;page=3" accessDate="2026-06-25T05:19:39+01:00">
  <miscellaneousContainer>
    <pagination>
      <pageNumber>3</pageNumber>
      <perPage>10</perPage>
      <totalResults>46</totalResults>
    </pagination>
  </miscellaneousContainer>
  <item itemId="3504" public="1" featured="0">
    <fileContainer>
      <file fileId="4320">
        <src>https://eprints.ibu.edu.ba/files/original/41cddaabb1e237ad0c086dbca13071d0.pdf</src>
        <authentication>369ba7ca22ee871fc9b74adde3cf1d69</authentication>
        <elementSetContainer>
          <elementSet elementSetId="4">
            <name>PDF Text</name>
            <description/>
            <elementContainer>
              <element elementId="52">
                <name>Text</name>
                <description/>
                <elementTextContainer>
                  <elementText elementTextId="26578">
                    <text>Journal of Natural Sciences and Engineering, Vol. 3, (2020)
DOI number: 12.34567/JONSAE2020123

Using Exploratory Data Analysis and Big Data Analytics for Detecting Anomalies in Cloud Computing
Ibrahim Muzaferija¹, Zerina Mašetić¹

¹ International Burch University, Sarajevo, Bosnia and Herzegovina
ibrahim.muzaferija@stu.ibu.edu.ba
zerina.masetic@ibu.edu.ba

Abstract – While leveraging cloud computing for large-scale distributed applications allows seamless
scaling, many companies struggle to keep up with the amount of data generated, in terms of both efficient
processing and anomaly detection, which is a necessary part of managing modern applications. As records
of user behavior, web logs have naturally become a subject of anomaly detection research, and many
anomaly detection methods based on automated log analysis have been proposed. However, they have not
been proposed in the context of big data applications, where anomalous behavior needs to be detected in
the understanding phases, before a system is modeled for such use. Due to the high volume of data, Big
Data Analytics often overlooks anomalous points. To address this problem, we propose a complementary
methodology for Big Data Analytics, Exploratory Data Analysis (EDA), which assists in gaining insight
into data relationships without classical hypothesis-driven modeling. In that way, we gain a better
understanding of the patterns and can spot anomalies. Results show that EDA facilitates anomaly detection
and the CRISP-DM Business Understanding phase, making it one of the key steps in the Data Understanding
phase.
Keywords - Cloud Computing, Big Data, Data Mining, Anomaly Detection

1. Introduction

With the constant growth and advancement of the Internet, more and more systems are connected to one
another, constantly generating and exchanging data. This data is referred to as Big Data, and because it
contains sensitive and valuable information, it is constantly targeted by cyber-attacks. The term “big
data” refers to data that is so large, complex, or rapid that it is not possible to process using
traditional computing and data management tools. Big Data provides opportunities to improve research,
operational efficiency, and decision-support applications, with increased value for digital applications
[1]. At the same time, Big Data poses challenges in storing, transporting, processing, mining, and serving
the data. Data that is high in volume, velocity, variety, and veracity must be processed with advanced
analytical tools and algorithms to reveal meaningful information and provide value.
Cloud computing is the use of distributed and shared resources such as computing, storage, networking,
and analytical software, and it provides fundamental support for addressing the challenges of Big Data.
Cloud computing serves both as a technological enabler and as a producer of big data [1].
Anomalies are unusual events or behaviors that deviate from the norm. In efforts to increase the
reliability of cloud computing, anomaly detection is a frequent problem in threat detection and
identification, as reported by the Cloud Security Alliance (CSA) [2]. The CSA, the world’s leading
organization dedicated to securing cloud computing environments, conducts annual research with the aim
of raising awareness of threats, risks, and vulnerabilities in the cloud environment. In its latest
(2019) report [3], the CSA re-examined cloud security risks with a new approach, examining problems in
configuration and authentication rather than the traditional focus on vulnerabilities and malware, and
highlighted the following threats:
1. Data breaches
2. Misconfiguration and inadequate change control
3. Lack of cloud security architecture and strategy
4. Insufficient identity, credential, access, and key management
5. Account hijacking
6. Insider threat
7. Insecure interfaces and APIs
8. Weak control plane
9. Metastructure and applistructure failures
10. Limited cloud usage visibility
11. Abuse and nefarious use of cloud services

In this research, we aim to address the threats that can be traced in user logs (numbered 1, 4, 5, 6, 8, 9,
and 11) by using Big Data Analytics and Exploratory Data Analysis to discover anomalies, and thereby to
contribute to increasing the security of Cloud Computing applications.
2. Literature Review

Anomaly detection in cloud infrastructure and big data environments has been the topic of many research
studies. Since the first introduction of cloud infrastructure in 2006 [4], cloud computing has greatly
impacted many industries. The rapid development of Internet and Big Data technologies has led to an
increase in services developed on cloud computing, such as online banking services, electronic news
services, government information systems, and mobile services. These systems handle sensitive and
confidential data, which makes anomaly detection mechanisms one of their core security requirements.
In the review paper by Arif Sari [5], different techniques and mechanisms used to detect anomalous
activities within the cloud environment are described: threshold detection, statistical analysis,
rule-based measures, data mining, and machine learning. We aim to apply statistical techniques and
Exploratory Data Analysis (EDA) to discover anomalies.
In the “Big Data processing for Anomaly Detection” survey [6], Ariyaluran et al. present a comparative
analysis of three different domains (anomaly detection, machine learning algorithms, and real-time big
data processing) and of the relationships between them. This paper aims to contribute complementary
techniques for anomaly detection. Once anomalies are detected, Machine Learning and real-time anomaly
detection can be used for future improvements.
In their research, Dalal and Rele [7] emphasize the steps in creating effective and reliable mechanisms
for threat detection. They highlight the importance of the first CRISP-DM (Cross-Industry Standard
Process for Data Mining) phase, named “Develop Business Understanding”, where the reasons for defects
and the answers for maintenance are taken into consideration. They also discuss the phase “Analyze Data
and Data Dependencies”, where the aim is to analyze, combine, and compare the data with the present
situation, but they do not propose EDA as a baseline for data understanding. Our work employs EDA to
complement the methodology.
They also highlight the step named “Engage with Subject Matter Experts (SMEs)” for better dataset
examination and analysis of the anomaly situation, along with a grouping of the threat factors. By
employing these methods, we aim to set transparent expectations and bring clarity to our results.
Throughout this research, we work closely with the application development technical lead, who serves as
the SME and helps clarify the log data, as well as threats, anomalies, and our results.
3. Methodology

The research is implemented using a portion of the CRISP-DM (Cross-Industry Standard Process for Data
Mining) methodology [8], a common standard used by data scientists and data mining experts to build
analytical and machine learning models. Before analytical and machine learning models can be created, we
need to construct a clean dataset of user behavior, with anomalies labeled for future modeling. To do so,
this research focuses on the first three phases: Business Understanding, Data Understanding, and Data
Preparation, highlighted in red in Figure 1. Modeling and the subsequent phases are addressed in our
extended study of anomaly detection in cloud computing.


Figure 1. CRISP-DM workflow
In the Business Understanding phase, the goal is to determine business objectives, assess the situation
from a business perspective, discuss it with subject matter experts, determine data mining goals, and
produce a project plan. In the Data Understanding phase, we collect and select raw data, describe and
explore the data, consult with subject matter experts, and verify data quality. In the Data Preparation
phase, which is often the most time-consuming, we select, clean, and format the data and construct a
clean dataset.
We approach these phases using Big Data Analytics and Exploratory Data Analysis (EDA). Big Data Analytics
examines large amounts of data in a non-traditional manner, that is, using distributed and shared
resources to cope with the quantity and complexity of the data [8], [9]. Exploratory Data Analysis [10]
is an approach to analyzing data that summarizes its main characteristics and uncovers its underlying
structure using statistical and visual methods.
3.1. Data Collection and Selection
Cloud-based enterprise web application logs are produced by multiple servers and services and are
streamed to the Elasticsearch [11] service, an open-source search and analytics engine for all types of
data. Elasticsearch is distributed, fast, and scalable, which makes it well suited to big data ingestion,
enrichment, storage, analysis, and visualization.


Figure 2. Raw data access from Kibana
Raw data is accessed by locally restoring a snapshot of the Elasticsearch cluster covering a period of
three months. The cluster contains around 20 GB of semi-structured data collected from different
application services and levels, indexed by timestamp. Application logs are mapped to 175 attributes and
accessed using Kibana [12], the Elastic Stack service for data analysis and visualization.
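Elasticsearch stores each log entry as a JSON document, and Kibana lists the fields of nested objects under dotted names. As a minimal sketch, assuming each log entry is available as a nested Python dictionary (for example, the _source of a search hit), the hypothetical helper below flattens a document into dotted attribute names, one way of arriving at a flat attribute list such as the 175 attributes mentioned above. The field names in the example document are illustrative assumptions, not the application's actual schema.

```python
from typing import Any, Dict


def flatten_document(doc: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten a nested log document into {"dotted.attribute.name": value}.

    Hypothetical helper: only nested dictionaries are expanded; lists and
    scalars are kept as single values.
    """
    flat: Dict[str, Any] = {}
    for key, value in doc.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, extending the dotted prefix.
            flat.update(flatten_document(value, name))
        else:
            flat[name] = value
    return flat


# Illustrative document; field names and values are assumptions, not real log data.
example = {
    "timestamp": "2020-01-06T06:18:00",
    "level": "info",
    "http": {"status_code": 200, "request": {"method": "GET", "path": "/invoices"}},
}
flat = flatten_document(example)
print(sorted(flat))
# ['http.request.method', 'http.request.path', 'http.status_code', 'level', 'timestamp']
```

Arrays of objects are left untouched in this sketch; a real pipeline would need an explicit policy for them.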
Attribute selection is part of the Business Understanding and Data Understanding phases and was carried
out in consultation with the application development technical lead, i.e., the subject matter expert
(hereafter SME). The attributes describing users’ application usage that are most relevant for anomaly
detection are selected for further analysis. Table 1 displays statistical information for the selected
attributes.


Table 1. Selected data statistical information

Attribute name | Description | Data type | Range | Missing
timestamp | Timestamp | Date Time | [2020-01-05 21:17, 2020-03-26 21:06] | 0.0 %
account_id | Account ID, unique company account identifier | Nominal | f6afd09c-****-****-****-c30a935ccc37, ... | 8.87 %
client_country | User country | Nominal | BA, US, ... | 9.53 %
company_name | Company Name | Nominal | Company A, Company B, ... | 10.17 %
platform | Application platform | Nominal | BrowserMNC, BackendMNC, ... | 0.0 %
principal_id | User email | Nominal | developer@**.com, ... | 9.64 %
remote_address | User IP address | Nominal | [0.0.0.0 - 255.255.255.255] | 9.12 %
user_agent | User-agent | Nominal | Mozilla/5.0 (Windows NT 10.0; Win64; x64) …, ... | 0.0 %
error_message | Error message | Nominal | validation error, auth error, ... | 99.96 %
message | Log message | Nominal | Profiling, FrontTimings, ... | 0.18 %
level | Log level | Nominal | Info, error | 0.0 %
path | Parameterized resource request | Nominal | PUT /customer/***/ticket/***, ... | 99.78 %
resource | Request | Nominal | (GET) /invoices, ... | 0.0 %
status_code | Response code | Nominal | 200, 404, ... | 10.17 %
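The Missing column of Table 1 is the share of log records in which an attribute has no value. A minimal sketch of that computation follows, assuming records are Python dictionaries and that an absent key, None, or an empty string all count as missing. The example records and the 90 % default drop threshold are hypothetical; the text does not state the cut-off used to discard attributes.

```python
from typing import Dict, Iterable, List, Mapping, Optional


def missing_percentages(
    records: List[Mapping[str, Optional[object]]], attributes: Iterable[str]
) -> Dict[str, float]:
    """Percentage of records in which each attribute is absent, None, or empty."""
    total = len(records)
    result: Dict[str, float] = {}
    for attr in attributes:
        missing = sum(1 for r in records if r.get(attr) in (None, ""))
        result[attr] = round(100.0 * missing / total, 2) if total else 0.0
    return result


def attributes_to_drop(percentages: Dict[str, float], threshold: float = 90.0) -> List[str]:
    """Attributes whose missing share exceeds a (hypothetical) threshold."""
    return sorted(a for a, p in percentages.items() if p > threshold)


# Hypothetical records; only the attribute names follow Table 1.
records = [
    {"level": "info", "status_code": 200},
    {"level": "info", "status_code": None},
    {"level": "error", "status_code": 404, "error_message": "auth error"},
    {"level": "info", "status_code": 200},
]
pct = missing_percentages(records, ["level", "status_code", "error_message"])
print(pct)  # {'level': 0.0, 'status_code': 25.0, 'error_message': 75.0}
print(attributes_to_drop(pct, threshold=50.0))  # ['error_message']
```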

Once the relevant data is selected, we use Logstash [13], an Elastic Stack service, to collect the data,
that is, to obtain the initial dataset in CSV format for further work.

3.2. Data Cleansing and Engineering
To gain insight into data quality, we use graphical and statistical methods to detect anomalies, faults,
outliers, missing values, and similar issues. We also engineer new attributes to increase
interpretability or decrease data complexity. Exploratory Data Analysis helps us understand the relations
between attributes, spot tendencies, and identify the necessary cleaning steps.
First, we apply filters to remove log data from automated services, such as health checks and other
application services that do not reflect users’ interactions. Next, we remove attributes with a high
fraction of missing values, because such attributes carry little information. Values of the
“status_code” attribute are mapped to their corresponding descriptions (e.g., “200 OK”) for better
interpretability. We engineer three new attributes: “resource_method”, “resource_base”, and “user_os”.
The “resource_method” and “resource_base” attributes are created from the values of the “resource”
attribute by using regular expressions to extract the relevant information. The “user_os” attribute is
created in a similar manner, using regular expressions to extract the relevant information from the
“user_agent” attribute. These attributes allow us to focus on the most relevant information and
decrease the cardinality of the original attributes.
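The cleaning and engineering steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation, whose regular expressions are not given: the resource attribute is assumed to look like “(GET) /invoices”, as in Table 1; the health-check resource names are hypothetical; the operating-system labels use simple substring rules in place of regular expressions; and status-code descriptions come from Python's http.HTTPStatus enumeration.

```python
import re
from http import HTTPStatus
from typing import Optional, Tuple

# Assumed format of the "resource" attribute, e.g. "(GET) /invoices".
RESOURCE_PATTERN = re.compile(r"^\((\w+)\)\s*/([^/?\s]*)")

# Hypothetical resource bases produced by automated services (health checks).
AUTOMATED_BASES = {"health", "healthcheck"}

# Substring rules checked in order: Android agents also contain "Linux" and
# iOS agents contain "like Mac OS X", so the more specific tokens come first.
OS_RULES = (
    ("Android", "Android"),
    ("iPhone", "iOS"),
    ("iPad", "iOS"),
    ("Windows NT", "Windows"),
    ("Mac OS X", "OS X"),
    ("Linux", "Linux"),
)


def split_resource(resource: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (resource_method, resource_base) parsed from a resource string."""
    match = RESOURCE_PATTERN.match(resource)
    if match is None:
        return None, None
    return match.group(1), match.group(2) or None


def is_automated(resource: str) -> bool:
    """True for requests issued by automated services rather than users."""
    return split_resource(resource)[1] in AUTOMATED_BASES


def user_os(user_agent: str) -> str:
    """Coarse operating-system label derived from a user-agent string."""
    for token, label in OS_RULES:
        if token in user_agent:
            return label
    return "Unknown"


def status_description(code: Optional[int]) -> Optional[str]:
    """Map a numeric status code to a label such as "200 OK"."""
    if code is None:
        return None  # keep missing values missing
    try:
        return f"{code} {HTTPStatus(code).phrase}"
    except ValueError:
        return str(code)  # non-standard code: no description available
```

For example, split_resource("(GET) /invoices") yields ("GET", "invoices"), and status_description(405) yields "405 Method Not Allowed", the same form as the status code values in Table 2.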
3.3. Dataset Creation
The clean dataset contains 16 attributes describing application usage and 522,763 rows, with timestamps
ranging from 6 January to 26 March 2020 (81 days).
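The 81 days stated here and the range of 80d 14h 48min reported in Table 2 describe the same window: the elapsed time between the first and last record is just under 81 days, while the window touches 81 calendar days when both the first and the last day are counted. The small check below reproduces both figures with Python's datetime module, using the minimum and maximum timestamps listed in Table 2.

```python
from datetime import datetime

# Minimum and maximum timestamps as listed in Table 2.
first = datetime(2020, 1, 6, 6, 18)
last = datetime(2020, 3, 26, 21, 6)

span = last - first
hours, remainder = divmod(span.seconds, 3600)
minutes = remainder // 60
# Calendar days touched by the window, counting both the first and the last day.
calendar_days = (last.date() - first.date()).days + 1

print(f"{span.days}d {hours}h {minutes}min")  # 80d 14h 48min
print(calendar_days)  # 81
```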
The data is imported into RapidMiner [14], a data science software platform that provides an integrated
environment for data preparation, visualization, machine learning, text mining, and predictive analytics.
It is open source and is used for commercial applications as well as for research, education, training,
and rapid prototyping.
In this phase, we continue with Exploratory Data Analysis to discover patterns beyond formal modeling or
hypothesis testing. Our aim is to use the business understanding to deepen our understanding of the data
and of the relationships between attributes, in order to spot anomalous trends.
As the application is B2B (business-to-business), we first analyze the company data: the company account
histogram, statistics, and distribution. Next, we analyze user behavior in both the company context and
the general context. By analyzing the “user” and “user_domain” attributes, we spot usage and behavior
trends within companies, while the analysis of application resource requests shows usage in the general
context.


Figure 3. Counts of application resource requests
Figure 3 lets us spot trends and further analyze resource usage. Each resource request represents a user
action and is therefore highly valuable in the context of anomaly detection. Moreover, this granular
analysis supports business understanding, as we gain deeper insight into user-generated data.
Next, we analyze application errors, which are often among the most informative attributes for anomaly
detection. Anomalies and cyber-attacks often cause application errors, so analyzing error data lets us
quickly distinguish between application anomalies, user anomalies, and possible threats.
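A minimal sketch of this error analysis, assuming records are dictionaries with an ISO 8601 timestamp, a level, and an error_msg field (attribute names taken from Table 2, record layout assumed). Error-level records are counted per day, the kind of data behind a histogram such as Figure 4, and per error message, which helps separate recurring application faults from isolated events. The sample records are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Mapping, Optional, Tuple


def error_summary(
    records: Iterable[Mapping[str, Optional[str]]], top: int = 3
) -> Tuple[List[Tuple[str, int]], List[Tuple[str, int]]]:
    """Return (error-level records per day in date order, most frequent error messages)."""
    per_day: Counter = Counter()
    per_message: Counter = Counter()
    for record in records:
        if str(record.get("level") or "").lower() != "error":
            continue  # skip info-level and unlabeled records
        day = datetime.fromisoformat(record["timestamp"]).date().isoformat()
        per_day[day] += 1
        per_message[record.get("error_msg") or "(no message)"] += 1
    return sorted(per_day.items()), per_message.most_common(top)


logs = [
    {"timestamp": "2020-01-06T06:18:00", "level": "info"},
    {"timestamp": "2020-01-06T07:02:00", "level": "error", "error_msg": "auth error"},
    {"timestamp": "2020-01-07T09:15:00", "level": "Error", "error_msg": "validation error"},
    {"timestamp": "2020-01-07T10:40:00", "level": "error", "error_msg": "auth error"},
]
days, messages = error_summary(logs)
print(days)      # [('2020-01-06', 1), ('2020-01-07', 2)]
print(messages)  # [('auth error', 2), ('validation error', 1)]
```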

Figure 4. Application error logs histogram


Figure 5. Application logs status codes histogram
Application status codes are highly correlated with application resource usage. By analyzing status
codes, we gain insight into application performance and usage trends, and anomalies are most visible in
this analysis.
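Frequency threshold detection, one of the model families named in Section 4, follows naturally from the status code analysis. The sketch below is a generic version, not the authors' model: it counts events per day (for example, the dates of records with an error status code) and flags days whose count exceeds the mean by more than k population standard deviations. The default k = 3 and the sample data are hypothetical.

```python
import statistics
from collections import Counter
from typing import Iterable, List


def frequency_threshold_anomalies(days: Iterable[str], k: float = 3.0) -> List[str]:
    """Days whose event count exceeds mean + k * (population) standard deviation."""
    counts = Counter(days)  # one entry per event, keyed by its date
    if not counts:
        return []
    values = list(counts.values())
    threshold = statistics.mean(values) + k * statistics.pstdev(values)
    return sorted(day for day, n in counts.items() if n > threshold)


# Nine quiet days with 10 error responses each, then one day with 100.
events = [f"2020-01-{d:02d}" for d in range(6, 15) for _ in range(10)]
events += ["2020-01-15"] * 100
print(frequency_threshold_anomalies(events, k=2.0))  # ['2020-01-15']
```

With the population standard deviation, a single spike among n days can lie at most sqrt(n - 1) standard deviations above the mean, so over short windows k must stay small: in this ten-day example the spike scores exactly 3 and is not flagged with the default k = 3.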
Dataset creation concludes with the creation of an “anomaly” attribute, which indicates whether a
specific application log instance is anomalous. The criteria for this attribute are drawn from the EDA
findings and confirmed through consultations with the SME. By addressing the CRISP-DM Business
Understanding, Data Understanding, and Data Preparation phases with Exploratory Data Analysis, we are
able to discover anomalies in application usage and user behavior.
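The labeling step can be expressed as a set of named rules whose results are combined with a logical OR, which keeps each criterion easy to review with the SME. The rules and value lists below are hypothetical illustrations loosely modeled on the findings in Section 4 (a service whose use has ceased, usage from unknown domains, server errors); the study does not publish its actual criteria.

```python
from typing import Callable, List, Mapping, Optional

Record = Mapping[str, Optional[str]]

# Hypothetical value lists; the study's actual criteria are not published.
RETIRED_SERVICES = {"legacy-sync"}
KNOWN_DOMAINS = {"company-a.com", "company-b.com"}
QA_DOMAINS = {"gmail.com", "hotmail.com", "outlook.com"}


def uses_retired_service(record: Record) -> bool:
    """Request to a service whose use has ceased."""
    return record.get("resource_base") in RETIRED_SERVICES


def from_unknown_domain(record: Record) -> bool:
    """Usage from a domain that is neither a known customer domain nor a QA domain."""
    domain = record.get("user_domain")
    return bool(domain) and domain not in KNOWN_DOMAINS and domain not in QA_DOMAINS


def server_error(record: Record) -> bool:
    """Response with a 5xx status, e.g. "502 Bad Gateway"."""
    return (record.get("status_code") or "").startswith("5")


RULES: List[Callable[[Record], bool]] = [uses_retired_service, from_unknown_domain, server_error]


def matched_rules(record: Record) -> List[str]:
    """Names of the rules a record triggers (useful when reviewing labels)."""
    return [rule.__name__ for rule in RULES if rule(record)]


def label(record: Record) -> bool:
    """Value of the "anomaly" attribute: True if any rule fires."""
    return bool(matched_rules(record))
```

Returning the names of the triggered rules, not just a boolean, makes it straightforward to show the SME why a given log instance was labeled anomalous.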
4. Results and Discussion

As the web application operates in a business-to-business context, we approach the analysis of log data
from a company perspective. We find that the companies’ application usage can be segmented into three
categories: heavy, medium, and light users, as shown in Figure 6. Heavy users are the companies
responsible for application development and support. Medium users are companies with frequent
application usage, while light users are companies that are onboarding to the application or are in the
initial phases of application usage. Distinguishing companies by their level of usage helps us build a
better business understanding. Because application usage is unbalanced across companies, we can expect
more anomalies from heavy users and fewer from companies with medium and light usage. The percentage of
anomalies, however, varies between companies with no specific pattern.
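A minimal sketch of this segmentation, assuming the company of each request is known and that segments are defined by each company's share of all requests. The 20 % and 2 % cut-offs and the sample request counts are hypothetical, since the text does not state how the boundaries between heavy, medium, and light users were drawn.

```python
from collections import Counter
from typing import Dict, Iterable


def segment_companies(
    companies: Iterable[str], heavy_share: float = 0.20, medium_share: float = 0.02
) -> Dict[str, str]:
    """Label each company heavy/medium/light by its share of all requests."""
    counts = Counter(companies)  # one entry per request
    total = sum(counts.values())
    segments: Dict[str, str] = {}
    for company, n in counts.most_common():
        share = n / total
        if share >= heavy_share:
            segments[company] = "heavy"
        elif share >= medium_share:
            segments[company] = "medium"
        else:
            segments[company] = "light"
    return segments


requests = ["Company A"] * 80 + ["Company B"] * 15 + ["Company C"] * 4 + ["Company XYZ"]
print(segment_companies(requests))
# {'Company A': 'heavy', 'Company B': 'medium', 'Company C': 'medium', 'Company XYZ': 'light'}
```

Share-based cut-offs keep the segmentation stable when the observation window grows, whereas absolute request counts would shift every company upward over time.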

Figure 6. Application usage per company
When analyzing the histogram of application resource methods through the “resource_method” attribute,
we find an anomalous request pattern, shown in Figure 7. Consultations with the SME revealed that this
resource request method anomaly corresponds to a service whose use has ceased, so its behavior can be
identified as an anomaly.


Figure 7. Application resource methods histogram anomaly
When analyzing individual users, we segment them by company using the domain name in the user’s email
address. The histogram of user domains contributes to business understanding, as we can spot user trends
for each company. Figure 8 presents the user domain histogram focused on anomalous application usage from
unknown domains. We find that usage from unknown domains tends to increase during the monthly peaks of
application usage.

Figure 8. User domain histogram focused on unknown domains
Consultations with the SME clarified that unknown domains such as “gmail.com”, “hotmail.com”, and
“outlook.com” are used by quality assurance developers, and these were marked as such. This further
decreased the number of visits from unknown domains. Moreover, the consultations showed that the other
users from unknown domains belong to companies in the trial (application demonstration) phase and are
still eligible for anomaly detection. Application usage from other user domains is distributed as
expected: two development companies account for most of the traffic, while the others are medium and
light users.
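The domain segmentation can be sketched as below, assuming user e-mail addresses as in the user attribute of Table 2 and a list of known customer domains (hypothetical here). Following the SME's clarification, the public mail domains are placed in a separate QA category instead of being counted as unknown.

```python
from typing import Iterable, Optional

# Public mail domains that, per the SME, are used by quality assurance developers.
QA_DOMAINS = {"gmail.com", "hotmail.com", "outlook.com"}


def email_domain(email: Optional[str]) -> Optional[str]:
    """Lower-cased domain part of an e-mail address, or None if there is none."""
    if not email or "@" not in email:
        return None
    return email.rsplit("@", 1)[1].strip().lower() or None


def domain_category(email: Optional[str], known_domains: Iterable[str]) -> str:
    """Classify a user as "customer", "qa", "unknown", or "missing"."""
    domain = email_domain(email)
    if domain is None:
        return "missing"
    if domain in set(known_domains):
        return "customer"
    if domain in QA_DOMAINS:
        return "qa"
    return "unknown"
```

For instance, with the hypothetical customer list {"company-a.com"}, the address qa.tester@gmail.com falls into "qa", while someone@trial-co.io remains "unknown" and stays eligible for anomaly detection.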

Figure 9. Log message histogram anomalies
Figure 9 presents the log message histogram with the revealed anomalies. We find that these anomalies
are caused by application development or, more specifically, by integration attempts with other
companies using the application.
Figure 10 presents the results of the correlation analysis of the dataset. The correlation matrix shows
increased correlation between attributes such as “platform” and “message”. These results help us
identify and discard highly correlated attributes and decrease the dataset complexity.
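The text does not state which correlation measure RapidMiner applied to these nominal attributes. A common measure of association between two nominal attributes is Cramér's V, computed from the chi-squared statistic χ² of their contingency table as V = sqrt(χ² / (n · (min(r, c) − 1))), where n is the number of records and r and c are the numbers of distinct values of the two attributes. The sketch below is offered as that substitute technique, not as the computation behind Figure 10; V is 0 for independent attributes and 1 for a perfect association.

```python
import math
from collections import Counter
from typing import Sequence


def cramers_v(x: Sequence[str], y: Sequence[str]) -> float:
    """Cramér's V between two nominal attributes observed on the same records."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))  # observed contingency table
    row, col = Counter(x), Counter(y)  # marginal counts
    chi2 = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    if k == 0:
        return 0.0  # a constant attribute carries no association
    return math.sqrt(chi2 / (n * k))


# Hypothetical values for "platform" and "message" on four records.
platform = ["Browser", "Browser", "Backend", "Backend"]
message = ["FrontTimings", "FrontTimings", "Profiling", "Profiling"]
print(cramers_v(platform, message))  # 1.0
```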


Figure 10. Correlation matrix
The correlation matrix also shows some correlation between the “status_code” and “level” attributes,
which indicates that application errors can be traced through application status codes. Figure 11
depicts the status code histogram focused on error status codes, in which we can spot error trends and
identify error sources.

Figure 11. Status code histogram focused on error status codes
The anomalies identified with EDA are used to create a labeled dataset for anomaly detection. The dataset
can serve as a baseline for various analytical and machine learning anomaly detection models, such as
frequency threshold detection, supervised anomaly prediction, and unsupervised anomaly detection. Table 2
presents the final dataset statistical information.


Table 2. Dataset statistical information

Attribute name | Type | Missing | Least / Min | Most / Max | Range
timestamp | Date and time | 0 | Jan 6, 2020 6:18 AM | Mar 26, 2020 9:06 PM | 80d 14h 48min
account_id | Nominal | 3 | 58710 (3) | 12345 (131,132) | 12345, c84c286[...]ffea5, [52 more]
company_name | Nominal | 3 | Company XYZ (3) | Company A (131,132) | Company A, Company B, [52 more]
country | Nominal | 3 | XX (29) | US (399,465) | US, BA, IN, [12 more]
platform | Nominal | 0 | Backend (45%) | Browser (55%) | Browser, Backend
user | Nominal | 6 | fk***@*.com (4) | fs***@*.com (48,738) | fs***@*.com, de***@*.com, [209 more]
remote_address | Nominal | 3 | 184.*.*.22 (3) | 77.*.*.171 (41,561) | 77.*.*.171, 144.*.*.229, [302 more]
user_agent | Nominal | 0 | Mozilla/[...]4.1 (3) | Mozilla/[...]ri/537.36 (77,449) | Mozilla/[...]36, Mozilla/[...].0, [114 more]
error_msg | Nominal | 467,225 | Getaddr[...].com (1) | ESOCKET[...]UT (89) | ESOCKET[...]UT, 502, [3 more]
level | Nominal | 0 | error (159) | info (467,225) | Info, Error
message | Nominal | 0 | Integ[...]led (159) | Profiling (264,851) | Profiling, frontTimigs, [1 more]
status_code | Nominal | 93 | 405 Method [...]ed (1) | 200 OK (453,461) | 200 OK, 204 No Content, [8 more]
resource_method | Nominal | 0 | PUT (97) | GET (373,123) | GET, POST, [3 more]
resource_base | Nominal | 0 | produ[...]ile (8) | endpoints (98,191) | endpoints, customers, [17 more]
user_domain | Nominal | 6 | C*** (272) | A*** (351,885) | A***, M***, [9 more]
user_agent_os | Nominal | 0 | Unknown (3) | Windows (411,762) | Windows, OS X, [2 more]
anomaly | Binominal | 0 | True (882) | False (466,502) | False, True
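For nominal attributes, the Missing, Least, and Most columns of Table 2 reduce to value counting. A minimal sketch, assuming None or an empty string marks a missing value. Ties between equally frequent values are resolved by first appearance, which is how Python's Counter.most_common orders equal counts; RapidMiner may break ties differently. The sample values are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Optional


def nominal_summary(values: Iterable[Optional[str]]) -> Dict[str, Any]:
    """Missing count, least/most frequent (value, count), and number of distinct values."""
    missing = 0
    counts: Counter = Counter()
    for value in values:
        if value is None or value == "":
            missing += 1
        else:
            counts[value] += 1
    ranked = counts.most_common()  # descending by count
    return {
        "missing": missing,
        "least": ranked[-1] if ranked else None,
        "most": ranked[0] if ranked else None,
        "distinct": len(counts),
    }


levels = ["info"] * 5 + ["error"] * 2 + [None]
print(nominal_summary(levels))
# {'missing': 1, 'least': ('error', 2), 'most': ('info', 5), 'distinct': 2}
```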

5. Conclusion

This study has shown that the use of Exploratory Data Analysis contributes to and complements the
implementation of the CRISP-DM methodology phases: business understanding, data understanding, and data
preparation. Moreover, we demonstrate that Exploratory Data Analysis is an efficient method for detecting
anomalies in big data. Summarizing data characteristics and discovering the underlying patterns of the
data and its distribution brings value to both the data understanding and the data preparation phases.
We also confirm the benefits of a method proven in previous studies: consultations with the SME play a
crucial role in the business understanding phase and contribute valuably to the data understanding phase.
Consultations in the data understanding and data preparation phases also facilitate the workflow and can
help increase the value of the data.
Future efforts can focus on implementing the subsequent CRISP-DM phases: modeling, evaluation, and
deployment. Modeling the data with Machine Learning techniques enables complex pattern discovery suitable
for big data and can further improve anomaly detection by leveraging the underlying mathematical
relationships. While this has been shown in the majority of studies in the field of anomaly detection and
supervised machine learning, we propose using unsupervised machine learning to find new anomalies. These
would enable the creation of an extended labeled dataset, which can then be used to build a supervised
machine learning model for anomaly detection and prediction.

6. References

[1] “Big Data and cloud computing: innovation opportunities and challenges.” [Online]. Available: https://www.tandfonline.com/doi/full/10.1080/17538947.2016.1239771. [Accessed: 04-Sep-2020]
[2] “Cloud Security Alliance (CSA).” [Online]. Available: https://cloudsecurityalliance.org/. [Accessed: 04-Sep-2020]
[3] “Top Threats to Cloud Computing: Egregious Eleven.” [Online]. Available: https://cloudsecurityalliance.org/artifacts/top-threats-to-cloud-computing-egregious-eleven/. [Accessed: 04-Sep-2020]
[4] “About AWS.” [Online]. Available: https://aws.amazon.com/about-aws/. [Accessed: 04-Sep-2020]
[5] A. Sari, “A Review of Anomaly Detection Systems in Cloud Networks and Survey of Cloud Security Measures in Cloud Storage Applications,” Journal of Information Security, vol. 6, no. 2, pp. 142–154, Mar. 2015.
[6] “Real-time big data processing for anomaly detection: A Survey,” Int. J. Inf. Manage., vol. 45, pp. 289–307, Apr. 2019.
[7] “Cyber Security: Threat Detection Model based on Machine learning Algorithm - IEEE Conference Publication.” [Online]. Available: https://ieeexplore.ieee.org/document/8724096. [Accessed: 04-Sep-2020]
[8] “DMME: Data mining methodology for engineering applications – a holistic extension to the CRISP-DM model,” Procedia CIRP, vol. 79, pp. 403–408, Jan. 2019.
[9] “A Reference Model for Big Data Analytics.” [Online]. Available: https://www.researchgate.net/publication/327728739_A_Reference_Model_for_Big_Data_Analytics. [Accessed: 04-Sep-2020]
[10] “Exploratory data analysis.” [Online]. Available: https://psycnet.apa.org/record/2011-23865-003. [Accessed: 04-Sep-2020]
[11] “Open Source Search: The Creators of Elasticsearch, ELK Stack &amp; Kibana.” [Online]. Available: https://www.elastic.co/. [Accessed: 04-Sep-2020]
[12] “Kibana.” [Online]. Available: https://www.elastic.co/kibana. [Accessed: 04-Sep-2020]
[13] “Logstash.” [Online]. Available: https://www.elastic.co/logstash. [Accessed: 04-Sep-2020]
[14] “RapidMiner.” [Online]. Available: https://rapidminer.com/. [Accessed: 04-Sep-2020]
</text>
                  </elementText>
                </elementTextContainer>
              </element>
            </elementContainer>
          </elementSet>
        </elementSetContainer>
      </file>
    </fileContainer>
    <collection collectionId="3">
      <elementSetContainer>
        <elementSet elementSetId="1">
          <name>Dublin Core</name>
          <description>The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/.</description>
          <elementContainer>
            <element elementId="50">
              <name>Title</name>
              <description>A name given to the resource</description>
              <elementTextContainer>
                <elementText elementTextId="26245">
                  <text>Journal of Natural Sciences and Engineering</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="43">
              <name>Identifier</name>
              <description>An unambiguous reference to the resource within a given context</description>
              <elementTextContainer>
                <elementText elementTextId="26605">
                  <text>2637-2835</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="98">
              <name>DOI</name>
              <description>Digital object identifier</description>
              <elementTextContainer>
                <elementText elementTextId="26606">
                  <text>10.14706</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="45">
              <name>Publisher</name>
              <description>An entity responsible for making the resource available</description>
              <elementTextContainer>
                <elementText elementTextId="26607">
                  <text>International Burch University</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="41">
              <name>Description</name>
              <description>An account of the resource</description>
              <elementTextContainer>
                <elementText elementTextId="26608">
                  <text>Journal of Natural Sciences and Engineering (JONSAE) is a peer-reviewed, biannually published international journal focusing on empirical and theoretical research in all branches of Engineering and Natural Sciences. It is published on the behalf of Faculty of Engineering and Natural Sciences of International Burch University and aims to provide the best content regarding by publishing original research papers, review articles, special issues, feature articles, and book reviews. All manuscript submissions are subject to initial appraisal by the Editor, and, if found suitable for further consideration, to peer review by independent, anonymous referees. All peer review is double-blind and submission is online. The journal welcomes theoretical, applied, interdisciplinary and methodological work, with preference on empirical research, critical approach and problem-solving methods in manuscripts.</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="44">
              <name>Language</name>
              <description>A language of the resource</description>
              <elementTextContainer>
                <elementText elementTextId="26609">
                  <text>English</text>
                </elementText>
              </elementTextContainer>
            </element>
          </elementContainer>
        </elementSet>
      </elementSetContainer>
    </collection>
    <elementSetContainer>
      <elementSet elementSetId="1">
        <name>Dublin Core</name>
        <description>The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/.</description>
        <elementContainer>
          <element elementId="50">
            <name>Title</name>
            <description>A name given to the resource</description>
            <elementTextContainer>
              <elementText elementTextId="26579">
                <text>Using Exploratory Data Analysis and Big Data Analytics for Detecting Anomalies in Cloud Computing</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="96">
            <name>Author</name>
            <description>Author</description>
            <elementTextContainer>
              <elementText elementTextId="26580">
                <text>Ibrahim Muzaferija, Zerina Mašetić</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="94">
            <name>Abstract</name>
            <description>A summary of the resource.</description>
            <elementTextContainer>
              <elementText elementTextId="26581">
                <text>While leveraging cloud computing for large-scale distributed applications allows seamless
scaling, many companies struggle to keep up with the amount of data generated, both in processing it
efficiently and in detecting anomalies, which is a necessary part of managing modern applications. As
records of user behavior, weblogs have become a natural subject of anomaly detection research, and many
anomaly detection methods based on automated log analysis have been proposed. However, they have not been
proposed for big data applications, where anomalous behavior needs to be detected in the understanding
phases, before a system is modeled for such use. Big Data Analytics often ignores anomalous points because
of the high volume of data. To address this problem, we propose Exploratory Data Analysis as a
complementary methodology for Big Data Analytics: it helps gain insight into data relationships without
classical hypothesis modeling, giving a better understanding of the patterns and making anomalies easier to
spot. Results show that Exploratory Data Analysis facilitates anomaly detection and the CRISP-DM Business
Understanding phase, making it one of the key steps in the Data Understanding phase.</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="97">
            <name>Keywords</name>
            <description>Keywords.</description>
            <elementTextContainer>
              <elementText elementTextId="26582">
                <text>Cloud Computing, Big Data, Data Mining, Anomaly Detection</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="43">
            <name>Identifier</name>
            <description>An unambiguous reference to the resource within a given context</description>
            <elementTextContainer>
              <elementText elementTextId="26583">
                <text>2637-2835</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="98">
            <name>DOI</name>
            <description>Digital object identifier</description>
            <elementTextContainer>
              <elementText elementTextId="26584">
                <text>10.14706/JONSAE2021320</text>
              </elementText>
            </elementTextContainer>
          </element>
        </elementContainer>
      </elementSet>
    </elementSetContainer>
  </item>
  <item itemId="3503" public="1" featured="0">
    <fileContainer>
      <file fileId="4319">
        <src>https://eprints.ibu.edu.ba/files/original/d8bd5c4881ddc5399123b176dd9fbcd2.pdf</src>
        <authentication>1d4855c1060aa995fd1a0d8cbff1e775</authentication>
        <elementSetContainer>
          <elementSet elementSetId="4">
            <name>PDF Text</name>
            <description/>
            <elementContainer>
              <element elementId="52">
                <name>Text</name>
                <description/>
                <elementTextContainer>
                  <elementText elementTextId="26571">
                    <text>Journal of Natural Sciences and Engineering, Vol. 3, (2020)
DOI number: 10.14706/JONSAE2019114

Feedback System Using Sentiment Analysis
Abdulrahman Almonajed1, Dino Kečo1
1 International Burch University, Sarajevo, Bosnia and Herzegovina
abdulrahman.almonajed@stu.ibu.edu.ba
dino.keco@ibu.edu.ba

Abstract – Today, when judging the quality of an item sold online, customer feedback plays a very
important role. Based on feedback, we can decide whether the desired item is good, form a picture of the
seller, and so on. Many companies with online shops display the most positive feedback while hiding
negative feedback, or display only a few reviews. In this research, we help by automating the decision of
whether a piece of feedback is positive or negative, which frees time for other work and saves the cost of
hiring people to review feedback manually. Since feedback on items sold online is so important today,
determining whether it is positive or negative should be as quick and easy as possible. We show a very
simple and fast way to classify feedback as positive or negative; the main question of this research is
therefore how to facilitate and speed up determining the polarity of feedback. We use natural language
processing (NLP) with the Python library TextBlob. The algorithm used is Naïve Bayes, which achieved an
accuracy of around 80%.
Keywords – feedback, online article, sentiment analysis
1. Introduction

These days, the number of online stores is growing very fast [1], and today almost anything can be bought
online. Online shopping can also save a lot of money, since items are often much cheaper online than in
local stores. By shopping online, we can "escape" arrogant sellers, as well as pushy sellers who follow us
around the store and "force" us to buy their products. We also save time by avoiding traffic jams and
queues at the store, and save money on parking and fuel. We can even buy things that are not available in
our city or country. For leading companies such as Amazon, Alibaba, and eBay, feedback from every user is
very important. They receive thousands of pieces of feedback a day, far too many to read and analyze
manually, which is why they need to automate the process. Understanding and analyzing feedback can improve
the user experience and the products, and can also help online shop owners identify sellers who are not
doing their job properly or who are cheating. There are also online applications for booking an
apartment, renting a car, and similar services, such as our BTT (Balkan Tourist Travel) application. This
kind of web application is now well known in our region, so we decided to create one to facilitate tourism
in Bosnia and Herzegovina. The application is intended for the tourists who visit our country in large
numbers: with a few clicks, BTT lets them book everything they need during their stay. The main goal of
the application is to avoid numerous calls and misunderstandings between locals and tourists. BTT values
feedback, so users can leave feedback on everything they have used through the application. This gives our
customers the opportunity to express their opinions, which helps us provide the best possible service. In
this research, we use the BTT application to apply and test our classification method. To keep
development manageable, we use only one part of the BTT web application, on which we perform all tests and
modifications needed to achieve the best possible results. If the results are satisfactory, we will extend
the method to the other parts of the application.
Customers' opinions are important not only to large companies but also to small companies that are just
getting started [2]. Determining whether an opinion is positive or negative should therefore be automated
as soon as possible and as well as possible. This research addresses this problem by determining whether
customers' opinions are positive or negative in a quick and simple way.
The main problem this research solves is the laborious work of reading users' opinions, whether praise or
criticism, and deciding whether each one is positive or negative, or else paying extra to hire people to do
it. Later, the system could also help identify whether a comment is spam, which would reduce the time
spent determining feedback polarity, and detect the language of a comment.
2. Literature Review

Sentiment analysis, which is used in this research, has been studied extensively in recent years. There
are many research papers on sentiment analysis; we present only those relevant to our research.
In [3], Akanksha Sharma and Dr. Ashim Saha performed a comparative study of different approaches used
for sentiment analysis of customer reviews, stating that this process helps online shop owners make the
right decisions about their items. They divided feedback into three categories: positive, negative, and
neutral. Our classification is similar, with polarity ranging from -1 to 1, where 1 represents positive,
0 neutral, and -1 negative. Their research is very similar to ours: they gathered feedback from e-shops,
analyzed it, and finally classified it. The authors considered Support Vector Machine (SVM), Naive Bayes,
the lexicon method, and other approaches, and SVM performed best among the compared methods.
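Naive Bayes is also the algorithm this study reports using (Table 1). The following Python sketch illustrates how a multinomial Naive Bayes text classifier works in general. It trains one with add-one (Laplace) smoothing and measures its accuracy. It is not the authors' implementation and not TextBlob's built-in NaiveBayesAnalyzer. The class name, tokenizer, and training sentences are hypothetical.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()

class NaiveBayes:
    """Minimal multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word frequencies per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.n_docs = len(docs)
        return self

    def log_posterior(self, doc, label):
        """Unnormalized log P(label | doc) = log P(label) + sum of log P(word | label)."""
        total = sum(self.word_counts[label].values())
        score = math.log(self.class_counts[label] / self.n_docs)
        for w in tokenize(doc):
            if w not in self.vocab:
                continue  # words never seen in training are ignored
            # Smoothing keeps an unseen (word, class) pair from producing log(0).
            score += math.log((self.word_counts[label][w] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, doc):
        return max(self.class_counts, key=lambda c: self.log_posterior(doc, c))

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical rent-a-car feedback used only to show the mechanics.
train_docs = ["great clean car", "friendly fast service", "excellent car great price",
              "dirty car bad service", "terrible slow pickup", "bad price broken seat"]
train_labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

model = NaiveBayes().fit(train_docs, train_labels)
preds = [model.predict(d) for d in ["great service", "bad dirty seat"]]
print(preds, accuracy(["pos", "neg"], preds))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied. Smoothing prevents a single unseen word from forcing a class probability to zero.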
Research paper [4] also performed sentiment analysis on customer feedback. Its author, Michael Gamon,
used over 40,000 pieces of feedback collected from two different sources, Global Support Services and
Knowledge Base surveys. The feedback was rated on a scale from 1 to 4, where 1 represented dissatisfied
and 4 very satisfied. Gamon used a linear Support Vector Machine (SVM) for feedback classification and
defined two classifications. The first determines whether feedback is rated 1 or 4; the second determines
whether it is rated 1 or 2 versus 3 or 4. He evaluated both classifications with 10-fold cross-validation
on the first 2,000 pieces of feedback in his dataset. The first classification (1 versus 4) proved more
accurate: precision was 85.47% for the first classification and 69.23% for the second.
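The following Python sketch shows how the two binary tasks can be derived from the four-point scale, and how the 10-fold cross-validation used to evaluate them splits the data. The ratings are randomly generated stand-ins, not Gamon's data, and the function names are hypothetical. The classifier itself (a linear SVM in [4]) is omitted.

```python
import random

def task_extremes(ratings):
    """Task 1: keep only ratings 1 and 4; label 1 as 0 (dissatisfied) and 4 as 1."""
    return [(i, int(r == 4)) for i, r in enumerate(ratings) if r in (1, 4)]

def task_halves(ratings):
    """Task 2: keep every rating; label ratings 1-2 as 0 and ratings 3-4 as 1."""
    return [(i, int(r >= 3)) for i, r in enumerate(ratings)]

def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation over n examples."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # shuffle once so folds do not follow input order
    start = 0
    for f in range(k):
        size = n // k + (1 if n % k > f else 0)  # spread the remainder over the first folds
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size

# Hypothetical ratings standing in for the first 2,000 labelled feedback entries.
rng = random.Random(42)
ratings = [rng.randint(1, 4) for _ in range(2000)]

for name, data in [("1 vs 4", task_extremes(ratings)), ("1-2 vs 3-4", task_halves(ratings))]:
    folds = list(k_fold_indices(len(data), k=10))
    # A classifier would be trained on each `train` split and scored on the matching `test` split.
    print(f"{name}: {len(data)} examples, {len(folds)} folds")
```

The first task discards ratings 2 and 3 entirely, which is one reason it is easier: the two remaining classes are the most clearly separated.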
Prashali et al. [5] collected the data for their classification from the Kaggle website. The data were in
Excel format and contained 186 pieces of feedback. The goal of their research was to find ways to improve
a teaching and learning program, and their dataset consisted of students' feedback on that program. As in
our research, their results range from -1 to 1, where 1 represents positive, 0 neutral, and -1 negative.
Notably, they used the polarity score from sentiment analysis to classify feedback as positive, negative,
or neutral.
In [6], Swati N. Manke and Nitin Shivale discuss how owners of online stores should analyze all the
feedback they receive as quickly as possible, since it affects their further business and their
cooperation with the sellers in their shop. Bots can fraudulently manipulate the star ratings of items in
online shops, so feedback must be analyzed using natural language processing (NLP). In this way, false
feedback can be deleted and the received feedback analyzed quickly. The authors classify their results
into two categories, positive and negative.
In [7], Peter D. Turney applied semantic orientation to determine whether reviews are positive or
negative. He used 410 reviews from four different domains (banks, automobiles, movies, and travel
destinations) and an unsupervised learning algorithm to classify them as positive or negative. His
algorithm achieved an average accuracy of 74%, ranging from 84% on automobile reviews to 66% on movie
reviews. This large gap arises because the meaning of some words depends on context: an adjective that is
negative in the automobile domain may be positive in the movie domain. For example, "unpredictable" has a
negative meaning when describing a car but a positive one when describing a movie. To classify a review as
positive or negative, Turney followed three steps:
● Extract phrases that contain adjectives or adverbs,
● Estimate the semantic orientation of each extracted phrase,
● Classify the review as positive or negative according to the average semantic orientation of its
phrases.
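In [7], the semantic orientation in step 2 is computed with PMI-IR. Hit counts from search-engine queries that use the NEAR operator compare how often a phrase co-occurs with the reference word "excellent" versus the reference word "poor". The Python sketch below assumes those hit counts are already available. All counts are hypothetical, and the small constant added to avoid division by zero is an assumption of this sketch.

```python
import math

def semantic_orientation(near_excellent, near_poor, hits_excellent, hits_poor, eps=0.01):
    """SO-PMI from search hit counts:
    SO = log2((hits(phrase NEAR "excellent") * hits("poor"))
              / (hits(phrase NEAR "poor") * hits("excellent")))
    eps is added to the NEAR counts so a zero count cannot cause division by zero."""
    return math.log2(((near_excellent + eps) * hits_poor)
                     / ((near_poor + eps) * hits_excellent))

def classify_review(phrase_hits, hits_excellent, hits_poor):
    """phrase_hits maps each extracted phrase to its (NEAR "excellent", NEAR "poor") counts.
    The review is positive when the average SO of its phrases is above zero."""
    scores = [semantic_orientation(e, p, hits_excellent, hits_poor)
              for e, p in phrase_hits.values()]
    average = sum(scores) / len(scores)
    return ("positive" if average > 0 else "negative"), average

# Hypothetical hit counts for phrases extracted from one car review.
review_phrases = {
    "unpredictable steering": (2, 30),
    "smooth ride": (40, 5),
    "comfortable seats": (25, 10),
}
label, average = classify_review(review_phrases, hits_excellent=1000, hits_poor=1000)
print(label, round(average, 2))
```

Because each phrase is scored against reference words rather than a labelled training set, the method needs no training data. This is why Turney's algorithm counts as unsupervised.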

In [8], Joachim Büschken and Greg M. Allenby analyzed the text of reviews written by users together with
the number of stars the users gave. They tested their model, which is based on Latent Dirichlet Allocation
(LDA), on a restaurant dataset and a hotel dataset, each containing reviews and star ratings. The
restaurant dataset contains 696 samples (review and star rating) from different Italian restaurants. The
hotel data were collected from two hotels, one in New York and one near JFK airport: 3,212 samples from
the New York hotel and 1,255 from the second hotel, for a total of 4,467 hotel reviews. The authors
conclude that a bag-of-sentences representation is better suited than bag-of-words for analyzing user
reviews.
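The sketch below builds both representations compared in [8] from one hypothetical hotel review. A bag-of-words pools all words of a review. A bag-of-sentences keeps each sentence as a separate unit. As we understand the model in [8], this lets all words of a sentence share a single topic. Only the representations are shown, not the LDA model itself, and the function names are hypothetical.

```python
import re
from collections import Counter

def bag_of_words(text):
    """All tokens pooled into one multiset; sentence boundaries are lost."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def bag_of_sentences(text):
    """One token multiset per sentence, so each sentence stays a separate unit."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [Counter(re.findall(r"[a-z]+", s.lower())) for s in sentences]

review = "The room was clean. The staff was rude. The breakfast was great."
print(bag_of_words(review))
for sentence_bag in bag_of_sentences(review):
    print(sentence_bag)
```

In the pooled representation, "rude" and "great" sit side by side. In the sentence-level one, "rude" stays attached to "staff", which is the kind of structure a sentence-based topic model can exploit.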
In [9], Saleem Abuleil and Khalid Alsamara analyze user feedback using natural language processing (NLP).
The authors consider feedback in two formats: ratings (structured data) and text (unstructured data).
Their research was applied to feedback written in Arabic, in which adjectives describe another person or
thing in the sentence. The authors convert unstructured data (text) into structured (numerical) data and
categorize their results into two classes, positive and negative feedback.
In [10], the authors measure customer satisfaction using sentiment analysis. For classification they used
a support vector machine (SVM) sentiment classifier, mainly because SVM gave the best results in [11].
The dataset was collected through the Twitter API and contained the following:
● Likes (lists of users who liked a specific tweet)
● Followers (lists of users who follow a specific user)
● Mentions (lists of users mentioned in a specific tweet)
● Replies (lists of replies to a specific tweet), and
● Retweets (lists of users who retweeted a specific tweet)

The data were stored in a MySQL database management system. The authors classified their results into
two classes, positive and negative, and their algorithm achieved a precision of around 87%.
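The reviewed studies report their results as either precision or accuracy, and these are not the same measure. Precision is the share of feedback predicted as positive that really is positive. Accuracy is the share of all predictions that are correct. A minimal Python sketch with hypothetical labels:

```python
def confusion_counts(y_true, y_pred, positive="positive"):
    """Count true/false positives and negatives, treating one class as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return tp, fp, fn, tn

def precision(y_true, y_pred, positive="positive"):
    """Share of predicted positives that are truly positive: TP / (TP + FP)."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fp) if tp + fp else 0.0

def accuracy(y_true, y_pred):
    """Share of all predictions that are correct ((TP + TN) / total for two classes)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical gold labels and classifier output for five tweets.
gold = ["positive", "positive", "negative", "negative", "positive"]
pred = ["positive", "negative", "positive", "negative", "positive"]
print(precision(gold, pred), accuracy(gold, pred))
```

On these five examples, precision (2 of 3 predicted positives correct) and accuracy (3 of 5 predictions correct) already differ. Figures reported under different measures are therefore not directly comparable.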
3. Methods and Materials

The data used in this research are taken from the BTT web application. The application has several
feedback sections; this research is based on feedback from the rent-a-car section, which currently
contains more than 1,000 pieces of feedback. The amount of data tested depends on the number of feedback
entries in the application: when new feedback is added, the system covers it automatically the next time
it runs. We use only car feedback from the BTT web application, taken from an HTML page that shows only
the feedback, without the other table attributes used for business logic. The table contains four
columns (ID, name, carID, and feedback), as shown in the figure below. We do not use ID, name, and carID,
since they say nothing about whether feedback is positive or negative, so only the feedback column
remains.

Figure 1. Feedback in MySQL
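Selecting only the feedback column is a single SQL projection. The sketch below uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL. The table name car_feedback and the rows are hypothetical; only the column names come from the text.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE car_feedback (ID INTEGER PRIMARY KEY, name TEXT, carID INTEGER, feedback TEXT)"
)
conn.executemany(
    "INSERT INTO car_feedback (name, carID, feedback) VALUES (?, ?, ?)",
    [
        ("Amina", 7, "The car was clean and comfortable."),
        ("John", 3, "This car is not good."),
        ("Lea", 7, "Pickup took too long."),
    ],
)

# Project only the feedback column: ID, name and carID carry no information
# about whether a piece of feedback is positive or negative.
feedback = [row[0] for row in conn.execute("SELECT feedback FROM car_feedback ORDER BY ID")]
conn.close()
print(feedback)
```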
As mentioned above, we use only the feedback column of the table. Figure 2 shows the feedback as
displayed on the HTML web page from which we take it.

Figure 2. Feedback on HTML page
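Since the feedback is read from an HTML page, it first has to be separated from the surrounding markup. The sketch below uses Python's standard html.parser module. It assumes, hypothetically, that each piece of feedback sits in a paragraph element whose class attribute is "feedback"; the real BTT markup may differ.

```python
from html.parser import HTMLParser

class FeedbackExtractor(HTMLParser):
    """Collect the text of paragraph elements with class="feedback" (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.in_feedback = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs, e.g. [("class", "feedback")]
        if tag == "p" and ("class", "feedback") in attrs:
            self.in_feedback = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_feedback = False

    def handle_data(self, data):
        # Text inside nested inline tags (e.g. bold) is still added to the current item.
        if self.in_feedback:
            self.items[-1] += data

def extract_feedback(html_text):
    """Return the non-empty feedback strings found in an HTML page."""
    parser = FeedbackExtractor()
    parser.feed(html_text)
    parser.close()
    return [item.strip() for item in parser.items if item.strip()]
```

Calling extract_feedback on the page source returns one plain string per feedback entry, with the tags already removed. The result is ready for the preprocessing described in the next subsection.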
A. Data preprocessing

Since this research works with text, determining whether a given text is positive or negative requires the
text to be analyzed and processed first. The system works only with English text. We apply natural
language processing (NLP) to analyze and process the feedback, using the Python programming language and
its TextBlob library for the whole process. TextBlob is used to determine whether a given piece of
feedback is positive or negative. It is a Python library for basic NLP tasks such as sentiment analysis,
translation, and language detection, and it lets us treat text objects much like regular Python strings
[12]. The processing steps applied in this research include removing HTML tags, removing non-letter
characters, removing whitespace and empty elements, lowercasing, tokenization, and spell checking with
correction of misspelled words. Removing stopwords is commonly used to reduce the number of words and make
classification as accurate as possible [13]. However, inspecting NLTK's stopword list shows that it is not
always a good idea to remove stopwords. For example, the stopword "not" can reverse the meaning of a
sentence: after stopword removal, "This car is not good" becomes "car good". The original sentence is
negative, while the sentence after removing stopwords is positive. Removing stopwords does not always
change the meaning of a sentence, but because it can, it is worth inspecting the sentences in the dataset
before removing stopwords.
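Several of the listed preprocessing steps can be chained in a few lines of Python. The sketch below covers removing non-letters, lowercasing, removing whitespace and empty elements, and tokenization, assuming HTML has already been removed. Spell correction is left out to keep the sketch dependency-free; TextBlob offers a correct() method for this. The function name is hypothetical.

```python
import re

def preprocess(text):
    """Clean one piece of feedback and return its tokens (HTML assumed already removed)."""
    text = re.sub(r"[^A-Za-z\s]", " ", text)  # replace non-letters (digits, punctuation) with spaces
    text = text.lower()                        # lowercase so "Good" and "good" match
    return text.split()                        # split() also drops extra whitespace and empty tokens

print(preprocess("The car was GREAT, 10/10!!   Would rent again."))
```

The non-letter filter splits contractions such as "isn't" into "isn" and "t". A real pipeline may therefore want to expand contractions first, since the negations they carry matter for polarity.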
Figure 3 shows an example of how removing stopwords can change the meaning of a sentence.

Figure 3. Example of removing stopwords
Figure 4 shows that removing stopwords can sometimes cause problems. The first sentence has a polarity of
-35 (that is, -0.35 on the scale from -1 to 1), meaning it is negative; after stopwords are removed, the
meaning changes and the polarity becomes 70 (0.70), meaning the sentence is positive.

Figure 4. Polarity result before and after removing stopwords
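The effect shown in Figures 3 and 4 can be reproduced with a toy lexicon-based scorer. This is not TextBlob's implementation. The stopword subset, the lexicon values, and the -0.5 negation factor are hypothetical, chosen so that the toy numbers match the -35 and 70 (i.e., -0.35 and 0.70) reported for Figure 4. The last line shows one possible remedy: keep negations while removing the other stopwords.

```python
# Small, hypothetical subset of an English stopword list; NLTK's full English
# list also contains "this", "is" and "not".
STOPWORDS = {"this", "is", "not", "the", "a", "an", "it", "was"}
NEGATIONS = {"not", "no", "never"}
# Hypothetical lexicon; 0.7 for "good" and the -0.5 negation factor are assumptions.
LEXICON = {"good": 0.7, "bad": -0.7, "clean": 0.4, "dirty": -0.5}

def remove_stopwords(tokens, keep=frozenset()):
    """Drop stopwords, except any listed in `keep` (e.g. negations)."""
    return [t for t in tokens if t not in STOPWORDS or t in keep]

def polarity(tokens):
    """Average lexicon score; a word right after a negation is scaled by -0.5."""
    scores, negated = [], False
    for t in tokens:
        if t in NEGATIONS:
            negated = True
            continue
        if t in LEXICON:
            scores.append(LEXICON[t] * (-0.5 if negated else 1.0))
        negated = False  # the negation only affects the next word
    return sum(scores) / len(scores) if scores else 0.0

sentence = "this car is not good".split()
print(polarity(sentence))                                    # negative
print(polarity(remove_stopwords(sentence)))                  # "car good": positive
print(polarity(remove_stopwords(sentence, keep=NEGATIONS)))  # "car not good": negative again
```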
4. Results

After applying the above processing steps to each piece of feedback taken from the BTT web application,
we determine whether it is positive or negative. This is where sentiment analysis from the TextBlob library
comes in. From TextBlob's sentiment analysis we use the polarity score, which ranges from -1 to 1, where
-1 indicates very negative feedback and 1 very positive feedback. Figure 5 shows the implementation of
TextBlob's sentiment polarity and the resulting polarity scores.

Figure 5. Implementation and result of TextBlob's sentiment property

As Figure 5 shows, the raw result is not very readable: it lists only the polarity scores, without showing
which feedback each score belongs to. We therefore merged the polarity scores with the feedback to make
the result more readable and understandable. Figure 6 shows the combined feedback and polarity scores.

Figure 6. Feedbacks' polarity result
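Merging scores with feedback, as in Figure 6, amounts to keeping each text next to its score. The Python sketch below also maps each polarity to a class. The feedback, the scores, and the optional neutral_band threshold are hypothetical. In practice the scores would come from TextBlob's sentiment.polarity.

```python
def label(polarity, neutral_band=0.0):
    """Map a polarity in [-1, 1] to a class; scores within +/-neutral_band count as neutral."""
    if polarity > neutral_band:
        return "positive"
    if polarity >= -neutral_band:
        return "neutral"
    return "negative"

# Hypothetical feedback and polarity scores.
feedback = ["The car was clean and comfortable.", "This car is not good.", "Pickup at 9 am."]
scores = [0.5, -0.35, 0.0]

# Keep each piece of feedback next to its score and class so the output is readable.
rows = [(text, score, label(score)) for text, score in zip(feedback, scores)]
for text, score, cls in rows:
    print(f"{score:+.2f}  {cls:8}  {text}")
```

A non-zero neutral_band is one way to avoid calling near-zero scores positive or negative. Its value would have to be tuned on labelled feedback.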
The table below shows the total accuracy of our algorithm.
Table 1. Result
Algorithm | Approximate result
Naïve Bayes | ~80%

5. Discussion

Compared with the related research papers reviewed in Section 2, we found that determining whether
feedback is positive or negative is much faster and easier with the Python library TextBlob. As mentioned
before, it is not always a good idea to remove stopwords from the text, since doing so can change the
meaning of sentences. In some studies, the Naïve Bayes algorithm did not give the best result. Possible
causes include a large dataset with unnecessary samples or information, removed stopwords, improper
preprocessing, and so on. Table 2 summarizes the algorithms and results of several previous studies.

Table 2. Conspectus of previous works
Author(s) | Algorithm | Result
Joachim Büschken and Greg M. Allenby | LDA (Latent Dirichlet Allocation) | 60-70%
Michael Gamon | SVM (Support Vector Machine), two classes | 85.47% for the first class, 69.23% for the second class
Al-Otaibi Shaha, Alnassar Allulo, Alshahrani Asma, Al-Mubarak Amany, Albugami Sara, Almutiri Nada, Albugami Aisha | SVM (Support Vector Machine) | Around 87%
Peter D. Turney | PMI-IR (Pointwise Mutual Information and Information Retrieval) | Around 74%

6. Conclusion

In conclusion, the feedback was divided into two groups, positive and negative. As on every website,
feedback helps first-time visitors to an online shop determine which products are of good quality. In this
research, we showed that removing stopwords is not always a good idea, because it can change the meaning
of a sentence. The research also makes it easier for online shop owners to determine which feedback is
positive and which is negative, so that they can easily recognize quality sellers. In the near future, we
plan to extend this research by adding 'minuses': a seller receives a minus for every negative piece of
feedback on their items, which lets us isolate bad sellers with bad items. If a seller receives a certain
number of minuses, they will be warned; if a seller's item receives a certain number of minuses, it will
be deleted automatically. We also plan to implement a method for recognizing whether feedback is spam.
This step will run before sentiment analysis, since we want to analyze only real feedback, and it will
speed up the process because large amounts of spam feedback will not be analyzed. Methods for translating
feedback in other languages into English will also be added. This research will be open source so that
any company or person can use it, provided they run a shop that receives feedback.
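The planned 'minus' mechanism could look roughly like the following Python sketch, together with the spam filter that runs before sentiment analysis. The thresholds, the spam check, and the review tuples are all hypothetical, since the text only speaks of "a certain number" of minuses.

```python
from collections import Counter

# Hypothetical thresholds; the text does not fix these numbers.
SELLER_WARNING_THRESHOLD = 3
ITEM_DELETION_THRESHOLD = 2

def apply_minuses(reviews, is_spam=lambda text: False):
    """reviews: iterable of (seller, item, text, polarity) tuples.
    Returns (sellers to warn, items to delete)."""
    seller_minuses, item_minuses = Counter(), Counter()
    for seller, item, text, polarity in reviews:
        # Spam is filtered out before sentiment is used; non-negative feedback earns no minus.
        if is_spam(text) or polarity >= 0:
            continue
        seller_minuses[seller] += 1
        item_minuses[item] += 1
    warned = {s for s, n in seller_minuses.items() if n >= SELLER_WARNING_THRESHOLD}
    deleted = {i for i, n in item_minuses.items() if n >= ITEM_DELETION_THRESHOLD}
    return warned, deleted

reviews = [
    ("seller_a", "car_1", "Not good at all.", -0.35),
    ("seller_a", "car_1", "Dirty seats.", -0.5),
    ("seller_a", "car_2", "The air conditioning was broken.", -0.4),
    ("seller_b", "car_3", "Great car!", 0.8),
    ("seller_b", "car_3", "BUY CHEAP WATCHES", -0.2),
]
# Toy spam check (hypothetical): treat all-caps text as spam.
warned, deleted = apply_minuses(reviews, is_spam=str.isupper)
print(warned, deleted)
```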

REFERENCES

[1] S. CK and G. Edwin, "Online Shopping - An Overview," June 2014. [Online]. Available:
https://www.researchgate.net/publication/264556861_Online_Shopping_-_An_Overview.
[2] A. Fundin and B. Bergman, "Exploring the customer feedback process," June 2003. [Online]. Available:
https://www.researchgate.net/publication/240260148_Exploring_the_customer_feedback_process.
[3] A. Sharma and A. Saha, "A Comparative Study of Different Approaches Used for Sentiment Analysis from
Customer Reviews," 14 Dec 2018. [Online]. Available:
https://poseidon01.ssrn.com/delivery.php?ID=5751021240710700920281070710870230680370490040060050301200900180960751141181190270711060980510290180320020670021111090060881061220260940480650751111250880150870891260690020340740060171160050860910251130010930131.
[4] M. Gamon, "Sentiment classification on customer feedback data: Noisy data, large feature vectors, and
the role of linguistic analysis," January 2004. [Online]. Available:
https://www.researchgate.net/publication/215470705_Sentiment_classification_on_customer_feedback_data_Noisy_data_large_feature_vectors_and_the_role_of_linguistic_analysis.
[5] S. S. Prashali, R. K. Asmita, S. P. Rutuja, and U. W. Yamini, "Sentiment Analysis of Feedback Data,"
March 2019. [Online]. Available: https://www.ijtsrd.com/papers/ijtsrd23090.pdf.
[6] S. N. Manke and N. Shivale, "A Review on Opinion Mining and Sentiment Analysis based on Natural
Language Processing," International Journal of Computer Applications, pp. 29-32, 2015.
[7] P. D. Turney, "Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification
of Reviews," July 2002. [Online]. Available: https://www.aclweb.org/anthology/P02-1053.pdf.
[8] J. Büschken and G. M. Allenby, "Sentence-Based Text Analysis for Customer Reviews," 2016. [Online].
Available: https://www.ku.de/fileadmin/160102/WiSe2015_2016/mksc.2016.0993-ePDF3.pdf.
[9] S. Abuleil and K. Alsamara, "Using NLP Approach for Analyzing Customer Reviews," 2018. [Online].
Available: https://www.slideshare.net/cscpconf/using-nlp-approach-for-analyzing-customer-reviews-86265367.
[10] S. Al-Otaibi, A. Alnassar, A. Alshahrani, A. Al-Mubarak, S. Albugami, N. Almutiri, and A. Albugami,
"Customer Satisfaction Measurement using Sentiment Analysis," International Journal of Advanced Computer
Science and Applications, pp. 106-117, 2018.
[11] J. Brynielsson, F. Johansson, C. Jonsson, and A. Westling, "Emotion classification of social media
posts for estimating people's reactions to communicated alert messages during crises," 2014. [Online].
Available:
https://docplayer.net/11592731-Emotion-classification-of-social-media-posts-for-estimating-peoples-reactions-to-communicated-alert-messages-during-crises.html.
[12] S. Loria, "textblob Documentation," 26 April 2020. [Online]. Available:
https://buildmedia.readthedocs.org/media/pdf/textblob/latest/textblob.pdf.
[13] S. Bird, E. Klein, and E. Loper, Natural Language Processing with Python. O'Reilly, 2009.

</text>
                  </elementText>
                </elementTextContainer>
              </element>
            </elementContainer>
          </elementSet>
        </elementSetContainer>
      </file>
    </fileContainer>
    <collection collectionId="3">
      <elementSetContainer>
        <elementSet elementSetId="1">
          <name>Dublin Core</name>
          <description>The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/.</description>
          <elementContainer>
            <element elementId="50">
              <name>Title</name>
              <description>A name given to the resource</description>
              <elementTextContainer>
                <elementText elementTextId="26245">
                  <text>Journal of Natural Sciences and Engineering</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="43">
              <name>Identifier</name>
              <description>An unambiguous reference to the resource within a given context</description>
              <elementTextContainer>
                <elementText elementTextId="26605">
                  <text>2637-2835</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="98">
              <name>DOI</name>
              <description>Digital object identifier</description>
              <elementTextContainer>
                <elementText elementTextId="26606">
                  <text>10.14706</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="45">
              <name>Publisher</name>
              <description>An entity responsible for making the resource available</description>
              <elementTextContainer>
                <elementText elementTextId="26607">
                  <text>International Burch University</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="41">
              <name>Description</name>
              <description>An account of the resource</description>
              <elementTextContainer>
                <elementText elementTextId="26608">
                  <text>Journal of Natural Sciences and Engineering (JONSAE) is a peer-reviewed, biannually published international journal focusing on empirical and theoretical research in all branches of Engineering and Natural Sciences. It is published on behalf of the Faculty of Engineering and Natural Sciences of International Burch University and aims to provide the best content by publishing original research papers, review articles, special issues, feature articles, and book reviews. All manuscript submissions are subject to initial appraisal by the Editor and, if found suitable for further consideration, to peer review by independent, anonymous referees. All peer review is double-blind, and submission is online. The journal welcomes theoretical, applied, interdisciplinary, and methodological work, with a preference for empirical research, a critical approach, and problem-solving methods in manuscripts.</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="44">
              <name>Language</name>
              <description>A language of the resource</description>
              <elementTextContainer>
                <elementText elementTextId="26609">
                  <text>English</text>
                </elementText>
              </elementTextContainer>
            </element>
          </elementContainer>
        </elementSet>
      </elementSetContainer>
    </collection>
    <elementSetContainer>
      <elementSet elementSetId="1">
        <name>Dublin Core</name>
        <description>The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/.</description>
        <elementContainer>
          <element elementId="50">
            <name>Title</name>
            <description>A name given to the resource</description>
            <elementTextContainer>
              <elementText elementTextId="26572">
                <text>Feedback System Using Sentiment Analysis</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="96">
            <name>Author</name>
            <description>Author</description>
            <elementTextContainer>
              <elementText elementTextId="26573">
                <text>Abdulrahman Almonajed, Dino Kečo</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="94">
            <name>Abstract</name>
            <description>A summary of the resource.</description>
            <elementTextContainer>
              <elementText elementTextId="26574">
                <text>Today, when judging the quality of an online item, feedback plays a very important role. Based on feedback, we can decide whether the desired item is good or not, form a picture of the seller, and so on. Many companies with online shops display the most positive feedback while hiding the negative, or display only a few reviews. In this research, we help people by automating the decision of whether feedback is positive or negative, which frees time for other work and saves the cost of hiring people to process feedback. Since feedback on online articles is so important today, determining whether it is positive or negative should be as quick and easy as possible. We therefore present a very simple and fast way to classify feedback as positive or negative; the main question of this research is how to facilitate and speed up the process of determining the polarity of feedback. We use natural language processing (NLP) with the Python library TextBlob. The algorithm used is Naïve Bayes, which gave an accuracy of around 80%.</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="97">
            <name>Keywords</name>
            <description>Keywords.</description>
            <elementTextContainer>
              <elementText elementTextId="26575">
                <text>feedback, online article, sentiment analysis</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="43">
            <name>Identifier</name>
            <description>An unambiguous reference to the resource within a given context</description>
            <elementTextContainer>
              <elementText elementTextId="26576">
                <text>2637-2835</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="98">
            <name>DOI</name>
            <description>Digital object identifier</description>
            <elementTextContainer>
              <elementText elementTextId="26577">
                <text>10.14706/JONSAE2021319</text>
              </elementText>
            </elementTextContainer>
          </element>
        </elementContainer>
      </elementSet>
    </elementSetContainer>
  </item>
  <item itemId="3502" public="1" featured="0">
    <fileContainer>
      <file fileId="4318">
        <src>https://eprints.ibu.edu.ba/files/original/173680acbb933aed28bb44102ca00405.pdf</src>
        <authentication>0d0fc48c2c249e7e9811fba7c9bc6847</authentication>
        <elementSetContainer>
          <elementSet elementSetId="4">
            <name>PDF Text</name>
            <description/>
            <elementContainer>
              <element elementId="52">
                <name>Text</name>
                <description/>
                <elementTextContainer>
                  <elementText elementTextId="26564">
                    <text>Journal of Natural Sciences and Engineering, Vol. 2, No.2 (2020)
DOI number: 10.14706/JONSAE2021311

Understanding Forms and Models of Cloud Computing Technologies Adopted in the
Selected Institutions in Southwestern Nigeria
Gbonjubola Oluwafunmilayo BINUYO¹
¹ African Institute for Science Policy and Innovation, Obafemi Awolowo University, Nigeria
gobinuyo@gmail.com
Abstract - The study examined the forms and models of cloud computing technology adopted in selected institutions from four states in Southwestern Nigeria. In each state, three institutions (one federal, one state and one privately owned) were purposively selected, giving twelve institutions in total. A questionnaire was administered to ten (10) IT personnel, ten (10) lecturers and five (5) students from each of the selected institutions, a total of 300 respondents. The questionnaire elicited information on the forms and models of cloud computing technology adopted and on the extent of use of the adopted cloud computing technologies in the selected institutions. Secondary data were obtained from relevant literature. The data collected were analysed with descriptive and inferential statistics. The study concludes that the forms of cloud computing technology adopted by the selected institutions in Southwestern Nigeria are infrastructure-as-a-service (IaaS), software-as-a-service (SaaS) and platform-as-a-service (PaaS), with SaaS the most used. The cloud computing models adopted by the selected institutions are private, public, hybrid and community clouds. The adopted forms and models of cloud computing technology are used for business functions such as payroll, procurement, human resources, accounting and finance, CRM, application development and project management.
Keywords - Cloud computing, Institutions, Nigeria
1. Introduction

The aim of this study is to explicate the forms and models of cloud computing technology adopted in the selected institutions in Southwestern Nigeria, and to determine the extent to which these forms are used and the business functions deployed on them.
Scholars have defined cloud computing from different perspectives. Cloud computing relies on subscription services that provide access to networked storage space and computing resources [1]. By implication, it is a paid service for obtaining online information and communications technology services. As cited in [1], not all establishments are leapfrogging to cloud computing technologies, especially established institutions in developing countries such as Nigeria [2].
Globally, higher institutions face the challenge of attaining the level of information and communications technology (ICT) required to support good-quality education and R&amp;D activities, especially in developing countries [3]. The annual education report of the Republic of Yemen, for example, indicates that the education sector faces obstacles to delivering quality education to the populace. These obstacles include inadequate infrastructure, insufficient budget allocation to ICT, and a lack of ICT technical and teaching personnel [4].
At present, most activities are conducted online, including online document editing and writing, email, online interaction and collaboration. It is therefore imperative for educational systems worldwide to keep pace with advances in ICT in order to deliver quality education [3]. Moreover, given the high cost of providing and maintaining the necessary hardware and software, educational systems need to adopt low-cost advanced technology such as cloud computing. Cloud computing addresses the high cost of the computer software and hardware needed to deliver quality education by providing ICT resources on a pay-per-use basis [3].
There have been diverse empirical studies on cloud computing technologies adopted in institutions [5-11], as well as some theoretical reviews of the same phenomenon [4, 12-15]. However, scholars have noted a dearth of empirical studies on cloud computing technology in institutions, especially Nigerian institutions [13, 15, 16]. There is also little information on the forms and models of cloud computing technology adopted in Nigerian universities, because cloud computing research is nascent in Nigeria [16]; hence the need for this study.
The remainder of this paper is organised as follows: a review of related literature, the research method, the results and discussion, and the conclusion and recommendations.

2. Literature Review

There is increasing empirical research interest in cloud computing in both developing and developed economies. This interest has driven vast intellectual and financial investment in cloud R&amp;D [16]. Cloud computing can be classified by service model and by deployment model [16-18].
(a) Cloud computing as a service model: the service models are Software, Platform and Infrastructure as a Service [17]. They are discussed below:
(i) Software-as-a-Service (SaaS) has been defined as a distribution model in which applications run on the provider's servers, users access them over the Internet, and customers are charged per usage [18]. In other words, it is a remote online application that users access over the network with a simple web browser [18]. In general, SaaS refers to any online (cloud) service that users can access remotely or subscribe to and pay for on a per-usage basis. Such services include accounting, invoicing, performance monitoring, communications, sales tracking and planning, among others. Using SaaS is thus like renting software rather than purchasing it [18]. Unlike traditional software, whose licence limits the number of devices that can use it, SaaS gives users the opportunity to subscribe to the software instead of purchasing it.
(ii) Platform-as-a-Service (PaaS) allows clients to hire software, hardware, repository and network capacity over the Internet. PaaS is of great interest to application developers because it makes changes and upgrades to the features of the operating system in use easy, and it allows an application to be developed by developers distributed across different geographical locations and international boundaries. Costs can be reduced by using infrastructure services from a single cloud computing service provider rather than owning and maintaining several hardware facilities that often perform identical functions. Examples of PaaS include Salesforce, IBM Bluemix, CloudBees and Microsoft Azure, among others.
(iii) Infrastructure-as-a-Service (IaaS): This service delivery model enables clients to rent the equipment used in service operations and to control the deployed applications, operating systems and so on. Consequently, updating and patching the operating system at the IaaS level is the responsibility of the users within the contractual period [19].
(b) Cloud computing as a deployment model comprises public, private, community and hybrid clouds [17, 20]. These models are discussed below:
(i) Public cloud: Most common cloud computing services follow the public cloud deployment model because, as the name implies, they are publicly and openly available. Although such services can also exist in private clouds, SaaS offerings such as cloud storage and online office applications, and IaaS and PaaS offerings such as cloud-based web application development environments and hosting, are associated with the public cloud model. Public clouds are also deployed when organisations or individuals do not require the level of infrastructure and security present in the private cloud model [21]. Accordingly, large organisations or enterprises may still deploy public clouds where privacy is not required, such as online document collaboration, webmail or storage of non-sensitive documents.
(ii) Private cloud: A private cloud does not allow cloud resources to be shared with unknown third parties. It is also known as an internal cloud, strictly for the internal use of an establishment [22]. Private cloud resources may be located on or off the organisation's premises; either way they are dedicated to the organisation, so this model does not offer the benefit of reduced investment or expenditure on IT infrastructure or equipment.
(iii) Community cloud: This model serves a group of users within an organisation who share a common goal [23]. IT resources are provided as a service to the group to enable elastic, collaborative use of computing resources. It is often limited to a selected set of employees within an organisation, such as a security department, heads of department, or a team or sub-unit.
(iv) Hybrid cloud: This model combines two different deployment models drawn from the public, private and community models. Organisations often combine two differing models into a hybrid cloud to maximise efficiency. In a hybrid cloud, the combined clouds retain their identities but are bound together by standardised or proprietary technology [24].
Beyond the service and deployment models, [16] measured the contribution of Nigerian scholars to the number and impact of cloud computing studies. Content analysis and bibliometric analysis were applied to papers extracted from the Scopus database for the specified year and country (2016 and Nigeria). The analysis showed that most cloud computing studies in Nigeria focus on education and the SaaS model of cloud computing [16]. In support of that finding, [11] studied the effects and challenges of adopting cloud computing technology in government-owned universities in Southwestern Nigeria. In that study, one hundred (100) IT (information technology) personnel, fifty (50) para-IT personnel and fifty (50) students, two hundred (200) respondents in all, were selected from each of ten (10) universities using stratified sampling and surveyed by questionnaire. Of the two thousand (2,000) questionnaires administered, one thousand seven hundred and forty-two (1,742) were retrieved, a response rate of 87.1%. The data were analysed descriptively with Microsoft Excel. The outcome of the study implies that the adoption of cloud computing has an important effect on enhanced availability, cost effectiveness, low environmental impact and reduced investment in physical assets, among others. However, the main challenges to cloud use were data insecurity, regulatory compliance concerns, lock-in and privacy concerns.
Cloud computing offers efficient and optimised IT services at least cost, enabled by paying cloud service providers on a pay-as-you-use (PAYU) basis [3]. Cloud computing has other benefits, among them a high return on investment [25]. Despite these benefits, many sectors, especially higher education, are skeptical about adopting cloud computing technology [3, 25].
On the contrary, cloud computing technology is being widely adopted by higher institutions, mainly for financial reasons [4]. Beyond financial reasons, the reasons IT managers or decision makers adopt cloud computing can be attributed to organisational, environmental, technological and individual factors [4]. Cloud computing is a feasible way of meeting an organisation's technological needs efficiently and effectively, with reduced investment in physical assets and minimal environmental impact and IT complexity [1, 11].
[1] examined the behavioural intention to adopt cloud computing technology in large and small organisations using an Enhanced Technology Acceptance Model (ETAM). [1] concluded that attitude and adopters’ ability to use cloud computing (self-efficacy) were the better predictors of intention to adopt cloud computing technology. Perceived usefulness and perceived ease of use were the better predictors of attitude towards adopting cloud computing technology, while perceived ease of use and the relevance of cloud computing to adopters’ work (job relevance) predicted perceived usefulness.
Recently, [15] systematically reviewed empirical studies on cloud computing technologies. The review showed that empirical studies on cloud computing technology seldom address cloud computing usage/utilisation. It also identified challenges and benefits attributed to cloud computing adoption, and showed that universities in the selected area are willing to adopt cloud computing technologies. Earlier, [14] concluded from a review of the literature on cloud computing adoption in organisations that the factors determining the adoption of cloud computing technologies vary. [14] further noted that most of the reviewed studies operationalised adoption as a binary intention to adopt cloud computing rather than as actual use of the technology. Similarly, [13] showed, from a systematic literature review of empirical studies on cloud computing adoption in universities, that several universities have used different types of cloud computing service models.
[25] examined the perceptions of IT and non-IT personnel of the factors associated with the poor adoption of cloud computing technologies in African enterprises, with Nigeria as a case study. The study concluded that fear of the unknown, such as job loss, cyber threats, privacy issues and data theft, hindered the adoption of cloud computing technology. In addition, [26] showed that top management support, competitive pressure and compatibility are factors associated with the adoption of cloud computing technologies.
Based on the aforementioned studies, this paper adopts the Technology Acceptance Model (TAM) as the framework for its analysis. TAM explains technology adoption through perceived usefulness, perceived ease of use and attitude towards using the technology [27]. These three constructs are key determinants of technology adoption. First, perceived usefulness (PU) holds that people use or do not use a technology depending on how useful they perceive it to be. Second, perceived ease of use (PEOU) reflects the extent to which potential users believe that using a given technology requires little effort. Third, a user's attitude towards a technology is a major determinant of whether the user will actually use or reject the innovation [27]. On that basis, the following research method was adopted for this study.
3. Research Method

This study used a multi-stage sampling technique for data collection. Four states were randomly selected from the six in Southwestern Nigeria. Three institutions (universities) were purposively selected from each of the selected states. The purposive selection ensured one federal, one state and one privately owned university from each of the four selected states, making twelve universities in total. A questionnaire was then administered to personnel in the selected institutions: ten (10) IT personnel, ten (10) lecturers and five (5) students from each institution, making three hundred (300) respondents. Institutions were selected on the criterion that they use cloud computing technologies, while respondents within the institutions were purposively selected on referral as personnel with expertise in the subject matter.
The questionnaire elicited information on the forms and models of cloud computing technology adopted. Respondents were asked to tick the forms and models of cloud computing adopted in their institutions. The forms of cloud computing considered in this study were Software-as-a-Service (SaaS), Platform-as-a-Service (PaaS) and Infrastructure-as-a-Service (IaaS), while the models were private, public, hybrid and community clouds. Respondents then rated the extent of use of the adopted cloud computing technologies in their institutions on a five-point scale: no use (A), little use (B), moderate use (C), highly use (D) and often use (E), where A is the lowest and E the highest. Respondents were further asked to indicate (multiple responses allowed) the cloud computing technologies deployed in their institutions: Gmail-Based Institution Email Service, Dropbox, Docusign, Skydrive, Netsuite, Cisco-WebEx, Amazon Elastic or Web Services, Learning Management Systems (LMS), Microsoft Azure Cloud, Integrated Development Environments (IDEs), Cloud based APIs, and Cloud based .NET Platforms. In addition, respondents rated the extent of use of the adopted cloud computing technologies for each business function on a five-point scale: not applicable (A), little use (B), moderate use (C), highly use (D) and often use (E), where A is the lowest and E the highest. The business functions were payroll, application development, project management, accounting and financing, CRM/sales management, procurement, human resources, and messaging and collaboration. The data collected were analysed with descriptive statistics, namely frequencies and cross-tabulation.

4. Results and Discussion
Table 1 presents the three categories of institutions selected for this study: federal, state and privately owned institutions. It also shows the number of questionnaires administered to each category and the number retrieved. Of the three hundred (300) questionnaires administered, 169 (56.3%) were retrieved and used in the analysis; all percentages in Table 1 are expressed relative to the 300 questionnaires. Since, according to [16], most cloud computing studies in Nigeria focus on education and the SaaS model of cloud computing, this study further contributes to that body of work.
Categories of the institutions   Questionnaire Administered     Questionnaire Retrieved
                                 Frequency    Percentage        Frequency    Percentage
Federal owned institution        100          33.3              57           19
State owned institution          100          33.3              63           21
Private owned institution        100          33.3              49           16.3
Total                            300          100               169          56.3
Table 1: Number of Institutions Selected
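The arithmetic behind Table 1 can be checked with a short Python sketch. It rebuilds the 300 questionnaires from the sampling design in Section 3 (four states, three universities per state, 25 respondents per university) and computes each retrieval percentage over all 300 questionnaires, which is how the printed figures are reproduced. The identifiers (STATES, pct and so on) are hypothetical, and the within-category return rate is an extra figure the table does not report.

```python
# Sketch: reproduce the Table 1 percentages from the sampling design in Section 3.
# Assumption: both percentage columns in Table 1 use the 300 administered
# questionnaires as their base. All identifiers here are hypothetical.

STATES = 4                      # states drawn from the six in Southwestern Nigeria
UNIVERSITIES_PER_STATE = 3      # one federal, one state, one private
RESPONDENTS_PER_UNIVERSITY = 10 + 10 + 5  # IT personnel, lecturers, students

# One university of each ownership type per state.
ADMINISTERED_PER_CATEGORY = STATES * RESPONDENTS_PER_UNIVERSITY
TOTAL_ADMINISTERED = STATES * UNIVERSITIES_PER_STATE * RESPONDENTS_PER_UNIVERSITY

retrieved = {"Federal": 57, "State": 63, "Private": 49}


def pct(count, base):
    """Return count as a percentage of base, rounded to one decimal place."""
    return round(100 * count / base, 1)


# Shares of all 300 questionnaires, as printed in Table 1.
table1 = {name: pct(n, TOTAL_ADMINISTERED) for name, n in retrieved.items()}
total_rate = pct(sum(retrieved.values()), TOTAL_ADMINISTERED)

# Return rate within each category of 100; not reported in the paper.
within_category = {name: pct(n, ADMINISTERED_PER_CATEGORY) for name, n in retrieved.items()}

if __name__ == "__main__":
    print("Administered:", TOTAL_ADMINISTERED, "per category:", ADMINISTERED_PER_CATEGORY)
    print("Table 1 shares:", table1, "total:", total_rate)
    print("Within-category return rates:", within_category)
```

Run as written, it gives the Table 1 shares of 19.0, 21.0 and 16.3 (56.3 in total) and, for comparison, return rates of 57%, 63% and 49% within each ownership category of 100 questionnaires.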

Table 2 presents the forms and models of cloud computing technology adopted in the selected institutions. Most (78.3%) of the institutions have adopted software-as-a-service, while 65.1% and 54.3% have adopted platform-as-a-service and infrastructure-as-a-service respectively. This adoption of forms of cloud computing corroborates previous reports on the forms of cloud computing technology adopted in institutions [17, 28-30]. Adopting these technologies should reduce the selected institutions' operating costs, such as the costs of keeping hardware and storage facilities and of maintenance.
Regarding the models of cloud computing technology, Table 2 further shows that the selected institutions have adopted private cloud (53.5%), public cloud (54.3%), hybrid cloud (51.9%) and community cloud (51.2%) computing. This is in line with previous scholars' accounts of the models of cloud computing technology adopted by institutions [20-23, 31]. This study also corroborates the finding of [13] that several universities have used different types of cloud computing service models. By implication, universities in the study area adopted different forms and models of cloud computing at their own discretion and in response to cost reduction, need, the industrial revolution, technology push and demand, among others. Consistent with the theory adopted for this study, the selected universities appear to have adopted cloud computing technology on the basis of perceived usefulness, perceived ease of use and user attitude towards the technology, the elements of the Technology Acceptance Model [27].

Table 2: Forms and Models of Cloud Computing Technology Adopted
Characteristics                           Frequency    Percent (%)
Forms of Cloud Computing
  Software-as-a-Service (SaaS)            101          78.3
  Platform-as-a-Service (PaaS)            84           65.1
  Infrastructure-as-a-Service (IaaS)      70           54.3
Models of Cloud Computing
  Private Cloud                           69           53.5
  Public Cloud                            70           54.3
  Hybrid Cloud                            67           51.9
  Community Cloud                         66           51.2
*Multiple responses allowed
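Because respondents could tick several options, each percentage in Table 2 is a share of respondents rather than of answers, so a column need not sum to 100. The Python sketch below shows the calculation. The respondent base is not stated in the paper; a base of 129 is assumed here only because it reproduces every printed percentage, and it should be read as an inference, not a reported figure. The names BASE and pct are hypothetical.

```python
# Sketch: multiple-response percentages as in Table 2.
# Assumption: every count is divided by one common respondent base.
# BASE = 129 is inferred because it reproduces each printed percentage;
# the paper itself does not report this figure.

BASE = 129

forms = {"SaaS": 101, "PaaS": 84, "IaaS": 70}
models = {"Private": 69, "Public": 70, "Hybrid": 67, "Community": 66}


def pct(count, base=BASE):
    """Share of respondents ticking an option, in percent to one decimal."""
    return round(100 * count / base, 1)


form_pct = {name: pct(n) for name, n in forms.items()}
model_pct = {name: pct(n) for name, n in models.items()}

# A respondent may tick several options, so a column can exceed 100 in total.
forms_total = round(sum(form_pct.values()), 1)

if __name__ == "__main__":
    print(form_pct, model_pct, forms_total)
```

Under this assumed base, the three forms of cloud computing together total 197.7%, which is expected for a multiple-response item.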
Table 3 presents the level of institutional use of the forms of cloud computing technology adopted by the selected institutions. For infrastructure-as-a-service, the largest share (38.8%) of the selected institutions that adopted it use it moderately, followed by 24.8% that use it highly. For software-as-a-service, most of the selected institutions use it moderately (34.9%) or highly (32.6%). For platform-as-a-service, most of the selected institutions make little use (26.4%) or moderate use (41.1%) of it.
By implication, software-as-a-service (SaaS) is the most heavily used form of cloud computing in the selected institutions in Southwestern Nigeria, since it has the largest combined share of high and often use (levels D and E). This may be due to the nature of SaaS, which covers any cloud service that users can access remotely or subscribe to and pay for on a per-usage basis [18]. SaaS services that can be subscribed to or used remotely include accounting, invoicing, performance monitoring, communications, sales tracking and planning [18]. This study also corroborates [16], which found that most cloud computing studies in Nigeria focus on the SaaS model of cloud computing.

Table 3: Level of Institutional Use of Cloud Computing Technology
Forms of cloud computing    Level of cloud computing usage (%)
                            A       B       C       D       E
IaaS                        14      7       38.8    24.8    0.8
SaaS                        1.6     14      34.9    32.6    3.9
PaaS                        10.9    26.4    41.1    3.9     1.6
*Multiple responses allowed
Key: A = No use; B = Little use; C = Moderate use; D = Highly use; E = Often use
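The reading that SaaS is the most heavily used form can be made explicit by comparing, for each form, the combined share of respondents at the two highest levels (D and E). The Python sketch below does this from the Table 3 figures; grouping D and E is an illustrative choice, not a measure defined in the paper, and all identifiers are hypothetical. It also reports each row's modal level and row total.

```python
# Sketch: summarise Table 3 by the share of respondents at levels D and E.
# Grouping D (highly use) with E (often use) is an illustrative assumption.

LEVELS = ("A", "B", "C", "D", "E")  # no, little, moderate, highly, often use

table3 = {
    "IaaS": (14.0, 7.0, 38.8, 24.8, 0.8),
    "SaaS": (1.6, 14.0, 34.9, 32.6, 3.9),
    "PaaS": (10.9, 26.4, 41.1, 3.9, 1.6),
}


def high_use_share(row):
    """Percentage in the two top levels (D and E), to one decimal."""
    return round(row[3] + row[4], 1)


def modal_level(row):
    """Scale letter with the largest percentage in the row."""
    return LEVELS[max(range(len(row)), key=lambda i: row[i])]


high_use = {form: high_use_share(row) for form, row in table3.items()}
ranking = sorted(high_use, key=high_use.get, reverse=True)
modes = {form: modal_level(row) for form, row in table3.items()}

# Row totals; anything short of 100 is not accounted for in the table.
row_totals = {form: round(sum(row), 1) for form, row in table3.items()}

if __name__ == "__main__":
    print(high_use, ranking, modes, row_totals)
```

The combined D + E shares come to 36.5% for SaaS, 25.6% for IaaS and 5.5% for PaaS, while moderate use (C) is the modal level for all three forms. The rows total 85.4%, 87.0% and 83.9%, so part of the sample is not accounted for in Table 3.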
Table 4 shows the cloud computing technologies adopted by the selected institutions in the study area. The most widely adopted are cloud-based APIs (55.8%), cloud-based .NET platforms (51.9%), Cisco-WebEx (48.8%), integrated development environments (IDEs) (43.4%) and Amazon Elastic or Web Services (31.8%). Other cloud computing technologies adopted by the institutions include Gmail-Based Institution Email Service (26.4%), Microsoft Azure Cloud (18.6%), Learning Management Systems (LMS) (16.3%), Skydrive (12.4%), Netsuite (8.5%), Dropbox (7.8%) and Docusign (0.8%). This shows that the selected institutions have adopted cloud computing technologies to some degree. Perhaps the need to adopt low-cost advanced technology such as cloud computing led the selected institutions to adopt these technologies. [3] had earlier postulated that cloud computing technologies address the high cost of the computer software and hardware needed to deliver quality education by providing ICT resources on a pay-per-use basis. By implication, the selected institutions adopted cloud computing technologies in order to provide high-quality education that is affordable and accessible at least cost to the stakeholders in the institutions.
Table 4: Cloud Computing Technology Adopted by the Selected Institutions
Characteristics                                Frequency    Percent (N=111)
Gmail-Based Institution Email Service          34           26.4
Dropbox                                        10           7.8
Docusign                                       1            0.8
Skydrive                                       16           12.4
Netsuite                                       11           8.5
Cisco-WebEx                                    63           48.8
Amazon Elastic or Web Services                 41           31.8
Learning Management Systems (LMS)              21           16.3
Microsoft Azure Cloud                          24           18.6
Integrated Development Environments (IDEs)     56           43.4
Cloud based APIs                               72           55.8
Cloud based .NET Platforms                     67           51.9
*Multiple responses allowed
Table 5 shows the extent of use of cloud computing technology in the business functions of the selected institutions in the study area. For payroll, the selected institutions report high use (30.2%) and often use (11.6%) of cloud computing technology. For application development, they report high use (34.1%) and often use (25.6%). For project management, they report moderate use (25.6%) and high use (22.5%). For accounting and financing, the most common response is moderate use (33.3%). For CRM/sales management, the institutions report little use (27.9%) and moderate use (31.8%). For procurement, the most common response is moderate use (39.5%), as it is for human resources (37.2%). Lastly, for messaging and collaboration, the institutions report little use (34.9%) and moderate use (32.6%).
By implication, the payroll functions of the selected institutions have been digitised and can be carried out from anywhere in the world (telecommuting). The selected institutions have also deployed cloud computing technologies in their project management, accounting and financing, CRM/sales management, procurement, human resources, and messaging and collaboration functions.
Table 5: Extent of Use of Cloud Computing Technology in Business Function
Business function              Extent of use of cloud computing technology (%)
                               A       B       C       D       E
Payroll                        17.8    9.3     18.6    30.2    11.6
Application Development        10.1    7       8.5     34.1    25.6
Project Management             16.3    15.5    25.6    22.5    3.9
Accounting and Financing       17.1    24      33.3    7       0.8
CRM/Sales Management           21.7    27.9    31.8    3.1     -
Procurements                   22.5    21.7    39.5    2.3     -
Human Resources                20.2    23.3    37.2    3.9     1.6
Messaging and Collaboration    11.6    34.9    32.6    7       3.1
*Multiple responses allowed
Key: A = Not applicable; B = Little use; C = Moderate use; D = Highly use; E = Often use

5. Conclusion

The study concludes that the forms of cloud computing technology adopted by the selected institutions in Southwestern Nigeria are infrastructure-as-a-service (IaaS), software-as-a-service (SaaS) and platform-as-a-service (PaaS), with SaaS the most used by the institutions. The cloud computing models adopted by the selected institutions in Southwestern Nigeria are private, public, hybrid and community clouds. The adopted forms and models of cloud computing technology are used for business functions such as payroll, procurement, human resources, accounting and finance, CRM, application development and project management.

6. Limitations and future work

This study is limited to universities in Southwestern Nigeria; further studies could consider universities
across the whole of Nigeria. The study did not consider the factors influencing the adoption of cloud
computing technologies; further studies may consider them. The study used only a quantitative method of data
collection and descriptive analysis; further studies may consider a mixed-methods approach to data collection
and analysis.

7. Acknowledgement

The author appreciates the contributions of the scholars who, in one way or another, contributed to the
scholarship of this paper.
REFERENCES
[1] O. T. Arogundade et al., “Investigation of Factors Affecting Cloud Computing Adoption in Nigeria,” Journal of Natural Science, Engineering and Technology, 2016, 15(2), 73-94.
[2] A. Ume, A. Bassey, and H. Ibrahim, “Impediments facing the introduction of cloud computing among organizations in developing countries: Finding the answer,” Asian Transactions on Computers, 2012, 2, 12-20.
[3] S. Okai, M. Uddin, A. Arshad, R. Alsaqour, and A. Shah, “Cloud Computing Adoption Model for Universities to Increase ICT Proficiency,” SAGE, 2014, 1-10. DOI: 10.1177/2158244014546461
[4] S. Abdulnoor, M. D. Sulfeeza, and M. S. Siti, “Empirical Studies on Cloud Computing Adoption: A Systematic Literature Review,” Journal of Theoretical and Applied Information Technology, 2017, 6809-6832.
[5] N. Sultan, “Cloud Computing for Education: A New Dawn?” International Journal of Information Management, 2010, 30, 109-116.
[6] T. Ercan, “Effective Use of Cloud Computing in Educational Institutions,” Procedia Social and Behavioral Sciences, 2010, 2, 938-942.
[7] M. Mircea and A. Andreescu, “Using Cloud Computing in Higher Education: A Strategy to Improve Agility in the Current Financial Crisis,” IBIMA, 2011, 1-15. DOI: 10.5171/2011.875547
[8] F. E. Mehmet and B. K. Serhat, “Cloud Computing for Distributed University Campus,” International Conference on the Future of Education, Pixel Publishing International, 2011.
[9] Y. G. Abdulsalam and U. Z. Fatima, “Cloud Computing: Solution to ICT in Higher Education in Nigeria,” Advances in Applied Science Research, 2011, 2(6), 364-369, Pelagia Research Library.
[10] J. Anjali and U. S. Pandey, “Role of Cloud Computing in Higher Education,” International Journal of Advanced Research in Computer Science and Software Engineering, 2013, 3(7), 966-972.
[11] C. A. Oyeleye, T. M. Fagbola, and C. Y. Daramola, “The Impact and Challenges of Cloud Computing Adoption on Public Universities in Southwestern Nigeria,” International Journal of Advanced Computer Science and Applications (IJACSA), 2014, 5(8), 13-19.
[12] S. O. Olabiyisi, T. M. Fagbola, and R. S. Babatunde, “An Exploratory Study of Cloud and Ubiquitous Computing Systems,” World Journal of Engineering and Pure and Applied Sciences, 2012, 2(5), 148-155.
[13] M. S. Ibrahim, N. Salleh, and S. Misra, “Empirical Studies of Cloud Computing in Education: A Systematic Literature Review,” Springer International Publishing Switzerland, 2015, 725-737. DOI: 10.1007/978-3-319-21410-8_55
[14] H. Hassan, M. H. Mohd-Nasir, and N. Khairudin, “Cloud Computing Adoption in Organisations: Review of Empirical Literature,” SHS Web of Conferences, 2017, 34, 1-6. DOI: 10.1051/shsconf/20173402001
[15] M. B. Ali, T. Wood-Harper, and M. R. A. Mohamad, “Benefits and Challenges of Cloud Computing Adoption and Usage in Higher Education,” Stanford University, 2018, 1-22. DOI: 10.4018/IJEIS.2018100105
[16] A. A. Ezenwoke and E. Igbekele, “Cloud Computing Research in Nigeria: A Bibliometric and Content Analysis,” Asian Journal of Scientific Research, 2019, 12(1), 41-53.
[17] M. Ahronovitz, D. Amrhein, P. Anderson, A. Andrade, “Cloud Computing Use Cases White Paper,” 4th ed., 2010. Available: http://www.cloud-council.org/Cloud_Computing_Use_Cases_Whitepaper-4_0.pdf, accessed 4th November, 2020.
[18] K. Hashizume, “An Analysis of Security Issues for Cloud Computing,” Journal of Internet Services and Applications, 2012, 4(5), 3-13.
[19] M. Murphy, L. Abraham, M. Fenn, and S. Goasguen, “Autonomic Clouds on the Grid,” Journal of Grid Computing, 2009, 1-18.
[20] D. Catteddu and G. Hogben, “Cloud Computing: Benefits, risks and recommendations for information security,” 2009, 3-11.
[21] A. Mansour, “The Adoption of Cloud Computing Technology in Higher Education Institutions: Concerns and Challenges (Case Study of Islamic University of Gaza),” 2013.
[22] Q. Zhang, L. Cheng, and R. Boutaba, “Cloud Computing: State-of-the-art and Research Challenges,” Journal of Internet Services and Applications, 2010, 1(1), 7-18.
[23] K. Sharma, S. Thakur, A. Kalia, J. Thakur, and S. Kumar, “Emerging Cloud Computing Paradigm: Vision, Research Challenges and Development Trends,” International Journal of Research in Engineering and Technology, 2014, 3(5), 11-34. eISSN: 2319-1163, ISSN: 2321-7308.
[24] Cloud Security Alliance (CSA), “Security Guidance for Critical Areas of Focus in Cloud Computing V2.1,” 2009, 2-7.
[25] G. A. Oguntala, R. A. Abd-Alhameed, and J. O. Odeyemi, “Systematic Analysis of Enterprise Perception Towards Cloud Adoption in the African States: The Nigerian Perspective,” African Journal of Information Systems, 2017, 9(4), 213-231.
[26] S.-K. Yoo and B.-Y. Kim, “A decision-making model for adopting a cloud computing system,” Sustainability, 2018, 1-15. DOI: 10.3390/su10082952
[27] F. D. Davis, “A technology acceptance model for empirically testing new end-user information systems: Theory and results,” Doctoral dissertation, MIT Sloan School of Management, Cambridge, MA, 1985.
[28] P. Buxmann, L. Sonja, and H. Thomas, “Software as a Service,” WIRTSCHAFTSINFORMATIK, 2008, 50(6), 500-503.
[29] M. Anandarajan and B. Arinze, “Factors that Determine the Adoption of Cloud Computing: A Global Perspective,” International Journal of Enterprise Information Systems (IJEIS), 2010, 6(4), 55-68.
[30] R. Miller, “Understanding the Different Levels of Cloud Computing,” 2011. Available: http://www.businessservicemanagementhub.com/2011/03/16/understanding-the-different-levels-of-Cloud-computing/, accessed 7th October, 2020.
[31] F. Shimba, “Cloud Computing: Strategies for Cloud Computing Adoption,” Master's dissertation, School of Computing, Dublin Institute of Technology, Dublin, 2010.

</text>
                  </elementText>
                </elementTextContainer>
              </element>
            </elementContainer>
          </elementSet>
        </elementSetContainer>
      </file>
    </fileContainer>
    <collection collectionId="3">
      <elementSetContainer>
        <elementSet elementSetId="1">
          <name>Dublin Core</name>
          <description>The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/.</description>
          <elementContainer>
            <element elementId="50">
              <name>Title</name>
              <description>A name given to the resource</description>
              <elementTextContainer>
                <elementText elementTextId="26245">
                  <text>Journal of Natural Sciences and Engineering</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="43">
              <name>Identifier</name>
              <description>An unambiguous reference to the resource within a given context</description>
              <elementTextContainer>
                <elementText elementTextId="26605">
                  <text>2637-2835</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="98">
              <name>DOI</name>
              <description>Digital object identifier</description>
              <elementTextContainer>
                <elementText elementTextId="26606">
                  <text>10.14706</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="45">
              <name>Publisher</name>
              <description>An entity responsible for making the resource available</description>
              <elementTextContainer>
                <elementText elementTextId="26607">
                  <text>International Burch University</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="41">
              <name>Description</name>
              <description>An account of the resource</description>
              <elementTextContainer>
                <elementText elementTextId="26608">
                  <text>Journal of Natural Sciences and Engineering (JONSAE) is a peer-reviewed, biannually published international journal focusing on empirical and theoretical research in all branches of Engineering and Natural Sciences. It is published on behalf of the Faculty of Engineering and Natural Sciences of International Burch University and aims to provide the best content by publishing original research papers, review articles, special issues, feature articles, and book reviews. All manuscript submissions are subject to initial appraisal by the Editor and, if found suitable for further consideration, to peer review by independent, anonymous referees. All peer review is double-blind and submission is online. The journal welcomes theoretical, applied, interdisciplinary and methodological work, with a preference for empirical research, critical approaches and problem-solving methods in manuscripts.</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="44">
              <name>Language</name>
              <description>A language of the resource</description>
              <elementTextContainer>
                <elementText elementTextId="26609">
                  <text>English</text>
                </elementText>
              </elementTextContainer>
            </element>
          </elementContainer>
        </elementSet>
      </elementSetContainer>
    </collection>
    <elementSetContainer>
      <elementSet elementSetId="1">
        <name>Dublin Core</name>
        <description>The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/.</description>
        <elementContainer>
          <element elementId="50">
            <name>Title</name>
            <description>A name given to the resource</description>
            <elementTextContainer>
              <elementText elementTextId="26565">
                <text>Understanding Forms and Models of Cloud Computing Technologies Adopted in the&#13;
Selected Institutions in Southwestern Nigeria&#13;
</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="96">
            <name>Author</name>
            <description>Author</description>
            <elementTextContainer>
              <elementText elementTextId="26566">
                <text>Gbonjubola Oluwafunmilayo Binuyo</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="94">
            <name>Abstract</name>
            <description>A summary of the resource.</description>
            <elementTextContainer>
              <elementText elementTextId="26567">
                <text>The study examined the forms and models of cloud computing technology adopted in the&#13;
selected institutions from four states in Southwestern Nigeria. In each state, three institutions (Federal,&#13;
State and Private owned) were purposively selected, making twelve institutions. The administered&#13;
questionnaire was completed by ten (10) IT personnel, ten (10) lecturers and five (5) students from each&#13;
of the selected institutions, making 300 respondents. The questionnaire elicited information on the forms&#13;
and models of cloud computing technology adopted and the extent of use of the adopted cloud computing&#13;
technologies in the selected institutions. Secondary data were obtained from relevant literature. Data&#13;
collected were analysed with descriptive and inferential statistics. The study concludes that the forms of&#13;
cloud computing technology adopted by the selected institutions in Southwestern Nigeria are&#13;
infrastructure-as-a-service (IaaS), software-as-a-service (SaaS) and platform-as-a-service (PaaS), with&#13;
software-as-a-service (SaaS) being used often by the institutions. Also, the models of cloud computing&#13;
technology adopted by the selected institutions in Southwestern Nigeria are private, public, hybrid and&#13;
community cloud. The adopted forms and models of cloud computing technology are used for&#13;
different business functions such as payroll, procurement, human resources, accounting and finance,&#13;
CRM, application development, and project management.&#13;
</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="97">
            <name>Keywords</name>
            <description>Keywords.</description>
            <elementTextContainer>
              <elementText elementTextId="26568">
                <text>Cloud computing, Institutions and Nigeria&#13;
</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="43">
            <name>Identifier</name>
            <description>An unambiguous reference to the resource within a given context</description>
            <elementTextContainer>
              <elementText elementTextId="26569">
                <text>2637-2835</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="98">
            <name>DOI</name>
            <description>Digital object identifier</description>
            <elementTextContainer>
              <elementText elementTextId="26570">
                <text>10.14706/JONSAE2021318&#13;
</text>
              </elementText>
            </elementTextContainer>
          </element>
        </elementContainer>
      </elementSet>
    </elementSetContainer>
  </item>
  <item itemId="3501" public="1" featured="0">
    <fileContainer>
      <file fileId="4317">
        <src>https://eprints.ibu.edu.ba/files/original/4185b962c7b2e1090b65243b0dbbab63.pdf</src>
        <authentication>c9e429afa68c8df8f76a72e2686eb35b</authentication>
        <elementSetContainer>
          <elementSet elementSetId="4">
            <name>PDF Text</name>
            <description/>
            <elementContainer>
              <element elementId="52">
                <name>Text</name>
                <description/>
                <elementTextContainer>
                  <elementText elementTextId="26557">
                    <text>Journal of Natural Sciences and Engineering, Vol. 2, No.2 (2020)
DOI number: 10.14706/JONSAE2021311

Contemporary housing trends in Sarajevo
Emina Mehic1
1-International Burch University, Sarajevo, Bosnia and Herzegovina
emina.mehic@stu.ibu.edu.ba
Abstract – Over the last 20 years, the urban population of Sarajevo has increased significantly as a
result of economic and social migration. This has created a growing demand for new housing, which is
mainly profit-oriented and has no beneficial social, environmental or cultural impact. The primary
objective of this research is to analyze the current situation and to assess the quality of the buildings
not only as housing solutions, but as complexes that unite the communities who inhabit them. The
research is qualitative, complemented by a statistical approach to data on urbanization, building
standards and the positive effects of the buildings. Newly built parts of the Otoka and Stup settlements
are used as case studies, since these parts of the city are the most influenced by the mass production of
new housing. This paper highlights the correlation between high demand for new housing and the
decreased quality of housing that does not respect minimum spatial and environmental standards and
lacks basic amenities, social infrastructure, and recreational and cultural activities. Improvements in
contemporary housing design are needed to create positive social, environmental, economic and cultural
impacts on urban living.
Keywords - Contemporary housing trends, qualitative analysis, Otoka, Stup
1. Introduction

The City of Sarajevo is becoming a large construction site, with more and more residential and other
buildings being built. Over the past few years, entire residential settlements have appeared rapidly. The
parts of the city affected the most are Otoka and Stup. The most characteristic housing solutions are the
residential settlements Stup Nukleus, a newly built residential and business complex in Stup, municipality
of Ilidža, and Nova Otoka in Otoka, municipality of Novi Grad.
With the urbanization of Sarajevo extending rapidly, it is not surprising that more and more investors are
seeking opportunities for profit. To understand why interest is so high in these specific parts of the city,
the history and urban plans of Sarajevo give a more precise point of view.
Otoka is a settlement in Sarajevo, the capital city of Bosnia and Herzegovina, located in the municipality
of Novi Grad. It borders Buća Potok (north), Čengić vila (east), Aneks (south-east) and Švrakino Selo
(south). Its residential core is a chain of high-rise buildings (Žrtava Fašizma, Brčanska and Aleja Lipa
streets). [2]
The majority of residential buildings in this part of the city were built by the government in the early
1970s, when Otoka was considered one of the most prominent, modern and cleanest parts of suburban
Sarajevo. The residential design of this part of the city was also advanced compared with other buildings
of the time. As shown in Figure 1, these buildings were built during the socialist regime, when significant
attention was paid to the environmental aspects of the settlement, with designated areas for parks,
elementary schools, preschools and shopping. [1] The residential settlements were originally built on the
left bank of the Miljacka river, which before the 1970s was mainly empty fields. There were no plans for
extensive construction on the other side of the river, since the idea was to maintain the Otoka Meandar as
the green “lungs” of the city, containing recreational areas and walking paths. The area to the north,
between two major traffic axes, Bulevar Meše Selimovića and Džemala Bijedića street, was treated as an
industrial site.
After the war of the 1990s, new buildings were erected in the Meandar area. “Stadium Otoka” was built in
1993 and was upgraded and renovated in 2011. The “Istiklal Mosque” was built in 2001, and beside these
two, the Vistafon multipurpose hall and the Olympic pool, two large-scale projects, were built in this
period. Even though these are mainly sport and recreational buildings that provide social interaction and
entertainment opportunities, the green lungs of the city were seriously jeopardized. Meanwhile, the
industrial zones slowly started decaying as market needs and industry demand changed. The industrial
companies that owned the area were destroyed in the shady privatization processes that followed the war,
and the industries that survived the war and privatization were relocated outside the city. This created
an opportunity to transform the entire industrial zone into a residential settlement. [6] [7]

Examining the urbanization plans, we can conclude that the first residential zone was expected to have a
maximum height of 6-8 floors, but today building heights have almost doubled, with 12-13-story buildings.
The building blocks examined here in Nova Otoka were initially planned with a maximum height of 21 m, but
with the change of the regulatory plan in 2017 their height increased to 42 m. However, even though the
height of the buildings was increased, the distances between them and the number of associated facilities
remained the same.
Another important issue is the daily vehicular congestion in this part of the city, because Otoka, as
mentioned, is the geographical centre of the city. It is a connection point between the hill settlements
and the valley, with a tram connection and the main road. Furthermore, when the Otoka settlement was
originally built, vehicle traffic was directed by neighbourhood lanes planned in a ring around the
complex, which made conditions safer overall and decreased congestion.

Stup, shown in Figure 3, is a settlement in Sarajevo, the capital city of Bosnia and Herzegovina, located
in the municipality of Ilidža. Geographically, it lies in the western part of the city, further from the
city centre. It is bounded by the river Miljacka to the south and by the river Dobrinja to the north.
Neighbouring settlements are Briješće, Alipašin most, Alipašino Polje, Olimpijsko selo, Nedžarići, Zračna
luka Butmir, Ilidža, Pejton, Otes and Azići. This part of the city was quite rural, since it was considered
the outskirts of the city, so mainly low-rise single-family houses and industrial buildings were located
here. These were mainly owner-occupied houses, and there were no larger-scale buildings. Once the
regulatory plan was provided, the Stup area was divided into zones. One of the zones, Stup Nukleus, was
designated as a residential settlement zone comprising recreational and green areas. However, there were
multiple missteps in implementing the plan itself. The Institute for Development Planning of Sarajevo
Canton did not specify the dimensions of individual buildings, but rather provided zones for approved
buildings with corresponding site coverage and building indexes. The developer, on the other hand, chose
to ignore the regulations and building indexes and built over the entire buildable area. This has caused
very high building density; for instance, there are several cases of a 6-metre distance between two
13-story buildings. Regarding the history of Stup Nukleus, in 1992 the site was owned by a farming
cooperative. After the war, the area became privately owned. Construction of the Stup Nukleus residential
settlement began in 2011. In November 2017, the Municipality of Ilidža drafted a study on the
socio-economic justification for establishing a public institution in the Stup II settlement, which
planned for construction of the school to begin this year; this never happened. The closest school to
this settlement is currently Aleksa Šantić Elementary School, located in Aerodromsko naselje, more than
one kilometre away, and access to it is very dangerous because of frequent traffic, especially for
younger children. Regarding vehicular connections, Stup is connected to the main traffic axis, Džemala
Bijedića street, and contains one of the biggest road loops connecting the city to the main roads leading
to Mostar, Zenica and Tuzla. With this in mind, the general characteristics of both settlements can now be
combined into a detailed analysis of new construction trends and the future of building in Sarajevo. [3]

Figure 3. Stup in Yugoslavia, a spacious new settlement near the industrial zone
[www.klix.ba]

2. Methodology

The case study examines the quality, trends, potential problems and possible improvements of contemporary
housing in Sarajevo, and gathers the information relevant to this research. The results are used to give
recommendations for the design of future residential housing.

3. Case study

Urbanistic criteria:
Figure 4 below shows the regulatory plan of Stup Nukleus. Based on the urban typology and the proposed
regulation plan, we can draw conclusions and identify the data relevant to evaluating the results. [9]

Figure 4. Regulation plan of Stup Nukleus [Institute for Planning Development of Sarajevo Canton]
Stup Nukleus was built in three separate phases, with the majority of it built during the first phase.
Construction started in 2001, and the first phase consisted of 5 buildings between 5 and 12 stories high.
The smallest distance between these buildings is 6 m, between a 10-story and a 7-story building, which
creates serious problems for views and daylight and results in an extreme, almost inhuman density. [4]
The buildings occupy around 7,471 m² of the 20,245 m² site, so more than a third of the site is covered by
buildings. This gives an Urban Density Index of approximately 0.3690, calculated as building footprint
divided by site area (7,471 / 20,245); strictly, this is a site coverage ratio, since a floor area ratio
would divide the total floor area of all stories by the site area. This is a high value given that the
buildings are over 10 stories high, creating an image of very high physical concentration and spatial
congestion.

Figure 5. Regulation plan of Stup Nukleus in the first phase of development
[Institute for Planning Development of Sarajevo Canton]

The second phase of Stup Nukleus development, represented in Figure 7, contained an incredible 11
buildings ranging from 6 to 13 floors high. The smallest distance between these buildings is 7.5 m. The
total area covered by the buildings is 18,455 m² out of a total site area of 51,056 m². The Urban Density
Index for the second phase (building footprint divided by site area, 18,455 / 51,056) is approximately
0.3615, which is lower than in the first phase. [3]

Figure 6. Regulation plan of Stup Nukleus in the second phase of development
[Institute for Planning Development of Sarajevo Canton]

Figure 7. Completion of Stup Nucleus I
[Tibra Pacific]
However, the situation on site is considerably worse than in the first phase, because extremely high
buildings are much more numerous and some parts of the site receive no daylight at all. In some cases
the buildings face each other so closely that they create privacy issues.

Figure 8. Construction of Stup Nucleus 2 in third phase of Stup Nucleus development
[Tibra Pacific]
The third phase shows a similar situation, as the regulatory plan below shows, and covers 4,508 m². This
phase is still in development, so it is hard to obtain an exact UDI value; it contains 3 buildings, the
highest of which is 9 floors high.

Figure 9. Regulation plan of Stup Nukleus in the third phase of development
[Institute for Planning Development of Sarajevo Canton]
On the other hand, Nova Otoka has 5 new buildings, two of which are 12 floors high; as mentioned before,
this height doubled after the change of the regulatory plan. The built-over area of Nova Otoka is
10,601 m² out of a total area of 26,930 m², and one more building, located further away rather than
between these buildings, has an area of 2,031 m². [10] The UDI in Nova Otoka is approximately 0.3937
(10,601 / 26,930). This is high, and in addition a factory occupies the rest of the field between the
buildings, which technically means that the building density here is close to 0.86. For the general size
of the site, this is high, and the buildings take up a large portion of the space.
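
The three density values quoted in this section follow from dividing building footprint by site area. The Python sketch below recomputes them from the areas given in the text as a simple arithmetic check; it computes site coverage only, because the gross floor areas needed for a true floor area ratio are not reported, and the site labels are this sketch's own.

```python
# Site coverage = building footprint area / total site area.
# Footprint and site areas (m²) are the figures quoted in the case study.
SITES = {
    "Stup Nukleus, phase 1": (7_471, 20_245),
    "Stup Nukleus, phase 2": (18_455, 51_056),
    "Nova Otoka": (10_601, 26_930),
}


def coverage_ratio(footprint_m2: float, site_m2: float) -> float:
    """Share of the site covered by building footprints (between 0 and 1)."""
    if not site_m2 > 0:
        raise ValueError("site area must be positive")
    return footprint_m2 / site_m2


if __name__ == "__main__":
    for name, (footprint, site) in SITES.items():
        ratio = coverage_ratio(footprint, site)
        print(f"{name}: {ratio:.4f} ({ratio:.1%} of the site built over)")
```

Because the ratio ignores building height, a 5-story and a 13-story block with the same footprint score identically; this is why the text pairs the index with the number of stories when judging density.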

Figure 10. Regulation plan of Otoka
[Institute for Planning Development of Sarajevo Canton]

Environmental, social and cultural criteria:
Based on the documentation and geographical analysis of the site where Stup Nukleus is located, we can
conclude that there is no park in close proximity to the complex; moreover, there are no amenities for
children, nor is any similar project planned. The closest park intended for recreation and leisure is a
25-minute walk (1.9 km) from the complex. Nova Otoka, on the other hand, has only one very small park
within the complex, and its amenities for children are quite limited. The closest larger park intended
for recreation and leisure is a 37-minute walk (3.3 km) from the complex.
Considering the social aspects of these complexes, we notice a troubling lack of care for social
interaction. Otoka is better positioned than Stup, which had no predisposition for social and cultural
facilities, whereas Otoka inherited such facilities from socialist Yugoslav construction. After careful
examination of the site, we concluded that Stup Nukleus has 5 privately owned coffee shops and 3
restaurants, which is not enough given its population and building density. [8] Apart from these private
commercial activities, there is no entertainment, recreational or cultural amenity of any sort at either of
the sites examined in this case study. [5]
Architectural criteria:
Stup Nukleus is commonly considered one of the worst complexes built in Sarajevo in the last two decades.
The main issue identified in the interviews was the insufficient distance between buildings. [11]
We selected a sample apartment from these buildings: a 2-bedroom apartment with a total area of 58 m²,
the most common and most repeated apartment type in the entire complex. Regarding the layout and room
dimensions, the living room with the kitchen and dining area is accessible from the lobby; together these
spaces total 17.80 m². This space gives access to a 9.20 m² balcony. To the left of the lobby is a
4.06 m² bathroom. The master bedroom is 14.37 m², with access to the loggia. To the right of the front door
is a 1.80 m² pantry, while a smaller 8.39 m² bedroom is accessed from the living room. Some of these
apartments are above the 7th floor, which brings us to the main disadvantage of the Stup Nukleus
buildings: the insufficient amount of natural light. This issue is closely connected to the distance
between buildings. The floors above the 7th floor do have access to natural light, while the other parts
are poorly designed and receive at most 3 hours of daylight.
Looking at the window-to-space ratio, there is a lack of windows throughout the apartment; the rooms are
small, and it is hard to fit any larger piece of furniture into them. The floor plan also shows that the
kitchen and bathroom are too small. In addition, this apartment faces north, the side that receives the
least light. Another issue is air circulation: air from the kitchen has to pass through the living and
dining room to reach the only window in the left part of the apartment.

Figure 11. A 58 m2 apartment in Stup Nukleus, an average-size apartment [www.olx.ba]
On the other hand, as mentioned before, Nova Otoka is also a project of the same construction company as
the Stup Nukleus complex, and it is considered more contemporary and of a higher standard than Stup. Since
Nova Otoka was completed only recently, we were able to find more information about the technical execution
of the construction and about the building layout itself. This apartment is located on the west side of the
complex, on the 12th floor, meaning there is just one floor above it. [3]
The building has two basement floors, a ground floor and 12 residential floors. The basement floors are
designed as parking spaces and the ground floor contains offices, while the 12 floors above the ground floor
are planned as housing units; the last 2 of these floors are two-story penthouses. Apartment sizes in this
complex vary from 32,49 m2 to 133,63 m2, with an average area of 65 m2. This apartment is located on the
first floor of the 12-story building A. It is on the south side, facing the main road, which is busy and
carries heavy traffic during the day; the Otoka settlement in particular is known as the starting point of
traffic jams, which makes vehicle concentration very high.

To make a better comparison, we selected an apartment of similar size from the “Nova Otoka” complex, with
an area of 57.16 m2. The apartment consists of a living room with a connected kitchen and dining room, with
a total area of 22,64 m2. The master bedroom, with an area of 13,15 m2, is directly next to the children's
bedroom (7,10 m2). Within the lobby, which has an area of 3,72 m2, across from the bedroom, there is a toilet
with an area of 4,24 m2. Off the living room is the balcony (6,31 m2).

Figure 12. A 57 m2 apartment in Nova Otoka, an average-size apartment [www.olx.ba]
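As a quick arithmetic check on the two sample apartments, the itemized room areas can be summed and compared with the stated totals. The sketch below uses only the areas given in the text; the dictionary labels are descriptive names chosen here, and the comparison is an illustration rather than part of the original analysis. For Nova Otoka the itemized rooms add up exactly to the stated 57.16 m2, whereas for Stup Nukleus they add up to 55.62 m2 against the stated 58 m2, the gap presumably being spaces mentioned without an area (such as the lobby and the loggia).

```python
# Room areas (m2) for the two sample apartments, as itemized in the text.
STUP_NUKLEUS = {
    "living room with kitchen and dining": 17.80,
    "balcony": 9.20,
    "bathroom": 4.06,
    "master bedroom": 14.37,
    "pantry": 1.80,
    "smaller bedroom": 8.39,
}
NOVA_OTOKA = {
    "living room with kitchen and dining": 22.64,
    "master bedroom": 13.15,
    "children's bedroom": 7.10,
    "lobby": 3.72,
    "toilet": 4.24,
    "balcony": 6.31,
}


def listed_total(rooms):
    """Sum the itemized room areas, rounded to 0.01 m2 to hide float noise."""
    return round(sum(rooms.values()), 2)


if __name__ == "__main__":
    for name, rooms, stated in (
        ("Stup Nukleus", STUP_NUKLEUS, 58.0),
        ("Nova Otoka", NOVA_OTOKA, 57.16),
    ):
        total = listed_total(rooms)
        gap = round(stated - total, 2)  # area not covered by the itemized rooms
        print(f"{name}: itemized {total} m2, stated {stated} m2, not itemized {gap} m2")
```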
Additionally, and more significantly, the shape of the rooms inside the buildings is simply not practical:
placing a bed in the middle of the room leaves only around 70 cm of accessible space. This new practice is
proving to be poor and non-functional, and residents will always lack space for wardrobes.
4. Conclusion

Evaluating the situation and the data presented above, the analysis shows that most new complexes, such as
Stup Nukleus and Nova Otoka, are built mainly for profit, without any concern for the environmental, social
or cultural benefits of such developments. There is a lack of care for smart residential building solutions,
as well as for any basic social, recreational and cultural infrastructure, resulting in an inhumane, unsocial
and quite hostile built environment without any sense of identity. A significant improvement could be
achieved by adding areas such as parks and playgrounds for children. Instead, the developers are opting for
a rather cruel profit machine that brings money exclusively to the investors.
Scale, more precisely building density and the distances between buildings, has a significant influence on
the overall quality of the studied complexes. One of the main issues, noted especially in Stup Nukleus, is an
evident lack of daylight between the buildings, particularly where two buildings are no more than 8 meters
apart. This causes privacy issues and problems with vistas, which can also lead to further psychological
issues. Regarding the regulatory plan, it is very important to state that the density and height of the
buildings do not follow any of the regulations or laws in place. The case study has shown that the layout of
the bedrooms within the buildings is highly questionable, given their position and size. The versatility,
flexibility and functionality of certain spaces, bedrooms foremost, are dubious due to their limited size. [4]
Finally, there need to be improvements, and the government must persist in executing the initially set
regulatory plans. Moreover, there is an evident need for a clear set of residential standards in terms of
room size, layout, orientation, etc. These standards should be applied as regulatory mechanisms to prevent
future mistakes. On the other hand, investors need to keep all aspects of living in mind, rather than just
providing profitable housing solutions without any amenities. Lastly, the end users of housing should be more
aware of the consequences and implications of inadequate residential settlements, instead of focusing only
on the price per m2.
5. REFERENCES

[1] Bošnjak, K. (2016, November 5). “Urbani identitet Sarajeva” [Urban identity of Sarajevo]. AABH. aabh.ba/urbani-identitet-sarajeva/
[2] Općina Novi Grad Sarajevo. (2015). www.novigradsarajevo.ba/index.php?option=com_content&amp;view=article&amp;id=17&amp;Itemid=21
[3] Canton Sarajevo. (2017). “Building Regulations and Laws for Canton Sarajevo”. propisi.ks.gov.ba
[4] Bachelard, G. (1994). The Poetics of Space. Boston: Beacon Press.
[5] Finci, J. (1962). Development of Disposition and Function in Residential Culture of Sarajevo. Sarajevo: NP Oslobodjenje.
[6] Grabrijan, D., &amp; Neidhardt, J. (1957). Architecture of Bosnia and the Way to Modernity. Ljubljana.
[7] Ernst, J. Z., Vukicevic, B., Jakulj, T., &amp; Ilich, W. (2017, August 22). Sarajevo Paradox: Survival throughout History and Life after the Balkan War. Columbia University. Retrieved from http://www.columbia.edu/cu/ece/research/intermarium/vol6no3/ernst.pdf
[8] Federalni zavod za statistiku. (n.d.). Retrieved from http://fzs.ba/index.php/popis-stanovnistva/popisstanovnistva-2013/preliminarni-rezultati-popisa-2013/
[9] PACIFIC d.o.o. Kiseljak. (2019). TIBRA. tibra-pacific.com/tibra_new/
[10] Nova Otoka. (2015, August 1). NOVA OTOKA. www.novaotoka.com/en/home.php
[11] Općina Ilidža. https://www.opcinailidza.ba/uploads/files/shares/REGULACIONI%20PLANOVI/Regulacioni%20plan%20Stup%20Nukleus.pdf

</text>
                  </elementText>
                </elementTextContainer>
              </element>
            </elementContainer>
          </elementSet>
        </elementSetContainer>
      </file>
    </fileContainer>
    <collection collectionId="3">
      <elementSetContainer>
        <elementSet elementSetId="1">
          <name>Dublin Core</name>
          <description>The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/.</description>
          <elementContainer>
            <element elementId="50">
              <name>Title</name>
              <description>A name given to the resource</description>
              <elementTextContainer>
                <elementText elementTextId="26245">
                  <text>Journal of Natural Sciences and Engineering</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="43">
              <name>Identifier</name>
              <description>An unambiguous reference to the resource within a given context</description>
              <elementTextContainer>
                <elementText elementTextId="26605">
                  <text>2637-2835</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="98">
              <name>DOI</name>
              <description>Digital object identifier</description>
              <elementTextContainer>
                <elementText elementTextId="26606">
                  <text>10.14706</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="45">
              <name>Publisher</name>
              <description>An entity responsible for making the resource available</description>
              <elementTextContainer>
                <elementText elementTextId="26607">
                  <text>International Burch University</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="41">
              <name>Description</name>
              <description>An account of the resource</description>
              <elementTextContainer>
                <elementText elementTextId="26608">
                  <text>Journal of Natural Sciences and Engineering (JONSAE) is a peer-reviewed, biannually published international journal focusing on empirical and theoretical research in all branches of Engineering and Natural Sciences. It is published on behalf of the Faculty of Engineering and Natural Sciences of International Burch University and aims to provide the best content by publishing original research papers, review articles, special issues, feature articles, and book reviews. All manuscript submissions are subject to initial appraisal by the Editor and, if found suitable for further consideration, to peer review by independent, anonymous referees. All peer review is double-blind and submission is online. The journal welcomes theoretical, applied, interdisciplinary and methodological work, with a preference for empirical research, a critical approach and problem-solving methods in manuscripts.</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="44">
              <name>Language</name>
              <description>A language of the resource</description>
              <elementTextContainer>
                <elementText elementTextId="26609">
                  <text>English</text>
                </elementText>
              </elementTextContainer>
            </element>
          </elementContainer>
        </elementSet>
      </elementSetContainer>
    </collection>
    <elementSetContainer>
      <elementSet elementSetId="1">
        <name>Dublin Core</name>
        <description>The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/.</description>
        <elementContainer>
          <element elementId="50">
            <name>Title</name>
            <description>A name given to the resource</description>
            <elementTextContainer>
              <elementText elementTextId="26558">
                <text>Contemporary housing trends in Sarajevo</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="96">
            <name>Author</name>
            <description>Author</description>
            <elementTextContainer>
              <elementText elementTextId="26559">
                <text>Emina Mehić</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="94">
            <name>Abstract</name>
            <description>A summary of the resource.</description>
            <elementTextContainer>
              <elementText elementTextId="26560">
                <text>Within the last 20 years, Sarajevo has witnessed a significant increase in its urban
population as a result of economic and social migration. Consequently, there has been an increasing
demand for new housing, which is mainly profit-oriented and lacks any beneficial social,
environmental or cultural implications. The primary objective of this research is to analyze the current
situation and to assess the quality of the buildings not only as housing solutions, but as complexes
that unite the communities who inhabit them. The research is conducted qualitatively, through analysis
and a statistical approach applied to data related to urbanization, building standards
and the positive effects of the buildings. Newly built parts of the Otoka and Stup settlements are used as
case studies, since these parts of the city are the most influenced by the mass production of new
housing. This paper stresses the correlation between high demand for new housing
and decreased housing quality that disregards minimum spatial and environmental
standards and lacks basic amenities, social infrastructure, and recreational and cultural activities.
Contemporary housing design needs improvements that will have positive
impacts on the social, environmental, economic and cultural aspects of urban living.</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="97">
            <name>Keywords</name>
            <description>Keywords.</description>
            <elementTextContainer>
              <elementText elementTextId="26561">
                <text>Contemporary housing trends, qualitative analysis, Otoka, Stup</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="43">
            <name>Identifier</name>
            <description>An unambiguous reference to the resource within a given context</description>
            <elementTextContainer>
              <elementText elementTextId="26562">
                <text>2637-2835</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="98">
            <name>DOI</name>
            <description>Digital object identifier</description>
            <elementTextContainer>
              <elementText elementTextId="26563">
                <text>10.14706/JONSAE2021317</text>
              </elementText>
            </elementTextContainer>
          </element>
        </elementContainer>
      </elementSet>
    </elementSetContainer>
  </item>
  <item itemId="3487" public="1" featured="0">
    <fileContainer>
      <file fileId="4294">
        <src>https://eprints.ibu.edu.ba/files/original/2ec0ac41ff0e0b71ab32b474c716b5ce.pdf</src>
        <authentication>c6d7c6996b753e2359491dfe742709ae</authentication>
      </file>
    </fileContainer>
    <collection collectionId="3">
      <elementSetContainer>
        <elementSet elementSetId="1">
          <name>Dublin Core</name>
          <description>The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/.</description>
          <elementContainer>
            <element elementId="50">
              <name>Title</name>
              <description>A name given to the resource</description>
              <elementTextContainer>
                <elementText elementTextId="26245">
                  <text>Journal of Natural Sciences and Engineering</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="43">
              <name>Identifier</name>
              <description>An unambiguous reference to the resource within a given context</description>
              <elementTextContainer>
                <elementText elementTextId="26605">
                  <text>2637-2835</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="98">
              <name>DOI</name>
              <description>Digital object identifier</description>
              <elementTextContainer>
                <elementText elementTextId="26606">
                  <text>10.14706</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="45">
              <name>Publisher</name>
              <description>An entity responsible for making the resource available</description>
              <elementTextContainer>
                <elementText elementTextId="26607">
                  <text>International Burch University</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="41">
              <name>Description</name>
              <description>An account of the resource</description>
              <elementTextContainer>
                <elementText elementTextId="26608">
                  <text>Journal of Natural Sciences and Engineering (JONSAE) is a peer-reviewed, biannually published international journal focusing on empirical and theoretical research in all branches of Engineering and Natural Sciences. It is published on behalf of the Faculty of Engineering and Natural Sciences of International Burch University and aims to provide the best content by publishing original research papers, review articles, special issues, feature articles, and book reviews. All manuscript submissions are subject to initial appraisal by the Editor and, if found suitable for further consideration, to peer review by independent, anonymous referees. All peer review is double-blind and submission is online. The journal welcomes theoretical, applied, interdisciplinary and methodological work, with a preference for empirical research, a critical approach and problem-solving methods in manuscripts.</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="44">
              <name>Language</name>
              <description>A language of the resource</description>
              <elementTextContainer>
                <elementText elementTextId="26609">
                  <text>English</text>
                </elementText>
              </elementTextContainer>
            </element>
          </elementContainer>
        </elementSet>
      </elementSetContainer>
    </collection>
    <itemType itemTypeId="10">
      <name>Lesson Plan</name>
      <description>A resource that gives a detailed description of a course of instruction.</description>
    </itemType>
    <elementSetContainer>
      <elementSet elementSetId="1">
        <name>Dublin Core</name>
        <description>The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/.</description>
        <elementContainer>
          <element elementId="50">
            <name>Title</name>
            <description>A name given to the resource</description>
            <elementTextContainer>
              <elementText elementTextId="26481">
                <text>FPGA-based Implementation of IIR Filter for Real-Time Noise Reduction in Signal</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="96">
            <name>Author</name>
            <description>Author</description>
            <elementTextContainer>
              <elementText elementTextId="26482">
                <text>Aladin Kapić, Rijad Sarić, Slobodan Lubura, Dejan Jokić</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="94">
            <name>Abstract</name>
            <description>A summary of the resource.</description>
            <elementTextContainer>
              <elementText elementTextId="26483">
                <text>Filtering out unwanted frequencies is a central aspect of digital signal processing (DSP) in
any modern communication system. The main role of a filter is to attenuate certain frequencies
and pass only the frequencies of interest. In a DSP system, sampled or discrete-time signals are processed by digital
filters using different mathematical operations. Digital filters are commonly categorized as Finite Impulse
Response (FIR) or Infinite Impulse Response (IIR) filters. This research focuses on the full VHDL implementation
of a digital second-order low-pass IIR filter for reducing noise frequencies on an FPGA board. The initial
step is to determine the transfer function in the complex s-domain from the continuous-time domain function,
then map the transfer function to the complex z-domain, and finally calculate the difference equation of the system
in the discrete-time domain with adequate coefficients. Prior to the FPGA implementation, the IIR filter is tested in
MATLAB using a signal with mixed frequencies and a signal with randomly generated noise. The digital
implementation is completed using fixed-point binary vectors and clocked processes.</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="97">
            <name>Keywords</name>
            <description>Keywords.</description>
            <elementTextContainer>
              <elementText elementTextId="26484">
                <text>digital signal processing; IIR filter; digital design; FPGA; VHDL; Bode diagram</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="43">
            <name>Identifier</name>
            <description>An unambiguous reference to the resource within a given context</description>
            <elementTextContainer>
              <elementText elementTextId="26485">
                <text>2637-2835</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="98">
            <name>DOI</name>
            <description>Digital object identifier</description>
            <elementTextContainer>
              <elementText elementTextId="26486">
                <text>10.14706/JONSAE2021316</text>
              </elementText>
            </elementTextContainer>
          </element>
        </elementContainer>
      </elementSet>
    </elementSetContainer>
  </item>
  <item itemId="3480" public="1" featured="0">
    <fileContainer>
      <file fileId="4287">
        <src>https://eprints.ibu.edu.ba/files/original/c91f0c7eb4c25c44efb93b1215302dc1.pdf</src>
        <authentication>11fd7f01ac78fa3f60095d139353018c</authentication>
        <elementSetContainer>
          <elementSet elementSetId="4">
            <name>PDF Text</name>
            <description/>
            <elementContainer>
              <element elementId="52">
                <name>Text</name>
                <description/>
                <elementTextContainer>
                  <elementText elementTextId="26460">
                    <text>Journal of Natural Sciences and Engineering, Vol. 1, (2019)
DOI number: 10.14706/JONSAE2019114

Quantitative estimation of cooling load capabilities of residential buildings using
machine learning

Nedret Bećirović, Ismail Bejtović, Jasmin Kevrić

International Burch University, Sarajevo, Bosnia and Herzegovina
nedret.becirovic@stu.ibu.edu.ba
ismailbejtovic@hotmail.com
jasmin.kevric@ibu.edu.ba
Abstract – Building on previous research on the energy efficiency of buildings, particularly their cooling
load capabilities, we develop a collection of machine learning methods for identifying buildings with the
best cooling load capabilities. This collection studies the influence of 8 input variables (relative
compactness, surface area, wall area, roof area, overall height, orientation, glazing area, glazing area
distribution) on one output parameter, the cooling load of buildings. The results of this study support the
practicability of using machine learning software to estimate building parameters as a convenient and
accurate approach, as long as the chosen methods are well suited to the type of data in question.
Keywords – cooling load, energy efficiency, machine learning, neural network.

1. Introduction

Considering the growing electrical energy consumption in the residential sector [1] and global warming,
energy consumption for cooling is expected to surpass energy consumption for heating in the foreseeable
future. Heating and cooling load are two very important parameters in efficient building design. Both are
closely related to the materials the building is made of, so construction decisions made early on have a
great impact on the final result. There has been a considerable body of research [2] in this field and on
this dataset, but without a focus on the cooling load itself. Various software packages for simulating
energy consumption have been used over the years, often in conjunction with architectural design, and the
accuracy of the simulation often varies from one package to another [3]. This work is therefore envisaged
as an addition to the existing software solutions.

Building parameters are often compared separately with the cooling and heating loads, seeking simple
correlations [4]. Multiple regression analysis was very popular for predicting energy consumption until it
was shown that, with a large database, a simple neural network performs much better than multiple linear
regression analysis [5].

For architects, it is very important to single out and rank the parameters with the strongest impact, since
normality assumptions do not hold for very complicated problems. For example, glazing area has minimal
impact on the cooling load, while surface area and overall height are the parameters with the strongest impact.
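One way to rank parameters by strength of association without assuming normality is a rank-based measure such as Spearman's correlation. The sketch below is an illustrative choice, not necessarily the procedure used in this work: it orders hypothetical feature columns by the absolute Spearman correlation with the cooling load, averaging ranks over ties, which matters here because each input in Table 1 takes only a few distinct values. A constant column has zero rank variance and would raise a ZeroDivisionError.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while len(order) > start:
        end = start
        # Extend the run of tied values that begins at position `start`.
        while len(order) > end + 1 and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # 0-based positions start..end share one rank
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(a, b):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(a), average_ranks(b))


def rank_features(columns, target):
    """Feature names ordered by |Spearman rho| with the target, strongest first."""
    scores = {name: abs(spearman(col, target)) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Usage would look like `rank_features({"surface_area": [...], "glazing_area": [...]}, cooling_load)`, where the column names and lists are placeholders for the dataset's columns.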

This work is done in the hope that it will help future architects and energy advisors build smart buildings,
and contribute to the field of energy efficiency more generally. Further studies could help with choosing
suitable materials for construction.

2. Data

This study is based on a UCI dataset, a non-Gaussian dataset generated with the CAD software Ecotect. The
dataset represents 12 different building forms, each composed of 18 building blocks of the same volume
(3.5 m x 3.5 m x 3.5 m); all houses therefore have the same volume, 771.75 m3, but different heights and
surface areas. The materials used in these 18 blocks are all contemporary, with the best U-values, which are
well defined for walls, floors, etc., and the buildings vary in glazing area and orientation [2].

Twelve building forms, three glazing area variations with five glazing area distributions each, and four
orientations give 12 x 3 x 5 x 4 = 720 building samples. In addition, the 12 building forms are considered
without glazing in each of the four orientations (12 x 4 = 48). In total, this gives 768 different buildings [2].
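The sample count and the building volume stated above can be reproduced by enumerating the design grid. In this sketch, the integer codes for glazing area and glazing area distribution are placeholders chosen here, with 0 marking the unglazed case; only the counts come from the text. The enumeration also yields 4 distinct glazing area values and 6 distinct glazing area distribution values, matching the counts listed in Table 1.

```python
from itertools import product

BLOCK_EDGE_M = 3.5     # edge of each cubic building block, in metres
BLOCKS_PER_FORM = 18   # blocks per building form
N_FORMS = 12
N_ORIENTATIONS = 4
GLAZED_AREAS = (1, 2, 3)                # placeholder codes for the three glazing areas
GLAZED_DISTRIBUTIONS = (1, 2, 3, 4, 5)  # placeholder codes for the five distributions


def building_volume():
    """Volume of one building form: 18 blocks of 3.5 m x 3.5 m x 3.5 m."""
    return BLOCKS_PER_FORM * BLOCK_EDGE_M ** 3


def design_grid():
    """All (form, orientation, glazing area, distribution) combinations.

    Unglazed buildings are encoded with glazing area 0 and distribution 0,
    an encoding assumed here for illustration."""
    glazed = list(product(range(N_FORMS), range(N_ORIENTATIONS),
                          GLAZED_AREAS, GLAZED_DISTRIBUTIONS))
    unglazed = [(form, orient, 0, 0)
                for form, orient in product(range(N_FORMS), range(N_ORIENTATIONS))]
    return glazed + unglazed


if __name__ == "__main__":
    grid = design_grid()
    print(building_volume())              # 771.75
    print(len(grid))                      # 720 glazed + 48 unglazed = 768
    print(len({row[2] for row in grid}))  # distinct glazing areas: 4
    print(len({row[3] for row in grid}))  # distinct distributions: 6
```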

Once the parameters with the strongest impact are identified, a new dataset can be constructed in which some
parameters are held fixed and others are varied.

Here, data mining means identifying the parameters with the greatest influence on the result. Statistical
tools will be used, but input from builders, architects, masons, etc. will also add great value to the
study, since they can provide knowledge about the feasibility of building parameters, such as how much a
particular building feature costs in the real world.

This is a well-understood, relatively large dataset of 768 buildings, each described by 8 parameters. It is
not a skewed dataset and is not treated as one, meaning that the data were not sifted through. Some light
pruning, or trimming, of the data is an essential part of the random and best-first search methods.

The data are, however, skewed in another way: the dataset is non-Gaussian, so it is very important to find
any bias that may have influenced it, using classical statistical analysis that visually reveals outlying
parameters. No parameter needed to be given more or less weight in the neural network model. Finding a
dataset of real buildings, or extracting data from buildings with a high cooling load, was also a goal of
this work. Glazing area did not have much importance for predicting cooling load in this dataset. New,
modern types of materials are changing the builders' paradigm, so the focus of this work shifted back to the
study of the virtual buildings, i.e. our dataset. It would be best to actively follow research in the field,
particularly any report on the construction of buildings based on research using this or a similar dataset.
The dataset has been normalized, quantified and classified in a very understandable and logical way by
Xifara and Tsanas (see Table 1).
Table 1. Mathematical representation of the input and output variables to facilitate the presentation of the
subsequent analysis and results.
Mathematical representation   Name                          Number of possible values
x1                            Relative compactness          12
x2                            Surface area                  12
x3                            Wall area                     7
x4                            Roof area                     4
x5                            Overall height                2
x6                            Orientation                   4
x7                            Glazing area                  4
x8                            Glazing area distribution     6
y2                            Cooling load                  636

3. Methods

Classical statistical tools such as histograms and scatter plots are applied to the dataset first. Seeing the
data on a graph is a great help in understanding them and indicates the direction the study should take.
Improving a model can take two different directions: making the model simpler or adding complexity. Making
a model simpler involves feature reduction, pruning branches and removing learners from an ensemble.
Adding complexity means fine-tuning, combining models or adding more data sources [6].
Out of many software tools, WEKA was chosen because it is easy to use and easily accessible. The first step
was searching for the computational intelligence method best suited to an artificial dataset. Which
algorithm to use was decided based on the form of the dataset and on trial and error. Getting a good result
from the start with the random forest method indicated which direction to take.

For the analysis of the available data set, five different regression algorithms were used:

• Linear Regression
• Random Forest
• REPTree
• SMOreg
• Multilayer Perceptron

These algorithms are recommended for this type of dataset [7]. Regression analysis was chosen because it models the relationship between the dependent variable (cooling load) and the independent variables (the 8 attributes in our dataset), and because the class attribute (cooling load) takes a large number of distinct values. Ten-fold cross-validation was used to gain insight into how the model would behave on unseen data.
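Ten-fold cross-validation splits the data into ten parts, trains on nine of them and tests on the remaining one, rotating until every part has served once as the test set. A minimal sketch of the index bookkeeping, assuming contiguous folds over 768 samples (the size of the Tsanas and Xifara dataset); Weka additionally shuffles the data before splitting, so its folds will differ. The function name `k_fold_indices` is illustrative, not a Weka API.

```python
def k_fold_indices(n_samples, k=10):
    """Split sample indices into k contiguous, near-equal folds.

    Returns a list of (train_indices, test_indices) pairs; every index
    appears in exactly one test fold.
    """
    if k > n_samples or 2 > k:
        raise ValueError("need 2 or more folds and no more folds than samples")
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if extra > i else 0)  # the first `extra` folds get one extra sample
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


folds = k_fold_indices(768, k=10)
print([len(test) for _, test in folds])  # [77, 77, 77, 77, 77, 77, 77, 77, 76, 76]
```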

All of the above algorithms are regression algorithms with the same goal, but they work in different ways. Linear regression models are linear predictor functions whose parameters are estimated from the data. They are often fitted using the least squares approach, but they may be fitted in many other ways [8].
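For a single predictor, the least squares fit has a closed form that makes the idea concrete. The sketch below is illustrative only: Weka's LinearRegression fits all attributes at once, which this one-variable version does not attempt, and the function name is hypothetical.

```python
def fit_least_squares(xs, ys):
    """Ordinary least squares for y = a + b*x with a single predictor.

    b = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
    a = mean_y - b * mean_x
    """
    n = len(xs)
    if n != len(ys) or n == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x has zero variance")
    b = sxy / sxx
    a = my - b * mx
    return a, b


intercept, slope = fit_least_squares([1, 2, 3, 4], [3, 5, 7, 9])
print(intercept, slope)  # 1.0 2.0
```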

Random forest is an ensemble method that creates a multitude of decision trees and outputs the mean prediction of the individual trees. The algorithm applies bootstrap aggregating, or bagging, to its tree learners. Compared with a single decision tree, random forest tends to provide more accurate predictions, mainly because averaging many trees reduces variance. The more decision trees are used, the more computational power is required [9].
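Bootstrap aggregating can be shown in a few lines. The sketch below bags regression stumps (one-split trees) on a single numeric attribute and averages their predictions. It is plain bagging under simplifying assumptions: a real random forest grows deeper trees and also samples a random subset of attributes at each split. The function names are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Regression stump: one threshold on x, mean target on each side."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold separates equal x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or best[0] > err:
            best = (err, pairs[i - 1][0], ml, mr)
    if best is None:  # all x values equal: predict the overall mean
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, threshold, ml, mr = best
    return lambda x: mr if x > threshold else ml


def fit_bagged_stumps(xs, ys, n_estimators=25, seed=0):
    """Bagging: fit each stump on a bootstrap sample, then average the outputs."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        idx = rng.choices(range(n), k=n)  # n indices drawn with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(m(x) for m in models) / len(models)


xs = list(range(20))
ys = [0.0] * 10 + [10.0] * 10
predict = fit_bagged_stumps(xs, ys)
print(round(predict(2), 2), round(predict(17), 2))
```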

Reduced Error Pruning Tree (REPTree) is a fast decision tree learner, which creates multiple trees in different iterations and selects the best one from all created trees. REPTree builds a decision or regression tree using information gain (variance reduction, for a numeric class) and prunes it using reduced-error pruning. For numeric attributes, it sorts values only once [10].
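Reduced-error pruning keeps a subtree only if it predicts a held-out set better than a single leaf would. A minimal sketch, assuming a hypothetical dict-based tree in which every internal node also stores the mean training target in `value`; Weka's REPTree additionally applies backfitting and its own handling of the pruning set, which are omitted here.

```python
def predict(node, x):
    """Walk down the tree until a leaf (a node without 'attr') is reached."""
    while "attr" in node:
        node = node["right"] if x[node["attr"]] > node["threshold"] else node["left"]
    return node["value"]


def squared_error(node, rows):
    return sum((y - predict(node, x)) ** 2 for x, y in rows)


def reduced_error_prune(node, rows):
    """Bottom-up reduced-error pruning against hold-out (x, y) rows."""
    if "attr" not in node:
        return node
    goes_right = [x[node["attr"]] > node["threshold"] for x, _ in rows]
    node["left"] = reduced_error_prune(
        node["left"], [r for r, g in zip(rows, goes_right) if not g])
    node["right"] = reduced_error_prune(
        node["right"], [r for r, g in zip(rows, goes_right) if g])
    leaf = {"value": node["value"]}
    # Prune when a single leaf is at least as accurate on the hold-out rows.
    if squared_error(node, rows) >= squared_error(leaf, rows):
        return leaf
    return node
```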
SMOreg uses a support vector machine for regression. By default, SMOreg learns its parameters with the RegSMOImproved algorithm [11], but other algorithms, such as Platt’s SMO, can be used.
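Support vector regression penalizes only errors that fall outside a tube of width epsilon around the prediction. A sketch of this epsilon-insensitive loss and of the corresponding linear primal objective; SMO-type optimizers such as RegSMOImproved work on the dual of this problem, and the names and default values below are illustrative assumptions, not SMOreg's parameters.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Zero inside the epsilon-tube, growing linearly outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def linear_svr_objective(w, b, X, y, C=1.0, epsilon=0.1):
    """Primal objective of linear SVR: 0.5 * ||w||^2 + C * total tube loss."""
    regularizer = 0.5 * sum(wj * wj for wj in w)
    loss = 0.0
    for xi, yi in zip(X, y):
        prediction = sum(wj * xj for wj, xj in zip(w, xi)) + b
        loss += epsilon_insensitive_loss(yi, prediction, epsilon)
    return regularizer + C * loss


print(epsilon_insensitive_loss(10.0, 10.05))  # 0.0: the error lies inside the tube
print(linear_svr_objective([1.0], 0.0, [[1.0], [2.0]], [1.0, 2.5]))
```

The constant C trades off the flatness of the model (small weights) against the total error outside the tube.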

The multilayer perceptron is a class of feedforward artificial neural networks. It consists of at least three layers of nodes: an input layer, a hidden layer and an output layer. It is by far the most popular architecture because of its structural flexibility, good representational capabilities and the availability of a large number of training algorithms [12].
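The forward pass of such a network with one hidden layer can be sketched as follows. We assume sigmoid hidden units and a linear output unit for a numeric target, which to our understanding matches how Weka's MultilayerPerceptron treats a numeric class; the weights would normally be learned by backpropagation, which is not shown, and the example values are arbitrary.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def mlp_forward(x, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> sigmoid hidden layer -> single linear output unit."""
    hidden = dense(x, hidden_w, hidden_b, sigmoid)
    return dense(hidden, out_w, out_b, lambda z: z)[0]


# 2 inputs, 3 hidden units, 1 output; the weights are arbitrary example values.
y_hat = mlp_forward([0.5, -1.0],
                    hidden_w=[[0.2, -0.4], [0.7, 0.1], [-0.3, 0.5]],
                    hidden_b=[0.0, 0.1, -0.2],
                    out_w=[[1.5, -0.8, 0.6]],
                    out_b=[0.05])
print(y_hat)
```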

Feature selection is a key part of the applied machine learning process, just as model selection is, and it should be considered part of the model selection process. Otherwise, bias can inadvertently be introduced into the models, resulting in overfitting.

Feature selection must be included within the inner loop when using accuracy estimation methods such as cross-validation. This means that feature selection is performed on the prepared fold right before the model is trained [7].
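The inner-loop requirement can be sketched as a cross-validation routine that calls the feature selector separately on each training fold, so the held-out fold never influences which features are chosen. The callables `select`, `fit` and `predict` are hypothetical placeholders for any evaluator and learner, and folds are assigned by index modulo k for simplicity.

```python
def cv_with_inner_selection(rows, k, select, fit, predict):
    """k-fold cross-validation with feature selection redone on every training fold.

    rows    : list of (features, target) pairs
    select  : training rows -> list of chosen feature indices
    fit     : (training rows, chosen indices) -> model
    predict : (model, features, chosen indices) -> prediction
    Returns the mean absolute error on each held-out fold.
    """
    fold_errors = []
    for f in range(k):
        train = [r for i, r in enumerate(rows) if i % k != f]
        test = [r for i, r in enumerate(rows) if i % k == f]
        chosen = select(train)  # the held-out fold is never seen here
        model = fit(train, chosen)
        errors = [abs(y - predict(model, x, chosen)) for x, y in test]
        fold_errors.append(sum(errors) / len(errors))
    return fold_errors
```

Selecting features once on the full dataset and only then cross-validating would leak information from the test folds into the selection step, which is the bias described above.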

The dataset used in this work is small in both the number of features and the number of samples, and it does not suffer from the “curse of dimensionality” [13, p. 4]. Feature selection and feature extraction methods are not recommended for datasets with such a small number of features [13], but extracting information about which variables are most important is valuable in this type of study. This approach is a rudimentary form of data mining.

Four attribute evaluators were combined with two search methods:
• CfsSubsetEval and BestFirst
• ClassifierAttributeEval and Ranker
• ClassifierSubsetEval and BestFirst
• CorrelationAttributeEval and Ranker

CfsSubsetEval evaluates subsets of attributes by considering the individual predictive ability of each feature together with the degree of redundancy between the features. Subsets whose features are highly correlated with the class but have low intercorrelation are preferred. The BestFirst search method is used with CfsSubsetEval.
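The trade-off between predictive ability and redundancy is usually expressed by Hall's CFS merit heuristic, which grows with the average feature-class correlation and shrinks with the average feature-feature correlation. A sketch of the formula only; how Weka estimates the correlations internally is not reproduced here.

```python
import math


def cfs_merit(k, r_cf, r_ff):
    """Hall's CFS merit of a k-feature subset.

    r_cf: mean feature-class correlation
    r_ff: mean feature-feature correlation
    merit = k * r_cf / sqrt(k + k * (k - 1) * r_ff)
    """
    if k == 0:
        return 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


# Two features that are equally correlated with the class (0.6):
print(round(cfs_merit(2, 0.6, 0.9), 4))  # 0.6156, highly redundant pair
print(round(cfs_merit(2, 0.6, 0.0), 4))  # 0.8485, independent pair
```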

ClassifierAttributeEval evaluates the worth of an attribute by using a user-specified classifier. For example, if linear regression is to be used on our dataset, linear regression must be chosen as the classifier of the attribute evaluator. The Ranker search method is used with ClassifierAttributeEval.

ClassifierSubsetEval evaluates attribute subsets on the training data or on a separate hold-out test set. Like ClassifierAttributeEval, it uses a classifier to estimate how good the subsets are. The BestFirst search method is used with ClassifierSubsetEval.

CorrelationAttributeEval evaluates the worth of an attribute by measuring the (Pearson) correlation between it and the class. For nominal attributes, each value is treated as an indicator. The Ranker search method is used with CorrelationAttributeEval.
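For numeric data, this amounts to one Pearson correlation per attribute, after which the attributes are sorted by score as a ranker does. A minimal sketch on plain Python lists; the dictionary-of-columns input format and the function names are assumptions. Sorting by the signed coefficient places strongly negative attributes last, which is consistent with the ordering in Table 4.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rank_attributes(columns, target):
    """Score each attribute column against the class, highest score first."""
    scores = {name: pearson(col, target) for name, col in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_attributes(
    {"a": [1, 2, 3, 4], "b": [4, 3, 2, 1], "c": [1, 3, 2, 4]},
    target=[1, 2, 3, 4],
)
print([name for name, _ in ranking])  # ['a', 'c', 'b']
```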
The BestFirst search method searches the space of attribute subsets by greedy hill-climbing augmented with a backtracking facility. BestFirst may start with the empty set of attributes and search forward, start with the full set of attributes and search backward, or start at any point and search in both directions.
The Ranker search method ranks attributes by their individual evaluations and is used together with single-attribute evaluators.
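A forward best-first search over attribute subsets can be sketched with a priority queue: the most promising subset found so far is always expanded next, and because earlier candidates stay in the queue, the search can backtrack to them. The stopping rule (give up after `max_stale` expansions without improvement) imitates the idea of BestFirst's termination criterion; `evaluate` is a hypothetical subset-scoring function, such as a CFS merit or a classifier's cross-validated score.

```python
import heapq
from itertools import count


def best_first(n_attrs, evaluate, max_stale=5):
    """Forward best-first search over attribute subsets (higher score = better).

    Returns (best_subset, best_score).
    """
    tie = count()  # unique tie-breaker so heap entries never compare subsets
    start = frozenset()
    best_subset, best_score = start, evaluate(start)
    open_list = [(-best_score, next(tie), start)]  # heapq is a min-heap, so negate
    visited = {start}
    stale = 0
    while open_list and max_stale > stale:
        _, _, subset = heapq.heappop(open_list)
        improved = False
        for a in range(n_attrs):
            child = subset | {a}
            if child in visited:  # also skips attributes already in the subset
                continue
            visited.add(child)
            score = evaluate(child)
            heapq.heappush(open_list, (-score, next(tie), child))
            if score > best_score:
                best_subset, best_score = child, score
                improved = True
        stale = 0 if improved else stale + 1
    return best_subset, best_score


weights = [1.0, -0.5, 0.8, -0.2]  # toy per-attribute contributions
subset, score = best_first(4, lambda s: sum(weights[i] for i in s))
print(sorted(subset), round(score, 2))  # [0, 2] 1.8
```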

4. Results and Discussion

Classical statistical tools such as probability distributions were first used to get a sense of the data. Table 2 shows the results of the attribute evaluators and search methods used with random forest. Random forest with ClassifierSubsetEval and the BestFirst search method gave the best results of all the combinations. BestFirst is a heuristic (informed) search: it looks ahead at the next step before committing to one and then chooses which way to go. For this combination of methods, only attribute no. 2 (surface area) is not selected. Since the volume of the buildings is fixed, it is logical that surface area varies little and therefore has little impact on the result.

Table 2. Results for combination of random forest and search methods
Random Forest
Attribute Evaluator and Search Method | Correlation Coefficient | Mean Absolute Error | Root Mean Squared Error | Relative Absolute Error | Root Relative Squared Error | Selected Attributes
CfsSubsetEval and BestFirst | 0.9711 | 1.4319 | 2.2692 | 16.6687% | 23.8241% | 3, 5, 6, 7
ClassifierAttributeEval and Ranker | 0.9582 | 1.0079 | 1.6320 | 11.7324% | 17.1345% | 1, 2, 3, 4, 5, 6, 7, 8
ClassifierAttributeEval and Ranker | 0.959 | 2.0323 | 2.6933 | 23.3581% | 28.2775% | 1, 2, 4, 5
ClassifierSubsetEval and BestFirst | 0.9854 | 0.9967 | 1.6196 | 11.6030% | 17.0046% | 1, 3, 4, 5, 6, 7, 8
CorrelationAttributeEval and Ranker | 0.9852 | 1.0079 | 1.6320 | 11.7324% | 17.1345% | 1, 2, 3, 4, 5, 6, 7, 8
CorrelationAttributeEval and Ranker | 0.9841 | 1.0859 | 1.6904 | 12.6408% | 17.7479% | 5, 1, 3, 7
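The five measures reported in Table 2 can be computed as follows. This is a sketch of the standard definitions, not Weka's code: relative absolute error and root relative squared error compare the model with a baseline that always predicts the mean, and here the mean of the actual values in the evaluation set is used, whereas we understand Weka derives it from the training data, so small differences are expected.

```python
import math


def regression_metrics(actual, predicted):
    """Correlation, MAE, RMSE, RAE (%) and RRSE (%) for numeric predictions."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    abs_dev = sum(abs(a - mean_a) for a in actual)   # error of the mean baseline
    sq_dev = sum((a - mean_a) ** 2 for a in actual)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return {
        "correlation": cov / math.sqrt(sq_dev * var_p),
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "rae_percent": 100.0 * abs_err / abs_dev,
        "rrse_percent": 100.0 * math.sqrt(sq_err / sq_dev),
    }


print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]))
```

Because RMSE and RRSE share the same numerator, their ratio depends only on the spread of the class values. Dividing each RMSE in Table 2 by its RRSE (as a fraction) gives about 9.52 in every row, as expected when all runs are evaluated on the same cooling load values.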
The relationship between the volume of a built form and the surface area of its enclosure is called compactness. Roundness is a similar feature.

R. Buckminster Fuller, an engineer and architect, claimed that round houses have the best energy efficiency; an attempt was made to extract this feature, but without result.

Surface area (attribute no. 2) directly reflects the compactness of the building and, by similarity, its roundness. The classifier subset evaluator removed this feature and gave the best correlation coefficient, indicating that compactness has no impact on cooling load.
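One common definition of relative compactness compares the surface area of a building with that of a cube of the same volume, RC = 6·V^(2/3)/A. The paper does not state the formula it relies on, so treat this as an assumption. The example volume of 771.75 m³ and surface area of 514.5 m² are assumed values for illustration.

```python
def relative_compactness(volume, surface_area):
    """RC = 6 * V**(2/3) / A: 1.0 for a cube, lower for less compact forms."""
    return 6.0 * volume ** (2.0 / 3.0) / surface_area


print(round(relative_compactness(8.0, 24.0), 4))      # 1.0 for a 2 m cube
print(round(relative_compactness(771.75, 514.5), 2))  # 0.98 (assumed V and A)
```

With a fixed volume, RC depends on surface area alone, so the two attributes carry the same information. This is consistent with x1 and x2 receiving identical scores in Table 3.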

The use of geometric compactness for such evaluative purposes has been criticized on multiple grounds: it does not capture the specific morphology of the building shape, it disregards the transparent parts of the structure, and it does not take orientation (attribute no. 6) into account [14].

The high correlation coefficient obtained with all attributes except surface area finally indicated that compactness does not affect the thermal load. Using the same dataset as Tsanas and Xifara [2], our model gave similar results with a slightly better correlation coefficient. The attribute rankings are shown in Table 3 for ClassifierAttributeEval with Ranker and in Table 4 for CorrelationAttributeEval with Ranker.

Table 3. Ranking of attributes according to classifier attribute evaluator and ranker
ClassifierAttributeEval and Ranker
Mathematical representation | Name | Ranked
x1 | Relative compactness | 6.8134
x2 | Surface area | 6.8134
x4 | Roof area | 5.5105
x5 | Overall height | 5.2827
x3 | Wall area | 2.3935
x7 | Glazing area | 0.1718
x8 | Glazing area distribution | 0.0306

Table 4. Ranking of attributes according to correlation attribute evaluator and ranker
CorrelationAttributeEval and Ranker
Mathematical representation | Name | Ranked
x5 | Overall height | 0.8958
x1 | Relative compactness | 0.6343
x3 | Wall area | 0.4271
x7 | Glazing area | 0.2075
x8 | Glazing area distribution | 0.0505
x6 | Orientation | 0.0143
x2 | Surface area | -0.673
x4 | Roof area | -0.8625

Further study should be done with different numbers of cross-validation folds for the above-mentioned algorithms. The results would be stronger if another dataset were available to test our algorithm. The “k-nearest neighbor” algorithm gave poor results; it is a “data-sensitive” algorithm, vulnerable when faced with large amounts of data. Different datasets would be a great boost to this work, allowing the methods to be tested against them.

Parameter tuning is an iterative process, and Weka makes it easy to carry out without needing to understand how the parameters work. However, especially when dealing with feature selection, bias can inadvertently be introduced into models, with unforeseen consequences, mostly overfitting [7] [15].

The numerical values calculated by the software simulations lie very close to previous results. Obtaining values close to those of similar studies on the same dataset is characteristic of the machine learning field, and reaching the same results with different methods is an achievement [16].

5. Conclusion

The results of the previous study [17] were reproduced, and further work on cooling load resulted in a slightly better correlation coefficient than that reported in an article with high scientific impact [2].

Trial and error is at the core of machine learning. Choosing the right algorithm is a trade-off between speed, accuracy and complexity. Starting with simple combinations and then adding complexity, while constantly keeping in mind what type of data is being dealt with, is the core of practical machine learning.

An empirical study answers the questions of which algorithm to use and which parameters to choose, since knowing beforehand which method will work best is almost impossible. Constantly iterating over different combinations of similar methods with a systematic workflow in Weka is a way forward. New and easily accessible software packages make it easier to spot and exploit new research areas that were previously inaccessible due to low computing capability.

REFERENCES

[1] Y-T. Chen, “The Factors Affecting Electricity Consumption and Sector – A Case of Taiwan”, 2017.
[2] A. Tsanas, A. Xifara, “Accurate quantitative estimation of energy performance of residential buildings using statistical machine learning tools”, Science Direct, 2012, p. 9.
[3] A. Yezioro, “An applied artificial intelligence approach towards assessing building performance simulation tools”, Energy and Buildings, 2007, p. 40.
[4] T. Catalina, J. Virgone, “Cooling energy demand evaluation by means of regression models”, Proceedings of the Eleventh International Conference Enhanced Building Operations, New York City, 2011, p. 6.
[5] D. Datta, S. A. Tassou, D. Marriot, “Application of Neural Networks for the Prediction of the Energy Consumption”, 1997.
[6] Mathworks, “Mastering Machine Learning: A Step-by-Step Guide with MATLAB”. Available at: https://www.mathworks.com/campaigns/offers/mastering-machine-learning-withmatlab.confirmation.html?ab_test=b_version.
[7] J. Brownlee, “Machine Learning Mastery With Weka”, Wellington: Jason Brownlee, 2019.
[8] X. Yan, X. Su, “Linear Regression Analysis: Theory and Computing”, World Scientific, 2009.
[9] D. Natingga, “Data Science Algorithms in a Week”, 2017.
[10] S. Kalmegh, “Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News”, IJISET - International Journal of Innovative Science, Engineering and Technology, 2015, Vol. 2, Issue 2.
[11] S. K. Shevade, “Improvements to the SMO Algorithm for SVM Regression”, IEEE Transactions on Neural Networks, 2000, vol. 11, no. 5-6.
[12] P. Thomas, M. C. Suhner, “A new Multilayer Perceptron Pruning Algorithm for Classification and Regression Applications”, Neural Processing Letters, Springer Verlag, 2015, p. 31.
[13] M. S. Raza, U. Qamar, “Understanding and Using Rough Set Based Feature Selection – Concepts, Techniques and Applications”, Springer, 2017.
[14] W. Pessenlehner, A. Mahdavi, “Building Morphology, Transparence and Energy Performance”, Eighth International IBPSA Conference, Eindhoven, Netherlands, 2003.
[15] M. Kosinski, Y. Wang, “Deep neural networks are more accurate than humans at detecting sexual orientation from facial images”, Journal of Personality and Social Psychology, 2018.
[16] J. Christian, “Statistician: Machine Learning Is Causing A Crisis in Science”. Available: https://futurism.com/machine-learning-crisis-science.
[17] A. Bajek, A. Hasandić, “Energy Efficiency of the buildings”, Sarajevo: International Burch University, 2017.

</text>
                  </elementText>
                </elementTextContainer>
              </element>
            </elementContainer>
          </elementSet>
        </elementSetContainer>
      </file>
    </fileContainer>
    <collection collectionId="3">
      <elementSetContainer>
        <elementSet elementSetId="1">
          <name>Dublin Core</name>
          <description>The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/.</description>
          <elementContainer>
            <element elementId="50">
              <name>Title</name>
              <description>A name given to the resource</description>
              <elementTextContainer>
                <elementText elementTextId="26245">
                  <text>Journal of Natural Sciences and Engineering</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="43">
              <name>Identifier</name>
              <description>An unambiguous reference to the resource within a given context</description>
              <elementTextContainer>
                <elementText elementTextId="26605">
                  <text>2637-2835</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="98">
              <name>DOI</name>
              <description>Digital object identifier</description>
              <elementTextContainer>
                <elementText elementTextId="26606">
                  <text>10.14706</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="45">
              <name>Publisher</name>
              <description>An entity responsible for making the resource available</description>
              <elementTextContainer>
                <elementText elementTextId="26607">
                  <text>International Burch University</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="41">
              <name>Description</name>
              <description>An account of the resource</description>
              <elementTextContainer>
                <elementText elementTextId="26608">
                  <text>Journal of Natural Sciences and Engineering (JONSAE) is a peer-reviewed, biannually published international journal focusing on empirical and theoretical research in all branches of Engineering and Natural Sciences. It is published on the behalf of Faculty of Engineering and Natural Sciences of International Burch University and aims to provide the best content regarding by publishing original research papers, review articles, special issues, feature articles, and book reviews. All manuscript submissions are subject to initial appraisal by the Editor, and, if found suitable for further consideration, to peer review by independent, anonymous referees. All peer review is double-blind and submission is online. The journal welcomes theoretical, applied, interdisciplinary and methodological work, with preference on empirical research, critical approach and problem-solving methods in manuscripts.</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="44">
              <name>Language</name>
              <description>A language of the resource</description>
              <elementTextContainer>
                <elementText elementTextId="26609">
                  <text>English</text>
                </elementText>
              </elementTextContainer>
            </element>
          </elementContainer>
        </elementSet>
      </elementSetContainer>
    </collection>
    <elementSetContainer>
      <elementSet elementSetId="1">
        <name>Dublin Core</name>
        <description>The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/.</description>
        <elementContainer>
          <element elementId="50">
            <name>Title</name>
            <description>A name given to the resource</description>
            <elementTextContainer>
              <elementText elementTextId="26461">
                <text>Quantitative estimation of cooling load capabilities of residential buildings using&#13;
machine learning</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="96">
            <name>Author</name>
            <description>Author</description>
            <elementTextContainer>
              <elementText elementTextId="26462">
                <text>Nedret Bećirović, Ismail Bejtović, Jasmin Kevrić</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="94">
            <name>Abstract</name>
            <description>A summary of the resource.</description>
            <elementTextContainer>
              <elementText elementTextId="26463">
                <text> Based on previous research on the energy efficiency of buildings, particularly their cooling&#13;
load capabilities, we will develop a collection of machine learning methods for detecting buildings&#13;
with best cooling load capabilities. This collection will study the influence of 8 input variables (relative&#13;
compactness, surface area, wall area, roof area, overall height, orientation, glazing area, glazing area&#13;
distribution) on one output parameter, that is cooling load of buildings. The results of this study&#13;
support the practicability of using machine-learning software to estimate building parameters as a&#13;
convenient and accurate approach, as long as the methods chosen are well suited for the type of data&#13;
in question.</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="97">
            <name>Keywords</name>
            <description>Keywords.</description>
            <elementTextContainer>
              <elementText elementTextId="26464">
                <text>cooling load, energy efficiency, machine learning, neural network.</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="43">
            <name>Identifier</name>
            <description>An unambiguous reference to the resource within a given context</description>
            <elementTextContainer>
              <elementText elementTextId="26465">
                <text> 2637-2835</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="98">
            <name>DOI</name>
            <description>Digital object identifier</description>
            <elementTextContainer>
              <elementText elementTextId="26466">
                <text>10.14706/JONSAE2021315&#13;
</text>
              </elementText>
            </elementTextContainer>
          </element>
        </elementContainer>
      </elementSet>
    </elementSetContainer>
  </item>
  <item itemId="3479" public="1" featured="0">
    <fileContainer>
      <file fileId="4286">
        <src>https://eprints.ibu.edu.ba/files/original/27eac86735340248e4eda9d6b63e242a.pdf</src>
        <authentication>dfc7fdc2237adebaad2030ec2e8f4107</authentication>
        <elementSetContainer>
          <elementSet elementSetId="4">
            <name>PDF Text</name>
            <description/>
            <elementContainer>
              <element elementId="52">
                <name>Text</name>
                <description/>
                <elementTextContainer>
                  <elementText elementTextId="26453">
                    <text>Leveraging Raspberry Pi as a server for the integration of the NETCONF protocol
within IoT systems based on YANG
Dalibor Đumić1, Slobodan Lubura2
1 International Burch University, Sarajevo, Bosnia and Herzegovina
2 University of East Sarajevo, East Sarajevo, Bosnia and Herzegovina
dalibor.dumic@stu.ibu.edu.ba
slobodan.lubura@ets.ues.rs.ba
Abstract – Herein, the idea of leveraging a Raspberry Pi as a server for the integration of an incipient network management protocol, the Network Configuration Protocol (NETCONF), within IoT systems based on YANG is presented. The practical realization of this idea requires the implementation of the NETCONF protocol together with REpresentational State Transfer (RESTful) web services. Such a practical realization opens new possibilities in domotics systems, and these possibilities are discussed in this paper.
Keywords – Django, domotics, Internet of Things, NETCONF, Raspberry Pi, RESTful web
services, YANG

1. Introduction

Every home network contains heterogeneous devices that are expected to be connected. These devices differ from one another because they can be based on different hardware platforms, their controller services can be of a different nature, and the software components that enable network access can vary [1]. For example, when wearable IoT devices such as a smartwatch or wristband are compared with smart home devices such as a washing machine or air conditioner, they show different capabilities in terms of memory usage, processing speed and power consumption [2]. Because of that, IoT devices can generally be classified by their key characteristics:
● communication flows in the system,
● memory management,
● data manipulation and processing,
● power control and consumption.

For example, a smart coffee machine is not always powered on; it performs its tasks only when required, i.e. when a user turns it on via a user interface such as a mobile application, whenever the user wants a coffee or is on the way home and wants the coffee to be ready on arrival. Such devices consume less power for communication. Home automation systems contain many actuators that must be managed by systems connected to the Internet via network protocols [3].

The focus of this paper is the practical implementation of the methodology proposed in [4], which was developed through an empirical study of the NETCONF protocol. NETCONF is used as the network protocol that connects the gateway to the Internet. The gateway performs effective management of sensors and devices in a home network and is based on RESTful technologies.

The paper is organized into five sections. Section 1 introduces IoT systems and the purpose of this paper. Section 2 introduces the NETCONF protocol and its features. The proposed integration of the NETCONF protocol in the IoT is detailed in Section 3. The results of the proposed integration are presented in Section 4. The benefits of the proposed integration and the main conclusions are discussed in Section 5.

2. The Network Configuration Protocol (NETCONF) and its features

A. NETCONF

The Network Configuration Protocol (NETCONF) is a network management protocol that provides mechanisms to install, manipulate and delete the configuration of devices in the network. Its purpose is to manage network devices, retrieve their configuration data, and upload or manipulate new configuration data [5]. This means that devices on the network can take different states according to their configuration.
Configuration datastores are used to switch between device states. By definition, a configuration datastore is the place where the information required to bring a device from its initial default state into a desired operational state is stored and accessed. NETCONF supports event notifications and the following configuration datastores [6]:
● "running" – this configuration is always present and is the currently active configuration,
● "startup" – this configuration is loaded at the next startup,
● "candidate" – this configuration can be modified without affecting the running configuration and becomes the running configuration through an explicit commit.

Device configuration is manipulated by using NETCONF operations, which are invoked as Remote Procedure Calls (RPCs) from the client to the server. Some of the basic operations are [6]:
● “commit” – commits the "candidate" configuration to "running",
● “copy-config” – copies one configuration datastore to another,
● “edit-config” – changes the contents of a configuration datastore,
● “get-config” – retrieves all or part of a configuration datastore,
● “lock” – prevents other parties from changing a datastore, and
● “unlock” – releases a lock on a datastore.
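The interplay of the datastores and operations listed above can be illustrated with a toy, in-memory model. It is neither a NETCONF implementation nor part of the paper's system: configurations are plain dictionaries, session identifiers are hypothetical integers, and edit_config only merges top-level keys, whereas a real server validates edits against its YANG models.

```python
import copy


class DatastoreModel:
    """Toy model of NETCONF datastore semantics (illustration only)."""

    def __init__(self, initial):
        self.stores = {name: copy.deepcopy(initial)
                       for name in ("running", "startup", "candidate")}
        self.locks = {}  # datastore name -> id of the session holding the lock

    def _ensure_writable(self, store, session):
        owner = self.locks.get(store)
        if owner is not None and owner != session:
            raise PermissionError(f"{store} is locked by session {owner}")

    def lock(self, store, session):
        if store in self.locks:  # refused while any session holds the lock
            raise PermissionError(f"{store} is already locked")
        self.locks[store] = session

    def unlock(self, store, session):
        if self.locks.get(store) != session:
            raise PermissionError(f"session {session} holds no lock on {store}")
        del self.locks[store]

    def get_config(self, store):
        return copy.deepcopy(self.stores[store])

    def edit_config(self, store, changes, session):
        self._ensure_writable(store, session)
        self.stores[store].update(changes)  # merge-style edit of top-level keys

    def copy_config(self, source, target, session):
        self._ensure_writable(target, session)
        self.stores[target] = copy.deepcopy(self.stores[source])

    def commit(self, session):
        """Make the candidate configuration the running configuration."""
        self.copy_config("candidate", "running", session)


ds = DatastoreModel({"temperature-setpoint": 21})
ds.lock("candidate", session=1)
ds.edit_config("candidate", {"temperature-setpoint": 23}, session=1)
print(ds.get_config("running"))  # {'temperature-setpoint': 21}, not yet committed
ds.commit(session=1)
print(ds.get_config("running"))  # {'temperature-setpoint': 23}
ds.unlock("candidate", session=1)
```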

Configuration data stored on devices and the protocol messages exchanged between them are encoded in Extensible Markup Language (XML) on both the client and the server side. Any script or application running as part of a network manager can act as the client. The server is typically a network device, and every device on the network must support at least one NETCONF session. The main NETCONF message exchange between client and server in a single NETCONF session [7] is illustrated in Figure 1. At the start, the device and the controller create a NETCONF session and share their lists of capabilities by sending &lt;hello&gt; messages. A capability describes a supported data model. After the session has started, NETCONF exchanges &lt;rpc&gt; and &lt;rpc-reply&gt; messages. An &lt;rpc&gt; message encloses a NETCONF command sent from the controller to the device. The &lt;get&gt; command in an &lt;rpc&gt; message is used to retrieve the running configuration and state information of the device (3). The &lt;edit-config&gt; request is used to write a specific configuration to the device (5). The &lt;rpc-reply&gt; message is sent from the device to the controller in response to an &lt;rpc&gt; message. The response data for the invoked method is encoded as one or more child elements enclosed in the &lt;rpc-reply&gt; message.

Figure 1. NETCONF messages
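The envelope of such a request can be built with Python's standard xml.etree.ElementTree module. The sketch below constructs a get-config request for a given datastore in the NETCONF base namespace defined in RFC 6241; the message identifier is arbitrary, and in practice a client library such as ncclient builds and frames these messages itself.

```python
import xml.etree.ElementTree as ET

NETCONF_BASE_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"


def build_get_config(message_id, source="running"):
    """Return a NETCONF get-config request for the given datastore as XML text."""
    rpc = ET.Element("rpc", {"xmlns": NETCONF_BASE_NS, "message-id": str(message_id)})
    get_config = ET.SubElement(rpc, "get-config")
    source_elem = ET.SubElement(get_config, "source")
    ET.SubElement(source_elem, source)  # empty element naming the datastore
    return ET.tostring(rpc, encoding="unicode")


request = build_get_config(101)
print(request)
```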

The information that a client retrieves from the server consists of two parts: configuration data and state data [6]. The configuration data describes what is needed to change a system from its previous state into the state described by that data, while the state data provides information such as read-only status data and collected statistics. The YANG data modeling language is used to specify NETCONF data models and operations.

B. YANG

To perform NETCONF operations, a YANG module has to be defined as a hierarchy of data such as configuration data, state data, RPCs and notifications. Defining the YANG module completes the description of all data exchanged between the NETCONF client and server. Each YANG module consists of statements, some of which are listed in Table 1 [8].
Table 1. YANG statements
Statement | Description
augment | Extends existing data hierarchies
choice | Defines mutually exclusive alternatives
container | Defines a layer of the data hierarchy
extension | Allows new statements to be added to YANG
feature | Indicates parts of the model are optional
grouping | Groups data definitions into reusable sets
key | Defines the key leafs for lists
leaf | Defines a leaf node in the data hierarchy
leaf-list | A leaf node that can appear multiple times
list | A hierarchy that can appear multiple times
notification | Defines a notification
rpc | Defines input and output parameters for an RPC
typedef | Defines a new type
uses | Incorporates the contents of a "grouping"

With the help of XML parsers and XSLT scripts, a YANG module can be translated into an equivalent XML syntax. YANG provides a set of built-in types and a type mechanism through which additional types may be defined. The modeler of a YANG module can add constraints to the model to prevent impossible or illogical data. These constraints describe the data being sent from the server and help the client understand which data the server will accept, so that the client avoids sending incorrect data. Table 2 briefly describes some other common YANG constraints [9].

Table 2. YANG constraints
Statement | Description
length | Limits the length of a string
mandatory | Requires the node to appear
max-elements | Limits the maximum number of instances in a list
min-elements | Limits the minimum number of instances in a list
must | XPath expression must be true
pattern | Regular expression must be satisfied
range | Value must appear in range
reference | Value must appear elsewhere in the data
unique | Value must be unique within the data
when | Node is only present when XPath expression is true
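A server enforces such constraints when it receives data. The following sketch mimics three of them (length, pattern and range) for a single leaf value in plain Python. The function and its parameters are hypothetical, and YANG patterns use XML Schema regular expressions, which differ in places from Python's re syntax.

```python
import re


def check_leaf(value, *, length=None, pattern=None, value_range=None):
    """Return the names of the constraints that one leaf value violates.

    length      : (min_len, max_len) for strings, like YANG 'length'
    pattern     : regular expression the whole string must match, like 'pattern'
    value_range : (low, high) inclusive bounds for numbers, like 'range'
    """
    problems = []
    if length is not None:
        lo, hi = length
        if not (hi >= len(value) >= lo):
            problems.append("length")
    if pattern is not None and re.fullmatch(pattern, value) is None:
        problems.append("pattern")
    if value_range is not None:
        lo, hi = value_range
        if not (hi >= value >= lo):
            problems.append("range")
    return problems


print(check_leaf("kitchen", length=(1, 16), pattern="[a-z]+"))  # []
print(check_leaf(55.0, value_range=(-40.0, 50.0)))             # ['range']
```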

In general, a YANG module is a single data model that contains three types of statements:
● module-header statements – they describe the module and provide information about it,
● revision statements – they provide information about the history of the module,
● definition statements – they form the body of the module, where the data model is defined.

To be used, a YANG module first has to be defined, i.e. modeled for the specific problem domain. After that, it can be loaded, compiled or coded into the server. A NETCONF server may implement any number of YANG modules [10].

3. Proposed Methodology

After the empirical study of the NETCONF protocol and its features, the implementation of the proposed integration was divided into two parts, the server side and the client side, as shown in Figure 2.

Figure 2. Client and server sides communicating over the Internet [4]

A. Server-side

To implement the proposed integration, the following requirements are defined:

● small physical dimensions, because it has to be hidden in the home installation and not visible,
● able to boot a Linux operating system, since Linux is open source,
● has General Purpose Input Output (GPIO) pins for interfacing with the sensors and devices,
● has an Ethernet port and/or a WiFi module, and
● has an ARM-based CPU for fast computing.

A single-board computer that matches these requirements well is the Raspberry Pi 3 B+, which is based on a 1.4 GHz 64-bit quad-core ARM Cortex-A53 processor. An advantage of the Raspberry Pi is its GPIO module, which can be used from several programming languages, such as C, C#, Python and Java. Since the integration is implemented in Python, the Raspberry Pi is a very good match [11]. The server is connected via appropriate connection lines to the rooms of the home, as shown in Figure 3.

Figure 3. Raspberry Pi as server connected to sensors and devices in each room via GPIO line [4]

To build the server, Netopeer2, a set of tools implementing network configuration based on the NETCONF
protocol, is installed [12][13]. Each room in the home has sensors and relays for controlling devices. For
each room, a custom YANG module is created, and each module handles data such as temperature, humidity,
open or closed status, and on or off status. Thanks to the custom YANG modules, the server can easily
manage the information related to the sensors and relays in the home. The structure of the simplest custom
YANG module for a room is shown in the Appendix.
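
Because every room gets its own module with the same structure, the modules can be generated from one template. The Python sketch below is a hypothetical helper, not part of the described system: it reproduces the room1 module from the Appendix for an arbitrary room number. The naming scheme (module roomN, namespace urn:sysrepo:roomN, prefix rN) is an assumption extrapolated from room1.

```python
from string import Template

# Hypothetical template derived from the room1 module in the Appendix.
# string.Template uses $-placeholders, so YANG's curly braces need no escaping.
ROOM_TEMPLATE = Template("""\
module room$n {
  namespace "urn:sysrepo:room$n";
  prefix r$n;
  description "The room yang module.";
  revision $rev {
    description "Initial revision.";
  }

  container room-data {
    description "Room $n info.";
    leaf temperature {
      description "Actual temperature inside the room.";
      type uint8 {
        range "0..125";
      }
    }

    leaf humidity {
      description "Actual humidity inside the room.";
      type uint8 {
        range "0..100";
      }
    }

    leaf ac-status {
      description "Informs whether the AC is switched on or off.";
      type boolean;
    }
  }
}
""")


def room_module(n: int, revision: str = "2019-09-14") -> str:
    """Return the YANG source of the module for room number n."""
    return ROOM_TEMPLATE.substitute(n=n, rev=revision)
```

The generated text could be saved as, e.g., room2.yang and installed into sysrepo with its sysrepoctl utility (for example sysrepoctl -i room2.yang in recent sysrepo releases; the exact options depend on the installed version).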

B. Client-side

On the client side, any device that supports the NETCONF protocol can communicate with the server.
The challenge, however, is to develop an application based on RESTful services. The application should send
RPC commands such as “edit-config” or “get-config” directly to the NETCONF server in order to retrieve and
update information about the rooms in the user’s home. Finally, its interface must be user-friendly and rich in
data charts, graphs, toggle buttons, etc.
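
The payload of such an RPC is plain XML shaped by the YANG module. As a hedged sketch (the paper does not show its payloads), the following builds the config element of an edit-config request that switches the air conditioning in room 1 on or off, using only Python's standard xml.etree.ElementTree. The namespace urn:sysrepo:room1 and the leaf names come from the Appendix module; the NETCONF base namespace is the one defined in RFC 6241 [5].

```python
import xml.etree.ElementTree as ET

NC_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"  # NETCONF base namespace (RFC 6241)
ROOM1_NS = "urn:sysrepo:room1"                     # namespace of the room1 module (Appendix)

# Readable prefixes in the serialized output instead of ns0/ns1
ET.register_namespace("nc", NC_NS)
ET.register_namespace("r1", ROOM1_NS)


def build_ac_config(ac_on: bool) -> str:
    """Return the config element of an edit-config RPC that switches the AC in room 1."""
    config = ET.Element(f"{{{NC_NS}}}config")
    room = ET.SubElement(config, f"{{{ROOM1_NS}}}room-data")
    leaf = ET.SubElement(room, f"{{{ROOM1_NS}}}ac-status")
    # YANG booleans are encoded as the lowercase literals "true" and "false"
    leaf.text = "true" if ac_on else "false"
    return ET.tostring(config, encoding="unicode")
```

With ncclient, such a string would typically be passed as the config argument of the manager's edit_config operation, e.g. m.edit_config(target="running", config=build_ac_config(True)); whether the running or the candidate datastore is targeted depends on the capabilities the server advertises.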

The first step is to develop a script that “talks” to the NETCONF server. In Python, communication with the
server over the NETCONF protocol is possible through the ncclient library, which offers an easy way to write
client-side scripts around the NETCONF protocol and to build applications on top of them [14].
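
On the reading side, ncclient's get_config operation returns a reply whose data_xml attribute holds the retrieved data as XML text. The sketch below, an illustration rather than the authors' code, turns that text into a Python dictionary for the room1 module from the Appendix; the dictionary keys are hypothetical names chosen for use in a web view.

```python
import xml.etree.ElementTree as ET

NS = {"r1": "urn:sysrepo:room1"}  # prefix map for the room1 module (Appendix)


def parse_room_data(data_xml: str) -> dict:
    """Extract the room1 leaves from the data element of a get-config reply."""
    root = ET.fromstring(data_xml)
    room = root.find(".//r1:room-data", NS)
    if room is None:
        raise ValueError("reply contains no room-data container")

    def leaf(name: str):
        node = room.find(f"r1:{name}", NS)
        return None if node is None else node.text

    temperature, humidity, ac = leaf("temperature"), leaf("humidity"), leaf("ac-status")
    # Missing leaves stay None; uint8 leaves become int, the boolean becomes bool
    return {
        "temperature": None if temperature is None else int(temperature),
        "humidity": None if humidity is None else int(humidity),
        "ac_status": None if ac is None else ac == "true",
    }
```

A typical call would be parse_room_data(m.get_config(source="running").data_xml), where m is an open ncclient manager session.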

The next step was to develop a web application and merge it with the ncclient-based script. Among the many
high-level Python web frameworks, Django was chosen because it encourages rapid development and clean,
pragmatic design [15]. Combining Django and ncclient yields a user-friendly web application that fulfills its
main purpose: collecting information about conditions such as temperature and humidity in the rooms of the
user’s home and controlling the devices in those rooms, all over the NETCONF protocol.
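
How a REST call from the web front end could be translated into one of the two NETCONF operations is sketched below. Everything here is a hypothetical assumption rather than the paper's design: the URL layout /rooms/ROOM/LEAF, the JSON body with a value field, and the choice to treat temperature and humidity as read-only. In a Django view, the returned description would then be executed with ncclient.

```python
import json

# Hypothetical: leaves a client may change in this sketch (sensor values are read-only)
WRITABLE_LEAVES = {"ac-status"}


def rest_to_netconf(method: str, path: str, body: str = "") -> dict:
    """Translate a REST request into a description of the NETCONF operation to run."""
    parts = [p for p in path.strip("/").split("/") if p]
    if len(parts) != 3 or parts[0] != "rooms":
        raise ValueError(f"unsupported path: {path}")
    _, room, leaf = parts
    if method == "GET":
        # Reading a value maps to get-config on the running datastore
        return {"rpc": "get-config", "source": "running", "module": room, "leaf": leaf}
    if method == "PUT":
        if leaf not in WRITABLE_LEAVES:
            raise PermissionError(f"{leaf} is read-only in this sketch")
        value = json.loads(body)["value"]
        # Changing a value maps to edit-config
        return {"rpc": "edit-config", "target": "running", "module": room,
                "leaf": leaf, "value": value}
    raise ValueError(f"unsupported method: {method}")
```

Keeping the translation in a pure function like this makes the mapping testable without a running NETCONF server.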

4. Results

On the client side, the result is an application whose front end and back end are both developed in the Django
framework, with the back end using the ncclient library to interface with the server, as shown in Figure 4.

Figure 4. Developed client application

On the server side, a Raspberry Pi running Linux hosts Netopeer2 and sysrepo, which provide the NETCONF
protocol and expose the data through YANG modules. The Raspberry Pi is connected to several sensors and
actuators, as shown in Figure 5.

Figure 5. Raspberry Pi running as the NETCONF server

A recorded video demonstrating the methodology proposed in this paper is listed in the references [16]. A
frame from the video is shown in Figure 6, where two processes can be seen running in parallel: sysrepo and
netopeer2.

Figure 6. Testing the proposed methodology

An overview of both client and server sides is shown in Figure 7.

Figure 7. Used technologies on both client and server sides

The complete overview of the proposed integration is shown in Figure 8.

Figure 8. Overview of the complete integrated system

5. Conclusion
The empirical study of the NETCONF protocol revealed its considerable capabilities. NETCONF allows a
server to implement any number of YANG modules, each with its own data structure. This characteristic is of
crucial importance for using NETCONF in home automation and similar systems. With the proposed integration,
this is no longer a challenge: thanks to a powerful Python web framework (Django) and the ncclient Python
library, it is possible to develop a rich web application that can run on many devices, such as single-board
computers, desktop computers, notebooks, and even tablets.

APPENDIX
Implemented module for a room in the YANG language:
module room1 {
  namespace "urn:sysrepo:room1";
  prefix r1;
  description "The room yang module.";
  revision 2019-09-14 {
    description "Initial revision.";
  }

  container room-data {
    description "Room 1 info.";
    leaf temperature {
      description "Actual temperature inside the room.";
      type uint8 {
        range "0..125";
      }
    }

    leaf humidity {
      description "Actual humidity inside the room.";
      type uint8 {
        range "0..100";
      }
    }

    leaf ac-status {
      description "Informs whether the AC is switched on or off.";
      type boolean;
    }
  }
}

ACKNOWLEDGMENT
Many thanks to the experts from the RT-RK Institute for Computer Based Systems in Banja Luka, who
contributed to and greatly influenced the development of this research from the early stages of the project.

REFERENCES
[1] M. Tooba, A. Muhammad and A. M. Martinez-Enriquez, "Smart Solution for Heterogeneous Device Interoperability in IoT," 2018 Seventeenth Mexican International Conference on Artificial Intelligence (MICAI), Guadalajara, Mexico, 2018, pp. 70-75.

[2] F. Van den Abeele, J. Hoebeke, I. Moerman and P. Demeester, "Integration of Heterogeneous Devices and Communication Models via the Cloud in the Constrained Internet of Things," International Journal of Distributed Sensor Networks, 2015.

[3] S. Vijay and M. K. Banga, "Management of IoT Devices in Home Network via Intelligent Home Gateway Using NETCONF," in N. Kumar and A. Thakre (eds), Ubiquitous Communications and Network Computing, UBICNET 2017, Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol. 218, Springer, Cham, 2018.

[4] D. Đumić, S. Došlić, M. Antić and B. Milić, "Integration of the NETCONF Protocol in the Internet of Things by means of RESTful Web Services," 6th International Conference on Electrical, Electronic and Computing Engineering IcETRAN, ETRAN Society, June 2019, pp. 983-987.

[5] R. Enns, M. Bjorklund, J. Schoenwaelder and A. Bierman, "Network Configuration Protocol (NETCONF)," RFC 6241, Internet Engineering Task Force (IETF), ISSN: 2070-1721, June 2011. [Online]. Available: https://tools.ietf.org/html/rfc6241

[6] H. Ji, B. Zhang, G. Li, X. Gao and Y. Li, "Challenges to the New Network Management Protocol: NETCONF," 2009 First International Workshop on Education Technology and Computer Science, Wuhan, Hubei, 2009, pp. 832-836, doi: 10.1109/ETCS.2009.189.

[7] M. Dallaglio, N. Sambo, F. Cugini and P. Castoldi, "Management of sliceable transponder with NETCONF and YANG," 2016 International Conference on Optical Network Design and Modeling (ONDM), Cartagena, 2016, pp. 1-6.

[8] M. Dallaglio, N. Sambo, F. Cugini and P. Castoldi, "Management of sliceable transponder with NETCONF and YANG," International Conference on Optical Network Design and Modeling, IEEE, May 2016, pp. 1-6.

[9] P. Shafer, "An Architecture for Network Management using NETCONF and YANG," Internet Engineering Task Force (IETF), ISSN: 2070-1721, June 2011. [Online]. Available: https://tools.ietf.org/id/draft-ietf-netmod-arch-07.html

[10] M. Bjorklund, "YANG - A Data Modeling Language for the Network Configuration Protocol (NETCONF)," RFC 6020, Internet Engineering Task Force (IETF), ISSN: 2070-1721, October 2010. [Online]. Available: https://tools.ietf.org/html/rfc6020

[11] The Raspberry Pi Foundation, "Raspberry Pi 3 Model B+," [Online]. Available: https://www.raspberrypi.org/products/raspberry-pi-3-model-b-plus/

[12] Czech Educational and Research Network (CESNET), "Netopeer2 - The NETCONF Toolset," [Online]. Available: https://github.com/CESNET/Netopeer2

[13] "sysrepo - YANG-based datastore for Unix/Linux applications," [Online]. Available: http://www.sysrepo.org/static/doc/html/start_page.html

[14] S. Bhushan and L. Poulopoulos, "Python library for NETCONF clients," [Online]. Available: http://ncclient.readthedocs.org/

[15] Django Software Foundation, [Online]. Available: https://docs.djangoproject.com/en/3.0/

[16] "NETCONF Protocol + Raspberry Pi + Django = Home Automation," Yugoscientiz, © 2019. [Online]. Available: https://www.youtube.com/watch?v=ZoiYGt2NbCA

</text>
                  </elementText>
                </elementTextContainer>
              </element>
            </elementContainer>
          </elementSet>
        </elementSetContainer>
      </file>
    </fileContainer>
    <collection collectionId="3">
      <elementSetContainer>
        <elementSet elementSetId="1">
          <name>Dublin Core</name>
          <description>The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/.</description>
          <elementContainer>
            <element elementId="50">
              <name>Title</name>
              <description>A name given to the resource</description>
              <elementTextContainer>
                <elementText elementTextId="26245">
                  <text>Journal of Natural Sciences and Engineering</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="43">
              <name>Identifier</name>
              <description>An unambiguous reference to the resource within a given context</description>
              <elementTextContainer>
                <elementText elementTextId="26605">
                  <text>2637-2835</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="98">
              <name>DOI</name>
              <description>Digital object identifier</description>
              <elementTextContainer>
                <elementText elementTextId="26606">
                  <text>10.14706</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="45">
              <name>Publisher</name>
              <description>An entity responsible for making the resource available</description>
              <elementTextContainer>
                <elementText elementTextId="26607">
                  <text>International Burch University</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="41">
              <name>Description</name>
              <description>An account of the resource</description>
              <elementTextContainer>
                <elementText elementTextId="26608">
                  <text>Journal of Natural Sciences and Engineering (JONSAE) is a peer-reviewed, biannually published international journal focusing on empirical and theoretical research in all branches of Engineering and Natural Sciences. It is published on behalf of the Faculty of Engineering and Natural Sciences of International Burch University and aims to provide the best content by publishing original research papers, review articles, special issues, feature articles, and book reviews. All manuscript submissions are subject to initial appraisal by the Editor and, if found suitable for further consideration, to peer review by independent, anonymous referees. All peer review is double-blind and submission is online. The journal welcomes theoretical, applied, interdisciplinary and methodological work, with a preference for empirical research, a critical approach, and problem-solving methods in manuscripts.</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="44">
              <name>Language</name>
              <description>A language of the resource</description>
              <elementTextContainer>
                <elementText elementTextId="26609">
                  <text>English</text>
                </elementText>
              </elementTextContainer>
            </element>
          </elementContainer>
        </elementSet>
      </elementSetContainer>
    </collection>
    <elementSetContainer>
      <elementSet elementSetId="1">
        <name>Dublin Core</name>
        <description>The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/.</description>
        <elementContainer>
          <element elementId="50">
            <name>Title</name>
            <description>A name given to the resource</description>
            <elementTextContainer>
              <elementText elementTextId="26454">
                <text>Leveraging Raspberry Pi as a server for the integration of the NETCONF protocol&#13;
within IoT systems based on YANG</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="96">
            <name>Author</name>
            <description>Author</description>
            <elementTextContainer>
              <elementText elementTextId="26455">
                <text>Dalibor Đumić, Slobodan Lubura</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="94">
            <name>Abstract</name>
            <description>A summary of the resource.</description>
            <elementTextContainer>
              <elementText elementTextId="26456">
                <text>Herein the idea of leveraging Raspberry Pi as a server for the integration of an incipient&#13;
network management protocol, the Network Configuration Protocol (NETCONF), within IoT&#13;
systems based on YANG is presented. The practical realization of this idea requires the&#13;
implementation of the NETCONF protocol together with REpresentational State Transfer web&#13;
services (RESTful). Such a practical realization opens new possibilities in domotics systems,&#13;
which are discussed in this paper.</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="97">
            <name>Keywords</name>
            <description>Keywords.</description>
            <elementTextContainer>
              <elementText elementTextId="26457">
                <text>Django, domotics, Internet of Things, NETCONF, Raspberry Pi, RESTful web&#13;
services, YANG</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="43">
            <name>Identifier</name>
            <description>An unambiguous reference to the resource within a given context</description>
            <elementTextContainer>
              <elementText elementTextId="26458">
                <text>2637-2835</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="98">
            <name>DOI</name>
            <description>Digital object identifier</description>
            <elementTextContainer>
              <elementText elementTextId="26459">
                <text>10.14706/JONSAE2021314&#13;
</text>
              </elementText>
            </elementTextContainer>
          </element>
        </elementContainer>
      </elementSet>
    </elementSetContainer>
  </item>
  <item itemId="3478" public="1" featured="0">
    <fileContainer>
      <file fileId="4285">
        <src>https://eprints.ibu.edu.ba/files/original/277ccb2bcc1a93885c5603d23beeeaa1.pdf</src>
        <authentication>713fcc8ab28178f5189f971fd2845cb6</authentication>
        <elementSetContainer>
          <elementSet elementSetId="4">
            <name>PDF Text</name>
            <description/>
            <elementContainer>
              <element elementId="52">
                <name>Text</name>
                <description/>
                <elementTextContainer>
                  <elementText elementTextId="26446">
                    <text>Journal of Natural Sciences and Engineering, Vol. 1, (2020)
DOI number: 12.34567/JONSAE2020123

Student Attendance Pattern Detection and Prediction
Ibrahim Muzaferija1, Zerina Mašetić2, Samed Jukić3, Dino Kečo4
1

International Burch University, Sarajevo, Bosnia and Herzegovina
ibrahim.muzaferija@stu.ibu.edu.ba
zerina.masetic@ibu.edu.ba
samed.jukic@ibu.edu.ba
dino.keco@ibu.edu.ba

Abstract – Since the early beginnings of education systems, attendance has played a crucial role in student
success, as well as in students' overall interest in the subject matter. The most productive way of increasing
the student attendance rate is to understand why it decreases, try to predict when it is going to happen, and
act on the causing factors in order to prevent it. Predicting and increasing the attendance rate brings many
benefits, including better lecture organization (e.g., lecture time and duration, or the choice of lecture room).
This paper describes the steps taken to extract knowledge from the university's student database and to build
a model that predicts whether a student will attend a class. Results show that attendance patterns are best
captured by a decision tree algorithm: the C4.5 model is interpretable and predicts attendance with an AUC of
0.81.
Keywords - Data Mining, Educational Data Mining, Machine Learning

1. Introduction

Data mining (DM) is an approach to discovering useful information in data. It uses statistical and machine
learning (ML) techniques on large volumes of data to discover hidden patterns and relationships that describe
the behavior of the systems that produced the data. The discovered relationships and patterns provide helpful
insight for decision making and for making predictions, thus helping to solve numerous problems.

In recent years, there has been an increase in the use of ML techniques in many fields, such as education,
economics, business, statistics, medicine, and sport. The main objective of this paper is to apply ML
techniques in the educational field to analyze student behaviors and to predict whether the student will
attend the class.

Traditionally, educational institutions have collected large volumes of data related to students, classes,
faculty members, and educational processes. However, the collected data is often not analyzed thoroughly
enough to provide significant results. In general, it is used to produce simple reports that contribute little to
the decision-making processes of the institutions.

Currently, educational systems aim to enhance the teaching and learning process by carefully analyzing
collected data and discovering patterns related to student behavior and final outcomes. The goals are to
identify which students will perform well, so that they can be awarded scholarships, and, more importantly, to
identify students who may fail, so that help and assistance can be offered to them.

Besides identifying students by their performance, it is also important to discover which aspects of teaching
and learning systems facilitate student learning and success. One aspect closely related to student
performance is attendance: students who have a higher attendance rate also have a higher success rate in the
end [1].

The paper is structured in seven sections: Section 1 is the introduction; Section 2 (Previous Work) describes
previous efforts on the topic; Section 3 (Methods and Materials) describes the data cleaning and processing
steps; Section 4 (Model Creation) describes model selection and creation; Section 5 (Results) presents the
model results and evaluation; Section 6 (Discussion) compares this study with previous studies; and Section 7
(Conclusion) provides recommendations for future work in educational data mining.

2. Previous Work

Gurmeet Kaur and Williamjit Singh [2] applied machine learning methods from the WEKA tool to predict the
performance of students from the College of Science and Technology – Khan Younis. Their work concluded
with two classification algorithms, Naive Bayes and J48, which provided accuracies of 63.59% and 63.53%,
respectively.

C. Anuradha and T. Velmurugan [3] conducted a comparative evaluation of classification algorithms for
predicting students' performance. The dataset was obtained from the college database and contained 19
attributes describing the student, their family, and their living environment, as well as previous performance.
Their goal was to compare algorithms in predicting students’ performance in end-semester examinations. The
results show that Bayesian classifiers, as well as JRip and J48, had the highest accuracy, close to 70%.

Abeer Badr El-Din Ahmed and Ibrahim Sayed Elaraby [4] describe the importance of Educational Data
Mining (EDM) and Knowledge Discovery in Databases (KDD) in achieving the main goal of higher
education institutions, that is, providing quality education to students. They used classification algorithms to
identify students who needed special attention, in order to take appropriate action at the right time and reduce
the failing ratio; this resulted in a decrease of the failing ratio by more than 15%.

Anal Acharya and Devadatta Sinha [5] used a dataset containing a large number of features that describe a
student. By applying feature selection algorithms such as Correlation-Based Feature Selection (CBFS) and
Information Gain Attribute Evaluation (IGATE), they reduced the number of features and performed
cross-modeling with five machine learning algorithms: Decision Trees (DT), Bayesian Networks (BN),
Artificial Neural Networks (ANN), Support Vector Machines (SVM), and Multi-Layer Perceptron (MLP).
Features related to gender, university, time, and family had the highest information gain, and the models
created using decision tree algorithms provided 10-15% more reliable performance than the other
classification algorithms.

The study conducted by Havan Agrawal and Harshil Mavani [6] confirms that past performance has a
significant influence on current performance. Further, they used neural network algorithms and confirmed that
the accuracy of the algorithms increases with dataset size, meaning that as the dataset grows, the algorithms
generalize the problem better.

In this paper, we address the problem with a selection of the best-performing machine learning algorithms for
EDM, as proposed by Anal Acharya and Devadatta Sinha [5] and Gurmeet Kaur and Williamjit Singh [2], such
as Logistic Regression, Decision Tree, rule-based algorithms, and k-NN. Moreover, a larger number of data
samples than in the study by Gurmeet Kaur and Williamjit Singh [2] is used, in order to improve the
algorithms' ability to generalize.

3. Methods and Materials

The research is based on the CRISP-DM methodology [7], as it describes common approaches used by data
mining experts; this paper follows a simplified version of its processing model, shown in Figure 1.

Figure 1. Data processing workflow

A. Data selection
Initial data was obtained from International Burch University’s Student Academic System [8] and contains
attendance data of 2nd-year students from the academic years 2016/2017 and 2017/2018. Although the dataset
does not contain all details about the students and their classes (such as the day of the week on which a class
was held, the exact start and end times of classes, or the professor ID), it is sufficient to extract the patterns of
student attendance behavior and to create a model that predicts it.

The data was obtained as an SQL file; after the file was imported into a local database, RapidMiner [9] was
used to fetch the tables and store them in CSV format. All further operations were performed in RapidMiner,
which provides a Weka [10] extension.
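
In the study this export was done in RapidMiner. For readers who prefer a script, a rough standard-library equivalent is sketched below, under the hypothetical assumption that the SQL dump was imported into a SQLite database (the paper does not name the database engine, and dumps from other engines may need conversion first). The table names would be those listed in Table 1.

```python
import csv
import sqlite3
from pathlib import Path


def export_tables_to_csv(conn: sqlite3.Connection, tables, out_dir) -> list:
    """Write each table to out_dir/TABLE.csv with a header row; return the written paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for table in tables:
        # Table names come from a trusted list, so quoting them is sufficient here
        cur = conn.execute(f'SELECT * FROM "{table}"')
        header = [col[0] for col in cur.description]
        path = out / f"{table}.csv"
        with path.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(header)
            writer.writerows(cur)  # the cursor yields one tuple per row
        written.append(path)
    return written
```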

Table 1 shows whether each attribute of the original dataset was copied over to the data mining dataset. All
the selected attributes were considered relevant to the task of predicting student attendance.

Table 1. Initial dataset attribute selection
Table | Attribute | Accepted | Notes
students | student_id | x | No need for additional IDs
students | student_number | x | No need for additional IDs
student_courses | student_id | ✓ | Student ID
student_courses | course_code | ✓ | Course ID
student_courses | branch | x | Same values in other tables
student_courses | year | x | Same values in other tables
student_courses | semester | x | Same values in other tables
student_attendance | student_id | ✓ | Student ID
student_attendance | attendance_id | ✓ | Class attendance ID
course_attendance | attendance_id | ✓ | Class attendance ID
course_attendance | course_code | ✓ | Course ID
course_attendance | branch | ✓ | Branch
course_attendance | year | ✓ | Year
course_attendance | semester | ✓ | Semester number
course_attendance | course_date | ✓ | Starting date of the week in which the class was held
course_attendance | type | ✓ | Type of the class
course_attendance | topic | x | Not relevant / high cardinality
course_attendance | duration | ✓ | Duration of the class

B. Data Cleansing

In order to gain insight into data quality, graphical and statistical methods were used to detect anomalies,
faults, outliers, missing values, etc. First, the dataset was divided into four parts: the 1st semester of 2016, the
2nd semester of 2016, the 1st semester of 2017, and the 2nd semester of 2017.

After examination, the data for both semesters of 2016 contained no anomalies and were consistent, so they
were labeled as clean data. In contrast, the data for the 2nd semester of 2017 were incomplete due to a
university system failure (class attendance for the last 2 weeks is missing), and the data for the 1st semester of
2017 were inconsistent (a very large number of attendances recorded in the 14th week and almost none in the
15th week).

The dataset also contained automatically recorded attendance values that were irrelevant for creating a model,
and those samples were removed. Some attendance samples recorded before or after the semester were
marked as outliers. Samples related to midterm and final exams showed fewer recorded attendances due to the
nature of exam weeks: instead of multiple lectures, only one session, the exam, was held in those weeks. These
samples were not relevant for predicting lecture attendance and were discarded.

C. Deriving Data

From the course_date attribute, which contains the starting date of the week in which the class was held, the
week attribute was derived, containing the week number within the semester.
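
A minimal sketch of this derivation, assuming the semester start date is known (the date used in the usage note below is hypothetical) and taking the 15-week length from the week range in Table 2. Dates that fall before week 1 or after the last week return None, which also flags the out-of-semester outliers mentioned in the data cleansing step.

```python
from datetime import date


def semester_week(course_date: date, semester_start: date, n_weeks: int = 15):
    """Return the 1-based semester week of course_date, or None if it lies outside the semester."""
    week = (course_date - semester_start).days // 7 + 1
    # Dates before week 1 or after the last week are out-of-semester outliers
    return week if week in range(1, n_weeks + 1) else None
```

With a hypothetical semester start of date(2017, 2, 20), a course_date of date(2017, 2, 27) falls in week 2.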

The attribute attended is added to the table student_attendance and contains the value 1, which indicates that
the student attended the class. Later, when the tables are joined, this attribute will have missing values, which
indicate that the student did not attend the class.

The dataset contains only the records of students who attended a class and no records of students who did
not. To populate the attribute attended with whether each student attended each class, the tables must be
joined.

First, an inner join of the student_courses and course_attendance tables, matching course_code in one table
with course_code in the other, creates a new table that lists the enrolled students for each course attendance ID.

Next, a left join of the previously created table and the student_attendance table, matching both
attendance_id and student_id, creates a new table that contains attendance values where the student attended
the class and missing values where the student was absent. Finally, the missing values were replaced with 0,
indicating that the student was absent.

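The two joins can be expressed in a single SQL query. The sketch below runs it with Python's built-in sqlite3 module instead of RapidMiner; the schema is a hypothetical minimum that keeps only the columns the joins need (the real tables have more, see Table 1), and COALESCE performs the final replacement of missing values with 0.

```python
import sqlite3

# Hypothetical minimal schema: only the columns needed for the joins
SCHEMA = """
CREATE TABLE student_courses    (student_id INTEGER, course_code TEXT);
CREATE TABLE course_attendance  (attendance_id INTEGER, course_code TEXT);
CREATE TABLE student_attendance (attendance_id INTEGER, student_id INTEGER,
                                 attended INTEGER DEFAULT 1);  -- 1 = present
"""

JOIN_QUERY = """
SELECT sc.student_id,
       ca.attendance_id,
       ca.course_code,
       COALESCE(sa.attended, 0) AS attended      -- no match means absent (0)
FROM student_courses AS sc
JOIN course_attendance AS ca                    -- step 1: inner join on course_code
  ON ca.course_code = sc.course_code
LEFT JOIN student_attendance AS sa              -- step 2: left join on both IDs
  ON sa.attendance_id = ca.attendance_id
 AND sa.student_id = sc.student_id
ORDER BY ca.attendance_id, sc.student_id
"""


def build_attendance(conn: sqlite3.Connection) -> list:
    """Return one (student_id, attendance_id, course_code, attended) row per enrolled student and class."""
    return conn.execute(JOIN_QUERY).fetchall()
```

Usage: after conn.executescript(SCHEMA) and loading the three tables, build_attendance(conn) returns the labelled rows used to build the final dataset.
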
D. Dataset Creation

During the data preparation phase, the attributes considered most relevant were selected to shape the model's
prediction capabilities. Then, using RapidMiner, all data was cleaned and exported as a CSV dataset used for
training and testing the model. The final dataset contains 58,019 attendance samples from the 2nd semester of
2016; Table 2 describes the qualitative and quantitative aspects of all attributes in the final dataset. The goal
attribute (prediction class) is “attended”, which indicates whether the student attended the class (1) or not (0).

Table 2. Final dataset attribute description
Attribute | Data type | Range | Missing values | Distinct values | Unique values | Statistics
id | integer | [1, 58019] | 0 | 58019 | 58019 | -
attended | integer | 0, 1 | 0 | 2 | 0 | Least: 1 (21327); Most: 0 (36692)
course_code | nominal | MAN 201, (...) | 0 | 85 | 0 | Least: IRES 305 (5); Most: MAN 201 (6784)
branch | nominal | A, B, C, D, E, F | 0 | 6 | 0 | Least: D (1628); Most: A (37368)
type | nominal | recitation, lecture, lab | 0 | 3 | 0 | Least: recitation (1954); Most: lecture (46511)
duration | integer | [1, 4] | 0 | 4 | 0 | Min: 1; Max: 4; Average: 1.684
week | integer | [1, 15] | 0 | 15 | 0 | Min: 1; Max: 15; Average: 7.861

4. Model Creation

This machine learning problem is a classification problem [11]. To reach the business goal, a complete
understanding of the data is required to generate the model. Several modeling algorithms exist for
classification problems; those considered in this study are listed in Table 3.

To correctly create, evaluate, and validate the model, one of the key steps is separating the data into training,
testing, and validation sets.

Table 3. Machine Learning algorithms
Type | Name
Functions | Logistic Regression
Trees | ID3 (Decision Tree)
Trees | C4.5 (J48)
Trees | Random Forest
Rules | One-Rule
Rules | PRISM
Memory-Based | k-NN

The most convenient method for separating training and testing data is cross-validation [12]: the data is split
into folds, and each fold is used once for testing while the remaining folds are used for training. Five-fold
cross-validation is performed on the training data. The validation data is excluded from cross-validation so that
it can provide a reliable, independent test of the final models.
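
A minimal sketch of this splitting scheme, assuming a 20% validation holdout (the paper does not state the proportion) and plain random, non-stratified folds; RapidMiner's own cross-validation operator may use stratified sampling instead.

```python
import random


def holdout_and_folds(n_samples: int, k: int = 5, holdout_frac: float = 0.2, seed: int = 42):
    """Shuffle sample indices, set aside a validation holdout, and split the rest into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_holdout = int(n_samples * holdout_frac)
    validation, rest = indices[:n_holdout], indices[n_holdout:]
    # Deal the remaining indices round-robin so fold sizes differ by at most one
    folds = [rest[i::k] for i in range(k)]
    return validation, folds


def cross_validation_splits(folds):
    """Yield (train, test) pairs; every fold is the test set exactly once."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Each model would be trained and scored five times, once per (train, test) pair, and the scores averaged; the validation indices are used only once, for the final check.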

5. Results

For all decision tree algorithms, the minimal gain was set to 0.01 to prevent premature pruning of tree branches,
and the pruning confidence threshold was set to 0.25. Other model settings were kept at their default values,
because these are preselected for good model performance. After cross-validation [13], the models built with
the different algorithms yielded promising results, as measured by accuracy, area under the curve (AUC),
precision, recall, fallout, and F-measure [14]. Moreover, the models were evaluated on the held-out validation
data, and those results agree with the cross-validation results presented in Table 4.

Table 4. Evaluations of created models
Algorithm | Accuracy | AUC | Precision | Recall | Fallout | F-Measure
Logistic Regression | 75.37% | 0.803 | 71.09% | 55.63% | 13.16% | 62.41%
ID3 | 68.38% | 0.697 | 56.20% | 63.31% | 28.68% | 59.54%
C4.5 | 77.41% | 0.812 | 73.04% | 61.12% | 13.12% | 66.55%
Random Forest | 66.48% | 0.700 | 56.41% | 38.73% | 17.39% | 45.92%
One-Rule | 74.60% | 0.500 | 69.25% | 55.65% | 14.39% | 61.69%
PRISM | 64.15% | 0.500 | 71.90% | 4.07% | 0.93% | 7.70%
k-NN | 70.42% | 0.672 | 58.13% | 69.86% | 29.25% | 63.45%

The machine learning algorithm that creates the most accurate model is the decision tree algorithm C4.5.
The reason is its enhanced tree pruning method, which reduces misclassification errors caused by noise and
excessive detail in the training data set, as described in the study by Anuja Priyam et al. [15]. The
accuracy of the model is fairly satisfying, considering that previous works reported an accuracy of less than
70%. Unlike the previously mentioned studies, our data set contains more examples and thus produces a more
accurate prediction model. The model also allows relevant information to be extracted and helps define lines
of action for this business problem.
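
The pruning idea can be made concrete. C4.5 replaces a subtree by a leaf when the leaf's pessimistic (upper-bound) error estimate is no worse than the subtree's, and the confidence factor, 0.25 in this study, controls how pessimistic that bound is. The Python sketch below uses a widely taught normal approximation of that bound rather than Quinlan's exact binomial computation or the implementation in any particular tool; the function names and the example counts are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def pessimistic_error(errors: int, n: int, cf: float = 0.25) -> float:
    """Upper confidence bound on a node's error rate (normal approximation).

    errors: training examples misclassified at the node; n: examples at the node.
    cf: confidence factor; a smaller cf means a larger z, so higher estimates.
    """
    z = NormalDist().inv_cdf(1.0 - cf)  # about 0.674 when cf = 0.25
    f = errors / n
    spread = z * math.sqrt(f / n - f * f / n + z * z / (4 * n * n))
    return (f + z * z / (2 * n) + spread) / (1 + z * z / n)


def should_prune(leaves: Sequence[Tuple[int, int]], collapsed_errors: int,
                 cf: float = 0.25) -> bool:
    """Return True if replacing the subtree by one leaf is not estimated to hurt.

    leaves: (errors, n) for each leaf of the subtree.
    collapsed_errors: errors the replacement leaf would make (examples outside
    its majority class).
    """
    total = sum(n for _, n in leaves)
    subtree_err = sum(n * pessimistic_error(e, n, cf) for e, n in leaves) / total
    leaf_err = pessimistic_error(collapsed_errors, total, cf)
    return subtree_err >= leaf_err


print(should_prune([(2, 6), (1, 2), (2, 6)], collapsed_errors=5))
```

With the default factor, the example returns True: the three leaves have a weighted estimate of about 0.51, while the collapsed leaf (5 errors in 14 examples) is estimated at about 0.45, so the simpler tree is kept.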

Table 5. Confusion matrix for C4.5 model
                  true 0    true 1    class precision
predicted 0       31878     8291      79.36%
predicted 1       4814      13036     73.03%
class recall      86.88%    61.12%
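
The rates in Table 5 and the C4.5 row of Table 4 follow directly from the four counts in the confusion matrix. The Python sketch below, built around a hypothetical `Confusion` class, recomputes them while treating class 1 as the positive class; this reading is an assumption, but it reproduces the reported accuracy, precision, recall, fallout, and F-measure. The small gap to Table 4 (73.03% versus 73.04% precision) is plausibly due to averaging over cross-validation folds rather than pooling the counts.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion matrix with class 1 treated as the positive class."""
    tn: int  # predicted 0, true 0
    fn: int  # predicted 0, true 1
    fp: int  # predicted 1, true 0
    tp: int  # predicted 1, true 1

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.tn + self.fp + self.fn)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)

    def fallout(self) -> float:
        """False-positive rate: share of true 0 rows predicted as 1."""
        return self.fp / (self.fp + self.tn)

    def f_measure(self) -> float:
        """Harmonic mean of precision and recall."""
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r)


c45 = Confusion(tn=31878, fn=8291, fp=4814, tp=13036)  # counts from Table 5
for name in ("accuracy", "precision", "recall", "fallout", "f_measure"):
    print(f"{name}: {getattr(c45, name)():.2%}")
```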

Regarding interpretability, the decision tree generated by the C4.5 algorithm is easy to interpret: the tree
has a size of 357 nodes, of which 230 are leaves. The most important attribute in the dataset, according to
the model, is the course code.

Furthermore, from a business perspective, it is wrong to assume that a student who attends classes has the
same cost as one who never goes to class: students who attend classes are beneficial, while students who
miss classes have a cost. With that in mind, the model needs to help find solutions that decrease the
overall cost. There are four possibilities:
1. We predicted the student would attend class and he did;
2. We predicted the student would not attend class and he did not;
3. We predicted the student would attend class, but he did not;
4. We predicted the student would not attend class, but he did.

Point 1 is the best scenario, so it needs a negative cost (that is, a benefit). Point 2 is the worst case,
so it needs the highest cost. Point 3 is also undesirable, but less so than Point 2. Point 4 is favorable,
but not as good as Point 1. With that information, it is possible to build a cost matrix for the class
“Attended”:
Table 6. Cost matrix for the model
                 Actual T   Actual F
Prediction T     -15        15
Prediction F     -5         5

Building the cost matrix does not affect the model’s performance, but it shapes the final prediction outcome
by introducing a business bias aimed at increasing business value.
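
Table 6 can be applied as an expected-cost decision rule. The Python sketch below is one possible encoding, assuming that the rows of Table 6 are predictions, the columns are actual outcomes, and class 1 in Table 5 means “Attended”; the dictionary layout and the function names are hypothetical. For each example it picks the prediction whose expected cost, weighted by the model's probability of attendance, is lower, and it can also total the cost over a confusion matrix.

```python
from typing import Dict, Tuple

# Assumed encoding of Table 6: COST[(predicted, actual)], True = "Attended".
COST: Dict[Tuple[bool, bool], int] = {
    (True, True): -15,
    (True, False): 15,
    (False, True): -5,
    (False, False): 5,
}


def expected_cost(predict_attend: bool, p_attend: float) -> float:
    """Expected cost of a prediction, given the model's probability of attendance."""
    return (p_attend * COST[(predict_attend, True)]
            + (1.0 - p_attend) * COST[(predict_attend, False)])


def decide(p_attend: float) -> bool:
    """Predict attendance only if that choice has the strictly lower expected cost."""
    return expected_cost(False, p_attend) > expected_cost(True, p_attend)


def total_cost(counts: Dict[Tuple[bool, bool], int]) -> int:
    """Sum cost over a confusion matrix given as {(predicted, actual): count}."""
    return sum(COST[key] * n for key, n in counts.items())


c45_counts = {  # Table 5, reading class 1 as "Attended" (an assumption)
    (False, False): 31878, (False, True): 8291,
    (True, False): 4814, (True, True): 13036,
}
print(total_cost(c45_counts))
print(decide(0.7), decide(0.3))
```

Under this reading, predicting attendance has an expected cost of 15 - 30p and predicting absence 5 - 10p for an attendance probability p, so the two break even at p = 0.5; other cost values would move that threshold. Applied to the Table 5 counts, `total_cost` returns -5395.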

6. Discussion

A possible issue with the study conducted by Gurmeet Kaur and Williamjit Singh [2] is the small number of
instances (as few as 52) in the dataset used to build the model. To make a model more accurate and better
able to generalize, Havan Agrawal and Harshil Mavani [6] propose using a higher number of instances, which
made the model described in this paper more accurate. Moreover, cross-validation, one of the extra steps
taken in model construction, gave a more reliable estimate of the model’s ability to generalize and supports
its higher accuracy compared with models in previous studies.

While conducting the research, it was noticed that the quantity and quality of data play a crucial role in
the final outcome and performance. We strongly advise using a high number of instances in future studies,
and a continuous stream of attendance data in deployed models, so that the model can be retrained as the
trends behind student attendance change over time.

The feature engineering task in the data preparation step yielded a significant model improvement compared
with models from previous studies, which were built without deriving new attributes. Moreover, the
introduction of external data also improved the performance of the model, as outliers were removed.

7. Conclusion

This study has shown that patterns in student attendance exist and can be used to predict whether a student
will attend class. The importance of student data quantity and quality is presented, as well as methods for
cleaning and transforming the data. The creation of a machine learning model should include cross-validation
as one of the key steps, and we advise using multiple algorithms to achieve the best results. When there is
business value to achieve, a cost matrix is recommended to further adjust the model and increase that value.
The model for predicting student attendance can help address the factors that cause absence and increase the
attendance ratio, which will subsequently increase the passing ratio, i.e., the number of students who
graduate. Future work could include more data set examples, as well as higher dimensionality through
attributes for external factors of students’ attendance, such as the professor who held the lecture and the
weather on that day.

REFERENCES

[1] A. S. N. Kim, S. Shakory, A. Arman, C. Popovic, and L. Park, “Understanding the impact of attendance and participation on academic achievement,” 2019. [Online]. Available: https://doi.org/10.1037/stl0000151. [Accessed: 14-Feb-2020].
[2] “Prediction Of Student Performance Using Weka Tool,” Vidya Publications. [Online]. Available: http://ijoes.vidyapublications.com/paper/Vol17/02-Vol17.pdf. [Accessed: 26-Nov-2018].
[3] “A Comparative Analysis on the Evaluation of Classification Algorithms in the Prediction of Students Performance.” [Online]. Available: http://www.indjst.org/index.php/indjst/article/view/74555/58051. [Accessed: 26-Nov-2018].
[4] A. B. El-Din Ahmed and Ibrahim Sayed Elaraby, “Data Mining: A prediction for Student’s Performance Using Classification Method,” HR PUB. [Online]. Available: http://www.hrpub.org/download/20140105/WJCAT3-13701793.pdf. [Accessed: 26-Nov-2018].
[5] “Early Prediction of Students Performance using Machine Learning Techniques,” Semantic Scholar. [Online]. Available: https://pdfs.semanticscholar.org/6447/4a9172a97cdf5d39c6fdcc21fc0c61fc7df3.pdf. [Accessed: 26-Nov-2018].
[6] “Student Performance Prediction using Machine Learning.” [Online]. Available: http://www.ece.uvic.ca/~rexlei86/SPP/otherswork/V4I3-IJERTV4IS030127.pdf. [Accessed: 26-Nov-2018].
[7] “IBM Knowledge Center.” [Online]. Available: https://www.ibm.com/support/knowledgecenter/en/SS3RA7_15.0.0/com.ibm.spss.crispdm.help/crisp_overview.htm. [Accessed: 19-Dec-2018].
[8] International Burch University, “Home,” International Burch University. [Online]. Available: https://www.ibu.edu.ba/. [Accessed: 19-Dec-2018].
[9] “Lightning Fast Data Science Platform for Teams | RapidMiner©,” RapidMiner, 19-Jan-2016. [Online]. Available: https://rapidminer.com/. [Accessed: 19-Dec-2018].
[10] “Weka 3 - Data Mining with Open Source Machine Learning Software in Java.” [Online]. Available: https://www.cs.waikato.ac.nz/ml/weka/. [Accessed: 19-Dec-2018].
[11] “[No title].” [Online]. Available: https://www.cs.princeton.edu/~schapire/talks/picasso-minicourse.pdf. [Accessed: 10-Nov-2019].
[12] “[No title].” [Online]. Available: https://www.cs.princeton.edu/~schapire/talks/picasso-minicourse.pdf. [Accessed: 10-Nov-2019].
[13] “3.1. Cross-validation: evaluating estimator performance — scikit-learn 0.21.3 documentation.” [Online]. Available: https://scikit-learn.org/stable/modules/cross_validation.html. [Accessed: 10-Nov-2019].
[14] L. Egghe, “The measures precision, recall, fallout and miss as a function of the number of retrieved documents and their mutual interrelations,” Inf. Process. Manag., vol. 44, no. 2, pp. 856–876, Mar. 2008.
[15] “Comparative Analysis of Decision Tree Classification Algorithms.” [Online]. Available: https://inpressco.com/wp-content/uploads/2013/03/Paper17334-3371.pdf. [Accessed: 05-Jul-2020].

</text>
                  </elementText>
                </elementTextContainer>
              </element>
            </elementContainer>
          </elementSet>
        </elementSetContainer>
      </file>
    </fileContainer>
    <collection collectionId="3">
      <elementSetContainer>
        <elementSet elementSetId="1">
          <name>Dublin Core</name>
          <description>The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/.</description>
          <elementContainer>
            <element elementId="50">
              <name>Title</name>
              <description>A name given to the resource</description>
              <elementTextContainer>
                <elementText elementTextId="26245">
                  <text>Journal of Natural Sciences and Engineering</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="43">
              <name>Identifier</name>
              <description>An unambiguous reference to the resource within a given context</description>
              <elementTextContainer>
                <elementText elementTextId="26605">
                  <text>2637-2835</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="98">
              <name>DOI</name>
              <description>Digital object identifier</description>
              <elementTextContainer>
                <elementText elementTextId="26606">
                  <text>10.14706</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="45">
              <name>Publisher</name>
              <description>An entity responsible for making the resource available</description>
              <elementTextContainer>
                <elementText elementTextId="26607">
                  <text>International Burch University</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="41">
              <name>Description</name>
              <description>An account of the resource</description>
              <elementTextContainer>
                <elementText elementTextId="26608">
                  <text>Journal of Natural Sciences and Engineering (JONSAE) is a peer-reviewed, biannually published international journal focusing on empirical and theoretical research in all branches of Engineering and Natural Sciences. It is published on behalf of the Faculty of Engineering and Natural Sciences of International Burch University and aims to provide the best content by publishing original research papers, review articles, special issues, feature articles, and book reviews. All manuscript submissions are subject to initial appraisal by the Editor, and, if found suitable for further consideration, to peer review by independent, anonymous referees. All peer review is double-blind and submission is online. The journal welcomes theoretical, applied, interdisciplinary and methodological work, with a preference for empirical research, critical approaches and problem-solving methods in manuscripts.</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="44">
              <name>Language</name>
              <description>A language of the resource</description>
              <elementTextContainer>
                <elementText elementTextId="26609">
                  <text>English</text>
                </elementText>
              </elementTextContainer>
            </element>
          </elementContainer>
        </elementSet>
      </elementSetContainer>
    </collection>
    <elementSetContainer>
      <elementSet elementSetId="1">
        <name>Dublin Core</name>
        <description>The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/.</description>
        <elementContainer>
          <element elementId="50">
            <name>Title</name>
            <description>A name given to the resource</description>
            <elementTextContainer>
              <elementText elementTextId="26447">
                <text>Student Attendance Pattern Detection and Prediction</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="96">
            <name>Author</name>
            <description>Author</description>
            <elementTextContainer>
              <elementText elementTextId="26448">
                <text>Ibrahim Muzaferija1, Zerina Mašetić2, Samed Jukić3, Dino Kečo4</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="94">
            <name>Abstract</name>
            <description>A summary of the resource.</description>
            <elementTextContainer>
              <elementText elementTextId="26449">
                <text> Since the early beginnings of education systems, attendance has always played a crucial&#13;
role in student success, as well as in the overall interest of the matter. The most productive way of&#13;
increasing the student attendance rate is to understand why it decreases, try to predict when it is&#13;
going to happen, and act on causing factors in order to prevent it. Many benefits of predicted and&#13;
increased attendance rate can be achieved, including better lecture organization (e.g., lecture time and&#13;
duration, lecture class choice, etc.). This paper describes the steps in extracting knowledge from&#13;
the university's student database and building a model that predicts whether the student will attend&#13;
the class or not. Results show that the attendance patterns are best reflected when employing a&#13;
decision tree algorithm, a C4.5 model that is interpretable and able to predict the attendance with&#13;
an AUC of 0.81.</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="97">
            <name>Keywords</name>
            <description>Keywords.</description>
            <elementTextContainer>
              <elementText elementTextId="26450">
                <text>Data Mining, Educational Data Mining, Machine Learning</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="43">
            <name>Identifier</name>
            <description>An unambiguous reference to the resource within a given context</description>
            <elementTextContainer>
              <elementText elementTextId="26451">
                <text>2637-2835</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="98">
            <name>DOI</name>
            <description>Digital object identifier</description>
            <elementTextContainer>
              <elementText elementTextId="26452">
                <text>10.14706/JONSAE2021313</text>
              </elementText>
            </elementTextContainer>
          </element>
        </elementContainer>
      </elementSet>
    </elementSetContainer>
  </item>
  <item itemId="3477" public="1" featured="0">
    <fileContainer>
      <file fileId="4284">
        <src>https://eprints.ibu.edu.ba/files/original/2ce3ee80befbef6a1b69b1e7067b0262.pdf</src>
        <authentication>371f86fa32eeedceeb3439bff85b522d</authentication>
        <elementSetContainer>
          <elementSet elementSetId="4">
            <name>PDF Text</name>
            <description/>
            <elementContainer>
              <element elementId="52">
                <name>Text</name>
                <description/>
                <elementTextContainer>
                  <elementText elementTextId="26439">
                    <text>Journal of Natural Sciences and Engineering, Vol. 3, (2020)
DOI number: 10.14706/JONSAE2019114

Overview of Human Lineage Genetic Marker Studies in Bosnia and Herzegovina:
Y chromosome story
Aldin Pirić1, Sabahudin Ćordić1, Lejla Smajlović-Skenderagić1, Serkan Dogan1, Damir Marjanović1,2
1 International Burch University, Sarajevo, Bosnia and Herzegovina
2 Institute for Anthropological Research, Zagreb, Croatia
aldin.piric@stu.ibu.edu.ba
sabahudin.cordic@stu.ibu.edu.ba
l.smajlovic.skenderagic@ibu.edu.ba
serkan.dogan@ibu.edu.ba
damir.marjanovic@ibu.edu.ba

Abstract – Modern Bosnia and Herzegovina is a state consisting of multiple ethnicities and regions
located in the Western Balkans, with a very complex history. The earliest historical findings show that
its area has been inhabited since the Paleolithic. Since then, this part of Europe, especially the region
of modern Bosnia and Herzegovina, can be recognized as a crossroads for different human migrations and
a meeting point for different cultures, religions and gene pools. Mitochondrial DNA is used for maternal
lineage testing, while the Y chromosome is used for paternal lineage testing; these markers are therefore
referred to as lineage markers. Lineage markers are often used for parental lineage monitoring in
population genetics, human genetics, and forensic genetics. The main aim of this paper is to present a
short overview of the Y chromosome studies performed in Bosnia and Herzegovina within the last two decades.
Keywords - Bosnia and Herzegovina, lineage markers, molecular markers, population genetic
studies, Y chromosome

1. Introduction

Existing archeological artifacts prove that the territory of Bosnia and Herzegovina has been populated
since the Neolithic [1]. However, some archeological findings imply that the first inhabitants settled here
in the Paleolithic era [2]. In the early Bronze Age, Indo-European tribes known as the Illyrians settled in
various regions of the modern territory of Bosnia and Herzegovina [3]. The tribes were governed by the
Romans for more than five centuries [4]. During that time, many residents of the Roman Empire, including
Roman soldiers, settled in the region [1].
After the fall of the Roman Empire, this area remained a borderline between the Eastern and Western
empires, which encouraged various tribes, such as the Avars, the Slavs, and others, to invade this region
in large numbers. Additionally, two important events, along with several other historical episodes,
significantly impacted the structure of the B&amp;H human population. The first was the large migration
waves from the North-East (extremely intensive during the 6th and 7th centuries), which moved different
Gothic, Avar and Slavic clans into the area. The second was the expansion of the Ottoman Empire into
this part of the Balkans in the fifteenth century [5]. All these historical episodes left their imprint on the
population structure of modern B&amp;H inhabitants and created fascinating genetic diversity within it.
Therefore, it is not surprising that the modern B&amp;H population is one of the most genetically studied
regional populations, especially through the use of so-called lineage genetic markers.
Unlike autosomal markers, Y-linked and mitochondrial markers are not reshuffled by recombination in each
generation; instead, they are passed down from one generation to the next, with mutations as the only
source of differences. For these reasons, these markers are often used for parental lineage monitoring in
population genetics, human genetics, and forensic genetics. Mitochondrial DNA is used for maternal
lineage testing, while the Y chromosome is used for paternal lineage testing. Therefore, these markers
are referred to as lineage markers [6].
Previously published papers presented a short historical overview of earlier human population studies in
Bosnia and Herzegovina, conducted within the last three centuries [7,8]. However, the use of lineage markers
was only briefly noted in those papers. The expansion of human population studies based on these genetic
markers, as well as the significance of the obtained results, prompted us to pay more attention to this part
of B&amp;H population genetics. Therefore, this paper elaborates in detail on the use of Y chromosome DNA
markers in the analysis of the B&amp;H human gene pool, including the most recent data published after the
previously mentioned papers.

2. Human Y Chromosome as Genetic Marker

The Y chromosome has been given many different descriptions, among them “nonrecombining desert” and
“gene-poor chromosome”. Compared to other chromosomes, the Y chromosome has a low number of genes, with
half of its sequence consisting of repeated elements. Moreover, it lacks the ability to recombine and is in
continuous decay. The Y chromosome follows a patrilineal inheritance pattern, i.e., it passes from father to
son, meaning that every male from the same patrilineal lineage would have an identical profile. The
relatively small degree of molecular diversity among markers on this chromosome results from the absence
of recombination over 95% of its length, which leaves random mutations as the only possible source of
polymorphisms [6].
Under the Denver convention criteria, the human Y chromosome is classified as a group G chromosome, the
category of the shortest chromosomes in the human set, which also includes chromosomes 21 and 22. It
contains about 50 million base pairs, which make up around 1.8% of the total human genome. The Y
chromosome contains important information used in determining the paternal lineage of a specific male.
This is possible because the Y chromosome contains highly polymorphic regions. The human Y chromosome is
present in a single copy in normal males, inherited from the father, and, as already mentioned, 95% of it
does not undergo recombination. Only 5% of this chromosome can interact with the X chromosome, and this
interacting region is called the pseudoautosomal region of the Y chromosome [6].
The Y chromosome has an important role in forensic analyses of rape cases, in particular those involving
more than one man, and especially with mixed samples containing an overwhelming amount of female DNA.
Y-STR (Short Tandem Repeat) and Y-SNP (Single Nucleotide Polymorphism) markers are useful in parentage
testing or more distant kinship testing through the male line when the children are male, and in
identification when only relatives from the father’s line are known. In addition, the Y chromosome is
increasingly used in human migration studies because it does not undergo recombination during the
transfer of genetic material between generations [9].
After the first Y chromosome polymorphism was published [10], an entire decade passed before binary
markers, and later STR markers, located in the non-recombining region of the Y chromosome (NRY) found
wider application in phylogenetic studies of human migration patterns through the construction of
phylogenetic trees [11]. SNP patterns can be used to determine lineages referred to as haplogroups.
Haplogroups can also be inferred from readily available Y-STR genotyping data, and a vast amount of
forensic Y-STR data is available for use in population genetic studies [12].

3. Overview of the Y Chromosome Population Genetic Studies in Recent B&amp;H Inhabitants

STR and SNP variation in autosomal and Y-chromosome markers was studied so that the molecular genetic
diversity of B&amp;H could be incorporated into regional and European frameworks, and also to provide the
necessary reference data for statistical calculations used in forensic genetics. To ensure the most
relevant calculations, the data are still periodically updated.
Initial results were obtained by observing 28 Y-chromosome biallelic markers in the B&amp;H population [13].
This study was based on regional data and included 256 male individuals. The results showed an extremely
close genetic relationship among the three main Bosnian and Herzegovinian ethnic groups and their close
relationship to other populations in the Balkans. Further elaboration of this issue required additional
studies with a multidisciplinary approach: application of additional molecular markers, expansion of the
sample, structural investigation of each ethnic group, and analysis of ancient genetic material from
archeological skeletal samples.
In the same year (2005), the very first Y-STR population data set for the B&amp;H human population was
published [14]. One hundred males took part as voluntary donors. The PowerPlex®Y System was used to
amplify 12 Y-STR loci via PCR: DYS19, DYS385a, DYS385b, DYS389I, DYS389II, DYS390, DYS391, DYS392,
DYS393, DYS437, DYS438 and DYS439. Of the 81 Y-STR haplotypes detected in the 100 samples, 69 were
unique, 7 appeared twice, 4 appeared three times, and only 1 appeared five times. The statistical analysis
covered gene diversity, major allele frequency, the most frequent haplotypes, allele frequency distribution
and observed haplotype diversity [3] for the 12 PowerPlex®Y loci.
Four years later, with the intent to improve the existing database and to obtain more specific results for
local populations across a variety of DNA markers, a group of authors analyzed additional individuals from
the Canton Sarajevo area. Genetic diversity at the 12 Y-chromosomal STR loci of the PowerPlex® Y System
was estimated to extend the existing database and to create a more realistic view of the genetic structure
of the regional Bosnian and Herzegovinian human population, in particular regarding diversity among
isolated and non-isolated local populations. In addition, the study aimed to estimate the genetic
distinctiveness of the Canton Sarajevo population within the general B&amp;H population, as well as relative
to populations of geographically neighboring countries.

Y-STR haplotypes were generated for a sample of 100 unrelated, healthy males living in Canton Sarajevo
(Bosnia and Herzegovina) using the PowerPlex®Y System kit [15]. Within this pool, a total of 81 different
haplotypes were detected, 71 of them unique. Of the remaining 10 haplotypes, six were observed twice, two
three times, one five times and one six times. The results suggested that the local population of Canton
Sarajevo, with respect to detected haplotype and gene diversity, may be considered a projection of the
general B&amp;H population. Since this is the largest regional population in Bosnia and Herzegovina, with a
pronounced migration influx, this is a logical outcome.
Four years later, in 2013, Y chromosome diversity of the B&amp;H population was examined again, with an
increased number of STR loci. Samples were collected with buccal swabs from 100 unrelated, healthy men
originating from all regions of Bosnia and Herzegovina. DNA samples were typed for 23 Y-STR loci,
including 6 new loci (DYS481, DYS533, DYS576, DYS549, DYS643, and DYS570) included in the new
PowerPlex® Y23 amplification kit. The absolute frequency of the generated haplotypes was calculated, and
only two samples shared an identical Y23 haplotype. DYS418 was identified as the most polymorphic locus,
with 14 detected alleles, and the least polymorphic loci were DYS437, DYS389I, DYS393, and DYS391.
Reducing the number of repeated haplotypes is very important in forensic DNA analysis, and this study
showed that it can be achieved by increasing the number of highly polymorphic Y-STR markers [16].
Whit Athey’s Haplogroup Predictor was used to estimate Y chromosome haplogroup frequencies from the Y-STR
haplotypes of the same 100 individuals [17]. According to those results, the most frequent haplogroup is
I2a, at 49%, followed by R1a and E1b1b, each accounting for 17% of all haplogroups in the population. The
remaining haplogroups found in this study were J2a (5%), I1 (4%), R1b (4%), J2b (2%), G2a (1%) and N (1%).
These results confirmed the preliminary B&amp;H population data published 10 years earlier. The prediction
that the B&amp;H population belongs to the Western Balkan area, which served as a Last Glacial Maximum refuge
for the Paleolithic European human population, was also confirmed. Furthermore, these results corroborated
the hypothesis that this region was an important stopping point on the “Middle East-Europe highway” during
the Neolithic farmer migrations. Finally, since these results were almost completely in accordance with
previously published data on B&amp;H and neighboring populations generated by Y chromosome single
nucleotide polymorphism (Y-SNP) analysis, it was concluded that in silico analysis of Y-STRs is a reliable
method for approximating the Y chromosome haplogroup diversity of an examined population.
Meanwhile, the same set of 23 Y-STR loci was used to explore their distribution and polymorphism in the
Turkish population recently settled in Sarajevo, Bosnia and Herzegovina, and to investigate its genetic
relationships with the Turkish homeland population and neighboring populations [18]. This study included
100 healthy, unrelated males from the Turkish population living in Sarajevo. Amplification was performed
using the PowerPlex Y23 amplification kit. The studied population was compared to other populations using
pairwise genetic distances, visualized with a multidimensional scaling plot. Haplotype and allele
frequencies of the sample population were calculated, and all 100 samples had unique haplotypes. The most
polymorphic locus was DYS458 and the least polymorphic was DYS391. The observed haplotype diversity was
1.0000 ± 0.0014, with a discrimination capacity of 1.00 and a match probability of 0.01. Rst values showed
that the observed population was closely related, in both dimensions, to the Lebanese and Iraqi
populations, while it was more distant from the Bosnian, Croatian, and Macedonian populations. The study
concluded that the Turkish population living in Sarajevo can be regarded as a representative Turkish
population, because its results matched those published for the population of Turkey. This study showed
that populations that are geographically close are genetically related to each other.
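
The haplotype statistics quoted for these Y-STR data sets can be recomputed from haplotype counts alone. The Python sketch below assumes the usual definitions: match probability as the sum of squared haplotype frequencies, discrimination capacity as the number of distinct haplotypes divided by the sample size, and haplotype diversity as Nei's unbiased estimator, n/(n - 1) times one minus that sum. The function name `haplotype_stats` and the input format (one hashable haplotype per individual) are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def haplotype_stats(haplotypes: Iterable[Hashable]) -> Dict[str, float]:
    """Haplotype diversity, discrimination capacity and match probability.

    haplotypes: one entry per sampled individual, e.g. a tuple of allele calls.
    """
    counts = Counter(haplotypes)
    n = sum(counts.values())
    sum_p2 = sum((c / n) ** 2 for c in counts.values())  # random match probability
    return {
        "haplotype_diversity": n / (n - 1) * (1 - sum_p2),  # Nei's unbiased estimator
        "discrimination_capacity": len(counts) / n,
        "match_probability": sum_p2,
    }


# 100 distinct haplotypes, as reported for the Turkish sample [18].
print(haplotype_stats(range(100)))

# Canton Sarajevo counts [15]: 71 unique, six seen twice, two three times,
# one five times and one six times; integers stand in for real haplotypes.
repeats = [1] * 71 + [2] * 6 + [3] * 2 + [5, 6]
print(haplotype_stats(h for h, c in enumerate(repeats) for _ in range(c)))
```

For the first sample this gives a diversity of 1.0, a discrimination capacity of 1.0 and a match probability of 0.01, consistent with the values reported above; for the Canton Sarajevo counts it gives a discrimination capacity of 0.81 and a match probability of 0.0174.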
Haplogroup prediction methods were applied in a further study [19]. The previously obtained 23-locus Y-STR
haplotypes of 100 unrelated, healthy Turkish males who had recently settled in Sarajevo were used to
determine haplogroups with Whit Athey’s Haplogroup Predictor software. In total, 90 of the studied
haplotypes had a Bayesian probability greater than 92.2%, while the remaining 10 ranged between 51.4% and
84.3%. Seventeen differently distributed haplogroups were found, with haplogroup J2a the most prevalent
at 26% of all samples, while haplogroups R1b, G2a, and R1a were less prevalent, ranging from 10% to 15%
of all samples. Together, these 4 haplogroups account for 63% of all Y chromosomes. Another 11 haplogroups
(E1b1b, G1, I1, I2a, I2b, J1, J2b, L, Q, R2, and T) ranged from 2% to 5%, whereas haplogroups E1b1a and N
were each found in only 1% of all samples. The results showed that a large percentage of the Turkish
paternal lineages is linked with West Asia, Europe Caucasus, Western Europe, Northeast Europe, Middle East,
Russia, Anatolia, and Black Sea Y chromosome lineages. The study concluded that the analyzed Turkish
population can serve as a representative sample of the Turkish population residing in Turkey, because the
results were consistent with data published earlier in the literature for the Turkish population in Turkey.
In 2016 and 2017, similar studies were performed on the human population residing in Tuzla, Bosnia
and Herzegovina. Tuzla Canton is one of the most populous regions of Bosnia and Herzegovina, so its
genetic analysis could provide evidence of past demographic events. The first study [20] included 100
unrelated healthy adult males genotyped at the 23 Y-STR loci of the PowerPlex Y23 kit. It calculated
haplotype diversity, allele frequencies, and Rst-based genetic distances between the new dataset and
datasets from Bosnia and Herzegovina and other populations. The distances were then visualized with a
multidimensional scaling plot and a neighbor-joining phylogenetic tree. The discrimination capacity of the
PowerPlex Y23 kit was high, as all 100 individuals had unique haplotypes, and the newly incorporated loci
appeared very informative. However, no significant differences were found between the study population
and the general population of Bosnia and Herzegovina, or between the population of Tuzla and neighboring
populations [20].
In the second study [21], in silico haplogroup assignments were made for the same 100 unrelated men from
Tuzla Canton, Bosnia and Herzegovina (B&amp;H), based on their 23-locus Y-STR data and using four
different algorithms. The dominant haplogroups were I, R, and E, with the sublineages I2a, R1a, and
E1b1b, which is consistent with published Y-SNP data for the B&amp;H population. Overall, the study not only
served as a concordance study of four of the most popular haplogroup assignment algorithms, but also
provided, for the first time, detailed insight into Y-haplogroup-based differentiation within the B&amp;H population.
These studies prompted the publication of several further papers that included Y-STR data from the B&amp;H
human population. The first, published in 2015, focused on clustering European human populations based
on Y-STR data [22]. Autosomal STR analyses produced three overall clusters: European, Asian, and African.
Y-STR analyses, however, revealed additional sub-clusters. The European cluster was readily divided into
four distinct groups, represented as four branches of the phylogenetic tree, while the Asian cluster consisted
of two sub-clusters. Given the clustering trends evident in both phylogenetic trees, the authors concluded
that the clusters formed as a consequence of geographical proximity. This proximity led to mixing of gene
pools and, in turn, to neighboring populations with strong genetic similarities. Overall, the study highlights
that Y-STRs may be more informative than autosomal STRs in population-structure studies, because they
enable not only continental clustering but also finer regional analyses. The formation of four sub-clusters of
European populations further demonstrates the potential of Y-chromosomal markers across a wide spectrum
of genetic analyses.
The second, published in 2018, analyzed Balkan human populations based on Y-STR data [12]. It aimed to
provide insight into genetic relations among Balkan populations through in silico haplogroup prediction
from Y-STR haplotypes and network analysis of the same haplotypes. The population dataset was compiled
from haplotypes based on 23, 17, 12, 9, and 7 Y-STR loci for 13 populations: Bosnia and Herzegovina
(B&amp;H), Croatia, Slovenia, Greece, Macedonia, Romany (Hungary), Hungary, Serbia, Montenegro, Albania,
Kosovo, Romania, and Bulgaria. The overall dataset consists of 2179 samples with 1878 different haplotypes.
Of the thirteen analyzed Balkan populations, I2a was the major haplogroup in four. All four of these
populations (B&amp;H, Croatia, Montenegro, and Serbia) are from former Yugoslav republics. The remaining two
major former Yugoslav populations, Macedonia and Slovenia, had E1b1b and R1a, respectively, as their most
prevalent haplogroups. E1b1b was the most prevalent haplogroup in Macedonia, Romania, Albania, and
Kosovo. Compared with the E1b1b and R1b clusters, the I2a haplogroup cluster was more compact,
indicating greater homogeneity among the haplotypes belonging to that haplogroup. The study indicates
that combining haplogroup prediction with network analysis may be an effective way to use publicly
available Y-STR datasets.

4. Conclusion

Describing something that has lasted for two decades as "a beginning" is unusual. However, that is the case
for Y-chromosome human population-genetic studies in Bosnia and Herzegovina. Many interesting features
of the diversity of local human populations in this small but intriguing country are still waiting to be
discovered and described. Several preliminary hypotheses have been completely revised, such as the origin
of the R1b haplogroup in this region. Others have been seriously questioned, such as the origin of the notably high frequency of the I2a
haplogroup in Bosnia (as a Balkan LGM refugium marker or a “Slavic migration marker” amplified by a
founder effect) [23]. These and many other Y-chromosome stories are still waiting to be told.

REFERENCES
[1] Malcolm, N. (1996). Bosnia: A short history. NYU Press.
[2] Hadžibegović, I., &amp; Imamović, M. (1994). Bosna i Hercegovina od najstarijih vremena do kraja Drugog svjetskog rata.
[3] Wilkes, J. (1995). The Illyrians. Wiley-Blackwell.
[4] Klaić, V. (1990). Povijest Bosne do propasti kraljevstva. Svjetlost Sarajevo.
[5] Marjanovic D., et al. Doc Praehistorica, 23 (2006) 21.-6.
[6] Marjanović, D., et al. (2018). Forensic genetics: Theory and application.
[7] Marjanović, D., Pojskić, N., Kapur, L., Haverić, S., Durmić-Pašić, A., Bajrović, K., &amp; Hadžiselimović, R. (2008). Overview of human population-genetic studies in Bosnia and Herzegovina during the last three centuries: history and prospective. Collegium Antropologicum, 32(3), 981-987.
[8] Lasić, L. (2016). Historical Overview of the Human Population-Genetic Studies in Bosnia and Herzegovina: Small Country, Great Diversity. Collegium Antropologicum, 40(2), 145-149.
[9] Semino, O., Passarino, G., Oefner, P. J., Lin, A. A., Arbuzova, S., Beckman, L. E., ... &amp; Marcikić, M. (2000). The genetic legacy of Paleolithic Homo sapiens sapiens in extant Europeans: A Y chromosome perspective. Science, 290(5494), 1155-1159.
[10] Casanova, M., Leroy, P., Boucekkine, C., Weissenbach, J., Bishop, C., Fellous, M., ... &amp; Siniscalco, M. (1985). A human Y-linked DNA polymorphism and its potential for estimating genetic and evolutionary distance. Science, 230(4732), 1403-1406.
[11] Underhill, P. A., Myres, N. M., Rootsi, S., Metspalu, M., Zhivotovsky, L. A., King, R. J., ... &amp; Kutuev, I. (2010). Separating the post-Glacial coancestry of European and Asian Y chromosomes within haplogroup R1a. European Journal of Human Genetics, 18(4), 479.
[12] Šehović, E., Zieger, M., Spahić, L., Marjanović, D., &amp; Dogan, S. (2018). A glance of genetic relations in the Balkan populations utilizing network analysis based on in silico assigned Y-DNA haplogroups. Anthropological Review, 81(3), 252-268.
[13] Marjanovic, D., Fornarino, S., Montagna, S., Primorac, D., Hadziselimovic, R., Vidovic, S., ... &amp; Andjelinovic, S. (2005). The peopling of modern Bosnia‐Herzegovina: Y‐chromosome haplogroups in the three main ethnic groups. Annals of Human Genetics, 69(6), 757-763.
[14] Marjanovic, D., Bakal, N., Pojskic, N., Kapur, L., Drobnic, K., Primorac, D., ... &amp; Hadziselimovic, R. (2005). Population data for the twelve Y-chromosome short tandem repeat loci from the sample of multinational population in Bosnia and Herzegovina. Journal of Forensic Sciences, 50(1), JFS2004289-2.
[15] Ćenanović, M., Pojskić, N., Kovačević, L., Džehverović, M., Čakar, J., Musemić, D., &amp; Marjanović, D. (2010). Diversity of Y-short tandem repeats in the representative sample of the population of Canton Sarajevo residents, Bosnia and Herzegovina. Collegium Antropologicum, 34(2), 545-550.
[16] Kovačević, L., Fatur-Cerić, V., Hadžić, N., Čakar, J., Primorac, D., &amp; Marjanović, D. (2013). Haplotype data for 23 Y-chromosome markers in a reference sample from Bosnia and Herzegovina. Croatian Medical Journal, 54(3), 286-290.
[17] Doğan, S., Ašić, A., Doğan, G., Besic, L., &amp; Marjanovic, D. (2016). Y-Chromosome Haplogroups in the Bosnian-Herzegovinian Population Based on 23 Y-STR Loci. Human Biology, 88(3), 201-210.
[18] Dogan, S., Primorac, D., &amp; Marjanović, D. (2014). Genetic analysis of haplotype data for 23 Y-chromosome short tandem repeat loci in the Turkish population recently settled in Sarajevo, Bosnia and Herzegovina. Croatian Medical Journal, 55(5), 530.
[19] Doğan, S., Doğan, G., Ašić, A., Bešić, L., Klimenta, B., Hukić, M., ... &amp; Marjanović, D. (2016). Prediction of the Y-Chromosome Haplogroups within a recently settled Turkish Population in Sarajevo, Bosnia &amp; Herzegovina. Collegium Antropologicum, 40(1), 1-7.
[20] Babić, N., Dogan, S., Čakar, J., Pilav, A., Marjanović, D., &amp; Hadžiavdić, V. (2017). Molecular diversity of 23 Y-chromosome short tandem repeat loci in the population of Tuzla Canton, Bosnia and Herzegovina. Annals of Human Biology, 44(5), 419-426.
[21] Dogan, S., Babic, N., Gurkan, C., Goksu, A., Marjanovic, D., &amp; Hadziavdic, V. (2016). Y-chromosomal haplogroup distribution in the Tuzla Canton of Bosnia and Herzegovina: A concordance study using four different in silico assignment algorithms based on Y-STR data. Homo, 67(6), 471-483.
[22] Dogan, S., Ašić, A., Buljubašić, S., Bešić, L., Avdić, M., Ferić, E., ... &amp; Marjanović, D. (2015). Overview of European population clustering based on 23 Y-STR loci. Genetika, 47, 901-908.
[23] Primorac, D., Marjanović, D., Rudan, P., Villems, R., &amp; Underhill, P. A. (2011). Croatian genetic heritage: Y-chromosome story. Croatian Medical Journal, 52(3), 225-234.

</text>
                  </elementText>
                </elementTextContainer>
              </element>
            </elementContainer>
          </elementSet>
        </elementSetContainer>
      </file>
    </fileContainer>
    <collection collectionId="3">
      <elementSetContainer>
        <elementSet elementSetId="1">
          <name>Dublin Core</name>
          <description>The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/.</description>
          <elementContainer>
            <element elementId="50">
              <name>Title</name>
              <description>A name given to the resource</description>
              <elementTextContainer>
                <elementText elementTextId="26245">
                  <text>Journal of Natural Sciences and Engineering</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="43">
              <name>Identifier</name>
              <description>An unambiguous reference to the resource within a given context</description>
              <elementTextContainer>
                <elementText elementTextId="26605">
                  <text>2637-2835</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="98">
              <name>DOI</name>
              <description>Digital object identifier</description>
              <elementTextContainer>
                <elementText elementTextId="26606">
                  <text>10.14706</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="45">
              <name>Publisher</name>
              <description>An entity responsible for making the resource available</description>
              <elementTextContainer>
                <elementText elementTextId="26607">
                  <text>International Burch University</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="41">
              <name>Description</name>
              <description>An account of the resource</description>
              <elementTextContainer>
                <elementText elementTextId="26608">
                  <text>Journal of Natural Sciences and Engineering (JONSAE) is a peer-reviewed, biannually published international journal focusing on empirical and theoretical research in all branches of Engineering and Natural Sciences. It is published on behalf of the Faculty of Engineering and Natural Sciences of International Burch University and aims to provide the best content by publishing original research papers, review articles, special issues, feature articles, and book reviews. All manuscript submissions are subject to initial appraisal by the Editor and, if found suitable for further consideration, to peer review by independent, anonymous referees. All peer review is double-blind, and submission is online. The journal welcomes theoretical, applied, interdisciplinary, and methodological work, with a preference for empirical research, critical approaches, and problem-solving methods in manuscripts.</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="44">
              <name>Language</name>
              <description>A language of the resource</description>
              <elementTextContainer>
                <elementText elementTextId="26609">
                  <text>English</text>
                </elementText>
              </elementTextContainer>
            </element>
          </elementContainer>
        </elementSet>
      </elementSetContainer>
    </collection>
    <elementSetContainer>
      <elementSet elementSetId="1">
        <name>Dublin Core</name>
        <description>The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/.</description>
        <elementContainer>
          <element elementId="50">
            <name>Title</name>
            <description>A name given to the resource</description>
            <elementTextContainer>
              <elementText elementTextId="26440">
                <text>Overview of Human Lineage Genetic Marker Studies in Bosnia and Herzegovina: Y chromosome story</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="96">
            <name>Author</name>
            <description>Author</description>
            <elementTextContainer>
              <elementText elementTextId="26441">
                <text>Aldin Pirić1, Sabahudin Ćordić1, Lejla Smajlović-Skenderagić1, Serkan Dogan1, Damir Marjanović1,2</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="94">
            <name>Abstract</name>
            <description>A summary of the resource.</description>
            <elementTextContainer>
              <elementText elementTextId="26442">
                <text>Abstract – Modern Bosnia and Herzegovina is a state consisting of multiple ethnicities and regions&#13;
located in the Western Balkans, with a very complex history. The earliest historical findings show that&#13;
its area has been inhabited since the Paleolithic. Since then, this part of Europe, especially the region&#13;
of modern Bosnia and Herzegovina, can be recognized as a crossroads of different human&#13;
migrations and a meeting point of different cultures, religions, and gene pools. Mitochondrial DNA&#13;
is being used for maternal lineage testing, while the Y chromosome is being used for paternal lineage&#13;
testing. Therefore, these markers are being referred to as lineage markers. Lineage markers are often&#13;
used for parental lineage monitoring in population genetics, human genetics, as well as in forensic&#13;
genetics. The main intention of this paper is to construct a short overview of the Y chromosome&#13;
studies performed in Bosnia and Herzegovina within the last two decades.</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="97">
            <name>Keywords</name>
            <description>Keywords.</description>
            <elementTextContainer>
              <elementText elementTextId="26443">
                <text>Keywords - Bosnia and Herzegovina, lineage markers, molecular markers, population genetic&#13;
studies, Y chromosome</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="43">
            <name>Identifier</name>
            <description>An unambiguous reference to the resource within a given context</description>
            <elementTextContainer>
              <elementText elementTextId="26444">
                <text>2637-2835</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="98">
            <name>DOI</name>
            <description>Digital object identifier</description>
            <elementTextContainer>
              <elementText elementTextId="26445">
                <text>10.14706/JONSAE2021312&#13;
</text>
              </elementText>
            </elementTextContainer>
          </element>
        </elementContainer>
      </elementSet>
    </elementSetContainer>
  </item>
  <item itemId="3476" public="1" featured="0">
    <fileContainer>
      <file fileId="4283">
        <src>https://eprints.ibu.edu.ba/files/original/6e51695a244d6e444275de0abb1daba0.pdf</src>
        <authentication>047401ca20f00a9f8f24fea2eb6cc2e4</authentication>
        <elementSetContainer>
          <elementSet elementSetId="4">
            <name>PDF Text</name>
            <description/>
            <elementContainer>
              <element elementId="52">
                <name>Text</name>
                <description/>
                <elementTextContainer>
                  <elementText elementTextId="26438">
                    <text>Journal of Natural Sciences and Engineering, Vol. 3, (2020)
DOI number: 10.14706/JONSAE2020114

Sentiment Analysis on Twitter Data using Big Data

Obada Almonajed, Samed Jukić
1

International Burch University, Sarajevo, Bosnia and Herzegovina
almonajed.obada@ibu.edu.ba
samed.jukic@ibu.edu.ba

Abstract – With the increasing number of users and the growing volume of data on the Internet, especially
on social media sites, sentiment analysis has become an important and essential field. Collecting people's
feelings and sentiments and classifying them has attracted many businesses and companies. Recently,
Twitter sentiment analysis has attracted much attention because of Twitter's growth and popularity.
Handling the enormous amounts of data produced by social media is the domain of Big Data, which
concerns not only the size of the data but also how the data are processed and used. In this paper, we
collect live data from Twitter using Apache Spark and apply machine learning algorithms from the Apache
Spark machine learning library to classify each Twitter message. Naive Bayes and Logistic Regression are
used to test the model. Naive Bayes gave better results, with an average accuracy of around 75%, while
Logistic Regression achieved around 69%.
Keywords – big data, sentiment analysis, Twitter, Apache Spark, social media, machine learning.

1. Introduction
One of the best things about social media is in its name: social. It connects people across the world,
letting them share information with others and receive information from them. The main purpose of social
media is to connect people and allow them to share thoughts and opinions. It also lets users read the news,
watch videos, read stories, and view and share photos. Social media is becoming an integral part of our lives.
It is a way of connecting and building relationships with others, and it allows you to hear what people say
and to respond. The most popular platforms include Facebook, Twitter, YouTube, Instagram, and Snapchat.

Because social media connects people, it has become very important for businesses, which use it to increase
brand exposure and customer reach. Publishing to social media is very simple. For example, a company can
create a Facebook page and post new products, sales announcements, and brand marketing as images, text,
or video. Regardless of a business's size, it is important to recognize the value of these platforms and their
trends in order to understand and use them better.

People can talk about your business without your knowledge, so it is important for a company to know
about and monitor social media conversations about its brand. Based on reviews, the company can adjust to
the current market situation and satisfy customers better. A sentiment analysis tool is used to assess the
sentiment of text written by customers. Sentiment analysis, or opinion mining, determines the emotional
tone of a message or text; its main use is to understand how people feel and think about something. The
tool is very useful for companies and can inform decision making. Using machine learning, companies can
analyze social media content to see the meaning behind the messages.

An enormous number of people across the world use social media. To acquire, store, and process such data,
we use Big Data technology. Big Data concerns not only storing large amounts of data but also the ability to
analyze them. Big Data tools allow us to acquire and analyze real-time data from social media. In this
paper, we use Apache Spark, one of the fastest Big Data platforms; compared with Hadoop, it can be up to one
hundred times faster [1]. The Apache Spark framework provides native bindings for Java, Python, and Scala,
and includes libraries for machine learning and SQL. The purpose of this paper is to collect data from
Twitter and to classify each user's sentiment as positive or negative using machine learning and Apache Spark.

2. Literature Review
Pang et al. [2] found that unigram features outperformed the other feature sets. The difference between
unigrams and a mix of unigrams and bigrams was small, however: precision was 82.9% with unigrams and
82.7% with the mix, both with an SVM classifier. Dave et al. [3] reported the opposite, with bigrams giving
better precision than unigrams using SVM and baseline algorithms. For bigrams, SVM achieved about 87.2%
precision on the first test and 85.8% on the second.
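
The unigram and bigram features compared in these studies can be illustrated with a short Python sketch. It assumes simple lowercase whitespace tokenization, which is a simplification; the cited works used their own preprocessing.

```python
def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def features(text, use_bigrams=True):
    """Unigram features, optionally combined with bigram features."""
    tokens = text.lower().split()  # naive whitespace tokenization (assumption)
    feats = ngrams(tokens, 1)
    if use_bigrams:
        feats += ngrams(tokens, 2)
    return feats


print(features("not good at all"))
```

For "not good at all", the bigram ('not', 'good') keeps the negation together, whereas the unigram ('good',) on its own suggests the opposite sentiment. This is one reason the unigram-plus-bigram mix is often tried.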

Pak et al. [4] gathered around 300,000 tweets from Twitter, classified into three classes: positive, negative,
or neutral. They assumed that the emoticon in a message represents the actual sentiment of the text. Thus, if
the ':(' emoticon is included in a message, the message is considered negative regardless of its content.
Likewise, if a tweet contains ':)', the message is considered positive. As learning algorithms, they used
multinomial Naïve Bayes, SVM, and Conditional Random Fields, and Naïve Bayes gave the best results. To
improve the classifier's precision, they removed n-grams that do not convey any sentiment.
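
The emoticon heuristic described for Pak et al. [4] is a form of distant supervision: the emoticon supplies a noisy label for the tweet. The Python sketch below is a minimal illustration. The emoticon sets are illustrative assumptions, not the lists used in [4]. Tweets with no emoticon, or with both kinds, are skipped. Stripping the emoticon from the text is a design choice made here so that a classifier learns from the words rather than from the label source.

```python
POSITIVE = {":)", ":-)", ":D"}  # illustrative emoticon sets (assumption)
NEGATIVE = {":(", ":-("}


def emoticon_label(tweet):
    """Return (label, cleaned_text), or None when no unambiguous emoticon."""
    tokens = tweet.split()
    has_pos = any(t in POSITIVE for t in tokens)
    has_neg = any(t in NEGATIVE for t in tokens)
    if has_pos == has_neg:  # neither or both: skip the tweet
        return None
    label = "positive" if has_pos else "negative"
    # Strip emoticons so a classifier cannot simply memorize the label source
    cleaned = " ".join(t for t in tokens if t not in POSITIVE | NEGATIVE)
    return label, cleaned


print(emoticon_label("missed the bus again :("))
```

Matching is done on whitespace-separated tokens only, so an emoticon attached directly to a word (for example "great:)") is not detected by this sketch.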

The authors of [5] investigated the use of Apache Flume and Apache Hive (the latter built on top of Hadoop)
for analyzing Twitter data. In [6], the authors described a recommendation system that uses the Hadoop
framework to summarize users’ feedback, comments, and reviews about different subjects. Similarly, the
authors of [7] built a recommendation system for services. The authors of [8] built a Hadoop framework for
determining and analyzing customers’ feedback on a product from social networks; the framework extracts
and analyzes feedback for social user relationship management.

Go et al. [9] analyzed Twitter sentiment using several machine learning algorithms: Naïve Bayes, Maximum
Entropy (MaxEnt), and Support Vector Machine (SVM). They included emoticons in the training data and
used two classes, positive and negative, for tweet classification. After training, they found that emoticons
had a negative effect when the MaxEnt and SVM algorithms were applied, but did not influence Naive Bayes.
Notably, their study explored unigrams, bigrams, a combination of unigrams and bigrams, and parts of
speech. They concluded that the combination of unigrams and bigrams outperformed every other model, and
that part-of-speech tags were not useful at all.

3. Methodology

A. Sentiment analysis

Sentiment analysis can reveal whether customers are satisfied with a new service. Firms use Twitter to get
customer feedback, because short posts show whether people like or dislike something new. Firms use that
information to make decisions that improve their services and sales. Applying sentiment analysis to content
means looking for the opinion expressed in the text. Is the product review positive or negative? Are customers
satisfied with the product? Do positive opinions outnumber negative ones? Questions like these can be
answered with sentiment analysis, which tells a company how customers view its product or service. In short,
sentiment analysis is used to detect agree/disagree, like/dislike, and for/against [10]. For example, in the
sentence ‘I recommend this product to everyone.’, the word ‘recommend’ indicates that the writer is happy,
so the sentiment is positive.
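
The ‘recommend’ example can be turned into a tiny rule-based baseline that counts matches against positive and negative word lists. The Python sketch below is a different technique from the trained classifiers used later in this paper and is shown only to make the idea concrete. The word lists are hypothetical.

```python
import re

# Hypothetical, deliberately tiny lexicons
POSITIVE_WORDS = {"recommend", "love", "great", "happy"}
NEGATIVE_WORDS = {"hate", "awful", "broken", "disappointed"}


def lexicon_sentiment(sentence):
    """Label a sentence by comparing positive and negative word counts."""
    words = re.findall(r"[a-z']+", sentence.lower())
    score = sum(w in POSITIVE_WORDS for w in words) - sum(
        w in NEGATIVE_WORDS for w in words
    )
    if score == 0:
        return "neutral"
    return "positive" if score > 0 else "negative"


print(lexicon_sentiment("I recommend this product to everyone."))  # positive
```

A fixed lexicon cannot handle words it does not list or context such as negation. This limitation motivates learning the word weights from labeled data instead.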

In this paper, positive and negative words are collected and used to train the model to classify messages.
Big Data tools are used to acquire, store, and classify such data. Big data is data that exceeds the processing
capacity of conventional database systems [11]; in practice, it means there is a large amount of data to collect.
Big Data tools make it possible to keep retrieving data from social sites quickly. As data volumes grow and
become harder to manage, Big Data offers a solution. For years, Hadoop was the leading open-source
framework for Big Data; more recently, Apache Spark has become the leading and most popular one. Hadoop
and Spark perform largely the same tasks, but Spark is usually preferred for speed, because it processes data faster.

B. Data and Findings
For the experiments, we used a single dataset file containing example messages labeled with their class,
either positive or negative. The file is used both to train and to test the system, because the approach is
supervised learning, that is, learning from labeled examples. For training, we used the well-known Stanford
Twitter Sentiment Corpus (STS) [12]. Each tweet in this dataset has the following fields: the ID, the
timestamp of the tweet, the username of the user who posted the tweet, and the tweet itself. Each tweet is
also labeled with a class, either positive or negative. The file contains about 1 million positive and negative
tweets. Figure 1 shows samples of the dataset:

Figure 1. Samples of the Dataset
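
Loading the labeled tweets is a parsing step before any training. The Python sketch below assumes the six-column CSV layout in which the publicly distributed Sentiment140/STS training file is commonly shipped: polarity, ID, date, query, user, text. It also assumes polarity 0 means negative and 4 means positive. Check the column order against the actual file before relying on this. The sample rows are made up.

```python
import csv
import io

# Assumed polarity encoding of the Sentiment140/STS file: 0 negative, 4 positive
LABELS = {"0": "negative", "4": "positive"}


def parse_rows(lines):
    """Yield (label, record_id, timestamp, user, text) from CSV lines.

    Assumes six columns: polarity, id, date, query, user, text.
    """
    for row in csv.reader(lines):
        if len(row) != 6 or row[0] not in LABELS:
            continue  # skip malformed rows and labels outside the two classes
        polarity, record_id, timestamp, _query, user, text = row
        yield LABELS[polarity], record_id, timestamp, user, text


# Two made-up rows in the assumed layout
sample = io.StringIO(
    '"0","1001","Mon Apr 06 22:19:45 PDT 2009","NO_QUERY","user_a","my day was bad"\n'
    '"4","1002","Mon Apr 06 22:22:45 PDT 2009","NO_QUERY","user_b","loving this weather"\n'
)
for record in parse_rows(sample):
    print(record)
```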

C. Process
First, Spark must be installed and added to the Scala project. Next, a SparkContext is initialized; it tells
Spark how to access a cluster. The SparkContext takes a SparkConf (Spark configuration) object as a parameter.
SparkConf lets the user set common properties that are passed to the SparkContext, such as the application
name, the master URL, memory size, and other properties as key-value pairs.

Figure 2. Configuration
After configuring the application, we started collecting tweets online. Real-time data requires Spark
Streaming, which receives live data from Twitter and divides it into batches. The user can then apply actions
to these batches to process the data. Figure 3 shows the implementation of Spark Streaming.

Figure 3. Spark Streaming
A user can collect tweets from a specific user, all tweets that start with a particular word, or all tweets that
contain a particular hashtag (’#’). Our system collects all tweets containing a specific hashtag, which is
passed as an argument to the system. With all configuration in place, we can collect data from Twitter and
save it to a text file. Figures 4 and 5 show how the data are fetched and saved to text files.

Figure 4. Fetching Tweets

Figure 5. Save data in text format
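
The collection logic of Figures 3 to 5 keeps only tweets that contain the chosen hashtag and appends them to a text file. It can be sketched without Spark. The Python sketch below is a plain stand-in for the Spark Streaming job, not the authors’ code. The hashtag matching rule (case-insensitive, whole token, trailing punctuation ignored) and the output file name are assumptions.

```python
import os
import tempfile


def has_hashtag(tweet, hashtag):
    """True if `tweet` contains `hashtag` as a whole token (case-insensitive)."""
    tag = hashtag.lower().lstrip("#")
    return any(
        tok.startswith("#") and tok[1:].lower().rstrip(".,!?") == tag
        for tok in tweet.split()
    )


def save_batch(tweets, hashtag, path):
    """Append the matching tweets of one batch to a text file, one per line."""
    kept = [t.replace("\n", " ") for t in tweets if has_hashtag(t, hashtag)]
    with open(path, "a", encoding="utf-8") as out:
        for t in kept:
            out.write(t + "\n")
    return len(kept)


batch = ["Loving the new #Spark release!", "no tag here", "#spark streaming demo"]
path = os.path.join(tempfile.mkdtemp(), "tweets.txt")  # hypothetical output file
print(save_batch(batch, "#spark", path))  # 2
```

In Spark Streaming the same filter would run on every micro-batch. Here a single list stands in for one batch.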

D. Spark Machine Learning Library
The next and most important step is to classify each tweet as positive or negative, using the Spark machine
learning library, which contains a range of algorithms. To be analyzed, the data must first be converted to
numeric feature vectors. For this we use hashing, a well-known and very useful technique that maps text data
to numeric data; in Spark, the most commonly used hashing transformer is HashingTF. Before the tweets
captured from Twitter can be analyzed, each one must be hashed in this way, as shown in Figure 6.

Figure 6. Hashing data
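
HashingTF applies the hashing trick. Each term is hashed to an index in a fixed-size vector, and the term counts are accumulated at that index, so no vocabulary needs to be stored. The Python sketch below illustrates the idea. It uses `zlib.crc32` as a stand-in for the MurmurHash3 function that Spark uses, so the indices will not match Spark’s. The vector size of 2**18 mirrors Spark ML’s default `numFeatures`. Collisions, where two terms share an index, are possible by design.

```python
import zlib
from collections import Counter

NUM_FEATURES = 2 ** 18  # matches Spark ML's default numFeatures


def hashing_tf(tokens, num_features=NUM_FEATURES):
    """Map tokens to a sparse term-frequency vector {index: count}.

    zlib.crc32 stands in for Spark's MurmurHash3 (assumption), so the
    indices differ from Spark's output.
    """
    counts = Counter()
    for tok in tokens:
        index = zlib.crc32(tok.encode("utf-8")) % num_features
        counts[index] += 1
    return dict(counts)


vec = hashing_tf("i love spark i love scala".split())
print(sorted(vec.values()))  # [1, 1, 2, 2] unless two terms collide
```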
We used two algorithms, Naive Bayes and Logistic Regression, and compared them to find the better one. Logistic Regression, in its basic form, is a binary classifier, which means it classifies data into one of two groups, while Naive Bayes can be used for multiple groups. First, we applied 10-fold cross-validation. Cross-validation splits a dataset into several parts (folds) and ensures that every sample is used for both training and testing. The training data is typically larger than the testing data: for example, with a single split of 1000 samples, one could take 800 for training and 200 for testing. With 10-fold cross-validation, each round uses 9 folds for training and 1 fold for testing.

Table 1. Cross validation example
Fold | Role
1-fold | Training
2-fold | Training
3-fold | Training
4-fold | Training
5-fold | Training
6-fold | Training
7-fold | Training
8-fold | Training
9-fold | Training
10-fold | Testing

Next, the testing fold is moved to another position in the dataset, as in Table 2, where the testing data is now the first fold, at the beginning of the dataset. In each round of cross-validation the testing fold moves to a new position, so that every sample is used in both the testing and the training part.
Table 2. Cross validation example 2
Fold | Role
1-fold | Testing
2-fold | Training
3-fold | Training
4-fold | Training
5-fold | Training
6-fold | Training
7-fold | Training
8-fold | Training
9-fold | Training
10-fold | Training
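
The fold rotation illustrated in Tables 1 and 2 can be sketched in plain Python. This is only an illustration under our own assumptions (contiguous, unshuffled folds and the hypothetical helper k_fold_indices); it is not Spark's CrossValidator, and in practice the data are usually shuffled before being split.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for each of the k rounds.

    Round i uses the i-th contiguous block of samples as the test fold and
    all remaining samples as training data. If n_samples is not divisible
    by k, the first (n_samples % k) folds get one extra sample.
    """
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if extra > fold else 0)
        stop = start + size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


splits = list(k_fold_indices(1000, k=10))
first_train, first_test = splits[0]
print(len(first_train), len(first_test))  # 900 100
```

With 1000 samples and k = 10, every round trains on 900 samples and tests on 100, and each sample appears in exactly one test fold.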

The accuracy is calculated for each fold, so that at the end one can determine the classifier's performance and whether the classifier and the data are good. Cross-validation and accuracy are very important because they indicate how well the learner will make correct predictions for new data. As mentioned before, we used two machine learning algorithms, Naïve Bayes and Logistic Regression. The results showed that Naive Bayes is better at predicting the sentiment of text. The results are described in more detail in the next section.
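
For readers unfamiliar with Naive Bayes, the following is a minimal from-scratch sketch of a multinomial Naive Bayes text classifier with Laplace (add-one) smoothing. It is not Spark's NaiveBayes implementation: it works directly on whitespace-separated words instead of HashingTF vectors, and the class name MultinomialNB, the four training sentences and their labels are hypothetical. Labels follow the paper's convention of 1.0 for positive and 0.0 for negative.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Multinomial Naive Bayes over bag-of-words counts with Laplace smoothing."""

    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # class -> token -> count
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, doc, c):
        # log P(c) plus the sum of smoothed log P(token | c) over the tokens.
        score = math.log(self.class_counts[c] / sum(self.class_counts.values()))
        denom = self.total_words[c] + self.smoothing * len(self.vocab)
        for token in doc.lower().split():
            if token in self.vocab:  # tokens never seen in training are ignored
                score += math.log((self.word_counts[c][token] + self.smoothing) / denom)
        return score

    def predict(self, doc):
        # Pick the class with the highest (unnormalized) log posterior.
        return max(self.classes, key=lambda c: self.log_posterior(doc, c))


train_docs = ["good project i liked it", "i love it", "bad service", "i hate it"]
train_labels = [1.0, 1.0, 0.0, 0.0]
model = MultinomialNB().fit(train_docs, train_labels)
print(model.predict("good idea, i liked it"))  # 1.0
```

Working with log probabilities avoids numeric underflow when a tweet contains many words.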


4. Results
To train and test the system, we used the Stanford Twitter Sentiment (STS) corpus, which is available online and contains more than one million samples. After testing on our data, the accuracy of each fold is shown in the table below:

Table 3. 10-fold cross validation (accuracy, %)
k-fold | Naive Bayes | Logistic Regression
1-fold | 77.3 | 68.8
2-fold | 70.4 | 73.4
3-fold | 75.7 | 74.3
4-fold | 77.2 | 67.7
5-fold | 76.4 | 64.6
6-fold | 73.6 | 66.5
7-fold | 69.8 | 75.3
8-fold | 79.1 | 65.8
9-fold | 77.3 | 67.2
10-fold | 74.5 | 71.05

The accuracy of the classifier is the number of true positives plus true negatives, divided by the total number of testing samples:

Figure 7. Formula to Calculate the Accuracy
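
Written as an equation, with TP, TN, FP and FN denoting the numbers of true positives, true negatives, false positives and false negatives (our notation, not taken from Figure 7), the accuracy is as follows; the denominator equals the total number of testing samples.

```latex
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
```
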
The corresponding code from our program:

Figure 8. Code to Calculate the Accuracy

‘predictionAndLabel’ holds, for each testing sample, the prediction of the system together with the actual label. A real example from our system is shown in the following figure, which lists the system's predictions next to the actual labels of the data.


Figure 9. Prediction and Actual
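
A minimal plain-Python sketch of the same accuracy computation, assuming predictionAndLabel is simply a list of (prediction, actual label) pairs. The helper name accuracy and the sample pairs are hypothetical, and this is not the code from Figure 8.

```python
def accuracy(prediction_and_label):
    """Fraction of (prediction, label) pairs in which the prediction is correct.

    For a binary classifier this equals (TP + TN) / total.
    """
    pairs = list(prediction_and_label)
    if not pairs:
        raise ValueError("no predictions to evaluate")
    correct = sum(1 for prediction, label in pairs if prediction == label)
    return correct / len(pairs)


# Hypothetical (prediction, actual label) pairs; 1.0 = positive, 0.0 = negative.
prediction_and_label = [(1.0, 1.0), (0.0, 1.0), (1.0, 1.0), (0.0, 0.0)]
print(accuracy(prediction_and_label))  # 0.75
```

On a Spark RDD of such pairs, the same quantity can be obtained by keeping only the pairs whose two values are equal (RDD filter), counting them (RDD count), and dividing by the total count.
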
For example, for the sentence ‘Good project, I liked it.’, the Naïve Bayes classifier returned 1.0, meaning positive sentiment, while the Logistic Regression classifier returned 0.0, meaning negative sentiment. In another example, ‘I love it :)’, both Naïve Bayes and Logistic Regression returned 1.0, which is positive and correct.

The average accuracy of both algorithms, Naive Bayes and Logistic Regression, after cross-validation is shown in the following table.
Table 4. Accuracy
Algorithm | Average Accuracy
Naive Bayes | 75.13%
Logistic Regression | 69.465%

From this table we can see that the average accuracy of Naive Bayes is about 75 percent, while Logistic Regression is somewhat lower, at about 69.5 percent. The difference, about 5.7 percentage points, is not large. From these results we conclude that the Naive Bayes algorithm performs well, and that Logistic Regression, with its slightly lower accuracy, can also be considered a good algorithm. After training, the system can be used to catch data from Twitter and classify it with either algorithm, Naïve Bayes or Logistic Regression; for better results, Naive Bayes should be preferred. Finally, saving the data in a text file lets companies easily keep track of users' opinions about the company's products and about the company in general.
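
The averages in Table 4 can be checked directly from the per-fold values in Table 3. The short Python snippet below recomputes them; the variable names are ours.

```python
from statistics import mean

# Per-fold accuracies (%) copied from Table 3.
naive_bayes = [77.3, 70.4, 75.7, 77.2, 76.4, 73.6, 69.8, 79.1, 77.3, 74.5]
logistic_regression = [68.8, 73.4, 74.3, 67.7, 64.6, 66.5, 75.3, 65.8, 67.2, 71.05]

nb_avg = mean(naive_bayes)
lr_avg = mean(logistic_regression)
print(round(nb_avg, 3), round(lr_avg, 3), round(nb_avg - lr_avg, 3))
# 75.13 69.465 5.665
print(max(naive_bayes))  # 79.1, the best single fold
```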

5. Discussion
In this paper we showed how text classification can be done quickly and easily with Spark, which we used both as a Big Data platform and for applying machine learning algorithms. We used two well-known machine learning algorithms, Naive Bayes and Logistic Regression, and applied them to datasets containing different types of sentences and emoticons, reaching an average accuracy of 75.13% with Naive Bayes. We also showed how emoticons, if used correctly, can help improve the model's accuracy. With more data in the training and testing sets of our cross-validation, we expect to achieve better results.

In this section we compare the methods and the performance of the algorithms used in related work. Considering the research papers related to ours, already mentioned in Section 2, we note that in every case the text should be classified using different methods before deciding which method best achieves the goal. The following table summarizes different supervised machine learning approaches to Twitter sentiment analysis.

Table 5. Summary of previous work
Paper | Methods | Algorithms | Datasets | Results
Pak and Paroubek [4] | Supervised Machine Learning | Multinomial Naive Bayes, Support Vector Machine (SVM), and Conditional Random Field (CRF) | Tweets collected using Twitter API | Multinomial Naive Bayes with bigrams achieved superior performance compared with unigrams and trigrams.
Go et al. [9] | Supervised Machine Learning | Naive Bayes, Maximum Entropy (MaxEnt), and Support Vector Machine (SVM) | Tweets collected using Twitter API | Maximum Entropy (MaxEnt) with both unigrams and bigrams achieved an accuracy of 83%, compared with 82.7% for Naive Bayes.
Pang et al. [2] | Supervised Machine Learning | Support Vector Machine (SVM), Naive Bayes, and MaxEnt | IMDb | SVM reached an accuracy of 82.9% with unigrams and 82.7% with the combination of unigrams and bigrams. They showed that SVM is superior to Naive Bayes and Maximum Entropy (MaxEnt): with unigrams, the accuracy was 81.0% for Naive Bayes and 80.4% for MaxEnt, and with both unigrams and bigrams it was 80.6% for Naive Bayes and 80.8% for MaxEnt.

Some earlier studies used several sentiment classes, such as satisfaction, sadness, frustration, fear and shock. In our research, by contrast, we classified the tweets into two classes only, positive or negative, with no third class. Most studies applied machine learning algorithms to tweets for sentiment analysis without using Big Data tools, whereas we combined Big Data tools with machine learning algorithms.

From Table 5 we can see that Go et al. obtained better accuracy with Naive Bayes (82.7%) than we did. They performed an additional preprocessing step related to emoticons, which we omitted: they deleted any tweet containing both positive and negative emoticons, which can happen when a tweet covers two subjects. Although we do not know the accuracy of the model of Pak and Paroubek, we can say that their research was carefully done, because they followed the steps needed to determine whether a text is positive or negative. These steps included removing URLs and usernames (usernames follow the "@" symbol) and, using a regular expression, shortening any character repeated more than twice, turning a word such as OOMMMGGG into OOMMGG.
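
The preprocessing steps described above can be sketched with Python regular expressions. This is an illustrative assumption, not the code used by Pak and Paroubek or Go et al.: the regular expressions, the small emoticon sets and the helper names clean_tweet and has_mixed_emoticons are hypothetical, and emoticons are only detected when they appear as separate whitespace-delimited tokens.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
USER_RE = re.compile(r"@\w+")           # usernames follow the "@" symbol
REPEAT_RE = re.compile(r"(.)\1{2,}")    # a character repeated three or more times

# Hypothetical, deliberately small emoticon sets.
POSITIVE = {":)", ":-)", ":D"}
NEGATIVE = {":(", ":-("}


def clean_tweet(text):
    """Remove URLs and @usernames, and shorten character runs to two."""
    text = URL_RE.sub("", text)
    text = USER_RE.sub("", text)
    text = REPEAT_RE.sub(r"\1\1", text)
    return " ".join(text.split())  # collapse leftover whitespace


def has_mixed_emoticons(text):
    """True if the tweet contains both a positive and a negative emoticon token."""
    tokens = set(text.split())
    return bool(tokens.intersection(POSITIVE)) and bool(tokens.intersection(NEGATIVE))


print(clean_tweet("@anna OOMMMGGG see http://t.co/x"))  # OOMMGG see
print(has_mixed_emoticons("great :) but sad :("))       # True
```

Tweets for which has_mixed_emoticons returns True would be dropped from the training data, as in the step described for Go et al.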

6. Conclusion
In this paper we showed how Spark, as a Big Data tool, can help classify tweets as positive or negative in a simple yet fast way. Using the common algorithms Naïve Bayes and Logistic Regression, we achieved good accuracy on large datasets containing a variety of sentences and emoticons. By training and applying cross-validation on our dataset, we found that Naïve Bayes performs better than Logistic Regression (75.13% versus 69.465% average accuracy), with a highest single-fold accuracy of about 79%. This is the most relevant result regarding the use of Big Data. We have also demonstrated that Spark is fast and easy to use and understand, and that it is powerful with large datasets. For that reason, we consider it a very suitable tool for Twitter sentiment analysis. Sentiment analysis is not limited to Twitter; it can be applied to any type of document or data.
In the near future we plan to use richer datasets for training, Spark Graphs for better data visualization, and real-time rather than offline data. This can be achieved easily: the classification methods only have to be applied right after each tweet is received from Twitter. The related works mentioned in Section 2 show that sentiment analysis on Twitter data can be used in many different areas. From those papers we conclude that the main goal was to determine product quality, that is, to make it easier for companies to check whether an item is good for their customers. Politicians and companies also want to know what people write about them in real time, so they request monitoring tools that reveal the opinions, feelings and sentiments their potential customers publish. The method can also be used in film production, since many Twitter users write their opinions about the films they watched, the actors, and so on.

REFERENCES
[1] P. P. Chitturi, Apache Spark for Data Science Cookbook, Packt Publishing Ltd, 2016.
[2] B. Pang, L. Lee and S. Vaithyanathan, "Thumbs up? Sentiment Classification using Machine Learning," 2002. [Online]. Available: https://www.cs.cornell.edu/home/llee/papers/sentiment.pdf.
[3] K. Dave, S. Lawrence and D. M. Pennock, "Mining the Peanut Gallery: Opinion Extraction and," 2003. [Online]. Available: https://www.kushaldave.com/p451-dave.pdf.
[4] A. Pak and P. Paroubek, "Twitter as a Corpus for Sentiment Analysis and Opinion Mining," 2010. [Online]. Available: https://pdfs.semanticscholar.org/6b7f/c158541d5a7be2b2465f7d8a42afa97d7ae9.pdf?_ga=2.121841355.1543760336.1572899814-899645452.1571167125.
[5] Sanggeta, "Twitter Data Analysis Using FLUME &amp; HIVE on Hadoop," February 2016. [Online]. Available: http://www.irdindia.in/journal_ijraet/pdf/vol4_iss2/27.pdf.
[6] J. P. Verma, B. Patel and A. Patel, "Big Data Analysis: Recommendation System with Hadoop Framework," 2015. [Online]. Available: https://www.researchgate.net/profile/Jaiprakash_Verma/publication/282686173_Big_Data_Analysis_Recommendation_System_with_Hadoop_Framework/links/57f4afb708ae280dd0b77681.pdf.
[7] K. R. Shrote and A. V. Deorankar, "Review based service recommendation for big data," February 2016. [Online]. Available: https://ieeexplore.ieee.org/document/7538334.
[8] F. Z. Ennaji, A. E. Fazziki, M. Sadgal and D. Benslimane, "Social intelligence framework: Extracting and analyzing opinions for social CRM," November 2015. [Online]. Available: https://ieeexplore.ieee.org/abstract/document/7507229.
[9] A. Go, R. Bhayani and L. Huang, "Twitter Sentiment Classification using Distant Supervision," 2009. [Online]. Available: https://cs.stanford.edu/people/alecmgo/papers/TwitterDistantSupervision09.pdf.
[10] B. Liu, Opinions, Sentiment, and Emotion in Text, Cambridge University Press, 2015.
[11] E. Dumbill, Big Data Now, O'Reilly, 2012.
[12] Kazanova, "Sentiment140 dataset with 1.6 million tweets," 2017. [Online]. Available: https://www.kaggle.com/kazanova/sentiment140.
[13] Contributors to Wikimedia projects, "Twitter," 2007. [Online]. Available: https://en.wikipedia.org/wiki/Twitter.
[14] A. Pak and P. Paroubek, "Twitter as a Corpus for Sentiment Analysis and Opinion Mining," 2010. [Online]. Available: https://pdfs.semanticscholar.org/6b7f/c158541d5a7be2b2465f7d8a42afa97d7ae9.pdf?_ga=2.121841355.1543760336.1572899814-899645452.1571167125.

</text>
                  </elementText>
                </elementTextContainer>
              </element>
            </elementContainer>
          </elementSet>
        </elementSetContainer>
      </file>
    </fileContainer>
    <collection collectionId="3">
      <elementSetContainer>
        <elementSet elementSetId="1">
          <name>Dublin Core</name>
          <description>The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/.</description>
          <elementContainer>
            <element elementId="50">
              <name>Title</name>
              <description>A name given to the resource</description>
              <elementTextContainer>
                <elementText elementTextId="26245">
                  <text>Journal of Natural Sciences and Engineering</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="43">
              <name>Identifier</name>
              <description>An unambiguous reference to the resource within a given context</description>
              <elementTextContainer>
                <elementText elementTextId="26605">
                  <text>2637-2835</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="98">
              <name>DOI</name>
              <description>Digital object identifier</description>
              <elementTextContainer>
                <elementText elementTextId="26606">
                  <text>10.14706</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="45">
              <name>Publisher</name>
              <description>An entity responsible for making the resource available</description>
              <elementTextContainer>
                <elementText elementTextId="26607">
                  <text>International Burch University</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="41">
              <name>Description</name>
              <description>An account of the resource</description>
              <elementTextContainer>
                <elementText elementTextId="26608">
                  <text>Journal of Natural Sciences and Engineering (JONSAE) is a peer-reviewed, biannually published international journal focusing on empirical and theoretical research in all branches of Engineering and Natural Sciences. It is published on behalf of the Faculty of Engineering and Natural Sciences of International Burch University and aims to provide the best content by publishing original research papers, review articles, special issues, feature articles, and book reviews. All manuscript submissions are subject to initial appraisal by the Editor and, if found suitable for further consideration, to peer review by independent, anonymous referees. All peer review is double-blind and submission is online. The journal welcomes theoretical, applied, interdisciplinary and methodological work, with preference for empirical research, critical approaches and problem-solving methods.</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="44">
              <name>Language</name>
              <description>A language of the resource</description>
              <elementTextContainer>
                <elementText elementTextId="26609">
                  <text>English</text>
                </elementText>
              </elementTextContainer>
            </element>
          </elementContainer>
        </elementSet>
      </elementSetContainer>
    </collection>
    <elementSetContainer>
      <elementSet elementSetId="1">
        <name>Dublin Core</name>
        <description>The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/.</description>
        <elementContainer>
          <element elementId="50">
            <name>Title</name>
            <description>A name given to the resource</description>
            <elementTextContainer>
              <elementText elementTextId="26431">
                <text>Sentiment Analysis on Twitter Data using Big Data</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="96">
            <name>Author</name>
            <description>Author</description>
            <elementTextContainer>
              <elementText elementTextId="26432">
                <text>Obada Almonajed, Samed Jukić</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="94">
            <name>Abstract</name>
            <description>A summary of the resource.</description>
            <elementTextContainer>
              <elementText elementTextId="26433">
                <text>Abstract –With the increasing number of users and the growing amount of data on the Internet, especially on social media sites,&#13;
sentiment analysis has become one of the most important and essential fields. Collecting&#13;
people's feelings and sentiments and classifying the data has attracted most businesses and companies.&#13;
Recently, Twitter sentiment analysis has attracted much attention because of Twitter's growth and&#13;
popularity. The solution for handling the enormous amounts of data from social media is a new concept&#13;
called Big Data. Big Data is not just about having a large amount of data, but also about&#13;
processing and using that data.</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="97">
            <name>Keywords</name>
            <description>Keywords.</description>
            <elementTextContainer>
              <elementText elementTextId="26434">
                <text>Keywords–big data, sentiment analysis, twitter, apache spark, social media, machine learning</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="43">
            <name>Identifier</name>
            <description>An unambiguous reference to the resource within a given context</description>
            <elementTextContainer>
              <elementText elementTextId="26435">
                <text>2637-2835</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="98">
            <name>DOI</name>
            <description>Digital object identifier</description>
            <elementTextContainer>
              <elementText elementTextId="26436">
                <text>10.14706/JONSAE2021311&#13;
</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="45">
            <name>Publisher</name>
            <description>An entity responsible for making the resource available</description>
            <elementTextContainer>
              <elementText elementTextId="26437">
                <text>Faculty of Engineering and Natural Sciences, IBU</text>
              </elementText>
            </elementTextContainer>
          </element>
        </elementContainer>
      </elementSet>
    </elementSetContainer>
  </item>
</itemContainer>
