Boitumelo Nkwe and Michael Kyobe, Department of Information Systems, University of Cape Town, Cape Town, South Africa
The increasing adoption of the Internet of Things (IoT) has introduced unique challenges to both users and the cybersecurity domain. As IoT evolves, cybersecurity threats and vulnerabilities targeting IoT devices have also increased. IoT devices are susceptible to breaches; therefore, forensic investigations focusing on IoT technologies need to be improved. This study aims to provide an understanding of the challenges in IoT forensic investigations since 2017. Furthermore, the article looks at different solutions, in the form of frameworks and methodologies, that have been developed to address these challenges, and at the gaps in the existing literature. The researchers adopted a systematic review methodology to guide the synthesis of the literature. The key issues highlighted in this study include the heterogeneous nature of IoT, the lack of proper investigative tools and frameworks that encompass all levels of IoT forensics, the lack of privacy, and the lack of standardization in the investigation process.
Internet of Things (IoT), IoT forensics, Cybersecurity, Challenges
Ibtesam Gwasem, Weichang Du, Andrew McAllister, Department of Computer Science, University of New Brunswick, Fredericton, Canada
Testing in software product lines is crucial for delivering high-quality software products that meet user needs. While research on software product line testing has primarily focused on functional attributes, verifying quality requirements has been overlooked. Quality attributes (e.g., security) are essential for a satisfactory user experience. Goal models have proven useful for capturing both functional and non-functional (quality) requirements in early system development stages. Researchers propose using goal models as a foundation for creating test cases to validate software systems. This paper introduces a methodology for verifying quality requirements in software product lines using goal models. The focus is on testing quality attributes of final software applications in product lines developed based on the feature and goal model approach. The methodology facilitates defining testable quality requirements, effective testing scope design, concrete test case creation, and efficient test case reuse. A prototype testing system was developed to support the methodology.
Software product lines, testing, non-functional requirements, test case reuse, reusable test case design
Xiaoqin HU, Beijing Language and Culture University, China
This research aims to explore a deeper representation of the internal structure and semantic relationship of multiword nouns (MWNs) for improving MWN discovery. This representation focuses on MWN formations, which follow a series of categorical and semantic constraints. The internal semantic relations of MWNs are represented by semantic class combinations of constituents, and the internal structures are represented by a set of categorical combinations in a hierarchy. These linguistically motivated semantic features are combined with statistically motivated semantic features, and the results present an improvement for MWN discovery.
Multiword nouns, automatic discovery, internal structure, internal semantic relation, semantic class combination, linguistic knowledge
Yanan Jia, Businessolver, USA
As human-machine voice interfaces provide easy access to increasingly intelligent machines, many state-of-the-art automatic speech recognition (ASR) systems have been proposed. However, commercial ASR systems usually perform poorly on domain-specific speech, especially in low-resource settings. The author works with pre-trained DeepSpeech2 and Wav2Vec2 acoustic models to develop benefit-specific ASR systems. The domain-specific data are collected using the proposed semi-supervised learning annotation with little human intervention. The best performance comes from a fine-tuned Wav2Vec2-Large-LV60 acoustic model with an external KenLM, which surpasses the Google and AWS ASR systems on benefit-specific speech. The viability of using error-prone ASR transcriptions as part of spoken language understanding (SLU) is also investigated. Results of a benefit-specific natural language understanding (NLU) task show that the domain-specific fine-tuned ASR system can outperform the commercial ASR systems even when its transcriptions have a higher word error rate (WER), and the results between fine-tuned ASR and human transcriptions are similar.
Automatic Speech Recognition, DeepSpeech2, Wav2Vec2, Semi-supervised learning annotation, Spoken language understanding
HongLi Deng1, XinZhong Liu2 and XianMing Bei3, 1School of Liberal Arts, Jinan University, Guangzhou, Guangdong, China and College of Culture and Communication, Guangxi Science and Technology Normal University, Laibin, Guangxi, China, 2School of Liberal Arts, Jinan University, Guangzhou, Guangdong, China, 3School of Chinese Language and Culture, Guangdong University of Foreign Studies, Guangzhou, Guangdong, China
Based on the theory of second language acquisition, this article analyzes the pronunciation of "er" (meaning "two" in Chinese) by learners from different native language backgrounds, and explores the key acoustic characteristics of, and the factors influencing, the acquisition of retroflex vowels in S&P. The study has four findings: (1) For retroflex vowels in S&P, F2 rises and F3 falls, and the difference between the F3 and F2 endpoints is small, meaning that F3 and F2 move closer to each other. The key characteristics of the learner's "two" pronunciation are the slope and the value of F3: the steeper the fall of F3 and the smaller its value, the closer the learner's "two" pronunciation is to S&P. (2) Retroflex vowels are highly marked phonemes, which makes them difficult to acquire. (3) Factors such as the acquisition environment and the length of second language acquisition time influence the acquisition of retroflex vowels. (4) An early learning environment promotes the acquisition of retroflex vowels in Putonghua.
Retroflex vowels; slope of F3; acoustic characteristics; influencing factors; acquisition theory
Raul Salles de Padua, Imran Qureshi and Mustafa U. Karakaplan, Stanford University, University of Texas at Austin, University of South Carolina
Financial analysis is an important tool for evaluating company performance. Practitioners work to answer financial questions to make profitable investment decisions, and use advanced quantitative analyses to do so. As a result, Financial Question Answering (QA) is a question answering task that requires deep reasoning about numbers. Furthermore, it is unknown how well pre-trained language models can reason in the financial domain. The current state-of-the-art requires a retriever to collect relevant facts about the financial question from the text and a generator to produce a valid financial program and a final answer. However, large language models like GPT-3 have recently achieved state-of-the-art performance on a wide variety of tasks with just a few examples. We run several experiments with GPT-3 and find that a separate retrieval model and logic engine continue to be essential components to achieving SOTA performance in this task, particularly due to the precise nature of financial questions and the complex information stored in financial documents. With this understanding, our refined prompt engineering approach on GPT-3 achieves near-SOTA accuracy without any fine-tuning.
Question Answering, GPT-3, Financial Question Answering, Large Language Models, Information Retrieval, BERT, RoBERTa, F
Wu Zhang, Miotech, 69 Jervois St, Sheung Wan, Hong Kong
Duplicated training data usually degrades machine learning models' performance. This paper presents a practical algorithm for efficiently deduplicating highly similar news articles in large datasets. Our algorithm comprises three components: document embedding, similarity computation, and clustering, each utilizing specific algorithms and tools to optimize both speed and performance. We demonstrate the efficacy of our approach by accurately deduplicating over 7 million news articles in less than 4 hours.
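The three-stage pipeline above can be sketched in miniature. This is not the paper's implementation; the hashed bag-of-words "embedding", the 0.9 similarity threshold, and the union-find clustering are all illustrative stand-ins for whatever embedding model, similarity index, and clustering tool the authors actually used.

```python
import re
import math
from itertools import combinations

def embed(text, dim=256):
    """Stage 1 (document embedding): hash word counts into a
    fixed-size vector, a toy stand-in for a learned embedding."""
    v = [0.0] * dim
    for w in re.findall(r"\w+", text.lower()):
        v[hash(w) % dim] += 1.0
    return v

def cosine(a, b):
    """Stage 2 (similarity computation): cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def dedup_clusters(docs, threshold=0.9):
    """Stage 3 (clustering): union-find over pairs whose
    similarity exceeds the threshold."""
    parent = list(range(len(docs)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i
    vecs = [embed(d) for d in docs]
    for i, j in combinations(range(len(docs)), 2):
        if cosine(vecs[i], vecs[j]) >= threshold:
            parent[find(i)] = find(j)
    clusters = {}
    for i in range(len(docs)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

docs = [
    "Central bank raises interest rates by 25 basis points",
    "Central bank raises interest rates by 25 basis points today",
    "Local team wins championship after dramatic final",
]
print(dedup_clusters(docs))  # first two articles group together
```

At the scale of 7 million articles the quadratic pairwise loop would of course be replaced by an approximate nearest-neighbor index; the sketch only shows the logical flow.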
News deduplication, natural language processing
Asrul Sani Ariesandy1, Mukhlis Amien2, Alham Fikri Aji3 and Radityo Eko Prasojo4,1, 1Sekolah Tinggi Informatika & Komputer Indonesia (STIKI), Malang, Indonesia, 2Kata.ai Research Team, Jakarta, Indonesia, 3Beijing Institute of Technology, China, 4Faculty of Computer Science, Universitas Indonesia
Neural Machine Translation (NMT) works better in Indonesian when it takes into account local dialects, geographical context, and regional culture (colloquialism). NMT is typically domain-dependent and style-dependent, and it requires lots of training data. State-of-the-art NMT models often fall short in handling colloquial variations of their source language, and the lack of parallel data in this regard is a challenging hurdle in systematically improving the existing models, despite the fact that Indonesians frequently employ colloquial language. In this work, we develop a colloquial Indonesian-English test set collected from YouTube transcripts and Twitter. We perform synthetic style augmentation on the source formal Indonesian and show that it improves the baseline Id-En models (in BLEU) on the new test data.
Neural Machine Translation, NMT, Natural Language Processing, NLP, Low-Resource Language, Indonesian, Artificial Intelligence
Karol Lynch1, Joern Ploennigs1,2 and Bradley Eck1, 1IBM Research Europe, Dublin, Ireland, 2University of Rostock, Rostock, Germany
The usage of mathematical formulas as concise representations of a document’s key ideas is common practice. Correctly interpreting these formulas, by identifying mathematical symbols and extracting their descriptions, is an important task in document understanding. This paper makes the following contributions to the mathematical identifier description reading (MIDR) task: (i) introduces the Math Formula Question Answering Dataset (MFQuAD) with 7508 annotated identifier occurrences; (ii) describes novel variations of the noun phrase ranking approach for the MIDR task; (iii) reports experimental results for the SOTA noun phrase ranking approach and our novel variations of the approach, providing problem insights and a performance baseline; (iv) provides a position on the features that make an effective dataset for the MIDR task.
Information Extraction, Reading Comprehension, Large Language Models
Omar Shafie, Kareem Darwish, and Bernard J. Jansen, Hamad Bin Khalifa University
Hadith is the term used to describe the narration of the sayings and actions of Prophet Mohammad (p.b.u.h.). The study of Hadith can be modeled as a pipeline of tasks performed on a collection of textual data. Although many attempts have been made to develop Hadith search engines, existing solutions are repetitive, text-based, and manually annotated. This research documents six Hadith retrieval methods, discusses their limitations, and introduces two methods for robust narrative retrieval. Specifically, we address the challenge of user needs by reformulating the problem as a two-fold solution: declarative knowledge-graph querying, and semantic-similarity classification for retrieving Takhreej groups. The classifier was built by fine-tuning an AraBERT transformer model on a sample of 200k pairs and scored 90% recall and precision. This work demonstrates how Hadith retrieval can be more efficient and insightful with a user-centered methodology, an under-explored area with high potential.
Hadith, Knowledge-graphs, Arabic, Semantic Similarity
Ved Vasu Sharma and Anit Bhandari, SquadStack Inc., New Delhi, India
The global telesales market was worth about US$ 27 Bn in 2022 and is expected to grow to US$ 55 Bn by 2029. India is not only a huge consumer market but also a large, rewarding market for telesales experts with low operational costs. Even though a large number of call recordings are generated on a daily basis, telesales is one of the most untouched markets when it comes to engineering innovations and AI applications involving linguistics, NLP, audio processing, etc. Speech recognition is generally a prerequisite for most of these applications. Hence, we propose a speech recognition solution for the telesales industry in a huge market like India, operating primarily in Indic languages and accents for which no general-purpose ASR has acceptable performance. Our model achieves a competitive WER of 19.42% on a telesales dataset in the Indian context.
Speech Recognition, Telesales, Indic Languages, Hinglish, Audio Processing.
Mohamed Abdul Karim Sadiq1, Thirumurugan Shanmugam2 and Nasser AlFannah3, 1, 2Department of Information Technology, College of Computing and Information Science, University of Technology and Applied Sciences, Suhar campus, Oman, 3Deputation, Ministry of Transport, Communications and Information Technology, Sultanate of Oman
Despite the existence of many job portals, both employers and candidates face difficulties in the search process. Primarily, the problem arises due to the mismatch in expressing the requirements with the contents of profiles. Though certain automated systems exist to support the recruitment procedure, application of Natural Language Processing (NLP) could enhance the extraction of useful information and rank the resume documents of candidates. The challenge is to transform unstructured textual data to structured reusable information. This issue is more evident in the case of young job seekers with minimal or no previous work experience. Suitable techniques in NLP are explored along with relevant data sets to enhance the employment process in a smart manner.
Natural Language Processing, Human Resources Management, Information Extraction, Resume Matching.
Ekrem Duman, Department of Industrial Engineering, Ozyegin University, Istanbul, Turkey
To build successful predictive models, one should have a sufficient number of examples of the class to be predicted (the positive class). When the number of positive examples is very small, building strong predictive models becomes a very challenging task. In this study we address one such problem: predicting which bank personnel might commit fraud (stealing money from customer accounts). For this problem, in order to obtain a strong enough predictive model, we decided to combine the powers of descriptive and predictive modeling techniques: we developed several descriptive models and used their outputs as inputs to a predictive model at the last stage. The results show that our solution approach performs quite well.
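The idea of feeding descriptive-model outputs into a predictive model can be sketched as follows. This is a minimal illustration, not the study's actual system: the z-score "descriptive model", the fixed logistic weights, and the two employee metrics (after-hours logins, dormant-account accesses) are all hypothetical.

```python
import math

def zscores(values):
    """Descriptive stage: how unusual each employee is on one metric,
    expressed as a z-score against the population."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values)) or 1.0
    return [(v - mean) / sd for v in values]

def risk_score(feature_columns, weights, bias=0.0):
    """Predictive stage: a logistic model whose inputs are the
    descriptive-stage z-scores rather than raw features."""
    z_cols = [zscores(col) for col in feature_columns]
    scores = []
    for i in range(len(feature_columns[0])):
        s = bias + sum(w * z[i] for w, z in zip(weights, z_cols))
        scores.append(1.0 / (1.0 + math.exp(-s)))  # sigmoid -> (0, 1)
    return scores

# Hypothetical metrics for five employees (the real study's
# features are not disclosed in the abstract).
after_hours_logins = [2, 3, 2, 25, 3]
dormant_accesses   = [0, 1, 0, 12, 1]
scores = risk_score([after_hours_logins, dormant_accesses], weights=[1.0, 1.5])
print(max(range(len(scores)), key=scores.__getitem__))  # employee 3 stands out
```

The descriptive stage compresses rare-event evidence into a few dense signals, which is one way to cope with having very few positive examples to train on directly.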
Personnel fraud, predictive modeling, banking.
Amer Abuhantash, Department of Business Administration, University of the People, United States
This research study aims to investigate the relationship between knowledge management, information technology (IT) investment, and economic prosperity in Middle Eastern countries, with a specific focus on the United Arab Emirates (UAE) and Saudi Arabia. The Middle East region has witnessed significant economic growth and transformation in recent decades, and understanding the factors that contribute to economic prosperity is crucial for sustainable development. Knowledge management and IT investment have emerged as important drivers of economic growth in various contexts. This research will examine how knowledge management practices and IT investments influence economic prosperity in the UAE and Saudi Arabia, exploring similarities and differences between the two countries. The findings of this research will contribute to the existing literature on knowledge management, IT investment, and economic prosperity, while providing valuable insights for policymakers and business leaders in the Middle East.
knowledge management, information technology investment, economic prosperity, Middle Eastern countries, United Arab Emirates, Saudi Arabia.
Asma Charfi, Takwa Kochbati and Chokri Mraidha, Université Paris-Saclay, CEA, List, F-91120, Palaiseau, France
In this paper, we investigate the role that AI can play in the adoption of a Model-Based Systems Engineering (MBSE) tool. The MBSE approach is widely adopted in the development of complex systems (real-time systems, cyber-physical systems, systems of systems, etc.); in practice, however, the tools implementing this approach face several problems and are far from being adopted by system providers. We argue that AI can be useful and beneficial if integrated at the right MBSE step, and that the need for AI techniques (machine learning, NLP, etc.) in MBSE tools should be further investigated to fit the stakeholders' needs.
MBSE, AI, ALM, NLP, MBSE Tool
Bojan Nokovic, McMaster University, Computing and Software Department, 1280 Main Street West, Hamilton, Ontario, Canada
Using probabilistic models, we analyze an innovative online authentication process based on image recognition. For true positive identification, the user needs to recognize the relationship between identified objects in distinct images, which we call an outer relation, and the relation between objects in the same image, which we call an inner relation. We use probabilistic computation tree logic (PCTL) formulas to quantify false-negative detection and analyze the proposed authentication process. This helps tune the process and make it more convenient for the user while maintaining the integrity of the authentication process.
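To give a flavor of what such PCTL properties look like, here is a sketch in the property syntax used by probabilistic model checkers such as PRISM. The labels, the bound k, and the reward structure name are illustrative assumptions, not the paper's actual formulas.

```
// Probability that a legitimate user is rejected (a false negative)
// within k authentication rounds stays below 1%:
P<0.01 [ F<=k "false_negative" ]

// Expected number of image challenges before a successful login,
// using a hypothetical "challenges" cost/reward structure:
R{"challenges"}=? [ F "authenticated" ]
```

Checking properties of this shape against the model is what allows the trade-off between user convenience (fewer challenges) and false-negative rate to be quantified and tuned.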
Hierarchical State Machines; Probabilistic Model Checker; Costs/rewards; Verification
Nieto Bernal Wilson1 and Vega Jurado Jaider2, 1Department of Systems Engineering, Norte Universidad, Barranquilla-Colombia, 2Department of Entrepreneurship and Management, Norte Universidad, Barranquilla-Colombia
Disruptive technologies today have become the catalyst for developing individual and organizational capacities. The work focuses on integrating the use of emerging technologies such as cloud computing, big data, the internet of things, blockchain, datasets, data warehouses, data lakes, machine learning, data analytics, simulation, hyper-automation, and social networks, among others, to respond to the organizations' requirements associated with the fulfillment of objectives, goals, KPIs, regulation, trends, the creation of goods and services, loyalty, and integrated management. The response to this type of requirement translates operationally into new project processes and programs that are structured in a corporate portfolio and addressed with agile methodologies intensive in collaboration, communication, self-management, and virtual environments, enabling the development of these new organizational capabilities and especially the innovation that organizations require to interact within an ecosystem that today is highly digital (customers, suppliers, employees, regulators, investors, states, and competitors in general).
This work presents a framework for the comprehensive development of platforms for knowledge management and transformative innovation, based on an emerging information architecture (IA, DW, ML) for its implementation. It is therefore convenient to develop a current profile of the organization that identifies the innovation capabilities to be developed (organization, processes, products, services, technologies, knowledge, R&D, among others). From there, a target profile is established that specifies the desired capabilities. Finally, the difference between the current capacity and the target capacity yields a gap, which is addressed through an implementation plan for achieving the desired innovation capabilities. This process develops along a timeline and is projected recursively, giving rise to continuous improvement in the development of innovation capabilities within a digital ecosystem.
Framework development, Disruptive Digital Platform, DevOps as a Service, MOPD.
FuChe Wu1 and Andrew Dellinger2, 1Providence University, Taiwan, 2Elon University, USA
This paper proposes an algorithm for improving estimation accuracy in industrial applications. Traditionally, a weighted-sum method is used to map two views, but this approach often leads to blurry results. Instead, a winner-take-all approach is suggested as a means of achieving better accuracy. Three criteria are introduced for evaluating image quality: sharpness (which measures the effect of motion artifacts from the RGB camera), flatness (which estimates the number of parts in the depth image that belong to planar regions), and fitness (which checks the match between the current view and the existing map). While depth images provide the 3D structure of the environment, they typically lack sufficient resolution to deliver accurate results. However, by estimating a plane, accuracy can be improved and a more precise boundary can be obtained from the higher resolution of the RGB image.
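The difference between the two fusion strategies can be shown on a toy example. This is a conceptual sketch, not the paper's method: the per-pixel confidence scores are hypothetical stand-ins for whatever the sharpness/flatness/fitness criteria would produce.

```python
def fuse_weighted(d1, d2, c1, c2):
    """Weighted-sum fusion: blends depth values from two views,
    which averages across disagreements and can blur edges."""
    return [[(a * wa + b * wb) / (wa + wb)
             for a, b, wa, wb in zip(r1, r2, w1, w2)]
            for r1, r2, w1, w2 in zip(d1, d2, c1, c2)]

def fuse_wta(d1, d2, c1, c2):
    """Winner-take-all fusion: keeps each pixel's value from the
    more confident view, preserving sharp depth boundaries."""
    return [[a if wa >= wb else b
             for a, b, wa, wb in zip(r1, r2, w1, w2)]
            for r1, r2, w1, w2 in zip(d1, d2, c1, c2)]

# Two 1x4 depth rows that disagree about where an edge sits.
depth_a = [[1.0, 1.0, 5.0, 5.0]]
depth_b = [[1.0, 5.0, 5.0, 5.0]]
conf_a  = [[0.9, 0.9, 0.2, 0.2]]  # hypothetical per-pixel confidences,
conf_b  = [[0.1, 0.1, 0.8, 0.8]]  # e.g. from sharpness/flatness/fitness
print(fuse_wta(depth_a, depth_b, conf_a, conf_b))       # edge stays crisp
print(fuse_weighted(depth_a, depth_b, conf_a, conf_b))  # pixel 1 is blended
```

At the disputed pixel, winner-take-all commits to the more confident view (1.0), while the weighted sum produces an in-between value (1.4) that corresponds to no real surface, which is exactly the blurriness the abstract describes.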
Accuracy, winner-take-all, depth image.
Kaiqing Fan1, Siwen Yang2 and Zongbao Dai3, 1United Automotive Electronic Systems Co., Ltd, China, AI Lab, 2Beijing Brainpower Pharma Consulting Co. Ltd, China, Data Team, 3United Automotive Electronic Systems Co., Ltd, China, Big Data Team
Traditional methods for identifying the unique fingerprint of each mobile phone have limitations. These limitations feed the Internet black industry chain: in 2020, the internet black and gray industry caused losses of trillions of dollars to fraud worldwide, of which smartphones accounted for a large part. Based on I/Q signals and an AI algorithm, we can correctly identify the unique fingerprint of each mobile phone hardware device. This is because the I/Q signals from a phone's hardware cannot be changed by hackers, whereas under traditional methods many of the parameters that make up a phone's fingerprint can be. Through the combination of I/Q signals and an AI algorithm, we can correctly identify 95% or more of the unique fingerprints of mobile phones. Furthermore, because we can track and identify each mobile phone hardware device, the cost to the Internet black industry chain will be much higher than before, since each hardware device is expensive. This approach may be an effective way to combat the Internet black industry chain.
I/Q Signal, AI Algorithm, Identification, Unique Fingerprint, Mobile Phone Hardware Device.
Shrijak Dahal and Aayushma Pant, Institute of Engineering, Tribhuvan University, Nepal
Manipulation of facial appearances, voices, and photos using deep generative approaches, also called Deepfakes, has enabled a wide range of benign and malicious applications. Malicious use of this technology has created frauds, false allegations, and hoaxes that undermine and destabilize organizations. Even though many algorithms have been shown to produce realistic faces and voices, some artifacts remain hidden from the naked eye and need to be detected. In this research work, we focus on identifying and detecting Deepfake videos and audio. The DCT (Discrete Cosine Transform) is implemented to extract image features, which are passed, along with the original video images, to a multi-layered CNN (Convolutional Neural Network) architecture. Likewise, filter banks and MFCCs (Mel-Frequency Cepstral Coefficients) are used for audio processing, followed by a CNN architecture, to detect real and fake audio.
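The image branch of the pipeline above rests on the 2-D DCT, which turns a block of pixels into frequency coefficients where generative artifacts often show up. The following is a minimal sketch of that feature-extraction step only, assuming an orthonormal DCT-II on small square blocks; the block size, the number of retained coefficients, and the downstream CNN are not specified by the abstract.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix C (n x n)."""
    return [[math.sqrt((1 if k == 0 else 2) / n)
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)]
            for k in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def dct2(block):
    """2-D DCT of a square image block: C @ block @ C^T."""
    c = dct_matrix(len(block))
    ct = [list(row) for row in zip(*c)]
    return matmul(matmul(c, block), ct)

def low_freq_features(block, k=2):
    """Keep the top-left k x k coefficients as a compact spectral
    feature vector; maps like this would be fed to the CNN."""
    f = dct2(block)
    return [f[i][j] for i in range(k) for j in range(k)]

# A flat (constant) 4x4 block: all energy lands in the DC coefficient,
# so the remaining low-frequency coefficients are ~0.
flat = [[10.0] * 4 for _ in range(4)]
feats = low_freq_features(flat)
print([round(x, 6) for x in feats])
```

Real Deepfake detectors look for anomalous energy in the higher-frequency coefficients that generative upsampling tends to leave behind; the sketch only shows how the spectral representation is obtained.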
Audio Forensics, CNN, DCT, Deepfake, FFT, MFCC, Video Forensics.