+ All Categories
Home > Documents > Uncertainty dynamics in games and its implication to ...

Uncertainty dynamics in games and its implication to ...

Date post: 16-Jan-2022
Category:
Upload: others
View: 2 times
Download: 0 times
Share this document with a friend
28
Uncertainty dynamics in games and its implication to entertaining activity assessment by GAO Yuexian submitted to Japan Advanced Institute of Science and Technology in partial fulfillment of the requirements for the degree of Doctor of Philosophy Supervisor: Professor Hiroyuki Iida School of Information Science Japan Advanced Institute of Science and Technology October,2021
Transcript

Uncertainty dynamics in games and its implication to

entertaining activity assessment

by

GAO Yuexian

submitted to

Japan Advanced Institute of Science and Technology

in partial fulfillment of the requirements

for the degree of

Doctor of Philosophy

Supervisor: Professor Hiroyuki Iida

School of Information Science

Japan Advanced Institute of Science and Technology

October,2021

Absract

Research Content

Games are embedded and grown with the human culture even before human society

emerged (Huizinga1980). With artificial intelligence (AI) advancement, games are gradu-

ally regarded as an accessible way to simulate human society and explore more profound

insights into human science. The human mind, often considered synonymous with con-

sciousness, has been called the last frontier of science (Herbert 1993). Human mind is as

mysterious as universe, where the physics-in-mind could be reasonably related to physics

in nature. What this dissertation interested, is probing into a deeper sight of the mech-

anism of physics in mind using game and game-liked activities as the benchmark. This

thesis is about establishing the correlation between the psychological model and the in-

formation model, incorporating concept from physics, which can help people to better

understand the human life.

Research Purpose

This dissertation focuses on the uncertainty dynamics in games and its implication to

analyzing entertaining activity. (1) Firstly, the human player-versus-human player rules

based card game is firstly discussed, which can be competed by multiple players at the

same time but can maintain the same competition for each player. This study set experi-

ments and simulate DouDiZhu self-play. The preliminary simulation results revealed that

three-player DouDiZhu is perfectly refined with sophistication, entertainment, and fairness

on the measurement of game refinement at DouDiZhu[(3, 1, 2)(20, 17, 17)]. Cooperation

is essential to maintain the fairness and uncertainty of both sides. (2) It then explores

single-player arcade games where human players simply compete against the rules of the

game and can significantly observe individual player skill improvement through practice.

Four popular arcade games are selected and analyzed as benchmark. The application

scope of the theory of motion in mind is expanded by incorporating relative velocity and

resultant force. A feasible scheme of potential growth rate is proposed to measure the

single-agent arcade games that were unquantifiable before because these kinds of games

have no definitive game length. It found jerk’s dynamics would affect game’s uncertainty

to a great extent and is the essential factor to sustain the game’s engagement. (3)Then

we discuss idle games, which are not competitive and even encourage players not to par-

ticipate in the progress of the game. Uncertainty is the expectation to closing the gap

i

between income and cost in idle game domain. It also found that not only the equations

of motion in mind model would help to analyze the idle games, but also the derivatives of

the functions are also able to. Motives in mind (Eq) and predictive motion tendency (~p2)

are found to be the most important parameter when applied to the entertainment without

much interaction. Moreover, long-term jerk was found able to maintain the freshness to

the player. Synchronization can also help to maintain the engagement of the player.

From chapter to chapter, the model of motion in mind is also established, optimized

its range, and evolved into a relative sophisticated shape. Taking different games and

recreational activities as objects and using the construction of physical models in the

mind as media, this study explores the effects of dynamic changes of uncertainty on the

entertainment and attraction of games and recreational activities. Through the physical

modeling of the motion in mind, this dissertation innovatively constructs the operation

mode of the human mind world, provides a brand-new understanding and angle for the

study of either human beings themselves or the nature of entertainment in different areas

of the motion in mind model and is meaningful to establish entertainment science.

Keyword: Game; Entertainment; Uncertainty; Information dynamics; Engagement;

DouDiZhu; Arcade; Idle game

Research Accomplishment

Journal Papers and Book Chapter

1. Gao Y., Li, W. , Xiao, Y., Khalid, M.N.A., Iida, H., Nature of Attractive Multiplayer

Games: Case Study on China’s Most Popular Card Game - doudizhu, Information,

2020, 11(3): 141, 2020.

2. Gao Y, Khalid M N A, Umi Kalsom Yusof, H.Iida, Coopetition in Solving Combina-

torial Optimization Problem: Application to the Industrial Assembly Line Balancing

Problem. TEST Engineering & Management, vol. 82, pp.11992-12000,2020.

1. Gao Y., Li W, Khalid M N A, Iida, H., Quantifying attractiveness of incomplete-

information multi-player game: case study using doudizhu. In: Alfred R., Lim Y.,

Haviluddin H., On C. (eds) Computational Science and Technology. Lecture Notes

in Electrical Engineering, vol 603. Springer, pages 301-310, 2019.

2. M. N. A. Khalid, Umi Kalsom Yusof, Yuexian GAO, H. Iida. Coopetition in Solving

Combinatorial Optimization Problem: Application to the Industrial Assembly Line

Balancing Problem. In: Alfred R., Lim Y., Haviluddin H., On C. (eds) Compu-

tational Science and Technology. Lecture Notes in Electrical Engineering, vol 603.

Springer, 2019.

3. Gao Y. “Construction of Popularization Space for Popular Science Reading”, Read-

ing Promotion of Science Popularization, National Library of China Publishing

House, 2019.

4. Gao, Y. Doudizhu Tournament on the 2019 Computer Olympiad. ICGA Journal,

vol.42, no.1, pp.1-3,2020.

Presentations at international academic conferences

1. Gao Y., Li W, Khalid M N A, Iida, H. Quantifying attractiveness of incomplete-

information multi-player game: case study using DouDiZhu. The Sixth Interna-

tional Conference on Computational Science and Technology(ICCST2019),48, Kota

Kinabalu, 8.2019.

2. Gao Y, Khalid M N A, Umi Kalsom Yusof, H. Iida. Coopetition in Solving Combi-

natorial Optimization Problem: Application to the Industrial Assembly Line Bal-

ancing Problem. The Sixth International Conference on Computational Science and

Technology(ICCST2019), 18, Kota Kinabalu, 8. 2019.

3. Gao Y, Gao N, Khalid M N A, Iida, H. Finding Flow in Training Activities by Ex-

ploring Single-Agent Arcade Game Information Dynamics. The 19th International

Federation for Information Processing – International Conference on Entertainment

Computing (IFIP-ICEC2020), 17, Xian, 11. 2020.

Other

• JAIST Heisei 31th year research site formation support business: budding research

support. A new theoretical construction of entertainment science - science as a ride

for the world of mind.

Dissertation Abstract

Blind Estimation of Room Acoustic Parameters and Speech Transmission IndexBased on the Concept of the Modulation Transfer Function

Intended degree: Doctor of Science (Information Science)Laboratory: Unoki-LabStudent Number: 1820422Name: Mr. Suradej Duangpummet

1. Research Content

Assessment of the quality of auditory spaces is essential in acoustics and speech signalprocessing. In room acoustics, the characteristics of auditoriums are related to the qualityof life in many aspects. Specifically, emergency announcements and alarm sounds need tobe easily audible and intelligible since, in emergency circumstances, we can follow the safetyprocedures appropriately. For theaters or concert halls, excellent sound characteristics andsuperior acoustics are ideal environments for performances. The sounds of live performancesshould be clear and transparent so that attendees can enjoy the entertainment. In speechsignal processing, many algorithms can benefit from prior knowledge of acoustic conditionssuch as speech dereverberation and noise suppression. Consequently, sound quality andspeech intelligibility can be improved.

Intelligibility of speech and pleasure to music are subjective descriptions. It is difficultto convey such descriptions from listeners to architects who are responsible for designing au-ditoriums or diagnosing acoustic problems. Conventionally, speech intelligibility and soundquality can be determined by conducting listening experiments with a group of listeners.Unfortunately, the experiments are expensive, unreliable, and time-consuming. It is alsoimpractical for real-time applications. For instance, hearing aids, automatic speech recog-nition, and speaker verification might need to evaluate acoustic characteristics to enhancetheir performance at a moment. Thus, the quality of a sound field and subjective aspectsare defined through room acoustic parameters and objective indices related to the physicalproperties of a sound field. Hence, architects, acousticians, and signal processing algorithms,can justify acoustic conditions by measuring or predicting acoustical parameters.

Several useful room acoustic parameters and objective indices have been standardized.In IEC 60268-16 : 2020, the speech transmission index (STI), which is an objective index,is used to predict speech intelligibility from the quality of a speech transmission channel.The STI is calculated based on the concept of the modulation transfer function (MTF).The MTFs of seven-octave bands with their weighting values are converted to be a realnumber from 0 to 1 (bad to excellent intelligibly). In ISO 3382: 2009, it specifies methodsfor measurement the reverberation times (T60 or T30) and other room-acoustic parameters,including early decay time (EDT), clarity (early-to-late-arriving sound energy ratios: C80 formusic or C50 for speech), Deutlichkeit (early-to-total sound energy ratio: D50), and centertime (Ts). These parameters are derived from measuring the room impulse response (RIR).

1

In the time domain, an RIR completely describes the characteristics of a sound field. Sim-ilarly, a system transfer function in the frequency domain and the MTF in the modulation-frequency domain are the counterparts. In general, the RIR or MTF needs to be measured.However, it is difficult to measure RIR or MTF in daily-life places where people cannotbe excluded, e.g., public stations, airports, and department stores. Moreover, by the na-ture of such public areas, room acoustics are prone to be a time-varying system. Soundabsorption, reverberation, or other acoustical parameters are changed by varying occupantsand object arrangements. Thus, acoustic parameters that were measured complying withthe standards might be different from the current one. Hence, many methods have beenproposed to estimate an acoustical parameter without measuring the RIR, known as blindestimation methods.

The blind estimation of an acoustical parameter is an ill-posed condition because bothsound source and RIR are unknown. The ill-posed or blind inverse problem is challeng-ing since it needs additional assumptions or complementary prior knowledge to formulatethe estimation. Furthermore, the robustness of the estimator against various rooms (e.g.,diffuse/non-diffuse field and connected chamber) and background noise needs to be takeninto account. The MTF is the basic concept to estimate the amount of reverberation andnoise of a bounded space with a single-channel microphone. To this end, this researchpresents blind estimation methods for estimating five-room acoustic parameters, STI, andSNR from a degraded speech signal in reverberant and noisy reverberant environments.

In a reverberant environment, the unknown RIR is modeled by using a stochastic RIRmodel. Two RIR models were investigated: Schroeder’s RIR model and the extended RIRmodel. The reverberation time is only a single parameter in Schroeder’s RIR as a simpleexponential decay (TR). The extended RIR model is the extended version of Schroeder’sRIR model. It consists of three control parameters, including rising parameter (Th), peakposition (T0), and exponential decay parameter (Tt). Thus, the extended RIR model is amuch more accurate and flexible model. The parameter TR in Schroeder’s RIR and thethree parameters of the extended RIR model are blindly estimated. The envelope of theRIR model is reconstructed from those of the estimated parameters.

Sub-band analysis is used as the same as the algorithm for calculating the STI. Thedistortion in seven-octave bands is estimated through the parameters of the RIR model. Theapproximated RIR for each sub-band can be reconstructed from their envelope modulatedwith band-limited noise. The wide-band RIR is also approximated from the summationof the sub-band signals. Therefore, the estimated acoustical parameters and STI for bothsub-band and wide-band can be derived.

A speech signal can be decomposed into a fine structure and temporal structure. Fortemporal structure, a power envelope (PE) or temporal amplitude envelope (TAE) is usedas a feature. On the basis of the MTF, PE or TAE represents the modulation distortioncaused by reverberation and noise of the transmission channel or sound field. The TAE alsoplays an important role in speech intelligibility. In the proposed scheme, these features areextracted from an observed signal by using Hilbert transform and a low-pass filter. Theimportant modulation frequency related to human perception is between 4− 16 Hz and thedominant frequency at 4 Hz. An observed signal in a given room is regarded as the output

2

of the convolution between the RIR and speech signal. Hence, the modulation features(TAE/PE) and the convolution operation are the basic concept.

One-dimensional convolutional neural networks (CNNs) with TAE feature extractedfrom a four-second observed signal. Then, a more sophisticated network, i.e., a combinednetwork between CNNs and long short-term memory (LSTM) networks was used. TheseDNNs were trained from pairs of TAE/PE and parameters of the RIR models. In addition,data augmentation techniques are used since the dataset of measured RIRs is inadequate fortraining the DNNs. The reverberant speech signals were synthesized from the RIR models.Then, the measured RIRs were used for verification the effectiveness.

The effectiveness and performance of the proposed methods were evaluated. Simulationswere performed by estimating the parameters from reverberant and noisy reverberant speechsignals. The accuracy of the estimated acoustical parameters was compared with baselinescalculated from measured RIRs and existing works. The robustness against various back-ground noise was also evaluated by adding four types of noise with different SNR levels intothe reverberant speech signals.

The experimental results suggest that the proposed method can correctly, blindly, andsimultaneously estimate five-room acoustic parameters, STI, and SNR from a speech signalin reverberant and noisy reverberant environments. The accuracy in terms of standardderivation of the error of the estimator for each parameter, i.e., T60, EDT, C80, D50, Ts,and STI, was 9.4%, 10.5%, 2.7 dB, 14%, 45 ms, and 0.05, respectively. These resultsof the estimated parameters were close to the standard measurement derived from theRIR. Moreover, the SNR was estimated along with those parameters. The accuracy of theestimated SNR was approximately 10 dB.

2. Research Purpose

In recent researches and developments in applications of acoustics, blindly estimation ofroom acoustic parameters and room impulse response is interesting, challenging, and activeresearch topic. Speech signal processing also undergoes rapid development based on theseestimations. Since blind parameter estimation is identical to blind system identification,blind equalization, and blind inverse problem, this research might be useful in other researchareas such as geology, physic, image processing, and control system.

Existing researches focus on estimating only one or two acoustical parameters. However,there are various informative acoustical parameters for different purposes. Therefore, thisresearch aims to estimate many useful room acoustic parameters related to speech intelligi-bility and sound quality in a quasi-real-time environment.

The originality of this research is the blind estimation framework for a more accuratestochastic model of the room impulse response. Instead of estimating one or two acousticalparameters such as reverberation time or speech transmission index, this research estimatesthe parameters of the extended RIR model so that many acoustical parameters can bederived from the reconstructed stochastic RIR model.

The novelty is that the proposed method is blindly estimating the parameter of the RIRmodel on the basis of the modulation transfer function (MTF) in octave bands and the

3

MTF-feature is incorporated into the non-linear regression using the convolutional neuralnetworks (CNNs). As a result, the five-room acoustic parameters and speech transmissionindex can be estimated simultaneously and accurately.

3. Research Accomplishment

[1] “Blind estimation of speech transmission index and room acoustic parameters basedon the extended model of room impulse response,” Suradej Doungpummet, Jessada Karn-jana, Waree Kongprawechnon, and Masashi Unoki, Applied Acoustics (Journal paper withpeer review), Vol. 185, pp. 1–12, 2022.

[2] “Blind Estimation of Room Acoustic Parameters and Speech Transmission Index us-ing MTF-based CNNs,” Suradej Doungpummet, Jessada Karunjana, Waree Kongprawech-non, and Masashi Unoki, 29th European Signal Processing Conference (EUSIPCO), (Inter-national conference with peer review), 2021.

[3] “Speech Privacy Protection based on Optimal Controlling Estimated Speech Trans-mission Index in Noisy Reverberant Environments,” Suradej Doungpummet, Jessada Karn-jana, Waree Kongprawechnon, and Masashi Unoki, 28th European Signal Processing Con-ference (EUSIPCO), (International conference with peer review), 2020.

[4] “Speech Privacy Protection based on Controlling Estimated Speech TransmissionIndex,” Phunruangsakao Chatrin, Kraikhun Phrimphissa, Suradej Doungpummet, JessadaKarnjana, Waree Kongprawechnon, and Masashi Unoki, in 17th International Conference onElectrical Engineering/Electronics, Computer, Telecommunications and Information Tech-nology (ECTI-CON), (International conference with peer review), pp. 628–631, 2020.

[5] “A Robust Method for Blindly Estimating S peech Transmission Index using Con-volutional Neural Network with Temporal Amplitude Envelope,” Suradej Doungpummet,Jessada Karnjana, Waree Kongprawechnon, and Masashi Unoki, Int. Proc. Asia-PacificSignal and Information Processing Association Annual Summit and Conference (APSIPA),(International conference with peer review), pp.1208-1214, 2019.

[6] “Room Acoustic Parameters Estimation using MTF-based CNNs,” Suradej Doungpu-mmet, Jessada Karnjana, Waree Kongprawechnon, and Masashi Unoki, in Acoustic Societyof Japan, Spring meeting, (Domestic conference without peer review), 2021.

[7] “Study on Robust Method for Blindly Estimating Speech Transmission Index usingConvolutional Neural Network with Temporal Amplitude Envelope,” Suradej Doungpummet,Jessada Karnjana, Waree Kongprawechnon, and Masashi Unoki, in Engineering Acoustics,IEICE, (Domestic conference without peer review), no.163 pp.47-52, 2019.

[8] “Blind Estimation of Speech Transmission Index in Noisy Reverberant Environmentusing Deep Learning with Modulation Spectrum,” Suradej Doungpummet, Jessada Karn-jana, Waree Kongprawechnon, and Masashi Unoki, in Acoustic Society of Japan, (Domesticconference without peer review), 2019.

4

On Attractor Detection and OptimalControl of Boolean Networks

Doctoral Degree

Hiraishi’s laboratory

Trinh Van Giang

1820424

1 Research Content

Boolean Networks (BNs) are simple but efficient mathematical formalismfor modeling and analyzing complex biological systems, such as, gene reg-ulatory networks, signal transduction networks. Beyond systems biology,BNs have widely been applied to various other areas, such as, mathematics,neural networks, social modeling, robotics, and computer science. Besidesa plenty of applications, BNs are also interesting mathematical objects thathave recently attracted various work in theory. Attractor detection and op-timal control of BNs are two crucial but challenging issues of research onBNs. They have also become hot topics in many research communities aswell as had a plenty of applications in many areas. However, the existingtheories and methods for these issues mainly focus on synchronous typesof BNs and few ones focus on asynchronous types of BNs. Moreover, theexisting methods are inefficient when the size of the network becomes largeor the structure of the network becomes more complex. Hence, we focus inthis research on developing theories as well as efficient methods for attrac-tor detection and optimal control of different types (mainly asynchronousones) of BNs. We consider different types of BNs because of the follow-ing reasons. First, each type of BNs has its part in real life and can besuitable for modeling a specific type of systems; the choice among themin a specific circumstance depends on the available data and application.Second, relations in dynamics between different types of BNs can be ex-ploited to efficiently analyze BNs. Finally, this consideration may providenew theoretical insights to the theory of BNs.

1

2 Research Purpose

In theoretical aspects, we explore a number of new theoretical results thatcontribute to understanding the dynamics of BNs. First, we discover sev-eral relations in dynamics between different types of BNs. In addition,we also obtain several relations in dynamics between BNs and other con-ventional models. In particular, we demonstrate that these findings pavethe potential ways to analyze different types of BNs as well as many otherconventional models. Second, we discover several relations between thedynamics of a BN and its network structures. More specifically, we for-mally state and prove several relations between the dynamics of a BN anda feedback vertex set of its interaction graph. Notably, these relations donot depend on the updating scheme of the BN. Furthermore, we also stateand prove a theorem on relations between the dynamics of an asynchronousBoolean network and a negative feedback vertex set of its interaction graph.Finally, we introduce several complexity analysis on three meaningful opti-mal control problems of deterministic asynchronous probabilistic Booleannetworks. Specifically, we prove that all the three problems (also theirrestricted versions) are NPPP-hard and in PSPACE.

In practical aspects, we develop several algorithms and methods for at-tractor detection and optimal control of different typical types of BNs,which have been widely used in many research fields (mainly systems bi-ology). These algorithms and methods are mainly based on the new the-oretical results obtained in this research along with the reasonable use offormal techniques (e.g., binary decision diagrams, satisfiability, satisfiabil-ity modulo theory, bounded model checking, Petri nets). It is noted thatsome of them are the first analytical and practical methods with respectto attractor detection of the considered type. Furthermore, we imple-ment software tools for all the proposed algorithms and methods as wellas conduct experiments to evaluate their performance. The experimentalresults on various classes of networks (randomly generated and real bio-logical networks) show that our algorithms and methods outperform thecorresponding state-of-the-art ones and can handle large-scale networks. Inparticular, our method for finding attractors of an asynchronous Booleannetwork can handle very large networks with up to 1000 nodes in term ofrandomly generated networks and more than 300 nodes in terms of realbiological networks. Notably, the principle that we propose in our algo-rithms and methods is general, thus enabling us to apply it to many types

2

of BNs as well as paving potential ways to improve these algorithms andmethods.

Although systems biology has served as the main motivation for ourresearch, applications of this research are by no means limited to biologicalsystems. For example, asynchronous BNs are useful in modeling, studying,and controlling nonlinear dynamics in multivariate systems. In addition,BNs provide a convenient modeling framework to explore general proper-ties of complex systems in general, such as, self-organization, criticality,causality, canalization, robustness, and evolvability. Other systems thatcan be modeled by BNs include multi-agent systems, social networks (e.g.,information flow on Twitter or Facebook), smart grids, and supply chainnetworks (e.g., movement of materials). Since we consider general BNs(i.e., there is no restriction in Boolean functions) as well as different typesof BNs, the results (theoretical results and computational methods) intro-duced in this research can be applied to a wide range of other systems.

Keywords: Boolean networks, gene regulatory networks, attractor de-tection, optimal control, formal methods.

3

Publications and Awards

Journal papers (peer reviewed)

[1] Trinh Van Giang and Kunihiko Hiraishi: “On attractor detection and optimal con-trol of deterministic generalized asynchronous random Boolean networks,” IEEE/ACMTransactions on Computational Biology and Bioinformatics, 2020, in press. https://doi.org/10.1109/TCBB.2020.3043785.

[2] Trinh Van Giang, Tatsuya Akutsu and Kunihiko Hiraishi: “An FVS-based approachto attractor detection in asynchronous random Boolean networks,” IEEE/ACMTransactions on Computational Biology and Bioinformatics, 2020, in press. https://doi.org/10.1109/TCBB.2020.3028862.

[3] Trinh Van Giang and Kunihiko Hiraishi: “A study on attractors of generalizedasynchronous random Boolean networks,” IEICE Transactions on Fundamentalsof Electronics, Communications and Computer Sciences, 103(8), 987-994, 2020.https://doi.org/10.1587/transfun.2019EAP1163.

Conference papers (peer reviewed)

[4] Trinh Van Giang and Kunihiko Hiraishi: “An improved method for finding attrac-tors of large-scale asynchronous Boolean networks”, in Proc. 18th IEEE Interna-tional Conference on Computational Intelligence in Bioinformatics and Computa-tional Biology (CIBCB 2021), 1-9, 2021. https://doi.org/10.1109/CIBCB49929.2021.9562947.

[5] Trinh Van Giang and Kunihiko Hiraishi: “An efficient method for approximating at-tractors in large-scale asynchronous Boolean models,” 13th International Workshopon Biological Network Analysis and Integrative Graph-Based Approaches (IWBNA2020), in Proc. IEEE International Conference on Bioinformatics and Biomedicine(BIBM 2020), 1820-1826, 2020. https://doi.org/10.1109/BIBM49941.2020.9313230.

[6] Trinh Van Giang and Kunihiko Hiraishi: “Algorithms for finding attractors of gen-eralized asynchronous random Boolean networks”, in Proc. 12th Asian ControlConference (ASCC 2019), 67-72, 2019. http://ieeexplore.ieee.org/document/8765169.

4

A STUDY OF VISUAL QUESTIONANSWERING FOR BLIND PEOPLE

Doctoral Degree

NGUYEN’s laboratory

LE THANH TUNG

1820435

1 Research Content

Multi-media website which contains tons of image and text data has a highdemand for extracting and understanding representation and relationshipof image and question to support users for retrieving information, answer-ing questions, etc. Especially, it is essential to support blind people as wellas visual impaired community to overcome difficulties in their daily lives.If we can build a model to answer textual questions in images, we can au-tomatically understand knowledge of visual and textual content togetherwithout a physical vision requirement. This task raises some challengesdue to unique characteristics of multi-modal systems as well as a specificdomain for blind people including i) question may not be in well-grammartexts; ii) image is poor quality from the collecting process that requires arobust approach to extract visual features; iii) unanswerable sample ap-pears the question answering task.

The aim of this study is to take advantage of advanced Deep Learningtechniques to understand and extract meaning and relationship betweenimage and question to predict answers. To this end, the research questionis how to employ deep learning architectures to represent and combine theimage and question effectively to obtain their hidden relationship especiallyin the special challenges in VQA dataset for the blind.

1

2 Research Purpose

To answer the above research question, we propose four subtasks as follows:

• Answerability Prediction - determines whether the content of images isanswered by a question or not. By taking advantage of Transformer ar-chitecture, we propose a VT-Transformer model to extract the visualand textual feature delicately thanks to the strength of pre-trainedmodels. According to the experimental results, VT-Transformer gen-erally outperforms the existing baselines. Besides, we also achieve thebest result in VizWiz-VQA 2020 and 2021 competitions.

• Visual Question Classification - divide VQA samples into the specifickinds of questions. Dealing with the difficulties on object-less images,we thus propose an Object-less Visual Question Classification model,OL–LXMERT, to generate virtual objects replacing the dependenceof Object Detection in previous Vision-Language systems. Throughour experiments in our modified VizWiz-VQC 2020 dataset of blindpeople, our Object-less LXMERT achieves promising results in thebrand-new multi-modal task in comparison to competitive approaches.

• Yes/No Visual Question Answering - solves the specific kind of ques-tion instead of all kinds of questions. In this task, we point out the im-portance of Yes/No question types and propose the BERT-RG modelwhich combines the strength of ResNet and VGG to extract the resid-ual and global features to obtain the visual information. By inte-grating the stacked attention, the relationship of question and imagesare intensified by the regional features. Through the detail of exper-iment and ablation studies, our model outperforms the competitiveapproaches in VizWiz-VQA 2020 dataset and competition.

• General Visual Question Answering - determines the answer in allkinds of questions. In this work, we propose the novel Bi-directionCo-Attention Network to intensify the textual and visual features si-multaneously. Besides, we also apply the VT-Transformer to extractmeaningful image and text information. Our method Bi-direction Co-Attention VT-Transformer consistently shows strong performances inthe VizWiz-VQA dataset. Besides, it also achieves a promising resultin the latest competition in VizWiz-VQA 2021.

2

According to the results, our approaches obtain significant improvementcompared to the previous works on Visual Question Answering. Especially,our consideration of the VQA task is promising to deploy in the practice,especially for blind people. Further research could be undertaken to in-tegrate external knowledge at multi-modal systems as well as multitasklearning for Visual Question Answering.

Keywords: Visual Question Answering, BERT, Vision Transformer,Co-Attention, Answerability, Yes/No Question, VizWiz-VQA.

3

Publications and Awards

Journals

[1] Tung Le, Huy Tien Nguyen, and Minh Le Nguyen. “Multi Visual and Textual Em-bedding on Visual Question Answering for Blind People” Neurocomputing, Sepem-ber 2021.

Conference papers

[2] Tung Le, Thong Bui, Huy Tien Nguyen, and Minh Le Nguyen. “Bi-direction Co-Attention Network on Visual Question Answering for Blind People” 14th Interna-tional Conference on Machine Vision (ICMV 2021), November 2021.

[3] Tung Le, Huy Tien Nguyen, and Minh Le Nguyen. “Vision and Text Transformer forPredicting Answerability on Visual Question Answering”, 2021 IEEE InternationalConference on Image Processing (ICIP), pp. 934-938, September 2021.

[4] Tung Le, Nguyen Tien Huy, and Nguyen Le Minh. “Integrating Transformer intoGlobal and Residual Image Feature Extractor in Visual Question Answering forBlind People.” Knowledge and System Engineering (KSE) 12th International Con-ference on IEEE, pp 31-36, November 2020.

[5] Tung Le and Nguyen Le Minh, “Integration of Textual Discriminator into Multi-head Attention Model in Relation Extraction”, Information system WINTER FESTAEpisode 5, September 2019. (without peer-review)

[6] Nguyen Tien Huy, Le Thanh Tung, and Nguyen Le Minh. “Opinions Summariza-tion: Aspect Similarity Recognition Relaxes The Constraint of Predefined Aspects.”Proceedings of the International Conference on Recent Advances in Natural Lan-guage Processing (RANLP 2019), pp 487-496, September 2019.

[7] Tung Le and Nguyen Le Minh, “Combined Objective Function in Deep LearningModel for Abstractive Summarization”, Proceedings of the Ninth International Sym-posium on Information and Communication Technology (SOICT 2018), pp 84-91,December 2018.

4

Abstract of Dissertation

Dissertation Title: Design and Development of Video Aided Retention Support System for Enhancing Disaster Survival Skills Among International Students

Intended Degree: Information Science Name: Safinoor SAGORIKA, ID: s1820014

Laboratory: Hasegawa Lab

Abstract

Part 1: Research Content

Background: Response to natural disasters and how to save lives and resources became a vital issue around many

countries in the world. As Japan is one of the most disastrous countries, academic institutions in Japan regularly provide

disaster survival skills training to reduce vulnerability and to create disaster awareness among the students. But providing

training to a large number of international students who come to Japan every year is a great challenge. In many cases,

these international students do not have enough knowledge and training on how to survive in a disaster situation while

living in Japan. The available literature and survey results show a significant gap in the field of disaster survival skills

(DSS) between Japanese and international students. It is found that around 82% of international students did not receive

any disaster drills or training prior to coming to Japan. The current education and training provided on disasters in Japan

are not necessarily specific based on their demand. However, there are diverse types of content used in DSS education

and training. Among them, video content received broad interest from the students and educators/instructors in a self-

directed video-based learning environment. But, in Japan, DSS video content specially designed for international students

is limited. Besides, unstructured long video contents consume learning time and concentration of the students resulting

in poor engagement and learning outcome from video content. In addition, scattered and unstructured short videos

available in different sources force students to lose their way of learning as well as miss some important content. Moreover,

tracking, and analyzing students’ learning behavior inside video parts including the attention and retention process to

support them during learning are missing in traditional video-based learning.

Objectives and Research Questions: To overcome these issues, the objective of the research is to analyze, design, develop,

implement, and evaluate the Video Aided Retention Tool (VART) to support international students in enhancing their

disaster survival skills through self-directed video-based learning. In pursuing the objectives, this research focused on

one Major Research Question (MRQ): How to develop an adaptive self-directed video-based learning support system for

enhancing DSS among international students? and five Subsidiary Research Questions (SRQs) as SRQ1: Which type of

content structuring systems are appropriate for the DSS video content? SRQ2: What type of domain, students’, e-teaching

strategy models are required for video-based DSS training? SRQ3: What is the process of integrating different models

with the VART system? SRQ4: How to implement the system among international students for providing DSS training?

and SRQ5: How to assess students ‘learning outcomes and provide necessary feedback and recommendation in video-

based training and learning process?

Research Method: The research follows the five phases of the ADDIE (Analysis, Design, Development, Implementation,

and Evaluation) model from the beginning to the end of the task as a framework for the VART system development in

the proposed platform. In the analysis phase, the research conducted a good number of literature reviews to realize the

current situation of disaster training and learning in Japan. Besides, the research completed a questionnaire survey and

collected primary data from 133 international students at the Japan Advanced Institute of Science and Technology

(JAIST) to realize the actual situation of DSS knowledge and experiences. In the design phase, the research provides the

Abstract of Dissertation

design structure/architecture of the four conceptual models for VART. The models are i) domain model, ii) students’

model, iii) e-teaching strategy model, and finally iv) a conceptual model with the integration of VART for supporting the

DSS learning. In the development phase, the research developed three content structuring systems: i) non-support (N)

traditional long video, ii) structured (S) long video with virtual fragmentation, and local indexed, and iii) branching (B)

scenario lessons with short videos to determine the appropriate content structure of VART. In the implementation phase,

the researcher conducted an experiment to identify the appropriate content structuring system and understand the

effectiveness of the proposed method. In the evaluation phase, the research compared the changing impact of the learning

outcome among the learners before and after implementation of the system, summarized and modified the functions where

necessary, and proposed the new system for implementation in the disaster survival education domain.

Experiment Results: To identify the appropriate content structuring system, the research conducted an experiment among

the 36 international students in JAIST to track students' watching and learning behaviors, including the attention and

retention process. Results show that the branching (B) scenario lessons are the most preferred by the participants (50%)

in the video-based learning system followed by the structured (S) video (45%). Very few participants (5%) only preferred

non-supported (N) video structure. In addition, the Normality test result shows that video 02 structured video (S) score

and video 03 branching video (B) scores are non-normal distribution, while video 01 non-support video (N) score is a

normal distribution. The Friedman test indicates that the statistical significance among the three videos is <.001, which

is below 0.05. So, it is statistically proven that the three videos have significant differences. In the Bonferroni correction,

we found statistical significances <.001 less than .017 between videos 01 and 02, and between videos 01 and 03. So, there

are also statistically significant differences between the video 01 scores with videos 02 and 03 scores. The result shows

positive effects on videos 02 & 03, and the score results are also higher than video 01. Therefore, from the different

analysis results, it is statistically proved that the video 02 structured method and video 03 branching method significantly

influence the video-based training and learning for acquiring DSS. In addition, to realize the structural relationship among

the students’ previous knowledge on DSS, watching duration, repetitions, clicks, and score from a video-based learning

environment, the research applied Structural Equation Model (SEM) using SmartPLS for videos 02 and 03. The SEM-

Partial Least Square (PLS) bootstrapping model fit analysis indicates that the d_ULS and d_G (Saturated and Estimated

model) values of videos 02 and 03 are in the supported range of SEM-PLS model fit. The Normed Fit Index (NFI) for

both types of video content are well supported (acceptable value between 0 and 1), which is 0.609 and 0.694 for structural

and branching videos, respectively.

After three months of the experiment, a delayed survey (post-experiment feedback survey) was also conducted

with the same 36 participants to investigate their remembering status and preferences of the Learning Objects (LOs) to

be learned next, with an aim to assist students in their learning process providing necessary recommendations.

Accordingly, the research first used students’ repetition data from the experiment to create group students’ repetitions

maps and single students’ learning path visualization maps. After that, it compared the group repetition data with the

delayed survey results for videos 02 and 03. The delayed survey data were also used to calculate the accuracy of the

proposed recommendation algorithms. Results showed that, in both video conditions, the number of repetitions had better

consistency than the delay survey feedback results. In addition, the number of repetitions of LOs had a significant impact

on the groups as well as the individual learning process. Hence, the research emphasizes that the LOs which received a

greater number of views, the system might recommend such LOs both for the individual or group of students from cold

standby and hot standby recommendation approaches. Finally, the research provided a couple of mathematical algorithms

Abstract of Dissertation

to create necessary recommendations for the groups as well as each individual student considering different conditions in

their learning process.

Part 2: Research Purpose

Academic Level or Basic System Functions: The research created the platform, developed content under different content

structures, added essential support functions to the videos, and allowed students to access the platform and learn from the

video domain. With the assistance of VART, the domain model displays the important contents, important video parts

with the local indexes, and students watch some videos as retention and the system gets the students’ model based on

watching history data. The VART then assists the e-teaching strategy model in receiving and combining data from the

students' model and the domain model and knowing the learner's attention and retention process. Based on the watching

and learning behavior data, the system determines instant feedback and recommendation to the students.

Originality and Novelty: The research designed, developed and integrated the three essential models with VART

especially for the video-based DSS training and learning point of view which is considered as the new addition in this

research. It used and integrated different tools with Moodle LMS and H5P interactive content plugins to create new

learning support systems that are vital for the students to acquire disaster survival knowledge and skills. These new

functions include three different types of content structuring systems, virtual fragmentation of long videos, adding local

indexes for inside video parts, unifying scattered short videos, adding interactions and mini-tests inside videos to realize

students’ skill learning outcomes, tracking students’ watching learning behavior including attention, and retention process,

creating group students’ repetition maps, individual students’ learning path visualization maps, and provide necessary

recommendations to the group and individual students, and so on to assist them in their learning process. In addition, the

approach of providing DSS training using selected video contents and fragmentation of the long videos into meaningful

chunks in small, highly focused materials in the structured video has similarities to microlearning. The branching scenario

lessons also have the same approach. From the microlearning point of view, the research has included a new function to

navigate retention for the important parts of the videos, which added external values in the content structuring system.

Possibilities: The proposed VART system helps to overcome the existing limitations in online video-based DSS learning,

and supports students acquiring the necessary DSS skills in a self-directed learning manner. The research experimented

to identify the appropriate content structuring system which is one of the important findings as well as contributions of

the experiment that might help learners and educators/instructors select appropriate video-based content structure in the

self-directed video and animation-based teaching, training, and learning. Besides, the summative assessment indicates a

significant improvement in students’ learning behavior and learning outcome. The repetition maps and learners' learning

path visualization map developed in this research assists both the learners and the educators to visualize the learners'

interactions with the contents with a quick look. This system is developed to monitor the learning progress of each student

and assist them in adjusting to the content structure dynamically. If a learner misses any important content from the

content list, the maps can indicate that missed part/parts. Similarly, after viewing the learning path map, learners can

avoid the contents that he/she has already mastered. So, it is expected that the experiment results and research outcome

might help the instructors/trainers to design and develop a proper online platform or system to provide DSS training

among international students in Japanese universities in a collaborative manner. In addition, this kind of video-based

learning support system might be easily adaptable and implementable to design and deliver diverse types of skill-based

Abstract of Dissertation

training courses among different groups of learners. Hence the range of the research outcome might be extended to many

other domains.

Keywords: Disaster Survival Skill (DSS), Video Aided Retention Tool (VART), Content Structuring System, H5P,

Video-based Learning.

Part 3: Research Accomplishment

International Journal Articles:

1. S. Sagorika and S. Hasegawa, “Design of Video Aided Retention Tool for the Health Care Professionals in Self-directed Video-based Learning,” Turkish Online Journal of Distance Education, vol. 21, no. Special Issue-IODL, pp. 121–134, 2020.

2. S. Sagorika and S. Hasegawa, “Development and Identification of Content Structuring System for Video-based Disaster Survival Skill Training Among the International Students” Journal of Educational Technology Research and Development. (On submission)

3. S. Sagorika and S. Hasegawa, “Development of Data-driven Learning Path Visualization Map and

Recommendation System in Video-based Disaster Survival Skill Training” Journal of Research and Practice in Technology Enhanced Learning. (On submission)

International Conference Proceedings:

Manuscripts Related to Major Research

4. S. Sagorika and S. Hasagawa, “Model of Video Aided Retention Tool for Enhancing Disaster Survival Skills on Earthquake among International Students,” in 28th International Conference on Computers in Education, 2020, vol. II, p. 215-225, November 2020, Tokyo, Japan.

5. S. Sagorika and S. Hasegawa, “Video Aided Retention Tool for Enhancing Decision-Making Skills Among Health Care Professionals.,” in International Open and Distance Learning Conference Proceedings Book, 2019, p. 305-312. IODL-2019, Eskişehir, Turkey.

6. S. Sagorika and S. Hasegawa, “Designing a Soft-skill Cultivation Platform for Health Care Professionals (HCPs),” in The 13th International Conference on Knowledge, Information and Creativity Support Systems (KICSS 2018), p. 212-217, 2018. Pattaya, Thailand: Artificial Intelligence Association of Thailand. Manuscripts Related to Minor Research

7. S. Sagorika, S. Kawanishi, and S. Hasegawa, “Adopting Orphan Migrants in Japanese Elderly Care Services: A Diversified Model for Ishikawa Prefecture”, in the 6th Asian Conference on Aging & Gerontology 2020 (AGen2020) Conference Proceedings, p.13-18, 2020. The International Academic Forum (IAFOR) 2020, Tokyo, Japan.

Construction A and D’ Lattices

for Power-Constrained Communications

Fan Zhou1720012

Information ScienceBits of Information, Transmitted and Stored (BITS) Lab

Research Content

In communications we are interested in how to reliably transmit information throughan unreliable channel. The information could be a text message, a piece of audio or somedata stored in a computer. An unreliable channel is noisy medium physically passingthe information from a point to another point, or saving the information now and re-trieving it later, such as Wi-Fi, an optical fiber, a magnetic disk drive and so on. In1948, Claude E. Shannon published a seminal paper entitled “A Mathematical Theoryof Communication”, in which he established the fundamental theorem for point-to-pointcommunications, and addressed that information can be efficiently and reliably transmit-ted by coding. Let the units information bits per channel bit denoted by R, called thecode rate. Shannon defined the maximum amount of information a channel can carry asthe channel capacity C, and showed that if R < C such codes exist to achieve reliablecommunications. Conversely, if the code rate R is greater than the channel capacity C,it is not possible to have reliable transmissions.

Error-correcting codes can provide reliable communications over unreliable channels,and are so named because they correct the errors that occur during transmission. Error-correcting linear codes are mainly defined in finite fields Fn

q with dimension k and blocklength n = k/R, while a lattice is defined as a discrete additive subgroup of the n-dimensional real space (or Euclidean space) Rn. A vector representing information con-sists of k symbols where each symbol is an element of {0, 1, . . . , q− 1}, and is mapped toa codeword of length n by a generator matrix. The set of all codewords is called a code-book/constellation with size qk. A lattice information integer vector of length n is mappedto a lattice point of length n. The set of lattice points consists of all linear combinationsof generator vectors and thus is infinite. In channel coding, a code is mainly measured bytwo properties: error-correction capability and code rate. The error-correction capabilityprovides reliable transmission, and high code rate allows larger amount of data transmis-sion per unit. Lattice codes can provide high code rates because they are constructed byan alphabet of size larger than that of the finite field codes.

The signals containing data are sent to a channel by the transmitter. The normof a signal is called the signal power, which determines how much power is required fortransmission. In practice the transmitter never has an infinite power, and thus the averagetransmit power shall be constrained. The n elements of a signal lie within a sphere ofradius

√nP around the origin, where P defines a power constraint. In a noisy channel,

the power of the noise is determined by the variance and mean of the distribution. Themost important noisy channel to consider is the additive white Gaussian noise (AWGN)channel, where the noise satisfies a Gaussian distribution. The ratio between the signal

1

power and the noise power is called the signal-to-noise ratio (SNR). For high code rate,transmission needs high SNR.

Assume an arbitrary sequence of data. To transmit the sequence, its elements needto be converted into a signal with the form appropriate for transmission in the channelsuch that transmitted signals satisfy the power-constraint and have zero mean. This canbe performed by a modulation technique. In modern communication systems, digitalmodulations are used, where the constellation consists of discrete points.

One of the widely used digital modulations is the quadrature amplitude modulation(QAM), which works well at low SNR with well-known finite field codes such as low-density parity-check (LDPC) codes, convolutional codes, Turbo codes, Polar codes andBCH codes. But QAM modulation scheme without probabilistic shaping on signals cannotachieve the AWGN channel capacity at high SNR. This is because they do not produceGaussian-like (or hypersphere-like) constellations, and such a constellation is essentialfor approaching the capacity of the channel when SNR is high. Lattices can providebetter constellations than QAM by applying lattice geometric shaping. In addition, atlow dimensions the QAM modulation scheme applied to finite field codes can also beregarded as a lattice constellation. One benefit provided by shaping on constellations iscalled the shaping gain, which measures the effectiveness of power-constraint and thusis higher the better, whose theoretic limit is 1.53 dB of a hypersphere constellation asn→∞.

Lattices are discrete points in Rn. The set of points in Rn that have the closestdistance to a lattice point than to any other lattice points is called the Voronoi region,thus a lattice point is at the center of the Voronoi region. Due to the discreteness andsymmetry of lattices, if a Voronoi region is shifted by every lattice point, the union of theshifted Voronoi regions cover the whole space of Rn. More importantly, a lattice is aninfinite structure. For practical use, the signal power must be constrained, thus a finiteset of points of the lattice must be selected, e.g., the intersection of the lattice and someregion. And this can be accomplished by lattice geometric shaping using the zero-centeredVoronoi region of some lattice, combining coding with modulation. The lattice performingshaping is called a shaping lattice, which needs to be a subset of the coding lattice, i.e., thelattice used for coding. The resulting intersection as a set of lattice points of the codinglattice that lie in the zero-centered Voronoi region of the shaping lattice is a nested latticecode, also known as a Voronoi code/constellation. Under this coding scheme, the codinglattice corrects errors while the shaping lattice satisfies the power constraint and providesthe shaping gain—the nested lattice code provides high code rate.

The simplest lattice is the one-dimensional integer lattice consisting of every integer asa lattice point. In the literature, several low-dimensional lattices are well-known especiallyfor their good shaping gain of Voronoi regions, e.g., the E8 lattice, the BW16 lattice andthe Leech lattice, where the decoding algorithms for BW16 and Leech lattice are not asefficient as decoding/quantizing E8. LDLC lattices, as an analog to LDPC codes, havegood coding properties but require a high decoding complexity. Lattices can also be builtfrom linear codes, using several methods such as Construction A, D, and D’. Applyingthese methods is to lift the codebook of linear codes from finite fields to the real space,and the resulting lattices are called Construction A lattices, Construction D lattices, andConstruction D’ lattices, respectively. Construction D and D’ are applied to a family ofnested binary linear codes. Construction A is the one-level special case of D suitable for

2

Table 1: Prototype matrix of first-level QC-LDPC code with length 2304.-1 -1 53 -1 15 56 -1 -1 55 35 -1 8 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1-1 -1 26 -1 -1 51 -1 59 14 -1 16 -1 0 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -118 -1 3 -1 -1 82 42 -1 33 -1 -1 -1 -1 0 0 -1 -1 -1 -1 -1 -1 -1 -1 -1-1 30 73 53 -1 49 -1 -1 8 -1 -1 -1 -1 -1 0 0 -1 -1 -1 -1 -1 -1 -1 -1-1 67 -1 15 84 -1 -1 -1 -1 -1 -1 -1 -1 3 -1 82 0 -1 -1 -1 -1 -1 -1 -1-1 -1 -1 -1 -1 71 83 34 -1 -1 -1 -1 -1 0 -1 -1 25 0 -1 -1 -1 -1 -1 -1-1 -1 -1 -1 -1 -1 8 27 87 -1 -1 -1 0 -1 -1 -1 -1 59 0 -1 -1 -1 -1 -1-1 -1 -1 -1 -1 -1 -1 -1 91 -1 62 52 -1 -1 -1 0 -1 -1 6 0 -1 -1 -1 -1-1 -1 -1 -1 -1 -1 -1 -1 -1 11 5 17 -1 -1 0 -1 -1 -1 -1 12 0 -1 -1 -1-1 -1 2 43 53 -1 -1 -1 -1 -1 -1 -1 -1 -1 73 -1 -1 -1 -1 -1 34 0 -1 -154 -1 26 -1 -1 12 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 9 0 052 91 -1 -1 -1 -1 -1 -1 -1 38 -1 -1 13 -1 -1 -1 -1 -1 -1 -1 -1 -1 66/71∗ 0

Table 2: Prototype matrix of second-level QC-LDPC code with length 2304.54 67 26 15 84 12 8 27 87 11 5 17 0 3 0 82 0 59 0 12 0 9 0 052 91 2 43 53 71 83 34 91 38 62 52 13 0 73 0 25 0 6 0 34 0 66/71∗ 0

an arbitrary linear code, which can produce lattices good for quantization but is generallytricky for good coding properties unless applied to a nonbinary code. Construction A andD use a generator matrix while D’ uses its inverse matrix called a parity-check matrix.There are some applications such as convolutional code lattices based on convolutionalcodes and Construction A, BCH code lattices based on BCH codes and Construction D,LDPC code lattices based on LDPC codes and Construction D’.

The goal of this work is to develop a nested lattice code with the following prop-erties: good error-correction capability, high shaping gain, and low-complexity encod-ing/decoding. This dissertation designed nested quasi-cyclic (QC)-LDPC codes as givenin Table 1 and Table 2. Using these codes, a Construction D’ lattice was built for coding,and was evaluated using four distinct lattices for shaping. For finding convolutional codelattices with a good tradeoff on shaping gain and quantization complexity, an exhaustivesearch was performed over convolutional code generator polynomials and the numericalresults are shown in Figure 1. The gap between the black and colored curve in Figure 2is approximately the shaping gain achieved for the corresponding nested lattice code.

24 36 72 144 288 576 1152 2304

0.80

1.00

1.20

1.40

1.50

E8

BW16

Leech

Sphere bound

Asymptotic 1.53 dB

Dimension n

Shap

inggain,dB

m “ 2 m “ 3 m “ 4 m “ 5 m “ 6 m “ 7RZTCC “ 1{2RZTCC “ 1{3RTBCC “ 1{2RTBCC “ 1{3

Figure 1: Best-found shaping gain of convolu-tional code lattices.

37.5 38 38.5 39 39.5 40 40.5 41 41.510´5

10´4

10´3

10´2

10´1

100

Shannon limit

Eb{N0, dB

WordError

Rate

R “ 8.2947,CCL shaping

R “ 8.3090,Leech lattice shaping

R “ 8.2959, BW16 lattice shaping

R “ 8.2993, E8 lattice shaping

R1 “ 8.2993, hypercube shaping

Figure 2: Four distinct lattices shaping a Con-struction D’ lattice.

Research Purpose

Lattices have the potential to provide reliable and power-efficient data transmission inthe next-generation wireless communications. Information theory has provided remark-able insights into lattices and their applications for practical communication systems. Thebenefits of lattices for communications are: 1) high code rate 2) higher transmit powerefficiency than conventional QAM constellations and 3) they form an essential compo-nent of compute-and-forward relaying, which provides high throughput and high spectralefficiency.

If lattice codes are to be used in practical point-to-point wireless communication sys-tems, what would they look like? This dissertation suggests that the coding lattice wouldbe based on QC-LDPC codes because these codes are already widely used in wireless com-

3

munication systems. The shaping lattice would be based on convolutional codes, becausethey provide good shaping gain and are widely understood.

This dissertation addresses the design of nested lattice codes with potential propertieswhich have good coding properties, good shaping properties, and have low-complexityencoding and decoding. Construction D’ lattices based on QC-LDPC codes are for cod-ing and thus contribute to reliable data transmission. Construction A lattices based onconvolutional codes are used to satisfy the channel power-constraint and provide shapinggain. These constructions have group property and provide high code rates.

Two encoding methods and a decoding algorithm for Construction D’ coding latticesthat can be used with shaping lattices for power-constrained channels are given. Themultistage decoding algorithm uses successive cancellation by employing binary decodersof the component binary codes that form a Construction D’ lattice. An indexing methodfor nested lattice codes is modified to avoid an integer overflow problem at high dimension.Convolutional code generator polynomials for Construction A lattices with the greatestshaping gain are given, the result of an extensive search. It is shown that rate 1/3convolutional codes provide a more favorable performance-complexity trade-off than rate1/2 convolutional codes. For a given dimension, tail-biting convolutional codes have highershaping gain than that of zero-tailed convolutional codes. A design for QC-LDPC codesto form Construction D’ lattices is presented, where their parity-check matrices can beeasily triangularized, thus enabling efficient encoding and indexing when formed a nestedlattice code. The resulting QC-LDPC Construction D’ lattices are evaluated using fourshaping lattices: the E8 lattice, the BW16 lattice, the Leech lattice and the best-foundconvolutional code lattice, showing a shaping gain of approximately 0.65 dB, 0.86 dB,1.03 dB and 1.25 dB at dimension 2304.

Research Accomplishment

• F. Zhou and B. M. Kurkoski, “Construction D’ lattices for power-constrained com-munications,” submitted to IEEE Transactions on Communications, available atarXiv:2103.08263 [cs.IT]. Under review—major revision.

• F. Zhou, A. Fitri, K. Anwar, and B. M. Kurkoski, “Encoding and decoding Con-struction D’ lattices for power-constrained communications,” in Proceedings of the2021 IEEE International Symposium on Information Theory, Melbourne, Australia,July 2021, pp. 1005–1010. Reviewed.

• F. Zhou and B. M. Kurkoski, “Shaping gain of lattices based on convolutional codesand Construction A,” in Proceedings of the 2018 International Symposium on Infor-mation Theory and its Applications (ISITA), Singapore, October 2018, pp. 183–187.Reviewed.

• F. Zhou and B. M. Kurkoski, “On low-dimensional convolutional code lattices whichare good for shaping,” Croucher Summer Course in Information Theory, HongKong, China, July 2017. Reviewed.

• F. Zhou and B. M. Kurkoski, “Shaping LDLC lattices using convolutional codelattices,” IEEE Communications Letters, vol. 21, no. 4, pp. 730–733, April 2017.Reviewed.

4

Doctoral Dissertation

Incorporating the Locality Sensitive Hashing

Technique into Clustering Algorithms for Massive

Categorical and Mixed Datasets

Nguyen Mau Toan

1620409

Supervisor: Professor Huynh Van Nam

Graduate School of Advanced Science and Technology

Japan Advanced Institute of Science and Technology

(Information Science)

December 2021

Abstract

Part 1: Research Content

The dissertation aims to develop a novel clustering framework for handling massive datasets of

both categorical and numerical attributes. The proposed framework focusses on optimizing the

clustering initialization and reducing complexity at the same time. In detail, the dissertation

contains 6 chapters, and the contents of chapters can be briefly described as follows:

In chapter 1, we introduce the research background and scope the problems of the

conventional clustering algorithms for big categorical and mixed datasets. We also briefly state

the motivations, research objectives, and the contributions of this dissertation.

In chapter 2, we state the problems of crisp clustering problem for mixed data of

numerical and categorical data. There are two main processes of a k-means-like algorithm:

clustering initialization and clustering iteration. Furthermore, three approaches that can optimize

the clustering algorithms were also introduced, namely clustering representation optimization,

cluster initialization optimization, and cluster iteration optimization. It is clear that clustering

representation optimization and cluster initialization optimization slow down the algorithm

because of complex structure and extra computation. Using clustering iteration optimization can

affect to reduce the complexity of the clustering algorithm.

In chapter 3, we show the necessary preliminaries that are used in our research. In detail,

the principles of LSH is applied to predict the semi-clusters while context-sensitive measures and

maximum-cut solution help to achieve the good LSH hash functions. Moreover, the principle,

advantages, and disadvantages of the related research with this research are also briefly shown in

this chapter.

Chapter 4 proposes LSH-k-prototypes for clustering mixed datasets with the utilization of

LSH to predict the potential natural clusters. The shortlists of neighbor clusters are also used to

reduce the required computations in each iteration. Furthermore, LSH-k-representatives is the

modification of LSH-k-prototypes, which works for categorical values only.

LSH-k-representatives and LSH-k-prototypes can predict the initial clusters based on the

locality-sensitive factors of objects. In detail, the larger LSH buckets are predicted as the core

initial clusters. As a result, LSH-k-representatives and LSH-k-prototypes have the highest

clustering effectiveness scores among other related works. By applying the dynamic shortlists for

clusters, LSH-k-representatives can increase the clustering speed up to from 2 to 32 times

compared to its original method. Beside these advantages, because our proposed methods come

with the process of building the LSH hash table for all attributes, the total time for predicting

initial clusters of our methods is higher than others. Especially, when creating the hash function

for the attribute with a massive number of unique categorical values, our methods take more time

to calculate the dissimilarity matrix and find the maximum cut. We recommend future research

to use other partitioning techniques to subdivide these categorical domains into multiple

two-subsets; for example, a k-means-like algorithm can be utilized for this task.

Chapter 5 proposes two methods so-called Fk-centers and LSHFk-centers. Fk-centers is

the first method to apply the fuzzy kernel-based representation (fcenter) into the fuzzy clustering

algorithm of categorical data. Moreover, LSHFk-centers is the method that incorporates our

LSH-based cluster prediction techniques into LSHFk-centers algorithm. Fk-centers has better

representation than other related works, which can exploit the observed information of

categorical values following the complementarity balance of the importance of each unique

categorical value in each fuzzy cluster. As the result, Fk-centers has a higher average fuzzy

silhouette score than other related works. When applying the modified version of our LSH-based

cluster prediction technique into Fk-centers, LSHFk-centers can even have higher average fuzzy

silhouette scores with extremely high stability. However, a hash function is currently extracted

from a single attribute; therefore, the inter-attribute properties have not been untapped. other

research may try different techniques to compute the optimal values for the smoothing

parameters other than using LSCV. Thereby, it can either reduce the computation time or

increase the accuracy of smoothing parameters.

Chapter 6 yields the conclusion, limitations, and future works for the whole dissertation.

Part 2: Research Purpose

The research includes two main purposes, the first purpose is to propose a novel technique to

predict the potential cluster centers for k-means-like algorithms while the second purpose is to

propose a new approximate method to reduce the computation of clustering iteration.

For the first purpose, A better prediction can lead to a higher chance to capture the global

optimal solution. To achieve a better or the global solution, the clustering algorithm must seek

the solutions several times with different initial states. Therefore, with better initializations, the

number of required epochs can be reduced significantly for capturing the global optimum. For

this reason, a “good” initial state is very important to achieve the global optimum. In this

research, we first aim to propose a new scheme to use a dimension reduction technique so-called

Locality-Sensitivity Hashing (LSH) to predict such “good” initial state of the cluster so that the

global optimal can be potentially obtained. The empirical experiment using real and synthetic

datasets showed that our proposed method LSH-k-representatives and LSH-k-prototypes not only

can outperform other related works in terms of clustering accuracy but also have the best

consistency for clustering categorical and mixed data, respectively. However, the proposed

LSH-based cluster prediction requires extra processes in order to create the LSH hash table,

which makes the proposed method not ready for handling big data yet.

For the second purpose, clustering iterations are the heaviest task in k-means-like

algorithms. This task includes the distance computation from objects to cluster representations.

Reducing these distance computations can sharply decrease the complexity of the clustering

algorithm. Dimension reduction and data sample are the most used techniques that can

approximate the clustering procedures. However, these approaches change the nature of the data

instead of changing the algorithm to make it more appropriate. This dissertation also fills such

shortcoming by proposing a new heuristic approach for approximately reducing the complexity

of a typical k-means-like algorithm. In detail, the proposed method can avoid the potential

unnecessary distance computations from objects to cluster representations in each iteration.

Consequently, after applying our proposed method into LSH-k-representatives, the incorporated

algorithm can process up to 2 to 32 times faster than its own original version with comparable

clustering accuracy.

Part 3: Research Accomplishment

The accomplishment of this work is presented from perspectives of theoretically and

practically for the crisp clustering and fuzzy clustering problems. This accomplishment is being

demonstrated by the publications in both international journals and international conferences as

follows:

International Journals:

[1] T. N. Mau and V.-N. Huynh, “An LSH-based k-Representatives Clustering Method for Large

Categorical Data.” Neurocomputing, Volume 463, 2021, Pages 29-44, ISSN 0925-2312, DOI:

https://doi.org/10.1016/j.neucom.2021.08.050, (peer review).

[2] T. N. Mau, Y. Inoguchi and V.-N. Huynh, “A Novel Cluster Prediction Approach based on

Locality-Sensitive Hashing for Fuzzy Clustering of Categorical Data.” IEEE Access, (To be

submitted).

International Conferences:

[1] T. N. Mau and V. -N. Huynh, “Kernel-Based k-Representatives Algorithm for Fuzzy

Clustering of Categorical Data,” 2021 IEEE International Conference on Fuzzy Systems

(FUZZ-IEEE), 2021, pp. 1-6, DOI: https://doi.org/10.1109/FUZZ45933.2021.9494597, (peer

review).

[2] T. N. Mau and V. -N. Huynh, “Automated Attribute Weighting Fuzzy k-Centers Algorithm

for Categorical Data Clustering,” Modeling Decisions for Artificial Intelligence. MDAI 2021.

Lecture Notes in Computer Science, vol 12898. Springer, Cham 2021, pp. 205-217, DOI:

https://doi.org/10.1007/978-3-030-85529-1_17, (peer review).

Keywords: Cluster analysis, fuzzy cluster analysis, categorical data, mixed data, LSH, cluster

initialization, big data.


Recommended