[recruit] SKT AI Fellowship: Call for Applications, 3rd Cohort

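  • Details: https://www.sktaifellowship.com/
  • Eligibility
    • Any undergraduate or graduate student with an interest or experience in AI technology development
    • Apply as a team of two or three members
  • Benefits
    • Preference in the document screening stage when applying for regular employment
    • Mentoring from SKT developers working in the field
    • Research funding of KRW 6 million for carrying out the project
    • Prize money for outstanding final projects: KRW 4 million
      (top team KRW 4 million, runner-up team KRW 3 million)
    • Invitations to major SK Group technology events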
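  • Key dates
    • Application period: 04/15 (Thu) – 05/16 (Sun), until midnight
    • Document screening results: 05/25 (Tue)
    • Online interview/presentation review: 05/31 (Mon)
    • 3rd cohort orientation: 06/04 (Fri)
    • Midterm review: August
    • Final project presentations: early November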
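  • Research topics
    • Development of new Vision AI application technology for service robots
    • Development of an intelligent data search engine
    • Development of technology to detect fake videos generated by GANs
    • Development of anomaly detection for vibration/pressure/temperature sensors for Smart Factory services
    • Development of Vision AI application models based on 5GX MEC
    • Self-supervised Learning on Billion Unlabeled Image Data and Full Stack Product
    • Research on AI-based camera position estimation and billboard/signboard detection
    • Development of language processing applications based on KoBERT/KoGPT/KoBART
    • Development of automatic recognition technology for smart logistics based on Kinect data
    • Development of a multi-modal emotion recognition AI model
    • Development of AI-based restoration technology for old digital media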

[seminar] A review of on-device fully neural end-to-end speech recognition and synthesis algorithms

  • Speaker: Dr. Chanwoo Kim (Vice President, Samsung)
  • Format: Online (Webex)
  • Link: https://dongguk.webex.com/dongguk/j.php?MTID=m76990a14d544ddb14ad59f8ed6638d5d
  • Password: aixx
  • Abstract: In this talk, we review various end-to-end automatic speech recognition and speech synthesis algorithms and their optimization techniques for on-device applications. Conventional speech recognition systems comprise a large number of discrete components such as an acoustic model, a language model, a pronunciation model, a text normalizer, an inverse text normalizer, a decoder based on a Weighted Finite-State Transducer (WFST), and so on. To obtain sufficiently high speech recognition accuracy with such conventional systems, a very large language model (up to 100 GB) is usually needed. Hence, the corresponding WFST becomes enormous, which prohibits on-device implementation. Recently, fully neural end-to-end speech recognition algorithms have been proposed. Examples include speech recognition systems based on Connectionist Temporal Classification (CTC), the Recurrent Neural Network Transducer (RNN-T), Attention-based Encoder-Decoder (AED) models, Monotonic Chunk-wise Attention (MoChA), transformer-based speech recognition systems, and so on. The inverse process of speech recognition is speech synthesis, where a text sequence is converted into a waveform. Conventional speech synthesizers are usually based on parametric or concatenative approaches. Even though Text-to-Speech (TTS) systems based on concatenative approaches have shown relatively good sound quality, they cannot be easily employed for on-device applications because of their immense size. Recently, neural speech synthesis approaches based on Tacotron and WaveNet started a new era of TTS with significantly better speech quality. More recently, vocoders based on LPCNet require significantly less computation than WaveNet, which makes it feasible to run these algorithms on device. These fully neural systems require much smaller memory footprints than conventional algorithms.