Project post
I need an experienced data scientist or AI specialist to assist with evaluating a Korean dataset using OpenAI API calls. My goal is to set up a good testing template that I can reuse.
Key Aspects:
- Inference Speed: Balanced inference speed is vital for this project. The target is a moderate inference speed under a load of 2000 users.
- Total Processing Accuracy (TPA): The total processing accuracy of this evaluation should be high.
- Balance: Total processing accuracy and inference speed carry equal importance. (<1 /s is imperative)
- Implement this on a shared Google Colab or Jupyter Notebook for on-demand use of the application.
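One way to check the speed target from a Colab cell is to time each call and look at the median and 95th-percentile latency. This is a minimal sketch, not the project's actual harness: `fake_model` is a hypothetical stand-in for the real API call, and reading the "<1 /s" target as a one-second budget per response (`budget_s=1.0`) is an assumption. It runs sequentially, so it does not reproduce a 2000-user load; concurrent load would need a tool such as JMeter.

```python
import math
import statistics
import time
from typing import Callable, Dict, List


def measure_latencies(call: Callable[[str], str], prompts: List[str]) -> List[float]:
    """Return the wall-clock seconds spent in `call` for each prompt."""
    latencies = []
    for prompt in prompts:
        start = time.perf_counter()
        call(prompt)
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies: List[float], budget_s: float = 1.0) -> Dict[str, object]:
    """Summarize latencies and flag whether p95 stays under the budget."""
    if not latencies:
        raise ValueError("latencies must not be empty")
    ordered = sorted(latencies)
    # Nearest-rank 95th percentile.
    p95 = ordered[max(0, math.ceil(0.95 * len(ordered)) - 1)]
    return {
        "count": len(ordered),
        "mean_s": statistics.mean(ordered),
        "p50_s": statistics.median(ordered),
        "p95_s": p95,
        "within_budget": p95 < budget_s,  # assumed budget, see lead-in
    }


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for the real model or API call."""
    return prompt[::-1]


if __name__ == "__main__":
    timings = measure_latencies(fake_model, ["안녕하세요", "오늘 날씨 어때요?"])
    print(summarize(timings))
```

Swapping `fake_model` for a function that sends the prompt to the model under test keeps the rest of the cell unchanged.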
Ideal Skills and Experience:
- Proficiency in Korean language and understanding of language nuances.
- Previous experience with data evaluation using OpenAI API calls.
- Proven track record in achieving optimal inference speeds for data processing.
- Demonstrated ability to balance and optimize processing accuracy alongside inference speed.
- Strong communication skills to provide regular updates on the project’s progress.
Refer to this documentation: https://python.langchain.com/docs/langsmith/walkthrough/
Request access to the example document, where all the testing data and instructions should be maintained and updated: https://docs.google.com/document/d/16GFgABbbnH7RXgL56SXNu41xRspstgtVcy0TQxmwd-A/edit?usp=drive_link
API example: https://api-engtoprod.meta-wedit.com/api-docs#/USER%20API/UsersController_perpectConversation
Deliverables:
- The test procedure as described and per our discussion, in the format of the sample Google Doc.
- A set of scripts: JMeter configs, and Python for the accuracy and inference tests (hopefully on Colab).
- A working PC implementation to run the test, on a machine and a VM (dockerized), to replicate the same testing environment repeatedly.
- Full documentation of the test procedure: steps, screenshots, reference links, running tests, and setups.
Revised requirements
Key responsibilities:
- Conducting performance tests on LLMs.
- Analysing the results and comparing them to identify the best performing system.
- Providing recommendations on how to optimize the performance of the chosen LLM.
- Set up a local testing environment (chatbot, playground dashboard, JMeter 5.5)
- Set up a cloud server (API server)
- Google Colab, Jupyter Notebook
- All configuration and test scripts (Python, bash, and exe)
- Dockerized images for future implementation
- Our Korean dataset of 20,000 published to a HuggingFace workspace, based on an existing Korean dataset
- Documentation: 1. testing items/procedures 2. instructions and manuals for this project.
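Comparing systems to find the best performer can be reduced to a small selection rule once each candidate has an accuracy and a latency figure. The sketch below is one possible rule, not an agreed procedure: it keeps candidates whose 95th-percentile latency is under a budget, then picks the most accurate, breaking ties by lower latency. The `ModelResult` type, the `pick_best` helper, the one-second default budget, and the placeholder names and numbers in the usage lines are all assumptions for illustration, not measurements.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelResult:
    name: str
    accuracy: float       # fraction of correct answers, 0..1
    p95_latency_s: float  # 95th-percentile latency in seconds


def pick_best(results: List[ModelResult],
              latency_budget_s: float = 1.0) -> Optional[ModelResult]:
    """Return the most accurate result within the latency budget, or None."""
    eligible = [r for r in results if r.p95_latency_s < latency_budget_s]
    if not eligible:
        return None
    # Higher accuracy wins; on equal accuracy the faster model wins.
    return max(eligible, key=lambda r: (r.accuracy, -r.p95_latency_s))


if __name__ == "__main__":
    # Placeholder values only, not real benchmark results.
    candidates = [
        ModelResult("model-a", accuracy=0.90, p95_latency_s=1.5),
        ModelResult("model-b", accuracy=0.85, p95_latency_s=0.7),
    ]
    print(pick_best(candidates))
```

Treating latency as a hard cut-off and accuracy as the ranking key is a design choice; a weighted score would be an alternative if the two are meant to trade off smoothly.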
Models to consider:
llama3-8b-8192
text-davinci-003
Mixtral 8x22b
Whisper
Reference:
https://pf7.eggs.or.kr/aigenerative_overview.html
Sample testing items (please send me a permission request with your name)
https://docs.google.com/document/d/16GFgABbbnH7RXgL56SXNu41xRspstgtVcy0TQxmwd-A/edit
Project resources
googleDrive
overview
| TestItems | uconc | q&aDataset / hfConverse | hfQ&A / hfQ&A2 | projectBlog | hfQ&A3
datasets
hfDataset: check under metrics for various testing items
HuggingFace Korean datasets:
- Everyday conversation (일상대화)
- Various Q&A 1 (다양한 질의답1)
- Naver Knowledge iN Q&A (네이버 지식인 질의답)
- Various Q&A 2 (다양한 질의답2): https://huggingface.co/datasets/unoooo/alpaca-korean
our dataset
other models
Models to consider: llama3-8b-8192 text-davinci-003 Mixtral 8x22b Whisper
AWS server
WindowsServer port: on MS RemoteDesktop; its instance is accessible only via the AWS console

Windows Server
Setup, JMeter
Creds and IP
mlOps
[JupyterNotebook]
LLM candidates
Models to consider: | llama3-8b-8192 | text-davinci-003 | Mixtral 8x22b | Whisper |
misTral: Mistral LLM API docs
groqCloud and playgroundConsole
ncSoftVaroq: Korean LLM by NCSoft, based on VARCO LLM and Kendra
gpt4all: locally hosted open-source chatbot.
Day 1, Day 2, Day 3, Day 4, Day 5, Day 6
Day1
- Shared creds with freelancers
- Tasks assigned among the group
- AWS Windows Server instance
- Completed the testing PC on the server
Day2
- Application server
- API server
- Library install and setup
- Download LLM models
- Write scripts
Day3 (today)
- Write scripts using quantization of LLM models
- Set up the application server
- Python library issue
- Model candidates per quantization
- Flask app installed
performance test
metrics
perplexity, accuracy, wer
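To make two of these metrics concrete, here is a minimal pure-Python sketch of exact-match accuracy and word error rate (WER). It is an illustration only, not the scoring code used in this project: the helper names are hypothetical, and WER is computed on whitespace-separated tokens, which is a coarse choice for Korean. Passing the strings themselves to `edit_distance` gives a character-level count, which is the basis of a character error rate.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token (or character) sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def exact_match_accuracy(references: List[str], predictions: List[str]) -> float:
    """Fraction of predictions equal to their reference after trimming spaces."""
    if not references or len(references) != len(predictions):
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(r.strip() == p.strip() for r, p in zip(references, predictions))
    return hits / len(references)


if __name__ == "__main__":
    print(wer("오늘 날씨 가 좋다", "오늘 날씨 좋다"))
    print(exact_match_accuracy(["네", "아니오"], ["네", "아니요"]))
```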
├───metrics
│ ├───accuracy
│ ├───bertscore
│ ├───bleu
│ ├───bleurt
│ ├───cer
│ ├───chrf
│ ├───code_eval
│ ├───comet
│ ├───competition_math
│ ├───coval
│ ├───cuad
│ ├───exact_match
│ ├───f1
│ ├───frugalscore
│ ├───glue
│ ├───google_bleu
│ ├───indic_glue
│ ├───mae
│ ├───mahalanobis
│ ├───matthews_correlation
│ ├───mauve
│ ├───mean_iou
│ ├───meteor
│ ├───mse
│ ├───pearsonr
│ ├───perplexity
│ ├───precision
│ ├───recall
│ ├───roc_auc
│ ├───rouge
│ ├───sacrebleu
│ ├───sari
│ ├───seqeval
│ ├───spearmanr
│ ├───squad
│ ├───squad_v2
│ ├───super_glue
│ ├───ter
│ ├───wer
│ ├───wiki_split
│ ├───xnli
│ └───xtreme_s
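Perplexity appears in the list above. As a rough illustration of what it measures, the sketch below computes it from per-token log-probabilities as the exponential of the average negative log-likelihood. It assumes the model or API under test can return natural-log token probabilities; how to obtain them is not covered here, and the `perplexity` helper is hypothetical rather than part of any library.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp(-mean(log p(token))) using natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("token_logprobs must not be empty")
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)


if __name__ == "__main__":
    # A model that gives every token probability 0.25 has perplexity of about 4.
    print(perplexity([math.log(0.25)] * 4))
```

Lower values mean the model found the text less surprising; a value of 1 would mean every token was predicted with certainty.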