Philip May on LinkedIn: #aicc #ihub #pix #vti #genai #llm #rag | 13 comments (2024)

Philip May

Data scientist and open source enthusiast with NLP focus @ Deutsche Telekom

We are a small group of AI experts in the Deutsche Telekom AICC (AI Competence Center). Our task is to train use-case-specific LLMs for the Telekom business domain. For example, we work with Mixtral and Llama models. We have started a new blog on our internal social media platform. There we share our insights, experiences and news. Our first article is about how we came across a semi-RAG system.

A semi-RAG system is not something you intentionally build or invent. It is a certain state of a RAG system where an LLM has to combine parametric knowledge (acquired through training) with prompt knowledge (from the knowledge DB).

One reason you need to do this is that user questions are so wide-ranging that it is extremely difficult to cover all of them in advance with finished texts in the knowledge database. The other reason is that your knowledge base may simply be too limited.

If you train your own use-case-specific open LLMs for such systems, then the training data must be designed in a different way. This was a very important realization on our way to successful on-premises LLMs.

If you are a Deutsche Telekom or T-Systems International employee, you can read all the details on our blog: https://lnkd.in/eiB4sPmy

And please do not hesitate to press the subscribe button on Yam-United if you want to receive updates. 😉

#AICC #iHub #PIX #VTI #GenAI #LLM #RAG
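
To make the idea of combining prompt knowledge with parametric knowledge a bit more concrete, here is a minimal sketch of how such a prompt could be assembled. The function name, the instruction wording and the prompt layout are assumptions for illustration only; they are not taken from the blog article and say nothing about how the training data is actually designed.

```python
def build_semi_rag_prompt(question: str, passages: list[str]) -> str:
    """Assemble a prompt that lets the model fall back on its own knowledge.

    Hypothetical helper: the wording and layout are illustrative assumptions.
    """
    # Number the retrieved passages so the model can tell them apart.
    context = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question. Use the context where it helps. Where the context "
        "does not cover the question, rely on what you learned during training.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

In a classic RAG prompt the instruction would usually restrict the model to the context only; the sketch differs in that it explicitly allows both knowledge sources.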

13 Comments

Dr. Hamed Ketabdar

GenAI Lead at Deutsche Telekom, Lecturer at TU Berlin

1d

Thanks Philip! Is Semi-RAG the same as, or related to, what is called 'Semi-Structured RAG' in the community?

Orkhan Amrullayev

Data Scientist | ML/LLM Engineer

1d

Is the website restricted outside of Germany? It says “Page is unavailable”.

Dr. Jan Philipp Harries

Taming LLMs @ ellamind

2h

Philip May, this is really cool, would love to read the blog. Great stuff that you and the team are doing 👍. BTW: Will you be able to join the 2nd #AIDEV2 on 9/24? I think this would fit very well 😉.

Shubham Kharola

Business Analyst at Deutsche Telekom Digital Labs

9h

Anand Saurabh

Vinzent Wuttke

Helping mid-sized global market leaders to bring ML into production | Leiter Business Development @ Datasolut

1d

Thanks for sharing with the community. That is amazing!

More Relevant Posts

  • Philip May

    Data scientist and open source enthusiast with NLP focus @ Deutsche Telekom

    I still remember very clearly how I trained a semantic bilingual German-English embedding model almost four years ago. Back then it was for T-Systems on site services GmbH. Nowadays, with the hype of #RAG, it is probably one of the most popular German-language open source models, measured by the more than one million downloads per month.

    This success fills me with joy and pride and also with some doubts. But why the doubts?

    I think it is important to understand that there is a big difference between semantic embeddings and Q/A retrieval models.

    Semantic embeddings can be used to cluster texts. Or, for example, to search large texts on the basis of a few keywords. However, they are less suitable for finding answers to questions.

    The reason is that the semantic similarity between a question and a text containing the answer does not necessarily have to be high. For this reason, Q/A retrieval models, which are trained to embed questions and potential answers close to each other, are the better fit for retrieval in RAG systems.

    I'm afraid many are using my semantic embedding as a replacement for a Q/A retrieval model in a RAG application. This should not be done.

    At Telekom, we use self-trained Q/A retrieval models for our RAG retrieval. We also have our own data sets for this. For other EU languages, by the way, we have had very good experiences with the intfloat/multilingual-e5-large model. Incidentally, this also works very well for German.

    - my semantic bilingual German-English embedding called "T-Systems-onsite/cross-en-de-roberta-sentence-transformer": https://lnkd.in/eSx6kc6m
    - the "intfloat/multilingual-e5-large" model: https://lnkd.in/e7aetC3t

    Deutsche Telekom #AICC #iHub #PIX #weloveai #VTI #GenAI #LLM
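
As a rough illustration of how a Q/A retrieval model differs in use from a plain semantic embedding, here is a minimal sketch around intfloat/multilingual-e5-large. Its model card asks for a "query: " prefix on questions and a "passage: " prefix on candidate texts. The helper names are hypothetical, the ranking is plain cosine similarity, and `embed_with_e5` assumes the sentence-transformers package is installed; none of this describes the self-trained Telekom models.

```python
import math


def e5_query(text: str) -> str:
    """Prefix a question the way the E5 model card asks for."""
    return f"query: {text}"


def e5_passage(text: str) -> str:
    """Prefix a candidate answer text the way the E5 model card asks for."""
    return f"passage: {text}"


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def rank_by_cosine(query_vec, passage_vecs):
    """Return passage indices sorted from best to worst, plus their scores."""
    scores = [cosine(query_vec, p) for p in passage_vecs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, [scores[i] for i in order]


def embed_with_e5(texts):
    """Embed already-prefixed texts (assumes sentence-transformers is installed)."""
    # Imported lazily so the helpers above work without the library.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("intfloat/multilingual-e5-large")
    return model.encode(texts, normalize_embeddings=True)
```

A typical call would embed `e5_query(question)` and a list of `e5_passage(text)` strings with `embed_with_e5` and pass the resulting vectors to `rank_by_cosine`.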

  • Philip May

    Data scientist and open source enthusiast with NLP focus @ Deutsche Telekom

    I made a systematic comparison of the pandas file formats, compression methods and compression levels. The comparison is based on the compression ratio and the save/load times.

    The article can be found here: https://lnkd.in/e6w7pWSX

    TL;DR:

    If you consider the compression method together with the compression level, zstd is the best option. This is especially true for compression levels 10 to 12.

    In terms of data format, Feather seems to be the best choice. Feather has a better compression ratio than Parquet. Up to a compression level of 12, the save times of Parquet and Feather are practically the same. The load times of Feather are significantly better than those of Parquet.

    For these reasons, Feather seems to be the best choice in combination with zstd and a compression level of 10 to 12.

    This can be done with:

    df.to_feather("filename.feather", compression="zstd", compression_level=10)
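
If you want to check these numbers on your own data, a measurement along the following lines could be a starting point. This is only a sketch and not the benchmark code from the article: `measure_roundtrip` and `measure_feather` are hypothetical helper names, and `measure_feather` assumes that pandas and pyarrow are installed.

```python
import os
import time


def measure_roundtrip(save, load, path):
    """Time save(path) and load(path) and report the resulting file size."""
    start = time.perf_counter()
    save(path)
    save_seconds = time.perf_counter() - start

    start = time.perf_counter()
    loaded = load(path)
    load_seconds = time.perf_counter() - start

    stats = {
        "save_s": save_seconds,
        "load_s": load_seconds,
        "bytes": os.path.getsize(path),
    }
    return stats, loaded


def measure_feather(df, path, level=10):
    """Measure a Feather + zstd round trip (assumes pandas and pyarrow)."""
    # Imported lazily so measure_roundtrip stays usable without pandas.
    import pandas as pd

    return measure_roundtrip(
        lambda p: df.to_feather(p, compression="zstd", compression_level=level),
        pd.read_feather,
        path,
    )
```

Calling `measure_feather(df, "filename.feather", level=10)` for several levels and comparing the `bytes`, `save_s` and `load_s` values gives a rough picture; for reliable timings each measurement should be repeated several times.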

    Pandas Data Format and Compression # philipmay.org

  • Philip May

    Data scientist and open source enthusiast with NLP focus @ Deutsche Telekom

    For some time now I have liked #DVC, but also #Jupyter #Notebooks. Because the DVC examples only ever show Python scripts, I always wondered how you can still use notebooks for the pipelines. Now this is the solution. Thanks for sharing, Alaeddine Abdessalem!

    Deutsche Telekom, #AICC
