Linux/Unix
Product Overview
1. A text corpus cleaning library for large-model training that supports filtering, cleaning, and deduplicating text corpus data
2. A high-performance programming framework that is compatible with Spark SQL, PySpark, Pandas, and other programming interfaces and unifies Data + AI workloads
3. A C++-based distributed computing programming platform that supports the development of high-performance data processing modules
4. A graph computing library for massive graph data that provides high-performance graph algorithms such as PageRank
5. A machine learning library for massive data sets that provides traditional machine learning algorithms such as K-Means and KR
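
As an illustration of the filtering, cleaning, and deduplication steps named in the first item, here is a minimal single-machine sketch in plain Python. It does not use the product's API, which this overview does not describe; the function names `clean_text` and `clean_corpus` and the `min_chars` threshold are hypothetical, and exact hash-based deduplication is an assumed stand-in for whatever method the library actually uses.

```python
import hashlib
import re
import unicodedata


def clean_text(text: str) -> str:
    """Normalize Unicode (NFKC) and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def clean_corpus(docs, min_chars=10):
    """Clean, filter, and exactly deduplicate an iterable of documents.

    Hypothetical helper: min_chars is an assumed length filter, and
    duplicates are detected case-insensitively via a SHA-256 digest.
    """
    seen = set()
    kept = []
    for doc in docs:
        cleaned = clean_text(doc)
        # Filtering: drop documents that are too short to be useful.
        if len(cleaned) < min_chars:
            continue
        # Deduplication: skip documents whose normalized text was already seen.
        digest = hashlib.sha256(cleaned.lower().encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        kept.append(cleaned)
    return kept
```

Calling `clean_corpus` on a list of raw strings returns the surviving documents in their original order. A distributed implementation would typically shard the digest set across workers rather than hold it in one process.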