
Houdu Chukonu
Overview
- A text corpus cleaning library for large model training, supporting filtering, cleaning, and deduplication of text corpus data
- A high-performance programming framework that is compatible with Spark SQL, PySpark, Pandas, and other programming interfaces and unifies Data + AI
- A C++-based distributed computing platform that supports the development of high-performance data processing modules
- Large-scale graph data processing, with high-performance implementations of PageRank and similar graph algorithms
- A machine learning library for large-scale data sets, providing traditional machine learning algorithms such as K-Means and KR
Highlights
- Corpus cleaning for large model training
- Fully compatible with Spark, and several times to tens of times faster than Apache Spark in common applications
- Supports C++ high-performance module development
Details
Pricing
Houdu Chukonu
Vendor refund policy
Refunds are currently not supported, but you can cancel at any time; please contact tech@houdutech.cn
Legal
Vendor terms and conditions
Content disclaimer
Usage information
Delivery details
64-bit (x86) Amazon Machine Image (AMI)
Amazon Machine Image (AMI)
An AMI is a virtual image that provides the information required to launch an instance. Amazon EC2 (Elastic Compute Cloud) instances are virtual servers on which you can run your applications and workloads, offering varying combinations of CPU, memory, storage, and networking resources. You can launch as many instances from as many different AMIs as you need.
Version release notes
- Text corpus cleaning library for large model training, supporting functions such as filtering, cleaning, and deduplication of text corpus data
- High-performance programming framework that is compatible with Spark SQL, PySpark, Pandas, and other programming interfaces and unifies Data + AI
- Provides a C++-based distributed computing programming platform to support the development of high-performance data processing modules
- Processes large-scale graph data and provides high-performance implementations of PageRank and similar graph algorithms
- A machine learning library for large-scale data sets, providing traditional machine learning algorithms such as K-Means and KR
Additional details
Usage instructions
- Start the cluster
(1) Open CloudFormation: search for CloudFormation in the AWS console and go to the CloudFormation homepage.
(2) Create a stack: select Existing Template -> Upload Existing Template. The template address is:
https://chukonu.houdutech.cn/aws-download/cloudformation-chukonu-v1.0.yaml
(3) Set the parameters: in the stack details, set instanceCount to 3 and select the EC2 instance type for InstanceType. m6i.2xlarge is recommended, but you can adjust it to suit your own situation.
(4) Leave the other stack options at their default values and submit.
- Run the test program
(1) Download the test script and program.
Test script download address: https://chukonu.houdutech.cn/aws-download/submit-job.sh
Test program download address: https://chukonu.houdutech.cn/aws-download/word_count.py
(2) After the download is complete, upload submit-job.sh and word_count.py to the user directory on the Chukonu master server, then run the command sh submit-job.sh to test.
Note: After completing the subscription, follow the steps above to use the server; you do not need to start the server from the website or the EC2 console.
See more usage instructions:
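For readers who prefer to script the "Start the cluster" steps rather than click through the console, the following is a minimal sketch in Python using boto3. It is not vendor-provided code. The stack name `chukonu-cluster` and the helper names are hypothetical, and the parameter keys `instanceCount` and `InstanceType` are taken from the steps above and should be checked against the template itself. The sketch passes the template inline as `TemplateBody`, which only works for templates within CloudFormation's inline size limit; it also assumes the template needs no extra `Capabilities` acknowledgement, which has not been verified.

```python
import urllib.request

# Template address given in the usage instructions.
TEMPLATE_URL = (
    "https://chukonu.houdutech.cn/aws-download/cloudformation-chukonu-v1.0.yaml"
)


def build_stack_request(template_body, stack_name="chukonu-cluster",
                        instance_count=3, instance_type="m6i.2xlarge"):
    """Assemble keyword arguments for CloudFormation's create_stack call.

    stack_name is a hypothetical default; the parameter keys follow the
    names used in the usage instructions and are assumed to match the
    template.
    """
    return {
        "StackName": stack_name,
        "TemplateBody": template_body,
        "Parameters": [
            # CloudFormation parameter values are always passed as strings.
            {"ParameterKey": "instanceCount",
             "ParameterValue": str(instance_count)},
            {"ParameterKey": "InstanceType",
             "ParameterValue": instance_type},
        ],
    }


def create_cluster(region_name):
    """Download the template and submit the stack; returns the stack ID."""
    import boto3  # imported here so the helper above works without boto3

    with urllib.request.urlopen(TEMPLATE_URL) as resp:
        template_body = resp.read().decode("utf-8")

    client = boto3.client("cloudformation", region_name=region_name)
    response = client.create_stack(**build_stack_request(template_body))
    return response["StackId"]
```

Calling `create_cluster("us-east-1")` (the region is only an example) with valid AWS credentials would submit the stack with the recommended defaults; the remaining stack options are left at CloudFormation's defaults, as in step (4).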
Resources
Vendor resources
Support
Vendor support
Technical support contact information: tech@houdutech.cn
Amazon Web Services infrastructure support
Amazon Web Services Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.